Wharton OIDD 245: Analytics & the Digital Economy


Course Objectives

The goal of this course is to give students hands-on experience with data science projects. A further objective is to ensure that students who complete the course are comfortable in any business or policy environment where data are extensively used to inform strategic decision-making. Students should leave the course with an understanding of what is required to build data products, and with the confidence that they have the skills necessary to acquire, analyze, and communicate insights in a data-rich environment.

The course is oriented around hands-on in-class exercises, homework, and labs. Students will be expected to leave the class proficient in modern data analysis tools. Broadly, here’s what you’ll learn from the course, and why these things are important:

Projects throughout the course will reinforce how to use data analysis to solve business problems. We focus on working with large, unstructured data sources, and students gain experience with introductory machine learning concepts. Students will spend time inside and outside of the classroom combining data and code to develop data products for a number of new industries, including finance, restaurants, and health care.

At the end of the course, students will be expected to complete an advanced data project, which involves acquiring data from an online web property (e.g. Uber, Facebook) through an API and developing an interactive data visualization. Students who complete this course should have the necessary tools to begin building a portfolio of data science projects that they can share online with future employers through platforms such as GitHub.

Course Overview

Over the last decade, there has been a dramatic rise in the use of tech skills and data analytic thinking to solve business problems in many domains, including finance, HR, policy, and strategy. As a result, the modern “analytic leader” increasingly needs technology, statistics, and data analysis skills to facilitate business analysis. This includes knowing how to a) effectively frame data-driven questions, b) analyze data, and c) use a new generation of tools for acquiring, analyzing, interpreting, and communicating insights derived from data. Students who take this course will engage with the world of data analysis using tools such as Tableau and R that are becoming increasingly popular in industry.

The first half of the course is designed for students with limited experience with data analysis projects. Familiarity with R, via courses such as STAT 405 or STAT 470, is ideal preparation, but students with other programming exposure can pick up the required skills via review sessions and self-instruction. The second half of the course will extend students’ experience to industry applications of text mining and machine learning and will require students to work with more unstructured data.

Each week of the semester will be devoted to analysis of a data set from a particular industry (e.g. HR, sports, fashion, real estate, music, education, politics, restaurants, non-profit work), which we will use to answer business questions by applying analytic techniques. Beyond applications of data tools and methods, a learning goal of this course is exposure to how data is changing decision-making in different industries. The course is extremely hands-on, and each week focuses on the application of a particular set of tools or analytic methods. Limited time will be devoted to lectures; most class time will be devoted to supervised work on weekly data projects. Through these exercises, students are expected to become proficient at using data analysis tools to analyze big data sets and inform business decisions.

Course web site

We will be using Canvas to submit assignments and receive grades. All course information will be posted on the course website. Course communication will be primarily through Slack.

Required textbooks and software

There is no textbook. Occasional readings will consist of selected online content which will be posted on the course site. As part of your homework, you will also be expected to complete some online courses that supplement what we do in class. The majority of the homework requirements involve working on data analysis projects.

Deliverables and grading

During this course, you will be assigned a number of hands-on data projects, which you will work on both in and out of class. You are expected to participate in classroom discussions (there is more information about participation below). The breakdown of points is as follows:

Data Labs 15%
Individual Homeworks 20%
Assessment Exam 20%
Data Projects 30%
Professionalism + Participation 15%
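
As a rough illustration of how the percentages above combine, the following Python sketch computes a weighted course grade. It assumes each component is scored on a 0-100 scale and that the weights are applied linearly; the function name weighted_grade and the example scores are hypothetical and are not part of the course materials.

```python
# Component weights copied from the breakdown above (percent of final grade).
WEIGHTS = {
    "Data Labs": 15,
    "Individual Homeworks": 20,
    "Assessment Exam": 20,
    "Data Projects": 30,
    "Professionalism + Participation": 15,
}


def weighted_grade(scores):
    """Combine per-component scores (assumed 0-100 each) into one percentage."""
    # The weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Data Labs": 90,
    "Individual Homeworks": 85,
    "Assessment Exam": 80,
    "Data Projects": 88,
    "Professionalism + Participation": 95,
}

print(weighted_grade(example_scores))
```

Under these assumptions, the Data Projects component moves the final grade twice as much as the Data Labs component for the same change in score.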

With each project, you will be provided with a set of guidelines. You can expect to use various data analysis tools extensively, including R and Tableau. We may also, to a limited extent, explore the use of Python and SQL for data analysis/visualization.

In corporate America, you will be expected to present your analytic findings and make a recommendation. Deliverables may therefore include short, informal analyses with an accompanying recommendation.

Group projects will be completed in small groups of two or three students. You may also be asked to evaluate the contribution of each of your team members after the group project.

Classroom presentations and discussions offer a unique opportunity for you to develop and enhance your confidence and skills in articulating a personal position, sharing your knowledge, and reacting to new ideas. All of you have personal experience that can enhance our understanding of this subject, and we want to encourage you to share that experience.

Participation and Professionalism

This course, like many other courses at Wharton, uses learning methods that require active involvement (e.g. attendance, participation in discussions, and in-class exercises). Not only is this the best way to learn, but it also develops your communication and presentation skills. Regular attendance, participation, presentations, and, in general, presenting yourself professionally are all an important part of your grade. Active participation requires good preparation: thoughtful completion of homework before class is essential. We recognize that expressing viewpoints in a group is difficult, but it is an important skill for you to develop. We will do what we can to make this as easy as possible. Remember, though, that only regular and insightful contributions will be rewarded.

The grade we assign for your class participation and attendance is a careful, subjective assessment of the value of your input to classroom learning. We keep careful track of attendance and of your contributions to each class session. These contributions can include (but are not restricted to) raising questions that make your classmates think, providing imaginative yet relevant analysis of a situation, contributing background or a perspective on a classroom topic that enhances its discussion, providing thoughtful feedback on the presentations of other students, and simply answering questions raised in class. A lack of preparation, missing classes without justification, negative classroom comments, or improper behavior (such as talking to each other, sleeping in the classroom, or walking in and out of the class while the lecture is in progress) can lower this grade.

Grading Guidelines

At Wharton, we strive to create courses that challenge students intellectually and that meet the Wharton standards of academic excellence. If you believe that an assignment or project grade you received was unjustified, you can appeal the grade. To appeal, write a one-page explanation of the reason for your appeal and hand it, along with your graded assignment, back to the TA responsible for that assignment. Please think twice before appealing a grade: the TA will completely re-grade the assignment, which may increase your grade, but may also lower it (e.g., if the TA catches more mistakes the second time around). If after re-grading you feel that your grade was again unjustified, you can appeal the grade to the instructor.

Late assignments, labs, and projects incur a 20% penalty for each day the submission is late.
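
The arithmetic of the late policy can be sketched in Python under one possible reading. The syllabus does not say whether the 20% is taken from the maximum points or from the earned score, nor whether the score can drop below zero. This sketch assumes 20% of the maximum points per day and a floor of zero; the function name late_adjusted_score is hypothetical.

```python
LATE_PENALTY_PERCENT_PER_DAY = 20  # from the policy above


def late_adjusted_score(raw_score, days_late, max_points=100):
    """Apply the late penalty to a raw score.

    Assumptions (not stated in the syllabus): the penalty is a percentage
    of max_points, it accumulates linearly per day, and the result is
    floored at zero.
    """
    penalty = max_points * LATE_PENALTY_PERCENT_PER_DAY * days_late / 100
    return max(0.0, raw_score - penalty)


# Hypothetical example: a submission worth 90 of 100 points, two days late.
print(late_adjusted_score(90, 2))
```

Under this reading, a submission five or more days late would receive no credit regardless of its quality.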

Overview of Course Schedule

Session  Topic                                   Date    Due
1        Introduction to course                  Jan 17  Lab 0: Setup
2        Introduction to Tableau                 Jan 22
3        Lab 1: Citibike                         Jan 24
4        Web scraping                            Jan 29
5        Data wrangling                          Jan 31  Lab 1: Citibike
6        Lab 2: Moneyball                        Feb 5
7        R review with applications: Part I      Feb 7
8        R review with applications: Part II     Feb 12  Lab 2: Moneyball
9        Lab 3: Baby Names + R Markdown          Feb 14
10       Lab 3: Baby Names                       Feb 19
11       In-class exam                           Feb 21
12       Packages and APIs                       Feb 26
13       Data project 1 presentations            Feb 28  Data project 1
14       Tidying data (tidyverse)                Mar 12
15       Web scraping and visualization in R     Mar 14
16       Datathon 1: In-class challenge          Mar 19
17       Introduction to machine learning        Mar 21  HW 1: Rats!
18       Machine learning and A/B testing        Mar 26
19       Datathon 2: In-class challenge          Mar 28
20       Text mining                             Apr 2   HW 2: Experiments
21       Lab 4: Yelp reviews                     Apr 4
22       Lab 4: Yelp reviews                     Apr 9
23       Applications of machine learning        Apr 11  Lab 4: Yelp reviews
24       Lab 5: Peer to peer lending             Apr 16  Final project proposals
25       Datathon 3: In-class challenge          Apr 18
26       Coding in the cloud: GitHub and AWS     Apr 23  HW 3: News Analytics
27       Data project 2 presentations            Apr 25
28       Data project 2 presentations + wrap-up  Apr 30  Data project 2