Practical Data Science (IDS 720)
Data Science is an intrinsically applied field, and yet all too often students are taught the advanced math and statistics behind data science tools, but are left to fend for themselves when it comes to learning the tools we use to do data science on a day-to-day basis, or how to manage actual projects. This course is designed to fill that gap.
Practical Data Science is a flipped-classroom, exercise- and project-focused course. It is designed to give students practical experience manipulating and analyzing real (often messy, error-ridden, and poorly documented) data using the full range of bread-and-butter Python data science tools, including the command line, git, Python itself (especially numpy and pandas), Jupyter notebooks, and more. By the end of the course, students will be able to:
- Manipulate and analyze data in any format, including cleaning, merging, and summarizing data in all standard tabular formats and at all levels of cleanliness, as well as working with large datasets,
- Identify and resolve data issues using defensive programming practices,
- Set up and manage a data science programming environment on their own computers, including installing Python, managing packages with pip and conda, setting PATH variables, and working with VS Code,
- Collaborate with colleagues effectively using git and GitHub,
- Plan and execute a full data science project from planning data manipulations through analysis and presentation of findings.
Pre-Reqs: Introduction to Programming in Python. This course was developed primarily for Masters of Interdisciplinary Data Science (MIDS) students. All MIDS students complete a 4-week, in-person programming bootcamp in addition to asynchronous Python training prior to their first semester at Duke, so this course assumes a strong background in the Python standard library. If you do not have a solid background in Python programming, take a look at Practical Data Science I & II (IDS 520 and IDS 521) below, which cover very similar material (and some material not in IDS 720) and begin with the material MIDS students see during their summer bootcamp.
Causal Inference & Solving Problems with Data (IDS 701)
The aim of this course is to provide a final jumping-off point for students transitioning from their first year of MIDS to their second. To meet that goal, this course is essentially two half-courses.
Answering Causal Questions
In the first half of the semester, students will learn how to answer causal questions, that is, questions about the likely effects of different courses of action you (or a stakeholder you work for) might wish to undertake. The study of how to answer causal questions is known as Causal Inference, and has wide applicability in data science. Whether one is changing the design of a website, launching an advertising campaign, or administering a new drug or medical treatment, causal inference provides data scientists with a rigorous tool set for making accurate predictions about the effects one can expect to see. In this portion of the class, we will study both experimental and observational techniques for answering causal questions.
Solving Problems with Data
The second half of the class is designed to help students understand how to effectively deploy all the tools they have learned during their first year in MIDS to solve real problems. This part of the class can be thought of as a module on project design and execution through backwards design. In it, we will discuss the role different approaches to data science (e.g., statistical inference, machine learning, causal inference) play in solving problems, as well as important professional concepts like stakeholder management, effective teamwork, and statistical decision making.
Practical Data Science I (IDS 590 in 2025)
Practical Data Science I is a flipped-classroom, exercise- and project-focused course. It requires zero prior experience with programming. It begins with an introduction to Python, computational thinking, and the principles of good programming using the 7 Steps method. It then turns to data analysis, with an emphasis on the types of analyses of interest to social scientists, public policy students, and natural scientists. The course provides students with experience manipulating and analyzing real (often messy, error-ridden, and poorly documented) data using the full range of bread-and-butter Python data science tools (like the command line, git, VS Code, numpy, pandas, matplotlib, statsmodels, and more).
As noted above, this course was developed to make the material covered in IDS 720 accessible to Duke students not enrolled in the Masters of Interdisciplinary Data Science (MIDS) program. MIDS students complete a 4-week, in-person programming bootcamp in addition to asynchronous Python training prior to their first semester at Duke, so IDS 720 assumes a strong background in the Python standard library. This class, by contrast, begins with the material MIDS students see during their summer bootcamp, and eventually covers even more than is covered in IDS 720.
Don’t be confused by the 590 numbering! The level will be the same in IDS 590 as in IDS 720. The fact that IDS 720 is a 700-level class and IDS 590 is a 500-level class is not meant to imply one is more serious or rigorous than the other. IDS 720 was created with MIDS in mind, so we gave it a 700-level course number because that makes it “grad only.” But because we want to allow advanced undergraduates to take IDS 590, we gave it a 500-level course number, which is open to both graduate students and advanced undergraduates.
Practical Data Science II (Code Forthcoming)
Practical Data Science II is a flipped-classroom, exercise- and project-focused course. Building on the computational thinking skills developed in Practical Data Science I, this course introduces students to a range of methods of computational inquiry, including network analysis, geospatial analysis, and natural language processing (NLP). Throughout, the focus will be on developing hands-on experience implementing these methods with messy real-world data to ensure students are prepared to deploy these tools to answer the questions they care about.
Pre-Reqs: Practical Data Science I
Computational Methods for Social Scientists Bootcamp
A computational methods bootcamp for incoming Duke graduate students. Its goal is to provide a foundational understanding of topics including: variable assignment; vectors and matrices; loading, subsetting, cleaning, merging, and collapsing tabular data; plotting; and using loops and functions.
Throughout, the emphasis of the bootcamp is on learning the generalized principles that underlie how R works. Few fields are changing as quickly as computational social science/data science, and so any specific packages or tools we teach you now are sure to be out of date within a few years. But there is a set of fundamental concepts and patterns so powerful and flexible that they end up being common to most data science tools, not just across packages in R but also across languages, and so those are the concepts emphasized.