In this class you'll learn how to think like a data scientist. You'll learn what data scientists do and how they do it. You'll also learn about the contexts in which data scientists work. By the end of the course, you should be able to enter any organization and begin to understand the social and technical contexts in which you help make decisions. If you want to be a great data scientist, this is the course for you.
Learning objectives for the course:
You should have aspirations to be a data scientist or to work closely with them. Because we'll use data to inform decisions, you should also know:
The prerequisite course, INFO 201 (the Technical Foundations of Informatics), should be suitable preparation for the above. Refer to the INFO 201 online book to refresh your knowledge of the course.
We are available to talk about jobs, careers, graduate school, research, class, taboos, and anything else. Benji's office hours this quarter are Monday and Tuesday 9-10:30 am in MGH-015 (the door is locked, so just knock to get in). Greg's office hours this quarter are Wednesday 2-3:20 and Thursday 1-3PM at the CSE atrium tables. Occasionally we need to schedule other things over our office hours. To guarantee we'll be around, write to us in advance to secure a time.
We will use smartphones and laptops throughout the quarter to facilitate activities and project work in class. However, research and student feedback clearly show that using devices for non-class-related activities harms not only your own learning but other students' learning as well. Therefore, I allow device usage only during activities that require devices. At all other times, you should not be using your device. We'll help you remember this by announcing when to bring devices out and when to put them away.
|Week 0 — What is data science?|
Data science is a process
|Week 1 — Decision Making in Data Science|
|10/2||Lecture||Decision-making and Probability|
|10/2||Lab||Automating data science process with an R script|
Decision contexts in data science
Homework 2. Due
|Week 2 — Using probability models to support decisions|
|10/9||Lecture||Building Bayesian Models|
|10/9||Lab||Running Statistical Models|
|10/11||Lecture||Improving modeling decisions using Bayes' rule||Homework 3. Due Tues 10/17.|
|Week 3 — Cleaning and Selecting Data|
|10/16||Lecture||Bayesian Inference in Action, Data cleaning process|
Applying data cleaning process using Wrangler
Finding and selecting data sources
|Week 4 — Collecting and Making Sense of Data|
|10/23||Lecture||Collecting data from the internet||Project Milestone 2: Pilot Study. Due Sun 10/29.|
|10/23||Lab||Practicing web scraping|
|10/25||Lecture||Exploring data using visualizations and models|
|Week 5 — Visualization; Predictive Models|
|10/30||Lecture||Visualization design||Project Milestone 3: Project Proposal. Due Sun 11/5.|
|10/30||Lab||Improving ggplot graphics|
|11/1||Lecture||Using data and simulations to decide among predictive models|
|Week 6 — An Introduction to Modeling|
|11/6||Lecture||Modeling as a search for "optimal" parameters|
|11/6||Lab||Making basic models run in R.|
|Week 7 — Parameter Tuning & Feature Selection|
|11/13||Lecture||Selecting model parameters, features|
|11/13||Lab||Fitting model parameters and features|
Fitting model parameters and features (cont'd)
Assigned: Project Milestone 7 & 8: Presentation & Artifact. Due Mon 12/4,
|Week 8 — Making Models Run|
|11/20||Lecture||Common models and how to use them|
|11/20||Lab||Guest Lecture: Data Science at Microsoft (Dr. Winson Taam, Microsoft)|
|Week 9 — Interpreting Big Data|
Work on projects
Where is your data from - Mediocristan or Extremistan? & Problem of Induction
|Week 10 — Project Work and Reflections|
Reflecting on class projects
|Finals week||Homework 4 (Project and Course Reflection). Due 12/14.|
No class or finals will be held this week.
There are 100 points you can earn in this class:
We will use the iSchool Standard Grading Scale to convert your grade percentage (as shown in Canvas) to a 4.0 scale.
|≥ 97% → 4.0||90.5 → 3.5||83.9 → 3.0||78 → 2.5||73 → 2.0*||68 → 1.5||62 → 0.9|
|95.7 → 3.9||89.2 → 3.4||82.6 → 2.9||77 → 2.4||72 → 1.9||67 → 1.4||61 → 0.8|
|94.4 → 3.8||87.8 → 3.3||81.3 → 2.8||76 → 2.3||71 → 1.8||65 → 1.2||60 → 0.7***|
|93.1 → 3.7||86.5 → 3.2||80 → 2.7||75 → 2.2||70 → 1.7**||64 → 1.1||< 60 → 0.0|
|91.8 → 3.6||85.2 → 3.1||79 → 2.6||74 → 2.1||69 → 1.6||63 → 1.0|
*: 2.0 is the minimum grade required for any required INFO course to count towards an informatics degree.
**: The UW requires a 1.7 or better for non-degree requirements for undergraduate courses.
***: 0.7 is the lowest passing grade in an undergraduate course.
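To check where a Canvas percentage lands on this scale, the table can be read as a simple lookup. The sketch below is only an illustration, not an official calculator. It assumes each entry is the minimum percentage for that grade (the table states this explicitly only for the top and bottom rows), and the names `SCALE` and `to_grade_point` are hypothetical. The table lists no entry for 1.3, so under this reading 66% maps to 1.2.

```python
# Thresholds copied from the grading table above as (minimum percent, grade).
# Assumption: every entry is a lower bound for its grade; the syllabus states
# this explicitly only for ">= 97%" and "< 60%".
SCALE = [
    (97.0, 4.0), (95.7, 3.9), (94.4, 3.8), (93.1, 3.7), (91.8, 3.6),
    (90.5, 3.5), (89.2, 3.4), (87.8, 3.3), (86.5, 3.2), (85.2, 3.1),
    (83.9, 3.0), (82.6, 2.9), (81.3, 2.8), (80.0, 2.7), (79.0, 2.6),
    (78.0, 2.5), (77.0, 2.4), (76.0, 2.3), (75.0, 2.2), (74.0, 2.1),
    (73.0, 2.0), (72.0, 1.9), (71.0, 1.8), (70.0, 1.7), (69.0, 1.6),
    (68.0, 1.5), (67.0, 1.4), (65.0, 1.2), (64.0, 1.1), (63.0, 1.0),
    (62.0, 0.9), (61.0, 0.8), (60.0, 0.7),
]


def to_grade_point(percent: float) -> float:
    """Return the 4.0-scale grade for a percentage (hypothetical helper)."""
    # SCALE is sorted from highest to lowest, so the first threshold
    # the percentage meets or exceeds determines the grade.
    for threshold, grade in SCALE:
        if percent >= threshold:
            return grade
    # Anything below 60% receives 0.0.
    return 0.0


if __name__ == "__main__":
    for pct in (98.2, 91.0, 73.0, 59.9):
        print(f"{pct}% -> {to_grade_point(pct)}")
```

For example, 91.0% falls between the 90.5 and 91.8 rows, so this lookup returns 3.5.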
Late work receives no credit unless you can provide a note from a health care professional or provost documenting the reason for your absence. However, you can miss up to 3 activities without penalty and without documentation. This should be enough to allow for sickness, unavoidable travel, or other personal matters.
If you miss a reading quiz due to sickness, you can make up the quiz credit by writing a 250-500 word critique of the reading and submitting it to your Google Drive folder within a week of the quiz you missed. Title the Google Doc with the class number and "make up quiz", e.g. "2.3 make up quiz" for the make-up quiz for week 2, class 3 (the Wednesday lecture).
Each day in class we'll practice some skill. You'll get 0.5 points if you engage in and complete the activity. How to get credit will depend on the activity; sometimes being present will be enough, sometimes arriving to class on time will be enough, and sometimes you'll have to turn something in.
To access the readings, you will do the following:
You should complete your readings and reflection before the beginning of each lecture (twice a week). The Google Doc in your personal Drive folder is your submission (we are not using Canvas for readings). Each class, you'll come prepared to discuss the assigned reading.
The day that each reading is due, we'll do the following:
You will receive 0.5 points for completing the reading and reflection before class (on the Google Doc). You will receive up to another 0.5 points for getting the in-class reading quiz correct. We will give partial credit for partially correct answers on the reading quiz, at our discretion. In total, you can receive up to 1 point per reading.
There will be a few individual homework assignments which are separate from reading assignments and project milestones.
All homeworks are due by 11:59:00 PM PST on the specified date.
The goal of the individual homework assignments is to ensure you understand specific concepts that are critical to your understanding of data science.
The project is split across 8 milestones/assignments, each worth a different amount:
All assignments except the Project check-in meeting are due by 11:59:00 PM PST on the specified date.
The goal of the project is for you to practice the process of data science to make or inform a decision, so you can experience the nuances of formulating a good question, setting up process, constraints, and plans in relation to a context. Note, however, that because the timeline for the project is so short, it won't give you a deep, longitudinal experience with data science, nor will it give you practice with massive complexity or scale. I believe these are experiences best left to practice in industry, as they're very difficult to replicate in the artificial setting of school.
Links to Data Science communities at/near UW:
Links to recommended learning resources (most of which are free):
Links to important UW resources: