Data Science is a Process
Benjamin Xie & Greg L. Nelson
We define data science as an iterative process of augmenting human thinking with computational tools to use data to make decisions in/about the world.
Let's decompose that definition:
"iterative process": While we see data science as a process (explained below), it is very much an iterative one.
We will often find ourselves jumping back to a previous step in the process or jumping "out of order" as the situation demands.
"augmenting human thinking with computational tools": Human thinking and reasoning are at the core of data science.
We want to teach you first and foremost how to think like a data scientist.
Computational tools supplement human thinking, but human thinking remains at the core.
"make decisions in/about the world": The purpose of data science is to inform decisions.
Because these decisions depend heavily on the contexts in which they are made, the contexts in which data scientists work are critical.
We define the data science process as the 5 steps listed below.
We reiterate that this process is iterative and we may jump backwards or out of order to different steps.
We also reiterate that this entire process exists within specific contexts, so data scientists must be critical of their work at each step.
This means (among other things) considering bias in the data, model, and interpretations, as well as ethical and privacy concerns.
- Identify decision context and data science question(s)
- Collect and clean data
- Model data
  - Generate explanations and models
  - Evaluate and interpret explanations and models
- Make/Inform decisions
- Archive work
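To make the "iterative" part concrete, here is a minimal Python sketch of the process as a loop that can jump back to an earlier step. It is only an illustration, not a prescribed implementation. It assumes the five steps are identify, collect and clean, model (covering both generating and evaluating explanations and models), decide, and archive. The names `Step`, `run_process`, and `review_once` are hypothetical; `review_once` stands in for the human judgment that decides when to revisit a step.

```python
from enum import Enum


class Step(Enum):
    # Assumed top-level steps; "Model data" is taken to include generating
    # and evaluating explanations and models.
    IDENTIFY = "Identify decision context and data science question(s)"
    COLLECT = "Collect and clean data"
    MODEL = "Model data"
    DECIDE = "Make/Inform decisions"
    ARCHIVE = "Archive work"


ORDER = list(Step)


def run_process(review, max_moves=20):
    """Walk the steps in order, letting a reviewer redirect the process.

    `review(step, history)` is a hypothetical hook for human judgment: it
    returns a Step to jump to, or None to continue to the next step.
    `max_moves` caps the walk so a reviewer cannot loop forever.
    """
    history = []
    i = 0
    while i < len(ORDER) and len(history) < max_moves:
        step = ORDER[i]
        history.append(step)
        jump = review(step, history)
        i = ORDER.index(jump) if jump is not None else i + 1
    return history


def review_once(step, history):
    # Example judgment call: after the first modeling pass, decide the data
    # needs more cleaning and go back to collection once.
    if step is Step.MODEL and history.count(Step.MODEL) == 1:
        return Step.COLLECT
    return None


path = run_process(review_once)
print([s.name for s in path])
```

In this run the path visits collection and modeling twice before moving on to the decision, which is the kind of backtracking the process expects. The sketch only tracks the order of steps; the actual work at each step, including the critical reflection on bias, ethics, and privacy, still rests with the data scientist.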