Data Science is a Process

Benjamin Xie & Greg L. Nelson

We define data science as an iterative process of augmenting human thinking with computational tools to use data to make decisions in/about the world.

Let's decompose that definition:

We define the data science process as five steps:

  1. Identify decision context and data science question(s)
  2. Collect and clean data
  3. Model data
    1. Generate explanations and models
    2. Evaluate and interpret explanations and models
  4. Make/Inform decisions
  5. Archive work

We reiterate that this process is iterative: we may jump backward or move between steps out of order. The entire process also exists within specific contexts, so data scientists must be critical of their work at each step. This means, among other things, considering bias in the data, models, and interpretations, as well as ethical and privacy concerns.
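
To make the iteration concrete, here is a minimal Python sketch of one way the five steps could fit together. Everything in it is a hypothetical illustration rather than a prescribed method: the question, the toy records in `RAW`, the cleaning thresholds, the `min_r2` cutoff, and all function names are assumptions made for this example.

```python
import json
from statistics import mean

# Step 1: decision context and question (hypothetical).
QUESTION = "Does study time predict exam score well enough to guide tutoring?"

# Hypothetical raw records of (hours studied, exam score); None is a missing value.
RAW = [(1, 52), (2, 58), (3, None), (4, 71), (5, 74), (6, 300)]


def collect_and_clean(raw, max_score):
    """Step 2: drop missing scores and scores above max_score."""
    return [(x, y) for x, y in raw if y is not None and y <= max_score]


def fit_line(data):
    """Step 3.1: fit a least-squares line y = a + b * x."""
    xs, ys = zip(*data)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def r_squared(data, a, b):
    """Step 3.2: evaluate the fit as the share of variance explained."""
    my = mean(y for _, y in data)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in data)
    ss_tot = sum((y - my) ** 2 for _, y in data)
    return 1 - ss_res / ss_tot


def run_process(raw, thresholds=(1000, 100), min_r2=0.9):
    """Run steps 2 to 4, jumping back to step 2 when the evaluation is poor."""
    log = []
    for max_score in thresholds:
        data = collect_and_clean(raw, max_score)
        a, b = fit_line(data)
        r2 = r_squared(data, a, b)
        log.append({"max_score": max_score, "n": len(data), "r2": round(r2, 3)})
        if r2 >= min_r2:
            break  # the fit is good enough to move on to a decision
    # Step 4: make or inform a decision based on the last evaluation.
    decision = "use model" if r2 >= min_r2 else "gather more data"
    return decision, log


decision, log = run_process(RAW)

# Step 5: archive the question, the decision, and what was tried along the way.
archive = json.dumps({"question": QUESTION, "decision": decision, "log": log})
print(archive)
```

In this sketch the first pass keeps an implausible score of 300 and the fit evaluates poorly, so the loop returns to cleaning with a stricter threshold before reaching a decision. Whether dropping that record is justified is exactly the kind of contextual judgment the process asks data scientists to be critical about, and archiving the log keeps that choice visible.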