Data Science Value

Posted 5 months, 1 week ago | Originally written on 9 Jan 2024

The value of data science is two-fold:

  • insights through the interpretation of statistical models, or
  • insights through prediction from statistical models.

As with any value, the true measure is time-to-value: how long does it take to get insights? We would much rather get value sooner than later. However, we would be willing to trade some time for accuracy: if it takes slightly longer to get slightly more accuracy, then it may be worthwhile to do so.

Nevertheless, any team of data scientists needs to spend the vast majority of its time creating said value. This means that any time a data scientist spends cleaning, modifying or transforming data is time away from adding true value. Similar to Shigeo Shingo's approach to optimising manufacturing processes, which resulted in single-minute exchange of die (SMED), there are two types of value addition:

  • internal setup involves all actual value addition tasks i.e. building the statistical model;
  • external setup involves all tangential value addition tasks required for internal setup tasks, such as cleaning, modifying and transforming data.

In the ideal world, data scientists will only ever have to deal with internal setup tasks. This implies that considerable operational efficiencies have been set up to automate (or autonomate) external setup. In practice, the data science team will need to have decided on the sets of cleanup, modification and transformation tasks to be performed on the data. As unprocessed data arrives, automatic steps perform the full suite of QC, cleanup, modification and transformation tasks, and even preliminarily load the data into appropriate stores (DBs, object stores etc.). Data scientists then only ever have to load data into the appropriate statistical models and evaluate which will produce the best results. Even preliminary model selection can be performed on a handful of statistical models so that the data scientist can begin from an advanced stage of the analysis process.
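To make the idea concrete, here is a minimal sketch of what automated external setup followed by preliminary model selection might look like. Everything in it is an assumption made for illustration: the step names (quality_check, clean, rescale), the row format with "x" and "y" fields, the two candidate models and the toy data are all hypothetical, and a real team would substitute its own agreed steps, stores and models.

```python
from statistics import mean

# --- External setup: hypothetical automated steps, run as raw data arrives ---

def quality_check(rows):
    """QC: drop rows with a missing x or y value."""
    return [r for r in rows if r.get("x") is not None and r.get("y") is not None]

def clean(rows):
    """Cleanup: coerce raw string fields to floats."""
    return [{"x": float(r["x"]), "y": float(r["y"])} for r in rows]

def rescale(rows):
    """Transformation: rescale x to the range [0, 1]."""
    xs = [r["x"] for r in rows]
    lo = min(xs)
    span = (max(xs) - lo) or 1.0  # avoid dividing by zero when all x are equal
    return [{"x": (r["x"] - lo) / span, "y": r["y"]} for r in rows]

# The agreed suite of steps, applied in order (an assumed, illustrative list).
EXTERNAL_SETUP = [quality_check, clean, rescale]

def run_external_setup(raw_rows, steps=EXTERNAL_SETUP):
    """Apply every external setup step in sequence and return prepared rows."""
    rows = raw_rows
    for step in steps:
        rows = step(rows)
    return rows

# --- Internal setup: a handful of candidate models, compared automatically ---

def fit_mean(train):
    """Baseline model: always predict the mean of the training targets."""
    y_bar = mean(r["y"] for r in train)
    return lambda x: y_bar

def fit_line(train):
    """Simple least-squares line: y = intercept + slope * x."""
    xs = [r["x"] for r in train]
    ys = [r["y"] for r in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

CANDIDATES = {"mean_baseline": fit_mean, "least_squares_line": fit_line}

def preliminary_selection(rows, candidates=CANDIDATES):
    """Fit each candidate on half the rows, score it by mean squared error
    on the other half, and return (name, error) pairs, best first."""
    train, holdout = rows[::2], rows[1::2]
    scores = {}
    for name, fit in candidates.items():
        predict = fit(train)
        scores[name] = mean((predict(r["x"]) - r["y"]) ** 2 for r in holdout)
    return sorted(scores.items(), key=lambda item: item[1])

# Toy raw data (made up for this sketch): strings, plus one row that fails QC.
raw = [{"x": str(i), "y": str(2 * i + 1)} for i in range(10)]
raw.append({"x": "3", "y": None})

prepared = run_external_setup(raw)
ranking = preliminary_selection(prepared)
print(ranking)
```

With something like this in place, the data scientist starts from the prepared rows and the ranked list of candidates rather than from the raw data, which is the "advanced stage" described above. Keeping the steps in a plain ordered list is a deliberate choice in this sketch: it makes the agreed suite of external setup tasks explicit and easy to extend.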