The value of data science is two-fold:
As with any value, the true measure is time-to-value: how long does it take to get insights? We would much rather get value sooner than later. However, we may be willing to trade some time for accuracy: if it takes slightly longer to get a meaningfully more accurate result, the wait can be worthwhile.
Nevertheless, any team of data scientists needs to spend the vast majority of its time creating that value. Any time a data scientist spends cleaning, modifying or transforming data is time away from adding true value. Borrowing from Shigeo Shingo's approach to optimising manufacturing processes, which produced the single-minute exchange of die (SMED) technique, there are two types of setup work:

- internal setup: tasks that require the data scientist's direct attention, such as fitting and evaluating statistical models; and
- external setup: preparatory tasks (cleaning, modification, transformation) that can be carried out in advance or automated entirely.
In an ideal world, data scientists would only ever deal with internal setup tasks. This implies that considerable operational machinery has been set up to automate (or autonomate) the external setup. In practice, the data science team needs to decide on the set of cleanup, modification and transformation tasks to perform on the data. As unprocessed data arrives, automated steps run the full suite of QC, cleanup, modification and transformation tasks, and even load the results into appropriate stores (databases, object stores, etc.), so that data scientists only ever have to feed data into candidate statistical models and evaluate which produces the best results. Even preliminary model selection can be performed over a subset of a handful of statistical models, so that the data scientist begins from an advanced stage of the analysis process.
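The external-setup flow above can be sketched in miniature: raw records pass through automated QC, cleanup and transformation steps, and a handful of candidate models are pre-scored so analysis starts from an advanced stage. This is a minimal illustration, not a production design; all record fields, function names and the two toy models (a mean baseline and a least-squares line) are assumptions for the sketch.

```python
# Illustrative "external setup" pipeline: QC -> clean -> transform,
# followed by a preliminary comparison of a handful of candidate models.
# All names and the toy dataset are hypothetical.

RAW = [
    {"x": "1.0", "y": 2.1},
    {"x": "2.0", "y": 3.9},
    {"x": None,  "y": 9.9},   # fails QC: missing feature
    {"x": "3.0", "y": 6.2},
]

def qc(record):
    """Reject records with missing fields."""
    return record["x"] is not None and record["y"] is not None

def clean(record):
    """Coerce string-typed numerics to floats."""
    return {"x": float(record["x"]), "y": float(record["y"])}

def pipeline(raw):
    """Automated external setup: QC then cleanup, no human in the loop."""
    return [clean(r) for r in raw if qc(r)]

def fit_mean(data):
    """Baseline model: always predict the mean of y."""
    mean_y = sum(r["y"] for r in data) / len(data)
    return lambda x: mean_y

def fit_linear(data):
    """Least-squares line through (x, y)."""
    n = len(data)
    mx = sum(r["x"] for r in data) / n
    my = sum(r["y"] for r in data) / n
    cov = sum((r["x"] - mx) * (r["y"] - my) for r in data)
    var = sum((r["x"] - mx) ** 2 for r in data)
    slope = cov / var
    return lambda x: my + slope * (x - mx)

def preliminary_selection(data, candidates):
    """Score each candidate by in-sample squared error; lowest wins."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(data)
        scores[name] = sum((model(r["x"]) - r["y"]) ** 2 for r in data)
    return min(scores, key=scores.get), scores

clean_data = pipeline(RAW)
best, scores = preliminary_selection(
    clean_data, {"mean": fit_mean, "linear": fit_linear}
)
```

By the time a data scientist arrives, `clean_data` is already QC'd and typed, and `best` names the most promising of the pre-screened models, so their time goes into the internal setup: refining and evaluating models rather than wrangling data.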