The Data Revolution
Posted: January 15, 2012
Since we were schoolkids we have been taught about the major milestones of humankind’s economic development and what made each one of them possible. We have all heard about the agricultural revolution around 6000 BC, the industrial revolution around 1800 and the information technology revolution, especially the birth and growth of the Internet, around 2000. And this is where the list stops in every school book. However, without many people realizing it, we are slowly but surely entering the next revolution in humankind’s development: the Data Revolution.
What exactly is the Data Revolution? The Data Revolution is not about simply generating, collecting or storing data. Actually, this is what enables the Data Revolution, the same way that learning how to make steam-powered engines enabled using them to skyrocket production. The truly revolutionary part of data comes from analyzing, processing, distilling and understanding it. A popular term for this is “data mining”, although it should really be called “information mining”. Information mining is about predicting what is likely to happen, learning what works and what doesn’t, filtering out the noise and keeping the true signal, and ultimately assisting with or even taking over the decision-making. Any and every kind of decision-making, from “What restaurant should I go to tonight?” and “What is the most likely career path for me?” to “Should I take this medicine now?”. Information mining is not what most people call Business Intelligence. BI is about setting up metrics of interest, monitoring them using historical data and then presenting them for human consumption. Information mining goes above and beyond looking at the past. It is about understanding the past, throwing out the non-essential and arriving at the holy grail of the information mining process: the decision-making.
Amazon, which is an amazing company in many respects, is one of the strongest proponents of the data-driven decision-making process. In a great paper that you can find here they coined the phrase “Listen to your data, not the HiPPO”. HiPPO is short for Highest Paid Person’s Opinion, and really this is pretty much how businesses operate worldwide, irrespective of their size. Whenever there is a decision to make about a product, operations or marketing, the opinion of the highest paid person trumps everyone and everything else. Should we market a new product? Let’s go ask the CEO or the VP of Marketing or whoever is the boss. Why are people buying our products? Let’s go ask the CEO or hire a firm to tell us.
Why do we ask people to learn what works best? Because we believe that they know better or, more accurately, because they are paid under the assumption that they know better. Amazon was one of the first companies to apply A/B tests extensively, the right method for assessing changes in a website, an application or even a process. With an A/B test, if you want to know whether to have a green or a red “Buy” button, or whether to place it at the top or at the bottom, it doesn’t matter what the CEO says. It matters what the data say. Because at the end of the day the CEO’s decision is making you 100 euros while the data-driven decision is making you 120 euros.
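To make the “Buy” button example a bit more concrete, here is a minimal sketch of how the outcome of an A/B test could be checked with a standard two-proportion z-test. This is only one common way to do it, not necessarily what Amazon uses. The function name ab_test and all the visitor and purchase counts are made up for illustration.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for an A/B experiment.

    conv_a, conv_b: number of conversions (e.g. purchases) in each variant.
    n_a, n_b: number of visitors shown each variant.
    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the hypothesis that both variants are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in the data; the test is undefined.")
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical numbers: the green button (A) vs. the red button (B).
rate_a, rate_b, z, p_value = ab_test(1000, 10000, 1200, 10000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.2g}")
```

A small p-value suggests that the difference between the two buttons is unlikely to be just noise. A real experiment would also need to fix the sample size in advance and randomize which visitor sees which variant.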
Asking the right questions and using the right tools to answer them is not easy, though. You can lie or be inefficient with statistics, or create the most sophisticated models that take in garbage and spit out garbage. It takes Data Scientists to know how to handle each situation. And I am not advocating that all of a sudden people will be replaced by robots. I am advocating that in matters of optimizing processes our intuition fails us. Data Science can help us improve our decision-making dramatically, just like inventing the steam-powered engine made mass transportation and mass agricultural production possible. We can’t go past a certain point of optimization without data.
The Data Revolution is not going to happen with a big bang. There is no silver bullet for that. There are myriad data problems that are connected but solved in distinct ways. This is why you will not read any news article that says “Data Problems solved”, unlike the gene sequencing challenge in the 2000s, where everyone was expecting the race to finish. The Data Revolution is a marathon and the race has just begun.