Data as Cost — Big Cakes are Nice if you Like Cake

If data is cost, and algos are a commodity, what is it that we should value in the coming age of automated decision making?

Mikko
11 min read · Jul 24, 2017

Once upon a time, in a land of rainbow-colored sponge cakes with bubbles, Big-IT companies realized they had a problem: their customers were becoming increasingly reluctant to keep pumping money into the IT stuffs they were peddling. Simply put, FORTUNE 500 companies were no longer responding to words such as “infrastructure” or “integration”. On the one hand, they had already spent way too much money on both; on the other, they had probably started to realize that both ideas were rooted in the benefit of the IT companies rather than their own interests. So the IT companies did what any respectable purveyor of stuffs does: they sent an urgent message to their friends, the Big-Consultancy.

Being master brewers, the Big-Consultancy came up with the perfect mix of emotion and logic, something so irresistible that it would light the world on fire. Not only would it make all the mindless past spending on IT stuffs seem meaningful, but it would justify even wilder spending going forward. In fact, it would make perpetually growing spending seem not just like a good idea, but a necessity.

Indeed, if such a thing of marvel as a perpetual kool-aid brewing machine was possible, the Big-Consultancy was sure this must be it. They decided to call it “Big Data.” Big-IT was over the moon for this new marvelous contraption.

What the (Big Data Kool-aid) Package Labeling Forgot to Mention

The basic idea of Big Data has its roots in a world where data was scarce. In that light, the proposition of “big” data seems to make sense: if you have a lot of something that is scarce, you gain some kind of competitive advantage as a result.

The other idea at the heart of “Big Data” is the just-in-case mentality, an idea that more mature industries such as car manufacturing, clothing, and paper had already proven detrimental to business. In car manufacturing, the moment car sales started to slow down, Toyota pioneered the just-in-time manufacturing principle. Today most goods are produced just-in-time as opposed to just-in-case.

When it comes to data, the difference between ‘just-in-case’ and ‘just-enough’ (the data equivalent of just-in-time for physical goods) is staggering, as the quick arithmetic below the list shows:

  • Store data for one second instead of one day = 86,400x performance gain
  • Store data for one second instead of one year = 31,536,000x performance gain
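These figures are simply seconds counted out. A minimal sketch of the arithmetic, in Python, purely for illustration:

```python
# Back-of-the-envelope check of the bullet points above:
# keeping data for one second instead of one day or one year.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000

print(f"one second vs. one day:  {SECONDS_PER_DAY:,}x gain")
print(f"one second vs. one year: {SECONDS_PER_YEAR:,}x gain")
```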

In analytics platforms, it is quite typical for data to be stored for a year or more, mostly so that it is available just-in-case for displaying vanity metrics or for some other mundane purpose.

Data as Cost

Data in itself is nothing but cost. Unless you do something to extract value from it and make it worth something, data is nothing but an expense. To demonstrate this, we can perform a simple experiment:

Set up an organization, spin up an AWS instance, put some data on it, and leave it there without doing anything to it.

The first thing you will find is that you will accumulate costs without generating any value.
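To put a number on that first finding, here is a minimal sketch of how idle data accumulates cost. The storage rate is an assumed, illustrative figure, not a quote of actual AWS pricing:

```python
# Idle data only accumulates cost. The rate below is an assumption
# for illustration, not real AWS pricing.
ASSUMED_USD_PER_GB_MONTH = 0.023

def idle_storage_cost(gigabytes: float, months: int) -> float:
    """Cumulative cost of storing data that nobody touches."""
    return gigabytes * months * ASSUMED_USD_PER_GB_MONTH

# 10 TB sitting untouched for three years:
print(f"${idle_storage_cost(10_000, 36):,.2f} spent, $0.00 of value created")
```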

The second thing you will find is that if you go and try to sell what you have set up to someone, as is, nobody will buy it.

The third thing you will find is that if you try to convince the IRS that what you have in your setup is an asset, they will not see it that way.

The second and third points show how having data is starkly different from having oil or minerals in the ground.

The third point is slightly different for an individual, who is not required to adhere to what is referred to as the double-entry bookkeeping method. For organizations, we first have to consider that the books contain things considered negatives and things considered positives. Of the negatives there are two kinds: operational expenses (OPEX) and capital expenses (CAPEX). Capital expenses are generally preferred because they are not “just” expenses, but investments that create assets on the other side of the books (in the positives).

The setup in our experiment will be considered OPEX. It is money going out without leaving any value behind. At least that is how it plays out in accounting.
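A toy illustration of the distinction, with invented amounts: CAPEX leaves an asset behind on the books, OPEX does not.

```python
# Toy double-entry sketch: CAPEX creates an asset, OPEX is pure expense.
# All amounts are invented for illustration.
books = {"cash": 100_000, "assets": 0, "expenses": 0}

# CAPEX: buy a capital asset (e.g., land) -> cash turns into an asset.
books["cash"] -= 50_000
books["assets"] += 50_000

# OPEX: pay a year of storage bills for idle data -> cash turns into expense.
books["cash"] -= 10_000
books["expenses"] += 10_000

print(books)  # {'cash': 40000, 'assets': 50000, 'expenses': 10000}
```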

The most pertinent of all the catchphrases that came with this new kool-aid, one we’ve all heard to death, is that data is the new oil. Well, it’s not.

[Image: an actual McKinsey marketing slide]

Let’s go back to our experiment, and now imagine a scenario where you have a piece of land and can somehow prove there is oil deposited in it. The land you are buying is a capital investment, and it goes on the CAPEX side of your books. That means it is treated as an asset, on the positive side. As long as there is as much as a whiff of oil, you’ll have offers coming in to buy the land off you as fast as you can turn them down. Even the IRS will not debate this.

So no, data is not the new oil.

There is a more recent version, which says that data is the new oil, but that it needs to be refined first. Well, then it’s clearly not oil in terms of value, because oil does not need to be refined to have value; it just has more value once refined. Data, by contrast, is pure cost when unrefined, and cannot be considered an asset.

No again. Also, note the slide in question: a leading PR firm quoting a notable hype firm.

Back in the snake-oil times of the late 19th century, before Edward Bernays introduced propaganda to the US, the equivalent of this would have been one snake-oil seller quoting another. Perhaps, in the not-so-distant future, we will look back on this era and its Big Data communication with the same amusement with which we now look back on the snake-oil years.

An Age of Automated Decision Making

Some people like to freak out about machine intelligence. But history shows that we should not be afraid of automation. Take, for example, the building of the pyramids: a lot of people died doing it, and there had to be a lot of slaves for the whole project to make sense in the first place. In Dubai, they built a building almost one kilometer high, and not that many people died building it. Some of the most iconic constructions of today are built without anybody dying. And instead of slaves, migrant workers from very poor countries earn a living that feeds multiple generations of their families in their countries of origin.

Over the past 500 years or so, automation has developed in an easy-to-understand progression:

  • the steam engine automated power generation
  • production lines automated mechanical labor
  • the computer automated data processing
  • communications technologies automated access to information

Next, machine intelligence will automate access to decisions. This is potentially great news, as we humans are not wired for making decisions.

Read more about the problem of human decision making in Thou Shalt Not Fear Automatons.

As a result, as in each of the previous major developments in the automation “tech tree”, human capital will become more available for making contributions that only humans can make, contributions we can’t even imagine yet. For example, we might finally be able to put our attention, and our incredible ability at pattern detection and pattern making, into solving the mother of all problems: problem-solving itself.

I have written about this in Solving the Problem of Problem Solving.

Human ingenuity will be as valuable as before. No, more valuable. In this regard, it is essential that J.C.R. Licklider’s seminal work on Man-Computer Symbiosis gets more attention, and that researchers consider his vision in the modern-day context. What we today call a “data scientist” is the kind of individual who has the potential for creating a symbiotic relationship with computers. Amid all the hype surrounding the topic, and to avoid doubt about what “data science” actually is, the I-COM Data Science Board that I chaired for some years came up with the following description of data science:

Data Science involves the theoretical and practical approach to extracting value, which is knowledge and insights, from data.

The practice of data science can be broken down into four components: the data, the algorithms used to process it, the systems where the data is stored, and the people who operate those systems. Put another way, the breakdown is into two: data, algorithms, and storage are the computer, and people are the man. The key point is the symbiosis of the two.

Actually, the idea of man-computer symbiosis was showcased in a significant way for the first time almost 80 years ago, by none other than Alan Turing. In fact, he is the first true modern-day data scientist we can readily name. This is because of the process by which Mr. Turing developed a theory based on patterns, and then built an instrument around his theory using computer technology. Working symbiotically with his device, he could do what others could not: he cracked the Nazi codes, in some part helping the UK win its war against Hitler.

The essential point here is to understand the significance of humans: the more technology evolves, the more the significance of humans will be highlighted. The less human capital is bogged down in grunt work, the more it can thrive. In this light, it is very important to understand that any piece of technology that is widely available with a low barrier to entry is a commodity. Indeed, the most advanced deep learning platforms of today (Keras, TensorFlow, and PyTorch) will seem as impressive to people in the not-so-distant future as a piece of rock seems to us today.
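To make the commodity point concrete, here is a minimal sketch of how little it takes to stand up a trainable deep learning model in Keras. The architecture and toy data are arbitrary illustrations, not anything from a real project:

```python
# A complete, trainable classifier in a handful of lines of Keras,
# illustrating how commoditized deep learning tooling has become.
import numpy as np
from tensorflow import keras

# Toy data: 1,000 samples, 20 features, binary labels.
x = np.random.rand(1000, 20)
y = (x.sum(axis=1) > 10).astype(int)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, verbose=0)
```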

Big Cake is Nice Only if You Like Cake

Because unadulterated data is nothing but cost, it seems accurate to say that “data” actually is a problem. This means “big data” is a “big problem”. The fact that there is so much data is not an opportunity; it is an obstacle.

To deal with the obstacle, we have commodity tools in the form of algorithms, and libraries that make managing algorithms and data more straightforward. To use a construction comparison: if data are the nails, then algos are the hammer, and the various management tools are the toolbox. Even if there is a robot of some sort that hits the nails, there is a human who designs the robot. If there is a deep learning system that designs the robot, there is a human who designs the deep learning system. If there is some sort of advanced machine intelligence that designs the deep learning system, then there is a human who designs that… ad infinitum.

This seems to be the essential point of automation: just as we humans came from the earth and can’t escape that, the machine came from us humans and can’t escape us. This way, the true value never shifts away from human ingenuity; it just gets channeled in ever subtler ways. This is great news for us humans, as we are incredibly subtle beings. Further, it’s not just a question of humans doing design work, but also operations. Somewhere, high enough in the automation construct, there is always a human pulling some sort of lever. Again, it is just a question of subtlety. We’ve already come a long way from the grossness of the lever-pulling back in pyramid-building times.

Fundamentally, “science” refers to the human ability to detect patterns, structure ideas based on them, and articulate those ideas to others. Once you add “science” to data, data stops being an obstacle and becomes an opportunity. Broadly speaking, data science is the opportunity associated with data.

It has to be better understood that it is not data we should invest in so much as data science. That means we should invest in people more than anything. Most importantly, we must invest in completely changing the early education system, particularly our math education. As Paul Lockhart put it so eloquently:

“If I had to design a mechanism for the express purpose of destroying a child’s natural curiosity and love of pattern-making, I simply wouldn’t have the imagination to come up with the kind of senseless, soul-crushing ideas that constitute contemporary mathematics education.”

The Proof is in the Pudding They Say

Let’s close off by bringing everything down to a very simple line of reasoning.

When you make investments in data and none in data science, you have no value at all.

Even if you weren’t making any investment in data, you could still invest in science and create theories, and find value.

Similarly, we can show that data is a problem, or a cost. When you have no data, you have no cost.

When you have data but no data science, you have just cost.

If we all agree that technology and algos are (or will soon be) commodities, this clearly shows that data science is the value, because with it in the equation we can negate data’s problematic, cost-generating nature and turn data into value.
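The whole line of reasoning fits in a few lines of code. A toy restatement of the four cases above, purely as illustration:

```python
# Toy restatement of the closing argument: data alone is cost,
# and data science is what turns it into value.
def outcome(has_data: bool, has_data_science: bool) -> str:
    if not has_data:
        # No data, but science can still create theories and find value.
        return "value" if has_data_science else "nothing (no cost, no value)"
    return "value" if has_data_science else "cost"

for data in (False, True):
    for science in (False, True):
        print(f"data={data!s:5}  data science={science!s:5}  ->  {outcome(data, science)}")
```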

In this light, it seems fair to suggest that the data scientist is to the decision age what the engineer was to the industrial age, or the programmer to the information age.

Thank you so very much for taking the time to make it this far :) I’m happy we can share this information together.

You might also like my commentary on data visualization:

…or Decoding Intelligence, an interview I did recently covering the topic of creativity and machine intelligence.

Finally, a 5-minute video on the Secret History of Data Science, my last (and final) public talk, covering the topic of data and technology.
