We have the big data tools — let's learn to use them
Recently, at the Apache Spark Maker Community event in San Francisco, I was on a panel and feeling a bit salty. It seems many people have prematurely declared victory in the data game. A few people have achieved self-service, and even more have claimed to.
In truth, this is a tiny minority — and most of those people have achieved cargo-cult datacentricity. They use Hadoop and/or Spark and pull data into Excel, manipulate it, and paste it into PowerPoint. Maybe they’ve added Tableau and are able to make prettier charts, but what really has changed? Jack, that’s what.
Self-service is only step one on this trip to data-driven decision-making. Companies need to know their data before they can consider their choices — but this is still very much data at the edges with a meat cloud in the center.
So far, we use computer-aided decision-making and computer-driven processes only where we have to: advanced fraud detection, algorithmic trading, and rigorously regulated processes (such as Obamacare). Generally, we don’t use them elsewhere.
Hundreds of millions of people are sitting in cubicles with a grid on their screen manually typing numbers into a spreadsheet. This manual data labor is the bane of corporate existence. As Peter Gibbons put it, “Human beings were not meant to sit in little cubicles staring at computer screens all day, filling out useless forms.”
We already have the technologies necessary to eliminate this and free humans for the intuitive leaps and creative endeavors they excel at. Yet as a recent New York Times article noted, we mostly use new technology to do the same old thing and do not reap the productivity rewards.
Though we need better tools, the wisdom of the day is that everyone will code, because that’s what the tools require. Truly, that only seems reasonable because Spark still sucks so much (more fairly, it’s a relatively low-level distributed computing framework). It only looks brilliant compared to what we had.
At the same time, Spark isn’t actually a framework for managing and gaining insights from our data. Now, the rabble will start chanting “applications!” Yet having 100 closed-loop applications will quickly lead to more Excelitis.
Instead, it’s time to employ a strategy. As I once said in a discussion about groupware, in a mature business, every email is a little failure, as is every hand-generated report or spreadsheet.
I’ll go further and say every time you have to stare at your phone, it’s a microfailure. In any city, look around and you’ll see hundreds of people missing everything around them while they hold their phones in their hands and stare at a tiny screen. Part of the problem is that we’re still polling and pulling for data. A machine-driven process (designed by people) would instead prompt us: You would know you’re not missing anything and could do your job or, better yet, live your life.
Success isn’t more visualizations. Success is the abolition of the PC and the smartphone as we know them. Success is when we’re alerted to data as needed and spend most of our time making creative and intuitive leaps. Self-service, in other words, still indentures us to data labor. The next huge leap comes when we design real systems and go back to living something that looks a lot more like the future envisioned in the 19th century.
To do so, we must use data and the scientific method to make decisions and, more important, create processes and systems that make decisions rather than making them ourselves. We need to create methodologies for doing this rather than hoping the next tool of the day will free us from thinking about it.
We already have the tools we need to get there. It’s time to start using them correctly.
Source: InfoWorld Big Data