We have the big data tools — let's learn to use them

Recently, at the Apache Spark Maker Community event in San Francisco, I was on a panel and feeling a bit salty. It seems many people have prematurely declared victory in the data game. A few people have achieved self-service, and even more have claimed to.

In truth, this is a tiny minority, and most of those people have achieved only cargo-cult datacentricity. They use Hadoop and/or Spark, then pull data into Excel, manipulate it, and paste it into PowerPoint. Maybe they’ve added Tableau and can make prettier charts, but what has really changed? Jack, that’s what.

Self-service is only step one on this trip to data-driven decision-making. Companies need to know their data before they can consider their choices — but this is still very much data at the edges with a meat cloud in the center.

So far, we use computer-aided decision-making and computer-driven processes only where we have to: advanced fraud detection, algorithmic trading, and rigorously regulated processes (such as Obamacare). Generally, we don’t use them elsewhere.