Take a closer look at your Spark implementation

Apache Spark, the extremely popular data analytics execution engine, was initially released in 2012. Support did not really pick up until 2015, yet by November of that year Spark was seeing 50 percent more activity than the core Apache Hadoop project itself, with more than 750 contributors from hundreds of companies participating in its development.

Spark is a hot new commodity for a reason. Its performance, general-purpose applicability, and programming flexibility combine to make it a versatile execution engine. Yet that versatility also leads to varying levels of support for the product and to differences in how solutions are delivered.

When evaluating analytic software products that support Spark, customers should look closely under the hood and examine four key facets of how that support is implemented:

  • How Spark is utilized inside the platform
  • What you get in a packaged product that includes Spark
  • How Spark is exposed to you and your team
  • How you perform analytics with the different Spark libraries

Spark can be used as a developer tool via its APIs, or by BI tools via its SQL interface. It can also be embedded in an application, giving business users access without requiring programming skills and without limiting Spark’s utility to what a SQL interface exposes. I examine each of these options below and explain why not all Spark support is the same.