Twitter open-sources Heron for real-time stream analytics
Heron, the real-time stream-processing system Twitter devised as a replacement for Apache Storm, is finally being open-sourced after powering Twitter for more than two years.
Twitter explained in a blog post that it created Heron because it needed more than speed and scale from its real-time stream processing framework. The company also needed easier debugging, easier deployment and management capabilities, and the ability to work well in a shared, multitenant cluster environment.
Apache Storm was the original solution to Twitter’s problems. It was created by a marketing intelligence company called BackType; Twitter bought the company in 2011 and eventually open-sourced Storm, donating it to the Apache Software Foundation.
There’s no question Storm has a lot of advantages. It’s scalable and fault-tolerant, with a decent ecosystem of “spouts,” or systems for receiving data from established sources. But it was also reputedly hard to work with and hard to get good results from, and despite a recent 1.0 renovation, it’s been challenged by other projects, including Apache Spark and Spark’s own revised streaming framework.
Rather than reuse an existing software project, Twitter elected to start from scratch with a container- and cluster-based design, outlined in a paper released last year. The user creates Heron jobs, or “topologies,” and submits them to a scheduling system, which launches the topology in a series of containers.
The scheduler can be any of several popular options, such as Apache Mesos or Apache Aurora. Storm, by contrast, has to be manually provisioned on clusters to add scale.
One smart decision by Twitter early on was to make Heron backward-compatible with Storm’s API. This was a practical choice on Twitter’s part, since it meant that existing Storm spouts and bolts could be reused in Heron. But it also means anyone else with an existing investment in Storm can switch to Heron with less effort than it would take to adopt another project.
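For readers who haven’t worked with the spout-and-bolt model that Heron inherits from Storm, here is a toy sketch in plain Python. It does not use the Storm or Heron API, and every name in it (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) is hypothetical. It only illustrates the idea of a topology: a source of tuples wired through a chain of processing steps. It assumes that “bolts” are the processing steps that consume what spouts emit, and it runs everything in a single process rather than in scheduled containers.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List


def sentence_spout() -> Iterator[str]:
    """Hypothetical spout: a source that emits a stream of tuples (here, sentences)."""
    yield from [
        "heron replaces storm",
        "storm spouts feed bolts",
        "heron runs bolts",
    ]


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Hypothetical bolt: consumes sentences and emits one word per tuple."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream: Iterable[str]) -> Counter:
    """Hypothetical terminal bolt: keeps a count for each word it receives."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
    return counts


def run_topology(spout: Callable[[], Iterable], bolts: List[Callable]):
    """Wire the spout through each bolt in order and run the chain in-process."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return stream


# The "topology" here is just a spout followed by an ordered list of bolts.
word_counts = run_topology(sentence_spout, [split_bolt, count_bolt])
print(word_counts.most_common(3))
```

In a real deployment the spouts and bolts would run in parallel across many containers and the scheduler would decide where they are placed; this sketch deliberately leaves all of that out.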
That should give existing Storm users some incentive to check out Heron. Twitter claims it has gained a two- to five-fold improvement in “efficiency” (basically, lower opex and capex) with Heron, and now others looking for a way to speed up their stream processing can find out for themselves.
Source: InfoWorld Big Data