A Brief History of Kafka, LinkedIn’s Messaging Platform

Apache Kafka is a highly scalable messaging system that plays a critical role as LinkedIn’s central data pipeline. But it was not always this way. Over the years, we have had to make hard architectural choices to arrive at the point where developing Kafka was the right decision for LinkedIn. We also had to solve some basic issues to turn this project into something that can support the more than 1.4 trillion messages per day that pass through the Kafka infrastructure at LinkedIn. What follows is a brief history of Kafka development at LinkedIn and an explanation of how we’ve integrated Kafka into virtually everything we do. Hopefully, this will help others who are making similar technology decisions as their companies grow and scale.

Why did we develop Kafka?

Over six years ago, our engineering team needed to completely redesign LinkedIn’s infrastructure. To accommodate our growing membership and increasing site complexity, we had already migrated from a monolithic application infrastructure to one based on microservices. This change allowed our search, profile, communications, and other platforms to scale more efficiently. It also led to the creation of a second set of mid-tier services to provide API access to data models and back-end services to provide consistent access to our databases.

We initially developed several different custom data pipelines for our various streaming and queuing data. The use cases for these platforms ranged from tracking site events like page views to gathering aggregated logs from other services. Other pipelines provided queuing functionality for our InMail messaging system, etc. These needed to scale along with the site. Rather than maintaining and scaling each pipeline individually, we invested in the development of a single, distributed pub-sub platform. Thus, Kafka was born.

Kafka was built with a few key design principles in mind: a simple API for both producers and consumers, a design optimized for high throughput, and a scaled-out architecture from the beginning.
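
Kafka’s own clients are written in Java, but the shape of that simple API is easy to see from any client library. Below is a minimal, illustrative sketch using the third-party kafka-python package; the broker address and topic name are placeholders, not LinkedIn’s actual configuration.

```python
# Minimal Kafka producer/consumer sketch using the kafka-python package.
# Broker address and topic name are placeholders for illustration only.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["localhost:9092"]   # assumed local broker
TOPIC = "page-view-events"     # hypothetical tracking topic

# Produce a JSON-encoded event.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"member_id": 42, "page": "/feed"})
producer.flush()

# Consume events from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```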

What is Kafka today at LinkedIn?

Kafka became a universal pipeline, built around the concept of a commit log with speed and scalability in mind. Our early Kafka use cases encompassed both the online and offline worlds, feeding both systems that consume events in real time and those that perform batch analysis. Common uses included traditional messaging (publishing data from our content feeds and relevance systems to our online serving stores), providing metrics for system health (used in dashboards and alerts), and helping us better understand how members use our products (user activity tracking and feeding data to our Hadoop grid for analysis and report generation). In 2011 we open sourced Kafka via the Apache Software Foundation, providing the world with a powerful open source solution for managing streams of information.

Today we run several clusters of Kafka brokers for different purposes in each data center. We generally run off the open source Apache Kafka trunk and put out a new internal release a few times a year. However, as our Kafka usage continued to rapidly grow, we had to solve some significant problems to make all of this happen at scale. In the years since we released Kafka as open source, the Engineering team at LinkedIn has developed an entire ecosystem around Kafka.

As pointed out in this blog post by Todd Palino, a key problem for an operation as big as LinkedIn’s is the need for message consistency across multiple datacenters. Many applications, such as those maintaining the indices that enable search, need a view of what is going on in all of our datacenters around the world. At LinkedIn, we use Kafka MirrorMaker to make copies of our clusters. Multiple mirroring pipelines run both within and across data centers and are laid out to keep network costs and latency to a minimum.

The Kafka ecosystem

A key innovation that has allowed Kafka to maintain a mostly self-service model has been our integration with Nuage, the self-service portal for online data-infrastructure resources at LinkedIn. This service offers a convenient place for users to manage their topics and associated metadata, abstracting some of the nuances of Kafka’s administrative utilities and making the process easier for topic owners.

Another open source project, Burrow, is our answer to the tricky problem of monitoring Kafka consumer health. It provides a comprehensive view of consumer status and offers consumer lag checking as a service, without the need to specify thresholds. It monitors committed offsets for all consumers at topic-partition granularity and calculates the status of those consumers on demand.
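
Burrow exposes this consumer status over an HTTP API, so a health check can be as simple as polling an endpoint. The sketch below uses Python’s requests library; the host, cluster, and group names are placeholders, and the endpoint path and response fields are assumed to follow Burrow’s v3 API, so check them against the version you deploy.

```python
# Poll Burrow's HTTP API for a consumer group's status.
# Host, port, cluster and group names are placeholders; the endpoint path and
# response fields are assumed from Burrow's v3 API layout.
import requests

BURROW = "http://burrow.example.com:8000"   # assumed Burrow address
CLUSTER = "tracking"                        # hypothetical cluster name
GROUP = "page-view-loader"                  # hypothetical consumer group

resp = requests.get(f"{BURROW}/v3/kafka/{CLUSTER}/consumer/{GROUP}/status")
resp.raise_for_status()
status = resp.json()["status"]

print("group status:", status["status"])    # e.g. OK, WARN, ERR
print("total lag:", status["totallag"])
for p in status.get("partitions", []):
    print(p["topic"], p["partition"], p["status"])
```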

Scaling Kafka in a time of rapid growth

The scale of Kafka at LinkedIn continues to grow in terms of data transferred, number of clusters, and the number of applications it powers. As a result, we face unique challenges in terms of reliability, availability and cost for our heavily multi-tenant clusters. In this blog post, Kartik Paramasivam explains the various improvements we have made to Kafka and its ecosystem at LinkedIn to address these issues.

Samza is LinkedIn’s stream processing platform, which empowers users to get their stream processing jobs up and running in production as quickly as possible. Unlike other stream processing systems that focus on a very broad feature set, we concentrated on making Samza reliable, performant and operable at the scale of LinkedIn. Now that we have a lot of production workloads up and running, we can turn our attention to broadening the feature set. You can read about our use cases for relevance, analytics, site monitoring, security and more here.

Kafka’s strong durability, low latency, and recently improved security have enabled us to use Kafka to power a number of newer mission-critical use cases. These include replacing MySQL replication with Kafka-based replication in Espresso, our distributed document store. We also plan to support the next generation of Databus, our source-agnostic distributed change data capture system, using Kafka. We are continuing to invest in Kafka to ensure that our messaging backbone stays healthy as we ask more and more from it.

The Kafka Summit was recently held in San Francisco on April 26.

Contributed by: Joel Koshy, a member of the Kafka team within the Data Infrastructure group at LinkedIn, who has worked on distributed systems infrastructure and applications for the past eight years. He is also a PMC member and committer for the Apache Kafka project. Prior to LinkedIn, he was with the Yahoo! search team, where he worked on web crawlers. Joel received his PhD in Computer Science from UC Davis and his bachelor’s in Computer Science from IIT Madras.

Source: insideBigData

Tech Complexity Giving IT Professionals Headaches

The management challenges IT teams are most worried about include mobile devices and wireless networks, cloud apps and virtualization, according to an Ipswitch survey.

Two-thirds of IT professionals believe increasingly complex technology is making it more difficult for them to do their jobs successfully, according to a global Ipswitch survey. The goal of the research, in which more than 1,300 respondents were surveyed, was to gain insight into the current IT management challenges facing today’s IT teams, specifically regarding what they need to monitor, how they accomplish it and where they believe improvements could be made.

“What we found most surprising was that 88 percent of respondents reported that they want IT management software that offers more flexibility with fewer licensing restrictions,” Jeff Loeb, chief marketing officer for Ipswitch, told eWEEK. “It is surprising because with such a high level of dissatisfaction, we think that vendors would have offered alternative solutions earlier. Also, while vendors have focused on single-pane-of-glass solutions that allow you to visualize complex problems, the underlying software license model has not evolved.”

The IT management challenges teams report being most worried about include mobile devices and wireless networks (55 percent), cloud applications (50 percent), virtualization (49 percent), bring your own device (BYOD) (43 percent) and high-bandwidth applications (41 percent), such as video or streaming.

“Business needs are constantly changing, so monitoring needs to be flexible to adapt to these changing business priorities,” Loeb said. “IT teams often have one-off challenges, like troubleshooting a unique problem, so having the flexibility to adapt to these one-offs without having to buy new tools is crucial.”

He noted monitoring flexibility is important so businesses can see where software licenses can be fully utilized without becoming shelfware, and where unused capacity can be split across technology silos, avoiding waste.

IT teams reported they were not monitoring everything that they would like to in order to ensure control. Top reasons for this include budget (28 percent), lack of staff (18 percent) and the complexity of the IT environment they have to deal with (15 percent). Finally, 54 percent said IT management software licensing models are too expensive, inflexible and complicated to deal with.

Overall, the research found IT teams are concerned about losing control of their company’s IT environment as new technologies, devices and requirements are added on a regular basis. “BYOD devices consume bandwidth on networks, which can tremendously slow down performance of business applications and introduce security vulnerabilities,” Loeb noted. “It’s much harder for IT teams to enforce security policies for devices they do not control.”
Source: eWeek

Consumer Software Deals Power Tech M&A Market

The industry’s largest transaction to date this year is Cisco Systems’ acquisition of Jasper Technologies for $1.4 billion, according to Berkery Noyes.

The software industry’s merger and acquisition (M&A) deal volume increased 7 percent, with a total of 523 transactions, over the past three months; however, overall value decreased 81 percent to $21.6 billion from $111.5 billion, according to independent mid-market investment bank Berkery Noyes’ “Q1 2016 Software Industry M&A Report.” Of note, the industry’s largest transaction to date this year is Cisco Systems’ acquisition of Jasper Technologies for $1.4 billion.

Aggregate value declined 9 percent on a year-over-year basis, and in the past five quarters, deal volume reached its peak in Q3 2015, while deal value reached its peak in Q4 2015.

“In general there have been fewer megadeals, but middle-market transaction volume should continue at a steady pace as acquirers look for innovative technologies to help expand their product offerings,” Mary Jo Zandy, managing director at Berkery Noyes, told eWEEK.

Most notable in Q4 was Dell’s announced acquisition of EMC Corp. for $67.5 billion. If these four deals are excluded, deal value would have only decreased by 15 percent.

“Much of the activity in the consumer software sector was driven by mobile application deals,” Zandy said. “High-profile, mobile-based transactions in Q1 included Microsoft’s announced acquisition of Swiftkey, which provides predictive keyboard technology for Android and iOS devices, with a reported purchase price of approximately $250 million; GoPro’s announced acquisition of video editing apps Replay and Splice for $105 million; and Spotify’s acquisitions of Soundwave and Cord Project, as the digital music service looks to bolster its social and messaging capabilities.”

Other notable acquirers were Snapchat, with the announced acquisition of Bitstrips, which allows users to create personalized emojis and cartoon avatars, for a reported $100 million, and Facebook, with its announced acquisition of Masquerade, a face-swapping application.

The infrastructure software segment’s deal volume decreased 21 percent in Q1 2016. One noteworthy deal was Micro Focus’ announced acquisition of Serena Software for $540 million. The consumer software segment’s deal volume increased 22 percent in Q1 2016 for its third consecutive quarterly rise, while the business software segment’s deal volume increased 18 percent.

“We expect this momentum to carry on throughout the rest of the year and into 2017, with a focus on companies that can provide new customers, new technologies or access to new markets,” Zandy said.
Source: eWeek

Microsoft is making big data really small using DNA

Microsoft has partnered with a San Francisco-based company to encode information on synthetic DNA to test its potential as a new medium for data storage. 

Twist Bioscience will provide Microsoft with 10 million DNA strands for the purpose of encoding digital data. In other words, Microsoft is trying to figure out how the same molecules that make up humans’ genetic code can be used to encode digital information. 

While a commercial product is still years away, initial tests have shown that it’s possible to encode and recover 100 percent of digital data from synthetic DNA, said Doug Carmean, a Microsoft partner architect, in a statement.

Using DNA could allow massive amounts of data to be stored in a tiny physical footprint. Twist claims a gram of DNA could store almost a trillion gigabytes of data.
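
The intuition behind that density is that each of DNA’s four bases can, in principle, carry two bits. The toy sketch below illustrates only that basic idea; it is not Microsoft’s or Twist’s actual encoding scheme, which also has to handle error correction and the constraints of DNA synthesis and sequencing.

```python
# Toy illustration of packing bits into DNA bases (2 bits per base).
# NOT the actual Microsoft/Twist encoding scheme, just the basic idea.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Map every pair of bits to one of the four bases."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Reverse the mapping, four bases back to one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"hello")
assert decode(strand) == b"hello"
print(strand)   # CGGACGCCCGTACGTACGTT
```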

Facebook Rides Mobile Ads to 52 Percent Revenue Surge

Q1 net income was $1.51 billion, nearly triple the $512 million the company earned in Q1 last year.

Facebook continues its winning ways, reporting a whopping 52 percent surge in revenue in its Q1 2016 earnings report to the U.S. Securities and Exchange Commission April 27. The bottom-line numbers for the social network spoke for themselves: revenue of $5.38 billion, up 52 percent over $3.54 billion a year ago, and net income of $1.51 billion, nearly tripling the $512 million it earned last year. Shares of Facebook stock, which have risen 33 percent during the past year, spiked up 9.5 percent to $119.28 in after-hours trading following the earnings release.

The results were in stark contrast to those of fellow Silicon Valley superstar companies such as Apple, which reported its first quarterly drop in revenue in 13 years; Yahoo, which lost $99 million last quarter; Twitter, which missed first-quarter revenue expectations; and Google parent Alphabet Inc., which also missed analysts’ projections.

Facebook, which now has 1.65 billion monthly users, continues to ride the strength of its mobile ad sales—which it started selling in earnest in 2012—and the rising popularity of its video ads to the new profitability. The rapidly expanding development of its Messenger platform to connect users with businesses also is gaining traction and is expected to start contributing to the bottom line soon.

Video ads are selling as advertisers channel funds from print and television budgets. Video ads on Facebook cost about $4 per 1,000 views during the first quarter, up from $3.44 in 2015 and higher than the $3.14 average across Facebook, according to marketing technology company Kenshoo.

The company also announced it is proposing to create a new class of nonvoting capital stock, known as the Class C capital stock. The proposal is designed to create a capital structure that will, among other things, maintain 31-year-old CEO and co-founder Mark Zuckerberg’s leadership role at the company for years to come, according to the company.

If the Class C proposal is OK’d by shareholders, the company said it would issue two shares of Class C capital stock as a one-time dividend for each share of Class A and Class B stock.

Facebook’s success isn’t just attributable to the social network. In fact, analysts were extremely impressed with the company’s other platforms. In particular, they were pleased to see Facebook is starting to make money from its 410 million Instagram users, and argued it could help the company generate an additional $4 billion to $5 billion in the next two years. WhatsApp and Facebook Messenger also are growing rapidly, which analysts say will only add to the revenue the company generates.

Source: eWeek

Qubole and Looker Join Forces to Empower Business Users to Make Data-Driven Decisions

Qubole, the big data-as-a-service company, and Looker, the company that is powering data-driven businesses, today announced that they are integrating Looker’s business analytics with Qubole’s cloud-based big data platform, giving line of business users across organizations access to powerful, yet easy-to-use big data analytics.

Business units face an uphill battle when it comes to gleaning information from vast and disparate sources. Line of business users find it challenging to extract, shape and present the variety and volume of data to executives to help make informed business decisions. As a result, data scientists are overwhelmed with requests to access data or provide fixed reports to line of business users, diverting their attention from gathering data insights through statistics and modeling techniques. Furthermore, line of business users become frustrated when they are forced to decipher the output of SQL aggregations created by data scientists.

Qubole and Looker are addressing this issue by integrating the Qubole Data Service (QDS) and Looker’s analytics data platform. The combination gives line of business users instant access to automated, scalable, self-service data analytics without having to rely on or overburden the data science team — and without having to build and maintain on-premises infrastructure.

“Data has become essential for every business function across the enterprise, but most big data offerings are still too complicated for line of business users to use, substantially reducing the business impact data can have,” said Ashish Thusoo, co-founder and CEO of Qubole. “Qubole and Looker have similar philosophies that it is essential for businesses to make insights accessible to as many people in an organization as possible to stay competitive. The integration of our offerings serves that very purpose.”

QDS is a self-service platform for big data analytics that runs on the three major public clouds: Amazon AWS, Google Compute Engine and Microsoft Azure. QDS automatically provisions, manages and scales up clusters to match the needs of a particular job, and then winds down nodes when they’re no longer needed. QDS is a fully managed big data offering that leverages the latest open source technologies, such as Apache Hadoop, Hive, Presto, Pig, Oozie, Sqoop and Spark, to provide the only comprehensive, “everything-as-a-service” data analytics platform, complete with enterprise security features, an easy-to-use UI and built-in data governance.

“Our customers are using Looker every day to operationalize their data and make better business decisions,” said Keenan Rice, vice president of alliances, Looker. “Now with our support for Qubole’s automated, scalable, big data platform, businesses have greater access to their cloud-based data. At the same time, Qubole’s rapidly growing list of customers utilize our data platform to find, explore and understand the data that runs their business.”

Source: insideBigData

Redis Collaborates with Samsung Electronics to Achieve Groundbreaking Database Performance

Redis today announced the general availability of Redis on Flash on standard x86 servers, including standard SATA-based SSD instances available on public clouds and more advanced NVMe-based SSDs like the Samsung PM1725. Running Redis, the world’s most popular in-memory data structure store, on cost-effective persistent memory options enables customers to process and analyze large datasets at near real-time speeds at 70% lower cost.

The Redis on Flash offering has been optimized to run Redis with flash memory used as a RAM extender. Operational processing and analysis of very large datasets in-memory is often limited by the cost of dynamic random access memory (DRAM). By running a combination of Redis on Flash and DRAM, datacenter managers benefit from leveraging the high throughput and low latency characteristics of Redis while achieving substantial cost savings.

Next-generation persistent memory technology like Samsung’s NVMe SSD delivers orders of magnitude higher performance at only an incremental added cost compared to standard flash memory. Redis collaborated with Samsung to demonstrate 2 million ops/second with sub-millisecond latency and over 1GB/s of disk bandwidth on a single standard Dell Xeon server, placing 80 percent of the dataset on the NVMe SSD technology and only 20 percent of it in DRAM.
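
Redis on Flash is transparent to clients, so applications read and write exactly as they would against an all-RAM Redis. For a rough sense of single-client throughput on your own hardware, a sketch like the one below can be used (it relies on the redis-py client and pipelining; the host, port, and key names are placeholders), though serious benchmarking is better left to purpose-built tools such as memtier_benchmark.

```python
# Rough single-client Redis throughput sketch using redis-py pipelines.
# Host, port and key names are placeholders; this is an illustration,
# not a rigorous benchmark.
import time
import redis

r = redis.Redis(host="localhost", port=6379)   # assumed local instance
N = 100_000
BATCH = 1_000

start = time.perf_counter()
for batch_start in range(0, N, BATCH):
    pipe = r.pipeline(transaction=False)
    for i in range(batch_start, batch_start + BATCH):
        pipe.set(f"key:{i}", "x" * 100)        # 100-byte values
    pipe.execute()
elapsed = time.perf_counter() - start

print(f"{N / elapsed:,.0f} SET ops/second from a single client")
```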

“We are happy to contribute to a new solution for our customers, one that shows a 40X improvement in throughput at sub-millisecond latencies compared to standard SATA-based SSDs,” stated Mike Williams, vice president, product planning, Samsung Device Solutions Americas. “This solution – using our next generation NVMe SSD technology and Redis in-memory processing – can play a key role in the advancement of high performance computing technology for the analysis of extremely large data sets.”

Spot.IM, a next-generation on-demand social network that powers social conversations on leading entertainment and media websites such as Entertainment Weekly and CG Media, is already reaping the benefits of deploying Redis on Flash. Spot.IM’s cutting-edge architecture seeks minimal latency, so the transition from webpage viewing to interactive dialog appears seamless. With Redis’ automatically scaling, highly responsive database, the service is able to easily handle 400,000 to one million user requests a day, to and from third-party websites, at sub-millisecond latencies. As Spot.IM scaled out its architecture in an AWS Virtual Private Cloud (VPC) environment, the company turned to Redis on Flash, delivered as Redis Enterprise Cluster (RLEC), to help optimize the costs of running an extremely demanding, high-performance, low-latency application without compromising on responsiveness. With RLEC Flash, Spot.IM maintains extremely high throughput (processing several hundred thousand requests per second) at significantly lower costs compared to a pure-RAM solution.

“Redis is our main database and a critical component of our highly demanding application because our architecture needs to handle extremely high speed operations with very little complexity and at minimal cost,” said Ishay Green, CTO, Spot.IM. “Redis technology satisfies all our requirements around high availability, seamless scalability, high performance and now at a very attractive price point with Redis on Flash.”

Redis on Flash is now available as RLEC (Redis Enterprise Cluster) over standard x86 servers, including SSD backed cloud instances and IBM POWER8 platforms. It is also available to Redis Cloud customers running on a dedicated Virtual Private Cloud environment.

Source: insideBigData

BackOffice Associates Releases Data Stewardship Platform 6.5 and dspConduct for Information Stewardship

BackOffice Associates, a leader in information governance and data modernization solutions, today announced Version 6.5 of its flagship Data Stewardship Platform (DSP) and debuted its newest dspConduct application for comprehensive business process governance and application data management across all data in all systems.

“Next-generation information governance is necessary to maximize the value of an enterprise’s data assets, improve the efficiency of business processes and increase the overall value of the organization,” said David Booth, chairman and CEO, BackOffice Associates. “Our continued vision and offerings are designed to help organizations embrace the next wave in data stewardship.”

dspConduct is built on DSP 6.5 – the most powerful data stewardship platform to date.  With this latest release, the DSP continues to drive the consumption and adoption of data stewardship by linking business users and technical experts through the business processes of data.  By introducing new user experience paradigms, executive and management reporting, extended data source connectivity, and improved performance and scale, the 6.5 release continues to expand the platform’s capabilities and reach.

dspConduct helps Global 2000 organizations proactively set and enforce strategic data policies across the enterprise. The solution complements master data management (MDM) strategies by ensuring transactions run as planned in critical business systems such as ERP, CRM, PLM, and others.

“We designed dspConduct to extend beyond the traditional capabilities of master data management—bringing today’s business users a single platform that addresses their complex application data landscape with the tools needed to conduct world-class business process governance and achieve measurable business results,” added Rex Ahlstrom, Chief Strategy Officer, BackOffice Associates.

dspConduct helps business users achieve business process governance across all application data found in their organization’s enterprise architecture. The solution empowers users to plan and analyze specific policies for various types of enterprise data—whether customer, supplier, financial, human resources, manufacturing—and then execute and enforce those policies across the organization’s heterogeneous IT system landscape.  Built on BackOffice Associates’ more than 20 years of real-world experience meeting the most complex and critical data challenges, dspConduct and the DSP bring to the market a proven solution to maximize the business value of data.

Additional enhancements available in DSP 6.5 include:

  • Highest performance platform for data stewardship to date
  • Native Excel interoperability through the DSP for a simpler business-user experience
  • Native SAP HANA® connectivity and support for migrations to SAP® Business Suite 4 SAP HANA (SAP S/4HANA)
  • Generic interface layer for complete enterprise architecture interconnectivity
  • Native SAP Fiori® apps for migration and data quality metrics accessible by all stakeholders

BackOffice Associates was recently named a Strong Performer by Forrester Research in its independent report, “The Forrester Wave™: Data Governance Stewardship Applications, Q1 2016.”

Source: insideBigData

Understanding TCO: How to Avoid the Four Common Pitfalls that May Lead to Skyrocketing Bills After Implementing a BI Solution

In this special guest feature, Ulrik Pedersen, Chief Operations Officer at TARGIT, highlights the constant battle between IT and finance over Total Cost of Ownership (TCO) when it comes to implementing a new BI solution. But with IT budgets increasingly moving from IT departments to specific lines of business that may not be aware of this concept, TCO can quickly become a convoluted quagmire. Ulrik Pedersen joined the TARGIT team as Project Manager in 1999. Since then, he’s taken on the challenge of penetrating the North American market with TARGIT’s Business Intelligence and Analytics solution. Ulrik holds a Master of Science in Economics and Business Administration, B2B Marketing, from Aalborg University.

Just about any savvy IT or business professional today understands the value a business intelligence (BI) solution can bring to an organization. From uncovering new sales opportunities to measuring growth to streamlining processes, BI solutions provide many benefits to the organization. However, those benefits come with a price tag, and the total cost of a BI solution isn’t always in proportion to the value it brings.

Organizations need to think carefully before investing in a BI solution to ensure they are aware of hidden costs. Total Cost of Ownership (TCO) isn’t as simple as just adding up infrastructure plus people. In reality, software only accounts for a fraction of the total cost of a BI project, and there are many other direct and indirect costs that rise steadily up front and over time. Having a full understanding of the time and resources a BI solution will cost your organization beyond the initial price tag is essential. These are the four most common pitfalls IT and business leaders should avoid to drive the most value from a BI solution.

1 – Poor Data Quality

The first step in implementing a BI project is pulling data into the data warehouse from the various other corporate systems such as the CRM, HR, and finance systems. Unfortunately, this is also one of the most time-consuming and costly steps because the data must first be cleansed and brought up to standard.

Cleansing and updating data is a long, arduous process that typically comes with a high price tag from the consultants who have to do it. It doesn’t take long for those consultancy hours to add up to a significant expense.

2 – The Never Ending Project

Otherwise known as “scope creep,” long-stretch projects plague companies who struggle to select the most important data to bring into a BI project. Unfortunately for many of these companies, it’s impossible to truly know which data sets they want until they see the numbers. By then, a consultant or data scientist has already taken the time—and handed over the bill—for incorporating that data.

This results in a seemingly never ending process of starting and stopping the BI project. Worse, it’s not uncommon to see corporate priorities change before any analytics objectives can be obtained, rendering everything already done up until that point useless. The business world is changing so rapidly that a slow BI implementation can mean no BI at all.

3 – License Creep

License creep refers to the uncontrolled growth in software licenses within a company. The ultimate goal of any successful BI implementation is to spread the power of analytics to as many users as possible throughout the company. But with many BI solutions, each additional user comes with a price tag, regardless of their level of BI involvement.

Additionally, rolling out an enterprise-wide BI solution usually necessitates additional servers.

It isn’t fair to say license creep is the result of poor project management. Rather, it is the result of unrealistic planning of license costs for a successfully adopted BI solution. Imagine TCO as a line chart: license creep is where that line takes a dramatic 45-degree turn up from the initial cost. Over time, the final price tag can be double the price that was originally quoted.

4 – The Under-Utilization Obstacle

A powerful BI and analytics solution is worthless if users aren’t armed with the know-how they need to take advantage of the various levels of tools. Companies are often won over by the words “self-service,” only to discover that quite a bit of technical expertise is needed, and that when business decision makers need to dig into further details, they need expensive consultants to help.

As a result, an overall under-utilization of the BI platform means the ambition of transforming into a data-driven company will never be realized, nor will the expected ROI. Opportunities are lost on multiple levels, including the very basic objective of eliminating the different data-truths floating around a company and aligning every decision-maker with the true data they need.

The Bottom Line

Don’t fall victim to these common TCO pitfalls. Enter the buying process informed about what should, and what shouldn’t, lie ahead in a successful business intelligence implementation and strategy. The right partner is incentivized to ensure you enter into a plan that works best for the unique needs of your company and works with you for a fast return on investment and a long-lasting, mutually beneficial relationship.

Source: insideBigData

Streaming Analytics with StreamAnalytix by Impetus

The insideBIGDATA Guide to Streaming Analytics is a useful new resource directed toward enterprise thought leaders who wish to gain strategic insights into this exciting new area of technology. Many enterprises find themselves at a key inflection point in the big data timeline with respect to streaming analytics technology. There is a huge opportunity for direct financial and market growth for enterprises by leveraging streaming analytics. Streaming analytics deployments are being engaged by companies in a broad variety of different use cases. The vendor and technology landscape is complex, and numerous open source options are mushrooming. It’s important to choose a platform that will supply a proven and pre-integrated, performance-tuned stack, ease of use, enterprise-class reliability and the flexibility to protect the enterprise from rapid technology changes. Perhaps the most important reason to evaluate this technology now is that a company’s competitors are very likely implementing enterprise-wide real-time streaming analytics right now and may soon gain significant advantages in customer perception and market share. The complete insideBIGDATA Guide to Streaming Analytics is available for download from the insideBIGDATA White Paper Library.

StreamAnalytix is a state-of-the-art streaming analytics platform based on a best-of-breed open source technology stack. StreamAnalytix is a horizontal product for comprehensive data ingestion across industry verticals. It is built at enterprise-grade scale on open source components including Apache Kafka, Apache Storm and Apache Spark, while also incorporating the popular Hadoop and NoSQL platforms into its structure. The solution provides all the components required for streaming application development, not normally found in one place, brought together under one platform with an extremely friendly UI.

A key benefit of StreamAnalytix is its multi-engine abstracted architecture, which enables alternative streaming engines underneath, supporting Spark Streaming for rapid and easy development of real-time streaming analytics applications in addition to the original support for Apache Storm. Being able to choose among multiple streaming engines removes the risk of being constrained to a single engine. With a multi-engine streaming analytics platform, you can build Storm streaming pipelines and Spark streaming pipelines and interconnect them, using the best engine for each use case based on the optimal architecture. When new engines become widely accepted in the future, they can be rolled into this multi-engine platform.

The following is an overview of the product and its enterprise-grade, multi-engine open source based platform:

Open source technology

StreamAnalytix is built on Apache Storm and Apache Spark (open source distributed real-time computation systems) and is therefore able to leverage the numerous upgrades, improvements and flow of innovation that are foundational to the global Open Source movement.

Spark streaming

StreamAnalytix’s Spark Streaming support includes a rich array of drag-and-drop Spark data transformations, Spark SQL support, and built-in operators for predictive models with an inline model-test feature.
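
StreamAnalytix pipelines are assembled in its visual designer rather than hand-coded, but underneath they drive engines such as Spark Streaming. For orientation, this is what a bare Spark Streaming pipeline looks like in plain PySpark, with a socket source feeding a word count; the host and port are placeholders, and this is not the StreamAnalytix API itself.

```python
# Bare Spark Streaming word count in PySpark, shown only to illustrate the
# kind of engine StreamAnalytix drives underneath; not the StreamAnalytix
# API itself. Source host/port are placeholders.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-sketch")
ssc = StreamingContext(sc, batchDuration=5)        # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)    # assumed test source
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()                                    # print each batch's counts

ssc.start()
ssc.awaitTermination()
```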

Versatility and comprehensiveness

StreamAnalytix is a “horizontal” product for comprehensive high-speed data ingestion across industry verticals. Its IDE offers a palette of applications based on customer requirements. Multiple components can be dragged and dropped into a smart dashboard to create a customized workspace. The visual pipeline designer can be used to create, configure and administer complex real-time data pipelines.

Abstraction layer driving simplicity

The platform’s architecture incorporates an abstraction layer beneath the application definition interface. This innovative setup enables automatic selection of the ideal streaming engine while also allowing concurrent use of several engines.

Compatibility

Built on Apache Storm, Apache Spark, Kafka and Hadoop, the StreamAnalytix platform is seamlessly compatible with all Hadoop distributions and vendors. This enables easy ingestion, processing, analysis, storage and visualization of streaming data from any input data source, proactively  boosting split-second decision making.

“Low latency” capability and flexible scalability

The platform’s ability to ingest high-speed streaming data with very low, sub-second latencies makes it ideal for use cases which warrant split-second response, such as flight-alerts or critical control of risk factors prevalent in complex manufacturing environments. Any fast-ingest data store can be used.

Intricate robust analytics

StreamAnalytix offers a wide collection of built-in data-processing operators. These operators enable high-speed data ingestion and processing in terms of complex correlations, multiple aggregation functions, statistical models and window aggregates. For rapid application development, predictive analytics and machine learning models built in SAS or R can be ported via PMML onto real-time data.
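
The PMML route means a model trained offline in SAS or R can be evaluated record by record inside a streaming pipeline. The sketch below shows that scoring step in plain Python, assuming the third-party pypmml package and a hypothetical model file; within StreamAnalytix this is wired up through built-in operators rather than custom code.

```python
# Sketch of scoring a PMML model over streaming records.
# Assumes the third-party pypmml package and a hypothetical model file;
# field and output names depend entirely on the exported model.
from pypmml import Model

model = Model.fromFile("churn_model.pmml")    # hypothetical exported model

def score(record: dict) -> dict:
    """Attach the model's prediction to one incoming event."""
    prediction = model.predict(record)        # e.g. {"probability": 0.83, ...}
    return {**record, **prediction}

print(score({"sessions_last_7d": 2, "days_since_signup": 140}))
```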

Detailed data visualization

StreamAnalytix provides comprehensive support for 360-degree real-time data visualization. This means the system delivers incoming data streams instantaneously in the form of appropriate charts and dashboards.

If you prefer, the complete insideBIGDATA Guide to Streaming Analytics is available for download as a PDF from the insideBIGDATA White Paper Library, courtesy of Impetus.

Source: insideBigData