ODPi Publishes First Runtime Specification and Test Suite To Simplify and Expedite Development of Data-Driven Applications

ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, announced the first release of the ODPi Runtime Specification and test suite to ensure applications will work across multiple Apache Hadoop® distributions.

Designed to make it easier to create big data solutions and data-driven applications, the ODPi Runtime Specification is the first release from the industry-backed organization. While the Hadoop ecosystem is rapidly innovating, a certain degree of diversity and complexity is actually impeding adoption. ODPi was founded last year, and its more than 25 members are focused on simplification and standardization within the big data ecosystem and on further advancing the work of the Apache Software Foundation.

Derived from Apache Hadoop 2.7, the Runtime Specification covers the HDFS, YARN, and MapReduce components and is part of the common reference platform, ODPi Core.

“The turbulent big data market needs more confidence, more maturity, and less friction for both technology vendors and consumers alike,” said Nik Rouda, senior big data analyst at Enterprise Strategy Group (ESG). “ESG research found that 85% of those responsible for current Hadoop deployments believed that ODPi would add value.”

Key ODPi Runtime Specification Technical Features

The ODPi test framework and self-certification also align closely with the Apache Software Foundation by leveraging Apache Bigtop for comprehensive packaging, testing, and configuration. Additionally, more than half the code in the latest Bigtop release originated in ODPi.

All ODPi Runtime-Compliance tests are linked directly to lines in the ODPi Runtime Specification. To assist with compliance, in addition to the test suite, ODPi also provides a reference build.

The published specification also includes rules and guidelines on how to incorporate additional, non-breaking features, which are allowed provided source code is made available through relevant Apache community processes.

What’s Next for ODPi

The ODPi Operations Specification, which will help enterprises improve installation and management of Hadoop and Hadoop-based applications, will be available later this year. The Operations Specification covers Apache Ambari, the ASF project for provisioning, managing, and monitoring Apache Hadoop clusters.

“ODPi complements the work done in the Apache projects by filling a gap in the big data community in bringing together all members of the Hadoop ecosystem,” said John Mertic, senior manager of ODPi. “Our members – Hadoop distros, app vendors, solution providers, and end-users – are fully committed to leveraging Apache projects and utilizing feedback from real-world use cases to provide industry guidance on how Hadoop should be deployed, configured, and managed. We will continue to expand and contribute to innovation happening inside the Hadoop ecosystem.”

Comments from Members

Ampool

“With its broader, flexible approach to standardizing the Hadoop stack, ODPi is particularly attractive to smaller companies, such as Ampool. Instead of spending testing/qualification cycles across different distributions and respective versions, the reference implementation would really help reduce both the effort and risk of Hadoop integration for us.” – Milind Bhandarkar, Ph.D., founder and CEO, Ampool

DataTorrent

“ODPi will simplify developing and testing applications that work across distros and hence lower the cost of building Hadoop-based big data applications. For example, DataTorrent will be able to certify RTS installation and runtime for ODPi and know it will work with multiple platform providers.” – Thomas Weise, Apache Apex (incubating) PPMC member and architect/co-founder, DataTorrent

Hortonworks

“At Hortonworks, we aim to speed Hadoop adoption through ecosystem interoperability rooted in open source so enterprise customers can reap the benefits of increased choice with more modern data applications and solutions. As a founding member, we are pleased to see ODPi’s first release become available to the ecosystem and look forward to our continued involvement to accelerate the adoption of modern data applications.” – Alan Gates, co-founder, Hortonworks

IBM

“Big Data is the key to enterprises welcoming the cognitive era and there’s a need across the board for advancements in the Hadoop ecosystem to ensure companies can get the most out of their deployments in the most efficient ways possible. With the ODPi Runtime Specification, developers can write their application once and run it across a variety of distributions – ensuring more efficient applications that can generate the insights necessary for business change.” – Rob Thomas, vice president of product development, IBM Analytics

Linaro

“Linaro recognizes the importance of ODPi’s work to promote and advance the state of Apache Hadoop and Big Data technologies for the enterprise while minimizing fragmentation and redundant effort. Linaro’s own focus is similar to this in developing open source software for the ARM ecosystem and it makes perfect sense that where these two areas intersect that Linaro and ODPi should work together to ensure ARM is fully supported and that fragmentation is minimized across the industry.” – Martin Stadtler, director of the Linaro Enterprise Group (LEG)

Pivotal

“It was a little over a year ago that ODPi was formed, and we have already proved beneficial to upstream ASF projects (Hadoop, Bigtop, Ambari). There’s a need for a stable enterprise-grade platform that is managed as an industry asset to benefit all of the companies driving value from Hadoop and big data. This is why the first release of the ODPi Runtime Specification and test suite is so exciting. It is a big step toward realizing our goal of accelerating the delivery of business outcomes through big data solutions by driving interoperability on an enterprise-ready core platform.” – Roman Shaposhnik, director of Open Source at Pivotal, Apache Hadoop and Bigtop committer and ASF member

SAS

“As a founding member, SAS’s support of the Open Data Platform Initiative demonstrates our ongoing commitment to developing innovative applications and solutions for our customers that are compatible with the Hadoop ecosystem. ODPi enables us to remain committed to ensuring our applications work with and exploit the Hadoop distribution of our customers’ choice, while being able to bank on the stability and quality expected in demanding business environments.” – Craig Rubendall, vice president of platform R&D, SAS

Source: insideBigData

Paxata Continues to Redefine the Traditional Data-to-Information Pipeline with New Spring ‘16 Release

Paxata, provider of the Adaptive Data Preparation™ platform for the enterprise, announced the availability of its Spring ’16 product release. Paxata’s latest release bridges the gap between analysts and IT with new intuitive capabilities, providing connected information to every person in the enterprise without compromising on security, scale, and cost efficiency. Spring ’16 also enables analysts to collaboratively explore and prepare all of their data no matter the source or format.

“Our investigations involve a great deal of unknowns in the data, and our customers turn to us to make sense of it,” said Conrad Mulcahy, Associate Managing Director and Director of Data Analytics, K2 Intelligence. “Paxata’s Spring ’16 release allows K2 to do a highly sophisticated MRI on the data. Paxata already showed us hard tissue versus soft tissue, but now we can distinguish between different kinds of soft tissue. Granular observations can be made on the data at an early stage with all of the new capabilities: sophisticated sampling options, cluster and edit, column search and support for nested files. Paxata keeps us from going in the wrong direction early on, keeps us focused, and gets the dialogue with the client headed in the right direction. It’s hard to put a price on how valuable that is for us as investigators, having our clients know that we’re not wasting their valuable time or resources.”

Paxata’s new release serves as another milestone in the company’s mission of delivering connected information to every person in the enterprise, without compromising on security, scale, and cost efficiency. Key features of Paxata’s Spring ’16 release include:

  • Advanced filtergrams for comprehensive data profiling with semantic-awareness of timestamp and numeric data, automatically suggested intelligent visualizations and custom bucketing
  • Smart integration of complex nested JSON/XML data and Hadoop compressed files – unfolded, flattened and ready for multi-structured data analysis to address IoT and other high-value use cases
  • Granular searching across all columns of wide datasets and in every cell value for patterns, outliers and duplicate values
  • New options for iterative and flexible data discovery with smart statistical selections of datasets at any scale
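To make the "unfolded, flattened" idea in the list above concrete, here is a minimal Python sketch of flattening a nested JSON record into a single row of columns. It is only an illustration of the general technique, not Paxata's implementation; the `flatten` helper, the dotted-path column naming, and the sample sensor record are all assumptions made for this example.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by path strings.

    Hypothetical helper for illustration: dict keys are joined with dots
    and list positions are written as [index].
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalar leaf: store it under the accumulated path.
        flat[prefix] = value
    return flat


# Made-up IoT-style record, used only to exercise the helper.
raw = '{"device": {"id": "sensor-7", "readings": [{"t": 1, "temp": 21.5}, {"t": 2, "temp": 22.0}]}}'
flat_record = flatten(json.loads(raw))

for column, cell in flat_record.items():
    print(column, "=", cell)
```

Each leaf value becomes one column, so a nested document can be profiled or filtered like an ordinary wide table.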

“Cloudera is committed to advancing Hadoop as a mainstream platform that improves customer experiences and drives new revenue streams through highly scalable, more intelligent storage and processing capabilities,” said Tim Stevens, vice president, Business and Corporate Development at Cloudera. “Paxata continues to deliver on the promise of the Hadoop ecosystem with numerous joint customers who have amplified the benefits of their Cloudera platform by making it accessible through Paxata’s connected information platform for self-service data quality, integration, governance and collaboration.”

In addition to providing quick access to data, the new release provides IT-specific controls to support governance, security and scale, including:

  • Visual column-lineage for detailed and understandable traceability
  • REST API for SAML for complete integration into the IT environment
  • Ability to use analyst projects as repeatable “recipes” to build into ETL, virtualized views or data quality dashboards

“Since we began the self-service data preparation revolution, we have set the pace for delivering major advancements against our roadmap. With every quarterly release, we ask two questions, the first being ‘how do we make the life of the analyst easier so they can go from raw data to the right information regardless of analytic use case?’ The second is ‘how do we lead the industry in moving from legacy scale-up, on-premise, relational worlds to distributed, elastic cloud, scale-out architectures?’” said Prakash Nanduri, Co-Founder and CEO of Paxata. “Every major Fortune 1000 corporation is moving to this new world and Paxata is leading the way. The Spring ’16 release is another major advancement in this transformation. I am proud of the hard work of our team, customers and partners.”

Additional details about the Paxata Spring ’16 release can be found HERE.

Source: insideBigData

Manage Supercharges Its Demand-Side Platform for Mobile Advertising With Aerospike NoSQL Database

Aerospike, the high-performance NoSQL database company recognized for “speed at scale” leadership and as the NoSQL leader in the digital media and ad tech industries, announced that Manage.com Group Inc. (Manage) has selected Aerospike to power its innovative demand-side platform for mobile advertising. Manage chose Aerospike to support its goals for technology evolution and business growth.

Industry analysts expect mobile ad spend to exceed $65 billion by 2019, with nearly 50 percent of that revenue generated by display ads. This growth will be fueled by the ability of companies like Manage to deliver fast, data-driven solutions that bring mobile advertisers and publishers closer to mobile consumers.

“We needed a no-compromise database with speed, scalability, reliability and economy,” said Kai Sung, CTO at Manage. “Aerospike delivers on all counts with a fast key-value store and database infrastructure that allows us to evolve and grow our platform, without the need for a caching system.”

Manage provides fully managed programmatic mobile advertising and real-time bidding (RTB) capabilities in a demand-side platform (DSP) that relies on a distributed database infrastructure with clusters in the U.S., Europe and the Asia Pacific region. Manage requires high availability, automatic failover, efficient scalability and cross data center replication (XDR) to meet the growing needs of its customers. When the company’s previous database solution could no longer keep pace with business demands for scalability and high availability, its development team evaluated several alternatives and adopted Aerospike based on its performance and price advantages. Manage processes more than 40 billion programmatic bid requests every day and handles up to 500,000 queries per second at peak hours. Manage must meet a business-driven Service Level Agreement (SLA) of 100 milliseconds for each bid request. Every bid request is made up of many sub-steps, each of which has its own SLA, and the database step has an SLA of 10 milliseconds. Aerospike routinely provides sub-2 millisecond latency, enabling Manage to meet its business SLA and provide enhanced RTB capabilities for its customers.
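The figures in the paragraph above can be put in proportion with a little arithmetic. The short Python sketch below uses only the numbers quoted in the article and treats "more than 40 billion" as exactly 40 billion and "sub-2 millisecond" as an upper bound of 2 milliseconds; both are simplifying assumptions, and the variable names are chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the article (lower or upper bounds, see lead-in).
bid_requests_per_day = 40_000_000_000
bid_sla_ms = 100     # end-to-end budget per bid request
db_sla_ms = 10       # budget for the database step
db_latency_ms = 2    # "sub-2 millisecond", taken as an upper bound

# Average bid-request rate implied by the daily volume.
avg_requests_per_second = bid_requests_per_day / SECONDS_PER_DAY

# Share of the end-to-end budget reserved for the database step.
db_budget_share = db_sla_ms / bid_sla_ms

# Slack between observed database latency and the database SLA.
db_headroom_ms = db_sla_ms - db_latency_ms

print(f"average bid requests per second: {avg_requests_per_second:,.0f}")
print(f"database share of bid budget: {db_budget_share:.0%}")
print(f"database headroom: at least {db_headroom_ms} ms")
```

Under these assumptions the daily volume averages out to roughly 463,000 bid requests per second, the database step is allotted a tenth of the 100-millisecond budget, and the quoted latency leaves at least 8 milliseconds of slack inside that allotment.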

“Manage is serious about both the performance of its mobile DSP and operational efficiencies behind the scenes,” said Brian Bulkowski, CTO and co-founder at Aerospike. “Their developers are leveraging Aerospike capabilities to meet SLAs and better utilize stored data while reducing manual tasks and storage costs. We’re proud to be an enabling technology partner in the company’s success.”

Through its partnership with Aerospike, Manage has increased its storage capacity by 10x to store more user profile and customer segment data, which it uses to enhance campaign optimization. The faster processing speed of Aerospike’s SSD-based NoSQL database allows Manage to obtain a more accurate real-time calculation of a user’s value for each impression and advertiser, which enables the company to bid more efficiently and price bids more appropriately — ultimately increasing its win-rate for customers. With Aerospike, Manage is able to:

  • Consistently deliver a sub-2 millisecond database SLA, which in turn enables it to meet its business SLA
  • Store 1 billion rich user profiles
  • Process 400,000 writes and 300,000 reads per second
  • Boost storage capacity by 10x with cost savings
  • Automate cluster management, failover and replication
  • Ensure zero downtime

“Aerospike is the backbone for storing our user data in the smartest, most efficient and reliable way,” said Sung. “By enabling our expansion to one billion robust user profiles and leveraging the increased richness of these stored profiles, Aerospike gives us a competitive advantage. Now we can bid with more precision and efficiency, price bids more appropriately, and win more inventory.”

Source: insideBigData