IBM promises a one-stop analytics shop with AI-powered big data platform

Big data is in many ways still a wild frontier, requiring wily smarts and road-tested persistence on the part of those hoping to find insight in all the petabytes. On Tuesday, IBM announced a new platform it hopes will make things easier.

Dubbed Project DataWorks, the new cloud-based platform is the first to integrate all types of data and bring AI to the table for analytics, IBM said.

Project DataWorks is available on IBM’s Bluemix cloud platform and aims to foster collaboration among the many types of people who need to work with data. Tapping technologies including Apache Spark, IBM Watson Analytics, and the IBM Data Science Experience, which launched in June, the new offering is designed to give users self-service access to data and models while ensuring governance and rapid-iteration capabilities.

Project DataWorks can ingest data faster than any other data platform, at rates ranging from 50Gbps to hundreds of Gbps, from sources including enterprise databases, the internet of things (IoT), and social media, according to IBM. What the company calls “cognitive” capabilities, such as those found in its Watson artificial intelligence software, can meanwhile help pave a speedier path to new insights, it says.

“Analytics is no longer something in isolation for IT to solve,” said Derek Schoettle, general manager of cloud data services for IBM Analytics, in an interview. “In the world we’re entering, it’s a team sport where data professionals all want to be able to operate on a platform that lets them collaborate securely in a governed manner.”

Users can open any data set in Watson Analytics for answers to questions phrased in natural language, such as “what drives this product line?” Where a data scientist might otherwise have to comb through hundreds of fields manually to find the answer, Watson Analytics lets them do it near instantaneously, IBM said.

More than 3,000 developers are working on the Project DataWorks platform, Schoettle said. Some 500,000 users have been trained on the platform, and more than a million business analysts are using it through Watson Analytics.

Available now, the software can be purchased through a pay-as-you-go plan starting at $75 per month for 20GB. Enterprise pricing is also available.

“Broadly speaking, this brings two things to the table that weren’t there before,” said Gene Leganza, a vice president and research director with Forrester Research.

First is “a really comprehensive cloud-based platform that brings together all the elements you’d need to drive data innovation,” Leganza said. “It’s data management, it’s analytics, it’s Watson, it’s collaboration across different roles, and it’s a method to get started. It’s really comprehensive, and the fact that it’s cloud-based means everyone has access.”

The platform’s AI-based capabilities, meanwhile, can help users “drive to the next level of innovation with data,” he said.

Overall, it’s “an enterprise architect’s dream” because it could put an end to the ongoing need to integrate diverse products into a functioning whole, Leganza said.

Competition in the analytics market has been largely segmented according to specific technologies, agreed Charles King, principal analyst with Pund-IT.

“If Project DataWorks delivers what IBM intends,” King said, “It could change the way that organizations approach and gain value from analyzing their data assets.”

Source: InfoWorld Big Data

IDG Contributor Network: Tech leaders must act quickly to ensure algorithmic fairness

Do big data algorithms treat people differently based on characteristics like race, religion, and gender? Cathy O’Neil in her new book Weapons of Math Destruction and Frank Pasquale in The Black Box Society both look closely and critically at concerns over discrimination, the challenges of knowing whether algorithms are treating people unfairly, and the role of public policy in addressing these questions.

Tech leaders must take seriously the debate over data usage — both because discrimination in any form has to be addressed, and because a failure to do so could lead to misguided measures such as mandated disclosure of algorithmic source code.

What’s not in question is that the benefits of the latest computational tools are all around us. Machine learning helps doctors diagnose cancer; speech recognition software simplifies our everyday interactions and helps those with disabilities; educational software improves learning and prepares children for the challenges of a global economy; and new analytics and data sources are extending credit to previously excluded groups. And autonomous cars promise to reduce accidents by 90 percent.

Jason Furman, the Chairman of the Council of Economic Advisers, got it right when he said in a recent speech that his biggest worry about artificial intelligence is that we do not have enough of it.

Of course, any technology, new or old, can further illegal or harmful activities, and the latest computational tools are no exception. But, in the same regard, there is no exception for big data analysis in the existing laws that protect consumers and citizens from harm and discrimination.

The Fair Credit Reporting Act protects the public against the use of inaccurate or incomplete information in decisions regarding credit, employment, and insurance.  While passed in the 1970s, this law has been effectively applied to business ventures that use advanced techniques of data analysis, including the scraping of personal data from social media to create profiles of people applying for jobs.

Further, no enterprise can legally use computational techniques to evade statutory prohibitions against discrimination on the basis of race, color, religion, gender, and national origin in employment, credit, and housing. In a 2014 report on big data, the Obama Administration emphasized this point and told regulatory agencies to “identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations….”

Even with these legal protections, there is a move to force greater public scrutiny — including a call for public disclosure of all source code used in decision-making algorithms. Full algorithmic transparency would be harmful. It would reveal selection criteria in such areas as tax audits and terrorist screening that must be kept opaque to prevent people from gaming the system. And by allowing business competitors to use a company’s proprietary algorithms, it would reduce incentives to create better algorithms.

Moreover, it won’t actually contribute to the responsible use of algorithms. Source code is only understandable by experts, and even for them it is hard to say definitively what a program will do based solely on the source code. This is especially true for many of today’s programs that update themselves frequently as they use new data.

To respond to public concern about algorithmic fairness, businesses, government, academics and public interest groups need to come together to establish a clear operational framework for responsible use of big data analytics. Current rules already require some validation of the predictive accuracy of statistical models used in credit, housing, and employment. But technology industry leaders can and should do more.

FTC Commissioner McSweeny has the right idea, with her call for a framework of “responsibility by design.” This would incorporate fairness into algorithms by testing them — at the development stage — for potential bias. This should be supplemented by audits after the fact to ensure that algorithms are not only properly designed, but properly operated.  
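
One concrete way to do that development-stage testing, offered here purely as an illustration of the idea rather than as McSweeny’s framework, is to compare an algorithm’s selection rates across groups, in the spirit of the “four-fifths rule” long used in employment contexts. A minimal sketch, using invented audit data, might look like this:

```cpp
// Illustrative bias check (not a prescribed framework): compare the rate at
// which a model selects people from two groups. A ratio well below ~0.8
// (the "four-fifths rule" heuristic from employment law) flags possible
// disparate impact worth investigating before deployment.
#include <cstdio>
#include <vector>

struct Decision {
    int  group;     // 0 or 1: membership in a protected class (illustrative)
    bool selected;  // did the algorithm approve or shortlist this person?
};

double selection_rate(const std::vector<Decision> &d, int group) {
    int total = 0, picked = 0;
    for (const auto &x : d) {
        if (x.group != group) continue;
        ++total;
        if (x.selected) ++picked;
    }
    return total ? static_cast<double>(picked) / total : 0.0;
}

int main() {
    // Toy audit data; in practice this would be the model's output on a
    // representative test set gathered during development.
    std::vector<Decision> audit = {
        {0, true}, {0, true}, {0, false}, {0, true},
        {1, true}, {1, false}, {1, false}, {1, false},
    };
    double r0 = selection_rate(audit, 0);
    double r1 = selection_rate(audit, 1);
    double ratio = (r0 > r1 ? r1 / r0 : r0 / r1);
    std::printf("group 0: %.2f, group 1: %.2f, ratio: %.2f\n", r0, r1, ratio);
    if (ratio < 0.8) std::printf("warning: possible disparate impact\n");
    return 0;
}
```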

Important in this cooperative stakeholder effort would be a decision about which areas of economic life to include — starting with sensitive areas like credit, housing and employment. The framework would also need to address the proper roles of the public, developers and users of algorithms, regulators, independent researchers, and subject matter experts, including ethics experts.

It is important to begin to develop this framework now, and to ensure the uses of the new technology are, and are perceived to be, fair to all. The public must be confident in the fairness of algorithms, or a backlash will threaten their very real and substantial benefits.

Source: InfoWorld Big Data

A high-tech shirt made for speed

Race cars are packed with sensors that constantly send back telemetry data to pit row here. In IndyCar’s number 10 team, the driver is also monitored thanks to a high-tech shirt.

A SHIRT BUILT FOR SPEED

The shirt is based on a fabric called Hitoe (Japanese for “single layer”), developed by engineers at NTT Data, which sponsors the team, and Toray Industries.

In Hitoe, the nanofibers in the shirt are coated with an electro-conductive polymer, so the fabric itself is the sensor. For IndyCar, pieces of Hitoe fabric were attached to the fire-resistant shirt that all drivers are required to wear. The result is this shirt for driver Tony Kanaan.

“The amazing thing about this shirt, it tells me my heart rate, it tells me the stress on my muscles with some of the sensors that I have underneath my forearms, my biceps and my core, which is very unique.”

Data from the shirt feeds into the car’s telemetry system and is sent to pit row for real-time analysis. What the team discovers can help Kanaan compete.

Brian Welling
Support Engineer for 10 car
“We can tell the heart rate realtime, we can tell muscle fatigue on the arms so if he is gripping the steering wheel too hard, and if that is the case we can say let off the steering wheel a bit on the straight away.”

If he grips less where he doesn’t need to, Kanaan can conserve strength, giving him an edge at the end of the race.

“We’re learning about the whole atmosphere of this shirt. We’re starting to learn about the fatigue level. When you’re going through a corner, the drivers knew it but we didn’t, they hold their breath a lot, all the way through the corner. You can learn to relax just a little bit more.”

Its use in IndyCar is but one application for Hitoe. A shirt using the fabric is already on sale in Japan and finding use in industry.

“A high-tech, high-tower construction worker, sensor worker, who is hundreds of feet in the air and what they are going through.”

Kanaan sees much greater uses for the shirt too, way beyond IndyCar.

“We’re working on this thing to help other people in hospitals. You can track it with your phone, so you can send patients home with the shirt and they don’t have to suffer through being depressed in a hospital. You’re still ill, but you are being monitored by wearing the shirt, and a doctor can have an app or alarm on a phone that can tell if you’re in trouble or not.”

The shirt isn’t available in the U.S. yet, but will be soon.

Source: InfoWorld Big Data

Bossie Awards 2016: The best open source big data tools

Elasticsearch, based on the Apache Lucene engine, is an open source distributed search engine that focuses on modern concepts like REST APIs and JSON documents. Its approach to scaling makes it easy to take Elasticsearch clusters from gigabytes to petabytes of data with low operational overhead.

As part of the ELK stack (Elasticsearch, Logstash, and Kibana, all developed by Elasticsearch’s creators, Elastic), Elasticsearch has found its killer app as an open source Splunk replacement for log analysis. Companies like Netflix, Facebook, Microsoft, and LinkedIn run large Elasticsearch clusters for their logging infrastructure. Furthermore, the ELK stack is finding its way into other domains, such as fraud detection and domain-specific business analytics, spreading the use of Elasticsearch throughout the enterprise.

— Ian Pointer
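
For readers new to Elasticsearch, the appeal of REST APIs and JSON documents is that indexing and searching are just HTTP calls. The minimal sketch below, written in C++ with libcurl, indexes one JSON document and runs a match query against a local node; the host, index name, and field names are assumptions made for the example, and the URL paths follow recent Elasticsearch releases rather than any particular version.

```cpp
// Minimal sketch: talk to Elasticsearch's REST API from C++ using libcurl.
// Assumes a local node on http://localhost:9200 with no authentication.
// Build with: g++ es_demo.cpp -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

// Collect the HTTP response body into a std::string.
static size_t collect(char *data, size_t size, size_t nmemb, void *userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

// Send a JSON request (PUT or POST) and return the response body.
static std::string send_json(const std::string &method, const std::string &url,
                             const std::string &body) {
    std::string response;
    CURL *curl = curl_easy_init();
    if (!curl) return response;

    curl_slist *headers = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, method.c_str());
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return response;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);

    // Index a log line as a JSON document ("logs" index is illustrative).
    std::cout << send_json("PUT", "http://localhost:9200/logs/_doc/1",
                           R"({"service":"checkout","level":"error","msg":"timeout"})")
              << "\n";

    // Full-text search for error-level events with a match query.
    std::cout << send_json("POST", "http://localhost:9200/logs/_search",
                           R"({"query":{"match":{"level":"error"}}})")
              << "\n";

    curl_global_cleanup();
    return 0;
}
```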

Source: InfoWorld Big Data

Bossies 2016: The Best of Open Source Software Awards

Does anyone even try to sell closed-source software anymore? It must be hard, when so many of the tools used to power the world’s largest datacenters and build the likes of Google, Facebook, and LinkedIn have been planted on GitHub for everyone to use. Even Google’s magic sauce, the software that knows what you will read or buy before you read or buy it, is now freely available to any ambitious developer with dreams of a smarter application.

Google didn’t use to share its source code with the rest of us. It would share research papers, then leave it to others to come up with the code. Perhaps Google regrets letting Yahoo steal its thunder with Hadoop. Whatever the reason, Google is clearly in the thick of open source now, having launched its own projects — TensorFlow and Kubernetes — that are taking the world by storm.

Of course, TensorFlow is the machine learning magic sauce noted above, and Kubernetes the orchestration tool that is fast becoming the leading choice for managing containerized applications. You can read all about TensorFlow and Kubernetes, along with dozens of other excellent open source projects, in this year’s Best of Open Source Awards, aka the Bossies. In all, our 2016 Bossies cover 72 winners across five categories.

The software tumbling out of Google and other cloudy skies marks a huge shift in the open source landscape and an even bigger shift in the nature of the tools that businesses use to build and run their applications. Just as Hadoop reinvented data analytics by distributing the work across a cluster of machines, projects such as Docker and Kubernetes (and Mesos and Consul and Habitat and CoreOS) are reinventing the application “stack” and bringing the power and efficiencies of distributed computing to the rest of the datacenter.

This new world of containers, microservices, and distributed systems brings plenty of challenges too. How do you handle monitoring, logging, networking, and security in an environment with thousands of moving parts, where services come and go? Naturally, many open source projects are already working to answer these questions. You’ll find a number of them among our Bossie winners.

We’ve come to expect new names in the Bossies, but this year’s winners may include more newcomers than ever. Even in the arena of business applications, where you find many of the older codebases and established vendors, we see pockets of reinvention and innovation. New machine learning libraries and frameworks are taking their place among the best open source development and big data tools. New security projects are taking a cloud-inspired devops approach to exposing weaknesses in security controls.

Open source software projects continue to fuel an amazing boom in enterprise technology development. If you want to know what our applications, datacenters, and clouds will look like in the years to come, check out the winners of InfoWorld’s Best of Open Source Awards.

Source: InfoWorld Big Data

SAP woos SMB developers with an 'express' edition of Hana

SAP has made no secret of the fact that its bets for the future rest largely on its Hana in-memory computing platform. But broad adoption is a critical part of making those bets pay off.

Aiming to make Hana more accessible to companies of all shapes and sizes, the enterprise software giant on Monday unveiled a downloadable “express” edition that developers can use for free.

The new express edition of SAP Hana can be used free of charge on a laptop or PC to develop, test, and deploy production applications that use up to 32GB of memory; users who need more memory can upgrade for a fee. Either way, the software delivers database, application, and advanced analytics services, allowing developers to build applications that use Hana’s transactional and analytical processing against a single copy of data, whether structured or unstructured.

Originally launched more than five years ago, Hana uses an in-memory computing engine in which data to be processed is held in RAM instead of being read from disks or flash storage. This makes for faster performance. Hana was recently updated with expanded analytics capabilities and tougher security, among other features.

Hana also forms the basis for S/4Hana, the enterprise suite that SAP released in early 2015.

The new express edition of Hana can be downloaded from the SAP developer center and installed on commodity servers, desktops, and laptops using a binary installation package with support for either SUSE Linux Enterprise Server or Red Hat Enterprise Linux. Alternatively, it can be installed on Windows or Mac OS by downloading a virtual machine installation image that is distributed with SUSE Linux Enterprise Server.

Tutorials, videos, and community support are available. The software can also be obtained through the SAP Cloud Appliance Library, which provides deployment options for popular public cloud platforms.

“The new easy-to-consume model via the cloud or PC and free entry point make a very attractive offering from SAP,” said Cindy Jutras, president of research firm Mint Jutras. “Now companies such as small-to-midsize enterprises have access to a data management and app development platform that has traditionally been used by large enterprises.”

Source: InfoWorld Big Data

Salesforce is betting its Einstein AI will make CRM better

If there was any doubt that AI has officially arrived in the world of enterprise software, Salesforce just put it to rest. The CRM giant on Sunday announced Einstein, a set of artificial intelligence capabilities it says will help users of its platform serve their customers better.

AI’s potential to augment human capabilities has already been proven in multiple areas, but tapping it for a specific business purpose isn’t always straightforward. “AI is out of reach for the vast majority of companies because it’s really hard,” John Ball, general manager for Salesforce Einstein, said in a press conference last week.

With Einstein, Salesforce aims to change all that. Billing the technology as “AI for everyone,” it’s putting Einstein’s capabilities into all its clouds, bringing machine learning, deep learning, predictive analytics, and natural language processing into each piece of its CRM platform.

In Salesforce’s Sales Cloud, for instance, machine learning will power predictive lead scoring, a new tool that can analyze all data related to leads — including standard and custom fields, activity data from sales reps, and behavioral activity from prospects — to generate a predictive score for each lead. The models will continuously improve over time by learning from signals like lead source, industry, job title, web clicks, and emails, Salesforce said. 
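
Salesforce has not published how Einstein’s models are built, but the general shape of predictive lead scoring is easy to illustrate. The sketch below is entirely hypothetical: the feature names and weights are invented, and a real system would learn its coefficients from historical conversion data rather than hard-coding them.

```cpp
// Hypothetical sketch of predictive lead scoring; not Salesforce's model.
// A real system would learn the weights from historical win/loss data.
#include <cmath>
#include <cstdio>

struct Lead {
    double web_clicks;      // behavioral activity from the prospect
    double emails_opened;   // engagement signal
    bool   senior_title;    // e.g., director level or above
    bool   referral_source; // lead source was a referral
};

// Weighted sum of signals squashed into a 0-100 score via the logistic function.
double lead_score(const Lead &l) {
    const double bias = -2.0;                    // invented coefficients
    double z = bias
             + 0.08 * l.web_clicks
             + 0.15 * l.emails_opened
             + 1.2  * (l.senior_title ? 1.0 : 0.0)
             + 0.9  * (l.referral_source ? 1.0 : 0.0);
    return 100.0 / (1.0 + std::exp(-z));
}

int main() {
    Lead hot  {25.0, 10.0, true,  true};
    Lead cold { 1.0,  0.0, false, false};
    std::printf("hot lead:  %.1f\n", lead_score(hot));
    std::printf("cold lead: %.1f\n", lead_score(cold));
    return 0;
}
```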

Another tool will analyze CRM data combined with customer interactions such as inbound emails from prospects to identify buying signals earlier in the sales process and recommend next steps to increase the sales rep’s ability to close a deal.

In Service Cloud, Einstein will power a tool that aims to improve productivity by pushing a prioritized list of response suggestions to service agents based on case context, case history, and previous communications.

Salesforce’s Marketing, Commerce, Community, Analytics, IoT and App Clouds will benefit similarly from Einstein, which leverages all data within Salesforce — including activity data from its Chatter social network, email, calendar, and ecommerce as well as social data streams and even IoT signals — to train its machine learning models.

The technology draws on recent Salesforce acquisitions including MetaMind. Roughly 175 data scientists have helped build it, Ball said.

Every vendor is now facing the challenge of coming up with a viable AI product, said Denis Pombriant, managing principal at Beagle Research Group.

“Good AI has to make insight and knowledge easy to grasp and manipulate,” Pombriant said. “By embedding products like Einstein into customer-facing applications, we can enhance the performance of regular people and enable them to do wonderful things for customers. It’s not about automation killing jobs; it’s about automation making new jobs possible.”

Most of Salesforce’s direct competitors, including Oracle, Microsoft, and SAP, have AI programs of their own, some of them dating back further than Salesforce’s, Pombriant noted.

Indeed, predictive analytics has been an increasingly significant part of the marketer’s toolbox for some time, and vendors including Pegasystems have been applying such capabilities to CRM.

“I think more than any other move, such as IoT, AI is the next big thing we need to focus on,” Pombriant said. “If IoT is going to be successful, it will need a lot of good AI to make it all work.”

New Einstein features will start to become available next month as part of Salesforce’s Winter ‘17 release. Many will be added into existing licenses and editions; others will require an additional charge.

Also on Sunday, Salesforce announced a new research group focused on delivering deep learning, natural language processing, and computer vision to Salesforce’s product and engineering teams.

Source: InfoWorld Big Data

ClearDB Joins Google Cloud Platform Technology Partner Program

ClearDB has announced it has been named a Google Cloud Platform Technology Partner, providing a fully-managed database service that improves MySQL database performance, availability, and ease of control.

The collaboration between Google Cloud Platform and ClearDB enables organizations to benefit from accelerated application development via rapid deployment of database assets, highly available MySQL that avoids application disruptions, and a pay-as-you-go model that eliminates the need for organizations to procure and maintain costly infrastructure.

To help customers get the most out of Google Cloud Platform services, Google works closely with companies such as ClearDB that deliver best-in-class, fully managed database services on top of Google Cloud Platform.

“The combination of ClearDB and Google Cloud Platform can free users from the overhead of managing infrastructure, provisioning servers and configuring networks,” said Jason Stamper, an analyst with 451 Research.  “ClearDB on Google Cloud Platform can allow users to focus on business innovation and growth.”

ClearDB is cloud agnostic and is the only vendor offering MySQL database-as-a-service (DBaaS) on three cloud providers. The company offers MySQL users a quick and efficient means to rapidly deploy highly available database assets in the cloud. ClearDB’s DBaaS is built on top of native MySQL and requires no code changes, simplifying deployment while ensuring high availability with sub-second automatic failover via high-availability routers.

“With database assets playing a vital role in achieving business success in today’s always-on, data-driven economy, the ability to accelerate database-powered application development, ensure ‘always-on’ availability and provide fully-managed database services is essential,” said Allen Holmes, ClearDB vice president of marketing and platform alliances. “ClearDB is committed to expanding its DBaaS offering to all major cloud providers and we are excited to add Google Cloud to our existing lineup of Microsoft Azure and Amazon EC2 offerings.”

Designed to work on major public clouds and to support private cloud and on-premises operations, ClearDB’s nonstop Data Services Platform extends ClearDB’s MySQL DBaaS offering and automates the provisioning and management process with an intuitive services framework that accelerates performance and guarantees high availability in any cloud marketplace, including Microsoft Azure, Amazon Web Services (AWS), Heroku, AppFog, SoftLayer and IBM Bluemix – all while reducing database license footprint and related infrastructure costs. 

Source: CloudStrategyMag

How MIT's C/C++ extension breaks parallel processing bottlenecks

Earlier this week, MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) announced Milk, a system that speeds up parallel processing of big data sets by as much as three or four times.

If you think this involves learning a whole new programming language, breathe easy. Milk is less a radical departure from existing software development than a refinement of an existing set of C/C++ tools.

All together now

According to the paper authored by the CSAIL team, Milk is a C/C++ language family extension that addresses the memory bottlenecks plaguing big data applications. Apps that run in parallel contend with one another for memory access, so any gains from parallel processing are offset by the time spent waiting for memory.

Milk solves these problems by extending an existing library, OpenMP, widely used in C/C++ programming for parallelizing access to shared memory. Programmers typically use OpenMP by annotating sections of their code with directives (“pragmas”) to the compiler to use OpenMP extensions, and Milk works the same way. The directives are syntactically similar, and in some cases, they’re minor variants of the existing OpenMP pragmas, so existing OpenMP apps don’t have to be heavily reworked to be sped up.
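
To make that annotation model concrete, here is a minimal OpenMP sketch of the kind of loop Milk targets: a parallel loop whose indirect, data-dependent accesses scatter across an array too large for the cache. The Milk-specific clause is mentioned only in a comment, since its exact syntax belongs to the CSAIL paper; everything else is plain OpenMP.

```cpp
// Minimal sketch (not from the Milk paper): a standard OpenMP loop with the
// indirect, cache-unfriendly access pattern that Milk is designed to batch.
// Build with: g++ -fopenmp -O2 histogram.cpp
#include <omp.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const std::size_t n_edges = 1 << 24;   // "big data" stand-in: many random references
    const std::size_t n_nodes = 1 << 20;   // target array too large to stay in cache

    std::vector<uint32_t> edges(n_edges);
    std::vector<uint64_t> degree(n_nodes, 0);
    for (auto &e : edges) e = static_cast<uint32_t>(rand()) % n_nodes;

    // Plain OpenMP: each iteration touches degree[edges[i]], an effectively
    // random DRAM location, so threads spend much of their time waiting on
    // memory. A Milk-annotated version would, per the paper, add a clause to
    // this pragma so the compiler batches and clusters the indirect accesses;
    // that clause's syntax is not reproduced here.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n_edges; ++i) {
        #pragma omp atomic
        degree[edges[i]]++;
    }

    std::printf("degree[0] = %llu\n", static_cast<unsigned long long>(degree[0]));
    return 0;
}
```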

Milk’s big advantage is that it performs what the paper’s authors describe as “DRAM-conscious clustering.” Since data shuttled from memory is cached locally on the CPU, batching together data requests from multiple processes allows the on-CPU cache to be shared more evenly between them.

The most advanced use of Milk requires using some functions exposed by the library — in other words, some rewriting — but it’s clearly possible to get some results right away by simply decorating existing code.

Let’s not throw all this out yet

As CPU speeds top out, attention has turned to other methods to ramp up processing power. The most direct option is to scale out: spreading workloads across multiple cores on a single chip, across multiple CPUs, or throughout a cluster of machines. While a plethora of tools exist to spread out workloads in these ways, the languages used for them don’t take parallelism into account as part of their designs. Hence the creation of new languages like Pony, designed to provide a fresh set of metaphors for programming in such environments.

Another approach has been to work around the memory-to-CPU bottleneck by moving more of the processing to where the data already resides. Example: the MapD database, which uses GPUs and their local memory for both accelerated processing and distributed data caching.

Each of these approaches has its downsides. With new languages, there’s the pain of scrapping existing workflows and toolchains, some of which have decades of work behind them. Using GPUs has some of the same problems: Shifting workloads to a GPU is easy only if the existing work is abstracted away through a toolkit that can be made GPU-aware. Otherwise, you’re back to rewriting everything from scratch.

A project like Milk, on the other hand, adds a substantial improvement to a tool set that’s already widely used and well understood. It’s always easier to transform existing work than to tear it down and start over, so Milk provides a way to squeeze more out of what we already have.

Source: InfoWorld Big Data

Global Data Center Construction Market Expected To Grow Through 2020

The global data center construction market is expected to grow at a CAGR of more than 12% during the period 2016-2020, according to Technavio’s latest report.
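
For context, a CAGR above 12% compounds quickly. The quick calculation below, assuming four compounding years between 2016 and 2020 (the report does not spell out its convention), shows the implied overall growth multiple.

```cpp
// Quick arithmetic: what a 12% CAGR implies over 2016-2020
// (assuming four compounding years; the report does not specify).
#include <cmath>
#include <cstdio>

int main() {
    const double cagr = 0.12;
    const int years = 4;                          // 2016 -> 2020
    double multiple = std::pow(1.0 + cagr, years);
    std::printf("market size multiple after %d years: %.2fx\n", years, multiple);
    return 0;
}
```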

In this report, Technavio covers the market outlook and growth prospects of the global data center construction market for 2016-2020. The report also presents the vendor landscape and a corresponding detailed analysis of the major vendors operating in the market. It includes vendors across all geographical regions. The report provides the performance and market dominance of each of the vendors in terms of experience, product portfolio, geographical presence, financial condition, R&D, and customer base.

“The data center construction market is growing significantly with major contributions by cloud service providers (CSPs) and telecommunication and colocation service providers worldwide. The increased construction is facilitated by the increased demand for cloud-based service offerings and big data analytics driven by the stronger growth of data through connected devices in the form of the IoT,” says Rakesh Kumar Panda, a lead data center research expert from Technavio.

Technavio’s ICT research analysts segment the global data center construction market into the following regions:

  • Americas
  • EMEA
  • APAC

In 2015, with a market share of close to 45%, Americas dominated the global data center construction market, followed by EMEA at around 34% and APAC with a little over 21%.

Source: CloudStrategyMag