Kinetica And Nimbix Partner To Offer GPU Computing In The Cloud

Kinetica has announced that its real-time analytics and visualization solution is immediately available on the Nimbix Cloud. Delivering instant results and visualized insights across massive streaming datasets, Kinetica on the Nimbix Cloud can be launched in seconds and is well suited to GPU-accelerated analytics.

“Kinetica on the Nimbix Cloud harnesses the power of parallel GPUs to deliver real-time analytics, and data written to Kinetica is automatically routed to parallel connections across the cluster,” said Amit Vij, cofounder and CEO, Kinetica. “The full Kinetica stack can be provisioned with a couple of mouse clicks from the Nimbix console or launched and automated with JARVICE’s powerful task API.”

“Kinetica’s GPU accelerated database platform paired with Nimbix’s Cloud is a natural fit for customers looking for industry-leading performance and ease-of-use from a cloud deployment,” said Steve Hebert, CEO, Nimbix. “With a few mouse clicks, customers can have unprecedented computing capabilities to help them solve some of their biggest data analytics challenges.”

The Nimbix Cloud offers customer-selectable systems with either Intel x86-based processors or IBM POWER8 processors, both coupled with NVIDIA GPUs, to deliver the best performance and economics available for Kinetica customers. Nimbix is the only public cloud provider featuring NVIDIA’s latest-generation Tesla P100 GPUs with NVLink, a high-bandwidth, energy-efficient interconnect that allows data sharing at rates 5 to 12 times faster than traditional PCIe interconnects. Nimbix Cloud machines are interconnected with industry-leading 56Gbps FDR InfiniBand for optimal GPU cluster performance.

“The IBM POWER8 NVLink server is ideally suited for databases and advanced analytic applications because the NVLink super-highway between the POWER8 CPU and the NVIDIA Tesla P100 Pascal GPU accelerator enables data to move at up to 2.5x higher throughput from system memory to the accelerator,” said Sumit Gupta, IBM vice president of High Performance Computing, AI and Analytics. “The availability of Kinetica running on these POWER8 NVLink servers in the Nimbix cloud enables enterprises to quickly try the real-time analytics of the Kinetica accelerated database.”

Source: CloudStrategyMag

FloQast Unveils New Cloud Storage And Security Integrations

FloQast, Inc. has announced new integrations with leading cloud storage providers Microsoft OneDrive and Egnyte, as well as cloud Single Sign-On (SSO) solutions from Google and Okta.

The out-of-the-box integrations help simplify the setup and adoption of FloQast’s close management software while bolstering security by providing secure access via SSO. These new integrations address enhanced security and governance requirements for security-conscious industries such as financial services, healthcare, and aerospace, among others.

With the new Microsoft OneDrive and Egnyte integrations, FloQast close management software can directly and securely access financial data residing in Excel workbooks housed within these cloud storage applications. This approach lets accountants leverage the familiarity and flexibility of Excel while maintaining security and retaining ownership and control of their sensitive financial data. FloQast accomplishes this by securely accessing customer financial data from Excel-based account reconciliations to make certain all accounts are automatically tied out against the general ledger system, reducing the risk of error and eliminating hours of manual work each month.

The integrations with Okta and Google SSO further strengthen security by supporting password complexity and multi-factor authentication. Integration with these identity management solutions helps ensure that FloQast close management software can be accessed only by authorized users, which bolsters governance and security.

These new integrations complement FloQast’s existing partnerships with Box, Dropbox and Google Drive. 

“The financial services industry — specifically accounting — is extremely security conscious as it constantly deals with high volumes of highly sensitive information,” said Ronen Vengosh, vice president of business development at Egnyte. “Egnyte’s integration with FloQast provides an easy-to-use interface for accounting professionals to collaborate on financial records and efficiently close their books, without losing custody of sensitive documents or risking violation of compliance regulations.”

“FloQast provides accounting teams a single place to manage the close and gives everyone visibility. These new integrations expand our current product footprint and extend our capabilities while also demonstrating FloQast’s flexibility to address the myriad increasingly important concerns accountants face today,” said Mike Whitmire, CPA, co-founder and chief executive officer of FloQast. “The new integrations with Egnyte, OneDrive, Google SSO, and Okta, along with our existing integration partners, ensure the highest levels of governance and security.”

 

Source: CloudStrategyMag

Fujitsu Develops New Cloud Network Infrastructure Technology

Fujitsu Laboratories Ltd. and Fujitsu Laboratories of America, Inc. have developed technology that makes it easy to design, build, and operate virtual network infrastructure that extends across multiple clouds and corporate networks.

With conventional technology, it is possible to build “virtual network infrastructure” within one individual cloud or corporate location using Software Defined Networking (SDN) technology. However, when building virtual network infrastructure spanning multiple clouds and locations, not only are the setup methods different for each cloud and location, but a skilled engineer must also take time to configure the connections between each part of the IT infrastructure individually.

Now Fujitsu Laboratories and Fujitsu Laboratories of America have developed software technology that abstracts the structural elements of networks in IT infrastructure, and which can collectively design, build, manage, and operate across multiple IT infrastructures. The two companies have confirmed that this technology allows virtual network infrastructure to be built in one-tenth the time required by conventional technology, when it is applied to the design and configuration of the virtual network infrastructure extending across one cloud and a location with a few dozen devices.

With this technology, it is now possible to build and operate networks in response to user or application requirements, contributing to the creation of virtual networks across multiple IT infrastructures that support a variety of services in the era of 5G mobile communications, IoT, and the cloud.

Details of this technology were announced at the Technical Committee Conference on Network Systems, held March 2 in Okinawa. This technology will also be on display at the upcoming Fujitsu North America Technology Forum (NATF) 2017, an exhibition to be held March 9 in Santa Clara, California.

Development Background

In anticipation of an era in which 5G mobile communications and IoT technology become increasingly widespread, there has been a demand for technology that can efficiently provide a variety of services in the cloud in response to diversified needs and requirements.

With SDN and other technology for virtualizing computer resources, and through the development of virtual network infrastructure that extends from IoT devices, smart devices, and PCs to the cloud for each service as needed, it is now possible for cloud service providers to securely and efficiently provide their services, independent of the physical infrastructure. These technologies are beginning to be used, for example, in delivering large volumes of data from IoT devices to the cloud and analyzing that data for use in a variety of fields, or in flexibly building enterprise networks in response to changes in corporate organization. There has been demand for this sort of virtual network infrastructure to respond quickly to the changes in service needs and requirements that continually arise.

Issues

With conventional technology, virtual network infrastructure can be created using SDN technology within a single IT infrastructure, based on the network design required by a user, or “logical network.” However, when building virtual network infrastructure that encompasses cloud services operated by different companies or spread across multiple locations, the setup methods differ for each SDN controller. In addition, connecting different IT infrastructures usually requires expert knowledge to manually configure the virtual networks in between, such as VPNs and VXLANs, making network configuration laborious and time-consuming.

About the Technology

Now Fujitsu Laboratories and Fujitsu Laboratories of America have developed technology that abstracts the structural elements of IT infrastructure, and which can automatically design and build IT infrastructure that extends across multiple locations, managing and operating it as a single network infrastructure.

Features of the newly developed technology are as follows:

1. IT infrastructure abstraction technology

Fujitsu Laboratories and Fujitsu Laboratories of America developed a technology that models the structural elements of IT infrastructure as logical software components, enabling key features such as element configuration and operational status monitoring. Using this technology, users can automatically build virtual network infrastructures across multiple instances of IT infrastructure based on the logical network designed by the user. Because operations and fault-response management of virtual network infrastructures are also enabled, the full lifecycle (design, configuration, operations, and fault response) can be managed on the logical network.

2. Virtual network function automatic supplemental technology

Fujitsu Laboratories and Fujitsu Laboratories of America developed a technology to automatically supplement necessary network functions in virtual network design when converting from a logical network to a virtual network infrastructure by deriving the designer’s intentions from the connection status of the logical network.

This technology focuses on the connections between the structural elements of the designed logical network. It determines the designer’s intentions and automatically supplements the necessary network functions by checking the objects at the two ends of each connection, which may sit at different levels such as IT infrastructure, subnet, or node.
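
Fujitsu has not published implementation details, so the following is purely an illustrative sketch of the idea described above, with made-up class and function names: inspect the two endpoints of a logical connection and, when they belong to different IT infrastructures, automatically add the VPN function that the designer left implicit.

```python
# Purely illustrative sketch of the idea described above. None of these
# names correspond to Fujitsu's actual software.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    infrastructure: str          # e.g. "cloud-a" or "office-tokyo"


@dataclass
class Connection:
    a: Node
    b: Node
    functions: list = field(default_factory=list)


def supplement(connection: Connection) -> Connection:
    """Add the network functions the designer left implicit."""
    if connection.a.infrastructure != connection.b.infrastructure:
        # Endpoints live in different IT infrastructures, so a secure tunnel
        # (VPN/VXLAN) is needed even though the logical design omits it.
        connection.functions.append("vpn")
    return connection


link = Connection(Node("web-tier", "cloud-a"), Node("db-tier", "office-tokyo"))
print(supplement(link).functions)   # -> ['vpn']
```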

Effects

With this new technology, it is now possible to quickly build a complex virtual network infrastructure that extends across multiple companies or locations, without expert knowledge. For example, the design and configuration of a virtual network infrastructure extending across one cloud and a location with a few dozen devices could be completed by one engineer in one day, whereas previously four engineers with expert virtualization knowledge took three days. This shortens the configuration time to less than one-tenth of what was previously required.

When migrating enterprise systems to the cloud, it is now possible to complete the migration in a short period of time simply by performing operations on the logical network. In addition, when setting up a network at multiple locations, the user can design one logical network, then easily set up a virtual network at each location and add security functionality by copying and customizing the same design.

Future Plans

Going forward, Fujitsu Laboratories and Fujitsu Laboratories of America plan to continue developing other functions for lifecycle management in logical networks using this technology, with the goal of Fujitsu Limited selling it as a network management function for enterprise users of cloud services after fiscal 2017. The function will also be available on the FUJITSU Cloud Service K5 provided by Fujitsu Limited.

Source: CloudStrategyMag

MongoDB adds free tier and migration utility to cloud service

NoSQL database specialist MongoDB unveiled a new free tier for its MongoDB Atlas database-as-a-service (DaaS) offering on Tuesday. The company also released a utility to support live migration of data to MongoDB Atlas, whether that data is on-premise or in the cloud.

“Since we first introduced MongoDB to the community in 2009, we have been laser-focused on one thing—building a technology that gets out of the way of developers and makes them more productive,” Eliot Horowitz, CTO and co-founder of MongoDB, said in a statement Tuesday. “Now, with these updates to MongoDB Atlas, we’re tearing down more of the barriers that stand between developers and their giant ideas.”

The MongoDB Atlas service now offers a free cluster with 512 MB of storage, with nodes distributed to ensure high availability. Data is secured by default with authorization via SCRAM-SHA-1, TLS/SSL encryption for data traveling over networks and encrypted storage volumes for data at rest.
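
As a rough illustration of what those defaults look like from application code, here is a minimal PyMongo sketch of connecting to an Atlas cluster; the hostnames, credentials, and database names are placeholders, and the TLS and SCRAM-SHA-1 options simply mirror the security settings described above.

```python
# Minimal sketch: connecting to a MongoDB Atlas cluster with PyMongo.
# Hostnames, credentials, and names below are placeholders.
from pymongo import MongoClient

uri = (
    "mongodb://cluster0-shard-00-00.example.mongodb.net:27017,"
    "cluster0-shard-00-01.example.mongodb.net:27017,"
    "cluster0-shard-00-02.example.mongodb.net:27017/"
    "?replicaSet=Cluster0-shard-0&authSource=admin"
)

# TLS in transit and SCRAM-SHA-1 authentication, as described in the article.
client = MongoClient(
    uri,
    username="appuser",          # placeholder credentials
    password="secret",
    ssl=True,
    authMechanism="SCRAM-SHA-1",
)

db = client["appdb"]
db["events"].insert_one({"type": "signup", "source": "web"})
print(db["events"].find_one({"type": "signup"}))
```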

The new MongoMirror migration utility is designed to help users who are already running MongoDB to seamlessly pull data from their existing deployments and push it into MongoDB Atlas. The company says it will work with any existing MongoDB replica set running MongoDB 3.0 or higher. A hosted version of the live migration tool will soon be available in MongoDB Atlas.

MongoDB Atlas was engineered by the same team that built MongoDB, and the company says it incorporates the best practices of real world use cases, from startups to Fortune 500 companies. MongoDB launched it in June 2016 and says thousands of organizations around the world are already making use of the service, including companies like eHarmony and Thermo Fisher Scientific. Now it hopes to reduce barriers to adoption even more by making it free to get started and offering a way to seamlessly migrate existing workloads.

“The move to MongoDB Atlas has been a great win for us,” said James Mullaney, Technical Director at UK-based learning and performance R&D specialist HT2 Labs, which recently migrated its education data platform from a third-party MongoDB service provider to MongoDB Atlas to reduce costs and better scale its business. “We were able to scale to five times as much data while keeping database costs flat. Also, protecting our data assets is critical as we handle massive amounts of private education data. The fact that MongoDB’s native security features are baked into the Atlas platform made our migration decision that much easier.”

This story, “MongoDB adds free tier and migration utility to cloud service” was originally published by CIO.

Source: InfoWorld Big Data

MapR and Outscale partner on big data PaaS

At the Big Data Paris event in Paris, France, today, MapR Technologies and French enterprise-class cloud provider Outscale announced that they have joined forces to provide a big data platform-as-a-service (PaaS) offering built on the MapR Converged Data Platform.

Outscale will offer the new premium cloud service in Europe, North America, and Asia and says it will provide customers with a cost-effective and flexible platform to support their big data journey — from initial proof of concept to prototype and application deployment, all with unlimited scalability.

“We are proud to offer MapR as the core technology to power our big data platform-as-a-service because it enables our customers to spin up a complete, multi-TB data platform in the cloud in a matter of minutes,” David Chassan, chief product officer, Outscale, said in a statement today. “We’ve worked on other solutions for big data, but found that MapR was the only one with the stability, scale and functionality that meets the needs of our enterprise customers and VARs.”

No expertise needed

Outscale says its new Big Data PaaS is designed to be simple to use in the cloud, and offers the entire MapR Converged Data Platform to provide fast access to data stored in files, databases, and event streams for performing real-time analysis on business-critical operational applications. The service allows customers to test and deploy cloud-based applications on any size cluster, accessing open APIs including HDFS, Spark, Drill, and POSIX NFS without the need for extensive professional services expertise in big data.
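
To give a sense of what “accessing open APIs” means in practice, here is a minimal PySpark sketch of the kind of job such a service runs through the standard Spark API; the input path and column names are placeholders, and nothing in it is specific to Outscale’s offering.

```python
# Minimal PySpark sketch; the input path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-rollup").getOrCreate()

# Read event data from a cluster file-system path (MapR-FS is HDFS-compatible).
events = spark.read.json("maprfs:///data/clickstream/2017/03/")

# Count events per country and keep the ten busiest.
top_countries = (
    events.groupBy("country")
          .count()
          .orderBy(F.desc("count"))
          .limit(10)
)

top_countries.show()
spark.stop()
```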

“Outscale is one of the leading cloud providers in France because of their strong expertise,” Yann Aubry, area vice president, Northern & Western Europe, MapR Technologies, said in a statement Monday. “With MapR at the core of their cloud platform, they quickly provide their customers with managed access to a converged data platform including the key big data technologies today. We are pleased to work with Outscale to deliver a technologically advanced cloud platform.”

Knocking down data silos

MapR Technologies introduced its Converged Data Platform in December 2015 as part of an effort to tear down the new data silos brought about by the scattershot proliferation of new analytics tools and the consumerization of enterprise software.

The MapR Converged Data Platform brings together the MapR Distribution, including Apache Hadoop, MapR-DB and MapR Streams (its global event stream system, which allows organizations to continuously collect, analyze and act on streaming data). It integrates file, database, stream processing and analytics to support data-driven applications.

This story, “MapR and Outscale partner on big data PaaS” was originally published by CIO.

Source: InfoWorld Big Data

Facebook's new machine learning framework emphasizes efficiency over accuracy

In machine learning parlance, clustering or similarity search looks for affinities in sets of data that normally don’t make such a job easy. If you wanted to compare 100 million images against each other and find the ones that looked most like each other, that’s a clustering job. The hard part is scaling well across multiple processors, where you’d get the biggest speedup.

Facebook’s AI research division (FAIR) recently unveiled, with little fanfare, a proposed solution called Faiss. It’s an open source library, written in C++ and with bindings for Python, that allows massive data sets like still images or videos to be searched efficiently for similarities.

It’s also one of a growing class of machine learning solutions exploring better methods of making algorithms operate in parallel across multiple GPUs, achieving speed that’s only available at scale.

A magnet for the needle in the haystack

FAIR described the project and its goals in a paper published at the end of last February. The problem wasn’t only how to run similarity searches, or “k-selection” algorithms, on GPUs, but how to run them effectively in parallel across multiple GPUs, and how to deal with data sets that don’t fit into RAM (such as terabytes of video).

Faiss’ trick is not to search the data itself, but a compressed representation that trades a slight amount of accuracy for an order of magnitude or more of storage efficiency. Think of an MP3: Though MP3 is a “lossy” compression format, it sounds good enough for most ears. In the same manner, Faiss uses an encoding called PQ (product quantization) that can be split efficiently across multiple GPUs.
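
The faiss library itself is open source, so the trade-off is easy to see in code. The sketch below compares an exact index with an IVF+PQ index on random vectors; the dimensions and parameter values are arbitrary choices for illustration.

```python
# Sketch: exact search vs. a product-quantization (PQ) index in Faiss,
# using random vectors purely for illustration.
import numpy as np
import faiss

d, nb, nq = 64, 100_000, 5           # dimension, database size, query count
rng = np.random.default_rng(0)
xb = rng.random((nb, d)).astype("float32")
xq = rng.random((nq, d)).astype("float32")

# Exact (uncompressed) baseline.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
D_exact, I_exact = flat.search(xq, 5)

# IVF + PQ index: vectors are stored in compressed form, trading a little
# accuracy for a much smaller memory footprint.
nlist, m = 256, 8                     # coarse clusters, PQ sub-quantizers
quantizer = faiss.IndexFlatL2(d)
ivfpq = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)   # 8 bits per code
ivfpq.train(xb)
ivfpq.add(xb)
D_approx, I_approx = ivfpq.search(xq, 5)

print(I_exact[0], I_approx[0])        # compare neighbors for the first query
```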

One example search shown in the paper involves searching the Yahoo Flickr Creative Commons 100 Million data set, a library of 100 million images. Faiss was fed two images — one of a red flower, and one of a yellow flower — and instructed to find a chain of similar images between them. Searching all 100 million images for such similarities took 35 minutes on a set of four Nvidia Titan X GPUs.

FAIR claims Faiss is “8.5× faster than prior GPU state of the art” and provided some benchmarks to support its claim. When compared against two previous GPU k-selection algorithms, FAIR claimed, the Faiss algorithm was not only faster, but came a good deal closer to maximizing the available memory bandwidth for the GPU.

Another advantage with Faiss, said FAIR, was the total end-to-end time for the search — the time needed to construct the PQ version of the data, plus the time needed to actually run the search. Competing solutions took days on end simply to build PQ graph data for one test; with Faiss, a “high-quality” graph can be built in “about half a day.”

Pick up the pace

FAIR’s strategy of slightly sacrificing accuracy is one of a variety of speedup tactics used by the latest generation of machine learning technologies.

Many of these speedups don’t simply goose the performance of high-end hardware like Nvidia Titan boards, but also empower lower-end hardware, like the GPUs in smartphones. Google’s deep learning system TensorFlow was recently upgraded to allow smartphone-grade GPUs to perform image-recognition work.

Another likely long-term advantage of algorithms that can efficiently trade accuracy for speed is to divide labor between a local device (fast, but not as accurate) and a remote back end (more accurate, but requires more processing power). Classifications made by a local device could be used as-is or augmented with more horsepower on the back end if there’s a network connection.

The biggest takeaway with Faiss: There’s still plenty of work to be done in figuring out how machine learning of all stripes can further benefit from massively parallel hardware.

Source: InfoWorld Big Data

Making sense of machine learning

As Matt Asay observed last week, AI appears to be reaching “peak ludicrous mode,” with almost every software vendor laying claim to today’s most hyped technology. In fact, Gartner’s latest Hype Cycle for Emerging Technologies places machine learning at the Peak of Inflated Expectations.

Hang on — see what I did there? I used “AI” and “machine learning” interchangeably, which should get me busted by the artificial thought police. The first thing you need to know about AI (and machine learning) is that it’s full of confusing, overlapping terminology, not to mention algorithms with functions that are opaque to all but a select few.

This combination of hype and nearly impenetrable nomenclature can get pretty irritating. So let’s start with a very basic taxonomy:

Artificial intelligence is the umbrella phrase under which all other terminology in this area falls. As an area of computer research, AI dates back to the 1940s. AI researchers were flush with optimism until the 1970s, when they encountered unforeseen challenges and funding dried up, a period known as “AI winter.” Despite such triumphs as IBM’s 1990s chess-playing system Deep Blue, the term AI did not really recover from its long winter until a few years ago. New nomenclature needed to be invented.

Machine intelligence is synonymous with AI. It never gained the currency AI did, but you never know when it might suddenly become popular.

Machine learning is the phrase you hear most often today, although it was first coined in the 1950s. It refers to a subset of AI in which programs feed on data and, by recognizing patterns in that data and learning from them, execute functions or make predictions without being explicitly programmed to do so. Most of the recent advances we hear about fall under the rubric of machine learning. Why is it so hot today? You often hear that Moore’s Law and cheap, abundant memory have given new life to old machine learning algorithms, which have led to a wave of practical applications, particularly relating to pattern recognition. That’s true, but even more important has been the hyperabundance of data to enable machine learning systems to learn.

Cognitive computing has been the phrase preferred by IBM and bestowed on its Jeopardy winner Watson. As best as I can determine, cognitive computing is more or less synonymous with AI, although IBM’s definition emphasizes human interaction with that intelligence. Some people object to the phrase because it implies human-like reasoning, which computer systems in their current form are unlikely to attain.

Neural networks are a form of machine learning dating back to early AI research. They very loosely emulate the way neurons in the brain work — the objective generally being pattern recognition. As neural networks are trained with data, connections between neurons are strengthened, the outputs from which form patterns and drive machine decision-making. Disparaged as slow and inexact during the AI winter, neural net technology is at the root of today’s excitement over AI and machine learning.

Deep learning is the hottest area of machine learning. In most cases, deep learning refers to many layers of neural networks working together. Deep learning has benefited from abundant GPU processing services in the cloud, which greatly enhance performance (and of course eliminate the chore of setting up GPU clusters on prem). All the major clouds — AWS, Microsoft Azure, and Google Cloud Platform — now offer deep learning frameworks, although Google’s TensorFlow is considered the most advanced. If you want a full explanation from someone who actually understands this stuff, read Martin Heller’s “What deep learning really means.” Also check out his comparative review of the six most popular machine/deep learning frameworks.
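
For readers who want a concrete picture of “many layers of neural networks working together,” here is a minimal sketch using TensorFlow’s Keras API on random toy data; the layer sizes, target, and training settings are arbitrary and purely illustrative.

```python
# Minimal sketch of a multi-layer neural network with TensorFlow's Keras API,
# trained on random toy data just to show the shape of the code.
import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")        # made-up binary target

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),  # each Dense is one layer
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))
```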

Despite the current enthusiasm for deep learning, most machine learning algorithms have nothing to do with neural nets. As I discovered a couple of years ago when I interviewed Dr. Hui Wang, senior director of risk sciences for PayPal, advanced systems often use deep learning in conjunction with linear algorithms to solve such major challenges as fraud detection. The almost unlimited ability to pile on not only deep learning layers, but also a wide variety of other machine learning algorithms — and apply them to a single problem — is one reason you’ve heard those cautionary verses about machine intelligence one day approaching human intelligence.

Just last December, another milestone was reached: An AI system known as DeepStack beat professional poker players for the first time at heads-up no-limit Texas hold’em poker, which unlike chess is a classic game of “imperfect” information (i.e., players have information that others do not have). That’s less than a year after Google’s AlphaGo system beat world champion Lee Sedol in the ancient Chinese game of Go.

So AI’s fever pitch is understandable, but Matt Asay’s ridicule still hits home. As with many hot trends, it’s all too easy to grandfather in prosaic existing technology (I mean, predictive text is technically AI). The other problem is that very few people understand much in this area beyond the superficial, including me. How many people who throw around phrases like k-means clustering or two-class logistic regression have a clear idea of what they’re talking about? For most of us, this is black box territory (though a good primer can help, such as this one from the University of Washington computer science department). Ultimately, it takes experts like Martin Heller, who along with polyglot programming skills has a Ph.D. in physics, to evaluate AI solutions.
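
For the curious, both of those name-dropped techniques fit in a few lines of scikit-learn; the synthetic data below is only there to make the example run.

```python
# k-means clustering and two-class logistic regression, each in a few lines
# of scikit-learn, on small synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# k-means clustering: group unlabeled points into k clusters.
points = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)

# Two-class logistic regression: learn to separate labeled points.
labels = np.array([0] * 50 + [1] * 50)
clf = LogisticRegression().fit(points, labels)
print(clf.predict([[0, 0], [5, 5]]))   # -> [0 1]
```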

This week promises to be a big one for AI and machine learning: Salesforce will be showing off its Einstein AI capability, and at the Google Next conference the agenda features no fewer than 20 sessions on machine learning. I was hoping to attend the “TensorFlow and Deep Learning without a Ph.D.” session, but it’s already full.

Source: InfoWorld Big Data

HPE refocuses tech services group on cloud, big data

Hewlett Packard Enterprise has revamped its existing technology services unit to focus on helping customers adopt emerging technologies, including cloud computing, the internet of things, and big data.

HPE’s new Pointnext technology services division, announced Thursday, is designed to help businesses speed up their adoption of several technologies, including hybrid IT services and analytics, the company said. HPE announced the rebranded services unit with an “unboxing” video.

The revamped technology services unit is designed to meet customer needs as digital transformation is “driving an incredible pace of change” in the IT industry, Antonio Neri, executive vice president and general manager of the HPE Enterprise Group, said in a press release.

HPE Pointnext combines HPE’s consulting and support organizations in one group under general manager Ana Pinczuk, former chief product officer at Veritas. HPE announced her hiring last month.

Pointnext will use HPE’s existing 25,000 technology services specialists in 80 countries to help business with digital transformations, the company said.

Crawford Del Prete, an enterprise computing analyst at IDC, called the announcement a relaunch and rebranding of HPE’s tech services group. 

The announcement follows HPE’s previously announced merger of its separate enterprise services unit with CSC, he noted.

Technology services have been “a bright spot in HPE’s portfolio, and with the spin-merge of professional services to CSC, will be the focus of HPE’s services in the future,” he said via email.

Source: InfoWorld Big Data

Harness Hadoop and Spark for user-friendly BI

Big data shouldn’t be an area for only academics, data scientists, and other specialists. In fact, it can’t be. If we want big data to benefit industry at large, it needs to be accessible by mainstream information workers. Big data technology must fit into the workflows, habits, skill sets, and requirements of business users across enterprises.

Datameer is a big data analytics application doing exactly that. Combining the user interface metaphors of a file browser and a spreadsheet, Datameer runs natively on open source big data technologies like Hadoop and Spark, while hiding their complexity and facilitating their use in enterprise IT environments and business user scenarios. 

In other words, Datameer creates an abstraction layer over open source big data technologies that integrates them into the stable of platforms and toolchains in use in enterprise business environments. Business users tap the power of big data analytics through a familiar spreadsheet workbook and formula interface, while also benefiting from enterprise-grade management, security, and governance controls.  

Before we dive into the details of the platform, we should note that Datameer supports the full data lifecycle, including data acquisition and import (sometimes referred to as “ingest”), data preparation, analysis, and visualization, as well as export to other systems, such as databases, file stores, and even other BI tools. 

Data import is achieved with more than 70 connectors to databases, file formats, and applications, providing diversified connectivity for structured, semistructured, and unstructured data. Nevertheless, once a given set of data is in Datameer, it can migrate through all of the data lifecycle stages mentioned previously, right along with data from other sources.

System architecture

At the heart of Datameer is the core server referred to internally as the conductor. The conductor orchestrates all work and manages the configuration of all jobs performed on the Hadoop cluster. It lets users interact with the underlying data sources via Datameer’s user interface, and it lets tools interact with the data via its API.

The conductor also has a special interactive mode that accommodates the user’s incidental work in the spreadsheet user interface. This interactivity is facilitated by Datameer’s Smart Sampling technology, which allows the user to work with a manageable and representative subset of the data in memory. When the design work is done, the workbook is executed against the full data set via a job submitted to the cluster.

This fluid movement between interactive design work by the user and bulk execution by the conductor (running on the Hadoop cluster) is the key to Datameer’s harmonization of open source big data and enterprise BI sensibilities and workflows.
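
Datameer’s internals aren’t exposed, but the design-on-a-sample pattern it describes can be illustrated with a rough pandas analogy: build and test a transformation against a small in-memory sample, then run the identical logic over the full data set. The column names and sampling rate below are made up.

```python
# Not Datameer code -- a pandas analogy for the design-on-a-sample pattern.
import numpy as np
import pandas as pd

def enrich(df: pd.DataFrame) -> pd.DataFrame:
    """The transformation being designed (toy logic)."""
    out = df.copy()
    out["revenue"] = out["units"] * out["unit_price"]
    return out

# Stand-in for a large source data set.
rng = np.random.default_rng(1)
full = pd.DataFrame({
    "units": rng.integers(1, 100, size=1_000_000),
    "unit_price": rng.uniform(1.0, 20.0, size=1_000_000),
})

sample = full.sample(frac=0.01, random_state=42)  # design interactively on ~1%
preview = enrich(sample)                          # fast feedback while designing
result = enrich(full)                             # same logic over the full data
print(preview["revenue"].mean(), result["revenue"].mean())
```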

Although Datameer works cooperatively with a Hadoop cluster, the application itself executes on a standalone server (“edge node”) or desktop PC running Windows, Linux, Unix, or MacOS. It is compatible with modern browsers, including Safari and Chrome, as well as Microsoft’s Internet Explorer 11 and the new Edge browser in Windows 10.

Security and governance

From the very early days of the Datameer product – when Hadoop itself offered only file-level security – Datameer provided for role-based access controls on specific subsets of data, a non-negotiable requirement for most enterprises.

By sharing data through multiple workbooks, each of which may contain a different subset of data, and assigning permissions on each workbook to unique security groups, Datameer provides for the row-level security that enterprises need. Column-level security is accommodated as well, either through inclusion of select columns in a group-specific workbook or via masking of data in particular columns, for particular security groups.

While Datameer allows users, roles, and groups to be created and maintained, it can also integrate with Microsoft Active Directory or other LDAP user stores, authenticating users against, and assigning permissions to, the groups that are defined in those systems. Datameer can also integrate with enterprise single-sign-on (SSO) systems.

As a web application, Datameer can be run over SSL or HTTPS connections, thus providing encryption of actions and data between user and application.

Datameer provides full data lineage information, rendered in diagrammatic or columnar views (see figure below), so data can be tracked from import job to workbooks to individual chart widgets in business infographic data visualizations.

[Figure: Datameer data lineage view (image: Datameer)]

For audit control, Datameer supports an “event bus” listener-based API, wherein all user interface and data entity events (creation of workbooks, addition of users to groups, assigning or revoking of permissions) are published as they occur to all API subscribers.

This event bus facilitates integration with external governance systems that may be in use at particular customer sites. For more standalone audit management, Datameer records these events in its own log files, which can in turn be imported into Datameer itself, then analyzed and visualized there.

Data integration architecture

Because Datameer is designed to work natively with big data technology, even its data import and export functionality is run on the cluster. This allows for limitless horizontal scaling to facilitate data processing at very high volume. It’s an approach that sets Datameer apart from many of its competitors. 

Nonetheless, for smaller data sets, Datameer does provide file upload jobs and the ability to download the content of any sheet in a workbook on an ad hoc basis (in the form of a simple file).

Datameer accommodates workflows where the source data set remains in its home repository (database, file, and so on) and is queried only for Smart Sampling purposes and during workbook execution. These “data links” assure that data movement and duplication are minimized while still allowing for interactive work against the data source, and cluster-based processing against the full data set when the workbook is executed.

Data preparation, analytics, and visualization

The key to Datameer’s “aesthetic” is the use of successive columns in a sheet, and successive sheets in a workbook, to yield a gradual (and self-documenting) evolution of source data sets into the analyzed whole.

This approach combines the tasks of data preparation and analysis in a single environment, by providing a library of more than 270 spreadsheet formula functions that serve each purpose (and sometimes both purposes). An example of the Datameer workbook is shown below. 

[Figure: Datameer workbook (image: Datameer)]

Formula functions run the gamut from mundane standbys like functions for manipulating text, formatting numbers, and doing simple arithmetic to functions that group data in specific ways for aggregational analysis to specialized functions for parsing file names, HTML content, and XML- and JSON-formatted data. You’ll even find functions that can mine text for organization names and parts of speech and provide sentiment analysis on individual words.

Formulas can be entered in a formula bar or built with the assistance of the Formula Builder dialog (below), which allows pointing-and-clicking on individual columns in particular sheets to supply their data as formula parameter values.

[Figure: Datameer Formula Builder dialog (image: Datameer)]

Each sheet in a workbook serves as a view on the data set because workbooks don’t alter the original data. Sheets have a cascading relationship where, for example, data from sheet A is used and transformed by sheet B, which is further used and transformed by sheet C. In this way, every transformation and analysis step is made transparent and easily discoverable.

What’s more, each sheet in a workbook can have its data profiled at any time by switching to that sheet’s Flipside view, which provides histogram visualizations for each column. Along with the histogram, Flipside shows the distribution of values the column contains, the data type, the total number of values and distinct values in the column, and the minimum, maximum, and average value for all data within it.
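
Again as a pandas analogy rather than Datameer code, the sketch below mimics the cascading-sheet idea (each derived DataFrame plays the role of a sheet) and uses describe() and nunique() as a rough stand-in for the per-column profile the Flipside view renders graphically; the columns and values are made up.

```python
# A pandas analogy, not Datameer itself: cascading "sheets" plus a simple
# per-column profile in place of the Flipside view.
import pandas as pd

sheet_a = pd.DataFrame({
    "region": ["east", "west", "east", "south"],
    "units":  [10, 4, 7, 12],
    "price":  [2.5, 3.0, 2.5, 1.8],
})

# Sheet B uses and transforms sheet A.
sheet_b = sheet_a.assign(revenue=sheet_a["units"] * sheet_a["price"])

# Sheet C further transforms sheet B.
sheet_c = sheet_b.groupby("region", as_index=False)["revenue"].sum()

# Rough stand-in for the Flipside profile: min/max/mean and distinct counts.
print(sheet_b["revenue"].describe())
print(sheet_b["region"].nunique(), "distinct regions")
print(sheet_c)
```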

Moving between different workbooks, the file browser, and various business infographics is made easy by Datameer’s context tabs, which allow the user to shift fluidly between the different views and return to each, in context, whenever that may be called for.

Context tabs, when combined with Datameer’s full data lifecycle functionality, perfectly facilitate working on data from multiple angles at once, rather than being forced into a linear, assembly-line approach of doing preparation, analysis, and visualization in a particular order. Many people prefer to work on each phase in a piecemeal fashion, bringing it all together at the end. Datameer fully supports that scenario.

Built using the latest HTML5 technologies, Datameer’s Infographic Designer (below) supports the creation and viewing of informative visualizations from any browser on a multitude of devices. The pixel-perfect design interface allows users to combine chart widgets with text annotations, images, videos, and other elements. Infographics are composed from more than 20 fully customizable drag-and-drop chart widgets, each based on the popular D3 visualization standard. The Infographic Designer can create high-end dashboards, operational reports, and beautiful, customized, special-purpose infographics.

[Figure: Datameer Infographic Designer (image: Datameer)]

Smart Analytics

Datameer’s Smart Analytics technology provides four major algorithms that make it even easier to find the signal in the noise of big data: clustering, decision trees, column dependencies, and recommendations.

Models based on these algorithms manifest the same way other analytical assets within Datameer do: as sheets in a workbook. The sheets show all model data, along with predicted values, and the Flipside view will render a graphical representation of the model and its content.

By incorporating machine learning functionality in-situ, within the workbook user experience, Datameer provides machine learning capabilities without forcing users to have vastly specialized skills or endure abrupt user interface context switches.

This integration is further extended through the use of an optional Predictive Model Markup Language (PMML) plugin, provided by our partner, Zementis. The plugin allows scoring against machine learning models built in other tools (and published in PMML format) by exposing them within Datameer as additional spreadsheet functions.

A patent-pending execution framework

Datameer simplifies selection of execution frameworks through its patent-pending Smart Execution engine, which picks the best framework for users along each step in the analytics workflow. It takes full advantage of Apache Tez, Apache Spark, and Datameer’s own single-node, in-memory engine, freeing users from having to evaluate the best engine for any given analytics job or task.

Smart Execution provides a future-proof approach to big data analytics. By decoupling the design experience from processing on a particular execution engine, Datameer permits workbooks developed today to be functional against new execution frameworks tomorrow, as they emerge and take their place in the Smart Execution platform.

While open source big data technologies hold the keys to answering new business questions, they weren’t designed with business users in mind. Combining a spreadsheet workbook and formula interface with a cost-based query optimizer that picks the right engine for a particular set of tasks, Datameer turns Hadoop, Spark, and company into user-friendly BI tools for the business at large.

Datameer makes the big data aspiration a reality by harnessing the power of these platforms and working with them in their native capacities, not merely treating them as relational databases. At the same time, Datameer embeds these open source technologies into a business-user-oriented application, premised on familiar spreadsheet constructs, for working with data across its lifecycle and extracting relevant information from it.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Source: InfoWorld Big Data

Faction® Earns Patent For Its Hybrid And Multi-Cloud Solutions

Faction® has announced that the U.S. Patent and Trademark Office (USPTO) has granted Faction a patent for its pioneering work on hybrid and multi-cloud networking.

“We’re proud that Faction’s technology essentially created the hybrid cloud category more than five years ago, and we look forward to exercising control of this technology now that we have received our initial patent issuance,” said Luke Norris, Faction’s CEO and founder.

Faction’s hybrid and multi-cloud technology, now patented under USPTO Patent #9,571,301, powers the Faction Cloud Bloc and Faction Internetwork eXchange (FIX) product sets. It allows service providers and enterprises to seamlessly connect the best features of various private and public clouds and design a robust cloud architecture that still operates as a single unified cloud.

Faction’s approach to cloud networking greatly reduces the cost and complexity of composing true hybrid cloud and multi-cloud solutions for customers. The technology is also broadly used to provide access from datacenters into private and public clouds, which Faction uses to connect customers in 22 datacenters across the United States directly into the company’s offering.

With this technology, cloud architects are now freed from rigid networking constructs and burdensome administrative tasks that have perpetually frustrated infrastructure and operations teams and slowed important business initiatives due to network constraints. Faction’s customers benefit from Faction’s composable cloud technology, allowing the combination of datacenter, private, and public cloud resources without sacrificing security or performance, and without incurring substantial migration or interconnection costs typical to traditional solutions.

This initial patent demonstrates the company’s commitment to advancing cloud infrastructure technology and simplifying how enterprise IT cloud infrastructures are designed, built, and managed. Specifically, it details how physical resources that may be hosted within a datacenter or colocation site can connect to one or more cloud providers, creating a seamless, single pool of resources. Additionally, once an enterprise is connected to the Faction composable cloud fabric, it gains the ability to easily mix in other third-party cloud services, creating a true multi-cloud solution. Further patent applications are in process, and last week a second Notice of Allowance was received on a Faction patent application to further expand the scope of the issued patent claims, with others expected to follow.

The Faction technology also provides a unified data fabric for composing true hybrid and multi-cloud solutions. Enterprise IT can leverage public cloud resources on-demand while retaining the control and security of their private cloud infrastructure. The network technology enabling composable clouds complements Faction’s private cloud offering, which fully decouples compute, storage capacity, storage performance, and network capacity, enabling enterprises to compose their ideal private cloud. The Faction Internetwork eXchange makes these composable resources available not only within the private cloud environments, but the data center and colocation environments and the public clouds as well.

The announcement of the USPTO issuing a first patent follows last month’s announcement that Faction raised $11 million in capital, which will be used to expand the company and help meet strong customer demand. In 2016, Faction saw 44% year-over-year growth, and it projects similar growth in 2017.

Source: CloudStrategyMag