7 big data tools to ditch in 2017

We’ve been on this big data adventure for a while, and not everything is shiny and new anymore. In fact, some technologies may be holding you back. Remember, this is the fastest-moving area of enterprise tech — so much so that some software acts as a placeholder until better bits arrive.

Those upgrades — or replacements — can make the difference between a successful big data initiative and one you’ll be living down for the next few years. Here are some elements of the stack you should start thinking about replacing:

1. MapReduce. MapReduce is slow. It’s rarely the best way to attack a problem. There are better execution models to choose from; the most common is the general DAG (directed acyclic graph) model, of which MapReduce is effectively a restricted special case. If you’ve written a bunch of custom MapReduce jobs, the performance difference compared to Spark is worth the cost and trouble of switching (see the Spark sketch after this list).

2. Storm. I’m not saying Spark will eat the streaming world, although it might, but if you need lower latency than Spark offers, technologies like Apex and Flink are better alternatives than Storm. Besides, you should probably evaluate your latency tolerance and ask whether the bugs lurking in lower-level, more complicated code are worth saving a few milliseconds. Storm also doesn’t have the support it could have, with Hortonworks as its only real backer — and with Hortonworks facing increasing market pressure, Storm is unlikely to get more attention.

3. Pig. Pig kind of blows. You can do anything it does with Spark or other technologies. At first Pig seems like a nice “PL/SQL for big data,” but you quickly find out it’s a little bizarre.

4. Java. No, not the JVM, but the language. The syntax is clunky for big data jobs, and newer constructs like lambdas have been bolted on in a somewhat awkward manner. The big data world has largely moved to Scala and Python (the latter when you can afford the performance hit and either need Python libraries or are infested with Python developers). Of course, you can use R for stats, until you rewrite it in Python because R doesn’t have all the fun scale features.

5. Tez. This is another Hortonworks pet project. It’s a DAG implementation, but unlike Spark, Tez is described by one of its developers as like writing in “assembly language.” At the moment, with a Hortonworks distribution, you’ll end up using Tez behind Hive and other tools — but you can already use Spark as the engine in other distributions. Tez has always been kind of buggy anyhow. Again, this is one vendor’s project and doesn’t have the industry or community support of other technologies. It doesn’t have any runaway advantages over other solutions. This is an engine I’d look to consolidate out.

6. Oozie. I’ve long hated on Oozie. It isn’t much of a workflow engine or much of a scheduler — yet it’s both and neither at the same time! It is, however, a collection of bugs for a piece of software that shouldn’t be that hard to write. Between StreamSets, the various DAG implementations, and other tools, you should have a way to do most of what Oozie does.

7. Flume. Between StreamSets, Kafka, and other solutions, you probably have an alternative to Flume. The most recent release, dated May 20, 2015, is looking a bit rusty; track the project’s year-over-year activity and you’ll see that hearts and minds have moved elsewhere. It’s probably time to move on.
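To make the MapReduce-versus-Spark point from item 1 concrete, here is a minimal sketch of the canonical MapReduce word count expressed as a Spark job in Python. The input and output paths and the application name are illustrative assumptions, not anything from this article.

```python
# Minimal sketch: the classic MapReduce word count expressed as a Spark job.
# Paths and app name are hypothetical; assumes a working Spark installation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()

lines = spark.sparkContext.textFile("hdfs:///data/input.txt")   # hypothetical input path

counts = (lines.flatMap(lambda line: line.split())              # "map" phase: emit words
               .map(lambda word: (word, 1))                     # key each word with a count of 1
               .reduceByKey(lambda a, b: a + b))                 # "reduce" phase: sum the counts

counts.saveAsTextFile("hdfs:///data/word_counts")                # hypothetical output path
spark.stop()
```

The same pipeline written as hand-rolled MapReduce would need separate mapper and reducer classes plus job-configuration boilerplate, which is the switching cost item 1 argues is worth paying once.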

Maybe by 2018 …

What’s left? Some technology is showing its age, but complete viable alternatives have not arrived yet. Think ahead about replacing these:

1. Hive. This is overly snarky, but Hive is just about the least performant distributed database on the planet. If we as an industry hadn’t spent roughly 40 years deciding RDBMSes were the greatest thing since sliced bread, would we really have created this monster?

2. HDFS. Writing a system-level service in Java is not the greatest of ideas, and Java’s memory management makes pushing massive numbers of bytes around a bit slow. The way the HDFS NameNode works is not ideal for anything and constitutes a bottleneck. Various vendors have workarounds to make this better, but honestly, nicer things are available. There are other distributed filesystems: MapR-FS is a pretty well-designed one, and there are also GlusterFS and a slew of others.

Your gripes here

With an eye to the future, it’s time to cull the herd of technologies that looked promising but have grown either obsolete or rusty. This is my list. What else should I add?

Source: InfoWorld Big Data

Review: TensorFlow shines a light on deep learning

What makes Google Google? Arguably it is machine intelligence, along with a vast sea of data to apply it to. While you may never have as much data to process as Google does, you can use the very same machine learning and neural network library as Google. That library, TensorFlow, was developed by the Google Brain team over the past several years and released to open source in November 2015.

TensorFlow does computation using data flow graphs. Google uses TensorFlow internally for many of its products, both in its datacenters and on mobile devices. For example, the Translate, Maps, and Google apps all use TensorFlow-based neural networks running on your smartphone. And TensorFlow underpins the applied machine learning APIs for Google Cloud Natural Language, Speech, Translate, and Vision.
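As a rough illustration of the data flow graph model (not code from the review itself), here is a minimal TensorFlow 1.x-style sketch: operations are added to a graph as nodes, tensors flow along the edges, and nothing is computed until the graph is executed in a session. The placeholder names are arbitrary.

```python
# Minimal sketch of TensorFlow's data flow graph model (TensorFlow 1.x API).
# Building the graph and running it are separate steps.
import tensorflow as tf

a = tf.placeholder(tf.float32, name="a")      # graph input
b = tf.placeholder(tf.float32, name="b")      # graph input
c = tf.multiply(a, b, name="c")               # a node in the graph; not yet computed

with tf.Session() as sess:
    result = sess.run(c, feed_dict={a: 3.0, b: 4.0})   # execute the graph
    print(result)                                        # 12.0
```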

Source: InfoWorld Big Data

OneNeck® IT Solutions Moves Luck Stone Corporation From Public To Private Cloud

OneNeck® IT Solutions has announced that Luck Stone Corporation has contracted with OneNeck to convert its IT infrastructure. OneNeck is moving Luck Stone’s Microsoft Dynamics AX environment from a public cloud onto OneNeck’s hosted private cloud, ReliaCloud®. OneNeck will continue providing Luck Stone with enterprise application, network, security, and infrastructure management, along with database and operating system administration.

“During the last five years, OneNeck has proven they have the ability, expertise and experience to manage our Microsoft Dynamics AX environment,” said Donald Jones, VP of IT at Luck Stone. “We began noticing performance issues within our public cloud environment and realized we needed a more scalable, agile and secure solution. In talking with OneNeck about their Infrastructure as a Service solution, AX on ReliaCloud, we realized it could enhance our performance and security. At the same time, moving to ReliaCloud would allow us to take advantage of a more cost-effective solution. Adding ReliaCloud to the portfolio and moving to a private cloud environment just made sense.”

Luck Stone is one of the nation’s largest family-owned and operated producers of crushed stone, sand and gravel. Based in Richmond, Va., Luck Stone prides itself on being a dependable, responsive partner who continually strives to innovate and deliver consistent, quality material customers can count on for project and business success.

By moving their Microsoft Dynamics AX environment onto ReliaCloud (an enterprise-level hosted, private cloud that delivers the power and flexibility of a public cloud solution), Luck Stone will:

  • Avoid unscheduled downtime and unplanned maintenance.
  • Optimize their IT cost structure between capital and operating costs.
  • Control their Microsoft Dynamics AX license base, while still maintaining full-platform certification and supportability.
  • Leverage OneNeck’s depth of Microsoft application expertise.
  • Have access to flexible resource pools to deploy (and re-deploy) as their IT environment changes.
  • Continually meet security and compliance requirements.
  • Securely connect with a variety of access points.
  • Achieve advanced disaster recovery capabilities using resources in multiple data centers owned and operated by OneNeck.

“We’re proud to continue our partnership with Luck Stone and to help them move to an environment that enhances the performance of their IT and delivers greater security,” says Terry Swanson, Senior VP of Sales and Marketing at OneNeck. “We appreciate their business and attribute it to the dedication and experience of our employees. Having Luck Stone expand their contract to include ReliaCloud is a huge testament to the commitment of our entire team.”

Source: CloudStrategyMag

OffsiteDataSync Ranked Among Top 100 Cloud Services Providers

OffsiteDataSync ranks among the world’s Top 100 cloud services providers (CSPs), according to Penton’s sixth-annual Talkin’ Cloud 100 report.

Based on data from Talkin’ Cloud’s online survey, conducted from June through August 2016, the Talkin’ Cloud 100 list recognizes the top cloud services providers (CSPs), weighing annual cloud services revenue growth and input from Penton Technology’s channel editors.

“OffsiteDataSync is honored to be included among the 2016 TC100,” said Matthew Chesterton, CEO, OffsiteDataSync. “Our steady rise in the rankings to 22nd is a testament to our depth of experience, commitment to continuous improvement, and strong partner relationships.”

“On behalf of Penton and Talkin’ Cloud, I would like to congratulate OffsiteDataSync for its recognition as a Talkin’ Cloud 100 honoree,” said Nicole Henderson, editor in chief, Talkin’ Cloud. “Cloud services providers on the Talkin’ Cloud 100 set themselves apart through innovative cloud offerings and new support models, demonstrating a deep understanding of their customers’ needs and future cloud opportunities.”

Source: CloudStrategyMag

Salesforce will buy Krux to expand behavioral tracking capabilities

Salesforce.com has agreed to buy user data management platform Krux Digital, potentially allowing businesses to process even more data in their CRM systems.

Krux describes its business as “capturing, unifying, and activating data signatures across every device and every channel, in real time.”

Essentially, it performs the tracking underlying behavioral advertising, handling 200 billion “data collection events” on three billion browsers and devices (desktop, mobile, tablet and set-top) each month.

With that staggering volume of data, “Krux will extend the Salesforce Marketing Cloud’s audience segmentation and targeting capabilities to power consumer marketing with even more precision, at scale,” Krux CEO and co-founder Tom Chavez wrote on the company blog.

The acquisition will also allow joint customers of Salesforce and Krux to feed “billions of new signals” to Salesforce Einstein, a suite of AI-based tools for building predictive models, Chavez said.

Unveiled two weeks ago, Salesforce Einstein will include functions such as predictive lead scoring and recommended case classification. Some functions will be available for free, while others will be charged for based on data volume and user numbers.

Krux is part of the Salesforce ecosystem, but also works with other vendors including Oracle, Google’s DoubleClick, Criteo and a host of other advertising networks. According to Chavez, it won’t be cutting those ties following the acquisition. “Openness remains a guiding principle,” he said. “We expect to continue supporting our thriving partner ecosystem and integrating with a wide variety of platforms.”

Businesses already using Krux to track their customers include media companies BBC, HBO, NBCUniversal, and DailyMotion; publishers The Guardian and Financial Times; and food and drink companies AB InBev, Mondelez International, Kellogg’s, and Keurig.

Salesforce will pay around $340 million in cash and a similar amount in shares for Krux, according to a filing it made with the SEC Tuesday. It expects to close the deal by the end of January.

Later Tuesday, Salesforce will open its Dreamforce customer and partner conference in San Francisco. Krux is one of the exhibitors.

Source: InfoWorld Big Data

Google Cloud Machine Learning hits public beta, with additions

Google today unveiled machine learning-related additions to its cloud platform, both to enrich its own cloud-based offerings and to give businesses expanded toolsets for developing their own machine learning-powered products.

The most prominent offering was the public beta of Google Cloud Machine Learning, a platform for building and training machine learning models with the TensorFlow machine learning framework and data stored in the BigQuery and Cloud Storage back ends.

Google says its system simplifies the whole process of creating and deploying machine learning back ends for apps. Some of that comes simply from making models faster to train: Google claims Cloud Machine Learning’s distributed training “can train models on terabytes of data within hours, instead of waiting for days.”

Much of it, however, is about Cloud Machine Learning’s APIs reducing the amount of programming required to build useful things. In a live demo, Google built and demonstrated a five-layer neural net for stock market analysis with just a few lines of code.
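The article doesn’t reproduce Google’s demo code, but a network of roughly that shape could be declared in a handful of lines with the high-level TF Learn API that shipped with TensorFlow at the time. The sketch below is an illustration under that assumption; the feature column, layer sizes, and training data are hypothetical.

```python
# Hypothetical sketch: a small five-hidden-layer classifier defined with
# TensorFlow's high-level tf.contrib.learn API (circa 2016).
import tensorflow as tf
from tensorflow.contrib import learn, layers

feature_columns = [layers.real_valued_column("prices", dimension=10)]  # hypothetical feature

classifier = learn.DNNClassifier(
    hidden_units=[64, 32, 16, 8, 4],     # five hidden layers
    feature_columns=feature_columns,
    n_classes=2)                         # e.g., market up vs. down

# classifier.fit(x=train_x, y=train_y, steps=10000)   # train_x/train_y are hypothetical data
```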

Another announced feature, HyperTune, removes another source of drudgery often associated with building machine learning models. Models typically need their hyperparameters tweaked to yield the best results; Google claims HyperTune “automatically improves predictive accuracy” by automating that step.
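Google didn’t publish HyperTune details in this announcement. For illustration only, the snippet below sketches roughly what the hyperparameter tuning section of a Cloud Machine Learning training job request can look like, based on the later-documented Cloud ML Engine API; field names and values here are assumptions for the sketch.

```python
# Illustrative only: a hyperparameter tuning spec in the shape used by the
# (later-documented) Cloud ML Engine training API. Values are made up.
hyperparameter_spec = {
    "goal": "MAXIMIZE",                     # optimize for the largest metric value
    "hyperparameterMetricTag": "accuracy",  # metric reported by the training code
    "maxTrials": 30,                        # total training runs HyperTune may launch
    "maxParallelTrials": 3,
    "params": [
        {
            "parameterName": "learning_rate",
            "type": "DOUBLE",
            "minValue": 0.0001,
            "maxValue": 0.1,
            "scaleType": "UNIT_LOG_SCALE",
        },
    ],
}
# This dict would be embedded in the trainingInput of the job submission.
```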

Google Cloud Machine Learning was previously available only as an alpha-level tech preview, but even then InfoWorld’s Martin Heller was impressed with its pre-trained APIs for artificial vision, speech, natural language, and language translation.

Many of the machine learning tools Google now offers for end users, such as TensorFlow, arose from Google’s internal work to bolster its projects. The revamped version of Google’s office applications, G Suite, is one of the latest to be dressed up with machine-learning powered features. Most of these additions are for automating common busywork, such as finding a free time slot on a calendar to hold a meeting.

Google’s machine learning offerings pit it against several other big-league cloud vendors offering their own variations on the same themes, from IBM’s Bluemix and Watson services to Microsoft’s Azure Machine Learning. All of them, along with Amazon, Facebook, and others, recently announced the Partnership on AI effort to “study and formulate best practices on AI technologies” — although it seems more like a general clearinghouse for public awareness about machine learning than a way for those nominal rivals to collaborate on shared projects.

Source: InfoWorld Big Data

Clarient Global Adopts IBM Cloud, VMware

IBM has announced that Clarient Global LLC (“Clarient”), a joint venture established to transform client data and document management in the financial services industry, has selected VMware Cloud Foundation on IBM Cloud to continue to enhance its existing SoftLayer private cloud implementation for its Clarient Entity Hub platform.

Clarient Entity Hub allows users to securely upload, maintain and share information about legal entities through a secure, easy-to-use interface. The platform automates the validation of client data and documentation, providing users with greater transparency and control, as well as improved risk management capabilities.

With this new implementation, Clarient has enhanced its Clarient Entity Hub application with VMware Cloud Foundation on IBM Cloud running on bare metal servers. Through this private cloud, Clarient will continue to improve its security, scale and flexibility while achieving greater server density due to the ability to control and manage the hypervisors.

In addition to this private cloud solution, which was streamlined due to the strategic partnership between IBM and VMware announced earlier this year, Clarient has integrated with IBM’s Business Process Management (BPM) to provide clients with even greater data and process visibility and management.

“Clarient creates efficiency in the client entity data and document management space by providing transparency, control and standardization,” said Natalia Kory, CTO, Clarient. “As the Clarient Entity Hub community grows, we continually assess ways to further enhance the solution workflow in order to improve the overall client experience and increase processing efficiencies. IBM’s BPM solution, in conjunction with the VMware Cloud Foundation on IBM Cloud solution, will help Clarient to achieve these requirements while reducing the cost of platform maintenance.”

In addition to flexibility, the IBM Cloud provides Clarient with a fully redundant, low latency network that allows for near real-time communication between datacenter locations, making it easier to keep replication sites in sync at no charge for network usage.

“The innovative solution that Clarient provides financial institutions around the world is increasingly critical, as the need for accurate and compliant client entity data continues to grow,” said Bill Karpovich, general manager, IBM Cloud Platform. “By leveraging a partnership with IBM and VMware, Clarient is able to extend its global and controllable infrastructure footprint.”

“Our cloud partnership with IBM continues to grow and evolve as we look to enable clients, such as Clarient, to solve the key industry challenges,” said Geoff Waters, vice president, global service provider channel, VMware. “The Clarient Entity Hub is a new way to address the unique requirements within the financial services sector and provide fast, automated and accurate client entity data. We look forward to joint success with IBM and enabling clients to continue to adopt the cloud while preserving their existing investments.”

Source: CloudStrategyMag

OneNeck® IT Solutions Releases OneNeck Connect

OneNeck IT Solutions has announced the general availability of OneNeck Connect, the company’s newest service. With OneNeck Connect, businesses get access to a 1 Gbps broadband pipe into the company’s Tier III data center in Eden Prairie.

“With OneNeck Connect, businesses throughout the metro area gain fast and reliable access into our state-of-the-art facility,” says Clint Harder, CTO and senior vice president at OneNeck. “This new offering provides a quick and easy way to connect into a comprehensive array of hybrid cloud and managed services provided by OneNeck.”

Currently, OneNeck Connect is available to businesses throughout the Twin Cities metro area that are on-net with Zayo or Comcast. Over the coming months, OneNeck expects to broaden availability with other providers in the metro area. The company already offers OneNeck Connect in Denver and plans to introduce the high-speed data solution into its other Tier III data centers in Arizona, Iowa, and Wisconsin.

Source: CloudStrategyMag

Zetta Launches Zetta Disaster Recovery

Zetta has announced Zetta Disaster Recovery, a new cloud-first disaster recovery (DR) solution that offers sub-five minute failover with the push of a button. The new solution enables small and mid-sized enterprise (SME) customers and partners to continue accessing business-critical applications with minimal disruption during a downtime event. The cost-effective service offers high availability and reliability for even the most demanding recovery time objectives (RTOs).

“From human-led malicious attacks to unexpected system downtime to natural disasters, unforeseen events can be costly, even devastating, for today’s data-driven business,” said Mike Grossman, CEO, Zetta. “With the new Zetta Disaster Recovery, applications and databases can failover in less than five minutes, so businesses and their employees can continue working without interruption. This delivers true peace-of-mind without the cost and complexity that has been traditionally associated with disaster recovery solutions.”

“At EMA we have estimated that the cost of downtime can vary from as much as $90,000 to $6 million an hour, depending on the industry and its application environment. But, no matter how you slice it, downtime is a cost most businesses simply can’t endure,” said Jim Miller, senior analyst, Enterprise Management Associates. “Disaster Recovery in the cloud can be an efficient and cost-effective way to avoid the potentially high cost of downtime. With Zetta Disaster Recovery, Zetta delivers cloud-based business continuance with a complete service that features both simplicity and affordability.”

Easy-to-Achieve Disaster Readiness and Recovery

Zetta Disaster Recovery is an end-to-end service that provides complete deployment-to-failback coverage. It includes upfront network, firewall, VPN and connectivity configuration and automated DR testing, which can be easily customized to accommodate an organization’s unique network environment, ensuring that, in the event of a disaster of any kind, a company can be fully operational in the cloud. 

The new DR service also supports incremental failback, allowing companies to continue to run their systems in the cloud, while Zetta Disaster Recovery manages sequential failback in the background. As a result, final switchover from cloud to local operations can happen painlessly – in minutes. 

Enterprise-Grade DR Solution at an Affordable Price

Zetta Disaster Recovery is a cost-effective option for companies that cannot afford to invest in a secondary DR site but who require rapid failover with truly dynamic scalability and rapid throughput rates. Zetta Disaster Recovery bundles network and VPN configuration, and DR testing and planning, eliminating the need for companies to engage outside professional service firms to perform these functions.

Optimized for Complex IT Environments

Zetta Disaster Recovery has been architected with the needs of larger enterprises in mind: to rapidly protect very large data sets and complex IT environments using fewer system resources and in less time than alternative options. Key features of Zetta Disaster Recovery include:

  • Comprehensive support for end-to-end DR including backup, failover and failback
  • Failback flexibility with support for incremental failbacks
  • High-performing IO, CPU and RAM resources to support workload demands of SME organizations
  • Pre-provisioned virtual VPN and firewall to ensure that an organization’s workers have on-demand access to applications running in the Zetta Cloud
  • Power-on DR testing to validate that systems and applications will be operational in the event of a disaster

All protected data is encrypted via SSL in flight to the Zetta Cloud and via AES at rest within it. For additional security, options for secure VPN connectivity to the recovered environment in the Zetta Cloud include Point to Site, Site to Site, and IP takeover.


Source: CloudStrategyMag