IDG Contributor Network: Getting off the data treadmill

Most companies start their data journey the same way: with Excel. People who are deeply familiar with the business start collecting some basic data, slicing and dicing it, and trying to get a handle on what’s happening.

The next place they go, especially now, with the advent of SaaS tools that aid in everything from resource planning to sales tracking to email marketing, is into the analytic tools that come packaged with their SaaS tools.

These tools provide basic analytic functions, and can give a window into what’s happening in at least one slice of the business. But drawing connections between those slices (joining finance data with marketing data, or sales with customer service) is where the real value lies. And that’s exactly where these department-specific tools fall down.

So when you talk to people in that second phase, understandably, they’re looking forward to the day when all of their data automatically flows into one place. No more manual, laborious hours spent combining data. Just one place to look and see exactly what’s happening in the business.

Except…

Once you give people a taste of the data and they can see what’s happening, naturally, their very next question is, “Well, why did that happen?”

How things usually work

And that’s where things break down. For most of the history of business intelligence, the way you answered “why” questions was to extract the relevant data from that beautiful centralized tool and send it off to an analyst. They would load the data back into a workbook, start from scratch on a new report, and you’d wait.

By the time you got your answer, it was usually too late to use that knowledge in making your decision.

The whole thing is kind of silly, though — you’d successfully gotten rid of a manual, laborious process and replaced it with one that is, well, manual and laborious. You thought you were moving forward, but it turns out you were just on a treadmill.

To sketch it out, here’s what that looks like:

[Diagram: Daniel Mintz]

Another path

Recently, though, more and more businesses are realizing that there’s another way: With the right tools, you can put the means to answer “why” questions in the hands of the people who can (and will) take action based on those answers.

In the old world, you’d find out in February that January leads were down, and wait until March for the analysis that reveals that — d’oh! — the webform wasn’t working on mobile. In the new world, you can get an automated alert about the drop-off in the first week of the year. You can drill into the relevant data immediately by device type, realize that the drop-off only affects mobile, surface the bug, and get it fixed that afternoon.

That’s the real value that most businesses aren’t realizing from their data. It’s much less about incorporating the latest machine learning algorithm that delivers a 3% improvement in behavioral prediction, and more about the seemingly simple task of putting the right information in front of the right person at the right time.

The task isn’t simple (especially considering the mountains of data most companies are sitting on). But the good news is that it is achievable, and it doesn’t take a room full of Ph.D.s or millions of dollars in specialized software.

What it does take is focus, and a commitment to being data-driven.

Luckily, it’s worth it. The payoff of facilitating this kind of exploration is enormous. It can be the difference between making the right decision and the wrong one — hundreds of times a month — all across your company.

[Diagram: Daniel Mintz]

So if you find yourself stuck on the treadmill, try stepping off. I think you’ll like where the path takes you.

Source: InfoWorld Big Data

InfoWorld's 2017 Technology of the Year Award winners

Imagine if the files, processes, and events in your entire network of Windows, MacOS, and Linux endpoints were recorded in a database in real time. Finding malicious processes, software vulnerabilities, and other evil artifacts would be as easy as asking the database. That’s the power of OSquery, a Facebook open source project that makes sifting through system and process information to uncover security issues as simple as writing a SQL query.

Facebook ported OSquery to Windows in 2016, finally letting administrators use the powerful open source endpoint security tool on all three major platforms. On each Linux, MacOS, and Windows system, OSquery creates various tables containing operating system information such as running processes, loaded kernel modules, open network connections, browser plugins, hardware events, and file hashes. When administrators need answers, they can ask the infrastructure.

The query language is SQL-like. For example, the following query will return malicious processes kicked off by malware that has deleted itself from disk:

SELECT name, path, pid FROM processes WHERE on_disk = 0;

This ability has been available to Linux and MacOS administrators since 2014; Windows administrators are only now coming to the table.

Porting OSquery from Linux to Windows was no easy feat. Some creative engineering was needed to overcome certain technical challenges, such as reimplementing the processes table so that existing Windows Management Instrumentation (WMI) functionality could be used to retrieve the list of running processes. (Trail of Bits, a security consultancy that worked on the project, shares the details in its blog.)  

Administrators don’t need to rely on complicated manual steps to perform incident response, diagnose systems operations problems, and handle security maintenance for Windows systems. With OSquery, it’s all in the database.
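For programmatic access, the same kind of query can be issued from a script. Here is a minimal sketch using the open source osquery Python bindings (the osquery package on PyPI); treat the exact API as an assumption to verify against the project’s documentation rather than something described in this article.

# Hypothetical sketch: running the query above through the osquery Python bindings.
# Assumes the "osquery" PyPI package is installed and osquery is available locally.
import osquery

if __name__ == "__main__":
    # Spawn a standalone osquery instance over an ephemeral extension socket.
    instance = osquery.SpawnInstance()
    instance.open()

    # Same query as above: processes whose binaries have been deleted from disk.
    result = instance.client.query(
        "SELECT name, path, pid FROM processes WHERE on_disk = 0;"
    )

    # result.response is a list of row dictionaries.
    for row in result.response:
        print(row["name"], row["path"], row["pid"])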

— Fahmida Y. Rashid

Source: InfoWorld Big Data

Tap the power of graph databases with IBM Graph

Natural relationships between data contain a gold mine of insights for business users. Unfortunately, traditional databases have long stored data in ways that break these relationships, hiding what could be valuable insight. Although databases that focus on the relational aspect of data analytics abound, few are as effective at revealing those hidden insights as a graph database.

A graph database is designed from the ground up to help the user understand and extrapolate nuanced insight from large, complex networks of interrelated data. Highly visual graph databases represent discrete data points as “vertices” or “nodes.” The relationships between these vertices are depicted as connections called “edges.” Metadata, or “properties” of vertices and edges, are also stored within the graph database to provide more in-depth knowledge of each object. Traversal allows users to move between all the data points and find the specific insights the user seeks.

To better explain how graph databases work, I will use IBM Graph, a technology that I helped to build and am excited to teach new users about. Let’s dive in.

Intro to IBM Graph

Based on the Apache TinkerPop framework for building high-performance graph applications, IBM Graph is a fully managed graph database service for building and running powerful applications. The service provides users with simplified HTTP APIs, an Apache TinkerPop v3 compatible API, and the full Apache TinkerPop v3 query language. The goal of this type of database is to make it easier to discover and explore the relationships in a property graph with index-free adjacency using nodes, edges, and properties. In other words, every element in the graph is directly connected to adjoining elements, eliminating the need for index lookups to traverse a graph.

Through the graph-based NoSQL store it provides, IBM Graph creates rich representations of data in an easily digestible manner. If you can whiteboard it, you can graph it. All team members, from the developer to the business analyst, can contribute to the process.

The flexibility and ease of use offered by a graph database such as IBM Graph mean that analyzing complex relationships is no longer a daunting task. A graph database is the right tool for a time when data is generated at exponentially high rates amid new applications and services. A graph database can be leveraged to produce results for recommendations, social networks, efficient routes between locations or items, fraud detection, and more. It efficiently allows users to do the following:

  • Analyze how things are interconnected
  • Analyze data to follow the relationships between people, products, and so on
  • Process large amounts of raw data and generate results into a graph
  • Work with data that involves complex relationships and dynamic schema
  • Address constantly changing business requirements during iterative development cycles

How a graph database works

Schema with indexes. Graph databases can either leverage a schema or not. IBM Graph works with a schema to create indexes that are used for querying data. The schema defines the data types for the properties that will be employed and allows for the creation of indexes on those properties. In IBM Graph, indexes are required for the first properties accessed in a query. The schema is best defined up front (although it can be appended to later) to ensure that the vertices and edges introduced along the way work as intended.

A schema should define properties, labels, and indexes for a graph. For instance, if analyzing Twitter data, the data could be modeled as person, hashtag, and tweet vertices, with the connections between them represented as mentions, hashes, tweets, and favorites edges. Indexes are then created on the properties used to query the graph.

[Image: graph database diagram (IBM)]
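To make that concrete, here is a rough sketch of what such a schema definition could look like as a Python dictionary, following the Twitter example above. The field names (propertyKeys, vertexLabels, edgeLabels, vertexIndexes) and data types are assumptions modeled on the IBM Graph documentation, not something spelled out in this article.

# Hypothetical schema sketch for the Twitter example; field names and types are
# assumptions to check against the IBM Graph documentation.
twitter_schema = {
    "propertyKeys": [
        {"name": "name", "dataType": "String", "cardinality": "SINGLE"},
        {"name": "text", "dataType": "String", "cardinality": "SINGLE"},
        {"name": "tag", "dataType": "String", "cardinality": "SINGLE"},
    ],
    "vertexLabels": [{"name": "person"}, {"name": "tweet"}, {"name": "hashtag"}],
    "edgeLabels": [
        {"name": "tweets"},
        {"name": "mentions"},
        {"name": "hashes"},
        {"name": "favorites"},
    ],
    # Indexes are required for the first properties accessed in a query.
    "vertexIndexes": [
        {"name": "byName", "propertyKeys": ["name"], "composite": True, "unique": False}
    ],
}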

Loading data. Although a bulk upload endpoint is available, the Gremlin endpoint is the recommended method for uploading data to the service. This is because you can upload as much data as you want via the Gremlin endpoint. Moreover, the service automatically assigns IDs to graph elements when you use the bulk upload endpoint, preventing connections from being made between nodes and edges from separate bulk uploads. The response to your upload should let you know if there was an error in the Gremlin script and return the last expression in your script. A successful input should result in something like this:

[Screenshot: successful Gremlin upload response (IBM)]
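As an illustration, here is a rough Python sketch of pushing a few elements through the Gremlin endpoint (the article’s own examples use the HTTP API, typically via curl). The /gremlin path, the placeholder URL, and the use of basic auth with the service credentials are assumptions for the sake of the example.

# Hypothetical sketch: loading two vertices and an edge via the Gremlin endpoint.
# The endpoint path, URL, and auth scheme are assumptions; check the service docs.
import requests

API_URL = "https://example-ibm-graph.invalid/g"  # placeholder for the apiURL credential
AUTH = ("service-username", "service-password")  # placeholder credentials

gremlin_script = """
def g = graph.traversal();
def kamal = graph.addVertex(T.label, 'person', 'name', 'Kamal');
def tweet = graph.addVertex(T.label, 'tweet', 'text', 'Graph databases are everywhere');
kamal.addEdge('tweets', tweet);
"""

resp = requests.post(API_URL + "/gremlin", json={"gremlin": gremlin_script}, auth=AUTH)
resp.raise_for_status()
print(resp.json())  # the response echoes the last expression in the script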

Querying data. IBM Graph provides various API endpoints for querying data. For example, the /vertices and /edges endpoints can be used to query graph elements by properties or label. But these endpoints should not be employed for production queries. Instead, go with the /gremlin endpoint, which can handle more complex queries and perform multiple queries in a single request. Here’s an example of a query that returns the tweets favorited by the user Kamal on Twitter:

[Screenshot: IBM Graph query example (IBM)]
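In code, such a query might be sent to the /gremlin endpoint roughly as follows. This is a sketch using Python’s requests library; the endpoint path, the auth scheme, and the person/favorites/tweet labels are assumptions drawn from the Twitter example above.

# Hypothetical sketch: tweets favorited by the person named Kamal.
import requests

API_URL = "https://example-ibm-graph.invalid/g"  # placeholder for the apiURL credential
AUTH = ("service-username", "service-password")  # placeholder credentials

query = {
    "gremlin": (
        "def g = graph.traversal(); "
        "g.V().hasLabel('person').has('name', 'Kamal')"
        ".out('favorites').hasLabel('tweet').values('text');"
    )
}

resp = requests.post(API_URL + "/gremlin", json=query, auth=AUTH)
resp.raise_for_status()
print(resp.json())  # text of the favorited tweets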

To improve query performance and prevent Gremlin query code from being compiled every time, use bindings. Bindings allow you to keep the script the same (and therefore cached) while varying the data it uses with every call. For example, if there is a query that retrieves a particular group of discrete data points, you can bind the varying value to a name. The binding then reduces the time it takes to run similar queries, as the code only has to be compiled once. Below is a modified version of the above query that uses bindings:

[Screenshot: IBM Graph query example with bindings (IBM)]
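Here is the same sketch with a binding: the Gremlin script stays constant (and so can be cached by the service), while only the bound value changes from call to call. The request shape is again an assumption modeled on TinkerPop conventions.

# Hypothetical sketch: the same traversal, parameterized with a binding.
import requests

API_URL = "https://example-ibm-graph.invalid/g"  # placeholder for the apiURL credential
AUTH = ("service-username", "service-password")  # placeholder credentials

payload = {
    "gremlin": (
        "def g = graph.traversal(); "
        "g.V().hasLabel('person').has('name', userName)"
        ".out('favorites').hasLabel('tweet').values('text');"
    ),
    "bindings": {"userName": "Kamal"},
}

resp = requests.post(API_URL + "/gremlin", json=payload, auth=AUTH)
resp.raise_for_status()
print(resp.json())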

It is important to note that there is no direct access to the Gremlin binary protocol. Instead, you interact with the HTTP API by making requests to its endpoints. If you can make a curl request or an HTTP request, you can still manipulate the graph.

For running the code examples in this article locally on your own machine, you need bash, curl, and jq.

Configuring applications for IBM Graph

When you create an instance of the IBM Graph service, the details your application needs in order to interact with the service are provided in JSON format.

[Screenshot: IBM Graph service credentials in JSON (IBM)]

Service instances can typically be used by one or more applications and can be accessed via IBM Bluemix or outside it. If it’s a Bluemix application, the service is tied to the credentials used to create it, which can be found in the VCAP_SERVICES environment variable.

Remember to make sure the application is configured to use:

  • IBM Graph endpoints that are identified by the apiURL value
  • The service instance username that is identified by the username value
  • The service instance password that is identified by the password value

In the documentation, Curl examples use $username, $password, and $apiURL when referring to the fields in the service credentials.
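For a Bluemix-bound application, pulling those three values out of VCAP_SERVICES can look roughly like the sketch below. The “IBM Graph” key name and the exact shape of the credentials block are assumptions to verify against your own service instance.

# Hypothetical sketch: reading IBM Graph credentials from VCAP_SERVICES.
# The "IBM Graph" key and the credentials layout are assumptions.
import json
import os

vcap = json.loads(os.environ["VCAP_SERVICES"])
creds = vcap["IBM Graph"][0]["credentials"]

api_url = creds["apiURL"]
username = creds["username"]
password = creds["password"]

print("IBM Graph endpoint:", api_url)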

Bluemix and IBM Graph

IBM Graph is a service provided via IBM’s Bluemix—a platform as a service that supports several programming languages and services along with integrated devops to build, run, deploy, and manage cloud-based applications. There are three steps to using a Bluemix service like IBM Graph:

  • Create a service instance by requesting a new one in Bluemix. Alternatively, when using the command-line interface, specify IBM Graph as the service name and Standard as the service plan.
  • (Optional) Identify the application that will use the service. If it’s a Bluemix application, you can identify it when you create a service instance. If external, the service can remain unbound.
  • Write code in your application that interacts with the service.

Ultimately, the best way to learn a new tool like IBM Graph is to build an application that solves a real-world problem. Graph databases are used for social graphs, fraud detection, and recommendation engines, and there are simplified versions of these applications that you can build based on pre-existing data sets that are open for use (like census data). One demonstration that is simple, yet entertaining, is to test a graph with a six-degrees-of-separation-type example. Take a data set that interests you, and explore new ways to find previously hidden connections in your data.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Source: InfoWorld Big Data

Review: Scikit-learn shines for simpler machine learning

Scikits are Python-based scientific toolboxes built around SciPy, the Python library for scientific computing. Scikit-learn is an open source project focused on machine learning: classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It’s a fairly conservative project that’s pretty careful about avoiding scope creep and jumping on unproven algorithms, for reasons of maintainability and limited developer resources. On the other hand, it has quite a nice selection of solid algorithms, and it uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.

Among the areas Scikit-learn does not cover are deep learning, reinforcement learning, graphical models, and sequence prediction. It is defined as being in and for Python, so it doesn’t have APIs for other languages. Scikit-learn doesn’t support PyPy, the fast just-in-time compiling Python implementation, because its dependencies NumPy and SciPy don’t fully support PyPy.
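To give a feel for the library’s style, here is a minimal classification example on the bundled iris data set. It is a standard illustration of the fit/predict workflow rather than anything drawn from the review itself.

# Minimal scikit-learn example: train and evaluate a classifier on the iris data set.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))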

Source: InfoWorld Big Data

CallidusCloud CPQ And Clicktools Earn ServiceMax Certification

Callidus Software Inc. has announced the integrations of its award-winning configure price quote (CPQ) and Clicktools feedback solutions with ServiceMax’s leading cloud-based field service management platform.

ServiceMax customers, who manage all aspects of the field service process, including work orders, parts management, entitlements, and dispatch, can now seamlessly use CallidusCloud’s quote generation capabilities and collect customer feedback from within the ServiceMax platform. The combination streamlines and replaces previously manual CPQ processes, making it easier to pursue the new business opportunities that frequently arise during the service process.

The combination enables field service reps to generate value-rich quotes and proposals while they’re on-site with customers. Once the quote has been generated, customer feedback can be collected to ensure an excellent customer experience and to help improve future interactions.

“To meet customer expectations in the Internet of Things era, it’s vital to automate as many steps in the sales process as possible,” said Giles House, chief marketing officer at CallidusCloud. “Customers want to complete transactions faster and more accurately, especially when they’re face to face with field service representatives. Giving service reps the power to efficiently complete transactions will help ServiceMax customers make more money faster.”

“Price automation used to be one more hurdle for our customers to provide a seamless service experience,” said Jonathan Skelding, vice president of global alliances at ServiceMax. “Integrating CallidusCloud’s technology into our platform has facilitated a faster, more intuitive quote-automation process — and as a result, it’s empowered technicians to provide a flawless field service experience. When used in an Internet of Things environment such as our Connected Field Service platform, imagine a connected machine initiating a parts and labor quote which, once authorized, creates the work order and schedules the technician.”

CPQ and the Clicktools platform are delivered as part of CallidusCloud’s Lead to Money suite, a SaaS suite designed to help businesses drive enterprise engagement, sales performance management and sales effectiveness throughout the sales cycle to close bigger deals, faster.

Source: CloudStrategyMag

CloudGenix Partners With Converged Network Services Group

CloudGenix, Inc. has announced it has entered into a master agent agreement with Converged Network Services Group (CNSG), the premier Master Distributor for connectivity, cloud, and cloud enablement. With this partnership, CloudGenix will accelerate its business development, while CNSG will add the CloudGenix Instant-On Networks (ION) product family to its portfolio of solutions. With CloudGenix ION, CNSG and its partners can now provide customers with the best solutions for their connectivity needs, independent of carriers and connectivity transports.

CNSG, a solutions provider for end-to-end telecommunications services, has a decade-long track record of helping businesses manage their communications infrastructure. Together, CNSG and CloudGenix will provide customers with not only a best-of-breed connectivity solution, but will also deliver SLAs for cloud applications such as Office365, AWS, Azure, Unified Communications, and VoIP. CloudGenix ION eliminates complex routing protocols and hardware routers, enabling direct setup of business rules and app SLAs, while also reducing WAN costs by 50% to 70%. All network and app flows are stored in a centralized database, providing customer access to native, actionable application and network insights. CloudGenix uniquely delivers single-sided, per-app controls and SLAs for cloud apps.

“CNSG is committed to working with only the best-of-breed technology suppliers to deliver the highest quality solutions for our partners and their customers,” said Randy Friedberg, vice president of business development at CNSG. “Our alliance with CloudGenix reflects this mission, and ensures our product portfolio continues to align with customers’ needs for cost savings and unmatched application performance. CloudGenix uniquely offers provider-agnostic SD-WAN solutions and provides unmatched support for our partners.”

“This agreement is a win all around: CNSG benefits from leading-edge SD-WAN product offerings for its customers that enables its telco aggregation service, CloudGenix is partnering with a leader in the industry, while customers benefit with cost savings, streamlined business processes and a solution that will take them into the future,” said Kumar Ramachandran, CEO of CloudGenix. “It’s a strong strategic fit that maximizes the strengths of both companies.”

Register here for a February 18, 2017 webinar featuring CNSG and CloudGenix, which will discuss the successes companies are realizing with CloudGenix SD-WAN.

Source: CloudStrategyMag

Fusion Wins Three Year, $350,000 Contract

Fusion has announced that it has signed a three-year, $350,000 cloud solutions contract with a major, multi-site radiology center headquartered in the Midwest. The win demonstrates Fusion’s increasing success in the health care vertical. Fusion’s specialized solutions are winning growing acceptance among health care providers, who cite Fusion’s comprehensive understanding of the industry’s needs and its professional expertise in delivering effective solutions that solve its unique problems.

The radiology center has continuously evolved its imaging technology for over seventy years, providing expert diagnoses and treatment to patients referred by multiple hospitals and ambulatory care centers in the region. It was impressed with Fusion’s flexibility and agility in customizing solutions to meet the industry’s demanding compliance requirements.

The center also noted that Fusion’s feature-rich cloud communications solutions are provided over the company’s own advanced, yet proven cloud services platform, allowing for the seamless, cost-effective integration of additional cloud services. Citing quality and business continuity concerns, the center was further impressed that Fusion’s solutions are integrated with secure, diverse connections to the cloud over its robust, geo-redundant national network, with end to end quality of service guarantees and business continuity built in.

Fusion’s single source cloud solutions offer the radiology center a single point of contact under one contract for integrated services, eliminating the need to manage multiple vendors, and optimizing efficiency with shared, burstable resources across the enterprise.

“We appreciate the healthcare industry’s increasing confidence in us, and we are pleased to have been selected to help the center advance its technology investments with our cost-effective single source cloud solutions. Fusion is committed to providing healthcare institutions with the solutions they need to provide the highest levels of care professionally, efficiently and compassionately,” said Russell P. Markman, Fusion’s president of business services.

Source: CloudStrategyMag

12 New Year's resolutions for your data

Your company was once at the forefront of the computing revolution. You deployed the latest mainframes, then minis, then microcomputers. You joined the PC revolution and bought SPARCs during the dot-com era. You bought DB2 to replace some of what you were doing with IMS. Maybe you bought Oracle or SQL Server later. You deployed MPP and started looking at cubes.

Then you jumped on the next big wave and put a lot of your data on the intranet and internet. You deployed VMware to prevent server sprawl, only to discover VM sprawl. When Microsoft came a-knocking, you deployed SharePoint. You even moved from Siebel to Salesforce to hop into SaaS.

Now you have data coming out of your ears and spilling all over the place. Your mainframe is a delicate flower on which nothing can be installed without a six-month study. The rest of your data is all on the SAN. That works out because you have a “great relationship with the EMC/Dell federation” (where you basically pay them whatever they want and they give you the “EMC treatment”). However, the SAN does you no good for finding actual information due to the effects of VM and application sprawl on your data organization.

Now the millennials want to deploy MongoDB because it’s “webscale.” The Hadoop vendor is knocking and wants to build a data lake, which is supposed to magically produce insights by using cheaper storage … and produce yet another storage technology to worry about.

Time to stop the madness! This is the year you wrangle your data and make it work for your organization instead of your organization working for its data. How do you get your data straight? Start with these 12 New Year’s resolutions:

1. Catalog where the data is

You need to know what you have. Whether or not this takes the form of a complicated data mapping and management system isn’t as important as the actual concerted effort to find it.

2. Map data use

Your data is in use by existing applications, and there’s an overall flow throughout the organization. Whether you track this “data lineage” and “data dependency” via software or sweat, you need to know why you’re keeping this stuff, as well as who’s using it and why. What is the data? What is the source system for each piece of data? What is it used for?

3. Understand how data is created

Remember the solid fuel booster at NASA that had a 1-in-300-year failure rate? Remember that the number was pretty much pulled out of the air? Most of the data was on paper and passed around. How is your data created? How are the numbers derived? This is probably an ongoing effort, as there are new sources of data every day, but it’s worthwhile to prevent your organization’s own avoidable and repeated disasters.

4. Understand how data flows through the organization

Knowing how data is used is critical, but you also need to understand how it got there and any transformation it underwent. You need a map of your organization’s data circulatory system, the big form of the good old data flow diagram. This will not only let you find “black holes” (where inputs are used but no results happen) and “miracles” (where a series of insufficient inputs can’t possibly produce the expected result), but also where redundant flows and transformations exist. Many organizations have lots of copies of the same stuff produced by very similar processes that differ by technology stack alone. It’s just data—we don’t have to pledge allegiance to the latest platform in our ETL process.

5. Automate manual data processing

At various times I’ve tried to sneak a post past my editor entitled something like “Ban Microsoft Excel!” (I think I may have worked that into a post or two.) I’m being partly tongue in cheek, but people who routinely monkey with the numbers manually should be replaced by absolutely no one.

I recently watched the movie “Hidden Figures,” and among other details, it depicted the quick pace at which people were replaced by machines (the smarter folk learned how to operate the machines). In truth, we stagnated somewhere along the way, and a large number of people push bits around in email and Excel. You don’t have to get rid of those people, but the latency of fingers on the keyboard is awful. If you map your data, from where it originates and where it flows, you should be able to identify these manual data-munging processes.
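To make the point concrete, the weekly spreadsheet ritual often boils down to a join and an aggregation, which a small script can run on a schedule. The following is a hypothetical sketch; the file names and column names are invented for illustration.

# Hypothetical sketch: replacing a manual Excel merge with a small pandas script.
# File and column names are made up for illustration.
import pandas as pd

leads = pd.read_csv("crm_export.csv")       # e.g., exported from the sales SaaS tool
spend = pd.read_csv("ad_spend_export.csv")  # e.g., exported from the marketing tool

# Join the two slices of the business on a shared key, then roll up by month.
combined = leads.merge(spend, on="campaign_id", how="left")
monthly = combined.groupby("month").agg({"lead_id": "count", "cost": "sum"})
monthly = monthly.rename(columns={"lead_id": "leads", "cost": "spend"})
monthly["cost_per_lead"] = monthly["spend"] / monthly["leads"]

monthly.to_csv("monthly_funnel_report.csv")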

6. Find a business process you can automate with machine learning

Machine learning is not magic. You are not going to buy software, turn it loose on your network, and get insights out of the box. However, right now someone in your organization is finding patterns by matching sets of data together and doing an “analysis” that can be done by the next wave of computing. Understand the basics (patterns and grouping, aka clustering, are the easiest examples), and try to find at least one place it can be introduced to advantage. It isn’t the data revolution, but it’s a good way to start looking forward again.
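For instance, the kind of grouping described above takes only a few lines with an off-the-shelf clustering algorithm. The customer features below are invented purely for illustration.

# Hypothetical sketch: grouping customers into segments with k-means clustering.
# The feature values are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Columns: monthly spend, support tickets filed, days since last login.
customers = np.array([
    [120.0, 1, 3],
    [95.0, 0, 7],
    [15.0, 4, 60],
    [22.0, 5, 45],
    [300.0, 2, 1],
    [280.0, 1, 2],
])

# Scale features so no single column dominates the distance calculation.
scaled = StandardScaler().fit_transform(customers)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)
print("Cluster assignment per customer:", labels)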

7. Make everything searchable using natural language and voice

My post-millennial son and my Gen-X girlfriend share one major trait: They click the microphone button more often than I do. I use voice on my phone in the car, but almost never otherwise. I learned to type at a young age, and I compose pretty accurate search queries because I practically grew up with computers.

But the future is not communicating with computers on their terms. Training everyone to do that has produced mixed results, so we are probably at the apex of computer literacy and are on our way down. Making your data accessible by natural language search isn’t simply nice to have—it’s essential for the future. It’s also time to start looking into voice if you aren’t there yet. (Disclaimer: I work for Lucidworks, a search technology company with products in this area.)

8. Make everything web-accessible

Big, fat desktop software is generally hated. The maintenance is painful, and sooner or later you need to do something somewhere else on some other machine. Get out of the desktop business! If it isn’t web-based, you don’t want it. Ironically, this is sort of a PC counterrevolution. We went from mainframes and dumb terminals to installing everything everywhere to web browsers and web servers—but the latest trip is worth taking.

9. Make everything accessible via mobile

By any stretch of the numbers, desktop computing is dying. I mean, we still have laptops, but the time we spend on them versus other computing devices is in decline. You can look at sales or searches or whatever numbers you like, but they all point in this direction. Originally you developed an “everything mobile” initiative because the executive got an iPad and wanted to use it on an airplane, and everything looked like crap in the iPad edition of Safari. Then it was the salespeople. Now it’s everyone. If it can’t happen on mobile, then it probably isn’t happening as often as or when/where it should.

10. Make it highly available and distributable

I’m not a big fan of the Oracle theory of computing (stuff everything into your RDBMS and it will be fine, now cut the check, you sheep). Sooner or later outages are going to eat the organization’s confidence. New York City got hit by a hurricane, remember?

It’s time to make your data architecture resilient. That isn’t an old client-server model where you buy Golden Gate or the latest Oracle replication product from a company it recently acquired, then hope for the best. That millennial may be right—you may need a fancy, newfangled database designed for the cloud and distributed computing era. Your reason may not even be to scale but that you want to stay up, handle change better, and have a more affordable offsite replica. The technology has matured. It’s time to take a look.

11. Consolidate

Ultimately the tree of systems and data at many organizations is too complicated and unwieldy to be efficient, accurate, and verifiable. It’s probably time to start chopping at the mistakes of yesteryear. This is often a hard business case to make, but the numbers are there, whether they show how often it goes down, how many people are tied up maintaining it, or that you can’t recruit the talent to maintain it. Sometimes if it isn’t broke, you still knock it down because it’s eating you alive.

12. Make it visual

People like charts—lots of charts and pretty lines.

This can be the year you drive your organization forward and prove that IT is more than a cost center. It can be the year you build a new legacy. What else are you hoping to get done with data this year? Hit me up on Twitter.

Source: InfoWorld Big Data

Apache Beam unifies batch and streaming for big data

Apache Beam, a unified programming model for both batch and streaming data, has graduated from the Apache Incubator to become a top-level Apache project.

Aside from becoming another full-fledged widget in the ever-expanding Apache tool belt of big-data processing software, Beam addresses ease of use and dev-friendly abstraction, rather than just offering raw speed or a wider array of included processing algorithms.

Beam us up!

Beam provides a single programming model for creating batch and stream processing jobs (the name is a hybrid of “batch” and “stream”), and it offers a layer of abstraction for dispatching to various engines used to run said jobs. The project originated at Google, where it’s currently a service called GCD (Google Cloud Dataflow). Beam uses the same API as GCD, and it can use GCD as an execution engine, along with Apache Spark, Apache Flink (a stream processing engine with a highly memory-efficient design), and now Apache Apex (another stream engine for working closely with Hadoop deployments).

The Beam model involves five components: the pipeline (the pathway for data through the program); the “PCollections,” or data streams themselves; the transforms, for processing data; the sources and sinks, where data’s fetched and eventually sent; and the “runners,” or components that allow the whole thing to be executed on a given engine.
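To give a flavor of how those pieces fit together, here is a minimal word-count sketch using the Beam Python SDK: the pipeline carries a PCollection from an in-memory source through a few transforms to a text-file sink, and the runner is whatever the pipeline options select (the local DirectRunner by default). The input lines and output path are made up for illustration.

# Minimal Apache Beam sketch: pipeline -> PCollection -> transforms -> sink.
# Input lines and the output path are illustrative only.
import apache_beam as beam

lines = [
    "batch and stream",
    "one model for batch and stream",
]

with beam.Pipeline() as pipeline:  # the pipeline: the pathway for data
    counts = (
        pipeline
        | "CreateSource" >> beam.Create(lines)                     # source -> PCollection
        | "SplitWords" >> beam.FlatMap(lambda line: line.split())  # transform
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))        # transform
        | "CountPerWord" >> beam.CombinePerKey(sum)                # transform
    )
    counts | "WriteSink" >> beam.io.WriteToText("word_counts")     # sink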

Apache says it separated concerns in this fashion so that Beam can “easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing.” This is in line with how tools like Apache Spark have been reworked to support stream and batch processing within the same product and with similar programming models. In theory, it’s one less concept for a prospective developer to wrap her head around, but that presumes Beam is used entirely in lieu of Spark or other frameworks, when it’s more likely that it’ll be used — at least at first — to augment them.

Hands off

One possible drawback to Beam’s approach is that while the layers of abstraction in the product make operations easier, they also put the developer at a distance from the underlying layers. A good case in point is Beam’s current level of integration with Apache Spark; the Spark runner doesn’t yet use Spark’s more recent DataFrames system, and thus may not take advantage of the optimizations those can provide. But this isn’t a conceptual flaw, it’s an issue with the implementation, which can be addressed in time.

The big payoff of using Beam, as noted by Ian Pointer in his discussion of Beam in early 2016, is that it makes migrations between processing systems less of a headache. Likewise, Apache says that Beam “cleanly [separates] the user’s processing logic from details of the underlying engine.”

Separation of concerns and ease of migration will be good to have if the ongoing rivalries and competitions between the various big data processing engines continue. Granted, Apache Spark has emerged as one of the undisputed champs of the field and become a de facto standard choice. But there’s always room for improvement, or an entirely new streaming or processing paradigm. Beam is less about offering a specific alternative than about providing developers and data wranglers with more breadth of choice between them.

Source: InfoWorld Big Data

Beeks Financial Cloud Joins Equinix Cloud Exchange

Equinix, Inc. has announced that global financial cloud infrastructure provider Beeks Financial Cloud has deployed on Equinix’s Cloud Exchange as it continues to expand its business globally.

Beeks Financial Cloud leverages Cloud Exchange and Platform Equinix™ to connect its customers to global cloud services and networks via a secure, private and low-latency interconnection model. By joining the Equinix Cloud Exchange, Beeks Financial Cloud gains access to instantly connect to multiple cloud service providers (CSPs) in 21 markets, build a more secure application environment and reduce the total cost of private network connectivity to CSPs for its customers.

“Beeks Financial Cloud has continued to grow rapidly on Equinix’s interconnection platform, with Hong Kong being our eighth addition. Data centers underpin our business and we are confident that Equinix’s Cloud Exchange will enable the speed, resilience and reduced latency our customers have come to expect from our company. Equinix’s global footprint of interconnected data centers has allowed our business to really thrive,” said Gordon McArthur, CEO, Beeks Financial Cloud.

Today, banks, brokers, forex companies, and professional traders are increasingly relying on high-speed, secure and low-latency connections for more efficient business transactions, as demand for data centers and colocation services in the cloud, enterprise and financial services sector continues to grow. According to a July 2016 report by Gartner – Colocation-Based Interconnection Will Serve as the ‘Glue’ for Advanced Digital Business Applications – digital business is “enabled and enhanced through high-speed, secure, low-latency communication among enterprise assets, cloud resources, and an ecosystem of service providers and peers. Architects and IT leaders must consider carrier-neutral data center interconnection as a digital business enabler.”

Beeks Financial Cloud, a UK-based company that first deployed in an Equinix London data center four years ago with a single server rack, now has approximately 80 interconnections within Equinix across eight data centers situated in financial business hubs around the world. These direct connections provide increased performance and security between Beeks and its customers and partners across its digital supply chain. Beeks was the first provider in the world to use cross connects to ensure a retail trader customer had a direct connection to their broker.

Beeks’ new deployment in Equinix’s Cloud Exchange provides the necessary digital infrastructure and access to a mature financial services business ecosystem to connect with major financial services providers in key markets around the globe via the cloud. Equinix’s global data centers are home to 1,000+ financial services companies and the world’s largest multi-asset class electronic trading ecosystem— interconnected execution venues and trading platforms, market data vendors, service providers, and buy-side and sell-side firms.

Equinix’s Cloud Exchange offers software-defined direct connections to multiple CSPs including Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure ExpressRoute and Office 365, IBM Softlayer, Oracle Cloud, and others. This has allowed Beeks to scale up rapidly while securely connecting to multiple cloud providers.

Beeks Financial Cloud has continued to expand its business on Equinix’s global interconnection platform of 146 International Business Exchanges™ (IBX®) in 40 markets across the globe. Beeks is currently deployed in Equinix’s International Business Exchanges™ (IBX®) in London, New York, Frankfurt, Tokyo, Chicago, and most recently, Hong Kong.

The move to Equinix’s Cloud Exchange is expected to help save approximately £1M over the next three years, while enabling Beeks Financial Cloud to meet the needs of its global customer base who thrive and grow through forex trading.

London is a key player in the global digital economy, with the fifth largest GDP by metropolitan area in the world. Equinix’s flagship London data center based in Slough (LD6) is one of the fastest-growing in the UK and has been established as a hub for businesses to interconnect in a secure colocation environment.

 

Source: CloudStrategyMag