Gartner Identifies The Top 10 Strategic Technology Trends For 2017

Gartner, Inc. has highlighted the top technology trends that will be strategic for most organizations in 2017. Analysts presented their findings during the sold-out Gartner Symposium/ITxpo, which took place October 16-20 in Orlando.

Gartner defines a strategic technology trend as one with substantial disruptive potential that is just beginning to break out of an emerging state into broader impact and use, or a rapidly growing trend with a high degree of volatility that will reach a tipping point over the next five years.

“Gartner’s top 10 strategic technology trends for 2017 set the stage for the Intelligent Digital Mesh,” said David Cearley, vice president and Gartner Fellow. “The first three embrace ‘Intelligence Everywhere,’ how data science technologies and approaches are evolving to include advanced machine learning and artificial intelligence allowing the creation of intelligent physical and software-based systems that are programmed to learn and adapt. The next three trends focus on the digital world and how the physical and digital worlds are becoming more intertwined. The last four trends focus on the mesh of platforms and services needed to deliver the intelligent digital mesh.”

The top 10 strategic technology trends for 2017 are:

AI and Advanced Machine Learning

Artificial intelligence (AI) and advanced machine learning (ML) are composed of many technologies and techniques (e.g., deep learning, neural networks, natural-language processing [NLP]). The more advanced techniques move beyond traditional rule-based algorithms to create systems that understand, learn, predict, adapt and potentially operate autonomously. This is what makes smart machines appear “intelligent.”
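
To make the distinction concrete, here is a minimal, hypothetical sketch (not drawn from the Gartner research) contrasting a hand-coded rule with a model that learns a similar decision from labeled examples; the feature names and toy data are invented purely for illustration.

```python
# Illustrative contrast (not from the Gartner research): a hand-coded rule
# versus a model fitted from labeled examples. Features and data are toys.
from sklearn.linear_model import LogisticRegression

def rule_based_spam_check(num_links: int, num_exclamations: int) -> bool:
    """Traditional approach: a fixed, hand-written rule."""
    return num_links > 3 and num_exclamations > 2

# Learned approach: the decision boundary is estimated from labeled data.
X = [[0, 0], [1, 1], [5, 4], [6, 5], [2, 0], [7, 6]]   # [links, exclamations]
y = [0, 0, 1, 1, 0, 1]                                 # 0 = ham, 1 = spam
model = LogisticRegression().fit(X, y)

print(rule_based_spam_check(5, 4))   # True, but only for the cases the author foresaw
print(model.predict([[4, 3]]))       # the model generalizes from the examples it saw
```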

“Applied AI and advanced machine learning give rise to a spectrum of intelligent implementations, including physical devices (robots, autonomous vehicles, consumer electronics) as well as apps and services (virtual personal assistants [VPAs], smart advisors),” said Mr. Cearley. “These implementations will be delivered as a new class of obviously intelligent apps and things as well as provide embedded intelligence for a wide range of mesh devices and existing software and service solutions.”

Intelligent Apps

Intelligent apps such as VPAs perform some of the functions of a human assistant, making everyday tasks easier (by prioritizing emails, for example) and their users more effective (by highlighting the most important content and interactions). Other intelligent apps, such as virtual customer assistants (VCAs), are more specialized for tasks in areas such as sales and customer service. As such, these intelligent apps have the potential to transform the nature of work and the structure of the workplace.

“Over the next 10 years, virtually every app, application and service will incorporate some level of AI,” said Mr. Cearley. “This will form a long-term trend that will continually evolve and expand the application of AI and machine learning for apps and services.”

Intelligent Things

Intelligent things refer to physical things that go beyond the execution of rigid programming models to exploit applied AI and machine learning to deliver advanced behaviors and interact more naturally with their surroundings and with people. As intelligent things, such as drones, autonomous vehicles and smart appliances, permeate the environment, Gartner anticipates a shift from stand-alone intelligent things to a collaborative intelligent things model.

Virtual and Augmented Reality

Immersive technologies, such as virtual reality (VR) and augmented reality (AR), transform the way individuals interact with one another and with software systems. “The landscape of immersive consumer and business content and applications will evolve dramatically through 2021,” said Mr. Cearley. “VR and AR capabilities will merge with the digital mesh to form a more seamless system of devices capable of orchestrating a flow of information that comes to the user as hyperpersonalized and relevant apps and services. Integration across multiple mobile, wearable, Internet of Things (IoT) and sensor-rich environments will extend immersive applications beyond isolated and single-person experiences. Rooms and spaces will become active with things, and their connection through the mesh will appear and work in conjunction with immersive virtual worlds.”

Digital Twin

A digital twin is a dynamic software model of a physical thing or system that relies on sensor data to understand its state, respond to changes, improve operations and add value. Digital twins include a combination of metadata (for example, classification, composition and structure), condition or state (for example, location and temperature), event data (for example, time series), and analytics (for example, algorithms and rules).

Within three to five years, hundreds of millions of things will be represented by digital twins. Organizations will use digital twins to proactively repair and plan for equipment service, to plan manufacturing processes, to operate factories, to predict equipment failure or increase operational efficiency, and to perform enhanced product development. As such, digital twins will eventually become proxies for the combination of skilled individuals and traditional monitoring devices and controls (for example, pressure gauges, pressure valves).
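
As a rough, hypothetical sketch of how those elements (metadata, condition or state, event data, and analytics) might be organized in code, the example below models a pump as a digital twin; the field names, units, and service-threshold rule are assumptions made for illustration, not part of Gartner's definition.

```python
# A minimal, hypothetical digital-twin sketch: metadata, condition/state,
# event history, and a toy analytics rule. Names, units, and the threshold
# are illustrative assumptions, not a standard.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class PumpTwin:
    # Metadata: classification, composition, structure
    asset_id: str
    model_number: str
    # Condition / state: location and latest sensor readings
    location: str = "unknown"
    temperature_c: float = 0.0
    pressure_kpa: float = 0.0
    # Event data: time series of (timestamp, temperature, pressure) tuples
    events: list = field(default_factory=list)

    def ingest(self, timestamp: datetime, temperature_c: float, pressure_kpa: float) -> None:
        """Update current state from a new sensor message and append it to the history."""
        self.temperature_c = temperature_c
        self.pressure_kpa = pressure_kpa
        self.events.append((timestamp, temperature_c, pressure_kpa))

    def needs_service(self) -> bool:
        """Analytics: a toy rule standing in for real algorithms and rules."""
        return self.temperature_c > 90.0 or self.pressure_kpa > 800.0
```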

Blockchain and Distributed Ledgers

Blockchain is a type of distributed ledger in which value exchange transactions (in bitcoin or other tokens) are sequentially grouped into blocks. Each block is chained to the previous block and recorded across a peer-to-peer network, using cryptographic trust and assurance mechanisms. Blockchain and distributed-ledger concepts are gaining traction because they hold the promise to transform industry operating models. While the current hype is around the financial services industry, there are many possible applications including music distribution, identity verification, title registry and supply chain.
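
A toy Python sketch of that chaining mechanism may help; it is not a production design, and the transactions are invented. Each block records the hash of its predecessor, so tampering with any earlier block invalidates every later link.

```python
# Toy illustration of hash chaining between blocks (not a real ledger).
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 hash of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions: list, previous_hash: str) -> dict:
    return {"transactions": transactions, "previous_hash": previous_hash}

genesis = make_block(["alice pays bob 5"], previous_hash="0" * 64)
second = make_block(["bob pays carol 2"], previous_hash=block_hash(genesis))

# Tampering with the first block breaks the link recorded in the second.
genesis["transactions"][0] = "alice pays bob 500"
print(block_hash(genesis) == second["previous_hash"])  # False
```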

“Distributed ledgers are potentially transformative but most initiatives are still in the early alpha or beta testing stage,” said Mr. Cearley.

Conversational Systems

Conversational interfaces are currently focused on chatbots and microphone-enabled devices (e.g., speakers, smartphones, tablets, PCs, automobiles). However, the digital mesh encompasses an expanding set of endpoints people use to access applications and information, or to interact with people, social communities, governments and businesses. The device mesh moves beyond the traditional desktop computer and mobile devices to encompass the full range of endpoints with which humans might interact. As the device mesh evolves, connection models will expand and greater cooperative interaction between devices will emerge, creating the foundation for a new continuous and ambient digital experience.

Mesh App and Service Architecture

In the mesh app and service architecture (MASA), mobile apps, web apps, desktop apps and IoT apps link to a broad mesh of back-end services to create what users view as an “application.” The architecture encapsulates services and exposes APIs at multiple levels and across organizational boundaries balancing the demand for agility and scalability of services with composition and reuse of services. The MASA enables users to have an optimized solution for targeted endpoints in the digital mesh (e.g., desktop, smartphone, automobile) as well as a continuous experience as they shift across these different channels.
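
As a hypothetical sketch of that idea, the example below exposes one shared back-end service through thin, endpoint-specific adapters so desktop, smartphone, and automobile clients each receive an optimized payload; the service, function names, and fields are invented for illustration.

```python
# Hypothetical MASA-style sketch: one back-end service, several
# endpoint-optimized adapters. All names and payloads are invented.

def order_status_service(order_id: str) -> dict:
    """Back-end service shared by every channel."""
    return {"order_id": order_id, "status": "shipped",
            "eta": "2017-01-12", "items": ["keyboard", "monitor"]}

def desktop_api(order_id: str) -> dict:
    # Rich payload for a large screen.
    return order_status_service(order_id)

def smartphone_api(order_id: str) -> dict:
    # Trimmed payload for a small screen.
    full = order_status_service(order_id)
    return {"status": full["status"], "eta": full["eta"]}

def automobile_api(order_id: str) -> str:
    # A single spoken sentence for an in-car assistant.
    full = order_status_service(order_id)
    return f"Your order is {full['status']} and arrives on {full['eta']}."
```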

Digital Technology Platforms

Digital technology platforms provide the basic building blocks for a digital business and are a critical enabler of becoming one. Gartner has identified five major focal points that enable the new capabilities and business models of digital business: information systems, customer experience, analytics and intelligence, the IoT, and business ecosystems. Every organization will have some mix of these five digital technology platforms.

Adaptive Security Architecture

The intelligent digital mesh and related digital technology platforms and application architectures create an ever-more-complex world for security. “Established security technologies should be used as a baseline to secure Internet of Things platforms,” said Mr. Cearley. “Monitoring user and entity behavior is a critical addition that is particularly needed in IoT scenarios. However, the IoT edge is a new frontier for many IT security professionals creating new vulnerability areas and often requiring new remediation tools and processes that must be factored into IoT platform efforts.”

Source: CloudStrategyMag

Big data face-off: Spark vs. Impala vs. Hive vs. Presto

Today AtScale released its Q4 benchmark results for the major big data SQL engines: Spark, Impala, Hive/Tez, and Presto.

The findings confirm a lot of what we already know: Impala is better for finding needles in moderate-size haystacks, even when there are a lot of users. Presto also does well here. Hive and Spark do better on long-running analytics queries.

I spoke to Joshua Klar, AtScale’s vice president of product management, and he noted that many of the company’s customers use two engines. Generally they view Hive as more stable and tend to run their long-running queries on it. All of their Hive customers use Tez, and none use MapReduce any longer.

In my experience, the stability gap between Spark and Hive closed a while ago, so long as you’re smart about memory management. As I noted recently, I don’t see a long-term future for Hive on Tez, because Impala and Presto are better for those normal BI queries, and Spark generally performs better for analytics queries (that is, for finding smaller haystacks inside of huge haystacks). In an era of cheap memory, if you can afford to do large scale analytics, you can afford to do it in-memory, and everything else is more of a BI pattern.
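
For context, here is an illustrative PySpark snippet (not taken from the AtScale benchmark) showing the kind of interactive, BI-style aggregation these engines are measured on; the dataset path and column names are assumptions made for the example.

```python
# Illustrative BI-style query in Spark SQL; the parquet path and columns
# are assumptions, not the benchmark's actual schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bi-style-query").getOrCreate()

sales = spark.read.parquet("/data/sales.parquet")
sales.createOrReplaceTempView("sales")

top_regions = spark.sql("""
    SELECT region, SUM(revenue) AS total_revenue
    FROM sales
    WHERE sale_date >= '2016-01-01'
    GROUP BY region
    ORDER BY total_revenue DESC
    LIMIT 10
""")
top_regions.show()
```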

While all of the engines have shown improvement over the last AtScale benchmark, Hive/Tez with the new LLAP (Live Long and Process) feature has made impressive gains across the board. The performance still hasn’t caught up with Impala and Spark, but according to this benchmark, it isn’t as slow and unwieldy as before — and at least Hive/Tez with LLAP is now practical to use in BI scenarios.

The full benchmark report is worth reading, but key highlights include:

  • Spark 2.0 improved its large query performance by an average of 2.4X over Spark 1.6 (so upgrade!). Small query performance was already good and remained roughly the same.

  • Impala 2.6 is 2.8X as fast for large queries as version 2.3. Small query performance was already good and remained roughly the same.

  • Hive 2.1 with LLAP is over 3.4X faster than 1.2 and its small query performance doubled. If you’re using Hive, this isn’t an upgrade you can afford to skip.

Not really analyzed is whether SQL is always the right way to go and how, say, a functional approach in Spark would compare. You need to take these benchmarks within the scope in which they are presented.

The bottom line is that all of these engines have dramatically improved in one year. Both Impala and Presto continue to lead in BI-type queries, and Spark leads performance-wise in large analytics queries. I’d like to see what could be done to address the concurrency issue with memory tuning, but that’s actually consistent with what I observed in the Google Dataflow/Spark benchmark released by my former employer earlier this year. Either way, it is time to upgrade!

Source: InfoWorld Big Data

The AI overlords have already won

AI and its many subsets, including machine learning and bots, have been incredibly hyped of late, with claims that they will revolutionize the way humans interact with machines. InfoWorld, for example, has reviewed the machine learning APIs offered by the major clouds. Everyone wonders who will be the big winner in this new world.

Bad news for those who like drama: The war may already be over. If AI is only as good as the algorithms — and more important, the data fed to them — who can hope to compete with Amazon, Apple, Facebook, Google, and Microsoft, all of which continually feast on the data we happily give them every day?

All your bots are belong to us

Former Evernote CEO and current venture capitalist Phil Libin has suggested that bots are on par with browsers 20 years ago: basic command lines control them with minimalistic interfaces. (“Alexa, what is the weather today?”) Bots, however, promise to be far richer than browsers, with fewer limits on how we inject data into the systems and better ways to pull data-rich experiences therefrom — that is, if we can train them with enough quality data.
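
A minimal, hypothetical sketch illustrates why today's bots still feel like command lines: a fixed set of patterns mapped to canned handlers, with anything off-script falling through. The phrases and handlers below are invented purely for illustration.

```python
# Toy intent matcher: pattern -> handler, nothing learned. Invented example.
import re

def weather_handler(_match: re.Match) -> str:
    return "It is 18 degrees and cloudy."   # stub answer

def time_handler(_match: re.Match) -> str:
    return "It is 9:41 AM."                 # stub answer

INTENTS = [
    (re.compile(r"\bweather\b", re.I), weather_handler),
    (re.compile(r"\btime\b", re.I), time_handler),
]

def bot_reply(utterance: str) -> str:
    for pattern, handler in INTENTS:
        match = pattern.search(utterance)
        if match:
            return handler(match)
    return "Sorry, I don't understand."     # anything off-script fails

print(bot_reply("Alexa, what is the weather today?"))
```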

This isn’t a problem for the fortunate few: Amazon, Apple, Facebook, Google, Microsoft, and a handful of others are swimming in data. In exchange for free services like email or Siri, we gladly give mountains of data to these companies. In so doing, we may be unwittingly building out competitive differentiation for them that could last for a long, long time.

Who, for example, can hope to compete with Google’s sense of location, given its mapping service, which relies on heavily structured data that we feed it every time we ask for directions? Or how about Facebook, which understands unstructured interactions between people better than anyone else?

All trends point to this getting worse (or better, depending on your trust of concentrations of power). Take your smartphone. Originally we exulted in the sea of apps available in the various app stores, unlocking a cornucopia of services. A few years into the app revolution, however, the vast majority of the apps that consume up to 90 percent of our mobile hours are owned by a handful of companies: Facebook and Google, predominantly. All that data we generate on our devices? Owned by very few companies.

On the back end, these same companies dominate, making up the “megacloud” elite. Tim O’Reilly first pointed out this trend, arguing that megaclouds like Facebook and Microsoft would be difficult to beat because of the economies of scale that make them stronger even as they grow bigger. While he was talking about infrastructure, the same principle applies to data. The rich get richer.

In the case of AI bot interfaces, the data-rich may end up as the only ones capable of delivering experiences that consumers find credible and useful.

CompuServe 3.0

If this seems bleak, it’s because it is. It’s hard to see how any upstart challenger can hope to wrest control of consumer data from these megaclouds, with their processing power, data science smarts, and treasure troves of user info. The one ray of light, perhaps, is if someone can introduce a superior “curation layer.”

For example, today I might conversationally ask Apple’s Siri or Amazon’s Alexa to point out nearby sushi restaurants. Both are able to tap into a places-of-interest database and spit out an acceptable response. However, what if I really want not merely nearby sushi restaurants, but nearby sushi restaurants recommended by someone whose food preferences I trust?

Facebook appears to be in pole position to use its knowledge of my human interactions to give the best answer, but it actually doesn’t. Just because I’m friends with someone on Facebook doesn’t mean I care about their preferred restaurants. I almost certainly will never have expressed my belief in digital text that their taste in food is terrible. (I don’t want to be rude, after all.) Thus, the field is open to figure out which sources I do trust, then curate accordingly.

This is partly a matter of data, but ultimately it’s a matter of superior algorithms coupled with better interpretation of signals that inform those algorithms. Yes, Google or Facebook might be first to develop such algorithms and interpret the signals, but in the area of data curation there’s still room for hope that new entrants can win.

Otherwise, all our data belongs to these megaclouds for the next 10 years, as it has for the last 10 years. And they’re using it to get smarter all the time.

Source: InfoWorld Big Data

Equinix Collaborates With VMware

Equinix, Inc. has announced that it is offering direct and private access to VMware vCloud® Air™ on Platform Equinix™ in major markets across North America and Europe. Access is available either via Equinix Cloud Exchange™, the company’s cloud interconnection platform, or via Direct Connect, depending on location. These dedicated, private connections reduce network costs and provide higher and more consistent network performance than Internet-based connections for enterprise customers looking to build out their VMware-based hybrid clouds.

By combining the flexibility of public cloud with the control and security of private cloud, hybrid cloud deployments are on the rise within enterprise IT, with an estimated annual growth rate of 45%*. Direct access to vCloud Air inside Equinix provides seamless hybrid cloud enablement for VMware customers by enabling them to easily combine on-premises vSphere investments with the agility of public cloud without the inherent control, latency and security concerns and limitations of the public Internet. The solution is now available via Equinix Cloud Exchange in London, Silicon Valley and Washington, D.C. International Business Exchange™ (IBX®) data centers, and also via Direct Connect in Frankfurt, London, New York, Silicon Valley, and Washington, D.C.

“VMware has a strong history and established foothold within the enterprise market, and therefore it is important that we provide direct access to their solution via our Equinix Cloud Exchange as enterprise demand for high-performance, enterprise-grade cloud services continues to grow. Cloud Exchange was developed with the intent of providing improved performance of cloud-based applications and workloads through high-throughput, low-latency connections and we are thrilled to be able to provide this to VMware customers,” said Mark Adams, chief development officer, Equinix.

Direct access to VMware vCloud Air inside Equinix is ideal for customers looking to seamlessly extend their on-premises VMware vSphere® infrastructure and move workloads to and from vCloud Air in a more secure, reliable and compliant manner. This offering also enables customers to consolidate multiple network siloes into an Equinix IBX data center and directly connect to vCloud Air, enabling a unified and truly hybrid architecture. And, direct access to vCloud Air offers a robust disaster recovery solution for customers. Customers with aggressive recovery time objective (RTO) and recovery point objective (RPO) requirements can place critical applications close to vCloud Air for direct instant recovery, or establish a secondary disaster recovery site for vSphere workloads, thus avoiding application downtime.

Equinix offers the industry’s broadest choice in cloud service providers, such as VMware vCloud Air, and offers direct connections to many of these platforms via Equinix Cloud Exchange or Equinix Cross Connects. Equinix Cloud Exchange is an advanced interconnection solution that provides virtualized, private direct connections that bypass the Internet to provide better security and performance with a range of bandwidth options. It is currently available in 21 markets, globally.

Source: CloudStrategyMag

State Bridge Authority Selects The Fusion Cloud

Fusion has announced that the company was recently awarded a three-year, $165,000 contract to provide its award-winning cloud communications and cloud connectivity solutions to a large state government corporation. The Authority, which is mandated to ensure safe, reliable, and convenient bridge crossings for millions of residents and visitors, cited Fusion’s ability to integrate its communications solutions with a safe, reliable connection to the cloud, delivering quality of service and a single contract, point of contact and invoice. The Authority also cited Fusion’s robust national network, local presence for on-site support and enhanced, cost-effective feature set as important reasons for its selection. The award reflects Fusion’s increasing success in serving the government sector.

“The selection process was rigorous, reflecting the trust that millions of people place in the Authority to protect the integrity of the many bridges linking communities across the state. Fusion was gratified to have been selected over multiple potential providers through the intensive bid process. The Authority was particularly impressed by our high value, cost-effective communications solutions, our flexibility in meeting the specialized regulatory requirements of a state authority, and our ability to provide local support while maintaining a truly diverse national network,” said Russell P. Markman, president of business services, Fusion.

“The Authority’s challenge was to upgrade its legacy, premise-based communications infrastructure in order to significantly improve its mission-critical communications, driving increases in productivity while protecting taxpayer investments. Invoking the public trust, the Authority expressed its confidence in Fusion to migrate its communications safely to the cloud. We were pleased to have the opportunity to deliver a secure single source cloud solution that will scale with the Authority’s future communications needs,” Markman added.

Source: CloudStrategyMag

Report: CTERA Reveals Enterprise Disconnect With In-Cloud Data Protection Strategies

According to new research announced by CTERA Networks, enterprise data protection strategies may not be fully aligned with IT modernization initiatives driven by cloud computing. The research shows that while enterprises continue to migrate workloads to the cloud at a rapid pace, protection of cloud-based servers and applications has not fully evolved to meet enterprise requirements for business continuity and data availability.

CTERA’s new eBook, Game of Clouds, showcases the findings of CTERA’s inaugural cloud backup survey, and presents a deep look at the state of enterprise cloud data protection. A CTERA-commissioned study was conducted by independent research firm Vanson Bourne to examine the data protection strategies of 400 IT decision makers and IT specialists in organizations using the cloud for application deployment at U.S., German and French organizations. The study analyzes the benefits and pitfalls of current backup strategies, offers key considerations for organizations moving to the cloud, and looks at the impact of poor backup practices on business continuity. The Game of Clouds eBook can be downloaded here.

Key findings from the research include:

  • Organizations are moving to the cloud at a rapid pace to realize efficiency gains, real-time scalability and cost savings. More than two-thirds (67%) of organizations deploy more than 25% of their applications in the cloud, and 37% plan to grow their cloud use by at least 25%, if not more. In addition, 54% of organizations are embracing a hybrid cloud strategy that leverages both on-premises and cloud services.
  • But two out of three companies (66%) strongly agree or somewhat agree there is less focus on backing up applications in the cloud due to a misconception that the cloud is inherently resilient compared to on-premises applications. This is not a surprise considering that 62% of organizations rely on the cloud provider to back up applications running on their platform.
  • As more enterprise applications move to the cloud and threats such as ransomware become more pervasive, the criticality of business continuity prevails. Thirty-nine percent of respondents claim that ensuring business continuity is the highest priority when backing up applications and data running in the cloud. And 71% of organizations cite data protection and availability as one of the biggest challenges when moving to the cloud.
  • With more than a third of respondents (36%) reporting that the loss of data in the cloud would be more catastrophic than their data center crashing, and 14% of respondents claiming it would cost them their jobs, the need to get a cloud strategy right the first time is imperative.

“The enterprise’s move beyond traditional data centers has rewritten the playbook for data protection in the cloud,” said Jeff Denworth, SVP Marketing, CTERA. “As organizations adopt cloud and multi-cloud strategies, traditional backup tools fall down. Our research spotlights the key data protection considerations and challenges for enterprises as they look for simple, efficient and automated solutions that protect critical cloud-based applications.”

Source: CloudStrategyMag

Common Controls Hub™ Provides Microsoft With A Harmonized View Of Compliance

Microsoft has licensed the Common Controls Hub™ (CCH) for use by its global customer base. By doing so, Microsoft underscores its commitment to assisting its clients in meeting and retaining the highest level of compliance in all aspects of their enterprises.

Why the Common Controls Hub™ Is Significant

The Common Controls Hub is the SaaS portal to the data in the Unified Compliance Framework® (UCF). It is a research and information database that gives compliance professionals the ability to maintain regulatory compliance with both the national and international standards that apply to their enterprise and industry. The CCH connects the various criteria, policies, and lexicons of over 200,000 individual compliance mandates across over 800 laws, standards, and regulations (referred to by the UCF as Authority Documents) from around the world. By creating a standardized structure that traces the what, why, and how of every Authority Document, the CCH harmonizes the rules and regulations with which organizations must comply. It also provides the means to track compliance activities over time, for both internal and external audit purposes.

To achieve its objective, the CCH incorporates all the elements of each body of rules into its database, where it maps each separate mandate in relation to all other mandates. Dividing every rule set into five domains, 15 “Impact Zones” and three categories, the program identifies all the controls that affect an organization, the authority documents that establish those controls, and the details of implementation that will demonstrate compliance.

Why Microsoft’s Licensing Is Significant

At Microsoft, compliance with the rules is an ethical matter. The company’s Standards of Business Conduct embody its values of integrity and ethical business practices, and every Microsoft employee is expected to follow the company’s internal standards, as well as national and international laws.

In its Trust Center, the company offers its customers the proprietary programs and services that help them conduct business to the same high ethical standard in this complex, digital world. Products related to digital and corporate security, privacy, transparency, and compliance are utilized by millions of consumers each year. Until now, however, the company did not have a tool that provided the comprehensive analytics necessary to identify both the rule sets and the mandates that each of its customers was required to follow.

The addition of the Common Controls Hub in a branded whitebox in Microsoft’s Trust Center marks the first time the technology giant has placed the product of an outside content provider alongside its already sizeable and significant proprietary programming as a valid and valuable resource for its existing and potential customers.

Source: CloudStrategyMag

Logicalis US Ranked Among Top 100 Cloud Service Providers

Logicalis US has announced that it has ranked among the world’s top 100 cloud service providers (CSPs) according to Penton’s sixth annual Talkin’ Cloud 100 report.

Based on data from Talkin’ Cloud’s online survey, conducted between June and August 2016, the Talkin’ Cloud 100 list recognizes top CSPs according to annual cloud services revenue growth and input from Penton Technology’s channel editors.

“There’s a significant change taking place in IT today, and cloud computing is at the heart of it,” says Eric Brooks, Cloud and Automation Practice Leader, Logicalis US.  “Organizations that want to be agile enough to quickly respond to changes in business environments need to rely on a host of compute capabilities delivered as-a-service in the cloud. To take advantage of these offerings, CIOs are shifting their thinking about IT as a technology-defined function to one that is services defined, and they need partners like Logicalis that can help them with this digital transformation. Being recognized among the world’s top cloud service providers is, therefore, both an honor and an industrywide acknowledgement of the proficiency Logicalis displays in delivering services via the cloud and helping our clients chart their course toward becoming service-defined organizations.”

“On behalf of Penton and Talkin’ Cloud, I would like to congratulate Logicalis for its recognition as a Talkin’ Cloud 100 honoree,” says Nicole Henderson, editor in chief, Talkin’ Cloud. “Cloud services providers on the Talkin’ Cloud 100 set themselves apart through innovative cloud offerings and new support models demonstrating a deep understanding of their customers’ needs and future cloud opportunities.”


Source: CloudStrategyMag

It's (not) elementary: How Watson works

What goes into making a computer understand the world through senses, learning and experience, as IBM says Watson does? First and foremost, tons and tons of data.

To build a body of knowledge for Watson to work with on Jeopardy, researchers put together 200 million pages of content, both structured and unstructured, including dictionaries and encyclopedias. When asked a question, Watson initially analyzes it using more than 100 algorithms, identifying any names, dates, geographic locations or other entities. It also examines the phrase structure and the grammar of the question to better gauge what’s being asked. In all, it uses millions of logic rules to determine the best answers.

Today Watson is frequently being applied to new areas, which means learning new material. Researchers begin by loading Word documents, PDFs and web pages into Watson to build up its knowledge. Question-and-answer pairs are then added to train Watson on the subject. To answer a question, Watson searches millions of documents to find thousands of possible answers. Along the way it collects evidence and uses a scoring algorithm to rate each item’s quality. Based on that scoring, it ranks all possible answers and offers the best one.
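
A highly simplified, hypothetical sketch of that evidence-scoring-and-ranking step is shown below; IBM's actual DeepQA scorers, features, and weights are proprietary, so the counting heuristic here is purely illustrative.

```python
# Toy stand-in for DeepQA's scoring and ranking: count supporting passages.
def score_candidate(candidate: str, evidence_passages: list) -> float:
    """Score a candidate answer by the fraction of passages that mention it."""
    mentions = sum(candidate.lower() in passage.lower() for passage in evidence_passages)
    return mentions / max(len(evidence_passages), 1)

def best_answer(candidates: list, evidence_passages: list) -> tuple:
    """Rank all candidates by score and return the top one with its confidence."""
    ranked = sorted(
        ((score_candidate(c, evidence_passages), c) for c in candidates),
        reverse=True,
    )
    confidence, answer = ranked[0]
    return answer, confidence

passages = ["Toronto is in Canada.",
            "Chicago has two airports.",
            "Chicago's airports are named for WWII figures."]
print(best_answer(["Chicago", "Toronto"], passages))  # ('Chicago', 0.666...)
```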

Over time, Watson learns from its experience. It’s also updated automatically as new information is published. In terms of nuts and bolts, Watson uses IBM’s DeepQA software along with a variety of other proprietary and open-source technologies. In its original form, that included Hadoop and Apache UIMA (Unstructured Information Management Architecture) software and a cluster of 90 Power 750 computers packing a total of 2880 processor cores.

Today Watson is delivered via the cloud, but as competition heats up, IBM is keeping quiet about the underlying specifics.

“Our DeepQA reasoning and other foundational cognitive skills make use of deep-learning techniques, proprietary algorithms and open-source kernels and frameworks that make use of hardware technologies that are optimized for those workloads,” said IBM Watson vice president and CTO Rob High. 

Source: InfoWorld Big Data

Why being a data scientist 'feels like being a magician'

The data scientist role was thrust into the limelight early this year when it was named 2016’s “hottest job,” and there’s been considerable interest in the position ever since. Just recently, the White House singled data scientists out with a special appeal for help.

Those in the job can expect to earn a median base salary of roughly $116,840 — if they have what it takes. But what is it like to be a data scientist? Read on to hear what three people currently on the front lines had to say.

How the day breaks down

That data scientists spend a lot of time working with data goes without saying. What may be less obvious is that meetings and face-to-face time are also a big part of the picture.

“Typically, the day starts with meetings,” said Tanu George, an account manager and data scientist with LatentView Analytics. Those meetings can serve all kinds of purposes, she said, including identifying a client’s business problem, tracking progress, or discussing reports.

Tanu George is a data scientist with LatentView Analytics.

By midmorning the meetings die down, she said. “This is when we start doing the number crunching,” typically focused on trying to answer the questions asked in meetings earlier.

Afternoon is often spent on collaborative meetings aimed at interpreting the numbers, followed by sharing analyses and results via email at the end of the day.

Roughly 50 percent of George’s time is taken up in meetings, she estimates, with another 20 percent in computation work and 30 percent in interpretation, including visualizing and putting data into actionable form.

Meetings with clients also represent a significant part of the day for Ryan Rosario, an independent data scientist and mentor at online education site Springboard. “Clients explain the problem and what they’d like to see for an outcome,” he said.  

Next comes a discussion of what kinds of data are needed. “More times than not, the client actually doesn’t have the data or know where to get it,” Rosario said. “I help develop a plan for how to get it.”

Ryan Rosario is an independent data scientist and engineer.

A lot of data science is not working with the data per se but more trying to understand the big picture of “what does this mean for a company or client,” said Virginia Long, a predictive analytics scientist at healthcare-focused MedeAnalytics. “The first step is understanding the area — I’ll spend a lot of time searching the literature, reading, and trying to understand the problem.”

Figuring out who has what kind of data comes next, Long said. “Sometimes that’s a challenge,” she said. “People really like the idea of using data to inform their decisions, but sometimes they just don’t have the right data to do that. Figuring out ways we can collect the right data is sometimes part of my job.”

Once that data is in hand, “digging in” and understanding it comes next. “This is the flip side of the basic background research,” Long said. “You’re really finding out what’s actually in the data. It can be tedious, but sometimes you’ll find things you might not have noticed otherwise.”

Virginia Long is a predictive analytics scientist at MedeAnalytics.

Long also spends some of her time creating educational materials for both internal and external use, generally explaining how various data science techniques work.

“Especially with all the hype, people will see something like machine learning and see just the shiny outside. They’ll say, ‘oh we need to do it,'” she explained. “Part of every day is at least some explaining of what’s possible and how it works.”

Best and worst parts of the job

Meetings are George’s favorite part of her day: “They make me love my job,” she said.

For Rosario, whose past roles have included a stint as a machine learning engineer at Facebook, the best parts of the job have shifted over time.

“When I worked in Silicon Valley, my favorite part was massaging the data,” he said. “Data often comes to us in a messy format, or understandable only by a particular piece of software. I’d move it into a format to make it digestible.”

As a consultant, he loves showing people what data can do.

“A lot of people know they need help with data, but they don’t know what they can do with it,” he said. “It feels like being a magician, opening their minds to the possibilities. That kind of exploration and geeking out is now my favorite part.”

Long’s favorites are many, including the initial phases of researching the context of the problem to be solved as well as figuring out ways to get the necessary data and then diving into it headfirst.

Though some reports have suggested that data scientists still spend an inordinate amount of their time on “janitorial” tasks, “I don’t think of it as janitorial,” Long said. “I think of it as part of digging in and understanding it.”

As for the less exciting bits, “I prefer not to have to manage projects,” Long said. Doing so means “I often have to spend time managing everyone else’s priorities while trying to get my own things done.”

As for Rosario, who was trained in statistics and data science, systems building and software engineering are the parts he prefers to de-emphasize.

Preparing for the role

It’s no secret that data science requires considerable education, and these three professionals are no exception. LatentView Analytics’ George holds a bachelor’s degree in electrical and electronics engineering along with an MBA, she said.

Rosario holds a BS in statistics and math of computation as well as an MS in statistics and an MS in computer science from UCLA; he’s currently finishing his PhD in statistics there.

As for MedeAnalytics’ Long, she holds a PhD in behavioral neuroscience, with a focus on learning, memory and motivation.

“I got tired of running after the data,” Long quipped, referring to the experiments conducted in the scientific world. “Half of your job as a scientist is doing the data analysis, and I really liked that aspect. I also was interested in making a practical difference.”

The next frontier

And where will things go from here?

“I think the future has a lot more data coming,” said George, citing developments such as the internet of things (IoT). “Going forward, all senior and mid-management roles will incorporate some aspect of data management.”

The growing focus on streaming data means that “a lot more work needs to be done,” Rosario agreed. “We’ll see a lot more emphasis on developing algorithms and systems that can merge together streams of data. I see things like the IoT and streaming data being the next frontier.”

Security and privacy will be major issues to tackle along the way, he added.

Data scientists are still often expected to be “unicorns,” Long said, meaning that they’re asked to do everything single-handedly, including all the coding, data manipulation, data analysis and more.

“It’s hard to have one person responsible for everything,” she said. “Hopefully, different types of people with different skill sets will be the future.”

Words of advice

For those considering a career in data science, Rosario advocates pursuing at least a master’s degree. He also suggests trying to think in terms of data.

“We all have problems around us, whether it’s managing our finances or planning a vacation,” he said. “Try to think about how you could solve those problems using data. Ask if the data exists, and try to find it.”

For early portfolio-building experience, common advice suggests finding a data set from a site such as Kaggle and then figuring out a problem that can be solved using it.

“I suggest the inverse,” Rosario said. “Pick a problem and then find the data you’d need to solve it.”

“I feel like the best preparation is some sense of the scientific method, or how you approach a problem,” said MedeAnalytics’ Long. “It will determine how you deal with the data and decide to use it.”

Tools can be mastered, but “the sensibility of how to solve the problem is what you need to get good at,” she added.

Of course, ultimately, the last mile for data scientists is presenting their results, George pointed out.

“It’s a lot of detail,” she said. “If you’re a good storyteller, and if you can weave a story out of it, then there’s nothing like it.”

Source: InfoWorld Big Data