AT&T Suit Shows Web Has Enough Rules, Open Internet Foes Say
Critics have promised to mount an aggressive campaign against FCC Chairman Ajit Pai’s decision to seek the elimination of the net neutrality rules.
Source: TheWHIR
HPE Returns to Techie Roots Appointing Neri to Succeed Whitman
The company now known as HPE will be run by an engineer for the first time in almost two decades, now that Antonio Neri is succeeding Meg Whitman as chief executive officer.
Source: TheWHIR
OVH Launches Hosted Private Cloud to US Market
OVH has launched its next-generation Hosted Private Cloud offerings for the US market, the company announced this week.
Source: TheWHIR
SolarWinds Updates Its SaaS Portfolio
SolarWinds has announced an all-new, breakthrough product and two advanced product updates in a major evolution of its SolarWinds Cloud® Software as a Service (SaaS) portfolio. The new offerings expand the company’s current capabilities for comprehensive, full-stack monitoring with the introduction of AppOptics™, a new application and infrastructure monitoring solution; significant updates to Papertrail™, providing faster search speeds and new log velocity analytics; and enhanced digital experience monitoring (DEM) functionality within Pingdom®.
Collectively, the new SolarWinds Cloud portfolio gives customers broad and unmatched visibility into logs, metrics, and tracing, as well as the digital experience. It will enable developers, DevOps engineers, and IT professionals to simplify and accelerate management and troubleshooting, from the infrastructure and application layers to the end-user experience. In turn, it will allow customers to focus on building the innovative capabilities businesses need for today’s on-demand environments.
“Application performance and the digital experience of users have a direct and significant impact on business success,” said Christoph Pfister, executive vice president of products, SolarWinds. “With the stakes so high, the ability to monitor across the three pillars of observability — logs, metrics, and tracing — is essential. SolarWinds Cloud offers this comprehensive functionality with industry-best speed and simplicity. With AppOptics and the enhancements to Papertrail and Pingdom, we’re breaking new ground by delivering even greater value to our customers in an incredibly powerful, disruptively affordable SaaS portfolio.”
AppOptics: Simple, unified monitoring for the modern application stack
Available today, AppOptics addresses challenges customers face from being forced to use disparate solutions for applications and infrastructure performance monitoring. To do so, it offers broad application performance monitoring (APM) language support with auto-instrumentation, distributed tracing functionality, and a host agent supported by a large open community to enable expanded infrastructure monitoring capabilities and comprehensive visibility through converged dashboards.
For a unified view, AppOptics’ distributed tracing, host and IT infrastructure monitoring, and custom metrics all feed the same dashboarding, analytics, and alerting pipelines. SolarWinds designed the solution to simplify and unify the management of complex modern applications, infrastructure, or both. This allows customers to solve problems and improve performance across the application stack, in an easy-to-use, as-a-service platform.
For application performance monitoring, the powerful distributed tracing functionality can follow requests across any number of hosts, microservices, and languages without manual instrumentation. Users can move quickly from visualizing trends to deep, code-level, root cause analysis.
AppOptics bridges the traditional divide between application and infrastructure health metrics with unified dashboards, alerting, and management features. The host agent runs Snap™ and Telegraf™ plug-ins, enabling drop-in monitoring of key systems. The solution integrates with a wide range of systems to support the heterogeneous infrastructure environments dominating today’s IT landscape.
AppOptics serves as a highly extensible custom metrics and analytics platform that brings together applications, infrastructure, and business data to deliver deep insights that enable fast problem resolution. Finally, with pricing starting at $7.50 USD per host per month, AppOptics combines deep functionality with affordable pricing, making powerful application performance monitoring capabilities accessible to virtually all organizations.
Papertrail: Faster, smarter troubleshooting with log velocity analytics and ‘lightning search’
Papertrail is a cloud-hosted log management solution that helps users troubleshoot infrastructure and application problems. The latest version introduced today includes log velocity analytics, which can instantly visualize log patterns and help identify anomalies. For example, customers now can visualize an increase in total logs sent by a server, a condition that could indicate imminent failure, or something out of the norm.
Also, new to Papertrail is “lightning search,” which will enable developers, support engineers, and systems administrators to search millions or billions of log messages faster than ever before, and then immediately act on information found within the log messages. Together, Papertrail’s latest enhancements empower customers to troubleshoot complex problems, error messages, application server errors, and slow database queries, faster and smarter, with full visibility across all logs.
Pingdom digital experience monitoring
Research firm Gartner estimates that “by 2020, 30 percent of global enterprises will have strategically implemented DEM technologies or services, up from fewer than 5 percent today.” Pingdom, a market leader in the DEM arena, helps make websites faster and more reliable with powerful, easy-to-use uptime and performance monitoring functionality. Available on November 27, the Pingdom solution’s latest enhancements for digital experience monitoring include three new dashboard views that provide the ability to continuously enhance user experience on websites or web applications:
- Sites View: Customers can quickly locate a user experience issue on any monitored website
- Experience View: Customers can filter users and identify those affected by performance issues
- Performance View: Customers can explore the technical cause of an issue and quickly and easily identify opportunities for performance improvements
The latest updates to the Pingdom solution’s digital experience monitoring will empower customers to know first when issues affect their site visitors’ experience, and quickly surface critical information needed to enhance the overall experience.
SolarWinds Cloud: The next evolution of SaaS-based full-stack monitoring
Today’s announcement of SolarWinds Cloud is another important milestone in the company’s drive to deliver a set of comprehensive, simple, and disruptively affordable full-stack monitoring solutions built upon a common, seamlessly integrated, SaaS-based platform. Since 2014, SolarWinds has dramatically expanded its cloud portfolio and capabilities through a series of acquisitions, while making significant progress integrating these acquired solutions, including Pingdom, Librato®, Papertrail, and TraceView™, under a common sales and operational model.
AppOptics builds on the technology and feedback SolarWinds put into Librato and TraceView since their introductions. Now, the company has integrated and enhanced this functionality within a single solution, taking another big step forward in advancing its strategy to unify full-stack monitoring across the three pillars of observability on a common SaaS-based platform. SolarWinds’ ultimate goal is to enable a single view of infrastructure, applications, and digital experience, which will help customers solve their most complex performance and reliability problems quickly, with unexpected simplicity and industry-leading affordability.
Source: CloudStrategyMag
Mobile-Phone Case at Supreme Court to Test Privacy Protections
The case could have a far-reaching impact. Prosecutors seek phone-location information from telecommunications companies in tens of thousands of cases a year.
Source: TheWHIR
FCC Rollback of Open-Internet Rules Won't Settle Divisive Issue
Federal Communications Commission Chairman Ajit Pai will take a big step toward his goal of voiding Obama-era net neutrality regulations Tuesday.
Source: TheWHIR
Cambridge Semantics Announces Semantic Layer For Multi-Cloud Environments
Cambridge Semantics has announced multi-cloud support for Anzo Smart Data Lake (SDL) 4.0, its flagship product that brings business meaning to all enterprise data.
Incorporating several new technical advancements designed to deliver a generational shift over current data lake, data management and data analytics offerings, Anzo SDL 4.0 now supports all three major cloud platforms — Google Cloud Platform, Amazon Web Services (AWS) and Microsoft Azure. The vision for multi-cloud capability enabled by Anzo will allow enterprises to choose from any combination of on-premise, hybrid cloud or multi-cloud solutions that makes the most sense for their business environment.
“Organizations today view their data assets as key business drivers for competitive advantage,” said Sean Martin, CTO of Cambridge Semantics. “However, for many, the cost of running analytic solutions is drastically increasing, while speed-to-deployment remains a major challenge. Therefore, we are seeing an accelerated movement to the cloud and its variable cost model.”
According to Martin, as renting on-demand computing power becomes the largest cost center in most enterprise connected data analytics and machine learning programs, many enterprises are actively planning to work with multiple cloud vendors to take advantage of price fluctuations in today’s increasingly commoditized cloud computing market.
Cambridge Semantics’ multi-cloud data center model for service-based cloud compute consumption is abstracted and fully automated. It eliminates cloud infrastructure provider lock-in and can securely shift compute consumption between cloud vendors dynamically to achieve the most competitive pricing at any given moment.
“Our customers want to be able to decide where to place their analytics compute spend globally on an hour-by-hour or even a minute-by-minute basis,” Martin said. “Not only does our open standards-based semantic layer provide business understandable meaning to all enterprise data, but the same metadata driven approach is essential in enabling customers to describe the policies that determine where that data is both securely stored and processed.”
“Cambridge Semantics offers the only semantically-driven smart data lake big data management and connected data analytics solution that entirely insulates enterprises from the different cloud vendor APIs,” Martin said. “It won’t be long before our customers will be able to see multiple vendors’ quotes for exactly how much the same analytics dashboard is going to cost them to compute before they click the button to select the cloud provider that will get to run that specific job.”
Source: CloudStrategyMag
IDG Contributor Network: A speedy recovery: the key to good outcomes as health care’s dependence on data deepens
It may have been slow to catch on compared to other industries, but the health care sector has developed a voracious appetite for data. Digital transformation topped the agenda at this year’s Healthcare Information and Management Systems Society (HIMSS) conference in Florida, and big data analytics in health care is on track to be worth more than $34 billion globally within the next five years—possibly sooner.
Electronic health records are growing in importance to enable more interdisciplinary collaboration, speed up communication on patient cases, and drive up the quality of care. Enhanced measurement and reporting have become critical for financial management and regulatory compliance, and to protect organizations from negligence claims and fraud. More strategically, big data is spurring new innovation, from smart patient apps to complex diagnostics driven by machine learning. Because of their ability to crunch big data and build knowledge at speed, computers could soon take over from clinicians in identifying patient conditions—in contrast to doctors relying on their clinical experience to determine what’s wrong.
But as health care providers come to rely increasingly on their IT systems, their vulnerability to data outages grows exponentially. If a planned surgery can’t go ahead due to an inability to look up case information, lab results, or digital images, the patient’s life might be put at risk.
Symptoms of bigger issues
Even loss of access to administrative systems can be devastating. The chaos inflicted across the UK National Health Service in May following an international cyberattack—which took down 48 of the 248 NHS trusts in England—gave a glimpse into health care’s susceptibility to paralysis if key systems become inaccessible, even for a short time. In the NHS’s case, out-of-date security settings were to blame for leaving systems at risk. But no one is immune to system downtime, as was highlighted recently by the outage at British Airways, which grounded much of its fleet for days, at great cost, not to mention severe disruption for passengers.
Although disastrous events like these instill fear in CIOs, they can—and should—also serve as a catalyst for positive action. The sensible approach is to design data systems for failure—for times when, like patients, they are not firing on all cylinders. Even with the best intentions, the biggest budgets, and the most robust data center facilities in the world, something will go wrong at some point, according to the law of averages. So it’s far better to plan for that than to assume an indefinitely healthy prognosis.
If the worst happens, and critical systems go down, recovery is rarely a matter of switching over to backup infrastructure and data—particularly if we’re talking about live records and information, which are currently in use and being continuously updated. Just think of the real-time monitoring of the vital signs of patients in intensive care units.
If a contingency data-set exists (as it should) in another location, the chances are that the original and the backup copy will be out of sync for much of the time, because of ongoing activity involving those records. In the event of an outage, the degree to which data is out of step will have a direct bearing on the organization’s speed of recovery.
To ensure continuous care and patient safety, health care organizations need the fastest possible recovery time. But how many organizations have identified and catered for this near-zero tolerance for downtime in their contingency provisions?
Emergency protocol
The issue must be addressed as data becomes an integral part of medical progress. Already, data is not just a key to better operational and clinical decisions, but also an intrinsic part of treatments—for example in processing the data that allows real-time control and movement in paralyzed patients. Eventually, these computer-assisted treatments will also come to rely on external servers, because local devices are unlikely to have the computing power to process all the data. They too will need live data backups to ensure the continuity and safety of treatment.
On a broader scale, data looks set to become pivotal to new business models (for example, determining private health care charges based on patient outcomes, otherwise known as “value-based medicine”).
While technology companies will be pulling out all the stops to keep up with these grander plans, maintaining live data continuity is already possible. So that’s one potential barrier to progress that can be checked off the list.
Source: InfoWorld Big Data
What is Apache Spark? The big data analytics platform explained
From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, Facebook, IBM, and Microsoft.
Out of the box, Spark can run in a standalone cluster mode that simply requires the Apache Spark framework and a JVM on each machine in your cluster. However, it’s more likely you’ll want to take advantage of a resource or cluster management system to take care of allocating workers on demand for you. In the enterprise, this will normally mean running on Hadoop YARN (this is how the Cloudera and Hortonworks distributions run Spark jobs), but Apache Spark can also run on Apache Mesos, while work is progressing on adding native support for Kubernetes.
If you’re after a managed solution, then Apache Spark can be found as part of Amazon EMR, Google Cloud Dataproc, and Microsoft Azure HDInsight. Databricks, the company that employs the founders of Apache Spark, also offers the Databricks Unified Analytics Platform, which is a comprehensive managed service that offers Apache Spark clusters, streaming support, integrated web-based notebook development, and optimized cloud I/O performance over a standard Apache Spark distribution.
Spark vs. Hadoop
It’s worth pointing out that Apache Spark vs. Apache Hadoop is a bit of a misnomer. You’ll find Spark included in most Hadoop distributions these days. But due to two big advantages, Spark has become the framework of choice when processing big data, overtaking the old MapReduce paradigm that brought Hadoop to prominence.
The first advantage is speed. Spark’s in-memory data engine means that it can perform tasks up to one hundred times faster than MapReduce in certain situations, particularly when compared with multi-stage jobs that require the writing of state back out to disk between stages. Even Apache Spark jobs where the data cannot be completely contained within memory tend to be around 10 times faster than their MapReduce counterpart.
The second advantage is the developer-friendly Spark API. As important as Spark’s speed-up is, one could argue that the friendliness of the Spark API is even more important.
Spark Core
In comparison to MapReduce and other Apache Hadoop components, the Apache Spark API is very friendly to developers, hiding much of the complexity of a distributed processing engine behind simple method calls. The canonical example of this is how almost 50 lines of MapReduce code to count words in a document can be reduced to just a few lines of Apache Spark (here shown in Scala):
val textFile = sparkSession.sparkContext.textFile("hdfs:///tmp/words")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:///tmp/words_agg")
By providing bindings to popular languages for data analysis like Python and R, as well as the more enterprise-friendly Java and Scala, Apache Spark allows everybody from application developers to data scientists to harness its scalability and speed in an accessible manner.
Spark RDD
At the heart of Apache Spark is the concept of the Resilient Distributed Dataset (RDD), a programming abstraction that represents an immutable collection of objects that can be split across a computing cluster. Operations on the RDDs can also be split across the cluster and executed in a parallel batch process, leading to fast and scalable parallel processing.
RDDs can be created from simple text files, SQL databases, NoSQL stores (such as Cassandra and MongoDB), Amazon S3 buckets, and much more besides. Much of the Spark Core API is built on this RDD concept, enabling traditional map and reduce functionality, but also providing built-in support for joining data sets, filtering, sampling, and aggregation.
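To make those RDD operations concrete, here is a minimal sketch in Scala (the hdfs:///tmp/events path and the one-pair-per-line input format are assumptions for illustration) that builds an RDD from a text file and applies filter, map, and a key-based aggregation:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
// Hypothetical input: one "userId eventType" pair per line
val events = spark.sparkContext.textFile("hdfs:///tmp/events")
val clicksPerUser = events.map(_.split(" "))
  .filter(fields => fields(1) == "click")   // keep only click events
  .map(fields => (fields(0), 1))            // key each click by user id
  .reduceByKey(_ + _)                       // aggregate counts per user
clicksPerUser.take(10).foreach(println)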
Spark runs in a distributed fashion by combining a driver core process that splits a Spark application into tasks and distributes them among many executor processes that do the work. These executors can be scaled up and down as required for the application’s needs.
Spark SQL
Originally known as Shark, Spark SQL has become more and more important to the Apache Spark project. It is likely the interface most commonly used by today’s developers when creating applications. Spark SQL is focused on the processing of structured data, using a dataframe approach borrowed from R and Python (in Pandas). But as the name suggests, Spark SQL also provides a SQL2003-compliant interface for querying data, bringing the power of Apache Spark to analysts as well as developers.
Alongside standard SQL support, Spark SQL provides a standard interface for reading from and writing to other datastores including JSON, HDFS, Apache Hive, JDBC, Apache ORC, and Apache Parquet, all of which are supported out of the box. Other popular stores—Apache Cassandra, MongoDB, Apache HBase, and many others—can be used by pulling in separate connectors from the Spark Packages ecosystem.
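As a brief sketch of that reader/writer interface (the file paths here are placeholders), loading a JSON file into a dataframe and persisting it as Parquet takes only a few calls:

// Assumes the SparkSession created in the earlier sketch
val citiesDF = spark.read.json("hdfs:///tmp/cities.json")
citiesDF.printSchema()                                   // inspect the inferred schema
citiesDF.write.mode("overwrite").parquet("hdfs:///tmp/cities_parquet")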
Selecting some columns from a dataframe is as simple as this line:
citiesDF.select("name", "pop")
Using the SQL interface, we register the dataframe as a temporary table, after which we can issue SQL queries against it:
citiesDF.createOrReplaceTempView("cities")
spark.sql("SELECT name, pop FROM cities")
Behind the scenes, Apache Spark uses a query optimizer called Catalyst that examines data and queries in order to produce an efficient query plan for data locality and computation that will perform the required calculations across the cluster. In the Apache Spark 2.x era, the Spark SQL interface of dataframes and datasets (essentially a typed dataframe that can be checked at compile time for correctness and take advantage of further memory and compute optimizations at run time) is the recommended approach for development. The RDD interface is still available, but is recommended only if you have needs that cannot be encapsulated within the Spark SQL paradigm.
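For example, a dataframe can be converted to a typed dataset by mapping it onto a case class, which lets the compiler check field names and types (a minimal sketch that reuses the hypothetical citiesDF dataframe shown above; its schema is assumed):

case class City(name: String, pop: Long)

import spark.implicits._                                  // encoders for case classes
val citiesDS = citiesDF.select("name", "pop").as[City]    // typed view of the dataframe
val bigCities = citiesDS.filter(_.pop > 1000000L)         // field access checked at compile time
bigCities.show()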
Spark MLlib
Apache Spark also bundles libraries for applying machine learning and graph analysis techniques to data at scale. Spark MLlib includes a framework for creating machine learning pipelines, allowing for easy implementation of feature extraction, selection, and transformation on any structured dataset. MLlib comes with distributed implementations of clustering and classification algorithms such as k-means clustering and random forests that can be swapped in and out of custom pipelines with ease. Models can be trained by data scientists in Apache Spark using R or Python, saved using MLlib, and then imported into a Java-based or Scala-based pipeline for production use.
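As a hedged sketch of what such a pipeline looks like (the trainingDF dataframe and its "text" and "label" columns are assumptions for illustration), the following chains a tokenizer, a feature hasher, and a logistic regression classifier, then saves the trained model:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// trainingDF is assumed to hold labeled text in "text" and "label" columns
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

val model = pipeline.fit(trainingDF)                       // fit all stages in order
model.write.overwrite().save("hdfs:///tmp/text-model")     // persist for reuse elsewhere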
Note that while Spark MLlib covers basic machine learning including classification, regression, clustering, and filtering, it does not include facilities for modeling and training deep neural networks (for details see InfoWorld’s Spark MLlib review). However, Deep Learning Pipelines are in the works.
Spark GraphX
Spark GraphX comes with a selection of distributed algorithms for processing graph structures including an implementation of Google’s PageRank. These algorithms use Spark Core’s RDD approach to modeling data; the GraphFrames package allows you to do graph operations on dataframes, including taking advantage of the Catalyst optimizer for graph queries.
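As a small sketch of the RDD-based GraphX API (the edge-list path is an assumption; each input line is taken to be a "sourceId destinationId" pair), loading a graph and ranking its vertices with PageRank looks like this:

import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(spark.sparkContext, "hdfs:///tmp/edges.txt")
val ranks = graph.pageRank(0.0001).vertices                      // run PageRank to a 0.0001 tolerance
ranks.sortBy(_._2, ascending = false).take(5).foreach(println)   // top five vertices by rank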
Spark Streaming
Spark Streaming was an early addition to Apache Spark that helped it gain traction in environments that required real-time or near real-time processing. Previously, batch and stream processing in the world of Apache Hadoop were separate things. You would write MapReduce code for your batch processing needs and use something like Apache Storm for your real-time streaming requirements. This obviously led to disparate codebases that had to be kept in sync for the application domain despite being based on completely different frameworks, requiring different resources, and involving different operational concerns.
Spark Streaming extended the Apache Spark concept of batch processing into streaming by breaking the stream down into a continuous series of microbatches, which could then be manipulated using the Apache Spark API. In this way, code in batch and streaming operations can share (mostly) the same code, running on the same framework, thus reducing both developer and operator overhead. Everybody wins.
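The classic DStream word count illustrates the point: the same flatMap, map, and reduceByKey calls used in the batch example earlier in this article, applied to one-second microbatches (a sketch assuming a text source on localhost port 9999):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(spark.sparkContext, Seconds(1))   // one-second microbatches
val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
wordCounts.print()                                               // emit counts for each batch

ssc.start()
ssc.awaitTermination()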
A criticism of the Spark Streaming approach is that microbatching, in scenarios where a low-latency response to incoming data is required, may not be able to match the performance of other streaming-capable frameworks like Apache Storm, Apache Flink, and Apache Apex, all of which use a pure streaming method rather than microbatches.
Structured Streaming
Structured Streaming (added in Spark 2.x) is to Spark Streaming what Spark SQL was to the Spark Core APIs: a higher-level API and easier abstraction for writing applications. In the case of Structured Streaming, the higher-level API essentially allows developers to create infinite streaming dataframes and datasets. It also solves some very real pain points that users have struggled with in the earlier framework, especially concerning event-time aggregations and late delivery of messages. All queries on structured streams go through the Catalyst query optimizer, and they can even be run in an interactive manner, allowing users to perform SQL queries against live streaming data.
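Expressed with Structured Streaming, the same word count becomes a query over an unbounded dataframe whose aggregated result is continuously updated (a sketch, again assuming a text source on localhost port 9999):

import org.apache.spark.sql.functions.{explode, split}

val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// "value" is the single string column produced by the socket source
val words = lines.select(explode(split(lines("value"), " ")).alias("word"))
val wordCounts = words.groupBy("word").count()

val query = wordCounts.writeStream
  .outputMode("complete")          // re-emit the full aggregated table on each trigger
  .format("console")
  .start()
query.awaitTermination()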
Structured Streaming is still a rather new part of Apache Spark, having been marked as production-ready in the Spark 2.2 release. However, Structured Streaming is the future of streaming applications with the platform, so if you’re building a new streaming application, you should use Structured Streaming. The legacy Spark Streaming APIs will continue to be supported, but the project recommends porting over to Structured Streaming, as the new method makes writing and maintaining streaming code a lot more bearable.
What’s next for Apache Spark?
While Structured Streaming provides high-level improvements to Spark Streaming, it currently relies on the same microbatching scheme of handling streaming data. However, the Apache Spark team is working to bring continuous streaming without microbatching to the platform, which should solve many of the problems with handling low-latency responses (they’re claiming ~1ms, which would be very impressive). Even better, because Structured Streaming is built on top of the Spark SQL engine, taking advantage of this new streaming technique will require no code changes.
In addition to improving streaming performance, Apache Spark will be adding support for deep learning via Deep Learning Pipelines. Using the existing pipeline structure of MLlib, you will be able to construct classifiers in just a few lines of code, as well as apply custom TensorFlow graphs or Keras models to incoming data. These graphs and models can even be registered as custom Spark SQL UDFs (user-defined functions) so that the deep learning models can be applied to data as part of SQL statements.
Neither of these features is anywhere near production-ready at the moment, but given the rapid pace of development we’ve seen in Apache Spark in the past, they should be ready for prime time in 2018.
Source: InfoWorld Big Data
IBM Cloud Hands German Users Control of Their Data
To protect itself and EU-based clients, Big Blue modifies a play from Microsoft’s playbook.
Source: TheWHIR