How to use Redis for real-time stream processing

Real-time streaming data ingest is a common requirement for many big data use cases. In fields like IoT, e-commerce, security, communications, entertainment, finance, and retail, where so much depends on timely and accurate data-driven decision making, real-time data collection and analysis are in fact core to the business.

However, collecting, storing and processing streaming data in large volumes and at high velocity presents architectural challenges. An important first step in delivering real-time data analysis is ensuring that adequate network, compute, storage, and memory resources are available to capture fast data streams. But a company’s software stack must match the performance of its physical infrastructure. Otherwise, businesses will face a massive backlog of data, or worse, missing or incomplete data.

Redis has become a popular choice for such fast data ingest scenarios. A lightweight in-memory database platform, Redis achieves throughput in the millions of operations per second with sub-millisecond latencies, while drawing on minimal resources. It also offers simple implementations, enabled by its multiple data structures and functions.

In this article, I will show how Redis Enterprise can solve common challenges associated with the ingestion and processing of large volumes of high velocity data. We’ll walk through three different approaches (including code) to processing a Twitter feed in real time, using Redis Pub/Sub, Redis Lists, and Redis Sorted Sets, respectively. As we’ll see, all three methods have a role to play in fast data ingestion, depending on the use case.

Challenges in designing fast data ingest solutions

High-speed data ingestion often involves several different types of complexity:

  • Large volumes of data sometimes arriving in bursts. Bursty data requires a solution that is capable of processing large volumes of data with minimal latency. Ideally, it should be able to perform millions of writes per second with sub-millisecond latency, using minimal resources.
  • Data from multiple sources. Data ingest solutions must be flexible enough to handle data in many different formats, retaining source identity if needed and transforming or normalizing in real-time.
  • Data that needs to be filtered, analyzed, or forwarded. Most data ingest solutions have one or more subscribers who consume the data. These are often different applications that function in the same or different locations with a varied set of assumptions. In such cases, the database not only needs to transform the data, but also filter or aggregate depending on the requirements of the consuming applications.
  • Data coming from geographically distributed sources. In this scenario, it is often convenient to distribute the data collection nodes, placing them close to the sources. The nodes themselves become part of the fast data ingest solution, to collect, process, forward, or reroute ingest data.

Handling fast data ingest in Redis

Many solutions supporting fast data ingest today are complex, feature-rich, and over-engineered for simple requirements. Redis, on the other hand, is extremely lightweight, fast, and easy to use. With clients available in more than 60 languages, Redis can be easily integrated with popular software stacks.

Redis offers data structures such as Lists, Sets, Sorted Sets, and Hashes that offer simple and versatile data processing. Redis delivers more than a million read/write operations per second, with sub-millisecond latency on a modestly sized commodity cloud instance, making it extremely resource-efficient for large volumes of data. Redis also supports messaging services and client libraries in all of the popular programming languages, making it well-suited for combining high-speed data ingest and real-time analytics. Redis Pub/Sub commands allow it to play the role of a message broker between publishers and subscribers, a feature often used to send notifications or messages between distributed data ingest nodes.

Redis Enterprise enhances Redis with seamless scaling, always-on availability, automated deployment, and the ability to use cost-effective flash memory as a RAM extender so that the processing of large datasets can be accomplished cost-effectively.

In the sections below, I will outline how to use Redis Enterprise to address common data ingest challenges.

Redis at the speed of Twitter

To illustrate the simplicity of Redis, we’ll explore a sample fast data ingest solution that gathers messages from a Twitter feed. The goal of this solution is to process tweets in real time and push them down the pipe as they are processed.

Twitter data ingested by the solution is then consumed by multiple processors down the line. As shown in Figure 1, this example deals with two processors – the English Tweet Processor and the Influencer Processor. Each processor filters the tweets and passes them down its respective channels to other consumers. This chain can go as far as the solution requires. However, in our example, we stop at the third level, where we aggregate popular discussions among English speakers and top influencers.

Figure 1. Flow of the Twitter stream (Image: Redis Labs)

Note that we use the example of processing Twitter feeds because of its simplicity and the velocity at which the data arrives. Note also that Twitter data reaches our fast data ingest solution via a single channel. In many cases, such as IoT, there could be multiple data sources sending data to the main receiver.

There are three possible ways to implement this solution using Redis: ingest with Redis Pub/Sub, ingest with the List data structure, or ingest with the Sorted Set data structure. Let’s examine each of these options.

Ingest with Redis Pub/Sub

This is the simplest implementation of fast data ingest. This solution uses Redis’s Pub/Sub feature, which allows applications to publish and subscribe to messages. As shown in Figure 2, each stage processes the data and publishes it to a channel. The subsequent stage subscribes to the channel and receives the messages for further processing or filtering.

Figure 2. Data ingest using Redis Pub/Sub (Image: Redis Labs)

Pros

  • Easy to implement.
  • Works well when the data sources and processors are distributed geographically.

Cons 

  • The solution requires the publishers and subscribers to be up all the time. Subscribers lose data when stopped, or when the connection is lost.
  • It requires more connections. A program cannot publish and subscribe over the same connection, so each intermediate data processor requires two connections – one to subscribe and one to publish. If running Redis on a DBaaS platform, it is important to verify whether your package or level of service has any limit on the number of connections.
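
The fire-and-forget behavior described above is easy to demonstrate. The sketch below is a minimal in-memory stand-in for the Pub/Sub fan-out (it is illustrative only and not part of the sample project): the broker delivers each message to the subscribers connected at publish time, one after the other, and a consumer that subscribes later never sees earlier messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal in-memory model of Pub/Sub fan-out: every published message is
// pushed linearly to the subscribers connected at that moment; anyone not
// subscribed at publish time misses the message entirely.
class MiniBroker {
    private final Map<String, List<Consumer<String>>> channels = new HashMap<>();

    public void subscribe(String channel, Consumer<String> subscriber) {
        channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(subscriber);
    }

    public void publish(String channel, String message) {
        // Delivered one subscriber at a time, as Redis does
        for (Consumer<String> s : channels.getOrDefault(channel, new ArrayList<>())) {
            s.accept(message);
        }
    }
}
```

A subscriber that connects after a publish simply never receives that message, which is exactly why the real solution requires publishers and subscribers to be up all the time.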

A note about connections

If more than one client subscribes to a channel, Redis pushes the data to each client linearly, one after the other. Large data payloads and many connections may introduce latency between a publisher and its subscribers. Although the default hard limit for maximum number of connections is 10,000, you must test and benchmark how many connections are appropriate for your payload.

Redis maintains a client output buffer for each client. The default limits for the client output buffer for Pub/Sub are set as:

client-output-buffer-limit pubsub 32mb 8mb 60

With this setting, Redis will force clients to disconnect under two conditions: if the output buffer grows beyond 32MB, or if the output buffer holds 8MB of data consistently for 60 seconds.

These are indications that clients are consuming the data more slowly than it is published. Should such a situation arise, first try optimizing the consumers such that they do not add latency while consuming the data. If you notice that your clients are still getting disconnected, then you may increase the limits for the client-output-buffer-limit pubsub property in redis.conf. Please keep in mind that any changes to the settings may increase latency between the publisher and subscriber. Any changes must be tested and verified thoroughly.
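
For example, a relaxed setting that tolerates slower consumers might look like the following in redis.conf (the exact values here are purely illustrative and must be benchmarked against your own payload):

client-output-buffer-limit pubsub 64mb 16mb 90

With this hypothetical setting, a Pub/Sub client would be disconnected only if its output buffer grows beyond 64MB, or if it holds more than 16MB of data consistently for 90 seconds.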

Code design for the Redis Pub/Sub solution

Figure 3. Class diagram of the fast data ingest solution with Redis Pub/Sub (Image: Redis Labs)

This is the simplest of the three solutions described in this article. Here are the important Java classes implemented for this solution. Download the source code with the full implementation here: https://github.com/redislabsdemo/IngestPubSub.

The Subscriber class is the core class of this design. Every Subscriber object maintains a new connection with Redis.

class Subscriber extends JedisPubSub implements Runnable{
    private String name = "Subscriber";
    private RedisConnection conn = null;
    private Jedis jedis = null;

    private String subscriberChannel = "defaultchannel";

    public Subscriber(String subscriberName, String channelName) throws Exception{
        name = subscriberName;
        subscriberChannel = channelName;
        Thread t = new Thread(this);
        t.start();
    }

    @Override
    public void run(){
        try{
            conn = RedisConnection.getRedisConnection();
            jedis = conn.getJedis();
            while(true){
                jedis.subscribe(this, this.subscriberChannel);
            }
        }catch(Exception e){
            e.printStackTrace();
        }
    }

    @Override
    public void onMessage(String channel, String message){
        super.onMessage(channel, message);
    }
}

The Publisher class maintains a separate connection to Redis for publishing messages to a channel.

public class Publisher{

    RedisConnection conn = null;
    Jedis jedis = null;

    private String channel = "defaultchannel";

    public Publisher(String channelName) throws Exception{
        channel = channelName;
        conn = RedisConnection.getRedisConnection();
        jedis = conn.getJedis();
    }

    public void publish(String msg) throws Exception{
        jedis.publish(channel, msg);
    }
}

The EnglishTweetFilter, InfluencerTweetFilter, HashTagCollector, and InfluencerCollector filters extend Subscriber, which enables them to listen to the inbound channels. Since you need separate Redis connections for subscribe and publish, each filter class has its own RedisConnection object. Filters listen to the new messages in their channels in a loop. Here is the sample code of the EnglishTweetFilter class:

public class EnglishTweetFilter extends Subscriber
{
    private RedisConnection conn = null;
    private Jedis jedis = null;
    private String publisherChannel = null;

    public EnglishTweetFilter(String name, String subscriberChannel, String publisherChannel) throws Exception{
        super(name, subscriberChannel);
        this.publisherChannel = publisherChannel;
        conn = RedisConnection.getRedisConnection();
        jedis = conn.getJedis();
    }

    @Override
    public void onMessage(String subscriberChannel, String message){
        JsonParser jsonParser = new JsonParser();
        JsonElement jsonElement = jsonParser.parse(message);
        JsonObject jsonObject = jsonElement.getAsJsonObject();

        // Filter messages: publish only English tweets
        if(jsonObject.get("lang") != null &&
                jsonObject.get("lang").getAsString().equals("en")){
            jedis.publish(publisherChannel, message);
        }
    }
}

The Publisher class has a publish method that publishes messages to the required channel.

public class Publisher{
    .
    .
    public void publish(String msg) throws Exception{
        jedis.publish(channel, msg);
    }
    .
}

The main class reads data from the ingest stream and posts it to the AllData channel. The main method of this class starts all of the filter objects.

public class IngestPubSub
{
    .
    public void start() throws Exception{
        .
        .
        publisher = new Publisher("AllData");

        englishFilter = new EnglishTweetFilter("English Filter", "AllData",
                "EnglishTweets");
        influencerFilter = new InfluencerTweetFilter("Influencer Filter",
                "AllData", "InfluencerTweets");
        hashtagCollector = new HashTagCollector("Hashtag Collector",
                "EnglishTweets");
        influencerCollector = new InfluencerCollector("Influencer Collector",
                "InfluencerTweets");
        .
        .
    }
}

Ingest with Redis Lists

The List data structure in Redis makes implementing a queueing solution easy and straightforward. In this solution, the producer pushes every message to the back of the queue, and the subscriber polls the queue and pulls new messages from the other end.
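
The left-push/blocking-right-pop pattern can be modeled without a Redis server. The sketch below is an illustrative stand-in (not part of the sample project) that uses a deque in place of a Redis List, showing that LPUSH at one end plus BRPOP at the other yields first-in, first-out delivery:

```java
import java.util.concurrent.LinkedBlockingDeque;

// In-memory stand-in for the Redis List queue pattern: the producer pushes to
// the head of the list (~ LPUSH) and the consumer performs a blocking pop
// from the tail (~ BRPOP), so messages come out in arrival order.
class InMemoryMessageList {
    private final LinkedBlockingDeque<String> list = new LinkedBlockingDeque<>();

    public void push(String msg) {          // ~ LPUSH mylist msg
        list.addFirst(msg);
    }

    public String pop() {                   // ~ BRPOP mylist 0
        try {
            return list.takeLast();         // blocks until a message arrives
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Unlike the Pub/Sub model, a message pushed here waits in the structure until some consumer pops it, which is what makes the List approach resilient to consumer downtime.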

Figure 4. Fast data ingest with Redis Lists (Image: Redis Labs)

Pros

  • This method is reliable in cases of connection loss. Once data is pushed into the lists, it is preserved there until the subscribers read it. This is true even if the subscribers are stopped or lose their connection with the Redis server.
  • Producers and consumers require no connection between them.

Cons

  • Once data is pulled from the list, it is removed and cannot be retrieved again. Unless the consumers persist the data, it is lost as soon as it is consumed.
  • Every consumer requires a separate queue, which requires storing multiple copies of the data.

Code design for the Redis Lists solution

Figure 5. Class diagram of the fast data ingest solution with Redis Lists (Image: Redis Labs)

You can download the source code for the Redis Lists solution here: https://github.com/redislabsdemo/IngestList. This solution’s main classes are explained below.

MessageList embeds the Redis List data structure. The push() method pushes the new message to the left of the queue, and pop() performs a blocking pop from the right, waiting for a new message if the queue is empty.

public class MessageList{

    protected String name = "MyList"; // Name of the list
    .
    .
    public void push(String msg) throws Exception{
        jedis.lpush(name, msg); // Left push
    }

    public String pop() throws Exception{
        // BRPOP blocks until a message is available (timeout 0) and returns
        // [listName, message]; the string form is parsed downstream
        return jedis.brpop(0, name).toString();
    }
    .
    .
}

MessageListener is an abstract class that implements listener and publisher logic. A MessageListener object listens to only one list, but can publish to multiple channels (MessageFilter objects). This solution requires a separate MessageFilter object for each subscriber down the pipe.

class MessageListener implements Runnable{
    private String name = null;
    private MessageList inboundList = null;
    Map<String, MessageFilter> outBoundMsgFilters = new HashMap<String, MessageFilter>();
    .
    .
    public void registerOutBoundMessageList(MessageFilter msgFilter){
        if(msgFilter != null){
            if(outBoundMsgFilters.get(msgFilter.name) == null){
                outBoundMsgFilters.put(msgFilter.name, msgFilter);
            }
        }
    }
    .
    .
    @Override
    public void run(){
        .
        while(true){
            String msg = inboundList.pop();
            processMessage(msg);
        }
        .
    }

    .
    protected void pushMessage(String msg) throws Exception{
        Set<String> outBoundMsgNames = outBoundMsgFilters.keySet();
        for(String name : outBoundMsgNames){
            MessageFilter msgFilter = outBoundMsgFilters.get(name);
            msgFilter.filterAndPush(msg);
        }
    }
}

MessageFilter is a parent class facilitating the filterAndPush() method. As data flows through the ingest system, it is often filtered or transformed before being sent to the next stage. Classes that extend the MessageFilter class override the filterAndPush() method, and implement their own logic to push the filtered message to the next list.

public class MessageFilter{

    MessageList messageList = null;
    .
    .
    public void filterAndPush(String msg) throws Exception{
        messageList.push(msg);
    }
    .
    .
}

AllTweetsListener is a sample implementation of a MessageListener class. It listens to all tweets on the AllData list, and publishes the data to EnglishTweetsFilter and InfluencerFilter.

public class AllTweetsListener extends MessageListener{
    .
    .
    public static void main(String[] args) throws Exception{
        MessageListener allTweetsProcessor = AllTweetsListener.getInstance();

        allTweetsProcessor.registerOutBoundMessageList(
                new EnglishTweetsFilter("EnglishTweetsFilter", "EnglishTweets"));
        allTweetsProcessor.registerOutBoundMessageList(
                new InfluencerFilter("InfluencerFilter", "Influencers"));

        allTweetsProcessor.start();
    }
    .
    .
}

EnglishTweetsFilter extends MessageFilter. This class implements logic to select only those tweets that are marked as English tweets. The filter discards non-English tweets and pushes English tweets to the next list.

public class EnglishTweetsFilter extends MessageFilter{

    public EnglishTweetsFilter(String name, String listName) throws Exception{
        super(name, listName);
    }

    @Override
    public void filterAndPush(String message) throws Exception{
        JsonParser jsonParser = new JsonParser();

        // The popped message has the form "[listName, tweetJson]", so parse
        // it as an array and take the second element
        JsonElement jsonElement = jsonParser.parse(message);
        JsonArray jsonArray = jsonElement.getAsJsonArray();
        JsonObject jsonObject = jsonArray.get(1).getAsJsonObject();
        if(jsonObject.get("lang") != null &&
                jsonObject.get("lang").getAsString().equals("en")){
            Jedis jedis = super.getJedisInstance();
            if(jedis != null){
                jedis.lpush(super.name, jsonObject.toString());
            }
        }
    }
}

Ingest using Redis Sorted Sets

One of the concerns with the Pub/Sub method is that it is susceptible to connection loss and hence unreliable. The challenge with the Redis Lists solution is the problem of data duplication and tight coupling between producers and consumers.

The Redis Sorted Sets solution addresses both of these issues. A counter tracks the number of messages, and the messages are indexed against this message count. They are stored in a non-ephemeral state inside the Sorted Sets data structure, which is polled by consumer applications. The consumers check for new data and pull messages by running the ZRANGEBYSCORE command.
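
The counter-plus-Sorted-Set mechanics can be sketched without a Redis server. The class below is a hypothetical illustration that uses a TreeMap in place of the Sorted Set: publish() plays the role of INCR followed by ZADD, and range() the role of ZRANGEBYSCORE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory sketch of the Sorted Set ingest pattern: a counter assigns each
// message a monotonically increasing score, and consumers fetch any score
// range on demand -- nothing is removed on read.
class SortedSetIngestSketch {
    private final TreeMap<Long, String> zset = new TreeMap<>(); // score -> message
    private long counter = 0;                                   // ~ INCR <name>:count

    public long publish(String message) {
        counter++;                      // ~ INCR
        zset.put(counter, message);     // ~ ZADD <name> counter message
        return counter;
    }

    // ~ ZRANGEBYSCORE <name> (lastSeen upTo -- exclusive lower, inclusive upper
    public List<String> range(long lastSeen, long upTo) {
        return new ArrayList<>(zset.subMap(lastSeen, false, upTo, true).values());
    }
}
```

Because reads never remove anything, a consumer can fetch the delta since its last checkpoint, replay history, or share the same stored copy with many other consumers.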

Figure 6. Fast data ingest with Redis Sorted Sets and Pub/Sub (Image: Redis Labs)

Unlike the previous two solutions, this one allows subscribers to retrieve historical data when needed, and to consume it more than once. Only one copy of the data is stored at each stage, making it ideal for situations where the consumer-to-producer ratio is very high. However, this approach is more complex and less cost-effective than the previous two solutions.

Pros

  • It can fetch historical data when needed, because retrieved data is not removed from the Sorted Set.
  • The solution is resilient to data connection losses, because producers and consumers require no connection between them.
  • Only one copy of data is stored at each stage, making it ideal for situations where the consumer to producer ratio is very high.

Cons

  • Implementing the solution is more complex.
  • More storage space is required, as data is not deleted from the database when consumed. 

Code design for the Redis Sorted Sets solution

Figure 7. Class diagram of the fast data ingest solution with Redis Sorted Sets (Image: Redis Labs)

You can download the source code here: https://github.com/redislabsdemo/IngestSortedSet. The main classes are explained below.

SortedSetPublisher inserts a message into a Sorted Set and increments the counter that tracks new messages. In many practical cases, the counter can be replaced by a timestamp.

public class SortedSetPublisher
{
    public static String SORTEDSET_COUNT_SUFFIX = "count";

    // Redis connection
    RedisConnection conn = null;

    // Jedis object
    Jedis jedis = null;

    // Name of the Sorted Set data structure
    private String sortedSetName = null;

    /*
     * @param name: name of the Sorted Set
     */
    public SortedSetPublisher(String name) throws Exception{
        sortedSetName = name;
        conn = RedisConnection.getRedisConnection();
        jedis = conn.getJedis();
    }

    /*
     * Increments the message counter and inserts the message into the
     * Sorted Set, scored by the counter value
     */
    public void publish(String message) throws Exception{
        // Get count
        long count = jedis.incr(sortedSetName+":"+SORTEDSET_COUNT_SUFFIX);

        // Insert into sorted set
        jedis.zadd(sortedSetName, (double)count, message);
    }
}

The SortedSetFilter class is a parent class that implements logic to learn about new messages, pull them from the database, filter them, and push them to the next level. Classes that implement custom filters extend this class and override the processMessage() method with a custom implementation.

public class SortedSetFilter extends Thread
{
    // RedisConnection to query the database
    protected RedisConnection conn = null;

    protected Jedis jedis = null;

    protected String name = "SortedSetSubscriber"; // default name

    protected String subscriberChannel = "defaultchannel"; // default name

    // Name of the Sorted Set
    protected String sortedSetName = null;

    // Channel (sorted set) to publish to
    protected String publisherChannel = null;

    // The key of the last message processed
    protected String lastMsgKey = null;

    // The key of the latest message count
    protected String currentMsgKey = null;

    // Count of the last message processed
    protected volatile String lastMsgCount = null;

    // Publisher for the next level
    protected SortedSetPublisher sortedSetPublisher = null;

    public static String LAST_MESSAGE_COUNT_SUFFIX = "lastmessage";

    /*
     * @param name: name of the SortedSetFilter object
     * @param subscriberChannel: name of the channel to listen to for the
     * availability of new messages
     * @param publisherChannel: name of the channel to publish the availability
     * of new messages
     */
    public SortedSetFilter(String name, String subscriberChannel,
            String publisherChannel) throws Exception{
        this.name = name;
        this.subscriberChannel = subscriberChannel;
        this.sortedSetName = subscriberChannel;
        this.publisherChannel = publisherChannel;
        this.lastMsgKey = name+":"+LAST_MESSAGE_COUNT_SUFFIX;
        this.currentMsgKey = subscriberChannel+":"+SortedSetPublisher.SORTEDSET_COUNT_SUFFIX;
    }

    @Override
    public void run(){
        try{
            // Connection for reading/writing to sorted sets
            conn = RedisConnection.getRedisConnection();
            jedis = conn.getJedis();
            if(publisherChannel != null){
                sortedSetPublisher = new SortedSetPublisher(publisherChannel);
            }

            // Load delta data since the last connection
            while(true){
                fetchData();
            }
        }catch(Exception e){
            e.printStackTrace();
        }
    }

    /*
     * fetchData() loads the count of the last message processed, then loads
     * all messages that have arrived since that count.
     */
    private void fetchData() throws Exception{
        if(lastMsgCount == null){
            lastMsgCount = jedis.get(lastMsgKey);
            if(lastMsgCount == null){
                lastMsgCount = "0";
            }
        }

        String currentCount = jedis.get(currentMsgKey);

        if(currentCount != null && Long.parseLong(currentCount) >
                Long.parseLong(lastMsgCount)){
            loadSortedSet(lastMsgCount, currentCount);
        }else{
            Thread.sleep(1000); // sleep for a second if there's no data to fetch
        }
    }

    // Loads the data from lastMsgCount to currentCount
    private void loadSortedSet(String lastMsgCount, String currentCount)
            throws Exception{
        // Read from the Sorted Set
        Set<Tuple> countTuples = jedis.zrangeByScoreWithScores(sortedSetName,
                lastMsgCount, currentCount);
        for(Tuple t : countTuples){
            processMessageTuple(t);
        }
    }

    // Updates the checkpoint and delegates filtering to processMessage()
    private void processMessageTuple(Tuple t) throws Exception{
        long score = (long)t.getScore();
        String message = t.getElement();
        lastMsgCount = Long.toString(score);
        processMessage(message);

        jedis.set(lastMsgKey, lastMsgCount);
    }

    // Override this method to customize the filter
    protected void processMessage(String message) throws Exception{
    }
}

EnglishTweetsFilter extends SortedSetFilter, overriding processMessage() to select only those tweets that are marked as English.

public class EnglishTweetsFilter extends SortedSetFilter
{
    /*
     * @param name: name of the SortedSetFilter object
     * @param subscriberChannel: name of the channel to listen to for the
     * availability of new messages
     * @param publisherChannel: name of the channel to publish the availability
     * of new messages
     */
    public EnglishTweetsFilter(String name, String subscriberChannel, String publisherChannel) throws Exception{
        super(name, subscriberChannel, publisherChannel);
    }

    @Override
    protected void processMessage(String message) throws Exception{
        // Filter: keep English tweets, then publish them to the next Sorted Set
        JsonParser jsonParser = new JsonParser();

        JsonElement jsonElement = jsonParser.parse(message);
        JsonObject jsonObject = jsonElement.getAsJsonObject();

        if(jsonObject.get("lang") != null &&
                jsonObject.get("lang").getAsString().equals("en")){
            System.out.println(jsonObject.get("text").getAsString());
            if(sortedSetPublisher != null){
                sortedSetPublisher.publish(jsonObject.toString());
            }
        }
    }

    /*
     * Main method to start EnglishTweetsFilter
     */
    public static void main(String[] args) throws Exception{
        EnglishTweetsFilter englishFilter = new EnglishTweetsFilter("EnglishFilter", "alldata", "englishtweets");
        englishFilter.start();
    }
}
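
The checkpointing at the heart of SortedSetFilter can be distilled into a few lines. The sketch below is illustrative only (it uses a NavigableMap in place of the Sorted Set and a plain field in place of the persisted lastMsgKey): the consumer remembers the score of the last message it processed and, on each poll, fetches only the delta up to the producer's current counter, so nothing is skipped or reprocessed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Distilled version of the SortedSetFilter polling loop: track the score of
// the last processed message and fetch only messages scored between that
// checkpoint and the producer's current counter
// (~ ZRANGEBYSCORE sortedSetName lastMsgCount currentCount).
class CheckpointedConsumer {
    private long lastMsgCount = 0;                    // ~ value stored at lastMsgKey
    private final List<String> processed = new ArrayList<>();

    public void poll(NavigableMap<Long, String> zset, long currentCount) {
        if (currentCount <= lastMsgCount) {
            return; // nothing new; the real code sleeps and retries
        }
        for (Map.Entry<Long, String> e :
                zset.subMap(lastMsgCount, false, currentCount, true).entrySet()) {
            processed.add(e.getValue());              // ~ processMessage(message)
            lastMsgCount = e.getKey();                // advance the checkpoint
        }
    }

    public List<String> processed() {
        return processed;
    }
}
```

Because the checkpoint advances per message and the data stays in the structure, a consumer that crashes and restarts simply resumes from its last recorded count.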

Final thoughts

When using Redis for fast data ingest, its data structures and pub/sub functionality offer a number of options for implementation. Each approach has its advantages and disadvantages. Redis Pub/Sub is easy to implement, and producers and consumers are decoupled. But Pub/Sub is not resilient to connection loss, and it requires many connections. It’s typically used for e-commerce workflows, job and queue management, social media communications, gaming, and log collection.

The Redis Lists method is also easy to implement, and unlike with Pub/Sub, data is not lost when the subscriber loses the connection. Disadvantages include tight coupling of producers and consumers and the duplication of data for each consumer, which makes it unsuitable for some scenarios. Suitable use cases would include financial transactions, gaming, social media, IoT, and fraud detection.

The Redis Sorted Sets method has a larger footprint and is more complex to implement and maintain than the Pub/Sub and Lists methods, but it overcomes their limitations. It is resilient to connection loss, and because retrieved data is not removed from the Sorted Set, it allows for time-series queries. And because only one copy of the data is stored at each stage, it is very efficient in cases where one producer has many consumers. The Sorted Sets method is a good match for IoT transactions, financial transactions, and metering.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Source: InfoWorld Big Data

Report: Federal IT Pulling Back From Pure-Play Public Cloud Infrastructures

Cloud-savvy federal IT decision makers are opting for hybrid cloud models over pure-play public cloud infrastructures as they seek to modernize and secure government systems, according to an independent survey underwritten by Nutanix. The survey, conducted by Market Connections, Inc., yielded several key findings. Cost savings using a public-only approach, while possible, have not lived up to the initial hype of cloud computing. While 39% of public cloud users indicated that cost savings were ‘great,’ the majority of respondents (61%) noted minimal results, ranging from ‘some savings’ to ‘no savings’ at all. Respondents also noted that not every workload is optimal to run in a public cloud, with financials (43%), custom or mission-specific applications (36%), and human resources/ERP applications (34%) considered the least suited for the public cloud.

The most surprising result was that, as a group, more experienced public cloud users forecasted increasing the proportion of application workloads that they run in their private clouds over the next two years, indicating that more experienced cloud users are increasingly leveraging hybrid models to optimize their environments.

“Federal agencies are realizing that a wholesale move to the public cloud is not always the best approach to meet their desired outcomes,” said Chris Howard, Vice President of Federal, Nutanix. “There is a clear opportunity to achieve the benefits of cloud with a hybrid approach, keeping predictable application workloads on-prem and using public cloud for dynamic applications that require extra capacity for finite periods of time.”

The survey of 150 defense, civilian, and intelligence agency IT decision makers sought to determine whether the move to cloud computing has fulfilled agency expectations since the Cloud First Mandate was issued in 2010. Key areas of focus for the study were cost savings, security and applicability of cloud for all application workloads.

The blind online survey was comprised of Department of Defense, military service or intelligence agency respondents (45%), and federal civilian or independent government agencies, including legislative and judicial respondents (55%). All respondents were familiar with their agency’s cloud usage.

To access the full report and survey results, please visit http://www.nutanix.com/FedStudy.

Source: CloudStrategyMag

Ensono Cloud Deploys NetApp SolidFire

Ensono has recently deployed NetApp SolidFire in Ensono Cloud. With SolidFire, the Tier-one quality service provider accelerates its shift to a software-defined data center with performance guarantees, flexibility, and scalability.

 “Ensono is committed to delivering a superior managed service experience to our clients. With NetApp® SolidFire®, our clients can focus on accelerating their businesses and leave the burden of managing IT to us,” said Oliver Presland, VP of Global Product Management, Ensono. “Simply put, NetApp SolidFire enables us to deliver more value with less resources. It has become a key enabler of our international cloud platform expansion and our shift toward a software-defined data center.”

According to a survey by Harvey Nash and KPMG, CIO priorities are broadening to include automation and improving agility, along with increasing operational efficiencies, improving business processes, delivering consistent and stable IT performance to the business, and saving money. As a leading hybrid IT solutions and governance provider, U.S.- and UK-based Ensono is answering these rapidly changing business priorities with its new secure private cloud platform, Ensono Cloud.

As the storage foundation designed to run business-critical applications for Ensono Cloud, Ensono deployed NetApp SolidFire all-flash storage from NetApp to serve its clients in the UK and United States. With SolidFire, Ensono delivers improved managed IT services from an innovative infrastructure that is predictable, automated, scalable, and highly available to support its clients’ business-critical applications.

With granular, volume-level SolidFire quality of service, Ensono delivers high performance of every customer application from its shared infrastructure and offers simple delivery of different service tiers. Its scale-out architecture enables Ensono to achieve nondisruptive system expansion with instant resource availability to better support its business growth.

Through SolidFire automation capabilities and deep integration with its VMware-based environment, Ensono can provision resources more quickly, allowing its staff to focus on high-value activities such as applications management and designing complex client solutions.

The diverse features provided by the NetApp SolidFire system are critical to transitioning to a next-generation, software-defined data center. This transition, combined with improved managed services delivered through the new Ensono Cloud, has enabled Ensono to release a state-of-the-art platform to an international market. Already available in the UK, Ensono Cloud is slated to deploy in the United States by summer 2017, bringing highly regarded Ensono services to new markets.

Source: CloudStrategyMag

All your streaming data are belong to Kafka

Apache Kafka is on a roll. Last year it registered a 260 percent jump in developer popularity, as Redmonk’s Fintan Ryan highlights, a number that has only ballooned since then as IoT and other enterprise demands for real-time, streaming data become common. Kafka was hatched at LinkedIn, and its founding engineering team spun out to form Confluent, which has been a primary developer of the Apache project ever since.

But not the only one. Indeed, given the rising importance of Kafka, more companies than ever are committing code, including Eventador, started by Kenny Gorman and Erik Beebe, both co-founders of ObjectRocket (acquired by Rackspace). Whereas ObjectRocket provides the MongoDB database as a service, Eventador offers a fully managed Kafka service, further lowering the barriers to streaming data.

Talking with the Eventador co-founders, it became clear that streaming data is different, requiring “fresh eyes” because “data being mutated in real time enables new use cases and new possibilities.” Once an enterprise comes to depend on streaming data, it’s hard to go back. Getting to that point is the key.

Kafka vs. Hadoop

As popular as Apache Hadoop has been, the Hadoop workflow is simply too slow for the evolving needs of modern enterprises. Indeed, as Gorman tells it, “Businesses are realizing that the value of data increases as it becomes more real-time.” Companies that prefer to wait on adding a real-time data flow to their products and services risk the very real likelihood that their competitors are not content to sit on their batchy laurels.

This trend is driving the adoption of technologies that can reliably and scalably deliver and process data in as near real time as possible. New frameworks dedicated to this architecture were needed; hence, Apache Kafka was born.

What about Apache Spark? Well, as Gorman points out, Spark is capable of real-time processing, but isn’t optimally suited to it. The Spark streaming frameworks are still micro-batch by design.

This leaves Kafka, which “can offer a true exactly once, one-at-a-time processing solution for both the transport and the processing framework,” Gorman explains. Beyond that, additional components like Apache Flink, Beam, and others extend the functionality of these real-time pipelines to allow for easy mutation, aggregation, filtering, and more: all the things that make up a mature, end-to-end, real-time data processing system.

Kafka’s pub-sub model

This wouldn’t matter if Kafka were a beast to learn and implement, but it’s not (on either count). As Gorman highlights, “The beauty of Apache Kafka is it exposes a powerful API yet has very simple semantics. It is all very approachable.” Not only that, but its API has been implemented in many different programming languages, so the odds are good that your favorite language has a driver available.

Kafka has the notion of a topic, which is simply a namespace for a stream of data. It’s very simple to publish data to a topic, and Kafka handles the routing, scalability, durability, availability, etc. Multiple consumers coordinate subscription to these topics, to fetch data and process or route it. Asked about how this translates into the application development experience, Gorman stressed that it’s not trivial but it’s straightforward: “Building applications that work with Kafka is fairly easy [as] the client libraries handle much of the nuances of the communication, and developers utilize the API to publish or subscribe to streams of data.”
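To make those semantics concrete, here is a toy, in-memory sketch (not a real Kafka client, and deliberately ignoring partitions, durability, and networking): producers append to a named topic log, and each consumer group keeps its own offset, so every group independently sees the full stream exactly once.

```python
# Toy sketch of Kafka-style topic semantics: a topic is an append-only
# log, and each consumer group tracks its own read offset into it.
from collections import defaultdict

class ToyBroker:
    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> append-only message log
        self.offsets = defaultdict(int)   # (topic, group) -> next unread offset

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, topic, group):
        """Return this group's unread messages and advance its offset."""
        log = self.topics[topic]
        start = self.offsets[(topic, group)]
        self.offsets[(topic, group)] = len(log)
        return log[start:]

broker = ToyBroker()
broker.publish("rides", {"rider": "r1", "city": "Austin"})
broker.publish("rides", {"rider": "r2", "city": "Boston"})

print(broker.poll("rides", "pricing"))    # both messages
print(broker.poll("rides", "pricing"))    # [] -- nothing new for this group
print(broker.poll("rides", "dashboard"))  # a new group sees the full stream
```

A real deployment would use a client library in your language of choice against a running broker; this sketch only shows why multiple subscribers can consume the same topic without interfering with one another.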

The problem, if any, isn’t the technology. Rather, it’s a question of paradigms.

The real trick for developers, Gorman tells me, is “to think about using streaming data with a fresh pair of eyes.” Why? Because “data being mutated in real time enables new use cases and new possibilities.”

Let’s look at a tangible example. Perhaps a client publishes data about ridership of a ride-sharing service. One set of consumers analyzes this stream to run machine learning algorithms for dynamic pricing, another set reads the data to provide the location and availability of cars to customers’ mobile devices, and yet another consumer feeds an aggregation framework for ridership data to internal dashboards. Kafka sits at the core of a data architecture that can feed all kinds of business needs, all in real time.
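That fan-out pattern can be sketched in a few lines; the event fields and consumer names below are invented for illustration, with a plain list standing in for the topic. Each consumer derives its own view from the same stream, just as separate consumer groups would.

```python
# One ride-event stream, three independent consumers: each reads the
# full stream and computes a different downstream view.
events = [
    {"city": "Austin", "car": "c1", "status": "busy"},
    {"city": "Austin", "car": "c2", "status": "free"},
    {"city": "Boston", "car": "c3", "status": "free"},
]

def pricing(stream):
    # Dynamic-pricing input: overall share of busy cars.
    busy = sum(e["status"] == "busy" for e in stream)
    return busy / len(stream)

def availability(stream):
    # Mobile-app input: which cars are currently free, by city.
    free = {}
    for e in stream:
        if e["status"] == "free":
            free.setdefault(e["city"], []).append(e["car"])
    return free

def dashboard(stream):
    # Internal dashboard: ride events seen per city.
    counts = {}
    for e in stream:
        counts[e["city"]] = counts.get(e["city"], 0) + 1
    return counts

print(pricing(events))       # 0.3333333333333333
print(availability(events))  # {'Austin': ['c2'], 'Boston': ['c3']}
print(dashboard(events))     # {'Austin': 2, 'Boston': 1}
```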

Kafka in the cloud

This is great for developers and the companies for which they work, but Kafka demand is no guarantee of Eventador’s success, given that it has to compete with Confluent, which has the distinction of being the founder of Kafka. What’s more, Confluent, too, has announced a cloud offering that likely will compete with Eventador’s Kafka service.

Gorman is not bothered. As he describes,

The real difference is that we aren’t limited just to Kafka. We use Kafka where it makes the most sense. We are an end-to-end, enterprise-grade, stream processing framework built on Apache Kafka and Apache Flink. We have connectors for AWS S3, a REST interface, integration with PrestoDB and Jupyter notebooks, as well as connections for popular databases and even other streaming systems like AWS Kinesis. We offer plans from a simple single node to full on-prem enterprise configurations.

Besides, given the booming demand for real-time data, Gorman believes there is room for many different players. Not only does Eventador complement Kafka with Flink and more, it has taken to heart Rackspace’s mantra for “fanatical customer support,” which starts with a well-built, fully integrated product. Having spent decades doing operations for some of the world’s largest companies, Gorman continues, “We know what it means to run a first class, professional quality, rock solid, as-a-service offering.”

He’s absolutely right that the market is still young. Developers are still working to understand how Kafka can be integrated into their projects. The use cases are expanding every day, driven by this need to compete with data.

Years from now, however, “It will be common to rely on streaming data in your infrastructure,” Gorman points out, “and not just some ancillary workload.” This is the future they’re building for. “Once you start expecting data to be more real-time, it’s hard to stop.” Eventador, Confluent, and undoubtedly others are building for this real-time, streaming data future. For some, that future is now. For others, these startups hope to get them there sooner.

Source: InfoWorld Big Data

Data is eating the software that is eating the world

No one doubts that software engineering shapes every last facet of our 21st century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with “programming.” Just as significant, he equated “eating” with industry takeovers by “Silicon Valley-style entrepreneurial technology companies” and then rattled through the usual honor roll of Amazon, Netflix, Apple, Google, and the like. What they, and others cited by Andreessen, have in common is that they built global-scale business models on the backs of programmers who bang out the code that drives web, mobile, social, cloud, and other 24/7 online channels.

Since the piece was published in the Wall Street Journal in 2011, we’ve had more than a half-decade to see whether Andreessen’s epic statement of Silicon Valley triumphalism proved either prescient or, perhaps, merely self-serving and misguided. I’d say it comes down more on the prescient end of the spectrum, because most (but not all) of the success stories he cited have continued their momentum in growth, profitability, acquisitions, innovation, and so forth. People from programming backgrounds – such as Mark Zuckerberg – are indeed the multibillionaire rockstars of this new business era. In this way, Andreessen has so far been spared the fate of Tom Peters, who saw many of the exemplars he cited in his 1982 bestseller “In Search of Excellence” go on to be deconstructed by business rivals or blindsided by trends they didn’t see coming.

Rise of the learning machines

However, it has become clear to everyone, especially the old-school disruptors cited by Andreessen, that “software,” as it’s normally understood, is not the secret to future success. Going forward, the agent of disruption will be the data-driven ML (machine learning) algorithms that power AI. In this new era, more of the logic that powers intelligent applications won’t be explicitly programmed. The days of predominantly declarative, deterministic, and rules-based application development are fast drawing to a close. Instead, the probabilistic logic at the heart of chatbots, recommendation engines, self-driving vehicles, and other AI-powered applications is being harvested directly from source data.

The “next best action” logic permeating our lives is evolving inside of applications through continuous inference from data originating in the Internet of Things and other production applications. Consequently, there will be a diminishing need for programmers, in the traditional sense of people who build hard-and-fast application logic. In their place, the demand for a new breed of developer – the data scientist – will continue to grow. This term refers to the wide range of specialists who craft, train, and manage the regression models, neural networks, support vector machines, unsupervised learning models, and other ML algorithms upon which AI-centric apps depend.

To compound the marginalization of programmers in this new era, we’re likely to see more ML-driven code generation along the lines that I discussed in this recent post. Amazon, Google, Facebook, Microsoft, and other software-based powerhouses have made huge investments in data science, hoping to buoy their fortunes in the post-programming era. They all have amassed growing sets of training data from their ongoing operations. For these reasons, the “Silicon Valley-style” monoliths are confident that they have the resources needed to build, tune, and optimize increasingly innovative AI/ML-based algorithms for every conceivable application.

However, any strategic advantages that these giants gain from these AI/ML assets may be short-lived. Just as data-driven approaches are eroding the foundations of traditional programming, they’re also beginning to nibble at the edges of what highly skilled data scientists do for a living. These trends are even starting to chip away at the economies of scale available to large software companies with deep pockets.

AI and the Goliaths

We’re moving into an era in which anyone can tap into cloud-based resources to cheaply automate the development, deployment, and optimization of innovative AI/ML apps. In a “snake eating its own tail” phenomenon, ML-driven approaches will increasingly automate the creation and optimization of ML models, per my discussion here. And, from what we’re seeing in research initiatives such as Stanford’s Snorkel project, ML will also play a growing role in automating the acquisition and labeling of ML training data. What that means is that, in addition to abundant open-source algorithms, models, code, and data, the next-generation developer will also be able to generate ersatz but good-enough labeled training data on the fly to tune new apps for their intended purposes.
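The weak-supervision idea behind projects like Snorkel can be sketched in miniature: several noisy heuristic "labeling functions" vote on unlabeled examples, and their majority vote becomes good-enough synthetic training labels. The heuristics and data below are invented for illustration and are far cruder than Snorkel's actual generative label model.

```python
# Weak supervision in miniature: noisy labeling functions vote on an
# unlabeled example; the majority of non-abstaining votes is the label.
SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_has_link(text):
    return SPAM if "http://" in text else ABSTAIN

def lf_all_caps_word(text):
    return SPAM if any(w.isupper() and len(w) > 2 for w in text.split()) else ABSTAIN

def lf_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LABELING_FUNCTIONS = [lf_has_link, lf_all_caps_word, lf_greeting]

def weak_label(text):
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no heuristic fired; leave the example unlabeled
    return max(set(votes), key=votes.count)

print(weak_label("CLICK here http://x.example to WIN"))  # 1 (spam)
print(weak_label("Hello, lunch tomorrow?"))              # 0 (ham)
print(weak_label("quarterly numbers attached"))          # -1 (abstain)
```

The point is that none of the individual heuristics need be reliable; aggregated over many examples, their votes yield training data cheap enough to regenerate on the fly for a new application.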

As the availability of low-cost generative training data grows, the established software companies’ massive data lakes, in which their developers maintain petabytes of authentic from-the-source training data, may become more of an overhead burden than a strategic asset. Likewise, the need to manage the complex data-preparation logic for use of this source data may become a bottleneck that impedes the ability of developers to rapidly build, train, and deploy new AI apps.

When any developer can routinely make AI apps just as accurate as Google’s or Facebook’s – but with far less expertise, budget, and training data than the big shots – a new era will have dawned. When we reach that tipping point, the next generation of data-science-powered disruptors will start to eat away at yesteryear’s software startups.

Source: InfoWorld Big Data

Dataguise Presents Data-Centric Audit and Protection (DCAP) Solutions At AWS Summit

Dataguise has announced that the company will exhibit at the AWS Summit in Chicago. Dataguise will showcase its award-winning security and compliance solution for the detection, protection, monitoring, and auditing of sensitive data in Amazon Simple Storage Service (Amazon S3), Redshift, and all databases supported by RDS. During the event, Dataguise will feature enterprise customer deployments of DgSecure where IT professionals have securely migrated enterprise data to AWS and leveraged the highly scalable platform in financial services, health care, insurance, and other tightly regulated industries.

The AWS Summit in Chicago takes place at the McCormick Place Lakeside Center from July 26-27 and is a popular industry conference that brings together the cloud computing community to connect, collaborate and learn about AWS. Dataguise will show DgSecure, a multi-platform data-centric security solution which integrates with AWS to scan and locate all sensitive information stored on Amazon S3, Redshift, and RDS to reveal the location and status of this information throughout its lifecycle. The solution also provides data masking for databases supported by RDS, including SQL Server, Oracle, Postgres, and MySQL.

“As an established Amazon AWS partner, Dataguise has invested the time and resources required to develop the most effective solution for Amazon’s cloud infrastructure platform,” said JT Sison, VP, Marketing and Business Development for Dataguise. “As a result, DgSecure enables users to unleash the power of data in the cloud in the most secure and compliant way possible. The presentation of this technology at the AWS Chicago Summit is expected to draw interest from enterprises with deployments on AWS infrastructure as an increasing number of workloads move to this platform.”

Source: CloudStrategyMag

Cloud Technology Partners Launches Cloud Kickstart For AWS

Cloud Technology Partners (CTP) has announced that it has launched Cloud Kickstart for Amazon Web Services (AWS), a fixed-time, fixed-price offering that enables clients to safely and quickly leverage the power of AWS across their organization.

“Regardless of whether clients are looking at one workload, multiple workloads, or an entire portfolio, transforming from on-premise to cloud-based IT requires a deep understanding of the best technologies, tools, and vendors to build the right end-to-end solution,” said Robert Christiansen, vice president and cloud adoption practice lead, Cloud Technology Partners. “Cloud Kickstart for AWS takes the guess work out of standing up an enterprise-grade AWS environment and includes all the processes and tools you need to enable your team to safely and securely start provisioning cloud resources.”

Cloud Kickstart for AWS enables your team to quickly start leveraging a secure cloud environment and gain instant awareness of the transformative power of the cloud within your organization. Reducing time to cloud from up to a year to just six weeks, Cloud Kickstart for AWS provides an enterprise-grade AWS environment with all of the essential automation, security, governance, and compliance controls in place including:

  • CTP’s recommended tagging standards
  • AWS account standards
  • IPSec VPN connectivity to AWS
  • Platform common services VPC (both production and nonproduction)
  • Active Directory integration for controlling access to tooling and environment
  • Automated creation of Windows and Linux AMIs
  • AMI snapshot management
  • Creation of custom IAM Roles
  • SDLC toolchain using best-of-breed automation tools

Cloud Kickstart helps companies realize time to value faster with pre-built continuous integration and continuous testing capabilities that provide immense efficiency gains to DevOps teams, as well as best-in-class security, compliance, and analytics capabilities to keep your organization’s data secure.

In addition to these features, Cloud Kickstart for AWS also includes a robust reference architecture created with best-of-breed tools in the AWS ecosystem to create a simple package for rapid AWS adoption at scale. From configuration management with Chef to security leveraging Trend Micro, Dome9 Arc, and Vault, to operational analytics using Sumo Logic, Cloud Kickstart for AWS includes all the tools you need to ensure a secure and compliant AWS environment. Today’s Cloud Kickstart for AWS reference architecture includes these proven solutions:

  • Host-based anti-malware, intrusion prevention and detection, integrity monitoring and application control (Trend Micro)
  • Network security and IAM monitoring and enforcement (Dome9 Arc)
  • Secrets store, temporary access leases (Vault)
  • Encryption for EBS volumes (AWS)
  • Monitoring (AWS CloudWatch)
  • Consolidated logging including CloudTrail, Build, Automation, and security events across all accounts (Sumo Logic)
  • Configuration management (Chef)
  • Orchestration (Jenkins)
  • Binary repository (Artifactory)
  • Code repository (Git)
  • Managed Cloud Controls for Continuous Compliance and Continuous Cost Control (CTP)

“Moving to the cloud is one of the most transformative changes an organization can make, however, it can be disruptive to the business without the right tools or expertise,” said William Fellows, founder and research vice president, 451 Group. “Today, organizations are seeking methods to help them transition to the cloud more quickly and securely. With Cloud Kickstart for AWS, Cloud Technology Partners has brought together best-of-breed partners to address this business need and help organizations rapidly take advantage of the cloud.”

Source: CloudStrategyMag

Logicalis US Expands Its Microsoft CSP Program

Logicalis US has announced it has expanded its Microsoft Cloud Solution Provider (CSP) program and is relaunching its offerings to include additional professional and managed services attractive to organizations of all sizes — including enterprise organizations. To help CIOs wondering if a CSP relationship is right for their business, the Microsoft solution experts at Logicalis have identified seven reasons CIOs should consider a CSP relationship.

Microsoft’s CSP program provides organizations an alternative consumption model for procuring cloud-based Office 365 and Azure subscriptions that does not require the kind of stringent, three-year commitment inherent in larger, volume-based on-premise enterprise agreements (EAs). Additionally, by bundling the right professional and managed services, Logicalis, a Microsoft Gold Partner and one of the top 200 Microsoft solution providers of 2017, is able to provide comprehensive Office 365 and Azure solutions that take advantage of the cloud’s flexibility, delivering a monthly consumption- or usage-based model capable of meeting the varied needs of customers of all sizes — small to large.

“Providing our Microsoft customers – including enterprise-level clients – with a model that can adapt to changes in their business is particularly important to organizations that may not be able to predict business fluctuations three years in advance,” says Wendy McQuiston, director, Microsoft Professional Services, Logicalis US. “If a customer needs 1,000 seats of Office 365 this month, then wants to decrease those seats to 800 the following month, under their CSP agreement with Logicalis, they can do that with no financial penalty. Our CSP program combines the product pricing as well as the level of support that fits the customer’s changing needs and digital transformation strategy — no more money lost due to non-consumption or increases based on the difference between EA estimates and actual usage. Logicalis’ Microsoft CSP program allows any size customer to purchase and pay only for the subscriptions and services they actually use on a month-to-month basis.”

Seven Benefits of a Microsoft CSP Relationship

Logicalis’ Microsoft CSP program can help users of any size take advantage of a more flexible, pay-as-you-go, consumption-based model with the option to bundle Microsoft cloud subscriptions like Office 365 and Azure with a host of professional and managed services designed to ensure those solutions are functioning as expected and to assist with the migration to those services.

  • Pay-As-You-Go: No more long-term pre-pay service commitments, enabling a faster ROI on Microsoft Office 365 and Azure cloud implementations and services.
  • Migration Help: Built-in migration assistance to move users to the cloud.
  • Professional Services Included: Security, identity management and application needs as well as ongoing support services are bundled into the Logicalis CSP professional services engagement.
  • Faster Issue Resolutions: Logicalis US becomes the CIO’s main point of contact for issue resolution, delivering the right level of support faster.
  • Digital Transformation Strategies: By partnering with Logicalis, CIOs will learn how to combine Microsoft solutions with other multi-vendor solutions to drive their organization’s digital transformation.
  • Logicalis-Direct Invoicing: Cloud subscription invoicing will come directly from Logicalis, unless customers choose to retain on-premise EA subscriptions.
  • Combined Program Capabilities: Organizations can continue to purchase their on-premise EA licensing direct from Microsoft and still utilize the Logicalis CSP program for their cloud subscriptions or alternatively add professional or managed services support to services procured through an EA.

Source: CloudStrategyMag

Archer Voted Best Cloud-Based Services Provider

Archer has been voted best cloud-based services provider in the 2017 Waters Rankings. The win was revealed at today’s Waters Rankings Awards Presentation luncheon held at the Metropolitan Club in New York.

The 15th annual Waters Rankings recognize the investment management community’s leading solutions providers. Winners were determined based on a survey of financial technology professionals who rated their peers in terms of overall quality of service.

“This is a fantastic team win for Archer,” said Bryan Dori, Archer CEO. “Firms competing for Waters’ Best cloud-based services provider category include the strongest brands in Fintech.”

Dori adds, “Our flexible cloud-based services model is ideal for investment managers positioning for growth. Firms are able to focus on investing and client relationships with scalable technology and services. The Best cloud-based services provider win is an acknowledgement of the value of our integrated model as much as the way we deliver it.”

Winning Waters’ best cloud-based services provider is the latest in a series of accolades received by Archer and its team so far this year. Archer CEO Bryan Dori was named Entrepreneur Of The Year 2017 in the Fintech category in Greater Philadelphia, and Archer CFO Ted Pastva was recognized as a Leading CFO in Philadelphia by Philadelphia Business Journal.

Source: CloudStrategyMag

COPA-DATA Wins The 2017 Microsoft Internet of Things (IoT) Award

Microsoft has crowned COPA-DATA, the Austria-based software manufacturer for industrial automation, as its global Microsoft Partner of the Year in the Internet of Things (IoT) category. The award was presented at this year’s Microsoft Inspire, the company’s worldwide partner conference, which was held between July 9 and 13, 2017 in Washington, D.C., USA. This award is a recognition by Microsoft of COPA-DATA’s innovations and software solutions based on Microsoft technology.

Each year, Microsoft presents its sought-after Partner of the Year Awards at its worldwide partner conference. This year, projects were submitted by more than 2,800 companies across 115 countries in a total of 34 global categories and 103 national categories. Following on from two victories at the Microsoft Awards last year, this is the third prize for COPA-DATA. The company was honored for its solutions and services involving its zenon software system in the global Internet of Things (IoT) category.

“This success is excellent proof of the high quality and performance demonstrated by our IoT solutions based on our zenon software and the latest Microsoft technologies and products, especially the Microsoft Azure cloud platform, including the Azure IoT Suite,” explains Johannes Petrowisch, Global Partner & Business Development Manager at COPA-DATA. “We are extremely pleased with this award and would like to thank all our customers and partners who have made this achievement possible.”

Pioneering Software Solution

Triumph at the 2017 Microsoft Partner of the Year Awards was based on an IoT application by COPA-DATA, which uses the technological interaction between zenon and Microsoft Azure to prepare equipment manufacturers for digitization in the industry. This gives users access to all data relating to individual machines, assembly lines, or a company’s entire production site from a single system. Additional services such as predictive analysis, machine learning, cross-site reporting, remote maintenance, and control can be fully cloud-based or implemented in hybrid scenarios – opening the door to service-oriented business models.

“Our ecosystem of innovative partners is the cornerstone to delivering transformative solutions to our mutual customers,” said Ron Huddleston, Corporate Vice President, One Commercial Partner, Microsoft Corp. “We are pleased to recognize COPA-DATA for being selected as winner of the 2017 Microsoft Internet of Things (IoT) Worldwide Partner of the Year award.”

Source: CloudStrategyMag