Big Data Must Not Be an Elephant Riding a Bicycle

Forrester’s John Rymer sums up his opinion succinctly when he says, “Big Data: The worst category name ever.” [1] Big Data certainly has challenges in both name and how people perceive it; as the hype would have it, I call it “the elephant riding the bicycle.” I’ll give you the seven things you need to consider, but first let’s look at the hype.

Big Data hype is everywhere. Fresh from three conferences in the past four weeks, I’ve noticed that Big Data has been the single biggest topic of discussion with customers, new acquaintances, old friends, co-workers, and industry analysts. It would be hard to overstate how much attention Big Data is receiving.

But as with any hype cycle, there is an enormous amount of questioning about where and how Big Data delivers value. Thinking back to the early Internet, there were very similar types of conversations. In 1995, there was a growing feeling that something big was coming from “The Web,” but it was hard to sort the success from the sales pitch. People were spending money to create websites with little plan for why. A few learned detractors even published bold criticisms.

Despite the doubt [2] and slow start, we found our cyberspace groove and launched wave after wave of new ways to buy, sell, read, listen, watch, and converse. For billions of people, the Internet is simply an expectation.

Faster This Time Around

The pace of change is now much faster, and this time around the pump has been primed. We’ve become very quick at jumping on new technology and far better at marketing software and services. What took years to crank up in the ’90s takes much less time today. In what seems like no time, there is a wide variety of vendors selling brand-new products with Big Data labels, and more than a few old products getting papered over with Big Data buzzwords. The market is confused and perhaps a bit skeptical, but for good reason.

Solutions are well ahead of success stories. There are far more companies selling Big Data solutions than companies providing details around how Big Data is creating value.

What About Hadoop?

Before you think Big Data lacks success stories, let me assure you, it doesn’t. Powerful Big Data analytics have already created enormous value for companies like Google, Facebook, and Yahoo; these companies needed to monetize their vast amounts of member and search data. Apache Hadoop, the leading open source solution, was derived from Google’s MapReduce and the Google File System, and adapted by a Yahoo employee. Facebook uses Hadoop as well.

No one doubts the value those companies gained, but they were in unique positions and employed scores of people in creating and maintaining Big Data solutions. It doesn’t make sense for everyone to go through that effort, or throw money at the scarce data scientists needed to run the system. Most organizations shouldn’t follow that model and for one very good reason that has nothing to do with Hadoop. It has to do with elephants and bicycles.

Rymer weighed in on Hadoop saying, “…too many people now seem to think that Hadoop is Big Data, when Hadoop is just one of the several Big Data solutions available—and Hadoop isn’t good for many Big Data scenarios.”

Elephants On Bicycles

Unlike the Internet circumstances in 1995, Big Data has a higher barrier to entry and requires a broader perspective to connect, understand, anticipate, and act on the intelligence gleaned from data that’s known for its high velocity, large volume, and wide variety of sources and types.

“Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle…it just doesn’t make business sense.”

And there’s another challenge to those thinking about the Google, Facebook, and Yahoo model: The need for operational decision making isn’t sufficiently supported by offline batch “research projects.” Things are happening much faster than batch allows. I’ll touch more on that in a moment.

Deep Foundation

If the data ecosystem isn’t sufficient for creating value from Big Data, value will be elusive. Data will be “dirty” and siloed, and response will be too late. There is a logical path emerging among forward-leaning organizations, analysts, and a small set of software vendors. It is a balanced ecosystem with a deep foundation. It starts even before the data arrives, and it has its roots in how we talk about the challenge and where we want to create value.

As Rymer puts it, “We have yet to see a one-size-fits-all suite or solution for all of these scenarios.” It takes foundational strategy and technology to make Big Data “work.”

Here are our seven steps to avoid getting crushed by the elephant riding a bicycle:

1. Use Case Clarity

Those who haven’t figured out their use cases for Big Data are in danger of confusing the term with its purposes, a focal point of Rymer’s “crabby old guy rant” and a likely reason companies that are finding success don’t like to use the term Big Data. Instead, they refer to their use cases with terms like predictive analytics, digital customer experience management, compliance, sense and respond, and behavioral insights.

The purpose matters more than the hype and the hype will eventually go away. Those without purpose will be in an Emperor’s New Clothes scenario as either the king or his advisers.

2. Data Enablement

You can’t have Big Data without data. While there are companies that simply crunch one-dimensional data, like Twitter feeds or customer shopping patterns, the typical enterprise needs to manage more complex data sets coming from anywhere and everywhere. There is traditional data and its absolute integrity, defined by complex schema, and sitting at rest in databases and log files. There is also the data that arrives in the operational moment, while business is happening and there’s no time to treat, store, and recall. This in-motion data is often unstructured and dynamic. It presents challenges for an organization that can’t grab it in real time and make use of it.

This is where in-memory data grids matter. The ability to use what Rymer calls “elastic caching platforms” allows data at rest and data in motion to be married up in cache, where flexibility and speed matter and many operations and queries can take place at the speed of business.

If you’re skeptical, think of how much information is only important in a moment. We call this volatile data, and it “dies” before it can be extracted, transformed, loaded into a database, and then queried. Without an in-memory cache, volatile data has no value. Commodities and equities trading and healthcare are big users of volatile data, but as putting it to use gets easier, more and more organizations will demand this capability.

So far, we’ve only talked about typical data sources, but as the world adopts more sensors and live feeds, the amount, speed, types, and volatility of data will increase rapidly. Elastic caching options will be a critical piece.
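To make the idea of volatile data concrete, here is a minimal sketch in Python of a cache whose entries “die” after a time-to-live, with a lookup that marries an in-motion event to data at rest while the event is still fresh. It is a toy under stated assumptions, not how any elastic caching product works: the names VolatileCache, ACCOUNTS, and enrich, and the sample records, are all hypothetical.

```python
import time


class VolatileCache:
    """Toy in-memory cache; entries expire after ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (value, expiry time)

    def put(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # The data has "died": drop it instead of serving stale values.
            del self._store[key]
            return None
        return value


# Data at rest: hypothetical reference records that would live in a database.
ACCOUNTS = {"acct-1": {"owner": "Ada", "limit": 1000}}


def enrich(cache, key):
    """Join a live event with at-rest account data, if the event is still fresh."""
    event = cache.get(key)
    if event is None:
        return None
    return {**event, **ACCOUNTS[event["account"]]}
```

In use, a quote placed in the cache can be enriched with account data until its time-to-live runs out; after that, enrich returns None rather than acting on information whose moment has passed.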

3. Infrastructure Pipes

We’ve written three recent success stories that stand out from the noisy crowd talking about Big Data: Mercy Healthcare, The Nielsen Company, and FedEx Services. They all reap powerful benefits from Big Data solutions and have something basic in common. They all made significant investments in a service-oriented architecture (SOA) with interoperable services that allow their organizations to move large amounts of data quickly from outside in, inside out, and across any application, database, or other data source. They do this through an enterprise service bus (ESB) that moves data automatically between disparate sources that publish and subscribe, rather than trying to connect each application or database separately.

If you think about it for a moment, the need to connect to data precedes the ability to do any form of analysis. Without it, you have the bicycle, unable to support the heavy elephant of Big Data.
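The publish-and-subscribe idea can be sketched in a few lines of Python. This is a toy, in-process stand-in meant only to show why sources don’t need to be wired to each other one by one; it is not how any particular ESB product is implemented, and the MessageBus class, the topic name, and the sample message are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process stand-in for an ESB (hypothetical, for illustration only)."""

    def __init__(self):
        # topic name -> list of handler functions
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a handler to be called for every message on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
analytics_inbox = []
billing_inbox = []

# Two independent consumers subscribe; the publisher knows about neither.
bus.subscribe("shipment.scanned", analytics_inbox.append)
bus.subscribe("shipment.scanned", billing_inbox.append)

delivered = bus.publish("shipment.scanned", {"package": "PKG-1", "hub": "MEM"})
```

The point of the design is that adding a third consumer means one more subscribe call, not a new connection to every existing application.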

4. User-friendly Analytics

Organizations that have their infrastructure house in order are ready to crunch data and find meaningful insight. Insight shows up as patterns in the data that can be discovered by using complex algorithms. The best tools have visual interfaces and are useful to business people who don’t need to understand every technical nuance, but can manipulate data to discover insights. Drag and drop is the new black.

Done right, visualization is the starting point for creating interactive dashboards that aggregate, present, and allow manipulation of large data sets across disparate data sources.

If a PhD is necessary to manage the front end of analytics, the system is unsustainable and won’t allow the typical enterprise to successfully mine data at the pace necessary for meaningful change.

5. Sense and Respond

Once a pattern is understood, there needs to be a way to anticipate its occurrence to either maximize its benefit or take steps to prevent or mitigate the problem it presents. Automated event processing is the modern equivalent of what was studied and taught by U.S. Air Force Col. John Boyd, a fighter pilot who realized that decision making occurs in recurring cycles. He called the process OODA: Observe, Orient, Decide, and Act. He was highly effective with his system and it is still taught today in air combat schools.

What Boyd had to do based on training and awareness, we now support with computerized systems that can handle far more data, far faster. Systems trained to watch for events in combination can also apply logic to a pattern discovery to call for more analysis, look for follow-on events, or respond immediately in a variety of complex ways.

And just like OODA, it feeds back into itself. Events that are streamed into an event processing engine from either data found across the ESB or coming “live” from external feeds may not be understood in the moment, but are processed after-the-fact in the very same analytical tools that discovered data patterns in the first place. This creates a virtuous cycle of discovery-operation-improvement-operation and so on.
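Watching for “events in combination” can be sketched as a simple rule over a sliding time window. The Python below is a minimal illustration, not a real event processing engine: the ComboDetector class, its threshold and window parameters, and the sample key are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class ComboDetector:
    """Fires when `threshold` events for one key fall within `window` seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self._recent = defaultdict(deque)  # key -> timestamps still in the window

    def observe(self, key, timestamp):
        """Record one event; return True if the pattern has just been matched."""
        times = self._recent[key]
        times.append(timestamp)
        # Forget events that are too old to count toward the pattern.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so one burst triggers one response
            return True
        return False


detector = ComboDetector(threshold=3, window=60)
results = [detector.observe("truck-7", t) for t in (0, 20, 50, 55)]
```

When observe returns True, a real system would call for more analysis, watch for follow-on events, or respond immediately; here that decision is left to the caller.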

Making sense of the competitive landscape is truly a function of a Big Data solution, but event processing is the secret sauce. It reflects the highly proprietary choices each organization makes. It sets the stage for the highest-value step of a Big Data solution.

6. Putting Solutions Into Play

Whether preventing deadly illnesses, providing near-real-time global market research, or solving congestion in the logistics network, Big Data’s big value comes from the action that is taken. The most effective way to respond to opportunity and risk is to have control over both manual and automated processes. Business process management suites are the flexible and fast way to drive efficiency in execution and consistency in response.

Social media has a powerful role as well—apply the Big Data “Big Filter” to put the right information in the right hands, at the right moment. Social software is maturing in this role, but users need to catch up. Expect social tools to be the way to define work at the role level of any organization in the future.

Big Data challenges organizations to think across more functional silos than ever before and responsive process management and collaboration will keep the Big Data wave from swamping the boat.

7. Go Back and Do It Again

When the connect-understand-anticipate-act cycle has run its course, the next step is to analyze outcomes, find new and/or better patterns, and improve the system’s function and output. It is a virtuous cycle of change that the best organizations never stop running.

Not Just Our Opinion

It would be hard to oversell the need for the infrastructure outlined above. A recent IDC report lists data integration (#3) as the biggest Big Data IT challenge, not the size or speed required by the system. That same report lists defining business requirements (#1) as the top business challenge. Rymer says, “Big Data must include complex event processing platforms (#5), elastic caching platforms (#2), and various not-only SQL (NoSQL) databases. We have yet to see a one-size-fits-all suite or solution for all of these scenarios.”

Likewise, a recent Gartner Big Data report, by analyst Douglas Laney, kicks off its analysis section with a warning that organizations need to ensure infrastructure adequacy (#3). Coming out of the late-2000s downturn, infrastructure adequacy isn’t, by any means, in place for current needs, much less the significantly expanded requirements for Big Data solutions.

O’Reilly’s Strata site wrote up an interview with an expert that stressed a new focus on user-friendly analysis (#4).

In a report from last summer by Sanchit Gogia, Forrester warns that organizations overlook the key factors that support Big Data solutions like infrastructure (#3) and process (#6).

Forrester’s Clay Richardson uses the term “Big Process” when he observes that “…even for organizations that might not be focused on Big Data quite yet, there is still the need to begin thinking from a big process perspective to better understand the relationships and impacts between operational data and business process performance” (#6). Clay’s blog post is appropriately titled “Big Data Ain’t Worth Diddly Without Big Process.” [3]

The challenges aren’t a mystery at this point in the Big Data hype cycle. The solutions aren’t either if you take a look at the market analysis and success stories we’ve mentioned. Big Data, beyond being a questionable term, has real value for those who understand the nuances of working with data that has volume, velocity, variety, and volatility.

Endnotes


[1] Forrester Research, Inc., October 2012 John Rymer Blog Post

[2] Detractors are always easy to find…here are but three of the more famous ones:

“Airplanes are interesting toys but of no military value.” – Maréchal Ferdinand Foch, World War I French general

“I think there is a world market for maybe five computers.” – Thomas Watson, chairman of IBM, 1943

“There is no reason anyone would want a computer in their home.” – Ken Olsen, president, chairman and founder of Digital Equipment Corp., 1977

[3] Forrester Research, Inc., April 2012 Blog Post