See results from your on-cluster analytics with Arcadia Data

Big Data makes a lot of promises to the enterprise, but building infrastructure that is both flexible and scalable poses many challenges. Where do you begin?

Start with a good blueprint

As with any architectural construct, the best place to start is with a good blueprint. The central construct of Big Data is commonly the data lake, where raw, unstructured data is turned into actionable intelligence.

But that’s not a magic, one-step process. For this reason, your enterprise needs to think very carefully about how the data lake is designed and executed. Failing to start with a well-crafted blueprint could have serious consequences, not only in the short term (for example, in the quality of your analytics) but also in the long term, for the overall success of your Big Data strategy.


Based on Arcadia Data’s experience working with enterprise customers driving business value from Big Data, the data lake is best viewed as a sequence of three operational zones connected by a pair of transition phases. The function of each zone is not just to enhance the value of the incoming data assets, but also to build workflows, establish access and security parameters, and systematically expose the data to a larger community of business users.

Once you have your basic architectural blueprint, be sure to follow the “10 Commandments” for BI on Big Data. Since you aren’t using a previous-generation architecture to store your Big Data, why would you use previous-generation BI tools to analyze it?

Our customer Procter & Gamble sums it up well:

“For three years, we’ve been evaluating the market for a BI product… Arcadia Enterprise is the first product we found that provides truly on-cluster Hadoop BI… Its execution model and user self-service approach deliver performance at Hadoop scale, and lets us develop our analytics quickly” – Terry McFadden, Associate Director, Global Business Services, Procter & Gamble

Our Chief Technology Officer, Shant Hovsepian, puts this in plain English:

1st Commandment: Thou shalt not move Big Data.

This one speaks for itself. Moving Big Data is expensive because, by nature, it is big. Physics is in play here. People want BI tools that push computation as close to the data as possible, so don’t settle for ODBC/JDBC connectors alone. An “extract” is, by definition, a move of data, and it creates a huge maintenance problem: you now have two copies of something that is logically the same. Think about what your Big Data system can do, and how much of your BI workload you can actually push down to the lower layers.
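To make the difference between an extract and pushdown concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a stand-in for an on-cluster engine (an assumption for illustration; it is not Arcadia’s execution model), and the `events` table and both function names are hypothetical. Both functions compute the same totals, but the pushdown version lets the engine do the aggregation, so only one row per group leaves the data layer.

```python
import sqlite3

# Hypothetical "events" table standing in for a large on-cluster dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5)],
)


def totals_by_extract(conn):
    """Extract-style: copy every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT region, amount FROM events").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    # Return the totals plus how many rows had to be moved to compute them.
    return totals, len(rows)


def totals_by_pushdown(conn):
    """Pushdown-style: the engine aggregates; one row per group comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM events GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    print(totals_by_extract(conn))
    print(totals_by_pushdown(conn))
```

With these four sample rows, the extract path moves all four rows to the client while the pushdown path returns two. At Big Data scale, that gap is the difference between shipping billions of rows and shipping a handful of aggregates.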

2nd Commandment: Thou shalt not steal or violate corporate security policy.

A lot of companies are very, very serious about security, especially given the recent spate of data breaches. Big Data vendors have heard this from their customers, and they’ve implemented some amazing infrastructure to make strong security possible. But again, the theme with Big Data is that it’s large and it’s complicated. When you’re looking for BI tools, look for tools that leverage the security model that’s already in place. If you have to re-implement your whole security model, once in your storage layer, once in your database layer, and once in your BI application layer, the chances of losing information keep growing. Look for unified security models, and then look at auditing. If you can’t get security, and you can’t get encryption, at least make sure there’s an audit trail for your applications, because when an Edward Snowden strikes, you want to know exactly where.

3rd Commandment: Thou shalt not pay for every user or gigabyte.

One of the fundamental beauties of Big Data, besides the types of analytics and the storage, is its economic advantage, if done properly. When you’re looking for BI tools, make sure they don’t have pricing models that penalize you for increased adoption. Many applications charge by the gigabyte; some charge by the gigabyte indexed. These are frightening concepts when you’re dealing with Big Data, because very fast growth is common, on both the data side and the adoption side. We’ve had multiple customers whose deployments grew from tens of billions of entries to hundreds of billions within a couple of months, while their active users went from 12 to 600. Don’t pay a penalty on the BI side for having too many gigabytes indexed or too many users on the system.

But wait, there’s more

You can read more about these first three commandments, as well as the remaining seven. But maybe you’d rather see all this in action. In that case, you’ll be interested in hearing from Reid Levesque of Royal Bank of Canada (RBC), who spoke on ‘Beyond TCO: Architecting Hadoop for Adoption and Data Applications’ at the Hadoop Summit in San Jose.

Want to see Arcadia Data in action? Register for a live demo today. Or, if you want to join the debate and hear three domain experts discuss Bank Trade Surveillance and Compliance, register for our upcoming webinar, airing on November 22nd at 2pm UK time.