Chris Roche, UKI Regional Director for Greenplum – Blog

Day 90: From a Chorus of Anticipation in the Bloodstock Industry to the Anticipation of Chorus in the Analytics Industry

I’m at Newbury races, and in the horse racing community the end of March is a time of both reflection and anticipation.

The jumping community reflects on how their horses performed at the Cheltenham Festival in March, where years of work culminate in one week of championship racing in the Cotswold countryside. Others in the community are anticipating the Aintree festival, where the most famous of races, the Grand National, will be run in a few weeks.

And anticipation is the watchword in the flat racing community. With the first of the five classic races only six weeks away, the new season is just under way, and trainers and connections wait to see whether their two-year-olds have trained on into potential classic-winning three-year-olds. Win one of these classics and the value of the horse in the breeding and bloodstock industry soars. Others wait to see whether their significant investments in yearling purchases last year will make the grade as two-year-olds.

Horse racing and betting are heavily stats-driven businesses. From analyzing racing form to matching bloodlines, the key is to spot the piece of information that the bookmaker or other breeders have missed, and to invest on that knowledge.

Just as the jump trainers review their year, I can look back on a highlight so far: the massively successful Data Science event Greenplum held in February, where several hundred interested parties listened to leading thinkers of the day, Marcus du Sautoy and Peter Hinssen.

However, just like the flat trainers, it is the promise of what is to come, and the potential Greenplum has been developing, that excites me most.

In particular, we have some fantastic two-year-olds in the stable. This month we released Chorus V2.0, our productivity software.

The first solution of its kind, Greenplum Chorus provides an analytic productivity platform that enables the team to search, explore, visualize, and import data from anywhere in the organization. It provides rich social network features that revolve around datasets, insights, methods, and workflows, allowing data analysts, data scientists, IT staff, DBAs, executives, and other stakeholders to participate and collaborate on Big Data. Customers deploy Chorus to create a self-service agile analytic infrastructure; teams can create workspaces on-the-fly with self-service provisioning, and then instantly start creating and sharing insights.

Here I think we’ve developed a thoroughbred piece of software that will dominate the analytics productivity market for years to come, just as the great sire Sadler’s Wells dominated the bloodstock industry. One reason is the bold announcement that the software will be open source. Like a champion sire whose bloodline runs throughout the form books of racing history, open-sourced Chorus means EMC Greenplum is truly creating a platform on which to develop your big data applications. The Greenplum Unified Analytics Platform (UAP) is the Open, Agile and Collaborative platform of the future.

It seems I’m not the only one who expects Chorus to be a classic winner. Immediate analyst reaction included:

“Greenplum clearly understands data scientists and has crafted a social application just for them. This is the emerging face of the social enterprise. Ultimately, it is about using social tools to accomplish our jobs better–and not social for social’s sake.”

“Just as important, we expect a growing range of next-generation Big Data development tools to plug into extensible open-source platforms geared to boosting the collective productivity of teams of data scientists and subject-matter experts. It’s with this last trend in mind that we laud EMC Greenplum’s recent announcement that it is open-sourcing its new Chorus ‘social’ framework for Big Data development.”

With Greenplum’s Unified Analytics Platform bringing together our stable’s thoroughbreds, including not only Chorus but also the industry-leading massively parallel processing (MPP) database (Greenplum DB) and our enterprise Hadoop offering, Greenplum HD, the industry is anticipating a massive shift to UAP.

Just as the bloodstock industry has its chorus of anticipation, I have my own anticipation of Chorus.

If you’re interested in learning about building an analytics business for the future, and about analytics in general, then please follow my blog journey throughout 2012.

Day 40: Predictive Show Jumping and Data Science

So there I am, sitting on a horse waiting to compete in a show jumping competition, when one of the other competitors, having just completed her round, appears back in the warm-up ring. She says to her coach:

“I don’t know what went wrong. I corrected his (the horse’s) position before each fence, but we still knocked several of them.”

The coach had clearly been here before:

“Yes, you did, and that’s the problem. You’re correcting the horse after the fact. You’re waiting for something to happen before you react. The winning riders are reading all the signals, predicting what might happen and taking corrective action first.”

“But there is so much going on,” says a clearly frustrated pupil. “The crowd, the speed and rhythm of the horse, the type of fence, the line of the fences, the next fence, the time left, my position, the horse’s lead leg…”

From the Greenplum Data Science Event on Feb 8th, it’s clear that, whether you are Marcus du Sautoy, Peter Hinssen or Professor Nigel Shadbolt, predicting outcomes from vast quantities of data, just like our rider, is a key skill that business needs to develop.

Peter Hinssen discussed how the “power of participation is changing the world” and how “markets are becoming networks of intelligence”, with the need for “deep technology but also a drastically different consumption model.”

Marcus du Sautoy, meanwhile, cautioned against identifying patterns too early in too limited a data set, as you may end up with the wrong prediction.

Our show jumping friend was suffering from a limited ability to take on board different streams of information, but did have a coach to help develop those skills.

This brings me to my point: Consumerisation (Participation) and Big Data are two of the greatest discontinuities the business environment has seen for some time. The entrepreneurial businessperson sees great opportunity but also a few fences and ditches along the way. Some just see the fences.

What’s interesting is that at the Data Science event Greenplum surveyed the 200+ attendees, asking them to rate 18 data management concepts on a scale from non-problematic to very problematic. The results seem to support the points our speakers had made.

The two most problematic concepts, by far, were:
* The lack of an organizational view of data (Network of Intelligence)
* Reconciling disparate data sources (Patterns in massive data sets)

The attendees know that to reconcile disparate data sources they need deep technology that can co-process structured and unstructured data at lightning speed. But they also know that, to gain true insight, the consumption model for predictively analysing these data sources must be drastically different. They know it must be a participatory model, a “Network of Intelligence”.

Great news for the attendees, then, that the Greenplum Unified Analytics Platform (UAP) combines the co-processing of structured and unstructured data with a productivity engine (Chorus) that enables collaboration among data science teams. For the first time in analytics, Greenplum has fused deep technology with a new consumption model: one that lets users discover datasets and provision them, self-service, into collaborative sandboxes through a social computing user experience.

That sounds great, but this is a new world some businesses are venturing into: the rewards are vast, but the fences are high. They may need some coaching. EMC Greenplum calls these coaching moments Analytic Labs.

An Analytic Lab brings Greenplum’s Data Scientists together with a client’s analysts, data platform administrators, and business leadership to solve a real life analytics challenge on an accelerated schedule. By combining services, training, and, in some cases, hardware and software, these unique labs bring the latest tools, methods, and technologies to bear on Big Data.

What really excites me is that EMC Greenplum has already delivered several of these Analytic Labs, covering new customer churn models, healthcare fraud detection, energy demand pattern recognition and sentiment analysis, to name a few.

A new approach for a new world.

If you’re interested in learning about building an analytics business for the future, and about analytics in general, then please follow my blog journey throughout 2012.

Day 30: Analytics Goes to Hollywood

I’ve just returned from my “sheep dip”, sorry, new-hire training on the US West Coast. What movie did I watch on the plane on the way over? You’ve guessed it: MONEYBALL. How kul is that? (You can tell I’ve been on the West Coast, as I’ve started using words like kul, which I haven’t done since the ’70s.)

It’s the story of Oakland A’s general manager Billy Beane’s successful attempt to put together a winning baseball club on a budget by using computer-generated analysis to draft his players.

A film with data science and predictive analytics at its heart: I’m now trendy. I can tell people what I do at dinner parties and they’ll understand.

“Yes, I employ Peter Brand characters who are great with numbers.”

One of the points I took away from the film was that the Billy Beane character had to stand firm in his commitment to a radically new approach despite a torrent of resistance, even though it was clear that the old ways were not working.

He did have a catalyst. Peter Brand. The unassuming Data Scientist who crunched the numbers.

What was great about my visit was that I met several of our own data scientists: unassuming, intelligent people who make numbers sing. I learnt that we deploy their skills with clients via Data Labs, collaborative environments built on a unified data platform where our clients’ Billy Beanes can engage with our Peter Brands to make their numbers sing.

If you want to be astounded by what the future holds for analytics, come and join me at the Data Science Event on Feb 8th.

If you’re interested in learning about building an analytics business for the future, and about analytics in general, then journey with me on my blog throughout 2012.

Big Data Analytics – From Combi Boiler to Megaflow

As well as taking on a new job this year, I’m also just completing the renovation of a house from top to bottom. I remember the day the plumber arrived to discuss “The Heating”.

Lawrence asked, “Is there anything that frustrates you about your current system?”

A long list of niggles we’ve put up with for years came to light:

“Well, there’s the pressure for one thing. The water just dribbles through at times, especially when there is high demand. If guests are staying, you have to be careful not to flush the loo when someone is in the shower. We also keep getting airlocks and are constantly bleeding the system, the flow’s not great and it’s noisy. Apart from that it’s excellent, and I’ve put up with it for 20 years.”

Lawrence grins. “You do know we put people on the moon in 1969? Architecture has moved on. You’re running an old system called a Combi Boiler that’s just not up to the requirements of modern-day life. We all have more appliances, we want several bathrooms and we want everything on demand, in real time.”

“So tell me what this great new world is,” I say.

“Megaflow,” says Lawrence.

I like the sound of that already, I think to myself. “Tell me more.”

“Water comes into your house at an incredible rate and pressure, yet most households do not harness that. Instead, the water is stored in a tank in the loft, which means the system has only the force of gravity to power all the water outlets. In essence, you’ve slowed down both the speed at which you acquire water and the speed at which you deliver it. It’s a double whammy. Megaflow systems are designed to use the pressure from the water main, solving many of the challenges you’ve been experiencing. With this you’ll have high pressure to all outlets, no noisy tank or pumps and far fewer airlocks.”

“Brilliant, I’ll have one,” I say.

So, after a few tests and some negotiation, I’m now the proud owner and user of a Megaflow system. And it’s fantastic: we should have changed years ago, and guests love the result. Welcome to the future.

So what’s this all got to do with Big Data Analytics?

Well, 15 years ago, when I left my database designer role to move into business transformation, everything was relational databases running on shared servers with limited access for the normal user, and some queries were starting to take days to run due to indexing. Let’s call that a Combi Boiler.

Today I turn up and the architecture has moved on. There is now Massively Parallel Processing (MPP) and scatter-gather technology; let’s call this Megaflow. This new Megaflow technology in the data analytics world allows rapid loading of data and rapid execution of queries, in real time, over many data sources.
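The scatter-gather idea behind MPP can be sketched in a few lines. Below is a minimal Python illustration, assuming a simple average over numeric rows: the rows are scattered across segments, each segment computes a partial result on its own slice, and a coordinator gathers and combines those partials. The function names, the round-robin distribution and the use of threads to stand in for separate segment servers are illustrative assumptions, not a description of how Greenplum itself executes queries.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(rows, n_segments):
    """Distribute rows round-robin across n_segments (hypothetical scheme)."""
    segments = [[] for _ in range(n_segments)]
    for i, row in enumerate(rows):
        segments[i % n_segments].append(row)
    return segments


def partial_sum_count(segment):
    """Each segment aggregates only its own rows: a local (sum, count) pair."""
    return sum(segment), len(segment)


def gather_mean(partials):
    """The coordinator combines the partial aggregates into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


def parallel_mean(rows, n_segments=4):
    """Scatter the rows, compute partials concurrently, then gather."""
    segments = scatter(rows, n_segments)
    # Threads stand in for independent segment servers in this sketch.
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(partial_sum_count, segments))
    return gather_mean(partials)


print(parallel_mean(list(range(1, 101))))  # 50.5
```

Each segment returns a (sum, count) pair rather than its own average, because averaging segment averages gives the wrong answer when segments hold different numbers of rows. Splitting the work into a cheap local step and a small global merge is what lets it spread across many machines.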

The use cases that can be run on these platforms are almost limitless.

Today I saw real-time sentiment analysis of a consumer brand. I witnessed, in real time, the reaction across all the main social media channels to a particular advert. The system was learning from and processing millions of pieces of data. The Chief Merchandising Officer I was with was astounded.

If you want to be astounded by what the future holds for analytics, come and join me at the Data Science event on Feb 8th.

If you’re interested in learning about building an analytics business for the future, and about analytics in general, then follow me on my journey on this blog throughout 2012.

Who is Marcus du Sautoy? Making sense of the Big Data deluge…

Day 6. I’ve just taken over the running of EMC’s Analytics business (Greenplum) in the UK&I. Drinking from a fire hose!

Big Data this, Big Data that, Massively Parallel Processing, Scatter Gather Technology, Sentiment Analysis, Bayesian Statistics, and I’ve met twenty new people already. Chaos? Typical first week.

Then some bright spark from marketing tells me I’m to meet Marcus du Sautoy at a Data Science event on Feb 8th. “Look forward to it,” I say, as I type his name into Google.

His big thesis is that although the world looks messy and chaotic, if you translate it into the world of numbers and shapes, patterns emerge and you start to understand why things are the way they are.

I need to be there. Marcus may be able to help. Thinking about it, I can’t be the only person trying to make sense of a deluge of data from many different sources and then trying to make some money from it. If you find yourself in the same boat, join me at the Data Science Series and hear not only Professor du Sautoy but also Peter Hinssen, Sean Gourley and others who have some great ideas and insights to share.

If you’re interested in learning about building an analytics business for the future, and about analytics in general, then join me on my journey on this blog throughout 2012.