Building Prototypes to Avoid Falling into a Big Data Analytics Trap

The main challenge with big data analytics projects is that they are really big. According to a survey conducted by Infochimps, 55% of all such projects remain incomplete, which means significant money is invested in solutions that are never released. There are numerous reasons why data-intensive projects fail: lack of experienced staff, poor communication, weak cooperation between departments, inability to cope with data velocity, and many others. The most critical of them, however, is poor planning.

Most projects need more than just technology; they need a careful development strategy. Very often companies select the wrong platform, vendor, infrastructure, database, or other tools. Worse, it only becomes clear that the choice was wrong when the project is about to be launched, that is, after 12-24 months of development work! Many planning errors are easy to avoid if you are aware of them before you start.

To deal with the issues mentioned above, more and more companies build prototypes before starting large-scale projects. In general, a prototype helps a team understand how the system should work, test the design and features, and take corrective measures to keep the project on track.

Over more than 10 years of delivering software solutions at Altoros Systems, Inc., the company's engineers have witnessed a number of cases in which a good prototype revealed hidden problems and, as a result, led to tools and methods that differed from those planned originally.

That was exactly the case with one of Altoros's customers, a global provider of automated IT systems management services that helps IT administrators monitor, manage, and protect their infrastructures (ranging from small systems to large enterprise deployments of more than 3.5 million machines) from one central dashboard. Since the number of regular users was growing constantly, the customer decided to migrate from a relational database to a more flexible data store, namely Couchbase. However, this database requires that the working set reside entirely in memory; otherwise, access operations slow down considerably.
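To see why this constraint matters at scale, here is a minimal back-of-the-envelope sketch (in Python) of the kind of sizing math a prototype helps validate. All figures, including the per-key metadata overhead, are illustrative assumptions rather than the customer's actual numbers:

```python
# Back-of-the-envelope sizing for a Couchbase working set.
# All figures below are hypothetical placeholders, not the
# customer's actual numbers.

DOC_COUNT = 50_000_000        # documents expected in the working set
AVG_DOC_SIZE = 2 * 1024       # average document size, bytes
METADATA_PER_DOC = 56         # assumed per-key metadata overhead, bytes
REPLICAS = 1                  # replica copies also kept in RAM
HEADROOM = 1.3                # ~30% headroom for fragmentation and spikes

def required_ram_gb(docs, avg_size, meta, replicas, headroom):
    """RAM needed to keep the whole working set resident in memory."""
    raw = docs * (avg_size + meta) * (1 + replicas)
    return raw * headroom / 1024 ** 3

estimate = required_ram_gb(
    DOC_COUNT, AVG_DOC_SIZE, METADATA_PER_DOC, REPLICAS, HEADROOM
)
print(f"Estimated RAM for the working set: {estimate:.0f} GB")
```

Even rough numbers like these show how quickly the in-memory requirement translates into cluster size, and therefore into infrastructure costs.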

The company required a system that would sustain extreme loads: every day, 50,000,000 files and 40,000,000 support sessions, along with archived and compressed data, had to be uploaded to the database. If all these records were stored in Couchbase, the RAM needed to hold them would make infrastructure maintenance costs extremely high. The team benchmarked the alternatives, loaded a number of records comparable to the planned production load, and tested system performance in the Amazon cloud. The decision was made to use Cassandra, which keeps data on disk rather than requiring it in memory, while still supporting queries, range scans, and access to data by key. As a result, the customer got a cost-effective and scalable NoSQL-based solution that can easily serve 20,000,000 users and 200,000,000 machines.
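For illustration, a Cassandra table keyed on a partition key plus a clustering column supports both access patterns the team needed. The sketch below, using the DataStax Python driver, shows one possible layout; the table, column names, and node address are assumptions for the example, not the customer's actual schema:

```python
# A minimal sketch of a Cassandra data model that supports both
# access by key and range scans. Table and column names are
# hypothetical, not the customer's schema.
from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # assumed local test node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.support_sessions (
        machine_id text,          -- partition key: direct access by key
        started_at timestamp,     -- clustering column: enables range scans
        payload blob,
        PRIMARY KEY (machine_id, started_at)
    ) WITH CLUSTERING ORDER BY (started_at DESC)
""")

# Access by key: fetch one partition directly.
by_key = session.execute(
    "SELECT * FROM demo.support_sessions WHERE machine_id = %s",
    ("machine-42",),
)

# Range scan within a partition, ordered by the clustering column.
by_range = session.execute(
    "SELECT * FROM demo.support_sessions "
    "WHERE machine_id = %s AND started_at >= %s",
    ("machine-42", datetime(2014, 1, 1)),
)
```

Modeling the clustering column around the range queries up front is exactly the kind of design decision a prototype lets you verify against realistic data volumes before committing to it.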

Another case involved one of the biggest US-based cloud infrastructure providers. The customer wanted to make sure its hardware would be powerful enough to deliver the required performance for a large big data project. Altoros's team created a prototype cluster, tested the hardware under different operating systems and settings, detected the bottlenecks, and then put together a detailed list of recommendations on how to configure and tune the cluster. As a result, the customer was able to achieve performance 20-30% better than expected.
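A prototype benchmark at this stage can be as simple as a repeatable harness that runs the same workload under each candidate configuration and compares the results. Below is a minimal Python sketch of that idea; the workload script and the configuration flags are purely hypothetical placeholders:

```python
# A minimal sketch of a prototype benchmark harness for comparing
# cluster configurations. The workload command and configuration
# list are hypothetical placeholders.
import subprocess
import time

CONFIGS = [
    {"name": "default", "cmd": ["./run_workload.sh"]},
    {"name": "tuned-io", "cmd": ["./run_workload.sh", "--io-scheduler=deadline"]},
]

def benchmark(cmd, runs=3):
    """Return the mean wall-clock time of `cmd` over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

for cfg in CONFIGS:
    print(f"{cfg['name']}: {benchmark(cfg['cmd']):.1f}s mean")
```

Running the same workload several times per configuration and averaging keeps the comparison honest, which is what makes the resulting tuning recommendations trustworthy.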

In our session “Big Data, Big Projects, Big Mistakes: How to Jumpstart and Deliver with Success,” the Altoros team will share practical tips on how to start a big data project and how to select a vendor, a platform, a technology stack, etc. The speakers will also give more details on how prototypes help to eliminate critical mistakes, save money, and, eventually, avoid project failure.
The presentation will be of interest to IT managers, software architects, financial decision makers, and anyone interested in the practical aspects of big data projects.