Application platforms provide frameworks that simplify application development by separating the generic parts of applications, such as security, scalability, and reliability (attributes of a ‘good’ application), from the parts that are specific to the business domain.
Most existing application platforms (such as JEE and Ruby on Rails) were designed to work with centralized relational databases. This model is a poor fit for Big Data applications, because it was not designed to deal with massive amounts of data in the first place. In addition, frameworks like Hadoop are sometimes still considered too complex (see “Big Data” Technology: Getting Hotter, But Still Too Hard, Mike Gilpin, Forrester Research).
Other models for handling Big Data, such as data warehousing, do not provide an effective alternative, as noted by Dan Woods in the Forbes article Big Data Requires a Big, New Architecture.
So, to write Big Data applications effectively, a new kind of application platform is needed: one that can take the various patterns and tools being used by pioneers in the Big Data space (like Google, Yahoo, Facebook) and put them into a single framework, while making them simple enough for any organization to use without a huge investment in cost or development time.
What Would This New Big Data Platform Look Like?
A Big Data application platform must support all the functionality expected of any application platform, such as scalability, availability, security, and so on.
But a Big Data application platform would be unique because it has to be able to handle massive amounts of data, and therefore must include built-in support for features and functionality such as Map/Reduce, integration with external NoSQL databases, parallel processing, and data distribution services. Moreover, it should make the use of these tools simple from a development perspective.
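To make the Map/Reduce model mentioned above concrete, here is a minimal in-memory sketch of its three phases (map, shuffle, reduce) applied to a word count. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group all values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = map_phase(lines, word_mapper)
    return reduce_phase(shuffle_phase(pairs), sum_reducer)


if __name__ == "__main__":
    print(word_count(["big data", "Big platform"]))
```

The developer writes only `word_mapper` and `sum_reducer`; everything else is generic plumbing, which is the part a platform would be expected to own.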
The following are some characteristics that define what a Big Data application platform ought to be.
Support Batch and Real-Time Analytics
Most existing application platforms were designed for transactional web applications and have little support for business analytics applications. Hadoop has become the de facto standard for batch processing; real-time analytics, however, is done through other means outside of the Hadoop framework, mostly through an event processing framework (see Nati Shalom's blog post Real Time Analytics for Big Data: An Alternative Approach). A Big Data application platform needs to support both models.
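The difference between the two models can be sketched in a few lines. The example below is a simplified illustration, assuming hypothetical page-view events shaped as dictionaries with a `page` field: the batch function recomputes the aggregate from the full stored data set, while the event-driven class updates it incrementally as each event arrives.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: recompute the aggregate from the complete stored data set."""
    return Counter(event["page"] for event in events)


class StreamingCounts:
    """Event-driven style: update the aggregate as each event arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        """Process one event and return the up-to-date count for its page."""
        self.counts[event["page"]] += 1
        return self.counts[event["page"]]


if __name__ == "__main__":
    events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]

    stream = StreamingCounts()
    for event in events:
        stream.on_event(event)  # result is available immediately per event

    # Both approaches converge on the same totals; they differ in latency.
    print(batch_counts(events) == stream.counts)
```

Both paths end with the same totals. What changes is when the answer is available: after a full pass over the data in the batch case, or immediately after each event in the event-driven case.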
Bring Big Data Applications Closer to Mainstream Development Practices
A Big Data application platform needs to bring Big Data application development closer to mainstream development by providing a built-in stack that includes integration with Big Data databases from the NoSQL world, Map/Reduce frameworks such as Hadoop, and distributed processing. It would also need to extend the existing transaction processing and event processing semantics that come with Java EE so that they can handle real-time analytics in the Big Data world.
Built-In Support for Public/Private Cloud
Big Data applications consume large amounts of compute and storage resources. There is a growing number of cases where using the cloud enables significantly better economics for running Big Data applications. To take advantage of these economics, Big Data application platforms must include built-in support for public/private clouds, providing a seamless transition between the various cloud platforms through integration with frameworks like JClouds. Cloud-bursting provides a hybrid model for using cloud resources as spare capacity to handle load. To handle cloud-bursting effectively with Big Data, the data must be available to both the public and private sides of the cloud with reasonable latency, which often requires additional services such as data replication.
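As a rough illustration of the cloud-bursting idea, the sketch below fills a private pool first and sends only the overflow to a public pool. The `Pool` class, the `place` function, and the notion of abstract "capacity units" are all assumptions made for this example; it does not use the JClouds API and it ignores the data-replication problem described above.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A pool of compute capacity, measured in abstract units."""
    name: str
    capacity: int
    used: int = 0

    @property
    def free(self):
        return self.capacity - self.used


def place(units, private, public):
    """Use private capacity first and burst only the remainder to public."""
    if units < 0:
        raise ValueError("units must be non-negative")
    on_private = min(units, private.free)
    on_public = units - on_private
    if on_public > public.free:
        raise RuntimeError("not enough capacity in either pool")
    private.used += on_private
    public.used += on_public
    return {private.name: on_private, public.name: on_public}


if __name__ == "__main__":
    private = Pool("private", capacity=10, used=8)
    public = Pool("public", capacity=100)
    # Only 2 units fit on the private side, so 3 burst to the public side.
    print(place(5, private, public))
```

In a real deployment the placement decision would also have to account for where the data lives, since work sent to the public side is only useful if the data it needs can be reached there with acceptable latency.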
Open and Consistent Management and Orchestration Across the Stack
A typical Big Data application stack includes multiple layers such as the database itself, the web tier, the processing tier, the caching layer, the data synchronization and distribution layer, reporting tools, and more. One of the biggest challenges is that each of these layers comes with different management, provisioning, monitoring, and troubleshooting tools. Big Data applications tend to be complex, so the lack of consistent management, monitoring, and orchestration across the stack makes the maintenance and management of this type of application significantly more difficult.
In most Java EE management layers, the management application assumes control of the entire stack. With Big Data applications, this assumption does not apply. The stack can vary significantly between application layers; therefore, the management layer of a Big Data application platform must offer a more open management model that can host different databases, web containers, and so on, and provide consistent management and monitoring throughout the entire stack.
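One way to picture an open but consistent management layer is a small common interface that each tier implements in its own way. The sketch below is hypothetical: `ManagedService`, `DatabaseTier`, `WebTier`, and `Orchestrator` are names invented for this example and do not correspond to any particular product.

```python
from abc import ABC, abstractmethod


class ManagedService(ABC):
    """Common contract every tier exposes to the management layer."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def start(self):
        """Bring the tier up using whatever tier-specific steps it needs."""

    @abstractmethod
    def health(self):
        """Return 'running' or 'stopped' in a uniform format."""


class DatabaseTier(ManagedService):
    """A database tier whose state is the number of live nodes."""

    def __init__(self, name, nodes):
        super().__init__(name)
        self.nodes = nodes
        self.live_nodes = 0

    def start(self):
        self.live_nodes = self.nodes

    def health(self):
        return "running" if self.live_nodes == self.nodes else "stopped"


class WebTier(ManagedService):
    """A web tier whose state is whether it is listening on its port."""

    def __init__(self, name, port):
        super().__init__(name)
        self.port = port
        self.listening = False

    def start(self):
        self.listening = True

    def health(self):
        return "running" if self.listening else "stopped"


class Orchestrator:
    """Manages heterogeneous tiers through the shared interface only."""

    def __init__(self, services):
        self.services = list(services)

    def start_all(self):
        for service in self.services:
            service.start()

    def status(self):
        return {service.name: service.health() for service in self.services}


if __name__ == "__main__":
    stack = Orchestrator([DatabaseTier("nosql-db", nodes=3), WebTier("web", port=8080)])
    stack.start_all()
    print(stack.status())
```

The orchestrator never needs to know what kind of database or web container it is hosting, so a tier can be swapped for another implementation without changing how the stack is started or monitored.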
Java EE application servers played an important role in bringing the development of database-centric web applications closer to the mainstream. Other frameworks, such as Spring and Ruby on Rails, later emerged to increase the development productivity of these applications. Big Data application platforms have a similar purpose – they are meant to provide the framework for making the development, maintenance, and management of Big Data applications simpler. Think of Big Data application platforms as a natural evolution of current application platforms.
With the current shift of Java EE application platforms toward PaaS, expect to see even stronger demand for running Big Data applications in cloud-based environments due to the inherent economic and operational benefits. Moving data to the cloud is more complex than what the current PaaS model covers, and requires more advanced support for data replication across sites, cloud-bursting, and so on.
The good news is that Big Data application platforms are being implemented with these goals in mind, and you can already see migration yielding exactly the benefits one would expect.