Scalability of Open Source Big Data Management: A World of Possibilities
The one constant about data is that it grows. As it is generated over time, data accumulates and occupies space. Then, like the proverbial cup spilling over, data outgrows its storage. So you delete what is outdated and no longer relevant to create space, move the data to another storage device, or expand existing storage capacity. But what if you are an organization generating gigabytes of data each day and cannot delete any of your old data? What if you may require any part of that data at any time? What if you need all of that data for continuous analysis spanning the distant past, the present and the future? Welcome to the world of data management.
The Bombay Stock Exchange (BSE) is a prime example of the above. BSE's data was doubling every two years, and the proprietary data management system it was using at the time had reached the limits of its usefulness. According to Mr. Dulal Mali, General Manager, Operations at BSE, “We had to subscribe to newer releases of expensive hardware and software upgrades every few years. With the data growth we were experiencing and the complex analytics that were required, continuing with it meant significant ongoing investments.”
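To put that doubling rate in perspective, here is a minimal sketch of the arithmetic, assuming steady exponential growth. The starting volume of 10 TB is a hypothetical figure chosen for illustration and is not taken from BSE.

```python
def projected_size(current_size: float, years: float, doubling_period: float = 2.0) -> float:
    """Return the data volume after `years`, assuming it doubles every `doubling_period` years."""
    return current_size * 2 ** (years / doubling_period)


def annual_growth_rate(doubling_period: float = 2.0) -> float:
    """Return the yearly growth rate (as a fraction) implied by the doubling period."""
    return 2 ** (1 / doubling_period) - 1


if __name__ == "__main__":
    start_tb = 10.0  # hypothetical starting volume in terabytes, not a BSE figure
    for year in (2, 4, 6):
        print(f"After {year} years: {projected_size(start_tb, year):.0f} TB")
    print(f"Implied annual growth: {annual_growth_rate():.1%}")
```

Doubling every two years works out to roughly 41% growth per year, which is why storage sized for today's volume is outgrown so quickly.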
What they needed was a solution to:
- Make BSE a data-driven enterprise with the ability to analyze and act on information in real time
- Consolidate information from multiple data sources into a scalable and robust data repository via an automated process
- Guarantee high ROI – A scalable solution, but one that was cost-effective
- Have the ability to query and analyze data via user-friendly methods in a secure manner
- Permit detailed investigations – run automated, complex algorithms to flag potential symptoms/incidents
DataMetica’s solution with its unique methodologies helped BSE migrate to a massively scalable platform coupled with high performance at a very economical cost of ownership.
The core of the solution was a 150-terabyte Hadoop cluster, used to move BSE’s traditional information management platform to an open source one.
DataMetica proposed and delivered the following to BSE, as part of Phase 1 of the installation:
- Create an Enterprise Data Hub and consolidate BSE’s data silos
- Migrate BSE’s Reporting and Analytics to this framework using open source technologies
- Migrate the statistical modeling framework from proprietary technology to Open Source R
- Create a comprehensive CMS (Content Management System) replacing the proprietary legacy system
- Use this cluster as an intelligent archive for emails, documents, backups and other archives, doing away with expensive storage
Hadoop and the open source ecosystem are a well-known Big Data solution. However, DataMetica brought its own home-grown methodologies and experience to the table. The result was a highly customized solution for BSE, and therefore innovative!
The Results and Benefits:
With an investment of less than 100 lakhs, the solution is designed to cater to the compute and storage needs of BSE for the next two years. Savings amounted to $150K in annual software maintenance and $1M in future investment to maintain the projected growth rate for the next three years.
The seamless transition to the new data platform was implemented so efficiently that end-users did not experience a single glitch.
The solution enables data-driven decisions based on very complex analyses. It is now possible to get deeper insights into the business, something that was impossible with the earlier platform.
The solution gave BSE what it wanted – a highly scalable environment. BSE experienced at least 4x improvement in query performance in comparison to the earlier environment. It is now possible to run historical queries spanning many years using very complex algorithms.
With the introduction of algorithmic trading in the Indian stock markets, the trade to order ratio has gone up 125 times. It was nearly impossible for the old system to match orders with trades over longer periods, because doing so required the system to churn through billions of orders and millions of trades. With the implementation of the Big Data solution, it became possible to run these complex computations at BSE with a reasonably quick response time.
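The matching described above can be pictured as a join between orders and trades on a shared identifier. The following is a minimal single-machine sketch under stated assumptions: the field names `order_id` and `qty` are hypothetical, and it does not represent BSE's or DataMetica's actual implementation, which would run as a distributed job on the Hadoop cluster.

```python
from collections import defaultdict


def match_orders_to_trades(orders, trades):
    """Match orders with the trades that filled them.

    orders: iterable of dicts with a hypothetical 'order_id' key.
    trades: iterable of dicts with hypothetical 'order_id' and 'qty' keys.
    Returns (matched, unmatched): matched maps order_id to total traded
    quantity; unmatched lists the order_ids that never traded.
    """
    # Aggregate traded quantity per order in one pass over the trades.
    filled_qty = defaultdict(int)
    for trade in trades:
        filled_qty[trade["order_id"]] += trade["qty"]

    # Look up each order in the aggregated trades.
    matched, unmatched = {}, []
    for order in orders:
        order_id = order["order_id"]
        if order_id in filled_qty:
            matched[order_id] = filled_qty[order_id]
        else:
            unmatched.append(order_id)
    return matched, unmatched


if __name__ == "__main__":
    sample_orders = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
    sample_trades = [
        {"order_id": 1, "qty": 50},
        {"order_id": 1, "qty": 25},
        {"order_id": 3, "qty": 10},
    ]
    print(match_orders_to_trades(sample_orders, sample_trades))
```

The logic is simple, but at billions of orders neither the aggregation nor the lookup fits comfortably on one machine, which is where a horizontally scalable cluster becomes necessary.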
New Business Models
With vast increases in compute and storage, it is now possible to store new voluminous data sources like the single-view market picture. Furthermore, the unstructured data sources have opened up new revenue streams for BSE.