The Objective: Offer Scalable Solution to reduce the Operational Cost
An American fashion retailer wanted to migrate their existing applications which were running on the Hadoop, Hive cluster on-premise migration to Google Cloud Platform. The objective was to build a scalable solution that leverages open-source technologies and minimizes operational costs.
Challenges: High Operational cost & limited scalability
The client wanted to leverage GCP technologies for reducing operational costs and risk mitigation for their business. They wanted to move their data (transactions and customer data) and the pipeline processing from the on-premise Cloudera Data warehouse Migration to the Google Cloud Platform for better query performance and efficient business continuity.
The Solution: Lift and Shift migration to GCP
Datametica followed an agile approach for this implementation and delivered the following:
- Assessment and GCP Foundation setup which included analysis of Operational and analytics requirements, detailed migration strategy, and GCP Foundation Design document and configuration files to set up the Foundation Layer. Terraform scripting was used to maintain “Infra as Code”
- Historical Data Migration from on-prem tables to GCP.
- Enabled consumption layer on Cloud (Big Query – Soft Launch) where we pushed the processed data only for the final tables into Big Query so that business users could be migrated to the GCP platform for data access.
- Ingestion, rewriting and processing code migration to make it compatible with the GCP cloud platform.
- Scheduled jobs via Control-M and orchestrated via Cloud Composer.
- Performed continuous data validation using Datametica’s automatic data validation tool – Pelican to provide test evidence for the functional accuracy of the data.
- Implemented optimal partitioning and clustering strategy to improve query performance and reduce scan cost.
- Usage of managed DataProc clusters removed the need for Infra Management of Cluster.
Benefits: Improved Performance with Reduced operations cost
- Reduction in operational cost by 67%. We implemented an automated cluster startup and shut down mechanism and also implemented cluster auto-scaling to reduce operational costs.
- Usage of Terraform for maintaining “Infra as code” enabled automated Infra setup and upgrades.
- Easy scalability with minimum cost and effort.
- Better performance of queries and reporting time.