A big retailer in the US was extensively using SAS for predictive analytics; more than 700 models were built using SAS PROC. The model building and variable generation code for most models took more than 4/5 hours to execute; models executed on larger and discrete data sets required even more time. The analytics team could not tweak the model frequently due to the time-intensive nature of the process.
Migrating to Hadoop
Datametica migrated one of the models to run on top of Hadoop where the datasets were ingested into HDFS. The model building code was replicated in H2O, R and Python. The variable generation and weightage calculation time was brought down to less than one hour. With the success of the initial POC, we have started to migrate all the models to Hadoop. Eventually, all the model building and execution will be offloaded to Hadoop.
Reduced cycle time for model building
Tweak & model
Facility to tweak the model more frequently and derive benefits with more accurate predictions
Reduced error factor due to the use of larger training data sets
Tools / Technologies