
DataMetica’s Solution for Real-Time Analytics

Analyzing sales trends on POS data with legacy systems becomes increasingly slow over time because of growing data sizes. Migrating the legacy application to Hadoop will not only speed up processing of large datasets but also save you hefty licensing fees. Hadoop gives you the ability to run the same batch process multiple times a day – a feat impractical with older systems because of the cost of CPU cycles, storage and management of historical data.

Batch analytics can only tell you how customers behaved in the past; it won’t let you visualize current behavior in real time. For example, real-time reports on fast-selling products across your retail stores help you make business decisions based on current customer trends. A combination of realtime and batch frameworks therefore works wonderfully to help you visualize consumer behavior as it happens. Designing such a solution, however, is not easy.

Datametica has designed a distributed, highly available and reliable framework using popular open source frameworks. It serves as a base for solving a wide range of business problems and can be tweaked to suit specific business requirements. Because it runs on commodity hardware, it brings down costs while still being able to crunch large volumes of enterprise data.

Types of problems our framework can address in Supply Chain scenarios

  • Out of Stock
  • Demand fluctuations
  • Offer Effectiveness
  • Real Time Trend Based Offers

 

 

Datametica Realtime Framework Solution:

The framework addresses the following KPIs:

  • Total number of sales for the day
  • Total sale amount
  • Total high value transactions, based on defined rules
  • Out of stock items
  • Number of active promotions
  • Top 3 selling products
  • Bottom 3 selling products
  • Store-wise sales
  • Sales by store square footage
  • State-wise sales
  • Current sales trend vs. yesterday’s sales trend
  • Google Maps integration to show current location

 

Realtime KPIs:

[Figure: Realtime KPIs dashboard]

 

 

The above widgets are updated in real time using server-to-browser Ajax push technology built on CometD. This enables a business to take rapid, situation-based decisions. The widgets show today’s total number of sales, total transaction amount in $, number of high value transactions (based on the defined rules), out of stock items and the total number of active promotions.

The portal allows business users to drill down into alerts such as high value transactions, out of stock items and promotions, as shown below.

Out of Stock:

 

[Figure: Out of Stock drill-down]

 

 

High Value Transactions:

[Figure: High Value Transactions drill-down]

The framework lets you see the high value transaction bill details.

Promotions:

[Figure: Promotions drill-down]

 

Using the Promotions drill-down, we can identify which items were sold along with the promoted product, allowing business owners to provide realtime offers.

Top 3 and bottom 3 selling items:

[Figure: Top 3 and bottom 3 selling items]

This KPI lets business users see the top 3 selling items alongside the slow-selling items, making it possible to create dynamic offers to boost sales.

Realtime Sales Trend analytics:

[Figure: Realtime sales trend]

This widget shows the realtime sales trend. The yellow line depicts the previous day’s sales trend and the blue line depicts the realtime sales trend.

Google Maps Integration:

[Figure: Google Maps integration]

Google Maps integration makes it possible to track transactions geographically and to drill down into any particular transaction to see its invoice.

Batch KPIs:

[Figure: Batch KPI widgets]

 

 

The above three widgets use batch output stored in Cassandra, with Presto as their querying engine. These widgets are updated as soon as the batch runs and updates the Cassandra NoSQL database.

 

Datametica’s Realtime Framework Architecture:

 

[Figure: Datametica Realtime Framework architecture]

The framework is logically split into the following parts:

  • Data Ingestion Layer
  • Buffering Layer
  • Batch Processing Layer
  • Realtime Processing Layer
  • Serving Layer
  • View Layer

Data Ingestion Layer:

The framework is capable of handling two types of data feeds.

  • Batch Input
  • Enterprise Service Bus (ESB)

[Figure: Data Ingestion Layer]

Batch Input:

The POS transactions can be captured in a file that is transferred periodically to a predefined directory (Flume’s spool directory). Once the file is placed in this directory, the transactions are fed into the system for processing.

Input from ESB:

If the POS data is available on an ESB, the framework can consume the transaction data from it. This is done using Apache ActiveMQ.

“Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns server.

Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4”
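For illustration, a minimal JMS consumer that reads POS messages from ActiveMQ might look like the sketch below. The broker URL and the queue name POS.TRANSACTIONS are hypothetical, and in the framework itself this consumption is handled by the Flume agents described next rather than by hand-written code.

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PosEsbConsumer {
    public static void main(String[] args) throws Exception {
        // Hypothetical broker URL and queue name, for illustration only.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://esb-broker:61616");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer =
                session.createConsumer(session.createQueue("POS.TRANSACTIONS"));

        // Read a handful of POS transactions and hand them to the ingestion pipeline.
        for (int i = 0; i < 10; i++) {
            TextMessage message = (TextMessage) consumer.receive(5000);
            if (message == null) {
                break; // no more messages within the timeout
            }
            System.out.println("POS transaction: " + message.getText());
        }
        connection.close();
    }
}
```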

This data ingestion layer has to be very reliable, highly available, and easy to scale to match the growing volume of enterprise data. We chose Apache Flume, which is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data.

Buffering Layer:

[Figure: Buffering Layer]

The data from the data ingestion layer has to be stored temporarily in the buffering system before the realtime or batch processing layer consumes it. We selected Kafka, as a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients (in our case, Flume agents). This lets us move large volumes of data through Kafka and makes the buffering layer horizontally scalable as data requirements grow.

Kafka is a great framework for handling distributed producer and consumer problems. If you are interested in knowing more about the performance of Apache Kafka, here[link: http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines] is the link to detailed benchmarking results for 2 million writes per second on three cheap machines.
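As a minimal sketch of how a producer (for example, a Flume Kafka sink or a custom client) would hand POS events to this layer, the snippet below publishes a transaction to a Kafka topic. The topic name pos-transactions, the broker address and the JSON payload are assumptions for illustration, and the exact producer API depends on the Kafka client version in use.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PosTransactionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address; adjust to the actual Kafka cluster.
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by store id so a store's transactions stay in one partition.
            String storeId = "store-042";
            String transactionJson =
                    "{\"storeId\":\"store-042\",\"amount\":129.99,\"sku\":\"SKU-123\"}";
            producer.send(new ProducerRecord<>("pos-transactions", storeId, transactionJson));
        }
    }
}
```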

 

Batch Processing Layer:

The first step in batch processing is to get the data from the buffering layer (Kafka) into Hadoop. We developed an in-house utility based on LinkedIn Camus. At the time of developing the realtime framework, Camus was compatible only with Hadoop 1, so we had to tweak it to work with Hadoop 2. LinkedIn Camus now has a branch that is compatible with Hadoop 2.

The Kafka-to-Hadoop job gets the data incrementally from the previous checkpoint and stores it in a date-wise folder in HDFS. Once the data is in Hadoop, it can be used to analyze customer trends.

The more data we collect, the more accurately we can analyze the trends of customers, stores and inventory. Hadoop fits in well for analyzing historical data: it provides a fault-tolerant computation framework, keeps data highly available with the help of HDFS, and can crunch huge amounts of data using the MapReduce paradigm. We performed the aggregation of data using Apache Pig and stored the output in Cassandra, a NoSQL key-value datastore.

 

Realtime Processing Layer:

Apache Storm is a free and open source distributed realtime computation system. The data from the buffering layer serves as input to this layer.

Storm bolts are used to perform calculations on the input stream in realtime; a minimal sketch of such a bolt follows below. The realtime output stream is fed back into Kafka so that the consumers do not miss any data.
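The sketch below shows what such a bolt could look like: it keeps a running count and total of sales and emits the updated figures on every incoming transaction. The tuple field amount is a hypothetical field emitted by the upstream spout, and older Storm releases use the backtype.storm packages instead of org.apache.storm.

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Maintains a running sales count and total and emits the updated figures
// on every incoming POS transaction tuple.
public class SalesTotalsBolt extends BaseBasicBolt {
    private long salesCount = 0L;
    private double salesTotal = 0.0;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // "amount" is a hypothetical field produced by the upstream spout/parser.
        double amount = input.getDoubleByField("amount");
        salesCount++;
        salesTotal += amount;
        collector.emit(new Values(salesCount, salesTotal));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("salesCount", "salesTotal"));
    }
}
```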

Push-based events:

We used CometD as a scalable HTTP Ajax push mechanism to push messages from servers to browsers, as sketched below. This keeps the realtime framework horizontally scalable as the number of clients using the system grows.
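A minimal sketch of the server side of this push is shown below: a local CometD session publishes updated KPI values to a Bayeux channel that the dashboard widgets subscribe to. The channel name /kpi/sales is hypothetical, and the servlet wiring needed to obtain the BayeuxServer instance is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import org.cometd.bayeux.server.BayeuxServer;
import org.cometd.bayeux.server.LocalSession;

// Publishes updated sales KPIs to a Bayeux channel; subscribed browsers
// receive the messages through CometD's push transports.
public class KpiPublisher {
    private final LocalSession session;

    public KpiPublisher(BayeuxServer bayeux) {
        // A server-side session used as the sender of KPI updates.
        this.session = bayeux.newLocalSession("kpi-publisher");
        this.session.handshake();
    }

    public void publishSalesKpi(long salesCount, double salesTotal) {
        Map<String, Object> data = new HashMap<>();
        data.put("salesCount", salesCount);
        data.put("salesTotal", salesTotal);
        // "/kpi/sales" is a hypothetical channel name the dashboard subscribes to.
        session.getChannel("/kpi/sales").publish(data);
    }
}
```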

 

 

Serving Layer:

Cassandra provides low-latency read-write access. To provide an SQL-like interface on top of it, we used Presto for interactive querying in this use case. This layer is pluggable; other candidates for this layer are Impala and Hive with Tez for interactive querying. The view layer accesses the serving layer through JDBC connectivity, as sketched after the quote below.

The prestodb.io website describes Presto as below:

“Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.”
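As an illustration of that JDBC connectivity, the sketch below queries aggregated store sales from Cassandra through Presto’s JDBC driver. The coordinator address and the catalog, schema and table names (cassandra, retail, store_sales) are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StoreSalesReport {
    public static void main(String[] args) throws Exception {
        // Hypothetical Presto coordinator, catalog and schema.
        String url = "jdbc:presto://presto-coordinator:8080/cassandra/retail";

        try (Connection conn = DriverManager.getConnection(url, "dashboard", null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT store_id, SUM(amount) AS total_sales " +
                     "FROM store_sales GROUP BY store_id ORDER BY total_sales DESC")) {
            while (rs.next()) {
                System.out.printf("%s -> %.2f%n",
                        rs.getString("store_id"), rs.getDouble("total_sales"));
            }
        }
    }
}
```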

View Layer:

Datametica has designed a custom web-based dashboard, described at the beginning of this blog, built with popular frameworks such as Twitter Bootstrap, Google Graphs JS and the Flot time series graph library.

Popular tools such as Pentaho can also be used to report on the data from Presto through JDBC.

Abbreviations Used:

POS: Point Of Sale

References:

http://multichannelmerchant.com/crosschannel/untangling-retail-supply-chain-real-time-analytics-22012014/

http://kafka.apache.org/

http://flume.apache.org/

http://cometd.org/

https://github.com/linkedin/camus/

http://prestodb.io/

http://activemq.apache.org/

By: Suraj Nayak

 

 

 

 
