Clickstream data and Usage

What is Clickstream data?

The trail a user leaves while browsing the internet is called a clickstream. A clickstream is a record of a user’s activity on the internet, including every website and every page the user visits, how long the user stayed on each page or site, and the order in which the pages were visited. Each user is identified by a digital ID called a cookie, and each interaction is identified by the combination of cookie and time stamp.
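As a rough illustration of this definition, the sketch below groups events by cookie, orders them by time stamp, and estimates time on page as the gap to the next click. The event tuples, cookie IDs, and function names are hypothetical, and real trackers often measure time on page differently.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (cookie ID, ISO time stamp, page URL).
events = [
    ("cookie-a", "2024-01-15T10:00:00", "/home"),
    ("cookie-a", "2024-01-15T10:00:40", "/jackets"),
    ("cookie-b", "2024-01-15T10:01:00", "/home"),
    ("cookie-a", "2024-01-15T10:03:10", "/cart"),
]

def build_clickstreams(events):
    """Group events by cookie and order each group by time stamp."""
    streams = defaultdict(list)
    for cookie_id, ts, url in events:
        streams[cookie_id].append((datetime.fromisoformat(ts), url))
    for visits in streams.values():
        visits.sort(key=lambda visit: visit[0])
    return dict(streams)

def dwell_times(visits):
    """Seconds spent on each page, taken as the gap to the next click.

    The last page has no following click, so its dwell time is None.
    """
    result = []
    for (ts, url), nxt in zip(visits, visits[1:] + [None]):
        seconds = (nxt[0] - ts).total_seconds() if nxt else None
        result.append((url, seconds))
    return result

streams = build_clickstreams(events)
path_a = dwell_times(streams["cookie-a"])
```

Here `path_a` holds the ordered pages for one cookie together with the estimated seconds spent on each.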

Clickstream data is becoming increasingly valuable to internet marketers and advertisers who want to understand customer behavior. Products such as Google Analytics and Adobe SiteCatalyst store and report on this data. Beyond reporting, clickstream data can reveal actionable information for tailoring the website experience to a specific user based on their site interaction. For example, a site can display a collection of heavy fleece jackets to users who live in regions with harsh winters and search for a fleece jacket, and a collection of light jackets to users who live in regions with moderate winters.

If you want to grow your business, you need to know how your customers behave: the better you know them, the better you can plan for your business.

Clickstream data is typically captured in semi-structured website log files, or in messages sent to a central server by JavaScript embedded in the pages of the website. These log files contain data elements such as a date and time stamp, the visitor’s IP address, the destination URLs of the pages visited, and a user ID that uniquely identifies the website visitor.
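A minimal sketch of reading such a log might look like the following. It assumes a hypothetical space-separated format of time stamp, IP address, user ID, and URL; real server logs vary in layout, so the field order and the `Click` and `parse_line` names are illustrative only.

```python
from datetime import datetime
from typing import NamedTuple, Optional

class Click(NamedTuple):
    """One log entry with the fields described above."""
    timestamp: datetime
    ip: str
    user_id: str
    url: str

def parse_line(line: str) -> Optional[Click]:
    """Parse one line of the assumed format; return None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    ts, ip, user_id, url = parts
    try:
        timestamp = datetime.fromisoformat(ts)
    except ValueError:
        return None
    return Click(timestamp, ip, user_id, url)

# Hypothetical sample log, including one malformed line.
sample_log = """\
2024-01-15T10:00:00 203.0.113.7 user-42 /home
2024-01-15T10:00:40 203.0.113.7 user-42 /jackets/fleece
malformed line
"""

# Keep only the lines that parsed cleanly.
clicks = [c for c in map(parse_line, sample_log.splitlines()) if c is not None]
```

Skipping malformed lines rather than raising an error is a design choice suited to semi-structured logs, where a few bad records are expected.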

What is the Big Data approach to clickstream data analysis?

Big data analytics platforms such as cloud-based Hadoop have become powerful tools for businesses looking to leverage vast sets of customer data for competitive advantage. But with so much rich data streaming in from multiple sources, the analytics challenge for many businesses is determining which types of data will yield the most useful information and insights in a cost-effective manner.

Clickstream data is a strong answer to this problem because it is valuable customer information for a business. Because of its huge scale, however, clickstream data is difficult to store, process, and query with a traditional relational database. Big Data approaches can benefit businesses in the following ways:

  • Click-path optimization: Clickstream data shows which pages users linger on, which items are placed into or removed from a shopping cart, and which items are purchased. It also shows which combinations of products a user usually buys. Using e-commerce-based clickstream analysis, marketers can quantify a user’s behavior on the website to get an idea of how effective the site is at producing sales.
  • Market basket analysis: Market basket analysis is used to find product affinity and product association. For example, heavy snow gloves, thermal socks, and ear muffs might be suggested alongside snow jackets.
  • Next Best Product analysis: Clickstream analytics gives marketers a predictive edge through Next Best Product (NBP) analysis. This analysis is related to market basket analysis. It helps marketers see which product combinations customers buy. Using this information, marketers can plan discounts for customers to increase sales of a product.
  • Website resource allocation: Deciding where to allocate website resources is a major challenge for marketers. With clickstream analytics, resources can be provisioned where they best enhance the customer experience.
  • Customer segmentation on a granular level: Clickstream data tracks different dimensions, such as IP address, time of day, and location. These dimensions help analysis by grouping similar behavior and comparing it across different groups. For example, if the company has demographic information along with location, it can compare suburban customers’ shopping habits with metropolitan customers’ shopping habits.
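To make the market basket idea above more concrete, the sketch below counts how often pairs of products appear in the same basket and uses those counts to suggest companions for a given product. The baskets are made-up sample data and the function names are hypothetical. Production systems usually apply association-rule measures such as support and confidence instead of raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of products per order.
baskets = [
    {"snow jacket", "snow gloves", "thermal socks"},
    {"snow jacket", "snow gloves"},
    {"snow jacket", "ear muffs"},
    {"light jacket", "sunglasses"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered product pair."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def suggestions(product, baskets, top_n=3):
    """Products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return scores.most_common(top_n)

companions = suggestions("snow jacket", baskets)
```

With this sample data, snow gloves rank first for the snow jacket because they share two baskets with it.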

Hadoop helps join different segments together to perform advanced analysis. Popular tools such as Apache Hive and Pig are used to analyze the data, and Tableau and R are used to visualize it. A Big Data approach transforms the data in three steps: LOAD, REFINE, and VISUALIZE.

How is clickstream data transformed in HDFS?

We can load raw web logs, along with customer and product data, into HDFS (Hadoop Distributed File System) using tools such as Flume and Sqoop. The data may come from different sources. A log file might contain sample customer data with fields such as time stamp, user IP address, and destination URL. To refine the data, we can use Apache Hive, which runs SQL-like queries and can join different data sets into one master data set. For example, we can combine website log data with customer and product data using join queries. From this master data set, different types of charts can be produced, such as the highest-ranking pages for a specific product by country, or user profiles for particular products in a specific country. Using these charts, marketers can optimize the click path for a particular age group. One sample visualized graph is as follows.

[Figure: sample clickstream visualization (383 by 213 PNG). The image has no title or axis labels, and its values are not legible at this size.]
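The refine step described above can be sketched in plain Python as an in-memory stand-in for a Hive join. This is not Hive code: the rows, field names, and the `refine` and `views_by_country` functions are hypothetical, and on a real cluster the same join and aggregation would be written as a SQL-like query over tables in HDFS.

```python
from collections import Counter

# Hypothetical rows standing in for three data sets loaded into HDFS.
clicks = [
    {"user_id": "u1", "url": "/products/fleece-jacket"},
    {"user_id": "u2", "url": "/products/fleece-jacket"},
    {"user_id": "u3", "url": "/products/light-jacket"},
    {"user_id": "u1", "url": "/products/light-jacket"},
    {"user_id": "u9", "url": "/products/fleece-jacket"},  # no customer record
]
customers = {
    "u1": {"country": "Canada"},
    "u2": {"country": "Canada"},
    "u3": {"country": "Spain"},
}
products = {
    "/products/fleece-jacket": "Fleece jacket",
    "/products/light-jacket": "Light jacket",
}

def refine(clicks, customers, products):
    """Inner-join clicks with customers (user_id) and products (url)."""
    master = []
    for click in clicks:
        customer = customers.get(click["user_id"])
        product = products.get(click["url"])
        if customer is None or product is None:
            continue  # inner join: drop clicks without a match
        master.append({
            "user_id": click["user_id"],
            "country": customer["country"],
            "product": product,
        })
    return master

def views_by_country(master):
    """Count page views per (country, product), ready for charting."""
    return Counter((row["country"], row["product"]) for row in master)

master = refine(clicks, customers, products)
ranking = views_by_country(master)
```

The resulting `ranking` counter is the kind of aggregated table a visualization tool would then plot, for example as product views per country.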

Clickstream data is very useful for business purposes. With this information, marketers can analyze every aspect of the customer experience, down to each click of the mouse, to optimize sales.

Manisha Langote
Software Engineer
Big Data Platform