MARKET BASKET ANALYSIS USING SQL

Ngày đăng: 23/05/2023

Market Basket Analysis (or in short: MBA) is a technique using analytics to explore customer purchase behaviors, thus retailers can efficiently employ purchase information to effectively market and optimize sales of diverse products. This analysis can help to promote deals, offers, sale by the companies, and data mining techniques helps to achieve this analysis task. MBA goes with several other names such as Product Association or Product Recommendation and is one of the most popular and best applications of machine learning, looking for the products that frequently go with each other in purchases. For example, if a customer buys a pair of pants, how likely are they going to buy a shirt on the same trip? How likely is a customer going to buy a drink following a purchase of a sandwich? This analysis stands at the forefront in the retail industry to solve such cases to make better recommendations for customers, which has been applied widely in such as: cinemas, fast food restaurants, clothing shops.

1. MARKET BASKET ANALYSIS WITH ADVENTUREWORKS2019 DATABASE

To better understand how MBA works, let’s look at the following demonstration in Azure Data Studio on AdventureWorks2019 database, specifically FactInternetSales table with two attributes: SalesOrderNumber and ProductKey.

DESCRIPTION

  • FactInternetSales table: Sales record through Internet channel

SalesOrderNumber: The code indentifies the order

ProductKey: The code indentifies the product

1.        SELECT THE ORDERS HAVING AT LEAST THREE DIFFERENT PRODUCTS

 

 

We have the following result:

 

2. LIST OUT THE SALESORDERNUMBER AND PRODUCTKEY OF ORDERS HAVING AT LEAST THREE PRODUCT KEYS

We have the following result:

Notice that all the orders returned have more than 1 product key. Before seeing which combination of three products occurs most frequent, let’s use common table expression (CTE) to use this result for further uses.

 

3. COMMON TABLE EXPRESSION (CTE) AND CREATE COMBINATIONS OF THREE PRODUCTS IN THE SAME ORDER

We self join the temporary “Info” table to view every 3 products that occur in the same order. Pay attention to the “WHERE” statement where we specify four conditions to avoid duplicates.

·      Info1.ProductKey != Info2.ProductKey ;
Info2.ProductKey != Info3.ProductKey: This is to make sure that the products we account for are different.

 

·       Info1.ProductKey < Info2.ProductKey ;
Info2.ProductKey < Info3.ProductKey: This is to avoid duplicates. If this condition does not exist, the result will be: