Market Basket Analysis (or in short: MBA) is a technique using analytics to explore customer purchase behaviors, thus retailers can efficiently employ purchase information to effectively market and optimize sales of diverse products. This analysis can help to promote deals, offers, sale by the companies, and data mining techniques helps to achieve this analysis task. MBA goes with several other names such as Product Association or Product Recommendation and is one of the most popular and best applications of machine learning, looking for the products that frequently go with each other in purchases. For example, if a customer buys a pair of pants, how likely are they going to buy a shirt on the same trip? How likely is a customer going to buy a drink following a purchase of a sandwich? This analysis stands at the forefront in the retail industry to solve such cases to make better recommendations for customers, which has been applied widely in such as: cinemas, fast food restaurants, clothing shops.
Mục lục
- 1 1. MARKET BASKET ANALYSIS WITH ADVENTUREWORKS2019 DATABASE
- 2 2. LIST OUT THE SALESORDERNUMBER AND PRODUCTKEY OF ORDERS HAVING AT LEAST THREE PRODUCT KEYS
- 3 3. COMMON TABLE EXPRESSION (CTE) AND CREATE COMBINATIONS OF THREE PRODUCTS IN THE SAME ORDER
- 4 4. CALCULATE THE FREQUENCY OF THREE PRODUCTS
- 5 CONCLUSION
1. MARKET BASKET ANALYSIS WITH ADVENTUREWORKS2019 DATABASE
To better understand how MBA works, let’s look at the following demonstration in Azure Data Studio on AdventureWorks2019 database, specifically FactInternetSales table with two attributes: SalesOrderNumber and ProductKey.
DESCRIPTION
- FactInternetSales table: Sales record through Internet channel
SalesOrderNumber: The code indentifies the order
ProductKey: The code indentifies the product
1. SELECT THE ORDERS HAVING AT LEAST THREE DIFFERENT PRODUCTS
We have the following result:
2. LIST OUT THE SALESORDERNUMBER AND PRODUCTKEY OF ORDERS HAVING AT LEAST THREE PRODUCT KEYS
We have the following result:
Notice that all the orders returned have more than 1 product key. Before seeing which combination of three products occurs most frequent, let’s use common table expression (CTE) to use this result for further uses.
3. COMMON TABLE EXPRESSION (CTE) AND CREATE COMBINATIONS OF THREE PRODUCTS IN THE SAME ORDER
We self join the temporary “Info” table to view every 3 products that occur in the same order. Pay attention to the “WHERE” statement where we specify four conditions to avoid duplicates.
· Info1.ProductKey != Info2.ProductKey ;
Info2.ProductKey != Info3.ProductKey: This is to make sure that the products we account for are different.
· Info1.ProductKey < Info2.ProductKey ;
Info2.ProductKey < Info3.ProductKey: This is to avoid duplicates. If this condition does not exist, the result will be:
Look at the first and the second record where Product1 equals Product3 and vice versa. This makes the result is twice or more as long as it should be. Therefore, removing duplicates by using second condition is a must
This is the result without duplicates:
4. CALCULATE THE FREQUENCY OF THREE PRODUCTS
We have the following result:
From the result, we can interpret that if a customer buys product 477, they are most likely to buy product 478 and 485. Therefore, recommending customers to buy these three products would be likely to improve sales revenue.
CONCLUSION
MBA is an extremely powerful technique that a number of companies now are using to anticipate the interest of customers in order to suggest them the most relevant product besides what they chose to optimize the revenue. With only a few lines of code, SQL helps us to extract the data we need and make the right business decisions.

