MARKET BASKET ANALYSIS USING SQL

Ngày đăng: 23/05/2023

Market Basket Analysis (or in short: MBA) is a technique using analytics to explore customer purchase behaviors, thus retailers can efficiently employ purchase information to effectively market and optimize sales of diverse products. This analysis can help to promote deals, offers, sale by the companies, and data mining techniques helps to achieve this analysis task. MBA goes with several other names such as Product Association or Product Recommendation and is one of the most popular and best applications of machine learning, looking for the products that frequently go with each other in purchases. For example, if a customer buys a pair of pants, how likely are they going to buy a shirt on the same trip? How likely is a customer going to buy a drink following a purchase of a sandwich? This analysis stands at the forefront in the retail industry to solve such cases to make better recommendations for customers, which has been applied widely in such as: cinemas, fast food restaurants, clothing shops.

1. MARKET BASKET ANALYSIS WITH ADVENTUREWORKS2019 DATABASE

To better understand how MBA works, let’s look at the following demonstration in Azure Data Studio on AdventureWorks2019 database, specifically FactInternetSales table with two attributes: SalesOrderNumber and ProductKey.

DESCRIPTION

  • FactInternetSales table: Sales record through Internet channel

SalesOrderNumber: The code indentifies the order

ProductKey: The code indentifies the product

1.        SELECT THE ORDERS HAVING AT LEAST THREE DIFFERENT PRODUCTS

We have the following result:

2. LIST OUT THE SALESORDERNUMBER AND PRODUCTKEY OF ORDERS HAVING AT LEAST THREE PRODUCT KEYS

We have the following result:

Notice that all the orders returned have more than 1 product key. Before seeing which combination of three products occurs most frequent, let’s use common table expression (CTE) to use this result for further uses.

3. COMMON TABLE EXPRESSION (CTE) AND CREATE COMBINATIONS OF THREE PRODUCTS IN THE SAME ORDER

We self join the temporary “Info” table to view every 3 products that occur in the same order. Pay attention to the “WHERE” statement where we specify four conditions to avoid duplicates.

·      Info1.ProductKey != Info2.ProductKey ;
Info2.ProductKey != Info3.ProductKey: This is to make sure that the products we account for are different.

·       Info1.ProductKey < Info2.ProductKey ;
Info2.ProductKey < Info3.ProductKey: This is to avoid duplicates. If this condition does not exist, the result will be:

Look at the first and the second record where Product1 equals Product3 and vice versa. This makes the result is twice or more as long as it should be. Therefore, removing duplicates by using second condition is a must

This is the result without duplicates:

4. CALCULATE THE FREQUENCY OF THREE PRODUCTS

We have the following result:

From the result, we can interpret that if a customer buys product 477, they are most likely to buy product 478 and 485. Therefore, recommending customers to buy these three products would be likely to improve sales revenue.

CONCLUSION

MBA is an extremely powerful technique that a number of companies now are using to anticipate the interest of customers in order to suggest them the most relevant product besides what they chose to optimize the revenue. With only a few lines of code, SQL helps us to extract the data we need and make the right business decisions.

Chia sẻ bài viết này

Để lại một bình luận

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *

This site uses Akismet to reduce spam. Learn how your comment data is processed.