Apriori Algorithm
The Apriori Algorithm is an influential data mining technique used to extract frequent itemsets and derive association rules from large datasets. It is widely employed in market basket analysis, where it helps identify items that are frequently purchased together and reveals customers' buying patterns. Businesses can use this information to refine their marketing, for example through targeted offers, product placement, and recommendations. The main idea behind the algorithm is the "Apriori Principle," which states that if an itemset is frequent, then all of its subsets must be frequent as well; equivalently, if an itemset is infrequent, then all of its supersets are infrequent too.
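The principle can be checked by hand on a small example. The following is a minimal sketch in base R, assuming a toy list of transactions and a helper named `support` (both hypothetical and not part of the arules package):

```r
# Toy transactions (hypothetical data, for illustration only)
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Support = fraction of transactions that contain every item of the itemset
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

s_a   <- support("bread", transactions)
s_ab  <- support(c("bread", "milk"), transactions)
s_abc <- support(c("bread", "milk", "diapers"), transactions)

# Adding items to an itemset can only keep or lower its support
print(c(bread = s_a, bread_milk = s_ab, bread_milk_diapers = s_abc))
```

On this toy data the three supports are 0.8, 0.6, and 0.4. Because support never increases as an itemset grows, any superset of an infrequent itemset can be skipped without counting it.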
The Apriori Algorithm works iteratively, starting with single-item sets and progressively expanding to larger itemsets. In the first step, it calculates the support of each item (the fraction of transactions that contain it) and retains only the items that meet a specified minimum support threshold. It then generates candidate itemsets of size two by combining the frequent single items. By the Apriori Principle, any candidate that contains an infrequent subset can be discarded before its support is counted. The support of the remaining candidates is calculated, and only those meeting the threshold are retained as frequent itemsets of size two. The process repeats with candidates of increasing size until no more frequent itemsets can be found. Once the frequent itemsets are identified, association rules are derived from them by relating items that co-occur in the dataset, and the strength of each rule is measured by metrics such as confidence, lift, and leverage.
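The level-wise search described above can be sketched in base R. This is an illustrative, unoptimized version and not the implementation used by the arules package; the function name `apriori_sketch` and the toy data are assumptions made for the example.

```r
# Toy transactions (hypothetical data, for illustration only)
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

apriori_sketch <- function(transactions, min_support) {
  # Fraction of transactions containing every item of the itemset
  support <- function(itemset) {
    mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
  }

  # Level 1: frequent single items
  items <- sort(unique(unlist(transactions)))
  current <- Filter(function(s) support(s) >= min_support, as.list(items))
  frequent <- list()

  k <- 1
  while (length(current) > 0) {
    frequent <- c(frequent, current)
    k <- k + 1

    # Join step: unions of two frequent (k-1)-itemsets with exactly k items
    candidates <- list()
    n <- length(current)
    if (n >= 2) {
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          u <- sort(union(current[[i]], current[[j]]))
          if (length(u) == k) candidates[[paste(u, collapse = ",")]] <- u
        }
      }
    }

    # Prune step: drop candidates that have any infrequent (k-1)-subset
    keys <- vapply(current, paste, character(1), collapse = ",")
    candidates <- Filter(function(cand) {
      subsets <- combn(cand, k - 1, simplify = FALSE)
      all(vapply(subsets, paste, character(1), collapse = ",") %in% keys)
    }, candidates)

    # Count step: keep candidates that meet the support threshold
    current <- unname(Filter(function(cand) support(cand) >= min_support, candidates))
  }

  data.frame(
    itemset = vapply(frequent, paste, character(1), collapse = ","),
    support = vapply(frequent, support, numeric(1)),
    stringsAsFactors = FALSE
  )
}

result <- apriori_sketch(transactions, min_support = 0.5)
print(result)
```

With a threshold of 0.5, this toy data yields four frequent single items and four frequent pairs. The only three-item candidate that survives pruning, {bread, diapers, milk}, appears in two of the five transactions and is therefore rejected in the count step.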
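# Market basket analysis with the arules package
library(arules)

# Read one transaction per line, with items separated by commas
groceries <- read.transactions("groceries.csv", sep = ",")
summary(groceries)

# Plot the 20 most frequent items
itemFrequencyPlot(groceries, topN = 20)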
# Visualize the sparse transaction matrix with image() for a random sample of 100 transactions
image(sample(groceries, 100))
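# Mine rules with support >= 0.006, confidence >= 0.25, and at least two items per rule
groceries_rule <- apriori(
  data = groceries,
  parameter = list(support = 0.006, confidence = 0.25, minlen = 2)
)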
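# Interactive plot of the rules; the plotting functions live in arulesViz, not arules.
# plotly_arules() is deprecated in recent arulesViz releases in favor of plot(..., engine = "plotly").
library(arulesViz)
plot(groceries_rule, engine = "plotly")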
summary(groceries_rule)
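To make the `confidence` threshold and the quality measures of a rule more concrete, the metrics for a single rule can be computed by hand. This is a minimal sketch in base R on toy data; the helper names `support` and `rule_metrics` are hypothetical and not part of arules.

```r
# Toy transactions (hypothetical data, for illustration only)
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Fraction of transactions containing every item of the itemset
support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Quality measures for the rule lhs => rhs
rule_metrics <- function(lhs, rhs) {
  s_lhs  <- support(lhs)
  s_rhs  <- support(rhs)
  s_both <- support(union(lhs, rhs))
  c(
    support    = s_both,                     # P(lhs and rhs)
    confidence = s_both / s_lhs,             # P(rhs | lhs)
    lift       = s_both / (s_lhs * s_rhs),   # ratio to the support expected under independence
    leverage   = s_both - s_lhs * s_rhs      # difference from the support expected under independence
  )
}

m <- rule_metrics(lhs = "diapers", rhs = "beer")
print(m)
```

For the rule {diapers} => {beer} on this toy data, the confidence is 0.75 and the lift is 1.25. A lift above 1 means the two items appear together more often than they would if they were purchased independently.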