To illustrate the concepts, we use a small example from the supermarket domain. Table 2 shows a small database containing the items where, in each entry, the value 1 means the presence of the item in the corresponding transaction, and the value 0 represents the absence of an item in that transaction. The set of items is I = \{\mathrm{milk, bread, butter, beer, diapers, eggs, fruit}\}. An example rule for the supermarket could be \{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}, meaning that if butter and bread are bought, customers also buy milk. In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest are used. The best-known constraints are minimum thresholds on support and confidence. Let X, Y be itemsets, X \Rightarrow Y an association rule, and T a set of transactions of a given database. Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Support
Support is an indication of how frequently the itemset appears in the dataset:
: \text{support}(A) = P(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}
The support of a rule is defined as:
: \text{support}(A \Rightarrow B) = P(A \cup B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}}
where A and B are separate item sets that occur at the same time in a transaction. Using Table 2 as an example, the itemset X = \{\mathrm{beer, diapers}\} has a support of 1/5 = 0.2, since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of \mathrm{supp}() is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive). Furthermore, the itemset Y = \{\mathrm{milk, bread, butter}\} has a support of 1/5 = 0.2, as it also appears in 20% of all transactions. Using antecedents and consequents allows a data miner to determine the support of multiple items being bought together in comparison to the whole data set. For example, Table 2 shows that the rule "if milk is bought, then bread is bought" has a support of 0.4 (40%), because milk and bread are bought together in 2 out of the 5 transactions. In small data sets like this example, it is hard to see a strong correlation when there are few samples, but as the data set grows larger, support can be used to find correlations between two or more products in the supermarket example. Minimum support thresholds are useful for determining which itemsets are preferred or interesting. If we set the support threshold to ≥0.4 in Table 3, then the rule \{\mathrm{milk}\} \Rightarrow \{\mathrm{eggs}\} would be removed, since it does not meet the minimum threshold of 0.4. A minimum threshold is used to remove samples that lack strong enough support or confidence to be deemed important or interesting in the dataset. Another way of finding interesting samples is to compute the value of (support) × (confidence); this allows a data miner to see the samples where both support and confidence are high enough to be highlighted in the dataset, prompting a closer look at the connection between the items. Support is beneficial for finding the connection between products in comparison to the whole dataset, whereas confidence looks at the connection between one or more items and another item. Below is a table that compares and contrasts support and support × confidence, using the information from Table 4 to derive the confidence values.
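The support calculation above can be sketched in a few lines of Python. Since the original Table 2 is not reproduced here, the transaction list below is a reconstruction chosen to be consistent with the support and confidence figures quoted in the text (e.g. \mathrm{supp}(\{\mathrm{beer, diapers}\}) = 0.2), and should be treated as an assumption:

```python
# Transactions reconstructed to match the support values quoted in the text;
# the actual Table 2 contents are an assumption here.
transactions = [
    {"milk", "bread", "fruit"},
    {"butter", "eggs", "fruit"},
    {"beer", "diapers"},
    {"milk", "bread", "butter", "eggs", "fruit"},
    {"bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"beer", "diapers"}))          # 0.2
print(support({"milk", "bread", "butter"}))  # 0.2
print(support({"milk", "bread"}))            # 0.4
```

Note that `itemset <= t` is Python's subset test, mirroring the definition that a transaction supports X exactly when X ⊆ t.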
The support of X with respect to T is defined as the proportion of transactions in the dataset which contain the itemset X. Denoting a transaction by (i, t), where i is the unique identifier of the transaction and t is its itemset, the support may be written as:
:\mathrm{supp}(X) = \frac{|\{(i, t) \in T : X \subseteq t\}|}{|T|}
This notation can be used when defining more complicated datasets where the items and itemsets may not be as easy as in our supermarket example above. Other examples where support can be used include finding groups of genetic mutations that work collectively to cause a disease, investigating the number of subscribers that respond to upgrade offers, and discovering which products in a drug store are never bought together. With respect to T, the confidence value of an association rule, often denoted as X \Rightarrow Y, is the ratio of transactions containing both X and Y to the number of transactions containing X, where X is the antecedent and Y is the consequent. Confidence can also be interpreted as an estimate of the
conditional probability P(E_Y | E_X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS. It is commonly depicted as: :\mathrm{conf}(X \Rightarrow Y) = P(Y | X) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} = \frac{\text{number of transactions containing } X \text{ and } Y}{\text{number of transactions containing } X} The equation illustrates that confidence can be computed by calculating the co-occurrence of X and Y within the dataset in ratio to transactions containing only X. This means that the number of transactions containing both X and Y is divided by the number containing just X. For example, Table 2 shows the rule \{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}, which has a confidence of \frac{1/5}{1/5} = \frac{0.2}{0.2} = 1.0 in the dataset, denoting that every time a customer buys butter and bread, they also buy milk. This particular example demonstrates the rule being correct 100% of the time for transactions containing both butter and bread. The rule \{\mathrm{fruit}\} \Rightarrow \{\mathrm{eggs}\}, however, has a confidence of \frac{2/5}{3/5} = \frac{0.4}{0.6} = 0.67. This suggests that eggs are bought 67% of the times that fruit is bought. Within this particular dataset, fruit is purchased a total of 3 times, with two of those purchases also including eggs. For larger datasets, a minimum threshold, or percentage cutoff, for the confidence can be useful for determining item relationships. When applying this method to some of the data in Table 2, information that does not meet the requirements is removed. Table 4 shows association rule examples where the minimum threshold for confidence is 0.5 (50%). Any data that does not have a confidence of at least 0.5 is omitted. Generating thresholds allows the association between items to become stronger as the data is further researched, by emphasizing the items that co-occur the most.
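The confidence formula can be sketched directly from its definition, \mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X). The transaction list is again a reconstruction consistent with the figures quoted in the text, not the article's actual table:

```python
# Reconstructed transactions consistent with the quoted support/confidence
# values (an assumption, since Table 2 is not reproduced here).
transactions = [
    {"milk", "bread", "fruit"},
    {"butter", "eggs", "fruit"},
    {"beer", "diapers"},
    {"milk", "bread", "butter", "eggs", "fruit"},
    {"bread"},
]

def supp(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def conf(antecedent, consequent):
    # conf(X => Y) = supp(X ∪ Y) / supp(X)
    return supp(antecedent | consequent) / supp(antecedent)

print(conf({"butter", "bread"}, {"milk"}))  # 1.0
print(round(conf({"fruit"}, {"eggs"}), 2))  # 0.67
```

The two printed values match the worked examples above: the butter-and-bread rule is correct in every transaction where its antecedent appears, while fruit implies eggs only two times out of three.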
The table uses the confidence information from Table 3 to implement the Support × Confidence column, highlighting the relationship between items via both their confidence and support, instead of just one measure. Ranking the rules by Support × Confidence multiplies the confidence of a particular rule by its support, and is often used for a more in-depth understanding of the relationship between the items. Overall, using confidence in association rule mining is a good way to bring awareness to data relations. Its greatest benefit is highlighting the relationship between particular items within the set, as it compares co-occurrences of items to the total occurrence of the antecedent in the specific rule. However, confidence is not the optimal measure for every concept in association rule mining. Its disadvantage is that it does not offer multiple different outlooks on the associations. Unlike support, for instance, confidence does not provide the perspective of relationships between certain items in comparison to the entire dataset: while milk and bread, for example, may co-occur 100% of the time for confidence, the itemset has a support of only 0.4 (40%). This is why it is important to look at other viewpoints, such as Support × Confidence, instead of relying solely on one measure to define the relationships.
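The Support × Confidence ranking described above can be sketched as follows. Both the rule list and the transactions are illustrative assumptions consistent with the figures quoted in the text, not a reproduction of the article's Tables 3 and 4:

```python
# Illustrative transactions and rules (assumptions consistent with the
# support/confidence figures quoted in the text).
transactions = [
    {"milk", "bread", "fruit"},
    {"butter", "eggs", "fruit"},
    {"beer", "diapers"},
    {"milk", "bread", "butter", "eggs", "fruit"},
    {"bread"},
]

def supp(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def conf(antecedent, consequent):
    return supp(antecedent | consequent) / supp(antecedent)

rules = [
    ({"milk"}, {"bread"}),
    ({"butter", "bread"}, {"milk"}),
    ({"fruit"}, {"eggs"}),
    ({"milk"}, {"eggs"}),
]

# Score each rule by supp(X ∪ Y) * conf(X => Y) and sort descending.
ranked = sorted(
    rules,
    key=lambda r: supp(r[0] | r[1]) * conf(r[0], r[1]),
    reverse=True,
)
for X, Y in ranked:
    print(sorted(X), "=>", sorted(Y), round(supp(X | Y) * conf(X, Y), 3))
```

A rule with perfect confidence but tiny support (such as \{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\} here) drops below a moderately confident but better-supported rule, which is exactly the balancing effect the combined score is meant to provide.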
Lift
The
lift of a rule is defined as: \mathrm{lift}(X\Rightarrow Y) = \frac{ \mathrm{supp}(X \cup Y)}{ \mathrm{supp}(X) \times \mathrm{supp}(Y) } or the ratio of the observed support to that expected if X and Y were
independent. For example, the rule \{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\} has a lift of \frac{0.2}{0.4 \times 0.4} = 1.25. If the rule had a lift of 1, it would imply that the probability of occurrence of the antecedent and that of the consequent are independent of each other. When two events are independent of each other, no rule can be drawn involving those two events. If the lift is > 1, that lets us know the degree to which those two occurrences are dependent on one another, and makes those rules potentially useful for predicting the consequent in future data sets. If the lift is < 1, that lets us know the items are substitutes for each other, meaning that the presence of one item has a negative effect on the presence of the other, and vice versa.
Conviction
The conviction of a rule is defined as \mathrm{conv}(X \Rightarrow Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}. For example, the rule \{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\} has a conviction of \frac{1 - 0.4}{1 - 0.5} = 1.2, and can be interpreted as the ratio of the expected frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were independent, divided by the observed frequency of incorrect predictions. In this example, the conviction value of 1.2 shows that the rule \{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\} would be incorrect 20% more often (1.2 times as often) if the association between X and Y was purely random chance.
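Both lift and conviction follow mechanically from support and confidence, as this sketch shows. The transactions are again a reconstruction consistent with the worked numbers in the text (\mathrm{supp}(\{\mathrm{milk, bread}\}) = 0.4, \mathrm{supp}(\{\mathrm{butter}\}) = 0.4), not the article's actual table:

```python
# Reconstructed transactions (an assumption consistent with the quoted values).
transactions = [
    {"milk", "bread", "fruit"},
    {"butter", "eggs", "fruit"},
    {"beer", "diapers"},
    {"milk", "bread", "butter", "eggs", "fruit"},
    {"bread"},
]

def supp(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def conf(antecedent, consequent):
    return supp(antecedent | consequent) / supp(antecedent)

def lift(antecedent, consequent):
    # Observed support relative to the support expected under independence.
    return supp(antecedent | consequent) / (supp(antecedent) * supp(consequent))

def conviction(antecedent, consequent):
    # Expected frequency of X without Y under independence, divided by the
    # observed frequency of incorrect predictions. Undefined when conf = 1.
    return (1 - supp(consequent)) / (1 - conf(antecedent, consequent))

print(round(lift({"milk", "bread"}, {"butter"}), 2))        # 1.25
print(round(conviction({"milk", "bread"}, {"butter"}), 2))  # 1.2
```

Note the edge case in the comment: a rule with confidence 1.0 (such as \{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\} in this dataset) has infinite conviction, so a practical implementation would guard that division.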
Alternative measures of interestingness
In addition to confidence, other measures of
interestingness for rules have been proposed. Some popular measures are: • All-confidence • Collective strength • Leverage Several more measures are presented and compared by Tan et al. and by Hahsler. Looking for techniques that can model what the user has known (and using these models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness." ==History==