ASSOCIATION RULE MINING

OVERVIEW:

Association rule mining (ARM) is an unsupervised machine learning technique used to discover interesting relationships, patterns, or associations among variables in large datasets.

ARM is often applied to transactional datasets, such as market basket data, where each transaction consists of a set of items.

[Image: armdtypes]

The above image shows transaction data in basket (left), single (center), and matrix (right) formats.
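
As a rough illustration of the three formats, the sketch below converts a toy basket dataset into single and matrix form. The item names are invented for the example and are not taken from the project data.

```python
# Basket format: one row per transaction, with its items listed together.
# The items are made up purely for illustration.
basket = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
]

# Single format: one (transaction_id, item) pair per row.
single = [
    (tid, item)
    for tid, items in enumerate(basket, start=1)
    for item in items
]

# Matrix format: one row per transaction, one 0/1 column per distinct item.
columns = sorted({item for items in basket for item in items})
matrix = [[int(col in items) for col in columns] for items in basket]

print(single)
print(columns)
print(matrix)
```

All three hold the same information; which one is convenient depends on the tool that reads the data.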

Measures in ARM:

Let A and B be itemsets, and consider the association rule A => B.

  1. Support:
     It measures how often the items in A and the items in B occur together, relative to all transactions.

    Support(A => B) = P(A, B) = (number of transactions containing both A and B) / (total number of transactions)

  2. Confidence:
     It measures how often the items in A and the items in B occur together, relative to the transactions that contain A.

    Confidence(A => B) = P(A, B) / P(A)

    From the above, Confidence ≥ Support, since P(A) ≤ 1.

  3. Lift:
     Lift measures how much more likely the items in B are to occur when the items in A are present, compared to how often the items in B occur overall. It is the ratio of the confidence of the rule to the frequency of the items in B in the whole dataset.

    Lift(A, B) = Confidence(A => B) / P(B) = P(A, B) / (P(A).P(B))

    From the above, Lift ≥ Confidence, since P(B) ≤ 1.

    1. If Lift (A, B) = 1 then P(A, B) = P(A).P(B), i.e., if Lift is 1, then A and B are independent.

    2. If Lift (A, B) < 1 then P(A, B) < P(A).P(B), i.e., if Lift is less than 1, then A and B are negatively correlated.

    3. If Lift (A, B) > 1 then P(A, B) > P(A).P(B), i.e., if Lift is greater than 1, then A and B are positively correlated.

    Since we are looking for associations, we will consider only rules with Lift > 1.
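
The three measures can be checked by hand on a small example. The following is a minimal sketch, assuming transactions are stored as Python sets; the function names and the toy items are illustrative and not part of the project code.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """P(A, B) / P(A); assumes A occurs in at least one transaction."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


def lift(transactions, a, b):
    """Confidence(A => B) / P(B); assumes B occurs in at least one transaction."""
    return confidence(transactions, a, b) / support(transactions, b)


# Toy data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread"},
]
A, B = {"bread"}, {"milk"}

print(support(transactions, A | B))   # 2 of 4 transactions contain both
print(confidence(transactions, A, B)) # 2 of the 3 bread transactions contain milk
print(lift(transactions, A, B))       # confidence divided by P(milk) = 0.5
```

Here the lift is above 1, so under this toy data bread and milk would count as positively correlated.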

Association Rules:

Association rules are logical implications that describe relationships between sets of items in the dataset. They are typically represented in the form of "if-then" statements, where the antecedent (A) implies the consequent (B).

Example: A => B

If an item A is purchased, then item B is likely to be purchased.

Apriori Algorithm:

The Apriori algorithm is a classic algorithm for association rule mining. It's specifically designed to extract frequent itemsets and generate association rules from transactional databases.

Steps:

  1. Generating Frequent Itemsets:
    1. Initialization: Start by identifying all unique items in the dataset.
    2. Support calculation: Scan the dataset to count the support (frequency of occurrence) of each individual item. Items with support above a predefined minimum support threshold are considered frequent 1-itemsets.
    3. Joining: Proceed iteratively to generate larger itemsets. In each iteration, join pairs of frequent (k-1)-itemsets to form candidate k-itemsets.
    4. Pruning: Prune candidate itemsets that contain subsets which are infrequent. This pruning step is possible due to the Apriori property, which states that if an itemset is infrequent, all its supersets must also be infrequent.
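    Count the support of the remaining candidates, keep those that meet the minimum support threshold as frequent k-itemsets, and repeat steps 3 and 4 until no new frequent itemsets can be generated.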

  2. Generating Association Rules:
    1. Rule Generation: For each frequent itemset, generate association rules by considering all possible combinations of items as antecedents and consequents.
    2. Rule Evaluation: Calculate measures such as confidence and support for each rule.
    3. Pruning: Discard rules that do not meet a minimum threshold measure.
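
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the R notebook: the function names `apriori` and `generate_rules` are made up here, an itemset is treated as frequent when its support is at or above the threshold, and the join step simply unions pairs of frequent (k-1)-itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        counted = {c: sup(c) for c in candidates}
        return {c: s for c, s in counted.items() if s >= min_support}

    # Steps 1-2: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Step 3 (join): union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Step 4 (prune): drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                b = itemset - a
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                conf = s / frequent[a]
                if conf >= min_confidence:
                    rules.append((a, b, s, conf, conf / frequent[b]))
    return rules


# Toy data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread"},
]
frequent = apriori(transactions, min_support=0.5)
found = generate_rules(frequent, min_confidence=0.6)
for a, b, s, c, l in found:
    print(sorted(a), "=>", sorted(b), round(s, 3), round(c, 3), round(l, 3))
```

Production libraries use more careful candidate generation and counting, but the join, prune, and count loop follows the same outline.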

PLAN

  1. First, unlabelled transaction data with city and weather details is required to generate association rules between cities and weather types.
  2. Next, read the transaction data and generate association rules using minimum support and confidence thresholds.
  3. Then sort the rules and identify the top 15 rules by support, confidence, and lift.
  4. Finally, identify the cities most strongly associated with snowy weather.
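
Steps 3 and 4 of the plan amount to sorting and filtering a list of rules. The sketch below shows one way to do that in Python, assuming each rule is a dictionary with `lhs`, `rhs`, `support`, `confidence`, and `lift` keys. The rules, city names, and numbers are hypothetical placeholders, not results from the dataset.

```python
# Hypothetical rules with made-up measures; NOT results from the project data.
rules = [
    {"lhs": "CityA", "rhs": "Clear", "support": 0.030, "confidence": 0.90, "lift": 1.5},
    {"lhs": "CityB", "rhs": "Snowy", "support": 0.002, "confidence": 0.06, "lift": 4.0},
    {"lhs": "CityC", "rhs": "Snowy", "support": 0.001, "confidence": 0.03, "lift": 2.0},
    {"lhs": "CityC", "rhs": "Cloudy", "support": 0.010, "confidence": 0.30, "lift": 1.1},
]


def top_rules(rules, measure, n=15):
    """Return the n rules with the highest value of the given measure."""
    return sorted(rules, key=lambda r: r[measure], reverse=True)[:n]


def rules_with_consequent(rules, item):
    """Keep only the rules whose consequent is the given item."""
    return [r for r in rules if r["rhs"] == item]


top_by_lift = top_rules(rules, "lift")
top_snowy = top_rules(rules_with_consequent(rules, "Snowy"), "confidence", n=5)

print([r["lhs"] for r in top_by_lift])
print([r["lhs"] for r in top_snowy])
```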

DATA PREPARATION

Association rule mining typically operates on unlabelled transaction data.

  1. Before Transformation:
     The image below shows a sample of the data before transformation.

    [Image: aftercleaning]

  2. After Transformation:
     The image below shows the data after transformation into transaction (basket) format with labels removed. It shows each city and its hourly weather type in 2017.

    [Image: transactiondata]

  3. Transaction Dataset (Basket):
     transactiondata.csv

CODE

  1. ARM (R): ARM.ipynb

THRESHOLDS

  1. Minimum Support:
     The minimum support for generating the top 15 rules overall and the top 5 rules for snowy weather is 0.0001. The threshold is low because there are 213435 transactions in total and each city accounts for only around 7000 of them, so any rule involving a single city has low support.

  2. Minimum Confidence:
     The minimum confidence is 0.1 for the top 15 rules overall and 0.01 for the top 5 rules for snowy weather, because comparatively few transactions contain snowy weather.
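
To see why such a small support threshold is reasonable, it helps to translate it into transaction counts. The short calculation below uses only the figures quoted above (213435 transactions, roughly 7000 per city); the variable names are illustrative.

```python
total_transactions = 213435  # total number of transactions, from the text
per_city = 7000              # approximate transactions per city, from the text
min_support = 0.0001

# Number of transactions a rule must cover to pass the support threshold.
min_count = min_support * total_transactions

# Upper bound on the support of any rule that involves a single city.
max_city_support = per_city / total_transactions

print(round(min_count, 1))         # about 21 transactions
print(round(max_city_support, 4))  # about 0.033
```

So even a rule that holds for every transaction of a city cannot exceed a support of roughly 0.033, and the 0.0001 threshold only requires a city and weather pair to co-occur about 21 times.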

RESULTS

  1. Top 10 Frequent Items:

    [Image: 10freq]

    The above image displays the top 10 frequent items in the transaction data containing city and weather details. Clear weather is the most frequent item in the transaction data.

  2. Top 15 Rules by Support:

    [Image: top15sup]

    The above image displays the top 15 rules by support generated from the transaction data using the Apriori algorithm. Las Vegas and Clear weather occur together most often in the transaction data.

    [Interactive network diagram: netsup]

    The above interactive network diagram depicts the top 15 rules by support. Node size indicates support (the larger the node, the higher the support), and color indicates confidence.

  3. Top 15 Rules by Confidence:

    [Image: top15conf]

    The above image displays the top 15 rules by confidence generated from the transaction data using the Apriori algorithm. Las Vegas and Clear weather occur together most often relative to the transactions that contain Las Vegas.

    [Interactive network diagram: netconf]

    The above interactive network diagram depicts the top 15 rules by confidence. Node size indicates support (the larger the node, the higher the support), and color indicates confidence.

  4. Top 15 Rules by Lift:

    [Image: top15lift]

    The above image displays the top 15 rules by lift generated from the transaction data using the Apriori algorithm. The Other weather type and Los Angeles occur together more frequently than would be expected if they were independent.

    [Interactive network diagram: netlift]

    The above interactive network diagram depicts the top 15 rules by lift. Node size indicates support (the larger the node, the higher the support), and color indicates confidence.

  5. Top 5 Rules for Snowy Weather by Confidence:

    [Image: top5snow]

    The above image displays the top 5 rules for snowy weather by confidence generated from the transaction data using the Apriori algorithm. Pittsburgh and Snowy weather occur together most often relative to the transactions that contain Pittsburgh.

    [Interactive network diagram: netsnow]

    The above interactive network diagram depicts the top 5 rules for snowy weather by confidence. Node size indicates support (the larger the node, the higher the support), and color indicates confidence.

CONCLUSION

The exploration of city-to-weather associations uncovers profound insights into how various urban landscapes are influenced by weather dynamics. Through this investigation, distinct patterns associating specific cities with particular weather profiles are unveiled, revealing the unique climatic identities of each location. This comprehension deepens the understanding of how weather profoundly impacts urban life, shaping daily routines and informing long-range urban planning strategies. Furthermore, understanding the intricate city-weather associations provides insight into both the obstacles and potential for urban growth. Armed with this knowledge, urban planners and politicians can proactively protect cities against negative weather impacts while capitalising on favourable conditions to improve quality of life and sustainability. Finally, the investigation emphasises the dynamic interaction between urban surroundings and natural factors, establishing the framework for resilient, adaptable cities that can thrive in the face of changing climate realities.