Association rule mining (ARM) is an unsupervised machine learning technique used to discover interesting relationships, patterns, or associations among variables in large datasets.
ARM is often applied to transactional datasets, such as market basket data, where each transaction consists of a set of items.
The above image represents Transaction Data in Basket (left), Single (center), and Matrix (right) formats.
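As a rough illustration of the three formats, the following Python sketch builds each one from the same toy transactions. The item names and variable names are made up for this example and are not taken from the dataset used here.

```python
# Basket format: one entry per transaction, listing all of its items.
basket = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
]

# Single format: one (transaction_id, item) pair per row.
single = [
    (tid, item)
    for tid, items in enumerate(basket, start=1)
    for item in items
]

# Matrix format: one row per transaction, one 0/1 column per distinct item.
columns = sorted({item for items in basket for item in items})
matrix = [[1 if col in items else 0 for col in columns] for items in basket]

print(single)
print(columns)
print(matrix)
```

All three structures carry the same information; which one is convenient depends on how the raw data is stored and what the mining tool expects.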
Let A and B be sets of items, and consider the association rule A => B.
Support measures how often the items in A and the items in B occur together, relative to all transactions.
Confidence measures how often the items in A and the items in B occur together, relative to the transactions that contain A.
From the above, Confidence ≥ Support.
Lift measures how much more likely the items in B are to occur when the items in A are present, compared to how often B occurs overall. It is the ratio of the confidence of the rule to the frequency of the items in B in the whole dataset.
From the above, Lift ≥ Confidence.
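Written out, the three measures and the two inequalities can be summarised as follows. This is a standard formulation rather than something specific to this analysis; N (the total number of transactions) and the count notation are introduced here for illustration.

```latex
\begin{align*}
\mathrm{Support}(A \Rightarrow B) &= P(A, B)
  = \frac{\mathrm{count}(\text{transactions containing both } A \text{ and } B)}{N} \\
\mathrm{Confidence}(A \Rightarrow B) &= \frac{P(A, B)}{P(A)} \\
\mathrm{Lift}(A \Rightarrow B) &= \frac{\mathrm{Confidence}(A \Rightarrow B)}{P(B)}
  = \frac{P(A, B)}{P(A)\,P(B)}
\end{align*}
% Because 0 < P(A) \le 1, dividing P(A, B) by P(A) cannot make it smaller:
%   Confidence >= Support.
% Because 0 < P(B) \le 1, dividing Confidence by P(B) cannot make it smaller:
%   Lift >= Confidence.
```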
If Lift(A, B) = 1, then P(A, B) = P(A)·P(B), i.e., A and B are independent.
If Lift(A, B) < 1, then P(A, B) < P(A)·P(B), i.e., A and B are negatively correlated.
If Lift(A, B) > 1, then P(A, B) > P(A)·P(B), i.e., A and B are positively correlated.
Since we are looking for associations, we will consider only rules with Lift > 1.
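To make the three measures concrete, here is a minimal Python sketch that computes them for a single rule on a handful of toy transactions. The function name `rule_metrics` and the data are hypothetical, and the sketch assumes the antecedent and the consequent each occur in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_b = sum(1 for t in transactions if b <= set(t))
    count_ab = sum(1 for t in transactions if (a | b) <= set(t))

    support = count_ab / n               # P(A, B)
    confidence = count_ab / count_a      # P(A, B) / P(A)
    lift = confidence / (count_b / n)    # Confidence / P(B)
    return support, confidence, lift


# Toy data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(support, confidence, lift)

# Only rules with lift above 1 would be kept as associations.
keep_rule = lift > 1
```

In this toy data, bread and milk appear together in 3 of 5 transactions, but each appears in 4 of 5 on its own, so the lift falls below 1 and the rule would be dropped.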
Association rules are logical implications that describe relationships between sets of items in the dataset. They are typically represented in the form of "if-then" statements, where the antecedent (A) implies the consequent (B).
Example: A => B
If an item A is purchased, then item B is likely to be purchased.
The Apriori algorithm is a classic algorithm for association rule mining. It's specifically designed to extract frequent itemsets and generate association rules from transactional databases.
Steps:
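The usual level-wise procedure (find frequent single items, join them into larger candidates, prune candidates that have an infrequent subset, count support, and finally derive rules) can be sketched in plain Python as below. This is a simplified teaching sketch, not the implementation used for the results in this section; the function names and the toy data are assumptions.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: keep only candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


# Toy data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.3)
rules = generate_rules(frequent, min_confidence=0.7)
```

The prune step relies on the fact that every subset of a frequent itemset is also frequent, which is what lets the algorithm discard most candidates without counting them.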
Association rule mining typically operates on unlabelled transaction data.
The image below shows a sample of the data before transformation.
The image below shows the data after transformation into transaction data (Basket format) and after removing labels. Each transaction contains a city and its hourly weather type in 2017.
The minimum support used to generate the Top 15 rules overall and the Top 5 rules for Snowy weather is 0.0001. The threshold is this low because there are 213435 transactions in total and each city accounts for only around 7000 of them, which is a comparatively small share.
The minimum confidence is 0.1 for the Top 15 rules overall and 0.01 for the Top 5 rules for Snowy weather, because comparatively few transactions contain Snowy weather.
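As a sketch of how these thresholds and the "Top N" selections fit together, the Python snippet below builds city => weather rules from two-item transactions, filters them by minimum support and confidence, and then ranks them. The transactions are invented toy counts, not the 2017 data, and the helper names `city_weather_rules` and `top_rules` are hypothetical.

```python
from collections import Counter

# Toy (city, weather) transactions; invented for illustration only.
transactions = (
    [("Las Vegas", "Clear")] * 8
    + [("Las Vegas", "Cloudy")] * 2
    + [("Pittsburgh", "Cloudy")] * 5
    + [("Pittsburgh", "Snowy")] * 3
    + [("Pittsburgh", "Clear")] * 2
)


def city_weather_rules(transactions, min_support, min_confidence):
    """Build rules city => weather that meet both thresholds."""
    n = len(transactions)
    pair_counts = Counter(transactions)
    city_counts = Counter(city for city, _ in transactions)
    weather_counts = Counter(weather for _, weather in transactions)

    rules = []
    for (city, weather), count in pair_counts.items():
        support = count / n
        confidence = count / city_counts[city]
        lift = confidence / (weather_counts[weather] / n)
        if support >= min_support and confidence >= min_confidence:
            rules.append({"lhs": city, "rhs": weather, "support": support,
                          "confidence": confidence, "lift": lift})
    return rules


def top_rules(rules, by, k, rhs=None):
    """Return the k highest-ranked rules, optionally restricted to one consequent."""
    if rhs is not None:
        rules = [r for r in rules if r["rhs"] == rhs]
    return sorted(rules, key=lambda r: r[by], reverse=True)[:k]


# Thresholds mirror the values quoted above.
overall = top_rules(city_weather_rules(transactions, 0.0001, 0.1), by="support", k=15)
snowy = top_rules(city_weather_rules(transactions, 0.0001, 0.01),
                  by="confidence", k=5, rhs="Snowy")
```

Lowering the confidence threshold for the Snowy subset matters because a rare weather type rarely accounts for a large share of any city's transactions, so a stricter cutoff could leave no rules to rank.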
The above image displays the Top 10 most frequent items in the transaction data containing city and weather details. Clear weather is the most frequent item in the transaction data.
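An item-frequency ranking like this one can be computed with a simple count. The sketch below uses Python's `collections.Counter` on invented toy transactions; the function name `top_items` is an assumption, and the figure above was not necessarily produced this way.

```python
from collections import Counter


def top_items(transactions, k=10):
    """Return the k most frequent items as (item, relative_frequency) pairs."""
    n = len(transactions)
    # set(t) ensures an item is counted at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    return [(item, count / n) for item, count in counts.most_common(k)]


# Toy data, invented for illustration.
transactions = [
    ("Las Vegas", "Clear"),
    ("Las Vegas", "Clear"),
    ("Pittsburgh", "Clear"),
    ("Pittsburgh", "Snowy"),
]
ranking = top_items(transactions, k=10)
print(ranking)
```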
The above image displays the Top 15 rules by Support generated from the transaction data using the Apriori algorithm. Las Vegas and Clear weather occur together most often in the transaction data.
The above interactive network diagram depicts the Top 15 rules by Support. Node size indicates Support (the larger the node, the higher its Support), and color indicates Confidence.
The above image displays the Top 15 rules by Confidence generated from the transaction data using the Apriori algorithm. Las Vegas and Clear weather occur together most often among the transactions that contain Las Vegas.
The above interactive network diagram depicts the Top 15 rules by Confidence. Node size indicates Support (the larger the node, the higher its Support), and color indicates Confidence.
The above image displays the Top 15 rules by Lift generated from the transaction data using the Apriori algorithm. The Other weather type and Los Angeles occur together more frequently than would be expected if they were independent.
The above interactive network diagram depicts the Top 15 rules by Lift. Node size indicates Support (the larger the node, the higher its Support), and color indicates Confidence.
The above image displays the Top 5 rules for Snowy weather by Confidence generated from the transaction data using the Apriori algorithm. Pittsburgh and Snowy weather occur together most often among the transactions that contain Pittsburgh.
The above interactive network diagram depicts the Top 5 rules for Snowy weather by Confidence. Node size indicates Support (the larger the node, the higher its Support), and color indicates Confidence.
The city-to-weather associations show that specific cities are tied to particular weather profiles, giving each location a distinct climatic identity. These patterns help explain how weather shapes daily routines in a city and can inform long-range urban planning strategies. They also point to both the obstacles and the opportunities that weather creates for urban growth. With this knowledge, urban planners and politicians can prepare cities for adverse weather while taking advantage of favourable conditions to improve quality of life and sustainability. Finally, the investigation highlights the interaction between urban surroundings and natural factors, which is a starting point for building resilient, adaptable cities under a changing climate.