RESEARCH QUESTIONS
- How do temperature and humidity vary over different time intervals in Denver (2017) within the dataset, and are there noticeable trends or anomalies?
Answer:
- Temperature rises from January to July, then falls gradually from July to November in Denver (2017).
- January is the coldest month and July is the hottest month.
- Humidity shows no significant pattern.
- January and May recorded the highest humidity, and March recorded the lowest.
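The monthly comparison above can be sketched as a simple group-by-month average. This is a minimal illustration only: the readings, the units, and the function name `monthly_means` are invented for the example and are not taken from the dataset or the original analysis.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily readings: (date, temperature, relative humidity in %).
# These values are invented for illustration and are NOT from the dataset.
readings = [
    (date(2017, 1, 5), -3.0, 70.0),
    (date(2017, 1, 20), -1.0, 74.0),
    (date(2017, 3, 10), 8.0, 40.0),
    (date(2017, 3, 25), 10.0, 44.0),
    (date(2017, 7, 4), 27.0, 50.0),
    (date(2017, 7, 18), 29.0, 54.0),
]


def monthly_means(rows):
    """Return {month: (mean temperature, mean humidity)} for the given rows."""
    by_month = defaultdict(list)
    for day, temp, hum in rows:
        by_month[day.month].append((temp, hum))
    return {
        month: (mean(t for t, _ in vals), mean(h for _, h in vals))
        for month, vals in sorted(by_month.items())
    }


means = monthly_means(readings)
coldest = min(means, key=lambda m: means[m][0])  # month with lowest mean temperature
hottest = max(means, key=lambda m: means[m][0])  # month with highest mean temperature
driest = min(means, key=lambda m: means[m][1])   # month with lowest mean humidity
print(coldest, hottest, driest)
```

Anomalies could then be flagged by comparing individual days against their month's mean, although the threshold to use would be a separate modelling choice.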
- Which cities had the most days with a particular weather condition?
Answer:
- Las Vegas had the most days with Clear weather.
- Albuquerque had the most days with Cloudy weather.
- San Diego had the most days with Foggy weather.
- Seattle had the most days with Rainy weather.
- Pittsburgh had the most days with Snowy weather.
- Miami had the most days with Thunderstorm weather.
- Los Angeles had the most days with Other weather.
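One way to arrive at a ranking like the one above is to count days per (condition, city) pair and keep the city with the largest count for each condition. The sketch below assumes one record per city per day; the records and the helper name `top_city_per_condition` are hypothetical.

```python
from collections import Counter

# Hypothetical (city, date, condition) daily records, invented for illustration.
daily = [
    ("Las Vegas", "2017-01-01", "Clear"),
    ("Las Vegas", "2017-01-02", "Clear"),
    ("Seattle", "2017-01-01", "Rainy"),
    ("Seattle", "2017-01-02", "Rainy"),
    ("Seattle", "2017-01-03", "Clear"),
    ("Pittsburgh", "2017-01-01", "Snowy"),
]


def top_city_per_condition(records):
    """Return {condition: (city, day count)} for the city with the most days."""
    counts = Counter((cond, city) for city, _, cond in records)
    best = {}
    for (cond, city), n in counts.items():
        # On a tie, the city encountered first is kept.
        if cond not in best or n > best[cond][1]:
            best[cond] = (city, n)
    return best


top = top_city_per_condition(daily)
for cond, (city, n) in sorted(top.items()):
    print(f"{cond}: {city} ({n} days)")
```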
- Is there a statistically significant correlation between any two variables in the dataset, and if so, what is the nature of this relationship?
Answer:
- There is a weak negative correlation between temperature and humidity: as temperature increases, humidity tends to decrease.
- There are no significant correlations between other variables.
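For reference, the strength and sign of a linear relationship such as the one described above are commonly measured with the Pearson correlation coefficient r, and its significance can be assessed with the statistic t = r * sqrt((n - 2) / (1 - r^2)), compared against a Student t distribution with n - 2 degrees of freedom. The sketch below is an assumption about how such a check could be done, not the method used in the original analysis; the sample values are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0; undefined when |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical temperature and humidity samples, invented for illustration.
temperature = [0, 10, 20, 30, 40]
humidity = [80, 60, 70, 50, 65]

r = pearson_r(temperature, humidity)
t = t_statistic(r, len(temperature))
print(f"r = {r:.3f}, t = {t:.3f}")
```

A negative r indicates that humidity tends to fall as temperature rises; whether the value counts as significant depends on the sample size and the chosen significance level.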
- Which group of cities has frequent snowy weather?
Answer:
Detroit, Minneapolis and Pittsburgh have frequent snowy weather.
- Which clustering algorithm is most effective in grouping cities?
Answer:
K-Means and Hierarchical clustering with Euclidean distance produced the same groupings, and both were effective.
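Because cluster label numbers are arbitrary, checking that two algorithms gave the same groupings means comparing the partitions, not the raw labels. The following is a minimal sketch of such a check; the function name `same_partition` and the label values are hypothetical and do not come from the original analysis.

```python
def same_partition(labels_a, labels_b):
    """True if two label lists induce the same grouping, ignoring label names."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label in one list must map to exactly one label in the other.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical cluster labels for four cities, in the same city order.
kmeans_labels = [0, 0, 1, 2]
hierarchical_labels = [2, 2, 0, 1]
print(same_partition(kmeans_labels, hierarchical_labels))
```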
- Are there any meaningful insights about the co-occurrence of weather conditions in different cities that can be identified using association rule mining?
Answer:
- Las Vegas and Clear weather co-occurred most often among the transactions that contain Las Vegas.
- Pittsburgh and Snowy weather co-occurred most often among the transactions that contain Pittsburgh.
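In association rule terms, the statements above describe the confidence of a rule such as {Las Vegas} -> {Clear}: the fraction of transactions containing Las Vegas that also contain Clear. The sketch below illustrates support and confidence on invented transactions; the data and the assumption that each transaction holds one city and one condition are for illustration only.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    ante = set(antecedent)
    both = ante | set(consequent)
    n_ante = sum(ante <= t for t in transactions)
    if n_ante == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return sum(both <= t for t in transactions) / n_ante


# Hypothetical transactions, invented for illustration.
transactions = [
    {"Las Vegas", "Clear"},
    {"Las Vegas", "Clear"},
    {"Las Vegas", "Cloudy"},
    {"Pittsburgh", "Snowy"},
    {"Pittsburgh", "Rainy"},
]

print(support(transactions, {"Las Vegas", "Clear"}))
print(confidence(transactions, {"Las Vegas"}, {"Clear"}))
print(confidence(transactions, {"Pittsburgh"}, {"Snowy"}))
```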
- What are the best criterion and splitter parameters for Decision Tree?
Answer:
- Decision Tree with the entropy criterion and the best splitter achieved the highest accuracy, 67.79%. (Python)
- Decision Tree with the gini criterion achieved the highest accuracy, 67.98%. (R)
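The two criteria compared above score the impurity of a node differently: Gini impurity is 1 - sum(p_k^2) and entropy is -sum(p_k * log2(p_k)), where p_k is the share of class k in the node. The sketch below computes both for a list of class labels; the labels are invented. If the Python results came from scikit-learn (an assumption, since the source does not name the library), these correspond to the `criterion` values "gini" and "entropy" of `DecisionTreeClassifier`, whose `splitter` parameter accepts "best" or "random".

```python
import math
from collections import Counter


def class_probs(labels):
    """Share of each class among the labels of a node."""
    if not labels:
        raise ValueError("a node must contain at least one label")
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum of squared class shares."""
    return 1.0 - sum(p * p for p in class_probs(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum of p * log2(p) over classes."""
    return -sum(p * math.log2(p) for p in class_probs(labels))


# Hypothetical node contents, invented for illustration.
mixed_node = ["Clear", "Clear", "Rainy", "Rainy"]
pure_node = ["Clear", "Clear", "Clear"]

print(gini(mixed_node), entropy(mixed_node))
print(gini(pure_node), entropy(pure_node))
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why the two criteria often yield similar trees and similar accuracy.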
- Which features have the most significant impact on the decision-making process when predicting weather conditions using decision trees?
Answer:
- Temperature is the most important feature. (Python)
- Humidity is the most important feature. (R)
- What is the best Kernel and cost function combination for Support Vector Machine?
Answer:
The SVM model with the Polynomial kernel and cost function achieved the highest accuracy, 70.1%.
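For context, a polynomial kernel scores the similarity of two feature vectors as (gamma * <x, y> + coef0) ^ degree, while the cost parameter (often written C) controls how heavily training errors are penalized relative to margin width. The sketch below evaluates only the kernel. The parameter names follow scikit-learn's `SVC` convention, and the default values and input vectors are assumptions for illustration; the source does not state which settings were used.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel value (gamma * <x, y> + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical two-feature inputs, invented for illustration.
x = [1.0, 2.0]
y = [3.0, 4.0]
print(polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0))
```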
- How accurately can we predict clear weather conditions using different supervised learning algorithms, and which algorithm performs the best for this task?
Answer:
The SVM model with the Polynomial kernel and cost function achieved the highest accuracy, 70.1%.