RESEARCH QUESTIONS

  1. How do temperature and humidity vary over different time intervals in Denver (2017) within the dataset, and are there noticeable trends or anomalies?
  2. Answer:

    1. In Denver (2017), temperature rises from January to July and then gradually falls from July to November.
    2. January is the coldest month and July is the hottest.
    3. Humidity shows no significant pattern.
    4. January and May recorded the highest humidity, and March recorded the lowest.
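The monthly comparison above can be sketched as a group-by-month average. This is a minimal illustration, not the analysis actually run: the `readings` values are made up, and the names `readings` and `monthly_means` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily readings: (date, temperature, humidity in %).
# These values are invented for illustration and are not from the dataset.
readings = [
    (date(2017, 1, 5), -2.0, 70.0),
    (date(2017, 1, 20), 0.0, 66.0),
    (date(2017, 7, 4), 29.0, 40.0),
    (date(2017, 7, 18), 31.0, 44.0),
    (date(2017, 11, 9), 8.0, 50.0),
]


def monthly_means(rows):
    """Return {month: (mean temperature, mean humidity)}."""
    by_month = defaultdict(list)
    for day, temp, hum in rows:
        by_month[day.month].append((temp, hum))
    return {
        month: (mean(t for t, _ in vals), mean(h for _, h in vals))
        for month, vals in sorted(by_month.items())
    }


means = monthly_means(readings)
# Pick the months with the lowest and highest mean temperature.
coldest = min(means, key=lambda m: means[m][0])
hottest = max(means, key=lambda m: means[m][0])
print(coldest, hottest)
```

Plotting the same per-month means over the year is one way to spot the trend and any months that break it.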

  3. Which cities had the most days with a particular weather condition?
  4. Answer:

    1. Las Vegas had the most days with Clear weather.
    2. Albuquerque had the most days with Cloudy weather.
    3. San Diego had the most days with Foggy weather.
    4. Seattle had the most days with Rainy weather.
    5. Pittsburgh had the most days with Snowy weather.
    6. Miami had the most days with Thunderstorm weather.
    7. Los Angeles had the most days with Other weather.
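A ranking like the one above comes down to counting days per (city, condition) pair and keeping the largest count for each condition. The sketch below assumes one record per city per day. The `days` list is invented and `top_city_per_condition` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical daily records: (city, weather condition).
# Invented for illustration; not taken from the dataset.
days = [
    ("Las Vegas", "Clear"), ("Las Vegas", "Clear"), ("Las Vegas", "Cloudy"),
    ("Seattle", "Rainy"), ("Seattle", "Rainy"), ("Seattle", "Clear"),
    ("Pittsburgh", "Snowy"), ("Pittsburgh", "Rainy"),
]


def top_city_per_condition(records):
    """Map each condition to the (city, day count) with the most such days."""
    counts = Counter(records)  # keys are (city, condition) pairs
    best = {}
    for (city, condition), n in counts.items():
        # Keep the city with the strictly larger count; ties keep the first seen.
        if condition not in best or n > best[condition][1]:
            best[condition] = (city, n)
    return best


top = top_city_per_condition(days)
for condition, (city, n) in sorted(top.items()):
    print(f"{condition}: {city} ({n} days)")
```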

  5. Is there a statistically significant correlation between any two variables in the dataset, and if so, what is the nature of this relationship?
  6. Answer:

    1. There is a weak negative correlation between temperature and humidity: as temperature increases, humidity tends to decrease.
    2. No other pair of variables shows a significant correlation.
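For reference, the strength and sign of a linear relationship are commonly summarised with the Pearson coefficient r. The sketch below computes r by hand on made-up values; it does not test significance, and whether Pearson was the coefficient used in the analysis is an assumption. `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values chosen to give a weak negative relationship.
temperature = [0.0, 10.0, 20.0, 30.0]
humidity = [70.0, 40.0, 50.0, 60.0]

r = pearson_r(temperature, humidity)
print(round(r, 3))
```

A value near -1 or 1 indicates a strong linear relationship and a value near 0 a weak one. A negative sign means one variable tends to fall as the other rises.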

  7. Which group of cities has frequent snowy weather?
  8. Answer:

    Detroit, Minneapolis and Pittsburgh have frequent snowy weather.

  9. Which clustering algorithm is most effective in grouping cities?
  10. Answer:

    K-Means and Hierarchical clustering with Euclidean distance produced the same groupings, and both grouped the cities effectively.

  11. Are there any meaningful insights about the co-occurrence of weather conditions in different cities that can be identified using association rule mining?
  12. Answer:

    1. Among the transactions that contain Las Vegas, Clear weather is the condition that co-occurs most often.
    2. Among the transactions that contain Pittsburgh, Snowy weather is the condition that co-occurs most often.
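Statements of the form "among transactions that contain X, Y occurs most often" correspond to the confidence of the rule X -> Y in association rule mining. The sketch below computes support and confidence directly on invented transactions. It assumes each transaction is the set of items (city, condition) observed on one day. The helper names are hypothetical and no rule-mining library is used.

```python
# Hypothetical transactions, invented for illustration.
transactions = [
    {"Las Vegas", "Clear"},
    {"Las Vegas", "Clear"},
    {"Las Vegas", "Cloudy"},
    {"Pittsburgh", "Snowy"},
    {"Pittsburgh", "Rainy"},
    {"Pittsburgh", "Snowy"},
]


def support(itemset, rows):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, rows) / support(antecedent, rows)


conf_lv_clear = confidence({"Las Vegas"}, {"Clear"}, transactions)
conf_lv_cloudy = confidence({"Las Vegas"}, {"Cloudy"}, transactions)
print(round(conf_lv_clear, 2), round(conf_lv_cloudy, 2))
```

In this toy data the rule Las Vegas -> Clear has higher confidence than Las Vegas -> Cloudy, which is the kind of comparison the answers above describe.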

  13. What are the best criterion and splitter parameters for the Decision Tree?
  14. Answer:

    1. The Decision Tree with the entropy criterion and the best splitter achieved the highest accuracy of 67.79%. (Python)
    2. The Decision Tree with the gini criterion achieved the highest accuracy of 67.98%. (R)
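The two criteria compared above differ in how they score the impurity of a node's class labels. The sketch below shows the standard textbook definitions of entropy and Gini impurity on invented labels. It illustrates the measures only and is not the internal implementation of either the Python or the R decision tree. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Relative frequency of each class among the labels."""
    total = len(labels)
    return [n / total for n in Counter(labels).values()]


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


# Invented node contents: an evenly mixed node and a pure node.
mixed = ["Clear", "Clear", "Rainy", "Rainy"]
pure = ["Clear", "Clear", "Clear", "Clear"]

print(entropy(mixed), gini(mixed))
```

Both measures are zero for a pure node and largest for an evenly mixed one, so the two criteria often select similar splits. That is consistent with the close accuracies reported above.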

  15. Which feature has the most significant impact on the decision-making process when predicting weather conditions using decision trees?
  16. Answer:

    1. Temperature is the most important feature. (Python)
    2. Humidity is the most important feature. (R)

  17. What is the best kernel and cost parameter combination for the Support Vector Machine?
  18. Answer:

    The SVM model with a polynomial kernel and cost parameter achieved the highest accuracy of 70.1%.

  19. How accurately can we predict clear weather conditions using different supervised learning algorithms, and which algorithm performs the best for this task?
  20. Answer:

    The SVM model with a polynomial kernel and cost parameter achieved the highest accuracy of 70.1%.