Engage in the critical phase of Exploratory Data Analysis (EDA) using the tools and techniques from Python to uncover patterns, spot anomalies, test hypotheses, and identify the main structures of your dataset.

Helzheng123/datasci_3_eda

datasci_3_eda

This is an assignment for HHA 507.

Objective: Engage in the critical phase of Exploratory Data Analysis (EDA) using the tools and techniques from Python to uncover patterns, spot anomalies, test hypotheses, and identify the main structures of your dataset.

Instructions:

  1. Univariate Analysis:
  • Load a dataset of your choice in a Colab notebook (.ipynb) or a Python script (.py). You can reuse one from a previous assignment or find a new one.
  • Manually perform a univariate analysis to understand the distribution of each variable. This includes calculating measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation, IQR).
  • Visualize the distribution of selected numerical variables using histograms.
  2. Bivariate Analysis:
  • Analyze the relationship between pairs of variables.
    • Use scatter plots to explore potential relationships between two numerical variables.
    • For categorical and numerical variable pairs, use boxplots.
  • Compute correlation coefficients for numerical variables and document any strong correlations observed.
  3. Handling Outliers:
  • Identify outliers in your dataset using the IQR method or visualization tools.
  • Decide on an approach to handle these outliers (e.g., remove, replace, or retain) and justify your decision in a markdown cell.
  • If there are no outliers at a cutoff of 1, 2, or 3 standard deviations from the mean (that is, no values with |z| >= the cutoff), state that and support it with your code.
  4. Automated Analysis:

See the datasets folder for the dataset used in this repo, and the automatedEDA folder for the automated EDA report generated with pandas profiling.
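The univariate step (item 1) can be sketched with pandas and matplotlib as below. This is a minimal illustration, not the code in this repo: it assumes the data is already in a pandas DataFrame, and the `age` column, its values, and the helper names `univariate_summary` and `plot_histogram` are hypothetical placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd


def univariate_summary(series: pd.Series) -> dict:
    """Central tendency and spread for one numeric column (hypothetical helper)."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return {
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().tolist(),  # a column can have more than one mode
        "range": s.max() - s.min(),
        "variance": s.var(),  # sample variance (ddof=1), the pandas default
        "std": s.std(),       # sample standard deviation (ddof=1)
        "iqr": q3 - q1,
    }


def plot_histogram(series: pd.Series, bins: int = 20):
    """Draw a histogram of one numeric column and return the figure."""
    fig, ax = plt.subplots()
    ax.hist(series.dropna(), bins=bins)
    ax.set_xlabel(str(series.name))
    ax.set_ylabel("count")
    return fig


if __name__ == "__main__":
    # Hypothetical toy column standing in for a real dataset column.
    df = pd.DataFrame({"age": [23, 35, 35, 41, 52, 60, 35, 29]})
    for name, value in univariate_summary(df["age"]).items():
        print(f"{name}: {value}")
    plot_histogram(df["age"])
    plt.show()
```

In a notebook you would typically loop over `df.select_dtypes(include="number").columns` and call both helpers per column.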
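For the bivariate step (item 2), one way to "document any strong correlations" is to list every numeric pair whose correlation magnitude passes a cutoff. The sketch below assumes a threshold of 0.7, which is a common rule of thumb rather than a requirement of the assignment; the column names and the helpers `strong_correlations` and `plot_bivariate` are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def strong_correlations(df: pd.DataFrame, threshold: float = 0.7,
                        method: str = "pearson") -> list:
    """Return (col_a, col_b, r) for each numeric pair with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr(method=method)
    cols = list(corr.columns)
    pairs = []
    # Walk only the upper triangle so each pair is reported once
    # and self-correlations (always 1.0) are skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


def plot_bivariate(df: pd.DataFrame, x: str, y: str, category: str, value: str):
    """Scatter plot of two numeric columns next to a boxplot of value by category."""
    fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    ax_scatter.scatter(df[x], df[y])
    ax_scatter.set_xlabel(x)
    ax_scatter.set_ylabel(y)
    df.boxplot(column=value, by=category, ax=ax_box)
    return fig


if __name__ == "__main__":
    # Hypothetical toy data; replace with your own DataFrame.
    df = pd.DataFrame({
        "height_cm": [150, 160, 170, 180, 190],
        "weight_kg": [50, 58, 69, 80, 88],
        "group": ["A", "B", "A", "B", "A"],
    })
    print(df.select_dtypes(include="number").corr())
    print(strong_correlations(df))
    plot_bivariate(df, "height_cm", "weight_kg", "group", "weight_kg")
    plt.show()
```

Passing `method="spearman"` ranks the data first, which may suit monotonic but non-linear relationships better than Pearson.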
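The two outlier checks named in item 3 can be compared side by side. This sketch assumes Tukey's fences (1.5 × IQR beyond Q1 and Q3) for the IQR method and z-scores computed with the sample standard deviation; the `visits` values and the helpers `iqr_outliers` and `zscore_outlier_counts` are hypothetical.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


def zscore_outlier_counts(series: pd.Series, cutoffs=(1, 2, 3)) -> dict:
    """Count values with |z| >= each cutoff (z uses the sample std, ddof=1)."""
    s = series.dropna()
    z = (s - s.mean()) / s.std()
    return {c: int((z.abs() >= c).sum()) for c in cutoffs}


if __name__ == "__main__":
    # Hypothetical column with one obvious extreme value.
    values = pd.Series([10, 12, 11, 13, 12, 11, 100], name="visits")
    print(iqr_outliers(values))           # flags 100
    print(zscore_outlier_counts(values))  # {1: 1, 2: 1, 3: 0}
```

The example shows why the method matters: the IQR rule flags 100, but no value reaches |z| >= 3, because with n values the largest possible |z| is (n - 1)/sqrt(n), about 2.27 for n = 7. Once the fences are known, retaining, dropping (`series[~series.isin(outliers)]`), or capping (`series.clip(lower, upper)`) are all one-line choices to justify in the markdown cell.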
