Exploratory Data Analysis (EDA)
Exploratory Data Analysis methods and techniques
Exploratory Data Analysis (EDA) is the act of getting intimate with your data.
This means you get a feel for your data. You don't simply know its characteristics (number of rows, columns, distributions, etc.)... you actually feel it.
It may sound a bit corny, but after working with data for long enough, you gain the ability to understand a dataset on an intuitive level.
EDA is the process of initial exploration. Imagine you are in a deep, dark cave and all you have is a flashlight. You illuminate sections of the walls and the ground, and head down passages. EDA is the same process for exploring data.
Whenever we do Exploratory Data Analysis, you can bet we are analyzing:
- Number of rows and columns
- Column cardinality (how many unique values are there in each column?)
- Correlations (which columns relate to each other?)
- What are the min/max of each column?
- What do outliers (if any) say about the data?
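As a rough illustration, the checks in this list might look like the sketch below. It assumes pandas is installed, and the tiny `df` with its `city`, `units`, and `revenue` columns is a made-up dataset used only for demonstration. The 1.5 * IQR rule for outliers is one common heuristic, not the only reasonable choice.

```python
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Denver", "Boston", "Austin"],
    "units": [3, 5, 2, 4, 6, 40],
    "revenue": [30.0, 50.0, 20.0, 40.0, 60.0, 400.0],
})

# Number of rows and columns.
n_rows, n_cols = df.shape

# Column cardinality: number of unique values in each column.
cardinality = df.nunique()

# Pairwise correlations between the numeric columns only.
numeric = df.select_dtypes(include="number")
correlations = numeric.corr()

# Min and max of each numeric column.
extremes = numeric.agg(["min", "max"])

# Outlier heuristic (an assumption): flag values of "units" that lie
# more than 1.5 * IQR below the first or above the third quartile.
q1 = df["units"].quantile(0.25)
q3 = df["units"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["units"] < q1 - 1.5 * iqr) | (df["units"] > q3 + 1.5 * iqr)]

print(n_rows, n_cols)
print(cardinality)
print(correlations)
print(extremes)
print(outliers)
```

On real data you would swap the toy `df` for your own DataFrame and repeat the outlier check for whichever columns matter to you.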
There isn't a right answer when doing EDA. The goal is to find a launching point that will lead to more analysis. You'll know you are done when you are sufficiently inspired to take the next step in your analysis.
Let's take a look at a Python EDA sample.