Data Analysis : Initial Steps For Any Dataset (Version 1)
We often get lost while reading a book that has no index.
This article should be useful for beginners and for anyone already working in the data field. When I was learning data science, I started somewhere in the middle and kept going, only to find out later that I needed an index to keep me on the path of effective and productive learning. That's why, if you want to enter the data field, my advice is to start with the very basics and get a thorough understanding of data analysis first.
As the title says, these are the initial steps. Data analysis is all about finding meaning and solutions in large amounts of data, but before that we need to get to know the given dataset. The information we can get from a dataset in this initial phase is presented here.
“The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades.” — Hal Varian, Chief Economist at Google
Below are the 12 initial steps in the analysis of any dataset:
Step 1 : Import required libraries
Step 2 : Load dataset
Step 3 : Get general information of dataset
Step 4 : Get statistical information of dataset
Step 5 : Missing data finding and management
Step 6 : Check data type of each column and change it if required
Step 7 : Display heat map to visualize the correlation between features
Step 8 : Calculate and interpret measures of central tendency
Step 9 : Calculate and interpret measures of dispersion
Step 10 : Calculate and interpret moments
Step 11 : State problem statements and solve them
Step 12 : Visualize the solution
Step 1 — Import required libraries
Below are some of the most widely used libraries. Import them up front, because sooner or later you are going to use them, just trust me 👻.
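A minimal sketch of such an import block. The exact set of libraries is an assumption on my side: NumPy, pandas and Matplotlib are a common starting point, and many people also add seaborn for statistical plots.

```python
import numpy as np               # numerical arrays and math helpers
import pandas as pd              # tabular data (DataFrame / Series)
import matplotlib.pyplot as plt  # plotting

# Print the versions so the environment is easy to reproduce later.
print("numpy :", np.__version__)
print("pandas:", pd.__version__)
```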
Step 2 — Load dataset
We use the dataset below for reference.
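Loading usually comes down to one call to `pd.read_csv`. The sketch below assumes a CSV source; the inline CSV text and its column names (`name`, `age`, `city`, `salary`) are made up for illustration so that the snippet runs on its own. With a real file you would pass its path instead of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,age,city,salary
Asha,25,Pune,50000
Ben,32,Delhi,64000
Chen,,Pune,58000
Dina,41,,
Eli,29,Mumbai,61000
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows, to check the data loaded as expected
print(df.shape)    # (number of rows, number of columns)
```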


Step 3 — Get general information of dataset
Now it's time to look at the general information of the dataset: column names, non-null counts, the data type of each column, memory usage, index range and so on.
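A small sketch of how this looks in pandas. The DataFrame here is a made-up example, not the article's dataset; `info()` prints most of the items listed above in one go.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a few missing values.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina", "Eli"],
    "age": [25, 32, np.nan, 41, 29],
    "city": ["Pune", "Delhi", "Pune", None, "Mumbai"],
    "salary": [50000.0, 64000.0, 58000.0, np.nan, 61000.0],
})

df.info()                          # index range, columns, non-null counts, dtypes, memory
print(df.shape)                    # (rows, columns)
print(df.columns.tolist())         # column names
print(df.dtypes)                   # data type of each column
print(df.memory_usage(deep=True))  # bytes used per column
```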


Step 4 — Get statistical information of dataset
Using the describe() method we can get summary statistics of the dataset. By default it covers only the numerical columns; passing include="all" adds the non-numerical columns as well.
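A quick sketch on a made-up DataFrame (the column names and values are only for illustration):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "age": [25, 32, 41, 29, 38],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune"],
})

# Numerical columns only: count, mean, std, min, quartiles, max.
numeric_stats = df.describe()
print(numeric_stats)

# All columns: text columns additionally get unique, top and freq.
all_stats = df.describe(include="all")
print(all_stats)
```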


Step 5 — Missing data finding and management
Before moving forward we should take care 👿 of the missing data in our dataset. The demonstration below shows how to find and handle missing data.
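The simplest way to find missing data is to count the missing values per column. A minimal sketch on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a few missing values.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina", "Eli"],
    "age": [25, 32, np.nan, 41, 29],
    "city": ["Pune", "Delhi", "Pune", None, "Mumbai"],
    "salary": [50000.0, 64000.0, 58000.0, np.nan, 61000.0],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```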



There are other ways to find missing values.
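A few alternatives, sketched on the same kind of made-up data. Which one is most useful depends on whether you care about columns, rows or the overall share of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a few missing values.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina", "Eli"],
    "age": [25, 32, np.nan, 41, 29],
    "city": ["Pune", "Delhi", "Pune", None, "Mumbai"],
    "salary": [50000.0, 64000.0, 58000.0, np.nan, 61000.0],
})

has_missing = df.isna().any()                 # per column: is anything missing?
total_missing = int(df.isna().sum().sum())    # total number of missing cells
rows_with_missing = df[df.isna().any(axis=1)] # rows with at least one missing value
percent_missing = df.isna().mean() * 100      # share of missing values per column, in %

print(has_missing)
print(total_missing)
print(rows_with_missing)
print(percent_missing)
```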
Let's handle missing values using a simple imputer.
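A minimal sketch, assuming "simple imputer" means scikit-learn's `SimpleImputer` and that replacing missing numbers with the column mean is acceptable for your data. The DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical numerical data with one missing value per column.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "salary": [50000.0, 64000.0, 58000.0, np.nan, 61000.0],
})

# strategy can also be "median", "most_frequent" or "constant".
imputer = SimpleImputer(strategy="mean")

# fit_transform learns each column's mean and fills the gaps with it.
df[["age", "salary"]] = imputer.fit_transform(df[["age", "salary"]])
print(df)
```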
Below are some other ways to handle missing data.
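A sketch of three common pandas options: dropping rows, filling with a statistic, and interpolating. The data is made up, and the right choice depends on how much data is missing and why.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a few missing values.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "city": ["Pune", "Delhi", "Pune", None, "Mumbai"],
    "salary": [50000.0, 64000.0, 58000.0, np.nan, 61000.0],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill numbers with the median and text with the most frequent value.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# Option 3: linear interpolation between neighbouring values (ordered data only).
interpolated_salary = df["salary"].interpolate()

print(dropped)
print(filled)
print(interpolated_salary)
```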
Step 6 — Check data type of each column and change it if required
We need to check the data type of each column and change it where required. Choosing a more suitable type can also bring down memory usage.
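A sketch of checking and converting types on a made-up DataFrame. The rows are repeated so the memory difference is visible; the target types (`int8`, `category`, datetime) are assumptions that fit this example and must be checked against the value range of real data.

```python
import pandas as pd

# Hypothetical example data, repeated to get 5000 rows.
df = pd.DataFrame({
    "age": [25, 32, 37, 41, 29] * 1000,
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"] * 1000,
    "joined": ["2021-01-15", "2020-07-01", "2019-03-20",
               "2022-11-05", "2021-09-30"] * 1000,
})

print(df.dtypes)
before = df.memory_usage(deep=True).sum()

df["age"] = df["age"].astype("int8")          # ages fit into -128..127
df["city"] = df["city"].astype("category")    # few distinct values, many repeats
df["joined"] = pd.to_datetime(df["joined"])   # date strings become real datetimes

print(df.dtypes)
after = df.memory_usage(deep=True).sum()
print(f"memory: {before} bytes before, {after} bytes after")
```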


Step 7 — Display heat map to visualize the correlation between features
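A minimal sketch of a correlation heat map. It uses plain Matplotlib and made-up columns (`age`, `experience`, `salary`); the output file name is also just an example. If seaborn is installed, `seaborn.heatmap(corr, annot=True)` is a common shortcut for the plotting part.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "age": [25, 32, 37, 41, 29],
    "experience": [2, 8, 12, 17, 5],
    "salary": [50000, 64000, 58000, 72000, 61000],
})

# Pearson correlation between the numerical columns only.
corr = df.select_dtypes("number").corr()

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)

# Write the correlation value into each cell.
for i in range(len(corr)):
    for j in range(len(corr)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Values close to 1 or -1 mean a strong linear relationship between two features, and values near 0 mean a weak one.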

Step 8 — Calculate and interpret measures of central tendency
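The usual measures of central tendency are the mean, the median and the mode. A small sketch on made-up salary values, including one deliberately large value:

```python
import pandas as pd

# Hypothetical salaries; the last one is an outlier on purpose.
salary = pd.Series([50000, 64000, 58000, 58000, 120000])

mean_salary = salary.mean()       # arithmetic average, sensitive to outliers
median_salary = salary.median()   # middle value, robust to outliers
mode_salary = salary.mode()[0]    # most frequent value

print(mean_salary, median_salary, mode_salary)
```

Here the mean is pulled well above the median by the single large salary, which is a hint that the distribution is right-skewed and that the median may describe a typical value better.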

Step 9 — Calculate and interpret measure of dispersion
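Dispersion describes how spread out the values are. A sketch of the common measures (range, variance, standard deviation, interquartile range) on made-up numbers:

```python
import pandas as pd

# Hypothetical values.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

value_range = values.max() - values.min()

# pandas defaults to the sample formulas (ddof=1);
# ddof=0 gives the population variance and standard deviation.
sample_variance = values.var()
population_variance = values.var(ddof=0)
population_std = values.std(ddof=0)

# Interquartile range: spread of the middle 50% of the data.
iqr = values.quantile(0.75) - values.quantile(0.25)

print(value_range, sample_variance, population_variance, population_std, iqr)
```

A small standard deviation or IQR means the values sit close to the centre, while a large one means they are widely scattered.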

Step 10 — Calculate and interpret moments
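The first four moments are usually summarised as mean, variance, skewness and kurtosis. A sketch with two made-up series, one symmetric and one with a long right tail:

```python
import pandas as pd

# Hypothetical values.
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 2, 2, 3, 10])

first_moment = symmetric.mean()   # location
second_moment = symmetric.var()   # spread (sample variance)
skew_symmetric = symmetric.skew() # asymmetry, 0 for symmetric data
skew_right = right_skewed.skew()  # positive: long tail on the right
kurt_symmetric = symmetric.kurt() # excess kurtosis, 0 for a normal distribution

print(first_moment, second_moment, skew_symmetric, skew_right, kurt_symmetric)
```

Positive skewness points to a long right tail and negative skewness to a long left tail. pandas reports excess kurtosis, so negative values mean lighter tails than a normal distribution and positive values mean heavier tails.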

Step 11 — State problem statements and solve it
This is where you state your problem statements and create the logic and code to solve them.
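As an illustration, here is one made-up problem statement, "Which city has the highest average salary?", solved with a `groupby`. The question, the columns and the values are all hypothetical.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "salary": [50000, 64000, 58000, 72000, 61000],
})

# Average salary per city, highest first.
avg_salary = df.groupby("city")["salary"].mean().sort_values(ascending=False)

top_city = avg_salary.index[0]
print(avg_salary)
print("Highest average salary:", top_city)
```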
Step 12 — Visualize solution
Once we are done solving the problem statement, we should try to present the solution with visuals. A good chart communicates the solution quickly.
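A minimal sketch of such a visual: a bar chart of average salary per city. The data, the question and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "salary": [50000, 64000, 58000, 72000, 61000],
})

# The "solution": average salary per city, highest first.
avg_salary = df.groupby("city")["salary"].mean().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(avg_salary.index, avg_salary.values)
ax.set_xlabel("City")
ax.set_ylabel("Average salary")
ax.set_title("Average salary by city")
fig.tight_layout()
fig.savefig("avg_salary_by_city.png")
```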
Thank you for your time 🙂
To connect : kishantongrao123@gmail.com/kishan.tongs@gmail.com