Description
This course walks through the process of exploratory data analysis. It uses eleven datasets, two of which are Kaggle competition datasets.
In this course the student will learn how to perform an exploratory data analysis using Python. The main libraries in Python that will be used are pandas, numpy, matplotlib and seaborn.
Pandas is a library that reads tabular files into a program and converts them to dataframes. It can also manage the data in those dataframes and create new dataframes from existing ones by filtering, appending, and merging. Several tools are used to analyse dataframes. The describe function gives summary statistics for the numerical columns in a dataframe, and the info function reports the data type and non-null count of each column. The shape attribute tells the user how many rows and columns a dataframe has. Another important function is isnull, which flags each null value in a dataframe; combined with sum, it reveals how many null values each column contains.
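As a minimal sketch of the pandas tools described above, the example below builds a small dataframe in memory instead of reading a file. The column names and values are invented for illustration and are not taken from the course datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dev"],
    "age": [34, 29, np.nan, 41],
    "city": ["Leeds", "York", "York", "Leeds"],
    "score": [88.5, np.nan, 92.0, 75.0],
})

summary = df.describe()          # summary statistics for the numerical columns
df.info()                        # prints the dtype and non-null count of each column
n_rows, n_cols = df.shape        # shape is an attribute, so it takes no parentheses
null_counts = df.isnull().sum()  # number of null values in each column

# Filtering: keep only the rows where city is "Leeds"
leeds = df[df["city"] == "Leeds"]

# Appending: stack a new row under the existing rows
extra = pd.DataFrame(
    {"name": ["Eli"], "age": [25.0], "city": ["York"], "score": [7.5]}
)
combined = pd.concat([df, extra], ignore_index=True)

# Merging: attach a region to each row by matching on the city column
regions = pd.DataFrame({
    "city": ["Leeds", "York"],
    "region": ["West Yorkshire", "North Yorkshire"],
})
merged = df.merge(regions, on="city", how="left")
```

In practice the first dataframe would usually come from a file, for example with pd.read_csv, rather than being typed in by hand.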
NumPy is a library for mathematical calculations, built around the NumPy array. The mathematical operations performed in this course include statistical functions such as the minimum, maximum, median, mean, most frequently occurring value (the mode), and standard deviation.
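The statistics listed above can be sketched on a small array as follows. The values are invented for illustration. NumPy has no dedicated mode function, so the most frequent value is found here with np.unique, which is one possible approach and not necessarily the one used in the course.

```python
import numpy as np

# Hypothetical values, invented for illustration only
values = np.array([2, 4, 4, 5, 7, 9, 4])

minimum = values.min()
maximum = values.max()
mean = values.mean()
median = np.median(values)
std = values.std()  # population standard deviation (ddof=0 is the default)

# Most frequent value: count each distinct value, then take the largest count
unique_values, counts = np.unique(values, return_counts=True)
mode = unique_values[counts.argmax()]

print(minimum, maximum, mean, median, mode)
```

Note that values.std() divides by the number of values; passing ddof=1 gives the sample standard deviation instead.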
Matplotlib is a graphics library that allows the user to visualise data by plotting each datapoint onto a two-dimensional graph. Matplotlib can produce many different types of graphs, including line charts, bar charts, histograms, scatter plots, area plots and pie plots.
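A minimal sketch of four of the chart types named above, drawn side by side in one figure. The data is invented for illustration, and the Agg backend is selected only so that the example runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [12, 18, 9, 22]
ages = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(months, sales)                   # line chart
axes[0, 0].set_title("Line chart")

axes[0, 1].bar(months, sales)                    # bar chart
axes[0, 1].set_title("Bar chart")

axes[1, 0].hist(ages, bins=5)                    # histogram
axes[1, 0].set_title("Histogram")

axes[1, 1].scatter([1, 2, 3, 4], [2, 4, 1, 3])   # scatter plot
axes[1, 1].set_title("Scatter plot")

fig.tight_layout()
```

Calling fig.savefig with a file name writes the figure to disk, and with an interactive backend plt.show() displays it on screen.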
Seaborn is a high-level graphics library built on top of matplotlib. Seaborn specialises in statistical graphs, which makes it well suited to statistical analysis. Some of the graphs that this library can produce are distplots, catplots, jointplots, pairplots, and heatmaps.