
How To Perform Common Excel Commands in Python

Machine learning is a valuable skill to learn; there's no doubt about it. But let's face it: most of the time we can already
generate business value with relatively simple calculations, as long as we ask the right questions (credit to a meme
that I saw from Isaac Reyes). Sometimes, though, when you are faced with ~100 MB of data, Excel isn't going to cut it. Most data
analysts come from a quantitative degree without a programming background, and most introductory tutorials aren't
practical. My goal is to get you started using Python for data analysis. We will cover:

How to read the data / Installation Guide.


How to filter data.
How to aggregate the data using pivot table.
How to combine data from different data sources (something like a SQL join).
How to plot the data.

Reading the Data


First, I'll assume that you have Jupyter Notebook installed. For ease of use, I recommend installing it through
Anaconda.

Once you have your Jupyter up and running, it will look something like this:

To upload your CSV file, simply click the Upload button (in the right-hand corner). Once you have uploaded your CSV
file, you're ready to create your first notebook. Simply click New, which is right beside the Upload button.

Now you're ready to start analyzing some data!

In [1]: import pandas as pd  # You have to write this line first. Just think of it as a requirement.

# The line below just gives you an Excel-like feel: None means every column is shown.
pd.set_option('display.max_columns', None)
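If you want to check a display setting or change it only temporarily, a sketch along the following lines may help. It uses pd.get_option, pd.option_context and pd.reset_option, which are part of pandas but do not appear in the original notebook, so treat it as an optional aside rather than a required step.

```python
import pandas as pd

# None means "no limit": show every column instead of truncating with "...".
pd.set_option('display.max_columns', None)

# Read the setting back to confirm it took effect.
current_limit = pd.get_option('display.max_columns')
print(current_limit)

# option_context applies a setting only inside the with-block,
# then restores the previous value automatically.
with pd.option_context('display.max_rows', 5):
    rows_inside = pd.get_option('display.max_rows')
print(rows_inside)

# Put the column limit back to the pandas default.
pd.reset_option('display.max_columns')
```

The with-block is handy when you want one wide printout without changing how every later cell is displayed.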

The code below loads your CSV file named mydata.csv into a variable named dataset.

The command:

dataset.head(5)

will show the first 5 rows of your data. Changing the number inside will change the number of rows displayed.

20/11/2019, 1:44 pm
In [2]: dataset = pd.read_csv('mydata.csv')
# Allow as many rows as the dataset contains to be displayed without truncation.
pd.set_option('display.max_rows', dataset.shape[0]+1)
dataset.head(5)

Out[2]:
    Product_Info_1  Product_Info_2  Product_Info_3  Product_Info_4  Product_Info_5   Ins_Age        Ht        Wt
0                1              D3              10        0.076923               2  0.641791  0.581818  0.148536
1                1              A1              26        0.076923               2  0.059701  0.600000  0.131799
2                1              E1              26        0.076923               2  0.029851  0.745455  0.288703
3                1              D4              10        0.487179               2  0.164179  0.672727  0.205021
4                1              D2              26        0.230769               2  0.417910  0.654545  0.234310
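If you don't have mydata.csv at hand, you can still try the same calls on a tiny in-memory file. The sketch below is only an illustration: the CSV text and its values are made up, and the name sample is hypothetical. It also prints shape and dtypes, which are a quick way to check what was actually loaded.

```python
import io
import pandas as pd

# Hypothetical stand-in for mydata.csv so the example runs on its own.
csv_text = """Product_Info_2,Ins_Age,Ht,Wt
D3,0.64,0.58,0.15
A1,0.06,0.60,0.13
E1,0.03,0.75,0.29
D4,0.16,0.67,0.21
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # the type pandas inferred for each column
print(sample.head(2))  # first two rows only
```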

How to Filter Data


Let's say you want to filter the data to the rows where the column Ht > 0.8. So that this document doesn't get too large, I will
always print just the first 10 rows.

In [3]: dataset[dataset['Ht']>0.8].head(10)

Out[3]:
    Product_Info_1  Product_Info_2  Product_Info_3  Product_Info_4  Product_Info_5   Ins_Age        Ht        Wt
5                1              D2              26        0.230769               3  0.507463  0.836364  0.299163
24               1              D1              26        0.487179               2  0.164179  0.818182  0.435146
34               1              D1              26        0.487179               2  0.567164  0.854545  0.456067
38               1              A8              26        0.128205               2  0.746269  0.836364  0.456067
44               1              D2              26        0.076923               2  0.701493  0.836364  0.405858
58               1              D1              26        1.000000               2  0.477612  0.836364  0.315900
59               1              D3              26        0.384615               2  0.268657  0.854545  0.435146
65               1              D2              26        0.076923               2  0.537313  0.836364  0.456067
84               1              D1              26        0.076923               2  0.791045  0.818182  0.560669
85               1              D3              26        0.615385               2  0.223881  0.872727  0.303347
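It may help to see the filter split into its two steps. The sketch below uses a small made-up DataFrame (the names toy, mask and tall are hypothetical) to show that the comparison first produces a True/False value per row, and that indexing with it then keeps only the True rows.

```python
import pandas as pd

# Hypothetical toy data; the column name Ht mirrors the tutorial's dataset.
toy = pd.DataFrame({'Ht': [0.58, 0.84, 0.75, 0.82]})

# Step 1: the comparison yields one True/False value per row.
mask = toy['Ht'] > 0.8
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows marked True.
tall = toy[mask]
print(tall)

# The original row labels are kept, which is why the filtered output
# above shows index values such as 5, 24, 34 rather than 0, 1, 2.
print(tall.index.tolist())
```

If you prefer a fresh 0, 1, 2 numbering after filtering, calling reset_index(drop=True) on the result is one common option.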

Filtering with multiple conditions

Ht > 0.8 AND Wt < 0.6

In [4]: dataset[(dataset['Ht']>0.8) & (dataset['Wt']<0.6)].head(10)

Out[4]:
    Product_Info_1  Product_Info_2  Product_Info_3  Product_Info_4  Product_Info_5   Ins_Age        Ht        Wt
5                1              D2              26        0.230769               3  0.507463  0.836364  0.299163
24               1              D1              26        0.487179               2  0.164179  0.818182  0.435146
34               1              D1              26        0.487179               2  0.567164  0.854545  0.456067
38               1              A8              26        0.128205               2  0.746269  0.836364  0.456067
44               1              D2              26        0.076923               2  0.701493  0.836364  0.405858
58               1              D1              26        1.000000               2  0.477612  0.836364  0.315900
59               1              D3              26        0.384615               2  0.268657  0.854545  0.435146
65               1              D2              26        0.076923               2  0.537313  0.836364  0.456067
84               1              D1              26        0.076923               2  0.791045  0.818182  0.560669
85               1              D3              26        0.615385               2  0.223881  0.872727  0.303347
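The parentheses around each condition matter: & binds more tightly than > and <, so leaving them out changes what gets compared. Here is a minimal sketch on made-up data (toy, both and both_query are hypothetical names). It also shows DataFrame.query, a pandas method not used in the original, as an alternative spelling of the same filter.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the tutorial.
toy = pd.DataFrame({
    'Ht': [0.58, 0.84, 0.85, 0.82],
    'Wt': [0.15, 0.30, 0.70, 0.44],
})

# & combines the two boolean Series element by element.
# Each comparison gets its own parentheses.
both = toy[(toy['Ht'] > 0.8) & (toy['Wt'] < 0.6)]

# DataFrame.query expresses the same filter as a string.
both_query = toy.query('Ht > 0.8 and Wt < 0.6')

print(both)
print(both.equals(both_query))
```

Which spelling to use is mostly a matter of taste; the bracket form is what the rest of this tutorial uses.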

Ht > 0.8 OR Wt < 0.6

In [5]: dataset[(dataset['Ht']>0.8) | (dataset['Wt']<0.6)].head(10)

Rows 0 through 5 shown earlier each satisfy at least one of the two conditions, so they lead this result.
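Python's own and / or keywords do not work element by element on pandas columns; the operators to use are & for AND, | for OR and ~ for NOT. The sketch below, on made-up data with hypothetical names, puts the three side by side so you can compare how many rows each one keeps.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the tutorial.
toy = pd.DataFrame({
    'Ht': [0.58, 0.84, 0.85, 0.70],
    'Wt': [0.15, 0.30, 0.70, 0.65],
})

# Name each condition once so it can be reused.
tall = toy['Ht'] > 0.8
light = toy['Wt'] < 0.6

either = toy[tall | light]      # OR: at least one condition holds
both = toy[tall & light]        # AND: both conditions hold
neither = toy[~(tall | light)]  # NOT: no condition holds

print(len(either), len(both), len(neither))
```

Storing each condition in its own variable also makes long filters easier to read than one deeply nested line.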

To be continued..
