
Modules (sometimes called packages or libraries) help group together related sets of tools in Python. In this exercise, we will examine two modules that are frequently used by data scientists:

1. statsmodels: used in machine learning; usually aliased as sm
2. seaborn: a visualization library; usually aliased as sns

Pandas

What can pandas do for you?

● Load tabular data from different sources
● Search for particular rows or columns
● Calculate aggregate statistics
● Combine data from multiple sources

CSV files:

import pandas as pd

df = pd.read_csv('random.csv')

print(df)

Methods:
print(df.head()) (displays the first five rows of the DataFrame)

df.info() (displays information about the DataFrame: column names, data types, and non-null counts)
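A minimal sketch of the loading steps above. The CSV content and column names are hypothetical, and the data is read from an in-memory string (an assumption, so the example runs without a file on disk); a path such as 'random.csv' is passed to pd.read_csv in the same way.

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory so the sketch runs without a file.
csv_text = """name,breed,weight
Rex,Labrador,30.5
Bella,Poodle,12.0
Max,Beagle,10.2
Luna,Husky,22.7
Rocky,Boxer,29.1
Daisy,Pug,7.4
"""

df = pd.read_csv(io.StringIO(csv_text))  # a file path works the same way

print(df.head())  # first five rows only
df.info()         # prints column names, dtypes, and non-null counts
```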

Ways to select columns:

1. Selecting with brackets and a string: required when the column name contains spaces or special characters

dataframe_name['column name']

2. Selecting with a dot: works only if the name contains just letters, numbers, and underscores (and does not start with a number)

dataframe_name.column_name

Logical statements:

question == solution (tests whether the two values are the same)

>, >=, <, <=, !=

Booleans: True or False

In a DataFrame, a comparison is applied to every value in a column at once, and the resulting Booleans can be used to select the matching rows.

Ex: credit_records[credit_records.price > 20.00]
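The two selection styles and the row filter can be sketched together. The credit_records contents below are hypothetical; only the price column and the 20.00 threshold come from the notes.

```python
import pandas as pd

# Hypothetical records for illustration.
credit_records = pd.DataFrame({
    "suspect": ["Ronald", "Fred", "Gertrude"],
    "item purchased": ["casserole dish", "dog treats", "burger"],
    "price": [8.40, 29.99, 20.00],
})

items = credit_records["item purchased"]  # brackets: the name contains a space
prices = credit_records.price             # dot: the name is a plain identifier

is_expensive = credit_records.price > 20.00  # one Boolean per row
expensive = credit_records[is_expensive]     # keep only the rows marked True
print(expensive)
```

Note that a price of exactly 20.00 is not selected, because > is a strict comparison.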

matplotlib

Creating a line plot:

from matplotlib import pyplot as plt

plt.plot(x_values, y_values)

plt.show() to display the plot

Adding labels and legends:

Axes and title labels:
plt.xlabel(" ")
plt.ylabel(" ")
plt.title(" ")
plt.xticks() to change the tick positions and labels on the x-axis
plt.yticks() to do the same on the y-axis
Legends:
Add the keyword argument label:
plt.plot(x, y, label=" ")
plt.legend() to show the legend
Arbitrary text: (floating text)
plt.text(x, y, ' ')
Modifying text:
Change font size:
plt.title(' ', fontsize=20)
Change font color:
plt.title(' ', color='green') (plt.legend() does not accept color; recent matplotlib versions use plt.legend(labelcolor='green') for legend text)
Adding style:
Changing line color:
plt.plot(x, y, color=" ")
Changing line width:
linewidth=1 (for example, from 1 to 7; larger values give thicker lines)
Changing line style:
linestyle='-' or '--' or '-.' or ':'
Adding markers:
marker='x' or 's' or 'o' or 'd' or '*' or 'h'
Setting a style:
plt.style.use(' ')

● 'fivethirtyeight' - Based on the color scheme of the popular website
● 'grayscale' - Great for when you don't have a color printer!
● 'seaborn' - Based on another Python visualization library (named 'seaborn-v0_8' in matplotlib 3.6 and later)
● 'classic' - The default look of Matplotlib before version 2.0
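The plotting options above can be combined in one short sketch. The data, labels, and colors are hypothetical, and the matplotlib.use("Agg") line is an assumption added so the script also runs on a machine without a display; in a notebook or desktop session it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt

plt.style.use("fivethirtyeight")

# Hypothetical data for illustration.
years = [2015, 2016, 2017, 2018]
sales_a = [10, 14, 13, 18]
sales_b = [8, 9, 12, 15]

# label feeds the legend; color, linewidth, linestyle, and marker set the look
line_a, = plt.plot(years, sales_a, label="Product A", color="tomato",
                   linewidth=2, linestyle="--", marker="o")
line_b, = plt.plot(years, sales_b, label="Product B", color="navy",
                   linestyle=":", marker="s")

plt.xlabel("Year")
plt.ylabel("Units sold")
plt.title("Sales per year", fontsize=20)
plt.xticks(years)                         # one tick per year
plt.text(2016, 16, "hypothetical data")   # floating text at data coordinates
plt.legend()
plt.show()
```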

Making a scatter plot:

plt.scatter(x, y)
plt.show()
Changing marker transparency:
alpha=0 to 1 (0 is fully transparent, 1 is fully opaque)
Logarithmic scale:

plt.xscale('log')
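A small sketch of a scatter plot with transparency and a logarithmic x-axis. The numbers are hypothetical, and the Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt

# Hypothetical values spanning several orders of magnitude.
population = [1_000, 12_000, 150_000, 2_000_000, 30_000_000]
area = [2, 15, 90, 800, 5_000]

points = plt.scatter(population, area, alpha=0.5)  # half-transparent markers
plt.xscale("log")  # spreads out values that differ by orders of magnitude
plt.xlabel("Population")
plt.ylabel("Area")
plt.show()
```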
Making a bar chart:
plt.bar(x, y)
plt.barh(x, y) to make a horizontal bar chart
Adding error bars:
yerr=dataframe_name.error
Stacked bar charts:
Display two different sets of bars on top of each other by passing the first set as the base of the second:
bottom=df.column
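A sketch of error bars and stacking, assuming a small hypothetical DataFrame with dogs, cats, and error columns. The Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical counts for illustration.
df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "dogs": [12, 7, 9],
    "cats": [5, 10, 6],
    "error": [1.5, 0.8, 1.1],
})

# First set of bars, with error bars taken from the error column.
dog_bars = plt.bar(df.city, df.dogs, yerr=df.error, label="dogs")
# Second set starts where the first ends, which stacks the bars.
cat_bars = plt.bar(df.city, df.cats, bottom=df.dogs, label="cats")

plt.legend()
plt.show()
```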

Making a histogram:
plt.hist(x)
Changing bins:
bins=number of bins
Changing range:
range=(xmin, xmax)
Normalizing:
Scales the height of each bar by a constant factor so that the total area of the bars adds up to one.
density=True
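To see what normalizing does, the sketch below draws a histogram of hypothetical random values and checks that the bar areas sum to one. The Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt
import numpy as np

np.random.seed(0)
values = np.random.randn(1000)  # hypothetical sample

# plt.hist returns the bar heights, the bin edges, and the drawn patches.
counts, edges, patches = plt.hist(values, bins=20, range=(-3, 3), density=True)

# Area of each bar = height * width; with density=True the areas sum to 1.
total_area = np.sum(counts * np.diff(edges))
print(total_area)
plt.show()
```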

Dictionary:
var_name = {key: value}
name[key] >> returns the value stored under key
name.keys() >> returns all keys
key in name >> True / False
del(name[key]) >> removes the key and its value
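The dictionary operations above, sketched on a small hypothetical dictionary of capitals:

```python
capitals = {"france": "paris", "spain": "madrid"}

print(capitals["france"])     # look up a value by its key
capitals["italy"] = "rome"    # add a new key-value pair
print(capitals.keys())        # all keys
print("italy" in capitals)    # membership test on the keys
del(capitals["spain"])        # remove a key and its value
print(capitals)
```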

Differences between lists and dictionaries:

A list is indexed by a range of numbers; use it when order matters.
A dictionary is indexed by unique keys; values are looked up by key rather than by position.
PANDAS:

Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

The DataFrame is one of Pandas' most important data structures. It is a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

import pandas as pd
To turn a dictionary into a DataFrame:
dataframe_name = pd.DataFrame(dictionary_name)
dataframe_name.index = …
CSV files:
name = pd.read_csv('', index_col=0) (index_col=0 uses the first column as the row labels)
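A minimal sketch of building a DataFrame from a dictionary and labeling its rows. The brics name matches the examples used later in these notes; the exact columns and row labels here are assumptions for illustration.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values.
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
}

brics = pd.DataFrame(data)
brics.index = ["BR", "RU", "IN", "CH", "SA"]  # replace 0..4 with row labels
print(brics)
```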

Index and select data:

Square brackets:
● Column access: dataframe_name['col_name'] returns a Series; dataframe_name[['col_name']] returns a DataFrame, e.g. brics[["country", "capital"]]
● Row access: only through slicing, e.g. brics[1:4]

Advanced methods:
1- loc: label-based; selects parts of your data based on row and column labels
● Row access: brics.loc[["RU", "IN", "CH"]]
● Column access: brics.loc[:, ["country", "capital"]]
● Row & column access: brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

2- iloc: position-based
Like loc, but instead of labels you pass integer positions starting from 0, e.g. brics.iloc[[1, 2, 3], [0, 1]]
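The sketch below runs the same selection three ways on a hypothetical brics table (the population figures are illustrative, in millions) to show that loc and iloc reach the same cells by label and by position.

```python
import pandas as pd

# Hypothetical table for illustration.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

cols = brics[["country", "capital"]]   # square brackets: columns
rows = brics[1:4]                      # square brackets: rows by slice

by_label = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
by_position = brics.iloc[[1, 2, 3], [0, 1]]

print(by_label.equals(by_position))    # same cells, reached two ways
```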

Boolean operators: and, or, not

True and True >> True
NumPy operators (element-wise versions for arrays):
np.logical_and(), np.logical_or(), np.logical_not()
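A short sketch of why the NumPy versions exist: and / or work on single Booleans, while np.logical_and and np.logical_or combine whole arrays element by element. The heights are hypothetical.

```python
import numpy as np

print(True and False)  # plain Booleans use and / or / not

heights = np.array([1.73, 1.68, 1.71, 1.89, 1.79])  # hypothetical data

# Element-wise: one Boolean per entry of the array.
mid = np.logical_and(heights > 1.7, heights < 1.8)
extreme = np.logical_or(heights >= 1.8, heights <= 1.7)

print(heights[mid])          # the Boolean array also works as a filter
print(np.logical_not(mid))
```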
Conditional statements:
if, elif, else
if z % 2 == 0 :
    print('z is even')
else :
    print('z is odd')
elif is short for "else if": it tests another condition when the previous ones were False
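A small sketch of a full if / elif / else chain; the value of z and the messages are chosen only for illustration. Python checks the conditions from top to bottom and runs the first branch that is True.

```python
z = 9

if z % 2 == 0:
    message = "z is divisible by 2"
elif z % 3 == 0:
    message = "z is divisible by 3"   # checked only if the first test failed
else:
    message = "z is neither divisible by 2 nor by 3"

print(message)
```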

Loops:
1- while loop = repeated if statement
Syntax:
while condition :
    expression
2- for loop:
Syntax:
for var in seq :
    expression
Ex:
fam = [1, 2, 4, 5]
for height in fam :
    print(height)

Using enumerate:
for index, height in enumerate(fam) :
    print(height + index)

for c in "family" :
    print(c)

The loop runs once for each character in the string.
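The three loop forms above, as a runnable sketch. The starting value, the list of heights, and the collected results are hypothetical and serve only to make each loop's effect visible.

```python
# while loop: repeat as long as the condition holds
error = 50.0
steps = 0
while error > 1:
    error = error / 4
    steps += 1
print(steps, error)

# for loop with enumerate: index and value together
fam = [1.73, 1.68, 1.71, 1.89]
labels = []
for index, height in enumerate(fam):
    labels.append("index " + str(index) + ": " + str(height))
print(labels)

# for loop over a string: one iteration per character
letters = []
for c in "family":
    letters.append(c.capitalize())
print(letters)
```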

Looping over data structures:

1- Dictionary: use the .items() method

for key, value in dict_name.items() :
    print(key, value)

2- NumPy arrays: for a 2D array, use the np.nditer() function to visit every element

for val in np.nditer(np_arr) :
    print(val)

3- pandas DataFrame: use the .iterrows() method

for lab, row in dataframe_name.iterrows() :
    print(lab)
    print(row)
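A sketch of the three looping patterns on small hypothetical data. Each loop collects what it sees into a list so the order of iteration is easy to inspect.

```python
import numpy as np
import pandas as pd

# 1. Dictionary: .items() yields (key, value) pairs
world = {"afghanistan": 30.55, "albania": 2.77}
pairs = []
for key, value in world.items():
    pairs.append(key + " -- " + str(value))

# 2. 2D NumPy array: np.nditer visits every element, not every row
grid = np.array([[1, 2], [3, 4]])
flat = []
for val in np.nditer(grid):
    flat.append(int(val))

# 3. DataFrame: .iterrows() yields (row label, row as a Series)
brics = pd.DataFrame({"capital": ["Brasilia", "Moscow"]}, index=["BR", "RU"])
seen = []
for lab, row in brics.iterrows():
    seen.append(lab + ": " + row["capital"])

print(pairs, flat, seen)
```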

Using apply to call a function on every value of a column (once per row):

dataframe_name["col_name"].apply(function)
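As a sketch, apply can replace an explicit iterrows loop when a new column is computed from an existing one. The table and the new column names are hypothetical.

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"]},
    index=["BR", "RU", "IN"],
)

# The function is passed without parentheses; pandas calls it per value.
brics["name_length"] = brics["country"].apply(len)
brics["COUNTRY"] = brics["country"].apply(str.upper)
print(brics)
```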

str(integer) >> converts a number to a string

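Random numbers:

Random generators:

np.random.rand() generates a random float between 0 and 1

np.random.seed() sets the seed so that the same random numbers are generated on every run

np.random.randint(0, 2) generates 0 or 1 (the upper bound is excluded)

list_name.append() >> adds an item to the end of a list

for x in range(number) : >> makes the for loop run number times

np.mean() >> computes the average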
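These pieces fit together in a simple coin-toss simulation. This is a sketch with an arbitrary seed and number of tosses; the printed values depend on the seed and the NumPy version.

```python
import numpy as np

# Same seed, same random numbers: useful for reproducible results.
np.random.seed(123)
first = np.random.rand()
np.random.seed(123)
again = np.random.rand()
print(first == again)

# Simulate ten coin tosses: 0 = heads, 1 = tails.
tosses = []
for x in range(10):
    tosses.append(np.random.randint(0, 2))  # upper bound 2 is excluded

print(tosses)
print(np.mean(tosses))  # share of tosses that came up 1
```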

Python Data Science Toolbox (Part 1)

1- Defining a function:
def func_name(): (function header)
    function body

Calling the function >> func_name()

2- Function parameters:

def func_name(value):
    ……
func_name(---)

3- Returning values from functions:

return ---

4- Docstrings:

- describe what your function does; placed on the line immediately after the function header

"""….."""

5- Multiple function parameters

6- Returning multiple values (tuples)

A tuple is like a list, but it is immutable and constructed using ()

even_nums = (2, 4, 6)

a, b, c = even_nums
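Steps 1 to 6 can be sketched in one small function. The name raise_both and its parameters are hypothetical; the function takes two parameters, carries a docstring, and returns two values as a tuple that is then unpacked.

```python
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    return (new_value1, new_value2)  # a tuple holds both results


a, b = raise_both(2, 3)    # unpack the tuple into two variables
print(a, b)
print(raise_both.__doc__)  # the docstring is stored on the function
```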

Scope and user-defined functions:

There are three scopes: global, local, and built-in (names such as print())

Python first looks in the local scope, then in any enclosing functions, then in the global scope, and finally in the built-in scope.

Use the global keyword inside a function to alter the value of a variable defined in the global scope.

import builtins gives access to the names in Python's built-in scope; dir(builtins) lists them.
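A sketch of the difference between creating a local variable and altering a global one. The variable and function names are hypothetical, and the code is meant to be run as a top-level script.

```python
import builtins

counter = 10  # defined in the global scope


def shadow():
    counter = 0  # a new local variable; the global counter is untouched
    return counter


def increment():
    global counter  # refer to the global name instead of creating a local
    counter = counter + 1


local_value = shadow()
increment()
print(local_value, counter)

print("print" in dir(builtins))  # built-in names live in the builtins module
```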

Nested functions:

def outer(..):
    code
    def inner(…):
        code

The outer function is called the enclosing function.

Returning functions

Closure: anything defined locally in the enclosing scope is available to the inner function even when the outer function has finished execution.

Use the nonlocal keyword to alter the value of a variable defined in the enclosing scope.
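Two short sketches with hypothetical names: raise_val returns an inner function that remembers n (a closure), and make_counter uses nonlocal to update a variable owned by the enclosing function.

```python
def raise_val(n):
    """Return a function that raises its argument to the power n."""
    def inner(x):
        return x ** n  # n comes from the enclosing scope
    return inner


square = raise_val(2)  # raise_val has finished, but inner still knows n
print(square(4))


def make_counter():
    """Return a function that counts how often it has been called."""
    count = 0

    def increment():
        nonlocal count  # alter count in the enclosing scope
        count += 1
        return count

    return increment


tick = make_counter()
tick()
print(tick())
```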

Default and flexible arguments

Adding a default argument: def func_name(arg=default_value)

Functions with a variable number of positional arguments: (*args), collected into a tuple

Functions with a variable number of keyword arguments: (**kwargs), collected into a dictionary
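A sketch of all three argument styles. The function names and arguments are hypothetical.

```python
def power(number, exponent=2):
    """Raise number to exponent; exponent defaults to 2."""
    return number ** exponent


def add_all(*args):
    """Sum any number of positional arguments (args is a tuple)."""
    total = 0
    for num in args:
        total += num
    return total


def describe(**kwargs):
    """Format any number of keyword arguments (kwargs is a dictionary)."""
    lines = []
    for key, value in kwargs.items():
        lines.append(key + ": " + str(value))
    return lines


print(power(3))              # default exponent is used
print(power(3, exponent=3))  # default is overridden
print(add_all(1, 2, 3))
print(describe(name="luke", status="missing"))
```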

Lambda functions:

A quicker way to write functions:

var_name = lambda x, y: x ** y

var_name(1, 2)

Anonymous functions:

map(func, seq)

map() applies func to every element of seq. The function can be written inline with lambda, without defining it first.

map() returns a map object; to convert it into a list: list(variable)
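A brief sketch of a named lambda and of an anonymous lambda passed straight to map(). The numbers are arbitrary.

```python
raise_to_power = lambda x, y: x ** y
print(raise_to_power(2, 3))

nums = [48, 6, 9, 21, 1]

# The lambda is never given a name; map applies it to each element.
square_all = map(lambda num: num ** 2, nums)
squares = list(square_all)  # convert the map object to a list
print(squares)
```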

Errors and exceptions:

● Exceptions: caught during execution
● Catch exceptions with a try-except clause
● Python runs the code following try
● If there's an exception, it runs the code following except

try:
    ….
except TypeError:
    print()

if x < 0:
    raise ValueError('')

try:
    ….
except:
    print()
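A sketch that combines both ideas in one hypothetical function: a TypeError is caught and reported, while a ValueError is raised on purpose for negative input and left for the caller to handle.

```python
def sqrt(x):
    """Return the square root of a non-negative number."""
    try:
        if x < 0:
            raise ValueError("x must be non-negative")
        return x ** 0.5
    except TypeError:
        # Reached when x cannot be compared with 0, e.g. a string.
        print("x must be an int or float")
        return None


print(sqrt(9))
print(sqrt("hello"))  # TypeError is caught inside the function

try:
    sqrt(-4)          # ValueError is not caught inside, so it reaches here
except ValueError as err:
    message = str(err)
    print(message)
```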
