
DataCamp

Introduction

• What is Data Science?
• Who is a Data Scientist?
• What does a Data Scientist do?
• Difference between a Data Analyst and a Data Scientist.
• Reasons to choose Data Science as a career.
Introduction to Python
• Python is one of the most popular programming languages.
Uses:
- Web development (server side)
- Software development
- Large numerical calculations
- Scripting

• Collection Data Types in Python

1. List : Ordered, changeable, allows duplicates.
Declaration : fruits = ["apple", "mango", "oranges"]
Accessing : for x in fruits: print(x)
Add : fruits.append("grapes")
2. Tuple : Ordered, unchangeable.
Declaration : tup = ("apple", "banana")

3. Set : Unordered, unindexed, no duplicates.
Declaration : newset = {"apple", "litchi"}

4. Dictionary : Changeable and indexed by key (keeps insertion order since Python 3.7).
Declaration : newdict = {"brand": "Maruti", "model": "Swift"}

Mutable : List, Set, Dictionary
Immutable : Tuple
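A minimal sketch of the mutable/immutable split above, reusing the slide's variable names. The extra "year" key and the "cherry" value are hypothetical additions for illustration.

```python
# A list is mutable: items can be appended or replaced in place.
fruits = ["apple", "mango", "oranges"]
fruits.append("grapes")
fruits[0] = "banana"

# A tuple is immutable: item assignment raises TypeError.
tup = ("apple", "banana")
try:
    tup[0] = "cherry"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# A set silently drops duplicate members.
newset = {"apple", "litchi", "apple"}

# A dictionary can gain new key/value pairs after creation.
newdict = {"brand": "Maruti", "model": "Swift"}
newdict["year"] = 2020  # hypothetical extra key

print(fruits)
print(tuple_changed)
print(len(newset))
print(newdict)
```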


- NumPy : Library in Python for numerical computing.
- Adds many features on top of plain Python lists:
- Arrays, matrices, scientific calculations.
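A short sketch of the array and matrix features mentioned above, assuming NumPy is installed. The sample values are made up for illustration.

```python
import numpy as np

# A 1-D array supports element-wise arithmetic without an explicit loop.
a = np.array([1, 2, 3, 4])
doubled = a * 2

# A 2-D array acts as a matrix; @ performs matrix multiplication.
m = np.array([[1, 2], [3, 4]])
identity = np.eye(2)
product = m @ identity  # multiplying by the identity leaves m unchanged

# Common statistics are available as array methods.
mean_value = a.mean()

print(doubled)
print(product)
print(mean_value)
```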
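- SQL : Structured Query Language.
- Used to query and modify data in relational databases; often the link between an application's front end and its back-end data store.
- SQL in Python :
- Import sqlite3.
- Use connect() to establish a connection.
- Call cursor() on the connection to get a Cursor object, which sends commands to the database.
e.g.
import sqlite3
connection = sqlite3.connect("myserver.db")
cursor = connection.cursor()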
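A minimal sketch of the connect/cursor workflow, using an in-memory database instead of a file so it leaves nothing on disk. The students table and its rows are hypothetical.

```python
import sqlite3

# ":memory:" creates a temporary database that disappears on close.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# The cursor sends SQL commands to the database.
cursor.execute("CREATE TABLE students (roll_no INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)
connection.commit()

# Query the rows back and fetch them as a list of tuples.
cursor.execute("SELECT roll_no, name FROM students ORDER BY roll_no")
rows = cursor.fetchall()
print(rows)

connection.close()
```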
JOINS in Python
- JOINS : Clause to combine rows from two or more tables on the basis of a related column.
- Types :
- INNER : Returns records with matching values in both tables.
- LEFT : All records from the left table and the matched records from the right.
- RIGHT : All records from the right table and the matched records from the left.
- FULL : Returns all records from both tables, matched where a match exists in either the left or the right table.
- Union in SQL
- Combines the result sets of two or more SELECT statements (each must return the same number of columns).
- E.g.
SELECT roll_no FROM tab1
UNION
SELECT name FROM tab2
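The join types and UNION can be tried out with sqlite3. This is a sketch under assumptions: the table names tab1 and tab2 come from the slide, but the columns and rows are invented. Only INNER and LEFT joins are shown because older SQLite versions do not support RIGHT and FULL joins.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Hypothetical sample data: roll_no 1 exists only in tab1, 4 only in tab2.
cursor.executescript("""
    CREATE TABLE tab1 (roll_no INTEGER, name TEXT);
    CREATE TABLE tab2 (roll_no INTEGER, marks INTEGER);
    INSERT INTO tab1 VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO tab2 VALUES (2, 80), (3, 90), (4, 70);
""")

# INNER JOIN keeps only roll numbers present in both tables.
inner_rows = cursor.execute("""
    SELECT tab1.name, tab2.marks FROM tab1
    INNER JOIN tab2 ON tab1.roll_no = tab2.roll_no
    ORDER BY tab1.roll_no
""").fetchall()

# LEFT JOIN keeps every row of tab1; missing matches become NULL (None).
left_rows = cursor.execute("""
    SELECT tab1.name, tab2.marks FROM tab1
    LEFT JOIN tab2 ON tab1.roll_no = tab2.roll_no
    ORDER BY tab1.roll_no
""").fetchall()

# UNION stacks two result sets and removes duplicate rows.
union_rows = cursor.execute("""
    SELECT roll_no FROM tab1
    UNION
    SELECT roll_no FROM tab2
    ORDER BY roll_no
""").fetchall()

print(inner_rows)
print(left_rows)
print(union_rows)
connection.close()
```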
Functions in Python
- Help perform tasks easily and in a customized manner.
- Types : Built-in and user-defined
- Built-in : min(), max(), print() etc.

User-defined : Parts : Header, Parameters and Definition (body).
def fun():
    say = "Hello Guys"
    print(say)
Calling : fun()
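The example above takes no parameters. A small sketch of a user-defined function that does, with a hypothetical name greet and a default argument:

```python
def greet(name, greeting="Hello"):
    """Return a greeting string for the given name."""
    # Header: def greet(...); parameters: name, greeting; body: the return line.
    return f"{greeting} {name}"

message = greet("Guys")                     # uses the default greeting
custom = greet("Guys", greeting="Welcome")  # overrides the default

print(message)
print(custom)
```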
Lambda Functions
- Function with no name (anonymous).
- Small, handy and written on one line.

listt = [12, 13, 49, 56]
evn_list = list(filter(lambda x: x % 2 == 0, listt))
print(evn_list)

Output : [12, 56]
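Lambdas are also commonly passed to map() and sorted(). A brief sketch using the same list; the variable names squares and by_last_digit are illustrative only.

```python
listt = [12, 13, 49, 56]

# map() applies the lambda to every element.
squares = list(map(lambda x: x * x, listt))

# sorted() uses the lambda as a sort key (here: the last digit).
by_last_digit = sorted(listt, key=lambda x: x % 10)

# A list comprehension is an equivalent way to filter even numbers.
evn_list = [x for x in listt if x % 2 == 0]

print(squares)
print(by_last_digit)
print(evn_list)
```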
Pandas Foundations

- Python library used for data analysis.
- Analyze using :
DataFrames
Series
Series : A Series is a one-dimensional (1-D) labelled array defined in pandas that can store any data type.
- (e.g.) Creating a Series
import pandas as pd
Data = [1, 3, 4, 5, 6, 2, 9]
Index = ['a', 'b', 'c', 'd', 'e', 'f', 'g']  # example labels, one per value
si = pd.Series(Data, index=Index)
DataFrames : A DataFrame is a two-dimensional (2-D) data structure defined in pandas which consists of rows and columns.

- (e.g.) Creating a DataFrame
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}
Data = {'first': dict1, 'second': dict2}
df = pd.DataFrame(Data)
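A runnable sketch of both structures, assuming pandas is installed. The index labels 'a' to 'g' are an assumption; the dictionaries are the ones from the slide.

```python
import pandas as pd

# Series: one value per label.
data = [1, 3, 4, 5, 6, 2, 9]
index = ["a", "b", "c", "d", "e", "f", "g"]  # assumed labels
si = pd.Series(data, index=index)

# DataFrame: each dictionary becomes a column, keys become row labels.
dict1 = {"a": 1, "b": 2, "c": 3, "d": 4}
dict2 = {"a": 5, "b": 6, "c": 7, "d": 8, "e": 9}
df = pd.DataFrame({"first": dict1, "second": dict2})

print(si["c"])   # value stored under label "c"
print(df.shape)  # (rows, columns)
# dict1 has no key "e", so that cell of the "first" column is NaN.
missing_in_first = int(df["first"].isna().sum())
print(missing_in_first)
```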
Manipulation of DataFrames

- Printing a DataFrame column :
df['Col Name']
- View first 2 rows :
df[:2]
- Filtering :
df[df['second'] > 7]
df[(df['second'] > 7) & (df['second'] < 9)]
- View a column :
df.loc[:, 'second']  # .ix was removed in pandas 1.0; .loc selects by label
- Particular value :
df.iloc[3, 2]  # .iloc selects by position: row 3, column 2
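The operations above can be checked on a small table. This sketch assumes pandas is installed and uses a hypothetical three-column DataFrame so that the positional lookup at row 3, column 2 is valid.

```python
import pandas as pd

# Hypothetical data with three columns and four labelled rows.
df = pd.DataFrame(
    {"first": [1, 2, 3, 4], "second": [5, 6, 7, 8], "third": [9, 10, 11, 12]},
    index=["a", "b", "c", "d"],
)

first_two = df[:2]                                      # first two rows
filtered = df[(df["second"] > 5) & (df["second"] < 8)]  # combined condition
column = df.loc[:, "second"]                            # one column, by label
value = df.iloc[3, 2]                                   # row 3, column 2, by position

print(first_two)
print(filtered)
print(column.tolist())
print(value)
```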
Data Visualisation in Python
- Matplotlib : Library in Python used for data visualization.
- Helps represent data graphically.

Example :
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()
- Point Plotting : Useful for plotting individual points.

plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.axis([0, 6, 0, 20])
plt.show()
- Bar Graphs in Python :
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]
plt.bar(names, values)
plt.show()
Data Visuals using Bokeh
- Bokeh is a data visualization library in Python that
provides high-performance interactive charts and
plots.
- E.g.
# import modules
from bokeh.plotting import figure, output_notebook, show
# output to notebook
output_notebook()
# create figure (Bokeh 3.x uses width/height; older releases used plot_width/plot_height)
p = figure(width=400, height=400)
# add a circle renderer with size and color
p.circle([1, 2, 3, 4, 5], [4, 7, 1, 6, 3], size=10, color="navy")
# show the results
show(p)
Bar Graphs

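# import necessary modules
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
# output to notebook
output_notebook()
# read data into a DataFrame
df = pd.read_csv(r"D://summertraining/datacamp/menu.csv")
# bokeh.charts (and its Bar chart) was removed from Bokeh, so the
# per-category totals are computed with pandas and drawn with vbar
totals = df.groupby("Category")["Calories"].sum()
# create bar chart
p = figure(x_range=list(totals.index), title="Total Calories by Category")
p.vbar(x=list(totals.index), top=list(totals.values), width=0.8)
# show the results
show(p)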
Statistical Thinking in Python
- Applying the mathematics of probability to predict certain events.
- Distributions : A distribution describes how the values of a random variable are spread; the normal distribution, for example, plots as a bell-shaped curve.
- Poisson Distribution : Models the number of times an event occurs in a fixed interval.
from scipy.stats import poisson
import seaborn as sb
data_poisson = poisson.rvs(mu=4, size=10000)
# histplot is used here because distplot is deprecated in recent seaborn releases
ax = sb.histplot(data_poisson, color='green')
ax.set(xlabel='Poisson', ylabel='Frequency')
- Hypothesis Testing : Hypothesis is a Python library for property-based unit testing (distinct from statistical hypothesis testing). Its tests are simpler to write and more powerful when run, finding edge cases in your code you wouldn't have thought to look for. It is stable, powerful and easy to add to any existing test suite.
- It checks a property against many generated values from a given range or set of values.
Learning with Scikit-Learn

- Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python.
- The library is built upon SciPy (Scientific Python).
- Features many ready-made algorithms, such as k-nearest neighbours.
- Used mostly in machine learning and scientific research.
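A minimal sketch of the consistent fit/predict interface, using the k-nearest neighbours classifier mentioned above. It assumes scikit-learn is installed; the one-feature toy data is invented for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: small values belong to class 0, large values to class 1.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

# Every scikit-learn estimator follows the same pattern: create, fit, predict.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Each new point is labelled by a majority vote of its 3 nearest neighbours.
predictions = model.predict([[1.5], [9.5]])
print(predictions)
```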
Neural Networks

- In between the input units and output units are one or more layers of hidden units, which together form the majority of the artificial brain.
- Many neural networks are fully connected, which means each hidden unit and each output unit is connected to every unit in the layers on either side.
- They consist of different layers for analyzing and learning from data.
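The fully connected structure can be sketched as a forward pass in NumPy. This is an illustration under assumptions, not a trained model: the layer sizes (3 inputs, 4 hidden units, 2 outputs), the random weights, and the sigmoid activation are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

x = np.array([0.5, -1.0, 2.0])  # 3 input units

# Fully connected: every hidden unit has one weight per input unit,
# and every output unit has one weight per hidden unit.
W_hidden = rng.normal(size=(4, 3))
b_hidden = np.zeros(4)
W_out = rng.normal(size=(2, 4))
b_out = np.zeros(2)

hidden = sigmoid(W_hidden @ x + b_hidden)   # hidden layer activations
output = sigmoid(W_out @ hidden + b_out)    # output layer activations

print(hidden.shape)
print(output.shape)
```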
Deep Learning in Python

- TensorFlow is a Python library for fast numerical computing created and released by Google.
- It is a foundation library that can be used to create Deep Learning models directly or by using wrapper libraries, built on top of TensorFlow, that simplify the process.
Thank You
