Вы находитесь на странице: 1из 113

FUN WITH

PYTHON

AGENDA

Using Python to Access Web Data


Using Databases with Python
Processing and Visualizing Data with Python

USING PYTHON TO ACCESS WEB DATA

Access Web Data

USING PYTHON TO ACCESS WEB DATA

Web Requests
Web Parser
Web Services

USING PYTHON TO ACCESS WEB DATA

Web Requests
Requests Library
pip install requests #install library

import requests
requests.get(http://www.facebook.com).text

USING PYTHON TO ACCESS WEB DATA

Web Requests
Make a Request
#GET Request

import requests
r = requests.get(http://www.facebook.com)
if r.status_code == 200:
print(Success)
Success

USING PYTHON TO ACCESS WEB DATA

Web Requests
Make a Request
#POST Request

import requests
r = requests.post('http://httpbin.org/post', data = {'key':'value'})
if r.status_code == 200:
print(Success)
Success

USING PYTHON TO ACCESS WEB DATA

Web Requests
Make a Request
#Other Types of Request

import requests
r = requests.put('http://httpbin.org/put', data = {'key':'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

USING PYTHON TO ACCESS WEB DATA

Web Requests
Passing Parameters In URLs
#GET Request with parameter

import requests
r = requests.get(https://www.google.co.th/?hl=th)
if r.status_code == 200:
print(Success)

Success

USING PYTHON TO ACCESS WEB DATA

Web Requests
Passing Parameters In URLs
#GET Request with parameter
import requests
r = requests.get(https://www.google.co.th,params={hl:en})
if r.status_code == 200:
print(Success)

Success

USING PYTHON TO ACCESS WEB DATA

Web Requests
Passing Parameters In URLs
#POST Request with parameter
import requests
r = requests.post("https://m.facebook.com",data={"key":"value"})
if r.status_code == 200:
print(Success)

Success

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Content
#Text Response
import requests

data = {email :.. , pass : }


r = requests.post(https://m.facebook.com,data=data)
if r.status_code == 200:
print(r.text)
'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML
Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd"><html xmlns="http://
www.w3.org/1999/xhtml"><head><title>Facebook</title><meta name="referrer"
content="default" id="meta_referrer" /><style type=text/css>/*<!..

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Content
#Response encoding
import requests
r = requests.get('https://www.google.co.th/logos/doodles/2016/kingbhumibol-adulyadej-1927-2016-5148101410029568.2-hp.png')
r.encoding = tis-620'
if r.status_code == 200:
print(r.text)
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"
lang="th"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta
content="/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2hp.png" itemprop="image"><meta content=" ...

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Content
#Binary Response

import requests
r = requests.get('https://www.google.co.th/logos/doodles/2016/kingbhumibol-adulyadej-1927-2016-5148101410029568.2-hp.png')
if r.status_code == 200:
open(img.png,wb).write(r.content)

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Status Codes
#200 Response (OK)

import requests
r = requests.get('https://api.github.com/events')
if r.status_code == requests.codes.ok:
print(data[0]['actor'])

{'url': 'https://api.github.com/users/ShaolinSarg', 'display_login': 'ShaolinSarg', 'avatar_url': 'https://


avatars.githubusercontent.com/u/6948796?', 'id': 6948796, 'login': 'ShaolinSarg', 'gravatar_id': ''}

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Status Codes
#200 Response (OK)

import requests
r = requests.get('https://api.github.com/events')
print(r.status_code)

200

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Status Codes
#404

import requests
r = requests.get('https://api.github.com/events/404')
print(r.status_code)

404

USING PYTHON TO ACCESS WEB DATA

Web Requests
Response Headers
#404

import requests
r = requests.get('http://www.sanook.com')
print(r.headers)
print(r.headers[Date])
{'Content-Type': 'text/html; charset=UTF-8', 'Date': 'Tue, 08 Nov 2016 14:38:41 GMT', 'CacheControl': 'private, max-age=0', 'Age': '16', 'Content-Encoding': 'gzip', 'Content-Length': '38089',
'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Accept-Ranges': 'bytes'}

Tue, 08 Nov 2016 14:38:41 GMT

USING PYTHON TO ACCESS WEB DATA

Web Requests
Timeouts
#404

import requests
r = requests.get(http://www.sanook.com',timeout=0.001)

ReadTimeout: HTTPConnectionPool(host='github.com', port=80): Read timed out. (read


timeout=0.101)

USING PYTHON TO ACCESS WEB DATA

Web Requests
Authentication
#Basic Authentication

import requests
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
print(r.status_code)

200

USING PYTHON TO ACCESS WEB DATA

Web Requests
read more : http://docs.python-requests.org/en/master/

USING PYTHON TO ACCESS WEB DATA

Web Requests
Quiz#1 : Tag Monitoring

1. Get webpage : http://pantip.com/tags


2. Save to file every 5 minutes (time.sleep(300))
3. Use current date time as filename

(How to get current date time using Python?, find it on Google)

USING PYTHON TO ACCESS WEB DATA

Web Parser
HTML Parser : beautifulsoup
pip install beautifulsoup4 #install library

from bs4 import BeautifulSoup

soup = BeautifulSoup(open(file.html),"html.parser") #parse from file


soup = BeautifulSoup(<html>data</html>,"html.parser") #parse from
text

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
from bs4 import BeautifulSoup
soup = BeautifulSoup(<html>data</html>,"html.parser")
print(soup)

<html>data</html>

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
#Navigating using tag names
from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></


head><body><p class="title"><b>The Dormouse's story</b></p></
body>
soup = BeautifulSoup(html_doc,"html.parser")
soup.head
soup.title
soup.body.p

USING PYTHON TO ACCESS WEB DATA

Web Parser
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
<p class="title"><b>The Dormouse's story</b></p>

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
#Access string
from bs4 import BeautifulSoup

html_doc = ""<h1>hello</h1>
soup = BeautifulSoup(html_doc,"html.parser")
print(soup.h1.string)
hello

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
#Access attribute
from bs4 import BeautifulSoup

html_doc = <a href="http://example.com/elsie" >Elsie</a>


soup = BeautifulSoup(html_doc,"html.parser")
print(soup.a[href])

http://example.com/elsie

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
#Get all text in the page
from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></


head><body><p class="title"><b>The Dormouse's story</b></p></
body>
soup = BeautifulSoup(html_doc,"html.parser")
print(soup.get_text)
<bound method Tag.get_text of <html><head><title>The Dormouse's story</title></
head><body><p class="title"><b>The Dormouse's story</b></p></body></html>>

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
# find_all()
from bs4 import BeautifulSoup

html_doc = """<a href="http://example.com/elsie" class="sister"


id="link1">Elsie</a>,<a href="http://example.com/lacie" class="sister"
id="link2">Lacie</a> and <a href="http://example.com/tillie"
class="sister" id="link3">Tillie</a>;
soup = BeautifulSoup(html_doc,"html.parser")
for a in soup.find_all(a):
print(a)

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
<a class="sister" href="http://example.com/elsie"
id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie"
id="link2">Lacie</a>

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document

#find_all()
soup.find_all(id='link2')

soup.find_all(href=re.compile("elsie"))

soup.find_all(id=True)

data_soup.find_all(attrs={"data-foo": value"})

soup.find_all("a", class_="sister")

soup.find_all("a", recursive=False)
soup.p.find_all(a", recursive=False)

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document

re.compile(..)
<a href=http://192.x.x.x class=c1>hello</a>
<a href=https://192.x.x.x class=c1>hello</a>
<a href=https://www.com class=c1>hello</a>
find_all(href=re.compile((https|http)://[0-9\.]))

https://docs.python.org/2/howto/regex.html

USING PYTHON TO ACCESS WEB DATA

Web Parser
Parse a document
read more : https://www.crummy.com/software/BeautifulSoup/
bs4/doc/

USING PYTHON TO ACCESS WEB DATA

Web Parser
Quiz#2 : Tag Extraction

1. Get webpage : http://pantip.com/tags


2. Extract tag name, tag link, number of topic in
first 10 pages
3. save to file as this format
tag name, tag link, number of topic, current datetime
4. Run every 5 minutes

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON Parser : json
built-in function

import json

json_doc = json.loads({key : value})

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON Parser : json
#JSON string

json_doc = {employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON Parser : json
#Parse string to object

import json
json_obj = json.loads(json_doc)
print(json_obj)

{'employees': [{'firstName': 'John', 'lastName': 'Doe'}, {'firstName': 'Anna', 'lastName': 'Smith'},


{'firstName': 'Peter', 'lastName': 'Jones'}]}

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON Parser : json
#Access json object

import json
json_obj = json.loads(json_doc)
print(json_obj[employees][0][firstName])
print(json_obj[employees][0][lastName])

John
Doe

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON Parser : json
#Create json doc

import json
json_obj = {firstName : name,lastName : last} #Dictionary
print(json.dumps(json_obj,indent=1))

"firstName": "name",
"lastName": last"

USING PYTHON TO ACCESS WEB DATA

Web Parser
Quiz#3 : Post Monitoring

1. Register as Facebook Developer on


developers.facebook.com
2. Get information of last 10 hours post on the page
https://www.facebook.com/MorningNewsTV3
3. save to file as this format
post id, post datetime, #number like, current datetime

USING PYTHON TO ACCESS WEB DATA

Web Parser
Quiz#3 : Post Monitoring
URL

https://graph.facebook.com/v2.8/<PageID>?
fields=posts.limit(100)%7Blikes.limit(1).summary(true)
%2Ccreated_time%7D&access_token=

USING PYTHON TO ACCESS WEB DATA

Web Service

USING PYTHON TO ACCESS WEB DATA

Web Service
Web Service Type

USING PYTHON TO ACCESS WEB DATA

Web Parser
SOAP Example

USING PYTHON TO ACCESS WEB DATA

Web Parser
SOAP Request

USING PYTHON TO ACCESS WEB DATA

Web Parser
REST

USING PYTHON TO ACCESS WEB DATA

Web Parser
REST Request

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON Web Service

USING PYTHON TO ACCESS WEB DATA

Web Parser
Application

USING PYTHON TO ACCESS WEB DATA

Web Parser
JSON

{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}

read more : http://www.json.org/

list
dict
key
value

USING PYTHON TO ACCESS WEB DATA

Web Service
Create Simple Web Service
pip install Flask-API
from flask.ext.api import FlaskAPI
app = FlaskAPI(__name__)

@app.route('/example/')
def example():
return {'hello': 'world'}

app.run(debug=False,port=5555)

USING PYTHON TO ACCESS WEB DATA

Web Service
Create Simple Web Service
#receive input

from flask.ext.api import FlaskAPI


app = FlaskAPI(__name__)

@app.route(/hello/<name>/<lastName>')
def example(name,lastName):
return {'hello':name}

app.run(debug=False,port=5555)

USING PYTHON TO ACCESS WEB DATA

Web Parser
Quiz#4 : Tag Service

1. Build get TopTagInfo function using web service.


2. Input : Number of top topic
3. Output: tag name and number of top the topic in json
format.

USING PYTHON TO ACCESS WEB DATA

Web Parser
Quiz#4 : Top Tag Service

1. Build getTopTagInfo web service.


2. Input : Number of top topic
3. Output: tag name and number of top the topic in json
format.

USING DATABASES WITH PYTHON

Databases

USING DATABASES WITH PYTHON

USING DATABASES WITH PYTHON

Zero configuration
SQLite does not need to be Installed as there is no setup procedure to use it.
Server less
SQLite is not implemented as a separate server process. With SQLite, the process that wants to access the
database reads and writes directly from the database files on disk as there is no intermediary server process.
Stable Cross-Platform Database File
The SQLite file format is cross-platform. A database file written on one machine can be copied to and used
on a different machine with a different architecture.
Single Database File
An SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
Compact
When optimized for size, the whole SQLite library with everything enabled is less than 400KB in size

USING DATABASES WITH PYTHON

SQLite
built-in library : sqlite3

import sqlite3
conn = sqlite3.connect('my.db')

USING DATABASES WITH PYTHON

SQLite
Workflow
1. Connect to db
2. Get cursor
3. Execute command
4. Commit (insert / update/delete) / Fetch result (select)
5. Close database

USING DATABASES WITH PYTHON

SQLite
Workflow Example
import sqlite3
conn = sqlite3.connect(example.db') # connect db
c = conn.cursor() # get cursor
# execute1
c.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
# execute2
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit() # commit
conn.close() # close

USING DATABASES WITH PYTHON

SQLite
Data Type

USING DATABASES WITH PYTHON

Database Storage
import sqlite3
conn = sqlite3.connect(example.db') #store in disk
conn = sqlite3.connect(:memory:) #store in memory

USING DATABASES WITH PYTHON

Execute
#execute

import sqlite3
conn = sqlite3.connect(example.db')
c = conn.cursor()
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)

USING DATABASES WITH PYTHON

Execute
#executemany

import sqlite3
conn = sqlite3.connect(example.db')
c = conn.cursor()
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

USING DATABASES WITH PYTHON

fetch
#fetchaone

import sqlite3
conn = sqlite3.connect(example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
c.fetchone()

('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)

USING DATABASES WITH PYTHON

fetch
#fetchall

import sqlite3
conn = sqlite3.connect(example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
for d in c.fetchall():
print(d)

[('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14),


('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0),

USING DATABASES WITH PYTHON

Context manager
import sqlite3
con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname
varchar unique)")
#con.commit() is called automatically afterwards
with con:
con.execute("insert into person(firstname) values (?)", ("Joe"))

USING DATABASES WITH PYTHON

Read more :
https://docs.python.org/2/library/sqlite3.html
https://www.tutorialspoint.com/python/python_database_access.htm

USING DATABASES WITH PYTHON

Quiz#5 : Post DB

1. Register as Facebook Developer on


developers.facebook.com
2. Get information of last 10 hours post on the page
https://www.facebook.com/MorningNewsTV3
(post id, post datetime, #number like, current datetime)
3. design and create table to store posts

PROCESSING AND VISUALIZING DATA WITH PYTHON

Processing and Visualizing

PROCESSING AND VISUALIZING DATA WITH PYTHON

Processing : pandas
pip install pandas

high-performance, easy-to-use data structures and


data analysis tools

USING DATABASES WITH PYTHON

Pandas : Series
#create series with Array-like

import pandas as pd
from numpy.random import rand
s = pd.Series(rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a
b
c
d
e

0.690232
0.738294
0.153817
0.619822
0.4347

USING DATABASES WITH PYTHON

Pandas : Series
#create series with dictionary

import pandas as pd
from numpy.random import rand

d = {'a' : 0., 'b' : 1., 'c' : 2.}


s = pd.Series(d) #with dictionary
print(s)
a 0
b 1
c 2
dtype: float64

USING DATABASES WITH PYTHON

Pandas : Series
#create series with Scalar

import pandas as pd
from numpy.random import rand
s = pd.Series(5., index=['a', 'b', 'a', 'd', a']) #index can duplicate
print(s[a])
a
a
a

5
5
5

dtype: float64

USING DATABASES WITH PYTHON

Pandas : Series
#access series data

import pandas as pd
from numpy.random import rand
s = pd.Series(5., index=['a', 'b', 'a', 'd', a']) #index can duplicate
print(s[0])
print(s[:3])
5.0
a 5
b 5
a 5
dtype: float64

USING DATABASES WITH PYTHON

Pandas : Series
#series operations

import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10)) #index can duplicate
s = s + 2
s = s * s
s = np.exp(s)
print(s)

0
1
2
3
4
5
6
7
8
9

187.735606
691.660752
60.129741
595.438606
769.479456
397.052123
4691.926483
1427.593520
180.001824
410.994395

dtype: float64

USING DATABASES WITH PYTHON

Pandas : Series
#series filtering

import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10)) #index can duplicate
s = s[s > 0.1]
print(s)

1
2
3
6
7
8
9

0.708700
0.910090
0.380613
0.692324
0.508440
0.763977
0.470675

dtype: float64

USING DATABASES WITH PYTHON

Pandas : Series
#series incomplete data

import pandas as pd
from numpy.random import rand
import numpy as np
s1 = pd.Series(rand(10))
s2 = pd.Series(rand(8))
s = s1 + s2
print(s)

0
1
2
3
4
5
6
7
8
9

0.813747
1.373839
1.569716
1.624887
1.515665
0.526779
1.544327
0.740962
NaN
NaN

dtype: float64

USING DATABASES WITH PYTHON

Pandas : Series
#create series with Array-like

import pandas as pd
from numpy.random import rand
s = pd.Series(rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a
b
c
d
e

0.690232
0.738294
0.153817
0.619822
0.4347

USING DATABASES WITH PYTHON

Pandas : DataFrame
2-dimensional labeled data
structure with columns
of potentially different types

USING DATABASES WITH PYTHON

Pandas : DataFrame
#create dataframe with dict

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),


'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

one two
a 1 1
b 2 2
c 3 3
d NaN 4

USING DATABASES WITH PYTHON

Pandas : DataFrame
#create dataframe with dict list

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}

df = pd.DataFrame(d)
print(df)

one
0 1
1 2
2 3
3 4

two
4
3
2
1

USING DATABASES WITH PYTHON

Pandas : DataFrame
#access dataframe column

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}

df = pd.DataFrame(d)
print(df[one])

0
1
2
3

1
2
3
4

Name: one, dtype: float64

USING DATABASES WITH PYTHON

Pandas : DataFrame
#access dataframe row

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}

df = pd.DataFrame(d)
print(df.iloc[:3])

one
0 1
1 2
2 3

two
4
3
2

USING DATABASES WITH PYTHON

Pandas : DataFrame
#add new column

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1,2,3,2]
print(df)

0
1
2
3

one
1
2
3
4

two
4
3
2
1

three
1
2
3
2

USING DATABASES WITH PYTHON

Pandas : DataFrame
#show data : head() and tail()

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1,2,3,2]
print(df.head())
print(df.tail())

0
1
2
3

one
1
2
3
4

two
4
3
2
1

three
1
2
3
2

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe summary

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.describe())

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe function

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.mean())
one 2.5
two 2.5
dtype: float64

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe function

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.corr()) #calculate correlation
one two
one 1 -1
two -1 1

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe filtering

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[(df[one] > 1) & (df[one] < 3)] )
one two
1

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe filtering with isin

d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[df[one].isin([2,4])] )
one two
1
3

2
4

3
1

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe with row data

d = [ [1., 2., 3., 4.], [4., 3., 2., 1.]]


df = pd.DataFrame(d)
df.columns = ["one","two","three","four"]
print(df)

0
1

one two three four


1 2
3 4
4 3
2 1

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe sort values

d = [ [2., 1., 3., 4.], [1., 3., 2., 4.]]


df = pd.DataFrame(d)
df.columns = ["one","two","three","four"]
df = df.sort_values([one,two],ascending=[1,0])
print(df)
one two three four
0 2 1
3 4
1 1 3
2 4

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe from csv file

file.csv

df = pd.read_csv(file.csv)
print(df)

one,two,three
1,2,3
1,2,3
1,2,3

one two three


0
1
2

1
1
1

2
2
2

3
3
3

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe from csv file, without header.

file.csv

df = pd.read_csv(file.csv,header=-1)
print(df)

1,2,3
1,2,3
1,2,3

0 1 2
0
1
2

1
1
1

2
2
2

3
3
3

USING DATABASES WITH PYTHON

Pandas : DataFrame

USING DATABASES WITH PYTHON

Pandas : DataFrame
#dataframe from html, need to install lxml first (pip install lxml)

df = pd.read_html(https://simple.wikipedia.org/wiki/
List_of_U.S._states)

print(df[0])

Abbreviation

State Name

AL

Alabama

AK

Alaska

AZ

Arizona

Capital

Became a State

Montgomery December 14, 1819


Juneau January 3, 1959
Phoenix February 14, 1912

USING DATABASES WITH PYTHON

Quiz#6 : Data Exploration

1. Goto https://archive.ics.uci.edu/ml/datasets/Adult
to read data description
2. Parse data into pandas using read_csv() and set
columns name
3. Explore data to answer following questions,
- find number of person in each education level.
- find correlation and covariance between continue
fields
- Avg age of United-States population where income
>50K.

USING DATABASES WITH PYTHON

Quiz#6 : Data Exploration

df[3].value_counts()

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
pip install seaborn

visualization library based on matplotlib

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : set inline plot for jupyter
%matplotlib inline
import numpy as np
import seaborn as sns

# Generate some sequential data


x = np.array(list("ABCDEFGHI"))
y1 = np.arange(1, 10)
sns.barplot(x, y1)

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : plot result

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(1,1,figsize=(10, 10))
sns.barplot(x=[1,2,3,4,5],y=[3,2,3,4,2])

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : set layout

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(2,2,figsize=(10, 10))
sns.barplot(x=[1,2,3,4,5],y=[3,2,3,4,2],ax=ax[0,0])
sns.distplot([3,2,3,4,2],ax=ax[0,1])

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : set layout

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : axis setting
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f,ax = plt.subplots(figsize=(10, 5))
sns.barplot(x=[1,2,3,4,5],y=[3,2,3,4,2])
ax.set_xlabel("number")
ax.set_ylabel("value")

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : axis setting

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : with pandas dataframe
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

d = {'x' : [1., 2., 3., 4.], 'y' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
f,ax = plt.subplots(figsize=(10, 5))
sns.barplot(x=x,y=y,data=df)

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : with pandas dataframe

PROCESSING AND VISUALIZING DATA WITH PYTHON

Visualizing : seaborn
seaborn : plot types

http://seaborn.pydata.org/examples/index.html

USING DATABASES WITH PYTHON

Quiz#7 : Adult Plot

1. Goto https://archive.ics.uci.edu/ml/datasets/Adult
to read data description
2. Parse data into pandas using read_csv() and set
columns name

3. Plot five charts.

Вам также может понравиться