
Chapter 1.

Pandas DataFrame basics

1) Loading your first data set


a) Pandas is not part of the Python standard library, so we first have to tell Python to load
(import) the library.
i) import pandas
ii) Because we will call Pandas functions many times, both throughout the book and in your
own programming, it is common to give pandas the alias pd. The import above is then
written as:

(1) import pandas as pd
b) read_csv function
i) loads CSV data into a DataFrame
ii) by default, the read_csv function reads a comma-separated file
iii) we can use the sep parameter to indicate a different separator, such as a tab with \t
iv) df = pandas.read_csv('../data/gapminder.tsv', sep='\t')
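The effect of the sep parameter can be sketched without the gapminder file. The snippet below is only an illustration: it reads tab-separated text from an in-memory string (tsv_text is a made-up stand-in, and its rows are invented), but the read_csv call has the same shape as the one above.

```python
import io

import pandas as pd

# In-memory stand-in for a .tsv file; the rows are invented for illustration.
tsv_text = "country\tyear\tlifeExp\nA\t1952\t30.0\nB\t1952\t50.0\n"

# sep="\t" tells read_csv that fields are separated by tabs, not commas.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df)
```

Without sep="\t", each line would be read as a single column, because the default separator is a comma.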
c) df.head()
i) we use the head method so that Python shows us only the first 5 rows
d) shape
i) Every DataFrame object has a shape attribute that will give us the number of rows and
columns of the DataFrame.

ii) print(df.shape)
iii) (1704, 6): 1704 rows and 6 columns.
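As a small sketch of head and shape, the snippet below builds a toy DataFrame in place of the gapminder data. The column names follow the notes, but every value is invented for illustration, so the shape differs from the 1704 by 6 result above.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["X", "X", "X", "X", "Y", "Y"],
    "year": [1952, 1957, 1952, 1957, 1952, 1957],
    "lifeExp": [30.0, 33.0, 50.0, 54.0, 61.0, 63.0],
    "pop": [100, 110, 200, 220, 300, 330],
})

first_rows = df.head()      # head is a method: first 5 rows by default
n_rows, n_cols = df.shape   # shape is an attribute, so no parentheses

print(first_rows)
print(n_rows, n_cols)
```

Calling df.shape() with parentheses raises a TypeError, because shape is a plain tuple rather than a method.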
e) # get column names
i) print(df.columns)
(1) Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPer
f) What is the type of the column names?
i) The Pandas DataFrame object is similar to the DataFrame-like objects in other languages
(e.g., Julia and R). Each column (Series) has to hold a single type, whereas each row
can contain mixed types.

ii)
iii)

iv)
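One way to inspect the column names and the per-column types is sketched below. The DataFrame is a toy stand-in with invented values, and the exact dtype reported for text columns depends on the installed pandas version.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [1952, 1952, 1952],
    "lifeExp": [30.0, 50.0, 61.0],
})

print(type(df.columns))  # the column names are stored in a pandas Index
print(df.dtypes)         # one dtype per column

# A single row mixes text, integer, and float values.
print(df.loc[0])
```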
2) Looking at columns, rows, and cells
a) Subsetting columns
i) If we want multiple columns, we can specify them in a few ways: by name, by position, or
by range
ii) Subsetting columns by name
(1)
(2) When subsetting a single column, you can use dot notation and access the column name
as an attribute directly.

(a)
(3) To specify multiple columns by name, pass a list of column names inside the square brackets

(a)
iii) Subsetting columns by index position
(1)
iv) Subsetting columns by range

(1)
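The column-subsetting options above can be sketched as follows. This is not necessarily the code the notes originally showed: the DataFrame is a toy stand-in with invented values, and positions and ranges are expressed with .iloc, which is one way to select columns by integer position in current pandas.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "continent": ["X", "X", "Y"],
    "year": [1952, 1952, 1952],
    "lifeExp": [30.0, 50.0, 61.0],
    "pop": [100, 200, 300],
})

country = df["country"]       # one column by name: returns a Series
country_dot = df.country      # the same column via dot notation

# Several columns by name: pass a list inside the square brackets.
subset = df[["country", "continent", "year"]]

by_pos = df.iloc[:, [0, 2, -1]]   # columns by position (first, third, last)
by_range = df.iloc[:, 0:3]        # columns by range (the first three)

print(subset)
print(by_pos)
print(by_range)
```

Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. Here df.pop refers to the DataFrame.pop method, so the pop column has to be accessed as df["pop"].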
b) Subsetting rows
i) # get the first row

(1)
ii) # get the 100th row

(1)
iii) # get the last row
(1)
(2) Or simply use the tail method with n=1 to return only the last row, instead of the default 5.

(a)
(3) Subsetting multiple rows
(a)

(4) Subset rows by row number - .iloc

(a)
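A minimal sketch of the row-subsetting cases above, assuming a toy DataFrame with invented values and the default integer index. With that index, .loc looks rows up by label and .iloc by position.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": [1952, 1957, 1952, 1957, 1952, 1957],
    "lifeExp": [30.0, 33.0, 50.0, 54.0, 61.0, 63.0],
})

first = df.loc[0]                  # row with index label 0
last = df.loc[df.shape[0] - 1]     # last row: the label is the row count minus 1
last_tail = df.tail(n=1)           # last row via tail (returns a DataFrame)
several = df.loc[[0, 3, 5]]        # multiple rows: pass a list of labels

second = df.iloc[1]                # second row by position
last_iloc = df.iloc[-1]            # negative positions work with .iloc

print(first)
print(several)
```

Note that df.loc[-1] raises a KeyError here because no row carries the label -1, whereas df.iloc[-1] counts from the end.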
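(5) Subsetting rows with .ix (combination of .loc and .iloc)
(a) .ix allows us to subset by integers and labels. By default it searches for labels, and
if it cannot find the corresponding label, it falls back to integer indexing. Note that .ix
was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use
.loc for labels and .iloc for positions instead.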

(b)
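Because .ix is not available in pandas 1.0 and later, the sketch below shows the explicit alternatives instead. The scores DataFrame and its index labels are hypothetical, chosen so that labels and positions differ.

```python
import pandas as pd

# Hypothetical data whose index labels (10, 20, 30) differ from positions (0, 1, 2).
scores = pd.DataFrame({"score": [1.5, 2.5, 3.5]}, index=[10, 20, 30])

by_label = scores.loc[10]       # label lookup: the row labelled 10
by_position = scores.iloc[0]    # position lookup: the first row

print(by_label)
print(by_position)
```

Choosing .loc or .iloc explicitly avoids the ambiguity of a lookup that may silently switch between labels and positions.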
(6) Subsetting rows and columns

(a)
(7) Subsetting multiple rows and columns

(a)
(b)
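Rows and columns can be subset in one step by passing both selectors, separated by a comma, to .loc or .iloc. The sketch below assumes a toy DataFrame with invented values.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["X", "X", "X", "X", "Y", "Y"],
    "year": [1952, 1957, 1952, 1957, 1952, 1957],
    "lifeExp": [30.0, 33.0, 50.0, 54.0, 61.0, 63.0],
    "pop": [100, 110, 200, 220, 300, 330],
})

# All rows (the bare colon), selected columns by name.
year_pop = df.loc[:, ["year", "pop"]]

# Selected rows and columns by label ...
sub_loc = df.loc[[0, 3, 5], ["country", "lifeExp"]]

# ... and the same selection by integer position.
sub_iloc = df.iloc[[0, 3, 5], [0, 3]]

print(sub_loc)
```

Selecting columns by name tends to be easier to read and survives a change in column order, which positional selection does not.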
3) Grouped and aggregated calculations
a) Grouped means
i) We accomplish grouped/aggregate computations by using the groupby method on
DataFrames.
ii)

iii)
iv)
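A grouped mean with groupby can be sketched as below. The DataFrame is a toy stand-in with invented values, so the resulting numbers are not gapminder results.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["X", "X", "X", "X", "Y", "Y"],
    "year": [1952, 1957, 1952, 1957, 1952, 1957],
    "lifeExp": [30.0, 33.0, 50.0, 54.0, 61.0, 63.0],
    "pop": [100, 110, 200, 220, 300, 330],
})

# Split the rows by year, take the lifeExp column, and average each group.
mean_by_year = df.groupby("year")["lifeExp"].mean()

# Group by two columns and average two columns at once.
mean_by_year_continent = df.groupby(["year", "continent"])[["lifeExp", "pop"]].mean()

print(mean_by_year)
print(mean_by_year_continent)
```

Grouping by more than one column produces a result with a hierarchical index; calling reset_index() on it flattens the result back into ordinary columns.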
b) Grouped frequency counts
i)

ii)
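For frequency counts, two commonly used methods are nunique (the number of distinct values per group) and value_counts (the number of rows per value). The sketch below uses a toy DataFrame with invented values.

```python
import pandas as pd

# Toy stand-in for the gapminder data; all values are invented.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["X", "X", "X", "X", "Y", "Y"],
})

# Number of distinct countries in each continent.
countries_per_continent = df.groupby("continent")["country"].nunique()

# Number of rows (observations) for each continent.
rows_per_continent = df["continent"].value_counts()

print(countries_per_continent)
print(rows_per_continent)
```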
4) Basic plot
a)

b)
Chapter 2. Pandas data structures
1. Creating your own data
a. Creating a Series
b. Creating a DataFrame
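As a rough illustration of creating data by hand, a Series can be built from a list and a DataFrame from a dictionary of columns. The names and values below are placeholders, not taken from the notes.

```python
import pandas as pd

# A Series from a list; pandas assigns the integer labels 0 and 1.
s = pd.Series(["banana", 42])

# A Series with explicit index labels.
person = pd.Series(["Ada", "Engineer"], index=["Name", "Occupation"])

# A DataFrame from a dict: keys become column names, lists become columns.
people = pd.DataFrame({
    "Name": ["Ada", "Grace"],
    "Born": [1815, 1906],
})

print(s)
print(person)
print(people)
```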
2. The Series
a. The Series is ndarray-like
i. series methods
b. Boolean subsetting Series
c. Operations are vectorized
i. Vectors of same length
ii. Vectors with integers (scalars)
iii. Vectors with different lengths
iv. Vectors with common index labels
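The four cases listed above can be sketched with a small Series. The values are arbitrary, and the variable names are chosen only for this illustration.

```python
import pandas as pd

ages = pd.Series([10, 20, 30])

same_length = ages + ages          # element-wise: 20, 40, 60
with_scalar = ages * 2             # the scalar is applied to every element

# Different lengths: values are matched by index label, and labels
# missing from one side produce NaN in the result.
shorter = pd.Series([1, 100])
mixed_length = ages + shorter

# Common index labels: alignment is by label, not by order, so adding a
# reversed copy pairs each value with itself.
reversed_ages = ages.sort_index(ascending=False)
aligned = ages + reversed_ages

print(mixed_length)
print(aligned)
```

This differs from NumPy arrays, where adding arrays of incompatible lengths raises an error instead of aligning by label.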
3. The DataFrame
a. Boolean subsetting DataFrame
b. Operations are automatically aligned and vectorized
4. Making changes to Series and DataFrames
a. Add additional columns
b. Directly change a column
5. Exporting and importing data
a. Pickle
i. Series
ii. DataFrame
iii. Reading pickle data
b. CSV
c. Excel
i. Series
ii. DataFrame
d. Many data output types
e.
