Python for Econometrics

Lecturer: Fabian H. C. Raters

Institute: Econometrics, University of Goettingen

Version: October 08, 2018

© 2018 PyEcon.org. All rights reserved. Python is a trademark of the PSF.


Learning Python for econometrics

Welcome to this course and to the world of Python!

Learning objectives of this course:

Python: Roughly half the course is about Python.
for: You will learn tools and methods.
Econometrics:
    Statistics: Numerical programming in Python.
    applied to: We will use it on examples.
    Economics: In an economic context.

Learning Python for econometrics

Knowledge after completing this course:

You have acquired a basic understanding of programming in general with Python and specialized knowledge of working with standard numerical packages.
You are able to study Python in depth and absorb new knowledge for your scientific work with Python.
You know the capabilities and further possibilities of using Python in econometrics.

Learning Python for econometrics

What you should not expect from this course:

A guide on how to install or maintain an application.
An introduction to programming for beginners.
Non-scientific, general purpose programming (beyond the language essentials).
An introduction to professional development tools.
Little content and little effort...

Course organisation

This course can be seen as an applied lecture:

Lecture:
We try to explain the partly theoretical knowledge on Python by simple, easy to understand examples. You can learn the subtleties by reading good literature.

Exercises:
Digital work sheets in the form of Jupyter notebooks with applied tasks are available for each chapter. For all exercises there are sample solutions available in separate notebooks.

Self-tests:
At the end of each of the five chapters there are typical exam questions.

Written exam:
There will be a final exam. This will be a pure multiple choice exam: 60 questions, 90 minutes.

After passing the exam you will receive 6 ECTS credits.

Literature

The programming language Python is well established and very popular for numerical applications. Some keywords:

Data science,
Data wrangling,
Machine learning,
Numerical statistics,
...

Recommended literature while following this course:

Learning Python, 5th Edition by Mark Lutz,
Python Crash Course by Eric Matthes,
Python Data Science Handbook by Jake VanderPlas,
Python for Data Analysis, 2nd Edition by Wes McKinney,
Python for Finance by Yves Hilpisch.

Software: Python 3

We are using Python 3. There was a big revision in the migration from Python 2 to version 3, and the new version is no longer backwards compatible with the old version.

Python 3 running [command line]
python3 --version
## Python 3.6.6

The normal execution mode is that the Python interpreter processes the instructions in the background; in other numeric programming languages such as R this is known as batch mode. It executes program code that is usually located in a source code file.

The interpreter can also be started in an interactive mode. It is used for testing and analytical purposes in order to obtain fast results when performing simple applications.

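Because Python 2 and Python 3 are not compatible, a script can check which interpreter is running it. The following is a minimal sketch using the standard `sys` module; the error message is purely illustrative, and `str.format()` is used on purpose so that the file would still be parsed by an old interpreter before the check runs.

```python
import sys

# sys.version_info is a named tuple such as (3, 6, 6, 'final', 0)
major, minor = sys.version_info.major, sys.version_info.minor

if major < 3:
    # Illustrative guard: stop early when started with Python 2
    raise RuntimeError("This course material requires Python 3")

print("Running Python {}.{}".format(major, minor))
```
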
Software: IDEs

For everyday work with Python it would be extremely tedious to make all edits in interactive mode.

There are a number of excellent integrated development environments (IDEs) for Python, with two being emphasized here:

Jupyter (and IPython)
PyCharm (by JetBrains)

Of course, you can also use a simple text editor. However, you would probably miss the comfort of an IDE.

Installing, adding and maintaining Python is not trivial at the beginning. Therefore, as a beginner, you are well advised to download and install the Python distribution Anaconda. Bonus: Many standard packages are supplied directly or can conveniently be installed later.

Following this course

In this course, in a numerical and analytical context, we use only Jupyter with the IPython kernel.

That is why we have combined

1 all the code from the slides, and
2 all the exercises and solutions

into interactive Jupyter notebooks that you can use online without having to install software locally on your computer. The GWDG has set up a cloud-based Jupyter-Hub for you.

You can access the working environment with your university credentials at

https://jupyter.gwdg.de/

create a profile and get started right away, even using your smart devices. However, so far you are still asked to upload the course notebooks by yourself or rewrite the code from scratch.

Notebook workflow

A Jupyter notebook is divided into individual, vertically arranged cells, which can be executed separately:

[Figure: screenshot of a Jupyter notebook]

The notebook approach is not novel and comes from the field of computer algebra software.

Notebook workflow

Actually, an interactive Python interpreter called IPython is started “in the core”.

IPython running [command line]
ipython3 --version
## 6.5.0

Roughly speaking, this is a greatly enhanced version of the Python 3 interpreter, which has numerous convenient advantages over the “normal” interpreter in interactive mode, such as

printing of return values,
color highlighting, and
magic commands.

Following this course

Finally, we wish you a lot of fun and success with and in this course!

Practice makes perfect!

Contribution and credits:

Fabian H. C. Raters
Eike Manßen
GWDG for the Jupyter-Hub

Table of contents

1 Essential concepts
1.1 Getting started
1.2 Procedural programming
1.3 Object-orientation

2 Numerical programming
2.1 NumPy package
2.2 NumPy array
2.3 Linear Algebra

3 Data formats and handling
3.1 Pandas
3.2 Series
3.3 DataFrame
3.4 Import/Export data

4 Visual illustrations
4.1 Matplotlib
4.2 Figures and subplots
4.3 Plot types and styles
4.4 Pandas visualization

5 Applications
5.1 Time series
5.2 Moving window
5.3 Financial applications

Chapter 1

Essential concepts

1.1 Getting started
1.2 Procedural programming
1.3 Object-orientation

Section 1.1

Essential concepts: Getting started

Motivation for learning Python

Python can be described as

a dynamic, strongly typed, multi-paradigm and object-oriented programming language,
for versatile, powerful, elegant and clear programming,
with a general, high-level, multi-platform application scope,
which is being used very successfully in the data science sector and is very popular.

Moreover, Python is relatively easy to learn, and its successful language design supports everyone from novices to professional developers.

A short history of time

... of the Python era:

The language was first released in 1991 by Guido van Rossum. Its name was based on Monty Python’s Flying Circus. Its most distinctive feature is the marking of code blocks by indentation:

Indentation example
password = input("I am your bank. Password please: ")
## I am your bank. Password please: sparkasse
if password == "sparkasse":
    print("You successfully logged in!")
else:
    print("Fail. Will call the police!")
## You successfully logged in!

This increases the readability of code and should at the same time encourage the programmer to write tidy code. Since the source code can be written more compactly with Python, an increased efficiency in daily work can be expected.

A short history of time

Overview of the Python development by versions and dates:

[Figure: timeline from 1990 to 2020 marking Python’s birthday, Python 2.0, Python 3.0 and Python 3.6, with the annotations “Python 2.7 lives forever” and “Python 2.7 will die”]

In comparison

Comparing the way Python works with common programming languages, we briefly discuss a selection of popular competitors:

C/C++:
CPython is interpreted, not compiled.
C/C++ are statically typed, complex languages.

Java:
CPython is not compiled just-in-time.
Java has a C-type syntax.

MATLAB:
In Python you primarily follow a scalar way of thinking, while in MATLAB you write matrix-based programs.
In the numerical context, the matrix view and syntax are very similar to those of MATLAB.
MATLAB is partially compiled just-in-time.

Here, CPython is the reference implementation, the “Original Python”, which is itself implemented in C.

In comparison

R:
In Python you primarily follow a scalar way of thinking, while in R you write vector-based programs.
R has a C-type syntax with additions for novel language concepts.

Stata:
Any comparison would inadequately describe the differences.

Reference semantics

An extremely important difference: the first two languages, C/C++ and Java, as well as Python itself, follow call-by-reference semantics, while the last three languages, MATLAB, R and Stata, use call-by-copy semantics.

Further specific differences and similarities to MATLAB and R will be addressed in other parts of this course.

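What reference semantics means in Python can be tried with a mutable object such as a list. The sketch below is only an illustration with made-up names (`a`, `b`, `c`, `add_item`): an assignment or a function call passes on a reference to the same object, and a copy has to be requested explicitly.

```python
# Two names bound to the same list object
a = [1, 2, 3]
b = a              # no copy is made; b refers to the same object
b.append(4)

print(a)           # [1, 2, 3, 4], the change is visible through a
print(a is b)      # True

# An explicit copy creates an independent object
c = a.copy()
c.append(5)
print(a)           # [1, 2, 3, 4]
print(c)           # [1, 2, 3, 4, 5]

def add_item(items):
    # The function receives a reference to the caller's list
    items.append("new")

add_item(a)
print(a)           # [1, 2, 3, 4, 'new']
```
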
Versatility – diversity

Python has become extremely popular:

[Figure: see source]

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Versatility – diversity

So, you’re on the right track, because who wants to bet on the wrong hoRse?

[Figure: see source]

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Versatility – diversity

Areas in which Python is used with great success:

Scripts,
Console applications,
GUI applications,
Game development,
Website development, and
Numerical programming.

Places where Python is used:

[Figure not recoverable]

Yet another outline

In this course we will successively gain the following insights:

1 General basics of the language.
2 Numerical programming and handling of data sets.
3 Application to economic and analytical questions.

Section 1.2

Essential concepts: Procedural programming

The first program

Programs can be implemented very quickly; this is a pretty minimal example. You can write this command to a text file of your choice and run it directly on your system:

Hello there
print("Hello there!")
## Hello there!

The program consists of only one function, print() (highlighted here like a keyword),
The function displays its argument (a string) on screen,
Arguments are passed to the function in parentheses,
A string must be wrapped in " " or ' ',
No semicolon at the end.

User input

Let’s add a user input to the program:

Hello you
name = input("Please enter your name: ")
## Please enter your name: Angela Merkel
print("Hello " + name + "!")
## Hello Angela Merkel!

The function input() is used for interactive text input,
You can use the equal sign = to assign variables (here: name),
Strings can be joined by the (overloaded) operator +.

Determining weekdays

We are now trying to find out on which weekday a person was born (Merkel’s birthday is 17-07-1954):

Weekday of birth
from datetime import datetime
answer = input("Your birthday (DD-MM-YYYY): ")
## Your birthday (DD-MM-YYYY): 17-07-1954
birthday = datetime.strptime(answer, "%d-%m-%Y")
print("Your birthday was on a " + birthday.strftime("%A") + "!")
## Your birthday was on a Saturday!

It is really easy to import functionality from other modules,
The function strptime() is a method of the class datetime,
Both methods, strptime() and strftime(), are used to convert between strings and date time specifications.

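Both conversion directions can also be tried without keyboard input. The following is a small sketch that uses a fixed date string instead of input(); the names `iso_text` and `weekday_number` are illustrative. The numeric weekday is used here because the text produced by "%A" depends on the system's language settings.

```python
from datetime import datetime

# Parse a string into a datetime object (string -> datetime)
birthday = datetime.strptime("17-07-1954", "%d-%m-%Y")

# Format the datetime object back into a string (datetime -> string)
iso_text = birthday.strftime("%Y-%m-%d")

# weekday() counts from Monday = 0 to Sunday = 6
weekday_number = birthday.weekday()

print(iso_text)          # 1954-07-17
print(weekday_number)    # 5, i.e. a Saturday
```
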
Time since birth

And how many days have passed since then (until Merkel’s 4th swearing-in as Federal Chancellor)?

Age in days
someday = datetime.strptime("09-10-2018", "%d-%m-%Y")
print("You are " + str((someday - birthday).days) + " days old!")
## You are 23460 days old!

You can create time differences, i.e. the operator - is overloaded,
The difference represents a new object, with its own attributes, such as days,
When using the overloaded operator +, you have to explicitly convert the number of days by means of str() into a string.

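The object created by the subtraction is a timedelta from the standard datetime module. A brief sketch, reusing the two dates from the slides, shows its type, that it can be added to a date again, and what happens when str() is forgotten; the names `age`, `next_week` and `message` are illustrative.

```python
from datetime import datetime, timedelta

birthday = datetime.strptime("17-07-1954", "%d-%m-%Y")
someday = datetime.strptime("09-10-2018", "%d-%m-%Y")

# Subtracting two datetime objects yields a timedelta object
age = someday - birthday
print(type(age).__name__)      # timedelta
print(age.days)                # 23460

# A timedelta can in turn be added to a datetime
next_week = someday + timedelta(days=7)
print(next_week.strftime("%d-%m-%Y"))   # 16-10-2018

# Without str(), joining text and a number raises a TypeError
try:
    message = "You are " + age.days + " days old!"
except TypeError:
    message = "You are " + str(age.days) + " days old!"
print(message)                 # You are 23460 days old!
```
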
Time since birth

How many years, months and days do you think that is?

Human readable age
from dateutil.relativedelta import relativedelta
delta = relativedelta(someday, birthday)
print(f"That's {delta.years} years, {delta.months} months "
      f"and {delta.days} days!!")
## That's 64 years, 2 months and 22 days!!

You don’t have to keep reinventing the wheel: a wealth of packages and individual modules are freely available,
A lowercase f before "..." provides convenient formatting; there are other options as well,
Two strings in sequence are implicitly joined together: "That" "'s nice"!

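The "other options" for formatting can be compared side by side. The sketch below hard-codes the three numbers from the output above, so that it runs without the dateutil package; the variable names are illustrative.

```python
years, months, days = 64, 2, 22   # values taken from the output above

# f-string (available since Python 3.6)
text_f = f"That's {years} years, {months} months and {days} days!"

# str.format() with positional placeholders
text_format = "That's {} years, {} months and {} days!".format(years, months, days)

# printf-style formatting with the % operator
text_percent = "That's %d years, %d months and %d days!" % (years, months, days)

# Adjacent string literals are joined into one string
joined = "That" "'s nice"

print(text_f)
print(text_f == text_format == text_percent)   # True
print(joined)                                   # That's nice
```
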
Getting help

When working with the interactive interpreter, i.e. in a notebook, you can quickly get useful information about Python objects:

Help system
help(len)
## Help on built-in function len in module builtins:
##
## len(obj, /)
##     Return the number of items in a container.

Alternatively, e.g. for more complex problems, it is best to search directly with your preferred internet search engine.

You can find neat solutions to conventional challenges in literature.

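Besides help(), two built-in tools are handy for exploring objects: the `__doc__` attribute, which holds the documentation text, and dir(), which lists the names an object offers. A minimal sketch; the list name `string_methods` is illustrative, and the exact number of methods depends on the Python version.

```python
# The text shown by help() comes from the object's docstring
print(len.__doc__)

# dir() lists the attribute and method names of an object
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])

# Each of these methods can be looked up, e.g. with help(str.upper)
print("upper" in string_methods)   # True
```
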
Lexical structure

As with natural language, programming languages have a lexical structure. Source code consists of the smallest possible, indivisible elements, the tokens. In Python you can find the following groups of elements:

Literals
Variables
Operators
Delimiters
Keywords
Comments

These terms give us a rock-solid foundation for exploring the heart of a programming language.

Literals and variables

Basically, we distinguish between literals and variables:

Assigning variables with literals
myint = 7
myfloat = 4.0
myboat = "nice"
mybool = True
myfloat = myboat

In this course, we will work with four different literals: integer (7), float (4.0), string ("nice") and boolean (True),
Literals are assigned to variables at runtime,
In Python the data type is derived from the literal and does not have to be described explicitly,
It is allowed to assign values of different data types to the same variable (name) sequentially,
If we do not assign a literal to a variable, it is lost.

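How the data type follows the assigned literal can be checked with the built-in type(). The following sketch rebinds one illustrative name, `myvar`, to the four literals from above and then shows that Python does not silently convert between str and int.

```python
type_names = []
for value in (7, 4.0, "nice", True):
    myvar = value                          # the same name, rebound each time
    type_names.append(type(myvar).__name__)

print(type_names)    # ['int', 'float', 'str', 'bool']

# Strong typing: text and number are not combined automatically
try:
    result = "nice" + 7
except TypeError as error:
    result = None
    print("TypeError:", error)
```
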
Operators and delimiters

Most operators and delimiters will be introduced to you during this course. Here is an overview of the operators:

Overview of operators
## +    -    *    /    **   //
## %    <<   >>   &    |    ^
## ~    and  or   not  in   not in
## is   is not    <    >    !=
## ==   <=   >=

An overview of the delimiters follows:

Overview of delimiters
## (    )    [    ]    {    }
## ,    :    .    =    ;
## +=   -=   *=   /=   **=  //=
## %=   &=   |=   ^=   >>=  <<=
## '    "    \    @    SPACE

The Python 2 operator <> and the backtick ` are not part of Python 3 and are therefore not listed.

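A few of the less familiar symbols from the overview can be tried directly. This is only a small selection with illustrative values, not a complete tour of all operators.

```python
print(7 // 2)    # 3, floor division
print(7 % 2)     # 1, remainder (modulo)
print(-7 // 2)   # -4, floor division rounds towards minus infinity
print(1 << 3)    # 8, shift bits to the left
print(6 & 3)     # 2, bitwise and (110 & 011 = 010)
print(6 | 3)     # 7, bitwise or  (110 | 011 = 111)
print(6 ^ 3)     # 5, bitwise xor (110 ^ 011 = 101)
print(~6)        # -7, bitwise inversion

# Augmented assignments combine an operator with =
counter = 10
counter += 5     # same as counter = counter + 5
counter //= 4    # same as counter = counter // 4
print(counter)   # 3

# Membership tests and chained comparisons
stocks = ["Google", "Amazon"]
print("Amazon" in stocks)       # True
print("Apple" not in stocks)    # True
print(1 < 2 <= 2 != 3)          # True
```
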
Arithmetic operators

All regular arithmetic operations involving numbers are possible:

Pocket calculator
10 + 5
100 - 20
8 / 2
4 * (10 + 20)
2**3
## 15
## 80
## 4.0
## 120
## 8

The result of dividing two integers is a floating point number,
The conventional rules apply: Parentheses first, then multiplication and division, etc.,
The operator ** is used for exponentiation.

Keywords and comments

The programmer explains the structure of their program to the interpreter via a restricted set of short commands, the keywords:

Overview of keywords
## and       as        assert    break     class     continue
## def       del       elif      else      except    False
## finally   for       from      global    if        import
## in        is        lambda    None      nonlocal  not
## or        pass      raise     return    True      try
## while     with      yield

There are two ways to make comments:

Give some comments
# Set variable to something - or nothing?
something = None

"""
I am a docstring!
A multiline string comment hybrid.
I will be useful for describing classes and methods.
"""

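The keyword list of the running interpreter can be queried with the standard `keyword` module, and a docstring becomes available as the `__doc__` attribute of the object it describes. In the sketch below the function `greet` is a made-up example; note that the number of keywords differs slightly between Python versions.

```python
import keyword

# The standard library knows the keywords of the running interpreter
print(len(keyword.kwlist))
print(keyword.iskeyword("lambda"))   # True
print(keyword.iskeyword("print"))    # False, print() is a built-in function

def greet(name):
    """Return a greeting for the given name."""
    # A regular comment: ignored by the interpreter
    return "Hello " + name + "!"

# The docstring is stored with the function object
print(greet.__doc__)      # Return a greeting for the given name.
print(greet("there"))     # Hello there!
```
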
Logical operations

We can create a handy table summarizing some results demonstrating the use of logical operators (and formatted strings and for-loops):

Logical table
# Create table head
print("a   b  a and b   a or b   not a\n"
      "--------------------------------")
# Loop through the rows
for a in [False, True]:
    for b in [False, True]:
        print(f"{a:1} {b:3} {a and b:6} {a or b:8} {not a:7}")
## a   b  a and b   a or b   not a
## --------------------------------
## 0   0      0        0       1
## 0   1      0        1       1
## 1   0      0        1       0
## 1   1      1        1       0

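Two details behind the logical table are worth trying out: `and` and `or` evaluate their right-hand side only when needed and return one of their operands, and bool values behave like the integers 0 and 1. The helper `noisy` and the list `calls` in this sketch are hypothetical names used only to record which operands get evaluated.

```python
# and/or return one of their operands, not necessarily True or False
print(0 and "never reached")    # 0
print("" or "default")          # default
print(not "text")               # False, not always returns a bool

calls = []

def noisy(value):
    # Hypothetical helper that records when it is evaluated
    calls.append(value)
    return value

# The second call is skipped because the result is already decided
result = noisy(False) and noisy(True)
print(result, calls)            # False [False]

# bool is a subclass of int, which is why {a:1} prints 0 or 1
print(isinstance(True, int), True + True)   # True 2
```
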
Data types

Python offers the following basic data types, which we will use in this course:

Data type   Description
int()       Integers
float()     Floating point numbers
str()       Strings, i.e. Unicode texts
bool()      Boolean, i.e. True or False
list()      List, an ordered array of objects
tuple()     Tuple, an ordered, immutable array of objects
dict()      Dictionary, an associative array of objects (treated as unordered here; insertion-ordered from Python 3.7 on)
set()       Set, an unordered collection of objects without duplicates
None        Nothing, emptiness, the void...

Each data type has its own methods, that is, functions that are applicable specifically to an object of this type.

You will gradually get to know new and more complex data types or object classes.

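The type of any object can be inspected with type(), and the methods mentioned above are called with dot notation on the object itself. A short sketch with one illustrative example value per data type:

```python
examples = [7, 4.0, "nice", True, [1, 2], (1, 2), {"a": 1}, {1, 2}, None]

# Show the type name next to each example value
for value in examples:
    print(type(value).__name__, repr(value))

type_names = [type(value).__name__ for value in examples]

# Methods belong to the data type of the object
print("nice".upper())          # NICE
print((4.0).is_integer())      # True
print([3, 1, 2].count(3))      # 1
```
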
Lists

A list is an ordered array of objects, accessible via an index:

Listing tech companies
stocks = ["Google", "Amazon", "Facebook", "Apple"]
stocks[1]
stocks.append("Twitter")
stocks.insert(2, "Microsoft")
stocks.sort()
## ['Google', 'Amazon', 'Facebook', 'Apple']
## Amazon
## ['Google', 'Amazon', 'Facebook', 'Apple', 'Twitter']
## ['Google', 'Amazon', 'Microsoft', 'Facebook', 'Apple', 'Twitter']
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft', 'Twitter']

The constructor for new lists is [ ],
The first element has the index 0,
The data type list() possesses its own methods.

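List methods such as sort() change the list itself and return None, which is a common source of surprises. The sketch below contrasts this with the built-in sorted(), which returns a new list; the names `ordered`, `result` and `last` are illustrative.

```python
stocks = ["Google", "Amazon", "Facebook", "Apple"]

# sorted() returns a new list and leaves the original untouched
ordered = sorted(stocks)
print(stocks[0], ordered[0])    # Google Amazon

# list.sort() sorts in place and returns None
result = stocks.sort()
print(result)                   # None
print(stocks == ordered)        # True

# Further in-place methods
last = stocks.pop()             # removes and returns the last element
stocks.remove("Apple")          # removes the first matching element
print(last, stocks)             # Google ['Amazon', 'Facebook']
```
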
Tuples

Tuples are immutable sequences related to lists; for example, they cannot be extended. The loss of flexibility is compensated by advantages in speed and memory usage:

Selecting elements in sequences
lottery = (1, 8, 9, 12, 24, 28)
len(lottery)
lottery[1:3]
lottery[:4]
lottery[-1]
lottery[-2:]

## (1, 8, 9, 12, 24, 28)
## 6
## (8, 9)
## (1, 8, 9, 12)
## 28
## (24, 28)

The same operations are also supported when using lists.

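The following is a minimal sketch of the difference between the two sequence types: slicing works identically on lists and tuples, while an attempt to modify a tuple fails. The variable names are illustrative and not taken from the lecture.

```python
lottery_tuple = (1, 8, 9, 12, 24, 28)
lottery_list = list(lottery_tuple)  # same elements, but mutable

# Slicing behaves the same and preserves the sequence type.
tuple_slice = lottery_tuple[1:3]
list_slice = lottery_list[1:3]

# Lists can be changed in place ...
lottery_list[0] = 2

# ... whereas tuples reject item assignment with a TypeError.
try:
    lottery_tuple[0] = 2
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(tuple_slice, list_slice, tuple_changed)
```
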
Dictionaries

Dictionaries are associative collections of key-value pairs. The key must be hashable (in practice, immutable) and unique:

Internet slang dictionary
slang = {"imho": "in my humble opinion",
         "lol": "laughing out loud",
         "tl;dr": "too long; didn't read"}
slang["lol"]
slang["gl&hf"] = "good luck & have fun"
slang.keys()
slang.values()

## {'imho': 'in...ion', 'lol': 'la...oud', 'tl;dr': 'to...ead'}
## laughing out loud
## good luck & have fun
## dict_keys(['imho', 'lol', 'tl;dr', 'gl&hf'])
## dict_values([... & have fun'])

The constructor for dict() is { } with :,
The pairs are iterable; since Python 3.7 they keep their insertion order.

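As a small, hedged illustration of the points above, the sketch below iterates over key-value pairs, reads a missing key safely, and shows why a key must be hashable. The extra keys and the default string are made up for this example.

```python
slang = {"imho": "in my humble opinion",
         "lol": "laughing out loud"}

# Iterating over items() yields (key, value) pairs.
lines = [f"{key}: {value}" for key, value in slang.items()]

# get() returns a default instead of raising KeyError for a missing key.
unknown = slang.get("afk", "not in dictionary")

# Keys must be hashable: a tuple works as a key, a list does not.
slang[("tl", "dr")] = "too long; didn't read"
try:
    slang[["tl", "dr"]] = "too long; didn't read"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(lines, unknown, list_key_allowed)
```
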
Sets

A set is an unordered collection of objects without duplicates:

Set operations
x = {"o", "n", "y", "t"}
y = {"p", "h", "o", "n"}
x & y
x | y
x - y

## {'y', 'o', 'n', 't'}
## {'h', 'o', 'p', 'n'}
## {'o', 'n'}
## {'y', 'n', 'h', 'o', 'p', 't'}
## {'y', 't'}

The constructor for set() is { },
Defines its own operators that overload existing ones.
Empty set via set(), because {} already creates dict().

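A short sketch of the last two remarks, under the assumption that the reader wants to verify them interactively: it checks the type created by empty braces and compares each set operator with its equivalent method.

```python
empty_dict = {}       # empty braces create a dictionary
empty_set = set()     # the only way to write an empty set

x = {"o", "n", "y", "t"}
y = {"p", "h", "o", "n"}

# Each overloaded operator has an equivalent method.
same_intersection = (x & y) == x.intersection(y)
same_union = (x | y) == x.union(y)
same_difference = (x - y) == x.difference(y)

# Duplicates are dropped on construction.
letters = set("python" + "python")

print(type(empty_dict), type(empty_set), len(letters))
```
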
Control flow: Conditional statements

Python has only one kind of conditional statement, if-elif-else:

Computer data sizes
bytes = 100000000 / 8  # e.g. DSL 100000
if bytes >= 1e9:
    print(f"{bytes/1e9:6.2f} GByte")
elif bytes >= 1e6:
    print(f"{bytes/1e6:6.2f} MByte")
elif bytes >= 1e3:
    print(f"{bytes/1e3:6.2f} KByte")
else:
    print(f"{bytes:6.2f} Byte")

## 12.50 MByte

Control flow structures may be nested in any order:

Nestings
if a > 1:
    if b > 2:
        pass  # special keyword for empty blocks

Control flow: for loop

Python has two conventional loop constructs. The first is for-in-else:

Total sum
numbers = [7, 3, 4, 5, 6, 15]
y = 0
for i in numbers:
    y += i
print(f"The sum of 'numbers' is {y}.")

## The sum of 'numbers' is 40.

Lists or other collections can also be created dynamically:

Powers of 2
powers = [2 ** i for i in range(11)]
teacher = ["***", "**", "*"]
grades = {star: len(teacher) - len(star) + 1 for star in teacher}

## [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
## {'***': 1, '**': 2, '*': 3}

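To make the dynamic creation of collections more concrete, here is a hedged sketch that builds the same list once with an explicit loop and once with a comprehension. The filtered example at the end is an addition for illustration only.

```python
# Explicit loop
powers_loop = []
for i in range(11):
    powers_loop.append(2 ** i)

# Equivalent list comprehension
powers_comp = [2 ** i for i in range(11)]

# A comprehension may also filter with a trailing if clause.
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]

print(powers_loop == powers_comp, even_squares)
```
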
Control flow: continue and break

Loops can skip iterations (continue):

Continue the loop
for x in ["a", "b", "c"]:
    a = x.upper()
    continue
    print(x)
print(a)

## C

Or a loop can be aborted instantly (break):

Breaking the habit
y = 0
for i in [7, 3, 4, "x", 6, 15]:
    if not isinstance(i, int):
        break
    y += i
print(f"The total sum is {y}.")

## The total sum is 14.

Control flow: while loop

For loops where the number of iterations is not known at the beginning, you use while-else.

Have you already noticed the keyword else? Python executes the else branch only if the loop was not terminated by break:

Favorite lottery number
import random
n = 0
favorite = 7
while n < 100:
    n += 1
    draw = random.randint(1, 49)  # e.g. German lottery
    if draw == favorite:
        print("Got my number! :)")
        break
else:
    print("My favorite did not show up! :(")
print(f"I tried {n} times!")

## Got my number! :)
## I tried 15 times!

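Because the lottery example is random, both outcomes of the else branch are hard to reproduce. The sketch below is a deterministic variant; the function name `find_first_multiple` and its arguments are hypothetical and serve only to show when the else branch runs.

```python
def find_first_multiple(values, divisor):
    """Return a message describing the first multiple of divisor in values."""
    index = 0
    while index < len(values):
        if values[index] % divisor == 0:
            message = f"found {values[index]}"
            break  # leaving via break skips the else branch
        index += 1
    else:
        # Runs only when the loop condition became False without a break.
        message = "nothing found"
    return message

hit = find_first_multiple([7, 3, 4, 5], 2)
miss = find_first_multiple([7, 3, 5], 2)
print(hit, "|", miss)
```
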
Functions

Functions are defined using the keyword def. The structure of function signature and body is specified by indentation, too:

Drawing lottery numbers
import random

def draw_sample(n, first=1, last=49):
    numbers = list(range(first, last + 1))
    sample = []
    for i in range(n):
        ind = random.randint(0, len(numbers) - 1)
        sample.append(numbers.pop(ind))
    sample.sort()
    return sample
draw_sample(6)
draw_sample(6, 80, 100)
draw_sample(3, first=5)

## [26, 31, 33, 36, 41, 49]
## [83, 84, 85, 87, 91, 92]
## [18, 25, 42]

Functions

Functions are callable objects, are defined as closures, and can be created and used like other objects:

Prime numbers
def primes(n):
    numbers = [2]
    def is_prime(num):
        for i in numbers:
            if num % i == 0:
                return False
        return True
    if n < 2:  # there are no primes below 2
        return []
    for i in range(3, n + 1):
        if is_prime(i):
            numbers.append(i)
    return numbers
primes(50)

## [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Seems weird? We discuss namespaces in the next section.

Section 1.3

Essential concepts
Object-orientation

Python is object-oriented

There are three widely known programming paradigms: procedural, functional and object-oriented programming (OOP). Python supports them all.

You have learned how to handle predefined data types in Python. Actually, we have already encountered classes and instances; take dict(), for example.

In this section you will learn the basics of dealing with (your own) classes:

1 References
2 Classes
3 Instances
4 Main principles
5 Garbage collection

OOP is a wide field and challenging for beginners. Don't get discouraged and, if you find gaps in your understanding, consult the literature.

References

When you assign a variable, a reference to an object is set:

Equal but not identical
a = ["Star", "Trek"]
b = ["Star", "Trek"]
c = a
a == b
a == c
a is b
a is c

## ['Star', 'Trek']
## ['Star', 'Trek']
## ['Star', 'Trek']
## True
## True
## False
## True

Two equal but not identical objects are created,
Variables a and c link to the same object.

Copying objects

When we introduced lists, we initially did not mention that they are a prime example of mutable objects:

Collecting grades
grades = [1.7, 1.3, 2.7, 2.0]
result = grades.append(1.0)
result
grades
finals = grades
finals.remove(2.7)
finals
grades

## None
## [1.7, 1.3, 2.7, 2.0, 1.0]
## [1.7, 1.3, 2.0, 1.0]
## [1.7, 1.3, 2.0, 1.0]

Modifications can be in-place: the object itself is modified.
Changing an object that is referenced several times could cause (un)intended consequences.

Side effects

In Python, arguments are passed by assignment (call by object reference): the function receives a reference to the caller's object, not a copy:

Side effects
def last_element(x):
    return x.pop(-1)  # removes and returns the last element in place
a = stocks
last_element(a)
a

## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft', 'Twitter']
## Twitter
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft']

There are side effects,
Referenced mutable objects might be modified,
Referenced immutable objects cannot be modified; operations on them create new objects.

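The contrast between mutable and immutable arguments can be sketched as follows. The helper names `add_item` and `add_one` and the sample data are assumptions made for this illustration.

```python
def add_item(container, item):
    # += on a list extends the object in place: visible to the caller.
    container += [item]
    return container

def add_one(number):
    # += on an int creates a new object and rebinds the local name only.
    number += 1
    return number

portfolio = ["Amazon", "Apple"]
count = 2

add_item(portfolio, "Google")
add_one(count)

print(portfolio, count)
```

After both calls the list has grown, whereas the integer bound to `count` is unchanged.
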
Copying objects

We are able to make an exact copy of the object:

Copying
def last_element(x):
    y = x.copy()
    return y.pop(-1)
a = stocks
last_element(a)
a

## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft']
## Microsoft
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft']

We receive a new object,
The new object is not identical to the old one.

Deep and shallow copying

However, keep in mind that, in most cases, a method copy() creates shallow copies; only deep copying also duplicates the contents of a mutable object with a complex structure:

Cloning fast food
fastfood = [["burgers", "hot dogs"], ["pizza", "pasta"]]
italian = fastfood.copy()
italian.pop(0)
american = list(fastfood)
american.pop(1)
american[0] = american[0].copy()
fastfood[0][1] = "chicken wings"
fastfood[1][0] = "risotto"
italian
american

## [['risotto', 'pasta']]
## [['burgers', 'hot dogs']]

Both approaches, copy() and list(), create new list objects containing new references to the original sub-lists. But for a deep copy, you have to recursively create duplicates of all its objects.

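The recursive duplication does not have to be written by hand: the standard library module copy provides copy() for shallow and deepcopy() for deep copies. The following sketch reuses the fast food data; the variable names shallow and deep are chosen for this example.

```python
import copy

fastfood = [["burgers", "hot dogs"], ["pizza", "pasta"]]

shallow = copy.copy(fastfood)    # new outer list, shared sub-lists
deep = copy.deepcopy(fastfood)   # new outer list and new sub-lists

fastfood[1][0] = "risotto"

shares_sublist = shallow[1] is fastfood[1]
deep_is_independent = deep[1] is not fastfood[1]

print(shallow[1], deep[1])
```

The shallow copy sees the change to the sub-list, the deep copy does not.
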
Classes

In Python everything is an object and more complex objects consist of several other objects.

In OOP, we create objects according to patterns. These kinds of blueprints are called classes and are characterized by two categories of elements:

Attributes:
Variables that represent the properties of
    an object, named object attributes, or
    a class, named class attributes.

Methods:
Functions that are defined within a class:
    (non-static) methods can access all attributes, while
    static methods are not bound to an instance and can only access class attributes (via the class name).

Every generated object is an instance of such a construction plan.

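A minimal sketch of these two categories, using a hypothetical Account class that is not part of the lecture: interest_rate is a class attribute, balance an object attribute, interest() a regular method and describe() a static method.

```python
class Account:
    interest_rate = 0.02          # class attribute, shared by all instances

    def __init__(self, balance):
        self.balance = balance    # object attribute, one per instance

    def interest(self):
        # Regular method: reaches object and class attributes via self.
        return self.balance * self.interest_rate

    @staticmethod
    def describe():
        # Static method: no self, so only the class name is available.
        return f"Accounts earn {Account.interest_rate:.0%} interest."

first = Account(100)
second = Account(200)

before = first.interest()
Account.interest_rate = 0.03      # changes the shared class attribute
after = (first.interest(), second.interest())

print(before, after, Account.describe())
```

Changing the class attribute affects every instance that has not overridden it with an object attribute of the same name.
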
Class definition

Specifically, we want to create a “rectangle object” and define a separate Rectangle class for it:

Rectangle class
class Rectangle:
    width = 0
    height = 0

    def area(self):
        return self.width * self.height
myrectangle = Rectangle()
myrectangle.width = 10
myrectangle.height = 20
print(myrectangle.area())

## 200

New classes are defined using the keyword class,
The variable self always refers to the instance itself.

Class constructor

We add a constructor (method) __init__(), that is called to initialize an object of Rectangle:

Rectangle class with constructor
class Rectangle:
    width = 0
    height = 0

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height
myrectangle = Rectangle(15, 30)
print(myrectangle.area())

## 450

In our example, we use the constructor to set the attributes. Methods with names matching __fun__() have a special, standardized meaning in Python.

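To illustrate what "special, standardized meaning" can look like in practice, the sketch below extends the Rectangle class with two further double-underscore methods, __repr__() and __eq__(). This extended variant is an assumption for demonstration and not the lecture's definition.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def __repr__(self):
        # Used by repr() and by the interactive prompt.
        return f"Rectangle({self.width}, {self.height})"

    def __eq__(self, other):
        # Used by the == operator.
        if not isinstance(other, Rectangle):
            return NotImplemented
        return (self.width, self.height) == (other.width, other.height)

first = Rectangle(15, 30)
second = Rectangle(15, 30)

print(repr(first), first == second, first is second)
```

With __eq__() defined, two rectangles of the same size compare equal although they remain distinct objects.
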
Class inheritance

One of the most important concepts of OOP is inheritance. A class inherits all attributes and methods of its parent class and can add new ones or override existing ones:

Square inherits Rectangle
class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

    def diagonal(self):
        return (self.width**2 + self.height**2)**0.5
mysquare = Square(15)
print(f"Area: {mysquare.area()}")
print(f"Diagonal length: {mysquare.diagonal():7.4f}")

## Area: 225
## Diagonal length: 21.2132

The methods of the parent class, including the constructor, may be referenced by super().

Garbage collection

You do not have to worry about memory management in Python. The garbage collector will tidy up for you.

If there are no more references to an object, it is automatically disposed of by the garbage collector:

Garbage collection in action
class Dog:
    def __del__(self):
        print("Woof! The dogcatcher got me! Entering the void.. :(")

# My old dog on a leash
mydog = Dog()
# A new dog is born
newdog = Dog()
# Using my leash for the new dog
mydog = newdog

## Woof! The dogcatcher got me! Entering the void.. :(

The destructor __del__() is executed as the last act before an object gets deleted.

Namespaces

We have already come into contact with namespaces in Python many times. These are hierarchically linked layers in which the references to objects are defined. A rough distinction is made between

the global namespace, and
the local namespace.

The global namespace is the outermost environment whose references are known by all objects.

On the other hand, locally defined references are only known in a local, i.e. internal environment.

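A small sketch of the two layers, with made-up names rate and with_interest: the function can read a reference from the global namespace, while a reference created inside the function is unknown outside of it.

```python
rate = 0.05            # defined in the global namespace

def with_interest(amount):
    factor = 1 + rate  # 'rate' is found globally, 'factor' is local
    return amount * factor

result = with_interest(100)

# 'factor' was only known inside the function.
try:
    factor
    factor_visible = True
except NameError:
    factor_visible = False

print(result, factor_visible)
```
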
Namespaces

Reference names from the local namespace mask the same names in an outer or in the global namespace:

Namespaces
def multiplier(x):
    x = 4 * x
    return x
x = "OH"
multiplier("AH")
multiplier(x)
x

## OH
## AHAHAHAH
## OHOHOHOH
## OH

Namespaces

In fact, functions defined in Python are themselves objects that remember and can access their own context where they were created. This concept comes from functional programming and is called closure:

Closures
def gen_multiplier(a):
    def fun(x):
        return a * x
    return fun
multi1 = gen_multiplier(4)
multi2 = gen_multiplier(5)
multi1
multi1("EH")
multi2("EH")

## <function gen_multiplier.<locals>.fun at 0x7f042eaa6048>
## EHEHEHEH
## EHEHEHEHEH

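A closure can also change the remembered context, provided the name is declared with the keyword nonlocal. The counter below is a hedged illustration; make_counter and step are hypothetical names.

```python
def make_counter(start=0):
    count = start

    def step():
        nonlocal count   # rebind the variable of the enclosing function
        count += 1
        return count

    return step

counter_a = make_counter()
counter_b = make_counter(10)

# Each closure keeps its own, independent context.
values = [counter_a(), counter_a(), counter_b()]
print(values)
```

Without the nonlocal declaration, the assignment inside step() would create a new local name and raise an UnboundLocalError.
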
Managing code

In order to provide, maintain and extend modular functionality with Python, its code-containing components can be described hierarchically:

Packages
Modules
Classes
Functions

The organization in Python is very straightforward and is based on the local namespaces mentioned before.

When you install and import new packages, such as NumPy for numerical programming in the next chapter, the packages are loaded and the namespaces initialized.

The development of custom packages is an advanced topic and, unlike in other programming languages, not essential for a reasonable code structure of small projects.

Modules

Modules provide classes and functions via namespaces. A module is Python code that is executed in a local namespace and whose classes and functions you can import. Basically, there are the following alternatives for importing from a module:

Import statements
import datetime as dt
from datetime import date, timedelta
from datetime import *
dt.date.today()
dt.timedelta.days
date.today()
timedelta.days
datetime.now()

In the last case, all public names of the datetime module are imported into the current namespace.

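As a brief sketch of how the first two import styles relate, the example below shows that both give access to the very same class object and then uses the imported classes. The example date is arbitrary.

```python
import datetime as dt
from datetime import date, timedelta

# Both names refer to the very same class object.
same_class = dt.date is date

# timedelta instances carry the attribute 'days'.
one_week = timedelta(weeks=1)
start = date(2018, 10, 8)
end = start + one_week

print(same_class, one_week.days, end)
```
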
The Zen of Python

import this

## The Zen of Python, by Tim Peters
##
## Beautiful is better than ugly.
## Explicit is better than implicit.
## Simple is better than complex.
## Complex is better than complicated.
## Flat is better than nested.
## Sparse is better than dense.
## Readability counts.
## Special cases aren't special enough to break the rules.
## Although practicality beats purity.
## Errors should never pass silently.
## Unless explicitly silenced.
## In the face of ambiguity, refuse the temptation to guess.
## ...

Further topics

A selection of exciting topics that are among the advanced basics but are not covered in this lecture:

Dynamic language concepts, such as duck typing,
Further, complex type classes, such as ChainMap or OrderedDict,
Iterators and generators in detail,
Exception handling, raising exceptions, catching errors,
Debugging, introspection and annotations.

Chapter 2

Numerical programming
2.1 NumPy package
2.2 NumPy array
2.3 Linear Algebra

Section 2.1

Numerical programming
NumPy package

The NumPy package

The Numerical Python package NumPy provides efficient tools for scientific computing and data analysis:

np.array(): multidimensional array capable of doing fast and efficient computations,
Built-in mathematical functions on arrays without writing loops,
Built-in linear algebra functions.

Import NumPy
import numpy as np

Motivation

Element-wise addition
vec1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
vec2 = np.array(vec1)
print(vec1 + vec1)

## [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(vec2 + vec2)

## [ 2  4  6  8 10 12 14 16 18]

for i in range(len(vec1)):
    vec1[i] += vec1[i]
print(vec1)

## [2, 4, 6, 8, 10, 12, 14, 16, 18]

Motivation

Matrix multiplication
mat1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat2 = np.array(mat1)
print(np.dot(mat2, mat2))

## [[ 30  36  42]
##  [ 66  81  96]
##  [102 126 150]]

mat3 = np.zeros([3, 3])
for i in range(3):
    for k in range(3):
        for j in range(3):
            mat3[i][k] = mat3[i][k] + mat1[i][j] * mat1[j][k]
print(mat3)

## [[ 30.  36.  42.]
##  [ 66.  81.  96.]
##  [102. 126. 150.]]

Motivation

Time comparison
import time
mat1 = np.random.rand(50, 50)
mat2 = np.array(mat1)
# perf_counter() has a finer resolution than time(), so nptime cannot round to 0
t = time.perf_counter()
mat3 = np.dot(mat2, mat2)
nptime = time.perf_counter() - t
mat3 = np.zeros([50, 50])
t = time.perf_counter()
for i in range(50):
    for k in range(50):
        for j in range(50):
            mat3[i][k] = mat3[i][k] + mat1[i][j] * mat1[j][k]
pytime = time.perf_counter() - t
times = str(pytime / nptime)
print("NumPy is " + times + " times faster!")

## NumPy is 35.166825796371846 times faster!

Section 2.2

Numerical programming
NumPy array

Creating NumPy arrays

np.array(list): converts a Python list into a NumPy array.
array.ndim: returns the number of dimensions of the array.
array.shape: returns the shape of the array as a tuple.

Creation
arr1 = [4, 8, 2]
arr1 = np.array(arr1)
arr2 = np.array([24.3, 0., 8.9, 4.4, 1.65, 45])
arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
print(arr1.ndim)

## 1

print(arr3.shape)

## (3, 3)

From now on, the name array refers to an np.array().

Array creation functions

np.arange(start, stop, step): array of evenly spaced values from start up to, but not including, stop.
np.zeros((rows, columns)): array with all values set to 0.
np.identity(dimension): identity matrix of a certain dimension.

Creation functions
print(np.zeros((4, 3)))

## [[0. 0. 0.]
##  [0. 0. 0.]
##  [0. 0. 0.]
##  [0. 0. 0.]]

print(np.arange(6))

## [0 1 2 3 4 5]

print(np.identity(3))

## [[1. 0. 0.]
##  [0. 1. 0.]
##  [0. 0. 1.]]

Array creation functions

np.linspace(start, stop, n): array of n evenly spaced values from start to stop (both included).
np.full((rows, columns), k): array with all values set to k.

Array creation
print(np.linspace(0, 80, 5))

## [ 0. 20. 40. 60. 80.]

print(np.full((5, 4), 7))

## [[7 7 7 7]
##  [7 7 7 7]
##  [7 7 7 7]
##  [7 7 7 7]
##  [7 7 7 7]]

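The difference between arange and linspace is easy to overlook, so here is a hedged side-by-side sketch. It assumes NumPy is installed; the variable names a to d are arbitrary.

```python
import numpy as np

# arange: step size given, stop value excluded.
a = np.arange(0, 80, 20)

# linspace: number of points given, stop value included by default.
b = np.linspace(0, 80, 5)

# full infers the dtype from the fill value unless dtype is passed.
c = np.full((2, 3), 7)
d = np.full((2, 3), 7, dtype=float)

print(a, b, c.dtype, d.dtype)
```
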
Array creation functions

np.random.rand(rows, columns): array of random floats from the interval [0, 1).
np.random.randint(k, size=(rows, columns)): array of random integers between 0 and k-1.

Array of random numbers
print(np.random.rand(3, 3))

## [[0.65417053 0.63215654 0.72761157]
##  [0.30757468 0.64874108 0.69997956]
##  [0.74054193 0.57131055 0.77555459]]

print(np.random.randint(10, size=(5, 4)))

## [[4 2 0 4]
##  [3 1 0 9]
##  [9 6 0 0]
##  [3 8 1 9]
##  [9 7 6 7]]

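Random arrays differ on every run, which makes results hard to reproduce. One common remedy, sketched here under the assumption that the legacy global generator is used as in the examples above, is to fix the seed with np.random.seed(); the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)                      # fix the state of the generator
first = np.random.rand(3, 3)

np.random.seed(42)                      # same seed, same sequence
second = np.random.rand(3, 3)

draws = np.random.randint(10, size=(5, 4))

print(np.array_equal(first, second), draws.min(), draws.max())
```
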
Copy arrays

Reference
print(arr3)

## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr = arr3
arr[1, 1] = 777
print(arr3)

## [[  4   8   5]
##  [  9 777   4]
##  [  1   0   6]]

arr3[1, 1] = 3

Assignment by reference: arr = arr3 binds the name arr to the existing array arr3. Both names refer to the same object.

Copy array 80
array.copy(): returns an independent copy of the array; the copy shares no data with the original (call-by-value).

Copy
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr = arr3.copy()
arr[1, 1] = 777
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

Reference
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr = arr3
arr[1, 1] = 777
print(arr3)
## [[  4   8   5]
##  [  9 777   4]
##  [  1   0   6]]

arr3[1, 1] = 3

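The difference between a second name and a real copy can be checked directly. The following is a minimal sketch; the names alias and copied are chosen for illustration only, and np.shares_memory is used as the test for shared data.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

alias = arr3           # second name for the same array object
copied = arr3.copy()   # independent array with its own data

print(alias is arr3)                    # same object
print(np.shares_memory(arr3, copied))   # no shared data

alias[1, 1] = 777                       # visible through arr3, not through copied
print(arr3[1, 1], copied[1, 1])
```

Writing through alias changes arr3, while copied keeps the original value.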
Overview: array creation functions 81
Function                     Description
array                        Converts input data to a NumPy array
arange(start, stop, step)    Creates array of evenly spaced values from start up to (excluding) stop
ones                         Creates array containing only ones
zeros                        Creates array containing only zeros
empty                        Allocates memory without initializing the values
eye, identity                Creates N x N identity matrix
linspace                     Creates array of evenly spaced values over an interval
full                         Creates array with all values set to one number
random.rand                  Creates array of random floats
random.randint               Creates array of random integers

Data types of arrays 82
array.dtype: data type of the array elements.
array.astype(np.type): manual typecast; returns a new array of the given type.

Data types
print(arr1.dtype)
## int64

print(arr2.dtype)
## float64

arr1 = arr1 * 2.5
print(arr1.dtype)
## float64

arr1 = (arr1 / 2.5).astype(np.int64)
print(arr1.dtype)
## int64

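A typecast from float to int discards the fractional part instead of rounding. The sketch below illustrates this with hypothetical values; np.rint is one way to round before casting.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5])   # hypothetical example values

# astype() truncates toward zero when casting floats to integers.
truncated = values.astype(np.int64)

# Rounding first (np.rint rounds halves to the nearest even integer), then casting.
rounded = np.rint(values).astype(np.int64)

print(truncated)
print(rounded)
```

If the original values matter, it is safer to keep the float array and cast only the result that is actually needed as an integer.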
Array operations 83
Element-wise operations
Calculation operators on NumPy arrays operate element-wise.

Element-wise operations
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

print(arr3 + arr3)
## [[ 8 16 10]
##  [18  6  8]
##  [ 2  0 12]]

print(arr3**2)
## [[16 64 25]
##  [81  9 16]
##  [ 1  0 36]]

Array operations 84
Matrix multiplication
The operator * applied to arrays multiplies element-wise; it does not perform matrix multiplication.

Element-wise operations
print(arr3 * arr3)
## [[16 64 25]
##  [81  9 16]
##  [ 1  0 36]]

arr = np.ones((3, 2))
print(arr)
## [[1. 1.]
##  [1. 1.]
##  [1. 1.]]

print(arr3 * arr)  # shapes (3, 3) and (3, 2) do not match for element-wise multiplication
## ValueError: operands could not be broadcast together

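Element-wise operators do work on arrays of different shapes when the shapes are compatible under NumPy's broadcasting rules. The following sketch contrasts a compatible (3, 1) column with the incompatible (3, 2) array; the scaling factors in col are hypothetical.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

# Shape (3, 1): the single column is stretched across all three columns of arr3.
col = np.array([[1], [10], [100]])
scaled = arr3 * col
print(scaled)

# Shape (3, 2): neither dimension matches nor equals 1, so broadcasting fails.
try:
    arr3 * np.ones((3, 2))
    failed = False
except ValueError:
    failed = True
print(failed)
```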
Slicing 85
array[start : stop : step]: selects a subset of the data.

Slicing in one dimension
arr = np.arange(10)
print(arr)
## [0 1 2 3 4 5 6 7 8 9]

print(arr[4])
## 4

print(arr[3:7])
## [3 4 5 6]

print(arr)
## [0 1 2 3 4 5 6 7 8 9]

Slicing 86
Slicing in one dimension with steps
print(arr[:7])
## [0 1 2 3 4 5 6]

print(arr[-3:])
## [7 8 9]

print(arr[::-1])
## [9 8 7 6 5 4 3 2 1 0]

print(arr[::2])
## [0 2 4 6 8]

print(arr[:5:-1])
## [9 8 7 6]

Slicing 87
Slicing in higher dimensions
In n-dimensional arrays the element at each index is an (n-1)-dimensional array.

Indexing in two dimensions
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

vec = arr3[1]
print(vec)
## [9 3 4]

print(arr3[1, 0])
## 9

Slicing 88
Slicing in two dimensions
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

print(arr3[0:2, 0:2])
## [[4 8]
##  [9 3]]

print(arr3[2:, :])
## [[1 0 6]]

Slicing 89
Figure: Python for Data Analysis (2017) on page 99

Views on arrays 90
The index and slice selections used so far belong to basic indexing in NumPy. With basic indexing you get NO COPY of your data but a so-called view on the existing data set, i.e., a different perspective on the same data.

A view on an array can be seen as a reference to a rectangular memory area of its values. The view is intended to
edit a rectangular part of a matrix, e.g., a sub-matrix, a column, or a single value,
change the shape of the matrix or the arrangement of its elements, e.g., transpose or reshape a matrix,
change the representation of values, e.g., reinterpret the memory of a float array as an int array (astype(), in contrast, converts the values and returns a copy),
map the values in other program areas.

The crucial point is efficiency: data arrays in your working memory do not have to be copied again and again for simple index operations, which would require excessive additional writes to the computer memory.

Creating views implicitly 91
A view is created automatically when you do basic indexing such as slicing:

Create a view by slicing
column = arr3[:, 1]
print(column)
## [8 3 0]

print(column.base)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

column[1] = 100
print(arr3)
## [[  4   8   5]
##  [  9 100   4]
##  [  1   0   6]]

Creating views implicitly 92
Create a view by slicing
elem = column[1:2]
print(elem.base)
## [[  4   8   5]
##  [  9 100   4]
##  [  1   0   6]]

elem[0] = 3
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

The middle column is a view of the base array referenced by arr3.
Any changes to the values of a view directly affect the base data.
A view of a view is another view on the same base matrix.

Obtaining views explicitly 93
In addition, an array provides methods and attributes that return a view of its data:

Obtain a view
arr3_t = arr3.T
print(arr3_t)
## [[4 9 1]
##  [8 3 0]
##  [5 4 6]]

print(arr3_t.flags.owndata)
## False

arr3_r = arr3.reshape(1, 9)
print(arr3_r)
## [[4 8 5 9 3 4 1 0 6]]

print(arr3_r.flags.owndata)
## False

Obtaining views explicitly 94
Obtain a view
arr3_v = arr3.view()
print(arr3_v.flags.owndata)
## False

The transposed matrix is a predefined view that is available as an attribute.
Reshaping is also just another way of looking at the same set of data.
By means of the method view() you create a view with an identical representation.

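Whether two arrays look at the same memory can also be tested with np.shares_memory. The sketch below applies it to the views from the previous slides and, for contrast, to the result of astype(); the variable names are illustrative. Note that reshape() returns a view only when the memory layout allows it, which is the case for this contiguous array.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

# Slicing, transposing and reshaping a contiguous array give views on the same memory.
column = arr3[:, 1]
transposed = arr3.T
reshaped = arr3.reshape(1, 9)
views_share = [np.shares_memory(arr3, v) for v in (column, transposed, reshaped)]
print(views_share)

# astype() converts the values and by default returns a new array.
as_float = arr3.astype(np.float64)
print(np.shares_memory(arr3, as_float))
```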
Fancy indexing 95
The behavior described above changes with advanced indexing, i.e., if at least one component of the index tuple is not a scalar index number or slice. The case of fancy indexing is described below:

Advanced and basic indexing
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr = arr3[[0, 2], [0, 2]]
print(arr)
## [4 6]

print(arr.base)
## None

Fancy indexing 96
Advanced and basic indexing
arr = arr3[0:3:2, 0:3:2]
print(arr)
## [[4 5]
##  [1 6]]

print(arr.base)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

Contrary to intuition, the fancy index arr3[[0, 2], [0, 2]] does not return a (2 × 2)-matrix, but a vector of the matrix elements (0, 0) and (2, 2). This is a complete copy, i.e., a new object and not a view to the original matrix.
A submatrix (view) with the corner elements of the initial matrix can be obtained with slicing.

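If the corner submatrix is needed from index lists rather than from slices, np.ix_ builds the index grid for all row/column combinations. This is a minimal sketch; as with every fancy index, the result is a copy, so writing to it leaves arr3 unchanged.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

pairs = arr3[[0, 2], [0, 2]]             # elements (0, 0) and (2, 2) only
corners = arr3[np.ix_([0, 2], [0, 2])]   # rows 0 and 2 combined with columns 0 and 2
print(pairs)
print(corners)

corners[0, 0] = -1                       # changes the copy, not arr3
print(arr3[0, 0])
```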
Conditional indexing 97
Filter arrays without using loops by conditional indexing.

Find and replace values in arrays, condition: smaller
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr = arr3.copy()
arr[arr < 5] = 0
print(arr)
## [[0 8 5]
##  [9 0 0]
##  [0 0 6]]

Conditional indexing 98
Find and replace values in arrays, condition: equal
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr = arr3.copy()
arr[arr == 4] = 100
print(arr)
## [[100   8   5]
##  [  9   3 100]
##  [  1   0   6]]

Reshaping arrays 99
array.reshape((rows, columns)): returns the data of an existing array in a new shape; the number of elements must stay the same.
array.resize((rows, columns)): changes the array shape in place to rows x columns and fills new values with 0.

Reshape
arr = np.arange(15)
print(arr.reshape((3, 5)))
## [[ 0  1  2  3  4]
##  [ 5  6  7  8  9]
##  [10 11 12 13 14]]

arr.resize((3, 7))
print(arr)
## [[ 0  1  2  3  4  5  6]
##  [ 7  8  9 10 11 12 13]
##  [14  0  0  0  0  0  0]]

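Two details of reshape() are easy to miss: one dimension may be given as -1 and is then inferred, and a shape with a different number of elements is rejected. A short sketch with illustrative names:

```python
import numpy as np

arr = np.arange(15)

# -1 lets NumPy infer the missing dimension: 15 elements / 3 rows = 5 columns.
mat = arr.reshape((3, -1))
print(mat.shape)
print(arr.shape)   # reshape() does not change arr itself

# 4 x 4 = 16 elements do not match the 15 elements of arr.
try:
    arr.reshape((4, 4))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
print(mismatch_raises)
```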
Adding and removing elements of arrays 100
np.append(array, value): returns a new array with value appended to the end of array.
np.insert(array, index, value): returns a new array with value inserted before index.
np.delete(array, index, axis): returns a new array without the row or column at index.

Append, insert and delete
a = np.arange(5)
a = np.append(a, 8)
a = np.insert(a, 3, 77)
print(a)
## [ 0  1  2 77  3  4  8]

a.resize((3, 3))
print(np.delete(a, 1, axis=0))
## [[0 1 2]
##  [8 0 0]]

Combining and splitting 101
np.concatenate((arr1, arr2), axis): join a sequence of arrays along an existing axis.
np.split(array, n): split an array into n equal sub-arrays.
np.hsplit(array, n): split an array into n equal sub-arrays horizontally (column-wise).

Combine and split
print(np.concatenate((a, np.arange(6).reshape(2, 3)), axis=0))
## [[ 0  1  2]
##  [77  3  4]
##  [ 8  0  0]
##  [ 0  1  2]
##  [ 3  4  5]]

print(np.split(np.arange(8), 4))
## [array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7])]

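np.hsplit is listed above but not shown in the example. The following sketch splits a small matrix column-wise and joins the blocks again with np.concatenate along axis 1; the matrix mat is made up for illustration.

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)

# Split the four columns into two blocks of two columns each.
left, right = np.hsplit(mat, 2)
print(left)
print(right)

# Joining the blocks along the column axis restores the original matrix.
restored = np.concatenate((left, right), axis=1)
print(np.array_equal(restored, mat))
```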
Transposing array 102
array.T: transposed array (as a view).

Transpose
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

print(arr3.T)
## [[4 9 1]
##  [8 3 0]
##  [5 4 6]]

print(np.eye(3).T)
## [[1. 0. 0.]
##  [0. 1. 0.]
##  [0. 0. 1.]]

Matrix multiplication 103
np.dot(array1, array2): matrix multiplication of array1 and array2.

Matrix multiplication
res = np.dot(arr3, np.arange(18).reshape((3, 6)))
print(res)
## [[108 125 142 159 176 193]
##  [ 66  82  98 114 130 146]
##  [ 72  79  86  93 100 107]]

res = np.dot(np.eye(4), np.arange(16).reshape((4, 4)))
print(res)
## [[ 0.  1.  2.  3.]
##  [ 4.  5.  6.  7.]
##  [ 8.  9. 10. 11.]
##  [12. 13. 14. 15.]]

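For two-dimensional arrays the operator @ gives the same result as np.dot and is often easier to read in longer formulas. A minimal sketch using the matrices from the example above (the name B is introduced here for illustration):

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
B = np.arange(18).reshape((3, 6))

res_dot = np.dot(arr3, B)   # function form
res_at = arr3 @ B           # operator form, same matrix product

print(res_at)
print(np.array_equal(res_dot, res_at))
```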
Array functions 104
Element-wise functions
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

print(np.sqrt(arr3))
## [[2.         2.82842712 2.23606798]
##  [3.         1.73205081 2.        ]
##  [1.         0.         2.44948974]]

print(np.exp(arr3))
## [[5.45981500e+01 2.98095799e+03 1.48413159e+02]
##  [8.10308393e+03 2.00855369e+01 5.45981500e+01]
##  [2.71828183e+00 1.00000000e+00 4.03428793e+02]]

Overview: element-wise array functions 105
Function                                        Description
abs                                             Absolute value of integer and floating-point values
sqrt                                            Square root
exp                                             Exponential function
log, log10, log2                                Natural logarithm, log base 10, log base 2
sign                                            Sign (1: positive, 0: zero, -1: negative)
ceil                                            Round up to integer
floor                                           Round down to integer
rint                                            Round to nearest integer
modf                                            Returns fractional and integral parts as separate arrays
sin, cos, tan, sinh, cosh, tanh, arcsin, ...    Trigonometric and hyperbolic functions

Binary functions 106
Binary
x = np.array([3, -6, 8, 4, 3, 5])
y = np.array([3, 5, 7, 3, 5, 9])
print(np.maximum(x, y))
## [3 5 8 4 5 9]

print(np.greater_equal(x, y))
## [ True False  True  True False False]

print(np.add(x, y))
## [ 6 -1 15  7  8 14]

print(np.mod(x, y))
## [0 4 1 1 3 5]

Overview: binary functions 107
Function                Description
add                     Add elements of arrays
subtract                Subtract elements in the second from the first array
multiply                Multiply elements
divide                  Divide elements
power                   Raise elements in first array to powers in second
maximum                 Element-wise maximum
minimum                 Element-wise minimum
mod                     Element-wise modulus
greater, less, equal    Element-wise comparison, returns a boolean array

Data processing 108
np.meshgrid(array1, array2): coordinate matrices from coordinate arrays.

Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a 10 x 10 grid
p = np.arange(-5, 5, 0.01)
x, y = np.meshgrid(p, p)
print(x)
## [[-5. -4.99 -4.98 ... 4.97 4.98 4.99]
##  [-5. -4.99 -4.98 ... 4.97 4.98 4.99]
##  [-5. -4.99 -4.98 ... 4.97 4.98 4.99]
##  ...
##  [-5. -4.99 -4.98 ... 4.97 4.98 4.99]
##  [-5. -4.99 -4.98 ... 4.97 4.98 4.99]
##  [-5. -4.99 -4.98 ... 4.97 4.98 4.99]]

Data processing 109
Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a 10 x 10 grid.
import matplotlib.pyplot as plt
val = np.sqrt(x**2 + y**2)
plt.figure(figsize=(2, 2))
plt.imshow(val, cmap="hot")
plt.colorbar()

Data processing 110
Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a 10 x 10 grid.
plt.show()

[Figure: heat map of val with color bar (ticks at 2, 4, 6)]

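What np.meshgrid returns is easier to see on a very small grid. The sketch below uses two short, made-up coordinate arrays p and q and evaluates the same function $f(x, y) = \sqrt{x^2 + y^2}$ on every grid point.

```python
import numpy as np

p = np.array([-1.0, 0.0, 1.0])   # x coordinates (illustrative)
q = np.array([0.0, 2.0])         # y coordinates (illustrative)

# x repeats p in every row, y repeats q in every column;
# both have shape (len(q), len(p)).
x, y = np.meshgrid(p, q)
print(x)
print(y)

# Element-wise evaluation on all grid points at once, without loops.
val = np.sqrt(x**2 + y**2)
print(val)
```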
Conditional logic 111
np.where(condition, a, b): if condition is True, take the value from a, else take it from b.

Conditional logic
a = np.array([4, 7, 5, -7, 9, 0])
b = np.array([-1, 9, 8, 3, 3, 3])
cond = np.array([True, True, False, True, False, False])
res = np.where(cond, a, b)
print(res)
## [ 4  7  8 -7  3  3]

res = np.where(a <= b, b, a)
print(res)
## [4 9 8 3 9 3]

Conditional logic 112
Conditional logic, examples
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

res = np.where(arr3 < 5, 0, arr3)
print(res)
## [[0 8 5]
##  [9 0 0]
##  [0 0 6]]

even = np.where(arr3 % 2 == 0, arr3, arr3 + 1)
print(even)
## [[ 4  8  6]
##  [10  4  4]
##  [ 2  0  6]]

Statistical methods 113
array.mean(): mean of all array elements.
array.sum(): sum of all array elements.

Statistical methods
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

print(arr3.mean())
## 4.444444444444445

print(arr3.sum())
## 40

print(arr3.argmin())  # index of the minimum in the flattened array
## 7

Overview: statistical methods 114
Method            Description
sum               Sum of all array elements
mean              Mean of all array elements
std, var          Standard deviation, variance
min, max          Minimum and maximum value in array
argmin, argmax    Indices of minimum and maximum value

Axis 115
Axes are defined for arrays with more than one dimension. A two-dimensional array has two axes. The first one runs vertically downwards across the rows (axis=0), the second one runs horizontally across the columns (axis=1). Summing along axis=0 therefore yields one value per column, summing along axis=1 one value per row.

Axis
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

print(arr3.sum(axis=0))
## [14 11 15]

print(arr3.sum(axis=1))
## [17 16  7]

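Axis-wise statistics combine naturally with element-wise arithmetic, for example to demean columns or to express each value as a share of its row total. A sketch under the assumption that arr3 holds the values shown above; keepdims=True keeps the row sums as a (3, 1) column so that they line up with the rows.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

col_means = arr3.mean(axis=0)                # one mean per column, shape (3,)
demeaned = arr3 - col_means                  # subtract each column's mean

row_sums = arr3.sum(axis=1, keepdims=True)   # one sum per row, shape (3, 1)
shares = arr3 / row_sums                     # each row now sums to one

print(demeaned.mean(axis=0))
print(shares.sum(axis=1))
```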
Sorting 116
array.sort(axis): sorts the array in place along an axis.

Sorting one-dimensional arrays
print(arr2)
## [24.3   0.    8.9   4.4   1.65 45.  ]

arr2.sort()
print(arr2)
## [ 0.    1.65  4.4   8.9  24.3  45.  ]

Sorting 117
Sorting two-dimensional arrays
print(arr3)
## [[4 8 5]
##  [9 3 4]
##  [1 0 6]]

arr3.sort()
print(arr3)
## [[4 5 8]
##  [3 4 9]
##  [0 1 6]]

arr3.sort(axis=0)
print(arr3)
## [[0 1 6]
##  [3 4 8]
##  [4 5 9]]

The default axis using sort() is -1, which means to sort along the last axis (in this case axis 1). Since sort() works in place, arr3 holds the sorted values from here on.

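When the original order must be kept, the functions np.sort and np.argsort can be used instead of the in-place method. A minimal sketch with the values of arr2 from above:

```python
import numpy as np

arr2 = np.array([24.3, 0.0, 8.9, 4.4, 1.65, 45.0])

sorted_copy = np.sort(arr2)   # returns a sorted copy, arr2 stays unchanged
order = np.argsort(arr2)      # indices that would sort arr2

print(sorted_copy)
print(order)
print(arr2[order])            # same result as sorted_copy
```

The index array order is useful for sorting a second array by the values of the first.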
Section 2.3 118
Numerical programming
Linear Algebra

Inverse matrix 119
Import numpy.linalg
import numpy.linalg as nplin

nplin.inv(array): inverse matrix.
np.allclose(array1, array2): returns True if two arrays are element-wise equal within a tolerance.

Inverse
inv = nplin.inv(arr3)
print(inv)
## [[  4. -21.  16.]
##  [ -5.  24. -18.]
##  [  1.  -4.   3.]]

print(np.allclose(np.identity(3), np.dot(inv, arr3)))
## True

Matrix functions 120
nplin.det(array): compute determinant.
np.trace(array): compute trace.
np.diag(array): return diagonal elements as an array.

Linear algebra functions
print(nplin.det(arr3))
## -1.0

print(np.trace(arr3))
## 13

print(np.diag(arr3))
## [0 4 9]

Eigenvalues and eigenvectors 121
nplin.eig(array): return an array of eigenvalues and an array of eigenvectors as a tuple. The eigenvectors are the columns of the second array.

Get eigenvalues and eigenvectors
A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
eigenval, eigenvec = nplin.eig(A)
print(eigenval)
## [-1.  1.  2.]

print(eigenvec)
## [[ 0.         -0.40824829 -0.70710678]
##  [ 0.         -0.81649658 -0.70710678]
##  [ 1.         -0.40824829  0.        ]]

Eigenvalues and eigenvectors 122
Check eigenvalues and eigenvectors
print(eigenval * eigenvec)
## [[-0.         -0.40824829 -1.41421356]
##  [-0.         -0.81649658 -1.41421356]
##  [-1.         -0.40824829  0.        ]]

print(np.dot(A, eigenvec))
## [[ 0.         -0.40824829 -1.41421356]
##  [ 0.         -0.81649658 -1.41421356]
##  [-1.         -0.40824829  0.        ]]

$$
\begin{pmatrix} 3 & -1 & 0 \\ 2 & 0 & 0 \\ -2 & 2 & -1 \end{pmatrix}
\cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= (-1) \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}
$$

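The comparison of the two printed matrices can also be done programmatically, one eigenpair at a time. The sketch below assumes the matrix A from the example and uses np.linalg directly instead of the alias nplin; iterating over eigenvec.T walks through the columns, i.e., the eigenvectors.

```python
import numpy as np

A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
eigenval, eigenvec = np.linalg.eig(A)

# Check A v = lambda v for every pair (eigenvalue, eigenvector column).
pair_checks = [np.allclose(A @ v, lam * v) for lam, v in zip(eigenval, eigenvec.T)]
print(pair_checks)

# The same check for all pairs at once: each column is scaled by its eigenvalue.
all_ok = np.allclose(A @ eigenvec, eigenvec * eigenval)
print(all_ok)
```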
QR decomposition 123
nplin.qr(array): QR decomposition, returns the matrices Q and R as arrays.

QR decomposition
Q, R = nplin.qr(arr3)
print(Q)
## [[ 0.          0.98058068  0.19611614]
##  [-0.6         0.15689291 -0.78446454]
##  [-0.8        -0.11766968  0.58834841]]

print(R)
## [[ -5.          -6.4        -12.        ]
##  [  0.           1.0198039    6.07960019]
##  [  0.           0.           0.19611614]]

print(np.allclose(arr3, np.dot(Q, R)))
## True

Linear system 124
nplin.solve(A, b): returns the solution x of the linear system Ax = b.

Solve linear systems

b = np.array([7, 4, 8])
x = nplin.solve(A, b)
print(x)

## [  2.  -1. -14.]

print(np.allclose(np.dot(A, x), b))

## True

$$
\begin{aligned}
3x_1 - 1x_2 + 0x_3 &= 7 \\
2x_1 - 0x_2 + 0x_3 &= 4 \\
-2x_1 + 2x_2 - 1x_3 &= 8
\end{aligned}
\quad \rightarrow \quad
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 2 \\ -1 \\ -14 \end{pmatrix}
$$
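As a cross-check, the same solution can be obtained by multiplying b with the inverse matrix, although solving directly is usually preferable numerically. A minimal sketch, assuming A is the coefficient matrix read off the system of equations on this slide and nplin is the alias for numpy.linalg:

```python
import numpy as np
import numpy.linalg as nplin

# Coefficient matrix and right-hand side of the system on the slide
A = np.array([[3, -1, 0],
              [2, 0, 0],
              [-2, 2, -1]])
b = np.array([7, 4, 8])

x_solve = nplin.solve(A, b)          # direct solution of Ax = b
x_inv = np.dot(nplin.inv(A), b)      # same result via the inverse matrix

print(np.allclose(x_solve, x_inv))
```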
Overview: linear algebra 125
Function      Description
np.dot        Matrix multiplication
np.trace      Sum of the diagonal elements
np.diag       Diagonal elements as an array
nplin.det     Matrix determinant
nplin.eig     Eigenvalues and eigenvectors
nplin.inv     Inverse matrix
nplin.qr      QR decomposition
nplin.solve   Solve linear system
Chapter 3 126
Data formats and handling

3.1 Pandas
3.2 Series
3.3 DataFrame
3.4 Import/Export data
Section 3.1 127
Data formats and handling

Pandas
Pandas 128
The package pandas is a free software library for Python including the following features:

Data manipulation and analysis,
DataFrame objects and Series,
Export and import data from files and web,
Handling of missing data.

→ Provides high-performance data structures and data analysis tools.
Motivation 129
With pandas you can import and visualize financial data in only a few lines of code.

Motivation

import pandas as pd
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
dow = pd.read_csv("data/dji.csv", index_col=0, parse_dates=True)
close = dow["Close"]
close.plot(ax=ax)
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.set_title("DJI")
fig.savefig("out/dji.pdf", format="pdf")
Motivation 130
[Figure: line plot titled "DJI" showing the closing price; x-axis "Date" (2006 to 2018), y-axis "Price" (7500 to 27500).]
Section 3.2 131
Data formats and handling

Series
Series 132
Series are a data structure in pandas.

One-dimensional array-like object,
Containing a sequence of values and a corresponding array of labels, called the index,
The string representation of a Series displays the index on the left and the values on the right,
The default index consists of the integers 0 through N-1.

String representation of a Series

## 0     3
## 1     7
## 2    -8
## 3     4
## 4    26
## dtype: int64
Create Series 133
Import pandas

import pandas as pd

pd.Series(): one-dimensional array-like object including values and an index.

Series

obj = pd.Series([2, -5, 9, 4])
print(obj)

## 0    2
## 1   -5
## 2    9
## 3    4
## dtype: int64

Simple Series formed only from a list,
Index is added automatically.
Create Series 134
Series indexing vs. NumPy indexing

obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])
npobj = np.array([2, -5, 9, 4])
print(obj2)

## a    2
## b   -5
## c    9
## d    4
## dtype: int64

print(obj2["b"])

## -5

print(npobj[1])

## -5

NumPy arrays are indexed by integer position only, while Series can also be indexed by the labels of a manually set index.
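A labeled Series still supports access by integer position. The following minimal sketch reuses the values of obj2; the explicit .loc and .iloc accessors are shown here only as an illustration of the two access modes.

```python
import pandas as pd

obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])

by_label = obj2.loc["b"]      # selection by index label
by_position = obj2.iloc[1]    # selection by integer position

print(by_label, by_position)
```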
Create Series 135
Series creation from NumPy arrays

npobj = np.array([2, -5, 9, 4])
obj2 = pd.Series(npobj, index=["a", "b", "c", "d"])
print(obj2)

## a    2
## b   -5
## c    9
## d    4
## dtype: int64

Pandas Series can be created from:

Lists,
NumPy arrays,
Dicts.
Create Series 136
The index of the Series can be set manually,
In contrast to NumPy arrays, you can use the labels of the index to select single values,
Data contained in a dict can be passed to a Series. The index of the resulting Series consists of the dict's keys.

Series from dicts

dictdata = {"Göttingen": 117665, "Northeim": 28920,
            "Hannover": 532163, "Berlin": 3574830}
obj3 = pd.Series(dictdata)
print(obj3)

## Göttingen     117665
## Northeim       28920
## Hannover      532163
## Berlin       3574830
## dtype: int64
Create Series 137
Dict to Series with manual index

cities = ["Hamburg", "Göttingen", "Berlin", "Hannover"]
obj4 = pd.Series(dictdata, index=cities)
print(obj4)

## Hamburg            NaN
## Göttingen     117665.0
## Berlin       3574830.0
## Hannover      532163.0
## dtype: float64

When passing a dict to a Series, the index can be set manually,
NaN (not a number) marks missing values where the index and the dict keys do not match.
Series properties 138
Series.values: returns the values of a Series.
Series.index: returns the index of a Series.

Series properties

print(obj.values)

## [ 2 -5  9  4]

print(obj.index)

## RangeIndex(start=0, stop=4, step=1)

print(obj2.index)

## Index(['a', 'b', 'c', 'd'], dtype='object')

The values and the index of a Series can be accessed separately: the values as a NumPy array and the index as an Index object,
The default index is given as a RangeIndex.
Selecting and manipulating values 139
Series manipulation

print(obj2[["c", "d", "a"]])

## c    9
## d    4
## a    2
## dtype: int64

print(obj2[obj2 < 0])

## b   -5
## dtype: int64

NumPy-like operations can be applied to Series:

For filtering data,
For scalar multiplication or applying math functions,
The index-value link is preserved.
Selecting and manipulating values 140
Series functions

print(obj2 * 2)

## a     4
## b   -10
## c    18
## d     8
## dtype: int64

print(np.exp(obj2)["a":"c"])

## a       7.389056
## b       0.006738
## c    8103.083928
## dtype: float64

print("c" in obj2)

## True

Mathematical functions applied to a Series act only on the values, not on the index.
Selecting and manipulating values 141
Series manipulation

obj4["Hamburg"] = 1900000
print(obj4)

## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3574830.0
## Hannover      532163.0
## dtype: float64

obj4[["Berlin", "Hannover"]] = [3600000, 1100000]
print(obj4)

## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3600000.0
## Hannover     1100000.0
## dtype: float64

Values can be changed by using the labels in the index,
Several values can be set in one line.
Detect missing data 142
pd.isnull(): True if data is missing.
pd.notnull(): False if data is missing.

NaN

print(pd.isnull(obj4))

## Hamburg      False
## Göttingen    False
## Berlin       False
## Hannover     False
## dtype: bool

print(pd.notnull(obj4))

## Hamburg      True
## Göttingen    True
## Berlin       True
## Hannover     True
## dtype: bool
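In the output on this slide no value is missing, because Hamburg was assigned a value earlier. The following minimal sketch uses a hypothetical Series pop that still contains a NaN and shows two common follow-up steps: counting and dropping missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing entry (Hamburg)
pop = pd.Series({"Hamburg": np.nan,
                 "Göttingen": 117665.0,
                 "Berlin": 3574830.0})

n_missing = pop.isnull().sum()   # True counts as 1, so this counts the NaN entries
complete = pop.dropna()          # new Series without the missing entries

print(n_missing)
print(complete)
```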
Align differently indexed data 143
Hamburg and Northeim each appear in only one of the two Series, so there are no two values to align and the result is marked with NaN (not a number).

Data 1

print(obj3)

## Göttingen     117665
## Northeim       28920
## Hannover      532163
## Berlin       3574830
## dtype: int64

Data 2

print(obj4)

## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3600000.0
## Hannover     1100000.0
## dtype: float64

Align data

print(obj3 + obj4)

## Berlin       7174830.0
## Göttingen     235330.0
## Hamburg            NaN
## Hannover     1632163.0
## Northeim           NaN
## dtype: float64
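If labels that occur in only one Series should not produce NaN, the method form of the addition accepts a fill value. A minimal sketch, rebuilding obj3 and obj4 with the values shown on this slide; treating a missing city as 0 is an assumption made only for illustration.

```python
import pandas as pd

obj3 = pd.Series({"Göttingen": 117665, "Northeim": 28920,
                  "Hannover": 532163, "Berlin": 3574830})
obj4 = pd.Series({"Hamburg": 1900000.0, "Göttingen": 117665.0,
                  "Berlin": 3600000.0, "Hannover": 1100000.0})

# A label missing in one Series is treated as 0 before adding
total = obj3.add(obj4, fill_value=0)

print(total)
```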
Naming Series 144
Series.name: name of the Series
Series.index.name: name of the index

Naming

obj4.name = "population"
obj4.index.name = "city"
print(obj4)

## city
## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3600000.0
## Hannover     1100000.0
## Name: population, dtype: float64

Assigning to the attribute name changes the name of the existing Series,
By default, neither the Series nor the index has a name.
Series vs. NumPy arrays 145
NumPy arrays are accessed by their integer position.
Series can be defined and accessed by your own index, including letters and numbers.
Different Series can be aligned efficiently by the index.
Series can work with missing values, so operations do not automatically fail.
Section 3.3 146
Data formats and handling

DataFrame
DataFrame 147
DataFrames are the primary data structure of pandas,
A DataFrame represents a table of data with an ordered collection of columns,
Each column can have a different data type,
A DataFrame can be thought of as a dict of Series sharing the same index,
Physically a DataFrame is two-dimensional, but by using hierarchical indexing it can represent higher-dimensional data.

String representation of a DataFrame

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582
DataFrame 148
pd.DataFrame(): a DataFrame is a table-like structure. It is two-dimensional and has labeled axes (rows and columns).

Creating a DataFrame

data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.2, 8.11, 110.92, 87.28, 87.81],
        "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
frame = pd.DataFrame(data)
print(frame)

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

In this example the DataFrame frame is constructed by passing a dict of equal-length lists,
Instead of passing a dict of lists, it is also possible to pass a dict of NumPy arrays.
Show DataFrames 149
Print DataFrame

frame2 = pd.DataFrame(data, columns=["company", "volume",
                                     "price", "change"])
print(frame2)

##    company   volume   price change
## 0  Daimler  4456290   69.20    NaN
## 1     E.ON  3667975    8.11    NaN
## 2  Siemens  3669487  110.92    NaN
## 3     BASF  1778058   87.28    NaN
## 4      BMW  1824582   87.81    NaN

A column that is requested in columns but not contained in the dict is filled with NaN,
The default index is assigned automatically, as with Series.
Inputs to DataFrame constructor 150
Type                               Description
2D NumPy array                     A matrix of data
dict of arrays, lists, or tuples   Each sequence becomes a column
dict of Series                     Each value becomes a column
dict of dicts                      Each inner dict becomes a column
List of dicts or Series            Each item becomes a row
List of lists or tuples            Treated like a 2D NumPy array
Another DataFrame                  Same indexes
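Two rows of the table above can be illustrated with a short sketch. The data reuses company figures from the earlier slides; the variable names rows, cols, frame_rows and frame_cols are hypothetical.

```python
import pandas as pd

# List of dicts: each item becomes a row, the keys become the columns
rows = [{"company": "Daimler", "price": 69.20},
        {"company": "E.ON", "price": 8.11}]
frame_rows = pd.DataFrame(rows)

# Dict of Series: each value becomes a column, aligned on the shared index
cols = {"price": pd.Series([69.20, 8.11], index=["Daimler", "E.ON"]),
        "volume": pd.Series([4456290, 3667975], index=["Daimler", "E.ON"])}
frame_cols = pd.DataFrame(cols)

print(frame_rows)
print(frame_cols)
```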
Indexing and adding DataFrames 151
Add data to DataFrame

frame2["change"] = [1.2, -3.2, 0.4, -0.12, 2.4]
print(frame2["change"])

## 0    1.20
## 1   -3.20
## 2    0.40
## 3   -0.12
## 4    2.40
## Name: change, dtype: float64

Selecting a single column of a DataFrame returns a Series,
An attribute-like access, e.g., frame2.change, is also possible,
The returned Series has the same index as the initial DataFrame.
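Because a selected column is a Series, new columns can also be computed from existing ones. A minimal sketch with two of the companies from the slides; the column name turnover (price times volume) is a hypothetical example and not part of the course data.

```python
import pandas as pd

frame = pd.DataFrame({"company": ["Daimler", "E.ON"],
                      "price": [69.20, 8.11],
                      "volume": [4456290, 3667975]})

# Element-wise product of two columns, assigned as a new column
frame["turnover"] = frame["price"] * frame["volume"]

print(frame)
```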
Indexing DataFrames 152
Indexing DataFrames

print(frame2[["company", "change"]])

##    company  change
## 0  Daimler    1.20
## 1     E.ON   -3.20
## 2  Siemens    0.40
## 3     BASF   -0.12
## 4      BMW    2.40

When indexing with a list of multiple columns, the result is a DataFrame,
The returned DataFrame has the same index as the initial one.
Changing DataFrames 153
del DataFrame[column]: deletes a column from the DataFrame.

DataFrame delete column

del frame2["volume"]
print(frame2)

##    company   price  change
## 0  Daimler   69.20    1.20
## 1     E.ON    8.11   -3.20
## 2  Siemens  110.92    0.40
## 3     BASF   87.28   -0.12
## 4      BMW   87.81    2.40

print(frame2.columns)

## Index(['company', 'price', 'change'], dtype='object')
Naming DataFrames 154
Naming properties

frame2.index.name = "number:"
frame2.columns.name = "feature:"
print(frame2)

## feature:  company   price  change
## number:
## 0         Daimler   69.20    1.20
## 1            E.ON    8.11   -3.20
## 2         Siemens  110.92    0.40
## 3            BASF   87.28   -0.12
## 4             BMW   87.81    2.40

In DataFrames there is no default name for the index or the columns.
Reindexing 155
DataFrame.reindex(): creates a new DataFrame with the data conformed to a new index; the initial DataFrame is not changed.

Reindexing

frame3 = frame.reindex([0, 2, 3, 4])
print(frame3)

##    company   price   volume
## 0  Daimler   69.20  4456290
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

Index values that are not already present will be filled with NaN by default,
There are many options for filling missing values.
Reindexing 156
Filling missing values

frame4 = frame.reindex(index=[0, 2, 3, 4, 5], fill_value=0,
                       columns=["company", "price", "market cap"])
print(frame4)

##    company   price  market cap
## 0  Daimler   69.20           0
## 2  Siemens  110.92           0
## 3     BASF   87.28           0
## 4      BMW   87.81           0
## 5        0    0.00           0

frame4 = frame.reindex(index=[0, 2, 3, 4], fill_value=np.nan,
                       columns=["company", "price", "market cap"])
print(frame4)

##    company   price  market cap
## 0  Daimler   69.20         NaN
## 2  Siemens  110.92         NaN
## 3     BASF   87.28         NaN
## 4      BMW   87.81         NaN
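Besides a constant fill_value, reindex can also carry the last known value forward. A minimal sketch on a small hypothetical Series s; the method argument requires a sorted (monotonic) index.

```python
import pandas as pd

# Hypothetical Series observed only at positions 0, 2 and 4
s = pd.Series([1.0, 2.0, 3.0], index=[0, 2, 4])

# Forward fill: each new label takes the value of the last earlier label
filled = s.reindex(range(6), method="ffill")

print(filled)
```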
Fill NaN 157
DataFrame.fillna(value): fills NaN with value.

Filling NaN

print(frame4[:3])

##    company   price  market cap
## 0  Daimler   69.20         NaN
## 2  Siemens  110.92         NaN
## 3     BASF   87.28         NaN

frame4.fillna(1000000, inplace=True)
print(frame4[:3])

##    company   price  market cap
## 0  Daimler   69.20   1000000.0
## 2  Siemens  110.92   1000000.0
## 3     BASF   87.28   1000000.0

The option inplace=True fills the current DataFrame (here frame4). Without inplace=True, fillna returns a new DataFrame in which the NaN values are filled, and the original DataFrame stays unchanged.
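The behavior without inplace=True can be verified directly. A minimal sketch on a small hypothetical DataFrame df that mirrors the market cap column of the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [69.20, 110.92],
                   "market cap": [np.nan, np.nan]})

filled = df.fillna(1000000)                       # returns a new DataFrame
still_missing = df["market cap"].isnull().all()   # the original still holds NaN

print(filled)
print(still_missing)
```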
Dropping entries 158
DataFrame.drop(index, axis): returns a new object with the given labels removed from the requested axis.

Dropping index

frame5 = frame
print(frame5)

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

print(frame5.drop([1, 2]))

##    company  price   volume
## 0  Daimler  69.20  4456290
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582
Dropping entries 159
Dropping column

print(frame5[:2])

##    company  price   volume
## 0  Daimler  69.20  4456290
## 1     E.ON   8.11  3667975

print(frame5.drop("price", axis=1)[:3])

##    company   volume
## 0  Daimler  4456290
## 1     E.ON  3667975
## 2  Siemens  3669487

print(frame5.drop(2, axis=0))

##    company  price   volume
## 0  Daimler  69.20  4456290
## 1     E.ON   8.11  3667975
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582
Indexing, selecting and filtering 160
Indexing of DataFrames works like indexing a NumPy array; you can use the default index values or a manually set index.

Indexing

print(frame)

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

print(frame[2:])

##    company   price   volume
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582
Indexing, selecting and filtering 161
Indexing

frame6 = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])
print(frame6)

##    company   price   volume
## a  Daimler   69.20  4456290
## b     E.ON    8.11  3667975
## c  Siemens  110.92  3669487
## d     BASF   87.28  1778058
## e      BMW   87.81  1824582

print(frame6["b":"d"])

##    company   price   volume
## b     E.ON    8.11  3667975
## c  Siemens  110.92  3669487
## d     BASF   87.28  1778058

When slicing with labels the end element is inclusive.
Indexing, selecting and filtering 162
DataFrame.loc[]: select a subset of rows and columns from a DataFrame using axis labels.
DataFrame.iloc[]: select a subset of rows and columns from a DataFrame using integer positions.

Selection with loc and iloc

print(frame6.loc["c", ["company", "price"]])

## company    Siemens
## price       110.92
## Name: c, dtype: object

print(frame6.iloc[2, [0, 1]])

## company    Siemens
## price       110.92
## Name: c, dtype: object
Indexing, selecting and filtering 163
Selection with loc and iloc

print(frame6.loc[["c", "d", "e"], ["volume", "price", "company"]])

##     volume   price  company
## c  3669487  110.92  Siemens
## d  1778058   87.28     BASF
## e  1824582   87.81      BMW

print(frame6.iloc[2:, ::-1])

##     volume   price  company
## c  3669487  110.92  Siemens
## d  1778058   87.28     BASF
## e  1824582   87.81      BMW

Both indexers work with slices or lists (labels for loc, integer positions for iloc),
There are many ways to select and rearrange pandas objects.
DataFrame indexing options 164

Type                 Description
df[val]              Select single column or set of columns
df.loc[val]          Select single row or set of rows
df.loc[:, val]       Select single column or set of columns
df.loc[val1, val2]   Select row and column by label
df.iloc[where]       Select row or set of rows by integer position
df.iloc[:, where]    Select column or set of columns by integer position
df.iloc[w1, w2]      Select row and column by integer position
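The difference between label-based and position-based selection shows up most clearly when slicing. A minimal sketch, rebuilding a reduced version of frame6 with the labeled index from the earlier slide:

```python
import pandas as pd

data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.2, 8.11, 110.92, 87.28, 87.81]}
frame6 = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])

by_label = frame6.loc["b":"d"]    # label slice: the end label "d" is included
by_position = frame6.iloc[1:3]    # integer slice: the end position 3 is excluded

print(by_label)
print(by_position)
```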
Hierarchical indexing 165
Hierarchical indexing enables you to have multiple index levels.

Multiindex

ind = [["a", "a", "a", "b", "b"], [1, 2, 3, 1, 2]]
frame6 = pd.DataFrame(np.arange(15).reshape((5, 3)),
                      index=ind,
                      columns=["first", "second", "third"])
print(frame6)

##      first  second  third
## a 1      0       1      2
##   2      3       4      5
##   3      6       7      8
## b 1      9      10     11
##   2     12      13     14

frame6.index.names = ["index1", "index2"]
print(frame6.index)

## MultiIndex(levels=[['a', 'b'], [1, 2, 3]],
##            labels=[[0, 0, 0, 1, 1], [0, 1, 2, 0, 1]],
##            names=['index1', 'index2'])
Hierarchical indexing 166
Selecting from a multiindex

print(frame6.loc["a"])

##         first  second  third
## index2
## 1           0       1      2
## 2           3       4      5
## 3           6       7      8

print(frame6.loc["b", 1])

## first      9
## second    10
## third     11
## Name: (b, 1), dtype: int64
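Selecting on the inner index level is not covered by the examples above. One possible approach is the xs method (cross-section), sketched here on the same frame6; the choice of xs is an illustration and not necessarily the method used in the course.

```python
import numpy as np
import pandas as pd

ind = [["a", "a", "a", "b", "b"], [1, 2, 3, 1, 2]]
frame6 = pd.DataFrame(np.arange(15).reshape((5, 3)),
                      index=ind,
                      columns=["first", "second", "third"])
frame6.index.names = ["index1", "index2"]

# All rows whose inner level "index2" equals 1; that level is dropped from the result
inner = frame6.xs(1, level="index2")

print(inner)
```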
Operations between DataFrame and Series 167
Series and DataFrames

frame7 = frame[["price", "volume"]]
frame7.index = ["Daimler", "E.ON", "Siemens", "BASF", "BMW"]
series = frame7.iloc[2]
print(frame7)

##           price   volume
## Daimler   69.20  4456290
## E.ON       8.11  3667975
## Siemens  110.92  3669487
## BASF      87.28  1778058
## BMW       87.81  1824582

print(series)

## price         110.92
## volume    3669487.00
## Name: Siemens, dtype: float64

Here the Series was generated from the third row of the DataFrame (integer position 2).
Operations between DataFrames and Series 168
Operations between Series and DataFrames down the rows

print(frame7 + series)
##           price     volume
## Daimler  180.12  8125777.0
## E.ON     119.03  7337462.0
## Siemens  221.84  7338974.0
## BASF     198.20  5447545.0
## BMW      198.73  5494069.0

By default, arithmetic operations between DataFrames and Series match the index of the Series on the DataFrame’s columns.
The operation is broadcast down the rows.
Operations between DataFrames and Series 169
Operations between Series and DataFrames down the columns

series2 = frame7["price"]
print(frame7.add(series2, axis=0))
##           price      volume
## Daimler  138.40  4456359.20
## E.ON      16.22  3667983.11
## Siemens  221.84  3669597.92
## BASF     174.56  1778145.28
## BMW      175.62  1824669.81

Here, the Series was generated from the price column.
The arithmetic operation is broadcast across the columns, matching the Series on the DataFrame’s row index (axis=0).
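The two matching rules can be compared side by side on a tiny frame. This is a minimal sketch with made-up numbers; the labels "X" and "Y" and the variable names are hypothetical and not taken from the course data.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
frame = pd.DataFrame({"price": [10.0, 20.0], "volume": [100.0, 200.0]},
                     index=["X", "Y"])

row = frame.iloc[0]      # Series indexed by the column labels
col = frame["price"]     # Series indexed by the row labels

# Default: match the Series index on the columns, repeat for every row.
by_row = frame - row

# axis=0: match the Series index on the row index, repeat for every column.
by_col = frame.sub(col, axis=0)

print(by_row)
print(by_col)
```

Using the plain operator with col instead of sub(col, axis=0) would try to match the row labels against the column labels and produce only missing values.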
Operations between DataFrames and Series 170
Pandas vs Numpy

nparr = np.arange(12.).reshape((3, 4))
row = nparr[0]
print(nparr - row)
## [[0. 0. 0. 0.]
##  [4. 4. 4. 4.]
##  [8. 8. 8. 8.]]

Operations between DataFrames and Series are similar to operations between two- and one-dimensional NumPy arrays.
As with DataFrames and Series, the arithmetic operation is broadcast down the rows.
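NumPy has no axis argument for this kind of arithmetic. A minimal sketch of the column-wise counterpart, assuming the same nparr as above, reshapes the one-dimensional array into a column with np.newaxis so that it is broadcast across the columns instead.

```python
import numpy as np

nparr = np.arange(12.).reshape((3, 4))

# First column as a one-dimensional array of length 3.
col = nparr[:, 0]

# Shape (3,) cannot be broadcast against (3, 4) row by row,
# so turn it into shape (3, 1) first.
diff = nparr - col[:, np.newaxis]
print(diff)
```

Each row of diff then holds the distance of its entries from the first entry of that row.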
NumPy functions on DataFrames 171
DataFrame.apply(np.function, axis): applies a NumPy function along the given axis of the DataFrame.
See also statistical and mathematical NumPy functions.

Numpy functions on DataFrames

print(frame7[:2])
##          price   volume
## Daimler  69.20  4456290
## E.ON      8.11  3667975

print(frame7.apply(np.mean))
## price          72.664
## volume    3079278.400
## dtype: float64

print(frame7.apply(np.sqrt)[:2])
##             price       volume
## Daimler  8.318654  2110.992657
## E.ON     2.847806  1915.195812
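The effect of the axis argument is easiest to see with a reducing function. The following is a minimal sketch on a small made-up frame; the column names "a" and "b" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, only for illustration.
frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Default axis=0: the function receives each column, one result per column.
col_means = frame.apply(np.mean)

# axis=1: the function receives each row, one result per row.
row_means = frame.apply(np.mean, axis=1)

print(col_means)
print(row_means)
```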
Grouping DataFrames 172
DataFrame.groupby([col1, col2]): group the DataFrame by columns (grouping by a single column or by more than two columns is also possible).
See also how to import data from CSV files.

Groupby

vote = pd.read_csv("data/vote.csv")[["Party", "Member", "Vote"]]
print(vote.head())
##      Party     Member    Vote
## 0  CDU/CSU   Abercron     yes
## 1  CDU/CSU     Albani     yes
## 2  CDU/CSU  Altenkamp     yes
## 3  CDU/CSU   Altmaier  absent
## 4  CDU/CSU     Amthor     yes

Appending count() or mean() to groupby() returns the number of non-missing entries or the mean of the remaining columns within each group.
Grouping DataFrames 173
Groupby

res = vote.groupby(["Party", "Vote"]).count()
print(res)
##                      Member
## Party        Vote
## AfD          absent       6
##              no          86
## BÜ90/GR      absent       9
##              no          58
## CDU/CSU      absent       7
##              yes        239
## DIE LINKE.   absent       7
##              no          62
## FDP          absent       5
##              no          75
## Fraktionslos absent       1
##              no           1
## SPD          absent       6
##              yes        147
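A grouped count like the one above is often easier to read as a two-way table. The sketch below uses a small made-up DataFrame in place of data/vote.csv (party names "A" and "B" and the members are hypothetical) and shows size() together with unstack().

```python
import pandas as pd

# Hypothetical stand-in for the vote data.
vote = pd.DataFrame({
    "Party": ["A", "A", "A", "B", "B"],
    "Member": ["m1", "m2", "m3", "m4", "m5"],
    "Vote": ["yes", "yes", "absent", "no", "no"],
})

# size() counts the rows per group and returns a Series
# with a two-level index (Party, Vote).
counts = vote.groupby(["Party", "Vote"]).size()

# Move the inner index level into the columns; combinations
# that never occur are filled with 0.
table = counts.unstack(fill_value=0)
print(table)
```

Unlike count(), size() also counts rows with missing values, which makes no difference here because the example has none.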
Section 3.4 174
Data formats and handling

Import/Export data
Reading data in text format 175
ex1.csv

a, b, c, d, hello
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

pd.read_csv("file"): read CSV into DataFrame.

Read comma-separated values

df = pd.read_csv("data/ex1.csv")
print(df)
##    a  b  c  d   hello
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
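If the file really contains a space after each comma, as the listing of ex1.csv suggests, the column names and string values may keep that leading space. A minimal sketch, assuming the file content is as listed and reading it from an in-memory string instead of data/ex1.csv, uses the skipinitialspace option of read_csv to drop it.

```python
import io

import pandas as pd

# In-memory copy of the listing, so the example needs no file on disk.
text = ("a, b, c, d, hello\n"
        "1, 2, 3, 4, world\n"
        "5, 6, 7, 8, python\n"
        "2, 3, 5, 7, pandas\n")

# skipinitialspace=True ignores spaces that directly follow the delimiter.
clean = pd.read_csv(io.StringIO(text), skipinitialspace=True)
print(list(clean.columns))
print(clean)
```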
Reading data in text format 176
tab.txt

a| b| c| d| hello
1| 2| 3| 4| world
5| 6| 7| 8| python
2| 3| 5| 7| pandas

pd.read_table("file", sep): read a table with an arbitrary separator into a DataFrame.

Read table values

df = pd.read_table("data/tab.txt", sep="|")
print(df)
##    a  b  c  d   hello
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format 177
ex2.csv

1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

CSV file without header row:

Read CSV and header settings

df = pd.read_csv("data/ex2.csv", header=None)
print(df)
##    0  1  2  3       4
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format 178
ex2.csv

1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

Specify header:

Read CSV and header names

df = pd.read_csv("data/ex2.csv",
                 names=["a", "b", "c", "d", "hello"])
print(df)
##    a  b  c  d   hello
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format 179
ex2.csv

1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

Use the hello column as the index:

Read CSV and specify index

df = pd.read_csv("data/ex2.csv",
                 names=["a", "b", "c", "d", "hello"],
                 index_col="hello")
print(df)
##         a  b  c  d
## hello
## world   1  2  3  4
## python  5  6  7  8
## pandas  2  3  5  7
Reading data in text format 180
ex3.csv

1, 2, 3, 4, world
#+#-.,.-'*'-.,
5, 6, 7, 8, python
87646756754456978
2, 3, 5, 7, pandas

Skip rows while reading:

Read CSV and choose rows

df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
print(df)
##    1  2  3  4   world
## 0  5  6  7  8  python
## 1  2  3  5  7  pandas
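In the output above, the first data row has become the header because the file has no header row. A minimal sketch of one way around this, assuming the file content is as listed (without the spaces after the commas) and reading it from an in-memory string, combines skiprows with header=None and names.

```python
import io

import pandas as pd

# In-memory stand-in for data/ex3.csv, including the two junk lines.
text = ("1,2,3,4,world\n"
        "#+#-.,.-'*'-.,\n"
        "5,6,7,8,python\n"
        "87646756754456978\n"
        "2,3,5,7,pandas\n")

# Skip the junk lines (0-based line numbers 1 and 3), declare that the
# file has no header row, and supply the column names explicitly.
df = pd.read_csv(io.StringIO(text), skiprows=[1, 3], header=None,
                 names=["a", "b", "c", "d", "hello"])
print(df)
```

All three data rows are kept this way, and the columns get meaningful names.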
Writing data to text file 181
DataFrame.to_csv("filename"): write a DataFrame to a CSV file.

Write to CSV

df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df.to_csv("out/out1.csv")

out1.csv

,1, 2, 3, 4, world
0,5,6,7,8, python
1,2,3,5,7, pandas

Both the index and the header are written to the .csv file, which is why the first line starts with an empty field (,1).
Writing data to text file 182
Write to CSV and settings

df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df.to_csv("out/out2.csv", index=False, header=False)

out2.csv

5,6,7,8, python
2,3,5,7, pandas
Writing data to text file 183
Write to CSV and specify header

df = pd.read_csv("data/ex3.csv", skiprows=[1, 3, 4])
df.to_csv("out/out3.csv", index=False,
          header=["a", "b", "c", "d", "e"])

out3.csv

a,b,c,d,e
5,6,7,8, python
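The effect of the index option can be checked without writing a file, because to_csv() returns the CSV text as a string when no filename is given. The sketch below uses a small made-up DataFrame; the column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame, only for illustration.
df = pd.DataFrame({"a": [5, 2], "hello": ["python", "pandas"]})

# Without a filename, to_csv() returns the CSV content as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the text back restores the original columns and values.
back = pd.read_csv(io.StringIO(without_index))
print(back)
```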
Reading Excel files 184
pd.read_excel("file.xls"): read .xls files.

Figure: goog.xls

Reading Excel

xls_frame = pd.read_excel("data/goog.xls")
Reading Excel files 185
Excel as a DataFrame

print(xls_frame[["Adj Close", "Volume", "High"]])
##      Adj Close   Volume         High
## 0  1169.939941  1538700  1173.000000
## 1  1167.699951  2412100  1174.000000
## 2  1111.900024  4857900  1123.069946
## 3  1055.800049  3798300  1110.000000
## 4  1080.599976  3448000  1081.709961
## 5  1048.579956  2341700  1081.780029
Remote data access 186
Extract financial data from Internet sources into a DataFrame. Different sources offer different kinds of data. Some sources are:
Robinhood
IEX
World Bank
OECD
Eurostat
A complete list of the sources and their usage can be found in the pandas-datareader documentation.

Import pandas-datareader

from pandas_datareader import data
Data access: Robinhood 187
data.DataReader("stock symbol", "source", "start", "end"): get financial data of a stock for a given time period.

Robinhood get data

ford = data.DataReader("F", "robinhood", "1/1/2017", "1/31/2018")
print(ford.head()[["close_price", "volume"]])
##                    close_price    volume
## symbol begins_at
## F      2017-10-09    11.575500  28924795
##        2017-10-10    11.622400  40586234
##        2017-10-11    11.613000  34953438
##        2017-10-12    11.369100  45924434
##        2017-10-13    11.303500  44597334

Stock code list
Data access: Robinhood 188
Robinhood handle data

print(ford.index)
## MultiIndex(levels=[[F], [2017-01-02 00:00:00, 2017-01-03...
##            names=[Symbol, Date])

print(ford.loc["F", "1/26/2018"])
## close_price     11.063900
## high_price      11.111400
## interpolated        False
## low_price       10.921500
## open_price      11.007000
## session               reg
## volume           52496001
## Name: (F, 2018-01-26 00:00:00), dtype: object

DataFrame index

The index of the DataFrame differs between sources. Always check DataFrame.index!
Data access: IEX 189
IEX

sap = data.DataReader("SAP", "iex", "1/1/2017", "1/31/2018")
print(sap[25:27])
##                open     high      low    close  volume
## date
## 2017-02-08  89.5382  90.0263  89.4405  89.6065  653804
## 2017-02-09  89.7139  89.9738  89.5284  89.5284  548787

print(sap.loc["2017-02-08"])
## open          89.5382
## high          90.0263
## low           89.4405
## close         89.6065
## volume    653804.0000
## Name: 2017-02-08, dtype: float64
Data access: Eurostat 190
Eurostat

population = data.DataReader("tps00001", "eurostat", "1/1/2007",
                             "1/1/2018")
print(population.columns)
## MultiIndex(levels=[[Population on 1 January - total], [Albania,
## Andorra, Armenia, Austria, Azerbaijan, Belarus, Belgium, ...

print(population["Population on 1 January - total", "France"][0:5])
## FREQ             Annual
## TIME_PERIOD
## 2007-01-01   63645065.0
## 2008-01-01   64007193.0
## 2009-01-01   64350226.0
## 2010-01-01   64658856.0
## 2011-01-01   64978721.0

Eurostat Database
Read data from HTML 191
Website used for the example: Econometrics

Beautiful Soup

from bs4 import BeautifulSoup
import requests
url = "www.uni-goettingen.de/de/applied-econometrics/412565.html"
r = requests.get("https://" + url)
d = r.text
soup = BeautifulSoup(d, "lxml")
print(soup.title)
## <title>Applied Econometrics - Georg-August-... ...</title>

A detailed treatment of reading data from HTML is beyond the scope of this course. If you are interested in this way of importing data, you can find detailed information in the Beautiful Soup documentation.
Motivation 192
Bollinger

sap = data.DataReader("SAP", "iex", "1/1/2017", "8/31/2018")
sap.index = pd.to_datetime(sap.index)
boll = sap["close"].rolling(window=20, center=False).mean()
std = sap["close"].rolling(window=20, center=False).std()
upp = boll + std * 2
low = boll - std * 2
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
boll.plot(ax=ax, label="20 days Rolling mean")
upp.plot(ax=ax, label="Upper Band")
low.plot(ax=ax, label="Lower Band")
sap["close"].plot(ax=ax, label="SAP Price")
ax.legend(loc="best")
fig.savefig("out/boll.pdf")
Motivation 193
Figure: SAP price with its 20 days rolling mean, upper band and lower band, plotted against the date (out/boll.pdf).
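The band computation itself does not depend on the data source. Since remote sources may require credentials or may change over time, the following minimal sketch reproduces the rolling-window arithmetic on a synthetic random-walk price series. The series, the seed and the names mid, upper and lower are assumptions for illustration, not SAP data.

```python
import numpy as np
import pandas as pd

# Synthetic "closing prices": a random walk on business days (assumption).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=120, freq="B")
close = pd.Series(100 + rng.normal(0, 1, size=120).cumsum(),
                  index=dates, name="close")

# 20-day rolling mean and standard deviation.
mid = close.rolling(window=20).mean()
std = close.rolling(window=20).std()

# Bands two standard deviations above and below the rolling mean.
upper = mid + 2 * std
lower = mid - 2 * std

bands = pd.DataFrame({"close": close, "mid": mid,
                      "upper": upper, "lower": lower})
print(bands.dropna().head())
```

The first 19 rows of the bands are missing, because a window of 20 observations is not yet available there.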
Chapter 4 194
Visual illustrations

4.1 Matplotlib
4.2 Figures and subplots
4.3 Plot types and styles
4.4 Pandas visualization
Section 4.1 195
Visual illustrations

Matplotlib
matplotlib 196
The package matplotlib is a free software library for Python with the following features:
Image plot, Contour plot, Scatter plot, Polar plot, Line plot, 3-D plot,
Variety of hardcopy formats,
Works in Python scripts, the Python and IPython shell and the Jupyter notebook,
Interactive environments.
matplotlib 197
Usage of matplotlib

matplotlib has a vast number of functions and options, which are hard to remember. But for almost every task there is an example you can take code from. A great source of information is the examples gallery on the matplotlib homepage. Also note the Best practice Quick Start Guide.
Simple plot 198
plt.plot(array): plot the values of a list; by default the X-axis has the range (0, 1, ..., n-1), where n is the number of values.

Import matplotlib and simple example

import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.arange(10))
plt.savefig("out/list.pdf")

Figure: line plot of np.arange(10) (out/list.pdf).
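The default X-values can be inspected on the returned line object. This minimal sketch plots the same data once with the default X-axis and once with explicit X-values. The Agg backend is chosen only so that the example runs without a display; the data are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

y = np.arange(10) ** 2
fig, ax = plt.subplots()

# Only y given: the x-values default to 0, 1, ..., len(y) - 1.
(line_default,) = ax.plot(y)

# x and y given: the x-values are used as passed.
x = np.linspace(0.0, 1.0, 10)
(line_explicit,) = ax.plot(x, y)

print(line_default.get_xdata())
print(line_explicit.get_xdata())
```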
Section 4.2 199
Visual illustrations

Figures and subplots
Figures 200
Plots in matplotlib reside in a Figure object:
plt.figure(figsize) creates a new Figure object with multiple options.
plt.gcf(): reference to the active figure.

Create Figures

fig = plt.figure(figsize=(16, 8))
print(plt.gcf())
## Figure(1600x800)

A Figure object can be considered as an empty window.
The Figure object has a number of options, such as the size or the aspect ratio.
You cannot make a plot in a blank figure. There has to be a subplot in the Figure object.
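As a minimal sketch of these points, the following creates a figure, checks that plt.gcf() returns the same object, and adds the one subplot needed before anything can be drawn. The explicit dpi=100 is an assumption that makes the pixel size 1600x800 independent of local settings; the Agg backend only avoids opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# figsize is given in inches; pixels = inches * dpi.
fig = plt.figure(figsize=(16, 8), dpi=100)

# The newly created figure is the active one.
same = plt.gcf() is fig
print(same)
print(fig.get_size_inches())

# A fresh figure has no subplots yet; add one to be able to plot.
n_before = len(fig.axes)
ax = fig.add_subplot(1, 1, 1)
n_after = len(fig.axes)
print(n_before, n_after)
```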
Saving plots to file 201
plt.savefig("filename"): save the active figure to a file.
Available file formats include:

Filename extension   Description
.png                 Portable Network Graphics
.pdf                 Portable Document Format
.svg                 Scalable Vector Graphics
.jpeg                JPEG File Interchange Format
.jpg                 JPEG File Interchange Format
.ps                  PostScript
.raw                 Raw image format
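Which formats are available depends on the installed backend. As a minimal sketch, the canvas of a figure can be asked for its supported file types, and a figure can be written to an in-memory buffer instead of a file when the format is given explicitly. The buffer and variable names are only illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Dictionary mapping filename extensions to format descriptions.
supported = fig.canvas.get_supported_filetypes()
print(sorted(supported))

# Without a filename extension, the format must be stated explicitly.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
print(len(png_bytes) > 0)
```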
Subplots 202
fig.add_subplot(): adds a subplot to the Figure fig.
Example: fig.add_subplot(2, 2, 1) lays out a 2 x 2 grid of four subplot positions and creates the subplot in the first position.

Adding subplots

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
fig.savefig("out/subplots.pdf")

The Figure object is filled with subplots in which the plots reside.
If you use the plt.plot() command without creating a subplot in advance, matplotlib will create a Figure object and a subplot automatically.
The Figure object and its subplots can be created in one line.
Subplots 203
Figure: 2 x 2 grid of four empty subplots (out/subplots.pdf).
Subplots 204
Filling subplots with content

from numpy.random import randn
ax1.plot([5, 7, 4, 3, 1])
ax2.hist(randn(100), bins=20, color="r")
ax3.scatter(np.arange(30), np.arange(30)*randn(30))
ax4.plot(randn(40), "k--")
fig.savefig("out/content.pdf")

The subplots in one Figure object can be filled with different plot types.
If you use only plt.plot(), matplotlib draws the plot in the last Figure object and last subplot selected.
Subplots 205
Figure: the four subplots filled with a line plot, a red histogram, a scatter plot and a dashed black line plot (out/content.pdf).
Standard creation of plots 206
plt.subplots(nrows, ncols, sharex, sharey): creates a figure and subplots in one line. If sharex or sharey is True, all subplots share the same X- or Y-ticks.

Standard creation

fig, axes = plt.subplots(2, 3, figsize=(16, 8), sharey=True)
axes[1, 1].plot(np.arange(7), color="r")
axes[0, 2].plot(np.arange(10, 0, -1))
fig.savefig("out/standard.pdf")

Figure: 2 x 3 grid of subplots with a shared Y-axis; axes[0, 2] shows the decreasing line and axes[1, 1] the red increasing line (out/standard.pdf).
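For a grid with more than one row and column, plt.subplots() returns the subplots as a two-dimensional NumPy array, which is why they are addressed as axes[row, column]. The following minimal sketch checks the shape of that array and illustrates that with sharey=True a plot in one subplot also sets the Y-limits of the others. The Agg backend is only used to run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3, sharey=True)
print(axes.shape)

# Plot into the top-right subplot only.
axes[0, 2].plot(np.arange(10, 0, -1))

# With a shared Y-axis the empty subplots take over the same limits.
ylim_plotted = axes[0, 2].get_ylim()
ylim_empty = axes[1, 0].get_ylim()
print(ylim_plotted, ylim_empty)

# axes.flat iterates over all subplots, row by row.
n_axes = len(list(axes.flat))
```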
Section 4.3 207
Visual illustrations

Plot types and styles
Plot types 208
ax.scatter(x, y): create a scatter plot of x vs y.
ax.hist(x, bins): create a histogram.
ax.fill_between(x, y, a): create a plot of x vs y and fill the area between a and y.

Types

fig, ax = plt.subplots(1, 3, figsize=(16, 8))
ax[0].hist([1, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 3, 4, 4], bins=5,
           color="yellow")
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax[1].fill_between(x, y, 0, color="green")
ax[2].scatter(x, y)
fig.savefig("out/types.pdf")

A vast number of plot types can be found in the examples gallery.
Plot types 209
Figure: a yellow histogram, a green filled sine curve and a scatter plot of the sine curve, side by side (out/types.pdf).
Adjusting the spacing around subplots 210
plt.subplots_adjust(left, bottom, ..., hspace): set the space around and between the subplots. wspace and hspace control the width and height of the spacing between subplots, given as a fraction of the average subplot width and height, respectively.

Adjust spacing

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i][j].plot(randn(10))
plt.subplots_adjust(wspace=0, hspace=0)
fig.savefig("out/spacing.pdf")
Adjusting the spacing around subplots 211
Essential
concepts
Getting started
Procedural
programming
Object-orientation

Numerical 1.5
programming 1.0
NumPy package
0.5
NumPy array
Linear Algebra 0.0
Data formats and
0.5
handling 1.0
Pandas 1.5
Series
2.0
DataFrame
2.5
Import/Export data
1.5
Visual 1.0
illustrations
Matplotlib
0.5
Figures and subplots 0.0
Plot types and styles 0.5
Pandas visualization
1.0
Applications 1.5
Time series
2.0
Moving window
Financial applications
2.5
0 2 4 6 8 0 2 4 6 8

© 2018 PyEcon.org
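The effect of wspace=0 and hspace=0 can also be checked numerically. The following is a minimal sketch, assuming the non-interactive Agg backend and a seeded NumPy generator in place of randn. It reads the subplot positions back with get_position() and measures the gaps between neighbouring subplots.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator instead of randn
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax in axes.flat:
    ax.plot(rng.standard_normal(10))

fig.subplots_adjust(wspace=0, hspace=0)

# Bounding boxes of the subplots in figure coordinates (0 to 1)
top_left = axes[0, 0].get_position()
top_right = axes[0, 1].get_position()
bottom_left = axes[1, 0].get_position()

gap_x = top_right.x0 - top_left.x1    # horizontal gap between columns
gap_y = top_left.y0 - bottom_left.y1  # vertical gap between rows
print(gap_x, gap_y)
```

With positive values for wspace and hspace, both gaps grow in proportion to the subplot size.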
Colors, markers and line styles

ax.plot(data, linestyle, color, marker): set the data and styles of subplot ax.

Styles
fig, ax = plt.subplots(1, figsize=(15, 6))
ax.plot(randn(10), linestyle="--", color="darkcyan", marker="p")
fig.savefig("out/style.pdf")

[Figure: out/style.pdf, a dashed dark cyan line with pentagon markers]

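matplotlib also accepts a compact format string that combines color, line style and marker in a single argument. A short sketch, with made-up data and the Agg backend as assumptions, comparing it with the keyword form used above:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

y = np.arange(10) ** 2  # made-up data for illustration
fig, ax = plt.subplots()

# Keyword form, as on the slide
(explicit,) = ax.plot(y, linestyle="--", color="g", marker="o")

# Format string: color "g", line style "--", marker "o"
(short,) = ax.plot(y + 10, "g--o")

print(short.get_color(), short.get_linestyle(), short.get_marker())
```

The keyword form is more verbose but also works with named colors such as "darkcyan", which the one-letter format string cannot express.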
Plot colors

[Figure: overview of the available plot colors]

Plot line styles

[Figure: overview of the available line styles]

Plot markers

Marker   Description
"."      point
","      pixel
"o"      circle
"v"      triangle_down
"8"      octagon
"s"      square
"p"      pentagon
"P"      plus (filled)
"*"      star
"h"      hexagon1
"H"      hexagon2
"+"      plus
"x"      x
"X"      x (filled)
"D"      diamond

Ticks and labels

ax.set_xticks(): set the list of X-ticks; analogous for the Y-axis.
ax.set_xlabel(): set the X-label.
ax.set_title(): set the subplot title.

Ticks and labels - default
fig, ax = plt.subplots(1, figsize=(15, 10))
ax.plot(randn(1000).cumsum())
fig.savefig("out/withoutlabls.pdf")

Here a Figure object and a subplot were created and filled with a plot.
By default, matplotlib distributes the ticks evenly along the data range. Individual ticks can be set with ax.set_xticks().
By default, there is no axis label or title.

[Figure: out/withoutlabls.pdf, a simulated random walk with default ticks and no axis labels or title]

Ticks and labels

Set ticks and labels
ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_xlabel("Days", fontsize=20)
ax.set_ylabel("Change", fontsize=20)
ax.set_title("Simulation", fontsize=30)
fig.savefig("out/labels.pdf")

The individual ticks are passed as a list to ax.set_xticks().
The labels and the title can be given an individual size using the argument fontsize.

[Figure: out/labels.pdf, the random walk titled "Simulation", X-axis "Days" with ticks at 0, 250, 500, 750 and 1000, Y-axis "Change"]

Legends

When several plots share one subplot, a legend is needed to tell them apart.
ax.legend(loc): show the legend at location loc.
Some options: "best", "upper right", "center left", ...

Set legend
fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(1000).cumsum(), label="first")
ax.plot(randn(1000).cumsum(), label="second")
ax.plot(randn(1000).cumsum(), label="third")
ax.legend(loc="best", fontsize=20)
fig.savefig("out/legend.pdf")

The legend displays the label and the color of the associated plot.
With the option "best", the legend is placed at the location where it overlaps least with the plots.

[Figure: out/legend.pdf, three simulated random walks with a legend listing "first", "second" and "third"]

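Only plots that carry a label appear in the legend. The sketch below illustrates this with a fourth, unlabeled line; the seeded generator and the Agg backend are assumptions made so that the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, ax = plt.subplots()
for name in ["first", "second", "third"]:
    ax.plot(rng.standard_normal(1000).cumsum(), label=name)

# No label given: this line is drawn but not listed in the legend
ax.plot(rng.standard_normal(1000).cumsum())

legend = ax.legend(loc="upper left")
entries = [text.get_text() for text in legend.get_texts()]
print(entries)
```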
Annotations on a subplot

ax.text(x, y, "text", fontsize): insert text into a subplot.
ax.annotate("text", xy, xytext, arrowprops): insert an arrow with an annotation.

Annotations
ax.text(400, -30, "here", fontsize=50)
ax.annotate("there",
            fontsize=40,
            xy=(0, 0),
            xytext=(400, 8),
            arrowprops=dict(facecolor="black",
                            shrink=0.05))
ax.set_yticks([-40, -30, -20, -10, 0, 10, 20, 30, 40])
fig.savefig("out/arrow.pdf")

With ax.annotate(), the arrow head points at xy and the bottom left corner of the text is placed at xytext.

[Figure: out/arrow.pdf, the three random walks with the text "here" and an arrow labeled "there" pointing at the origin]

Annotations

Annotation Lehman
import pandas as pd
from datetime import datetime
date = datetime(2008, 9, 15)
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
dow = pd.read_csv("data/dji.csv", index_col=0, parse_dates=True)
close = dow["Close"]
close.plot(ax=ax)
ax.annotate("Lehman Bankruptcy",
            fontsize=30,
            xy=(date, close.loc[date] + 400),
            xytext=(date, 22000),
            arrowprops=dict(facecolor="red",
                            shrink=0.03))
ax.set_title("Dow Jones Industrial Average", size=40)
fig.savefig("out/lehman.pdf")

[Figure: out/lehman.pdf, "Dow Jones Industrial Average" closing prices over Date with a red arrow labeled "Lehman Bankruptcy"]

Drawing on a subplot

plt.Rectangle((x, y), width, height, angle): create a rectangle.
plt.Circle((x, y), radius): create a circle.

Drawing
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
ax.set_xticks([0, 1, 2, 3, 4, 5])
ax.set_yticks([0, 1, 2, 3, 4, 5])
rectangle = plt.Rectangle((1.5, 1),
                          width=0.8, height=2,
                          color="red", angle=30)
circ = plt.Circle((3, 3),
                  radius=1, color="blue")
ax.add_patch(rectangle)
ax.add_patch(circ)
fig.savefig("out/draw.pdf")

A list of all available patches can be found here: matplotlib-patches

[Figure: out/draw.pdf, a rotated red rectangle and a blue circle on axes from 0 to 5]

Best practice: Visual illustrations

Step 1
Create a Figure object and subplots

Best practice Step 1
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

Step 2
Plot data using different plot types
An overview of plot types can be found in the examples gallery.

Best practice Step 2
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax.scatter(x, y)

[Figure: scatter plot of the sine wave with default styling]

Step 3
Set colors, markers and line styles

Best practice Step 3
ax.scatter(x, y, color="green", marker="s")

Step 4
Set title, axis labels and ticks

Best practice Step 4
ax.set_title("Sine wave", fontsize=30)
ax.set_xticks([0, 2.5, 5, 7.5, 10])
ax.set_yticks([-1, 0, 1])
ax.set_ylabel("y-value", fontsize=20)
ax.set_xlabel("x-value", fontsize=20)

[Figure: scatter plot titled "Sine wave", X-axis "x-value" with ticks at 0.0, 2.5, 5.0, 7.5 and 10.0, Y-axis "y-value" with ticks at -1, 0 and 1]

Step 5
Set labels

Best practice Step 5
ax.scatter(x, y, color="green", marker="s", label="Sine")

Step 6
Set a legend (if you add another plot to the existing subplot)

Best practice Step 6
ax.plot(np.arange(11)/10, color="blue", linestyle="-",
        label="Linear")
ax.legend(fontsize=20)

Step 7
Save plot to file

Best practice Step 7
fig.savefig("out/sinewave.pdf")

[Figure: out/sinewave.pdf, the "Sine wave" plot with an additional blue line and a legend listing "Linear" and "Sine"]

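The seven steps can be collected into one script. The sketch below is one possible arrangement, not the only one: it calls ax.scatter() only once, with style and label together, whereas the slides repeat the call to show each step in isolation. Writing the PDF to an in-memory buffer and using the Agg backend are assumptions made to keep the example self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Step 1: create a Figure object and a subplot
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

# Steps 2, 3 and 5: plot the data once, with style and label
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax.scatter(x, y, color="green", marker="s", label="Sine")

# Step 4: title, ticks and axis labels
ax.set_title("Sine wave", fontsize=30)
ax.set_xticks([0, 2.5, 5, 7.5, 10])
ax.set_yticks([-1, 0, 1])
ax.set_ylabel("y-value", fontsize=20)
ax.set_xlabel("x-value", fontsize=20)

# Step 6: add a second plot and show the legend
ax.plot(np.arange(11) / 10, color="blue", linestyle="-", label="Linear")
ax.legend(fontsize=20)

# Step 7: save the plot (here into a buffer instead of "out/sinewave.pdf")
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
```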
Section 4.4

Visual illustrations

Pandas visualization

Line plots

DataFrame/Series.plot(): plot a DataFrame or a Series.

Simple line plot
plt.close("all")
p = pd.Series(np.random.rand(10).cumsum(), index=np.arange(0, 1000, 100))
print(p)

## 0      0.888442
## 100    1.549929
## 200    2.258732
## 300    2.485168
## 400    3.156098
## 500    3.373227
## 600    4.102376
## 700    4.307634
## 800    5.019096
## 900    5.687669
## dtype: float64

p.plot()
plt.savefig("out/line.pdf")

[Figure: out/line.pdf, line plot of the cumulative series p against its index]

Line plots

Line plots
df = pd.DataFrame(np.random.randn(10, 3), index=np.arange(10),
                  columns=["a", "b", "c"])
print(df)

##           a         b         c
## 0  0.362041  0.350474 -1.992641
## 1 -0.481396  1.250534 -0.017076
## 2 -1.007017 -0.843875 -1.163215
## 3 -0.043806  0.896435  0.279640
## 4 -0.011092 -0.714289  0.762072
## 5 -1.758891  1.332606 -0.931393
## 6 -0.361416 -1.811150 -0.677346
## 7  0.503350 -0.806999  0.129074
## 8 -0.100652 -0.958269 -1.053158
## 9 -1.747851 -0.064166  0.267087

df.plot(figsize=(15, 12))
plt.savefig("out/line2.pdf")

[Figure: out/line2.pdf, one line per column a, b and c with an automatic legend]

Plotting and pandas

The plot method applied to a DataFrame plots each column as a different line and shows the legend automatically. When plotting DataFrames, several arguments change the style of the plot:

Argument    Description
kind        "line", "bar", etc.
logy        Logarithmic scale on the Y-axis
use_index   If True, use the index for tick labels
rot         Rotation of tick labels
xticks      Values for X-ticks
yticks      Values for Y-ticks
grid        Set grid True or False
xlim        X-axis limits
ylim        Y-axis limits
subplots    Plot each DataFrame column in a new subplot

Table: pandas plot arguments

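A few of the arguments from the table can be tried out as follows. This is a minimal sketch with made-up data; the seeded generator, the chosen limits and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.standard_normal((10, 3)).cumsum(axis=0),
                  index=np.arange(10), columns=["a", "b", "c"])

# One subplot: bar chart with grid, rotated tick labels and fixed Y-limits
ax = df.plot(kind="bar", grid=True, rot=45, ylim=(-10, 10))

# subplots=True draws every column into its own subplot
# and returns one Axes object per column
axes = df.plot(subplots=True, xlim=(0, 9))
print(len(axes))
```

Each call to df.plot() without the ax argument creates a new figure, so the two plots do not interfere with each other.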
Pandas plot

Separated line plots
df.plot(grid=True, rot=45, subplots=True, title="Example",
        figsize=(15, 10))
plt.savefig("out/pandas.pdf")

[Figure: out/pandas.pdf, titled "Example", with the columns a, b and c each in a separate subplot]

Standard creation of plots and pandas

DataFrame.plot(ax=subplot): plot a DataFrame into an existing subplot.

Standard creation
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
guests = np.array([[1334, 456], [1243, 597], [1477, 505],
                   [1502, 404], [854, 512], [682, 0]])
canteen = pd.DataFrame(guests,
                       index=["Mon", "Tue", "Wed",
                              "Thu", "Fri", "Sat"],
                       columns=["Zentral", "Turm"])
print(canteen)

##      Zentral  Turm
## Mon     1334   456
## Tue     1243   597
## Wed     1477   505
## Thu     1502   404
## Fri      854   512
## Sat      682     0

Standard creation of plots and pandas

Bar plot
canteen.plot(ax=ax, kind="bar")
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)
fig.savefig("out/canteen.pdf")

The bar plot is drawn into the subplot ax.
The label and title are set with the usual matplotlib methods, as shown before.

[Figure: out/canteen.pdf, grouped bar plot "Canteen use in Göttingen" for Zentral and Turm, Mon to Sat, Y-axis "guests"]

Bar plot

Bar plot - stacked
# Create a fresh figure so the stacked bars are not drawn over the previous plot
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
canteen.plot(ax=ax, kind="bar", stacked=True)
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)
fig.savefig("out/canteenstacked.pdf")

[Figure: out/canteenstacked.pdf, stacked bar plot "Canteen use in Göttingen" for Zentral and Turm, Mon to Sat, Y-axis "guests"]

Plot financial data

BTC chart
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price", fontsize=20)
ax.set_xlabel("Date", fontsize=20)
BTC = pd.read_csv("data/btc-eur.csv", index_col=0, parse_dates=True)
BTCclose = BTC["Close"]
BTCclose.plot(ax=ax)
ax.set_title("BTC-EUR", fontsize=20)
fig.savefig("out/btc.pdf")

[Figure: out/btc.pdf, "BTC-EUR" closing price over Date, Y-axis "price"]

Plot financial data

Compare - bad illustration
amazon = pd.read_csv("data/amzn.csv", index_col=0,
                     parse_dates=True)["Close"]
siemens = pd.read_csv("data/sie.de.csv", index_col=0,
                      parse_dates=True)["Close"]
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price")
amazon.plot(ax=ax, label="Amazon")
siemens.plot(ax=ax, label="Siemens")
ax.legend(loc="best")
fig.savefig("out/compare.pdf")

In this illustration, the trends of the two stocks can hardly be compared because their price levels differ.
With pandas, each series can be rebased to a common starting value in one line.

[Figure: out/compare.pdf, Amazon and Siemens closing prices in one subplot, X-axis "Date", Y-axis "price"]

Plot financial data

Compare - good illustration
# Rebase each series so that its first observation equals 100
amazon = amazon/amazon.iloc[0] * 100
siemens = siemens/siemens.iloc[0] * 100
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("percentage")
amazon.plot(ax=ax, label="Amazon")
siemens.plot(ax=ax, label="Siemens")
ax.legend(loc="best")
fig.savefig("out/comparenew.pdf")

[Figure: out/comparenew.pdf, Amazon and Siemens rebased to 100 at the first observation, X-axis "Date", Y-axis "percentage"]

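The rebasing used for the comparison can be wrapped in a small helper. The following sketch uses made-up prices; the function name rebase and both series are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Made-up closing prices on very different levels
dates = pd.date_range("2018-01-01", periods=5, freq="D")
expensive = pd.Series([1000.0, 1100.0, 1050.0, 1200.0, 1500.0], index=dates)
cheap = pd.Series([100.0, 105.0, 110.0, 108.0, 120.0], index=dates)


def rebase(series, base=100):
    """Scale a series so that its first observation equals `base`."""
    return series / series.iloc[0] * base


indexed = pd.DataFrame({"expensive": rebase(expensive),
                        "cheap": rebase(cheap)})
print(indexed)
```

After rebasing, both columns start at 100 and every later value can be read as a percentage of the first observation.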
Chapter 5

Applications

5.1 Time series
5.2 Moving window
5.3 Financial applications

Section 5.1

Applications

Time series

Date and time data types

Data types for date and time are included in the Python standard library.

Datetime creation
from datetime import datetime
now = datetime.now()
print(now)

## 2018-10-08 22:53:27.197198

print(now.day)

## 8

print(now.hour)

## 22

A datetime object provides the attributes year, month, day, hour, minute, second and microsecond.

Set datetime

datetime(year, month, day, hour, minute, second): set time and date.

Datetime representation
holiday = datetime(2018, 12, 24, 8, 30)
print(holiday)

## 2018-12-24 08:30:00

exam = datetime(2018, 11, 9, 10)
print("The exam will be on the " + "{:%Y-%m-%d}".format(exam))

## The exam will be on the 2018-11-09

Time difference

timedelta(days, seconds): represents the difference between two datetime objects.

Datetime difference
from datetime import timedelta
delta = exam - now
print(delta)

## 31 days, 11:06:32.802802

print("The exam will take place in " + str(delta.days) + " days.")

## The exam will take place in 31 days.

print(now)

## 2018-10-08 22:53:27.197198

print(now + timedelta(10, 120))

## 2018-10-18 22:55:27.197198

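A timedelta stores its value as days, seconds and microseconds. The sketch below uses fixed, made-up timestamps instead of datetime.now(), so that the result is reproducible, and shows how to read the components and the total duration.

```python
from datetime import datetime, timedelta

# Fixed timestamps (assumptions for illustration) instead of datetime.now()
start = datetime(2018, 10, 8, 22, 0)
exam = datetime(2018, 11, 9, 10, 0)

delta = exam - start
# days and seconds are separate components; total_seconds() combines them
print(delta.days, delta.seconds, delta.total_seconds())

# Keyword arguments make the units of a timedelta explicit
shifted = start + timedelta(days=10, seconds=120)
print(shifted)
```

Note that delta.seconds holds only the remainder below one day, so total_seconds() is the safer choice when the full duration is needed.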
Convert string and datetime

datetime.strftime("format"): convert a datetime object into a string.
datetime.strptime(datestring, "format"): convert a date given as a string into a datetime object.

Convert Datetime
stamp = datetime(2018, 4, 12)
print(stamp)

## 2018-04-12 00:00:00

print("German date format: " + stamp.strftime("%d.%m.%Y"))

## German date format: 12.04.2018

val = "2018-5-5"
d = datetime.strptime(val, "%Y-%m-%d")
print(d)

## 2018-05-05 00:00:00

Convert string and datetime

Converting examples
val = "31.01.2012"
d = datetime.strptime(val, "%d.%m.%Y")
print(d)

## 2012-01-31 00:00:00

print(now.strftime("Today is %A and we are in week %W of the year %Y."))

## Today is Monday and we are in week 41 of the year 2018.

print(now.strftime("%c"))

## Mon 08 Oct 2018 10:53:27 PM

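Both conversions are often combined to translate dates from one notation into another. A minimal sketch with made-up date strings, which also shows that strptime() raises a ValueError when the string does not match the format:

```python
from datetime import datetime

# Made-up dates in German notation
raw = ["31.01.2012", "05.05.2018", "12.04.2018"]

# Parse with strptime, then write back in ISO notation with strftime
parsed = [datetime.strptime(s, "%d.%m.%Y") for s in raw]
iso = [d.strftime("%Y-%m-%d") for d in parsed]
print(iso)

# A string that does not match the format raises ValueError
try:
    datetime.strptime("2018-05-05", "%d.%m.%Y")
    matched = True
except ValueError:
    matched = False
print(matched)
```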
Overview: datetime formats

Type   Description
%Y     4-digit year
%m     2-digit month [01, 12]
%d     2-digit day [01, 31]
%H     Hour (24-hour clock) [00, 23]
%I     Hour (12-hour clock) [01, 12]
%M     2-digit minute [00, 59]
%S     Second [00, 61]
%W     Week number of the year [00, 53]
%F     Shortcut for %Y-%m-%d
%a     Abbreviated weekday name
%A     Full weekday name
%b     Abbreviated month name
%B     Full month name
%c     Full date and time
%x     Locale-appropriate formatted date

Generating date ranges with pandas

pd.date_range(start, end, freq): generate a date range.

Date ranges

import pandas as pd
index = pd.date_range("2018-01-01", now)
print(index[0:2])
print(index[15:16])
index = pd.date_range("2018-01-01", now, freq="M")
print(index[0:2])

## DatetimeIndex(['2018-01-01', '2...ype='datetime64[ns]', freq='D')
## DatetimeIndex(['2018-01-16'], dtype='datetime64[ns]', freq='D')
## DatetimeIndex(['2018-01-31', '2...ype='datetime64[ns]', freq='M')

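A small self-contained sketch of pd.date_range with fixed end points, so the result does not depend on the current date. The names daily, business and weekly are chosen for this example only; the aliases "B" and "W-MON" are standard pandas frequency aliases.

```python
import pandas as pd

# Daily range: calendar days are the default frequency.
daily = pd.date_range("2018-01-01", "2018-01-31")

# Business days only: Saturdays and Sundays are skipped
# (public holidays are not taken into account).
business = pd.date_range("2018-01-01", "2018-01-31", freq="B")

# Instead of an end date, a number of periods can be given.
weekly = pd.date_range("2018-01-01", periods=4, freq="W-MON")

print(len(daily), len(business))
print(weekly)
```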
Overview: time series frequencies

Alias                  Offset type
D                      Day
B                      Business day
H                      Hour
T                      Minute
S                      Second
M                      Month end
BM                     Business month end
Q-JAN, Q-FEB, ...      Quarter end
A-JAN, A-FEB, ...      Year end
AS-JAN, AS-FEB, ...    Year begin
BA-JAN, BA-FEB, ...    Business year end
BAS-JAN, BAS-FEB, ...  Business year begin

Note: these aliases match the pandas version used in this course. pandas 2.2 and later rename several of them, e.g. M to ME, BM to BME, H to h, T to min and S to s.

Resample date ranges

DataFrame.resample("frequency"): resample the frequency of a time series.

Resample date ranges

import numpy as np
start = datetime(2016, 1, 1)
ind = pd.date_range(start, now)
numbers = np.arange((now - start).days + 1)
df = pd.DataFrame(numbers, index=ind)

print(df.head())

##             0
## 2016-01-01  0
## 2016-01-02  1
## 2016-01-03  2
## 2016-01-04  3
## 2016-01-05  4

print(df.resample("3BM").sum().head())

##                 0
## 2016-01-29    406
## 2016-04-29   6734
## 2016-07-29  15015
## 2016-10-31  24205
## 2017-01-31  32246

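To make the effect of resampling easy to verify by hand, here is a minimal sketch on a hypothetical two-week series. The column name "value" and the weekly alias "W" are choices made for this illustration; "W" groups the days into weeks ending on Sunday and labels each group by that Sunday.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the integers 0..13 on 14 consecutive days,
# starting on Monday 2018-01-01 so that both weeks are complete.
ind = pd.date_range("2018-01-01", periods=14, freq="D")
df = pd.DataFrame({"value": np.arange(14)}, index=ind)

# Downsample to weekly frequency with two different aggregations.
weekly_sum = df.resample("W").sum()    # 0+...+6 and 7+...+13
weekly_last = df.resample("W").last()  # last observation of each week

print(weekly_sum)
print(weekly_last)
```

The choice of aggregation matters: sum() suits flow quantities, while last() is the usual choice for prices.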
Section 5.2

Applications: Moving window

Moving window functions

DataFrame.rolling(window): conduct rolling window computations.

Rolling mean

import matplotlib.pyplot as plt
amazon = pd.read_csv("data/amzn.csv", index_col=0,
                     parse_dates=True)["Adj Close"]
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price")
amazon.plot(ax=ax, label="Amazon")
amazon.rolling(window=20).mean().plot(ax=ax, label="Rolling mean")
ax.legend(loc="best")
ax.set_title("Amazon price and rolling mean", fontsize=25)
fig.savefig("out/amzn.pdf")

Rolling functions are: mean(), median(), sum(), var(), std(), min(), max().

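How the window fills up is easiest to see on a tiny example. The following sketch uses a hypothetical five-element series in place of real price data; min_periods is an existing parameter of rolling() that controls how many observations are required before a value is produced.

```python
import pandas as pd

# Hypothetical toy series standing in for a price series.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# With window=3 the first two entries are NaN: the window is not yet full.
roll = prices.rolling(window=3).mean()

# min_periods=1 starts averaging as soon as one observation is available.
roll_partial = prices.rolling(window=3, min_periods=1).mean()

print(roll)
print(roll_partial)
```

This explains why the rolling mean in the Amazon plot starts 19 observations later than the price line.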
Moving window functions

[Figure: Amazon price and rolling mean. Series: Amazon, Rolling mean; x-axis: Date (2017-03 to 2018-03); y-axis: price.]

Moving window functions

Standard deviation

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
pfizer = pd.read_csv("data/pfe.csv", index_col=0,
                     parse_dates=True)["Adj Close"]
pg = pd.read_csv("data/pg.csv", index_col=0,
                 parse_dates=True)["Adj Close"]
# Note: the name "all" shadows the built-in function all().
all = pd.DataFrame(index=amazon.index)
# Assigning the Series directly aligns them on the date index.
all["amazon"] = amazon
all["pfizer"] = pfizer
all["pg"] = pg
all_std = all.rolling(window=20).std()
all_std.plot(ax=ax)
ax.set_title("Standard deviation", fontsize=25)
fig.savefig("out/std.pdf")

Moving window functions

[Figure: Standard deviation. Series: amazon, pfizer, pg; x-axis: Date (2017-05 to 2018-03).]

Moving window functions

Logarithmic standard deviation

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
all_std.plot(ax=ax, logy=True)
ax.set_title("Logarithmic standard deviation", fontsize=25)
fig.savefig("out/std_log.pdf")

Moving window functions

[Figure: Logarithmic standard deviation. Series: amazon, pfizer, pg; x-axis: Date (2017-05 to 2018-03); y-axis: logarithmic scale.]

Exponentially weighted functions

DataFrame.ewm(span): compute exponentially weighted rolling window functions.

Exponentially weighted functions

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
amazon.rolling(window=40).mean().plot(ax=ax, label="Rolling mean")
amazon.ewm(span=40).mean().plot(ax=ax, label="Exp mean",
                                linestyle="--", color="red")
amazon.plot(ax=ax, label="Amazon price")
ax.legend(loc="best")
ax.set_title("Exponentially weighted functions", fontsize=25)
fig.savefig("out/mean.pdf")

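What the span parameter does can be sketched by computing the weights by hand. pandas converts a span s into the smoothing factor alpha = 2 / (s + 1). With adjust=False the mean follows the simple recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t; the default adjust=True, as used on the slide, normalizes the weights over the observations seen so far and therefore differs at the start of the series. The toy data and variable names below are assumptions made for this illustration.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])
span = 3
alpha = 2 / (span + 1)  # smoothing factor implied by the span

# Recursive form, which corresponds to adjust=False in pandas.
manual = [x.iloc[0]]
for value in x.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * value)

ewm_recursive = x.ewm(span=span, adjust=False).mean()
ewm_default = x.ewm(span=span).mean()  # adjust=True (pandas default)

print(manual)
print(ewm_recursive.tolist())
print(ewm_default.tolist())
```

Compared with a rolling mean of the same length, the exponentially weighted mean reacts faster to recent observations because older values receive geometrically decaying weights.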
Exponentially weighted functions

[Figure: Exponentially weighted functions. Series: Rolling mean, Exp mean, Amazon price; x-axis: Date (2017-03 to 2018-03).]

Binary moving window functions

DataFrame.pct_change(): get the percentage change between consecutive rows (daily changes for daily data).

Percentage change

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
returns = all.pct_change()
print(returns.head())

##               amazon    pfizer        pg
## Date
## 2017-02-23       NaN       NaN       NaN
## 2017-02-24 -0.008155  0.005872 -0.000878
## 2017-02-27  0.004023  0.000584 -0.001757
## 2017-02-28 -0.004242 -0.004668  0.001980
## 2017-03-01  0.009514  0.008792  0.006479

returns.plot(ax=ax)
ax.set_title("Returns", fontsize=25)
fig.savefig("out/returns.pdf")

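The percentage change is the simple return p_t / p_{t-1} - 1. A minimal sketch with three hypothetical prices shows that pct_change() agrees with the manual computation via shift():

```python
import pandas as pd

# Hypothetical prices: up 10 percent, then down 10 percent.
prices = pd.Series([100.0, 110.0, 99.0])

returns = prices.pct_change()
manual = prices / prices.shift(1) - 1  # same formula written out

print(returns)
print(manual)
```

The first entry is NaN in both cases because there is no previous price to compare with.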
Binary moving window functions

[Figure: Returns. Series: amazon, pfizer, pg; x-axis: Date (2017-03 to 2018-03).]

Binary moving window functions

DataFrame.rolling().corr(benchmark): compute correlation between two time series.

Correlation

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
DJI = pd.read_csv("data/dji.csv", index_col=0,
                  parse_dates=True)["Adj Close"]
DJI_ret = DJI.pct_change()
corr = returns.rolling(window=20).corr(DJI_ret)
corr.plot(ax=ax)
ax.grid()
ax.set_title("20 days correlation", fontsize=25)
fig.savefig("out/corr.pdf")

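A sketch of the rolling correlation against a benchmark on synthetic data, where the expected result is known in advance. The random benchmark and the column names "same", "mirror" and "noise" are hypothetical and only serve this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
bench = pd.Series(rng.normal(size=50))  # stands in for benchmark returns

frame = pd.DataFrame({
    "same": bench,                            # identical to the benchmark
    "mirror": -bench,                         # perfectly negatively correlated
    "noise": pd.Series(rng.normal(size=50)),  # unrelated series
})

# Each column is correlated with the benchmark over a 10-observation window.
corr = frame.rolling(window=10).corr(bench)

print(corr.tail())
```

The columns "same" and "mirror" stay at +1 and -1, while "noise" fluctuates between these bounds, which illustrates how noisy short-window correlations can be.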
Binary moving window functions

[Figure: 20 days correlation. Series: amazon, pfizer, pg; x-axis: Date (2017-05 to 2018-03).]

Section 5.3

Applications: Financial applications

Cumulative returns

Returns

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ret_index = (1+returns).cumprod()
stocks = ["amazon", "pfizer", "pg"]
# The first row is NaN after pct_change(); set the index base to 1.
# (.iloc avoids chained assignment such as ret_index[i][0] = 1.)
ret_index.iloc[0] = 1
print(ret_index.tail())

##               amazon    pfizer        pg
## Date
## 2018-02-15  1.715298  1.088693  0.932322
## 2018-02-16  1.699961  1.105461  0.934471
## 2018-02-20  1.723031  1.097840  0.920217
## 2018-02-21  1.740128  1.090218  0.907772
## 2018-02-22  1.742968  1.090218  0.914560

ret_index.plot(ax=ax)
ax.set_title("Cumulative returns", fontsize=25)
fig.savefig("out/cumret.pdf")

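The cumulative return index is the price series rescaled to start at 1. A minimal sketch with hypothetical prices makes this relation explicit:

```python
import pandas as pd

# Hypothetical price series.
prices = pd.Series([50.0, 55.0, 44.0, 66.0])

returns = prices.pct_change()
ret_index = (1 + returns).cumprod()
ret_index.iloc[0] = 1.0  # the first return is NaN; the index starts at 1

# Dividing by the first price gives the same index directly.
normalized = prices / prices.iloc[0]

print(ret_index.tolist())
print(normalized.tolist())
```

Working with returns first is still useful in practice, since the same returns feed the volatility and correlation calculations.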
Cumulative returns

[Figure: Cumulative returns. Series: amazon, pfizer, pg; x-axis: Date (2017-03 to 2018-03).]

Cumulative returns

Monthly returns

returns_m = ret_index.resample("BM").last().pct_change()
print(returns_m.head())

##               amazon    pfizer        pg
## Date
## 2017-02-28       NaN       NaN       NaN
## 2017-03-31  0.049110  0.002638 -0.013396
## 2017-04-28  0.043371 -0.008477 -0.020604
## 2017-05-31  0.075276 -0.028124  0.008703
## 2017-06-30 -0.026764  0.028790 -0.010671

Volatility calculation

Volatility

fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
vola = returns.rolling(window=20).std() * np.sqrt(20)
vola.plot(ax=ax)
ax.set_title("Volatility", fontsize=25)
fig.savefig("out/vola.pdf")

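The factor np.sqrt(20) applies the square-root-of-time rule: if daily returns are uncorrelated, the standard deviation over n days is sqrt(n) times the daily standard deviation. The sketch below uses simulated returns; the annualization with 252 trading days is a common convention assumed here, not something taken from the slides.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical daily returns with a standard deviation of about 1 percent.
returns = pd.Series(rng.normal(loc=0.0, scale=0.01, size=500))

window = 20
daily_vol = returns.rolling(window=window).std()  # sample std (ddof=1)
vol_20d = daily_vol * np.sqrt(window)             # scaled to a 20-day horizon
vol_annual = daily_vol * np.sqrt(252)             # assumption: 252 trading days

print(vol_20d.dropna().head())
print(vol_annual.dropna().head())
```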
Volatility calculation

[Figure: Volatility. Series: amazon, pfizer, pg; x-axis: Date (2017-05 to 2018-03).]

Group analysis

DataFrame.describe(): show summary statistics for each column.

Describe

print(all.describe())

##             amazon      pfizer          pg
## count   252.000000  251.000000  252.000000
## mean   1044.521903   33.892665   87.934304
## std     158.041844    1.694680    2.728659
## min     843.200012   30.872143   79.919998
## 25%     953.567474   32.593733   86.241475
## 50%     988.680023   33.147469   87.863598
## 75%    1136.952484   35.331834   90.363035
## max    1485.339966   38.661823   92.988976

Return analysis

Histogram

fig, ax = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
for i in range(3):
    ax[i].set_title(stocks[i])
    returns[stocks[i]].hist(ax=ax[i], bins=50)
fig.savefig("out/return_hist.pdf")

Return analysis

[Figure: Histograms of the daily returns in three panels titled amazon, pfizer and pg, with a shared x-axis ranging from about -0.050 to 0.125.]

Ordinary Least Squares

Using the statsmodels module to estimate regressions:

Series.tolist(): return a list containing the Series values.
sm.OLS(Y, X).fit(): get the OLS fit of the dependent variable Y on the regressors X.

Regression data

import statsmodels.api as sm
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
Y = np.array(amazon.loc["2018-1-1":"2018-1-15"].tolist())
X = np.arange(len(Y))
ax.scatter(x=X, y=Y, marker="o", color="red")
fig.savefig("out/reg_data.pdf")

Ordinary Least Squares

[Figure: Scatter plot of the regression data; x-axis: X = 0, ..., 8; y-axis: price (about 1200 to 1300).]

Ordinary Least Squares

Regression

X_reg = sm.add_constant(X)
res = sm.OLS(Y, X_reg).fit()
b, a = res.params
ax.plot(X, a*X + b)
fig.savefig("out/ols.pdf")

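To see what the fit computes, the same least-squares problem can be sketched with NumPy alone. This is not the statsmodels implementation, only an equivalent point estimate under the assumption of a design matrix with a leading column of ones, which is what sm.add_constant builds by default. The data are hypothetical and lie exactly on a line so the result can be checked by eye.

```python
import numpy as np

# Hypothetical data on the exact line Y = 2 + 3 * X.
X = np.arange(5, dtype=float)
Y = 2.0 + 3.0 * X

# Design matrix: a column of ones (intercept) followed by the regressor.
X_reg = np.column_stack([np.ones_like(X), X])

# Least-squares solution of X_reg @ beta = Y.
beta, *_ = np.linalg.lstsq(X_reg, Y, rcond=None)
b, a = beta  # intercept first, then slope (same order as res.params)

print(f"intercept b = {b:.4f}, slope a = {a:.4f}")
```

statsmodels additionally reports standard errors, t-statistics and goodness-of-fit measures, which is why it is preferred for econometric work.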
Ordinary Least Squares

Summary of the OLS regression. To print it in Python, use print(res.summary()).

Ordinary Least Squares

[Figure: Scatter plot of the regression data with the fitted OLS line; x-axis: X = 0, ..., 8; y-axis: price (about 1200 to 1300).]

Newton-Raphson

The Newton-Raphson method is an algorithm for finding successively better approximations to the roots of real-valued functions.

Let $F: \mathbb{R}^k \to \mathbb{R}^k$ be a continuously differentiable function and $J_F(x_n)$ the Jacobian matrix of $F$. The recursive Newton-Raphson method to find the root of $F$ is given by:

$$x_{n+1} := x_n - J_F(x_n)^{-1} F(x_n)$$

with an initial guess $x_0$.

For $f: \mathbb{R} \to \mathbb{R}$ the process is repeated as

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

Accordingly, we can determine the optimum of the function $f$ by applying the method instead to $f' = df/dx$.

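The multivariate update can be sketched with NumPy. Instead of forming the inverse Jacobian explicitly, the code solves the linear system J_F(x_n) s = F(x_n) and sets x_{n+1} = x_n - s, which is numerically preferable. The example system (a circle of radius 2 intersected with the line x = y) and the helper name newton_system are assumptions made for this illustration.

```python
import numpy as np

def F(v):
    # Hypothetical system: x^2 + y^2 = 4 and x = y.
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x - y])

def J(v):
    # Jacobian matrix of F.
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0, -1.0]])

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(J(x), F(x))  # solves J(x) step = F(x)
        x = x - step
        if np.linalg.norm(F(x)) < tol:
            return x
    raise RuntimeError("no convergence within max_iter iterations")

root = newton_system(F, J, [1.0, 2.0])
print(root)
```

From this starting point the iteration converges to the intersection in the positive quadrant, where both coordinates equal the square root of 2.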
Newton-Raphson

As an illustrative application, we consider the function

$$f(x) = 3x^3 + 3x^2 - 5x, \quad x \in \mathbb{R},$$

which is represented by the blue line in the following diagram. The figure depicts the iterative solution path of the Newton-Raphson method for finding a root, i.e., an $x$ solving $f(x) = 0$, by tangent points and tangents starting from the initial guess $x_0 = -1$.

[Figure: f(x) for x between -1.5 and 1.5 with the tangent points x0, x1, x2, x3 marked on the x-axis.]

Newton-Raphson implementation

The first step involves the definition of the function f(x) and its derivative f'(x) in Python. We also specify a delta function that determines the absolute deviation of the target function from the target value, i.e., 0:

Newton-Raphson requirements

def f(x):
    return 3*x**3 + 3*x**2 - 5*x
def df(x):
    return 9*x**2 + 6*x - 5
def dx(f, x):
    return abs(f(x))

Finally, we implement the Newton-Raphson algorithm as outlined above. In addition, for a better understanding, we plot the solution path using the tangent points for x0, x1, ..., xN. The solution point is colored black. The lines starting with ax.scatter() are therefore not part of the algorithm; they use global variables and are included only for the visual illustration.

Newton-Raphson implementation

Newton-Raphson

def newton_raphson(fun, dfun, x0, e):
    delta = dx(fun, x0)
    while delta > e:
        ax.scatter(x0, f(x0), color="red", s=80)
        x0 = x0 - fun(x0) / dfun(x0)
        delta = dx(fun, x0)
    ax.scatter(x0, f(x0), color="black", s=80)
    return x0
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
x = np.arange(-1.5, 1.7, 0.001)
ax.plot(x, f(x))
ax.grid()
x_root = newton_raphson(f, df, -1, 0.1)
fig.savefig("out/newton_raphson_root.pdf")
print(f"Root at: {x_root:.4f}")

## Root at: 0.8878

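For use outside the plotting context, the algorithm can be sketched without any global variables. The version below is one possible variant, not the course implementation: the name newton_raphson_plain, the iteration cap max_iter and the check for a vanishing derivative are additions made here, since the plain loop never terminates if the iteration fails to converge.

```python
def f(x):
    return 3*x**3 + 3*x**2 - 5*x

def df(x):
    return 9*x**2 + 6*x - 5

def newton_raphson_plain(fun, dfun, x0, tol=1e-10, max_iter=100):
    """Return (root, number of update steps); raise if there is no convergence."""
    x = x0
    for n in range(max_iter):
        fx = fun(x)
        if abs(fx) <= tol:  # same stopping rule as above: |f(x)| small
            return x, n
        slope = dfun(x)
        if slope == 0:
            raise ZeroDivisionError("derivative is zero; choose another x0")
        x = x - fx / slope
    raise RuntimeError("no convergence within max_iter iterations")

root, steps = newton_raphson_plain(f, df, -1.0)
coarse, coarse_steps = newton_raphson_plain(f, df, -1.0, tol=0.1)

print(f"Root at: {root:.6f} after {steps} steps")
print(f"Coarse root at: {coarse:.4f} after {coarse_steps} steps")
```

With the coarse tolerance 0.1 the iteration stops at about 0.8878, as in the plotted example, while the tighter tolerance approaches the exact root (-3 + sqrt(69)) / 6, which is roughly 0.8844.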
Newton-Raphson implementation

[Figure: f(x) for x between -1.5 and 1.5 with the Newton-Raphson solution path; intermediate points in red, the solution point in black.]

Newton-Raphson optimization

With the definition of the second derivative f'', i.e. the derivative of the derivative, we can employ the Newton-Raphson method to obtain an optimum of the target function f(x) numerically. Hence, the previous example needs only minimal modifications:

Newton-Raphson

def ddf(x):
    return 18*x + 6
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
x = np.arange(-1.5, 1.7, 0.001)
ax.plot(x, f(x))
ax.grid()
x_opt = newton_raphson(df, ddf, 1, 0.1)
fig.savefig("out/newton_raphson_optimum.pdf")
print(f"Minimum at: {x_opt:.4f}")

## Minimum at: 0.4886

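A root of f' is only a stationary point; whether it is a minimum or a maximum depends on the sign of f'' there. The following sketch adds that check. The function name newton_stationary, the tolerance and the second starting value -2 are assumptions made for this illustration.

```python
def df(x):
    return 9*x**2 + 6*x - 5

def ddf(x):
    return 18*x + 6

def newton_stationary(dfun, ddfun, x0, tol=1e-10, max_iter=100):
    """Find a root of dfun and classify it by the sign of ddfun."""
    x = x0
    for _ in range(max_iter):
        g = dfun(x)
        if abs(g) <= tol:
            curvature = ddfun(x)
            if curvature > 0:
                kind = "minimum"
            elif curvature < 0:
                kind = "maximum"
            else:
                kind = "undetermined"
            return x, kind
        x = x - g / ddfun(x)
    raise RuntimeError("no convergence within max_iter iterations")

x_min, kind_min = newton_stationary(df, ddf, 1.0)
x_max, kind_max = newton_stationary(df, ddf, -2.0)

print(f"{kind_min} at {x_min:.4f}")
print(f"{kind_max} at {x_max:.4f}")
```

Which stationary point is found depends on the initial guess: starting at 1 leads to the local minimum, starting at -2 to the local maximum.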
Newton-Raphson optimization

[Figure: f(x) for x between -1.5 and 1.5 with the Newton-Raphson solution path towards the minimum.]

The End... but not finally