
DATA SCIENCE CURRICULUM

Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight
weeks doing iterative, project-centered skill acquisition. Over the course of five data science projects, they develop skills across
key aspects of data science, and results from each project are added to the students' portfolios. In the last four weeks, students
build out and complete their individual final projects, culminating in a presentation of their work to representatives from the Metis
Hiring Network.

ONLINE PRE-WORK
Students work through a curated collection of tutorials that cover the basics
so they can hit the ground running. First, they're guided through initial
software setup. Introductory materials then start with productivity at the
command line, using an editor effectively, and becoming familiar with Python
basics. Students reinforce their statistics knowledge through a set of readings
with exercises that start to blend the statistical and computational. Metis
teaching assistants review these preparatory exercises and provide feedback
online.

INSTALLING PACKAGES
COMMAND LINE
CODE EDITOR
PYTHON
STATISTICS

WEEK 1

UNIT ONE
Introduction to the Data Science Toolkit

Students complete an entire bite-sized data science project
from start to finish. They start using Git for version control
and the IPython environment with the pandas and
matplotlib packages to perform exploratory statistical
analyses and visualizations.

Review probability and statistics, including distributions, bootstrapping, hypothesis testing, maximum likelihood estimation, and Bayes' theorem (this review spans the first three weeks)
Use UNIX, Git, and IPython to organize data science project resources
Load and manipulate data with the pandas Python package
Visualize results using the matplotlib Python package
Communicate data science results
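
As an illustration of the bootstrapping topic in the statistics review, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean. It uses only the Python standard library. The function name `bootstrap_mean_ci`, its parameters, and the daily entry counts are assumptions made up for this example; they do not come from the course materials.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Read the interval bounds off the sorted resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical daily entry counts for one turnstile (made-up numbers).
daily_entries = [1180, 1320, 1250, 1410, 1290, 1360, 1230, 1275, 1390, 1305]
low, high = bootstrap_mean_ci(daily_entries)
print(f"sample mean: {statistics.fmean(daily_entries):.1f}")
print(f"95% bootstrap CI for the mean: ({low:.1f}, {high:.1f})")
```

The interval comes from the spread of the resampled means, so it needs no formula for the standard error. In practice the same idea is usually written with numpy or pandas.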

PROJECT 01
CODENAME BENSON
In the first week, students work in small groups using MTA turnstile data to estimate the volume of people on the street, so that (theoretical) nonprofits and companies can deploy street teams efficiently. The students are provided with the data and guided through exploratory data analysis and plotting so they can focus on new tools, brainstorming, and communication.

PROJECT 02
CODENAME LUTHER
For the first pass at machine learning, students dive deep into prediction with regression models. They experience the beauty of flat files, and learn to scrape information from websites using tools like Python Requests, Beautiful Soup, and Selenium. After scraping together some movie box office data, students find and scrape more resources on their own and present their movie industry regression predictions to the class.

PROJECT 03
CODENAME MCNULTY
Students form small groups that each work as an internal data science team at a fictional company in the insurance industry (details are left to the students to determine). Supervised learning algorithms and relational databases have been covered in class. Students work on their own classification models that fit within the overall goals of the company and the team. During McNulty, students perform a deep dive into the visualization package D3 and create their own APIs on the Python Flask microframework to serve data from their databases to their visualizations.

PROJECT 04
CODENAME FLETCHER
The last guided project focuses on unsupervised learning and NLP algorithms, NoSQL databases, and API data collection. Students work individually and have very few constraints for the design of this project.

PROJECT 05
CODENAME KOJAK (AKA PASSION PROJECT)
Students are free to use anything covered in class or to learn something new to answer the questions they want to address. Some students know what will be their passion project at the admissions stage. Others embark on entirely new turf. Every student works intensely and challenges him or herself to create something cool, interesting, useful, or worthwhile.

thisismetis.com

WEEK 2

UNIT TWO: PART 1
Design Process and Web Scraping

In preparation for Project 2, students start to learn one of the
most important tools a data scientist uses: the iterative
design process. They learn tools for web scraping and start
fitting simple models to data. They are also introduced to
cloud computing and work on remote servers.

Use the design process to iteratively explore the possible ways that a problem can be solved
Create and work in a virtual environment on a cloud computing service
Use Python's Requests and Selenium packages to obtain data from web pages
Use Python's Beautiful Soup package to parse the content of a web page to find useful data for subsequent analysis
Use the design process to iterate on the concept for the Unit 2 projects
Complete a primer on web fundamentals including HTML, CSS, and JavaScript

WEEK 3

UNIT TWO: PART 2
Regression and Communicating Results

Students go in-depth on regression using scikit-learn and
matplotlib. Choosing among the analysis methods and
approaches to reporting their results, students finish the
second project and present their findings.

Apply regression modeling with the Python packages scikit-learn and statsmodels
Load, clean, and explore data using the Python packages pandas, numpy, and matplotlib/pyplot
Experience how the design process influences analysis and results
Complete the second project and communicate results to each other
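
The two halves of Unit Two, scraping a page and then fitting a regression to what was scraped, can be sketched in a few lines. This is only an illustration under stated assumptions. It uses the standard library's `html.parser` in place of Beautiful Soup, and a hand-written least-squares fit in place of scikit-learn or statsmodels, so that it runs without extra installs. The HTML fragment, the film figures, and the names `TableCellParser` and `fit_line` are all made up for the example.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up page fragment: budget and gross in millions of dollars.
PAGE = """
<table>
  <tr><th>Title</th><th>Budget</th><th>Gross</th></tr>
  <tr><td>Film A</td><td>10</td><td>25</td></tr>
  <tr><td>Film B</td><td>20</td><td>45</td></tr>
  <tr><td>Film C</td><td>30</td><td>65</td></tr>
  <tr><td>Film D</td><td>40</td><td>85</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(PAGE)
films = [row for row in parser.rows if row]  # the header row has no <td> cells
budgets = [float(row[1]) for row in films]
grosses = [float(row[2]) for row in films]

intercept, slope = fit_line(budgets, grosses)
print(f"gross = {intercept:.1f} + {slope:.1f} * budget")
```

With real pages the fetch would typically go through Requests or Selenium, and the parsing through Beautiful Soup, as the objectives above describe. The structure stays the same: extract rows, convert them to numbers, then fit and report a model.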


WEEK 4

UNIT THREE: PART 1
Databases and Introduction to Machine Learning Concepts

Students cover relational (SQL) databases and more
ways of obtaining, cleaning, and maintaining data. They are
introduced to the concepts of machine learning and
exposed to classification and supervised learning with a few
examples such as logistic regression and KNN. They also
discuss different types of feasibility related to data science
questions and projects.

Use SQL databases to store and organize data
Explore supervised learning techniques including decision trees and random forests
Access stored data using SQL queries in MySQL
Complete a deep applied survey of classification (supervised learning) techniques, such as logistic regression and k-nearest neighbors
Design and evaluate the computational feasibility of a third data project

WEEK 5

UNIT THREE: PART 2
Machine Learning, Supervised Learning Techniques, Naive Bayes Algorithm

Students dig into more details and more algorithms for
supervised learning, including SVM, decision trees, and
random forests; techniques for feature selection and feature
extraction; and concepts and applications for deep
learning. Students choose to apply one or more of these
algorithms as part of this Unit's project.

Connect regression modeling to the broader family of machine learning techniques
Use supervised learning on Project 3; work in groups simulating in-house data science teams
Refine models with feature selection and feature extraction
Evaluate the efficacy and computational feasibility of various ML algorithms in different contexts

WEEK 6

UNIT THREE: PART 3
JavaScript and D3

Students visualize projects using D3, a favorite tool for
flexible and attractive presentations of data and relationships. Since D3 is a JavaScript library, students learn
JavaScript essentials and the incorporation of other JS
libraries (jQuery, Bootstrap, etc.) that make the job much
easier.

Learn the fundamentals of JavaScript
Explore basic principles of good visual design and communication
Use D3 to create interactive visualizations that are functional in any browser
Create novel data visualizations with D3 to illustrate Unit 3 project results in blog post format
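
To make the database and classification material of Unit Three concrete, the sketch below stores a small labeled data set in a relational database, reads it back with a SQL query, and classifies new points with k-nearest neighbors. It rests on several assumptions. It uses Python's built-in `sqlite3` rather than MySQL, and a hand-written k-nearest-neighbors vote rather than scikit-learn. The `customers` table, its columns, and every row in it are invented for illustration.

```python
import math
import sqlite3
from collections import Counter

# In-memory SQLite database with made-up insurance customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (age REAL, claims_last_year REAL, renewed INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (25, 3, 0), (30, 2, 0), (28, 4, 0), (35, 3, 0),
        (52, 0, 1), (47, 1, 1), (60, 0, 1), (55, 1, 1),
    ],
)
conn.commit()

# Pull the training data back out with a SQL query.
rows = conn.execute(
    "SELECT age, claims_last_year, renewed FROM customers"
).fetchall()
conn.close()


def knn_predict(training_rows, point, k=3):
    """Majority vote among the k training rows closest to `point`."""
    by_distance = sorted(
        training_rows,
        key=lambda row: math.dist(row[:2], point),
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(rows, (50, 1)))  # nearest neighbors are all renewals
print(knn_predict(rows, (27, 3)))  # nearest neighbors are all non-renewals
```

The features here are left unscaled to keep the sketch short. With real data, features on different scales would normally be standardized before computing distances, and k would be chosen by validation rather than fixed at 3.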


WEEK 7

UNIT FOUR: PART 1
APIs, Data Collection Methods, NoSQL Storage, Web Apps with Flask

The project for the fourth unit involves text data. Students
round out data acquisition methods with APIs and online
database servers. Students also learn about NoSQL
databases and start using MongoDB.

Use Python to download data from an API
Use NoSQL databases; parse and store unstructured data in MongoDB
Review database selection: non-relational (NoSQL) databases vs. relational (SQL) databases vs. no database (flat files)
Merge disparate data sets to practice data munging
Design and propose initial data collection for Unit 4 project

WEEK 8

UNIT FOUR: PART 2
Natural Language Processing (NLP)

Students analyze the text data collected in the previous
week and learn about NLP algorithms. They dive deeper into
unsupervised learning, covering
K-means, hierarchical clustering, mixture models, and topic
models. They also learn how large amounts of data
are handled, discussing parallel computing and Hadoop
MapReduce. Project 4 results are presented as
lightning talks.

Use Python's Natural Language Toolkit (NLTK) and TextBlob library to perform natural language analyses on text data
Apply deep learning/neural networks, DBSCAN, and dimensionality reduction (with principal component analysis)
Learn algorithms including KD-trees and locality-sensitive hashing
Survey K-means, hierarchical clustering, and other unsupervised learning algorithms, with applications on real data
Reflect on the strengths and weaknesses of each algorithm and its appropriate use
Outline the data science stack and design choices in data engineering for fault-tolerant systems
Set up a Hadoop environment on cloud servers
Use Hadoop via Python bindings to write customized map-reduce jobs from scratch and run them in a Hadoop cloud environment
Discuss Hadoop: history & ecosystem, when & why, hype & reality
Complete Project 4 and present findings to class in lightning talk format
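
The map-reduce jobs mentioned above follow a three-phase pattern: map, shuffle, reduce. The sketch below shows that pattern as a word count over text in plain Python, assuming nothing beyond the standard library. It does not use Hadoop or any Hadoop bindings. The function names and the sample sentences are made up, and everything runs in one process, whereas Hadoop distributes each phase across machines.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        word = word.strip(".,;:!?\"'()")
        if word:
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


# Made-up sample text standing in for collected documents.
documents = [
    "Data science is a team sport.",
    "Good data beats clever algorithms.",
    "Clever teams ask good questions of data.",
]
counts = word_count(documents)
print(sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:3])
```

Because the mapper looks at one line at a time and the reducer at one key at a time, each can be run in parallel on separate chunks of data. That is the property a framework such as Hadoop exploits.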


WEEKS 9-12

UNIT FIVE
Final Project

Students work full time on their Final Projects, which they
have been slowly designing through the first eight weeks.
They also learn more about cloud computing, system architectures, and feasibility evaluations.

Use the design process to isolate an appropriate problem to solve
Evaluate the computational feasibility of the problem
Choose data sources that can be used to address the problem
Design and implement an appropriate computational architecture
Design and implement an appropriate set of analysis steps
Design and develop a data visualization to clearly convey the results of the analysis to a layperson
Assemble final portfolio and present project at Career Day

MORE ABOUT PROJECTS


Data science projects can be divided into useful dimensions. A dimension can
be thought of as a facet along which a decision must be made to specify a
project implementation. The bootcamp considers the dimensions of domain,
design, data, algorithms, tools, and communication. Each Unit covers certain
content from several of these dimensions, which are reinforced in that Unit's project.
The rigor with which we attack the topics covered in the bootcamp allows us to
sleep soundly at night. We feel confident in saying that our graduates haven't
simply learned about the tools that data scientists use. By the time they leave
our classroom, our graduates are data scientists. They are ready to approach
the problem space in their new careers and assemble the suite of tools and
methods to answer insightful questions and communicate comprehensible
results. They are competent, capable, and confident. And they are ready to
work.
