
What is Data Science?

January 24, 2019


Data Science CSCI 1951A
Brown University
Instructor: Ellie Pavlick
HTAs: Wennie Zhang, Maulik Dang, Gurnaaz Kaur
Waitlist

• If you are not registered, make sure you are on the waitlist (link on CAB)

• We have a *little* wiggle room in the enrollment cap

• Indicate any relevant extenuating circumstances; we will try to prioritize fairly
What is Data Science?
Moneyball!

https://en.wikipedia.org/wiki/Moneyball
Obama
Campaign

http://crowdsourcing-class.org/slides/ab-testing.pdf
Google’s
“40 Shades
of Blue”

Why Google has 200m reasons to put engineers over designers. The Guardian.
The Origin of A/B Testing. Nicolai Kramer Jakobsen.
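The campaign and "40 Shades of Blue" examples are A/B tests: users are randomly split between two variants and an outcome such as click-through rate is compared. A minimal sketch of one common way to check whether an observed difference is bigger than chance, assuming a pooled two-proportion z-test; the function name and the click counts are hypothetical and are not figures from either case.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test for H0: rate_a == rate_b.

    Returns (z, two_sided_p_value).
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10,000 users saw each button color.
z, p = two_proportion_z_test(successes_a=200, n_a=10_000, successes_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The pooled standard error is the textbook choice under the null of equal rates; real experimentation platforms may use other tests or sequential corrections.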
Data Science = Magic
Data Science!
The Scientific Method

https://en.wikipedia.org/wiki/Scientific_method
• Data Collection, Sampling, Cleaning and Processing
• Data Analytics, Visualization, Presentation
• Machine Learning, Forecasting, Modeling
What is Data Science?
Data “Science”

https://www.dailydot.com/unclick/state-googled-2017
http://nerdgeeks.co/us-state-words-map
Data “Science”
So many maps!

https://xkcd.com/1845/
Data “Science”
• To be fair…

• Intuition plays a huge role in the scientific method (“make observations” is Step 1).

• Exploratory analysis is necessary; it's okay to not be all rigor all the time.

• But!

• Exploratory analysis (even when it involves the biggest of data) is meant to *form* a hypothesis, not test one.

• Good experimental design and rigorous statistics are essential if we want to make claims about how the world works.
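One simple way to keep "forming" and "testing" apart is to explore on one part of the data and test the single hypothesis you picked on rows you never looked at. A minimal sketch, assuming purely random data (so any "discovery" is noise); the split sizes, seed, and variable names are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
n_rows, n_features = 200, 50
# Pure noise: by construction, no feature is related to the outcome.
features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]

explore = slice(0, n_rows // 2)       # rows used to *form* a hypothesis
confirm = slice(n_rows // 2, n_rows)  # rows held back to *test* it

# Exploration: scan every feature and pick the one that looks most correlated.
explore_rs = [pearson_r(f[explore], outcome[explore]) for f in features]
best = max(range(n_features), key=lambda i: abs(explore_rs[i]))

# Confirmation: test only that single hypothesis, on rows it has never seen.
confirm_r = pearson_r(features[best][confirm], outcome[confirm])
print(f"feature {best}: r = {explore_rs[best]:.2f} on exploration rows, "
      f"{confirm_r:.2f} on held-out rows")
```

Because the data are noise, the held-out correlation will typically land much closer to zero than the one found by scanning; a hypothesis that survives the held-out check on real data is more credible than one that was merely found.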
Data “Science”
“Eyeballing it”

Figure: Facebook posts by age group (panels: 13-18, 19-22, 23-29, 30-65)
Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach.
Schwartz et al. (2013).
Data “Science”
“Eyeballing it”

Frequent topics observed in 17,000 Science articles


Probabilistic Topic Models. Blei (2012).
Data “Science”
“Eyeballing it”

Similarity of words according to a word2vec model


https://www.tensorflow.org/versions/r0.12/how_tos/embedding_viz/
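Word-embedding views like this one place words close together when their vectors are similar, and cosine similarity is a common way to measure that. A minimal sketch with tiny hand-made 3-dimensional vectors; these are hypothetical stand-ins, whereas real word2vec vectors are learned from text and usually have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors for illustration only (not from a trained model).
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def most_similar(word):
    """Return the other word whose vector has the highest cosine similarity."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )

print(most_similar("cat"))  # prints: dog
```

As with the other "eyeballing it" examples, nearby points suggest a relationship worth investigating; they do not test one.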
Data “Science”
Figure: Per capita cheese consumption correlates with the number of people who died by becoming tangled in their bedsheets, 2000-2009 (axes: cheese consumed, 28.5 to 33 lbs; bedsheet tanglings, 200 to 800 deaths). ρ = 0.95. Source: tylervigen.com

https://en.wikipedia.org/wiki/Data_dredging
http://www.tylervigen.com/spurious-correlations
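A correlation of 0.95 over ten yearly points sounds convincing, but if you compare enough unrelated series, some pair will line up by chance; that is data dredging. A minimal sketch, assuming random walks as stand-ins for unrelated yearly statistics (trending series correlate easily) and ten points each to mirror 2000-2009; the seed and series count are arbitrary choices.

```python
import itertools
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def random_walk(rng, length):
    """A trending but meaningless series: cumulative sum of Gaussian steps."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk

rng = random.Random(42)
# 200 unrelated "yearly statistics", 10 points each.
series = [random_walk(rng, 10) for _ in range(200)]

# Dredge: try every pair and keep the strongest correlation found.
best_r, best_pair = max(
    (abs(pearson_r(series[i], series[j])), (i, j))
    for i, j in itertools.combinations(range(len(series)), 2)
)
n_pairs = len(series) * (len(series) - 1) // 2
print(f"{n_pairs} pairs compared; strongest |r| = {best_r:.3f} (series {best_pair})")
```

Nothing connects any two of these series, yet searching thousands of pairs reliably turns up a very strong correlation.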
Data “Science”
Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon:
An argument for multiple comparisons correction
Craig M. Bennett¹, Abigail A. Baird², Michael B. Miller¹, and George L. Wolford³
¹ Psychology Department, University of California Santa Barbara, Santa Barbara, CA; ² Department of Psychology, Vassar College, Poughkeepsie, NY;
³ Department of Psychological & Brain Sciences, Dartmouth College, Hanover, NH

INTRODUCTION
With the extreme dimensionality of functional neuroimaging data comes extreme risk for false positives. Across the 130,000 voxels in a typical fMRI volume the probability of a false positive is almost certain. Correction for multiple comparisons should be completed with these datasets, but is often ignored by investigators. To illustrate the magnitude of the problem we carried out a real experiment that demonstrates the danger of not correcting for chance properly.

METHODS
Subject. One mature Atlantic Salmon (Salmo salar) participated in the fMRI study. The salmon was approximately 18 inches long, weighed 3.8 lbs, and was not alive at the time of scanning.
Task. The task administered to the salmon involved completing an open-ended mentalizing task. The salmon was shown a series of photographs depicting human individuals in social situations with a specified emotional valence. The salmon was asked to determine what emotion the individual in the photo must have been experiencing.
Design. Stimuli were presented in a block design with each photo presented for 10 seconds followed by 12 seconds of rest. A total of 15 photos were displayed. Total scan time was 5.5 minutes.
Preprocessing. Image processing was completed using SPM2. Preprocessing steps for the functional imaging data included a 6-parameter rigid-body affine realignment of the fMRI timeseries, coregistration of the data to a T1-weighted anatomical image, and 8 mm full-width at half-maximum (FWHM) Gaussian smoothing.
Analysis. Voxelwise statistics on the salmon data were calculated through an ordinary least-squares estimation of the general linear model (GLM). Predictors of the hemodynamic response were modeled by a boxcar function convolved with a canonical hemodynamic response. A temporal high pass filter of 128 seconds was included to account for low frequency drift. No autocorrelation correction was applied.
Voxel Selection. Two methods were used for the correction of multiple comparisons in the fMRI results. The first method controlled the overall false discovery rate (FDR) and was based on a method defined by Benjamini and Hochberg (1995). The second method controlled the overall familywise error rate (FWER) through the use of Gaussian random field theory. This was done using algorithms originally devised by Friston et al. (1994).

GLM RESULTS
A t-contrast was used to test for regions with significant BOLD signal change during the photo condition compared to rest. The parameters for this comparison were t(131) > 3.15, p(uncorrected) < 0.001, 3 voxel extent threshold.
Several active voxels were discovered in a cluster located within the salmon's brain cavity (Figure 1, see above). The size of this cluster was 81 mm³ with a cluster-level significance of p = 0.001. Due to the coarse resolution of the echo-planar image acquisition and the relatively small size of the salmon brain, further discrimination between brain regions could not be completed.
Out of a search volume of 8064 voxels a total of 16 voxels were significant.
Identical t-contrasts controlling the false discovery rate (FDR) and familywise error rate (FWER) were completed. These contrasts indicated no active voxels, even at relaxed statistical thresholds (p = 0.25).

VOXELWISE VARIABILITY
To examine the spatial configuration of false positives we completed a variability analysis of the fMRI timeseries. On a voxel-by-voxel basis we calculated the standard deviation of signal values across all 140 volumes.
We observed clustering of highly variable voxels into groups near areas of high voxel signal intensity. Figure 2a shows the mean EPI image for all 140 image volumes. Figure 2b shows the standard deviation values of each voxel. Figure 2c shows thresholded standard deviation values overlaid onto a high-resolution T1-weighted image.
To investigate this effect in greater detail we conducted a Pearson correlation to examine the relationship between the signal in a voxel and its variability. There was a significant positive correlation between the mean voxel value and its variability over time (r = 0.54, p < 0.001). A scatterplot of mean voxel signal intensity against voxel standard deviation is presented to the right.

DISCUSSION
Can we conclude from this data that the salmon is engaging in the perspective-taking task? Certainly not. What we can determine is that random noise in the EPI timeseries may yield spurious results if multiple comparisons are not controlled for. Adaptive methods for controlling the FDR and FWER are excellent options and are widely available in all major fMRI analysis packages. We argue that relying on standard statistical thresholds (p < 0.001) and low minimum cluster sizes (k > 8) is an ineffective control for multiple comparisons. We further argue that the vast majority of fMRI studies should be utilizing multiple comparisons correction as standard practice in the computation of their statistics.

REFERENCES
Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57:289-300.
Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC, and Evans AC (1994). Assessing the significance of focal activations using their spatial extent. Human Brain Mapping, 1:214-220.
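The poster's core arithmetic can be reproduced in a few lines. Assuming independent tests (a simplification: neighboring voxels are correlated), the chance of at least one false positive among m tests at threshold α is 1 − (1 − α)^m, and the expected number of false positives is mα, about 8 for an 8064-voxel search volume at p < 0.001. The sketch below also simulates null p-values and applies two corrections: Benjamini-Hochberg for the FDR, and Bonferroni as a simple FWER control, swapped in here in place of the Gaussian random field method the poster actually used. The seed and the 0.05 correction levels are arbitrary choices.

```python
import random

def bonferroni(p_values, alpha):
    """Indices still significant after Bonferroni correction (controls the FWER)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg(p_values, alpha):
    """Indices significant under the Benjamini-Hochberg step-up procedure (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values.
    return sorted(order[:k_max])

alpha = 0.001
# Independence assumed here; real neighboring voxels are correlated.
print("P(>=1 false positive, 130,000 voxels):", 1 - (1 - alpha) ** 130_000)
m = 8064
print("Expected false positives, 8064 voxels:", m * alpha)

rng = random.Random(0)
null_p = [rng.random() for _ in range(m)]  # every "voxel" is pure noise
print("uncorrected p < 0.001:", sum(p < alpha for p in null_p))
print("Bonferroni (FWER 0.05):", len(bonferroni(null_p, 0.05)))
print("Benjamini-Hochberg (FDR 0.05):", len(benjamini_hochberg(null_p, 0.05)))
```

Bonferroni is the bluntest FWER control; Benjamini-Hochberg always rejects at least as many hypotheses as Bonferroni at the same level, which is why FDR methods are often preferred when many true effects are expected.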

“Data” Science
Roses are red.
Violets are blue.
“Data” Science
• To be fair…

• Not all science is empirical; it's possible to gain insight and make progress via introspection

• E.g. simulations, case studies, motivating/illustrative examples, worst-case vs. average-case runtime

• But!

• Theory is only helpful if it mirrors practice.

• “All models are wrong, but some are useful.” (George E. P. Box)
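The worst-case vs. average-case example can be made concrete. A minimal sketch that counts comparisons made by a textbook quicksort that always uses the first element as pivot (our choice of algorithm for illustration, not one named on the slide). On already-sorted input it hits its worst case of n(n−1)/2 comparisons; on a random permutation of distinct keys the expected count is 2(n+1)H_n − 4n, where H_n is the n-th harmonic number, which is far smaller.

```python
import random

def quicksort_comparisons(values):
    """Quicksort a copy of `values` using the first element as pivot.

    Returns (sorted_list, comparisons), counting one comparison per element
    per partition step, as in the standard textbook analysis.
    """
    comparisons = 0

    def sort(xs):
        nonlocal comparisons
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        comparisons += len(rest)  # every remaining element is compared with the pivot
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return sort(smaller) + [pivot] + sort(larger)

    return sort(list(values)), comparisons

n = 500  # kept small: sorted input recurses n levels deep (Python's default limit is 1000)

# Worst case: already-sorted input, so every partition peels off a single element.
sorted_vals, worst = quicksort_comparisons(range(n))

# Typical case: a uniformly shuffled input.
shuffled = list(range(n))
random.Random(1).shuffle(shuffled)
_, typical = quicksort_comparisons(shuffled)

harmonic = sum(1 / k for k in range(1, n + 1))
expected = 2 * (n + 1) * harmonic - 4 * n  # average over random permutations of distinct keys

print(f"sorted input:   {worst} comparisons (n(n-1)/2 = {n * (n - 1) // 2})")
print(f"shuffled input: {typical} comparisons (expected about {expected:.0f})")
```

The worst-case bound is a true theorem, but whether it "mirrors practice" depends on what inputs actually look like, which is an empirical question.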


“Data” Science
• Problem: Parents run late when
picking kids up from day care

• Sensible Solution: Impose a late fee

https://www.nytimes.com/2005/05/15/books/chapters/freakonomics.html
https://rady.ucsd.edu/faculty/directory/gneezy/pub/docs/fine.pdf
“Data” Science
• He is not lucky to have to pay for the property.
• Did he pay for the property?

• The girl was not lucky to get away alive.
• Did she get away alive?

http://web.stanford.edu/~laurik/presentations/LuckyAtSALTwithNotes.pdf
“Data” Science
• He is not lucky to have to pay for the property.
• Did he pay for the property?

• The girl was not lucky to get away alive.


• Did she get away alive?

http://web.stanford.edu/~laurik/presentations/LuckyAtSALTwithNotes.pdf
“Data” Science
• He is not lucky to have to pay for the property.
• Did he pay for the property?

• The girl was not lucky to get away alive.


• Did she get away alive?

http://web.stanford.edu/~laurik/presentations/LuckyAtSALTwithNotes.pdf
“Data” Science

Observed

http://web.stanford.edu/~laurik/presentations/LuckyAtSALTwithNotes.pdf
Data! Science!
CSCI 1951A

What is Data Science?


[Figure: 2019 calendar (United States)]
• Data Collection/Cleaning
• Probability and Statistics
• Machine Learning
• Advanced Topics/Applications
• Other Topics
Right Here, Right Now.
Databases for Data Scientists:
• Entity-Relationship (ER) Diagrams
• Relational Algebra
• SQL
• [Briefly] Optimization
• [Briefly] NoSQL
Collecting and Cleaning Data:
• Web Crawling
• APIs
• Legal 101
• Cleaning, Normalization, Regular Expressions
• Crowdsourcing
Big Data and Working at Scale
• Massively Parallel Processing (MapReduce, Storm)
• [Briefly] Randomized Data Structures
Intro to Probability
• Random Variables
• Sample Spaces
• Distributions
• Notation
Hypothesis Testing
• Central Limit Theorem
• P-Values
• T-Tests, Chi-Squared Tests
• Regression
• Fixed and Random Effects
Intro to ML
• Feature Representations
• Loss Functions
• Supervised vs. Unsupervised Learning
• Overview of Categorizations of Models
ML for Data Scientists
• Clustering and Nearest Neighbors
• Linear Regression, Logistic Regression, and SVMs
• Estimating Parameters with Gradient Descent
• Using scikit-learn
Troubleshooting ML
• Overfitting and Generalizability
• Regularization
• Feature Selection
Visualization
• Best Practices
• Using D3 and matplotlib
Troubleshooting ML
• Sampling
• Evaluation Metrics
January February March April
S M T W T F S S M T W T F S S M T W T F S S M T W T

1 2 3 4 5 1 2 1 2 1 2 3 4
6 7 8 9 10 11 12 3 4 5 6 7 8 9 3 4 5 6 7 8 9 7 8 9 10 11
2019 (United States)
13 14 15 16 17 18 19 10 11 12 13 14 15 16 10 11 12 13 14 15 16 14 15 16 17 18
20 21 22 23 24 25 26 17 18 19 20 21 22 23 17 18 19 20 21 22 23 21 22 23 24 25
27 28 29 30 31 24 25 26 27 28 24 25 26 27 28 29 30 28 29 30
31
February March April
S M T W T F S S M T W T
How to Lie with Statistics
F S S M T W T F S
May 1 2
June 1 2 1 2
July
3 4 5 6
August
S M T W T F S S M T W T F S S M T W T F S S M T W T
3 4 5 6 7 8 9 3 4 5 6 7 8 9 7 8 9 10 11 12 13
1 2 3 4 1 1 2 3 4 5 6 1
10
5
17
11
6
18
12
7
19
13
8
20
14
9
21
15
10
22
16
11
23
10
2
17
11
3
18
12
4
19
13
5
20
• 14
6
21
p-hacking
15
7
22
16
8
23
14
7
21
15
8
22
16
9
23
17
10
24
18
11
25
19
12
26
20
13
27
4 5 6 7 8
12 13 14 15 16 17 18 9 10 11 12 13 14 15 14 15 16 17 18 19 20 11 12 13 14 15
24 25 26 27 28 24 25 26 27 28 29 30 28 29 30
19 20 21 22 23 24 25 16 17 18 19 20 21 22 21 22 23 24 25 26 27 18 19 20 21 22
31
26 27 28 29 30 31 23 24 25 26 27 28 29 28 29 30 31 25 26 27 28 29
March
30

April Researcher Degrees of
S M T W T F S S M T W T F S
June 1 2 1 2 July
3 4Freedom
5 6 August
3
S 4
M 5 W
T 6 T7 8
F 9
S 7
S M8 9 10
T W 11
T 12
F 13
S S M T W T F S
September 1 1
October
2 173 184 5 6
November
1 2 3
December
10 11 12 13 14 15 16 14 15 16 19 20
S M T W T F S S M T W T F S S M T W T F S S M T W T
2
17 3
18 4 20
19 5 21 6 22
7 8
23 7
21 8
22 9 10
23 24 11
25 12
26 13
27 4 5 6 7 8 9 10
9
1
24
8
2
10
25
9
3
11
26 12
4
27 13
5
10 11 12 13
6
28 14
29
7
15
30
14
14
28
6
15
29
7
1
16
8
2
30 17 18
9 10
3• Issues with Reproducibility
4
19
11
5
20
12
11
3
12
4
13
5
1
8
14
6
15
7
1
16
8
2
17
9
2
9
3 4 5
10 11 12
16
31 17 18 19 20 21 22 21 22 23 24 25 26 27 18 19 20 21 22 23 24
15 16 17 18 19 20 21 13 14 15 16 17 18 19 10 11 12 13 14 15 16 15 16 17 18 19
23 24 25 26 27 28 29 28 29 30 31 25 26 27 28 29 30 31
22 23 24 April
25 26 27 28 20 21 22 23 24 25 26 17 18 19 20 21 22 23 22 23 24 25 26
30
29
S 30
M T W T F S 27 28 29 30 31 24 25 26 27 28 29 30 29 30 31
July August
S
1
M
2 W3 T4
T F
5 S
6 S M T W T F S
7 8 9 10 11
Special Topics/Applications
[Figure: 2019 (United States) calendar]
• NLP, Topic Modeling
• Algorithmic Bias, Ethics
• Recommendation Systems
• Deep Learning
• Causal Inference
Assignments
• Feb 9: SQL and Query Optimization

• Feb 14: Web Crawling and Data Cleaning

• Feb 28: Map Reduce

• Mar 7: Linear Regression

• Mar 21: K-Means

• Apr 8: Visualization

• Apr 18: Topic Modeling

• Apr 25: Deep Learning


Project
• Feb 18: Pre-Proposal Due (10%)

• Mar 8: Check-in 1

• Mar 15: Blog Post 1 (10%)

• Apr 5: Midterm Report (30%)

• Apr 16: Check-in 2

• Apr 19: Blog Post 2 (10%)

• May 3: Posters Due

• May 6-7: Poster Presentation (20%)

• May 10: Blog Post 3 (Final Writeup) (20%)


Grading

• 60% Assignments (7.5% each)

• 30% Final Project

• 10% Attendance/Clickers (must attend 2/3 of classes)
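The weights on the Grading slide, together with the project breakdown on the Project slide, combine into a single course score. Below is a minimal sketch in Python, assuming every component score is a fraction between 0 and 1. The function and dictionary names are hypothetical, and the 2/3 attendance requirement is not modeled: attendance credit is simply passed in.

```python
# Hypothetical sketch of the course weighting; not an official grade calculator.
# Assumption: every score is a fraction in [0, 1].
from math import fsum

NUM_ASSIGNMENTS = 8
ASSIGNMENT_WEIGHT = 0.075   # 8 assignments x 7.5% = 60% of the course
PROJECT_WEIGHT = 0.30       # Final Project
ATTENDANCE_WEIGHT = 0.10    # Attendance/Clickers

# Project deliverables as fractions of the project grade (they sum to 1.0).
PROJECT_PARTS = {
    "pre_proposal": 0.10,
    "blog_post_1": 0.10,
    "midterm_report": 0.30,
    "blog_post_2": 0.10,
    "poster_presentation": 0.20,
    "final_writeup": 0.20,
}


def project_score(parts: dict[str, float]) -> float:
    """Weighted project grade; `parts` maps each deliverable name to a score in [0, 1]."""
    return fsum(PROJECT_PARTS[name] * parts[name] for name in PROJECT_PARTS)


def course_score(assignments: list[float], parts: dict[str, float], attendance: float) -> float:
    """Overall course score in [0, 1]; `attendance` is the attendance/clicker credit earned."""
    if len(assignments) != NUM_ASSIGNMENTS:
        raise ValueError(f"expected {NUM_ASSIGNMENTS} assignment scores")
    return (ASSIGNMENT_WEIGHT * fsum(assignments)
            + PROJECT_WEIGHT * project_score(parts)
            + ATTENDANCE_WEIGHT * attendance)
```

With perfect scores everywhere, the result is 1.0 (up to floating-point rounding). Keeping every score as a fraction puts all the weights on the same scale, so the three top-level weights must sum to 1.0.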
Late Days
• Assignments are due at 11:59 pm on the listed due date

• 5 late days total; maximum of 2 on any single assignment

• 20% penalty for each additional day late

• No late days for Final Project deliverables (incl. intermediate deliverables)
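The late-day rules interact, so one possible reading is sketched below in Python. The slide does not state these assumptions: late days are spent automatically in submission order, the 20% penalty applies to each day late that a late day does not cover, and the penalty is capped at 100%. The function name `apply_late_days` is hypothetical. Final Project deliverables are left out because they get no late days.

```python
# Hypothetical sketch of one reading of the late-day policy (assignments only).
# Assumptions: late days are spent automatically in submission order, and each
# day not covered by a late day costs 20%, capped at 100%.

def apply_late_days(days_late: list[int],
                    total_late_days: int = 5,
                    per_assignment_cap: int = 2,
                    penalty_percent_per_day: int = 20) -> list[tuple[int, int]]:
    """Return (late_days_used, penalty_percent) for each assignment, in order."""
    remaining = total_late_days
    results = []
    for late in days_late:
        if late < 0:
            raise ValueError("days late cannot be negative")
        # Spend as many late days as allowed: no more than the per-assignment
        # cap and no more than what is left in the semester budget.
        used = min(late, per_assignment_cap, remaining)
        remaining -= used
        uncovered = late - used
        penalty = min(100, penalty_percent_per_day * uncovered)
        results.append((used, penalty))
    return results


print(apply_late_days([1, 3, 2, 1]))  # → [(1, 0), (2, 20), (2, 0), (0, 20)]
```

Under this reading, a third day late on one assignment is always penalized, even when late days remain in the budget. Spending late days in a different order would change the penalties, which is why the submission-order assumption matters.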
Collaboration
• Talking to each other is good. Cheating is bad.

• Sign the form so I know you know.


To Do Now

• Get on the waitlist and make your case there. (Please don’t send emails to me directly.)
To Do Now
• Join iClicker: https://ithelp.brown.edu/kb/articles/iclicker-cloud-reef-instructions-for-students

• Make sure you register via Canvas so that grades get synced
To Do Now
• Join the course on Piazza

• Piazza is now opt-out (as opposed to opt-in) for data sharing.

• Decide how you feel about this. Instructions for opting out are on Canvas.
To Do Now

• Office hours start this week! Go say hi to your staff…
Your Phenomenal Staff!
Gurnaaz, Maulik, Wennie, Alex, Shivani, Paarul, Jens, Mounika, Ashish, Yiquan, Fumeng, Anna, Shre, Hyunjoon, Pavlo, Tanvir, David, Zander, Jacob, Erin, Miles, Haomo, Esteban, Iris, Weiqi, Palak
Thank you!
Questions?
