Empirical Research Methods in Computer Science: Lecture 1, Part 1 October 12, 2005 Noah Smith

Empirical Research Methods
in Computer Science
Lecture 1, Part 1
October 12, 2005
Noah Smith
http://nlp.cs.jhu.edu/~nasmith/erm
Empiricism
empeiros: experienced
(peira = trial or test)
cf. rationalism
Exploration & Experiment
Exploratory Data Analysis
(lecture
5)
Hypothesis Testing
experiment
confirm
yes/no?
(lectures 1,2)
explore
visualize
summarize
model
Computer What?
Theory
Practice
Algorithms, Computation
Software Engineering,
Application Areas
Systems
OS, Architecture
Who cares?
1.
2.
anyone who wants to do research

anyone who wants to follow research
(i.e., read papers)
3.
4.
anyone who wants to be able to make

smart decisions / draw conclusions
anyone who likes thinking critically
Basic Research Questions
Basic Research Questions

int foo() {
...
}
Why bother?
int foo() {
int foo() {
...{
int foo()
...
int foo()}{
int foo() ...
{}
...
int foo()
... {}
}
...
}
}
Variation Statistics
determinism isnt good
enough any more!
int foo() {
...
}
Statistics, in this Course
Nonparametric tests
Sampling
Later:
Parametric tests (when and why)
Warning
Theory (complexity analysis, etc.)

is important, too!
Many phenomena arent

surprising if you know your math.
Goals
Know how to look for the interesting

experiments
Know how to construct experiments
Know how to analyze the results
Be critical of all claims
Develop an aesthetic for good
empirical work!
Empiricism is FUN!
Especially in computer science!
Basic Course Information
instructors: Noah and David

{n,d}asmith@cs.jhu.edu
Wednesdays 4-5:15 pm
no class Thanksgiving week
homeworks (65%); final exam
(30%)
About Us
Combined 19 years of experience

in CS; 36 years programming
Autodidact empiricists
Research interests in statistical
modeling and machine learning
(Eisner/Yarowsky lab)
NEB 332
Plan
Hypothesis testing, statistics (2)

Case study: runtime (2)
Exploratory data analysis (1)
Parametric testing, modeling (1-2)
Statistical analysis of computer
programs (1)
MO
Come to class.
Send us feedback anytime.
What do you want to know?

Bring us papers.
Empirical Research Methods

in Computer Science
Lecture 1, Part 2
October 12, 2005
David Smith
Terminological Prelude
Populations
Samples
Sampling distributions
Files on my system
Statistics
Population distributions
All possible files. How big?
Functions of data
Size of my files
Models
Parameters
And now for some data
Abnormality
Abnormality
The Bootstrap
Simulates the sampling distribution

Proposed by Efron in 1979
Anticipated by permutation tests,

jackknife, cross-validation
From original sample of size n,

draw B samples of size n with
replacement and calculate the
statistic on each
Sampling Distributions
Bootstrapping the Mean
Whats Going On?
Why is bootstrap distribution

normal?
Remember, this is a mean
Linearity of Expectation
Central Limit Theorem
Closed form standard error for
means
More Heavy Tails
Sampling Still Normal
Bivariate Data
Compression Performance
Bootstrapping Correlation
Error, Confidence, Testing
Standard error from sampling

distribution
Confidence intervals: bounding
error probability (e.g. to 5%)
Hypothesis testing: how likely is a
particular statistic under our
assumptions?
Hypothesis Testing
One-sample
Two-sample
Are these data normal/Poisson/?

Are these two samples from the
same distribution?
Paired-sample
Is this technique better than that?
Your First Assignment
Data compression
Three-way tradeof
Compression
Speed
Loss
Degenerate cases (cat, echo , )

Unknown distribution of input

Empirical Research Methods in Computer Science: Lecture 1, Part 1 October 12, 2005 Noah Smith

Загружено:

Сведения о документе

Исходное описание:

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Empirical Research Methods in Computer Science: Lecture 1, Part 1 October 12, 2005 Noah Smith

Загружено:

Авторское право:

Доступные форматы

Empirical Research Methods

Exploration & Experiment

Exploratory Data Analysis

anyone who wants to do research

anyone who wants to be able to make

Basic Research Questions

Basic Research Questions

Statistics, in this Course

Theory (complexity analysis, etc.)

Many phenomena arent

Know how to look for the interesting

Especially in computer science!

Basic Course Information

instructors: Noah and David

Combined 19 years of experience

Hypothesis testing, statistics (2)

What do you want to know?

Empirical Research Methods

And now for some data

Simulates the sampling distribution

Anticipated by permutation tests,

From original sample of size n,

Bootstrapping the Mean

Whats Going On?

Why is bootstrap distribution

More Heavy Tails

Sampling Still Normal

Error, Confidence, Testing

Standard error from sampling

Are these data normal/Poisson/?

Is this technique better than that?

Your First Assignment

Degenerate cases (cat, echo , )

Вам также может понравиться