Critic data
• There is too much noise to analyze all the data without filtering.
• Ideas on filtering:
• filter by package: focus on Collections-*, Kernel, System-*, Nautilus packages
• or perform analysis by package and merge the results
• filter by rule severity: focus only on error rules
• filter by rule group: focus only on “Pharo bugs” and “Bugs”
• calculate the trend values for each rule or each package separately, then compare the trends and eliminate outliers
• consider only entities that were changed since the last version
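A minimal Python sketch of the filtering ideas above. The record format is a hypothetical assumption (dicts with `package`, `severity`, and `group` keys), not the actual critic-data schema:

```python
# Hedged sketch: each critic entry is assumed to be a dict with the
# hypothetical keys 'package', 'severity', and 'group'.
def filter_critics(entries,
                   packages=("Collections-", "Kernel", "System-", "Nautilus"),
                   severities=("error",),
                   groups=("Pharo bugs", "Bugs")):
    """Keep only entries from the focus packages, with error severity,
    in the bug-related rule groups."""
    return [e for e in entries
            if any(e["package"].startswith(p) for p in packages)
            and e["severity"] in severities
            and e["group"] in groups]

entries = [
    {"package": "Collections-Sequenceable", "severity": "error", "group": "Bugs"},
    {"package": "Morphic-Core", "severity": "error", "group": "Bugs"},
    {"package": "Kernel", "severity": "warning", "group": "Style"},
]
print(filter_critics(entries))  # only the Collections-Sequenceable entry survives
```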
Linear and Piecewise Linear Regressions
[Motivation figures: trend plots — "What is happening…?", "There is hope…" — which model? Fit one using all versions.]
• Model: 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜀𝑖 , with mean function 𝑓(𝑥) = 𝛽0 + 𝛽1 𝑥
• 𝛽0 : intercept of the line (when 𝑥 = 0, then 𝑦 = 𝛽0 ); 𝛽1 : slope of the line (a one-unit increase in 𝑥 changes 𝑦 by 𝛽1 units)
• 𝜀𝑖 : random component (statistical error) for the 𝑖-th case; it accounts for the fact that the statistical model does not give an exact fit to each and every data point
• 𝜀𝑖 is unobservable, but we assume that E(𝜀𝑖 ) = 0 and Var(𝜀𝑖 ) = 𝜎𝜀² for all 𝑖 = 1, … , 𝑛
• However, we do not assume any particular distribution for 𝜀𝑖
• The population parameters are 𝛽0 , 𝛽1 and 𝜎𝜀² , and we want to estimate them
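To make the assumptions concrete, a small simulation sketch (with illustrative, made-up parameter values) that draws data from 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜀𝑖 and checks that the simulated errors average out near the assumed E(𝜀𝑖 ) = 0:

```python
import random

# Simulate n observations from the simple linear model
# y_i = beta0 + beta1 * x_i + eps_i  (illustrative parameter values).
random.seed(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 10_000
x = [random.uniform(0, 10) for _ in range(n)]
eps = [random.gauss(0, sigma) for _ in range(n)]  # E(eps)=0, Var(eps)=sigma^2
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

mean_eps = sum(eps) / n
print(mean_eps)  # should be near the assumed E(eps_i) = 0
```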
Linear and Piecewise Linear Regressions 10
Motivation Linear Regression Piecewise Linear Regression
• Easy to solve: set the first partial derivatives equal to zero, check the second derivatives…
• 𝛽1ˡˢ = Σ(𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σ(𝑥𝑖 − 𝑥̄)² , 𝛽0ˡˢ = 𝑦̄ − 𝛽1ˡˢ 𝑥̄
• Properties of the residuals:
• Σ𝑒𝑖 = 0, since the least-squares line passes through (𝑥̄ , 𝑦̄)
• Σ𝑥𝑖 𝑒𝑖 = 0 and Σ𝑦̂𝑖 𝑒𝑖 = 0: the residuals are uncorrelated with the independent variable 𝑥𝑖 and the fitted values 𝑦̂𝑖
• The 𝛽ˡˢ are uniquely defined as long as the 𝑥𝑖 ’s are not all identical; in that degenerate case the denominator Σ(𝑥𝑖 − 𝑥̄)² = 0
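The closed-form least-squares estimates and the residual properties can be checked numerically; a sketch on toy data, with no libraries:

```python
# Least-squares fit by the closed-form formulas, then a numerical
# check of the residual properties (toy data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
e = [yi - yh for yi, yh in zip(y, yhat)]

print(abs(sum(e)) < 1e-9)                                  # sum e_i = 0
print(abs(sum(xi * ei for xi, ei in zip(x, e))) < 1e-9)    # sum x_i e_i = 0
print(abs(sum(yh * ei for yh, ei in zip(yhat, e))) < 1e-9) # sum yhat_i e_i = 0
```

All three checks print `True`: the identities hold up to floating-point rounding.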
• Gauss–Markov theorem: least-squares estimates have the smallest variance among all linear unbiased estimates
• Recall:
• Let 𝛽̂ be an estimate for an unknown parameter 𝛽
• The quality of 𝛽̂ is measured via its mean squared error:
• MSE(𝛽̂) ≔ E[(𝛽̂ − 𝛽)²] = (𝛽 − E 𝛽̂)² + Var(𝛽̂) = Bias² + Variance
• Therefore least-squares estimates are popular: if the underlying function 𝑓(𝑥) were truly linear (that is, 𝑦 = 𝛽ᵀ𝑥 + 𝜀), then the least-squares estimates are your best approximation!
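The decomposition MSE = Bias² + Variance can be illustrated by Monte Carlo; a toy setup (illustrative values) using the sample mean as an estimator of a known parameter:

```python
import random

# Monte Carlo illustration of MSE(beta_hat) = Bias^2 + Variance,
# using the sample mean as an estimator of a known true parameter.
random.seed(1)
beta_true = 3.0
reps, n = 2000, 20
estimates = []
for _ in range(reps):
    sample = [random.gauss(beta_true, 1.0) for _ in range(n)]
    estimates.append(sum(sample) / n)      # beta_hat for this replication

m = sum(estimates) / reps                  # empirical E(beta_hat)
mse = sum((b - beta_true) ** 2 for b in estimates) / reps
bias2 = (m - beta_true) ** 2
var = sum((b - m) ** 2 for b in estimates) / reps
print(abs(mse - (bias2 + var)) < 1e-9)     # decomposition holds exactly
```

The sample mean is unbiased, so here nearly all of the MSE comes from the variance term.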
[Figure: at each 𝑥1 , 𝑥2 , 𝑥3 the response has a distribution with mean E(𝑦|𝑥𝑖 ) = 𝛽0 + 𝛽1 𝑥𝑖 (means 𝑚1 , 𝑚2 , 𝑚3 ); the error distribution is the same everywhere, but the mean value changes with 𝑥.]
Test statistics
Tests of significance
1. Test both parameters simultaneously with an 𝑭 test:
𝐻0 ∶ 𝛽0 = 𝛽1 = 0
𝐻1 ∶ at least one of them is not zero
2. Test each parameter separately with a 𝒕 test:
𝐻0 ∶ 𝛽𝑖 = 0
𝐻1 ∶ 𝛽𝑖 ≠ 0
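As a sketch of the 𝑡 test for an individual coefficient (here the slope 𝛽1 , on toy data): 𝑡 = 𝛽̂1 / se(𝛽̂1 ), with se(𝛽̂1 ) = 𝑠/√𝑆𝑥𝑥 and 𝑠² = RSS/(𝑛 − 2). The critical value 2.776 is the two-sided 5% 𝑡 quantile for 𝑛 − 2 = 4 degrees of freedom:

```python
import math

# t statistic for the slope: t = b1 / se(b1),
# se(b1) = s / sqrt(Sxx), s^2 = RSS / (n - 2).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(rss / (n - 2))
t_stat = b1 / (s / math.sqrt(sxx))
print(t_stat > 2.776)  # True: the slope is significantly different from 0
```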
Model validation
The assumptions of the random term (i.e., the errors)
The outliers
Model Evaluation
The goodness-of-fit or quality of the model
How good is the fit?
Two measures:
• Residual standard error
• Coefficient of determination 𝑅2
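Both measures are cheap to compute by hand; a sketch on toy, nearly linear data:

```python
import math

# Residual standard error and coefficient of determination R^2
# for a simple least-squares fit (toy data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
tss = sum((yi - ybar) ** 2 for yi in y)
rse = math.sqrt(rss / (n - 2))   # residual standard error
r2 = 1 - rss / tss               # share of variance explained by the fit
print(r2 > 0.99)                 # True: an almost perfect linear fit
```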
Hypothesis test
The piecewise linear model with breakpoint 𝑐:
𝑦 = 𝛽0 + 𝛽1 𝑥 , if 𝑥 ≤ 𝑐
𝑦 = 𝛽0 − 𝛽2 𝑐 + (𝛽1 + 𝛽2 ) 𝑥 , if 𝑥 > 𝑐
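The two branches can be written as a single "hinge" form 𝛽0 + 𝛽1 𝑥 + 𝛽2 · max(𝑥 − 𝑐, 0), which is continuous at 𝑥 = 𝑐; a quick numerical check with illustrative parameter values:

```python
# Numerical check that the two-branch piecewise model equals the single
# "hinge" form beta0 + beta1*x + beta2*max(x - c, 0) (illustrative values).
beta0, beta1, beta2, c = 1.0, 0.5, 2.0, 4.0

def piecewise(x):
    if x <= c:
        return beta0 + beta1 * x
    return beta0 - beta2 * c + (beta1 + beta2) * x

def hinge(x):
    return beta0 + beta1 * x + beta2 * max(x - c, 0.0)

xs = [i / 10 for i in range(0, 81)]  # grid covering both sides of c
print(all(abs(piecewise(x) - hinge(x)) < 1e-12 for x in xs))  # True
```

This is why the model can be fitted as an ordinary regression of 𝑦 on 𝑥 and (𝑥 − 𝑐)₊ when the breakpoint 𝑐 is known.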