EXPERIMENT NO. 11
OBJECTIVE: Implementation of the binning method of data cleaning.
Binning methods smooth a sorted data value by consulting its “neighborhood,” that is, the values
around it. The sorted values are distributed into a number of “buckets,” or bins.
For example
Bin a: 4, 8, 15
In this example, the data for price are first sorted and then partitioned into equal-frequency
bins of size 3.
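The partitioning step can be sketched in Python as follows. This is a minimal illustration, not the manual's own program: the function name equal_frequency_bins is hypothetical, and the price values other than 4, 8, 15 are made up for the example.

```python
def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    # Slice the sorted list into chunks; the last bin may be shorter
    # when len(values) is not a multiple of bin_size.
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


# Illustrative, unsorted price data (only 4, 8, 15 come from the text).
prices = [15, 21, 4, 24, 8, 21]
bins = equal_frequency_bins(prices, bin_size=3)
print(bins)  # [[4, 8, 15], [21, 21, 24]]
```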
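In smoothing by bin means, each value in a bin is replaced by the mean value of the bin.
The mean of 4, 8, and 15 is 9, so:
Bin a: 9, 9, 9
In smoothing by bin boundaries, each bin value is replaced by the closest boundary value
(the minimum or the maximum of the bin). Here 8 is closer to 4 than to 15, so:
Bin a: 4, 4, 15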
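Both smoothing rules can be tried on Bin a with a short Python sketch. The function names are hypothetical, and the tie rule (a value exactly halfway between the boundaries goes to the lower one) is an assumption, since the text does not specify it.

```python
def smooth_by_means(bin_values):
    """Replace every value in the bin with the bin mean."""
    mean = sum(bin_values) / len(bin_values)
    return [mean] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value with the nearer of the bin's minimum and maximum.

    Ties go to the lower boundary; that tie rule is an assumption.
    """
    low, high = min(bin_values), max(bin_values)
    return [low if v - low <= high - v else high for v in bin_values]


bin_a = [4, 8, 15]
print(smooth_by_means(bin_a))       # [9.0, 9.0, 9.0]
print(smooth_by_boundaries(bin_a))  # [4, 4, 15]
```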
EXPERIMENT NO. 12
OBJECTIVE: Implementation of z-score standardization for data cleaning.
Z-scores are linearly transformed data values having a mean of zero and a standard
deviation of 1. Z-scores are also known as standardized scores: they are scores (or data
values) that have been brought to a common standard, namely a mean of zero and
a standard deviation of 1.
Z-Scores - Standardization
As noted earlier, giving scores a common standard of zero mean and unit
standard deviation facilitates their interpretation. We can do just that by
first subtracting the mean over all scores from each individual score and
then dividing each remainder by the standard deviation over all scores.
These two steps are the same as the following formula:
$$Z_x = \frac{X_i - \bar{X}}{S_x}$$
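The two steps of the formula can be sketched in Python with the standard library. The function name z_scores and the sample data are made up for illustration. The sketch assumes the sample standard deviation (n - 1 denominator), because the text does not say which version is meant.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) is an assumption here;
    # use statistics.pstdev if the population formula is intended.
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


scores = [2, 4, 4, 5, 7]  # made-up data for illustration
z = z_scores(scores)
print([round(v, 2) for v in z])
```

Whatever the input, the standardized values should have a mean of zero and a standard deviation of 1 (up to floating-point error), which is a quick way to check an implementation.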
Example.
A group of 100 people took an IQ test. My score was 5. So is that good or bad? At this
point, there's no way of telling because we don't know what people typically score on this
test. However, if my score of 5 corresponds to a z-score of 0.91, you'll know it was pretty
good: it's roughly a standard deviation higher than the average (which is always zero for
z-scores).
What we see here is that standardizing scores facilitates the interpretation of a single test
score. Let's see how that works.
Our 100 scores have a mean of 3.45 and a standard deviation of 1.70. By
entering these numbers into the formula, we see why a score of 5 corresponds to a z-score
of 0.91:
$$Z_x = \frac{5 - 3.45}{1.70} \approx 0.91$$
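The arithmetic can be checked with a few lines of Python. The helper name z_score is hypothetical; the mean and standard deviation are the values quoted in the example.

```python
def z_score(x, mean, sd):
    """Return the z-score of a single value, given the mean and standard deviation."""
    return (x - mean) / sd


# Values taken from the example: score 5, mean 3.45, standard deviation 1.70.
z = z_score(5, mean=3.45, sd=1.70)
print(round(z, 2))  # 0.91
```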
In a similar vein, the screenshot below shows the z-scores for all distinct values of our first
IQ test added to the data.