HISTOGRAM
A histogram is a bar chart or graph showing the frequency of occurrence of each value
of the variable being analyzed. In a histogram, data are plotted as a series of rectangles.
Class intervals are shown on the X-axis and the frequencies on the Y-axis.
The height of each rectangle represents the frequency of its class interval (when the class
intervals are of unequal width, it is the area of each rectangle that should be proportional
to the frequency). Adjacent rectangles touch one another so as to give a continuous picture.
Such a graph is also called a staircase or block diagram.
The rectangles are drawn using a scale of 1 cm = Rs. 50 of income for the width (the class
interval) and 1 cm = 10 units for the height (the frequency). The set of rectangles so
obtained represents the histogram.
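The counting step behind a histogram can be sketched in Python. This is a minimal illustration, assuming equal class intervals of width 50 that include their lower limit and exclude their upper limit; the function names and the income figures are hypothetical and merely stand in for the income data of the example, which is not reproduced here.

```python
from collections import Counter


def frequency_table(values, start, width):
    """Return (lower, upper, frequency) for equal class intervals [lower, upper).

    Assumes every value is >= start.
    """
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width, counts.get(k, 0)))
    return table


def text_histogram(table):
    """Draw one bar per class; the bar length equals the class frequency."""
    return "\n".join(
        f"{lower:>4}-{upper:<4} | {'#' * freq}" for lower, upper, freq in table
    )


# Hypothetical monthly incomes in Rs. (illustrative values, not from the source)
incomes = [120, 135, 160, 175, 180, 205, 210, 230, 240, 260, 275, 310]
table = frequency_table(incomes, start=100, width=50)
print(text_histogram(table))
```

The text bars run horizontally, so here the frequency is read along the horizontal direction rather than on the Y-axis as in a drawn histogram.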
STEM-AND-LEAF PLOT
A stem-and-leaf plot is a method of exploratory data analysis that is used to rank-order data
and arrange them into groups.
To construct a stem-and-leaf display, each value is split into two parts: a "stem" and a "leaf".
The stem is the first part of the number, and the leaf is the last part of the number. Although
the values may be split in many different ways, depending on the type of values one is
working with, one usually lets the leaf be the last digit of the value and the stem be all of the
preceding digits.
For example, the value 46 has a stem of 4 and a leaf of 6. For the value 192, the stem is 19
and the leaf is 2. (Occasionally, we will choose to use a stem of 1 and a leaf of 92 instead. This
is a good idea when, e.g., the values are spread out from 100 to 900.)
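The splitting rule above amounts to integer division by a power of ten. A minimal Python sketch, assuming non-negative integer values; `split_stem_leaf` and its `leaf_digits` parameter are illustrative names, not from the source:

```python
def split_stem_leaf(value, leaf_digits=1):
    """Split a non-negative integer into (stem, leaf).

    The leaf is the last `leaf_digits` digits; the stem is all preceding digits.
    """
    return divmod(value, 10 ** leaf_digits)


print(split_stem_leaf(46))                  # (4, 6)
print(split_stem_leaf(192))                 # (19, 2)
print(split_stem_leaf(192, leaf_digits=2))  # (1, 92)
```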
Example 5: Here are the scores of 32 randomly selected statistics students on an exam on
Unit-I. Construct a stem-and-leaf display for these scores.
65 94 88 51 76 75
91 47 71 48 68 45
92 82 96 82 71 56
92 76 74 98 75 69
Solution: The lowest value is 45 (stem: 4, leaf: 5) and the highest value is 98 (stem: 9,
leaf: 8), which means that our stems will be 4, 5, 6, 7, 8 and 9.
Stem Leaf
4
5
6
7
8
9
Begin with the first value, 83: its stem is 8 and its leaf is 3. We place a 3 in the row
for stem 8.
Stem Leaf
4
5
6
7
8 3
9
Once every value has been placed in this way, the leaves in each row are sorted, which
produces the final stem-and-leaf display:
Stem Leaf
4 578
5 16
6 4589
7 114556669
8 2233668
9 0122468
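The whole construction (split each value, group the leaves by stem, then sort each row) can be sketched in Python. The 32 scores below are read back from the final display above, so they are not in the original order; because each row's leaves are sorted, the order does not affect the result. The function names are illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem to its sorted leaves, using the last digit as the leaf."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # Keep every stem between the lowest and highest, even if a row is empty.
    return {s: sorted(rows.get(s, [])) for s in range(min(rows), max(rows) + 1)}


def render(display):
    """Format the display as 'stem leaves' rows under a 'Stem Leaf' header."""
    lines = ["Stem Leaf"]
    for stem, leaves in display.items():
        lines.append(f"{stem:>4} {''.join(map(str, leaves))}")
    return "\n".join(lines)


# Scores from Example 5, read back from the final display (32 values)
scores = [45, 47, 48, 51, 56, 64, 65, 68, 69, 71, 71, 74, 75, 75, 76, 76,
          76, 79, 82, 82, 83, 83, 86, 86, 88, 90, 91, 92, 92, 94, 96, 98]
print(render(stem_and_leaf(scores)))
```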
CANDLESTICK CHART
A candlestick chart is a style of bar chart used primarily to describe price movements of
a security, derivative, or currency over time.
It is a combination of a line chart and a bar chart, in that each bar represents the range of
price movement over a given time interval. It is most often used in technical analysis of
equity and currency price patterns. Candlesticks appear superficially similar to box plots,
but are unrelated.
Candlesticks are usually composed of a body (black or white) and an upper and a
lower shadow (wick). The area between the open and the close is called the real body;
price excursions above and below the real body are called shadows. The wicks
illustrate the highest and lowest traded prices of a security during the time interval
represented, and the body illustrates the opening and closing trades. If the security closed
higher than it opened, the body is white or unfilled, with the opening price at the
bottom of the body and the closing price at the top. If the security closed lower than it
opened, the body is black, with the opening price at the top and the closing price at the
bottom. A candlestick need not have either a body or a wick.
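The rules above can be expressed as a small function that splits one period's opening, high, low and closing prices into the parts of a candlestick. This is a sketch; the function name, the dictionary keys and the sample prices are hypothetical.

```python
def describe_candle(opening, high, low, closing):
    """Split one period's prices into the parts of a candlestick."""
    if not low <= min(opening, closing) <= max(opening, closing) <= high:
        raise ValueError("prices must satisfy low <= open, close <= high")
    body_top = max(opening, closing)
    body_bottom = min(opening, closing)
    if closing > opening:
        colour = "white"   # closed higher: open at bottom of body, close at top
    elif closing < opening:
        colour = "black"   # closed lower: open at top of body, close at bottom
    else:
        colour = None      # open == close: no real body
    return {
        "colour": colour,
        "real_body": body_top - body_bottom,
        "upper_shadow": high - body_top,    # excursion above the real body
        "lower_shadow": body_bottom - low,  # excursion below the real body
    }


print(describe_candle(100, 110, 95, 105))
print(describe_candle(105, 108, 99, 100))
```

A period whose open equals its close gets a zero-height body and no colour, and a period whose high and low coincide with the body gets zero-length shadows, matching the remark that a candlestick need not have either a body or a wick.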
BOX PLOT
The box plot (or whisker diagram) is a standardized way of displaying the distribution of data
based on the five-number summary: minimum, first quartile, median, third quartile, and
maximum. In the simplest box plot, the central rectangle spans the first quartile to the third
quartile (the interquartile range). A segment inside the rectangle shows the median, and
"whiskers" above and below the box show the locations of the minimum and maximum. A
box plot is also known as a box-and-whisker plot.
These five numbers taken together give a clear look at many features of the unprocessed data.
The two extremes indicate the range spanned by the data, the median indicates the centre, the
two quartiles indicate the edges of the "middle half of the data", and the position of the median
between the quartiles gives a rough indication of skewness or symmetry.
The box plot is a picture of the five-number summary, as shown in Figure 5.4:
Figure 5.4: Box Plot Displays the Five-Number Summary for a Univariate Data Set, Giving a
Quick Impression of the Distribution.
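The five-number summary behind a box plot can be computed as follows. This sketch takes the quartiles as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd; this is only one of several quartile conventions, and others (for example, the interpolation methods of Python's `statistics.quantiles`) can give slightly different Q1 and Q3. The data are the 32 exam scores of Example 5, read back from its final stem-and-leaf display; the function name is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


scores = [45, 47, 48, 51, 56, 64, 65, 68, 69, 71, 71, 74, 75, 75, 76, 76,
          76, 79, 82, 82, 83, 83, 86, 86, 88, 90, 91, 92, 92, 94, 96, 98]
minimum, q1, med, q3, maximum = five_number_summary(scores)
print(minimum, q1, med, q3, maximum)   # 45 68.5 76.0 87.0 98
print("IQR =", q3 - q1)                # IQR = 18.5
```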
BI-VARIATE ANALYSIS
Bi-variate analysis is concerned with the relationships between pairs of variables (X, Y)
in a data set. Several data analysis situations can be distinguished, depending on the
measurement levels of the variables and on whether there is any distinction between
dependent and independent variables.
For example, suppose we wanted to find out whether gender was related to attitudes toward
equality between men and women, and had measured each variable at the nominal level. The
variables are:
1) Gender: measured as male or female; and
2) Attitude: measured simply as "in favour of" or "opposed to" gender equality.
To test the relationship between gender and attitude, the following hypothesis can be
developed: adult females have more favourable views toward gender equality than adult males.
Bi-variate analysis is concerned with the relationship between the values of two variables, a
pair of values being known for each of a number of individuals or entities. The two
variables may conventionally be labeled X and Y, and the values for the n individuals written
as $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. The term data point will be used for the pair of
values of the variables for any one individual.
CROSS TABULATION
A cross tabulation displays the joint distribution of two or more variables. It is
usually presented as a contingency table in matrix format. A contingency table contains
a cell for every combination of categories of the variables.
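A contingency table can be built by counting each combination of categories. A minimal sketch using Python's `collections.Counter`; the six responses are hypothetical and only illustrate the gender and attitude variables described earlier, and the function names are illustrative.

```python
from collections import Counter


def cross_tab(pairs):
    """Return {row_category: {column_category: count}} with a cell for every combination."""
    pairs = list(pairs)
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def print_table(table):
    """Print the table as a matrix with a row-total column."""
    cols = list(next(iter(table.values())))
    print(f"{'':<8}" + "".join(f"{c:>11}" for c in cols) + f"{'Total':>7}")
    for r, cells in table.items():
        print(f"{r:<8}" + "".join(f"{cells[c]:>11}" for c in cols)
              + f"{sum(cells.values()):>7}")


# Hypothetical survey responses: (gender, attitude toward gender equality)
responses = [("female", "in favour"), ("female", "in favour"), ("female", "opposed"),
             ("male", "in favour"), ("male", "opposed"), ("male", "opposed")]
table = cross_tab(responses)
print_table(table)
```

Combinations that never occur still get a cell with a count of 0, which is what distinguishes a contingency table from a plain list of observed pairs.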
CORRELATION ANALYSIS
Correlation is the study of the linear relationship between two variables. When there is a
quantitative relationship between two sets of variables, the appropriate statistical tool
for measuring that relationship and expressing it in a precise way is known as correlation.
For example, there is a relationship between the heights and weights of persons, and between
the demand for and prices of commodities.
Meaning
Correlation analysis is the statistical tool we can use to describe the degree to which one
variable is linearly related to another.
6) Other
i) The sampling error of the correlation coefficient can also be calculated.
ii) Correlation is the basis for the concept of regression and ratio of variation.
iii) Decision making is greatly facilitated, because correlation narrows the range of
uncertainty and hence makes predictions more reliable.
Types of Correlation
i) Positive Correlation: When the values of two variables move in the same
direction, i.e. when an increase in the value of one variable is associated with an increase
in the value of the other variable, and a decrease in the value of one variable is associated
with a decrease in the value of the other variable, correlation is said to be positive. For
example: heights and weights, income and expenditure of a group of individuals, or the
price and supply of commodities.
ii) Negative Correlation: When the values of two variables move in opposite directions,
so that with an increase in the value of one variable the value of the other variable
decreases, and with a decrease in the value of one variable the value of the other variable
increases, correlation is said to be negative. For example, when prices increase, demand
goes down. Thus there is a negative correlation between these two variables, i.e., price
and demand.
In linear correlation, the relation between the variables can be written as $y = a + bx$.
If $a = 0$, the relation becomes $y = bx$; in such cases the values of the variables are in a
constant ratio.
The following diagrams illustrate the cases of linear and curvilinear (non-linear)
correlation:
For example, if, out of three related variables, say marks in Statistics, marks in
Accountancy and marks in English, we study the correlation between two of them, viz.,
marks in Statistics and marks in Accountancy, after eliminating the effect of the third
variable, i.e., marks in English, it is a case of partial correlation. On the other hand,
when the relationship among three or more variables is studied simultaneously, it is a case
of multiple correlation. For example, studying the relationship between the volume of
profits, the volume of sales and the cost of sales at the same time is a case of multiple
correlation. In actual practice, many variables affect the target variable, and a
businessman who understands multiple correlation can handle such situations more
intelligently, whereas people generally use only simple correlation.
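Both measures can be computed from the three simple correlations $r_{12}$, $r_{13}$ and $r_{23}$ using the standard formulas
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}, \qquad R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$$
The Python sketch below applies them. The correlation values are hypothetical, with 1 = marks in Statistics, 2 = marks in Accountancy and 3 = marks in English; the function names are illustrative.

```python
from math import sqrt


def partial_correlation(r12, r13, r23):
    """r_12.3: correlation of variables 1 and 2 with the linear effect of variable 3 removed."""
    denom = sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("undefined when r13 or r23 equals +1 or -1")
    return (r12 - r13 * r23) / denom


def multiple_correlation(r12, r13, r23):
    """R_1.23: correlation of variable 1 with the best linear combination of variables 2 and 3."""
    if r23 ** 2 == 1:
        raise ValueError("undefined when r23 equals +1 or -1")
    return sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


# Hypothetical simple correlations: 1 = Statistics, 2 = Accountancy, 3 = English
r12, r13, r23 = 0.8, 0.6, 0.5
print(round(partial_correlation(r12, r13, r23), 4))   # 0.7217
print(round(multiple_correlation(r12, r13, r23), 4))  # 0.8327
```

When variable 3 is uncorrelated with both others ($r_{13} = r_{23} = 0$), the partial correlation reduces to the simple correlation $r_{12}$, as one would expect.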
A scatter diagram is a special type of dot chart. Under this method the given data are
plotted on graph paper in the form of dots: for each pair of x and y values we plot a dot
(or point), and thus obtain as many dots as there are observations. If the plotted dots
(or points) show some trend, either upwards or downwards, the two variables (x and y) are
said to be correlated; otherwise they are not correlated. If the trend of the points is
upwards, moving from the lower left-hand corner to the upper right-hand corner, the
correlation is positive (r > 0, where r is the coefficient of correlation). On the other
hand, if the movement is reversed, i.e., the dots move from the upper left-hand corner to the
lower right-hand corner, the correlation is negative (r < 0). The extreme values r = +1 and
r = -1 occur only when all the dots lie exactly on a straight line.
When the deviations of items are taken from the actual mean, we can apply any one of
these methods, but the simplest formula is the third one.
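One standard form of Karl Pearson's coefficient with deviations taken from the actual means is
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
A direct Python transcription follows; the function name and the three small data sets are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the actual means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 pairs)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]   # deviations of x from its mean
    dy = [y - mean_y for y in ys]   # deviations of y from its mean
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either series is constant")
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0  (dots on a rising straight line)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (dots on a falling straight line)
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0  (no linear trend)
```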
When the variables under consideration are not capable of quantitative measurement
but can be arranged in serial order (ranks), we find the correlation between the ranks of the
two series. This happens when we deal with qualitative characteristics such as honesty,
beauty, etc.
This method is called Spearman's Rank Difference Method or Ranking Method, and the
correlation coefficient so obtained is called the Rank Correlation Coefficient and is denoted
by r. This method was developed by Charles Edward Spearman, a British psychologist, in
1904.
For example, if a contest is judged by three judges and the data are available in the form
of the ranks given by these judges, we can find out which two of them have the most similar
opinions.
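Spearman's rank correlation coefficient is defined as
$$r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
where r is the rank correlation coefficient, D is the difference between the two ranks
assigned to each item, and n is the number of pairs of ranks.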
Spearman's rank correlation coefficient can also be used when actual measurements are given
for both series; the measurements are first converted into ranks.
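Spearman's rank-difference formula, $r = 1 - \frac{6\sum D^2}{n(n^2-1)}$ with D the difference between paired ranks, can be sketched in Python as below. The sketch assumes there are no tied values (ties would need average ranks, which it does not handle); the function names and the height and weight figures are hypothetical.

```python
def to_ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


def spearman_r(ranks_x, ranks_y):
    """Rank correlation r = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rankings of equal length (at least 2 items)")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Ranks given directly: two judges in complete agreement, then in complete disagreement
print(spearman_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))   # 1.0
print(spearman_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # -1.0

# Measurements (hypothetical): convert each series to ranks first
heights = [150, 165, 158, 172]
weights = [50, 60, 63, 70]
print(spearman_r(to_ranks(heights), to_ranks(weights)))  # 0.8
```

The direction of ranking (smallest or largest as rank 1) does not matter as long as both series are ranked the same way.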
Example: Following are the ranks obtained by 10 students in two subjects, Statistics and
Mathematics. To what extent is the knowledge of the students in the two subjects related?
For example, if one knows that the yield of rice and rainfall are closely related, then
one may want to know the amount of rainfall required to achieve a certain level of
production. For this purpose one would use regression analysis.
Meaning of Regression
"Regression is the measure of the average relationship between two or more variables in
terms of the original units of data."
Regression Coefficients
Let b be the slope of the line of regression of Y on X, also called the coefficient of
regression of Y on X. It represents the change in the value of the dependent variable Y
corresponding to a unit change in the value of the independent variable X.
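Assuming the usual least-squares fit, the regression coefficient of Y on X is $b_{yx} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2}$, and the line passes through the point of means, so its intercept is $a = \bar{y} - b_{yx}\bar{x}$. A Python sketch, with illustrative data lying exactly on $Y = 3 + 2X$ and a hypothetical function name:

```python
def regression_y_on_x(xs, ys):
    """Least-squares line of regression of Y on X; returns (a, b) for Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 pairs)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary for the slope to be defined")
    b = sxy / sxx              # change in Y per unit change in X
    a = mean_y - b * mean_x    # the line passes through (mean_x, mean_y)
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
a, b = regression_y_on_x(xs, ys)
print(a, b)          # 3.0 2.0
print(a + b * 6)     # predicted Y when X = 6: 15.0
```

Once a and b are known, the fitted line can be used for prediction in the way the rainfall and rice-yield example describes: substitute a value of X to obtain the estimated average value of Y.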