HISTOGRAM

A histogram is a bar chart or graph showing the frequency of occurrence of each value
of the variable being analyzed. In a histogram, data are plotted as a series of rectangles.
Class intervals are shown on the X-axis and the frequencies on the Y-axis.

The height of each rectangle represents the frequency of the class interval. Each rectangle
adjoins the next so as to give a continuous picture. Such a graph is also called a
staircase or block diagram.
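
As a rough illustration of how the rectangle heights are obtained, the sketch below counts how many values fall into each class interval and prints one bar per class. The income figures, the starting value of 100 and the class width of 50 are hypothetical and chosen only for the example.

```python
def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:  # values outside all classes are ignored
            counts[index] += 1
    return counts


def text_histogram(counts, start, width):
    """Draw one row of '#' marks per class; the row length is the frequency."""
    lines = []
    for i, count in enumerate(counts):
        lower = start + i * width
        lines.append(f"{lower}-{lower + width}: " + "#" * count)
    return "\n".join(lines)


# Hypothetical incomes (in Rs.), invented for this illustration.
incomes = [120, 160, 180, 210, 260, 240]
counts = class_frequencies(incomes, start=100, width=50, n_classes=4)
print(text_histogram(counts, start=100, width=50))
```

Each row of marks plays the role of one rectangle: its class interval is the base and its length is the frequency.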

The rectangles are drawn with a width scale of 1 cm = 50 (income in Rs.) and a height
scale of 1 cm = 10 units of frequency. The set of rectangles so obtained represents the
histogram.

LEAF AND STEM

A stem-and-leaf plot is a method of exploratory data analysis that is used to rank-order and
arrange data into groups.

To construct a stem-and-leaf display, each value is split into two parts: a "stem" and a "leaf".
The stem is the first part of the number, and the leaf is the last part of the number. Although
the values may be split in many different ways, depending on the types of values one is
working with, one usually lets the leaf be the last digit of the value and the stem be all of the
preceding digits.

For example, the value 46 has a stem of 4 and a leaf of 6. For the value 192, the stem is 19
and the leaf is 2. (Occasionally, we will choose to use a stem of 1 and a leaf of 92 instead. This
is a good idea when, e.g., the values are spread out from 100 to 900.)
Example 5: Here are the scores of 32 randomly selected statistics students on an exam on
Unit-I. Construct a stem-and-leaf display for these scores.
65 94 88 51 76 75
91 47 71 48 68 45
92 82 96 82 71 56
92 76 74 98 75 69
Solution: The lowest value is 45 (stem: 4, leaf: 5), and the highest value is 98 (stem: 9,
leaf: 8). This means that our stems will be 4, 5, 6, 7, 8, and 9.
Stem Leaf
4
5
6
7
8
9

Begin with the first value, 83. Its stem is 8 and its leaf is 3. We will place a 3 in the row
containing the stem 8.

Stem Leaf
4
5
6
7
8 3
9

The values for each stem need to be sorted, which produces the final stem-and-leaf
display:
Stem Leaf
4 578
5 16
6 4589
7 114556669
8 2233668
9 0122468
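
The construction can be sketched in a few lines of Python. This is a minimal sketch, assuming two-digit or larger whole numbers with the last digit as the leaf. The 32 scores in the list are read back from the final display above, and the function name stem_and_leaf is only illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using the last digit of each value as the leaf."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 46 -> stem 4, leaf 6
        rows[stem].append(leaf)
    # Sort the stems, and sort the leaves within each stem.
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}


# Scores read back from the final display above (stem 4: 5, 7, 8 and so on).
scores = [45, 47, 48, 51, 56, 64, 65, 68, 69, 71, 71, 74, 75, 75, 76, 76,
          76, 79, 82, 82, 83, 83, 86, 86, 88, 90, 91, 92, 92, 94, 96, 98]

display = stem_and_leaf(scores)
print("Stem Leaf")
for stem, leaves in display.items():
    print(stem, "".join(str(leaf) for leaf in leaves))
```

Note that a stem with no values is simply left out by this sketch; a fuller version would print an empty row for it.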
CANDLE STICK

A candlestick chart is a style of bar chart used primarily to describe price movements of
a security, derivative, or currency over time.

It is a combination of a line chart and a bar chart, in that each bar represents the range of
price movement over a given time interval. It is most often used in technical analysis of
equity and currency price patterns. Candlestick charts appear superficially similar to box
plots, but are unrelated.

Candlesticks are usually composed of the body (black or white) and an upper and a
lower shadow (wick). The area between the open and the close is called the real body;
price excursions above and below the real body are called shadows. The wick
illustrates the highest and lowest traded prices of a security during the time interval
represented. The body illustrates the opening and closing trades. If the security closed
higher than it opened, the body is white or unfilled, with the opening price at the
bottom of the body and the closing price at the top. If the security closed lower than it
opened, the body is black, with the opening price at the top and the closing price at the
bottom. A candlestick need not have either a body or a wick.
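
The anatomy described above can be expressed as a small calculation. The following is a sketch only: the function name describe_candle, the returned field names and the prices are assumptions made for illustration, not part of any charting library.

```python
def describe_candle(open_price, high, low, close):
    """Split one interval's prices into real body, shadows and body colour."""
    body_bottom = min(open_price, close)
    body_top = max(open_price, close)
    if close > open_price:
        colour = "white"   # closed higher than it opened
    elif close < open_price:
        colour = "black"   # closed lower than it opened
    else:
        colour = "none"    # open equals close, so there is no real body
    return {
        "colour": colour,
        "body": (body_bottom, body_top),
        "upper_shadow": high - body_top,     # excursion above the real body
        "lower_shadow": body_bottom - low,   # excursion below the real body
    }


# Hypothetical prices for a single trading interval.
print(describe_candle(open_price=100, high=110, low=95, close=105))
```

A shadow length of zero corresponds to a candlestick without a wick on that side.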

To better highlight price movements, modern candlestick charts (especially those
displayed digitally) often replace the black or white of the candlestick body with
colours such as red (for a lower closing) and blue or green (for a higher closing). In
some East Asian countries such as Taiwan, China, Japan, and South Korea, the colouring
scheme is reversed (red for higher closing, and green/blue for a lower closing).

Create a Candlestick Chart


To create a candlestick chart, enter the following values into columns. Each row describes
a single candlestick marker.
1) Column 0: Enter a label for the X-axis.
2) Column 1: Enter a number specifying the low/minimum value of this marker. This is
the base of the candle's center line.
3) Column 2: Enter a number specifying the opening or initial value of this marker.
This is one vertical border of the candle. If this value is less than the value in Column 3,
the candle will be filled; otherwise, it will be hollow.

BOX PLOT

The box plot (or whisker diagram) is a standardized way of displaying the distribution of data
based on the five-number summary: minimum, first quartile, median, third quartile, and
maximum. In the simplest box plot the central rectangle spans the first quartile to the third
quartile (the interquartile range). A segment inside the rectangle shows the median, and
"whiskers" above and below the box show the locations of the minimum and maximum. A
box plot is also known as a box-and-whisker plot.

These five numbers taken together give a clear look at many features of the unprocessed data.
The two extremes indicate the range spanned by the data, the median indicates the centre, the
two quartiles indicate the edges of the "middle half of the data", and the position of the median
between the quartiles gives a rough indication of skewness or symmetry.
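
The five numbers can be computed with Python's standard library, as in the sketch below. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of statistics.quantiles, and the data are invented for illustration.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # n=4 asks for the three cut points that divide the data into quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


# Hypothetical data set.
values = [7, 1, 9, 3, 5, 2, 8, 4, 6]
minimum, q1, median, q3, maximum = five_number_summary(values)
print("range:", maximum - minimum)
print("interquartile range:", q3 - q1)
print("median:", median)
```

The box of the plot would span q1 to q3, with the whiskers reaching to the minimum and maximum.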

The box plot is a picture of the five-number summary, as shown in figure 5.4:

Figure: Box Plot Displays the Five-number Summary for a Univariate Data
Set, Giving a Quick Impression of the Distribution.

BI-VARIATE ANALYSIS

Bi-variate analysis is concerned with the relationships between pairs of variables (X, Y)
in a data set. The following data analysis situations can be visualized, depending on the
measurement levels of variables and whether there is any distinction between dependent
and independent variables.

Bi-variate analysis is the simultaneous analysis of two variables. It is usually
undertaken to see if one variable, such as gender, is related to another variable, perhaps
attitudes toward male/female equality.

For example, suppose we wanted to find out if gender was related to attitudes toward
equality between men and women, and had measured each variable at the nominal level. The
variables are:
1) Gender: Measured as male or female; and
2) Attitude: Measured simply as "in favour of" or "opposed to" gender equality.

To test the relationship between gender and attitude, a hypothesis can be developed as follows:

Adult females have more favourable views toward gender equality than adult males.

Bi-variate analysis is concerned with the relationship between values of two variables, a
pair of values being known for each of a number of individuals or entities. The two
variables may be conventionally labelled X and Y, and the values for the individuals
(X1, Y1), (X2, Y2), ..., (Xn, Yn). The term data point will be used for the pair of values
of the variables of any one individual.

Three cases may be distinguished:

1) Both variables are numerical;
2) Both variables are categorical; or
3) One variable is numerical and the other is categorical.

Each of these cases will be reviewed.

Bivariate Statistical Techniques


1) Correlation Analysis
2) Linear Regression Analysis
3) Association of Attributes
4) Two-way ANOVA

CROSS TABULATION

"A cross-tabulation is a technique that describes two or more variables simultaneously
and results in tables that reflect the joint distribution of two or more variables that have a
limited number of categories or distinct values".

A cross tabulation displays the joint distribution of two or more variables. It is
usually presented as a contingency table in a matrix format. A contingency table contains
a cell for every combination of categories of the two variables.

Cross-tabulation analysis, also known as contingency table analysis, is most often
used to analyse categorical (nominal measurement scale) data. A cross-tabulation is a
two (or more) dimensional table that records the number (frequency) of respondents
that have the specific characteristics described in the cells of the table. Cross-tabulation
tables provide a wealth of information about the relationship between the variables.

Cross tabulation thus leads to a contingency matrix.
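
A contingency matrix for two categorical variables can be built by counting each combination of categories. The sketch below reuses the gender and attitude variables from the bi-variate example; the eight responses are invented purely for illustration and are not survey results.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Return {row category: {column category: frequency}} for (row, column) pairs."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    # One cell for every combination of categories, including empty ones.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


# Hypothetical (gender, attitude) responses, made up for this example.
responses = [
    ("female", "in favour"), ("female", "in favour"), ("male", "opposed"),
    ("male", "in favour"), ("female", "opposed"), ("male", "opposed"),
    ("female", "in favour"), ("male", "in favour"),
]

table = cross_tabulate(responses)
for gender, row in table.items():
    print(gender, row)
```

Each inner dictionary is one row of the contingency table, and each value is the number of respondents in that cell.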

CORRELATION ANALYSIS
Correlation is the study of the linear relationship between two variables. When there is a
quantitative relationship between two sets of variables, the appropriate statistical tool
for measuring the relationship and expressing it in a precise way is known as correlation.

For example, there is a relationship between the heights and weights of persons, the demand
for and prices of commodities, etc.

Meaning
Correlation analysis is the statistical tool we can use to describe the degree to which one
variable is linearly related to another.

Significance of Measuring Correlation


1) Study Relationship between Variables: Correlation is very useful to economists
to study the relationship between variables, like price and quantity demanded. To
businessmen, it helps to estimate costs, sales, price and other related variables.

2) Measuring Degree of Association and Direction: For example, there exists a
relationship between price, supply and quantity demanded; convenience, amenities, and
service standards are related to customer retention; the yield of a crop is related to the
quantity of fertilizer applied, type of soil, quality of seeds, rainfall and so on. Correlation
analysis helps in measuring the degree of association and direction of such relationships.

3) Verifying and Testing Relation between Variables: The relation between
variables can be verified and tested for significance with the help of correlation
analysis.

4) Compare the Relationship between Variables: The coefficient of correlation is a
relative measure and we can compare the relationship between variables, which are
expressed in different units.

5) Determining Validity and Reliability: Correlations are useful in the areas of
healthcare, such as determining the validity and reliability of clinical measures or
expressing how health problems are related to certain biological or environmental
factors. For example, a correlation coefficient can be used to determine the degree
of inter-observer reliability for two doctors who are assessing a patient's disease.

6) Other
i) Sampling error can also be calculated.
ii) Correlation is the basis for the concept of regression and ratio of variation.
iii) Decision making is greatly facilitated by reducing the range of uncertainty
and hence strengthening the predictions.

Types of Correlation

1) On the Basis of Direction:

i) Positive Correlation: When the values of two variables move in the same
direction, i.e. when an increase in the value of one variable is associated with an increase
in the value of the other variable, and a decrease in the value of one variable is associated
with a decrease in the value of the other variable, correlation is said to be positive. For
example, heights and weights, income and expenditure of a group of individuals, price
and supply of commodities.

Figure: Positively Correlated Data

ii) Negative Correlation: When the values of two variables move in opposite directions,
so that an increase in the value of one variable is associated with a decrease in the value
of the other variable, and a decrease in the value of one variable with an increase in the
value of the other, correlation is said to be negative. For example, when prices increase,
demand goes down. Thus there is a negative correlation between these two variables, i.e.,
price and demand.

Figure: Negatively Correlated Data

2) On the Basis of Ratio of Change: On this basis the correlation is
categorised as:

i) Linear Correlation: The correlation between two variables is said to be linear if,
corresponding to a unit change in the value of one variable, there is a constant change in
the value of the other variable. In the case of linear correlation the relation between the
variables x and y is of the type
y = a + bx

If a = 0, the relation becomes y = bx. In such cases the values of the variables are in constant
ratio.

ii) Non-linear (Curvilinear) Correlation: The correlation between two variables is
said to be non-linear or curvilinear if, corresponding to a unit change in the value of one
variable, the other variable does not change at a constant rate but at a fluctuating rate.

Figure: Linear and Curvilinear (Non-linear) Correlation

3) On the Basis of the Number of Variables: This category includes:


i) Simple Correlation: In simple correlation we study only two variables: say price and
demand.

ii) Multiple Correlation: In multiple correlation we study together the relationship
between three or more factors, like production, rainfall and use of fertilizers.

iii) Partial Correlation: In partial correlation more than two factors are involved,
but the correlation is studied only between two factors while the other factors are held
constant.

For example, if out of three related variables, say, marks in Statistics, marks in
Accountancy, and marks in English, we study the correlation between two variables,
viz., marks in Statistics and marks in Accountancy, after eliminating the effect of the other
variable, i.e., marks in English, it will be a case of partial correlation. On the other hand,
when the relationship between three or more variables is studied at a time, it is a case of
multiple correlation. If the relationship between the volume of profits, the volume of sales,
and the volume of cost of sales is studied at a time, it will be a case of multiple
correlation. In actual practice, there are many variables affecting the target variable, and
a businessman who is aware of multiple correlation can handle the situation more
intelligently, as people generally use only simple correlation.

Methods of Studying Linear Correlation


The following are some methods for calculating the correlation coefficient. The first one is
based on the knowledge of graphs, whereas the others are mathematical or algebraic
methods.

SCATTER PLOTS DIAGRAM

A scatter diagram is a special type of dot chart. Under this method the given data are
plotted on graph paper in the form of dots. For each pair of x and y values we plot a dot
(or point), and thus we obtain as many dots as there are observations. If these
plotted dots (or points) show some trend, either upwards or downwards, then the two
variables (x and y) are said to be correlated; otherwise they are not correlated. If the
trend of points is upwards, moving from the lower left-hand corner to the upper right-hand
corner, then correlation is positive (r > 0, where r is the coefficient of correlation), with
r = +1 only when all the points lie exactly on an upward-sloping straight line. On the other
hand, if the movement is reversed, i.e., the dots move from the upper left-hand corner to the
lower right-hand corner, then correlation is negative (r < 0), with r = -1 only when all the
points lie exactly on a downward-sloping straight line.

Karl Pearson's Coefficient of Correlation


Karl Pearson, a great biometrician and statistician, suggested a mathematical method for
measuring the magnitude of the linear relationship between two variables. Karl Pearson's
method is the most widely used method in practice and is known as the Pearsonian
Coefficient of Correlation. It is denoted by the symbol 'r'. The formula for calculating
Pearsonian r is:

When the deviations of items are taken from the actual mean, we can apply any one of
these methods; but the simplest formula is the third one.
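
As an illustration of the calculation with deviations taken from the actual means, the sketch below uses the standard form r = Σxy / sqrt(Σx² × Σy²), where x and y denote deviations of X and Y from their means. This is offered as one common way of writing the coefficient, not necessarily the numbering used in these notes, and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of X from its mean
    dy = [y - mean_y for y in ys]  # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Hypothetical paired observations.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_r(X, Y), 4))
```

The function assumes both series have the same length and that neither is constant, since a constant series would make the denominator zero.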

Characteristics of Karl Pearson's Coefficient of Correlation


i) It is an ideal measure of correlation and is independent of the units of X and Y.
ii) It is independent of change of origin and scale.
iii) It is based on all the observations.
iv) It varies between -1 and +1:
r = -1, when there is a perfect negative correlation
r = 0, when there is no correlation
r = +1, when there is a perfect positive correlation.
v) It does not tell anything about cause and effect relationship.
vi) It is somewhat difficult to calculate.
vii) It requires some interpretation.

Spearman's Rank Correlation

When the variables under consideration are not capable of quantitative measurement
but can be arranged in serial order (ranks), we find correlation between the ranks of two
series. This happens when we deal with qualitative characteristics such as honesty, beauty,
etc.

This method is called Spearman's Rank Difference Method or Ranking Method, and the
correlation coefficient so obtained is called the Rank Correlation Coefficient and is denoted
by r. This method was developed by Charles Edward Spearman, a British psychologist, in
1904.

For example, if a contest is judged by three judges and the data are available in the form
of ranks given by these judges, we can find out which two of them have similar opinions.
Spearman's rank correlation coefficient is defined as:

r = 1 - 6ΣD² / (N(N² - 1))

where D is the difference between the two ranks of an item and N is the number of pairs
of observations.

Spearman's rank correlation coefficient is also used when the measurements are given for
both the series.
Calculation of Rank Correlation


It consists of the following conditions:
1) Where Ranks are given: When the actual ranks are given, the steps are as follows:
i) Compute the difference of the two ranks (R1 and R2) and denote it by D.
ii) Square D to get D².
iii) Substitute the figures in the formula.
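
The three steps can be sketched as follows, assuming the standard rank-difference formula r = 1 - 6ΣD² / (N(N² - 1)) and no tied ranks. The two rank lists are hypothetical and only show the mechanics.

```python
def spearman_rank_correlation(ranks_1, ranks_2):
    """Rank correlation from two lists of ranks (assumes no tied ranks)."""
    if len(ranks_1) != len(ranks_2):
        raise ValueError("both series must contain the same number of ranks")
    n = len(ranks_1)
    # Steps i) and ii): difference D of each pair of ranks, then D squared.
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    # Step iii): substitute into the formula.
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks of five students in two subjects.
statistics_ranks = [1, 2, 3, 4, 5]
mathematics_ranks = [2, 1, 4, 3, 5]
print(spearman_rank_correlation(statistics_ranks, mathematics_ranks))
```

Identical rankings give a coefficient of +1 and exactly reversed rankings give -1.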

Example: Following are the ranks obtained by 10 students in two subjects, Statistics and
Mathematics. To what extent is the knowledge of the students in the two subjects related?

LINEAR REGRESSION ANALYSIS


From correlation one knows the direction and extent of the relationship between two related
variables, but if we want to know the best estimate of the dependent variable from the
values of the independent variable, this cannot be calculated from correlation. For this
purpose we have to use regression analysis.

For example, if one knows that the yield of rice and rainfall are closely related, then
one may want to know the amount of rain required to achieve a certain production. For this
purpose one will use regression analysis.

Meaning of Regression
"Regression is the measure of the average relationship between two or more variables in
terms of the original units of data."

Regression Coefficients
Let 'b' be the slope of the line of regression of Y on X, also called the coefficient of
regression of Y on X. It represents the increment in the value of the dependent variable Y
corresponding to a unit change in the value of the independent variable X.

Similarly, the coefficient of regression of X on Y indicates the change in the value of
variable X corresponding to a unit change in the value of variable Y.

Two Lines of Regression


The regression equations express the regression lines. As there are two regression lines,
there are two regression equations. The regression equation of X on Y describes the
variation in the values of X for the given changes in Y, and is used for estimating the value
of X for the given value of Y. Similarly, the regression equation of Y on X describes the
variation in the values of Y for the given changes in X, and is used for estimating the
value of Y for the given value of X.
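1) Regression Equation of X on Y: The regression equation of X on Y is expressed as
follows:
X = a + bY

2) Regression Equation of Y on X: The regression equation of Y on X is expressed as
follows:
Y = a + bX

In each equation a and b are constants, and their values generally differ between the two
equations.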
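
To make the two lines concrete, the sketch below fits both equations by the usual least-squares method, in which the slope is the sum of products of deviations divided by the sum of squared deviations of the explanatory variable. The notes do not state a fitting method, so least squares is an assumption here, and the data are hypothetical.

```python
def fit_line(explanatory, dependent):
    """Least-squares constants (a, b) for: dependent = a + b * explanatory."""
    n = len(explanatory)
    mean_e = sum(explanatory) / n
    mean_d = sum(dependent) / n
    sum_products = sum((e - mean_e) * (d - mean_d)
                       for e, d in zip(explanatory, dependent))
    sum_squares = sum((e - mean_e) ** 2 for e in explanatory)
    b = sum_products / sum_squares  # regression coefficient (slope)
    a = mean_d - b * mean_e         # the fitted line passes through both means
    return a, b


# Hypothetical paired observations.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X: Y = a + bX
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y: X = a + bY

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f}Y")
print("estimate of Y when X = 6:", round(a_yx + b_yx * 6, 2))
```

The two fitted lines have different constants, which is why each is used only for estimating its own dependent variable.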
