QT II

CODE 358 A

Question 1
(A) Explain the concept of frequency polygons and cumulative frequency curves (ogives), then
discuss the utility of ogives.
Answer
Frequency polygons are a graphical device for understanding the shapes of distributions.
They serve the same purpose as histograms, but are especially helpful in comparing sets of
data. Frequency polygons are also a good choice for displaying cumulative frequency
distributions.
To create a frequency polygon, start just as for histograms, by choosing a class interval. Then
draw an X-axis representing the values of the scores in your data. Mark the middle of each
class interval with a tick mark, and label it with the middle value represented by the class.
Draw the Y-axis to indicate the frequency of each class. Place a point in the middle of each
class interval at the height corresponding to its frequency. Finally, connect the points. You
should include one class interval below the lowest value in your data and one above the
highest value. The graph will then touch the X-axis on both sides.
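The construction steps above can be sketched in Python. This is a minimal illustration that assumes the classes are given as equal-width class boundaries. The function name `frequency_polygon_points` and the sample data are hypothetical and are not taken from the psychology test example.

```python
def frequency_polygon_points(edges, freqs):
    """Return (midpoint, frequency) pairs for a frequency polygon.

    edges: equal-width class boundaries, e.g. [0, 10, 20, 30]
    freqs: frequency of each class (one fewer value than edges)
    An empty class is added below the lowest class and another above the
    highest class, so the polygon touches the X-axis on both sides.
    """
    if len(edges) != len(freqs) + 1:
        raise ValueError("need exactly one more edge than frequencies")
    width = edges[1] - edges[0]
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    points = [(mids[0] - width, 0)]       # empty class below the data
    points += list(zip(mids, freqs))      # one point per class midpoint
    points.append((mids[-1] + width, 0))  # empty class above the data
    return points


# Hypothetical data: three classes of width 10
print(frequency_polygon_points([0, 10, 20, 30], [4, 7, 2]))
# [(-5.0, 0), (5.0, 4), (15.0, 7), (25.0, 2), (35.0, 0)]
```

Joining these points with straight lines gives the polygon.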
A frequency polygon for 642 psychology test scores is shown in Figure 1. The first label on
the X-axis is 35. This represents an interval extending from 29.5 to 39.5. Since the lowest test
score is 46, this interval has a frequency of 0. The point labeled 45 represents the interval
from 39.5 to 49.5. There are three scores in this interval. There are 150 scores in the interval
that surrounds 85.
You can easily discern the shape of the distribution from Figure 1. Most of the scores are
between 65 and 115. It is clear that the distribution is not symmetric inasmuch as good scores
(to the right) trail off more gradually than poor scores (to the left). In the terminology of
Chapter 3 (where we will study shapes of distributions more systematically), the distribution
is skewed.

Figure 1: Frequency polygon for the psychology test scores.


A cumulative frequency polygon for the same test scores is shown in Figure 2. The graph is
the same as before except that the Y value for each point is the number of students in the
corresponding class interval plus all numbers in lower intervals. For example, there are no
scores in the interval labeled "35," three in the interval "45," and 10 in the interval
"55." Therefore the Y value corresponding to "55" is 13. Since 642 students took the test, the
cumulative frequency for the last interval is 642.
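A cumulative frequency is simply a running total of the class frequencies. The sketch below uses only the three frequencies quoted in the text (0, 3 and 10). The remaining frequencies of the 642 scores are not listed, so they are not reproduced here. The function name is hypothetical.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Running totals: each class's frequency plus all lower classes."""
    return list(accumulate(freqs))


# First three intervals quoted in the text: "35" -> 0, "45" -> 3, "55" -> 10
labels = [35, 45, 55]
print(list(zip(labels, cumulative_frequencies([0, 3, 10]))))
# [(35, 0), (45, 3), (55, 13)]
```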

Figure 2: Cumulative frequency polygon for the psychology test scores.

Frequency polygons are useful for comparing distributions. This is achieved by overlaying
the frequency polygons drawn for different data sets. Figure 3 provides an example. The data
come from a task in which the goal is to move a computer mouse to a target on the screen as
fast as possible. On 20 of the trials, the target was a small rectangle; on the other 20, the
target was a large rectangle. Time to reach the target was recorded on each trial. The two
distributions (one for each target) are plotted together in Figure 3. The figure shows that
although there is some overlap in times, it generally took longer to move the mouse to the
small target than to the large one.

Figure 3: Overlaid frequency polygons.


It is also possible to plot two cumulative frequency distributions in the same graph. This is
illustrated in Figure 4 using the same data from the mouse task. The difference in
distributions for the two targets is again evident.


1b)
The following table gives the average monthly earnings of the workers in a factory:

Monthly earnings (Rs)   No. of workers
800-850                 21
850-900                 29
900-950                 19
950-1000                39
1000-1050               43
1050-1100               94
1100-1150               73
1150-1200               68
1200-1250               36
1250-1300               45
1300-1350               27
1350-1400               48
1450-1500               21
1500-1550               5

Draw ogives by the "less than" and "more than" methods for the data given above.
ANSWER
LESS THAN


MORE THAN
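The graphs themselves are not reproduced here. The points they are drawn from can be computed as follows. This sketch uses the class limits and worker counts exactly as printed in the table. Note that the printed table has no 1400-1450 class, and the sketch does not invent one. The "less than" ogive plots cumulative totals against upper class limits. The "more than" ogive plots totals that count down from the grand total, placed against lower class limits. The function names are hypothetical.

```python
from itertools import accumulate

# Class limits and worker counts as printed in the table above
classes = [(800, 850), (850, 900), (900, 950), (950, 1000), (1000, 1050),
           (1050, 1100), (1100, 1150), (1150, 1200), (1200, 1250),
           (1250, 1300), (1300, 1350), (1350, 1400), (1450, 1500),
           (1500, 1550)]
workers = [21, 29, 19, 39, 43, 94, 73, 68, 36, 45, 27, 48, 21, 5]


def less_than_ogive(classes, freqs):
    """(upper limit, number of workers earning less than that limit)."""
    return [(hi, cf) for (_, hi), cf in zip(classes, accumulate(freqs))]


def more_than_ogive(classes, freqs):
    """(lower limit, number of workers earning that limit or more)."""
    points, remaining = [], sum(freqs)
    for (lo, _), f in zip(classes, freqs):
        points.append((lo, remaining))
        remaining -= f
    return points


lt = less_than_ogive(classes, workers)
mt = more_than_ogive(classes, workers)
print(lt[0], lt[-1])  # (850, 21) (1550, 568)
print(mt[0], mt[-1])  # (800, 568) (1500, 5)
```

Plotting `lt` and `mt` as line graphs on the same axes gives the two ogives. The two curves cross at the median earnings.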

QUESTION 2
While calculating the coefficient of correlation between two variables X and Y, the following results
were obtained: N = 25, ΣX = 125, ΣY = 100, ΣX² = 650, ΣY² = 460, ΣXY = 508. It was, however,
later discovered at the time of checking that two pairs of observations (X, Y) had been copied as
(6, 14) and (8, 6), while the correct values were (8, 12) and (6, 8) respectively. Determine the
correct value of the coefficient of correlation.

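Answer

Corrected ΣX = 125 − (6 + 8) + (8 + 6) = 125

Corrected ΣY = 100 − (14 + 6) + (12 + 8) = 100

Corrected ΣX² = 650 − (36 + 64) + (64 + 36) = 650

Corrected ΣY² = 460 − (196 + 36) + (144 + 64) = 436

Corrected ΣXY = 508 − (84 + 48) + (96 + 48) = 520

With N = 25 and ΣX = 125:
r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
  = (25 × 520 − 125 × 100) / √[(25 × 650 − 125²)(25 × 436 − 100²)]
  = 500 / √(625 × 900)
  = 500 / 750 ≈ 0.67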
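The correction procedure can be checked in a few lines of Python. First subtract each wrongly copied pair's contribution from every sum, then add the correct pair's contribution, and finally apply Karl Pearson's formula. This sketch assumes N = 25 and ΣX = 125, reading a printed "N = 125" as those two merged figures. The function names are hypothetical.

```python
import math


def corrected_sums(sums, wrong_pairs, right_pairs):
    """Remove the wrongly copied (x, y) pairs and add the correct ones.

    sums: dict with keys 'x', 'y', 'xx', 'yy', 'xy' for ΣX, ΣY, ΣX², ΣY², ΣXY
    """
    s = dict(sums)
    for pairs, sign in ((wrong_pairs, -1), (right_pairs, +1)):
        for x, y in pairs:
            s["x"] += sign * x
            s["y"] += sign * y
            s["xx"] += sign * x * x
            s["yy"] += sign * y * y
            s["xy"] += sign * x * y
    return s


def pearson_r(n, s):
    """Karl Pearson's coefficient of correlation from raw sums."""
    num = n * s["xy"] - s["x"] * s["y"]
    den = math.sqrt((n * s["xx"] - s["x"] ** 2) * (n * s["yy"] - s["y"] ** 2))
    return num / den


N = 25  # assumed: printed "N = 125" read as N = 25, ΣX = 125
given = {"x": 125, "y": 100, "xx": 650, "yy": 460, "xy": 508}
fixed = corrected_sums(given, [(6, 14), (8, 6)], [(8, 12), (6, 8)])
print(fixed)                          # {'x': 125, 'y': 100, 'xx': 650, 'yy': 436, 'xy': 520}
print(round(pearson_r(N, fixed), 3))  # 0.667
```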

QUESTION 3
a) Explain the following:
(i) Merits of census investigation

As part of an integrated programme of data collection, the population census is the primary
source of basic national population data required for administrative purposes and for many
aspects of economic and social research and planning. The value of the census results is
increased if they can be employed together with the results of other investigations, as in the
use of the census data as a base or benchmark for current statistics. The usefulness of the
census is also enhanced if it can furnish the information needed for conducting other
statistical investigations. It can, for example, provide a statistical frame for other censuses
and sample surveys. The purpose of a continuing programme of data collection can best be
served, therefore, if the relationship between the population census and other statistical
investigations is considered when census planning is under way and if provision is made for
facilitating the use of the census and its results in connexion with intercensal sample surveys,
with continuous population registers, with other types of censuses and with civil registration
and vital statistics, and with labour force, educational and similar statistics. The use of
consistent concepts and definitions throughout an integrated programme of data collection is
essential if the advantages of these relationships are to be fully realized.
(ii) Classification of frequency distribution
Frequency curves are like frequency polygons; however, instead of straight lines, a frequency
curve uses a smooth curve to connect the points. Like other graphs, it is plotted on two axes.
The horizontal axis from left to right (or x axis) indicates the different possible values of some
variable (a phenomenon whose observations vary from trial to trial). The vertical axis from bottom to
top (or y axis) measures frequency, that is, how many times a particular value occurs.
For example, in Figure 1, the x axis might indicate annual income (the values would be in thousands of
dollars); the y axis might indicate frequency (millions of people or percentage of the working population).
Notice that in Figure 1 the highest percentage of the working population would have an annual income
in the middle of the range of dollar values. The lowest percentages would be at the extremes of the
values: nearly 0 and extremely high.

Figure 1: A symmetric bell curve.

Notice that this frequency curve displays perfect symmetry; that is, one half (the left side) is the
mirror image of the other half (the right side). When a bell-shaped (mound-shaped) curve of this kind
follows the normal distribution, it has special properties.
The negatively skewed curve, shown in Figure 2, is skewed to the left. Its greatest frequency
occurs at a value near the right of the graph.


Figure 2: Negatively skewed bell curve.

The positively skewed curve (see Figure 3) is skewed to the right. Its greatest frequency occurs at a
value near the left of the graph. This distribution is probably a more accurate representation of the
annual income of working Americans than is Figure 1.
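The three shapes above can also be told apart numerically. The sketch below uses the moment coefficient of skewness, g1 = m3 / m2^1.5, a standard measure that is not described in the text above. A value near 0 indicates symmetry, a positive value indicates a tail to the right, and a negative value indicates a tail to the left. The tolerance, the function names and the sample data are hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5 (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def classify_shape(values, tol=0.1):
    """Label data as symmetric, positively skewed or negatively skewed.

    tol is an arbitrary illustrative threshold, not a standard cut-off.
    """
    g1 = moment_skewness(values)
    if abs(g1) <= tol:
        return "symmetric"
    return "positively skewed" if g1 > 0 else "negatively skewed"


print(classify_shape([1, 2, 2, 3, 3, 3, 4, 4, 5]))      # symmetric
print(classify_shape([1, 1, 1, 2, 2, 3, 10]))           # positively skewed
print(classify_shape([-1, -1, -1, -2, -2, -3, -10]))    # negatively skewed
```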
(iii) Types of diagrams

There are at least the following types of diagrams:

Graph-based diagrams: these take a collection of items and the relationships between them, and
express them by giving each item a 2D position, while the relationships are expressed as
connections or overlaps between the items. Examples of such techniques:
o tree diagram
o network diagram
o cluster diagram
o flowchart
o Euler diagram, Venn diagram, existential graph

Chart-like diagram techniques, which display a relationship between two variables that take
either discrete or continuous ranges of values. Examples:
o histogram, bar chart
o pie chart
o function graph
o scatter plot
o table / matrix

b) Incomplete information obtained from a partly destroyed record of a cost of living analysis is given
below:

Group                       Group index   Percentage of total expenditure
(i)   Food                  134           60
(ii)  Clothing              140           Not available
(iii) Housing               105           20
(iv)  Fuel and Electricity  120           5
(v)   Miscellaneous         130           Not available

The cost of living index (CLI), with percentage of total expenditure as weights, was found to be 127.9.
Estimate the weights used for clothing and miscellaneous.
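Answer

Let the weight for clothing be x. Since the weights are percentages of total expenditure, they add
up to 100, so the weight for miscellaneous is 100 − (60 + x + 20 + 5) = 15 − x.

CLI = Σ(I × W) / ΣW
127.9 = [134 × 60 + 140x + 105 × 20 + 120 × 5 + 130(15 − x)] / 100
12790 = 8040 + 140x + 2100 + 600 + 1950 − 130x
12790 = 12690 + 10x
x = 10

Group                       Group index   Percentage of total expenditure
(i)   Food                  134           60
(ii)  Clothing              140           10
(iii) Housing               105           20
(iv)  Fuel and Electricity  120           5
(v)   Miscellaneous         130           5
Total                                     100

Hence the weight used for clothing is 10 and the weight used for miscellaneous is 15 − 10 = 5.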
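The same two conditions can be solved in code: the weights sum to 100, and the weighted mean of the group indices equals the CLI. This is a sketch, assuming a fuel and electricity weight of 5, which is the value the answer's weights imply. The function name is hypothetical.

```python
def solve_two_weights(indices, weights, cli, total_weight=100):
    """Find the two missing weights in CLI = Σ(I * W) / ΣW.

    indices: group -> group index
    weights: group -> weight, with None for exactly two unknown groups
    """
    a, b = [g for g, w in weights.items() if w is None]
    if indices[a] == indices[b]:
        raise ValueError("equal indices: the two weights cannot be separated")
    known = {g: w for g, w in weights.items() if w is not None}
    remaining = total_weight - sum(known.values())  # W_a + W_b
    known_iw = sum(indices[g] * w for g, w in known.items())
    # cli * total = known_iw + I_a * W_a + I_b * (remaining - W_a)
    w_a = (cli * total_weight - known_iw - indices[b] * remaining) / (indices[a] - indices[b])
    return {a: w_a, b: remaining - w_a}


indices = {"Food": 134, "Clothing": 140, "Housing": 105,
           "Fuel and Electricity": 120, "Miscellaneous": 130}
weights = {"Food": 60, "Clothing": None, "Housing": 20,
           "Fuel and Electricity": 5,  # assumed weight for fuel
           "Miscellaneous": None}
result = solve_two_weights(indices, weights, cli=127.9)
print({g: round(w, 2) for g, w in result.items()})
# {'Clothing': 10.0, 'Miscellaneous': 5.0}
```

Results are rounded for display because 127.9 is not exactly representable as a binary float.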


Question No 4:
i) Explain the following:
a) Components of time series;

Four separate components (trend, cyclical, seasonal and irregular) combine to produce the
observed values of the time series.
Trend Component
Trend is the underlying long-term movement over time in the value of the data recorded. This
shifting or trend is usually the result of long-term factors such as changes in the population,
demographic characteristics of the population, technology and consumer preferences.
Seasonal Variations
These are short-term fluctuations in recorded values caused by circumstances that affect
results at different times of the year, on different days of the week, at different times of day,
and so on.
Examples of Seasonal Variation are as follows.

Sales of ice cream will be higher in summer than in winter, and sales of overcoats will be
higher in autumn than in spring.
Shops might expect higher sales shortly before Christmas or in their winter and summer sales.

Sales might be higher on Friday and Saturday than on Monday.

The telephone network may be heavily used at certain times of the day (such as mid-morning
and mid-afternoon) and much less used at other times (such as in the middle of the night).

Cyclical Variation
These are medium-term changes in results caused by circumstances which repeat in cycles. In
business, cyclical variations are commonly associated with economic cycles: successive
booms and slumps in the economy. Economic cycles may last a few years. Cyclical
variations are longer term than seasonal variations.
Random Factors
These are disturbances due to everyday unpredictable influences, such as weather
conditions, illness, transport breakdowns and so on.
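The text does not say how the components combine. A common assumption is an additive model, Y = T + S + C + I. Under that assumption, a 2 x 4 centred moving average over quarterly data cancels a seasonal pattern whose four effects sum to zero and leaves an estimate of the trend. The moving average is a standard technique that the text above does not describe. The sketch builds a hypothetical series from a linear trend plus seasonal effects and recovers the trend. The function name and the data are hypothetical.

```python
def centred_moving_average(y, period=4):
    """2 x period centred moving average for an even period.

    Returns {index: trend estimate}; the first and last period // 2
    observations get no estimate.
    """
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    half = period // 2
    trend = {}
    for i in range(half, len(y) - half):
        w = y[i - half:i + half + 1]  # period + 1 consecutive values
        trend[i] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / period
    return trend


# Hypothetical quarterly series built additively: trend + seasonal
seasonal = [5, -3, -6, 4]  # seasonal effects summing to zero
y = [100 + 2 * t + seasonal[t % 4] for t in range(12)]
trend = centred_moving_average(y)
print(trend[2], trend[9])  # 104.0 118.0
```

Subtracting the trend estimate from the series (y - T) would leave the seasonal and irregular parts, from which seasonal effects could be averaged out.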
b) Multiple correlation;
An intuitive approach to multiple regression analysis is to sum the squared correlations between
the predictor variables and the criterion variable to obtain an index of the overall relationship
between the predictor variables and the criterion variable. However, such a sum can be greater than
one, which shows that simple summation of the squared correlation coefficients is not, in general, a
correct procedure. In fact, simple summation of the squared correlation coefficients between the
predictor variables and the criterion variable is correct only in the special case when the predictor
variables are uncorrelated. If the predictors are related, their inter-correlations must be removed so
that only the unique contribution of each predictor toward explaining the criterion is counted.
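For two predictors, the point above can be made concrete with the standard formula for the squared multiple correlation:

R² = (r1² + r2² − 2·r1·r2·r12) / (1 − r12²)

Here r1 and r2 are each predictor's correlation with the criterion, and r12 is the correlation between the two predictors. When r12 = 0 the formula reduces to the simple sum r1² + r2². Otherwise the shared part is counted only once. The correlation values below are hypothetical.

```python
def multiple_r_squared(r1, r2, r12):
    """Squared multiple correlation of a criterion on two predictors.

    r1, r2: correlations of predictors 1 and 2 with the criterion
    r12:    correlation between the two predictors (|r12| < 1)
    """
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Uncorrelated predictors: R² equals r1² + r2²
print(round(multiple_r_squared(0.6, 0.5, 0.0), 4))  # 0.61
# Correlated predictors: smaller than the simple sum 0.61
print(round(multiple_r_squared(0.6, 0.5, 0.4), 4))  # 0.4405
```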
c) Long run cost curve.

The typical microeconomics textbook and classroom development of cost curves consists of
two parts. One shows how the per-unit cost curves (average and marginal) relate to total
costs. The second shows how long-run cost curves (total, average, and marginal) relate to
their short-run counterparts.
First consider the relationship between average and marginal curves. When the average
curve is linear, the representation is simple: at any given height, the marginal curve lies halfway
between the vertical axis and the average curve. For nonlinear cost curves, however, drawing the
marginal curves so that they correspond to the average curve (or vice versa) can be tedious.
Too often we simply sketch a marginal cost curve that cuts the average cost curve at its
minimum point and assume that this is good enough. Even textbook authors commit this error
fairly frequently. (This is not an exercise in textbook bashing. We do not cite the textbooks
that commit the errors noted below. Readers may contact the authors for examples.)
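The "halfway" rule for linear average curves can be checked numerically. If AC = A + B·Q, then total cost is TC = A·Q + B·Q², so MC = A + 2B·Q. At any height, the MC curve is therefore reached at half the quantity at which the AC curve is reached. The sketch below uses hypothetical values A = 50 and B = 2 and approximates the marginal curve with a central difference of total cost.

```python
def marginal_from_average(avg, q, h=1e-4):
    """Approximate d(avg(q) * q) / dq with a central difference."""
    def total(x):
        return avg(x) * x
    return (total(q + h) - total(q - h)) / (2 * h)


A, B = 50.0, 2.0  # hypothetical linear average cost: AC = A + B*Q


def average_cost(q):
    return A + B * q


# For a linear average curve, MC = A + 2*B*Q (checked numerically here)
for q in (1, 5, 10):
    assert abs(marginal_from_average(average_cost, q) - (A + 2 * B * q)) < 1e-6

# At a given height the marginal curve is reached at half the quantity
height = 70.0
q_on_ac = (height - A) / B        # quantity where AC equals the height
q_on_mc = (height - A) / (2 * B)  # quantity where MC equals the height
print(q_on_ac, q_on_mc)           # 10.0 5.0
```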
Failure to draw the curves consistently causes at least two inconsistencies. One is that the
quantity at which marginal revenue equals marginal cost will not be the quantity at which
profit ((price less average cost) times quantity) is, in fact, maximized. The other is that
profit as defined above will not equal profit defined as the area between the marginal
revenue and marginal cost curves. One of the leading principles textbooks contains a graph in
which the area between the marginal revenue curve and the marginal cost curve is roughly
two thirds larger than the area defined in terms of price, average cost, and quantity. Such a
discrepancy is large enough to confuse students.
ii) The following results were obtained from the record of age (x) and blood pressure (y) of a group of
10 women:

            x      y
Mean        53     142
Variance    130    165

and Σ(x − x̄)(y − ȳ) = 1220.

Find the regression equation of y on x and use it to estimate the blood pressure of a woman of age 45.

Regression equation of y on x:
y − ȳ = b_yx (x − x̄), where b_yx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
Taking the variance with divisor N, Σ(x − x̄)² = N × Var(x) = 10 × 130 = 1300.

b_yx = 1220 / 1300 ≈ 0.9385

y − 142 = 0.9385 (x − 53)
y = 0.9385x + 92.26

For a woman aged 45:
y = 142 + 0.9385 × (45 − 53) = 142 − 7.51 ≈ 134.49

The estimated blood pressure of a woman aged 45 is about 134.5.
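The calculation works from summary statistics alone, so it is easy to script. This sketch assumes the variance is computed with divisor n, so that Σ(x − x̄)² = n·Var(x). The function name is hypothetical.

```python
def regression_y_on_x(n, mean_x, mean_y, var_x, sum_dxdy):
    """Intercept a and slope b of the regression of y on x.

    var_x is taken as the variance with divisor n, so Σ(x - x̄)² = n * var_x.
    sum_dxdy is Σ(x - x̄)(y - ȳ).
    """
    b = sum_dxdy / (n * var_x)
    a = mean_y - b * mean_x
    return a, b


a, b = regression_y_on_x(n=10, mean_x=53, mean_y=142, var_x=130, sum_dxdy=1220)
print(round(b, 4), round(a, 2))  # 0.9385 92.26
print(round(a + b * 45, 2))      # 134.49
```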
An I.Q. test was administered to 5 persons before and after they were trained. The results are given below:

Candidate                  1    2    3    4    5
I.Q. before training       70   80   83   92   85
I.Q. after training        80   78   85   96   81

Test whether there is any improvement in I.Q. after the training. Given that t0.05,4 = 2.13.
Null hypothesis:
H0: μ1 = μ2, i.e. there is no significant change in I.Q. after the training programme.
Alternative hypothesis:
H1: μ1 < μ2, i.e. I.Q. improves after the training (one-tailed test).
Level of significance: α = 0.05

Candidate   Before (X)   After (Y)   D = X − Y   D²
1           70           80          −10         100
2           80           78          2           4
3           83           85          −2          4
4           92           96          −4          16
5           85           81          4           16
Total                                −10         140

D̄ = ΣD / n = −10 / 5 = −2

S² = [ΣD² − (ΣD)² / n] / (n − 1) = (140 − 100/5) / 4 = (140 − 20) / 4 = 30, so S = √30 ≈ 5.48

Calculation of statistic:
Under H0 the test statistic is
t = D̄ / (S / √n) = −2 / (5.48 / √5) = −2 / 2.45 ≈ −0.816

Tabulated value:
t follows a t-distribution with 5 − 1 = 4 d.f.; for a one-tailed test at the 5% level, t0.05,4 = 2.13 (given).

Inference:
Since |t| = 0.816 < 2.13, the calculated value does not fall in the critical region, so we accept the null
hypothesis at the 5% level of significance. We therefore conclude that there is no significant improvement
in I.Q. after the training programme.
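The paired t test above can be reproduced with the standard library. In this sketch, D = before − after, so an improvement would make D negative. H0 is rejected only if t < −2.13, the one-tailed 5% value given in the question. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-sample t statistic for D = before - after, with n - 1 d.f."""
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    s = stdev(d)  # sample standard deviation, divisor n - 1
    return mean(d) / (s / math.sqrt(n)), n - 1


before = [70, 80, 83, 92, 85]
after = [80, 78, 85, 96, 81]
t, df = paired_t(before, after)
print(round(t, 3), df)  # -0.816 4
# One-tailed test: improvement means D = before - after is negative
print("reject H0" if t < -2.13 else "accept H0")  # accept H0
```

If SciPy is available, `scipy.stats.ttest_rel(before, after, alternative="less")` computes the same statistic and also reports a p-value.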
