INTRODUCTION TO STATISTICS
ASSIGNMENT#1
Fall 2010
Definition of Statistics:
Business statistics is the science of good decision making in the face of uncertainty. It is
used in many disciplines, such as financial analysis, econometrics, auditing, production
and operations (including services improvement), and marketing research. It is also a
branch of applied statistics that works mostly on data collected as a by-product of doing
business or by government agencies. These sources publish series of data regularly and
repeatedly, which makes the topic of time series especially important for business
statistics. Business statistics provides the knowledge and skills to interpret and use
statistical techniques in a variety of business applications. A typical business statistics
course is intended for business majors and covers statistical study, descriptive statistics
(collection, description, analysis, and summary of data), probability, the binomial and
normal distributions, tests of hypotheses and confidence intervals, linear regression, and
correlation.
Characteristics of Statistics
Some of its important characteristics are given below:
i) Inferential statistics is the process of drawing conclusions from data that are subject
to random variation, for example, observational errors or sampling variation. More
substantially, the terms statistical inference, statistical induction and inferential
statistics are used to describe systems of procedures that can be used to draw conclusions
from datasets arising from systems affected by random variation. Initial requirements of
such a system of procedures for inference and induction are that the system should
produce reasonable answers when applied to well-defined situations and that it should be
general enough to be applied across a range of situations.
The outcome of statistical inference may be an answer to the question "what should be
done next?", where this might be a decision about making further experiments or surveys,
or about drawing a conclusion before implementing some organizational or governmental
policy.
Statistical forecasting: Estimating the likelihood of an event taking place in the future,
based on available data.
Statistical forecasting concentrates on using the past to predict the future by identifying
trends, patterns and business drivers within the data to develop a forecast. This forecast is
referred to as a statistical forecast because it uses mathematical formulas to identify the
patterns and trends, while testing the results for mathematical reasonableness and
confidence.
(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field of
study.
(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and
graphic form for an easy and clear comprehension of the data.
(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon
through quantitative observations.
(6) Statistics helps in drawing valid inferences about the population parameters from the
sample data, along with a measure of their reliability.
Scope of statistics:
Statistics is considered by some to be a mathematical science pertaining to the collection,
analysis, interpretation or explanation, and presentation of data, while others consider it a
branch of mathematics concerned with collecting and interpreting data. Because of its
empirical roots and its focus on applications, statistics is usually considered to be a
distinct mathematical science rather than a branch of mathematics.
Statisticians improve the quality of data with the design of experiments and survey
sampling. Statistics also provides tools for prediction and forecasting using data and
statistical models. Statistics is applicable to a wide variety of academic disciplines,
including natural and social sciences, government, and business.
Limitations of statistics:
The important limitations of statistics are:
(1) Statistical laws are true only on average. Statistics are aggregates of facts, so a single
observation is not a statistic; statistics deals with groups and aggregates only.
(4) If sufficient care is not exercised in collecting, analyzing and interpreting the data,
statistical results might be misleading.
(5) Only a person who has an expert knowledge of statistics can handle statistical data
efficiently.
(6) Some errors are possible in statistical decisions. Inferential statistics in particular
involves certain errors, and we do not know whether an error has been committed or not.
What is data?
Factual information, especially information organized for analysis or used to reason or
make decisions.
Computer Science. Numerical or other information represented in a form suitable for
processing by computer.
Values derived from scientific experiments.
i) Primary data:
The primary data are the first-hand information collected, compiled and published by an
organization for some purpose. They are the most original data in character and have not
undergone any sort of statistical treatment.
Example: Population census reports are primary data because they are collected,
compiled and published by the population census organization.
ii) Secondary data:
The secondary data are second-hand information that has already been collected by
someone (an organization) for some purpose and is available for the present study.
Secondary data are not pure in character and have undergone some treatment at least
once.
Example: The economic survey of England is secondary data because the figures are collected
by more than one organization, such as the Bureau of Statistics, the Board of Revenue, the banks, etc.
Internal data are information, facts and data available from within a company's information
system. Internal data are normally not accessible by outside parties without the company's
express permission.
Designing a questionnaire:
Questionnaires are an inexpensive way to gather data from a potentially large number of
respondents. Often they are the only feasible way to reach a number of reviewers large
enough to allow statistical analysis of the results. A well-designed questionnaire that is
used effectively can gather information on both the overall performance of the test
system and specific components of the system. If the questionnaire includes demographic
questions on the participants, the answers can be used to correlate performance and
satisfaction with the test system among different groups of users.
It is important to remember that a questionnaire should be viewed as a multi-stage
process beginning with definition of the aspects to be examined and ending with
interpretation of the results. Every step needs to be designed carefully because the final
results are only as good as the weakest link in the questionnaire process. Although
questionnaires may be cheap to administer compared to other data collection methods,
they are every bit as expensive in terms of design time and interpretation.
i) Pre-testing a questionnaire:
• Participating pretests dictate that you tell respondents that the pretest is a practice
run; rather than asking the respondents to simply fill out the questionnaire,
participating pretests usually involve an interview setting where respondents are
asked to explain reactions to question form, wording and order. This kind of
pretest will help you determine whether the questionnaire is understandable.
• When conducting an undeclared pretest, you do not tell respondents that it is a
pretest. The survey is given just as you intend to conduct it for real. This type of
pretest allows you to check your choice of analysis and the standardization of
your survey. According to Converse and Presser (1986), if researchers have the
resources to do more than one pretest, it might be best to use a participatory
pretest first, then an undeclared test.
Editing of Data:
After the data have been collected from either a primary or a secondary source, the next step
is editing. Editing means examining the collected data to discover any errors and
mistakes before presenting them. It has to be decided beforehand what degree of accuracy is
wanted and what extent of error can be tolerated in the inquiry. The editing of secondary
data is simpler than that of primary data.
Presentation of data:
Data can be presented in statistics by tabulation, charts, histograms, pie charts, box plots,
etc.
Examples
Survey results of the ages of students in the Adult Basic Education maths classes are
shown in this frequency table.
Use the chart above and include a cumulative frequency column. From this draw an ogive
graph. Use the graph to find the median, lower quartile, upper quartile and lowest and
highest value.
Draw a box-plot.
Lowest Value: 15
Highest Value: 44
Five-number summary
Lowest value: 15
25% Quartile: 21
50% Quartile (median): 26
75% Quartile: 36
Highest value: 44
Box Plot
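The quantities a box plot displays can be worked out directly from the five-number summary above. The following is a minimal sketch, not part of the original assignment: the function name `box_plot_measures` is hypothetical, and the 1.5 × IQR fence rule for flagging outliers is an assumption, since the text does not name an outlier rule.

```python
def box_plot_measures(lowest, q1, median, q3, highest):
    """Derive box-plot measures from a five-number summary."""
    iqr = q3 - q1                      # interquartile range: length of the box
    lower_fence = q1 - 1.5 * iqr       # assumed 1.5 * IQR rule (not from the text)
    upper_fence = q3 + 1.5 * iqr
    # The extremes are flagged only if they fall outside the fences.
    possible_outliers = lowest < lower_fence or highest > upper_fence
    return {
        "range": highest - lowest,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "possible_outliers": possible_outliers,
    }


# Five-number summary from the example: 15, 21, 26, 36, 44
measures = box_plot_measures(15, 21, 26, 36, 44)
for name, value in measures.items():
    print(name, value)
```

Under these assumptions the box runs from 21 to 36 and both extreme values lie inside the fences, so the whiskers would extend all the way to 15 and 44.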
Classifications of data:
A. According to Nature
1. Quantitative data - information obtained from numerical variables (e.g. age, bills, etc.)
2. Qualitative data - information obtained from variables in the form of categories,
characteristic names or labels, or alphanumeric variables (e.g. birthdays, gender, etc.)
B. According to Source
1. Primary data - first-hand information (e.g. autobiography, financial statement)
2. Secondary data - second-hand information (e.g. biography, weather forecast from
newspapers)
C. According to Measurement
1. Discrete data - countable numerical observations
- whole numbers only
- has an equal whole-number interval
- obtained through counting (e.g. corporate stocks, etc.)
2. Continuous data - measurable observations
- decimals or fractions
- obtained through measuring (e.g. bank deposits, volume of liquid, etc.)
D. According to Arrangement
1. Ungrouped data - raw data
- no specific arrangement
2. Grouped data - organized set of data
- at least 2 groups involved
- arranged
Tabulation of data:
The process of placing classified data into tabular form is known as tabulation. A table is
a systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements, whereas columns are vertical arrangements. A table may be simple, double or
complex, depending upon the type of classification.
Bases of Classification:
There are four important bases of classification:
(1) Qualitative Base (2) Quantitative Base (3) Geographical Base (4) Chronological or
Temporal Base
(1) Qualitative Base:
Data are classified on a qualitative base when they are grouped according to some quality or
attribute, such as sex, religion, literacy or intelligence.
Tabulation of table:
A) Parts of a table:
The title
Prefatory notes
Box head
• A table should be simple and attractive. There should be no need of further explanations
(details).
• Proper and clear headings should be used for columns and rows.
• Suitable approximation may be adopted and figures may be rounded off.
• The unit of measurement should be well defined.
• If the observations are large in number they can be broken into two or three tables.
• Thick lines should be used to separate the data under big classes and thin lines to separate
the sub classes of data.
B) Types of table:
Charting data:
We have discussed the techniques of classification and tabulation that help us organize the
collected data in a meaningful fashion. However, this way of presenting statistical data does
not always prove interesting to a layman. Too many figures are often confusing and fail to
convey the message.
One of the most effective and interesting alternative ways of presenting statistical data is
through diagrams and graphs. Statistical data may be displayed pictorially in several ways,
using different types of graphs and diagrams.
Types of diagrams:
Simple Bar Chart
A simple bar chart is used to represent data involving only one variable classified on a
spatial, quantitative or temporal basis. In a simple bar chart, we draw bars of equal width but
variable length, i.e. the magnitude of a quantity is represented by the height or length of the bar.
Multiple Bar Charts
A multiple bar chart represents two or more sets of inter-related data, which facilitates
comparison between more than one phenomenon. The technique of the simple bar chart is used to
draw this diagram, but different shades, colors or dots are used to distinguish between the
different phenomena. We draw a multiple bar chart when the total of the different phenomena
is meaningless.
Component Bar Chart
A sub-divided or component bar chart is used to represent data in which the total
magnitude is divided into different parts or components.
In this diagram, we first draw a simple bar for each class, taking the total magnitude in that
class, and then divide each bar into parts in the ratio of the various components. This type of
diagram shows the variation in the different components within each class as well as between
different classes. The sub-divided bar diagram is also known as a component bar chart or
stacked chart.
Percentage Component Bar Chart
A sub-divided bar chart may be drawn on a percentage basis. To do so, we express each
component as a percentage of its respective total. In drawing a percentage bar chart, bars of
length equal to 100 are drawn for each class in the first step and sub-divided in proportion to
the percentages of their components in the second step. The diagram so obtained is called a
percentage component bar chart or percentage stacked bar chart. This type of chart is useful
for comparing components across classes because every bar has the same length, so
differences between the class totals are held constant.
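The percentage step can be sketched in a few lines of Python. This is only an illustration: the function name `percentage_components` and the expenditure figures are hypothetical and do not come from the assignment.

```python
def percentage_components(components):
    """Express each component as a percentage of its class total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


# Hypothetical figures for one class (one bar); not taken from the text.
family_a = {"Food": 100, "Rent": 60, "Other": 40}

shares = percentage_components(family_a)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Each bar is then drawn with length 100 and cut at the cumulative percentages (here at 50, 80 and 100).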
Pie Chart
A pie chart can be used to compare the relation between the whole and its components. A pie
chart is a circular diagram in which the area of each sector represents a component. Circles are
drawn with radii proportional to the square root of the quantities because the area of a circle is
$\pi r^2$.
To construct a pie chart (sector diagram), we draw a circle with radius proportional to the
square root of the total. The total angle of the circle is $360^\circ$. The angle of each
component is calculated by the formula
$$\text{Angle of sector} = \frac{\text{component part}}{\text{total}} \times 360^\circ$$
These angles are marked in the circle by means of a protractor to show the different components.
The arrangement of the sectors is usually anti-clockwise.
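The sector angles can be computed as in the sketch below, which applies the standard rule that each component receives its share of the full 360 degrees. The function name `sector_angles` and the budget figures are hypothetical and serve only as an illustration.

```python
def sector_angles(components):
    """Return the angle in degrees of each pie-chart sector."""
    total = sum(components.values())
    # Each component gets its share of the full circle (360 degrees).
    return {name: value * 360 / total for name, value in components.items()}


# Hypothetical budget; not taken from the text.
budget = {"Food": 50, "Rent": 30, "Other": 20}

angles = sector_angles(budget)
for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always add up to 360, which gives a quick check before the sectors are marked off with a protractor.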
Frequency Distribution
A frequency distribution is a tabular arrangement of data into classes according to size or
magnitude, along with the corresponding class frequencies (the number of values that fall in
each class).
Grouped Data:
Data presented in the form of frequency distribution is called grouped data.
Array:
Numerical raw data arranged in ascending or descending order is called an array.
Example:
Array the following data in ascending and descending order: 6, 4, 13, 7, 10, 16, 19.
Solution:
Array in ascending order: 4, 6, 7, 10, 13, 16, 19
Array in descending order: 19, 16, 13, 10, 7, 6, 4
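The same example can be reproduced with Python's built-in `sorted` function. This is a minimal sketch using the seven values given above.

```python
# Raw data from the example.
data = [6, 4, 13, 7, 10, 16, 19]

ascending = sorted(data)                 # smallest to largest
descending = sorted(data, reverse=True)  # largest to smallest

print("Ascending: ", ascending)
print("Descending:", descending)
```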
Class Limits:
The variate values of the classes or groups are called the class limits. The smaller value of
a class is called the lower class limit and the larger value is called the upper class limit.
Class limits are also called inclusive classes.
For Example: In the class 10 – 19, the smaller value 10 is the lower class limit and the larger
value 19 is the upper class limit.
Class Boundaries:
The true values, which describe the actual class limits of a class, are called class
boundaries. The smaller true value is called the lower class boundary and the larger true value is
called the upper class boundary of the class. It is important to note that the upper class boundary
of a class coincides with the lower class boundary of the next class. Class boundaries are also
known as exclusive classes.
For Example:
Weights in kg      No. of students
60 – 65            8
65 – 70            12
70 – 75            5
Total              25
A student whose weight is 60 kg or more but less than 65 kg (for example, 64.5 kg) would be
included in the 60 – 65 class. A student whose weight is exactly 65 kg would be included in the
next class, 65 – 70.
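The rule that a value equal to an upper class boundary belongs to the next class can be sketched with the standard-library `bisect` module. The function name `class_of` is hypothetical, and treating weights outside 60 to 75 kg as errors is an assumption made for this illustration.

```python
from bisect import bisect_right

# Class boundaries from the table above: 60 – 65, 65 – 70, 70 – 75.
boundaries = [60, 65, 70, 75]


def class_of(weight):
    """Return the (lower, upper) class boundaries that contain weight."""
    # Assumption: values outside the table are rejected rather than clamped.
    if not boundaries[0] <= weight < boundaries[-1]:
        raise ValueError(f"{weight} lies outside the table")
    # bisect_right places a value equal to a boundary in the class above it.
    i = bisect_right(boundaries, weight) - 1
    return boundaries[i], boundaries[i + 1]


print(class_of(64.5))  # falls in the 60 – 65 class
print(class_of(65))    # falls in the 65 – 70 class
```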
Open-end Classes:
A class that has either no lower class limit or no upper class limit in a frequency table is
called an open-end class. Open-end classes are avoided in practice because they create
problems in calculation.
For Example:
Below 110          6
110 – 120          12
120 – 130          20
130 – 140          10
140 and above      2
Class Mark or Mid Point:
The class mark or mid point is the mean of the lower and upper class limits or boundaries, so
it divides the class into two equal parts. It is obtained by dividing the sum of the lower and
upper class limits (or class boundaries) of a class by 2.
For Example: The class mark or mid point of the class 60 – 69 is (60 + 69)/2 = 64.5
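The mid point calculation can be checked with a short sketch. The function name `class_mark` is hypothetical, and the boundaries 59.5 and 69.5 are an assumption: they are the usual boundaries for the limits 60 – 69 when the data are recorded to the nearest whole number.

```python
def class_mark(lower, upper):
    """Mid point of a class: the mean of its lower and upper values."""
    # The sum must be formed before dividing, hence the parentheses.
    return (lower + upper) / 2


mark_from_limits = class_mark(60, 69)
# Assumed boundaries for whole-number data; they give the same mid point.
mark_from_boundaries = class_mark(59.5, 69.5)

print(mark_from_limits, mark_from_boundaries)
```

Without the parentheses, `60 + 69 / 2` would be evaluated as 60 + 34.5 = 94.5, which is not the mid point.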