
BUS 172

INTRODUCTION TO STATISTICS

ASSIGNMENT#1

Name: Camelia Fatema Lopa


ID#: 1010 533 530
SEC- 13

Instructor: M.SIDDIQUE HOSSAIN

Fall 2010
Definition of Statistics:

Statistics is a mathematical science pertaining to the collection, analysis, interpretation
or explanation, and presentation of data. Statisticians improve the quality of data with the
design of experiments and survey sampling. Statistics also provides tools for prediction
and forecasting using data and statistical models. Statistics is applicable to a wide variety
of academic disciplines, including natural and social sciences, government, and business.
Statistics is closely related to probability theory, with which it is often grouped.

Business statistics is the science of good decision making in the face of uncertainty and
is used in many disciplines such as financial analysis, econometrics, auditing, production
and operations including services improvement, and marketing research. It is also a branch
of applied statistics working mostly on data collected as a by-product of doing business or
by government agencies. These sources feature regular, repetitive publication of series of
data, which makes the topic of time series especially important for business statistics.
Business statistics provides knowledge and skills to interpret and use statistical techniques
in a variety of business applications. A typical business statistics course is intended for
business majors and covers statistical study, descriptive statistics (collection, description,
analysis, and summary of data), probability, the binomial and normal distributions, tests of
hypotheses and confidence intervals, linear regression, and correlation.

Characteristics of Statistics
Some of its important characteristics are given below:

• Statistics are aggregates of facts.
• Statistics are numerically expressed.
• Statistics are affected to a marked extent by a multiplicity of causes.
• Statistics are enumerated or estimated according to a reasonable standard of
accuracy.
• Statistics are collected for a predetermined purpose.
• Statistics are collected in a systematic manner.
• Statistics must be comparable to each other.

What are Statistical Methods?


A common goal for a statistical research project is to investigate causality, and in
particular to draw a conclusion on the effect of changes in the values of predictors or
independent variables on dependent variables or response. There are two major types of
causal statistical studies: experimental studies and observational studies. In both types of
studies, the effect of differences in an independent variable (or variables) on the behavior
of the dependent variable is observed. The difference between the two types lies in how
the study is actually conducted. Each can be very effective. An experimental study
involves taking measurements of the system under study, manipulating the system, and
then taking additional measurements using the same procedure to determine if the
manipulation has modified the values of the measurements. In contrast, an observational
study does not involve experimental manipulation. Instead, data are gathered and
correlations between predictors and response are investigated.

i) Inferential statistics is the process of drawing conclusions from data that are subject
to random variation, for example, observational errors or sampling variation. More
substantially, the terms statistical inference, statistical induction and inferential
statistics are used to describe systems of procedures that can be used to draw conclusions
from datasets arising from systems affected by random variation. Initial requirements of
such a system of procedures for inference and induction are that the system should
produce reasonable answers when applied to well-defined situations and that it should be
general enough to be applied across a range of situations.

The outcome of statistical inference may be an answer to the question "what should be
done next?", where this might be a decision about making further experiments or surveys,
or about drawing a conclusion before implementing some organizational or governmental
policy.

ii) Statistical Forecasting

Statistical forecasting: Estimating the likelihood of an event taking place in the future,
based on available data.

Statistical forecasting concentrates on using the past to predict the future by identifying
trends, patterns and business drivers within the data to develop a forecast. This forecast is
referred to as a statistical forecast because it uses mathematical formulas to identify the
patterns and trends while testing the results for mathematical reasonableness and
confidence.
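
As a rough illustration of using past data to project a trend, the sketch below fits a straight line to a short series by least squares and extends it one period ahead. The sales figures and the function name linear_trend_forecast are hypothetical, chosen only for this example; real forecasting work would normally also check how well the trend fits.

```python
# Hypothetical sales figures for four past periods (illustrative only).
sales = [10.0, 12.0, 14.0, 16.0]


def linear_trend_forecast(values, steps_ahead=1):
    """Fit y = a + b*t by least squares (t = 0, 1, 2, ...) and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Slope b = sum of cross-deviations / sum of squared deviations of t
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean
    # The last observed period is t = n - 1, so step forward from there.
    return a + b * (n - 1 + steps_ahead)


next_value = linear_trend_forecast(sales)
print(f"Forecast for the next period: {next_value:.1f}")
```

With this perfectly linear series the forecast simply continues the pattern; with noisy data the same line would smooth over the fluctuations.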

Major functions of Statistics:


(1) Statistics helps in providing a better understanding and exact description of a
phenomenon of nature.

(2) Statistics helps in proper and efficient planning of a statistical inquiry in any field of
study.

(3) Statistics helps in collecting appropriate quantitative data.

(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and
graphic form for an easy and clear comprehension of the data.

(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon
through quantitative observations.

(6) Statistics helps in drawing valid inferences about the population parameters from the
sample data, along with a measure of their reliability.

Scope of statistics:
Statistics is considered by some to be a mathematical science pertaining to the collection,
analysis, interpretation or explanation, and presentation of data, while others consider it a
branch of mathematics concerned with collecting and interpreting data. Because of its
empirical roots and its focus on applications, statistics is usually considered to be a
distinct mathematical science rather than a branch of mathematics.

Statisticians improve the quality of data with the design of experiments and survey
sampling. Statistics also provides tools for prediction and forecasting using data and
statistical models. Statistics is applicable to a wide variety of academic disciplines,
including natural and social sciences, government, and business.

Statistical methods can be used to summarize or describe a collection of data; this is
called descriptive statistics. This is useful in research, when communicating the results of
experiments. In addition, patterns in the data may be modeled in a way that accounts for
randomness and uncertainty in the observations, and then used to draw inferences
about the process or population being studied; this is called inferential statistics.
Inference is a vital element of scientific advance, since it provides a prediction (based on
data) for where a theory logically leads. To further support the guiding theory, these
predictions are tested as well, as part of the scientific method. If the inference holds true,
then the descriptive statistics of the new data increase the soundness of that hypothesis.
Descriptive statistics and inferential statistics (a.k.a. predictive statistics) together
comprise applied statistics.

Limitations of statistics:
The important limitations of statistics are:

(1) Statistical laws are true on average. Statistics are aggregates of facts, so a single
observation is not a statistic; statistics deals with groups and aggregates only.

(2) Statistical methods are best applicable to quantitative data.

(3) Statistical methods cannot be applied to heterogeneous data.

(4) If sufficient care is not exercised in collecting, analyzing and interpreting the data,
statistical results might be misleading.

(5) Only a person who has an expert knowledge of statistics can handle statistical data
efficiently.

(6) Some errors are possible in statistical decisions. Inferential statistics in particular
involves certain errors, and we do not know whether an error has been committed or not.

What is data?
• Factual information, especially information organized for analysis or used to reason or
make decisions.
• Computer Science: numerical or other information represented in a form suitable for
processing by computer.
• Values derived from scientific experiments.

i) Primary data:

Primary data are first-hand information collected, compiled and published by an
organization for some purpose. They are the most original data in character and have not
undergone any sort of statistical treatment.
Example: Population census reports are primary data because they are collected,
compiled and published by the population census organization.

ii) Secondary data:

Secondary data are second-hand information that has already been collected by someone
(an organization) for some purpose and is available for the present study. Secondary
data are not pure in character and have undergone some treatment at least once.
Example: The economic survey of England is secondary data because its figures are collected
by more than one organization, like the Bureau of Statistics, the Board of Revenue, the Banks etc.

iii) Internal data:

Information, facts and data available from within a company's Information System.
Internal data is normally not accessible by outside parties without the company's express
permission.

Designing a questionnaire:
Questionnaires are an inexpensive way to gather data from a potentially large number of
respondents. Often they are the only feasible way to reach a number of reviewers large
enough to allow statistical analysis of the results. A well-designed questionnaire that is
used effectively can gather information on both the overall performance of the test
system as well as information on specific components of the system. If the questionnaire
includes demographic questions on the participants, they can be used to correlate
performance and satisfaction with the test system among different groups of users.
It is important to remember that a questionnaire should be viewed as a multi-stage
process beginning with definition of the aspects to be examined and ending with
interpretation of the results. Every step needs to be designed carefully because the final
results are only as good as the weakest link in the questionnaire process. Although
questionnaires may be cheap to administer compared to other data collection methods,
they are every bit as expensive in terms of design time and interpretation.

The steps required to design and administer a questionnaire include:

1. Defining the Objectives of the survey
2. Determining the Sampling Group
3. Writing the Questionnaire
4. Administering the Questionnaire
5. Interpretation of the Results

i) Pre-testing a questionnaire:

Ultimately, designing the perfect survey questionnaire is impossible. However,
researchers can still create effective surveys. To determine the effectiveness of your
survey questionnaire, it is necessary to pretest it before actually using it. Pretesting can
help you determine the strengths and weaknesses of your survey concerning question
format, wording and order.

There are two types of survey pretests: participating and undeclared.

• Participating pretests dictate that you tell respondents that the pretest is a practice
run; rather than asking the respondents to simply fill out the questionnaire,
participating pretests usually involve an interview setting where respondents are
asked to explain reactions to question form, wording and order. This kind of
pretest will help you determine whether the questionnaire is understandable.
• When conducting an undeclared pretest, you do not tell respondents that it is a
pretest. The survey is given just as you intend to conduct it for real. This type of
pretest allows you to check your choice of analysis and the standardization of
your survey. According to Converse and Presser (1986), if researchers have the
resources to do more than one pretest, it might be best to use a participatory
pretest first, then an undeclared test.

Editing of Data:
After collecting the data, whether from a primary or a secondary source, the next step is
its editing. Editing means the examination of collected data to discover any errors and
mistakes before presenting it. It has to be decided beforehand what degree of accuracy is
wanted and what extent of error can be tolerated in the inquiry. The editing of secondary
data is simpler than that of primary data.
Presentation of data:
Data can be presented in statistics by tabulation, charts, histograms, pie charts, box plots,
etc.

Examples

Survey results of the ages of students in the Adult Basic Education maths classes are
shown in this frequency table.

Age Interval(yrs) Frequency


15-19 13
20-24 15
25-29 20
30-34 10
35-39 8
40-44 4

Use this data to produce a frequency histogram.

Use the chart above and include a cumulative frequency column. From this draw an ogive
graph. Use the graph to find the median, lower quartile, upper quartile and lowest and
highest value.
Draw a box-plot.

Age Interval(yrs) Frequency Cumulative frequency


15-19 13 13
20-24 15 28
25-29 20 48
30-34 10 58
35-39 8 66
40-44 4 70

Lowest Value: 15
Highest Value: 44

Five-number summary
Lowest value: 15
25% Quartile: 21
50% Quartile (median): 26
75% Quartile: 32
Highest value: 44
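
The cumulative frequency column and the quartiles can also be checked numerically. The sketch below is only an approximation of reading the ogive: it assumes class boundaries 0.5 below and above the class limits (14.5, 19.5, ...) and assumes observations are spread evenly within each class. The function name grouped_quantile is made up for this example.

```python
# Age classes and frequencies from the table above.
class_limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
frequencies = [13, 15, 20, 10, 8, 4]

# A running total of the frequencies gives the cumulative frequency column.
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)


def grouped_quantile(p):
    """Interpolate the value below which a fraction p of the observations fall.

    Assumption: class boundaries lie 0.5 below the lower limit and 0.5 above
    the upper limit, and observations are spread evenly within each class.
    """
    target = p * cumulative[-1]
    below = 0  # cumulative frequency before the current class
    for (lower, upper), f, cum in zip(class_limits, frequencies, cumulative):
        if target <= cum:
            lower_boundary = lower - 0.5
            width = (upper + 0.5) - lower_boundary
            return lower_boundary + (target - below) / f * width
        below = cum
    raise ValueError("p must be between 0 and 1")


print("Cumulative frequencies:", cumulative)
for label, p in [("Lower quartile", 0.25), ("Median", 0.50), ("Upper quartile", 0.75)]:
    print(f"{label}: {grouped_quantile(p):.2f}")
```

Under these assumptions the interpolated values come out at about 21, 26.25 and 31.75. Values read off a hand-drawn ogive may differ somewhat from these figures.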

Box Plot
Classifications of data:
A. According to Nature
1. Quantitative data- information obtained from numerical variables (e.g. age, bills, etc.)
2. Qualitative data- information obtained from variables in the form of categories,
characteristic names or labels, or alphanumeric variables (e.g. birthdays, gender, etc.)

B. According to Source
1. Primary data- first-hand information (e.g. autobiography, financial statement)
2. Secondary data- second-hand information (e.g. biography, weather forecast from
newspapers)

C. According to Measurement
1. Discrete data- countable numerical observations
- whole numbers only
- has an equal whole-number interval
- obtained through counting (e.g. corporate stocks, etc.)
2. Continuous data- measurable observations
- decimals or fractions
- obtained through measuring (e.g. bank deposits, volume of liquid, etc.)

D. According to Arrangement
1. Ungrouped data- raw data
- no specific arrangement
2. Grouped data- organized set of data
- at least 2 groups involved
- arranged

Tabulation of data:

The process of placing classified data into tabular form is known as tabulation. A table is
a systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements whereas columns are vertical arrangements. A table may be simple, double or
complex depending upon the type of classification.

TYPES OF DATA CLASSIFICATION:


The process of arranging data into homogeneous groups or classes according to some
common characteristics present in the data is called classification.
For Example: In the process of sorting letters in a post office, the letters are classified
according to the cities and further arranged according to streets.

Bases of Classification:
There are four important bases of classification:
(1) Qualitative Base (2) Quantitative Base (3) Geographical Base (4) Chronological or
Temporal Base
(1) Qualitative Base:
When the data are classified according to some quality or attribute, such as sex,
religion, literacy, intelligence etc., the classification has a qualitative base.

(2) Quantitative Base:
When the data are classified by quantitative characteristics, like heights, weights,
ages, income etc., the classification has a quantitative base.

(3) Geographical Base:
When the data are classified by geographical region or location, like states,
provinces, cities, countries etc., the classification has a geographical base.

(4) Chronological or Temporal Base:
When the data are classified or arranged by their time of occurrence, such as years,
months, weeks, days etc., the classification has a chronological or temporal base.
For Example: Time series data.

Tabulation of table:
A) Parts of a table:

Construction of Statistical Table


A statistical table has at least four major parts and some other minor parts.
(1) The Title
(2) The Box Head (column captions)
(3) The Stub (row captions)
(4) The Body
(5) Prefatory Notes
(6) Foot Notes
(7) Source Notes
The general sketch of table indicating its necessary parts is shown below:

----THE TITLE----
----Prefatory Notes----

----Box Head----

----Row Captions---- ----Column Captions----


----Stub Entries---- ----The Body----

(1) The Title:


A title is the main heading, written in capitals and shown at the top of the table. It must explain
the contents of the table and throw light on the table as a whole. Different parts of the heading
can be separated by commas; no full stop should be used in the title.

(2) The Box Head (column captions):


The vertical headings and subheadings of the columns are called column captions. The space
where these column headings are written is called the box head. Only the first letter of the box
head is written in capitals and the remaining words must be written in small letters.

(3) The Stub (row captions):


The horizontal headings and subheadings of the rows are called row captions, and the space
where these row headings are written is called the stub.

(4) The Body:


It is the main part of the table which contains the numerical information classified with
respect to row and column captions.

(5) Prefatory Notes :


A statement given below the title and enclosed in brackets, usually describing the units of
measurement, is called a prefatory note.

(6) Foot Notes:


A foot note appears immediately below the body of the table and provides further
explanation.

(7) Source Notes:


The source note is given at the end of the table and indicates the source from which the
information has been taken. It includes information about the compiling agency, publication etc…

General Rules of Tabulation:

• A table should be simple and attractive. There should be no need of further explanations
(details).
• Proper and clear headings for columns and rows should be used.
• Suitable approximation may be adopted and figures may be rounded off.
• The unit of measurement should be well defined.
• If the observations are large in number they can be broken into two or three tables.
• Thick lines should be used to separate the data under big classes and thin lines to separate
the sub classes of data.

B) Types of table:

(1) Simple Tabulation or One-way Tabulation:


When the data are tabulated according to one characteristic, it is said to be simple
tabulation or one-way tabulation.
For Example: Tabulation of data on the population of the world classified by one
characteristic, like Religion, is an example of simple tabulation.

(2) Double Tabulation or Two-way Tabulation:


When the data are tabulated according to two characteristics at a time, it is said to
be double tabulation or two-way tabulation.
For Example: Tabulation of data on the population of the world classified by two
characteristics, like Religion and Sex, is an example of double tabulation.

(3) Complex Tabulation:


When the data are tabulated according to many characteristics, it is said to be
complex tabulation.
For Example: Tabulation of data on the population of the world classified by three or more
characteristics, like Religion, Sex and Literacy etc…, is an example of complex tabulation.
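
The difference between one-way and two-way tabulation can be sketched in a few lines of Python. The records below are hypothetical (the religion labels "A", "B", "C" and the counts are invented for illustration), and collections.Counter from the standard library is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical survey records (religion, sex); values are illustrative only.
records = [
    ("A", "Male"), ("A", "Female"), ("B", "Male"),
    ("B", "Female"), ("B", "Female"), ("C", "Male"),
]

# One-way (simple) tabulation: count by a single characteristic.
one_way = Counter(religion for religion, _ in records)

# Two-way (double) tabulation: count by two characteristics at a time.
two_way = Counter(records)

print("Religion  Count")
for religion in sorted(one_way):
    print(f"{religion:<9} {one_way[religion]}")

sexes = sorted({sex for _, sex in records})
print("\nReligion  " + "  ".join(sexes))
for religion in sorted(one_way):
    # Each cell is right-aligned under its column caption.
    row = "  ".join(f"{two_way[(religion, sex)]:>{len(sex)}}" for sex in sexes)
    print(f"{religion:<9} {row}")
```

A complex tabulation would extend the same idea by counting on tuples of three or more characteristics.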

Charting data:
We have discussed the techniques of classification and tabulation that help us in organizing the
collected data in a meaningful fashion. However, this way of presenting statistical data does
not always prove to be interesting to a layman. Too many figures are often confusing and fail to
convey the message.

One of the most effective and interesting alternative ways in which statistical data may be
presented is through diagrams and graphs. There are several ways in which statistical data may
be displayed pictorially, such as different types of graphs and diagrams.

Types of diagrams:
Simple Bar Chart

A simple bar chart is used to represent data involving only one variable classified on a
spatial, quantitative or temporal basis. In a simple bar chart, we make bars of equal width but
variable length, i.e. the magnitude of a quantity is represented by the height or length of the bars.
Multiple Bar Charts

In a multiple bar diagram, two or more sets of inter-related data are represented (a multiple
bar diagram facilitates comparison between more than one phenomenon). The technique of the simple
bar chart is used to draw this diagram, but the difference is that we use different shades, colors, or
dots to distinguish between different phenomena. We draw multiple bar charts when the total
of the different phenomena is meaningless.

Component Bar Chart

A sub-divided or component bar chart is used to represent data in which the total
magnitude is divided into different parts or components.
In this diagram, first we make simple bars for each class taking the total magnitude in that class and
then divide these simple bars into parts in the ratio of the various components. This type of diagram
shows the variation in different components within each class as well as between different
classes. A sub-divided bar diagram is also known as a component bar chart or stacked chart.
Percentage Component Bar Chart

A sub-divided bar chart may be drawn on a percentage basis. To draw a sub-divided bar chart on a
percentage basis, we express each component as a percentage of its respective total. In drawing a
percentage bar chart, bars of length equal to 100 for each class are drawn in the first step and
sub-divided in proportion to the percentages of their components in the second step. The diagram so
obtained is called a percentage component bar chart or percentage stacked bar chart. This type of
chart is useful for comparing components while holding the totals constant.
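
The first step of a percentage component bar chart, expressing each component as a percentage of its class total, might look like the sketch below. The class names, part names and values are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical component values for two classes (illustrative only).
components = {
    "Class 1": {"Part X": 30, "Part Y": 50, "Part Z": 20},
    "Class 2": {"Part X": 45, "Part Y": 90, "Part Z": 15},
}


def to_percentages(parts):
    """Express each component as a percentage of its class total."""
    total = sum(parts.values())
    return {name: 100 * value / total for name, value in parts.items()}


percentages = {cls: to_percentages(parts) for cls, parts in components.items()}
for cls, parts in percentages.items():
    print(cls, {name: round(pct, 1) for name, pct in parts.items()})
```

Although the two classes have different totals (100 and 150), every bar built from these percentages has length 100, so only the shares of the components are compared.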

Pie Chart

A pie chart can be used to compare the relation between the whole and its components. A pie
chart is a circular diagram in which the area of each sector of the circle represents a component.
Circles are drawn with radii proportional to the square root of the quantities because the area
of a circle is πr².
To construct a pie chart (sector diagram), we draw a circle with radius equal to the square root
of the total. The total angle of the circle is 360°. The angle of each component is calculated by
the formula:

Angle of Sector = (Component Part / Total) × 360°

These angles are marked in the circle by means of a protractor to show the different components.
The arrangement of the sectors is usually anti-clockwise.
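
As a small worked illustration, the sector angles can be computed with the usual rule that each component receives its share of the full 360 degrees. The sketch reuses the age frequencies from the frequency table earlier in this assignment; treating them as pie chart components is an assumption made only for this example.

```python
# Frequencies taken from the age table earlier in the document.
frequencies = {"15-19": 13, "20-24": 15, "25-29": 20,
               "30-34": 10, "35-39": 8, "40-44": 4}

total = sum(frequencies.values())
# Angle of sector = (component part / total) * 360 degrees
angles = {label: f / total * 360 for label, f in frequencies.items()}

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before drawing the sectors with a protractor.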

Frequency Distribution
A frequency distribution is a tabular arrangement of data into classes according to size or
magnitude, along with the corresponding class frequencies (the number of values that fall in each class).
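
A minimal sketch of building a frequency distribution from raw data is given below. The raw weights are hypothetical, and the classes are treated as exclusive (lower value included, upper value excluded), which is an assumption of this example rather than the only possible convention.

```python
# Hypothetical raw weights in kg (illustrative only).
raw_weights = [61, 64, 66, 68, 69, 70, 72, 62, 67, 74, 65, 60]

# Exclusive classes: lower value included, upper value excluded.
classes = [(60, 65), (65, 70), (70, 75)]

# Count how many raw values fall in each class.
frequency_table = {
    (low, high): sum(1 for w in raw_weights if low <= w < high)
    for low, high in classes
}

print("Weights in kg   No of students")
for (low, high), count in frequency_table.items():
    print(f"{low} - {high}         {count}")
print(f"Total           {sum(frequency_table.values())}")
```

The total of the class frequencies should equal the number of raw observations, which is a useful check that no value was missed or counted twice.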

Ungrouped Data or Raw Data:


Data which have not been arranged in a systematic order are called ungrouped or raw data.

Grouped Data:
Data presented in the form of frequency distribution is called grouped data.

Array:
Numerical raw data arranged in ascending or descending order is called an array.

Example:
Array the following data in ascending and descending order: 6, 4, 13, 7, 10, 16, 19.
Solution:
Array in ascending order is 4, 6, 7, 10, 13, 16, and 19
Array in descending order is 19, 16, 13, 10, 7, 6, and 4
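
The same arrangement can be checked in Python with the built-in sorted function, shown here as a quick sketch on the data from the example.

```python
data = [6, 4, 13, 7, 10, 16, 19]

# sorted() returns a new list and leaves the raw data unchanged.
ascending = sorted(data)
descending = sorted(data, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)
```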

Class Limits:
The variant values of the classes or groups are called the class limits. The smaller value of
the class is called lower class limit and larger value of the class is called upper class limit. Class
limits are also called inclusive classes.
For Example: Let us take the class 10 – 19, the smaller value 10 is lower class limit and larger
value 19 is called upper class limit.

Class Boundaries:
The true values, which describe the actual class limits of a class, are called class
boundaries. The smaller true value is called the lower class boundary and the larger true value is
called the upper class boundary of the class. It is important to note that the upper class boundary
of a class coincides with the lower class boundary of the next class. Class boundaries are also
known as exclusive classes.
For Example:

Weights in Kg No of Students

60 – 65 8
65 – 70 12
70 – 75 5
Total 25

A student whose weight is 60 kg or more but less than 65 kg (for example, 64.5 kg) would be
included in the 60 – 65 class. A student whose weight is 65 kg would be included in the next
class, 65 – 70.
Open-end Classes:
A class that has either no lower class limit or no upper class limit in a frequency table is
called an open-end class. We do not like to use open-end classes in practice, because they create
problems in calculation.
For Example:

Weights (Pounds) No of Persons

Below – 110 6

110 – 120 12

120 – 130 20

130 – 140 10

140 – Above 2
Class Mark or Mid Point:
The class mark or mid point is the mean of the lower and upper class limits or boundaries, so
it divides the class into two equal parts. It is obtained by dividing the sum of the lower and upper
class limits or class boundaries of a class by 2.
For Example: The class mark or mid point of the class 60 – 69 is (60 + 69)/2 = 64.5

Size of Class Interval:


The difference between the upper and lower class boundaries (not between class limits) of
a class or the difference between two successive mid points is called size of class interval.
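
The class mark, class boundaries and size of class interval can be tied together in a short sketch. The function names and the gap parameter are made up for this example; gap is assumed to be the distance between the upper limit of one class and the lower limit of the next (1 for whole-number classes such as 60 – 69, 70 – 79).

```python
def class_mark(lower, upper):
    """Mid point of a class: mean of its lower and upper limits (or boundaries)."""
    return (lower + upper) / 2


def class_boundaries(lower_limit, upper_limit, gap=1):
    """Class boundaries for inclusive class limits.

    Assumption: `gap` is the distance between the upper limit of one class
    and the lower limit of the next (1 for whole-number data).
    """
    return lower_limit - gap / 2, upper_limit + gap / 2


def class_size(lower_limit, upper_limit, gap=1):
    """Size of class interval: upper boundary minus lower boundary."""
    low, high = class_boundaries(lower_limit, upper_limit, gap)
    return high - low


print(class_mark(60, 69))        # 64.5
print(class_boundaries(60, 69))  # (59.5, 69.5)
print(class_size(60, 69))        # 10.0
```

Note that the size of the class 60 – 69 is 10, not 9, because it is measured between the class boundaries rather than the class limits.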
