CHAPTER ONE
INTRODUCTION TO STATISTICS
WHAT IS STATISTICS?
Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data. Based on the analyzed
data, conclusions can be drawn about the characteristics of the population and decisions can be made for future action. The
steps of statistical analysis involve collecting information, evaluating it, and drawing conclusions. The information might
be:
Statisticians provide crucial guidance in determining what information is reliable and which predictions can be trusted.
They often help search for clues to the solution of a scientific mystery, and sometimes keep investigators from being
misled by false impressions. Statisticians work in a variety of fields, including medicine, government, education,
agriculture, business, law and finance.
USES OF STATISTICS
Statistics is invaluable to economists, administrators, analysts, managers and even politicians. It can be used for
business, administrative or research purposes.
Statistics helps policy-makers and decision-makers to monitor the performance of the various sectors of the economy and,
by studying their trends, assists the government in planning the economy.
Managers use statistics by studying past trends and patterns to decide on future courses of action and on ways to
overcome possible future problems.
TYPES OF STATISTICS
1. Descriptive Statistics
This kind of statistics deals with developing and utilizing techniques for the careful collection and effective
presentation of data. The aim is to study the characteristics of the data. The study therefore mainly
involves the collection, organization, presentation and description of numerical information.
CHAPTER 1: INTRODUCTION TO STATISTICS STA404
Collecting and organizing the data are the basic steps of statistics before any presentation can be
done. Data can be presented using graphs or charts. Interpreting the graphs and charts helps us
evaluate the information presented and describe the characteristics of the data collected.
2. Inferential Statistics
This kind of statistics deals with the statistical tools and techniques that are used to analyze the data and to
make predictions, estimates or decisions by drawing conclusions from the data. Inferential statistics is used to
determine how far our decision about any information is true and acceptable. It is also used to estimate or draw
inferences about the attitudes or characteristics of the whole population based on a sample. It encompasses all
types of decisions. In principle, it enables an optimum decision to be made for any problem, especially if it relates
to the following:
i) Determining whether any apparent characteristics of a situation are genuine or are merely the result of random
chance.
ii) Assessing the magnitude of a numerical quantity and determining the reliability of such an assessment.
iii) Interpreting past patterns of variation to predict future happenings.
Like all professions, statisticians have their own keywords and phrases to make precise communication easier. However,
one must interpret the results of any decision-making in language that is easy for the decision-maker to understand.
Otherwise, he/she will not believe in what you recommend and will therefore not proceed to the implementation phase.
This lack of communication between statisticians and managers is the major barrier to using statistics.
Term Meaning
Element An element is an object on which a measurement is taken.
Population A population is a collection of elements of interest, or the measurements obtained
from all individuals or objects of interest.
Sample A sample is a portion or subset of the total group or population of interest.
Census A census is a study of the entire population.
Sample Survey A sample survey is a study of a sample, that is, of only a portion of the population.
Parameter A parameter is a numerical descriptive measure of the population. Parameters are
used to represent certain population characteristics.
For example, the population mean µ is a parameter that is often used to indicate
the average value of a quantity.
Statistic A statistic is a numerical descriptive measure taken from a sample. It is used to give
information about unknown values in the corresponding population.
For example, the average of the data in a sample is used to give information about
the overall average in the population from which that sample was drawn.
Variable A variable is a characteristic of the population under study which may take
different values, such as weight or gender, since these differ from individual to
individual.
Data Data are observations or information that have been recorded or collected.
Random The choice of a single item from a group is random if every item in the group has the
same chance of being selected as any other item.
Pilot Study A pilot, or feasibility, study is a small experiment designed to test logistics and gather
information prior to a larger study, in order to improve its quality and efficiency. The
pilot study provides vital information on the severity of proposed procedures.
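The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The ten weights, the sample size of 4 and the random seed below are made-up values chosen only for illustration; they do not come from these notes.

```python
import random
from statistics import mean

# Hypothetical population: weights (kg) of all 10 elements of interest.
population = [52, 61, 58, 70, 66, 49, 73, 64, 55, 62]

# Parameter: a numerical measure computed from the whole population (a census).
mu = mean(population)

# Statistic: the same measure computed from a sample drawn at random.
rng = random.Random(1)              # fixed seed so the run is repeatable
sample = rng.sample(population, 4)  # each element has the same chance of selection
x_bar = mean(sample)

print("population mean (parameter):", mu)
print("sample:", sample)
print("sample mean (statistic):", x_bar)
```

The sample mean generally differs from µ and changes from sample to sample, which is why it is treated as an estimate of the unknown population value.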
Example 1
In the automobile industry, customer service is a crucial factor affecting car sales. The management of a reputed
automobile company is interested in determining the level of customer satisfaction with the service provided by the
company’s service centers. The company has altogether 60 service centers in the Klang Valley. Six service centers were
selected for the study.
Solution:
Example 2
The local cable television company is planning to add one channel to its basic services. There are five channels to
choose from, and the company would like to have some input from 2000 subscribers. There are about 20,000
subscribers in Malaysia and the company knows that 35% of the subscribers are college students, 45% are white-
collar workers, 15% are blue-collar workers and 5% are others.
Solution:
TYPE OF VARIABLE
A variable is a measurable factor, characteristic, or attribute of an individual or a system; in other words, something that
might be expected to vary over time. For example, variables of interest may be absenteeism among students, the
household income of Malaysian citizens and the sales of cars.
Variable
Qualitative
• Measured with a non-numerical scale
• Yields a categorical response
• Example: gender, type of car, religion
• Example: Are you a Malaysian? The answer is only 'Yes' or 'No'.
Quantitative
• Measured on a numerical scale
• Yields a numerical response
Quantitative: Discrete
• Numerical response which arises from a counting process
• Example: How many children do you have?
Quantitative: Continuous
• Numerical response which arises from a measuring process
• Example: How tall are you? What is your weight?
The level of measurement of a variable in mathematics and statistics is a classification that is used to describe the
nature of the information contained within the numbers assigned to objects and therefore within the variable. The level of
measurement of the data is an important factor in determining which procedure to use. There are four levels of measurement:
nominal, ordinal, interval and ratio.
NOMINAL SCALE
Variables that are measured only nominally are also called categorical variables. In this type of measurement, names
are assigned to objects as labels. The data cannot be arranged in an ordering scheme (from low to high). The nominal
scale is the lowest level of data measurement. Variables measured at a nominal level include gender,
marital status, race, religious affiliation, college major, and birthplace. Other examples include geographical location
in a country, telephone access code, and the model of a car.
Example:
Male Female
Newspapers
Magazines
Television
Internet
Other
ORDINAL SCALE
When the numbers assigned to objects represent the rank order (1st, 2nd, 3rd, etc.) of the entities measured, the
numbers are called ordinals. Comparisons of greater and less can be made, in addition to equality and inequality. However,
operations such as conventional addition and subtraction are still meaningless. The ordinal scale is one level higher than
the nominal scale.
Example:
1. SPM
2. Diploma
3. Bachelor
4. Master
5. PhD
INTERVAL SCALE
The interval level is like the ordinal level, with the additional property that the difference between two data values is meaningful.
Data at this level do not have a natural zero starting point. An example of an interval scale is temperature: 0°F does not mean
“no heat” and 40°F is not twice as hot as 20°F.
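The temperature example can be checked numerically. The sketch below converts the two Fahrenheit readings to Celsius using the standard conversion formula; the extra 60°F reading and the variable names are chosen only for illustration. If a ratio were meaningful on an interval scale it would survive a change of units, but it does not, whereas differences stay consistent.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion between the two interval scales."""
    return (f - 32) * 5 / 9

f_low, f_high = 20.0, 40.0
c_low = fahrenheit_to_celsius(f_low)
c_high = fahrenheit_to_celsius(f_high)

# Ratios depend on where the arbitrary zero sits, so they change with the unit.
ratio_f = f_high / f_low
ratio_c = c_high / c_low

# Differences are meaningful: equal gaps in Fahrenheit give equal gaps in Celsius.
diff_c = c_high - c_low
diff_c_shifted = fahrenheit_to_celsius(60.0) - fahrenheit_to_celsius(40.0)

print("ratio in Fahrenheit:", ratio_f)
print("ratio in Celsius:", ratio_c)
print("Celsius gaps for two 20-degree Fahrenheit steps:", diff_c, diff_c_shifted)
```

The Fahrenheit ratio is 2, while the Celsius ratio is not even positive because 20°F lies below the Celsius zero, so "twice as hot" has no fixed meaning on this scale.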
RATIO SCALE
The ratio scale is the strongest scale of measurement. It contains a meaningful zero (absolute zero point) which
represents the absence of the phenomenon being measured. Examples of ratio measurements are the time spent studying per
day, the monthly amount spent on prepaid top-ups and the number of cigarettes smoked per day.
Example:
Example 3
Identify the type of variable and level of measurement for each statement.
i) Identity number
ii) Social class of resident
iii) Laptop price
iv) Amount of time spent shopping every week
v) Favourite shopping spot
vi) Number of UiTM students
vii) Brand of hand phone most preferred
Solution:
SAMPLING METHODS
Sampling is an effort to estimate the characteristics of a population by studying a small portion of the items in the
population. A sampling technique is the process used to select a sample that represents the population as accurately as
possible. The sampling frame is the list of all items in the population. The
frame must be complete, that is, no item of the population should be left out. It should not be defective because it is
out of date or contains inaccurate or duplicate items, nor inadequate because it does not cover all the categories required
to be included in the investigation. For example, telephone directories, town maps and some other lists provide
useful sampling frames.
Probability sampling techniques are used when a researcher plans to make inferences about the population. The sample
is selected based on known probabilities.
a) A simple random sample is selected from the population in such a way that each item has the same chance of
being selected, by using a chance method or random numbers.
b) In systematic sampling, we divide the population size (N) by the sample size (n) to obtain the interval k
(k = N/n). An element is then randomly selected from the first k elements in the list. Suppose the rth element is
selected; then every kth element in the population is sampled, beginning with the rth element. This means that
the elements chosen are the rth, (r+k)th, (r+2k)th, (r+3k)th, … and so on until the sample size n is obtained.
c) Researchers select stratified samples by dividing the population into groups according to some characteristic
that is important to the study, then sampling from each group.
d) Researchers select cluster samples by using intact groups called clusters. Cluster sampling is used when the
population is large.
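Two of the probability techniques above can be sketched in Python. The function names, the numbered frame of 60 units and the seed are hypothetical. The stratified part assumes proportional allocation, which is only one possible design, and borrows the subscriber percentages quoted in Example 2 as illustrative strata shares.

```python
import random


def systematic_sample(items, n, rng):
    """Return n items: a random start among the first k, then every k-th item."""
    k = len(items) // n              # sampling interval k = N/n (integer part)
    r = rng.randrange(k)             # 0-based position of the random start
    return [items[r + i * k] for i in range(n)]


def proportional_allocation(percent_by_stratum, n):
    """Split a total sample size n across strata in proportion to their shares."""
    return {name: pct * n // 100 for name, pct in percent_by_stratum.items()}


# Systematic sampling: 60 numbered units and a sample of 6, so k = 10.
frame = list(range(1, 61))           # hypothetical sampling frame
rng = random.Random(0)               # fixed seed so the run is repeatable
sys_sample = systematic_sample(frame, 6, rng)
print(sys_sample)

# Stratified sampling with proportional allocation (an assumed design).
shares = {"college students": 35, "white-collar": 45, "blue-collar": 15, "others": 5}
allocation = proportional_allocation(shares, 2000)
print(allocation)
```

The allocation divides evenly here; when the shares do not divide the sample size exactly, the integer division leaves a few units unassigned and the researcher has to decide how to distribute them.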
Non-probability Sampling techniques are used when generalization concerning the population is not required or
when sampling frames are difficult to obtain.
a) Convenience sampling is also referred to as accidental sampling. It is not normally representative of the target
population because sample units are only selected if they can be accessed easily and conveniently. Basically,
respondents are selected because they happen to be in the right place at the right time.
b) Judgmental sampling is used when a sample is taken based on certain judgments about the overall
population. Judgmental sampling is subject to the researcher’s biases and is perhaps even more biased than
convenience sampling.
c) Snowball sampling is a method in which a researcher identifies one member of some population of interest,
speaks to that person, and then asks him or her to identify others in the population that the researcher might speak to.
Each of these people is then asked to refer the researcher to yet another person, and so on.
d) Quota sampling divides the sample into quotas, which indicate the number of people to be interviewed,
but leaves the choice of the actual respondents to the interviewers.
TYPE OF DATA
1. PRIMARY DATA
Primary data are data expressly collected for a specific purpose.
The data are collected at first hand in response to specific questions and to satisfy the specific purpose of a statistical
inquiry, and have not yet been analyzed. The data are collected by the investigators themselves. Once the primary data have been
collected, processed and analyzed, the published set of data becomes secondary data.
For example, the Ministry of Education is interested in knowing the attitudes of students towards studying. To
get such information, the ministry will form a body to ask students about the matter. Suitable
questions are designed to fulfill their requirements. These questions will be answered by students, and the vocabulary
used should be suitable to the level of the people answering the questions to avoid misunderstandings.
ADVANTAGES
i) The information wanted is obtained, and the investigator is aware of any limitations the data may contain because
he/she knows the conditions under which they were collected.
DISADVANTAGES
i) Inconvenient: requires more time, effort, manpower and money.
2. SECONDARY DATA
This kind of data consists of figures and information which were collected originally to satisfy a particular
inquiry but are now being used, at second hand, as the basis for a different inquiry by another person. In other
words, secondary data are data taken from another investigator’s collection of figures. Often such data were
collected for some other purpose. It is impossible for users of secondary data to have as thorough an understanding of
the background as the original investigators, and thus they may not be aware of its limitations. Such data must be used
with great care because they may not give the exact kind of information wanted and may not be in the most
suitable form.
ADVANTAGES
i) More convenient (requires less time, effort and money).
ii) Data help you decide what further research needs to be done.
DISADVANTAGES
i) Transcription errors.
ii) May not meet the specific needs and objectives of the current research.
iii) Not all data are readily available, or they may be expensive.
Data collection is important because the analysis and conclusions rely on it. That is why the validity of the analysis
depends upon the contents of the data and how the data are collected. Three important aspects of choosing the right method
of collecting data or doing research are:
i) How to choose a respondent
ii) How to contact the selected respondent
iii) What information is needed from the respondent
There are a few methods of getting information from a sample. The best method for given conditions depends on:
i) The budget available for the research, especially the amount allocated for field work.
ii) The time allocated for the research. This is important so that the research can be finished on time.
iii) The accuracy of the result needed.
iv) The distribution of the sample. A suitable method is needed to handle a sample that is widely distributed,
in order to save expenses.
a) Face-to-face interview
b) Telephone interview
c) Direct questionnaire (questionnaires are distributed and collected personally)
d) Mail or postal questionnaire (questionnaires are sent and received back through the post)
e) Direct observation (respondents are observed and data recorded)
In an observational study, the researcher merely observes what is happening or what has happened in the past and
tries to draw conclusions based on these observations.
In an experimental study, the researcher manipulates one of the variables and tries to determine how the manipulation
influences other variables.
Review Exercises 1
a) A study of statistics can be divided into two sections: qualitative and quantitative methods.
b) Inferential statistics consists of methods dealing with collection, tabulation, summarization and
presentation of data.
c) A population is a collection of individuals that the researcher wishes to study.
d) The Cumulative Grade Point Average (CGPA) score of a student is a qualitative variable.
e) When constructing questionnaires, a researcher must ensure that the questions are related to and
satisfy the objectives of the research.
c) Census study
d) Secondary data
e) Inferential Statistics
f) Discrete random variable