
ST1CMT01

BASIC STATISTICS
AND
INTRODUCTORY PROBABILITY THEORY
Topics for bridge course

 Introduction to Statistics, Population and Sample, Collection of Data, Census and Sampling
 Methods of Sampling: Simple Random Sampling (with and without replacement), stratified
sampling, systematic sampling (method only)
 Types of data: quantitative, qualitative; Classification and Tabulation
 Diagrammatic representation: Bar diagram, Pie diagram
Statistics

 Statistics is a very broad subject that can apply to different fields.


 Statistics is the methodology for collecting, analyzing, interpreting and drawing
conclusions from information
 Statistics is the science of gaining information from numerical and categorical data.
 Statistical methods can be used to find answers to the questions like:
 What kind and how much data need to be collected?
 How should we organize and summarize the data?
 How can we analyze the data and draw conclusions from it?
 How can we assess the strength of the conclusions and evaluate their uncertainty?
That is, statistics provides methods for
1. Design: Planning and carrying out research studies.
2. Description: Summarizing and exploring data.
3. Inference: Making predictions and generalizing about phenomena represented by the
data.

Statistics in practice is applied successfully to study
 the effectiveness of medical treatments,
 the reaction of consumers to television advertising,
 the attitudes of young people towards marriage, and much more.
Limitations

 Studies only numerical facts


 Does not study individual cases
 Results are true only on an average
 Does not reveal the entire story of the problem
 Can be misused
 Only a means to finding the solution, not a solution to the problem
Population and Sample

 Two basic concepts of statistics

 The population is the set of individual persons or objects in which an investigator is
primarily interested for the research problem
 Sometimes the required measurements are obtained for all individuals in the population, but
often only a subset of the individuals is observed; such a subset constitutes a sample
Defn: Population is the collection of all individuals or items under consideration in a
statistical study.

Defn: Sample is that part of the population from which information is collected.
Statistical Investigation

 Planning the enquiry


 Collection of data
 Organization of Data
 Presentation of the organized data
 Analyzing the presented data
 Interpretation of the collected data
Collection of data

 Two types - Primary & Secondary


 Primary Data - data collected by the investigator for the first time for a specific purpose;
they are original in character
 Expensive & Time consuming
 Methods of collection of primary data
1. Direct personal investigation
2. Indirect oral investigation
3. By schedules & Questionnaires
4. Local Correspondents
Direct personal investigation

 Situations
 Area of investigation is limited
 Higher degree of accuracy needed
 Results have to be kept confidential
 Area of investigation is complex

 Merits
 Original data can be collected
 Reliable & Accurate information
 Uniformity in the collection of data

 Demerits
 Cannot be used when the area of investigation is wide
 More expensive and time consuming
Indirect oral investigation

 Applied to
 Field of investigation is very vast
 Informants are indifferent or unwilling to supply the information

 Data are not collected directly from the persons


 Collected from third party (witnesses).
 Advantages:
 Suitable when the field of investigation is very large
 Economical and saves time
 Simple & convenient

 Disadvantages:
 Results are not always true
 Informants may not be serious in furnishing information
By schedules & Questionnaires

 A list of questions, called a questionnaire, is prepared
 Information is collected from different sources
 Success depends on proper drafting
 Schedules - handled by the interviewer, who records the replies to the questions in the
questionnaire
Local Correspondents

 The investigator appoints local agents or correspondents in different places to collect
information

 They collect and transmit the information to the central office, where the data are
processed

 E.g. newspaper agencies

 Cheap & suitable for extensive investigation

 May not ensure accurate results


Secondary Data

 Data that is already collected by someone else and is utilized by the investigator for his
purpose.
 Usually in the shape of finished products
 Less expensive & less time consuming
 May be collected from 2 sources
 Published Sources
 Unpublished sources
Published Sources

 Official publications of the Central, State & local governments


 Official publications of the Foreign Governments & International Bodies like UNO
 Reports and publications of Banks, Cooperative Societies
 Technical trade journals like Economica, Commerce
 Reports submitted by economists, research scholars

Unpublished Sources

 Not all statistical material is published


 Unpublished data such as records maintained by Govt. and private offices, studies made by
research institutions, scholars etc
Census & Sampling

 Census is a method of collecting data in which information is collected from every
individual of the population
 Sampling is the process of obtaining information about an entire population by examining
only a part of it.
Census                                              | Sampling
----------------------------------------------------|----------------------------------------------------
Collect information from all the units              | Collect information from only a representative part
Data collection is impossible in certain situations | It is possible in all situations
Has a merit of accuracy & adequacy                  | May not be accurate & adequate in some situations
Meant to study the population                       | Method for drawing conclusions about the population
Time consuming & costly                             | Less time consuming & less expensive
Sampling Methods

 A probability sampling scheme is one in which every unit in the population has a chance
(greater than zero) of being selected in the sample, and this probability can be accurately
determined. When every element in the population has the same probability of selection, this
is known as an 'equal probability of selection' (EPS) design.

 Non-probability sampling is any sampling method where some elements of the population have
no chance of selection (these are sometimes referred to as 'out of coverage') or where the
probability of selection cannot be accurately determined.
Sampling Methods

Probability / Random Sampling
 Simple Random Sampling
 Complex Random Sampling
   Stratified Sampling
   Systematic Sampling
   Cluster Sampling

Non-Probability / Non-random Sampling
 Purposive / Judgement Sampling
 Convenience Sampling
 Snowball Sampling
Simple Random Sampling:

 Every element has an equal chance of being selected as part of the sample.

 It is used when we do not have any prior information about the target population.

 For example: random selection of 20 students from a class of 50 students. Each student
has an equal chance of being selected. On any single draw the probability of selecting a given
student is 1/50; when the 20 students are drawn without replacement, the probability that a
given student is included in the sample is 20/50.
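
The two variants of simple random sampling named in the topic list (with and without replacement) can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the roster of 50 numbered students and the fixed seed are assumptions made for the example, not data from the course.

```python
import random

# Hypothetical class roster: students numbered 1..50 (illustrative only).
students = list(range(1, 51))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Without replacement (SRSWOR): a student can appear at most once.
srswor = rng.sample(students, k=20)

# With replacement (SRSWR): the same student may be drawn more than once.
srswr = rng.choices(students, k=20)

print("SRSWOR:", sorted(srswor))
print("SRSWR: ", sorted(srswr))
```

In the first sample all 20 students are distinct by construction; in the second, repeats are possible because each draw is made from the full roster again.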
Stratified Sampling

 This technique divides the elements of the population into small subgroups (strata) based
on similarity, in such a way that the elements within each subgroup are homogeneous while the
subgroups differ from one another (heterogeneous between strata).

 The elements are then randomly selected from each of these strata. We need prior
information about the population to create the subgroups.
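
A minimal sketch of stratified sampling, assuming the same fraction is drawn from every stratum (proportional allocation, which is one common choice and not prescribed by these notes). The function name `stratified_sample`, the year-wise strata and the 20% fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction at random (without replacement) from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                      # group units by their stratum label
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():         # simple random sample inside each stratum
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 30 first-year and 20 second-year students.
population = [("year1", i) for i in range(30)] + [("year2", i) for i in range(20)]
sample = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.2)
print(sample)
```

With these made-up numbers the sample contains 6 first-year and 4 second-year students, so each stratum is represented in proportion to its size.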
Cluster Sampling

 The entire population is divided into clusters or sections, and then clusters are randomly
selected.
 All the elements of each selected cluster are included in the sample.
 Clusters are identified using details such as age, sex, location etc.
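
The cluster sampling procedure can be sketched as below. The six villages, their household labels and the choice of two clusters are assumptions for illustration only.

```python
import random

# Hypothetical population: 6 villages (clusters), each with 5 households.
clusters = {
    f"village_{c}": [f"v{c}_h{h}" for h in range(1, 6)]
    for c in range(1, 7)
}

rng = random.Random(7)  # fixed seed so the draw is reproducible

# Step 1: randomly select whole clusters, not individual households.
chosen = rng.sample(sorted(clusters), k=2)

# Step 2: every element of each chosen cluster enters the sample.
cluster_sample = [unit for name in chosen for unit in clusters[name]]

print(chosen)
print(cluster_sample)
```

Unlike stratified sampling, randomness enters only at the cluster level here; inside a selected cluster nothing is left out.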
NON PROBABILITY SAMPLING
Convenience Sampling
 Here the samples are selected based on availability.
 This method is used when the availability of the sample is rare and also costly.
 So the samples are selected based on convenience.

Purposive Sampling
• This is based on the intention or the purpose of the study.
• Only those elements which best suit the purpose of our study are selected from the
population.

Snowball Sampling
• This technique is used in situations where the population is completely unknown and
rare.
• Therefore we take the help of the first element selected from the population and ask
him or her to recommend other elements who fit the description of the sample needed.
Classification & Tabulation

 Classification is a process of arranging things or data in groups or classes according to
their resemblances and affinities.
 When data are classified they give a summary of the whole information. So it can be a
process of summarizing the data.

Objectives / purposes of classifications
i) To simplify and condense the large data
ii) To present the facts easily in understandable form
iii) To allow comparisons
iv) To help to draw valid inferences
v) To relate the variables among the data
vi) To help further analysis
vii) To eliminate unwanted data
viii) To prepare tabulation
Types of Classification

a) Geographical Classification

 In geographical classification, the classification is based on the geographical regions.

b) Chronological Classification

• If the statistical data are classified according to the time of occurrence, the
classification is called chronological classification.
c) Qualitative Classification

 In qualitative classification, the data are classified according to the presence or absence
of attributes in given units.
 Thus, the classification is based on some quality characteristics / attributes.
 Ex: Sex, Literacy, Education, Class grade etc.

i) Simple classification:
• If the classification is done into only two classes then the classification is known as
simple classification.
Ex: a) Population into Male / Female
b) Population into Educated / Uneducated

ii) Manifold classification:
• In this classification, the classification is based on more than one attribute at a time.
d) Quantitative Classification:

• In quantitative classification, the classification is based on quantitative measurements
of some characteristic, such as age, marks, income, production, sales etc.
• The quantitative phenomenon under study is known as a variable, and hence this
classification is also called classification by variable.
Tabulation

 Tabulation may be defined as the systematic arrangement of data in columns and rows.

 It is designed to simplify presentation of data for the purpose of analysis and statistical
inferences.

Differences between Classification and Tabulation


1. Data are first classified and then presented in tables; classification is the basis for tabulation.
2. Tabulation is a mechanical function of classification, because in tabulation the classified
data are placed in rows and columns.
3. Classification is a process of statistical analysis, while tabulation is a process of
presenting data in a suitable structure.
a) Simple table: Data are classified based on only one characteristic
b) Two-way table: Classification is based on two characteristics
Frequency Distribution

 Frequency distribution is a table used to organize the data.


 The left column (called classes or groups) includes numerical intervals on a variable under
study.
 The right column contains the list of frequencies, or number of occurrences of each
class/group.
 Definition:-
 A frequency distribution is a statistical table which shows the set of all distinct values of the
variable arranged in order of magnitude, either individually or in groups with their corresponding
frequencies.
 A frequency distribution can be classified as
a) Series of individual observation
b) Discrete frequency distribution
c) Continuous frequency distribution

Series of individual observation

A series of individual observations is a series where the items are listed one after another,
each observation separately. For statistical calculations, these observations can be arranged
in either ascending or descending order. Such an arrangement is called an array.
Discrete (ungrouped) Frequency Distribution

If the data series is presented so that each value of the variable is shown with its exact
measurement, together with the number of times it occurs, it is called a discrete frequency
distribution.

Suppose a survey of 10 families selected at random records the number of post-graduates in
each family; the resulting raw data could be as follows: 0, 1, 3, 1, 0, 2, 2, 2, 2, 4
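
The raw data above can be turned into a discrete frequency distribution by counting how often each distinct value occurs. A minimal sketch using `collections.Counter` from the Python standard library (the variable names are illustrative):

```python
from collections import Counter

# Number of post-graduates in each of the 10 surveyed families (from the text).
raw = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]

freq = Counter(raw)  # maps each distinct value to its frequency

print("Post-graduates  Families")
for value in sorted(freq):          # distinct values in order of magnitude
    print(f"{value:>14}  {freq[value]:>8}")
print(f"{'Total':>14}  {sum(freq.values()):>8}")
```

The resulting frequencies are 2, 2, 4, 1 and 1 for the values 0, 1, 2, 3 and 4, which add up to the 10 families surveyed.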
Continuous frequency distribution
(grouped frequency distribution)

A continuous data series is one where the measurements are only approximations and are
expressed in class intervals within certain limits.

Marks obtained by 20 students in an exam out of 50 marks are as given below; convert the
data into continuous frequency distribution form.
 Inclusive Method - Inclusive class intervals are those in which both the lower and upper
limits are included in the class, for example 1–10, 11–20, 21–30, 31–40, etc.

 Exclusive Method - This method is used for series in which the upper limit of one class is
the lower limit of the next class. It is called an exclusive series because observations equal
to the upper limit of a class interval are not included in that class but in the next one,
for example 0–10, 10–20, 20–30 and so on.
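
The difference between the two methods shows up for a value that falls on a class limit. The sketch below assumes integer-valued observations and a class width of 10; the helper names `exclusive_class` and `inclusive_class` are illustrative and not part of the notes.

```python
def exclusive_class(x, width=10, start=0):
    """Return (lower, upper) with lower <= x < upper: the upper limit is excluded."""
    lower = start + ((x - start) // width) * width
    return (lower, lower + width)

def inclusive_class(x, width=10, start=1):
    """Return (lower, upper) with lower <= x <= upper: both limits are included."""
    lower = start + ((x - start) // width) * width
    return (lower, lower + width - 1)

for mark in (10, 19, 20, 21):
    print(mark, "exclusive:", exclusive_class(mark), "inclusive:", inclusive_class(mark))
```

A mark of 20 is placed in 20–30 under the exclusive method (it is excluded from 10–20), but in 11–20 under the inclusive method.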
Questions to be done in the class

1. Construct a frequency distribution, with a class interval size of 10, of the marks obtained
by 50 students of a class, which are given below (use the exclusive method):
23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15, 21, 51, 54, 72, 68, 36,
65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48, 50, 41, 57, 65, 54, 43, 56, 44, 30, 46, 67, 53
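
One way to check a hand-made tally for this exercise is to let a short script do the grouping. This is a sketch under the stated conditions (class width 10, exclusive method, so a mark equal to an upper limit goes to the next class); the variable names are illustrative.

```python
from collections import Counter

marks = [
    23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15, 21,
    51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48, 50, 41,
    57, 65, 54, 43, 56, 44, 30, 46, 67, 53,
]
width = 10

# Lower limit of the class containing each mark: 50 -> 50 (class 50-60), 59 -> 50.
counts = Counter((m // width) * width for m in marks)

for lower in range(min(counts), max(counts) + width, width):
    print(f"{lower}-{lower + width}: {counts.get(lower, 0)}")
print("Total:", sum(counts.values()))
```

The total printed at the end should equal 50, the number of students, which is a quick check that no mark was dropped or counted twice.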

2. q2
Diagrammatic representation

 Diagrams play an important role in statistical data presentation.


 Diagrams are nothing but geometrical figures like lines, bars, circles, squares, etc.
 Diagrammatic data presentation allows us to understand the data in an easier manner.
 Advantages
 Easy to understand
 Simplified presentation
 Reveals hidden facts
 Quick to grasp
 Easy to compare
 Universally accepted

 Limitations
 Provides vague ideas for those who seek an exact idea of the problem
 Limited information
 A possibility of misuse
 Restricts further data analysis
Types of Diagrams

1. Line Diagram
 In a line diagram, different values are represented by lines of varying lengths. The lines
are either horizontal or vertical.
 There is a uniform gap between successive lines. You can use this when the number of items
is very large.
 The income of 10 workers in a particular week was recorded as given below. Represent the
data by a line diagram.
Bar Diagram

 Represent the following data using a bar diagram:
 Multiple Bar Diagram
Component or Sub-Divided Bar Diagram
Circular or Pie Chart

 A pie chart consists of a circle in which the radii divide the area into sectors.
 Further, these sectors are proportional to the values of the component items under
investigation.
 Also, the whole circle represents the entire data under investigation.
 Steps to draw a Pie Chart
i. Express the different components of the given data in percentages of the whole
ii. Multiply each percentage component with 3.6 (since the total angle of a circle at
the center is 360°)
iii. Draw a circle
iv. Divide the circle into different sectors with the central angles of each component
v. Shade each sector differently
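
Steps i and ii above can be sketched numerically. The component names and values below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the export data referred to in these notes, and `pie_angles` is an illustrative helper name.

```python
def pie_angles(values):
    """Convert component values to central angles in degrees (percentage x 3.6)."""
    total = sum(values.values())
    angles = {}
    for name, value in values.items():
        percentage = 100 * value / total    # step i: share of the whole in percent
        angles[name] = percentage * 3.6     # step ii: 1% of the circle is 3.6 degrees
    return angles

# Hypothetical components of a whole (illustrative numbers only).
components = {"A": 50, "B": 30, "C": 20}
angles = pie_angles(components)

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
print(f"Sum of angles: {sum(angles.values()):.1f}")
```

Because the percentages add up to 100, the central angles always add up to 360°, which is a useful check before drawing the sectors.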
Represent the following data, on India’s exports (Rs. in Crores)
by regions from April to February 1997.
