
Week 2. 11 March, 2013.

Chapter 4
Elementary Statistics

Descriptive Statistics
Planning and designing the study
Collecting data
Describing and summarising data
Presenting data and summaries as information

Planning and designing a study


scope - how broad is the study? E.g. Sydney, NSW or Australia?
purpose - e.g. should all stores in NSW or a single store in Sydney be used to measure typical sales?
resources - what time, money and staff are available for collecting data? Is collecting data expensive?
type of data - e.g. prices to the nearest cent, or preferences?
accuracy - trade accuracy of data against the cost of collection
data sources - collect the data yourself (primary), or has it already been collected for another purpose (secondary)?
reliability - e.g. are the people being interviewed honest?
method of collection - e.g. a survey, and if so how: interview or questionnaire?
validation - e.g. undertake a pilot survey

Types of data
Nominal or categorical (qualitative), e.g. name - cannot rank or measure difference
Ordinal, e.g. position in FTSE or Dow Jones - can rank but not measure difference
Cardinal (quantitative), e.g. age of person - can rank and measure difference (e.g. twice as old)
Discrete (e.g. name of country, price)
Continuous (e.g. time, weight)

Range of values
Variable        Possible range of values
Colours         [Red, Green, Blue]
Day in month    [1, 31]
Car mileage     [0, +∞)
Bank balance    (-∞, +∞)

Sampling
All possible data is the population (N items)
A sample is a subset of a population (n < N items)
A list of items in a population is a sample frame (e.g. electoral register)
A sample should represent the population (i.e. be unbiased)

Types of sampling
Judgemental (non-random) - e.g. expert opinion
Quota (non-random) - items selected from subsets of the population (e.g. males and females)
Random - items selected at random
Stratified - quota sampling with random selection
Cluster/multistage - random samples selected from random clusters (e.g. electoral districts)
Sampling may be with or without replacement (the sampled item is or is not returned to the sample frame)
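As an illustrative sketch (not part of the original slides), random sampling with and without replacement and stratified sampling can be tried with Python's standard `random` module. The population of 100 numbered items, the male/female tags, the seed and the sample sizes are all made-up assumptions.

```python
import random

# Hypothetical population: 100 numbered items, each tagged with a stratum.
population = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(100)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def simple_random_sample(items, n, replace=False):
    """Random sampling, with or without replacement."""
    if replace:
        # The sampled item goes back into the frame, so repeats are possible.
        return [rng.choice(items) for _ in range(n)]
    # The sampled item is not returned, so no item appears twice.
    return rng.sample(items, n)


def stratified_sample(items, key, n_per_stratum):
    """Random selection within each subset (stratum) of the population."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    return {name: rng.sample(group, n_per_stratum) for name, group in strata.items()}


without = simple_random_sample(population, 10)
with_repl = simple_random_sample(population, 10, replace=True)
by_sex = stratified_sample(population, "sex", 5)
```

Here `by_sex` holds five randomly chosen items from each stratum, which is the "quota sampling with random selection" idea from the list above.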

Sampling methods
Observation - e.g. traffic survey: 2000 cars pass this road from 8-9 AM
Longitudinal - e.g. social trends
Experiments - e.g. drug trials
Surveys - e.g. preferences for products
  - interviews
  - questionnaires
  - panels, e.g. focus groups, Delphi

Survey errors
Coverage error - the sample frame is inadequate (e.g. with an out-of-date phone directory, people may be excluded from the survey).
Non-response error - not everyone responds to surveys, so bias may occur. Follow up non-responses with further visits, telephone calls, letters, email, etc.
Sampling error - cost limits sample size and chance dictates who or which item is included in the sample, so we make statements about the margin of (sampling) error (e.g. the results of a poll will be within 2 percentage points of the actual votes).
Measurement error - results from poorly designed surveys or questionnaires (e.g. badly worded questions) or from incorrectly calibrated instruments. Measuring devices must be calibrated before use and checked during and after use. Surveys and questionnaires should be well designed, structured and validated by a pilot study (i.e. a small-scale trial to identify problems).
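To give a feel for where a statement like "within 2 percentage points" can come from, here is a minimal sketch of the usual approximation for the margin of error of a sample proportion. The slides do not give a formula, so the assumptions here are mine: simple random sampling, the normal approximation, roughly 95% confidence (z = 1.96), and a hypothetical poll of 2401 respondents.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of sampling error for a sample proportion.

    Assumes simple random sampling and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 2401 respondents, 50% support for a candidate.
moe = margin_of_error(0.5, 2401)
print(f"+/- {100 * moe:.1f} percentage points")
```

Because the margin shrinks with the square root of n, halving it needs about four times as many respondents. This is the trade-off between cost and sample size noted under sampling error.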

Survey guidelines
Ask a series of related questions in a logical sequence (respondents lose interest if questions are presented at random).
Keep questions brief, simple and unambiguous (if respondents do not understand, they may give convenient or untrue answers).
Avoid hypothetical and conditional questions (e.g. "If you won the lottery, would you pay off your mortgage, buy a new car or have a holiday?"). Respondents may not have considered the possibilities.
Avoid leading questions (e.g. "Do you agree that broadsheets report news more accurately than tabloids?"). Respondents may conform rather than give honest opinions.
Avoid vague questions (e.g. "Do you usually drink more wine or beer?"). Respondents may drink neither, and it is unclear whether "more" refers to glasses, alcoholic content, etc.
Ask positive questions and avoid apologies (e.g. instead of "I hope you don't mind me asking, but do you usually buy a daily paper?", just ask "Did you buy a daily paper today?").

Presenting data in tables


Always give the table a title.
If the table is one of several in a report, essay, book, etc., clearly number the table for cross-referencing.
Use borders, underlining, shading, etc. to structure the table and make it easy to read.
Lay out the table neatly, with columns and rows in line.
Distinguish between the title, the row and column headings and the main body of data.
Justify numbers.
Use footnotes to explain abbreviations and conventions.
State any sources from which the data was drawn.

Example table
How shopping costs compare (prices in £)

Product                                       UK   France  Germany
Sirloin steak (per 500g)                    4.80     5.03     5.03
Chicken breast (per kg)                     6.99     5.00     6.10
Heineken cans (4 x 440 ml)                  3.38     1.84     1.09
Coca Cola (litre)                           1.18     0.81     0.91
Mars bars 5 pack (5 x 65 g)                 1.09     1.24     0.91
Colgate Total (100 ml)                      1.79     1.45     1.22
Gillette Blue 2 (fixed blade) 10 pack       2.67     2.25     2.90
Häagen-Dazs ice-cream (500 ml)              3.69     2.04     2.44
Olive oil (per 500g)                        1.85     1.77     1.37
Instant coffee granules (100g)              1.28     1.42     2.59
Kellogg's cornflakes (750g)                 1.38     2.20     1.40
Tuna (185 g)                                0.47     0.36     0.45
Tropicana orange juice (1 litre)            1.99     1.32     1.22
Total basket                               32.56    26.73    27.63

Source: Sunday Times 8/10/2000
Notes: UK prices based on Tesco (where identical items were not available, the nearest equivalent was chosen). All currencies converted to sterling on day of survey.
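A quick way to check a table like this before presenting it is to recompute the totals row. The sketch below copies the thirteen prices per country from the example table, in row order, and sums them. The variable names are illustrative only.

```python
# Prices in pounds sterling, copied from the example table in row order.
prices = {
    "UK":      [4.80, 6.99, 3.38, 1.18, 1.09, 1.79, 2.67, 3.69, 1.85, 1.28, 1.38, 0.47, 1.99],
    "France":  [5.03, 5.00, 1.84, 0.81, 1.24, 1.45, 2.25, 2.04, 1.77, 1.42, 2.20, 0.36, 1.32],
    "Germany": [5.03, 6.10, 1.09, 0.91, 0.91, 1.22, 2.90, 2.44, 1.37, 2.59, 1.40, 0.45, 1.22],
}

# Round to pence to hide floating-point noise from the addition.
totals = {country: round(sum(values), 2) for country, values in prices.items()}

for country, total in totals.items():
    print(f"{country:8s} {total:6.2f}")
```

The recomputed totals agree with the "Total basket" row (32.56, 26.73 and 27.63).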

Presenting data - charts


To draw any type of chart, first put the data into a table.
Always give the chart a title.
If the chart is one of several in a report, essay, book, etc., clearly number the chart for cross-referencing.
Draw neatly - use a ruler and graph paper, or a spreadsheet.
Use borders, underlining, shading, etc.
Each axis should have a title, show the units of measurement being used and be clearly labelled with text or numbers.
If more than one set of data is shown on a chart, use a legend or key and different symbols or colours.
Always left justify or use the same format for all numbers.
If necessary, add footnotes for abbreviations and conventions.
If appropriate, state the source from which the data was drawn.

Charts
Scatter-grams
Graphs
Pie charts
Bar charts
Pictograms
Histograms
Ogives
Frequency polygons
Lorenz curves

Scatter-grams
Month    1    2    3    4    5    6    7    8    9   10   11   12
Price   16   18   20   25   28   30   28   24   24   22   25   25
Sales  280  300  300  195  155  150  160  250  245  280  200  210

[Chart: "Sales against price" - Sales (units), axis 0 to 500, against Price (£'s), axis 0 to 50]
[Chart: "Sales against price" - the same data with rescaled axes: Sales (units) 100 to 400, Price (£'s) 15 to 35]

Graphs
Month    1    2    3    4    5    6    7    8    9   10   11   12
Price   16   18   20   25   28   30   28   24   24   22   25   25
Sales  280  300  300  195  155  150  160  250  245  280  200  210

[Chart: "Sales at different prices" - Sales (pairs), axis 0 to 350, against Prices (£'s), axis 0 to 35]
[Chart: "Price and sales over time" - Price (£'s), 0 to 35, on the left axis and Sales (pairs), 0 to 350, on the right axis, against Month, 1 to 12]

Pie charts
Sales by Department

Department    Sales   % sales   Angle (degrees)
Clothing      25000      35         126.0
Hardware       9000      13          46.8
Food          37000      52         187.2
Total         71000     100         360.0

[Charts: two pie charts titled "Sales by Department" - Clothing 35%, Hardware 13%, Food 52%]
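The percentage and angle columns can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, using the department sales from the table. The rounding of each share to a whole percent before converting to degrees is inferred from the tabulated values.

```python
sales = {"Clothing": 25000, "Hardware": 9000, "Food": 37000}
total = sum(sales.values())

rows = {}
for department, value in sales.items():
    percent = round(100 * value / total)   # whole-number share, as in the table
    angle = round(percent * 360 / 100, 1)  # 1% of the pie is 3.6 degrees
    rows[department] = (percent, angle)

for department, (percent, angle) in rows.items():
    print(f"{department:9s} {percent:3d}%  {angle:6.1f} degrees")
```

Working from the unrounded shares instead would give slightly different angles (about 126.8 degrees for Clothing), so it is worth stating which convention a chart uses.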

Bar charts
Quarter        1      2      3      4
Clothing     900   1000   1200    800
Hardware    1100   1400    900    700
Food        8000   8000   9000  12000

[Chart: "Departmental Sales in each quarter" - bar chart of Sales (£'s), axis 0 to 14000, by Quarter (1 to 4), with a legend for Clothing, Hardware and Food]
[Chart: "Sales by department" - bar chart of Sales (£'s), axis 0 to 40000, by Department (Clothing, Hardware, Food), with a legend for Q1 to Q4]

Picto-grams

Millions of hectares under trees in each region of the UK. (Source: UK Forestry Commission 2000)


Grouped data
Daily sales of own brand baked beans in a London store:

142  108   42  108  180  102   52  106  102   88
136  100   50   94  164  130   44   88  168  105
160  156   84   90  152  114   80   82   98  121
116  132   74   60  138  150   56   58   60  126
124  152   76   62  112  156   90   64  163   47
 96  164   82   60  120  183   88   64  181   65

Classes and tally


Class   Class range   Lower boundary   Upper boundary   Tally
1        40 - 59           39.5             59.5        //// //
2        60 - 79           59.5             79.5        //// ////
3        80 - 99           79.5             99.5        //// //// //
4       100 - 119          99.5            119.5        //// ////
5       120 - 139         119.5            139.5        //// ///
6       140 - 159         139.5            159.5        //// /
7       160 - 179         159.5            179.5        ////
8       180 - 199         179.5            199.5        ///

(Each completed group "////" in the tally stands for five items.)

Frequency, % frequency and cumulative frequency distributions


Class   Class range   Frequency   Cumulative Frequency   % Frequency   % Cumulative Frequency
1        40 - 59          7               7                 11.67             11.67
2        60 - 79          9              16                 15.00             26.67
3        80 - 99         12              28                 20.00             46.67
4       100 - 119        10              38                 16.67             63.33
5       120 - 139         8              46                 13.33             76.67
6       140 - 159         6              52                 10.00             86.67
7       160 - 179         5              57                  8.33             95.00
8       180 - 199         3              60                  5.00            100.00
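The grouping can be checked by recomputing the distributions from the raw data. The sketch below is one possible way to do it in Python: it counts the 60 baked bean sales figures from the "Grouped data" slide into classes of width 20 starting at 40, then derives the cumulative and percentage columns. The variable names are illustrative.

```python
from itertools import accumulate

# The 60 sales figures from the "Grouped data" slide.
sales = [
    142, 136, 160, 116, 124, 96, 108, 100, 156, 132, 152, 164,
    42, 108, 180, 102, 52, 106, 102, 88, 50, 94, 164, 130,
    44, 88, 168, 105, 84, 90, 152, 114, 80, 82, 98, 121,
    74, 60, 138, 150, 56, 58, 60, 126, 76, 62, 112, 156,
    90, 64, 163, 47, 82, 60, 120, 183, 88, 64, 181, 65,
]

width = 20
lower_limits = range(40, 200, width)  # 40, 60, ..., 180

# Count how many values fall in each class (e.g. 40 - 59).
frequency = [sum(lo <= x < lo + width for x in sales) for lo in lower_limits]
cumulative = list(accumulate(frequency))

n = len(sales)
pct = [round(100 * f / n, 2) for f in frequency]
cum_pct = [round(100 * c / n, 2) for c in cumulative]

for lo, f, c, p, cp in zip(lower_limits, frequency, cumulative, pct, cum_pct):
    print(f"{lo:3d} - {lo + width - 1:3d}  {f:3d}  {c:3d}  {p:6.2f}  {cp:6.2f}")
```

The printed rows match the table: frequencies 7, 9, 12, 10, 8, 6, 5 and 3, with a total of 60.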

Histogram
[Chart: histogram, "Weekly sales of baked beans" - Frequency, axis 0 to 14, for each class of weekly sales from 40 - 59 to 180 - 199]

Ogives
[Chart: "Less than percent ogive for sales of baked beans in London store" - Cumulative Frequency (%), axis 0 to 100, against Sales, axis 40 to 200]
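The points of a "less than" percent ogive can be derived from the frequency table. The sketch below assumes the common convention of plotting each cumulative percentage against the upper boundary of its class, starting from 0% at the lowest boundary. The original chart image is not available here, so the exact x values it used are not confirmed.

```python
from itertools import accumulate

# Frequencies and upper class boundaries from the tables above.
frequency = [7, 9, 12, 10, 8, 6, 5, 3]
upper_boundaries = [59.5, 79.5, 99.5, 119.5, 139.5, 159.5, 179.5, 199.5]

total = sum(frequency)

# Start at the lowest boundary with 0%, then add one point per class.
points = [(39.5, 0.0)] + [
    (boundary, round(100 * running / total, 2))
    for boundary, running in zip(upper_boundaries, accumulate(frequency))
]

for boundary, percent in points:
    print(f"less than {boundary:5.1f}: {percent:6.2f}%")
```

Reading the points off, 63.33% of the recorded sales figures were below 119.5.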

Frequency polygon
Class range   Midpoint   Frequency
 20 - 39         30          0
 40 - 59         50          7
 60 - 79         70          9
 80 - 99         90         12
100 - 119       110         10
120 - 139       130          8
140 - 159       150          6
160 - 179       170          5
180 - 199       190          3
200 - 219       210          0

Lorenz curves
% of population   % of income   Cumulative % of population   Cumulative % of income
                                            0                          0
      45               5                   45                          5
      20               7                   65                         12
      15               8                   80                         20
      10              15                   90                         35
       5              17                   95                         52
       3              23                   98                         75
       2              25                  100                        100

[Chart: "Lorenz Curve" - Cumulative % of income, axis 0 to 100, against Cumulative % of population, axis 0 to 100]
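The two cumulative columns are running totals of the percentage columns, with an extra starting point at the origin. A minimal Python sketch of that step, using the figures from the table (variable names are illustrative):

```python
from itertools import accumulate

pct_population = [45, 20, 15, 10, 5, 3, 2]
pct_income = [5, 7, 8, 15, 17, 23, 25]

# The curve starts at the origin: 0% of the population holds 0% of the income.
cum_population = [0] + list(accumulate(pct_population))
cum_income = [0] + list(accumulate(pct_income))

# Points to plot: (cumulative % of population, cumulative % of income).
lorenz_points = list(zip(cum_population, cum_income))
print(lorenz_points)
```

For example, the point (80, 20) says that the poorest 80% of the population receives 20% of the income.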