
Statistics 1

Quantitative data analysis learning outcomes


- Explain and put to effective use a number of techniques appropriate for exploratory numerical data analysis
- Critically interpret and evaluate the results of such analyses
- Understand the nature of variables and levels of measurement
- Effectively describe a single variable through measures of dispersion, central tendency and graphical depiction
- Explain the nature of association: how it can be measured and the relationship between pairs of variables
- Understand the dangers of sampling error and the limitations of statistical data and its analysis

SPSS learning outcomes


You will also be expected to be able to use SPSS to effectively:
- Code and input questionnaire or other statistical data
- Transform and recode data into new variables as appropriate
- Produce univariate and bivariate statistical analyses of any given data set
- Produce graphical depictions of any given data set
- Edit and transfer computer output to Word documents for the production of professional reports

Assessment

Manipulation and presentation of data based on a given SPSS data set

Suggested reading
Field, A. (2009). Discovering statistics using SPSS. London: Sage.

Other textbooks are available...


Dancey, C. & Reidy, J. (2008). Statistics without maths for psychology: using SPSS for Windows. London: Prentice Hall.
Acton, C. & Miller, R. (2009). SPSS for social scientists. London: Palgrave Macmillan.

Social sciences and statistics


Why use statistics?

- To answer research questions
- To test predictions
- To understand and explain the world in an efficient way (1 + 1 + 1 + 1 + 1)
- To build models we can use to make inferences/predictions about the world

The Research Process

Basic Terms

Measurement
Data
Variable: the word that describes what has been measured, e.g. weight, gender, happiness
Sample: phenomena from which data have been collected
Population: all possible data, if infinite time and access were available
Experiment

Measurement
Assignment of a number to something

To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured.
There are many different types of data, and each has certain rules attached. For example, a person can only be living or dead, an Olympic medal winner can only be First, Second or Third, but a length can be almost any number.

Confusing? Next week, we will discuss these levels of measurement in detail.

Data
The measurements obtained in a research study are called the data (plural of datum). The goal of statistics is to help researchers organize and interpret data.

Variables
A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship between two variables for a specific group of individuals, e.g. eating breakfast and concentration

Often we are interested in one particular variable (we call this the dependent variable) and how other variables (independent variables) are related to it.
How are crime rates in a city (DV) affected by unemployment rates (IV)?
What is the relationship between childhood victimisation (IV) and offending (DV)?

There is no rule about which should be the DV - it depends on what we are interested in as researchers, but the outcome is usually a criminological one
Dependent = Outcome

Independent = Predictor

Population
The entire group of individuals is called the population. For example, a researcher may be interested in the relation between experience of bullying (variable 1) and academic performance (variable 2) for the population of Year 7 children in Hong Kong.

Sample
Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help make inferences about the population. It is likely to be impossible to measure these variables for every Year 7 child in Hong Kong, so we obtain a sample of 300 Year 7 children who we think share the characteristics of the population.
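The sample-to-population logic above can be sketched as a small simulation. This is a minimal illustration under stated assumptions: the "population" is 10,000 made-up academic-performance scores between 0 and 100 (not real data about Hong Kong pupils), and the sample is a simple random sample of 300 drawn with Python's standard `random` module. The gap between the sample mean and the population mean is sampling error.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated academic-performance scores (0-100).
# These numbers are invented for illustration only.
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [rng.randint(0, 100) for _ in range(10_000)]

# Draw a simple random sample of 300 without replacement,
# mirroring the sample of 300 Year 7 children described above.
sample = rng.sample(population, 300)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The sample mean is our estimate of the population mean;
# the difference between the two is sampling error.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Sampling error:  {sampling_error:+.2f}")
```

Changing the seed or the sample size shows that a different sample gives a slightly different estimate; in real research we only ever see one sample and never the full population.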

Levels of measurement
There are many ways to measure (assign numbers) variables. The way in which we do this depends upon the characteristics of the variable.

We cannot measure gender on a scale of 1 to 10; there are only a limited number of options (categories).
We cannot rank colours: red is not more than blue.

We may wish to differentiate between lengths or scores: 250 m is 100 m longer than 150 m, and it is twice as long as 125 m.
Different levels of measurement require different types of statistics, so it is very important that we can identify the correct level of measurement

4 Types of Measurement Scales


Categorical variables
1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different, e.g. Yes/No; Whipped cream/No whipped cream; Red/Green/Blue; Caramel/Hazelnut/Vanilla.
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of difference between two individuals, e.g.
   Position: First/Second/Third
   Preference: Don't like/Undecided/Like
   Size: Small/Regular/Large
   Priority: Non-urgent/Standard/Urgent/Very urgent/Immediate



Continuous variables
3. An interval scale is an ordered series of equal-sized categories, e.g. scores on an intelligence test, length of prison sentence. Interval measurements identify the direction and magnitude of a difference.
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements, e.g. a score of 4 on a measure represents twice as much of the variable as a score of 2. Examples: reaction times, weight, height, counts.
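Because different levels of measurement require different types of statistics, it can help to write the rules down explicitly. The sketch below assumes a common textbook rule of thumb (mode for nominal data, median for ordinal data, mean for interval/ratio data, and ratio comparisons only on a ratio scale); the names `Level`, `PERMITTED_AVERAGES` and `allows_ratio_comparison` are hypothetical, not part of SPSS or any library.

```python
from enum import Enum


class Level(Enum):
    """The four levels of measurement described above."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Assumed rule of thumb: each level permits the averages of the level below it, plus more.
PERMITTED_AVERAGES = {
    Level.NOMINAL: ["mode"],
    Level.ORDINAL: ["mode", "median"],
    Level.INTERVAL: ["mode", "median", "mean"],
    Level.RATIO: ["mode", "median", "mean"],
}


def allows_ratio_comparison(level: Level) -> bool:
    """Only a ratio scale has a true zero, so only it supports 'twice as much'."""
    return level is Level.RATIO


# Example variables taken from the slides above.
examples = {
    "Red/Green/Blue": Level.NOMINAL,
    "First/Second/Third": Level.ORDINAL,
    "Intelligence test score": Level.INTERVAL,
    "Reaction time": Level.RATIO,
}

for variable, level in examples.items():
    print(
        f"{variable}: {level.value}, averages: {PERMITTED_AVERAGES[level]}, "
        f"ratio comparisons: {allows_ratio_comparison(level)}"
    )
```

The design choice here is simply to make the level of measurement an explicit property of each variable, so the choice of statistic follows from it rather than from habit.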

Types of Data Analysis


Types of data
Univariate: when we use one variable to describe a person, place, or thing, e.g. number of crimes in an area per week
Bivariate: when we use two variables to describe a person, place, or thing, e.g. number of crimes in an area per week and average income of the area
The type of data will dictate (in part) the appropriate data-analysis method.

Univariate Analysis/Descriptive Statistics


Descriptive Statistics
Central tendency: Mean (average), Median (middle), Mode (most frequent)
Dispersion: The Range, Min/Max, Variance, Standard Deviation
Histograms
Normal Distributions
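Most of the descriptive statistics listed above can be computed with Python's standard-library `statistics` module. This is a sketch, not SPSS output: it reuses the eleven scores from the median example in this unit, and it assumes the sample (n - 1) versions of the variance and standard deviation, which is what `statistics.variance` and `statistics.stdev` return.

```python
import statistics

# Scores from the median example in this unit (n = 11).
scores = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)

# Dispersion
minimum, maximum = min(scores), max(scores)
value_range = maximum - minimum
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation (square root of the variance)

print(f"Mean: {mean:.2f}, Median: {median}")
print(f"Min: {minimum}, Max: {maximum}, Range: {value_range}")
print(f"Variance: {variance:.2f}, SD: {std_dev:.2f}")
```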

The mean: a simple statistical model


In statistics we fit models to our data (i.e. we use a statistical model to represent what is happening in the real world). The mean is a hypothetical value (i.e. it doesn't have to be a value that actually exists in the data set). As such, the mean is a simple statistical model.

The mean is also known as the average

The Mean
The mean is the sum of all scores divided by the number of scores.

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The Mean: Example


Collect some data:
How many items are in your bag? Add them up:

$$\sum_{i=1}^{n} x_i$$

Divide by the number of scores, n:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
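The two steps of the bag example (add the scores up, then divide by n) can be written out directly. The item counts below are invented for illustration; they are not data from any class.

```python
# Hypothetical answers to "How many items are in your bag?" from 8 students.
items = [5, 12, 8, 3, 9, 6, 11, 3]

total = sum(items)      # step 1: add them up (the sum of x_i)
n = len(items)          # the number of scores
mean_items = total / n  # step 2: divide by n

print(f"Sum = {total}, n = {n}, mean = {mean_items}")  # Sum = 57, n = 8, mean = 7.125
```

Note that nobody can carry 7.125 items: the mean is a model of the data, not necessarily a value that exists in it.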

Mean number of items


Calculating the mean number of items brought to class by our sample describes the item-carrying behaviour of our sample. It also allows us to model the item-carrying behaviour of all statistics students. What assumptions would we be making?

What about other variables?

Problems with the mean


The mean is susceptible to extreme values, e.g. the super-rich: the average (mean) net household income in the UK is £23,000, but only half of households earn more than £18,800, and the most common income is £14,000.

The mean sometimes fails to tell us what is typical, e.g. Almost everyone has more than the average number of feet
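The feet example can be checked with a toy population. The counts below are hypothetical (998 people with two feet, 2 with one foot), chosen only to show how a few low values pull the mean just below the typical value.

```python
import statistics

# Hypothetical population of 1,000 people: 998 with two feet, 2 with one foot.
feet = [2] * 998 + [1] * 2

mean_feet = statistics.mean(feet)      # pulled slightly below 2 by the two low values
median_feet = statistics.median(feet)  # the typical value
above_mean = sum(1 for f in feet if f > mean_feet)

print(f"Mean number of feet: {mean_feet}")
print(f"Median number of feet: {median_feet}")
print(f"People with more than the mean: {above_mean} of {len(feet)}")
```

Almost everyone ends up "above average", while the median still reports the typical person.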

Blastland, M. & Dilnot, A. (2008). The tiger that isn't. London: Profile Books.

Sometimes we want to know about the middle:
- Average Joe
- Middle England
- The man on the street

The middle is not the average (mean), but the value where 50% of cases are higher and 50% are lower. This is the median (50th percentile).

The Median
Median: the middle score when scores are ordered

22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252 (n = 11)

$$\frac{n+1}{2} = \frac{11+1}{2} = 6\text{th score} = 98$$

22, 40, 53, 57, 93, 98, 103, 108, 116, 121 (n = 10)

$$\frac{n+1}{2} = \frac{10+1}{2} = 5.5\text{th score} = \frac{93+98}{2} = 95.5$$
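The (n + 1) / 2 position rule used in the median example can be written as a short function and checked against `statistics.median` from Python's standard library. The function name `median_by_position` is hypothetical; it is a teaching sketch, not a replacement for the library call.

```python
import statistics


def median_by_position(scores):
    """Median via the (n + 1) / 2 position rule."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        position = (n + 1) // 2       # 1-based position of the middle score
        return ordered[position - 1]
    # Even n: the position ends in .5, so average the two scores either side.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]  # n = 11
even = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121]      # n = 10

print(median_by_position(odd), statistics.median(odd))    # 98 98
print(median_by_position(even), statistics.median(even))  # 95.5 95.5
```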

Mean versus Median


Large sample values tend to inflate the mean. It is not always obvious whether the mean or the median is more helpful, so descriptive statistics reports will often give us both values.

The median is not influenced by large sample values and is a better measure of centrality if the distribution is skewed.
It is possible to examine data by eye alone, but it is easier to use graphical representations of data to look for extreme values. We will do this in the next unit.
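To see the difference in sensitivity, the sketch below takes the scores from the median example and makes the largest one ten times bigger. The value 2520 is a hypothetical extreme value chosen for illustration.

```python
import statistics

scores = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]
# Replace the largest score with a hypothetical extreme value.
extreme = scores[:-1] + [2520]

for label, data in [("original", scores), ("with extreme value", extreme)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.1f}, median = {median}")
```

The mean roughly triples, from about 96.6 to about 302.8, while the median stays at 98 because only the order of the scores, not their size, determines it.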

Central tendency: The Mode


Mode: the most frequent score, e.g. the £14,000 income
Bimodal: having two modes
Multimodal: having several modes
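Python's `statistics` module has `mode` for a single most frequent value and `multimode` (Python 3.8 and later) for data with more than one mode. The data below are hypothetical. The last line also shows that the mode works on nominal categories, where a mean or median would make no sense.

```python
import statistics

# Hypothetical data for illustration.
unimodal = [3, 5, 5, 5, 7, 9]
bimodal = [1, 2, 2, 3, 4, 4, 5]
colours = ["Red", "Blue", "Red", "Green"]

print(statistics.mode(unimodal))      # 5
print(statistics.multimode(bimodal))  # [2, 4]  (two modes: bimodal)
print(statistics.mode(colours))       # Red  (mode of a nominal variable)
```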

Summary
Three main levels of measurement:
- Nominal
- Ordinal
- Interval/Ratio
Measures of central tendency (e.g. mean, median) are simple tools for modelling and describing data, but they give a very simplified description of the data, and often this is too simplified. We will cover this issue in the next lecture.

Recap: Levels of Measurement


Binary variable: there are only two categories
Nominal variable: there are more than two categories
Ordinal variable: the same as a nominal variable, but the categories have a logical order
Interval variable: equal intervals on the variable represent equal differences in the property being measured
Ratio variable: the same as an interval variable, but the ratios of scores on the scale must also make sense
