
Displaying data

Frequency distribution
Table: frequency table (for either categorical or numerical data)
Graphical method:
a. Categorical data: bar graph
b. Numerical data: histogram, cumulative frequency distribution (CFD)
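A frequency table can be built in a few lines. The sketch below is only an illustration: the function name `frequency_table` and the sample values are made up, not taken from these notes.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, relative frequency) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(category, k, k / n) for category, k in counts.most_common()]


# Hypothetical categorical sample (made-up data, for illustration only)
sample = ["A", "B", "A", "O", "A", "B", "O", "O", "A", "AB"]

for category, count, rel_freq in frequency_table(sample):
    print(f"{category:>2}  {count:2d}  {rel_freq:.2f}")
```

The same counts are what a bar graph would display, one bar per category.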

Associations between 2 or more categorical variables
Table: contingency table (a frequency table for 2 or more categorical variables)
Graphical method:
a. Grouped bar graph (frequency distributions of 2 or more categorical variables), e.g. frequency of birds in control and egg-removal groups that contract malaria or not
b. Mosaic plot (relative frequency of occurrence of all combinations of 2 categorical variables), e.g. same as above

Comparing numerical variables between groups
Graphical method:
a. Grouped histograms, e.g. frequency of different Hb concentrations in males living at different high altitudes
b. CFD, e.g. same as above
c. Line plots (ordinal only)

Displaying relationships between a pair of numerical variables
Graphical method:
a. Scatter plot (pattern of association between 2 numerical variables), e.g. relationship between ornamentation of male guppies and the average attractiveness of their sons
b. Line plot (displays trends in a measurement over time or another ordered series; dots connected by line segments), e.g. lynx fur returns to HBC in the years 1752-1819
c. Maps (spatial equivalent of a line graph)

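Describing data

Sample mean: $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$

Variance: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$

Standard deviation: $s = \sqrt{s^2}$

Proportion: $\hat{p} = \frac{\text{number of observations in the category}}{n}$

Quartiles: values that partition data into quarters

Interquartile range: $\mathrm{IQR} = Q_3 - Q_1$ (third quartile minus first quartile)

Box plot: graph that uses lines and a rectangle to display the median, quartiles, extreme measurements, and range of the data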
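The descriptive statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up data. Note that quartile conventions differ between textbooks and software; the default method of `statistics.quantiles` is used here as an assumption, so values may differ slightly from hand calculations.

```python
import statistics

# Made-up measurements, for illustration only
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

mean = statistics.mean(data)          # sum of values divided by n
variance = statistics.variance(data)  # sample variance, divides by n - 1
sd = statistics.stdev(data)           # square root of the sample variance

# Three cut points that split the sorted data into quarters
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Proportion of observations falling in a category (here: values above 5)
proportion_above_5 = sum(y > 5 for y in data) / len(data)

print(mean, variance, sd, q1, median, q3, iqr, proportion_above_5)
```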

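Estimating with uncertainty

Sampling distribution: probability distribution of all the values for an estimate that we might have obtained when we sampled the population

Standard error: standard deviation of the sampling distribution of an estimate; for the sample mean it is estimated as $\mathrm{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}$

95% CI: approximately $\bar{Y} \pm 2\,\mathrm{SE}_{\bar{Y}}$ (2SE rule of thumb)

Pseudoreplication: error that occurs when samples are not independent but are treated as though they are (individuals in the sample are more like each other than expected by chance, so standard errors come out too small)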
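As a rough illustration of the standard error of the mean and an approximate 95% CI, here is a sketch that assumes the 2SE rule of thumb (mean plus or minus two standard errors). The function names and data are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def approx_ci95(sample):
    """Approximate 95% CI for the mean using the 2SE rule of thumb (an assumption)."""
    m = statistics.mean(sample)
    se = standard_error(sample)
    return m - 2 * se, m + 2 * se


# Made-up, independent measurements
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(standard_error(sample), approx_ci95(sample))
```

The calculation assumes independent observations; with pseudoreplicated data the same code would report a standard error that is too small.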

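Probability

Mutually exclusive: two events cannot both occur at the same time, $\Pr[A \text{ and } B] = 0$

Probability distribution: list of the probabilities of all mutually exclusive outcomes of a random trial

Addition rule: if A and B are mutually exclusive, $\Pr[A \text{ or } B] = \Pr[A] + \Pr[B]$; in general, $\Pr[A \text{ or } B] = \Pr[A] + \Pr[B] - \Pr[A \text{ and } B]$

Independent events: occurrence of one does not in any way affect or predict the probability of the other

Multiplication rule: if A and B are independent, $\Pr[A \text{ and } B] = \Pr[A] \times \Pr[B]$

Dependent events: probability of one event depends on another

Conditional probability: probability of an event happening given that a condition is met, written $\Pr[A \mid B]$

Law of total probability: $\Pr[A] = \sum_{B} \Pr[B]\,\Pr[A \mid B]$, summing over all mutually exclusive values of B

General multiplication rule: $\Pr[A \text{ and } B] = \Pr[A]\,\Pr[B \mid A]$

Bayes' theorem: $\Pr[A \mid B] = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B]}$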
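The law of total probability and Bayes' theorem are easiest to see with numbers. The sketch below uses a hypothetical screening test; the probabilities are invented for illustration and the function names are not from these notes.

```python
def total_probability(prior, likelihood):
    """Pr[A] as the sum over states B of Pr[B] * Pr[A | B]."""
    return sum(prior[b] * likelihood[b] for b in prior)


def bayes(prior, likelihood, state):
    """Pr[state | A] = Pr[A | state] * Pr[state] / Pr[A]."""
    return prior[state] * likelihood[state] / total_probability(prior, likelihood)


# Hypothetical numbers: Pr[state] and Pr[positive test | state]
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}

p_positive = total_probability(prior, likelihood)
p_disease_given_positive = bayes(prior, likelihood, "disease")
print(round(p_positive, 4), round(p_disease_given_positive, 4))
```

Even with a fairly accurate test, the posterior probability stays low here because the condition is rare in the assumed population.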

Hypothesis testing

H0: specific statement about a population parameter made for the purposes of argument (interesting to reject)
HA: represents all other possible parameter values except the one stated in H0
Two-sided (two-tailed) test: HA includes values on both sides of the H0 value
One-sided (one-tailed) test: HA includes parameter values on only one side of the value specified by H0; used in cases where values on the other side do not make sense

Null distribution: sampling distribution of outcomes for a test statistic assuming H0 is true
P-value: probability of obtaining the data (or data that are an even worse match to H0) assuming H0 is true
Significance level ($\alpha$): probability used as a criterion for rejecting H0
P-value $\le \alpha$: H0 rejected
P-value $> \alpha$: H0 not rejected
Type I error: rejecting a true H0; $\alpha$ sets the probability of committing a Type I error (reducing $\alpha$ makes it less likely to reject a true H0 but also harder to reject a false H0)
Type II error: failing to reject a false H0
Power: probability that a random sample will lead to rejection of a false H0

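Analyzing proportions

Binomial distribution: probability distribution for the number of successes in a fixed number of independent trials when the probability of success is the same in each trial; it serves as the null distribution for the binomial test: $\Pr[X] = \binom{n}{X} p^X (1 - p)^{n - X}$

Binomial test: tests whether the population proportion ($p$) matches the null expectation $p_0$
H0: Relative frequency of successes in the population is $p_0$
HA: Relative frequency of successes in the population is not $p_0$

Proportion estimation
Population proportion (estimate): $\hat{p} = \frac{X}{n}$

SE of proportion: $\mathrm{SE}_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Agresti-Coull method (95% CI): $p' = \frac{X + 2}{n + 4}$, then $p' - 1.96\sqrt{\frac{p'(1 - p')}{n + 4}} \le p \le p' + 1.96\sqrt{\frac{p'(1 - p')}{n + 4}}$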
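The binomial test and the Agresti-Coull interval can be sketched with the standard library alone. Two assumptions are made here: the two-sided P-value is taken as twice the one-tailed probability (one common convention, capped at 1), and the interval uses the 1.96 multiplier for 95% confidence. The counts in the example are made up.

```python
import math


def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_test_two_sided(x, n, p0):
    """Two-sided P-value as twice the tail probability on the observed side."""
    if x >= n * p0:
        tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    else:
        tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
    return min(1.0, 2 * tail)


def agresti_coull_ci95(x, n):
    """Approximate 95% CI for a proportion (Agresti-Coull, 'add 2 and 4' form)."""
    p_prime = (x + 2) / (n + 4)
    half_width = 1.96 * math.sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime - half_width, p_prime + half_width


# Hypothetical data: 14 successes in 18 trials, null proportion 0.5
p_value = binomial_test_two_sided(14, 18, 0.5)
ci_low, ci_high = agresti_coull_ci95(14, 18)
print(round(p_value, 3), round(ci_low, 3), round(ci_high, 3))
```

Statistical software may use a different definition of the two-sided P-value, so small differences from this sketch are expected.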

Fitting probability models to frequency data

$\chi^2$ goodness-of-fit test: compares the frequency data to a model stated by H0
Assuming:
a. none of the categories has an expected frequency < 1
b. no more than 20% of the categories have expected frequencies < 5

$\chi^2$ test statistic: measures the discrepancy between observed and expected frequencies, $\chi^2 = \sum_i \frac{(\mathrm{Observed}_i - \mathrm{Expected}_i)^2}{\mathrm{Expected}_i}$

Degrees of freedom: specifies which $\chi^2$ distribution to use as the null distribution, $\mathrm{df} = (\text{number of categories}) - 1 - (\text{number of parameters estimated from the data})$
Critical value: value of a test statistic that marks the boundary of a specified area in the tail(s) of the sampling distribution under H0

Poisson distribution: number of successes in blocks of time or space, when successes happen independently of each other and occur with equal probability at every point in time or space
$\chi^2$ test statistic, degrees of freedom, etc.: same procedure as above
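A goodness-of-fit calculation might look like the following sketch. The observed counts and model proportions are invented, the function names are hypothetical, and the 5% critical value for 2 degrees of freedom (5.991) is taken from a standard $\chi^2$ table rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_frequencies_ok(expected):
    """Check the two rules of thumb: none < 1, and at most 20% of categories < 5."""
    none_below_1 = all(e >= 1 for e in expected)
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return none_below_1 and share_below_5 <= 0.20


def gof_degrees_of_freedom(n_categories, n_estimated_params=0):
    """Categories minus 1, minus parameters estimated from the data."""
    return n_categories - 1 - n_estimated_params


# Made-up example: 100 observations, model proportions stated by H0
observed = [30, 50, 20]
model_proportions = [0.25, 0.50, 0.25]
expected = [sum(observed) * p for p in model_proportions]

chi2 = chi_square_statistic(observed, expected)
df = gof_degrees_of_freedom(len(observed))
CRITICAL_5PCT_DF2 = 5.991  # from a chi-square table, df = 2, alpha = 0.05

print(chi2, df, expected_frequencies_ok(expected), chi2 >= CRITICAL_5PCT_DF2)
```

For a Poisson model the mean is estimated from the data, so `n_estimated_params` would be 1 under the same procedure.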

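Contingency analysis: associations between categorical variables

$\chi^2$ contingency test (assumptions same as for the $\chi^2$ goodness-of-fit test)
H0: Categorical variables are independent
HA: Categorical variables are not independent
$\chi^2$ statistic: $\chi^2 = \sum_{i}\sum_{j} \frac{(\mathrm{Observed}_{ij} - \mathrm{Expected}_{ij})^2}{\mathrm{Expected}_{ij}}$, where $\mathrm{Expected}_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$

Degrees of freedom: $\mathrm{df} = (r - 1)(c - 1)$ for a table with $r$ rows and $c$ columns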
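For a contingency table, expected counts under independence come from the row and column totals. The sketch below uses a made-up 2 x 2 table and hypothetical function names, and only computes the statistic and degrees of freedom; the P-value would come from a $\chi^2$ table or software.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contingency_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Made-up counts: rows are treatment groups, columns are outcomes
table = [[10, 20], [20, 10]]
chi2, df = contingency_chi_square(table)
print(round(chi2, 3), df)
```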

Fisher's exact test: examines the independence of two categorical variables when expected frequencies are too low to meet the rules for the $\chi^2$ approximation
H0: Categorical variables are independent
HA: Categorical variables are not independent
P-value: best computed by software
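To show what the computer does for a 2 x 2 table, here is a sketch of Fisher's exact test. It assumes the common two-sided definition (sum the probabilities of all tables, with the same margins, that are no more probable than the observed one). The table is made up and the function names are hypothetical.

```python
import math


def hypergeom_prob(a, row1_total, col1_total, n):
    """Probability of count a in the top-left cell, with all margins fixed."""
    return (
        math.comb(col1_total, a)
        * math.comb(n - col1_total, row1_total - a)
        / math.comb(n, row1_total)
    )


def fisher_exact_two_sided(table):
    """Two-sided P-value for a 2 x 2 table of counts [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1_total, col1_total, n = a + b, a + c, a + b + c + d
    p_observed = hypergeom_prob(a, row1_total, col1_total, n)
    smallest = max(0, row1_total + col1_total - n)
    largest = min(row1_total, col1_total)
    probs = [
        hypergeom_prob(k, row1_total, col1_total, n)
        for k in range(smallest, largest + 1)
    ]
    # Small tolerance so that ties with the observed probability are included
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


# Made-up table with low expected frequencies
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))
```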

Normal distribution

Normal distribution: continuous probability distribution describing a bell-shaped curve; a good approximation to the frequency distributions of many biological variables

Properties:
a. continuous distribution, so probability is measured by the area under the curve rather than the height of the curve
b. symmetric around its mean
c. single mode
d. probability density is highest exactly at the mean (mean, mode, and median are the same)
e. ~2/3 (68.3%) of the area under the normal curve lies within 1 SD of the mean (probability of 0.683 that a randomly chosen observation falls between $\mu - \sigma$ and $\mu + \sigma$)
f. 95% of the probability of a normal distribution lies within about 2 SD of the mean (between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$)

Standard normal distribution: normal distribution with mean 0 and SD 1

Standard normal deviate (Z): tells us how many SD a particular value is from the mean; a.k.a. Z standardization: $Z = \frac{Y - \mu}{\sigma}$

Sample mean: if variable $Y$ has a normal distribution in a population, then the distribution of sample means $\bar{Y}$ is also normal, with mean $\mu$ and standard error $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$

Probability of sample means: $Z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$

Central limit theorem: the sum or mean of a large number of measurements randomly sampled from a non-normal population is approximately normally distributed (how large a sample is large enough depends on the shape of the population distribution)

Normal approximation: when the number of trials $n$ is large, the binomial probability distribution for the number of successes is approximated by a normal distribution with mean $np$ and SD $\sqrt{np(1 - p)}$
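The normal-distribution facts above can be checked numerically with `statistics.NormalDist` from the standard library. This is a sketch with hypothetical function names; the population values passed to them in any real use would be assumptions of the analysis.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean."""
    return (y - mu) / sigma


# Area within 1 SD and within 1.96 SD of the mean
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_1_96_sd = std_normal.cdf(1.96) - std_normal.cdf(-1.96)


def prob_sample_mean_above(threshold, mu, sigma, n):
    """Pr[sample mean > threshold] for samples of size n from a normal population."""
    standard_error = sigma / math.sqrt(n)
    return 1 - std_normal.cdf(z_score(threshold, mu, standard_error))


def binomial_normal_approx(n, p):
    """Normal distribution approximating the number of successes for large n."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))


print(round(within_1_sd, 3), round(within_1_96_sd, 3))
```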

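Inference for a normal population

t-distribution: sampling distribution of the standardized sample mean when the standard error is estimated from the sample (as sample size increases, it becomes more like the standard normal distribution)
Student's t-distribution (replacing $\sigma_{\bar{Y}}$ in the denominator of the Z standardization by the estimated SE): $t = \frac{\bar{Y} - \mu}{\mathrm{SE}_{\bar{Y}}}$, where $\mathrm{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}$

Degrees of freedom: $\mathrm{df} = n - 1$

Critical value (5% critical t-value): $t_{0.05(2),\mathrm{df}}$, the value that cuts off a total area of 0.05 in the two tails of the t-distribution

95% CI: $\bar{Y} - t_{0.05(2),\mathrm{df}}\,\mathrm{SE}_{\bar{Y}} < \mu < \bar{Y} + t_{0.05(2),\mathrm{df}}\,\mathrm{SE}_{\bar{Y}}$

99% CI: $\bar{Y} - t_{0.01(2),\mathrm{df}}\,\mathrm{SE}_{\bar{Y}} < \mu < \bar{Y} + t_{0.01(2),\mathrm{df}}\,\mathrm{SE}_{\bar{Y}}$ (broader interval than the 95% CI because it must include more possibilities to achieve a higher probability of covering the true mean)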

One-sample t-test: compares the mean of a random sample from a normal population with the population mean proposed in H0
Assuming:
a. data are a random sample from the population
b. variable is normally distributed in the population
Robust: a method is robust if it is not sensitive to modest departures from the assumptions
H0: True mean equals $\mu_0$
HA: True mean does not equal $\mu_0$

Test statistic: $t = \frac{\bar{Y} - \mu_0}{\mathrm{SE}_{\bar{Y}}}$, with $\mathrm{df} = n - 1$
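A one-sample t calculation can be sketched as follows. The standard library has no t-distribution, so the critical value is passed in as a parameter and is assumed to come from a t table (for example, 2.306 for df = 8 at the two-sided 5% level). The data and function names are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: true mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1


def ci_for_mean(sample, t_critical):
    """CI for the mean; t_critical must be looked up for df = n - 1."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    m = statistics.mean(sample)
    return m - t_critical * se, m + t_critical * se


# Made-up sample of 9 measurements, null mean 4.5
sample = [4.1, 5.2, 4.8, 5.9, 5.0, 4.6, 5.5, 5.3, 4.7]
T_CRIT_DF8 = 2.306  # from a t table: two-sided alpha = 0.05, df = 8

t, df = one_sample_t(sample, mu0=4.5)
print(round(t, 3), df, abs(t) >= T_CRIT_DF8, ci_for_mean(sample, T_CRIT_DF8))
```

Rejecting H0 when the absolute t value reaches the critical value is equivalent to the null mean falling outside the matching confidence interval.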

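Estimation of SD & variance of a normal population
Assumptions: same as for the CI of the mean (random sample from the population; very sensitive to the assumption of normality)

CI for variance: $\frac{\mathrm{df}\, s^2}{\chi^2_{\alpha/2,\mathrm{df}}} \le \sigma^2 \le \frac{\mathrm{df}\, s^2}{\chi^2_{1-\alpha/2,\mathrm{df}}}$, where $\chi^2_{a,\mathrm{df}}$ is the value with area $a$ to its right and $\mathrm{df} = n - 1$

CI limits for SD: $\sqrt{\frac{\mathrm{df}\, s^2}{\chi^2_{\alpha/2,\mathrm{df}}}} \le \sigma \le \sqrt{\frac{\mathrm{df}\, s^2}{\chi^2_{1-\alpha/2,\mathrm{df}}}}$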
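The interval for the variance divides df times the sample variance by two $\chi^2$ critical values. In this sketch the critical values are supplied by the caller and are assumed to come from a $\chi^2$ table (for df = 8 and a 95% interval: 17.535 and 2.180). The sample variance is made up and the function names are hypothetical.

```python
import math


def variance_ci(s2, df, chi2_upper, chi2_lower):
    """CI for the population variance.

    chi2_upper: critical value with area alpha/2 to its right (the larger one)
    chi2_lower: critical value with area 1 - alpha/2 to its right (the smaller one)
    """
    return df * s2 / chi2_upper, df * s2 / chi2_lower


def sd_ci(s2, df, chi2_upper, chi2_lower):
    """CI for the population SD: square roots of the variance limits."""
    low, high = variance_ci(s2, df, chi2_upper, chi2_lower)
    return math.sqrt(low), math.sqrt(high)


# Made-up sample variance from n = 9 observations (df = 8)
s2, df = 4.0, 8
CHI2_UPPER, CHI2_LOWER = 17.535, 2.180  # table values for df = 8, 95% interval

print(variance_ci(s2, df, CHI2_UPPER, CHI2_LOWER))
print(sd_ci(s2, df, CHI2_UPPER, CHI2_LOWER))
```

The interval is not symmetric around the sample variance because the $\chi^2$ distribution is skewed.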