Homework Instructions
This homework assignment is due no later than 3:00 on Friday, April 18th. Late assignments will only be accepted in extreme circumstances and only if arrangements have been made in advance.

Your solutions must be typed and very neatly organized. I will not try to infer your solutions if they are not clearly presented. Equations need not be typeset perfectly, but they should be clear. You may substitute letters for symbols (e.g., b1 for β1), and you may write out equations (neatly) by hand if necessary. With your solutions you must include the relevant R output and the R scripts that created them. Include these within the text of your solutions using cut-and-paste, and try to include only the relevant output. Use a monospace font (e.g., Consolas or Courier) for R scripts and output for clarity, but only for R scripts and output.

You may discuss the homework with other students in the course. However, you must still write your own R scripts, produce your own output, and write up your own solutions.

You are welcome to ask me questions concerning the homework. I will be particularly open to helping with any R problems. I want to evaluate your understanding of applied regression, not R, but part of the purpose of the homework assignments is to get you to practice using R. If you email me with an R question, it may be helpful to include your full R script so that I can replicate your problem. The Statistics Assistance Center (SAC) and Statistical Consulting Center (SCC) are not designed to accommodate this course. Direct all questions to me.
Streptococcus pyogenes is a bacterium that is the cause of infections resulting in Streptococcal pharyngitis (i.e., strep throat).
Table 1: Number of children classified by tonsil size and carrier status.

            Tonsil size
Carrier     Normal   Large   Very Large
Yes             19      29           24
No             497     560          269
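A minimal R sketch of entering Table 1 as grouped binomial data and fitting a logistic regression with tonsil size as a factor. The data frame and column names (tonsils, size, yes, no) are illustrative assumptions, not names from the course materials. Because the model has one parameter per tonsil size, it is saturated, so its fitted probabilities should reproduce the observed proportions of carriers.

```r
# Table 1 as grouped binomial data, one row per tonsil size.
# Names are illustrative, not taken from the course script.
tonsils <- data.frame(
  size = factor(c("normal", "large", "very large"),
                levels = c("normal", "large", "very large")),
  yes  = c(19, 29, 24),    # carriers
  no   = c(497, 560, 269)  # non-carriers
)

# Logistic regression with a two-column response: cbind(successes, failures).
# "normal" is the reference level under the default treatment coding.
m <- glm(cbind(yes, no) ~ size, family = binomial, data = tonsils)
summary(m)

# Saturated model: fitted probabilities equal the observed proportions.
cbind(fitted = fitted(m), observed = with(tonsils, yes / (yes + no)))
```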
The likelihood ratio test above is identical to the one obtained by fitting a logistic regression model and doing a likelihood ratio test for the effect of tonsil size. Use the model from the previous problem and an appropriate null model to replicate the likelihood ratio test statistic and p-value above using the anova function.

3. The contrast function can be used to estimate the log-odds (logit) of being a carrier for each tonsil size. Exponentiating these values gives the estimated odds of being a carrier, and applying the function $e^z/(1 + e^z)$ to the log-odds, or the function $z/(1 + z)$ to the odds, gives the estimated probability of being a carrier.[3] Give the estimated log-odds, odds, and probability of being a carrier for each of the three tonsil sizes, and include a confidence interval for each quantity.[4] Be sure to use contrastfix with wald = TRUE to get the proper confidence interval.

4. Use contrast and contrastfix to obtain estimates and confidence intervals for the odds ratios comparing the odds of being a carrier between children with normal and large tonsils, normal and very large tonsils, and large and very large tonsils. Briefly summarize each comparison in a sentence (e.g., "The odds of being a carrier are x times larger for children with very large tonsils than for children with normal-sized tonsils."). Also report a confidence interval for each odds ratio, and a test statistic and p-value for the null hypothesis that the odds ratio is 1.
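A minimal base-R sketch of the likelihood ratio test and the three odds ratios, assuming the tonsil model is a binomial glm with a three-level factor and "normal" as the reference level. It does not use contrast or contrastfix (contrastfix appears to be provided for the course, and its definition is not reproduced here); instead each log-odds ratio is written as a linear combination of the coefficients, with Wald standard errors from vcov. It may serve as a cross-check of the contrast output rather than a replacement for it.

```r
# Table 1 as grouped binomial data (names are illustrative).
tonsils <- data.frame(
  size = factor(c("normal", "large", "very large"),
                levels = c("normal", "large", "very large")),
  yes  = c(19, 29, 24),
  no   = c(497, 560, 269)
)

m  <- glm(cbind(yes, no) ~ size, family = binomial, data = tonsils)
m0 <- glm(cbind(yes, no) ~ 1, family = binomial, data = tonsils)  # null model: no size effect

# Likelihood ratio test: the drop in deviance from m0 to m,
# referred to a chi-square distribution on 2 df.
lrt <- anova(m0, m, test = "Chisq")
print(lrt)

# Each log-odds ratio is a linear combination L %*% coef(m).
L <- rbind(
  "large vs normal"      = c(0, 1, 0),
  "very large vs normal" = c(0, 0, 1),
  "very large vs large"  = c(0, -1, 1)
)
est <- drop(L %*% coef(m))
se  <- sqrt(diag(L %*% vcov(m) %*% t(L)))
z   <- est / se
q   <- qnorm(0.975)

or_table <- data.frame(
  odds.ratio = exp(est),
  lower      = exp(est - q * se),  # exponentiate the Wald interval endpoints
  upper      = exp(est + q * se),
  z          = z,                  # Wald statistic for H0: odds ratio = 1
  p.value    = 2 * pnorm(-abs(z))
)
print(or_table)
```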
[3] … plogis.
[4] Remember that when transforming a parameter estimate with a monotone transformation (such as exp or the inverse logit), you can obtain the confidence interval for the transformed parameter by applying the same transformation to the end points of the confidence interval.
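A minimal base-R sketch of the log-odds, odds, and probability estimates with Wald intervals, under the same assumed tonsils data frame. The interval is computed on the log-odds scale with predict(..., se.fit = TRUE), and its endpoints are then transformed with exp and plogis, as the note above describes. The contrast/contrastfix route asked for in the assignment is not shown.

```r
# Table 1 as grouped binomial data (names are illustrative).
tonsils <- data.frame(
  size = factor(c("normal", "large", "very large"),
                levels = c("normal", "large", "very large")),
  yes  = c(19, 29, 24),
  no   = c(497, 560, 269)
)
m <- glm(cbind(yes, no) ~ size, family = binomial, data = tonsils)

# Estimated log-odds (linear predictor) and standard error for each size.
pred <- predict(m, newdata = data.frame(size = tonsils$size),
                type = "link", se.fit = TRUE)
q  <- qnorm(0.975)
lo <- pred$fit - q * pred$se.fit  # Wald interval on the log-odds scale
hi <- pred$fit + q * pred$se.fit

# exp() and plogis() are monotone, so transforming the endpoints
# gives intervals for the odds and for the probability.
est <- data.frame(
  size = tonsils$size,
  logit = pred$fit,         logit.lower = lo,         logit.upper = hi,
  odds  = exp(pred$fit),    odds.lower  = exp(lo),    odds.upper  = exp(hi),
  prob  = plogis(pred$fit), prob.lower  = plogis(lo), prob.upper  = plogis(hi),
  row.names = NULL
)
print(est)
```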
Modify this script as necessary to answer the following questions.
1. Estimate a logistic regression model using the same linear predictor as the normal/linear model in the script. Report the parameter estimates and standard errors from summary, and plot the predicted values with the observed proportions.
2. Use contrast and contrastfix to estimate the two odds ratios that describe the effect of distance on the odds of removal for each morph. Provide confidence intervals and use them to briefly describe the relationship between distance and the odds of removal for each morph.
3. Use contrast and contrastfix to estimate the odds ratio that describes the effect of morph at a distance of 0 km, and also the odds ratio that describes the effect of morph at a distance of 50 km. Provide confidence intervals and use them to briefly describe the effect of morph on the odds of removal at the two distances.
4. Is there any evidence of overdispersion? Why or why not?
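The course script and its data are not reproduced here, so the following sketch uses simulated, hypothetical data purely to illustrate the R steps. The column names (distance, morph, placed, removed), the morph labels, and the linear predictor distance * morph are all assumptions; adjust them to match the script. Odds ratios are computed as Wald intervals for linear combinations of the coefficients rather than with contrast/contrastfix.

```r
set.seed(1)
# HYPOTHETICAL data for illustration only; not the course data.
d <- expand.grid(distance = seq(0, 50, by = 10), morph = c("dark", "light"))
d$placed  <- 40
eta       <- with(d, -0.5 + 0.02 * distance + (morph == "light") * (0.3 - 0.03 * distance))
d$removed <- rbinom(nrow(d), size = d$placed, prob = plogis(eta))

# Assumed linear predictor: distance, morph, and their interaction.
m <- glm(cbind(removed, placed - removed) ~ distance * morph,
         family = binomial, data = d)
print(summary(m)$coefficients[, 1:2])  # estimates and standard errors

# Observed proportions with a fitted curve for each morph.
plot(removed / placed ~ distance, data = d, pch = as.integer(d$morph),
     xlab = "Distance (km)", ylab = "Proportion removed")
nd <- expand.grid(distance = 0:50, morph = levels(d$morph))
nd$fit <- predict(m, newdata = nd, type = "response")
for (k in seq_along(levels(d$morph))) {
  lines(fit ~ distance, data = nd[nd$morph == levels(d$morph)[k], ], lty = k)
}

# Wald odds ratio and 95% interval for a linear combination l of the
# coefficients (order: intercept, distance, morphlight, distance:morphlight).
wald_or <- function(l, fit = m) {
  est <- sum(l * coef(fit))
  se  <- sqrt(drop(t(l) %*% vcov(fit) %*% l))
  exp(est + c(estimate = 0, lower = -1, upper = 1) * qnorm(0.975) * se)
}
or_dist_dark  <- wald_or(c(0, 1, 0, 0))   # per km of distance, dark morph
or_dist_light <- wald_or(c(0, 1, 0, 1))   # per km of distance, light morph
or_morph_0    <- wald_or(c(0, 0, 1, 0))   # light vs dark at 0 km
or_morph_50   <- wald_or(c(0, 0, 1, 50))  # light vs dark at 50 km
print(rbind(or_dist_dark, or_dist_light, or_morph_0, or_morph_50))

# Rough overdispersion check: residual deviance relative to its df.
print(c(deviance = deviance(m), df = df.residual(m),
        ratio = deviance(m) / df.residual(m)))
```

With an interaction in the model, the morph odds ratio depends on distance, which is why the combination for 50 km adds 50 times the interaction coefficient. A residual deviance much larger than its degrees of freedom is one rough sign of overdispersion.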
The script also plots the data with the predicted values, as well as the studentized residuals against the predicted values. The residuals show clear evidence of significant overdispersion, suggesting that the variance structure implied by the Poisson distribution is not sufficient. Below you will investigate a couple of other variance structures.
1. One approach to dealing with overdispersion is to use quasi-likelihood, which assumes the variance structure $\operatorname{Var}(Y_i) = \phi E(Y_i)$, where $\phi > 0$ is an unknown parameter rather than being fixed at $\phi = 1$ as in the case of the Poisson distribution.[6] Report the parameter estimates and standard errors based on a quasi-likelihood Poisson model.[7] Discuss briefly how these compare to those from the original Poisson model. Also provide a plot of the studentized residuals against the predicted values and discuss briefly whether the quasi-likelihood approach has sufficiently dealt with the overdispersion.
2. A negative binomial distribution implies the variance structure $\operatorname{Var}(Y_i) = E(Y_i) + \alpha E(Y_i)^2$, where $\alpha \ge 0$. This variance structure assumes that the variance increases quadratically rather than linearly with the expected value. Repeat what you did in the previous problem, but for a negative binomial model.[8]
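A minimal sketch of the quasi-likelihood fit on simulated, hypothetical counts; the script's data and linear predictor are not reproduced here, so y ~ x is an assumption. With the identity link, glm may need starting values to keep the fitted means positive, so a constant-mean start is supplied. The quasi-Poisson point estimates should match the Poisson ones, with standard errors multiplied by the square root of the estimated dispersion parameter.

```r
set.seed(2)
# HYPOTHETICAL overdispersed counts (not the course data):
# negative binomial draws with mean 5 + 2x.
n <- 100
d <- data.frame(x = runif(n, 0, 10))
d$y <- rnbinom(n, mu = 5 + 2 * d$x, size = 2)

# Constant-mean starting values keep the identity-link means positive.
st <- c(mean(d$y), 0)
m_pois <- glm(y ~ x, family = poisson(link = "identity"), data = d, start = st)
m_qp   <- glm(y ~ x, family = quasipoisson(link = "identity"), data = d, start = st)

# Same estimates; quasi-Poisson SEs are scaled by sqrt(estimated dispersion).
phi <- summary(m_qp)$dispersion
print(cbind(estimate   = coef(m_qp),
            se.poisson = sqrt(diag(vcov(m_pois))),
            se.quasi   = sqrt(diag(vcov(m_qp)))))
print(phi)

# Studentized residuals against predicted values.
plot(fitted(m_qp), rstudent(m_qp),
     xlab = "Predicted value", ylab = "Studentized residual")
abline(h = c(-2, 0, 2), lty = c(2, 1, 2))
```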
[6] The term quasi-likelihood comes from the fact that, by not fixing $\phi = 1$, estimation is no longer based on a known distribution with a particular likelihood function, but in many ways inferences are analogous to those based on a likelihood function.
[7] Recall that this is done by specifying family = quasipoisson, but do not forget to also include the identity link function (i.e., family = quasipoisson(link = "identity")).
[8] There is no family argument for the glm.nb function, so to specify a link function you can simply use the optional argument link = identity.
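A matching sketch for the negative binomial model using glm.nb from the MASS package, again on simulated, hypothetical data with an assumed predictor x. MASS parameterizes the variance as Var(Y) = μ + μ²/θ, so the coefficient on the squared mean in the negative binomial variance structure corresponds to 1/θ. The start argument is passed only to help the initial identity-link Poisson fit.

```r
library(MASS)  # provides glm.nb

set.seed(2)
# HYPOTHETICAL overdispersed counts (not the course data).
n <- 100
d <- data.frame(x = runif(n, 0, 10))
d$y <- rnbinom(n, mu = 5 + 2 * d$x, size = 2)

# The link is given directly; there is no family argument.
m_nb <- glm.nb(y ~ x, data = d, link = identity, start = c(mean(d$y), 0))
summary(m_nb)

# MASS reports theta, with Var(Y) = mu + mu^2 / theta.
print(c(theta = m_nb$theta, inv.theta = 1 / m_nb$theta))

# Studentized residuals against predicted values.
plot(fitted(m_nb), rstudent(m_nb),
     xlab = "Predicted value", ylab = "Studentized residual")
abline(h = c(-2, 0, 2), lty = c(2, 1, 2))
```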