BBL Presentation March 2013 Followup

Follow up to SON Brown Bag Presentation 3/20/13 (C Thompson) Missing Data part 1

From Baraldi/Enders 2009 reference pp7-9:

Before we can begin discussing different missing data handling options, it is important

to have a solid understanding of so-called missing data mechanisms. Rubin (1976) and

colleagues (Little & Rubin, 2002) came up with the classification system that is in use

today: missing completely at random (MCAR), missing at random (MAR), and missing not

at random (MNAR). These mechanisms describe relationships between measured variables

and the probability of missing data. While these terms have a precise probabilistic and

mathematical meaning, they are essentially three different explanations for why the data are

missing. From a practical perspective, the mechanisms are assumptions that dictate the

performance of different missing data techniques. We give a conceptual description of each

mechanism in this section, and supplementary resources are available to readers who want

additional details on the missing data mechanisms (Allison, 2002; Enders, 2010; Little &

Rubin, 2002; Rubin, 1976; Schafer & Graham, 2002).
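
The excerpt notes that the three mechanisms have a precise probabilistic meaning. Below is a compact sketch in the notation commonly used by Little and Rubin (2002). The symbols are an assumption of this note, not the excerpt's own: Y_obs and Y_mis are the observed and would-be (missing) values, R holds the missing-data indicators, and φ holds the parameters of whatever process produces the missingness.

```latex
\begin{aligned}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = P(R \mid \phi) \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = P(R \mid Y_{\mathrm{obs}}, \phi) \\
\text{MNAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) \text{ still depends on } Y_{\mathrm{mis}}
\end{aligned}
```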

To begin, data are MCAR when the probability of missing data on a variable X is

unrelated to other measured variables and to the values of X itself. In other words,

missingness is completely unsystematic and the observed data can be thought of as a

random subsample of the hypothetically complete data. As an example, consider a child in

an educational study that moves to another district midway through the study. The missing

values are MCAR if the reason for the move is unrelated to other variables in the data set

(e.g., socioeconomic status, disciplinary problems, or other study-related variables). Other

examples of MCAR occur when a participant misses a survey administration due to

scheduling difficulties or other unrelated reasons (such as a doctor's appointment), a

computer randomly misreads grid-in sheets, or an administrative blunder causes several test

results to be misplaced prior to data entry. MCAR data may also be a purposeful byproduct

of the research design. For example, suppose that a researcher collects self-report data from

the entire sample but limits time-consuming behavioral observations to a random subset of

participants. We describe a number of these so-called planned missing data designs at the

end of the paper. Because MCAR requires missingness to be unrelated to study variables,

methodologists often argue that it is a very strict assumption that is unlikely to be satisfied

in practice (Raghunathan, 2004; Muthén, Kaplan, & Hollis, 1987).
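
A small simulation can illustrate the MCAR definition. It makes these assumptions:

- A normally distributed variable.
- 30% of its values deleted by a pure coin flip, much like a planned design that observes only a random subset.

`simulate_mcar` is a hypothetical helper, not something from the source. Because the deletion rule ignores every value, the mean of the observed data should land close to the complete-data mean.

```python
import random
import statistics


def simulate_mcar(n=100_000, p_missing=0.3, seed=1):
    """Drop values of X by a coin flip that ignores every variable (MCAR).

    Hypothetical setup: X ~ Normal(mean=50, sd=10) stands in for any study variable.
    Returns (complete-data mean, observed-data mean, share of values missing).
    """
    rng = random.Random(seed)
    x = [rng.gauss(50, 10) for _ in range(n)]
    # The deletion rule never looks at x (or at anything else), which is what makes it MCAR.
    observed = [xi for xi in x if rng.random() >= p_missing]
    return statistics.fmean(x), statistics.fmean(observed), 1 - len(observed) / n


full_mean, cc_mean, share = simulate_mcar()
print(f"complete data: {full_mean:.2f}  observed only: {cc_mean:.2f}  missing: {share:.1%}")
```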

The MAR mechanism requires a less stringent assumption about the reason for missing

data. Data are MAR if missingness is related to other measured variables in the analysis

model, but not to the underlying values of the incomplete variable (i.e., the hypothetical

values that would have resulted had the data been complete). This terminology is often

confusing and misleading because of the use of the word random. In fact, an MAR

mechanism is not random at all and describes systematic missingness where the propensity

for missing data is correlated with other study-related variables in an analysis. As an

example of an MAR mechanism, consider a study that is interested in assessing the

relationship between substance use and self-esteem in high school students. Frequent

substance abuse may be associated with chronic absenteeism, leading to a higher

probability of missing data on the self-esteem measure (e.g., because students tend to be

absent on the days that the researchers administered the self-esteem questionnaires). This

example qualifies as MAR if the propensity for missing data on the self-esteem measure is

completely determined by a student's substance use score (i.e., there is no residual

C:\Documents and Settings\mdenny1\Local Settings\Temporary Internet

Files\Content.Outlook\KZU6WGSS\Examples_Missingness Mechanisms_20130325.doc 3/25/2013 2:38 PM

p 1 of 3

relationship between the probability of missing data and self-esteem after controlling for

substance use). As a second example, suppose that a school district administers a math

aptitude exam, and students that score above a certain cut-off participate in an advanced

math course. The math course grades are MAR because missingness is completely

determined by scores on the aptitude test (e.g., students that score below the cut-off do not

have a grade for the advanced math course).
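
A simulation of the aptitude cut-off example can show both halves of the MAR story. Every number in it is a hypothetical choice:

- aptitude mean 100 and SD 15;
- a cut-off of 110;
- a linear grade model.

`simulate_cutoff` and `ols` are illustrative helpers. The observed grades overstate the average. Missingness, however, depends only on aptitude, which is measured for everyone. So a regression of grade on aptitude, fitted to the students who have grades and then averaged over all students, should recover the complete-data mean.

```python
import random
import statistics


def ols(x, y):
    """Intercept and slope of a simple least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_cutoff(n=100_000, cutoff=110.0, seed=2):
    rng = random.Random(seed)
    aptitude = [rng.gauss(100, 15) for _ in range(n)]
    # Hypothetical would-be course grades for every student: linear in aptitude plus noise.
    grade = [70 + 0.4 * (a - 100) + rng.gauss(0, 8) for a in aptitude]
    # MAR: a grade exists only when the fully observed aptitude score clears the cutoff.
    seen = [a > cutoff for a in aptitude]
    a_obs = [a for a, s in zip(aptitude, seen) if s]
    g_obs = [g for g, s in zip(grade, seen) if s]
    # Fit grade-on-aptitude among students with grades, then predict for everyone.
    b0, b1 = ols(a_obs, g_obs)
    model_mean = statistics.fmean(b0 + b1 * a for a in aptitude)
    return statistics.fmean(grade), statistics.fmean(g_obs), model_mean


true_mean, cc_mean, model_mean = simulate_cutoff()
print(f"all would-be grades: {true_mean:.1f}  observed grades: {cc_mean:.1f}  "
      f"aptitude-based estimate: {model_mean:.1f}")
```

The recovery depends on the assumed linear grade-aptitude relationship also holding below the cut-off. Predictions for the low scorers are extrapolations.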

Finally, data are MNAR if the probability of missing data is systematically related to the

hypothetical values that are missing. In other words, the MNAR mechanism describes data [...]

course grades). Although the magnitude of the bias depends on the correlation between the

omitted aptitude variable and the course grades (bias increases as the correlation increases),

the analysis is nevertheless consistent with an MNAR mechanism. Later in the manuscript,

we describe methods for incorporating so-called auxiliary variables that are related to

missingness into a statistical analysis. Doing so can mitigate bias (i.e., by making the MAR

mechanism more plausible) and can improve power (i.e., by recapturing some of the

missing information).
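
A variation on the cut-off setup can check the claim that bias grows with the correlation between the omitted aptitude score and the grades. This sketch uses hypothetical values throughout:

- Both variables are standardized.
- Grades are observed only for students with above-average aptitude.
- Aptitude is left out of the analysis, so the estimate is simply the mean of the observed grades.

With the cut-off at the mean, theory puts the bias at about 0.8 times the correlation, in SD units. That is because the average of a standard normal variable above zero is √(2/π) ≈ 0.798. `cc_bias` is an illustrative helper.

```python
import math
import random
import statistics


def cc_bias(rho, n=50_000, cutoff=0.0, seed=3):
    """Bias of the observed-grade mean when aptitude drives missingness but is
    left out of the analysis. Both variables are standardized (mean 0, SD 1)."""
    rng = random.Random(seed)
    apt = [rng.gauss(0, 1) for _ in range(n)]
    # Grades correlate with aptitude at exactly rho in the population.
    grade = [rho * a + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for a in apt]
    observed = [g for a, g in zip(apt, grade) if a > cutoff]
    return statistics.fmean(observed) - statistics.fmean(grade)


for rho in (0.0, 0.3, 0.6, 0.9):
    print(f"corr(aptitude, grade) = {rho:.1f}: bias = {cc_bias(rho):+.3f} SD")
```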

From Howell ref (Missing):

Missing completely at random

There are several reasons why data may be missing. They may be missing because equipment malfunctioned,

the weather was terrible, people got sick, or the data were not entered correctly. Here the data are missing

completely at random (MCAR). When we say that data are missing completely at random, we mean that the

probability that an observation (Xi) is missing is unrelated to the value of Xi or to the value of any other

variables. Thus data on family income would not be considered MCAR if people with low incomes were less

likely to report their family income than people with higher incomes. Similarly, if Whites were more likely to

omit reporting income than African Americans, we again would not have data that were MCAR because

missingness would be correlated with ethnicity. However, if a participant's data were missing because he was

stopped for a traffic violation and missed the data collection session, his data would presumably be missing

completely at random. Another way to think of MCAR is to note that in that case any piece of data is just as

likely to be missing as any other piece of data.
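
Howell's income examples suggest a simple first check. If values are MCAR, the missing-data rate should be about the same at every level of any other measured variable. The sketch below compares rates across two hypothetical groups; the labels, the rates, and the helper `missing_rate_by_group` are all illustrative. Equal rates are consistent with MCAR but cannot prove it, because the check never sees the missing values themselves.

```python
import random
from collections import defaultdict


def missing_rate_by_group(groups, is_missing):
    """Share of missing values within each level of a grouping variable."""
    totals, misses = defaultdict(int), defaultdict(int)
    for g, m in zip(groups, is_missing):
        totals[g] += 1
        misses[g] += m  # True counts as 1, False as 0
    return {g: misses[g] / totals[g] for g in totals}


rng = random.Random(3)
groups = [rng.choice(["group 1", "group 2"]) for _ in range(50_000)]
# MCAR: every income report has the same 20% chance of being skipped.
mcar = [rng.random() < 0.20 for _ in groups]
# Not MCAR: group 2 skips the income item more often (35% versus 10%).
not_mcar = [rng.random() < (0.35 if g == "group 2" else 0.10) for g in groups]

print(missing_rate_by_group(groups, mcar))
print(missing_rate_by_group(groups, not_mcar))
```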

Notice that it is the value of the observation, and not its "missingness," that is important. If people who refused

to report personal income were also likely to refuse to report family income, the data could still be considered

MCAR, so long as neither of these had any relation to the income value itself. This is an important

consideration, because when a data set consists of responses to several survey instruments, someone who did

not complete the Beck Depression Inventory would be missing all BDI subscores, but that would not affect

whether the data can be classed as MCAR.

The nice feature of data that are MCAR is that the analysis remains unbiased. We may lose power for our

design, but the estimated parameters are not biased by the absence of data.
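
The trade-off in the last paragraph (unbiased estimates, but less power) shows up directly in the standard error. The sketch assumes:

- hypothetical participants, each with three subscale scores from one instrument;
- 40% of participants skipping the whole instrument for reasons unrelated to their scores, so all three subscores go missing together, as in the BDI example.

The mean barely moves. The standard error grows by roughly the square root of the full-to-observed sample-size ratio, about √(1/0.6) ≈ 1.29 here. `mean_and_se` is an illustrative helper.

```python
import math
import random
import statistics


def mean_and_se(values):
    """Sample mean and its estimated standard error, s / sqrt(n)."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(4)
# Hypothetical participants, each with three subscale scores from one instrument.
people = [[rng.gauss(20, 6) for _ in range(3)] for _ in range(5_000)]
# MCAR at the instrument level: skipping loses all three subscores at once,
# but whether someone skips has nothing to do with their scores.
completed = [p for p in people if rng.random() >= 0.4]

full_mean, full_se = mean_and_se([p[0] for p in people])
obs_mean, obs_se = mean_and_se([p[0] for p in completed])
print(f"all {len(people)}: mean {full_mean:.2f}, SE {full_se:.3f}")
print(f"completers {len(completed)}: mean {obs_mean:.2f}, SE {obs_se:.3f}")
```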

Missing at random

Often data are not missing completely at random, but they may be classifiable as missing at random (MAR).

(MAR is not really a good name for this condition because most people would take it to be synonymous with

MCAR, which it is not. However, the label has stuck.) Let's back up one step. For data to be missing completely

at random, the probability that Xi is missing is unrelated to the value of Xi or other variables in the analysis.

But the data can be considered as missing at random if the data meet the requirement that missingness does not

depend on the value of Xi after controlling for another variable. For example, people who are depressed might

be less inclined to report their income, and thus reported income will be related to depression. Depressed people

might also have a lower income in general, and thus when we have a high rate of missing data among depressed

individuals, the existing mean income might be lower than it would be without missing data. However, if,

within depressed patients the probability of reported income was unrelated to income level, then the data would

be considered MAR, though not MCAR. Another way of saying this is to say that to the extent that missingness

is correlated with other variables that are included in the analysis, the data are MAR.
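
Howell's depression and income example can be sketched the same way. Every number below is hypothetical:

- 30% of people are depressed.
- Depressed people earn less on average.
- Depressed people report income less often (40% versus 90%).
- Within each depression group, reporting ignores the income value.

The complete-case mean then comes out too high. One simple MAR-based correction is post-stratification, a choice made for this sketch rather than something Howell prescribes. It averages the reported incomes within each depression group and weights those group means by each group's share of the full sample. `poststratified_mean` is an illustrative helper.

```python
import random
import statistics


def poststratified_mean(strata, values, observed):
    """Average the observed values within each stratum, then weight each stratum
    mean by that stratum's share of the full sample. The stratum must be known for
    everyone; entries of `values` at unobserved positions are ignored."""
    n = len(strata)
    estimate = 0.0
    for s in set(strata):
        kept = [v for st, v, o in zip(strata, values, observed) if st == s and o]
        share = sum(1 for st in strata if st == s) / n
        estimate += share * statistics.fmean(kept)
    return estimate


rng = random.Random(5)
n = 100_000
depressed = [rng.random() < 0.3 for _ in range(n)]
# Hypothetical incomes in $1000s: lower on average among depressed people.
income = [rng.gauss(40 if d else 55, 12) for d in depressed]
# MAR: whether income is reported depends on depression status only, not on income.
reported = [rng.random() < (0.4 if d else 0.9) for d in depressed]

true_mean = statistics.fmean(income)
cc_mean = statistics.fmean(v for v, r in zip(income, reported) if r)
ps_mean = poststratified_mean(depressed, income, reported)
print(f"true {true_mean:.2f}  complete cases {cc_mean:.2f}  post-stratified {ps_mean:.2f}")
```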

The phraseology is a bit awkward here because we tend to think of randomness as not producing bias, and thus

might well think that Missing at Random is not a problem. Unfortunately it is a problem, although in this case

we have ways of dealing with the issue so as to produce meaningful and relatively unbiased estimates. But just

because a variable is MAR does not mean that you can just forget about the problem. Nor does it mean that

you have to throw up your hands and declare that there is nothing to be done.

The situation in which the data are at least MAR is sometimes referred to as ignorable missingness. This name

comes about because for those data we can still produce unbiased parameter estimates without needing to

provide a model to explain missingness. Cases of MNAR, to be considered next, could be labeled cases of

nonignorable missingness.
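
The likelihood shows why MAR data need no model of missingness. The sketch below uses the standard notation. θ stands for the analysis-model parameters and φ for the missingness parameters; both symbols are assumed here rather than taken from Howell. Integrating the would-be values out of the joint model gives:

```latex
\begin{aligned}
f(Y_{\mathrm{obs}}, R \mid \theta, \phi)
  &= \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\,
          f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)\, dY_{\mathrm{mis}} \\
  &= f(R \mid Y_{\mathrm{obs}}, \phi) \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}}
     && \text{(MAR)} \\
  &= f(R \mid Y_{\mathrm{obs}}, \phi)\, f(Y_{\mathrm{obs}} \mid \theta)
\end{aligned}
```

Under MAR the missingness term factors out. If θ and φ are also distinct (an extra condition the excerpt does not mention), likelihood-based estimates of θ can be computed from f(Y_obs | θ) alone. Under MNAR the second line fails: the missingness term still involves Y_mis and cannot be pulled outside the integral.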

If data are not MCAR or MAR then they are classed as Missing Not at Random (MNAR). For example, if we

are studying mental health and people who have been diagnosed as depressed are less likely than others to

report their mental status, the data are not missing at random. Clearly the mean mental status score for the

available data will not be an unbiased estimate of the mean that we would have obtained with complete data.

The same thing happens when people with low income are less likely to report their income on a data collection

form.
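
The income example of MNAR can be simulated by letting the chance of reporting depend on income itself. The logistic response curve, its parameters, and the income distribution are assumptions made for illustration, and `p_report` is a hypothetical helper. Unlike the MAR sketches, this setup has no fully observed variable that explains who reports. Nothing in the observed data can therefore be conditioned on to remove the upward bias.

```python
import math
import random
import statistics


def p_report(y):
    """Assumed MNAR response model: the lower the income itself, the less likely it is reported."""
    return 1 / (1 + math.exp(-(y - 40) / 10))


rng = random.Random(6)
# Hypothetical incomes in $1000s.
income = [rng.gauss(50, 15) for _ in range(100_000)]
reported = [y for y in income if rng.random() < p_report(y)]

true_mean = statistics.fmean(income)
cc_mean = statistics.fmean(reported)
print(f"mean of all incomes {true_mean:.1f}, mean of reported incomes {cc_mean:.1f}")
```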

When we have data that are MNAR we have a problem. The only way to obtain an unbiased estimate of

parameters is to model missingness. In other words we would need to write a model that accounts for the

missing data. That model could then be incorporated into a more complex model for estimating missing values.

This is not a task anyone would take on lightly. See Dunning and Freedman (2008) for an example. However,

even if the data are MNAR, all is not lost. Our estimators may be biased, but the bias may be small.
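
A sensitivity sketch can explore the closing remark that MNAR bias may be small. A standardized variable is observed with probability logistic(β·y). This is a hypothetical response model: β = 0 is MCAR, and larger β ties missingness more tightly to the value itself. `mnar_bias` is an illustrative helper. For a few illustrative values of β it reports the bias of the observed mean, which should be negligible when the dependence is weak and substantial when it is strong.

```python
import math
import random
import statistics


def mnar_bias(beta, n=100_000, seed=7):
    """Bias (in SD units) of the observed mean when P(observed) = logistic(beta * y).

    beta = 0 is MCAR; larger beta ties missingness more strongly to the value itself.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    kept = [v for v in y if rng.random() < 1 / (1 + math.exp(-beta * v))]
    return statistics.fmean(kept) - statistics.fmean(y)


for beta in (0.0, 0.1, 0.5, 2.0):
    print(f"beta = {beta:.1f}: bias = {mnar_bias(beta):+.3f} SD")
```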
