Internal validity is a property of scientific studies which reflects the extent to which a causal conclusion based on a study is warranted. Such warrant is constituted by the extent to which a study minimizes systematic error (or 'bias').
Details
Inferences are said to possess internal validity if a causal relation between two variables is properly demonstrated. A causal inference may be based on a relation when three criteria are satisfied:
1. the "cause" precedes the "effect" in time (temporal precedence),
2. the "cause" and the "effect" tend to occur together (covariation), and
3. there are no plausible alternative explanations for the observed covariation (nonspuriousness).[1][2]
In scientific experimental settings, researchers often manipulate a variable (the independent variable) to see what effect it
has on a second variable (the dependent variable). For example, a researcher might, for different experimental groups,
manipulate the dosage of a particular drug between groups to see what effect it has on health. In this example, the
researcher wants to make a causal inference, namely, that different doses of the drug may be held responsible for observed
changes or differences. When the researcher can confidently attribute the observed changes or differences in the dependent variable to the independent variable, and can rule out other explanations (or rival hypotheses), the causal inference is said to be internally valid.[3][4]
In many cases, however, the magnitude of effects found in the dependent variable may not just depend on variations in the independent variable. Rather, a number of variables or circumstances uncontrolled for (or uncontrollable) may lead to additional or alternative explanations (a) for the effects found and/or (b) for the magnitude of the effects found. Internal validity, therefore, is more a matter of degree than of either-or, and that is exactly why research designs other than true experiments may also yield results with a high degree of internal validity.
In order to allow for inferences with a high degree of internal validity, precautions may be taken during the design of the
scientific study. As a rule of thumb, conclusions based on correlations or associations may only allow for lesser degrees of
internal validity than conclusions drawn on the basis of direct manipulation of the independent variable. And, when viewed only from the perspective of internal validity, highly controlled true experimental designs (i.e. with random selection, random assignment to either the control or experimental groups, reliable instruments, reliable manipulation processes, and safeguards against confounding factors) may be the "gold standard" of scientific research. By contrast, however, the very strategies employed to control these factors may also limit the generalizability, or external validity, of the findings.
Threats to internal validity
History effect: Events other than the treatment that occur in the environment during the study.
Maturation: Physical or psychological changes in the participants.
Testing: Effects of experience with the pretest; participants may become test-wise.
Instrumentation: An apparent learning gain from pretest to posttest may arise simply from the nature of the instrument.
Selection: The effect of the treatment is confounded with other factors because of how participants were selected; a problem in non-random samples.
Statistical regression: The tendency for participants whose scores fall at either extreme on a variable to score nearer the mean when measured a second time.
Mortality: Participants lost from the study; attrition.
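The statistical-regression threat can be made concrete with a small simulation (all numbers here, including the score scale and the selection cutoff, are illustrative assumptions, not from any real study): if participants are selected because of extreme scores on a noisy first measurement, their average score on a second measurement drifts back toward the mean even though no treatment was applied.

```python
import random

random.seed(42)

# True ability is fixed; each observed score is ability plus independent
# measurement noise. The scale (mean 100, sd 10) is an arbitrary choice.
n = 10000
ability = [random.gauss(100, 10) for _ in range(n)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select participants who scored in the extreme high tail on the first test.
extreme = [i for i in range(n) if test1[i] > 120]

mean1 = sum(test1[i] for i in extreme) / len(extreme)
mean2 = sum(test2[i] for i in extreme) / len(extreme)

# On retest, the selected group's average falls back toward the grand mean
# of 100, although nothing about the participants changed between tests.
print(f"first test:  {mean1:.1f}")
print(f"second test: {mean2:.1f}")
```

A researcher who gave this group a "remedial treatment" between the two tests could easily mistake this purely statistical shrinkage for a treatment effect.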
Confounding
A major threat to the validity of causal inferences is confounding: changes in the dependent variable may instead be attributable to the existence of, or variation in, a third variable that is related to the manipulated variable. Where such spurious relationships cannot be ruled out, rival hypotheses to the researcher's original causal inference may be developed.
Selection bias
Selection bias refers to the problem that, at pre-test, differences between groups exist that may interact with the independent variable and thus be 'responsible' for the observed outcome. Researchers and participants bring to the experiment a myriad of characteristics, some learned and others inherent: for example, sex, weight, hair, eye, and skin color, personality, mental capabilities, and physical abilities, but also attitudes such as motivation or willingness to participate. If, during the selection step of the research study, the groups end up unequal with respect to such subject-related variables, there is a threat to internal validity. For example, a researcher creates two test groups, the experimental and the control groups. The subjects in the two groups are alike with regard to the independent variable but differ in one or more of the subject-related variables.
History
Events outside of the study/experiment or between repeated measures of the dependent variable may affect participants'
responses to experimental procedures. Often, these are large scale events (natural disaster, political change, etc.) that
affect participants' attitudes and behaviors such that it becomes impossible to determine whether any change on the
dependent measures is due to the independent variable, or the historical event.
Maturation
Subjects change during the course of the experiment or even between measurements. For example, young children might
mature and their ability to concentrate may change as they grow up. Both permanent changes, such as physical growth and
temporary ones like fatigue, provide "natural" alternative explanations; thus, they may change the way a subject would react
to the independent variable. So upon completion of the study, the researcher may not be able to determine if the cause of
the discrepancy is due to time or the independent variable.
Instrument change
The instrument used during the testing process can change over the course of the experiment. This also refers to observers becoming more practiced or primed, or unconsciously changing the criteria they use to make judgments. It can also be an issue with self-report measures given at different times; in this case the impact may be mitigated through the use of retrospective pretesting. If any instrumentation changes occur, the internal validity of the main conclusion is affected, as alternative explanations are readily available.
Mortality/differential attrition
Main article: Survivorship bias
This error occurs if inferences are made on the basis of only those participants who took part from start to finish. Participants may have dropped out of the study before completion, perhaps even because of the study, programme, or experiment itself. For example, the percentage of group members who had quit smoking at post-test was found to be much higher in a group that had received a quit-smoking training programme than in the control group; however, only 60% of the experimental group had completed the programme. If this attrition is systematically related to any feature of the study, to the administration of the independent variable, or to the instrumentation, or if dropping out leads to relevant bias between groups, a whole class of alternative explanations becomes possible that could account for the observed differences.
Selection-maturation interaction
This occurs when subject-related variables (hair color, skin color, etc.) and time-related variables (age, physical size, etc.) interact. If a discrepancy between the two groups emerges between testings, it may be due to age differences between the groups rather than to the treatment.
Diffusion
If treatment effects spread from treatment groups to control groups, a lack of differences between experimental and control
groups may be observed. This does not mean, however, that the independent variable has no effect or that there is no
relationship between dependent and independent variable.
Experimenter bias
Experimenter bias occurs when the individuals conducting an experiment inadvertently affect the outcome by non-consciously behaving differently toward members of the control and experimental groups. It is possible to eliminate the possibility of experimenter bias through the use of double-blind study designs, in which the experimenter is not aware of the condition to which a participant belongs.
For eight of these threats there exists the first-letter mnemonic THIS MESS, which refers to the first letters of Testing (repeated testing), History, Instrument change, Statistical regression toward the mean, Maturation, Experimental mortality, Selection and Selection interaction.[5]
https://en.wikipedia.org/wiki/Internal_validity
External validity is the validity of generalized (causal) inferences in scientific research, usually based on experiments as experimental validity.[1] In other words, it is the extent to which the results of a study can be generalized to other situations and to other people.[2] Mathematical analysis of external validity concerns determining whether generalization across heterogeneous populations is feasible, and devising statistical and computational methods that produce valid generalizations.[3]
Aptitude-treatment interaction: The sample may have certain features that interact with the independent variable, limiting generalizability. For example, inferences based on comparative psychotherapy studies often employ specific samples (e.g. volunteers, the highly depressed, those with no comorbidity). If psychotherapy is found effective for these sample patients, will it also be effective for non-volunteers, the mildly depressed, or patients with concurrent other disorders?
Situation: All situational specifics of a study (e.g. treatment conditions, time, location, lighting, noise, treatment administration, investigator, timing, scope and extent of measurement) potentially limit generalizability.
Pre-test effects: If cause-effect relationships can only be found when pre-tests are carried out, then this also limits the generality of the findings.
Post-test effects: If cause-effect relationships can only be found when post-tests are carried out, then this also limits the generality of the findings.
Reactivity (placebo, novelty, and Hawthorne effects): If cause-effect relationships are found they might not be generalizable to other settings or situations if the effects found
only occurred as an effect of studying the situation.
Rosenthal effects: Inferences about cause-consequence relationships may not be generalizable to other investigators or researchers.
Cook and Campbell[6] made the crucial distinction between generalizing to some population and generalizing across subpopulations defined by different levels of some background factor.
Lynch has argued that it is almost never possible to generalize to meaningful populations except as a snapshot of history, but it is possible to test the degree to which the effect of some
cause on some dependent variable generalizes across subpopulations that vary in some background factor. That requires a test of whether the treatment effect being investigated is
moderated by interactions with one or more background factors. [5][7]
Pearl and Bareinboim[3] classified generalization problems into two categories: (1) those that lend
themselves to valid re-calibration, and (2) those where external validity is theoretically impossible. Using graph-based calculus, [8] they derived a necessary and sufficient condition for a
problem instance to enable a valid generalization, and devised algorithms that automatically produce the needed re-calibration, whenever such exists. [9] This reduces the external validity
problem to an exercise in graph theory, and has led some philosophers to conclude that the problem is now solved. [10]
An important variant of the external validity problem deals with selection bias, also known as sampling bias: bias created when studies are conducted on non-representative
samples of the intended population. For example, if a clinical trial is conducted on college students, an investigator may wish to know whether the results generalize to the entire
population, where attributes such as age, education, and income differ substantially from those of a typical student. The graph-based method of Bareinboim and Pearl identifies conditions
under which sample selection bias can be circumvented and, when these conditions are met, the method constructs an unbiased estimator of the average causal effect in the entire
population. The main difference between generalization from improperly sampled studies and generalization across disparate populations lies in the fact that disparities among
populations are usually caused by preexisting factors, such as age or ethnicity, whereas selection bias is often caused by post-treatment conditions, for example, patients dropping out of
the study, or patients selected by severity of injury. When selection is governed by post-treatment factors, unconventional re-calibration methods are required to ensure bias-free
estimation, and these methods are readily obtained from the problem's graph. [11][12]
Examples
If age is judged to be a major factor causing the treatment effect to vary from individual to individual, then age differences between the sampled students and the general population would lead to a biased estimate of the average treatment effect in that population. Such bias can be corrected, though, by a simple re-weighing procedure: we take the age-specific effect in the student subpopulation and compute its average using the age distribution in the general population. This gives an unbiased estimate of the average treatment effect in the population.
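The re-weighing procedure above can be sketched in a few lines. All numbers below (age strata, effect sizes, distributions) are invented for illustration; the point is only the mechanics: the age-specific effects stay the same, but the weights switch from the sample's age mix to the population's.

```python
# Hypothetical age-specific treatment effects estimated in a student sample.
effect_by_age = {"18-25": 4.0, "26-40": 2.0, "41+": 1.0}

# Age distribution in the student sample (skews young) vs. the target population.
sample_dist = {"18-25": 0.80, "26-40": 0.15, "41+": 0.05}
population_dist = {"18-25": 0.15, "26-40": 0.35, "41+": 0.50}

# Naive estimate: average the effects with the sample's own age mix.
naive = sum(effect_by_age[a] * sample_dist[a] for a in effect_by_age)

# Re-weighed (post-stratified) estimate: same age-specific effects,
# but averaged with the population's age distribution instead.
transported = sum(effect_by_age[a] * population_dist[a] for a in effect_by_age)

print(f"naive sample estimate:       {naive:.2f}")        # 3.55
print(f"re-weighed population value: {transported:.2f}")  # 1.80
```

Because the young (who respond most strongly in this made-up example) are over-represented among the students, the naive average overstates the population-level effect; re-weighing corrects this, provided age really is the only factor on which the sample and population differ.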
If, on the other hand, the relevant factor that distinguishes the study sample from the general population is itself affected by the treatment, then a different re-weighing scheme needs to be invoked. Calling this factor Z, we again average the z-specific effect of X on Y in the experimental sample, but now we weigh it by the "causal effect" of X on Z. In other words, the new weight is the proportion of units attaining level Z = z had treatment X = x been administered to the entire population. This interventional probability is often written P(z | do(x)).
A typical example of this nature occurs when Z is a mediator between the treatment and outcome. For instance, the treatment may be a cholesterol-reducing drug, Z can be cholesterol level, and Y life expectancy. Here, Z is both affected by the treatment and a major factor in determining the outcome, Y. Suppose that subjects selected for the experimental study tend to have higher cholesterol levels than is typical in the general population. To estimate the average effect of the drug on survival in the entire population, we first compute the z-specific treatment effect in the experimental study, and then average it using P(z | do(x)) as a weighting function. The estimate obtained will be bias-free even when Z and Y are confounded, that is, when there are unmeasured common factors that affect both Z and Y.[14]
The precise conditions ensuring the validity of this and other weighing schemes are formulated in Bareinboim and Pearl, 2016 [14] and Bareinboim et al., 2014.[12]
Qualitative research
Within the qualitative research paradigm, external validity is replaced by the concept of transferability. Transferability is the ability of research results to transfer to situations with similar parameters, populations and characteristics.[15]
External validity in experiments
1. The extent to which we can generalize from the situation constructed by an experimenter to real-life situations (generalizability across situations),[2] and
2. The extent to which we can generalize from the people who participated in the experiment to people in general (generalizability across people).[2]
However, both of these considerations pertain to Cook and Campbell's concept of generalizing to some target population rather than the arguably more central task of assessing the
generalizability of findings from an experiment across subpopulations that differ from the specific situation studied and people who differ from the respondents studied in some meaningful
way.[6]
Critics of experiments suggest that external validity could be improved by use of field settings (or, at a minimum, realistic laboratory settings) and by use of true probability samples of
respondents. However, if one's goal is to understand generalizability across subpopulations that differ in situational or personal background factors, these remedies do not have the
efficacy in increasing external validity that is commonly ascribed to them. If background factor X treatment interactions exist of which the researcher is unaware (as seems likely), these
research practices can mask a substantial lack of external validity. Dipboye and Flanagan (1979), writing about industrial and organizational psychology, note that the evidence is that
findings from one field setting and from one lab setting are equally unlikely to generalize to a second field setting. [16] Thus, field studies are not by their nature high in external validity and
laboratory studies are not by their nature low in external validity. It depends in both cases on whether the particular treatment effect studied would change with changes in background factors that are held constant in that study. If one's study is "unrealistic" on the level of some background factor that does not interact with the treatments, it has no effect on external validity. It is
only if an experiment holds some background factor constant at an unrealistic level and if varying that background factor would have revealed a strong Treatment x Background factor
interaction, that external validity is threatened. [17]
One consideration is the similarity of an experimental situation to events that occur frequently in everyday life; it is clear that many experiments are decidedly unreal, and in many experiments people are placed in situations they would rarely encounter in everyday life. The extent to which an experiment is similar to real-life situations is referred to as the experiment's mundane realism.[18]
It is more important to ensure that a study is high in psychological realism: how similar the psychological processes triggered in an experiment are to psychological processes that occur in everyday life.[19]
Psychological realism is heightened if people find themselves engrossed in a real event. To accomplish this, researchers sometimes tell the participants a cover story: a false description of the study's purpose. If, however, the experimenters were to tell the participants the purpose of the experiment, the procedure would be low in psychological realism. In everyday life, no one knows when emergencies are going to occur and people do not have time to plan responses to them, so the kinds of psychological processes triggered would differ widely from those of a real emergency, reducing the psychological realism of the study.[2]
People don't always know why they do what they do, or what they do until it happens. Therefore, describing an experimental situation to participants and then asking them to respond
normally will produce responses that may not match the behavior of people who are actually in the same situation. We cannot depend on people's predictions about what they would do in
a hypothetical situation; we can only find out what people will really do when we construct a situation that triggers the same psychological processes as occur in the real world.
Replications
The ultimate test of an experiment's external validity is replication: conducting the study over again, generally with different subject populations or in different settings. Researchers will often use different methods to see if they still get the same results.
When many studies of one problem are conducted, the results can vary. Several studies might find an effect of the number of bystanders on helping behaviour, whereas a few do not. To make sense of this, there is a statistical technique called meta-analysis that averages the results of two or more studies to see if the effect of an independent variable is reliable. A meta-analysis essentially tells us the probability that the findings across many studies are attributable to chance or to the independent variable. If an independent variable is found to have an effect in only 1 of 20 studies, the meta-analysis will tell us that that one study was an exception and that, on average, the independent variable is not influencing the dependent variable. If an independent variable is having an effect in most of the studies, the meta-analysis is likely to tell us that, on average, it does influence the dependent variable.
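The averaging step of a meta-analysis can be sketched with a simple fixed-effect (inverse-variance) model. The effect sizes and standard errors below are invented stand-ins for several bystander studies, including one discrepant result; real meta-analyses add heterogeneity checks and often random-effects models.

```python
# Hypothetical effect sizes and standard errors from five studies.
effects = [0.45, 0.38, 0.52, 0.41, -0.05]
ses     = [0.10, 0.12, 0.15, 0.09, 0.20]

# Fixed-effect meta-analysis: each study is weighted by the inverse of
# its variance, so precise studies count for more.
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

# A 95% confidence interval excluding zero suggests the effect is
# reliable across studies despite the one discrepant result.
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect: {pooled:.3f}  95% CI: [{lo:.3f}, {hi:.3f}]")
```

Here the single near-zero study barely moves the pooled estimate, because its large standard error gives it little weight; this is the sense in which meta-analysis tells us whether an effect holds "on average" across studies.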
There can be reliable phenomena that are not limited to the laboratory. For example, increasing the number of bystanders has been found to inhibit helping behaviour with many kinds of
people, including children, university students, and future ministers; [21] in Israel;[22] in small towns and large cities in the U.S.; [23] in a variety of settings, such as psychology laboratories, city
streets, and subway trains;[24] and with a variety of types of emergencies, such as seizures, potential fires, fights, and accidents, [25] as well as with less serious events, such as having a flat
tire.[26] Many of these replications have been conducted in real-life settings where people could not possibly have known that an experiment was being conducted.
There is thus a trade-off between:
1. having enough control over the situation to ensure that no extraneous variables are influencing the results and to randomly assign people to conditions, and
2. ensuring that the situation resembles real life closely enough for the results to generalize.
Some researchers believe that a good way to increase external validity is by conducting field experiments. In a field experiment, people's behavior is studied outside the laboratory, in its
natural setting. A field experiment is identical in design to a laboratory experiment, except that it is conducted in a real-life setting. The participants in a field experiment are unaware that
the events they experience are in fact an experiment. Some claim that the external validity of such an experiment is high because it is taking place in the real world, with real people who
are more diverse than a typical university student sample. However, as real-world settings differ dramatically, findings in one real world setting may or may not generalize to another real
world setting.[16]
Neither internal nor external validity is captured in a single experiment. Social psychologists opt first for internal validity, conducting laboratory experiments in which people are randomly
assigned to different conditions and all extraneous variables are controlled. Other social psychologists prefer external validity to control, conducting most of their research in field studies.
And many do both. Taken together, both types of studies meet the requirements of the perfect experiment. Through replication, researchers can study a given research question with
maximal internal and external validity.[27]
https://en.wikipedia.org/wiki/External_validity
BOOK REVIEW
Peeling the onion from the inside out
John Noble Jr
Published online: May 28, 2016
The Western Australian endeavour demonstrates the feasibility of designing and implementing early warning systems for patients who have been excluded from RCTs because of comorbidities and
poly-drug use. In my view, it will be a steep uphill climb to overcome resistance from the pharmaceutical industry and government and private sector sponsors of research as well as biomedical
research opinion leaders and the researchers themselves. Paying attention to the requirements of external validity comes at some cost (3). There will be need to anticipate and include rather than
exclude clinically relevant populations within larger sample size RCTs. Alternatively, there will be need to design and implement separate RCTs to directly establish the efficacy and safety of new drugs
on these excluded populations before granting regulatory approval. Hopefully, the EBM leadership will strike a balance between pursuing improvements in the design, implementation, and reporting of
internally valid RCTs and promoting their external validity. The Oxford Centre for Evidence-Based Medicine is uniquely qualified and capable of taking on the challenge (4).
The importance of Dr Whitstock's recommendation that EBM develop early warning systems to protect at-risk patients is reflected in Abramson and Starfield's observation: "Among even the highest
quality clinical research (included in Cochrane reviews) the odds are 5.3 times greater that commercially funded studies will support their sponsors' products than non-commercially funded studies. ....
[The] primary purpose of commercially funded clinical research is to maximise financial return on investment, not health" (5:pp414,416).
http://ijme.in/index.php/ijme/article/view/2399/4970
When David Graham, a scientist with the US Food and Drug Administration, first warned in August of the health risks of the anti-inflammatory
drug Vioxx, he helped trigger a crisis in the way medicines are approved and supervised.
Within weeks, the best-selling Vioxx, made by Merck, had been withdrawn from the market. By late December, the safety of the entire class of
cox-2 inhibitors, to which it belonged, had been thrown into jeopardy, with investigations under way by regulators in the US, the UK and across
the rest of the European Union.
The result has been a bout of soul-searching at the FDA in response to criticism that the institution, which was set up to protect the public from
unsafe drugs, is too closely linked to the industry it is supposed to supervise. "The FDA has betrayed the public trust by giving far more focus to
new drug approvals than to safety. It really views industry as its client," says Dr Graham, who still works for the FDA under the protection
afforded him by US law as a whistle-blower.
The debate has raised fundamental questions about drug regulation in the US, the world's largest and most profitable market for medicine. It has
implications for the FDA's counterparts around the globe, many of which operate in similar ways.
It has put regulators and drug companies on edge amid escalating attacks by politicians and heightened media scrutiny. The debate is likely to
shape reforms to be proposed by the FDA in the coming weeks and by the Institute of Medicine, an independent academic body, in a few months'
time.
"What's come to light about Vioxx . . . makes people wonder if the FDA has lost its way when it comes to making sure drugs are safe," Senator
Chuck Grassley said at the opening of congressional hearings into the drug held in November and likely to resume this spring.
The FDA has an extraordinarily difficult task. It must weigh up the benefits of authorising new life-saving treatments as quickly as possible
against the risks of approving drugs that have side effects or are subsequently discovered to be lethal to some patients.
For the Pharmaceutical Research and Manufacturers of America (PhRMA), the main trade association, the balance is about right. "It is not at all
clear to us that there needs to be change," Jeff Trewhitt, PhRMA's spokesman says. "There are more than 10,000 medicines on the market and
the vast majority are safely and effectively treating patients. Less than 3 per cent have been withdrawn in the past 20 years. The system works
pretty well."
But there are many who are less sure that the FDA has got it right. "The FDA used to be the gold standard for the world but now its default
position is approval," says Dr Sidney Wolfe, author of Worst Pills, Best Pills published by Public Citizen, a consumer watchdog. He says that, of
the 538 leading medicines currently prescribed in the US, 181 are unsafe or ineffective.
Dr Graham, the FDA whistle-blower, argues that Vioxx was only the most "catastrophic" in a series of lethal regulatory failures in the past
decade. He warns that at least five other lucrative blockbuster drugs on the market should be withdrawn.
He says there were concerns about the heart risks linked to Vioxx in 1999, when the drug was first approved by the FDA, and that subsequent
studies should have led to its withdrawal long before September when it was taken off the market. He estimates the failure to do so led to death
or serious illness in up to 139,000 Americans.
The FDA, Merck and Pfizer, which produces the other leading cox-2 drugs Celebrex and Bextra now also under the spotlight, have all dismissed
Dr Graham's analysis. Merck says doctors were told of potential side-effects and patients often had other ailments that may have caused their
problems.
Merck, Pfizer and other pharmaceuticals companies argue that many lives have been saved by their drugs. Cox-2 drugs work by selectively
inhibiting an enzyme linked to pain. They were designed to avoid the side-effects of the previous generations of anti-inflammatory drugs, notably
gastro-intestinal bleeding, which have also been responsible for many deaths.
However, the regulators concede that the events of the past few months have raised broad concerns. Lester Crawford, the head of the FDA,
describes December 17, the day he launched an investigation into cox-2s, as "one of the biggest days in the FDA's history".
While rejecting any suggestion that his relationship with the industry is too cosy, he concedes that there are legitimate questions about the need
for greater independence in the way new drugs are tested, approved and supervised, and about the way the FDA is organised. The FDA's reform
proposals will try to address these concerns.
Some of the problems the FDA faces are the result of past efforts to bring new drugs swiftly to the market. Pharmaceuticals companies invest
hundreds of millions of dollars in developing new drugs but have only a limited time to recoup the costs while they are under the patents that
allow them to charge high prices. As a result, they are keen to push the FDA for approval as quickly as possible.
Under pressure from the industry, in 1992 Congress passed the Prescription Drug User Fee Act, which has since been twice renewed. It created a
system by which drug companies pay the FDA fees for each medicine it considers for approval. In exchange, the companies are guaranteed a
swift decision, which can take as little as six months for potential breakthrough treatments.
Critics say that these fees, which finance the FDA's Office of New Drugs, create a system in which commercial interests eclipse scientific
judgment. "The assumption is that a drug is safe unless you incontrovertibly prove otherwise," says Dr Graham.
Drug companies hold talks with the FDA early in the drug development process, and their executives, or academics funded by the companies, take part in these discussions. Senior regulators are lobbied by the pharmaceuticals companies. They may also be tempted by the "revolving door", since several senior FDA staff have since been recruited by the industry.
"There is tremendous pressure from management to do what drug companies like," says Liz Barbehenn, a toxicologist who quit the FDA in 1998
after her advice was ignored and now works for Public Citizen. "Behind our backs drug company vice-presidents would call up our divisional
directors. I got tired of arguing."
She says that drug companies often exploit loopholes in the approvals process, submitting important additional data that have been requested
only weeks before a decision is due. That makes it impossible for researchers to make a detailed assessment.
Dr Crawford argues that there is always tension in any scientific discussion and that safety is not compromised. He points to recent examples of
FDA refusals, such as AstraZeneca's blood-thinning drug Exanta.
But in an internal survey of 396 FDA scientists, conducted in 2002 and only recently released, 18 per cent said they had been under pressure to approve drugs
despite reservations about safety, efficacy or quality. Nearly 60 per cent believed that six months was not enough time to conduct an in-depth,
science-based review.
A second problem with the current regulatory system is that the drug industry finances and controls most of the clinical trials on which FDA
decisions are based. Critics argue that the design of the trials, the manipulation of their results by the companies and the suppression of studies
that give negative results also create a bias towards drug approval. Many tests are carried out against a placebo rather than existing drugs, giving
limited indications of true benefits (see below).
"If you torture the data long enough, they will always confess," says Richard Smith, the former editor of the British Medical Journal, the
authoritative academic journal.
The pharmaceuticals companies often cite concerns over confidentiality but Dr Smith favours compulsory registration of trials when they are first
launched and free access via the internet to all the results and the software used to manipulate them.
A third problem is that drugs are rarely withdrawn from the market. Many side-effects are detected only once drugs have been authorised and
are being used by a large number of patients over a long period. Yet the FDA has devoted far fewer resources to such post-marketing surveillance.
Dr Crawford says there are 800 scientists handling initial drug approvals in the FDA's Office of New Drugs compared with just 14 in the Office of
Drug Safety, which handles post-approval surveillance. The latter department reports to the former, creating tensions when those responsible for
initial approvals are asked to rethink their decisions. "I had a supervisor who told me 'industry is our client'," says Dr Graham, who worked in the
Office of Drug Safety.
Pharmaceutical companies also dominate the follow-up studies used by the Office of Drug Safety. Most monitoring of how
patients respond to drugs is passive, with more than 90 per cent of adverse reactions notified to the FDA by industry, suggesting that doctors and
independently-funded researchers should be doing much more.
"Post-marketing surveillance is a joke," says Leo Lutwak, another former FDA scientist, who spent five years trying to persuade the FDA to ban
the slimming drug Pondimin and to prevent the approval of Ponderex in 1996 after concerns over their side-effects. He estimates that up to
15,000 Americans died before both were withdrawn in 1997.
Given the political clout of the pharmaceutical industry, Dr Graham remains sceptical about how meaningful the current ideas on reform will be.
But the withdrawal of Vioxx and the doubts raised over cox-2s have created new public and political momentum. Mr Grassley, the senator, has
already indicated that the FDA's drug safety office and its new drug approvals department should be separated.
Meaningful change will require tighter controls to prevent conflicts of interest, greater transparency in the approvals process and more rigorous
supervision of clinical trials and post-approval surveillance.
Dr Wolfe of Public Citizen, who wants the user-fee system overturned, concedes that would require substantial new funding. "My hope is that the
current debate means that those other than the pharmaceutical industry will start to hold the FDA accountable," he says.
Real change also requires a stronger political commitment to reform. Dr Crawford concedes that his task is made more difficult because he does
not have the full authority to run the agency, since he is formally only "acting" commissioner of the FDA. A number of other senior positions in
the organisation remain unfilled.
There are signs that the pharmaceutical industry itself is responding to criticism. Yesterday, drug companies across the world unveiled plans for
better disclosure of clinical trials.
Sir Tom McKillop, chief executive of AstraZeneca, the Anglo-Swedish pharmaceutical group, has called for a provisional approval period for
drugs, during which their effects can be scrutinised. However, he has warned that this could stifle innovation. "There is a danger that, if the
pharmaceuticals companies feel they are getting punished for taking risks . . . they will become more conservative."
But today, the onus is on the drug manufacturers and the regulators to give assurances on safety and to explain their actions more clearly. "In
medicine, we have swept things under the carpet for far too long," says Jerome Kassirer, former editor of the New England Journal of Medicine.
"The public needs to know."
For half a century a "randomised clinical trial" has been the standard way to show that a new treatment is safe and effective. Patients are divided
into two groups, matched for age, sex and state of health. One group gets the experimental medicine and the other takes a placebo or a standard
treatment for the disease.
The trial is an experiment in which the investigator sets up the study groups and analyses the outcome. In contrast the other main source of
clinical data, the "observational study", monitors people who have made their own treatment or lifestyle choices.
An observational study may involve more patients than a clinical trial, and has the apparent advantage of following patients in a more "natural"
setting. But it can be difficult for researchers retrospectively to sort out the confounding factors that are removed in a controlled trial. Several
treatments whose use was established by observational studies have since been discredited by controlled trials. They include hormone
replacement treatment to prevent heart disease and some vitamin supplements to prevent cancer.
Ideally, to prevent unconscious bias and the placebo effect undermining the results, a controlled trial should be "double blind": the drugs are
encoded by a third party so that neither the patients nor their doctors know who is receiving which treatment.
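The third-party encoding described above can be sketched in a few lines of Python. This is an illustrative toy, not any real trial-management system: the function names, the two-arm design and the kit-code format are all invented for the example.

```python
import random

def blinded_allocation(patient_ids, seed=42):
    """Toy third-party randomisation: assign each patient to 'drug' or
    'placebo' in equal numbers, but hand investigators only opaque kit
    codes. The arm key stays with the third party until unblinding."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)  # random order removes any link between ID and arm
    half = len(ids) // 2
    key = {}        # patient -> arm; sealed until the trial ends
    kit_codes = {}  # patient -> code; all that patients and doctors see
    for i, pid in enumerate(ids):
        key[pid] = "drug" if i < half else "placebo"
        kit_codes[pid] = f"KIT-{rng.randrange(10**6):06d}"
    return kit_codes, key

codes, key = blinded_allocation(range(1, 9))
# Investigators see only labels like 'KIT-042103'; neither patient nor
# doctor can tell from the code which arm the patient is in.
```

Because the shuffle and the codes come from the third party's own random source, nothing visible to the clinic correlates with treatment arm, which is the whole point of double-blinding.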
Blinding is one of many things that can go wrong with clinical trials, critics say. Studies suggest that patients and doctors often deduce from the
effects of medication who is receiving the experimental treatment. But researchers rarely discuss the possibility of this partial unmasking when
they publish trial results.
A more fundamental problem with controlled trials is their "external validity". To be clinically useful a trial must not only be internally valid - well designed to minimise bias - but also relevant to the outside world. If the trial appears to involve a group of patients tightly defined to suit the
researchers' convenience, doctors will not take the results seriously.
The Lancet medical journal is running a series of papers highlighting this problem. Peter Rothwell of Oxford University, who is co-ordinating the
series, says poor external validity is the most frequent criticism made by clinicians of randomised controlled trials - and applies particularly to
some trials conducted by the pharmaceutical industry.
Another issue is that, because large clinical trials are very expensive, many are "underpowered" - too small to give reliable answers. They may
show through a fluke that a treatment is effective when it is not or, conversely, fail to demonstrate the effectiveness of a treatment that does
work. And adverse side-effects are less likely to appear in small trials.
Although researchers have developed a sophisticated statistical technique called meta-analysis for combining the results of several small trials of
a particular treatment into something more powerful, this has serious limitations. A well-designed trial with 1,000 patients is likely to give more
reliable conclusions than trying to combine 10 trials carried out under varying conditions with 100 patients each. But the academics who carry
out most clinical trials often take an individualist attitude that can make it difficult to organise large multi-centre studies.
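The core of meta-analysis - weighting each trial's estimate by the inverse of its variance and pooling - fits in a few lines. The effect sizes below are fabricated for illustration; real meta-analyses must also test for heterogeneity between trials, which is precisely the limitation the paragraph above notes when trials are run under varying conditions.

```python
import math

def pooled_estimate(effects_and_ses):
    """Fixed-effect (inverse-variance) meta-analysis.
    effects_and_ses: list of (effect, standard_error), one per trial."""
    weights = [1 / se ** 2 for _, se in effects_and_ses]
    pooled = sum(w * e for (e, _), w in zip(effects_and_ses, weights)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))  # precision of the combined estimate
    return pooled, pooled_se

# Invented numbers: ten small trials of ~100 patients each, each giving a
# noisy estimate (standard error 0.4) of the same underlying effect.
small_trials = [(-0.5, 0.4), (-0.1, 0.4), (-0.4, 0.4), (0.2, 0.4), (-0.6, 0.4),
                (-0.3, 0.4), (-0.2, 0.4), (-0.7, 0.4), (0.0, 0.4), (-0.4, 0.4)]
effect, se = pooled_estimate(small_trials)
print(f"pooled effect {effect:.2f}, standard error {se:.2f}")
```

The pooled standard error shrinks roughly with the square root of the number of trials, so ten noisy estimates combine into one much sharper one - but only if the trials really did measure the same thing, which is what the simple fixed-effect model assumes.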
The final problem, which has received particular attention over the past year, is the way trial results are published - or in many cases not
published. The bias toward publishing positive clinical data and suppressing negative or inconclusive data is not just a result of the
pharmaceutical industry's self-interest. Leading medical journals, which are overloaded with submissions, naturally choose to publish the more
newsworthy papers with positive results. A more important factor is that researchers simply fail to write up trials with inconclusive or negative
outcomes; from their perspective, it is not a productive use of time.
But attitudes are changing. "I think people are coming to realise that it is ethically unacceptable not to publish trial results," says Sir Iain
Chalmers of the James Lind Library in Oxford, who is a leading authority on evidence-based medicine. "At last the drug companies appreciate
that their attempts to maintain secrecy are not sustainable any longer."
Clive Cookson
http://www.ft.com/cms/s/0/cea3bcda-6051-11d9-bd2f-00000e2511c8.html#axzz4Ai0kB47y