
Empirical Bayesian Data Mining for Discovering Patterns in Post-Marketing Drug Safety

Maryam Zardad MS101001

Introduction
It is widely recognized that therapeutic products must be fully characterized before they are approved for marketing. A new drug goes through two stages:
Pre-marketing studies
Post-marketing surveillance

Pre-marketing studies are inherently too short, with study populations that are too small and too homogeneous. Their main goal is to get the drug approved for a particular condition. A new drug's true side-effect profile often reveals itself only after the drug is approved and used in conjunction with other therapies.

To provide an objective basis for safety of marketed products, companies and agencies have implemented post-marketing surveillance activities based in large measure on the collection of spontaneously generated adverse reaction reports.

An Example in post-marketing
Design
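[Figure: dose-escalation flowchart. Dose 3 patients at dose level i and count dose-limiting toxicities (DLTs). 0/3 DLTs: dose is safe, go to dose level i + 1. 1/3 DLTs: dose 3 more patients; total DLTs = 1/6: dose is safe, go to dose level i + 1. 2/3 or 2/6 DLTs: dose is the MTD (maximum tolerated dose).]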
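The decision labels in the flowchart (0/3, 1/3, 2/3, 1/6, 2/6) can be read as a simple rule. The sketch below is one interpretation of those labels, not a definitive protocol; the function name `next_step` and the returned strings are hypothetical, and treating more than 2 DLTs the same as 2 is an assumption.

```python
def next_step(dlts, patients):
    """Decision after seeing `dlts` dose-limiting toxicities among
    `patients` treated at the current dose level i."""
    if patients == 3:
        if dlts == 0:
            return "escalate"  # 0/3: dose is safe, go to dose level i + 1
        if dlts == 1:
            return "expand"    # 1/3: dose 3 more patients at the same level
        return "mtd"           # 2/3 (or more, by assumption): dose is the MTD
    if patients == 6:
        # 1/6 in total: dose is safe; 2/6 (or more, by assumption): MTD
        return "escalate" if dlts <= 1 else "mtd"
    raise ValueError("cohort size must be 3 or 6")
```

For instance, `next_step(1, 3)` asks for three more patients, and `next_step(1, 6)` then moves on to the next dose level.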

Drug safety assessment is an important goal in drug development and post-marketing surveillance (PMS). It informs the balance of benefits and risks of the product. For each adverse event case, there is a demographic record (age, gender, date of event, seriousness of event) and one or more records documenting a sign, symptom, or diagnosis. Individual databases may also contain outcomes (e.g., hospitalization, death).

As a result, large databases of spontaneous adverse event reports have come into existence. The focus of this paper is to define, design, implement, test, and deploy a web-based visual data mining environment (WebVDME) that is accessible to the intended medical end-user community.

SDLC
The project needs to follow a documented System Development Life Cycle (SDLC):
Requirement analysis
Design
Coding
Testing

Requirement analysis
A thin client.
Data mining, including identification of signals related to two-way (drug-event) and multi-way (drug interaction, multi-event syndrome) associations.
Access to the major public U.S. drug and vaccine databases.
A user interface suitable for direct use by medical staff.
Output of data mining results in graphical and tabular form, with the ability to download results for use with Excel.

Design

Coding
Problem: this is a market basket problem, in which a database of transactions (adverse event reports) is mined for the occurrence of interesting itemsets. Interestingness is related to the factor by which the observed frequency of an itemset differs from a nominal baseline frequency.

The baseline frequency is usually taken to be the frequency that would be expected under the full independence model, in which the likelihood of a given item showing up in a report is independent of what other items appear in the report.

Let $N_s$ be the number of transactions in database $s$, let $N_{sj}$ be the number of transactions in $s$ that include item $j$, let $N_{sjk}$ be the number of transactions in $s$ that include both item $j$ and item $k$, and so forth. Define $P_{sj} = N_{sj}/N_s$ as the proportion of transactions in $s$ that include item $j$.

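The baseline frequencies for itemsets of size 1, 2, 3, and 4 are, respectively, $e_j$, $e_{jk}$, $e_{jkl}$, and $e_{jklm}$, where

$$e_j = \sum_s N_s P_{sj} = \sum_s N_{sj}$$
$$e_{jk} = \sum_s N_s P_{sj} P_{sk}$$
$$e_{jkl} = \sum_s N_s P_{sj} P_{sk} P_{sl}$$
$$e_{jklm} = \sum_s N_s P_{sj} P_{sk} P_{sl} P_{sm}$$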

For example, if 2% of all reports have PROZAC as a drug, and 3% of all reports have RASH as an event, then one would expect 0.06% (0.02 × 0.03) of the reports to include this combination (PROZAC together with RASH).
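The pair baseline can be checked with a few lines of Python. This is a minimal sketch, assuming each database is summarized by three counts; the function name `baseline_pair_count` and the report total of 10,000 are hypothetical choices made only so the 2% and 3% figures become concrete counts.

```python
def baseline_pair_count(strata):
    """Baseline count for an item pair: the sum over databases s of
    N_s * P_sj * P_sk.

    `strata` is a list of (N_s, N_sj, N_sk) tuples, one per database.
    """
    total = 0.0
    for n_s, n_sj, n_sk in strata:
        if n_s == 0:
            continue  # an empty database contributes nothing
        p_sj = n_sj / n_s  # proportion of reports with item j
        p_sk = n_sk / n_s  # proportion of reports with item k
        total += n_s * p_sj * p_sk
    return total


# Hypothetical single database: 10,000 reports, 200 (2%) mention PROZAC,
# 300 (3%) mention RASH.
e_prozac_rash = baseline_pair_count([(10_000, 200, 300)])
fraction = e_prozac_rash / 10_000
print(e_prozac_rash, fraction)
```

With one database the result reduces to the product of the two proportions times the number of reports; with several databases, each contributes its own term to the sum.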

We use lower-case $n$ for total across-database counts: $n_j = \sum_s N_{sj}$, $n_{jk} = \sum_s N_{sjk}$, etc.

Comparing Observed and Expected Counts: Relative Reporting Rate


Then, for all itemsets, the raw relative rates are defined as

$$R_{jk} = n_{jk}/e_{jk}$$
$$R_{jkl} = n_{jkl}/e_{jkl}$$
$$R_{jklm} = n_{jklm}/e_{jklm}$$
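As an illustration of the pairwise case, the sketch below computes the observed count, the baseline count, and the raw relative rate for every item pair that occurs together at least once. It assumes each report is a Python set of item names and each database is a list of reports; the function name `pair_relative_rates` and the toy items are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_relative_rates(strata):
    """Return {(j, k): (n_jk, e_jk, R_jk)} for pairs observed at least once.

    `strata` is a list of databases; each database is a list of reports;
    each report is a set of item names (drugs and events).
    """
    n_pair = Counter()  # observed pair counts summed over databases
    e_pair = Counter()  # baseline counts under full independence
    for reports in strata:
        n_s = len(reports)
        if n_s == 0:
            continue
        item_counts = Counter()
        for report in reports:
            item_counts.update(report)
            n_pair.update(combinations(sorted(report), 2))
        for j, k in combinations(sorted(item_counts), 2):
            e_pair[(j, k)] += n_s * (item_counts[j] / n_s) * (item_counts[k] / n_s)
    return {pair: (n, e_pair[pair], n / e_pair[pair]) for pair, n in n_pair.items()}


# Toy database of four reports (made-up data).
toy = [[{"DRUG_A", "RASH"}, {"DRUG_A", "RASH"}, {"DRUG_B", "NAUSEA"}, {"DRUG_B"}]]
rates = pair_relative_rates(toy)
```

Pair keys are sorted tuples, so `rates[("DRUG_A", "RASH")]` holds the three values for that pair. Pairs that never co-occur are left out because their raw rate is zero.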

The following all have R = 100:


N = 1000, E = 10
N = 100, E = 1
N = 10, E = 0.1
N = 1, E = 0.01

The highest-ranking pairs according to R had small n but tiny e, such as (n = 1, e = 0.0003, R = 3300). The highest-ranking pairs according to statistical significance had huge n, but not very large values of R, such as (n = 7500, e = 500, R = 15).
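To see why ranking by R and ranking by significance disagree, one can attach a tail probability to each (N, E) pair with R = 100. The sketch below uses a Poisson upper tail, P(X >= N) with mean E, as a stand-in for "statistical significance"; that choice is an assumption made here for illustration, since the slides do not name a test. The function name `log10_poisson_upper_tail` is hypothetical.

```python
import math


def log10_poisson_upper_tail(n, e):
    """log10 of P(X >= n) for X ~ Poisson(e), with n >= 1 and e > 0."""
    # log of the Poisson probability of exactly n events
    log_pmf_n = -e + n * math.log(e) - math.lgamma(n + 1)
    # Sum pmf(n + i) / pmf(n) for i = 0, 1, 2, ... until terms are negligible.
    ratio_sum, term, i = 1.0, 1.0, 0
    while term > 1e-17 * ratio_sum:
        i += 1
        term *= e / (n + i)
        ratio_sum += term
    return (log_pmf_n + math.log(ratio_sum)) / math.log(10)


cases = [(1000, 10), (100, 1), (10, 0.1), (1, 0.01)]
for n, e in cases:
    print(n, e, n / e, round(log10_poisson_upper_tail(n, e), 1))
```

All four cases share the same ratio, but the tail probability shrinks by many orders of magnitude as N grows, so a single report against a tiny baseline is far weaker evidence than a thousand reports against a baseline of ten.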

MULTI-ITEM ASSOCIATION

Conditional independence:
Chemotherapy is independent of Smoking conditional on Cancer = Yes.

[Figure: diagram with nodes Smoking, Lung Cancer, Chemo]

For example, suppose that in a database of patient drug reactions, A and B are two drugs, and C is the occurrence of kidney failure. In case 1, A and B may act independently upon the kidney. In case 2, A and B may have no effect on the kidney if taken alone, but when taken together a drug interaction occurs that often leads to kidney failure.
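One simple way to look for the case 2 pattern is to tabulate how often C is reported in each of the four exposure groups (neither drug, A only, B only, both). The sketch below does only that tabulation; it is not the multi-item model itself, and the function name `event_rate_by_exposure`, the helper `make_reports`, the item `"OTHER"`, and all counts are hypothetical.

```python
from collections import defaultdict


def event_rate_by_exposure(reports, drug_a, drug_b, event):
    """Fraction of reports mentioning `event`, keyed by
    (drug_a present, drug_b present)."""
    totals = defaultdict(int)
    with_event = defaultdict(int)
    for report in reports:
        cell = (drug_a in report, drug_b in report)
        totals[cell] += 1
        with_event[cell] += event in report  # True counts as 1
    return {cell: with_event[cell] / totals[cell] for cell in totals}


def make_reports(items, n_total, n_event):
    """Build n_total toy reports with `items`, n_event of which also list C."""
    return [set(items) | {"C"}] * n_event + [set(items)] * (n_total - n_event)


# Made-up data shaped like case 2: C is rare unless A and B appear together.
reports = (
    make_reports(["OTHER"], 10, 1)
    + make_reports(["A"], 10, 1)
    + make_reports(["B"], 10, 1)
    + make_reports(["A", "B"], 10, 8)
)
rates = event_rate_by_exposure(reports, "A", "B", "C")
```

In this toy data the fraction for `(True, True)` is much higher than for `(True, False)` or `(False, True)`, which is the signature of an interaction. With spontaneous reports these are reporting fractions, not incidence rates, so they indicate a signal to investigate rather than a measured risk.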
