
FUNCTIONAL GENOMICS DR. VÍCTOR TREVIÑO vtrevino@itesm.mx

A7-421

Microarrays – Image Analysis

Microarray - Pre-Processing Purpose

Microarray Image Analysis

TECHNOLOGIES

Spotted DNA arrays (two dyes)
Probes: Target (cDNA, PCR products, etc.)
Copies per gene: Usually 3
Organization: x by y sectors (~=3 each), one sector per print-tip, i x j spots per sector (18x20)
Controls: Empty spots, landing lights

Oligonucleotide arrays
Probes: Oligos, ~20 per probeset, 40nt, perfect match probes (pm) and mismatch probes (mm)
Copies per gene: Usually 1 probeset
Organization: n x m probesets (~100 each)

Microarray - Image Analysis

TECHNOLOGIES

RAW DATA

Two dyes: 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values

Oligonucleotides: 10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values

After Image Analysis and Pre-processing: only 10,000 values in either case
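The raw-data counts above are plain products, so they are easy to check. The short sketch below only redoes that arithmetic with the figures quoted on the slide; the variable names are illustrative.

```python
# Values per array before image analysis, using the counts quoted on the slide.
GENES = 10_000

# Two-dye spotted array: 2 dyes, 3 copies per gene, ~40 pixels each.
two_dye_raw = GENES * 2 * 3 * 40

# Oligonucleotide array: 20 oligos, perfect match + mismatch, ~36 pixels each.
oligo_raw = GENES * 20 * 2 * 36

print(f"two-dye raw values: {two_dye_raw:,}")
print(f"oligo raw values:   {oligo_raw:,}")
print(f"values collapsed into each final number (two-dye): {two_dye_raw // GENES}")
print(f"values collapsed into each final number (oligo):   {oligo_raw // GENES}")
```

Image analysis and pre-processing therefore condense a few hundred to over a thousand pixel values into each final gene value.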

Image Analysis

Addressing: Estimate location of spot centers.
Segmentation: Classify pixels as foreground or background.
Extraction: For each spot on the array and each dye, compute
• foreground intensities
• background intensities
• quality measures.

Done by the Affymetrix GeneChip software

Addressing (by grid, GenePix)

Segmentation

Irregular feature shape

Circular feature

Finally, compute the average intensity of the pixels inside each feature

Background Reduction

Extraction: Determining Background

Image Analysis

Segmentation (Spot detection)
Background Estimation
Value

          Sample 1   Sample 1
Gene 1         100         98
Gene 2         209       4209
Gene 3          -7          2
...
Gene k        9882       9711
...
Gene N        2298         28

Value = Spot Intensity – Spot Background
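As a minimal sketch of how one such value could be produced, the code below segments a small image patch with a fixed circle, averages the foreground and subtracts a background estimate. The circle-based segmentation, the use of the median for the background, the patch values and the function name `spot_value` are all assumptions for illustration. Real tools offer several segmentation and background methods.

```python
from statistics import mean, median

def spot_value(image, center_row, center_col, radius):
    """Return (foreground mean, background median, value) for one spot.

    Pixels inside the circle are foreground; every other pixel in the
    patch is treated as local background (a simplifying assumption).
    """
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(pixel)
    fg = mean(foreground)
    bg = median(background)
    return fg, bg, fg - bg

# Hypothetical 5x5 patch around one spot: bright center, dim surroundings.
patch = [
    [10, 10, 12, 10, 10],
    [10, 50, 60, 50, 10],
    [12, 60, 90, 60, 12],
    [10, 50, 60, 50, 10],
    [10, 10, 12, 10, 10],
]
fg, bg, value = spot_value(patch, center_row=2, center_col=2, radius=1)
print(fg, bg, value)
```

When the background estimate exceeds the foreground, the value is negative, as for Gene 3 in the table.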

Data Transformation – two dyes

[Figure: the two columns of the intensity table plotted against each other, G = Sample 1 versus R = Sample 1, and the same plot after the Log 2 transformation, Log2(G = Sample 1) versus Log2(R = Sample 1)]

Source: Microarray Bioinformatics - D. Stekel (Cambridge, 2003)

Data Transformation – two dyes

(log 2 scale)

[Figure: G = Sample 1 versus R = Sample 1 on the log 2 scale, and the same data as an MA-Plot of deviation (M) versus intensity (A)]

How do we get 1 value per gene from R and G?

$M = \log_2(R/G)$

$A = \log_2(R \cdot G)/2$
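A minimal sketch of the M and A computation, applied to the values of the example table. Treating the first column as the red channel is an assumption, and so are the function name `ma_transform` and the choice to return `None` when a background-subtracted value is not positive.

```python
import math

def ma_transform(red, green):
    """Return (M, A) for one spot, or None if either value is not positive."""
    if red <= 0 or green <= 0:
        return None  # log2 is undefined; the spot needs filtering or imputation
    m = math.log2(red / green)
    a = math.log2(red * green) / 2
    return m, a

# Values from the example table; which column is red is an assumption.
spots = {"Gene 1": (100, 98), "Gene 2": (209, 4209), "Gene 3": (-7, 2),
         "Gene k": (9882, 9711), "Gene N": (2298, 28)}
for gene, (red, green) in spots.items():
    print(gene, ma_transform(red, green))
```

Gene 3 yields no value because its background-subtracted intensity is negative, one source of undefined values.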

Normalization – 2 dyes

"Within" (2 color technologies)

(assumption: the majority of genes do not change)

[Figure: MA-plot with M = log2(R)-log2(G) on the vertical axis and A = (log2(G)+log2(R)) / 2 on the horizontal axis]

[Figure: the MA-plot Before and After normalization]
Normalization – 2 dyes

"Within" Spatial (2 color technologies)

[Figure panels: Before Normalization; After loess Global Normalization; After loess by Sector (print-tip) Normalization]
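The difference between a global and a per-sector (print-tip) correction can be illustrated without a loess fit. The sketch below swaps in plain median centering, a deliberately simpler technique than loess: it forces the M values to be centered on 0, either over the whole array or within each sector, but it does not remove an intensity-dependent trend. The function names and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import median

def center_global(m_values):
    """Subtract the array-wide median from every M value."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def center_by_sector(m_values, sectors):
    """Subtract each sector's own median from the M values in that sector."""
    by_sector = defaultdict(list)
    for m, sector in zip(m_values, sectors):
        by_sector[sector].append(m)
    shifts = {sector: median(vals) for sector, vals in by_sector.items()}
    return [m - shifts[sector] for m, sector in zip(m_values, sectors)]

# Hypothetical M values: sector 1 has a bias of about +1, sector 2 about -0.5.
m_values = [0.9, 1.0, 1.2, -0.6, -0.5, -0.3]
sectors = [1, 1, 1, 2, 2, 2]
print(center_global(m_values))
print(center_by_sector(m_values, sectors))
```

The global correction leaves the two sectors offset from each other, while the per-sector correction centers each one on 0.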

Data Transformation – one dye

          Sample 1
Gene 1         100
Gene 2         209
Gene 3          -7
...
Gene k        9882
...
Gene N        2298

Log 2
Normalization – 1 or 2 dyes

Between-slides

[Figure: density of log intensity for each slide, Before normalization and After normalization]

Methods: quantile, MAD (median absolute deviation), scale, qspline, invariantset, loess
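Of the between-slide methods listed, quantile normalization is the simplest to write down. It is sketched below under simplifying assumptions: every slide has the same number of values, none are missing, and ties are broken by position. The function name and the example values are hypothetical.

```python
from statistics import mean

def quantile_normalize(arrays):
    """Give every array the same distribution (the mean of the sorted arrays).

    `arrays` is a list of equal-length lists of log intensities, one per slide.
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across slides at each rank.
    reference = [mean(s[rank] for s in sorted_arrays) for rank in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

slides = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # hypothetical log intensities
print(quantile_normalize(slides))
```

After this step the density curves of all slides coincide, because each slide holds exactly the same set of values in a different gene order.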

Summarization – Affymetrix

Oligonucleotide dependent technologies

PM, MM

Summarization = "Average"(Intensities)

Usual Methods:
• tukey-biweight
• av-diff
• median-polish

The "summarization" equivalent in two-dye technologies is the average of gene replicates within the slide.
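To make one of the listed methods concrete, here is a minimal sketch of Tukey's median polish applied to one probeset. It assumes the input is a table of log2 intensities with probes in rows and arrays in columns, and that the summarized value of an array is the overall term plus that array's column effect. The function name, the iteration limits and the example table are illustrative.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Tukey's median polish on a probes x arrays table of log2 intensities.

    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the probe effects.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep column medians out of the residuals into the array effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        if max(map(abs, row_med + col_med)) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical log2 PM intensities: 3 probes (rows) x 3 arrays (columns).
probes = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
overall, _, col_eff, _ = median_polish(probes)
print([overall + c for c in col_eff])  # one summarized value per array
```

Because medians are used instead of means, a single outlying probe has little influence on the summarized value.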

Microarrays – Filtering / Treating Undefined Values

• Some spots may be defective from the printing process
• Some spots could not be detected
• Some spots may be damaged during the assay
• Artefacts may be present (bubbles, etc.)

• Use the average of replicated spots
• Remove unrecoverable genes
• Remove problematic spots in all arrays
• Infer values using computational methods (warning)
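The first two treatments can be sketched in a few lines: average whatever replicate spots survive, and drop a gene only when none of its replicates is usable. Marking undefined spots with `None`, the function name and the example values are assumptions for illustration.

```python
from statistics import mean

def average_replicates(replicates):
    """Average the defined replicate spots of one gene; None if none are usable."""
    usable = [v for v in replicates if v is not None]
    return mean(usable) if usable else None

# Hypothetical log ratios, three replicate spots per gene; None marks a bad spot.
genes = {
    "gene_a": [1.0, 1.5, 0.5],
    "gene_b": [2.0, None, 3.0],    # one spot lost, two remain
    "gene_c": [None, None, None],  # unrecoverable
}
averaged = {g: average_replicates(v) for g, v in genes.items()}
kept = {g: v for g, v in averaged.items() if v is not None}
print(kept)
```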

Microarray – Data Filtering

More than 10,000 genes
Too much data increases computation time and analysis complexity

Remove
• Genes that do not change significantly
• Undefined genes
• Low expression

Keeping
• Large signal to noise ratio
• Large statistical significance
• Large variability
• Large expression
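A minimal sketch of such a filter, keeping genes that are both expressed and variable and dropping undefined ones. The two thresholds, the use of the population standard deviation as the variability measure, and all names and values are hypothetical choices rather than recommended settings.

```python
from statistics import mean, pstdev

def filter_genes(expression, min_mean, min_sd):
    """Keep genes whose mean log expression and spread exceed the thresholds."""
    kept = {}
    for gene, values in expression.items():
        if any(v is None for v in values):
            continue  # undefined gene
        if mean(values) >= min_mean and pstdev(values) >= min_sd:
            kept[gene] = values
    return kept

# Hypothetical log2 expression across four arrays.
expression = {
    "flat":      [10.0, 10.1, 9.9, 10.0],  # expressed but does not change
    "low":       [4.0, 6.0, 4.0, 6.0],     # changes but weakly expressed
    "undefined": [9.0, None, 12.0, 9.5],
    "variable":  [8.0, 12.0, 8.5, 11.5],
}
print(list(filter_genes(expression, min_mean=7.0, min_sd=0.5)))
```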

Microarray Pre-Processing Summary

a) Data Processing

b) Image Analysis and Background Subtraction
Affymetrix or Two-dyes Microarray: Image Scanning, Spot Detection, Background Detection & Subtraction, Intensity Value

c) Transformation
M=log2(R/G)
A=log2(R*G)/2

d) Normalization
Within
Between

Image Analysis Exercise

Data processing of Placental Microarrays

Dr. Hugo A. Barrera Saldaña

Paper: DNA Microarrays - A Powerful Genomic Tool for Biomedical Research - Trevino - Barrera - Mol. Med. 2007

Search PubMed for Trevino V

Experimental Design
Goal: Differential Expression

• mRNA Extraction
• Labelling
• Microarray Hybridization (by duplicates)
• Scanning & Data Processing: Image Analysis, Within Normalization (per array), Between Normalization (all arrays)
• Detection of Differentially Expressed Genes: t-test, H0: μ = 0, p-values correction: False Discovery Rate
• Validation and Analysis: Comparison With Known Tissue Specific Genes

Samples (Dr. Hugo Barrera): Placenta 1 (Green), Reference Pool (Red, Green), Placenta 2 (Red), (controls)
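The detection step can be sketched in two parts: a one-sample t statistic per gene for H0: μ = 0 on the log ratios, and a correction of the resulting p-values. The slide does not say which False Discovery Rate procedure is used; the Benjamini-Hochberg procedure below is one common choice and is an assumption here. Converting a t statistic into a p-value needs the t distribution, which is outside the Python standard library, so the example p-values are hypothetical inputs.

```python
import math
from statistics import mean, stdev

def one_sample_t(values):
    """t statistic and degrees of freedom for H0: mean = 0."""
    n = len(values)
    t = mean(values) / (stdev(values) / math.sqrt(n))
    return t, n - 1

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print(one_sample_t([1.0, 2.0, 3.0]))                  # hypothetical log ratios of one gene
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # hypothetical p-values
```

Genes whose adjusted p-value falls below the chosen FDR level are reported as differentially expressed.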

Experimental Design - Slides

SLIDES' SCANNINGS

GROUP  SLIDE       CY3 (GREEN)  CY5 (RED)  COMMENTS
1a     52 A   V    Sample       Control
1b     52 B   V    Sample       Control
2a     51 A   V    Sample       Control    RIGHT TOP GROUP
2b     51 B   V    Sample       Control    RIGHT BOTTOM GROUP
3a     56 A   V    Control      Sample
3b     56 B   V    Control      Sample
4a     54 A   V    Control      Sample
4b     54 B   V    Control      Sample
5a     55 A   V    Control      Control    LEFT TOP GROUP
5b     55 B   V    Control      Control    LEFT BOTTOM GROUP
6a     53 A   V    Control      Control
6b     53 B   V    Control      Control

Download Images from

http://bioinformatica.mty.itesm.mx/?q=node/68

Read Images

• Read BOTH images together using SpotFinder
• Mark file 1 as "Cy3" = Green
• Mark file 2 as "Cy5" = Red
• Adjust Image Brightness and Contrast

Create Grid

Create Grid
• Metarows = 12, Metacolumns = 4
• Rows = 24, Columns = 24
• Pixels = 450 (of the 24 x 24 spots)
• Spacing = 18 (between metacolumns and metarows)
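These grid parameters are what the addressing step uses to predict where each spot should be. The sketch below shows the geometry under one reading of the slide: each block of 24 x 24 spots spans 450 pixels in both directions, and neighbouring blocks are separated by 18 pixels. That reading, the zero-based indices, the `origin` parameter and the function name are assumptions; SpotFinder's internal conventions may differ.

```python
METAROWS, METACOLS = 12, 4      # blocks (sub-grids) on the slide
ROWS, COLS = 24, 24             # spots per block
BLOCK_PIXELS = 450              # width and height of one block, in pixels
SPACING = 18                    # gap between neighbouring blocks, in pixels

PITCH = BLOCK_PIXELS / ROWS     # center-to-center spot distance: 18.75 pixels

def spot_center(metarow, metacol, row, col, origin=(0.0, 0.0)):
    """Return the (x, y) pixel center of a spot; all indices are zero-based.

    `origin` is the top-left corner of the first block (hypothetical default).
    """
    x = origin[0] + metacol * (BLOCK_PIXELS + SPACING) + (col + 0.5) * PITCH
    y = origin[1] + metarow * (BLOCK_PIXELS + SPACING) + (row + 0.5) * PITCH
    return x, y

total_spots = METAROWS * METACOLS * ROWS * COLS
print(total_spots)
print(spot_center(0, 0, 0, 0))
print(spot_center(0, 1, 0, 0))
```

The predicted centers are only a starting point, since the printed blocks rarely sit exactly where the ideal grid puts them.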

Adjust Grid

Created grids are not aligned to the image.

• Use "Move All" to adjust the overall position.
• Use "Visible All" (right click in a blank area) to restore the grid.
• Adjust each of the 12*4 grids to its correct position:
  - Right mouse button in a grid to move that grid
  - Arrow keys also work
  - Right mouse button in a blank section to move all grids

Save Grid

Save the grid frequently to avoid losing your work

Image Analysis

Use Gridding and Processing
• Adjust (save the grid first; on Mac, Adjust doesn't work well)
• Process

Copy images
• 1 from the grid adjustment
• 1 from the RI plot
• 1 from the data (figure)
• 2 from the QC view (A and B). What do they represent?

Export to .mev file
• Open the .mev file in Excel
• Remove comment lines
• Compute signal:
  Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
  Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
• Plot Signal A vs Signal B
• Copy the image into a Word file
• DO NOT SAVE THE modified .MEV FILE

Execute Process

- Select Gridding Tab
- Use Histogram Segmentation
- Spot Size = 10
- Process All !

Inspect DATA PROCESSED

• Select Data Tab
• Select a row / spot
• See results and interpret output

Inspect MA-PLOT

• Select RI-PLOT Tab
• Observe the MA-PLOT
• You can switch on/off specific grids
• A tendency can be observed (which has to be corrected to 0, see MIDAS exercise)

Quality Control View

• Quality view tab
• View 2 shows whether each spot had M > 1 (yellow, or 0.5 in this image) or M < -1
• View 1 gives the count of all M values per color (yellow, gray, blue, and green)

Export DATA and VIEW in Excel

• Save data to a .mev file
• Open the .mev file in Excel
• Remove comment lines
• (important!) Compute signal:
  Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
  Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
• Plot Signal A vs Signal B
• Copy the image into a Word file
• DO NOT SAVE THE modified .MEV FILE

The plot in Excel should be similar to the MA plot (RI-Plot)
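The same signal computation can be scripted instead of done by hand in Excel, which also leaves the exported file untouched. The sketch below assumes the .mev export is tab-delimited, that comment lines start with `#`, and that the header row contains the column names used on the slide (MNA, MNB, MedBkgA, MedBkgB). The example text stands in for a real file and contains only those four columns.

```python
import csv
import io

def read_signals(mev_text):
    """Return a list of (Signal A, Signal B) pairs from .mev file contents."""
    # Drop comment lines; assumed here to start with '#'.
    data_lines = [ln for ln in mev_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    signals = []
    for row in reader:
        signal_a = float(row["MNA"]) - float(row["MedBkgA"])  # Cy3, green
        signal_b = float(row["MNB"]) - float(row["MedBkgB"])  # Cy5, red
        signals.append((signal_a, signal_b))
    return signals

# Hypothetical file contents with only the four columns used here.
example = (
    "# version: example\n"
    "MNA\tMNB\tMedBkgA\tMedBkgB\n"
    "150\t120\t50\t22\n"
    "300\t4400\t91\t191\n"
)
print(read_signals(example))
```

For a real export, the text would come from reading the file, for example `read_signals(open(path).read())` with `path` pointing at the saved .mev file.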

Summary of SpotFinder Usage

We read 2 images, Green=Cy3 and Red=Cy5, to generate an "intensity" value with reduced background noise for each color:

[Figure: Image and Data]

• We generated a grid with the number of spots and the spatial layout specified for the microarray
• We adjusted the positions visually by moving the grids
• We calculated the signal value and the background noise for each color
• We obtained a file with data