2.3 manual

About MZmine 2

Copyright (c) 2005-2011 MZmine Development Team

MZmine 2 is an open-source framework for processing, visualization and analysis of mass spectrometry based molecular profile data. It is based on the original MZmine toolbox described in the 2006 Bioinformatics publication.

References

Pluskal, T., Castillo, S., Villar-Briones, A., Orešič, M. (2010). MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data, BMC Bioinformatics 11:395

Katajamaa, M., Miettinen, J., and Orešič, M. (2006). MZmine: Toolbox for processing and visualization of mass spectrometry based molecular profile data, Bioinformatics 22, 634-636

Reporting bugs

We appreciate any feedback from the users of MZmine 2. Please let us know your suggestions regarding the design of the software. In particular, please inform us about any problems or bugs you encounter while using the software.

Contact email for the MZmine 2 developers mailing list: mzmine-devel@lists.sourceforge.net

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

Table of contents

4. Peak detection
  4.1. Mass detection
    4.1.1. Centroid mass detector
    4.1.2. Exact mass detector
    4.1.3. Local maxima mass detector
    4.1.4. Recursive threshold mass detector
    4.1.5. Wavelet transform mass detector
  4.2. FTMS shoulder peaks filter (optional)
  4.3. Chromatogram builder
  4.4. MS/MS peak picker
  4.5. Chromatogram deconvolution
    4.5.1. Baseline cut-off
    4.5.2. Noise amplitude
    4.5.3. Savitzky-Golay
    4.5.4. Local minimum search
  4.6. Peak extender
  4.7. Peak shape modeler
7. Gap filling
  7.1. Peak finder
  7.2. Same m/z and RT range gap filler
8. Normalization
  8.1. Linear normalizer
    8.1.1. Normalization factors
  8.2. Retention time normalizer
  8.3. Standard compound normalizer
9. Identification
  9.1. Adduct Search
  9.2. Peak complex search
  9.3. Custom database search
  9.4. Chemical formula prediction
  9.5. Online database search
    9.5.1. Supported databases
  9.6. Fragment search
  9.7. Glycerophospholipid prediction
  9.8. NIST MS Search

2 GHz CPU
1 GB RAM
4 GB or more RAM

3. Edit the startup script in a text editor and adjust the various parameters, particularly HEAP_SIZE (amount of memory allocated for the Java Virtual Machine), R_HOME (the path to the R installation, if installed), NIST_MS_SEARCH_PATH (the path to the NIST database, if installed) and JAVA_COMMAND (path to the Java runtime start command).

The order of the raw data files and peak lists in the project tree may be changed using the mouse (drag & drop).

The first step of the batch is performed on those raw data files/peak lists selected by the user at the time of starting the batch. Each following step of the batch is performed on the results obtained by the previous step. For example, if the first step of the batch is Chromatogram builder, it will produce peak lists as a result. The following step of the batch may be Peak list deconvolution, and it will be performed on the peak lists obtained from the first step.
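
The chaining rule described above can be illustrated with a minimal sketch. The function names and the string-based "files" are hypothetical stand-ins chosen for this example; they are not MZmine classes or APIs.

```python
from typing import Callable, List

# A step takes a list of items (raw data files or peak lists) and returns new items.
Step = Callable[[List[str]], List[str]]


def run_batch(steps: List[Step], selected: List[str]) -> List[str]:
    """Run the steps in order; each step consumes the previous step's results."""
    current = selected  # the first step works on the user's selection
    for step in steps:
        current = step(current)
    return current


# Hypothetical stand-ins for two processing modules.
def chromatogram_builder(raw_files: List[str]) -> List[str]:
    return [name + " chromatograms" for name in raw_files]


def deconvolution(peak_lists: List[str]) -> List[str]:
    return [name + " deconvoluted" for name in peak_lists]


result = run_batch([chromatogram_builder, deconvolution], ["sample1"])
print(result)
```

The only point of the sketch is the data flow: the user's selection feeds the first step, and every later step receives whatever the step before it produced.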
NetCDF

Method parameters

Retention time range
    Retention time boundary of the cropped region.

Method parameters

Window length
    One-sided length of the m/z smoothing window.

polynomial regression (of degree k) on a series of values (of at least k+1 points which are treated as being equally spaced in the series) to determine the smoothed value for each point.

http://en.wikipedia.org/wiki/Savitzky-Golay_smoothing_filter

Raw data file before and after the filter was applied:

Method parameters

Number of datapoints
    This number can be 5, 7, 9, 11, 13 or 15.
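
As an illustration of the smallest allowed setting, the sketch below applies a 5-point Savitzky-Golay smoothing with the standard quadratic weights (-3, 12, 17, 12, -3)/35. It is a simplified example, not the MZmine implementation; the function name and the choice to leave the two end points on each side unchanged are assumptions of this sketch.

```python
from typing import List

# Standard 5-point quadratic Savitzky-Golay smoothing weights (to be divided by 35).
SG5_WEIGHTS = (-3, 12, 17, 12, -3)
SG5_NORM = 35.0


def savgol5(values: List[float]) -> List[float]:
    """Smooth equally spaced values; the two points at each end are left unchanged."""
    half = len(SG5_WEIGHTS) // 2
    smoothed = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return smoothed


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(savgol5(noisy))
```

Because the weights come from a least-squares quadratic fit, data that already lies on a parabola passes through the filter unchanged, which is a convenient sanity check.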

Method parameters

m/z range
    m/z boundary of the cropped region.
Retention time range
    Retention time boundary of the cropped region.

Method parameters

m/z bin length
    The length of m/z bin.

1. The full range of m/z values present in the raw data is divided into a series of bins of a specified width (see m/z bin width).
2. For each bin a chromatogram is constructed from the raw data points whose m/z values fall within the bin. This chromatogram (see Chromatogram type) may be either the base peak chromatogram or total ion count (TIC) chromatogram.
3. The raw intensity values of each data point in a bin are corrected by subtracting the bin's baseline. Subtraction of baseline intensity values proceeds according to the type of chromatogram used to determine the baseline. If the base peak chromatogram was used then the corrected intensity values are calculated as follows:

   Icorr = max(0, Iorig - Ibase)

   If the TIC chromatogram was used then the corrected intensity values are calculated as follows:

   Icorr = max(0, Iorig * (1 - Ibase / Imax))

   where Iorig, Ibase, Imax and Icorr are the original, baseline, maximum and corrected intensity values, respectively, for a given scan and m/z bin. If Ibase is less than or equal to zero then no correction is performed, i.e. Icorr = Iorig.
4. A new raw data file is generated from the corrected intensity values.
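
The two correction formulas from step 3 can be written out as a short sketch. The function names are chosen for this example only, and the code assumes Imax is positive whenever the TIC variant is used.

```python
def correct_base_peak(i_orig: float, i_base: float) -> float:
    """Icorr = max(0, Iorig - Ibase); no correction when Ibase <= 0."""
    if i_base <= 0:
        return i_orig
    return max(0.0, i_orig - i_base)


def correct_tic(i_orig: float, i_base: float, i_max: float) -> float:
    """Icorr = max(0, Iorig * (1 - Ibase / Imax)); no correction when Ibase <= 0.

    Assumes i_max > 0 (an assumption of this sketch).
    """
    if i_base <= 0:
        return i_orig
    return max(0.0, i_orig * (1.0 - i_base / i_max))


print(correct_base_peak(10.0, 3.0))
print(correct_tic(10.0, 5.0, 20.0))
```

With a base peak baseline the correction is a plain subtraction, while with a TIC baseline each data point is scaled down by the fraction of the scan maximum that the baseline represents.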

Method parameters

Filename suffix
    The text to append to the name of the baseline corrected raw data file.
Chromatogram type
    TIC: total ion count, i.e. summed intensities per scan, or Base peak intensity: maximum intensity per scan.
MS-level
    MS level to which to apply correction. Select "0" for all levels.
Smoothing
    The smoothing factor. Typically in the range 10^5 to 10^8. Larger values produce a smoother baseline.
Asymmetry
    The weight (p) for points below the trendline. Conversely, 1-p is the weight applied to points above the trendline. For baselines use a small value of p.
Use m/z bins
    Baselines can be calculated and data points corrected per m/z bin or for the entire raw data file. If no binning is performed then a single chromatogram is calculated for the entire raw data file and its baseline is used to correct the full data file. Skipping binning is very quick but much less accurate, and so is only suitable for fine-tuning the smoothing and asymmetry parameters.
m/z bin width
    The width of the m/z bins if binning is performed (see Use m/z bins). Smaller bin widths result in longer processing times and greater memory requirements. Avoid values below 0.01.
Remove source file
    Whether to remove the original raw data file once baseline correction is complete.

Requirements

This module relies on the R statistical computing software being installed and two "packages" being installed in R:

1.
2. rJava: provides an interface between MZmine and R. To install rJava run R and enter install.packages("rJava")

In order for MZmine to make use of R, various environment variables related to R and rJava must be correctly defined in the MZmine start-up shell-script or batch file.

References

[1] Boelens, H.F.M., Eilers, P.H.C., Hankemeier, T. (2005) "Sign constraints improve the detection of differences between complex spectral data sets: LC-IR as an example", Analytical Chemistry, 77, 7998-8007.

4. Peak detection

4.1. Mass detection

The Mass detection module generates a list of masses (ions) for each scan in the raw data file. Several algorithms are provided for this step. The choice of the optimal algorithm depends on the raw data characteristics (mass resolution, mass precision, peak shape, noise). If the raw data is already centroided, only one algorithm (Centroid mass detector) can be used. The other algorithms work only with continuous type data.

When mass lists are generated for all MS level 1 scans, a green check mark will appear at the icon of the raw data file. Each mass list can be opened by expanding individual scans from the project tree:

Method parameters

Mass detector
    Algorithm to use for mass detection and its parameters
MS level
    MS level of scans for which the mass lists should be generated
Mass list name
    Name of the new mass list. If the processed scans already have a mass list of that name, it will be replaced.

Method parameters

Noise level
    The minimum intensity level for a data point to be considered part of a chromatogram. All data points below this intensity level are ignored.

Method parameters

Noise level
    The minimum intensity level for a data point to be considered part of a chromatogram. All data points below this intensity level are ignored.

Method parameters

Noise level
    The minimum intensity level for a data point to be considered part of a chromatogram. All data points below this intensity level are ignored.

Method parameters

Noise level
    The minimum intensity level for a data point to be considered part of a chromatogram. All data points below this intensity level are ignored.
Min m/z peak width
    Minimum acceptable m/z difference between the first and last data point of an ion signal (m/z width). This parameter is used to determine when to stop the recursive search and discard peaks which are too small.
Max m/z peak width
    Maximum acceptable m/z difference between the first and last data point of an ion signal (m/z width). This parameter is used to determine when to stop the recursive search.

Mathematical model

In mathematics and numerical analysis, the Mexican hat wavelet is the normalized second derivative of a Gaussian function.

The parameter t is the intensity of each data point in the curve, and sigma corresponds to the standard deviation.

To simplify the process of wavelet calculation, the original function is transformed into two parts, where Wc is the wavelet coefficient and y is the intensity of the wavelet at a certain point.

In the following formula, "t" is the Wavelet window size (%) parameter.

The limits between which the Mexican hat wavelet is evaluated are -5 and 5 (ESL, ESR), and the incremental step used in this range is the result of dividing the width of the ESL to ESR range by 60,000. The number of coefficients used to calculate the wavelet intensity depends on the Scale level parameter.
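
Since the formulas themselves are not reproduced in this text, the following sketch uses the textbook form of the Mexican hat (Ricker) wavelet, psi(t) = 2 / (sqrt(3 sigma) pi^(1/4)) * (1 - t^2/sigma^2) * exp(-t^2 / (2 sigma^2)). It is not the two-part form used internally by MZmine; the function name is an assumption of this example, and t is treated here simply as the argument of the function. The limits and step count are the ones quoted above.

```python
import math


def mexican_hat(t: float, sigma: float = 1.0) -> float:
    """Normalized second derivative of a Gaussian (textbook Mexican hat wavelet)."""
    norm = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    x2 = (t / sigma) ** 2
    return norm * (1.0 - x2) * math.exp(-x2 / 2.0)


ESL, ESR = -5.0, 5.0           # evaluation limits quoted in the text
STEP = (ESR - ESL) / 60000.0   # incremental step quoted in the text

# Sample the wavelet on a coarse grid for inspection (every 6000th step).
samples = [(ESL + i * 6000 * STEP, mexican_hat(ESL + i * 6000 * STEP)) for i in range(11)]
for t, value in samples:
    print(round(t, 3), round(value, 5))
```

The wavelet is positive in the centre, crosses zero at t = ±sigma and has two negative side lobes, which is what makes it respond strongly to peak-shaped signals.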

Method parameters

Noise level
    The minimum intensity level for a data point to be considered part of a chromatogram. All data points below this intensity level are ignored.
Scale level
    This value is the scale factor that either dilates or compresses the wavelet signal. When the scale factor is relatively low, the signal is more contracted, which results in a more detailed graph and, as a consequence, more noisy peaks are detected. On the other hand, when the scale factor is high, the signal is stretched out, which means that the resulting graph is presented in less detail, giving a smoothed signal.
Wavelet window size (%)
    This value is the size of the window used to calculate the wavelet signal. When the size of the window is small, more noisy peaks can be detected. The proper setting of this parameter may help to avoid undesired noise peaks.

Wavelet window size at 10%:

Wavelet window size at 100%:

Preview dialog:


Method parameters

Mass resolution of the data
    Defines the width of the model, which should be equal to the estimated resolution of the peaks in the raw data. Mass resolution is defined according to the following scheme:
Peak model function
    Defines the shape of the model function, as described below.

The parameter "a" is the height of the curve's peak, "b" is the position of the center of the peak, and "c" controls the width of the "bell".

The Lorentzian peak model is described by the following formula:

Where "x0" is the location parameter, specifying the location of the peak of the distribution, and "γ" is the scale parameter which specifies the width of the peak.
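
The formula images are missing from this text, so the sketch below uses the textbook forms of the two peak shapes: a Gaussian with height a, center b and width c, and a Lorentzian (Cauchy) curve with a location parameter and a scale parameter. Whether MZmine applies exactly these normalizations is an assumption. The helper that derives a peak width from the mass resolution assumes resolution = m/z divided by the full width at half maximum.

```python
import math


def gaussian(x: float, a: float, b: float, c: float) -> float:
    """Gaussian bell: height a, center b, width parameter c."""
    return a * math.exp(-((x - b) ** 2) / (2.0 * c ** 2))


def lorentzian(x: float, x0: float, scale: float) -> float:
    """Lorentzian (Cauchy) curve: location x0, scale = half width at half maximum."""
    return (1.0 / math.pi) * scale / ((x - x0) ** 2 + scale ** 2)


def fwhm_from_resolution(mz: float, resolution: float) -> float:
    """Peak width (FWHM) implied by a mass resolution, assuming R = m/z / FWHM."""
    return mz / resolution


width = fwhm_from_resolution(400.0, 40000.0)
print(width)
print(gaussian(400.0, 1000.0, 400.0, width))
print(lorentzian(400.0, 400.0, width / 2.0))
```

The Lorentzian has much heavier tails than the Gaussian, which is why it can model the broad base of a strong peak on which shoulder peaks appear.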

Example of running the shoulder peaks filter on LTQ Orbitrap data:


Method parameters

Mass list name
    Choose the name of the mass lists to be used for building chromatograms. The mass lists must be previously generated for each MS scan by the Mass detector module.
Min time span
    Minimum time over which the same ion must be connected in order to be recognized as a chromatogram. This parameter should be set according to the typical length of chromatographic peaks observed in the raw data.
Min height
    Minimum intensity of the highest data point in the chromatogram. If the chromatogram height is below this level, it is discarded.
m/z tolerance
    Maximum m/z difference of data points in consecutive scans in order to be connected in the same chromatogram.
Suffix
    The resulting chromatogram will be named file name + suffix

The resulting chromatograms can be opened in the peak list table visualizer using a double-click:


Method parameters

m/z window
    This is the m/z range (window) in which the base peak ion is searched for in the parent scan.

Method parameters

Suffix
    This string is added to the end of the name of each processed peak list
Peak recognition
    Selection of algorithm for peak recognition (see below)
Remove original peak list
    If selected, the original chromatogram is automatically removed and only the deconvoluted version will remain

Method parameters

Min peak height
    Minimum acceptable height (intensity) for a chromatographic peak
Min peak duration
    Minimum acceptable length (time duration) for a chromatographic peak
Baseline level
    Level below which all data points of the chromatogram are removed

1. The intensity range of the chromatogram is divided into bins of the user-specified size (the "Noise amplitude" parameter).
2. The bin with the highest number of data points is found. This bin represents the intensity level of the noise signal.
3. The baseline level is set equal to the intensity of the bin with the most data points.
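
The three steps above can be sketched as follows. The function name is hypothetical, and the choice to report the lower edge of the most populated bin as "the intensity of the bin" is an assumption of this example; the text does not say which point of the bin is used.

```python
from collections import Counter
from typing import List


def noise_baseline(intensities: List[float], noise_amplitude: float) -> float:
    """Return the intensity (lower bin edge) of the most populated intensity bin."""
    # Step 1: assign every data point to a bin of width noise_amplitude.
    bins = Counter(int(i // noise_amplitude) for i in intensities)
    # Step 2: find the bin holding the highest number of data points.
    best_bin, _count = max(bins.items(), key=lambda item: item[1])
    # Step 3: the baseline is the intensity level of that bin.
    return best_bin * noise_amplitude


chromatogram = [101.0, 102.0, 108.0, 55.0, 300.0, 104.0, 12.0]
print(noise_baseline(chromatogram, 10.0))
```

The idea is that most data points in a chromatogram belong to the noise floor, so the most crowded intensity bin is a reasonable estimate of where that floor lies.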

Method parameters

Min peak height
    Minimum acceptable height (intensity) for a chromatographic peak
Min peak duration
    Minimum acceptable length (time duration) for a chromatographic peak
Amplitude of noise
    This value is the intensity amplitude of the signal in the noise region

4.5.3. Savitzky-Golay

This method uses the Savitzky-Golay polynomial (A. Savitzky and M. J. E. Golay, Anal. Chem., 36, 1627 (1964)) to get the smoothed second derivative of the chromatogram intensities. The following figure (left) presents the shape of a Gaussian peak (a), the first derivative (b), and the second derivative (c). The figure on the right shows how the signal (blue line) may be divided into individual chromatographic peaks by observing the second derivative.

Method parameters

Min peak height
    Minimum acceptable height (intensity) for a chromatographic peak
Min peak duration
    Minimum acceptable length (time duration) for a chromatographic peak
Derivative threshold level
    Minimum acceptable intensity in the 2nd derivative for peak recognition

length. The Chromatographic threshold parameter may be used if the chromatogram contains some background noise.

Method parameters

Chromatographic threshold
    Threshold for removing noise. The algorithm finds the intensity such that the given percentage of the chromatogram data points is below that intensity, and removes all data points below that level
Search minimum in RT range
    If a local minimum is minimal in this range of retention time, it will be considered a border between two peaks
Minimum relative height
    Minimum height of a peak relative to the chromatogram top data point
Minimum absolute height
    Minimum absolute height of a peak to be recognized
Min ratio of peak top/edge
    Minimum ratio between the peak's top intensity and its side (lowest) data points. This parameter helps to reduce detection of false peaks in case the chromatogram is not smooth.
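
The Chromatographic threshold parameter can be illustrated with a short sketch. The function name is hypothetical, and two details are assumptions of this example: the cut-off level is taken as a simple rank in the sorted intensities, and "removed" data points are represented by setting their intensity to zero.

```python
from typing import List


def chromatographic_threshold(intensities: List[float], percent: float) -> List[float]:
    """Zero out data points below the level under which `percent` % of points fall."""
    if not intensities:
        return []
    ordered = sorted(intensities)
    # Index of the first data point that is NOT among the lowest `percent` %.
    index = min(int(len(ordered) * percent / 100.0), len(ordered) - 1)
    level = ordered[index]
    return [i if i >= level else 0.0 for i in intensities]


chromatogram = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]
print(chromatographic_threshold(chromatogram, 50.0))
```

With a threshold of 50 %, the lower half of the data points is discarded and only the more intense half is passed on to the local minimum search.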
both directions by searching for data points within the given m/z tolerance and above the given minimal height. When no data point is found, extending is stopped.

Method parameters

Suffix
    This string is added to the end of the name of each processed peak list
m/z tolerance
    Maximum allowed distance in m/z between data points in successive scans
Min height
    Minimum allowed intensity for successive scans. When intensity drops below this level, extending is stopped.
Remove original peak list
    If selected, the original peak list is automatically removed
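
A minimal sketch of the extension rule follows: starting from a known scan, the peak is extended scan by scan in both directions while a data point within the m/z tolerance and at or above the minimum height is found. The data layout (a list of scans, each a list of (m/z, intensity) pairs) and the function name are assumptions of this example, not MZmine data structures.

```python
from typing import List, Tuple

Scan = List[Tuple[float, float]]  # (m/z, intensity) data points of one scan


def extend_peak(scans: List[Scan], start: int, peak_mz: float,
                mz_tolerance: float, min_height: float) -> Tuple[int, int]:
    """Return the first and last scan index covered by the extended peak."""

    def has_point(scan: Scan) -> bool:
        return any(abs(mz - peak_mz) <= mz_tolerance and intensity >= min_height
                   for mz, intensity in scan)

    left = start
    while left - 1 >= 0 and has_point(scans[left - 1]):
        left -= 1   # extend towards earlier scans
    right = start
    while right + 1 < len(scans) and has_point(scans[right + 1]):
        right += 1  # extend towards later scans
    return left, right


scans = [[(100.0, 5.0)], [(100.01, 50.0)], [(100.0, 200.0)],
         [(100.02, 80.0)], [(100.5, 90.0)], [(100.0, 70.0)]]
print(extend_peak(scans, 2, 100.0, 0.05, 10.0))
```

In the example the extension stops on the left because the intensity drops below the minimum height, and on the right because the next scan has no data point within the m/z tolerance.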

Peak list obtained from MS/MS peak detector:

The same peak list after running peak extender:


Method parameters

Suffix
    This string is added to the end of the name of each processed peak list
Mass resolution
    Mass resolution is the dimensionless ratio of the mass of the peak divided by its width. Peak width is taken as the full width at half maximum intensity (FWHM)
Shape model
    The type of peak shape model to use. Triangle is a simple triangle model, designed just to demonstrate the principle. Gaussian (symmetric) and EMG (Exponentially Modified Gaussian; asymmetric) models may provide a better approximation of chromatographic peak shape.
Remove original peak list
    If selected, the original chromatogram is automatically removed and only the deconvoluted version will remain


5. Isotope patterns

5.1. Isotopic peaks grouper

This module attempts to find those peaks in a peak list which form an isotope pattern. When an isotope pattern is found, the information about the charge and isotope ratios is saved, and additional isotopic peaks are removed from the peak list. Only the highest isotope is kept.

Note that deisotoping is performed after the Chromatogram builder and Deconvolution. Therefore, MZmine does not search for isotopic peaks in individual scans, but instead tries to identify those peak list entries which together form an isotope pattern.

Method parameters

Name suffix
    Suffix to be added to peak list name
m/z tolerance
    Maximum distance in m/z from the expected location of a peak
RT tolerance
    Maximum distance in RT from the expected location of a peak
Monotonic shape
    If true, then monotonically decreasing height of isotope pattern is required
Maximum charge
    Maximum charge to consider for detecting the isotope patterns
Remove original peaklist
    If checked, original peaklist will be removed and only deisotoped version remains

Deisotoping algorithm

Peaks in the peak list are processed in the order of decreasing height. For each peak, MZmine tries to find the most appropriate charge state by comparing the number of identified isotopes for each possible charge. For each charge state, peaks that fit the m/z and RT distance limits are considered as isotopes. The charge state with the highest number of identified isotopes is selected, and the isotope pattern is generated.
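
The procedure described above can be sketched as follows. This is a simplified illustration rather than the MZmine code: peaks are plain (m/z, RT, height) tuples, the isotope spacing of about 1.00335 divided by the charge (the 13C-12C mass difference) is an assumption of this example, the Monotonic shape option is ignored, and peaks without any isotopes get a charge of None.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float, float]  # (m/z, retention time, height)
ISOTOPE_SPACING = 1.00335          # approx. 13C-12C mass difference (assumption)


def find_isotopes(seed: Peak, candidates: List[Peak], charge: int,
                  mz_tol: float, rt_tol: float) -> List[Peak]:
    """Collect consecutive isotopes of `seed` in both m/z directions."""
    found: List[Peak] = []
    for direction in (1, -1):
        k = 1
        while True:
            expected = seed[0] + direction * k * ISOTOPE_SPACING / charge
            matches = [p for p in candidates if p not in found
                       and abs(p[0] - expected) <= mz_tol
                       and abs(p[1] - seed[1]) <= rt_tol]
            if not matches:
                break
            found.append(max(matches, key=lambda p: p[2]))
            k += 1
    return found


def deisotope(peaks: List[Peak], mz_tol: float, rt_tol: float,
              max_charge: int) -> List[Tuple[Peak, Optional[int], List[Peak]]]:
    """Return (kept peak, charge, grouped isotopes), processed by decreasing height."""
    remaining = sorted(peaks, key=lambda p: p[2], reverse=True)
    patterns = []
    while remaining:
        seed = remaining.pop(0)
        best_charge, best = None, []
        for charge in range(1, max_charge + 1):
            found = find_isotopes(seed, remaining, charge, mz_tol, rt_tol)
            if len(found) > len(best):  # charge with the most isotopes wins
                best_charge, best = charge, found
        for p in best:
            remaining.remove(p)         # grouped isotopes leave the peak list
        patterns.append((seed, best_charge, best))
    return patterns


peaks = [(500.0, 10.0, 1000.0), (500.5017, 10.0, 600.0),
         (501.0034, 10.1, 200.0), (700.0, 30.0, 400.0)]
patterns = deisotope(peaks, 0.005, 0.5, 2)
for kept, charge, isotopes in patterns:
    print(kept, charge, len(isotopes))
```

In the example, charge 1 explains only one neighbouring peak of the m/z 500 peak, while charge 2 explains two, so charge 2 is selected and both neighbours are folded into the pattern.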

Comparison algorithm

The similarity of two isotope patterns is determined as follows:

1. Both isotope patterns are normalized (such that the highest isotope in each pattern has the intensity of 1.0) and merged into a single spectrum, where all isotopes from the first pattern have a positive intensity, while for the isotopes of the second pattern the intensity is negated.
2.
3.

A trivial observation is that for two identical isotope patterns the similarity score will be 100%, while for two completely different patterns a score of 0% is returned. Only a single parameter is required for the evaluation of the algorithm, defining the width of the sliding window. It should be noted, though, that the optimal value of this parameter might be different from the commonly perceived mass accuracy of the instrument, because mass resolving power and preprocessing of the data must be considered. For example, even if the mass accuracy of the major isotopes may be less than 0.001 m/z, the mass difference between minor isotopes may be significantly higher.


Algorithm

The peak alignment algorithm uses:

1. a master list of peaks (L) against which every new sample (Sj) will be matched. When aligning peaks from multiple samples, the master list is initially set to the first sample, and subsequently it will be the combination of samples aligned thus far, with the samples as the columns and the matching peaks as the rows.
2. for every row i in L, a two-dimensional window (where the window size is selected by the user), called Alignment window (AW), defining the ranges of m/z and RT and centered around the average of m/z and RT of all the individual peaks in the row; and
3. a score function to compute the similarity of peaks between L and the new sample Sj inside the alignment window. The score function computes the similarity based on the similarities in m/z, retention time, and optionally on identification, and isotope patterns between the peaks to be matched. The score is calculated as follows:


The algorithm works as follows. It iterates through the rows of L, and for each row, it looks for peaks within the alignment window in the sample Sj that is to be aligned with L. A score is calculated for each possible match and the pair with the best score is aligned.
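
The score formula is not reproduced in this text, so the sketch below assumes a simple weighted linear form in which a perfectly matching m/z or RT value receives the complete weight and the contribution drops to zero at the edge of the alignment window. The function names, the tuple layout and the exact formula are assumptions of this example, not a statement of what MZmine computes.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, retention time)


def match_score(mz_diff: float, rt_diff: float, mz_tol: float, rt_tol: float,
                mz_weight: float, rt_weight: float) -> Optional[float]:
    """Assumed score: full weight for a perfect match, None outside the window."""
    if mz_diff > mz_tol or rt_diff > rt_tol:
        return None
    return ((1.0 - mz_diff / mz_tol) * mz_weight
            + (1.0 - rt_diff / rt_tol) * rt_weight)


def best_match(row: Peak, sample: List[Peak], mz_tol: float, rt_tol: float,
               mz_weight: float, rt_weight: float) -> Optional[Peak]:
    """Pick the peak of the new sample with the highest score for a master-list row."""
    best, best_score = None, None
    for peak in sample:
        score = match_score(abs(peak[0] - row[0]), abs(peak[1] - row[1]),
                            mz_tol, rt_tol, mz_weight, rt_weight)
        if score is not None and (best_score is None or score > best_score):
            best, best_score = peak, score
    return best


sample = [(300.004, 5.30), (300.001, 5.05), (310.0, 5.0)]
print(best_match((300.0, 5.0), sample, 0.01, 0.5, 10.0, 10.0))
```

In this form the two weights control how strongly m/z agreement and RT agreement influence the choice between several candidates inside the same alignment window.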

Method parameters

Peak list name
    Name of the new aligned peak list
m/z tolerance
    This value sets the range, in terms of m/z, for possible peaks to be aligned. Maximum allowed m/z difference
Weight for m/z
    This is the weight assigned to the m/z difference when the match score between peak rows is calculated. In case of perfectly matching m/z values the score receives the complete weight.
Retention time tolerance type
    Maximum RT difference can be defined using either an absolute or a relative value
Absolute RT tolerance
    Maximum allowed absolute RT difference
Relative RT tolerance
    Maximum allowed relative RT difference
Weight for RT
    This is the weight assigned to the RT difference when the match score between peak rows is calculated. In case of perfectly matching RT values the score receives the complete weight.
Require same charge state
    If checked, only rows having the same charge (taken from best MS/MS spectra) can be aligned
Require same ID
    If checked, only rows having the same compound identities (or no identities) can be aligned.
Compare isotope pattern
    If both peaks represent an isotope pattern, add isotope pattern score to match score.
Isotope pattern score threshold level
    If the score between isotope patterns is lower, discard this match.

New aligned peak list showing peaks from 3 different samples:

time: The "deviation" model for the retention time is created by taking some corresponding points from the peak lists of two samples using the RANSAC algorithm (http://en.wikipedia.org/wiki/RANSAC) and using a non-linear regression method to fit the model.

This picture shows a preview of the model where the red dots represent the aligned peaks taken using the RANSAC algorithm, and the blue line represents the fitted model using a non-linear regression.

Using this model, the algorithm can predict the shift in the retention time along the whole peak list and use the match score function, also used in the Join Align algorithm, to match the peaks. This score is calculated based on the mass and retention time of each peak and the tolerance ranges specified in the parameter setup dialog.

Method parameters

Peak list name
    This is the suffix to identify the new aligned peak list in the Peak list frame of the desktop.
m/z tolerance
    This value sets the range, in terms of m/z, to verify for possible peak rows to be aligned. Maximum allowed m/z difference.
Retention time tolerance after correction
    This value sets the range, in terms of retention time, to verify for possible peak rows to be aligned. Maximum allowed retention time difference.
Retention time tolerance
    This value sets the range, in terms of retention time, to create the model using RANSAC and non-linear regression algorithm. Maximum allowed retention time difference.
RANSAC Iterations
    Maximum number of iterations allowed in the algorithm to find the right model consistent in all the pairs of aligned peaks. When its value is 0, the number of iterations (k) will be estimated automatically.
Minimum Number of Points
    % of points required to consider the model valid (d).
Threshold value
    Threshold value (seconds) for determining when a data point fits a model (t).
Linear model
    Sometimes the shift in the retention time between the peaks in the samples is not constant, so that the model shape is non-linear in some specific cases. This option should be selected only if the model has to be linear.

Parameter optimization

The first three parameters (m/z tolerance, RT tolerance after correction and RT tolerance) define two two-dimensional windows with the same "altitude" (m/z tolerance) and different "longitude" (RT tolerances). The first window (m/z tolerance, RT tolerance after correction) sets the space where the matching peak should be present, and the second window (m/z tolerance, RT tolerance) sets the total space where the RANSAC algorithm will be applied. So, "RT tolerance" should be as big as the maximum deviation in the retention time across the whole chromatogram, and "RT tolerance after correction" can be more flexible and depends on the complexity of the data. If the data contains only a few peaks and the separation is good, the window can be bigger than the "RT tolerance" window and it will improve the recall without including mistakes. This parameter should not affect the final results very much.

RANSAC is a non-deterministic algorithm, and the probability of finding a good result increases with the number of iterations. If the user sets "0 iterations" in the parameter "RANSAC iterations", the algorithm will automatically set the optimum number of iterations depending on the number of data points. When there are a lot of data points it is better to limit this parameter, even though the result could be non-optimal. The preview module can help to set this parameter.

The parameter "Minimum number of points" should be an estimate of the proportion of the data points inside the model. It is important not to get models composed of only a few data points, which do not correspond to the real model. Models which contain a smaller proportion of data points will not be taken into account by the RANSAC algorithm.

Threshold value represents the width of the model and it depends on the nature of the data. If this parameter is too big it can lead to deviation of the model. The preview function can also help to set the optimal value in this case.

The last parameter depends on whether the deviation in the retention time in the data can be considered linear or not. If the deviation in the retention time is linear, a simple linear regression will be used to fit the model.
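
To make the roles of the iteration count (k), the threshold (t) and the minimum proportion of points (d) concrete, here is a generic RANSAC sketch for the linear case. It is a textbook-style illustration, not the MZmine implementation: the function names, the two-point candidate model and the final least-squares refit on the inliers are assumptions of this example. Each point pairs the retention time of a peak in one sample (x) with its retention time in the other sample (y).

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (RT in sample A, RT in sample B)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points: List[Point], iterations: int, threshold: float,
                min_fraction: float, rng: random.Random) -> Optional[Tuple[float, float]]:
    """Return (slope, intercept) of the best model, or None if no model is valid."""
    best_inliers: List[Point] = []
    for _ in range(iterations):                      # k iterations
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # random candidate model
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]  # t
        enough = len(inliers) >= min_fraction * len(points)           # d
        if enough and len(inliers) > len(best_inliers):
            best_inliers = inliers
    if not best_inliers:
        return None
    return fit_line(best_inliers)


# Eight peaks shifted by a constant 2 units plus two wrongly paired peaks.
points = [(float(x), float(x) + 2.0) for x in range(8)] + [(3.0, 20.0), (6.0, -15.0)]
model = ransac_line(points, 200, 0.5, 0.6, random.Random(0))
print(model)
```

The two outliers never gather enough supporting points to form a valid model, so the fitted line is determined only by the consistently shifted peaks.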

7. Gap filling

7.1. Peak finder

Following alignment, the resulting peak list may contain missing peaks as a result of deficient peak detection or a mistake in the alignment of different peak lists. The fact that a peak is missing after the alignment does not imply that the peak does not exist. In most cases it is present but was undetected by the previous algorithms.

This algorithm fills the gaps in the peak list when it is possible according to the parameters defined by the user. The most crucial parameters are "m/z tolerance" and "RT tolerance", which define the window where the algorithm should find the new peak. It is centered on the m/z average and retention time average of the source peak list. Once the best candidate is found inside the window, its intensity and its shape in the RT direction are also checked.

The algorithm can also apply a prior correction of the retention time if needed. This changes the position of the defined window depending on the prediction of an RT model created using all the already aligned peaks in each pair.

RT correction

When RT correction is applied, the algorithm is divided into two main steps. In the first step, one random sample is taken from the multiple peak list and is used as a master list. All the gaps of this master list are filled using all the other samples. For each pair of samples the algorithm creates a model of the retention time. In the second step the master list is used to fill the gaps of the rest of the samples, also creating a retention time model for each pair (as shown in the figure below).
Method parameters
Name suffix
Suffix to be added to the peak list name.
Intensity tolerance
This value sets the maximum allowed deviation from the expected shape of a peak in the chromatographic direction.
M/Z tolerance
This value sets the range, in terms of m/z, to search for a possible peak in the raw data. Maximum allowed m/z difference.
Retention time tolerance type
Its value can be relative or absolute. The next parameters should be filled in depending on the value chosen for this parameter.
Absolute RT tolerance
This value sets the range, in terms of retention time, to search for a possible peak in the raw data. Maximum allowed absolute RT difference.
Relative RT tolerance
This value sets the range, in terms of retention time, to search for a possible peak in the raw data. Maximum allowed relative RT difference.
RT Correction
When checked, retention time correction is applied to avoid the problems caused by the deviation of the retention time between the samples.
New peak list showing the filled peaks with a yellow mark:
Method parameters
Name suffix
Suffix to be added to the peak list name.
8. Normalization
8.1. Linear normalizer
The linear normalizer divides the height (or area) of each peak in the peak list by a normalization factor, determined according to the Normalization type parameter. Each column (raw data file) of the peak list is normalized separately. In other words, the normalization factor is determined independently for each raw data file.
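The per-column division can be sketched as below. The function names and the use of the column average as the normalization factor are assumptions made for illustration; the factor actually used depends on the Normalization type chosen in the dialog.

```python
def average_intensity(values):
    """Hypothetical normalization factor: the mean of the column."""
    return sum(values) / len(values)


def linear_normalize(peak_list, factor_fn):
    """Divide every value of each column by that column's own factor.

    peak_list -- dict mapping a raw data file name to its list of peak
                 heights (or areas)
    factor_fn -- function that computes the normalization factor of a column
    """
    normalized = {}
    for raw_file, values in peak_list.items():
        factor = factor_fn(values)  # determined independently per raw data file
        normalized[raw_file] = [v / factor for v in values]
    return normalized


# Made-up peak heights for two raw data files.
heights = {"sample_a": [1.0, 3.0], "sample_b": [10.0, 30.0]}
result = linear_normalize(heights, average_intensity)
```

Because each column has its own factor, two files that differ only by an overall intensity scale end up with identical normalized values.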
Method parameters
Suffix
This string is added to the end of the name of each processed peak list
Normalization type
Selection of the normalization factor
Peak measurement type
Selection of either peak height or peak area, which will be used to calculate the normalization factors
Remove original peak list
If selected, the original peak list is automatically removed
Method parameters
Suffix
This string is added to the end of the name of each processed peak list
m/z tolerance
Maximum allowed m/z difference
Retention time tolerance
Maximum allowed retention time difference
Minimum standard intensity
Minimum height of a peak to be selected as normalization standard
Remove original peak list
If selected, the original chromatogram is automatically removed
The following screenshot shows two peak lists (top) normalized by the Retention time normalizer (bottom) with the following parameters: m/z tolerance = 0.1, RT tolerance = 1:00, Minimum standard intensity = 1E6. The peaks selected as standards are shown in red. The retention times of the standards were averaged, and the retention times of the other peaks were adjusted accordingly (shown in blue).
The peak list must be aligned prior to normalization. The user can select one or more internal standard peaks, which must be present in all raw data files. The peak height (or area) of each peak is then normalized by either the nearest standard or a weighted contribution of all standards.
Method parameters
Suffix
This string is added to the end of the name of each processed peak list
Normalization type
Normalize intensities using either only one (nearest) standard or a weighted contribution of all selected standards, weighted by distance. The distance of the standard peak to the peak being normalized is calculated as distance = MZvsRTBalance * (MZdifference) + (RTdifference).
Peak measurement type
Selection of either peak height or peak area, which will be used to calculate the normalization factors
m/z vs RT balance
Used in distance measuring as the coefficient of the m/z difference.
Remove original peak list
If selected, the original peak list is automatically removed
Standard compounds
List of peaks for choosing the normalization standards
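The distance formula given under "Normalization type" can be written out as a short sketch. The function names and the `(mz, rt)` tuple layout are hypothetical, and taking the absolute value of both differences is an assumption the text does not state explicitly.

```python
def standard_distance(peak, standard, mz_vs_rt_balance):
    """distance = MZvsRTBalance * (MZdifference) + (RTdifference).

    peak and standard are (mz, rt) tuples; absolute differences are assumed.
    """
    mz_difference = abs(peak[0] - standard[0])
    rt_difference = abs(peak[1] - standard[1])
    return mz_vs_rt_balance * mz_difference + rt_difference


def nearest_standard(peak, standards, mz_vs_rt_balance):
    """Pick the standard with the smallest distance to the peak."""
    return min(standards, key=lambda s: standard_distance(peak, s, mz_vs_rt_balance))


# Made-up standards and peak: (m/z, retention time)
standards = [(100.0, 5.0), (300.0, 6.0)]
peak = (290.0, 5.1)
by_mz_and_rt = nearest_standard(peak, standards, mz_vs_rt_balance=1.0)
by_rt_only = nearest_standard(peak, standards, mz_vs_rt_balance=0.0)
```

With a balance of 0 only the retention time difference counts, so a different standard can be chosen than with a balance of 1.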
9. Identification
9.1. Adduct Search
Definition of an adduct ion
An ion formed by interaction of two species, usually an ion and a molecule, and often within the ion source, to form an ion containing all the constituent atoms of one species as well as an additional atom or atoms.
This method identifies common adducts (selected by the user) in a single peak list. The adducts are identified by two conditions: 1) the retention time of the original ion and the adduct ion should be the same, and 2) the mass difference between the original ion and the adduct must be equal to the mass of one of the adducts selected by the user. MZmine has a built-in list of common adducts and their masses. In addition to this list, the user can specify a custom adduct mass.
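The two conditions can be sketched as a pairwise search. This is a simplified illustration rather than the module's code: the function name, the `(mz, rt, height)` tuple layout, the height check and the example mass shift are assumptions, and "same retention time" is interpreted as "within the RT tolerance".

```python
def find_adducts(peaks, adduct_masses, rt_tol, mz_tol, max_rel_height=1.0):
    """Return (main_index, adduct_index, adduct_name) for every matching pair.

    peaks         -- list of (mz, rt, height) tuples from one peak list
    adduct_masses -- dict mapping an adduct name to its m/z shift
    """
    hits = []
    for i, (mz_main, rt_main, h_main) in enumerate(peaks):
        for j, (mz_add, rt_add, h_add) in enumerate(peaks):
            if i == j:
                continue
            # Condition 1: both ions elute at the same retention time.
            if abs(rt_main - rt_add) > rt_tol:
                continue
            # Optional check: the adduct peak may not be too high.
            if h_add > max_rel_height * h_main:
                continue
            # Condition 2: the m/z difference equals a selected adduct mass.
            for name, shift in adduct_masses.items():
                if abs((mz_add - mz_main) - shift) <= mz_tol:
                    hits.append((i, j, name))
    return hits


# Illustrative shift only; real values come from the built-in adduct list.
example_adducts = {"example adduct": 21.982}
example_peaks = [
    (100.000, 5.00, 1000.0),
    (121.982, 5.02, 200.0),  # same RT, shifted by the adduct mass
    (121.982, 9.00, 200.0),  # right m/z shift, wrong RT
]
adduct_hits = find_adducts(example_peaks, example_adducts, rt_tol=0.1, mz_tol=0.01)
```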
The following figure shows a spectrum with a phosphate adduct ion.
The following figure shows a peak list with two identified "Deuterium" adducts:
Method parameters
RT tolerance
Maximum allowed difference of retention time to set a relationship between peaks
Adducts
List of adducts; each one corresponds to a specific distance on the m/z axis between related peaks
Custom adduct value
Specific distance on the m/z axis between related peaks for the custom adduct
m/z tolerance
Tolerance value of the m/z difference between peaks
Max adduct peak height
Maximum height of the recognized adduct peak, relative to the main peak
Method parameters
Ionization method
Type of ion used to calculate the neutral mass
RT tolerance
Maximum allowed retention time difference to set the relationship between peaks
m/z tolerance
Tolerance value of the m/z difference between peaks
Max complex peak height
Maximum height of the recognized complex peak, relative to the highest of the component peaks
Method parameters
Database file
Name of the file that contains information for peak identification
Field separator
Character(s) used to separate fields in the database file
Field order
Order in which the items are read from the database file. The order may be changed by dragging the items with the mouse.
Ignore first line
Check to ignore the first line of the database file
m/z tolerance
Maximum allowed m/z difference to set an identification to a peak
RT tolerance
Maximum allowed retention time difference to set an identification to a peak
Database file
The database file has to be provided in CSV (Comma-Separated Values) format. Such files can be exported from spreadsheet software such as MS Excel, or edited manually using a text editor. The following example shows the structure of the database file:
ID,m/z,Retention time (min),Identity,Formula
1,175.121,24.5,Arginine,C6H14N4O2
2,133.063,11.9,Asparagine,C4H8N2O3
3,134.047,11.7,Aspartate,C4H7NO4
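How such a file is matched against a peak can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and dictionary keys are hypothetical, the field order is fixed to the order of the example above, and a match is taken to mean "within both the m/z tolerance and the RT tolerance".

```python
import csv
import io

EXAMPLE_DATABASE = """ID,m/z,Retention time (min),Identity,Formula
1,175.121,24.5,Arginine,C6H14N4O2
2,133.063,11.9,Asparagine,C4H8N2O3
3,134.047,11.7,Aspartate,C4H7NO4
"""


def load_database(text, field_separator=",", ignore_first_line=True):
    """Parse the CSV text into a list of dictionaries."""
    rows = list(csv.reader(io.StringIO(text), delimiter=field_separator))
    if ignore_first_line:
        rows = rows[1:]  # skip the header line
    return [
        {"id": r[0], "mz": float(r[1]), "rt": float(r[2]), "identity": r[3], "formula": r[4]}
        for r in rows
    ]


def identify(peak_mz, peak_rt, database, mz_tol, rt_tol):
    """Return the identities whose m/z and RT both lie within the tolerances."""
    return [
        entry["identity"]
        for entry in database
        if abs(entry["mz"] - peak_mz) <= mz_tol and abs(entry["rt"] - peak_rt) <= rt_tol
    ]


database = load_database(EXAMPLE_DATABASE)
single_match = identify(175.12, 24.4, database, mz_tol=0.01, rt_tol=0.5)
wide_match = identify(133.5, 11.8, database, mz_tol=0.6, rt_tol=0.5)
```

The second call shows why tight tolerances matter: with a wide m/z tolerance, two neighbouring database entries match the same peak.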
Setup dialog for the identification using the above example database file:
List of candidate formulas:
Method parameters
Neutral mass
The neutral mass is calculated from the peak m/z value, its charge and the type of ionization adduct.
m/z tolerance
Tolerance of the neutral mass for searching the formula.
Elements
Elements allowed in the formula and their minimum and maximum counts.
Element count heuristics
Selection of heuristic restrictions on element counts.
RDBE restrictions
Selection of restrictions on RDBE (rings and double bonds equivalent) values.
Isotope pattern filter
If selected, only results which fit the required isotope pattern similarity score will be returned. See Isotope pattern comparison for details about the algorithm.
MS/MS filter
Restrict the formulas to those that can be interpreted in the peak's MS/MS pattern
matching compounds are returned. The module can be invoked in two ways: either by selecting a peak list and starting the on-line database search module from the MZmine menu, or by selecting a single peak within a peak list and starting the database search for this individual peak. The first method attempts to identify all peaks in the peak list. The second method is recommended, because it allows the user to select the right ionization adduct.
Selection of the peak for identification:
Results of database query:
Displaying the structure of the database compound:
Method parameters
Database
On-line database to search (see below)
Peak m/z
Detected m/z value of the peak. This is set automatically according to the peak subjected to identification.
Charge
Charge of the peak being identified. This value is used to calculate the neutral mass.
Ionization method
Type of ionization that produced the peak subjected to identification. This is used to calculate the neutral mass.
Neutral mass
This value is automatically calculated from the parameters above. The neutral mass represents the final search term for querying the on-line database.
Number of results
Limit for the number of results to be retrieved from the on-line database.
Mass tolerance
Tolerance of the neutral mass for searching the database.
Isotope pattern filter
If selected, only results which fit the required isotope pattern similarity score will be returned.
Isotope pattern score threshold
The score required for the isotope pattern filter.
PubChem
PubChem database (http://pubchem.ncbi.nlm.nih.gov/) contains millions of chemical compound structures.
KEGG
KEGG database (http://www.genome.jp/kegg/) contains metabolites and other biomolecules present in natural metabolic pathways.
HMDB
The Human Metabolome Database (HMDB) (http://www.hmdb.ca/) contains over 7,000 known metabolites found in the human body.
METLIN
The METLIN database (http://metlin.scripps.edu/) contains over 20,000 metabolites.
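The way the neutral mass follows from the peak m/z, the charge and the ionization method can be sketched for the simplest case. This is an assumption-laden illustration, not necessarily the formula MZmine applies: it assumes one added ion per unit charge (for example [M+H]+ or [M+2H]2+), and the function name and default value are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass added per charge for protonated ions


def neutral_mass(mz, charge=1, added_mass=PROTON_MASS):
    """Neutral mass of an ion carrying one added ion per unit charge.

    For [M + z*X]^z+ the measured value is mz = (M + z*added_mass) / z,
    so M = (mz - added_mass) * z.
    """
    z = abs(charge)
    return (mz - added_mass) * z


singly_charged = neutral_mass(175.119)
doubly_charged = neutral_mass(500.0, charge=2)
```

The resulting value is what would be used as the search term, with the mass tolerance defining the accepted range around it.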
Method parameters
RT tolerance
Maximum allowed retention time difference to set a relationship between peaks
m/z tolerance of MS2 data
Tolerance value of the m/z difference between peaks in MS/MS scans
Max fragment peak height
Maximum height of the recognized fragment peak, relative to the main peak
Min MS2 peak height
Minimum absolute intensity of the MS2 fragment peak. Signals below this level will be ignored.
are
compounds
with
well-defined
structure
(see
Method parameters
Type of lipids
Selection of glycerophospholipids to consider
Minimum fatty acid length
Minimum length of the fatty acid chain
Maximum fatty acid length
Maximum length of the fatty acid chain
Maximum number of double bonds
Maximum number of double bonds in one fatty acid chain
m/z tolerance
Maximum allowed m/z difference
Ionization method
Type of ion used to calculate the ionized mass
For aligned peak lists, the operation for forming spectra is extended: for two sets of aligned peaks (P1, P2, P3) and (Q1, Q2, Q3), the RTs of the aligned peaks are considered to be within the tolerance if any corresponding pair of peaks (P1, Q1), (P2, Q2) or (P3, Q3) has RTs within the specified tolerance.
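The "any corresponding pair" rule can be expressed compactly. The sketch below is illustrative only: the function name is hypothetical, each aligned row is represented as a list of retention times with one entry per raw data file, and `None` is assumed to mark a sample where the peak is missing.

```python
def rows_within_rt_tolerance(row_p, row_q, rt_tol):
    """True if any corresponding pair (P_i, Q_i) has RTs within rt_tol.

    row_p, row_q -- retention times of two aligned rows, one entry per
                    raw data file (None where no peak was detected)
    """
    return any(
        p is not None and q is not None and abs(p - q) <= rt_tol
        for p, q in zip(row_p, row_q)
    )


# Made-up retention times for three raw data files.
p_row = [5.0, 5.4, None]
q_row = [6.0, 5.5, 5.0]
close_enough = rows_within_rt_tolerance(p_row, q_row, rt_tol=0.2)
too_far = rows_within_rt_tolerance([5.0, 5.4], [6.0, 6.0], rt_tol=0.2)
```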
Method Parameters
Ionization method
Type of ionization that produced the peak subjected to identification. This is used to calculate the neutral mass.
Spectrum RT tolerance
When forming a search spectrum for a given peak, include all other detected peaks whose RT is within the specified tolerance of the given peak
Min. match factor
NIST MS Search calculates a match factor (0 .. 999) between the submitted spectrum and each hit. Ignore hits with a match factor below the given threshold.
Min. reverse match factor
NIST MS Search calculates a reverse match factor (0 .. 999) between the submitted spectrum and each hit. Ignore hits with a reverse match factor below the given threshold.
Requirements
This module requires the NIST MS Search software to be installed; it is currently only available for Microsoft Windows. When MZmine 2 is started, the nist.ms.search.path system property must be set to the full path name of the directory in which the NISTMS$.EXE program is found.
Method parameters
Name suffix
This is the suffix to identify the new aligned peak list in the Peak list frame of the desktop.
M/Z tolerance
Maximum m/z difference between duplicate peaks.
RT tolerance
Maximum retention time difference between duplicate peaks.
Require same identification
If the checkbox is selected, duplicate peaks must have the same identification.
Remove source peak list after filtering
If the checkbox is selected, the source peak list will be removed and only the filtered version remains.
Method parameters
Name suffix
This is the suffix to identify the new aligned peak list in the Peak list frame of the desktop.
Minimum peaks in a row
Minimum number of peaks in a row required to keep it.
Minimum peaks in an isotope pattern
Minimum number of peaks in an isotope pattern required to not remove the row.
Minimum m/z
Minimum average m/z value in a row required to not remove it.
Maximum m/z
Maximum average m/z value in a row required to not remove it.
Minimum retention time
Minimum average retention time value in a row required to not remove it.
Maximum retention time
Maximum average retention time value in a row required to not remove it.
Only identified?
If the checkbox is selected, only identified rows will be filtered.
Remove source peak list after filtering
If the checkbox is selected, the source peak list will be removed and only the filtered version remains.
The minimization can be performed either by gradient descent or by other means. (http://en.wikipedia.org/wiki/Sammon%27s_projection)
Method parameters
Data files
Raw data files corresponding to the samples selected to be in the projection plot.
Coloring style
The dots corresponding to every sample can be colored depending on the sample's parameter state or on the file.
Peak measuring approach
It can take two values: height or area. The projections will be calculated using one of these two values.
Peaks
Peaks that will be taken into account to create the projection plot.
Component on X-axis
This parameter is only enabled for the PCA algorithm and allows the user to choose the principal component on the X axis.
Component on Y-axis
This parameter is only enabled for the PCA algorithm and allows the user to choose the principal component on the Y axis.
Method parameters
Peak measuring approach
It can take two values: height or area. The coefficient of variation and the logratio analysis will be calculated using one of these two values.
Logratio plot:
11.3. Clustering
The goal of clustering is to group a set of observations into subsets (clusters), finding an intrinsic structure in them so that the members of a single cluster share more similarities than the members of distinct clusters. Clustering is one of the most important methods of unsupervised learning, and a common technique for statistical data analysis used in many fields. There is no absolute "best" criterion that would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering suits their needs.
The result of the clustering can be visualized using a PCA plot or Sammon's projection, and the data can be seen in a table where the first column contains the names of the samples or variables and the second column the cluster id (a number) for each sample or variable. Visualization of the hierarchical clustering result is based on the TreeViewJ software (Peterson, M.W. and M.E. Colosimo, TreeViewJ: an application for viewing and analysing phylogenetic trees. Source Code Biol Med, 2007. 2(1): p. 7.)
clusters is determined using cross-validation. Each variable has a probability distribution indicating the probability of the variable belonging to each of the clusters.
Simple K-Means
The goal of K-means clustering is to determine k clusters in such a way that intra-cluster distances are small and inter-cluster distances are large; in other words, every point is assigned to the cluster whose centre is nearest. K-means clustering works by randomly choosing k centroids in the first step, then assigning the data points to the clusters so that every point belongs to the cluster with the nearest centroid, and then redetermining the cluster centroids by taking the mean of the data points in each cluster. The process is repeated until the cluster means converge.
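The loop described above can be sketched in plain Python. This is a didactic sketch, not the clustering implementation used by MZmine: the function name, the squared Euclidean distance, the iteration cap and the handling of empty clusters are assumptions.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means on a list of equally long numeric tuples."""
    rng = random.Random(seed)
    # Step 1: choose k of the data points at random as initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(n_iter):
        # Step 2: assign every point to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Stop when the cluster means no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Two well separated groups of made-up points.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(samples, k=2)
```

Because the initial centroids are random, different seeds can give different results on less clearly separated data, which is one reason the number of groups and the algorithm are left to the user.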
Hierarchical clustering
Hierarchical clustering builds a hierarchy of clusters. It is achieved either by agglomerative clustering, in which every point initially belongs to a distinct cluster and the clusters are iteratively combined with their nearest clusters, or by divisive clustering, which starts from one single cluster containing all data points and divides clusters until every single point belongs to a separate cluster. The distances between points may be determined using e.g. Euclidean, Minkowski or Manhattan distance, and the distances between clusters may be determined by single linkage (minimum distance between all pairs of points between the clusters), complete linkage (maximum distance between all pairs of points between the clusters), and so on. The number of clusters is determined by setting a length at which to "cut" the hierarchical clustering tree, but hierarchical clustering is more commonly used as a tool for visualizing the patterns of neighbourhood.
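The two linkage definitions given above translate directly into code. The sketch below uses hypothetical function names and the Euclidean distance as the default point distance; it illustrates the definitions only and is not the implementation behind the module.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points (requires Python 3.8+)."""
    return math.dist(p, q)


def manhattan(p, q):
    """Manhattan distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def single_linkage(cluster_a, cluster_b, distance=euclidean):
    """Minimum distance over all pairs of points between the clusters."""
    return min(distance(p, q) for p in cluster_a for q in cluster_b)


def complete_linkage(cluster_a, cluster_b, distance=euclidean):
    """Maximum distance over all pairs of points between the clusters."""
    return max(distance(p, q) for p in cluster_a for q in cluster_b)


# Made-up clusters on a line.
left = [(0, 0), (1, 0)]
right = [(3, 0), (5, 0)]
d_single = single_linkage(left, right)
d_complete = complete_linkage(left, right)
```

Single linkage tends to chain clusters together through their closest members, whereas complete linkage favours compact clusters, which is why the link type is exposed as a parameter.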
Method parameters
Data files
Raw data files corresponding to the samples selected to be in the projection plot.
Colouring style
The dots corresponding to every sample can be colored depending on the sample's parameter state or on the file.
Peak measuring approach
It can take two values: height or area. The projections will be calculated using one of these two values.
Peaks
Peaks that will be taken into account to create the projection plot.
Visualization
The result of non-hierarchical clustering algorithms can be visualized using PCA or Sammon's projection
Type of data
It can take two values: samples or variables. The clustering will be applied to one of these types of data.
Algorithm
Algorithm that will be used to cluster the data.
Link type
This parameter is only enabled when hierarchical clustering has been chosen. The distance between clusters is determined by the chosen linkage.
Distance function
This parameter is only enabled when hierarchical clustering has been chosen. The distance between points is determined by the chosen distance function.
Number of groups
The number of clusters has to be defined by the user in advance for some clustering algorithms. This parameter is available only when the K-means or Farthest First algorithm is chosen.
This module uses the R function "heatmap.2" to draw the heat map plot. The function description can be found here:
http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/gplots/html/heatmap.2.html
Method parameters
Output name
Path of the heat map plot output file.
Output file type
The output file can be "pdf", "svg", "png" or "fig". The height and width of the plot depend on the type of file. In the case of the "png" type, the height and width have to be more than 500.
Sample parameter
Description of the samples defined by the user in the "Project -> Set sample parameters" section. There have to be at least two groups. One of the groups will be used as a reference group and the rest of the groups will be compared to it.
Group of reference
Name of the group used as a reference by the heat map plot. The rest of the groups defined in the sample parameters will be compared to it
Only identified rows
Only identified rows will be plotted.
Use peak area
If selected, the peak area will be used to create the heat map plot. If not selected, the peak height will be used.
Scaling
The data will be scaled using the standard deviation of the values of all the rows in each sample.
Log
Log transformation of the data.
Show control samples
The samples in the reference group will be shown if this option is selected. It is only valid when P-value legend is not selected and the individual samples are plotted.
P-value legend
If this option is selected, the mean of each group is compared with the mean of the reference group using a t-test. The plot shows the groups instead of the individual samples, together with the significance (depending on the p-value) of their mean differences. If the p-value is less than 0.001, "***" is printed in the corresponding cell. If the p-value is between 0.001 and 0.01, "**" is printed, and if it is between 0.01 and 0.05, "*" is printed in the corresponding cell.
Size p-value legend
Size of the "*" in the heat map plot.
Height
Height of the heat map plot.
Width
Width of the heat map plot.
Column margin
Column margin of the heat map plot.
Row margin
Row margin of the heat map plot.
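The mapping from p-value to the printed asterisks under "P-value legend" can be written as a small function. The function name is hypothetical, and the treatment of values that fall exactly on a boundary (0.001, 0.01, 0.05) is an assumption, since the text does not specify it.

```python
def significance_stars(p_value):
    """Return the marker printed in a heat map cell for a given p-value."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:   # between 0.001 and 0.01
        return "**"
    if p_value < 0.05:   # between 0.01 and 0.05
        return "*"
    return ""            # not significant, nothing is printed


markers = [significance_stars(p) for p in (0.0005, 0.005, 0.03, 0.2)]
```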
Method parameters
Filename
Name of file where the exported data is saved
Field separator
Columns in the new CSV file will be separated by this character (typically a comma)
Export elements
Please select which columns from the peak list will be exported into the CSV file
algorithm prior to saving. Both compressed and uncompressed XML files can be imported back into MZmine, although the connection to the raw data files will be lost.
Method parameters
Filename
Name of file where the exported data is saved
Compressed file
The XML file is compressed by ZIP algorithm before saving
Following the definition of raw data files, each peak list row is saved in a <row> element. It contains the information about the peak identity and all individual peaks in the row. For each peak, basic information about its data points (scan numbers, m/z and intensity values) is saved as arrays of double type encoded by Base64 encoding (the <mzpeak> element). This information is necessary for showing the peak shape in the peak list table visualizer when the XML peak list is imported back into MZmine.
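Storing an array of doubles as Base64 text can be illustrated with a short round trip. The sketch makes assumptions the text does not confirm: values are taken to be 8-byte IEEE doubles in big-endian byte order, and the function names are hypothetical. The actual layout should be checked against the MZmine source code before reading real files.

```python
import base64
import struct


def encode_doubles(values, byte_order=">"):
    """Pack a list of floats as 8-byte doubles and encode them as Base64 text."""
    raw = struct.pack(f"{byte_order}{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text, byte_order=">"):
    """Decode Base64 text back into a list of floats."""
    raw = base64.b64decode(text)
    count = len(raw) // 8  # 8 bytes per double
    return list(struct.unpack(f"{byte_order}{count}d", raw[: count * 8]))


# Made-up intensity values of one peak.
intensities = [1200.5, 3400.0, 980.25]
encoded = encode_doubles(intensities)
decoded = decode_doubles(encoded)
```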
MZMINE_VERSION contains the version number of MZmine that saved the project file
Raw data file #number name.scans for each raw data file in the project
Raw data file #number name.xml for each raw data file in the project
Peak list #number name.xml for each peak list in the project
Please refer to the MZmine source code for more detailed information about the project file format.
This format is similar to the format of a peak list exported to an XML file (see the Peak list export to XML section). Please refer to the MZmine source code for more detailed information about the project file format.
14. Visualization
Parameters
Raw data files
List of raw data files to display in the TIC visualizer, with the possibility to choose one, many or all of them. The TIC visualizer displays all the chromatograms in the same plot.
MS level
This refers to the scan level (MS1, MS2, ..., MSn) to be used to display the chromatogram.
Plot type
Type of Y value calculation. TIC shows the sum of intensities, base peak shows the maximum intensity.
Retention time
Retention time (X axis) range.
m/z range
Range of m/z values. If this range does not include the whole scan m/z range, the resulting visualizer is of XIC type.
Selected peaks
List of chromatographic peaks to display in the TIC visualizer. This is available only if a peak list related to the selected raw data file exists in the current project.
Additionally, the "Set automatically" button in the parameters window allows the user to set all the ranges automatically. MZmine will use the maximum ranges allowed by the raw data.
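The difference between the two plot types, and the effect of restricting the m/z range, can be sketched as follows. The function name and the representation of a scan as `(retention_time, [(mz, intensity), ...])` are assumptions made for this illustration.

```python
def chromatogram(scans, plot_type="TIC", mz_range=None):
    """Compute one (retention time, Y value) point per scan.

    plot_type -- "TIC" sums the intensities, anything else takes the maximum
                 intensity (base peak)
    mz_range  -- optional (min_mz, max_mz); restricting it gives an XIC
    """
    points = []
    for rt, data_points in scans:
        intensities = [
            intensity
            for mz, intensity in data_points
            if mz_range is None or mz_range[0] <= mz <= mz_range[1]
        ]
        if plot_type == "TIC":
            y = sum(intensities)
        else:
            y = max(intensities, default=0.0)
        points.append((rt, y))
    return points


# Two made-up scans.
example_scans = [
    (1.0, [(100.0, 5.0), (200.0, 7.0)]),
    (2.0, [(100.0, 1.0)]),
]
tic = chromatogram(example_scans, "TIC")
base_peak = chromatogram(example_scans, "Base peak")
xic = chromatogram(example_scans, "TIC", mz_range=(90.0, 110.0))
```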
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left. On the right side of the plot there is a group of icons whose functionality is also available in a pop-up menu, which appears when you right-click on the plot area.
When the mouse passes over the plot, a tool tip appears showing the information for that data point. A left click displays crosshair lines indicating the selected position. The first button of the right panel can be used to visualize the spectrum.
To display the data points in the plot, use the second icon on the right panel.
To display the labels of the data points or to modify the range of the axes, use the third and fourth buttons of the right panel, respectively.
It is possible to display the data from other raw data files in the same TIC plot window by using the options in the pop-up menu (right click on the plot area). The submenu contains an option called "Set same range to all windows", which applies the same x and y range values to the other TIC plot windows in the current project.
Parameters
Scan number
This is the number that identifies the scan to be visualized.
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left. On the right side of the plot there is a group of icons (toolbar) whose functionality is also available in a pop-up menu, which appears when you right-click on the plot area.
The first icon corresponds to the visualization mode (centroid or continuous).
To display the data points in the plot, use the second icon on the right panel.
To display the labels of the data points, use the third button of the right panel.
It is possible to display the ions selected to form a chromatographic peak; this option is activated by default. The selected ions appear in red. If the user toggles the "Toggle displaying picked peaks" button, the color of the ions changes to the normal color (blue) and the tool tip information also changes.
If a search for isotopes was done, it is possible to visualize the predicted isotope pattern. This pattern is displayed in light green and can be toggled on/off.
The last button of the toolbar displays a pop-up menu which allows the user to set the range of both axes.
The bottom part of the Spectra plot contains information about the origin of the data (color + legend), the peak lists in which some of the peaks contain this spectrum, and the list of scans derived from an ion in this scan (fragmentation scans). There are two buttons with an arrow symbol, which allow the user to move to the next scan in time.
14.3. 2D visualizer
This tool displays a two-dimensional plot, where the X axis corresponds to retention time and the Y axis to the m/z value. The spots in the plot correspond to the intensity of the data in that region. This plot is commonly used in MZmine to display chromatographic peaks.
Parameters
MS level
MS level of plotted scans.
MS level
This refers to the level (MS1, MS2, ..., MSn) of the scans to be used.
Retention time
Retention time (X axis) range.
m/z range
Range of m/z values. If this range does not include the whole scan m/z range, the resulting visualizer is of XIC type.
Additionally, the "Set automatically" button in the parameters window allows the user to set all the ranges automatically. MZmine will use the maximum ranges allowed by the raw data.
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left.
The intensity of the plot can be changed with the "Switch palette" button. The intensity increases with every click until the chart changes to red. The intensity of the signal is displayed as a difference in color or gray scale.
To display the data points used to form a spot or an identified peak, toggle the button "Toggle displaying of datapoints in continuous mode".
To adjust the range of the plot, use the button "Setup range for axes". After clicking, a pop-up window appears with text fields where the user can adjust the ranges of both axes or set them to automatic range.
The button "Switch between continuous and centroid mode" changes the visualization of the data. The change is easier to notice if the color palette is selected. This option depends on the original raw data format.
The tooltip that appears when the mouse is over a spot can contain information about the identified peak in that region. The related peak information can be hidden by using the button "Toggle displaying of tool tips on the peaks".
At the bottom of the panel, the user can select the peak list to display in the plot (right combo box). This feature depends on the peak lists associated with the visualized raw data in the current project. The user can also filter the displayed peaks by a threshold value or by the number of most intense peaks.
14.4. 3D visualizer
This tool presents a three-dimensional plot where the X axis represents the retention time, the Y axis the m/z value and the Z axis the intensity of the signal. This plot collects all the information from the raw data in one graphical representation.
Parameters
MS level
This refers to the scan level (MS1, MS2, ..., MSn) to be used to display the chromatogram.
Retention time
Retention time (X axis) range.
Retention time resolution
Number of data points on the retention time axis.
m/z range
Range of m/z values.
m/z resolution
Number of data points on the m/z axis.
Additionally, the "Set automatically" button in the parameters window allows the user to set all the ranges automatically. MZmine will use the maximum ranges allowed by the raw data.
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in and out, use the third button of the mouse. By pressing the left button and dragging the mouse, the user can "rotate" the plot in any direction. If the right button is used instead, the user can "move" the plot to any position.
The toolbar contains two buttons, "Set properties" and "Toggle displaying peaks values". The first one opens a pop-up window where the user can define the color range depending on the intensity, as well as other characteristics of the plot. These properties depend on the JFreeChart library.
The button for displaying peak values turns the peak labels on or off. If a peak list is associated with the visualized raw data in the current project, the label of each peak can be the identity of the peak or its representative mass (m/z).
This plot also displays the region where the peak was identified. By pressing Shift + right click on the label of a peak, the user can see a red cube that represents the space covered by the selected peak in terms of retention time, m/z and intensity.
At the bottom of the 3D window there is a combo box where the user can select any peak list associated with the displayed raw data. If the user selects a peak list other than the current one, the plot automatically updates the displayed peak labels according to the identifications in the selected peak list.
Common columns
ID
This is the number that identifies the peak list row. A peak list row can contain one or many peaks that share the same mass range and retention time range but come from different raw data files.
m/z
This is the representative m/z value for this row (average of the m/z values of all its peaks). The m/z value of each peak depends on the peak identification method. Please refer to the "Peak detection" section of this help.
Ret.time
This is the representative retention time value for this row (average of the retention time values of all its peaks). The retention time value of each peak depends on the peak identification method used. Please refer to the "Peak detection" section of this help.
Identity
This is the name assigned from the peak's identity. This value depends on the method used to identify the peak. Please refer to the "Identification" section of this help.
Comment
This is a user input value. It can be edited by double-clicking.
Peak shape
This is a graphic representation of the peaks on two axes (X: retention time, Y: intensity), with a different color for each peak in the row. The size of this graphic can be edited.
A colored circle indicates the status of the peak. The colors have the following meaning: RED = not detected, ORANGE = manually detected by the user, YELLOW = detected by the "Gap Filler" tool, and GREEN = detected.
Peak
shape
This is a graphic representation of the peak on two axes (X: retention time, Y: intensity), in this case an XIC.
m/z
The m/z value of the peak. It depends on the peak identification method. Please refer to the "Peak detection -" section of this help.
Ret.time
The retention time value of the peak. It depends on the peak identification method used. Please refer to the "Peak detection -" section of this help.
Duration
The duration of the peak in terms of time.
Height
The intensity of the peak at its highest point.
Area
The
calculated
area
of
the
peak
using
the
intensity
and
the
duration
of
the
peak.
Charge
The
value
of
the
ion
charge.
This
value
is
obtained
from
the
raw
data
or
by
the
Isotope
grouper
module.
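The color coding of the peak status circle described above can be summarized as a small lookup. This is only an illustrative sketch: the enum `PeakStatus` and its member names are invented for this example and should not be read as the names used in the MZmine 2 source code.

```python
from enum import Enum


class PeakStatus(Enum):
    """Hypothetical peak states mapped to the circle colors listed above."""
    NOT_DETECTED = "red"
    MANUAL = "orange"       # manually detected by the user
    GAP_FILLED = "yellow"   # detected by the "Gap Filler" tool
    DETECTED = "green"


def status_color(status):
    """Return the circle color shown for a given peak status."""
    return status.value


print(status_color(PeakStatus.GAP_FILLED))
```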
Functionality
The rows of this table can be sorted by some of the columns by clicking the column header. Using the tool bar on the right side of the table, the user can choose which columns to display in the table and print the displayed information.
Each column has a tool tip that provides some extra information. The contents of the tool tip in the Identity column depend on the method applied to identify the peak. Please refer to the "Identification" section of this help. The tool tip appears automatically when the mouse pointer passes over the cell.
Submenu
A pop-up menu appears on right-click. Which options in this menu are active depends on the position (column) where the click was made and on the number of selected rows.
Show
Under this option there is another submenu from which the user can choose any visualization plot available for this peak. Some of the options depend on the characteristics of the selected row or peak. For more information about each kind of visualization, refer to the "Visualization -" section of this manual.
Please
note
that
when
opening
the
TIC
visualizer,
the
pre-selected
data
files
will
depend
on
the
position
where
the
mouse
button
was
clicked
when
opening
this
pop-up
menu.
If the user clicks on a common column, such as Average m/z, Identity, Comment or Combined peak shape, all raw data files will be chosen for the TIC plot. However, if the user clicks on a data file-specific column, only that particular data file will be chosen for the TIC plot.
Search
This option is described in the "Identification - Online database search" section of this help.
Plot
using
Intensity
Plot
module
An intensity plot will be created using the selected peaks. Please refer to the "Visualization - Intensity plot" section of this help.
Manually
define
peak
The user can manually define the m/z range and retention time range of a peak. MZmine will create a new peak or modify an existing one.
Delete
selected
rows
The
selected
rows
will
be
removed
from
the
peak
list.
There
is
no
way
to
recover
the
deleted
rows.
Add
new
row
The same functionality as Manually define peak, but applied to all the raw data files in the peak list. The result is a new row with peaks from all the raw data files, built using the ranges entered by the user.
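The rule for pre-selecting data files when the TIC visualizer is opened from the Show option can be sketched as follows. The function `preselected_files` and its arguments are hypothetical and serve only to illustrate the rule stated in the text.

```python
def preselected_files(clicked_file, all_files):
    """Return the raw data files pre-selected for the TIC plot.

    `clicked_file` is None when the click was on a common column
    (for example Average m/z, Identity or Comment); otherwise it is the
    raw data file that the clicked column belongs to.
    """
    if clicked_file is None:
        # Common column: every raw data file in the peak list is chosen.
        return list(all_files)
    # Data file-specific column: only that data file is chosen.
    return [clicked_file]


files = ["sample1.mzML", "sample2.mzML", "sample3.mzML"]
print(preselected_files(None, files))
print(preselected_files("sample2.mzML", files))
```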
For the Identity column, a specific submenu appears on left-click. This submenu has options to choose or edit the active identity among the several identities associated with the same peak.
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left.
The top of the window shows the number of displayed peaks (peaks for which information was found in the two selected peak lists).
At the bottom of the window, there are combo boxes where the user can choose two peak lists to compare, one per axis.
The tool tip shows the chromatographic information of the peak from all the raw data files in the peak list. The user can use the right-click menu and select the option "Show chromatogram" to display a TIC plot of the selected peak. Please refer to the "Visualization - TIC plot" section of this help.
A peak can be searched for using one of three criteria (name, retention time or m/z value). A check box switches on or off the display of labels for the peaks that match the search parameter.
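As a rough illustration of the search described above, the sketch below tests a peak against one of the three criteria. The matching rules (case-insensitive substring match for the name, closed ranges for retention time and m/z) and all names in the code are assumptions for this example; the manual does not specify how MZmine 2 performs the comparison.

```python
def matches(peak, criterion, value):
    """Return True if `peak` matches the search parameter.

    `peak` is a dict with the hypothetical keys "name", "rt" and "mz".
    `criterion` is "name", "rt" or "mz". For "name", `value` is a text
    fragment; for "rt" and "mz", `value` is a (low, high) range.
    """
    if criterion == "name":
        return value.lower() in (peak["name"] or "").lower()
    low, high = value
    return low <= peak[criterion] <= high


peaks = [
    {"name": "Glucose", "rt": 5.2, "mz": 203.05},
    {"name": None, "rt": 17.1, "mz": 520.34},
]
# Labels would be shown only for the peaks that match.
labelled = [p for p in peaks if matches(p, "mz", (500.0, 600.0))]
print(labelled)
```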
Parameters
Raw
data
files
Column
of
peaks
to
be
plotted.
Plotted
data
type
Peak data to be plotted (m/z value, height, area or retention time).
Plotted
data
range
Range of data to be plotted. This range is automatically filled in with the maximum values from the raw data.
Number
of
bins
The
plot
is
divided
into
this
number
of
bins.
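How the "Plotted data range" and "Number of bins" parameters interact can be shown with a minimal binning sketch. It assumes equal-width bins over the plotted range, which the manual does not state explicitly, and the function name `histogram` is chosen for this example only.

```python
def histogram(values, lower, upper, n_bins):
    """Count how many values fall into each of `n_bins` equal-width bins.

    Values outside [lower, upper] are ignored; a value equal to `upper`
    is counted in the last bin.
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lower <= v <= upper:
            index = min(int((v - lower) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Retention times (in minutes) of six peaks, plotted over 0 to 60 min.
retention_times = [2.0, 16.5, 17.0, 17.2, 18.1, 45.0]
counts = histogram(retention_times, 0.0, 60.0, 6)
print(counts)
```

With six bins over a 60 min range, each bin is 10 min wide, so the four peaks between 16.5 and 18.1 min fall into the second bin.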
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left.
The next figure shows a histogram of the retention time values. The data come from a raw data file with a duration of 60 min. Most of the peaks appear around 17 min.
The next figure shows a histogram of the m/z values. The data come from a raw data file with a range from 350 to 1400 m/z.
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left.
A tool bar on the right side of the window has a button that switches on or off the display of the lines connecting the points.
Optionally, an error bar can be displayed indicating the standard deviation of the peak value across several samples. To display this bar, the raw data files need to be grouped. Grouping is done by setting user-defined sample parameters.
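The grouping and error bar calculation described above can be sketched as follows. The example assumes that each raw data file has a user-defined sample parameter value (here a group name) and that the sample standard deviation is used; the manual does not say which variant of the standard deviation MZmine 2 computes. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_stats(values, groups):
    """Return {group: (mean, standard deviation)} for one peak list row.

    `values` maps a raw data file name to the peak value (e.g. height);
    `groups` maps a raw data file name to its sample parameter value.
    """
    by_group = defaultdict(list)
    for file_name, value in values.items():
        by_group[groups[file_name]].append(value)
    # A single-file group has no spread, so its error bar length is 0.
    return {
        g: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for g, v in by_group.items()
    }


heights = {"a1.mzML": 100.0, "a2.mzML": 110.0, "b1.mzML": 200.0, "b2.mzML": 220.0}
groups = {"a1.mzML": "control", "a2.mzML": "control",
          "b1.mzML": "treated", "b2.mzML": "treated"}
stats = group_stats(heights, groups)
print(stats)
```

Each group then contributes one point (the mean) and one error bar (the standard deviation) to the plot.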
Parameters
X
axis
X
axis
type.
The user can choose between "Retention time" and "Precursor mass".
Retention
time
Retention
time
(X
axis)
range.
Precursor
m/z
Range
of
precursor
m/z
values.
Fragments
Number
of
most
intense
fragments
to
check
for
neutral
loss.
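The role of the Fragments parameter can be illustrated with a short sketch that computes the neutral loss for the most intense fragments of one MS/MS scan. It assumes singly charged ions, so that the neutral loss is simply the precursor m/z minus the fragment m/z; the function name and data layout are invented for this example.

```python
def neutral_losses(precursor_mz, fragments, n_fragments):
    """Return the neutral losses for the `n_fragments` most intense fragments.

    `fragments` is a list of (mz, intensity) tuples from one MS/MS scan.
    Singly charged ions are assumed.
    """
    top = sorted(fragments, key=lambda f: f[1], reverse=True)[:n_fragments]
    return [precursor_mz - mz for mz, _ in top]


fragments = [(150.0, 30.0), (382.0, 900.0), (250.0, 500.0), (100.0, 10.0)]
losses = neutral_losses(400.0, fragments, 2)
print(losses)
```

Each computed loss would correspond to one spot in the plot, positioned against the retention time or the precursor mass depending on the chosen X axis type.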
Functionality
This plot uses the third-party library JFreeChart for its basic functionality. To zoom in, drag the mouse from left to right, selecting the area to zoom. To zoom out, drag the mouse from right to left.
The user can highlight spots within an m/z range. The range can apply either to the neutral loss or to the parent ion mass. This option is available in a submenu that appears on right-clicking the plot area.