
VOL. 28 NO. 1 (2016)

TONY DAVIES COLUMN

Spectroscopic data handling at petabyte scale
Antony N. Davies(a,b), Shane R. Ellis(c), Benjamin Balluff(c) and Ron M. Heeren(c)
(a) Strategic Research Group Measurement and Analytical Science, Akzo Nobel Chemicals B.V., Deventer, the Netherlands
(b) SERC, Sustainable Environment Research Centre, Faculty of Computing, Engineering and Science, University of South Wales, UK
(c) Maastricht MultiModal Molecular Imaging Institute (M4I), Universiteitssingel 50, 6229 ER Maastricht, the Netherlands

For many analytical spectroscopists, data handling challenges arise every few years, when the USB stick used to move data between the spectrometer and the office computer becomes full. The Maastricht MultiModal Molecular Imaging Institute (M4I) was founded at the Brightlands Maastricht Health Campus, and two new professors were appointed with it. Together, Professor Ron Heeren's Division of Imaging Mass Spectrometry and Professor Peter Peters' Division of Nanoscopy have created a perfect storm of data. At the largest molecular imaging centre in Europe, Ron Heeren's group studies high-resolution molecular imaging of biological systems and polymers. It does so by developing and applying state-of-the-art mass spectrometry-based molecular imaging approaches for nanomedicine and biomedical research. Peter Peters' team uses techniques such as high-resolution cryo-electron microscopy to investigate complex protein structures in cells. This far-sighted strategic decision by Maastricht University has attracted significant funding to the location. It has also established an unrivalled capability that is still growing.

Size of the data storm

The M4I groups work with some of the major instrument vendors in our field and now generate data at a rate of hundreds of gigabytes per day. These large amounts of data need to be stored rapidly and securely. The storage location must also be designed to serve the data back to individual researchers when they begin data analysis and processing, which is a significant challenge in itself.

One example of the state-of-the-art development work being undertaken with commercial companies is the beta testing of a new parallel imaging MS/MS instrument, the nanoTOF II from Physical Electronics (PHI) (Figure 1). It acquires the TOF-SIMS spectrum (MS1) and the MS/MS spectrum (MS2) in parallel. This high-information-volume methodology allows researchers to compare spectra, images or depth profiles from MS1 and MS2 directly. Both come from the same three-dimensional volume, which can contain hundreds of thousands of pixels or more.

Figure 1. The new Physical Electronics nanoTOF II tandem SIMS system in operation at M4I in Maastricht.

Another new instrument is the Bruker rapifleX MALDI Tissuetyper time-of-flight (TOF) system, which offers acquisition rates up to 50 times faster than other MALDI imaging systems. On it, the group has imaged brain sections with pixel sizes ranging from 10 × 10 µm² to 50 × 50 µm². The data acquired in both positive- and negative-ion modes yielded information-rich, complementary lipid spectra that reveal spatial changes in lipidome composition throughout the mouse brain. The speed of the instrument allowed an entire mouse brain to be imaged consecutively in both positive- and negative-ion mode in ~35 minutes.¹ These high acquisition speeds make it possible to use new classes of MALDI-MSI matrices that are unstable under high vacuum. Of course, they also mean that large amounts of data are now acquired much faster, which places further demands on the IT infrastructure.

www.spectroscopyeurope.com SPECTROSCOPYEUROPE 15

A typical experiment from this instrument
(see Figure 2) yields around 10–100 GB of data per tissue. In this particular case, the raw data stream consists of more than 181,000 individual mass spectra acquired at a spatial resolution of 20 µm. Such advances open the way to the analysis of large tissue cohorts for clinical studies. There, terabytes of raw data are expected, and these must be handled carefully together with the associated confidential patient data.

[Figure 2 image: panels (a) and (b); scale bar ~7 mm; intensity scale 0–100%.]
Figure 2. (a) Positive-ion images of [PC(40:6)+K]+, [PC(38:6)+K]+ and [PC(36:1)+K]+ observed at m/z 872, 844 and 826, shown in red, blue and green, respectively, and acquired with a 20 × 20 µm raster. This image contained 181,723 pixels. (b) Enlarged region showing the complementary distributions of these ions in the cerebellum. The corresponding H&E-stained section is shown on the right. Reproduced from Reference 1 with permission; © 2015 John Wiley & Sons Ltd.

Another source of large-scale data production and analysis in the Maastricht group is the team developing medical applications of the Waters iKnife Rapid Evaporative Ionisation Mass Spectrometry (REIMS) systems and their associated databases. The system analyses tissue at the molecular level in real time as it is being surgically removed. It collects the smoke produced during cutting and introduces it into a mass spectrometer (in this case a Xevo system from Waters). It relies heavily on generating and accessing tissue- and disease-specific databases, against which the molecular profile of the tissue in contact with the surgical knife is compared. It thus gives surgeons real-time feedback on the type of tissue they are cutting and allows tumorous and healthy tissue to be distinguished. This critical information, based on a series of collected mass spectra, helps ensure that all tumorous tissue is removed and minimises the need for follow-up surgery.

Weathering the storm: data handling infrastructure

The infrastructure outlined in Figure 3 has been put in place for two reasons. It lets the institute master the in-house data tsunami efficiently. It also gives researchers the opportunity to actually interpret the data volumes and turn them into knowledge (with the associated publications, of course!).

The demands on this IT infrastructure are twofold. First, huge amounts of data have to be stored somewhere (storage space), because the volumes produced exceed the capacity of standard PC storage. Second, these data have to be moved to and from the storage in a short time (network speed).

At the M4I, a centralised petabyte storage system from Hitachi Data Systems has been installed. It is connected via Gigabit Ethernet to the instruments, the data analysis clients and the university network. To reduce the data transfer rates between the storage and the data analysis units, the mass spectrometry imaging (MSI) data are processed and reduced on the fly during acquisition. This on-the-fly reduction can shrink the data 100- to 1000-fold, depending on the type of data, giving the researcher acceptable response times during analysis.

MSI data also benefit tremendously from parallelised processing. An MSI dataset is a collection of individual mass spectra, and each spectrum can be treated separately. Both commercial and in-house software therefore make use of multi-core processing systems. Maastricht University currently has two nodes, each with 64 cores and 512 GB of RAM. As a partner in the Dutch Life Science Grid, the institute can also scale up to greater computational power using clusters at other participating centres.

Another important pillar for data analysis, successful interpretation and the generation of relevant results is an IT infrastructure for integrating the data with other data. In projects run with the Academic Hospital of Maastricht (AZM), these can be clinical data. They can also be other types of data obtained by other techniques from the same sample or patient (e.g. genomic data, MRI scans, etc.). Other data can be metadata related to the experiment, such as the instrument settings during data acquisition or the sample preparation protocol. This storage and integration infrastructure also makes it possible to fulfil the requirements of the FAIR data criteria.

[Figure 3 diagram; labels: raw data, processed imaging data, clinical data, experimental metadata; Data in / Data out; Hitachi Data Systems petabyte storage; Gigabit Ethernet; multi-core processing systems; Dutch Life Science Grid; researchers (process data); data, information, knowledge.]
Figure 3. Very rough outline of the data generation to publication pathway at M4I.

FAIR data

The M4I and other Dutch research groups are members of the Dutch Techcentre for Life Sciences (DTL), which promotes the FAIR Data approach (http://www.dtls.nl/fair-data/). Long-term readers of this column will have no difficulty in recognising and welcoming the ideals behind the FAIR data approach. As the DTL describes it, data should be:

Findable: easy to find by both humans and computer systems, based on mandatory description of the metadata that allows the discovery of interesting datasets;
Accessible: stored for the long term so that they can be easily accessed and/or downloaded under well-defined license and access conditions (Open Access when possible), whether at the level of the metadata or of the actual data content;
Interoperable: ready to be combined with other datasets by humans as well as computer systems;
Reusable: ready to be used for future research and to be processed further using computational methods.

To this end, the DTL is working with similarly interested international bodies on the FAIR Data Stewardship of scientific information (https://www.force11.org/group/fairgroup/fairprinciples). These principles set out exactly which steps an organisation needs to take to meet the ideals of the FAIR data approach:

To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier
F2. data are described with rich metadata
F3. (meta)data are registered or indexed in a searchable resource
F4. metadata specify the data identifier
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardised communications protocol
A1.1. the protocol is open, free and universally implementable
A1.2. the protocol allows for an authentication and authorisation procedure, where necessary
A2. metadata are eternally accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with their provenance
R1.3. (meta)data meet domain-relevant community standards

This is still work in progress. However, the principles align very well with general good practice for sensible Big Data archiving. That applies not only in the bio-spectroscopy fields but to all of us, regardless of our specific areas of interest.

Conclusions

In conclusion, it is very pleasing to see significant investment in advanced spectroscopic techniques being made in Europe during economically difficult times. It is equally pleasing that the longer-term future of the spectroscopic data is at the forefront of the minds of those fortunate enough to receive this support. It is a key enabler of their strategy and, we are confident, of their future success.

References

1. N. Ogrinc Potocnik, T. Porta, M. Becker, R.M. Heeren and S.R. Ellis, "Use of advantageous, volatile matrices enabled by next-generation high-speed matrix-assisted laser desorption/ionization time-of-flight imaging employing a scanning laser beam", Rapid Commun. Mass Spectrom. 29, 2195–203 (2015). doi: http://dx.doi.org/10.1002/rcm.7379
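A little arithmetic puts the storage, network and sampling figures quoted in this column in perspective. The Python sketch below is a back-of-envelope calculation only. The 500 GB/day intake, the 100 GB dataset and the ideal 1 Gbit/s line rate are assumptions chosen to match the orders of magnitude in the text, not measured M4I values. The function names are hypothetical, and real Gigabit Ethernet throughput is lower than the nominal rate.

```python
# Back-of-envelope sizing for an imaging mass spectrometry (MSI) data pipeline.
# Assumption: all rates, sizes and link speeds are illustrative values matching
# the orders of magnitude quoted in the column, not measured M4I figures.

GB = 1e9                 # decimal gigabyte, in bytes
PB = 1e15                # decimal petabyte, in bytes
GIGABIT_ETHERNET = 1e9   # nominal line rate, in bits per second


def days_to_fill(capacity_bytes: float, intake_bytes_per_day: float) -> float:
    """Days until storage of the given capacity is full at a constant daily intake."""
    return capacity_bytes / intake_bytes_per_day


def transfer_seconds(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link; efficiency < 1 crudely models protocol overhead."""
    return size_bytes * 8 / (link_bps * efficiency)


def rescale_pixel_count(n_pixels: int, old_pitch_um: float, new_pitch_um: float) -> int:
    """Pixels needed to cover the same tissue area when the raster pitch changes."""
    return round(n_pixels * (old_pitch_um / new_pitch_um) ** 2)


if __name__ == "__main__":
    # Storage: how long would one petabyte last at an assumed 500 GB/day intake?
    print(f"1 PB at 500 GB/day: {days_to_fill(PB, 500 * GB):.0f} days")

    # Network: an assumed 100 GB raw dataset, unreduced and after 100x or 1000x
    # on-the-fly reduction, over an ideal (100 % efficient) Gigabit Ethernet link.
    for factor in (1, 100, 1000):
        t = transfer_seconds(100 * GB / factor, GIGABIT_ETHERNET)
        print(f"100 GB reduced {factor}x: {t:.1f} s")

    # Sampling: the Figure 2 image had 181,723 pixels at a 20 um raster;
    # estimate the pixel count for the same area at 10 um and 50 um.
    for pitch in (10, 50):
        print(f"{pitch} um raster: ~{rescale_pixel_count(181_723, 20, pitch):,} pixels")
```

Under these assumptions, a petabyte lasts about 2000 days (roughly five and a half years). A 100 GB raw dataset needs at least 800 s to cross an ideal Gigabit Ethernet link, and 100- to 1000-fold reduction cuts that to 8 s or 0.8 s. Halving the raster pitch quadruples the pixel count: the 181,723-pixel image at 20 µm would need about 726,892 pixels at 10 µm but only about 29,076 at 50 µm.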

