
Simultaneous spectral and volumetric imaging of moving objects

Shuaishuai Zhu1, Liang Gao2,3, Yu Zhang1, Jie Lin1, and Peng Jin1,*
1Center of Ultra-precision Optoelectronic Instrument, Harbin Institute of Technology, 2 Yikuang St., Harbin 150080, China
2Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 306 N. Wright St., Urbana, Illinois 61801, USA
3Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, Illinois 61801, USA
*Corresponding author: p.jin@hit.edu.cn

Abstract
Multi-dimensional imaging is a powerful technique for many applications, such as biological analysis, remote sensing, and object recognition.
Most existing multi-dimensional imaging systems rely on scanning or camera arrays, which makes them bulky and unstable. These problems can be
mitigated to some extent by employing compressed sensing algorithms; however, such algorithms are computationally expensive and rely on the
assumption, often unjustified, that the information is sparse in a given domain. Here, we propose a snapshot spectral-volumetric imaging (SSVI) system
that introduces the paradigm of light-field imaging into Fourier transform imaging spectroscopy. We demonstrate that SSVI can reconstruct the
complete plenoptic function, 𝑃(𝑥, 𝑦, 𝑧, 𝜃, 𝜑, 𝜆, 𝑡), of the incoming light rays using a single detector. Compared with other multi-dimensional
imagers, SSVI offers significant advantages in compactness, accuracy, and cost.

Keywords: Computational imaging; Hyperspectral imaging; Three-dimensional imaging; Convolutional neural network.

1. Introduction
In an imaging system, each incoming light ray carries abundant information, which is described as the plenoptic function by Adelson and Bergen
[1]. This seven-dimensional (7D) plenoptic function, 𝑃 (𝑥, 𝑦, 𝑧, 𝜃, 𝜑, 𝜆, 𝑡), gives the spatial (𝑥, 𝑦, 𝑧) and angular (𝜃, 𝜑) coordinates, wavelength
(𝜆), and time (𝑡) of an incoming light ray. Gao and Wang [2] further expanded the function to nine dimensions by including the polarization
orientation and ellipticity angles (𝜓, 𝜒). Conventional imaging systems, which record information in only two dimensions (𝑥, 𝑦), fail to
accurately capture and represent the object being imaged.

Spectral information in the wavelength dimension (𝜆) reveals the chemical and molecular characteristics of the object [3-6]. Spectral imaging has been
widely used in many fields, such as biology and biomedicine [3,7], remote sensing [4,8], and food quality control [5,6], due to its three-dimensional
(3D) (𝑥, 𝑦, 𝜆) imaging capabilities. On the other hand, the morphological and functional information of the object, carried by the volumetric
spatial data (𝑥, 𝑦, 𝑧), can be very useful in biomedical imaging [9-12], photography [13], object recognition [14,15], and particle image velocimetry
[16,17]. In the past few decades, numerous efforts have been made to push the limits of imaging systems and capture light information in more
dimensions. In recent years, spectral-volumetric imaging, which is capable of capturing a four-dimensional (4D) datacube (𝑥, 𝑦, 𝑧, 𝜆) of the
incoming light rays, has drawn considerable interest, with applications in biomedicine [18,19,29], remote sensing [20,21], object recognition [22],
and multimedia [23]. Most spectral-volumetric imaging systems rely on scanning [18-25] or multi-shot acquisition [26,27], which requires long
acquisition times and restricts their use in dynamic imaging. Fusing data from multi-sensor systems [28-31] is a feasible strategy for capturing a
4D datacube (𝑥, 𝑦, 𝑧, 𝜆) in a single snapshot, but these systems are bulky and suffer from alignment errors. Feng et al. have reported the use of
compressed sensing to achieve snapshot 4D spectral-volumetric imaging [32]. However, this technique is computationally expensive and rests on
the assumption, often unjustified, that the 4D datacube (𝑥, 𝑦, 𝑧, 𝜆) is sparse in a given domain.

Here, we demonstrate a snapshot spectral-volumetric imaging (SSVI) system by introducing the paradigm of light-field imaging into Fourier
transform imaging spectroscopy. In the SSVI system, we employ a single detector to record the light-field image coupled with the interference
from a birefringent polarization interferometer. A convolutional neural network has been developed to decouple the light-field image and the
interference. We derive the 3D volumetric datacube (𝑥, 𝑦, 𝑧) and 3D spectral datacube (𝑥, 𝑦, 𝜆) from the light-field image and the interference,
respectively. Combining these two datacubes with the 4D light-field datacube (𝑥, 𝑦, 𝜃, 𝜑) gives the complete, 7D plenoptic function,
𝑃(𝑥, 𝑦, 𝑧, 𝜃, 𝜑, 𝜆, 𝑡). To our knowledge, this is the first time a complete plenoptic function of light field has been recorded by a single detector.
Compared with multi-sensor-based systems [28-31], SSVI is compact, robust, and inexpensive. In addition, SSVI preserves the high-frequency
details of the object in both spatial and spectral domains.

2. Methods
A. Snapshot spectral-volumetric imaging system
The configuration of the SSVI system is shown in Fig. 1. The incident light from an object is first imaged by an objective lens (Canon EF 50mm
f/1.8), forming the first intermediate image (InIm) at a field stop. This InIm is relayed by a relay lens (Canon EF 50mm f/1.4) to a virtual InIm,
indicated as the second InIm in Fig. 1(a). After that, an elemental image (EI) array, denoted as the third InIm, is formed through a microlens array
(MLA) (AMS APO-Q-P1000-R5). Finally, the third InIm is relayed by a relay lens onto a charge-coupled device (CCD).
Fig. 1 Snapshot spectral-volumetric imaging system configuration. (a) System sketch. InIm, intermediate image; MLA, microlens array; CCD,
charge-coupled device; EI, elemental image. (b) Configuration of the birefringent polarization interferometer (BPI). Components between the
polarizer Ⅰ and the Wollaston prism have been omitted for clarity. The red arrow and circle indicate the polarization eigenmodes of the Wollaston
prism, while the black ones denote the polarization of light rays. The transmission axes of the two polarizers are both oriented at 45° with respect
to the polarization eigenmodes of the Wollaston prism. (c) The rotated BPI and CCD. Other components have been omitted.

The SSVI system incorporates a birefringent polarization interferometer (BPI) which contains two polarizers and a Wollaston prism, as shown in
Fig. 1(b). The Wollaston prism was custom-made from quartz. Based on the birefringence of the prism, the BPI separates the incident light rays
into two paths which are then converged by relay lens Ⅱ and interfere on the CCD. As depicted in Fig. 1(c), the BPI is rotated about the 𝑧-axis by
a small angle (𝛿) with respect to the 𝑦-axis. Thus, the optical path difference (OPD) on the CCD can be derived as [33,34]

\[ \mathrm{OPD}(x, y) = \frac{2B\tan(\alpha)\left[(x - x_0)\cos(\delta) - y\sin(\delta)\right]}{M_{R2}}, \tag{1} \]

where 𝐵 is the birefringence of quartz, 𝛼 is the wedge angle of the Wollaston prism, 𝑀𝑅2 is the magnification of relay lens Ⅱ, and 𝑥0 is the 𝑥
offset of the zero-OPD reference position. A virtual interference plane is located inside the Wollaston prism, as indicated by the blue dashed line in Fig.
1(b). In SSVI, we adopt a focused light-field configuration, in which the second InIm is located behind the MLA [35]. To record both the light field and
the interference of the object, we set the third InIm and the virtual interference plane within the depth of field (DOF) of relay lens Ⅱ.
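
The OPD of Eq. 1 varies linearly across the detector. As a rough numerical illustration (a sketch, not the authors' code; the quartz birefringence and the exact coordinate convention are assumptions), Eq. 1 can be evaluated on a pixel grid as follows:

```python
# Illustrative evaluation of Eq. 1 on a pixel grid (a sketch, not the authors' code).
# Parameter defaults follow values quoted in the text where available; the quartz
# birefringence B and the rotation angle delta are assumed nominal values.
import numpy as np

def opd_map(x, y, B=0.009, alpha_deg=7.6, delta_deg=3.8, x0=3.5e-3, M_R2=0.44):
    """Optical path difference of the rotated BPI at detector coordinates (x, y) [m]."""
    alpha, delta = np.deg2rad(alpha_deg), np.deg2rad(delta_deg)
    return 2 * B * np.tan(alpha) * ((x - x0) * np.cos(delta) - y * np.sin(delta)) / M_R2

# Example: OPD over a 10 mm x 10 mm region of the image plane
xs, ys = np.meshgrid(np.linspace(0, 10e-3, 512), np.linspace(0, 10e-3, 512))
print(opd_map(xs, ys).max())   # on the order of tens of micrometres
```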

The MLA design choice affects the resolution in the lateral (𝑥, 𝑦), spectral (𝜆) and angular (𝜃, 𝜑) dimensions. In a focused light-field camera, the
lateral resolution is determined by the pitch (microlens diameter) and numerical aperture (NA) of the MLA, while the angular resolution is
controlled by the number of the microlenses in the MLA [35]. On the other hand, the spectral resolution of a BPI-based imaging spectrometer is
related to both the pitch and the number of microlenses in the MLA [33,34]. Here, we employ an MLA containing 15×15 microlenses with a
pitch of 1mm×1mm. The focal length of each microlens is 10.9mm, which gives 𝑁𝐴 = 0.0459. A bi-telecentric lens with 0.44× magnification is
used to relay the third InIm onto the CCD. In the BPI, we employ a customized Wollaston prism with a wedge angle 𝛼 = 7.6° and an 𝑥 offset
𝑥0 ~3.5mm, which makes the largest OPD around 28μm.
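
These design numbers can be sanity-checked with standard textbook relations (shown below as a sketch; the single-sided Fourier-transform-spectroscopy resolution estimate Δσ ≈ 1/OPDmax is our assumption, not a statement from the original design analysis):

```python
# Quick consistency check of the quoted design numbers (standard relations assumed;
# not part of the reconstruction pipeline).
pitch = 1.0e-3                 # microlens pitch [m]
f_ml = 10.9e-3                 # microlens focal length [m]
na = (pitch / 2) / f_ml
print(f"NA = {na:.4f}")        # ~0.0459, matching the value quoted above

opd_max_cm = 28e-4             # largest OPD, ~28 um expressed in cm
print(f"FTS resolution estimate = {1.0 / opd_max_cm:.0f} cm^-1")  # ~357 cm^-1,
# comparable to the 350-385 cm^-1 measured in Section 3.B
```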

B. Reconstruction algorithm
Mathematically, we denote the plenoptic function of an incoming light ray from the object as 𝑃(𝑥, 𝑦, 𝑧, 𝜃, 𝜑, 𝜆, 𝑡). This can be reduced to
𝑃(𝑥, 𝑦, 𝜃, 𝜑, 𝜆) under the assumptions that the function does not vary within a single integration time of the detector and the radiance along a
ray is a constant [36]. Without loss of generality, we can consider only the rays in the 𝑦𝑜𝑧 plane and the function further simplifies to 𝑃(𝑦, 𝜑, 𝜆).

For a centered lens system, the output ray vector [𝑦′, 𝜑′]′ at the image point can be derived as [37]

\[ \begin{bmatrix} y' \\ \varphi' \end{bmatrix} = \begin{bmatrix} M & 0 \\ -1/f & 1/M \end{bmatrix} \begin{bmatrix} y \\ \varphi \end{bmatrix}, \tag{2} \]

where [𝑦, 𝜑]′ is the input ray vector at the object point, 𝑀 and 𝑓 are the magnification and focal length of the system, respectively. For the SSVI
system with an MLA, we shift the optical axis to the center of each microlens and shift it back to the 𝑧-axis after doing the transformation given
in Eq. 2. For a light ray passing through the 𝑛th microlens along the 𝑦-axis, the output ray vector [𝑦′, 𝜑′]′ at the image point can be calculated as

\[ \begin{bmatrix} y' \\ \varphi' \end{bmatrix} = \begin{bmatrix} M & 0 \\ -1/f & 1/M \end{bmatrix} \begin{bmatrix} y - d_n \\ \varphi \end{bmatrix} + \begin{bmatrix} d_n \\ 0 \end{bmatrix}, \tag{3} \]

where 𝑑𝑛 is the distance from the optical axis to the center of the 𝑛th microlens. Here, we define the distances pointing up and down as positive
and negative, respectively. Combining the interference introduced by the BPI, we can derive the radiance of light rays on the CCD as
\[ P'(y, \varphi, \lambda) = P\!\left(\frac{y - d_n}{M} + d_n,\; \frac{y - d_n}{f} + M\varphi,\; \lambda\right) \cdot \frac{1}{2}\left[1 + \cos\!\left(\frac{2\pi}{\lambda}\,\mathrm{OPD}\right)\right]. \tag{4} \]
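
For concreteness, the decentered ray transform of Eq. 3 can be written as a small numerical routine. The following is an illustrative sketch with hypothetical input values, not the authors' code:

```python
# Numerical sketch of the decentered ray transform of Eq. 3 (illustrative only,
# with hypothetical input values; not the authors' code).
import numpy as np

def trace_through_microlens(y, phi, d_n, M, f):
    """Map an input ray (y, phi) at the object point to (y', phi') at the image
    point for the n-th microlens, whose center lies a distance d_n off axis."""
    abcd = np.array([[M, 0.0],
                     [-1.0 / f, 1.0 / M]])
    y_out, phi_out = abcd @ np.array([y - d_n, phi])  # shift axis to the microlens center
    return y_out + d_n, phi_out                       # shift the axis back to the z-axis

# Example: ray 0.4 mm above the center of a microlens sitting 2 mm off axis
print(trace_through_microlens(y=2.4e-3, phi=0.01, d_n=2.0e-3, M=-0.5, f=10.9e-3))
```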

Extending Eq. 4 to 3D space and integrating over all directions and wavelengths at each location (𝑥, 𝑦) gives the intensity of the raw image
captured by the CCD:
\[ I(x, y) = \iiint P(x', y', \theta', \varphi', \lambda) \cdot \frac{1}{2}\left\{1 + \cos\!\left[\frac{2\pi}{\lambda}\,\mathrm{OPD}(x, y)\right]\right\} d\theta\, d\varphi\, d\lambda, \tag{5} \]

where
\[ x' = \frac{x - d_m}{M} + d_m, \quad y' = \frac{y - d_n}{M} + d_n, \quad \theta' = \frac{x - d_m}{f} + M\theta, \quad \varphi' = \frac{y - d_n}{f} + M\varphi. \tag{6} \]

𝑑𝑚 and 𝑑𝑛 are the distances from the optical axis to the center of the microlens through which the light ray passes, along the 𝑥- and 𝑦-axes,
respectively. If we assume a Lambertian surface, the spectrum (𝜆) of a light ray is independent of its propagation angles (𝜃, 𝜑), and Eq. 5 can be
decomposed as
\[ I(x, y) = \iint l(x', y', \theta', \varphi')\, d\theta\, d\varphi \cdot \int s(x, y, \lambda) \cdot \frac{1}{2}\left\{1 + \cos\!\left[\frac{2\pi}{\lambda}\,\mathrm{OPD}(x, y)\right]\right\} d\lambda, \tag{7} \]

where 𝑙 (𝑥 ′ , 𝑦 ′ , 𝜃 ′ , 𝜑 ′ ) and 𝑠(𝑥, 𝑦, 𝜆) denote the 4D light-field and the 3D spectral datacube of the object, respectively.
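
Equation 7 also serves as the forward model used later to synthesize training data. A toy simulation of this multiplicative coupling, under the simplifying assumption that the angularly integrated light field is proportional to the panchromatic image, might look as follows (illustrative only):

```python
# Toy forward model in the spirit of Eq. 7 (a sketch under simplifying assumptions,
# not the authors' simulation code). For a Lambertian scene, the raw image is the
# angularly integrated light field multiplied by the spectrally integrated fringe term;
# here the angular integral is taken to be proportional to the panchromatic image.
import numpy as np

H, W, C = 64, 64, 31
lams = np.linspace(400e-9, 700e-9, C)               # 31 spectral channels [m]
s = np.random.rand(H, W, C)                         # spectral datacube s(x, y, lambda)
l = s.sum(axis=2)                                   # stand-in for the angular integral

xs, _ = np.meshgrid(np.arange(W), np.arange(H))
opd = 2e-6 * (xs - W / 2) / W                       # toy linear OPD ramp across the sensor [m]

fringe = 0.5 * (s * (1 + np.cos(2 * np.pi * opd[..., None] / lams))).sum(axis=2)
raw = (l / l.max()) * (fringe / fringe.max())       # multiplicatively coupled raw image
```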

Figure 2 shows the image processing pipeline, which consists of four steps, namely (Ⅰ) light-field-interference decoupling, (Ⅱ) depth reconstruction,
(Ⅲ) interferogram extraction, and (Ⅳ) spectral datacube reconstruction. As indicated in Eq. 7, the light-field and interference are multiplicatively
coupled in the raw image 𝐼(𝑥, 𝑦). In step Ⅰ, we develop a convolutional neural network, as shown in Fig. 2(a), to decouple the light-field and
interference. We refer to the proposed decoupling convolutional neural network as DECONN. To convert the multiplicative coupling mode into
an additive one, we take the logarithm of the raw image, i.e.
\[ \log[I(x, y)] = \log\!\left[\iint l(x', y', \theta', \varphi')\, d\theta\, d\varphi\right] + \log\!\left[\int s(x, y, \lambda) \cdot \frac{1}{2}\left\{1 + \cos\!\left[\frac{2\pi}{\lambda}\,\mathrm{OPD}(x, y)\right]\right\} d\lambda\right]. \tag{8} \]

Inspired by the residual learning strategy [38], we set the interference instead of the light-field image of the object as the learning target, because
the light-field image is much more complex than the interference of the object. As shown in Fig. 2(a), the DECONN architecture consists of two
hidden layers and an output layer. The forward propagation of our network can be represented by three operations

𝐻1 (𝐼𝑖 ) = ReLU{𝐖1 ∗ log[𝐻0 (𝐼𝑖 )] + 𝐛1 }, (9)

𝐻2 (𝐼𝑖 ) = ReLU[𝐖2 ∗ 𝐻1 (𝐼𝑖 ) + 𝐛2 ], (10)

𝐻3 (𝐼𝑖 ) = exp[𝐖3 ∗ 𝐻2 (𝐼𝑖 ) + 𝐛3 ], (11)

where 𝐖𝑙 and 𝐛𝑙 (𝑙 = 1,2,3) are the weight and bias parameters that need to be learned through training, and ReLU(∙) denotes the rectified
linear unit, ReLU(𝑥) = max(0, 𝑥). 𝐼𝑖 is the 𝑖th image in the training set and ∗ indicates convolution. 𝐻𝑙 (𝐼𝑖 )(𝑙 = 1,2) is the output of the 𝑙th
hidden layer, while 𝐻0 (𝐼𝑖 ) and 𝐻3 (𝐼𝑖 ) are the input raw image and the output interference, respectively.
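
The paper's network was built in MatConvNet; the following PyTorch sketch of the forward pass of Eqs. 9-11 is for illustration only, with an assumed kernel size and channel count, and with the batch normalization shown in Fig. 2(a) omitted:

```python
# Minimal PyTorch sketch of the DECONN forward pass (Eqs. 9-11), for illustration only.
# The paper's network was implemented in MatConvNet; the kernel size and channel count
# below are assumptions, and the batch normalization shown in Fig. 2(a) is omitted.
import torch
import torch.nn as nn

class DECONN(nn.Module):
    def __init__(self, channels=64, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.conv1 = nn.Conv2d(1, channels, kernel, padding=pad)         # hidden layer 1
        self.conv2 = nn.Conv2d(channels, channels, kernel, padding=pad)  # hidden layer 2
        self.conv3 = nn.Conv2d(channels, 1, kernel, padding=pad)         # output layer
        self.relu = nn.ReLU()

    def forward(self, raw):                                   # raw: (N, 1, H, W) image I
        h1 = self.relu(self.conv1(torch.log(raw + 1e-6)))     # Eq. 9: log makes the coupling additive
        h2 = self.relu(self.conv2(h1))                        # Eq. 10
        return torch.exp(self.conv3(h2))                      # Eq. 11: back to the intensity domain

net = DECONN()
interference = net(torch.rand(1, 1, 128, 128) + 0.1)          # predicted interference term
```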

Fig. 2 Flowchart of image processing pipeline. (a) Framework of the light-field-interferogram decoupling convolutional neural network (DECONN).
Log, logarithm; Conv, convolution; ReLU, rectified linear unit; BN, batch normalization. (b) Light-field image. (c) An epipolar plane image extracted
from the light-field image. (d) Disparity map. (e) Depth map. (f) An epipolar plane image extracted from the raw image. (g) Interferogram of a
single pixel. (h) Spectrum of a single pixel. FFT, fast Fourier transform. (i) Spectral datacube.

To generate the training set, we used the spectral images from the ICVL spectral dataset released by Arad and Ben-Shahar [39]. These images
were captured by a line scanner camera (Specim PS Kappa DX4 hyperspectral) with a lateral resolution of 1392×1300 over 519 spectral bands,
and then downsampled to 31 spectral channels from 400nm to 700nm. We further downsampled the spectral images to the same size as a single
elemental image captured by the SSVI system and replicated each downsampled image into a 5×5 array. A synthesized elemental image array with
interference, 𝐈, was then derived through Eqs. 1 and 7, while the corresponding elemental image array without interference, 𝐄, was simply calculated by
accumulating the slice images over all spectral channels. We define the loss function of the DECONN as
\[ L = \frac{1}{N}\sum_{i=1}^{N}\left\| H_3(\mathbf{I}_i) - \frac{\mathbf{I}_i}{\mathbf{E}_i} \right\|_2^2, \tag{12} \]

where 𝑁 is the number of images in the training set. The network was implemented in MatConvNet [40]. Training and evaluation were run on a
workstation with an Intel Xeon E5-2650 CPU (2.0 GHz), 128 GB of RAM, and an NVIDIA GeForce GTX 1080 Ti GPU. Training the full network took
about 20 hours. An example input raw image and the corresponding output interference are shown in Fig. 2(a). Dividing the input image by the
interference gives the light-field image [Fig. 2(b)], which can be mathematically expressed as
\[ I_{LF}(x, y) = \iint l\!\left(\frac{x - d_m}{M} + d_m,\; \frac{y - d_n}{M} + d_n,\; \frac{x - d_m}{f} + M\theta,\; \frac{y - d_n}{f} + M\varphi\right) d\theta\, d\varphi. \tag{13} \]
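
Putting the pieces together, the training objective of Eq. 12 and the decoupling step of Eq. 13 can be sketched as below; net, raw, and lf_no_interf are placeholders for the trained DECONN, a synthesized raw array 𝐈, and its interference-free counterpart 𝐄:

```python
# Sketch of the training objective (Eq. 12) and the decoupling step (Eq. 13); 'net',
# 'raw', and 'lf_no_interf' are placeholders for the trained DECONN and for a synthesized
# raw elemental-image array I and its interference-free counterpart E from the training set.
import torch

def deconn_loss(net, raw, lf_no_interf, eps=1e-6):
    target = raw / (lf_no_interf + eps)            # I_i / E_i, i.e. the interference term
    return torch.mean((net(raw) - target) ** 2)    # squared error averaged over the batch

def decouple(net, raw, eps=1e-6):
    interference = net(raw)                        # predicted interference H3(I)
    return raw / (interference + eps)              # light-field image I_LF (Eq. 13)
```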

In step Ⅱ, we consider the MLA as an array of stereo cameras and arrange the elemental image array into a four-dimensional datacube
𝐿𝐹 (𝑥, 𝑦, 𝑚, 𝑛), where 𝑚 and 𝑛 index the elemental images along the 𝑥- and 𝑦-axis, respectively. A 2D 𝑦 − 𝑛 slice of the datacube, dubbed an
epipolar plane image (EPI), is shown in Fig. 2(c). The tilting angle of the ‘line’ structure, 𝛽, in the EPI corresponds to a certain depth of that point
in the object space [41]. The disparity between the top and bottom elemental images can be derived by 𝐷 = ℎ ∙ tan(𝛽), where ℎ is the width of
the EPI along the 𝑛-axis. We adopt a disparity estimation algorithm based on scale-depth space transform [42] to reconstruct the disparity map.
Finally, the depth map of the object can be derived through a calibration procedure [43].
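
A minimal sketch of the disparity-from-slope relation and a toy disparity-to-depth mapping is given below; the actual pipeline uses the more robust scale-depth space transform [42] and the calibration of [43]:

```python
# Sketch of the disparity-from-slope relation used in step II (simplified helpers, not
# the authors' code; the actual pipeline uses the scale-depth space transform of Ref. 42
# and the disparity-to-depth calibration of Ref. 43).
import numpy as np

def disparity_from_epi_slope(beta_rad, h):
    """Disparity D between the top and bottom elemental images, given the tilting angle
    beta of the 'line' structure in an EPI of height h (in elemental images)."""
    return h * np.tan(beta_rad)

def depth_from_disparity(D, a, b):
    """Toy disparity-to-depth mapping; a and b stand in for calibration constants
    (e.g. an assumed model z = a / D + b)."""
    return a / D + b
```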

In step Ⅲ, we extract interference vectors from the raw image under the guidance of the tilting angles derived in the last step. Fig. 2(f) shows
an example of the EPIs reconstructed from the raw image. The red dashed line indicates the tilting angle at that position. The vector extracted
along the red dashed line from bottom to top is a portion of the corresponding interference vector. Concatenating all the vectors extracted from
the EPIs along the 𝑚-axis gives the complete interference vector.

Mathematically, we denote the extracted interference vector as [𝐿𝐹𝑟𝑎𝑤 (𝑥𝑖 , 𝑦𝑖 , 𝑚𝑖 , 𝑛𝑖 )]𝑖=1,2,…,𝑃∙𝑄 , where 𝐿𝐹𝑟𝑎𝑤 (𝑥, 𝑦, 𝑚, 𝑛) is the light-field
datacube reconstructed from the raw image. 𝑃 and 𝑄 are the numbers of elemental images along the 𝑚- and 𝑛-axis, respectively. Here, we
consider the central elemental image as a reference. According to the extracting strategy described above, the coordinates, (𝑥𝑖 , 𝑦𝑖 , 𝑚𝑖 , 𝑛𝑖 ), can
be calculated by
\[ m_i = \left\lceil \frac{i}{Q} \right\rceil, \tag{14} \]

\[ n_i = Q - \mathrm{mod}(i - 1,\, Q), \tag{15} \]

\[ x_i = x_{ic} + D_a (m_i - m_c), \tag{16} \]

\[ y_i = y_{ic} + D_a (n_i - n_c), \tag{17} \]

where ⌈∙⌉ denotes the ceiling operation, and mod(𝑥, 𝑦) calculates the remainder after dividing 𝑥 by 𝑦. The central elemental image is indexed
as (𝑚𝑐 , 𝑛𝑐 ), and [𝑥𝑖𝑐 , 𝑦𝑖𝑐 ] are the coordinates of the corresponding point in the central elemental image. 𝐷𝑎 denotes the disparity between
adjacent elemental images. An OPD vector corresponding to the interference vector can be derived as [𝑂𝑃𝐷 (𝑥𝑖𝑟𝑎𝑤 , 𝑦𝑖𝑟𝑎𝑤 )]𝑖=1,2,…,𝑃∙𝑄 , where
(𝑥𝑖𝑟𝑎𝑤 , 𝑦𝑖𝑟𝑎𝑤 ) are the coordinates of the point in the raw image corresponding to the point (𝑥𝑖 , 𝑦𝑖 , 𝑚𝑖 , 𝑛𝑖 ) in the light-field datacube. These
coordinates can be calculated by

\[ x_i^{\mathrm{raw}} = x_i + (m_i - m_c)\, d, \tag{18} \]

\[ y_i^{\mathrm{raw}} = y_i + (n_i - n_c)\, d, \tag{19} \]

where 𝑑 is the distance between the centers of adjacent elemental images. Combining Eqs. 1, 16, 17, 18, and 19 gives
\[ \mathrm{OPD}(x_i^{\mathrm{raw}}, y_i^{\mathrm{raw}}) = \frac{2B\tan(\alpha)}{M_{R2}} \left\{ [x_{ic} - x_0 - (m_i - m_c)(d - D_a)]\cos(\delta) - [y_{ic} - (n_i - n_c)(d - D_a)]\sin(\delta) \right\}. \tag{20} \]

An OPD vector with equal intervals can be obtained by setting 𝛿 = tan⁻¹(1/𝑄) [33,34]. Figure 2(g) plots an example interference vector at different
OPD values.
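
The index bookkeeping of Eqs. 14-19 is summarized in the following sketch (illustrative helper functions, not the authors' code):

```python
# Index bookkeeping of Eqs. 14-19 as small helper functions (illustrative, not the
# authors' code).
import numpy as np

def sample_coordinates(i, Q, xc, yc, mc, nc, Da):
    """Coordinates (x_i, y_i, m_i, n_i) of the i-th sample (i = 1, ..., P*Q) of the
    interference vector, given the reference point (xc, yc) in the central elemental
    image (mc, nc) and the adjacent-EI disparity Da."""
    m_i = int(np.ceil(i / Q))                 # Eq. 14
    n_i = Q - (i - 1) % Q                     # Eq. 15
    x_i = xc + Da * (m_i - mc)                # Eq. 16
    y_i = yc + Da * (n_i - nc)                # Eq. 17
    return x_i, y_i, m_i, n_i

def raw_coordinates(x_i, y_i, m_i, n_i, mc, nc, d):
    """Corresponding location in the raw image (Eqs. 18-19); d is the spacing between
    the centers of adjacent elemental images."""
    return x_i + (m_i - mc) * d, y_i + (n_i - nc) * d
```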

In step Ⅳ, we derive the spectrum at point (𝑥𝑖𝑐 , 𝑦𝑖𝑐 ) by taking the Fourier transformation of the interference vector
[𝐿𝐹𝑟𝑎𝑤 (𝑥𝑖 , 𝑦𝑖 , 𝑚𝑖 , 𝑛𝑖 )]𝑖=1,2,…,𝑃∙𝑄 along the OPD vector [𝑂𝑃𝐷 (𝑥𝑖𝑟𝑎𝑤 , 𝑦𝑖𝑟𝑎𝑤 )]𝑖=1,2,…,𝑃∙𝑄 . Figure 2(h) plots the derived spectrum corresponding to the
interference in Fig. 2(g). Finally, following this procedure pixelwise yields a reconstructed spectral datacube 𝑆(𝑥, 𝑦, 𝜆) [Fig. 2(i)].
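
A minimal sketch of this step, assuming a uniformly sampled interferogram and omitting the apodization and phase-correction refinements common in Fourier transform spectroscopy, is:

```python
# Minimal sketch of the spectrum recovery in step IV: an FFT of a uniformly sampled
# interferogram. Apodization, zero padding, and phase correction, common in Fourier
# transform spectroscopy, are omitted here for brevity.
import numpy as np

def spectrum_from_interferogram(interf, opd_step_m):
    """interf: interference samples at equally spaced OPD values (spacing opd_step_m [m])."""
    interf = interf - interf.mean()                      # remove the non-modulated (DC) term
    spec = np.abs(np.fft.rfft(interf))                   # magnitude spectrum
    sigma = np.fft.rfftfreq(len(interf), d=opd_step_m)   # wavenumber axis [1/m]
    with np.errstate(divide="ignore"):
        lam = 1.0 / sigma                                # wavelength axis [m]
    return lam, spec
```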

3. Experiment
A. Real-time spectral-volumetric video
To highlight SSVI’s single-shot multi-dimensional-imaging capability, we visualized a dynamic scene consisting of a static white board with letters
and a green leaf swinging along the line of sight with a period of ~2s [Fig. 3(a)]. The scene was located at a distance of ~0.85m from the SSVI
system and a halogen lamp (MI-150, Edmund) was used for illumination. Using the reconstruction algorithm described in Section 2, we derived
a real-time 4D (𝑥, 𝑦, 𝑧, 𝜆) video with a frame rate of 15Hz (Visualization 1). Figure 3(b-i) shows the depth map and reconstructed image of eight
example frames from the video. Here, the reconstructed images are derived by accumulating the datacube over all spectral channels. As a result,
we are able to visualize the real-time motion of the leaf in 3D (𝑥, 𝑦, 𝑧). To demonstrate the refocusing ability of the SSVI system, we
reconstructed two videos with the system focused at 450mm and 850mm (Visualization 2). Figure 3(j-m) illustrates four example frames from each
reconstructed video.
Fig. 3 Real-time spectral-volumetric video. (a) Illustration of the scene including a static paper with letters and a swinging leaf. (b-i) Samples of
consecutive depth and image frames reconstructed at 15 Hz frame-rate with a lateral resolution of 110 × 110 pixels. (j-m) Samples of image
frames reconstructed when the system focuses at 450mm (upper row) and 850mm (lower row). Note: the full video can be seen in visualizations
1 and 2.

In this experiment, we attached an ‘L’ shaped green paper on the swinging leaf. Figure 4(c) shows a close-up of the leaf captured by a commercial
RGB camera. A frame of the reconstructed-image video is depicted in Fig. 4(a). Since the color of the green paper is quite close to that of the leaf,
the contrast between the ‘L’ shaped paper and the leaf is low in Figs. 4(a) and 4(c). Figure 4(b) plots the spectra of points A and B, which are located
on the green paper and the leaf, respectively, as indicated in Fig. 4(a). The spectrum of point B increases dramatically around 700nm due to the absorbing
property of chlorophyll. By contrast, the spectrum of point A varies smoothly from 500nm to 800nm. This phenomenon, whereby different
spectra appear the same color to RGB cameras and human eyes, is referred to as metamerism. Unlike the monochromatic [Fig. 4(a)] and RGB
images [Fig. 4(c)], the spectral slice at 728.6nm [Fig. 4(d)] shows high contrast between the L-shaped paper and the leaf. The image of the leaf
can be further enhanced by taking the difference between the spectral slice at 728.6nm [Fig. 4(d)] and 673.9nm [Fig. 4(e)].

Fig. 4 Response to metamerism. (a) Reconstructed image. (b) Spectra of points A and B indicated in (a). (c) High-resolution RGB image of the leaf
captured by a commercial camera. (d) Spectral slice at 728.6 nm. (e) Spectral slice at 673.9 nm.

B. Spectral resolution and accuracy


To evaluate the spectral resolution of the SSVI system, we imaged a black board illuminated by three lasers [Melles Griot, 25-LHP-925-230,
632.8nm (red); Oxxius, 532S-100-COL-PP, 532nm (green); Coherent, OBIS 488-60 LS, 488nm (blue)]. The board was located at a distance of
~0.9m from the imaging system. Figure 5(a) shows the reconstructed image, where the three laser points are labeled ‘Red’, ‘Green’, and ‘Blue’,
respectively. The spectra at the red, green, and blue laser point areas are plotted in Fig. 5(b-d). The solid line in each panel is the fitted Gaussian
curve. The insets depict the spectral slices at 632.6nm, 531.6nm, and 487.1nm, respectively. We consider the spectra of these three lasers as
delta functions and use the full width at half maximum (FWHM) of the measured spectra to characterize the spectral resolution of SSVI. As
indicated in Fig. 5(b-d), the spectral resolution of the system is 15.4nm, 10.7nm, and 8.4nm, corresponding to 384.6cm⁻¹, 378.1cm⁻¹, and
352.8cm⁻¹ in the wavenumber domain, at 632.8nm, 532nm, and 488nm, respectively.
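
For reference, the wavenumber figures follow from the wavelength FWHMs through the standard differential conversion (shown here only as a consistency check):

\[ \Delta\tilde{\nu} \approx \frac{\Delta\lambda}{\lambda^2}, \qquad \text{e.g.}\ \ \frac{15.4\ \mathrm{nm}}{(632.8\ \mathrm{nm})^2} \approx 384.6\ \mathrm{cm^{-1}}. \]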

Fig. 5 Spectral resolution of the SSVI system. (a) Reconstructed image. (b-d) Spectra at the red, green, and blue laser point. The insets depict the
spectral slice at 632.6nm, 531.6nm, and 487.1nm, respectively.

To evaluate the accuracy and precision of the spectral-datacube reconstruction, we visualized a ColorChecker (X-Rite, MSCCPPCC0616) which
was located at ~0.78m from the imaging system and illuminated by a halogen lamp (MI-150, Edmund). Figure 6(a) shows a photo of the
ColorChecker captured by a commercial camera. The color blocks are labeled as ‘1’ to ‘16’, respectively. We employed a commercial fiber
spectrometer (Avantes, AvaSpec-ULS 2048-USB2) with a spectral resolution of 1.15nm to measure the spectrum of each block. The comparison
between the spectra derived from the Avantes spectrometer and SSVI in each block is shown in Fig. 6(c-r). Considering the
measurements from the Avantes spectrometer as references, we took 10×10 pixels in each block area and calculated the normalized root mean
square error (NRMSE) of the corresponding spectra. The blue line in Fig. 6(b) plots the average and standard deviation of the NRMSEs in each
block. The average NRMSE over all blocks is 6.87%. To compare SSVI with a snapshot-spectral-imaging modality, we transferred the SSVI system
to the spectral imager described in [34] by extracting the interference vectors according to a registering procedure [44] instead of the disparity
map. A spectral datacube, 𝑆 (𝑥, 𝑦, 𝜆), was then reconstructed by taking the Fourier transformation of the interference vectors. The red line in Fig.
6(b) plots the NRMSEs of the spectra derived by the spectral imager, where the average NRMSE over all blocks is 6.57%. These results show that
SSVI, despite additionally capturing light-field information, achieves spectral accuracy comparable to that of the original spectral imager.
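
The NRMSE metric used here can be sketched as follows; the normalization by the range of the reference spectrum is our assumption of the exact definition, which the text does not spell out:

```python
# Sketch of the NRMSE metric plotted in Fig. 6(b). The normalization by the range of
# the reference spectrum is our assumption; the text does not spell out the exact definition.
import numpy as np

def nrmse(recon, reference):
    rmse = np.sqrt(np.mean((recon - reference) ** 2))
    return rmse / (reference.max() - reference.min())

# For each color block, the metric is averaged over a 10x10-pixel patch, e.g.
# errs = [nrmse(S[y, x], ref_spectrum) for (y, x) in patch_pixels]
```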

Fig. 6 Quantitative evaluation of the spectral-datacube reconstruction. (a) Photo of the ColorChecker captured by a commercial camera. (b)
Average NRMSEs of the reconstructed spectra in color blocks. (c-r) Reconstructed spectra in color blocks.
C. Depth accuracy
In this experiment, we imaged a scene containing two white boards with black letters [Fig. 7(a)]. The back board is perpendicular to the 𝑧-axis,
and the front board is tilted with respect to the 𝑥-axis by 45°. As shown in Fig. 7(a), the yellow shadowed area on the back board is occluded by
the front board. The distances from the boards to the imaging system were measured and considered as the ground truth [Fig. 7(b)]. We
reconstructed the depth map [Fig. 7(c)] using the algorithm described in Section 2. We then derived an error map representing the absolute
difference between the reconstructed depth and the ground truth [Fig. 7(d)], where the RMS error was found to be 7.7mm. To compare SSVI with
a light-field imaging modality, we derived a light-field image without interference by summing the two raw images when the transmission axis
of polarizer Ⅰ was oriented at 45° and 135° with respect to the 𝑥′-axis [Fig. 1(b)]. We employed the same algorithm [42] used in SSVI to
reconstruct the depth map. Figure 7(e) shows the error map of the reconstructed depth from the light-field image without interference, where
the RMS error was found to be 7.4mm. These results show that the light field camera maintains its original depth accuracy even after adding
spectral imaging, as in SSVI.

Fig. 7 Depth accuracy of SSVI. (a) Schematic of the experimental setup. (b) Ground-truth depth of the scene. (c) Reconstructed depth. (d) Error
map of the reconstructed depth from SSVI. (e) Error map of the reconstructed depth from the light-field image without interference.

4. Discussion and conclusion


We have introduced a snapshot spectral-volumetric imaging system that captures the 4D light-field datacube (𝑥, 𝑦, 𝜃, 𝜑), coupled with the
interference of the incoming light rays, in a single snapshot of a single detector. We also proposed a post-processing algorithm to reconstruct the
corresponding 4D spectral-volumetric datacube (𝑥, 𝑦, 𝑧, 𝜆) from the raw image. In our experiments, we reconstructed a spectral-volumetric
video (𝑥, 𝑦, 𝑧, 𝜆, 𝑡) of a dynamic scene. Combining the video with the light-field datacubes, we derived the complete plenoptic
function, 𝑃(𝑥, 𝑦, 𝑧, 𝜃, 𝜑, 𝜆, 𝑡), of the incoming light rays. To our knowledge, this is the first single-detector imaging system to record the complete
plenoptic function of the light rays from an object. Through our error analyses, we also showed that the individual accuracies of spectral and
light-field imaging are maintained in SSVI.

Compared to multi-camera systems [28-31], SSVI is compact, robust and inexpensive. Camera-array-based systems [29,30] capture the
multispectral light-field by placing different broadband color filters in front of a camera array. This directly limits the optical throughput of the
system to the inverse of the number of spectral channels. SSVI does not suffer from this trade-off, as it relies on the Fellgett multiplex advantage of
Fourier transform imaging spectroscopy. Additionally, the lateral, depth, and spectral resolutions can be easily adjusted by using different Wollaston
prisms and MLAs in our system.

That being said, SSVI has a relatively low throughput because 75% of the light is lost in the BPI. One future avenue of development is replacing the
first polarizer in the BPI with a polarizing beam splitter (PBS) and a half-wave plate (HWP). Setting the fast axis of the HWP at 22.5° with respect to
the 𝑥-axis rotates the polarization of the light rays passing through the PBS and HWP to 45° with respect to the 𝑥-axis. We can use these
light rays to reconstruct the 7D plenoptic datacube and employ the light rays reflected by the PBS to capture a high-resolution monochromatic
image [44]. The energy loss in this configuration is reduced to 25%. Moreover, the captured high-resolution monochromatic image can be used
to improve the lateral resolution of the 7D plenoptic datacube during post-processing.
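
A short Jones-calculus check (a sketch using the stated angles, with global phases ignored) confirms that a half-wave plate at 22.5° rotates the PBS-transmitted polarization to 45°:

```python
# Jones-calculus check of the proposed PBS + HWP front end (a sketch with the stated
# angles; global phases ignored).
import numpy as np

def hwp(theta):
    """Jones matrix of a half-wave plate with its fast axis at angle theta to the x-axis."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[c, s], [s, -c]])

e_in = np.array([1.0, 0.0])                  # x-polarized light transmitted by the PBS
e_out = hwp(np.deg2rad(22.5)) @ e_in
print(e_out)                                 # ~[0.707, 0.707], i.e. linear polarization at 45°
```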

The experimental results presented here demonstrate that SSVI can be used to derive the complete plenoptic function, 𝑃(𝑥, 𝑦, 𝑧, 𝜃, 𝜑, 𝜆, 𝑡), of
the light rays from a dynamic scene with information in the lateral, spectral, depth, and time domains. In light of its unprecedented 7D imaging
capability, we anticipate that SSVI will facilitate a wide range of applications in biological analysis and robotic vision.

Funding
National High Technology Research and Development Program of China (2015AA042401); National Science Foundation (NSF) CAREER Award (1652150).
Acknowledgement
The authors thank Faraz Arastu for useful discussions in the writing of this paper.

References
1. E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision," in Computational Models of Visual Processing (MIT, 1991), pp. 3–20.
2. L. Gao and L. V. Wang, "A review of snapshot multidimensional optical imaging: measuring photon tags in parallel," Phys. Rep. 616, 1–37 (2016).
3. G. Lu and B. Fei, "Medical hyperspectral imaging: a review," J. Biomed. Opt. 19, 010901 (2014).
4. F. D. van der Meer, H. M. A. van der Werff, F. J. A. van Ruitenbeek, C. A. Hecker, W. H. Bakker, M. F. Noomen, et al., "Multi- and hyperspectral geologic remote sensing: a review," Int. J. Appl. Earth Obs. Geoinf. 14, 112–128 (2012).
5. B. M. Nicolaï, K. Beullens, E. Bobelyn, A. Peirs, W. Saeys, K. I. Theron, et al., "Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: a review," Postharvest Biol. Technol. 46, 99–118 (2007).
6. A. A. Gowen, C. P. O'Donnell, P. J. Cullen, G. Downey, and J. M. Frias, "Hyperspectral imaging – an emerging process analytical tool for food quality and safety control," Trends Food Sci. Technol. 18, 590–598 (2007).
7. R. M. Levenson and J. R. Mansfield, "Multispectral imaging in biology and medicine: slices of life," Cytometry A 69, 748–758 (2006).
8. A. F. H. Goetz, G. Vane, J. E. Solomon, and B. N. Rock, "Imaging spectrometry for Earth remote sensing," Science 228, 1147–1153 (1985).
9. R. Prevedel, Y.-G. Yoon, M. Hoffmann, N. Pak, G. Wetzstein, S. Kato, et al., "Simultaneous whole-animal 3D imaging of neuronal activity using light-field microscopy," Nat. Methods 11, 727 (2014).
10. N. C. Pégard, H.-Y. Liu, N. Antipa, M. Gerlock, H. Adesnik, and L. Waller, "Compressive light-field microscopy for 3D neural activity recording," Optica 3, 517–524 (2016).
11. N. Bedard, T. Shope, A. Hoberman, M. A. Haralam, N. Shaikh, J. Kovačević, et al., "Light field otoscope design for 3D in vivo imaging of the middle ear," Biomed. Opt. Express 8, 260–272 (2017).
12. S. Zhu, P. Jin, R. Liang, and L. Gao, "Optical design and development of a snapshot light-field laryngoscope," Opt. Eng. 57(5) (2018).
13. R. Ng, "Digital light field photography," Ph.D. dissertation (Stanford University, 2006).
14. R. Raghavendra, K. B. Raja, and C. Busch, "Presentation attack detection for face recognition using light field camera," IEEE Trans. Image Process. 24, 1060–1075 (2015).
15. K. Maeno, H. Nagahara, A. Shimada, and R. I. Taniguchi, "Light field distortion feature for transparent object recognition," in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2013), pp. 122–135.
16. J. Belden, T. T. Truscott, M. C. Axiak, and A. H. Techet, "Three dimensional synthetic aperture particle image velocimetry," Meas. Sci. Technol. 21, 125403 (2010).
17. K. Lynch, T. Fahringer, and B. Thurow, "Three-dimensional particle image velocimetry using a plenoptic camera," in 50th AIAA Aerospace Sciences Meeting (AIAA, 2012), pp. 1–14.
18. C. Li, G. S. Mitchell, J. Dutta, S. Ahn, R. M. Leahy, and S. R. Cherry, "A three-dimensional multispectral fluorescence optical tomography imaging system for small animals based on a conical mirror design," Opt. Express 17, 7571–7585 (2009).
19. W. Jahr, B. Schmid, C. Schmied, F. O. Fahrbach, and J. Huisken, "Hyperspectral light sheet microscopy," Nat. Commun. 6, 7990 (2015).
20. F. Morsdorf, C. Nichol, T. Malthus, and I. H. Woodhouse, "Assessing forest structural and physiological information content of multi-spectral LiDAR waveforms by radiative transfer modelling," Remote Sens. Environ. 113, 2152–2163 (2009).
21. A. Wallace, C. Nichol, and I. Woodhouse, "Recovery of forest canopy parameters by inversion of multispectral LiDAR data," Remote Sens. 4, 509 (2012).
22. A. D. Gleckler, A. Gelbart, and J. M. Bowden, "Multispectral and hyperspectral 3D imaging lidar based upon the multiple-slit streak tube imaging lidar," in Aerospace/Defense Sensing, Simulation, and Controls (SPIE, 2001).
23. A. Mansouri, A. Lathuiliere, F. S. Marzani, Y. Voisin, and P. Gouton, "Toward a 3D multispectral scanner: an application to multimedia," IEEE MultiMedia 14(1), 40–47 (2007).
24. P. Latorre-Carmona, E. Sánchez-Ortiga, X. Xiao, F. Pla, M. Martínez-Corral, H. Navarro, et al., "Multispectral integral imaging acquisition and processing using a monochrome camera and a liquid crystal tunable filter," Opt. Express 20, 25960–25969 (2012).
25. V. Farber, Y. Oiknine, I. August, and A. Stern, "Compressive 4D spectral-volumetric imaging," Opt. Lett. 41, 5174–5177 (2016).
26. L. Gao, et al., "Depth-resolved image mapping spectrometer (IMS) with structured illumination," Opt. Express 19, 17439–17452 (2011).
27. H. Rueda, et al., "Single aperture spectral+ToF compressive camera: toward hyperspectral+depth imagery," IEEE J. Sel. Top. Signal Process. 11, 992–1003 (2017).
28. P. Latorre-Carmona, F. Pla, A. Stern, I. Moon, and B. Javidi, "Three-dimensional imaging with multiple degrees of freedom using data fusion," Proc. IEEE 103, 1654–1671 (2015).
29. J. Wu, B. Xiong, X. Lin, J. He, J. Suo, and Q. Dai, "Snapshot hyperspectral volumetric microscopy," Sci. Rep. 6, 24624 (2016).
30. Y. Zhao, T. Yue, L. Chen, H. Wang, Z. Ma, D. J. Brady, et al., "Heterogeneous camera array for multispectral light field imaging," Opt. Express 25, 14008–14022 (2017).
31. Z. Xiong, "Snapshot hyperspectral light field imaging," 2017.
32. W. Feng, H. Rueda, C. Fu, G. R. Arce, W. He, and Q. Chen, "3D compressive spectral integral imaging," Opt. Express 24, 24859–24871 (2016).
33. M. W. Kudenov and E. L. Dereniak, "Compact snapshot birefringent imaging Fourier transform spectrometer," in Imaging Spectrometry XV (SPIE, 2010).
34. M. W. Kudenov and E. L. Dereniak, "Compact real-time birefringent imaging spectrometer," Opt. Express 20, 17973–17986 (2012).
35. S. Zhu, A. Lai, K. Eaton, P. Jin, and L. Gao, "On the fundamental comparison between unfocused and focused light field cameras," Appl. Opt. 57, A1–A11 (2018).
36. E. Y. Lam, "Computational photography with plenoptic camera and light field capture: tutorial," J. Opt. Soc. Am. A 32, 2021–2032 (2015).
37. A. Gerrard and J. M. Burch, Introduction to Matrix Methods in Optics (Dover, 1994).
38. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.
39. B. Arad and O. Ben-Shahar, "Sparse recovery of hyperspectral signal from natural RGB images," in European Conference on Computer Vision (Springer, 2016), pp. 19–34.
40. A. Vedaldi and K. Lenc, "MatConvNet: convolutional neural networks for MATLAB," in Proceedings of the 23rd ACM Conference on Multimedia (ACM, 2015), pp. 689–692.
41. R. C. Bolles, H. H. Baker, and D. H. Marimont, "Epipolar-plane image analysis: an approach to determining structure from motion," Int. J. Comput. Vis. 1, 7–55 (1987).
42. I. Tosic and K. Berkner, "Light field scale-depth space transform for dense depth estimation," in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2014), pp. 435–442.
43. L. Gao, N. Bedard, and I. Tosic, "Disparity-to-depth calibration in light field imaging," in Imaging and Applied Optics, OSA Technical Digest (Optical Society of America, 2016), paper CW3D.2.
44. S. Zhu, Y. Zhang, J. Lin, L. Zhao, Y. Shen, and P. Jin, "High resolution snapshot imaging spectrometer using a fusion algorithm based on grouping principal component analysis," Opt. Express 24, 24624–24640 (2016).
