
Methods of Data Analysis
Correlation functions, spectral estimation, filtering

Week 1

Motivation

Characterizing spatio-temporal properties of a signal. In the analysis of experimental signals, either scalar time series or spatio-temporal processes, one- and two-point statistics provide the basic description of the system that is usually sufficiently amenable to reliable estimation from data. Often (but as we will see shortly, not always), physically interesting phenomena including collective behaviors or phase transitions can be completely captured by a full characterization of statistical properties of up to second order. One-point statistics refer to the distribution of values that the single variables in the system take on. Two-point statistics refer to the correlation between pairs of variables: the propensity of one variable to fluctuate (deviate from its mean value) up or down in- or out-of-phase with another variable; this gives us an estimate of how coupled, or correlated, a process is across time and space, and can help us identify important temporal and spatial scales. Higher-order correlations usually refer to correlations between three or more variables.

Stationary signals. In case the system under study possesses certain nice statistical features (e.g., time or space translation invariance / stationarity), one can represent and analyze the correlation functions in Fourier (frequency) space, which is very convenient to work with. In this case, the system can be easily described in terms of power spectra, and methods of estimating these spectra from finite data are jointly known as spectral estimation techniques.

Signals, noise, filtering. Because experimental noise corrupts measurement signals and often has a spectral composition that differs from the real signal, working in the frequency domain makes it easier to separate the real signal from noise, using basic (linear) filtering.

Goals
- One-point histogram and its mean; two-point statistics, the covariance matrix of fluctuations.
- Conventions for covariance / correlations (connected correlations, normalized correlations).
- Auto-correlation function for a temporal (stationary) process; relation between the variance and the auto-correlation function.
- Power spectrum of fluctuations, Wiener-Khinchin theorem, the integral of the power spectrum; cross-power spectra and coherence (optional).
- Linear filter / convolution is multiplication of power spectra: low-pass, high-pass filtering.
- Estimating correlation functions and spectra from data; error bars from subsampling and estimation biases.

Data
1. Voltage trace from a single electrode in a microelectrode array that records extracellular activity of retinal ganglion cells. This is a (pre)amplified signal that contains a mixture of noise, background rumbling of cells that are a significant distance away from the electrode but nevertheless induce voltage fluctuations in it, and close-by cell(s) that, when they spike, cause a large, sharp echo of the action potential, or spike, on the electrode. Usually, these spikes are the signals of interest that need to be isolated. The data vector consists of 2×10⁶ voltage samples (in μV), sampled at 2×10⁴ Hz, for a total of 100 seconds of recording. Data is by Olivier Marre; details in Ref [1].

2. An ensemble of 45 calibrated grayscale natural images from Ref [2], with a resolution of 256×256 pixels.

Quantities, definitions, details


Suppose we have (i) a scalar signal x(t) (in reality, x will probably be sampled at discrete time points t_i, separated by Δ = t_{i+1} − t_i); (ii) samples from signals on a lattice, y_i^(t), where i indexes spatial location on a (possibly 2D) lattice, and t indexes the sample, t = 1, ..., T (t here is just an index; it does not necessarily imply temporal ordering). In preparation for thinking about our data, imagine x(t) to be a voltage trace coming from a recording electrode that monitors neural activity, and y_i^(t) to be the luminance value at location i in a patch of a natural image, indexed by t.

- One-point statistics: the distribution P(x), created by histogramming x and properly normalizing such that ∫ dx P(x) = 1 (this means you have to create a histogram with some appropriate choice of binning, and normalize it by the size of the bin and the total counts). Do not show raw histograms unless you're emphasizing absolute counts for some particular reason. Look at P(x) on a log scale and compare it to a Gaussian with matched mean, ⟨x⟩, and variance, σ_x², to assess whether the values of x deviate from the mean more than expected for a Gaussian (long tails), are sparser / closer to zero (kurtotic), or are asymmetric (skewed); these deviations can often be described by third- and fourth-order moments.

- Estimated error bars on the distribution P(x): construct multiple distributions from random halves of the data, and look at the resulting SD / SE of P(x). A minimal sketch of both estimates is given below.

- Covariance matrix C(t, t') = ⟨x(t)x(t')⟩ − ⟨x(t)⟩⟨x(t')⟩ = ⟨(x(t) − ⟨x(t)⟩)(x(t') − ⟨x(t')⟩)⟩; this is known as a connected correlation function, since it looks at the correlation of fluctuations, δx(t) = x(t) − ⟨x⟩, away from the mean. The averages are taken across many repeats of the same signal. For the spatial signal, C_ij = ⟨(y_i − ⟨y_i⟩)(y_j − ⟨y_j⟩)⟩. Note that the means of y for i ≠ j could be different (if the signal is not homogeneous across the lattice).

- Stationary signals are signals whose statistical structure does not depend on absolute time, e.g., C(t, t') = C(t − t', 0) = C(τ) = ⟨δx(t + τ)δx(t)⟩; for correlation, the only quantity that matters is the separation, τ, between the two samples. One can estimate C(τ) by averaging across time t. Stationary signals will have a band-diagonal covariance matrix!
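To make the histogram normalization and the error-bar recipe above concrete, here is a minimal sketch in Python/NumPy; the array x stands in for whatever signal is being analyzed, and all variable names are ours, not part of the assignment:

    import numpy as np

    x = np.random.randn(100_000)          # placeholder for the real data vector

    # One-point statistics: normalized histogram, i.e., an estimate of P(x).
    # density=True divides counts by (total count * bin width), so the
    # resulting histogram integrates to 1.
    counts, edges = np.histogram(x, bins=100, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Gaussian with matched mean and variance, for comparison on a log scale.
    mu, sigma = x.mean(), x.std()
    gauss = np.exp(-((centers - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

    # Crude error bars: re-estimate P(x) from random halves of the data and
    # look at the scatter of the estimates across repetitions.
    reps = []
    for _ in range(20):
        half = np.random.choice(x, size=len(x) // 2, replace=False)
        reps.append(np.histogram(half, bins=edges, density=True)[0])
    err = np.std(reps, axis=0)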

- Correlation coefficient, or normalized correlation, is often defined as ρ(τ) = C(τ)/σ_x². Note that at τ = 0 this is equal to 1. This normalization is analogous to the difference between covariance and correlation coefficient in the spatial version, i.e., C_ij = ⟨δy_i δy_j⟩ and ρ_ij = C_ij/(σ_i σ_j), where σ_i = ⟨(δy_i)²⟩^(1/2) is the SD of variable y_i. Correlation coefficients ρ_ij go between −1 and 1.
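For a stationary scalar signal, C(τ) and ρ(τ) can be estimated by averaging products of fluctuations over time; a minimal sketch (the helper function is our own, not prescribed by the text):

    import numpy as np

    def autocorr(x, max_lag):
        """Connected, normalized autocorrelation rho(tau) = C(tau)/C(0),
        estimated by averaging over time for a stationary signal."""
        dx = x - x.mean()                  # fluctuations around the mean
        var = np.mean(dx * dx)             # C(0) = variance
        rho = np.empty(max_lag + 1)
        for tau in range(max_lag + 1):
            # average over all pairs of samples separated by tau
            rho[tau] = np.mean(dx[tau:] * dx[:len(dx) - tau]) / var
        return rho                         # rho[0] == 1 by construction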

- Examine C(τ) by plotting it (check out also the logarithmic axis for C). We often speak of a characteristic timescale τ_c of a process if C(τ) ≈ A exp(−τ/τ_c), which will be a straight line on a semi-log-y plot. In this case the values at a given time are correlated with values in the past, but only looking back a finite amount of time. This is often a signature of some simple underlying dynamics that generates the fluctuations in x. Similarly, one could look for a typical correlation length in the case of signals on the lattice. What does it mean if there are multiple scales, or no clear scale?

- There is an important difference between intrinsic and extrinsic scales. The extrinsic scale is given by the scale of sampling (e.g., Δ, or the camera's sampling of the image that determines the physical lattice spacing of y_i); this has nothing to do with the signal itself, but with how we detect and represent it. The intrinsic timescale is the timescale on which the signal itself varies. This is important for determining how many independent samples we have in the data, which puts limits on our statistical power. Consider a signal with T temporal samples and an intrinsic correlation time τ_c: the number of independent samples is only of order TΔ/τ_c ...

- Fourier transform (FT) of a signal and its inverse: x(t) = (2π)⁻¹ ∫ dω x̃(ω) exp(iωt), x̃(ω) = ∫ dt x(t) exp(−iωt). There is a huge mess of existing notation and conventions (where the factors of 2π go, where the minuses in the exponents go, and whether the transformation variable is the angular frequency ω or the frequency f, where ω = 2πf). The average absolute value squared of the FT is called the power spectrum, S_x(ω) = ⟨|x̃(ω)|²⟩. The power spectrum tells us how much variance in the signal there is at each frequency, i.e., whether it is a fast (high power at high frequencies), slow (high power at low frequencies), or periodic (sharp peak(s) in the spectrum at characteristic oscillatory frequencies) signal. Wiener-Khinchin theorem: C(τ) = ∫ dω/(2π) S(ω) exp(iωτ), and thus σ² = C(0) = ∫ dω/(2π) S(ω). Parseval theorem: ∫ dt x²(t) = ∫ dω/(2π) |x̃(ω)|².

- For discretely sampled signals, x(t_i), a fast way to compute the (discrete) Fourier transform is the Fast Fourier Transform, or FFT, implemented by virtually all computational packages. There are even more confusing conventions there, and different software may do it differently. When you pass a signal as a vector to an FFT routine, pay attention to: (i) what the components of the returned vector are, i.e., where the zero-frequency component is, and how the different frequencies follow each other in the array; (ii) where the real and imaginary components are stored; (iii) what the normalization is (often there are factors of N, the length of the data vector, that float around). Time- and frequency-domain representations of discrete signals contain equal amounts of information and are invertible; the short numerical check below illustrates one set of conventions.
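A quick, hedged illustration of these conventions in Python/NumPy (NumPy returns the zero-frequency component first, followed by positive and then negative frequencies; the normalization |x̃|²/N chosen here is just one of several in use):

    import numpy as np

    x = np.random.randn(4096)          # stand-in for a real, discretely sampled signal
    x -= x.mean()                      # subtract the mean: the DC component becomes zero

    X = np.fft.fft(x)                  # f = 0 first, then positive, then negative frequencies
    f = np.fft.fftfreq(len(x), d=1.0)  # the frequency axis in exactly that order (here Delta = 1)
    S = np.abs(X) ** 2 / len(x)        # power spectrum; the 1/N is one normalization convention

    # Discrete Wiener-Khinchin: the inverse FFT of the power spectrum is
    # the (circular) autocorrelation function; at zero lag it is the variance.
    C = np.fft.ifft(S).real
    print(C[0], x.var())               # the two agree up to rounding error

    # Parseval: the mean of the power spectrum equals the variance (zero-mean x).
    print(S.mean(), x.var())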

- If real data is of length T (real-valued samples, sampled with a time bin Δ; 1/Δ is called the sampling rate), then the FFT will return T complex numbers x̃(f_n), representing the signal at equally spaced frequencies f_n = n/(TΔ), where n = −T/2, ..., T/2. Since the signal is real, one can show that x̃(−f) = x̃*(f), where * denotes complex conjugation. x̃(f = 0), the zero-frequency component, is related to the mean value of the signal, ⟨x(t)⟩, and is often called the DC component. If you do an FFT of a signal from which you have subtracted the mean, the zero-frequency component must be zero.

- Nyquist frequency and the sampling theorem. Note that the FFT returns frequency components with a maximum frequency of f_N = 1/(2Δ), called the Nyquist frequency, or one-half of the sampling frequency. Given the sampling frequency at which a continuous signal is sampled, variations in the signal that are faster than f_N cannot be resolved; higher frequencies get aliased back into the Nyquist range. On the other hand, if the original continuous (!) signal is band-limited, i.e., contains no power above f_N, then the Fourier representation is lossless (or complete), and the continuous signal (which nominally has infinitely many parameters) can be perfectly reconstructed from the discrete frequency components.

- A naive (but often sufficient) estimate of the power spectrum is obtained by chunking the whole signal into shorter segments, doing an FFT of each segment, taking the squared absolute value of the result to get the power spectrum of that segment, and averaging over all segments; see the sketch below. This procedure also generates an estimate of the error bars, as the scatter of the per-segment power spectra across segments. One needs to pay attention to the conventions to (i) plot the frequency axis correctly, and (ii) normalize the magnitude correctly. The problem with the naive estimate is that it has a positive sampling bias, and cutting the continuous time series into discrete chunks introduces various edge effects. There is a whole field dealing with how to correct for these biases. Check out Matlab's pwelch or periodogram, or the Chronux package, if interested. See also the explanation in Numerical Recipes in C.

- The FFT is a great tool for isolating and manipulating the effect of pairwise correlations in a signal. Note that the power spectrum contains the same information as the correlation function, which is a full characterization of the pairwise statistical structure. The Fourier transform is complex, and the power spectrum is fully determined by the magnitude (absolute value) of the FT values. The phase of the FT values, in contrast, has no influence on the power spectrum. Therefore, making the magnitudes of all Fourier components equal, but retaining the phase, will remove all pairwise correlations in the signal, or "whiten" the signal, while maintaining all higher-order correlations. In contrast, keeping the magnitudes of all Fourier components, but randomizing the phase, will remove all higher-order correlations, but keep the pairwise statistics intact. Phase can be very important for certain signals [3].
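A minimal version of the chunk-and-average spectrum estimate described above, as a sketch (a bare-bones cousin of Matlab's pwelch, without windowing or overlap; the function name and the per-Hz normalization are our own choices):

    import numpy as np

    def naive_power_spectrum(x, fs, n_seg=1024):
        """Chunk-and-average spectrum estimate: split x into non-overlapping
        segments of length n_seg, FFT each, average the squared magnitudes.
        Returns the frequency axis, the mean spectrum, and the standard
        error across segments as a crude error bar."""
        n_chunks = len(x) // n_seg
        segs = x[: n_chunks * n_seg].reshape(n_chunks, n_seg)
        segs = segs - segs.mean(axis=1, keepdims=True)      # remove per-segment DC
        spectra = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / (n_seg * fs)
        f = np.fft.rfftfreq(n_seg, d=1.0 / fs)              # 0 ... Nyquist = fs/2
        return f, spectra.mean(axis=0), spectra.std(axis=0) / np.sqrt(n_chunks)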
- Many useful operations on a signal can be thought of as linear filtering of the signal. Mathematically, this amounts to a convolution operation: in X(t) = ∫ dt' F(t') x(t − t'), x(t) is the original signal, X(t) is the filtered signal, and F(t') is a linear filter. The signals are usually taken to be long compared to the length of the filter F, which is non-zero only for a short range of values, so that one can restrict the integral to the range [0, K], where K is the length of the filter. To imagine what this filtering does, see which values of the original signal contribute to the value of X at time t: the convolution instructs you to take all values of the original signal from now, up to K in the past, weight each one with F, and sum everything up. So if F(t') = δ(t') (a Dirac delta function), the filtering operation is the identity; if F(t') is constant, the filtering is taking an average over K past time bins of the signal; if F is biphasic, it is taking a derivative of the signal. Many useful operations can be represented by linear filtering. On real data, the convolution operation clearly needs to be carried out on discrete signals.
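The three example filters above translate directly into discrete convolutions; a quick sketch using np.convolve (signal and filter names are ours):

    import numpy as np

    x = np.random.randn(1000)                 # a discrete signal

    delta = np.array([1.0])                   # identity: X(t) = x(t)
    boxcar = np.ones(5) / 5.0                 # running average over K = 5 past bins
    biphasic = np.array([1.0, -1.0])          # discrete derivative (first difference)

    # np.convolve implements X[t] = sum_k F[k] * x[t - k]; keeping the first
    # len(x) outputs gives the causal filtering described above (the first
    # len(F) - 1 values mix in implicit zeros before the signal starts).
    X_avg = np.convolve(x, boxcar, mode="full")[: len(x)]
    X_diff = np.convolve(x, biphasic, mode="full")[: len(x)]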

Study literature
Numerical Recipes in C: Fourier transform (chapter 12.1), Convolution and deconvolution (chapter 13.1), and optionally Wiener filtering (chapter 13.3).

Homework
1. What is the power spectrum of a process whose correlation function is exponential, i.e., C(τ) = A exp(−|τ|/τ_c)? What signal x(t) has a spectrum that is purely constant, S(ω) = c (also called white noise), for (formally) an infinite range of frequencies? If you know how, compute this problem analytically; otherwise do it numerically. In practice, clearly, there is a frequency cutoff, given ultimately by sampling (the Nyquist frequency).

2. Plot the voltage signal x(t) from the microelectrode array and visually examine it. Spikes are very fast downward voltage excursions (sometimes reaching −100 μV), followed by a small overshoot (zoom in to a few spikes to see how they typically look). They are riding on top of a fast noisy oscillation and a slow drift in the average voltage.

3. To get some sense of the signal, plot the probability distribution function (properly normalized) of x(t). Is there any obvious feature for negative voltages where you could draw a threshold easily?

4. To identify the spikes, you can set a threshold. Try two thresholds, −50 μV and −30 μV; whenever the signal crosses the threshold in a downward direction, identify a putative spike. By plotting the voltage trace for 40 s < t < 45 s and denoting on the plot the threshold crossings, show that the more stringent threshold misses some spikes, while the more permissive threshold identifies as spikes a few fluctuations in the baseline.

5. Drawing a simple threshold thus does not look like a good idea. Instead, we should filter out the slow fluctuations in the baseline and be left only with the high frequencies that constitute a fast spike. Filter the signal with a high-pass filter with a cutoff frequency of f_0 = 80 Hz. A basic, single-pole filter will have a power spectrum F(f) = f²/(f² + f_0²). Plot this spectrum as a function of frequency on a log-log scale and interpret what it does to frequencies above and below f_0; make sure you understand why this is a high-pass filter. To use this filter on our data: (i) take an FFT of the signal; (ii) multiply every frequency component f of the signal by the value of √F(f) at that frequency (remember, filtering multiplies power spectra; since we are multiplying the FT and not its square, we need to multiply by the square root of the filter; also be very careful about what the frequencies of the components returned by the FFT routine are); (iii) use the inverse Fourier transform, IFFT, to obtain the filtered signal x̂(t). Check that the returned signal is real (otherwise you have a mistake somewhere). Plot the filtered signal and check that the slow baseline fluctuation has been removed. A sketch of this recipe follows below.
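A minimal, hedged sketch of the filtering recipe in item 5, in Python/NumPy (the function name is ours, and we assume the voltage trace has been loaded into an array v with sampling rate fs = 2×10⁴ Hz):

    import numpy as np

    def highpass(x, fs, f0=80.0):
        """Single-pole high-pass filter, applied in the frequency domain.
        The filter's power spectrum is F(f) = f^2 / (f^2 + f0^2), so the
        FT of the signal gets multiplied by sqrt(F(f))."""
        X = np.fft.fft(x)
        f = np.fft.fftfreq(len(x), d=1.0 / fs)   # frequencies in FFT output order
        X *= np.sqrt(f**2 / (f**2 + f0**2))      # sqrt: we multiply the FT, not its square
        x_filt = np.fft.ifft(X)
        # sqrt(F) is even in f, so the result is real up to rounding error
        return x_filt.real

    # v_hp = highpass(v, fs=2e4, f0=80.0)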

6. Plot the probability distribution function (properly normalized) of the filtered version, x̂(t), on the same plot where you showed the raw signal, x(t). Describe what has happened to the bulk and the tail of the distribution. Is there an obvious choice of threshold for detecting spikes now?

7. Use this threshold to find the spike times for all spikes (there should be about N_s = 1.2×10³ spikes in the data). To get a good look at how the spike appears in the signal, extract all spikes: make a matrix of dimension N_s × 201, where each row in the matrix is the filtered signal, x̂(t_i − 100 : t_i + 100), that is, a signal snippet extending 100 samples before and 100 samples after the identified spike time t_i, i = 1, ..., N_s. Plot the mean spike waveform, and put physical units for time on the x-axis. What is the approximate duration of the spike waveform?

8. Estimate the distribution function (properly normalized to make it a pdf, the probability density) of light luminance levels across the Ruderman image dataset. Compare this on the same plot to a Gaussian distribution of matched mean and variance; how does the natural signal deviate from the Gaussian? Repeat the same for log luminance levels. To be able to interpret the results, after taking the log of the luminance, subtract the mean value of log luminance across the image ensemble, and normalize by the SD of the log luminance. Compare this distribution to the standard Gaussian with zero mean and unit variance, on the same plot. Estimate and plot the error bars on this distribution and explain briefly how you computed them.

9. Compute the pairwise correlation function of log intensity in natural images, and plot it for the range from 1 to 100 pixels, with error bars. Assume the images are homogeneous and isotropic, that is, that the correlation only depends on the separation between two pixels. You can optionally check how well this isotropy assumption holds, or explain how you would check it (do you expect it to hold, based on the salient properties of natural images?). Is there a characteristic spatial scale in this correlation function? Do you expect there should be one? Why or why not?

10. The simplest model of the transformation of signals carried out by the retina is that each retinal ganglion cell performs linear filtering on the original image. The linear filter is often described as a difference-of-Gaussians, or "mexican hat", i.e., F(x, y) = exp(−(x² + y²)/2σ_c²) − A exp(−(x² + y²)/2σ_s²), where σ_c is the radius of the central Gaussian and σ_s is the radius of the surround Gaussian. The constant A is chosen so that the filter is balanced, i.e., so that the area under the filter is 0. Create such a filter, represent it in a discretized version as a matrix, for σ_c = 1 and σ_s = 3 pixels, and plot it. Convolve the first image of the dataset with this filter, and show the result so that it is clear which filtered values are positive and which negative. What does the filter do to areas of uniform luminance? Perceptually, what does this filtering do to the image?

11. In Fourier space, the power spectrum of the filtered image is the product of the power spectrum of the original image and the power spectrum of the filter. Show the 2D power spectrum of the filter as a function of spatial frequency k = (k_x, k_y), and average over all directions to show it as a function of |k|. Similarly, show the power spectrum of natural images, computed from the Ruderman dataset. Qualitatively, what does filtering in the eye do to the spectrum of natural images?

12. Optional. Are pairwise correlations in natural images sufficient for perceptual salience? Take the first image from the dataset and construct two new images from it: a whitened image, and a phase-scrambled image. For the whitened image, do a 2D Fourier transform of the original image, normalize the absolute values of each of the Fourier components to the same value, and transform the image back into the real domain. What is the correlation function of the result? Show the original and the transformed image: is the transformed image still recognizable? To create a phase-scrambled image, again Fourier transform the original image. Keep the amplitudes of the Fourier components untouched, but assign each component a random phase (recall: each component is a complex number z = x + iy = Ae^{iφ}, where φ is the phase, and A = √(x² + y²) is the amplitude). When you scramble phases, you need to be careful, since not all phases in the signal are independent of each other: because the original image is real, the phases of certain components are tied to one another ... (one way of handling this in practice is sketched below). After phase-scrambling, transform the image back (as a check, you must obtain a real-valued image!). Is the image still recognizable? What does this imply for perceptual salience: is our visual system more sensitive to second-order correlations (Fourier amplitude), or to higher-order correlations (Fourier phase)? See also Ref [3].
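One way to handle the phase constraint mentioned in item 12: instead of drawing raw random phases and enforcing conjugate symmetry by hand, take the phases from the 2D FFT of a random real array, which satisfies the symmetry automatically. A hedged sketch, under that assumption (function names are ours):

    import numpy as np

    def phase_scramble(img):
        """Keep the Fourier amplitudes of a real image, randomize the phases.
        The random phases are taken from the FFT of a random real array, so
        they automatically obey the conjugate symmetry a real image requires;
        the inverse transform is then real up to rounding error."""
        F = np.fft.fft2(img)
        random_phase = np.angle(np.fft.fft2(np.random.rand(*img.shape)))
        scrambled = np.abs(F) * np.exp(1j * random_phase)
        return np.fft.ifft2(scrambled).real

    def whiten(img):
        """Set all Fourier amplitudes to a constant, keep the phases."""
        F = np.fft.fft2(img)
        return np.fft.ifft2(np.exp(1j * np.angle(F))).real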

References
[1] Marre O, Amodei D, Deshmukh N, Sadeghi K, Soo F, Holy TE, Berry MJ II (2012) Mapping a complete neural population in the retina. J Neurosci 32: 14859.
[2] Ruderman DL, Bialek W (1994) Statistics of natural images: Scaling in the woods. Phys Rev Lett 73: 814.
[3] Oppenheim AV, Lim JS (1981) The importance of phase in signals. Proceedings of the IEEE 69: 529.
