

Applications of Functional PCA to Sparse and Irregular Functional Data


Daniel Conn
Department of Biostatistics UCLA School of Public Health


Outline

A Quick Review of FDA

Functional PCA

An Example of Functional PCA Applied to Sparse Data


What is Functional Data?


Each observation consists of one function or curve, so our dataset consists of n functional observations x_1(t), x_2(t), ..., x_n(t). For the sake of simplicity, all of the functions in this talk will be indexed by time, t.
Well, to be precise, our raw data still comes in the form of

discrete data sets; the methods of functional data analysis assume we have functional observations (more on this later).
Each functional observation is usually assumed to be

continuous and differentiable.


You could view functional data as continuous extensions of

longitudinal data.


Some Examples of Functional Data


You see a lot of weather examples in functional data

analysis (FDA). One might be interested in seeing how the temperature changes over the course of a year in various geographical locations. Similarly, you might also record precipitation levels over time (Ramsay, 2006).
Growth data is another big topic. How does a child's height

or weight change with time? (Ramsay, 2006)


There are less obvious applications of functional data

analysis. These highlight the idea that t need not stand for time. One could examine how the probability of answering a test question correctly varies with IQ (or some other latent variable). Each test question gives rise to a different functional observation (Ramsay, 2006).


How Can We Take Advantage of Functional Data?


Longitudinal data analysis already allows us to take into

account how measurements on an individual change over time. What do we gain by assuming these measurements to be part of smooth trajectories over time and analyzing the data as such?
Conceptually, if you really believe that measurements are

coming from smooth functions, you might want your statistical analysis to reflect this fact.
Functional data analysis offers an intuitively natural

approach to analyzing highly correlated data. Smoothness can be thought of as saying that, as you look at two separate observations taken closer and closer together in time, one observation perfectly determines the other (Ferraty and Vieu, 2006).


Taking Advantage of Functional Data Continued


At a more practical level, many natural and intuitive

questions can only be asked once you assume you can talk about things like continuity and differentiation.
For example, in our weather data, we would expect

temperature and precipitation to exhibit some sort of cyclical or sinusoidal pattern over the course of the year.
It's natural to explore the extent to which these weather

patterns truly exhibit sinusoidal behavior.


If a function f(t) = c_1 + c_2 \sin(\omega(t - r)) truly is sinusoidal, it satisfies the following differential equation: f''(t) = -\omega^2 (f(t) - c_1). We could answer our question by estimating f and f'' and seeing how linearly related they are (Ramsay, 2006).
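As a rough numerical sketch of this check (my own illustration, not from the talk; the curve, grid, and constants are made up), one can estimate f'' by finite differences and regress it on f - c_1; for a true sinusoid the slope should be close to -omega^2.

```python
import numpy as np

# Hypothetical sinusoid f(t) = c1 + c2*sin(omega*(t - r)) sampled on a fine grid.
c1, c2, omega, r = 2.0, 1.5, 2 * np.pi, 0.1
t = np.linspace(0, 1, 401)
f = c1 + c2 * np.sin(omega * (t - r))

# Estimate the second derivative by repeated finite differencing.
f2 = np.gradient(np.gradient(f, t), t)

# If f is truly sinusoidal, f'' = -omega^2 * (f - c1): the slope of f'' on (f - c1)
# should be close to -omega^2.
slope = np.polyfit(f - c1, f2, 1)[0]
print(slope, -omega**2)
```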


The Mean Function

\bar{x}(t) = n^{-1} \sum_{i=1}^{n} x_i(t)

is the mean curve amongst a population of curves (Ramsay, 2006).

In practice, estimating a reasonable mean curve often involves more than naive application of the above formula. For example, if we have a lot of measurements on people with very high curves and few measurements on people with low curves, \bar{x}(t) will be skewed too high.
The mean curve might not be representative of any of the

individual curves.
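As a quick sketch (mine, not from the slides): if each curve has already been evaluated on a common time grid, the naive mean curve is just a pointwise average. The array names and simulated data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
t_grid = np.linspace(0, 1, 50)                                            # common grid of T = 50 time points
curves = np.sin(2 * np.pi * t_grid) + rng.normal(0, 0.3, size=(100, 50))  # n = 100 curves, one per row

mean_curve = curves.mean(axis=0)                                          # naive pointwise mean, shape (50,)
```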


Covariance Function

The covariance surface is defined as follows:

cov(t_1, t_2) = (n - 1)^{-1} \sum_{i=1}^{n} (x_i(t_1) - \bar{x}(t_1))(x_i(t_2) - \bar{x}(t_2))

(Ramsay, 2006).

cov(t_1, t_2) is just the covariance between the values of a functional observation at time points t_1 and t_2.


Reasonable estimation of the covariance surface depends

on reasonable estimation of the mean curve.
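Continuing the same hypothetical setup, the sample covariance surface on a common grid is just the empirical covariance of the gridded curves:

```python
import numpy as np

# Assuming `curves` has shape (n, T) with rows x_i evaluated on t_grid.
centered = curves - curves.mean(axis=0)
cov_surface = centered.T @ centered / (curves.shape[0] - 1)   # T x T covariance surface
# Equivalently: cov_surface = np.cov(curves, rowvar=False)
```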


Simple Functional Regression

E(Y(t) | X) = \alpha(t) + \int X(s) \beta(s, t) \, ds

First note that both s and t represent time, possibly on different time-scales: s is the time-scale for the predictor curve; t is the time-scale for the response curve. For our weather example they are the same; in general, they can be different.
The effect of the predictor curve on the response curve is

summarized by a bivariate function β(s, t).


If t and s are simultaneous time-scales, β(s, t) = 0 for s > t (the future of the predictor curve cannot affect the current value of the response).


Generalized Functional Regression

E(Y_i | X_i) = g^{-1}\left( \alpha + \int \beta(t) X_i(t) \, dt \right)

Y_i is a scalar: we're relating curves to scalars. Note the scalar response and functional predictor variable. Again, when making sense of the functional model it helps

to convert integrals back to summations.


g is the link function. The linear predictor is now in the form of an integral. The Y_i are assumed to follow an appropriate probability

distribution.


Practicalities
This whole time we've been talking about how to analyze

smooth curves but, in reality, we never actually get to observe continuous functions through time.
In reality, we only get measurements on individuals at

a discrete number of time points.


This is where smoothing techniques come into the picture.

Before analyzing our data set of smooth functions x_1(t), x_2(t), ..., x_n(t), we need estimates of these curves.
Curse of dimensionality: functional data is infinite-dimensional. Functional PCA is one dimension-reduction method based on multivariate PCA.


How Does One Estimate a Smooth Curve From Noisy Data?


We don't observe smooth curves. We get n_i observations, (t_{ij}, y_{ij}), on the ith individual. We need to use these observations to estimate x_i(t).

Our general method consists of three steps.

Step 1: assume that the function x_i(t) takes the form

x_i(t) = \sum_{n=1}^{\infty} c_n b_n(t).

Steps 2/3: find a number K such that \sum_{n=1}^{K} c_n b_n(t) is acceptably close to x_i(t), and use the data to find estimates of c_n for n = 1, 2, ..., K. Our estimate of x_i(t) is \sum_{n=1}^{K} \hat{c}_n b_n(t).
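A sketch of this three-step recipe for a single curve (my own illustration; the B-spline knots, K, and simulated data are all made up, and it uses scipy's BSpline.design_matrix, available in recent SciPy versions): build a design matrix of basis functions evaluated at the observation times, then estimate the coefficients by least squares.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
t_obs = np.sort(rng.uniform(0, 1, 60))                        # observation times t_ij
y_obs = np.sin(2 * np.pi * t_obs) + rng.normal(0, 0.2, 60)    # noisy measurements y_ij

# Step 1/2: choose a cubic B-spline basis with K functions.
degree, n_interior = 3, 6
knots = np.concatenate(([0.0] * (degree + 1),
                        np.linspace(0, 1, n_interior + 2)[1:-1],
                        [1.0] * (degree + 1)))
basis = BSpline.design_matrix(t_obs, knots, degree).toarray() # shape (60, K)

# Step 3: estimate the coefficients c_n by least squares.
c_hat, *_ = np.linalg.lstsq(basis, y_obs, rcond=None)

# The estimated curve can now be evaluated anywhere on [0, 1].
x_hat = BSpline(knots, c_hat, degree)
x_grid = x_hat(np.linspace(0, 1, 200))
```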


Further Practicalities
In the above algorithm there are three places where we

have to make well-informed decisions.


First, we have to choose the b_n(t), the basis functions.

Different basis functions give rise to more or less satisfactory results. Different basis functions also have different computational consequences.
We have to choose a number K at which to truncate the

infinite series expansion of x_i(t). If K is too small we won't have a good approximation. If K is too big we'll be fitting noise.
We need a method of estimating the c_n. If we have enough data on each curve, we can estimate the c_n with simple methods like least squares.
Essentially, this presentation is about choosing a sensible

set of basis functions and estimating the coefficients.


Orthonormal Basis Functions


A set of basis functions b_1(t), b_2(t), ... is said to be orthonormal iff \int (b_i(t))^2 \, dt = 1 and \int b_i(t) b_j(t) \, dt = 0 for i \neq j.

If we have an infinite series x(t) = \sum_{n=1}^{\infty} c_n b_n(t) where the b_n(t) are orthonormal, then c_n = \int x(t) b_n(t) \, dt.

There are many different choices of orthonormal basis

functions: Fourier series, B-splines, wavelets, Hermite polynomials...


Fourier series are typically used to estimate periodic

functions. B-splines are very popular. Wavelets are newer and are popular for fitting non-smooth functions.
In functional data analysis, one is also interested in

estimating derivatives. This also has to be taken into account.
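A small numerical sketch (my own, not from the slides) of the coefficient formula c_n = \int x(t) b_n(t) \, dt, using the orthonormal Fourier basis on [0, 1] and the trapezoidal rule:

```python
import numpy as np

t = np.linspace(0, 1, 2001)
x = np.exp(-t) * np.sin(3 * np.pi * t)                 # an arbitrary example curve

def fourier_basis(t, n_pairs):
    """Orthonormal Fourier basis on [0, 1]: 1, sqrt(2)cos(2*pi*n*t), sqrt(2)sin(2*pi*n*t)."""
    cols = [np.ones_like(t)]
    for n in range(1, n_pairs + 1):
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * n * t))
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * n * t))
    return np.column_stack(cols)

B = fourier_basis(t, n_pairs=5)                        # K = 11 basis functions
coef = np.trapz(B * x[:, None], t, axis=0)             # c_n = integral of x(t) b_n(t) dt
x_approx = B @ coef                                    # truncated expansion of x(t)
```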


First Glimpse of Functional PCA


In step 1 of our three-step algorithm, we came to the data with

well-chosen basis functions in mind.


We can take a different tack. We can use the data in some

systematic fashion to choose a set of basis functions for us.


Functional principal components analysis does just this.

We can see functional principal components analysis as a method for estimating the optimal set of orthonormal basis functions with which to represent our data.
It's optimal in the sense that it lets us represent functional

data with a (hopefully) small number of basis functions without losing too much information.
In fact, regular multivariate principal components analysis

can also be seen in this light.


Multivariate Principal Components Analysis


Multivariate PCA is usually presented as a method of data

reduction which takes in a large set of correlated covariates (high-dimensional data) and spits out a new set of covariates.
Each new covariate is just a linear combination of the old covariates. The coefficients in these linear combinations can be seen as weight vectors of length one (the sum of the squared weights is one).
The new set of covariates is better than the old set for two

main reasons.
First of all, our new covariates are uncorrelated. Second, our new covariates can be ordered in such a way that the first new covariate explains the most variation in the data set, the second covariate explains the second most variation, and so on...


Optimality of Principal Components


Suppose we measure the total amount of information in a random vector X by the sum of the variances of its components: trace(Σ). Then we can measure the loss of information incurred by dimension reduction by comparing the amount of information in a new, reduced random vector to the amount of information in the original random vector.
Let Y be the reduced random vector. The information retained by using Y as a proxy for X is quantified by trace(Σ_Y)/trace(Σ_X). Thus, minimizing information loss is equivalent to maximizing trace(Σ_Y) subject to the previously mentioned constraints.
It's easy to show that the first k principal components maximize trace(Σ_Y); the principal components are obtained by taking the inner products of the first k eigenvectors of Σ with X, and trace(Σ_Y) = \sum_{i=1}^{k} \lambda_i.
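A compact sketch of this eigendecomposition view of multivariate PCA (not from the slides; the data matrix is simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # 200 observations, p = 10 correlated covariates

Sigma = np.cov(X, rowvar=False)                    # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)           # returned in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

k = 3
Y = (X - X.mean(axis=0)) @ eigvecs[:, :k]          # first k principal components (scores)
retained = eigvals[:k].sum() / eigvals.sum()       # trace(Sigma_Y) / trace(Sigma_X)
print(f"variance retained by {k} components: {retained:.2f}")
```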


Functional PCA

X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)

Just as in multivariate PCA, we can represent a sample

curve as a linear combination of eigenfunctions; this is called the Karhunen-Loève expansion.


If we truncate the expansion at k , we get the

k -dimensional representation of X (t) that minimizes the amount of information lost.


Data reduction is now mandatory, as the Karhunen-Loève expansion demonstrates that each functional observation is infinite-dimensional.


Optimality in Functional PCA


The functional analogue to the covariance matrix is the

covariance surface cov (s, t).


In multivariate PCA, we measured the total amount of

variability in X by Trace(Σ). What is Trace(cov(s, t))?


Recall the fact that Trace(Σ) = \sum_{i=1}^{n} \lambda_i, where the \lambda_i are the eigenvalues of Σ.

Every n x n matrix gives rise to a linear function on R^n. The covariance surface gives rise to a special type of linear function on L^2 called an integral operator; call this integral operator Γ. Γ maps a function x in L^2 to another function in L^2: Γ(x)(t) = \int x(s) \, cov(s, t) \, ds.
Integral operators have eigenfunctions and corresponding

eigenvalues: Γ(φ)(t) = λ φ(t).


We can define the total amount of variation in X(t) by summing up the eigenvalues.
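In practice one common approximation (a sketch of mine, assuming the `cov_surface` and `t_grid` arrays from the earlier snippets) is to discretize the covariance surface on an equally spaced grid and eigendecompose it, rescaling by the grid spacing so that the eigenvalues and eigenvectors approximate those of the integral operator:

```python
import numpy as np

dt = t_grid[1] - t_grid[0]                           # equally spaced grid assumed
evals, evecs = np.linalg.eigh(cov_surface * dt)      # matrix approximation of the integral operator
evals, evecs = evals[::-1], evecs[:, ::-1]

eigenfunctions = evecs / np.sqrt(dt)                 # rescaled so that sum(phi_k**2) * dt = 1
eigenvalues = evals
pct_explained = eigenvalues[:3] / eigenvalues.sum()  # variation explained by the first 3 eigenfunctions
```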


Optimality in Functional PCA

X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)

The eigenfunctions are the functional principal

components.
The functional principal components are an orthonormal

basis (now in L^2 instead of R^p).


If we truncate the expansion at k , we get the

k -dimensional representation of X (t) that minimizes the amount of information lost.


The sum of the first k eigenvalues over the sum of all eigenvalues can be interpreted as the percent of variation explained by the first k eigenfunctions.


Sparse Longitudinal Data


We are said to have sparse longitudinal data if we have

only a small number of measurements on each individual.


Although we might not have very much information on any

individual curve in the data set, we might have a large number of individuals in the study.
If we had a way of fitting smooth curves to such sparse

longitudinal data, we could extend the methods of functional data analysis to this sparse setting.
The question is: how do we fit smooth curves to sparse trajectories?
One approach for smoothing sparse longitudinal data is

given in a paper by Hans-Georg Muller called "Functional Modelling and Classification of Longitudinal Data".


Overall Game Plan in this Muller Paper

In a non-sparse setting we could estimate individual curves

via smoothing methods. With maybe only 4-5 measurements per individual, these methods won't work.
Even though we don't have a lot of information on each individual, we have information on a lot of different individuals. As a result, we should have enough information to estimate μ(t) and cov(s, t) by using standard univariate and bivariate smoothing methods.
It would be nice if we could use this population-level information to reconstruct individual curves.


Functional PCA Applied to Sparse Data

X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)

Going back to the Karhunen-Loève expansion, note that a sample curve is fully determined by the mean curve, eigenfunctions, and principal component scores.

\xi_{ik} = \int (X_i(t) - \mu(t)) \phi_k(t) \, dt

is the equation for the PCA scores.

In a non-sparse setting, we'd simply estimate the PCA

score with a Riemann sum.


Hopefully, by making further assumptions, we can estimate

principal component scores for individuals without needing to compute the above integral.


Estimating PCA Scores

U_{ij} = X_i(T_{ij}) + \epsilon_{ij} = \mu(T_{ij}) + \sum_{k=1}^{\infty} \xi_{ik} \phi_k(T_{ij}) + \epsilon_{ij}

\tilde{\xi}_{ik} = E[\xi_{ik} \mid U_i] = \lambda_k \phi_{ik}^T \Sigma_{U_i}^{-1} (U_i - \mu_i)

where \phi_{ik} = (\phi_k(T_{i1}), \ldots, \phi_k(T_{iN_i}))^T, \mu_i = (\mu(T_{i1}), \ldots, \mu(T_{iN_i}))^T, and \Sigma_{U_i} = cov(U_i).

N_i is the number of observations we have recorded for the ith person. T_{ij} is the jth measurement time on the ith individual. For fixed i the T_{ij} are i.i.d. U_{ij} is the measurement on the ith person taken at time T_{ij}.

Assume X is a Gaussian process and the \epsilon_{ij} are normal. If we make these assumptions, we can get estimates of the principal component scores for each individual.
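A minimal sketch of this conditional-expectation formula for one individual (my own illustration; it assumes estimates of μ, the φ_k, the λ_k, and the measurement-error variance σ² are already available on a grid, and all names are hypothetical):

```python
import numpy as np

def fpc_scores(T_i, U_i, t_grid, mu_hat, phi_hat, lam_hat, sigma2_hat):
    """Conditional-expectation estimates of the scores xi_ik for one individual.

    T_i: observation times; U_i: sparse measurements at those times;
    mu_hat: estimated mean on t_grid; phi_hat: (K, len(t_grid)) eigenfunctions;
    lam_hat: (K,) eigenvalues; sigma2_hat: measurement-error variance.
    """
    mu_i = np.interp(T_i, t_grid, mu_hat)
    phi_i = np.vstack([np.interp(T_i, t_grid, phi) for phi in phi_hat])   # (K, N_i)

    # Cov(U_i) = sum_k lam_k * phi_k(T_ij) * phi_k(T_il) + sigma2 * I
    Sigma_Ui = (phi_i.T * lam_hat) @ phi_i + sigma2_hat * np.eye(len(T_i))

    # xi_ik = lam_k * phi_ik^T Sigma_Ui^{-1} (U_i - mu_i)
    return lam_hat * (phi_i @ np.linalg.solve(Sigma_Ui, U_i - mu_i))
```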


Estimating Individual Trajectories

X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{ik} \phi_k(t)

\hat{X}_i^K(t) = \hat{\mu}(t) + \sum_{k=1}^{K} \hat{E}[\xi_{ik} \mid U_i] \hat{\phi}_k(t)

Once we have estimates of μ(t), φ_k(t), and ξ_{ik}, we can just plug them into the Karhunen-Loève expansion to get estimates of individual trajectories.
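Continuing the hypothetical sketch above, the fitted trajectory is just the truncated plug-in expansion:

```python
import numpy as np

def fitted_trajectory(scores, mu_hat, phi_hat):
    """Plug-in Karhunen-Loeve reconstruction on the grid where mu_hat and phi_hat live.

    scores: (K,) conditional-expectation score estimates;
    mu_hat: estimated mean curve; phi_hat: (K, len(grid)) eigenfunctions.
    """
    return mu_hat + scores @ phi_hat
```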


The Data
The data consists of initial bilirubin measurements taken

on 258 primary biliary cirrhosis patients. Primary biliary cirrhosis is a type of liver disease.
From these bilirubin measurements we wish to predict

whether or not a patient will survive for longer than 10 years (categorical outcome).
High bilirubin levels are generally a bad thing. We'd expect

patients with higher bilirubin measures to be less likely to survive for over 10 years.
The study population is restricted to only those patients

who survived longer than 910 days.


84 patients were short-lived (did not survive 10 years); 184

were long-lived.


Plots of Bilirubin Measurements


First Three Eigenfunctions


Deciding How Many Eigenfunctions to Retain

Muller uses leave-one-curve-out cross-validation to

minimize prediction error.


We first drop the measurements on the ith individual to obtain a new mean function, new eigenfunctions, and a new predicted trajectory for that individual: \hat{\mu}^{(-i)}, \hat{\phi}_k^{(-i)}, and \hat{X}_i^{(-i)}. We then choose the number of eigenfunctions to retain, K, so that K minimizes

\sum_{i=1}^{n} \sum_{j=1}^{N_i} \left( U_{ij} - \hat{X}_i^{(-i)}(T_{ij}) \right)^2.

Muller also uses analogues of AIC and BIC. Both methods suggested that the first two eigenfunctions

should be retained.


Predicted Bilirubin Curves and Probabilities of Surviving 10 Years


Plot of PCA Scores in Short-Lived and Long-Lived Groups


Estimates of the Beta Function and Model-Based Classification

The negative β estimates match what we saw in the plot of PCA scores. Short-lived patients in general seemed to be in the upper right corner (i.e., high PCA scores). The overall misclassification rate, based on leave-one-out prediction error, was 26.54%.


References
Horvath, Lajos, and Piotr Kokoszka. Inference for Functional Data with Applications. Springer Series in Statistics. New York: Springer, 2012.

Ramsay, J. O., and B. W. Silverman. Functional Data Analysis. New York: Springer, 2006.

Yao, Fang, Hans-Georg Muller, and Jane-Ling Wang (2005). Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association, 100(470), 577-590.

Ferraty, Frederic, and Philippe Vieu. Nonparametric Functional Data Analysis: Theory and Practice. New York: Springer, 2006.


Simple Functional Regression


Suppose we were interested in understanding the relationship between weather in Los Angeles and weather in San Diego. We'd expect some correlation between weather in the two regions, but how much? Are there more complex relationships to look for? Could it be that there is a time lag between weather in the two locations? High precipitation in LA could mean high precipitation in San Diego next week. We could go into the records and gather data on precipitation over multiple years in LA and SD. Each year's weather data would give rise to one pair of functional observations. We'd end up with n pairs of functional observations. We might be able to investigate the relationship between these curves with the technique of simple functional regression.


More Simple Functional Regression


E(Y(t) | X) = \alpha(t) + \int X(s) \beta(s, t) \, ds

Simple functional regression serves as a good illustration

of how classical methods generalize by simply converting sums to integrals.


First hold t constant at t_0. Then picture the integral as a summation over many values of s:

E(Y(t_0) | X) \approx \alpha(t_0) + \sum_{i=1}^{m} X(s_i) \beta(s_i, t_0)

So at each time point t we're really just doing a multiple regression with a continuum of covariates. Each point s gets its own covariate X(s).
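A toy sketch (not from the slides) of this sum-instead-of-integral view for one fixed t_0: on a grid of s values the functional term becomes an ordinary dot product between the predictor curve and the discretized β(·, t_0). Everything below is simulated for illustration, and in practice the estimate of β would be regularized.

```python
import numpy as np

rng = np.random.default_rng(3)
s_grid = np.linspace(0, 1, 50)
ds = s_grid[1] - s_grid[0]

# Simulated predictor curves X_i(s) and a "true" beta(s, t0) for one fixed t0.
X = rng.normal(size=(200, 50)).cumsum(axis=1) * np.sqrt(ds)
beta_t0 = np.sin(np.pi * s_grid)
alpha_t0 = 1.0
y_t0 = alpha_t0 + (X @ beta_t0) * ds + rng.normal(0, 0.1, 200)   # Riemann-sum version of the integral

# Recover beta(., t0) by least squares on the discretized design X * ds.
design = np.column_stack([np.ones(200), X * ds])
coef, *_ = np.linalg.lstsq(design, y_t0, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
```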


Generalized Functional Regression

In simple functional regression we were relating pairs of

functional observations to one another.


We might also want to relate functional observations to

scalar outcomes.
For example, we might want to relate a runner's velocity at the beginning of a race to his finishing time. Maybe good runners save their energy, or maybe good runners run fast the whole way. Maybe both strategies work and the effects of initial velocity will balance out.


Classification of Various Functional Data Models


Multivariate Principal Components Analysis Continued


Let Σ be the covariance matrix of a vector of random variables X = (X_1, X_2, ..., X_p).


One might measure the total amount of random variation in

the vector X by simply adding up the variances of the X_i. This is just Trace(Σ).
Suppose p is an impractically large number. Suppose we

wanted a k -dimensional representation of X where k is much smaller than p.


Call this k-dimensional summary Y. For simplicity, let's constrain Y so that each element of Y is a linear combination of the X_i and the coefficient vector in each linear combination has length 1.


Optimality of Principal Components


First off, Y is k -dimensional; X is p-dimensional. k is

supposed to be much less than p. By using Y as a proxy for X , it is clear that we are losing some amount of information.
Naturally, we want to calculate Y in such a way that we

lose less rather than more information.


The first step towards solving this problem is to precisely define what we mean by information loss. Recall that Trace(Σ) represents the amount of variation in a random vector. We can measure the amount of information loss by comparing the amount of variation in Y with the variation in X. In particular, we take the ratio of the two: Trace(Σ_Y)/Trace(Σ_X).


Optimality of Principal Components Continued


It is easy to show that the Y that minimizes information loss is given by Y = (\langle X, C_1 \rangle, ..., \langle X, C_k \rangle), where C_1, ..., C_k are the first k principal component directions (eigenvectors of Σ).

Let C be a matrix whose columns are all p principal component directions. Because these columns form an orthonormal basis, X = C C^T X.


It follows that X = \langle X, C_1 \rangle C_1 + ... + \langle X, C_p \rangle C_p. Thus, we see that when we carry out multivariate PCA to reduce the dimensionality of a data set, we are using the data to choose an orthonormal basis (the C_i) and coefficients (the \langle X, C_i \rangle). Then we are re-expressing our data in terms of the basis and the coefficients. We are using the data to choose a basis that minimizes the loss of information.


Functional PCA and Generalized Functional Regression


E[Y_i | X_i(t)] = g(\eta_i), \quad \eta_i = \alpha + \int \beta(t)(X_i(t) - \mu(t)) \, dt

If we choose a specific set of orthonormal basis functions \phi_k(t), then \eta_i = \alpha + \sum_{j=1}^{\infty} \beta_j \xi_{ij}, where \beta_j = \int \beta(t) \phi_j(t) \, dt and \xi_{ij} = \int (X_i(t) - \mu(t)) \phi_j(t) \, dt. We can approximate \eta_i by simply truncating the infinite sum at some number K depending on sample size and the particulars of the data. We then just have a regular GLM with predictors given by the \xi_{ik}. The parameters we're trying to estimate are the projections of \beta(t) onto the basis functions.
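A sketch of this truncated GLM (my own illustration, not the paper's code): fit a logistic regression of a binary outcome on the first K FPC scores, then assemble β̂(t) from the fitted coefficients and the eigenfunctions. The score matrix, outcome, and eigenfunctions below are simulated placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, K, T = 258, 2, 50
t_grid = np.linspace(0, 10, T)
phi_hat = np.vstack([np.ones(T) / np.sqrt(10),
                     np.sqrt(2 / 10) * np.cos(np.pi * t_grid / 10)])   # stand-in eigenfunctions
scores = rng.normal(size=(n, K))                                       # stand-in FPC scores xi_ik
eta = 0.5 - 1.2 * scores[:, 0] - 0.8 * scores[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))                            # stand-in binary outcome

# Regular GLM (logistic) with the FPC scores as predictors.
design = sm.add_constant(scores)
fit = sm.GLM(y, design, family=sm.families.Binomial()).fit()
alpha_hat, beta_coefs = fit.params[0], fit.params[1:]

# beta(t) is recovered by summing the fitted coefficients against the eigenfunctions.
beta_t = beta_coefs @ phi_hat
```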


Functional PCA and Generalized Functional Regression Continued


\tilde{\xi}_{ik} = E[\xi_{ik} \mid U_i] = \lambda_k \phi_{ik}^T \Sigma_{U_i}^{-1} (U_i - \mu_i)

E[\eta_i \mid U_i] = \alpha + \sum_{k=1}^{K} \beta_k \left[ \lambda_k \phi_{ik}^T \Sigma_{U_i}^{-1} (U_i - \mu_i) \right]

The \beta_k are the projections of \beta(t) onto the orthonormal basis functions \phi_k(t), so \beta(t) = \sum_j \beta_j \phi_j(t). We can estimate the \beta_j by using a regular GLM with predictors given by \tilde{\xi}_{ik} = \lambda_k \phi_{ik}^T \Sigma_{U_i}^{-1} (U_i - \mu_i). We can then estimate \beta(t): \hat{\beta}(t) = \sum_j \hat{\beta}_j \hat{\phi}_j(t).


Average Curves


Covariance Surface
