
NATIONAL CHENG KUNG UNIVERSITY
Principal Component Analysis
Final Paper in Financial Pricing
Tuan Anh
Sander Mgi
6/17/2009
Table of Contents
Chapter I Introduction
Chapter II Literature review
2.1 What is PCA
2.1.1 Definition of PCA
2.1.2 History of PCA
2.1.3 Basic assumptions
2.1.4 Important concepts
2.1.5 Calculating principal components
2.1.6 Deriving principal components
2.2 Advantages and disadvantages of PCA
2.2.1 Importance of PCA
2.2.2 Benefits of PCA
2.2.3 Limitations of PCA
2.3 Practical implications - Software
Chapter III Applications
Chapter IV Conclusions
List of Articles
References
Chapter I Introduction
When starting a research project, students and researchers often collect a lot of data or come across large datasets that are already available. With so much data, especially secondary data, it is easy to get confused, and it is hard to identify the variables that really matter for the research when there are so many to consider. This is where principal components analysis (PCA) can help.
Principal Components Analysis (PCA) was invented by Karl Pearson in 1901 and is now used in many fields of science. PCA is mostly used as a tool in exploratory data analysis, because what it essentially does is find the most important combinations of variables, those that explain most of the variance in the data. So, when there is a lot of data to be analyzed, PCA can make the task much easier. PCA also helps in constructing predictive models.
In this paper we focus on applications of PCA in finance research. Early applications of PCA in finance date back to the early 1970s, and PCA was still being used in many articles published in 2009. PCA is also often used in combination with other methods.
In Chapter II we first explain what PCA is and how it works, and we discuss its importance, advantages and limitations by reviewing the relevant literature. In Chapter III we continue the literature review with a focus on applications of PCA. Chapter IV concludes our overview of PCA.
So, once we have this big pile of data and have decided to use PCA to find the most important variables, what do we need to do next? We need to understand PCA and learn how to apply it, which is what the next chapter focuses on.
Chapter II Literature review

2.1 What is PCA
2.1.1 Definition of PCA

PCA, or Principal Component Analysis, is a statistical tool used to explore, sort and group data. PCA takes a large number of correlated (interrelated) variables and transforms them into a smaller number of uncorrelated variables (principal components) while retaining as much of the variation as possible, which makes the data easier to work with and to use for prediction. As Smith (2002) puts it, PCA is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns can be hard to find in high-dimensional data, where the luxury of graphical representation is not available, PCA is a powerful tool for analyzing data.
2.1.2 History of PCA

According to Jolliffe (2002), it is generally accepted that PCA was first described by Karl Pearson in 1901. In his article "On lines and planes of closest fit to systems of points in space", Pearson (1901) discusses the graphical representation of data and the lines that best represent the data. He concludes that "the best-fitting straight line to a system of points coincides in direction with the maximum axis of the correlation ellipsoid". He also states that the analysis used in his paper can be applied to multiple variables.
However, PCA was not widely used until computers were developed. It is not really feasible to do PCA by hand when the number of variables is greater than four, yet it is exactly for larger numbers of variables that PCA is most useful, so its full potential could not be realized until computers became widespread (Jolliffe, 2002).
According to Jolliffe (2002), significant contributions to the development of PCA were made by Hotelling (1933) and Girshick (1936; 1939) before interest in PCA expanded in the 1960s. As interest rose, important contributors were Anderson (1963), with a theoretical discussion; Rao (1964), with numerous new ideas concerning the uses, interpretations and extensions of PCA; Gower (1966), with a discussion of the links between PCA and other statistical techniques; and Jeffers (1967), with a practical application in two case studies.
2.1.3 Basic assumptions

According to Shlens (2009) there are three basic assumptions behind PCA that need to be
considered when calculating and interpreting principal components:
1) Linearity - Linearity frames the problem as a change of basis. Several areas of research have explored extending these notions to nonlinear regimes.
2) Large variances have important structure - This assumption also encompasses the belief that the data has a high signal-to-noise ratio (SNR). Hence, principal components with larger associated variances represent interesting structure, while those with lower variances represent noise. Note that this is a strong, and sometimes incorrect, assumption.
3) The principal components are orthogonal - This assumption provides an intuitive simplification that makes PCA soluble with linear algebra decomposition techniques.
2.1.4 Important concepts

- Principal component - a linear combination of the original variables (the 1st principal component explains most of the variation in the data, the 2nd PC explains most of the remaining variance, and so on)
- Eigenvectors - the coefficients of the original variables used to construct the factors (principal components)
- Eigenvalue - the scalar λ paired with each eigenvector v of a linear transformation A, such that Av = λv; in PCA, each eigenvalue of the covariance matrix is the variance of the corresponding principal component
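To make these three terms concrete, here is a minimal sketch in Python with NumPy (our own choice; the paper itself contains no code). The 2 × 2 covariance matrix is hypothetical and only serves to show how eigenvectors (coefficients) and eigenvalues (variances) come out of an eigen-decomposition.

```python
import numpy as np

# Hypothetical 2 x 2 covariance matrix of two correlated variables (illustrative numbers only)
cov = np.array([[4.0, 2.0],
                [2.0, 3.0]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]      # re-sort so that the 1st PC comes first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]      # column k holds the coefficients of PC k+1

for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    # Defining relation of an eigenpair: cov @ v equals eigenvalue * v
    assert np.allclose(cov @ v, eigenvalues[k] * v)
    print(f"PC{k + 1}: coefficients {np.round(v, 3)}, variance {eigenvalues[k]:.3f}")
```

The two eigenvalues, (7 ± √17)/2 ≈ 5.562 and 1.438, add up to the trace of the matrix (4 + 3 = 7), i.e. the total variance, so in this toy case the first principal component accounts for roughly 79% of it.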
2.1.5 Calculating principal components

Jolliffe (2002) states that principal components (PCs) can be found using purely mathematical arguments: they are given by an orthogonal linear transformation of a set of variables that optimizes a certain algebraic criterion.
Shlens (2009) gives an overview of how to perform principal components analysis:
1. Organize the data as an m × n matrix, where m is the number of measurement types and n is the number of samples
2. Subtract off the mean of each measurement type
3. Calculate the covariance matrix
4. Calculate the eigenvectors and eigenvalues of the covariance matrix
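The four steps above can be sketched in Python with NumPy as follows. This is our own illustration, not code from Shlens (2009); the random data and the function name `pca_shlens` are hypothetical, and we use the unbiased 1/(n − 1) normalization for the covariance matrix (a 1/n normalization would only rescale the eigenvalues, not change the eigenvectors).

```python
import numpy as np

def pca_shlens(X):
    """PCA following the four steps: X is m x n (m measurement types, n samples).

    Returns the eigenvalues in descending order and the matching eigenvectors as rows.
    """
    # Step 2: subtract off the mean of each measurement type (each row)
    Xc = X - X.mean(axis=1, keepdims=True)
    n = Xc.shape[1]
    # Step 3: covariance matrix (m x m), unbiased 1/(n-1) normalization
    C = Xc @ Xc.T / (n - 1)
    # Step 4: eigenvalues and eigenvectors of the symmetric covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]          # largest variance first
    return eigenvalues[order], eigenvectors[:, order].T

# Step 1: organize the data as an m x n matrix (random numbers, purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 500))
X[1] += 0.8 * X[0]          # make the second measurement correlated with the first

eigenvalues, P = pca_shlens(X)
print(eigenvalues)           # variance along each principal component, largest first
```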
2.1.6 Deriving principal components

The following is a standard derivation of principal components presented by Jolliffe (2002).
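To derive the form of the PCs, consider first $\alpha_1^{T}x$, where $x$ is a vector of $p$ random variables with covariance matrix $\Sigma$; the vector $\alpha_1$ maximizes $\operatorname{var}[\alpha_1^{T}x] = \alpha_1^{T}\Sigma\alpha_1$. It is clear that, as it stands, the maximum will not be achieved for finite $\alpha_1$, so a normalization constraint must be imposed. The constraint used in the derivation is $\alpha_1^{T}\alpha_1 = 1$, that is, the sum of squares of the elements of $\alpha_1$ equals 1. Other constraints may be more useful in other circumstances, and can easily be substituted later on. However, the use of constraints other than $\alpha_1^{T}\alpha_1 = \text{constant}$ in the derivation leads to a more difficult optimization problem, and it will produce a set of derived variables different from the principal components.
To maximize $\alpha_1^{T}\Sigma\alpha_1$ subject to $\alpha_1^{T}\alpha_1 = 1$, the standard approach is to use the technique of Lagrange multipliers. Maximize
$$\alpha_1^{T}\Sigma\alpha_1 - \lambda(\alpha_1^{T}\alpha_1 - 1),$$
where $\lambda$ is a Lagrange multiplier. Differentiation with respect to $\alpha_1$ gives
$$\Sigma\alpha_1 - \lambda\alpha_1 = 0 \quad\text{or}\quad (\Sigma - \lambda I_p)\alpha_1 = 0,$$
where $I_p$ is the $(p \times p)$ identity matrix. Thus, $\lambda$ is an eigenvalue of $\Sigma$ and $\alpha_1$ is the corresponding eigenvector. To decide which of the $p$ eigenvectors gives $\alpha_1^{T}x$ with maximum variance, note that the quantity to be maximized is
$$\alpha_1^{T}\Sigma\alpha_1 = \alpha_1^{T}\lambda\alpha_1 = \lambda\alpha_1^{T}\alpha_1 = \lambda,$$
so $\lambda$ must be as large as possible. Thus, $\alpha_1$ is the eigenvector corresponding to the largest eigenvalue of $\Sigma$, and $\operatorname{var}(\alpha_1^{T}x) = \alpha_1^{T}\Sigma\alpha_1 = \lambda_1$, the largest eigenvalue.
In general, the $k$th PC of $x$ is $\alpha_k^{T}x$ and $\operatorname{var}(\alpha_k^{T}x) = \lambda_k$, where $\lambda_k$ is the $k$th largest eigenvalue of $\Sigma$, and $\alpha_k$ is the corresponding eigenvector.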
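As a numerical sanity check of Jolliffe's argument, the sketch below (Python with NumPy, our own illustration with a randomly generated covariance matrix Σ) verifies that the top eigenvector α1 satisfies the Lagrange condition (Σ − λ1 I)α1 = 0, that its variance equals λ1, and that no randomly drawn unit vector achieves a larger variance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative covariance matrix Sigma, built as A A^T so it is symmetric positive semi-definite
A = rng.normal(size=(4, 4))
Sigma = A @ A.T

eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
lam1 = eigenvalues[-1]           # eigh sorts ascending, so the last one is the largest
alpha1 = eigenvectors[:, -1]     # unit-length eigenvector belonging to lambda_1

# The variance of alpha^T x is alpha^T Sigma alpha; for alpha1 it equals lambda_1
print(alpha1 @ Sigma @ alpha1, lam1)

# Random unit vectors (satisfying alpha^T alpha = 1) never give a larger variance
trials = rng.normal(size=(10_000, 4))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
variances = np.einsum("ij,jk,ik->i", trials, Sigma, trials)   # alpha_i^T Sigma alpha_i per row
print(variances.max() <= lam1 + 1e-9)
```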

Shlens (2009) derives an algebraic solution to PCA based on an important property of eigenvector decomposition. Once again, the data set is $X$, an $m \times n$ matrix, where $m$ is the number of measurement types and $n$ is the number of samples, and each row of $X$ has zero mean. The goal is summarized as follows:
Find some orthonormal matrix $P$ in $Y = PX$ such that $C_Y \equiv \frac{1}{n}YY^{T}$ is a diagonal matrix. The rows of $P$ are the principal components of $X$.
He begins by rewriting $C_Y$ in terms of the unknown variable:
$$C_Y = \frac{1}{n}YY^{T} = \frac{1}{n}(PX)(PX)^{T} = \frac{1}{n}PXX^{T}P^{T} = P\left(\frac{1}{n}XX^{T}\right)P^{T} = PC_XP^{T}.$$
Note that he has identified the covariance matrix of $X$, $C_X = \frac{1}{n}XX^{T}$, in the last step.
The plan is to recognize that any symmetric matrix $A$ is diagonalized by an orthogonal matrix of its eigenvectors. For a symmetric matrix $A$, $A = EDE^{T}$, where $D$ is a diagonal matrix and $E$ is a matrix of eigenvectors of $A$ arranged as columns.
Now comes the trick. He selects the matrix $P$ to be a matrix where each row $p_i$ is an eigenvector of $\frac{1}{n}XX^{T}$. By this selection, $P \equiv E^{T}$, so that $C_X = EDE^{T} = P^{T}DP$. With this relation and the fact that the inverse of an orthogonal matrix is its transpose ($P^{-1} = P^{T}$), we can finish evaluating $C_Y$:
$$C_Y = PC_XP^{T} = P(P^{T}DP)P^{T} = (PP^{T})D(PP^{T}) = (PP^{-1})D(PP^{-1}) = D.$$
It is evident that the choice of $P$ diagonalizes $C_Y$. This was the goal for PCA. We can summarize the results of PCA in the matrices $P$ and $C_Y$:
- The principal components of $X$ are the eigenvectors of $C_X = \frac{1}{n}XX^{T}$.
- The $i$th diagonal value of $C_Y$ is the variance of $X$ along $p_i$.
In practice, computing the PCA of a data set $X$ entails (1) subtracting off the mean of each measurement type and (2) computing the eigenvectors of $C_X$.
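The diagonalization argument can be checked numerically. In this sketch (Python with NumPy, our own illustration on simulated data), the rows of P are the eigenvectors of C_X, and the covariance of Y = PX turns out diagonal with the eigenvalues on its diagonal. We assume a 1/n normalization; 1/(n − 1) would only rescale the diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative m x n data: 3 correlated measurement types, 1000 samples
L = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.5, 0.0],
              [0.4, 0.3, 0.2]])
X = L @ rng.normal(size=(3, 1000))
X = X - X.mean(axis=1, keepdims=True)   # (1) subtract the mean of each measurement type
n = X.shape[1]

C_X = X @ X.T / n                        # covariance matrix of X (1/n normalization)
eigenvalues, E = np.linalg.eigh(C_X)     # C_X = E D E^T, eigenvalues in ascending order
P = E.T                                  # (2) each row of P is an eigenvector of C_X

Y = P @ X
C_Y = Y @ Y.T / n
print(np.round(C_Y, 6))                  # off-diagonal entries are ~0; diagonal = eigenvalues
```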
2.2 Advantages and disadvantages of PCA
2.2.1 Importance of PCA

Principal component analysis (PCA) is a standard tool in modern data analysis - in diverse fields
from neuroscience to computer graphics - because it is a simple, non-parametric method for
extracting relevant information from confusing data sets. With minimal effort PCA provides a
roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes
hidden, simplified structures that often underlie it. (Shlens, 2009)
The importance of PCA is shown by its use in so many different fields of science and life. PCA is widely used in neuroscience, for example. Other fields of use are pattern recognition and image compression, which makes PCA suitable for facial recognition software, as well as for recognizing and storing other biometric data. Many IT-related fields also use PCA, including artificial intelligence research. According to Jolliffe (2002), PCA is also used in research in agriculture, biology, chemistry, climatology, demography, ecology, food research, genetics, geology, meteorology, oceanography, psychology, quality control, etc. In this paper, however, we focus on uses in finance and economics.
PCA has been used in economics and finance to study changes in stock markets, commodity markets, economic growth, exchange rates, etc. The earliest studies were in economics, but stock markets were already being studied in the 1960s. Lessard (1973) states that principal component or factor analysis have been used in several recent empirical studies (Farrar [1962], King [1967], and Feeney and Hester [1967]) concerned with the existence of general movements in the returns from common stocks. PCA has mostly been used to compare different stock markets in search of diversification opportunities, especially in earlier studies such as those by Makridakis (1974) and Philippatos et al. (1983).
2.2.2 Benefits of PCA

PCA is a special case of Factor Analysis that is highly useful in the analysis of many time series
and the search for patterns of movement common to several series (true factor analysis makes
different assumptions about the underlying structure and solves for the eigenvectors of a slightly
different matrix). This approach is superior to many of the bivariate statistical techniques used
earlier, in that it explores the interrelationships among a set of variables caused by common
"factors," mostly economic in nature. (Philippatos, Christofi, & Christofi, 1983)
PCA is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. A primary benefit of PCA arises from quantifying the importance of each dimension for describing the variability of a data set (Shlens, 2009). PCA can also be used to compress the data by reducing the number of dimensions without much loss of information.
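A minimal sketch of PCA as compression, in Python with NumPy (our own illustration on simulated data; the choice of k = 2 kept components is an assumption): the data are stored as k component scores per sample and reconstructed from them. For centred data, the average squared reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5, 2000, 2
# Illustrative data: 5 measurement types driven mostly by 2 hidden factors plus small noise
factors = rng.normal(size=(2, n))
mixing = rng.normal(size=(m, 2))
X = mixing @ factors + 0.1 * rng.normal(size=(m, n))

mean = X.mean(axis=1, keepdims=True)
Xc = X - mean
C = Xc @ Xc.T / n
eigenvalues, E = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, E = eigenvalues[order], E[:, order]

P_k = E[:, :k].T                 # keep only the first k principal components (k x m)
scores = P_k @ Xc                # compressed data: k x n instead of m x n
X_hat = P_k.T @ scores + mean    # approximate reconstruction of the original data

retained = eigenvalues[:k].sum() / eigenvalues.sum()
error_per_sample = ((X - X_hat) ** 2).sum() / n
print(f"variance retained: {retained:.1%}")
print(f"reconstruction error per sample: {error_per_sample:.4f}, "
      f"discarded eigenvalues: {eigenvalues[k:].sum():.4f}")
```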
When principal component analysis is used to analyze a data set, it is usually possible to explain a large percentage of the total variance with only a few components. Principal components are selected so that each successive one explains a maximum of the remaining variance: the first component is selected to explain the maximum proportion of the total variance, the second to explain the maximum of the remaining variance, and so on. Therefore, the principal component solution is a particularly appropriate test for the existence of a strong market factor (Lessard, 1973).
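To illustrate Lessard's point, the sketch below (Python with NumPy) simulates returns for ten hypothetical stocks that share one common "market" factor plus stock-specific noise; the factor model, betas and volatilities are our own assumptions, not data from any study. When such a factor is present, the first principal component absorbs most of the total variance.

```python
import numpy as np

rng = np.random.default_rng(4)
n_stocks, n_days = 10, 1000
# Simulated daily returns: one common "market" factor plus stock-specific noise (illustrative only)
market = rng.normal(0.0, 0.01, size=n_days)
betas = rng.uniform(0.8, 1.2, size=(n_stocks, 1))
returns = betas * market + rng.normal(0.0, 0.005, size=(n_stocks, n_days))

# PCA on the covariance matrix of returns (rows = stocks, columns = days)
eigenvalues = np.linalg.eigvalsh(np.cov(returns))[::-1]   # descending order
explained = eigenvalues / eigenvalues.sum()                # share of total variance per component
cumulative = np.cumsum(explained)
print(np.round(explained[:3], 3))
print(np.round(cumulative[:3], 3))
```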
PCA is completely nonparametric: any data set can be plugged in and an answer comes out,
requiring no parameters to tweak and no regard for how the data was recorded. From one
perspective, the fact that PCA is non-parametric (or plug-and-play) can be considered a positive
feature because the answer is unique and independent of the user.
2.2.3 Limitations of PCA
Limitations of PCA arise mainly from the main assumptions discussed above and from the data at hand. PCA is not a statistical method in the sense that no probability distribution is specified for the observations. It is therefore important to keep in mind that PCA serves best to represent data in a simpler, reduced form.
It is often difficult, if not impossible, to discover the true economic interpretation of PCs, since the new variables are linear combinations of the original variables. In addition, for PCA to work exactly, one should use standardized data, so that the mean is zero and the unbiased estimate of the variance is unity:
$$z_i = \frac{x_i - \bar{x}_i}{s_i}, \qquad s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)^2,$$
where $z_i$ is the $i$th standardized variable, $\bar{x}_i$ is the sample mean of the $i$th original variable $x_i$, and $s_i$ is its sample standard deviation.
This is because the scales of the original variables are often not comparable, and the variables with high absolute variance will dominate the first principal component. There is one major drawback to standardization, however: the PCA results will come out with respect to the standardized variables, which makes the interpretation and further application of PCA results even more difficult (Malava, 2006).
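The scale problem and its standardization fix can be sketched in Python with NumPy (our own illustration; the two simulated variables and the factor of 1000 are hypothetical). Without standardization the large-scale variable takes over the first principal component; after standardizing, PCA works on the correlation matrix and both variables contribute equally.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
# Two illustrative variables with moderate correlation but very different scales
a = rng.normal(size=n)
b = 0.3 * a + rng.normal(size=n)
X = np.vstack([1000.0 * a, b])      # rows = variables; the first is on a much larger scale

def first_pc(C):
    """Return the eigenvector of the largest eigenvalue of a symmetric matrix."""
    values, vectors = np.linalg.eigh(C)
    return vectors[:, -1]

# Unstandardized: the covariance matrix is dominated by the large-scale variable
print(np.round(np.abs(first_pc(np.cov(X))), 3))

# Standardized: z_i = (x_i - mean_i) / s_i, with s_i the unbiased standard deviation
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
print(np.round(np.abs(first_pc(np.cov(Z))), 3))

# PCA on standardized data is the same as PCA on the correlation matrix
assert np.allclose(np.cov(Z), np.corrcoef(X))
```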
The goal when using PCA is often to remove correlation and interdependence among variables. PCA succeeds in removing second-order dependencies, but it has trouble with higher-order dependencies. This problem might be solved by using kernel PCA or independent component analysis. The fact that PCA is agnostic to the source of the data is also a weakness (Shlens, 2009).
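A small illustration of the higher-order dependency problem, in Python with NumPy (our own example; kernel PCA and independent component analysis are not shown). Here y is an exact function of x, yet on a symmetric grid their covariance is zero, so PCA sees nothing to remove and simply returns the original axes.

```python
import numpy as np

# Illustrative data: y is a deterministic (quadratic) function of x on a symmetric grid
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
X = np.vstack([x, y])

C = np.cov(X)
print(np.round(C, 6))    # the off-diagonal (second-order) covariance is ~0

# PCA therefore finds no rotation to make: its axes coincide with the original axes
values, vectors = np.linalg.eigh(C)
print(np.round(np.abs(vectors), 6))
```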
2.3 Practical implications - Software
When searching the internet for principal components analysis software, one finds numerous vendors offering their products as well as freeware packages for users who prefer not to pay. With the help of Wikipedia and Google searches, we compiled the following list of software for PCA.
"ViSta: The Visual Statistics System" is free software that provides principal components
analysis, simple and multiple correspondence analysis. "Spectramap" is software to create a
biplot using principal components analysis, correspondence analysis or spectral map analysis.
Other software packages with PCA include Computer Vision Library, Multivariate Data
Analysis Software, MVSP, The Unscrambler, PCA/X and many others.
It is also possible to perform PCA in MS Excel, but this requires purchasing add-in software called XLSTAT.
In MATLAB, the functions "princomp" and "wmspca" give the principal components, while the
function "pcares" gives the residuals and reconstructed matrix for a low-rank PCA
approximation. In Octave, the free software equivalent of MATLAB, the function
princomp gives the principal components.
In the open source statistical package R, the functions "princomp" and "prcomp" can be used for
principal component analysis; prcomp uses singular value decomposition which generally gives
better numerical accuracy, while "spm" is a generic package developed in R for multivariate
projection methods that allows principal components analysis.
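The SVD route mentioned for prcomp can be sketched in Python with NumPy; this is our own illustration of the general idea, not R's implementation. The squared singular values of the centred m × n data matrix, divided by n − 1, equal the eigenvalues of the covariance matrix, and the left singular vectors equal its eigenvectors up to sign, so the covariance matrix never has to be formed explicitly.

```python
import numpy as np

rng = np.random.default_rng(6)
# m x n data (rows = variables, columns = samples) with different spreads; illustrative only
X = rng.normal(size=(4, 300)) * np.array([[3.0], [2.0], [1.0], [0.5]])
X[2] += 0.5 * X[0]                              # introduce some correlation
Xc = X - X.mean(axis=1, keepdims=True)
n = Xc.shape[1]

# Route 1: eigen-decomposition of the covariance matrix
eigenvalues, E = np.linalg.eigh(Xc @ Xc.T / (n - 1))
eigenvalues, E = eigenvalues[::-1], E[:, ::-1]  # descending order

# Route 2: SVD of the centred data, Xc = U S V^T, without forming the covariance matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S ** 2 / (n - 1)

print(np.allclose(eigenvalues, svd_variances))             # same variances
print(np.allclose(np.abs(U.T @ E), np.eye(4), atol=1e-8))  # same directions up to sign
```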
In XLMiner, the Principal Components tab can be used for principal component analysis. In IDL, the principal components can be calculated using the function pcomp. Weka also computes principal components.
Chapter III Applications

Principal Components Analysis (PCA) can be applied to data in both the frequency and the time domain, to real and complex data, and in spectral analysis to quantify MRS data. It is also used to find image patterns, to find common features in images of human faces, and in image compression. In this final report, however, we concentrate on applications in finance.
In the article "Principal components analysis for correlated curves and seasonal commodities: The case of the petroleum market", the authors find the volatility functions by analyzing the principal components of the correlation matrix of the historical returns. This methodology ultimately allows them to capture the variance of the multiple-curve market with the minimum number of factors (which leads to a less computationally intensive model).
It is reasonable to expect that the principal components of any single market behave similarly to
what was shown by Cortazar and Schwartz (1994). That is, one would look for a parallel shift
first, then for changes in slope and curvature and expect these to explain a large proportion of the
futures volatilities. This is because the futures contracts are positively correlated, and the
correlation declines with the difference in maturity. Thus, a joint move will tend to be more
important than a separating move of the same frequency, and a low-frequency move will tend to
be more important than a higher frequency one. The mathematics is worked out in Forzani and
Tolmasky (2001).
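The level/slope/curvature pattern can be reproduced with a stylized correlation matrix. In this Python/NumPy sketch, the 12 monthly maturities and the exponential decay rate β = 0.5 are our own assumptions, not parameters from the article. Because correlation declines with the maturity gap, the first eigenvector has no sign change (a parallel shift), the second has one (a slope change) and the third has two (a curvature change).

```python
import numpy as np

# Hypothetical futures curve: 12 monthly maturities (in years)
maturities = np.arange(1, 13) / 12.0
beta = 0.5   # assumed decay rate of correlation with maturity distance
# Correlation declines with the difference in maturity: rho_ij = exp(-beta * |T_i - T_j|)
corr = np.exp(-beta * np.abs(maturities[:, None] - maturities[None, :]))

eigenvalues, vectors = np.linalg.eigh(corr)
eigenvalues, vectors = eigenvalues[::-1], vectors[:, ::-1]   # largest first

def sign_changes(v):
    """Count how often the entries of v change sign along the curve."""
    signs = np.sign(v)
    return int(np.sum(signs[1:] != signs[:-1]))

for k, name in enumerate(["level", "slope", "curvature"]):
    share = eigenvalues[k] / eigenvalues.sum()
    print(f"PC{k + 1} ({name}): {share:.1%} of variance, "
          f"{sign_changes(vectors[:, k])} sign change(s)")
```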
The main question addressed in this part of the article is how the results of the PCA differ when a model is built for a commodity that experiences seasonality. The explanatory power of each of the principal components in the case of crude oil is found to be fairly stable across trading periods. Because of seasonality effects, one can guess that this will not be the case for heating oil.
First, note that, as one would expect, the factor pattern for heating oil is remarkably similar to that of crude oil. Similarly, 95.80% of the total variance is due to changes in the level, 99.02% is explained by the level and slope, and 99.63% by the first three factors. This is to be expected given that the heating oil correlation matrix is stereotypical of many commodity markets.
[Table: Crude oil, relative importance of the first four factors by season]
Are these seasonal differences statistically significant? Although some results on hypothesis testing in PCA models are available in the literature, the authors are not aware of any work on the sampling distribution of the ratio of the first eigenvalue of a correlation matrix to the sum of the n largest. Overall, the complexity of the PCA results increases tremendously in making the small step from a one-commodity to a two-commodity setup.
Another application of the results is pricing correlation-dependent options on petroleum products. PCA is helpful if, first, the option's payoff depends on correlations between many different curve points and/or curves and, second, the option will be priced by Monte Carlo simulation. Under these circumstances, PCA provides a valuable dimensionality reduction for the Monte Carlo simulation.
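The dimensionality reduction for Monte Carlo can be sketched as follows (Python with NumPy). The 12-point correlation matrix and the choice of k = 3 factors are our own assumptions, not the article's model. Each scenario draws only k independent normal variables and maps them onto the whole curve through the leading eigenvectors scaled by the square roots of their eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical correlation matrix of 12 curve points (declining with maturity distance)
maturities = np.arange(1, 13) / 12.0
corr = np.exp(-0.5 * np.abs(maturities[:, None] - maturities[None, :]))

eigenvalues, vectors = np.linalg.eigh(corr)
eigenvalues, vectors = eigenvalues[::-1], vectors[:, ::-1]
k = 3                                                   # number of factors kept for simulation
loadings = vectors[:, :k] * np.sqrt(eigenvalues[:k])    # 12 x k factor loadings

# Each Monte Carlo scenario needs only k independent normals instead of 12
n_paths = 100_000
z = rng.standard_normal(size=(k, n_paths))
shocks = loadings @ z                    # 12 x n_paths correlated curve shocks

approx_corr = loadings @ loadings.T      # correlation implied by the k-factor model
print(np.abs(np.cov(shocks) - approx_corr).max())   # Monte Carlo sampling error only
print(np.abs(approx_corr - corr).max())              # approximation error from dropped factors
```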
PCA is also widely used to study the co-movement patterns of national equity markets. The correlation coefficient measures the extent to which two statistical series move together, while PCA, a multivariate statistical technique, is a useful tool to analyze patterns of co-movement common to several series. In the paper "Co-movements of U.S. and Latin American equity markets before and after the 1987 crash", PCA is applied separately to each of three subperiods to study the changes in the co-movement patterns of the five equity markets (the U.S. and four Latin American markets) between the subperiods. Using Kaiser's significance rule, principal components with eigenvalues greater than unity are retained for analysis. Kaiser's varimax rotation is used for an easier interpretation of the principal components. In the paper's tables, the highest factor loadings in each principal component are marked with an asterisk.
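Kaiser's rule and varimax rotation can be sketched in Python with NumPy. The varimax function below is the widely used SVD-based iteration (γ = 1), written by us; it is not code from the paper, and the five simulated "markets" are hypothetical. Loadings are eigenvectors scaled by the square roots of their eigenvalues; the rotation is orthogonal, so each variable's total squared loading (its communality) is unchanged.

```python
import numpy as np

def kaiser_retained(corr):
    """Kaiser's rule: keep components whose correlation-matrix eigenvalue exceeds 1."""
    values, vectors = np.linalg.eigh(corr)
    values, vectors = values[::-1], vectors[:, ::-1]
    keep = values > 1.0
    # Loadings = eigenvector * sqrt(eigenvalue): correlations of variables with components
    return values[keep], vectors[:, keep] * np.sqrt(values[keep])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation of a p x k loading matrix (SVD-based iteration, gamma = 1)."""
    p, k = loadings.shape
    R = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion with respect to the rotation
        G = loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        U, S, Vt = np.linalg.svd(G)
        R = U @ Vt                       # closest orthogonal matrix to the gradient
        new_objective = S.sum()
        if new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ R, R

rng = np.random.default_rng(8)
# Simulated returns for 5 hypothetical markets driven by two different factors
f = rng.normal(size=(2, 1000))
returns = np.vstack([f[0], f[0], f[1], f[1], f[0] + f[1]]) + rng.normal(size=(5, 1000))
eigenvalues, loadings = kaiser_retained(np.corrcoef(returns))
rotated, R = varimax(loadings)
print(np.round(eigenvalues, 2))
print(np.round(rotated, 2))
```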
February 1984–September 1987 (Period I)
For Period I, three principal components with eigenvalues greater than unity are retained for analysis. The Mexican and Brazilian equity markets have the highest factor loadings in the first principal component. This principal component explains 28.6% of the total variation in the index returns data matrix. Since the Brazilian equity market is negatively correlated with the Mexican equity market in this period, it has a negative factor loading in the first principal component.
The Chilean and U.S. equity markets have the highest factor loadings in the second principal component. This principal component explains 24.8% of the total variation in the index returns data matrix (Table 5). The first two principal components together explain 52.9% of the total variation in the index returns data matrix.
The Argentine equity market dominates the third principal component. This principal component explains 20.1% of the total variation in the index returns data matrix. The U.S. equity market also has a high factor loading in this principal component. However, since the U.S. equity market is negatively correlated with the Argentine equity market in this period, it has a negative factor loading in the third principal component. All three principal components together explain 73.0% of the total variation in the index returns data matrix.
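Assuming, as Kaiser's eigenvalue-greater-than-one rule implies, that the components are extracted from the 5 × 5 correlation matrix R of the index returns, the trace of R equals the number of markets. The share of total variation explained by an unrotated component is then its eigenvalue divided by 5, so Kaiser's rule keeps exactly the components that each explain more than 20%. Varimax rotation redistributes variance among the retained components but leaves their combined share unchanged, which gives the eigenvalue sum implied by the 73.0% reported for Period I:

```latex
\text{share}_k = \frac{\lambda_k}{\operatorname{tr}(R)} = \frac{\lambda_k}{5},
\qquad \lambda_k > 1 \iff \text{share}_k > 20\%,
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 0.730 \times 5 = 3.65 \ \text{(Period I)}.
```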
November 1987–June 1991 (Period II)
There are only two statistically significant principal components in Period II, as compared with three in Period I. This implies that the co-movements of the five equity markets were closer after the crash than before it. We could also say that the co-movements of the five equity markets were closer during the market-opening period (Period II) than during the closed-markets period (Period I).
The Mexican, U.S., and Chilean equity markets have the highest factor loadings in the first
principal component. This principal component explains 31.9% of the total variation in the index
returns data matrix. The Argentine and Brazilian equity markets dominate the second principal
component. This principal component explains 24.4% of the total variation in the index returns
data matrix. Since the Argentine equity market is negatively correlated with the Brazilian equity
market in this period, it has a negative factor loading in the second principal component. The two
principal components together explain 56.3% of the total variation in the index returns data
matrix.
July 1991–February 1995 (Period III)
There is only one statistically significant principal component in Period III, as compared with two in Period II. This implies that the co-movements of the five equity markets were even closer in Period III than in Period II. In Period III, the opening of the markets is consolidated and large portfolio inflows into the Latin American markets are observed. The Argentine equity market has the highest factor loading and the U.S. equity market has the lowest factor loading. The factor loadings of all five equity markets have positive signs. The principal component explains 44.4% of the total variation in the index returns data matrix.
The number of statistically significant principal components is three in Period I, two in Period II, and only one in Period III. This implies that the co-movements of the five equity markets have become considerably closer over time during the February 1984–February 1995 period.
Principal Components Analysis (PCA) is another approach that has been applied in studying
diversification and shares common points with both correlation and factor analysis. While
computationally it is a special case of factor analysis, it can be applied to returns from a set or
portfolio of financial assets as a more sophisticated way of studying their correlation matrix and
integration. PCA is used to measure the degree of interdependence and covariability between
several assets. Multiple- or single-equation regression analysis is inappropriate for this purpose
because the returns on these assets may well be highly collinear. The method of principal
components constructs from a set of variables, X, a new set of orthogonal variables, P, the
principal components. Each one of these components absorbs and accounts for the maximum
possible proportion of the variation in the variables X. If the variations in the returns of a set of
financial assets or markets are explained by relatively few principal components, then one can
conclude that they are highly integrated and that opportunities for diversification are limited.
Correlation analysis, factor analysis, and PCA are concerned with the contemporaneous
information flows across markets, i.e., the first risk premium.
These approaches essentially measure the integration of national financial markets and are sufficient if market efficiency is strong. When markets are only weakly efficient, however, these approaches will not be adequate if a co-integration mechanism is present. Co-integration quantifies market inefficiencies as short-term disequilibrium variations in prices and can be perceived as a sufficient, though not necessary, condition for segmentation between national equity markets.
The results obtained from applying PCA to the returns on the nine markets are presented in Table 3 of "Diversification benefits in the smaller European stock markets". The conclusions drawn from this approach confirm those from correlation analysis. The first principal component, P1, explains 51 to 55 percent of the covariability of stock returns, with factor loadings that are significant for most countries. The increase in the coefficient of determination and in the value of the eigenvalue in the second period suggests that the markets under study have become more integrated. The component P1 can be interpreted as the true stock market return, which abstracts from risk and uncertainty and represents a compensation for sacrificed liquidity [Nellis, 1982].
An increase in the factor loading in P1 for some country is an indication of increased
interdependence. It is clear that such an increase has occurred for Greece, Spain, and Ireland.
In the pre-October 1987 period, Greece cannot be explained by the first dominant component, since it has an insignificant factor loading. In the second sample, Greece appears more integrated
and enters the first component with a significant, although small, factor loading. In both periods,
the returns of the Greek stock market need additional components to be explained. The U.S. and
Dutch markets retain the highest factor loading in P1 for both periods.
Correlation and PCA found increased integration between the European markets and the U.S.
market. The smaller European markets were not found to be more strongly integrated with the
Japanese market for the period after the October 1987 crash.
In "The financial characteristics of small firms which achieve quotation on the UK unlisted securities market", principal component analysis is applied to the financial ratio data of fifty-six firms. The principal components obtained are then used as input for a multivariate analysis of variance (MANOVA) to compare the financial characteristics of firms which have achieved quotation on the Unlisted Securities Market (USM) with those which have not.
The six principal components can be named in accordance with the factor loadings of the fifteen
financial ratios in each principal component. The factor loadings show the correlation between
the principal components and the fifteen financial ratios. Those financial ratios which are highly
correlated with a given principal component serve as definers of that principal component. For
example, leverage and liquidity ratios have the highest factor loadings in the first principal
component. Therefore, the first principal component can represent the indebtedness and liquidity
of the firms. Since profitability ratios have the highest factor loadings in the second principal
component, this principal component can represent profitability. Since growth ratios have the
highest factor loadings in the third principal component, this principal component can represent
growth rate, etc.
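The statement that factor loadings are correlations between ratios and components can be checked with a short Python/NumPy sketch. The data are simulated and we use only 4 hypothetical ratios instead of fifteen, purely for brevity. With PCA on the correlation matrix, the loading of ratio j on component k is v_jk √λ_k, which equals the correlation between the standardized ratio and the component score; such scores are the kind of input that could then be passed to a further analysis like MANOVA (not shown).

```python
import numpy as np

rng = np.random.default_rng(9)
# Hypothetical data: 4 financial ratios (rows) for 56 firms (columns); values are simulated
ratios = rng.normal(size=(4, 56))
ratios[1] += 0.7 * ratios[0]          # two related ratios, purely illustrative

# PCA on the correlation matrix, i.e. on standardized ratios
Z = (ratios - ratios.mean(axis=1, keepdims=True)) / ratios.std(axis=1, ddof=1, keepdims=True)
R = np.corrcoef(ratios)
eigenvalues, vectors = np.linalg.eigh(R)
eigenvalues, vectors = eigenvalues[::-1], vectors[:, ::-1]

scores = vectors.T @ Z                       # component scores: one row per component, one column per firm
loadings = vectors * np.sqrt(eigenvalues)    # loading of ratio j on component k: v_jk * sqrt(lambda_k)

# Check: each loading equals the correlation between a ratio and a component score
direct = np.corrcoef(np.vstack([Z, scores]))[:4, 4:]
print(np.allclose(loadings, direct))
```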
In "Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case of the Athens Stock Exchange", the first principal component consists of variables that reflect economy-wide influences, since high loadings are observed for 13 out of 19 variables. Among these variables, those with significant positive loadings are Inflation, Money Supply, Wage Cost, Cost of Construction, Exchange Rates and the Capital Account. It should be recalled that the 1980-86 period was characterized by high inflationary pressures, and the economic policy framework consisted of an accommodating monetary policy, a loose incomes policy and a continuous depreciation of the Greek currency unit, thus pushing production costs upwards and introducing uncertainty about companies' earnings prospects. Fiscal policy was loose too, and the increasing deficit of the public sector is reflected by the fact that the Budget Deficit variable also had a significant loading, although with a negative sign
because it was reported as a series of negative numbers. It is worth noting that the Market Index,
used to approximate the market portfolio, is not included in the first factor. A possible
explanation of this result could be the relative unimportance of the Stock Market in the presence
of serious macroeconomic instabilities. Additionally, variables like Industrial Production and
Construction Permits represented the stagnant value added of the secondary sector and the
negative private investment in housing. Likewise, the Lending Rate has a low correlation with
the component since it is determined by the Central Bank, remaining unchanged for long periods
of time. Although Exports and Imports of Goods have a significant weighting, the Current
Account which also includes the invisible transactions, is not significant, mainly due to the
increase in reverse immigration and the international crisis in shipping.
Variables that were not correlated with those loading most heavily on the first component, such
as the Stock Market Index, the Construction Index and Gold Reserves, constitute the second
component. As noted earlier, the stock market was rather weak, the demand for construction
investment was relatively low and the level of state reserves was not particularly volatile.
Likewise, the correlation-based construction of the third component, consisting of the Current
Account and the Unemployment Rate, is economically meaningful.
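The negative Budget Deficit loading above is attributed to the series being reported as negative numbers. The sketch below illustrates why such a recording convention matters: multiplying a variable by -1 flips the sign of its loadings and leaves the eigenvalues unchanged. It is a minimal NumPy sketch on synthetic data; the five series, the "economy-wide" factor and the helper `first_pc_loadings` are hypothetical.

```python
import numpy as np

def first_pc_loadings(X):
    """Eigenvalues (descending) and PC1 loadings of a correlation-matrix PCA.

    The eigenvector sign is arbitrary, so PC1 is oriented to load
    positively on the first column (a convention chosen for this sketch).
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pc1 = eigvecs[:, 0] * np.sqrt(eigvals[0])   # correlations with PC1
    if pc1[0] < 0:
        pc1 = -pc1
    return eigvals, pc1

rng = np.random.default_rng(1)
n = 300
economy = rng.normal(size=n)                    # hypothetical economy-wide factor
X = np.column_stack([economy + 0.5 * rng.normal(size=n) for _ in range(5)])

X_recorded = X.copy()
X_recorded[:, 4] *= -1   # hypothetical deficit series stored as negative numbers

vals, pc1 = first_pc_loadings(X)
vals_rec, pc1_rec = first_pc_loadings(X_recorded)
print(np.round(pc1, 2))      # all five loadings positive
print(np.round(pc1_rec, 2))  # same magnitudes, last loading now negative
```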
The analysis for the period 1986-92 and the resulting orthogonal factors are presented in
Table III of the original study. As in the first period, the first component consists of variables
that represent a large part of the economy. In contrast to the previous period, however, the
Stock Market Index, the Lending Rate and the Gold Reserves are significantly positively
correlated with the first component. This could be explained, respectively, by the growing
importance of the Athens Stock Exchange, the gradual deregulation of interest rates and the
significant increase in foreign reserves. The insignificance of the budget deficit in the first
component during this period may reflect the temporary reduction of the PSBR (public sector
borrowing requirement) in 1986-87 under the stabilization programme. This variable now loads on
the second component with the correct sign. The Unemployment Rate, on the other hand, stabilized
at around 7.5% in the later period after its significant increase during 1980-86. Consequently,
it too loads on the second component, which is strongly positively correlated with Industrial
Production (which accelerated after the stabilization programme was implemented); the
Unemployment Rate therefore exhibits a negative sign.
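A variable can enter the first component in one subperiod but not in another, as the Stock Market Index does here. One way to see this is to run the PCA separately on each subperiod and compare the PC1 loadings. The sketch below assumes a correlation-matrix PCA and a known break date. The data are synthetic (four hypothetical macro series plus a hypothetical stock index that only follows the common factor in the second subperiod), and `pc1_loadings` is an illustrative helper, not the authors' code.

```python
import numpy as np

def pc1_loadings(X):
    """Correlations between each column of X and its first principal component.

    PC1 is oriented so that its loadings sum to a positive number
    (the eigenvector sign is otherwise arbitrary).
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    v, lam = eigvecs[:, -1], eigvals[-1]   # largest eigenpair
    loadings = v * np.sqrt(lam)
    return loadings if loadings.sum() >= 0 else -loadings

rng = np.random.default_rng(2)
n1, n2 = 200, 200                          # hypothetical subperiod lengths
common = rng.normal(size=n1 + n2)          # hypothetical economy-wide factor
macro = np.column_stack([common + 0.5 * rng.normal(size=n1 + n2) for _ in range(4)])

# Hypothetical stock index: unrelated to the economy in subperiod 1,
# driven by the common factor in subperiod 2.
index = np.concatenate([rng.normal(size=n1),
                        common[n1:] + 0.5 * rng.normal(size=n2)])
X = np.column_stack([macro, index])

load_1 = pc1_loadings(X[:n1])   # first subperiod
load_2 = pc1_loadings(X[n1:])   # second subperiod
print("index loading, subperiod 1:", round(load_1[-1], 2))
print("index loading, subperiod 2:", round(load_2[-1], 2))
```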
The second principal component in the period 1980-86 consists mainly of the Stock Market
Index and the Construction Index. This systematic risk is not significantly priced, reflecting the
unimportance of these variables in this period. In 1986-92, the second component is highly
correlated with Industrial Production and the Budget Deficit, and its estimated risk price is
positive and significant. This means that investors demand a risk premium vis-à-vis these
systematic risks, reflecting uncertainty about a further increase in industrial production and
threatening budget deficits. The estimated risk price for the third principal component is
significant in each subperiod (in both subperiods the Current Account is the leading variable),
but the signs of the coefficients are opposite. The positive sign in the first subperiod is
explained by the seriousness of the current account deficit, which exceeded 5% of GDP over the
period and peaked at 10% of GDP in 1985. The current account was one of the basic instabilities
that led to the stabilization programme at the end of 1985.
By contrast, the current account problem eased in the next period (the current account deficit
fell to 3% of GDP, even though growth and investment accelerated significantly), contributing to
optimism about investing in the stock market.
Bordo and Murshid (2006) use PCA in Globalization and changing patterns in the international
transmission of shocks in financial markets. They apply principal components analysis to their
monthly data on yield spreads as well as to their indexes of exchange market pressure. The first
principal component vector provides a measure of the overall extent of co-movement within these
data, while the factor loadings on the second and third principal component vectors reveal
various patterns of dependence within groups. This grouping is typically easy to identify by
plotting the factor loadings; however, to take some of the arbitrariness out of identifying
groups, they employ a clustering algorithm that sorts countries into three distinct clusters by
minimizing the distance between members of a group while maximizing the distance between
separate groups.
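Both steps described above can be sketched in a few lines: the share of variance explained by PC1 as a co-movement measure, and a clustering of countries by their PC2/PC3 loadings. This is a minimal NumPy sketch on synthetic spreads for nine hypothetical countries in three hypothetical regions. The summary does not name the clustering algorithm, so plain k-means (which minimizes within-cluster squared distances) with a deterministic farthest-point start is used here as an assumption. `pca_corr` and `kmeans` are illustrative helpers.

```python
import numpy as np

def pca_corr(X):
    """Eigenvalues (descending) and eigenvectors of the correlation matrix of X."""
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # pick the point farthest from all centers chosen so far
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)          # assign each point to nearest center
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

rng = np.random.default_rng(3)
months, n_countries = 240, 9
groups = np.repeat([0, 1, 2], 3)              # three hypothetical regions
global_f = rng.normal(size=months)
regional = rng.normal(size=(months, 3))
spreads = (global_f[:, None] + regional[:, groups]
           + 0.3 * rng.normal(size=(months, n_countries)))

eigvals, eigvecs = pca_corr(spreads)
share_pc1 = eigvals[0] / eigvals.sum()        # overall extent of co-movement
loadings_23 = eigvecs[:, 1:3] * np.sqrt(eigvals[1:3])
labels = kmeans(loadings_23, k=3)
print(f"PC1 explains {share_pc1:.0%} of total variance")
print("cluster labels:", labels)
```

Clustering on the PC2/PC3 loadings as points is insensitive to how those two components are rotated within their plane, which matters when their eigenvalues are close.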
As a complement to their principal components analysis, they estimate the probability of a
global currency crisis. They identify global currency crises as extreme values of an index that
captures the degree of exchange market pressure common to all countries. Specifically, this index
is the first principal component of the exchange market pressure data.
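A first-principal-component index of exchange market pressure, with crises flagged as extreme values, might look like the minimal sketch below. The data are synthetic (six hypothetical countries with three injected, synchronized shocks). The cut-off of mean plus two standard deviations is an assumption for illustration, not the threshold used by Bordo and Murshid, and `first_pc_index` is a hypothetical helper.

```python
import numpy as np

def first_pc_index(X):
    """Scores on the first principal component of the standardized columns of X.

    The sign is fixed so that the component loads positively on average,
    so higher scores mean higher pressure across countries.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    v = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    if v.sum() < 0:
        v = -v
    return Z @ v

rng = np.random.default_rng(4)
months, countries = 240, 6
common = 0.5 * rng.normal(size=months)
pressure = common[:, None] + rng.normal(size=(months, countries))  # hypothetical indexes
shock_months = [50, 120, 200]
pressure[shock_months] += 6.0          # hypothetical synchronized speculative attacks

index = first_pc_index(pressure)
threshold = index.mean() + 2.0 * index.std()   # "extreme" cut-off: an assumption
crisis_months = np.flatnonzero(index > threshold)
print("months flagged as global crises:", crisis_months)
```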
While principal components analysis sheds light on the patterns of cross-country
interdependence, it does not capture all of the complex dynamics and interrelationships that may
exist between countries. To better understand these relationships, Bordo and Murshid estimate
vector autoregressions (VARs) using data on short-term interest rates. Impulse response functions
estimated from these VARs trace the impact of a shock in one country on another, and thus shed
light on the direction of shocks and the degree to which they affect other countries.
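The VAR step can be illustrated with a minimal sketch: estimate a VAR by ordinary least squares and trace impulse responses. The lag order and shock identification used by Bordo and Murshid are not described here. This sketch assumes a VAR(1) and a one-unit, non-orthogonalized shock, so the response after h periods is simply A^h. The two-country system and its coefficients are hypothetical, and `fit_var1` and `impulse_responses` are illustrative helpers.

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of a VAR(1): y_t = c + A y_{t-1} + e_t. Returns (c, A)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])   # constant + lagged values
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)      # (1 + k) x k coefficients
    return B[0], B[1:].T

def impulse_responses(A, horizon):
    """Responses to a one-unit (non-orthogonalized) shock: Psi_h = A^h."""
    return np.array([np.linalg.matrix_power(A, h) for h in range(horizon + 1)])

# Hypothetical two-country system: country 0's rate spills over to country 1,
# but not the other way round.
A_true = np.array([[0.5, 0.0],
                   [0.4, 0.5]])
rng = np.random.default_rng(5)
T = 2000
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A_true @ Y[t - 1] + 0.1 * rng.normal(size=2)

c_hat, A_hat = fit_var1(Y)
irf = impulse_responses(A_hat, horizon=8)
# irf[h, i, j]: response of country i, h periods after a unit shock to country j
print("country 1 after a shock to country 0:", np.round(irf[:4, 1, 0], 3))
print("country 0 after a shock to country 1:", np.round(irf[:4, 0, 1], 3))
```

The asymmetry between the two printed response paths is what reveals the direction in which shocks travel.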
As mentioned before, PCA has often been applied in finance to study movements in stock
markets. Papers by Leger and Leone (2008) and Meric, Ratner and Meric (2008) do just that, but
from slightly different perspectives. Leger and Leone examine changes in the UK stock market and
the macroeconomic factors (news) that could cause them, and find that Market Capital Gain,
Dividend Yield and Consumer Confidence are the only factors with significant influence.
Meric, Ratner and Meric focus on a more conventional topic for finance papers using PCA: the
possibilities for diversification among major stock markets, although they also include market
sectors in their analysis. They find that in a bull market investors can obtain more benefit from
global diversification than from domestic diversification, even if they invest in the same
sector in different countries rather than in different sectors within the same country. In a
bear market, the sectors of different countries tend to be more closely correlated, and
cross-country diversification opportunities are limited.
Shih et al. (2007) compare the performance of China's state-owned banks, joint-stock banks and
city commercial banks using performance measures constructed with PCA. Using PCA to construct
performance measures is an interesting approach, though in this case it is partly motivated by
the Chinese government's policy of not publishing direct bank performance data. They find that
mid-sized joint-stock banks perform best in China, and suggest this may be due to greater public
pressure and lower political importance. Banks in coastal areas also generally perform better
than those inland, and the worst-performing banks are in north-eastern China.
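The summary does not describe how Shih et al. turn principal components into a performance measure. One common recipe is sketched below: keep the components with eigenvalue greater than one (the Kaiser rule), then weight their scores by each component's share of the retained variance. Both the cut-off and the weights are assumptions, not necessarily the authors' procedure. The twelve banks and four indicators are synthetic, and `pca_composite_score` is a hypothetical helper.

```python
import numpy as np

def pca_composite_score(X):
    """Composite score from a correlation-matrix PCA (one common recipe).

    Keeps components with eigenvalue > 1 (Kaiser rule) and weights their
    scores by each component's share of the retained variance.
    Assumes every column is a 'higher is better' indicator.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    vecs = eigvecs[:, keep]
    vecs = vecs * np.where(vecs.sum(axis=0) < 0, -1.0, 1.0)   # orient each PC positively
    scores = Z @ vecs
    weights = eigvals[keep] / eigvals[keep].sum()
    return scores @ weights

rng = np.random.default_rng(6)
n_banks = 12
quality = rng.normal(size=n_banks)             # unobserved "true" performance
indicators = quality[:, None] + 0.3 * rng.normal(size=(n_banks, 4))  # hypothetical ratios

score = pca_composite_score(indicators)
ranking = np.argsort(-score)                   # bank indices, best first
print("ranking (best first):", ranking)
```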
PCA has also been applied to the measurement of convergence. Becker and Hall (2009) define
convergence as taking place between a vector of two or more series over a given period $1$ to
$T$ if the $\%R^2$ of the first principal component calculated over the period $1$ to $T-t$ is
less than the $\%R^2$ of the first principal component calculated over the period $T-t$ to $T$,
for $0 < t < T$. Using this definition, they find little convergence between the inflation rates
of European Monetary Union member countries and the new member countries of the EU, except for
the three countries that had been accepted into the euro area.
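The convergence criterion above can be sketched directly. This minimal NumPy sketch assumes that $\%R^2$ means the share of total variance explained by the first principal component of the correlation matrix. The inflation panel is synthetic: four hypothetical series that are independent in the first half and share a common component in the second. `first_pc_share` and `is_converging` are illustrative helpers.

```python
import numpy as np

def first_pc_share(X):
    """%R^2 of the first principal component: largest eigenvalue of the
    correlation matrix divided by the sum of eigenvalues (the number of series)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))  # ascending
    return eigvals[-1] / eigvals.sum()

def is_converging(X, t):
    """Becker-Hall style check on a T x n panel X for a split 0 < t < T.

    Converging if the first-PC share over observations 1..T-t is below the
    share over the remaining observations (the split point goes to the first window).
    """
    T = len(X)
    if not 0 < t < T:
        raise ValueError("need 0 < t < T")
    early, late = first_pc_share(X[:T - t]), first_pc_share(X[T - t:])
    return early < late, early, late

rng = np.random.default_rng(7)
T, n, t = 240, 4, 120
common = rng.normal(size=T)
noise = rng.normal(size=(T, n))
# Hypothetical inflation panel: independent series in the first half,
# sharing a common component in the second half.
infl = noise.copy()
infl[T - t:] = common[T - t:, None] + 0.3 * noise[T - t:]

converging, early, late = is_converging(infl, t)
print(f"%R2 early = {early:.2f}, late = {late:.2f}, converging = {converging}")
```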
As we have seen, PCA has many diverse and interesting applications in finance. It is an
analytical tool that can be applied alone or in conjunction with other methods to make sense of
complicated data and obtain interesting research findings.
Chapter IV Conclusions
The articles reviewed in this paper show that Principal Component Analysis (PCA) is a
mathematical algorithm that reduces the dimensionality of the data while retaining most of the
variation in the data set. It accomplishes this reduction by identifying directions, called
principal components, along which the variation in the data is maximal. By using a few
components, each sample can be represented by relatively few numbers instead of by values for
thousands of variables. Samples can then be plotted, making it possible to visually assess
similarities and differences between samples and determine whether they can be grouped.
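The dimension-reduction step summarized above can be sketched with a singular value decomposition of the centred data, one standard way of computing principal components. The data are synthetic: 40 hypothetical samples measured on 500 variables, generated from two hidden patterns plus noise. The last lines check numerically that no random direction carries more variance than the first principal component.

```python
import numpy as np

rng = np.random.default_rng(8)
n_samples, n_vars = 40, 500
latent = rng.normal(size=(n_samples, 2))               # two hidden patterns
weights = rng.normal(size=(2, n_vars))
X = latent @ weights + 0.1 * rng.normal(size=(n_samples, n_vars))  # hypothetical data

Xc = X - X.mean(axis=0)                 # centre each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                  # each sample as 2 numbers instead of 500
retained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(f"2 components retain {retained:.1%} of the variance")

# PC1 is the direction of maximal variance: no random unit direction beats it.
var_pc1 = S[0] ** 2 / (n_samples - 1)
dirs = rng.normal(size=(n_vars, 100))
dirs /= np.linalg.norm(dirs, axis=0)    # 100 random unit vectors (columns)
var_random = ((Xc @ dirs) ** 2).sum(axis=0) / (n_samples - 1)
print(var_pc1 >= var_random.max())      # True
```

The two columns of `coords` are what would be plotted to look for groups of similar samples.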
PCA is applied not only in finance but also in many other fields, such as computer science,
image pattern recognition (for example, finding common features in images of human faces),
image compression and computational biology, to name a few. In computational biology, many
applications beyond dimensionality reduction, classification and clustering have taken advantage
of the global representation of gene expression profiles generated by this decomposition. They
include identifying patterns that correlate with experimental artifacts and filtering them out,
estimating missing data, associating genes and expression patterns with the activities of
regulators, and helping to uncover the dynamic architecture of cellular phenotypes. The rapid
growth of technologies that generate high-dimensional molecular biology data will likely provide
many new applications for PCA in the years to come.
There are also many possible new applications in finance and economics, such as the new
framework for measuring convergence. With the development of computing power (both software and
hardware), even more complex analyses using principal component analysis become possible, in
finance and in all other fields of science.
List of Articles
1- Detection of financial distress via multivariate statistical analysis (Ganesalingam, 2001).
2- Co-movements of the U.S., U.K., and Middle East stock markets (Meric, Ratner, & Meric, 2007).
3- Principal components analysis for correlated curves and seasonal commodities: The case of the
Petroleum markets (Tolmasky & Hindanov, 2002).
4- Globalization and changing patterns in the international transmission of shocks in financial
markets (Bordo & Murshid, 2006).
5- Macroeconomic Factors and Stock Returns in a Changing Economic Framework: The Case of
the Athens Stock Exchange (Diacogiannis, Tsiritakis, & Manolas, 2001).
6- The financial characteristics of small firms which achieve quotation on the UK unlisted
securities market (Hutchinson, Meric, & Meric, 1988).
7- Diversification benefits in the smaller European stock markets (Markellos & Siriopoulos, 1997).
8- Co-movements of U.S. and Latin American equity markets before and after the 1987 crash
(Meric, Leal, Ratner, & Meric, 2001).
9- Changes in the risk structure of stock returns: Consumer confidence and the dotcom bubble
(Leger & Leone, 2008).
10- Co-movements of sector index returns in the world's major stock markets in bull and bear
markets: Portfolio diversification implications (Meric, Ratner, & Meric, 2008).
11- How far from the Euro Area? Measuring convergence of inflation rates in Eastern Europe
(Becker & Hall, 2009).
12- Comparing the performance of Chinese banks: A principal component approach (Shih, Zhang, &
Liu, 2007).
13- International portfolio diversification: A multivariate analysis for a group of Latin American
countries (Lessard, 1973).
14- The inter-temporal stability of international stock market relationships: Another view
(Philippatos, Christofi, & Christofi, 1983).
15- An analysis of the interrelationships among the major world stock exchanges (Makridakis &
Wheelwright, 1974).
16- A new approach to modeling the dynamics of implied distributions: Theory and evidence from
the S&P 500 options (Panigirtzoglou & Skiadopoulos, 2004).
References
Becker, B., & Hall, S. G. (2009). How far from the Euro Area? Measuring convergence of
inflation rates in Eastern Europe. Economic Modelling, In Press, Corrected Proof.
Bordo, M. D., & Murshid, A. P. (2006). Globalization and changing patterns in the international
transmission of shocks in financial markets. Journal of International Money and Finance,
25, 655-674.
Cortazar, G., & Schwartz, E. (1994). The valuation of commodity-contingent claims. Journal of
Derivatives, 1(4), 27-39.
Diacogiannis, G. P., Tsiritakis, E. D., & Manolas, G. A. (2001). Macroeconomic Factors and
Stock Returns in a Changing Economic Framework: The Case of the Athens Stock
Exchange. Managerial Finance, 27(6), 23-41.
Forzani, L., & Tolmasky, C. F. (2001). On the spectral decomposition of empirical correlation
matrices. Journal of Knot Theory and Its Ramifications, 10(8), 1201-1214.
Ganesalingam, S. (2001). Detection of financial distress via multivariate statistical analysis.
Managerial Finance, 27(4), 45-55.
Hutchinson, P., Meric, G., & Meric, I. (1988). The financial characteristics of small firms which
achieve quotation on the UK unlisted securities market. Journal of Business Finance and
Accounting, 15(1), 9-19.
Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). Springer.
Leger, L., & Leone, V. (2008). Changes in the risk structure of stock returns: Consumer
confidence and the dotcom bubble. Review of Financial Economics, 17(3), 228-244.
Lessard, D. R. (1973). International portfolio diversification: A multivariate analysis for a group
of Latin American countries. The Journal of Finance, 28(3), 619-633.
Makridakis, S. G., & Wheelwright, S. C. (1974). An analysis of the interrelationships among the
major world stock exchanges. Journal of Business Finance and Accounting, 1(2), 195.
Malava, A. (2006). Principal component analysis on term structure of interest rates. Unpublished
independent research project in applied mathematics, Helsinki University of Technology,
Department of Engineering Physics and Mathematics.
Markellos, R. N., & Siriopoulos, C. (1997). Diversification benefits in the smaller European
stock markets. International Advances in Economic Research, 3(2), 142-153.
Meric, G., Leal, R. P. C., Ratner, M., & Meric, I. (2001). Co-movements of U.S. and Latin
American equity markets before and after the 1987 crash. International Review of
Financial Analysis, 10, 219-235.
Meric, G., Ratner, M., & Meric, I. (2007). Co-movements of the U.S., U.K., and Middle East
stock markets. Middle Eastern Finance and Economics, (1), 60-73.
Meric, I., Ratner, M., & Meric, G. (2008). Co-movements of sector index returns in the world's
major stock markets in bull and bear markets: Portfolio diversification implications.
International Review of Financial Analysis, 17(1), 156-177.
Panigirtzoglou, N., & Skiadopoulos, G. (2004). A new approach to modeling the dynamics of
implied distributions: Theory and evidence from the S&P 500 options. Journal of
Banking & Finance, 28(7), 1499-1520.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical
Magazine, 2(6), 559-572.
Philippatos, G. C., Christofi, A., & Christofi, P. (1983). The inter-temporal stability of
international stock market relationships: Another view. Financial Management, 12(4),
63-69.
Shih, V., Zhang, Q., & Liu, M. (2007). Comparing the performance of Chinese banks: A
principal component approach. China Economic Review, 18(1), 15-34.
Shlens, J. (2009). A tutorial on principal component analysis. Unpublished manuscript.
Smith, L. (2002). A tutorial on principal components analysis. Unpublished manuscript.
Tolmasky, C., & Hindanov, D. (2002). Principal components analysis for correlated curves and
seasonal commodities: The case of the Petroleum markets. The Journal of Futures
Markets, 22(11), 1019-1035.