
In this report, I explain how I conducted my momentum-strategy project. A
momentum strategy sorts stocks by their momentum (past return), divides them
into a few groups, forms a portfolio for each group, and holds those portfolios
for a certain period of time. It is essentially performance chasing.

I use data from CRSP covering all American stocks, about 7,000 in total. I
wrote everything in R and Python. Due to the time constraint, I was only able
to compute the case of holding the portfolio for one year while using the
previous year's return as the momentum signal. The result is that the hedge
portfolio would lose money: going long the top portfolio and short the bottom
portfolio turns out to yield a negative return. However, this covers only a
single year, so there is no way to attach a significance statistic to this
negative value. The result is that one would have lost 34.625 percent of one's
capital by investing in this momentum strategy.

A major issue I faced when cleaning the data is missing observations. Many
stocks have missing data in January and/or December. The real problem is that
those missing observations are not even marked in the dataset; in other words,
if we represent the monthly prices of each stock as a vector, the vectors do
not all have the same length. Another issue is that the set of stocks that have
prices in both January and December changes over time. Together, these create a
matching-and-merging problem that stopped my progress. When I started this
project I was not particularly good at Python or R; I chose R and was unable to
solve the problem there. Toward the end of the project, incidentally, I began
to form a clear vision of how to manipulate the data in Python. SAS certainly
has a procedure for handling this, and a solution using basic Python operations
is fairly easy, but due to the time constraint I was unable to implement it.
I therefore outline the solution here: use a dictionary. First, I create a
dictionary whose keys store the tickers and whose values store all the
corresponding prices. While loading prices from the dataset, which is
presumably a dataframe, I insert an arbitrary sentinel value, say -10000, as
the price whenever an observation is missing. This produces a set of price
vectors that all have equal length. Next, I delete every ticker that has an
observation of -10000. I do this for every year, forming one dictionary per
year. Lastly, I take the intersection of the ticker sets of those dictionaries.
This common set represents all the stocks that legitimately have prices in both
January and December in every year. Smoothing prices is another way to deal
with missing observations.
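The dictionary procedure above can be sketched in a few lines of plain Python.
The `(ticker, price)` row layout and the helper names are my assumptions for
illustration, not code from the project:

```python
# Sketch of the dictionary approach described above: pad missing months
# with a sentinel price, then drop any ticker that contains the sentinel.
SENTINEL = -10000  # arbitrary placeholder for a missing observation

def yearly_prices(rows, months=12):
    """Build {ticker: price vector}, padded to `months` entries with SENTINEL."""
    prices = {}
    for ticker, price in rows:
        prices.setdefault(ticker, []).append(price)
    for vec in prices.values():
        vec += [SENTINEL] * (months - len(vec))  # equalize vector lengths
    return prices

def complete_tickers(prices):
    """Tickers whose price vector has no missing (sentinel) observation."""
    return {t for t, vec in prices.items() if SENTINEL not in vec}

def common_tickers(yearly_dicts):
    """Intersect the per-year sets of complete tickers across all years."""
    return set.intersection(*(complete_tickers(p) for p in yearly_dicts))
```

In practice, the per-year `rows` would be loaded from the CRSP extract, and
`common_tickers` yields the stocks with valid prices in every year.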

There must be a more elegant solution, or a package dedicated to this issue,
which I plan to study later.

An interim solution is to arrange the data manually. This method only applies
to a relatively small number of stocks, which is why I chose 100 stocks and
processed only two consecutive years of data, including 2011. I simply chose by
hand the very first 100 stocks that have no missing values in those years.
From this point forward, I did everything I could in R and a spreadsheet to
produce a dataframe. By running the R code called Lizi_project.do with the
input data called Dataset2.csv, I produce an output that looks like this:
Note that the Spearman rank correlation coefficient is negative, which means
the momentum strategy is unlikely to work, at least not in this specific
context, where I use one past year's return as the only anomaly/predictor of
the next year's return.
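For a self-contained illustration of that coefficient, here is a minimal
Spearman computation on made-up return vectors (not the values in
Dataset2.csv); it assumes no tied returns, while `scipy.stats.spearmanr` would
handle the general case:

```python
# Spearman rank correlation between past-year returns (momentum) and
# next-year returns, assuming no ties among the values.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Made-up data where high past return always pairs with low next return:
past_returns = [0.30, 0.12, -0.05, 0.25, -0.10]
next_returns = [-0.08, 0.02, 0.10, -0.03, 0.15]
print(spearman(past_returns, next_returns))  # -1.0: perfectly anti-monotonic
```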

Next, I export this dataframe to an Excel spreadsheet, because I cannot proceed
in R and therefore turn to Python.

Open the file called ProblemA_spreadsheet_real.py to solve Problem A, where the
input is the real two-consecutive-year stock return data I just computed in R,
and the output is the stocks' next-year returns sorted according to their
momentum. One way to verify this is to run the code on the artificial data
stored in a file called test.xlsx. The code is written so that everything is
automated. The output will look like this:
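The core of that sorting step can be sketched as follows; the
`(ticker, past_return, next_return)` row layout is an assumption, not the
actual spreadsheet columns:

```python
# Sort stocks by momentum (past-year return), highest first, carrying
# the next-year returns along in the same order. Illustrative data only.
rows = [
    ("AAA", 0.30, -0.08),   # (ticker, past-year return, next-year return)
    ("BBB", -0.10, 0.15),
    ("CCC", 0.12, 0.02),
]

by_momentum = sorted(rows, key=lambda r: r[1], reverse=True)
next_year_sorted = [r[2] for r in by_momentum]
print(next_year_sorted)  # next-year returns ordered from high to low momentum
```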
Next, given a dataframe that stores the next year's returns, I divide the
stocks into 5 groups, form the portfolios, and compute the return of each
portfolio. This is done by running the code called
ProblemB_spreadsheet_real.py. Everything is automated.
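A minimal sketch of that grouping step (not the code in
ProblemB_spreadsheet_real.py; it assumes the number of stocks divides evenly
into the groups and uses equal weights):

```python
# Split momentum-sorted next-year returns into n_groups portfolios and
# compute each portfolio's equal-weighted return.
def portfolio_returns(next_year_sorted, n_groups=5):
    n = len(next_year_sorted)
    size = n // n_groups  # assumes n is divisible by n_groups
    groups = [next_year_sorted[i * size:(i + 1) * size] for i in range(n_groups)]
    return [sum(g) / len(g) for g in groups]
```

With 100 stocks, each of the 5 portfolios holds 20 names.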

The output looks like this:


Then simply compute the sum of the vectors of stock returns, go long the bottom
portfolio and short the top, and the output looks like this:

Note that the order is reversed, so the top portfolio is the one holding the
stocks with the lowest momentum.
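Given the five portfolio returns in this reversed order (index 0 = lowest
momentum), the hedge return is a single subtraction; the numbers below are
made up for illustration:

```python
# Hedge portfolio: long the bottom group (highest momentum) and short
# the top group (lowest momentum). With the reversed ordering, index 0
# is the lowest-momentum portfolio and index -1 the highest-momentum one.
def hedge_return(portfolio_rets):
    return portfolio_rets[-1] - portfolio_rets[0]

# Made-up portfolio returns, ordered from lowest to highest momentum:
rets = [0.05, 0.02, 0.00, -0.01, -0.03]
print(hedge_return(rets))  # negative, as in the one-year result above
```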
In conclusion, this momentum strategy is unlikely to work over a very short
period of time. The only way to establish its significance is to run the
computation in a rolling fashion, which is left to future work. The real
challenge, as it turns out, is not so much coding the momentum strategy itself
as cleaning the data, specifically the matching and merging. Realistic
empirical research almost always involves matching and merging, so I need to
master it in at least both R and Python.
