Install the package by running pip install wrds in your terminal on Mac or in the
Anaconda Prompt on PC.
import wrds
2. Establish Connection
You will need your WRDS username and password (yes, the ones you use to log on to
the WRDS website) to establish the connection. When you run this code, you will be
prompted to enter them. In the code below, I name the WRDS connection "conn", and
you will see this name referred to repeatedly in later parts of this exercise. Of
course, you can give it any other name you like.
conn = wrds.Connection()
If you see the following message after running the code above, it means you've
successfully established the connection!
3. Explore Your Library and Table
SAS programmers tend to refer to databases on WRDS as "libraries" and to the
specific data items within a database as "datasets". In Python, we refer to
these as "libraries" and "tables". For example, the pricing database CRSP is a
library, and the monthly stock file CRSP.MSF is a table.
conn.list_libraries()
The output is a list containing all the libraries your institution subscribes to.
To explore the datasets under any particular database, or library, use the
list_tables() function. You need to specify which library's tables you want to
display by passing the library parameter to list_tables(). Below is an example
that lists all datasets/tables under the Compustat library.
conn.list_tables(library='comp')
The output from this command is again a list, this time containing all tables
under the Compustat library.
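Because list_tables() returns a plain Python list, it is easy to search. The sketch below filters the table names of a library by keyword; find_tables() is a hypothetical helper written for this illustration and is not part of the wrds package.

```python
def find_tables(conn, library, keyword):
    """Return the tables in `library` whose names contain `keyword`.

    `conn` is assumed to be an open wrds connection (or any object with a
    compatible list_tables method). The match is case-insensitive.
    """
    tables = conn.list_tables(library=library)
    return sorted(t for t in tables if keyword.lower() in t.lower())

# Usage with the connection created earlier:
#   find_tables(conn, "comp", "fund")
#   conn.describe_table(library="comp", table="funda")  # column names and types
```

Once a table of interest is found, the package's describe_table() method lists its columns, which helps when choosing what to extract in the next part.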
4. Query Data from WRDS Server
There are several ways to extract data from WRDS. I will go over three scenarios,
from the most straightforward get_table() method to the more sophisticated
raw_sql() method, which accommodates conditioning statements as well as merging
several data sources.
4.1 get_table() method
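A minimal sketch of a get_table() call on comp.company, limited to the first 5 rows. It is wrapped in a hypothetical helper, get_company_preview(), so that it works with any open connection; the helper name and the variable name in the usage line are assumptions made for illustration.

```python
def get_company_preview(conn):
    """Fetch the first 5 rows of comp.company as a DataFrame.

    `conn` is assumed to be an open wrds connection.
    """
    # library and table together identify comp.company;
    # obs=5 caps the number of rows returned.
    return conn.get_table(library="comp", table="company", obs=5)

# Usage with the connection created earlier:
#   company = get_company_preview(conn)
```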
In the example above, we get the comp.company table by specifying the library
name (comp) and the table name (company). The last argument, obs=5, means that
only the first 5 rows are extracted.
Instead of getting the entire wide comp.company table, you can also narrow the
extract down to specific columns of the source table.
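A sketch of the narrower extract, using the columns argument of get_table(). The helper name get_company_narrow() is hypothetical; the output name company_narrow follows the text.

```python
def get_company_narrow(conn):
    """Fetch 3 selected columns from the first 5 rows of comp.company.

    `conn` is assumed to be an open wrds connection.
    """
    return conn.get_table(
        library="comp",
        table="company",
        columns=["conm", "gvkey", "cik"],  # only these columns are returned
        obs=5,                             # only the first 5 rows
    )

# Usage with the connection created earlier:
#   company_narrow = get_company_narrow(conn)
#   company_narrow.shape   # expected to be 5 rows by 3 columns
```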
Here we impose additional slicing by naming the columns we would like to extract:
['conm', 'gvkey', 'cik']. The output dataframe company_narrow contains 5 records
and 3 columns, as expected.
4.2 raw_sql() method for subsetting data
If, instead of querying the entire table, you want to impose a "where" condition to
subset the data, the raw_sql() method provides that functionality. The syntax is
fairly standard SQL.
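A sketch of such a query against crsp.msf, with one condition on permno and one on the date. The selected columns (prc, ret, shrout) and both helper names are assumptions made for this illustration. The query text is built in a separate function so that it can be inspected before it is sent to the server.

```python
from datetime import datetime

def build_msf_query(permno, start_date):
    """Build a SQL string that subsets crsp.msf by stock and start date.

    start_date is expected in MM/DD/YYYY form, e.g. '01/01/2019'.
    """
    permno = int(permno)                       # raises if not an integer id
    datetime.strptime(start_date, "%m/%d/%Y")  # raises ValueError if malformed
    return (
        "select permno, date, prc, ret, shrout "  # column choice is an assumption
        "from crsp.msf "
        f"where permno = {permno} "
        f"and date >= '{start_date}'"
    )

def get_msf_subset(conn, permno=14593, start_date="01/01/2019"):
    """Run the query on an open wrds connection, parsing 'date' as datetime."""
    return conn.raw_sql(build_msf_query(permno, start_date), date_cols=["date"])

# Usage with the connection created earlier:
#   apple = get_msf_subset(conn)
```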
The example above extracts several columns from the crsp.msf table and imposes two
conditions: permno=14593 zooms in on one particular stock, and the date range
condition narrows the output to the most recent data, starting in January 2019.
Notice the last argument in the code, date_cols=['date']. It tells raw_sql() to
return the "date" column in datetime format.
4.3 raw_sql() method for joining multiple datasets
So far we have only queried data from a single data source. If a researcher needs
to query data from multiple data sources, the raw_sql() method offers this
capability. For SAS programmers, this is essentially the "proc sql" procedure,
except that it is now actual SQL.
I demonstrate this using the example below. This exercise creates a dataframe
apple_fund that reports a fiscal-year-end balance sheet item (total assets, at) and
the closing price (prccm) by joining two different data sources: fundamental data
from comp.funda (annual fundamentals) and pricing data from comp.secm (monthly
pricing data).
# Note: the select list is reconstructed from the description (at, prccm, join keys)
apple_fund = conn.raw_sql("""
select a.gvkey, a.iid, a.datadate, a.tic, a.conm, a.at, b.prccm
from comp.funda a
inner join comp.secm b
on a.gvkey = b.gvkey
and a.iid = b.iid
and a.datadate = b.datadate
where a.tic = 'AAPL'
and a.datadate>='01/01/2010'
and a.datafmt = 'STD'
and a.consol = 'C'
and a.indfmt = 'INDL'
""", date_cols=['datadate'])
We join the two data sources by matching on the common keys, in this case the
gvkey+iid combination and the fiscal date. The conditioning statements are part of
the SQL statement: they specify the company ticker, the date range, and several
other Compustat data requirements. Last but not least, the code indicates that the
column 'datadate' should be formatted as a date.
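To see what the inner join on gvkey, iid and datadate does, the same join logic can be tried locally on toy tables. The sketch below uses Python's built-in sqlite3 module rather than the WRDS server. The two tables only mimic the shape of comp.funda and comp.secm, and every value in them is a made-up placeholder, not real Compustat data.

```python
import sqlite3

# In-memory database with two toy tables (placeholder values only)
db = sqlite3.connect(":memory:")
db.executescript("""
    create table funda (gvkey text, iid text, datadate text, at real);
    create table secm  (gvkey text, iid text, datadate text, prccm real);
    insert into funda values ('000001', '01', '2019-09-30', 100.0);
    insert into funda values ('000001', '01', '2020-09-30', 110.0);
    insert into secm  values ('000001', '01', '2019-09-30', 10.0);
    insert into secm  values ('000001', '01', '2019-10-31', 11.0);
""")

# Same join keys as in the WRDS query: gvkey, iid and datadate
rows = db.execute("""
    select a.gvkey, a.datadate, a.at, b.prccm
    from funda a
    inner join secm b
      on a.gvkey = b.gvkey
     and a.iid = b.iid
     and a.datadate = b.datadate
    order by a.datadate
""").fetchall()
db.close()

# Only the 2019-09-30 record exists in both tables, so it is the only row kept
print(rows)
```

An inner join drops every record that lacks a match on all three keys. In the WRDS query this is what pairs each fiscal year end with the closing price of that same month end.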
5. Saving Your Output
Now that you've run some simple Python code against the WRDS server, you might
wonder how to save your output to your local computer. The pandas package supports
several output formats:
pickle for further python work
csv or excel
even SAS data format!
import pandas as pd
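A minimal sketch of saving a dataframe such as apple_fund in the first two formats, with Excel as an option. save_outputs() is a hypothetical helper written for this illustration, and the folder and file names in the usage lines are placeholders. to_pickle(), to_csv() and to_excel() are standard pandas DataFrame methods; to_excel() additionally needs an Excel engine such as openpyxl to be installed. SAS output is not covered here.

```python
from pathlib import Path

def save_outputs(df, out_dir, stem, excel=False):
    """Save `df` under out_dir as <stem>.pkl and <stem>.csv (optionally .xlsx).

    Returns a dict mapping each format name to the path that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = {"pickle": out_dir / f"{stem}.pkl", "csv": out_dir / f"{stem}.csv"}
    df.to_pickle(paths["pickle"])        # keeps dtypes, best for further Python work
    df.to_csv(paths["csv"], index=False)

    if excel:
        paths["excel"] = out_dir / f"{stem}.xlsx"
        df.to_excel(paths["excel"], index=False)  # requires e.g. openpyxl
    return paths

# Usage with the dataframe created earlier (folder name is a placeholder):
#   paths = save_outputs(apple_fund, "output", "apple_fund")
#   apple_again = pd.read_pickle(paths["pickle"])
```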
6. Ending Note
Hopefully by now you feel comfortable finding your Python way through the data
jungle on the WRDS server. You can click the "Full Code" button to download the
entire code in ipynb format. I will elaborate on some of the basic Python analytics
techniques in the next section.