Benchmarks
- Where did you get the benchmark data for the Dow and the Russell 2000/3000? We could not locate the data on CRSP. I have data downloaded from Bloomberg that is similar to the benchmark, however the data runs from (12/31-12/31) rather than (1/01-1/01), so I was wondering where you got Dow and Russell so I can calibrate exactly to your data.
- I wonder whether the Dow Jones annual return data come from WRDS. I wasn't able to find it under either CRSP or Compustat. If it's not, could you please point out a direction of where we should be looking for the data? You have suggested us to go to the Dow Jones official site for total return data. The Dow Jones Total Return on the site is only available in daily, not yearly, and it only goes back to 1987. See http://www.djaverages.com/?go=industrial-index-data
- We are having difficulty finding data for Russell and NASDAQ as well... It seems the only place that provides market weighted vs value weighted data is WRDS. Yet we could not find any data of Russell or NASDAQ.
- Additionally, how do we normalize data for the year, if we are able to get it.
- I've been exploring WRDS but with the exception of NYSE/AMEX, I can't find any information summarized by index with which to construct benchmarks. Do you intend for us to identify the membership of each index by year (86 years), look up the permno's for each stock, and pull this information from CRSP for each of the DJIA, SP500, NYSE/AMEX, RUT3000, RUT2000? If that is your intention, can you please tell me if you will be assigning this amount of work for each of the homeworks? We did something similar for just the 10 dogs of the Dow over 30 years in Stat 686. It took a very long time and was considered a project level undertaking. I've added 482 on top of an already heavy load because I'm interested in the topic. So I'm eager to know if I should be auditing the class instead of taking it.
- How do we get the master list of "permno" of each company we need the data for. I was thinking of taking the list of "permno" and putting it in the CRSP/CRSP-Merged to get the annual variables for each of the companies(public and private).
Public vs. Private Companies on CRSP, Compustat, and CRSP/Compustat Merged
- First, we noticed in one of the great ones that they excluded OTC stocks from their public data set. However, aren't companies traded OTC still technically public and should therefore be included?
- When we query Compustat for all companies, we get ~10,000 observations (only ~300 are private when we look at exchange codes of 0,1 and 3), and when we query CRSP-Compustat for all companies, we get ~5300 observations (none of which are private.)
Fiscal Year Analysis
- The instructions say to find the count and proportions of companies with market cap over the median and mean for each year. However, by definition, exactly half of the companies are going to have market cap over the median each year. Can you please clarify this instruction?
- We were wondering what sort of 'conclusions' could be made in relation to the coverage analysis. Are you looking for general comparisons between coverage of publicly and privately traded companies? Or rather, an examination between possible differences between coverage of different privately traded companies and then the same for publicly traded companies? (Or both?) Do you also want us to include a discussion about how CRSP/Compustat collects data (such as linking options as seen in WRDS) as it relates to coverage?
- What is the difference between the FYEAR and the DATADATE. When we extract the year from the datadate, it is not always the same as the fyear. Why is this? When examining across time periods, which should we use?
- How are Fiscal Year End and Current Fiscal Year End Month distinct?
- In the assignment it says that we should expect a few hundred private companies per year, but according to our data we are getting less than 10 per year. When I downloaded the data from WRDS, I kept most of the defaults (except for the date variable option). Should we change one of these options? Or is there another exchange code that is private?
- How do we calculate the returns from the data? We are thinking we would calculate the Price adjusted and calculate the yearly return as: Yearly Return = [Price Adjusted(t) - Price Adjusted(t-1)] / Price Adjusted(t-1)
- For 682 pt a: Could you elaborate on what you meant by "bootstrap in order to determine the standard errors of the median"? Do you mean for us to perform random sampling with replacement on the returns data and then apply the 'Studentized Bootstrap'?
- For 682 pt b: Would you recommend for us to pull data comparing how filing times have fluctuated across different 10-year spans in order to affirm or reject the null hypothesis. For example if we intended to look at the period for 2006-2016 and found that filing times did improve, should we also look at the period from 1995-2005 or 2000-2010 to determine if the changes from 2006-2016 were significant?
- How do I compare distributions?
Financial Data Distribution I
- The instructions use the abbreviations TA and CA in various definitions; however, based on Chapter 4 in QFA and the CRSP/Compustat terms, it seems as though ACT (Current Assets - Total) works in place of both TA and CA. Am I understanding this correctly?
- Do you want three separate portfolios based on each variable or can we do one portfolio based on a combination of our three variables? We have already constructed a portfolio that ranks our companies by each variable at different levels. For instance, we used market cap as our primary variable, then sorted by dividend yield as our secondary variable, and then sorted by price/sales as our third variable. Furthermore, we discussed why we ranked each variables in this way.
WRDS/CRSP Variables
- In the description of HW #3 you say that market cap should be readily available however I am not seeing it on Crsp merged database. I have found a variable called "Market Value" but I do not think this is what I want. If I cannot find the variable should I just multiply "common shares outstanding" by "closing annual price"? I am reluctant to do this though because you said to avoid using price data in our calculation.
- Regarding CFO, "CFOPS" does not appear in the CCM variable list. What should we use for this?
- Regarding CFO, we used XOPR, would that be best?
- The data is improperly formatted. The data entered for fiscal year end and actual filing data is not recognized as a date. Because of this the functions that calculate the amount of days in between 2 dates do not work. There are about 2000 observations for each of the 16 years, thus I think it's a little unreasonable for me to go through and manually calculate this for each observation. I'm not really sure what I should do.
WDB Chapter 4 problems
- In the given income statement for problem 4.1, there is a balance for "Operating Depreciation" that is listed just below Cost of Goods Sold, and is thus incorporated into the listed Gross Income (which is before SG&A expenses). Is this depreciation the classic depreciation expense that is associated with Plant, Property, and Equipment? Or is this referring to depreciation of currently held inventory? Because we are requested to calculate some fixed asset ratios that ask for annual depreciation, we assumed that the listed "Operating Depreciation" refers to Plant, Property, and Equipment. However, I was taught this kind of depreciation is listed outside of the Gross Profit, which makes me think the depreciation applies to Inventory, not PP&E. Can you please clarify what this "Operating Depreciation" applies to?
- In 4.3, first, the only way we can interpret the problem so that it makes sense is that the numbers listed are in thousands, so 50,000 means 50 million. Please let us know if this is not the case. For part b, we are required to reconcile the firm's tax payment. However, even after adjustments, to me it seems like the company has a negative tax liability, meaning it won't have a tax payment and would be adding a deferred tax asset. But I am not so sure the book intends us to get that deep into the accounting. So would demonstration that the company owes 0 taxes a sufficient answer? Alternatively, for part a, the problem mentions copyrights owned by the company are conservatively appraised at $100 million, while listed on the balance sheet at $10 million. Because the problem asks for adjusted financial statements, to me this means we increase assets for the full value, and add a corresponding increase to owners equity. If, however, this increase is not supposed to affect owners equity and instead be recogniz
Volatility Deciling
- What kind of returns are we using? Are we pulling these directly from CRSP database or are we calculating these? CRSP has Equal-Weighted and Value-Weighted Returns. Which should we use?
- When calculating annualized historical volatilities for all CRSP stocks by year for 1980-present, are we calculating the standard deviation using all DAILY returns in each year, and then get annualized volatility for EACH year for EACH stock? If yes, from what I collected from CRSP, I got over 1.5GB of data. Or am I not obtaining the right data? Or are we using the ANNUAL returns throughout this 1980-2017 time period to obtain the standard deviation, and then get the annualized volatility for EACH year for EACH stock?
- Are we using the NYSE/AMEX/NASDAQ file or individual NYSE, AMEX, NASDAQ files for the calculation of individual stock volatilities?
Pitroski
- We are supposed to filter by market cap and trading volume. The trading volume is not in CCM, but it is in CRSP, in both the daily and monthly files. Do you want us to use the monthly stock file, then multiply price by volume, divide by 30, and then merge back into our data from CCM?
Backtesting
- Which one exactly is the result we are supposed to beat? The "value decile intersection, 50 stocks" returns, or what?
- Do we have to beat him on total return since 1963? Or by 10-yr returns, (average or compound)? Sharpe ratio?
- Do we have any restriction on the number of stocks in the portfolio?
- Are we rebalancing on an annual basis?
CAPM
- The CAPM homework asks us to select 20 stocks from the Table D-1 ("Select 20 stocks from the list in QFA Table D-1"). However, Table D.1 on pg 585 is the Most Popular Stocks of 2017 and includes stocks like Alibaba, Amazon, Apple, Chipotle, Facebook, etc. Given the analysis is supposed to be from 1970-Present, is this the correct list?
- Are we required to use Carhart's 4-Factor model as well, or just CAPM and Fama-French 3-Factor? You mention the 4-factor model in the assignment, but I couldn't tell if it was a suggestion for extra work or a requirement of the homework.
- What are we supposed to use as the risk free rate for the stocks?
Final project
- Where is the assignment posted?
- How much flexibility do we have in designing our project?
- Which one exactly is the result we are supposed to beat? The "value decile intersection, 50 stocks" returns, or what?
- Do we have to beat him on total return since 1963? Or by 10-yr returns (average or compound)? Sharpe ratio?
- Do we have any restriction on the number of stocks in the portfolio?
- Are we rebalancing on an annual basis?
Where did you get the benchmark data for the Dow and the Russell 2000/3000? We could not locate the data on CRSP. I have data downloaded from Bloomberg that is similar to the benchmark, however the data runs from (12/31-12/31) rather than (1/01-1/01), so I was wondering where you got Dow and Russell so I can calibrate exactly to your data.
Our data goes from FTDOY to LTDOY, usually 1/2/xx through 12/31/xx, so there should be very little difference between our results, although this sort of precision from you is what I was hoping for. If you check, I'll bet Bloomberg is really based on LTDOY, BTW. Indeed, RUT3000/RUT2000 are not on CRSP; the Dow is, but it does not go back very far. Our data comes from another service like Bloomberg which accounts for all dividends, including mergers/spinoffs and liquidating dividends. I would be very interested in seeing how these compare with the Bloomberg data. I don't think Bloomberg goes back much past 1985, but perhaps you found that it did. You can also get the data directly from Russell.
I wonder whether the Dow Jones annual return data come from WRDS. I wasn't able to find it under either CRSP or Compustat. If it's not, could you please point out a direction of where we should be looking for the data? You have suggested us to go to the Dow Jones official site for total return data. The Dow Jones Total Return on the site is only available in daily, not yearly, and it only goes back to 1987. See http://www.djaverages.com/?go=industrial-index-data
Unfortunately, CRSP is making it harder to get Dow data; their Dow index only goes back to 1987, and only goes forward to 2007 for some reason; I suppose they stopped providing it. What's worse, YAHOO no longer allows you to download all historical data in a spreadsheet. However, the Dow website does have data for the regular DJIA and the total return, so take that back as far as you can. You'll have to use daily returns to build up the annual return; you could do a sensitivity analysis to see how the daily granularity affects the end-of-year return, but that is not required.
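Building the annual return up from daily returns just means compounding them. A minimal Python sketch (the daily series here is made up for illustration, not real Dow data):

```python
# Compound a year of daily simple returns into one annual return.
# 'daily_returns' is a hypothetical list of simple daily returns (0.001 = 0.1%).
def annual_return(daily_returns):
    growth = 1.0
    for r in daily_returns:
        growth *= (1.0 + r)   # link each day's gross return
    return growth - 1.0

# Example: 252 trading days of a flat 0.04% daily return compounds to ~10.6%.
print(round(annual_return([0.0004] * 252), 4))
```

The same linking works for any index whose site only publishes daily total-return levels.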
We are having difficulty finding data for Russell and NASDAQ as well... It seems the only place that provides market weighted vs value weighted data is WRDS. Yet we could not find any data of Russell or NASDAQ.
WRDS provides NASDAQ data back through 1987 or so; YAHOO provides the R2K through 1987, and NASDAQ composite data back to 1971. You might have to get total return data from the NASDAQ or Russell websites. You don't have to include every index to do well on this project. You do need a good benchmark for the total stock market, hence we use NYSE/AMEX, maybe the Russell 3000, and also popular indexes such as the Dow and S&P500. Index data from providers such as Dow, S&P, and Russell which includes dividends is normally called "Total Return," and total return CAGRs can be computed directly from these.
Additionally, how do we normalize data for the year, if we are able to get it.
I'm not sure we do any "normalizing" on our time series data; the observations are just a variable level, a relative change, a category, etc., seen over time.
I've been exploring WRDS but with the exception of NYSE/AMEX, I can't find any information summarized by index with which to construct benchmarks. Do you intend for us to identify the membership of each index by year (86 years), look up the permno's for each stock, and pull this information from CRSP for each of the DJIA, SP500, NYSE/AMEX, RUT3000, RUT2000? If that is your intention, can you please tell me if you will be assigning this amount of work for each of the homeworks? We did something similar for just the 10 dogs of the Dow over 30 years in Stat 686. It took a very long time and was considered a project level undertaking. I've added 482 on top of an already heavy load because I'm interested in the topic. So I'm eager to know if I should be auditing the class instead of taking it.
Thanks for the question, fear not! You just need to get annual stock market index returns, available from numerous sources. In CRSP there are the S&P and the Dow, as well as those you mentioned.
How do we get the master list of "permno" of each company we need the data for. I was thinking of taking the list of "permno" and putting it in the CRSP/CRSP-Merged to get the annual variables for each of the companies(public and private).
Actually, all this work would be unnecessary. Just select the "Search Entire Database" option for the screen input. This will pull all PERMNOs (or GVKEYs, depending on whether you are in CRSP or Compustat). For the fiscal year exercise, you should be in Compustat. You will then be provided whatever data variables you requested for ALL companies.
First, we noticed in one of the great ones that they excluded OTC stocks from their public data set. However, aren't companies traded OTC still technically public and should therefore be included?
The projects in the "great ones" are not there because they nailed every aspect, but rather to give a sense of what should be done. We limit ourselves to publicly traded stocks, which includes OTC and BB stocks.
When we query Compustat for all companies, we get ~10,000 observations (only ~300 are private when we look at exchange codes of 0,1 and 3), and when we query CRSP-Compustat for all companies, we get ~5300 observations (none of which are private.)
There are approximately 14,000 publicly traded companies in the US and Canada. Some of this discrepancy can be attributed to data cleaning. However, CRSP and Compustat also operate in different universes. Full details from WRDS attached. (Sorry for the terrible formatting -- I have no idea how to use my office Mac.)
In this class we do not consider OTC securities tradable, but they do represent public companies. For the purposes of this assignment you may either: (1) Remove them from your analysis, and proceed with the assignment as written; or (2) Amend your assignment to a consideration of "tradable" and "non-tradable."
Both solutions are acceptable. When making your choice and doing this assignment, it is important to consider why companies traded on major exchanges might have a different distribution of fiscal years ends than thinly traded or private companies, and you should comment on this in your write-up.
The instructions say to find the count and proportions of companies with market cap over the median and mean for each year. However, by definition, exactly half of the companies are going to have market cap over the median each year. Can you please clarify this instruction?
Well, for the median the proportion will be as you say, but the rest of the statistics will be different. You could handle the median proportion in the text; plotting the counts WILL show variability.
We were wondering what sort of 'conclusions' could be made in relation to the coverage analysis. Are you looking for general comparisons between coverage of publicly and privately traded companies? Or rather, an examination between possible differences between coverage of different privately traded companies and then the same for publicly traded companies? (Or both?) Do you also want us to include a discussion about how CRSP/Compustat collects data (such as linking options as seen in WRDS) as it relates to coverage?
Coverage analysis is intended to inform you as to the availability of data. Variables with poor coverage should not be used, and alternates should be considered. You should investigate the CCM linking options, but its analysis is not required for this homework assignment.
What is the difference between the FYEAR and the DATADATE. When we extract the year from the datadate, it is not always the same as the fyear. Why is this? When examining across time periods, which should we use?
Usually the student should make a download and determine from the data what is going on. FYEAR indicates the fiscal year that the company is in at the time of DATADATE. When using the annual Compustat it is pretty simple to understand: DATADATE gives the close of the annual fiscal period; in the case of MSFT (June FY) this might be 6/30/2010, and the FYEAR is 2010. In a more granular download (such as quarterly) we find DATADATE/FYEAR as:
DATADATE FYEAR FQTR
20100331 2010 3
20100630 2010 4
20100930 2011 1
20101231 2011 2
20110331 2011 3
20110630 2011 4
You might then want to use PRCC_F to get the closing price for the fiscal year.
How are Fiscal Year End and Current Fiscal Year End Month distinct?
This can be determined by examining the data. FYR and FYRC appear to be identical; one would need to go to the Compustat documentation to see under what conditions they differ.
In the assignment it says that we should expect a few hundred private companies per year, but according to our data we are getting less than 10 per year. When I downloaded the data from WRDS, I kept most of the defaults (except for the date variable option). Should we change one of these options? Or is there another exchange code that is private?
Exchange codes are found on WRDS and also, for your convenience, on Canvas. If you are not getting any exchange code 0-3 companies, you are probably using CRSP/Compustat Merged (CCM) rather than Compustat data. Make sure you are using Compustat; you should typically be getting a couple hundred private companies per fiscal year.
How do we calculate the returns from the data? We are thinking we would calculate the Price adjusted and calculate the yearly return as: Yearly Return = [Price Adjusted(t) - Price Adjusted(t-1)] / Price Adjusted(t-1)
Your formula is correct for no dividends, although I could state it more simply as:
Yearly Return = Price Adjusted(t) / Price Adjusted(t-1) - 1. For total return you would need to include the adjusted dividends, e.g. Yearly Return with Dividends = [Price Adjusted(t) + Dividends Adjusted(t)] / Price Adjusted(t-1) - 1, where Dividends Adjusted = DVPSX_F / ADJEX_F.
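The total-return formula above as a one-liner, with hypothetical adjusted prices and dividend (in practice the adjusted dividend would come from DVPSX_F / ADJEX_F):

```python
# Total return including the adjusted dividend received during the year.
def total_return(price_adj_t, price_adj_prev, div_adj_t=0.0):
    return (price_adj_t + div_adj_t) / price_adj_prev - 1.0

# $100 -> $108 plus a $2 adjusted dividend is a 10% total return.
print(round(total_return(108.0, 100.0, 2.0), 4))
```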
For 682 pt a: Could you elaborate on what you meant by "bootstrap in order to determine the standard errors of the median"? Do you mean for us to perform random sampling with replacement on the returns data and then apply the 'Studentized Bootstrap'?
Great question! As you know from your research into what the bootstrap is, it provides a way to get at the sampling variability of various statistics. The statistic in question here is the median. The general idea is, as you say, to perform random sampling with replacement and calculate the statistic B=5000 times to get a distribution for the statistic. The standard deviation of this distribution is the standard error of the statistic. Programming the bootstrap is relatively simple; see for example http://www.statmethods.net/advstats/bootstrapping.html
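The linked page shows an R version; here is the same idea as a minimal Python sketch, using made-up returns data:

```python
# Bootstrap standard error of the median: resample with replacement B times,
# take the median of each resample, then the standard deviation across resamples.
import random
from statistics import median, stdev

random.seed(42)
returns = [random.gauss(0.08, 0.20) for _ in range(200)]  # hypothetical annual returns

B = 5000
boot_medians = [
    median(random.choices(returns, k=len(returns)))  # one resample with replacement
    for _ in range(B)
]
se_median = stdev(boot_medians)  # bootstrap standard error of the median
print(round(se_median, 4))
```

For a normal sample this should land near the asymptotic value 1.2533 * sigma / sqrt(n); the Studentized bootstrap is a refinement on top of this basic scheme.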
For 682 pt b: Would you recommend for us to pull data comparing how filing times have fluctuated across different 10-year spans in order to affirm or reject the null hypothesis. For example if we intended to look at the period for 2006-2016 and found that filing times did improve, should we also look at the period from 1995-2005 or 2000-2010 to determine if the changes from 2006-2016 were significant?
Looking at several 10-year timeframes is a good idea and can be done by the industrious. All we are asking is that you pick a 10-year period in the past and compare.
How do I compare distributions?
It depends on the problem. For example, in comparing the fiscal-year distribution of private vs. public companies, some groups used the Chi-Squared test on the number of reports filed.
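A sketch of that Chi-Squared comparison, computing the statistic by hand on a 2 x k table of counts (the counts here are invented, not from CRSP/Compustat):

```python
# Chi-squared statistic for comparing two count distributions
# (e.g., number of filings by fiscal-year-end month, public vs. private).
def chi2_stat(obs_a, obs_b):
    k = len(obs_a)
    total_a, total_b = sum(obs_a), sum(obs_b)
    grand = total_a + total_b
    stat = 0.0
    for j in range(k):
        col = obs_a[j] + obs_b[j]
        for row_total, o in ((total_a, obs_a[j]), (total_b, obs_b[j])):
            e = row_total * col / grand       # expected count under independence
            stat += (o - e) ** 2 / e
    return stat  # compare against a chi-squared distribution with k-1 df

public  = [900, 50, 50]   # hypothetical counts: Dec, Jun, other FY-ends
private = [60, 25, 15]
print(round(chi2_stat(public, private), 2))
```

In practice you would hand the same table to a library routine (e.g., chisq.test in R) to get the p-value.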
The instructions use the abbreviations TA and CA in various definitions; however, based on Chapter 4 in QFA and the CRSP/Compustat terms, it seems as though ACT (Current Assets - Total) works in place of both TA and CA. Am I understanding this correctly?
Compustat/CCM variables have their own naming styles; that's why you have to work with the data and current financial statements to make sure you are using the correct variable name. ACT, I believe, is total current assets; total assets would be another variable. (TA includes non-current assets.)
Do you want three separate portfolios based on each variable or can we do one portfolio based on a combination of our three variables? We have already constructed a portfolio that ranks our companies by each variable at different levels. For instance, we used market cap as our primary variable, then sorted by dividend yield as our secondary variable, and then sorted by price/sales as our third variable. Furthermore, we discussed why we ranked each variables in this way.
For the Mini-Project we do not need to make portfolios at all, we are only trying to get the distribution of the data you might use in the final project.
In the description of HW #3 you say that market cap should be readily available however I am not seeing it on Crsp merged database. I have found a variable called "Market Value" but I do not think this is what I want. If I cannot find the variable should I just multiply "common shares outstanding" by "closing annual price"? I am reluctant to do this though because you said to avoid using price data in our calculation.
When you say that you found a variable called "Market Value", what was the variable? Was it MKVALT? Was it MKVALTQ? Did you download some data and see if it matched Yahoo Finance? This is sort of how you figure these out. I looked at MKVALT and saw that it appears to quote in terms of $M, so I think this is what you want.
Regarding CFO, "CFOPS" does not appear in the CCM variable list. What should we use for this?
You could use OANCF, Operating Activities - Net Cash Flow, in the cash flow items input section of the WRDS CCM GUI. This item represents the net change in cash from all items classified in the Operating Activities section on a Statement of Cash Flows. Of course you might notice that coverage might be poor for this variable depending on how far back you are checking, in which case you could use O'Shaughnessy's construct, IB + DP. This calculated CF will have much better coverage. A spot check for IBM and GM and other companies reveals that OANCF is missing before FY 1988. A full universe pull (23MB) from 1970-2016 shows 88% coverage for DP and 93% coverage for IB, with a calculated CF therefore exhibiting 93% coverage, versus only 60% coverage for OANCF.
Regarding CFO, we used XOPR, would that be best?
No, XOPR is total operating expenses; it represents the sum of Cost of Goods Sold (COGS) and Selling, General and Administrative Expenses (XSGA). It is an operating expense, not a cash flow item. When in doubt, always check against published 10-K's. When you do, you can see that XOPR is clearly not CFOPS, and that IB+DP is very clearly only an approximation. (Note there are differences between DP and DPC, where the latter is a cash flow item; to resolve the discrepancy you'd have to check the 10-K's to see exactly which is correct to use.) OANCF matches the 10-K data but not necessarily online sources such as YHOO and ADVFN, because they report TTM rather than published numbers, since those are more relevant for investors.
The data is improperly formatted. The data entered for fiscal year end and actual filing data is not recognized as a date. Because of this the functions that calculate the amount of days in between 2 dates do not work. There are about 2000 observations for each of the 16 years, thus I think it's a little unreasonable for me to go through and manually calculate this for each observation. I'm not really sure what I should do.
Welcome to the real world; this is how it is most of the time. If you are talking about dates such as 20011231, you have to parse them and turn them into dates! There are lots of ways to do this; in Excel you could use:
Say the raw CRSP date 19800522 is in cell E62:
E62 (raw CRSP date):  19800522
F62 (text formula):   =MID(TEXT(E62,0), 5, 2)&"/"&RIGHT(TEXT(E62,0),2)&"/"&RIGHT(LEFT(TEXT(E62,0),4),2)   gives 05/22/80
G62 (date value):     =VALUE(F62)   gives 05/22/80 as a real date
You could just make a single formula:
=value(MID(TEXT(E62,0), 5, 2)&"/"&RIGHT(TEXT(E62,0),2)&"/"&RIGHT(LEFT(TEXT(E62,0),4),2))
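If you are working outside Excel, the same parse is one line in Python, after which date arithmetic (e.g., the filing lag) just works; the two dates below are hypothetical:

```python
# Parse CRSP-style yyyymmdd integers into real dates, then subtract.
from datetime import datetime

def parse_crsp_date(yyyymmdd):
    return datetime.strptime(str(yyyymmdd), "%Y%m%d").date()

fye   = parse_crsp_date(20011231)  # fiscal year end
filed = parse_crsp_date(20020315)  # actual filing date
print((filed - fye).days)          # filing lag in days
```

Applied down a column of ~2000 observations per year, this removes any need for manual calculation.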
You might start maintaining a spreadsheet of these sort of utilities since they will continue to arise. This is true whether you go into business or academia.
In the given income statement for problem 4.1, there is a balance for "Operating Depreciation" that is listed just below Cost of Goods Sold, and is thus incorporated into the listed Gross Income (which is before SG&A expenses). Is this depreciation the classic depreciation expense that is associated with Plant, Property, and Equipment? Or is this referring to depreciation of currently held inventory? Because we are requested to calculate some fixed asset ratios that ask for annual depreciation, we assumed that the listed "Operating Depreciation" refers to Plant, Property, and Equipment. However, I was taught this kind of depreciation is listed outside of the Gross Profit, which makes me think the depreciation applies to Inventory, not PP&E. Can you please clarify what this "Operating Depreciation" applies to?
This is a case of thinking too hard. There is a large variety of phrases used in different companies' financial statements. Wynn is simple enough that it would be safe to assume that operating depreciation pertains to PP&E alone, especially since PPE is listed as net. All of us hope that our inventory does not stick around long enough for it to depreciate!
In 4.3, first, the only way we can interpret the problem so that it makes sense is that the numbers listed are in thousands, so 50,000 means 50 million. Please let us know if this is not the case. For part b, we are required to reconcile the firm's tax payment. However, even after adjustments, to me it seems like the company has a negative tax liability, meaning it won't have a tax payment and would be adding a deferred tax asset. But I am not so sure the book intends us to get that deep into the accounting. So would demonstration that the company owes 0 taxes a sufficient answer? Alternatively, for part a, the problem mentions copyrights owned by the company are conservatively appraised at $100 million, while listed on the balance sheet at $10 million. Because the problem asks for adjusted financial statements, to me this means we increase assets for the full value, and add a corresponding increase to owners equity. If, however, this increase is not supposed to affect owners equity and instead be recogniz
You are correct, the figures are stated in thousands. Showing zero tax liability is sufficient. In making adjusted financial statements, we are not asked to make current-value accounting statements, so knowing that the copyrights are worth $100M instead of $10M is nice, but this should not be included on the adjusted statements.
What kind of returns are we using? Are we pulling these directly from CRSP database or are we calculating these? CRSP has Equal-Weighted and Value-Weighted Returns. Which should we use?
We assume you mean parts (1 and 2) of the question pertaining to the CRSP index volatility. We're interested in index returns so you may use the CRSP-provided returns (i.e., DO NOT calculate the returns yourself). Since the EW indexes are hard to find in the real world, you should use VW (i.e., VWRETD or VWRETX; the dividends should not affect the volatility calculations very much).
When calculating annualized historical volatilities for all CRSP stocks by year for 1980-present, are we calculating the standard deviation using all DAILY returns in each year, and then get annualized volatility for EACH year for EACH stock? If yes, from what I collected from CRSP, I got over 1.5GB of data. Or am I not obtaining the right data? Or are we using the ANNUAL returns throughout this 1980-2017 time period to obtain the standard deviation, and then get the annualized volatility for EACH year for EACH stock?
You are calculating annualized volatilities from the daily stock returns for each year for each stock. Welcome to "Big Data"! 1.5GB is a typical file size (and not that large), although you could break the data into chunks too. R/Python/SAS should easily handle a file this size.
There is a variable in CCM (Fundamentals Annual) called OPTVOL (implied volatility of options, a perfectly good measure); you get one value calculated at the end of each fiscal year. Unfortunately, the coverage is terrible. So since the purpose of the exercise is to get all volatilities in order to do a backtest, we have to calculate from the daily returns (RET or RETX) for each stock for each year.
Here's a sample output for 2007 (all CRSP stocks):
Permno Volatility
1 10001 0.3088975
2 10002 0.3955540
3 10025 0.3877490
4 10026 0.3544119
5 10028 0.7179534
6 10032 0.4761090
7 10042 0.7790567
8 10044 0.2877138
9 10051 0.4303495
10 10065 0.1560827
...
6242 92284 0.5507326
6243 92340 0.8069427
6244 92399 0.4418432
6245 92583 0.7863388
6246 92655 0.2011622
6247 92663 0.5360361
6248 92690 0.2143214
6249 92807 0.3065137
6250 92874 0.2747127
6251 93105 0.9251976
Here is some rather inelegant code used to produce this (8 lines, I hope you can improve):
x2007 <- read.csv("c:/temp/482/data/crsp.2007.ret.csv",
                  na.strings = c("NA", "", "C", "B")) # treat CRSP letter codes as missing
calcVol <- function(x){
  r <- log(1 + x)             # daily log returns
  sd(r) * sqrt(length(x))     # annualize by number of trading days observed
}
tmp <- tapply(x2007$RET, as.factor(x2007$PERMNO), calcVol)
tmp <- tmp[!is.na(tmp)]       # drop NA volatilities
vol <- data.frame(Permno = as.numeric(names(tmp)), Volatility = as.numeric(tmp),
                  row.names = NULL)
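One possible tightening, using the same volatility definition: aggregate() with a formula interface collapses the grouping and cleanup into a single call. Note a small behavioral difference: the formula interface drops NA rows before grouping, so a stock with scattered missing days gets a volatility rather than NA.

```r
# Same calculation in fewer steps; aggregate(RET ~ PERMNO, ...) drops
# rows where RET is NA before grouping by PERMNO.
x2007 <- read.csv("c:/temp/482/data/crsp.2007.ret.csv",
                  na.strings = c("NA", "", "C", "B"))
vol <- aggregate(RET ~ PERMNO, data = x2007,
                 FUN = function(x) sd(log(1 + x)) * sqrt(length(x)))
names(vol) <- c("Permno", "Volatility")
```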
Are we using the NYSE/AMEX/NASDAQ file or individual NYSE, AMEX, NASDAQ files for the calculation of individual stock volatilities?
You are using the CRSP daily stock file, not the index files.
We are supposed to filter by market cap and trading volume. The trading volume is not in CCM, but it is in CRSP, in both the daily and monthly files. Do you want us to use the monthly stock file, then multiply price by volume, divide by 30, and then merge back into our data from CCM?
Ideally we would have a subscription to the Compustat/CCM quarterly data, which include average trading volume, but we do not, and merging between CRSP and CCM is much too difficult for this exercise. Instead, use CCM's CSHTR_F (shares traded per year) divided by 252 to get the average number of shares traded per day, and multiply by a representative price such as PRCC_F (fiscal-year closing price). This gives an average daily dollar volume ($AvgVol), which we want to be above $250,000, deflated by 2.5% per annum.
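A sketch of that screen (the column names CSHTR_F, PRCC_F, and FYEAR follow CCM Fundamentals Annual; the 2017 base year and the direction of the deflation are assumptions, so verify against the assignment):

```r
# Hypothetical CCM extract `ccm` with CSHTR_F (annual shares traded),
# PRCC_F (fiscal-year closing price), and FYEAR (fiscal year).
ccm$AvgVol <- ccm$CSHTR_F / 252 * ccm$PRCC_F     # avg daily dollar volume
base   <- 2017                                   # assumed base year
cutoff <- 250000 / 1.025^(base - ccm$FYEAR)      # deflate 2.5%/yr going back
ccm    <- ccm[!is.na(ccm$AvgVol) & ccm$AvgVol > cutoff, ]
```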
Which one exactly is the result we are supposed to beat? The "value decile intersection, 50 stocks" returns, or what?
You're not really trying to beat O'Shaughnessy, just the market benchmarks of your choice, such as the SP500, AMEX/NYSE, etc. We do recommend trying to understand his results and statistics, which you should incorporate into your report.
Do we have to beat him on total return since 1963? Or by 10-yr returns (average or compound)? Sharpe ratio?
Of course the further back you go the more stable the results will be, but things also change! You should have a backtest period going back at least to 1975, but much past that might not be all that helpful if you are really trying to do something going forward.
Do we have any restriction on the number of stocks in the portfolio?
You should employ a manageable number of stocks, such as 20, 30, or 50, unless you are proposing some high-frequency trading system, which is not really in the spirit of this course.
Are we rebalancing on an annual basis?
Yes; assume a tax-deferred account such as an IRA.
The CAPM homework asks us to select 20 stocks from the Table D-1 ("Select 20 stocks from the list in QFA Table D-1"). However, Table D.1 on pg 585 is the Most Popular Stocks of 2017 and includes stocks like Alibaba, Amazon, Apple, Chipotle, Facebook, etc. Given the analysis is supposed to be from 1970-Present, is this the correct list?
Yes, the list is correct. The assignment only requires that you depict the beta over time back to 1970 for ONE stock of your choosing. If you are doing the minimum on this project, all you need is enough data to regress 100-200 observations (daily or weekly) for 21 stocks; most companies in Table D-1 should have data going back this far. If not, the assignment also suggests using Table D-2. In this case, a very brief discussion of why you had to use Table D-2 is all you need to provide.
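A minimal sketch of the beta-over-time piece, assuming you have already built aligned vectors of weekly excess returns for the stock (ret) and the market (mkt); the 150-observation rolling window is illustrative, not prescribed:

```r
# Rolling CAPM beta: regress the stock's excess returns on the market's
# excess returns over a sliding window; the slope is the beta estimate.
rollBeta <- function(ret, mkt, window = 150) {
  stopifnot(length(ret) == length(mkt))
  sapply(window:length(ret), function(i) {
    j <- (i - window + 1):i
    coef(lm(ret[j] ~ mkt[j]))[2]   # slope coefficient = beta
  })
}
```

Plotting the resulting series against the window end dates gives the "beta over time" picture the assignment asks for.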
Are we required to use Carhart's 4-Factor model as well, or just CAPM and Fama-French 3-Factor? You mention the 4-factor model in the assignment, but I couldn't tell if it was a suggestion for extra work or a requirement of the homework.
Currently the assignment only calls for calculating the Fama-French 3-Factor model. Future assignments may include the momentum factor, as well as the operating profitability and investment factors comprising the FF 5-Factor model, but that is not the case at this time. Note that all of these factor data are available at the French data repository at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
What are we supposed to use as the risk free rate for the stocks?
Fama/French provide this on their site too. It is also available in WRDS under "Fama-French & Liquidity Factors".
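Once the factor file is merged with your return series, the 3-factor regression is a single lm() call. As a sketch, assuming a merged data frame `ff` whose column names (RET, RF, Mkt.RF, SMB, HML) follow the French data-library file after read.csv sanitizes them (verify against your download, and make sure returns and factors are in the same units):

```r
# Excess stock return regressed on the three Fama-French factors.
fit <- lm(I(RET - RF) ~ Mkt.RF + SMB + HML, data = ff)
summary(fit)   # intercept = alpha; slopes = factor loadings
```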
Where is the assignment posted?
The final project is assigned in the Assignments section of Canvas, with reference to the PowerPoint slides in the Files/Projects section. These are the slides we discussed in class.
How much flexibility do we have in designing our project?
Lots. But the more time you spend searching for and cleaning your data, the less time you'll have to actually work with it. Time management will be key, whatever form your final project takes.