1925–2023
Data Coverage
Equities & Market Overview
R01 CRSP Market Overview 56.5s
4.9M observations, 36,892 securities, 1925–2022.
Mean monthly return 0.93%, skewness 1.78, kurtosis 15.1. Listed stocks peaked at ~10,000. Market cap: $714T (2021) → $608T (2022).
R03 Compustat Fundamentals 103.4s
857,983 firm-years, 42,114 firms, 1950–2021.
Median ROA declined from 7.0% (1950s) to ~0% (2020s). R&D intensity rose 2.2% → 4.7%. R&D missing for 78% of obs.
R04 CRSP-Compustat Merged 151.7s
28,353 PERMNOs linked to 27,946 GVKEYs, 6.7M merged obs.
Median mktcap $125M vs mean $2,322M (extreme skew). Smallest quintile: 5.3% ann. vs largest: 16.4%, counter to the traditional size premium.
R16 S&P 500 Index 3.9s
25,359 daily observations, 1926–2022.
Cumulative return 11,452×, 9.73% annualized. Best decade: 1950s (18.8%, Sharpe 1.62). Worst day: Black Monday, −19.46%. Max DD: −84.8% (1929–1945).
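The annualized figure above can be checked from the cumulative multiple and the number of daily observations. A minimal sketch, assuming geometric annualization with 252 trading days per year (the summary does not state its convention; `annualized_return` is a hypothetical helper name):

```python
def annualized_return(cumulative_multiple: float, n_days: int,
                      days_per_year: int = 252) -> float:
    """Geometric annualized return implied by a cumulative growth multiple."""
    years = n_days / days_per_year  # assumed trading-day convention
    return cumulative_multiple ** (1.0 / years) - 1.0

# Figures reported for R16: 11,452x over 25,359 daily observations.
r16_annualized = annualized_return(11_452, 25_359)
print(f"{r16_annualized:.2%}")
```

Under that assumption the result lands at roughly 9.7%, in line with the reported 9.73%.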
Factor Models & Anomalies
R02 Fama-French Factors 2.0s
1,157 months, 1926–2022.
Momentum: highest Sharpe (0.476), then Market (0.436), Value (0.350), Size (0.208). HML-UMD correlation −0.415 (natural diversification).
R20 Size / Value / Momentum 3.2s
Decade-by-decade factor analysis, 5-factor model.
CMA highest Sharpe (0.50), RMW (0.44). Small-value: 19.5% ann. vs big-growth: 11.2%.
R30 Factor Zoo 41.4s
CAPM → FF3 spanning tests.
CAPM alpha 0.12%/mo (t=1.37, NS). FF3 eliminates alpha, R² jumps 82% → 96%. HML improved OOS; MKT/SMB degraded.
R31 Anomaly Decay 39.2s
Pre-2000 vs post-2000 factor performance.
Value Sharpe declined 35%, momentum declined 77%. Gradual decay (no structural break), consistent with increased quant adoption.
R40 Comprehensive Factor Model 176.9s
Spanning tests + OOS evaluation, all 4 standard factors.
All factors provide unique info. Momentum: highest unique alpha (11.6% ann.). OOS R² negative for MKT/SMB/HML: in-sample models degrade badly OOS.
R41 Fama-MacBeth Regressions
723 months, avg 951 stocks/month, full 2-pass cross-sectional pricing.
Size reversed: log(MktCap) positive (t=8.54). Anti-value: BM negative (t=−16.98). Momentum negative (t=−2.62). Avg cross-sectional R² = 5.0%. Portfolio-sort premia may not survive multivariate controls.
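The t-statistics above come from averaging monthly cross-sectional slopes. A minimal sketch of that averaging step only, assuming a single characteristic and plain (not Newey-West) standard errors; the function names and the toy panel are hypothetical and not taken from the R41 module:

```python
import math
from statistics import mean, stdev

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def fama_macbeth(panel):
    """panel: one list per month of (characteristic, next-month return) pairs.

    Returns the time-series mean of the monthly slopes and its t-statistic.
    """
    slopes = [ols_slope([c for c, _ in month], [r for _, r in month])
              for month in panel]
    avg = mean(slopes)
    t_stat = avg / (stdev(slopes) / math.sqrt(len(slopes)))
    return avg, t_stat

# Toy panel (made-up numbers): the true slope alternates between 0.01 and 0.03.
chars = [1.0, 2.0, 3.0, 4.0]
panel = [[(c, 0.005 + b * c) for c in chars] for b in [0.01, 0.03] * 6]
avg_slope, t_stat = fama_macbeth(panel)
print(round(avg_slope, 4), round(t_stat, 2))
```

A multivariate version would replace `ols_slope` with a full cross-sectional regression on all characteristics each month; the averaging and t-statistic step stays the same.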
R42 Industry Momentum
6-month formation / 1-month holding, 69 SIC industries, 1,155 months.
Winner-loser spread: 0.14%/mo (t=0.38, NS). Industry rotation does not reliably add value, unlike stock-level momentum.
R46 Term Structure of Equity Risk
Variance ratios from daily FF factor data, 25,378 obs, 1926–2022.
All factors exhibit momentum at long horizons. HML strongest (VR=1.91 at 12mo). UMD strongest daily autocorrelation (0.151).
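A variance ratio above 1 means multi-period returns are more variable than independent returns would imply, i.e. positive autocorrelation. A minimal sketch of a simple overlapping estimator, assuming returns are additive (log returns) and omitting the small-sample bias corrections a production test would apply; the toy series are made up:

```python
from statistics import pvariance

def variance_ratio(returns: list[float], q: int) -> float:
    """VR(q) = Var(q-period overlapping sums) / (q * Var(1-period returns))."""
    if q < 1 or len(returns) <= q:
        raise ValueError("need q >= 1 and more than q observations")
    # Overlapping q-period returns, built as rolling sums of 1-period returns.
    q_sums = [sum(returns[i:i + q]) for i in range(len(returns) - q + 1)]
    return pvariance(q_sums) / (q * pvariance(returns))

trending = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0] * 50  # persistent runs
alternating = [1.0, -1.0] * 200                               # flips every period
vr_trending = variance_ratio(trending, 2)
vr_alternating = variance_ratio(alternating, 2)
print(vr_trending > 1.0, vr_alternating < 1.0)
```

The persistent series gives a ratio above 1 and the sign-flipping series a ratio below 1, which is the sense in which VR=1.91 at 12 months signals long-horizon momentum.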
R05 IBES Analyst Forecasts 533.1s
20M forecasts, 26,857 analysts, 13,826 firms.
56.4% of forecasts overshoot actuals. Median abs error improved 1980s → 1990s ($0.35 → $0.22), then worsened to $0.53 (2020s). Coverage peaked at ~6,600 analysts (2015).
R21 Earnings Surprises 146.6s
118,225 IBES-CRSP merged obs, 1983–2022.
57.4% below consensus. Q5-Q1 surprise spread 269%. High forecast dispersion → 2.7× surprise volatility. Persistent PEAD confirmed.
R35 Earnings Management 116.7s
Modified Jones Model, 148,062 industry-year obs.
48.9% income-increasing / 51.1% decreasing. Absolute DA peaked in 2000s. Benford's Law rejects null (χ²=32.4): digit manipulation evidence.
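The Benford check compares observed digit frequencies with the logarithmic frequencies the law predicts. A minimal sketch, assuming a first-digit test (the summary does not say which digit position or which accounting variable R35 uses); with 8 degrees of freedom the 5% critical value is 15.507, so a statistic like 32.4 would reject under this setup:

```python
import math
from collections import Counter

CHI2_CRIT_5PCT_DF8 = 15.507  # chi-square critical value, 8 d.o.f., 5% level

def benford_chi2(values) -> float:
    """Pearson chi-square of observed first digits against Benford's law."""
    # Scientific notation puts the leading significant digit first.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford first-digit frequency
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2

powers_of_two = [2 ** k for k in range(1, 1001)]   # known to track Benford
uniform_digits = [d for d in range(1, 10)] * 100   # each first digit equally often
chi2_benford_like = benford_chi2(powers_of_two)
chi2_uniform = benford_chi2(uniform_digits)
print(chi2_benford_like < CHI2_CRIT_5PCT_DF8, chi2_uniform > CHI2_CRIT_5PCT_DF8)
```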
R36 Analyst Herding 246.3s
5M IBES forecasts, 3,793 stocks, 18,129 analysts.
Herding ratio 1.17: actually anti-herding (late forecasters are bolder). Dispersion narrows approaching announcement (information convergence).
R43 Earnings Quality Composite
4-dimension EQ score (accruals quality, persistence, smoothness, loss recognition), 29,136 firms.
Q1 (highest quality): 17.0% ann. vs Q5 (lowest): 7.1%, a Q1−Q5 spread of 9.9pp. Component correlations near zero: each dimension captures distinct info. EQ is a priced risk factor.
Institutional & Insider Trading
R06 Insider Trading 336.6s
5M records, 1988–2011.
Sales dominate 4:1 (2M sales vs 480K purchases). CEOs: 17.3%. Median transaction $20K vs mean $8.8M (extreme skew).
R07 Institutional Holdings (13F) 112.8s
11,806 managers, 1978–2021.
Average stock held by 68 institutions (median 16). Each manager holds 711 positions (median 334).
R22 Insider vs Institutional 197.6s
2.5M insider trades vs 394K institutional quarterly reports.
Insiders net sellers (buy/sell 0.45), event-driven. Institutions make small quarterly adjustments. Different information sets confirmed.
R37 Institutional Trading Impact 162.7s
2.5M S34 records, 945 managers, 8,376 stocks.
Most positions unchanged each quarter (median Δ = 0%). Herding measure 0.13; 46.1% of stock-quarters show high herding (>0.10). More pronounced in smaller stocks.
R44 Insider Signal Returns
Module did not complete execution.
Options & Derivatives
R08 Options Market (OptionMetrics) 187.9s
26 years (1996–2021), 500K records/year.
Mean IV 46.7%. IV peaked 2000 (64.3%), lowest 2005 (31.1%). Perfect 50/50 call-put split in standardized data.
R23 Option Implied Volatility 62.9s
3.4M option records, 1996–2021.
Mean IV 44.4%, significant right skew (1.95). Puts have 0.45pp higher IV than calls (downside protection demand). IV term structure slopes downward.
R38 Option Strategy Backtest 69.2s
4 strategies, 30-day ATM options, 201 securities, 2010–2020.
Protective Put: Sharpe 0.573 (best). Covered call: +13 bps at cost of +1.6% vol. Short straddle: −0.3% ann., kurtosis 20.2 (tail risk).
R09 Corporate Bonds (TRACE) 151.0s
805,241 bonds, 2002–2022.
Mean price 101.15% par, yield 4.55%. Sell-side: 32.7% of volume.
R12 Municipal Bonds (MSRB) 103.1s
2.5M transactions, 2005–2022.
Customer sales 44.4%, inter-dealer 33.5%. Median par $30K. Estimated median dealer markup $0.40. Mean coupon 4.25%, yield 2.96%.
R17 Treasury (GovPX) 16.6s
800K quotes, 1993–2007.
Mean spread 2.75 bps. Declined from 3.6¢ (1993) to 2.0¢ (1998). Negative autocorrelation (−0.10 to −0.21) confirms bid-ask bounce.
R24 Bond-Equity Linkage 86.0s
TRACE-CRSP 6-digit CUSIP match.
Only 1,352 overlapping issuers (5.8%). These firms avg 15.8 bonds each: a larger, more established subset. Mean equity RV 41.2%.
R33 Credit Risk Spreads 133.4s
TRACE + Compustat merged, 16.7M obs.
Mean leverage 0.61, rising from 0.48 (1960s) to 0.64 (2000s). Credit spreads increase with lower ratings and higher leverage.
R32 High-Frequency Treasury 50.9s
GovPX tick data 1998/2002/2006, 3M observations.
U-shaped intraday activity. Bid-ask bounce: autocorrelation −0.47 (1998) → −0.31 (2006), diminishing as spreads narrow.
R49 Treasury-Equity Correlation
S&P 500 vs risk-free, 25,210 obs, 1926–2022.
Unconditional correlation 0.010 (near zero). But varies: positive 1950s/1970s, negative 2020s (−0.108). Negative-corr regimes: higher equity Sharpe (0.90 vs 0.60).
Corporate & Governance
R11 DealScan Lending 26.2s
396,004 facilities, 68,763 covenants.
Syndication dominant (75.2%). Mean spread 232 bps, median maturity 60 months. Most common covenant: Max Debt/EBITDA (15,960).
R13 Executive Compensation 34.9s
333,179 exec-years, 3,961 firms, 1992–2022.
CEO median $8.0M (2022). Stock awards dominant ($5.0M). Female CEOs earn higher median ($3.9M vs $3.3M) but only 3.2% of obs. Highest: Charles Wang $655M (1998).
R14 Audit Quality 42.1s
278,074 audit-fee obs, 1996–2024.
Audit fees rose $426K → $1.99M. Big 4: 66.3% share, $1.87M mean vs non-Big 4 $184K. SOX reduced non-audit ratio 36.3% → 15.0%. Restatement rate 6.7%.
R28 Financial Distress 133.1s
Altman Z-Score, 857,318 firm-years.
Original Z: 90.1% in distress zone. Modified Z'' (Altman 1993): 70.9% distressed, 22.2% safe. Finance sector: 97.7% distressed (original).
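The two scores differ in both coefficients and zone cut-offs, which is part of why the distress shares above differ. A minimal sketch using the commonly cited published coefficients: original Z with cut-offs 1.81 / 2.99, and Z'' (no sales term, book equity) with cut-offs 1.10 / 2.60. Whether R28 uses exactly these variants is an assumption, and the two example firms are made up:

```python
def altman_z(wc, retained, ebit, mkt_equity, sales, assets, liabilities):
    """Original Altman Z-score (public manufacturing firms)."""
    return (1.2 * wc / assets + 1.4 * retained / assets + 3.3 * ebit / assets
            + 0.6 * mkt_equity / liabilities + 1.0 * sales / assets)

def altman_z2(wc, retained, ebit, book_equity, assets, liabilities):
    """Z'' variant: drops sales/assets and uses book value of equity."""
    return (6.56 * wc / assets + 3.26 * retained / assets
            + 6.72 * ebit / assets + 1.05 * book_equity / liabilities)

def zone(score, distress_below, safe_above):
    """Classify a score into distress / grey / safe using the given cut-offs."""
    if score < distress_below:
        return "distress"
    if score > safe_above:
        return "safe"
    return "grey"

# Hypothetical manufacturer vs. a highly levered, low-turnover balance sheet.
z_manufacturer = altman_z(20, 30, 10, 80, 120, 100, 50)
z_levered = altman_z(2, 5, 1, 10, 6, 100, 90)
z2_manufacturer = altman_z2(20, 30, 10, 50, 100, 50)
print(zone(z_manufacturer, 1.81, 2.99), zone(z_levered, 1.81, 2.99),
      zone(z2_manufacturer, 1.10, 2.60))
```

The levered example scores far below 1.81 mainly through the equity/liabilities and sales/assets terms, which is one plausible reading of why financial firms are classified as distressed so often under the original score.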
R29 Merger Arbitrage 29.4s
12,818 potential M&A events, 642 target firms.
Mean announcement return 32.4%. Pre-event run-up begins day −20 (≈8.5% CAR), suggesting information leakage.
R34 Governance & Returns 37.5s
GMI Ratings, 15,561 obs, 200+ governance variables, 2004–2018.
CEO-Chair duality in 64.3% of firms. No single composite governance score available in dataset.
R48 CEO Pay & Performance
54,123 CEO-years merged with Compustat, 1992–2022.
Median CEO pay: $1.3M (1995) → $8.0M (2022), +360%. Pay-performance β=1.46 (R²=2.3%). Higher-paid CEOs → better next-year ROA (Q5: 4.7% vs Q1: −2.1%). Low R² = size/industry dominate.
R10 ETF Flows 3.3s
4,684 ETF tickers, 2012–2023.
AUM $19B → $97B. Largest daily inflow: $5.8B into SPY (Jun 2022). Top-10 concentration declined 32% → 14%.
R25 ETF vs Mutual Funds 11.8s
ETF + MF comparison, 2002–2022.
ETFs: 1,604 → 3,090 tickers (+93%). MFs: 5,562 → 1,975 (−64%). MF median expense ratio: 1.30% → 0.75% (fee compression from passive competition).
R45 Mutual Fund Persistence
9,275 funds, 1.2M fund-month obs, 1962–2021.
Transition matrix indistinguishable from random (diagonal avg 0.199 vs random 0.200). Q5-Q1 future alpha spread: −0.27% (≈zero). Past performance does NOT predict future. EMH supported.
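The persistence test boils down to a quintile-to-quintile transition matrix: with five bins and no persistence, each row is uniform and the diagonal averages 0.200. A minimal sketch, assuming funds are already assigned quintile labels 0 to 4 in two consecutive periods (the function names and toy inputs are hypothetical):

```python
def transition_matrix(prev_q, next_q, n_bins=5):
    """Row-normalised counts of moves from prev-period bin to next-period bin."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for p, n in zip(prev_q, next_q):
        counts[p][n] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def diagonal_average(matrix):
    """Average probability of staying in the same bin."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Toy inputs: no persistence (every destination equally likely) vs. full persistence.
prev_ranks = [i for i in range(5) for _ in range(5)]
random_like = [j for _ in range(5) for j in range(5)]
diag_random = diagonal_average(transition_matrix(prev_ranks, random_like))
diag_persistent = diagonal_average(transition_matrix(prev_ranks, prev_ranks))
print(round(diag_random, 3), round(diag_persistent, 3))
```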
Market Microstructure
R27 Equity Microstructure 58.9s
5M CRSP daily records.
Median spread declined 73.6% (3.1% → 0.8%). Decimalization (2001) = primary structural break. Higher spreads → higher volatility.
R47 Liquidity Premium
Amihud illiquidity quintiles, 1.37M stock-months, 1926–2022.
Q5 (illiquid): 2.63%/mo vs Q1 (liquid): 1.15%/mo, spread 1.48%/mo (t=3.72, 17.7% ann.). But FF3 alpha only 0.56% (NS): size/value explain most. Sharpe peaks at Q3, not Q5.
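The sorting variable is the Amihud measure: the average ratio of absolute return to dollar volume, so a stock whose price moves a lot per dollar traded ranks as illiquid. A minimal sketch of the per-stock, per-month calculation, assuming daily inputs and the conventional 10^6 scaling (R47's exact filters and scaling are not stated; the sample numbers are made up):

```python
def amihud_illiquidity(returns, dollar_volumes, scale=1e6):
    """Mean of |daily return| / daily dollar volume, times a scaling constant."""
    ratios = [abs(r) / dv for r, dv in zip(returns, dollar_volumes) if dv > 0]
    if not ratios:
        raise ValueError("no days with positive dollar volume")
    return scale * sum(ratios) / len(ratios)

daily_returns = [0.01, -0.02, 0.005]
liquid = amihud_illiquidity(daily_returns, [5e8, 6e8, 4e8])    # heavy trading
illiquid = amihud_illiquidity(daily_returns, [2e5, 1e5, 3e5])  # thin trading
print(illiquid > liquid)
```

Stocks would then be ranked on this measure each month and grouped into quintiles before computing the Q5 minus Q1 return spread.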
R18 Bank Branching 47.6s
FDIC, 751,571 branch-years, 1994–2002.
Branches +6.5% while institutions −27.2% (consolidation). Branches/institution: 6.3 → 9.1. BofA led with 2,740 branches.
R15 Crypto Order Book 2.8s
6 instruments, 4 exchanges, 2017–2023.
Mean spread 7.3 bps (median 2.8). Bybit tightest (1.0 bps). BTC depth $6.3M on Bybit. Depth imbalance slightly negative (−0.003).
R19 Kaiko Crypto Inventory 18.5s
8,661 trade files (1,860 GB), 64 exchanges.
Spot 5,722 files, futures 1,394, options 833, perps 712. OKEx leads (1,887 files). Order book: 5,010 GB across ~3M files.
R26 Cross-Asset Correlations 25.0s
Equity, S&P 500, crypto daily returns.
Crypto-equity correlation: 0.016 (near zero). Crypto: 84.14% ann., Sharpe 0.40 (0.33%/day, 13.3% daily vol). Diversification ratio 0.889.
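The crypto Sharpe ratio can be roughly reconstructed from the daily figures in parentheses. A minimal sketch, assuming arithmetic annualization with 252 periods per year and a zero risk-free rate (both conventions are guesses that appear consistent with the reported numbers, not confirmed settings of R26):

```python
import math

def annualize(daily_mean: float, daily_vol: float, periods: int = 252):
    """Arithmetic annualization: mean scales with T, volatility with sqrt(T)."""
    ann_return = daily_mean * periods
    ann_vol = daily_vol * math.sqrt(periods)
    sharpe = ann_return / ann_vol  # zero risk-free rate assumed
    return ann_return, ann_vol, sharpe

# Rounded inputs from the summary: 0.33% per day, 13.3% daily volatility.
ann_return, ann_vol, sharpe = annualize(0.0033, 0.133)
print(f"{ann_return:.1%} {ann_vol:.0%} {sharpe:.2f}")
```

Because the inputs are rounded, the sketch only approximates the reported 84.14% and 0.40; a 365-day convention would give a much higher annual return than the one shown.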
R39 Crypto-Traditional Linkage 2.6s
1,613 overlapping trading days.
BTC is NOT a safe haven. Mean BTC-SPX rolling corr: −0.04. During equity stress: crypto returns −5.5% (vs market −2.5%). Optimal allocation to crypto: 0%.
Data Quality & Infrastructure
R50 Data Quality Audit
Comprehensive audit across all 20+ databases.
Cross-database consistency checks, missing data patterns, survivorship bias assessment, and data coverage gaps documented.