A Cointegration Approach to Identifying Systemic Risk in Markets

Michael Kane

Bryan Lewis

Who am I?

Currently an Assistant Professor at Yale

 

Research area: scalable statistical computing and machine learning

 

I was a Co-PI in DARPA's XDATA program

Premise: What can we do when we can't predict signal (in this case stock price)?

Outline

PART 1 - THE FLASH CRASH AS A MOTIVATING EXAMPLE OF SYSTEMIC RISK

 

PART 2 - QUANTIFYING SYSTEMIC RISK USING COINTEGRATION

 

PART 3 - SCALING TO REAL-TIME MARKET DATA

The Flash Crash

On May 6, 2010, the stock market lost about 1 trillion dollars in value in about 15 minutes

 

 

What happened afterward?

A compelling explanation was not found, but conspiracy theories abound.

 

Single-stock circuit breakers (and collars) are instituted.

Do Single-Stock CB's Work?

When a large-cap stock's price moves up (or down) 5% in 5 minutes, trading stops.

 

Drop for the Dow was 9.2%

... but it's worse than that because stocks are stopped when they shouldn't be.

... and even worse than that

A halt is meant to allow traders to "take a step back" and assess the market

 

They do not.

What is the underlying problem?

 

 

Single-stock trading halts manage single-stock volatility

 

They don't manage systemic risk, that is, risk caused by an event severe enough to cause instability in the financial system

Should we regulate volatility?

Characterizing Systemic Risk

Dynamically spanning sectors over time

 

Makes markets vulnerable to events that result in high volatility

 

We may not be able to forecast events but we may be able to estimate vulnerability

Assume stocks that "behave similarly" are susceptible to similar price-moving factors

Quantify "similar behavior" using Cointegration

Two time series y(t) and x(t) are cointegrated if

y(t) - B x(t) = u(t)

where B is a real-valued constant and u(t) is a stationary process.

 

It is OLS with noise that doesn't diverge.
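A minimal sketch of the "OLS with well-behaved noise" idea, in Python with NumPy (an assumption; the original study used R). It simulates a random walk x(t), builds a y(t) that is cointegrated with it, estimates B by least squares, and computes a simple Dickey-Fuller-style t-statistic on the residual u(t). The function names, the synthetic data, and the no-intercept, no-lag form of the regression are illustrative choices, not the method used in the study. Critical values for residual-based tests differ from standard Dickey-Fuller tables, so the statistic is only compared across the two cases here.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares estimate of B in y = B x + u (no intercept)."""
    return float(np.dot(x, y) / np.dot(x, x))

def dickey_fuller_t(u):
    """t-statistic of rho in the regression diff(u)[t] = rho * u[t-1] + e[t]."""
    du, lag = np.diff(u), u[:-1]
    rho = np.dot(lag, du) / np.dot(lag, lag)
    resid = du - rho * lag
    se = np.sqrt(np.dot(resid, resid) / (len(du) - 1) / np.dot(lag, lag))
    return float(rho / se)

rng = np.random.default_rng(0)
n = 2000
x = np.cumsum(rng.normal(size=n))   # random walk (non-stationary)
y = 2.0 * x + rng.normal(size=n)    # cointegrated with x, true B = 2
z = np.cumsum(rng.normal(size=n))   # independent random walk

b_hat = ols_slope(x, y)
t_coint = dickey_fuller_t(y - b_hat * x)             # residual is stationary
t_indep = dickey_fuller_t(z - ols_slope(x, z) * x)   # residual still wanders

print(f"B estimate: {b_hat:.3f}")
print(f"DF t-stat, cointegrated pair: {t_coint:.1f}")
print(f"DF t-stat, independent pair:  {t_indep:.1f}")
```

A strongly negative statistic indicates a residual that keeps reverting to zero, which is what the definition requires of u(t).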

Who uses it?

Economists - model macroeconomic trends

 

Financial Engineers - generalize pairs trading

 

Me - to find stocks "behaving similarly"

Preliminary Study

May 6, 2010, S&P 500 stocks, sliding window

 

124,750 comparisons for each second of a 23,400 second day
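The counts above follow from simple arithmetic, sketched here in Python. The 500-stock universe and the 6.5-hour regular trading session are assumptions that reproduce the figures on the slide.

```python
from math import comb

n_stocks = 500                 # assumed: nominal S&P 500 constituent count
pairs = comb(n_stocks, 2)      # unordered pairs of stocks
seconds = int(6.5 * 3600)      # assumed: 09:30 to 16:00 trading session
total = pairs * seconds        # pairwise comparisons over the whole day

print(pairs, seconds, total)
```

That is roughly 2.9 billion pairwise comparisons for a single trading day.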

 

3 days and about 60 machines at the Oak Ridge National Laboratory

 

Off-the-shelf R packages

Is this really "massive"?

Results for the Entire Market

Results by Sector

A NORAD for Markets

We can tell when stocks behave similarly

 

They react similarly after a shock

 

We may not be able to predict shocks but we can definitely detect when a shock will be system-wide

 

Now if only we could run this in real-time...

Scaling the Solution to Higher Frequency

Euclidean distance is related to correlation

 

Correlation is necessary for Cointegration

Idea behind the algorithm

Project the data onto a low-dimensional subspace

 

If you are far away on the subspace you are at least as far away in the original space

 

That distance tells you something about the correlation

 

A distance approximation using the SVD

|| x_i - x_j ||^2 \approx \sum_{l=1}^k \sigma_l^2 (e_i^T v_l - e_j^T v_l)^2

X - standardized matrix

x_i - ith column of X

\sigma_l - lth singular value of X

v_l - lth right singular vector of X

e_i - unit basis vector, all zeroes except 1 at position i

The sum is exact when k equals the rank of X and is a lower bound on the squared distance for smaller k.
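A small numerical check of the relationship between column distances and the SVD, sketched in Python with NumPy (an assumption; the deck's own tooling is R). It uses a full dense SVD on synthetic data for simplicity, not a truncated solver, and the matrix sizes and the helper name approx_sq_dist are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 30))          # 200 observations of 30 synthetic series
X = A - A.mean(axis=0)                  # center each column
X = X / np.linalg.norm(X, axis=0)       # scale each column to unit length

U, s, Vt = np.linalg.svd(X, full_matrices=False)

def approx_sq_dist(i, j, k):
    """Sum over the first k terms of sigma_l^2 * (e_i^T v_l - e_j^T v_l)^2."""
    diff = Vt[:k, i] - Vt[:k, j]        # row l of Vt is the right singular vector v_l
    return float(np.sum(s[:k] ** 2 * diff ** 2))

i, j = 3, 7
exact = float(np.sum((X[:, i] - X[:, j]) ** 2))
low_rank = approx_sq_dist(i, j, k=5)         # distance in a 5-dimensional subspace
full_rank = approx_sq_dist(i, j, k=len(s))   # all singular triplets

print(f"exact: {exact:.4f}  k=5: {low_rank:.4f}  full: {full_rank:.4f}")
```

The k=5 value never exceeds the exact squared distance, which is the property that lets a low-dimensional projection rule out pairs that are already far apart.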

A distance approximation using the SVD

The truncated SVD can be calculated very efficiently using the Implicitly Restarted Lanczos Bidiagonalization Algorithm

 

Then we can use the following relationship to get highly correlated pairs

cor(x_i, x_j) = 1 - ||x_i - x_j||^2/2
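A sketch of how the distance relationship can be used to pick out highly correlated pairs, in Python with NumPy (an assumption). It presumes "standardized" means each column is centered and scaled to unit length, which is the condition under which the identity holds. The synthetic series, the 0.9 threshold, and the helper name cor_from_distance are hypothetical. A pair with correlation of at least t must have squared distance of at most 2(1 - t), so any pair whose low-dimensional lower bound already exceeds that value can be discarded without computing the full correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.normal(size=500)
data = np.column_stack([
    base + 0.1 * rng.normal(size=500),   # series 0: tracks base closely
    base + 0.1 * rng.normal(size=500),   # series 1: tracks base closely
    rng.normal(size=500),                # series 2: unrelated
])

X = data - data.mean(axis=0)             # center each column
X = X / np.linalg.norm(X, axis=0)        # scale each column to unit length

def cor_from_distance(i, j):
    """Correlation recovered from the squared Euclidean distance of columns i, j."""
    d2 = np.sum((X[:, i] - X[:, j]) ** 2)
    return float(1.0 - d2 / 2.0)

reference = np.corrcoef(data, rowvar=False)   # direct computation, for comparison
threshold = 0.9                               # hypothetical cutoff
close_pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
               if cor_from_distance(i, j) >= threshold]

print(close_pairs)
```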

Result

Computational complexity was reduced by about one and a half orders of magnitude

 

What used to take 3 days on a cluster now takes a few hours on a laptop.

 

This is now scaled for real-time and allows us to try new approaches that weren't previously feasible

R Packages Used

foreach [Calaway, Weston and Revolution Analytics, https://cran.r-project.org/web/packages/foreach/index.html]

nxcore [Hafen and Kane, https://github.com/hafen/nxcore]

tcor [Lewis, https://github.com/bwlewis/tcor]

rthreejs [Lewis, https://github.com/bwlewis/rthreejs]

Thanks