Statistical arbitrage is a quantitative approach that exploits temporary price divergences between related assets. Here, we’ll focus on pairs trading in crypto, using perpetual futures on Binance.

Step 1: Gather Universe

First we want to gather a list of suitable USDT perpetual futures on Binance. The universe used here is listed below, followed by some example pairings and the reasoning behind each:

cryptos = [
    "BTCUSDT", "ETHUSDT", "LINKUSDT", "BNBUSDT", "TRXUSDT", "DOTUSDT", "ADAUSDT", "EOSUSDT", "LTCUSDT",
    "BCHUSDT", "XRPUSDT", "ETCUSDT", "FILUSDT", "EGLDUSDT", "DOGEUSDT", "UNIUSDT", "THETAUSDT", "XLMUSDT",
    "SOLUSDT", "FTMUSDT", "SANDUSDT", "MANAUSDT", "AVAXUSDT", "GALAUSDT", "MATICUSDT", "NEARUSDT",
    "ATOMUSDT", "AAVEUSDT", "AXSUSDT", "ROSEUSDT", "XTZUSDT", "ICXUSDT", "ALGOUSDT", "RUNEUSDT",
    "APEUSDT", "VETUSDT", "ZILUSDT", "KNCUSDT", "XMRUSDT", "GMTUSDT", "OPUSDT", "ENSUSDT", "CHZUSDT", "APTUSDT"
]
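A list like this can also be built programmatically rather than by hand. The sketch below filters a payload shaped like the response of Binance's USD-M futures `GET /fapi/v1/exchangeInfo` endpoint. It assumes each entry in `symbols` carries `symbol`, `quoteAsset`, `contractType`, and `status` fields, which is how I understand the public API, so check the current docs before relying on it. The function name `usdt_perpetuals` and the sample payload are made up for illustration, and no network call is made.

```python
from typing import Any


def usdt_perpetuals(exchange_info: dict[str, Any]) -> list[str]:
    """Return symbols of actively trading USDT-quoted perpetual contracts."""
    return sorted(
        s["symbol"]
        for s in exchange_info.get("symbols", [])
        if s.get("quoteAsset") == "USDT"
        and s.get("contractType") == "PERPETUAL"
        and s.get("status") == "TRADING"
    )


# Hand-written sample in the ASSUMED shape of the exchangeInfo response.
sample = {
    "symbols": [
        {"symbol": "BTCUSDT", "quoteAsset": "USDT",
         "contractType": "PERPETUAL", "status": "TRADING"},
        {"symbol": "ETHUSDT", "quoteAsset": "USDT",
         "contractType": "PERPETUAL", "status": "TRADING"},
        # Dated quarterly contract: excluded because it is not perpetual.
        {"symbol": "BTCUSDT_231229", "quoteAsset": "USDT",
         "contractType": "CURRENT_QUARTER", "status": "TRADING"},
        # Different quote asset: excluded.
        {"symbol": "ETHBUSD", "quoteAsset": "BUSD",
         "contractType": "PERPETUAL", "status": "TRADING"},
    ]
}

universe = usdt_perpetuals(sample)
```

In practice you would probably still trim the result by volume or listing age, since thinly traded contracts make poor pairs candidates.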

BTC/USDT + ETH/USDT => the two largest cryptocurrencies by market cap, which have historically been strongly correlated
ETH/USDT + LINK/USDT or DOT/USDT => all play roles in the smart contract and decentralized application space
BCH/USDT + LTC/USDT => both Bitcoin Cash and Litecoin are Bitcoin forks with similar use cases and market perception
UNI/USDT + SUSHI/USDT => both are DeFi tokens
XMR/USDT + ZEC/USDT => category-specific coins (privacy)
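The groupings above can be turned into a candidate list mechanically. The following is a minimal sketch, assuming we pair every symbol with every other symbol in the same category; the category names and the helper `candidate_pairs` are illustrative. Note that this also produces DOT/LINK, which the list above does not mention explicitly, and that SUSHIUSDT and ZECUSDT are not in the `cryptos` universe, so they would need to be added there first.

```python
from itertools import combinations

# Groupings taken from the rationale above; the category names are my own.
categories = {
    "majors": ["BTCUSDT", "ETHUSDT"],
    "smart_contracts": ["ETHUSDT", "LINKUSDT", "DOTUSDT"],
    "bitcoin_forks": ["BCHUSDT", "LTCUSDT"],
    "defi": ["UNIUSDT", "SUSHIUSDT"],
    "privacy": ["XMRUSDT", "ZECUSDT"],
}


def candidate_pairs(groups):
    """Return unique within-group pairs as alphabetically ordered tuples."""
    pairs = set()
    for symbols in groups.values():
        # Sorting first means (A, B) and (B, A) collapse into one entry.
        for a, b in combinations(sorted(symbols), 2):
            pairs.add((a, b))
    return sorted(pairs)


pairs = candidate_pairs(categories)
```

Restricting candidates to economically related assets keeps the number of statistical tests small, which reduces the chance of picking up a relationship that is only there by coincidence.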

We want to re-evaluate pairs regularly: a pair that is cointegrated in one period will not necessarily remain so in the next.

Step 2: Correlation

Once we’ve selected some possible pairs, I first want to check the correlation within each pair. This gives an intuitive understanding of how the two assets move together and lets us filter out any pairs that are clearly uncorrelated.

Quick recap of correlation: it measures the strength and direction of the linear relationship between two variables. It tells us whether, as one variable increases, the other tends to increase (positive correlation) or decrease (negative correlation). If the two variables do not show any consistent pattern together, their correlation is close to zero.

The correlation equation is defined as:

$$
r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}
$$

Here $n$ is the number of paired data points, $\Sigma xy$ is the sum of the products of the paired values, $\Sigma x$ and $\Sigma y$ are the sums of the $x$ and $y$ values respectively, and $\Sigma x^2$ and $\Sigma y^2$ are the sums of the squares of the $x$ and $y$ values respectively.
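To make the formula concrete, here is a direct, term-by-term translation into plain Python. The function name `pearson_r` and the toy inputs are my own. The sum-based form is convenient by hand but can lose precision on large values, so treat this as an illustration of the equation rather than something to use in production.

```python
import math


def pearson_r(x, y):
    """Pearson correlation using the sum-based formula above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return numerator / denominator


x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])    # moves with x
r_down = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])  # moves against x
```

With these inputs `r_up` comes out at 1 and `r_down` at -1, the two extremes of the scale.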

Here’s a correlation matrix of some of the top pairs over the last month, computed on hourly bars.

[Figure: correlation matrix of the top pairs, hourly bars, last month]
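A sketch of how pairwise correlations like these could be computed and filtered is shown below. It makes two assumptions that are not stated above: correlation is measured on hourly log returns rather than raw prices, and 0.8 is the cutoff for keeping a pair. The symbols and prices are invented, and `correlated_pairs` is a hypothetical helper.

```python
import math
from itertools import combinations


def log_returns(prices):
    """Convert a price series into log returns between consecutive bars."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def correlation(x, y):
    """Pearson correlation; assumes equal lengths and non-constant series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(closes, threshold=0.8):
    """Return (symbol_a, symbol_b, r) for pairs with r >= threshold,
    strongest first."""
    returns = {sym: log_returns(p) for sym, p in closes.items()}
    result = []
    for a, b in combinations(sorted(returns), 2):
        r = correlation(returns[a], returns[b])
        if r >= threshold:
            result.append((a, b, r))
    return sorted(result, key=lambda item: item[2], reverse=True)


# Made-up hourly closes, purely for illustration.
closes = {
    "AAAUSDT": [100.0, 101.0, 99.5, 102.0, 103.5, 102.5],
    "BBBUSDT": [50.0, 50.6, 49.8, 51.1, 51.9, 51.3],
    "CCCUSDT": [10.0, 9.9, 10.2, 10.0, 10.1, 10.3],
}
strong = correlated_pairs(closes, threshold=0.8)
```

Only the AAAUSDT/BBBUSDT pair survives the filter here. If the closes are already in a pandas DataFrame, something along the lines of `np.log(df).diff().corr()` gives the full matrix in one call.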

So we’ve found our list of correlated pairs that we want to explore. Now onto the cointegration test.

Step 3: Cointegration

Cointegration refers to the situation where two or more time series move together over time, even if each one is non-stationary on its own (i.e., its mean and variance change over time). More precisely, some linear combination of the series, the spread, is stationary.

In the context of financial markets, this means that if one asset’s price rises for a while as the other falls, the two will eventually come back to a relationship that remains relatively constant over time.
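One common way to test for this is the Engle-Granger two-step procedure, which I'm assuming here since no specific test has been named yet: regress one series on the other to get a hedge ratio, then check whether the residual spread is stationary with a Dickey-Fuller style regression. The sketch below is a simplified, dependency-free version on synthetic data. It uses no lagged differences, and the -3.34 cutoff is an approximate 5% critical value for two series with a constant that should be checked against published tables. For real work, `coint` in `statsmodels.tsa.stattools` handles lag selection and p-values.

```python
import math
import random


def ols(x, y):
    """Fit y = alpha + beta * x by least squares; return (alpha, beta)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def engle_granger_stat(x, y):
    """Return (beta, t_stat): hedge ratio and Dickey-Fuller t-statistic
    on the residual spread (no lagged differences, a simplification)."""
    alpha, beta = ols(x, y)
    spread = [b - alpha - beta * a for a, b in zip(x, y)]
    lagged = spread[:-1]
    delta = [s1 - s0 for s0, s1 in zip(spread, spread[1:])]
    # Regress delta_t on spread_{t-1} with no intercept.
    sum_lag_sq = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, delta)) / sum_lag_sq
    resid = [d - rho * e for e, d in zip(lagged, delta)]
    sigma2 = sum(u * u for u in resid) / (len(delta) - 1)
    return beta, rho / math.sqrt(sigma2 / sum_lag_sq)


# Synthetic data: x is a random walk, y tracks 2 * x plus stationary noise.
rng = random.Random(42)
x = [100.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [5.0 + 2.0 * a + rng.gauss(0, 1) for a in x]

beta, t_stat = engle_granger_stat(x, y)
APPROX_5PCT_CRITICAL = -3.34  # approximate; verify against published tables
is_cointegrated = t_stat < APPROX_5PCT_CRITICAL
```

A t-statistic well below the critical value means the spread keeps getting pulled back toward its mean, which is the "coming back to a constant relationship" behaviour described above. The estimated `beta` is the hedge ratio you would use to size the two legs.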