Spurred by reading an account of a trader who swears by machine learning, a few days ago I wrote about aesthetics in finance.

Maths and tech without a narrative are pointless.

My own attempts at providing a narrative foundered a few months back.

I started using the Sharpe ratio as a hypothesis test.

So for example, the S&P 500 (not including dividends) has a Sharpe of ~0.462 over 65+ years.

What does that really mean though?

Little-known fact! You can convert it to the probability of a loss in any given year.

`=TDIST(sharpe * SQRT(years), years - 1, 1)`

In our case the Sharpe over 65 years translates to about a 0.02% chance of the index accruing a loss in any year.
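For anyone without a spreadsheet to hand, here is a minimal Python sketch of the same calculation. The names `t_pdf`, `t_sf` and `prob_loss` are made up for illustration, and the tail probability is integrated numerically with the standard library instead of being looked up in a statistics package, so treat it as an approximation under those assumptions.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(x: float, df: float, steps: int = 2000) -> float:
    """Upper-tail probability P(T > x), using Simpson's rule on [0, x]."""
    h = x / steps  # negative when x < 0, so the integral below is signed
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the mass lies above zero; subtract the part between 0 and x.
    return 0.5 - total * h / 3


def prob_loss(sharpe: float, years: float) -> float:
    """One-tailed p-value for an annualised Sharpe ratio over `years` years."""
    return t_sf(sharpe * math.sqrt(years), years - 1)


# The S&P 500 figures quoted above.
print(f"{prob_loss(0.462, 65):.4%}")
```

Unlike Excel's `TDIST`, which rejects a negative first argument, this sketch also accepts a negative Sharpe and a fractional number of years.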

In practice, however, our Max Wait is 7 years. That is, based on past experience we may have to wait up to 7 years to see a profit! Moreover, the S&P 500 was unprofitable in about 18 of the 65 years.
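Both of those realised figures are easy to count from a list of yearly returns. The sketch below is only one plausible reading: `max_wait` is a hypothetical definition (the longest run of year-ends spent below a previous high), and the sample returns are invented to show the mechanics, not taken from the index.

```python
def losing_years(yearly_returns):
    """Number of years with a negative return."""
    return sum(1 for r in yearly_returns if r < 0)


def max_wait(yearly_returns):
    """Longest run of year-ends spent below a previous high-water mark.

    Assumed definition, for illustration only.
    """
    level = peak = 1.0
    wait = longest = 0
    for r in yearly_returns:
        level *= 1 + r
        if level >= peak:
            # New high: anyone who bought earlier is back in profit.
            peak = level
            wait = 0
        else:
            wait += 1
            longest = max(longest, wait)
    return longest


# Invented yearly returns, purely to exercise the functions.
sample = [0.10, -0.20, 0.05, 0.05, 0.30, -0.01]
print(losing_years(sample), max_wait(sample))
```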

## Practice and Theory Misalignment?

The t-distribution is almost indistinguishable from a normal distribution once the sample exceeds about 30 years.

Treating returns as normally distributed in this way also assumes that they are independent of one another.

If you took each day's returns, and mixed them up sufficiently well, the very strong returns on average over 65 years would almost ensure being in the black every year.
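The shuffling thought experiment can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the 252 trading days per year, the number of trials, and the fixed seed are mine, and you would need to supply your own series of daily returns.

```python
import random


def losing_year_fraction(daily_returns, days_per_year=252):
    """Fraction of complete years whose compounded return is negative."""
    n_years = len(daily_returns) // days_per_year
    if n_years == 0:
        raise ValueError("need at least one full year of daily returns")
    losing = 0
    for y in range(n_years):
        growth = 1.0
        for r in daily_returns[y * days_per_year:(y + 1) * days_per_year]:
            growth *= 1 + r
        if growth < 1.0:
            losing += 1
    return losing / n_years


def shuffled_losing_year_fraction(daily_returns, days_per_year=252,
                                  trials=200, seed=0):
    """Average losing-year fraction after randomly reordering the days."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    shuffled = list(daily_returns)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += losing_year_fraction(shuffled, days_per_year)
    return total / trials
```

Comparing `losing_year_fraction` on the original series with `shuffled_losing_year_fraction` on the same series shows how much of the realised pain comes from bad days clustering together, which is exactly what shuffling destroys.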

So let's transform our fat-tailed returns into 'normal' ones, and see how our hypothesis test performs.

Instead of mixing the returns, we will find the residual of the S&P 500 index against its SPY ETF over 20+ years, reweighting on a monthly basis.

`probLoss( sharpe = -0.026, years = 22.58 )`

I use conservative 'Big O' analytics, i.e. this is the largest negative average return the Lazy Backtest IDE could find.

The resulting probability of the ETF underperforming the index in any year is 55%; over the last 22 years the actual count is 10 underperforming years, or about 45%. Anecdotal, but close enough.
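For readers who want to reproduce a residual Sharpe like the one fed into the test above, here is a rough Python sketch. It assumes the residual is simply the monthly return of one series minus the other, which may differ from what the Lazy Backtest IDE does internally; `information_ratio` and its parameters are hypothetical names, and the sample returns are invented.

```python
import math
import statistics


def information_ratio(strategy_monthly, benchmark_monthly, periods_per_year=12):
    """Annualised Sharpe ratio of the residual (strategy minus benchmark)."""
    if len(strategy_monthly) != len(benchmark_monthly):
        raise ValueError("both series must cover the same months")
    residual = [s - b for s, b in zip(strategy_monthly, benchmark_monthly)]
    mean = statistics.fmean(residual)
    stdev = statistics.stdev(residual)  # sample standard deviation
    return mean / stdev * math.sqrt(periods_per_year)


# Invented monthly returns: an ETF that lags its index by a small amount.
etf = [0.010, -0.020, 0.015, 0.005]
index = [0.011, -0.019, 0.016, 0.007]
print(round(information_ratio(etf, index), 3))
```

A negative result means the first series lagged the second on average, which is the sign convention used for the negative Sharpe values in this post.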

## Orders of Magnitude Better

Let's see how this residual (or information ratio) approach works on a strategy.

The Realised Volatility strategy leverages up when volatility is low and deleverages when it is high. As before, it is reweighted on a monthly basis and the conservative 'Big O' figure is used.
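To make the leverage rule concrete, here is one common way such a rule is written: scale exposure by a target volatility divided by recent realised volatility, with a cap. This is a sketch under my own assumptions and not necessarily the rule behind the numbers in this post; `target_vol`, `lookback` and `max_leverage` are hypothetical parameters.

```python
import statistics


def realised_vol_leverage(daily_returns, target_vol=0.15, lookback=21,
                          max_leverage=2.0, days_per_year=252):
    """Leverage for the next period from recent realised volatility."""
    window = daily_returns[-lookback:]
    # Annualise the daily standard deviation with the square-root-of-time rule.
    realised = statistics.stdev(window) * days_per_year ** 0.5
    if realised == 0:
        return max_leverage
    # Low volatility pushes leverage up (to the cap); high volatility cuts it.
    return min(max_leverage, target_vol / realised)


calm = [0.001, -0.001] * 11     # invented quiet month
stormy = [0.03, -0.03] * 11     # invented turbulent month
print(realised_vol_leverage(calm), realised_vol_leverage(stormy))
```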

`probLoss(sharpe = -0.1737, years = 65.42)`

This results in a 92% chance that the strategy will underperform the S&P 500 in any year.

In fact it has underperformed in 'only' 60% of the last 65 years. Ballpark.

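How would the usual Sharpe hypothesis test work out? Exactly the same as the first example.

That is, the plain Sharpe test is off by a factor of over 1,000 (about 28% of years realised against 0.02% predicted), whereas the residual test is off by a factor of roughly 0.65 (60% realised against 92% predicted).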

## The Upshot

Sharpe ratios are used as a unit of comparison, but each Sharpe is flawed in its own special way.

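For example, a Sharpe of 1 over fifty years is a completely different beast from a Sharpe of 1 over ten years. Fifty years of performance is a far more robust test, but by how much? In the test above the statistic scales with the square root of the track record: 1 × √50 ≈ 7.1 against 1 × √10 ≈ 3.2.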

Sharpe ratios are meaningless if they cannot be used as a unit of comparison.

The residual or information ratio hypothesis test approach gives us a fighting chance.

No doubt we can further improve our hypothesis test - the trick will be to do it without compromising simplicity.