Reward to Pain in Trading System Design

It typically takes some experience to recognize the difference between risk and pain in an investment scenario. As it turns out, pain is remarkably easy to define and estimate. In effect, you define it with some measure of accuracy as you live it. You come to know it very, very well. It is personal.

On the other hand, risk is generally a much more amorphous thing, often ambiguous, and in many cases, an estimate of risk is as likely to be flawed and unreliable as it is to be useful. Such is especially true when a complex portfolio is being addressed or entities are being held intermediate to long term where fat tail events are likely to occur.

In constructing a trading system, a money-management layer will often seek to define or set bounds upon the expected risk. Such a layer includes portfolio management, position sizing, the use of various types of stops, and multiple trade scenarios used to attenuate risk. It is anything but simple. And the general consensus we discovered in the course of our work is that even some of the largest firms managing billion dollar funds did not themselves believe or trust the risk management systems they had in place.

Long before the system designer addresses the complex issues of money management, a far more fundamental choice has to be made in the analysis and optimization phase of designing a signaling system. What is the actual metric or metrics one will measure and potentially optimize? While most would agree that the time has long passed for optimizing overall return with no concern for the risk or pain one may subsequently experience, there is very little agreement on exactly how that pain is best accounted for.

Experienced traders speak of sleep-at-night thresholds and choices, portfolio heat indices, and other concepts associated with the investor's or trader's mental and emotional well-being. Defining pain meaningfully and accurately is important.

In practical terms, we've found three primary ways for estimating the pain of trading and investing for use in an analytically viable optimization procedure: one can define pain by volatility, by drawdowns, or by retracement. We will start with the one most often used, volatility.

The Sharpe and Sortino Ratios

The Sharpe ratio is probably the best known reward to risk metric. Its formula is simple for an expectation of future return based upon historical performance:

Sharpe = (R_{i} - R_{rf}) / sigma_{i}

There are just three terms. R_{i} is the expectation of return for the target time series, whether it be an underlying price curve or a traded equity curve, and whether it be a single instrument or a portfolio. It is usually some historical rate of return. R_{rf} is the risk-free rate of return, what one could realistically expect from a zero-risk fixed return investment. sigma_{i} is the standard deviation of the returns of the instrument, a measure of volatility.

The Sortino ratio is nearly identical to the Sharpe, except that sigma is computed from only the downside price movements.

The premise is that more volatile instruments present a greater risk. These ratios are dimensionless, and thus somewhat less intuitive than a rate of return. A Sharpe ratio of 1.0 essentially states that the excess return above the risk-free rate is equal to the historical volatility. If the risk free rate is 5% and the actual annual return is 20%, there is a 15% excess rate of return. If the historical volatility, annualized, is 15%, then one sees a Sharpe ratio of unity. One can realize a 2x improvement in the Sharpe's reward to risk by trading an entity with half the volatility and realizing the same return, or doubling the return on that entity or another with the same volatility, or by some combination of each. A Sharpe ratio of 3 is considered a practical upper limit.
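The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names, the 252-day trading year, the simple division of the annual risk-free rate into a daily rate, and the Sortino convention used here (downside deviation measured against the risk-free rate over all observations) are all assumptions made for the example.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year


def sharpe_from_annual(annual_return, risk_free, annual_volatility):
    """Excess return over the risk-free rate divided by volatility."""
    return (annual_return - risk_free) / annual_volatility


def annualized_sharpe(daily_returns, risk_free_annual=0.0):
    """Sharpe ratio from daily returns, scaled to an annual basis."""
    n = len(daily_returns)
    rf_daily = risk_free_annual / TRADING_DAYS  # simplifying assumption
    excess = [r - rf_daily for r in daily_returns]
    mean_excess = sum(excess) / n
    variance = sum((e - mean_excess) ** 2 for e in excess) / (n - 1)
    annual_vol = math.sqrt(variance) * math.sqrt(TRADING_DAYS)
    return mean_excess * TRADING_DAYS / annual_vol


def annualized_sortino(daily_returns, risk_free_annual=0.0):
    """Same numerator as the Sharpe; only downside moves enter the denominator."""
    n = len(daily_returns)
    rf_daily = risk_free_annual / TRADING_DAYS
    excess = [r - rf_daily for r in daily_returns]
    mean_excess = sum(excess) / n
    downside_var = sum(min(e, 0.0) ** 2 for e in excess) / n
    if downside_var == 0.0:
        return math.inf  # no downside moves observed
    annual_downside = math.sqrt(downside_var) * math.sqrt(TRADING_DAYS)
    return mean_excess * TRADING_DAYS / annual_downside


# The example from the text: 20% return, 5% risk-free rate, 15% volatility.
unity = sharpe_from_annual(0.20, 0.05, 0.15)
# Half the volatility with the same return doubles the ratio.
doubled = sharpe_from_annual(0.20, 0.05, 0.075)
```

Setting `risk_free_annual` to zero gives the simplified form many system designers use when only the difference between two curves matters.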

There are variants of these ratios that use a third return from the overall market to generate an adjusted percent return that can be somewhat easier to interpret, but the issues we will address apply to all such volatility-based metrics.

What traders are often concerned with is the d(Sharpe), or the d(Sortino), the difference between the ratio computed from the underlying instrument's price curve and that of the equity curve derived from actively trading that instrument. In such an instance, we are less interested in an absolute metric. It is for this reason that many system designers use simplified formulae that set the risk-free rate of return to zero.

One would hope to build a system whose active management of trades either increases the return, or reduces the volatility, or both, resulting in an improved Sharpe ratio. The magnitude of that improvement is one measure of the efficacy of a given trading system.

If, for example, a blind walk-forward equity curve from a trading system produces a Sharpe ratio of 2, and the underlying buy and hold for the same period has a Sharpe of 0.5, we can know that the trading system, for this reserved data period, improved the return to pain by a factor of 4, assuming we choose to equate trading pain with an instrument's volatility.

Volatility is a Poor Estimate of Pain

The list of drawbacks associated with using a Sharpe-type metric for optimizing the signaling in a trading system is considerable. Volatility, whether the two-sided Sharpe or the one-sided Sortino, is typically computed from the standard deviation of daily closing price differences. One is thus measuring only the second moment of the density of those differences. The higher moments, the skew (read, trend) and kurtosis (read, fat tail events), are often far more significant in terms of producing unexpected pain, or conversely, unexpected gains.
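To make the point about higher moments concrete, the following sketch builds three synthetic series with the same standard deviation but very different skew and kurtosis. The series are invented purely for illustration and are not market data; the `moments` helper is a hypothetical name, and population (not sample) moments are assumed.

```python
import math


def moments(xs):
    """Return (standard deviation, skewness, excess kurtosis) of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    std = math.sqrt(m2)
    return std, m3 / std ** 3, m4 / m2 ** 2 - 3.0


# Steady one-unit moves up and down: no tails at all.
calm = [1.0, -1.0] * 50

# Mostly flat, with one large move in each direction: symmetric fat tails.
big = math.sqrt(50.0)
jumpy = [0.0] * 98 + [big, -big]

# Small steady gains and a single crash: a fat tail on the downside only.
scale = 1.0 / math.sqrt(0.99)
crash = [0.1 * scale] * 99 + [-9.9 * scale]

calm_std, calm_skew, calm_kurt = moments(calm)
jumpy_std, jumpy_skew, jumpy_kurt = moments(jumpy)
crash_std, crash_skew, crash_kurt = moments(crash)
```

All three series have a standard deviation of about 1, so a volatility-based metric would treat them as equally painful, yet the skew and kurtosis values differ enormously.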

The following is a 250-day EDV (enhanced density visualization) plot of the 20-day closing price differences of eight different securities. What one most observes in EDV plots is how very seldom one encounters a normal density. This plot includes AAPL and four tech sector securities which show a high fat tail on the upside when comparing the probability of a -10% move with that of a +10% move, and three that show the opposite behavior.

If you look carefully at the densities in the plot, and assume a violet central region of color as representing the approximate volatility of each, you will readily see that such an estimate is quite incapable of catching the positive and negative fat tail price movements which often determine the gains or losses in a trading system.

Robust Returns

The traditional implementation of the Sharpe and similar metrics uses a two-point estimate of the return. We suspect most readers are savvy enough to know that almost any entity can be represented as favorable by selecting starting and ending dates that just happen to show a good measure of equity growth. The reward component of any useful ratio must mitigate this sensitivity. A robust estimate of the reward is needed.

The best way to compute a robust trend is to fit the price data to a mathematical model so that every point factors into the trend or cumulative growth that is estimated. This removes much of the sensitivity to the starting and ending dates.

Curtis Faith, in his 2007 "Way of the Turtle", writes extensively about robust statistics. He calls this more robust estimate of return the RAR%, the regressed annual return. While this would seem a straightforward statistical procedure of fitting a straight line model to price data, it is in practice not so simple.

The growth in a compounded interest fixed instrument does not increase linearly, but exponentially, however linear it may appear in its early stages. There are companies whose growth curve is approximately exponential. These are the entities where you often need a log scale to see a long-term price history. For such entities, the inflow of investment funds and profits fuel additional growth in new businesses and products. For other companies, the trend is more linear, especially once the dynamics of that growth begin to wither.

In our work, we compute a robust CAGR% for our estimate of the reward. We thus estimate an annual rate of return, which we assume to be compounded each trading day. The estimate of the robust trend will thus be most accurate for entities that are most efficiently growing, as it is assumed these are the best entities for a candidate trading pool. The approach we've chosen will somewhat underestimate the trend in entities that show little growth.

When the funds flowing into an entity are used to efficiently fuel further growth, a statistician would typically fit a non-linear exponential model to the data and estimate a first order rate constant. We effectively do the same thing by a transform and linear fit, but instead of reporting a mathematical parameter, we use the model to estimate an expected value for the start and end dates, in effect smoothed estimates. Those in turn are used to compute the robust CAGR%.

In other words, the estimation is used only to generate smoothed estimates at the start and end dates and these values are used in place of the actual prices or equity values in a standard CAGR computation.
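One way to sketch this procedure is shown below. It assumes a least-squares straight line fitted to the logarithm of the daily prices as the transform and linear fit, and a 252-day trading year; the source does not spell out either detail, so treat both as assumptions. The function names are hypothetical.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year


def two_point_cagr(prices, periods_per_year=TRADING_DAYS):
    """Standard CAGR from the first and last actual prices only."""
    years = (len(prices) - 1) / periods_per_year
    return (prices[-1] / prices[0]) ** (1.0 / years) - 1.0


def robust_cagr(prices, periods_per_year=TRADING_DAYS):
    """CAGR from smoothed start and end values of a log-linear fit."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices")
    ys = [math.log(p) for p in prices]
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Smoothed estimates replace the actual first and last prices.
    smoothed_start = math.exp(intercept)
    smoothed_end = math.exp(intercept + slope * (n - 1))
    years = (n - 1) / periods_per_year
    return (smoothed_end / smoothed_start) ** (1.0 / years) - 1.0


# Synthetic two-year series growing at exactly 15% per year.
clean = [100.0 * 1.15 ** (d / TRADING_DAYS) for d in range(505)]
# The same series with a 20% jump on the final day only.
spiked = clean[:-1] + [clean[-1] * 1.2]

robust_clean = robust_cagr(clean)
robust_spiked = robust_cagr(spiked)
naive_spiked = two_point_cagr(spiked)
```

On the spiked series the two-point estimate jumps to roughly 26%, while the fitted estimate stays close to 15%, since every point contributes to the trend.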

With an effective estimate of the robust trend for the numerator of a reward to pain ratio, we need only look for a better estimate of the pain. We will now look at the second approach, the R³.

The R³ Ratio

This is the solution suggested by Curtis Faith in his book to generate a meaningful reward to pain ratio. We have found it to be far more useful than any form of Sharpe or Sortino, including their robust variants (when a robust trend is used in the numerator).

So how does one define pain in a more practical and robust way?

Instead of equating pain with some estimate of the variance or scatter in the price movements, R³ looks at drawdowns and their duration as an estimate of pain. The formula for the pain component is quite simple:

Pain = Avg. Drawdown(5) in % * Avg. Duration(5) in yrs

The pain is the average drawdown in %, of the 5 worst drawdowns, multiplied by the average duration of the 5 worst drawdowns, as measured in years.

The idea is that pain is best defined by drawdowns. This is true for long term investors and trend following traders. By averaging the five worst drawdowns, one is somewhat forgiving of a one-off event. Five drawdowns, all between 30-40%, would be treated as significantly greater pain than one drawdown of 40% and four of less than 20%.

For trading or investing scenarios where patience is a factor, the length of a drawdown is a major issue in the pain. A 40% drawdown that has fully recovered in six months is very different from a 40% drawdown that takes two years to come back. Because the normalization in R³ is done in years, the estimated pain is increased for each day the average drawdown duration exceeds one year, and decreased for each day it falls short of one year.
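A minimal sketch of the pain computation follows. It assumes a drawdown runs from a peak until the curve first touches or exceeds that peak again (or until the data ends), that the worst drawdowns are ranked by depth, and that durations are counted in trading days with 252 days per year. None of these details are specified in the text above, and the function names are hypothetical.

```python
TRADING_DAYS = 252  # assumed number of trading days per year


def drawdown_episodes(equity):
    """Return a list of (depth as a fraction, duration in bars) per drawdown."""
    episodes = []
    peak, peak_idx, trough = equity[0], 0, equity[0]
    for i in range(1, len(equity)):
        value = equity[i]
        if value >= peak:
            # Touching or exceeding the old peak closes the drawdown.
            if trough < peak:
                episodes.append(((peak - trough) / peak, i - peak_idx))
            peak, peak_idx, trough = value, i, value
        else:
            trough = min(trough, value)
    if trough < peak:
        # Still under water when the data ends.
        episodes.append(((peak - trough) / peak, len(equity) - 1 - peak_idx))
    return episodes


def r3_pain(equity, n_worst=5, periods_per_year=TRADING_DAYS):
    """Average depth (%) of the n worst drawdowns times their average duration (years)."""
    worst = sorted(drawdown_episodes(equity), reverse=True)[:n_worst]
    if not worst:
        return 0.0
    avg_depth_pct = 100.0 * sum(depth for depth, _ in worst) / len(worst)
    avg_years = sum(bars for _, bars in worst) / len(worst) / periods_per_year
    return avg_depth_pct * avg_years


# A 40% drawdown that recovers in half a year versus one that takes two years.
quick = [100.0] + [60.0] * 125 + [100.0]
slow = [100.0] + [60.0] * 503 + [100.0]
quick_pain = r3_pain(quick, n_worst=1)
slow_pain = r3_pain(slow, n_worst=1)
```

With a single drawdown, the six-month recovery scores a pain of about 20 and the two-year recovery about 80, which shows how the duration term scales the depth.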

One criticism of R³ is that it is ad-hoc. Why five drawdowns? Why not four, or six? There is no specification of the amount of historical data other than to suggest more is better than less. Clearly a ten year history gives more time for a sharp drawdown event to occur than half as much time. Include the financial meltdown of 2008-2009 and the Internet bubble burst of 2000-2002 and in most cases, you are assured two very large drawdowns. Exclude those times and R³ typically looks far rosier.

We attempted to make R³ more consistent by using the long established rule of thumb that two years were needed for a Sterling ratio to make sense. The Sterling ratio uses the worst drawdown in the period to define the pain. We thus computed the R³ pain from the n worst drawdowns, where n was half the count of years included in the historical data. If there were six years of data available, we would average only the three worst drawdowns. If we looked at twelve years of data, we would average six. This approach made it somewhat possible to roughly compare entities with differing amounts of history.

If we look at the R³ computation of AAPL on a retracement plot, we see the five worst drawdowns in the past ten year period. Their depths (shown here as fractional) and their durations are immediately apparent.

One obvious issue is seen with this visualization. If any one of the major drawdowns were to return to just about the previous level, but remained shy of it, and ran just shy of it for a long duration, the very deep box might continue for an extended time. If it just happens to touch the previous level, the box closes. It is thus easy to imagine a very small difference producing a distinctly different R³ value. That lack of robustness bothered us. A trading system that closed out such a drawdown, but otherwise performed less effectively, would present a higher R³.

R³ can be punishing. If we look at the S&P index, we see that the most recent R³ box dates back to 2007. Its depth is the maximum drawdown at the worst of the financial crisis, close to -60%. While one could readily argue that this event is still in play, it is hardly so to the extent it was in 2009.

The RRt Ratio

The solution we ultimately chose for our trading system optimizations was RRt, a reward to retracement ratio. This was our own design choice, and in many ways little more than a simple extension of the R³ visualization. Clearly there is some measure of pain if one is in a drawdown state, but not in one of these five boxed periods. And how appropriate is it to say that the pain is defined by the greatest depth and breadth that defines such a box instead of the actual profile of the retracement?

RRt reduces the problem of pain to its most basic; in fact, it is utterly simple. The intuitive and obvious choice is to compute the area between the retracement curve and its zero line.

Each zero is an all-time high, each excursion down from zero a retracement from that all-time high. If an entity is trending long term, there will be a series of new all-time highs. If we treat each trading day equally, no volume weighting, the RRt is simply the average retracement. The computation of pain could not be simpler.
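The average retracement described above can be sketched as follows. The function names are hypothetical, and the retracement is expressed as a fraction of the running all-time high, with every trading day weighted equally.

```python
def retracement_series(equity):
    """Fractional distance below the running all-time high (0.0 at a new high)."""
    peak = equity[0]
    series = []
    for value in equity:
        peak = max(peak, value)
        series.append(value / peak - 1.0)
    return series


def average_retracement(equity):
    """Mean retracement over all days, reported as a positive fraction."""
    series = retracement_series(equity)
    return -sum(series) / len(series)


# Six days: a 10% dip, a recovery to new highs, then a 50% dip and recovery.
curve = [100.0, 90.0, 100.0, 120.0, 60.0, 120.0]
avg = average_retracement(curve)  # (0.1 + 0.5) / 6, about 0.1
```

Because every day contributes, a brief deep dip and a long shallow one can produce the same average, which is the sense in which this measure is an area under the zero line divided by the number of days.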

RRt does not resolve the issue of comparing different entities with differing amounts of historical data. Like the probability of a sharp drawdown, the longer the data sampling, the greater the opportunity for a deep retracement. Still, the very fact that every day is accounted and averaged probably gets one as close as one is likely to come in such a metric.

RRt is thus this very simple formula:

RRt = Robust CAGR / Avg. Retracement

The numerator is the robust annual compound growth or trend we compute from the fit of the price data. The denominator is the average retracement for the same time period. If the CAGR is a %, so must the retracement be. They could equally be both expressed as fractional values. Like the Sharpe, Sortino, Sterling and other similar ratios, it is dimensionless.

An RRt of 1 means that the % return one would expect, on average, is equal to the % retracement one would expect, again on average. If we know we have a robust 15% CAGR and an RRt of 1, we know we can expect to average about a 15% level in the red from the last all-time high. There will be times when one is near an all-time high, and the pain will be close to zero. There will be times when one experiences the full depth of a harsh drawdown, and we would expect that to be well above a 30% point. We don't expect to be there long, however, since we average a comfortable 15% level historically, and we hope to see that behavior continue into the future. We would be far more uncomfortable if the average retracement were 30% or higher.
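The interpretation above reduces to simple arithmetic, sketched here with hypothetical function names. Both inputs are assumed to be fractions (0.15 for 15%), and an entity with no retracement at all is treated as having an infinite ratio, which is a choice made only for this example.

```python
import math


def rrt(robust_cagr, avg_retracement):
    """Reward to retracement: robust CAGR divided by the average retracement."""
    if avg_retracement == 0.0:
        return math.inf  # never below an all-time high
    return robust_cagr / avg_retracement


def implied_avg_retracement(robust_cagr, rrt_value):
    """Average retracement implied by a given CAGR and RRt."""
    return robust_cagr / rrt_value


comfortable = rrt(0.15, 0.15)    # 1.0: average pain equals average reward
uncomfortable = rrt(0.15, 0.30)  # 0.5: twice the average retracement
expected_red = implied_avg_retracement(0.15, 1.0)  # 0.15, i.e. about 15% off the high
```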

Let's say there is a one-off event where the worst drawdown reaches 80%: an entity retraces to just 1/5 its highest value. How important is that -0.8 retracement? Should one design a trading system around a one-off event? If we are more forgiving, we can take the approach that every day matters just as was done in the computation of volatility. We can define each day's pain as the extent to which we are off from the last peak attained in price. And if you ponder for just a moment, isn't that exactly what we deem trading pain to be in long term investing or trading the growth in an entity?

RRt is simple, easy to compute, and so general it is about as close as we can imagine coming to a basic property of a time series. The reward is the long term trend, accurate and robust, the pain is the retracement one lives with most of the time, sometimes smaller, sometimes greater, but always there.

Our experience suggests that RRt is a better criterion to use for trading system design optimizations. It is very close to optimizing for a fundamental property of the time series. We want to trade entities that offer the greatest possible robust long term trend with the least average retracement, and we want to construct our trend following trading systems or investment decisions to mine as much of that long term growth as possible, and to minimize the extent to which one is under water from those all-time high peaks.

You should also find your optimization response surfaces to be more consistent using RRt. The favorable parameter spaces should be better defined, and the optimization cliffs sometimes seen with R³ and Sterling ratios may well be shown to be artifacts of the optimization metric, not an intrinsic weakness within the signaling system.