Analysis metrics and walkforward fitness functions

There are many analysis metrics and fitness functions available in MultiWalk. Analysis metrics mathematically describe all or part of an equity curve, such as net profit made, maximum drawdown for the equity time period, etc. Fitness functions use these same analysis metrics, but apply them to the walkforward in-sample selection process. In that context, the metric is applied to the in-sample portion of the equity curve of every single optimization iteration, and the “best” iteration is selected based on the value of the metric for that in-sample time period. In this way, all fitness functions are analysis metrics. However, not all analysis metrics in MultiWalk are fitness functions.

The below list describes the analysis metrics in MultiWalk.

Unless otherwise noted, the metrics are calculated on the daily mark-to-market trade results at end-of-day. In this context, end-of-day is defined as the close of the last bar before midnight. A new trading day is determined when a new bar closes past midnight. This method of tracking trade profit and loss is also called “mark-to-market”. This means that the trade PNL is tracked as though the trade were closed on each day at a particular time (“marked”) using the current trade’s gain or loss at that time (“market”). This gives a much better idea of a trade’s behavior rather than just using the close of the trade PNL (one end-point instead of several “marked” end-points throughout the trade’s life).
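
As a small illustration of the marking described above, the sketch below turns a trade's cumulative PNL at each end-of-day mark into daily mark-to-market changes. The function name and the sample numbers are hypothetical and are not taken from MultiWalk.

```python
def daily_m2m_changes(eod_trade_pnl):
    """Convert a trade's cumulative PNL at each end-of-day mark into daily changes."""
    changes = []
    previous = 0.0
    for pnl in eod_trade_pnl:
        # Each day's result is the change in open PNL since the previous mark.
        changes.append(pnl - previous)
        previous = pnl
    return changes


# Hypothetical trade held over four end-of-day marks; the last mark is the exit.
marks = [120.0, -80.0, 260.0, 200.0]
daily = daily_m2m_changes(marks)
```

The daily changes always sum to the closed-trade PNL, so the closed-trade view is the same data collapsed to a single end-point.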

An asterisk after the metric’s name indicates that it is also available as a walkforward fitness function.

Net Profit^*
Overall profit (or loss) less defined commissions and slippage during trading period.

Monte Carlo
Monte Carlo analysis is a means to stress-test the trading results by randomly “mixing up” the daily equity fluctuations into different outcomes. Any MC score of 0.7 or greater in MultiWalk should be investigated further.

Kevin Davey provides a spreadsheet for Monte Carlo analysis as part of his Strategy Factory workshop. MultiWalk is based on the algorithm that Kevin used in this spreadsheet. Both compute 2500 randomized distributions. Kevin requires an MC return/drawdown ratio of 2.0 in order to accept a strategy.

It is very difficult to translate MultiWalk’s MC return/drawdown ratio to Kevin’s Monte Carlo spreadsheet. MultiWalk uses daily mark-to-market equity balances at the end of the day to determine the MC return/drawdown ratio. Kevin’s spreadsheet uses closed trades. This one primary difference means that MultiWalk’s MC will usually be lower than Kevin’s, yet the strategy can still be a passing one. Since MultiWalk is using more data for the analysis, it is much harder to predict how that MC will compare to Kevin’s spreadsheet using closed trades.

My approach and Kevin’s are similar in that they both randomly resample from the full data pool for each selection value (mark-to-market position profit for MultiWalk, closed trade position profit for Kevin’s spreadsheet). This is called resampling with “replacement”. Resampling with replacement means that some net profit values may appear more than once in any given Monte Carlo iteration while others may not appear at all.

Here are the important differences between the two approaches:

Kevin’s spreadsheet uses trade results; MultiWalk uses daily mark-to-market equity balances at the end of each day.
Kevin’s spreadsheet does not have a fixed starting equity. Margin requirement tends to be used as the starting equity. MultiWalk uses a fixed starting equity of $25,000 and an equity quit-trading point of $5,000.
Kevin’s spreadsheet computes an MC score until a 10% risk of ruin is reached. He requires this score to be 2.0 or greater for a strategy to be considered tradeable. MultiWalk does not repeat for different risk of ruin scores, but rather computes one MC score for each set of 2500 iterations.
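
The resampling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say how the 2500 outcomes are combined into one score, so the median-profit over median-drawdown aggregation, the way the quit-trading point stops an iteration, and every name here are hypothetical rather than MultiWalk's actual algorithm.

```python
import random
from statistics import median


def monte_carlo_return_dd(daily_pnl, iterations=2500, start_equity=25_000.0,
                          quit_equity=5_000.0, seed=0):
    """Resample daily M2M results with replacement and score return vs. drawdown."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(daily_pnl)
    profits, drawdowns = [], []
    for _ in range(iterations):
        equity = peak = start_equity
        max_dd = 0.0
        # Draw n values with replacement: some days repeat, others never appear.
        for pnl in rng.choices(daily_pnl, k=n):
            equity += pnl
            peak = max(peak, equity)
            max_dd = max(max_dd, peak - equity)
            if equity <= quit_equity:
                break  # assumed handling of the equity quit-trading point
        profits.append(equity - start_equity)
        drawdowns.append(max_dd)
    med_dd = median(drawdowns)
    # Assumed aggregation: median net profit divided by median maximum drawdown.
    return median(profits) / med_dd if med_dd > 0 else float("inf")


# Hypothetical daily mark-to-market results (100 days).
sample_days = [100.0, -50.0, 200.0, -80.0] * 25
score = monte_carlo_return_dd(sample_days, iterations=200)
```

Feeding the same function closed-trade results instead of daily results gives the closed-trade flavor of the analysis, which is one way to see why the two scores are hard to compare.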

R^2^*
R^2 (“R-squared”) measures how closely your equity curve follows its least squares “best fit” regression line. A value of 1 is a perfect fit and a value of zero is no fit (the data is equally dispersed around the line). The closer the value is to 1, the better. Mathematically, R^2 is non-negative, whether the slope is upward or downward. However, in order to make a distinction between upward (a climbing equity curve) and downward (a falling or losing equity curve), a negative sign is assigned to R^2 for declining equity curves.

Consider R^2 a descriptive metric only. Testing has shown that it has very little, if any, predictive value as a fitness function.
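
A minimal sketch of the signed R^2 described above, assuming the equity values are regressed against their position in the series (0, 1, 2, ...). The function name is hypothetical and the exact inputs MultiWalk uses may differ.

```python
def signed_r_squared(equity):
    """R^2 of a least squares line through the equity curve, negated if it slopes down."""
    n = len(equity)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(equity) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in equity)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, equity))
    if sxx == 0 or syy == 0:
        return 0.0  # a single point or a flat curve has nothing to fit
    r2 = (sxy * sxy) / (sxx * syy)
    # The slope has the same sign as sxy, so a falling curve gets a negative value.
    return r2 if sxy >= 0 else -r2


rising = signed_r_squared([0.0, 1.0, 2.0, 3.0])
falling = signed_r_squared([3.0, 2.0, 1.0, 0.0])
```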

Walkforward Efficiency^*
This is how “efficiently” the OOS results of the walkforward periods match the IS walkforward periods. A value of 100% means that the OOS periods performed exactly like the IS periods. Specifically, WF efficiency is the annualized rate of return for the out-of-sample results divided by the annualized rate of return for the in-sample results.
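
The ratio above can be sketched as follows. The linear 365-day annualization and the use of dollar net profit as the "return" are assumptions made for illustration; all names are hypothetical.

```python
def annualized(net_profit, calendar_days):
    """Scale a profit figure linearly to a 365-day year (assumed convention)."""
    return net_profit * 365.0 / calendar_days


def walkforward_efficiency(oos_profit, oos_days, is_profit, is_days):
    """Annualized OOS result as a percentage of the annualized IS result."""
    is_rate = annualized(is_profit, is_days)
    if is_rate == 0:
        return float("nan")  # efficiency is undefined without an IS return
    return 100.0 * annualized(oos_profit, oos_days) / is_rate


# Hypothetical example: $12,000 over 730 IS days vs. $3,000 over 365 OOS days.
efficiency = walkforward_efficiency(3000.0, 365, 12000.0, 730)
```

In this example the in-sample periods earn $6,000 per year and the out-of-sample periods $3,000 per year, so the efficiency is 50%.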

Percent Time In Market^*
This is the percent of time that the strategy is in a trade compared to the actual time the particular market is open for trading. MultiWalk keeps track of how many minutes each trade is in the market, giving a very precise indication of the total amount of time the strategy was in the market.

Maximum Drawdown (Max DD)^*
This is the maximum dollar drawdown for the time period.

Net Profit / Maximum Drawdown (NP/Max DD)^*
This is the risk:return ratio based on maximum drawdown. It is a way of asking the question “Am I willing to accept an occasional drawdown of X% in order to generate an average return of Y%?” It is the total net profit divided by the maximum drawdown and is a common metric to measure risk, sometimes also referred to as “Return On Account” (ROA). Anything above a 1:3 ratio is considered good.
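
A minimal sketch of maximum drawdown and the NP/Max DD ratio, assuming the input is a list of end-of-day equity values. The function names and sample curve are hypothetical.

```python
def max_drawdown(equity):
    """Largest peak-to-valley dollar decline in an equity series."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def np_over_max_dd(equity):
    """Total net profit divided by maximum drawdown."""
    net_profit = equity[-1] - equity[0]
    dd = max_drawdown(equity)
    return net_profit / dd if dd > 0 else float("inf")


# Hypothetical end-of-day equity curve, in dollars.
curve = [0.0, 500.0, 200.0, 900.0, 600.0, 1500.0]
ratio = np_over_max_dd(curve)
```

Here the worst decline is $300 against a net profit of $1,500, giving a ratio of 5.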

Average Drawdown (Avg DD)^*
This is an average of the daily drawdown fluctuations. The result is a “smoothness” measurement for the equity curve. The smaller the Average Drawdown, the less volatility in the daily equity ups-and-downs. In other words, the smaller the Average Drawdown, the smoother the equity curve. The higher the Average DD, the more “jagged” or “seismic” the equity curve. A smaller Average DD will provide a less anxious, ulcer-producing ride than a large Average DD!

Net Profit / Average Drawdown (NP/Avg DD)^*
This is the risk:return ratio based on average drawdown. It is the total net profit divided by the average drawdown and assesses the “smoothness” of the equity curve. Higher ratios have fewer drawdown fluctuations. Lower ratios are more “seismic” and jagged.
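
The text does not spell out exactly how the daily drawdown values are averaged, so the sketch below makes an assumption: it averages the distance below the running equity peak over the days on which the curve is below that peak. Names and numbers are hypothetical.

```python
def average_drawdown(equity):
    """Mean distance below the running peak, over days spent below it (assumed definition)."""
    peak = equity[0]
    underwater = []
    for value in equity:
        peak = max(peak, value)
        if value < peak:
            underwater.append(peak - value)
    return sum(underwater) / len(underwater) if underwater else 0.0


def np_over_avg_dd(equity):
    """Total net profit divided by average drawdown."""
    net_profit = equity[-1] - equity[0]
    avg_dd = average_drawdown(equity)
    return net_profit / avg_dd if avg_dd > 0 else float("inf")


# Hypothetical end-of-day equity curve, in dollars.
curve = [0.0, 500.0, 400.0, 300.0, 900.0, 600.0, 1500.0]
smoothness = np_over_avg_dd(curve)
```

Two curves with the same net profit can be ranked this way: the one that spends its time closer to its running peak scores higher.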

Maximum Run Up^*
RunUp is the opposite of DrawDown. It indicates the maximum peak profit run-up for the equity period.

Gross Profit^*
Gross Loss^*
Total gross profit or loss made in the equity curve time period.

Annualized Net Profit^*
Annualized Return / Average Drawdown^*
These are annualized forms of the corresponding metrics (Net Profit and NP/Avg DD). They make it possible to compare metrics across a variety of different symbols and equity curves. Recognize, however, that if you choose a small time period (or walkforward in-sample window), such as 3 months, the result will be extrapolated as though it covered a 12-month period. This could lead to exaggerated results if the profit/loss was unusually high for that 3-month period.

Total Trading Days
Total number of days represented in the actual symbol data. Intraday data will usually include Sundays for markets that open Sunday evening. Daily bar data normally only include Monday through Friday. Holidays are also excluded from actual market data.

Total Calendar Days
This is the fixed number of calendar days represented by the equity curve. This will essentially be the number of calendar days between the first and last day of the equity curve period.

Days Profitable
Days Unprofitable
Percent Days Profitable^*
Percent Days Unprofitable
If the strategy is in a trade on any given day, the day will either be considered profitable or unprofitable based on the mark-to-market (or closed) results of the trade. These metrics are the total number of days that showed a profit or loss at the end of the day.

Trades Profitable
Trades Unprofitable
Percent Trades Profitable^*
Percent Trades Unprofitable
Based on closed trade results, these metrics are the number of trades that showed a profit or loss when the trade was closed.

Average Trade (Avg Trade)^*
This is the average dollar amount per trade.

Average Profitable Trade^*
Average Unprofitable Trade^*
These metrics consider only positive or negative trades when computing their dollar average.

Largest Profitable Trade
Largest Unprofitable Trade
The single largest trade profit and the single largest trade loss.

Maximum Consecutive Profitable Trades^*
Maximum Consecutive Unprofitable Trades
These metrics indicate the maximum length of profitable or unprofitable trade runs.
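
As an illustration, the longest run can be found with a single pass over the closed-trade results. Treating a break-even trade (exactly zero) as ending both kinds of run is an assumption of this sketch, and the names are hypothetical.

```python
def max_consecutive(trade_results, profitable=True):
    """Length of the longest run of profitable (or unprofitable) trades."""
    longest = current = 0
    for result in trade_results:
        hit = result > 0 if profitable else result < 0
        current = current + 1 if hit else 0
        longest = max(longest, current)
    return longest


# Hypothetical closed-trade results, in dollars.
trades = [50.0, 20.0, -10.0, 30.0, 40.0, 60.0, -5.0, -15.0]
win_run = max_consecutive(trades, profitable=True)
loss_run = max_consecutive(trades, profitable=False)
```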

Profit Factor^*
Profit Factor is gross profit divided by gross loss. The higher the profit factor, the greater your profit over loss.
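
A short sketch of the calculation, assuming gross loss is taken as a positive magnitude and that a series with no losses is reported as infinite. The function name and sample values are hypothetical.

```python
def profit_factor(results):
    """Gross profit divided by the magnitude of gross loss."""
    gross_profit = sum(r for r in results if r > 0)
    gross_loss = -sum(r for r in results if r < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")


# Hypothetical results: $300 of gross profit against $75 of gross loss.
pf = profit_factor([100.0, -50.0, 200.0, -25.0])
```

A value above 1 means the wins outweigh the losses; in this example every dollar lost is matched by four dollars won.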

TradeStation Index (TS Index M2M)^*
This fitness function metric is based on TradeStation’s Index. TradeStation Index maximizes the Net Profit and Winning Trades while minimizing Intraday Drawdown. It calculates this as:
Net Profit * NumWinTrades / AbsValue (Max. Intraday Drawdown)
MultiWalk does not calculate this on trade data, but rather uses end-of-day mark-to-market profit/loss data. Therefore it uses winning days rather than winning trades in the calculation. For MultiWalk, the calculation becomes:
Net Profit * NumWinM2MDays / AbsValue (Max. Intraday Drawdown)
This provides the same characteristics as TradeStation’s Index, but applied to a resolution of daily data.
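
The daily variant of the formula can be sketched as below. One assumption to note: the drawdown here is measured on the end-of-day mark-to-market equity curve, since that is the only data the sketch has. Names and sample data are hypothetical.

```python
def ts_index_m2m(daily_pnl):
    """Net profit times winning M2M days, divided by maximum drawdown."""
    net_profit = sum(daily_pnl)
    winning_days = sum(1 for p in daily_pnl if p > 0)
    equity = peak = 0.0
    max_dd = 0.0
    for p in daily_pnl:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)  # drawdown from end-of-day equity
    return net_profit * winning_days / max_dd if max_dd > 0 else float("inf")


# Hypothetical daily mark-to-market results, in dollars.
index = ts_index_m2m([200.0, -100.0, 300.0, -50.0, 150.0])
```

With a net profit of $500, three winning days, and a $100 maximum drawdown, the index is 500 * 3 / 100 = 15.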

Net Profit x R^2^*
This is another way of modifying R^2 so that upward sloping curves are positive while downward sloping curves are negative. It also, however, adds the measurement of actual net profit. Higher net profit will yield a larger value for the metric, whereas lower net profit will reduce it, even if the R^2 itself was very high.

Net Profit x Profit Factor (NP x PF)^*
Similar to Net Profit x R^2, this metric uses Profit Factor instead as the multiplier.

RINA Index^*
During the late 1990s, RINA Systems developed a new ratio called the RINA Index, a trade performance measure that takes into consideration net profit, average drawdown and percent time in the market.

The original RINA Index by RINA Systems is calculated by taking the net profit, excluding trades that fall more than 3 sigma (standard deviations) from the average trade, and dividing it by the product of the average drawdown and the percent time in the market.

RINA Index = (Net Profit – Net Profit in Outliers)/(Average Drawdown*Percent Time in the Market).

However, MultiWalk does not remove the outliers, because it does not consider them aberrations but rather a valid part of the overall composition of risk.
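
A minimal sketch of the formula with the outlier term left out, as described above. Expressing percent time in the market as a fraction between 0 and 1 is an assumption, since the text does not state the scale; the names and numbers are hypothetical.

```python
def rina_index(net_profit, average_drawdown, fraction_time_in_market):
    """Net profit divided by (average drawdown * time in market), outliers kept."""
    denominator = average_drawdown * fraction_time_in_market
    if denominator <= 0:
        return float("nan")  # undefined without drawdown or market exposure
    return net_profit / denominator


# Hypothetical systems with equal profit and drawdown but different exposure.
half_time = rina_index(30_000.0, 2_000.0, 0.50)
quarter_time = rina_index(30_000.0, 2_000.0, 0.25)
```

With everything else equal, halving the time in the market doubles the index.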

This index is a good substitute for the ratio Net Profit / Maximum Drawdown. It gives a more realistic reward/risk value for trading performance. In addition to drawdown as an element of risk in the measurement of performance, time-in-the-market is included as another element of risk. The premise is that there is an inherent risk any time a position is established. Following this logic, the RINA Index would be higher (all other variables equal) for a system that spends less time in the market.

Generally, a system with a RINA Index of 30 or higher could be considered to have a reasonably good performance.

Sharpe Ratio^*
Sortino Ratio^*
The Sharpe Ratio and Sortino Ratio are similar in that they both measure the return of an equity curve relative to its risk, where risk is expressed as volatility. The main difference between the Sharpe Ratio and the Sortino Ratio is that the Sortino Ratio takes into account only the downside volatility of an investment, while the Sharpe Ratio takes into account both the upside and downside volatility.

Traditionally this is a measurement of an investment’s performance over a large period of time. MultiWalk applies the same concept to daily mark-to-market data. Since the resolution of data is so much smaller and often contains “zero return” days, the values will also be much smaller than the traditional use of Sharpe and Sortino ratios. Therefore you cannot apply, for example, the generally accepted Sharpe value of 1 or greater as “acceptable performance” since the ratios in MultiWalk will be much, much smaller!

MultiWalk calculates both Sharpe and Sortino ratios as presented in this CME Group document.
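
For orientation only, here is a generic textbook-style sketch of the two ratios on a series of daily returns. It is not taken from the CME Group document, so the details are assumptions: a target (risk-free) return of zero, population standard deviation, no annualization, and hypothetical names throughout.

```python
from math import sqrt
from statistics import mean, pstdev


def sharpe_ratio(daily_returns, target=0.0):
    """Mean excess return divided by the standard deviation of excess returns."""
    excess = [r - target for r in daily_returns]
    sd = pstdev(excess)
    return mean(excess) / sd if sd > 0 else float("nan")


def sortino_ratio(daily_returns, target=0.0):
    """Mean excess return divided by downside deviation (below-target days only)."""
    excess = [r - target for r in daily_returns]
    downside = sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return mean(excess) / downside if downside > 0 else float("nan")


# Hypothetical daily returns, including a "zero return" day.
returns = [0.01, -0.01, 0.02, 0.0]
sharpe = sharpe_ratio(returns)
sortino = sortino_ratio(returns)
```

Because only the losing day counts toward the Sortino denominator, the Sortino value for this sample is higher than the Sharpe value.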