We’re less than a month away from the IFTA International Conference 2023, which will be held in Jakarta, Indonesia (you can find all the details here), and I’d like to provide a small preview of the genetic algorithms theme. In the presentation, we’ll start from here to build an evolutionary architecture for periodic asset allocation across an entire basket of stocks and ETFs. Key ingredients will include the addition of the “Instability Factor” and the proper composition of the “Genetic Pool”. Enjoy the read!
In this article, I aim to explore the project parameters of a genetic trading system to highlight certain dynamics that are often crucial during the design phase. The objective is to link the complexity and the number of elements in the genetic pool to the diversity and robustness of the end products.
To begin, we will examine the daily historical data of SPY (the SPDR ETF that tracks the S&P 500 index) and process it with the Python version of the Akira engine (one of the open-source codes provided in the ‘Machine Learning Academy’ program, focused on the application of Artificial Intelligence in Quantitative Analysis).
In line with a bullish underlying trend, we begin to analyze the instrument in search of statistical inefficiencies that can be repeatedly exploited in this asset.
We start by isolating a single element in the Genetic Pool: the Average Price, which is the arithmetic mean of the open, high, low, and close prices. We allow the system to look back a maximum of 5 bars (MAX_OFFSET = 4, i.e. offsets 0 to 4). Comparing each lagged value with each of the other four gives 5 × 4 = 20 rules.
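As a rough sketch of where the 20 rules come from, the snippet below enumerates every ordered comparison between two distinct lagged values of the average price. The function names (`avg_price`, `build_rules`) are illustrative and are not taken from the Akira engine; the rule format simply mirrors the pattern strings printed in the generation logs of this article.

```python
from itertools import permutations

MAX_OFFSET = 4  # offsets 0..4, i.e. a lookback of at most 5 bars


def avg_price(open_, high, low, close):
    """Arithmetic mean of the four prices of a single bar."""
    return (open_ + high + low + close) / 4.0


def build_rules(elements, max_offset):
    """Every ordered comparison 'a(i) > b(j)' between two distinct lagged terms."""
    terms = [f"{name}({offset})"
             for name in elements
             for offset in range(max_offset + 1)]
    return [f"{left} > {right}" for left, right in permutations(terms, 2)]


rules = build_rules(["avgprice"], MAX_OFFSET)
print(len(rules))  # 20
print(rules[:3])
```

With a single element there are 5 lagged terms, and each can be compared with the 4 others, which reproduces the count quoted above.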
We then run a simulation with 2000 to 2019 as the In-Sample period (the only data the machine is allowed to train on) and 2020 to 2023 as the Out-of-Sample period (where the patterns identified In-Sample are tested, but the machine cannot adapt its behavior to what it sees).
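A minimal sketch of the split, assuming bars are stored as `(date, value)` tuples and that the In-Sample period ends on the last calendar day of 2019. The helper name and the toy values are placeholders, not engine code or market data.

```python
from datetime import date

IN_SAMPLE_END = date(2019, 12, 31)  # assumed boundary between the two periods


def split_samples(bars):
    """Split (date, value) bars into In-Sample and Out-of-Sample lists."""
    in_sample = [bar for bar in bars if bar[0] <= IN_SAMPLE_END]
    out_of_sample = [bar for bar in bars if bar[0] > IN_SAMPLE_END]
    return in_sample, out_of_sample


# Placeholder values, used only to show the mechanics of the split.
toy_bars = [(date(2019, 12, 30), 1.0),
            (date(2019, 12, 31), 2.0),
            (date(2020, 1, 2), 3.0)]
in_sample, out_of_sample = split_samples(toy_bars)
print(len(in_sample), len(out_of_sample))  # 2 1
```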
We enter the market long at the opening of the next bar following a setup, purchasing $10,000 worth of SPY:
CAPITAL = $10,000
We exit the trade on the sixth opening after entry:
TIME_EXIT = 5
As the Fitness Function, we choose the ratio between Profit and MaxDrawDown:
FITNESS_FUNCTION = "Profit/MaxDrawDown"
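One possible way to compute such a ratio is sketched below. It assumes the equity curve is built by summing closed-trade P&L in chronological order and that drawdown is measured in dollars from the running peak; the engine may well mark positions to market bar by bar instead. The function names and the handling of a zero drawdown are assumptions made for the example.

```python
def max_drawdown(equity):
    """Largest peak-to-trough drop of an equity curve, in currency units."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def profit_over_max_drawdown(trade_pnls):
    """Net profit divided by maximum drawdown of the cumulative P&L curve."""
    equity = [0.0]
    for pnl in trade_pnls:
        equity.append(equity[-1] + pnl)
    profit = equity[-1]
    drawdown = max_drawdown(equity)
    if drawdown == 0.0:
        # Assumption: a curve that never draws down is scored as unbounded.
        return float("inf") if profit > 0 else 0.0
    return profit / drawdown


# Toy P&L sequence: equity goes 0, 100, 50, 250, 150, 200.
print(profit_over_max_drawdown([100.0, -50.0, 200.0, -100.0, 50.0]))  # 2.0
```

In the toy sequence the net profit is 200 and the deepest drop is 100 (from 250 to 150), hence a ratio of 2.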
We set a population of 100 virtual investors and 100 generations (resulting in 10,000 combinations):
POPULATION_SIZE = 100
NUM_GENERATIONS = 100
We enforce a single chromosome of 3 rules in the DNA of each investor:
DNA_SIZE = 3
We save the top 30% of the population at each generation:
BEST_DNA_RATIO = 0.3
We apply the Crossover algorithm to 30% of the population:
CROSS_DNA_RATIO = 0.3
We apply the Mutation algorithm to 40% of the population with a probability of 20%:
MUTATION_PROB = 0.2
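The three ratios above can be combined into one generation step, sketched here under several assumptions that the article does not spell out: parents for crossover and mutation are drawn from the saved elite, crossover is single-point, and the 20% mutation probability is applied gene by gene. `next_generation` and `toy_fitness` are hypothetical names, and the toy score is not a trading metric.

```python
import random

BEST_DNA_RATIO = 0.3
CROSS_DNA_RATIO = 0.3
MUTATION_PROB = 0.2


def next_generation(population, fitness, gene_pool, rng):
    """Build the next population from elites, crossover children and mutants."""
    size = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:round(size * BEST_DNA_RATIO)]  # top 30% survive unchanged

    children = []
    for _ in range(round(size * CROSS_DNA_RATIO)):
        mother, father = rng.sample(elite, 2)
        cut = rng.randrange(1, len(mother))  # single-point crossover (assumed)
        children.append(mother[:cut] + father[cut:])

    mutants = []
    while len(elite) + len(children) + len(mutants) < size:  # remaining 40%
        parent = rng.choice(elite)
        mutants.append(tuple(
            rng.choice(gene_pool) if rng.random() < MUTATION_PROB else gene
            for gene in parent
        ))
    return elite + children + mutants


rng = random.Random(42)
gene_pool = [f"avgprice({i}) > avgprice({j})"
             for i in range(5) for j in range(5) if i != j]
population = [tuple(rng.choice(gene_pool) for _ in range(3)) for _ in range(100)]


def toy_fitness(dna):
    """Placeholder score used only to rank individuals in this demo."""
    return sum(gene_pool.index(gene) for gene in dna)


new_population = next_generation(population, toy_fitness, gene_pool, rng)
print(len(new_population))  # 100
```

Keeping the elite untouched means the best fitness in the population can never decrease from one generation to the next, which is consistent with logging only the generations that improve.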
In this version of the Akira engine, we allow for overlapping trades (we do not need to wait for the end of one trade to open another). This is done to verify the raw effectiveness of an entry condition.
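The entry, exit, and overlap rules described so far can be sketched as follows. This is an illustration, not the engine's code: it assumes that a setup on bar `i` is filled at the open of bar `i + 1`, that TIME_EXIT counts bars from the entry bar, and that costs and slippage are ignored. `simulate_trades` is a hypothetical name.

```python
CAPITAL = 10_000.0  # dollars committed to each trade
TIME_EXIT = 5       # bars held after entry (interpretation assumed)


def simulate_trades(opens, signals, capital=CAPITAL, time_exit=TIME_EXIT):
    """Return (entry_index, exit_index, pnl) for every setup; trades may overlap."""
    trades = []
    for i, fired in enumerate(signals):
        entry = i + 1              # enter at the next bar's open
        exit_ = entry + time_exit  # time-based exit, also at the open
        if fired and exit_ < len(opens):
            shares = capital / opens[entry]
            pnl = shares * (opens[exit_] - opens[entry])
            trades.append((entry, exit_, pnl))
    return trades


# Toy opening prices; two consecutive setups produce two overlapping trades.
toy_opens = [100.0, 100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 111.0]
toy_signals = [True, True, False, False, False, False, False, False]
trades = simulate_trades(toy_opens, toy_signals)
print(len(trades))  # 2
```

Because every setup opens its own position, the trade count reflects how often the condition fires rather than how often capital happens to be free.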
We require a minimum of 1,000 trades in the In-Sample period per organism (to ensure statistical significance):
MIN_OPERATIONS = 1000
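One simple way to enforce such a threshold is to score any organism with too few trades as zero, which matches the fitness of 0 that the engine logs for strategies below the minimum. The wrapper below is a sketch with hypothetical names; `raw_fitness` stands for whatever metric is being optimized.

```python
MIN_OPERATIONS = 1000


def gated_fitness(trade_pnls, raw_fitness, min_operations=MIN_OPERATIONS):
    """Return 0.0 when there are too few trades, otherwise the raw fitness."""
    if len(trade_pnls) < min_operations:
        return 0.0
    return raw_fitness(trade_pnls)


# `sum` stands in for the real metric in this toy call.
print(gated_fitness([1.0] * 999, sum))   # 0.0
print(gated_fitness([1.0] * 1000, sum))  # 1000.0
```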
In Generation 0, we find the following equity curve, which pertains to the best individual among the 100 individuals in the entire population:
generation 0 : 5.1307042175270565 avgprice(2) > avgprice(1) and avgprice(2) > avgprice(0) and avgprice(1) > avgprice(0)
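To make the log line concrete, here is a small sketch that parses a pattern string of this form and checks whether the setup fires on a given bar. It assumes that `name(k)` means the value of `name` k bars before the current one (offset 0 being the current bar); the parser and function names are hypothetical and the three prices are toy values.

```python
import re

GENE_RE = re.compile(r"(\w+)\((\d+)\)\s*>\s*(\w+)\((\d+)\)")


def parse_pattern(pattern):
    """Turn 'a(i) > b(j) and ...' into a list of (a, i, b, j) tuples."""
    genes = []
    for part in pattern.split(" and "):
        match = GENE_RE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognised gene: {part!r}")
        genes.append((match[1], int(match[2]), match[3], int(match[4])))
    return genes


def setup_fires(genes, series, t):
    """True when every gene holds on bar t (offsets count bars back from t)."""
    deepest = max(max(i, j) for _, i, _, j in genes)
    if t < deepest:
        return False  # not enough history yet
    return all(series[a][t - i] > series[b][t - j] for a, i, b, j in genes)


pattern = ("avgprice(2) > avgprice(1) and avgprice(2) > avgprice(0) "
           "and avgprice(1) > avgprice(0)")
genes = parse_pattern(pattern)
falling = {"avgprice": [10.0, 9.0, 8.0]}
print(setup_fires(genes, falling, 2))  # True
```

Under this reading of the offsets, the Generation 0 pattern fires after two consecutive declines in the average price.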
A simple rule like the one with three genes that we just identified appears to provide an advantage even in the period following 2020 (to the right of the dashed red line, we have the Out-of-Sample period).
In Generation 3 (after 400 combinations), we obtain a new organism that improves performance slightly in the In-Sample period but alters its behavior in the Out-of-Sample period:
generation 3 : 5.149406673214812 avgprice(3) > avgprice(1) and avgprice(2) > avgprice(0) and avgprice(1) > avgprice(0)
What has changed within the pattern is the first gene, which moves from avgprice(2) to avgprice(3): the lookback depth shifts from 3 bars to 4.
In Generation 8, convergence reaches its peak (from that point on, we are unable to improve the fitness function):
generation 8 : 5.949611461933047 avgprice(3) > avgprice(1) and avgprice(3) > avgprice(0) and avgprice(1) > avgprice(0)
The reference ratio (fitness function) has increased from 5.15 to 5.95. The dynamics in the Out-of-Sample period have remained positive from the beginning (it seems we have found an interesting thread).
At this point, let’s take a look at the set of curves related to the best representatives of each generation (we will only visualize the representatives that have shown improvement from the previous generations):
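Selecting only the improving representatives amounts to keeping a running maximum of the best fitness per generation. The helper below is a sketch with a hypothetical name, and the fitness history is made up purely to show the filtering.

```python
def improving_generations(best_fitness_per_generation):
    """Return (generation, fitness) pairs that beat every earlier generation."""
    improved = []
    best_so_far = float("-inf")
    for generation, fitness in enumerate(best_fitness_per_generation):
        if fitness > best_so_far:
            best_so_far = fitness
            improved.append((generation, fitness))
    return improved


toy_history = [1.0, 1.0, 1.5, 1.5, 1.5, 2.0]  # made-up values
print(improving_generations(toy_history))  # [(0, 1.0), (2, 1.5), (5, 2.0)]
```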
All of these curves belong to a single genetic family (as can also be verified from the pattern formulas). Note how limited the differences are: this is due to the low genetic complexity of the project (only 3 genes, a genetic pool of only 20 total rules, average price as the sole working metric).
Next, let’s enrich the genetic pool to see how this translates into the final results. Alongside the raw open, high, low, and close prices (which appear in the patterns logged below), the pool now contains the following elements:
avgprice = (close + open + low + high) / 4
medprice = (low + high) / 2
medbodyprice = (close + open) / 2
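The three formulas translate directly into code. The sketch below assumes each bar is a dictionary with `open`, `high`, `low`, and `close` keys; that layout and the function name are assumptions for the example, and the prices are toy values.

```python
def derived_prices(bar):
    """Compute the three derived price elements for a single OHLC bar."""
    o, h, l, c = bar["open"], bar["high"], bar["low"], bar["close"]
    return {
        "avgprice": (c + o + l + h) / 4,  # mean of all four prices
        "medprice": (l + h) / 2,          # midpoint of the bar's range
        "medbodyprice": (c + o) / 2,      # midpoint of the candle body
    }


toy_bar = {"open": 10.0, "high": 20.0, "low": 8.0, "close": 12.0}
print(derived_prices(toy_bar))
# {'avgprice': 12.5, 'medprice': 14.0, 'medbodyprice': 11.0}
```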
With the same maximum lag, the total number of rules increases from 20 to 1127.
We relaunch the simulation with the same parameters and analyze what happens from generation to generation:
The best representative of Generation 0 appears less appealing than before, and this is due to the increase in possible combinations: it is more challenging to obtain something usable at the initial stages with only 100 combinations (although not impossible in absolute terms).
generation 0 : 0.9646324017850788 high(4) > low(4) and open(3) > medprice(4) and medprice(1) > avgprice(0)
The elementary genes make use of more diverse elements, and yet, there is a clear positive dynamic in the equity curve that continues into the Out-of-Sample period.
In Generation 3 (the fourth in total), we transition to a much more appealing solution, both in terms of metrics and visually. However, the period of distress in March 2020 (COVID-19) is clearly visible.
generation 3 : 7.616502376543235 open(0) > close(0) and medbodyprice(4) > low(4) and open(2) > avgprice(1)
We have gone from a fitness value below 1 to 7.62. Because the fitness function is Profit/MaxDrawDown, this means that over the In-Sample period the strategy earned roughly $7.62 of profit for every dollar of maximum drawdown it suffered along the way.
Also, note how the setup formula has changed but still consists of 5 elementary traits: open, close, medbodyprice, low, avgprice.
In Generation 11, things improve even further. The variance decreases, and we reach a ratio of 7.90.
generation 11 : 7.897308288202666 open(0) > close(0) and medbodyprice(4) > low(4) and low(3) > close(0)
From a metric perspective, both the Profit Factor and the Kestner Ratio (a metric related to the smoothness of the curve) show significant improvement.
The last improvement is recorded in Generation 97, when we reach a fitness value of 14.55.
generation 97 : 14.54777099679234 open(0) > avgprice(0) and high(2) > open(2) and low(3) > close(0)
At this point, let’s view the entire swarm of curves related to the best representatives of each generation that have shown improvement compared to the previous generations.
The fact that we start from a benchmark curve that is already strongly positive should not be misleading. Improvement is evident when following the curves from the bottom to the top (and we are not only talking about profit but also about almost all comparison metrics).
In this case, we are still talking about the same family of curves (from a genetic standpoint). What stands out is how the swarm of curves has expanded, which is attributed to the enrichment of the genetic pool.
The evolution curve demonstrates that, compared to the previous case, evolutionary leaps have occurred more evenly and not just at the beginning of the process.
What happens if we double the number of genes within the DNA, for example, from 3 to 6?
Let’s restart the simulator and freeze the starting generation.
generation 0 : 0.0 avgprice(1) > low(0) and medbodyprice(4) > medprice(2) and close(4) > medbodyprice(2) and medbodyprice(2) > avgprice(4) and high(3) > medbodyprice(0) and high(3) > avgprice(0)
The variety has increased exponentially and has not allowed us to start off on the right foot. Also, note the fitness value of 0; this is because the proposed strategy fails to reach the minimum required 1000 trades.
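A back-of-the-envelope count shows how quickly the search space grows. The sketch assumes each gene is drawn independently from the rule pool with repetition allowed (the six-gene log for Generation 13 does contain the same rule twice), and it counts ordered sequences, so it is only an upper bound on the number of distinct strategies. The rule counts 20 and 1127 are the ones quoted in the article.

```python
def search_space(num_rules, dna_size):
    """Upper bound on the number of DNA sequences: num_rules ** dna_size."""
    return num_rules ** dna_size


EVALUATIONS = 100 * 100  # population size times number of generations

for num_rules, dna_size in [(20, 3), (1127, 3), (1127, 6)]:
    size = search_space(num_rules, dna_size)
    print(f"{num_rules} rules, {dna_size} genes: {size:.3e} sequences, "
          f"{EVALUATIONS / size:.3e} of them visited at most")
```

With 20 rules and 3 genes the bound is 8,000, fewer than the 10,000 evaluations of a full run. With 1127 rules the same budget covers only a tiny fraction of the space, and doubling the DNA size squares the bound.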
Let’s continue with the evolution: we have to wait until generation 5 to see a significant improvement.
generation 5 : 0.2958310395778596 high(1) > low(1) and medbodyprice(4) > low(0) and high(2) > close(1) and low(3) > close(1) and medprice(2) > open(0) and high(4) > open(1)
We still observe areas of significant drawdown both in the In-Sample and Out-of-Sample periods. In the next generation, things improve significantly in this regard (particularly note the negative spike in March 2020, which is considerably smaller).
generation 6 : 2.988439098069271 high(1) > low(1) and medbodyprice(4) > low(0) and high(2) > close(1) and close(4) > medbodyprice(2) and medprice(2) > open(0) and high(4) > open(1)
However, already in Generation 13, we witness an unexpected development: despite an improvement in the In-Sample performance (fitness increases from 2.99 to 3.83), we observe a deterioration in the Out-of-Sample performance.
generation 13 : 3.830350212378348 low(3) > close(1) and avgprice(2) > medprice(0) and high(2) > close(1) and low(3) > close(1) and medprice(2) > open(0) and medbodyprice(3) > low(3)
The machine has likely started to model the noise, finding itself in that borderline area between fitting and overfitting.
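In the article this divergence is judged by eye on the equity curves. A simple numerical counterpart, offered only as a hypothetical check that is not part of the engine, is to compute the same fitness separately on both periods for each improving generation and flag the steps where the In-Sample value rises while the Out-of-Sample value falls. The numbers below are made up.

```python
def flag_divergence(in_sample_fitness, out_of_sample_fitness):
    """Indices where In-Sample fitness rose but Out-of-Sample fitness fell."""
    flagged = []
    for g in range(1, len(in_sample_fitness)):
        rose_in = in_sample_fitness[g] > in_sample_fitness[g - 1]
        fell_out = out_of_sample_fitness[g] < out_of_sample_fitness[g - 1]
        if rose_in and fell_out:
            flagged.append(g)
    return flagged


toy_in = [1.0, 2.0, 3.0, 4.0]   # made-up In-Sample fitness per step
toy_out = [0.5, 0.8, 0.6, 0.4]  # made-up Out-of-Sample fitness per step
print(flag_divergence(toy_in, toy_out))  # [2, 3]
```

Such a flag is best used as a diagnostic after the run. Using it to choose which generation to keep would feed Out-of-Sample information back into the selection and weaken the test.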
Things worsen even further (in the Out-of-Sample) in Generation 25.
generation 25 : 5.803365988467282 high(3) > low(3) and avgprice(2) > medprice(0) and high(2) > close(1) and low(3) > open(0) and close(4) > low(0) and medprice(2) > medbodyprice(1)
The fitness function has increased from 3.83 to 5.80.
In Generation 89, we have the last improvement in the In-Sample performance, from 5.80 to an impressive 10.09.
generation 89 : 10.09139869030955 medprice(4) > low(1) and avgprice(2) > medprice(0) and avgprice(2) > medprice(1) and close(4) > close(1) and medprice(3) > medbodyprice(0) and medprice(2) > close(0)
The final solution closely resembles that of the previous simulation; however, we encountered a greater tendency to model noise (which is inherently dangerous for the future performance of the strategy). This occurred due to the enrichment of the genetic pool and the increased number of potential combinations associated with the doubled number of genes.
This article is just the starting point in the discussion of evolutionary algorithms in the context of trading and investing. In the talk I will deliver at the upcoming IFTA International Conference in Jakarta, Indonesia, from October 5th to 7th, 2023, we will extend these concepts towards an architecture that genetically manages the periodic asset allocation of an entire basket of assets. We will discuss the importance of incorporating ‘instability’ algorithms and how they can be applied to baskets of both individual stocks and ETFs. We will delve into the details by adding complexity to the genetic pool and monitoring how the metrics respond, in terms of consistency as well as performance.
It is still possible to purchase tickets to attend in person, alongside many speakers from around the world (you can find the complete program and additional details here).
Founder – Head of R&D