Algorithmic trading in less than 100 lines of Python code
Michael Halls-Moore, founder of QuantStart. If you would like to watch the video of Michael's presentation, you can here. The post is suitable for those who are beginning quantitative trading as well as those who have had some experience with the area. The post discusses the common pitfalls of backtesting, as well as some uncommon ones! It also looks at the different sorts of backtesting mechanisms as well as the software landscape that implements these approaches.
Then we discuss whether it is worth building your own backtester, even with the prevalence of open source tools available today. Finally, we discuss the ins-and-outs of an event-driven backtesting system, a topic that I've covered frequently on QuantStart in prior posts.
That is, if we define a set of mechanisms for entry and exit into a portfolio of assets, and apply those rules to historical pricing data of those assets, we can attempt to understand the performance of this "trading strategy" that might have been attained in the past. It was once said that "All models are wrong, but some are useful". The same is true of backtests.
So what purpose do they serve? Backtests ultimately help us decide whether it is worth live-trading a set of strategy rules. They provide us with an idea of how a strategy might have performed in the past. Essentially they allow us to filter out bad strategy rules before we allocate any real capital.
It is easy to generate backtests. Unfortunately backtest results are not live trading results. They are instead a model of reality, and one that usually contains many assumptions.
There are two main types of software backtest: the "for-loop" and the "event-driven" systems. When designing backtesting software there is always a trade-off between accuracy and implementation complexity. The above two backtesting types represent either end of the spectrum for this trade-off.
There are many pitfalls associated with backtesting. They all concern the fact that a backtest is just a model of reality. Some of the more common pitfalls include:
There are some more subtle issues with backtesting that are not discussed as often, but are still incredibly important to consider.
Much has been written about the problems with backtesting. Tucker Balch and Ernie Chan both consider the issues at length.
A For-Loop Backtester is the most straightforward type of backtesting system and the variant most often seen in quant blog posts, purely for its simplicity and transparency.
Essentially the For-Loop system iterates over every trading day (or OHLC bar), performs some calculation related to the price(s) of the asset(s), such as a Moving Average of the close, and then goes long or short a particular asset (often on the same closing price, but sometimes the day after).
The iteration then continues. All the while the total equity is being tracked and stored to later produce an equity curve. As you can see the design of such a system is incredibly simple. This makes it attractive for getting a "first look" at the performance of a particular strategy ruleset.
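The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name for_loop_backtest, the three-bar window, the one-unit position size and the sample prices are all assumptions made for the example. It trades on the same closing price that produced the signal and ignores transaction costs and spread.

```python
from collections import deque


def for_loop_backtest(closes, window=3, initial_equity=1000.0):
    """Hold +1 unit while the close is above its moving average, else -1 unit.

    Hypothetical helper for illustration: no costs, no spread, fills at the close.
    """
    equity = initial_equity
    position = 0                    # +1 long, -1 short, 0 flat
    recent = deque(maxlen=window)   # rolling window of the latest closes
    equity_curve = []
    prev_close = None

    for close in closes:
        # Mark the position held since the previous bar to the new close.
        if prev_close is not None:
            equity += position * (close - prev_close)
        equity_curve.append(equity)

        # Update the moving average with this close, then trade at this close.
        recent.append(close)
        if len(recent) == window:
            moving_average = sum(recent) / window
            position = 1 if close > moving_average else -1
        prev_close = close

    return equity_curve


closes = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0, 10.0]  # made-up sample data
curve = for_loop_backtest(closes, window=3)
print(curve)
```

The returned list is the equity curve mentioned above, with one entry per bar. Because the whole backtest is a single pass over a list, it is cheap to re-run with different values of window.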
For-Loop backtesters are straightforward to implement in nearly any programming language and are very fast to execute. The latter advantage means that many parameter combinations can be tested in order to optimise the trading setup. The main disadvantage with For-Loop backtesters is that they are quite unrealistic. They often have no transaction cost capability unless specifically added.
Usually orders are filled immediately "at market" with the midpoint price. As such there is often no accounting for spread. There is minimal code re-use between the backtesting system and the live-trading system.
This means that code often needs to be written twice, introducing the possibility of more bugs. For-Loop backtesters are prone to Look-Ahead Bias, due to bugs with indexing. For-Loop backtesters should really be utilised solely as a filtration mechanism. You can use them to eliminate the obviously bad strategies, but you should remain skeptical of strong performance.
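To make the Look-Ahead Bias point concrete, here is a small sketch of how an off-by-one index can flatter a result. The function strategy_pnl, its lag parameter and the price series are hypothetical and exist only for this illustration. With lag=1 the signal uses only bars that were complete before the traded move. With lag=0 the signal peeks at the close of the very bar it is trading.

```python
def strategy_pnl(closes, lag):
    """Momentum rule: long if the last bar the signal can see rose, else short.

    lag=1 uses only information available before the traded bar.
    lag=0 reproduces the indexing bug: the signal reads the traded bar's own close.
    """
    pnl = 0.0
    for i in range(2, len(closes)):
        bar_return = closes[i] - closes[i - 1]   # the move being traded
        j = i - lag                              # last bar visible to the signal
        signal = 1 if closes[j] > closes[j - 1] else -1
        pnl += signal * bar_return
    return pnl


closes = [100.0, 101.0, 100.0, 102.0, 101.0, 103.0]  # made-up sample data
honest = strategy_pnl(closes, lag=1)
biased = strategy_pnl(closes, lag=0)
print(honest, biased)
```

On this series the correctly lagged rule loses on every bar, while the version with the bug captures every move perfectly. Performance that looks too good is a common symptom of this kind of error.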
Further research is often required. Strategies rarely perform better in live trading than they do in backtests!
Event-Driven Backtesters lie at the other end of the spectrum. They are much more akin to live-trading infrastructure implementations. As such, they are often more realistic in the difference between backtested and live trading performance.
Such systems are run in a large "while" loop that continually looks for "events" of differing types in the "event queue". When a particular event is identified it is routed to the appropriate module(s) in the infrastructure, which handles the event and then potentially generates new events which go back to the queue.
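A minimal sketch of such an event loop is shown below. The event class names (MarketEvent, SignalEvent, OrderEvent, FillEvent), the threshold rule and the fixed order size of 10 are assumptions chosen for illustration, and the strategy, portfolio and broker logic are collapsed into single branches rather than separate modules. An outer loop feeds one bar at a time and an inner "while" loop drains the queue.

```python
import queue
from dataclasses import dataclass


@dataclass
class MarketEvent:
    price: float


@dataclass
class SignalEvent:
    direction: int      # +1 to buy, -1 to sell


@dataclass
class OrderEvent:
    quantity: int


@dataclass
class FillEvent:
    quantity: int
    price: float


def run_backtest(prices, threshold=100.0):
    events = queue.Queue()
    fills = []
    position = 0
    last_price = None

    for price in prices:                  # outer loop: one new bar per iteration
        events.put(MarketEvent(price))
        while True:                       # inner loop: drain the event queue
            try:
                event = events.get(block=False)
            except queue.Empty:
                break
            if isinstance(event, MarketEvent):
                last_price = event.price
                # Toy strategy: buy below the threshold, sell at or above it.
                if event.price < threshold and position == 0:
                    events.put(SignalEvent(+1))
                elif event.price >= threshold and position > 0:
                    events.put(SignalEvent(-1))
            elif isinstance(event, SignalEvent):
                # Stand-in for the portfolio handler: fixed size of 10 units.
                events.put(OrderEvent(event.direction * 10))
            elif isinstance(event, OrderEvent):
                # Stand-in for the broker: fill at the last seen price.
                events.put(FillEvent(event.quantity, last_price))
            elif isinstance(event, FillEvent):
                position += event.quantity
                fills.append((event.quantity, event.price))
    return position, fills


position, fills = run_backtest([99.0, 98.0, 101.0, 102.0, 97.0])
print(position, fills)
```

Each branch handles one event type and may push a new event back onto the queue, which is the routing behaviour described above. In a fuller system each branch would call into its own module, and the same handlers could in principle be pointed at a live data feed and a real broker.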
As you can see there is a heavy reliance on the portfolio handler module. Such a module is the "heart" of an Event-Driven backtesting system as we will see below. While the advantages are clear, there are also some strong disadvantages to using such a complex system:
In this section we will consider software (both open source and commercial) that exists for both For-Loop and Event-Driven systems.
There are plenty of code snippets to be found on quant blogs. A great list of such blogs can be found on Quantocracy. The expensive commercial offerings include Deltix and QuantHouse.
They are often found in quant hedge funds, family offices and prop trading firms. Cloud-based backtesting and live trading systems are relatively new. Quantopian is an example of a mature web-based setup for both backtesting and live trading. Institutional quants often also build their own in house software. In terms of open source software, there are many libraries available.
One of the most important aspects, however, is that no matter which piece of software you ultimately use, it must be paired with an equally solid source of financial data.
Otherwise you will be in a situation of "garbage in, garbage out" and your live trading results will differ substantially from your backtests. While software takes care of the details for us, it hides many implementation details from us that are often crucial when we wish to expand our trading strategy complexity.
At some point it is often necessary to write our own systems and the first question that arises is "Which programming language should I use?". Despite having a background as a quantitative software developer I am not personally interested in "language wars".
There are only so many hours in the day and, as quants, we need to get things done - not spend time arguing language design on internet forums! Python is an extremely easy to learn programming language and is often the first language individuals come into contact with when they decide to learn programming.
It has a standard library of tools that can read in nearly any form of data imaginable and talk to any other "service" very easily. While it is great for ML and general data science, it does suffer a bit for more extensive classical statistical methods and time series analysis. It is great for building both For-Loop and Event-Driven backtesting systems. In fact, it is perhaps one of the only languages that straightforwardly permits end-to-end research, backtesting, deployment, live trading, reporting and monitoring.
However, work is being carried out to improve Python's execution speed, and over time Python is becoming faster.
R is a statistical programming environment, rather than a full-fledged "first class programming language", although some might argue otherwise!
It is widely used for For-Loop backtesting, often via the quantmod library, but is not particularly well suited to Event-Driven systems or live trading. It does however excel at strategy research.
This is its primary advantage. Unfortunately it is painful for carrying out strategy research. Due to being statically-typed it is quite tricky to easily load, read and format data compared to Python or R. You may also wish to take a look at Java, Scala, C#, Julia and many of the functional languages.
It is a great learning experience to write your own Event-Driven backtesting system.
Firstly, it forces you to consider all aspects of your trading infrastructure, not just spend hours tinkering on a particular strategy. Even if you don't end up using the system for live trading, it will provide you with a huge number of questions that you should be asking of your commercial or FOSS backtesting vendors. While Event-Driven systems are not quick or easy to write, the experience will pay huge educational dividends later on in your quant trading career.
They are all written in Python due to the reasons I outlined above and thankfully Python is very much like reading pseudo-code. That is, it is very easy to follow. I've also written many articles on Event-Driven backtest design, which you can find here, that guide you through the development of each module of the system.
Rob Carver, at Investment Idiocy, also lays out his approach to building such systems to trade futures. Remember that you don't have to be an expert on day 1. You can take it slowly, day-by-day, module-by-module. If you need help, you can always contact me or other willing quant bloggers. See the end of the article for my contact email. I'll now discuss the modules that are often found in many Event-Driven backtesting systems.
While not an exhaustive list, it should give you a "flavour" of how such systems are designed.
This is where all of the historical pricing data is stored, along with your trading history, once live. Ideally, we want to obtain and store tick-level data as it gives us an idea of trading spreads. It also means we can construct our own OHLC bars, at lower frequencies, if desired.
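As a rough sketch of how OHLC bars can be built from stored ticks, the function below groups time-sorted ticks into fixed-width bars. The name ticks_to_ohlc, the (timestamp in seconds, price) tick format, the 60-second bar width and the sample ticks are all assumptions made for this example. A real store would typically also carry bid and ask prices so that spreads can be measured.

```python
def ticks_to_ohlc(ticks, bar_seconds=60):
    """Aggregate (timestamp_seconds, price) ticks into OHLC bars.

    Hypothetical helper: assumes the ticks are already sorted by time.
    Intervals that contain no ticks produce no bar.
    """
    bars = []
    for timestamp, price in ticks:
        # Start of the bar interval this tick falls into.
        bar_start = timestamp - (timestamp % bar_seconds)
        if bars and bars[-1]["start"] == bar_start:
            bar = bars[-1]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price          # the latest tick seen is the close so far
        else:
            bars.append({"start": bar_start, "open": price,
                         "high": price, "low": price, "close": price})
    return bars


ticks = [(0, 10.0), (20, 10.5), (59, 9.8), (60, 10.1), (95, 10.4)]  # made-up ticks
bars = ticks_to_ohlc(ticks)
print(bars)
```

Changing bar_seconds rebuilds the same ticks at a different frequency, which is the flexibility that storing tick-level data provides.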