
Quantitative Trading Strategies in R, Part 1 of 3


QuantStart.




By Michael Halls-Moore on March 26th, 2013.


In this article I'm going to introduce you to some of the basic concepts which accompany an end-to-end quantitative trading system. This post will hopefully serve two audiences. The first will be individuals trying to obtain a job at a fund as a quantitative trader. The second will be individuals who wish to try and set up their own "retail" algorithmic trading business.


Quantitative trading is an extremely sophisticated area of quant finance. It can take a significant amount of time to gain the necessary knowledge to pass an interview or construct your own trading strategies. Not only that, but it requires extensive programming expertise, at the very least in a language such as MATLAB, R or Python. However, as the trading frequency of the strategy increases, the technological aspects become much more relevant. Thus, being familiar with C/C++ will be of paramount importance.


A quantitative trading system consists of four major components:


Strategy Identification - Finding a strategy, exploiting an edge and deciding on trading frequency
Strategy Backtesting - Obtaining data, analysing strategy performance and removing biases
Execution System - Linking to a brokerage, automating the trading and minimising transaction costs
Risk Management - Optimal capital allocation, "bet size"/Kelly criterion and trading psychology


We'll begin by taking a look at how to identify a trading strategy.


Strategy Identification.


All quantitative trading processes begin with an initial period of research. This research process encompasses finding a strategy, seeing whether the strategy fits into a portfolio of other strategies you may be running, obtaining any data necessary to test the strategy and trying to optimise the strategy for higher returns and/or lower risk. You will need to factor in your own capital requirements if running the strategy as a "retail" trader and how any transaction costs will affect the strategy.


Contrary to popular belief it is actually quite straightforward to find profitable strategies through various public sources. Academics regularly publish theoretical trading results (albeit mostly gross of transaction costs). Quantitative finance blogs will discuss strategies in detail. Trade journals will outline some of the strategies employed by funds.


You might question why individuals and firms are keen to discuss their profitable strategies, especially when they know that others "crowding the trade" may stop the strategy from working in the long term. The reason lies in the fact that they will not often discuss the exact parameters and tuning methods that they have carried out. These optimisations are the key to turning a relatively mediocre strategy into a highly profitable one. In fact, one of the best ways to create your own unique strategies is to find similar methods and then carry out your own optimisation procedure.


Here is a small list of places to begin looking for strategy ideas:


Many of the strategies you will look at will fall into the categories of mean-reversion and trend-following/momentum. A mean-reverting strategy is one that attempts to exploit the fact that a long-term mean on a "price series" (such as the spread between two correlated assets) exists and that short-term deviations from this mean will eventually revert. A momentum strategy attempts to exploit both investor psychology and big fund structure by "hitching a ride" on a market trend, which can gather momentum in one direction, and follow the trend until it reverses.
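To make the mean-reverting case concrete, here is a minimal illustrative sketch in R (not taken from the article) of a z-score signal on the spread between two hypothetical correlated price series:

```r
# prices_a and prices_b are assumed to be aligned numeric price vectors
# for two correlated assets (hypothetical data).
spread <- log(prices_a) - log(prices_b)

# Standardise the spread against its long-run mean
z <- (spread - mean(spread)) / sd(spread)

# Short the spread when it is rich, go long when it is cheap, stay flat otherwise
signal <- ifelse(z > 2, -1, ifelse(z < -2, 1, 0))
```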


Another hugely important aspect of quantitative trading is the frequency of the trading strategy. Low frequency trading (LFT) generally refers to any strategy which holds assets longer than a trading day. Correspondingly, high frequency trading (HFT) generally refers to a strategy which holds assets intraday. Ultra-high frequency trading (UHFT) refers to strategies that hold assets on the order of seconds and milliseconds. As a retail practitioner, HFT and UHFT are certainly possible, but only with detailed knowledge of the trading "technology stack" and order book dynamics. We won't discuss these aspects to any great extent in this introductory article.


Once a strategy, or set of strategies, has been identified it now needs to be tested for profitability on historical data. That is the domain of backtesting.


Strategy Backtesting.


The goal of backtesting is to provide evidence that the strategy identified via the above process is profitable when applied to both historical and out-of-sample data. This sets the expectation of how the strategy will perform in the "real world". However, backtesting is NOT a guarantee of success, for various reasons. It is perhaps the most subtle area of quantitative trading since it entails numerous biases, which must be carefully considered and eliminated as much as possible. We will discuss the common types of bias, including look-ahead bias, survivorship bias and optimisation bias (also known as "data-snooping" bias). Other areas of importance within backtesting include availability and cleanliness of historical data, factoring in realistic transaction costs and deciding upon a robust backtesting platform. We'll discuss transaction costs further in the Execution Systems section below.


Once a strategy has been identified, it is necessary to obtain the historical data through which to carry out testing and, perhaps, refinement. There are a significant number of data vendors across all asset classes. Their costs generally scale with the quality, depth and timeliness of the data. The traditional starting point for beginning quant traders (at least at the retail level) is to use the free data set from Yahoo Finance. I won't dwell on providers too much here, rather I would like to concentrate on the general issues when dealing with historical data sets.


The main concerns with historical data include accuracy/cleanliness, survivorship bias and adjustment for corporate actions such as dividends and stock splits:


Accuracy pertains to the overall quality of the data - whether it contains any errors. Errors can sometimes be easy to identify, such as with a spike filter, which will pick out incorrect "spikes" in time series data and correct for them. At other times they can be very difficult to spot. It is often necessary to have two or more providers and then check all of their data against each other.

Survivorship bias is often a "feature" of free or cheap datasets. A dataset with survivorship bias does not contain assets which are no longer trading - in the case of equities, delisted or bankrupt stocks. This bias means that any stock trading strategy tested on such a dataset will likely perform better than in the "real world", as the historical "winners" have already been preselected.

Corporate actions include "logistical" activities carried out by the company that usually cause a step-function change in the raw price, which should not be included in the calculation of returns. Adjustments for dividends and stock splits are the common culprits. A process known as back adjustment needs to be carried out at each one of these actions. One must be very careful not to confuse a stock split with a true returns adjustment. Many a trader has been caught out by a corporate action!
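As a toy illustration of back adjustment (hypothetical numbers, not from the article): a 2-for-1 split halves the quoted price overnight, and an unadjusted backtest would record a spurious -50% "return" on that day.

```r
# Raw closing prices around a 2-for-1 split occurring before the third observation
raw_close    <- c(100, 102, 51, 52)
split_factor <- c(2, 2, 1, 1)          # divide pre-split prices by the split ratio

adjusted <- raw_close / split_factor   # 50, 51, 51, 52 - the artificial -50% jump is gone
diff(adjusted) / head(adjusted, -1)    # daily returns now reflect genuine price moves
```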


In order to carry out a backtest procedure it is necessary to use a software platform. You have the choice between dedicated backtest software, such as Tradestation; a numerical platform, such as Excel or MATLAB; or a full custom implementation in a programming language such as Python or C++. I won't dwell too much on Tradestation (or similar), Excel or MATLAB, as I believe in creating a full in-house technology stack (for reasons outlined below). One of the benefits of doing so is that the backtest software and execution system can be tightly integrated, even with extremely advanced statistical strategies. For HFT strategies in particular it is essential to use a custom implementation.


When backtesting a system one must be able to quantify how well it is performing. The "industry standard" metrics for quantitative strategies are the maximum drawdown and the Sharpe Ratio. The maximum drawdown characterises the largest peak-to-trough drop in the account equity curve over a particular time period (usually annual). This is most often quoted as a percentage. LFT strategies will tend to have larger drawdowns than HFT strategies, due to a number of statistical factors. A historical backtest will show the past maximum drawdown, which is a good guide for the future drawdown performance of the strategy. The second measurement is the Sharpe Ratio, which is heuristically defined as the average of the excess returns divided by the standard deviation of those excess returns. Here, excess returns refers to the return of the strategy above a pre-determined benchmark, such as the S&P500 or a 3-month Treasury Bill. Note that annualised return is not a measure usually utilised, as it does not take into account the volatility of the strategy (unlike the Sharpe Ratio).
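For concreteness, a minimal R sketch of both metrics (my own illustration, assuming a vector of daily strategy returns, a matching benchmark series and 252 trading days per year):

```r
# returns and benchmark_returns are assumed daily return vectors of equal length
excess <- returns - benchmark_returns
sharpe <- sqrt(252) * mean(excess) / sd(excess)    # annualised Sharpe ratio

equity       <- cumprod(1 + returns)               # account equity curve
max_drawdown <- max(1 - equity / cummax(equity))   # largest peak-to-trough drop, as a fraction
```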


Once a strategy has been backtested and is deemed to be free of biases (in as much as that is possible!), with a good Sharpe and minimised drawdowns, it is time to build an execution system.


Execution Systems.


An execution system is the means by which the list of trades generated by the strategy is sent to and executed by the broker. Despite the fact that the trade generation can be semi- or even fully-automated, the execution mechanism can be manual, semi-manual (i.e. "one click") or fully automated. For LFT strategies, manual and semi-manual techniques are common. For HFT strategies it is necessary to create a fully automated execution mechanism, which will often be tightly coupled with the trade generator (due to the interdependence of strategy and technology).


The key considerations when creating an execution system are the interface to the brokerage, minimisation of transaction costs (including commission, slippage and the spread) and divergence of performance of the live system from backtested performance.


There are many ways to interface to a brokerage. They range from calling up your broker on the telephone right through to a fully-automated high-performance Application Programming Interface (API). Ideally you want to automate the execution of your trades as much as possible. This frees you up to concentrate on further research, as well as allowing you to run multiple strategies or even strategies of higher frequency (in fact, HFT is essentially impossible without automated execution). The common backtesting software outlined above, such as MATLAB, Excel and Tradestation, is good for lower-frequency, simpler strategies. However it will be necessary to construct an in-house execution system written in a high-performance language such as C++ in order to do any real HFT. As an anecdote, in the fund I used to be employed at, we had a 10-minute "trading loop" where we would download new market data every 10 minutes and then execute trades based on that information in the same time frame. This was using an optimised Python script. For anything approaching minute- or second-frequency data, I believe C/C++ would be the better choice.


In a larger fund it is often not the domain of the quant trader to optimise execution. However in smaller shops or HFT firms, the traders ARE the executors and so a much wider skillset is often desirable. Bear that in mind if you wish to be employed by a fund. Your programming skills will be as important, if not more so, than your statistics and econometrics talents!


Another major issue which falls under the banner of execution is that of transaction cost minimisation. There are generally three components to transaction costs: commissions (or tax), which are the fees charged by the brokerage, the exchange and the SEC (or similar governmental regulatory body); slippage, which is the difference between what you intended your order to be filled at versus what it was actually filled at; and the spread, which is the difference between the bid and ask prices of the security being traded. Note that the spread is NOT constant and is dependent upon the current liquidity (i.e. availability of buy/sell orders) in the market.


Transaction costs can make the difference between an extremely profitable strategy with a good Sharpe ratio and an extremely unprofitable strategy with a terrible Sharpe ratio. It can be a challenge to correctly predict transaction costs from a backtest. Depending upon the frequency of the strategy, you will need access to historical exchange data, which will include tick data for bid/ask prices. Entire teams of quants are dedicated to optimisation of execution in the larger funds, for these reasons. Consider the scenario where a fund needs to offload a substantial quantity of trades (the reasons to do so are many and varied!). By "dumping" so many shares onto the market, they will rapidly depress the price and may not obtain optimal execution. Hence algorithms which "drip feed" orders onto the market exist, although then the fund runs the risk of slippage. Further to that, other strategies "prey" on these necessities and can exploit the inefficiencies. This is the domain of fund structure arbitrage.


The final major issue for execution systems concerns divergence of strategy performance from backtested performance. This can happen for a number of reasons. We've already discussed look-ahead bias and optimisation bias in depth, when considering backtests. However, some strategies do not make it easy to test for these biases prior to deployment. This occurs in HFT most predominantly. There may be bugs in the execution system as well as the trading strategy itself that do not show up on a backtest but DO show up in live trading. The market may have been subject to a regime change subsequent to the deployment of your strategy. New regulatory environments, changing investor sentiment and macroeconomic phenomena can all lead to divergences in how the market behaves and thus the profitability of your strategy.


Risk Management.


The final piece to the quantitative trading puzzle is the process of risk management. "Risk" includes all of the previous biases we have discussed. It includes technology risk, such as servers co-located at the exchange suddenly developing a hard disk malfunction. It includes brokerage risk, such as the broker becoming bankrupt (not as crazy as it sounds, given the recent scare with MF Global!). In short it covers nearly everything that could possibly interfere with the trading implementation, of which there are many sources. Whole books are devoted to risk management for quantitative strategies so I won't attempt to elucidate all possible sources of risk here.


Risk management also encompasses what is known as optimal capital allocation, which is a branch of portfolio theory. This is the means by which capital is allocated to a set of different strategies and to the trades within those strategies. It is a complex area and relies on some non-trivial mathematics. The industry standard by which optimal capital allocation and leverage of the strategies are related is called the Kelly criterion. Since this is an introductory article, I won't dwell on its calculation. The Kelly criterion makes some assumptions about the statistical nature of returns, which do not often hold true in financial markets, so traders are often conservative when it comes to the implementation.
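Purely for illustration (the article deliberately skips the calculation), one common continuous-time approximation of the Kelly leverage for a strategy with roughly Gaussian returns is the annualised mean excess return divided by the annualised variance of those returns:

```r
# excess_returns is an assumed vector of daily returns in excess of the risk-free rate
mu     <- mean(excess_returns) * 252   # annualised mean excess return
sigma2 <- var(excess_returns) * 252    # annualised variance of excess returns

kelly_leverage <- mu / sigma2          # "full Kelly"; practitioners often run half Kelly or less
```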


Another key component of risk management is in dealing with one's own psychological profile. There are many cognitive biases that can creep into trading. Although this is admittedly less problematic with algorithmic trading if the strategy is left alone! A common bias is that of loss aversion, where a losing position will not be closed out due to the pain of having to realise a loss. Similarly, profits can be taken too early because the fear of losing an already gained profit can be too great. Another common bias is known as recency bias. This manifests itself when traders put too much emphasis on recent events and not on the longer term. Then of course there is the classic pair of emotional biases - fear and greed. These can often lead to under- or over-leveraging, which can cause blow-up (i.e. the account equity heading to zero or worse!) or reduced profits.


As can be seen, quantitative trading is an extremely complex, albeit very interesting, area of quantitative finance. I have literally scratched the surface of the topic in this article and it is already getting rather long! Whole books and papers have been written about issues which I have only given a sentence or two towards. For that reason, before applying for quantitative fund trading jobs, it is necessary to carry out a significant amount of groundwork study. At the very least you will need an extensive background in statistics and econometrics, with a lot of experience in implementation, via a programming language such as MATLAB, Python or R. For more sophisticated strategies at the higher frequency end, your skill set is likely to include Linux kernel modification, C/C++, assembly programming and network latency optimisation.


If you are interested in trying to create your own algorithmic trading strategies, my first suggestion would be to get good at programming. My preference is to build as much of the data grabber, strategy backtester and execution system by yourself as possible. If your own capital is on the line, wouldn't you sleep better at night knowing that you have fully tested your system and are aware of its pitfalls and particular issues? Outsourcing this to a vendor, while potentially saving time in the short term, could be extremely expensive in the long-term.




A Quant’s Approach to Building Trading Strategies: Part One.


Recently, Quandl interviewed a senior quantitative portfolio manager at a large hedge fund. We spoke about how she builds trading strategies–how she transitions from an abstract representation of the market to something concrete with genuine predictive powers.


Can you tell us how you design new trading strategies?


It all starts with a hypothesis. I conjecture that there ought to be a relationship between two instruments, or maybe there’s a new instrument in the market that’s gaining popularity, or maybe there’s an unusual macroeconomic factor I’ve discovered that drives micro pricing behavior. So I write down an equation – a model, if you like – that aims to capture this relationship. Typically it’ll be some sort of process equation that shows how the variables evolve over time, with a random (stochastic) component.


The next step is to find a closed-form solution for this model. Sometimes this is easy; sometimes this takes days and weeks of algebra; sometimes there is no closed-form solution and I have to settle for an approximation. I find Mathematica’s symbolic manipulation toolkit very useful in this stage of the process.


Okay, so now I have a model of the market. I need to test if it’s realistic. At this stage, I usually turn to Matlab. I assume some plausible values for various parameters, and run some simulations. Do the simulated outputs look reasonable? Do they reflect, at least conceptually, the actual dynamics of the market?


Assuming the model passes this sanity check, it’s time to move beyond blue-sky exploration or ideation and into formal research.


What do you mean by “formal research”? And why is it necessary?


I mean the transition from an abstract, stylized representation of the market, to something that is concrete and unambiguous, with genuine predictive powers.


It’s hard to build truly predictive models. But it’s very easy to fool yourself into thinking you’ve built a predictive model, when in reality you’ve merely over-fitted, or used in-sample testing, or imposed exogenous knowledge in your rules, or what have you. Most “systems” fall apart in the real world for this precise reason.


I don’t want that to happen to my model; I will be risking real money on it. So over the years I’ve built and refined a slow, steady, systematic approach that minimizes the risk of fooling myself. That’s what I call “formal research”.


What steps do you include in your formal research process?


Early on, my biggest fear is data contamination. History is a limited resource; once you’ve run out of historical data to test against, you can’t generate any more. I’m paranoid about not exhausting my supply of uncontaminated out-of-sample data.


So I start by dividing my historical data into non-overlapping chunks. I then randomize so that even I don’t know which chunk is which. (This guards against subconscious biases: for instance, being risk-averse when I know my test dataset is 2008, or being risk-seeking in 2009).


I designate one chunk as my calibration set. I usually use Python for calibration: I use its built-in optimization libraries and have written a few of my own. In this particular example, my parameters are constrained and correlated. So I use a 2-step optimization process called the EM algorithm. Optimizers can be sensitive to initial conditions, so I use Monte Carlo to choose a number of starting points in the solution space. All of this is quite easy to do in Python.


The result of this calibration should be a set of “model parameters” – numerical values – that can be combined with actual market observations to predict other market prices.


Once I’ve calibrated the model, I test it out of sample. Are the predictions stable and the residuals mean-reverting? If not, the model doesn’t work; as simple as that. I try various “tricks” to break the model. For instance, I calibrate on monthly data but test on daily data. Or I test US parameters on Canadian market data. If the model truly reflects underlying economic reality, it should be fairly robust to these kinds of attacks. (Economics does not change when you cross borders).


So, you strictly separate in-sample and out-of-sample; you blind yourself to date ranges; you use Monte Carlo to avoid starting-point biases; and you try various robustness tricks. What else do you do to ensure that you’re not fooling yourself?


I place a very high premium on parsimony. If my model requires too many parameters or has too many degrees of freedom, it’s just curve-fitting; not a model at all. So I’m constantly trying to remove factors. If the model keeps working (and remains “rich”) with multiple factors removed, then it’s probably a good one.


A second proof of robustness is if the model works well no matter what trading strategy you build on top of it. If you can only make money using a complex non-linear scaling rule with all sorts of edge conditions, then that suggests a lack of robustness.


Finally, there’s no substitute for data. I think of every possible out-of-sample dataset that I can plausibly test the model on: different countries, different instruments, different time frames, different date frequencies. The model has to work on all of them; else you have selection bias in the results.


That sounds comprehensive. What happens next?


Armed with a calibrated model, the next step is to build a PL simulation. Mean-reverting residuals might not suffice if the opportunity set is too small to compensate for bid-ask, or if the occasional blowups kill all my profits. So I need to test an actual trading strategy using my model. Here is where I have to exercise the utmost care: it’s all too easy to curve-fit by adding new free variables, or bias the results with subconscious knowledge, or wish away outliers. Simplicity, strict separation of samples, and intellectual honesty are important here.


I use Excel for back-testing. This is a deliberate choice: Excel is not as powerful as Python, and this means there is an upper bound on how complex I can make my trading rules. This is a good thing: a strategy that requires complexity to be profitable is probably not a good strategy in the first place.


Excel also allows me to see my assumptions made explicit; it’s easy to lose track of such things when you’re working in code. It allows me to visualize performance statistics (risk, return, drawdowns, capital efficiency, Sharpe ratio and so on) quickly and clearly. Even if my model “works”, there’s no guarantee that a trading strategy built around the model will be economically viable, so these statistics matter.


Very few trading models make it past all the above steps: blue-sky formulation and sanity checks; historical calibration and out-of-sample performance; trading strategy back-test and profitability. But for the few that do, it’s now time to move into production. This is a whole different ball game.


You can read the second part of the interview here. In it, we discuss how production is a whole new ball game, and where to get ideas for new strategies. We also respond to reader questions in the third part of the interview.


Any questions for our quant? Comments? Leave them below and she’ll respond to you. We’d love to hear about your process for building trading strategies.




I found the interview quite useful. However, I observe that you have already used Matlab, Python and Excel (and presumably use C#/C++/Java) for production. Isn’t this process of shifting between 4 languages cumbersome? Moreover, what is it in Matlab that you cannot do in Python or vice versa? Also, regarding Excel, don’t you find that even though visualization is useful, it carries a lot of operational risk (formulas not being dragged correctly, sheet not refreshed properly etc)? Would love to hear about these.


> Isn’t this process of shifting between 4 languages cumbersome?


It’s not that cumbersome. I typically find that the most tedious part is making sure the data flows consistently and smoothly between different apps or languages. Syntax translation is easy; data translation, not so much.


> What is it in Matlab that you cannot do in Python or vice versa?


These days, you’re right, there’s not much you cannot do in Python. And indeed I find myself using Python more and more. But that was not always the case; the plethora of open-source financial libraries in Python is a relatively recent phenomenon.


> Regarding Excel, don’t you find that even though visualization is useful, it carries a lot of operational risk (formulas not being dragged correctly, sheet not refreshed properly etc)?


I totally agree. Excel is fragile in many ways: it’s easy to make operational mistakes, it’s impossible to audit; it’s not very performant; it hangs at the most inconvenient times. So you have to be very careful in how and where you use Excel. That said, I do find the benefits outweigh the many costs.






Very sensible approach. I especially like the importance placed on keeping your OOS data sacrosanct. The only aspect with which I have any quibbles is the removal of factors to test stability. Perhaps it’s just the interview format making things a little less clear. But I build models in a bottom-up fashion, not top-down. A new factor either adds information or it doesn’t. If my two-factor model has a higher IC than my three-factor model, then the third factor is superfluous and shouldn’t be added in the first place. Then by definition, removal of a factor from a well-specified model will always result in degraded prediction performance.


My apologies, lvcm, I wasn’t clear enough. (See also my reply to David, up-thread).


I don’t remove factors in my testing phase; I try to remove them in my specification phase. If new degrees of freedom aren’t adding explanatory power, I dump them. But once I’ve moved into testing and robustness checks, it doesn’t make sense to remove factors. (In fact I don’t even know what that would mean — it’s not like you can just “ignore” kappa or whatever).


Keeping OOS data sacrosanct — totally with you on that. If there’s one thing I wish I could hammer into people’s heads, it’s the importance of this step.


I do quantitative modeling and analysis for a living. I have made some interesting models in R so far. The problem is that I am neither good at Python nor have the hours to put into learning it well enough to do such tasks comfortably. Is there a way to collaborate with someone who has the experience and knowledge to do back testing, PL tests, etc.?


Of course I can draw model estimates against the historic prices – however, that is not quite enough. It is necessary to simulate how the model would have performed if it was actually trading.


Have you tried using Quantopian for back-testing? Their IDE (integrated development environment) makes it quite easy, though it does require Python knowledge.


I have also built a back-testing environment in Ruby (a programming language similar to Python).


Anyway, I would be happy to help you in translating your model into something programmatic.


This is informative for an interview as a quant. Could you give more details on the use of Monte Carlo in parameter initialization? Again, thanks.


I’m a little surprised by this article. Why make an off-the-beaten-track stochastic model, which you spend ‘weeks’ solving with computer-aided algebra, but then discard most of the parameters? How could the final product be that different from other stuff?


You misunderstand me — my apologies for not being clearer!


I try to discard parameters right at the start, when I’m specifying my model. I’ve seen quants with models with 20 long-term parameters and 12 daily degrees of freedom. To me, those aren’t models, they’re universal articulators: they can fit anything. I’d never risk money on anything that complex.


So I try to be as parsimonious as possible when creating my model.


Once I’ve defined a model that I think is economically reasonable and logistically sound, only then do I try my robustness checks. And at this stage I don’t discard parameters. But I do pay attention to sensitivities. If my profitability is incredibly sensitive to one particular parameter hitting one specific value, and falls apart on minor perturbations, then I suspect my model is just “lucky”, not smart. But I agree with you that removing parameters entirely at this stage would be silly.


I’m impressed. What do you think regarding money management rules, such as optimal betting size?


This is a great interview and I appreciate that you took the time to provide insight into your strategy design. This would be very time-consuming, but would it be possible to provide an actual example using a real system (regardless of whether the system is profitable or not)? Conceptually I understand what you are saying, but it would be informative to put actual examples to the steps. Again, thank you for your time.


I would like to ask a basic question. I am starting out in the quantitative analysis field. However, you seem to be pretty experienced and have been in this field for a long time. I would like to ask whether quantitative or technical strategies are giving you consistent, ‘comfortable’ returns. Do you rely on one system or keep changing it, and do you use any fundamental analysis to assist the technical analysis?


You have to keep evolving with the markets. No single system or strategy works forever.


I’d like to ask what additional checks and procedures are used when a model is taken live, in particular how you monitor and manage the model on a continuous basis once it is live. Do you set up predefined monitoring rules or circuit breakers that take the model out of action automatically? If so, how do you construct these, and what kinds of measures do you use in them? Also, relatedly, how do you identify and deal with periods of reasonable underperformance? Such underperformance can make one doubt one’s models and make it seem as if a model has stopped working when this turns out not to be the case.


I’m kind of old-fashioned — I don’t believe circuit-breakers really work. Or to be more precise, portfolios with programmatic circuit breakers underperform portfolios without, over the long term. The reasoning is that circuit-breakers stop you out of good trades at a loss way too often, such that those losses outweigh the rare occasions when they keep you out of big trouble.


Note: I’m talking about classic quant arb portfolios here; not electronic execution or HFT. In the latter cases I can totally see why you’d want multiple failsafes and circuit-breakers; those books can get away from you really fast. But that’s not my area of expertise.


Within my area, I’ve observed a few patterns in models that break. For starters, they rarely blow up instantly; instead, either the opportunity just disappears (arbitraged away by copycats) or the spread slowly and imperceptibly drifts further and further away from fair value and never comes back (regime change).


Conversely, if a trade diverges and then the divergence accelerates, that smells to me much more of a capitulation. In those cases I want to hold on to my position and indeed add if I can.


So the paradoxical conclusion is that the faster a model loses money, the more likely it is to be still valid.


Good luck programming a coherent circuit-breaker to handle that logic!


This is actually a microcosm of the larger problem. A situation where a circuit-breaker would really help, will almost definitely be one perverse enough to avoid most a priori attempts at definition. It’s the unknown unknowns that get you, every time.


Important note: the above is informed by my own position and risk preference. I am senior enough and successful enough that portfolio maximization is my central incentive. If I were younger, keeping my job (staying in the game) would be my central incentive. And in that case, circuit breakers help because they avoid catastrophic, job-losing drawdowns, while the foregone losses don’t show up on any PL report.


> how do you identify and deal with periods of reasonable underperformance?


This is the hundred million dollar question! I wish I had a definitive, unambiguous answer to give you — it would help me as well 🙂


Thanks for the feedback. I take some small comfort from the fact that professional quants also wrestle with this type of questions.


One more question if I may — I’ve toyed with the idea of (but not yet really tested/simulated/implemented) a more “gradual” type of management/monitoring, e.g. where you control say the amount of capital committed to a particular model (or basket of models) and scale this down or up gradually over time, depending on aggregate model performance.


The basic idea would be that the management process would have a sufficiently long term view to not take the model “out” of the markets for reasonable/expected drawdowns (due to looking at sufficiently large sample of performance) whilst still ensuring that the model eventually stops being traded if the returns go flat or negative for a representative sample size.


Of course this idea would be no guarantee against losses as such, but the hope would be that it might be enough to at least prevent an LTCM style of blowup.


(To add: I guess it seems to me that one of the mistakes with the LTCM blowup was assuming their models would always work, and therefore they had no plan, no monitoring level, nothing to tell them “system outside of known/expected parameters, scale down to preserve capital”. And I’d like to learn and avoid that type of mistake, if at all possible…)


You mentioned “regime change”. So how do you decide that your trade has lost enough for you to consider your model not working anymore? I guess a “Post #3: monitoring and maintenance” would be nice 😀 Thanks for sharing!


All very sensible stuff. I found this comment interesting:


” For instance, I calibrate on monthly data but test on daily data.”


I guess it depends on what you mean by ‘calibration’ but this struck me as slightly unusual.


Let’s make it simple and suppose I’m trying to capture (slow) trends using a moving average crossover. I play around with monthly data until I get something I think works. To move to daily data I *ought* to multiply some parameters by 20 (like the moving average lengths), because there are about 20 business days in a calendar month, and others by sqrt(20) [various scaling parameters too dull to discuss here]. But the model should still behave in the same way. The turnover for example shouldn’t increase when I move to daily.


On the other hand if I keep the parameters the same then instead of picking up say a 6 month trend I’m picking up a 6 business day trend. But the sweet spot for trend following most assets tends to be a fair bit slower than that so it’s unlikely to look as good. Also my turnover will be a lot higher, but then you’d expect that. To put it another way I’m not sure all aspects of market behaviour are ‘fractal’ such that I can just apply exactly the same model to different time scales.


Hi Rob. Original poster here. Thank you for a most perceptive comment!


Are markets fractal? Great question and one I’ve spent many evenings debating over scotch.


Personally I think they’re not, because certain exogenous events act as a forcing function: daily margin calls from exchanges, monthly MTMs for hedge funds, quarterly financial statements for publicly traded banks. These events cause *something* to happen (never mind what) at those frequencies. So not all time-scales are created equal, and merely speeding up / slowing down the clock is *not* a “neutral” approach.


So I’m actually very cautious about *which* strategies I’d do this kind of time-shifting with.


Here’s a toy strategy where time-shifting can work. Take 2 futures strips in the same “space” — maybe winter and spring wheat. Look for cases when 1 is backwardated and the other is in contango. Buy front low, sell back high, sell front high, buy back low. A totally simple, almost “dumb” strategy, but for many futures pairs it used to work well.


This is a great case for changing time scales. This strategy should work whether you sample / rebalance weekly, or monthly, or quarterly — because the decision variables are pure state, no path. We’re not looking at price histories; nor are we looking at instruments with a time component (bonds which accrue, or options which decay, or random walks with a drift). So, given that the strategy is really clean, we can get away with this kind of robustness test.


(Caveat: bid-ask is the one complicating factor here — your chosen time-scale needs to be big enough to allow for price action that overcomes friction. Bid-ask is the bane of quants everywhere.)


But I would never apply this same test to say a trend-following strategy. That would raise all sorts of philosophical questions. What does it mean for a strategy to have a “sweet spot” at say 9 days, or 200 days, or whenever? By optimizing for that sweet spot, are you curve-fitting? Or does the fact that almost everyone uses 9d and 200d create a self-fulfilling prophecy, and so those numbers represent something structural about the market? I’ve heard convincing arguments both ways. What if you sampled your data at interval X, and then did 9X and 200X moving averages — would that work? Fun philosophical questions; I’m not sure of the answers myself.


Other notes: I agree that “calibration” was a sloppy choice of word by me in that particular sentence. “Ideation” would have been better. If you’re calibrating, you’re already introducing more structure than time-shifting can safely handle.


You’re absolutely correct re (t) and sqrt(t) — and I agree with you, too dull to discuss here.


Thanks again for the comment!


I think that “calibrate on monthly data but test on daily data” means recalibrating a rolling model (like for example a rolling regression) every month, but using daily data. Then testing with that recalibrated model in the following month, again using daily data.


Kinda like a walk forward method of testing?


Sorry for not being clear. What I meant was closer to Rob’s original interpretation: I build an idea with data sampled at frequency X, but then test it with data sampled at frequency Y. Walk-forward or monthly recalibration is a separate exercise, that I would undertake after the model has been “in production” for some time.




Quantitative Trading Strategy Using R: A Step by Step Guide.


In this post we will discuss building a trading strategy using R. Before delving into the trading jargon, let us spend some time understanding what R is. R is an open-source language and environment for statistical computing. There are more than 4,000 add-on packages, 18,000-plus members of LinkedIn's R group and close to 80 R Meetup groups currently in existence. It is a perfect tool for statistical analysis, especially data analysis. The concise setup of the Comprehensive R Archive Network, known as CRAN, provides you the list of packages along with the base installation required. There are lots of packages available, depending upon the analysis that needs to be done. To implement the trading strategy, we will use the package called quantstrat.


Four Step Process of Any Basic Trading Strategy.


1. Hypothesis formation
2. Testing
3. Refining
4. Production


Our hypothesis is formulated as "the market is mean reverting". Mean reversion is a theory that suggests that prices eventually move back to their average value. The second step involves testing the hypothesis, for which we formulate a strategy based on our hypothesis and compute indicators, signals and performance metrics. The testing phase can be broken down into three steps: getting the data, writing the strategy and analyzing the output. In this example we consider NIFTY-Bees, an exchange-traded fund managed by Goldman Sachs. The NSE has huge volume in this instrument, hence we consider it. The image below shows the Open-High-Low-Close prices of the same.


We set a threshold level to compare the fluctuations in the price. If the price increases/decreases, we update the threshold column. The closing price is compared with the upper band and with the lower band. When the upper band is crossed, it is a signal to sell; similarly, when the lower band is crossed, it is a signal to buy.


The coding section that follows can be summarized in these steps: install and load the required packages, read in the data, initialize the currency, instrument, portfolio, account and orders, define the strategy with its indicator, signals and rules, run the strategy, and finally review the order book and the trade statistics.


A helicopter view of the strategy's output is given in the diagram below.


Thus our hypothesis that the market is mean reverting is supported. Since this is back-testing, we have room for refining the trading parameters to improve our average returns and the profits realized. This can be done by setting different threshold levels, stricter entry rules, a stop loss, etc. One could also choose more data for back-testing, use a Bayesian approach for threshold setup, or take volatility into account.


Once you are confident about the trading strategy, backed by the back-testing results, you could step into live trading. The production environment is a big topic in itself and is out of scope for this article. To explain in brief, this would involve writing the strategy on a trading platform.


As mentioned earlier, we would be building the model using quantstrat package. Quantstrat provides a generic infrastructure to model and backtest signal-based quantitative strategies. It is a high-level abstraction layer (built on xts, FinancialInstrument, blotter, etc.) that allows you to build and test strategies in very few lines of code.


The key features of quantstrat are:

- Supports strategies which include indicators, signals, and rules
- Allows strategies to be applied to multi-asset portfolios
- Supports market, limit, stoplimit, and stoptrailing order types
- Supports order sizing and parameter optimization


In this post we build a strategy that includes indicators, signals, and rules.


For a generic signal-based model, the following are the objects one should consider:

- Instruments - contain market data
- Indicators - quantitative values derived from market data
- Signals - the result of interaction between market data and indicators
- Rules - generate orders using market data, indicators and signals


Without much ado, let's discuss the coding part. We prefer RStudio for coding and insist that you use the same. You need to have certain packages installed before programming the strategy.


The following set of commands installs the necessary packages.
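One possible version of this step is shown below; quantstrat and blotter are not on CRAN, so they are usually installed from GitHub (or R-Forge):

```r
# Supporting packages from CRAN
install.packages(c("quantmod", "TTR", "xts", "zoo",
                   "FinancialInstrument", "PerformanceAnalytics"))

# blotter and quantstrat from GitHub
install.packages("remotes")
remotes::install_github("braverock/blotter")
remotes::install_github("braverock/quantstrat")
```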


Once you have installed the packages you import them for further usage.
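For example:

```r
library(quantstrat)            # also loads blotter, FinancialInstrument, quantmod and xts
library(PerformanceAnalytics)
```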


Read the data from csv file and convert it into xts object.
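A sketch of this step, assuming a file called niftybees.csv with Date, Open, High, Low and Close columns (the file name and column names are placeholders):

```r
raw <- read.csv("niftybees.csv", stringsAsFactors = FALSE)

# Convert the data frame into an xts object indexed by date
nifty_bees <- xts(raw[, c("Open", "High", "Low", "Close")],
                  order.by = as.Date(raw$Date, format = "%Y-%m-%d"))
```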


We initialize the portfolio with the stock, currency, initial equity and the strategy type.
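One way this is commonly written, using placeholder values for the initial date, initial equity and portfolio name:

```r
initDate <- "2010-01-01"   # assumed date prior to the first observation
initEq   <- 100000         # assumed initial equity

currency("INR")                                       # NIFTY-Bees trades in Indian rupees
stock("nifty_bees", currency = "INR", multiplier = 1)

qs.strategy <- "qsMeanReversion"                      # placeholder name used throughout
initPortf(qs.strategy, symbols = "nifty_bees", initDate = initDate, currency = "INR")
initAcct(qs.strategy, portfolios = qs.strategy, initDate = initDate,
         currency = "INR", initEq = initEq)
initOrders(portfolio = qs.strategy, initDate = initDate)
```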


Add position limit if you wish to trade more than once on the same side.
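For example, capping the net position at an assumed 300 units:

```r
addPosLimit(qs.strategy, "nifty_bees", timestamp = initDate,
            maxpos = 300, minpos = -300)
```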


Create the strategy object.
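For example:

```r
strategy(qs.strategy, store = TRUE)   # stored so it can be retrieved by name later
```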


We build a function that computes the thresholds at which we want to trade. If the price moves by more than thresh1, we update the threshold to the new price. The new bands for trading are threshold +/- thresh2. The output is an xts object; we use the reclass function to ensure this.
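A hedged reconstruction of such an indicator function, with illustrative values for thresh1 and thresh2, might look like this:

```r
# Whenever the close moves more than thresh1 away from the current threshold,
# reset the threshold to that close; the trading bands are threshold +/- thresh2.
thresholdBands <- function(price, thresh1 = 5, thresh2 = 2) {
  cl  <- as.numeric(Cl(price))
  thr <- numeric(length(cl))
  thr[1] <- cl[1]
  for (i in 2:length(cl)) {
    thr[i] <- if (abs(cl[i] - thr[i - 1]) > thresh1) cl[i] else thr[i - 1]
  }
  out <- cbind(threshold = thr, up = thr + thresh2, dn = thr - thresh2)
  reclass(out, price)   # return an xts object aligned with the input, as noted above
}
```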


Add the indicator, signal and the trading rule.
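A sketch of these three steps, assuming the thresholdBands indicator above and the column labels it produces:

```r
add.indicator(strategy = qs.strategy, name = "thresholdBands",
              arguments = list(price = quote(mktdata), thresh1 = 5, thresh2 = 2),
              label = "bands")

# Sell signal when the close crosses above the upper band,
# buy signal when it crosses below the lower band
add.signal(qs.strategy, name = "sigCrossover",
           arguments = list(columns = c("Close", "up.bands"), relationship = "gt"),
           label = "cl.gt.up")
add.signal(qs.strategy, name = "sigCrossover",
           arguments = list(columns = c("Close", "dn.bands"), relationship = "lt"),
           label = "cl.lt.dn")

# Enter long on the buy signal, exit the long position on the sell signal
add.rule(qs.strategy, name = "ruleSignal",
         arguments = list(sigcol = "cl.lt.dn", sigval = TRUE, orderqty = 100,
                          ordertype = "market", orderside = "long", osFUN = osMaxPos),
         type = "enter")
add.rule(qs.strategy, name = "ruleSignal",
         arguments = list(sigcol = "cl.gt.up", sigval = TRUE, orderqty = "all",
                          ordertype = "market", orderside = "long"),
         type = "exit")
```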


Run the strategy and have a look at the order book.
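For example:

```r
applyStrategy(strategy = qs.strategy, portfolios = qs.strategy)
getOrderBook(qs.strategy)   # inspect the orders that were generated
```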


Update the portfolio and view the trade statistics.
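For example:

```r
updatePortf(qs.strategy)
updateAcct(qs.strategy)
updateEndEq(qs.strategy)

tradeStats(qs.strategy)                          # per-symbol trade statistics
chart.Posn(qs.strategy, Symbol = "nifty_bees")   # positions, prices and P&L over time
```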


The complete code simply combines the steps above into a single script.


Once you are familiar with these basics you could take a look at how to start using the quantmod package in R. Or, in case you're good at C++, take a look at an example strategy coded in C++.


If you’re a retail trader or a tech professional looking to start your own automated trading desk, start learning algo trading today! Begin with basic concepts like automated trading architecture, market microstructure, strategy backtesting system and order management system.




LearnDataSci.




Python for Finance, Part 2: Intro to Quantitative Trading Strategies.




Language: Python 3.5
Libraries: pandas, numpy, and matplotlib
IPython notebook: available on GitHub


In Python for Finance, Part I, we focused on using Python and Pandas to:

- retrieve financial time-series from free online sources (Yahoo),
- format the data by filling missing observations and aligning them,
- calculate some simple indicators such as rolling moving averages, and
- visualise the final time-series.


As a reminder, the dataframe containing the three “cleaned” price timeseries has the following format:


We have also calculated the rolling moving averages of these three timeseries as follows. Note that when calculating the $M$-day moving average, the first $M-1$ values are not valid, as $M$ prices are required for the first moving average data point.


Building on these results, our ultimate goal will be to design a simple yet realistic trading strategy. However, first we need to go through some of the basic concepts related to quantitative trading strategies, as well as the tools and techniques in the process.


General considerations about trading strategies.


There are several ways one can go about developing a trading strategy. One approach would be to use the price time-series directly and work with numbers that correspond to some monetary value. For example, a researcher could be working with time-series expressing the price of a given stock, like the time-series we used in the previous article. Similarly, if working with fixed-income instruments, e.g. bonds, one could be using a time-series expressing the price of the bond as a percentage of a given reference value, in this case the par value of the bond. Working with this type of time-series can be more intuitive, as people are used to thinking in terms of prices. However, price time-series have some drawbacks. Prices can only take positive values, which makes it harder to use models and approaches which require or produce negative numbers. In addition, price time-series are usually non-stationary, that is, their statistical properties change over time.


An alternative approach is to use time-series which correspond not to actual values but to changes in the monetary value of the asset. These time-series can and do assume negative values, and their statistical properties are usually more stable than those of price time-series. The most frequently used forms are relative returns, defined as

$$r_{\text{rel}}\left(t\right) = \frac{p\left(t\right) - p\left(t-1\right)}{p\left(t-1\right)},$$

and log-returns, defined as

$$r\left(t\right) = \log\left(\frac{p\left(t\right)}{p\left(t-1\right)}\right),$$

where $p\left(t\right)$ is the price of the asset at time $t$. For example, if $p\left(t\right) = 101$ and $p\left(t-1\right) = 100$ then $r_{\text{rel}}\left(t\right) = \frac{101 - 100}{100} = 1\%$.


There are several reasons why log-returns are used in the industry, some of which relate to long-standing assumptions about the behaviour of asset returns and are out of our scope. However, we need to point out two quite interesting properties. Log-returns are additive, which facilitates the treatment of our time-series; relative returns are not. We can see the additivity of log-returns in the following equation:

$$r\left(t_1\right) + r\left(t_2\right) = \log\left(\frac{p\left(t_1\right)}{p\left(t_0\right)}\right) + \log\left(\frac{p\left(t_2\right)}{p\left(t_1\right)}\right) = \log\left(\frac{p\left(t_2\right)}{p\left(t_0\right)}\right),$$

which is simply the log-return from $t_0$ to $t_2$. Secondly, log-returns are approximately equal to the relative returns for values of $\frac{p\left(t\right)}{p\left(t-1\right)}$ sufficiently close to $1$. By taking the 1st-order Taylor expansion of $\log\left(\frac{p\left(t\right)}{p\left(t-1\right)}\right)$ around $1$, we get

$$r\left(t\right) = \log\left(\frac{p\left(t\right)}{p\left(t-1\right)}\right) \approx \frac{p\left(t\right)}{p\left(t-1\right)} - 1 = r_{\text{rel}}\left(t\right).$$


Both of these are trivially calculated using Pandas as:


Since log-returns are additive, we can create the time-series of cumulative log-returns, defined as

$$c\left(t\right) = \sum_{k=1}^{t} r\left(k\right).$$


The cumulative log-returns and the total relative returns from 2000/01/01 for the three time-series can be seen below. Note that although log-returns are easy to manipulate, investors are accustomed to using relative returns. For example, a log-return of $1$ does not mean an investor has doubled the value of his portfolio; a relative return of $1 = 100\%$ does! Converting between the cumulative log-return $c\left(t\right)$ and the total relative return $c_{\text{rel}}\left(t\right) = \frac{p\left(t\right) - p\left(t_0\right)}{p\left(t_0\right)}$ is simple: $c_{\text{rel}}\left(t\right) = e^{c\left(t\right)} - 1$.


For those who are wondering if this is correct, yes it is. If someone had bought $\$1000$ worth of AAPL shares in January 2000, her/his portfolio would now be worth over $\$30,000$. If only we had a time machine…


What is a quantitative trading strategy?


Our goal is to develop a toy trading strategy, but what does the term “quantitative trading strategy” actually mean? In this section we will give a definition that will guide us in our long-term goal.


Assume we have at our disposal a certain amount of dollars, $N$, which we are interested in investing. We also have at our disposal a set of $K$ assets which we can buy and sell freely in any arbitrary amount. Our goal is to derive weights $w_i\left(t\right), i = 1, \ldots, K$ such that


$$w_i\left(t\right) \in \mathbb{R} \quad \text{and} \quad \sum_{i=1}^{K} w_i\left(t\right) \leq 1$$


so that an amount of dollars equal to $w_i\left(t\right) N$ is invested at time $t$ on asset $i$.


The inequality condition $\sum_{i=1}^{K} w_i\left(t\right) \leq 1$ signifies that the maximum amount we can invest is equal to the amount of dollars we have, that is $N$.


For example, assume we can invest in $2$ instruments only and that $N=\$1000$. The goal is to derive two weights $w_1\left(t\right)$ and $w_2\left(t\right)$.


If at some point $w_1\left(t\right) = 0.4$ and $w_2\left(t\right) = 0.6$, this means that we have invested $w_1\left(t\right)N = \$400$ in asset $1$ and $w_2\left(t\right)N = \$600$ in asset $2$. Since we only have $\$1000$, we can invest at most that amount, which means that

$$w_1\left(t\right)N + w_2\left(t\right)N \leq N \Rightarrow w_1\left(t\right) + w_2\left(t\right) \leq 1.$$


Georgios Efstathopoulos.


Georgios has 7+ years of experience as a quantitative analyst in the financial sector, and has worked extensively on statistical and machine learning models for quantitative trading, market and credit risk management and behavioural modelling. Georgios has a PhD in Applied Mathematics and Statistics from Imperial College London, and is the founder and CEO of QuAnalytics Limited, a consultancy focusing on quantitative and data analytics solutions for individuals and organisations who wish to harvest the potential of their own data to grow their business.




Thanks for sharing, Georgios. Would utilizing a Monte Carlo simulation to derive the optimal weights split in order to maximize the return be an improvement to this strategy, instead of evenly splitting 1/3 each?: weights_vector = pd.DataFrame(1 / 3, index=r_t.index, columns=r_t.columns)


You are correct, MC would be one of the possible ways to optimise the weights for this strategy. Note, however, that there are several open questions regarding this optimisation. First of all, what are we optimising for? Total return, portfolio volatility, draw-downs? Secondly, optimisation runs the risk of over-fitting the weights to the historical interval we are using for the optimisation. These and other issues will be addressed in the following articles in this series.


When can we expect the next post?


Thanks for the article. There are several instances above where you divide 1/3 and 1/7. Python assumes the result to be an integer so the answer is 0 in both cases. You should use 1.0/3 and 1.0/7 to avoid this issue.


Thanks for the notice. This series of articles assumes that Python 3 is used. In Python 3, 1 / 3 will produce 0.3333, instead of the integer division which was the case in Python 2.


However, to make this compatible with Python 2 users, in what follows I will make sure that in such cases extra care is taken to ensure that divisions are handled appropriately.

