David Sweet, author of “Tuning Up: From A/B testing to Bayesian optimization”, introduces Dan and Chris to system tuning, taking them from A/B testing to response surface methodology, contextual bandits, and finally Bayesian optimization. Along the way, we get fascinating insights into recommender systems and high-frequency trading!
Matched from the episode's transcript 👇
David Sweet: We can talk about them… Maybe I’ll try to do them from most interesting to least, but… One thing - there’s this interesting cultural dynamic in finance – in trading specifically. I’ll even narrow it down more, to quantitative trading, where people, especially when they’re new to the field, wanna come in and try the latest and greatest algorithms and ideas, and everything they’ve learned recently in school, from papers and whatnot, and make some money. Build the magic machine that makes a ton of money.
On the other side, you’ve got people who’ve been doing it for a while, usually [unintelligible 00:14:02.12] who roll their eyes at every new thing, like “Ahh… That’s not gonna work. Neural networks don’t work. SVMs don’t work.” And sometimes they’re right, sometimes they’re wrong… I think if you say something’s not gonna work, you’ll usually be right, but you just won’t be productive… So it’s one of the unfortunate aspects of the distribution of quality of new ideas in engineering.
So what I find is - I’ve seen people try, or I’ve been one of those who has tried, all kinds of things. Basically, if you wanted to just randomly throw out ideas [unintelligible 00:14:34.14] And some of the things stick. Some people figure out how to get things to work.
The big problems with financial data are that the signal-to-noise ratio is very low, and that the signals aren’t just small - they’re competed away. The act of going and trading on signals which your competitors are seeing as well squashes those signals. So it creates this non-stationarity where over time your strategies become less and less tradable, sometimes very quickly. So you constantly have to adapt and look for new ways to predict or to trade.
One thing that – you mentioned reinforcement learning, and that brought to mind… I don’t think reinforcement learning is ready for you to just turn it on and get a usable answer out of in finance. I haven’t seen that. And I say that only – I say it because it’s hard. I feel like it’s still cutting-edge for solving this kind of problem. I see a lot of promise in offline reinforcement learning, what’s been going on over the past year or so… It’s just amazing, and it’s very much in line with… It’s like a machine learning replacement - or an AI replacement, I’ll say - for old-school simulation optimization; like, how do you make that more automated, or more autonomous, or hyper-automated, or get to that next level of automation? So yeah, I see a lot of promise, but I haven’t seen people just kind of taking that out of the box and making it work.
[15:58] A contextual bandit, on the other hand, which is a limited subset of reinforcement learning - not only do I think that that’s directly useful, but I think people in finance have been doing it ad hoc for a long time anyway… You know, if not in the most super-efficient way it can be done, like people understand it these days, I think, since the beginning of my [unintelligible 00:16:17.12] doing things that kind of look to me like a contextual bandit.
What makes that easier than a full reinforcement learning problem is that you’re only predicting the immediate reward, so you don’t have to worry about your decision now affecting the state of the world for your decision later, and then having this compounding of state changes based on previous decisions. That’s a more IID sample, so to speak, to build your model with.
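The immediate-reward structure he describes can be sketched as a simple epsilon-greedy contextual bandit. This is a hypothetical illustration, not David Sweet’s actual method: each arm keeps its own ridge-regression model of the immediate reward, and because each decision only affects its own reward, every (context, action, reward) triple can be treated as a roughly IID training sample.

```python
# Minimal epsilon-greedy contextual bandit sketch (illustrative only).
# Each arm fits a linear model of the *immediate* reward; no state
# transitions are modeled, which is what makes this simpler than full RL.
import numpy as np

rng = np.random.default_rng(0)
N_ARMS, DIM, EPSILON = 3, 4, 0.1

# Hidden per-arm reward weights the bandit has to discover (synthetic data).
true_w = rng.normal(size=(N_ARMS, DIM))

# Per-arm ridge-regression statistics: A = X^T X + I, b = X^T y.
A = np.stack([np.eye(DIM) for _ in range(N_ARMS)])
b = np.zeros((N_ARMS, DIM))

def choose(context):
    """Epsilon-greedy: usually pick the arm whose model predicts the
    highest immediate reward; occasionally explore at random."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ARMS))
    preds = [np.linalg.solve(A[a], b[a]) @ context for a in range(N_ARMS)]
    return int(np.argmax(preds))

def update(arm, context, reward):
    """Rank-one update of the chosen arm's regression statistics."""
    A[arm] += np.outer(context, context)
    b[arm] += reward * context

for t in range(2000):
    x = rng.normal(size=DIM)
    arm = choose(x)
    reward = true_w[arm] @ x + 0.1 * rng.normal()  # noisy immediate reward
    update(arm, x, reward)

# The learned per-arm weights should approach the hidden true weights.
learned = np.stack([np.linalg.solve(A[a], b[a]) for a in range(N_ARMS)])
print(np.max(np.abs(learned - true_w)))
```

Because the reward depends only on the current context and action, the per-arm regressions stay well-specified even though the greedy policy biases which contexts each arm sees; that is the “more IID sample” property that a full RL problem, with compounding state changes, would lose.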