Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The real trouble with bandits is that people don't bother to look into what the real potential benefit is as far as the target you're optimizing. Despite theoretically loving bandit techniques, I've convinced multiple teams not to go that path because the real advantage of using them is a slightly more optimal mix of people in experiment than if you ran them manually.

At some scales it can make sense, but for the vast majority of products/companies the statistical benefits don't outweigh the added complexity.

Bandits work best for relatively simple optimization choices at very large scale. If you care at all about the information gained in an experiment, it's fairly unlikely bandits are a good choice for you (or any reinforcement learning algorithm at that point).



Agreed. In the case I was describing above, new arms were constantly being introduced (often several times a day for each of hundreds of thousands of scenarios). Manual experiments weren't an option. This also meant we were in a constant state of partial convergence for most scenarios, but the same would be true with experiments.

How to cull arms, so that there are enough samples for any kind of convergence, is another problem in this setup. We eventually built an ML model to select arms and used bandits to pick between them. This proved "too effective". The arms were all user-generated content. The bandits on top of the model, both setup to maximize clicks was stupidly good at surfacing inappropriate content because it got a lot of clicks. We ended up having to put more safeties on arm selection for certain categories of our content where we had the most inappropriate submissions.


Thanks for the inputs.

"Bandits work best for relatively simple optimization choices at very large scale"

Have you considered different methods to address this shortcoming?


This is basically Breiman's "two cultures" at play. Do you care about optimizing y-hat, or do you care about doing inference on some parameters in your model? Depends on the business case, typically.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: