The real trouble with bandits is that people don't bother to look into what the ...

rented_mule · 2025-10-01T01:28:50 1759282130

Agreed. In the case I was describing above, new arms were constantly being introduced (often several times a day for each of hundreds of thousands of scenarios). Manual experiments weren't an option. This also meant we were in a constant state of partial convergence for most scenarios, but the same would be true with experiments.

How to cull arms, so that there are enough samples for any kind of convergence, is another problem in this setup. We eventually built an ML model to select arms and used bandits to pick between them. This proved "too effective". The arms were all user-generated content. The model and the bandits on top of it, both set up to maximize clicks, were stupidly good at surfacing inappropriate content because it got a lot of clicks. We ended up having to put more safeties on arm selection for the categories of our content where we had the most inappropriate submissions.
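A minimal sketch of that layered setup, for illustration only. It assumes Thompson sampling with Beta posteriors as the bandit and a simple category blocklist as the "safety"; the comment does not say which bandit algorithm or filter was actually used. The names `Arm`, `model_score`, `select_candidates`, and `thompson_pick` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    arm_id: str
    category: str
    model_score: float  # hypothetical score from the upstream ML model
    clicks: int = 0
    impressions: int = 0


def select_candidates(arms, k, blocked_categories):
    """Cull arms: drop blocked categories, then keep the top-k by model score."""
    allowed = [a for a in arms if a.category not in blocked_categories]
    return sorted(allowed, key=lambda a: a.model_score, reverse=True)[:k]


def thompson_pick(candidates, rng):
    """Sample a click rate per arm from Beta(1 + clicks, 1 + misses); pick the max."""
    return max(
        candidates,
        key=lambda a: rng.betavariate(1 + a.clicks, 1 + a.impressions - a.clicks),
    )


def record(arm, clicked):
    """Update the arm's counts after showing it once."""
    arm.impressions += 1
    arm.clicks += int(clicked)


rng = random.Random(0)
arms = [
    Arm("a", "news", 0.9),
    Arm("b", "flagged", 0.95),  # high score, but in a blocked category
    Arm("c", "sports", 0.5),
    Arm("d", "news", 0.1),
]
candidates = select_candidates(arms, k=2, blocked_categories={"flagged"})
chosen = thompson_pick(candidates, rng)
record(chosen, clicked=True)
```

Filtering before the model's top-k cut means a blocked arm never reaches the bandit, no matter how many clicks it would attract. Culling to `k` candidates is what keeps enough samples per arm for the posteriors to narrow.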

flashfaffe2 · 2025-10-01T10:28:30 1759314510

Thanks for the input.

"Bandits work best for relatively simple optimization choices at very large scale"

Have you considered different methods to address this shortcoming?

hruk · 2025-10-01T11:25:06 1759317906

This is basically Breiman's "two cultures" at play. Do you care about optimizing y-hat, or do you care about doing inference on some parameters in your model? Depends on the business case, typically.