
In "any sense" of the word? Surely anyone who adjusts their behavior when they get undesired or unexpected results is reasoning and thinking. And since most activities are mediated by thought of some kind, most people are reasoning and thinking; otherwise they would never recover from even simple mistakes, like walking east when they need to go north.

Saying they're "not thinking in any sense of the word" because they can't solve predicate logic problems from a college textbook is a rather odd claim. Surely the ability to solve such problems arises from reasoning and thinking, rather than the other way around.



This seems to me to be where these systems need to go in the future: something akin to reinforcement learning.

You feed an LLM a prompt. It then abstracts and approximates what the result should be. It then devises a hypothesis, solves it, and compares it to the approximated output. From there it can formulate and evaluate a new hypothesis based on the outcome of hypothesis 1, and either keep iterating down that path or dump it for a new one (e.g., the next best hypothesis in the original formulation).

At some point the answer is "good enough." But along the way it keeps playing against its thoughts to see if it can do better.

A key issue may be the original approximation itself, so the model may also need to adjust that approximation as it iterates.

Maybe this is how cutting edge LLMs work now. I have no idea.
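The loop described above can be sketched in a few lines of Python. This is purely illustrative: every function here (approximate_target, propose, score) is a hypothetical stand-in, not any real LLM API, and the numeric "hypotheses" are toy placeholders for generated answers.

```python
import random

# Hypothetical sketch of the propose-evaluate-iterate loop described above.
# approximate_target, propose, and score are illustrative stand-ins.

def approximate_target(prompt):
    # Stand-in for the model's rough approximation of what a good
    # result should look like. Here: just a toy number.
    return len(prompt)

def propose(prompt, previous=None):
    # Stand-in for formulating a hypothesis, optionally refining the
    # previous one rather than starting from scratch.
    base = previous if previous is not None else 0
    return base + random.randint(-2, 5)

def score(hypothesis, target):
    # Compare the hypothesis against the approximated output;
    # higher is better, 0 is a perfect match.
    return -abs(hypothesis - target)

def solve(prompt, max_iters=20, good_enough=-1):
    target = approximate_target(prompt)
    best, best_score = None, float("-inf")
    hyp = None
    for _ in range(max_iters):
        hyp = propose(prompt, previous=hyp)
        s = score(hyp, target)
        if s > best_score:
            best, best_score = hyp, s
        else:
            # Dump this path: branch again from the best hypothesis so far.
            hyp = best
        if best_score >= good_enough:
            # "Good enough" stopping rule from the comment above.
            break
    return best, best_score
```

The missing piece, as noted, is that approximate_target is itself fallible, so a fuller version would also revise the target as evidence accumulates.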



