Why not both? You can tell it in the prompt what you want and still constrain the output programmatically.
Also note that the output still depends on a random sampling of the next token according to the distribution that the net gives you - so there is a lot of genuine randomness in the model's behaviour. And because each sampled token influences everything sampled after it, that randomness compounds the longer the response gets.
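To make the compounding concrete, here's a minimal sketch with a toy "model" (the transition table is made up for illustration, not any real LLM): each token is sampled from a distribution conditioned on the previous token, so one random pick early on shifts every later distribution.

```python
import random

def sample(dist):
    # Draw one token from a {token: probability} distribution.
    r = random.random()
    cum = 0.0
    for tok, p in dist.items():
        cum += p
        if r < cum:
            return tok
    return tok  # fallback for floating-point rounding

def toy_model(prev):
    # Hypothetical next-token distributions, conditioned on the last token.
    if prev == "the":
        return {"cat": 0.5, "dog": 0.5}
    return {"the": 0.9, "a": 0.1}

random.seed(0)
seq = ["a"]
for _ in range(3):
    seq.append(sample(toy_model(seq[-1])))
# Each sampled token feeds back in as context for the next draw,
# so an early coin-flip changes the whole rest of the sequence.
```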
So if you already know you're only interested in a particular subset of tokens, it makes sense to me to clamp the distribution to only those tokens and keep the model from getting onto the "wrong path" in the first place.
Also, pragmatically, if you can get the model to restrict itself to JSON without telling it in the prompt, you're saving that part of the context window for better uses.
I agree with others that it would be interesting to see an LLM that outputs JSON natively - but I think it would also be moving in the opposite direction of the general trend. Right now I can ask it for JSON, YAML, or a number of other formats.
To answer "why not both?" -- bottom line, the effort involved. I don't want to deal with yet another library, the bugs in it, and the inevitable -changes-. GPT's capacity for bridging the gap between structured and human languages is an enormous boon. It bridges a gap so large that most of the time we can't even span it with our imaginations. I don't need to write code to tell GPT what to do; I can direct it in plain English.
I'm not worried about the size of the context window the same way we're not worried about memory or disk space - there will be more.