While it seems plausible that you could eventually build a game around one of these models, the lack of an underlying state representation that you can modify precisely is a pretty strong barrier to anything resembling real interaction between the user and the system. Even expressing Pong through text prompts in a way that would produce the desired results from such a model is a tough challenge.
I could imagine a text adventure game with a 'visual' component working, perhaps, if you could get the model to keep spaces and character appearances consistent enough.