A transformer is a universal function approximator; the real question is whether it can approximate a given function reasonably efficiently. Trained on linear sequences of natural language? I’m with you. Trained on abstract logical graph representations? I don’t think that question has been answered yet, unless I’m missing something.
Think of a program that can write code but not execute it. It’s not hard to get a transformer to learn to write Python code like a merge sort or a simple arithmetic routine, even though a transformer can’t reliably learn to sort, nor can it learn to do the arithmetic itself. That’s an important disambiguation: in one view it apparently can’t (learn to sort), and in another it demonstrably can (learn to code up a sort function). It can learn to usefully manipulate the language without needing the capacity to execute the language. Transformers probably can’t learn to “execute” what you’re talking about (“execute” being a loose analogy), but I’d say the jury’s still out on whether they can learn to usefully manipulate it.
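To make that distinction concrete, here’s the kind of artifact a model can learn to emit: a textbook merge sort written out as text (a hypothetical illustration, not output from any specific model). Producing this string is a sequence-prediction task; none of the recursion or merging has to be carried out to write it down.

    def merge_sort(xs):
        # Recursively split the list in half, sort each half, and merge.
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left = merge_sort(xs[:mid])
        right = merge_sort(xs[mid:])
        return merge(left, right)

    def merge(left, right):
        # Interleave two sorted lists into one sorted list.
        out = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    print(merge_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]

Emitting those two dozen lines and actually evaluating merge_sort([5, 2, 9, 1]) down to [1, 2, 5, 9] are entirely different computations, and only the first is a language-modeling problem.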
A transformer is a function from a sequence of symbols to a sequence of symbols.
A truth table, for example, is exactly that kind of function: from a sequence of variable assignments to the alphabet [True, False].
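A minimal sketch of that framing, using a toy three-variable formula (the formula is a hypothetical example, nothing more): the truth table is just a function from input sequences over {0, 1} to output symbols in [True, False].

    from itertools import product

    def formula(bits):
        # Maps a sequence of three boolean symbols to one symbol
        # from the alphabet [True, False].
        a, b, c = bits
        return bool((a ^ b) & c)

    # The full truth table: all 2**3 input sequences and their outputs.
    for bits in product([0, 1], repeat=3):
        print(bits, "->", formula(bits))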
A transformer can represent some very complex logical operations, and per this paper is Turing complete: https://arxiv.org/abs/1901.03429 (the proof relies on idealized assumptions, e.g. arbitrary-precision activations), meaning any computable function, including a theorem prover, can in principle be represented as a transformer.
Another question is whether it’s feasible/viable/rational to actually build a transformer for this. My intuition says: no.