The thing is that "LLM reasoning breaks down" simply did not surprise me enough that I thought it was worth clicking. Making LLMs fail is not hard. They're interesting for the ways that they work, not the (many, many) ways that they don't.
edit: I've had a look and I don't think any of their prompts are very good. They're certainly not how I'd write them if I wanted a current model to actually solve the problem.
The way to make me take a paper like this seriously would be if you set it up as an adversarial collaboration with a competent prompter, and that person agreed they couldn't make a generic prompt that solved the problem. "We tried three times and none worked" is not news, or at any rate not news about LLMs.