I have two issues with Gemini that I don't experience with Claude: 1. It RENAMES VARIABLES even in code I don't tell it to change (I pass it just as context). And 2. Sometimes it's missing closing square brackets.
Sure, I'm a lazy bum, I call the variable "json" instead of "jsonStringForX", but it's contextual (within a closure or function). I appreciate the feedback, but it makes reviewing the changes difficult (too much noise).
I have a very clear example of Gemini getting it wrong:
For code like this, it keeps changing processing_class=tokenizer to "tokenizer=tokenizer", even though the parameter was renamed and even after adding the all-caps comment:
# Set up the SFTTrainer
print("Setting up SFTTrainer...")
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    args=sft_config,
    processing_class=tokenizer,  # DO NOT CHANGE. THIS IS NOW THE CORRECT PROPERTY NAME
)
print("SFTTrainer ready.")
I haven't tried with this latest version, but the 05-06 pro still did it wrong.
Do you have an instruction in the system prompt to not edit lines that have comments about not editing them? Had that happen to me too, code comments being ignored, and adding instructions about actually following code comments helped with that. But different models, so YMMV.
I find o1-pro, which nobody ever mentions, is in the top spot along with Gemini. But Gemini is an absolute mess to work with because it constantly adds tons of comments and changes unrelated code.
It is worth it sometimes, but usually I use it to explore ideas and then have o1-pro spit out a perfect, solution-ready diff to test and merge.
It feels like I'm negotiating with a toddler. If I say nothing, it adds useless comments everywhere. If I tell it to not add comments, it deletes all of my comments. Tell it to put the comments back, and it still throws away half of my comments and rewrites the rest in a less precise way.
I think it is likely that the comments are more for the model than for the user. I would not be even slightly surprised if verbosely commented versions of the code outperformed lightly commented ones.
On the other hand, I'm skeptical that this has any impact, because these models have thinking tokens where they can put all those comments, and attention shouldn't care about how close the tokens are as long as they're within the context window.
The excessive comments might help the model when it's called again to re-edit the code in the future. Wouldn't be surprised if it was optimized for vibe coding, and the redundant comments reinforce the function/intent of a line when it's being modified down the line.
I've noticed with ChatGPT that it will 100% ignore certain instructions, and I wonder if it's just an LLM thing. For example, I can scream and yell in caps at ChatGPT to not use em or en dashes, and if anything that makes it use them even more. I've literally never once made it successfully avoid them, even when it ignored it the first time and my follow-up is "output the same thing again but NO EM or EN DASHES!"
I've not tested this thoroughly, it's just my anecdotal experience over like a dozen attempts.
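If prompting alone won't stop the dashes, a deterministic post-processing pass on the model's output does. This is just a sketch; the choice of substituting a comma for an em dash and a hyphen for an en dash is my own assumption about what reads best:

```python
import re

def strip_long_dashes(text: str) -> str:
    """Replace em/en dashes in model output instead of re-prompting."""
    # Em dash (U+2014) used as a clause break -> comma plus space
    text = re.sub(r"\s*\u2014\s*", ", ", text)
    # En dash (U+2013), typically in ranges -> plain hyphen
    text = text.replace("\u2013", "-")
    return text

print(strip_long_dashes("Models love dashes\u2014they really do\u2014in 2020\u20132024."))
```

Running the fixed string through the model a second time is where it fails; running it through three lines of regex never does.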
There are some things so ubiquitous in the training data that it is really difficult to tell models not to do them, simply because they are so ingrained in their core training. Em dashes are apparently one of those things.
It's something I read a little while ago in a larger article, but I can't remember which article it was.