Almost always, notes like these are going to be about greenfield projects.

Trying to incorporate it into existing codebases (especially when the end user is a support interaction or more away) is still folly, except for closely reviewed and/or non-business-logic modifications.

That said, it is quite impressive to set up a simple architecture, or just list the filenames, and tell some agents to go crazy to implement what you want the application to do. But once it crosses a certain complexity, I find you need to prompt closer and closer to the weeds to see real results. I imagine a non-technical prompter cannot proceed past a certain prototype fidelity threshold, let alone make meaningful contributions to a mature codebase via LLM without a human engineer to guide and review.



I'm using it on a large set of existing codebases full of extremely ugly legacy code, weird build systems, and tons of business logic, shipping directly to prod at breakneck growth over the last two years, and it's delivering the same type of value that Karpathy writes about.


That was true for me, but is no longer.

It's been especially helpful in explaining and understanding arcane bits of legacy code behavior my users ask about. I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.
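
The "trigger" is nothing fancy, just a scripted prompt. A rough sketch of the idea, assuming the Claude Code CLI's non-interactive -p mode (the feature name, doc path, and prompt wording here are made up, not my actual setup):

    import subprocess

    # Illustrative only: drive Claude Code in non-interactive (-p) mode to trace
    # a feature through the code and refresh its docs page.
    FEATURE = "bulk export"
    PROMPT = (
        f"Read the code that implements the '{FEATURE}' feature, explain how it "
        "actually behaves today (cite file paths and line numbers), then update "
        "docs/features/bulk-export.md to match the current behavior. "
        "Do not modify any source code."
    )

    result = subprocess.run(["claude", "-p", PROMPT], capture_output=True, text=True)
    print(result.stdout)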


> I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.

And how do you verify its output isn't total fabrication?


I read through it, scanning sections that seem uncontroversial and reading more closely sections that talk about things I'm less sure about. The output cites key lines of code, which are faster to track down and look at than trying to remember where in a large codebase to look.

Inconsistencies also pop up in backtesting: for example, if there's a point the LLM answers in different ways across multiple iterations, that's a good candidate for improving the docs.
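
The backtest itself is just replaying the same questions a few times and seeing where the answers diverge; a minimal sketch, where ask_llm is a stand-in for whatever client actually answers the question:

    def ask_llm(question: str) -> str:
        # Stand-in for the real call into the doc-answering bot; hypothetical.
        raise NotImplementedError

    def backtest(questions, runs=3):
        """Re-ask each question several times; flag the ones with divergent answers."""
        flagged = {}
        for q in questions:
            answers = {ask_llm(q).strip() for _ in range(runs)}
            if len(answers) > 1:
                flagged[q] = answers  # inconsistent across runs -> docs likely need work
        return flagged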

As with a coworker's work, there's a certain amount of trust in the competence involved.


Your docs are a contract. You can verify that contract using integration tests.


Contract? These docs are information answering user queries. So if you use a chatbot to generate them, I'd like to be reasonably sure they aren't laden with the fabricated misinformation for which these chatbots are famous.


It's a very reasonable concern. My solution is to have the bot classify what the message is talking about as a first pass, and to apply relatively strict filtering on what it responds to.

For example, I have it ignore messages about code freezes, because that's a policy question that probably changes over time, and I have it ignore urgent oncall messages, because the asker there probably wants a quick response from a human.

But there are a lot of questions in the vein of "How do I write a query for {results my service emits}?" or "How does this feature work?", where automation can handle a lot (and provide more complete answers than a human can off the top of their head).
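
The gate itself is small; something like this, where the topic labels and the classify step are made up for illustration:

    # Topics the bot should stay out of (policy questions, urgent pages).
    IGNORED_TOPICS = {"code_freeze_policy", "urgent_oncall"}

    def classify_topic(message: str) -> str:
        # First pass: have the model tag the message with a single topic label.
        # Stand-in for the real classification prompt; hypothetical.
        raise NotImplementedError

    def should_answer(message: str) -> bool:
        """Strict filter: only respond when the topic isn't on the ignore list."""
        return classify_topic(message) not in IGNORED_TOPICS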


OK, but little of that applies to this use case, to "then tell it to update the documentation accordingly."


These models do well changing brownfield applications that have tests because the constraints on a successful implementation are tight. Their solutions can be automatically augmented by research and documentation.


I don't exactly disagree with this, but I have seen models simply delete the tests, or declare the failures "unrelated to my changes" and then helpfully "fix" them by updating the tests to pass.


I've had to deal with this a handful of times. You just have to make it restore the test, or keep it trying to pass a suite of explicit red-green tests it wrote earlier.
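
By red-green I just mean the usual thing: the test is committed failing before the change and has to pass after, and the agent is told not to edit it. A made-up pytest example (apply_discount and the pricing module are hypothetical):

    # tests/test_pricing.py -- written (red) before the agent touches the code,
    # must pass (green) after; the agent is instructed not to modify this file.
    from pricing import apply_discount  # hypothetical module under change

    def test_discount_caps_at_100_percent():
        assert apply_discount(price=100.0, percent=150) == 0.0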


Yes. You have to treat the model like an eager yet incompetent worker, i.e. don't go full yolo mode; review everything they do.



