The AI field is reusing existing CS concepts that we never had the hardware for, and now these people are learning how applied software engineering can make their theoretical models more efficient. It's kind of funny; I've seen this pattern in tech over and over: people discover a new thing, then optimize it using known things.
I've been thinking the same, and it's stuff you don't need some crazy ML degree to know how to do. A lot of the algorithms have been known for a while now. Milk it while you can.
What we need are "idea dice" or "concept dice" for CS – each side could have a vague architectural nudge like "parallelize", "interpret", "precompute", "predict and unwind", "declarative"...
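The "concept dice" idea could be sketched in a few lines; this is just a playful illustration using the nudges named above (the `roll` helper and face list are my own invention, not an existing tool):

```python
import random

# Hypothetical "concept dice": each face is a vague architectural nudge.
FACES = [
    "parallelize",
    "interpret",
    "precompute",
    "predict and unwind",
    "declarative",
]

def roll(n=1):
    """Roll n concept dice and return the nudges to try on your design."""
    return [random.choice(FACES) for _ in range(n)]
```

Rolling two dice might suggest, say, "precompute" plus "parallelize" as a starting point for rethinking a component.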
Unfortunately, I think the context rot paper [1] found that performance still degraded as context grew, even in models using attention sinks.
[1] https://hanlab.mit.edu/blog/streamingllm
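For readers unfamiliar with attention sinks: the StreamingLLM idea, roughly, is to keep the KV cache entries for the first few tokens (the "sinks") permanently, plus a sliding window of recent tokens. A toy sketch of that eviction policy, with all names and parameters my own simplification rather than the paper's actual implementation:

```python
from collections import deque

class SinkCache:
    """Toy StreamingLLM-style KV cache: permanent sink tokens + sliding window."""

    def __init__(self, num_sinks=4, window=8):
        self.num_sinks = num_sinks
        self.sinks = []                      # KV entries for the first tokens, kept forever
        self.recent = deque(maxlen=window)   # rolling window; deque evicts the oldest

    def append(self, kv):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def visible(self):
        # The entries the next attention step can attend to.
        return self.sinks + list(self.recent)
```

The point of the context rot finding is that even with this trick keeping attention numerically stable over long streams, quality on information deep in the context still drops off.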