
> Also the attention mechanism is baked in during pretraining

IIUC, this is no longer necessarily true with positional encodings like ALiBi: https://github.com/ofirpress/attention_with_linear_biases. Because ALiBi replaces learned positional embeddings with a fixed linear penalty on attention scores, the bias can be recomputed for any sequence length, so a model can attend over longer contexts at inference than it ever saw in pretraining.
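
For anyone curious, here's a minimal sketch of what that bias looks like (my own illustration, not code from the repo; it assumes a power-of-two head count, since the paper uses a slightly different slope recipe otherwise):

  import torch

  def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
      # Each head h gets a slope m_h = 2 ** (-8 * (h + 1) / num_heads);
      # the bias for query i attending to key j (j <= i) is -m_h * (i - j).
      slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads)
                             for h in range(num_heads)])
      # Relative distance (i - j) under a causal mask; shape (seq_len, seq_len).
      pos = torch.arange(seq_len)
      dist = (pos[:, None] - pos[None, :]).clamp(min=0)
      # Shape (num_heads, seq_len, seq_len); added to q @ k^T before softmax.
      return -slopes[:, None, None] * dist

  # The bias depends only on num_heads and the *current* seq_len, so
  # nothing about position is baked into the weights at pretraining time:
  bias = alibi_bias(num_heads=8, seq_len=16)

The point being: since there's no learned position table, extrapolating to a longer context just means recomputing this bias at the new length.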


