IIUC, this is no longer necessarily true with positional encodings like ALiBi: https://github.com/ofirpress/attention_with_linear_biases
IIUC, this is no longer necessarily true with positional encodings like ALiBi: https://github.com/ofirpress/attention_with_linear_biases