> and how matrices transform them is fine
This is circular. You are introducing (or assuming) the multiplication rule right here, so you can't then derive it.
I realise, though, that I was answering "How does it work?" (application) rather than "Why does it work?" (derivation).
For the latter, something involving sets of linear equations is probably best, as you initially said.
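
A rough sketch of what that linear-equations route could look like, assuming two 2×2 systems chained together (the symbols $a_{ij}$, $b_{ij}$, $x$, $y$, $z$ are my own choice for illustration, not anything from earlier in the thread). The multiplication rule is not assumed here; it falls out of substituting one system into the other:

```latex
% First system: y in terms of x (coefficients b_{ij})
\begin{aligned}
y_1 &= b_{11} x_1 + b_{12} x_2 \\
y_2 &= b_{21} x_1 + b_{22} x_2
\end{aligned}

% Second system: z in terms of y (coefficients a_{ij})
\begin{aligned}
z_1 &= a_{11} y_1 + a_{12} y_2 \\
z_2 &= a_{21} y_1 + a_{22} y_2
\end{aligned}

% Substitute the first into the second and collect terms in x_1, x_2
\begin{aligned}
z_1 &= (a_{11} b_{11} + a_{12} b_{21})\, x_1 + (a_{11} b_{12} + a_{12} b_{22})\, x_2 \\
z_2 &= (a_{21} b_{11} + a_{22} b_{21})\, x_1 + (a_{21} b_{12} + a_{22} b_{22})\, x_2
\end{aligned}

% The combined coefficients are exactly the entries of the product
(AB)_{ij} = \sum_{k} a_{ik}\, b_{kj}
```

Read this way, the row-times-column rule is just bookkeeping for the coefficients you get when one set of linear equations is substituted into another, which is why defining the product that way is not arbitrary.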