Add comments to assembly flash loader for STM32. Add tiny improvement in
size of the algorithm (40 vs 48 bytes) and tiny speed improvement (~1.5%,
as time is wasted on waiting for end of operation anyway).
Signed-off-by: Freddie Chopin <freddie_chopin@op.pl>