The computation including a call to Stride() can't be optimized away
safely because the compiler can't tell that Stride() is effectively
constant, but we know it won't change so we can make a slice pointing
at that part of the array.
CPU time for updateData goes from 26.35% to 18.65% in my test case.