It turns out that affine matrices are much simpler than the 3x3 matrices
they imply, and we can use this to dramatically streamline some code.
For a test program, this was about a 50% gain in frame rate just from
the cost of the applyMatrixAndMask calls in imdraw, which were calling
matrix.Project() many times. Simplifying matrix.Project, alone, got a
nearly 50% frame rate boost!
Also modify pixelgl's SetMatrix to copy the six values of a 3x2
Affine into the corresponding locations of a 3x3 matrix.