Soooo. It turns out that the bunch of smallish (~4-5% of runtime)
loads associated with Len(), Unit(), Rotated(), and so on... Were
actually more like 15% or more of computational effort. I first
figured this out by creating:
func (u Vec) Normal(v Vec) Vec
which gives you a vector normal to u->v. That consumed a lot
of CPU time, and was followed by .Unit().Scaled(imd.thickness / 2),
which consumed a bit more CPU time.
After some poking, and in the interests of avoiding UI cruft,
the final selection is
func (u Vec) Normal() Vec
This returns the vector rotated 90 degrees, which turns out to
be the most common problem.
It turns out that affine matrices are much simpler than the 3x3 matrices
they imply, and we can use this to dramatically streamline some code.
For a test program, this was about a 50% gain in frame rate just from
the cost of the applyMatrixAndMask calls in imdraw, which were calling
matrix.Project() many times. Simplifying matrix.Project, alone, got a
nearly 50% frame rate boost!
Also modify pixelgl's SetMatrix to copy the six values of a 3x2
Affine into the corresponding locations of a 3x3 matrix.