Because that's expensive, even when the conversion is trivial.
Use a type assertion first. This reduces the runtime cost of
imdraw.Push from ~15.3% to 8.4% of runtime, cutting the cost of
pushing points by not quite 50%.
If you set imd.Color to a Color object that isn't RGBA for
every single point, this won't help much. But if you set it and
then draw a bunch of points, this will be a big win.
Soooo. It turns out that the bunch of smallish (~4-5% of runtime)
loads associated with Len(), Unit(), Rotated(), and so on were
actually more like 15% or more of computational effort. I first
figured this out by creating:
func (u Vec) Normal(v Vec) Vec
which gives you a vector normal to u->v. That consumed a lot
of CPU time, and was followed by .Unit().Scaled(imd.thickness / 2),
which consumed a bit more CPU time.
After some poking, and in the interests of avoiding UI cruft,
the final selection is
func (u Vec) Normal() Vec
This returns the vector rotated 90 degrees, which turns out to
be the most common need.
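
A sketch of what such a method can look like, assuming a plain
two-field Vec and a counterclockwise rotation (the text above does not
say which direction is used; both are assumptions here). Rotating by
exactly 90 degrees needs no trigonometry, only a swap and a negation.

```go
package main

import "fmt"

// Vec is a minimal 2D vector, defined here only for the example.
type Vec struct {
	X, Y float64
}

// Sub returns u - v.
func (u Vec) Sub(v Vec) Vec {
	return Vec{u.X - v.X, u.Y - v.Y}
}

// Normal returns u rotated 90 degrees counterclockwise (assumed
// direction). The result is perpendicular to u and has the same length.
func (u Vec) Normal() Vec {
	return Vec{-u.Y, u.X}
}

func main() {
	a, b := Vec{1, 1}, Vec{4, 5}
	// A normal to the segment a->b: subtract first, then rotate.
	n := b.Sub(a).Normal()
	fmt.Println(n) // {-4 3}
}
```
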
We never actually need the "normal" value; computing it is wasted
work, because ijNormal holds the same value early on. It's quite
possible that we could simplify this further; there's still a lot
of time going into the normal computations.
A slice of points means copying every point into the slice, then
copying every point's data from the slice to TrianglesData. An
array of indices lets the compiler make better choices.
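
To illustrate the difference, here is a rough sketch with made-up
types (point, vertex, and both function names are hypothetical, not
the real imdraw or TrianglesData definitions). The first version
copies each point twice; the second walks a fixed-size array of
indices and copies each point once.

```go
package main

import "fmt"

// point and vertex are simplified stand-ins for the real types.
type point struct {
	pos [2]float64
	col [4]float64
}

type vertex struct {
	Position [2]float64
	Color    [4]float64
}

// viaSlice builds a temporary slice of points (first copy), then
// copies each point's data into the output (second copy).
func viaSlice(points []point, i, j, k int, out []vertex) []vertex {
	tmp := []point{points[i], points[j], points[k]}
	for _, p := range tmp {
		out = append(out, vertex{p.pos, p.col})
	}
	return out
}

// viaIndices iterates over a fixed-size array of indices and reads
// each point in place, so the data is copied only once.
func viaIndices(points []point, i, j, k int, out []vertex) []vertex {
	for _, idx := range [3]int{i, j, k} {
		p := &points[idx]
		out = append(out, vertex{p.pos, p.col})
	}
	return out
}

func main() {
	pts := []point{
		{pos: [2]float64{0, 0}},
		{pos: [2]float64{1, 0}},
		{pos: [2]float64{1, 1}},
		{pos: [2]float64{0, 1}},
	}
	fmt.Println(len(viaIndices(pts, 0, 1, 2, nil))) // 3
}
```
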
For polyline, don't compute each normal twice; when we're going through a line,
the "next" normal for segment N is always the "previous" normal for segment
N+1, and we can compute fewer of them.
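
The reuse can be sketched like this. It is an illustration under
assumed names (Vec, walkJoints, and the visit callback are all
hypothetical), not the polyline code itself: the normal computed as
"next" at one joint is carried forward and becomes "prev" at the
following joint, so each segment's normal is computed exactly once.

```go
package main

import (
	"fmt"
	"math"
)

// Vec is a minimal 2D vector, defined here only for the example.
type Vec struct {
	X, Y float64
}

func (u Vec) Sub(v Vec) Vec { return Vec{u.X - v.X, u.Y - v.Y} }
func (u Vec) Normal() Vec   { return Vec{-u.Y, u.X} }

// Unit returns u scaled to length 1 (or u unchanged if it has length 0).
func (u Vec) Unit() Vec {
	l := math.Hypot(u.X, u.Y)
	if l == 0 {
		return u
	}
	return Vec{u.X / l, u.Y / l}
}

// walkJoints calls visit for every interior joint of the polyline with
// the unit normals of the segments before and after it. It returns how
// many normals were computed.
func walkJoints(points []Vec, visit func(joint, prev, next Vec)) int {
	if len(points) < 3 {
		return 0
	}
	computed := 0
	normal := func(a, b Vec) Vec {
		computed++
		return b.Sub(a).Normal().Unit()
	}
	prev := normal(points[0], points[1])
	for i := 1; i < len(points)-1; i++ {
		next := normal(points[i], points[i+1])
		visit(points[i], prev, next)
		prev = next // reuse: this segment's normal is the next joint's "prev"
	}
	return computed
}

func main() {
	pts := []Vec{{0, 0}, {2, 0}, {2, 2}, {0, 2}}
	n := walkJoints(pts, func(joint, prev, next Vec) {})
	fmt.Println(n) // 3: one normal per segment, not two per joint
}
```
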
For internal operations (anything using getAndClearPoints), there's a
pretty good chance that the operation will repeatedly invoke something
like fillPolygon(), meaning that it needs to push "a few" points
and then invoke something that uses those points.
So, we add a slice for containing spare slices of points, and on the
way out of each such function, shove the current imd.points (as used
inside that function) onto a stack, and set imd.points to [0:0] of
the thing it was called with.
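
Roughly, the scheme looks like the sketch below. Only getAndClearPoints
and the idea of a stack of spare slices come from the description
above; the drawer type, restorePoints, fill, and outline are
hypothetical names made up for this example.

```go
package main

import "fmt"

type point struct{ X, Y float64 }

// drawer is a stripped-down stand-in for the real drawing type.
type drawer struct {
	points []point
	pool   [][]point // stack of spare point slices
}

func (d *drawer) push(p ...point) { d.points = append(d.points, p...) }

// getAndClearPoints hands the current points to the caller and
// installs a spare slice (if any) for points pushed in the meantime.
func (d *drawer) getAndClearPoints() []point {
	pts := d.points
	if n := len(d.pool); n > 0 {
		d.points = d.pool[n-1][:0]
		d.pool = d.pool[:n-1]
	} else {
		d.points = nil
	}
	return pts
}

// restorePoints runs on the way out: the slice used inside the
// function goes onto the stack, and d.points becomes [0:0] of the
// slice the function was called with, keeping its capacity.
func (d *drawer) restorePoints(pts []point) {
	if cap(d.points) > 0 {
		d.pool = append(d.pool, d.points[:0])
	}
	d.points = pts[:0]
}

// fill consumes the pending points and reports how many it used.
func (d *drawer) fill() int {
	pts := d.getAndClearPoints()
	n := len(pts)
	d.restorePoints(pts)
	return n
}

// outline pushes "a few" points per segment and calls fill each time,
// which is the pattern that benefits from the spare slices.
func (d *drawer) outline() int {
	pts := d.getAndClearPoints()
	total := 0
	for i := 0; i+1 < len(pts); i++ {
		d.push(pts[i], pts[i+1], pts[i])
		total += d.fill()
	}
	d.restorePoints(pts)
	return total
}

func main() {
	d := &drawer{}
	d.push(point{0, 0}, point{1, 0}, point{1, 1}, point{0, 1})
	fmt.Println(d.outline(), len(d.pool)) // 9 1
}
```

After the first outline call, the inner slice sits on the stack and is
picked up again by the next call instead of being reallocated.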
Performance goes from 11-13fps to 17-18fps on my test case.