Using a slice of points means copying every point into the slice, then
copying every point's data from the slice into TrianglesData. Using an
array of indices instead lets the compiler make better choices.
For polyline, don't compute each normal twice. When walking a line,
the "next" normal for segment N is always the "previous" normal for
segment N+1, so we can carry it over and compute each normal only once.
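A rough sketch of the carry-over, assuming a hypothetical `vec` type and a `segmentNormal` helper that is not Pixel's code. The walk produces one (previous, next) normal pair per interior vertex. After each join it hands `next` over as `prev`. This way a polyline with n points needs n-1 normal computations rather than 2(n-2).

```go
package main

import (
	"fmt"
	"math"
)

// vec is a hypothetical 2D vector type.
type vec struct{ x, y float64 }

// normalCalls counts segmentNormal invocations, for illustration only.
var normalCalls int

// segmentNormal returns the left-hand normal of a->b scaled to half the
// line thickness. It assumes a != b (no zero-length segments).
func segmentNormal(a, b vec, thickness float64) vec {
	normalCalls++
	dx, dy := b.x-a.x, b.y-a.y
	l := math.Hypot(dx, dy)
	return vec{-dy / l * thickness / 2, dx / l * thickness / 2}
}

// polylineJoins returns, for each interior vertex pts[i], the normals of
// the segment ending there and the segment starting there. The "next"
// normal of one join is reused as the "previous" normal of the following
// join, so each segment's normal is computed exactly once.
func polylineJoins(pts []vec, thickness float64) [][2]vec {
	joins := make([][2]vec, 0, len(pts))
	if len(pts) < 3 {
		return joins
	}
	prev := segmentNormal(pts[0], pts[1], thickness)
	for i := 1; i+1 < len(pts); i++ {
		next := segmentNormal(pts[i], pts[i+1], thickness)
		joins = append(joins, [2]vec{prev, next}) // join at pts[i]
		prev = next                               // carry over instead of recomputing
	}
	return joins
}

func main() {
	pts := []vec{{0, 0}, {1, 0}, {1, 1}, {0, 1}}
	joins := polylineJoins(pts, 2)
	fmt.Printf("joins: %d, normal computations: %d\n", len(joins), normalCalls)
}
```

With four points there are two interior joins but only three normal computations, one per segment.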
For internal operations (anything using getAndClearPoints), there's a
good chance that the operation will repeatedly invoke something like
fillPolygon(), meaning that it needs to push "a few" points and then
invoke something that uses those points.
So we add a slice to hold spare slices of points. On the way out of
each such function, we push the current imd.points (as used inside that
function) onto that stack, and set imd.points to [0:0] of the slice the
function was called with.
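A pared-down sketch of the spare-slice stack. `IMDraw`, `pool`, `restorePoints`, `Rectangles`, and the `filled` counter are hypothetical stand-ins, not Pixel's real fields or methods. An operation takes the caller's points, gets a spare scratch slice from the stack, and pushes "a few" points per fillPolygon() call. On the way out, it returns the scratch slice to the stack and sets imd.points to `points[:0]`. In steady state, both backing arrays keep their capacity and are reused.

```go
package main

import "fmt"

type point struct{ x, y float64 }

// IMDraw is a hypothetical, minimal stand-in for imdraw.IMDraw.
type IMDraw struct {
	points []point   // user-pushed points, or scratch points inside an operation
	pool   [][]point // stack of spare point slices (hypothetical field)
	filled int       // counts points consumed, standing in for triangle output
}

func (imd *IMDraw) Push(pts ...point) {
	imd.points = append(imd.points, pts...)
}

// getAndClearPoints returns the caller's points and gives imd.points an
// emptied spare slice from the stack (or nil if the stack is empty).
func (imd *IMDraw) getAndClearPoints() []point {
	points := imd.points
	if n := len(imd.pool); n > 0 {
		imd.points = imd.pool[n-1][:0]
		imd.pool = imd.pool[:n-1]
	} else {
		imd.points = nil
	}
	return points
}

// restorePoints runs on the way out: the scratch slice goes onto the stack,
// and imd.points becomes the original slice truncated to [0:0].
func (imd *IMDraw) restorePoints(points []point) {
	imd.pool = append(imd.pool, imd.points)
	imd.points = points[:0]
}

// fillPolygon stands in for an operation that consumes imd.points.
func (imd *IMDraw) fillPolygon() {
	imd.filled += len(imd.points)
	imd.points = imd.points[:0]
}

// Rectangles draws one quad per pair of pushed corners by repeatedly
// pushing four scratch points and calling fillPolygon.
func (imd *IMDraw) Rectangles() {
	points := imd.getAndClearPoints()
	for i := 0; i+1 < len(points); i += 2 {
		a, b := points[i], points[i+1]
		imd.Push(a, point{b.x, a.y}, b, point{a.x, b.y})
		imd.fillPolygon()
	}
	imd.restorePoints(points)
}

func main() {
	imd := &IMDraw{}
	imd.Push(point{0, 0}, point{1, 1}, point{2, 2}, point{3, 3})
	imd.Rectangles()
	fmt.Println(imd.filled, len(imd.points), len(imd.pool))
}
```

The key design choice is that neither slice is dropped. The caller's slice comes back truncated and the scratch slice stays on the stack, so later calls append into existing capacity instead of allocating.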
Performance goes from 11-13fps to 17-18fps on my test case.