Our techmap rules for $shift and $shiftx cells contained a special path
that aimed to decompose the shift LSB-first instead of MSB-first in
select cases that come up in pmux lowering. This path was needlessly
overcomplicated and contained bugs.
Instead of doing that, just switch over the main path to iterate
LSB-first (except for the specially-handled MSB for signed shifts
and overflow handling). This also makes the code consistent with
shl/shr/sshl/sshr cells, which are already decomposed LSB-first.
Fixes#2346.
By operating at a layer of abstraction over the rather clumsy Intel primitives,
we can avoid special hacks like `dffinit -highlow` in favour of simple techmapping.
This also makes the primitives much easier to manipulate, and more descriptive
(no more cyclonev_lcell_comb to mean anything from a LUT2 to a LUT6).