Without unbuffering output wires of, at least, toplevel modules, it
is not possible to have most designs that rely on IO via toplevel
ports (as opposed to using exclusively blackboxes) converge within
one delta cycle. That seriously impairs the performance of CXXRTL.
This commit avoids unbuffering outputs of all modules solely so that
in future, CXXRTL could gain fully separate compilation, and not for
any present technical reason.
With this change, it is easier to see which signals carry state (only
wire<>s appear as `reg` in VCD files) and to construct a minimal
checkpoint (CXXRTL_WIRE debug items represent the canonical smallest
set of state required to fully reconstruct the simulation).
Although logically two separate steps, these were treated as one for
historic reasons. Splitting the two makes it possible to have designs
that are only 2× slower than fastest possible (and are without extra
delta cycles) that allow probing all public wires.
Historically, elision was implemented before localization, so levels
with elision are lower than corresponding levels with localization.
This is unfortunate for two reasons:
1. Elision is a logical subset of localization, since it equals to
not giving a name to a temporary.
2. "Localize" currently actually means "unbuffer and localize",
and it would be useful to split those steps (at least for
public wires) for improved design visibility.
Although these options can be thought of as optimizations, they are
essentially orthogonal to the core of -O, which is managing signal
buffering and scope. Going from -O4 to -O2 means going from limited
to complete design visibility, yet in both cases proc and flatten
are desirable.
Before this commit, Verilog expressions like `x && 1` would result in
references to `logic_and_us` in generated CXXRTL code, which would
not compile. After this commit, since cells like that actually behave
the same regardless of signedness attributes, the signedness is
ignored, which also reduces the template instantiation pressure.
Constant wires can represent a significant chunk of the design in
generic designs or after optimization. Emitting them in VCD files
significantly improves usability because gtkwave removes all traces
that are not present in the VCD file after reload, and iterative
development suffers if switching a varying signal to a constant
disrupts the workflow.
Alias wires can represent a significant chunk of the design in highly
hierarchical designs; in Minerva SRAM, there are 273 member wires and
527 alias wires. Showing them in every hierarchy level significantly
improves usability.