This can result in massive reduction in runtime, up to 50% depending
on workload. Currently people are using `-mllvm -inline-threshold=`
as a workaround (with clang++), but this solution is more portable.
This was a correctness issue, but one of the consequences is that it
resulted in jumps in generated machine code where there should have
been none. As a side effect of fixing the bug, Minerva SoC became 10%
faster.
With this change, it is easier to see which signals carry state (only
wire<>s appear as `reg` in VCD files) and to construct a minimal
checkpoint (CXXRTL_WIRE debug items represent the canonical smallest
set of state required to fully reconstruct the simulation).
Before this commit, Verilog expressions like `x && 1` would result in
references to `logic_and_us` in generated CXXRTL code, which would
not compile. After this commit, since cells like that actually behave
the same regardless of signedness attributes, the signedness is
ignored, which also reduces the template instantiation pressure.
Constant wires can represent a significant chunk of the design in
generic designs or after optimization. Emitting them in VCD files
significantly improves usability because gtkwave removes all traces
that are not present in the VCD file after reload, and iterative
development suffers if switching a varying signal to a constant
disrupts the workflow.
Compared to the C++ API, the C API currently has two limitations:
1. Memories cannot be updated in a race-free way.
2. Black boxes cannot be implemented in C.
Debug information describes values, wires, and memories with a simple
C-compatible layout. It can be emitted on demand into a map, which
has no runtime cost when it is unused, and allows late bound designs.
The `hdlname` attribute is used as the lookup key such that original
names, as emitted by the frontend, can be used for debugging and
introspection.
Strategically inserting the pending memory write in memory::update to keep the
queue sorted allows us to skip the queue sort in memory::commit.
The Minerva SRAM SoC runs ~7% faster as a result.
If it is statically known that eval() will converge in one delta
cycle (that is, the second commit() will always return `false`)
because the design contains no feedback or buffered wires, then
there is no need to run the second delta cycle at all.
After this commit, the case where eval() always converges immediately
is detected and the second delta cycle is omitted. As a result,
Minerva SRAM SoC runs ~25% faster.
Both parameters and attributes are necessary because the parameters
have to be the same between every instantiation of the cell, but
attributes may well vary. For example, for an UART PHY, the type
of the PHY (tty, pty, socket) would be a parameter, but configuration
of the implementation specified by the type (socket address) would
be an attribute.
This commit adds support for replacing RTLIL modules with CXXRTL
black boxes. Black box port widths may not depend on the parameters
with which it is instantiated (yet); the parameters may only be used
to change the behavior of the black box.
There is no practical benefit from using `const memory` for ROMs;
it uses an std::vector internally, which prevents contemporary
compilers from constant-propagating ROM contents. (It is not clear
whether they are permitted to do so.)
However, there is a major benefit from using non-const `memory` for
ROMs, which is the ability to dynamically fill the ROM for each
individual simulation.
This commit reduces space and time overhead for writable memories
to O(write port count) in both cases; implements handling for write
port priorities; and simplifies runtime representation of memories.
After this commit, if NDEBUG is not defined, out-of-bounds accesses
cause assertion failures for reads and writes. If NDEBUG is defined,
out-of-bounds reads return zeroes, and out-of-bounds writes are
ignored.
This commit also adds support for memories that start with a non-zero
index (`Memory::start_offset` in RTLIL).
This results in further massive gains in performance, modest decrease
in compile time, and, for designs without feedback arcs, makes it
possible to run eval() once per clock edge in certain conditions.