This can result in massive reduction in runtime, up to 50% depending
on workload. Currently people are using `-mllvm -inline-threshold=`
as a workaround (with clang++), but this solution is more portable.
This was a correctness issue, but one of the consequences is that it
resulted in jumps in generated machine code where there should have
been none. As a side effect of fixing the bug, Minerva SoC became 10%
faster.
Without unbuffering output wires of, at least, toplevel modules, it
is not possible to have most designs that rely on IO via toplevel
ports (as opposed to using exclusively blackboxes) converge within
one delta cycle. That seriously impairs the performance of CXXRTL.
This commit avoids unbuffering outputs of all modules solely so that
in future, CXXRTL could gain fully separate compilation, and not for
any present technical reason.
With this change, it is easier to see which signals carry state (only
wire<>s appear as `reg` in VCD files) and to construct a minimal
checkpoint (CXXRTL_WIRE debug items represent the canonical smallest
set of state required to fully reconstruct the simulation).
Although logically two separate steps, these were treated as one for
historic reasons. Splitting the two makes it possible to have designs
that are only 2× slower than fastest possible (and are without extra
delta cycles) that allow probing all public wires.
Historically, elision was implemented before localization, so levels
with elision are lower than corresponding levels with localization.
This is unfortunate for two reasons:
1. Elision is a logical subset of localization, since it equals to
not giving a name to a temporary.
2. "Localize" currently actually means "unbuffer and localize",
and it would be useful to split those steps (at least for
public wires) for improved design visibility.
Although these options can be thought of as optimizations, they are
essentially orthogonal to the core of -O, which is managing signal
buffering and scope. Going from -O4 to -O2 means going from limited
to complete design visibility, yet in both cases proc and flatten
are desirable.
Before this commit, Verilog expressions like `x && 1` would result in
references to `logic_and_us` in generated CXXRTL code, which would
not compile. After this commit, since cells like that actually behave
the same regardless of signedness attributes, the signedness is
ignored, which also reduces the template instantiation pressure.
This commit changes the VCD writer such that for all signals that
have `debug_item.type == VALUE && debug_item.next == nullptr`, it
would only sample the value once.
Commit f2d7a187 added more debug information by including constant
wires, and decreased the performance of VCD writer proportionally
because the constant wires were still repeatedly sampled; this commit
eliminates the performance hit.
Constant wires can represent a significant chunk of the design in
generic designs or after optimization. Emitting them in VCD files
significantly improves usability because gtkwave removes all traces
that are not present in the VCD file after reload, and iterative
development suffers if switching a varying signal to a constant
disrupts the workflow.
This commit changes the VCD writer such that for all signals that
share `debug_item.curr`, it would only emit a single VCD identifier,
and sample the value once.
Commit 9b39c6f7 added redundancy to debug information by including
alias wires, and increased the size of VCD files proportionally; this
commit eliminates the redundancy from VCD files so that their size
is the same as before.
Alias wires can represent a significant chunk of the design in highly
hierarchical designs; in Minerva SRAM, there are 273 member wires and
527 alias wires. Showing them in every hierarchy level significantly
improves usability.
Compared to the C++ API, the C API currently has two limitations:
1. Memories cannot be updated in a race-free way.
2. Black boxes cannot be implemented in C.
Debug information describes values, wires, and memories with a simple
C-compatible layout. It can be emitted on demand into a map, which
has no runtime cost when it is unused, and allows late bound designs.
The `hdlname` attribute is used as the lookup key such that original
names, as emitted by the frontend, can be used for debugging and
introspection.
The $div and $mod cells use truncating division semantics (rounding
towards 0), as defined by e.g. Verilog. Another rounding mode, flooring
(rounding towards negative infinity), can be used in e.g. VHDL. The
new $divfloor cell provides this flooring division.
This commit also fixes the handling of $div in opt_expr, which was
previously optimized as if it was $divfloor.
The $div and $mod cells use truncating division semantics (rounding
towards 0), as defined by e.g. Verilog. Another rounding mode, flooring
(rounding towards negative infinity), can be used in e.g. VHDL. The
new $modfloor cell provides this flooring modulo (also known as "remainder"
in several languages, but this name is ambiguous).
This commit also fixes the handling of $mod in opt_expr, which was
previously optimized as if it was $modfloor.
Ensures that "BV" is the logic whenever solving an exists-forall problem with Yices, moves the "(set-logic ...)" directive above any non-info line, sets the `ef-max-iters` parameter to a very high number when using Yices in exists-forall mode so as not to prematurely abandon difficult problems, and does not provide the incompatible "--incremental" Yices argument when in exists-forall mode.
This isn't actually necessary anymore after scheduling was improved,
and `clean -purge` disrupts the mapping between wires in the input
RTLIL netlist and the output CXXRTL code.
log_assert(false) never returns and thus can't fall through, but gcc
doesn't seem to think that far. Making it the last case avoids the
problem entirely.
The current firrtl backend emits blackboxes as standard modules
with an empty body, but this causes the firrtl compiler to
optimize out entire circuits due to the absence of any drivers.
Yosys already tags blackboxes with a (*blackbox*) attribute, so this
commit just propagates this change to firrtl's syntax for blackboxes.
As per suggestion made in https://github.com/YosysHQ/yosys/pull/1987, now:
RTLIL::wire holds an is_signed field.
This is exported in JSON backend
This is exported via dump_rtlil command
This is read in via ilang_parser
Strategically inserting the pending memory write in memory::update to keep the
queue sorted allows us to skip the queue sort in memory::commit.
The Minerva SRAM SoC runs ~7% faster as a result.
This is quite possibly the worst way to implement this, but it does
work for a subset of well-behaved designs, and can be used to measure
how much performance is lost simulating the inactive edge of a clock.
It should be replaced with a clock tree analyzer generating safe
code once it is clear how should such a thing look like.
If the annotations are not used, this commit does not alter semantics
at all, other than removing elision of outputs of black box cells.
(Elision of such outputs is expected to be too rare to have any
noticeable benefit, and the implementation was somewhat of a hack.)
The (* cxxrtl.comb *) annotation alters the semantics of the output
of the black box it is applied to such that, if the black box
converges immediately, no additional delta cycle is necessary to
propagate the computed combinatorial value upwards in hierarchy.
The (* cxxrtl.sync *) annotation alters the semantics of the output
of the black box it is applied to such as to remove any uses of
the black box by the wires connected to this output, and break false
feedback arcs arising from conservative modeling of dependencies of
the black box.
Although currently these attributes are only recognized on black
boxes, if separate compilation is added in the future, it could also
emit and consume them.
The attribute for this is called (* cxxrtl.edge *), and there is
a planned attribute (* cxxrtl.sync *) that would cause blackbox
cell outputs to be added to sync defs rather than comb defs.
Rename the edge detector related stuff to avoid confusion.
Fixes#1823.
This will allow nextpnr to reuse the default value information already
present in yosys cells_sim.v and avoid duplicating (and probably
desyncing) this information.