In preparation for substantial expansion of CXXRTL's runtime, this commit
reorganizes the files used by the implementation. Only minimal changes are
required in a consumer.
First, change:
-I$(yosys-config --datdir)/include
to:
-I$(yosys-config --datdir)/include/backends/cxxrtl/runtime
Second, change:
#include <backends/cxxrtl/cxxrtl.h>
to:
#include <cxxrtl/cxxrtl.h>
(and do the same for cxxrtl_vcd.h, etc.)
The Verilog grammar does not allow an empty case. Most synthesis tools
are quite permissive about this, but Quartus is not. This causes
problems for amaranth with Quartus (see amaranth-lang/amaranth#931).
We add a new flow graph node type, PRINT_SYNC, as they don't get handled
with regular CELL_EVALs. We could probably move this grouping out of
the dump method.
Removing some signed checks and logic where we've already guaranteed the
values to be positive. Indeed, in these cases, if a negative value got
through (per my realisation in the signed fuzz harness), it would cause
an infinite loop due to flooring division.
The previous implementation for finding the end of a top-level s-expr
exhibited quadratic behavior as it would re-scan the complete input for
the current expression for every new line. For large designs with
trivial properties this could easily take seconds and dominate the
runtime over the actual solving.
This change remembers the current nesting level between lines, avoiding
the re-scanning.
This include seems to have been copied over from the JSON backend where
AIG models are sometimes inserted into the JSON output, but these other
backends don't do anything with AIG.
Python 3.12 emits a SyntaxWarning when encountering invalid escape
sequences. They still parse as expected. Marking these raw produces
the same result without the warnings.
This makes clk2fflogic add an attr to $ff cells that carry the state of
the emulated async FF. The $ff output doesn't have any async updates
that happened in the current cycle, but the $ff input does, so the $ff
input corresponds to the async FF's output in the original design.
Hence this patch also makes the following changes to passes besides
clk2fflogic (but only for FFs with the clk2fflogic attr set):
* opt_clean treats the input as a register name (instead of the
output)
* rename -witness ensures that the input has a public name
* the formal backends (smt2, btor, aiger) will use the input's
name for the initial state of the FF in witness files
* when sim reads a yw witness that assigns an initial value to the
input signal, the state update is redirected to the output
This ensures that yosys witness files for clk2fflogic designs have
useful and stable public signal names. It also makes it possible to
simulate a clk2fflogic witness on the original design (with some
limitations when the original design is already using $ff cells).
It might seem like setting the output of a clk2fflogic FF to update the
input's initial value might not work in general, but it works fine for
these reasons:
* Witnesses for FFs are only present in the initial cycle, so we do
not care about any later cycles.
* The logic that clk2fflogic generates loops the output of the
genreated FF back to the input, with muxes in between to apply any
edge or level sensitive updates. So when there are no active updates
in the current gclk cycle, there is a combinational path from the
output back to the input.
* The logic clk2fflogic generates makes sure that an edge sensitive
update cannot be active in the first cycle (i.e. the past initial
value is assumed to be whatever it needs to be to avoid an edge).
* When a level sensitive update is active in the first gclk cycle, it
is actively driving the output for the whole gclk cycle, so ignoring
any witness initialization is the correct behavior.
Can now append a user defined number of steps to input traces when joining.
If the number of steps is +ve, inputs are all set to 0.
If -ve then steps are skipped.
If all of steps are skipped (including init step) then the input trace will not be copied.
If more than one input trace is provided, the append option will need to be provided the same number of times as there are input traces.
Intended for use with SCY to combine sequential cover traces.
Arbitrary number of inputs, will load all and attempt to join them. Each trace will be replayed one after the other, in the same order as the files are provided.
Mismatch in init_only fields seems to work fine, with values in subsequent traces being assigned in the initial only if they weren't previously defined.
Uncertain if a mismatch in non init_only fields will cause problems.
Fixes WitnessSig.__eq__().
Adds helper functions to WitnessValues and ReadWitness classes.
While not setting the smtoffset here was clearly a bug, I think using
`chunk.offset` only worked incidentally. The `smtoffset` is an offset
into the `smtname, smtid` pair (here `"", idcounter`) which corresponds
to the smt bitvector `stringf("%s#%d", get_id(module), idcounter)` which
contains all the chunks this loop is iterating over.
Thus using an incrementing `smtoffset` (like the `$ff`/`$dff` case above
already does) should be the correct fix.
Wires weren't being assigned an smtoffset value so when generating a yosys witness trace it would also use an offset of 0.
Not sure if this has any other effects, but it fixes the bug I was having.
@jix could you take a look at this?
This should fix#3648 where when calling `emit_elaborated_extmodules` it
checks to see if a module is a black-box, however there was no
validation that the cell type was actually known, and it just always
assumed that we would get a valid instance, causing a segfault.
The output width for the boolean value should not influence the
operation width. The previous incorrect width extension would still
produce correct results, but could produce invalid smt2 output for
reduction operators when the output width was larger than the width of
the vector to which the reduction was applied.
This fixes#3654
This is not valid when the prefix of a trace already violates
assertions. This can happen when the trace generating solver doesn't
look for a minimal length counterexample.
I have added an optional flag to smtbmc that causes failed temporal induction counterexample traces to be checked for duplicate states and reported to the user, since loops in the counterexample mean that increasing the induction depth won't help prove a design's safety properties.
The witness metadata was missing fine grained FFs completely and for
coarse grained FFs where the output connection has multiple chunks it
lacked the offset of the chunk within the SMT expression. This fixes
both, the later by adding an "smtoffset" field to the metadata.
Uses the regex below to search (using vscode):
^\t\tlog\("(.{10,}(?<!\\n)|.{81,}\\n)"\);
Finds any log messages double indented (which help messages are)
and checks if *either* there are is no newline character at the end,
*or* the number of characters before the newline is more than 80.
This adds a native json based witness trace format. By having a common
format that includes everything we support, and providing a conversion
utility (yosys-witness) we no longer need to implement every format for
every tool that deals with witness traces, avoiding a quadratic
opportunity to introduce subtle bugs.
Included:
* smt2: New yosys-smt2-witness info lines containing full hierarchical
paths without lossy escaping.
* yosys-smtbmc --dump-yw trace.yw: Dump results in the new format.
* yosys-smtbmc --yw trace.yw: Read new format as constraints.
* yosys-witness: New tool to convert witness formats.
Currently this can only display traces in a human-readable-only
format and do a passthrough read/write of the new format.
* ywio.py: Small python lib for reading and writing the new format.
Used by yosys-smtbmc and yosys-witness to avoid duplication.
This attribute can be used by formal backends to indicate which clocks
were mapped to the global clock. Update the btor and smt2 backend which
already handle clock inputs to understand this attribute.
This approach had major issues with ROMs whose initialization was not
fully defined. If required, memory_map -rom-only -keepdc should be
called early in a formal flow instead. (This does require a careful
choice of optimization passes though. Sby's scripts will be updated
accordingly.)
Before this commit, values, wires, and memories with an initializer
were value-initialized in emitted C++ code. After this commit, all
values, wires, and memories are default-initialized, and the default
constructor of generated modules calls the reset() method, which
assigns the members that have an initializer.
Before this commit, zero width sigspecs were dumped as "" (empty
string). Unfortunately, 1364-2005 5.2.3.3 indicates that an empty
string is equivalent to "\0", and is 8 bits wide, so that's wrong.
After this commit, a replication operation with a count of zero is
used instead, which is explicitly permitted per 1364-2005 5.1.14,
and is defined to have size zero. (Its operand has to have a non-zero
size for it to be legal, though.)
PR #1203 has addressed this issue before, but in an incomplete way.
- *_en is split into *_ce (clock enable) and *_aload (async load aka
latch gate enable), so both can be present at once
- has_d is removed
- has_gclk is added (to have a clear marker for $ff)
- d_is_const and val_d leftovers are removed
- async2sync, clk2fflogic, opt_dff are updated to operate correctly on
FFs with async load
backends/protobuf/protobuf.cc depends on the source and header files
generated by protoc, but this dependency wasn't explicitly declared. Add
a rule to the Makefile to fix intermittent build failures when the
protobuf header/source file isn't built before protobuf.cc.
This mode will be used whenever read port cannot be handled in the
"extract address register" way, ie. whenever it has enable, reset,
init functionality or (in the future) mixed transparency mask.
The following VCD file crashes GTKWave's VCD loader:
$var wire 1 ! x:1 $end
$enddefinitions $end
In practice, a colon can be a part of a variable name that is
translated from a Verilog function, something like:
update$func$.../hdl/hazard3_csr.v:350$2534.$result
Public wires may alias buffered internal wires, so keep BUFFERED
wires in debug information even if they are private. Debug items are
only created for public wires, so this does not otherwise affect how
debug information is emitted.
Fixes#2540.
Fixes#2841.
While this helper is already useful to squash sequential initializations
into one in cxxrtl, its main purpose is to squash overlapping masked memory
initializations (when they land) and avoid having to deal with them in
cxxrtl runtime.
This essentially adds wide port support for free in passes that don't
have a usefully better way of handling wide ports than just breaking
them up to narrow ports, avoiding "please run memory_narrow" annoyance.
There will soon be more (versioned) memory cells, so handle passes that
only care if a cell is memory-related by a simple helper call instead of
a hardcoded list.
* xilinx: add SCC test for DSP48E1
* xilinx: Gate DSP48E1 being a whitebox behind ALLOW_WHITEBOX_DSP48E1
Have a test that checks it works through ABC9 when enabled
* abc9 to break SCCs using $__ABC9_SCC_BREAKER module
* Add test
* abc9_ops: remove refs to (* abc9_keep *) on wires
* abc9_ops: do not bypass cells in an SCC
* Add myself to CODEOWNERS for abc9*
* Fix compile
* abc9_ops: run -prep_hier before scc
* Fix tests
* Remove bug reference pending fix
* abc9: fix for -prep_hier -dff
* xaiger: restore PI handling
* abc9_ops: -prep_xaiger sigmap
* abc9_ops: -mark_scc -> -break_scc
* abc9: eliminate hard-coded abc9.box from tests
Also tidy up
* Address review
Previously, memories were silently discarded by the JSON backend, making
round-tripping modules with them crash.
Since there are already some users using JSON to implement custom
external passes that use memories (and infer width/size from memory
ports), let's fix this by just making JSON backend and frontend support
memories as first-class objects.
Processes are still not supported, and will now cause a hard error.
Fixes#1908.
Similar to the treatment of black boxes, splitting processes into two
scheduling nodes adds sufficient freedom so that netlists with
well-behaved processes (e.g. those emitted by nMigen) can immediately
converge.
Because processes are not emitted into edge-triggered regions, this
approach has comparable performance to -O5 (without -noproc), which
is substantially slower than -O6.
The exact shape of C++ code emitted by CXXRTL has a critical effect
on performance, both compile-time and runtime. CXXRTL's performance
greatly improved when it started localizing and inlining wires, not
only because this assists the optimizer and register allocator, but
also because inlining code into edge-triggered regions cuts the time
spent in eval() by at least a factor of two.
However, the logic of netlist layout has always been ad-hoc, fragile,
and very hard to understand and modify. After commit ece25a45, which
introduced outlining, the same logic started being applied to two
distinct netlists at once instead of one, which barely worked.
This commit does four major changes:
* There is now a single unambiguous source of truth (per subgraph)
for the layout of any emitted wire.
* Netlist layout is now done entirely during analysis using well
known graph algorithms; no graph operations happen when emitting.
* Netlist layout now happens completely separately for eval() and
debug_eval() subgraphs.
* Unreachable (within subgraph scope) netlist nodes are now neither
emitted nor considered for wire inlining decisions.
The netlist layout code should also now closely match the described
semantics.
As a part of this large cleanup, it includes many miscellaneous
improvements:
* The "bare minimum" debug level introduced in commit dd6a761d was
split into two levels; -g1 now emits debug information *only* for
inputs and state wires, and -g2 now emits debug information for
all public members. The old behavior matches -g2. This is done
to avoid bloat on low optimization levels.
* Debug aliases and inlined connections are now handled separately,
and complex RHS never interferes with inlined connections.
* Aliases to outlined wires now carry a pointer to the outline.
* Cell sync outputs can now be emitted in debug_eval().
* Black box debug information now includes comb/sync driver flags.
* The comment emitted for inlined cells is now accurate.
* Debug information statistics now has less noise.
* Netlist layout code is now much better documented.
Due to more precise inlining decisions, unmodified (i.e. with no
Yosys script being used) netlists now have much more logic inlined
into edge-triggered regions. On Minerva SoC SRAM, this improves
runtime by 20-25% across compilers and optimization levels.
Due to more precise reachability analysis, much less C++ code is now
emitted, especially at the maximum debug level. On Minerva SoC SRAM,
this improves clang compile time by 30-50% depending on options.
gcc is not affected.
On Minerva SoC SRAM compiled with clang-11, this change cuts commit
time in half (!) and overall time by 20%. When compiled with gcc-10,
there is no difference.
In C, non-static inline functions require an implementation elsewhere
(even though the body is right there in the header). It is basically
never desirable to use those as opposed to static inline ones.
Implementing outlining has greatly increased the amount of debug
information in a typical build, and consequently exposed performance
issues in C++ compilers, which are similar for both GCC and Clang;
the compile time of Minerva SoC SRAM increased almost twofold.
Although one would expect the slowdown to be caused by the increased
use of templates in `debug_eval()`, it is actually almost entirely
attributable to optimizations and codegen for `debug_items()`.
Fortunately, it is neither possible nor desirable to optimize
`debug_items()`: in most cases it is called exactly once, and its
body is a linear sequence of calls with unique arguments.
This commit turns off optimizations for `debug_items()` on GCC and
Clang, improving -Os compile time of Minerva SoC SRAM by ~40% (!)
Before this commit, if a sequence of wires assigned in a chain would
terminate on a cell, none of the wires would get marked as aliases,
and typically all of the public wires would get outlined. The reason
for this behavior is that alias analysis predates outlining and in
fact runs before it.
After this commit, alias analysis runs after outlining and considers
outlined wires valid aliasees. More importantly, if the chained wires
contain any valid aliasees, then all of the wires are aliased to
the one that is topologically deepest.
Aliased wires incur virtually no overhead for the VCD writer, unlike
outlined wires that would otherwise take their place. On Minerva SoC
SRAM, size of the full VCD dump is reduced by ~65%, and throughput
is increased by ~55%.
Aggressive wire localization and inlining is necessary for CXXRTL to
achieve high performance. However, that comes with a cost: reduced
debug information coverage. Previously, as a workaround, the `-Og`
option could have been used to guarantee complete coverage, at a cost
of a significant performance penalty.
This commit introduces debug information outlining. The main eval()
function is compiled with the user-specified optimization settings.
In tandem, an auxiliary debug_eval() function, compiled from the same
netlist, can be used to reconstruct the values of localized/inlined
signals on demand. To the extent that it is possible, debug_eval()
reuses the results of computations performed by eval(), only filling
in the missing values.
Benchmarking a representative design (Minerva SoC SRAM) shows that:
* Switching from `-O4`/`-Og` to `-O6` reduces runtime by ~40%.
* Switching from `-g1` to `-g2`, both used with `-O6`, increases
compile time by ~25%.
* Although `-g2` increases the resident size of generated modules,
this has no effect on runtime.
Because the impact of `-g2` is minimal and the benefits of having
unconditional 100% debug information coverage (and the performance
improvement as well) are major, this commit removes `-Og` and changes
the defaults to `-O6 -g2`.
We'll have our cake and eat it too!
"Elision" in this context is an unusual and not very descriptive term
whereas "inlining" is common and straightforward. Also, introducing
"inlining" makes it easier to introduce its dual under the obvious
name "outlining".
Before this commit, a cell's input was always assigned like:
p_cell.p_input = (value...);
If `p_input` is buffered (e.g. if the design is built at -O0), this
is not correct. (In practice, this breaks clocking.) Unfortunately,
the incorrect design was compiled without diagnostics because wire<>
was move-assignable and also implicitly constructible from value<>.
After this commit, cell inputs are no longer incorrectly assumed to
always be unbuffered, and wires are not assignable from values.
RTL contract violations and C++ contract violations are different:
the former depend on the netlist and will never violate memory safety
whereas the latter may. When loading a CXXRTL simulation into another
process, RTL contract violations should generally not crash it, while
C++ contract violations should.