Aggressive wire localization and inlining is necessary for CXXRTL to
achieve high performance. However, that comes with a cost: reduced
debug information coverage. Previously, as a workaround, the `-Og`
option could have been used to guarantee complete coverage, at a cost
of a significant performance penalty.
This commit introduces debug information outlining. The main eval()
function is compiled with the user-specified optimization settings.
In tandem, an auxiliary debug_eval() function, compiled from the same
netlist, can be used to reconstruct the values of localized/inlined
signals on demand. To the extent that it is possible, debug_eval()
reuses the results of computations performed by eval(), only filling
in the missing values.
Benchmarking a representative design (Minerva SoC SRAM) shows that:
* Switching from `-O4`/`-Og` to `-O6` reduces runtime by ~40%.
* Switching from `-g1` to `-g2`, both used with `-O6`, increases
compile time by ~25%.
* Although `-g2` increases the resident size of generated modules,
this has no effect on runtime.
Because the impact of `-g2` is minimal and the benefits of having
unconditional 100% debug information coverage (and the performance
improvement as well) are major, this commit removes `-Og` and changes
the defaults to `-O6 -g2`.
We'll have our cake and eat it too!
"Elision" in this context is an unusual and not very descriptive term
whereas "inlining" is common and straightforward. Also, introducing
"inlining" makes it easier to introduce its dual under the obvious
name "outlining".
Before this commit, a cell's input was always assigned like:
p_cell.p_input = (value...);
If `p_input` is buffered (e.g. if the design is built at -O0), this
is not correct. (In practice, this breaks clocking.) Unfortunately,
the incorrect design was compiled without diagnostics because wire<>
was move-assignable and also implicitly constructible from value<>.
After this commit, cell inputs are no longer incorrectly assumed to
always be unbuffered, and wires are not assignable from values.
RTL contract violations and C++ contract violations are different:
the former depend on the netlist and will never violate memory safety
whereas the latter may. When loading a CXXRTL simulation into another
process, RTL contract violations should generally not crash it, while
C++ contract violations should.
Although it is always possible to destroy and recreate the design to
simulate a power-on reset, this has two drawbacks:
* Black boxes are also destroyed and recreated, which causes them
to reacquire their resources, which might be costly and/or erase
important state.
* Pointers into the design are invalidated and have to be acquired
again, which is costly and might be very inconvenient if they are
captured elsewhere (especially through the C API).
* backends/blif: Remove unused vector of strings
For reasons that are unclear to me, this was being used to store every
result of `cstr` before returning them. The vector was never accessed otherwise,
resulting in a huge unnecessary memory sink when emitting to BLIF.
* backends/blif: Remove CSTR macro
* backends/blif: Actually call str()
In most cases, a CXXRTL simulation would use a top module, either
because this module serves as an entry point to the CXXRTL C API,
or because the outputs of a top module are unbuffered, improving
performance. Taking this into account, the CXXRTL backend now runs
`hierarchy -auto-top` if there is no top module. For the few cases
where this behavior is unwanted, it now accepts a `-nohierarchy`
option.
Fixes#2373.
This can be useful to determine whether the wire should be a part of
a design checkpoint, whether it can be used to override design state,
and whether driving it may cause a conflict.
Before this commit, the meaning of "sync def" included some flip-flop
cells but not others. There was no actual reason for this; it was
just poorly defined.
After this commit, a "sync def" means that a wire holds design state
because it is connected directly to a flip-flop output, and may never
be unbuffered. This is not affected by presence of async inputs.
This can be useful to distinguish e.g. a combinatorially driven wire
with type `CXXRTL_VALUE` from a module input with the same type, as
well as general introspection.
The only difference between "RTLIL" and "ILANG" is that the latter is
the text representation of the former, as opposed to the in-memory
graph representation. This distinction serves no purpose but confuses
people: it is not obvious that the ILANG backend writes RTLIL graphs.
Passes `write_ilang` and `read_ilang` are provided as aliases to
`write_rtlil` and `read_rtlil` for compatibility.
This commit adds support for real-valued parameters in blackboxes. Additionally,
parameters now retain their types are no longer all encoded as strings.
There is a caveat with this implementation due to my limited knowledge of yosys,
more specifically to how yosys encodes bitwidths of parameter values. The example
below can motivate the implementation choice I took. Suppose a verilog component
is declared with the following parameters:
parameter signed [26:0] test_signed;
parameter [26:0] test_unsigned;
parameter signed [40:0] test_signed_large;
If you instantiate it as follows:
defparam <inst_name> .test_signed = 49;
defparam <inst_name> .test_unsigned = 40'd35;
defparam <inst_name> .test_signed_large = 40'd12;
If you peek in the RTLIL::Const structure corresponding to these params, you
realize that parameter "test_signed" is being considered as a 32-bit value
since it's declared as "49" without a width specifier, even though the parameter
is defined to have a maximum width of 27 bits.
A similar issue occurs for parameter "test_unsigned" where it is supposed to take
a maximum bit width of 27 bits, but if the user supplies a 40-bit value as above,
then yosys considers the value to be 40 bits.
I suppose this is due to the type being defined by the RHS rather than the definition.
Regardless of this, I emit the same widths as what the user specifies on the RHS when
generating firrtl IR.
Previous blackbox components were just emitted with their interface ports,
but their generic parameters were never emitted and it was therefore
impossible to customize them.
This commit adds support for blackbox generic parameters, though support
is only provided for INTEGER and STRING parameters. Other types of
parameters such as DOUBLEs, ..., would result in undefined behavior here.
This allows the emission of custom extmodule instances such as the following:
extmodule fourteennm_lcell_comb_<instName>:
input cin: UInt<1>
output combout: UInt<1>
output cout: UInt<1>
input dataa: UInt<1>
input datab: UInt<1>
input datac: UInt<1>
input datad: UInt<1>
input datae: UInt<1>
input dataf: UInt<1>
input datag: UInt<1>
input datah: UInt<1>
input sharein: UInt<1>
output shareout: UInt<1>
output sumout: UInt<1>
defname = fourteennm_lcell_comb
parameter extended_lut = "off"
parameter lut_mask = "b0001001000010010000100100001001000010010000100100001001000010010"
parameter shared_arith = "off"
Refer to the SMT-LIB specification, section 4.1.7. According to the spec, some options can only be specified in `start` mode. Once the solver sees `set-logic`, it moves to `assert` mode.
This commit only affects translation of RTLIL processes (for which
there is limited support).
Due to the event-driven nature of Verilog, processes like
reg x;
always @*
x <= 1;
may never execute. This can be fixed in SystemVerilog code by using
`always_comb` instead of `always @*`, but in Verilog-2001 the options
are limited. This commit implements the following workaround:
reg init = 0;
reg x;
always @* begin
if (init) begin end
x <= 1;
end
Fixes#2271.
For several reasons:
* They're more convenient than accessing .data.
* They accommodate variably-sized types like size_t transparently.
* They statically ensure that no out of range conversions happen.
For now these are only provided for unsigned integers, but eventually
they should be provided for signed integers too. (Annoyingly this
affects conversions to/from `char` at the moment.)
Fixes#2127.
This can result in massive reduction in runtime, up to 50% depending
on workload. Currently people are using `-mllvm -inline-threshold=`
as a workaround (with clang++), but this solution is more portable.
This was a correctness issue, but one of the consequences is that it
resulted in jumps in generated machine code where there should have
been none. As a side effect of fixing the bug, Minerva SoC became 10%
faster.
Without unbuffering output wires of, at least, toplevel modules, it
is not possible to have most designs that rely on IO via toplevel
ports (as opposed to using exclusively blackboxes) converge within
one delta cycle. That seriously impairs the performance of CXXRTL.
This commit avoids unbuffering outputs of all modules solely so that
in future, CXXRTL could gain fully separate compilation, and not for
any present technical reason.
With this change, it is easier to see which signals carry state (only
wire<>s appear as `reg` in VCD files) and to construct a minimal
checkpoint (CXXRTL_WIRE debug items represent the canonical smallest
set of state required to fully reconstruct the simulation).
Although logically two separate steps, these were treated as one for
historic reasons. Splitting the two makes it possible to have designs
that are only 2× slower than fastest possible (and are without extra
delta cycles) that allow probing all public wires.
Historically, elision was implemented before localization, so levels
with elision are lower than corresponding levels with localization.
This is unfortunate for two reasons:
1. Elision is a logical subset of localization, since it equals to
not giving a name to a temporary.
2. "Localize" currently actually means "unbuffer and localize",
and it would be useful to split those steps (at least for
public wires) for improved design visibility.
Although these options can be thought of as optimizations, they are
essentially orthogonal to the core of -O, which is managing signal
buffering and scope. Going from -O4 to -O2 means going from limited
to complete design visibility, yet in both cases proc and flatten
are desirable.
Before this commit, Verilog expressions like `x && 1` would result in
references to `logic_and_us` in generated CXXRTL code, which would
not compile. After this commit, since cells like that actually behave
the same regardless of signedness attributes, the signedness is
ignored, which also reduces the template instantiation pressure.
This commit changes the VCD writer such that for all signals that
have `debug_item.type == VALUE && debug_item.next == nullptr`, it
would only sample the value once.
Commit f2d7a187 added more debug information by including constant
wires, and decreased the performance of VCD writer proportionally
because the constant wires were still repeatedly sampled; this commit
eliminates the performance hit.
Constant wires can represent a significant chunk of the design in
generic designs or after optimization. Emitting them in VCD files
significantly improves usability because gtkwave removes all traces
that are not present in the VCD file after reload, and iterative
development suffers if switching a varying signal to a constant
disrupts the workflow.
This commit changes the VCD writer such that for all signals that
share `debug_item.curr`, it would only emit a single VCD identifier,
and sample the value once.
Commit 9b39c6f7 added redundancy to debug information by including
alias wires, and increased the size of VCD files proportionally; this
commit eliminates the redundancy from VCD files so that their size
is the same as before.
Alias wires can represent a significant chunk of the design in highly
hierarchical designs; in Minerva SRAM, there are 273 member wires and
527 alias wires. Showing them in every hierarchy level significantly
improves usability.
Compared to the C++ API, the C API currently has two limitations:
1. Memories cannot be updated in a race-free way.
2. Black boxes cannot be implemented in C.
Debug information describes values, wires, and memories with a simple
C-compatible layout. It can be emitted on demand into a map, which
has no runtime cost when it is unused, and allows late bound designs.
The `hdlname` attribute is used as the lookup key such that original
names, as emitted by the frontend, can be used for debugging and
introspection.
The $div and $mod cells use truncating division semantics (rounding
towards 0), as defined by e.g. Verilog. Another rounding mode, flooring
(rounding towards negative infinity), can be used in e.g. VHDL. The
new $divfloor cell provides this flooring division.
This commit also fixes the handling of $div in opt_expr, which was
previously optimized as if it was $divfloor.
The $div and $mod cells use truncating division semantics (rounding
towards 0), as defined by e.g. Verilog. Another rounding mode, flooring
(rounding towards negative infinity), can be used in e.g. VHDL. The
new $modfloor cell provides this flooring modulo (also known as "remainder"
in several languages, but this name is ambiguous).
This commit also fixes the handling of $mod in opt_expr, which was
previously optimized as if it was $modfloor.
Ensures that "BV" is the logic whenever solving an exists-forall problem with Yices, moves the "(set-logic ...)" directive above any non-info line, sets the `ef-max-iters` parameter to a very high number when using Yices in exists-forall mode so as not to prematurely abandon difficult problems, and does not provide the incompatible "--incremental" Yices argument when in exists-forall mode.
This isn't actually necessary anymore after scheduling was improved,
and `clean -purge` disrupts the mapping between wires in the input
RTLIL netlist and the output CXXRTL code.