If the offset is larger than the signal itself,
meaning the signal is completely shifted out,
it tried to extract a negative amount of bits from the old signal.
This RTL pattern is suspicious since it is a complicated way of
arriving at a constant value, so we warn the user.
This allows tools like SBY to capture the $display output independent
from anything else sim might log. Additionally it provides source and
hierarchy locations for everything printed.
The previous version could easily generate a large amount of padding
when the constant factor was significantly larger than the width of the
shift data input. This could lead to huge amounts of logic being
generated before then being optimized away at a huge performance and
memory cost.
Additionally and more critically, when the input width was not a
multiple of the constant factor, the input data was padded with 'x bits
to such a multiple before interspersing the 'x padding needed to align
the selectable windows to power-of-two offsets.
Such a final padding would not be correct for shifts besides $shiftx,
and the previous version did attempt to remove that final padding at the
end so that the native zero/sign/x-extension behavior of the shift cell
would be used, but since the last selectable window also got
power-of-two padding appended after the padding the code is trying to
remove got added, it did not actually fully remove it in some cases.
I changed the code to only add 'x padding between selectable windows,
leaving the last selectable window unpadded. This omits the need to add
final padding to a multiple of the constant factor in the first place.
In turn, that means the only 'x bits added are actually impossible to
select. As a side effect no padding is added when the constant factor is
equal to or larger than the width of the shift data input, also solving
the reported performance bug.
This fixes#4056
Remove duplicate %.pmg -> %_pm.h pattern. One of the duplicates overrode
the other, and in some conditions there were build races as to whether
the target directory for the generated header would exist. Instead have
a single rule which is properly generalized.
Generalize what was formerly the unsigned-only architecture to support
both signed and unsigned multiplication, use that as default, and set
aside the special low-power architecture that was formerly used for
signed multipliers.
- moved all selection and filtering logic to the match block
- applied less-verbose code suggestions
- removed constraint on number of bits in shift-amount
- added check for possible wrap-arround in the operation
Add a separate shiftmul pattern to match on left shifts which implement
demuxing. This mirrors the right shift pattern matcher but is probably
best kept separate instead of merging the two into a single matcher.
In any case the diff of the two matchers should be easily readable.
The `opt_expr` pass running before `peepopt` can interfere with the
detection of a shiftmul pattern due to some of the bottom bits of the
shift amount being replaced with constant zero. Extend the detection to
cover those situations as well.
`memory_nordff` has the advantage that it can be called just ahead of
the simulation step no matter whether the clocked read port has been
inferred or was explicitly instantiated in a flow.
In commit fedd12261 ("booth: Move away from explicit `Wire` pointers")
a bug was introduced when checking for vacant slots in arrays holding
some intermediate results. Non-wire SigBit values were taken to imply
a vacant slot, but actually a constant one can make its way into those
results, if the multiplier cell configuration is just right. Fix the
vacancy check to address the bug.
The main speedup comes from swithing from using a SHA1 hash to std::hash<std::string>. There is no need to use an expensive cryptographic hash for fingerprinting in this context.
This adds the -noinitstate option which is required to simulate
counterexamples to induction with yw-cosim. Also add handling for
$initstate cells for non-co-simulation.
For the basic single-bit operations, opt for gate cells (`$_AND_` etc.)
instead of the coarse cells (`$and` etc.). For the emission of cells
move to the conventional module methods (`module->addAndGate`) away
from the local helpers. While at it, touch on the surrounding code.
To represent intermediate signals use the `SigBit`/`SigSpec` classes as
is customary in the Yosys codebase. Do not pass around `Wire` pointers
unless we have special reason to.
* Speed up the autoname pass by 2x. This is accomplished by only constructing IdString objects for plain strings that have a higher score.
* Defer creating IdStrings even further. This increases the speedup to 3x.
This does not correctly handle an `$overwrite_tag` on a module output,
but since we currently require the user to flatten the design for
cross-module dft, this cannot be observed from within the design, only
by manually inspecting the signals in the design.
This is still missing a mode to rewrite $overwrite_tag and $original_tag
by injecting $set_tag and $get_tag in the right places. It's also
missing bit-precise propagation models for shifts and arithmetic and
requires the design to be flattened.
* Fixes a non-deterministic polarity error for $eqx/$nex cells
* Fixes a deterministic polarity error for $_NOR_ and $_ORNOT_ cells
* Generates hdlnames when xprop is run after flatten
ABC's read_lib command has a dont_use
cell list that is configurable by the user.
This PR exposes that option to Yosys.
See
5405d4787a/src/map/scl/scl.c (L285)
for documentation on this option.
Signed-off-by: Ethan Mahintorabi <ethanmoon@google.com>
Group the reconnections, so that instead of producing
connect $auto$wreduce.cc:455:run$24 [0] 1'0
connect $auto$wreduce.cc:455:run$23 [31] 1'0
connect $auto$wreduce.cc:455:run$23 [30] 1'0
... (40 more lines)
we produce
connect $auto$wreduce.cc:461:run$23 [31:11] 21'000000000000000000000
connect $auto$wreduce.cc:461:run$24 [31:10] 22'0000000000000000000000
.
If the `$ge` cell we are replacing has wide output port, the upper bits
on the port should be driven to zero. That's not what a `$not` cell with
a single-bit input does. Instead opt for a `$logic_not` cell, which does
zero-pad its output.
Fixes#3867.
C does not guarantee that stdout/stderr can be reassigned.
Most platforms do make them assignable, however musl and WASI that
is based on musl do not. WASI does not have `dup2()`; instead it has
its own non-portable version of it that can only assign to previously
allocated fds.
Update the stream redirection code so that it does the right thing
on WASI and other platforms.
This is mostly important for YoWASP builds, since those do not have
a way to build with external ABC (I prototyped it but for some reason
ABC always segfaults when built as an independent Wasm binary...)
Distinguish between the A, B input ports of `$_ANDNOT_`, `$_ORNOT_`
gates when considering those for sharing. Unlike the input ports of the
other supported single-bit gates, those are not interchangeable.
Fixes#3848.
An `std::vector<T>::reverse_iterator` stores the
`std::vector<T>::iterator` which points to the (forwards-ordered)
*following* item. Thus while `vec.rbegin()` dereferences to the final
item of `vec`, the iterator it wraps (`vec.rbegin().base()`) is equal to
`vec.end()`.
In the remove case here, we advance `it` (backwards), erasing the item
we just advanced past by grabbing its (pre-increment) base
forward-iterator and subtracting 1.
The iterator maths here is obviously all OK, but the forward-iterator
that `it` wraps post-increment actually points to the item we just
removed. That iterator was invalidated by the `erase()` call.
That this works anyway is (AFAICT) some combination of luck and/or
promises that aren't part of the C++ spec, but MSVC's debug iterator
support picks this up.
`erase()` returns the new iterator that follows the item just erased,
which happens to be the exact one we want our reverse-iterator to wrap
for the next loop; we get a fresh iterator to the same base, now without
the preceding item.
This makes clk2fflogic add an attr to $ff cells that carry the state of
the emulated async FF. The $ff output doesn't have any async updates
that happened in the current cycle, but the $ff input does, so the $ff
input corresponds to the async FF's output in the original design.
Hence this patch also makes the following changes to passes besides
clk2fflogic (but only for FFs with the clk2fflogic attr set):
* opt_clean treats the input as a register name (instead of the
output)
* rename -witness ensures that the input has a public name
* the formal backends (smt2, btor, aiger) will use the input's
name for the initial state of the FF in witness files
* when sim reads a yw witness that assigns an initial value to the
input signal, the state update is redirected to the output
This ensures that yosys witness files for clk2fflogic designs have
useful and stable public signal names. It also makes it possible to
simulate a clk2fflogic witness on the original design (with some
limitations when the original design is already using $ff cells).
It might seem like setting the output of a clk2fflogic FF to update the
input's initial value might not work in general, but it works fine for
these reasons:
* Witnesses for FFs are only present in the initial cycle, so we do
not care about any later cycles.
* The logic that clk2fflogic generates loops the output of the
genreated FF back to the input, with muxes in between to apply any
edge or level sensitive updates. So when there are no active updates
in the current gclk cycle, there is a combinational path from the
output back to the input.
* The logic clk2fflogic generates makes sure that an edge sensitive
update cannot be active in the first cycle (i.e. the past initial
value is assumed to be whatever it needs to be to avoid an edge).
* When a level sensitive update is active in the first gclk cycle, it
is actively driving the output for the whole gclk cycle, so ignoring
any witness initialization is the correct behavior.
When assigning values to input ports of nested modules in cosimulation,
sim needs to find the actual driver of the signal to perform the
assignment. The existing code didn't handle unconnected inputs in that
scenario.
Allow Liberty canonical identifier including double quotes in if-body and pass-through for Synopsys-style unquoted identifiers issue#3498
Co-authored-by: Aki <201479+lethalbit@users.noreply.github.com>
Writeback of simulation state into initial state was only working for `run()` and `run_cosim_fst()`.
This change moves the writeback into the `write_output_files()` function so that all simulation modes work with the writeback option.
This was renaming cells while iterating over them which would always
cause an assertion failure. Apparently having to rename cells to make
all witness signals public is rarely required, so this slipped through.
In two places, we are joining label pieces by a '|' separator. We go
about it by putting the separator behind each entry, then removing the
trailing separator in a final fixup pass on the built string. For easier
reading, replace those occurrences by a new factored-out
'join_label_pieces' function.
Signed-off-by: Martin Povišer <povik@cutebit.org>
When the 'show' pass generates portboxes to detail the connection of
cell ports to wires, it has special handling of signal chunk
repetitions, but those repetitions are not accounted for in the
displayed bit range in case of cell outputs. Fix that, and so bring it
into consistence with the behavior on cell inputs.
So, taking for example the following Verilog snippet,
module DRIVER (Q);
output [7:0] Q;
assign Q = 8'b10101010;
endmodule
module main;
wire w;
DRIVER driver(.Q({8{w}}));
endmodule
make the show pass display '7:0 - 8x 0:0' in the driver-to-w portbox
instead of '7:7 - 8x 0:0' which it displayed formerly.
Signed-off-by: Martin Povišer <povik@cutebit.org>
This matches the behavior of smtbmc.
This also updates the sim internal memory API to allow masked writes
where State::Sa bits (internal don't care - not a valid value for a
signal) leave the memory content unchanged.
This adds the xprop_decoder attribute to bwmuxes that drive the original
unencoded signals. Setundef is changed to ignore the x inputs of these
bwmuxes, so that they survive the prep script of SBY's formal flow. This
is required to make simulation (via sim) using the prep model show the
decoded x signals instead of 0/1 values made up by the solver.
The new bitwise case equality (`$bweqx`) and bitwise mux (`$bwmux`)
cells enable compact encoding and decoding of 3-valued logic signals
using multiple 2-valued signals.
The previous approach only initialized past_d and past_ad while for FST
cosim we also need to initialize the other past values like past_clk,
etc. Also to properly initialize them, we need to run a combinational
update step in case any of the wires feeding into the FF are private or
otherwise not part of the FST.
Otherwise the final values will not show up in gtkwave waveforms
when looking at the generated traces.
Signed-off-by: Claire Xenia Wolf <claire@clairexen.net>
* Change simlib's $mux cell to use the ternary operator as $_MUX_
already does
* Stop opt_expr -keepdc from changing S=x to S=0
* Change const eval of $mux and $pmux to match the updated simlib
(fixes sim)
* The sat behavior of $mux already matches the updated simlib
The verilog frontend uses $mux for the ternary operators and this
changes all interpreations of the $mux cell (that I found) to match the
verilog simulation behavior for the ternary operator. For 'if' and
'case' expressions the frontend may also use $mux but uses $eqx if the
verilog simulation behavior is requested with the '-ifx' option.
For $pmux there is a remaining mismatch between the sat behavior and the
simlib behavior. Resolving this requires more discussion, as the $pmux
cell does not directly correspond to a specific verilog construct.
This reverts commit 08be796cb8, reversing
changes made to 38dbb44fa0.
This fixes#2728. PR #641 did not actually "fix" #639.
The actual issue in #639 is not equiv_make, but assumptions in equiv_simple
that are not true for the test case provided in #639.
This is a complete rewrite of the FF replacing code.
The previous implementation tried to implement the negative hold time by
wrapping async control signals individually with pulse stretching. This
did not correctly model the interaction between different simultaneously
changing inputs (e.g. a falling ALOAD together with a changing AD would
load the changed AD instead of the value AD had when ALOAD was high; a
falling CLR could mask a raising SET for one cycle; etc.).
The new approach first has the logic for all updates using only sampled
values followed by the logic for all updates using only current values.
That way, e.g., a falling ALOAD will load the sampled AD value but a
still active ALOAD will load the current AD value.
The new code also has deterministic behavior for the initial state: no
operation is active when that operation would depend on a specific
previous signal value. This also means clk2fflogic will no longer
generate any additional uninitialized FFs.
I also documented the negative hold time behavior in the help message,
copying the relevant part from async2sync's help messages.
In 2fcc1ee72e, the following is apparantly added in order to mark any
number of undefined LUT inputs:
lut_a.append(RTLIL::Const(State::Sx, minlut - input_nodes.size()));
However this can only be done if the number of input nodes is less
than minlut.
This fixes#3317
After speaking with the author of ABC he let me know that ifraig is a very old command, and that &get; &fraig -x; &put is over 100x faster than ifraig with improved PPA results.
After making the change I confirmed that this is in fact a major speed up. On our internal designs in O(millions) of standard cells we saw multi hour reductions in runtime.
Also included is an improvement to the dress command. Using AIG based transformations removes the spec it SATs against. Proving the input blif will make sure that no matter what commands are run the dress command can still do its job. I noticed a regression against some LUT mapping jobs that prompted me to fix this.
Uses the regex below to search (using vscode):
^\t\tlog\("(.{10,}(?<!\\n)|.{81,}\\n)"\);
Finds any log messages double indented (which help messages are)
and checks if *either* there are is no newline character at the end,
*or* the number of characters before the newline is more than 80.
Verific generates a lot of FFs with an unused async load and we cannot
always optimize that away before running clk2fflogic, so check for that
special case here.
Find FFs with undefined initialization values for which changing the
initialization does not change the observable behavior and initialize
them. For -ff2anyinit, this reduces the number of generated $anyinit
cells that drive wires with private names.
This attribute can be used by formal backends to indicate which clocks
were mapped to the global clock. Update the btor and smt2 backend which
already handle clock inputs to understand this attribute.
These can be used to protect undefined flip-flop initialization values
from optimizations that are not sound for formal verification and can
help mapping all solver-provided values in witness traces for flows that
use different backends simultaneously.
The formal backends do not support multiple clocks. This includes
constant clocks. Constant clocks do appear in what isn't a proper
multiclock design, for example when mapping not fully initialized ROMs.
As converting FFs with constant clocks to FFs using the global is doable
even in a single clock flow, make async2sync do this.
POSIX defines $TMPDIR as containing the pathname of the directory where
programs can create temporary files. On most systems, this variable points to
"/tmp". However, on some systems it can point to a different location.
Without respecting this variable, yosys fails to run on such systems.
Signed-off-by: Mohamed A. Bamakhrama <mohamed@alumni.tum.de>