* area should be 1 for all LUTs
* clean up macros
* add log_assert to fail noisily when encountering oddly configured DFF
* clean help msg
* flatten set to true by default
* update
* merge mult tests
* remove redundant test
* move all dsp tests to single file and remove redundant tests
* update ram tests
* add more dff tests
* fix c++20 compile errors
* add option to dump verilog
* default to use abc9
* remove -abc9 option since its the default now
---------
Co-authored-by: tony <minchunlin@gmail.com>
The input to a shift operation is padded.
This reduced the final number of MUX cells
but during techmap it can create huge
temporary multiplexers in the log shifter.
This significantly increases runtime and resources.
A limit is added with a warning when it is used.
If the offset is larger than the signal itself,
meaning the signal is completely shifted out,
it tried to extract a negative amount of bits from the old signal.
This RTL pattern is suspicious since it is a complicated way of
arriving at a constant value, so we warn the user.
The previous version could easily generate a large amount of padding
when the constant factor was significantly larger than the width of the
shift data input. This could lead to huge amounts of logic being
generated before then being optimized away at a huge performance and
memory cost.
Additionally and more critically, when the input width was not a
multiple of the constant factor, the input data was padded with 'x bits
to such a multiple before interspersing the 'x padding needed to align
the selectable windows to power-of-two offsets.
Such a final padding would not be correct for shifts besides $shiftx,
and the previous version did attempt to remove that final padding at the
end so that the native zero/sign/x-extension behavior of the shift cell
would be used, but since the last selectable window also got
power-of-two padding appended after the padding the code is trying to
remove got added, it did not actually fully remove it in some cases.
I changed the code to only add 'x padding between selectable windows,
leaving the last selectable window unpadded. This omits the need to add
final padding to a multiple of the constant factor in the first place.
In turn, that means the only 'x bits added are actually impossible to
select. As a side effect no padding is added when the constant factor is
equal to or larger than the width of the shift data input, also solving
the reported performance bug.
This fixes#4056
Remove duplicate %.pmg -> %_pm.h pattern. One of the duplicates overrode
the other, and in some conditions there were build races as to whether
the target directory for the generated header would exist. Instead have
a single rule which is properly generalized.
- moved all selection and filtering logic to the match block
- applied less-verbose code suggestions
- removed constraint on number of bits in shift-amount
- added check for possible wrap-arround in the operation
Add a separate shiftmul pattern to match on left shifts which implement
demuxing. This mirrors the right shift pattern matcher but is probably
best kept separate instead of merging the two into a single matcher.
In any case the diff of the two matchers should be easily readable.
The `opt_expr` pass running before `peepopt` can interfere with the
detection of a shiftmul pattern due to some of the bottom bits of the
shift amount being replaced with constant zero. Extend the detection to
cover those situations as well.
Uses the regex below to search (using vscode):
^\t\tlog\("(.{10,}(?<!\\n)|.{81,}\\n)"\);
Finds any log messages double indented (which help messages are)
and checks if *either* there are is no newline character at the end,
*or* the number of characters before the newline is more than 80.
The main part is converting ice40_dsp to recognize the new FF types
created in opt_dff instead of trying to recognize the mux patterns on
its own.
The fsm call has been moved upwards because the passes cannot deal with
$dffe/$sdff*, and other optimizations don't help it much anyway.
The main part is converting xilinx_dsp to recognize the new FF types
created in opt_dff instead of trying to recognize the patterns on its
own.
The fsm call has been moved upwards because the passes cannot deal with
$dffe/$sdff*, and other optimizations don't help it much anyway.