.. _chapter:opt:

Optimization passes
===================

.. TODO: copypaste

Yosys employs a number of optimizations to generate better and cleaner results.
This chapter outlines these optimizations.

Simple optimizations
--------------------

The Yosys pass opt runs a number of simple optimizations. These include removing
unused signals and cells, and const folding. It is recommended to run this pass
after each major step in the synthesis script. At the time of this writing the
opt pass executes the following passes, each of which performs a simple
optimization:

- Once at the beginning of opt:

  - opt_expr
  - opt_merge -nomux

- Repeat until result is stable:

  - opt_muxtree
  - opt_reduce
  - opt_merge
  - opt_rmdff
  - opt_clean
  - opt_expr

The following sections describe each of the opt\_ passes.

The opt_expr pass
~~~~~~~~~~~~~~~~~

This pass performs const folding on the internal combinational cell types
described in :ref:`chapter:celllib`. This means a cell with all
constant inputs is replaced with the constant value this cell drives. In some
cases this pass can also optimize cells with some constant inputs.

.. table:: Const folding rules for $_AND\_ cells as used in opt_expr.
   :name: tab:opt_expr_and
   :align: center

   ========= ========= ===========
   A-Input   B-Input   Replacement
   ========= ========= ===========
   any       0         0
   0         any       0
   1         1         1
   --------- --------- -----------
   X/Z       X/Z       X
   1         X/Z       X
   X/Z       1         X
   --------- --------- -----------
   any       X/Z       0
   X/Z       any       0
   --------- --------- -----------
   :math:`a` 1         :math:`a`
   1         :math:`b` :math:`b`
   ========= ========= ===========

.. How to format table?

:numref:`Table %s <tab:opt_expr_and>` shows the replacement rules used for
optimizing an $_AND\_ gate. The first three rules implement the obvious const
folding rules. Note that 'any' might include dynamic values calculated by other
parts of the circuit. The following three lines propagate undef (X) states.
These are the only three cases in which it is allowed to propagate an undef
according to Sec. 5.1.10 of IEEE Std. 1364-2005 :cite:p:`Verilog2005`.

The next two lines assume the value 0 for undef states. These two rules are only
used if no other substitutions are possible in the current module. If other
substitutions are possible they are performed first, in the hope that the 'any'
will change to an undef value or a 1 and therefore the output can be set to
undef.

The last two lines simply replace an $_AND\_ gate that has one constant-1 input
with a buffer.

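As an illustration, the following small module contains one AND operation for
several rows of the table. The module name ``and_demo`` and its ports are
invented for this example. It is only a sketch: the Verilog frontend may fold
purely constant expressions before opt_expr runs, and the operators are
represented as word-level $and cells until the design has been mapped to
$_AND\_ gates, so the cells the pass actually sees can differ.

.. code:: verilog

   module and_demo(input a, output [3:0] y);
       // Row "any, 0": the result is 0 regardless of a.
       assign y[0] = a & 1'b0;
       // Row "1, 1": both inputs are constant, the result is 1.
       assign y[1] = 1'b1 & 1'b1;
       // Row "1, X/Z": the undef state is propagated, the result is X.
       assign y[2] = 1'b1 & 1'bx;
       // Row "a, 1": the gate acts as a buffer, the result is a.
       assign y[3] = a & 1'b1;
   endmodule
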
Besides this basic const folding the opt_expr pass can replace 1-bit wide $eq
and $ne cells with buffers or not-gates if one input is constant.

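The four possible cases can be sketched as follows. The module ``eq_demo`` is
hypothetical, and the comments state the logically equivalent gate, not output
produced by Yosys.

.. code:: verilog

   module eq_demo(input a, output [3:0] y);
       assign y[0] = (a == 1'b1);  // equivalent to a buffer:   y[0] = a
       assign y[1] = (a == 1'b0);  // equivalent to a not-gate: y[1] = ~a
       assign y[2] = (a != 1'b0);  // equivalent to a buffer:   y[2] = a
       assign y[3] = (a != 1'b1);  // equivalent to a not-gate: y[3] = ~a
   endmodule
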
The opt_expr pass is very conservative regarding optimizing $mux cells, as these
cells are often used to model decision-trees and breaking these trees can
interfere with other optimizations.

The opt_muxtree pass
~~~~~~~~~~~~~~~~~~~~

This pass optimizes trees of multiplexer cells by analyzing the select inputs.
Consider the following simple example:

.. code:: verilog
   :number-lines:

   module uut(a, y);
       input a;
       output [1:0] y;
       assign y = a ? (a ? 1 : 2) : 3;
   endmodule

The output can never be 2, as this would require ``a`` to be 1 for the outer
multiplexer and 0 for the inner multiplexer. The opt_muxtree pass detects this
contradiction and replaces the inner multiplexer with a constant 1, yielding the
logic for ``y = a ? 1 : 3``.

The opt_reduce pass
~~~~~~~~~~~~~~~~~~~

This is a simple optimization pass that identifies and consolidates identical
input bits to $reduce_and and $reduce_or cells. It also sorts the input bits to
ease identification of shareable $reduce_and and $reduce_or cells in other
passes.

This pass also identifies and consolidates identical inputs to multiplexer
cells. In this case the new shared select bit is driven using a $reduce_or cell
that combines the original select bits.

Lastly this pass consolidates trees of $reduce_and cells and trees of $reduce_or
cells to single large $reduce_and or $reduce_or cells.

These three simple optimizations are performed in a loop until a stable result
is produced.

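The following hypothetical module (``reduce_demo`` and all signal names are
made up for this example) sketches one candidate for each of the three
optimizations. Which cells the frontend and the proc pass actually generate for
it is an assumption, so the result should be checked on a real run.

.. code:: verilog

   module reduce_demo(input a, b, c, input [1:0] sel,
                      input [3:0] p, q,
                      output y1, y2, output reg [3:0] z);
       // Duplicate input bit: the reduction over {a, b, a} equals the
       // reduction over {a, b}.
       assign y1 = |{a, b, a};
       // Tree of reductions: equals a single reduction over {a, b, c}.
       assign y2 = &{ &{a, b}, c };
       // Two case branches select the same data input p, so their select
       // conditions can be combined.
       always @* begin
           case (sel)
               2'd0:    z = p;
               2'd1:    z = q;
               2'd2:    z = p;
               default: z = 4'd0;
           endcase
       end
   endmodule
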
The opt_rmdff pass
~~~~~~~~~~~~~~~~~~

This pass identifies single-bit D-type flip-flops ($_DFF\_, $dff, and $adff
cells) with a constant data input and replaces them with a constant driver.

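A minimal sketch of such a register is shown below. The module ``const_ff`` is
hypothetical, and whether the replacement is actually performed may depend on
details this chapter does not cover, such as register initial values.

.. code:: verilog

   module const_ff(input clk, input d,
                   output reg q_const, output reg q_data);
       always @(posedge clk) begin
           q_const <= 1'b1;  // constant data input: candidate for replacement
           q_data  <= d;     // data-dependent input: remains a flip-flop
       end
   endmodule
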
The opt_clean pass
~~~~~~~~~~~~~~~~~~

This pass identifies unused signals and cells and removes them from the design.
It also creates an ``\unused_bits`` attribute on wires with unused bits. This
attribute can be used for debugging or by other optimization passes.

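For example, in the following hypothetical module (all names are invented for
illustration) the wire ``dead`` and the cell driving it are never read, and
only bit 0 of ``sum`` is used. This is a sketch of the situation the pass looks
for, not a statement about its exact output.

.. code:: verilog

   module clean_demo(input [3:0] a, b, output y);
       wire [3:0] sum  = a + b;  // only sum[0] is read below
       wire [3:0] dead = a ^ b;  // never read: wire and XOR cell are unused
       assign y = sum[0];
   endmodule
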
The opt_merge pass
~~~~~~~~~~~~~~~~~~

This pass performs trivial resource sharing. This means that this pass
identifies cells with identical inputs and replaces them with a single instance
of the cell.

The option -nomux can be used to disable resource sharing for multiplexer cells
($mux and $pmux). This can be useful as it prevents multiplexer trees from being
merged, which might prevent opt_muxtree from identifying possible optimizations.

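The hypothetical module below (``merge_demo`` is not part of Yosys) contains
two pairs of cells with identical inputs. It is a sketch under the assumption
that each ``assign`` yields its own cell before opt_merge runs.

.. code:: verilog

   module merge_demo(input [7:0] a, b, input s,
                     output [7:0] y1, y2, m1, m2);
       // Two adders with identical inputs: candidates for a single cell.
       assign y1 = a + b;
       assign y2 = a + b;
       // Two multiplexers with identical inputs: left alone with -nomux.
       assign m1 = s ? a : b;
       assign m2 = s ? a : b;
   endmodule
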
FSM extraction and encoding
---------------------------

The fsm pass performs finite-state-machine (FSM) extraction and recoding. The
fsm pass simply executes the following other passes:

- Identify and extract FSMs:

  - fsm_detect
  - fsm_extract

- Basic optimizations:

  - fsm_opt
  - opt_clean
  - fsm_opt

- Expanding to nearby gate-logic (if called with -expand):

  - fsm_expand
  - opt_clean
  - fsm_opt

- Re-code FSM states (unless called with -norecode):

  - fsm_recode

- Print information about FSMs:

  - fsm_info

- Export FSMs in KISS2 file format (if called with -export):

  - fsm_export

- Map FSMs to RTL cells (unless called with -nomap):

  - fsm_map

The fsm_detect pass identifies FSM state registers and marks them using the
``\fsm_encoding = "auto"`` attribute. The fsm_extract pass extracts all FSMs
marked using the ``\fsm_encoding`` attribute (unless ``\fsm_encoding`` is set to
"none") and replaces the corresponding RTL cells with a $fsm cell. All other
fsm\_ passes operate on these $fsm cells. The fsm_map call finally replaces the
$fsm cells with RTL cells.

Note that these optimizations operate on an RTL netlist. That is, the fsm pass
should be executed after the proc pass has transformed all RTLIL::Process
objects to RTL cells.

The algorithms used for FSM detection and extraction are influenced by a more
general reported technique :cite:p:`fsmextract`.

FSM detection
~~~~~~~~~~~~~

The fsm_detect pass identifies FSM state registers. It sets the
``\fsm_encoding = "auto"`` attribute on any (multi-bit) wire that matches the
following description:

- Does not already have the ``\fsm_encoding`` attribute.
- Is not an output of the containing module.
- Is driven by a single $dff or $adff cell.
- The ``\D``-Input of this $dff or $adff cell is driven by a multiplexer tree
  that only has constants or the old state value on its leaves.
- The state value is only used in the said multiplexer tree or by simple
  relational cells that compare the state value to a constant (usually $eq
  cells).

This heuristic has proven to work very well. It is possible to override it by
setting ``\fsm_encoding = "auto"`` on registers that should be considered FSM
state registers and setting ``\fsm_encoding = "none"`` on registers that match
the above criteria but should not be considered FSM state registers.

Note however that marking state registers with ``\fsm_encoding`` that are not
suitable for FSM recoding can cause synthesis to fail or produce invalid
results.

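In Verilog source the attribute can be attached to the register declaration.
The following is a minimal sketch; the module ``attr_demo`` and its signals are
hypothetical. Here ``mode`` is a register that fits the description above but
is explicitly excluded, while ``state`` is explicitly marked for extraction.

.. code:: verilog

   module attr_demo(input clk, input rst, input go,
                    output busy, output sel_b);
       // Explicitly request FSM extraction for this register.
       (* fsm_encoding = "auto" *)
       reg [1:0] state;

       // Fits the detection criteria, but is excluded from extraction.
       (* fsm_encoding = "none" *)
       reg [1:0] mode;

       always @(posedge clk) begin
           if (rst) begin
               state <= 2'd0;
               mode  <= 2'd0;
           end else begin
               case (state)
                   2'd0:    if (go) state <= 2'd1;
                   2'd1:    state <= 2'd2;
                   default: state <= 2'd0;
               endcase
               if (go) mode <= 2'd2;
           end
       end

       // Both registers are only compared against constants.
       assign busy  = (state == 2'd1);
       assign sel_b = (mode == 2'd2);
   endmodule
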
FSM extraction
~~~~~~~~~~~~~~

The fsm_extract pass operates on all state signals whose ``\fsm_encoding``
attribute is set to a value other than "none". For each state signal the
following information is determined:

- The state registers

- The asynchronous reset state if the state registers use asynchronous reset

- All states and the control input signals used in the state transition
  functions

- The control output signals calculated from the state signals and control
  inputs

- A table of all state transitions and corresponding control inputs and
  outputs

The state registers (and asynchronous reset state, if applicable) are simply
determined by identifying the driver for the state signal.

From there the $mux-tree driving the state register inputs is recursively
traversed. All select inputs are control signals and the leaves of the $mux-tree
are the states. The algorithm fails if a non-constant leaf that is not the state
signal itself is found.

The list of control outputs is initialized with the bits from the state signal.
It is then extended by adding all values that are calculated by cells that
compare the state signal with a constant value.

In most cases this will cover all uses of the state register, thus rendering the
state encoding arbitrary. If however a design uses e.g. a single bit of the
state value to drive a control output directly, this bit of the state signal
will be transformed to a control output of the same value.

Finally, a transition table for the FSM is generated. This is done by using the
ConstEval C++ helper class (defined in kernel/consteval.h) that can be used to
evaluate parts of the design. The ConstEval class can be asked to calculate a
given set of result signals using a set of signal-value assignments. It can also
be passed a list of stop-signals that abort the ConstEval algorithm if the value
of a stop-signal is needed in order to calculate the result signals.

The fsm_extract pass uses the ConstEval class in the following way to create a
transition table. For each state:

1. Create a ConstEval object for the module containing the FSM
2. Add all control inputs to the list of stop signals
3. Set the state signal to the current state
4. Try to evaluate the next state and control output
5. If step 4 was not successful:

   - Recursively go to step 4 with the offending stop-signal set to 0.
   - Recursively go to step 4 with the offending stop-signal set to 1.

6. If step 4 was successful: Emit transition

Finally a $fsm cell is created with the generated transition table and added to
the module. This new cell is connected to the control signals and the old
drivers for the control outputs are disconnected.

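To make the notion of a transition table concrete, the following hypothetical
three-state FSM (``fsm_demo`` and its signals are invented for this example)
lists its transitions in a comment. The table was derived by hand from the
Verilog source and is not Yosys output. The recursive procedure above may emit
more rows, since it splits on every control input it encounters, and the row
order depends on the order in which stop-signals are hit.

.. code:: verilog

   module fsm_demo(input clk, input rst, input go, input done,
                   output busy);
       localparam IDLE = 2'd0, RUN = 2'd1, FIN = 2'd2;
       reg [1:0] state;

       always @(posedge clk) begin
           if (rst)
               state <= IDLE;
           else case (state)
               IDLE:    if (go)   state <= RUN;
               RUN:     if (done) state <= FIN;
               FIN:     state <= IDLE;
               default: state <= IDLE;
           endcase
       end

       // Control output: a comparison of the state with a constant.
       assign busy = (state == RUN);

       // Hand-derived transitions ("-" marks a don't-care input):
       //
       //   state  rst go done | next  busy
       //   IDLE    1   -   -  | IDLE   0
       //   IDLE    0   0   -  | IDLE   0
       //   IDLE    0   1   -  | RUN    0
       //   RUN     1   -   -  | IDLE   1
       //   RUN     0   -   0  | RUN    1
       //   RUN     0   -   1  | FIN    1
       //   FIN     -   -   -  | IDLE   0
   endmodule
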
FSM optimization
~~~~~~~~~~~~~~~~

The fsm_opt pass performs basic optimizations on $fsm cells (not including state
recoding). The following optimizations are performed (in this order):

- Unused control outputs are removed from the $fsm cell. The attribute
  ``\unused_bits`` (that is usually set by the opt_clean pass) is used to
  determine which control outputs are unused.

- Control inputs that are connected to the same driver are merged.

- When a control input is driven by a control output, the control input is
  removed and the transition table altered to give the same behaviour without
  the external feedback path.

- Entries in the transition table that yield the same output and only differ in
  the value of a single control input bit are merged and the different bit is
  removed from the sensitivity list (turned into a don't-care bit).

- Constant inputs are removed and the transition table is altered to give an
  unchanged behaviour.

- Unused inputs are removed.

FSM recoding
~~~~~~~~~~~~

The fsm_recode pass assigns new bit patterns to the states. Usually this also
implies a change in the width of the state signal. At the time of this writing
only one-hot encoding with all-zero for the reset state is supported.

The fsm_recode pass can also write a text file with the changes performed by it
that can be used when verifying designs synthesized by Yosys using Synopsys
Formality.

Logic optimization
------------------

Yosys can perform multi-level combinational logic optimization on gate-level
netlists using the external program ABC. The abc pass extracts the
combinational gate-level parts of the design, passes them through ABC, and
re-integrates the results. The abc pass can also be used to perform other
operations using ABC, such as technology mapping (see :ref:`sec:techmap_extern`
for details).