\chapter{Optimizations}
\label{chapter:opt}

Yosys employs a number of optimizations to generate better and cleaner results.
This chapter outlines these optimizations.

\section{Simple Optimizations}

The Yosys pass {\tt opt} runs a number of simple optimizations. These include removing unused
signals and cells and const folding. It is recommended to run this pass after each major step
in the synthesis script. At the time of this writing the {\tt opt} pass executes the following
passes, each of which performs a simple optimization:

\begin{itemize}
\item Once at the beginning of {\tt opt}:
\begin{itemize}
\item {\tt opt\_const}
\item {\tt opt\_share -nomux}
\end{itemize}
\item Repeat until result is stable:
\begin{itemize}
\item {\tt opt\_muxtree}
\item {\tt opt\_reduce}
\item {\tt opt\_share}
\item {\tt opt\_rmdff}
\item {\tt opt\_clean}
\item {\tt opt\_const}
\end{itemize}
\end{itemize}

The following sections describe each of the {\tt opt\_*} passes.

\subsection{The opt\_const pass}

This pass performs const folding on the internal combinational cell types
described in Chap.~\ref{chapter:celllib}. This means a cell with all constant
inputs is replaced with the constant value this cell drives. In some cases
this pass can also optimize cells with some constant inputs.

\begin{table}
\hfil
\begin{tabular}{cc|c}
A-Input & B-Input & Replacement \\
\hline
any & 0 & 0 \\
0 & any & 0 \\
1 & 1 & 1 \\
\hline
X/Z & X/Z & X \\
1 & X/Z & X \\
X/Z & 1 & X \\
\hline
any & X/Z & 0 \\
X/Z & any & 0 \\
\hline
$a$ & 1 & $a$ \\
1 & $b$ & $b$ \\
\end{tabular}
\caption{Const folding rules for {\tt\$\_AND\_} cells as used in {\tt opt\_const}.}
\label{tab:opt_const_and}
\end{table}

Table~\ref{tab:opt_const_and} shows the replacement rules used for optimizing
an {\tt\$\_AND\_} gate. The first three rules implement the obvious const folding
rules. Note that `any' might include dynamic values calculated by other parts
of the circuit. The following three lines propagate undef (X) states.
These are the only three cases in which it is allowed to propagate an undef
according to Sec.~5.1.10 of IEEE Std. 1364-2005 \cite{Verilog2005}.

The next two lines assume the value 0 for undef states. These two rules are only
used if no other substitutions are possible in the current module. If other substitutions
are possible they are performed first, in the hope that the `any' will change to
an undef value or a 1 and therefore the output can be set to undef.

The last two lines simply replace an {\tt\$\_AND\_} gate that has one constant-1
input with a buffer.
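
The rules in Table~\ref{tab:opt_const_and} can be modelled in a few lines of Python.
This is only an illustrative sketch and not the Yosys implementation: the function
{\tt fold\_and}, the string encoding of values and the {\tt last\_resort} flag (which
stands for ``no other substitution is possible in the current module'') are assumptions
made for this example.

\begin{lstlisting}[frame=single,language=Python]
X = "x"  # stands for an undef (X/Z) constant


def fold_and(a, b, last_resort=False):
    """Return the replacement for AND(a, b), or None if no rule applies.

    a and b are "0", "1", "x", or any other string naming a dynamic signal.
    """
    # Rows 1-3: plain const folding ('any' may be a dynamic signal).
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    # Rows 4-6: both inputs are 1 or undef, and at least one is undef.
    if a in ("1", X) and b in ("1", X):
        return X
    # Rows 7-8: treat undef as 0, but only when nothing else can be done.
    if a == X or b == X:
        return "0" if last_resort else None
    # Rows 9-10: a constant-1 input turns the gate into a buffer.
    if b == "1":
        return a
    if a == "1":
        return b
    return None
\end{lstlisting}

In this model {\tt fold\_and("sig", "x")} yields no replacement until the caller
passes {\tt last\_resort=True}, which mirrors the deferred use of the two undef-as-0 rules.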

Besides this basic const folding the {\tt opt\_const} pass can replace 1-bit wide
{\tt \$eq} and {\tt \$ne} cells with buffers or not-gates if one input is constant.

The {\tt opt\_const} pass is very conservative regarding optimizing {\tt \$mux} cells,
as these cells are often used to model decision-trees and breaking these trees can
interfere with other optimizations.

\subsection{The opt\_muxtree pass}

This pass optimizes trees of multiplexer cells by analyzing the select inputs.
Consider the following simple example:

\begin{lstlisting}[numbers=left,frame=single,language=Verilog]
module uut(a, y);
input a;
output [1:0] y = a ? (a ? 1 : 2) : 3;
endmodule
\end{lstlisting}

The output can never be 2, as this would require \lstinline[language=Verilog];a;
to be 1 for the outer multiplexer and 0 for the inner multiplexer. The {\tt
opt\_muxtree} pass detects this contradiction and replaces the inner multiplexer
with a constant 1, yielding the logic for \lstinline[language=Verilog];y = a ? 1 : 3;.
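
The contradiction check can be sketched as a recursive walk that remembers which
select values are implied by the path taken so far. The tuple encoding of
multiplexers and the function {\tt prune} are hypothetical and chosen for
illustration only; the actual pass operates on multiplexer cells in the netlist.

\begin{lstlisting}[frame=single,language=Python]
def prune(node, known=None):
    """Simplify a mux tree given select values implied by the path so far.

    A node is either a constant leaf or a tuple
    ("mux", select_name, value_if_1, value_if_0).
    """
    known = known or {}
    if not isinstance(node, tuple):
        return node  # leaf: nothing to simplify
    _, sel, if_true, if_false = node
    if sel in known:
        # The select value is already fixed on this path, so only
        # one branch of this multiplexer can ever be active.
        return prune(if_true if known[sel] else if_false, known)
    return ("mux", sel,
            prune(if_true, {**known, sel: 1}),
            prune(if_false, {**known, sel: 0}))
\end{lstlisting}

Applied to the tree {\tt ("mux", "a", ("mux", "a", 1, 2), 3)}, which models the Verilog
example above, this sketch returns {\tt ("mux", "a", 1, 3)}.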
\subsection{The opt\_reduce pass}

\begin{sloppypar}
This is a simple optimization pass that identifies and consolidates identical input
bits to {\tt \$reduce\_and} and {\tt \$reduce\_or} cells. It also sorts the input
bits to ease identification of shareable {\tt \$reduce\_and} and {\tt \$reduce\_or} cells
in other passes.
\end{sloppypar}

This pass also identifies and consolidates identical inputs to multiplexer cells. In this
case the new shared select bit is driven using a {\tt \$reduce\_or} cell that combines
the original select bits.

Lastly this pass consolidates trees of {\tt \$reduce\_and} cells and trees of
{\tt \$reduce\_or} cells to single large {\tt \$reduce\_and} or {\tt \$reduce\_or} cells.

These three simple optimizations are performed in a loop until a stable result is
produced.
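
The first and the last of these optimizations rely on AND and OR reductions being
insensitive to the order and multiplicity of their inputs. The following sketch
illustrates them on a toy representation; the function {\tt consolidate} and the
encoding of a cell as a pair of cell type and input list are assumptions made for
this example and do not reflect the data structures used by Yosys.

\begin{lstlisting}[frame=single,language=Python]
def consolidate(op, inputs):
    """Flatten a tree of same-type reduce cells into one cell.

    inputs may contain signal bit names (strings) and nested cells
    given as (op, inputs) tuples of the same cell type.
    """
    bits = set()  # a set drops duplicate input bits
    for item in inputs:
        if isinstance(item, tuple):
            child_op, child_inputs = item
            if child_op != op:
                raise ValueError("only trees of one cell type are merged")
            # Absorb the inputs of the nested cell into this cell.
            bits.update(consolidate(child_op, child_inputs)[1])
        else:
            bits.add(item)
    # Sorting gives equivalent cells an identical input list.
    return (op, sorted(bits))
\end{lstlisting}

Because the resulting input list is sorted, two cells that reduce the same set of bits
end up with identical inputs, which is what makes them easy to recognise as shareable.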
\subsection{The opt\_rmdff pass}

This pass identifies single-bit d-type flip-flops ({\tt \$\_DFF\_*}, {\tt \$dff}, and {\tt
\$adff} cells) with a constant data input and replaces them with a constant driver.

\subsection{The opt\_clean pass}

This pass identifies unused signals and cells and removes them from the design. It also
creates an \B{unused\_bits} attribute on wires with unused bits. This attribute can be
used for debugging or by other optimization passes.

\subsection{The opt\_share pass}

This pass performs trivial resource sharing. This means that this pass identifies cells
with identical inputs and replaces them with a single instance of the cell.
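
A minimal model of this kind of sharing is sketched below. The function {\tt share},
the dictionary layout and the cell and wire names in the usage note are hypothetical.
The sketch compares only cell type and inputs, ignores cell parameters and makes a
single sweep over the cells.

\begin{lstlisting}[frame=single,language=Python]
def share(cells):
    """Merge cells that have the same type and the same inputs.

    cells maps a cell name to (cell_type, inputs, output_wire).
    Returns the cells that are kept and a map from the output wire
    of each removed cell to the wire that replaces it.
    """
    seen = {}      # (cell_type, inputs) -> output wire of the kept cell
    kept = {}
    replaced = {}
    for name, (cell_type, inputs, output) in cells.items():
        key = (cell_type, tuple(inputs))
        if key in seen:
            # An identical cell exists: reuse its output instead.
            replaced[output] = seen[key]
        else:
            seen[key] = output
            kept[name] = (cell_type, tuple(inputs), output)
    return kept, replaced
\end{lstlisting}

Given two {\tt \$add} cells that both add the wires {\tt a} and {\tt b}, this sketch keeps
the first one and reports that the output wire of the second must be reconnected to the
output of the first.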

The option {\tt -nomux} can be used to disable resource sharing for multiplexer
cells ({\tt \$mux} and {\tt \$pmux}). This can be useful as
it prevents multiplexer trees from being merged, which might prevent {\tt opt\_muxtree}
from identifying possible optimizations.

\section{FSM Extraction and Encoding}

The {\tt fsm} pass performs finite-state-machine (FSM) extraction and recoding. The {\tt fsm}
pass simply executes the following other passes:

\begin{itemize}
\item Identify and extract FSMs:
\begin{itemize}
\item {\tt fsm\_detect}
\item {\tt fsm\_extract}
\end{itemize}

\item Basic optimizations:
\begin{itemize}
\item {\tt fsm\_opt}
\item {\tt opt\_clean}
\item {\tt fsm\_opt}
\end{itemize}

\item Expanding to nearby gate-logic (if called with {\tt -expand}):
\begin{itemize}
\item {\tt fsm\_expand}
\item {\tt opt\_clean}
\item {\tt fsm\_opt}
\end{itemize}

\item Re-code FSM states (unless called with {\tt -norecode}):
\begin{itemize}
\item {\tt fsm\_recode}
\end{itemize}

\item Print information about FSMs:
\begin{itemize}
\item {\tt fsm\_info}
\end{itemize}

\item Export FSMs in KISS2 file format (if called with {\tt -export}):
\begin{itemize}
\item {\tt fsm\_export}
\end{itemize}

\item Map FSMs to RTL cells (unless called with {\tt -nomap}):
\begin{itemize}
\item {\tt fsm\_map}
\end{itemize}
\end{itemize}

The {\tt fsm\_detect} pass identifies FSM state registers and marks them using the
\B{fsm\_encoding}{\tt = "auto"} attribute. The {\tt fsm\_extract} pass extracts all
FSMs marked using the \B{fsm\_encoding} attribute (unless \B{fsm\_encoding} is
set to {\tt "none"}) and replaces the corresponding RTL cells with a {\tt \$fsm}
cell. All other {\tt fsm\_*} passes operate on these {\tt \$fsm} cells. The
{\tt fsm\_map} pass finally replaces the {\tt \$fsm} cells with RTL cells.

Note that these optimizations operate on an RTL netlist. That is, the {\tt fsm} pass
should be executed after the {\tt proc} pass has transformed all
{\tt RTLIL::Process} objects to RTL cells.

The algorithms used for FSM detection and extraction are influenced by a more
general reported technique \cite{fsmextract}.

\subsection{FSM Detection}

The {\tt fsm\_detect} pass identifies FSM state registers. It sets the
\B{fsm\_encoding}{\tt = "auto"} attribute on any (multi-bit) wire that matches
the following description:

\begin{itemize}
\item Does not already have the \B{fsm\_encoding} attribute.
\item Is not an output of the containing module.
\item Is driven by a single {\tt \$dff} or {\tt \$adff} cell.
\item The \B{D}-Input of this {\tt \$dff} or {\tt \$adff} cell is driven by a multiplexer
tree that only has constants or the old state value on its leaves.
\item The state value is only used in the said multiplexer tree or by simple relational
cells that compare the state value to a constant (usually {\tt \$eq} cells).
\end{itemize}

This heuristic has proven to work very well. It is possible to override it by setting
\B{fsm\_encoding}{\tt = "auto"} on registers that should be considered FSM state registers
and setting \B{fsm\_encoding}{\tt = "none"} on registers that match the above criteria
but should not be considered FSM state registers.

\subsection{FSM Extraction}

The {\tt fsm\_extract} pass operates on all state signals marked with the
\B{fsm\_encoding} ({\tt != "none"}) attribute. For each state signal the following
information is determined:

\begin{itemize}
\item The state registers
\item The asynchronous reset state if the state registers use asynchronous reset
\item All states and the control input signals used in the state transition functions
\item The control output signals calculated from the state signals and control inputs
\item A table of all state transitions and corresponding control inputs and outputs
\end{itemize}

The state registers (and asynchronous reset state, if applicable) are simply determined
by identifying the driver for the state signal.

From there the {\tt \$mux}-tree driving the state register inputs is
recursively traversed. All select inputs are control signals and the leaves of the
{\tt \$mux}-tree are the states. The algorithm fails if a non-constant leaf
that is not the state signal itself is found.

The list of control outputs is initialized with the bits from the state signal.
It is then extended by adding all values that are calculated by cells that
compare the state signal with a constant value.

In most cases this will cover all uses of the state register, thus rendering the
state encoding arbitrary. If however a design uses e.g.~a single bit of the state
value to drive a control output directly, this bit of the state signal will be
transformed to a control output of the same value.

Finally, a transition table for the FSM is generated. This is done by using the
{\tt ConstEval} C++ helper class (defined in {\tt kernel/consteval.h}) that can
be used to evaluate parts of the design. The {\tt ConstEval} class can be asked
to calculate a given set of result signals using a set of signal-value
assignments. It can also be passed a list of stop-signals that abort the {\tt
ConstEval} algorithm if the value of a stop-signal is needed in order to
calculate the result signals.

The {\tt fsm\_extract} pass uses the {\tt ConstEval} class in the following way
to create a transition table. For each state:

\begin{enumerate}
\item Create a {\tt ConstEval} object for the module containing the FSM
\item Add all control inputs to the list of stop signals
\item Set the state signal to the current state
\item Try to evaluate the next state and control output \label{enum:fsm_extract_cealg_try}
\item If step~\ref{enum:fsm_extract_cealg_try} was not successful:
\begin{itemize}
\item Recursively go to step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 0.
\item Recursively go to step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 1.
\end{itemize}
\item If step~\ref{enum:fsm_extract_cealg_try} was successful: Emit transition
\end{enumerate}
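
The recursion in the steps above can be sketched in Python as follows. This is a model
of the idea only and does not use the {\tt ConstEval} interface: the names
{\tt NeedSignal}, {\tt StopSignals}, {\tt enumerate\_transitions} and
{\tt example\_logic} are hypothetical, and the stop-signal mechanism is imitated by an
exception that is raised when the logic reads a control input that has no assigned value.

\begin{lstlisting}[frame=single,language=Python]
class NeedSignal(Exception):
    """Raised when evaluation needs an unassigned control input."""
    def __init__(self, name):
        super().__init__(name)
        self.name = name


class StopSignals(dict):
    """Control input assignments; reading a missing one aborts."""
    def __missing__(self, name):
        raise NeedSignal(name)


def enumerate_transitions(state, logic, assigned=None):
    """Return rows (state, control inputs, next state, outputs)."""
    assigned = StopSignals(assigned or {})
    try:
        next_state, outputs = logic(state, assigned)
    except NeedSignal as stop:
        # Evaluation hit a stop signal: retry with it set to 0 and to 1.
        rows = []
        for value in (0, 1):
            rows += enumerate_transitions(
                state, logic, {**assigned, stop.name: value})
        return rows
    # Success: inputs that were never read stay out of the row.
    return [(state, dict(assigned), next_state, outputs)]


def example_logic(state, ctrl):
    """Toy two-state FSM used only to exercise the sketch."""
    if state == "IDLE":
        return ("RUN" if ctrl["start"] else "IDLE"), {"busy": 0}
    return ("IDLE" if ctrl["done"] else "RUN"), {"busy": 1}
\end{lstlisting}

Control inputs that are never read while evaluating a state do not appear in the
emitted row, so in this model they act as don't-care inputs for that transition.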

Finally a {\tt \$fsm} cell is created with the generated transition table and added to the
module. This new cell is connected to the control signals and the old drivers for the
control outputs are disconnected.

\subsection{FSM Optimization}

The {\tt fsm\_opt} pass performs basic optimizations on {\tt \$fsm} cells (not including state
recoding). The following optimizations are performed (in this order):

\begin{itemize}
\item Unused control outputs are removed from the {\tt \$fsm} cell. The attribute \B{unused\_bits}
(that is usually set by the {\tt opt\_clean} pass) is used to determine which control
outputs are unused.
\item Control inputs that are connected to the same driver are merged.
\item When a control input is driven by a control output, the control input is removed and the transition
table altered to give the same behaviour without the external feedback path.
\item Entries in the transition table that yield the same output and only
differ in the value of a single control input bit are merged and the different bit is removed
from the sensitivity list (turned into a don't-care bit).
\item Constant inputs are removed and the transition table is altered to give an unchanged behaviour.
\item Unused inputs are removed.
\end{itemize}
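
The merging of transition table entries described in the list above can be illustrated
with a small sketch. The row layout (current state, control input pattern, next state,
control output pattern), the use of {\tt -} for a don't-care bit and the function names
are assumptions made for this example. ``Same output'' is read here as the same next
state and the same control outputs.

\begin{lstlisting}[frame=single,language=Python]
def try_merge(r1, r2):
    """Merge two rows that differ in exactly one input bit, else None."""
    s1, c1, n1, o1 = r1
    s2, c2, n2, o2 = r2
    if (s1, n1, o1) != (s2, n2, o2) or len(c1) != len(c2):
        return None
    diff = [i for i in range(len(c1)) if c1[i] != c2[i]]
    if len(diff) != 1 or {c1[diff[0]], c2[diff[0]]} != {"0", "1"}:
        return None
    i = diff[0]
    # The differing bit no longer matters: make it a don't-care.
    return (s1, c1[:i] + "-" + c1[i + 1:], n1, o1)


def merge_rows(rows):
    """Merge mergeable pairs of rows until the table is stable."""
    rows = set(rows)
    while True:
        ordered = sorted(rows)
        found = None
        for i, r1 in enumerate(ordered):
            for r2 in ordered[i + 1:]:
                merged = try_merge(r1, r2)
                if merged is not None:
                    found = (r1, r2, merged)
                    break
            if found:
                break
        if found is None:
            return ordered
        r1, r2, merged = found
        rows -= {r1, r2}
        rows.add(merged)
\end{lstlisting}

The sketch merges greedily in sorted order, so it does not necessarily produce the
smallest possible table.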
\subsection{FSM Recoding}

The {\tt fsm\_recode} pass assigns new bit patterns to the states. Usually this
also implies a change in the width of the state signal. At the time of this
writing only one-hot encoding with all-zero for the reset state is supported.

The {\tt fsm\_recode} pass can also write a text file with the changes performed
by it that can be used when verifying designs synthesized by Yosys using Synopsys
Formality \citeweblink{Formality}.

\section{Logic Optimization}

Yosys can perform multi-level combinational logic optimization on gate-level netlists using the
external program ABC \citeweblink{ABC}. The {\tt abc} pass extracts the combinational gate-level
parts of the design, passes them through ABC, and re-integrates the results. The {\tt abc} pass
can also be used to perform other operations using ABC, such as technology mapping (see
Sec.~\ref{sec:techmap_extern} for details).