Updated ABC info

Includes comparison of `abc` v `abc9`. Also creates a new subsection of the
yosys internals for extending yosys (moving the previous extensions.rst into it).

Co-authored-by: Lofty <dan.ravensloft@gmail.com>
Krystine Sherwin 2023-12-13 10:08:45 +13:00
parent e34a25ea27
commit 1733a76273
6 changed files with 196 additions and 45 deletions

@ -9,7 +9,7 @@ yosys-config
The ``yosys-config`` tool (an auto-generated shell-script) can be used to query The ``yosys-config`` tool (an auto-generated shell-script) can be used to query
compiler options and other information needed for building loadable modules for compiler options and other information needed for building loadable modules for
Yosys. See :doc:`/yosys_internals/extensions` for details. Yosys. See :doc:`/yosys_internals/extending_yosys/extensions` for details.
.. literalinclude:: /temp/yosys-config .. literalinclude:: /temp/yosys-config
:start-at: Usage :start-at: Usage

@ -1,39 +1,103 @@
The :cmd:ref:`abc` command The ABC toolbox
~~~~~~~~~~~~~~~~~~~~~~~~~~ ---------------
.. TODO:: discuss abc, consider using https://github.com/Ravenslofty/yosys-cookbook/blob/master/misc/abc9.md .. role:: yoscrypt(code)
The :cmd:ref:`abc` command provides an interface to ABC_, an open source tool
for low-level logic synthesis.
.. _ABC: http://www.eecs.berkeley.edu/~alanmi/abc/
The :cmd:ref:`abc` command processes a netlist of internal gate types and can
perform:
- logic minimization (optimization)
- mapping of logic to standard cell library (liberty format)
- mapping of logic to k-LUTs (for FPGA synthesis)
Optionally :cmd:ref:`abc` can process registers from one clock domain and
perform sequential optimization (such as register balancing).
ABC is also controlled using scripts. An ABC script can be specified to use more
advanced ABC features. It is also possible to write the design with
:cmd:ref:`write_blif` and load the output file into ABC outside of Yosys.
Example
^^^^^^^
.. todo:: describe ``abc`` images
.. literalinclude:: /code_examples/synth_flow/abc_01.v
:language: verilog
:caption: ``docs/source/code_examples/synth_flow/abc_01.v``
.. literalinclude:: /code_examples/synth_flow/abc_01.ys
:language: yoscrypt :language: yoscrypt
:caption: ``docs/source/code_examples/synth_flow/abc_01.ys``
.. figure:: /_images/code_examples/synth_flow/abc_01.* ABC_, from the University of California, Berkeley, is a logic toolbox used for
:class: width-helper fine-grained optimisation and LUT mapping.
Yosys has two different commands, which both use this logic toolbox, but use it
in different ways.
The :cmd:ref:`abc` pass can be used for both ASIC (e.g. :yoscrypt:`abc
-liberty`) and FPGA (:yoscrypt:`abc -lut`) mapping, but this page will focus on
FPGA mapping.
The :cmd:ref:`abc9` pass generally provides superior mapping quality due to
being aware of combinational boxes and DFF and LUT timings, giving it a more
global view of the mapping problem.
.. _ABC: https://github.com/berkeley-abc/abc
ABC: the unit delay model, simple and efficient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :cmd:ref:`abc` pass uses a highly simplified view of an FPGA:
- An FPGA is made up of a network of inputs that connect through LUTs to a
network of outputs. These inputs may actually be I/O pins, D flip-flops,
memory blocks or DSPs, but ABC is unaware of this.
- Each LUT has 1 unit of delay between an input and its output, and this applies
for all inputs of a LUT, and for all sizes of LUT up to the maximum LUT size
allowed; e.g. the delay between the input of a LUT2 and its output is the same
as the delay between the input of a LUT6 and its output.
- A LUT may take up a variable number of area units. This is constant for each
size of LUT; e.g. a LUT4 may take up 1 unit of area while a LUT5 takes up 2
units of area, and this holds for every LUT4 and every LUT5.
This is known as the "unit delay model", because each LUT uses one unit of
delay.
From this view, the problem ABC has to solve is finding a mapping of the network
to LUTs that has the lowest delay, and then optimising the mapping for size
while maintaining this delay.
This approach has advantages:
- It is simple and easy to implement.
- Unit delays are fast to manipulate.
- It reflects *some* FPGA families; for example, the iCE40HX/LP fits the
assumptions of the unit delay model quite well (almost all of its blocks are
synchronous, except for adders).
But this approach has drawbacks, too:
- The network of inputs and outputs with only LUTs means that a lot of
combinational cells (multipliers and LUTRAM) are invisible to the unit delay
model, meaning the critical path it optimises for is not necessarily the
actual critical path.
- LUTs are implemented as multiplexer trees, so there is a delay caused by the
result propagating through the remaining multiplexers. This means the
assumption of equal delay for every input does not hold in physical hardware,
and the error is proportionally larger for larger LUTs.
- Even synchronous blocks have arrival times (propagation delay between the
clock edge and the output changing) and setup times (requirement for the input
to be stable before the clock edge) which affect the delay of a path.
ABC9: the generalised delay model, realistic and flexible
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ABC9 uses a more detailed and accurate model of an FPGA:
- An FPGA is made up of a network of inputs that connect through LUTs and
combinational boxes to a network of outputs. These boxes have specified delays
between inputs and outputs, and may have an associated network ("white boxes")
or not ("black boxes"), but must be treated as a whole.
- Each LUT has a specified delay between an input and its output in arbitrary
delay units. This delay may differ between the inputs of a LUT and between
sizes of LUT, but is the same for every LUT of a given size; e.g. the delay
between input A and output is different between a LUT2 and a LUT6, but is
constant for all LUT6s.
- A LUT may take up a variable number of area units. This is constant for each
size of LUT; e.g. a LUT4 may take up 1 unit of area while a LUT5 takes up 2
units of area, and this holds for every LUT4 and every LUT5.
This is known as the "generalised delay model", because it has been generalised
to arbitrary delay units. ABC9 doesn't actually care what units you use here,
but the Yosys convention is picoseconds. Note the introduction of boxes as a
concept. While the generalised delay model does not require boxes, they
naturally fit into it to represent combinational delays. Even synchronous delays
like arrival and setup can be emulated with combinational boxes that act as a
delay. This is further extended to white boxes, where the mapper is able to see
inside a box, and remove orphan boxes with no outputs, such as adders.
Again, ABC9 finds a mapping of the network to LUTs that has the lowest delay,
and then optimises that mapping for the lowest area while maintaining the
delay, but it has a lot more information to work with about the network.
The result here is that ABC9 can remove boxes (like adders) to reduce area,
optimise better around those boxes, and also permute inputs to give the critical
path the fastest inputs.
.. todo:: more about logic minimization & register balancing et al with ABC

@ -0,0 +1,76 @@
Setting up a flow for ABC9
--------------------------
Much of the configuration comes from attributes and ``specify`` blocks in
Verilog simulation models.
``specify`` syntax
~~~~~~~~~~~~~~~~~~
Since ``specify`` is a relatively obscure part of the Verilog standard, a quick
guide to the syntax:
.. code-block:: verilog
specify // begins a specify block
(A => B) = 123; // simple combinational path from A to B with a delay of 123.
(A *> B) = 123; // simple combinational path from A to all bits of B with a delay of 123 for all.
if (FOO) (A => B) = 123; // paths may apply under specific conditions.
(posedge CLK => (Q : D)) = 123; // combinational path triggered on the positive edge of CLK; used for clock-to-Q arrival paths.
$setup(A, posedge CLK, 123); // setup constraint for an input relative to a clock.
endspecify // ends a specify block
By convention, all delays in ``specify`` blocks are in integer picoseconds.
Files containing ``specify`` blocks should be read with the ``-specify`` option
to ``read_verilog`` so that they aren't skipped.
LUTs
^^^^
LUTs need to be annotated with an ``(* abc9_lut=N *)`` attribute, where ``N`` is
the relative area of that LUT model. For example, if an architecture can combine
LUTs to produce larger LUTs, then the combined LUTs would have increasingly
larger ``N``. Conversely, if an architecture can split larger LUTs into smaller
LUTs, then the smaller LUTs would have smaller ``N``.
LUTs are generally specified with simple combinational paths from the LUT inputs
to the LUT output.
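As a minimal sketch of the annotation described above, a LUT simulation model
might look like the following. The module name ``MY_LUT4``, its port names, and
every delay value are hypothetical and only illustrate the shape of such a
model; real values come from the target architecture's timing data.

.. code-block:: verilog

    // Hypothetical 4-input LUT model; names and delays are illustrative only.
    (* abc9_lut=1 *)                  // relative area of this LUT model
    module MY_LUT4 (input A, B, C, D, output F);
        parameter [15:0] INIT = 16'h0000;

        // Copy the truth table into a wire so the inputs can index it.
        wire [15:0] lut_bits = INIT;
        assign F = lut_bits[{D, C, B, A}];

        specify
            // One simple combinational path per input, in picoseconds.
            (A => F) = 450;
            (B => F) = 400;
            (C => F) = 350;
            (D => F) = 300;
        endspecify
    endmodule

A larger LUT built by combining two of these would carry a larger value, for
example ``(* abc9_lut=2 *)``.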
DFFs
^^^^
DFFs should be annotated with an ``(* abc9_flop *)`` attribute. However, ABC9
has some specific requirements for this to be valid:
- the DFF must initialise to zero (consider using ``dfflegalize`` to ensure
this).
- the DFF cannot have any asynchronous resets/sets (see the simplification idiom
and the Boxes section for what to do here).
It is worth noting that in pure ``abc9`` mode, only the setup and arrival times
are passed to ABC9 (specifically, they are modelled as buffers with the given
delay). In ``abc9 -dff``, the flop itself is passed to ABC9, permitting
sequential optimisations.
Some vendors have universal DFF models which include async sets/resets even when
they're unused. Therefore *the simplification idiom* exists to handle this: by
using a ``techmap`` file to discover flops which have a constant driver to those
asynchronous controls, they can be mapped into an intermediate, simplified flop
which qualifies as an ``(* abc9_flop *)``, run through :cmd:ref:`abc9`, and then
mapped back to the original flop. This is used in :cmd:ref:`synth_intel_alm` and
:cmd:ref:`synth_quicklogic` for the PolarPro3.
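The first half of the simplification idiom can be sketched as a ``techmap``
rule along the following lines. The cell names ``VENDOR_DFF`` and
``SIMPLE_DFF`` and the port ``ARST`` are hypothetical. The sketch assumes an
active-high asynchronous reset and relies on the ``_TECHMAP_CONSTMSK_<port>_``
and ``_TECHMAP_CONSTVAL_<port>_`` parameters and the ``_TECHMAP_FAIL_`` wire
provided by ``techmap``.

.. code-block:: verilog

    // Hypothetical techmap rule: VENDOR_DFF has an asynchronous reset ARST.
    module VENDOR_DFF (input CLK, D, ARST, output Q);
        // Set by techmap: which bits of ARST are constant, and their values.
        parameter _TECHMAP_CONSTMSK_ARST_ = 1'b0;
        parameter _TECHMAP_CONSTVAL_ARST_ = 1'b0;

        // If ARST is tied low the reset is unused, so substitute the
        // simplified flop; otherwise decline to map and keep the cell as-is.
        if (_TECHMAP_CONSTMSK_ARST_ == 1'b1 && _TECHMAP_CONSTVAL_ARST_ == 1'b0)
            SIMPLE_DFF _TECHMAP_REPLACE_ (.CLK(CLK), .D(D), .Q(Q));
        else
            wire _TECHMAP_FAIL_ = 1'b1;
    endmodule

A second rule, applied after :cmd:ref:`abc9`, would map ``SIMPLE_DFF`` back to
``VENDOR_DFF`` with ``ARST`` tied to zero.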
DFFs are usually specified to have setup constraints against the clock on the
input signals, and an arrival time for the Q output.
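A flop model meeting these requirements might be sketched as follows. The
module name ``MY_DFF``, its ports, and both timing values are hypothetical.

.. code-block:: verilog

    // Hypothetical D flip-flop model; names and delays are illustrative only.
    (* abc9_flop *)
    module MY_DFF (input CLK, D, output reg Q);
        initial Q = 1'b0;             // the flop must initialise to zero

        // No asynchronous set or reset, as ABC9 requires.
        always @(posedge CLK)
            Q <= D;

        specify
            // Setup constraint on D relative to the clock, in picoseconds.
            $setup(D, posedge CLK, 100);
            // Clock-to-Q arrival time, in picoseconds.
            (posedge CLK => (Q : D)) = 500;
        endspecify
    endmodule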
Boxes
^^^^^
A "box" is a purely-combinational piece of hard logic. If the logic is exposed
to ABC9, it's a "whitebox", otherwise it's a "blackbox". Carry chains would be
best implemented as whiteboxes, but a DSP would be best implemented as a
blackbox (multipliers are too complex to easily work with). LUT RAMs can be
implemented as whiteboxes too.
Boxes are arguably the biggest advantage that ABC9 has over ABC: by being aware
of carry chains and DSPs, it avoids optimising for a path that isn't the actual
critical path, while the generally-longer paths result in ABC9 being able to
reduce design area by mapping other logic to larger-but-slower cells.
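As an illustration, a single-bit carry cell exposed as a whitebox might be
sketched as below. The module name ``MY_CARRY`` and the delays are
hypothetical. The ``abc9_box`` and ``lib_whitebox`` attributes are not named
in the text above; they are an assumption here, based on how in-tree Yosys
cell libraries mark such cells, and should be checked against the
:cmd:ref:`abc9` help text.

.. code-block:: verilog

    // Hypothetical carry cell; attribute usage is an assumption (see above).
    (* abc9_box, lib_whitebox *)
    module MY_CARRY (input A, B, CI, output CO);
        // The logic is visible to the mapper, making this a whitebox.
        assign CO = (A & B) | (CI & (A | B));

        specify
            // The carry input is typically the fastest path through the cell.
            (A => CO) = 300;
            (B => CO) = 300;
            (CI => CO) = 100;
        endspecify
    endmodule

A blackbox would keep the ``specify`` delays but hide the logic from the
mapper.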

@ -13,11 +13,11 @@ The guidelines directory contains notes on various aspects of Yosys development.
The files GettingStarted and CodingStyle may be of particular interest, and are The files GettingStarted and CodingStyle may be of particular interest, and are
reproduced here. reproduced here.
.. literalinclude:: ../temp/GettingStarted .. literalinclude:: /temp/GettingStarted
:language: none :language: none
:caption: guidelines/GettingStarted :caption: guidelines/GettingStarted
.. literalinclude:: ../temp/CodingStyle .. literalinclude:: /temp/CodingStyle
:language: none :language: none
:caption: guidelines/CodingStyle :caption: guidelines/CodingStyle
@ -87,7 +87,7 @@ Creating modules from scratch
Let's create the following module using the RTLIL API: Let's create the following module using the RTLIL API:
.. literalinclude:: ../../resources/PRESENTATION_Prog/absval_ref.v .. literalinclude:: ../../../resources/PRESENTATION_Prog/absval_ref.v
:language: Verilog :language: Verilog
:caption: docs/resources/PRESENTATION_Prog/absval_ref.v :caption: docs/resources/PRESENTATION_Prog/absval_ref.v

@ -0,0 +1,11 @@
Extending Yosys
---------------
.. todo:: brief overview for the extending Yosys index
.. toctree::
:maxdepth: 3
extensions
abc_flow

@ -32,9 +32,9 @@ chapter, even though the chapter only explains the conceptual idea behind it and
can be used as reference to implement a similar system in any language. can be used as reference to implement a similar system in any language.
.. toctree:: .. toctree::
:maxdepth: 3 :maxdepth: 3
flow/index flow/index
formats/index formats/index
techmap extending_yosys/index
extensions techmap