Merge remote-tracking branch 'lnis_origin/dev' into ganesh_dev

2020-04-05 20:59:10 -06:00 · 2020-04-05 20:59:10 -06:00 · 77f7e13ba7
parent d1d3446568 1a3a748dd2
commit 77f7e13ba7
218 changed files with 21309 additions and 3702 deletions
--- a/.travis.yml
+++ b/.travis.yml
@ -29,7 +29,6 @@ matrix:
        apt:
          sources:
          - ubuntu-toolchain-r-test # For newer GCC
-          - george-edison55-precise-backports # For cmake
          - llvm_toolchain-trusty-7
          packages:
          - autoconf
--- a/README.md
+++ b/README.md
@ -6,7 +6,7 @@
 The OpenFPGA framework is the **first open-source FPGA IP generator** supporting highly-customizable homogeneous FPGA architectures. OpenFPGA provides a full set of EDA support for customized FPGAs, including Verilog-to-bitstream generation and self-testing verification [testbenches/scripts](./testbenches/scripts). OpenFPGA opens the door to democratizing FPGA technology and EDA techniques, with agile prototyping approaches and constantly evolving EDA tools for chip designers and researchers.

 ## Compilation
-Dependencies and help using docker can be found at [**./tutorials/building.md**](./tutorials/building.md).
+Dependencies and help using docker can be found [**here**](./docs/source/tutorials/building.rst).

 **Compilation Steps:**
 ```bash
--- a/docs/Makefile
+++ b/docs/Makefile
@ -3,7 +3,7 @@

 # You can set these variables from the command line.
 SPHINXOPTS    =
-SPHINXBUILD   = sphinx-build
+SPHINXBUILD   = sphinx-build-3.6
 SOURCEDIR     = source
 BUILDDIR      = build

--- a/docs/source/arch_lang/addon_vpr_syntax.rst
+++ b/docs/source/arch_lang/addon_vpr_syntax.rst
@ -0,0 +1,100 @@
+.. _addon_vpr_syntax:
+
+Additional Syntax to Original VPR XML
+-------------------------------------
+
+.. warning:: Note this is only applicable to VPR8!
+
+Models, Complex blocks and Physical Tiles
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  
+Each ``<pb_type>`` should contain a ``<mode>`` that describes the physical implementation of the ``<pb_type>``. Note that this is fully compatible with the VPR architecture XML syntax.
+  
+``<model>`` should include the models that describe the primitive ``<pb_type>`` in physical mode.
+  
+.. note:: Currently, OpenFPGA only supports 1 ``<equivalent_sites>`` to be defined under each ``<tile>``
+
+Layout
+~~~~~~
+
+``<layout>`` may include additional syntax to enable tileable routing resource graph generation.
+
+.. option:: tileable="<bool>"
+
+  Turn ``on``/``off`` tileable routing resource graph generator.
+  
+  Tileable routing architecture can minimize the number of unique modules in FPGA fabric to be physically implemented.
+
+  Technical details can be found in :cite:`XTang_FPT_2019`. 
+
+  .. note:: We strongly recommend enabling the tileable routing architecture when you place and route (PnR) large FPGA fabrics, as it can effectively reduce the runtime.
+
+.. option:: through_channel="<bool>"
+  
+  Allow routing channels to pass through multi-width and multi-height programmable blocks. This is mainly used in heterogeneous FPGAs to increase routability, as illustrated in :numref:`fig_thru_channel`.
+  By default, it is ``off``.
+
+  .. _fig_thru_channel:
+  
+  .. figure:: ./figures/thru_channel.png
+     :scale: 80%
+     :alt: Impact of through channel
+  
+     Impact on the routing architecture when through channels in multi-width and multi-height programmable blocks are: (a) disabled; (b) enabled.
+
+  .. warning:: Do NOT enable ``through_channel`` if you are not using the tileable routing resource graph generator!
+  
+  .. warning:: Currently ``through_channel`` supports only a fixed routing channel width!
+
+
+A quick example in which tileable routing is enabled and through channels are disabled:
+
+.. code-block:: xml
+
+  <layout tileable="true" through_channel="false">
+  </layout>
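+
+Conversely, a sketch of a layout that enables through channels. Following the warnings above, it also enables tileable routing and assumes that a fixed routing channel width is used:
+
+.. code-block:: xml
+
+  <layout tileable="true" through_channel="true">
+  </layout>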
+
+Switch Block
+~~~~~~~~~~~~
+
+``<switch_block>`` may include additional syntax to enable different connectivity for pass tracks.
+
+.. option:: sub_type="<string>"
+  
+  Connecting type for pass tracks in each switch block.
+  The supported connecting patterns are ``subset``, ``universal`` and ``wilton``, the same as those supported by VPR.
+  If not specified, the pass tracks will use the same connecting pattern as the start/end tracks, which is defined in ``type``.
+
+.. option:: sub_fs="<int>"
+
+  Connectivity parameter for pass tracks in each switch block. Must be a multiple of 3.
+  If not specified, the pass tracks will use the same connectivity as the start/end tracks, which is defined in ``fs``.
+
+A quick example which defines a switch block
+  - Starting/ending routing tracks are connected in the ``wilton`` pattern
+  - Each starting/ending routing track can drive 3 other starting/ending routing tracks
+  - Passing routing tracks are connected in the ``subset`` pattern
+  - Each passing routing track can drive 6 other starting/ending routing tracks
+
+.. code-block:: xml
+
+  <device>
+    <switch_block type="wilton" fs="3" sub_type="subset" sub_fs="6"/>
+  </device>
+
+Routing Segments
+~~~~~~~~~~~~~~~~
+
+OpenFPGA recommends that users give an explicit name to each routing segment in ``<segmentlist>``.
+This name is used to link a ``circuit_model`` to the routing segment.
+
+A quick example which defines a length-4 uni-directional routing segment called ``L4`` :
+
+.. code-block:: xml
+
+  <segmentlist>
+    <segment name="L4" freq="1" length="4" type="undir"/>
+  </segmentlist>
+
+.. note:: Currently, OpenFPGA only supports uni-directional routing architectures
+
--- a/docs/source/arch_lang/annotate_vpr_arch.rst
+++ b/docs/source/arch_lang/annotate_vpr_arch.rst
@ -0,0 +1,227 @@
+.. _annotate_vpr_arch:
+
+Bind circuit modules to VPR architecture 
+----------------------------------------
+Each defined circuit model should be linked to an FPGA module defined in the original part of the architecture description. This helps FPGA-circuit create the circuit netlists for logic and routing blocks. Since the original part lacks such support, we add a few XML properties that link to circuit models.
+
+Configuration Protocol
+~~~~~~~~~~~~~~~~~~~~~~
+
+Configuration protocol is the circuitry designed to program an FPGA.
+As an interface, the configuration protocol can differ widely between FPGAs, depending on the application context.
+
+Template
+````````
+
+.. code-block:: xml
+
+  <configuration_protocol>
+    <organization type="<string>" circuit_model_name="<string>"/>
+  </configuration_protocol>
+
+.. option:: type="scan_chain|memory_bank|standalone"
+
+  Specify the type of configuration circuits.
+
+  OpenFPGA supports different types of configuration protocols to program FPGA fabrics:
+    - ``scan_chain``: configurable memories are connected in a chain. The bitstream is loaded serially to program an FPGA
+    - ``memory_bank``: configurable memories are organized in an array, where each element can be accessed by a unique address through the BL/WL decoders
+    - ``standalone``: configurable memories are directly accessed through ports of the FPGA fabric. In other words, there is no protocol to control the memories. This allows hardware engineers to fully customize the configuration protocol.
+
+  .. note:: Avoid using ``standalone`` when designing an FPGA chip. It requires a huge number of I/Os, far beyond any package size. It is well suited to eFPGAs, where designers do need customized protocols between the FPGA and processors.
+
+.. warning:: Currently FPGA-SPICE only supports standalone memory organization.
+
+.. warning:: Currently RRAM-based FPGA only supports memory-bank organization for Verilog Generator.
+
+.. option:: circuit_model_name="<string>"
+
+  Specify the name of the circuit model to be used as configurable memory.
+
+  - ``scan_chain`` requires a circuit model type of ``ccff``
+  - ``memory_bank`` requires a circuit model type of ``sram``
+  - ``standalone`` requires a circuit model type of ``sram``
+
+Configuration Chain Example
+```````````````````````````
+The following XML code describes a scan-chain circuitry to configure the core logic of an FPGA, as illustrated in :numref:`fig_ccff_fpga`.
+It will use the circuit model defined in :ref:`circuit_model_examples`.
+
+.. code-block:: xml
+
+  <configuration_protocol>
+    <organization type="scan_chain" circuit_model_name="ccff"/>
+  </configuration_protocol>
+
+.. _fig_ccff_fpga:
+
+.. figure:: figures/ccff_fpga.png
+   :scale: 60%
+   :alt: Configuration chain programming the core logic of an FPGA
+ 
+   Example of a configuration chain to program the core logic of an FPGA
+
+Memory bank Example
+```````````````````
+The following XML code describes a memory-bank circuitry to configure the core logic of an FPGA, as illustrated in :numref:`fig_sram`.
+It will use the circuit model defined in :ref:`circuit_model_examples`.
+
+.. code-block:: xml
+
+  <configuration_protocol>
+    <organization type="memory_bank" circuit_model_name="sram"/>
+  </configuration_protocol>
+
+.. _fig_sram:
+
+.. figure:: figures/sram.png
+   :scale: 60%
+   :alt: Memory organization using memory decoders
+ 
+   Example of a memory organization using memory decoders 
+
+Standalone SRAM Example
+```````````````````````
+
+.. warning:: TO BE CONSTRUCTED
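+
+Until this section is completed, a minimal sketch can be derived from the template above. It assumes that a circuit model of type ``sram`` is defined in the circuit library; the name ``sram`` is illustrative.
+
+.. code-block:: xml
+
+  <configuration_protocol>
+    <organization type="standalone" circuit_model_name="sram"/>
+  </configuration_protocol>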
+
+Switch Blocks
+~~~~~~~~~~~~~
+
+The original VPR architecture description contains an XML node called ``switchlist``, under which all the multiplexers of switch blocks are described.
+To link a defined circuit model to a multiplexer in the switch blocks, a new XML property ``circuit_model_name`` should be added to the descriptions.
+
+Here is an example:
+
+.. code-block:: xml
+
+  <switch_block>
+    <switch type="mux" name="<string>" circuit_model_name="<string>"/>
+  </switch_block>
+
+- ``circuit_model_name="<string>"`` should match a circuit model whose type is ``mux`` defined in :ref:`circuit_library`.
+
+
+Connection Blocks
+~~~~~~~~~~~~~~~~~
+
+To link the defined circuit model of the multiplexer to the Connection Blocks, a ``circuit_model_name`` should be annotated to the definition of Connection Blocks switches.  
+
+Here is an example:
+
+.. code-block:: xml
+
+  <connection_block>
+    <switch type="ipin_cblock" name="<string>" circuit_model_name="<string>"/>
+  </connection_block>
+
+- ``circuit_model_name="<string>"`` should match a circuit model whose type is ``mux`` defined in :ref:`circuit_library`.
+
+Channel Wire Segments
+~~~~~~~~~~~~~~~~~~~~~
+
+Similar to the Switch Blocks and Connection Blocks, the channel wire segments in the original architecture descriptions can be adapted to provide a link to the defined circuit model.
+
+.. code-block:: xml
+
+  <segmentlist>
+    <segment name="<string>" circuit_model_name="<string>"/>
+  </segmentlist>
+
+- ``circuit_model_name="<string>"`` should match a circuit model whose type is ``chan_wire`` defined in :ref:`circuit_library`.
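+
+Putting the three routing bindings above together, a sketch could look as follows. The switch names ``sb_mux`` and ``cb_mux`` and the circuit model names ``mux_2level`` and ``chan_segment`` are illustrative assumptions. The switch and segment names must match those defined in the VPR architecture, and the circuit model names must match those defined in :ref:`circuit_library`.
+
+.. code-block:: xml
+
+  <switch_block>
+    <switch type="mux" name="sb_mux" circuit_model_name="mux_2level"/>
+  </switch_block>
+  <connection_block>
+    <switch type="ipin_cblock" name="cb_mux" circuit_model_name="mux_2level"/>
+  </connection_block>
+  <segmentlist>
+    <segment name="L4" circuit_model_name="chan_segment"/>
+  </segmentlist>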
+
+Primitive Blocks inside Multi-mode Configurable Logic Blocks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The architecture description employs a hierarchy of ``pb_types`` to depict the sub-modules and complex interconnections inside logic blocks. Each leaf node and interconnection in the pb_type hierarchy should be linked to a circuit model.
+Each primitive block, i.e., the leaf ``pb_types``, should be linked to a valid circuit model, using the XML syntax ``circuit_model_name``.
+The ``circuit_model_name`` should match the given name of a ``circuit_model`` defined by users.
+
+.. code-block:: xml
+
+  <pb_type_annotations>
+    <!-- physical pb_type binding in complex block IO -->
+    <pb_type name="io" physical_mode_name="physical"/>
+    <pb_type name="io[physical].iopad" circuit_model_name="iopad" mode_bits="1"/> 
+    <pb_type name="io[inpad].inpad" physical_pb_type_name="io[physical].iopad" mode_bits="1"/> 
+    <pb_type name="io[outpad].outpad" physical_pb_type_name="io[physical].iopad" mode_bits="0"/> 
+    <!-- End physical pb_type binding in complex block IO -->
+
+    <!-- physical pb_type binding in complex block CLB -->
+    <!-- physical mode will be the default mode if not specified -->
+    <pb_type name="clb">
+      <!-- Binding interconnect to circuit models as their physical implementation, if not defined, we use the default model -->
+      <interconnect name="crossbar" circuit_model_name="mux_2level"/>
+    </pb_type>
+    <pb_type name="clb.fle" physical_mode_name="physical"/>
+    <pb_type name="clb.fle[physical].fabric.frac_logic.frac_lut6" circuit_model_name="frac_lut6" mode_bits="0"/>
+    <pb_type name="clb.fle[physical].fabric.ff" circuit_model_name="static_dff"/>
+    <!-- Binding operating pb_type to physical pb_type -->
+    <pb_type name="clb.fle[n2_lut5].lut5inter.ble5.lut5" physical_pb_type_name="clb.fle[physical].fabric.frac_logic.frac_lut6" mode_bits="1" physical_pb_type_index_factor="0.5">
+      <!-- Binding the lut5 to the first 5 inputs of fracturable lut6 -->
+      <port name="in" physical_mode_port="in[0:4]"/>
+      <port name="out" physical_mode_port="lut5_out" physical_mode_pin_rotate_offset="1"/>
+    </pb_type>
+    <pb_type name="clb.fle[n2_lut5].lut5inter.ble5.ff" physical_pb_type_name="clb.fle[physical].fabric.ff"/>
+    <pb_type name="clb.fle[n1_lut6].ble6.lut6" physical_pb_type_name="clb.fle[physical].fabric.frac_logic.frac_lut6" mode_bits="0">
+      <!-- Binding the lut6 to the first 6 inputs of fracturable lut6 -->
+      <port name="in" physical_mode_port="in[0:5]"/>
+      <port name="out" physical_mode_port="lut6_out"/>
+    </pb_type>
+    <pb_type name="clb.fle[n1_lut6].ble6.ff" physical_pb_type_name="clb.fle[physical].fabric.ff" physical_pb_type_index_factor="2" physical_pb_type_index_offset="0"/>
+    <!-- End physical pb_type binding in complex block CLB -->
+  </pb_type_annotations>
+  
+.. option:: <pb_type name="<string>" physical_mode_name="<string>">
+
+  Specify a physical mode for multi-mode ``pb_type`` defined in VPR architecture.
+
+  .. note:: This should be applied to a non-primitive ``pb_type``, i.e., a ``pb_type`` that has child ``pb_type``.
+
+  - ``name="<string>"`` Specify the full name of a ``pb_type`` in the hierarchy of the VPR architecture.
+
+  - ``physical_mode_name="<string>"`` Specify the name of the mode that describes the physical implementation of the configurable block. This is critical in modeling actual circuit designs and architecture of an FPGA. Typically, only one ``physical_mode`` should be specified for each multi-mode ``pb_type``.
+
+.. note:: OpenFPGA will infer the physical mode for a single-mode ``pb_type`` defined in VPR architecture
+
+.. option:: <pb_type name="<string>" physical_pb_type_name="<string>" circuit_model_name="<string>" 
+  mode_bits="<int>" physical_pb_type_index_factor="<float>" physical_pb_type_index_offset="<int>">
+
+  Specify the physical implementation for a primitive ``pb_type`` in VPR architecture
+
+  .. note:: This should be applied to a primitive ``pb_type``, i.e., a ``pb_type`` that has no children.
+
+  - ``name="<string>"`` Specify the full name of a ``pb_type`` in the hierarchy of the VPR architecture.
+
+  - ``physical_pb_type_name="<string>"`` Create the link between a ``pb_type`` in an operating mode and a ``pb_type`` in the physical mode. This syntax is mandatory for every primitive ``pb_type`` in an operating mode ``pb_type``. It should be a valid name of a primitive ``pb_type`` in the physical mode.
+
+  - ``circuit_model_name="<string>"`` Specify a circuit model to implement a ``pb_type`` in the VPR architecture. The ``circuit_model_name`` is mandatory for every primitive ``pb_type`` in a physical mode ``pb_type``.
+
+  - ``mode_bits="<int>"`` Specify the configuration bits for the ``circuit_model`` when operating in an operating mode. The length of ``mode_bits`` should match the ``port`` size defined in the ``circuit_model``. The ``mode_bits`` should be derived from the circuit design, and users are responsible for their correctness. FPGA-Bitstream will add the ``mode_bits`` during bitstream generation.
+
+  - ``physical_pb_type_index_factor="<float>"`` aims to align the indices of ``pb_type`` between operating and physical modes, especially when an operating mode contains multiple ``pb_type`` (``num_pb``>1) that are linked to the same physical ``pb_type``. When ``physical_pb_type_index_factor`` is specified, the index of the ``pb_type`` will be multiplied by the given factor.
+
+  - ``physical_pb_type_index_offset="<int>"`` aims to align the indices of ``pb_type`` between operating and physical modes, especially when an operating mode contains multiple ``pb_type`` (``num_pb``>1) that are linked to the same physical ``pb_type``. When ``physical_pb_type_index_offset`` is specified, the index of the ``pb_type`` will be shifted by the given offset.
+
+.. option:: <interconnect name="<string>" circuit_model_name="<string>">
+
+  - ``name="<string>"`` Specify the name of an ``interconnect`` in the VPR architecture. Unlike ``pb_type``, a hierarchical name is not required here.
+
+  - ``circuit_model_name="<string>"`` For the interconnection type ``direct``, the type of the linked circuit model should be ``wire``. For multiplexers, the type of the linked circuit model should be ``mux``. For ``complete``, the type of the linked circuit model can be either ``mux`` or ``wire``, depending on the case.
+
+.. option:: <port name="<string>" physical_mode_port="<string>" physical_mode_pin_rotate_offset="<int>"/>
+
+  Link a port of an operating ``pb_type`` to a port of a physical ``pb_type``
+
+  - ``name="<string>"`` Specify the name of a ``port`` in the VPR architecture. Unlike ``pb_type``, a hierarchical name is not required here.
+
+  - ``physical_mode_port="<string>"`` Create the link between a ``port`` of a ``pb_type`` in an operating mode and a ``port`` in the physical mode. This syntax is mandatory for every primitive ``pb_type`` in an operating mode ``pb_type``. It should be a valid ``port`` name of a leaf ``pb_type`` in the physical mode, and the port size should also match.
+
+  - ``physical_mode_pin_rotate_offset="<int>"`` aims to align the pin indices of a ``port`` of ``pb_type`` between operating and physical modes, especially when an operating mode contains multiple ``pb_type`` (``num_pb``>1) that are linked to the same physical ``pb_type``. When ``physical_mode_pin_rotate_offset`` is larger than zero, the pin index of a ``pb_type`` (whose index is larger than 1) will be shifted by the given offset.
+
+.. note::
+  It is highly recommended that only one physical mode is defined for a multi-mode configurable block. Try not to use nested physical mode definitions. This eases debugging and leads to a cleaner XML description.
+
+.. note::
+  Be careful when using ``physical_pb_type_index_factor``, ``physical_pb_type_index_offset`` and ``physical_mode_pin_rotate_offset``! Try to avoid them except for highly complex configurable blocks with a very deep hierarchy.
+
+
--- a/docs/source/arch_lang/circuit_library.rst
+++ b/docs/source/arch_lang/circuit_library.rst
@ -0,0 +1,298 @@
+.. _circuit_library:
+
+Circuit Library
+---------------
+
+For OpenFPGA using VPR7
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To support FPGA Verilog/SPICE, Verily and Bitstream Generator, physical modules containing gate-level and transistor-level features are required for FPGA primitive blocks.
+The physical modules are defined in XML syntax, similar to the original VPR FPGA architecture description language.
+
+For each module that appears in the FPGA architecture, a circuit model should be defined. In the definition of a circuit model, the user can specify if the Verilog/SPICE netlist of the module is either auto-generated or user-defined.
+
+Circuit Model Attributes
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: xml
+
+  <module_circuit_models>
+    <circuit_model type="string" name="string" prefix="string" is_default="int" 
+    circuit_netlist="string" verilog_netlist="string" dump_structural_verilog="string">
+      <transistor-level circuit_design_features="developped_further" />
+    </circuit_model>
+  </module_circuit_models>
+
+* **module_circuit_models**: the parent node for all the circuit models. All the circuit models should be defined under this XML node.
+
+    * **circuit_model**: the child node defining transistor-level modeling parameters.
+
+        * **type**: can be [ ``inv_buf`` | ``pass_gate`` | ``gate`` | ``mux`` | ``wire`` | ``chan_wire`` | ``sram`` | ``lut`` | ``ff`` | ``scff`` | ``hard_logic`` | ``iopad`` ]. Specify the type of circuit model. The provided types cover all the modules in FPGAs. For the circuit models in the type of mux/wire/chan_wire/lut, FPGA-Verilog/SPICE can auto-generate Verilog/SPICE netlists. For the rest, FPGA-Verilog/SPICE requires a user-defined Verilog/SPICE netlist.
+
+        * **name**: define the name of this circuit model. The name should be unique and will be used to create the sub-circuit of the circuit model in Verilog/SPICE netlists. Note that for a customized Verilog/SPICE netlist, the name defined here should be the name of the top-level sub-circuit in the customized Verilog/SPICE netlist. FPGA-Verilog/SPICE will check whether the given name conflicts with any reserved words.
+
+        * **prefix**: specify the name of the circuit_model to be shown in the auto-generated Verilog/SPICE netlists. The prefix can be the same as the name defined above. Again, the prefix should be unique.
+
+        * **is_default**: can be [``1`` | ``0``], corresponding to [``true`` | ``false``] respectively. Specify whether this circuit model is the default one for some modules, such as multiplexers. If a module is not linked to any circuit model by users, FPGA-Verilog/SPICE will find the default circuit model defined for the same type and link it. For each circuit model type, only one circuit model can be set as default.
+
+        * **circuit_netlist**: specify the path and file name of a customized Verilog/SPICE netlist. For some modules such as SRAMs, FFs, inpads, and outpads, FPGA-Verilog/SPICE does not support auto-generation of the transistor-level sub-circuits because their circuit design is highly dependent on the technology nodes. These circuit designs should be specified by users. For the other modules that can be auto-generated by FPGA-Verilog/SPICE, the user can also define a custom netlist. Multiplexers cannot be user-defined.
+
+        * **verilog_netlist**: specify the path and file name of a customized Verilog netlist. For some modules such as SRAMs, FFs, inpad and outpads, FPGA-Verilog/SPICE does not support auto-generation of the transistor-level sub-circuits because their circuit design is highly dependent on the technology nodes. These circuit designs should be specified by users. For the other modules that can be auto-generated by FPGA-Verilog/SPICE, the user can also define a custom netlist. Multiplexers cannot be user-defined.
+
+        * **dump_structural_verilog**: when the value of this keyword is set to be true, Verilog generator will output gate-level netlists of this module, instead of behavior-level. Gate-level netlists bring more opportunities in layout-level optimization while behavior-level is more suitable for high-speed formal verification and easier in debugging with HDL simulators.
+
+.. note:: If netlist is not specified, FPGA-Verilog/SPICE auto-generates the Verilog/SPICE netlists for multiplexers, wires, and LUTs.
+
+.. note:: For user-defined netlists, such as LUTs, the decoding methodology should comply with that of the auto-generated LUTs (See Section 4.5)
+
+.. note:: Under the XML node circuit_model, the features of transistor-level designs can be defined. In the following table, we show the common features supported for all the modules. Then, we will introduce unique features supported only for some circuit model types.
+
+
+Design Technology-related Attributes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: xml
+
+  <circuit_model type="string" name="string" prefix="string" is_default="int" netlist="string" 
+  dump_structural_verilog="string">
+    <design_technology type="string"/>
+    <input_buffer exist="string" circuit_model_name="string"/>
+    <output_buffer exist="string" circuit_model_name="string"/>
+    <pass_gate_logic type="string" circuit_model_name="string"/>
+    <port type="string" prefix="string" lib_name="string" size="int" default_val="int" circuit_model_name="string" 
+    mode_select="boolean" is_global="boolean" is_set="boolean" is_reset="boolean" 
+    is_config_enable="boolean"/>
+  </circuit_model>
+
+* design_technology :
+
+    * **type:** [cmos|rram]. Specify the type of design technology of the circuit_model.
+
+.. note:: Currently, the RRAM-based designs are only supported for multiplexers.
+
+
+Circuit Port Attributes
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* input_buffer and output_buffer:
+    
+    * **exist:** [on|off]. Define the existence of the input_buffer or output_buffer. Note that the setting applies to all the inputs (or outputs). Buffering only part of the inputs (or outputs) is not supported here; a possible solution is to build a user-defined Verilog/SPICE netlist.
+
+    * **circuit_model_name:** Specify the name of the circuit model which is used to implement the input/output buffer. The type of the specified circuit model should be inv_buf.
+
+* pass_gate_logic: define the parameters of pass-gates, which are used in building multiplexers and LUTs.
+
+    * **circuit_model_name:** Specify the name of the circuit model which is used to implement the transmission gate. The type of the specified circuit model should be pass_gate.
+
+* port: define the port list of a circuit model.
+
+    * **type:** can be [input|output|sram|clock]. For programmable modules, such as multiplexers and LUTs, SRAM ports should be defined. For registers, such as FFs and memory banks, clock ports should be defined.
+
+    * **prefix:** the name of the port to appear in the autogenerated netlists. Each port will be shown as ``<prefix>[i]`` in Verilog/SPICE netlists.
+
+    * **lib_name:** the name of the port defined in standard cells or customized cells. If not specified, this attribute will be the same as ``prefix``.
+
+    * **size:** width of the port, i.e., the number of pins.
+
+    * **default_val:**  default logic value of a port, which is used as the initial logic value of this port in testbench generation. Can be either 0 or 1. We assume each pin of this port has the same default value.
+
+    * **circuit_model_name:** only valid when the type of port is sram. Specify the name of the circuit model which is connected to this port.
+
+    * **mode_select:** can be either ``true`` or ``false``. Specify if this port controls the mode switching in a configurable logic block. Only valid when the type of this port is sram. (A configurable logic block can operate in different modes, which is controlled by SRAM bits.)
+
+    * **is_global:** can be either ``true`` or ``false``. Specify if this port is a global port, which will be routed globally. Note that when multiple global ports are defined with the same name, these global ports will be short-wired together.
+
+    * **is_set:** can be either ``true`` or ``false``. Specify if this port controls a set signal. Only valid when ``is_global`` is true. All the set ports are connected to global set voltage stimuli in testbenches.
+
+    * **is_reset:** can be either ``true`` or ``false``. Specify if this port controls a reset signal. Only valid when ``is_global`` is true. All the reset ports are connected to global reset voltage stimuli in testbenches.
+
+    * **is_config_enable:** can be either ``true`` or ``false``. Only valid when ``is_global`` is true. Specify if this port controls a configuration-enable signal. This port is only enabled during FPGA configuration, and always disabled during FPGA operation. All the ``config_enable`` ports are connected to global configuration-enable voltage stimuli in testbenches.
+
+.. note::  Different types of ``circuit_model`` have different XML syntax, with which users can highly customize their circuit topologies. Refer to the examples of ``circuit_model`` for more details.
+
+.. note:: Note that we have a list of reserved port names, which indicate the usage of these ports when building FPGA fabrics. Please do not use ``mem_out``, ``mem_inv``, ``bl``, ``wl``, ``blb``, ``wlb``, ``ccff_head`` and ``ccff_tail``.
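+
+As an illustration of the port attributes above, the following sketch declares the ports of a flip-flop circuit model with global set, reset and clock signals. The port names (``D``, ``Q``, ``set``, ``reset``, ``clk``) are illustrative assumptions that avoid the reserved names; only the attribute names and values come from the descriptions in this section.
+
+.. code-block:: xml
+
+  <!-- Ports of a flip-flop circuit model (port names are illustrative) -->
+  <port type="input" prefix="D" size="1"/>
+  <port type="input" prefix="set" size="1" is_global="true" default_val="0" is_set="true"/>
+  <port type="input" prefix="reset" size="1" is_global="true" default_val="0" is_reset="true"/>
+  <port type="output" prefix="Q" size="1"/>
+  <port type="clock" prefix="clk" size="1" is_global="true" default_val="0"/>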
+
+For OpenFPGA using VPR8
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Circuit design is a dominant factor in Power, Performance, Area (P.P.A.) of FPGA fabrics.
+Depending on the application, hardware engineers may select various circuits to implement their FPGA fabrics.
+For instance, an ultra-low-power FPGA may be built with ultra-low-power circuit cells, while a high-performance FPGA may use entirely different circuit cells.
+OpenFPGA provides a rich XML syntax for users to highly customize the circuits in their FPGA fabric.
+
+In the XML file, users can define a library of circuits, each of which corresponds to a primitive module required in the FPGA architecture.
+Users can specify if the Verilog/SPICE netlist of the module is either auto-generated by OpenFPGA or provided by themselves.
+As such, OpenFPGA can support any circuit design, leading to high flexibility in building FPGA fabrics.
+
+In principle, a circuit library consists of a number of ``<circuit_model>``, each of which corresponds to a circuit design.
+OpenFPGA supports a wide range of circuit designs.
+The ``<circuit_model>`` could be as small as a cornerstone cell, such as an inverter or a buffer, or as large as a hardware IP, such as a Block RAM.
+
+.. code-block:: xml
+
+  <circuit_library>
+    <circuit_model type="<string>" name="<string>">
+      <!-- Detailed circuit-level design parameters -->
+    </circuit_model>
+    <!-- More circuit models -->
+  </circuit_library>
+
+Currently, OpenFPGA supports the following categories of circuits:
+
+  - inverters/buffers
+  - pass-gate logic, including transmission gates and pass transistors
+  - standard cell logic gates, including AND, OR and MUX2
+  - metal wires
+  - multiplexers
+  - flip-flops
+  - Look-Up Tables, including single-output and multi-output fracturable LUTs
+  - Static Random Access Memory (SRAM)
+  - scan-chain flip-flops
+  - I/O pad
+  - hardware IPs 
+
+Circuit Model
+^^^^^^^^^^^^^
+
+As OpenFPGA supports many types of circuit models and their circuit-level implementations can differ widely, each type of circuit model has special syntax to customize its design.
+However, most circuit models share a common XML syntax.
+Here, we focus on this common syntax; the special syntax is detailed in :ref:`circuit_model_examples`.
+
+.. code-block:: xml
+
+  <circuit_model type="<string>" name="<string>" prefix="<string>" is_default="<bool>" spice_netlist="<string>" verilog_netlist="<string>" dump_structural_verilog="<bool>">
+    <design_technology type="<string>"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <pass_gate_logic type="<string>" circuit_model_name="<string>"/>
+    <port type="<string>" prefix="<string>" lib_name="<string>" size="<int>" default_val="<int>" circuit_model_name="<string>" mode_select="<bool>" is_global="<bool>" is_set="<bool>" is_reset="<bool>" is_config_enable="<bool>"/>
+    <!-- more ports -->
+  </circuit_model>
+
+.. option:: <circuit_model type="<string>" name="<string>" prefix="<string>" is_default="<bool>"
+  spice_netlist="<string>" verilog_netlist="<string>" dump_structural_verilog="<bool>">
+  
+  Specify the general attributes for a circuit model
+
+  - ``type="inv_buf|pass_gate|gate|mux|wire|chan_wire|sram|lut|ff|ccff|hard_logic|iopad"`` Specify the type of circuit model. For the circuit models in the type of mux/wire/chan_wire/lut, FPGA-Verilog/SPICE can auto-generate Verilog/SPICE netlists. For the rest, FPGA-Verilog/SPICE requires a user-defined Verilog/SPICE netlist.
+
+  - ``name="<string>"`` Specify the name of this circuit model. The name should be unique and will be used to create the Verilog/SPICE module in Verilog/SPICE netlists. Note that for a customized Verilog/SPICE netlist, the name defined here MUST be the name in the customized Verilog/SPICE netlist. FPGA-Verilog/SPICE will check whether the given name conflicts with any reserved words.
+
+  - ``prefix="<string>"`` Specify the name of the ``<circuit_model>`` to be shown in the auto-generated Verilog/SPICE netlists. The prefix can be the same as the name defined above. Again, the prefix should be unique.
+
+  - ``is_default="true|false"``  Specify whether this circuit model is the default one for its type. If a primitive module in the VPR architecture is not linked to any circuit model by users, FPGA-Verilog/SPICE will find the default circuit model defined for the same type.
+
+  - ``spice_netlist="<string>"`` Specify the path and file name of a customized SPICE netlist. For some modules such as SRAMs, FFs, I/O pads, FPGA-SPICE does not support auto-generation of the transistor-level sub-circuits because their circuit design is highly dependent on the technology nodes. These circuit designs should be specified by users. For the other modules that can be auto-generated by FPGA-SPICE, the user can also define a custom netlist.
+
+  - ``verilog_netlist="<string>"`` Specify the path and file name of a customized Verilog netlist. For some modules such as SRAMs, FFs, I/O pads, FPGA-Verilog does not support auto-generation of the transistor-level sub-circuits because their circuit design is highly dependent on the technology nodes. These circuit designs should be specified by users. For the other modules that can be auto-generated by FPGA-Verilog, the user can also define a custom netlist.
+
+  - ``dump_structural_verilog="true|false"`` When set to true, the Verilog generator will output a gate-level netlist of this module instead of a behavioral one. Gate-level netlists bring more opportunities for layout-level optimization, while behavioral netlists are more suitable for high-speed formal verification and easier to debug with HDL simulators.
+
+.. warning:: ``prefix`` may be deprecated soon
+
+.. note:: Multiplexers cannot be user-defined.
+
+.. note:: For a circuit model type, only one circuit model can be set as default.
+
+.. note:: If ``spice_netlist`` or ``verilog_netlist`` is not specified, FPGA-Verilog/SPICE auto-generates the Verilog/SPICE netlists for multiplexers, wires, and LUTs.
+
+.. note:: For user-defined netlists, such as LUTs, the decoding methodology must comply with that of the auto-generated LUTs.
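+
+As an illustration of ``is_default``, the sketch below defines two circuit models of the same type, of which only one is marked as default. The names ``inv1x`` and ``buf2x`` are hypothetical; the ``<design_technology>`` attributes follow the inverter and buffer syntax described in :ref:`circuit_model_examples`.
+
+.. code-block:: xml
+
+  <!-- Hypothetical default model for the type inv_buf -->
+  <circuit_model type="inv_buf" name="inv1x" prefix="inv1x" is_default="true">
+    <design_technology type="cmos" topology="inverter" size="1"/>
+    <port type="input" prefix="in" size="1"/>
+    <port type="output" prefix="out" size="1"/>
+  </circuit_model>
+
+  <!-- Hypothetical second model of the same type, not the default -->
+  <circuit_model type="inv_buf" name="buf2x" prefix="buf2x" is_default="false">
+    <design_technology type="cmos" topology="buffer" size="2"/>
+    <port type="input" prefix="in" size="1"/>
+    <port type="output" prefix="out" size="1"/>
+  </circuit_model>
+
+With these definitions, a primitive that needs an ``inv_buf`` but is not linked to a circuit model by the user would be expected to fall back to ``inv1x``.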
+
+Design Technology
+^^^^^^^^^^^^^^^^^
+
+.. option:: <design_technology type="<string>"/>
+
+  Specify the design technology applied to a ``<circuit_model>``
+
+    - ``type="cmos|rram"`` Specify the type of design technology of the ``<circuit_model>``. Currently, OpenFPGA supports CMOS and RRAM technology for circuit models.
+      CMOS technology can be applied to any type of ``<circuit_model>``, while RRAM technology is only applicable to multiplexers and SRAMs.
+
+.. note:: Each ``<circuit_model>`` may have different technologies
+
+Input and Output Buffers
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. option:: <input_buffer exist="<string>" circuit_model_name="<string>"/>
+
+  - ``exist="true|false"`` Define the existence of the input buffer. Note that the existence is valid for all the inputs.
+
+  - ``circuit_model_name="<string>"`` Specify the name of the circuit model used to implement the input buffer. The specified circuit model should be of type ``inv_buf``.
+
+.. option:: <output_buffer exist="<string>" circuit_model_name="<string>"/>
+
+  - ``exist="true|false"`` Define the existence of the output buffer. Note that the existence is valid for all the outputs.
+
+  - ``circuit_model_name="<string>"`` Specify the name of the circuit model used to implement the output buffer. The specified circuit model should be of type ``inv_buf``.
+
+.. note:: If users want only part of the inputs (or outputs) to be buffered, this is not supported here. A solution can be building a user-defined Verilog/SPICE netlist.
+
+Pass Gate Logic
+^^^^^^^^^^^^^^^
+
+.. option:: <pass_gate_logic circuit_model_name="<string>"/>
+
+  - ``circuit_model_name="<string>"`` Specify the name of the circuit model used to implement the pass-gate logic. The specified circuit model should be of type ``pass_gate``.
+
+.. note:: Pass-gate logic is used in building multiplexers and LUTs.
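+
+The buffer and pass-gate elements above can be combined as in the following sketch, where a multiplexer references separately defined circuit models by name. The model names ``inv1x``, ``tgate`` and ``mux_1level`` are hypothetical, and the attribute values simply follow the options listed in this document and in :ref:`circuit_model_examples`.
+
+.. code-block:: xml
+
+  <!-- Hypothetical inverter used for both input and output buffers -->
+  <circuit_model type="inv_buf" name="inv1x" prefix="inv1x">
+    <design_technology type="cmos" topology="inverter" size="1"/>
+    <port type="input" prefix="in" size="1"/>
+    <port type="output" prefix="out" size="1"/>
+  </circuit_model>
+
+  <!-- Hypothetical transmission gate used as the pass-gate logic -->
+  <circuit_model type="pass_gate" name="tgate" prefix="tgate">
+    <design_technology type="cmos" topology="transmission_gate" nmos_size="1" pmos_size="2"/>
+    <port type="input" prefix="in" size="1"/>
+    <port type="input" prefix="sram" size="1"/>
+    <port type="input" prefix="sramb" size="1"/>
+    <port type="output" prefix="out" size="1"/>
+  </circuit_model>
+
+  <!-- The multiplexer links to the models above through circuit_model_name -->
+  <circuit_model type="mux" name="mux_1level" prefix="mux_1level">
+    <design_technology type="cmos" structure="one-level"/>
+    <input_buffer exist="true" circuit_model_name="inv1x"/>
+    <output_buffer exist="true" circuit_model_name="inv1x"/>
+    <pass_gate_logic circuit_model_name="tgate"/>
+    <port type="input" prefix="in" size="4"/>
+    <port type="output" prefix="out" size="1"/>
+    <port type="sram" prefix="sram" size="4"/>
+  </circuit_model>
+
+Keeping buffers and pass gates as separate circuit models means the same ``inv1x`` or ``tgate`` definition can be reused by several multiplexers or LUTs.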
+
+
+Circuit Port
+^^^^^^^^^^^^
+
+A circuit model may consist of a number of ports. The port list is mandatory in any ``circuit_model`` and must be consistent with any user-defined netlists.
+
+.. option:: <port type="<string>" prefix="<string>" lib_name="<string>" size="<int>"
+  default_val="<int>" circuit_model_name="<string>" mode_select="<bool>"
+  is_global="<bool>" is_set="<bool>" is_reset="<bool>" is_config_enable="<bool>"/>
+
+  Define the attributes for a port of a circuit model.
+
+  - ``type="input|output|sram|clock"`` Specify the type of the port, i.e., the directionality and usage. For programmable modules, such as multiplexers and LUTs, SRAM ports MUST be defined. For registers, such as FFs and memory banks, clock ports MUST be defined.
+
+    .. note:: ``sram`` and ``clock`` ports are considered as inputs in terms of directionality
+
+  - ``prefix="<string>"`` the name of the port to appear in the autogenerated netlists. Each port will be shown as ``<prefix>[i]`` in Verilog/SPICE netlists.
+
+    .. note:: if the circuit model is bound to a ``pb_type`` in VPR architecture, ``prefix`` must match the port name defined in ``pb_type``
+
+  - ``lib_name="<string>"`` the name of the port defined in standard cells or customized cells. If not specified, this attribute will be the same as ``prefix``.
+
+    .. note:: if the circuit model comes from a standard cell library, using ``lib_name`` is recommended. This is because
+
+      - the port names defined in ``pb_type`` are very different from those of the standard cells
+      - the port sequence is very different
+
+  - ``size="<int>"`` bit width of the port. MUST be larger than zero.
+
+  - ``default_val="<int>"`` Specify default logic value for a port, which is used as the initial logic value of this port in testbench generation. Can be either 0 or 1. We assume each pin of this port has the same default value.
+
+  - ``circuit_model_name="<string>"`` Specify the name of the circuit model which is connected to this port.
+
+    .. note:: ``circuit_model_name`` is only valid when the type of this port is ``sram``.
+
+  - ``io="true|false"`` Specify if this port should be treated as an I/O port of an FPGA fabric. When this is enabled, this port of each circuit model instantiated in the FPGA will be added as an I/O of the FPGA.
+
+    .. note:: ``io`` is only valid for ``input`` ports
+
+  - ``mode_select="true|false"`` Specify if this port controls the mode switching in a configurable logic block. This is needed because a configurable logic block can operate in different modes, which are controlled by SRAM bits.
+
+    .. note:: ``mode_select`` is only valid when the type of this port is ``sram``.
+
+  - ``is_global="true|false"`` Specify if this port is a global port, which will be routed globally.
+
+    .. note:: For input ports, when multiple global input ports are defined with the same name, by default, these global ports will be short-wired together. When ``io`` is turned on for this port, these global ports will be independent in the FPGA fabric.
+
+    .. note:: For output ports, the global ports will be independent in the FPGA fabric.
+
+
+  - ``is_set="true|false"`` Specify if this port controls a set signal. All the set ports are connected to global set voltage stimuli in testbenches.
+
+  - ``is_reset="true|false"`` Specify if this port controls a reset signal. All the reset ports are connected to global reset voltage stimuli in testbenches.
+
+  - ``is_config_enable="true|false"`` Specify if this port controls a configuration-enable signal. Only valid when ``is_global`` is ``true``. This port is only enabled during FPGA configuration, and always disabled during FPGA operation. All the ``config_enable`` ports are connected to global configuration-enable voltage stimuli in testbenches.
+
+.. note:: ``is_set``, ``is_reset`` and ``is_config_enable`` are only valid when ``is_global`` is ``true``. 
+
+.. note:: Different types of ``circuit_model`` have different XML syntax, with which users can highly customize their circuit topologies. Refer to the examples in :ref:`circuit_model_examples` for more details.
+
+.. note:: There is a list of reserved port names, which indicate the usage of these ports when building FPGA fabrics. Please do not use ``mem_out``, ``mem_inv``, ``bl``, ``wl``, ``blb``, ``wlb``, ``ccff_head`` and ``ccff_tail``.
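+
+The sketch below shows how the port attributes above might be combined for a flip-flop with global set, reset and clock ports. The model name, port names, ``lib_name`` values and netlist paths are placeholders chosen for illustration, not taken from a released architecture.
+
+.. code-block:: xml
+
+  <!-- Hypothetical flip-flop backed by user-defined netlists (placeholder paths) -->
+  <circuit_model type="ff" name="dff_sr" prefix="dff_sr" spice_netlist="/path/to/dff_sr.sp" verilog_netlist="/path/to/dff_sr.v">
+    <design_technology type="cmos"/>
+    <input_buffer exist="false"/>
+    <output_buffer exist="false"/>
+    <!-- Datapath ports -->
+    <port type="input" prefix="D" size="1"/>
+    <port type="output" prefix="Q" size="1"/>
+    <!-- Global control ports: is_set and is_reset require is_global to be true -->
+    <port type="input" prefix="set" lib_name="SET" size="1" is_global="true" default_val="0" is_set="true"/>
+    <port type="input" prefix="reset" lib_name="RST" size="1" is_global="true" default_val="0" is_reset="true"/>
+    <!-- Registers must define a clock port -->
+    <port type="clock" prefix="clk" lib_name="CK" size="1" is_global="true" default_val="0"/>
+  </circuit_model>
+
+None of the port prefixes used here collide with the reserved port names listed above.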
--- a/docs/source/arch_lang/circuit_model_examples.rst
+++ b/docs/source/arch_lang/circuit_model_examples.rst
@ -1,33 +1,37 @@
+.. _circuit_model_examples:
+
 Circuit model examples
-======================
-The next subsections are dedicated to detailed examples of each circuit model type. Through these examples, we give a global overview of the different implementations which are available for the user.
+----------------------
+Circuit models of different types have their own special syntax.
+Here, we provide detailed examples for each type of ``circuit_model``.
+These examples may be used as templates for users to craft their own ``circuit_model``.

 Inverters and Buffers
---------------------
+~~~~~~~~~~~~~~~~~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="inv_buf" name="string" prefix="string" netlist="string" is_default="int">
-    <design_technology type="cmos" topology="string" size="int" tapered="off"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
+  <circuit_model type="inv_buf" name="<string>" prefix="<string>" netlist="<string>" is_default="<int>">
+    <design_technology type="cmos" topology="<string>" size="<int>" num_level="<int>" f_per_stage="<float>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
  </circuit_model>

-.. note:: customized Verilog/SPICE netlists are not currently supported for inverters and buffers.
+.. option:: <design_technology type="cmos" topology="<string>" size="<int>" num_level="<int>" f_per_stage="<float>"/>

-* design_technology:
+  - ``topology="inverter|buffer"`` Specify the type of this component, can be either an inverter or a buffer.

-	* **topology:** [``inverter`` | ``buffer``]. Specify the type of this component, can be either an inverter or a buffer.
+  - ``size="<int>"`` Specify the driving strength of inverter/buffer. For a buffer, the size is the driving strength of the inverter at the second level. Note that we consider a two-level structure for a buffer here.

-	* **size:** Specify the driving strength of inverter/buffer. For a buffer, the size is the driving strength of the inverter at the second level. We consider a two-level structure for a buffer here. The support for multi-level structure of a buffer will be introduced in the tapered options.
+  - ``num_level="<int>"`` Define the number of levels of a tapered inverter/buffer. This is required when users need an inverter or a buffer consisting of more than 2 stages.

-	* **tapered:** [``on`` | ``off``]. Define if the buffer is a tapered (multi-level) buffer. When ``on`` is defined, the following parameter are required.*
+  - ``f_per_stage="<float>"`` Define the ratio of driving strength between the levels of a tapered inverter/buffer. Default value is 4.

-		* **tap_drive_level:** Define the number of levels of a tapered buffer. This parameter is valid only when tapered is turned on.
-
-		* **f_per_stage:** Define the ratio of driving strength between the levels of a tapered driver. This parameter is valid only when tapered is turned on. Default value is 4.
-
-**Inverter x1 example**
+Inverter 1x Example
+```````````````````

 :numref:`fig_inv1` is the inverter symbol depicted in this example.

@ -35,9 +39,9 @@ Inverters and Buffers

 .. figure:: ./figures/Inverter_1.png
   :scale: 100%
-   :alt: classical inverter x1 symbol
+   :alt: classical inverter 1x symbol

-   Classical inverter x1 symbol.
+   Classical inverter 1x symbol.

 The XML code describing this inverter is:

@ -50,18 +54,20 @@ The XML code describing this inverter is:
  </circuit_model>

 This example shows:
-	* The topology chosen as inverter
-	* Size of 1 for the output strength
-	* The tapered parameter is not declared and is off by default

-**Power-gated Inverter x1 example**
+  - The topology chosen as inverter
+  - Size of 1 for the output strength
+  - ``num_level`` is not declared, so the inverter is not tapered
+
+Power-gated Inverter 1x example
+```````````````````````````````

 The XML code describing an inverter which can be power-gated by the control signals ``EN`` and ``ENB`` :

 .. code-block:: xml

  <circuit_model type="inv_buf" name="INVTX1" prefix="INVTX1">
-    <design_technology type="cmos" topology="inverter" size="3" tapered="off" power_gated="true"/>
+    <design_technology type="cmos" topology="inverter" size="3" power_gated="true"/>
    <port type="input" prefix="in" size="1" lib_name="I"/>
    <port type="input" prefix="EN" size="1" lib_name="EN" is_global="true" default_val="0" is_config_enable="true"/>
    <port type="input" prefix="ENB" size="1" lib_name="ENB" is_global="true" default_val="1" is_config_enable="true"/>
@ -70,15 +76,16 @@ The XML code describing an inverter which can be power-gated by the control sign

 .. note:: For power-gated inverters: all the control signals must be set as ``config_enable`` so that the testbench generation will generate testing waveforms. If the power-gated inverters are auto-generated, all the ``config_enable`` signals must be ``global`` signals as well. If the power-gated inverters come from user-defined netlists, there are no restrictions on ``global`` signals.

-**Buffer x2 example**
+Buffer 2x example
+`````````````````

 :numref:`fig_buff` is the buffer symbol depicted in this example.

 .. _fig_buff:

 .. figure:: ./figures/Buffer.png
-   :scale: 100%
-   :alt: buffer symbol composed by 2 inverter, its output strength equal 2
+   :scale: 50%
+   :alt: buffer symbol composed of 2 inverters; its output strength equals 2

   Buffer made by two inverter, with an output strength of 2.

@ -93,19 +100,20 @@ The XML code describing this buffer is:
  </circuit_model>

 This example shows:
-	* The topology chosen as buffer
-	* Size of 2 for the output strength
-	* The tapered parameter is not declared and is off by default
+  - The topology chosen as buffer
+  - Size of 2 for the output strength
+  - ``num_level`` is not declared, so the buffer is not tapered


-**Tapered inverter x16 example**
+Tapered inverter 16x example
+````````````````````````````

 :numref:`fig_invtap4` is the tapered inverter symbol depicted in this example.

 .. _fig_invtap4:

 .. figure:: ./figures/Tapered_inverter.png
-   :scale: 100%
+   :scale: 50%
   :alt: tapered inverter composed by 3 inverter for an output strength = 16

   Inverter with high output strength made by 3 stage of inverter.
@ -115,62 +123,65 @@ The XML code describing this inverter is:
 .. code-block:: xml

  <circuit_model type="inv_buf" name="tapdrive4" prefix="tapdrive4">
-    <design_technology type="cmos" topology=”inverter" size="1" tapered="on" tap_drive_level="3" 
-	f_per_stage="4"/>
+    <design_technology type="cmos" topology="inverter" size="1" num_level="3" f_per_stage="4"/>
    <port type="input" prefix="in" size="1"/>
    <port type="output" prefix="out" size="1"/>
  </circuit_model>


 This example shows:
-	* The topology chosen as inverter
-	* Size of 1 for the first stage output strength
-	* The tapered parameter is on. Then the required sub parameters are declared
-		* The number of stage is set to 3 by tap_drive_level
-		* f_per_stage is set to 4. Then 2nd stage output strength is 4* the 1st stage output strength (so 4*1 = 4) and the 3rd stage output strength is 4* the 2nd stage output strength (so 4*4 =  16).
-
+  - The topology chosen as inverter
+  - Size of 1 for the first stage output strength
+  - The number of stages is set to 3 by ``num_level``
+  - ``f_per_stage`` is set to 4. The 2nd stage output strength is then 4x the 1st stage output strength (4 * 1 = 4), and the 3rd stage output strength is 4x the 2nd stage output strength (4 * 4 = 16).
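+
+Generalizing the arithmetic in this example (a sketch inferred from the two products above, not a statement about the generator's internals), let :math:`s_1` be ``size``, :math:`f` be ``f_per_stage`` and :math:`N` be ``num_level``. The output strength of stage :math:`k` can then be written as:
+
+.. math::
+
+  s_k = s_1 \cdot f^{\,k-1}, \qquad k = 1, \dots, N
+
+With :math:`s_1 = 1`, :math:`f = 4` and :math:`N = 3`, this gives :math:`s_1 = 1`, :math:`s_2 = 4` and :math:`s_3 = 16`, matching the 16x output strength of this example.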

 Pass-gate Logic
---------------
+~~~~~~~~~~~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="pass_gate" name="string" prefix="string" netlist="string" is_default="int">
-    <design_technology type="cmos" topology="string" nmos_size="int" pmos_size="int"/>
-    <input_buffer exist="string" circuit_model_name="string" />
-    <output_buffer exist="string" circuit_model_name="string" />
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
+  <circuit_model type="pass_gate" name="<string>" prefix="<string>" netlist="<string>" is_default="<int>">
+    <design_technology type="cmos" topology="<string>" nmos_size="<float>" pmos_size="<float>"/>
+    <input_buffer exist="false"/>
+    <output_buffer exist="false"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
  </circuit_model>

-.. note:: customized Verilog/SPICE netlists are not currently supported for pass-gate logics.
+.. note:: Please do not add input and output buffers to pass-gate logic.

-* design_technology:
+.. option:: <design_technology type="cmos" topology="<string>" nmos_size="<float>" pmos_size="<float>"/>

-	* **topology:** [``transmission_gate`` | ``pass_transistor``]. The transmission gate consists of a NMOS transistor and a PMOS transistor. The pass transistor consists of a NMOS transistor.
+  - ``topology="transmission_gate|pass_transistor"`` Specify the circuit topology for the pass-gate logic. A transmission gate consists of an *n*-type transistor and a *p*-type transistor. A pass transistor consists of only an *n*-type transistor.

-	* **nmos_size:** the size of NMOS transistor in a transmission gate or pass_transistor, expressed in terms of the min_width defined in XML node <transistors>.
+  - ``nmos_size="<float>"`` Size of the *n*-type transistor in a transmission gate or pass transistor, expressed in terms of the minimum width ``min_width`` defined in the transistor model in :ref:`technology_library`.

-	* **pmos_size:** the size of PMOS transistor in a transmission gate, expressed in terms of the min_width defined in XML node <transistors>.
+  - ``pmos_size="<float>"`` Size of the *p*-type transistor in a transmission gate, expressed in terms of the minimum width ``min_width`` defined in the transistor model in :ref:`technology_library`.

-**Transmission-gate example**
+.. note:: ``nmos_size`` and ``pmos_size`` are required for FPGA-SPICE
+
+Transmission-gate Example
+`````````````````````````

 :numref:`fig_passgate` is the pass-gate symbol depicted in this example.

 .. _fig_passgate:

 .. figure:: ./figures/pass-gate.png
-   :scale: 60%
+   :scale: 30%
   :alt: pmos and nmos transistortors forming a pass-gate

-   Pass-gate made by pmos ans nmos association.
+   Pass-gate made of a *p*-type and an *n*-type transistor.

 The XML code describing this pass-gate is:

 .. code-block:: xml

  <circuit_model type="pass_gate" name="tgate" prefix="tgate">
-    <design_technology type="cmos" topology="transmission_gate"/>
+    <design_technology type="cmos" topology="transmission_gate" nmos_size="1" pmos_size="2"/>
    <port type="input" prefix="in" size="1"/>
    <port type="input" prefix="sram" size="1"/>
    <port type="input" prefix="sramb" size="1"/>
@ -178,18 +189,18 @@ The XML code describing this pass-gate is:
  </circuit_model>

 This example shows:
-	* Topology is ``transmission_gate``, which means the component need entries for each transistor gate (pmos and nmos)
-	* 3 inputs considered, 1 for signal and 2 to control the transistors gates
-	* No input or output buffer used, these parameters can be uninitialized
+  - A ``transmission_gate`` built with an *n*-type transistor of size 1 and a *p*-type transistor of size 2.
+  - 3 inputs: 1 for the datapath signal and 2 to turn the transistor gates on/off

-**Pass-transistor example**
+Pass-transistor Example
+```````````````````````

 :numref:`fig_passtran` is the pass-gate symbol depicted in this example.

 .. _fig_passtran:

 .. figure:: ./figures/pass_transistor.png
-   :scale: 50%
+   :scale: 30%
   :alt: nmos transistortor forming a pass-gate

   Pass-gate made by a nmos transistor.
@ -206,22 +217,23 @@ The XML code describing this pass-gate is:
  </circuit_model>

 This example shows:
-	* Topology is ``pass_transistor``, which means the component need an entry for the transistor gate (nmos)
-	* 2 inputs considered, 1 for signal and 1 to control the transistor gate
-	* No input or output buffer used, these parameters can be uninitialized
-
+  - A ``pass_transistor`` built with an *n*-type transistor of size 1
+  - 2 inputs: 1 for the datapath signal and 1 to turn the transistor gate on/off

 SRAMs
-----
+~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="sram" name="string" prefix="string" netlist="string"/>
+  <circuit_model type="sram" name="<string>" prefix="<string>" verilog_netlist="<string>" spice_netlist="<string>">
    <design_technology type="cmos"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
  </circuit_model>

 .. note::  The circuit designs of SRAMs are highly dependent on the technology node and well optimized by engineers. Therefore, FPGA-Verilog/SPICE requires users to provide their customized SRAM Verilog/SPICE netlists. A sample Verilog/SPICE netlist of SRAM can be found in the directory SpiceNetlists in the released package. FPGA-Verilog/SPICE assumes that all the LUTs and MUXes employ the SRAM circuit design. Therefore, currently only one SRAM type is allowed to be defined.
@ -231,66 +243,95 @@ SRAMs
 .. note:: The supported SRAM modules should have a BL and a WL when the memory-bank-style configuration circuit is declared. Note that the WL should be the write/read enable signal, while BL is the data input.

 Logic gates
-----------
+~~~~~~~~~~~
+
+The circuit model in the type of ``gate`` aims to support direct mapping to standard cells or customized cells provided by technology vendors or users. 
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="gate" name="string" prefix="string" netlist="string" dump_explicit_port_map="true|false"/>
-    <design_technology type="cmos" topology="string"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" lib_name="string" size="int"/>
-    <port type="output" prefix="string" lib_name="string" size="int"/>
+  <circuit_model type="gate" name="<string>" prefix="<string>" spice_netlist="<string>" verilog_netlist="<string>">
+    <design_technology type="cmos" topology="<string>"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" lib_name="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" lib_name="<string>" size="<int>"/>
  </circuit_model>

-.. note::  The circuit model in the type of gate aims to support direct mapping to standard cells or customized cells provided by technology vendors or users. 

-.. note:: The logic functionality of a gate can be defined through the XML keyword ``topology``. Currently, OpenFPGA supports AND, OR and MUX2 gates. As for standard cells, the size of each port is limited to 1. Currently, only 2-input and single-output logic gates are supported.
+.. option:: <design_technology type="cmos" topology="<string>"/>
+  
+  - ``topology="AND|OR|MUX2"`` Specify the logic functionality of a gate. As for standard cells, the size of each port is limited to 1. Currently, only 2-input and single-output logic gates are supported.

-.. note:: It may happen that the port sequence in generated Verilog netlists has conflicts with the port sequence in standard and customized cells. To avoid this, users can set the XML keyword ``dump_explicit_port_map`` to be true, which enables explicit port mapping are dumped. Users can specify the pin/port name in the standard cell library using the XML keyword ``lib_name``.
+2-input OR Gate Example
+```````````````````````
+
+.. code-block:: xml
+
+    <circuit_model type="gate" name="OR2" prefix="OR2" is_default="true">
+      <design_technology type="cmos" topology="OR"/>
+      <input_buffer exist="false"/>
+      <output_buffer exist="false"/>
+      <port type="input" prefix="a" size="1"/>
+      <port type="input" prefix="b" size="1"/>
+      <port type="output" prefix="out" size="1"/>
+      <delay_matrix type="rise" in_port="a b" out_port="out">
+        10e-12 8e-12
+      </delay_matrix>
+      <delay_matrix type="fall" in_port="a b" out_port="out">
+        10e-12 7e-12
+      </delay_matrix>
+    </circuit_model>
+
+This example shows:
+  - A 2-input OR gate without any input and output buffers
+  - Propagation delay from input ``a`` to ``out`` is 10ps in rising edge and 10ps in falling edge
+  - Propagation delay from input ``b`` to ``out`` is 8ps in rising edge and 7ps in falling edge
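+
+When a gate is mapped to a cell from a standard cell library, the same syntax can carry the library pin names through ``lib_name``. The following is only a sketch: the cell name ``AND2_X1``, the pin names ``A1``, ``A2`` and ``Z``, and the netlist path are hypothetical placeholders.
+
+.. code-block:: xml
+
+    <!-- Hypothetical 2-input AND gate mapped to a standard cell (placeholder names and path) -->
+    <circuit_model type="gate" name="AND2_X1" prefix="AND2_X1" verilog_netlist="/path/to/stdcells.v">
+      <design_technology type="cmos" topology="AND"/>
+      <input_buffer exist="false"/>
+      <output_buffer exist="false"/>
+      <!-- prefix is the name used in generated netlists; lib_name is the pin name in the cell -->
+      <port type="input" prefix="a" lib_name="A1" size="1"/>
+      <port type="input" prefix="b" lib_name="A2" size="1"/>
+      <port type="output" prefix="out" lib_name="Z" size="1"/>
+    </circuit_model>
+
+As stated above, the ``name`` must match the module name in the user-defined netlist, and each port is limited to a size of 1.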

 Multiplexers
------------
+~~~~~~~~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="mux" name="string" prefix="string" is_default="int">
-    <design_technology type="string" structure="string" num_level="int" add_const_input="string" const_input_val="int" local_encoder="string" ron="float" roff="float" prog_transistor_size="float"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <pass_gate_logic type="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
-    <port type="sram" prefix="string" size="int"/>
+  <circuit_model type="mux" name="<string>" prefix="<string>">
+    <design_technology type="<string>" structure="<string>" num_level="<int>" add_const_input="<bool>" const_input_val="<int>" local_encoder="<bool>"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <pass_gate_logic type="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
+    <port type="sram" prefix="<string>" size="<int>"/>
  </circuit_model>

-.. note:: customized Verilog/SPICE netlists are not currently supported for multiplexers.
+.. note:: user-defined Verilog/SPICE netlists are not currently supported for multiplexers.

-* design_technology:
+.. option:: <design_technology type="<string>" structure="<string>" num_level="<int>" add_const_input="<bool>" const_input_val="<int>" local_encoder="<bool>"/>

-	* **structure:** can be [``tree`` \| ``multi-level`` \| ``one-level``]. The structure options are valid for SRAM-based multiplexers. For RRAM-based multiplexers, currently we only support the circuit design in [5]. If ``multi-level`` the following parameter is required:
+  - ``structure="tree|multi-level|one-level"`` Specify the structure of the multiplexer. The structure option is only valid for SRAM-based multiplexers. For RRAM-based multiplexers, currently we only support the one-level structure.

-		* **num_level:** specify the number of levels when multi-level structure is selected, only.
+  - ``num_level="<int>"`` Specify the number of levels when ``multi-level`` structure is selected.
    
-    * **add_const_input:** can be [``true`` \| ``false``]. When enabled, an extra input will be added to the multiplexer circuits defined in this ``circuit_model``. For example, an 4-input multiplexer will be turned to a 5-input multiplexer. The extra input will be wired to a constant value, which can be specified through the XML syntax ``const_input_val``. The constant value can be either 0 or 1 (By default it is 0). Note that adding such input will help reducing the leakage power of FPGA and parasitic signal activities, with a limited area overhead.
+  - ``add_const_input="true|false"`` Specify if an extra input should be added to the multiplexer circuits. For example, a 4-input multiplexer will be turned into a 5-input multiplexer. The extra input will be wired to a constant value, which can be specified through the XML syntax ``const_input_val``.

-		* **const_input_val:** specify the constant value, to which the extra input will be connected. This syntax is only valid when the ``add_const_input`` is set to true.
+    .. note::  Adding an extra constant input helps reduce the leakage power of the FPGA and parasitic signal activities, with a limited area overhead.
+
+  - ``const_input_val="0|1"`` Specify the constant value to which the extra input will be connected. By default it is 0. This syntax is only valid when ``add_const_input`` is set to true.
  
-    * **local_encoder:** can be [``true`` \| ``false``]. When enabled, an local encoder will be added to the multiplexer circuits defined in this ``circuit_model``. The local encoder will be interface the SRAM inputs of multiplexing structure and SRAMs. It can encode the one-hot codes (that drive the select port of multiplexing structure) to a binary code. For example, 8-bit ``00000001`` will be encoded to 3-bit ``000``. This will help reduce the number of SRAM cells used in FPGAs as well as configuration time (especially for scan-chain configuration protocols). But it may cost an area overhead.  
+  - ``local_encoder="true|false"`` Specify if a local encoder should be added to the multiplexer circuits. The local encoder will interface the SRAM inputs of the multiplexing structure and the SRAMs. It can encode the one-hot codes (that drive the select port of the multiplexing structure) to a binary code. For example, 8-bit ``00000001`` will be encoded to 3-bit ``000``. This helps reduce the number of SRAM cells used in FPGAs as well as the configuration time (especially for scan-chain configuration protocols), but it may come with an area overhead.

-        .. note:: Local encoders are only applicable for one-level and multi-level multiplexers. Tree-like multiplexers are already encoded in their nature.
+    .. note:: Local encoders are only applicable for one-level and multi-level multiplexers. Tree-like multiplexers are already encoded in their nature.
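+
+As a rough sketch of the saving from a local encoder, assume the select port of the multiplexing structure takes an :math:`N`-bit one-hot code, as in the 8-bit example above (:math:`N` is a symbol introduced here for illustration). A binary code for the same :math:`N` choices needs
+
+.. math::
+
+  \lceil \log_2 N \rceil
+
+bits, so the 8-bit one-hot code ``00000001`` shrinks to :math:`\lceil \log_2 8 \rceil = 3` bits.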

-    * **prog_transistor_size:** valid only when the type of design technology is ``rram``. Specify the size of programming transistors used in the RRAM-based multiplexer, we use only n-type transistor and the size should be expressed in terms of the min_width defined in XML node ``transistors``. If type of design technology is ``rram``, then the following parameters are required:
-
-		* **ron:** valid only when the type of design technology is rram. Specify the on-resistance of the RRAM device used in the RRAM-based multiplexer. 
-
-		* **roff:** valid only when the type of design technology is rram. Specify the off-resistance of the RRAM device used in the RRAM-based multiplexer. 
-
-* port: for a multiplexer, the three types of ports, ``input``, ``output`` and ``sram`` should be defined. 
+.. note:: A multiplexer should have only three types of ports, ``input``, ``output`` and ``sram``, which are all mandatory. 

 .. note:: For tree-like multiplexers, they can be built with standard cell MUX2. To enable this, users should define a ``circuit_model``, which describes a 2-input multiplexer (See details and examples in how to define a logic gate using ``circuit_model``. In this case, the ``circuit_model_name`` in the ``pass_gate_logic`` should be the name of MUX2 ``circuit_model``.

-**Mux 1 level example**
+One-level Mux Example
+`````````````````````

 :numref:`fig_mux1` illustrates an example of multiplexer modelling, which consists of input/output buffers and a transmission-gate-based tree structure.

@ -316,12 +357,15 @@ The code describing this Multiplexer is:
    <port type="sram" prefix="sram" size="4"/> 
  </circuit_model>

-**This example shows:**
-	* Each circuit model composing the Multiplexer
-	* The possibility to select the input or output buffers
-	* The possibility to select the pass-gate inside the Mux.
+This example shows:
+  - A one-level 4-input CMOS multiplexer 
+  - All the inputs will be buffered using the circuit model ``inv1x``
+  - All the outputs will be buffered using the circuit model ``tapbuf4``
+  - The multiplexer will be built from transmission gates using the circuit model ``tgate``
+  - The multiplexer will have 4 inputs and 4 SRAMs to control which datapath to propagate

-**Mux-tree example**
+Tree-like Multiplexer Example
+`````````````````````````````

 :numref:`fig_mux` illustrates an example of multiplexer modelling, which consists of input/output buffers and a transmission-gate-based tree structure.

@ -347,61 +391,101 @@ If we arbitrarily fix the number of Mux entries at 4, the following code could i
    <port type="sram" prefix="sram" size="3"/>
  </circuit_model>

-**This example shows:**
-	* The tree topology, 4 entries split in 2 2-to-1 Muxes then another one make the final selection.
-	* The possibility to select the input or output buffers
-	* The number of entries parametrized by ``size`` in input port-type.
+This example shows:
+  - A tree-like 4-input CMOS multiplexer 
+  - All the inputs will be buffered using the circuit model ``inv1x``
+  - All the outputs will be buffered using the circuit model ``tapbuf4``
+  - The multiplexer will be built from transmission gates using the circuit model ``tgate``
+  - The multiplexer will have 4 inputs and 3 SRAMs to control which datapath to propagate

 Look-Up Tables
--------------
+~~~~~~~~~~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="lut" name="string" prefix="string" is_default="int" netlist="string"/>
-    <design_technology type="cmos" fracturable_lut="true|false"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <lut_input_buffer exist="string" circuit_model_name="string"/>
-    <lut_intermediate_buffer exist="string" circuit_model_name="string" location_map="string"/>
-    <lut_input_inverter exist="string" circuit_model_name="string"/>
-    <pass_gate_logic type="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int" tri_state_map="----11" circuit_model_name="string"/>
-    <port type="output" prefix="string" size="int" lut_frac_level="int" lut_output_mask="int"/>
-    <port type="sram" prefix="string" size="int" mode_select="true|false" circuit_model_name="string" default_val="0|1"/>
+  <circuit_model type="lut" name="<string>" prefix="<string>" spice_netlist="<string>" verilog_netlist="<string>">
+    <design_technology type="cmos" fracturable_lut="<bool>"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <lut_input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <lut_input_inverter exist="<string>" circuit_model_name="<string>"/>
+    <lut_intermediate_buffer exist="<string>" circuit_model_name="<string>" location_map="<string>"/>
+    <pass_gate_logic type="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>" tri_state_map="<string>" circuit_model_name="<string>"/>
+    <port type="output" prefix="<string>" size="<int>" lut_frac_level="<int>" lut_output_mask="<int>"/>
+    <port type="sram" prefix="<string>" size="<int>" mode_select="<bool>" circuit_model_name="<string>" default_val="<int>"/>
  </circuit_model>

 .. note:: The Verilog/SPICE netlists of LUT can be auto-generated or customized.
  The auto-generated LUTs are based on a tree-like multiplexer, whose gates of the transistors are used as the inputs of LUTs and the drains/sources of the transistors are used for configurable memories (SRAMs).
  The LUT provided in customized Verilog/SPICE netlist should have the same decoding methodology as the traditional LUT.

-Additional design parameters for LUTs:
+.. option:: <lut_input_buffer exist="<string>" circuit_model_name="<string>"/>

-* **lut_input_buffer:** Define transistor-level description for the buffer for the inputs of a LUT (gates of the internal multiplexer). Use keyword circuit_model_name to specify the circuit_model that containing details of the circuit. 
+  Define transistor-level description for the buffer for the inputs of a LUT (gates of the internal multiplexer).

-* **lut_input_inverter:** Define transistor-level description for the inverter for the inputs of a LUT (gates of the internal multiplexer). Use keyword circuit_model_name to specify the circuit_model that containing details of the circuit. 
+  - ``exist="true|false"`` Specify if the input buffer should exist for LUT inputs

+  - ``circuit_model_name="<string>"`` Specify the ``circuit_model`` that will be used to build the input buffers

-* **lut_intermediate_buffer:** Define transistor-level description for the buffer locating at intermediate stages of internal multiplexer of a LUT. Use keyword circuit_model_name to specify the circuit_model that containing details of the circuit. To customize the location, users can define an integer array in the XML keyword location_map. For example, "-1-1-" indicates buffer inseration to every two stages of the LUT multiplexer tree, considering a 6-input LUT. 
+.. note:: In the context of a LUT, ``input_buffer`` corresponds to the buffer for the datapath inputs of the multiplexers inside the LUT, while ``lut_input_buffer`` corresponds to the buffer at the inputs of the LUT.

+.. option:: <lut_input_inverter exist="<string>" circuit_model_name="<string>"/>

-Instructions of defining design parameters:
+  Define transistor-level description for the inverter for the inputs of a LUT (gates of the internal multiplexer).

-* **input_buffer:** Specify the buffer/inverter that connects the SRAM outputs to the inputs of multiplexer.
+  - ``exist="true|false"`` Specify if the input inverter should exist for LUT inputs

-* **pass_gate_logic:** Specify the pass-gates of the internal multiplexer, the same as the multiplexers.
+  - ``circuit_model_name="<string>"`` Specify the ``circuit_model`` that will be used to build the input inverters

-* **port:** three types of ports (input, output and sram) should be defined. If the user provides an customized Verilog/SPICE netlist, the bandwidth of ports should be defined to the same as the Verilog/SPICE netlist. To support customizable LUTs, each type of port contain special keywords. For input ports, the keyword tri_state_map aims to customize which inputs are fixed to constant values when the LUT is in fracturable modes. For example, ``tri_state_map`` ="----11" indicates that the last two inputs will be fixed to be logic '1' when a 6-input LUT is in fracturable modes. The circuit_model_name of input port is used to specify which logic gates will be used to tri-state the inputs in fracturable LUT modes. It is required to use an AND gate to force logic '0' or an OR gate to force logic '1' for the input ports. For output ports, the keyword lut_frac_level is used to specify the level in LUT multiplexer tree where the output port are wired to. For example, lut_frac_level="4" in a fracturable LUT6 means that the output are potentially wired to the 4th stage of a LUT multiplexer and it is an output of a LUT4. The keyword lut_output_mask describes which fracturable outputs are used. For instance, in a 6-LUT, there are potentially four LUT4 outputs can be wired out. lut_output_mask="0,2" indicates that only the first and the thrid LUT4 outputs will be used in fracturable mode. Note that the size of the output port should be consistent to the length of lut_output_mask. 
+.. option:: <lut_intermediate_buffer exist="<string>" circuit_model_name="<string>" location_map="<string>"/>

-* **SRAM port for mode selection:** To enable switch between different operating modes, the SRAM bits of a fracturable LUT consists of two parts: configuration memory and mode selecting. The SRAM port for mode selection is specified through the XML keyword mode_select. Note that the size of such SRAM port should be consistent to the number of 1s or 0s in the ``tri_state_map``.
+  Define the transistor-level description for the buffers located at intermediate stages of the internal multiplexer of a LUT.

-**LUT example**
+  - ``exist="true|false"`` Specify if the intermediate buffers should exist at intermediate stages
+
+  - ``circuit_model_name="<string>"`` Specify the ``circuit_model`` that will be used to build these buffers
+
+  - ``location_map="[1|-]"`` Customize the location of buffers in intermediate stages. Users can define a string consisting of '1' and '-'. For example, ``-1-1-`` indicates buffer insertion at every two stages of the LUT multiplexer tree, considering a 6-input LUT.
+
+.. note:: For a LUT, three types of ports (``input``, ``output`` and ``sram``) should be defined. If the user provides a customized Verilog/SPICE netlist, the port widths should match those in the Verilog/SPICE netlist. To support customizable LUTs, each type of port contains special keywords.
+
+.. option:: <port type="input" prefix="<string>" size="<int>" tri_state_map="<string>" circuit_model_name="<string>"/>
+
+  - ``tri_state_map="[-|1]"`` Customize which inputs are fixed to constant values when the LUT is in fracturable modes. For example, ``tri_state_map="----11"`` indicates that the last two inputs will be fixed to be logic '1' when a 6-input LUT is in fracturable modes. 
+
+  - ``circuit_model_name="<string>"`` Specify the circuit model to build logic gates in order to tri-state the inputs in fracturable LUT modes. It is required to use an ``AND`` gate to force logic '0' or an ``OR`` gate to force logic '1' for the input ports.
+
+.. option:: <port type="output" prefix="<string>" size="<int>" lut_frac_level="<int>" lut_output_mask="<int>"/>
+
+  - ``lut_frac_level="<int>"`` Specify the level in the LUT multiplexer tree to which the output port is wired. For example, ``lut_frac_level="4"`` in a fracturable LUT6 means that the output is wired to the 4th stage of the LUT multiplexer and is an output of a LUT4.
+  
+  - ``lut_output_mask="<int>"`` Describe which fracturable outputs are used. For instance, in a 6-LUT, there are potentially four LUT4 outputs that can be wired out. ``lut_output_mask="0,2"`` indicates that only the first and the third LUT4 outputs will be used in fracturable mode.
+
+.. note:: The size of the output port should be consistent with the length of ``lut_output_mask``.
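+
+A minimal sketch of the LUT4 case described above, assuming a fracturable LUT6; the prefix ``lut4_out`` is a hypothetical name. The mask ``0,2`` lists two outputs, so the port ``size`` is set to 2:
+
+.. code-block:: xml
+
+  <!-- Illustrative only: first and third LUT4 outputs wired out at the 4th stage -->
+  <port type="output" prefix="lut4_out" size="2" lut_frac_level="4" lut_output_mask="0,2"/>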
+
+.. option:: <port type="sram" prefix="<string>" size="<int>" mode_select="<bool>" circuit_model_name="<string>" default_val="<int>"/>
+
+  - ``mode_select="true|false"`` Specify if this port is used to switch the LUT between different operating modes. The SRAM bits of a fracturable LUT consist of two parts: configuration memory and mode selection.
+
+  - ``circuit_model_name="<string>"`` Specify the circuit model that drives the SRAM port. Typically, the circuit model should be of type ``ccff`` or ``sram``.
+
+  - ``default_val="0|1"`` Specify the default value for the SRAM port. The default value will be used in generating testbenches for unused LUTs.
+
+.. note:: The size of a mode-selection SRAM port should be consistent with the number of '1s' or '0s' in the ``tri_state_map``.
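+
+The relation between ``tri_state_map`` and the mode-selection port can be sketched as follows. This is an illustrative fragment rather than a complete LUT model; the circuit model names ``OR2`` and ``ccff`` are assumed to be defined elsewhere in the architecture file. The map ``----11`` contains two '1's, so the mode-selection port is given a ``size`` of 2:
+
+.. code-block:: xml
+
+  <!-- Illustrative only: the last two inputs are forced to logic '1' through OR gates -->
+  <port type="input" prefix="in" size="6" tri_state_map="----11" circuit_model_name="OR2"/>
+  <!-- Two mode-selection bits, matching the two '1's in tri_state_map -->
+  <port type="sram" prefix="mode" size="2" mode_select="true" circuit_model_name="ccff" default_val="1"/>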
+
+Single-Output LUT Example
+`````````````````````````

 :numref:`fig_lut` illustrates an example of LUT modeling, which consists of input/output buffers and a transmission-gate-based tree structure.

 .. _fig_lut:

 .. figure:: ./figures/lut.png
-   :scale: 100%
+   :scale: 80%
   :alt: Detailed LUT composition

   An example of a LUT with transistor-level design parameters.
@ -414,52 +498,90 @@ The code describing this LUT is:
    <input_buffer exist="on" circuit_model="inv1x"/>
    <output_buffer exist="on" circuit_model_name="inv1x"/>
    <lut_input_buffer exist="on" circuit_model_name="buf2"/>
+    <lut_input_inverter exist="on" circuit_model_name="inv1x"/>
    <pass_gate_logic circuit_model_name="tgate"/>
    <port type="input" prefix="in" size="6"/>
    <port type="output" prefix="out" size="1"/>
    <port type="sram" prefix="sram" size="64"/>
  </circuit_model>

-**This example shows:**
-	* The difference between ``input_buffer`` and ``lut_input_buffer`` and that they are independent.
-	* How each blocks is defined
+This example shows:
+  - A 6-input LUT which is configurable by 64 SRAM cells.
+  - The multiplexer inside the LUT will be built with transmission gates using circuit model ``tgate``
+  - No internal buffers are inserted at any intermediate stage of the LUT

-Flip-Flops
----------
+Fracturable LUT Example
+`````````````````````````

 .. code-block:: xml

-  <circuit_model type="ff" name="string" prefix="string" netlist="string"/>
+  <circuit_model type="lut" name="frac_lut6" prefix="frac_lut6" dump_structural_verilog="true">
+    <design_technology type="cmos" fracturable_lut="true"/>
+    <input_buffer exist="true" circuit_model_name="inv1x"/>
+    <output_buffer exist="true" circuit_model_name="inv1x"/>
+    <lut_input_inverter exist="true" circuit_model_name="inv1x"/>
+    <lut_input_buffer exist="true" circuit_model_name="buf4"/>
+    <lut_intermediate_buffer exist="true" circuit_model_name="buf4" location_map="-1-1-"/>
+    <pass_gate_logic circuit_model_name="tgate"/>
+    <port type="input" prefix="in" size="6" tri_state_map="-----1" circuit_model_name="OR2"/>
+    <port type="output" prefix="lut5_out" size="2" lut_frac_level="5" lut_output_mask="0,1"/>
+    <port type="output" prefix="lut6_out" size="1" lut_output_mask="0"/>
+    <port type="sram" prefix="sram" size="64"/>
+    <port type="sram" prefix="mode" size="1" mode_select="true" circuit_model_name="ccff" default_val="1"/>
+  </circuit_model>
+
+This example shows:
+  - A fracturable 6-input LUT which is configurable by 65 SRAM cells.
+  - Intermediate buffers are added to every two stages of the internal multiplexer
+  - There is an SRAM cell to switch the operating mode of this LUT, configured by a configuration-chain flip-flop ``ccff``
+  - The last input ``in[5]`` of the LUT will be tri-stated in dual-LUT5 mode.
+  - A 2-input OR gate will be wired to the last input ``in[5]`` to tri-state the input. The mode-select SRAM will be wired to an input of the OR gate. 
+    It means that when the mode-selection bit is '1', the LUT will operate in dual-LUT5 mode.
+  - Two outputs will be wired to the 5th stage of the LUT multiplexer tree (the outputs of the dual 5-input LUTs) 
+  - By default, the mode-selection configuration bit will be '1', indicating that by default the LUT will operate in dual-LUT5 mode.
+
+Flip-Flops
+~~~~~~~~~~
+
+Template
+````````
+
+.. code-block:: xml
+
+  <circuit_model type="ccff|ff" name="<string>" prefix="<string>" spice_netlist="<string>" verilog_netlist="<string>">
    <design_technology type="cmos"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
-    <port type="clock" prefix="string" size="int"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
+    <port type="clock" prefix="<string>" size="<int>"/>
  </circuit_model>

 .. note:: The circuit designs of flip-flops are highly dependent on the technology node and well optimized by engineers. Therefore, FPGA-Verilog/SPICE requires users to provide their customized FF Verilog/SPICE/Verilog netlists. A sample Verilog/SPICE netlist of FF can be found in the directory SpiceNetlists in the released package.
  
-  The information of input and output buffer should be clearly specified according to the customized Verilog/SPICE netlist! The existence of input/output buffers will influence the decision in creating testbenches, which may leads to larger errors in power analysis.
+  The input and output buffers must be specified consistently with the customized SPICE netlist! The existence of input/output buffers influences how SPICE testbenches are created; specifying it incorrectly may lead to larger errors in power analysis.

-  FPGA-Verilog/SPICE currently support only one clock domain in the FPGA. Therefore there should be only one clock port to be defined and the size of the clock port should be 1.
+.. note:: FPGA-Verilog/SPICE currently supports only one clock domain in the FPGA. Therefore, only one clock port should be defined and the size of the clock port should be 1.

-Instructions of defining design parameters:
+.. option:: <circuit_model type="ccff|ff" name="<string>" prefix="<string>" spice_netlist="<string>" verilog_netlist="<string>"/>

-* **circuit_model type:** can be ``ff`` or ``scff``. FF is typical Flip-Flop, SCFF is Scan-Chain Flip-Flop
+  - ``type="ccff|ff"`` Specify the type of a flip-flop. ``ff`` is a regular flip-flop while ``ccff`` denotes a configuration-chain flip-flop.

-* **port:** three types of ports (``input``, ``output`` and ``clock``) should be defined. If the user provides a customized Verilog/SPICE netlist, the bandwidth of ports should be defined to the same as the Verilog/SPICE netlist.
+.. note:: A flip-flop should have three types of ports, ``input``, ``output`` and ``clock``.

-.. note:: In a valid FPGA architecture, users should provide at least either a SCFF or a SRAM, so that the configurations can loaded to core logic. 
+.. note:: If the user provides a customized Verilog/SPICE netlist, the port widths should match those in the Verilog/SPICE netlist.

-**FF example**
+.. note:: In a valid FPGA architecture, users should provide at least one ``ccff`` or ``sram`` circuit model, so that the configurations can be loaded to core logic.

-:numref:`fig_ff` illustrates an example of LUT modeling, which consists of input/output buffers and a transmission-gate-based tree structure.
+Flip-Flop Example
+`````````````````
+
+:numref:`fig_ff` illustrates an example of a regular flip-flop.

 .. _fig_ff:

 .. figure:: ./figures/FF.png
-   :scale: 100%
+   :scale: 50%
   :alt: FF symbol

   An example of classical Flip-Flop.
@ -468,27 +590,28 @@ The code describing this FF is:

 .. code-block:: xml

-  <circuit_model type="ff" name="dff" prefix="dff" verilog_netlist="ff.v">
-    <port type="input" prefix="D" size="1"/>
-    <port type="input" prefix="Set" size="1" is_global="true"/>
-    <port type="input" prefix="Reset" size="1" is_global="true"/>
-    <port type="output" prefix="Q" size="1"/>
-    <port type="clock" prefix="clk" size="1" is_global="true"/>
+  <circuit_model type="ff" name="dff" prefix="dff" verilog_netlist="ff.v" spice_netlist="ff.sp">
+    <port type="input" prefix="D" lib_name="D" size="1"/>
+    <port type="input" prefix="Set" lib_name="S" size="1" is_global="true"/>
+    <port type="input" prefix="Reset" lib_name="R" size="1" is_global="true"/>
+    <port type="output" prefix="Q" lib_name="Q" size="1"/>
+    <port type="clock" prefix="clk" lib_name="CK" size="1" is_global="true"/>
  </circuit_model>

-**This example shows:**
-	* Circuit model type as ``ff``
-	* The verilog netlist file associated to this component ``ff.v``
-	* 3 ports, ``Set``, ``Reset`` and ``clk``, defined as global
+This example shows:
+  - A regular flip-flop which is defined in a Verilog netlist ``ff.v`` and a SPICE netlist ``ff.sp``
+  - The flip-flop has ``set`` and ``reset`` functionalities
+  - The flip-flop port names are defined differently in the standard cell library and in the VPR architecture. ``lib_name`` captures the port name defined in the standard cell, while ``prefix`` captures the port name defined in the ``pb_type`` of the VPR architecture file

-**SCFF example**
+Configuration-chain Flip-flop Example
+`````````````````````````````````````

-:numref:`fig_scff` illustrates an example of LUT modeling, which consists of input/output buffers and a transmission-gate-based tree structure.
+:numref:`fig_ccff` illustrates an example of a scan-chain flip-flop used to build a configuration chain.

-.. _fig_scff:
+.. _fig_ccff:

 .. figure:: ./figures/scff.png
-   :scale: 100%
+   :scale: 50%
   :alt: SCFF symbol

   An example of a Scan-Chain Flip-Flop.
@ -497,85 +620,103 @@ The code describing this FF is:

 .. code-block:: xml

-  <circuit_model type="scff" name="scff" prefix="scff" verilog_netlist="scff.v">
+  <circuit_model type="ccff" name="ccff" prefix="ccff" verilog_netlist="ccff.v" spice_netlist="ccff.sp">
    <port type="input" prefix="D" size="1"/>
    <port type="output" prefix="Q" size="2"/>
-    <port type="clock" prefix="clk" size="1" is_global="true"/>
+    <port type="clock" prefix="CK" size="1" is_global="true"/>
  </circuit_model>

-**This example shows:**
-	* Circuit model type as ``scff``
-	* The verilog netlist file associated to this component ``scff.v``
-	* 1 port, ``clk``, defined as global
+This example shows:
+  - A configuration-chain flip-flop which is defined in a Verilog netlist ``ccff.v`` and a SPICE netlist ``ccff.sp``
+  - The flip-flop has a global clock port, ``CK``, which will be wired to a global programming clock

 Hard Logics
-----------
+~~~~~~~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="hardlogic" name="string" prefix="string" netlist="string"/>
+  <circuit_model type="hard_logic" name="<string>" prefix="<string>" verilog_netlist="<string>" spice_netlist="<string>">
    <design_technology type="cmos"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
  </circuit_model>

 .. note:: Hard logics are defined for non-configurable resources in FPGA architectures, such as adders, multipliers and RAM blocks.
  Their circuit designs are highly dependent on the technology node and well optimized by engineers.
-  As more functional units are included in FPGA architecture, it is impossible to auto-generate these functional units [3].
-  Therefore, FPGA-Verilog/SPICE requires users to provide their customized Verilog/SPICE netlists. A sample Verilog/SPICE netlist of a 1-bit adder can be found in the directory SpiceNetlists in the released package.
+  As more functional units are included in FPGA architecture, it is impossible to auto-generate these functional units.
+  Therefore, FPGA-Verilog/SPICE requires users to provide their customized Verilog/SPICE netlists.

-  The information of input and output buffer should be clearly specified according to the customized Verilog/SPICE netlist! The existence of input/output buffers will influence the decision in creating testbenches, which may leads to larger errors in power analysis.
+.. note:: Examples can be found in hard_logic_example_link_

-Instructions of defining design parameters:
+.. _hard_logic_example_link: https://github.com/LNIS-Projects/OpenFPGA/tree/master/openfpga_flow/VerilogNetlists

-* **port:** two types of ports (``input`` and ``output``) should be defined. If the user provides a user-defined Verilog/SPICE netlist, the bandwidth of ports should be defined to the same as the Verilog/SPICE netlist.
+.. note:: The input and output buffers must be specified consistently with the customized Verilog/SPICE netlist! The existence of input/output buffers influences how SPICE testbenches are created; specifying it incorrectly may lead to larger errors in power analysis.

-Routing Wire Segments
---------------------
-
-FPGA-Verilog/SPICE provides two types of Verilog/SPICE models for the wire segments in FPGA architecture:
-
-	* One type is called ``wire``, which targets the local wires inside the logic blocks. The wire has one input and one output, directly connecting the output of a driver and the input of the downstream unit, respectively
-	* The other type is called ``chan_wire``, especially targeting the channel wires. The channel wires have one input and two outputs, one of which is connected to the inputs of Connection Boxes while the other is connected to the inputs of Switch Boxes. Two outputs are created because from the view of layout, the inputs of Connection Boxes are typically connected to the middle point of channel wires, which has less parasitic resistances and capacitances than connected to the ending point.
+1-bit Full Adder Example
+````````````````````````

 .. code-block:: xml

-  <circuit_model type="string" name="string" prefix="string" netlist="string"/>
+  <circuit_model type="hard_logic" name="adder" prefix="adder" spice_netlist="adder.sp" verilog_netlist="adder.v">
    <design_technology type="cmos"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
-    <wire_param model_type="string" res_val="float" cap_val="float" level="int"/>
+    <input_buffer exist="true" circuit_model_name="inv1x"/>
+    <output_buffer exist="true" circuit_model_name="inv1x"/>
+    <port type="input" prefix="a" size="1"/>
+    <port type="input" prefix="b" size="1"/>
+    <port type="input" prefix="cin" size="1"/>
+    <port type="output" prefix="cout" size="1"/>
+    <port type="output" prefix="sumout" size="1"/>
+  </circuit_model>
+
+Routing Wire Segments
+~~~~~~~~~~~~~~~~~~~~~
+
+An FPGA architecture requires two types of wire segments:
+
+  - ``wire``, which targets the local wires inside the logic blocks. The wire has one input and one output, directly connecting the output of a driver and the input of the downstream unit, respectively
+  - ``chan_wire``, which targets the channel wires. The channel wires have one input and two outputs, one of which is connected to the inputs of Connection Boxes while the other is connected to the inputs of Switch Boxes. Two outputs are created because, from the layout point of view, the inputs of Connection Boxes are typically connected to the middle point of channel wires, which sees less parasitic resistance and capacitance than a connection to the ending point.
+
+Template
+````````
+
+.. code-block:: xml
+
+  <circuit_model type="wire|chan_wire" name="<string>" prefix="<string>" spice_netlist="<string>" verilog_netlist="<string>">
+    <design_technology type="cmos"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
+    <wire_param model_type="<string>" R="<float>" C="<float>" num_level="<int>"/>
  </circuit_model>

 .. note:: FPGA-Verilog/SPICE can auto-generate the Verilog/SPICE model for wires while also allows users to provide their customized Verilog/SPICE netlists.

-  The information of input and output buffer should be clearly specified according to the customized netlist! The existence of input/output buffers will influence the decision in creating testbenches, which may leads to larger errors in power analysis.
+.. note:: The input and output buffers must be specified consistently with the customized netlist! The existence of input/output buffers influences how testbenches are created; specifying it incorrectly may lead to larger errors in power analysis.

-Instructions of defining design parameters:
+.. option:: <wire_param model_type="<string>" R="<float>" C="<float>" num_level="<int>"/>

-* **type:** can be [``wire`` | ``chan_wire``]. The Verilog/SPICE model wire targets the local wire inside the logic block while the chan_wire targets the channel wires in global routing.
+  - ``model_type="pi|T"`` Specify the type of RC model for this wire segment. Currently, OpenFPGA supports the π-type and T-type multi-level RC models.
+  - ``R="<float>"`` Specify the total resistance of the wire.
+  - ``C="<float>"`` Specify the total capacitance of the wire.
+  - ``num_level="<int>"`` Specify the number of levels of the RC wire model.
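+
+  A minimal sketch of a filled-in ``wire_param``, assuming the attribute names of the template above and reusing the values of the length-2 channel wire example (whose listing uses the ``res_val``/``cap_val``/``level`` spelling):
+
+  .. code-block:: xml
+
+    <!-- Illustrative only: 1-level pi-type RC model, 103.84 ohm and 13.80 fF in total -->
+    <wire_param model_type="pi" R="103.84" C="13.80e-15" num_level="1"/>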

-* **port:** two types of ports (``input`` and ``output``) should be defined. If the user provides an customized Verilog/SPICE netlist, the bandwidth of ports should be defined to the same as the Verilog/SPICE netlist.
+.. note:: Wire parameters are essential for FPGA-SPICE to accurately model wire parasitics.

-* **wire_param:**
-
-	* **model_type:** can be [``pi`` | ``T``], corresponding to the π-type and T-type RC wire models.
-	* **res_val:** specify the total resistance of the wire
-	* **cap_val:** specify the total capacitance of the wire.
-	* **level:** specify the number of levels of the RC wire model.
-
-**Chan-Wire example**
+Routing Track Wire Example
+``````````````````````````

 :numref:`fig_wire` depicts the modeling for a length-2 channel wire.

 .. _fig_wire:

 .. figure:: ./figures/wire.png
-   :scale: 100%
+   :scale: 80%
   :alt: map to buried treasure

   An example of a length-2 channel wire modeling
@ -587,46 +728,45 @@ The code describing this wire is:
  <circuit_model type="chan_wire" name="segment0" prefix="chan_wire"/>
    <design_technology type="cmos"/>
    <port type="input" prefix="mux_out" size="1"/>
-    <port type="output" prefix="cb_sb" size="2"/>
+    <port type="output" prefix="cb_sb" size="1"/>
    <wire_param model_type="pi" res_val="103.84" cap_val="13.80e-15" level="1"/>
  </circuit_model>

-**This example shows**
-	* How to use the ``wire_param`` for a π-type RC wire model
-	* How to use this circuit_model to auto-generate the Verilog/SPICE netlist
+This example shows:
+  - A routing track wire with 1 input and 1 output 
+  - The routing wire will be modelled as a 1-level π-type RC wire model with a total resistance of :math:`103.84\Omega` and a total capacitance of :math:`13.80fF`

 I/O pads
--------
+~~~~~~~~
+
+Template
+````````

 .. code-block:: xml

-  <circuit_model type="iopads" name="string" prefix="string" netlist="string"/>
+  <circuit_model type="iopad" name="<string>" prefix="<string>" spice_netlist="<string>" verilog_netlist="<string>">
    <design_technology type="cmos"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <port type="input" prefix="string" size="int"/>
-    <port type="output" prefix="string" size="int"/>
-    <port type="sram" prefix="string" size="int" mode_select="true|false" 
-	circuit_model_name="string" default_val="int"/>
+    <input_buffer exist="<string>" circuit_model_name="<string>"/>
+    <output_buffer exist="<string>" circuit_model_name="<string>"/>
+    <port type="input" prefix="<string>" size="<int>"/>
+    <port type="output" prefix="<string>" size="<int>"/>
+    <port type="sram" prefix="<string>" size="<int>" mode_select="<bool>" circuit_model_name="<string>" default_val="<int>"/>
  </circuit_model>

 .. note::  The circuit designs of I/O pads are highly dependent on the technology node and well optimized by engineers.
  Therefore, FPGA-Verilog/SPICE requires users to provide their customized Verilog/SPICE/Verilog netlists. A sample Verilog/SPICE netlist of an I/O pad can be found in the directory SpiceNetlists in the released package.

-  The information of input and output buffer should be clearly specified according to the customized netlist! The existence of input/output buffers will influence the decision in creating testbenches, which may leads to larger errors in power analysis.
+.. note:: The information of input and output buffers should be clearly specified according to the customized netlist! The existence of input/output buffers will influence the decisions made in creating testbenches, which may lead to larger errors in power analysis.

-Instructions of defining design parameters:
+I/O Pad Example
+```````````````

-* **port:** four types of ports (``input``, ``output``, ``inout`` and ``sram``) should be defined. If the user provides a user-defined Verilog/SPICE netlist, the bandwidth of ports should be defined to the same as the Verilog/SPICE netlist.
-
-**IO-pad example**
-
-:numref:`fig_iopad` depicts an IO-Pad.
+:numref:`fig_iopad` depicts an I/O pad.

 .. _fig_iopad:

 .. figure:: ./figures/iopad.png
-   :scale: 100%
+   :scale: 50%
   :alt: IO-Pad symbol

   An example of an IO-Pad
@ -635,16 +775,19 @@ The code describing this IO-Pad is:

 .. code-block:: xml

-  <circuit_model type="iopad" name="iopad" prefix="iopad" verilog_netlist="io.v">
+  <circuit_model type="iopad" name="iopad" prefix="iopad" spice_netlist="io.sp" verilog_netlist="io.v">
+    <design_technology type="cmos"/>
+    <input_buffer exist="true" circuit_model_name="INVTX1"/>
+    <output_buffer exist="true" circuit_model_name="INVTX1"/>
+    <pass_gate_logic circuit_model_name="TGATE"/>
    <port type="inout" prefix="pad" size="1"/>
-    <port type="sram" prefix="dir" size="1" circuit_model_name="scff"/>
-    <port type="input" prefix="data_in" size="1"/>
-    <port type="input" prefix="zin" size="1" is_global="true"/>
-    <port type="output" prefix="data out" size="1"/>
+    <port type="sram" prefix="en" size="1" mode_select="true" circuit_model_name="ccff" default_val="1"/>
+    <port type="input" prefix="outpad" size="1"/>
+    <port type="output" prefix="inpad" size="1"/>
  </circuit_model>

-**This example shows**
-
-	* The association of the verilog netlist file ``io.v``
-	* The inout pad port_type, which means as inout as output.
-	* The instantiation of a SCFF as sram
+This example shows
+  - A general purpose I/O cell defined in Verilog netlist ``io.v`` and SPICE netlist ``io.sp``
+  - The I/O cell has an ``inout`` port as the bi-directional port
+  - The directionality of I/O can be controlled by a configuration-chain flip-flop defined in circuit model ``ccff``
+  - If unused, the I/O cell will be configured to ``1``
--- a/docs/source/arch_lang/circuit_modules.rst
+++ b/docs/source/arch_lang/circuit_modules.rst
@ -1,104 +0,0 @@
-Define Circuit-level Modules
-============================
-
-To support FPGA Verilog/SPICE, Verily and Bitstream Generator, physical modules containing gate-level and transistor-level features are required for FPGA primitive blocks.
-The physical modules are defined in XML syntax, similar to the original VPR FPGA architecture description language.
-
-For each module that appears in the FPGA architecture, a circuit model should be defined. In the definition of a circuit model, the user can specify if the Verilog/SPICE netlist of the module is either auto-generated or user-defined.
-
-Define circuit_models
---------------------
-
-.. code-block:: xml
-
-  <module_circuit_models>
-    <circuit_model type="string" name="string" prefix="string" is_default="int" 
-    circuit_netlist="string" verilog_netlist="string" dump_structural_verilog="string">
-      <transistor-level circuit_design_features="developped_further" />
-    </circuit_model>
-  </module_circuit_models>
-
-* **module_circuit_models**: the father node for all the circuit models. All the circuit models should be defined under this XML node.
-
-    * **circuit_model**: the child node defining transistor-level modeling parameters.
-
-        * **type**: can be [ ``inv_buf`` | ``pass_gate`` | ``gate`` | ``mux`` | ``wire`` | ``chan_wire`` | ``sram`` | ``lut`` | ``ff`` | ``scff`` | ``hard_logic`` | ``iopad`` ]. Specify the type of circuit model. The provided types cover all the modules in FPGAs. For the circuit models in the type of mux/wire/chan_wire/lut, FPGA-Verilog/SPICE can auto-generate Verilog/SPICE netlists. For the rest, FPGA-Verilog/SPICE requires a user-defined Verilog/SPICE netlist.
-
-        * **name**: define the name of this circuit model. The name should be unique and will be used to create the sub-circuit of the circuit model in Verilog/SPICE netlists. Note that for a customized Verilog/SPICE netlist, the name defined here should be the name of the top-level sub-circuit in the customized Verilog/SPICE netlist. FPGA-Verilog/SPICE will check if the given name is conflicted with any reserved words.
-
-        * **prefix**: specify the name of the circuit_model to shown in the auto-generated Verilog/SPICE netlists. The prefix can be the same as the name defined above. And again, the prefix should be unique.
-
-        * **is_default**: can be [``1`` | ``0``], corresponding to [``true`` | ``false``] respectively. Specify this circuit model is the default one for some modules, such as multiplexers. If a module is not linked to any circuit model by users, FPGA-Verilog/SPICE will find the default circuit model defined in the same type and link.  For a circuit model type, only one circuit model can be set as default.
-
-        * **circuit_netlist**: specify the path and file name of a customized Verilog/SPICE netlist. For some modules such as SRAMs, FFs, inpads, and outpads, FPGA-Verilog/SPICE does not support auto-generation of the transistor-level sub-circuits because their circuit design is highly dependent on the technology nodes. These circuit designs should be specified by users. For the other modules that can be auto-generated by FPGA-Verilog/SPICE, the user can also define a custom netlist. Multiplexers cannot be user-defined.
-
-        * **verilog_netlist**: specify the path and file name of a customized Verilog netlist. For some modules such as SRAMs, FFs, inpad and outpads, FPGA-Verilog/SPICE does not support auto-generation of the transistor-level sub-circuits because their circuit design is highly dependent on the technology nodes. These circuit designs should be specified by users. For the other modules that can be auto-generated by FPGA-Verilog/SPICE, the user can also define a custom netlist. Multiplexers cannot be user-defined.
-
-        * **dump_structural_verilog**: when the value of this keyword is set to be true, Verilog generator will output gate-level netlists of this module, instead of behavior-level. Gate-level netlists bring more opportunities in layout-level optimization while behavior-level is more suitable for high-speed formal verification and easier in debugging with HDL simulators.
-
-.. note:: If netlist is not specified, FPGA-Verilog/SPICE auto-generates the Verilog/SPICE netlists for multiplexers, wires, and LUTs.
-
-.. note:: The user-defined netlists, such as LUTs, the decoding methodology should comply with the auto-generated LUTs (See Section 4.5)
-
-.. note:: Under the XML node circuit_model, the features of transistor-level designs can be defined. In the following table, we show the common features supported for all the modules.  Then, we will introduce unique features supported only for some circuit models types.
-
-
-Transistor level
----------------
-
-.. code-block:: xml
-
-  <circuit_model type="string" name="string" prefix="string" is_default="int" netlist="string" 
-  dump_structural_verilog="string">
-    <design_technology type="string"/>
-    <input_buffer exist="string" circuit_model_name="string"/>
-    <output_buffer exist="string" circuit_model_name="string"/>
-    <pass_gate_logic type="string" circuit_model_name="string"/>
-    <port type="string" prefix="string" lib_name="string" size="int" default_val="int" circuit_model_name="string" 
-    mode_select="boolean" is_global="boolean" is_set="boolean" is_reset="boolean" 
-    is_config_enable="boolean"/>
-  </circuit_model>
-
-* design_technology :
-
-    * **type:** [cmos|rram]. Specify the type of design technology of the circuit_model.
-
-.. note:: Currently, the RRAM-based designs are only supported for multiplexers.
-
-* input_buffer and output_buffer:
-    
-    * **exist:** [on|off]. Define the existence of the input_buffer or output_buffer. Note that the existence is valid for all the inputs and outputs. Note that if users want only part of the inputs (or outputs) to be buffered, this is not supported here. A solution can be building a user-defined Verilog/SPICE netlist.
-
-    * **circuit_model_name:** Specify the name of circuit model which is used to implement input/output buffer, the type of specified circuit model should be inv_buf.
-
-* pass_gate_logic: defined the parameters in pass-gates, which are used in building multiplexers and LUTs.
-
-    * **circuit_model_name:** Specify the name of the circuit model which is used to implement transmission gate, the type of specified circuit model should be pass_gate.
-
-* port: define the port list of a circuit model.
-
-    * **type:** can be [input|output|sram|clock]. For programmable modules, such as multiplexers and LUTs, SRAM ports should be defined. For registers, such as FFs and memory banks, clock ports should be defined.
-
-    * **prefix:** the name of the port to appear in the autogenerated netlists. Each port will be shown as ``<prefix>[i]`` in Verilog/SPICE netlists.
-
-    * **lib_name:** the name of the port defined in standard cells or customized cells. If not specified, this attribute will be the same as ``prefix``.
-
-    * **size:** bandwidth of the port.
-
-    * **default_val:**  default logic value of a port, which is used as the initial logic value of this port in testbench generation. Can be either 0 or 1. We assume each pin of this port has the same default value.
-
-    * **circuit_model_name:** only valid when the type of port is sram. Specify the name of the circuit model which is connected to this port.
-
-    * **mode_select:** can be either ``true`` or ``false``. Specify if this port controls the mode switching in a configurable logic block. Only valid when the type of this port is sram. (A configurable logic block can operate in different modes, which is controlled by SRAM bits.)
-
-    * **is_global:** can be either ``true`` or ``false``. Specify if this port is a global port, which will be routed globally. Note that when multiple global ports are defined with the same name, these global ports will be short-wired together.
-
-    * **is_set:** can be either ``true`` or ``false``. Specify if this port controls a set signal. Only valid when ``is_global`` is true. All the set ports are connected to global set voltage stimuli in testbenches.
-
-    * **is_reset:** can be either ``true`` or ``false``. Specify if this port controls a reset signal. Only valid when ``is_global`` is true. All the reset ports are connected to a global reset voltage stimuli in testbenches.
-
-    * **is_config_enable:** can be either ``true`` or ``false``. Only valid when ``is_global`` is true. Specify if this port controls a configuration-enable signal. This port is only enabled during FPGA configuration, and always disabled during FPGA operation. All the ``config_enable`` ports are connected to global configuration-enable voltage stimuli in testbenches.
-
-.. note::  Different types of ``circuit_model`` have different XML syntax, with which users can highly customize their circuit topologies. See refer to examples of ``circuit_model`` for more details.
-
-.. note:: Note that we have a list of reserved port names, which indicate the usage of these ports when building FPGA fabrics. Please do not use ``mem_out``, ``mem_inv``, ``bl``, ``wl``, ``blb``, ``wlb``, ``ccff_head`` and ``ccff_tail``.
--- a/docs/source/arch_lang/direct_interconnect.rst
+++ b/docs/source/arch_lang/direct_interconnect.rst
@ -1,10 +1,12 @@
-Interconnection extensions
-==========================
+.. _direct_interconnect:
+
+Inter-Tile Direct Interconnection extensions
+--------------------------------------------

 This section introduces extensions on the architecture description file about existing interconnection description.

 Directlist
----------
+~~~~~~~~~~

 The original direct connections in the directlist section are documented here_. Its description is given below:

@ -26,40 +28,49 @@ Our extension include three more options:
    <direct name="string" from_pin="string" to_pin="string" x_offset="int" y_offset="int" z_offset="int" switch_name="string" interconnection_type="string" x_dir="string" y_dir="string"/>
  </directlist>

-.. note:: these options are optional. However, if *interconnection_type* is set *x_dir* and *y_dir* are required.
+.. note:: These options are optional. However, if ``interconnection_type`` is set, ``x_dir`` and ``y_dir`` are required.

-* **interconnection_type**: [``NONE`` | ``column`` | ``row``], specifies if it applies on a column or a row ot if it doesn't apply.
+.. option:: interconnection_type="<string>"

-* **x_dir**: [``positive`` | ``negative``], specifies if the next cell to connect has a bigger or lower x value. Considering a coordinate system where (0,0) is the origin at the bottom left and *x* and *y* are positives: 
+  The type of interconnection should be a string.
+  Available types are ``NONE`` | ``column`` | ``row``. This specifies whether the connection applies to a column or a row, or does not apply at all.

-    * x_dir="positive": 
+.. option:: x_dir="<string>"

-        * interconnection_type="column": a column will be connected to a column on the **right**, if it exists.
+  Available directionalities are ``positive`` | ``negative``. This specifies whether the next cell to connect has a larger or smaller ``x`` value.
+  Consider a coordinate system where (0,0) is the origin at the bottom left and ``x`` and ``y`` are positive:

-        * interconnection_type="row": the most on the **right** cell from a row connection will connect the most on the **left** cell of next row, if it exists.
+    - x_dir="positive": 

-    * x_dir="negative": 
+        - interconnection_type="column": a column will be connected to a column on the ``right``, if it exists.

-        * interconnection_type="column": a column will be connected to a column on the **left**, if it exists.
+        - interconnection_type="row": the cell furthest to the ``right`` of a row will be connected to the cell furthest to the ``left`` of the next row, if it exists.

-        * interconnection_type="row": the most on the **left** cell from a row connection will connect the most on the **right** cell of next row, if it exists.
+    - x_dir="negative": 

-* **y_dir**: [``positive`` | ``negative``], specifies if the next cell to connect has a bigger or lower x value. Considering a coordinate system where (0,0) is the origin at the bottom left and *x* and *y* are positives:
+        - interconnection_type="column": a column will be connected to a column on the ``left``, if it exists.

-    * y_dir="positive": 
+        - interconnection_type="row": the cell furthest to the ``left`` of a row will be connected to the cell furthest to the ``right`` of the next row, if it exists.

-        * interconnection_type="column": the **bottom** cell of a column will be connected to the next column **top** cell, if it exists.
+.. option:: y_dir="<string>"

-        * interconnection_type="row": a row will be connected on an **above** row, if it exists.
+  Available directionalities are ``positive`` | ``negative``. This specifies whether the next cell to connect has a larger or smaller ``y`` value.
+  Consider a coordinate system where (0,0) is the origin at the bottom left and ``x`` and ``y`` are positive:

-    * y_dir="negative": 
+    - y_dir="positive": 

-        * interconnection_type="column": the **top** cell of a column will be connected to the next column **bottom** cell, if it exists.
+        - interconnection_type="column": the ``bottom`` cell of a column will be connected to the ``top`` cell of the next column, if it exists.

-        * interconnection_type="row": a row will be connected on a row **below**, if it exists.
+        - interconnection_type="row": a row will be connected to the row ``above``, if it exists.
+
+    - y_dir="negative": 
+
+        - interconnection_type="column": the ``top`` cell of a column will be connected to the ``bottom`` cell of the next column, if it exists.
+
+        - interconnection_type="row": a row will be connected to the row ``below``, if it exists.

 Example
-------
+~~~~~~~

 For this example, we will study a scan-chain implementation. The description could be:

@ -81,7 +92,7 @@ For this example, we will study a scan-chain implementation. The description cou
 In this figure, the red arrows represent the initial direct connection. The green arrows represent the point to point connection to connect all the columns of CLB.

 Truth table
-----------
+~~~~~~~~~~~

 A point to point connection can be applied in different ways than showed in the example section. To help the designer implement his point to point connection, a truth table with our new parameters id provided below.

--- a/docs/source/arch_lang/figures/ccff_fpga.png
+++ b/docs/source/arch_lang/figures/ccff_fpga.png
--- a/docs/source/arch_lang/figures/thru_channel.png
+++ b/docs/source/arch_lang/figures/thru_channel.png
--- a/docs/source/arch_lang/generality.rst
+++ b/docs/source/arch_lang/generality.rst
@ -1,8 +1,46 @@
+.. _generality:
+
 General Hierarchy
-=================
+-----------------
+
+For OpenFPGA using VPR7
+~~~~~~~~~~~~~~~~~~~~~~~
+
 The extension of the VPR architectural description language is developed as an independent branch of the original one. Most of the FPGA-SPICE descriptions are located under a XML node called <spice_settings>, which is a child node under the root node <architecture>. 
 Under the <spice_settings>, some child node is created for describing SPICE simulation settings, technology library and transistor-level modeling of circuit modules.
 In the following sub-sections, we will introduce the structures of these XML nodes and the parameters provided.

+For OpenFPGA using VPR8
+~~~~~~~~~~~~~~~~~~~~~~~
+
+OpenFPGA uses an XML file that is separate from the VPR8 architecture description file.
+This keeps the integration with VPR8 loose, so that OpenFPGA can integrate any future version of VPR with the least engineering effort.
+However, to implement a physical FPGA, OpenFPGA requires the original VPR XML to include full physical design details.
+Full syntax can be found in :ref:`addon_vpr_syntax`.
+
+The OpenFPGA architecture description XML file consists of the following parts:
+
+  - ``<openfpga_architecture>`` contains architecture-level information, such as device-level description, circuit-level and architecture annotations to original VPR architecture XML. It consists of the following code blocks
+
+    - ``<circuit_library>`` includes a number of ``circuit_model`` definitions, each of which describes a primitive block in the FPGA architecture, such as Look-Up Tables and multiplexers. Full syntax can be found in :ref:`circuit_library`.
+    - ``<technology_library>`` includes transistor-level parameters, where users can specify which transistor models are going to be used when building each ``circuit_model``.
+    - ``<configuration_protocol>`` includes a detailed description of the configuration protocols to be used in the FPGA fabric.
+    - ``<connection_block>`` includes annotation on the connection block definition ``<connection_block>`` in original VPR XML
+    - ``<switch_block>`` includes annotation on the switch block definition ``<switchlist>`` in original VPR XML
+    - ``<routing_segment>`` includes annotation on the routing segment definition ``<segmentlist>`` in original VPR XML
+    - ``<direct_connection>`` includes annotation on the inter-tile direct connection definition ``<directlist>`` in original VPR XML
+    - ``<pb_type_annotation>`` includes annotation on the programmable block architecture ``<complexblocklist>`` in original VPR XML
+
+  - ``<openfpga_simulation_setting>`` includes all the parameters used to generate testbenches for simulation. Full syntax can be found in :ref:`simulation_setting`.
+
+    - ``<clock_setting>`` defines the clock-related settings in simulation, such as clock frequency and number of clock cycles to be used
+    - ``<simulator_option>`` defines universal options available in both HDL and SPICE simulators. This is mainly used by FPGA-SPICE
+    - ``<monte_carlo>`` defines critical parameters to be used in Monte Carlo simulations. This is used by FPGA-SPICE
+    - ``<measurement_setting>`` defines the parameters used to measure signal slew and delays. This is used by FPGA-SPICE
+    - ``<stimulus>`` defines the parameters used to generate voltage stimuli in testbenches. This is used by FPGA-SPICE
+
+.. note:: ``<technology_library>`` will be applied to ``circuit_model`` when running FPGA-SPICE. It will not impact FPGA-Verilog, FPGA-Bitstream, FPGA-SDC.
+
+.. note:: The parameters in ``<clock_setting>`` will be applied to both FPGA-Verilog and FPGA-SPICE simulations.
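+
+As a rough orientation, the hierarchy listed above can be pictured as the skeleton below. This is only a sketch assembled from the node names in this section: all attributes and child contents are omitted, and how the two top-level nodes are packaged (in one file or several, with or without a common wrapper) is an assumption that this section does not specify.
+
+.. code-block:: xml
+
+  <!-- Illustrative skeleton only: attributes and child contents are omitted -->
+  <openfpga_architecture>
+    <circuit_library/>          <!-- circuit_model definitions -->
+    <technology_library/>       <!-- transistor-level parameters (FPGA-SPICE) -->
+    <configuration_protocol/>   <!-- configuration protocol of the fabric -->
+    <connection_block/>         <!-- annotation on VPR connection blocks -->
+    <switch_block/>             <!-- annotation on VPR <switchlist> -->
+    <routing_segment/>          <!-- annotation on VPR <segmentlist> -->
+    <direct_connection/>        <!-- annotation on VPR <directlist> -->
+    <pb_type_annotation/>       <!-- annotation on VPR <complexblocklist> -->
+  </openfpga_architecture>
+
+  <openfpga_simulation_setting>
+    <clock_setting/>            <!-- clock frequency, number of clock cycles -->
+    <simulator_option/>         <!-- options shared by HDL and SPICE simulators -->
+    <monte_carlo/>              <!-- Monte Carlo parameters (FPGA-SPICE) -->
+    <measurement_setting/>      <!-- slew and delay measurement (FPGA-SPICE) -->
+    <stimulus/>                 <!-- voltage stimuli in testbenches (FPGA-SPICE) -->
+  </openfpga_simulation_setting>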


--- a/docs/source/arch_lang/index.rst
+++ b/docs/source/arch_lang/index.rst
@ -1,6 +1,3 @@
-Extended Architecture Description Language
-==========================================
-
 .. _arch_lang:
   Extended FPGA Architecture Description Language
 
@ -9,16 +6,18 @@ Extended Architecture Description Language

   generality

-   interconnect
-   
-   spice_sim_setting
+   addon_vpr_syntax

-   tech_lib
+   direct_interconnect 
+   
+   simulation_setting
+
+   technology_library
  
-   circuit_modules
+   circuit_library
  
   circuit_model_examples
  
-   link_circuit_modules
+   annotate_vpr_arch

   
--- a/docs/source/arch_lang/link_circuit_modules.rst
+++ b/docs/source/arch_lang/link_circuit_modules.rst
@ -1,184 +0,0 @@
-Link circuit modules
--------------------
-Each defined circuit model should be linked to an FPGA module defined in the original part of architecture descriptions. It helps FPGA-circuit creating the circuit netlists for logic/routing blocks. Since the original part lacks such support, we create a few XML properties to link to Circuit models.
-
-SRAM
-====
-
-To link the defined circuit model of SRAM into the FPGA architecture description, a new line in XML format should be added under the XML node device. The new XML node is named as sram, which defines the area of an SRAM and the name of the circuit model to be linked. An example is shown as follows:
-
-.. code-block:: xml
-
-  <sram area=”int” circuit_model_name=”string”>
-
-  <sram>
-    <spice organization="string" circuit_model_name="scff"/>
-    <verilog organization="string" circuit_model_name="scff"/>
-  </sram>
-
-* **area:** is expressed in terms of the number of minimum width transistors. The SRAM area defined in this line is used in the area estimation of global routing multiplexers. circuit_model_name should match the name of the circuit model that has been defined under XML node module_circuit_model. The type of the linked circuit model should be sram.
-
-* **organization:** [scan-chain|memory_bank|standalone], is the type of configuration circuits.
-
-:numref:`fig_sram` illustrates an example where a memory organization using memory decoders and 6-transistor SRAMs.
-
-.. _fig_sram:
-
-.. figure:: figures/sram.png
-   :scale: 100%
-   :alt: map to buried treasure
- 
-   Example of a memory organization using memory decoders 
-
-.. note:: Currently circuit only supports standalone memory organization.
-
-.. note:: Currently RRAM-based FPGA only supports memory-bank organization for Verilog Generator.
-
-Here is an example.
-
-.. code-block:: xml
-
-  <sram area=”4” circuit_model_name=”sram6T”>
-
-
-Switch Boxes
-=============
-
-Original VPR architecture description contains an XML node called switchlist under which all the multiplexers of switch blocks are described.
-To link a defined circuit model to a multiplexer in the switch blocks, a new XML property circuit_model_name should be added to the descriptions.
-
-Here is an example:
-
-.. code-block:: xml
-
-  <switchlist>
-    <switch type=”mux” name=”string” R=”float” Cin=”float” Cout=”float” Tdel=”float” mux_trans_size=”float” buf_size=”float” circuit_model_name=”string”/>
-  </switchlist>
-
-* **circuit_model_name:** should match a circuit model whose type is mux defined under module_circuit_models.
-
-
-Connection Blocks
-==================
-
-To link the defined circuit model of the multiplexer to the Connection Blocks, a circuit_model_name should be added to the definition of Connection Blocks switches.  However, the original architecture descriptions do not offer a switch description for connection boxes as they do for the switch blocks.
-Therefore, FPGA-circuit requires a new XML node called **cblock** under the root XML node architecture, where a switch for connection blocks can be defined.
-
-Here is the example:
-
-.. code-block:: xml
-
-  <cblock>
-    <switch type=”mux” name=”string” R=”float” Cin=”float” Cout=”float” Tdel=”float” mux_trans_size=”float” buf_size=”float” circuit_model_name=”string”/>
-  </cblock>
-
-* **circuit_model_name:** should match a circuit model whose type is mux defined under module_circuit_models.
-
-Channel Wire Segments
-=====================
-
-Similar to the Switch Boxes and Connection Blocks, the channel wire segments in the original architecture descriptions can be adapted to provide a link to the defined circuit model.
-
-.. code-block:: xml
-
-  <segmentlist>
-    <segment freq=”float” length=”int” type=”string” Rmetal=”float” Cmetal=”float” circuit_model_name=”string”/>
-  </segmentlist>
-
-* circuit_model_name: should match a circuit model whose type is chan_wire defined under module_circuit_models.
-
-Primitive Blocks inside Multi-mode Configurable Logic Blocks
-=============================================================
-
-The architecture description employs a hierarchy of ``pb_types`` to depict the sub-modules and complex interconnections inside logic blocks. Each leaf node and interconnection in the pb_type hierarchy should be linked to a circuit model.
-Each primitive block, i.e., the leaf ``pb_types``, should be linked to a valid circuit model, using the XML syntax ``circuit_model_name``.
-The ``circuit_model_name`` should match the given name of a ``circuit_model`` defined by users.
-
-.. code-block:: xml
-  
-  <!-- Multi-mode BLE -->
-  <pb_type name="ble" num_pb="10" physical_mode_name="ble_phy"/>
-    <!-- Physical implementation of BLE shown in Fig. :ref:`` --> 
-    <mode name="ble_phy" disabled_in_packing="true"/>
-      <!-- Define a 6-input LUT in BLE and link it to circuit model -->
-      <pb_type name="flut6_phy" circuit_model_name="frac_lut6">
-        <input name="in" num_pins="6"/>
-        <output name="lut4_out" num_pins="4"/>
-        <output name="lut5_out" num_pins="2"/>
-        <output name="lut6_out" num_pins="1"/>
-      </pb_type>
-      <pb_type name="lut4_phy" circuit_model_name="lut4">
-        <input name="in" num_pins="4"/>
-        <output name="out" num_pins="1"/>
-      </pb_type>
-      <pb_type name="adder_phy" num_pb="2" circuit_model_name="adder">
-        <input name="a" num_pins="1"/>
-        <input name="b" num_pins="1"/>
-        <input name="cin" num_pins="1"/>
-        <output name="cout" num_pins="1"/>
-        <output name="sumout" num_pins="1"/>
-      </pb_type>
-      <pb_type name="ff_phy" num_pb="2" circuit_model_name="dff">
-        <input name="D" num_pins="1"/>
-        <output name="Q" num_pins="1"/>
-        <clock name="clk" num_pins="1"/>
-      </pb_type>
-      <interconnect>
-      <!-- Routing multiplexers are omitted in this example. -->
-      </interconnect>
-    </mode>
-    <!-- Arithmetic mode of BLE shown in Fig. 2(b)-->
-    <mode name="flut4_arithmetic"/>
-      <pb_type name="flut4_arith" num_pb="4"/>
-        <!-- Define a virtual 4-input LUT in BLE and link it to physical 6-input LUT defined at LINE 6 -->
-        <pb_type name="lut4" mode_bits="01" physical_pb_type_name="flut6_phy">
-          <!-- Define an input port and link it to its physical port defined at LINE 7 -->
-          <input name="in" num_pins="4" physical_mode_pin="in[3:0]"/>
-          <!-- Define an output port and link it to its physical port defined at LINE 8 -->
-          <output name="out" num_pins="1" physical_mode_pin="lut4_out"/>
-        </pb_type>
-        <pb_type name="adder" num_pb="2" physical_pb_type_name="adder_phy">
-          <input name="a" num_pins="1" physical_mode_pin="a"/>
-          <input name="b" num_pins="1" physical_mode_pin="b"/>
-          <input name="cin" num_pins="1" physical_mode_pin="cin"/>
-          <output name="cout" num_pins="1" physical_mode_pin="cout"/>
-          <output name="sumout" num_pins="1" physical_mode_pin="sumout"/>
-        </pb_type>
-        <pb_type name="ff" num_pb="2" physical_pb_type_name="ff_phy">
-          <input name="D" num_pins="1" physical_mode_pin="D"/>
-          <output name="Q" num_pins="1" physical_mode_pin="Q"/>
-          <clock name="clk" num_pins="1" physical_mode_pin="clk"/>
-        </pb_type>
-        <interconnect>
-        <!-- Routing multiplexers are omitted in this example. Full details can be found in [21] -->
-        </interconnect>
-      </pb_type>
-    </mode>
-  <!-- More operating modes can be defined -->
-  </pb_type>
-
-* **physical_mode_name:** tell the name of the mode that describes the physical implementation of the configurable block. This is critical in modeling actual circuit designs and architecture of an FPGA. Typically, only one physical_mode should be specified for each multi-mode ``pb_type``.
-
-* **idle_mode_name:** tell the name of the mode that the ``pb_type`` is configured to be by default. This is critical in building circuit netlists for unused logic blocks.
-
-* **circuit_model_name:** should match a circuit model defined under ``module_circuit_models``. The ``circuit_model_name`` is mandatory for every leaf ``pb_type`` in a physical_mode ``pb_type``. For the interconnection type direct, the type of the linked circuit model should be wire. For multiplexers, the type of linked circuit model should be ``mux``. For complete, the type of the linked circuit model can be either ``mux`` or ``wire``, depending on the case.
-
-* **mode_bits** specifies the configuration bits for the ``circuit_model`` when operating at an operating mode. The length of ``mode_bits`` should match the ``port`` size defined in ``circuit_model``. The ``mode_bits`` should be derived from circuit designs while users are responsible for its correctness. FPGA-Bitstreamm will add the ``mode_bits`` during bitstream generation.
-
-* **physical_pb_type_name** creates the link on ``pb_type`` between operating and physical modes. This syntax is mandatory for every leaf ``pb_type`` in an operating mode ``pb_type``. It should be a valid name of leaf ``pb_type`` in physical mode.   
-
-* **physical_pb_type_index_factor** aims to align the indices for ``pb_type`` between operating and physical modes, especially when an operating mode contains multiple ``pb_type`` (``num_pb``>1) that are linked to the same physical ``pb_type``. When ``physical_pb_type_name`` is larger than 1, the  index of ``pb_type`` will be multipled by the given factor. 
-
-* **physical_pb_type_index_offset** aims to align the indices for ``pb_type`` between operating and physical modes, especially when an operating mode contains multiple ``pb_type`` (``num_pb``>1) that are linked to the same physical ``pb_type``. When ``physical_pb_type_name`` is larger than 1, the  index of ``pb_type`` will be shifted by the given factor. 
-
-* **physical_mode_pin** creates the linke on ``port`` of ``pb_type`` between operating and physical modes. This syntax is mandatory for every leaf ``pb_type`` in an operating mode ``pb_type``. It should be a valid ``port`` name of leaf ``pb_type`` in physical mode and the port size should also match. 
-
-* **physical_mode_pin_rotate_offset** aims to align the pin indices for ``port`` of ``pb_type`` between operating and physical modes, especially when an operating mode contains multiple ``pb_type`` (``num_pb``>1) that are linked to the same physical ``pb_type``. When ``physical_mode_pin_rotate_offset`` is larger than zero, the pin index of ``pb_type`` (whose index is large than 1) will be shifted by the given offset. 
-
-.. note::
-  It is highly recommended that only one physical mode is defined for a multi-mode configurable block. Try not to use nested physical mode definition. This will ease the debugging and lead to clean XML description. 
-
-.. note::
-  Be careful in using ``physical_pb_type_index_factor``, ``physical_pb_type_index_offset`` and ``physical_mode_pin_rotate_offset``! Try to avoid using them unless for highly complex configuration blocks with very deep hierarchy. 
-
-
--- a/docs/source/arch_lang/simulation_setting.rst
+++ b/docs/source/arch_lang/simulation_setting.rst
@ -0,0 +1,373 @@
+.. _simulation_setting:
+
+Simulation settings
+-------------------
+
+For OpenFPGA using VPR7
+~~~~~~~~~~~~~~~~~~~~~~~
+
+All the parameters that need to be defined in the HSPICE simulations are located under a child node called <parameters>, which is under its parent node <spice_settings>. 
+The parameters are divided into three categories and can be defined in three XML nodes, <options>, <measure> and <stimulate>, respectively. 
+
+* The XML node <options>
+
+.. code-block:: xml
+
+   <options sim_temp="int" post="string" captab="string" fast="string"/> 
+
+These properties define the options that will be printed in the top SPICE netlists.
+
+* **sim_temp:** specify the temperature which will be defined in SPICE netlists. In the top SPICE netlists, it will show as .temp <int>.
+
+* **post:** [on|off]. Specify if the simulation waveforms should be printed out after SPICE simulations. In all the SPICE netlists, it will show as .option POST when turned on.
+
+.. note:: when the SPICE netlists are large or a long simulation duration is defined, the post option is recommended to be off. If not, huge disk space will be occupied by the waveform files.
+
+* **captab:** [on|off]. Specify if the capacitances of all the nodes in the SPICE netlists will be printed out. In the top SPICE netlists, it will show as .option CAPTAB when turned on. When turned on, the SPICE simulation runtime may increase.
+
+* The XML node <stimulate>
+
+.. code-block:: xml
+
+    <stimulate>
+      <clock op_freq="auto|float" sim_slack="float" prog_freq="float">
+        <rise slew_time="float" slew_type="string"/>
+        <fall slew_time="float" slew_type="string"/>
+      </clock>
+    </stimulate>
+
+Define the stimuli for the clock signal.
+
+* **op_freq:** either auto or a float number (unit: [Hz]). Specify the operating clock frequency that is used in SPICE simulations. This frequency is used in testbenches for operation phase simulation. Note that this is a mandatory option: users must either assign "auto", so that the frequency is determined automatically, or give an exact number. If an exact clock frequency is specified, the sim_slack option is disregarded.
+
+* **sim_slack:** add slack to the critical path delay in the SPICE simulation. For example, sim_slack=0.2 implies that the clock period in SPICE simulations is 1.2 times the critical path delay reported by VPR. **Only valid when op_freq is set to auto, i.e., when no exact frequency is given.**
+
+* **prog_freq:** Specify the programming clock frequency that is used in SPICE simulations. This frequency is used in testbenches for programming phase simulation.
+
+* **slew_type & slew_time:** define the slew of clock signals at the rising/falling edge. Property slew_type can be either absolute or fractional [abs|frac]. 
+
+    * The type of **absolute** implies that the slew time is the absolute value. For example, slew_time=20e-12, slew_type=abs means that the slew of a clock signal is  20ps. 
+    * The type of **fractional** means that the slew time is related to the period (frequency) of the clock signal. For example, slew_time=0.05, slew_type=frac means that the slew of a clock signal takes 5% of the period of the clock.
+
+:numref:`fig_meas_edge` depicts the definition of the slew and delays of signals and the parameters that can be supported by FPGA-SPICE.
+
+.. code-block:: xml
+
+     <stimulate>
+       <input>
+         <rise slew_time="float" slew_type="string"/>
+         <fall slew_time="float" slew_type="string"/>
+       </input>
+     </stimulate>
+
+Define the slew of input signals at the rising/falling edge.
+
+* **slew_type & slew_time:** define the slew of all the input signals at the rising/falling edge. Property slew_type can be either absolute or fractional [abs|frac]. 
+
+    * The type of **absolute** implies that the slew time is the absolute value. For example, slew_time=20e-12, slew_type=abs means that the slew of an input signal is 20ps. 
+
+    * The type of **fractional** means that the slew time is related to the period (frequency) of the clock signal. For example, slew_time=0.05, slew_type=frac means that the slew of an input signal takes 5% of the period of the clock.
+
+.. note:: These slew settings are valid for all the input signals of the testbenches in different complexity levels.
+
+.. _fig_meas_edge:
+
+.. figure:: figures/meas_edge.png 
+   :scale: 100%
+   :alt: Parameters in measuring the slew and delay of signals
+  
+   Parameters in measuring the slew and delay of signals
+
+* The XML node <measure>
+
+.. code-block:: xml
+    
+   <measure sim_num_clock_cycle="int" accuracy="float" accuracy_type="string"/>
+
+* **sim_num_clock_cycle:** can be either “auto” or an integer. By setting to “auto”, FPGA-SPICE automatically determines the number of clock cycles to simulate, which is related to the average of all the signal density in ACE2 results. When set to an integer, FPGA-SPICE will use the given number of clock cycles in the SPICE netlists.
+    
+* **accuracy_type:** [abs|frac]. Specify the type of transient step in SPICE simulation. 
+
+    * When **abs** is selected, the accuracy should be the absolute value, such as 1e-12. 
+
+    * When **frac** is selected, the accuracy is the number of simulation points in a clock cycle period, for example, 100.
+    
+* **accuracy:** specify the transient step in SPICE simulation. Typically, the smaller the step is, the higher the accuracy, but the longer the simulation runtime. The recommended step is between 0.01ps and 0.1ps, which gives good accuracy without a significantly longer runtime. 
+    
+.. note:: Users can define the parameters in measuring the slew of signals, under a child node <slew> of the node <measure>.
+
+.. code-block:: xml
+    
+    <rise upper_thres_pct="float" lower_thres_pct="float"/>
+
+Define the starting and ending point in measuring the slew of a rising edge of a signal.
+    
+* **upper_thres_pct:** the ending point in measuring the slew of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of upper_thres_pct=0.95 is depicted in :numref:`fig_meas_edge`. 
+
+* **lower_thres_pct:** the starting point in measuring the slew of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of lower_thres_pct=0.05 is depicted in :numref:`fig_meas_edge`.
+    
+.. code-block:: xml
+    
+    <fall upper_thres_pct="float" lower_thres_pct="float"/>
+
+* **upper_thres_pct:** the ending point in measuring the slew of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of upper_thres_pct=0.05 is depicted in :numref:`fig_meas_edge`.
+
+* **lower_thres_pct:** the starting point in measuring the slew of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of lower_thres_pct=0.95 is depicted in :numref:`fig_meas_edge`.
+    
+    
+.. note:: Users can define the parameters related to measurements of delays between signals, under a child node <delay> of the node <measure>.
+
+.. code-block:: xml
+    
+    <rise input_thres_pct="float" output_thres_pct="float"/>
+
+Define the starting and ending point in measuring the delay between two signals when they are both at a rising edge.
+    
+* **input_thres_pct:** the starting point in measuring the delay of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of input_thres_pct=0.5 is depicted in :numref:`fig_meas_edge`.
+
+* **output_thres_pct:** the ending point in measuring the delay of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of output_thres_pct=0.5 is depicted in :numref:`fig_meas_edge`.
+    
+.. code-block:: xml
+    
+    <fall input_thres_pct="float" output_thres_pct="float"/>
+
+Define the starting and ending point in measuring the delay between two signals when they are both at a falling edge.
+
+* **input_thres_pct:** the starting point in measuring the delay of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of input_thres_pct=0.5 is depicted in :numref:`fig_meas_edge`. 
+
+* **output_thres_pct:** the ending point in measuring the delay of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of output_thres_pct=0.5 is depicted in :numref:`fig_meas_edge`.
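+
+Putting the nodes described above together, a complete settings block might look like the sketch below. The nesting follows the parent/child relations stated in this section. All values are illustrative assumptions, and the ``on``/``off`` values for ``fast`` are assumed by analogy with ``post`` and ``captab``.
+
+.. code-block:: xml
+
+    <!-- Illustrative sketch only: values are assumptions, not recommendations -->
+    <spice_settings>
+      <parameters>
+        <options sim_temp="25" post="off" captab="off" fast="on"/>
+        <measure sim_num_clock_cycle="auto" accuracy="1e-13" accuracy_type="abs">
+          <slew>
+            <rise upper_thres_pct="0.95" lower_thres_pct="0.05"/>
+            <fall upper_thres_pct="0.05" lower_thres_pct="0.95"/>
+          </slew>
+          <delay>
+            <rise input_thres_pct="0.5" output_thres_pct="0.5"/>
+            <fall input_thres_pct="0.5" output_thres_pct="0.5"/>
+          </delay>
+        </measure>
+        <stimulate>
+          <clock op_freq="auto" sim_slack="0.2" prog_freq="10e6">
+            <rise slew_time="20e-12" slew_type="abs"/>
+            <fall slew_time="20e-12" slew_type="abs"/>
+          </clock>
+          <input>
+            <rise slew_time="0.05" slew_type="frac"/>
+            <fall slew_time="0.05" slew_type="frac"/>
+          </input>
+        </stimulate>
+      </parameters>
+    </spice_settings>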
+
+
+For OpenFPGA using VPR8
+~~~~~~~~~~~~~~~~~~~~~~~
+
+All the simulation settings are stored under the XML node ``<openfpga_simulation_setting>``.
+The general organization is as follows:
+
+.. code-block:: xml
+
+    <openfpga_simulation_setting>
+      <clock_setting>
+        <operating frequency="<float>|<string>" num_cycles="<int>|<string>" slack="<float>"/>
+        <programming frequency="<float>"/>
+      </clock_setting>
+      <simulator_option>
+        <operating_condition temperature="<int>"/>
+        <output_log verbose="<bool>" captab="<bool>"/>
+        <accuracy type="<string>" value="<float>"/>
+        <runtime fast_simulation="<bool>"/>
+      </simulator_option>
+      <monte_carlo num_simulation_points="<int>"/>
+      <measurement_setting>
+        <slew>
+          <rise upper_thres_pct="<float>" lower_thres_pct="<float>"/>
+          <fall upper_thres_pct="<float>" lower_thres_pct="<float>"/>
+        </slew>
+        <delay>
+          <rise input_thres_pct="<float>" output_thres_pct="<float>"/>
+          <fall input_thres_pct="<float>" output_thres_pct="<float>"/>
+        </delay>
+      </measurement_setting>
+      <stimulus>
+        <clock>
+          <rise slew_type="<string>" slew_time="<float>"/>
+          <fall slew_type="<string>" slew_time="<float>"/>
+        </clock>
+        <input>
+          <rise slew_type="<string>" slew_time="<float>"/>
+          <fall slew_type="<string>" slew_time="<float>"/>
+        </input>
+      </stimulus>
+    </openfpga_simulation_setting>
+
+Clock Setting
+^^^^^^^^^^^^^
+Clock setting focuses on defining the clock periods to be applied to FPGA fabrics.
+As a programmable device, an FPGA has two types of clocks. 
+The first is the operating clock, which is used by users' implementations.
+The second is the programming clock, which is applied to the configuration protocol to load users' implementations to the FPGA fabric.
+OpenFPGA allows users to freely define these clocks as well as the number of clock cycles.
+We show the full syntax in the code block below and then provide details on each of them.
+
+.. code-block:: xml
+
+  <clock_setting>
+    <operating frequency="<float>|<string>" num_cycles="<int>|<string>" slack="<float>"/>
+    <programming frequency="<float>"/>
+  </clock_setting>
+
+Operating clock setting
+```````````````````````
+Operating clocks are defined under the XML node ``<operating>``
+
+.. option:: <operating frequency="<float>|<string>" num_cycles="<int>|<string>" slack="<float>"/>
+
+- ``frequency="<float>|<string>"``
+  Specify the frequency of the operating clock. OpenFPGA allows users to specify an absolute value in the unit of ``[Hz]``.
+  Alternatively, users can bind the frequency to the maximum clock frequency analyzed by the VPR STA engine.
+  This is very useful for validating the maximum operating frequency of users' implementations.
+  In such a case, the value of this attribute should be the reserved word ``auto``.
+
+- ``num_cycles="<int>|<string>"``
+  can be either ``auto`` or an integer. When set to ``auto``, OpenFPGA will infer the number of clock cycles from the average/median of all the signal activities.
+  When set to an integer, OpenFPGA will use the given number of clock cycles in HDL and SPICE simulations.
+
+- ``slack="<float>"``
+  Add a margin to the critical path delay in the HDL and SPICE simulations.
+  This parameter is applied to the critical path delay provided by the VPR STA engine,
+  so it is only valid when option ``frequency`` is set to ``auto``.
+  This aims to compensate for any inaccuracy in STA results.
+  Typically, the slack value is between ``0`` and ``1``. 
+  For example, ``slack=0.2`` implies that the actual clock period in simulations is 120% of the critical path delay reported by VPR. 
+
+.. note:: The ``slack`` attribute is only valid when option ``frequency`` is set to ``auto``.
+
+.. warning:: Avoid using a negative slack! This may cause your simulation to fail!
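+
+As a minimal sketch (the values are illustrative assumptions, not recommendations), an operating clock bound to the VPR timing analysis with a 20% margin could be written as follows. If VPR reported a critical path delay of 10 ns, the simulated clock period would then be 12 ns.
+
+.. code-block:: xml
+
+  <!-- Frequency follows VPR STA, cycle count is inferred, 20% margin on the critical path -->
+  <operating frequency="auto" num_cycles="auto" slack="0.2"/>
+
+Alternatively, a fixed 100 MHz clock simulated for 20 cycles might look like this. The ``slack`` attribute is kept for completeness but, as noted above, it only applies when ``frequency`` is ``auto``.
+
+.. code-block:: xml
+
+  <!-- Fixed frequency in Hz: slack does not apply in this case -->
+  <operating frequency="100e6" num_cycles="20" slack="0.2"/>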
+
+Programming clock setting
+`````````````````````````
+Programming clocks are defined under the XML node ``<programming>``
+
+.. option:: <programming frequency="<float>"/>
+
+- ``frequency="<float>"``
+  Specify the frequency of the programming clock using an absolute value in the unit of ``[Hz]``.
+  This frequency is used in testbenches for programming phase simulation.
+
+.. note:: The programming clock frequency is typically much slower than the operating clock and strongly depends on the process technology. We suggest characterizing the speed of your configuration protocols before specifying a value!
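+
+For instance, a 10 MHz programming clock could be declared as shown below. The value is arbitrary and chosen only for illustration; characterize your own configuration protocol before picking one.
+
+.. code-block:: xml
+
+  <!-- Illustrative value: 10 MHz programming clock, expressed in Hz -->
+  <programming frequency="10e6"/>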
+
+Simulator Option
+^^^^^^^^^^^^^^^^
+This XML node includes universal options available in both HDL and SPICE simulators.
+
+.. note:: This is mainly used by FPGA-SPICE
+
+Operating condition
+```````````````````
+
+.. option:: <operating_condition temperature="<int>"/>
+
+- ``temperature="<int>"``
+  Specify the temperature which will be defined in SPICE netlists. In the top SPICE netlists, it will show as 
+
+.. code-block:: python
+
+    .temp <int>
+
+Output logs
+```````````
+
+.. option:: <output_log verbose="<bool>" captab="<bool>"/>
+
+  Specify the options in outputting simulation results to log files
+
+- ``verbose="true|false"``
+
+  Specify if the simulation waveforms should be printed out after SPICE simulations. If turned on, it will show in all the SPICE netlists
+
+.. code-block:: python
+  
+  .option POST
+
+.. note:: When the SPICE netlists are large or a long simulation duration is defined, it is recommended to turn ``verbose`` off. Otherwise, the waveform files will occupy a huge amount of disk space.
+
+- ``captab="true|false"``
+  Specify if the capacitances of all the nodes in the SPICE netlists will be printed out. If turned on, it will show in the top-level SPICE netlists
+
+.. code-block:: python
+
+  .option CAPTAB 
+
+.. note:: When turned on, the SPICE simulation runtime may increase.
+
+Simulation Accuracy
+```````````````````
+
+.. option:: <accuracy type="<string>" value="<float>"/>
+
+  Specify the simulation steps (accuracy) to be used
+
+- ``type="abs|frac"``
+
+  Specify the type of transient step in SPICE simulation. 
+
+    * When ``abs`` is selected, the accuracy should be the absolute value, such as ``1e-12``. 
+
+    * When ``frac`` is selected, the accuracy is the number of simulation points in a clock cycle period, for example, 100.
+    
+- ``value="<float>"``
+
+  Specify the transient step in SPICE simulation. Typically, the smaller the step is, the higher the accuracy, but the longer the simulation runtime. The recommended step is between 0.01ps and 0.1ps, which gives good accuracy without a significantly longer runtime. 
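+
+A hedged sketch of the two styles, with illustrative values only, is given below. Reading ``frac`` as "simulation points per clock cycle", a value of 100 on a 10 ns clock period would correspond to a transient step of roughly 100 ps. This arithmetic is an interpretation of the description above, not a statement about the generator's internals.
+
+.. code-block:: xml
+
+  <!-- Option A: absolute transient step of 1 ps -->
+  <accuracy type="abs" value="1e-12"/>
+
+  <!-- Option B: 100 simulation points per clock cycle -->
+  <accuracy type="frac" value="100"/>
+
+The two lines are alternatives. Only one ``<accuracy>`` node is expected under ``<simulator_option>``, as in the general organization shown earlier.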
+
+Simulation Speed
+````````````````
+    
+.. option:: <runtime fast_simulation="<bool>"/>
+
+  Specify if any runtime optimization will be applied to the simulator.  
+
+- ``fast_simulation="true|false"``
+
+  Specify if fast simulation is turned on for the simulator.  
+
+  If turned on, it will show in the top-level SPICE netlists
+
+.. code-block:: python
+
+  .option fast 
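+
+Putting the options of this section together, a ``<simulator_option>`` node could look like the sketch below. All values are illustrative assumptions.
+
+.. code-block:: xml
+
+  <!-- Illustrative sketch only: 25 degrees, no waveform or capacitance dump, 0.1 ps step -->
+  <simulator_option>
+    <operating_condition temperature="25"/>
+    <output_log verbose="false" captab="false"/>
+    <accuracy type="abs" value="1e-13"/>
+    <runtime fast_simulation="true"/>
+  </simulator_option>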
+
+Monte Carlo Simulation
+``````````````````````
+
+.. option:: <monte_carlo num_simulation_points="<int>"/>
+   
+   Run SPICE simulations in Monte Carlo mode.
+   This is mainly for FPGA-SPICE.
+   When turned on, FPGA-SPICE will apply the device variation defined in :ref:`technology_library` to Monte Carlo simulation.
+
+- ``num_simulation_points="<int>"``
+
+  Specify the number of simulation points to be considered in Monte Carlo simulation.
+  The larger the number is, the longer the simulation time will be, but the more accurate the results will be.
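+
+For example, a run with 100 simulation points could be requested as follows. The number is an arbitrary illustration of the trade-off described above.
+
+.. code-block:: xml
+
+  <!-- Illustrative value: more points give better statistics at the cost of runtime -->
+  <monte_carlo num_simulation_points="100"/>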
+
+Measurement Setting
+```````````````````
+- Users can define the parameters in measuring the slew of signals, under XML node ``<slew>``
+
+- Users can define the parameters in measuring the delay of signals, under XML node ``<delay>``
+
+Both delay and slew measurement share the same syntax in defining the upper and lower voltage thresholds.
+
+.. option:: <rise|fall upper_thres_pct="<float>" lower_thres_pct="<float>"/>
+
+  Define the starting and ending point in measuring the slew of a rising or a falling edge of a signal.
+    
+  - ``upper_thres_pct="<float>"`` the ending point in measuring the slew of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of upper_thres_pct=0.95 is depicted in :numref:`fig_measure_edge`. 
+    
+  - ``lower_thres_pct="<float>"`` the starting point in measuring the slew of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of lower_thres_pct=0.05 is depicted in :numref:`fig_measure_edge`.
+
+.. _fig_measure_edge:
+
+.. figure:: figures/meas_edge.png 
+   :scale: 80%
+   :alt: An illustrative example on measuring the slew and delay of signals
+  
+   An illustrative example on measuring the slew and delay of signals
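+
+A sketch of a complete ``<measurement_setting>`` node is given below. It follows the general organization shown at the top of this section and reuses the example thresholds quoted in this document (5%/95% for slew, 50% for delay). Treat the values as illustrative rather than recommended.
+
+.. code-block:: xml
+
+  <measurement_setting>
+    <slew>
+      <!-- Rising edge: measured from 5% up to 95% of the maximum voltage -->
+      <rise upper_thres_pct="0.95" lower_thres_pct="0.05"/>
+      <!-- Falling edge: measured from 95% down to 5% of the maximum voltage -->
+      <fall upper_thres_pct="0.05" lower_thres_pct="0.95"/>
+    </slew>
+    <delay>
+      <!-- Delay measured between the 50% crossing points of input and output -->
+      <rise input_thres_pct="0.5" output_thres_pct="0.5"/>
+      <fall input_thres_pct="0.5" output_thres_pct="0.5"/>
+    </delay>
+  </measurement_setting>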
+
+Stimulus Setting
+````````````````
+Users can define the slew time of clock and input signals to be applied to FPGA I/Os in testbenches under the XML nodes ``<clock>`` and ``<input>``, respectively.
+This is used by FPGA-SPICE in generating testbenches.
+
+.. option:: <rise|fall slew_type="<string>" slew_time="<float>"/>
+
+  Specify the slew rate of an input or clock signal at rising or falling edge 
+
+  - ``slew_type="[abs|frac]"`` specify the type of slew time definition at the rising or falling edge of a clock/input port.
+
+    * The type of ``abs`` implies that the slew time is the absolute value. For example, ``slew_type="abs" slew_time="20e-12"`` means that the slew of a clock signal is 20ps. 
+    * The type of ``frac`` means that the slew time is related to the period (frequency) of the clock signal. For example, ``slew_type="frac" slew_time="0.05"`` means that the slew of a clock signal takes 5% of the period of the clock.
+
+  - ``slew_time="<float>"`` specify the slew time of an input or clock signal at the rising/falling edge, interpreted according to ``slew_type``. 
+ 
+  :numref:`fig_measure_edge` depicts the definition of the slew and delays of signals and the parameters that can be supported by FPGA-SPICE.
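+
+As a sketch using the example values quoted above (illustrative only), a ``<stimulus>`` node with an absolute 20 ps slew on the clock and a fractional 5% slew on the inputs could be written as:
+
+.. code-block:: xml
+
+  <stimulus>
+    <clock>
+      <!-- Absolute slew time: 20 ps on both edges -->
+      <rise slew_type="abs" slew_time="20e-12"/>
+      <fall slew_type="abs" slew_time="20e-12"/>
+    </clock>
+    <input>
+      <!-- Fractional slew time: 5% of the clock period on both edges -->
+      <rise slew_type="frac" slew_time="0.05"/>
+      <fall slew_type="frac" slew_time="0.05"/>
+    </input>
+  </stimulus>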
--- a/docs/source/arch_lang/spice_sim_setting.rst
+++ b/docs/source/arch_lang/spice_sim_setting.rst
@ -1,134 +0,0 @@
-Parameters for SPICE simulation settings
-========================================
-All the parameters that need to be defined in the HSPICE simulations are located under a child node called <parameters>, which is under its father node <spice_settings>. 
-The parameters are divided into three categories and can be defined in three XML nodes, <options>, <measure> and <stimulate>, respectively. 
-
-* The XML node <options>
-
-.. code-block:: xml
-
-   <options sim_temp=”int” post=”string”captab=”string” fast=”string”/> 
-
-These properties define the options that will be printed in the top SPICE netlists.
-
-* **sim_temp:** specify the temperature which will be defined in SPICE netlists. In the top SPICE netlists, it will show as .temp <int>.
-
-* **post:** [on|off]. Specify if the simulation waveforms should be printed out after SPICE simulations. In all the SPICE netlists, it will show as .option POST when turned on.
-
-.. note:: when the SPICE netlists are large or a long simulation duration is defined, the post option is recommended to be off. If not, huge disk space will be occupied by the waveform files.
-
-* **captab:** [on|off]. Specify if the capacitances of all the nodes in the SPICE netlists will be printed out. In the top SPICE netlists, it will show as .option CAPTAB when turned on. When turned on, the SPICE simulation runtime may increase.
-
-* The XML node <stimulate>
-
-.. code-block:: xml
-
-    <stimulate>
-      <clock op_freq=”auto|float” sim_slack=”float” prog_freq=”float”>
-        <rise slew_time=”float” slew_type=”string”/>
-        <fall slew_time=”float” slew_type=”string”/>
-      </clock>
-    </stimulate>
-
-Define stimulates for the clock signal.
-
-* **op_freq:** either auto or a float number (unit:[Hz])  Specify the operation clock frequency that is used in SPICE simulations. This frequency is used in testbenches for operation phase simulation. Note that this is a mandatory option. Users have to specify either this frequency is automatically determined by assigning “auto” or give an exact number. If this clock frequency is specified, the sim_slack option is disregarded.
-
-* **sim_slack:** add slack to the critical path delay in the SPICE simulation. For example, sim_slack=0.2 implies that the clock period in SPICE simulations is 1.2 of the critical path delay reported by VPR. **Only valid when option op_freq is not specified.**
-
-* **prog_freq:** Specify the programming clock frequency that is used in SPICE simulations. This frequency is used in testbenches for programming phase simulation.
-
-* **slew_type & slew_time:** define the slew of clock signals at the rising/falling edge. Property slew_type can be either absolute or fractional [abs|frac]. 
-
-    * The type of **absolute** implies that the slew time is the absolute value. For example, slew_time=20e-12, slew_type=abs means that the slew of a clock signal is  20ps. 
-    * The type of **fractional** means that the slew time is related to the period (frequency) of the clock signal. For example, slew_time=0.05, slew_type=frac means that the slew of a clock signal takes 5% of the period of the clock.
-
-:numref:`fig_meas_edge` depicts the definition of the slew and delays of signals and the parameters that can be supported by FPGA-SPICE.
-
-.. code-block:: xml
-
-     <stimulate>
-       <input>
-         <rise slew_time=”float” slew_type=”string”/>
-         <fall slew_time=”float” slew_type=”string”/>
-       </input>
-     </stimulate>
-
-Define the slew of input signals at the rising/falling edge.
-
-* **slew_type & slew_time:** define the slew of all the input signals at the rising/falling edge. Property slew_type can be either absolute or fractional [abs|frac]. 
-
-    * The type of **absolute** implies that the slew time is the absolute value. For example, slew_time=20e-12, slew_type=abs means that the slew of a clock signal is  20ps. 
-
-    * The type of **fractional** means that the slew time is related to the period (frequency) of the clock signal. For example, slew_time=0.05, slew_type=frac means that the slew of a clock signal takes 5% of the period of the clock.
-
-.. note:: These slew settings are valid for all the input signals of the testbenches in different complexity levels.
-
-.. _fig_meas_edge:
-
-.. figure:: figures/meas_edge.png 
-   :scale: 100%
-   :alt: map to buried traesure
-  
-   Parameters in measuring the slew and delay of signals
-
-* The XML node <measure>
-
-.. code-block:: xml
-    
-   <measure sim_num_clock_cycle=”int”accuracy=”float”accuracy_type=”string”/>
-
-* **sim_num_clock_cycle:** can be either “auto” or an integer. By setting to “auto”, FPGA-SPICE automatically determines the number of clock cycles to simulate, which is related to the average of all the signal density in ACE2 results. When set to an integer, FPGA-SPICE will use the given number of clock cycles in the SPICE netlists.
-    
-* **accuracy_type:** [abs|frac]. Specify the type of transient step in SPICE simulation. 
-
-    * When **abs** is selected, the accuracy should be the absolute value, such as 1e-12. 
-
-    * When **frac** is selected, the accuracy is the number of simulation points in a clock cycle period, for example, 100.
-    
-* **accuracy:** specify the transient step in SPICE simulation. Typically, the smaller the step is, the higher the accuracy that can be reached while the long simulation runtime is. The recommended accuracy is between 0.1ps and 0.01ps, which generates good accuracy and runtime is not significantly long. 
-    
-.. note:: Users can define the parameters in measuring the slew of signals, under a child node <slew> of the node <measure>.
-
-.. code-block:: xml
-    
-    <rise upper_thres_pct=”float” lower_thres_pct=”float”/>
-
-Define the starting and ending point in measuring the slew of a rising edge of a signal.
-    
-* **upper_thres_pct:** the ending point in measuring the slew of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of upper_thres_pct=0.95 is depicted in Figure 2. 
-    
-* **lower_thres_pct:** the starting point in measuring the slew of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of lower_thres_pct=0.05 is depicted in Figure 2.
-    
-.. code-block:: xml
-    
-    <fall upper_thres_pct=”float” lower_thres_pct=”float”/>
-
-* **upper_thres_pct:** the ending point in measuring the slew of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of upper_thres_pct=0.05 is depicted in Figure 2.
-    
- * **lower_thres_pct:** the starting point in measuring the slew of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of lower_thres_pct=0.95 is depicted in Figure 2.
-    
-    
-.. note:: Users can define the parameters related to measurements of delays between signals, under a child node <delay> of the node <measure>.
-
-.. code-block:: xml
-    
-    <rise input_thres_pct=”float” output_thres_pct=”float”/>
-
-Define the starting and ending point in measuring the delay between two signals when they are both at a rising edge.
-    
-* **input_thres_pct:** the starting point in measuring the delay of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of input_thres_pct=0.5 is depicted in Figure 2.     
-
-* **output_thres_pct:** the ending point in measuring the delay of a rising edge. It is expressed as a percentage of the maximum voltage of a signal. For example, the meaning of output_thres_pct=0.5 is depicted in Figure 2.
-    
-.. code-block:: xml
-    
-    <fall input_thres_pct=”float” output_thres_pct=”float”/>
-
-Define the starting and ending point in measuring the delay between two signals when they are both at a falling edge.
-
-* **input_thres_pct:** the starting point in measuring the delay of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, upper_thres_pct=0.5 is depicted in :numref:`fig_meas_edge`. 
-    
-* **output_thres_pct:** the ending point in measuring the delay of a falling edge. It is expressed as a percentage of the maximum voltage of a signal. For example, lower_thres_pct=0. 5 is depicted in :numref:`fig_meas_edge`.
-    
-    
--- a/docs/source/arch_lang/tech_lib.rst
+++ b/docs/source/arch_lang/tech_lib.rst
@ -1,34 +0,0 @@
-Technology library Declaration
-==============================
-
-.. code-block:: xml
-
-  <tech_lib lib_type=”string” transistor_type=”string” lib_path=”string” nominal_vdd=”float”/>
-
-* **lib_type:** can be either industry or academia [industry|academia]. For the industry library, some transistor types are available, and the type of transistor should be declared in the property transistor_type. 
-
-* **transistor_type:** This XML property specify the transistors to be used in the industry library. For example, the type of transistors can be “TT”, “FF” etc.
-
-* **lib_path:** specify the path of the library. For example: lib_path=/home/tech/45nm.pm.
-
-* **nominal_vdd:** specify the working voltage for the technology. The voltage will be used as the supply voltage in all the SPICE netlist.
-
-.. code-block:: xml
-
-   <transistors pn_ratio=”float” model_ref=”string”/>
-
-* **pn_ratio:** specify the ratio between p-type transistors and n-type transistors. The ratio will be used when building circuit structures such as inverters, buffers, etc.
-    
-* **model_ref:** specify the reference of in calling a transistor model. In SPICE netlist, define a transistor follows the convention: <model_ref><trans_name> <ports> <model_name>. The reference depends on the technology and the type of library. For example, the PTM bulk model uses “M” as the reference while the PTM FinFET model uses “X” as the reference.
-
-.. code-block:: xml
-
-   <nmos model_name=”string” chan_length=”float” min_width=”float”/>
-   <pmos model_name=”string” chan_length=”float” min_width=”float”/>
-
-* **model_name:**  specify the name of the p/n type transistor, which can be found in the manual of the technology provider.
-   
-* **chan_length:** specify the channel length of p/n type transistor.
-  
-* **min_width:** specify the minimum width of p/n type transistor. This parameter will be used in building inverter, buffer, etc. as a base number for transistor sizing. 
-  
--- a/docs/source/arch_lang/technology_library.rst
+++ b/docs/source/arch_lang/technology_library.rst
@ -0,0 +1,146 @@
+.. _technology_library:
+
+Technology library
+------------------
+
+For OpenFPGA using VPR7
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: xml
+
+  <tech_lib lib_type="string" transistor_type="string" lib_path="string" nominal_vdd="float"/>
+
+* **lib_type:** can be either industry or academia [industry|academia]. For the industry library, some transistor types are available, and the type of transistor should be declared in the property transistor_type. 
+
+* **transistor_type:** This XML property specifies the transistors to be used in the industry library. For example, the type of transistors can be "TT", "FF", etc.
+
+* **lib_path:** specify the path of the library. For example: lib_path=/home/tech/45nm.pm.
+
+* **nominal_vdd:** specify the working voltage for the technology. The voltage will be used as the supply voltage in all the SPICE netlists.
+
+.. code-block:: xml
+
+   <transistors pn_ratio="float" model_ref="string"/>
+
+* **pn_ratio:** specify the ratio between p-type transistors and n-type transistors. The ratio will be used when building circuit structures such as inverters, buffers, etc.
+    
+* **model_ref:** specify the reference used in calling a transistor model. In SPICE netlists, a transistor definition follows the convention: <model_ref><trans_name> <ports> <model_name>. The reference depends on the technology and the type of library. For example, the PTM bulk model uses "M" as the reference while the PTM FinFET model uses "X" as the reference.
+
+.. code-block:: xml
+
+   <nmos model_name="string" chan_length="float" min_width="float"/>
+   <pmos model_name="string" chan_length="float" min_width="float"/>
+
+* **model_name:**  specify the name of the p/n type transistor, which can be found in the manual of the technology provider.
+   
+* **chan_length:** specify the channel length of p/n type transistor.
+  
+* **min_width:** specify the minimum width of p/n type transistor. This parameter will be used in building inverter, buffer, etc. as a base number for transistor sizing. 
+  
+For OpenFPGA using VPR8
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The technology library describes transistor-level parameters to be applied to the physical design of FPGAs. In addition to transistor models, the technology library also supports the definition of process variations on any transistor model.
+The general organization is as follows.
+
+.. code-block:: xml
+
+  <technology_library>
+    <device_library>
+      <device_model name="<string>" type="<string>">
+        <lib type="<string>" corner="<string>" ref="<string>" path="<string>"/>
+        <design vdd="<float>" pn_ratio="<float>"/>
+        <pmos name="<string>" chan_length="<float>" min_width="<float>" variation="<string>"/>
+        <nmos name="<string>" chan_length="<float>" min_width="<float>" variation="<string>"/>
+        <rram rlrs="<float>" rhrs="<float>" variation="<string>"/> 
+      </device_model>
+    </device_library>
+    <variation_library>
+      <variation name="<string>" abs_deviation="<float>" num_sigma="<int>"/>
+    </variation_library>
+  </technology_library>
+
+Device Library
+^^^^^^^^^^^^^^
+The device library contains detailed descriptions of device models, such as transistors and Resistive Random Access Memories (RRAMs).
+A device library may consist of a number of ``<device_model>`` entries, each of which denotes a different transistor or RRAM model.
+
+A device model represents a transistor/RRAM model available in users' technology library.
+
+.. option:: <device_model name="<string>" type="<string>">
+  
+  Specify the name and type of a device model
+  
+  - ``name="<string>"`` is the unique name of the device model in the context of ``<device_library>``. 
+  - ``type="transistor|rram"`` is the type of device model in terms of functionality.
+    Currently, OpenFPGA supports two types: transistor and RRAM.
+
+.. note:: The name of a ``<device_model>`` does not have to be the model name used in the user's technology library.
+
+.. option:: <lib type="<string>" corner="<string>" ref="<string>" path="<string>"/>
+
+  Specify the technology library that defines the device model
+
+  - ``type="academia|industry"`` For an industry library, FPGA-SPICE will use ``.lib <lib_file_path>`` to include the library file in SPICE netlists. For an academia library, FPGA-SPICE will use ``.include <lib_file_path>`` to include the library file in SPICE netlists.
+
+  - ``corner="<string>"`` is the process corner name available in technology library. 
+    For example, the type of transistors can be ``TT``, ``SS`` and ``FF`` *etc*.
+
+  - ``ref="<string>"`` specify the reference used when calling a transistor model. In SPICE netlists, a transistor definition follows the convention:
+
+  .. code-block:: xml
+
+    <model_ref><trans_name> <ports> <model_name>
+
+  The reference depends on the technology and the type of library. For example, the PTM bulk model uses “M” as the reference while the PTM FinFET model uses “X” as the reference.
+
+  - ``path="<string>"`` specify the path of the technology library file. For example: 
+
+  .. code-block:: xml 
+
+    path="/home/tech/45nm.pm"
+
+.. option:: <design vdd="<float>" pn_ratio="<float>"/>
+
+   Specify transistor-level design parameters
+
+   - ``vdd="<float>"`` specify the working voltage for the technology. The voltage will be used as the supply voltage in all the SPICE netlists.
+ 
+   - ``pn_ratio="<float>"`` specify the ratio between *p*-type and *n*-type transistors. The ratio will be used when building circuit structures such as inverters, buffers, etc.
+
+.. option:: <pmos|nmos name="<string>" chan_length="<float>" min_width="<float>" variation="<string>"/>
+  
+  Specify device-level parameters for transistors
+
+  - ``name="<string>"`` specify the name of the p/n type transistor, which can be found in the manual of the technology provider.
+
+  - ``chan_length="<float>"`` specify the channel length of *p/n* type transistor.
+  
+  - ``min_width="<float>"`` specify the minimum width of *p/n* type transistor. This parameter will be used in building inverter, buffer, *etc*. as a base number for transistor sizing. 
+
+  - ``variation="<string>"`` specify the variation name defined in the ``<variation_library>`` 
+
+.. option:: <rram rlrs="<float>" rhrs="<float>" variation="<string>"/> 
+
+  Specify device-level parameters for RRAMs
+
+  - ``rlrs="<float>"`` specify the resistance of the Low Resistance State (LRS) of an RRAM device
+
+  - ``rhrs="<float>"`` specify the resistance of the High Resistance State (HRS) of an RRAM device
+
+  - ``variation="<string>"`` specify the variation name defined in the ``<variation_library>`` 
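+
+As an illustration of the syntax above, a single transistor-type device model might be written as follows. This is only a sketch: the model names (``logic``, ``pch``, ``nch``), the corner name, the variation name and all numeric values are hypothetical placeholders, while the library path and the ``M`` reference reuse the examples given earlier in this section.
+
+.. code-block:: xml
+
+  <device_library>
+    <!-- Hypothetical transistor model; every name and value is a placeholder -->
+    <device_model name="logic" type="transistor">
+      <!-- Industry library: pulled into SPICE netlists with .lib -->
+      <lib type="industry" corner="TT" ref="M" path="/home/tech/45nm.pm"/>
+      <!-- Supply voltage and p/n sizing ratio (illustrative values) -->
+      <design vdd="0.9" pn_ratio="2"/>
+      <!-- variation must match a <variation> name in <variation_library> -->
+      <pmos name="pch" chan_length="40e-9" min_width="140e-9" variation="logic_transistor_var"/>
+      <nmos name="nch" chan_length="40e-9" min_width="140e-9" variation="logic_transistor_var"/>
+    </device_model>
+  </device_library>
+
+Keeping the ``<device_model>`` name separate from the ``<pmos>``/``<nmos>`` names lets the architecture file refer to ``logic`` while the foundry-specific model names stay confined to this block.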
+
+Variation Library
+^^^^^^^^^^^^^^^^^
+Variation library contains detailed description on device variations specified by users.
+A variation library may consist of a number of ``<variation>`` and each of them denotes a different variation parameter.
+
+.. option:: <variation name="<string>" abs_deviation="<float>" num_sigma="<int>"/>
+  
+  Specify detailed variation parameters
+
+  - ``name="<string>"`` is the unique name of the device variation in the context of ``<variation_library>``.  The name will be used in ``<device_model>`` to bind variations
+  
+  - ``abs_deviation="<float>"`` is the absolute deviation of a variation
+
+  - ``num_sigma="<int>"`` is the standard deviation of a variation
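+
+For instance, a variation library with a single entry could look like the sketch below. The name ``logic_transistor_var`` and the numeric values are hypothetical; a ``<pmos>``, ``<nmos>`` or ``<rram>`` line would bind to this entry by setting ``variation="logic_transistor_var"``.
+
+.. code-block:: xml
+
+  <variation_library>
+    <!-- Hypothetical entry; the values are illustrative only -->
+    <variation name="logic_transistor_var" abs_deviation="0.1" num_sigma="3"/>
+  </variation_library>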
--- a/docs/source/contact.rst
+++ b/docs/source/contact.rst
@ -1,15 +1,15 @@
 .. _contact:
-   
-Contact
-=======

+Contact
+~~~~~~~
+   
 General questions:

 Prof. Pierre-Emmanuel Gaillardon 

 pierre-emmanuel.gaillardon@utah.edu

-Technical Details about FPGA-SPICE/Verilog/Bitstream:
+Technical Details about FPGA-SPICE/Verilog/Bitstream/SDC:

 Dr. Xifan Tang

--- a/docs/source/figures/openfpga_flow.png
+++ b/docs/source/figures/openfpga_flow.png
--- a/docs/source/figures/openfpga_motivation.png
+++ b/docs/source/figures/openfpga_motivation.png
--- a/docs/source/fpga_bitstream/command_line_usage.rst
+++ b/docs/source/fpga_bitstream/command_line_usage.rst
@ -1,14 +0,0 @@
-Command-line Options for FPGA Bitstream Generator
-=================================================
-
-All the command line options of FPGA-Bitstream can be shown by calling the help menu of VPR. Here are all the FPGA-Verilog-related options that you can find:
-
-FPGA-Verilog Supported Option::	
-	
-	--fpga_bitstream_generator
-
-.. csv-table:: Commmand-line Option of FPGA-Bitstream
-   :header: "Command Options", "Description"
-   :widths: 15, 30
-
-   "--fpga_bitstream_generator", "Turn on the FPGA-Bitstream and output a .bitstream file containing FPGA configuration."
--- a/docs/source/fpga_bitstream/fabric_dependent_bitstream.rst
+++ b/docs/source/fpga_bitstream/fabric_dependent_bitstream.rst
@ -0,0 +1,7 @@
+Fabric-dependent Bitstream
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Fabric-dependent bitstream is designed to be loadable through the configuration protocols of FPGAs.
+The bitstream just sets an order to the configuration bits in the database, without duplicating the database.
+The OpenFPGA framework provides a fabric-dependent bitstream generator which is aligned to our Verilog netlists.
+The fabric-dependent bitstream can be found in autogenerated Verilog testbenches.
--- a/docs/source/fpga_bitstream/generic_bitstream.rst
+++ b/docs/source/fpga_bitstream/generic_bitstream.rst
@ -1,9 +1,24 @@
-Bistream Output File Format
-============================
+Generic Bitstream
+~~~~~~~~~~~~~~~~~
+
+Usage
+`````
+
+Generic bitstream is a fabric-independent bitstream where configuration bits are organized out-of-order in a database.
+This can be regarded as a raw bitstream used for:
+
+  - ``debugging``: Hardware engineers can validate whether the configuration memories across the FPGA fabric are assigned the expected values.
+  - ``an exchangeable file format for bitstream assembler``: Software engineers can use the raw bitstream to build a bitstream assembler, which organizes the bitstream in a format loadable to FPGA chips.
+  - ``creation of artificial bitstream``: Test engineers can craft artificial bitstreams to test each element of the FPGA fabric, which is typically not synthesizable by VPR.
+
+.. note:: The fabric-independent bitstream cannot be directly loaded to FPGA fabrics.
+
+File Format
+```````````
+
+OpenFPGA can output the generic bitstream to an XML format, which is easy to debug. As shown in the following XML code, configuration bits are organized block by block, where each block could be a LUT, a routing multiplexer `etc`. Each ``bitstream_block`` includes two sets of information: 

-FPGA-Bitstream can generate two types of bitstreams:
-* Generic bitstreams, where configuration bits are organized out-of-order in a database. We output the generic bitstream to a XML format, which is easy to debug. As shown in the following XML code, configuration bits are organized block by block, where each block could be a LUT, a routing multiplexer `etc`. Each ``bitstream_block`` includes two sets of information: 
  - ``hierarchy`` represents the location of this block in FPGA fabric.
+
  - ``bitstream`` represents the configuration bits affiliated to this block.

 .. code-block:: xml
@ -34,5 +49,3 @@ FPGA-Bitstream can generate two types of bitstreams:
          <bit memory_port="mem_out[15]" value="0"/>
      </bitstream>
  </bitstream_block>
-
-* Fabric-dependent bitstreams, where configuration bits are organized to be loadable to the configuration protocols of FPGAs. The bitstream just sets an order to the configuration bits in the database, without duplicating the database. OpenFPGA framework provides a fabric-dependent bitstream generator which is aligned to our Verilog netlists. The fabric-dependent bitstream can be found in autogenerated Verilog testbenches.
--- a/docs/source/fpga_bitstream/index.rst
+++ b/docs/source/fpga_bitstream/index.rst
@ -1,14 +1,14 @@
 FPGA-Bitstream
-==============
+--------------
+
+FPGA-Bitstream can generate two types of bitstreams:

 .. _fpga_bitstream:
-   User Manual for FPGA Bitstream Generator
+   FPGA-Bitstream
 
 .. toctree::
   :maxdepth: 2
  
-   command_line_usage
+   generic_bitstream

-   file_organization
- 
-   
+   fabric_dependent_bitstream
--- a/docs/source/fpga_spice/command_line_usage.rst
+++ b/docs/source/fpga_spice/command_line_usage.rst
@ -1,5 +1,5 @@
-Command-line Options for FPGA SPICE Generator
-=================================================
+Command-line Options
+~~~~~~~~~~~~~~~~~~~~
 All the command line options of FPGA-SPICE can be shown by calling the help menu of VPR. Here are all the FPGA-SPICE-related options that you can find:

 FPGA-SPICE Supported Options::
--- a/docs/source/fpga_spice/customize_subckt.rst
+++ b/docs/source/fpga_spice/customize_subckt.rst
@ -1,5 +1,5 @@
 Create Customized SPICE Modules
-===============================
+-------------------------------
 To make sure the customized SPICE netlists can be correctly included in FPGA-SPICE, the following rules should be fully respected:

 1.    The customized SPICE netlists could contain multiple sub-circuits but the names of these sub-circuits should not be conflicted with any reserved words.. Here is an example of defining a sub-circuit in SPICE netlists. The <subckt_name> should be a unique one, which should not be conflicted with any reserved words.
--- a/docs/source/fpga_spice/file_organization.rst
+++ b/docs/source/fpga_spice/file_organization.rst
@ -1,5 +1,5 @@
 Hierarchy of SPICE Output Files
-===============================
+-------------------------------

 All the generated SPICE netlists are located in the <spice_dir> as you specify in the command-line options.
 Under the <spice_dir>, FPGA-SPICE creates a number of folders:  include, subckt, lut_tb, dff_tb, grid_tb, pb_mux_tb, cb_mux_tb, sb_mux_tb, top_tb, results. Under the <spice_dir>, FPGA-SPICE also creates a shell script called run_hspice_sim.sh, which run all the simulations for all the testbenches.
--- a/docs/source/fpga_spice/index.rst
+++ b/docs/source/fpga_spice/index.rst
@ -1,10 +1,13 @@
-FPGA-SPICE: SPICE Auto-Generation
-====================================
+FPGA-SPICE
+----------
+
+.. warning:: FPGA-SPICE has not been integrated into the VPR8 version yet. For now, the following tool guide applies to the VPR7 version only.

 .. _fpga_spice:
-   User Manual for FPGA-SPICE support
+   FPGA-SPICE
 
 .. toctree::
+   :maxdepth: 2
  
   command_line_usage

--- a/docs/source/fpga_spice/spice_simulation.rst
+++ b/docs/source/fpga_spice/spice_simulation.rst
@ -1,5 +1,5 @@
 Run SPICE simulation
-====================
+--------------------

 * Simulation results 

--- a/docs/source/fpga_verilog/command_line_usage.rst
+++ b/docs/source/fpga_verilog/command_line_usage.rst
@ -1,55 +0,0 @@
-Command-line Options for FPGA-Verilog Generator
-=================================================
-
-All the command line options of FPGA-Verilog can be shown by calling the help menu of VPR. Here are all the FPGA-Verilog-related options that you can find:
-
-FPGA-Verilog Supported Options::	
-	
-	--fpga_verilog
-	--fpga_verilog_dir <directory_path_of_dumped_verilog_files>
-	--fpga_verilog_include_timing
-	--fpga_verilog_include_signal_init
-	--fpga_verilog_print_modelsim_autodeck <modelsim_ini_path>
-	--fpga_verilog_print_top_testbench 
-	--fpga_verilog_print_autocheck_top_testbench <reference_verilog_file_path>
-	--fpga_verilog_print_formal_verification_top_netlist
-	--fpga_verilog_include_icarus_simulator
-
-
-.. csv-table:: Commmand-line Options of FPGA-Verilog
-   :header: "Command Options", "Description"
-   :widths: 15, 30
-
-   "--fpga_verilog", "Turn on the FPGA-Verilog."
-   "--fpga_verilog_dir <dir_path>", "Specify the directory that all the Verilog files will be outputted to <dir_path> is the destination directory."
-   "--fpga_verilog_include_timing", "Includes the timings found in the XML file."
-   "--fpga_verilog_init_sim", "Initializes the simulation for ModelSim."
-   "--fpga_verilog_print_modelsim_autodeck", "Generates the scripts necessary to the ModelSim simulation."
-   "--fpga_verilog_modelsim_ini_path <string>", "Gives the path for the .ini necessary to ModelSim."
-   "--fpga_verilog_print_top_testbench", "Print the full-chip-level testbench for the FPGA. Determines the type of autodeck."
-   "--fpga_verilog_print_top_auto_testbench \
-   <path_to_the_verilog_benchmark>", "Prints the testbench associated with the given benchmark. Determines the type of autodeck."
-   "--fpga_verilog_dir <dir_path>", "Specify the directory where all the Verilog files will be outputted to. <dir_path> is the destination directory."
-   "--fpga_verilog_include_timing", "Includes the timings found in the XML architecture description file."
-   "--fpga_verilog_include_signal_init", "Set all nets to random value to be close of a real power-on case"
-   "--fpga_verilog_print_modelsim_autodeck <modelsim_ini_path>", "Generates the scripts necessary to the ModelSim simulation and specify the path to modelsim.ini file."
-   "--fpga_verilog_print_top_testbench", "Prints the full-chip-level testbench for the FPGA, which includes programming phase and operationg phase (random patterns)."
-   "--fpga_verilog_print_autocheck_top_testbench \
-   <reference_verilog_file_path>", "Prints a testbench stimulating the generated FPGA and the initial benchmark to compare stimuli responses, which includes programming phase and operationg phase (random patterns)"
-   "--fpga_verilog_print_formal_verification_top_netlist", "Prints a Verilog top file compliant with formal verification tools. With this top file the FPGA is initialy programmed. It also prints a testbench with random patterns, which can be manually or automatically check regarding previous options."
-   "--fpga_verilog_include_icarus_simulator", "Activates waveforms .vcd file generation and simulation timeout, which are required for Icarus Verilog simulator"
-   "--fpga_verilog_print_input_blif_testbench", "Generates a Verilog test-bench to use with input blif file"
-   "--fpga_verilog_print_report_timing_tcl", "Generates tcl commands to run STA analysis with TO COMPLETE TOOL"
-   "--fpga_verilog_report_timing_rpt_path <path_to_generate_reports>", "Specifies path where report timing are written"
-   "--fpga_verilog_print_sdc_pnr", "Generates SDC constraints to PNR"
-   "--fpga_verilog_print_sdc_analysis", "Generates SDC to run timing analysis in PNR tool"
-   "--fpga_verilog_print_user_defined_template", "Generates a template of hierarchy modules and their port mapping"
-
-.. note:: The selected directory will contain the *Verilog top file* and three other folders. The folders are: 
-
-	* **sub_module:** contains each module verilog file and is more detailed in the next part *Verilog Output File Format*. 
-	* **routing:** contains the Verilog for the connection blocks and the switch boxes. 
-	* **lb:** contains the grids Verilog files.
-
-
-
--- a/docs/source/fpga_verilog/figures/verification_step.png
+++ b/docs/source/fpga_verilog/figures/verification_step.png
--- a/docs/source/fpga_verilog/file_organization.rst
+++ b/docs/source/fpga_verilog/file_organization.rst
@ -1,5 +1,5 @@
 Hierarchy of Verilog Output Files
-============================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 All the generated Verilog Netlists are located in the <verilog_dir>/SRC as you specify in the command-line options. Under the <verilog_dir>/SRC, FPGA-Verilog creates the top file name_top.v and some folders: lb (logic blocks), routing and sub_modules. 

--- a/docs/source/fpga_verilog/func_verify.rst
+++ b/docs/source/fpga_verilog/func_verify.rst
@ -1,5 +1,5 @@
 Perform Functionality Verification
-==================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 If the --fpga_verilog_print_modelsim_autodeck option is selected, it is possible to directly generate scripts for Modelsim. Inside of the Verilog directory specified with --fpga_verilog_dir can be found name_runsim.tcl scripts which perform the functional verification onto the FPGA generated. 

@ -7,6 +7,6 @@ The point of the verification step is to check that the FPGA reproduces the righ

 .. _fig_ModelSim:

-.. figure:: ./figures/Verification_step.pdf
-   :scale: 100%
+.. figure:: figures/verification_step.png
+   :scale: 50%
   :alt: Functional Verification using ModelSim
--- a/docs/source/fpga_verilog/index.rst
+++ b/docs/source/fpga_verilog/index.rst
@ -1,14 +1,12 @@
-FPGA-Verilog: Verilog Auto-Generation
-------------------------------------
+FPGA-Verilog
+------------

 .. _fpga_verilog:
-   User Manual for FPGA Verilog Generator 
+   FPGA-Verilog
 
 .. toctree::
   :maxdepth: 2
  
-   command_line_usage
-
   file_organization
 
   func_verify
--- a/docs/source/fpga_verilog/sc_flow.rst
+++ b/docs/source/fpga_verilog/sc_flow.rst
@ -1,5 +1,5 @@
 From Verilog to Layout
-======================
+~~~~~~~~~~~~~~~~~~~~~~

 The generated Verilog code can be used through a semi-custom design flow to generate the layout.

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -12,33 +12,29 @@ Welcome to OpenFPGA's documentation!
   motivation

 .. toctree::
+   :maxdepth: 2
   :caption: Getting Started

-   eda_flow
-
-   run_fpga_flow
-
-   run_fpga_task
-
+   tutorials/index

 .. toctree::
   :maxdepth: 2
-   :caption: Tools Guide
+   :caption: Architecture Description Language

   arch_lang/index

+.. toctree::
+   :maxdepth: 2
+   :caption: OpenFPGA Tools
+
+   openfpga_shell/index
+
   fpga_spice/index

   fpga_verilog/index

   fpga_bitstream/index

-.. toctree::
-   :maxdepth: 2
-   :caption: User Guide
-
-   tutorials/index
-
 .. toctree::
   :maxdepth: 2
   :caption: Appendix
@ -52,8 +48,6 @@ For more information on the Yosys see yosys_doc_ or yosys_github_

 For more information on the original FPGA architecture description language see xml_vtr_

-
-
 Indices and tables
 ==================

--- a/docs/source/motivation.rst
+++ b/docs/source/motivation.rst
@ -1,8 +1,62 @@
-Motivation
-==========
+Why OpenFPGA?
+-------------
+
+OpenFPGA aims to be an open-source framework that enables rapid prototyping of customizable FPGA architectures. As shown in :numref:`fig_openfpga_motivation`, a conventional approach will take a large group of experienced engineers more than one year to achieve production-ready layout and associated CAD tools. In fact, most of the engineering efforts are spent on manual layouts and developing ad-hoc CAD support.
+
+.. _fig_openfpga_motivation:
+
+.. figure:: ./figures/openfpga_motivation.png
+   :scale: 50%
+   :alt: OpenFPGA: a fast prototyping framework for customizable FPGAs
+
+   Comparison on engineering time and effort to prototype an FPGA using OpenFPGA and conventional approaches
+
+Using OpenFPGA, the development cycle in both hardware and software can be significantly accelerated. OpenFPGA can automatically generate Verilog netlists describing a full FPGA fabric based on an XML-based description file. Thanks to modern semi-custom design tools, production-ready layout generation can be achieved within 24 hours. To help sign-off, OpenFPGA can auto-generate Verilog testbenches to validate the correctness of FPGA fabric using modern verification tools.
+OpenFPGA also provides native bitstream generation support based on the same XML-based description file used in Verilog generation. This avoids the recurring engineering effort of developing CAD tools for different FPGAs. Once the FPGA architecture is finalized, the CAD tool is ready to use.
+
+OpenFPGA can support any architecture that VPR can describe, covering most of the architecture enhancements available in modern FPGAs, and hence unlocks a large design space in prototyping customizable FPGAs. In addition, OpenFPGA provides an enriched syntax that allows users to customize primitive circuit designs down to transistor-level parameters. This helps developers optimize the P.P.A. (Power, Performance and Area). All these features open the door of prototyping/studying flexible FPGAs to a small group of junior engineers or researchers.
+
+In terms of tool functionality, OpenFPGA consists of the following parts: FPGA-Verilog, FPGA-SDC, FPGA-Bitstream and FPGA-SPICE.
+The rest of this section will focus on detailed motivation on each of them, as depicted in :numref:`fig_openfpga_flow`. 
+
+.. _fig_openfpga_flow:
+
+.. figure:: ./figures/openfpga_flow.png
+   :scale: 50%
+   :alt: Design flows available in OpenFPGA
+
+   Design flows for different purposes using OpenFPGA
+
+
+FPGA-Verilog
+~~~~~~~~~~~~
+
+Driven by the strong need in data processing applications, Field Programmable Gate Arrays (FPGAs) are playing an ever-increasing role as programmable accelerators in modern
+computing systems. To fully unlock processing capabilities for domain-specific applications, FPGA architectures have to be tailored for seamless cooperation with other computing resources. However, prototyping and bringing to production a customized FPGA is a costly and complex endeavor even for industrial vendors. OpenFPGA, an open-source framework, aims at rapid prototyping of customizable FPGA architectures through a semi-custom design approach. We propose an XML-to-Prototype design flow, where the Verilog netlists of a full FPGA fabric can be autogenerated using an extension of the XML language from the VTR framework and then fed into a back-end flow to generate production-ready layouts.
+
+The technical details can be found in our TVLSI'19 paper :cite:`XTang_TVLSI_2019` and FPL'19 paper :cite:`XTang_FPL_2019`.
+
+FPGA-SDC
+~~~~~~~~
+
+Design constraints are indispensable in modern ASIC design flows to guarantee the performance level.
+OpenFPGA includes a rich SDC generator to deal with both P&R constraints and sign-off timing analysis.
+Our flow automatically generates two sets of SDC files. The first set of SDC is designed for the P&R flow, where all the combinational loops are broken to enable well-controlled timing-driven P&R. In addition, there are SDC files devoted to constraining pin-to-pin timing for all the resources in FPGAs, in order to obtain nicely constrained and homogeneous delays across the fabric. The second set of SDC is designed for the timing analysis of a benchmark at the post-P&R stage.
+
+The technical details can be found in our FPL'19 paper :cite:`XTang_FPL_2019`.
+
+
+FPGA-Bitstream
+~~~~~~~~~~~~~~
+
+EDA support is essential for end-users to implement designs on a customized FPGA. OpenFPGA provides a general-purpose bitstream generator, FPGA-Bitstream, for any architecture that can be described by VPR. As the native CAD tool for any customized FPGA that is produced by FPGA-Verilog, FPGA-Bitstream is ready to use once users finalize the XML-based architecture description file. This eliminates the huge engineering effort spent on developing bitstream generators for customized FPGAs.
+
+Using FPGA-Bitstream, users can launch (1) the Verilog-to-Bitstream flow, which is the typical implementation flow for end-users; (2) the Verilog-to-Verification flow, in which OpenFPGA outputs Verilog testbenches with self-testing features to validate users' implementations on their customized FPGA fabrics.
+
+The technical details can be found in our TVLSI'19 paper :cite:`XTang_TVLSI_2019` and FPL'19 paper :cite:`XTang_FPL_2019`.

 FPGA-SPICE
----------
+~~~~~~~~~~

 The built-in timing and power analysis engines of VPR are based on analytical models :cite:`VBetz_Book_1999,JGoeders_FPT_2012`. Analytical model-based analysis can promise accuracy only on a limited number of circuit designs for which the model is valid. As the technology advancements create more opportunities on circuit designs and FPGA architectures, the analytical power model require to be updated to follow the new trends. However, without referring to simulation results, the analytical power models cannot prove their accuracy. SPICE simulators have the advantages of generality and accuracy over analytical models. For this reason, SPICE simulation results are often selected to check the accuracy of analytical models. Therefore, there is a strong need for a simulation-based power analysis approach for FPGAs, which can support general circuit designs.

@ -11,28 +65,4 @@ FPGA-SPICE aims at generating SPICE netlists and testbenches for the FPGA archit

 SPICE modeling for FPGA architectures requires detailed transistor-level modeling for all the circuit elements within the considered FPGA architecture. However, current VPR architectural description language :cite:`JLuu_FPGA_2011` does not offer enough transistor-level parameters to model the most common circuit modules, such as multiplexers and LUTs. Therefore, we develop an extension on the VPR architectural description language to model the transistor-level circuit designs.

-In this manual, we will introduce how to use FPGA-SPICE to conduct an accurate power analysis. First, we give an overview of the design flow of FPGA-SPICE-based tool suites. Then, we show the command-line options of FPGA-SPICE. Afterward, we introduce the extension of architectural language and the transistor-level design supports. Finally, we present how to simulate the generated SPICE netlists and testbenches. 
-
-In the appendix, we introduce the hierarchy of the generated SPICE netlists and testbenches, to help you customize the SPICE netlists. We also attach an example of an architecture XML file for your interest.
-
 The technical details can be found in our ICCD’15 paper :cite:`XTang_ICCD_2015` and TVLSI'19 paper :cite:`XTang_TVLSI_2019`.
-
-FPGA-Verilog
------------
-
-On a second note, it is becoming more and more necessary to have fast access to the Verilog code of the structures and architectures researchers want to study. We think that some issues cannot be studies through VPR only and a complete overview is possible through a more extensive workflow. One of the prerequisites for this is the generation of the Verilog which enables Place & Route and Signoff analysis. While VPR allows the researcher to have access to fast results if the characteristics of the system are well known by the user, it is quite limited otherwise. In the same way, it is quite hard to study the same architecture across multiple technology nodes without substantial knowledge of it. 
-
-This motivates us to generate the Verilog code of the architecture to enable a second level of research concerning the architectures to be explored. This Verilog code encompasses the whole design and is divided into multiple sub-directories for targetted analysis or a global one. This is left to the choice of the user. 
-
-In this manual, we present FPGA-Verilog. This extension enables the generation of a fully functional Verilog code enabling a deeper understanding of the architectures of the FPGAs. We introduce different options to this module to do the verification of the system. This will be presented in more depth in the FPGA-Bitstream section.
-
-The technical details can be found in our TVLSI'19 paper :cite:`XTang_TVLSI_2019` and FPL'19 paper :cite:`XTang_FPL_2019`.
-
-FPGA-Bitstream
--------------
-
-To have the right functionality on top of the FPGA generated, it is necessary to have a Bitstream generation which programs the FPGA. For this reason, we generate a Bitstream and some testbenches in parallel which allow the user to do some functional verification of the system to make sure that the functionality is respected. This includes three different testbenches. First, the FPGA is configured then the clock runs with random patterns are generated to test the functionality. Secondly, the FPGA can be configured in parallel to the testbench itself to do a comparison of the signals and check the validity. Finally, the configuration can be skipped to directly have access to the functioning of the system and reduce the processing time.
-
-This will be explained in more depth in the FPGA-Bitstream section.
-
-The technical details can be found in our TVLSI'19 paper :cite:`XTang_TVLSI_2019` and FPL'19 paper :cite:`XTang_FPL_2019`.
--- a/docs/source/openfpga_shell/index.rst
+++ b/docs/source/openfpga_shell/index.rst
@ -0,0 +1,14 @@
+OpenFPGA Interface
+------------------
+
+.. _openfpga_shell:
+   OpenFPGA Shell
+ 
+.. toctree::
+   :maxdepth: 2
+
+   launch_openfpga_shell
+
+   openfpga_script
+  
+   openfpga_commands
--- a/docs/source/openfpga_shell/launch_openfpga_shell.rst
+++ b/docs/source/openfpga_shell/launch_openfpga_shell.rst
@ -0,0 +1,21 @@
+.. _launch_openfpga_shell:
+
+Launch OpenFPGA Shell
+---------------------
+
+OpenFPGA employs a shell-like user interface, in order to integrate all the tools in a well-modularized way.
+Currently, OpenFPGA shell is a unified platform to call ``vpr``, ``FPGA-Verilog``, ``FPGA-Bitstream``, ``FPGA-SDC`` and ``FPGA-SPICE``.
+To launch OpenFPGA shell, users can choose between two modes.
+
+.. option::	--interactive or -i
+
+  Launch OpenFPGA in interactive mode, where users type in commands one by one and get results at runtime
+
+.. option::	--file or -f
+
+  Launch OpenFPGA in script mode, where users write commands in scripts and OpenFPGA will execute them
+
+.. option::	--help or -h
+	
+  Show the help desk
+
--- a/docs/source/openfpga_shell/openfpga_commands.rst
+++ b/docs/source/openfpga_shell/openfpga_commands.rst
@ -0,0 +1,205 @@
+.. _openfpga_commands:
+
+Commands
+--------
+
+As OpenFPGA integrates various tools, the commands are categorized into different classes:
+
+Basic Commands
+~~~~~~~~~~~~~~
+
+.. option:: help
+
+  Show help desk to list all the available commands
+
+.. option:: exit
+
+  Exit OpenFPGA shell
+
+VPR
+~~~
+
+.. option:: vpr
+  
+  OpenFPGA allows users to call ``vpr`` in the standard way, as documented in the VTR project.
+
+Setup OpenFPGA
+~~~~~~~~~~~~~~
+
+.. option:: read_openfpga_arch
+
+  Read the XML architecture file required by OpenFPGA
+
+  - ``--file`` or ``-f`` Specify the file name 
+
+  - ``--verbose`` Show verbose log
+
+.. option:: write_openfpga_arch
+
+  Write the OpenFPGA XML architecture file to a file
+
+  - ``--file`` or ``-f`` Specify the file name 
+
+  - ``--verbose`` Show verbose log
+
+.. option:: link_openfpga_arch
+
+  Annotate the OpenFPGA architecture to VPR data base
+
+  - ``--activity_file`` Specify the signal activity file
+
+  - ``--sort_gsb_chan_node_in_edges`` Sort the edges for the routing tracks in General Switch Blocks (GSBs). It is strongly recommended to turn this on so that the routing modules can be uniquified
+
+  - ``--verbose`` Show verbose log
+
+.. option:: write_gsb_to_xml
+
+  Write the internal structure of General Switch Blocks (GSBs) across an FPGA fabric, including the interconnection between the nodes and node-level details, to XML files
+
+  - ``--file`` or ``-f`` Specify the output directory of the XML files. Each GSB will be written to an independent XML file
+
+  - ``--verbose`` Show verbose log
+
+  .. note:: This command is used to help users to study the difference between GSBs
+
+.. option:: check_netlist_naming_conflict 
+
+  Check and correct any naming conflicts in the BLIF netlist.
+  This is strongly recommended. Otherwise, the generated Verilog netlists may fail to compile.
+
+  .. warning:: This command may be deprecated in future when it is merged to VPR upstream
+  
+  - ``--fix`` Apply fix-up to the names that violate the syntax
+
+  - ``--report <.xml>`` Report the naming fix-up to a log file
+
+.. option:: pb_pin_fixup
+
+  Apply fix-up to clustering nets based on routing results.
+  This is strongly recommended. Otherwise, the generated bitstream may be wrong.
+
+  .. warning:: This command may be deprecated in future when it is merged to VPR upstream
+  
+  - ``--verbose`` Show verbose log
+   
+.. option:: lut_truth_table_fixup
+
+  Apply fix-up to Look-Up Table truth tables based on packing results
+
+  .. warning:: This command may be deprecated in future when it is merged to VPR upstream
+
+  - ``--verbose`` Show verbose log
+  
+.. option:: build_fabric
+
+  Build the module graph.
+
+  - ``--compress_routing`` Enable compression on routing architecture modules. This is strongly recommended, as it minimizes the number of routing modules to be output and can reduce the netlist size significantly.
+  
+  - ``--duplicate_grid_pin`` Enable pin duplication on grid modules. This is optional unless ultra-dense layout generation is needed
+
+  - ``--verbose`` Show verbose log
+
+  .. note:: This is a must-run command before launching FPGA-Verilog, FPGA-Bitstream, FPGA-SDC and FPGA-SPICE
+
+  
+FPGA-Bitstream
+~~~~~~~~~~~~~~
+
+.. option:: repack
+
+  Repack the netlist to physical pbs.
+  This must be done before bitstream generation and testbench generation.
+  It is strongly recommended to do this after all the fix-ups have been applied.
+   
+  - ``--verbose`` Show verbose log
+
+.. option:: build_architecture_bitstream
+
+  Decode VPR implementation results to a fabric-independent bitstream database 
+  
+  - ``--file`` or ``-f`` Output the fabric-independent bitstream to an XML file
+  
+  - ``--verbose`` Show verbose log
+
+.. option:: build_fabric_bitstream
+
+  Reorganize the bitstream database for a specific FPGA fabric
+
+  - ``--verbose`` Show verbose log
+  
+FPGA-Verilog
+~~~~~~~~~~~~
+
+.. option:: write_fabric_verilog
+
+  Write the Verilog netlist for FPGA fabric based on module graph
+
+  - ``--file`` or ``-f`` Specify the output directory for the Verilog netlists
+
+  - ``--explicit_port_mapping`` Use explicit port mapping when writing the Verilog netlists
+
+  - ``--include_timing`` Output timing information to Verilog netlists for primitive modules
+ 
+  - ``--include_signal_init`` Output signal initialization to Verilog netlists for primitive modules
+
+  - ``--support_icarus_simulator`` Output Verilog netlists with syntax that the iVerilog simulator can accept
+
+  - ``--print_user_defined_template`` Output a template Verilog netlist for all the user-defined ``circuit models`` in :ref:`circuit_library`. This aims to help engineers to check what is the port sequence required by top-level Verilog netlists
+
+  - ``--verbose`` Show verbose log
+
+.. option:: write_verilog_testbench
+ 
+  Write the Verilog testbench for FPGA fabric
+
+  - ``--file`` or ``-f`` The output directory for all the testbench netlists. We suggest using the same output directory as for the fabric Verilog netlists
+
+  - ``--reference_benchmark_file_path`` Must specify the reference benchmark Verilog file if you want to output any testbenches
+
+  - ``--print_top_testbench`` Enable top-level testbench which is a full verification including programming circuit and core logic of FPGA
+
+  - ``--print_formal_verification_top_netlist`` Generate a top-level module which can be used in formal verification
+
+  - ``--print_preconfig_top_testbench`` Enable pre-configured top-level testbench which is a fast verification skipping programming phase
+
+  - ``--print_simulation_ini`` Output an exchangeable simulation ini file, which is needed only when you need to interface different HDL simulators using openfpga flow-run scripts
+
+FPGA-SDC
+~~~~~~~~
+
+.. option:: write_pnr_sdc
+ 
+  Write the SDC files for PnR backend
+  
+  - ``--file`` or ``-f`` Specify the output directory for SDC files
+
+  - ``--constrain_global_port`` Constrain all the global ports of FPGA fabric.
+
+  - ``--constrain_non_clock_global_port`` Constrain all the non-clock global ports as clock ports of FPGA fabric
+
+    .. note:: ``constrain_global_port`` will treat these global ports in Clock Tree Synthesis (CTS), for the purpose of balancing the delay to each sink. Be careful when enabling ``constrain_non_clock_global_port``: it may significantly increase the runtime of CTS, as these ports are supposed to be routed before any other nets. This may cause routing congestion as well.
+
+  - ``--constrain_grid`` Constrain all the grids of FPGA fabric
+
+  - ``--constrain_sb`` Constrain all the switch blocks of FPGA fabric
+
+  - ``--constrain_cb`` Constrain all the connection blocks of FPGA fabric
+
+  - ``--constrain_configurable_memory_outputs`` Constrain all the outputs of configurable memories of FPGA fabric
+
+  - ``--constrain_routing_multiplexer_outputs`` Constrain all the outputs of routing multiplexer of FPGA fabric
+
+  - ``--constrain_switch_block_outputs`` Constrain all the outputs of switch blocks of FPGA fabric
+
+  - ``--constrain_zero_delay_paths`` Constrain all the zero-delay paths in FPGA fabric
+
+  .. note:: Zero-delay path may cause errors in some PnR tools as it is considered illegal
+  
+  - ``--verbose`` Enable verbose output
+
+.. option:: write_analysis_sdc
+
+  Write the SDC to run timing analysis for a mapped FPGA fabric
+
+  - ``--file`` or ``-f`` Specify the output directory for SDC files
--- a/docs/source/openfpga_shell/openfpga_script.rst
+++ b/docs/source/openfpga_shell/openfpga_script.rst
@ -0,0 +1,72 @@
+.. _openfpga_script_format:
+
+OpenFPGA Script Format
+----------------------
+
+OpenFPGA accepts a simplified tcl-like script format.
+Comments start with ``#``.
+Note that comments can be added inline or on a separate line.
+
+The following is an example.
+
+.. code-block:: python
+
+  # Run VPR for the 'and' design
+  vpr ./test_vpr_arch/k6_frac_N10_40nm.xml ./test_blif/and.blif --clock_modeling route #--write_rr_graph example_rr_graph.xml
+  
+  # Read OpenFPGA architecture definition
+  read_openfpga_arch -f ./test_openfpga_arch/k6_frac_N10_40nm_openfpga.xml
+  
+  # Write out the architecture XML as a proof
+  #write_openfpga_arch -f ./arch_echo.xml
+  
+  # Annotate the OpenFPGA architecture to VPR data base
+  link_openfpga_arch --activity_file ./test_blif/and.act --sort_gsb_chan_node_in_edges #--verbose
+  
+  # Check and correct any naming conflicts in the BLIF netlist
+  check_netlist_naming_conflict --fix --report ./netlist_renaming.xml
+  
+  # Apply fix-up to clustering nets based on routing results
+  pb_pin_fixup --verbose
+  
+  # Apply fix-up to Look-Up Table truth tables based on packing results
+  lut_truth_table_fixup #--verbose
+  
+  # Build the module graph 
+  #  - Enable compression on routing architecture modules
+  #  - Enable pin duplication on grid modules 
+  build_fabric --compress_routing --duplicate_grid_pin #--verbose
+  
+  # Repack the netlist to physical pbs
+  # This must be done before bitstream generator and testbench generation
+  # Strongly recommend it is done after all the fix-up have been applied
+  repack #--verbose
+  
+  # Build the bitstream 
+  #  - Output the fabric-independent bitstream to a file
+  build_architecture_bitstream --verbose --file /var/tmp/xtang/openfpga_test_src/fabric_indepenent_bitstream.xml
+  
+  # Build fabric-dependent bitstream
+  build_fabric_bitstream --verbose
+  
+  # Write the Verilog netlist for FPGA fabric
+  #  - Enable the use of explicit port mapping in Verilog netlist
+  write_fabric_verilog --file /var/tmp/xtang/openfpga_test_src/SRC --explicit_port_mapping --include_timing --include_signal_init --support_icarus_simulator --print_user_defined_template --verbose
+  
+  # Write the Verilog testbench for FPGA fabric
+  #  - We suggest the use of same output directory as fabric Verilog netlists
+  #  - Must specify the reference benchmark file if you want to output any testbenches
+  #  - Enable top-level testbench which is a full verification including programming circuit and core logic of FPGA
+  #  - Enable pre-configured top-level testbench which is a fast verification skipping programming phase
+  #  - Simulation ini file is optional and is needed only when you need to interface different HDL simulators using openfpga flow-run scripts
+  write_verilog_testbench --file /var/tmp/xtang/openfpga_test_src/SRC --reference_benchmark_file_path /var/tmp/xtang/and.v --print_top_testbench --print_preconfig_top_testbench --print_simulation_ini /var/tmp/xtang/openfpga_test_src/simulation_deck.ini
+  
+  # Write the SDC files for PnR backend
+  #  - Turn on every option here 
+  write_pnr_sdc --file /var/tmp/xtang/openfpga_test_src/SDC 
+  
+  # Write the SDC to run timing analysis for a mapped FPGA fabric
+  write_analysis_sdc --file /var/tmp/xtang/openfpga_test_src/SDC_analysis
+  
+  # Finish and exit OpenFPGA
+  exit
--- a/docs/source/tutorials/compile.rst
+++ b/docs/source/tutorials/compile.rst
@ -0,0 +1,70 @@
+.. _compile:
+
+How to Compile
+--------------
+
+General Guidelines
+~~~~~~~~~~~~~~~~~~
+OpenFPGA uses CMake to generate the Makefile scripts.
+In general, please follow these steps to compile:
+
+::
+
+  git clone https://github.com/LNIS-Projects/OpenFPGA.git
+  cd OpenFPGA
+  mkdir build
+  cd build            
+  cmake ..  -DCMAKE_BUILD_TYPE=debug 
+  make                             
+
+.. note:: OpenFPGA requires gcc/g++ version >5
+
+.. note:: cmake3.12+ is recommended to compile OpenFPGA with GUI
+
+.. note:: It is recommended to use ``make -j`` to accelerate the compilation
+
+Quick Compilation Verification
+To quickly verify that the tool compiled correctly, users can run the following command from the root of the OpenFPGA repository
+
+::
+
+  python3 openfpga_flow/scripts/run_fpga_task.py compilation_verification --debug --show_thread_logs
+
+Dependencies
+~~~~~~~~~~~~
+The full list of dependencies can be found at travis_setup_link_.
+In particular, OpenFPGA requires specific versions for the following dependencies:
+
+:cmake:
+  version 3.12+ for graphical interface
+
+:iverilog:
+  version 10.1+ is required to run Verilog-to-Verification flow
+
+.. _travis_setup_link: https://github.com/LNIS-Projects/OpenFPGA/blob/0cfb88a49f152aab0a06f309ff160f222bb51ed7/.travis.yml#L34
+
+Docker
+~~~~~~
+If some of these dependencies are not installed on your machine, you can use Docker instead (the Docker tool itself needs to be installed).
+To ease the first-time user experience, a Dockerfile is provided in the OpenFPGA folder. A ready-to-use container can be created with the following command
+
+::
+
+  docker run lnis/open_fpga:release
+
+.. note:: This command is for quick testing. If you want to preserve your work, you should use additional options, such as ``-v``.
+
+Otherwise, a container where you can build OpenFPGA yourself can be created with the following commands
+
+::
+
+  docker build . -t open_fpga
+  docker run -it --rm -v $PWD:/localfile/OpenFPGA -w="/localfile/OpenFPGA" open_fpga bash
+
+For more information about Docker, see dock_download_link_
+
+.. _dock_download_link: https://www.docker.com/products/docker-desktop
+
+To build the tool, go into the OpenFPGA folder and follow the compilation steps
+
+.. note:: When building inside Docker, do not use ``make -j``; it causes errors
--- a/docs/source/tutorials/eda_flow.rst
+++ b/docs/source/tutorials/eda_flow.rst
@ -1,11 +1,13 @@
-EDA flow
-========
+.. _eda_flow:
+
+Supported EDA flows in OpenFPGA
+-------------------------------

 As illustrated in :numref:`fig_eda_flow`, FPGA-SPICE creates a modified VTR flow. All the input files for VPR do not need modifications except the architecture description XML. As simulation-based power analysis requires the transistor-level netlists, we extend the architecture description language to support transistor-level modeling (See details in "Tools Guide>Extended Architecture Description Language"). FPGA-SPICE, embedded in VPR, outputs the SPICE netlists and testbenches according to placement and routing results when enabled by command-line options. (See each "FPGA-*Branch*" about command-line options available) Besides automatically generating all the SPICE netlists, FPGA-SPICE supports user-defined SPICE netlists for modules. We believe the support on user-defined SPICE netlists allows FPGA-SPICE to be general enough to support novel circuit designs and even technologies. (See "FPGA-SPICE... > Create Customized SPICE Modules" for guidelines in customizing your FPGA-SPICE compatible SPICE netlists.) With the dumped SPICE netlists and testbenches, a SPICE simulator, i.e., HSPICE, can be called to conduct a power analysis. FPGA-SPICE automatically generates a shell script, which brings convenience for users to run all the simulations (See "FPGA-SPICE... > Run SPICE simulation").

 .. _fig_eda_flow:

-.. figure:: figures/eda_flow.png
+.. figure:: ./figures/eda_flow.png
   :scale: 50%
   :alt: map to buried treasure

@ -14,44 +16,3 @@ As illustrated in :numref:`fig_eda_flow`, FPGA-SPICE creates a modified VTR flow
 FPGA-Verilog is the part of the flow in charge of the Verilog and the semi-custom design flow. In our case, we use Cadence Innovus. The goal is to get the full-FPGA layout to complete the analysis provided by FPGA-SPICE. By having the layout, we can get an area analysis on the one hand and have new information concerning the power analysis. For instance, having the layout allows the user to have new information on the circuit such as the parasitics. 

 FPGA-Bitstream is the part of the flow in charge of the functional verification of the produced FPGA. Testbenches are generated by FPGA-Verilog and are combined with the full FPGA fabric in Modelsim. A bitstream is generated at the same time as the testbenches. This bitstream configures the FPGA with the functionality given by the user to VPR at the beginning of the flow. First, we configure the FPGA with the bitstream, and then waveforms are sent onto the I/O pads to check the functionality.
-
-
-How to compile
-==============
-Guides can be found in the *compilation* directory in the main folder. We tested it for MacOS High Sierra 10.13.4, Ubuntu 18.04 and Red Hat 7.5. This list is not exhaustive as other distributions could work as well.
-
-As a general rule, the compilation follows these steps:
-
-1) You clone the repository with:
-git clone --recurse-submodules https://github.com/LNIS-Projects/OpenFPGA,git
-
-Two different approaches exist from then on: Either you need the full flow, or you just need the extended version of VPR.
-If you need the full flow:
-
-2) Go into the folder you just cloned and make the different submodules through a global Makefile:
-cd OpenFPGA 
-mkdir build (*if folder doesn't already exist*)
-cd build
-cmake ..
-make OR make -j (*if you have multiple cores, this will make the compilation way faster*) 
-
-If you only need vpr:
-cd OpenFPGA 
-mkdir build (if folder doesn't already exist)
-cd build
-cmake ..
-make vpr/make vpr -j
-
-3) Architectures, circuits and already written scripts exist to allow you to test the flow without having to provide any new information to the system. For this:
-cd vpr7_x2p
-cd vpr
-source ./go_fpga_verilog/spice.sh
-
-They are scripts linking to a testing architecture and a simple circuit.
-
-4) If you only need to see the new options implemented in vpr, do:
-./vpr
-
-This step will show you all the different options which were added on top of VPR to enable deeper analysis of FPGA architectures.
-
-The released package includes a version of VPR with FPGA-SPICE, Verilog and Bitstream support, Yosys and ACE2.
--- a/docs/source/tutorials/figures/eda_flow.pdf
+++ b/docs/source/tutorials/figures/eda_flow.pdf
--- a/docs/source/tutorials/figures/eda_flow.png
+++ b/docs/source/tutorials/figures/eda_flow.png
--- a/tutorials/images/architectures_schematics/frac_lut8.pdf
+++ b/tutorials/images/architectures_schematics/frac_lut8.pdf
--- a/tutorials/images/architectures_schematics/fract_lut6.pdf
+++ b/tutorials/images/architectures_schematics/fract_lut6.pdf
--- a/docs/source/tutorials/getting_started.rtf
+++ b/docs/source/tutorials/getting_started.rtf
@ -1,11 +0,0 @@
-{\rtf1\ansi\ansicpg1252\cocoartf1561\cocoasubrtf400
-{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
-{\colortbl;\red255\green255\blue255;}
-{\*\expandedcolortbl;;}
-\margl1440\margr1440\vieww10800\viewh8400\viewkind0
-\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural\partightenfactor0
-
-\f0\fs24 \cf0 01 Getting Started\
-=================================================\
-\
-**Under Construction**}
--- a/docs/source/tutorials/index.rst
+++ b/docs/source/tutorials/index.rst
@ -1,10 +1,18 @@
 .. _tutorials:
-   Tutorials
+   Getting Started
 
 .. toctree::
   :maxdepth: 2

-   getting_started
+   compile
+
+   eda_flow
+
+   run_fpga_flow
+
+   run_fpga_task
+
+   
  

   
--- a/docs/source/tutorials/run_fpga_flow.rst
+++ b/docs/source/tutorials/run_fpga_flow.rst
--- a/docs/source/tutorials/run_fpga_task.rst
+++ b/docs/source/tutorials/run_fpga_task.rst
--- a/docs/source/z_reference.bib
+++ b/docs/source/z_reference.bib
@ -83,7 +83,7 @@ month={Oct},}
 }

@INPROCEEDINGS{XTang_FPL_2019,
-author={X. {Tang} and E. {Giacomin} and A. {Alacchi} and B. {Chauviere} and P. {Gaillardon}},
+author={X. Tang and E. Giacomin and A. Alacchi and B. Chauviere and P. Gaillardon},
 booktitle={2019 29th International Conference on Field Programmable Logic and Applications (FPL)},
 title={OpenFPGA: An Opensource Framework Enabling Rapid Prototyping of Customizable FPGAs},
 year={2019},
@ -95,3 +95,12 @@ doi={10.1109/FPL.2019.00065},
 ISSN={1946-147X},
 month={Sep.},}

+@INPROCEEDINGS{XTang_FPT_2019,
+author={X. Tang and E. Giacomin and A. Alacchi and P. Gaillardon},  
+booktitle={2019 International Conference on Field-Programmable Technology (ICFPT)}, 
+title={A Study on Switch Block Patterns for Tileable FPGA Routing Architectures},
+year={2019},
+volume={},
+number={},
+doi={10.1109/ICFPT47387.2019.00039},
+pages={247-250},}
--- a/libopenfpga/libarchopenfpga/src/check_circuit_library.cpp
+++ b/libopenfpga/libarchopenfpga/src/check_circuit_library.cpp
@ -308,8 +308,9 @@ size_t check_circuit_library_ports(const CircuitLibrary& circuit_lib) {
  /* Check global ports: make sure all the global ports are input ports */
  for (const auto& port : circuit_lib.ports()) {
    if ( (circuit_lib.port_is_global(port)) 
-      && (!circuit_lib.is_input_port(port)) ) {
-      VTR_LOG_ERROR("Circuit port (type=%s) of model (name=%s) is defined as global but not an input port!\n",
+      && (!circuit_lib.is_input_port(port)) 
+      && (!circuit_lib.is_output_port(port)) ) {
+      VTR_LOG_ERROR("Circuit port (type=%s) of model (name=%s) is defined as global but not an input/output port!\n",
                    CIRCUIT_MODEL_PORT_TYPE_STRING[size_t(circuit_lib.port_type(port))],
                    circuit_lib.model_name(port).c_str());
      num_err++;
@ -322,7 +323,7 @@ size_t check_circuit_library_ports(const CircuitLibrary& circuit_lib) {
        || (circuit_lib.port_is_reset(port)) 
        || (circuit_lib.port_is_config_enable(port)) )
      && (!circuit_lib.port_is_global(port)) ) {
-      VTR_LOG_ERROR("Circuit port (type=%s) of model (name=%s) is defined as a set/reset/config_enable port but  it is not global!\n",
+      VTR_LOG_ERROR("Circuit port (type=%s) of model (name=%s) is defined as a set/reset/config_enable port but it is not global!\n",
                    CIRCUIT_MODEL_PORT_TYPE_STRING[size_t(circuit_lib.port_type(port))],
                    circuit_lib.model_name(port).c_str());
      num_err++;
@ -462,7 +463,9 @@ void check_circuit_library(const CircuitLibrary& circuit_lib) {
  iopad_port_types_required.push_back(CIRCUIT_MODEL_PORT_INPUT);
  iopad_port_types_required.push_back(CIRCUIT_MODEL_PORT_OUTPUT);
  iopad_port_types_required.push_back(CIRCUIT_MODEL_PORT_INOUT);
-  iopad_port_types_required.push_back(CIRCUIT_MODEL_PORT_SRAM);
+  /* Some I/Os may not have SRAM port, such as AIB interface
+   * iopad_port_types_required.push_back(CIRCUIT_MODEL_PORT_SRAM);
+   */

  num_err += check_circuit_model_port_required(circuit_lib, CIRCUIT_MODEL_IOPAD, iopad_port_types_required);

--- a/libopenfpga/libarchopenfpga/src/circuit_library.cpp
+++ b/libopenfpga/libarchopenfpga/src/circuit_library.cpp
@ -892,6 +892,12 @@ size_t CircuitLibrary::port_default_value(const CircuitPortId& circuit_port_id)
  return port_default_values_[circuit_port_id];
 }

+/* Return a flag if the port is used as an I/O port of a circuit model */
+bool CircuitLibrary::port_is_io(const CircuitPortId& circuit_port_id) const {
+  /* validate the circuit_port_id */
+  VTR_ASSERT(valid_circuit_port_id(circuit_port_id));
+  return port_is_io_[circuit_port_id];
+}

 /* Return a flag if the port is used in mode-selection purpuse of a circuit model */
 bool CircuitLibrary::port_is_mode_select(const CircuitPortId& circuit_port_id) const {
@ -1344,6 +1350,7 @@ CircuitPortId CircuitLibrary::add_model_port(const CircuitModelId& model_id,
  port_lib_names_.emplace_back();
  port_inv_prefix_.emplace_back();
  port_default_values_.push_back(-1);
+  port_is_io_.push_back(false);
  port_is_mode_select_.push_back(false);
  port_is_global_.push_back(false);
  port_is_reset_.push_back(false);
@ -1414,6 +1421,15 @@ void CircuitLibrary::set_port_default_value(const CircuitPortId& circuit_port_id
  return;
 }

+/* Set the is_io flag for a port of a circuit model */
+void CircuitLibrary::set_port_is_io(const CircuitPortId& circuit_port_id, 
+                                    const bool& is_io) {
+  /* validate the circuit_port_id */
+  VTR_ASSERT(valid_circuit_port_id(circuit_port_id));
+  port_is_io_[circuit_port_id] = is_io;
+  return;
+}
+
 /* Set the is_mode_select for a port of a circuit model */
 void CircuitLibrary::set_port_is_mode_select(const CircuitPortId& circuit_port_id, 
                                             const bool& is_mode_select) {
--- a/libopenfpga/libarchopenfpga/src/circuit_library.h
+++ b/libopenfpga/libarchopenfpga/src/circuit_library.h
@ -275,6 +275,7 @@ class CircuitLibrary {
    std::string port_lib_name(const CircuitPortId& circuit_port_id) const;
    std::string port_inv_prefix(const CircuitPortId& circuit_port_id) const;
    size_t port_default_value(const CircuitPortId& circuit_port_id) const;
+    bool port_is_io(const CircuitPortId& circuit_port_id) const;
    bool port_is_mode_select(const CircuitPortId& circuit_port_id) const;
    bool port_is_global(const CircuitPortId& circuit_port_id) const;
    bool port_is_reset(const CircuitPortId& circuit_port_id) const;
@ -346,6 +347,8 @@ class CircuitLibrary {
                             const std::string& inv_prefix);
    void set_port_default_value(const CircuitPortId& circuit_port_id, 
                                const size_t& default_val);
+    void set_port_is_io(const CircuitPortId& circuit_port_id, 
+                        const bool& is_io);
    void set_port_is_mode_select(const CircuitPortId& circuit_port_id, 
                                 const bool& is_mode_select);
    void set_port_is_global(const CircuitPortId& circuit_port_id, 
@ -529,6 +532,7 @@ class CircuitLibrary {
    vtr::vector<CircuitPortId, std::string> port_lib_names_;
    vtr::vector<CircuitPortId, std::string> port_inv_prefix_;
    vtr::vector<CircuitPortId, size_t> port_default_values_;
+    vtr::vector<CircuitPortId, bool> port_is_io_;
    vtr::vector<CircuitPortId, bool> port_is_mode_select_;
    vtr::vector<CircuitPortId, bool> port_is_global_;
    vtr::vector<CircuitPortId, bool> port_is_reset_;
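The port_is_io accessor and mutator in this change follow the library's existing pattern: one flag vector per port attribute, extended with a default entry whenever a port is created. The following is a minimal, self-contained sketch of that pattern and not the actual CircuitLibrary code. MiniPortLibrary and its plain std::size_t port ids are hypothetical stand-ins for CircuitLibrary and CircuitPortId, and assert stands in for VTR_ASSERT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical stand-in for CircuitLibrary: one flag vector per port attribute */
class MiniPortLibrary {
  public:
    /* Create a port: every attribute vector grows by one default entry,
     * so all vectors stay the same length and share the same index space */
    std::size_t add_port() {
      port_is_io_.push_back(false);
      port_is_global_.push_back(false);
      return port_is_io_.size() - 1;
    }

    bool valid_port_id(const std::size_t& port_id) const {
      return port_id < port_is_io_.size();
    }

    /* Return a flag if the port is used as an I/O port */
    bool port_is_io(const std::size_t& port_id) const {
      assert(valid_port_id(port_id));
      return port_is_io_[port_id];
    }

    /* Set the is_io flag for a port */
    void set_port_is_io(const std::size_t& port_id, const bool& is_io) {
      assert(valid_port_id(port_id));
      port_is_io_[port_id] = is_io;
    }

  private:
    std::vector<bool> port_is_io_;
    std::vector<bool> port_is_global_;
};
```

Forgetting the push_back in add_port for a newly introduced attribute would leave that vector shorter than the others, which is why the commit touches add_model_port together with the header.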
--- a/libopenfpga/libarchopenfpga/src/read_xml_circuit_library.cpp
+++ b/libopenfpga/libarchopenfpga/src/read_xml_circuit_library.cpp
@ -486,6 +486,13 @@ void read_xml_circuit_port(pugi::xml_node& xml_port,
  /* Parse the port size, by default it will be 1 */
  circuit_lib.set_port_size(port, get_attribute(xml_port, "size", loc_data).as_int(1));

+  /* Identify if the port is for io, this is only applicable to INPUT ports.
+   * By default, it will NOT be an io port
+   */
+  if (CIRCUIT_MODEL_PORT_INPUT == circuit_lib.port_type(port)) {
+    circuit_lib.set_port_is_io(port, get_attribute(xml_port, "io", loc_data, pugiutil::ReqOpt::OPTIONAL).as_bool(false));
+  } 
+
  /* Identify if the port is for mode selection, this is only applicable to SRAM ports.
   * By default, it will NOT be a mode selection port
   */
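The parsing rule introduced here reads the optional "io" attribute only for input ports and falls back to false otherwise. A rough sketch of that rule, under simplifying assumptions: attributes are modelled as a std::map of strings instead of pugixml nodes, only the literal string "true" is treated as true (pugixml's own boolean conversion is more permissive), and PortType, get_optional_bool and parse_port_is_io are hypothetical names used for illustration only.

```cpp
#include <map>
#include <string>

/* Hypothetical, reduced set of port types */
enum PortType { PORT_INPUT, PORT_OUTPUT, PORT_SRAM };

typedef std::map<std::string, std::string> AttributeMap;

/* Read an optional boolean attribute; return the default when it is absent.
 * Simplification: only the exact string "true" counts as true */
bool get_optional_bool(const AttributeMap& attrs,
                       const std::string& key,
                       const bool& default_value) {
  AttributeMap::const_iterator it = attrs.find(key);
  if (it == attrs.end()) {
    return default_value;
  }
  return it->second == "true";
}

/* The io flag is only applicable to INPUT ports; other ports are never io */
bool parse_port_is_io(const PortType& type, const AttributeMap& attrs) {
  if (PORT_INPUT != type) {
    return false;
  }
  return get_optional_bool(attrs, "io", false);
}
```

With this rule, an "io" attribute written on an output or SRAM port is silently ignored rather than rejected.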
--- a/libopenfpga/libopenfpgashell/src/shell.tpp
+++ b/libopenfpga/libopenfpgashell/src/shell.tpp
@ -320,19 +320,11 @@ void Shell<T>::print_commands() const {
    /* Print the class name */
    VTR_LOG("%s:\n", command_class_names_[cmd_class].c_str());

-    size_t cnt = 0;
    for (const ShellCommandId& cmd : commands_by_classes_[cmd_class]) {
      /* Print the command names in this class
       * but limited4 command per line for a clean layout
       */
-      VTR_LOG("%s", commands_[cmd].name().c_str());
-      cnt++;
-      if (4 == cnt) {
-        VTR_LOG("\n");
-        cnt = 0;
-      } else {
-        VTR_LOG("\t");
-      } 
+      VTR_LOG("\t%s\n", commands_[cmd].name().c_str());
    }

    /* Put a new line in the end as a splitter */
@ -373,6 +365,8 @@ void Shell<T>::execute_command(const char* cmd_line,
    if (false == command_status_[dep_cmd]) {
      VTR_LOG("Command '%s' is required to be executed before command '%s'!\n",
              commands_[dep_cmd].name().c_str(), commands_[cmd_id].name().c_str());
+      /* Echo the command help desk */
+      print_command_options(commands_[cmd_id]);
      return;
    } 
  }
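The dependency check in execute_command refuses to run a command until all commands it depends on have been executed. A minimal sketch of that idea follows, assuming a string-keyed registry: MiniShell, its method names and the dependency list used in the usage note are hypothetical and do not mirror the real Shell<T> template.

```cpp
#include <map>
#include <string>
#include <vector>

/* Hypothetical, reduced model of a shell that tracks command dependencies */
class MiniShell {
  public:
    /* Register a command together with the commands it depends on */
    void add_command(const std::string& name,
                     const std::vector<std::string>& dependencies) {
      dependencies_[name] = dependencies;
      executed_[name] = false;
    }

    /* Try to run a command. Return false, and record a message,
     * if any dependency has not been executed yet.
     * Unknown command names raise std::out_of_range through at() */
    bool execute(const std::string& name) {
      const std::vector<std::string>& deps = dependencies_.at(name);
      for (std::size_t i = 0; i < deps.size(); ++i) {
        if (false == executed_.at(deps[i])) {
          last_message_ = "Command '" + deps[i]
                        + "' is required to be executed before command '"
                        + name + "'!";
          return false;
        }
      }
      executed_[name] = true;
      return true;
    }

    const std::string& last_message() const { return last_message_; }

  private:
    std::map<std::string, std::vector<std::string> > dependencies_;
    std::map<std::string, bool> executed_;
    std::string last_message_;
};
```

For example, if write_fabric_verilog is registered as depending on build_fabric, calling execute("write_fabric_verilog") first returns false and records the message; after execute("build_fabric") succeeds, the same call returns true.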
--- a/libs/libarchfpga/src/physical_types.h
+++ b/libs/libarchfpga/src/physical_types.h
@ -1589,13 +1589,19 @@ struct t_clock_arch_spec {
 /*   Detailed routing architecture */
 struct t_arch {
    char* architecture_id; //Secure hash digest of the architecture file to uniquely identify this architecture
+    
+    /* Xifan Tang: options for tileable routing architectures */
+    bool tileable;
+    bool through_channel;

    t_chan_width_dist Chans;
    enum e_switch_block_type SBType;
+    enum e_switch_block_type SBSubType;
    std::vector<t_switchblock_inf> switchblocks;
    float R_minW_nmos;
    float R_minW_pmos;
    int Fs;
+    int subFs;
    float grid_logic_tile_area;
    std::vector<t_segment_inf> Segments;
    t_arch_switch_inf* Switches = nullptr;
--- a/libs/libarchfpga/src/read_xml_arch_file.cpp
+++ b/libs/libarchfpga/src/read_xml_arch_file.cpp
@ -2533,8 +2533,11 @@ static void ProcessModelPorts(pugi::xml_node port_group, t_model* model, std::se
 static void ProcessLayout(pugi::xml_node layout_tag, t_arch* arch, const pugiutil::loc_data& loc_data) {
    VTR_ASSERT(layout_tag.name() == std::string("layout"));

-    //Expect no attributes on <layout>
-    expect_only_attributes(layout_tag, {}, loc_data);
+    //Expect only tileable attributes on <layout>
+    //expect_only_attributes(layout_tag, {"tileable"}, loc_data);
+
+    arch->tileable = get_attribute(layout_tag, "tileable", loc_data, ReqOpt::OPTIONAL).as_bool(false);
+    arch->through_channel = get_attribute(layout_tag, "through_channel", loc_data, ReqOpt::OPTIONAL).as_bool(false);

    //Count the number of <auto_layout> or <fixed_layout> tags
    size_t auto_layout_cnt = 0;
@ -2882,7 +2885,7 @@ static void ProcessDevice(pugi::xml_node Node, t_arch* arch, t_default_fc_spec&

    //<switch_block> tag
    Cur = get_single_child(Node, "switch_block", loc_data);
-    expect_only_attributes(Cur, {"type", "fs"}, loc_data);
+    //expect_only_attributes(Cur, {"type", "fs", "sub_type", "sub_fs"}, loc_data);
    Prop = get_attribute(Cur, "type", loc_data).value();
    if (strcmp(Prop, "wilton") == 0) {
        arch->SBType = WILTON;
@ -2898,8 +2901,26 @@ static void ProcessDevice(pugi::xml_node Node, t_arch* arch, t_default_fc_spec&
                       "Unknown property %s for switch block type x\n", Prop);
    }

+    std::string sub_type_str = get_attribute(Cur, "sub_type", loc_data, BoolToReqOpt(false)).as_string("");
+    /* If not specified, we set the same value as 'type' */
+    if (!sub_type_str.empty()) {
+        if (sub_type_str == std::string("wilton")) {
+            arch->SBSubType = WILTON;
+        } else if (sub_type_str == std::string("universal")) {
+            arch->SBSubType = UNIVERSAL;
+        } else if (sub_type_str == std::string("subset")) {
+            arch->SBSubType = SUBSET;
+        } else {
+            archfpga_throw(loc_data.filename_c_str(), loc_data.line(Cur),
+                           "Unknown property %s for switch block subtype x\n", sub_type_str.c_str());
+        }
+    } else {
+        arch->SBSubType = arch->SBType;
+    }
+
    ReqOpt CUSTOM_SWITCHBLOCK_REQD = BoolToReqOpt(!custom_switch_block);
    arch->Fs = get_attribute(Cur, "fs", loc_data, CUSTOM_SWITCHBLOCK_REQD).as_int(3);
+    arch->subFs = get_attribute(Cur, "sub_fs", loc_data,  BoolToReqOpt(false)).as_int(arch->Fs);

    Cur = get_single_child(Node, "default_fc", loc_data, ReqOpt::OPTIONAL);
    if (Cur) {
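The sub_type handling above resolves to the main switch block type when the attribute is absent and rejects unknown names. The sketch below isolates that fallback rule in a standalone function. It is illustrative only: SwitchBlockType and resolve_sub_type are hypothetical names, an empty string models a missing attribute, and a std::invalid_argument exception stands in for archfpga_throw.

```cpp
#include <stdexcept>
#include <string>

/* Hypothetical mirror of the switch block types used in the diff */
enum SwitchBlockType { SUBSET, WILTON, UNIVERSAL };

/* Map a sub_type attribute string to a switch block type.
 * An empty string means the attribute was not specified,
 * in which case the main type is reused */
SwitchBlockType resolve_sub_type(const std::string& sub_type_str,
                                 const SwitchBlockType& main_type) {
  if (sub_type_str.empty()) {
    return main_type;
  }
  if (sub_type_str == "wilton") {
    return WILTON;
  }
  if (sub_type_str == "universal") {
    return UNIVERSAL;
  }
  if (sub_type_str == "subset") {
    return SUBSET;
  }
  /* Report the offending sub_type value itself in the error */
  throw std::invalid_argument("Unknown property " + sub_type_str
                              + " for switch block subtype");
}
```

The same default-to-parent idea applies to sub_fs, which the diff initialises from fs when the attribute is not given.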
--- a/openfpga/src/annotation/annotate_rr_graph.cpp
+++ b/openfpga/src/annotation/annotate_rr_graph.cpp
@ -12,6 +12,7 @@

 /* Headers from vpr library */
 #include "rr_graph_obj_util.h"
+#include "openfpga_rr_graph_utils.h"

 #include "annotate_rr_graph.h"

@ -257,10 +258,41 @@ RRGSB build_rr_gsb(const DeviceContext& vpr_device_ctx,
    /* Fill opin_rr_nodes */
    /* Copy from temp_opin_rr_node to opin_rr_node */
    for (const RRNodeId& inode : temp_opin_rr_nodes[0]) {
+      /* Skip nodes that have no configurable outgoing edges; they should NOT appear in the GSB connection
+       * This is for those grid output pins used by direct connections
+       */
+      if (0 == std::distance(vpr_device_ctx.rr_graph.node_configurable_out_edges(inode).begin(),
+                             vpr_device_ctx.rr_graph.node_configurable_out_edges(inode).end())) {
+        continue;
+      }
+
+      /* Do not consider OPINs that directly drive an IPIN
+       * they are supposed to be handled by direct connection
+       */
+      if (true == is_opin_direct_connected_ipin(vpr_device_ctx.rr_graph, inode)) { 
+        continue;
+      }
+
      /* Grid[x+1][y+1] Bottom side outputs pins */
      rr_gsb.add_opin_node(inode, side_manager.get_side());
    }
+
    for (const RRNodeId& inode : temp_opin_rr_nodes[1]) {
+      /* Skip OPINs that have no configurable outgoing edges; they should NOT appear in the GSB connections
+       * This covers grid output pins that are only used by direct connections
+       */
+      if (0 == std::distance(vpr_device_ctx.rr_graph.node_configurable_out_edges(inode).begin(),
+                             vpr_device_ctx.rr_graph.node_configurable_out_edges(inode).end())) {
+        continue;
+      }
+
+      /* Do not consider OPINs that directly drive an IPIN
+       * they are supposed to be handled by direct connection
+       */
+      if (true == is_opin_direct_connected_ipin(vpr_device_ctx.rr_graph, inode)) { 
+        continue;
+      }
+
      /* Grid[x+1][y] TOP side outputs pins */
      rr_gsb.add_opin_node(inode, side_manager.get_side());
    }
@ -341,6 +373,13 @@ RRGSB build_rr_gsb(const DeviceContext& vpr_device_ctx,
                                                  ix, iy, IPIN, ipin_rr_node_grid_side);
    /* Fill the ipin nodes of RRGSB */ 
    for (const RRNodeId& inode : temp_ipin_rr_nodes) {
+      /* Do not consider IPINs that are directly connected by an OPIN
+       * they are supposed to be handled by direct connection
+       */
+      if (true == is_ipin_direct_connected_opin(vpr_device_ctx.rr_graph, inode)) { 
+        continue;
+      }
+ 
      rr_gsb.add_ipin_node(inode, side_manager.get_side());
    }
    /* Clear the temp data */
@ -372,12 +411,12 @@ void annotate_device_rr_gsb(const DeviceContext& vpr_device_ctx,
  /* For each switch block, determine the size of array */
  for (size_t ix = 0; ix < gsb_range.x(); ++ix) {
    for (size_t iy = 0; iy < gsb_range.y(); ++iy) {
-      /* Here we give the builder the fringe coordinates so that it can handle the GSBs at the borderside correctly */
+      /* Here we give the builder the fringe coordinates so that it can handle the GSBs at the borderside correctly
+       * Incoming edges are not sorted here; call sort_device_rr_gsb_chan_node_in_edges() if sorting is required
+       */
      const RRGSB& rr_gsb = build_rr_gsb(vpr_device_ctx, 
                                         vtr::Point<size_t>(vpr_device_ctx.grid.width() - 2, vpr_device_ctx.grid.height() - 2), 
                                         vtr::Point<size_t>(ix, iy));
-      /* TODO: sort drive_rr_nodes should be done when building the tileable rr_graph */
-      //sort_rr_gsb_drive_rr_nodes(rr_gsb);
 
      /* Add to device_rr_gsb */
      vtr::Point<size_t> gsb_coordinate = rr_gsb.get_sb_coordinate();
@ -394,6 +433,27 @@ void annotate_device_rr_gsb(const DeviceContext& vpr_device_ctx,
          gsb_range.x() * gsb_range.y());
 }

+/********************************************************************
+ * Sort all the incoming edges for each channel node which are
+ * output ports of the GSB
+ *******************************************************************/
+void sort_device_rr_gsb_chan_node_in_edges(const RRGraph& rr_graph,
+                                           DeviceRRGSB& device_rr_gsb) {
+  vtr::ScopedStartFinishTimer timer("Sort incoming edges for each routing track output node of General Switch Block(GSB)");
+
+  /* Note that the GSB array is smaller than the grids by 1 column and 1 row!!! */
+  vtr::Point<size_t> gsb_range = device_rr_gsb.get_gsb_range();
+
+  /* For each GSB, sort the incoming edges of its channel nodes */
+  for (size_t ix = 0; ix < gsb_range.x(); ++ix) {
+    for (size_t iy = 0; iy < gsb_range.y(); ++iy) {
+      vtr::Point<size_t> gsb_coordinate(ix, iy);
+      RRGSB& rr_gsb = device_rr_gsb.get_mutable_gsb(gsb_coordinate);
+      rr_gsb.sort_chan_node_in_edges(rr_graph);
+    } 
+  }
+}
+
 /********************************************************************
 * Build the link between rr_graph switches to their physical circuit models 
 * The binding is done based on the name of rr_switches defined in the
--- a/openfpga/src/annotation/annotate_rr_graph.h
+++ b/openfpga/src/annotation/annotate_rr_graph.h
@ -19,6 +19,9 @@ void annotate_device_rr_gsb(const DeviceContext& vpr_device_ctx,
                            DeviceRRGSB& device_rr_gsb,
                            const bool& verbose_output);

+void sort_device_rr_gsb_chan_node_in_edges(const RRGraph& rr_graph,
+                                           DeviceRRGSB& device_rr_gsb);
+
 void annotate_rr_graph_circuit_models(const DeviceContext& vpr_device_ctx, 
                                      const Arch& openfpga_arch,
                                      VprDeviceAnnotation& vpr_device_annotation,
--- a/openfpga/src/annotation/annotate_simulation_setting.cpp
+++ b/openfpga/src/annotation/annotate_simulation_setting.cpp
@ -0,0 +1,227 @@
+/********************************************************************
+ * This file includes functions that are used to annotate the
+ * simulation settings of OpenFPGA based on VPR results
+ *******************************************************************/
+#include <cmath>
+#include <iterator>
+
+/* Headers from vtrutil library */
+#include "vtr_assert.h"
+#include "vtr_log.h"
+
+/* Headers from vpr library */
+#include "timing_info.h"
+#include "AnalysisDelayCalculator.h"
+#include "net_delay.h"
+
+#include "annotate_simulation_setting.h"
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+/********************************************************************
+ * Find the average signal density for all the nets of user's benchmark
+ *******************************************************************/
+static 
+float average_atom_net_signal_density(const AtomContext& atom_ctx,
+                                      const std::unordered_map<AtomNetId, t_net_power>& net_activity) {
+  float avg_density = 0.;
+  size_t net_cnt = 0;
+
+  /* get the average density of all the nets */
+  for (const AtomNetId& atom_net : atom_ctx.nlist.nets()) {
+    /* Skip the nets without any activity annotation */
+    if (0 == net_activity.count(atom_net)) {
+      continue;
+    }
+
+    /* Only consider nets with non-zero density */
+    if (0. == net_activity.at(atom_net).density) {
+      continue;
+    }
+
+    avg_density += net_activity.at(atom_net).density; 
+    net_cnt++;
+  }
+
+  return (0 == net_cnt) ? 0.f : avg_density / net_cnt; /* Avoid dividing by zero when no net qualifies */
+}                                     
+
+/********************************************************************
+ * Find the average signal density for all the nets of user's benchmark
+ * by applying a weight to each net density
+ *******************************************************************/
+static 
+float average_weighted_atom_net_signal_density(const AtomContext& atom_ctx,
+                                               const std::unordered_map<AtomNetId, t_net_power>& net_activity) {
+
+  float weighted_avg_density = 0.;
+  size_t weighted_net_cnt = 0;
+
+  /* get the average density of all the nets */
+  for (const AtomNetId& atom_net : atom_ctx.nlist.nets()) {
+    /* Skip the nets without any activity annotation */
+    if (0 == net_activity.count(atom_net)) {
+      continue;
+    }
+
+    /* Only consider nets with non-zero density */
+    if (0. == net_activity.at(atom_net).density) {
+      continue;
+    }
+
+    /* Consider the weight of fan-out */
+    size_t net_weight; 
+    if (0 == std::distance(atom_ctx.nlist.net_sinks(atom_net).begin(), atom_ctx.nlist.net_sinks(atom_net).end())) {
+      net_weight = 1;
+    } else {
+      VTR_ASSERT(0 < std::distance(atom_ctx.nlist.net_sinks(atom_net).begin(), atom_ctx.nlist.net_sinks(atom_net).end()));
+      net_weight = std::distance(atom_ctx.nlist.net_sinks(atom_net).begin(), atom_ctx.nlist.net_sinks(atom_net).end());
+    }
+    weighted_avg_density += net_activity.at(atom_net).density* net_weight;
+    weighted_net_cnt += net_weight;
+  }
+
+  return (0 == weighted_net_cnt) ? 0.f : weighted_avg_density / weighted_net_cnt; /* Avoid dividing by zero when no net qualifies */
+}
+
+/********************************************************************
+ * Find median of signal density of all the nets 
+ *******************************************************************/
+static 
+float median_atom_net_signal_density(const AtomContext& atom_ctx,
+                                      const std::unordered_map<AtomNetId, t_net_power>& net_activity) {
+  /* Sort the net density */
+  std::vector<float> net_densities;
+
+  net_densities.reserve(net_activity.size());
+
+  for (const AtomNetId& atom_net : atom_ctx.nlist.nets()) {
+    /* Skip the nets without any activity annotation */
+    if (0 == net_activity.count(atom_net)) {
+      continue;
+    }
+
+    net_densities.push_back(net_activity.at(atom_net).density);
+  }
+  std::sort(net_densities.begin(), net_densities.end());
+
+  /* Get the median; an empty list (no annotated net) yields zero density */
+  if (true == net_densities.empty()) { return 0.; }
+  if (net_densities.size() % 2 != 0) { /* Odd size: take the middle element */
+    return net_densities[net_densities.size() / 2];
+  }            
+  /* Even size: average the two middle elements */
+  return 0.5 * (net_densities[net_densities.size() / 2 - 1] + net_densities[net_densities.size() / 2]);
+}
+
+/********************************************************************
+ * Find the number of clock cycles in simulation based on the average signal density
+ *******************************************************************/
+static 
+size_t recommend_num_sim_clock_cycle(const AtomContext& atom_ctx,
+                                     const std::unordered_map<AtomNetId, t_net_power>& net_activity, 
+                                     const float& sim_window_size) {
+
+  float average_density = average_atom_net_signal_density(atom_ctx, net_activity);
+  float average_weighted_density = average_weighted_atom_net_signal_density(atom_ctx, net_activity);
+  float median_density = median_atom_net_signal_density(atom_ctx, net_activity);
+
+  VTR_LOG("Average net density: %.2f\n",
+          average_density);
+  VTR_LOG("Median net density: %.2f\n",
+          median_density);
+  VTR_LOG("Average net density after weighting: %.2f\n",
+          average_weighted_density);
+
+  /* We have three choices in selecting the number of clock cycles based on signal density
+   * 1. average signal density 
+   * 2. median signal density
+   * 3. a mix of average and median signal density 
+   */
+  size_t recmd_num_clock_cycles = 0;
+  if ( (0. == median_density) 
+    && (0. == average_density) ) {
+    recmd_num_clock_cycles = 1;
+    VTR_LOG_WARN("All the signal densities are zero!\nThe number of clock cycles in simulation is set to %zu!\n",
+                 recmd_num_clock_cycles);
+  } else if (0. == average_density) {
+    recmd_num_clock_cycles = (size_t)round(1 / median_density); 
+  } else if (0. == median_density) {
+    recmd_num_clock_cycles = (size_t)round(1 / average_density);
+  } else {
+    /* Use the sim window size to balance the weight of average density and median density
+     * In practice, we find that there could be a huge difference between average and median values 
+     * For a reasonable number of simulation clock cycles, we blend the two with this window size.
+     */
+    recmd_num_clock_cycles = (size_t)round(1 / (sim_window_size * average_density + (1 - sim_window_size) * median_density ));
+
+    VTR_LOG("Window size set for simulation: %.2f\n",
+            sim_window_size);
+    VTR_LOG("Net density after applying window size : %.2f\n", 
+            (sim_window_size * average_density + (1 - sim_window_size) * median_density));
+  }
+  
+  VTR_ASSERT(0 < recmd_num_clock_cycles);
+
+  return recmd_num_clock_cycles; 
+}
+
+/********************************************************************
+ * Annotate simulation setting based on VPR results
+ *  - If the operating clock frequency is set to follow the vpr timing results,
+ *    we will set a new operating clock frequency here
+ *  - If the number of clock cycles in simulation is set to be automatically determined,
+ *    we will infer the number based on the average signal density
+ *******************************************************************/
+void annotate_simulation_setting(const AtomContext& atom_ctx, 
+                                 const std::unordered_map<AtomNetId, t_net_power>& net_activity, 
+                                 SimulationSetting& sim_setting) {
+
+  /* Find if the operating frequency is bound to vpr results */
+  if (0. == sim_setting.operating_clock_frequency()) {
+    VTR_LOG("User specified the operating clock frequency to use VPR results\n");
+    /* Run timing analysis and collect critical path delay
+     * This code is copied from function vpr_analysis() in vpr_api.h 
+     * Should keep updated to latest VPR code base
+     * Note:
+     *   - MUST mention in documentation that VPR should be run in timing enabled mode
+     */
+    vtr::vector<ClusterNetId, float*> net_delay;
+    vtr::t_chunk net_delay_ch;
+    /* Load the net delays */
+    net_delay = alloc_net_delay(&net_delay_ch);
+    load_net_delay_from_routing(net_delay);
+
+    /* Do final timing analysis */
+    auto analysis_delay_calc = std::make_shared<AnalysisDelayCalculator>(atom_ctx.nlist, atom_ctx.lookup, net_delay);
+    auto timing_info = make_setup_hold_timing_info(analysis_delay_calc);
+    timing_info->update();
+
+    /* Get critical path delay. Update simulation settings */
+    float T_crit = timing_info->least_slack_critical_path().delay() * (1. + sim_setting.operating_clock_frequency_slack());
+    sim_setting.set_operating_clock_frequency(1 / T_crit); 
+    VTR_LOG("Use VPR critical path delay %g [ns] with a %g [%%] slack in OpenFPGA.\n",
+            T_crit * 1e9, sim_setting.operating_clock_frequency_slack() * 100);
+  }
+  VTR_LOG("Will apply operating clock frequency %g [MHz] to simulations\n",
+          sim_setting.operating_clock_frequency() / 1e6);
+
+  if (0. == sim_setting.num_clock_cycles()) {
+    /* Find the number of clock cycles to be used in simulation 
+     * by averaging over the signal activity 
+     */
+    VTR_LOG("User specified the number of operating clock cycles to be inferred from signal activities\n");
+
+    /* Use a fixed simulation window size now. TODO: this could be specified by users */
+    size_t num_clock_cycles = recommend_num_sim_clock_cycle(atom_ctx,
+                                                            net_activity, 
+                                                            0.5);
+    sim_setting.set_num_clock_cycles(num_clock_cycles);
+
+    VTR_LOG("Will apply %lu operating clock cycles to simulations\n",
+            sim_setting.num_clock_cycles());
+  }
+}
+
+} /* end namespace openfpga */
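Reviewer note: the clock-cycle heuristic in annotate_simulation_setting.cpp can be exercised in isolation. The following is a minimal sketch, not the code from this commit: `median_density` and `recommend_cycles` are hypothetical names, densities are passed in as plain floats instead of being read from `net_activity`, and the clamp to at least one cycle is an assumption (the committed code asserts instead).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

/* Median of a list of densities.
 * Assumption: an empty list yields 0, so callers can treat it as "no activity". */
float median_density(std::vector<float> densities) {
  if (densities.empty()) {
    return 0.f;
  }
  std::sort(densities.begin(), densities.end());
  const std::size_t mid = densities.size() / 2;
  if (densities.size() % 2 != 0) {
    return densities[mid]; /* Odd size: the middle element */
  }
  return 0.5f * (densities[mid - 1] + densities[mid]); /* Even size: mean of the two middle elements */
}

/* Pick a number of clock cycles as the rounded reciprocal of a density.
 * Mirrors the branch structure in the diff: fall back to whichever of
 * average/median is non-zero, otherwise blend them with the window size.
 * Assumption: the result is clamped to at least 1 cycle. */
std::size_t recommend_cycles(float average, float median, float window) {
  assert(0.f <= window && window <= 1.f);
  if (0.f == average && 0.f == median) {
    return 1;
  }
  float density = 0.f;
  if (0.f == average) {
    density = median;
  } else if (0.f == median) {
    density = average;
  } else {
    density = window * average + (1.f - window) * median;
  }
  const std::size_t cycles = static_cast<std::size_t>(std::round(1.f / density));
  return std::max<std::size_t>(1, cycles);
}
```

With a window size of 0.5, as hard-coded in the commit, the blended density is the plain mean of the average and the median.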
--- a/openfpga/src/annotation/annotate_simulation_setting.h
+++ b/openfpga/src/annotation/annotate_simulation_setting.h
@ -0,0 +1,23 @@
+#ifndef ANNOTATE_SIMULATION_SETTING_H
+#define ANNOTATE_SIMULATION_SETTING_H
+
+/********************************************************************
+ * Include header files that are required by function declaration
+ *******************************************************************/
+#include "vpr_context.h"
+#include "openfpga_context.h"
+
+/********************************************************************
+ * Function declaration
+ *******************************************************************/
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+void annotate_simulation_setting(const AtomContext& atom_ctx, 
+                                 const std::unordered_map<AtomNetId, t_net_power>& net_activity, 
+                                 SimulationSetting& sim_setting);
+
+} /* end namespace openfpga */
+
+#endif
--- a/openfpga/src/annotation/device_rr_gsb.cpp
+++ b/openfpga/src/annotation/device_rr_gsb.cpp
@ -28,13 +28,13 @@ vtr::Point<size_t> DeviceRRGSB::get_gsb_range() const {
 } 

 /* Get a rr switch block in the array with a coordinate */
-const RRGSB DeviceRRGSB::get_gsb(const vtr::Point<size_t>& coordinate) const {
+const RRGSB& DeviceRRGSB::get_gsb(const vtr::Point<size_t>& coordinate) const {
  VTR_ASSERT(validate_coordinate(coordinate));
  return rr_gsb_[coordinate.x()][coordinate.y()];
 } 

 /* Get a rr switch block in the array with a coordinate */
-const RRGSB DeviceRRGSB::get_gsb(const size_t& x, const size_t& y) const { 
+const RRGSB& DeviceRRGSB::get_gsb(const size_t& x, const size_t& y) const { 
  vtr::Point<size_t> coordinate(x, y);  
  return get_gsb(coordinate);
 }
@ -87,7 +87,7 @@ size_t DeviceRRGSB::get_num_gsb_unique_module() const {
 } 

 /* Get a rr switch block which a unique mirror */ 
-const RRGSB DeviceRRGSB::get_sb_unique_module(const size_t& index) const {
+const RRGSB& DeviceRRGSB::get_sb_unique_module(const size_t& index) const {
  VTR_ASSERT (validate_sb_unique_module_index(index));
  
  return rr_gsb_[sb_unique_module_[index].x()][sb_unique_module_[index].y()];
@ -130,7 +130,7 @@ const RRGSB& DeviceRRGSB::get_cb_unique_module(const t_rr_type& cb_type, const v
 } 

 /* Give a coordinate of a rr switch block, and return its unique mirror */ 
-const RRGSB DeviceRRGSB::get_sb_unique_module(const vtr::Point<size_t>& coordinate) const {
+const RRGSB& DeviceRRGSB::get_sb_unique_module(const vtr::Point<size_t>& coordinate) const {
  VTR_ASSERT(validate_coordinate(coordinate));
  size_t sb_unique_module_id = sb_unique_module_id_[coordinate.x()][coordinate.y()];  
  return get_sb_unique_module(sb_unique_module_id);
@ -193,6 +193,18 @@ void DeviceRRGSB::add_rr_gsb(const vtr::Point<size_t>& coordinate,
  rr_gsb_[coordinate.x()][coordinate.y()] = rr_gsb; 
 }

+/* Get a mutable rr switch block in the array with a coordinate */
+RRGSB& DeviceRRGSB::get_mutable_gsb(const vtr::Point<size_t>& coordinate) {
+  VTR_ASSERT(validate_coordinate(coordinate));
+  return rr_gsb_[coordinate.x()][coordinate.y()];
+} 
+
+/* Get a mutable rr switch block in the array with a coordinate */
+RRGSB& DeviceRRGSB::get_mutable_gsb(const size_t& x, const size_t& y) { 
+  vtr::Point<size_t> coordinate(x, y);  
+  return get_mutable_gsb(coordinate);
+}
+
 /* Add a switch block to the array, which will automatically identify and update the lists of unique mirrors and rotatable mirrors */
 void DeviceRRGSB::build_cb_unique_module(const RRGraph& rr_graph, const t_rr_type& cb_type) {
  /* Make sure a clean start */
@ -255,6 +267,7 @@ void DeviceRRGSB::build_sb_unique_module(const RRGraph& rr_graph) {
          break;
        }
      }
+
      /* Add to list if this is a unique mirror*/
      if (true == is_unique_module) {
        sb_unique_module_.push_back(sb_coordinate);
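Reviewer note: the accessor changes in device_rr_gsb.cpp follow a common pattern, where the const getter returns a reference instead of a copy and a separate mutable getter enables in-place updates such as edge sorting. A minimal sketch of that pattern, assuming toy stand-ins (`Block`, `BlockGrid` and `sort_all_in_edges` are hypothetical names, not OpenFPGA types):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical stand-in for RRGSB: only carries data to be sorted */
struct Block {
  std::vector<int> in_edges;
};

/* Hypothetical stand-in for DeviceRRGSB: a 2-D array of blocks */
class BlockGrid {
 public:
  BlockGrid(std::size_t width, std::size_t height)
    : blocks_(width, std::vector<Block>(height)) {}

  std::size_t width() const { return blocks_.size(); }
  std::size_t height() const { return blocks_.empty() ? 0 : blocks_[0].size(); }

  /* Read-only access: returning a reference avoids copying the block */
  const Block& get(std::size_t x, std::size_t y) const {
    assert(x < width() && y < height());
    return blocks_[x][y];
  }

  /* Mutable access: required for in-place updates */
  Block& get_mutable(std::size_t x, std::size_t y) {
    assert(x < width() && y < height());
    return blocks_[x][y];
  }

 private:
  std::vector<std::vector<Block>> blocks_;
};

/* Walk the whole grid and sort each block in place */
void sort_all_in_edges(BlockGrid& grid) {
  for (std::size_t ix = 0; ix < grid.width(); ++ix) {
    for (std::size_t iy = 0; iy < grid.height(); ++iy) {
      Block& block = grid.get_mutable(ix, iy);
      std::sort(block.in_edges.begin(), block.in_edges.end());
    }
  }
}
```

Had the getter returned by value, the sort would act on a temporary copy and the stored block would stay unsorted.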
--- a/openfpga/src/annotation/device_rr_gsb.h
+++ b/openfpga/src/annotation/device_rr_gsb.h
@ -28,12 +28,12 @@ class DeviceRRGSB {
  public: /* Contructors */
  public: /* Accessors */
    vtr::Point<size_t> get_gsb_range() const; /* get the max coordinate of the switch block array */
-    const RRGSB get_gsb(const vtr::Point<size_t>& coordinate) const; /* Get a rr switch block in the array with a coordinate */
-    const RRGSB get_gsb(const size_t& x, const size_t& y) const; /* Get a rr switch block in the array with a coordinate */
+    const RRGSB& get_gsb(const vtr::Point<size_t>& coordinate) const; /* Get a rr switch block in the array with a coordinate */
+    const RRGSB& get_gsb(const size_t& x, const size_t& y) const; /* Get a rr switch block in the array with a coordinate */
    size_t get_num_gsb_unique_module() const; /* get the number of unique mirrors of GSB */
    size_t get_num_sb_unique_module() const; /* get the number of unique mirrors of switch blocks */
-    const RRGSB get_sb_unique_module(const size_t& index) const; /* Get a rr switch block which a unique mirror */ 
-    const RRGSB get_sb_unique_module(const vtr::Point<size_t>& coordinate) const; /* Get a rr switch block which a unique mirror */ 
+    const RRGSB& get_sb_unique_module(const size_t& index) const; /* Get a rr switch block which a unique mirror */ 
+    const RRGSB& get_sb_unique_module(const vtr::Point<size_t>& coordinate) const; /* Get a rr switch block which a unique mirror */ 
    const RRGSB& get_cb_unique_module(const t_rr_type& cb_type, const size_t& index) const; /* Get a rr switch block which a unique mirror */ 
    const RRGSB& get_cb_unique_module(const t_rr_type& cb_type, const vtr::Point<size_t>& coordinate) const;
    size_t get_num_cb_unique_module(const t_rr_type& cb_type) const; /* get the number of unique mirrors of CBs */
@ -43,6 +43,8 @@ class DeviceRRGSB {
    void reserve_sb_unique_submodule_id(const vtr::Point<size_t>& coordinate); /* Pre-allocate the rr_sb_unique_module_id matrix that the device requires */ 
    void resize_upon_need(const vtr::Point<size_t>& coordinate); /* Resize the rr_switch_block array if needed */ 
    void add_rr_gsb(const vtr::Point<size_t>& coordinate, const RRGSB& rr_gsb); /* Add a switch block to the array, which will automatically identify and update the lists of unique mirrors and rotatable mirrors */
+    RRGSB& get_mutable_gsb(const vtr::Point<size_t>& coordinate); /* Get a mutable rr switch block in the array with a coordinate */
+    RRGSB& get_mutable_gsb(const size_t& x, const size_t& y); /* Get a mutable rr switch block in the array with a coordinate */
    void build_unique_module(const RRGraph& rr_graph); /* Add a switch block to the array, which will automatically identify and update the lists of unique mirrors and rotatable mirrors */
    void clear(); /* clean the content */
  private: /* Internal cleaners */
--- a/openfpga/src/annotation/write_xml_device_rr_gsb.cpp
+++ b/openfpga/src/annotation/write_xml_device_rr_gsb.cpp
@ -0,0 +1,205 @@
+/***************************************************************************************
+ * Output internal structure of DeviceRRGSB to XML format 
+ ***************************************************************************************/
+/* Headers from vtrutil library */
+#include "vtr_log.h"
+#include "vtr_assert.h"
+#include "vtr_time.h"
+
+/* Headers from openfpgautil library */
+#include "openfpga_side_manager.h"
+#include "openfpga_digest.h"
+
+#include "openfpga_naming.h"
+#include "openfpga_rr_graph_utils.h"
+
+#include "write_xml_device_rr_gsb.h"
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+/***************************************************************************************
+ * Output internal structure (only the switch block part) of a RRGSB to XML format 
+ ***************************************************************************************/
+static 
+void write_rr_switch_block_to_xml(const std::string fname_prefix,
+                                  const RRGraph& rr_graph,
+                                  const RRGSB& rr_gsb,
+                                  const bool& verbose) {
+  /* Prepare file name */
+  std::string fname(fname_prefix);
+  vtr::Point<size_t> gsb_coordinate(rr_gsb.get_sb_x(), rr_gsb.get_sb_y());
+  fname += generate_switch_block_module_name(gsb_coordinate);
+  fname += ".xml";
+
+  VTR_LOGV(verbose,
+           "Output internal structure of Switch Block to '%s'\n",
+           fname.c_str());
+
+  /* Create a file handler*/
+  std::fstream fp;
+  /* Open a file */
+  fp.open(fname, std::fstream::out | std::fstream::trunc);
+
+  /* Validate the file stream */
+  check_file_stream(fname.c_str(), fp);
+
+  /* Output location of the Switch Block */
+  fp << "<rr_gsb x=\"" << rr_gsb.get_x() << "\" y=\"" << rr_gsb.get_y() << "\""
+     << " num_sides=\"" << rr_gsb.get_num_sides() << "\">" << std::endl;
+
+  /* Output each side */ 
+  for (size_t side = 0; side < rr_gsb.get_num_sides(); ++side) {
+    SideManager gsb_side_manager(side);
+    enum e_side gsb_side = gsb_side_manager.get_side();
+   
+    /* Output IPIN nodes */ 
+    for (size_t inode = 0; inode < rr_gsb.get_num_ipin_nodes(gsb_side); ++inode) {
+      const RRNodeId& cur_rr_node = rr_gsb.get_ipin_node(gsb_side, inode);
+      /* General information of this IPIN */
+      fp << "\t<" << rr_node_typename[rr_graph.node_type(cur_rr_node)]
+         << " side=\"" << gsb_side_manager.to_string() 
+         << "\" index=\"" << inode 
+         << "\" mux_size=\"" << get_rr_graph_configurable_driver_nodes(rr_graph, cur_rr_node).size()
+         << "\">" 
+         << std::endl; 
+      /* General information of each driving nodes */
+      for (const RRNodeId& driver_node : get_rr_graph_configurable_driver_nodes(rr_graph, cur_rr_node)) {
+        /* Skip OPINs: they should be in direct connections */
+        if (OPIN == rr_graph.node_type(driver_node)) {
+          continue;
+        }
+
+        enum e_side chan_side = rr_gsb.get_cb_chan_side(gsb_side);
+        SideManager chan_side_manager(chan_side);
+
+        /* For channel node, we do not know the node direction
+         * But we are pretty sure it is either IN_PORT or OUT_PORT
+         * So we just try and find what is valid
+         */
+        int driver_node_index = rr_gsb.get_chan_node_index(chan_side, driver_node);
+        /* We must have a valid node index */
+        VTR_ASSERT(-1 != driver_node_index);
+
+        const RRSegmentId& des_segment_id = rr_gsb.get_chan_node_segment(chan_side, driver_node_index);
+
+        fp << "\t\t<driver_node type=\"" << rr_node_typename[rr_graph.node_type(driver_node)]
+           << "\" side=\"" << chan_side_manager.to_string() 
+           << "\" index=\"" << driver_node_index 
+           << "\" segment_id=\"" << size_t(des_segment_id)
+           << "\"/>" 
+           << std::endl; 
+      }
+      fp << "\t</" << rr_node_typename[rr_graph.node_type(cur_rr_node)] 
+         << ">" 
+         << std::endl; 
+    }
+
+    /* Output chan nodes */
+    for (size_t inode = 0; inode < rr_gsb.get_chan_width(gsb_side); ++inode) {
+      /* We only care about OUT_PORT */
+      if (OUT_PORT != rr_gsb.get_chan_node_direction(gsb_side, inode)) {
+        continue;
+      }
+      /* Output drivers */
+      const RRNodeId& cur_rr_node = rr_gsb.get_chan_node(gsb_side, inode);
+      std::vector<RREdgeId> driver_rr_edges = rr_gsb.get_chan_node_in_edges(rr_graph, gsb_side, inode);
+
+      /* Output node information: location, index, side */
+      const RRSegmentId& src_segment_id = rr_gsb.get_chan_node_segment(gsb_side, inode);
+
+      /* Check if this node is directly connected to the node on the opposite side */
+      if (true == rr_gsb.is_sb_node_passing_wire(rr_graph, gsb_side, inode)) {
+        driver_rr_edges.clear();
+      }
+
+      fp << "\t<" << rr_node_typename[rr_graph.node_type(cur_rr_node)]
+         << " side=\"" << gsb_side_manager.to_string() 
+         << "\" index=\"" << inode 
+         << "\" segment_id=\"" << size_t(src_segment_id)
+         << "\" mux_size=\"" << driver_rr_edges.size()
+         << "\">" 
+         << std::endl; 
+
+      /* Direct connection: output the node on the opposite side */
+      if (0 == driver_rr_edges.size()) {
+        SideManager oppo_side =  gsb_side_manager.get_opposite();
+        fp << "\t\t<driver_node type=\"" << rr_node_typename[rr_graph.node_type(cur_rr_node)]
+           << "\" side=\"" << oppo_side.to_string() 
+           << "\" index=\"" << rr_gsb.get_node_index(rr_graph, cur_rr_node, oppo_side.get_side(), IN_PORT) 
+           << "\" segment_id=\"" << size_t(src_segment_id)
+           << "\"/>" 
+           << std::endl; 
+      } else {
+        for (const RREdgeId& driver_rr_edge : driver_rr_edges) {
+          const RRNodeId& driver_rr_node = rr_graph.edge_src_node(driver_rr_edge);
+          e_side driver_node_side = NUM_SIDES;
+          int driver_node_index = -1;
+          rr_gsb.get_node_side_and_index(rr_graph, driver_rr_node, IN_PORT, driver_node_side, driver_node_index);
+          VTR_ASSERT(-1 != driver_node_index);
+          SideManager driver_side(driver_node_side);
+
+          if (OPIN == rr_graph.node_type(driver_rr_node)) {
+            SideManager grid_side(rr_graph.node_side(driver_rr_node));
+            fp << "\t\t<driver_node type=\"" << rr_node_typename[OPIN]
+               << "\" side=\"" << driver_side.to_string() 
+               << "\" index=\"" << driver_node_index  
+               << "\" grid_side=\"" <<  grid_side.to_string() 
+               <<"\"/>" 
+               << std::endl; 
+          } else {
+            const RRSegmentId& des_segment_id = rr_gsb.get_chan_node_segment(driver_node_side, driver_node_index);
+            fp << "\t\t<driver_node type=\"" << rr_node_typename[rr_graph.node_type(driver_rr_node)]
+               << "\" side=\"" << driver_side.to_string() 
+               << "\" index=\"" << driver_node_index 
+               << "\" segment_id=\"" << size_t(des_segment_id)
+               << "\"/>" 
+               << std::endl; 
+          }
+        }  
+      }
+      fp << "\t</" << rr_node_typename[rr_graph.node_type(cur_rr_node)]
+         << ">" 
+         << std::endl; 
+    }
+  }
+
+  fp << "</rr_gsb>" 
+     << std::endl;
+
+  /* close a file */
+  fp.close();
+}
+
+/***************************************************************************************
+ * Output internal structure (only the switch block part) of all the RRGSBs
+ * in a DeviceRRGSB  to XML format 
+ ***************************************************************************************/
+void write_device_rr_gsb_to_xml(const char* sb_xml_dir, 
+                                const RRGraph& rr_graph,
+                                const DeviceRRGSB& device_rr_gsb,
+                                const bool& verbose) {
+  std::string xml_dir_name = format_dir_path(std::string(sb_xml_dir));
+
+  /* Create directories */
+  create_dir_path(xml_dir_name.c_str());
+
+  vtr::Point<size_t> sb_range = device_rr_gsb.get_gsb_range();
+
+  size_t gsb_counter = 0;
+
+  /* For each switch block, an XML file will be outputted */
+  for (size_t ix = 0; ix < sb_range.x(); ++ix) {
+    for (size_t iy = 0; iy < sb_range.y(); ++iy) {
+      const RRGSB& rr_gsb = device_rr_gsb.get_gsb(ix, iy);
+      write_rr_switch_block_to_xml(xml_dir_name, rr_graph, rr_gsb, verbose);
+      gsb_counter++;
+    }
+  }
+
+  VTR_LOG("Output %lu XML files to directory '%s'\n",
+          gsb_counter,
+          xml_dir_name.c_str());
+}
+
+} /* end namespace openfpga */
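Reviewer note: the `<driver_node>` elements written by write_rr_switch_block_to_xml() all share one attribute layout. A minimal sketch of that layout as a standalone formatter, assuming a hypothetical helper `format_driver_node` that takes the type and side as ready-made strings (the real code derives them from `rr_node_typename` and `SideManager::to_string()`; any concrete values passed in are illustrative only):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

/* Build one <driver_node .../> line with the same attribute order
 * and two-tab indentation as the channel-node branch in the diff.
 * Hypothetical helper: the commit writes directly to an fstream instead. */
std::string format_driver_node(const std::string& type,
                               const std::string& side,
                               std::size_t index,
                               std::size_t segment_id) {
  assert(!type.empty() && !side.empty());
  std::ostringstream out;
  out << "\t\t<driver_node type=\"" << type
      << "\" side=\"" << side
      << "\" index=\"" << index
      << "\" segment_id=\"" << segment_id
      << "\"/>";
  return out.str();
}
```

Factoring the formatting into one helper like this would also remove the three near-identical stream expressions in the writer, at the cost of an extra string allocation per element.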
--- a/openfpga/src/annotation/write_xml_device_rr_gsb.h
+++ b/openfpga/src/annotation/write_xml_device_rr_gsb.h
@ -0,0 +1,25 @@
+#ifndef WRITE_XML_DEVICE_RR_GSB_H
+#define WRITE_XML_DEVICE_RR_GSB_H
+
+/********************************************************************
+ * Include header files that are required by function declaration
+ *******************************************************************/
+#include <string>
+#include "rr_graph_obj.h"
+#include "device_rr_gsb.h"
+
+/********************************************************************
+ * Function declaration
+ *******************************************************************/
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+void write_device_rr_gsb_to_xml(const char* sb_xml_dir,
+                                const RRGraph& rr_graph,
+                                const DeviceRRGSB& device_rr_gsb,
+                                const bool& verbose);
+
+} /* end namespace openfpga */
+
+#endif
--- a/openfpga/src/base/openfpga_bitstream_command.cpp
+++ b/openfpga/src/base/openfpga_bitstream_command.cpp
@ -11,6 +11,88 @@
 /* begin namespace openfpga */
 namespace openfpga {

+/********************************************************************
+ * - Add a command to Shell environment: repack
+ * - Add associated options 
+ * - Add command dependency
+ *******************************************************************/
+static 
+ShellCommandId add_openfpga_repack_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                           const ShellCommandClassId& cmd_class_id,
+                                           const std::vector<ShellCommandId>& dependent_cmds) {
+  Command shell_cmd("repack");
+  /* Add an option '--verbose' */
+  shell_cmd.add_option("verbose", false, "Enable verbose output");
+  
+  /* Add command 'repack' to the Shell */
+  ShellCommandId shell_cmd_id = shell.add_command(shell_cmd, "Pack physical programmable logic blocks");
+  shell.set_command_class(shell_cmd_id, cmd_class_id);
+  shell.set_command_execute_function(shell_cmd_id, repack);
+
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
+}
+
+/********************************************************************
+ * - Add a command to Shell environment: build_architecture_bitstream
+ * - Add associated options 
+ * - Add command dependency
+ *******************************************************************/
+static 
+ShellCommandId add_openfpga_arch_bitstream_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                                   const ShellCommandClassId& cmd_class_id,
+                                                   const std::vector<ShellCommandId>& dependent_cmds) {
+  Command shell_cmd("build_architecture_bitstream");
+
+  /* Add an option '--file' in short '-f'*/
+  CommandOptionId opt_file = shell_cmd.add_option("file", true, "file path to output the bitstream database");
+  shell_cmd.set_option_short_name(opt_file, "f");
+  shell_cmd.set_option_require_value(opt_file, openfpga::OPT_STRING);
+
+  /* Add an option '--verbose' */
+  shell_cmd.add_option("verbose", false, "Enable verbose output");
+  
+  /* Add command 'build_architecture_bitstream' to the Shell */
+  ShellCommandId shell_cmd_id = shell.add_command(shell_cmd, "Build fabric-independent bitstream database");
+  shell.set_command_class(shell_cmd_id, cmd_class_id);
+  shell.set_command_execute_function(shell_cmd_id, fpga_bitstream);
+
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
+}
+
+/********************************************************************
+ * - Add a command to Shell environment: build_fabric_bitstream
+ * - Add associated options 
+ * - Add command dependency
+ *******************************************************************/
+static 
+ShellCommandId add_openfpga_fabric_bitstream_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                                     const ShellCommandClassId& cmd_class_id,
+                                                     const std::vector<ShellCommandId>& dependent_cmds) {
+  Command shell_cmd("build_fabric_bitstream");
+
+  /* Add an option '--verbose' */
+  shell_cmd.add_option("verbose", false, "Enable verbose output");
+
+  /* Add command 'build_fabric_bitstream' to the Shell */
+  ShellCommandId shell_cmd_id = shell.add_command(shell_cmd, "Reorganize the fabric-independent bitstream for the FPGA fabric created by FPGA-Verilog");
+  shell.set_command_class(shell_cmd_id, cmd_class_id);
+  shell.set_command_execute_function(shell_cmd_id, build_fabric_bitstream);
+
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
+}
+
+/********************************************************************
+ * Top-level function to add all the commands related to FPGA-Bitstream
+ *******************************************************************/
 void add_openfpga_bitstream_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /* Get the unique id of 'build_fabric' command which is to be used in creating the dependency graph */
  const ShellCommandId& shell_cmd_build_fabric_id = shell.command(std::string("build_fabric"));
@ -21,58 +103,26 @@ void add_openfpga_bitstream_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'repack' 
   */
-  Command shell_cmd_repack("repack");
-  /* Add an option '--verbose' */
-  shell_cmd_repack.add_option("verbose", false, "Enable verbose output");
-  
-  /* Add command 'repack' to the Shell */
-  ShellCommandId shell_cmd_repack_id = shell.add_command(shell_cmd_repack, "Pack physical programmable logic blocks");
-  shell.set_command_class(shell_cmd_repack_id, openfpga_bitstream_cmd_class);
-  shell.set_command_execute_function(shell_cmd_repack_id, repack);
-
  /* The 'repack' command should NOT be executed before 'build_fabric' */
  std::vector<ShellCommandId> cmd_dependency_repack;
  cmd_dependency_repack.push_back(shell_cmd_build_fabric_id);
-  shell.set_command_dependency(shell_cmd_repack_id, cmd_dependency_repack);
+  ShellCommandId shell_cmd_repack_id = add_openfpga_repack_command(shell, openfpga_bitstream_cmd_class, cmd_dependency_repack);

  /******************************** 
-   * Command 'fpga_bitstream' 
+   * Command 'build_architecture_bitstream' 
   */
-  Command shell_cmd_fpga_bitstream("fpga_bitstream");
-
-  /* Add an option '--file' in short '-f'*/
-  CommandOptionId fpga_bitstream_opt_file = shell_cmd_fpga_bitstream.add_option("file", true, "file path to output the bitstream database");
-  shell_cmd_fpga_bitstream.set_option_short_name(fpga_bitstream_opt_file, "f");
-  shell_cmd_fpga_bitstream.set_option_require_value(fpga_bitstream_opt_file, openfpga::OPT_STRING);
-  /* Add an option '--verbose' */
-  shell_cmd_fpga_bitstream.add_option("verbose", false, "Enable verbose output");
-  
-  /* Add command 'fpga_bitstream' to the Shell */
-  ShellCommandId shell_cmd_fpga_bitstream_id = shell.add_command(shell_cmd_fpga_bitstream, "Build bitstream database");
-  shell.set_command_class(shell_cmd_fpga_bitstream_id, openfpga_bitstream_cmd_class);
-  shell.set_command_execute_function(shell_cmd_fpga_bitstream_id, fpga_bitstream);
-
-  /* The 'fpga_bitstream' command should NOT be executed before 'repack' */
-  std::vector<ShellCommandId> cmd_dependency_fpga_bitstream;
-  cmd_dependency_fpga_bitstream.push_back(shell_cmd_repack_id);
-  shell.set_command_dependency(shell_cmd_fpga_bitstream_id, cmd_dependency_fpga_bitstream);
+  /* The 'build_architecture_bitstream' command should NOT be executed before 'repack' */
+  std::vector<ShellCommandId> cmd_dependency_arch_bitstream;
+  cmd_dependency_arch_bitstream.push_back(shell_cmd_repack_id);
+  ShellCommandId shell_cmd_arch_bitstream_id = add_openfpga_arch_bitstream_command(shell, openfpga_bitstream_cmd_class, cmd_dependency_arch_bitstream);

  /******************************** 
   * Command 'build_fabric_bitstream' 
   */
-  Command shell_cmd_fabric_bitstream("build_fabric_bitstream");
-  /* Add an option '--verbose' */
-  shell_cmd_fabric_bitstream.add_option("verbose", false, "Enable verbose output");
-
-  /* Add command 'fabric_bitstream' to the Shell */
-  ShellCommandId shell_cmd_fabric_bitstream_id = shell.add_command(shell_cmd_fabric_bitstream, "Reorganize the fabric-independent bitstream for the FPGA fabric created by FPGA-Verilog");
-  shell.set_command_class(shell_cmd_fabric_bitstream_id, openfpga_bitstream_cmd_class);
-  shell.set_command_execute_function(shell_cmd_fabric_bitstream_id, build_fabric_bitstream);
-
-  /* The 'fabric_bitstream' command should NOT be executed before 'fpga_bitstream' */
+  /* The 'build_fabric_bitstream' command should NOT be executed before 'build_architecture_bitstream' */
  std::vector<ShellCommandId> cmd_dependency_fabric_bitstream;
-  cmd_dependency_fabric_bitstream.push_back(shell_cmd_fpga_bitstream_id);
-  shell.set_command_dependency(shell_cmd_fabric_bitstream_id, cmd_dependency_fabric_bitstream);
+  cmd_dependency_fabric_bitstream.push_back(shell_cmd_arch_bitstream_id);
+  add_openfpga_fabric_bitstream_command(shell, openfpga_bitstream_cmd_class, cmd_dependency_fabric_bitstream);
 } 

 } /* end namespace openfpga */
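
Note on the refactoring above: each helper now receives its dependencies as a vector and returns the new command id, so the top-level function only wires the chain build_fabric -> repack -> build_architecture_bitstream -> build_fabric_bitstream. The following is a minimal, self-contained sketch of that wiring. `MiniShell`, `BitstreamCommandIds` and `add_bitstream_commands` are hypothetical stand-ins, not OpenFPGA classes, and the way `execute` refuses to run a command with pending dependencies is an assumption about how the real shell enforces ordering.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the shell: it only tracks names, dependencies
// and whether a command has been executed.
class MiniShell {
 public:
  // Register a command together with the commands that must run before it.
  std::size_t add_command(const std::string& name,
                          const std::vector<std::size_t>& dependent_cmds) {
    names_.push_back(name);
    deps_.push_back(dependent_cmds);
    executed_.push_back(false);
    return names_.size() - 1;
  }

  // Assumption: a command runs only if all of its dependencies already ran.
  bool execute(std::size_t id) {
    for (std::size_t dep : deps_[id]) {
      if (!executed_[dep]) {
        return false;
      }
    }
    executed_[id] = true;
    return true;
  }

 private:
  std::vector<std::string> names_;
  std::vector<std::vector<std::size_t>> deps_;
  std::vector<bool> executed_;
};

// Ids of the commands in the chain, in the order they are registered.
struct BitstreamCommandIds {
  std::size_t build_fabric;
  std::size_t repack;
  std::size_t arch_bitstream;
  std::size_t fabric_bitstream;
};

// Each command depends on the id returned by the previous registration.
BitstreamCommandIds add_bitstream_commands(MiniShell& shell) {
  BitstreamCommandIds ids;
  ids.build_fabric = shell.add_command("build_fabric", {});
  ids.repack = shell.add_command("repack", {ids.build_fabric});
  ids.arch_bitstream =
      shell.add_command("build_architecture_bitstream", {ids.repack});
  ids.fabric_bitstream =
      shell.add_command("build_fabric_bitstream", {ids.arch_bitstream});
  return ids;
}
```

In this sketch, executing `repack` before `build_fabric` is rejected, and the same holds for every later link in the chain.
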
--- a/openfpga/src/base/openfpga_build_fabric.cpp
+++ b/openfpga/src/base/openfpga_build_fabric.cpp
@ -30,28 +30,27 @@ void compress_routing_hierarchy(OpenfpgaContext& openfpga_ctx,

  /* Report the stats */
  VTR_LOGV(verbose_output, 
-           "Detected %lu unique X-direction connection blocks from a total of %d (compression rate=%d%)\n",
+           "Detected %lu unique X-direction connection blocks from a total of %d (compression rate=%.2f%%)\n",
           openfpga_ctx.device_rr_gsb().get_num_cb_unique_module(CHANX),
           find_device_rr_gsb_num_cb_modules(openfpga_ctx.device_rr_gsb(), CHANX),
-           100 * (openfpga_ctx.device_rr_gsb().get_num_cb_unique_module(CHANX) / find_device_rr_gsb_num_cb_modules(openfpga_ctx.device_rr_gsb(), CHANX) - 1));
+           100. * ((float)find_device_rr_gsb_num_cb_modules(openfpga_ctx.device_rr_gsb(), CHANX) / (float)openfpga_ctx.device_rr_gsb().get_num_cb_unique_module(CHANX) - 1.));

  VTR_LOGV(verbose_output,
-           "Detected %lu unique Y-direction connection blocks from a total of %d (compression rate=%d%)\n",
+           "Detected %lu unique Y-direction connection blocks from a total of %d (compression rate=%.2f%%)\n",
           openfpga_ctx.device_rr_gsb().get_num_cb_unique_module(CHANY),
           find_device_rr_gsb_num_cb_modules(openfpga_ctx.device_rr_gsb(), CHANY),
-           100 * (openfpga_ctx.device_rr_gsb().get_num_cb_unique_module(CHANY) / find_device_rr_gsb_num_cb_modules(openfpga_ctx.device_rr_gsb(), CHANY) - 1));
+           100. * ((float)find_device_rr_gsb_num_cb_modules(openfpga_ctx.device_rr_gsb(), CHANY) / (float)openfpga_ctx.device_rr_gsb().get_num_cb_unique_module(CHANY) - 1.));

  VTR_LOGV(verbose_output,
-           "Detected %lu unique switch blocks from a total of %d (compression rate=%d%)\n",
+           "Detected %lu unique switch blocks from a total of %d (compression rate=%.2f%%)\n",
           openfpga_ctx.device_rr_gsb().get_num_sb_unique_module(),
           find_device_rr_gsb_num_sb_modules(openfpga_ctx.device_rr_gsb()),
-           100 * (openfpga_ctx.device_rr_gsb().get_num_sb_unique_module() / find_device_rr_gsb_num_sb_modules(openfpga_ctx.device_rr_gsb()) - 1));
+           100. * ((float)find_device_rr_gsb_num_sb_modules(openfpga_ctx.device_rr_gsb()) / (float)openfpga_ctx.device_rr_gsb().get_num_sb_unique_module() - 1.));

-  VTR_LOGV(verbose_output,
-           "Detected %lu unique general switch blocks from a total of %d (compression rate=%d%)\n",
-           openfpga_ctx.device_rr_gsb().get_num_gsb_unique_module(),
-           find_device_rr_gsb_num_gsb_modules(openfpga_ctx.device_rr_gsb()),
-           100 * (openfpga_ctx.device_rr_gsb().get_num_gsb_unique_module() / find_device_rr_gsb_num_gsb_modules(openfpga_ctx.device_rr_gsb()) - 1));
+  VTR_LOG("Detected %lu unique general switch blocks from a total of %d (compression rate=%.2f%%)\n",
+          openfpga_ctx.device_rr_gsb().get_num_gsb_unique_module(),
+          find_device_rr_gsb_num_gsb_modules(openfpga_ctx.device_rr_gsb()),
+          100. * ((float)find_device_rr_gsb_num_gsb_modules(openfpga_ctx.device_rr_gsb()) / (float)openfpga_ctx.device_rr_gsb().get_num_gsb_unique_module() - 1.));
 }

 /********************************************************************
@ -66,6 +65,8 @@ void build_fabric(OpenfpgaContext& openfpga_ctx,
  
  if (true == cmd_context.option_enable(cmd, opt_compress_routing)) {
    compress_routing_hierarchy(openfpga_ctx, cmd_context.option_enable(cmd, opt_verbose));
+    /* Update flow manager to enable compress routing */
+    openfpga_ctx.mutable_flow_manager().set_compress_routing(true);
  }

  VTR_LOG("\n");
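
Note on the compression-rate change above: the rewritten arguments cast both counts to floating point before dividing, so the ratio of total to unique modules keeps its fractional part. A minimal standalone sketch of that arithmetic follows. `compression_rate_percent` and `truncated_ratio` are hypothetical helper names, not part of OpenFPGA; the sketch uses `double` where the diff uses `float`, and the non-zero check is an added assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in OpenFPGA): mirrors the new log argument,
// 100 * (total / unique - 1), evaluated in floating point.
double compression_rate_percent(std::size_t num_total, std::size_t num_unique) {
  assert(num_unique > 0);  // assumption: at least one unique module exists
  return 100. * (static_cast<double>(num_total) /
                 static_cast<double>(num_unique) - 1.);
}

// For contrast: integer division drops the fractional part of the ratio.
std::size_t truncated_ratio(std::size_t num_total, std::size_t num_unique) {
  assert(num_unique > 0);  // same assumption as above
  return num_total / num_unique;
}
```

With 10 modules of which 4 are unique, the floating-point ratio is 2.5 and the reported rate is 150%, whereas the integer ratio is 2.
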
--- a/openfpga/src/base/openfpga_context.h
+++ b/openfpga/src/base/openfpga_context.h
@ -59,8 +59,8 @@ class OpenfpgaContext : public Context  {
    const openfpga::FlowManager& flow_manager() const { return flow_manager_; }
    const openfpga::BitstreamManager& bitstream_manager() const { return bitstream_manager_; }
    const std::vector<openfpga::ConfigBitId>& fabric_bitstream() const { return fabric_bitstream_; }
-    const openfpga::IoLocationMap& io_location_map() { return io_location_map_; }
-    const std::unordered_map<AtomNetId, t_net_power>& net_activity() { return net_activity_; }
+    const openfpga::IoLocationMap& io_location_map() const { return io_location_map_; }
+    const std::unordered_map<AtomNetId, t_net_power>& net_activity() const { return net_activity_; }
  public:  /* Public mutators */
    openfpga::Arch& mutable_arch() { return arch_; }
    openfpga::VprDeviceAnnotation& mutable_vpr_device_annotation() { return vpr_device_annotation_; }
--- a/openfpga/src/base/openfpga_flow_manager.cpp
+++ b/openfpga/src/base/openfpga_flow_manager.cpp
@ -8,6 +8,14 @@
 /* begin namespace openfpga */
 namespace openfpga {

+/**************************************************
+ * Public Constructor
+ *************************************************/
+FlowManager::FlowManager() {
+  /* Turn off compress_routing by default */
+  compress_routing_ = false;
+}
+
 /**************************************************
 * Public Accessors 
 *************************************************/
--- a/openfpga/src/base/openfpga_flow_manager.h
+++ b/openfpga/src/base/openfpga_flow_manager.h
@ -15,6 +15,8 @@ namespace openfpga {
 *
 *******************************************************************/
 class FlowManager {
+  public: /* Public constructor */
+    FlowManager();
  public: /* Public accessors */
    bool compress_routing() const;
  public: /* Public mutators */
--- a/openfpga/src/base/openfpga_link_arch.cpp
+++ b/openfpga/src/base/openfpga_link_arch.cpp
@ -2,8 +2,6 @@
 * This file includes functions to read an OpenFPGA architecture file
 * which are built on the libarchopenfpga library
 *******************************************************************/
-#include <cmath>
-#include <iterator>

 /* Headers from vtrutil library */
 #include "vtr_time.h"
@ -11,9 +9,6 @@
 #include "vtr_log.h"

 /* Headers from vpr library */
-#include "timing_info.h"
-#include "AnalysisDelayCalculator.h"
-#include "net_delay.h"
 #include "read_activity.h"

 #include "vpr_device_annotation.h"
@ -22,6 +17,7 @@
 #include "annotate_pb_graph.h"
 #include "annotate_routing.h"
 #include "annotate_rr_graph.h"
+#include "annotate_simulation_setting.h"
 #include "mux_library_builder.h"
 #include "build_tile_direct.h"
 #include "annotate_placement.h"
@ -55,163 +51,6 @@ bool is_vpr_rr_graph_supported(const RRGraph& rr_graph) {
  return true;
 }

-/********************************************************************
- * Find the number of clock cycles in simulation based on the average signal density
- *******************************************************************/
-static 
-size_t recommend_num_sim_clock_cycle(const AtomContext& atom_ctx,
-                                     const std::unordered_map<AtomNetId, t_net_power>& net_activity, 
-                                     const float& sim_window_size) {
-  size_t recmd_num_sim_clock_cycle = 0;
-
-  float avg_density = 0.;
-  size_t net_cnt = 0;
-
-  float weighted_avg_density = 0.;
-  size_t weighted_net_cnt = 0;
-
-  /* get the average density of all the nets */
-  for (const AtomNetId& atom_net : atom_ctx.nlist.nets()) {
-    /* Skip the nets without any activity annotation */
-    if (0 == net_activity.count(atom_net)) {
-      continue;
-    }
-
-    /* Only care non-zero density nets */
-    if (0. == net_activity.at(atom_net).density) {
-      continue;
-    }
-
-    avg_density += net_activity.at(atom_net).density; 
-    net_cnt++;
-
-    /* Consider the weight of fan-out */
-    size_t net_weight; 
-    if (0 == std::distance(atom_ctx.nlist.net_sinks(atom_net).begin(), atom_ctx.nlist.net_sinks(atom_net).end())) {
-      net_weight = 1;
-    } else {
-      VTR_ASSERT(0 < std::distance(atom_ctx.nlist.net_sinks(atom_net).begin(), atom_ctx.nlist.net_sinks(atom_net).end()));
-      net_weight = std::distance(atom_ctx.nlist.net_sinks(atom_net).begin(), atom_ctx.nlist.net_sinks(atom_net).end());
-    }
-    weighted_avg_density += net_activity.at(atom_net).density* net_weight;
-    weighted_net_cnt += net_weight;
-  }
-  avg_density = avg_density / net_cnt; 
-  weighted_avg_density = weighted_avg_density / weighted_net_cnt; 
-
-  /* Sort the net density */
-  std::vector<float> net_densities;
-  net_densities.reserve(net_cnt);
-  for (const AtomNetId& atom_net : atom_ctx.nlist.nets()) {
-    /* Skip the nets without any activity annotation */
-    if (0 == net_activity.count(atom_net)) {
-      continue;
-    }
-
-    /* Only care non-zero density nets */
-    if (0. == net_activity.at(atom_net).density) {
-      continue;
-    }
-
-    net_densities.push_back(net_activity.at(atom_net).density);
-  }
-  std::sort(net_densities.begin(), net_densities.end());
-  /* Get the median */
-  float median_density = 0.;
-  /* check for even case */
-  if (net_cnt % 2 != 0) { 
-    median_density = net_densities[size_t(net_cnt / 2)];
-  } else {              
-    median_density = 0.5 * (net_densities[size_t((net_cnt - 1) / 2)] + net_densities[size_t((net_cnt - 1) / 2)]);
-  }
-
-  /* It may be more reasonable to use median 
-   * But, if median density is 0, we use average density
-  */
-  if ((0. == median_density) && (0. == avg_density)) {
-    recmd_num_sim_clock_cycle = 1;
-    VTR_LOG_WARN("All the signal density is zero!\nNumber of clock cycles in simulations are set to be %ld!\n",
-                 recmd_num_sim_clock_cycle);
-  } else if (0. == avg_density) {
-      recmd_num_sim_clock_cycle = (int)round(1 / median_density); 
-  } else if (0. == median_density) {
-      recmd_num_sim_clock_cycle = (int)round(1 / avg_density);
-  } else {
-    /* add a sim window size to balance the weight of average density and median density
-     * In practice, we find that there could be huge difference between avereage and median values 
-     * For a reasonable number of simulation clock cycles, we do this window size.
-     */
-    recmd_num_sim_clock_cycle = (int)round(1 / (sim_window_size * avg_density + (1 - sim_window_size) * median_density ));
-  }
-  
-  VTR_ASSERT(0 < recmd_num_sim_clock_cycle);
-
-  VTR_LOG("Average net density: %.2f\n", avg_density);
-  VTR_LOG("Median net density: %.2f\n", median_density);
-  VTR_LOG("Average net density after weighting: %.2f\n", weighted_avg_density);
-  VTR_LOG("Window size set for Simulation: %.2f\n", sim_window_size);
-  VTR_LOG("Net density after Window size : %.2f\n", 
-          (sim_window_size * avg_density + (1 - sim_window_size) * median_density));
-  VTR_LOG("Recommend no. of clock cycles: %ld\n", recmd_num_sim_clock_cycle);
-
-  return recmd_num_sim_clock_cycle; 
-}
-
-/********************************************************************
- * Annotate simulation setting based on VPR results
- *  - If the operating clock frequency is set to follow the vpr timing results,
- *    we will set a new operating clock frequency here
- *  - If the number of clock cycles in simulation is set to be automatically determined,
- *    we will infer the number based on the average signal density
- *******************************************************************/
-static 
-void annotate_simulation_setting(const AtomContext& atom_ctx, 
-                                 const std::unordered_map<AtomNetId, t_net_power>& net_activity, 
-                                 SimulationSetting& sim_setting) {
-
-  /* Find if the operating frequency is binded to vpr results */
-  if (0. == sim_setting.operating_clock_frequency()) {
-    VTR_LOG("User specified the operating clock frequency to use VPR results\n");
-    /* Run timing analysis and collect critical path delay
-     * This code is copied from function vpr_analysis() in vpr_api.h 
-     * Should keep updated to latest VPR code base
-     * Note:
-     *   - MUST mention in documentation that VPR should be run in timing enabled mode
-     */
-    vtr::vector<ClusterNetId, float*> net_delay;
-    vtr::t_chunk net_delay_ch;
-    /* Load the net delays */
-    net_delay = alloc_net_delay(&net_delay_ch);
-    load_net_delay_from_routing(net_delay);
-
-    /* Do final timing analysis */
-    auto analysis_delay_calc = std::make_shared<AnalysisDelayCalculator>(atom_ctx.nlist, atom_ctx.lookup, net_delay);
-    auto timing_info = make_setup_hold_timing_info(analysis_delay_calc);
-    timing_info->update();
-
-    /* Get critical path delay. Update simulation settings */
-    float T_crit = timing_info->least_slack_critical_path().delay() * (1. + sim_setting.operating_clock_frequency_slack());
-    sim_setting.set_operating_clock_frequency(1 / T_crit); 
-    VTR_LOG("Use VPR critical path delay %g [ns] with a %g [%] slack in OpenFPGA.\n",
-            T_crit / 1e9, sim_setting.operating_clock_frequency_slack() * 100);
-  }
-  VTR_LOG("Will apply operating clock frequency %g [MHz] to simulations\n",
-          sim_setting.operating_clock_frequency() / 1e6);
-
-  if (0. == sim_setting.num_clock_cycles()) {
-    /* Find the number of clock cycles to be used in simulation by average over the signal activity */
-
-    VTR_LOG("User specified the number of operating clock cycles to be inferred from signal activities\n");
-    size_t num_clock_cycles = recommend_num_sim_clock_cycle(atom_ctx,
-                                                            net_activity, 
-                                                            0.5);
-    sim_setting.set_num_clock_cycles(num_clock_cycles);
-
-    VTR_LOG("Will apply %lu operating clock cycles to simulations\n",
-            sim_setting.num_clock_cycles());
-  }
-}
-
 /********************************************************************
 * Top-level function to link openfpga architecture to VPR, including:
 * - physical pb_type
@ -226,6 +65,7 @@ void link_arch(OpenfpgaContext& openfpga_ctx,
  vtr::ScopedStartFinishTimer timer("Link OpenFPGA architecture to VPR architecture");

  CommandOptionId opt_activity_file = cmd.option("activity_file");
+  CommandOptionId opt_sort_edge = cmd.option("sort_gsb_chan_node_in_edges");
  CommandOptionId opt_verbose = cmd.option("verbose");

  /* Annotate pb_type graphs
@ -272,13 +112,19 @@ void link_arch(OpenfpgaContext& openfpga_ctx,
                         openfpga_ctx.mutable_device_rr_gsb(),
                         cmd_context.option_enable(cmd, opt_verbose));

+  if (true == cmd_context.option_enable(cmd, opt_sort_edge)) {
+    sort_device_rr_gsb_chan_node_in_edges(g_vpr_ctx.device().rr_graph,
+                                          openfpga_ctx.mutable_device_rr_gsb());
+  } 
+
  /* Build multiplexer library */
  openfpga_ctx.mutable_mux_lib() = build_device_mux_library(g_vpr_ctx.device(),
                                                            const_cast<const OpenfpgaContext&>(openfpga_ctx)); 

  /* Build tile direct annotation */
  openfpga_ctx.mutable_tile_direct() = build_device_tile_direct(g_vpr_ctx.device(),
-                                                                openfpga_ctx.arch().arch_direct);
+                                                                openfpga_ctx.arch().arch_direct,
+                                                                cmd_context.option_enable(cmd, opt_verbose));

  /* Annotate placement results */
  annotate_mapped_blocks(g_vpr_ctx.device(), 
--- a/openfpga/src/base/openfpga_naming.cpp
+++ b/openfpga/src/base/openfpga_naming.cpp
@ -17,6 +17,19 @@
 /* begin namespace openfpga */
 namespace openfpga {

+/************************************************
+ * A generic function to generate the instance name
+ * in the following format:
+ * <instance_name>_<id>_
+ * This is mainly used by module manager to give a default
+ * name for each instance when outputting the module
+ * in Verilog/SPICE format
+ ***********************************************/
+std::string generate_instance_name(const std::string& instance_name,
+                                   const size_t& instance_id) {
+  return instance_name + std::string("_") + std::to_string(instance_id) + std::string("_");
+}
+
 /************************************************
 * Generate the node name for a multiplexing structure 
 * Case 1 : If there is an intermediate buffer followed by,
@ -483,7 +496,7 @@ std::string generate_grid_duplicated_port_name(const size_t& width,
  port_name += std::to_string(width);
  port_name += std::string("_height_");
  port_name += std::to_string(height);
-  port_name += std::string("_pin_");
+  port_name += std::string("__pin_");
  port_name += std::to_string(pin_id);
  port_name += std::string("_");

@ -1326,10 +1339,13 @@ std::string generate_pb_type_port_name(t_port* pb_type_port) {
 ********************************************************************/
 std::string generate_fpga_global_io_port_name(const std::string& prefix, 
                                              const CircuitLibrary& circuit_lib,
-                                              const CircuitModelId& circuit_model) {
+                                              const CircuitModelId& circuit_model,
+                                              const CircuitPortId& circuit_port) {
  std::string port_name(prefix);
  
  port_name += circuit_lib.model_name(circuit_model);
+  port_name += std::string("_");
+  port_name += circuit_lib.port_prefix(circuit_port);
   
  return port_name;
 }
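
Note on `generate_instance_name` added above: the default instance name is the given name, an underscore, the numeric id, and a trailing underscore. A standalone sketch of the same concatenation is shown here under a different name, `generate_instance_name_sketch`, to make clear it is an illustration and not the OpenFPGA function itself; the instance name "mux" used below is only an example value.

```cpp
#include <cstddef>
#include <string>

// Illustrative copy of the naming rule <instance_name>_<id>_ from the diff.
std::string generate_instance_name_sketch(const std::string& instance_name,
                                          const std::size_t& instance_id) {
  return instance_name + std::string("_") + std::to_string(instance_id) +
         std::string("_");
}
```

Calling `generate_instance_name_sketch("mux", 3)` returns `mux_3_`.
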
--- a/openfpga/src/base/openfpga_naming.h
+++ b/openfpga/src/base/openfpga_naming.h
@ -23,6 +23,9 @@
 /* begin namespace openfpga */
 namespace openfpga {

+std::string generate_instance_name(const std::string& instance_name,
+                                   const size_t& instance_id);
+
 std::string generate_mux_node_name(const size_t& node_level, 
                                   const bool& add_buffer_postfix);

@ -239,7 +242,8 @@ std::string generate_pb_type_port_name(t_port* pb_type_port);

 std::string generate_fpga_global_io_port_name(const std::string& prefix, 
                                              const CircuitLibrary& circuit_lib,
-                                              const CircuitModelId& circuit_model);
+                                              const CircuitModelId& circuit_model,
+                                              const CircuitPortId& circuit_port);

 std::string generate_fpga_top_module_name();

--- a/openfpga/src/base/openfpga_pb_pin_fixup.cpp
+++ b/openfpga/src/base/openfpga_pb_pin_fixup.cpp
@ -64,7 +64,7 @@ void update_cluster_pin_with_post_routing_results(const DeviceContext& device_ct
                                                  const bool& verbose) {
  /* Handle each pin */
  auto logical_block = clustering_ctx.clb_nlist.block_type(blk_id);
-  auto physical_tile = pick_best_physical_type(logical_block);
+  auto physical_tile = device_ctx.grid[grid_coord.x()][grid_coord.y()].type;

  for (int j = 0; j < logical_block->pb_type->num_pins; j++) {
    /* Get the ptc num for the pin in rr_graph, we need t consider the z offset here
@ -224,8 +224,6 @@ void update_pb_pin_with_post_routing_results(const DeviceContext& device_ctx,
      if (true == is_empty_type(device_ctx.grid[io_coord.x()][io_coord.y()].type)) {
        continue;
      }
-      /* We must have an I/O type here */
-      VTR_ASSERT(true == is_io_type(device_ctx.grid[io_coord.x()][io_coord.y()].type));
      /* Get the mapped blocks to this grid */
      for (const ClusterBlockId& cluster_blk_id : placement_ctx.grid_blocks[io_coord.x()][io_coord.y()].blocks) {
        /* Skip invalid ids */ 
--- a/openfpga/src/base/openfpga_sdc.cpp
+++ b/openfpga/src/base/openfpga_sdc.cpp
@ -10,6 +10,7 @@

 #include "circuit_library_utils.h"
 #include "pnr_sdc_writer.h"
+#include "analysis_sdc_writer.h"
 #include "openfpga_sdc.h"

 /* Include global variables of VPR */
@ -26,12 +27,14 @@ void write_pnr_sdc(OpenfpgaContext& openfpga_ctx,

  CommandOptionId opt_output_dir = cmd.option("file");
  CommandOptionId opt_constrain_global_port = cmd.option("constrain_global_port");
+  CommandOptionId opt_constrain_non_clock_global_port = cmd.option("constrain_non_clock_global_port");
  CommandOptionId opt_constrain_grid = cmd.option("constrain_grid");
  CommandOptionId opt_constrain_sb = cmd.option("constrain_sb");
  CommandOptionId opt_constrain_cb = cmd.option("constrain_cb");
  CommandOptionId opt_constrain_configurable_memory_outputs = cmd.option("constrain_configurable_memory_outputs");
  CommandOptionId opt_constrain_routing_multiplexer_outputs = cmd.option("constrain_routing_multiplexer_outputs");
  CommandOptionId opt_constrain_switch_block_outputs = cmd.option("constrain_switch_block_outputs");
+  CommandOptionId opt_constrain_zero_delay_paths = cmd.option("constrain_zero_delay_paths");

  /* This is an intermediate data structure which is designed to modularize the FPGA-SDC
   * Keep it independent from any other outside data structures
@ -44,12 +47,14 @@ void write_pnr_sdc(OpenfpgaContext& openfpga_ctx,
  PnrSdcOption options(sdc_dir_path);

  options.set_constrain_global_port(cmd_context.option_enable(cmd, opt_constrain_global_port));
+  options.set_constrain_non_clock_global_port(cmd_context.option_enable(cmd, opt_constrain_non_clock_global_port));
  options.set_constrain_grid(cmd_context.option_enable(cmd, opt_constrain_grid));
  options.set_constrain_sb(cmd_context.option_enable(cmd, opt_constrain_sb));
  options.set_constrain_cb(cmd_context.option_enable(cmd, opt_constrain_cb));
  options.set_constrain_configurable_memory_outputs(cmd_context.option_enable(cmd, opt_constrain_configurable_memory_outputs));
  options.set_constrain_routing_multiplexer_outputs(cmd_context.option_enable(cmd, opt_constrain_routing_multiplexer_outputs));
  options.set_constrain_switch_block_outputs(cmd_context.option_enable(cmd, opt_constrain_switch_block_outputs));
+  options.set_constrain_zero_delay_paths(cmd_context.option_enable(cmd, opt_constrain_zero_delay_paths));

  /* We first turn on default sdc option and then disable part of them by following users' options */
  if (false == options.generate_sdc_pnr()) {
@ -77,4 +82,38 @@ void write_pnr_sdc(OpenfpgaContext& openfpga_ctx,
  }
 } 

+/********************************************************************
+ * A wrapper function to call the analysis SDC generator of FPGA-SDC
+ *******************************************************************/
+void write_analysis_sdc(OpenfpgaContext& openfpga_ctx,
+                        const Command& cmd, const CommandContext& cmd_context) {
+
+  CommandOptionId opt_output_dir = cmd.option("file");
+
+  /* This is an intermediate data structure which is designed to modularize the FPGA-SDC
+   * Keep it independent from any other outside data structures
+   */
+  std::string sdc_dir_path = format_dir_path(cmd_context.option_value(cmd, opt_output_dir));
+
+  /* Create directories */
+  create_dir_path(sdc_dir_path.c_str());
+
+  AnalysisSdcOption options(sdc_dir_path);
+  options.set_generate_sdc_analysis(true);
+
+  /* Collect global ports from the circuit library:
+   * TODO: should we place this in the OpenFPGA context?
+   */
+  std::vector<CircuitPortId> global_ports = find_circuit_library_global_ports(openfpga_ctx.arch().circuit_lib);
+
+  if (true == options.generate_sdc_analysis()) {
+    print_analysis_sdc(options,
+                       1./openfpga_ctx.arch().sim_setting.operating_clock_frequency(),
+                       g_vpr_ctx, 
+                       openfpga_ctx,
+                       global_ports,
+                       openfpga_ctx.flow_manager().compress_routing());
+  }
+}
+
 } /* end namespace openfpga */
--- a/openfpga/src/base/openfpga_sdc.h
+++ b/openfpga/src/base/openfpga_sdc.h
@ -18,6 +18,9 @@ namespace openfpga {
 void write_pnr_sdc(OpenfpgaContext& openfpga_ctx,
                   const Command& cmd, const CommandContext& cmd_context); 

+void write_analysis_sdc(OpenfpgaContext& openfpga_ctx,
+                        const Command& cmd, const CommandContext& cmd_context);
+
 } /* end namespace openfpga */

 #endif
--- a/openfpga/src/base/openfpga_sdc_command.cpp
+++ b/openfpga/src/base/openfpga_sdc_command.cpp
@ -16,9 +16,9 @@ namespace openfpga {
 * - Add command dependency
 *******************************************************************/
 static 
-void add_openfpga_write_pnr_sdc_command(openfpga::Shell<OpenfpgaContext>& shell,
-                                        const ShellCommandClassId& cmd_class_id,
-                                        const ShellCommandId& shell_cmd_build_fabric_id) {
+ShellCommandId add_openfpga_write_pnr_sdc_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                                  const ShellCommandClassId& cmd_class_id,
+                                                  const std::vector<ShellCommandId>& dependent_cmds) {
  Command shell_cmd("write_pnr_sdc");

  /* Add an option '--file' in short '-f'*/
@ -29,6 +29,9 @@ void add_openfpga_write_pnr_sdc_command(openfpga::Shell<OpenfpgaContext>& shell,
  /* Add an option '--constrain_global_port' */
  shell_cmd.add_option("constrain_global_port", false, "Constrain all the global ports of FPGA fabric");

+  /* Add an option '--constrain_non_clock_global_port' */
+  shell_cmd.add_option("constrain_non_clock_global_port", false, "Constrain all the non-clock global ports as clock ports of FPGA fabric");
+
  /* Add an option '--constrain_grid' */
  shell_cmd.add_option("constrain_grid", false, "Constrain all the grids of FPGA fabric");

@ -47,6 +50,9 @@ void add_openfpga_write_pnr_sdc_command(openfpga::Shell<OpenfpgaContext>& shell,
  /* Add an option '--constrain_switch_block_outputs' */
  shell_cmd.add_option("constrain_switch_block_outputs", false, "Constrain all the outputs of switch blocks of FPGA fabric");

+  /* Add an option '--constrain_zero_delay_paths' */
+  shell_cmd.add_option("constrain_zero_delay_paths", false, "Constrain zero-delay paths in FPGA fabric");
+
  /* Add an option '--verbose' */
  shell_cmd.add_option("verbose", false, "Enable verbose output");
  
@ -55,25 +61,69 @@ void add_openfpga_write_pnr_sdc_command(openfpga::Shell<OpenfpgaContext>& shell,
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, write_pnr_sdc);

-  /* The 'build_fabric' command should NOT be executed before 'link_openfpga_arch' */
-  std::vector<ShellCommandId> cmd_dependency;
-  cmd_dependency.push_back(shell_cmd_build_fabric_id);
-  shell.set_command_dependency(shell_cmd_id, cmd_dependency);
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
+}
+
+/********************************************************************
+ * - Add a command to Shell environment: generate analysis SDC
+ * - Add associated options 
+ * - Add command dependency
+ *******************************************************************/
+static 
+ShellCommandId add_openfpga_write_analysis_sdc_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                                       const ShellCommandClassId& cmd_class_id,
+                                                       const std::vector<ShellCommandId>& dependent_cmds) {
+  Command shell_cmd("write_analysis_sdc");
+
+  /* Add an option '--file' in short '-f'*/
+  CommandOptionId output_opt = shell_cmd.add_option("file", true, "Specify the output directory for SDC files");
+  shell_cmd.set_option_short_name(output_opt, "f");
+  shell_cmd.set_option_require_value(output_opt, openfpga::OPT_STRING);
+
+  /* Add an option '--verbose' */
+  shell_cmd.add_option("verbose", false, "Enable verbose output");
+  
+  /* Add command 'write_analysis_sdc' to the Shell */
+  ShellCommandId shell_cmd_id = shell.add_command(shell_cmd, "generate SDC files for timing analysis of a PnRed FPGA fabric mapped by a benchmark");
+  shell.set_command_class(shell_cmd_id, cmd_class_id);
+  shell.set_command_execute_function(shell_cmd_id, write_analysis_sdc);
+
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
 }

 void add_openfpga_sdc_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /* Get the unique id of 'build_fabric' command which is to be used in creating the dependency graph */
-  const ShellCommandId& shell_cmd_build_fabric_id = shell.command(std::string("build_fabric"));
+  const ShellCommandId& build_fabric_id = shell.command(std::string("build_fabric"));

  /* Add a new class of commands */
  ShellCommandClassId openfpga_sdc_cmd_class = shell.add_command_class("FPGA-SDC");

  /******************************** 
-   * Command 'write_fabric_verilog' 
+   * Command 'write_pnr_sdc' 
   */
+  /* The 'write_pnr_sdc' command should NOT be executed before 'build_fabric' */
+  std::vector<ShellCommandId> pnr_sdc_cmd_dependency;
+  pnr_sdc_cmd_dependency.push_back(build_fabric_id);
  add_openfpga_write_pnr_sdc_command(shell,
                                     openfpga_sdc_cmd_class,
-                                     shell_cmd_build_fabric_id);
+                                     pnr_sdc_cmd_dependency);
+
+  /******************************** 
+   * Command 'write_analysis_sdc' 
+   */
+  /* The 'write_analysis_sdc' command should NOT be executed before 'build_fabric' */
+  std::vector<ShellCommandId> analysis_sdc_cmd_dependency;
+  analysis_sdc_cmd_dependency.push_back(build_fabric_id);
+  add_openfpga_write_analysis_sdc_command(shell,
+                                          openfpga_sdc_cmd_class,
+                                          analysis_sdc_cmd_dependency);
+
 } 

 } /* end namespace openfpga */
--- a/openfpga/src/base/openfpga_setup_command.cpp
+++ b/openfpga/src/base/openfpga_setup_command.cpp
@ -9,6 +9,7 @@
 #include "openfpga_lut_truth_table_fixup.h"
 #include "check_netlist_naming_conflict.h"
 #include "openfpga_build_fabric.h"
+#include "openfpga_write_gsb.h"
 #include "openfpga_setup_command.h"

 /* begin namespace openfpga */
@ -57,7 +58,7 @@ ShellCommandId add_openfpga_write_arch_command(openfpga::Shell<OpenfpgaContext>&
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_const_execute_function(shell_cmd_id, write_arch);

-  /* The 'write_openfpga_arch' command should NOT be executed before 'read_openfpga_arch' */
+  /* Add command dependency to the Shell */
  shell.set_command_dependency(shell_cmd_id, dependent_cmds);

  return shell_cmd_id;
@ -78,6 +79,9 @@ ShellCommandId add_openfpga_link_arch_command(openfpga::Shell<OpenfpgaContext>&
  CommandOptionId opt_act_file = shell_cmd.add_option("activity_file", true, "file path to the signal activity");
  shell_cmd.set_option_require_value(opt_act_file, openfpga::OPT_STRING);

+  /* Add an option '--sort_gsb_chan_node_in_edges'*/
+  shell_cmd.add_option("sort_gsb_chan_node_in_edges", false, "Sort all the incoming edges for each routing track output node in General Switch Blocks (GSBs)");
+
  /* Add an option '--verbose' */
  shell_cmd.add_option("verbose", false, "Show verbose outputs");
  
@ -86,7 +90,36 @@ ShellCommandId add_openfpga_link_arch_command(openfpga::Shell<OpenfpgaContext>&
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, link_arch);

-  /* The 'link_openfpga_arch' command should NOT be executed before 'read_openfpga_arch' and 'vpr' */
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
+}
+
+/********************************************************************
+ * - Add a command to Shell environment: write_gsb_to_xml
+ * - Add associated options 
+ * - Add command dependency
+ *******************************************************************/
+static 
+ShellCommandId add_openfpga_write_gsb_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                              const ShellCommandClassId& cmd_class_id,
+                                              const std::vector<ShellCommandId>& dependent_cmds) {
+  Command shell_cmd("write_gsb_to_xml");
+  /* Add an option '--file' in short '-f'*/
+  CommandOptionId opt_file = shell_cmd.add_option("file", true, "path to the directory that stores the XML files");
+  shell_cmd.set_option_short_name(opt_file, "f");
+  shell_cmd.set_option_require_value(opt_file, openfpga::OPT_STRING);
+
+  /* Add an option '--verbose' */
+  shell_cmd.add_option("verbose", false, "Show verbose outputs");
+
+  /* Add command 'write_gsb_to_xml' to the Shell */
+  ShellCommandId shell_cmd_id = shell.add_command(shell_cmd, "write internal structures of General Switch Blocks to XML file");
+  shell.set_command_class(shell_cmd_id, cmd_class_id);
+  shell.set_command_const_execute_function(shell_cmd_id, write_gsb);
+
+  /* Add command dependency to the Shell */
  shell.set_command_dependency(shell_cmd_id, dependent_cmds);

  return shell_cmd_id;
@ -116,7 +149,7 @@ ShellCommandId add_openfpga_check_netlist_naming_conflict_command(openfpga::Shel
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, check_netlist_naming_conflict);

-  /* The 'link_openfpga_arch' command should NOT be executed before 'vpr' */
+  /* Add command dependency to the Shell */
  shell.set_command_dependency(shell_cmd_id, dependent_cmds);

  return shell_cmd_id;
@ -133,6 +166,7 @@ ShellCommandId add_openfpga_pb_pin_fixup_command(openfpga::Shell<OpenfpgaContext
                                                 const std::vector<ShellCommandId>& dependent_cmds) {

  Command shell_cmd("pb_pin_fixup");
+
  /* Add an option '--verbose' */
  shell_cmd.add_option("verbose", false, "Show verbose outputs");

@ -141,7 +175,7 @@ ShellCommandId add_openfpga_pb_pin_fixup_command(openfpga::Shell<OpenfpgaContext
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, pb_pin_fixup);

-  /* The 'pb_pin_fixup' command should NOT be executed before 'read_openfpga_arch' and 'vpr' */
+  /* Add command dependency to the Shell */
  shell.set_command_dependency(shell_cmd_id, dependent_cmds);

  return shell_cmd_id;
@ -166,7 +200,7 @@ ShellCommandId add_openfpga_lut_truth_table_fixup_command(openfpga::Shell<Openfp
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, lut_truth_table_fixup);

-  /* The 'lut_truth_table_fixup' command should NOT be executed before 'read_openfpga_arch' and 'vpr' */
+  /* Add command dependency to the Shell */
  shell.set_command_dependency(shell_cmd_id, dependent_cmds);

  return shell_cmd_id;
@ -198,7 +232,7 @@ ShellCommandId add_openfpga_build_fabric_command(openfpga::Shell<OpenfpgaContext
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, build_fabric);

-  /* The 'build_fabric' command should NOT be executed before 'link_openfpga_arch' */
+  /* Add command dependency to the Shell */
  shell.set_command_dependency(shell_cmd_id, dependent_cmds);

  return shell_cmd_id;
@ -220,6 +254,7 @@ void add_openfpga_setup_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'write_openfpga_arch' 
   */
+  /* The 'write_openfpga_arch' command should NOT be executed before 'read_openfpga_arch' */
  std::vector<ShellCommandId> write_arch_dependent_cmds(1, read_arch_cmd_id);
  add_openfpga_write_arch_command(shell,
                                  openfpga_setup_cmd_class,
@ -228,15 +263,27 @@ void add_openfpga_setup_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'link_openfpga_arch' 
   */
+  /* The 'link_openfpga_arch' command should NOT be executed before 'read_openfpga_arch' and 'vpr' */
  std::vector<ShellCommandId> link_arch_dependent_cmds;
  link_arch_dependent_cmds.push_back(read_arch_cmd_id);
  link_arch_dependent_cmds.push_back(vpr_cmd_id);
  ShellCommandId link_arch_cmd_id = add_openfpga_link_arch_command(shell,
                                                                   openfpga_setup_cmd_class,
                                                                   link_arch_dependent_cmds);
+  /******************************** 
+   * Command 'write_gsb_to_xml' 
+   */
+  /* The 'write_gsb_to_xml' command should NOT be executed before 'link_openfpga_arch' */
+  std::vector<ShellCommandId> write_gsb_dependent_cmds;
+  write_gsb_dependent_cmds.push_back(link_arch_cmd_id);
+  add_openfpga_write_gsb_command(shell,
+                                 openfpga_setup_cmd_class,
+                                 write_gsb_dependent_cmds);
+
  /******************************************* 
   * Command 'check_netlist_naming_conflict'
   */ 
+  /* The 'check_netlist_naming_conflict' command should NOT be executed before 'vpr' */
  std::vector<ShellCommandId> nlist_naming_dependent_cmds;
  nlist_naming_dependent_cmds.push_back(vpr_cmd_id);
  add_openfpga_check_netlist_naming_conflict_command(shell,
@ -246,6 +293,7 @@ void add_openfpga_setup_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'pb_pin_fixup' 
   */
+  /* The 'pb_pin_fixup' command should NOT be executed before 'read_openfpga_arch' and 'vpr' */
  std::vector<ShellCommandId> pb_pin_fixup_dependent_cmds;
  pb_pin_fixup_dependent_cmds.push_back(read_arch_cmd_id);
  pb_pin_fixup_dependent_cmds.push_back(vpr_cmd_id);
@ -256,6 +304,7 @@ void add_openfpga_setup_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'lut_truth_table_fixup' 
   */
+  /* The 'lut_truth_table_fixup' command should NOT be executed before 'read_openfpga_arch' and 'vpr' */
  std::vector<ShellCommandId> lut_tt_fixup_dependent_cmds;
  lut_tt_fixup_dependent_cmds.push_back(read_arch_cmd_id);
  lut_tt_fixup_dependent_cmds.push_back(vpr_cmd_id);
@ -265,6 +314,7 @@ void add_openfpga_setup_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'build_fabric' 
   */
+  /* The 'build_fabric' command should NOT be executed before 'link_openfpga_arch' */
  std::vector<ShellCommandId> build_fabric_dependent_cmds;
  build_fabric_dependent_cmds.push_back(link_arch_cmd_id);
  add_openfpga_build_fabric_command(shell,
--- a/openfpga/src/base/openfpga_title.cpp
+++ b/openfpga/src/base/openfpga_title.cpp
@ -11,6 +11,13 @@
 const char* create_openfpga_title() {
  std::string title;

+  title += std::string("\n");
+  title += std::string("            ___                   _____ ____   ____    _     \n"); 
+  title += std::string("           / _ \\ _ __   ___ _ __ |  ___|  _ \\ / ___|  / \\    \n"); 
+  title += std::string("          | | | | '_ \\ / _ \\ '_ \\| |_  | |_) | |  _  / _ \\   \n"); 
+  title += std::string("          | |_| | |_) |  __/ | | |  _| |  __/| |_| |/ ___ \\  \n"); 
+  title += std::string("           \\___/| .__/ \\___|_| |_|_|   |_|    \\____/_/   \\_\\ \n"); 
+  title += std::string("                |_|                                          \n"); 
  title += std::string("\n");
  title += std::string("               OpenFPGA: An Open-source FPGA IP Generator\n");
  title += std::string("                     Versatile Place and Route (VPR)\n");
--- a/openfpga/src/base/openfpga_verilog_command.cpp
+++ b/openfpga/src/base/openfpga_verilog_command.cpp
@ -17,9 +17,9 @@ namespace openfpga {
 * - Add command dependency
 *******************************************************************/
 static 
-void add_openfpga_write_fabric_verilog_command(openfpga::Shell<OpenfpgaContext>& shell,
-                                               const ShellCommandClassId& cmd_class_id,
-                                               const ShellCommandId& shell_cmd_build_fabric_id) {
+ShellCommandId add_openfpga_write_fabric_verilog_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                                         const ShellCommandClassId& cmd_class_id,
+                                                         const std::vector<ShellCommandId>& dependent_cmds) {
  Command shell_cmd("write_fabric_verilog");

  /* Add an option '--file' in short '-f'*/
@ -50,10 +50,10 @@ void add_openfpga_write_fabric_verilog_command(openfpga::Shell<OpenfpgaContext>&
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, write_fabric_verilog);

-  /* The 'build_fabric' command should NOT be executed before 'link_openfpga_arch' */
-  std::vector<ShellCommandId> cmd_dependency;
-  cmd_dependency.push_back(shell_cmd_build_fabric_id);
-  shell.set_command_dependency(shell_cmd_id, cmd_dependency);
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
 }

 /********************************************************************
@ -62,9 +62,9 @@ void add_openfpga_write_fabric_verilog_command(openfpga::Shell<OpenfpgaContext>&
 * - Add command dependency
 *******************************************************************/
 static 
-void add_openfpga_write_verilog_testbench_command(openfpga::Shell<OpenfpgaContext>& shell,
-                                                  const ShellCommandClassId& cmd_class_id,
-                                                  const ShellCommandId& shell_cmd_build_fabric_id) {
+ShellCommandId add_openfpga_write_verilog_testbench_command(openfpga::Shell<OpenfpgaContext>& shell,
+                                                            const ShellCommandClassId& cmd_class_id,
+                                                            const std::vector<ShellCommandId>& dependent_cmds) {
  Command shell_cmd("write_verilog_testbench");

  /* Add an option '--file' in short '-f'*/
@ -97,15 +97,15 @@ void add_openfpga_write_verilog_testbench_command(openfpga::Shell<OpenfpgaContex
  shell.set_command_class(shell_cmd_id, cmd_class_id);
  shell.set_command_execute_function(shell_cmd_id, write_verilog_testbench);

-  /* The command should NOT be executed before 'build_fabric' */
-  std::vector<ShellCommandId> cmd_dependency;
-  cmd_dependency.push_back(shell_cmd_build_fabric_id);
-  shell.set_command_dependency(shell_cmd_id, cmd_dependency);
+  /* Add command dependency to the Shell */
+  shell.set_command_dependency(shell_cmd_id, dependent_cmds);
+
+  return shell_cmd_id;
 }

 void add_openfpga_verilog_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /* Get the unique id of 'build_fabric' command which is to be used in creating the dependency graph */
-  const ShellCommandId& shell_cmd_build_fabric_id = shell.command(std::string("build_fabric"));
+  const ShellCommandId& build_fabric_cmd_id = shell.command(std::string("build_fabric"));

  /* Add a new class of commands */
  ShellCommandClassId openfpga_verilog_cmd_class = shell.add_command_class("FPGA-Verilog");
@ -113,16 +113,22 @@ void add_openfpga_verilog_commands(openfpga::Shell<OpenfpgaContext>& shell) {
  /******************************** 
   * Command 'write_fabric_verilog' 
   */
+  /* The 'write_fabric_verilog' command should NOT be executed before 'build_fabric' */
+  std::vector<ShellCommandId> fabric_verilog_dependent_cmds;
+  fabric_verilog_dependent_cmds.push_back(build_fabric_cmd_id);
  add_openfpga_write_fabric_verilog_command(shell,
                                            openfpga_verilog_cmd_class,
-                                            shell_cmd_build_fabric_id);
+                                            fabric_verilog_dependent_cmds);

  /******************************** 
   * Command 'write_verilog_testbench' 
   */
+  /* The command 'write_verilog_testbench' should NOT be executed before 'build_fabric' */
+  std::vector<ShellCommandId> verilog_testbench_dependent_cmds;
+  verilog_testbench_dependent_cmds.push_back(build_fabric_cmd_id);
  add_openfpga_write_verilog_testbench_command(shell,
                                               openfpga_verilog_cmd_class,
-                                               shell_cmd_build_fabric_id);
+                                               verilog_testbench_dependent_cmds);
 } 

 } /* end namespace openfpga */
--- a/openfpga/src/base/openfpga_write_gsb.cpp
+++ b/openfpga/src/base/openfpga_write_gsb.cpp
@ -0,0 +1,43 @@
+/********************************************************************
+ * This file includes functions to write the internal structure of General Switch Blocks (GSBs) to XML
+ *******************************************************************/
+/* Headers from vtrutil library */
+#include "vtr_time.h"
+#include "vtr_log.h"
+
+#include "write_xml_device_rr_gsb.h"
+
+#include "openfpga_write_gsb.h"
+
+/* Include global variables of VPR */
+#include "globals.h"
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+/********************************************************************
+ * Write the internal structure of all the General Switch Blocks (GSBs)
+ * to an XML file 
+ *******************************************************************/
+void write_gsb(const OpenfpgaContext& openfpga_ctx,
+               const Command& cmd, const CommandContext& cmd_context) { 
+
+  /* Check whether the option '--file' is enabled.
+   * It must be enabled, as the shell interface checks this
+   * before reaching this function
+   */
+  CommandOptionId opt_file = cmd.option("file");
+  VTR_ASSERT(true == cmd_context.option_enable(cmd, opt_file));
+  VTR_ASSERT(false == cmd_context.option_value(cmd, opt_file).empty());
+
+  CommandOptionId opt_verbose = cmd.option("verbose");
+
+  std::string sb_file_name = cmd_context.option_value(cmd, opt_file);
+
+  write_device_rr_gsb_to_xml(sb_file_name.c_str(),
+                             g_vpr_ctx.device().rr_graph,
+                             openfpga_ctx.device_rr_gsb(),
+                             cmd_context.option_enable(cmd, opt_verbose));
+} 
+
+} /* end namespace openfpga */
--- a/openfpga/src/base/openfpga_write_gsb.h
+++ b/openfpga/src/base/openfpga_write_gsb.h
@ -0,0 +1,23 @@
+#ifndef OPENFPGA_WRITE_GSB_H
+#define OPENFPGA_WRITE_GSB_H
+
+/********************************************************************
+ * Include header files that are required by function declaration
+ *******************************************************************/
+#include "command.h"
+#include "command_context.h"
+#include "openfpga_context.h"
+
+/********************************************************************
+ * Function declaration
+ *******************************************************************/
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+void write_gsb(const OpenfpgaContext& openfpga_ctx,
+               const Command& cmd, const CommandContext& cmd_context); 
+
+} /* end namespace openfpga */
+
+#endif
--- a/openfpga/src/fabric/build_grid_module_duplicated_pins.cpp
+++ b/openfpga/src/fabric/build_grid_module_duplicated_pins.cpp
@ -8,6 +8,8 @@
 *
 * Please follow this rules when creating new features!
 *******************************************************************/
+#include <algorithm>
+
 /* Headers from vtrutil library */
 #include "vtr_assert.h"

@ -17,6 +19,8 @@
 #include "openfpga_naming.h"
 #include "openfpga_interconnect_types.h"

+#include "openfpga_physical_tile_utils.h"
+
 #include "build_grid_module_utils.h"
 #include "build_grid_module_duplicated_pins.h"

@ -88,7 +92,8 @@ void add_grid_module_duplicated_pb_type_ports(ModuleManager& module_manager,
           * we do not duplicate in these cases */
          if ( (RECEIVER == pin_class_type)
            /* Xifan: I assume that each direct connection pin must have Fc=0. */
-            || ( (DRIVER == pin_class_type) && (0. == grid_type_descriptor->fc_specs[ipin].fc_value) ) ) {
+            || ( (DRIVER == pin_class_type)
+              && (0. == find_physical_tile_pin_Fc(grid_type_descriptor, ipin)) ) ) {
            vtr::Point<size_t> dummy_coordinate;
            std::string port_name = generate_grid_port_name(dummy_coordinate, iwidth, iheight, side, ipin, false);
            BasicPort grid_port(port_name, 0, 0);
@ -158,7 +163,7 @@ void add_grid_module_net_connect_duplicated_pb_graph_pin(ModuleManager& module_m
  size_t grid_pin_index = pb_graph_pin->pin_count_in_cluster 
                        + child_instance * grid_type_descriptor->num_pins / grid_type_descriptor->capacity;

-  int pin_width = grid_type_descriptor->pin_height_offset[grid_pin_index];
+  int pin_width = grid_type_descriptor->pin_width_offset[grid_pin_index];
  int pin_height = grid_type_descriptor->pin_height_offset[grid_pin_index];
  for (const e_side& side : grid_pin_sides) {
    if (true != grid_type_descriptor->pinloc[pin_width][pin_height][side][grid_pin_index]) {
@ -169,7 +174,7 @@ void add_grid_module_net_connect_duplicated_pb_graph_pin(ModuleManager& module_m
     * Follow the traditional recipe when adding nets!  
     * Xifan: I assume that each direct connection pin must have Fc=0. 
     */
-    if (0. == grid_type_descriptor->fc_specs[grid_pin_index].fc_value) {
+    if (0. == find_physical_tile_pin_Fc(grid_type_descriptor, grid_pin_index)) {
      /* Create a net to connect the grid pin to child module pin */
      ModuleNetId net = module_manager.create_module_net(grid_module);
      /* Find the port in grid_module */
--- a/openfpga/src/fabric/build_grid_modules.cpp
+++ b/openfpga/src/fabric/build_grid_modules.cpp
@ -18,6 +18,7 @@
 #include "openfpga_reserved_words.h"
 #include "openfpga_naming.h"
 #include "openfpga_interconnect_types.h"
+#include "openfpga_physical_tile_utils.h"
 #include "pb_type_utils.h"
 #include "pb_graph_utils.h"
 #include "module_manager_utils.h"
@ -143,6 +144,38 @@ void add_grid_module_nets_connect_pb_type_ports(ModuleManager& module_manager,
  }
 }

+/********************************************************************
+ *******************************************************************/
+static 
+void add_primitive_module_fpga_global_io_port(ModuleManager& module_manager,
+                                              const ModuleId& primitive_module,
+                                              const ModuleId& logic_module,
+                                              const size_t& logic_instance_id,
+                                              const ModuleManager::e_module_port_type& module_io_port_type,
+                                              const CircuitLibrary& circuit_lib,
+                                              const CircuitModelId& primitive_model,
+                                              const CircuitPortId& circuit_port) {
+  BasicPort module_port(generate_fpga_global_io_port_name(std::string(GIO_INOUT_PREFIX), circuit_lib, primitive_model, circuit_port), circuit_lib.port_size(circuit_port));
+  ModulePortId primitive_io_port_id = module_manager.add_port(primitive_module, module_port, module_io_port_type);
+  ModulePortId logic_io_port_id = module_manager.find_module_port(logic_module, circuit_lib.port_prefix(circuit_port));
+  BasicPort logic_io_port = module_manager.module_port(logic_module, logic_io_port_id);
+  VTR_ASSERT(logic_io_port.get_width() == module_port.get_width());
+
+  /* Wire the global I/O port from primitive_module to the logic module */
+  for (size_t pin_id = 0; pin_id < module_port.pins().size(); ++pin_id) {      
+    ModuleNetId net = module_manager.create_module_net(primitive_module);
+    if ( (ModuleManager::MODULE_GPIO_PORT == module_io_port_type)
+      || (ModuleManager::MODULE_GPIN_PORT == module_io_port_type) ) {
+      module_manager.add_module_net_source(primitive_module, net, primitive_module, 0, primitive_io_port_id, module_port.pins()[pin_id]);
+      module_manager.add_module_net_sink(primitive_module, net, logic_module, logic_instance_id, logic_io_port_id, logic_io_port.pins()[pin_id]);
+    } else {
+      VTR_ASSERT(ModuleManager::MODULE_GPOUT_PORT == module_io_port_type);
+      module_manager.add_module_net_source(primitive_module, net, logic_module, logic_instance_id, logic_io_port_id, logic_io_port.pins()[pin_id]);
+      module_manager.add_module_net_sink(primitive_module, net, primitive_module, 0, primitive_io_port_id, module_port.pins()[pin_id]);
+    }
+  }
+}
+
 /********************************************************************
 * Print Verilog modules of a primitive node in the pb_graph_node graph
 * This generic function can support all the different types of primitive nodes
@ -238,12 +271,6 @@ void build_primitive_block_module(ModuleManager& module_manager,
  std::string memory_module_name = generate_memory_module_name(circuit_lib, primitive_model, sram_model, std::string(MEMORY_MODULE_POSTFIX));
  ModuleId memory_module = module_manager.find_module(memory_module_name);

-  /* Vectors to record all the memory modules have been added
-   * They are used to add module nets of configuration bus
-   */
-  std::vector<ModuleId> memory_modules;
-  std::vector<size_t> memory_instances;
-
  /* If there is no memory module required, we can skip the assocated net addition */
  if (ModuleId::INVALID() != memory_module) {
    size_t memory_instance_id = module_manager.num_instance(primitive_module, memory_module); 
@ -263,7 +290,7 @@ void build_primitive_block_module(ModuleManager& module_manager,
  /* Add all the nets to connect configuration ports from memory module to primitive modules
   * This is a one-shot addition that covers all the memory modules in this primitive module!
   */
-  if (false == memory_modules.empty()) {
+  if (0 < module_manager.configurable_children(primitive_module).size()) {
    add_module_nets_memory_config_bus(module_manager, primitive_module, 
                                      sram_orgz_type, circuit_lib.design_tech_type(sram_model));
  }
@ -280,18 +307,32 @@ void build_primitive_block_module(ModuleManager& module_manager,
  if (CIRCUIT_MODEL_IOPAD == circuit_lib.model_type(primitive_model)) {
    std::vector<CircuitPortId> primitive_model_inout_ports = circuit_lib.model_ports_by_type(primitive_model, CIRCUIT_MODEL_PORT_INOUT);
    for (auto port : primitive_model_inout_ports) {
-      BasicPort module_port(generate_fpga_global_io_port_name(std::string(GIO_INOUT_PREFIX), circuit_lib, primitive_model), circuit_lib.port_size(port));
-      ModulePortId primitive_gpio_port_id = module_manager.add_port(primitive_module, module_port, ModuleManager::MODULE_GPIO_PORT);
-      ModulePortId logic_gpio_port_id = module_manager.find_module_port(logic_module, circuit_lib.port_prefix(port));
-      BasicPort logic_gpio_port = module_manager.module_port(logic_module, logic_gpio_port_id);
-      VTR_ASSERT(logic_gpio_port.get_width() == module_port.get_width());
+      add_primitive_module_fpga_global_io_port(module_manager, primitive_module,
+                                               logic_module, logic_instance_id,
+                                               ModuleManager::MODULE_GPIO_PORT,
+                                               circuit_lib,
+                                               primitive_model,
+                                               port);
+    }
+  }

-      /* Wire the GPIO port form primitive_module to the logic module!*/
-      for (size_t pin_id = 0; pin_id < module_port.pins().size(); ++pin_id) {      
-        ModuleNetId net = module_manager.create_module_net(primitive_module);
-        module_manager.add_module_net_source(primitive_module, net, primitive_module, 0, primitive_gpio_port_id, module_port.pins()[pin_id]);
-        module_manager.add_module_net_sink(primitive_module, net, logic_module, logic_instance_id, logic_gpio_port_id, logic_gpio_port.pins()[pin_id]);
-      }
+  /* Find the other i/o ports required by the primitive node, and add them to the module */
+  for (const auto& port : circuit_lib.model_global_ports(primitive_model, false)) {
+    if ( (CIRCUIT_MODEL_PORT_INPUT == circuit_lib.port_type(port))
+      && (true == circuit_lib.port_is_io(port)) ) {
+      add_primitive_module_fpga_global_io_port(module_manager, primitive_module,
+                                               logic_module, logic_instance_id,
+                                               ModuleManager::MODULE_GPIN_PORT,
+                                               circuit_lib,
+                                               primitive_model,
+                                               port);
+    } else if (CIRCUIT_MODEL_PORT_OUTPUT == circuit_lib.port_type(port)) {
+      add_primitive_module_fpga_global_io_port(module_manager, primitive_module,
+                                               logic_module, logic_instance_id,
+                                               ModuleManager::MODULE_GPOUT_PORT,
+                                               circuit_lib,
+                                               primitive_model,
+                                               port);
    }
  }

@ -428,6 +469,12 @@ void add_module_pb_graph_pin_interc(ModuleManager& module_manager,
    size_t wire_instance = module_manager.num_instance(pb_module, wire_module);
    module_manager.add_child_module(pb_module, wire_module);

+    /* Give an instance name: this name should be consistent with the block name given in the SDC generator
+     * if you want to bind the SDC generation to modules
+     */
+    std::string wire_instance_name = generate_instance_name(module_manager.module_name(wire_module), wire_instance);
+    module_manager.set_child_instance_name(pb_module, wire_module, wire_instance, wire_instance_name);
+
    /* Ensure input and output ports of the wire model has only 1 pin respectively */
    VTR_ASSERT(1 == circuit_lib.port_size(interc_model_inputs[0]));
    VTR_ASSERT(1 == circuit_lib.port_size(interc_model_outputs[0]));
@ -872,7 +919,7 @@ void rec_build_logical_tile_modules(ModuleManager& module_manager,
  /* Add module nets to connect memory cells inside
   * This is a one-shot addition that covers all the memory modules in this pb module!
   */
-  if (false == memory_modules.empty()) {
+  if (0 < module_manager.configurable_children(pb_module).size()) {
    add_module_nets_memory_config_bus(module_manager, pb_module, 
                                      sram_orgz_type, circuit_lib.design_tech_type(sram_model));
  }
@ -1093,13 +1140,21 @@ void build_grid_modules(ModuleManager& module_manager,
    if (true == is_empty_type(&physical_tile)) {
      continue;
    } else if (true == is_io_type(&physical_tile)) {
-      /* Special for I/O block, generate one module for each border side */
-      for (int iside = 0; iside < NUM_SIDES; iside++) {
-        SideManager side_manager(iside);
+      /* Special for I/O block:
+       * We will search the grids and see where the I/O blocks are located:
+       * - If an I/O block is located on the border sides of the FPGA fabric,
+       *   i.e., one or more from {TOP, RIGHT, BOTTOM, LEFT},
+       *   we will generate one module for each border side
+       * - If an I/O block is located in the center of the FPGA fabric,
+       *   we will generate one module with NUM_SIDES (same treatment as regular grids)
+       */
+      std::set<e_side> io_type_sides = find_physical_io_tile_located_sides(device_ctx.grid,
+                                                                           &physical_tile);
+      for (const e_side& io_type_side : io_type_sides) {
        build_physical_tile_module(module_manager, circuit_lib,
                                   sram_orgz_type, sram_model,
                                   &physical_tile,
-                                   side_manager.get_side(),
+                                   io_type_side,
                                   duplicate_grid_pin,
                                   verbose);
      } 
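The side-selection rule described in the comment above can be sketched as follows. This is a minimal illustration, not the actual `find_physical_io_tile_located_sides` implementation: the integer-typed grid, the local `e_side` enum and the helper name `find_io_tile_sides` are all assumptions made for this example, and a corner tile is assumed to contribute every border side it touches.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for VPR's e_side; NUM_SIDES marks a tile that is not on any border.
enum e_side { TOP, RIGHT, BOTTOM, LEFT, NUM_SIDES };

// grid[x][y] holds an integer tile type id (an assumption; VPR uses a richer grid type).
std::set<e_side> find_io_tile_sides(const std::vector<std::vector<int>>& grid,
                                    const int io_type) {
  std::set<e_side> sides;
  const std::size_t width = grid.size();
  for (std::size_t x = 0; x < width; ++x) {
    const std::size_t height = grid[x].size();
    for (std::size_t y = 0; y < height; ++y) {
      if (grid[x][y] != io_type) {
        continue;
      }
      /* Collect every border side this tile touches */
      bool on_border = false;
      if (y + 1 == height) { sides.insert(TOP);    on_border = true; }
      if (x + 1 == width)  { sides.insert(RIGHT);  on_border = true; }
      if (0 == y)          { sides.insert(BOTTOM); on_border = true; }
      if (0 == x)          { sides.insert(LEFT);   on_border = true; }
      /* A tile in the center is treated like a regular grid */
      if (!on_border) {
        sides.insert(NUM_SIDES);
      }
    }
  }
  return sides;
}
```

Returning a set means each side appears at most once, so the caller builds exactly one module per distinct side regardless of how many I/O tiles sit on it.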
--- a/openfpga/src/fabric/build_routing_modules.cpp
+++ b/openfpga/src/fabric/build_routing_modules.cpp
@ -19,6 +19,7 @@
 #include "openfpga_reserved_words.h"
 #include "openfpga_naming.h"

+#include "rr_gsb_utils.h"
 #include "openfpga_rr_graph_utils.h"
 #include "module_manager_utils.h"
 #include "build_module_graph_utils.h"
@ -227,7 +228,7 @@ void build_switch_block_interc_modules(ModuleManager& module_manager,

  /* Determine if the interc lies inside a channel wire, that is interc between segments */
  if (false == rr_gsb.is_sb_node_passing_wire(rr_graph, chan_side, chan_node_id)) {
-    driver_rr_nodes = get_rr_graph_configurable_driver_nodes(rr_graph, cur_rr_node);
+    driver_rr_nodes = get_rr_gsb_chan_node_configurable_driver_nodes(rr_graph, rr_gsb, chan_side, chan_node_id);
    /* Special: if there are zero-driver nodes. We skip here */
    if (0 == driver_rr_nodes.size()) {
      return; 
@ -470,6 +471,18 @@ void build_connection_block_module_short_interc(ModuleManager& module_manager,
  VTR_ASSERT_SAFE(1 == driver_rr_nodes.size());
  const RRNodeId& driver_rr_node = driver_rr_nodes[0]; 

+  /* Xifan Tang: VPR considers delayless switch to be configurable
+   * As a result, the direct connection is considered to be configurable...
+   * Here, I simply kick out OPINs in CB connection because they should be built
+   * in the top module.
+   * 
+   * Note: this MUST BE reconsidered if we do have OPIN connected to IPINs 
+   * through a programmable multiplexer!!!
+   */
+  if (OPIN == rr_graph.node_type(driver_rr_node)) {
+    return;
+  }
+
  VTR_ASSERT((CHANX == rr_graph.node_type(driver_rr_node)) || (CHANY == rr_graph.node_type(driver_rr_node)));

  /* Create port description for the routing track middle output */
--- a/openfpga/src/fabric/build_top_module.cpp
+++ b/openfpga/src/fabric/build_top_module.cpp
@ -105,6 +105,13 @@ vtr::Matrix<size_t> add_top_module_grid_instances(ModuleManager& module_manager,
      /* Skip width or height > 1 tiles (mostly heterogeneous blocks) */
      if ( (0 < grids[ix][iy].width_offset)
        || (0 < grids[ix][iy].height_offset)) {
+        /* Find the root of this grid, the instance id should be valid. 
+         * We just copy it here
+         */
+        vtr::Point<size_t> root_grid_coord(ix - grids[ix][iy].width_offset,
+                                           iy - grids[ix][iy].height_offset);
+        VTR_ASSERT(size_t(-1) != grid_instance_ids[root_grid_coord.x()][root_grid_coord.y()]);
+        grid_instance_ids[ix][iy] = grid_instance_ids[root_grid_coord.x()][root_grid_coord.y()];
        continue;
      }
      /* We should not meet any I/O grid */
@ -153,10 +160,16 @@ vtr::Matrix<size_t> add_top_module_grid_instances(ModuleManager& module_manager,
      /* Skip width, height > 1 tiles (mostly heterogeneous blocks) */
      if ( (0 < grids[io_coordinate.x()][io_coordinate.y()].width_offset)
        || (0 < grids[io_coordinate.x()][io_coordinate.y()].height_offset)) {
+        /* Find the root of this grid, the instance id should be valid. 
+         * We just copy it here
+         */
+        vtr::Point<size_t> root_grid_coord(io_coordinate.x() - grids[io_coordinate.x()][io_coordinate.y()].width_offset,
+                                           io_coordinate.y() - grids[io_coordinate.x()][io_coordinate.y()].height_offset);
+        VTR_ASSERT(size_t(-1) != grid_instance_ids[root_grid_coord.x()][root_grid_coord.y()]);
+        grid_instance_ids[io_coordinate.x()][io_coordinate.y()] = grid_instance_ids[root_grid_coord.x()][root_grid_coord.y()];
        continue;
      }
-      /* We should not meet any I/O grid */
-      VTR_ASSERT(true == is_io_type(grids[io_coordinate.x()][io_coordinate.y()].type));
+
      /* Add a grid module to top_module*/
      grid_instance_ids[io_coordinate.x()][io_coordinate.y()] = add_top_module_grid_instance(module_manager, top_module, grids[io_coordinate.x()][io_coordinate.y()].type, io_side, io_coordinate);
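The root-copy logic added in both loops above can be illustrated in isolation. The sketch below is a simplified model under stated assumptions: `GridCell`, `INVALID_INSTANCE_ID` and `assign_grid_instance_ids` are hypothetical names, and instance ids are simply handed out in visiting order rather than by a module manager.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified grid cell: both offsets are zero only at the root of a tile.
struct GridCell {
  std::size_t width_offset;
  std::size_t height_offset;
};

const std::size_t INVALID_INSTANCE_ID = std::size_t(-1);

std::vector<std::vector<std::size_t>> assign_grid_instance_ids(
    const std::vector<std::vector<GridCell>>& grids) {
  std::vector<std::vector<std::size_t>> ids(grids.size());
  std::size_t next_id = 0;
  for (std::size_t ix = 0; ix < grids.size(); ++ix) {
    ids[ix].assign(grids[ix].size(), INVALID_INSTANCE_ID);
    for (std::size_t iy = 0; iy < grids[ix].size(); ++iy) {
      const GridCell& cell = grids[ix][iy];
      if (0 < cell.width_offset || 0 < cell.height_offset) {
        /* Non-root cell: find the root and copy its instance id */
        const std::size_t root_x = ix - cell.width_offset;
        const std::size_t root_y = iy - cell.height_offset;
        assert(INVALID_INSTANCE_ID != ids[root_x][root_y]);
        ids[ix][iy] = ids[root_x][root_y];
        continue;
      }
      /* Root cell: give it a fresh instance id */
      ids[ix][iy] = next_id++;
    }
  }
  return ids;
}
```

The copy is safe in this sketch because the root always has coordinates no larger than the current cell, so an ascending traversal visits it first. Whether the same ordering guarantee holds for the per-side I/O traversal in the real code is something the `VTR_ASSERT` in the diff checks at run time.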

--- a/openfpga/src/fabric/build_top_module_connection.cpp
+++ b/openfpga/src/fabric/build_top_module_connection.cpp
@ -12,6 +12,7 @@
 #include "openfpga_naming.h"
 #include "pb_type_utils.h"
 #include "rr_gsb_utils.h"
+#include "openfpga_physical_tile_utils.h"

 #include "build_top_module_utils.h"
 #include "build_top_module_connection.h"
@ -62,6 +63,11 @@ void add_top_module_nets_connect_grids_and_sb(ModuleManager& module_manager,
                                              const vtr::Matrix<size_t>& sb_instance_ids,
                                              const bool& compact_routing_hierarchy) {

+  /* Skip those Switch blocks that do not exist */
+  if (false == rr_gsb.is_sb_exist()) {
+    return;
+  }
+
  /* We could have two different coordinators, one is the instance, the other is the module */
  vtr::Point<size_t> instance_sb_coordinate(rr_gsb.get_sb_x(), rr_gsb.get_sb_y());
  vtr::Point<size_t> module_gsb_coordinate(rr_gsb.get_x(), rr_gsb.get_y());
@ -109,9 +115,10 @@ void add_top_module_nets_connect_grids_and_sb(ModuleManager& module_manager,
      /* Collect sink-related information */
      vtr::Point<size_t> sink_sb_port_coord(rr_graph.node_xlow(module_sb.get_opin_node(side_manager.get_side(), inode)),
                                            rr_graph.node_ylow(module_sb.get_opin_node(side_manager.get_side(), inode)));
+      size_t sink_grid_pin_index = rr_graph.node_pin_num(module_sb.get_opin_node(side_manager.get_side(), inode));
      std::string sink_sb_port_name = generate_sb_module_grid_port_name(side_manager.get_side(),
                                                                        rr_graph.node_side(module_sb.get_opin_node(side_manager.get_side(), inode)),
-                                                                        src_grid_pin_index); 
+                                                                        sink_grid_pin_index); 
      ModulePortId sink_sb_port_id = module_manager.find_module_port(sink_sb_module, sink_sb_port_name);
      VTR_ASSERT(true == module_manager.valid_module_port_id(sink_sb_module, sink_sb_port_id));
      BasicPort sink_sb_port =  module_manager.module_port(sink_sb_module, sink_sb_port_id); 
@ -178,6 +185,11 @@ void add_top_module_nets_connect_grids_and_sb_with_duplicated_pins(ModuleManager
                                                                   const vtr::Matrix<size_t>& sb_instance_ids,
                                                                   const bool& compact_routing_hierarchy) {

+  /* Skip those Switch blocks that do not exist */
+  if (false == rr_gsb.is_sb_exist()) {
+    return;
+  }
+
  /* We could have two different coordinators, one is the instance, the other is the module */
  vtr::Point<size_t> instance_sb_coordinate(rr_gsb.get_sb_x(), rr_gsb.get_sb_y());
  vtr::Point<size_t> module_gsb_coordinate(rr_gsb.get_x(), rr_gsb.get_y());
@ -232,7 +244,7 @@ void add_top_module_nets_connect_grids_and_sb_with_duplicated_pins(ModuleManager
       * For other duplicated pins, we follow the new naming
       */
      std::string src_grid_port_name;
-      if (0. == grids[grid_coordinate.x()][grid_coordinate.y()].type->fc_specs[src_grid_pin_index].fc_value) {
+      if (0. == find_physical_tile_pin_Fc(grids[grid_coordinate.x()][grid_coordinate.y()].type, src_grid_pin_index)) {
        src_grid_port_name = generate_grid_port_name(grid_coordinate, src_grid_pin_width, src_grid_pin_height, 
                                                     rr_graph.node_side(rr_gsb.get_opin_node(side_manager.get_side(), inode)),
                                                     src_grid_pin_index, false);
@ -248,9 +260,10 @@ void add_top_module_nets_connect_grids_and_sb_with_duplicated_pins(ModuleManager
      /* Collect sink-related information */
      vtr::Point<size_t> sink_sb_port_coord(rr_graph.node_xlow(module_sb.get_opin_node(side_manager.get_side(), inode)),
                                            rr_graph.node_ylow(module_sb.get_opin_node(side_manager.get_side(), inode)));
+      size_t sink_grid_pin_index = rr_graph.node_pin_num(module_sb.get_opin_node(side_manager.get_side(), inode));
      std::string sink_sb_port_name = generate_sb_module_grid_port_name(side_manager.get_side(),
                                                                        rr_graph.node_side(module_sb.get_opin_node(side_manager.get_side(), inode)),
-                                                                        src_grid_pin_index); 
+                                                                        sink_grid_pin_index); 
      ModulePortId sink_sb_port_id = module_manager.find_module_port(sink_sb_module, sink_sb_port_name);
      VTR_ASSERT(true == module_manager.valid_module_port_id(sink_sb_module, sink_sb_port_id));
      BasicPort sink_sb_port =  module_manager.module_port(sink_sb_module, sink_sb_port_id); 
@ -641,6 +654,7 @@ void add_top_module_nets_connect_grids_and_gsbs(ModuleManager& module_manager,
    for (size_t iy = 0; iy < gsb_range.y(); ++iy) {
      vtr::Point<size_t> gsb_coordinate(ix, iy);
      const RRGSB& rr_gsb = device_rr_gsb.get_gsb(ix, iy);
+
      /* Connect the grid pins of the GSB to adjacent grids */
      if (false == duplicate_grid_pin) {
        add_top_module_nets_connect_grids_and_sb(module_manager, top_module, 
--- a/openfpga/src/fabric/build_top_module_directs.cpp
+++ b/openfpga/src/fabric/build_top_module_directs.cpp
@ -95,6 +95,11 @@ void add_module_nets_tile_direct_connection(ModuleManager& module_manager,
  size_t src_pin_height = grids[src_clb_coord.x()][src_clb_coord.y()].type->pin_height_offset[src_tile_pin]; 
  std::string src_port_name = generate_grid_port_name(src_clb_coord, src_pin_width, src_pin_height, src_pin_grid_side, src_tile_pin, false);
  ModulePortId src_port_id = module_manager.find_module_port(src_grid_module, src_port_name); 
+  if (true != module_manager.valid_module_port_id(src_grid_module, src_port_id)) {
+    VTR_LOG_ERROR("Failed to find port '%s[%zu][%zu].%s'\n",
+                  src_module_name.c_str(), src_clb_coord.x(), src_clb_coord.y(),
+                  src_port_name.c_str());
+  }
  VTR_ASSERT(true == module_manager.valid_module_port_id(src_grid_module, src_port_id));
  VTR_ASSERT(1 == module_manager.module_port(src_grid_module, src_port_id).get_width());

--- a/openfpga/src/fabric/build_top_module_memory.cpp
+++ b/openfpga/src/fabric/build_top_module_memory.cpp
@ -177,7 +177,29 @@ void organize_top_module_tile_memory_modules(ModuleManager& module_manager,
 * the sequence of memory_modules and memory_instances will follow
 * a chain of tiles considering their physical location
 *
- * Inter tile connection:
+ * Inter-tile connection:
+ *
+ * Inter-tile connections always start from the I/O peripherals
+ * and then continue through the core tiles (CLBs and heterogeneous blocks).
+ * The sequence of configuration memory will be organized as follows:
+ *   - I/O peripherals
+ *     - BOTTOM side (From left to right)
+ *     - RIGHT side (From bottom to top)
+ *     - TOP side (From left to right)
+ *     - LEFT side (From top to bottom)
+ *   - Core tiles
+ *     - Tiles at the bottom row, i.e., Tile[0..i] (From left to right)
+ *     - One row upper, i.e. Tile[i+1 .. j] (From right to left)
+ *     - Repeat until we finish all the rows
+ *
+ *   Note: the tail may not always be at the top-right corner as shown in the figure.
+ *         It may exit at the top-left corner.
+ *         This depends on the number of rows you have in the core tile array.
+ *
+ *   Note: the inter-tile organization aims to reduce the length of the wires
+ *         that connect tiles. Therefore, it is organized as a snake,
+ *         which avoids long wires between rows and columns
+ *
 *    +--------------------------------------------------------+
 *    |              +------+------+-----+------+              |
 *    |              | I/O  | I/O  | ... | I/O  |              |
@ -210,7 +232,20 @@ void organize_top_module_tile_memory_modules(ModuleManager& module_manager,
 *                   +------+------+-----+------+              |
 *        head >-----------------------------------------------+
 *
- * Inner tile connection
+ * Inner tile connection:
+ *
+ *   Inside each tile, the configuration memory will be organized
+ *   in the following sequence:
+ *     - Switch Block (SB)
+ *     - X-directional Connection Block (CBX)
+ *     - Y-directional Connection Block (CBY)
+ *     - Configurable Logic Block (CLB), which could also be heterogeneous blocks
+ *
+ *   Note:
+ *     Due to multi-column and multi-width heterogeneous blocks,
+ *     a tile may lack one or more of SB, CBX, CBY, CLB.
+ *     In such cases, the sequence is still respected:
+ *     the missing blocks are simply skipped when organizing the configuration memories.
 *
 *       Tile
 *     +---------------+----------+
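The snake ordering of the core tiles described in the comment above can be sketched as below. This covers only the core-tile part of the sequence (the I/O peripherals come first and are not modeled), and `snake_tile_order` is a hypothetical helper written for illustration, not a function from the code base.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Return (x, y) coordinates of core tiles in snake order:
// the bottom row runs left to right, the next row right to left, and so on.
std::vector<std::pair<std::size_t, std::size_t>> snake_tile_order(const std::size_t num_cols,
                                                                  const std::size_t num_rows) {
  std::vector<std::pair<std::size_t, std::size_t>> order;
  order.reserve(num_cols * num_rows);
  for (std::size_t iy = 0; iy < num_rows; ++iy) {
    for (std::size_t i = 0; i < num_cols; ++i) {
      /* Even rows go left to right; odd rows go right to left */
      const std::size_t ix = (0 == iy % 2) ? i : (num_cols - 1 - i);
      order.emplace_back(ix, iy);
    }
  }
  return order;
}
```

With this traversal, consecutive tiles are always physical neighbors. The last tile lands in the right column when the number of rows is odd and in the left column when it is even, which matches the note about where the tail exits.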
--- a/openfpga/src/fabric/module_manager.cpp
+++ b/openfpga/src/fabric/module_manager.cpp
@ -120,7 +120,7 @@ std::string ModuleManager::module_name(const ModuleId& module_id) const {

 /* Get the string of a module port type */
 std::string ModuleManager::module_port_type_str(const enum e_module_port_type& port_type) const {
-  std::array<const char*, NUM_MODULE_PORT_TYPES> MODULE_PORT_TYPE_STRING = {{"GLOBAL PORTS", "GPIO PORTS", "INOUT PORTS", "INPUT PORTS", "OUTPUT PORTS", "CLOCK PORTS"}};
+  std::array<const char*, NUM_MODULE_PORT_TYPES> MODULE_PORT_TYPE_STRING = {{"GLOBAL PORTS", "GPIN PORTS", "GPOUT PORTS", "GPIO PORTS", "INOUT PORTS", "INPUT PORTS", "OUTPUT PORTS", "CLOCK PORTS"}};
  return MODULE_PORT_TYPE_STRING[port_type];
 }

--- a/openfpga/src/fabric/module_manager.h
+++ b/openfpga/src/fabric/module_manager.h
@ -28,6 +28,8 @@ class ModuleManager {
  public: /* Private data structures */
    enum e_module_port_type {
      MODULE_GLOBAL_PORT, /* Global inputs */
+      MODULE_GPIN_PORT,   /* General-purpose inputs */
+      MODULE_GPOUT_PORT,  /* General-purpose outputs, could be used for spypads */
      MODULE_GPIO_PORT,   /* General-purpose IOs, which are data IOs of the fabric */
      MODULE_INOUT_PORT,  /* Normal (non-global) inout ports */
      MODULE_INPUT_PORT,  /* Normal (non-global) input ports */
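The two hunks above have to stay in step: the string table in `module_port_type_str` is indexed by the enum value, so new enumerators must be inserted at the same positions in both places. A minimal standalone sketch of that pattern follows; the enum and function names are shortened, hypothetical stand-ins for the ones in `ModuleManager`.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical, shortened copy of the port type enum; order matters.
enum e_port_type {
  GLOBAL_PORT,
  GPIN_PORT,
  GPOUT_PORT,
  GPIO_PORT,
  INOUT_PORT,
  INPUT_PORT,
  OUTPUT_PORT,
  CLOCK_PORT,
  NUM_PORT_TYPES
};

// The table is indexed by the enum value, so entry i must describe enumerator i.
const char* port_type_str(const e_port_type port_type) {
  static const std::array<const char*, NUM_PORT_TYPES> PORT_TYPE_STRING = {{
    "GLOBAL PORTS", "GPIN PORTS", "GPOUT PORTS", "GPIO PORTS",
    "INOUT PORTS", "INPUT PORTS", "OUTPUT PORTS", "CLOCK PORTS"}};
  return PORT_TYPE_STRING[port_type];
}
```

Too many initializers for the array size is a compile error, but too few is not: the remaining entries are value-initialized to null pointers. A missing string therefore only shows up at run time, which is why the count is worth checking by eye whenever the enum grows.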
--- a/openfpga/src/fpga_bitstream/build_grid_bitstream.cpp
+++ b/openfpga/src/fpga_bitstream/build_grid_bitstream.cpp
@ -685,8 +685,6 @@ void build_grid_bitstream(BitstreamManager& bitstream_manager,
        || (0 < grids[io_coordinate.x()][io_coordinate.y()].height_offset) ) {
        continue;
      }
-      /* We should not meet any I/O grid */
-      VTR_ASSERT(true == is_io_type(grids[io_coordinate.x()][io_coordinate.y()].type));
      build_physical_block_bitstream(bitstream_manager, top_block, module_manager,
                                     circuit_lib, mux_lib,
                                     device_annotation, cluster_annotation, 
--- a/openfpga/src/fpga_bitstream/build_routing_bitstream.cpp
+++ b/openfpga/src/fpga_bitstream/build_routing_bitstream.cpp
@ -114,7 +114,7 @@ void build_switch_block_interc_bitstream(BitstreamManager& bitstream_manager,

  /* Determine if the interc lies inside a channel wire, that is interc between segments */
  if (false == rr_gsb.is_sb_node_passing_wire(rr_graph, chan_side, chan_node_id)) {
-    driver_rr_nodes = get_rr_graph_configurable_driver_nodes(rr_graph, cur_rr_node);
+    driver_rr_nodes = get_rr_gsb_chan_node_configurable_driver_nodes(rr_graph, rr_gsb, chan_side, chan_node_id);
    /* Special: if there are zero-driver nodes. We skip here */
    if (0 == driver_rr_nodes.size()) {
      return; 
--- a/openfpga/src/fpga_sdc/analysis_sdc_grid_writer.cpp
+++ b/openfpga/src/fpga_sdc/analysis_sdc_grid_writer.cpp
@ -0,0 +1,648 @@
+/********************************************************************
+ * This file includes functions that are used to write SDC commands
+ * to disable unused ports of grids, such as Configurable Logic Block
+ * (CLBs), heterogeneous blocks, etc.
+ *******************************************************************/
+/* Headers from vtrutil library */
+#include "vtr_assert.h"
+
+/* Headers from openfpgautil library */
+#include "openfpga_digest.h"
+
+/* Headers from vprutil library */
+#include "vpr_utils.h"
+
+#include "openfpga_reserved_words.h"
+#include "openfpga_naming.h"
+
+#include "pb_type_utils.h"
+
+#include "sdc_writer_utils.h" 
+#include "analysis_sdc_writer_utils.h" 
+#include "analysis_sdc_grid_writer.h" 
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+/********************************************************************
+ * Recursively visit all the pb_types in the hierarchy 
+ * and disable all the ports
+ *
+ * Note: all the ports in all the child pb_types must be disabled!
+ * This prevents the timing analyzer from considering any FF-to-FF path or
+ * combinational path inside an unused grid when finding critical paths!!!
+ *******************************************************************/
+static 
+void rec_print_analysis_sdc_disable_unused_pb_graph_nodes(std::fstream& fp, 
+                                                          const VprDeviceAnnotation& device_annotation,
+                                                          const ModuleManager& module_manager,
+                                                          const ModuleId& parent_module,
+                                                          const std::string& hierarchy_name,
+                                                          t_pb_graph_node* physical_pb_graph_node) {
+  t_pb_type* physical_pb_type = physical_pb_graph_node->pb_type;
+
+  /* Validate file stream */
+  valid_file_stream(fp);
+    
+  /* Disable all the ports of current module (parent_module)!
+   * Hierarchy name already includes the instance name of parent_module 
+   */
+  fp << "#######################################" << std::endl; 
+  fp << "# Disable all the ports for pb_graph_node " << physical_pb_graph_node->pb_type->name << "[" << physical_pb_graph_node->placement_index << "]" << std::endl;
+  fp << "#######################################" << std::endl; 
+
+  fp << "set_disable_timing ";
+  fp << hierarchy_name; 
+  fp << "*";
+  fp << std::endl;
+
+  /* Return if this is the primitive pb_type */
+  if (true == is_primitive_pb_type(physical_pb_type)) {
+    return;
+  }
+
+  /* Go recursively */
+  t_mode* physical_mode = device_annotation.physical_mode(physical_pb_type);
+
+  /* Disable all the ports by iterating over its instance in the parent module */
+  for (int ichild = 0; ichild < physical_mode->num_pb_type_children; ++ichild) {
+    /* Generate the name of the Verilog module for this child */
+    std::string child_module_name = generate_physical_block_module_name(&(physical_mode->pb_type_children[ichild]));
+
+    ModuleId child_module = module_manager.find_module(child_module_name);
+    VTR_ASSERT(true == module_manager.valid_module_id(child_module));
+
+    /* Each child may exist multiple times in the hierarchy */
+    for (int inst = 0; inst < physical_mode->pb_type_children[ichild].num_pb; ++inst) {
+      std::string child_instance_name = module_manager.instance_name(parent_module, child_module, module_manager.child_module_instances(parent_module, child_module)[inst]);
+      /* Must have a valid instance name!!! */
+      VTR_ASSERT(false == child_instance_name.empty()); 
+
+      std::string updated_hierarchy_name = hierarchy_name + child_instance_name + std::string("/");
+
+      rec_print_analysis_sdc_disable_unused_pb_graph_nodes(fp, device_annotation, module_manager, child_module, updated_hierarchy_name, 
+                                                           &(physical_pb_graph_node->child_pb_graph_nodes[physical_mode->index][ichild][inst])); 
+    }
+  }
+}
+
+/********************************************************************
+ * Disable an unused pin of a pb_graph_node (parent_module) 
+ *******************************************************************/
+static
+void disable_pb_graph_node_unused_pin(std::fstream& fp, 
+                                      const ModuleManager& module_manager,
+                                      const ModuleId& parent_module,
+                                      const std::string& hierarchy_name,
+                                      const t_pb_graph_pin* pb_graph_pin,
+                                      const PhysicalPb& physical_pb,
+                                      const PhysicalPbId& pb_id) {
+  /* Validate file stream */
+  valid_file_stream(fp);
+    
+  /* Identify if the pb_graph_pin has been used or not
+   * TODO: identify if this is a parasitic net
+   */ 
+  if (AtomNetId::INVALID() != physical_pb.pb_graph_pin_atom_net(pb_id, pb_graph_pin)) {
+    /* Used pin; Nothing to do */
+    return;
+  }
+
+  /* Reaching here means that this pin is not used. Disable timing analysis for the pin */
+  /* Find the module port by name */
+  std::string module_port_name = generate_pb_type_port_name(pb_graph_pin->port);
+  ModulePortId module_port = module_manager.find_module_port(parent_module, module_port_name);
+  VTR_ASSERT(true == module_manager.valid_module_port_id(parent_module, module_port));
+  BasicPort port_to_disable = module_manager.module_port(parent_module, module_port);
+  port_to_disable.set_width(pb_graph_pin->pin_number, pb_graph_pin->pin_number);
+
+  fp << "set_disable_timing ";
+  fp << hierarchy_name; 
+  fp << generate_sdc_port(port_to_disable);
+  fp << std::endl;
+}
+
+/********************************************************************
+ * Disable unused input ports and output ports of this pb_graph_node (parent_module) 
+ * This function will iterate over all the input pins, output pins
+ * of the physical_pb_graph_node, and check if they are mapped
+ * For unused pins, we will find the port in parent_module
+ * and then print SDC commands to disable them
+ *******************************************************************/
+static
+void disable_pb_graph_node_unused_pins(std::fstream& fp, 
+                                       const ModuleManager& module_manager,
+                                       const ModuleId& parent_module,
+                                       const std::string& hierarchy_name,
+                                       t_pb_graph_node* physical_pb_graph_node,
+                                       const PhysicalPb& physical_pb) {
+  const PhysicalPbId& pb_id = physical_pb.find_pb(physical_pb_graph_node);
+  VTR_ASSERT(true == physical_pb.valid_pb_id(pb_id));
+
+  fp << "#######################################" << std::endl; 
+  fp << "# Disable unused pins for pb_graph_node " << physical_pb_graph_node->pb_type->name << "[" << physical_pb_graph_node->placement_index << "]" << std::endl;
+  fp << "#######################################" << std::endl; 
+
+  /* Disable unused input pins */
+  for (int iport = 0; iport < physical_pb_graph_node->num_input_ports; ++iport) {
+    for (int ipin = 0; ipin < physical_pb_graph_node->num_input_pins[iport]; ++ipin) {
+      disable_pb_graph_node_unused_pin(fp, module_manager, parent_module,
+                                       hierarchy_name,
+                                       &(physical_pb_graph_node->input_pins[iport][ipin]),
+                                       physical_pb, pb_id);
+    }
+  }
+
+  /* Disable unused output pins */
+  for (int iport = 0; iport < physical_pb_graph_node->num_output_ports; ++iport) {
+    for (int ipin = 0; ipin < physical_pb_graph_node->num_output_pins[iport]; ++ipin) {
+      disable_pb_graph_node_unused_pin(fp, module_manager, parent_module,
+                                       hierarchy_name,
+                                       &(physical_pb_graph_node->output_pins[iport][ipin]),
+                                       physical_pb, pb_id);
+    }
+  }
+
+  /* Disable unused clock pins */
+  for (int iport = 0; iport < physical_pb_graph_node->num_clock_ports; ++iport) {
+    for (int ipin = 0; ipin < physical_pb_graph_node->num_clock_pins[iport]; ++ipin) {
+      disable_pb_graph_node_unused_pin(fp, module_manager, parent_module,
+                                       hierarchy_name,
+                                       &(physical_pb_graph_node->clock_pins[iport][ipin]),
+                                       physical_pb, pb_id);
+    }
+  }
+}
+
+/********************************************************************
+ * Disable unused inputs of routing multiplexers of this pb_graph_node 
+ * This function will first cache the nets for each input and output pin
+ * and store the results in a mux_name-to-net mapping
+ *******************************************************************/
+static 
+void disable_pb_graph_node_unused_mux_inputs(std::fstream& fp, 
+                                             const VprDeviceAnnotation& device_annotation,
+                                             const ModuleManager& module_manager,
+                                             const ModuleId& parent_module,
+                                             const std::string& hierarchy_name,
+                                             t_pb_graph_node* physical_pb_graph_node,
+                                             const PhysicalPb& physical_pb) {
+
+  fp << "#######################################" << std::endl; 
+  fp << "# Disable unused mux_inputs for pb_graph_node " << physical_pb_graph_node->pb_type->name << "[" << physical_pb_graph_node->placement_index << "]" << std::endl;
+  fp << "#######################################" << std::endl; 
+
+  t_pb_type* physical_pb_type = physical_pb_graph_node->pb_type;
+
+  t_mode* physical_mode = device_annotation.physical_mode(physical_pb_type);
+
+  std::map<std::string, AtomNetId> mux_instance_to_net_map;
+
+  /* Cache the nets for each input pin of each child pb_graph_node */
+  for (int ichild = 0; ichild < physical_mode->num_pb_type_children; ++ichild) {
+    for (int inst = 0; inst < physical_mode->pb_type_children[ichild].num_pb; ++inst) {
+
+      t_pb_graph_node* child_pb_graph_node = &(physical_pb_graph_node->child_pb_graph_nodes[physical_mode->index][ichild][inst]); 
+
+      /* Cache the nets for input pins of the child pb_graph_node */
+      for (int iport = 0; iport < child_pb_graph_node->num_input_ports; ++iport) {
+        for (int ipin = 0; ipin < child_pb_graph_node->num_input_pins[iport]; ++ipin) {
+          const PhysicalPbId& pb_id = physical_pb.find_pb(child_pb_graph_node); 
+          VTR_ASSERT(true == physical_pb.valid_pb_id(pb_id));
+          /* Generate the mux name */ 
+          std::string mux_instance_name = generate_pb_mux_instance_name(GRID_MUX_INSTANCE_PREFIX, &(child_pb_graph_node->input_pins[iport][ipin]), std::string(""));
+          /* Cache the net */
+          mux_instance_to_net_map[mux_instance_name] = physical_pb.pb_graph_pin_atom_net(pb_id, &(child_pb_graph_node->input_pins[iport][ipin]));
+        }
+      }
+
+      /* Cache the nets for clock pins of the child pb_graph_node */
+      for (int iport = 0; iport < child_pb_graph_node->num_clock_ports; ++iport) {
+        for (int ipin = 0; ipin < child_pb_graph_node->num_clock_pins[iport]; ++ipin) {
+          const PhysicalPbId& pb_id = physical_pb.find_pb(child_pb_graph_node); 
+          /* Generate the mux name */ 
+          std::string mux_instance_name = generate_pb_mux_instance_name(GRID_MUX_INSTANCE_PREFIX, &(child_pb_graph_node->clock_pins[iport][ipin]), std::string(""));
+          /* Cache the net */
+          mux_instance_to_net_map[mux_instance_name] = physical_pb.pb_graph_pin_atom_net(pb_id, &(child_pb_graph_node->clock_pins[iport][ipin]));
+        }
+      }
+
+    }
+  }
+
+  /* Cache the nets for each output pin of this pb_graph_node */
+  for (int iport = 0; iport < physical_pb_graph_node->num_output_ports; ++iport) {
+    for (int ipin = 0; ipin < physical_pb_graph_node->num_output_pins[iport]; ++ipin) {
+      const PhysicalPbId& pb_id = physical_pb.find_pb(physical_pb_graph_node); 
+      /* Generate the mux name */ 
+      std::string mux_instance_name = generate_pb_mux_instance_name(GRID_MUX_INSTANCE_PREFIX, &(physical_pb_graph_node->output_pins[iport][ipin]), std::string(""));
+      /* Cache the net */
+      mux_instance_to_net_map[mux_instance_name] = physical_pb.pb_graph_pin_atom_net(pb_id, &(physical_pb_graph_node->output_pins[iport][ipin]));
+    }
+  }
+
+  /* Now disable unused inputs of routing multiplexers, by tracing from input pins of the parent_module */ 
+  for (int iport = 0; iport < physical_pb_graph_node->num_input_ports; ++iport) {
+    for (int ipin = 0; ipin < physical_pb_graph_node->num_input_pins[iport]; ++ipin) {
+      /* Find the module port by name */
+      std::string module_port_name = generate_pb_type_port_name(physical_pb_graph_node->input_pins[iport][ipin].port);
+      ModulePortId module_port = module_manager.find_module_port(parent_module, module_port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(parent_module, module_port));
+
+      const PhysicalPbId& pb_id = physical_pb.find_pb(physical_pb_graph_node); 
+      const AtomNetId& mapped_net = physical_pb.pb_graph_pin_atom_net(pb_id, &(physical_pb_graph_node->input_pins[iport][ipin])); 
+
+      disable_analysis_module_input_pin_net_sinks(fp, module_manager, parent_module,
+                                                  hierarchy_name,
+                                                  module_port, ipin,
+                                                  mapped_net,
+                                                  mux_instance_to_net_map);
+    }
+  }
+
+  for (int iport = 0; iport < physical_pb_graph_node->num_clock_ports; ++iport) {
+    for (int ipin = 0; ipin < physical_pb_graph_node->num_clock_pins[iport]; ++ipin) {
+      /* Find the module port by name */
+      std::string module_port_name = generate_pb_type_port_name(physical_pb_graph_node->clock_pins[iport][ipin].port);
+      ModulePortId module_port = module_manager.find_module_port(parent_module, module_port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(parent_module, module_port));
+
+      const PhysicalPbId& pb_id = physical_pb.find_pb(physical_pb_graph_node); 
+      const AtomNetId& mapped_net = physical_pb.pb_graph_pin_atom_net(pb_id, &(physical_pb_graph_node->clock_pins[iport][ipin])); 
+
+      disable_analysis_module_input_pin_net_sinks(fp, module_manager, parent_module,
+                                                  hierarchy_name,
+                                                  module_port, ipin,
+                                                  mapped_net,
+                                                  mux_instance_to_net_map);
+    }
+  }
+
+  /* Now disable unused inputs of routing multiplexers, by tracing from output pins of the child_module */ 
+  for (int ichild = 0; ichild < physical_mode->num_pb_type_children; ++ichild) {
+    /* Generate the name of the Verilog module for this child */
+    std::string child_module_name = generate_physical_block_module_name(&(physical_mode->pb_type_children[ichild]));
+
+    ModuleId child_module = module_manager.find_module(child_module_name);
+    VTR_ASSERT(true == module_manager.valid_module_id(child_module));
+
+    for (int inst = 0; inst < physical_mode->pb_type_children[ichild].num_pb; ++inst) {
+
+      t_pb_graph_node* child_pb_graph_node = &(physical_pb_graph_node->child_pb_graph_nodes[physical_mode->index][ichild][inst]); 
+
+      for (int iport = 0; iport < child_pb_graph_node->num_output_ports; ++iport) {
+        for (int ipin = 0; ipin < child_pb_graph_node->num_output_pins[iport]; ++ipin) {
+          /* Find the module port by name */
+          std::string module_port_name = generate_pb_type_port_name(child_pb_graph_node->output_pins[iport][ipin].port);
+          ModulePortId module_port = module_manager.find_module_port(child_module, module_port_name);
+          VTR_ASSERT(true == module_manager.valid_module_port_id(child_module, module_port));
+
+          const PhysicalPbId& pb_id = physical_pb.find_pb(child_pb_graph_node); 
+          const AtomNetId& mapped_net = physical_pb.pb_graph_pin_atom_net(pb_id, &(child_pb_graph_node->output_pins[iport][ipin])); 
+
+          /* Corner case: if the pb_graph_pin has no fan-out we will skip this pin */
+          if (0 == child_pb_graph_node->output_pins[iport][ipin].num_output_edges) {
+            continue;
+          }
+
+          disable_analysis_module_output_pin_net_sinks(fp, module_manager, parent_module,
+                                                       hierarchy_name,
+                                                       child_module, inst, 
+                                                       module_port, ipin,
+                                                       mapped_net, 
+                                                       mux_instance_to_net_map);
+        }
+      }
+    }
+  }
+}
+
+/********************************************************************
+ * Recursively visit all the pb_types in the hierarchy 
+ * and disable all the unused resources, including:
+ * 1. input ports
+ * 2. output ports
+ * 3. unused inputs of routing multiplexers
+ *
+ * As this function is executed recursively,
+ * to avoid disabling timing for the same ports repeatedly, each call of this function
+ * disables only the unused input ports and output ports of the parent module.
+ * In addition, we cache all the net ids mapped to the input ports of
+ * child modules, and the net ids mapped to the output ports of the parent module.
+ * As such, we can trace from 
+ * 1. the input ports of parent module to disable unused inputs of routing multiplexer
+ *    which drives the inputs of child modules
+ *                      
+ *                          Parent_module
+ *                         +---------------------------------------------
+ *                         |          MUX                  child_module
+ *                         |         +-------------+       +--------
+ *    input_pin0(netA) --->|-------->| Routing     |------>|
+ *    input_pin1(netB) --->|----x--->| Multiplexer | netA  |
+ *                         |         +-------------+       |
+ *                         |                               |
+ *
+ * 2. the output ports of child module to disable unused inputs of routing multiplexer
+ *    which drives the outputs of parent modules
+ *
+ *   Case 1: 
+ *                                  parent_module
+ *         --------------------------------------+
+ *         child_module                          |
+ *        -------------+                         |
+ *                     |    +-------------+      |
+ *  output_pin0 (netA) |--->| Routing     |----->|---->
+ *  output_pin1 (netB) |-x->| Multiplexer | netA |
+ *                     |    +-------------+      |
+ *
+ *    Case 2: 
+ *
+ *                         Parent_module
+ *                         +---------------------------------------------
+ *                         |
+ *                         |    +--------------------------------------------+
+ *                         |    |     MUX                  child_module      |
+ *                         |    |    +-------------+       +-----------+     |
+ *                         |    +--->| Routing     |------>|           |     |
+ *    input_pin0(netA) --->|----x--->| Multiplexer | netA  | output_pin|-----+
+ *                         |         +-------------+       |           | netA
+ *                         |                               |           |
+ *
+ *
+ * Note: all the ports in all the child pb_types must be disabled!
+ * This prevents the timing analyzer from considering any FF-to-FF path or
+ * combinational path inside an unused grid when finding critical paths!!!
+ *******************************************************************/
+static 
+void rec_print_analysis_sdc_disable_pb_graph_node_unused_resources(std::fstream& fp, 
+                                                                   const VprDeviceAnnotation& device_annotation,
+                                                                   const ModuleManager& module_manager,
+                                                                   const ModuleId& parent_module,
+                                                                   const std::string& hierarchy_name,
+                                                                   t_pb_graph_node* physical_pb_graph_node,
+                                                                   const PhysicalPb& physical_pb) {
+  t_pb_type* physical_pb_type = physical_pb_graph_node->pb_type;
+
+  /* Disable unused input ports and output ports of this pb_graph_node (parent_module) */
+  disable_pb_graph_node_unused_pins(fp, module_manager, parent_module,
+                                    hierarchy_name, physical_pb_graph_node, physical_pb); 
+
+  /* Return if this is a primitive pb_type
+   * Note: we must return before disabling any unused inputs of routing multiplexers,
+   * because a primitive pb_type does NOT contain any routing multiplexers inside!!!
+   */
+  if (true == is_primitive_pb_type(physical_pb_type)) {
+    return;
+  }
+
+  /* Disable unused inputs of routing multiplexers of this pb_graph_node */
+  disable_pb_graph_node_unused_mux_inputs(fp, device_annotation,
+                                          module_manager, parent_module, 
+                                          hierarchy_name, physical_pb_graph_node,
+                                          physical_pb);
+
+
+  t_mode* physical_mode = device_annotation.physical_mode(physical_pb_type);
+
+  /* Disable all the ports by iterating over its instance in the parent module */
+  for (int ichild = 0; ichild < physical_mode->num_pb_type_children; ++ichild) {
+    /* Generate the name of the Verilog module for this child */
+    std::string child_module_name = generate_physical_block_module_name(&(physical_mode->pb_type_children[ichild]));
+
+    ModuleId child_module = module_manager.find_module(child_module_name);
+    VTR_ASSERT(true == module_manager.valid_module_id(child_module));
+
+    /* Each child may exist multiple times in the hierarchy*/
+    for (int inst = 0; inst < physical_pb_type->modes[physical_mode->index].pb_type_children[ichild].num_pb; ++inst) {
+      std::string child_instance_name = module_manager.instance_name(parent_module, child_module, module_manager.child_module_instances(parent_module, child_module)[inst]);
+      /* Must have a valid instance name!!! */
+      VTR_ASSERT(false == child_instance_name.empty()); 
+
+      std::string updated_hierarchy_name = hierarchy_name + child_instance_name + std::string("/");
+
+      rec_print_analysis_sdc_disable_pb_graph_node_unused_resources(fp, device_annotation,
+                                                                    module_manager, child_module, updated_hierarchy_name, 
+                                                                    &(physical_pb_graph_node->child_pb_graph_nodes[physical_mode->index][ichild][inst]), 
+                                                                    physical_pb); 
+    }
+  }
+}
+
+/********************************************************************
+ * This function can work in two different modes:
+ * 1. For partially unused pb blocks
+ * ---------------------------------
+ * Disable the timing for only unused resources in a physical block
+ * We have to walk through pb_graph node, port by port and pin by pin.
+ * Identify which pins have not been used, and then disable the timing 
+ * for these ports. 
+ * Plus, for input ports, we will trace the routing multiplexers
+ * and disable the timing for unused inputs.
+ *
+ * 2. For fully unused pb_blocks
+ * ----------------------------- 
+ * Disable the timing for a fully unused grid!
+ * This is very straightforward!
+ * Just walk through each pb_type and disable all the ports using wildcards
+ *******************************************************************/
+static 
+void print_analysis_sdc_disable_pb_block_unused_resources(std::fstream& fp,
+                                                          t_physical_tile_type_ptr grid_type,
+                                                          const vtr::Point<size_t>& grid_coordinate,
+                                                          const VprDeviceAnnotation& device_annotation,
+                                                          const ModuleManager& module_manager,
+                                                          const std::string& grid_instance_name,
+                                                          const size_t& grid_z,
+                                                          const PhysicalPb& physical_pb,
+                                                          const bool& unused_block) {
+  /* If the block is partially unused, we should have a physical pb */
+  if (false == unused_block) {
+    VTR_ASSERT(false == physical_pb.empty());
+  }
+
+  VTR_ASSERT(1 == grid_type->equivalent_sites.size());
+  t_pb_graph_node* pb_graph_head = grid_type->equivalent_sites[0]->pb_graph_head; 
+  VTR_ASSERT(nullptr != pb_graph_head);
+
+  /* Find a unique name for the pb instance in this grid
+   * Note: this must be consistent with the instance name we used in build_grid_module()!!!
+   */
+  /* TODO: validate that the instance name is used in module manager!!! */
+  std::string pb_module_name = generate_physical_block_module_name(pb_graph_head->pb_type);
+  std::string pb_instance_name = generate_physical_block_instance_name(pb_graph_head->pb_type, grid_z);
+
+  ModuleId pb_module = module_manager.find_module(pb_module_name);
+  VTR_ASSERT(true == module_manager.valid_module_id(pb_module));
+
+  /* Print comments */
+  fp << "#######################################" << std::endl; 
+ 
+  if (true == unused_block) {
+    fp << "# Disable Timing for unused grid[" << grid_coordinate.x() << "][" << grid_coordinate.y() << "][" << grid_z << "]" << std::endl;
+  } else {
+    VTR_ASSERT_SAFE(false == unused_block);
+    fp << "# Disable Timing for unused resources in grid[" << grid_coordinate.x() << "][" << grid_coordinate.y() << "][" << grid_z << "]" << std::endl;
+  }
+
+  fp << "#######################################" << std::endl; 
+
+  std::string hierarchy_name = grid_instance_name + std::string("/") + pb_instance_name + std::string("/");
+
+  /* Go recursively through the pb_graph hierarchy, and disable all the ports level by level */
+  if (true == unused_block) {
+    rec_print_analysis_sdc_disable_unused_pb_graph_nodes(fp, device_annotation,
+                                                         module_manager, pb_module, hierarchy_name,
+                                                         pb_graph_head); 
+  } else { 
+    VTR_ASSERT_SAFE(false == unused_block);
+    rec_print_analysis_sdc_disable_pb_graph_node_unused_resources(fp, device_annotation,
+                                                                  module_manager, pb_module, hierarchy_name,
+                                                                  pb_graph_head, physical_pb); 
+  }
+}
+
+/********************************************************************
+ * Disable the timing for the unused resources of a grid:
+ * for each block position in the grid, disable only the unused resources
+ * when a block is mapped to it, or all the ports when it is unused
+ *******************************************************************/
+static 
+void print_analysis_sdc_disable_unused_grid(std::fstream& fp, 
+                                            const vtr::Point<size_t>& grid_coordinate,
+                                            const DeviceGrid& grids, 
+                                            const VprDeviceAnnotation& device_annotation,
+                                            const VprClusteringAnnotation& cluster_annotation,
+                                            const VprPlacementAnnotation& place_annotation,
+                                            const ModuleManager& module_manager,
+                                            const e_side& border_side) {
+  /* Validate file stream */
+  valid_file_stream(fp);
+
+  t_physical_tile_type_ptr grid_type = grids[grid_coordinate.x()][grid_coordinate.y()].type;
+  /* Bypass conditions for grids : 
+   * 1. EMPTY type, which is by nature unused
+   * 2. Offset > 0, which has already been processed when offset = 0
+   */
+  if ( (true == is_empty_type(grid_type))
+    || (0 < grids[grid_coordinate.x()][grid_coordinate.y()].width_offset)
+    || (0 < grids[grid_coordinate.x()][grid_coordinate.y()].height_offset) ) {
+    return;
+  }
+
+  /* Find a unique name for the grid instance
+   * Note: this must be consistent with the instance name we used in build_top_module()!!!
+   */
+  /* TODO: validate that the instance name is used in module manager!!! */
+  std::string grid_module_name_prefix(GRID_MODULE_NAME_PREFIX);
+  std::string grid_module_name = generate_grid_block_module_name(grid_module_name_prefix, std::string(grid_type->name), is_io_type(grid_type), border_side);
+  std::string grid_instance_name = generate_grid_block_instance_name(grid_module_name_prefix, std::string(grid_type->name), is_io_type(grid_type), border_side, grid_coordinate);
+
+  ModuleId grid_module = module_manager.find_module(grid_module_name);
+  VTR_ASSERT(true == module_manager.valid_module_id(grid_module));
+
+  /* Print comments */
+  fp << "#######################################" << std::endl; 
+  fp << "# Disable Timing for grid[" << grid_coordinate.x() << "][" << grid_coordinate.y() << "]" << std::endl;
+  fp << "#######################################" << std::endl; 
+
+  /* For used grid, find the unused rr_node in the local rr_graph 
+   * and then disable each port which is not used
+   * as well as the unused inputs of routing multiplexers!
+   */
+  size_t grid_z = 0;
+  for (const ClusterBlockId& blk_id : place_annotation.grid_blocks(grid_coordinate)) {
+    if (ClusterBlockId::INVALID() != blk_id) { 
+      const PhysicalPb& physical_pb = cluster_annotation.physical_pb(blk_id);
+      print_analysis_sdc_disable_pb_block_unused_resources(fp, grid_type, grid_coordinate,
+                                                           device_annotation,
+                                                           module_manager, grid_instance_name, grid_z,
+                                                           physical_pb, false);
+    } else {
+      VTR_ASSERT(ClusterBlockId::INVALID() == blk_id);
+      /* For unused grid, disable all the pins in the physical_pb_type */
+      print_analysis_sdc_disable_pb_block_unused_resources(fp, grid_type, grid_coordinate,
+                                                           device_annotation, 
+                                                           module_manager, grid_instance_name, grid_z,
+                                                           PhysicalPb(), true);
+    }
+    grid_z++;
+  }
+}
+
+/********************************************************************
+ * Top-level function writes SDC commands to disable unused ports 
+ * of grids, such as Configurable Logic Block (CLBs), heterogeneous blocks, etc.
+ *
+ * This function will iterate over all the grids available in the FPGA fabric
+ * It will disable the timing analysis for
+ * 1. Grids, which are totally not used (no logic has been mapped to)
+ * 2. Unused part of grids, including the ports, inputs of routing multiplexers 
+ *
+ * Note that the unused inputs of routing multiplexers must be disabled;
+ * otherwise, they will cause unexpected paths in timing analysis
+ * For example:
+ *                           +---------------------+
+ *     inputA (net0) ------->|                     |
+ *                           | Routing multiplexer |----> output (net0)
+ *     inputB (net1) ------->|                     |
+ *                           +---------------------+
+ *
+ * During timing analysis, the path from inputA to output should be considered
+ * while the path from inputB to output should NOT be considered!!!
+ *
+ *******************************************************************/
+void print_analysis_sdc_disable_unused_grids(std::fstream& fp, 
+                                             const DeviceGrid& grids, 
+                                             const VprDeviceAnnotation& device_annotation,
+                                             const VprClusteringAnnotation& cluster_annotation,
+                                             const VprPlacementAnnotation& place_annotation,
+                                             const ModuleManager& module_manager) {
+
+  /* Process core grids */
+  for (size_t ix = 1; ix < grids.width() - 1; ++ix) {
+    for (size_t iy = 1; iy < grids.height() - 1; ++iy) {
+      /* We should not meet any I/O grid */
+      VTR_ASSERT(false == is_io_type(grids[ix][iy].type));
+
+      print_analysis_sdc_disable_unused_grid(fp, vtr::Point<size_t>(ix, iy),
+                                             grids, device_annotation, cluster_annotation, place_annotation,
+                                             module_manager, NUM_SIDES);
+    }
+  }
+
+  /* Process I/O grids */
+  /* Create the coordinate range for each side of FPGA fabric */
+  std::vector<e_side> io_sides{TOP, RIGHT, BOTTOM, LEFT};
+  std::map<e_side, std::vector<vtr::Point<size_t>>> io_coordinates;
+
+  /* TOP side*/
+  for (size_t ix = 1; ix < grids.width() - 1; ++ix) { 
+    io_coordinates[TOP].push_back(vtr::Point<size_t>(ix, grids.height() - 1));
+  } 
+
+  /* RIGHT side */
+  for (size_t iy = 1; iy < grids.height() - 1; ++iy) { 
+    io_coordinates[RIGHT].push_back(vtr::Point<size_t>(grids.width() - 1, iy));
+  } 
+
+  /* BOTTOM side*/
+  for (size_t ix = 1; ix < grids.width() - 1; ++ix) { 
+    io_coordinates[BOTTOM].push_back(vtr::Point<size_t>(ix, 0));
+  } 
+
+  /* LEFT side */
+  for (size_t iy = 1; iy < grids.height() - 1; ++iy) { 
+    io_coordinates[LEFT].push_back(vtr::Point<size_t>(0, iy));
+  }
+
+  /* Disable the unused resources of the I/O grids on each side */
+  for (const e_side& io_side : io_sides) {
+    for (const vtr::Point<size_t>& io_coordinate : io_coordinates[io_side]) {
+      print_analysis_sdc_disable_unused_grid(fp, io_coordinate,
+                                             grids, device_annotation, cluster_annotation, place_annotation,
+                                             module_manager, io_side);
+    }
+  }
+}
+
+} /* end namespace openfpga */
--- a/openfpga/src/fpga_sdc/analysis_sdc_grid_writer.h
+++ b/openfpga/src/fpga_sdc/analysis_sdc_grid_writer.h
@ -0,0 +1,31 @@
+#ifndef ANALYSIS_SDC_GRID_WRITER_H
+#define ANALYSIS_SDC_GRID_WRITER_H
+
+/********************************************************************
+ * Include header files that are required by function declaration
+ *******************************************************************/
+#include <fstream>
+#include <vector>
+#include "device_grid.h"
+#include "module_manager.h"
+#include "vpr_device_annotation.h"
+#include "vpr_clustering_annotation.h"
+#include "vpr_placement_annotation.h"
+
+/********************************************************************
+ * Function declaration
+ *******************************************************************/
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+void print_analysis_sdc_disable_unused_grids(std::fstream& fp, 
+                                             const DeviceGrid& grids, 
+                                             const VprDeviceAnnotation& device_annotation,
+                                             const VprClusteringAnnotation& cluster_annotation,
+                                             const VprPlacementAnnotation& place_annotation,
+                                             const ModuleManager& module_manager);
+
+} /* end namespace openfpga */
+
+#endif
--- a/openfpga/src/fpga_sdc/analysis_sdc_routing_writer.cpp
+++ b/openfpga/src/fpga_sdc/analysis_sdc_routing_writer.cpp
@ -0,0 +1,540 @@
+/********************************************************************
+ * This file includes functions that are used to output an SDC file
+ * that constrains routing modules of an FPGA fabric (P&Red netlist)
+ * using a benchmark 
+ *******************************************************************/
+#include <map>
+
+/* Headers from vtrutil library */
+#include "vtr_assert.h"
+
+/* Headers from openfpgautil library */
+#include "openfpga_digest.h"
+#include "openfpga_side_manager.h"
+#include "openfpga_port.h"
+
+#include "openfpga_reserved_words.h"
+#include "openfpga_naming.h"
+
+#include "sdc_writer_utils.h"
+#include "analysis_sdc_writer_utils.h"
+#include "analysis_sdc_routing_writer.h"
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+/********************************************************************
+ * This function will disable 
+ * 1. all the unused ports (unmapped by a benchmark) of a connection block
+ * 2. all the unused inputs (unmapped by a benchmark) of routing multiplexers 
+ *    in a connection block
+ *******************************************************************/
+static 
+void print_analysis_sdc_disable_cb_unused_resources(std::fstream& fp, 
+                                                    const AtomContext& atom_ctx, 
+                                                    const ModuleManager& module_manager, 
+                                                    const RRGraph& rr_graph, 
+                                                    const VprRoutingAnnotation& routing_annotation, 
+                                                    const DeviceRRGSB& device_rr_gsb,
+                                                    const RRGSB& rr_gsb, 
+                                                    const t_rr_type& cb_type,
+                                                    const bool& compact_routing_hierarchy) {
+  /* Validate file stream */
+  valid_file_stream(fp);
+
+  vtr::Point<size_t> gsb_coordinate(rr_gsb.get_cb_x(cb_type), rr_gsb.get_cb_y(cb_type));
+
+  std::string cb_instance_name = generate_connection_block_module_name(cb_type, gsb_coordinate);
+
+  /* If we use the compact routing hierarchy, we need to find the module name !*/
+  vtr::Point<size_t> cb_coordinate(rr_gsb.get_cb_x(cb_type), rr_gsb.get_cb_y(cb_type));
+  if (true == compact_routing_hierarchy) {
+    vtr::Point<size_t> cb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+    /* Note: use the GSB coordinate when querying for unique modules!!! */
+    const RRGSB& unique_mirror = device_rr_gsb.get_cb_unique_module(cb_type, cb_coord);
+    cb_coordinate.set_x(unique_mirror.get_cb_x(cb_type)); 
+    cb_coordinate.set_y(unique_mirror.get_cb_y(cb_type)); 
+  }
+
+  std::string cb_module_name = generate_connection_block_module_name(cb_type, cb_coordinate);
+
+  ModuleId cb_module = module_manager.find_module(cb_module_name);
+  VTR_ASSERT(true == module_manager.valid_module_id(cb_module));
+
+  /* Print comments */
+  fp << "##################################################" << std::endl; 
+  fp << "# Disable timing for Connection block " << cb_module_name << std::endl;
+  fp << "##################################################" << std::endl; 
+
+  /* Disable all the input ports (routing tracks), which are not used by benchmark */
+  for (size_t itrack = 0; itrack < rr_gsb.get_cb_chan_width(cb_type); ++itrack) {
+    const RRNodeId& chan_node = rr_gsb.get_chan_node(rr_gsb.get_cb_chan_side(cb_type), itrack);
+    /* Check if this node is used by benchmark  */
+    if (false == is_rr_node_to_be_disable_for_analysis(routing_annotation, chan_node)) {
+      continue;
+    }
+
+    /* Disable the input of the routing track if it is not used! */
+    std::string port_name = generate_cb_module_track_port_name(cb_type,
+                                                               itrack,  
+                                                               IN_PORT);
+
+    /* Ensure we have this port in the module! */
+    ModulePortId module_port = module_manager.find_module_port(cb_module, port_name);
+    VTR_ASSERT(true == module_manager.valid_module_port_id(cb_module, module_port));
+
+    fp << "set_disable_timing ";
+    fp << cb_instance_name << "/";
+    fp << generate_sdc_port(module_manager.module_port(cb_module, module_port));
+    fp << std::endl;
+  }
+
+  /* Disable all the output ports (routing tracks), which are not used by benchmark */
+  for (size_t itrack = 0; itrack < rr_gsb.get_cb_chan_width(cb_type); ++itrack) {
+    const RRNodeId& chan_node = rr_gsb.get_chan_node(rr_gsb.get_cb_chan_side(cb_type), itrack);
+    /* Check if this node is used by benchmark  */
+    if (false == is_rr_node_to_be_disable_for_analysis(routing_annotation, chan_node)) {
+      continue;
+    }
+
+    /* Disable the output of the routing track if it is not used! */
+    std::string port_name = generate_cb_module_track_port_name(cb_type,
+                                                               itrack,  
+                                                               OUT_PORT);
+
+    /* Ensure we have this port in the module! */
+    ModulePortId module_port = module_manager.find_module_port(cb_module, port_name);
+    VTR_ASSERT(true == module_manager.valid_module_port_id(cb_module, module_port));
+
+    fp << "set_disable_timing ";
+    fp << cb_instance_name << "/";
+    fp << generate_sdc_port(module_manager.module_port(cb_module, module_port));
+    fp << std::endl;
+  }
+
+  /* Build a map between mux_instance name and atom net id */
+  std::map<std::string, AtomNetId> mux_instance_to_net_map;
+
+  /* Disable all the output ports (grid input pins), which are not used by benchmark */
+  std::vector<enum e_side> cb_sides = rr_gsb.get_cb_ipin_sides(cb_type);
+
+  for (size_t side = 0; side < cb_sides.size(); ++side) {
+    enum e_side cb_ipin_side = cb_sides[side];
+    for (size_t inode = 0; inode < rr_gsb.get_num_ipin_nodes(cb_ipin_side); ++inode) {
+      RRNodeId ipin_node = rr_gsb.get_ipin_node(cb_ipin_side, inode);
+
+      /* Find the MUX instance that drives the IPIN! */
+      std::string mux_instance_name = generate_cb_mux_instance_name(CONNECTION_BLOCK_MUX_INSTANCE_PREFIX, rr_graph.node_side(ipin_node), inode, std::string(""));
+      mux_instance_to_net_map[mux_instance_name] = atom_ctx.lookup.atom_net(routing_annotation.rr_node_net(ipin_node));  
+
+      if (false == is_rr_node_to_be_disable_for_analysis(routing_annotation, ipin_node)) {
+        continue;
+      }
+
+      if (0 == std::distance(rr_graph.node_configurable_in_edges(ipin_node).begin(), rr_graph.node_configurable_in_edges(ipin_node).end())) {
+        continue;
+      }
+
+      std::string port_name = generate_cb_module_grid_port_name(cb_ipin_side,
+                                                                rr_graph.node_pin_num(ipin_node)); 
+
+      /* Find the port in unique mirror! */
+      if (true == compact_routing_hierarchy) {
+        /* Note: use the GSB coordinate when querying for unique modules!!! */
+        vtr::Point<size_t> cb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+        const RRGSB& unique_mirror = device_rr_gsb.get_cb_unique_module(cb_type, cb_coord);
+        const RRNodeId& unique_mirror_ipin_node = unique_mirror.get_ipin_node(cb_ipin_side, inode);
+        port_name = generate_cb_module_grid_port_name(cb_ipin_side, 
+                                                      rr_graph.node_pin_num(unique_mirror_ipin_node)); 
+      }
+
+      /* Ensure we have this port in the module! */
+      ModulePortId module_port = module_manager.find_module_port(cb_module, port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(cb_module, module_port));
+
+      fp << "set_disable_timing ";
+      fp << cb_instance_name << "/";
+      fp << generate_sdc_port(module_manager.module_port(cb_module, module_port));
+      fp << std::endl;
+    }
+  }
+
+  /* Disable all the inputs of routing multiplexers, which are not used by benchmark
+   * Here, we start from each input of the Connection Block, and traverse forward to the sink
+   * ports of the module net whose source is the input
+   * We find the instance name which is the parent of each sink port, and look up the
+   * net id through the mux_instance_to_net_map
+   * If the net id does not match the net id of this input, we will disable the sink port!
+   *
+   *                   cb_module
+   *                 +-----------------------
+   *                 |           MUX instance A
+   *                 |          +-----------
+   *   input_port--->|--+---x-->| sink port (disable!)
+   *                 |  |       +----------
+   *                 |  |        MUX instance B
+   *                 |  |       +----------
+   *                 |  +------>| sink port (do not disable!)
+   */ 
+  for (size_t itrack = 0; itrack < rr_gsb.get_cb_chan_width(cb_type); ++itrack) {
+    const RRNodeId& chan_node = rr_gsb.get_chan_node(rr_gsb.get_cb_chan_side(cb_type), itrack);
+
+    /* Find the module port of this routing track; its net sinks are disabled below if unused */
+    std::string port_name = generate_cb_module_track_port_name(cb_type,
+                                                               itrack,  
+                                                               OUT_PORT);
+
+    /* Ensure we have this port in the module! */
+    ModulePortId module_port = module_manager.find_module_port(cb_module, port_name);
+    VTR_ASSERT(true == module_manager.valid_module_port_id(cb_module, module_port));
+
+    AtomNetId mapped_atom_net = atom_ctx.lookup.atom_net(routing_annotation.rr_node_net(chan_node)); 
+
+    disable_analysis_module_input_port_net_sinks(fp, 
+                                                 module_manager, cb_module,
+                                                 cb_instance_name,
+                                                 module_port,
+                                                 mapped_atom_net,
+                                                 mux_instance_to_net_map);
+  }
+}
+
+/********************************************************************
+ * Iterate over all the connection blocks of a given type (CHANX or CHANY)
+ * in a device and disable unused ports for each of them
+ *******************************************************************/
+static 
+void print_analysis_sdc_disable_unused_cb_ports(std::fstream& fp,
+                                                const AtomContext& atom_ctx, 
+                                                const ModuleManager& module_manager, 
+                                                const RRGraph& rr_graph, 
+                                                const VprRoutingAnnotation& routing_annotation, 
+                                                const DeviceRRGSB& device_rr_gsb,
+                                                const t_rr_type& cb_type,
+                                                const bool& compact_routing_hierarchy) {
+  /* Walk through every GSB coordinate in the device */
+  vtr::Point<size_t> cb_range = device_rr_gsb.get_gsb_range();
+
+  for (size_t ix = 0; ix < cb_range.x(); ++ix) {
+    for (size_t iy = 0; iy < cb_range.y(); ++iy) {
+      /* Check if the connection block exists in the device!
+       * Some of them do NOT exist due to heterogeneous blocks (height > 1) 
+       * We will skip those modules
+       */
+      const RRGSB& rr_gsb = device_rr_gsb.get_gsb(ix, iy);
+      if (false == rr_gsb.is_cb_exist(cb_type)) {
+        continue;
+      }
+
+      print_analysis_sdc_disable_cb_unused_resources(fp, 
+                                                     atom_ctx, 
+                                                     module_manager, 
+                                                     rr_graph, 
+                                                     routing_annotation, 
+                                                     device_rr_gsb, 
+                                                     rr_gsb, 
+                                                     cb_type,
+                                                     compact_routing_hierarchy);
+    }
+  }
+}
+
+/********************************************************************
+ * Disable unused ports of all the connection blocks in a device,
+ * first for the X-direction (CHANX) and then for the Y-direction (CHANY)
+ *******************************************************************/
+void print_analysis_sdc_disable_unused_cbs(std::fstream& fp,
+                                           const AtomContext& atom_ctx, 
+                                           const ModuleManager& module_manager, 
+                                           const RRGraph& rr_graph, 
+                                           const VprRoutingAnnotation& routing_annotation, 
+                                           const DeviceRRGSB& device_rr_gsb,
+                                           const bool& compact_routing_hierarchy) {
+
+  print_analysis_sdc_disable_unused_cb_ports(fp, atom_ctx,
+                                             module_manager, 
+                                             rr_graph, 
+                                             routing_annotation,
+                                             device_rr_gsb,
+                                             CHANX, compact_routing_hierarchy);
+
+  print_analysis_sdc_disable_unused_cb_ports(fp, atom_ctx,
+                                             module_manager, 
+                                             rr_graph, 
+                                             routing_annotation,
+                                             device_rr_gsb,
+                                             CHANY, compact_routing_hierarchy);
+}
+
+/********************************************************************
+ * This function will disable 
+ * 1. all the unused ports (unmapped by a benchmark) of a switch block
+ * 2. all the unused inputs (unmapped by a benchmark) of routing multiplexers 
+ *    in a switch block
+ *******************************************************************/
+static 
+void print_analysis_sdc_disable_sb_unused_resources(std::fstream& fp, 
+                                                    const AtomContext& atom_ctx, 
+                                                    const ModuleManager& module_manager, 
+                                                    const RRGraph& rr_graph, 
+                                                    const VprRoutingAnnotation& routing_annotation, 
+                                                    const DeviceRRGSB& device_rr_gsb,
+                                                    const RRGSB& rr_gsb, 
+                                                    const bool& compact_routing_hierarchy) {
+  /* Validate file stream */
+  valid_file_stream(fp);
+
+  vtr::Point<size_t> gsb_coordinate(rr_gsb.get_sb_x(), rr_gsb.get_sb_y());
+
+  std::string sb_instance_name = generate_switch_block_module_name(gsb_coordinate);
+
+  /* If we use the compact routing hierarchy, we need to find the name of the unique module! */
+  vtr::Point<size_t> sb_coordinate(rr_gsb.get_sb_x(), rr_gsb.get_sb_y());
+  if (true == compact_routing_hierarchy) {
+    vtr::Point<size_t> sb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+    /* Note: use the GSB coordinate when querying unique modules! */
+    const RRGSB& unique_mirror = device_rr_gsb.get_sb_unique_module(sb_coord);
+    sb_coordinate.set_x(unique_mirror.get_sb_x()); 
+    sb_coordinate.set_y(unique_mirror.get_sb_y()); 
+  }
+
+  std::string sb_module_name = generate_switch_block_module_name(sb_coordinate);
+
+  ModuleId sb_module = module_manager.find_module(sb_module_name);
+  VTR_ASSERT(true == module_manager.valid_module_id(sb_module));
+
+  /* Print comments */
+  fp << "##################################################" << std::endl; 
+  fp << "# Disable timing for Switch block " << sb_module_name << std::endl;
+  fp << "##################################################" << std::endl; 
+
+  /* Build a map between mux_instance name and net_num */
+  std::map<std::string, AtomNetId> mux_instance_to_net_map;
+
+  /* Disable all the input/output ports (routing tracks) that are not used by the benchmark */
+  for (size_t side = 0; side < rr_gsb.get_num_sides(); ++side) {
+    SideManager side_manager(side);
+
+    for (size_t itrack = 0; itrack < rr_gsb.get_chan_width(side_manager.get_side()); ++itrack) {
+      const RRNodeId& chan_node = rr_gsb.get_chan_node(side_manager.get_side(), itrack);
+
+      std::string port_name = generate_sb_module_track_port_name(rr_graph.node_type(rr_gsb.get_chan_node(side_manager.get_side(), itrack)),
+                                                                 side_manager.get_side(), itrack,  
+                                                                 rr_gsb.get_chan_node_direction(side_manager.get_side(), itrack));
+
+      if (true == compact_routing_hierarchy) {
+        /* Note: use the GSB coordinate when querying unique modules! */
+        vtr::Point<size_t> sb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+        const RRGSB& unique_mirror = device_rr_gsb.get_sb_unique_module(sb_coord);
+        port_name = generate_sb_module_track_port_name(rr_graph.node_type(unique_mirror.get_chan_node(side_manager.get_side(), itrack)),
+                                                       side_manager.get_side(), itrack,  
+                                                       unique_mirror.get_chan_node_direction(side_manager.get_side(), itrack));
+      }
+
+      /* Ensure we have this port in the module! */
+      ModulePortId module_port = module_manager.find_module_port(sb_module, port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(sb_module, module_port));
+
+      /* Cache the net name for routing tracks which are outputs of the switch block */
+      if (OUT_PORT == rr_gsb.get_chan_node_direction(side_manager.get_side(), itrack)) {
+        /* Generate the name of mux instance related to this output node */
+        std::string mux_instance_name = generate_sb_memory_instance_name(SWITCH_BLOCK_MUX_INSTANCE_PREFIX, side_manager.get_side(), itrack, std::string(""));
+        mux_instance_to_net_map[mux_instance_name] = atom_ctx.lookup.atom_net(routing_annotation.rr_node_net(chan_node));
+      }
+
+      /* Check if this node is used by benchmark  */
+      if (false == is_rr_node_to_be_disable_for_analysis(routing_annotation, chan_node)) {
+        continue;
+      }
+
+      fp << "set_disable_timing ";
+      fp << sb_instance_name << "/";
+      fp << generate_sdc_port(module_manager.module_port(sb_module, module_port));
+      fp << std::endl;
+    }
+  }
+
+  /* Disable all the input ports (grid output pins) that are not used by the benchmark */
+  for (size_t side = 0; side < rr_gsb.get_num_sides(); ++side) {
+    SideManager side_manager(side);
+
+    for (size_t inode = 0; inode < rr_gsb.get_num_opin_nodes(side_manager.get_side()); ++inode) {
+      const RRNodeId& opin_node = rr_gsb.get_opin_node(side_manager.get_side(), inode);
+
+      std::string port_name = generate_sb_module_grid_port_name(side_manager.get_side(), 
+                                                                rr_graph.node_side(opin_node),
+                                                                rr_graph.node_pin_num(opin_node)); 
+
+      if (true == compact_routing_hierarchy) {
+        /* Note: use the GSB coordinate when querying unique modules! */
+        vtr::Point<size_t> sb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+        const RRGSB& unique_mirror = device_rr_gsb.get_sb_unique_module(sb_coord);
+        const RRNodeId& unique_mirror_opin_node = unique_mirror.get_opin_node(side_manager.get_side(), inode);
+
+        port_name = generate_sb_module_grid_port_name(side_manager.get_side(),
+                                                      rr_graph.node_side(unique_mirror_opin_node),
+                                                      rr_graph.node_pin_num(unique_mirror_opin_node)); 
+      }
+
+
+      /* Ensure we have this port in the module! */
+      ModulePortId module_port = module_manager.find_module_port(sb_module, port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(sb_module, module_port));
+
+      /* Check if this node is used by benchmark  */
+      if (false == is_rr_node_to_be_disable_for_analysis(routing_annotation, opin_node)) {
+        continue;
+      }
+
+      fp << "set_disable_timing ";
+      fp << sb_instance_name << "/";
+      fp << generate_sdc_port(module_manager.module_port(sb_module, module_port));
+      fp << std::endl;
+    }
+  }
+
+  /* Disable all the inputs of routing multiplexers that are not used by the benchmark.
+   * Starting from each input of the Switch Block, we traverse forward to the sink
+   * ports of the module net whose source is that input.
+   * For each sink port, we find the name of its parent instance and look up the
+   * net id of that instance in mux_instance_to_net_map.
+   * If that net id does not match the net id of this input, we disable the sink port!
+   *
+   *                   sb_module
+   *                 +-----------------------
+   *                 |           MUX instance A
+   *                 |          +-----------
+   *   input_port--->|--+---x-->| sink port (disable! net_id = Y) 
+   *   (net_id = X)  |  |       +----------
+   *                 |  |        MUX instance B
+   *                 |  |       +----------
+   *                 |  +------>| sink port (do not disable! net_id = X)
+   *
+   * The input ports of a SB module come from
+   * 1. grid output pins
+   * 2. routing tracks
+   * We walk through both kinds of ports and conditionally disable timing on their sinks
+   */ 
+
+  /* Iterate over input ports coming from grid output pins */
+  for (size_t side = 0; side < rr_gsb.get_num_sides(); ++side) {
+    SideManager side_manager(side);
+
+    for (size_t inode = 0; inode < rr_gsb.get_num_opin_nodes(side_manager.get_side()); ++inode) {
+      const RRNodeId& opin_node = rr_gsb.get_opin_node(side_manager.get_side(), inode);
+
+      std::string port_name = generate_sb_module_grid_port_name(side_manager.get_side(),
+                                                                rr_graph.node_side(opin_node),
+                                                                rr_graph.node_pin_num(opin_node)); 
+
+      if (true == compact_routing_hierarchy) {
+        /* Note: use the GSB coordinate when querying unique modules! */
+        vtr::Point<size_t> sb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+        const RRGSB& unique_mirror = device_rr_gsb.get_sb_unique_module(sb_coord);
+        const RRNodeId& unique_mirror_opin_node = unique_mirror.get_opin_node(side_manager.get_side(), inode);
+
+        port_name = generate_sb_module_grid_port_name(side_manager.get_side(),
+                                                      rr_graph.node_side(unique_mirror_opin_node),
+                                                      rr_graph.node_pin_num(unique_mirror_opin_node)); 
+      }
+
+
+      /* Ensure we have this port in the module! */
+      ModulePortId module_port = module_manager.find_module_port(sb_module, port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(sb_module, module_port));
+
+      AtomNetId mapped_atom_net = atom_ctx.lookup.atom_net(routing_annotation.rr_node_net(opin_node));
+
+      disable_analysis_module_input_port_net_sinks(fp, module_manager,
+                                                   sb_module,
+                                                   sb_instance_name,
+                                                   module_port,
+                                                   mapped_atom_net,
+                                                   mux_instance_to_net_map);
+    }
+  }
+
+  /* Iterate over input ports coming from routing tracks */
+  for (size_t side = 0; side < rr_gsb.get_num_sides(); ++side) {
+    SideManager side_manager(side);
+
+    for (size_t itrack = 0; itrack < rr_gsb.get_chan_width(side_manager.get_side()); ++itrack) {
+      /* Skip output ports; they have already been handled in the first loop above */
+      if (OUT_PORT == rr_gsb.get_chan_node_direction(side_manager.get_side(), itrack)) {
+        continue;
+      }
+
+      const RRNodeId& chan_node = rr_gsb.get_chan_node(side_manager.get_side(), itrack);
+
+      std::string port_name = generate_sb_module_track_port_name(rr_graph.node_type(chan_node),
+                                                                 side_manager.get_side(), itrack,  
+                                                                 rr_gsb.get_chan_node_direction(side_manager.get_side(), itrack));
+
+      if (true == compact_routing_hierarchy) {
+        /* Note: use the GSB coordinate when querying unique modules! */
+        vtr::Point<size_t> sb_coord(rr_gsb.get_x(), rr_gsb.get_y());
+        const RRGSB& unique_mirror = device_rr_gsb.get_sb_unique_module(sb_coord);
+        const RRNodeId& unique_mirror_chan_node = unique_mirror.get_chan_node(side_manager.get_side(), itrack);
+
+        port_name = generate_sb_module_track_port_name(rr_graph.node_type(unique_mirror_chan_node),
+                                                       side_manager.get_side(), itrack,  
+                                                       unique_mirror.get_chan_node_direction(side_manager.get_side(), itrack));
+      }
+
+
+      /* Ensure we have this port in the module! */
+      ModulePortId module_port = module_manager.find_module_port(sb_module, port_name);
+      VTR_ASSERT(true == module_manager.valid_module_port_id(sb_module, module_port));
+
+      AtomNetId mapped_atom_net = atom_ctx.lookup.atom_net(routing_annotation.rr_node_net(chan_node));
+
+      disable_analysis_module_input_port_net_sinks(fp, module_manager,
+                                                   sb_module,
+                                                   sb_instance_name,
+                                                   module_port,
+                                                   mapped_atom_net,
+                                                   mux_instance_to_net_map);
+    }
+  }
+}
+
+
+/********************************************************************
+ * Iterate over all the switch blocks in a device
+ * and disable unused ports for each of them 
+ *******************************************************************/
+void print_analysis_sdc_disable_unused_sbs(std::fstream& fp,
+                                           const AtomContext& atom_ctx, 
+                                           const ModuleManager& module_manager, 
+                                           const RRGraph& rr_graph, 
+                                           const VprRoutingAnnotation& routing_annotation, 
+                                           const DeviceRRGSB& device_rr_gsb,
+                                           const bool& compact_routing_hierarchy) {
+
+  /* Walk through every GSB coordinate in the device */
+  vtr::Point<size_t> sb_range = device_rr_gsb.get_gsb_range();
+
+  for (size_t ix = 0; ix < sb_range.x(); ++ix) {
+    for (size_t iy = 0; iy < sb_range.y(); ++iy) {
+      /* Check if the switch block exists in the device!
+       * Some of them do NOT exist due to heterogeneous blocks (height > 1) 
+       * We will skip those modules
+       */
+      const RRGSB& rr_gsb = device_rr_gsb.get_gsb(ix, iy);
+      if (false == rr_gsb.is_sb_exist()) {
+        continue;
+      }
+
+      print_analysis_sdc_disable_sb_unused_resources(fp,
+                                                     atom_ctx, 
+                                                     module_manager, 
+                                                     rr_graph, 
+                                                     routing_annotation, 
+                                                     device_rr_gsb, 
+                                                     rr_gsb, 
+                                                     compact_routing_hierarchy);
+    }
+  }
+}
+
+} /* end namespace openfpga */
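
Note on the sink-disabling rule described in the comments of analysis_sdc_routing_writer.cpp: the following is a minimal, standalone sketch of that rule, not the code in this commit. It assumes a simplified model in which each sink port is identified by its parent MUX instance name and a port name, and net ids are plain integers. `NetId`, `Sink` and `disable_mismatched_sinks` are hypothetical names introduced for illustration; the real `disable_analysis_module_input_port_net_sinks` works on `ModuleManager` nets and `AtomNetId`, and its handling of instances missing from the map and its exact output path format may differ.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for AtomNetId; the real code uses a strong id type.
using NetId = int;

// Hypothetical description of one sink of a module net.
struct Sink {
  std::string mux_instance;  // name of the instance that owns the sink port
  std::string port;          // name of the sink port inside that instance
};

// For every sink driven by an input port, emit a set_disable_timing command
// when the MUX instance owning the sink is mapped to a different net than
// the one driving the input port.
std::string disable_mismatched_sinks(
    const std::string& parent_instance,
    NetId input_net,
    const std::vector<Sink>& sinks,
    const std::map<std::string, NetId>& mux_instance_to_net_map) {
  assert(!parent_instance.empty());
  std::ostringstream out;
  for (const Sink& sink : sinks) {
    auto it = mux_instance_to_net_map.find(sink.mux_instance);
    // Assumption of this sketch: sinks whose instance is not in the map
    // are left untouched.
    if (it == mux_instance_to_net_map.end()) {
      continue;
    }
    // Same net on both ends: this path is used by the benchmark, keep it.
    if (it->second == input_net) {
      continue;
    }
    // Assumption: hierarchical path written as parent/instance/port.
    out << "set_disable_timing " << parent_instance << "/"
        << sink.mux_instance << "/" << sink.port << "\n";
  }
  return out.str();
}
```

In this model, an input carrying net X that fans out to MUX instance A (mapped to net Y) and MUX instance B (mapped to net X) produces a single command for the sink in instance A, matching the ASCII diagram in the source comments.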
--- a/openfpga/src/fpga_sdc/analysis_sdc_routing_writer.h
+++ b/openfpga/src/fpga_sdc/analysis_sdc_routing_writer.h
@ -0,0 +1,39 @@
+#ifndef ANALYSIS_SDC_ROUTING_WRITER_H
+#define ANALYSIS_SDC_ROUTING_WRITER_H
+
+/********************************************************************
+ * Include header files that are required by function declaration
+ *******************************************************************/
+#include <fstream>
+#include <vector>
+#include "vpr_context.h"
+#include "module_manager.h"
+#include "device_rr_gsb.h"
+#include "vpr_routing_annotation.h"
+
+/********************************************************************
+ * Function declaration
+ *******************************************************************/
+
+/* begin namespace openfpga */
+namespace openfpga {
+
+void print_analysis_sdc_disable_unused_cbs(std::fstream& fp,
+                                           const AtomContext& atom_ctx, 
+                                           const ModuleManager& module_manager, 
+                                           const RRGraph& rr_graph, 
+                                           const VprRoutingAnnotation& routing_annotation, 
+                                           const DeviceRRGSB& device_rr_gsb,
+                                           const bool& compact_routing_hierarchy);
+
+void print_analysis_sdc_disable_unused_sbs(std::fstream& fp,
+                                           const AtomContext& atom_ctx, 
+                                           const ModuleManager& module_manager, 
+                                           const RRGraph& rr_graph, 
+                                           const VprRoutingAnnotation& routing_annotation, 
+                                           const DeviceRRGSB& device_rr_gsb,
+                                           const bool& compact_routing_hierarchy);
+
+} /* end namespace openfpga */
+
+#endif