axi_interconnect

This document contains technical documentation for the axi_interconnect module.

Version

This file is part of axi_interconnect release 0.0.0 collected at 2022-01-07 12:02.

Releases from Truestream follow the semantic versioning scheme: MAJOR.MINOR.PATCH+HASH.

  • MAJOR will be incremented for incompatible API or functionality changes.

  • MINOR will be incremented when new functionality is added in a backwards compatible manner.

  • PATCH will be incremented for backwards compatible bug fixes.

The HASH field is the git sha that the release was made from. It is included in the version number for internal traceability.
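As an illustration of the version scheme above, the following is a minimal Python sketch that splits a version string into its fields. The names `ReleaseVersion`, `parse_release_version` and `is_backwards_compatible_upgrade` are hypothetical and not part of any Truestream tooling. The sketch assumes that HASH is a hexadecimal git SHA and that the `+HASH` suffix may be absent.

```python
import re
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    git_hash: str  # Empty string when the version has no +HASH suffix.


# Assumption: HASH is a hexadecimal git SHA of any length.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\+([0-9a-fA-F]+))?$")


def parse_release_version(text: str) -> ReleaseVersion:
    """Split a MAJOR.MINOR.PATCH+HASH string into its fields."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH+HASH version: {text!r}")
    major, minor, patch, git_hash = match.groups()
    return ReleaseVersion(int(major), int(minor), int(patch), git_hash or "")


def is_backwards_compatible_upgrade(old: ReleaseVersion, new: ReleaseVersion) -> bool:
    """True if 'new' has the same MAJOR as 'old' and is not older than 'old'."""
    return new.major == old.major and (new.minor, new.patch) >= (old.minor, old.patch)
```

The HASH field is deliberately ignored in the comparison, since it is meant for traceability and carries no ordering.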

Release notes

Changelog and release history for the axi_interconnect module. Changelogs from Truestream follow the Keep a Changelog format.

Requirements

This module has the following dependencies:

  • The open-source hdl_modules project version 1.0.0.

  • The Truestream module axi_data_width_converter version 2.0.0.

  • The Truestream module axi_crossbar version 3.0.0.

Library name

This module’s source files shall be compiled to a VHDL library symbolically named axi_interconnect.

Overview

An AXI interconnect is a glue logic box that instantiates a combination of

  1. AXI data width converter,

  2. AXI clock domain crossing,

  3. AXI data FIFO,

  4. AXI read/write throttling,

  5. AXI crossbar

for each port depending on its attributes.

The goal of this AXI interconnect implementation developed by Truestream is to deliver 100% throughput in all scenarios. This means never stalling the data bus by even a single cycle, even if the user application is not always well-behaved in an AXI sense.

Examples

Example with left-side processing

Below is an illustration of a few scenarios with processing on the left side of the crossbar. This is the typical use case, and in this scenario the interconnect guarantees 100% throughput to the right side.

digraph my_graph {
graph [ dpi = 300 splines=ortho];
rankdir="LR";

left_port_0 [ shape=none label="Left port 0:\n64 bit\n100 MHz\nwell-behaved" ];
axi_cdc_0 [ shape=box label="AXI CDC"];
axi_write_throttle_0 [ shape=box label="AXI write\nthrottle"];
left_port_0 -> axi_cdc_0 [ dir="none" weight=10 ];
axi_cdc_0 -> axi_write_throttle_0 [ dir="none" weight=10 ];

left_port_1 [ shape=none label="Left port 1:\n128 bit\n100 MHz\nwell-behaved" ];
axi_cdc_1 [ shape=box label="AXI CDC"];
axi_data_width_converter_1 [ shape=box label="AXI data\nwidth converter"];
left_port_1 -> axi_cdc_1 [ dir="none" ];
axi_cdc_1 -> axi_data_width_converter_1 [ dir="none" weight=10 ];

left_port_2 [ shape=none label="Left port 2:\n128 bit\n250 MHz\nill-behaved" ];
axi_data_width_converter_2 [ shape=box label="AXI data\nwidth converter"];
axi_cdc_2 [ shape=box label="AXI CDC"];
axi_write_throttle_2 [ shape=box label="AXI write\nthrottle"];
left_port_2 -> axi_data_width_converter_2 [ dir="none" ];
axi_data_width_converter_2 -> axi_cdc_2 [ dir="none" weight=10 ];
axi_cdc_2 -> axi_write_throttle_2 [ dir="none" weight=10 ];

left_port_3 [ shape=none label="" ];
dots_3 [ shape=none label=".\n.\n." ];
left_port_3 -> dots_3 [ style=invis weight=10 ];

{
  rank=same; left_port_0; left_port_1; left_port_2; left_port_3;
}

{
  rank=same; dots_3; axi_cdc_2;
}

axi_write_crossbar [ shape=box label="AXI write\ncrossbar" height=4.4];
axi_write_throttle_0 -> axi_write_crossbar [ dir="none" ];
axi_data_width_converter_1 -> axi_write_crossbar [ dir="none" ];
axi_write_throttle_2 -> axi_write_crossbar [ dir="none" ];
dots_3 -> axi_write_crossbar [ style="invis" ]

right_port [ shape=none label="Right port:\nwell-behaved\n200 MHz\n64 bit" ];
axi_write_crossbar -> right_port [ dir="none" ];
}

The examples above instantiate different processing blocks depending on the configuration of the left ports. The processing blocks achieve the goal of ensuring the AXI transactions are well-behaved before reaching the crossbar, so that no cycles are wasted on the right side.

The left port 0 has the same data width as the right port, but is in a different clock domain, so it needs a clock crossing. Since the clock rate is lower on the left side for this port, the bus also needs to be throttled. Without throttling, the clock crossing from 100 MHz to 200 MHz would yield a data word only every second cycle in the right domain.

For left port 1, the relation of clock rate and data width means the data rate is the same on the left side and the right side. This means that the transactions are still well-behaved when reaching the crossbar, given that they are well-behaved from the AXI master on the left side.

Port 2 on the left is configured to indicate that the AXI master is not well-behaved. In this case there is a need for a throttling block, despite the data rate being high enough to send with full throughput on the right side. Since an ill-behaved AXI master might start sending a burst but pause halfway in, the throttle block must buffer a full burst before sending it through to the crossbar.
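The reasoning for the three left ports can be sketched as a comparison of peak data rates. The Python snippet below is only an illustration of that reasoning; the function names are hypothetical and the actual decision logic inside the VHDL implementation may differ. It assumes that a port can transfer at most one data word per clock cycle.

```python
def data_rate_mbit_per_s(data_width_bits: int, clock_rate_mhz: float) -> float:
    """Peak data rate of a port, assuming one data word per clock cycle."""
    return data_width_bits * clock_rate_mhz


def needs_write_throttle(
    left_width_bits: int,
    left_clock_mhz: float,
    left_is_well_behaved: bool,
    right_width_bits: int,
    right_clock_mhz: float,
) -> bool:
    """
    Throttling is needed when the left port cannot keep the right side busy:
    either its peak data rate is too low, or the master might pause mid-burst.
    """
    left_rate = data_rate_mbit_per_s(left_width_bits, left_clock_mhz)
    right_rate = data_rate_mbit_per_s(right_width_bits, right_clock_mhz)
    return (not left_is_well_behaved) or left_rate < right_rate


# The three left ports of the example, against the 64 bit, 200 MHz right port.
port_0 = needs_write_throttle(64, 100.0, True, 64, 200.0)    # 6400 < 12800
port_1 = needs_write_throttle(128, 100.0, True, 64, 200.0)   # 12800 == 12800
port_2 = needs_write_throttle(128, 250.0, False, 64, 200.0)  # ill-behaved
```

With these inputs the sketch selects a throttle for ports 0 and 2 but not for port 1, which matches the illustration.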

Example with right-side processing

It is also possible to place processing blocks on the right side of the crossbar. This can be suitable in a few scenarios. For example:

digraph my_graph {
graph [ dpi = 300 splines=ortho];
rankdir="LR";

left_port_0 [ shape=none label="Left port 0:\n64 bit\n300 MHz\nwell-behaved" ];

left_port_1 [ shape=none label="Left port 1:\n128 bit\n300 MHz\nwell-behaved" ];
axi_data_width_converter_1 [ shape=box label="AXI data\nwidth converter"];
left_port_1 -> axi_data_width_converter_1 [ dir="none" weight=10 ];
{
  rank=same; left_port_0; left_port_1;
}


axi_write_crossbar [ shape=box label="AXI write\ncrossbar" height=2.5];
left_port_0 -> axi_write_crossbar [ dir="none" ];
axi_data_width_converter_1 -> axi_write_crossbar [ dir="none" ];

axi_cdc_right [ shape=box label="AXI CDC"];
right_port [ shape=none label="Right port:\nwell-behaved\n250 MHz\n64 bit" ];
axi_write_crossbar -> axi_cdc_right [ dir="none" ];
axi_cdc_right -> right_port [ dir="none" ];
}

In this case both of the left ports are in the same clock domain, and the right port is in a slower domain. This means it is more efficient to run the crossbar in the left clock domain, and have only one CDC on the right side.

Warning

Placing processing blocks after the crossbar is considered a niche use case, and the interconnect cannot guarantee 100% utilization in all scenarios.

Specifically an AXI data width converter can be placed after the crossbar, which will have a one clock cycle overhead per burst. In that case the throughput on the right will not be 100% even if the left ports push data at a high enough rate.

In other scenarios, such as the one illustrated above, it is however completely safe to place processing blocks after the crossbar.
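To get a feeling for the cost of a one clock cycle overhead per burst, the effect can be estimated with a short calculation. The Python sketch below is a simplified model and not a description of the actual converter. It assumes back-to-back bursts of equal length on the right side and exactly one idle cycle per burst. The function name is hypothetical.

```python
def right_side_utilization(burst_length_beats: int, overhead_cycles_per_burst: int = 1) -> float:
    """Fraction of right-side clock cycles that carry data, for back-to-back bursts."""
    if burst_length_beats < 1:
        raise ValueError("A burst must contain at least one beat")
    return burst_length_beats / (burst_length_beats + overhead_cycles_per_burst)


utilization_16 = right_side_utilization(16)    # 16 / 17
utilization_256 = right_side_utilization(256)  # 256 / 257
```

Under these assumptions the utilization is roughly 94.1% for 16-beat bursts and 99.6% for 256-beat bursts, so the penalty is most noticeable for short bursts.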

Configuration interface

The properties of the left and right ports are set via generics to axi_read_interconnect and axi_write_interconnect.

The following generics are available:

  • num_left_ports: The number of ports on the left side.

  • num_right_ports: The number of ports on the right side. Must be set to one.

  • max_burst_length_beats: The maximum AXI burst length to be used. Typically set to 16 or 256.

  • left_id_widths: The AXI ID width used for each port on the left side. A higher value will result in greater resource utilization.

    Note that the AXI ID width used on the right side will be the maximum of left_id_widths plus some spare bits used for response arbitration. See documentation of axi_crossbar.

  • left_addr_widths: The AXI address width used for each port on the left side. A higher value will result in greater resource utilization.

    Note that the AXI address width used on the right side will be the maximum of left_addr_widths.

  • left_is_well_behaved: Set to true if the AXI master to the left is guaranteed to be well-behaved. One value for each port.

    Setting a value of false might necessitate the insertion of AXI FIFOs and/or throttling blocks, which will increase the resource utilization.

  • left_data_widths: The AXI data width used for each port on the left side.

  • left_clock_is_the_same_as_crossbar_clock: Set to true if the left port AXI clock is the same as the crossbar_clock port. One value for each left port.

    A value of false will necessitate the insertion of AXI CDCs before the crossbar.

  • left_clock_rates_mhz: The clock rate in MHz for each port on the left. The clock and data rate configuration will in some cases determine the need for buffering and throttling.

  • left_address_fifo_depths: In cases where FIFO buffering is instantiated before the crossbar, this generic controls the depths of the address (AR/AW) FIFOs. One value per port.

    Note that the value zero can be set to generate a passthrough instead of a FIFO.

  • left_data_fifo_depths: In cases where FIFO buffering is instantiated before the crossbar, this generic controls the depths of the data (R/W) FIFOs. One value per port. To guarantee full throughput, a value of at least max_burst_length_beats must be used.

    Note that the value zero can be set to generate a passthrough instead of a FIFO.

  • crossbar_data_width: The AXI data width used by the crossbar.

    Note that a different value than left_data_widths for any given port will insert a data width converter for that port before the crossbar. A different value than right_data_widths will insert data width conversion after the crossbar.

  • crossbar_clock_rate_mhz: The clock rate in MHz for the crossbar_clock port.

    Note that if this value is different than left/right_clock_rates_mhz for any given port, then left/right_clock_is_the_same_as_crossbar_clock must be set to false for that port. Conversely if left/right_clock_is_the_same_as_crossbar_clock is true then left/right_clock_rates_mhz must be the same as crossbar_clock_rate_mhz for that port.

    Having the same clock rate value but left/right_clock_is_the_same_as_crossbar_clock set to false is valid in situations where two clocks have the same frequency but come from different clock sources. This situation necessitates the insertion of CDCs despite the clocks having the same frequency.

  • right_data_widths: The AXI data width used for each port on the right side.

    A value different than crossbar_data_width will necessitate insertion of data width conversion after the crossbar.

  • right_clock_is_the_same_as_crossbar_clock: Set to true if the right port AXI clock is the same as the crossbar_clock port. One value for each right port.

    A value of false will necessitate the insertion of AXI CDCs after the crossbar.

  • right_clock_rates_mhz: The clock rate in MHz for each port on the right. The clock and data rate configuration will in some cases determine the need for buffering and throttling.

  • right_address_fifo_depths: In cases where FIFO buffering is instantiated after the crossbar, this generic controls the depths of the address (AR/AW) FIFOs. One value per port.

    Note that the value zero can be set to generate a passthrough instead of a FIFO.

  • right_data_fifo_depths: In cases where FIFO buffering is instantiated after the crossbar, this generic controls the depths of the data (R/W) FIFOs. One value per port. To guarantee full throughput, a value of at least max_burst_length_beats must be used.

    Note that the value zero can be set to generate a passthrough instead of a FIFO.

Additionally axi_write_interconnect has the generics:

  • support_left_write_burst_without_stall: Set to true in order to ensure there is buffering on the left side that can receive a whole data burst without stalling. See more under Left write bursts without stall.

    Setting a value of true might necessitate the insertion of AXI FIFOs and/or switch the order of processing blocks, which will increase the resource utilization.

  • left_write_response_fifo_depths: In cases where FIFO buffering is instantiated before the crossbar, this generic controls the depths of the write response (B) FIFOs. One value per port.

    Note that the value zero can be set to generate a passthrough instead of a FIFO.

  • right_write_response_fifo_depths: In cases where FIFO buffering is instantiated after the crossbar, this generic controls the depths of the write response (B) FIFOs. One value per port.

    Note that the value zero can be set to generate a passthrough instead of a FIFO.
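Some of the constraints stated in the generic descriptions above can be checked before a build is launched. The following Python sketch is a hypothetical pre-build check and not part of the module. The function names are made up, and the generics are assumed to be available as a dictionary keyed by the generic names listed above. It only covers the clock rate consistency rule, the data FIFO depth recommendation and the num_right_ports restriction.

```python
from typing import Any, Dict, List


def check_side(generics: Dict[str, Any], side: str) -> List[str]:
    """Check the per-port generics of one side ('left' or 'right')."""
    findings = []
    num_ports = generics[f"num_{side}_ports"]
    crossbar_rate = generics["crossbar_clock_rate_mhz"]
    max_burst = generics["max_burst_length_beats"]

    is_same = generics[f"{side}_clock_is_the_same_as_crossbar_clock"]
    rates = generics[f"{side}_clock_rates_mhz"]
    depths = generics[f"{side}_data_fifo_depths"]

    # Each vector generic must hold one value per port.
    for name, values in (
        ("clock_is_the_same_as_crossbar_clock", is_same),
        ("clock_rates_mhz", rates),
        ("data_fifo_depths", depths),
    ):
        if len(values) != num_ports:
            findings.append(f"{side}_{name}: expected {num_ports} values, got {len(values)}")
    if findings:
        return findings

    for port in range(num_ports):
        # Same clock implies same rate. The reverse is not required, since two
        # clocks of equal frequency may come from different sources.
        if is_same[port] and rates[port] != crossbar_rate:
            findings.append(
                f"{side} port {port}: marked as crossbar clock, "
                f"but {rates[port]} MHz != {crossbar_rate} MHz"
            )
        # Zero means passthrough and is not checked here.
        if 0 < depths[port] < max_burst:
            findings.append(
                f"{side} port {port}: data FIFO depth {depths[port]} is below "
                f"max_burst_length_beats ({max_burst}), full throughput is not guaranteed"
            )
    return findings


def check_generics(generics: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable findings, empty if nothing was found."""
    findings = []
    if generics["num_right_ports"] != 1:
        findings.append("num_right_ports must be set to one")
    for side in ("left", "right"):
        findings.extend(check_side(generics, side))
    return findings
```

An empty list from `check_generics` means that none of the checked rules are broken. It does not prove that the configuration as a whole is valid.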

Left write bursts without stall

There is a configuration option to support left-side write bursting without stall for each port. If any AXI master on the left is written in such a way that it is important for it to send bursts of data without stall, then this option must be considered. If on the other hand it is not important for the AXI master, or burst data is buffered in a FIFO already, then this option can be set to false to save resources.

The use case for this is somewhat convoluted but consider the following processing chain:

digraph my_graph {
graph [ dpi = 300 splines=ortho];
rankdir="LR";

left_port [ shape=none label="64 bit\n300 MHz\nill-behaved" ];
axi_data_width_converter [ shape=box label="AXI data\nwidth converter"];
axi_cdc [ shape=box label="AXI CDC"];
axi_write_throttle [ shape=box label="AXI write\nthrottle"];
right_port [ shape=none label="32 bit\n250 MHz\nwell-behaved" ];

left_port -> axi_data_width_converter [ dir="none" ];
axi_data_width_converter -> axi_cdc [ dir="none" ];
axi_cdc -> axi_write_throttle [ dir="none" ];
axi_write_throttle -> right_port [ dir="none" ];
}

In this case there is full throughput on the right side, but the left side can only do a data transaction every second cycle due to the width conversion from 64 to 32. If we were to set the support_left_write_burst_without_stall generic to true for this port, the processing blocks would be re-ordered to this configuration:

digraph my_graph {
graph [ dpi = 300 splines=ortho];
rankdir="LR";

left_port [ shape=none label="64 bit\n300 MHz\nill-behaved" ];
axi_cdc [ shape=box label="AXI CDC"];
axi_write_throttle [ shape=box label="AXI write\nthrottle"];
axi_data_width_converter [ shape=box label="AXI data\nwidth converter"];
right_port [ shape=none label="32 bit\n250 MHz\nwell-behaved" ];

left_port -> axi_cdc [ dir="none" ];
axi_cdc -> axi_write_throttle [ dir="none" ];
axi_write_throttle -> axi_data_width_converter [ dir="none" ];
axi_data_width_converter -> right_port [ dir="none" ];
}

With the AXI CDC being placed first in the chain, the AXI master to the left can burst to the CDC data FIFO without stalling. The downside of this configuration is that the AXI CDC will have slightly higher resource utilization when it is placed on a wider bus.

AXI well-behavedness

The concept of an AXI master/slave being well-behaved is recurring in the discussion of this module. For an AXI participant to be considered well-behaved it has to fulfill some requirements. First, the AXI standard (AMBA AXI and ACE Protocol Specification, ARM IHI 0022E (ID022613)) must be followed. Secondly, the general rules of handshaking data interfaces must be followed. Apart from that, there are some specific requirements depending on what actor we are discussing, listed below.

AXI read master

An AXI read master must adhere to the following requirements to be considered well-behaved.

  1. The RREADY signal must be asserted in the same cycle as, or before, the corresponding ARVALID.

    I.e. a master shall not negotiate a burst without being able to receive the data.

  2. Once RREADY has been asserted, it must remain high until the RLAST transaction has occurred.

    I.e. there must never be holes in the data stream. This will unnecessarily stall the AXI slave and decrease bus utilization.

AXI read slave

An AXI read slave must adhere to the following requirements to be considered well-behaved.

  1. When ARREADY has been asserted the slave should be ready to assert RVALID as soon as possible.

    The AXI master might have a state machine that waits to receive data before it can continue processing.

  2. Once RVALID has been asserted, it must remain high until the RLAST transaction has occurred.

    I.e. there must never be holes in the data stream. This will unnecessarily stall the AXI master and decrease bus utilization.

AXI write master

An AXI write master must adhere to the following requirements to be considered well-behaved.

  1. The WVALID signal must be asserted in the same cycle as, or before, the corresponding AWVALID.

    I.e. a master shall not send its address transaction and then wait a long while before starting to send data. Sending the data before the address is acceptable though.

  2. If WVALID is asserted before its corresponding AWVALID, no more than one burst of data shall be sent before the address transaction is sent.

    I.e. a master shall not fill up the data buffers while the address transaction necessary to interpret the data arrives much later.

  3. Once WVALID has been asserted, it must remain high until the WLAST transaction has occurred.

    I.e. there must never be holes in the data stream. This will unnecessarily stall the AXI slave and decrease bus utilization.

  4. The BREADY signal must be constantly asserted.

    I.e. the master must always be able to receive the write response. Stalling the write response channel can fill the response queue of the AXI slave, which in turn stalls AW and W transactions.
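Requirements 3 and 4 above lend themselves to an automated check on a recorded signal trace, for example from a simulation. The Python sketch below is a hypothetical checker and not part of the module. It assumes the trace is a list with one dictionary per clock cycle, holding the sampled boolean values of WVALID, WREADY, WLAST and BREADY under lowercase keys.

```python
from typing import Dict, List, Sequence


def check_write_master_trace(trace: Sequence[Dict[str, bool]]) -> List[str]:
    """
    Report cycles where a write master breaks requirement 3 (no holes in the
    W data stream) or requirement 4 (BREADY constantly asserted).
    """
    violations = []
    burst_in_progress = False

    for cycle, signals in enumerate(trace):
        if not signals["bready"]:
            violations.append(f"cycle {cycle}: BREADY is not asserted (requirement 4)")

        if signals["wvalid"]:
            # The burst is ongoing until the WLAST beat has been accepted.
            burst_in_progress = not (signals["wready"] and signals["wlast"])
        elif burst_in_progress:
            violations.append(f"cycle {cycle}: WVALID dropped mid-burst (requirement 3)")

    return violations


def beat(wvalid: bool, wlast: bool = False, wready: bool = True, bready: bool = True) -> Dict[str, bool]:
    """Helper for building trace entries."""
    return {"wvalid": wvalid, "wready": wready, "wlast": wlast, "bready": bready}


clean_burst = [beat(True), beat(True), beat(True), beat(True, wlast=True), beat(False)]
burst_with_hole = [beat(True), beat(True), beat(False), beat(True, wlast=True)]
```

Here `clean_burst` passes the check, while `burst_with_hole` is reported with a violation at cycle 2. Requirements 1 and 2 relate W to AW and would need the address channel signals as well, which this sketch leaves out.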

AXI write slave

An AXI write slave must adhere to the following requirements to be considered well-behaved.

  1. Once AWREADY is asserted the slave must within a few clock cycles assert WREADY.

    Having WREADY asserted before AWREADY is also acceptable.

    Accepting address transactions but not being able to accept data can stall an AXI master.

  2. Once WREADY has been asserted, it must remain high until the WLAST transaction has occurred.

    I.e. there shall never be holes in the data stream. This will unnecessarily stall the AXI master and decrease bus utilization.

  3. The write response shall be sent as soon as possible.

    The AXI master might have a state machine that waits for a write response before it can continue processing.

Handshaking rules

The AXI interfaces used by this module feature handshaking via ready/valid.

Using AXI4-Stream-like handshake interfaces (ready and valid to qualify data transactions) is very common in FPGA designs. It enables a backpressure situation where the slave, i.e. the receiver of data, can indicate when it is ready to receive the data.

Below are some rules governing how these handshake signals interact. They are adapted from the AMBA 4 AXI4-Stream Protocol Specification, ARM IHI 0051A (ID030610).

  1. A transaction occurs on the positive edge of the clock when both ready and valid are high. The graph below shows some typical transactions.

    _images/wavedrom-3d15423e-798f-461a-9717-97c9eecacbfc.svg

  2. The ready signal may fall without a transaction having occurred:

    _images/wavedrom-cd663789-ab82-4eca-9337-ce1cad7f4bf2.svg

  3. The valid signal may NOT fall without a transaction having occurred:

    _images/wavedrom-90509609-c66e-4e1b-abfe-5e14a86ab7f8.svg

  4. Once valid is asserted, the associated data may NOT be changed unless a transaction has occurred.

    _images/wavedrom-b5937e55-3c79-4148-ae5c-a80f48be6a1f.svg

    This applies to any auxiliary signals associated with the bus as well, e.g. a last indicator.

    Note also that this restriction on data not changing only applies when valid is asserted. When it is not, the data may be changed freely.

  5. In order to avoid deadlock situations, the master may NOT wait for the slave to assert ready before asserting valid. The slave however may wait for valid before asserting ready.
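The rules about valid and data stability can be expressed as a small trace checker, which may be useful as a mental model or as a starting point for a testbench monitor. The Python sketch below is hypothetical and not part of the module. It assumes each trace entry is a (valid, ready, data) tuple sampled at one positive clock edge.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[bool, bool, int]  # (valid, ready, data) at one positive clock edge


def count_transactions(trace: Sequence[Sample]) -> int:
    """A transaction occurs in every cycle where both valid and ready are high."""
    return sum(1 for valid, ready, _ in trace if valid and ready)


def check_handshake_trace(trace: Sequence[Sample]) -> List[str]:
    """Report cycles where valid falls, or data changes, before a transaction."""
    violations = []
    waiting = False  # True if the previous cycle had valid high but no transaction
    held_data = 0

    for cycle, (valid, ready, data) in enumerate(trace):
        if waiting:
            if not valid:
                violations.append(f"cycle {cycle}: valid fell without a transaction")
            elif data != held_data:
                violations.append(f"cycle {cycle}: data changed while valid was pending")

        # Ready is allowed to rise and fall freely, so it is only used to
        # determine whether the offered data was accepted in this cycle.
        waiting = valid and not ready
        held_data = data

    return violations
```

For example, the trace `[(False, True, 0), (True, False, 5), (True, True, 5)]` is legal and contains one transaction: ready falls without a transaction, which is allowed, and the data is held until it is accepted. The deadlock rule cannot be checked from a single trace, since it concerns what the master would have done had ready stayed low.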

Design details

Below follows a description of the different sub-modules.

axi_interconnect_pkg.vhd

Package with types and utility functions for this module.

axi_read_interconnect.vhd

component axi_read_interconnect is
  generic (
    num_left_ports : positive;
    num_right_ports : positive;
    --
    max_burst_length_beats : positive;
    --
    left_id_widths : natural_vec_t;
    left_addr_widths : positive_vec_t;
    left_is_well_behaved : boolean_vec_t;
    --
    left_data_widths : positive_vec_t;
    left_clock_is_the_same_as_crossbar_clock : boolean_vec_t;
    left_clock_rates_mhz : real_vector;
    --
    left_address_fifo_depths : natural_vec_t;
    left_data_fifo_depths : natural_vec_t;
    --
    crossbar_data_width : positive;
    crossbar_clock_rate_mhz : real;
    --
    right_data_widths : positive_vec_t;
    right_clock_is_the_same_as_crossbar_clock : boolean_vec_t;
    right_clock_rates_mhz : real_vector;
    --
    right_address_fifo_depths : natural_vec_t;
    right_data_fifo_depths : natural_vec_t
  );
  port (
    crossbar_clock : in std_logic;
    --# {{}}
    left_clocks : in std_logic_vector;
    left_ports_m2s : in axi_read_m2s_vec_t;
    left_ports_s2m : out axi_read_s2m_vec_t;
    --# {{}}
    right_clocks : in std_logic_vector;
    right_ports_m2s : out axi_read_m2s_vec_t;
    right_ports_s2m : in axi_read_s2m_vec_t
  );
end component;

Top level for AXI read interconnect that instantiates processing before and after the crossbar based on the generic configuration set by the user.

axi_write_interconnect.vhd

component axi_write_interconnect is
  generic (
    num_left_ports : positive;
    num_right_ports : positive;
    --
    max_burst_length_beats : positive;
    --
    left_id_widths : natural_vec_t;
    left_addr_widths : positive_vec_t;
    left_is_well_behaved : boolean_vec_t;
    support_left_write_burst_without_stall : boolean_vec_t;
    --
    left_data_widths : positive_vec_t;
    left_clock_is_the_same_as_crossbar_clock : boolean_vec_t;
    left_clock_rates_mhz : real_vector;
    --
    left_address_fifo_depths : natural_vec_t;
    left_data_fifo_depths : natural_vec_t;
    left_write_response_fifo_depths : natural_vec_t;
    --
    crossbar_data_width : positive;
    crossbar_clock_rate_mhz : real;
    --
    right_data_widths : positive_vec_t;
    right_clock_is_the_same_as_crossbar_clock : boolean_vec_t;
    right_clock_rates_mhz : real_vector;
    --
    right_address_fifo_depths : natural_vec_t;
    right_data_fifo_depths : natural_vec_t;
    right_write_response_fifo_depths : natural_vec_t
  );
  port (
    crossbar_clock : in std_logic;
    --# {{}}
    left_clocks : in std_logic_vector;
    left_ports_m2s : in axi_write_m2s_vec_t;
    left_ports_s2m : out axi_write_s2m_vec_t;
    --# {{}}
    right_clocks : in std_logic_vector;
    right_ports_m2s : out axi_write_m2s_vec_t;
    right_ports_s2m : in axi_write_s2m_vec_t
  );
end component;

Top level for AXI write interconnect that instantiates processing before and after the crossbar based on the generic configuration set by the user.

read_interconnect_processing.vhd

component read_interconnect_processing is
  generic (
    max_burst_length_beats : positive;
    id_width_bits : natural;
    addr_width_bits : positive;
    parameters : interconnect_processing_parameters_t;
    processing_configuration : interconnect_processing_t;
    address_fifo_depth : positive;
    data_fifo_depth : positive
  );
  port (
    left_clk : in std_logic;
    left_port_m2s : in axi_read_m2s_t;
    left_port_s2m : out axi_read_s2m_t;
    --# {{}}
    right_clk : in std_logic;
    right_port_m2s : out axi_read_m2s_t;
    right_port_s2m : in axi_read_s2m_t
  );
end component;

Utility box that instantiates a chain of processing boxes based on a configuration vector.

write_interconnect_processing.vhd

component write_interconnect_processing is
  generic (
    max_burst_length_beats : positive;
    id_width_bits : natural;
    addr_width_bits : positive;
    parameters : interconnect_processing_parameters_t;
    processing_configuration : interconnect_processing_t;
    address_fifo_depth : positive;
    data_fifo_depth : positive;
    write_response_fifo_depth : positive
  );
  port (
    left_clk : in std_logic;
    left_port_m2s : in axi_write_m2s_t;
    left_port_s2m : out axi_write_s2m_t;
    --# {{}}
    right_clk : in std_logic;
    right_port_m2s : out axi_write_m2s_t;
    right_port_s2m : in axi_write_s2m_t
  );
end component;

Utility box that instantiates a chain of processing boxes based on a configuration vector.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_axi_interconnect.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for write_interconnect_processing.vhd netlist builds.

  Generics                                          Total LUTs   FFs   RAMB36   RAMB18
  support_left_write_burst_without_stall = False           387   599        1        1
  support_left_write_burst_without_stall = True            399   600        2        1

Both builds use the wrapper write_interconnect_processing_netlist_wrapper.vhd.