axi_crossbar

This document contains technical documentation for the axi_crossbar module.

Version

This file is part of axi_crossbar release 0.0.0 collected at 2022-01-07 12:02.

Releases from Truestream follow the semantic versioning scheme: MAJOR.MINOR.PATCH+HASH.

  • MAJOR will be incremented for incompatible API or functionality changes.

  • MINOR will be incremented when new functionality is added in a backwards compatible manner.

  • PATCH will be incremented for backwards compatible bug fixes.

The HASH field is the git sha that the release was made from. It is included in the version number for internal traceability.

Release notes

Changelog and release history for the axi_crossbar module. Changelogs from Truestream follow the keep a changelog format.

3.0.0+a4f1024a - (24 March 2021)

Breaking changes

  • Adapt for latest tsfpga.

2.0.0+c995aaf0 - (21 August 2020)

Breaking changes

  • Adapt for tsfpga version 4.0.0

Requirements

This module has the following dependencies:

Library name

This module’s source files shall be compiled to a VHDL library symbolically named axi_crossbar.

Overview

This module provides small footprint blocks for AXI4 and AXI3 N-to-1 read and write arbitration.

[Figure: two AXI masters connect to the left ports of the AXI crossbar; the right port of the crossbar connects to DDR.]

The user would typically instantiate axi_read_crossbar.vhd or axi_write_crossbar.vhd. These entities have the following generics:

  • num_left_ports: The number of AXI masters on the left side.

  • data_width: The width of RDATA and WDATA for all the AXI ports.

  • left_id_widths: The width of ARID, RID, AWID and BID for the ports on the left side. One value for each port. Note that more bits are needed on the right side, see Limitations. Lowering these values will decrease resource utilization.

  • address_fifo_depths: Depth of optional FIFO for the AR and AW channels. One value for each port. Set to 0 to disable.

  • data_fifo_depths: Depth of optional FIFO for the R and W channels. One value for each port. Set to 0 to disable.

  • port_is_asynchronous: Indicates that a left port is asynchronous to the right port. If FIFOs are enabled, the inferred FIFOs will be asynchronous in order to synchronize the bus. One value for each port. Default is False, indicating that the port is synchronous.

  • outstanding_transaction_limits: Set a non-zero value to enable Outstanding transaction limits. Default is 0, meaning that the mechanism is disabled. One value for each port.
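As an illustration of how these generics fit together, below is a minimal instantiation sketch of axi_read_crossbar with two left ports, where only port 1 is asynchronous and buffered. It is an assumption that the AXI record types come from a library named axi (package axi_pkg) and that natural_vec_t comes from a library named common (package types_pkg); the entity name, the clock name clk_slow_master, the 0-based index ranges and all numeric values are hypothetical and should be adapted to the actual design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumption: library and package names below follow a tsfpga-style module tree.
library axi;
use axi.axi_pkg.all;

library common;
use common.types_pkg.all;

library axi_crossbar;

entity read_crossbar_example is
  port (
    clk : in std_logic;
    -- Hypothetical clock of the left port that is asynchronous to clk
    clk_slow_master : in std_logic;
    left_ports_m2s : in axi_read_m2s_vec_t(0 to 1);
    left_ports_s2m : out axi_read_s2m_vec_t(0 to 1);
    right_port_m2s : out axi_read_m2s_t;
    right_port_s2m : in axi_read_s2m_t
  );
end entity;

architecture a of read_crossbar_example is
  signal clocks_asynchronous_left : std_logic_vector(0 to 1) := (others => '0');
begin

  -- Only port 1 is asynchronous, so only its clock needs to be assigned
  clocks_asynchronous_left(1) <= clk_slow_master;

  axi_read_crossbar_inst : entity axi_crossbar.axi_read_crossbar
    generic map (
      data_width => 64,
      addr_width => 32,
      num_left_ports => 2,
      -- One value per left port in all the vectors below
      left_id_widths => (0 => 4, 1 => 2),
      address_fifo_depths => (0 => 0, 1 => 16),
      data_fifo_depths => (0 => 0, 1 => 512),
      outstanding_transaction_limits => (0 => 0, 1 => 4),
      port_is_asynchronous => (0 => false, 1 => true)
    )
    port map (
      clk => clk,
      clocks_asynchronous_left => clocks_asynchronous_left,
      left_ports_m2s => left_ports_m2s,
      left_ports_s2m => left_ports_s2m,
      right_port_m2s => right_port_m2s,
      right_port_s2m => right_port_s2m
    );

end architecture;
```

In this sketch port 0 is passed straight through without FIFOs or a transaction limit, while port 1 gets asynchronous FIFOs because port_is_asynchronous is true and its FIFO depths are non-zero.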

Specification

  1. The module provides N-to-1 arbitration for AXI read and write transactions.

  2. Apart from the limitations below, it is fully compliant with the AMBA AXI4 Specification.

  3. Achieves 100% utilization of R, W and B channels. There are no right-side clock cycles wasted when arbitrating.

  4. For AR and AW a transaction can be performed on the right side every second cycle.

Limitations

  1. These AXI4 signals are not included in the interfaces, and are assumed to be constant:

    • Lock type: AxLOCK

    • Memory type: AxCACHE

    • Protection type: AxPROT

    • Quality of service: AxQOS

    • Region identifier: AxREGION

    • User-defined signaling: AxUSER and xUSER

    • Write ID (only used by AXI3): WID

  2. The AXI standard demands that there be no combinatorial paths between input and output handshake signals (ready and valid). This rule is not honored in this module, since doing so would increase logic footprint and is not necessary to reach timing.

  3. The module uses extra ID bits on the right side in order to index read/write responses. The extra ID bits will be appended above the regular ID bits from the left ports. This achieves higher performance and lower resource utilization, but means that more bits are needed on the right AXI port.

\[\text{right ID width} = \text{max(left_id_widths)} + \left\lceil \log_2 ( \text{num_left_ports} ) \right\rceil\]

  4. Always set AxCACHE to be modifiable, since the IDs will be manipulated.

  5. Since the IDs will be manipulated, it is recommended that behavior of the slave on the right side not be dependent on ID. If it is dependent on ID, it must only consider the bits that were not appended by the axi_crossbar. For RID and BID the crossbar will return to the left ports only the ID range indicated by left_id_widths, i.e. no extra bits. So it is safe for the masters on the left to have behavior dependent on ID.

  6. The module does not have any reset functionality. The design targets modern SRAM-based FPGAs, where initial values can be used and there is no need for reset.
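
As a worked example of the right-side ID width formula in limitation 3, assume a hypothetical configuration with three left ports whose left_id_widths are 4, 2 and 0:

\[\text{right ID width} = \max(4, 2, 0) + \left\lceil \log_2 ( 3 ) \right\rceil = 4 + 2 = 6\]

The slave on the right would in this case need to support at least six ID bits, of which the upper two are the ones added by the crossbar.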

Outstanding transaction limits

The outstanding_transaction_limits generic of axi_read_crossbar.vhd and axi_write_crossbar.vhd supports setting a limit on the number of outstanding address transactions. This is to avoid deadlock situations where a master on the left performs many address transactions but is not ready to send/receive all the data continuously. In this case the other masters on the left, which may very well be ready to send/receive data, would be starved since the right-side slave would be stalled waiting for data transactions.

By default the generic for the limit is zero, indicating the mechanism is not used. Enabling it will increase resource utilization, so do it only if the mechanism is needed.
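
To make the idea concrete, below is a minimal sketch of how such a limit could be tracked for one port. It is not the module's actual limiter: the entity name, the port names and the choice of "last data beat transferred" as the completion event are all assumptions made for illustration (for writes, a real design might instead count the B response).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity outstanding_limit_sketch is
  generic (
    -- Hypothetical per-port limit, i.e. one element of outstanding_transaction_limits
    limit : positive
  );
  port (
    clk : in std_logic;
    -- '1' during a cycle where an address transaction occurs (valid and ready)
    address_transaction : in std_logic;
    -- '1' during a cycle where the last data beat of a burst is transferred
    burst_done : in std_logic;
    -- '1' when the master may start another address transaction
    allow_address : out std_logic
  );
end entity;

architecture a of outstanding_limit_sketch is
  -- No reset: relies on the initial value, in line with the module's no-reset design
  signal num_outstanding : natural range 0 to limit := 0;
begin

  allow_address <= '1' when num_outstanding < limit else '0';

  count : process
  begin
    wait until rising_edge(clk);

    -- A simultaneous start and finish leaves the count unchanged.
    -- Assumes the address handshake is gated with allow_address upstream,
    -- so the count never exceeds the limit.
    if address_transaction = '1' and burst_done = '0' then
      num_outstanding <= num_outstanding + 1;
    elsif address_transaction = '0' and burst_done = '1' then
      num_outstanding <= num_outstanding - 1;
    end if;
  end process;

end architecture;
```

The counter and comparator are the kind of extra logic that explains why enabling the mechanism costs resources.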

Performance considerations

For optimal performance an AXI master should be able to send or receive data with full throughput after an address transaction has been made. This is because any stalling on the right side will affect all left ports.

In most cases, the master itself should take care of this, but below are some tips on how the crossbar can be used to improve performance if the masters are not well behaved.

  1. For read transactions, the crossbar can be configured in a way that guarantees that data can always be accepted. This is done by enabling data FIFOs (data_fifo_depths) and setting outstanding_transaction_limits such that the FIFOs are larger than the data of the maximum number of outstanding transactions.

  2. For write transactions, packet_mode can be used to ensure that a write transaction is not started before data for one packet is available. It is still possible to start multiple write transactions with only one packet in the FIFO, but the risk of stalling is reduced.

  3. outstanding_transaction_limits can also be used to limit how much an ill-behaved master affects other masters, or to balance loads if masters have different burst lengths or burst frequency.
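
The sizing rule in item 1 can be written as a rough inequality for left port i. The "max burst length" term is an assumption introduced for illustration; it denotes the largest number of data beats per read burst that the master on that port issues. Any extra margin that a particular FIFO implementation might need is not accounted for.

\[\text{data_fifo_depths}(i) \geq \text{outstanding_transaction_limits}(i) \times \text{max burst length}(i)\]

For example, a master that issues bursts of at most 16 beats and is limited to 4 outstanding transactions would need a data FIFO depth of at least 4 × 16 = 64 words.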

Design details

Below follows a description of the different sub-modules.

address_arbiter.vhd

component address_arbiter is
  generic (
    num_channels : positive
  );
  port(
    clk : in std_logic;
    --# {{}}
    address_valid : in std_logic_vector;
    data_available : in std_logic_vector;
    --# {{}}
    address_select_ready : in std_logic;
    address_select : out integer range 0 to not_selected_idx
  );
end component;

Selects arbitration channel according to a simple Round Robin scheme. Will select a channel if it has address valid and data available. If we are working with writes, data_available should be asserted when there is sufficient write data available to burst. For reads, data_available should be asserted when we are ready to receive a read data burst.

The address_select output will have a value other than not_selected_idx for one clock cycle when address_select_ready is '1'. This should ensure that an address transaction occurs.
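
The selection rule described above can be sketched as a pure function. This is a behavioral illustration only, not the code in address_arbiter.vhd: the package and function names are hypothetical, and returning num_channels to mean "no channel selected" merely stands in for not_selected_idx, whose actual value is not given in this document.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package round_robin_sketch_pkg is

  -- Returns the next channel to select, searching upwards from the channel after
  -- last_selected and wrapping around. Returns the number of channels if no
  -- channel has both address valid and data available.
  function get_next_select(
    last_selected : natural;
    address_valid : std_logic_vector;
    data_available : std_logic_vector
  ) return natural;

end package;

package body round_robin_sketch_pkg is

  function get_next_select(
    last_selected : natural;
    address_valid : std_logic_vector;
    data_available : std_logic_vector
  ) return natural is
    constant num_channels : positive := address_valid'length;
    -- Normalize the index ranges so both vectors are indexed 0 to num_channels - 1
    alias valid : std_logic_vector(0 to num_channels - 1) is address_valid;
    alias available : std_logic_vector(0 to num_channels - 1) is data_available;
    variable candidate : natural range 0 to num_channels - 1;
  begin
    assert data_available'length = num_channels
      report "address_valid and data_available must have the same length"
      severity failure;

    for offset in 1 to num_channels loop
      -- The previously selected channel is tried last, giving it lowest priority
      candidate := (last_selected + offset) mod num_channels;

      if valid(candidate) = '1' and available(candidate) = '1' then
        return candidate;
      end if;
    end loop;

    -- Nothing to select
    return num_channels;
  end function;

end package body;
```

With three channels, last_selected = 0 and both inputs asserted on channels 0 and 2 only, the function returns 2: channel 1 is skipped because it has nothing to send, and channel 0 is considered last.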

axi_crossbar_pkg.vhd

Package with utility functions used in the rest of the module.

axi_read_crossbar.vhd

component axi_read_crossbar is
  generic (
    data_width : positive;
    addr_width : positive;
    num_left_ports : positive;
    left_id_widths : natural_vec_t;
    address_fifo_depths : natural_vec_t;
    data_fifo_depths : natural_vec_t;
    outstanding_transaction_limits : natural_vec_t;
    port_is_asynchronous : boolean_vector
  );
  port (
    clk : in std_logic;
    -- Only need to assign if port_is_asynchronous is true for that port
    clocks_asynchronous_left : in std_logic_vector;
    --# {{}}
    left_ports_m2s : in axi_read_m2s_vec_t;
    left_ports_s2m : out axi_read_s2m_vec_t;
    --# {{}}
    right_port_m2s : out axi_read_m2s_t;
    right_port_s2m : in axi_read_s2m_t
  );
end component;

Top level that performs N-to-1 arbitration of AXI read transactions. AXI masters on the left and AXI slave on the right are assumed to be synchronous to clk unless indicated by port_is_asynchronous.

This top level can be seen as a wrapper that instantiates axi_read_crossbar_core. On top of the core it adds (asynchronous) FIFOs for flow control, and a limiter for number of outstanding transactions.

If any value in address_fifo_depths or data_fifo_depths is non-zero, a FIFO will be inferred for that channel. If a left port is in a different clock domain than the slave on the right, port_is_asynchronous shall be asserted for that port. When this happens the inferred FIFO will be an asynchronous FIFO in order to synchronize all ports to the same clock. In this case the clocks_asynchronous_left signal must be assigned for that port. If the port is not asynchronous to the right side, clocks_asynchronous_left can be left unassigned.

When using a read data FIFO, the address arbitration takes the FIFO level into account so that a whole burst can be read continuously. If the FIFO is not sufficiently empty, the arbiter will not select that left port.

It should be noted that if your AXI master on the left side already has FIFOs on the address and data channels, there is very little gain in adding FIFOs to the crossbar as well. In that case it is more resource efficient to increase the depths of the existing FIFOs.

axi_read_crossbar_core.vhd

component axi_read_crossbar_core is
  generic (
    num_left_ports : positive;
    left_id_widths : natural_vec_t
  );
  port (
    clk : in std_logic;
    --# {{}}
    arbiter_data_available : in std_logic_vector;
    --# {{}}
    left_ports_m2s : in axi_read_m2s_vec_t;
    left_ports_s2m : out axi_read_s2m_vec_t;
    --# {{}}
    right_port_m2s : out axi_read_m2s_t;
    right_port_s2m : in axi_read_s2m_t
  );
end component;

Core functionality for N-to-1 AXI read arbitration.

The arbitration will “lock on” to a left port when ARVALID and arbiter_data_available are asserted for that port. This is to avoid cases where an address transaction is sent through but there is not enough space in R buffering to receive the data.

The arbiter_data_available input can for example be assigned to RREADY if that is the desired behavior, or '1' if there is no reliable way to qualify data availability.

axi_write_crossbar.vhd

component axi_write_crossbar is
  generic (
    data_width : positive;
    addr_width : positive;
    num_left_ports : positive;
    left_id_widths : natural_vec_t;
    address_fifo_depths : natural_vec_t;
    data_fifo_depths : natural_vec_t;
    response_fifo_depths : natural_vec_t;
    outstanding_transaction_limits : natural_vec_t;
    port_is_asynchronous : boolean_vector
  );
  port (
    clk : in std_logic;
    -- Only need to assign if port_is_asynchronous is true for that port
    clocks_asynchronous_left : in std_logic_vector;
    --# {{}}
    left_ports_m2s : in axi_write_m2s_vec_t;
    left_ports_s2m : out axi_write_s2m_vec_t;
    --# {{}}
    right_port_m2s : out axi_write_m2s_t;
    right_port_s2m : in axi_write_s2m_t
  );
end component;

Top level that performs N-to-1 arbitration of AXI write transactions. AXI masters on the left and AXI slave on the right are assumed to be synchronous to clk unless indicated by port_is_asynchronous.

This top level can be seen as a wrapper that instantiates axi_write_crossbar_core. On top of the core it adds (asynchronous) FIFOs for flow control, and a limiter for number of outstanding transactions.

It has the same generics as axi_read_crossbar but with response_fifo_depths added. This generic controls the depth of the optional FIFO for the write response channel (B). Set to 0 to disable. One value for each port.

axi_write_crossbar_core.vhd

component axi_write_crossbar_core is
  generic (
    num_left_ports : positive;
    left_id_widths : natural_vec_t
  );
  port (
    clk : in std_logic;
    --# {{}}
    arbiter_data_available : in std_logic_vector;
    --# {{}}
    left_ports_m2s : in axi_write_m2s_vec_t;
    left_ports_s2m : out axi_write_s2m_vec_t;
    --# {{}}
    right_port_m2s : out axi_write_m2s_t;
    right_port_s2m : in axi_write_s2m_t
  );
end component;

Core functionality for N-to-1 AXI write arbitration.

The arbitration will “lock on” to a left port when AWVALID and arbiter_data_available are asserted for that port. This is to avoid cases where an address transaction is sent through but there is no data available for a while.

The arbiter_data_available input can for example be assigned to WVALID if that is the desired behavior, or '1' if there is no reliable way to qualify data availability.

response_arbiter.vhd

component response_arbiter is
  generic (
    num_channels : positive;
    id_widths : natural_vec_t
  );
  port(
    clk : in std_logic;
    --# {{}}
    response_id : in unsigned;
    response_select : out integer range id_widths'range
  );
end component;

Selects where a response shall be routed based on the ID.
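
Given that the crossbar appends its extra ID bits above the left-side ID bits (see Limitations), the routing decision can be sketched as reading out those upper bits. The function below is a hedged illustration, not the code in response_arbiter.vhd: the package name, function name and the max_left_id_width parameter are hypothetical, and the exact bit layout used by the real entity is an assumption based on the right ID width formula.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package response_select_sketch_pkg is

  -- Returns the left port index encoded in the ID bits at and above
  -- index max_left_id_width, where max_left_id_width = max(left_id_widths).
  function get_response_select(
    response_id : unsigned;
    max_left_id_width : natural
  ) return natural;

end package;

package body response_select_sketch_pkg is

  function get_response_select(
    response_id : unsigned;
    max_left_id_width : natural
  ) return natural is
    -- Normalize to a descending range with the least significant bit at index 0
    alias id : unsigned(response_id'length - 1 downto 0) is response_id;
  begin
    if id'length <= max_left_id_width then
      -- No extra bits are present, which is the case with a single left port
      return 0;
    end if;

    -- The bits above the left-side ID bits hold the port index
    return to_integer(id(id'high downto max_left_id_width));
  end function;

end package body;
```

With a six-bit response_id of "100101" and max_left_id_width = 4, the upper two bits "10" select left port 2, and the lower four bits are the ID seen by that port's master.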