This document contains technical documentation for the off_chip_fifo module.


This file is part of off_chip_fifo release 0.0.0 collected at 2022-01-07 12:03.

Releases from Truestream follow the semantic versioning scheme: MAJOR.MINOR.PATCH+HASH.

  • MAJOR will be incremented for incompatible API or functionality changes.

  • MINOR will be incremented when new functionality is added in a backwards compatible manner.

  • PATCH will be incremented for backwards compatible bug fixes.

The HASH field is the git sha that the release was made from. It is included in the version number for internal traceability.
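As an illustration of the version format above, the following minimal Python sketch parses a MAJOR.MINOR.PATCH+HASH string. The names `ReleaseVersion`, `parse_release_version` and `is_backwards_compatible_upgrade` are hypothetical helpers and not part of any Truestream tooling. The sketch assumes that the hash is hexadecimal, and it treats the hash as optional so that a plain "0.0.0" also parses.

```python
import re
from typing import NamedTuple, Optional


class ReleaseVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    git_hash: Optional[str]


# Assumption: the hash is a hexadecimal git SHA. It is optional in this sketch.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\+([0-9a-fA-F]+))?$")


def parse_release_version(text: str) -> ReleaseVersion:
    """Split a version string into its numeric fields and the optional hash."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH+HASH version: {text!r}")
    major, minor, patch, git_hash = match.groups()
    return ReleaseVersion(int(major), int(minor), int(patch), git_hash)


def is_backwards_compatible_upgrade(old: ReleaseVersion, new: ReleaseVersion) -> bool:
    """True if 'new' keeps the MAJOR number and is not older than 'old'."""
    return new.major == old.major and (new.minor, new.patch) >= (old.minor, old.patch)
```

The hash is deliberately ignored in the compatibility check, since it only identifies the commit and carries no ordering.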

Release notes

Changelog and release history for the off_chip_fifo module. Changelogs from Truestream follow the keep a changelog format.


Initial release.


This module has the following dependencies:

  • The open-source hdl_modules project version 1.0.0.

  • The Truestream module axi_master version 1.0.0.

Library name

This module’s source files shall be compiled to a VHDL library symbolically named off_chip_fifo.


This module provides a First In First Out (FIFO) mechanism where the data is buffered in DDR memory via AXI. It provides a simple interface, like a traditional FIFO, but can have a much greater depth (i.e. hold more data) than a FIFO that is limited by on-chip Block RAM.

In FPGA design, a FIFO is a very common component that is used to implement data flow control. It allows data to be written at a certain rate, or in a certain pattern, and read out independently at a completely different rate or pattern. On-chip FIFOs are often implemented using Block RAM (for a depth in the thousands) or LUTRAM (for a depth in the tens). If the data read and write patterns demand a larger data buffer than that, an off-chip DDR memory can be used to buffer data, using an AXI FIFO module like this one.

Block diagram

Below is a simple block diagram of the module.

digraph my_graph {
graph [ dpi = 300 ];

input_data [ label="input" shape=none ];
w_fifo [ label="" shape=none image="fifo.png"];
ring_buffer_manager [ label="ring_buffer_manager" shape=box];
input_state_machine [ label="" shape=none image="cloud_450.png"];
job_partitioner [ label="job_partitioner" shape=box];
axi_write_master_core [ label="axi_write_master_core" shape=box];
base_addresses [ label="buffer_start_address\nbuffer_last_address" shape=none ];

input_data -> w_fifo:w;
input_data -> input_state_machine:w;
base_addresses:e -> ring_buffer_manager;
input_state_machine -> ring_buffer_manager:w [ label="request" dir="both" ];
input_state_machine -> job_partitioner;
job_partitioner:e -> axi_write_master_core:w

w_fifo -> axi_write_master_core
axi_write_master_core:e -> axi [ label="AW, W" ]

axi [ label="AXI\nDDR" shape=box height=2.5 ];
read_job_fifo [ label="" shape=none image="fifo.png"];
output_state_machine [ label="" shape=none image="cloud_450.png"];
axi_read_master_core [ label="axi_read_master_core" shape=box];
output_data [ label="output" shape=none ];

axi -> output_data [ label="R" ];
axi -> output_state_machine [ label="B" ]

job_partitioner:e -> read_job_fifo:w;
read_job_fifo:e -> output_state_machine;
output_state_machine -> axi_read_master_core;
axi -> axi_read_master_core [ label="AR" dir="back" ];
ring_buffer_manager:e -> output_state_machine:s [ label="release" dir="back" ];
}


Note that the job_partitioner, which splits bursts into smaller chunks, is only needed if

\[\text{max_packet_length_bytes} \geq \text{axi_max_burst_length_beats} \times \frac{\text{data_width}}{8}.\]

Otherwise, no burst splitting is necessary, since the whole packet length can fit in one burst.
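The condition above can be evaluated for a given set of generics. The following Python sketch is only an illustration of the arithmetic. The function names are hypothetical, and it assumes that data_width is a multiple of 8.

```python
def max_burst_length_bytes(axi_max_burst_length_beats: int, data_width: int) -> int:
    """Number of bytes that fit in one AXI burst of maximum length."""
    # Assumption: data_width is given in bits and is a multiple of 8.
    return axi_max_burst_length_beats * data_width // 8


def needs_job_partitioner(
    max_packet_length_bytes: int, axi_max_burst_length_beats: int, data_width: int
) -> bool:
    """Evaluate the documented condition for when burst splitting is needed."""
    return max_packet_length_bytes >= max_burst_length_bytes(
        axi_max_burst_length_beats, data_width
    )
```

For example, with data_width = 64 and axi_max_burst_length_beats = 256, one burst holds 2048 bytes, so a max_packet_length_bytes of 1526 does not require the job_partitioner. With an AXI3 limit of 16 beats, one burst holds only 128 bytes and the same packet length does require it.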


This module uses AXI read and write ports of the same width as the native data port.

The module cannot sustain full throughput on the input interface over time. For each AXI burst there is a stall of two clock cycles in the data flow. This means that, if continuously fed with data, the data FIFO will fill up, which will eventually cause input_ready to be lowered.

With this in mind, the depth of the data FIFO must be chosen carefully. It must be able to hold a packet of maximum length, but could also have spare space to handle a build-up of data. Spare space should be dimensioned based on the behavior of the input stream, so that back-to-back packets can be handled as necessary.

On the output side there is no limitation in throughput.
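The input-side dimensioning described above can be estimated with some simple arithmetic. The Python sketch below is a rough model and not a description of the actual implementation. It assumes that the two-cycle stall per burst is the only loss of throughput, that every burst carries the same number of beats, and that the input delivers one word per clock cycle when fed continuously. All function names are hypothetical.

```python
from fractions import Fraction

# From the text: each AXI burst stalls the data flow for two clock cycles.
STALL_CYCLES_PER_BURST = 2


def min_write_data_fifo_depth(max_packet_length_bytes: int, data_width: int) -> int:
    """Smallest depth, in words, that holds one packet of maximum length."""
    bytes_per_word = data_width // 8
    # Integer ceiling division.
    return -(-max_packet_length_bytes // bytes_per_word)


def sustained_input_efficiency(burst_length_beats: int) -> Fraction:
    """Fraction of clock cycles in which a word can leave the data FIFO."""
    return Fraction(burst_length_beats, burst_length_beats + STALL_CYCLES_PER_BURST)


def fifo_build_up_words(num_back_to_back_bursts: int) -> int:
    """Words accumulated in the data FIFO when the input never pauses."""
    return STALL_CYCLES_PER_BURST * num_back_to_back_bursts
```

With data_width = 64 and max_packet_length_bytes = 1526, the data FIFO needs at least 191 words to hold one packet. Under the assumptions above, 256-beat bursts give a sustained efficiency of 256/258, and every back-to-back burst adds two words of build-up that the spare FIFO space must absorb.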

Handshake data interface

This module uses handshaking for data qualification on the input and output interfaces.

Using AXI4-Stream-like handshake interfaces (ready and valid to qualify data transactions) is very common in FPGA designs. It enables a backpressure situation where the slave, i.e. the receiver of data, can indicate when it is ready to receive the data.

Below are some rules governing how these handshake signals interact. They are adapted from the AMBA 4 AXI4-Stream Protocol Specification, ARM IHI 0051A (ID030610).

  1. A transaction occurs on the positive edge of the clock when both ready and valid are high. The graph below shows some typical transactions.

  2. The ready signal may fall without a transaction having occurred.

  3. The valid signal may NOT fall without a transaction having occurred.

  4. Once valid is asserted, the associated data may NOT be changed unless a transaction has occurred.


    This applies to any auxiliary signals associated with the bus as well, e.g. a last indicator.

    Note also that this restriction on data not changing only applies when valid is asserted. When it is not, the data may be changed freely.

  5. In order to avoid deadlock situations, the master may NOT wait for the slave to assert ready before asserting valid. The slave however may wait for valid before asserting ready.
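The rules above can be checked against a recorded trace of the bus. The following Python sketch is a simplified illustration and not part of the module or its test bench. It assumes that each `Cycle` entry holds the signal values sampled at one positive clock edge, and the names `Cycle`, `count_transactions` and `find_violations` are hypothetical. The deadlock rule concerns the intent of the master and cannot be checked from a trace, so it is not covered.

```python
from typing import List, NamedTuple


class Cycle(NamedTuple):
    """Signal values sampled at one positive clock edge."""

    ready: bool
    valid: bool
    data: int


def count_transactions(trace: List[Cycle]) -> int:
    """A transaction occurs in every cycle where both ready and valid are high."""
    return sum(1 for cycle in trace if cycle.ready and cycle.valid)


def find_violations(trace: List[Cycle]) -> List[str]:
    """Report cycles where valid fell or data changed without a transaction."""
    violations = []
    for index in range(1, len(trace)):
        previous, current = trace[index - 1], trace[index]
        # A word is pending if valid was high but no transaction occurred.
        pending = previous.valid and not previous.ready
        if pending and not current.valid:
            violations.append(f"cycle {index}: valid fell without a transaction")
        elif pending and current.data != previous.data:
            violations.append(f"cycle {index}: data changed while valid was pending")
    return violations
```

Note that ready falling without a transaction is never reported, since that is explicitly allowed. Auxiliary signals such as a last indicator could be checked the same way by adding them to `Cycle` and to the comparison.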

Design details

Below follows a description of the different sub-modules.


component off_chip_fifo is
  generic (
    address_width : positive;
    data_width : positive;
    max_packet_length_bytes : positive;
    -- 256 for AXI4, 16 for AXI3
    axi_max_burst_length_beats : positive;
    read_job_fifo_depth : positive;
    write_data_fifo_depth : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    -- Note that packets written are limited in length by max_packet_length_bytes.
    input_ready : out std_logic;
    input_valid : in std_logic;
    input_data : in std_logic_vector;
    input_last : in std_logic;
    --# {{}}
    output_ready : in std_logic;
    output_valid : out std_logic;
    output_data : out std_logic_vector;
    output_last : out std_logic;
    --# {{}}
    axi_read_m2s : out axi_read_m2s_t;
    axi_read_s2m : in axi_read_s2m_t;
    --# {{}}
    axi_write_m2s : out axi_write_m2s_t;
    axi_write_s2m : in axi_write_s2m_t;
    --# {{}}
    -- Note that restrictions apply to these addresses. See documentation.
    -- Memory address to the first byte that may be used by this module.
    buffer_start_address : in addr_t;
    -- Memory address to the last byte that may be used by this module.
    buffer_last_address : in addr_t;
    buffer_addresses_valid : in std_logic
  );
end component;

This entity provides a FIFO structure where data is buffered in memory via AXI.

Set buffer addresses

See ring_buffer_manager.vhd for information on how to correctly set the buffer base addresses.


In order to keep complexity and resource utilization down, it uses a simple AXI mechanism that implies a limitation:


The buffer_start_address must be a multiple of


However, in order to achieve the best memory performance, it is beneficial if buffer_start_address is a multiple of 4096. This will minimize the rate of burst splitting.
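The benefit of 4096-byte alignment follows from the AXI rule that a burst must not cross a 4 KiB address boundary. The Python sketch below counts how many fixed-size memory segments would straddle such a boundary for a given start address. It is an illustration only. The function names are hypothetical, and it assumes equally sized segments laid out back to back from buffer_start_address, which is a simplification of how the module actually accesses memory.

```python
# AXI bursts must not cross a 4 KiB address boundary.
AXI_BOUNDARY_BYTES = 4096


def is_multiple_of(address: int, alignment: int) -> bool:
    """True if 'address' is aligned to 'alignment' bytes."""
    return address % alignment == 0


def crosses_4k_boundary(start_address: int, length_bytes: int) -> bool:
    """True if the byte range touches more than one 4 KiB page."""
    last_address = start_address + length_bytes - 1
    return start_address // AXI_BOUNDARY_BYTES != last_address // AXI_BOUNDARY_BYTES


def count_segments_crossing_boundary(
    buffer_start_address: int, segment_length_bytes: int, num_segments: int
) -> int:
    """Count segments that would have to be split at a 4 KiB boundary."""
    return sum(
        crosses_4k_boundary(
            buffer_start_address + index * segment_length_bytes, segment_length_bytes
        )
        for index in range(num_segments)
    )
```

For eight segments of 2048 bytes, a start address of 0x1000 gives no boundary crossings, while a start address of 0x1400 makes every second segment cross a boundary.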

AXI behavior

This entity does not contain a data FIFO on the output side. It is up to the user to decide what type of buffering they need (synchronous or asynchronous, on the output side or on the R side, …). With that in mind, the AR transactions that this entity performs are not necessarily well-behaved in an AXI sense: it is possible that more R transactions are queued up than the R slave can receive. If this is the case, memory performance will degrade.

Given this risk, it is recommended to have an R FIFO in conjunction with an axi_read_throttle instance in your AXI subsystem.

Resource utilization

This entity has netlist builds set up with automatic size checkers. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for off_chip_fifo.vhd netlist builds.

Generics: address_width = 28, data_width = 64, max_packet_length_bytes = 1526, axi_max_burst_length_beats = 256, read_job_fifo_depth = 8, write_data_fifo_depth = 1024

Reported resources: Total LUTs






Package with utility functions used in the rest of the module.


component ring_buffer_manager is
  generic (
    buffer_segment_length_bytes : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    request_ready : in std_logic;
    request_valid : out std_logic;
    request_address : out addr_t;
    --# {{}}
    release_last_address : in std_logic;
    --# {{}}
    -- Memory address to the first byte that may be used.
    buffer_start_address : in addr_t;
    -- Memory address to the last byte that may be used.
    buffer_last_address : in addr_t;
    buffer_addresses_valid : in std_logic
  );
end component;

This entity provides a mechanism for handling segments in a ring buffer. It is a simple implementation where the buffer segment length is always the same. A more general implementation would have a size value associated with the request and release buses.


As soon as buffer_addresses_valid is asserted, this entity will start serving addresses on the request interface. To avoid faulty operation, the user must make sure that:

  • The buffer addresses are correctly set before any operation is started.

  • Once set, the buffer addresses are not changed.

Also, there is no internal error checking for how many addresses are outstanding:


Do not use the release interface to release more addresses than you have requested.
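The request and release behavior described above can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions and not a description of the VHDL implementation. It assumes that the number of segments is the buffer size divided by the segment length, that addresses are served in increasing order with wrap-around, and that releases free the oldest outstanding segment. The class name `RingBufferModel` and its methods are hypothetical.

```python
class RingBufferModel:
    """Behavioral model of a ring buffer with fixed-size segments."""

    def __init__(
        self,
        buffer_start_address: int,
        buffer_last_address: int,
        buffer_segment_length_bytes: int,
    ) -> None:
        buffer_bytes = buffer_last_address - buffer_start_address + 1
        # Assumption: only whole segments are used.
        self.num_segments = buffer_bytes // buffer_segment_length_bytes
        if self.num_segments < 1:
            raise ValueError("Buffer is smaller than one segment")
        self.start_address = buffer_start_address
        self.segment_length = buffer_segment_length_bytes
        self.next_index = 0
        self.outstanding = 0

    def request_valid(self) -> bool:
        """An address can be served as long as a segment is free."""
        return self.outstanding < self.num_segments

    def request(self) -> int:
        """Hand out the start address of the next free segment."""
        if not self.request_valid():
            raise RuntimeError("No free segment")
        address = self.start_address + self.next_index * self.segment_length
        self.next_index = (self.next_index + 1) % self.num_segments
        self.outstanding += 1
        return address

    def release(self) -> None:
        """Free the oldest outstanding segment."""
        # The documented entity has no such check. The model adds one so that
        # misuse is visible instead of silently corrupting the state.
        if self.outstanding == 0:
            raise RuntimeError("Released more addresses than requested")
        self.outstanding -= 1
```

A model like this can be useful in a test bench as a reference for which addresses to expect. The check in `release` shows the kind of misuse that the user must avoid, since the hardware does not detect it.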

Resource utilization

This entity has netlist builds set up with automatic size checkers. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for ring_buffer_manager.vhd netlist builds.

Generics: buffer_segment_length_bytes = 2048

Reported resources: Total LUTs, Logic LUTs, DSP Blocks