Low-Latency Control on Open-Source FPGA Tools


Implementing Low-Latency Control on FPGAs: A Beginner's Guide to SystemVerilog with Open-Source Tools on WSL

Welcome back to AppliedKaos! Today, we’re shifting gears to explore the intersection of hardware and speed.

Have you ever wondered how financial trading platforms execute trades in microseconds, or how factory robots react instantly to sensor feedback? The secret sauce is often an FPGA (Field-Programmable Gate Array).

FPGAs are unique because you don’t just write software for them; you design the hardware circuit itself. This allows for massive parallelism and deterministic, ultra-low latency.

In this guide, you’ll learn the fundamentals of low-latency design, why FPGAs excel at it, and how to get started using SystemVerilog and a powerful, completely open-source toolchain running on Windows Subsystem for Linux (WSL) Ubuntu 22.04.


Why Low Latency Matters (and Why FPGAs Win)

Latency is the time delay between an input signal and the corresponding output reaction. In many critical systems, throughput (how much data you process) is less important than latency (how fast you respond).

Common Low-Latency Applications:

  • High-Frequency Trading (HFT): Every microsecond counts when capturing market opportunities.

  • Industrial Motion Control: Robots need immediate feedback to maintain stability and safety.

  • Autonomous Vehicles: Real-time sensor processing for obstacle avoidance.

  • Medical Devices: Instant response in life-critical monitoring systems.

The FPGA Advantage:

A CPU executes a stream of instructions, and even a highly parallel GPU still has to fetch, schedule, and run instructions on shared hardware. Both have to contend with operating system interrupts, cache misses, and task scheduling. This introduces variability (jitter) and overhead.

An FPGA allows you to build dedicated streaming architectures in hardware. Data flows through customized logic pipelines that accept new information every single clock cycle. The result is cycle-accurate, deterministic performance, which is the very definition of predictable low latency.


Your Open-Source Toolchain on WSL

We will use the OSS CAD Suite, a pre-packaged bundle of the best open-source digital logic design tools. Our specific setup on WSL Ubuntu 22.04 will use:

  1. Icarus Verilog (iverilog): For simulation and SystemVerilog support.

  2. Verilator: An ultra-fast Verilog/SystemVerilog simulator that compiles your code into C++.

  3. GTKWave: A waveform viewer to visualize our digital signals.

Prerequisites

You need WSL installed on your Windows machine. If you haven't done this yet, open PowerShell as Administrator and run:

PowerShell
wsl --install -d Ubuntu-22.04
# After the restart, make sure the distribution runs under WSL 2
wsl --set-version Ubuntu-22.04 2

Step-by-Step Instructions

We will build a simple, low-latency "Glitch Filter." This circuit will wait until an input signal holds a steady value for a certain number of clock cycles before updating the output, ignoring short noise spikes.

Step 1: Install OSS CAD Suite on WSL

Open your WSL Ubuntu 22.04 terminal.

  1. Update your system:

    Bash
    sudo apt update && sudo apt upgrade -y
    sudo apt install -y git make gtkwave
    

    Note: While OSS CAD Suite includes GTKWave, installing it via apt ensures dependencies are met for the GUI.

  2. Download the latest OSS CAD Suite for Linux: Go to the OSS CAD Suite Releases page and find the linux-x64 asset. Alternatively, use wget (replace both date placeholders with the latest release; the release tag is written with dashes, the file name without):

    Bash
    mkdir -p ~/tools
    cd ~/tools
    wget https://github.com/YosysHQ/oss-cad-suite-build/releases/download/202X-XX-XX/oss-cad-suite-linux-x64-202XXXXX.tgz
    
  3. Extract the archive:

    Bash
    tar -xzvf oss-cad-suite-linux-x64-*.tgz
    
  4. Add to your PATH: Add the following line to your ~/.bashrc file to make the tools available everywhere.

    Bash
    echo 'export PATH="$HOME/tools/oss-cad-suite/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc
    
  5. Verify installation:

    Bash
    iverilog -V
    verilator --version
    yosys -V
    

Step 2: Create Your SystemVerilog Design

Create a new directory for your project:

Bash
mkdir -p ~/kaos_fpga_filter
cd ~/kaos_fpga_filter

Create a file named glitch_filter.sv:

Code snippet
// glitch_filter.sv
// AppliedKaos: Beginner's Guide to Low Latency FPGA

`timescale 1ns / 1ps

module glitch_filter #(
    parameter int THRESHOLD = 4 // Number of cycles the signal must be steady
)(
    input  logic clk,
    input  logic rst_n, // Active low reset
    input  logic glitchy_in,
    output logic clean_out
);

    // Counter width derived from the THRESHOLD parameter
    localparam int CounterWidth = $clog2(THRESHOLD + 1);
    logic [CounterWidth-1:0] counter;
    logic sampling;

    // A simple FSM/Synchronizer stage
    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            counter   <= '0;
            clean_out <= '0;
            sampling  <= '0;
        end else begin
            // Register the input. One flop is only a minimal synchronizer;
            // truly asynchronous inputs normally get two flops in series.
            sampling <= glitchy_in;

            if (sampling == clean_out) begin
                // Input matches output, reset counter
                counter <= '0;
            end else begin
                // Input differs, start/continue counting
                if (counter >= THRESHOLD - 1) begin
                    // Stabilized long enough, update output
                    clean_out <= sampling;
                    counter   <= '0;
                end else begin
                    counter <= counter + 1'b1;
                end
            end
        end
    end

endmodule

Low-Latency Design Notes:

  • always_ff @(posedge clk): This ensures the logic inside is mapped to registers (Flip-Flops), creating a deterministic, synchronous circuit.

  • Parameterization (THRESHOLD): Allows us to reuse the module for different latency/noise scenarios.

  • Determinism: clean_out updates on rising clock edge number THRESHOLD + 1 after a stable input change: one edge for the sampling register, then THRESHOLD edges of counting.
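
The filter above registers glitchy_in once before using it. For an input that is truly asynchronous to clk (a signal coming straight from a pin, for example), common practice is to pass it through two flip-flops in series first. Here is a minimal sketch, assuming a hypothetical helper module named sync_2ff that is not part of the original design:

```systemverilog
// sync_2ff.sv
// Hypothetical helper module (the name and ports are assumptions made
// for this illustration, not part of the glitch filter above).

`timescale 1ns / 1ps

module sync_2ff (
    input  logic clk,
    input  logic rst_n,    // Active low reset
    input  logic async_in, // Signal that is not aligned to clk
    output logic sync_out  // Same signal, re-timed to clk
);

    // First stage: may briefly go metastable, so nothing else reads it
    logic meta;

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            meta     <= 1'b0;
            sync_out <= 1'b0;
        end else begin
            meta     <= async_in; // Stage 1 captures the raw input
            sync_out <= meta;     // Stage 2 gives stage 1 a full cycle to settle
        end
    end

endmodule
```

Wiring sync_out into glitchy_in would add two clock cycles to the latency figure above. That is usually a fair price for making a metastable value far less likely to reach the counter logic.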

Step 3: Write a SystemVerilog Testbench

Create a testbench file named glitch_filter_tb.sv. We will use Icarus Verilog for this simulation.

Code snippet
// glitch_filter_tb.sv
`timescale 1ns / 1ps

module glitch_filter_tb;

    logic clk;
    logic rst_n;
    logic glitchy_in;
    logic clean_out;

    // Parameterize the DUT for simulation
    localparam int SIM_THRESHOLD = 3;

    // Instantiate the Device Under Test (DUT)
    glitch_filter #(
        .THRESHOLD(SIM_THRESHOLD)
    ) dut (
        .clk(clk),
        .rst_n(rst_n),
        .glitchy_in(glitchy_in),
        .clean_out(clean_out)
    );

    // Clock generation: 100MHz (10ns period)
    localparam int CLK_PERIOD = 10;
    always #(CLK_PERIOD / 2) clk = ~clk;

    initial begin
        // Initialize signals
        clk = 0;
        rst_n = 0;
        glitchy_in = 0;

        // Dump waveforms for GTKWave
        $dumpfile("glitch_filter.vcd");
        $dumpvars(0, glitch_filter_tb);

        // Apply Reset. Rising edges fall at 5, 15, 25 ns, ..., so every
        // stimulus change below lands on a falling edge (a multiple of
        // 10 ns) and never races the DUT's sampling edge.
        #(2 * CLK_PERIOD) rst_n = 1;
        #(CLK_PERIOD);

        // Test Scenario 1: Stable High
        $display("Status: Applying Stable High");
        glitchy_in = 1;
        #100;

        // Test Scenario 2: Stable Low
        $display("Status: Applying Stable Low");
        glitchy_in = 0;
        #100;

        // Test Scenario 3: Short Glitch (should be ignored)
        $display("Status: Applying Short Glitch");
        glitchy_in = 1;
        #(2 * CLK_PERIOD); // Two clock cycles, fewer than SIM_THRESHOLD
        glitchy_in = 0;
        #100;

        // Test Scenario 4: Valid Signal just at threshold
        $display("Status: Applying Signal just at Threshold");
        glitchy_in = 1;
        #(SIM_THRESHOLD * CLK_PERIOD); // Held across exactly SIM_THRESHOLD rising edges
        glitchy_in = 0;
        #100;

        $display("Status: Simulation Finished");
        $finish;
    end

endmodule

Step 4: Simulate with Icarus Verilog

  1. Compile the design and testbench:

    Bash
    iverilog -g2012 -o simulation.vvp glitch_filter.sv glitch_filter_tb.sv
    
    • -g2012: Tells Icarus to use the SystemVerilog 2012 standard.

    • -o simulation.vvp: Specifies the output executable file name.

  2. Run the simulation:

    Bash
    vvp simulation.vvp
    

    This will generate the waveform file glitch_filter.vcd specified in the testbench.

Step 5: Visualize Waveforms with GTKWave

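Because you are likely running WSL without a full GUI desktop, you may need an X Server installed on Windows (like VcXsrv or Xming) to view GUI applications from Linux. On Windows 11, and on Windows 10 installations that include WSLg, Linux GUI applications open without any extra setup, so you can skip straight to step 3.

  1. Start your Windows X Server (ensure "Disable access control" is checked if using VcXsrv).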

  2. Set the DISPLAY variable in your WSL terminal:

    Bash
    export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
    
  3. Run GTKWave:

    Bash
    gtkwave glitch_filter.vcd
    

In the GTKWave GUI:

  1. Expand the glitch_filter_tb tree on the left.

  2. Select dut.

  3. Drag signals like clk, glitchy_in, counter, and clean_out into the Waves window.

Observe how clean_out changes only after glitchy_in has stayed stable long enough for the counter to reach the threshold. Shorter spikes reset the counter and never reach clean_out.
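
Reading latency off a waveform works, but the THRESHOLD + 1 figure from the design notes can also be checked automatically. The sketch below is a second, self-checking testbench that counts rising edges between an input change and the matching change on clean_out. The module name glitch_filter_latency_tb and the 100-edge timeout are assumptions made for this illustration:

```systemverilog
// glitch_filter_latency_tb.sv
// Hypothetical self-checking testbench; compile it together with
// glitch_filter.sv from Step 2.

`timescale 1ns / 1ps

module glitch_filter_latency_tb;

    localparam int THRESHOLD  = 3;
    localparam int CLK_PERIOD = 10; // ns, i.e. a 100MHz clock

    logic clk        = 0;
    logic rst_n      = 0;
    logic glitchy_in = 0;
    logic clean_out;

    int edges; // Rising edges seen since the input changed

    glitch_filter #(
        .THRESHOLD(THRESHOLD)
    ) dut (
        .clk(clk),
        .rst_n(rst_n),
        .glitchy_in(glitchy_in),
        .clean_out(clean_out)
    );

    always #(CLK_PERIOD / 2) clk = ~clk;

    initial begin
        // Release reset and change the input on falling edges only,
        // so the stimulus never races the DUT's rising-edge sampling
        repeat (2) @(negedge clk);
        rst_n = 1;
        @(negedge clk);
        glitchy_in = 1;

        // Count rising edges until the output follows the input
        // (give up after 100 edges so a broken DUT cannot hang the run)
        edges = 0;
        while (clean_out !== 1'b1 && edges < 100) begin
            @(posedge clk);
            #1; // Let the non-blocking updates land before checking
            edges = edges + 1;
        end

        if (edges == THRESHOLD + 1)
            $display("PASS: clean_out followed after %0d rising edges", edges);
        else
            $display("FAIL: expected %0d rising edges, got %0d", THRESHOLD + 1, edges);

        $finish;
    end

endmodule
```

It compiles and runs with the same iverilog -g2012 and vvp commands as in Step 4, with this file swapped in for glitch_filter_tb.sv. Changing THRESHOLD at the top is a quick way to confirm that the latency tracks the parameter.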


Going Faster: Pipelining for Low Latency

In this example, the logic between registers is trivial. However, in complex control algorithms (like a PID controller), the "longest path" of logic between two Flip-Flops dictates the maximum clock frequency.

If your logic path is too long, the signal won't stabilize before the next clock edge, failing timing constraints.

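The core technique for keeping the clock frequency high without giving up determinism is pipelining. Pipelining breaks complex combinational logic into smaller stages separated by registers. Each stage adds one clock cycle of latency, but the shorter stages allow a faster clock, so the total delay in nanoseconds stays small and, above all, fixed.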

Standard PID Loop (CPU) vs. Pipelined PID (FPGA)

  • CPU: Input → Read Regs → Calc P → Calc I → Calc D → Sum → Write Out. All must finish before the next loop iteration. Latency is the sum of all steps.

  • FPGA Pipelined: Input → (Stage 1: Calc P, Calc I, Calc D) → Regs → (Stage 2: Summing) → Regs → Output. A new output is generated every cycle (high throughput), and the latency is fixed at exactly 2 clock cycles.
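
To make the two-stage structure concrete, here is a minimal sketch of what such a pipeline might look like in SystemVerilog. The module name pid_pipelined, the width parameter W, and the integer gains KP, KI, and KD are all assumptions chosen for illustration. A real controller would also need fixed-point scaling, output saturation, and integrator anti-windup, all of which are left out here:

```systemverilog
// pid_pipelined.sv
// Hypothetical two-stage PID sketch. Names, widths, and gains are
// illustrative assumptions, not tuned or production values.

`timescale 1ns / 1ps

module pid_pipelined #(
    parameter int W  = 16, // Width of the signed error input
    parameter int KP = 4,  // Proportional gain (illustrative)
    parameter int KI = 1,  // Integral gain (illustrative)
    parameter int KD = 2   // Derivative gain (illustrative)
)(
    input  logic                  clk,
    input  logic                  rst_n,  // Active low reset
    input  logic signed [W-1:0]   error,  // Setpoint minus measurement
    output logic signed [2*W-1:0] control
);

    logic signed [W-1:0]   prev_error; // Last sample, for the D term
    logic signed [2*W-1:0] integral;   // Running sum, for the I term
    logic signed [2*W-1:0] p_term, i_term, d_term;

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            prev_error <= '0;
            integral   <= '0;
            p_term     <= '0;
            i_term     <= '0;
            d_term     <= '0;
            control    <= '0;
        end else begin
            // Stage 1: the three terms are computed side by side
            // and captured in their own registers
            prev_error <= error;
            integral   <= integral + error;
            p_term     <= KP * error;
            i_term     <= KI * (integral + error);
            d_term     <= KD * (error - prev_error);

            // Stage 2: sum the terms registered on the previous edge
            control    <= p_term + i_term + d_term;
        end
    end

endmodule
```

Because both stages sit in the same clocked block, stage 2 always reads the values stage 1 stored one edge earlier. An error sample therefore shows up in control two rising edges after it is presented, and a new sample can enter on every cycle.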

By keeping pipeline stages "shallow" (minimizing logic depth), you can run your FPGA at hundreds of MHz, achieving response times measured in nanoseconds.


Conclusion

Implementing low-latency control on FPGAs requires a fundamental shift in thinking from sequential software to concurrent hardware design. You must prioritize cycle-accurate determinism and minimizing logic depth.

By using SystemVerilog and open-source tools within WSL, you have a powerful, cost-effective ecosystem to start designing your own hardware accelerators.

The next step? Synthesize this design using Yosys and implement it on real FPGA hardware!

Stay Kaotic!

 AppliedKaos 

