Building rtl-aid: CI-Native Documentation for RTL Projects

rtl-aid is a small documentation and linting toolkit for Verilog and SystemVerilog projects. It generates one Markdown page per RTL module, preserves the human-written description, and keeps the mechanical sections such as ports, parameters, and module calls in sync with the source.

The project also includes rtllint, a companion command that runs Verilator lint and tags warnings inline in the RTL. The goal is not to replace a full EDA flow. The goal is to make an unfamiliar RTL codebase easier to inspect, review, and maintain from the command line and from agentic coding tools.

Repository: https://github.com/vishwaksen-1/rtl-aid

What this post is about

This post explains why I built rtl-aid, what it currently does, and the design constraints behind it.

It covers:

Generating module documentation from RTL.
Preserving manual descriptions while updating generated sections.
Building a simple module dependency graph.
Making documentation checks usable in CI.
Tagging Verilator warnings directly inside source files.
The limits of using a lightweight parser instead of a full HDL frontend.

It does not cover formal verification or synthesis integration, and rtl-aid is neither a complete Verilog parser nor a replacement for tools like Verilator, Yosys, Surelog, or commercial EDA suites.

Background

RTL projects become hard to read in a very specific way.

In software, a new contributor can often start with package names, imports, tests, and runtime entry points. In RTL, the important structure is usually spread across module declarations, parameters, port lists, instantiations, include paths, testbenches, and build scripts. Even a small design can make you ask basic questions before you can make progress:

What modules exist?
Which module instantiates which child?
What are the top-level inputs and outputs?
Which parameters configure this block?
Is this file part of the design or only a testbench?
Where are the existing lint issues?

The obvious answer is "read the source," but that does not scale well when you are only trying to build a mental map. A lot of RTL source is structural information surrounded by implementation details. I wanted a way to extract the structural layer first, then decide which files deserved deeper reading.

That became rtl-aid.

Why this matters

The practical value is not that the generated documentation is beautiful. The value is that it is cheap to regenerate, predictable in shape, and good enough to guide the next action.

For a human reviewer, rtl-aid turns a pile of .v and .sv files into a browsable module index. For an AI coding agent, it provides a much smaller representation of the design than raw RTL. Instead of spending context on every block and expression, the agent can first inspect the generated Markdown and graph.json.

A typical first pass looks like this:

rtldoc -d rtl/ -o .agent/docs/ --json-graph

After that, each module has a Markdown file with the same sections:

Description
Parameters
Inputs
Outputs
Inouts
Calls
Called By

The Description section is human-managed. Everything else is generated. That split is the core of the tool: humans provide meaning, the tool maintains the boring structural truth.

Constraints

The first version of rtl-aid is deliberately small.

Runtime: Python 3.7+.
Dependencies: no Python runtime dependencies.
External tools: rtllint requires Verilator, while rtldoc is standalone.
Parser: regex-based, not a full Verilog/SystemVerilog frontend.
Scope: one module per file.
Supported style: ANSI-style Verilog-2001/SystemVerilog module headers.
Output: plain Markdown and JSON.
CI behaviour: deterministic exit codes and diff-aware writes.

These constraints shaped most decisions. I wanted something that could be installed with:

pip install rtl-aid

and then run in a repository without pulling in a parser stack, a web app, or a database. That meant accepting a narrower syntax target and making the limitations explicit.

The tradeoff is clear: rtl-aid is not the right tool if your codebase relies heavily on pre-2001 port declarations, multiple modules per file, typedef-heavy declarations, or macro-expanded structure. It is useful when the codebase follows common ANSI module style and you want fast structural documentation.

System design

rtl-aid has two commands:

Command	Purpose
`rtldoc`	Generate and maintain per-module Markdown docs.
`rtllint`	Run Verilator lint and tag warning lines inline.

The documentation flow is:

Scan .v and .sv files from directories or explicit file lists.
Ignore known testbench suffixes such as _tb.v, _tb.sv, _bench.v, and _testbench.sv.
Strip comments before parsing.
Extract the first module declaration from each file.
Parse parameters, inputs, outputs, and inouts.
Detect instantiations of known modules.
Build the reverse called_by graph.
Generate or update Markdown.
Optionally write graph.json.
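The first two steps of that flow can be sketched in a few lines. This is a minimal illustration, not rtl-aid's actual code: the names `is_testbench` and `discover_rtl_files` are hypothetical, and the suffix list contains only the four examples named above.

```python
from pathlib import Path

# The four testbench suffixes mentioned in the post; the real list may differ.
TESTBENCH_SUFFIXES = ("_tb.v", "_tb.sv", "_bench.v", "_testbench.sv")
RTL_EXTENSIONS = (".v", ".sv")


def is_testbench(path: Path) -> bool:
    """True when the file name ends with a known testbench suffix."""
    return path.name.endswith(TESTBENCH_SUFFIXES)


def discover_rtl_files(root: Path) -> list:
    """Recursively collect .v/.sv design files under root, in sorted order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in RTL_EXTENSIONS and not is_testbench(p)
    )
```

Sorting the result keeps the scan order stable across runs, which matters for the deterministic output described later.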

The graph export is intentionally simple:

{
  "cpu_core": {
    "calls": ["alu", "decoder", "register_file"],
    "called_by": []
  },
  "alu": {
    "calls": ["mux4"],
    "called_by": ["cpu_core"]
  }
}

This makes it easy to consume from scripts, CI, or an agent. You do not need to parse Markdown to recover the dependency map.
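The reverse edges are cheap to derive once the forward calls are known. The sketch below shows one way to do it, assuming the forward map is a plain dict of module name to child names; `add_called_by` is a hypothetical helper, not a function from rtl-aid.

```python
def add_called_by(calls: dict) -> dict:
    """Turn {module: [children]} into the calls/called_by shape shown above."""
    graph = {
        name: {"calls": sorted(children), "called_by": []}
        for name, children in calls.items()
    }
    for parent, children in calls.items():
        for child in children:
            # Only modules found in the scan get a reverse edge.
            if child in graph:
                graph[child]["called_by"].append(parent)
    for entry in graph.values():
        entry["called_by"].sort()
    return graph


graph = add_called_by({
    "cpu_core": ["alu", "decoder", "register_file"],
    "alu": ["mux4"],
    "decoder": [],
    "register_file": [],
    "mux4": [],
})
```

Sorting both lists keeps the JSON stable between runs, so the graph file only changes when the design hierarchy does.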

Important decisions

Preserve descriptions

Generated documentation usually fails when it overwrites the only useful human text.

rtldoc avoids that by treating the Description section as user-owned. If a module doc already exists, the description is preserved and the generated sections are replaced. If the file is new, the description starts as:

TODO: Add description

CI mode can then fail if descriptions are still missing:

rtldoc -d rtl/ -o docs/modules/ --ci --print-errors

The result is a useful split of responsibility. The tool owns facts it can extract. The developer owns intent, context, and design notes.
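A rough sketch of that split, assuming the page layout shown later in this post (a `## Description` heading followed by other `## ` sections). The helper names are hypothetical and the real implementation may differ.

```python
import re

PLACEHOLDER = "TODO: Add description"

# Capture everything between "## Description" and the next "## " heading.
_DESCRIPTION = re.compile(
    r"^## Description\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL
)


def extract_description(existing_doc: str) -> str:
    """Return the human-written description, or the placeholder if none exists."""
    match = _DESCRIPTION.search(existing_doc)
    text = match.group(1).strip() if match else ""
    return text or PLACEHOLDER


def render_doc(name: str, description: str, generated_sections: str) -> str:
    """Rebuild a page: the description is kept, everything else is regenerated."""
    return f"# {name}\n\n## Description\n{description}\n\n{generated_sections}"


def missing_description(doc: str) -> bool:
    """A CI-style check: True when the page still carries the placeholder."""
    return extract_description(doc) == PLACEHOLDER
```

One limitation of this sketch: a description that itself contains a line starting with `## ` would be cut short at that line.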

Use diff-aware writes

Documentation generators can create noisy commits if they rewrite files every time they run. rtldoc only writes a Markdown file when the content has actually changed.

That makes it safe to run before every commit or inside CI. If nothing changed structurally, nothing gets touched.

Dry-run mode runs the same comparison without writing anything:

rtldoc -d rtl/ -o docs/modules/ --dry-run

With -vv, rtldoc can also show section-level additions and removals, which is useful before committing a source change.
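The core of a diff-aware write is a single comparison before touching the file. A minimal sketch, assuming UTF-8 Markdown files; `write_if_changed` is a hypothetical name and the section-level diff reporting is left out.

```python
from pathlib import Path


def write_if_changed(path: Path, content: str, dry_run: bool = False) -> bool:
    """Write content only when it differs from what is on disk.

    Returns True when the file is (or, in dry-run mode, would be) updated.
    """
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    if not dry_run:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    return True
```

Because unchanged files are never rewritten, their modification times stay put and they never show up in `git status`.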

Keep the graph separate

Markdown is nice for humans. JSON is better for tools.

The --json-graph flag writes a machine-readable dependency graph next to the docs. This is especially useful for agent workflows. An agent can inspect the graph first, identify the top-level modules, and then read only the specific generated docs it needs.

That is the main agentic pattern:

rtldoc -d rtl/ -o .agent/docs/ --json-graph

Read the graph. Read the relevant module docs. Only then jump into the source.
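As a rough illustration of that pattern, the sketch below turns the graph into a reading order: top-level modules first, then their children breadth-first. It uses the example graph from earlier in this post; `reading_order` and `doc_paths` are hypothetical helpers, and the `<module>.md` file naming is an assumption based on the links in the generated pages.

```python
import json
from pathlib import Path

GRAPH_JSON = """
{
  "cpu_core": {"calls": ["alu", "decoder", "register_file"], "called_by": []},
  "alu": {"calls": ["mux4"], "called_by": ["cpu_core"]}
}
"""


def reading_order(graph: dict) -> list:
    """Breadth-first order, starting from modules that nobody instantiates."""
    roots = sorted(name for name, entry in graph.items() if not entry["called_by"])
    order, seen, queue = [], set(roots), list(roots)
    while queue:
        name = queue.pop(0)
        order.append(name)
        # Modules missing from the graph simply have no recorded children.
        for child in graph.get(name, {}).get("calls", []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def doc_paths(graph: dict, docs_dir: Path) -> list:
    """Map the reading order onto per-module Markdown files."""
    return [docs_dir / f"{name}.md" for name in reading_order(graph)]


order = reading_order(json.loads(GRAPH_JSON))
# order == ["cpu_core", "alu", "decoder", "register_file", "mux4"]
```

An agent can stop at any depth in this list, which is what keeps the first pass small.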

Keep lint visible but non-blocking

rtllint is intentionally different from a strict CI lint gate. It runs Verilator and annotates the source line where a warning or error appears:

assign result = a + b;  /* Check: Operator ADD generates 9 bits ... */

It also inserts small test metadata near the top of the file:

// lint-test: verilator --lint-only -Wall rtl/alu.v
// tb-test: tba

The comments are idempotent. Re-running the command replaces existing /* Check: */ tags instead of stacking duplicates.

This gives a review workflow where lint debt is searchable, visible in diffs, and close to the code that caused it. It does not have to block the build immediately.
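Idempotent tagging comes down to stripping any existing tag before appending the new one. A minimal sketch of that idea, assuming tags always sit at the end of the line in the `/* Check: ... */` form shown above; `tag_line` is a hypothetical helper, not rtllint's actual code.

```python
import re

# An existing tag at the end of the line, with the whitespace around it.
_TAG = re.compile(r"\s*/\* Check: .*? \*/\s*$")


def tag_line(line: str, message: str) -> str:
    """Attach a Check tag to a source line, replacing any previous tag."""
    body = line.rstrip("\n")
    newline = line[len(body):]
    bare = _TAG.sub("", body)
    # Keep the message from closing the block comment early.
    safe = message.replace("*/", "* /")
    return f"{bare}  /* Check: {safe} */{newline}"


once = tag_line("assign result = a + b;\n", "Operator ADD generates 9 bits")
twice = tag_line(once, "Operator ADD generates 9 bits")
```

Here `once` and `twice` are identical, so re-running the tagger produces no diff unless the warning text changes.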

Implementation notes

The parser is intentionally direct.

It strips comments, then looks for a module header of the form:

module <name> [#(...)] (...);

From there it extracts:

Parameters from the optional #(...) block.
Inputs, outputs, and inouts from the port list.
Comma-inherited directions such as output reg a, b, c.
Instantiations that match known module names.

That last part matters. rtldoc only records calls to modules it has already discovered in the current scan. This avoids treating every function-like token as a module instance.
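To make the approach concrete, here is a simplified sketch of comment stripping, comma-inherited port directions, and instantiation matching against known modules. The regexes and function names are illustrative assumptions rather than the patterns rtldoc uses, and the sketch ignores string literals, macros, and non-ANSI headers.

```python
import re

_COMMENTS = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
_MODULE = re.compile(r"\bmodule\s+([A-Za-z_]\w*)")
_DIRECTION = re.compile(r"(input|output|inout)\b")
# Last identifier in a port item, allowing trailing [..] array dimensions.
_LAST_NAME = re.compile(r"([A-Za-z_]\w*)\s*(?:\[[^\]]*\]\s*)*$")


def strip_comments(source: str) -> str:
    """Replace // and /* */ comments with a single space."""
    return _COMMENTS.sub(" ", source)


def parse_ports(port_list: str) -> dict:
    """Split a port list on commas, carrying the last seen direction forward."""
    ports = {"input": [], "output": [], "inout": []}
    direction = None
    for item in port_list.split(","):
        item = item.strip()
        found = _DIRECTION.match(item)
        if found:
            direction = found.group(1)
            item = item[found.end():]
        name = _LAST_NAME.search(item)
        if direction and name:
            ports[direction].append(name.group(1))
    return ports


def find_calls(source: str, known_modules: set) -> list:
    """Report only instantiations of modules already discovered in the scan."""
    text = strip_comments(source)
    header = _MODULE.search(text)
    own_name = header.group(1) if header else None
    calls = []
    for name in sorted(known_modules):
        if name == own_name:
            continue
        # "<name> #(" or "<name> <instance_name> (" counts as an instantiation.
        instance = re.compile(
            r"\b" + re.escape(name) + r"\s*(?:#\s*\(|\s[A-Za-z_]\w*\s*\()"
        )
        if instance.search(text):
            calls.append(name)
    return calls
```

With this shape, a call such as `$clog2(DATA_WIDTH)` is never reported because `clog2` is not a known module, and an instantiation inside a comment disappears before matching starts.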

The generated Markdown is predictable:

# alu

## Description
TODO: Add description

## Parameters
- DATA_WIDTH = 8

## Inputs
- clk
- rst
- operand_a
- operand_b

## Outputs
- result

## Inouts
- None

## Calls
- [mux4](mux4.md)

## Called By
- [cpu_core](cpu_core.md)

The formatting itself is not the point. What matters is that every module page has the same shape.

Future work

Add Graphviz or Mermaid export for the dependency graph.
Build an MCP server wrapper so tools like Claude, Cursor, Devin, and other agents can call rtl-aid without shelling out manually.
Add a GitHub Actions workflow template.
Improve parser coverage for common real-world RTL styles.
Add an optional, customizable doc format instead of enforcing the hardcoded one.
Add an optional backend using Tree-sitter, PyVerilog, Surelog, or another parser for projects that need deeper SystemVerilog support.
Add a cleanup command for removing lint tags.
Generate a browsable static HTML view from the Markdown and graph.

The larger direction is to make RTL projects easier to enter. Not by hiding the source, but by giving humans and agents a reliable map before they start reading every file.

That is the real point of rtl-aid: turn structure into a cheap artefact, keep it current, and let the deeper engineering attention go where it actually matters.
