DDR4 Memory Training: How It Works

A detailed technical reference on the DDR4 memory training process, with particular attention to AMD Zen 1 / Threadripper platforms. This is the deep-dive companion to my ZFS NAS build writeup.

I used Claude (Anthropic’s AI assistant) for research and writing help on this piece.

Table of Contents

  1. Overview
  2. Where Training Runs on AMD Zen
  3. The Eye Diagram
  4. Training Steps in Detail
  5. On-Die Termination and ProcODT
  6. What Manual Settings Control vs What Training Determines
  7. Training Failure Modes
  8. Open Source Status and openSIL
  9. References

Overview

DDR4 memory training is a calibration process that runs during POST, before the operating system loads. Its purpose is to characterize the electrical properties of the specific physical path between the CPU’s memory controller and each DRAM chip on each installed DIMM, then configure the timing and voltage parameters needed for reliable data transfer at the target speed.

The need for training arises from manufacturing variation. Every motherboard has slightly different trace lengths, impedance characteristics, and crosstalk profiles. Every DRAM chip has slightly different internal timing and drive strength characteristics. Even two boards of the same model with identical DIMMs will have different optimal parameters. Training measures the actual electrical behavior of the specific system and adjusts dozens of timing registers to match.

The entire process can be understood through a single concept: the eye diagram.

Where Training Runs on AMD Zen

On AMD Zen (Family 17h) platforms, including Threadripper 1000-series, memory training does not run on the x86 cores. It runs on the PSP (Platform Security Processor), an ARM Cortex-A5 core embedded in the CPU die.

The boot sequence:

  1. Power on. The PSP boots from on-die ROM.
  2. PSP loads and verifies the ABL (AGESA Bootloader) stages from SPI flash.
  3. ABL stages execute in sequence. One of them parses the APCB (AMD PSP Customization Block), which contains SPD data from the installed DIMMs and platform configuration (including any manual BIOS overrides for timings, voltage, ProcODT, etc.).
  4. ABL runs memory training. This is the phase where the system power-cycles and the screen stays black.
  5. Training results are written to the APOB (AGESA PSP Output Block) in DRAM.
  6. PSP decompresses the BIOS image into DRAM and releases the x86 cores from reset.
  7. The first x86 instruction fetch comes from DRAM, which is already online.

This means DRAM is fully initialized and trained before the x86 CPU executes a single instruction. The POST codes visible on the motherboard’s debug LED (like 0d, C0, AE) come from ABL stages running on the PSP, not from UEFI/BIOS code on the x86 cores.

Training results can be cached in SPI flash (as part of the APOB NV copy). On S3 resume, the PSP replays the saved training data instead of retraining from scratch, which is much faster. On cold boot (S5), the behavior depends on the BIOS’s Fast Boot setting: enabled replays cached data, disabled forces a full retrain. Research by PC Engines found that the APOB training data can have byte-level variations across boots, making cached results unreliable for cold boot on some platforms. This is one reason why disabling Fast Boot is recommended for stability-critical configurations.

Source code availability

The training code for Zen 1 is proprietary. It is distributed as binary ABL blobs and is not readable or auditable.

However, AMD open-sourced AGESA for older platforms (Family 14h, 15h, 16h). The full memory training source code for these platforms is available in coreboot at src/vendorcode/amd/agesa/f15tn/Proc/Mem/. The training algorithms are architecturally similar to Zen's, so this code provides a good understanding of the general approach.

The Eye Diagram

The eye diagram is the central concept in understanding why memory training exists and what it optimizes.

To create an eye diagram, you overlay many successive data bit transitions on top of each other, triggered on the clock or data strobe edge. If the signal is clean, the overlaid transitions form a pattern that looks like an open eye.

        Voltage
          ^
   VIH ---+         _____________________
          |        /   :             :   \
          |       /    :             :    \
          |      /     :             :     \
   Vref --+-----X------:--- EYE -----:------X-----
          |      \     :  (sample    :     /
          |       \    :   here)     :    /
          |        \   :             :   /
   VIL ---+         ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
          |            :             :
          +------------+-------------+-------> Time
                    setup          hold
                    margin         margin

                  |<--- eye width --->|

The goal of training is to find the center of the eye for each signal. This maximizes the margin against all sources of degradation, making the link as robust as possible against temperature drift, voltage fluctuation, and aging.

At DDR4-3200, data moves at 3200 MT/s on a 1600 MHz clock, so a single unit interval (one bit period) is 312.5 ps, with a 625 ps clock period. The JEDEC specification requires setup and hold margins that leave only tens of picoseconds of slack. At DDR4-2133, the unit interval is wider (about 469 ps), which is one reason lower speeds are more forgiving and easier to train.
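These numbers are easy to double-check. A small helper (names illustrative) makes the DDR arithmetic explicit: because data is transferred on both clock edges, the unit interval is the reciprocal of the transfer rate, half the clock period.

```python
def unit_interval_ps(mt_per_s: float) -> float:
    """One bit period (unit interval) in picoseconds. DDR moves one bit
    per data line on each clock edge, so the UI is the reciprocal of the
    transfer rate, not of the clock frequency."""
    return 1e6 / mt_per_s          # 1e12 ps/s divided by (mt_per_s * 1e6)

def clock_period_ps(mt_per_s: float) -> float:
    """Clock period in picoseconds; the clock runs at half the transfer rate."""
    return 2 * unit_interval_ps(mt_per_s)
```

For DDR4-3200 this gives a 312.5 ps unit interval and a 625 ps clock period; for DDR4-2133, roughly 469 ps and 938 ps.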

Per-bit training

Training is performed per DQ bit, not per DIMM or per byte lane. Each of the 64 data lines (plus 8 ECC lines on ECC DIMMs) in a DDR4 channel has its own trace on the motherboard with its own length, impedance, and coupling to adjacent traces. The optimal sampling point for DQ0 may differ from DQ7 by tens of picoseconds. Per-bit deskew registers allow the controller to fine-tune the delay for each individual data line.

Training Steps in Detail

The following sequence is based on the JEDEC DDR4 specification (JESD79-4B) and the open-source AGESA F15tn training code in coreboot. Zen 1 follows the same general sequence, though the implementation details are in proprietary ABL code.

1. ZQ Calibration

What it calibrates: Output driver impedance and on-die termination resistance inside each DRAM chip.

Each DQ output pin on a DRAM chip contains parallel 240-ohm resistor legs made of polysilicon. Polysilicon resistance is inherently imprecise and varies with temperature and manufacturing process. The DRAM chip has an internal calibration circuit connected to an external precision 240-ohm resistor on the ZQ pin (soldered onto the DIMM PCB).

When the controller issues a ZQCL (ZQ Calibration Long) command, the DRAM’s internal comparator adjusts p-channel transistors in a voltage divider until it reaches equilibrium at VDDQ/2, referenced against the external precision resistor. The resulting calibration values propagate to all DQ pins, so their drive strength and termination match the PCB impedance.

ZQCL runs once during initialization. Shorter ZQCS (ZQ Calibration Short) commands can run periodically during operation to compensate for temperature drift.
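The arithmetic and the feedback loop can be sketched as follows. This is a toy model, not the silicon behavior: the 1%-per-step trim and the leg model are illustrative assumptions; only the RZQ/n drive-strength relationship comes from JEDEC.

```python
RZQ = 240.0  # external precision resistor on the ZQ pin, in ohms

def drive_strength(legs: int) -> float:
    """Effective driver impedance with `legs` calibrated 240-ohm legs in
    parallel; JEDEC expresses DDR4 drive strengths as RZQ/n."""
    return RZQ / legs

def calibrate(actual_leg_ohms: float, steps: int = 32) -> int:
    """Toy model of the ZQCL feedback loop: walk a trim code until the
    divider formed by the trimmed leg and the external RZQ reference
    crosses VDDQ/2 (midpoint 0.5)."""
    for code in range(steps):
        trimmed = actual_leg_ohms * (1 - 0.01 * code)  # each step trims ~1%
        if RZQ / (RZQ + trimmed) >= 0.5:  # divider reached VDDQ/2
            return code
    return steps - 1
```

For example, a process-skewed 260-ohm leg needs 8 trim steps before the comparator flips, while a leg that already matches RZQ calibrates at code 0.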

2. Vref DQ Calibration

What it calibrates: The voltage threshold inside each DRAM chip that determines whether an incoming signal is a logic 0 or logic 1.

DDR4 uses POD (Pseudo Open Drain) signaling, where the signal swings between GND and VDDQ. Unlike DDR3, which used an external Vref pin, DDR4 generates Vref internally. The reference voltage is programmable via Mode Register 6 (MR6) in steps of 0.65% of VDDQ.

Vref calibration is the vertical centering of the eye diagram. If Vref is too high, noise on low signals can cause false 1 readings. If too low, noise on high signals can cause false 0 readings. The optimal Vref places the threshold at the point of maximum vertical margin.

DDR4 supports PDA (Per DRAM Addressability), which allows setting a different Vref for each individual DRAM chip on the DIMM. This is useful because DRAM chips at different physical positions on the DIMM see different signal amplitudes due to trace routing.
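The vertical sweep can be sketched as follows. `passes_at` is a hypothetical stand-in for programming MR6 and running a write/read verify at that threshold; the range-1 base of 60% of VDDQ and the 0.65% step follow JESD79-4.

```python
def vref_dq_mv(code: int, vddq_mv: float = 1200.0) -> float:
    """MR6 range-1 VrefDQ in millivolts: 60% of VDDQ plus 0.65% per step."""
    return vddq_mv * (0.60 + 0.0065 * code)

def center_vref(passes_at, codes=range(51)) -> int:
    """Sweep the Vref codes, collect the contiguous passing window, and
    return its midpoint: the vertical center of the eye."""
    window = [c for c in codes if passes_at(c)]
    if not window:
        raise RuntimeError("eye closed at every Vref setting")
    return (window[0] + window[-1]) // 2
```

With PDA, the same sweep runs once per DRAM chip rather than once per DIMM.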

3. Write Leveling

What it calibrates: The timing relationship between the data strobe (DQS) and the clock (CK) at each DRAM chip, compensating for fly-by routing skew.

DDR4 uses fly-by topology for the clock, address, and command signals. Instead of routing these signals in parallel to each DRAM chip (as in DDR2), they are routed serially: the signal enters at one end of the DIMM and passes through each chip in sequence. This improves signal integrity by reducing stub reflections, but it means each chip receives the clock at a different time. On a dual-rank DIMM with 16 chips, the clock skew between the first and last chip can be several hundred picoseconds.

The data signals (DQ) and data strobes (DQS), however, are routed directly from the controller to each byte lane. So the data arrives at all chips at roughly the same time, but the clock arrives at different times. Write leveling measures and compensates for this clock skew.

Algorithm:

  1. The controller sets MR1[7]=1, putting the DRAM in write-leveling mode.
  2. In this mode, the DRAM samples the clock (CK) on the rising edge of DQS and returns the sampled value on DQ.
  3. The controller sends a DQS pulse with an initial delay (the “seed value”). The AGESA F15tn code uses platform-specific seeds: 0x1A for unbuffered DIMMs, 0x3B for registered DIMMs.
  4. If DQ returns 0, the DQS pulse arrived before the CK rising edge. The controller increments the delay.
  5. The controller repeats, sweeping the delay until DQ transitions from 0 to 1. At this point, DQS is aligned with CK at that DRAM chip.
  6. The delay is locked and stored in a per-chip register.
  7. The process repeats for each DQS signal (each byte lane) on each DIMM.

The AGESA code shows this runs in two passes: Pass 1 at 400 MHz to get coarse alignment, then Pass 2 at the target frequency with seeds derived from Pass 1 results.
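The sweep in steps 3-5 reduces to a simple search. In this sketch, `sample_ck_at` is a hypothetical stand-in for sending a DQS pulse in write-leveling mode (MR1[7]=1) and reading back the sampled CK value on DQ; the real implementation also handles wraparound cases this toy ignores.

```python
def write_level(sample_ck_at, seed: int, max_delay: int = 255) -> int:
    """Sweep the DQS output delay upward from `seed` until the DRAM's
    sampled CK value flips from 0 to 1; that delay aligns DQS with CK
    at the chip. Runs once per byte lane per DIMM."""
    for delay in range(seed, max_delay + 1):
        if sample_ck_at(delay) == 1:
            return delay
    raise RuntimeError("write leveling failed: no 0->1 transition found")
```

Starting from a seed like 0x1A rather than zero shortens the search, which is exactly what the platform-specific seed values are for.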

4. DQS Receiver Enable (Gate Training)

What it calibrates: The timing of the DQS gate signal in the memory controller.

During a read, the DRAM drives DQS along with the data. But DQS is only valid during the read burst. Between bursts, DQS is floating, and electrical noise can cause false transitions that would corrupt the controller’s read FIFO. The DQS gate is a window that tells the controller when to “listen” for valid DQS transitions and when to ignore the line.

Algorithm:

  1. The controller issues reads and observes the DQS signal.
  2. It places the gate in the middle of a read burst and detects the 0-to-1 transition of DQS.
  3. The gate is advanced by 90 degrees of MEM_CLK.
  4. The controller walks the gate back one MEM_CLK at a time.
  5. When the gate first detects a 0 during the walkback, it has found the DQS preamble (the stable low period before the first DQS transition in a burst).
  6. The gate position is locked with quarter-bit accuracy.

The AGESA code implements this as HwBasedDQSReceiverEnableTrainingPart1 and Part2, with both hardware-assisted and software fallback paths.
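A toy version of the walkback: `dqs_level_at` is a hypothetical stand-in for sampling DQS at a given gate position, and the delay units and values are illustrative (real hardware works in fractions of MEM_CLK, giving the quarter-bit accuracy mentioned above).

```python
def train_dqs_gate(dqs_level_at, burst_position: int, mem_clk: int = 4) -> int:
    """Start with the gate inside a known read burst, step it earlier one
    MEM_CLK at a time, and stop at the first position where DQS reads
    stable low: the preamble before the burst's first transition."""
    gate = burst_position
    while gate >= 0 and dqs_level_at(gate) != 0:
        gate -= mem_clk          # walk back one MEM_CLK per step
    if gate < 0:
        raise RuntimeError("gate training failed: preamble not found")
    return gate
```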

5. Read Centering

What it calibrates: The internal read capture delay, placing the sampling point at the center of the read data eye.

This is the horizontal centering for reads. The controller needs to sample incoming data (DQ) at the moment when the signal is most stable, which is the center of the eye.

Algorithm:

  1. The controller enables MPR (Multi-Purpose Register) mode by setting MR3[2]=1. This causes reads to return known test patterns from four 8-bit registers instead of actual memory contents. The patterns are: MPR0=01010101, MPR1=00110011, MPR2=00001111, MPR3=00000000.
  2. The controller issues continuous reads.
  3. It incrementally sweeps the internal read delay register from one extreme to the other.
  4. At each delay value, it compares the received data against the expected MPR pattern.
  5. The first delay value that produces correct data is the left edge of the eye. The last delay value that produces correct data is the right edge.
  6. The read delay is set to the midpoint between left and right edges.
  7. This is performed per DQ bit (per-bit deskew), because each data line has a slightly different optimal sampling point.
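The edge-finding logic above can be sketched as follows, with `read_at` a hypothetical stand-in for an MPR read at a given capture delay; the MPR page-0 patterns are the JEDEC defaults quoted in step 1.

```python
# JEDEC MPR page-0 default patterns returned in MPR mode (MR3[2]=1)
MPR_PATTERNS = (0b01010101, 0b00110011, 0b00001111, 0b00000000)

def center_read_delay(read_at, delays=range(64)) -> int:
    """Sweep the read-capture delay, keep the delays whose MPR read-back
    matches the expected pattern, and return the midpoint of that window.
    In real training this runs per DQ bit (per-bit deskew)."""
    passing = [d for d in delays if read_at(d) == MPR_PATTERNS[0]]
    if not passing:
        raise RuntimeError("read eye closed")
    return (passing[0] + passing[-1]) // 2
```

The first and last passing delays are the left and right eye edges; the returned midpoint is the sampling point with maximum horizontal margin.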

6. Write Centering

What it calibrates: The timing of write data (DQ) relative to the write strobe (DQS), placing the data transition at the center of the DRAM’s write data eye.

Algorithm:

  1. The controller writes a known pattern to DRAM, then reads it back.
  2. It incrementally adjusts the write delay for each DQ bit relative to DQS.
  3. At each delay value, it compares the read-back data against the written pattern.
  4. The range of delay values that produce correct read-back is the passing window (the eye).
  5. Left and right edges are identified, and the delay is set to the center.
  6. Repeated per DQ bit.

More sophisticated training algorithms (available in newer controllers and used in Synopsys DDR PHY IP) use PRBS (Pseudo-Random Bit Sequence) patterns such as PRBS23 or PRBS31 instead of simple toggle patterns. PRBS patterns stress worst-case inter-symbol interference, crosstalk, and simultaneous switching noise, effects that simple patterns miss. This produces a smaller but more realistic eye during training, leading to a more robust center point during actual operation.
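A PRBS generator is just a linear-feedback shift register. This sketch uses the standard feedback polynomials (x^23 + x^18 + 1 for PRBS23, x^31 + x^28 + 1 for PRBS31); the tap convention is one of the two equivalent LFSR forms, so the exact bit ordering may differ from a given PHY's implementation.

```python
def prbs(taps, state=1):
    """Fibonacci LFSR bit generator. `taps` are 1-indexed polynomial
    powers, e.g. (23, 18) for PRBS23 or (31, 28) for PRBS31. A maximal
    polynomial yields a sequence with period 2**max(taps) - 1."""
    width = max(taps)
    mask = (1 << width) - 1
    while True:
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1   # XOR the tapped bits
        state = ((state << 1) | bit) & mask  # shift the feedback bit in
        yield bit
```

The near-random run lengths are what exercise inter-symbol interference: every combination of recent bit history appears, unlike a 0101 toggle.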

7. MaxRdLatency Training

What it calibrates: The maximum time the memory controller waits for read data to arrive after issuing a read command.

This accounts for the total round-trip delay through the controller pipeline, PHY, PCB traces, DRAM internal CAS latency, and the return path. If this value is too short, the controller stops listening before the data arrives. If too long, it wastes bandwidth waiting for data that already arrived.

The controller issues reads with progressively longer latency windows until it finds the minimum value that consistently captures the data.

Complete AGESA Training Sequence

From the open-source F15tn code in mntrain3.c:

  1. EnterHardwareTraining — initialize training mode
  2. LrdimmBuf2DramTrain — LRDIMM buffer training (if applicable)
  3. SwWLTraining — software write leveling
  4. HwBasedWLTrainingPart1 — hardware write leveling pass 1 (at 400 MHz)
  5. HwBasedDQSReceiverEnableTrainingPart1 — receiver enable pass 1
  6. Frequency ramp loop (increase speed, retrain):
  7. NonOptimizedSWDQSRecEnTrainingPart1 — basic DQS recovery adjustments
  8. OptimizedSwDqsRecEnTrainingPart1 — refined DQS recovery
  9. NonOptimizedSRdWrPosTraining — initial read/write position training
  10. OptimizedSRdWrPosTraining — optimized read/write centering
  11. MaxRdLatencyTraining — maximum read latency calibration
  12. TrainExitHwTrn — exit hardware training mode

On-Die Termination and ProcODT

The reflection problem

When a high-speed electrical signal reaches the end of a transmission line, any mismatch between the trace impedance and the termination impedance causes part of the signal energy to reflect back toward the source. These reflections interfere with subsequent signals and can close the eye, causing data errors.
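The size of a reflection follows directly from the mismatch: the reflection coefficient is (Z_term - Z0) / (Z_term + Z0). A quick sketch (the 40-ohm trace impedance in the comment is illustrative):

```python
def reflection_coefficient(z_term: float, z0: float) -> float:
    """Fraction of the incident wave amplitude reflected where a trace of
    characteristic impedance z0 meets a termination of impedance z_term."""
    return (z_term - z0) / (z_term + z0)

# A matched termination absorbs the whole edge; a 60-ohm termination on a
# 40-ohm trace reflects 20% of the incident amplitude back down the line.
```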

DDR4 uses on-die termination (ODT) to address this. Instead of external termination resistors on the motherboard (as in earlier DDR generations), the termination resistors are built into the DRAM chips and the CPU’s memory controller. This reduces stub lengths and improves signal integrity at high frequencies.

Two sides of the termination

DRAM-side termination is controlled by three Mode Register settings: RTT_NOM (MR1), applied while the ODT pin is asserted; RTT_WR (MR2), applied while the rank is being written; and RTT_PARK (MR5), applied while the ODT pin is deasserted.

Processor-side termination (ProcODT) is the termination impedance at the CPU’s memory controller pins. This is what the BIOS setting controls.

ProcODT’s role in training

ProcODT is a precondition for training, not a parameter that training adjusts. It must be set before training begins (it is part of the APCB configuration that ABL reads). If ProcODT is wrong, reflections at the controller end of the bus are not fully absorbed, the eye closes, and training either fails outright or converges on a marginal center that passes POST but produces errors under load.

Recommended ProcODT ranges for AMD Zen 1:

  - 4 single-rank DIMMs: 40-53.3 ohms
  - 4 dual-rank DIMMs: 48-60 ohms
  - 8 dual-rank DIMMs: 53.3-60 ohms (sometimes 68 ohms)

These ranges are empirical, drawn from community tuning experience and overclocker reports rather than AMD-published specifications. AMD does not document recommended ProcODT settings per DIMM configuration.

ProcODT interacts with DRAM-side RTT values and DRAM voltage. Higher ProcODT values work better with lower DRAM voltages; running high ProcODT with high DRAM voltage simultaneously can degrade signal quality. These interactions are why memory tuning on fully populated boards can require iteration.

Additional voltage rails

Two other voltage rails affect memory signal quality on Zen: the SoC voltage (VSOC), which powers the memory controller and PHY on the CPU die, and VDDP, which supplies the DDR PHY logic. Marginal VSOC is a common cause of cold-boot training failures, and small VDDP adjustments can change which memory frequencies train successfully.

What Manual Settings Control vs What Training Determines

Understanding the division of labor between user-configured BIOS settings and the automatic training process:

Set manually (or via SPD/XMP)

These are the “target” that training optimizes around: the memory frequency, the primary and secondary timings, the DRAM voltage, ProcODT, and the command rate.

Determined automatically by training

These are the per-signal, per-bit calibration results: ZQ driver and termination calibration codes, per-chip VrefDQ values, per-lane write-leveling delays, DQS receiver-enable gate positions, per-bit read and write capture delays, and MaxRdLatency.

The manual settings define the operating envelope. Training finds the optimal operating point within that envelope for the specific physical hardware. This is why the same manual settings can produce different training outcomes on different boards, or even on the same board at different temperatures.

Training Failure Modes

Symptoms and causes

| Symptom | Likely cause |
| --- | --- |
| System power-cycles 2-3 times, then boots | Normal training behavior. Each cycle is a retry with adjusted parameters. |
| System power-cycles repeatedly, then boots to a "CMOS reset" message | Training failed after exhausting retries. AGESA fell back to safe defaults (overclock recovery). |
| Stuck on POST code 0d indefinitely (Gigabyte/AMI boards) | Training failed and AGESA is stuck. Requires manual power-off and CMOS clear. |
| Stuck on POST code C0 or C7 | Memory detection failure (C0) or memory initialization failure (C7). Often a seating issue or incompatible DIMM. |
| Boots but with intermittent errors under load | Training succeeded with marginal centering. The eye is open but narrow. |
| Cold boot fails but warm reboot works | Temperature change between power states shifts signal characteristics enough to push marginal training over the edge. Often a ProcODT or SoC voltage issue. |
| Intermittent training success/failure on identical settings | Multiple possible causes: ProcODT on Auto selecting different values each attempt; training searching a large parameter space with insufficient guidance from manual settings; a dead CMOS battery losing training results between boots. |

The 0d code specifically

On Gigabyte X399 boards, POST code 0d is documented as “Reserved for future AMI SEC error codes.” In practice, on Zen 1 Threadripper, it indicates that the ABL memory training stage failed to find a stable configuration. If the code flashes by and the system continues, it means one training attempt failed and the PSP retried with different parameters. If the code sticks, training has exhausted its retry budget.

Open Source Status and openSIL

Current status (as of 2026)

| Platform | Training code | Status |
| --- | --- | --- |
| AMD Family 14h (Ontario) | Open source in coreboot | Readable, auditable |
| AMD Family 15h (Trinity/Kaveri) | Open source in coreboot | Readable, auditable |
| AMD Family 16h (Kabini) | Open source in coreboot | Readable, auditable |
| AMD Family 17h (Zen 1, Zen+, Zen 2) | Proprietary ABL binary blobs | Not readable |
| AMD Family 19h (Zen 3, Zen 4) | Proprietary ABL binary blobs | Not readable |
| AMD Family 1Ah (Zen 5) | Proprietary ABL binary blobs | Not readable |

AMD open-sourced AGESA in 2011, but only for the platforms listed above. For Family 17h onward, AGESA is distributed as pre-built binaries. AMD’s stated reason is the need to protect IP while enabling faster coreboot support.

openSIL

AMD announced openSIL (Open Silicon Initialization Library) in 2023 as an open-source replacement for AGESA. It consists of three statically linked libraries (xSIM, xPRF, xUSL) under an MIT license. openSIL is expected to be production-ready for 6th-generation EPYC (“Venice”) and Ryzen Zen 6 (“Medusa”) in 2026-2027, at which point AGESA will reach end-of-life.

A proof-of-concept port to an AM5 motherboard with coreboot has been demonstrated. However, memory training under openSIL is still described as running on “embedded microcontrollers prior to x86 reset de-assertion,” suggesting the PSP will continue to handle DRAM initialization even under the new architecture.

References

DDR4 Specification and Training

AMD AGESA and PSP

Signal Integrity

Practical Tuning

Training Data Persistence