A detailed technical reference on the DDR4 memory training process, with particular attention to AMD Zen 1 / Threadripper platforms. This is the deep-dive companion to my ZFS NAS build writeup.
I used Claude (Anthropic’s AI assistant) for research and writing help on this piece.
DDR4 memory training is a calibration process that runs during POST, before the operating system loads. Its purpose is to characterize the electrical properties of the specific physical path between the CPU’s memory controller and each DRAM chip on each installed DIMM, then configure the timing and voltage parameters needed for reliable data transfer at the target speed.
The need for training arises from manufacturing variation. Every motherboard has slightly different trace lengths, impedance characteristics, and crosstalk profiles. Every DRAM chip has slightly different internal timing and drive strength characteristics. Even two boards of the same model with identical DIMMs will have different optimal parameters. Training measures the actual electrical behavior of the specific system and adjusts dozens of timing registers to match.
The entire process can be understood through a single concept: the eye diagram.
On AMD Zen (Family 17h) platforms, including Threadripper 1000-series, memory training does not run on the x86 cores. It runs on the PSP (Platform Security Processor), an ARM Cortex-A5 core embedded in the CPU die.
The boot sequence:

1. At power-on, the PSP boots from its on-chip ROM and loads the ABL (AGESA Bootloader) stages from SPI flash.
2. The ABL stages read memory configuration from the APCB (AGESA PSP Configuration Block), then initialize and train DRAM.
3. Training results are written to the APOB (AGESA PSP Output Block).
4. Only then does the PSP release the x86 cores from reset, which begin executing UEFI firmware from DRAM.
This means DRAM is fully initialized and trained before the x86 CPU executes a single instruction. The POST codes visible on the motherboard’s debug LED (like 0d, C0, AE) come from ABL stages running on the PSP, not from UEFI/BIOS code on the x86 cores.
Training results can be cached in SPI flash (as part of the APOB NV copy). On S3 resume, the PSP replays the saved training data instead of retraining from scratch, which is much faster. On cold boot (S5), the behavior depends on the BIOS’s Fast Boot setting: enabled replays cached data, disabled forces a full retrain. Research by PC Engines found that the APOB training data can have byte-level variations across boots, making cached results unreliable for cold boot on some platforms. This is one reason why disabling Fast Boot is recommended for stability-critical configurations.
The training code for Zen 1 is proprietary. It is distributed as binary ABL blobs and is not readable or auditable.
However, AMD open-sourced AGESA for older platforms (Family 14h, 15h,
16h). The full memory training source code for these platforms is
available in coreboot at
src/vendorcode/amd/agesa/f15tn/Proc/Mem/. The training
algorithms are architecturally similar to Zen, so this code provides a
good understanding of the general approach. Key files:
- NB/mntrain3.c — training sequence dispatcher
- Tech/DDR3/mttwl3.c — write leveling implementation
- Main/mmflow.c — memory initialization flow
- Main/mmStandardTraining.c — top-level training entry point

The eye diagram is the central concept in understanding why memory training exists and what it optimizes.
To create an eye diagram, you overlay many successive data bit transitions on top of each other, triggered on the clock or data strobe edge. If the signal is clean, the overlaid transitions form a pattern that looks like an open eye.
```
    Voltage
       ^
VIH ---+         _____________________
       |        /     :         :     \
       |       /      :         :      \
       |      /       :         :       \
Vref --+-----X--------:-- EYE --:--------X-----
       |      \       : (sample :       /
       |       \      :  here)  :      /
       |        \     :         :     /
VIL ---+         ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
       |              :         :
       +--------------+---------+--------> Time
                   setup     hold
                   margin    margin
              |<----- eye width ----->|
```
The goal of training is to find the center of the eye for each signal. This maximizes the margin against all sources of degradation, making the link as robust as possible against temperature drift, voltage fluctuation, and aging.
At DDR4-3200 (1600 MHz clock, with data transferred on both clock edges), a single unit interval (one bit period) is 312.5 ps. The JEDEC specification requires setup and hold margins that leave only tens of picoseconds of slack. At DDR4-2133, the unit interval is wider (about 469 ps), which is one reason lower speeds are more forgiving and easier to train.
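The arithmetic is worth making explicit. A quick sketch (plain Python, nothing DDR4-specific beyond the transfer rates):

```python
# Unit interval (UI) arithmetic for common DDR4 speed grades.
# DDR4 transfers data on both clock edges, so one bit period is
# 1 / (transfer rate), i.e. half the clock period.

def unit_interval_ps(transfers_per_sec: float) -> float:
    """Bit period in picoseconds for a given transfer rate."""
    return 1e12 / transfers_per_sec

for grade in (2133, 2666, 2933, 3200):
    ui = unit_interval_ps(grade * 1e6)
    print(f"DDR4-{grade}: UI = {ui:.1f} ps")
# DDR4-3200 -> 312.5 ps; DDR4-2133 -> ~468.8 ps
```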
Training is performed per DQ bit, not per DIMM or per byte lane. Each of the 64 data lines (plus 8 ECC lines on ECC DIMMs) in a DDR4 channel has its own trace on the motherboard with its own length, impedance, and coupling to adjacent traces. The optimal sampling point for DQ0 may differ from DQ7 by tens of picoseconds. Per-bit deskew registers allow the controller to fine-tune the delay for each individual data line.
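The per-bit search that training performs can be sketched as a delay-tap sweep: try each tap, record which taps pass a data test, then program the midpoint of the longest passing run. This is an illustrative model, not AMD's implementation; the tap count and the test callback are hypothetical.

```python
def center_delay(test_bit, taps=range(64)):
    """Sweep delay taps for one DQ bit and return the center of the
    longest contiguous passing window (None if nothing passed)."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for tap in taps:
        if test_bit(tap):          # e.g. write/read-back compare at this tap
            if run_start is None:
                run_start, run_len = tap, 0
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    if best_start is None:
        return None
    return best_start + best_len // 2

# Toy "eye": taps 20..39 pass, everything else fails.
print(center_delay(lambda tap: 20 <= tap < 40))  # -> 30
```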
The following sequence is based on the JEDEC DDR4 specification (JESD79-4B) and the open-source AGESA F15tn training code in coreboot. Zen 1 follows the same general sequence, though the implementation details are in proprietary ABL code.
What it calibrates: Output driver impedance and on-die termination resistance inside each DRAM chip.
Each DQ output pin on a DRAM chip contains parallel 240-ohm resistor legs made of polysilicon. Polysilicon resistance is inherently imprecise and varies with temperature and manufacturing process. The DRAM chip has an internal calibration circuit connected to an external precision 240-ohm resistor on the ZQ pin (soldered onto the DIMM PCB).
When the controller issues a ZQCL (ZQ Calibration Long) command, the DRAM’s internal comparator adjusts p-channel transistors in a voltage divider until it reaches equilibrium at VDDq/2, referenced against the external precision resistor. The resulting calibration values propagate to all DQ pins, so their drive strength and termination match the PCB impedance.
ZQCL runs once during initialization. Shorter ZQCS (ZQ Calibration Short) commands can run periodically during operation to compensate for temperature drift.
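The comparator loop lends itself to a successive-approximation sketch. Everything below is a toy model: the leg-resistance curve and 7-bit code width are invented for illustration, not taken from any datasheet.

```python
# Toy model of ZQ calibration: binary-search a digital code that sets
# the pull-up strength until it matches the external 240-ohm reference.

R_EXT = 240.0  # external precision resistor on the ZQ pin

def pullup_resistance(code: int) -> float:
    # Hypothetical monotone curve: higher code = lower resistance,
    # spanning roughly 290..190 ohms across a 7-bit code.
    return 290.0 - code * (100.0 / 127)

def zq_calibrate(bits: int = 7) -> int:
    """Successive approximation: find the largest code whose pull-up
    resistance is still at or above the 240-ohm reference."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        # Resistance still too high -> keep the bit to weaken the leg.
        if pullup_resistance(trial) >= R_EXT:
            code = trial
    return code

code = zq_calibrate()
print(code, round(pullup_resistance(code), 1))  # -> 63 240.4
```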
What it calibrates: The voltage threshold inside each DRAM chip that determines whether an incoming signal is a logic 0 or logic 1.
DDR4 uses POD (Pseudo Open Drain) signaling, where the signal swings between GND and VDDQ. Unlike DDR3, which used an external Vref pin, DDR4 generates Vref internally. The reference voltage is programmable via Mode Register 6 (MR6) with a step size of 0.65% of VDDQ per step.
Vref calibration is the vertical centering of the eye diagram. If Vref is too high, noise on low signals can cause false 1 readings. If too low, noise on high signals can cause false 0 readings. The optimal Vref places the threshold at the point of maximum vertical margin.
DDR4 supports PDA (Per DRAM Addressability), which allows setting a different Vref for each individual DRAM chip on the DIMM. This is useful because DRAM chips at different physical positions on the DIMM see different signal amplitudes due to trace routing.
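A minimal sketch of the vertical sweep, using the real MR6 range-1 encoding (60.0% of VDDQ plus 0.65% per step, per JESD79-4) but a stand-in pass/fail test in place of an actual read-back compare:

```python
# Vref centering sketch: sweep MR6 VrefDQ codes, record which ones
# read back correctly, program the middle of the passing band.

VDDQ = 1.2  # volts

def vref_from_code(code: int) -> float:
    """VrefDQ as a fraction of VDDQ for an MR6 range-1 code."""
    return 0.60 + 0.0065 * code

def center_vref(passes, codes=range(0, 51)) -> int:
    # Assumes one contiguous passing band, as in a clean eye.
    passing = [c for c in codes if passes(c)]
    return passing[len(passing) // 2]

# Toy eye: codes whose Vref lands between 0.68 and 0.78 of VDDQ pass.
best = center_vref(lambda c: 0.68 <= vref_from_code(c) <= 0.78)
print(best, f"{vref_from_code(best) * VDDQ:.3f} V")  # -> 20 0.876 V
```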
What it calibrates: The timing relationship between the data strobe (DQS) and the clock (CK) at each DRAM chip, compensating for fly-by routing skew.
DDR4 uses fly-by topology for the clock, address, and command signals. Instead of routing these signals in parallel to each DRAM chip (as in DDR2), they are routed serially: the signal enters at one end of the DIMM and passes through each chip in sequence. This improves signal integrity by reducing stub reflections, but it means each chip receives the clock at a different time. On a dual-rank DIMM with 16 chips, the clock skew between the first and last chip can be several hundred picoseconds.
The data signals (DQ) and data strobes (DQS), however, are routed directly from the controller to each byte lane. So the data arrives at all chips at roughly the same time, but the clock arrives at different times. Write leveling measures and compensates for this clock skew.
Algorithm:

1. The controller puts the DRAM into write leveling mode via MR1.
2. It drives a DQS pulse; the DRAM samples CK with that DQS rising edge and returns the sampled value (0 or 1) on its DQ lines.
3. The controller steps its DQS output delay and repeats until the feedback transitions from 0 to 1, meaning the DQS edge now arrives aligned with the CK rising edge at that chip.
4. The resulting delay is stored per byte lane.
The AGESA code shows this runs in two passes: Pass 1 at 400 MHz to get coarse alignment, then Pass 2 at the target frequency with seeds derived from Pass 1 results.
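The feedback loop can be sketched as follows. The clock period is real for DDR4-3200; the chip model, skew value, and step size are illustrative.

```python
# Write-leveling feedback loop sketch. In JEDEC write-leveling mode the
# DRAM samples CK with each DQS rising edge and mirrors the sampled
# value back on DQ. The controller steps its DQS delay until the
# feedback flips 0 -> 1: DQS is now aligned to CK at that chip.

TCK_PS = 625.0  # clock period at DDR4-3200 (1600 MHz clock)

def dram_feedback(dqs_delay_ps: float, clock_skew_ps: float) -> int:
    """What the chip samples: 1 if the DQS edge lands in the high half
    of CK as seen at the chip (which receives CK clock_skew_ps late)."""
    phase = (dqs_delay_ps - clock_skew_ps) % TCK_PS
    return 1 if phase < TCK_PS / 2 else 0

def write_level(clock_skew_ps: float, step_ps: float = 5.0) -> float:
    """Step the DQS delay until the feedback transitions 0 -> 1."""
    delay = 0.0
    prev = dram_feedback(delay, clock_skew_ps)
    while True:
        delay += step_ps
        cur = dram_feedback(delay, clock_skew_ps)
        if prev == 0 and cur == 1:
            return delay
        prev = cur

# A chip far down the fly-by chain sees CK ~200 ps late:
print(write_level(200.0))  # settles at the 200 ps skew
```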
What it calibrates: The timing of the DQS gate signal in the memory controller.
During a read, the DRAM drives DQS along with the data. But DQS is only valid during the read burst. Between bursts, DQS is floating, and electrical noise can cause false transitions that would corrupt the controller’s read FIFO. The DQS gate is a window that tells the controller when to “listen” for valid DQS transitions and when to ignore the line.
Algorithm:

1. The controller issues a read and opens the gate at a candidate delay.
2. It checks whether the expected DQS edges were captured: opening too early admits pre-burst noise, while opening too late clips the start of the burst.
3. The gate-open delay is swept until the window cleanly brackets the read preamble and burst, then stored per byte lane.
The AGESA code implements this as HwBasedDQSReceiverEnableTrainingPart1 and Part2, with both hardware-assisted and software fallback paths.
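A sketch of the gate sweep, with invented timing numbers standing in for a real preamble detector:

```python
# Receiver-enable (DQS gate) sweep sketch: slide the gate-open delay,
# issue a read, and check whether the burst is captured cleanly.
# All timing values here are toy numbers, not measured ones.

BURST_START_PS = 1200.0   # when the DQS burst begins (toy value)

def edges_captured(gate_open_ps: float) -> bool:
    """True if the gate opens inside the preamble: after the line stops
    floating (100 ps before the burst) but before the first DQS edge."""
    return BURST_START_PS - 100.0 <= gate_open_ps <= BURST_START_PS

def train_gate(step_ps: float = 25.0, limit_ps: float = 5000.0):
    delay = 0.0
    while delay <= limit_ps:
        if edges_captured(delay):
            return delay          # first delay that captures cleanly
        delay += step_ps
    return None                    # gate training failed

print(train_gate())  # -> 1100.0
```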
What it calibrates: The internal read capture delay, placing the sampling point at the center of the read data eye.
This is the horizontal centering for reads. The controller needs to sample incoming data (DQ) at the moment when the signal is most stable, which is the center of the eye.
Algorithm:

1. The controller reads a known pattern, for example from the DRAM's MPR (Multi-Purpose Register), which returns fixed data without touching the memory array.
2. For each DQ bit, it sweeps the read capture delay across its range and records which settings return the pattern correctly.
3. The passing region defines the read eye width for that bit; the delay is programmed to its center.
What it calibrates: The timing of write data (DQ) relative to the write strobe (DQS), placing the data transition at the center of the DRAM’s write data eye.
Algorithm:

1. The controller writes a known pattern to the DRAM, then reads it back.
2. For each DQ bit, it sweeps the DQ-to-DQS output delay and records which settings read back correctly.
3. The passing region defines the write eye; the delay is set to its center. (Write training depends on read training already being complete, since results are verified via read-back.)
More sophisticated training algorithms (available in newer controllers and used in Synopsys DDR PHY IP) use PRBS (Pseudo-Random Bit Sequence) patterns like PRBS23 or PRBS31 instead of simple toggle patterns. PRBS patterns stress worst-case inter-symbol interference, crosstalk, and simultaneous switching noise patterns that simple patterns miss. This produces a smaller but more realistic eye during training, leading to a more robust center point during actual operation.
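For reference, PRBS23 is generated by a 23-bit LFSR with the polynomial x^23 + x^18 + 1 (per ITU-T O.150). A minimal generator:

```python
# PRBS23 generator (polynomial x^23 + x^18 + 1), the kind of stress
# pattern newer PHYs drive during training instead of simple toggles.

def prbs23(seed: int = 0x7FFFFF):
    """Yield an endless PRBS23 bit stream from a 23-bit LFSR.
    Any nonzero seed works; the sequence period is 2**23 - 1."""
    state = seed & 0x7FFFFF
    while True:
        # Feedback = XOR of taps 23 and 18 (bit positions 22 and 17).
        new_bit = ((state >> 22) ^ (state >> 17)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFF
        yield new_bit

gen = prbs23()
print([next(gen) for _ in range(16)])
```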
What it calibrates: The maximum time the memory controller waits for read data to arrive after issuing a read command.
This accounts for the total round-trip delay through the controller pipeline, PHY, PCB traces, DRAM internal CAS latency, and the return path. If this value is too short, the controller stops listening before the data arrives. If too long, it wastes bandwidth waiting for data that already arrived.
The controller issues reads with progressively longer latency windows until it finds the minimum value that consistently captures the data.
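A sketch of that search, with a toy round-trip delay and a one-step guard band (the guard-band policy is an assumption for illustration, not documented AGESA behavior):

```python
# Max-read-latency search sketch: widen the controller's wait window
# until reads return correct data, then add a guard band.
# try_read() stands in for an actual read-compare at a given window.

ROUND_TRIP_PS = 4700.0  # toy round-trip delay

def try_read(window_ps: float) -> bool:
    """Data is captured only if the window covers the round trip."""
    return window_ps >= ROUND_TRIP_PS

def train_max_rd_latency(step_ps: float = 312.5, guard_steps: int = 1):
    window = 0.0
    while not try_read(window):
        window += step_ps
    return window + guard_steps * step_ps  # minimum found + guard band

print(train_max_rd_latency())  # -> 5312.5
```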
From the open-source F15tn code in mntrain3.c:
- EnterHardwareTraining — initialize training mode
- LrdimmBuf2DramTrain — LRDIMM buffer training (if applicable)
- SwWLTraining — software write leveling
- HwBasedWLTrainingPart1 — hardware write leveling pass 1 (at 400 MHz)
- HwBasedDQSReceiverEnableTrainingPart1 — receiver enable pass 1
- HwBasedWLTrainingPart2 — write leveling at target frequency
- HwBasedDQSReceiverEnableTrainingPart2 — receiver enable at target frequency
- NonOptimizedSWDQSRecEnTrainingPart1 — basic DQS recovery adjustments
- OptimizedSwDqsRecEnTrainingPart1 — refined DQS recovery
- NonOptimizedSRdWrPosTraining — initial read/write position training
- OptimizedSRdWrPosTraining — optimized read/write centering
- MaxRdLatencyTraining — maximum read latency calibration
- TrainExitHwTrn — exit hardware training mode

When a high-speed electrical signal reaches the end of a transmission line, any mismatch between the trace impedance and the termination impedance causes part of the signal energy to reflect back toward the source. These reflections interfere with subsequent signals and can close the eye, causing data errors.
DDR4 uses on-die termination (ODT) to address this. Instead of external termination resistors on the motherboard (as in earlier DDR generations), the termination resistors are built into the DRAM chips and the CPU’s memory controller. This reduces stub lengths and improves signal integrity at high frequencies.
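The degree of mismatch is captured by the reflection coefficient Γ = (Z_load − Z_trace) / (Z_load + Z_trace). A quick illustration (the 40-ohm trace impedance is a typical value, not a measured one):

```python
# Reflection coefficient for a termination mismatch:
#   gamma = (Z_load - Z_trace) / (Z_load + Z_trace)
# A perfectly matched termination reflects nothing; the further the
# ODT value sits from the trace impedance, the larger the reflected
# fraction of the incident wave.

Z_TRACE = 40.0  # illustrative trace impedance in ohms

def reflection_coefficient(z_load: float, z_trace: float = Z_TRACE) -> float:
    return (z_load - z_trace) / (z_load + z_trace)

for odt in (34.0, 40.0, 48.0, 60.0, 80.0):
    g = reflection_coefficient(odt)
    print(f"ODT {odt:5.1f} ohm: {g:+.1%} reflected")
# 40 ohms -> 0%; 60 ohms -> +20%; 80 ohms -> +33.3%
```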
DRAM-side termination is controlled by three Mode Register settings: RTT_NOM (MR1), the nominal termination active during normal operation; RTT_WR (MR2), the dynamic termination applied while the chip is being written; and RTT_PARK (MR5), the termination applied when the chip's ODT pin is low, such as on non-targeted ranks.
Processor-side termination (ProcODT) is the termination impedance at the CPU’s memory controller pins. This is what the BIOS setting controls.
ProcODT is a precondition for training, not a parameter that training adjusts. It must be set before training begins (it is part of the APCB configuration that ABL reads). If ProcODT is wrong:

- reflections at the controller distort incoming read data, closing the eye before training even starts;
- training may fail outright, or converge on a marginal center that passes POST but produces errors under load;
- cold boot becomes less reliable, since temperature shifts push an already-marginal eye over the edge.
Recommended ProcODT ranges for AMD Zen 1:

- 4 single-rank DIMMs: 40-53.3 ohms
- 4 dual-rank DIMMs: 48-60 ohms
- 8 dual-rank DIMMs: 53.3-60 ohms (sometimes 68 ohms)
These ranges are empirical, drawn from community tuning experience and overclocker reports rather than AMD-published specifications. AMD does not document recommended ProcODT settings per DIMM configuration.
ProcODT interacts with DRAM-side RTT values and DRAM voltage. Higher ProcODT values work better with lower DRAM voltages; running high ProcODT with high DRAM voltage simultaneously can degrade signal quality. These interactions are why memory tuning on fully populated boards can require iteration.
Two other voltage rails affect memory signal quality on Zen: VSOC, which powers the memory controller and the uncore portion of the die, and VDDP, which supplies the DDR PHY. Raising VSOC modestly can stabilize heavily loaded configurations; VDDP adjustments are occasionally needed to clear instability at specific memory frequencies.
Understanding the division of labor between user-configured BIOS settings and the automatic training process:
These are the "target" that training optimizes around: memory frequency, the primary and secondary timings (tCL, tRCD, tRP, tRAS, and so on), DRAM voltage, ProcODT, the DRAM-side RTT values, and command rate.
These are the per-signal, per-bit calibration results: write leveling delays per byte lane, DQS receiver enable (gate) delays, per-bit read and write deskew values, internal Vref settings per DRAM chip, ZQ output impedance codes, and the maximum read latency.
The manual settings define the operating envelope. Training finds the optimal operating point within that envelope for the specific physical hardware. This is why the same manual settings can produce different training outcomes on different boards, or even on the same board at different temperatures.
| Symptom | Likely cause |
|---|---|
| System power-cycles 2-3 times, then boots | Normal training behavior. Each cycle is a retry with adjusted parameters. |
| System power-cycles repeatedly, then boots to “CMOS reset” message | Training failed after exhausting retries. AGESA fell back to safe defaults (overclock recovery). |
| Stuck on POST code 0d indefinitely (Gigabyte/AMI boards) | Training failed and AGESA is stuck. Requires manual power-off and CMOS clear. |
| Stuck on POST code C0 or C7 | Memory detection failure (C0) or memory initialization failure (C7). Often a seating issue or incompatible DIMM. |
| Boots but with intermittent errors under load | Training succeeded with marginal centering. The eye is open but narrow. |
| Cold boot fails but warm reboot works | Temperature change between power states shifts signal characteristics enough to push marginal training over the edge. Often a ProcODT or SoC voltage issue. |
| Intermittent training success/failure on identical settings | Multiple possible causes: ProcODT on Auto selecting different values each attempt; training algorithm searching a large parameter space with insufficient guidance from manual settings; dead CMOS battery losing training results between boots. |
On Gigabyte X399 boards, POST code 0d is documented as “Reserved for future AMI SEC error codes.” In practice, on Zen 1 Threadripper, it indicates that the ABL memory training stage failed to find a stable configuration. If the code flashes by and the system continues, it means one training attempt failed and the PSP retried with different parameters. If the code sticks, training has exhausted its retry budget.
| Platform | Training code | Status |
|---|---|---|
| AMD Family 14h (Ontario) | Open source in coreboot | Readable, auditable |
| AMD Family 15h (Trinity/Kaveri) | Open source in coreboot | Readable, auditable |
| AMD Family 16h (Kabini) | Open source in coreboot | Readable, auditable |
| AMD Family 17h (Zen 1, Zen+, Zen 2) | Proprietary ABL binary blobs | Not readable |
| AMD Family 19h (Zen 3, Zen 4) | Proprietary ABL binary blobs | Not readable |
| AMD Family 1Ah (Zen 5) | Proprietary ABL binary blobs | Not readable |
AMD open-sourced AGESA in 2011, but only for the platforms listed above. For Family 17h onward, AGESA is distributed as pre-built binaries. AMD’s stated reason is the need to protect IP while enabling faster coreboot support.
AMD announced openSIL (Open Silicon Initialization Library) in 2023 as an open-source replacement for AGESA. It consists of three statically linked libraries (xSIM, xPRF, xUSL) under an MIT license. openSIL is expected to be production-ready for 6th-generation EPYC (“Venice”) and Ryzen Zen 6 (“Medusa”) in 2026-2027, at which point AGESA will reach end-of-life.
A proof-of-concept port to an AM5 motherboard with coreboot has been demonstrated. However, memory training under openSIL is still described as running on “embedded microcontrollers prior to x86 reset de-assertion,” suggesting the PSP will continue to handle DRAM initialization even under the new architecture.