A detailed technical reference on the DDR4 memory training process, with particular attention to AMD Zen 1 / Threadripper platforms. This is the deep-dive companion to my ZFS NAS build writeup.
I used Claude (Anthropic’s AI assistant) for research and writing help on this piece.
DDR4 memory training is a calibration process that runs during POST, before the operating system loads. Its purpose is to characterize the electrical properties of the specific physical path between the CPU’s memory controller and each DRAM chip on each installed DIMM, then configure the timing and voltage parameters needed for reliable data transfer at the target speed.
The need for training arises from manufacturing variation. Every motherboard has slightly different trace lengths, impedance characteristics, and crosstalk profiles. Every DRAM chip has slightly different internal timing and drive strength characteristics. Even two boards of the same model with identical DIMMs will have different optimal parameters. Training measures the actual electrical behavior of the specific system and adjusts dozens of timing registers to match.
The entire process can be understood through a single concept: the eye diagram.
On AMD Zen (Family 17h) platforms, including Threadripper 1000-series, memory training does not run on the x86 cores. It runs on the PSP (Platform Security Processor), an ARM Cortex-A5 core embedded in the CPU die.
The boot sequence:

1. At power-on, the PSP boots from its on-chip ROM and loads the ABL (AGESA Bootloader) stages from SPI flash.
2. The ABL stages read memory configuration from the APCB (AGESA PSP Configuration Block), then initialize and train DRAM.
3. Training results are written to the APOB (AGESA PSP Output Block).
4. Only then does the PSP release the x86 cores from reset, which begin executing UEFI firmware from DRAM.
This means DRAM is fully initialized and trained before the x86 CPU executes a single instruction. The POST codes visible on the motherboard’s debug LED (like 0d, C0, AE) come from ABL stages running on the PSP, not from UEFI/BIOS code on the x86 cores.
Training results can be cached in SPI flash (as part of the APOB NV copy). On S3 resume, the PSP replays the saved training data instead of retraining from scratch, which is much faster. On cold boot (S5), the behavior depends on the BIOS’s Fast Boot setting: enabled replays cached data, disabled forces a full retrain. Research by PC Engines found that the APOB training data can have byte-level variations across boots, making cached results unreliable for cold boot on some platforms. This is one reason why disabling Fast Boot is recommended for stability-critical configurations.
The training code for Zen 1 is proprietary. It is distributed as binary ABL blobs and is not readable or auditable.
However, AMD open-sourced AGESA for older platforms (Family 14h, 15h,
16h). The full memory training source code for these platforms is
available in coreboot at
src/vendorcode/amd/agesa/f15tn/Proc/Mem/. The training
algorithms are architecturally similar to Zen, so this code provides a
good understanding of the general approach. Key files:
- NB/mntrain3.c — training sequence dispatcher
- Tech/DDR3/mttwl3.c — write leveling implementation
- Main/mmflow.c — memory initialization flow
- Main/mmStandardTraining.c — top-level training entry point

The eye diagram is the central concept in understanding why memory training exists and what it optimizes.
To create an eye diagram, you overlay many successive data bit transitions on top of each other, triggered on the clock or data strobe edge. If the signal is clean, the overlaid transitions form a pattern that looks like an open eye.
```
    Voltage
       ^
VIH ---+         _____________________
       |        /     :         :     \
       |       /      :         :      \
       |      /       :         :       \
Vref --+-----X--------:-- EYE --:--------X-----
       |      \       : (sample :       /
       |       \      :  here)  :      /
       |        \     :         :     /
VIL ---+         ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
       |              :         :
       +--------------+---------+--------> Time
                   setup     hold
                   margin    margin
              |<----- eye width ----->|
```
The goal of training is to find the center of the eye for each signal. This maximizes the margin against all sources of degradation, making the link as robust as possible against temperature drift, voltage fluctuation, and aging.
At DDR4-3200 (1600 MHz clock, with data transferred on both clock edges), a single unit interval (one bit period) is 312.5 ps. The JEDEC specification requires setup and hold margins that leave only tens of picoseconds of slack. At DDR4-2133, the unit interval is wider (about 469 ps), which is one reason lower speeds are more forgiving and easier to train.
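The arithmetic is worth making explicit. A quick sketch (plain Python, nothing DDR4-specific beyond the transfer rates):

```python
# Unit interval (UI) arithmetic for common DDR4 speed grades.
# DDR4 transfers data on both clock edges, so one bit period is
# 1 / (transfer rate), i.e. half the clock period.

def unit_interval_ps(transfers_per_sec: float) -> float:
    """Bit period in picoseconds for a given transfer rate."""
    return 1e12 / transfers_per_sec

for grade in (2133, 2666, 2933, 3200):
    ui = unit_interval_ps(grade * 1e6)
    print(f"DDR4-{grade}: UI = {ui:.1f} ps")
# DDR4-3200 -> 312.5 ps; DDR4-2133 -> ~468.8 ps
```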
Training is performed per DQ bit, not per DIMM or per byte lane. Each of the 64 data lines (plus 8 ECC lines on ECC DIMMs) in a DDR4 channel has its own trace on the motherboard with its own length, impedance, and coupling to adjacent traces. The optimal sampling point for DQ0 may differ from DQ7 by tens of picoseconds. Per-bit deskew registers allow the controller to fine-tune the delay for each individual data line.
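The per-bit search that training performs can be sketched as a delay-tap sweep: try each tap, record which taps pass a data test, then program the midpoint of the longest passing run. This is an illustrative model, not AMD's implementation; the tap count and the test callback are hypothetical.

```python
def center_delay(test_bit, taps=range(64)):
    """Sweep delay taps for one DQ bit and return the center of the
    longest contiguous passing window (None if nothing passed)."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for tap in taps:
        if test_bit(tap):          # e.g. write/read-back compare at this tap
            if run_start is None:
                run_start, run_len = tap, 0
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None
    if best_start is None:
        return None
    return best_start + best_len // 2

# Toy "eye": taps 20..39 pass, everything else fails.
print(center_delay(lambda tap: 20 <= tap < 40))  # -> 30
```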
The following sequence is based on the JEDEC DDR4 specification (JESD79-4B) and the open-source AGESA F15tn training code in coreboot. Zen 1 follows the same general sequence, though the implementation details are in proprietary ABL code.
What it calibrates: Output driver impedance and on-die termination resistance inside each DRAM chip.
Each DQ output pin on a DRAM chip contains parallel 240-ohm resistor legs made of polysilicon. Polysilicon resistance is inherently imprecise and varies with temperature and manufacturing process. The DRAM chip has an internal calibration circuit connected to an external precision 240-ohm resistor on the ZQ pin (soldered onto the DIMM PCB).
When the controller issues a ZQCL (ZQ Calibration Long) command, the DRAM’s internal comparator adjusts p-channel transistors in a voltage divider until it reaches equilibrium at VDDq/2, referenced against the external precision resistor. The resulting calibration values propagate to all DQ pins, so their drive strength and termination match the PCB impedance.
ZQCL runs once during initialization. Shorter ZQCS (ZQ Calibration Short) commands can run periodically during operation to compensate for temperature drift.
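The comparator loop lends itself to a successive-approximation sketch. Everything below is a toy model: the leg-resistance curve and 7-bit code width are invented for illustration, not taken from any datasheet.

```python
# Toy model of ZQ calibration: binary-search a digital code that sets
# the pull-up strength until it matches the external 240-ohm reference.

R_EXT = 240.0  # external precision resistor on the ZQ pin

def pullup_resistance(code: int) -> float:
    # Hypothetical monotone curve: higher code = lower resistance,
    # spanning roughly 290..190 ohms across a 7-bit code.
    return 290.0 - code * (100.0 / 127)

def zq_calibrate(bits: int = 7) -> int:
    """Successive approximation: find the largest code whose pull-up
    resistance is still at or above the 240-ohm reference."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        # Resistance still too high -> keep the bit to weaken the leg.
        if pullup_resistance(trial) >= R_EXT:
            code = trial
    return code

code = zq_calibrate()
print(code, round(pullup_resistance(code), 1))  # -> 63 240.4
```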
What it calibrates: The voltage threshold inside each DRAM chip that determines whether an incoming signal is a logic 0 or logic 1.
DDR4 uses POD (Pseudo Open Drain) signaling, where the signal swings between GND and VDDQ. Unlike DDR3, which used an external Vref pin, DDR4 generates Vref internally. The reference voltage is programmable via Mode Register 6 (MR6) with a step size of 0.65% of VDDQ per step.
Vref calibration is the vertical centering of the eye diagram. If Vref is too high, noise on low signals can cause false 1 readings. If too low, noise on high signals can cause false 0 readings. The optimal Vref places the threshold at the point of maximum vertical margin.
DDR4 supports PDA (Per DRAM Addressability), which allows setting a different Vref for each individual DRAM chip on the DIMM. This is useful because DRAM chips at different physical positions on the DIMM see different signal amplitudes due to trace routing.
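A minimal sketch of the vertical sweep, using the real MR6 range-1 encoding (60.0% of VDDQ plus 0.65% per step, per JESD79-4) but a stand-in pass/fail test in place of an actual read-back compare:

```python
# Vref centering sketch: sweep MR6 VrefDQ codes, record which ones
# read back correctly, program the middle of the passing band.

VDDQ = 1.2  # volts

def vref_from_code(code: int) -> float:
    """VrefDQ as a fraction of VDDQ for an MR6 range-1 code."""
    return 0.60 + 0.0065 * code

def center_vref(passes, codes=range(0, 51)) -> int:
    # Assumes one contiguous passing band, as in a clean eye.
    passing = [c for c in codes if passes(c)]
    return passing[len(passing) // 2]

# Toy eye: codes whose Vref lands between 0.68 and 0.78 of VDDQ pass.
best = center_vref(lambda c: 0.68 <= vref_from_code(c) <= 0.78)
print(best, f"{vref_from_code(best) * VDDQ:.3f} V")  # -> 20 0.876 V
```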
What it calibrates: The timing relationship between the data strobe (DQS) and the clock (CK) at each DRAM chip, compensating for fly-by routing skew.
DDR4 uses fly-by topology for the clock, address, and command signals. Instead of routing these signals in parallel to each DRAM chip (as in DDR2), they are routed serially: the signal enters at one end of the DIMM and passes through each chip in sequence. This improves signal integrity by reducing stub reflections, but it means each chip receives the clock at a different time. On a dual-rank DIMM with 16 chips, the clock skew between the first and last chip can be several hundred picoseconds.
The data signals (DQ) and data strobes (DQS), however, are routed directly from the controller to each byte lane. So the data arrives at all chips at roughly the same time, but the clock arrives at different times. Write leveling measures and compensates for this clock skew.
Algorithm:

1. The controller puts the DRAM into write leveling mode via MR1.
2. It drives a DQS pulse; the DRAM samples CK with that DQS rising edge and returns the sampled value (0 or 1) on its DQ lines.
3. The controller steps its DQS output delay and repeats until the feedback transitions from 0 to 1, meaning the DQS edge now arrives aligned with the CK rising edge at that chip.
4. The resulting delay is stored per byte lane.
The AGESA code shows this runs in two passes: Pass 1 at 400 MHz to get coarse alignment, then Pass 2 at the target frequency with seeds derived from Pass 1 results.
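The feedback loop can be sketched as follows. The clock period is real for DDR4-3200; the chip model, skew value, and step size are illustrative.

```python
# Write-leveling feedback loop sketch. In JEDEC write-leveling mode the
# DRAM samples CK with each DQS rising edge and mirrors the sampled
# value back on DQ. The controller steps its DQS delay until the
# feedback flips 0 -> 1: DQS is now aligned to CK at that chip.

TCK_PS = 625.0  # clock period at DDR4-3200 (1600 MHz clock)

def dram_feedback(dqs_delay_ps: float, clock_skew_ps: float) -> int:
    """What the chip samples: 1 if the DQS edge lands in the high half
    of CK as seen at the chip (which receives CK clock_skew_ps late)."""
    phase = (dqs_delay_ps - clock_skew_ps) % TCK_PS
    return 1 if phase < TCK_PS / 2 else 0

def write_level(clock_skew_ps: float, step_ps: float = 5.0) -> float:
    """Step the DQS delay until the feedback transitions 0 -> 1."""
    delay = 0.0
    prev = dram_feedback(delay, clock_skew_ps)
    while True:
        delay += step_ps
        cur = dram_feedback(delay, clock_skew_ps)
        if prev == 0 and cur == 1:
            return delay
        prev = cur

# A chip far down the fly-by chain sees CK ~200 ps late:
print(write_level(200.0))  # settles at the 200 ps skew
```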
What it calibrates: The timing of the DQS gate signal in the memory controller.
During a read, the DRAM drives DQS along with the data. But DQS is only valid during the read burst. Between bursts, DQS is floating, and electrical noise can cause false transitions that would corrupt the controller’s read FIFO. The DQS gate is a window that tells the controller when to “listen” for valid DQS transitions and when to ignore the line.
Algorithm:

1. The controller issues a read and opens the gate at a candidate delay.
2. It checks whether the expected DQS edges were captured: opening too early admits pre-burst noise, while opening too late clips the start of the burst.
3. The gate-open delay is swept until the window cleanly brackets the read preamble and burst, then stored per byte lane.
The AGESA code implements this as HwBasedDQSReceiverEnableTrainingPart1 and Part2, with both hardware-assisted and software fallback paths.
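A sketch of the gate sweep, with invented timing numbers standing in for a real preamble detector:

```python
# Receiver-enable (DQS gate) sweep sketch: slide the gate-open delay,
# issue a read, and check whether the burst is captured cleanly.
# All timing values here are toy numbers, not measured ones.

BURST_START_PS = 1200.0   # when the DQS burst begins (toy value)

def edges_captured(gate_open_ps: float) -> bool:
    """True if the gate opens inside the preamble: after the line stops
    floating (100 ps before the burst) but before the first DQS edge."""
    return BURST_START_PS - 100.0 <= gate_open_ps <= BURST_START_PS

def train_gate(step_ps: float = 25.0, limit_ps: float = 5000.0):
    delay = 0.0
    while delay <= limit_ps:
        if edges_captured(delay):
            return delay          # first delay that captures cleanly
        delay += step_ps
    return None                    # gate training failed

print(train_gate())  # -> 1100.0
```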
What it calibrates: The internal read capture delay, placing the sampling point at the center of the read data eye.
This is the horizontal centering for reads. The controller needs to sample incoming data (DQ) at the moment when the signal is most stable, which is the center of the eye.
Algorithm:

1. The controller reads a known pattern, for example from the DRAM's MPR (Multi-Purpose Register), which returns fixed data without touching the memory array.
2. For each DQ bit, it sweeps the read capture delay across its range and records which settings return the pattern correctly.
3. The passing region defines the read eye width for that bit; the delay is programmed to its center.
What it calibrates: The timing of write data (DQ) relative to the write strobe (DQS), placing the data transition at the center of the DRAM’s write data eye.
Algorithm:

1. The controller writes a known pattern to the DRAM, then reads it back.
2. For each DQ bit, it sweeps the DQ-to-DQS output delay and records which settings read back correctly.
3. The passing region defines the write eye; the delay is set to its center. (Write training depends on read training already being complete, since results are verified via read-back.)
More sophisticated training algorithms (available in newer controllers and used in Synopsys DDR PHY IP) use PRBS (Pseudo-Random Bit Sequence) patterns like PRBS23 or PRBS31 instead of simple toggle patterns. PRBS patterns stress worst-case inter-symbol interference, crosstalk, and simultaneous switching noise patterns that simple patterns miss. This produces a smaller but more realistic eye during training, leading to a more robust center point during actual operation.
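For reference, PRBS23 is generated by a 23-bit LFSR with the polynomial x^23 + x^18 + 1 (per ITU-T O.150). A minimal generator:

```python
# PRBS23 generator (polynomial x^23 + x^18 + 1), the kind of stress
# pattern newer PHYs drive during training instead of simple toggles.

def prbs23(seed: int = 0x7FFFFF):
    """Yield an endless PRBS23 bit stream from a 23-bit LFSR.
    Any nonzero seed works; the sequence period is 2**23 - 1."""
    state = seed & 0x7FFFFF
    while True:
        # Feedback = XOR of taps 23 and 18 (bit positions 22 and 17).
        new_bit = ((state >> 22) ^ (state >> 17)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFF
        yield new_bit

gen = prbs23()
print([next(gen) for _ in range(16)])
```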
What it calibrates: The maximum time the memory controller waits for read data to arrive after issuing a read command.
This accounts for the total round-trip delay through the controller pipeline, PHY, PCB traces, DRAM internal CAS latency, and the return path. If this value is too short, the controller stops listening before the data arrives. If too long, it wastes bandwidth waiting for data that already arrived.
The controller issues reads with progressively longer latency windows until it finds the minimum value that consistently captures the data.
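A sketch of that search, with a toy round-trip delay and a one-step guard band (the guard-band policy is an assumption for illustration, not documented AGESA behavior):

```python
# Max-read-latency search sketch: widen the controller's wait window
# until reads return correct data, then add a guard band.
# try_read() stands in for an actual read-compare at a given window.

ROUND_TRIP_PS = 4700.0  # toy round-trip delay

def try_read(window_ps: float) -> bool:
    """Data is captured only if the window covers the round trip."""
    return window_ps >= ROUND_TRIP_PS

def train_max_rd_latency(step_ps: float = 312.5, guard_steps: int = 1):
    window = 0.0
    while not try_read(window):
        window += step_ps
    return window + guard_steps * step_ps  # minimum found + guard band

print(train_max_rd_latency())  # -> 5312.5
```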
From the open-source F15tn code in mntrain3.c:
- EnterHardwareTraining — initialize training mode
- LrdimmBuf2DramTrain — LRDIMM buffer training (if applicable)
- SwWLTraining — software write leveling
- HwBasedWLTrainingPart1 — hardware write leveling pass 1 (at 400 MHz)
- HwBasedDQSReceiverEnableTrainingPart1 — receiver enable pass 1
- HwBasedWLTrainingPart2 — write leveling at target frequency
- HwBasedDQSReceiverEnableTrainingPart2 — receiver enable at target frequency
- NonOptimizedSWDQSRecEnTrainingPart1 — basic DQS recovery adjustments
- OptimizedSwDqsRecEnTrainingPart1 — refined DQS recovery
- NonOptimizedSRdWrPosTraining — initial read/write position training
- OptimizedSRdWrPosTraining — optimized read/write centering
- MaxRdLatencyTraining — maximum read latency calibration
- TrainExitHwTrn — exit hardware training mode

When a high-speed electrical signal reaches the end of a transmission line, any mismatch between the trace impedance and the termination impedance causes part of the signal energy to reflect back toward the source. These reflections interfere with subsequent signals and can close the eye, causing data errors.
DDR4 uses on-die termination (ODT) to address this. Instead of external termination resistors on the motherboard (as in earlier DDR generations), the termination resistors are built into the DRAM chips and the CPU’s memory controller. This reduces stub lengths and improves signal integrity at high frequencies.
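The degree of mismatch is captured by the reflection coefficient Γ = (Z_load − Z_trace) / (Z_load + Z_trace). A quick illustration (the 40-ohm trace impedance is a typical value, not a measured one):

```python
# Reflection coefficient for a termination mismatch:
#   gamma = (Z_load - Z_trace) / (Z_load + Z_trace)
# A perfectly matched termination reflects nothing; the further the
# ODT value sits from the trace impedance, the larger the reflected
# fraction of the incident wave.

Z_TRACE = 40.0  # illustrative trace impedance in ohms

def reflection_coefficient(z_load: float, z_trace: float = Z_TRACE) -> float:
    return (z_load - z_trace) / (z_load + z_trace)

for odt in (34.0, 40.0, 48.0, 60.0, 80.0):
    g = reflection_coefficient(odt)
    print(f"ODT {odt:5.1f} ohm: {g:+.1%} reflected")
# 40 ohms -> 0%; 60 ohms -> +20%; 80 ohms -> +33.3%
```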
DRAM-side termination is controlled by three Mode Register settings: RTT_NOM (MR1), the nominal termination active during normal operation; RTT_WR (MR2), the dynamic termination applied while the chip is being written; and RTT_PARK (MR5), the termination applied when the chip's ODT pin is low, such as on non-targeted ranks.
Processor-side termination (ProcODT) is the termination impedance at the CPU’s memory controller pins. This is what the BIOS setting controls.
ProcODT is a precondition for training, not a parameter that training adjusts. It must be set before training begins (it is part of the APCB configuration that ABL reads). If ProcODT is wrong:

- reflections at the controller distort incoming read data, closing the eye before training even starts;
- training may fail outright, or converge on a marginal center that passes POST but produces errors under load;
- cold boot becomes less reliable, since temperature shifts push an already-marginal eye over the edge.
Recommended ProcODT ranges for AMD Zen 1:

- 4 single-rank DIMMs: 40-53.3 ohms
- 4 dual-rank DIMMs: 48-60 ohms
- 8 dual-rank DIMMs: 53.3-60 ohms (sometimes 68 ohms)
These ranges are empirical, drawn from community tuning experience and overclocker reports rather than AMD-published specifications. AMD does not document recommended ProcODT settings per DIMM configuration.
ProcODT interacts with DRAM-side RTT values and DRAM voltage. Higher ProcODT values work better with lower DRAM voltages; running high ProcODT with high DRAM voltage simultaneously can degrade signal quality. These interactions are why memory tuning on fully populated boards can require iteration.
Two other voltage rails affect memory signal quality on Zen: VSOC, which powers the memory controller and the uncore portion of the die, and VDDP, which supplies the DDR PHY. Raising VSOC modestly can stabilize heavily loaded configurations; VDDP adjustments are occasionally needed to clear instability at specific memory frequencies.
Understanding the division of labor between user-configured BIOS settings and the automatic training process:
These are the "target" that training optimizes around: memory frequency, the primary and secondary timings (tCL, tRCD, tRP, tRAS, and so on), DRAM voltage, ProcODT, the DRAM-side RTT values, and command rate.
These are the per-signal, per-bit calibration results: write leveling delays per byte lane, DQS receiver enable (gate) delays, per-bit read and write deskew values, internal Vref settings per DRAM chip, ZQ output impedance codes, and the maximum read latency.
The manual settings define the operating envelope. Training finds the optimal operating point within that envelope for the specific physical hardware. This is why the same manual settings can produce different training outcomes on different boards, or even on the same board at different temperatures.
| Symptom | Likely cause |
|---|---|
| System power-cycles 2-3 times, then boots | Normal training behavior. Each cycle is a retry with adjusted parameters. |
| System power-cycles repeatedly, then boots to “CMOS reset” message | Training failed after exhausting retries. AGESA fell back to safe defaults (overclock recovery). |
| Stuck on POST code 0d indefinitely (Gigabyte/AMI boards) | Training failed and AGESA is stuck. Requires manual power-off and CMOS clear. |
| Stuck on POST code C0 or C7 | Memory detection failure (C0) or memory initialization failure (C7). Often a seating issue or incompatible DIMM. |
| Boots but with intermittent errors under load | Training succeeded with marginal centering. The eye is open but narrow. |
| Cold boot fails but warm reboot works | Temperature change between power states shifts signal characteristics enough to push marginal training over the edge. Often a ProcODT or SoC voltage issue. |
| Intermittent training success/failure on identical settings | Multiple possible causes: ProcODT on Auto selecting different values each attempt; training algorithm searching a large parameter space with insufficient guidance from manual settings; dead CMOS battery losing training results between boots. |
On Gigabyte X399 boards, POST code 0d is documented as “Reserved for future AMI SEC error codes.” In practice, on Zen 1 Threadripper, it indicates that the ABL memory training stage failed to find a stable configuration. If the code flashes by and the system continues, it means one training attempt failed and the PSP retried with different parameters. If the code sticks, training has exhausted its retry budget.
| Platform | Training code | Status |
|---|---|---|
| AMD Family 14h (Ontario) | Open source in coreboot | Readable, auditable |
| AMD Family 15h (Trinity/Kaveri) | Open source in coreboot | Readable, auditable |
| AMD Family 16h (Kabini) | Open source in coreboot | Readable, auditable |
| AMD Family 17h (Zen 1, Zen+, Zen 2) | Proprietary ABL binary blobs | Not readable |
| AMD Family 19h (Zen 3, Zen 4) | Proprietary ABL binary blobs | Not readable |
| AMD Family 1Ah (Zen 5) | Proprietary ABL binary blobs | Not readable |
AMD open-sourced AGESA in 2011, but only for the platforms listed above. For Family 17h onward, AGESA is distributed as pre-built binaries. AMD’s stated reason is the need to protect IP while enabling faster coreboot support.
AMD announced openSIL (Open Silicon Initialization Library) in 2023 as an open-source replacement for AGESA. It consists of three statically linked libraries (xSIM, xPRF, xUSL) under an MIT license. openSIL is expected to be production-ready for 6th-generation EPYC (“Venice”) and Ryzen Zen 6 (“Medusa”) in 2026-2027, at which point AGESA will reach end-of-life.
A proof-of-concept port to an AM5 motherboard with coreboot has been demonstrated. However, memory training under openSIL is still described as running on “embedded microcontrollers prior to x86 reset de-assertion,” suggesting the PSP will continue to handle DRAM initialization even under the new architecture.