Building a ZFS NAS from Spare Parts

Table of Contents

  1. Parts List
  2. The Memory Problem
  3. BIOS Update and the CMOS Battery
  4. Intermittent Boot Failures
  5. Memory Training
  6. The Settings That Worked
  7. The Overnight Test
  8. MemTest Results
  9. SSD Inventory
  10. ZFS Storage Design
  11. Bringing It Online

Introduction

I had an AMD Threadripper 1950X and a Gigabyte X399 Aorus Gaming 7 motherboard from a previous build, along with a growing collection of photos and videos that needed a proper home with redundancy and fast access. The goal was a dedicated ZFS NAS with RAIDZ2, SSD caching tiers, and as much RAM as possible for the ZFS ARC (Adaptive Replacement Cache).

The storage and CPU side of the build came together without much trouble. Getting all 128GB of mixed DDR4 to run stably on a first-generation Threadripper turned into a longer journey than expected, but also a surprisingly educational one. I ended up learning more than I anticipated about how DDR4 memory training works at the hardware level.

I used Claude (Anthropic’s AI assistant) for research and writing help on this piece.

Parts List

| Component   | Details                                                            |
|-------------|--------------------------------------------------------------------|
| CPU         | AMD Threadripper 1950X (16C/32T, quad-channel DDR4)                |
| Motherboard | Gigabyte X399 Aorus Gaming 7                                       |
| RAM         | 128GB DDR4-2666 (8x 16GB, two different Patriot Viper Elite kits)  |
| Storage     | 6x 16TB HDD (RAIDZ2) + SSD caching tiers                           |
| Boot        | 256GB SSD (ext4)                                                   |
| OS          | Ubuntu 25.10                                                       |

The 1950X provides 64 PCIe 3.0 lanes. On this board, the three M.2 slots connect directly to the CPU’s PCIe lanes while the eight SATA ports connect to the X399 chipset’s dedicated SATA controller. Since M.2 and SATA use completely different lane sources, there are no sharing conflicts. All three M.2 slots and all eight SATA ports can be populated simultaneously, which is important for the ZFS caching topology: 6 HDDs on SATA for the main array, plus separate SSDs for the special vdev, SLOG, and L2ARC.
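If you want to verify the topology from a running Linux system rather than trust the block diagram, lsblk and lspci make it quick. A small sketch; the device names and models it prints will of course be specific to your machine:

```bash
# Which drives are SATA vs NVMe, and which are spinning (ROTA=1).
lsblk -o NAME,MODEL,SIZE,TRAN,ROTA

# Walk the PCI tree to confirm the M.2 slots hang off the CPU's root
# complex rather than the chipset.
lspci -tv | less
```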

The Memory Problem

I had two sets of Patriot Viper Elite DDR4-2666 16GB sticks, four from each kit, for a total of eight sticks and 128GB. Both kits are labeled DDR4-2666 CL16. The assumption was that they were close enough to run together.

Decoding the Part Numbers

Looking more carefully at the part numbers told a different story.

| Kit                  | Part Number    | XMP Timings | JEDEC Default           | Manufactured  |
|----------------------|----------------|-------------|-------------------------|---------------|
| “GY” (Viper Elite)   | PVE416G266C6GY | 16-17-17-36 | DDR4-2133 / 15-15-15-36 | 2021, Week 7  |
| “K” (Viper Elite II) | PVE2432G266C6K | 16-17-17-36 | DDR4-2666 / 19-19-19-43 | 2023, Week 37 |

I bought the GY sticks first in 2021. When I wanted to double the capacity in 2024, the original Viper Elite was no longer available, so I bought the Viper Elite II (K sticks) since they appeared to have matching timings. The K sticks were manufactured in 2023 (Week 37), about two and a half years after the GY sticks (2021, Week 7). Different product generation, different production run, and almost certainly different DRAM ICs under the same heatspreader.

The XMP profiles are identical: both kits are rated for DDR4-2666 at 16-17-17-36, 1.2V. But the JEDEC base profiles, which are what the motherboard falls back to when XMP is disabled, are completely different. The original Elite defaults to DDR4-2133 at tight timings (15-15-15-36); the Elite II defaults to DDR4-2666 at very loose timings (19-19-19-43). When the motherboard has to auto-negotiate without XMP, these two sets of sticks are giving it different instructions about what “default” means.

With identical kits, you can enable XMP and let the motherboard apply a single consistent profile to all sticks. With mixed kits that have different JEDEC defaults, the motherboard receives conflicting base profiles and has to reconcile them. On top of the SPD conflict, sticks from different production runs likely have different electrical characteristics (drive strength, termination behavior, internal timing margins) even when the rated specs match. The training algorithm has to find parameters that work for both types of silicon simultaneously, which narrows the overall stability window. This is manageable on a lightly populated board. On a Threadripper with all eight slots filled, the memory controller is already under significant electrical stress, and narrower margins become a real problem.
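To see exactly what each stick's SPD reports (JEDEC table, XMP profile, part number, manufacturing week), decode-dimms from the i2c-tools package can usually dump it from a running Linux system. A best-effort sketch, assuming the board exposes its SMBus and the DDR4 SPD driver binds automatically; on some boards the ee1004 devices have to be instantiated by hand:

```bash
# SPD decoding tools (Debian/Ubuntu package name).
sudo apt install i2c-tools

# DDR4 SPD EEPROMs speak the ee1004 protocol; load the drivers so the
# contents appear under /sys for decode-dimms to read.
sudo modprobe i2c-dev
sudo modprobe ee1004

# Decode every SPD found: JEDEC timings, XMP profile, module part number,
# and the manufacturing week that gave away the two-year gap between kits.
sudo decode-dimms | less
```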

AMD Memory Controller Limits

The 1950X’s memory controller has official validated speeds that depend on the number and type of installed DIMMs (source: AMD reviewer’s guide, reproduced in PC Perspective’s launch review):

| Configuration           | Single-Rank DIMMs | Dual-Rank DIMMs |
|-------------------------|-------------------|-----------------|
| 4 DIMMs (1 per channel) | DDR4-2667         | DDR4-2400       |
| 8 DIMMs (2 per channel) | DDR4-2133         | DDR4-1866       |

16GB DDR4 sticks are almost always dual-rank (memory chips on both sides of the PCB). With all eight slots populated, that means 16 physical ranks that the controller has to drive simultaneously. AMD’s official guaranteed speed for this configuration is DDR4-1866. Anything above that is beyond what AMD validates, even if the individual sticks are rated higher.

In practice, most 1950X systems can run DDR4-2133 with all eight slots populated, and with careful tuning some can reach 2400 or 2666.
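Rank count does not have to be assumed; once an OS is running, the SMBIOS tables report it per slot. A minimal check, assuming the firmware populates the fields correctly:

```bash
# One SMBIOS type 17 entry per DIMM slot; the Rank field shows whether
# each installed stick is single- or dual-rank.
sudo dmidecode -t memory | grep -E "Locator:|Size:|Rank:"
```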

Why Capacity Over Speed

For a ZFS NAS, the tradeoff between RAM capacity and speed is straightforward. ZFS uses available RAM for the ARC, which caches the most frequently and recently accessed data. A read that hits the ARC is served from memory. A read that misses goes to the underlying storage, which in this case is spinning hard drives.

The latency difference between DDR4-2133 and DDR4-2666 is measured in single-digit nanoseconds. The difference between a cache hit in RAM and a random read from a hard drive is measured in milliseconds: roughly a factor of 100,000. In bandwidth terms, quad-channel DDR4-2133 provides about 68 GB/s. A Gigabit Ethernet connection maxes out at about 120 MB/s. The memory subsystem is faster than the network by a factor of over 500x at the lower speed.
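The arithmetic behind those numbers, for anyone who wants to check it:

```bash
# Quad-channel DDR4-2133: 2133 MT/s x 8 bytes per transfer x 4 channels,
# compared against gigabit Ethernet at roughly 120 MB/s.
echo "$((2133 * 8 * 4)) MB/s memory bandwidth"     # ~68,256 MB/s
echo "$((2133 * 8 * 4 / 120))x the network link"   # ~568x
```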

I wanted all 128GB at whatever frequency would be stable.

BIOS Update and the CMOS Battery

Before attempting the 8-DIMM configuration, I updated the BIOS to the latest version (F13d). Newer BIOS revisions for X399 boards include improved memory training algorithms and compatibility fixes.

The flash completed successfully. After the update, the board rebooted and went through a memory retraining cycle (the system power-cycles itself a few times while the screen stays black; more on what this process actually does later). It came up to the setup screen, then reported that the CMOS settings had been reset to defaults.

I re-entered the settings, saved, rebooted. Reset again. Every time the system lost power, the CMOS was wiped.

The motherboard had been sitting in storage for a few years, including some time in a shipping container. The CR2032 CMOS battery was completely dead. When I went to replace it with a fresh one, the small plastic retaining clip on the battery socket cracked and broke off. These clips become brittle after years of heat cycling and long storage. Without the clip providing downward pressure, the spring-loaded negative contact pushes the battery up and out of the socket. No sustained contact, no power to the CMOS.

The fix was hot glue. Seat the battery, apply small beads at three points around the perimeter to bridge the battery edge to the socket housing, hold for 60 seconds while the glue sets. Hot glue is non-conductive and handles PC case temperatures. It is not elegant.

One detail worth noting if you ever have to improvise a battery retention method: on a CR2032, the positive terminal is the entire top surface and the side rim. The negative terminal is only the small textured circle on the bottom. The socket has a center pin (negative, touches the bottom) and a side tab (positive, touches the rim). If your tape or glue insulates the side tab, the circuit is open and you have fixed nothing.

Verification: boot into BIOS, set the time, save, unplug the power cord, wait two minutes, check the clock. If the time held, the battery is making contact.

Intermittent Boot Failures

Before I replaced the battery and configured manual timings, the system was exhibiting inconsistent behavior. With all eight sticks installed and everything left on Auto, the board would sometimes boot successfully at DDR4-2400 with automatically determined timings, and other times get stuck on POST code 0d (a memory training failure on Gigabyte/AMI boards) and never make it past the black screen.

The same hardware, same sticks, same slots, but different outcomes each boot. At the time this was confusing. In hindsight, understanding what was happening during the boot process makes the behavior completely predictable.

Memory Training

On AMD Zen platforms, DDR4 initialization does not run on the x86 CPU cores. It runs on the PSP (Platform Security Processor), an ARM Cortex-A5 core embedded in the CPU die, as part of AMD’s AGESA firmware. The PSP initializes memory, decompresses the BIOS image into DRAM, and only then releases the x86 cores from reset. The first x86 instruction fetch comes from DRAM that is already trained and online.

The training algorithm for Zen 1 is proprietary (distributed as binary blobs). However, AMD open-sourced AGESA for older platforms (Family 15h), and that code is available in coreboot. The training sequence is architecturally similar, so we can understand the general approach. I have written up a detailed technical reference based on the JEDEC spec, AMD documentation, and the open-source code. What follows is a summary.

The concept is not entirely unfamiliar to me. Almost twenty years ago I worked at an embedded systems firm where we built devices around an ARM Cortex-M3 microcontroller with external RAM and flash on a shared bus. I remember watching our electrical engineers probe traces with oscilloscopes, trying to find timing and termination values that worked for both the RAM and the flash. A shared bus is harder than dedicated interfaces because the two devices have different timing requirements (flash is much slower than RAM, with different setup and hold times), and both devices load the bus electrically even when only one is being addressed. They characterized the board, cursed about our choice of parts, and hardcoded the final values in firmware. The hardware was fixed, so the values only needed to be found once.

DDR4 training is the automated version of that process. A PC motherboard cannot know at design time what DIMMs will be installed, so it has to repeat the characterization on every boot.

The Eye Diagram

The core concept in memory training is the eye diagram. If you overlay many successive data bit transitions on a scope, triggered on the clock edge, the resulting pattern looks like an open eye when the signal is clean.

Jitter closes the eye horizontally. Noise and crosstalk close it vertically. The goal of every training step is to find the center of the eye for each signal, maximizing the margin against all sources of degradation. At DDR4-2133, one bit period (unit interval) is about 469 ps; at DDR4-3200 it shrinks to roughly 313 ps. Higher speeds mean smaller eyes, tighter margins, and less room for error.

What Training Actually Does

Training runs through a sequence of calibration steps. Each step measures a specific aspect of the physical link and adjusts corresponding timing or voltage registers:

  1. ZQ Calibration. Each DRAM chip calibrates its output driver impedance and on-die termination against a precision external resistor, compensating for manufacturing variation in the internal polysilicon resistor legs.

  2. Vref Calibration. Sets the voltage threshold inside each DRAM that distinguishes logic 0 from logic 1. This is the vertical centering of the eye. DDR4 supports per-chip Vref programming.

  3. Write Leveling. DDR4 routes the clock serially through each DRAM on a DIMM (fly-by topology), so each chip receives the clock at a slightly different time. Write leveling measures this skew and adjusts the data strobe delay per chip to compensate. The open-source AGESA code shows this runs in two passes: coarse alignment at 400 MHz, then fine adjustment at the target frequency.

  4. DQS Gate Training. Calibrates when the controller’s receiver opens to capture incoming data strobe transitions during reads, and when to ignore the line (which is floating and prone to false transitions between bursts).

  5. Read Centering. The controller sweeps its internal read delay register while reading known test patterns from the DRAM’s MPR (Multi-Purpose Register). It finds the left edge (first delay that yields correct data) and right edge (last delay), then places the sampling point at the center. This is performed per DQ bit, because each of the 64 data lines has its own trace with its own length and coupling characteristics.

  6. Write Centering. Same concept for writes: sweep the write delay, find the passing window, center it. Also per bit.

  7. MaxRdLatency. Calibrates the total round-trip time the controller waits for read data, accounting for all delays through the controller, PHY, PCB traces, and DRAM.

All of this happens during the power-cycling phase of POST, while the screen is black and the debug LED is flashing codes. The results are stored in CMOS. On subsequent boots, the saved training data can be replayed (if Fast Boot is enabled) or training runs fresh each time (if disabled).

What Goes Wrong With Auto Settings

With this understanding, the intermittent 0d errors I was seeing before fixing the battery and setting manual timings make perfect sense.

The dead battery meant training results were lost after every power cycle. Each boot was a cold start: no saved training data, no manual overrides (those were wiped too), everything on Auto. The training algorithm had to search a large parameter space from scratch each time.

With all settings on Auto, two critical parameters were left to the firmware’s judgment. The first was the target frequency and timings. The two kits have different JEDEC base profiles: the GY sticks say “default to 2133 at 15-15-15-36” while the K sticks say “default to 2666 at 19-19-19-43.” The firmware has to reconcile these conflicting instructions, and the outcome of that negotiation can vary between boots depending on which sticks’ SPD data it reads first and how the AGESA version handles the conflict.

The second, and more important, was ProcODT (Processor On-Die Termination). ProcODT controls the termination impedance at the CPU’s memory controller pins. Without proper termination, signals reflect off the end of the trace and interfere with subsequent data, closing the eye. ProcODT is a precondition for training, not something training adjusts; it has to be set correctly before training begins. With Auto, the firmware was selecting a ProcODT value each boot that may or may not have been appropriate for 8 dual-rank DIMMs with mixed silicon characteristics.

Additionally, the mixed silicon itself narrows the window. Training has to find parameters that work for both DRAM types simultaneously. With identical kits, the optimal parameters for one stick are close to optimal for all of them, giving the training algorithm a wide target. With mixed silicon, the acceptable parameter ranges for each set of sticks overlap less, and the training algorithm has to find a point in that smaller overlap region.

So each boot was a search through a large parameter space, with an imprecise ProcODT starting point, targeting a narrow stability window, with no memory of previous successful results. Sometimes the search converged and the system booted (occasionally even at DDR4-2400). Sometimes it did not and training failed with 0d.

The Settings That Worked

I configured the following in BIOS before installing the GY sticks, then populated the secondary slots so that each channel held one K stick and one GY stick, balancing the electrical load:

| Setting                 | Value     |
|-------------------------|-----------|
| XMP                     | Disabled  |
| Memory Speed            | DDR4-2133 |
| tCL                     | 16        |
| tRCD                    | 18        |
| tRP                     | 18        |
| tRAS                    | 36        |
| Command Rate            | 2T        |
| DRAM Voltage            | 1.25V     |
| SoC Voltage (VDDCR_SOC) | 1.15V     |
| ProcODT                 | 60 ohms   |
| Gear Down Mode          | Enabled   |
| Fast Boot               | Disabled  |

A few notes on the less obvious settings:

Command Rate 2T. Controls how many clock cycles the controller asserts each command and its address on the bus before the DRAM latches it. 1T issues commands in a single clock and is faster, but requires clean signal integrity on the command/address bus. With eight DIMMs adding capacitive load to every trace, 2T provides the extra setup time needed to avoid timing violations.

ProcODT at 60 ohms. Explicitly sets the termination impedance instead of letting the firmware guess. 53.3 and 60 ohms are the most commonly successful values for 8 dual-rank DIMMs on Zen 1.

SoC Voltage at 1.15V. Powers the Infinity Fabric and integrated memory controller. The default (~1.0V) is insufficient for driving 16 ranks of memory. 1.15V is the generally accepted safe maximum for Zen 1.

Gear Down Mode. An AMD feature that aligns command and address signals to every other clock edge instead of every edge, effectively halving the command bus frequency. This significantly improves signal integrity with minimal throughput impact since the data bus still runs at full speed.

Fast Boot Disabled. Forces full memory training on every boot. With 8 mixed DIMMs, I wanted the board to do thorough training each time rather than replaying possibly stale cached results.

With these settings, training succeeded on the first attempt. The BIOS information screen reported 131,072 MB across all four channels.
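Once the OS was installed later, the same result can be confirmed from Linux. A quick sanity check (field names vary slightly between dmidecode versions):

```bash
# Total usable memory; the kernel reserves a little, so expect roughly
# 125Gi reported rather than a round 128Gi.
free -h

# The per-DIMM speed the controller actually trained to.
sudo dmidecode -t memory | grep -i "configured" | sort | uniq -c
```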

The Overnight Test

Seeing 128GB in the BIOS is necessary but not sufficient. The memory needs to survive sustained stress testing.

This matters especially for ZFS. ZFS checksums all data and metadata on disk, so it can detect corruption that has already been written. But it has no way to detect corruption that occurs in RAM before the data is written. If a bit flips in memory, ZFS will compute the checksum of the corrupted data, write both to disk, and consider the operation successful. The corruption becomes permanent and undetectable. This is not unique to ZFS, but because ZFS is often chosen specifically for its data integrity guarantees, it is worth being explicit that those guarantees depend on the RAM being reliable.

I booted into MemTest86+ from a USB stick. It reported some oddities in the DIMM inventory readout, but nothing that stopped the test from running.

I started the test in parallel mode: SMP: 32T (PAR), using all 32 logical threads to read and write patterns across all 128GB simultaneously. This maximizes heat in the DRAM and memory controller, maximizes electrical crosstalk on the traces, and creates worst-case conditions for signal integrity. If the configuration has a marginal stability issue, parallel mode will expose it.

With 128GB to sweep, each pass takes several hours. The recommendation is at least 4 complete passes with zero errors, which for this amount of memory takes 12-18 hours.

I left it running overnight.

MemTest Results

The first pass completed clean before I went to bed. I ran three more passes the next morning. All four passed with zero errors across all 128GB, in parallel mode with all 32 threads.

The settings above (DDR4-2133, 16-18-18-36, 2T, 1.25V DRAM, 1.15V SoC, ProcODT 60 ohms, Gear Down Mode enabled) are stable. 128GB of mixed DDR4 on a first-generation Threadripper, confirmed working.

SSD Inventory

Before building the ZFS pool, I needed to figure out what SSDs I had available and what condition they were in. I used an external USB M.2 adapter (one that passes through SMART data) to check each drive with smartctl.

The haul:

| # | Model           | Size  | Wear | Verdict                                                  |
|---|-----------------|-------|------|----------------------------------------------------------|
| 1 | Sabrent Rocket  | 1TB   | 18%  | Healthy, matched pair with #2                            |
| 2 | Sabrent Rocket  | 1TB   | 18%  | Healthy, matched pair with #1                            |
| 3 | Sabrent Rocket  | 1TB   | 27%  | Healthy, previously the spare NVMe from my other server  |
| 4 | Crucial P1      | 1TB   | 9%   | Healthy, QLC, lightest wear                              |
| 5 | SK hynix BC711  | 256GB | 5%   | Healthy, ex-laptop OEM pull                              |
| 6 | Samsung 960 EVO | 500GB | 129% | SMART FAILED                                             |
| 7 | Samsung 960 EVO | 500GB | 129% | SMART FAILED                                             |

The two 960 EVOs had been hammered with ~345 TB of writes each against a 200 TBW rating. Samsung’s firmware had flagged them with a reliability degradation warning. Interestingly, both still showed zero Media and Data Integrity Errors and 100% Available Spare. They are not actively corrupting data, but they have exceeded the point where the manufacturer guarantees the flash cell endurance. I set them aside as potential scratch disks for non-critical use elsewhere, but would not trust them in a ZFS pool.

The matched pair of Sabrent Rockets stood out immediately. Identical model, firmware version, power-on hours (4,257), and total bytes written (273 TB each). They had clearly lived together in a RAID array their entire lives. A naturally matched pair is ideal for a ZFS mirror.
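The per-drive check was simple. Something along these lines (the device path is a placeholder) pulls the fields behind the table above:

```bash
# Full SMART/health report for one NVMe drive in the USB adapter.
sudo smartctl -a /dev/nvme0

# The fields that mattered for sorting the pile:
#   Percentage Used                  - wear against rated endurance
#   Data Units Written               - host writes, in units of 512,000 bytes
#   Media and Data Integrity Errors  - should be zero
#   SMART overall-health self-assessment test result - PASSED or FAILED
sudo smartctl -a /dev/nvme0 | \
    grep -E "Model Number|Percentage Used|Data Units Written|Integrity Errors|overall-health"
```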

ZFS Storage Design

The Main Array

The primary storage is 6x 16TB HDDs in RAIDZ2. Two of the drives are Seagate IronWolf NAS drives that came from my other server’s backup pool; the other four are from my spare inventory.

RAIDZ2 provides double parity: any two drives can fail simultaneously without data loss. With 16TB drives, this is the minimum acceptable redundancy level. A single-parity configuration (RAIDZ1) would leave the array vulnerable during the 24-48 hours it takes to resilver a replacement 16TB drive, and a second failure during that window would be catastrophic. RAIDZ2 tolerates a second failure during resilver.

The usable capacity is approximately 60TB (4 data disks equivalent, minus ZFS overhead).
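A sketch of the pool creation; the disk names are placeholders, and in practice the persistent /dev/disk/by-id names are what you want so the pool survives device renumbering:

```bash
# Six 16TB drives in RAIDZ2. ashift=12 forces 4K sectors; lz4 compression
# and atime=off are cheap defaults for a media pool.
sudo zpool create -o ashift=12 \
    -O compression=lz4 -O atime=off \
    tank raidz2 \
    /dev/disk/by-id/ata-HDD_1 /dev/disk/by-id/ata-HDD_2 /dev/disk/by-id/ata-HDD_3 \
    /dev/disk/by-id/ata-HDD_4 /dev/disk/by-id/ata-HDD_5 /dev/disk/by-id/ata-HDD_6
```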

SSD Caching Tiers

ZFS supports three SSD acceleration mechanisms that sit between RAM and the HDD array. Each serves a different purpose:

Special vdev (metadata mirror). ZFS metadata includes directory entries, file attributes, and block pointers. On a pure HDD pool, every ls or find operation requires seeking across spinning platters. A special vdev moves all metadata onto SSD, making directory operations effectively instant. I also enabled special_small_blocks=128K, which puts files smaller than 128K on the SSD as well. For a photo library, this means thumbnails, sidecar files (.xmp, .json), and small documents are served from SSD automatically.

The special vdev must be mirrored because its loss would make the pool unrecoverable. The matched Sabrent Rocket pair (drives #1 and #2) was the obvious choice. Same model, same firmware, same wear level.
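Adding the metadata mirror and routing small blocks to it looks roughly like this (device paths are placeholders):

```bash
# Mirror the matched Sabrent pair as the special (metadata) vdev.
# Losing this vdev loses the pool, hence the mirror.
sudo zpool add tank special mirror \
    /dev/disk/by-id/nvme-Sabrent_Rocket_1 /dev/disk/by-id/nvme-Sabrent_Rocket_2

# Route metadata plus any block of 128K or smaller to the special vdev.
# Note: this threshold interacts with the dataset recordsize, so bulk
# datasets may want a larger recordsize to keep big files on the HDDs.
sudo zfs set special_small_blocks=128K tank
```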

SLOG (Separate ZFS Intent Log). ZFS uses a write-ahead log (the ZIL) to guarantee that synchronous writes survive a crash. By default, the ZIL lives on the main pool (the HDDs). A SLOG moves the ZIL to a dedicated SSD, which accelerates synchronous write commits. This matters most for NFS (which defaults to sync writes) and less for Samba (which defaults to async). With 5 seconds of write buffering (the default zfs_txg_timeout), the SLOG only holds a few gigabytes of data at any moment.

The SK hynix 256GB (drive #5) handles this. 256GB is far more than needed, and at 5% wear it has decades of life left at this workload.
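Attaching the SLOG is a one-liner (placeholder device path):

```bash
# Dedicate the SK hynix drive as the separate intent log.
sudo zpool add tank log /dev/disk/by-id/nvme-SK_hynix_BC711
```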

L2ARC (Level 2 Adaptive Replacement Cache). The ARC (in RAM) is the fastest cache layer. When the ARC is full and evicts data, the L2ARC catches it on SSD before it falls all the way back to HDD. This helps when repeatedly accessing the same data (browsing a photo album, rewatching videos, apps that scan metadata). It does not help for first-time sequential access (the data has to miss all caches and come from HDD regardless).

The solo Sabrent Rocket 1TB (drive #3, the ex-spare from my other server) provides 1TB of warm read cache.
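And the read cache (placeholder device path again):

```bash
# Cache devices are expendable: if this drive dies, the pool just loses
# its warm read cache and keeps running.
sudo zpool add tank cache /dev/disk/by-id/nvme-Sabrent_Rocket_3
```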

Boot drive. The OS runs on the Crucial P1 1TB (drive #4) with a plain ext4 filesystem. The boot drive is deliberately kept separate from the ZFS pool and is treated as disposable. If Ubuntu breaks, reinstall it in ten minutes and reimport the pool with zpool import tank. All data survives because ZFS pool metadata lives on the pool’s own devices, not on the OS drive.

The final SSD role assignment:

| Role                  | Drive                         | Size   |
|-----------------------|-------------------------------|--------|
| Special vdev (mirror) | Sabrent Rocket pair (#1 + #2) | 2x 1TB |
| SLOG                  | SK hynix BC711 (#5)           | 256GB  |
| L2ARC                 | Sabrent Rocket (#3)           | 1TB    |
| Boot (ext4)           | Crucial P1 (#4)               | 1TB    |

All five drives are M.2 NVMe. The X399 Aorus Gaming 7 has three on-board M.2 slots (M2M, M2Q, M2P), all connected directly to the CPU’s PCIe 3.0 lanes with no SATA lane conflicts. With three additional PCIe M.2 adapter cards in the full-size PCIe slots (also CPU-direct), there are six NVMe slots available. The only placement preference is putting the SLOG in an on-board M.2 slot for marginally lower latency, though in practice the difference between on-board and adapter is unmeasurable.

Bringing It Online

The pool came up first try on Ubuntu 25.10, which ships ZFS 2.3.4 in standard repos (no PPA needed). Burn-in confirmed the design: ~274 MB/s sustained sequential writes across the RAIDZ2, with iostat showing each tier carrying its share — HDDs evenly loaded, the Sabrent mirror tracking metadata in lockstep, the SK hynix absorbing sync traffic on its own, and the L2ARC populating on repeated reads. First scrub clean, zero errors.
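For anyone replicating the burn-in, the checks were along these lines (a sketch, not an exact transcript):

```bash
# Layout check: the raidz2 vdev, the special mirror, the log device, and
# the cache device should all show up in the vdev tree with no errors.
sudo zpool status tank

# Per-vdev throughput while a big copy runs; this is where each tier's
# separate workload (HDDs, special mirror, SLOG) becomes visible.
sudo zpool iostat -v tank 5

# ARC and L2ARC hit rates once the caches have had time to warm up.
arc_summary | less

# Scrub, then check the scan line and error counters afterwards.
sudo zpool scrub tank
sudo zpool status tank
```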

Every drive in the build arrived with a prior life — two HDDs from my previous server’s backup pool, four from spare inventory, the matched Sabrent pair retired from an earlier RAID setup, a laptop OEM pull, and two factory-flagged Samsungs set aside. None of it is new, but each drive ended up in the role its wear, size, and silicon were best suited for: TLC for the irreplaceable metadata mirror, lightly-used silicon for the SLOG, the largest spare for read cache, and the QLC for the disposable boot role.