Validating NACHA Addenda Records with Pydantic

A settlement file lands overnight, the Entry Detail (Type 6) records parse cleanly, reconciliation runs green — and three days later an operations analyst discovers a batch of consumer returns was silently dropped because a single Addenda (Type 7) record carried a tab character in its remittance field. That is the exact scenario this page exists to prevent. NACHA addenda records are the payload that carries return reason codes, notification-of-change data, and free-text remittance, and they are the single record type most likely to corrupt a reconciliation run without raising an error. This guide sits inside the Pydantic schema validation for payments gate and, within the broader Automated File Ingestion & Parsing Pipelines framework, extends the same declarative model composition down to Type 7 records so a malformed addendum is rejected at the ingestion boundary instead of downstream.

The records arrive already framed by the upstream fixed-width file decoding stage against the byte-level NACHA record layouts; Pydantic's job is to enforce their meaning. We target Addenda 05 (standard remittance) and Addenda 99 (return entries), which together account for the overwhelming majority of exception volume in institutional ACH processing.

Concept Spec: The Type 7 Byte Layout

Every NACHA record is exactly 94 bytes. The addenda type code in positions 2–3 discriminates the sub-layout, so validation is a two-stage problem: frame the fixed field, then dispatch to the correct schema. The offsets below are 0-indexed and end-exclusive (Python slice semantics), which is the representation that eliminates the off-by-one errors that dominate positional-parser bugs.

Field	Addenda 05 bytes	Addenda 99 bytes	Rule enforced
Record Type Code	`[0:1]`	`[0:1]`	Must equal `7`
Addenda Type Code	`[1:3]`	`[1:3]`	`05` or `99`
Payment Related Info	`[3:83]` (80 chars)	—	Printable ASCII only
Return Reason Code	—	`[3:6]`	`R\d{2}` in allowlist
Original Entry Trace	—	`[6:21]` (15 digits)	Links to Type 6 parent
Date of Death	—	`[21:27]`	`YYMMDD` or spaces
Original DFI ID	—	`[27:35]` (8 digits)	RDFI routing prefix
Addenda Information	—	`[35:79]` (44 chars)	Free text, right-padded
Addenda Sequence No.	`[83:87]`	—	Increments per parent
Entry Detail Sequence	`[87:94]`	—	Matches Type 6 tail
Trace Number	—	`[79:94]` (15 digits)	Unique within file
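
The offsets in this section can be written down once as Python slices and reused by every framer. The sketch below is illustrative only: `nacha_pos`, `LAYOUT_05`, `LAYOUT_99`, and `frame` are hypothetical names, not part of the models that follow. `nacha_pos` converts the 1-indexed, inclusive positions printed in NACHA documentation into the 0-indexed, end-exclusive form used here, and `frame` shows the two stages: check the frame, then dispatch on the addenda type code.

```python
RECORD_LENGTH = 94


def nacha_pos(first: int, last: int) -> slice:
    """Convert 1-indexed inclusive NACHA positions to a Python slice."""
    return slice(first - 1, last)


# Hypothetical layout maps; each entry corresponds to a [start:end] offset
# used in this guide (positions 4-83 are the same span as [3:83]).
LAYOUT_05 = {
    "record_type": nacha_pos(1, 1),
    "addenda_type_code": nacha_pos(2, 3),
    "payment_related_info": nacha_pos(4, 83),
    "addenda_sequence_number": nacha_pos(84, 87),
    "entry_detail_sequence_number": nacha_pos(88, 94),
}
LAYOUT_99 = {
    "record_type": nacha_pos(1, 1),
    "addenda_type_code": nacha_pos(2, 3),
    "return_reason_code": nacha_pos(4, 6),
    "original_entry_trace": nacha_pos(7, 21),
    "date_of_death": nacha_pos(22, 27),
    "original_dfi_id": nacha_pos(28, 35),
    "addenda_information": nacha_pos(36, 79),
    "trace_number": nacha_pos(80, 94),
}
LAYOUTS = {"05": LAYOUT_05, "99": LAYOUT_99}


def frame(line: str) -> dict[str, str]:
    """Stage 1: check the frame. Stage 2: dispatch on the addenda type code."""
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    layout = LAYOUTS.get(line[1:3])
    if layout is None:
        raise ValueError(f"unsupported addenda type code: {line[1:3]!r}")
    # Slices are applied as-is; padding is left for the validation layer.
    return {name: line[span] for name, span in layout.items()}
```

Keeping the conversion in one helper means a layout can be checked against the published positions without mentally subtracting one from every start offset.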

Validation cost is linear: for a batch of $n$ addenda records carrying $f$ fields each, enforcement is $O(n \cdot f)$ with a small constant, because Pydantic v2's core is compiled in Rust. Validation is almost never the throughput ceiling; the memory footprint of the batch is. That constraint drives the streaming design in the implementation below.

Full Annotated Implementation

The two models share a discipline. ConfigDict(strict=True, frozen=True, extra="forbid") pins the coercion policy: no silent string-to-int coercion, immutable-and-hashable instances the matching engine can dedupe without defensive copies, and a hard failure if the framer hands over an unexpected key. A model_validator(mode="before") accepts either a raw 94-byte line or a pre-sliced dict, so the same model serves both the streaming path and a unit test that constructs fields directly.

python

from __future__ import annotations

import re
from typing import Any

from pydantic import (
    BaseModel,
    ConfigDict,
    Field,
    StrictStr,
    field_validator,
    model_validator,
)

# NACHA restricts remittance payloads to printable ASCII (0x20–0x7E); a tab,
# newline, or high-bit byte is a conformance failure, not a warning.
_PRINTABLE_ASCII = re.compile(r"^[\x20-\x7E]+$")


class NACHAAddenda05(BaseModel):
    """Validated NACHA Addenda 05 (remittance) record. Immutable post-validation."""

    model_config = ConfigDict(strict=True, frozen=True, extra="forbid")

    record_type: StrictStr = Field(pattern=r"^7$")
    addenda_type_code: StrictStr = Field(pattern=r"^05$")
    payment_related_info: StrictStr = Field(min_length=1, max_length=80)
    addenda_sequence_number: int = Field(ge=1, le=9999)
    entry_detail_sequence_number: int = Field(ge=1, le=9_999_999)

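    @field_validator("payment_related_info")
    @classmethod
    def enforce_nacha_charset(cls, v: str) -> str:
        """Reject tabs, control characters, and anything outside 0x20-0x7E."""
        # fullmatch, not match: "$" alone would tolerate a trailing newline.
        if not _PRINTABLE_ASCII.fullmatch(v):
            raise ValueError("Addenda 05 contains prohibited non-printable ASCII")
        return v.rstrip(" ")  # trailing pad is not remittance content

    @model_validator(mode="before")
    @classmethod
    def frame_fixed_width(cls, data: dict[str, Any] | str) -> dict[str, Any]:
        # Accept a raw line from the stream, or a pre-sliced dict from tests.
        if not isinstance(data, str):
            return data
        if len(data) != 94:
            raise ValueError(f"Invalid record length: {len(data)} (expected 94)")
        seq, entry_seq = data[83:87], data[87:94]
        # int() tolerates padding, signs, and underscores ("1_23" -> 123), so
        # require plain ASCII digits before converting.
        if not all(s.isascii() and s.isdigit() for s in (seq, entry_seq)):
            raise ValueError("Sequence fields must be zero-padded ASCII digits")
        return {
            "record_type": data[0:1],
            "addenda_type_code": data[1:3],
            # Strip only the space pad; a trailing tab must reach the validator.
            "payment_related_info": data[3:83].rstrip(" "),
            "addenda_sequence_number": int(seq),
            "entry_detail_sequence_number": int(entry_seq),
        }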


# Authoritative return reason codes per NACHA Operating Rules Appendix 4.
# A syntactically valid-but-unknown code (e.g. "R99") must still fail.
# Kept at module level on purpose: Pydantic v2 converts an un-annotated,
# underscore-prefixed class attribute into a private attribute, so reading it
# as cls._VALID_RETURN_CODES inside a classmethod would not yield the set.
_VALID_RETURN_CODES = frozenset(
    {
        "R01", "R02", "R03", "R04", "R05", "R06", "R07", "R08", "R09", "R10",
        "R11", "R12", "R13", "R14", "R15", "R16", "R17", "R20", "R21", "R22",
        "R23", "R24", "R25", "R26", "R27", "R28", "R29", "R30", "R31", "R32",
        "R33", "R34", "R37", "R38", "R39", "R40", "R41", "R42", "R43", "R45",
        "R50", "R51", "R52", "R53", "R61", "R67", "R68", "R69", "R70", "R71",
        "R72", "R73", "R74", "R75", "R76", "R77", "R80", "R81", "R82", "R83",
        "R84", "R85",
    }
)


class NACHAAddenda99(BaseModel):
    """Validated NACHA Addenda 99 (return entry) record."""

    model_config = ConfigDict(strict=True, frozen=True, extra="forbid")

    record_type: StrictStr = Field(pattern=r"^7$")
    addenda_type_code: StrictStr = Field(pattern=r"^99$")
    return_reason_code: StrictStr = Field(pattern=r"^R\d{2}$")
    original_entry_trace: StrictStr = Field(pattern=r"^\d{15}$")
    # YYMMDD or 6 spaces; the pattern enforces the rule stated in the layout.
    date_of_death: StrictStr = Field(default="      ", pattern=r"^(\d{6}| {6})$")
    original_dfi_id: StrictStr = Field(pattern=r"^\d{8}$")
    addenda_information: StrictStr = Field(default="", max_length=44)
    trace_number: StrictStr = Field(pattern=r"^\d{15}$")

    @field_validator("return_reason_code")
    @classmethod
    def validate_return_code(cls, v: str) -> str:
        # Runs only after the R\d{2} pattern has already passed.
        if v not in _VALID_RETURN_CODES:
            raise ValueError(f"Unknown NACHA return reason code: {v!r}")
        return v

    @model_validator(mode="before")
    @classmethod
    def frame_fixed_width(cls, data: dict[str, Any] | str) -> dict[str, Any]:
        if not isinstance(data, str):
            return data
        if len(data) != 94:
            raise ValueError(f"Invalid record length: {len(data)} (expected 94)")
        return {
            "record_type": data[0:1],
            "addenda_type_code": data[1:3],
            "return_reason_code": data[3:6],
            "original_entry_trace": data[6:21],
            "date_of_death": data[21:27],
            "original_dfi_id": data[27:35],
            "addenda_information": data[35:79].rstrip(),
            "trace_number": data[79:94],
        }

The streaming driver keeps exactly one batch resident in memory and never lets a single bad record halt the run. Using validate_python (not validate_json) skips a JSON round-trip and validates the native slice directly — the same generator discipline the sibling high-volume Pandas parsing strategies guide applies to columnar loads.

python

from collections.abc import Iterator

from pydantic import TypeAdapter, ValidationError

# Pre-compile adapters once so per-record validation carries no model-build cost.
_ADDENDA_05 = TypeAdapter(NACHAAddenda05)
_ADDENDA_99 = TypeAdapter(NACHAAddenda99)


def stream_addenda(path: str, batch_size: int = 10_000) -> Iterator[list[dict]]:
    """Yield validated Type 7 records in memory-bounded batches.

    Malformed records are emitted as single-item error batches destined for a
    dead-letter queue; the pipeline never aborts on a bad line.
    """
    with open(path, "r", encoding="ascii", errors="strict") as fh:
        batch: list[dict] = []
        for line in fh:
            raw = line.rstrip("\r\n")
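            if not raw.startswith("7"):
                continue  # not an addenda record; skip without error
            # A Type 7 line of the wrong length is NOT skipped here: it falls
            # through so the model's length check sends it to the error path.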
            type_code = raw[1:3]
            try:
                if type_code == "05":
                    record = _ADDENDA_05.validate_python(raw)
                elif type_code == "99":
                    record = _ADDENDA_99.validate_python(raw)
                else:
                    continue  # 02/98 not handled by this gate
                batch.append(record.model_dump(mode="json"))
                if len(batch) >= batch_size:
                    yield batch
                    batch = []
            except ValidationError as exc:
                # Route to the exception queue; preserve the raw bytes for audit.
                yield [{"error": exc.errors(include_url=False), "raw_record": raw}]
        if batch:
            yield batch
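
Because error batches and record batches share one stream, the consumer has to tell them apart. A minimal sketch of that routing follows, assuming the batch shapes produced by the driver above (an error batch is a single dict carrying `error` and `raw_record` keys). The names `is_error_batch`, `route_batches`, and the two sink callables are hypothetical and stand in for whatever queue clients the pipeline actually uses.

```python
from collections.abc import Callable, Iterable


def is_error_batch(batch: list[dict]) -> bool:
    """An error batch is exactly one dict carrying 'error' and 'raw_record'."""
    return len(batch) == 1 and {"error", "raw_record"} <= batch[0].keys()


def route_batches(
    batches: Iterable[list[dict]],
    on_valid: Callable[[list[dict]], None],
    on_error: Callable[[dict], None],
) -> tuple[int, int]:
    """Send each batch to the right sink; return (valid_count, error_count)."""
    valid = errors = 0
    for batch in batches:
        if is_error_batch(batch):
            on_error(batch[0])  # hypothetical dead-letter sink
            errors += 1
        else:
            on_valid(batch)  # hypothetical matching-engine sink
            valid += len(batch)
    return valid, errors
```

Returning the two counts gives the caller a cheap reconciliation check: lines read as Type 7 should equal valid plus errored plus deliberately skipped type codes.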

Calibration & Configuration

The models are strict by design, but three knobs adapt them to context. First, batch_size is a pure memory-versus-syscall trade: 10,000 records is a safe default for a container capped at 512 MB. A large file whose addenda carry short remittance text can push to 50,000, while an ISO 20022 pipeline running these records alongside wide pain.002 payloads should drop it. tracemalloc reports Python-level allocations rather than total container memory, so treat it as a lower bound: track the peak per batch and halve the batch if it crosses 70 percent of the container limit.
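
The halving rule can be sketched with the standard library alone. This is an illustration under stated assumptions, not a tuned controller: `next_batch_size`, `measure_peak`, and the `floor` parameter are hypothetical names, and `tracemalloc` only sees memory allocated through Python's allocator, so the real process footprint will be somewhat higher than the figure it reports.

```python
import tracemalloc
from collections.abc import Callable


def next_batch_size(
    current: int,
    peak_bytes: int,
    limit_bytes: int,
    threshold: float = 0.70,
    floor: int = 1_000,  # hypothetical lower bound so the batch never collapses
) -> int:
    """Halve the batch size when peak usage crosses threshold * limit."""
    if peak_bytes > threshold * limit_bytes:
        return max(floor, current // 2)
    return current


def measure_peak(build_batch: Callable[[int], list], batch_size: int) -> int:
    """Return the peak bytes traced while building one batch."""
    tracemalloc.start()
    try:
        build_batch(batch_size)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

A caller would measure one representative batch at start-up, feed the peak into `next_batch_size`, and pass the result to the streaming driver as `batch_size`.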

Second, the return-code allowlist is a policy surface, not a constant. ODFIs occasionally originate dishonored-return codes (R6x) that a receiving-only reconciliation pipeline should quarantine rather than accept. Load _VALID_RETURN_CODES from the same configuration source your ACH origination profile uses so the two never drift. Third, payment_related_info charset strictness differs by rail: pure ACH remittance is printable-ASCII, but if you fold ISO 20022 remittance into the same model you must widen the pattern to permit UTF-8 and validate length in bytes rather than characters.
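
Both adjustments are small in code. The sketch below assumes the allowlist lives in a JSON document under a key named `accepted_return_codes`; that key, and the helpers `load_return_codes` and `fits_in_bytes`, are hypothetical and should be replaced by whatever your configuration source actually provides.

```python
import json
import re

_CODE_SHAPE = re.compile(r"R\d{2}")


def load_return_codes(
    config_text: str, key: str = "accepted_return_codes"
) -> frozenset[str]:
    """Parse a JSON config and return the allowlist, rejecting malformed codes."""
    codes = json.loads(config_text)[key]
    bad = [c for c in codes if not (isinstance(c, str) and _CODE_SHAPE.fullmatch(c))]
    if bad:
        # Fail at load time so a typo in config never silently narrows policy.
        raise ValueError(f"malformed return codes in config: {bad!r}")
    return frozenset(codes)


def fits_in_bytes(text: str, limit: int, encoding: str = "utf-8") -> bool:
    """Length check in encoded bytes, not characters."""
    return len(text.encode(encoding)) <= limit
```

The byte-length check matters once non-ASCII text is allowed: fifty accented characters are fifty characters but one hundred UTF-8 bytes, which would overflow an 80-byte field that a character count would wave through.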

Validation Example: Before and After

Consider a real-looking Addenda 99 line returning an unauthorized debit (R10) originated by RDFI 09100001, referencing original trace 091000010000001:

text

799R10091000010000001      09100001                                            091000017654321

Passing that 94-byte string to _ADDENDA_99.validate_python(raw) frames the slices, confirms R10 is in the allowlist, checks both 15-digit trace numbers and the 8-digit DFI id, then returns a frozen NACHAAddenda99 whose model_dump() is a canonical dict ready for downstream return handling. Now corrupt two bytes by replacing the return code R10 with RXX:

text

799RXX091000010000001      09100001                                            091000017654321

The field_validator on return_reason_code never even runs, because the Field(pattern=r"^R\d{2}$") regex rejects RXX first, yielding a ValidationError whose .errors() names loc=("return_reason_code",) and input="RXX". The streaming driver catches it, emits [{"error": [...], "raw_record": "799RXX..."}], and moves to the next line — the ledger never sees the record, and the audit trail retains the exact bytes.
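
Hand-typing 94-character fixtures like the two above is error-prone, since one missing space shifts every later field. A small builder, sketched here with the hypothetical name `build_addenda_99`, pads each field to its width and refuses to return a line of the wrong length. It builds test input only and performs none of the validation described in this guide.

```python
def build_addenda_99(
    reason: str,
    original_trace: str,
    original_dfi: str,
    trace: str,
    date_of_death: str = "",
    info: str = "",
) -> str:
    """Assemble a 94-character Addenda 99 line from its fields."""
    line = (
        "7"                        # record type code
        + "99"                     # addenda type code
        + reason                   # [3:6]
        + original_trace           # [6:21]
        + date_of_death.ljust(6)   # [21:27], spaces when absent
        + original_dfi             # [27:35]
        + info.ljust(44)           # [35:79], right-padded free text
        + trace                    # [79:94]
    )
    if len(line) != 94:
        raise ValueError(f"fixture is {len(line)} characters, expected 94")
    return line


good = build_addenda_99("R10", "091000010000001", "09100001", "091000017654321")
bad = build_addenda_99("RXX", "091000010000001", "09100001", "091000017654321")
```

Because the builder accepts any three-character reason, it can produce both the passing and the failing fixture from the same call site.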

Failure Modes & Guardrails

Three edge cases cause silent corruption in production addenda pipelines, and each has a specific guard.

Trailing-pad stripping erases legitimate content. payment_related_info is right-padded with spaces to 80 bytes, so rstrip() is correct — but calling it on a field where trailing spaces are significant (they never are in NACHA remittance, but be certain before reusing this helper) silently mutates data. Strip only the pad, and always in the before validator so the stored field is canonical.
Non-ASCII bytes raise at read time, not at validation. Opening with encoding="ascii", errors="strict" means a high-bit byte anywhere in the file throws UnicodeDecodeError from the for line in fh loop, outside the per-record try, so the generator stops and the file is rejected as a whole. That is deliberate: a file that is not ASCII is not a NACHA file. Do not "fix" it by switching to errors="replace", which substitutes U+FFFD (�) for each bad byte. The Addenda 05 charset check would still reject that character, but fields with no charset validator, such as addenda_information on Addenda 99, would accept the mangled string, and raw_record in the audit trail would no longer hold the original bytes.
Orphaned addenda pass in isolation but break reconciliation. An Addenda 99 whose original_entry_trace matches no Type 6 parent in the batch validates perfectly on its own — the linkage is a cross-record invariant these single-record models cannot see. Enforce parent-child integrity in a second pass keyed on the trace number, and do not auto-reject: a missing parent is usually a truncated transmission, which belongs in manual review, not the dead-letter queue. This mismatch also constrains Reg E error-resolution obligations, where a delayed return extends provisional-credit liability.
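
Two of these guards can be sketched with the standard library. `split_orphans` is a hypothetical second-pass helper: it assumes the Type 6 trace numbers have already been collected by an upstream parser that is not shown here, and it only partitions records, leaving the manual-review routing to the caller. `decode_strict_vs_replace` is likewise hypothetical and simply contrasts the two decoder error modes on one raw record.

```python
from collections.abc import Iterable


def split_orphans(
    returns: Iterable[dict], entry_traces: Iterable[str]
) -> tuple[list[dict], list[dict]]:
    """Partition validated Addenda 99 dicts into (linked, orphaned)."""
    known = set(entry_traces)  # trace numbers of Type 6 parents in the batch
    linked: list[dict] = []
    orphaned: list[dict] = []
    for rec in returns:
        target = linked if rec["original_entry_trace"] in known else orphaned
        target.append(rec)
    return linked, orphaned


def decode_strict_vs_replace(raw: bytes) -> tuple[bool, str]:
    """Return (strict_raised, text_under_replace) for one raw record."""
    try:
        raw.decode("ascii", errors="strict")
        strict_raised = False
    except UnicodeDecodeError:
        strict_raised = True
    # Under "replace" each undecodable byte becomes U+FFFD, so the original
    # byte value can no longer be recovered from the decoded text.
    return strict_raised, raw.decode("ascii", errors="replace")
```

Orphans come back as their own list rather than as errors, which keeps a truncated transmission out of the dead-letter queue and in front of a reviewer.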

Records that clear all three guards are canonical and flow forward to the transaction matching and reconciliation algorithms; everything else is quarantined with its original bytes intact.

Related

Up: Pydantic Schema Validation for Payments — the ingestion validation gate this page extends to Type 7 records.
Optimizing pandas.read_fwf for 1GB NACHA Files — memory-bounded columnar decoding for the same files.
Handling Encoding Drift in Legacy Bank Exports — why errors="strict" matters upstream of this gate.
asyncio vs Multiprocessing for Payment Ingestion — scheduling CPU-bound validation across large batches.
How to Validate NACHA Batch Headers Programmatically — the Type 5 sibling of this record-level validation.
