Fixed-Width File Decoding for ACH/Wire Reconciliation Pipelines

Fixed-width file decoding is the deterministic foundation of payment reconciliation, and it sits at the very front of the automated file ingestion and parsing pipelines framework this section documents. In banking operations, NACHA batch files, correspondent-bank exports of Fedwire data, and legacy mainframe settlement tapes rely on rigid positional schemas rather than delimiters — a single off-by-one boundary error or silent encoding mismatch propagates through the matching engine, triggers false Regulation E disputes, and breaks exception routing. What breaks without disciplined decoding is not one file; it is every downstream stage that trusts the field it was handed.

This guide anchors the byte-level contract that the rest of the ingestion layer depends on. It establishes production-grade patterns for memory-safe slicing, strict schema validation, and audit-compliant error isolation, and it connects directly to the NACHA record layouts reference that catalogs the positional formats, to the Pydantic schema validation for payments stage that consumes decoded records, and to the encoding drift in legacy bank exports work that handles the character-set problems fixed-width files carry with them.

Concept Definition: The Positional Contract

A fixed-width record has no separators. Meaning is carried entirely by byte offset and length, so a decoder is nothing more than a set of substring boundaries applied to a byte array of known size. NACHA records are strictly 94 bytes — no exceptions, no ragged lines. Every logical file is a stack of these 94-byte records: a File Header (Type 1), one or more Batch Headers (Type 5), Entry Detail records (Type 6), optional Addenda (Type 7), Batch Control (Type 8), and a File Control (Type 9), padded to a 940-byte block boundary with all-nines filler records.

The Entry Detail (Type 6) record is where reconciliation lives, because it carries the routing number, account, amount, and trace number that later keying and matching depend on. Its field boundaries are fixed by the NACHA Operating Rules and must be encoded as a declarative contract rather than as string indices scattered through business logic. Below is the canonical 0-indexed layout for a Type 6 entry:

Field	Start (0-idx)	Length	Type	Validation Rule
Record Type Code	0	1	Int	Must equal `6`
Transaction Code	1	2	Int	`21`–`29` (credits), `31`–`39` (debits)
RDFI Routing (8)	3	8	Str	First 8 digits of ABA
Check Digit	11	1	Int	Mod-10 over the 8 routing digits
DFI Account Number	12	17	Str	Left-justified, space-padded
Amount	29	10	Decimal	Implied 2 decimals, zero-padded
Individual ID Number	39	15	Str	Left-justified
Individual Name	54	22	Str	Right-padded, strip trailing space
Discretionary Data	76	2	Str	Optional
Addenda Indicator	78	1	Int	`0` or `1`
Trace Number	79	15	Str	ODFI routing (8) + sequence (7)

Two properties make this a contract and not a suggestion. First, every byte from offset 0 to 93 is accounted for; there is no unclaimed space, so a record that decodes to any other length is corrupt by definition. Second, the amount field uses implied decimals: 0000012345 is $123.45, never 12,345. Decoding it as an integer and forgetting the implied scale is the single most common silent-money bug in ACH pipelines, which is why amounts must be lifted into decimal.Decimal at the moment of extraction and never touched as floats.

The Type 6 record as a byte tape: eleven fields with no separators, sized in proportion to their byte length and anchored to fixed offsets that must sum to exactly 94. The Amount field carries two implied decimal places — read as a plain integer it inflates every value hundredfold.

Architecture: Where Decoding Sits in the Pipeline

Decoding is the first transformation after a file crosses the secure transfer boundary. The order of operations is deliberate and irreversible: bytes are read from disk, decoded against the positional contract into strongly typed intermediate records, handed to Pydantic schema validation for cross-field and checksum rules, and only then normalized into the canonical schema that transaction matching algorithms consume. Anything that fails at the decode stage is quarantined with its raw bytes intact rather than coerced forward.

The decoder itself must be memory-bounded. Reconciliation engines routinely process multi-gigabyte settlement tapes, and loading an entire file into memory triggers OOM kills that stall the whole ingestion window. The correct shape is a generator that reads exactly one record-length of bytes at a time, yields a decoded record or an isolated error object, and holds constant memory regardless of file size. This is the same streaming discipline the sibling high-volume Pandas parsing strategies guide applies to DataFrame workflows — but fixed-width decoding usually bypasses DataFrames entirely in favor of zero-copy byte slicing over a mmap region.

Decoding is the first transformation after the secure-transfer boundary: the reader frames fixed 94-byte records at constant memory, the positional decoder and Pydantic gate each reject rather than repair, and every failure diverts to a quarantine queue with its raw bytes intact — the main flow never blocks on a bad record.

Phase-by-Phase Implementation

The decoder is built in four ordered steps. Each step is independently testable, and each hardens one failure surface — framing, extraction, typing, or isolation.

Step 1 — Declare the field contract as data

Field boundaries live in a frozen dataclass, never as inline slice literals. Encoding the schema as data means a layout change is a data edit, not a code change, and the same decode loop serves every record type.

python

from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, List

RECORD_LENGTH = 94  # NACHA: fixed, non-negotiable

@dataclass(frozen=True, slots=True)
class FieldSpec:
    name: str
    start: int          # 0-indexed byte offset
    length: int
    dtype: str          # 'str' | 'int' | 'amount'
    validators: List[Callable[[object], bool]] = field(default_factory=list)

# Declarative contract for the NACHA Entry Detail (Type 6) record.
ENTRY_DETAIL_6: List[FieldSpec] = [
    FieldSpec("record_type", 0, 1, "int", [lambda v: v == 6]),
    FieldSpec("transaction_code", 1, 2, "int",
              [lambda v: 21 <= v <= 39]),
    FieldSpec("rdfi_routing", 3, 8, "str"),
    FieldSpec("check_digit", 11, 1, "int"),
    FieldSpec("account_number", 12, 17, "str"),
    FieldSpec("amount", 29, 10, "amount"),      # implied 2 decimals
    FieldSpec("individual_name", 54, 22, "str"),
    FieldSpec("addenda_indicator", 78, 1, "int", [lambda v: v in (0, 1)]),
    FieldSpec("trace_number", 79, 15, "str"),
]

Step 2 — Enforce framing before any field is read

The record is rejected the instant its length deviates from 94 bytes. Reconciliation pipelines require deterministic failure; there is no heuristic recovery, because a shifted frame means every subsequent field is meaningless.

python

class BoundaryError(ValueError):
    """Raised when a record does not match the declared frame length."""

def assert_frame(raw: bytes, expected: int = RECORD_LENGTH) -> None:
    if len(raw) != expected:
        raise BoundaryError(
            f"BOUNDARY_MISMATCH: expected {expected} bytes, got {len(raw)}"
        )

Step 3 — Decode fields with explicit typing and implied-decimal money

Extraction decodes each slice at the byte level, coerces by declared type, and lifts the amount into Decimal using its implied 2-decimal scale. Money never becomes a float, and every validator runs against the typed value.

python

def _to_amount(raw_digits: str) -> Decimal:
    """NACHA amounts are zero-padded integer cents with 2 implied decimals."""
    cents = int(raw_digits or "0")
    return (Decimal(cents) / Decimal(100)).quantize(Decimal("0.01"))

def decode_record(raw: bytes, schema: List[FieldSpec],
                  encoding: str = "ascii") -> dict:
    assert_frame(raw)
    record: dict = {}
    for spec in schema:
        chunk = raw[spec.start : spec.start + spec.length]
        try:
            text = chunk.decode(encoding).strip()
        except UnicodeDecodeError as exc:
            raise ValueError(
                f"ENCODING_CORRUPTION at offset {spec.start}: {exc}"
            ) from exc

        if spec.dtype == "int":
            value: object = int(text) if text else 0
        elif spec.dtype == "amount":
            value = _to_amount(text)
        else:
            value = text

        for is_valid in spec.validators:
            if not is_valid(value):
                raise ValueError(
                    f"VALIDATION_FAILED: {spec.name}={value!r} at offset {spec.start}"
                )
        record[spec.name] = value
    return record

Step 4 — Stream the file and isolate failures per record

The top-level ingestion function is a generator. It frames one record at a time, yields clean decodes, and quarantines any failure as a structured error object carrying the raw hex and byte offset — so a single bad record never aborts a batch of thousands.

python

import logging
from pathlib import Path
from typing import Iterator

logger = logging.getLogger("payment.ingest.fixed_width")

def stream_records(path: Path, schema: List[FieldSpec],
                   record_length: int = RECORD_LENGTH) -> Iterator[dict]:
    """Constant-memory generator with deterministic per-record isolation."""
    with open(path, "rb") as fh:
        offset = 0
        while True:
            chunk = fh.read(record_length)
            if not chunk:
                break
            if len(chunk) < record_length:
                logger.warning("TRUNCATED_RECORD: %d bytes at EOF (offset %d)",
                               len(chunk), offset)
                break
            try:
                yield decode_record(chunk, schema)
            except ValueError as exc:
                logger.error("RECORD_REJECTED at offset %d: %s", offset, exc)
                yield {
                    "_error": str(exc),
                    "_raw_hex": chunk.hex(),
                    "_byte_offset": offset,
                }
            offset += record_length

This structure keeps memory flat, accounts for every byte, and routes malformed records to a quarantine consumer exactly as the async batch processing architectures guide describes for dead-letter handling — the main ingestion thread never blocks on a bad record.

Edge Cases & Known Failure Modes

Fixed-width decoding fails quietly far more often than it fails loudly. The table below captures the production failure modes that most frequently reach the matching engine as silent corruption, each with its root cause and mitigation.

Failure Mode	Root Cause	Mitigation
Off-by-one field drift	A single inserted/dropped byte shifts every field after it	Assert 94-byte frame per record; reject, never realign
Implied-decimal loss	Amount read as integer, implied 2-decimal scale forgotten	Lift to `Decimal(cents)/100` at extraction; never float
CRLF vs LF line endings	Windows-origin files add `\r`, making records 95 bytes	Strip line terminators during framing, not after decode
EBCDIC/CP037 mainframe export	z/OS tapes are not ASCII; `.decode("ascii")` explodes	Negotiate encoding — see encoding drift
UTF-8 BOM prefix	`\xEF\xBB\xBF` prepended, shifting the first record by 3 bytes	Detect and strip BOM before framing
Space-padded numeric field	Blank amount/ID field yields empty string on `int()`	Treat empty numeric as `0` only where the spec permits it
Trailing filler `9`-records	940-byte block padding decoded as real entries	Terminate on Batch/File Control; ignore all-nines filler
Truncated final record	Transfer cut mid-record leaves < 94 bytes at EOF	Detect short read; quarantine remainder, do not decode

The most dangerous entries in this table are the ones that do not raise an exception — implied-decimal loss and space-padded numerics both produce a plausible-but-wrong number that flows straight into a tolerance threshold comparison and auto-matches against the wrong ledger entry. Guard them with validators, not with hope.

Compliance & Auditability

Fixed-width decoding is a regulated control surface, not a plumbing detail. The 94-byte record structure and the Entry Detail field positions are mandated by the NACHA Operating Rules (Article Two, "Rights and Responsibilities of ODFIs, Their Originators, and Third-Party Senders"), and a decoder that silently repairs a malformed record is producing an unauthorized entry rather than rejecting one. The correct posture is deterministic rejection with a preserved raw payload.

Traceability is the second obligation. Regulation E, 12 CFR § 1005.11, imposes strict error-resolution timelines — typically 10 business days to investigate and provisionally credit — and those clocks start from data that a decoder produced. If a reconciliation break is later disputed, examiners expect to reconstruct exactly which bytes produced which field. That is why every rejected record is logged with its byte offset, a sanitized raw-hex dump, and the schema-contract version in force at decode time. The Federal Reserve's operating circulars for Fedwire similarly require that received values be reproducible from source, so the raw payload must be retained under the institution's record-retention schedule (commonly seven years) and never overwritten by a "cleaned" version.

Because tolerance and matching logic downstream can conceal fraud or BSA/AML signals when it operates on mis-decoded amounts, the decode stage is treated as a first-class audit boundary: it emits structured logs, it versions its field contract in source control, and any change to the positional layout runs a parallel decode against historical files before activation.

Testing & Verification

Decoding is verified against known-good NACHA sample bytes with pytest. Because the contract is data, the tests assert both the happy path and each isolated failure mode. Construct fixtures as exact 94-byte records so framing is exercised alongside extraction.

python

import pytest
from decimal import Decimal

def make_entry(amount_cents: int) -> bytes:
    """Build a valid 94-byte Type 6 record with a given amount in cents."""
    rec = (
        b"6"                      # record type
        b"22"                     # transaction code (credit)
        b"07100000"               # RDFI routing (8)
        b"5"                      # check digit
        b"12345678901234567"      # account number (17)
        + f"{amount_cents:010d}".encode("ascii")   # amount (10)
        + b"ID-000000000001"      # individual id (15)
        + b"ACME PAYROLL LLC     "  # individual name (22, space-padded)
        + b"  "                   # discretionary (2)
        + b"0"                    # addenda indicator
        + b"071000000000001"      # trace number (15)
    )
    assert len(rec) == 94, len(rec)
    return rec

def test_amount_uses_implied_decimals():
    record = decode_record(make_entry(12345), ENTRY_DETAIL_6)
    assert record["amount"] == Decimal("123.45")
    assert isinstance(record["amount"], Decimal)

def test_boundary_mismatch_is_rejected():
    with pytest.raises(BoundaryError):
        decode_record(make_entry(100)[:-1], ENTRY_DETAIL_6)  # 93 bytes

def test_bad_transaction_code_fails_validation():
    corrupt = b"6" + b"99" + make_entry(100)[3:]
    with pytest.raises(ValueError, match="VALIDATION_FAILED"):
        decode_record(corrupt, ENTRY_DETAIL_6)

A decoded clean record serializes to a stable JSON shape that the validation stage can assert against sample data:

json

{
  "record_type": 6,
  "transaction_code": 22,
  "rdfi_routing": "07100000",
  "check_digit": 5,
  "account_number": "12345678901234567",
  "amount": "123.45",
  "individual_name": "ACME PAYROLL LLC",
  "addenda_indicator": 0,
  "trace_number": "071000000000001"
}

Frequently Asked Questions

Why reject a 93-byte record instead of padding it back to 94?

Because padding assumes you know where the missing byte was, and you do not. A short record means the frame is already corrupt, so every field boundary after the loss is wrong. Right-padding a 93-byte record to 94 produces a record that decodes without error but carries silently shifted values into matching. Deterministic rejection with the raw bytes preserved is the only defensible behavior under the NACHA Operating Rules.

My amounts are off by a factor of 100 — what happened?

The amount field carries implied 2-decimal scale: 0000012345 means $123.45, not $12,345. Reading it with a plain int() and forgetting to divide by 100 (or, worse, dividing with float arithmetic) is the classic cause. Lift the field into decimal.Decimal(cents) / 100 at the moment of extraction and quantize to two places, and never let the value touch a float.

The parser throws UnicodeDecodeError on some files but not others. Why?

Those files are almost certainly not ASCII. Mainframe exports arrive as EBCDIC (CP037), and some gateways prepend a UTF-8 BOM. chunk.decode("ascii") fails the moment it hits a byte above 0x7F. Do not swap in errors="ignore" — that deletes bytes and shifts the frame. Negotiate the encoding explicitly using the patterns in handling encoding drift in legacy bank exports.

How do I keep memory flat on a multi-gigabyte settlement tape?

Never call .read() or .readlines() on the whole file. Read exactly one record length per iteration inside a generator, yield the decoded record, and let the caller consume it before the next read. Memory stays constant regardless of whether the file is 94 bytes or 94 gigabytes, and the OS handles paging if you back the reader with mmap.

A batch has thousands of good records and three bad ones — should the whole file fail?

No. Isolate failures per record. The streaming decoder yields a structured error object (raw hex plus byte offset) for each malformed record and keeps processing the rest, so good entries reconcile on schedule while the three exceptions route to a quarantine queue for investigation. Failing the entire batch on a single bad record turns a three-record problem into a settlement-window outage.

How do I stop the trailing 9-filler records from decoding as real entries?

NACHA files are padded to a 940-byte block boundary with all-nines filler records after the File Control (Type 9). Terminate decoding when you reach the Batch and File Control records rather than reading to physical EOF, and treat any all-nines record as filler to be ignored — not as an entry with a record-type code of 9.

NACHA Record Layouts Explained — the full positional catalog for every NACHA record type this decoder frames.
Handling Encoding Drift in Legacy Bank Exports — EBCDIC, BOM, and character-set negotiation for fixed-width files.
Pydantic Schema Validation for Payments — the cross-field and checksum validation stage that consumes decoded records.
High-Volume Pandas Parsing Strategies — chunked DataFrame ingestion when columnar aggregation genuinely helps.
Async Batch Processing Architectures — dead-letter queues and concurrency for the records this stage quarantines.
Automated File Ingestion & Parsing Pipelines — the parent guide that maps the full ingestion layer this decoder opens.

Fixed-Width File Decoding for ACH/Wire Reconciliation Pipelines #

Concept Definition: The Positional Contract #

Architecture: Where Decoding Sits in the Pipeline #

Phase-by-Phase Implementation #

Step 1 — Declare the field contract as data #

Step 2 — Enforce framing before any field is read #

Step 3 — Decode fields with explicit typing and implied-decimal money #

Step 4 — Stream the file and isolate failures per record #

Edge Cases & Known Failure Modes #

Compliance & Auditability #

Testing & Verification #

Frequently Asked Questions #

Related guides in this collection #

Fixed-Width File Decoding for ACH/Wire Reconciliation Pipelines

Concept Definition: The Positional Contract

Architecture: Where Decoding Sits in the Pipeline

Phase-by-Phase Implementation

Step 1 — Declare the field contract as data

Step 2 — Enforce framing before any field is read

Step 3 — Decode fields with explicit typing and implied-decimal money

Step 4 — Stream the file and isolate failures per record

Edge Cases & Known Failure Modes

Compliance & Auditability

Testing & Verification

Frequently Asked Questions

Related guides in this collection