How to Map ISO Policy Forms to JSON Schemas

This guide extends the Policy Schema Design cluster with the concrete transformation layer that converts an Insurance Services Office (ISO) policy form (HO-00-01, CG-00-01, CA-00-01, and their endorsement families) into the strict, versioned JSON contract that the validation gate enforces before any claims workflow is allowed to act on a record.

Problem Statement

An ISO form is not a clean data structure. It is a legal document whose meaning is distributed across a base coverage part, a stack of endorsements that add, delete, or amend clauses, and a declarations page that supplies the actual limits. Mapping that to JSON breaks at production scale in three precise ways, and each is a distinct engineering failure rather than a vague “data quality” complaint.

The first is edition-date drift. ISO revises form editions on its own cadence (HO-00-01 05 11 versus HO-00-01 03 22), and the same form code can carry different mandatory fields between editions. A mapper keyed only on the form code silently maps a new edition through old rules, producing a record that validates but means the wrong thing — a coverage that was renamed or a sublimit that moved.

The second is endorsement override collisions. An endorsement frequently overrides a base-form value (a hurricane deductible endorsement replaces the flat all_perils_deductible with a percentage-of-Coverage-A figure). If the mapper merges base and endorsement values in a non-deterministic order, the resulting deductible depends on dict iteration order, and the same form re-mapped during a replay produces a different number — fatal in a domain where every value must be reconstructable during a market-conduct exam.

The third is ambiguous-clause coercion. Conditional language (“if applicable”, “subject to endorsement”) cannot be force-cast to a boolean or a decimal without losing the fact that it was conditional. A mapper that guesses produces a confident wrong answer instead of a flagged uncertain one.
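The alternative to guessing can be sketched in a few lines, assuming the extractor hands over raw clause text together with a confidence score. The names parse_limit and CONDITIONAL_MARKERS, and the local Unresolved stand-in, are illustrative only, and the marker phrases are examples rather than a complete list.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Unresolved:
    """Illustrative marker for text that must not be force-cast."""
    clause_id: str
    raw_text: str
    confidence: float


# Hypothetical phrases that signal conditional language.
CONDITIONAL_MARKERS = ("if applicable", "subject to endorsement")


def parse_limit(clause_id: str, raw_text: str, confidence: float) -> Decimal | Unresolved:
    """Return a Decimal only when the text is an unconditional amount."""
    text = raw_text.strip().lower()
    if any(marker in text for marker in CONDITIONAL_MARKERS):
        return Unresolved(clause_id, raw_text, confidence)
    try:
        return Decimal(text.replace("$", "").replace(",", ""))
    except InvalidOperation:
        # Unparseable text is flagged as well, never defaulted to zero.
        return Unresolved(clause_id, raw_text, confidence)
```

Called with "$25,000" this yields Decimal("25000"); called with "$5,000 if applicable" it yields an Unresolved marker that still carries the original wording.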

The pattern below resolves all three: a registry keyed on form code and edition date, a deterministic precedence order for merging endorsements over the base form, and explicit Unresolved markers that carry a confidence score instead of a fabricated value.

Prerequisites

This pattern sits downstream of raw document capture and upstream of the schema-validation gate, so it assumes the canonical field names produced by Field Mapping Strategies are already in place; where the source is a scanned declarations page rather than a structured feed, Policy PDF Parsing & Extraction Workflows produce the raw payload this mapper consumes.

Python 3.10+: required for X | Y union types evaluated at runtime and for @dataclass(slots=True); structural match statements also become available at this version.
pydantic==2.* — runtime coercion with structured ValidationError reporting. See the Pydantic v2 docs.
jsonschema==4.* — Draft202012Validator for validating the emitted envelope against the published contract. Refer to the JSON Schema specification for constraint semantics.
orjson==3.* — deterministic, sorted-key serialization so the lineage hash is reproducible byte-for-byte.

Confirm before running: the schema registry is reachable and pinned to a known revision, the active contract version is exported as POLICY_SCHEMA_VERSION, and the append-only audit store that the whole Core Architecture & Compliance Mapping domain shares is writable so every rejection can be recorded with its payload hash before the record is dropped.
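Part of that checklist can be automated. The sketch below is one possible preflight, assuming the audit store is a local directory; the preflight function and its audit_dir parameter are hypothetical, and the registry reachability check is left out because it depends on how the registry is deployed.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def preflight(audit_dir: Path, env: dict[str, str] | None = None) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    env = dict(os.environ) if env is None else env
    problems: list[str] = []
    if not env.get("POLICY_SCHEMA_VERSION"):
        problems.append("POLICY_SCHEMA_VERSION is not set")
    try:
        # Probe writability by creating and removing a scratch file.
        with tempfile.NamedTemporaryFile(dir=audit_dir):
            pass
    except OSError as exc:
        problems.append(f"audit store not writable: {exc}")
    return problems
```

Returning a list rather than raising on the first failure lets an operator see every missing precondition in one pass.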

Step-by-Step Implementation

Step 1 — Key the registry on form code and edition date

The registry resolves an ISO form identifier to the exact mapping ruleset in force for that edition. Keying on the composite (form_code, edition_date) is what eliminates edition-date drift: a new edition that has no registered ruleset fails loudly instead of being mapped through stale rules.

from __future__ import annotations

import datetime as dt
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("iso_mapper.registry")


class UnknownFormEditionError(LookupError):
    """Raised when no mapping ruleset is registered for a form edition."""


@dataclass(frozen=True, slots=True)
class FormKey:
    form_code: str          # e.g. "HO-00-01"
    edition_date: str       # ISO 8601 "YYYY-MM", e.g. "2022-03"


@dataclass(frozen=True, slots=True)
class MappingRuleset:
    key: FormKey
    schema_version: str
    # canonical_field -> ISO clause id it is sourced from
    clause_map: dict[str, str]
    required_fields: frozenset[str]


@dataclass(slots=True)
class SchemaRegistry:
    _rulesets: dict[FormKey, MappingRuleset] = field(default_factory=dict)

    def register(self, ruleset: MappingRuleset) -> None:
        self._rulesets[ruleset.key] = ruleset

    def resolve(self, form_code: str, edition_date: str) -> MappingRuleset:
        key = FormKey(form_code=form_code, edition_date=edition_date)
        ruleset = self._rulesets.get(key)
        if ruleset is None:
            logger.error("no ruleset for %s edition %s", form_code, edition_date)
            raise UnknownFormEditionError(f"{form_code} ({edition_date})")
        return ruleset
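The edition is written as a two-digit month and year in the examples above ("03 22"), while FormKey expects "YYYY-MM". A minimal normalizer might look like the sketch below. The normalize_edition helper and its pivot parameter are hypothetical, and the century cutoff is an assumption that should be checked against the editions actually in the book of business.

```python
import re

# Two-digit month (01-12), whitespace, two-digit year.
_EDITION_RE = re.compile(r"^(0[1-9]|1[0-2])\s+(\d{2})$")


def normalize_edition(printed: str, pivot: int = 70) -> str:
    """Convert a printed 'MM YY' edition into 'YYYY-MM'.

    pivot is an assumed cutoff: two-digit years at or above it are read
    as 19xx, the rest as 20xx.
    """
    match = _EDITION_RE.match(printed.strip())
    if match is None:
        raise ValueError(f"unrecognized edition date: {printed!r}")
    month, year = match.group(1), int(match.group(2))
    century = 1900 if year >= pivot else 2000
    return f"{century + year:04d}-{month}"
```

Rejecting anything that does not match the pattern keeps a malformed edition from reaching the registry as a plausible-looking key.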

Step 2 — Merge endorsements over the base form in a fixed precedence

Endorsements amend the base coverage part, and the declarations page supplies the binding limits. Resolve every field through one deterministic precedence — declarations > endorsement > base — so the merged value never depends on iteration order. An ambiguous clause produces an explicit Unresolved marker, not a guessed value.

from decimal import Decimal


@dataclass(frozen=True, slots=True)
class Unresolved:
    """A clause that could not be deterministically typed."""
    clause_id: str
    raw_text: str
    confidence: float       # 0.0–1.0 from the upstream extractor


ClauseValue = Decimal | str | bool | Unresolved


def merge_clause(
    field_name: str,
    base: dict[str, ClauseValue],
    endorsement: dict[str, ClauseValue],
    declarations: dict[str, ClauseValue],
) -> ClauseValue:
    """Resolve one canonical field by fixed precedence. Never reorders inputs."""
    for layer in (declarations, endorsement, base):   # highest precedence first
        if field_name in layer:
            value = layer[field_name]
            if isinstance(value, Unresolved):
                logger.warning(
                    "field %s unresolved (clause %s, confidence %.2f)",
                    field_name, value.clause_id, value.confidence,
                )
            return value
    raise KeyError(f"no source layer supplies required field {field_name!r}")
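merge_clause accepts a single endorsement layer, while a real policy may carry a stack of them. One way to collapse the stack first is sketched below, assuming each endorsement carries an explicit ordering key. The Endorsement container, its sequence attribute, and collapse_endorsements are hypothetical names, not part of the listing above.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Endorsement:
    """Hypothetical container; `sequence` is an assumed ordering key."""
    endorsement_id: str
    sequence: int
    values: dict[str, object] = field(default_factory=dict)


def collapse_endorsements(stack: list[Endorsement]) -> dict[str, object]:
    """Fold a stack into one layer; higher sequence numbers win.

    Sorting on (sequence, endorsement_id) makes the result independent
    of the order in which the endorsements arrived.
    """
    layer: dict[str, object] = {}
    for item in sorted(stack, key=lambda e: (e.sequence, e.endorsement_id)):
        for name, value in item.values.items():
            layer[name] = value
    return layer
```

The resulting dictionary can then be passed as the endorsement argument, so the declarations > endorsement > base precedence is unchanged.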

Step 3 — Emit a frozen, hash-sealed JSON envelope

Map every canonical field through the resolved ruleset, then wrap the result in a lineage envelope. Serializing with sorted keys makes the SHA-256 reproducible, so the same input always yields the same hash — the property the audit trail depends on.

import hashlib

import orjson


@dataclass(frozen=True, slots=True)
class MappedPolicy:
    metadata: dict[str, str]
    coverage: dict[str, ClauseValue]


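def _json_default(obj: object) -> str:
    """Render Decimal as a string; orjson has no native Decimal support."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


def build_envelope(
    ruleset: MappingRuleset,
    base: dict[str, ClauseValue],
    endorsement: dict[str, ClauseValue],
    declarations: dict[str, ClauseValue],
    jurisdiction: str,
) -> MappedPolicy:
    layers = (declarations, endorsement, base)
    # Merge only the fields some layer supplies, so an absent required field
    # is reported by the ValueError below instead of a KeyError mid-merge.
    coverage: dict[str, ClauseValue] = {
        field_name: merge_clause(field_name, base, endorsement, declarations)
        for field_name in sorted(ruleset.clause_map)
        if any(field_name in layer for layer in layers)
    }
    missing = ruleset.required_fields - coverage.keys()
    if missing:
        raise ValueError(f"required fields absent after merge: {sorted(missing)}")

    # Unresolved markers are dataclasses, which orjson serializes natively.
    payload = orjson.dumps(
        coverage, default=_json_default, option=orjson.OPT_SORT_KEYS
    )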
    metadata = {
        "source_form_id": f"{ruleset.key.form_code} {ruleset.key.edition_date}",
        "schema_version": ruleset.schema_version,
        "jurisdiction": jurisdiction,
        "ingestion_timestamp": dt.datetime.now(dt.timezone.utc).isoformat(),
        "payload_hash_sha256": hashlib.sha256(payload).hexdigest(),
    }
    return MappedPolicy(metadata=metadata, coverage=coverage)
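The reproducibility property can be checked in isolation. The sketch below uses the standard-library json module with sort_keys=True as a stand-in for orjson.OPT_SORT_KEYS, so the bytes (and therefore the hashes) are not interchangeable with those the mapper produces. canonical_hash is a hypothetical helper, and rendering Decimal values as strings is an assumption.

```python
import hashlib
import json
from decimal import Decimal


def canonical_hash(coverage: dict[str, object]) -> str:
    """SHA-256 over a sorted-key, compact JSON rendering of coverage."""
    payload = json.dumps(
        coverage,
        sort_keys=True,
        separators=(",", ":"),
        default=str,  # assumption: Decimal values are rendered as strings
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same fields, different insertion order: the hash must not change.
first = {"coverage_a_limit": Decimal("350000"), "all_perils_deductible": Decimal("1000")}
second = {"all_perils_deductible": Decimal("1000"), "coverage_a_limit": Decimal("350000")}
assert canonical_hash(first) == canonical_hash(second)
```

Whichever serializer is used, it has to stay fixed for the life of the audit trail, because a change in separators or number rendering changes every hash.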

Verification & Testing

Confirm two invariants before trusting the mapper in a pipeline: the lineage hash is deterministic for identical input, and ambiguous clauses are preserved rather than coerced. Both are cheap to assert.

def test_hash_is_reproducible(ruleset, base, endorsement, decs) -> None:
    a = build_envelope(ruleset, base, endorsement, decs, "CA")
    b = build_envelope(ruleset, base, endorsement, decs, "CA")
    assert a.coverage == b.coverage
    assert a.metadata["payload_hash_sha256"] == b.metadata["payload_hash_sha256"]


def test_endorsement_overrides_base() -> None:
    base = {"all_perils_deductible": Decimal("1000")}
    endorsement = {"all_perils_deductible": Decimal("2500")}
    merged = merge_clause("all_perils_deductible", base, endorsement, {})
    assert merged == Decimal("2500")   # endorsement wins, deterministically


def test_ambiguous_clause_is_flagged() -> None:
    decs = {"wind_sublimit": Unresolved("CG-0001-7", "if applicable", 0.41)}
    value = merge_clause("wind_sublimit", {}, {}, decs)
    assert isinstance(value, Unresolved)
    assert value.confidence < 0.5

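As a final gate, run the serialized envelope through jsonschema.Draft202012Validator against the published contract for schema_version. Validate the JSON payload decoded back into plain dicts, since jsonschema checks JSON types rather than Decimal or dataclass instances. Any Unresolved marker serializes to an object, so it should fail a schema that types the field as a number or string and route the record to manual review rather than into adjudication.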

Compliance & Audit Note

Because the lineage hash is computed over sorted-key serialization, it is reproducible months later from the raw form package alone, which is exactly what a state insurance department examiner or an internal SOX control tests for: given this input and this schema_version, prove the system produced this coverage record and no other. The source_form_id recorded in the envelope embeds the edition date, so a compliance officer can demonstrate that the correct ISO edition governed the mapping, and the preserved Unresolved markers prove the platform never fabricated a value for a conditional clause. Where state mandates override the ISO base definition, the State Regulation Mapping layer applies its jurisdictional adjustments after this envelope is sealed, and any regulated value inherits the field-level controls defined by Data Boundary Enforcement so it never leaks into a plaintext dashboard.

Troubleshooting Checklist

Same form maps differently across runs. Symptom: a deductible or sublimit changes value on replay of identical input. Cause: endorsement and base values merged by dict iteration order instead of fixed precedence. Fix: route every field through merge_clause and never dict.update() one layer over another.

UnknownFormEditionError floods from one carrier. Symptom: a wave of failures after a quarterly ISO release. Cause: a new edition shipped with no registered ruleset. Fix: register the edition’s MappingRuleset. Until then, hold the affected records, or, if a fallback to the last stable edition is unavoidable, tag every fallback-mapped record with a deprecation_warning flag so that nothing is mapped silently.

Hash differs for visually identical payloads. Symptom: two runs of the same policy produce different payload_hash_sha256. Cause: serialization without sorted keys, or a Decimal rendered inconsistently. Fix: keep orjson.OPT_SORT_KEYS and normalize all monetary fields to Decimal before serialization.

Required field absent after merge. Symptom: ValueError: required fields absent. Cause: the declarations page omitted a limit the edition marks mandatory. Fix: do not default the value — route the record to the manual-review queue the Claims Lifecycle Architecture triage layer drains.

Conditional clause silently typed as a number. Symptom: a “subject to endorsement” sublimit appears as 0 downstream. Cause: an extractor coerced ambiguous text instead of emitting Unresolved. Fix: preserve the Unresolved marker through the merge and let the schema validator reject it into review.
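The "Decimal rendered inconsistently" cause in the hash item above is easy to reproduce: two Decimals can compare equal and still serialize to different strings. The sketch below shows one way to normalize monetary fields before serialization. normalize_money is a hypothetical helper, and the two-decimal scale and rounding mode are assumptions that should follow the published contract; percentage deductibles would need their own scale.

```python
from __future__ import annotations

from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal("0.01")  # assumed scale for monetary fields


def normalize_money(value: Decimal | int | str) -> Decimal:
    """Quantize to two decimal places so equal amounts render identically."""
    return Decimal(value).quantize(CENTS, rounding=ROUND_HALF_UP)


assert Decimal("1000") == Decimal("1000.00")            # numerically equal
assert str(Decimal("1000")) != str(Decimal("1000.00"))  # rendered differently
assert str(normalize_money("1000")) == str(normalize_money("1000.00"))
```

Applying the normalization at the point where values enter the layer dictionaries keeps the hash stable regardless of how the upstream extractor formatted the amount.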

Policy Schema Design — the parent component this mapping feeds, defining the typed contract every mapped envelope must satisfy
Handling Multi-State Compliance in Claims Routing — how jurisdictional overrides apply after the ISO-to-JSON mapping is sealed
Designing Fallback Routes for Missing Adjuster Data — the review-queue pattern that absorbs unresolved and incomplete records
Building Async Batch Processors for Daily Policy Ingestion — the concurrency layer that runs this mapper across nightly volume
Validating Coverage Against Policy Limits — the downstream check that consumes the coverage enumerations this mapping locks in