Delta Models

Delta Model Overview

This document explains the delta model layers, how ingest builds them, and how we reconstruct a resolved act snapshot.

Why Delta Models Exist

Snapshot exports contain the full Act payload for each version. Storing those payloads directly duplicates every part of the act that did not change between versions. The delta model instead separates:

  • Typed content payloads (schema-backed objects),
  • Timeline selection (intervals + deltas + lineage),
  • Resolved view (the reconstructed Act for a date).

Model Layers

1) Schema Models (Typed Content)

These are the canonical, fully-typed shapes from law_graph.schema.models (e.g., Act, Article, Attachment, Title, Chapter). Each resolved schema now inherits from a shared *Core base so that only snapshot fields intended for persistence live in the blobs, while resolved-only fields (for example, article_versions) stay on the schema subclass.

They represent the final resolved content but are too nested to store directly without duplication.

1b) Core Models (Persisted Field Sets)

law_graph.schema.core_models defines the flattened field sets that are actually persisted inside blobs. Highlights:

  • ArticleCore — article text/metadata minus article_versions.
  • AttachmentCore — attachment metadata minus attachment_versions and attachment_expressions.
  • TitleCore — structural metadata minus nested articles/chapters.
  • ChapterCore — chapter metadata minus nested articles.
  • PrefaceCore, ReferenceCore — primitive expression-level fields.
  • CrossActUpdateEdgeCore — full cross-act update edge payload (matches the snapshot structure).

Both schema models and payload models inherit these cores so there is exactly one definition of “what is stored in the blob” per node type.

2) Blob Models (Storage-Friendly)

Blob models live in law_graph.delta.payloads and now subclass the core models directly (no embedded resolved schema types). They add helpers to sanitize snapshot inputs and to reconstruct the resolved schema objects via to_schema():

  • ArticleBlob, AttachmentBlob (flattened persisted fields)
  • ActStructureBlob (structure root)
  • TitleStructureBlob, ChapterStructureBlob (store references via article_urn_nirs instead of embedding full Article objects)
  • PrefaceGroupBlob, ReferenceGroupBlob
  • PreambleBlob, ConclusionsBlob, ExpressionUrisBlob

Blob models are stored in blobs and referenced by deltas.
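
The layering described above can be sketched with plain dataclasses. This is a minimal illustration, not the code in law_graph: the fields urn_nir and text, the from_snapshot helper, and the choice of dataclasses are assumptions. Only the names ArticleCore, Article, ArticleBlob, article_versions, and to_schema() come from this document.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class ArticleCore:
    """Persisted field set (hypothetical fields, for illustration only)."""
    urn_nir: str
    text: str


@dataclass
class Article(ArticleCore):
    """Resolved schema model: core fields plus resolved-only fields."""
    article_versions: list[Any] = field(default_factory=list)


@dataclass
class ArticleBlob(ArticleCore):
    """Storage model: exactly the core fields, plus conversion helpers."""

    @classmethod
    def from_snapshot(cls, raw: dict[str, Any]) -> "ArticleBlob":
        # Sanitize: keep only the fields declared on the core.
        allowed = {f.name for f in fields(ArticleCore)}
        return cls(**{k: v for k, v in raw.items() if k in allowed})

    def to_schema(self) -> Article:
        # Resolved-only fields start from their defaults here.
        return Article(**asdict(self))


raw = {"urn_nir": "urn:example:art1", "text": "Art. 1", "article_versions": ["v1", "v2"]}
blob = ArticleBlob.from_snapshot(raw)
article = blob.to_schema()
```

Because both Article and ArticleBlob inherit from ArticleCore, adding a persisted field in one place changes the blob and the resolved schema together.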

Relationship Diagram (Mermaid)

flowchart TD
    Snapshots[Act snapshots] --> Builder[build_canonical_records + ActBuilder]
    Builder --> Base[BaseActRecord]
    Builder --> MetaDeltas[MetadataDeltaRecord*]

    Base --> Blobs[ContentBlob*]
    Base --> Intervals[IntervalNode*]
    Intervals --> NodeIntervals[NodeInterval*]
    MetaDeltas --> MetaDelta[MetadataDelta]

    Blobs --> BlobModel[BlobModel]
    Blobs --> Schema[Schema models]

    Resolver[resolve_state] --> View[ResolvedActView]
    Base --> Resolver
    MetaDeltas --> Resolver
    View --> Act[Act]

3) Delta Timeline Models

Defined in law_graph.delta.models:

  • BaseActRecord: blob store + interval registry for a single act
  • ContentBlob: {content_version_id, node_type, content}
  • IntervalNode / NodeInterval: validity intervals per node
  • NodeInterval.observations: snapshot lineage for this interval
  • NodeInterval.legal_updates: raw legal update evidence (no confidence)
  • MetadataDelta / MetadataDeltaRecord: patch metadata fields at an effective date

These models represent how content changes over time.

Normattiva metadata exposes cross-act update links in:

  • metadati.attiAggiornati[].urnModifiche[] (active updates)
  • metadati.aggiornamentiAtto|aggiornamentiStruttura|aggiornamentiTitolo[].urnModifiche[] (passive/consolidation updates)

We normalize these into NodeInterval.legal_updates as raw evidence records:

  • updating_act_urn (urnModificante)
  • update_kind (which metadata bucket)
  • update_date (when available)
  • updated_anchor / updating_anchor (normalized anchors)
  • raw (minimal audit/debug fields, only persisted when raw payloads are enabled)

Edge nodes are no longer persisted; evidence is attached to the interval transition where a new content version becomes valid.

No confidence/scoring is stored; scoring is derived offline.
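
The normalization step can be sketched as a walk over the metadata buckets listed above. This is an illustration under assumptions: the LegalUpdate class and collect_legal_updates function are hypothetical names, each urnModifiche entry is assumed to be a dict carrying urnModificante, and date and anchor extraction are left out because the source field names are not given here.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Metadata buckets named in this document; the bucket name becomes update_kind.
UPDATE_BUCKETS = (
    "attiAggiornati",
    "aggiornamentiAtto",
    "aggiornamentiStruttura",
    "aggiornamentiTitolo",
)


@dataclass(frozen=True)
class LegalUpdate:
    """Hypothetical evidence record; no confidence or score is stored."""
    updating_act_urn: str
    update_kind: str
    update_date: Optional[str] = None  # not populated in this sketch


def collect_legal_updates(metadati: dict[str, Any]) -> list[LegalUpdate]:
    updates: list[LegalUpdate] = []
    for bucket in UPDATE_BUCKETS:
        for entry in metadati.get(bucket) or []:
            for modification in entry.get("urnModifiche") or []:
                urn = modification.get("urnModificante")
                if urn:  # skip entries without an updating act
                    updates.append(LegalUpdate(updating_act_urn=urn, update_kind=bucket))
    return updates
```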

Anchor Convention

Anchors are normalized strings derived without database lookups. The current format is:

  • Articles: ARTICLE:{anchor} derived from articoloAggiornato fields
  • Interval anchors use ARTICLE:{urn_fragment} when the node id is a URN
  • Structure root: STRUCTURE:root

Article Annotations

Article notes and references are stored as independent nodes to avoid inflating article content churn:

  • article_references nodes store ArticleReferencesBlob payloads
  • article_notes nodes store ArticleNotesBlob payloads

These nodes use snapshot-date intervals per article, independent of article version boundaries.

4) Act Metadata Patching

Act-level metadata (doc title, doc date, etc.) is stored as deltas because it is small and not naturally interval-selected.

  • ActMetadata (typed subset of Act)
  • ActMetadataPatch + MetadataDelta (set/unset fields)
  • REQUIRED_METADATA_FIELDS prevents clearing mandatory fields
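
A set/unset patch guarded by the required-field check might look like the sketch below. The function name apply_metadata_patch, the dict-based representation, and the contents of REQUIRED_METADATA_FIELDS are assumptions made for illustration; the real definitions are not shown in this document.

```python
from typing import Any, Iterable, Mapping

# Assumed contents; the real REQUIRED_METADATA_FIELDS is not listed in this document.
REQUIRED_METADATA_FIELDS = frozenset({"doc_title"})


def apply_metadata_patch(
    metadata: Mapping[str, Any],
    set_fields: Mapping[str, Any],
    unset_fields: Iterable[str] = (),
) -> dict[str, Any]:
    """Return a new metadata dict with the patch applied."""
    unset = set(unset_fields)
    blocked = sorted(REQUIRED_METADATA_FIELDS & unset)
    if blocked:
        raise ValueError(f"cannot unset required metadata fields: {blocked}")
    patched = {**metadata, **set_fields}
    for name in unset:
        patched.pop(name, None)  # unsetting an absent field is a no-op
    return patched
```

Returning a new dict rather than mutating the input keeps earlier resolved states intact when deltas are replayed for several dates.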

Ingestion Flow

1) Snapshot input (ActSnapshot):

  • Raw JSON is loaded for each version/tag.

2) Build canonical records + variant events (build_canonical_records + build_variant_event):

  • Extract payloads (articles, attachments, structure, etc.).
  • Hash each payload into a ContentBlob.
  • Build IntervalNode entries for validity intervals.
  • Attach observations and legal update evidence to NodeIntervals.
  • Emit MetadataDeltaRecords whenever act-level metadata changes.

3) Persist artifacts:

  • Canonical JSONL files: act_base, node_intervals, content_blobs, metadata_deltas.
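
The hashing step is what makes deduplication work: identical payloads from different snapshots map to the same content_version_id. The sketch below assumes SHA-256 over canonical JSON, which this document does not confirm; the helper names content_version_id and add_blob are hypothetical, and only the ContentBlob shape {content_version_id, node_type, content} comes from the model description.

```python
import hashlib
import json
from typing import Any


def content_version_id(node_type: str, content: dict[str, Any]) -> str:
    """Derive a stable id from the node type and a canonical JSON encoding."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(f"{node_type}:{canonical}".encode("utf-8")).hexdigest()


def add_blob(store: dict[str, dict[str, Any]], node_type: str, content: dict[str, Any]) -> str:
    """Insert the payload once; later identical payloads reuse the stored blob."""
    blob_id = content_version_id(node_type, content)
    store.setdefault(
        blob_id,
        {"content_version_id": blob_id, "node_type": node_type, "content": content},
    )
    return blob_id


store: dict[str, dict[str, Any]] = {}
first = add_blob(store, "article", {"urn_nir": "urn:example:art1", "text": "Art. 1"})
# Same payload from a later snapshot, keys in a different order.
second = add_blob(store, "article", {"text": "Art. 1", "urn_nir": "urn:example:art1"})
third = add_blob(store, "article", {"urn_nir": "urn:example:art1", "text": "Art. 1 (amended)"})
```

Sorting keys before hashing means key order in the raw JSON does not produce spurious new versions.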

Reconstruction Flow

1) Resolve a date (resolve_state):

  • Select NodeIntervals that cover the target date for each node.
  • Replay MetadataDeltaRecords up to the target date.
  • Materialize the selected ContentBlob payloads.

2) Rebuild structure:

  • TitleStructureBlob and ChapterStructureBlob reference article URNs, so we look up the corresponding Article blobs and rebuild Title -> Chapter -> Article nesting.

3) Assemble Act:

  • Combine metadata + reconstructed structure + attachments + expression fields into a resolved Act.

The resolver returns a ResolvedActView which includes:

  • act: the resolved Act snapshot
  • nodes: resolved node payloads (useful for diagnostics)
  • metadata: resolved act metadata
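
The selection and replay steps can be sketched as follows. This is not the signature of resolve_state: the Interval class, its half-open validity bounds, the (effective_date, fields) tuples for metadata deltas, and the function name resolve_sketch are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Interval:
    node_id: str
    content_version_id: str
    valid_from: date
    valid_to: Optional[date] = None  # exclusive end; None means still in force


def resolve_sketch(intervals, blobs, metadata_deltas, target: date) -> dict[str, Any]:
    # 1) Pick the interval covering the target date for each node and
    #    materialize its blob payload.
    nodes = {}
    for interval in intervals:
        started = interval.valid_from <= target
        not_ended = interval.valid_to is None or target < interval.valid_to
        if started and not_ended:
            nodes[interval.node_id] = blobs[interval.content_version_id]

    # 2) Replay metadata deltas in date order, up to and including the target.
    metadata: dict[str, Any] = {}
    for effective, fields_set in sorted(metadata_deltas, key=lambda delta: delta[0]):
        if effective <= target:
            metadata.update(fields_set)
    return {"nodes": nodes, "metadata": metadata}


intervals = [
    Interval("art1", "v1", date(2020, 1, 1), date(2021, 1, 1)),
    Interval("art1", "v2", date(2021, 1, 1)),
]
blobs = {"v1": {"text": "old"}, "v2": {"text": "new"}}
deltas = [(date(2021, 6, 1), {"doc_title": "New"}), (date(2020, 1, 1), {"doc_title": "Old"})]
before = resolve_sketch(intervals, blobs, deltas, date(2020, 6, 1))
after = resolve_sketch(intervals, blobs, deltas, date(2021, 1, 1))
```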

Example: Avoiding Duplicate Article Storage

Raw snapshot: Title contains nested chapters and articles.

If we stored Title directly as a blob, we would duplicate full Article payloads (also stored in NodeType.ARTICLE blobs).

Blob approach:

  • Store Article as its own blob.
  • Store TitleStructureBlob with article_urn_nirs and nested ChapterStructureBlobs.
  • During reconstruction, use article_urn_nirs to look up the stored Article blobs and rebuild the original nesting.

This keeps storage deduplicated while preserving structure.
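
The rebuild step can be sketched with plain dicts. Only article_urn_nirs is taken from this document; the keys heading, chapters, and articles, and the function name rebuild_title, are hypothetical stand-ins for the real blob and schema fields.

```python
from typing import Any


def rebuild_title(
    title_blob: dict[str, Any],
    articles_by_urn: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Replace URN references with the stored article payloads."""

    def expand(structure: dict[str, Any]) -> dict[str, Any]:
        # Copy the structural metadata, dropping the reference-only keys.
        node = {k: v for k, v in structure.items() if k not in ("article_urn_nirs", "chapters")}
        node["articles"] = [articles_by_urn[urn] for urn in structure.get("article_urn_nirs", [])]
        return node

    title = expand(title_blob)
    title["chapters"] = [expand(chapter) for chapter in title_blob.get("chapters", [])]
    return title


articles = {
    "urn:example:art1": {"urn_nir": "urn:example:art1", "text": "Art. 1"},
    "urn:example:art2": {"urn_nir": "urn:example:art2", "text": "Art. 2"},
}
title_blob = {
    "heading": "Title I",
    "article_urn_nirs": ["urn:example:art1"],
    "chapters": [{"heading": "Chapter I", "article_urn_nirs": ["urn:example:art2"]}],
}
title = rebuild_title(title_blob, articles)
```

Each article payload exists once in the blob store; the title and chapter blobs carry only the URNs needed to put it back in place.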