Delta Models
Delta Model Overview
This document explains the delta model layers, how ingest builds them, and how we reconstruct a resolved act snapshot.
Why Delta Models Exist
Snapshot exports contain a full Act payload for each version. Storing those
payloads directly duplicates every node that did not change between versions.
The delta model separates:
- Typed content payloads (schema-backed objects),
- Timeline selection (intervals + deltas + lineage),
- Resolved view (the reconstructed Act for a date).
Model Layers
1) Schema Models (Typed Content)
These are the canonical, fully-typed shapes from law_graph.schema.models
(e.g., Act, Article, Attachment, Title, Chapter). Each resolved
schema now inherits from a shared *Core base so that only snapshot fields
intended for persistence live in the blobs, while resolved-only fields (for
example, article_versions) stay on the schema subclass.
They represent the final resolved content but are too nested to store directly without duplication.
1b) Core Models (Persisted Field Sets)
law_graph.schema.core_models defines the flattened field sets that are
actually persisted inside blobs. Highlights:
- ArticleCore: article text/metadata minus article_versions.
- AttachmentCore: attachment metadata minus attachment_versions and attachment_expressions.
- TitleCore: structural metadata minus nested articles/chapters.
- ChapterCore: chapter metadata minus nested articles.
- PrefaceCore, ReferenceCore: primitive expression-level fields.
- CrossActUpdateEdgeCore: full cross-act update edge payload (matches the snapshot structure).
Both schema models and payload models inherit these cores so there is exactly one definition of “what is stored in the blob” per node type.
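The shared-core idea can be sketched with plain dataclasses. This is a minimal illustration, not the actual law_graph definitions: the field names urn_nir and text are assumptions, and the real models may use a different base class.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ArticleCore:
    """Persisted field set (illustrative fields, not the real schema)."""
    urn_nir: str
    text: str


@dataclass
class Article(ArticleCore):
    """Resolved schema model: core fields plus resolved-only fields."""
    article_versions: list = field(default_factory=list)


@dataclass
class ArticleBlob(ArticleCore):
    """Storage model: exactly the core fields, nothing resolved-only."""


# The blob's field set is the single definition of what gets persisted.
persisted = [f.name for f in fields(ArticleBlob)]
resolved_only = [f.name for f in fields(Article) if f.name not in persisted]
```

Because both subclasses derive their fields from ArticleCore, adding a persisted field in one place updates the schema model and the blob model together.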
2) Blob Models (Storage-Friendly)
Blob models live in law_graph.delta.payloads and now subclass the core
models directly (no embedded resolved schema types). They add helpers to
sanitize snapshot inputs and to reconstruct the resolved schema objects via
to_schema():
- ArticleBlob, AttachmentBlob (flattened persisted fields)
- ActStructureBlob (structure root)
- TitleStructureBlob, ChapterStructureBlob (store references via article_urn_nirs instead of embedding full Article objects)
- PrefaceGroupBlob, ReferenceGroupBlob
- PreambleBlob, ConclusionsBlob, ExpressionUrisBlob
Blob models are stored in blobs and referenced by deltas.
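A rough sketch of the two helper directions, sanitizing a snapshot on the way in and rebuilding a schema object on the way out, might look like the following. The from_snapshot name, the field names, and the use of dataclasses are assumptions made for illustration; only to_schema() is named in this document.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class AttachmentCore:
    # Illustrative persisted fields; the real core has its own field set.
    urn_nir: str
    title: str


@dataclass
class Attachment(AttachmentCore):
    # Resolved-only field that must never reach the blob.
    attachment_versions: list = field(default_factory=list)


@dataclass
class AttachmentBlob(AttachmentCore):
    @classmethod
    def from_snapshot(cls, raw: dict) -> "AttachmentBlob":
        """Hypothetical sanitizer: keep only the persisted core fields."""
        allowed = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in allowed})

    def to_schema(self) -> Attachment:
        """Rebuild the resolved schema object from the persisted fields."""
        return Attachment(**asdict(self))


raw = {
    "urn_nir": "urn:example:act~all1",
    "title": "Annex 1",
    "attachment_versions": [{"date": "2020-01-01"}],  # dropped on ingest
}
blob = AttachmentBlob.from_snapshot(raw)
attachment = blob.to_schema()
```

In this sketch the resolved-only field comes back empty after to_schema(); the resolver would be responsible for filling it in.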
Relationship Diagram (Mermaid)
flowchart TD
Snapshots[Act snapshots] --> Builder[build_canonical_records + ActBuilder]
Builder --> Base[BaseActRecord]
Builder --> MetaDeltas[MetadataDeltaRecord*]
Base --> Blobs[ContentBlob*]
Base --> Intervals[IntervalNode*]
Intervals --> NodeIntervals[NodeInterval*]
MetaDeltas --> MetaDelta[MetadataDelta]
Blobs --> BlobModels[BlobModel]
Blobs --> Schema[Schema models]
Resolver[resolve_state] --> View[ResolvedActView]
Base --> Resolver
MetaDeltas --> Resolver
View --> Act[Act]
3) Delta Timeline Models
Defined in law_graph.delta.models:
- BaseActRecord: blob store + interval registry for a single act
- ContentBlob: {content_version_id, node_type, content}
- IntervalNode / NodeInterval: validity intervals per node
- NodeInterval.observations: snapshot lineage for this interval
- NodeInterval.legal_updates: raw legal update evidence (no confidence)
- MetadataDelta / MetadataDeltaRecord: patch metadata fields at an effective date
These models represent how content changes over time.
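To make the shapes concrete, here is a simplified sketch of how these records could relate. Only content_version_id, node_type, content, observations, and legal_updates come from this document; valid_from, valid_to, node_id, act_urn, and the container layout are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ContentBlob:
    content_version_id: str
    node_type: str
    content: dict[str, Any]


@dataclass
class NodeInterval:
    content_version_id: str
    valid_from: str                 # assumed field name (ISO date)
    valid_to: Optional[str] = None  # assumed; None means still valid
    observations: list[str] = field(default_factory=list)
    legal_updates: list[dict] = field(default_factory=list)


@dataclass
class IntervalNode:
    node_id: str                    # assumed field name
    node_type: str
    intervals: list[NodeInterval] = field(default_factory=list)


@dataclass
class BaseActRecord:
    act_urn: str                    # assumed field name
    blobs: dict[str, ContentBlob] = field(default_factory=dict)
    nodes: dict[str, IntervalNode] = field(default_factory=dict)


record = BaseActRecord(act_urn="urn:example:act")
blob = ContentBlob("cv-1", "article", {"text": "Example text"})
record.blobs[blob.content_version_id] = blob
record.nodes["art1"] = IntervalNode(
    "art1", "article",
    [NodeInterval("cv-1", "2020-01-01", observations=["2020-01-01"])],
)

# Integrity check: every interval should point at a stored blob.
dangling = [
    iv.content_version_id
    for node in record.nodes.values()
    for iv in node.intervals
    if iv.content_version_id not in record.blobs
]
```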
3b) Legal Update Evidence (Raw)
Normattiva metadata exposes cross-act update links in:
- metadati.attiAggiornati[].urnModifiche[] (active updates)
- metadati.aggiornamentiAtto|aggiornamentiStruttura|aggiornamentiTitolo[].urnModifiche[] (passive/consolidation updates)
We normalize these into NodeInterval.legal_updates as raw evidence records:
- updating_act_urn (urnModificante)
- update_kind (which metadata bucket)
- update_date (when available)
- updated_anchor / updating_anchor (normalized anchors)
- raw (minimal audit/debug fields, only persisted when raw payloads are enabled)
Edge nodes are no longer persisted; evidence is attached to the interval transition where a new content version becomes valid.
No confidence/scoring is stored; scoring is derived offline.
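The normalization step could look roughly like this. The sketch assumes each urnModifiche entry is a dict that carries urnModificante directly; the exact nesting in the Normattiva payload is not shown in this document, so treat the lookup paths as placeholders. The function name is hypothetical, and date and anchor extraction are left out because their source fields are not given here.

```python
ACTIVE_BUCKETS = ("attiAggiornati",)
PASSIVE_BUCKETS = (
    "aggiornamentiAtto",
    "aggiornamentiStruttura",
    "aggiornamentiTitolo",
)


def collect_legal_updates(metadati: dict) -> list[dict]:
    """Flatten all update buckets into raw evidence records (no scoring)."""
    records = []
    for bucket in ACTIVE_BUCKETS + PASSIVE_BUCKETS:
        # Buckets may be missing or null in a given snapshot.
        for entry in metadati.get(bucket) or []:
            for mod in entry.get("urnModifiche") or []:
                records.append({
                    "updating_act_urn": mod.get("urnModificante"),
                    "update_kind": bucket,
                    "update_date": None,  # source field not specified here
                })
    return records


metadati = {
    "attiAggiornati": [
        {"urnModifiche": [{"urnModificante": "urn:example:act-a"}]},
    ],
    "aggiornamentiAtto": [
        {"urnModifiche": [{"urnModificante": "urn:example:act-b"}]},
    ],
    "aggiornamentiTitolo": None,
}
records = collect_legal_updates(metadati)
```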
Anchor Convention
Anchors are normalized strings derived without database lookups. The current format is:
- Articles: ARTICLE:{anchor} derived from articoloAggiornato fields
- Interval anchors use ARTICLE:{urn_fragment} when the node id is a URN
- Structure root: STRUCTURE:root
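As a small illustration of the convention, the helpers below build anchors from strings alone, with no lookups. The function names are hypothetical, and the sketch assumes the URN fragment is the part after the last "~" in the node id. Behaviour for non-URN node ids is not specified above, so the sketch returns None in that case.

```python
from typing import Optional

STRUCTURE_ROOT_ANCHOR = "STRUCTURE:root"


def article_anchor(anchor: str) -> str:
    """Anchor for an article identifier taken from update metadata."""
    return f"ARTICLE:{anchor}"


def interval_anchor(node_id: str) -> Optional[str]:
    """Anchor for an interval node whose id is a URN (assumed '~' fragment)."""
    if node_id.startswith("urn:") and "~" in node_id:
        fragment = node_id.rsplit("~", 1)[1]
        return f"ARTICLE:{fragment}"
    return None  # non-URN ids: behaviour not defined in this document
```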
Article Annotations
Article notes and references are stored as independent nodes to avoid inflating article content churn:
- article_references nodes store ArticleReferencesBlob payloads
- article_notes nodes store ArticleNotesBlob payloads
These nodes use snapshot-date intervals per article (independent from article version boundaries).
4) Act Metadata Patching
Act-level metadata (doc title, doc date, etc.) is stored as deltas because it is small and not naturally interval-selected.
- ActMetadata (typed subset of Act)
- ActMetadataPatch + MetadataDelta (set/unset fields)
- REQUIRED_METADATA_FIELDS prevents clearing mandatory fields
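A set/unset patch guarded by required fields can be sketched as follows. The contents of REQUIRED_METADATA_FIELDS, the metadata keys, and the apply_patch signature are all illustrative assumptions; the real patch types are typed models rather than plain dicts.

```python
REQUIRED_METADATA_FIELDS = frozenset({"doc_title"})  # illustrative contents


def apply_patch(metadata: dict, set_fields: dict, unset_fields: list) -> dict:
    """Return a new metadata dict with the patch applied."""
    blocked = REQUIRED_METADATA_FIELDS.intersection(unset_fields)
    if blocked:
        raise ValueError(f"cannot unset required fields: {sorted(blocked)}")
    patched = {**metadata, **set_fields}  # copy, then overwrite set fields
    for name in unset_fields:
        patched.pop(name, None)
    return patched


meta = {"doc_title": "Example Act", "doc_date": "2020-01-01", "subtitle": "old"}
patched = apply_patch(meta, {"doc_date": "2021-01-01"}, ["subtitle"])
```

Returning a new dict instead of mutating the input keeps earlier resolved states intact when several deltas are replayed in sequence.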
Ingestion Flow
1) Snapshot input (ActSnapshot):
- Raw JSON is loaded for each version/tag.
2) Build canonical records + variant events (build_canonical_records + build_variant_event):
- Extract payloads (articles, attachments, structure, etc.).
- Hash each payload into a ContentBlob.
- Build IntervalNode entries for validity intervals.
- Attach observations and legal update evidence to NodeIntervals.
- Emit MetadataDeltaRecords whenever act-level metadata changes.
3) Persist artifacts:
- Canonical JSONL files: act_base, node_intervals, content_blobs, metadata_deltas.
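The hashing and interval-building steps can be sketched for a single node as shown below. This is a simplified model, not the actual build_canonical_records logic: SHA-256 over canonical JSON is an assumed hashing scheme, and valid_from/valid_to are assumed field names.

```python
import hashlib
import json


def content_version_id(payload: dict) -> str:
    """Stable id: hash of the payload serialized with sorted keys."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_intervals(snapshots):
    """snapshots: list of (iso_date, payload) for one node, sorted by date."""
    blobs, intervals = {}, []
    for snap_date, payload in snapshots:
        cvid = content_version_id(payload)
        blobs.setdefault(cvid, payload)  # identical content is stored once
        if intervals and intervals[-1]["content_version_id"] == cvid:
            # Unchanged content: record the observation, keep the interval.
            intervals[-1]["observations"].append(snap_date)
            continue
        if intervals:
            intervals[-1]["valid_to"] = snap_date  # close previous interval
        intervals.append({
            "content_version_id": cvid,
            "valid_from": snap_date,
            "valid_to": None,
            "observations": [snap_date],
        })
    return blobs, intervals


snapshots = [
    ("2020-01-01", {"text": "v1"}),
    ("2020-06-01", {"text": "v1"}),
    ("2021-01-01", {"text": "v2"}),
]
blobs, intervals = build_intervals(snapshots)
```

Three snapshots produce two blobs and two intervals here, because the second snapshot only adds an observation to the first interval.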
Reconstruction Flow
1) Resolve a date (resolve_state):
- Select NodeIntervals that cover the target date for each node.
- Replay MetadataDeltaRecords up to the target date.
- Materialize the selected ContentBlob payloads.
2) Rebuild structure:
- TitleStructureBlob and ChapterStructureBlob reference article
URNs, so we look up the corresponding Article blobs and rebuild
Title -> Chapter -> Article nesting.
3) Assemble Act:
- Combine metadata + reconstructed structure + attachments + expression
fields into a resolved Act.
The resolver returns a ResolvedActView which includes:
- act: the resolved Act snapshot
- nodes: resolved node payloads (useful for diagnostics)
- metadata: resolved act metadata
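The date-resolution step can be approximated with plain dicts. The sketch below is not the real resolve_state signature. It assumes half-open intervals [valid_from, valid_to), ISO date strings that compare lexicographically, and deltas that apply when their effective date is on or before the target date. Structure rebuilding and Act assembly are omitted.

```python
def select_interval(intervals, target):
    """Return the interval covering target, or None if the node is absent."""
    for iv in intervals:
        starts = iv["valid_from"] <= target
        open_or_later = iv["valid_to"] is None or target < iv["valid_to"]
        if starts and open_or_later:
            return iv
    return None


def replay_metadata(deltas, target):
    """Apply set/unset deltas in date order up to and including target."""
    metadata = {}
    for delta in sorted(deltas, key=lambda d: d["effective_date"]):
        if delta["effective_date"] > target:
            break
        metadata.update(delta.get("set", {}))
        for name in delta.get("unset", []):
            metadata.pop(name, None)
    return metadata


def resolve_state_sketch(node_intervals, blobs, deltas, target):
    nodes = {}
    for node_id, intervals in node_intervals.items():
        iv = select_interval(intervals, target)
        if iv is not None:
            nodes[node_id] = blobs[iv["content_version_id"]]
    return {"nodes": nodes, "metadata": replay_metadata(deltas, target)}


node_intervals = {"art1": [
    {"valid_from": "2020-01-01", "valid_to": "2021-01-01",
     "content_version_id": "cv-1"},
    {"valid_from": "2021-01-01", "valid_to": None,
     "content_version_id": "cv-2"},
]}
blobs = {"cv-1": {"text": "v1"}, "cv-2": {"text": "v2"}}
deltas = [
    {"effective_date": "2020-01-01", "set": {"doc_title": "Example Act"}},
    {"effective_date": "2021-01-01", "set": {"doc_title": "Example Act (amended)"}},
]
view = resolve_state_sketch(node_intervals, blobs, deltas, "2020-06-01")
later = resolve_state_sketch(node_intervals, blobs, deltas, "2021-01-01")
```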
Example: Avoiding Duplicate Article Storage
Raw snapshot: Title contains nested chapters and articles.
If we stored Title directly as a blob, we would duplicate full Article
payloads (also stored in NodeType.ARTICLE blobs).
Blob approach:
- Store Article as its own blob.
- Store TitleStructureBlob with article_urn_nirs and nested ChapterStructureBlobs.
- During reconstruction, use article_urn_nirs to look up the stored Article blobs and rebuild the original nesting.
This keeps storage deduplicated while preserving structure.
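A compact way to picture the rebuild is with dict-shaped blobs. Only article_urn_nirs comes from this document; the name and chapters keys, the URNs, and the rebuild_title helper are assumptions for illustration.

```python
def rebuild_title(title_blob: dict, articles_by_urn: dict) -> dict:
    """Re-nest Title -> Chapter -> Article from URN references."""
    def lookup(urns):
        return [articles_by_urn[urn] for urn in urns]

    return {
        "name": title_blob["name"],
        "articles": lookup(title_blob["article_urn_nirs"]),
        "chapters": [
            {"name": ch["name"], "articles": lookup(ch["article_urn_nirs"])}
            for ch in title_blob["chapters"]
        ],
    }


# Each article is stored once, keyed by its URN.
articles_by_urn = {
    "urn:example~art1": {"urn_nir": "urn:example~art1", "text": "A1"},
    "urn:example~art2": {"urn_nir": "urn:example~art2", "text": "A2"},
}
# The structure blob holds references only, never full article payloads.
title_blob = {
    "name": "Title I",
    "article_urn_nirs": ["urn:example~art1"],
    "chapters": [
        {"name": "Chapter 1", "article_urn_nirs": ["urn:example~art2"]},
    ],
}
title = rebuild_title(title_blob, articles_by_urn)
```

A change to one article then produces one new article blob, while the title structure blob stays byte-identical as long as the set of referenced URNs is unchanged.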