ingest_manifest.db Table Reference
This page documents the purpose, key structure, and invariants of each table in
ingest_manifest.db.
Overview
ingest_manifest.db stores ingest-time state and canonical normalized outputs:
- ingest run ledger
- canonical act hub and source publication hub
- version checkpoints keyed by source publication
- canonical act build rows and child normalized rows
- missing-interval snapshot metadata and detail rows
- schema version metadata
Table Map (Mermaid Flowchart)
flowchart LR
ig_schema_meta["Schema Metadata (ig_schema_meta)<br/>Ingest schema/version key-value state"]
ig_run["Ingest Run (ig_run)<br/>One row per ingest invocation"]
ig_act["Canonical Act (ig_act)<br/>Canonical act identity by URN"]
ig_source_ref["Source Reference (ig_source_ref)<br/>Source publication identity"]
ig_act_source_map["Act Source Map (ig_act_source_map)<br/>Source identity mapped to canonical act"]
ig_version_state["Version State (ig_version_state)<br/>Per-version source hash checkpoints"]
ig_act_build["Act Build (ig_act_build)<br/>Canonical build header per act/version"]
ig_node_interval["Node Interval (ig_node_interval)<br/>Normalized node validity intervals"]
ig_content_blob["Content Blob (ig_content_blob)<br/>Serialized content payload rows"]
ig_metadata_delta["Metadata Delta (ig_metadata_delta)<br/>Effective-date metadata patches"]
ig_missing_interval_meta["Missing Interval Meta (ig_missing_interval_meta)<br/>Snapshot header for missing-interval hints"]
ig_missing_interval["Missing Interval (ig_missing_interval)<br/>Snapshot detail rows for missing intervals"]
ig_run -->|FK: last_run_id| ig_version_state
ig_run -->|FK: run_id| ig_act_build
ig_run -->|FK: run_id| ig_missing_interval_meta
ig_run -->|FK: resolved_run_id| ig_act_source_map
ig_act -->|FK: act_id| ig_act_source_map
ig_source_ref -->|FK: source_ref_id| ig_act_source_map
ig_source_ref -->|FK: source_ref_id| ig_version_state
ig_act -->|FK: act_id| ig_act_build
ig_act_build -->|FK: build_id| ig_node_interval
ig_act_build -->|FK: build_id| ig_content_blob
ig_act_build -->|FK: build_id| ig_metadata_delta
ig_missing_interval_meta -->|FK: meta_id| ig_missing_interval
ig_source_ref -->|FK: source_ref_id| ig_missing_interval
ig_act -->|FK: act_id| ig_missing_interval
Schema Metadata Table
ig_schema_meta
Contains:
- Schema metadata key/value rows (including the current schema version marker).
Purpose:
- Stores ingest schema metadata managed by IngestManifestDbStore.
Keys:
- Primary key: key
Foreign keys:
- None
Invariants:
- The runtime expects a row with key='schema_version' to exist.
- The runtime expects that row to hold value='5.0' (the current INGEST_SCHEMA_VERSION).
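The two invariants above can be checked with a short query. The following is a minimal sketch, not the IngestManifestDbStore implementation: the helper names are hypothetical, and the column types in the demo table are assumptions (only the key and value column names and the '5.0' marker come from this page).

```python
import sqlite3
from typing import Optional

# Assumption: mirrors the documented INGEST_SCHEMA_VERSION value.
EXPECTED_SCHEMA_VERSION = "5.0"


def read_schema_version(conn: sqlite3.Connection) -> Optional[str]:
    """Return the stored schema_version value, or None if the row is missing."""
    row = conn.execute(
        "SELECT value FROM ig_schema_meta WHERE key = 'schema_version'"
    ).fetchone()
    return row[0] if row else None


def schema_is_current(conn: sqlite3.Connection) -> bool:
    """True only when the marker row exists and holds the expected version."""
    return read_schema_version(conn) == EXPECTED_SCHEMA_VERSION


# Demo against an in-memory database; the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ig_schema_meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
)
before_insert = schema_is_current(conn)  # marker row not present yet
conn.execute(
    "INSERT INTO ig_schema_meta (key, value) VALUES ('schema_version', '5.0')"
)
after_insert = schema_is_current(conn)
```

A caller would typically run a check like this once, right after opening the database, and refuse to proceed when it fails.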
Run Ledger Table
ig_run
Contains:
- One run ledger row per ingest invocation with lifecycle status.
Purpose:
- One row per ingest invocation.
Keys:
- Primary key: run_id
Foreign keys:
- Referenced by ig_version_state.last_run_id (nullable)
- Referenced by ig_act_build.run_id
- Referenced by ig_missing_interval_meta.run_id (nullable)
- Referenced by ig_act_source_map.resolved_run_id (nullable)
Invariants:
- status check constraint:
status IN ('running', 'completed', 'failed').
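To illustrate how a check constraint of this shape behaves in SQLite, here is a small sketch. The table layout is a hypothetical reduction: only run_id, status, and the three allowed values come from this page, and the column types are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal layout; the real ig_run table likely has more columns.
conn.execute(
    """
    CREATE TABLE ig_run (
        run_id INTEGER PRIMARY KEY,
        status TEXT NOT NULL
            CHECK (status IN ('running', 'completed', 'failed'))
    )
    """
)

# An allowed lifecycle status is accepted.
conn.execute("INSERT INTO ig_run (status) VALUES ('running')")

# Any other value is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO ig_run (status) VALUES ('paused')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

run_count = conn.execute("SELECT COUNT(*) FROM ig_run").fetchone()[0]
```

Because the constraint lives in the schema, an invalid status fails at write time rather than surfacing later as an unexpected ledger state.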
Identity Hub Tables
ig_act
Contains:
- Canonical act identity rows (one per canonical urn_nir).
Purpose:
- Canonical act hub keyed by legal act identity.
Keys:
- Primary key: act_id
- Unique: urn_nir
Foreign keys:
- Referenced by ig_act_source_map.act_id
- Referenced by ig_act_build.act_id
- Referenced by ig_missing_interval.act_id (nullable)
Invariants:
- One canonical row per URN identity.
ig_source_ref
Contains:
- Source publication identity rows keyed by (source, owner alias pair).
Purpose:
- Source publication identity hub (source, owner_alias_kind,
owner_alias_value).
Keys:
- Primary key: source_ref_id
- Unique: (source, owner_alias_kind, owner_alias_value) via
ux_ig_source_ref_source_owner_alias
Foreign keys:
- Referenced by ig_act_source_map.source_ref_id
- Referenced by ig_version_state.source_ref_id
- Referenced by ig_missing_interval.source_ref_id
Invariants:
- One row per unique source publication key.
ig_act_source_map
Contains:
- Mapping rows from source publication identity to canonical act id.
Purpose:
- Mapping from source publication identity to canonical act identity.
Keys:
- Primary key: source_ref_id
Foreign keys:
- source_ref_id -> ig_source_ref.source_ref_id
- act_id -> ig_act.act_id
- resolved_run_id -> ig_run.run_id (nullable)
Invariants:
- At most one canonical act mapping per source publication identity.
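The three identity hub tables can be exercised together as follows. This is a sketch under stated assumptions: the column types, the helper functions, and all sample values (including the placeholder URN) are hypothetical, and resolved_run_id is omitted for brevity. Table, column, and index names are taken from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE ig_act (
        act_id INTEGER PRIMARY KEY,
        urn_nir TEXT NOT NULL UNIQUE
    );
    CREATE TABLE ig_source_ref (
        source_ref_id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        owner_alias_kind TEXT NOT NULL,
        owner_alias_value TEXT NOT NULL
    );
    CREATE UNIQUE INDEX ux_ig_source_ref_source_owner_alias
        ON ig_source_ref (source, owner_alias_kind, owner_alias_value);
    CREATE TABLE ig_act_source_map (
        source_ref_id INTEGER PRIMARY KEY
            REFERENCES ig_source_ref (source_ref_id),
        act_id INTEGER NOT NULL REFERENCES ig_act (act_id)
    );
    """
)


def get_or_create_source_ref(conn, source, kind, value):
    """Return the source_ref_id for a publication key, inserting it if new."""
    conn.execute(
        "INSERT OR IGNORE INTO ig_source_ref "
        "(source, owner_alias_kind, owner_alias_value) VALUES (?, ?, ?)",
        (source, kind, value),
    )
    row = conn.execute(
        "SELECT source_ref_id FROM ig_source_ref "
        "WHERE source = ? AND owner_alias_kind = ? AND owner_alias_value = ?",
        (source, kind, value),
    ).fetchone()
    return row[0]


def resolve_act_id(conn, source, kind, value):
    """Return the mapped canonical act_id, or None when no mapping exists."""
    row = conn.execute(
        "SELECT m.act_id FROM ig_source_ref AS s "
        "JOIN ig_act_source_map AS m ON m.source_ref_id = s.source_ref_id "
        "WHERE s.source = ? AND s.owner_alias_kind = ? "
        "AND s.owner_alias_value = ?",
        (source, kind, value),
    ).fetchone()
    return row[0] if row else None


# Sample values below are placeholders, not real identifiers.
ref_a = get_or_create_source_ref(conn, "example_source", "code", "0001")
ref_b = get_or_create_source_ref(conn, "example_source", "code", "0001")
act_id = conn.execute(
    "INSERT INTO ig_act (urn_nir) VALUES (?)", ("urn:nir:example:placeholder",)
).lastrowid
conn.execute(
    "INSERT INTO ig_act_source_map (source_ref_id, act_id) VALUES (?, ?)",
    (ref_a, act_id),
)
resolved = resolve_act_id(conn, "example_source", "code", "0001")
unmapped = resolve_act_id(conn, "example_source", "code", "9999")
```

Using source_ref_id as the primary key of ig_act_source_map is what enforces the "at most one mapping" invariant: a second mapping row for the same source reference would violate the key.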
Incremental Version-State Table
ig_version_state
Contains:
- Checkpoint rows storing known per-version source hashes per source ref.
Purpose:
- Stores known source hashes for each (source_ref_id, version_tag).
Keys:
- Composite primary key: (source_ref_id, version_tag)
Foreign keys:
- source_ref_id -> ig_source_ref.source_ref_id
- last_run_id -> ig_run.run_id (nullable)
Invariants:
- One checkpoint row per source-publication/version pair.
- Hash columns are split by source format (hash_json, hash_akn, hash_nir).
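One way such checkpoints could drive incremental ingest is sketched below. How the runtime actually decides that a version changed is not described on this page, so the comparison rule, the helper names, the column types, and the sample hash strings are all assumptions; only the table name, key columns, and hash column names are documented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column types; names follow this page.
conn.execute(
    """
    CREATE TABLE ig_version_state (
        source_ref_id INTEGER NOT NULL,
        version_tag TEXT NOT NULL,
        hash_json TEXT,
        hash_akn TEXT,
        hash_nir TEXT,
        last_run_id INTEGER,
        PRIMARY KEY (source_ref_id, version_tag)
    )
    """
)


def has_changed(conn, source_ref_id, version_tag, hashes):
    """Assumed rule: a version changed if it is unseen or any hash differs."""
    row = conn.execute(
        "SELECT hash_json, hash_akn, hash_nir FROM ig_version_state "
        "WHERE source_ref_id = ? AND version_tag = ?",
        (source_ref_id, version_tag),
    ).fetchone()
    if row is None:
        return True
    return tuple(row) != (hashes.get("json"), hashes.get("akn"), hashes.get("nir"))


def record_checkpoint(conn, source_ref_id, version_tag, hashes, run_id=None):
    """Write the checkpoint; the composite key keeps one row per pair."""
    conn.execute(
        "INSERT OR REPLACE INTO ig_version_state "
        "(source_ref_id, version_tag, hash_json, hash_akn, hash_nir, last_run_id) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (source_ref_id, version_tag, hashes.get("json"), hashes.get("akn"),
         hashes.get("nir"), run_id),
    )


first_hashes = {"json": "aaa", "akn": "bbb"}   # placeholder hash strings
second_hashes = {"json": "aaa", "akn": "ccc"}  # akn payload differs

seen_before = has_changed(conn, 1, "v1", first_hashes)
record_checkpoint(conn, 1, "v1", first_hashes)
unchanged = has_changed(conn, 1, "v1", first_hashes)
changed = has_changed(conn, 1, "v1", second_hashes)
record_checkpoint(conn, 1, "v1", second_hashes)
row_count = conn.execute("SELECT COUNT(*) FROM ig_version_state").fetchone()[0]
```

Keeping one hash column per source format lets a change in a single format be detected without rehashing or comparing the others.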
Canonical Build Tables
ig_act_build
Contains:
- Canonical build header rows for act + version scope output bundles.
Purpose:
- Canonical build row for one canonical act and version scope.
Keys:
- Primary key: build_id
- Unique constraint: (act_id, version_tag) via
ux_ig_act_build_act_version
Foreign keys:
- act_id -> ig_act.act_id
- run_id -> ig_run.run_id
- Referenced by ig_node_interval.build_id
- Referenced by ig_content_blob.build_id
- Referenced by ig_metadata_delta.build_id
Invariants:
- Runtime contract validates there are no duplicate
(act_id, version_tag) builds.
- input_versions_json and original_payloads_json are JSON payload columns.
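A duplicate check along the lines of the first invariant could be written as a grouped query. This is a sketch, not the runtime's validation code: the function name is hypothetical, and the demo table deliberately omits the ux_ig_act_build_act_version unique index so that a duplicate can be inserted and detected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Demo-only layout WITHOUT the unique index, so duplicates are possible here.
conn.execute(
    """
    CREATE TABLE ig_act_build (
        build_id INTEGER PRIMARY KEY,
        act_id INTEGER NOT NULL,
        version_tag TEXT NOT NULL
    )
    """
)


def find_duplicate_builds(conn):
    """Return (act_id, version_tag, count) for every pair with more than one build."""
    return conn.execute(
        "SELECT act_id, version_tag, COUNT(*) FROM ig_act_build "
        "GROUP BY act_id, version_tag HAVING COUNT(*) > 1 "
        "ORDER BY act_id, version_tag"
    ).fetchall()


conn.executemany(
    "INSERT INTO ig_act_build (act_id, version_tag) VALUES (?, ?)",
    [(1, "v1"), (1, "v2"), (2, "v1")],
)
clean = find_duplicate_builds(conn)

# Insert a second build for (1, 'v1') to simulate a contract violation.
conn.execute("INSERT INTO ig_act_build (act_id, version_tag) VALUES (1, 'v1')")
duplicates = find_duplicate_builds(conn)
```

With the unique index in place the second insert would fail outright, so a query like this mainly serves as a defensive check on databases of unknown provenance.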
ig_node_interval
Contains:
- Normalized interval rows per node within a specific build.
Purpose:
- Normalized node-interval rows for canonical output history.
Keys:
- Composite primary key:
(build_id, node_type, node_id, interval_id)
Foreign keys:
- build_id -> ig_act_build.build_id
Invariants:
- Child rows are scoped to one build_id.
- observations_json and legal_updates_json store JSON arrays.
ig_content_blob
Contains:
- Serialized payload rows keyed by build, content version, and node type.
Purpose:
- Normalized content payload rows by content version + node type.
Keys:
- Composite primary key:
(build_id, content_version_id, node_type)
Foreign keys:
- build_id -> ig_act_build.build_id
Invariants:
- payload_json stores serialized JSON content.
ig_metadata_delta
Contains:
- Effective-date metadata patch rows (set_json / unset_json) per build.
Purpose:
- Canonical metadata deltas keyed by effective date.
Keys:
- Composite primary key: (build_id, effective_date)
Foreign keys:
- build_id -> ig_act_build.build_id
Invariants:
- set_json and unset_json store serialized JSON payloads.
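As an illustration of how effective-date patches might be consumed, the sketch below folds the deltas of one build into a metadata snapshot. The patch semantics are an assumption, not something this page specifies: set_json is treated as a JSON object of keys to set, unset_json as a JSON array of keys to remove, and effective_date as an ISO date string. The function name and sample values are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column types; names and the composite key follow this page.
conn.execute(
    """
    CREATE TABLE ig_metadata_delta (
        build_id INTEGER NOT NULL,
        effective_date TEXT NOT NULL,
        set_json TEXT NOT NULL,
        unset_json TEXT NOT NULL,
        PRIMARY KEY (build_id, effective_date)
    )
    """
)


def metadata_as_of(conn, build_id, as_of):
    """Apply all deltas of one build up to as_of, oldest first."""
    state = {}
    rows = conn.execute(
        "SELECT set_json, unset_json FROM ig_metadata_delta "
        "WHERE build_id = ? AND effective_date <= ? "
        "ORDER BY effective_date",
        (build_id, as_of),
    )
    for set_json, unset_json in rows:
        state.update(json.loads(set_json))   # assumed: object of keys to set
        for key in json.loads(unset_json):   # assumed: array of keys to drop
            state.pop(key, None)
    return state


conn.executemany(
    "INSERT INTO ig_metadata_delta VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", json.dumps({"title": "A", "status": "draft"}), "[]"),
        (1, "2021-06-01", json.dumps({"title": "B"}), json.dumps(["status"])),
    ],
)

earlier = metadata_as_of(conn, 1, "2020-12-31")
later = metadata_as_of(conn, 1, "2021-12-31")
other_build = metadata_as_of(conn, 2, "2021-12-31")
```

Filtering by build_id first keeps the fold scoped to a single build, which matches how the child tables are keyed.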
Missing-Interval Snapshot Tables
ig_missing_interval_meta
Contains:
- Snapshot header rows describing one missing-interval hint snapshot run.
Purpose:
- Snapshot metadata header for missing-interval hints.
Keys:
- Primary key: meta_id
Foreign keys:
- run_id -> ig_run.run_id (nullable)
- Referenced by ig_missing_interval.meta_id
Invariants:
- complete check constraint: complete IN (0, 1).
- Each snapshot header carries digest metadata that links it to the download and ingest manifests.
ig_missing_interval
Contains:
- Snapshot detail rows for each missing interval observation.
Purpose:
- Snapshot detail rows for missing intervals.
Keys:
- Primary key: interval_row_id
Foreign keys:
- meta_id -> ig_missing_interval_meta.meta_id
- source_ref_id -> ig_source_ref.source_ref_id
- act_id -> ig_act.act_id (nullable)
Invariants:
- Runtime contract requires interval_row_id column to exist.
- Rows are grouped by source publication identity (source_ref_id).
- urn_nir is optional payload context in snapshot rows.
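Reading a snapshot typically means joining the header to its detail rows and grouping by source_ref_id. The sketch below does that under a few assumptions: column types are guessed, the function name is hypothetical, and skipping snapshots with complete = 0 is an illustrative policy rather than documented runtime behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ig_missing_interval_meta (
        meta_id INTEGER PRIMARY KEY,
        run_id INTEGER,
        complete INTEGER NOT NULL CHECK (complete IN (0, 1))
    );
    CREATE TABLE ig_missing_interval (
        interval_row_id INTEGER PRIMARY KEY,
        meta_id INTEGER NOT NULL
            REFERENCES ig_missing_interval_meta (meta_id),
        source_ref_id INTEGER NOT NULL,
        act_id INTEGER,
        urn_nir TEXT
    );
    """
)


def intervals_by_source(conn, meta_id):
    """Group detail row ids by source_ref_id for one complete snapshot."""
    rows = conn.execute(
        "SELECT d.source_ref_id, d.interval_row_id "
        "FROM ig_missing_interval AS d "
        "JOIN ig_missing_interval_meta AS m ON m.meta_id = d.meta_id "
        "WHERE m.meta_id = ? AND m.complete = 1 "
        "ORDER BY d.source_ref_id, d.interval_row_id",
        (meta_id,),
    )
    grouped = {}
    for source_ref_id, interval_row_id in rows:
        grouped.setdefault(source_ref_id, []).append(interval_row_id)
    return grouped


conn.executemany(
    "INSERT INTO ig_missing_interval_meta (meta_id, run_id, complete) "
    "VALUES (?, ?, ?)",
    [(1, None, 1), (2, None, 0)],
)
conn.executemany(
    "INSERT INTO ig_missing_interval "
    "(interval_row_id, meta_id, source_ref_id) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 10), (3, 1, 11), (4, 2, 10)],
)

complete_snapshot = intervals_by_source(conn, 1)
incomplete_snapshot = intervals_by_source(conn, 2)
```

The act_id and urn_nir columns are left NULL in the demo rows, consistent with both being optional on detail rows.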
Cross-Table Contract Notes
- build_id is the internal join key for canonical output child tables.
- urn_nir is the canonical act identifier in ig_act.
- source_ref_id is the primary key surface for incremental/version checkpoints.
- No foreign keys connect ingest_manifest.db to download_manifest.db.