ingest_manifest.db Table Reference

This page documents the purpose, key structure, and invariants of each table in ingest_manifest.db.

Overview

ingest_manifest.db stores ingest-time state and canonical normalized outputs:

  • ingest run ledger
  • canonical act hub and source publication hub
  • version checkpoints keyed by source publication
  • canonical act build rows and child normalized rows
  • missing-interval snapshot metadata and detail rows
  • schema version metadata

Table Map (Mermaid Flowchart)

flowchart LR
    ig_schema_meta["Schema Metadata (ig_schema_meta)<br/>Ingest schema/version key-value state"]
    ig_run["Ingest Run (ig_run)<br/>One row per ingest invocation"]
    ig_act["Canonical Act (ig_act)<br/>Canonical act identity by URN"]
    ig_source_ref["Source Reference (ig_source_ref)<br/>Source publication identity"]
    ig_act_source_map["Act Source Map (ig_act_source_map)<br/>Source identity mapped to canonical act"]
    ig_version_state["Version State (ig_version_state)<br/>Per-version source hash checkpoints"]
    ig_act_build["Act Build (ig_act_build)<br/>Canonical build header per act/version"]
    ig_node_interval["Node Interval (ig_node_interval)<br/>Normalized node validity intervals"]
    ig_content_blob["Content Blob (ig_content_blob)<br/>Serialized content payload rows"]
    ig_metadata_delta["Metadata Delta (ig_metadata_delta)<br/>Effective-date metadata patches"]
    ig_missing_interval_meta["Missing Interval Meta (ig_missing_interval_meta)<br/>Snapshot header for missing-interval hints"]
    ig_missing_interval["Missing Interval (ig_missing_interval)<br/>Snapshot detail rows for missing intervals"]

    ig_run -->|FK: last_run_id| ig_version_state
    ig_run -->|FK: run_id| ig_act_build
    ig_run -->|FK: run_id| ig_missing_interval_meta
    ig_run -->|FK: resolved_run_id| ig_act_source_map

    ig_act -->|FK: act_id| ig_act_source_map
    ig_source_ref -->|FK: source_ref_id| ig_act_source_map
    ig_source_ref -->|FK: source_ref_id| ig_version_state

    ig_act -->|FK: act_id| ig_act_build
    ig_act_build -->|FK: build_id| ig_node_interval
    ig_act_build -->|FK: build_id| ig_content_blob
    ig_act_build -->|FK: build_id| ig_metadata_delta

    ig_missing_interval_meta -->|FK: meta_id| ig_missing_interval
    ig_source_ref -->|FK: source_ref_id| ig_missing_interval
    ig_act -->|FK: act_id| ig_missing_interval

Schema Metadata Table

ig_schema_meta

Contains: Schema metadata key/value rows, including the current schema version marker.

Purpose: Stores ingest schema metadata managed by IngestManifestDbStore.

Keys:

  • Primary key: key

Foreign keys:

  • None

Invariants:

  • Runtime expects a row with key='schema_version' to exist.
  • Runtime expects that row to hold value='5.0' (the current INGEST_SCHEMA_VERSION).
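The invariant above can be checked with a short query. The following is a minimal sketch, assuming ingest_manifest.db is a SQLite file and that both columns are stored as text; the helper name check_schema_version and the in-memory stand-in database are illustrative only and are not part of IngestManifestDbStore.

```python
import sqlite3

# Assumption: mirrors the documented INGEST_SCHEMA_VERSION value.
EXPECTED_SCHEMA_VERSION = "5.0"


def check_schema_version(conn: sqlite3.Connection) -> bool:
    """Return True when ig_schema_meta holds the expected schema_version row."""
    row = conn.execute(
        "SELECT value FROM ig_schema_meta WHERE key = ?", ("schema_version",)
    ).fetchone()
    return row is not None and row[0] == EXPECTED_SCHEMA_VERSION


# In-memory stand-in for ingest_manifest.db; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ig_schema_meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

schema_ok_before = check_schema_version(conn)  # no schema_version row yet
conn.execute(
    "INSERT INTO ig_schema_meta (key, value) VALUES (?, ?)",
    ("schema_version", EXPECTED_SCHEMA_VERSION),
)
schema_ok_after = check_schema_version(conn)
```

A check of this kind would typically run once when the database is opened, before any other table is read.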

Run Ledger Table

ig_run

Contains: One run ledger row per ingest invocation, including its lifecycle status.

Purpose: Ledger of ingest invocations; run-scoped rows in other tables reference run_id.

Keys:

  • Primary key: run_id

Foreign keys:

  • Referenced by ig_version_state.last_run_id (nullable)
  • Referenced by ig_act_build.run_id
  • Referenced by ig_missing_interval_meta.run_id (nullable)
  • Referenced by ig_act_source_map.resolved_run_id (nullable)

Invariants:

  • status check constraint: status IN ('running', 'completed', 'failed').
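To illustrate how the status check constraint behaves, here is a minimal sketch assuming a SQLite database. Only run_id and status come from this page; the column types are guesses, and the real ig_run table may have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL: reproduces only the documented key and check constraint.
conn.execute(
    """
    CREATE TABLE ig_run (
        run_id INTEGER PRIMARY KEY,
        status TEXT NOT NULL
            CHECK (status IN ('running', 'completed', 'failed'))
    )
    """
)

# A documented status value is accepted.
conn.execute("INSERT INTO ig_run (status) VALUES ('running')")

# Any other value violates the check constraint.
try:
    conn.execute("INSERT INTO ig_run (status) VALUES ('paused')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

run_count = conn.execute("SELECT COUNT(*) FROM ig_run").fetchone()[0]
```

Because the constraint lives in the schema, a writer that bypasses the runtime still cannot store an unknown lifecycle status.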

Identity Hub Tables

ig_act

Contains: Canonical act identity rows, one per canonical urn_nir.

Purpose: Canonical act hub keyed by legal act identity.

Keys:

  • Primary key: act_id
  • Unique: urn_nir

Foreign keys:

  • Referenced by ig_act_source_map.act_id
  • Referenced by ig_act_build.act_id
  • Referenced by ig_missing_interval.act_id (nullable)

Invariants:

  • One canonical row per URN identity.

ig_source_ref

Contains: Source publication identity rows keyed by source plus an owner alias pair (owner_alias_kind, owner_alias_value).

Purpose: Source publication identity hub (source, owner_alias_kind, owner_alias_value).

Keys:

  • Primary key: source_ref_id
  • Unique: (source, owner_alias_kind, owner_alias_value) via ux_ig_source_ref_source_owner_alias

Foreign keys:

  • Referenced by ig_act_source_map.source_ref_id
  • Referenced by ig_version_state.source_ref_id
  • Referenced by ig_missing_interval.source_ref_id

Invariants:

  • One row per unique source publication key.

ig_act_source_map

Contains: Mapping rows from source publication identity to canonical act id.

Purpose: Resolves each source publication identity to its canonical act identity.

Keys:

  • Primary key: source_ref_id

Foreign keys:

  • source_ref_id -> ig_source_ref.source_ref_id
  • act_id -> ig_act.act_id
  • resolved_run_id -> ig_run.run_id (nullable)

Invariants:

  • At most one canonical act mapping per source publication identity.
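The three identity hub tables can be joined to resolve a source publication to its canonical URN. The sketch below assumes SQLite, guesses the column types, and uses a hypothetical helper resolve_urn with placeholder values ('example_source', 'urn:example:act-1'); none of these are taken from the real pipeline. It also shows that using source_ref_id as the primary key of ig_act_source_map is what enforces the "at most one mapping" invariant.

```python
import sqlite3
from typing import Optional

# Assumed minimal DDL: only the documented key columns, with guessed types.
SCHEMA = """
CREATE TABLE ig_act (
    act_id INTEGER PRIMARY KEY,
    urn_nir TEXT NOT NULL UNIQUE
);
CREATE TABLE ig_source_ref (
    source_ref_id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    owner_alias_kind TEXT NOT NULL,
    owner_alias_value TEXT NOT NULL
);
CREATE UNIQUE INDEX ux_ig_source_ref_source_owner_alias
    ON ig_source_ref (source, owner_alias_kind, owner_alias_value);
CREATE TABLE ig_act_source_map (
    source_ref_id INTEGER PRIMARY KEY REFERENCES ig_source_ref (source_ref_id),
    act_id INTEGER NOT NULL REFERENCES ig_act (act_id),
    resolved_run_id INTEGER  -- nullable link to ig_run.run_id (table omitted here)
);
"""


def resolve_urn(
    conn: sqlite3.Connection, source: str, alias_kind: str, alias_value: str
) -> Optional[str]:
    """Return the canonical urn_nir for a source publication, or None if unmapped."""
    row = conn.execute(
        """
        SELECT a.urn_nir
        FROM ig_source_ref AS s
        JOIN ig_act_source_map AS m ON m.source_ref_id = s.source_ref_id
        JOIN ig_act AS a ON a.act_id = m.act_id
        WHERE s.source = ? AND s.owner_alias_kind = ? AND s.owner_alias_value = ?
        """,
        (source, alias_kind, alias_value),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ig_act (act_id, urn_nir) VALUES (1, 'urn:example:act-1')")
conn.execute("INSERT INTO ig_act (act_id, urn_nir) VALUES (2, 'urn:example:act-2')")
conn.execute(
    "INSERT INTO ig_source_ref VALUES (10, 'example_source', 'example_kind', 'A-1')"
)
conn.execute("INSERT INTO ig_act_source_map (source_ref_id, act_id) VALUES (10, 1)")

resolved = resolve_urn(conn, "example_source", "example_kind", "A-1")
unmapped = resolve_urn(conn, "example_source", "example_kind", "missing")

# A second mapping for the same source_ref_id collides with the primary key.
try:
    conn.execute("INSERT INTO ig_act_source_map (source_ref_id, act_id) VALUES (10, 2)")
    second_mapping_rejected = False
except sqlite3.IntegrityError:
    second_mapping_rejected = True
```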

Incremental Version-State Table

ig_version_state

Contains: Checkpoint rows storing known per-version source hashes per source ref.

Purpose: Stores known source hashes for each (source_ref_id, version_tag).

Keys:

  • Composite primary key: (source_ref_id, version_tag)

Foreign keys:

  • source_ref_id -> ig_source_ref.source_ref_id
  • last_run_id -> ig_run.run_id (nullable)

Invariants:

  • One checkpoint row per source-publication/version pair.
  • Hash columns are split by source format (hash_json, hash_akn, hash_nir).
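One plausible way to use these checkpoints for incremental ingest is to compare a freshly computed hash with the stored one and re-ingest only on a difference. This page does not describe how the runtime actually does this, so the sketch below is an assumption throughout: the helpers has_changed and record_hash, the column types, and the sample values are all hypothetical.

```python
import sqlite3

# Hash columns documented for ig_version_state, one per source format.
HASH_COLUMNS = ("hash_json", "hash_akn", "hash_nir")

conn = sqlite3.connect(":memory:")
# Assumed DDL: documented columns only, with guessed types.
conn.execute(
    """
    CREATE TABLE ig_version_state (
        source_ref_id INTEGER NOT NULL,
        version_tag TEXT NOT NULL,
        hash_json TEXT,
        hash_akn TEXT,
        hash_nir TEXT,
        last_run_id INTEGER,
        PRIMARY KEY (source_ref_id, version_tag)
    )
    """
)


def has_changed(conn, source_ref_id, version_tag, column, new_hash):
    """True when no checkpoint exists or the stored hash differs from new_hash."""
    if column not in HASH_COLUMNS:  # guard: column name is interpolated into SQL
        raise ValueError(f"unknown hash column: {column}")
    row = conn.execute(
        f"SELECT {column} FROM ig_version_state "
        "WHERE source_ref_id = ? AND version_tag = ?",
        (source_ref_id, version_tag),
    ).fetchone()
    return row is None or row[0] != new_hash


def record_hash(conn, source_ref_id, version_tag, column, new_hash, run_id):
    """Update the checkpoint row in place, or insert it on first sight."""
    if column not in HASH_COLUMNS:
        raise ValueError(f"unknown hash column: {column}")
    cur = conn.execute(
        f"UPDATE ig_version_state SET {column} = ?, last_run_id = ? "
        "WHERE source_ref_id = ? AND version_tag = ?",
        (new_hash, run_id, source_ref_id, version_tag),
    )
    if cur.rowcount == 0:
        conn.execute(
            f"INSERT INTO ig_version_state "
            f"(source_ref_id, version_tag, {column}, last_run_id) "
            "VALUES (?, ?, ?, ?)",
            (source_ref_id, version_tag, new_hash, run_id),
        )


first_seen = has_changed(conn, 10, "v1", "hash_json", "abc")
record_hash(conn, 10, "v1", "hash_json", "abc", run_id=1)
unchanged = has_changed(conn, 10, "v1", "hash_json", "abc")
changed = has_changed(conn, 10, "v1", "hash_json", "def")
other_format = has_changed(conn, 10, "v1", "hash_akn", "abc")
row_count = conn.execute("SELECT COUNT(*) FROM ig_version_state").fetchone()[0]
```

Keeping one column per format means a change in one representation of a version can be detected without touching the hashes recorded for the others.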

Canonical Build Tables

ig_act_build

Contains: Canonical build header rows, one per act and version scope, each heading an output bundle.

Purpose: Canonical build row for one canonical act and version scope.

Keys:

  • Primary key: build_id
  • Unique constraint: (act_id, version_tag) via ux_ig_act_build_act_version

Foreign keys:

  • act_id -> ig_act.act_id
  • run_id -> ig_run.run_id
  • Referenced by ig_node_interval.build_id
  • Referenced by ig_content_blob.build_id
  • Referenced by ig_metadata_delta.build_id

Invariants:

  • Runtime contract validates that no duplicate (act_id, version_tag) builds exist.
  • input_versions_json and original_payloads_json are JSON payload columns.
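The duplicate-build validation can be expressed as a grouped query. The following is a sketch under assumptions, not the runtime's actual check: it assumes SQLite, guesses column types, omits the JSON payload columns, and uses a hypothetical helper find_duplicate_builds. The table is first created without the unique index so that a duplicate can exist for the query to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL without the unique index, standing in for a damaged database.
conn.execute(
    """
    CREATE TABLE ig_act_build (
        build_id INTEGER PRIMARY KEY,
        act_id INTEGER NOT NULL,
        run_id INTEGER NOT NULL,
        version_tag TEXT NOT NULL
    )
    """
)


def find_duplicate_builds(conn):
    """Return (act_id, version_tag, count) for every pair with more than one build."""
    return conn.execute(
        """
        SELECT act_id, version_tag, COUNT(*)
        FROM ig_act_build
        GROUP BY act_id, version_tag
        HAVING COUNT(*) > 1
        ORDER BY act_id, version_tag
        """
    ).fetchall()


conn.executemany(
    "INSERT INTO ig_act_build (act_id, run_id, version_tag) VALUES (?, ?, ?)",
    [(1, 100, "v1"), (1, 101, "v1"), (1, 101, "v2")],
)
duplicates = find_duplicate_builds(conn)

# Remove the older duplicate, then add the documented unique index.
conn.execute(
    "DELETE FROM ig_act_build WHERE act_id = 1 AND version_tag = 'v1' AND run_id = 100"
)
conn.execute(
    "CREATE UNIQUE INDEX ux_ig_act_build_act_version "
    "ON ig_act_build (act_id, version_tag)"
)
duplicates_after_cleanup = find_duplicate_builds(conn)

# With the index in place, a new duplicate is rejected at write time.
try:
    conn.execute(
        "INSERT INTO ig_act_build (act_id, run_id, version_tag) VALUES (1, 102, 'v1')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```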

ig_node_interval

Contains: Normalized interval rows per node within a specific build.

Purpose: Normalized node-interval rows for canonical output history.

Keys:

  • Composite primary key: (build_id, node_type, node_id, interval_id)

Foreign keys:

  • build_id -> ig_act_build.build_id

Invariants:

  • Child rows are scoped to one build_id.
  • observations_json and legal_updates_json store JSON arrays.

ig_content_blob

Contains: Serialized payload rows keyed by build, content version, and node type.

Purpose: Normalized content payload rows by content version and node type.

Keys:

  • Composite primary key: (build_id, content_version_id, node_type)

Foreign keys:

  • build_id -> ig_act_build.build_id

Invariants:

  • payload_json stores serialized JSON content.

ig_metadata_delta

Contains: Effective-date metadata patch rows (set_json / unset_json) per build.

Purpose: Canonical metadata deltas keyed by effective date.

Keys:

  • Composite primary key: (build_id, effective_date)

Foreign keys:

  • build_id -> ig_act_build.build_id

Invariants:

  • set_json and unset_json store serialized JSON payloads.
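A reader of these rows might replay the patches in effective-date order to reconstruct the metadata in force on a given day. The sketch below assumes a specific payload shape that this page does not confirm: set_json as a JSON object of fields to set, unset_json as a JSON array of field names to remove, and effective_date as an ISO 8601 date string. The helper metadata_as_of and the sample fields are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL: documented columns only, with guessed types.
conn.execute(
    """
    CREATE TABLE ig_metadata_delta (
        build_id INTEGER NOT NULL,
        effective_date TEXT NOT NULL,
        set_json TEXT NOT NULL,
        unset_json TEXT NOT NULL,
        PRIMARY KEY (build_id, effective_date)
    )
    """
)


def metadata_as_of(conn, build_id, as_of):
    """Replay patches up to and including as_of (ISO date) for one build."""
    state = {}
    rows = conn.execute(
        """
        SELECT set_json, unset_json
        FROM ig_metadata_delta
        WHERE build_id = ? AND effective_date <= ?
        ORDER BY effective_date
        """,
        (build_id, as_of),
    )
    for set_json, unset_json in rows:
        state.update(json.loads(set_json))      # assumed: object of fields to set
        for field in json.loads(unset_json):    # assumed: array of fields to drop
            state.pop(field, None)
    return state


conn.executemany(
    "INSERT INTO ig_metadata_delta VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", '{"title": "Example act", "status": "draft"}', "[]"),
        (1, "2021-06-01", '{"status": "in_force"}', "[]"),
        (1, "2023-01-01", "{}", '["status"]'),
    ],
)

state_2020 = metadata_as_of(conn, 1, "2020-12-31")
state_2022 = metadata_as_of(conn, 1, "2022-01-01")
state_2024 = metadata_as_of(conn, 1, "2024-01-01")
```

ISO 8601 date strings sort lexicographically in chronological order, which is why a plain text comparison on effective_date is enough in this sketch.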

Missing-Interval Snapshot Tables

ig_missing_interval_meta

Contains: Snapshot header rows, each describing one missing-interval hint snapshot run.

Purpose: Snapshot metadata header for missing-interval hints.

Keys:

  • Primary key: meta_id

Foreign keys:

  • run_id -> ig_run.run_id (nullable)
  • Referenced by ig_missing_interval.meta_id

Invariants:

  • complete check constraint: complete IN (0, 1).
  • Snapshot links digest metadata for download/ingest manifests.

ig_missing_interval

Contains: Snapshot detail rows, one for each missing interval observation.

Purpose: Snapshot detail rows for missing intervals.

Keys:

  • Primary key: interval_row_id

Foreign keys:

  • meta_id -> ig_missing_interval_meta.meta_id
  • source_ref_id -> ig_source_ref.source_ref_id
  • act_id -> ig_act.act_id (nullable)

Invariants:

  • Runtime contract requires the interval_row_id column to exist.
  • Rows are grouped by source publication identity (source_ref_id).
  • urn_nir is optional payload context in snapshot rows.

Cross-Table Contract Notes

  • build_id is the internal join key for canonical output child tables.
  • urn_nir is the canonical act identifier in ig_act.
  • source_ref_id is the leading primary-key column for incremental version checkpoints in ig_version_state.
  • No foreign keys connect ingest_manifest.db to download_manifest.db.
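The join keys listed above can be combined to walk from a canonical URN to its normalized child rows. This is a minimal sketch assuming SQLite and guessed column types; it reproduces only the key columns of three tables (run_id and the JSON columns are left out), and the helper node_intervals_for_urn and all sample values are hypothetical.

```python
import sqlite3

# Assumed minimal DDL: key columns only, with guessed types.
SCHEMA = """
CREATE TABLE ig_act (
    act_id INTEGER PRIMARY KEY,
    urn_nir TEXT NOT NULL UNIQUE
);
CREATE TABLE ig_act_build (
    build_id INTEGER PRIMARY KEY,
    act_id INTEGER NOT NULL REFERENCES ig_act (act_id),
    version_tag TEXT NOT NULL,
    UNIQUE (act_id, version_tag)
);
CREATE TABLE ig_node_interval (
    build_id INTEGER NOT NULL REFERENCES ig_act_build (build_id),
    node_type TEXT NOT NULL,
    node_id TEXT NOT NULL,
    interval_id INTEGER NOT NULL,
    PRIMARY KEY (build_id, node_type, node_id, interval_id)
);
"""


def node_intervals_for_urn(conn, urn_nir, version_tag):
    """List (node_type, node_id, interval_id) for one act and version scope."""
    return conn.execute(
        """
        SELECT n.node_type, n.node_id, n.interval_id
        FROM ig_act AS a
        JOIN ig_act_build AS b ON b.act_id = a.act_id
        JOIN ig_node_interval AS n ON n.build_id = b.build_id
        WHERE a.urn_nir = ? AND b.version_tag = ?
        ORDER BY n.node_type, n.node_id, n.interval_id
        """,
        (urn_nir, version_tag),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ig_act VALUES (1, 'urn:example:act-1')")
conn.execute("INSERT INTO ig_act_build VALUES (7, 1, 'v1')")
conn.executemany(
    "INSERT INTO ig_node_interval VALUES (?, ?, ?, ?)",
    [(7, "article", "art-1", 0), (7, "article", "art-1", 1), (7, "article", "art-2", 0)],
)

intervals = node_intervals_for_urn(conn, "urn:example:act-1", "v1")
no_such_version = node_intervals_for_urn(conn, "urn:example:act-1", "v2")
```

The same pattern applies to ig_content_blob and ig_metadata_delta, since both are also keyed on build_id.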