download_manifest.db Table Reference
This page documents table purpose, key structure, and invariants for
download_manifest.db.
Overview
download_manifest.db stores download-time catalog and inventory state:
- canonical act catalog
- identifier aliases
- deduplicated artifact blobs and logical file observations
- run/event audit trail
- ambiguity backlog
- derived coverage view
Table Map (Mermaid Flowchart)
flowchart LR
dm_act_catalog["Act Catalog (dm_act_catalog)<br/>One canonical row per act"]
dm_act_alias["Act Alias (dm_act_alias)<br/>External identifiers mapped to acts"]
dm_act_progress["Enumeration Progress (dm_act_progress)<br/>Resume checkpoints per window"]
dm_act_meta["Enumeration Metadata (dm_act_meta)<br/>Small workflow key-value state"]
dm_artifact_blob["Artifact Blob (dm_artifact_blob)<br/>Deduplicated physical content by hash"]
dm_act_file["Act File Observation (dm_act_file)<br/>Logical file rows linked to act/source/blob"]
dm_source_ref["Source Reference (dm_source_ref)<br/>Source publication identity"]
dm_download_run["Download Run (dm_download_run)<br/>One row per download invocation"]
dm_download_event["Download Event (dm_download_event)<br/>Per-attempt result/audit events"]
dm_download_ambiguity["Download Ambiguity (dm_download_ambiguity)<br/>Unresolved ownership mapping backlog"]
act_file_coverage["Act File Coverage View (act_file_coverage)<br/>Derived per-act coverage projection"]
dm_act_catalog -->|FK: act_pk| dm_act_alias
dm_act_catalog -->|FK: act_pk| dm_act_file
dm_artifact_blob -->|FK: blob_id| dm_act_file
dm_source_ref -->|FK: source_ref_id| dm_act_file
dm_download_run -->|FK: run_id| dm_download_event
dm_act_catalog -->|FK: act_pk| dm_download_event
dm_source_ref -->|FK: source_ref_id| dm_download_event
dm_download_run -->|FK: run_id| dm_download_ambiguity
dm_act_catalog -->|FK: act_pk| dm_download_ambiguity
dm_act_catalog -.->|derived from catalog + files| act_file_coverage
dm_act_file -.->|derived from catalog + files| act_file_coverage
Core Catalog Tables
dm_act_catalog
Contains:
- One canonical row per known act identity in the download hub.
Purpose:
- Canonical act row used by downloader tables.
Keys:
- Primary key: act_pk
Foreign keys:
- Referenced by dm_act_alias.act_pk
- Referenced by dm_act_file.act_pk
- Referenced by dm_download_event.act_pk (nullable)
- Referenced by dm_download_ambiguity.act_pk (nullable)
Invariants:
- One row per canonical act in this DB instance.
- act_pk is internal DB identity, not cross-service identity.
- canonical_urn_nir and primary_eli_id are promoted primary identifiers
for fast deterministic lookup.
dm_act_alias
Contains:
- Alias rows mapping external identifier pairs (kind, value) to act_pk.
Purpose:
- Maps external/domain identifiers to act_pk.
Keys:
- Primary key: alias_id
- Unique index: (kind, value) via
ux_dm_act_alias_kind_value
Foreign keys:
- act_pk -> dm_act_catalog.act_pk
Invariants:
- Each alias (kind, value) pair points to at most one act.
- Used for canonical URN and source-specific owner ids.
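The alias invariant above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's actual DDL: the column types, the alias kind `urn_nir`, the sample URN, and the helper `resolve_alias` are all assumptions made for the example. Only the table names, key columns, and the index name come from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical schema: column types are assumed, not taken from the real DB.
conn.executescript("""
CREATE TABLE dm_act_catalog (
    act_pk INTEGER PRIMARY KEY,
    canonical_urn_nir TEXT,
    primary_eli_id TEXT
);
CREATE TABLE dm_act_alias (
    alias_id INTEGER PRIMARY KEY,
    act_pk INTEGER NOT NULL REFERENCES dm_act_catalog(act_pk),
    kind TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE UNIQUE INDEX ux_dm_act_alias_kind_value ON dm_act_alias(kind, value);
""")


def resolve_alias(conn, kind, value):
    """Return the act_pk mapped to (kind, value), or None if unmapped."""
    row = conn.execute(
        "SELECT act_pk FROM dm_act_alias WHERE kind = ? AND value = ?",
        (kind, value),
    ).fetchone()
    return row[0] if row else None


# Sample data (illustrative values only).
conn.execute("INSERT INTO dm_act_catalog (act_pk) VALUES (1)")
conn.execute("INSERT INTO dm_act_catalog (act_pk) VALUES (2)")
conn.execute(
    "INSERT INTO dm_act_alias (act_pk, kind, value) VALUES (?, ?, ?)",
    (1, "urn_nir", "urn:example:act-1"),
)

# Mapping the same (kind, value) pair to a second act violates the unique index.
try:
    conn.execute(
        "INSERT INTO dm_act_alias (act_pk, kind, value) VALUES (?, ?, ?)",
        (2, "urn_nir", "urn:example:act-1"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because the unique index covers exactly (kind, value), a lookup by alias returns either no row or a single act_pk, so it never needs disambiguation.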
Manifest Enumeration Checkpoint Tables
dm_act_progress
Contains:
- Resume/checkpoint rows keyed by enumeration kind + window key.
Purpose:
- Resume-safe checkpoint ledger for manifest windows.
Keys:
- Composite primary key: (kind, window_key)
Foreign keys:
- None
Invariants:
- At most one progress row per (kind, window_key) pair.
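One way to keep the one-row-per-window invariant while checkpointing is an upsert on the composite primary key. The sketch below assumes a hypothetical `last_cursor` column and hypothetical helpers `save_checkpoint` and `load_checkpoint`; only the table name and the (kind, window_key) key come from this page. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: last_cursor is an assumed payload column.
conn.execute("""
CREATE TABLE dm_act_progress (
    kind TEXT NOT NULL,
    window_key TEXT NOT NULL,
    last_cursor TEXT,
    PRIMARY KEY (kind, window_key)
)
""")


def save_checkpoint(conn, kind, window_key, last_cursor):
    """Insert the checkpoint, or overwrite it if the window already has one."""
    conn.execute(
        """
        INSERT INTO dm_act_progress (kind, window_key, last_cursor)
        VALUES (?, ?, ?)
        ON CONFLICT (kind, window_key)
        DO UPDATE SET last_cursor = excluded.last_cursor
        """,
        (kind, window_key, last_cursor),
    )


def load_checkpoint(conn, kind, window_key):
    """Return the stored cursor for a window, or None if it was never started."""
    row = conn.execute(
        "SELECT last_cursor FROM dm_act_progress WHERE kind = ? AND window_key = ?",
        (kind, window_key),
    ).fetchone()
    return row[0] if row else None


# Saving twice for the same window updates the row instead of adding a second one.
save_checkpoint(conn, "by_year", "1990", "page-3")
save_checkpoint(conn, "by_year", "1990", "page-7")
row_count = conn.execute("SELECT COUNT(*) FROM dm_act_progress").fetchone()[0]
```

A resumed enumeration would call `load_checkpoint` first and continue from the returned cursor when one exists.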
dm_act_meta
Contains:
- Small workflow metadata key/value rows for enumeration state.
Purpose:
- Small metadata key/value state for enumeration workflows.
Keys:
- Primary key: key
Foreign keys:
- None
Invariants:
- At most one value per metadata key.
Artifact Inventory Tables
dm_artifact_blob
Contains:
- Deduplicated physical blob records keyed by SHA-256 digest.
Purpose:
- Physical dedupe layer keyed by content digest.
Keys:
- Primary key: blob_id
- Unique: sha256
Foreign keys:
- Referenced by dm_act_file.blob_id
Invariants:
- One row per unique SHA-256 digest.
- Multiple dm_act_file rows may reference one blob.
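The dedupe behavior described above can be illustrated with a get-or-create helper keyed by the content digest. This is a sketch under assumptions: the `size_bytes` column and the helper `get_or_create_blob` are hypothetical, and the real downloader may compute or store digests differently.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: size_bytes is an assumed column.
conn.execute("""
CREATE TABLE dm_artifact_blob (
    blob_id INTEGER PRIMARY KEY,
    sha256 TEXT NOT NULL UNIQUE,
    size_bytes INTEGER NOT NULL
)
""")


def get_or_create_blob(conn, content: bytes) -> int:
    """Return the blob_id for this content, inserting a row only if the digest is new."""
    digest = hashlib.sha256(content).hexdigest()
    # INSERT OR IGNORE is a no-op when the UNIQUE(sha256) constraint already holds a match.
    conn.execute(
        "INSERT OR IGNORE INTO dm_artifact_blob (sha256, size_bytes) VALUES (?, ?)",
        (digest, len(content)),
    )
    row = conn.execute(
        "SELECT blob_id FROM dm_artifact_blob WHERE sha256 = ?", (digest,)
    ).fetchone()
    return row[0]


first = get_or_create_blob(conn, b"<act>text</act>")
second = get_or_create_blob(conn, b"<act>text</act>")   # same bytes, same blob
other = get_or_create_blob(conn, b"<act>other</act>")   # different bytes, new blob
blob_count = conn.execute("SELECT COUNT(*) FROM dm_artifact_blob").fetchone()[0]
```

Several dm_act_file rows can then store the same returned blob_id, which is how identical content observed under different acts or sources shares one physical record.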
dm_act_file
Contains:
- Logical file observation rows linking acts, source refs, and blob ids.
Purpose:
- Logical artifact observations per act.
Keys:
- Primary key: file_id
Foreign keys:
- act_pk -> dm_act_catalog.act_pk
- source_ref_id -> dm_source_ref.source_ref_id (nullable)
- blob_id -> dm_artifact_blob.blob_id
Invariants:
- No strict uniqueness on (act_pk, mode, format, path).
- Repeated logical observations are allowed for audit history.
- Path integrity is validated operationally against export root.
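This page does not spell out how path integrity is checked. As one possible reading, the sketch below assumes stored paths are meant to resolve inside the export root and rejects anything that escapes it. The function name `is_under_export_root` and the sample paths are hypothetical.

```python
from pathlib import Path


def is_under_export_root(export_root: str, stored_path: str) -> bool:
    """Return True if stored_path resolves to a location inside export_root."""
    root = Path(export_root).resolve()
    # Joining with an absolute stored_path discards root, so absolute paths
    # outside the root are rejected by the containment check below.
    candidate = (root / stored_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        return False
    return True


inside = is_under_export_root("/tmp/export", "acts/1/file.xml")
escaped = is_under_export_root("/tmp/export", "../outside.xml")
absolute = is_under_export_root("/tmp/export", "/etc/passwd")
```

The check only covers containment. Whether the file must also exist on disk, or match the blob digest, is left open here because this page does not say.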
dm_source_ref
Contains:
- Source publication identity rows keyed by (source, owner alias pair).
Purpose:
- Source publication identity hub (source, owner_alias_kind,
owner_alias_value).
Keys:
- Primary key: source_ref_id
- Unique: (source, owner_alias_kind, owner_alias_value) via
ux_dm_source_ref_source_owner_alias
Foreign keys:
- Referenced by dm_act_file.source_ref_id
- Referenced by dm_download_event.source_ref_id
Invariants:
- One row per unique source publication key.
Run Audit Tables
dm_download_run
Contains:
- One run ledger row per download invocation.
Purpose:
- One row per download command invocation.
Keys:
- Primary key: run_id
Foreign keys:
- Referenced by dm_download_event.run_id
- Referenced by dm_download_ambiguity.run_id (nullable)
Invariants:
- Run ids are stable UUID-like text ids.
dm_download_event
Contains:
- Per-attempt event rows recording request dimensions and outcomes.
Purpose:
- Attempt/observation event stream for each run.
Keys:
- Primary key: event_id
Foreign keys:
- run_id -> dm_download_run.run_id
- act_pk -> dm_act_catalog.act_pk (nullable)
- source_ref_id -> dm_source_ref.source_ref_id (nullable)
Invariants:
- Event may exist without resolved act_pk.
- Event record preserves request dimensions and result status.
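The key structure above implies that an event always belongs to a run but may not yet belong to an act. The sketch below illustrates that with a reduced, hypothetical schema: the `status` column, its values, and the use of `uuid.uuid4()` for run ids are assumptions, and source_ref_id is omitted for brevity.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
# Hypothetical reduced schema: only the columns needed for the illustration.
conn.executescript("""
CREATE TABLE dm_act_catalog (act_pk INTEGER PRIMARY KEY);
CREATE TABLE dm_download_run (run_id TEXT PRIMARY KEY);
CREATE TABLE dm_download_event (
    event_id INTEGER PRIMARY KEY,
    run_id TEXT NOT NULL REFERENCES dm_download_run(run_id),
    act_pk INTEGER REFERENCES dm_act_catalog(act_pk),
    status TEXT NOT NULL
);
""")

run_id = str(uuid.uuid4())  # UUID-like text id, as the run table describes
conn.execute("INSERT INTO dm_download_run (run_id) VALUES (?)", (run_id,))
conn.execute("INSERT INTO dm_act_catalog (act_pk) VALUES (1)")

# One event with a resolved act, one recorded before the act was resolved.
conn.execute(
    "INSERT INTO dm_download_event (run_id, act_pk, status) VALUES (?, ?, ?)",
    (run_id, 1, "ok"),
)
conn.execute(
    "INSERT INTO dm_download_event (run_id, act_pk, status) VALUES (?, NULL, ?)",
    (run_id, "not_found"),
)

# An event pointing at a run that does not exist is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO dm_download_event (run_id, act_pk, status) VALUES (?, NULL, ?)",
        ("no-such-run", "ok"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

unresolved = conn.execute(
    "SELECT COUNT(*) FROM dm_download_event WHERE run_id = ? AND act_pk IS NULL",
    (run_id,),
).fetchone()[0]
```

Filtering on `act_pk IS NULL` is a simple way to list the attempts in a run that still lack an owning act.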
Ambiguity Backlog Table
dm_download_ambiguity
Contains:
- Backlog rows for unresolved ownership/alias mapping ambiguities.
Purpose:
- Tracks unresolved ownership/alias mapping ambiguity.
Keys:
- Primary key: ambiguity_id
- Unique constraint: (act_pk, reason)
Foreign keys:
- act_pk -> dm_act_catalog.act_pk (nullable)
- run_id -> dm_download_run.run_id (nullable)
Invariants:
- act_pk must remain nullable (runtime contract check).
- resolved is integer-encoded boolean (0/1).
- observed_keys_json and candidate_act_pks_json are JSON payloads.
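A runtime contract check for the nullable act_pk column could be built on `PRAGMA table_info`, which reports a `notnull` flag per column. The sketch below is an assumption about how such a check might look, not the project's actual code: the helper `act_pk_is_nullable`, the CHECK constraint on `resolved`, the sample reason and JSON payloads are hypothetical, and foreign keys are left out for brevity.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: column types and the CHECK constraint are assumed.
conn.execute("""
CREATE TABLE dm_download_ambiguity (
    ambiguity_id INTEGER PRIMARY KEY,
    act_pk INTEGER,
    run_id TEXT,
    reason TEXT NOT NULL,
    resolved INTEGER NOT NULL DEFAULT 0 CHECK (resolved IN (0, 1)),
    observed_keys_json TEXT NOT NULL,
    candidate_act_pks_json TEXT NOT NULL,
    UNIQUE (act_pk, reason)
)
""")


def act_pk_is_nullable(conn) -> bool:
    """Check the live schema: act_pk must not carry a NOT NULL constraint."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, _type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(dm_download_ambiguity)"
    ):
        if name == "act_pk":
            return notnull == 0
    raise LookupError("act_pk column not found")


def record_ambiguity(conn, reason, observed_keys, candidate_act_pks, act_pk=None):
    """Store one backlog row, serializing the payloads as JSON text."""
    conn.execute(
        """
        INSERT INTO dm_download_ambiguity
            (act_pk, reason, observed_keys_json, candidate_act_pks_json)
        VALUES (?, ?, ?, ?)
        """,
        (act_pk, reason, json.dumps(observed_keys), json.dumps(candidate_act_pks)),
    )


# Two unresolved rows with the same reason and no act_pk can coexist:
# SQLite treats NULLs as distinct inside a UNIQUE constraint.
record_ambiguity(conn, "multiple_owner_candidates", {"eli": "x"}, [1, 2])
record_ambiguity(conn, "multiple_owner_candidates", {"eli": "y"}, [3, 4])

rows = conn.execute(
    "SELECT resolved, candidate_act_pks_json FROM dm_download_ambiguity "
    "ORDER BY ambiguity_id"
).fetchall()
candidates = [json.loads(payload) for _resolved, payload in rows]
```

Note the consequence for the (act_pk, reason) unique constraint: it only deduplicates rows whose act_pk is set, so backlog rows without an act are never collapsed by it.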
Derived View
act_file_coverage (view)
Contains:
- Read-only per-act aggregate coverage projections derived from act/file rows.
Purpose:
- Read-only per-act coverage rollup from catalog + file rows.
Key shape:
- One row per act_pk projection.
Base relations:
- dm_act_catalog left-joined with dm_act_file
Invariants:
- Rebuilt idempotently by storage initialization.
- Derived state only; not a source-of-truth table.
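An idempotent rebuild of a view like this is commonly done by dropping and recreating it. The sketch below assumes a single hypothetical `file_count` column and a hypothetical `init_coverage_view` function; the real view's projected columns are not listed on this page.

```python
import sqlite3


def init_coverage_view(conn):
    """Drop and recreate the view so repeated initialization gives the same result."""
    conn.executescript("""
    DROP VIEW IF EXISTS act_file_coverage;
    CREATE VIEW act_file_coverage AS
    SELECT c.act_pk AS act_pk,
           COUNT(f.file_id) AS file_count  -- hypothetical coverage column
    FROM dm_act_catalog AS c
    LEFT JOIN dm_act_file AS f ON f.act_pk = c.act_pk
    GROUP BY c.act_pk;
    """)


conn = sqlite3.connect(":memory:")
# Hypothetical reduced base tables: only the join columns are modeled.
conn.executescript("""
CREATE TABLE dm_act_catalog (act_pk INTEGER PRIMARY KEY);
CREATE TABLE dm_act_file (
    file_id INTEGER PRIMARY KEY,
    act_pk INTEGER NOT NULL REFERENCES dm_act_catalog(act_pk)
);
INSERT INTO dm_act_catalog (act_pk) VALUES (1), (2);
INSERT INTO dm_act_file (act_pk) VALUES (1), (1);
""")

init_coverage_view(conn)
init_coverage_view(conn)  # second call is safe: the view is simply rebuilt

coverage = conn.execute(
    "SELECT act_pk, file_count FROM act_file_coverage ORDER BY act_pk"
).fetchall()
```

The left join keeps acts that have no files yet, and `COUNT(f.file_id)` skips the NULLs it produces, so such acts appear with a count of zero instead of disappearing from the projection.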