Formats Comparison

🧠 Why they differ so much

Each export format reflects a different layer of the Normattiva publishing pipeline:

Layer	Format	Origin
Legal text layer	Akoma Ntoso (AKN)	The official, structured XML standard used for legislation (FRBR/ELI, anchors, legal semantics).
Editorial database layer	JSON	Normattiva’s internal content-management export — focuses on metadata, versioning, and links between acts.
Legacy layer	NIR/Legge… XML	The older Italian Norme in Rete XML schema (pre-AKN), still useful as a fallback for some identifiers/labels.
Presentation layer	HTML	The render for public viewing — lots of `<div>`, `<span>`, anchors, little semantics.

These formats are not four serializations of one object; each comes from a different internal model. That’s why their field sets and structures differ so dramatically.

📦 What extra information each format gives you

🟩 JSON

✅ Extra information

Cross-act change tracking (aggiornamentiAtto, attiAggiornati)
Temporal versioning (dataVigoreVersione, inizioVigore, fineVigore)
Internal identifiers (idNir, idDettNir, idInterno, idProgrNir)
Editorial states and flags (abrogato, modificheAttive, errore)
“Database” fields (Gazzetta info, rubrica, codiceRedazionale)

🚫 Missing

Full structural hierarchy (chapters, paragraphs)
FRBR/ELI metadata lattice
Textual modification instructions (<textualMod>)

🔧 Best for

Incremental syncs
Building the temporal graph (what changed / when)
Analytics & indexing (fast Parquet conversion)
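
As a rough illustration of the "temporal graph" use case, the sketch below picks the version of an act in force on a given day. Only the field names inizioVigore, fineVigore and idNir come from the export; the flat list of records, the ISO date strings, and the helper names parse_day and version_in_force are assumptions made for this example, and the real JSON may nest and encode these values differently.

```python
from datetime import date
from typing import Optional

# Hypothetical, simplified records: only the field names are taken from
# the JSON export; the layout and the ISO date format are assumed.
versions = [
    {"idNir": "a1", "inizioVigore": "2020-01-01", "fineVigore": "2021-06-30"},
    {"idNir": "a1", "inizioVigore": "2021-07-01", "fineVigore": None},
]

def parse_day(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; a missing value yields None."""
    return date.fromisoformat(value) if value else None

def version_in_force(records: list, day: date) -> Optional[dict]:
    """Return the first record whose validity interval contains `day`.

    Both interval ends are treated as inclusive; a missing fineVigore
    means the version is still in force.
    """
    for rec in records:
        start = parse_day(rec.get("inizioVigore"))
        end = parse_day(rec.get("fineVigore"))
        if start is None:
            continue  # skip records without a start date
        if start <= day and (end is None or day <= end):
            return rec
    return None

current = version_in_force(versions, date(2022, 3, 15))
print(current["inizioVigore"] if current else "no version in force")
```

The same interval test can be run over every act to produce one row per (act, version, interval), which is a convenient shape for a Parquet table.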

🟦 Akoma Ntoso (AKN)

✅ Extra information

Complete legal structure: <body>, <chapter>, <article>, <paragraph> etc. with @eId anchors
FRBR / ELI metadata: Work / Expression / Manifestation, authors, publication, workflow
Lifecycle and <textualMod> elements: explicit amendment instructions (“amend X → insert Y”)
Meta / references / analysis blocks: cross-links with rich provenance

🚫 Missing

Normattiva-specific update registry (no aggiornamentiAtto)
In-force intervals (AKN has lifecycle, but not per-element)
Editorial internal IDs

🔧 Best for

Semantic graph (citations, references)
EU interoperability (ELI/FRBR alignment)
NLP / RAG / search on legal text
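
To make the @eId anchors concrete, here is a minimal sketch that maps each article's eId to its number label. The sample fragment is invented, and the namespace shown is the OASIS AKN 3.0 one, which may not match what a given export declares; for that reason the lookup ignores the namespace entirely. The function name article_anchors is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; real files are far larger and richer.
AKN_SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act>
    <body>
      <article eId="art_1">
        <num>Art. 1</num>
        <paragraph eId="art_1__para_1"><content><p>Testo.</p></content></paragraph>
      </article>
      <article eId="art_2"><num>Art. 2</num></article>
    </body>
  </act>
</akomaNtoso>"""

def article_anchors(xml_text: str) -> dict:
    """Map each <article> eId to the text of its <num> child, if any."""
    root = ET.fromstring(xml_text)
    anchors = {}
    # "{*}" matches any namespace in find/findall (Python 3.8+).
    for article in root.findall(".//{*}article"):
        num = article.find("{*}num")
        anchors[article.get("eId")] = num.text if num is not None else None
    return anchors

print(article_anchors(AKN_SAMPLE))
```

The resulting eId keys can serve as stable chunk identifiers when indexing the text for search or RAG.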

🟨 NIR XML

✅ Extra information

Italian-specific “NIR” descriptors (intestazione, descrittori, urn, redazione)
Often contains labels/identifiers that are absent or inconsistent across JSON/AKN

🚫 Missing

FRBR/ELI semantics (these arrived with AKN, which superseded NIR)
Update relationships (JSON-only)

🔧 Best for

Fallback extraction when AKN/JSON are missing a value
Bridging legacy datasets and validating identifiers
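
The fallback role can be sketched as a simple priority lookup: take a value from JSON first, then AKN, and only then NIR. The helper first_available and the flattened per-format dictionaries are assumptions for illustration, and the sample values are made up.

```python
from typing import Optional

def first_available(field: str, *sources: dict) -> Optional[str]:
    """Return the first non-empty value for `field`, scanning sources in order."""
    for source in sources:
        value = source.get(field)
        if value not in (None, ""):
            return value
    return None

# Hypothetical, already-flattened records for one act (values are invented).
from_json = {"urn": "", "rubrica": "Disposizioni urgenti"}
from_akn = {"urn": None}
from_nir = {"urn": "urn:nir:example", "rubrica": "Etichetta legacy"}

urn = first_available("urn", from_json, from_akn, from_nir)
rubrica = first_available("rubrica", from_json, from_akn, from_nir)
print(urn, "|", rubrica)
```

Keeping the source order explicit in the call makes it easy to record which layer each value came from if provenance matters later.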

🟥 HTML

✅ Extra information

Readable text, formatting, anchors (id, href)
Browser-rendered layout

🚫 Missing

Nearly all semantic data (almost everything is a <div> or <span>)

🔧 Best for

UI checks, OCR comparison, and anchor extraction (e.g., to detect where articles start)
Not suitable as a canonical data source
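
For the anchor-extraction case, a standard-library parser is usually enough. The sketch below collects id attributes and link targets from a small invented snippet; the class name AnchorCollector and the sample markup are assumptions, and the real pages may use different attributes to mark where articles start.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect every id attribute and every <a href> target in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute present without a value
            if name == "id":
                self.ids.append(value)
            elif name == "href" and tag == "a":
                self.hrefs.append(value)

# Hypothetical markup, not copied from the actual site.
HTML_SAMPLE = (
    '<div id="art1"><span>Art. 1</span> <a href="#art2">vedi art. 2</a></div>'
    '<div id="art2"><span>Art. 2</span></div>'
)

collector = AnchorCollector()
collector.feed(HTML_SAMPLE)
print(collector.ids, collector.hrefs)
```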

🧭 So, what should you actually do now?

Goal	Recommended format(s)	Why
Synchronize acts, detect changes, build timeline graph	✅ JSON (Multivigente)	Contains all change and validity metadata.
Extract structured text and legal anchors	✅ AKN (Multivigente)	Full text + FRBR/ELI + explicit anchors.
Fill missing identifiers/labels (fallbacks)	🟡 NIR XML (fallback)	Legacy descriptors sometimes missing elsewhere.
Presentation or anchor alignment	🟡 HTML (light use)	Only for comparing visible anchors.

In other words:

Use JSON as your temporal backbone.
Use AKN as your semantic structure.
Use NIR XML only as a fallback layer for missing identifiers and labels.
Treat HTML, if used at all, as an auxiliary layer for anchors.

🧩 Why you should test others (lightly)

Test the other formats, but only to extract what JSON/AKN lack.

From NIR XML, capture urn, pubblicazione[@num|@norm], and redazione descriptors.
From HTML, optionally collect @id/@href anchors for front-end search linking.

No need to integrate them fully — treat them as auxiliary layers.
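
A light-touch capture of the NIR descriptors might look like the sketch below. The element names urn, pubblicazione and redazione are the ones named above; the sample fragment, its attribute layout (including valore and nome), and the function nir_descriptors are assumptions, so the code simply keeps whatever attributes and text it finds rather than relying on a fixed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the attribute layout and values are invented.
NIR_SAMPLE = """<NIR>
  <Legge>
    <meta>
      <descrittori>
        <pubblicazione num="1" norm="20200101"/>
        <redazione nome="example" norm="20200102"/>
        <urn valore="urn:nir:example"/>
      </descrittori>
    </meta>
  </Legge>
</NIR>"""

def nir_descriptors(xml_text: str) -> dict:
    """Return attributes and text of the first urn/pubblicazione/redazione elements."""
    root = ET.fromstring(xml_text)
    found = {}
    for name in ("urn", "pubblicazione", "redazione"):
        # "{*}" matches the element in any namespace, or none (Python 3.8+).
        node = root.find(f".//{{*}}{name}")
        if node is not None:
            found[name] = {**node.attrib, "text": (node.text or "").strip()}
    return found

print(nir_descriptors(NIR_SAMPLE))
```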

⚖️ TL;DR recommendation

Function	Preferred format
Incremental sync / change tracking	JSON
Full legal structure / EU graph	AKN
Missing identifier/label fallback	NIR XML
Visual/anchor validation	HTML

Format comparisons

The Classic (C) and Responsive (R) attribute exports were compared for both AKN and JSON. They are identical.
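
One way such a check could be reproduced is to fingerprint each export after normalizing away formatting noise. This is a sketch under assumptions, not necessarily how the comparison above was carried out: the function names and sample documents are invented, JSON is normalized by sorting keys, and XML is normalized with C14N 2.0 canonicalization from the standard library.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

def json_fingerprint(text: str) -> str:
    """Hash a JSON document after re-serializing it with sorted keys."""
    canonical = json.dumps(
        json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def xml_fingerprint(text: str) -> str:
    """Hash an XML document after C14N 2.0 canonicalization (Python 3.8+)."""
    # strip_text=True drops leading/trailing whitespace around text nodes,
    # so pretty-printing differences do not affect the hash.
    canonical = ET.canonicalize(xml_data=text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical stand-ins for the Classic and Responsive exports.
classic_json = '{"idNir": "x", "abrogato": false}'
responsive_json = '{ "abrogato": false, "idNir": "x" }'
classic_akn = '<act b="2" a="1"><body/></act>'
responsive_akn = '<act a="1" b="2">\n  <body></body>\n</act>'

same_json = json_fingerprint(classic_json) == json_fingerprint(responsive_json)
same_akn = xml_fingerprint(classic_akn) == xml_fingerprint(responsive_akn)
print(same_json, same_akn)
```

Comparing fingerprints rather than raw bytes avoids false differences caused by key order, attribute order, or indentation.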