Formats Comparison
Why they differ so much
Each export format reflects a different layer of the Normattiva publishing pipeline:
| Layer | Format | Origin |
|---|---|---|
| Legal text layer | Akoma Ntoso (AKN) | The official, structured XML standard used for legislation (FRBR/ELI, anchors, legal semantics). |
| Editorial database layer | JSON | Normattiva's internal content-management export. Focuses on metadata, versioning, and links between acts. |
| Legacy NIR XML | NIR/Legge… XML | The older Italian Norme in Rete XML schema (pre-AKN), still useful as a fallback for some identifiers/labels. |
| Presentation layer | HTML | The rendering for public viewing: lots of `<div>`, `<span>`, and anchors, with little semantics. |
They don't come from the same serialization of one object; they come from different internal models. That's why field sets and structures differ so dramatically.
What extra information each format gives you
JSON
Extra information
- Cross-act change tracking (`aggiornamentiAtto`, `attiAggiornati`)
- Temporal versioning (`dataVigoreVersione`, `inizioVigore`, `fineVigore`)
- Internal identifiers (`idNir`, `idDettNir`, `idInterno`, `idProgrNir`)
- Editorial states and flags (`abrogato`, `modificheAttive`, `errore`)
- "Database" fields (Gazzetta info, rubrica, codiceRedazionale)
Missing
- Full structural hierarchy (chapters, paragraphs)
- FRBR/ELI metadata lattice
- Textual modification instructions (`<textualMod>`)
Best for
- Incremental syncs
- Building the temporal graph (what changed / when)
- Analytics & indexing (fast Parquet conversion)
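As a rough illustration of the temporal-graph use case, the sketch below picks the version of an act that is in force on a given day. Only the field names `idNir`, `inizioVigore`, and `fineVigore` come from the list above; the flat list-of-dicts layout, the ISO `YYYY-MM-DD` date strings, and the use of `None` for an open-ended interval are assumptions, and the real export may nest or format these values differently.

```python
from datetime import date
from typing import Optional

# Hypothetical sample: the list-of-dicts layout and ISO date strings are
# assumptions; only the field names come from the JSON export.
versions = [
    {"idNir": "A1", "inizioVigore": "2020-01-01", "fineVigore": "2021-06-30"},
    {"idNir": "A1", "inizioVigore": "2021-07-01", "fineVigore": None},
]


def parse_day(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; None or "" means no date is set."""
    return date.fromisoformat(value) if value else None


def version_in_force(records: list, on: date) -> Optional[dict]:
    """Return the first record whose interval contains `on`, else None."""
    for record in records:
        start = parse_day(record.get("inizioVigore"))
        end = parse_day(record.get("fineVigore"))
        # A missing end date is treated as "still in force".
        if start is not None and start <= on and (end is None or on <= end):
            return record
    return None


current = version_in_force(versions, date(2022, 3, 15))
print(current["inizioVigore"] if current else "no version in force")
```

Treating both interval ends as inclusive is a choice made for this sketch; it should be checked against how the export actually defines `fineVigore`.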
Akoma Ntoso (AKN)
Extra information
- Complete legal structure: `<body>`, `<chapter>`, `<article>`, `<paragraph>`, etc. with `@eId` anchors
- FRBR / ELI metadata: Work / Expression / Manifestation, authors, publication, workflow
- Lifecycle and textualMod elements = explicit "amend X → insert Y"
- Meta / references / analysis = cross-links with rich provenance
Missing
- Normattiva-specific update registry (no `aggiornamentiAtto`)
- In-force intervals (AKN has `lifecycle`, but not per-element)
- Editorial internal IDs
Best for
- Semantic graph (citations, references)
- EU interoperability (ELI/FRBR alignment)
- NLP / RAG / search on legal text
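To make the idea of `@eId` anchors concrete, here is a minimal sketch that walks an AKN document and lists every element carrying an `eId`. The sample document is hand-written and hypothetical, not a real Normattiva export, and the helper name `collect_anchors` is made up for this example. The namespace is the Akoma Ntoso 3.0 one; a given export may use a different version.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace (OASIS LegalDocML). The sample below is a
# hand-written, hypothetical fragment, not a real Normattiva export.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

SAMPLE_AKN = f"""
<akomaNtoso xmlns="{AKN_NS}">
  <act>
    <body>
      <chapter eId="chp_1">
        <article eId="art_1">
          <paragraph eId="art_1__para_1"><content><p>First.</p></content></paragraph>
          <paragraph eId="art_1__para_2"><content><p>Second.</p></content></paragraph>
        </article>
      </chapter>
    </body>
  </act>
</akomaNtoso>
"""


def collect_anchors(xml_text: str) -> list:
    """Return (element name, eId) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    anchors = []
    for element in root.iter():
        eid = element.get("eId")
        if eid is not None:
            # Tags look like "{namespace}article"; keep only the local name.
            local_name = element.tag.rsplit("}", 1)[-1]
            anchors.append((local_name, eid))
    return anchors


for name, eid in collect_anchors(SAMPLE_AKN):
    print(f"{name:10} {eid}")
```

Matching on the `eId` attribute rather than on specific element names keeps the sketch independent of which hierarchy levels a particular act uses.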
NIR XML
Extra information
- Italian-specific "NIR" descriptors (`intestazione`, `descrittori`, `urn`, `redazione`)
- Often contains labels/identifiers that are absent or inconsistent across JSON/AKN
Missing
- FRBR/ELI semantics (AKN replaced them)
- Update relationships (JSON-only)
Best for
- Fallback extraction when AKN/JSON are missing a value
- Bridging legacy datasets and validating identifiers
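The fallback role can be sketched as a "first non-empty value wins" lookup across formats. Everything here is illustrative: the three dictionaries stand in for values already extracted from each format, the keys and values are invented, and `first_present` is a hypothetical helper, not part of any Normattiva tooling.

```python
from typing import Any, Optional, Tuple

# Hypothetical, already-extracted values per format. Keys and values are
# invented for illustration; parsing each format is out of scope here.
from_akn = {"urn": None, "rubrica": "Sample heading"}
from_json = {"urn": "", "rubrica": "Sample heading"}
from_nir = {"urn": "urn:nir:example", "rubrica": None}


def first_present(key: str, *sources: Tuple[str, dict]) -> Tuple[Any, Optional[str]]:
    """Return (value, source label) from the first source that has a
    non-empty value for `key`, or (None, None) if none does."""
    for label, mapping in sources:
        value = mapping.get(key)
        if value not in (None, ""):
            return value, label
    return None, None


# Preferred formats come first; NIR XML is consulted only as a last resort.
ordered = (("akn", from_akn), ("json", from_json), ("nir", from_nir))
print(first_present("urn", *ordered))
print(first_present("rubrica", *ordered))
```

Returning the source label alongside the value makes it easy to record which fields were filled from the legacy layer, which helps when validating identifiers later.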
HTML
Extra information
- Readable text, formatting, anchors (`id`, `href`)
- Browser-rendered layout
Missing
- All semantic data (everything is a `<div>` or `<span>`)
Best for
- UI / OCR comparison / anchor extraction (e.g., to detect where articles start)
- Not for canonical data
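Anchor extraction from the HTML layer can be done with the standard library alone. The sketch below collects every `id` and `href` value from a page; the sample markup and the class name `AnchorCollector` are assumptions for illustration, and real Normattiva pages will be structured differently.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every `id` and `href` attribute value in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs; value is None for
        # attributes written without a value.
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.append(value)
            elif name == "href":
                self.hrefs.append(value)


# Hypothetical markup, not taken from a real page.
SAMPLE_HTML = """
<div id="art1"><span>Art. 1</span> <a href="#art2">see Art. 2</a></div>
<div id="art2"><span>Art. 2</span></div>
"""

collector = AnchorCollector()
collector.feed(SAMPLE_HTML)
print(collector.ids)
print(collector.hrefs)
```

Because the page carries little semantics, the collected anchors are best treated as hints for alignment, not as canonical identifiers.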
So, what should you actually do now?
| Goal | Recommended format(s) | Why |
|---|---|---|
| Synchronize acts, detect changes, build timeline graph | JSON (Multivigente) | Contains all change and validity metadata. |
| Extract structured text and legal anchors | AKN (Multivigente) | Full text + FRBR/ELI + explicit anchors. |
| Fill missing identifiers/labels (fallbacks) | NIR XML (fallback) | Legacy descriptors sometimes missing elsewhere. |
| Presentation or anchor alignment | HTML (light use) | Only for comparing visible anchors. |
In other words:
- Use JSON as your temporal backbone.
- Use AKN as your semantic structure.
- Use NIR XML only as a fallback/auxiliary layer.
- Treat HTML (if used at all) as an auxiliary layer.
Why you should test others (lightly)
Yes, test them, but only to extract what JSON/AKN lack.
- From NIR XML, capture `urn`, `pubblicazione[@num|@norm]`, and `redazione` descriptors.
- From HTML, optionally collect `@id`/`@href` anchors for front-end search linking.

No need to integrate them fully; treat them as auxiliary layers.
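A light-touch capture of the NIR descriptors might look like the sketch below. The element names `urn`, `pubblicazione`, and `redazione` and the `num`/`norm` attributes come from the text above; the nesting, the `valore` and `nome` attributes, and all values in the sample are assumptions and may not match a real NIR export. Matching on local names sidesteps whichever NIR namespace version a file declares.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical fragment. The nesting, the `valore`/`nome`
# attributes, and every value are assumptions, not a real NIR export.
SAMPLE_NIR = """
<NIR>
  <meta>
    <descrittori>
      <pubblicazione num="123" norm="20200101"/>
      <urn valore="urn:nir:example"/>
      <redazione nome="example" norm="20200102"/>
    </descrittori>
  </meta>
</NIR>
"""

WANTED = {"urn", "pubblicazione", "redazione"}


def capture_descriptors(xml_text: str) -> dict:
    """Map each wanted element name to a list of its attribute dicts."""
    captured = {name: [] for name in sorted(WANTED)}
    for element in ET.fromstring(xml_text.strip()).iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any namespace
        if local_name in WANTED:
            captured[local_name].append(dict(element.attrib))
    return captured


descriptors = capture_descriptors(SAMPLE_NIR)
print(descriptors["pubblicazione"])
print(descriptors["urn"])
```

Keeping the raw attribute dictionaries, instead of mapping them into a schema, matches the auxiliary role these values play.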
TL;DR recommendation
| Function | Preferred format |
|---|---|
| Incremental sync / change tracking | JSON |
| Full legal structure / EU graph | AKN |
| Missing identifier/label fallback | NIR XML |
| Visual/anchor validation | HTML |
Format comparisons
Comparison of Classic (C) vs Responsive (R) attribute exports on AKN and JSON. They are identical.
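One way such an identity check could be reproduced is sketched below; it is not necessarily how the comparison above was carried out. JSON is compared as parsed values, and XML is compared in canonical (C14N 2.0) form so that attribute order and indentation do not produce false differences. The sample strings and the function names are hypothetical, and `ET.canonicalize` requires Python 3.8 or later.

```python
import json
import xml.etree.ElementTree as ET


def same_json(text_a: str, text_b: str) -> bool:
    """Compare parsed JSON values, ignoring key order and whitespace."""
    return json.loads(text_a) == json.loads(text_b)


def same_xml(text_a: str, text_b: str) -> bool:
    """Compare C14N 2.0 canonical forms, ignoring attribute order and
    whitespace around text content."""
    canon_a = ET.canonicalize(text_a, strip_text=True)
    canon_b = ET.canonicalize(text_b, strip_text=True)
    return canon_a == canon_b


# Hypothetical stand-ins for a Classic (C) and a Responsive (R) export.
classic_json = '{"idNir": "A1", "abrogato": false}'
responsive_json = '{ "abrogato": false, "idNir": "A1" }'
classic_xml = '<article eId="art_1" class="x"><num>1</num></article>'
responsive_xml = '<article class="x" eId="art_1">\n  <num>1</num>\n</article>'

print(same_json(classic_json, responsive_json))
print(same_xml(classic_xml, responsive_xml))
```

Note that `strip_text=True` also ignores leading and trailing whitespace inside text nodes, which is a looser notion of "identical" than a byte-for-byte comparison.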