How a PDS becomes a gist. page
The process, end to end. We publish this so anyone can audit our work, and so insurers can predict what we'll do with their next reissue.
1. Source acquisition
We download the PDS straight from the insurer's public website (no paywalls, no logins). Each download is recorded with a SHA-256 hash and a timestamp; both are visible on the rendered page. We re-download whenever the insurer reissues.
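As a rough illustration of the hash-and-timestamp record described above, here is a minimal sketch. The function names and the record fields (url, sha256, fetchedAt, bytes) are assumptions made for this example, not the names used in our repository.

```javascript
// CommonJS import so the sketch runs as a standalone script;
// in an .mjs module, use: import { createHash } from "node:crypto";
const { createHash } = require("node:crypto");

// Build a provenance record for one downloaded PDS.
// Field names here are illustrative assumptions.
function provenanceRecord(url, pdfBytes, fetchedAt = new Date()) {
  const sha256 = createHash("sha256").update(pdfBytes).digest("hex");
  return {
    url,
    sha256,
    fetchedAt: fetchedAt.toISOString(),
    bytes: pdfBytes.length,
  };
}

// A reissue shows up as a different hash for the same URL.
function hasChanged(previous, current) {
  return previous.sha256 !== current.sha256;
}
```

Comparing the stored hash with the hash of a fresh download is one simple way to notice that the insurer has replaced the file, even when the URL stays the same.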
2. Extraction
We extract the text page by page using pdftotext -layout. The page-level text is committed to the repository so anyone can diff successive versions of the same PDS.
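One way to get from a single pdftotext run to page-level files is to split the output on form feeds, which pdftotext writes at the end of each page unless -nopgbrk is passed. The sketch below assumes that approach; splitPages, pageFileName and the file naming pattern are hypothetical.

```javascript
// Split pdftotext output into pages. pdftotext ends each page with a
// form feed (\f) unless the -nopgbrk option is used.
function splitPages(text) {
  const parts = text.split("\f");
  // The final form feed leaves an empty trailing chunk; drop it.
  if (parts.length > 0 && parts[parts.length - 1].trim() === "") {
    parts.pop();
  }
  // Blank pages in the middle are kept so page numbers still match the PDF.
  return parts.map((body, index) => ({ page: index + 1, text: body }));
}

// Hypothetical per-page file name, zero-padded so files sort in page order.
function pageFileName(page) {
  return `page-${String(page).padStart(3, "0")}.txt`;
}
```

Keeping one file per page means a diff between two versions of a PDS points at the page that changed, not at an offset in one long file.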
3. Structuring
We translate the extracted text into a canonical machine-readable shape — see pds.schema.json. Every section captures its title, source page numbers, a one-line plain-English summary, and any structured data (cover limits, what's covered and what isn't, definitions). The structured JSON is the source of truth for both the rendered page and the chatbot.
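To make the shape of a section concrete, here is a small sketch of a section record and a basic check on it. The authoritative field names are in pds.schema.json; the names used here (title, pages, summary, data) and the checkSection helper are assumptions for illustration only.

```javascript
// Hypothetical section record; the real field names live in pds.schema.json.
const exampleSection = {
  title: "Example section",
  pages: [14, 15],
  summary: "Placeholder one-line summary.",
  data: { limits: [], covered: [], notCovered: [], definitions: [] },
};

// Return a list of problems; an empty list means the section passes.
function checkSection(section) {
  const problems = [];
  if (typeof section.title !== "string" || section.title.trim() === "") {
    problems.push("title is missing");
  }
  const pagesOk =
    Array.isArray(section.pages) &&
    section.pages.length > 0 &&
    section.pages.every((p) => Number.isInteger(p) && p >= 1);
  if (!pagesOk) {
    problems.push("pages must be a non-empty list of positive integers");
  }
  if (typeof section.summary !== "string" || section.summary.trim() === "") {
    problems.push("summary is missing");
  } else if (section.summary.includes("\n")) {
    problems.push("summary must be a single line");
  }
  return problems;
}
```

A check like this is no substitute for validating against the schema itself; it only shows the kind of constraints the text above implies.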
4. Editorial pass
An editorial reviewer checks each plain-English summary against the source PDS, aiming for a reading that is neither generous nor punitive. Where the source is genuinely ambiguous, we say so and link to the page. We don't make the cover sound better or worse than it is.
5. Rendering
The shared renderer (shared/tools/render.mjs) takes the structured JSON, applies the insurer's brand colours via CSS variables, and emits a single self-contained HTML page. Every page includes the gist. independence banner, the source-document attribution, and a link back to the insurer's PDF.
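The rendering step can be pictured roughly as follows. This is not the code in shared/tools/render.mjs; it is a simplified sketch, and the function names, the --brand-* variable naming, and the doc fields (title, brand, banner, sourceUrl, sections) are all assumptions.

```javascript
// Escape text before placing it inside HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a palette such as { primary: "#003366" } into CSS custom properties.
function brandCss(palette) {
  const decls = Object.entries(palette)
    .map(([name, value]) => `--brand-${name}: ${value};`)
    .join(" ");
  return `:root { ${decls} }`;
}

// Emit one self-contained page: inline styles, no external assets.
function renderPage(doc) {
  const sections = doc.sections
    .map(
      (s) =>
        `<section><h2>${escapeHtml(s.title)}</h2><p>${escapeHtml(s.summary)}</p></section>`
    )
    .join("\n");
  return [
    "<!doctype html>",
    `<html lang="en"><head><meta charset="utf-8"><title>${escapeHtml(doc.title)}</title>`,
    `<style>${brandCss(doc.brand)} h2 { color: var(--brand-primary); }</style></head><body>`,
    `<p class="banner">${escapeHtml(doc.banner)}</p>`,
    sections,
    `<p><a href="${escapeHtml(doc.sourceUrl)}">Source PDF</a></p>`,
    "</body></html>",
  ].join("\n");
}
```

Because the colours are set once as custom properties, the same template can serve every insurer, with only the palette changing between pages.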
6. Chatbot grounding
The same structured JSON powers the embedded chatbot. The chatbot uses an offline keyword router by default, and an LLM backend if configured. Every answer cites the source page so a user can verify against the original.
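For a sense of what an offline keyword router can look like, the sketch below picks the section whose title and summary share the most words with the question, then returns that section's summary along with its source pages. The scoring rule, the stop-word list and the function names are assumptions for this example; the router we ship may work differently.

```javascript
// Small illustrative stop-word list; a real one would be longer.
const STOP_WORDS = new Set([
  "a", "an", "the", "is", "am", "i", "my", "for", "of", "to", "and", "what", "does", "do",
]);

function tokenize(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.filter((w) => !STOP_WORDS.has(w));
}

// Pick the section sharing the most distinct words with the question.
// Returns the summary plus the source pages so the answer can be cited.
function route(question, sections) {
  const queryWords = new Set(tokenize(question));
  let best = null;
  let bestScore = 0;
  for (const section of sections) {
    const haystack = new Set(tokenize(`${section.title} ${section.summary}`));
    let score = 0;
    for (const word of queryWords) {
      if (haystack.has(word)) score += 1;
    }
    if (score > bestScore) {
      best = section;
      bestScore = score;
    }
  }
  if (best === null) return { answer: null, pages: [] };
  return { answer: best.summary, pages: best.pages };
}
```

Returning no answer when nothing matches, instead of guessing, keeps every reply traceable to a page in the source PDS.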
7. Feedback loop
The chatbot logs questions (not who asked them) and thumbs-up/down feedback. shared/feedback-loop/analyse.mjs ranks topics that drive the most questions and the most thumbs-down, and proposes a concrete editorial change. The same script reads public review-site data (where it's available) and joins it on the same topic taxonomy.
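The ranking step might be sketched like this. It is not the code in shared/feedback-loop/analyse.mjs: the event shape ({ topic, feedback }) and the ordering (thumbs-down count first, then question count) are assumptions chosen to keep the example small.

```javascript
// Each event is one logged question: { topic, feedback },
// where feedback is "up", "down", or null. No user identity is stored.
function rankTopics(events) {
  const byTopic = new Map();
  for (const { topic, feedback } of events) {
    const row = byTopic.get(topic) ?? { topic, questions: 0, thumbsDown: 0 };
    row.questions += 1;
    if (feedback === "down") row.thumbsDown += 1;
    byTopic.set(topic, row);
  }
  // Assumed ordering: most thumbs-down first, then most questions,
  // then topic name so the result is stable.
  return [...byTopic.values()].sort(
    (a, b) =>
      b.thumbsDown - a.thumbsDown ||
      b.questions - a.questions ||
      a.topic.localeCompare(b.topic)
  );
}
```

The topic at the top of the list is the natural candidate for the next editorial change.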
8. Reissue
When an insurer reissues their PDS, we re-run extraction and diff against the previous version. Materially different cover triggers a major version bump on the rendered page; copy edits trigger a patch. The full history is in CHANGELOG.md.
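The reissue step can be sketched as a page-level comparison followed by a version bump. The sketch assumes a three-part major.minor.patch version string and leaves the material-versus-copy decision to the reviewer, passed in as an argument; changedPages and bumpVersion are hypothetical names.

```javascript
// Compare two page-text arrays (index 0 is page 1) and list the
// page numbers that differ, including pages added or removed.
function changedPages(prevPages, nextPages) {
  const count = Math.max(prevPages.length, nextPages.length);
  const changed = [];
  for (let i = 0; i < count; i++) {
    if (prevPages[i] !== nextPages[i]) changed.push(i + 1);
  }
  return changed;
}

// Bump an assumed major.minor.patch version string.
// "material": cover changed, so bump major and reset the rest.
// "copy": wording only, so bump patch.
function bumpVersion(version, change) {
  const [major, minor, patch] = version.split(".").map(Number);
  if (change === "material") return `${major + 1}.0.0`;
  if (change === "copy") return `${major}.${minor}.${patch + 1}`;
  throw new Error(`unknown change type: ${change}`);
}
```

The list of changed pages tells the reviewer where to look; whether a change is material remains an editorial judgement, not something the diff decides.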
What we won't do
- Rank, score, or compare insurers against each other
- Take referral fees or affiliate commissions
- Reproduce a PDS verbatim — we summarise, and link
- Provide personal financial advice
- Modify a translation in exchange for payment