Database Schema

Asclepius keeps all structured data in SQLite, with WAL (Write-Ahead Logging) for safe concurrent reads during pipeline writes and FTS5 for full-text search. The database file lives at vault/asclepius.sqlite.

The diagram above is the core hub-and-spoke shape: documents in the middle, patients as the access boundary, and the medical-data tables (lab_results, medications, encounters, imaging_studies) hanging off both. Normalization tables (norm_lab_tests, etc.), audit logs, sessions, FTS triggers, and the per-page OCR cache are documented in the Table Details section below.

Table Details

Core Tables

Table	Purpose
`users`	User accounts with bcrypt password hashes
`patients`	Patient demographics; deliberately minimal (name, DOB, sex) since only those fields are passed to the LLM for extraction
`user_patient_access`	Maps users to patients with role (`owner` or `viewer`)
`documents`	Central document records with metadata, OCR text, and extraction results

Medical Data Tables

Table	Purpose
`lab_results`	Individual lab test results with values, units, and reference ranges
`encounters`	Clinical encounters with diagnoses, findings, and follow-up instructions
`medications`	Prescribed medications with dosage, frequency, and duration
`vaccinations`	Vaccination records with manufacturer, lot number, and dose
`imaging_studies`	Imaging study metadata (modality, body part, accession, study UID, `report_status`). Doctor + facility are foreign keys kept in lockstep with the parent `documents` row via the `imaging_studies_doctor_sync` / `_facility_sync` AFTER UPDATE triggers, do not write them here directly.
`imaging_series`	Individual series within an imaging study (series_number, series_instance_uid, num_images, folder_path)
`invoice_items`	Line items from medical invoices with amounts and tariff codes

Organization Tables

Table	Purpose
`medical_events`	Medical events (diagnosis, surgery, treatment) that group related documents
`document_event_links`	Many-to-many links between documents and events with relevance level
`document_links`	Direct links between related documents (e.g., invoice_for, follow_up)
`document_sections`	Page-level sections for large documents with per-section OCR and extraction
`facilities`	Healthcare facilities (hospitals, clinics, labs) with normalization support
`facility_aliases`	Name aliases for facilities (for normalization/merge)
`doctors`	Doctors with specialty and facility affiliation, with normalization support
`doctor_aliases`	Name aliases for doctors (for normalization/merge)

Normalization Tables

Table	Purpose
`norm_lab_tests` + `norm_lab_test_aliases`	Canonical lab test names with multi-language aliases
`norm_specialties` + `norm_specialty_aliases`	Medical specialties with aliases
`norm_diagnoses` + `norm_diagnosis_aliases`	Diagnosis codes (ICD-10) with aliases
`norm_medications` + `norm_medication_aliases`	Medication names (ATC codes) with aliases

Learning Tables

Table	Purpose
`extraction_corrections`	Tracks user edits to LLM-extracted fields (before/after values) for correction-driven learning

Pipeline Tables

Table Purpose

document_stage_events Per-document pipeline stage timeline. One row per stage transition (stage, status, job_kind, started_at, finished_at, message, optional page_current / page_total). Persisted across runs so the document detail page can show every upload + reprocess this doc has ever been through, with durations and error messages. Cascades on documents delete.

region_translations One row per ad-hoc OCR + translate run on a user-selected rectangle of the PDF. Columns: id, document_id (FK + cascade), page, bbox_x / bbox_y / bbox_w / bbox_h (normalized [0,1] coords so the rectangle re-maps correctly even if the PDF is re-rendered at a different DPI), ocr_text, translated_text, ocr_provider_id, llm_provider_id, llm_model, thumbnail_path (vault-relative path under region_translations/{doc_id}/), created_at. Indexed on (document_id, id DESC) for the cards-newest-first render. Independent of the whole-document translate flow, which writes to documents.ocr_text_en.

Table	Purpose
`document_stage_events`	Per-document pipeline stage timeline. One row per stage transition (`stage`, `status`, `job_kind`, `started_at`, `finished_at`, `message`, optional `page_current` / `page_total`). Persisted across runs so the document detail page can show every upload + reprocess this doc has ever been through, with durations and error messages. Cascades on `documents` delete.
`region_translations`	One row per ad-hoc OCR + translate run on a user-selected rectangle of the PDF. Columns: `id`, `document_id` (FK + cascade), `page`, `bbox_x` / `bbox_y` / `bbox_w` / `bbox_h` (normalized `[0,1]` coords so the rectangle re-maps correctly even if the PDF is re-rendered at a different DPI), `ocr_text`, `translated_text`, `ocr_provider_id`, `llm_provider_id`, `llm_model`, `thumbnail_path` (vault-relative path under `region_translations/{doc_id}/`), `created_at`. Indexed on `(document_id, id DESC)` for the cards-newest-first render. Independent of the whole-document translate flow, which writes to `documents.ocr_text_en`.

System Tables

Table	Purpose
`chat_history`	Persisted chat messages per user and patient
`custom_prompts`	User-customized LLM prompts (overrides defaults)
`documents_fts`	FTS5 virtual table for full-text search across OCR text and raw extractions
`sessions`	Server-side session records (id, user, IP, user-agent, last-seen, expiry). Backs the admin session-list / revoke UI and replaces the older cookie-only session model.
`audit_log`	Structured audit trail for admin actions (user create/delete, session revoke, settings mutations). Surfaced in the Settings → Audit Log view.
`ocr_page_cache`	Per-page OCR text keyed by `(document_id, page_number)`. Populated during OCR so the extractor and chunking pipeline can read individual pages without re-running OCR.

Configuration (not in the database)

Shared credentials (URL + API key + concurrency + retry policy) and LLM/OCR/Vision provider entries live in config/settings.yaml, not the SQLite database. Asclepius mutates that file at runtime when you edit providers/credentials from the UI. There is no credentials or providers table.

Key Design Notes

Deduplication. Documents have a unique file_hash (SHA-256) to prevent duplicate imports.
Names live in one place. Doctor and facility names are stored only on doctors / facilities. documents just keeps doctor_id / facility_id, and readers JOIN to get the display name. Renaming a doctor once updates every document that references them.
Dates. Each document carries event_date (the canonical timeline anchor, when the medical event actually happened) and issued_date (when the document was produced administratively). The timeline, chart, and search views all key off event_date.
Child rows stay in sync. encounters.doctor_id / encounters.facility_id and imaging_studies.doctor_id / imaging_studies.facility_id are kept in lockstep with the parent document through AFTER UPDATE triggers, so moving a document to a different doctor updates its child rows in the same transaction.
One imaging study, one document. A 35-frame DICOM bundle creates ONE documents row and ONE imaging_studies row, not 35 of each. The parent document is the radiology report PDF: either a real PDF the user attached (doc_type='imaging_report', file_path points at the PDF) or a placeholder (file_path='') waiting to be populated. imaging_studies.report_status is the denormalised flag (placeholder | attached) the imaging list sorts and filters on. The DICOM frames live on disk under imaging_studies.folder_path; only the report has a documents.file_path.
Cached translation. When the user clicks Translate on the document detail page, the English rendering of the OCR body is stored on documents.ocr_text_en, with ocr_text_en_translated_at and ocr_text_en_model (the model id only, not the full provider label) for context. OCR is never re-run, only the cached ocr_text is sent through the LLM. Structured fields (names, dates, codes, lab values) stay in the source language so normalization keeps working.
Cascading deletes. Deleting a document cascades to all child records (lab results, encounters, medications, etc.).
FTS triggers. Insert/update/delete triggers keep the FTS5 index in sync with the documents table automatically.
WAL mode. Enabled at connection time for concurrent reads during pipeline writes.