Asclepius keeps all structured data in SQLite, with WAL (Write-Ahead Logging) for safe concurrent reads during pipeline writes and FTS5 for full-text search. The database file lives at vault/asclepius.sqlite.
The diagram above is the core hub-and-spoke shape: documents in the middle, patients as the access boundary, and the medical-data tables (lab_results, medications, encounters, imaging_studies) hanging off both. Normalization tables (norm_lab_tests, etc.), audit logs, sessions, FTS triggers, and the per-page OCR cache are documented in the Table Details section below.
Individual lab test results with values, units, and reference ranges
encounters
Clinical encounters with diagnoses, findings, and follow-up instructions
medications
Prescribed medications with dosage, frequency, and duration
vaccinations
Vaccination records with manufacturer, lot number, and dose
imaging_studies
Imaging study metadata (modality, body part, accession, study UID, report_status). Doctor + facility are foreign keys kept in lockstep with the parent documents row via the imaging_studies_doctor_sync / _facility_sync AFTER UPDATE triggers, do not write them here directly.
imaging_series
Individual series within an imaging study (series_number, series_instance_uid, num_images, folder_path)
invoice_items
Line items from medical invoices with amounts and tariff codes
Per-document pipeline stage timeline. One row per stage transition (stage, status, job_kind, started_at, finished_at, message, optional page_current / page_total). Persisted across runs so the document detail page can show every upload + reprocess this doc has ever been through, with durations and error messages. Cascades on documents delete.
region_translations
One row per ad-hoc OCR + translate run on a user-selected rectangle of the PDF. Columns: id, document_id (FK + cascade), page, bbox_x / bbox_y / bbox_w / bbox_h (normalized [0,1] coords so the rectangle re-maps correctly even if the PDF is re-rendered at a different DPI), ocr_text, translated_text, ocr_provider_id, llm_provider_id, llm_model, thumbnail_path (vault-relative path under region_translations/{doc_id}/), created_at. Indexed on (document_id, id DESC) for the cards-newest-first render. Independent of the whole-document translate flow, which writes to documents.ocr_text_en.
FTS5 virtual table for full-text search across OCR text and raw extractions
sessions
Server-side session records (id, user, IP, user-agent, last-seen, expiry). Backs the admin session-list / revoke UI and replaces the older cookie-only session model.
audit_log
Structured audit trail for admin actions (user create/delete, session revoke, settings mutations). Surfaced in the Settings → Audit Log view.
ocr_page_cache
Per-page OCR text keyed by (document_id, page_number). Populated during OCR so the extractor and chunking pipeline can read individual pages without re-running OCR.
Shared credentials (URL + API key + concurrency + retry policy) and LLM/OCR/Vision provider entries live in config/settings.yaml, not the SQLite database. Asclepius mutates that file at runtime when you edit providers/credentials from the UI. There is no credentials or providers table.
Deduplication. Documents have a unique file_hash (SHA-256) to prevent duplicate imports.
Names live in one place. Doctor and facility names are stored only on doctors / facilities. documents just keeps doctor_id / facility_id, and readers JOIN to get the display name. Renaming a doctor once updates every document that references them.
Dates. Each document carries event_date (the canonical timeline anchor, when the medical event actually happened) and issued_date (when the document was produced administratively). The timeline, chart, and search views all key off event_date.
Child rows stay in sync.encounters.doctor_id / encounters.facility_id and imaging_studies.doctor_id / imaging_studies.facility_id are kept in lockstep with the parent document through AFTER UPDATE triggers, so moving a document to a different doctor updates its child rows in the same transaction.
One imaging study, one document. A 35-frame DICOM bundle creates ONE documents row and ONE imaging_studies row, not 35 of each. The parent document is the radiology report PDF: either a real PDF the user attached (doc_type='imaging_report', file_path points at the PDF) or a placeholder (file_path='') waiting to be populated. imaging_studies.report_status is the denormalised flag (placeholder | attached) the imaging list sorts and filters on. The DICOM frames live on disk under imaging_studies.folder_path; only the report has a documents.file_path.
Cached translation. When the user clicks Translate on the document detail page, the English rendering of the OCR body is stored on documents.ocr_text_en, with ocr_text_en_translated_at and ocr_text_en_model (the model id only, not the full provider label) for context. OCR is never re-run, only the cached ocr_text is sent through the LLM. Structured fields (names, dates, codes, lab values) stay in the source language so normalization keeps working.
Cascading deletes. Deleting a document cascades to all child records (lab results, encounters, medications, etc.).
FTS triggers. Insert/update/delete triggers keep the FTS5 index in sync with the documents table automatically.
WAL mode. Enabled at connection time for concurrent reads during pipeline writes.