Skip to content

Database Schema

Asclepius keeps all structured data in SQLite, with WAL (Write-Ahead Logging) for safe concurrent reads during pipeline writes and FTS5 for full-text search. The database file lives at vault/asclepius.sqlite.

1—N UPLOADED M—N LINKS 1—N USERS users id · username · role password_hash PATIENT patients id · slug · display_name date_of_birth · sex EVENTS medical_events title · type · date range + document_event_links CORE documents id · patient_id · file_path file_hash · doc_type event_date · issued_date · status ocr_text · raw_extraction ocr_engine · llm_provider uploaded_by_user_id doctor_id · facility_id event_id · summary_en user_notes · tags + documents_fts (FTS5) + document_sections · links LABS lab_results value · unit · ranges norm_lab_test_id RX medications brand · ingredient dosage · frequency VISITS encounters diagnosis · findings follow-up IMAGING imaging_studies modality · body_part report_status · folder_path + imaging_series · DICOM

The diagram above is the core hub-and-spoke shape: documents in the middle, patients as the access boundary, and the medical-data tables (lab_results, medications, encounters, imaging_studies) hanging off both. Normalization tables (norm_lab_tests, etc.), audit logs, sessions, FTS triggers, and the per-page OCR cache are documented in the Table Details section below.

TablePurpose
usersUser accounts with bcrypt password hashes
patientsPatient demographics; deliberately minimal (name, DOB, sex) since only those fields are passed to the LLM for extraction
user_patient_accessMaps users to patients with role (owner or viewer)
documentsCentral document records with metadata, OCR text, and extraction results
TablePurpose
lab_resultsIndividual lab test results with values, units, and reference ranges
encountersClinical encounters with diagnoses, findings, and follow-up instructions
medicationsPrescribed medications with dosage, frequency, and duration
vaccinationsVaccination records with manufacturer, lot number, and dose
imaging_studiesImaging study metadata (modality, body part, accession, study UID, report_status). Doctor + facility are foreign keys kept in lockstep with the parent documents row via the imaging_studies_doctor_sync / _facility_sync AFTER UPDATE triggers, do not write them here directly.
imaging_seriesIndividual series within an imaging study (series_number, series_instance_uid, num_images, folder_path)
invoice_itemsLine items from medical invoices with amounts and tariff codes
TablePurpose
medical_eventsMedical events (diagnosis, surgery, treatment) that group related documents
document_event_linksMany-to-many links between documents and events with relevance level
document_linksDirect links between related documents (e.g., invoice_for, follow_up)
document_sectionsPage-level sections for large documents with per-section OCR and extraction
facilitiesHealthcare facilities (hospitals, clinics, labs) with normalization support
facility_aliasesName aliases for facilities (for normalization/merge)
doctorsDoctors with specialty and facility affiliation, with normalization support
doctor_aliasesName aliases for doctors (for normalization/merge)
TablePurpose
norm_lab_tests + norm_lab_test_aliasesCanonical lab test names with multi-language aliases
norm_specialties + norm_specialty_aliasesMedical specialties with aliases
norm_diagnoses + norm_diagnosis_aliasesDiagnosis codes (ICD-10) with aliases
norm_medications + norm_medication_aliasesMedication names (ATC codes) with aliases
TablePurpose
extraction_correctionsTracks user edits to LLM-extracted fields (before/after values) for correction-driven learning
TablePurpose
document_stage_eventsPer-document pipeline stage timeline. One row per stage transition (stage, status, job_kind, started_at, finished_at, message, optional page_current / page_total). Persisted across runs so the document detail page can show every upload + reprocess this doc has ever been through, with durations and error messages. Cascades on documents delete.
region_translationsOne row per ad-hoc OCR + translate run on a user-selected rectangle of the PDF. Columns: id, document_id (FK + cascade), page, bbox_x / bbox_y / bbox_w / bbox_h (normalized [0,1] coords so the rectangle re-maps correctly even if the PDF is re-rendered at a different DPI), ocr_text, translated_text, ocr_provider_id, llm_provider_id, llm_model, thumbnail_path (vault-relative path under region_translations/{doc_id}/), created_at. Indexed on (document_id, id DESC) for the cards-newest-first render. Independent of the whole-document translate flow, which writes to documents.ocr_text_en.
TablePurpose
chat_historyPersisted chat messages per user and patient
custom_promptsUser-customized LLM prompts (overrides defaults)
documents_ftsFTS5 virtual table for full-text search across OCR text and raw extractions
sessionsServer-side session records (id, user, IP, user-agent, last-seen, expiry). Backs the admin session-list / revoke UI and replaces the older cookie-only session model.
audit_logStructured audit trail for admin actions (user create/delete, session revoke, settings mutations). Surfaced in the Settings → Audit Log view.
ocr_page_cachePer-page OCR text keyed by (document_id, page_number). Populated during OCR so the extractor and chunking pipeline can read individual pages without re-running OCR.

Shared credentials (URL + API key + concurrency + retry policy) and LLM/OCR/Vision provider entries live in config/settings.yaml, not the SQLite database. Asclepius mutates that file at runtime when you edit providers/credentials from the UI. There is no credentials or providers table.

  • Deduplication. Documents have a unique file_hash (SHA-256) to prevent duplicate imports.
  • Names live in one place. Doctor and facility names are stored only on doctors / facilities. documents just keeps doctor_id / facility_id, and readers JOIN to get the display name. Renaming a doctor once updates every document that references them.
  • Dates. Each document carries event_date (the canonical timeline anchor, when the medical event actually happened) and issued_date (when the document was produced administratively). The timeline, chart, and search views all key off event_date.
  • Child rows stay in sync. encounters.doctor_id / encounters.facility_id and imaging_studies.doctor_id / imaging_studies.facility_id are kept in lockstep with the parent document through AFTER UPDATE triggers, so moving a document to a different doctor updates its child rows in the same transaction.
  • One imaging study, one document. A 35-frame DICOM bundle creates ONE documents row and ONE imaging_studies row, not 35 of each. The parent document is the radiology report PDF: either a real PDF the user attached (doc_type='imaging_report', file_path points at the PDF) or a placeholder (file_path='') waiting to be populated. imaging_studies.report_status is the denormalised flag (placeholder | attached) the imaging list sorts and filters on. The DICOM frames live on disk under imaging_studies.folder_path; only the report has a documents.file_path.
  • Cached translation. When the user clicks Translate on the document detail page, the English rendering of the OCR body is stored on documents.ocr_text_en, with ocr_text_en_translated_at and ocr_text_en_model (the model id only, not the full provider label) for context. OCR is never re-run, only the cached ocr_text is sent through the LLM. Structured fields (names, dates, codes, lab values) stay in the source language so normalization keeps working.
  • Cascading deletes. Deleting a document cascades to all child records (lab results, encounters, medications, etc.).
  • FTS triggers. Insert/update/delete triggers keep the FTS5 index in sync with the documents table automatically.
  • WAL mode. Enabled at connection time for concurrent reads during pipeline writes.