
Vault Structure

The vault is the single root for every stored file and the SQLite database. It mounts as a Docker volume at /vault inside the container.

[Figure: vault layout (inbox/ watched drop zone, patients/ organized records, unclassified/ for documents with no patient resolved, asclepius.sqlite as source of truth) and the file naming scheme {YYYYMMDD}_{provider}_{doctype}.{ext}. The same naming scheme is used on disk and in the UI.]
vault/
├── inbox/
│   ├── alex-smith/                  # Per-patient when upload knows the patient
│   │   └── my-upload.pdf
│   ├── user-1/                      # Per-user fallback when no patient yet
│   │   └── unassigned-doc.pdf
│   └── user-2/
├── patients/
│   ├── alex-smith/
│   │   ├── 2023/
│   │   │   ├── sleep-apnea-treatment/    # Medical event folder
│   │   │   │   ├── 20231017_st-marys-hospital_surgical-report.pdf
│   │   │   │   ├── 20231031_st-marys-hospital_specialist-report.pdf
│   │   │   │   └── 20231115_st-marys-hospital_invoice.pdf
│   │   │   └── 20230315_city-clinic_lab-test.pdf    # No event
│   │   ├── 2024/
│   │   │   ├── knee-injury/
│   │   │   │   ├── 20240722_riverside-clinic_radiology-report.pdf
│   │   │   │   └── 20240801_riverside-clinic_specialist-report.pdf
│   │   │   └── 20240722_riverside-clinic_ct-abdomen/    # imaging study folder
│   │   │       ├── series-001/                          # peer of PDFs in {year}/
│   │   │       │   ├── 00001.dcm
│   │   │       │   └── ...
│   │   │       └── series-002/
│   │   ├── 2025/
│   │   │   └── 20250110_dr-jones_prescription.pdf
│   │   └── imaging-bundles/         # auxiliary imaging files
│   │       └── exam-AA387249Z07/    # one folder per zip upload
│   │           ├── DICOMDIR
│   │           ├── image_s0001_i0001.jpg    # JPEG previews
│   │           └── LOCKFILE
│   └── jordan-lee/
│       └── ...
├── unclassified/
│   ├── user-1/                      # Per-user unclassified bucket
│   │   └── 20240501_invoice.pdf
│   └── user-2/
└── asclepius.sqlite                 # SQLite database

inbox/ is split into sub-folders so that each patient or user gets an isolated drop zone. When the upload form names a patient (the common case), the sub-folder is the patient slug, e.g. inbox/alex-smith/, so a shell-level ls inbox/ reads as a human roster. When there is no patient yet, the sub-folder falls back to user-<id>/. Uploads via POST /api/documents/upload write a .user_hint sidecar so the pipeline can stamp uploaded_by_user_id regardless of the folder name.

The file watcher is recursive, so existing files and new drops in any sub-folder are picked up; empty inbox folders are swept after every successful pipeline tick. Legacy files at the flat inbox/<name> or unclassified/<name> paths continue to work and are admin-only. unclassified/ keeps the user-<id>/ per-user split so each user’s unassigned queue stays isolated.
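
The sub-folder choice described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `inbox_subfolder` and its parameters are hypothetical, and the .user_hint sidecar is left out because its format is not described here.

```python
from pathlib import PurePosixPath
from typing import Optional


def inbox_subfolder(user_id: int, patient_slug: Optional[str] = None) -> PurePosixPath:
    """Return the vault-relative inbox sub-folder for an upload.

    Hypothetical helper: prefers the patient slug when the upload form
    named a patient, otherwise falls back to the per-user folder.
    """
    name = patient_slug if patient_slug else f"user-{user_id}"
    return PurePosixPath("inbox") / name


print(inbox_subfolder(1, "alex-smith"))  # inbox/alex-smith
print(inbox_subfolder(1))                # inbox/user-1
```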

Documents assigned to a medical event are organized into an event subfolder within the year. Documents without an event remain directly in the year folder.

Imaging studies are filed as peers of regular document files under the year folder: the study folder (e.g. 20240722_riverside-clinic_ct-abdomen/) takes the place a single PDF would, with series-N/ subfolders for the DICOM frames. Auxiliary files extracted from the same zip upload (DICOMDIR, JPEG previews, LOCKFILE, VERSION) live at the patient level under imaging-bundles/{zip-stem}/ and are surfaced via GET /api/imaging/{id}/bundle-files. The file browser hides imaging-bundles/ when navigating inside a patient directory so the year folders stay tidy.

Files are renamed during organization to:

{YYYYMMDD}_{provider-slug}_{doctype}.{ext}
  • Date. Compact date format (e.g., 20251231) as extracted by the LLM.
  • Provider slug. Facility slug (preferred) or doctor slug, lowercase with hyphens.
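  • Doc type. One of the document type codes (e.g., lab_test, prescription, specialist_report). Underscores in the code are written as hyphens in the filename, as the examples show (lab_test appears as lab-test).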

Examples:

  • 20240315_city-clinic_lab-test.pdf
  • 20250110_dr-jones_prescription.pdf
  • 20241120_university-hospital_discharge.pdf
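
A minimal sketch of how such a filename could be assembled. The function name `build_filename` is hypothetical, and the underscore-to-hyphen conversion of the type code is an assumption inferred from the examples above rather than a documented rule.

```python
from datetime import date


def build_filename(doc_date: date, provider_slug: str, doctype: str, ext: str) -> str:
    """Build {YYYYMMDD}_{provider-slug}_{doctype}.{ext} (hypothetical helper)."""
    # Assumption: underscores in the type code become hyphens, as in the
    # examples (lab_test -> lab-test), so "_" only separates the three parts.
    doctype_part = doctype.replace("_", "-")
    clean_ext = ext.lstrip(".").lower()
    return f"{doc_date:%Y%m%d}_{provider_slug}_{doctype_part}.{clean_ext}"


print(build_filename(date(2024, 3, 15), "city-clinic", "lab_test", "pdf"))
# 20240315_city-clinic_lab-test.pdf
```
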
  1. Files move once on ingest, from inbox/{patient-slug | user-<id>}/ to their final spot in patients/{slug}/{year}/. After that, the path doesn’t change unless the user moves the file via the file browser.
  2. The file browser can move files. The Move action on each row calls POST /api/vault/move, which renames the file on disk and rewrites the matching documents.file_path, imaging_studies.folder_path, and imaging_series.folder_path rows in lockstep so the document reference stays intact. Use it to fix files that landed in the wrong date / event folder.
  3. The database is the source of truth. Paths in documents.file_path are relative to the vault root. Imaging studies use imaging_studies.folder_path for the study folder; the parent documents.file_path points at the radiology report PDF (or is empty when the study has only a placeholder report).
  4. Imaging files keep their DICOM structure. Series folders contain the original .dcm files with their series instance UIDs. Files extracted from a zip without a .dcm extension are auto-renamed to .dcm once the DICM magic bytes at byte offset 128 (right after the 128-byte preamble) are verified.
  5. Unclassified documents land in vault/unclassified/user-<id>/ when the pipeline can’t figure out the patient. Legacy rows without uploaded_by_user_id fall back to the flat unclassified/ directory and stay admin-only.
  6. Per-user scope. Non-admin users see only their own patients and their own inbox/ and unclassified/ subfolders in the file browser and document lists. Admins see everything.
  7. One imaging study, one document. A 35-frame ultrasound creates one documents row (the radiology report PDF, or a placeholder until one is attached) and one imaging_studies row, not 35 of each. The DICOM frames are on disk under the study folder; only the report has a documents.file_path.
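
The DICM check from rule 4 can be sketched as below. The function names are hypothetical and the renaming policy (append .dcm to anything that passes the check and does not already end in .dcm) is an assumption; the only fixed fact is that a DICOM Part 10 file carries the four bytes DICM at offset 128.

```python
from pathlib import Path

DICM_OFFSET = 128     # length of the DICOM file preamble
DICM_MAGIC = b"DICM"  # prefix that follows the preamble


def looks_like_dicom(path: Path) -> bool:
    """Return True if the file has the DICM prefix at byte offset 128."""
    with path.open("rb") as fh:
        fh.seek(DICM_OFFSET)
        return fh.read(len(DICM_MAGIC)) == DICM_MAGIC


def ensure_dcm_extension(path: Path) -> Path:
    """Append .dcm to a verified DICOM file that lacks it (hypothetical policy)."""
    if path.suffix.lower() != ".dcm" and looks_like_dicom(path):
        # Path.rename returns the new path on Python 3.8+
        return path.rename(path.with_name(path.name + ".dcm"))
    return path
```

Files shorter than 132 bytes simply fail the check, because the read past the end returns fewer than four bytes.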

Each patient has a URL-safe slug derived from their display name:

  • “Alex Smith” becomes alex-smith
  • The slug is globally unique (used for the filesystem directory name and joins). When two users independently create a patient with the same display name, the second gets an auto-disambiguated slug (alex-smith, then alex-smith-2, etc.). display_name is allowed to repeat across users; the slug is an internal handle, not something the UI surfaces for editing.
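
One way the slug derivation and disambiguation could look, as a sketch only. The exact normalization rules are not stated here, so the ASCII folding, the "patient" fallback for empty results, and the function names are all assumptions.

```python
import re
import unicodedata
from typing import Set


def slugify(display_name: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    folded = unicodedata.normalize("NFKD", display_name)
    ascii_name = folded.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return slug or "patient"  # assumed fallback for names with no usable characters


def unique_slug(display_name: str, taken: Set[str]) -> str:
    """Return the base slug, or base-2, base-3, ... if it is already taken."""
    base = slugify(display_name)
    if base not in taken:
        return base
    n = 2
    while f"{base}-{n}" in taken:
        n += 1
    return f"{base}-{n}"


print(unique_slug("Alex Smith", set()))           # alex-smith
print(unique_slug("Alex Smith", {"alex-smith"}))  # alex-smith-2
```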

Files are deduplicated by SHA-256 hash (documents.file_hash). If a file with the same hash already exists in the database:

  • During pipeline processing: the file is skipped and deleted from the inbox.
  • During upload: the database INSERT is ignored (file_hash has a UNIQUE constraint).
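
The upload-side behavior can be illustrated with SQLite's INSERT OR IGNORE against a UNIQUE column. This is a sketch under stated assumptions: the two-column documents table below is a stand-in for the real schema, and the helper names are hypothetical.

```python
import hashlib
import sqlite3


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with a minimal, assumed documents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " file_path TEXT,"
        " file_hash TEXT UNIQUE)"
    )
    return conn


def insert_document(conn: sqlite3.Connection, file_path: str, data: bytes) -> bool:
    """Insert a row unless the hash already exists; return True if inserted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO documents (file_path, file_hash) VALUES (?, ?)",
        (file_path, sha256_of(data)),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the duplicate was ignored


conn = open_demo_db()
print(insert_document(conn, "inbox/user-1/a.pdf", b"same bytes"))  # True
print(insert_document(conn, "inbox/user-1/b.pdf", b"same bytes"))  # False
```

A pipeline-side check would look up the hash first and delete the inbox file on a match; that step is omitted here because it touches the filesystem.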