Vault Structure
The vault is the single root for every stored file and the SQLite database. It mounts as a Docker volume at /vault inside the container.
Directory Layout
vault/
├── inbox/
│   ├── alex-smith/                  # Per-patient when upload knows the patient
│   │   └── my-upload.pdf
│   ├── user-1/                      # Per-user fallback when no patient yet
│   │   └── unassigned-doc.pdf
│   └── user-2/
├── patients/
│   ├── alex-smith/
│   │   ├── 2023/
│   │   │   ├── sleep-apnea-treatment/    # Medical event folder
│   │   │   │   ├── 20231017_st-marys-hospital_surgical-report.pdf
│   │   │   │   ├── 20231031_st-marys-hospital_specialist-report.pdf
│   │   │   │   └── 20231115_st-marys-hospital_invoice.pdf
│   │   │   └── 20230315_city-clinic_lab-test.pdf    # No event
│   │   ├── 2024/
│   │   │   ├── knee-injury/
│   │   │   │   ├── 20240722_riverside-clinic_radiology-report.pdf
│   │   │   │   └── 20240801_riverside-clinic_specialist-report.pdf
│   │   │   └── 20240722_riverside-clinic_ct-abdomen/    # imaging study folder
│   │   │       ├── series-001/           # peer of PDFs in {year}/
│   │   │       │   ├── 00001.dcm
│   │   │       │   └── ...
│   │   │       └── series-002/
│   │   ├── 2025/
│   │   │   └── 20250110_dr-jones_prescription.pdf
│   │   └── imaging-bundles/              # auxiliary imaging files
│   │       └── exam-AA387249Z07/         # one folder per zip upload
│   │           ├── DICOMDIR
│   │           ├── image_s0001_i0001.jpg # JPEG previews
│   │           └── LOCKFILE
│   └── jordan-lee/
│       └── ...
├── unclassified/
│   ├── user-1/                      # Per-user unclassified bucket
│   │   └── 20240501_invoice.pdf
│   └── user-2/
└── asclepius.sqlite                 # SQLite database

inbox/ is split into per-upload sub-folders so each upload has its own
isolated dropzone. When the upload form names a patient (the common
case) the sub-folder is the patient slug, e.g. inbox/alex-smith/, so a shell-level ls inbox/ reads as a human roster. When there’s no
patient yet the sub-folder falls back to user-<id>/. Uploads via
POST /api/documents/upload write a .user_hint sidecar so the
pipeline can stamp uploaded_by_user_id regardless of the folder
naming. The file watcher is recursive, so existing files and new
drops in any sub-folder are picked up; empty inbox folders are swept
after every successful pipeline tick. Legacy files at the flat
inbox/<name> or unclassified/<name> paths continue to work and
are admin-only. unclassified/ keeps the user-<id>/ per-user split
so each user’s unassigned queue stays isolated.
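The sub-folder choice described above can be sketched as follows. This is a minimal illustration rather than the project's code: the helper name inbox_subfolder and its parameters are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def inbox_subfolder(user_id: int, patient_slug: Optional[str] = None) -> PurePosixPath:
    """Return the vault-relative inbox sub-folder for an upload (hypothetical helper)."""
    # Prefer the patient slug when the upload form named a patient.
    if patient_slug:
        return PurePosixPath("inbox") / patient_slug
    # Otherwise fall back to the per-user folder.
    return PurePosixPath("inbox") / f"user-{user_id}"
```

For example, inbox_subfolder(1, "alex-smith") gives inbox/alex-smith, while inbox_subfolder(1) gives inbox/user-1.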
Documents assigned to a medical event are organized into an event subfolder within the year. Documents without an event remain directly in the year folder.
Imaging studies are filed as peers of regular document files under
the year folder: the study folder (e.g.
20240722_riverside-clinic_ct-abdomen/) takes the place a single PDF
would, with series-N/ subfolders for the DICOM frames. Auxiliary
files extracted from the same zip upload (DICOMDIR, JPEG previews,
LOCKFILE, VERSION) live at the patient level under
imaging-bundles/{zip-stem}/ and are surfaced via
GET /api/imaging/{id}/bundle-files. The file browser hides
imaging-bundles/ when navigating inside a patient directory so the
year folders stay tidy.
File Naming Convention
Files are renamed during organization to:
{YYYYMMDD}_{provider-slug}_{doctype}.{ext}
- Date. Compact date format (e.g., 20251231) as extracted by the LLM.
- Provider slug. Facility slug (preferred) or doctor slug, lowercase with hyphens.
- Doc type. One of the document type codes (e.g., lab_test, prescription, specialist_report). In the examples below, the code appears in the file name with hyphens in place of underscores.
Examples:
20240315_city-clinic_lab-test.pdf
20250110_dr-jones_prescription.pdf
20241120_university-hospital_discharge.pdf
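A minimal sketch of how a name following this pattern could be assembled. The function build_filename is hypothetical. The underscore-to-hyphen conversion of the type code is an assumption inferred from the examples (lab_test appears as lab-test), not a documented rule.

```python
from datetime import date


def build_filename(doc_date: date, provider_slug: str, doctype: str, ext: str) -> str:
    """Assemble {YYYYMMDD}_{provider-slug}_{doctype}.{ext} (hypothetical helper)."""
    # Assumption drawn from the examples: type codes such as lab_test
    # appear in file names with hyphens (lab-test).
    doctype_part = doctype.replace("_", "-")
    # Accept the extension with or without a leading dot.
    return f"{doc_date:%Y%m%d}_{provider_slug}_{doctype_part}.{ext.lstrip('.')}"


print(build_filename(date(2024, 3, 15), "city-clinic", "lab_test", "pdf"))
# prints 20240315_city-clinic_lab-test.pdf
```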
Key Rules
- Files move once on ingest. They go from inbox/{patient-slug | user-<id>}/ to their final spot in patients/{slug}/{year}/. After that, the path doesn’t change unless the user moves it via the file browser.
- The file browser can move files. The Move action on each row calls POST /api/vault/move, which renames the file on disk and rewrites the matching documents.file_path, imaging_studies.folder_path, and imaging_series.folder_path rows in lockstep so the document reference stays intact. Use it to fix files that landed in the wrong date or event folder.
- The database is the source of truth. Paths in documents.file_path are relative to the vault root. Imaging studies use imaging_studies.folder_path for the study folder; the parent documents.file_path points at the radiology report PDF (or is empty when the study has only a placeholder report).
- Imaging files keep their DICOM structure. Series folders contain the original .dcm files with their series instance UIDs. Files extracted from a zip with no DICOM extension are auto-renamed to .dcm after the DICM marker at byte offset 128 (directly after the 128-byte preamble) is verified.
- Unclassified documents land in vault/unclassified/user-<id>/ when the pipeline can’t figure out the patient. Legacy rows without uploaded_by_user_id fall back to the flat unclassified/ directory and stay admin-only.
- Per-user scope. Non-admin users see only their own patients and their own inbox/ and unclassified/ subfolders in the file browser and document lists. Admins see everything.
- One imaging study, one document. A 35-frame ultrasound creates one documents row (the radiology report PDF, or a placeholder until one is attached) and one imaging_studies row, not 35 of each. The DICOM frames are on disk under the study folder; only the report has a documents.file_path.
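The DICOM check mentioned in the rules above can be sketched like this. A DICOM Part 10 file starts with a 128-byte preamble followed by the four bytes DICM. The helper names are hypothetical, and treating only files with no extension at all as rename candidates is an assumption made for this sketch.

```python
from pathlib import Path


def has_dicm_marker(path: Path) -> bool:
    """Return True if the four bytes at offset 128 spell DICM."""
    with path.open("rb") as fh:
        fh.seek(128)  # skip the 128-byte preamble
        return fh.read(4) == b"DICM"


def ensure_dcm_extension(path: Path) -> Path:
    """Rename an extensionless file to .dcm once the marker is verified (hypothetical)."""
    if path.suffix == "" and has_dicm_marker(path):
        target = path.with_suffix(".dcm")
        path.rename(target)
        return target
    # Leave everything else untouched.
    return path
```

Files shorter than 132 bytes simply fail the check, because reading past the end of the file returns fewer than four bytes.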
Patient Slug
Each patient has a URL-safe slug derived from their display name:
- “Alex Smith” becomes alex-smith
- The slug is globally unique (used for the filesystem directory name and joins). When two users independently create a patient with the same display name, the second gets an auto-disambiguated slug (alex-smith, then alex-smith-2, etc.). display_name is allowed to repeat across users; the slug is an internal handle, not something the UI surfaces for editing.
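One way the slug derivation and disambiguation could work is sketched below. The function names, the ASCII folding step, and the in-memory set of taken slugs are all assumptions made for illustration. Only the alex-smith, alex-smith-2 numbering comes from the description above.

```python
import re
import unicodedata
from typing import Set


def slugify(display_name: str) -> str:
    """Lowercase, ASCII-fold, and hyphenate a display name (assumed rules)."""
    folded = unicodedata.normalize("NFKD", display_name)
    ascii_name = folded.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def unique_slug(display_name: str, taken: Set[str]) -> str:
    """Return the base slug, or base-2, base-3, ... when it is already taken."""
    base = slugify(display_name)
    candidate, n = base, 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    return candidate
```

With taken = {"alex-smith"}, unique_slug("Alex Smith", taken) returns alex-smith-2. In a real system the uniqueness check would more likely run against the database than against a set.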
File Deduplication
Files are deduplicated by SHA-256 hash (documents.file_hash). If a file with the same hash already exists in the database:
- During pipeline processing: the file is skipped and deleted from inbox
- During upload: the database INSERT is ignored (hash has a UNIQUE constraint)
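The upload-side behavior can be illustrated with a small self-contained sketch. The table here is a simplified stand-in for the real documents table, and the use of INSERT OR IGNORE is an assumption about how the insert is ignored. The source only states that file_hash has a UNIQUE constraint.

```python
import hashlib
import sqlite3


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; the real documents table has more columns.
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, file_path TEXT, file_hash TEXT UNIQUE)"
)


def insert_document(file_path: str, data: bytes) -> bool:
    """Insert a row; return False when a row with the same hash already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO documents (file_path, file_hash) VALUES (?, ?)",
        (file_path, sha256_of(data)),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

Inserting the same bytes twice under different paths returns True the first time and False the second, leaving a single row in the table.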