Configuration¶
log-triage is a Python toolkit that sits between your log collector and an LLM. It reads log files, applies configurable rules to group and classify entries, and emits structured findings and payloads you can forward to a model. The CLI and Web UI share the same YAML configuration so you can run batch jobs, follow live streams, or explore results in a dashboard.
Tip: You can build and edit configuration (including regexes) directly in the Web UI's config editor and regex lab. It is the easiest way to tweak values safely before saving.
Core concepts¶
- Pipelines: Reusable processing recipes. Each pipeline defines how to group log lines, how to classify them, which regexes to ignore, and which prompts to use.
- Modules: Runtime bindings that attach a pipeline to a specific file path, execution mode (batch/follow), and output options.
- Findings: Structured results produced by a classifier. They include severity, reason, counts, context, and optional LLM payloads for downstream tooling.
- Severity: One of WARNING, ERROR, or CRITICAL. Severity can be escalated by anomaly detection or manually adjusted in the Web UI.
- Regex filters: ignore_regexes drops lines before counting; warning_regexes and error_regexes count matched lines to set severity. The regex lab in the UI helps experiment with these patterns.
- Addressed & false positives: Addressed findings track items you reviewed. Marking a finding as a false positive also adds the pattern to the pipeline's ignore list, preventing repeats.
Keep these terms in mind while reading the configuration sections below.
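The regex-filter and severity behaviour above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions (the helper name and patterns are invented), not log-triage's internal classifier:

```python
import re

def classify(lines, ignore, warn, err):
    """Drop ignored lines, then count warning/error matches to pick a severity."""
    kept = [l for l in lines if not any(re.search(p, l) for p in ignore)]
    warnings = sum(any(re.search(p, l) for p in warn) for l in kept)
    errors = sum(any(re.search(p, l) for p in err) for l in kept)
    if errors:
        severity = "ERROR"
    elif warnings:
        severity = "WARNING"
    else:
        severity = None  # nothing matched; no finding emitted
    return {"severity": severity, "warning_count": warnings, "error_count": errors}

finding = classify(
    ["DEBUG noisy line", "connection timeout", "update failed"],
    ignore=["^DEBUG"],
    warn=["warning|timeout"],
    err=["(error|failed)"],
)
print(finding)  # {'severity': 'ERROR', 'warning_count': 1, 'error_count': 1}
```

The DEBUG line is dropped before counting, so only the timeout and the failure contribute to the finding.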
File structure at a glance¶
pipelines: []
modules: []
llm:
  default_provider: null
  providers: {}
database: {}
alerts:
  webhook: {}
  mqtt: {}
Each top-level key controls a specific area of the system. The example below expands all of them in context.
Example configuration¶
pipelines:
  - name: homeassistant
    grouping: whole_file
    classifier: regex_counter
    ignore_regexes:
      - "^DEBUG"
    warning_regexes:
      - "warning|timeout"
    error_regexes:
      - "(error|failed)"
    prompt_template: './prompts/homeassistant.txt'
modules:
  - name: homeassistant_follow
    enabled: true
    path: '/var/log/fluent-bit/homeassistant.log'
    mode: 'follow'
    pipeline: 'homeassistant'
    output_format: 'text'
    min_print_severity: 'WARNING'
    stale_after_minutes: 60
    from_beginning: false
    interval: 5
    llm:
      enabled: true
      provider: 'openai'
      emit_llm_payloads_dir: './ha_llm_payloads'
      context_prefix_lines: 2
    alerts:
      webhook:
        enabled: true
        url: 'https://hooks.example.local/logs'
llm:
  default_provider: openai
  providers:
    openai:
      api_base: 'https://api.openai.com/v1'
      model: 'gpt-4o-mini'
      api_key_env: 'OPENAI_API_KEY'
database:
  url: 'sqlite:///./logtriage.db'
  echo: false
alerts:
  webhook:
    enabled: false
    url: ''
Tip: The Web UI config editor provides inline hints and validation so you can tweak values safely before saving, and the regex lab lets you test patterns interactively before applying them.
Line-by-line highlights for the example above:
- pipelines[].grouping: choose how to split the log stream (whole_file or marker).
- pipelines[].classifier: select which classifier implementation to run.
- ignore_regexes, warning_regexes, error_regexes: filter noise or count severity indicators; editable from the Web UI when marking false positives.
- prompt_template: path to the prompt file for LLM payloads.
- modules[].mode: batch scans once; follow tails continuously.
- modules[].from_beginning and interval: follow-mode controls for where to start reading and how often to poll for rotations.
- modules[].stale_after_minutes: used by the Web UI to mark follow modules as stale when no new lines arrive; batch modules do not use this flag because they run once and exit.
- modules[].llm: per-module override for enabling payloads, picking a provider, and setting context lines or output paths.
- modules[].alerts: enable webhook or MQTT destinations per module.
- llm.default_provider and llm.providers: global provider definitions, including where API keys come from.
- database.url: connection string for persisting findings so the Web UI can query history.
LLM sampling controls¶
- temperature tweaks randomness in completions (0–2). Lower values keep replies steady and deterministic; higher values make them more creative or exploratory. See OpenAI's parameter guide for examples: https://platform.openai.com/docs/guides/text-generation/parameter-details.
- top_p (nucleus sampling) restricts generation to the smallest set of tokens whose cumulative probability reaches the threshold. Smaller values constrain the model to the most likely choices; larger values approach sampling from the full distribution. Same reference: https://platform.openai.com/docs/guides/text-generation/parameter-details.
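To make the two knobs concrete, here is where they land in an OpenAI-style chat-completions request body. The values are examples, not recommendations:

```python
# Illustrative OpenAI-style request body; only the sampling fields matter here.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize these log findings."}],
    "temperature": 0.2,  # low randomness: stable, repeatable triage output
    "top_p": 0.9,        # nucleus sampling: draw from the top 90% probability mass
}
print(payload["temperature"], payload["top_p"])  # 0.2 0.9
```

For log triage, low temperature is usually preferable so repeated runs over the same findings produce comparable summaries.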
Pipelines¶
Pipelines describe how to interpret a log stream. You can edit their regexes and templates directly from the Web UI config editor and regex lab.
pipelines:
  - name: homeassistant
    grouping:
      type: whole_file # or marker
    classifier: regex_counter # or rsnapshot_basic
    ignore_regexes:
      - "^DEBUG"
    warning_regexes:
      - "warning|timeout"
    error_regexes:
      - "(error|failed)"
- grouping controls how log lines are chunked before classification.
  - whole_file treats the entire file as a single chunk, ideal for log files that represent one run or backup job.
  - marker looks for start/end markers (for example, rsnapshot headers) and groups lines between markers.
- Optional keys under grouping:
  - start_regex / end_regex: boundaries for marker grouping.
  - only_last: when true, process only the final grouped chunk (useful when a batch module should analyze just the most recent run instead of the entire historical log).
- classifier picks the rule set used to score each chunk. You can plug in your own; see Classifiers for guidance.
- ignore_regexes lets you drop known-noise lines before counting errors and warnings. False positives you mark in the UI are added here automatically.
- warning_regexes / error_regexes count matches separately so the classifier can choose the highest severity observed.
When using marker grouping, ensure your regexes align with the markers your log source emits (for example, rsnapshot's ^\d{4}/\d{2}/\d{2} headers). Everything between two markers is treated as a single event, so place warnings/errors accordingly.
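The marker grouping described above can be sketched as follows. This is an illustrative reimplementation (function name and sample log are invented), showing how a start regex splits lines into chunks and how only_last would narrow the result:

```python
import re

def group_by_marker(lines, start_regex, only_last=False):
    """Start a new chunk at each marker line; later lines join the open chunk."""
    start = re.compile(start_regex)
    chunks, current = [], None
    for line in lines:
        if start.search(line):
            current = []          # marker line opens a fresh chunk
            chunks.append(current)
        elif current is not None:
            current.append(line)  # lines before the first marker are dropped
    return chunks[-1:] if only_last else chunks

log = [
    "2024/05/01 rsnapshot started",
    "cp: cannot stat 'foo': No such file",
    "2024/05/02 rsnapshot started",
    "backup completed",
]
print(group_by_marker(log, r"^\d{4}/\d{2}/\d{2}", only_last=True))
# [['backup completed']]
```

With only_last enabled, a batch module would classify just the most recent run rather than every historical chunk.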
Modules¶
Modules connect pipelines to real log files and decide when to run. You can adjust module paths, modes, and LLM settings directly from the Web UI config editor without touching the YAML manually.
modules:
  - name: homeassistant_follow
    enabled: true
    path: '/var/log/fluent-bit/homeassistant.log'
    mode: 'follow' # 'batch' or 'follow'
    pipeline: 'homeassistant'
    output_format: 'text' # 'text' or 'json'
    min_print_severity: 'WARNING'
    stale_after_minutes: 60 # used by the Web UI for activity indicators
Follow mode options¶
Follow mode tails a single file and keeps up with rotations. Combine these options to control responsiveness and resource usage:
- from_beginning: when true, reads the whole file before tailing. Set to false for a tail-only experience similar to tail -F.
- interval: poll frequency (in seconds) for new lines and rotation detection. Lower values react faster but hit the file system more often.
- reload_on_change: CLI flag that reloads configuration when config.yaml changes so follow-mode modules pick up regex, grouping, and prompt updates without restarting.
- stale_after_minutes: used only for follow-mode modules to trigger "stale" warnings in the Web UI when no new lines are seen within the window.
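The interaction between from_beginning, interval, and rotation handling can be sketched like this. It is a simplified model of what any tail-style follower must do, not log-triage's actual implementation (which also handles truncation and config reloads):

```python
import os
import time

def follow(path, interval=5, from_beginning=False):
    """Yield new lines from a file, polling every `interval` seconds."""
    f = open(path)
    if not from_beginning:
        f.seek(0, os.SEEK_END)  # start at the end, like `tail -F`
    inode = os.fstat(f.fileno()).st_ino
    while True:
        line = f.readline()
        if line:
            yield line
            continue
        time.sleep(interval)  # nothing new: wait before the next poll
        try:
            if os.stat(path).st_ino != inode:  # inode changed: file was rotated
                f.close()
                f = open(path)
                inode = os.fstat(f.fileno()).st_ino
        except FileNotFoundError:
            pass  # rotation in progress; retry on the next poll
```

A smaller interval makes the loop react faster to both new lines and rotations, at the cost of more frequent stat calls.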
LLM options per module¶
modules:
  - name: homeassistant_follow
    llm:
      enabled: true
      provider: 'openai'
      prompt_template: './prompts/homeassistant.txt'
      emit_llm_payloads_dir: './ha_llm_payloads'
      context_prefix_lines: 2
Each module can choose whether to generate payloads. The prompt template is formatted with placeholders such as {pipeline}, {file_path}, {severity}, {reason}, {error_count}, {warning_count}, {line_count}, and {context}. A prompt fragment might look like this:
Pipeline {pipeline} reported {error_count} errors and {warning_count} warnings in {file_path}. Context:
{context}
context is assembled using the lines around each grouped chunk; tune context_prefix_lines to include enough lead-in. Edit templates from the Web UI to experiment quickly without restarting the CLI.
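Rendering the fragment above is plain placeholder substitution; a sketch of what that could look like (the exact rendering code inside log-triage may differ, and the values here are invented):

```python
# Template mirrors the fragment above; values are illustrative.
template = (
    "Pipeline {pipeline} reported {error_count} errors and "
    "{warning_count} warnings in {file_path}. Context:\n{context}"
)
prompt = template.format(
    pipeline="homeassistant",
    file_path="/var/log/fluent-bit/homeassistant.log",
    error_count=3,
    warning_count=1,
    context="...two prefix lines...\nERROR: setup failed",
)
print(prompt.splitlines()[0])
```

Because {context} is filled from the lines around the chunk, raising context_prefix_lines directly increases how much lead-in the model sees.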
Alerts¶
Modules support multiple alert channels:
- Webhook:

alerts:
  webhook:
    enabled: true
    url: 'https://hooks.example.local/logs'
    headers:
      Authorization: 'Bearer <token>'
- MQTT:

alerts:
  mqtt:
    enabled: true
    host: 'mqtt.example.local'
    port: 1883
    topic: 'alerts/logs'
    tls: false
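A webhook delivery built from the options above is an ordinary JSON POST with the configured headers. The sketch below is hypothetical (the helper name and payload shape are invented, not log-triage's wire format):

```python
import json
import urllib.request

def build_webhook_request(url, finding, headers=None):
    """Build a JSON POST request carrying a finding, with extra headers applied."""
    req = urllib.request.Request(
        url,
        data=json.dumps(finding).encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req  # send with urllib.request.urlopen(req), handling errors/retries

req = build_webhook_request(
    "https://hooks.example.local/logs",
    {"severity": "ERROR", "reason": "3 error lines"},
    headers={"Authorization": "Bearer <token>"},
)
```

Separating request construction from sending keeps the delivery path easy to test without a live endpoint.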
LLM providers¶
Providers are defined globally and referenced by name from modules. Two provider backends are supported: openai (default, covers any OpenAI-compatible API) and anthropic (native Anthropic Messages API).
OpenAI-compatible provider¶
llm:
  default_provider: openai
  providers:
    openai:
      api_base: 'https://api.openai.com/v1'
      model: 'gpt-4o-mini'
      api_key_env: 'OPENAI_API_KEY'
Any OpenAI-compatible endpoint works here — local vLLM, Ollama, Azure OpenAI, etc. — by changing api_base.
Anthropic Claude provider¶
llm:
  default_provider: claude
  providers:
    claude:
      api_base: 'https://api.anthropic.com/v1'
      model: 'claude-3-5-sonnet-20241022'
      api_key_env: 'ANTHROPIC_API_KEY'
      provider_type: anthropic # optional: auto-detected from api_base
Set the ANTHROPIC_API_KEY environment variable before running:
export ANTHROPIC_API_KEY=your_key
The provider_type field selects the backend:
| Value | Behaviour |
|---|---|
| openai (default) | OpenAI chat-completions format (/v1/chat/completions), Authorization: Bearer header |
| anthropic | Anthropic Messages API (/v1/messages), x-api-key header, system extracted from messages |
provider_type is auto-detected when api_base contains anthropic.com. Set it explicitly when using a proxy or a self-hosted Anthropic-compatible endpoint.
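The documented detection rule is simple enough to state as code. This is a sketch of the rule as described above (the function name is invented):

```python
def resolve_provider_type(api_base, provider_type=None):
    """Explicit provider_type wins; otherwise infer anthropic from the host."""
    if provider_type:
        return provider_type
    return "anthropic" if "anthropic.com" in api_base else "openai"

print(resolve_provider_type("https://api.anthropic.com/v1"))              # anthropic
print(resolve_provider_type("http://127.0.0.1:8000/v1"))                  # openai
print(resolve_provider_type("https://proxy.internal/v1", "anthropic"))    # anthropic
```

The third case shows why proxies need the explicit field: nothing in the URL reveals the backend.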
Using multiple providers¶
llm:
  default_provider: claude
  providers:
    openai:
      api_base: 'https://api.openai.com/v1'
      model: 'gpt-4o-mini'
      api_key_env: 'OPENAI_API_KEY'
    claude:
      api_base: 'https://api.anthropic.com/v1'
      model: 'claude-3-5-sonnet-20241022'
      api_key_env: 'ANTHROPIC_API_KEY'
    local_vllm:
      api_base: 'http://127.0.0.1:8000/v1'
      model: 'TheBloke/Mistral-7B-Instruct-v0.1-GPTQ'
      api_key_env: null
Each module can then select a provider by name via llm.provider. When only one provider is defined, modules inherit it automatically.
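One plausible resolution order implied by the rules above — module setting first, then the global default, then the lone provider — can be sketched as follows (this is an assumption about precedence, not confirmed internals):

```python
def resolve_provider(providers, default_provider=None, module_provider=None):
    """Pick a provider config: module override > global default > single provider."""
    if module_provider:
        return providers[module_provider]
    if default_provider:
        return providers[default_provider]
    if len(providers) == 1:
        return next(iter(providers.values()))
    raise ValueError("multiple providers defined; set llm.default_provider")

providers = {
    "openai": {"model": "gpt-4o-mini"},
    "claude": {"model": "claude-3-5-sonnet-20241022"},
}
print(resolve_provider(providers, default_provider="claude")["model"])
# claude-3-5-sonnet-20241022
```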
Database¶
To persist findings for the Web UI, configure a database connection:
database:
  url: 'sqlite:///./logtriage.db'
  echo: false
SQLite and Postgres URLs are both supported. When omitted, the Web UI stores data in memory and only reflects the current session.
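To illustrate what persistence buys you: with a database configured, findings survive restarts and can be queried for history. The table below is invented for the example (log-triage's real schema is internal), using an in-memory SQLite database in place of ./logtriage.db:

```python
import sqlite3

# Hypothetical findings table; log-triage's actual schema may differ.
conn = sqlite3.connect(":memory:")  # the config above would use ./logtriage.db
conn.execute("CREATE TABLE findings (module TEXT, severity TEXT, reason TEXT)")
conn.execute(
    "INSERT INTO findings VALUES ('homeassistant_follow', 'ERROR', '3 error lines')"
)
rows = conn.execute(
    "SELECT severity FROM findings WHERE module = 'homeassistant_follow'"
).fetchall()
print(rows)  # [('ERROR',)]
```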
RAG (Retrieval-Augmented Generation)¶
RAG enhances log analysis by automatically retrieving relevant documentation from knowledge bases and including it in LLM prompts. This provides more accurate, context-aware responses with proper citations.
Global RAG Configuration¶
rag:
  enabled: true
  cache_dir: "./rag_cache"
  vector_store_dir: "./rag_vector_store"
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cpu" # or "cuda" for GPU acceleration
  batch_size: 32
  top_k: 5
  similarity_threshold: 0.7
  max_chunks: 10
Module-Level RAG Configuration¶
Add RAG configuration to individual modules:
modules:
  - name: my_service
    path: "/var/log/my_service"
    pipeline: "my_pipeline"
    llm:
      enabled: true
      provider: "openai"
    rag:
      enabled: true
      knowledge_sources:
        - repo_url: "https://github.com/myorg/service-docs"
          branch: "main"
          include_paths:
            - "docs/**/*.md" # All .md files in docs and subdirectories
            - "README.md" # Specific file in root
            - "troubleshooting/*.md" # .md files in troubleshooting directory only
        - repo_url: "https://github.com/myorg/troubleshooting-guide"
          branch: "main"
          include_paths:
            - "**/*.md" # All .md files in entire repository
            - "**/*.markdown" # All .markdown files in entire repository
RAG Configuration Options¶
- cache_dir: Directory for storing cloned Git repositories
- vector_store_dir: Directory for ChromaDB vector storage
- embedding_model: SentenceTransformer model for embeddings
- device: "cpu" or "cuda" for GPU acceleration
- batch_size: Batch size for embedding generation
- top_k: Maximum number of chunks to retrieve
- similarity_threshold: Minimum similarity score (0.0-1.0)
- max_chunks: Maximum chunks to consider during search
- knowledge_sources: List of Git repositories containing documentation
- include_paths: Glob patterns for selecting files from repositories
For detailed RAG setup instructions, see the RAG Quick Start Guide and RAG documentation.