AUA-Veritas · SQLite Schema v2

Database Schema

All 17 tables, every column, and every index in the local SQLite database. Data never leaves your machine — all AI responses are AES-128 encrypted at rest.

Backend: SQLite 3 (WAL mode, single-user) Tables: 17 Encryption: Fernet AES-128 on message content Location: ~/.local/share/AUA-Veritas/veritas.db

Key join paths. The central join is conversations.conversation_id → messages.conversation_id and conversations.conversation_id → model_runs.conversation_id. model_runs.conversation_id was added in Phase 13 — without it there was no way to join a query's model-performance record back to the conversation it belonged to. All content fields in messages are Fernet-encrypted ciphertext starting with gAAAAA.

core

conversations

One row per chat session. The root entity — all messages, model runs, token counters, backups, and keywords reference a conversation_id.

Column	Type	Notes
conversation_idPK	TEXT	UUID v4. Generated on conversation creation.
user_id	TEXT	Always `'local'` in single-user mode.
title	TEXT	Display title. Auto-generated from first query (first 40 chars). Editable via PATCH.
project_id	TEXT	`NULL` = global (no project). FK to `projects.project_id` (soft — FK enforcement off).
created_at	REAL	Unix timestamp (float).
updated_at	REAL	Updated on every message insert. Used for ordering sidebar list.

idx_conv_user · idx_conv_updated DESC · idx_conv_project

core

messages

Every user turn and assistant response. Content is AES-128 (Fernet) encrypted — the ciphertext starts with gAAAAA. Pagination uses created_at as a cursor.

Column	Type	Notes
message_idPK	TEXT	UUID v4.
conversation_id	TEXT	FK to `conversations`.
role	TEXT	`'user'` or `'assistant'`.
content	TEXT	Fernet ciphertext. Decrypted in memory only — never written plaintext.
callout_type	TEXT	`'disagreement'` \| `'correction'` \| `NULL`. Controls frontend card rendering.
models_used	TEXT	JSON list of model IDs that answered (e.g. `["gpt-4o","claude-sonnet-4-6"]`).
accuracy_level	TEXT	`'fast'` \| `'balanced'` \| `'high'` \| `'maximum'`.
confidence	TEXT	`'High'` \| `'Medium'` \| `'Uncertain'`. Display label from VCG scoring.
created_at	REAL	Pagination cursor.

idx_msg_conv · idx_msg_conv_ts (conversation_id, created_at DESC)

core

model_runs

One row per model per query. Tracks VCG welfare scores, utility, domain, and performance — the data source for the Quality panel and reliability graphs.

Phase 13 note: conversation_id was added in the Phase 13 backport. Without it there was no join path from a conversation to its model runs, making per-conversation analytics impossible.

Column	Type	Notes
run_idPK	TEXT	UUID v4.
query_id	TEXT	Groups all model runs for a single query (loose ref, no FK).
conversation_id	TEXT	FK to `conversations`. Added Phase 13 — required join column.
model_id	TEXT	Model identifier string (e.g. `'gpt-4o'`).
round	TEXT	`'answer'` \| `'peer_review'`.
raw_response	TEXT	First ~500 chars of response text (preview only, not encrypted).
utility_score	REAL	VCG utility score 0–1. NULL for fast mode (no multi-model contest).
confidence_score	REAL	Model's internal confidence 0–1.
vcg_welfare_score	REAL	W = P(domain) × confidence × prior score. Higher = VCG winner.
vcg_winner	INTEGER	`1` if VCG selected this model, `0` otherwise. `0` for all when disagreement detected (user picks).
corrections_applied	TEXT	JSON list of correction_ids injected into this model's prompt.
latency_ms	REAL	Wall-clock time from request to first token.
domain	TEXT	Classified domain (e.g. `'code'`, `'legal'`, `'general'`).
created_at	REAL

idx_run_model · idx_run_conv

memory

corrections

Per-user correction memory. Each row is a learned preference or factual correction injected into future model prompts in the same domain. The core of the adaptive memory system.

Column	Type	Notes
correction_idPK	TEXT	UUID v4.
user_id	TEXT	Always `'local'`.
type	TEXT	`'factual_correction'` \| `'persistent_instruction'` \| `'preference'`.
scope	TEXT	`'global'` \| `'project'` \| `'conversation'` \| `'superseded'`. Superseded = soft-deleted.
corrective_instruction	TEXT	The text injected into future prompts.
canonical_query	TEXT	Normalised topic key. Used for relevance matching via keyword overlap.
domain	TEXT	Domain this correction applies to.
confidence	REAL	0–1. Higher = more likely to be injected.
decay_class	TEXT	`'A'` (permanent) \| `'B'` (slow decay) \| `'C'` (fast decay).
pinned	INTEGER	`1` = always inject, skip relevance scoring.
score_delta	INTEGER	How much this correction changed the model's reliability score.
extracted_via	TEXT	`'rules'` (Layer 1) \| `'semantic'` (Layer 2) \| `'explicit'` (user prefix).
created_at	REAL

idx_corr_user · idx_corr_canonical · idx_corr_domain · idx_corr_scope

memory

audit_log

Model reliability score events — written when a score crosses a threshold (correct_threshold=3, wrong_threshold=2). Powers the utility trajectory graph in Look Under the Hood.

Column	Type	Notes
audit_idPK	TEXT
model_id	TEXT	Which model's score changed.
event_type	TEXT	`'score_update'`. Extensible.
score_before	INTEGER	Reliability score before the event (0–100 scale).
score_after	INTEGER	Reliability score after.
verdict	TEXT	`'correct'` \| `'incorrect'` \| `'peer_review_flagged_incorrect'`.
correction_stored	INTEGER	`1` if a correction was stored as part of this event.
query_preview	TEXT	First ~80 chars of the query that triggered this event.
payload	TEXT	JSON catch-all for additional context.
created_at	REAL

idx_audit_model · idx_audit_event

message_keywords

Inverted index for full-text search. One row per keyword per message. Written asynchronously by the keyword worker after each query response. Also backfilled at startup for messages that weren't indexed before.

Search architecture: keywords are loaded into a sorted in-memory list at startup for O(log n) prefix matching via bisect. Average query: <10 ms. The DB table survives restarts; the in-memory index is rebuilt from it at startup.

Column	Type	Notes
idPK	INTEGER	Autoincrement.
keyword	TEXT	Normalised lowercase keyword. Multi-word queries use AND semantics across keyword rows.
message_id	TEXT	FK to `messages`. Also carries the `match_message_id` returned by `GET /search` for scroll-to.
conversation_id	TEXT	Redundant with messages for direct query without join.
role	TEXT	`'user'` \| `'assistant'`.
created_at	REAL

idx_kw_keyword · idx_kw_conv · idx_kw_msg

backup

context_backups

Structured handoff notes written by each model every ~30 messages, or when context pressure hits 70% of window size. A new window that receives this can pick up immediately without re-briefing.

Validity check: a backup is VALID when MAX(context_backups.created_at WHERE model_id=M, conversation_id=C) > MAX(messages.created_at WHERE conversation_id=C). The 6-hour coverage job finds and fills gaps.

Column	Type	Notes
backup_idPK	TEXT
model_id	TEXT	Each model writes its own backup — not shared across models.
conversation_id	TEXT	FK to `conversations`.
backup_text	TEXT	Structured 6-section handoff note: GOAL / DECISIONS MADE / CURRENT STATUS / ACTIVE FILE / PREFERENCES / RESUME INSTRUCTION.
token_estimate	INTEGER	chars / 4. Used for coverage reporting.
thread_number	INTEGER	Increments each time a backup triggers a counter reset. Tracks backup generations.
trigger	TEXT	`'token_threshold'` \| `'message_count'` \| `'time_gap'` \| `'manual'` \| `'coverage_job_stale'`.
created_at	REAL	Append-only — old rows never deleted (audit trail).

idx_backup_conv · idx_backup_model (model_id, conversation_id)

backup

token_counters

Shadow token counter per model per conversation. Resets to zero when a context backup is stored (new thread starts). The three backup triggers check this table.

Column	Type	Notes
counter_idPK	TEXT	UNIQUE(model_id, conversation_id).
model_id	TEXT
conversation_id	TEXT
token_estimate	INTEGER	chars / 4 heuristic. Resets to 0 after backup.
message_count	INTEGER	Messages since last backup. Triggers at 30.
thread_number	INTEGER	Matches `context_backups.thread_number` for the current window.
last_backup_at	REAL	Used for the 24h time-gap trigger.
updated_at	REAL

idx_counter_conv

backup

model_context_prompts

Per-model recovery prompts — written by the model itself from its active corrections. Regenerated every 15 minutes or when the set of corrections changes (tracked by SHA-256 hash). Injected at the start of every new session.

Column	Type	Notes
model_idPK	TEXT	UNIQUE(model_id, conversation_id). MVP: conversation_id always NULL (global prompts).
conversation_id	TEXT	Always NULL currently. Reserved for per-conversation prompts.
prompt_text	TEXT	The full prompt text injected before user messages.
generated_at	REAL	Staleness check: >24h → regenerate.
corrections_hash	TEXT	SHA-256 of sorted correction IDs at generation time. Changed hash → regenerate.

organisation

projects

Named groups for conversations. When a project is active, the sidebar shows only that project's conversations and new chats are automatically assigned to it. Corrections and memory remain global.

Column	Type	Notes
project_idPK	TEXT	UUID v4. Referenced by `conversations.project_id`.
user_id	TEXT	Always `'local'`.
name	TEXT	Display name (e.g. `'Immigration Law'`).
created_at	REAL

organisation

attachments

File upload metadata. Raw file bytes are never stored in the DB — only extracted text metadata. Keyword indexing runs on the extracted text.

Column	Type	Notes
attachment_idPK	TEXT
message_id	TEXT	FK to `messages`. Set after query completes.
conversation_id	TEXT
filename	TEXT	Original filename.
mimetype	TEXT	`'application/pdf'` \| `'image/png'` etc.
size_bytes / page_count / word_count	INTEGER	Metadata extracted at upload time.
is_image / truncated	INTEGER	Flags.
preview	TEXT	First 200 chars of extracted text.
created_at	REAL

idx_attach_conv · idx_attach_msg

diagnostics

crash_sentinel

One row per app session. Written 'running' at startup, updated to 'clean' on shutdown. On next launch: any 'running' row = crash detected → async report filed.

Column	Type	Notes
session_idPK	TEXT	UUID generated at startup.
status	TEXT	`'running'` → `'clean'` → `'reported'`. Prevents duplicate reports.
started_at / ended_at	REAL	Session timestamps.
system_log_snippet / api_log_snippet	TEXT	Last 50 log lines captured for the report.

diagnostics

pending_error_reports

Error reports queued for async sending. If the app is live, reports go immediately. If not, they're sent on next startup. Prevents report loss during crashes.

Column	Type	Notes
report_idPK	TEXT
error_type / error_message / traceback	TEXT	Error details.
context	TEXT	JSON extra context (query, model, domain).
sent	INTEGER	`0` = pending, `1` = sent.
sent_at	REAL	Timestamp when sent.

diagnostics

model_config_cache

Caches the last successfully fetched models.json from GitHub Pages. Fallback when offline. Three-level chain: remote JSON → DB cache → hardcoded fallback.

Column	Type	Notes
cache_idPK	TEXT	Always `'remote'` — single row.
config_json	TEXT	Full models.json content.
fetched_at	REAL	Refreshed every 24 h.

domain

domain_nodes

Dynamic domain ontology tree. L0 roots (code, legal, medical, finance, etc.) are fixed anchors. Sub-nodes are promoted dynamically when query volume and divergence thresholds are met. The tree drives specialist routing.

Column	Type	Notes
node_idPK	TEXT	Slug (e.g. `'code.python.async'`).
parent_id	TEXT	Self-referential FK. NULL for L0 roots.
depth	INTEGER	`0` = L0 root, `1` = sub-domain, etc.
display_name	TEXT	Human-readable label.
aliases	TEXT	JSON array of alternative names that map to this node.
embedding	BLOB	Serialised float32 centroid for semantic routing. Optional.
query_count	INTEGER	Queries routed to this node. Used in promotion decisions.
is_l0_root	INTEGER	`1` = fixed anchor, never deleted or demoted.

idx_dn_parent · idx_dn_depth

domain

domain_candidates

Raw domain strings seen but not yet resolved to an alias or new node. Evidence accumulates across sessions (K_MIN=5 occurrences triggers promotion review). Persisted so the threshold survives restarts.

Column	Type	Notes
raw_stringPK	TEXT	Exactly as reported by the model.
nearest_node	TEXT	Closest existing node_id by cosine / edit similarity.
similarity	REAL	0–1. High similarity → likely an alias, not a new node.
query_count	INTEGER	Times seen. Promotion triggers at K_MIN=5.
model_sources	TEXT	JSON array of model_ids that reported this string.
first_seen / last_seen	REAL

settings

local_settings

Single-row table storing all user preferences. Seeded at first launch via INSERT OR IGNORE.

Column	Type	Notes
settings_idPK	TEXT	Always `'local'` — single row.
local_enabled / verify_enabled	INTEGER	Ollama local model mode flags.
verify_model_id	TEXT	Model used to verify local model outputs.
implicit_threshold	REAL	Layer 1 correction detection sensitivity (default 0.45).
content_encrypted	INTEGER	`1` when AES encryption is active.
terms_accepted_at	REAL	NULL until first-run T&C accepted.
user_token	TEXT	Anonymous machine hash for bug report correlation. Never PII.
bugs_pat	TEXT	Write-only GitHub PAT for sending bug reports. Optional.
skipped_version	TEXT	Version string the user chose to skip the update banner for.

core

users

Single-user identity table. Always one row with user_id='local' in this release. Reserved for future multi-user support.

Column	Type	Notes
user_idPK	TEXT	Always `'local'`.
created_at	REAL
settings	TEXT	JSON catch-all. Primary settings live in `local_settings`.