AUA-Veritas · SQLite Schema v2

Database Schema

All 17 tables, every column, and every index in the local SQLite database. Data never leaves your machine — all AI responses are AES-128 encrypted at rest.

Backend: SQLite 3 (WAL mode, single-user) Tables: 17 Encryption: Fernet AES-128 on message content Location: ~/.local/share/AUA-Veritas/veritas.db
Key join paths. The central join is conversations.conversation_id → messages.conversation_id and conversations.conversation_id → model_runs.conversation_id. model_runs.conversation_id was added in Phase 13 — without it there was no way to join a query's model-performance record back to the conversation it belonged to. All content fields in messages are Fernet-encrypted ciphertext starting with gAAAAA.
core
conversations

One row per chat session. The root entity — all messages, model runs, token counters, backups, and keywords reference a conversation_id.

ColumnTypeNotes
conversation_idPKTEXTUUID v4. Generated on conversation creation.
user_idTEXTAlways 'local' in single-user mode.
titleTEXTDisplay title. Auto-generated from first query (first 40 chars). Editable via PATCH.
project_idTEXTNULL = global (no project). FK to projects.project_id (soft — FK enforcement off).
created_atREALUnix timestamp (float).
updated_atREALUpdated on every message insert. Used for ordering sidebar list.

idx_conv_user · idx_conv_updated DESC · idx_conv_project

core
messages

Every user turn and assistant response. Content is AES-128 (Fernet) encrypted — the ciphertext starts with gAAAAA. Pagination uses created_at as a cursor.

ColumnTypeNotes
message_idPKTEXTUUID v4.
conversation_idTEXTFK to conversations.
roleTEXT'user' or 'assistant'.
contentTEXTFernet ciphertext. Decrypted in memory only — never written plaintext.
callout_typeTEXT'disagreement' | 'correction' | NULL. Controls frontend card rendering.
models_usedTEXTJSON list of model IDs that answered (e.g. ["gpt-4o","claude-sonnet-4-6"]).
accuracy_levelTEXT'fast' | 'balanced' | 'high' | 'maximum'.
confidenceTEXT'High' | 'Medium' | 'Uncertain'. Display label from VCG scoring.
created_atREALPagination cursor.

idx_msg_conv · idx_msg_conv_ts (conversation_id, created_at DESC)

core
model_runs

One row per model per query. Tracks VCG welfare scores, utility, domain, and performance — the data source for the Quality panel and reliability graphs.

Phase 13 note: conversation_id was added in the Phase 13 backport. Without it there was no join path from a conversation to its model runs, making per-conversation analytics impossible.
ColumnTypeNotes
run_idPKTEXTUUID v4.
query_idTEXTGroups all model runs for a single query (loose ref, no FK).
conversation_idTEXTFK to conversations. Added Phase 13 — required join column.
model_idTEXTModel identifier string (e.g. 'gpt-4o').
roundTEXT'answer' | 'peer_review'.
raw_responseTEXTFirst ~500 chars of response text (preview only, not encrypted).
utility_scoreREALVCG utility score 0–1. NULL for fast mode (no multi-model contest).
confidence_scoreREALModel's internal confidence 0–1.
vcg_welfare_scoreREALW = P(domain) × confidence × prior score. Higher = VCG winner.
vcg_winnerINTEGER1 if VCG selected this model, 0 otherwise. 0 for all when disagreement detected (user picks).
corrections_appliedTEXTJSON list of correction_ids injected into this model's prompt.
latency_msREALWall-clock time from request to first token.
domainTEXTClassified domain (e.g. 'code', 'legal', 'general').
created_atREAL

idx_run_model · idx_run_conv


memory
corrections

Per-user correction memory. Each row is a learned preference or factual correction injected into future model prompts in the same domain. The core of the adaptive memory system.

ColumnTypeNotes
correction_idPKTEXTUUID v4.
user_idTEXTAlways 'local'.
typeTEXT'factual_correction' | 'persistent_instruction' | 'preference'.
scopeTEXT'global' | 'project' | 'conversation' | 'superseded'. Superseded = soft-deleted.
corrective_instructionTEXTThe text injected into future prompts.
canonical_queryTEXTNormalised topic key. Used for relevance matching via keyword overlap.
domainTEXTDomain this correction applies to.
confidenceREAL0–1. Higher = more likely to be injected.
decay_classTEXT'A' (permanent) | 'B' (slow decay) | 'C' (fast decay).
pinnedINTEGER1 = always inject, skip relevance scoring.
score_deltaINTEGERHow much this correction changed the model's reliability score.
extracted_viaTEXT'rules' (Layer 1) | 'semantic' (Layer 2) | 'explicit' (user prefix).
created_atREAL

idx_corr_user · idx_corr_canonical · idx_corr_domain · idx_corr_scope

memory
audit_log

Model reliability score events — written when a score crosses a threshold (correct_threshold=3, wrong_threshold=2). Powers the utility trajectory graph in Look Under the Hood.

ColumnTypeNotes
audit_idPKTEXT
model_idTEXTWhich model's score changed.
event_typeTEXT'score_update'. Extensible.
score_beforeINTEGERReliability score before the event (0–100 scale).
score_afterINTEGERReliability score after.
verdictTEXT'correct' | 'incorrect' | 'peer_review_flagged_incorrect'.
correction_storedINTEGER1 if a correction was stored as part of this event.
query_previewTEXTFirst ~80 chars of the query that triggered this event.
payloadTEXTJSON catch-all for additional context.
created_atREAL

idx_audit_model · idx_audit_event


search
message_keywords

Inverted index for full-text search. One row per keyword per message. Written asynchronously by the keyword worker after each query response. Also backfilled at startup for messages that weren't indexed before.

Search architecture: keywords are loaded into a sorted in-memory list at startup for O(log n) prefix matching via bisect. Average query: <10 ms. The DB table survives restarts; the in-memory index is rebuilt from it at startup.
ColumnTypeNotes
idPKINTEGERAutoincrement.
keywordTEXTNormalised lowercase keyword. Multi-word queries use AND semantics across keyword rows.
message_idTEXTFK to messages. Also carries the match_message_id returned by GET /search for scroll-to.
conversation_idTEXTRedundant with messages for direct query without join.
roleTEXT'user' | 'assistant'.
created_atREAL

idx_kw_keyword · idx_kw_conv · idx_kw_msg


backup
context_backups

Structured handoff notes written by each model every ~30 messages, or when context pressure hits 70% of window size. A new window that receives this can pick up immediately without re-briefing.

Validity check: a backup is VALID when MAX(context_backups.created_at WHERE model_id=M, conversation_id=C) > MAX(messages.created_at WHERE conversation_id=C). The 6-hour coverage job finds and fills gaps.
ColumnTypeNotes
backup_idPKTEXT
model_idTEXTEach model writes its own backup — not shared across models.
conversation_idTEXTFK to conversations.
backup_textTEXTStructured 6-section handoff note: GOAL / DECISIONS MADE / CURRENT STATUS / ACTIVE FILE / PREFERENCES / RESUME INSTRUCTION.
token_estimateINTEGERchars / 4. Used for coverage reporting.
thread_numberINTEGERIncrements each time a backup triggers a counter reset. Tracks backup generations.
triggerTEXT'token_threshold' | 'message_count' | 'time_gap' | 'manual' | 'coverage_job_stale'.
created_atREALAppend-only — old rows never deleted (audit trail).

idx_backup_conv · idx_backup_model (model_id, conversation_id)

backup
token_counters

Shadow token counter per model per conversation. Resets to zero when a context backup is stored (new thread starts). The three backup triggers check this table.

ColumnTypeNotes
counter_idPKTEXTUNIQUE(model_id, conversation_id).
model_idTEXT
conversation_idTEXT
token_estimateINTEGERchars / 4 heuristic. Resets to 0 after backup.
message_countINTEGERMessages since last backup. Triggers at 30.
thread_numberINTEGERMatches context_backups.thread_number for the current window.
last_backup_atREALUsed for the 24h time-gap trigger.
updated_atREAL

idx_counter_conv

backup
model_context_prompts

Per-model recovery prompts — written by the model itself from its active corrections. Regenerated every 15 minutes or when the set of corrections changes (tracked by SHA-256 hash). Injected at the start of every new session.

ColumnTypeNotes
model_idPKTEXTUNIQUE(model_id, conversation_id). MVP: conversation_id always NULL (global prompts).
conversation_idTEXTAlways NULL currently. Reserved for per-conversation prompts.
prompt_textTEXTThe full prompt text injected before user messages.
generated_atREALStaleness check: >24h → regenerate.
corrections_hashTEXTSHA-256 of sorted correction IDs at generation time. Changed hash → regenerate.

organisation
projects

Named groups for conversations. When a project is active, the sidebar shows only that project's conversations and new chats are automatically assigned to it. Corrections and memory remain global.

ColumnTypeNotes
project_idPKTEXTUUID v4. Referenced by conversations.project_id.
user_idTEXTAlways 'local'.
nameTEXTDisplay name (e.g. 'Immigration Law').
created_atREAL
organisation
attachments

File upload metadata. Raw file bytes are never stored in the DB — only extracted text metadata. Keyword indexing runs on the extracted text.

ColumnTypeNotes
attachment_idPKTEXT
message_idTEXTFK to messages. Set after query completes.
conversation_idTEXT
filenameTEXTOriginal filename.
mimetypeTEXT'application/pdf' | 'image/png' etc.
size_bytes / page_count / word_countINTEGERMetadata extracted at upload time.
is_image / truncatedINTEGERFlags.
previewTEXTFirst 200 chars of extracted text.
created_atREAL

idx_attach_conv · idx_attach_msg


diagnostics
crash_sentinel

One row per app session. Written 'running' at startup, updated to 'clean' on shutdown. On next launch: any 'running' row = crash detected → async report filed.

ColumnTypeNotes
session_idPKTEXTUUID generated at startup.
statusTEXT'running''clean''reported'. Prevents duplicate reports.
started_at / ended_atREALSession timestamps.
system_log_snippet / api_log_snippetTEXTLast 50 log lines captured for the report.
diagnostics
pending_error_reports

Error reports queued for async sending. If the app is live, reports go immediately. If not, they're sent on next startup. Prevents report loss during crashes.

ColumnTypeNotes
report_idPKTEXT
error_type / error_message / tracebackTEXTError details.
contextTEXTJSON extra context (query, model, domain).
sentINTEGER0 = pending, 1 = sent.
sent_atREALTimestamp when sent.
diagnostics
model_config_cache

Caches the last successfully fetched models.json from GitHub Pages. Fallback when offline. Three-level chain: remote JSON → DB cache → hardcoded fallback.

ColumnTypeNotes
cache_idPKTEXTAlways 'remote' — single row.
config_jsonTEXTFull models.json content.
fetched_atREALRefreshed every 24 h.

domain
domain_nodes

Dynamic domain ontology tree. L0 roots (code, legal, medical, finance, etc.) are fixed anchors. Sub-nodes are promoted dynamically when query volume and divergence thresholds are met. The tree drives specialist routing.

ColumnTypeNotes
node_idPKTEXTSlug (e.g. 'code.python.async').
parent_idTEXTSelf-referential FK. NULL for L0 roots.
depthINTEGER0 = L0 root, 1 = sub-domain, etc.
display_nameTEXTHuman-readable label.
aliasesTEXTJSON array of alternative names that map to this node.
embeddingBLOBSerialised float32 centroid for semantic routing. Optional.
query_countINTEGERQueries routed to this node. Used in promotion decisions.
is_l0_rootINTEGER1 = fixed anchor, never deleted or demoted.

idx_dn_parent · idx_dn_depth

domain
domain_candidates

Raw domain strings seen but not yet resolved to an alias or new node. Evidence accumulates across sessions (K_MIN=5 occurrences triggers promotion review). Persisted so the threshold survives restarts.

ColumnTypeNotes
raw_stringPKTEXTExactly as reported by the model.
nearest_nodeTEXTClosest existing node_id by cosine / edit similarity.
similarityREAL0–1. High similarity → likely an alias, not a new node.
query_countINTEGERTimes seen. Promotion triggers at K_MIN=5.
model_sourcesTEXTJSON array of model_ids that reported this string.
first_seen / last_seenREAL

settings
local_settings

Single-row table storing all user preferences. Seeded at first launch via INSERT OR IGNORE.

ColumnTypeNotes
settings_idPKTEXTAlways 'local' — single row.
local_enabled / verify_enabledINTEGEROllama local model mode flags.
verify_model_idTEXTModel used to verify local model outputs.
implicit_thresholdREALLayer 1 correction detection sensitivity (default 0.45).
content_encryptedINTEGER1 when AES encryption is active.
terms_accepted_atREALNULL until first-run T&C accepted.
user_tokenTEXTAnonymous machine hash for bug report correlation. Never PII.
bugs_patTEXTWrite-only GitHub PAT for sending bug reports. Optional.
skipped_versionTEXTVersion string the user chose to skip the update banner for.
core
users

Single-user identity table. Always one row with user_id='local' in this release. Reserved for future multi-user support.

ColumnTypeNotes
user_idPKTEXTAlways 'local'.
created_atREAL
settingsTEXTJSON catch-all. Primary settings live in local_settings.