Database Schema
All 17 tables, every column, and every index in the local SQLite database. Data never leaves your machine — all AI responses are AES-128 encrypted at rest.
conversations.conversation_id → messages.conversation_id and
conversations.conversation_id → model_runs.conversation_id.
model_runs.conversation_id was added in Phase 13 — without it there was no way
to join a query's model-performance record back to the conversation it belonged to.
All content fields in messages are Fernet-encrypted ciphertext starting with gAAAAA.
One row per chat session. The root entity — all messages, model runs, token counters, backups, and keywords reference a conversation_id.
| Column | Type | Notes |
|---|---|---|
| conversation_idPK | TEXT | UUID v4. Generated on conversation creation. |
| user_id | TEXT | Always 'local' in single-user mode. |
| title | TEXT | Display title. Auto-generated from first query (first 40 chars). Editable via PATCH. |
| project_id | TEXT | NULL = global (no project). FK to projects.project_id (soft — FK enforcement off). |
| created_at | REAL | Unix timestamp (float). |
| updated_at | REAL | Updated on every message insert. Used for ordering sidebar list. |
idx_conv_user · idx_conv_updated DESC · idx_conv_project
Every user turn and assistant response. Content is AES-128 (Fernet) encrypted — the ciphertext starts with gAAAAA. Pagination uses created_at as a cursor.
| Column | Type | Notes |
|---|---|---|
| message_idPK | TEXT | UUID v4. |
| conversation_id | TEXT | FK to conversations. |
| role | TEXT | 'user' or 'assistant'. |
| content | TEXT | Fernet ciphertext. Decrypted in memory only — never written plaintext. |
| callout_type | TEXT | 'disagreement' | 'correction' | NULL. Controls frontend card rendering. |
| models_used | TEXT | JSON list of model IDs that answered (e.g. ["gpt-4o","claude-sonnet-4-6"]). |
| accuracy_level | TEXT | 'fast' | 'balanced' | 'high' | 'maximum'. |
| confidence | TEXT | 'High' | 'Medium' | 'Uncertain'. Display label from VCG scoring. |
| created_at | REAL | Pagination cursor. |
idx_msg_conv · idx_msg_conv_ts (conversation_id, created_at DESC)
One row per model per query. Tracks VCG welfare scores, utility, domain, and performance — the data source for the Quality panel and reliability graphs.
conversation_id was added in the Phase 13 backport. Without it there was no join path from a conversation to its model runs, making per-conversation analytics impossible.| Column | Type | Notes |
|---|---|---|
| run_idPK | TEXT | UUID v4. |
| query_id | TEXT | Groups all model runs for a single query (loose ref, no FK). |
| conversation_id | TEXT | FK to conversations. Added Phase 13 — required join column. |
| model_id | TEXT | Model identifier string (e.g. 'gpt-4o'). |
| round | TEXT | 'answer' | 'peer_review'. |
| raw_response | TEXT | First ~500 chars of response text (preview only, not encrypted). |
| utility_score | REAL | VCG utility score 0–1. NULL for fast mode (no multi-model contest). |
| confidence_score | REAL | Model's internal confidence 0–1. |
| vcg_welfare_score | REAL | W = P(domain) × confidence × prior score. Higher = VCG winner. |
| vcg_winner | INTEGER | 1 if VCG selected this model, 0 otherwise. 0 for all when disagreement detected (user picks). |
| corrections_applied | TEXT | JSON list of correction_ids injected into this model's prompt. |
| latency_ms | REAL | Wall-clock time from request to first token. |
| domain | TEXT | Classified domain (e.g. 'code', 'legal', 'general'). |
| created_at | REAL |
idx_run_model · idx_run_conv
Per-user correction memory. Each row is a learned preference or factual correction injected into future model prompts in the same domain. The core of the adaptive memory system.
| Column | Type | Notes |
|---|---|---|
| correction_idPK | TEXT | UUID v4. |
| user_id | TEXT | Always 'local'. |
| type | TEXT | 'factual_correction' | 'persistent_instruction' | 'preference'. |
| scope | TEXT | 'global' | 'project' | 'conversation' | 'superseded'. Superseded = soft-deleted. |
| corrective_instruction | TEXT | The text injected into future prompts. |
| canonical_query | TEXT | Normalised topic key. Used for relevance matching via keyword overlap. |
| domain | TEXT | Domain this correction applies to. |
| confidence | REAL | 0–1. Higher = more likely to be injected. |
| decay_class | TEXT | 'A' (permanent) | 'B' (slow decay) | 'C' (fast decay). |
| pinned | INTEGER | 1 = always inject, skip relevance scoring. |
| score_delta | INTEGER | How much this correction changed the model's reliability score. |
| extracted_via | TEXT | 'rules' (Layer 1) | 'semantic' (Layer 2) | 'explicit' (user prefix). |
| created_at | REAL |
idx_corr_user · idx_corr_canonical · idx_corr_domain · idx_corr_scope
Model reliability score events — written when a score crosses a threshold (correct_threshold=3, wrong_threshold=2). Powers the utility trajectory graph in Look Under the Hood.
| Column | Type | Notes |
|---|---|---|
| audit_idPK | TEXT | |
| model_id | TEXT | Which model's score changed. |
| event_type | TEXT | 'score_update'. Extensible. |
| score_before | INTEGER | Reliability score before the event (0–100 scale). |
| score_after | INTEGER | Reliability score after. |
| verdict | TEXT | 'correct' | 'incorrect' | 'peer_review_flagged_incorrect'. |
| correction_stored | INTEGER | 1 if a correction was stored as part of this event. |
| query_preview | TEXT | First ~80 chars of the query that triggered this event. |
| payload | TEXT | JSON catch-all for additional context. |
| created_at | REAL |
idx_audit_model · idx_audit_event
Inverted index for full-text search. One row per keyword per message. Written asynchronously by the keyword worker after each query response. Also backfilled at startup for messages that weren't indexed before.
bisect. Average query: <10 ms. The DB table survives restarts; the in-memory index is rebuilt from it at startup.| Column | Type | Notes |
|---|---|---|
| idPK | INTEGER | Autoincrement. |
| keyword | TEXT | Normalised lowercase keyword. Multi-word queries use AND semantics across keyword rows. |
| message_id | TEXT | FK to messages. Also carries the match_message_id returned by GET /search for scroll-to. |
| conversation_id | TEXT | Redundant with messages for direct query without join. |
| role | TEXT | 'user' | 'assistant'. |
| created_at | REAL |
idx_kw_keyword · idx_kw_conv · idx_kw_msg
Structured handoff notes written by each model every ~30 messages, or when context pressure hits 70% of window size. A new window that receives this can pick up immediately without re-briefing.
MAX(context_backups.created_at WHERE model_id=M, conversation_id=C) > MAX(messages.created_at WHERE conversation_id=C). The 6-hour coverage job finds and fills gaps.| Column | Type | Notes |
|---|---|---|
| backup_idPK | TEXT | |
| model_id | TEXT | Each model writes its own backup — not shared across models. |
| conversation_id | TEXT | FK to conversations. |
| backup_text | TEXT | Structured 6-section handoff note: GOAL / DECISIONS MADE / CURRENT STATUS / ACTIVE FILE / PREFERENCES / RESUME INSTRUCTION. |
| token_estimate | INTEGER | chars / 4. Used for coverage reporting. |
| thread_number | INTEGER | Increments each time a backup triggers a counter reset. Tracks backup generations. |
| trigger | TEXT | 'token_threshold' | 'message_count' | 'time_gap' | 'manual' | 'coverage_job_stale'. |
| created_at | REAL | Append-only — old rows never deleted (audit trail). |
idx_backup_conv · idx_backup_model (model_id, conversation_id)
Shadow token counter per model per conversation. Resets to zero when a context backup is stored (new thread starts). The three backup triggers check this table.
| Column | Type | Notes |
|---|---|---|
| counter_idPK | TEXT | UNIQUE(model_id, conversation_id). |
| model_id | TEXT | |
| conversation_id | TEXT | |
| token_estimate | INTEGER | chars / 4 heuristic. Resets to 0 after backup. |
| message_count | INTEGER | Messages since last backup. Triggers at 30. |
| thread_number | INTEGER | Matches context_backups.thread_number for the current window. |
| last_backup_at | REAL | Used for the 24h time-gap trigger. |
| updated_at | REAL |
idx_counter_conv
Per-model recovery prompts — written by the model itself from its active corrections. Regenerated every 15 minutes or when the set of corrections changes (tracked by SHA-256 hash). Injected at the start of every new session.
| Column | Type | Notes |
|---|---|---|
| model_idPK | TEXT | UNIQUE(model_id, conversation_id). MVP: conversation_id always NULL (global prompts). |
| conversation_id | TEXT | Always NULL currently. Reserved for per-conversation prompts. |
| prompt_text | TEXT | The full prompt text injected before user messages. |
| generated_at | REAL | Staleness check: >24h → regenerate. |
| corrections_hash | TEXT | SHA-256 of sorted correction IDs at generation time. Changed hash → regenerate. |
Named groups for conversations. When a project is active, the sidebar shows only that project's conversations and new chats are automatically assigned to it. Corrections and memory remain global.
| Column | Type | Notes |
|---|---|---|
| project_idPK | TEXT | UUID v4. Referenced by conversations.project_id. |
| user_id | TEXT | Always 'local'. |
| name | TEXT | Display name (e.g. 'Immigration Law'). |
| created_at | REAL |
File upload metadata. Raw file bytes are never stored in the DB — only extracted text metadata. Keyword indexing runs on the extracted text.
| Column | Type | Notes |
|---|---|---|
| attachment_idPK | TEXT | |
| message_id | TEXT | FK to messages. Set after query completes. |
| conversation_id | TEXT | |
| filename | TEXT | Original filename. |
| mimetype | TEXT | 'application/pdf' | 'image/png' etc. |
| size_bytes / page_count / word_count | INTEGER | Metadata extracted at upload time. |
| is_image / truncated | INTEGER | Flags. |
| preview | TEXT | First 200 chars of extracted text. |
| created_at | REAL |
idx_attach_conv · idx_attach_msg
One row per app session. Written 'running' at startup, updated to 'clean' on shutdown. On next launch: any 'running' row = crash detected → async report filed.
| Column | Type | Notes |
|---|---|---|
| session_idPK | TEXT | UUID generated at startup. |
| status | TEXT | 'running' → 'clean' → 'reported'. Prevents duplicate reports. |
| started_at / ended_at | REAL | Session timestamps. |
| system_log_snippet / api_log_snippet | TEXT | Last 50 log lines captured for the report. |
Error reports queued for async sending. If the app is live, reports go immediately. If not, they're sent on next startup. Prevents report loss during crashes.
| Column | Type | Notes |
|---|---|---|
| report_idPK | TEXT | |
| error_type / error_message / traceback | TEXT | Error details. |
| context | TEXT | JSON extra context (query, model, domain). |
| sent | INTEGER | 0 = pending, 1 = sent. |
| sent_at | REAL | Timestamp when sent. |
Caches the last successfully fetched models.json from GitHub Pages. Fallback when offline. Three-level chain: remote JSON → DB cache → hardcoded fallback.
| Column | Type | Notes |
|---|---|---|
| cache_idPK | TEXT | Always 'remote' — single row. |
| config_json | TEXT | Full models.json content. |
| fetched_at | REAL | Refreshed every 24 h. |
Dynamic domain ontology tree. L0 roots (code, legal, medical, finance, etc.) are fixed anchors. Sub-nodes are promoted dynamically when query volume and divergence thresholds are met. The tree drives specialist routing.
| Column | Type | Notes |
|---|---|---|
| node_idPK | TEXT | Slug (e.g. 'code.python.async'). |
| parent_id | TEXT | Self-referential FK. NULL for L0 roots. |
| depth | INTEGER | 0 = L0 root, 1 = sub-domain, etc. |
| display_name | TEXT | Human-readable label. |
| aliases | TEXT | JSON array of alternative names that map to this node. |
| embedding | BLOB | Serialised float32 centroid for semantic routing. Optional. |
| query_count | INTEGER | Queries routed to this node. Used in promotion decisions. |
| is_l0_root | INTEGER | 1 = fixed anchor, never deleted or demoted. |
idx_dn_parent · idx_dn_depth
Raw domain strings seen but not yet resolved to an alias or new node. Evidence accumulates across sessions (K_MIN=5 occurrences triggers promotion review). Persisted so the threshold survives restarts.
| Column | Type | Notes |
|---|---|---|
| raw_stringPK | TEXT | Exactly as reported by the model. |
| nearest_node | TEXT | Closest existing node_id by cosine / edit similarity. |
| similarity | REAL | 0–1. High similarity → likely an alias, not a new node. |
| query_count | INTEGER | Times seen. Promotion triggers at K_MIN=5. |
| model_sources | TEXT | JSON array of model_ids that reported this string. |
| first_seen / last_seen | REAL |
Single-row table storing all user preferences. Seeded at first launch via INSERT OR IGNORE.
| Column | Type | Notes |
|---|---|---|
| settings_idPK | TEXT | Always 'local' — single row. |
| local_enabled / verify_enabled | INTEGER | Ollama local model mode flags. |
| verify_model_id | TEXT | Model used to verify local model outputs. |
| implicit_threshold | REAL | Layer 1 correction detection sensitivity (default 0.45). |
| content_encrypted | INTEGER | 1 when AES encryption is active. |
| terms_accepted_at | REAL | NULL until first-run T&C accepted. |
| user_token | TEXT | Anonymous machine hash for bug report correlation. Never PII. |
| bugs_pat | TEXT | Write-only GitHub PAT for sending bug reports. Optional. |
| skipped_version | TEXT | Version string the user chose to skip the update banner for. |
Single-user identity table. Always one row with user_id='local' in this release. Reserved for future multi-user support.
| Column | Type | Notes |
|---|---|---|
| user_idPK | TEXT | Always 'local'. |
| created_at | REAL | |
| settings | TEXT | JSON catch-all. Primary settings live in local_settings. |