Two ways to use AUA-Veritas

Most people start with one and discover the other. Both are always running.

🧠 As a context manager

Every conversation keyword-indexed and searchable
Leave for months, come back — AI resumes instantly
Corrections and preferences persist forever, across all chats
Works even if you only use one model

⚖️ As a multi-model verifier

Multiple models compete on every query
Disagreements surfaced explicitly — you pick the better answer
Independent peer review on High/Max accuracy
Game-theoretic model selection via a VCG (Vickrey-Clarke-Groves) mechanism

If you only need one thing: Use Balanced mode with a single model (GPT-4o or Claude Sonnet). The context management, keyword search, and correction memory all work exactly the same. You can add more models later when it matters.

Keyword search across all chats

Keywords are extracted from every message you send and receive right after it is sent, then indexed locally. Search finds matches across months of conversations in milliseconds.

How to search

Use the search bar in the left sidebar. Type any keyword or phrase — the sidebar filters to matching conversations instantly.

🔍 postgres index LIKE query

3 months ago · Backend architecture

"…For prefix LIKE queries, a B-tree index works. For arbitrary substring searches, use GIN with pg_trgm…"

5 months ago · Database design

"…we chose Postgres because ACID matters more than schema flexibility at this stage…"

🔧 How search works under the hood

Every message is keyword-extracted by a background worker after it's sent — zero impact on response speed
Keywords are stored in a local message_keywords SQLite table and simultaneously added to three in-memory structures: an inverted index (keyword → conversations), a message-level index (keyword → conversation → messages), and a sorted keyword list for prefix matching
Search never reads the DB. It runs entirely in memory via set intersection (AND semantics: all words must appear) and bisect-based prefix lookup, so latency is typically under 1 ms even with months of history
Clicking a result scrolls to the exact message that matched, not just the conversation
Old conversations are backfilled into the index on first launch — one-time background operation
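
The three in-memory structures described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the class name `KeywordIndex` and its method names are hypothetical, and lower-casing keywords is an assumption.

```python
import bisect
from collections import defaultdict


class KeywordIndex:
    """Hypothetical in-memory index: two inverted maps plus a sorted keyword list."""

    def __init__(self):
        # keyword -> {conversation_id}
        self.conversations = defaultdict(set)
        # keyword -> conversation_id -> {message_id}
        self.messages = defaultdict(lambda: defaultdict(set))
        # every known keyword, kept sorted so bisect can find prefixes
        self.sorted_keywords = []

    def add(self, keyword, conversation_id, message_id):
        keyword = keyword.lower()  # assumption: matching is case-insensitive
        if keyword not in self.conversations:
            bisect.insort(self.sorted_keywords, keyword)
        self.conversations[keyword].add(conversation_id)
        self.messages[keyword][conversation_id].add(message_id)

    def expand_prefix(self, prefix):
        """Return every indexed keyword that starts with `prefix`."""
        prefix = prefix.lower()
        i = bisect.bisect_left(self.sorted_keywords, prefix)
        matches = []
        while i < len(self.sorted_keywords) and self.sorted_keywords[i].startswith(prefix):
            matches.append(self.sorted_keywords[i])
            i += 1
        return matches

    def search(self, query):
        """AND semantics: a conversation must match every word in the query."""
        result = None
        for word in query.split():
            hits = set()
            for keyword in self.expand_prefix(word):
                hits |= self.conversations[keyword]
            result = hits if result is None else result & hits
            if not result:
                return set()
        return result or set()
```

Because each query word is expanded by prefix before the intersection, typing "ind" matches both "index" and "indexing". The `messages` map is what would let a click jump to the exact matching message rather than just the conversation.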

What to search for: Technical terms work best — library names, error messages, architectural decisions, specific API names. The extractor is domain-aware and weights code-adjacent terms highly.

Resuming a conversation after months

This is where the context management pays off. You don't need to re-explain your project every time you come back to it.

What happens automatically

1. Corrections and preferences are injected

Every stored correction is scored against your new message. "Always use TypeScript", "prefer metric units", "our API uses kebab-case" — anything you've taught the AI is injected into the context automatically, even if you taught it months ago in a different conversation.

2. Model recovery prompts fire

Each model generates and stores a recovery prompt — a self-written system message that summarises your active corrections, preferred domains, and reliability record. When you return to a conversation, stale recovery prompts are regenerated silently in the background before your message is sent.

3. Context backups carry over session history

At configurable intervals (auto, 15 min, hourly), each model writes a compressed context summary. These summaries are injected when the conversation window would otherwise lose history — you never hit a wall where the AI "forgets" the first half of a long project.

What you need to do

Almost nothing. Click on a conversation from the sidebar, type your follow-up question, and send. The AI picks up from where you left off. The only time you need to re-explain something is if you deliberately want to change direction.

📋 Tips for long-running projects

Use Projects to group related conversations — corrections scoped to a project only apply within it
Pin important corrections in the Memory tab so they always inject regardless of relevance score
Use Generate / View prompt ↗ to manually trigger a recovery prompt refresh after a major project change
Search for old decisions before asking a new question — saves tokens and keeps reasoning consistent

Installation

AUA-Veritas runs natively on macOS with Apple Silicon (M-series). No Python or Node installation needed — everything is bundled.

1. Download the DMG

Go to the GitHub Releases page and download AUA-Veritas-0.1.0-arm64.dmg.

2. Mount and install

Double-click the DMG to mount it. Drag AUA-Veritas to your Applications folder.

3. Bypass Gatekeeper on first launch

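The app is unsigned, so Gatekeeper blocks a normal double-click. On first launch: right-click → Open → Open in the dialog. You only need to do this once. On macOS 15 (Sequoia) and later the right-click route is no longer offered; after the first blocked launch, open System Settings → Privacy & Security and click Open Anyway.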

System requirements: macOS 12+ · Apple Silicon (M1/M2/M3/M4) · ~250 MB disk space

Adding API keys

Veritas calls frontier model APIs directly — you pay the providers at cost, no markup. You need at least one key, but more means better competition.

📋 Supported providers

OpenAI — GPT-4o, GPT-4o-mini. Get at platform.openai.com/api-keys
Anthropic — Claude Sonnet 4, Claude Haiku. Get at console.anthropic.com
Google — Gemini 1.5 Pro, Gemini 2.0 Flash. Get at aistudio.google.com
Groq — Llama 3.3 70B (free tier available). Get at console.groq.com

1. Open Settings

Click the ⚙️ gear icon in the top-right of the sidebar, or use Cmd+,.

2. Enter keys in the API Keys section

Keys are stored in macOS Keychain — encrypted at the OS level, never written to disk in plain text.

3. Enable the models you want

Use the checkboxes in the left sidebar to enable or disable models per-conversation. More models = better VCG elections, but higher cost.

Recommendation: Enable GPT-4o + Claude Sonnet + one Groq model as a baseline. This gives you OpenAI's breadth, Anthropic's reasoning, and a free fast model for the VCG competition.

Your first conversation

Just type in the input at the bottom and press Enter. Veritas handles everything else.

AUA-Veritas

You typed:

"What's the best way to index a Postgres table for a LIKE query?"

Veritas response (GPT-4o selected · 2.7s):

For prefix LIKE queries (e.g. name LIKE 'Jo%'), a standard B-tree index works well. For arbitrary substring searches (e.g. LIKE '%smith%'), use a GIN index with pg_trgm extension...

⚡ Memory injected: 1 correction · Domain: code/databases · Winner welfare: 0.68

🔍 What happened behind the scenes

Veritas scored your stored corrections against the question — any relevant memories were injected into the system prompt
All enabled models were called simultaneously with a competitive evaluation prompt
Each model self-reported its domain using a DOMAINS: tag (stripped before display)
The VCG welfare formula selected GPT-4o as the winner for this query's domain (code/databases)
The answer was displayed and the model run was recorded for future routing

Accuracy modes

Choose how much verification overhead you want. Fast for quick questions, Max for anything critical.

Mode	Models called	Peer review	Best for
Fast	1 (VCG winner only)	No	Quick questions, low-stakes queries
Balanced	All enabled	On disagreement only	Most everyday use — good accuracy/cost ratio
High	All enabled	Always	Technical questions, important decisions
Max	All enabled	Always + correction check	High-stakes queries: medical, legal, financial

Tip: Use Balanced as your default. Switch to High when the answer matters and you want independent verification across models. Save Max for decisions you'd regret getting wrong.

Memory & the correction system

This is what makes Veritas different. Every correction you make is stored permanently and injected into future queries where it's relevant.

How memory injection works

Before every query, Veritas scores all stored corrections against the current question using 8 factors:

📊 Scoring factors

Relevance — semantic similarity between the correction and the current query
Failure prevention value — how bad is it if this correction is missed?
Importance — correction priority (pinned corrections always score higher)
Recency — newer corrections are slightly preferred
Confidence — how certain was the correction at time of storage?
Staleness — corrections older than decay threshold score lower
Token cost — very long corrections are penalised to avoid context bloat
Pinned status — pinned corrections always pass the threshold

Only corrections scoring above 0.30 are injected, which keeps irrelevant memories out of the context window. Pinned corrections are the exception: they are injected regardless of score.
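
To make the threshold concrete, here is a minimal sketch of how the eight factors might combine into one score. Only the 0.30 threshold and the "pinned always passes" rule come from the text above; the weights, penalty sizes, the linear formula, and all names (`Correction`, `score`, `select_for_injection`) are assumptions for illustration.

```python
from dataclasses import dataclass

INJECTION_THRESHOLD = 0.30  # stated in the text above

# Hypothetical weights: the real formula is not documented here.
WEIGHTS = {
    "relevance": 0.40,
    "failure_prevention": 0.20,
    "importance": 0.15,
    "recency": 0.10,
    "confidence": 0.15,
}
STALENESS_PENALTY = 0.15  # hypothetical
TOKEN_PENALTY = 0.10      # hypothetical


@dataclass
class Correction:
    text: str
    relevance: float           # every factor is assumed to be scaled to [0, 1]
    failure_prevention: float
    importance: float
    recency: float
    confidence: float
    staleness: float
    token_cost: float
    pinned: bool = False


def score(c):
    """Weighted sum of the positive factors minus staleness and length penalties."""
    positive = sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())
    penalty = STALENESS_PENALTY * c.staleness + TOKEN_PENALTY * c.token_cost
    return max(0.0, positive - penalty)


def select_for_injection(corrections):
    """Keep pinned corrections and anything above the threshold, best first."""
    chosen = [c for c in corrections if c.pinned or score(c) > INJECTION_THRESHOLD]
    return sorted(chosen, key=score, reverse=True)
```

With these made-up weights, a highly relevant correction clears the threshold easily, while an off-topic one is dropped unless it is pinned.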

Memory types stored: Factual corrections ("The answer is 53, not 57"), persistent instructions ("Always use metric units"), domain rules ("Prefer Postgres over MySQL"), and model preferences ("I prefer Claude's style on writing tasks").

Making corrections

Three ways to teach Veritas something new.

A. Implicit correction (just reply naturally)

Say "No, it's 53" or "Actually metric units please" — Veritas detects the correction intent using semantic similarity and asks you to confirm. Once confirmed, it's stored permanently.

B. Explicit correction prefix

Start your message with "correction:" to skip the detection step. Example: "correction: always use TypeScript, never plain JavaScript".

C. "I know that…" prefix

For persistent instructions, start with "I know that". Example: "I know that we use kebab-case for all our CSS class names". This is stored as a global rule and injected whenever CSS is relevant.
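
The routing between the three paths can be sketched as a simple prefix check. This is an illustrative guess at the behaviour, not the app's code: the function name `classify_message`, the returned labels, and case-insensitive prefix matching are all assumptions.

```python
EXPLICIT_PREFIX = "correction:"
INSTRUCTION_PREFIX = "i know that"


def classify_message(message):
    """Return (kind, payload) for a user message.

    kind is one of:
      "explicit_correction"     message started with "correction:"
      "persistent_instruction"  message started with "I know that"
      "needs_detection"         fall back to implicit correction detection
    """
    stripped = message.strip()
    for prefix, kind in (
        (EXPLICIT_PREFIX, "explicit_correction"),
        (INSTRUCTION_PREFIX, "persistent_instruction"),
    ):
        # Assumption: prefixes are matched case-insensitively.
        if stripped[: len(prefix)].lower() == prefix:
            return kind, stripped[len(prefix):].strip()
    return "needs_detection", stripped
```

Anything that falls through to "needs_detection" would then go to the semantic-similarity check and confirmation prompt described under option A.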

Correction sensitivity: You can adjust how aggressively implicit corrections are detected in Look Under the Hood → Memory tab → Correction intelligence settings. Lower threshold = more prompts, higher = fewer.

Model disagreements

When models give meaningfully different answers, Veritas surfaces the disagreement explicitly rather than silently picking one.

⚠ Models disagree on this answer

GPT-4o: "Use a singleton pattern here"

Claude Sonnet: "A singleton would be an antipattern — use dependency injection"

Pick the answer you prefer:

GPT-4o Claude Sonnet ✓

📌 What happens when you pick

Your preference is recorded as a model preference correction
The chosen model gets a VCG win recorded in that domain — its effective_u rises
Over time, the model you prefer on disagreements gets routed to more often in that domain
No point is awarded to any model until you pick — disagreements don't inflate winners

Look Under the Hood — Overview tab

Click the 📊 chart icon in the top-right of the Quality panel to open the full analytics dashboard. It has 5 tabs.

The Overview tab shows high-level health metrics for your session:

📈 What you'll see

Total queries — how many questions asked this session
Agreement rate — % of queries where all models agreed (high = consistent, reliable answers)
Peer review rate — % of queries that triggered independent verification
Active corrections — how many stored memories are actively being injected
Session cost — total API spend, broken down by provider

Look Under the Hood — Models tab

Per-model performance breakdown — who's winning, and why.

⚖️ What the scores mean

Welfare score — the VCG score this model got this session (higher = more wins)
Win rate — fraction of queries where this model was selected as the VCG winner
Confidence — average confidence score from peer review (80% = models agree it's correct)
Latency — average response time; determines whether this model is used for queries in Fast mode
Domain strengths — which domains this model has been winning in across your history

What to look for: If one model consistently has a much higher win rate than others, consider whether it's genuinely better or just getting easier queries assigned to it by the routing. Check the Domains tab to see if the routing is domain-specific.

Look Under the Hood — Decisions tab

The most informative tab — a full trace of every decision made on each query.

Click any query in the list to expand its decision chain:

Correction check     → 1 correction injected (metric units rule)
Memory retrieval     → 3 relevant memories found, 2 above threshold
Models called        → GPT-4o · Claude Sonnet · Gemini (3 models)
VCG selection        → GPT-4o selected (W = 0.73)
                           Claude Sonnet: W = 0.68
                           Gemini:        W = 0.61
Peer review          → Correct (all models agree)
Confidence label     → High (80%)

🔎 Reading the VCG scores

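Scores are computed as W_i = Σ_j p(j|q) · u_i(j), where p(j|q) is the weight of domain j for query q and u_i(j) is model i's win rate in domain j. In other words, W_i is a weighted sum of domain-specific win rates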
A score near 0.5 means the model hasn't built up much history in this domain yet — neutral prior
A score above 0.7 means the model has a strong track record in this query's domain
Large gaps between models indicate strong domain specialization in your usage
If all scores are near 0.5, you haven't sent enough queries in this domain for differentiation yet
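
As a worked example of the formula above, the sketch below computes W_i for three models and picks the highest. The function names, the domain weights, and the per-domain win rates are all made up for illustration (chosen so the totals match the trace shown earlier); falling back to 0.5 for a domain with no history is an assumption based on the "neutral prior" note.

```python
NEUTRAL_PRIOR = 0.5  # assumed value for a model with no history in a domain


def welfare(domain_probs, win_rates):
    """W_i = sum over domains j of p(j|q) * u_i(j)."""
    return sum(
        p * win_rates.get(domain, NEUTRAL_PRIOR)
        for domain, p in domain_probs.items()
    )


def select_winner(domain_probs, models):
    """Return (winning model name, {model name: W})."""
    scores = {name: welfare(domain_probs, rates) for name, rates in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical inputs: the query is judged 80% "code" and 20% "general".
example_domain_probs = {"code": 0.8, "general": 0.2}
example_models = {
    "GPT-4o": {"code": 0.78, "general": 0.53},         # 0.624 + 0.106 = 0.73
    "Claude Sonnet": {"code": 0.70, "general": 0.60},  # 0.560 + 0.120 = 0.68
    "Gemini": {"code": 0.65, "general": 0.45},         # 0.520 + 0.090 = 0.61
}
```

Calling `select_winner(example_domain_probs, example_models)` selects GPT-4o. A model with no recorded win rates at all would score exactly 0.5, which is why near-0.5 scores signal missing history rather than poor performance.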

Look Under the Hood — Memory tab

View and manage all stored corrections and preferences. Two sub-views within one tab.

Memory view

All stored corrections with their type, domain, scope (global / project / conversation), and creation date. Filter by type using the pill buttons at the top.

Corrections view

Correction intelligence settings — two controls:

⚡ Correction intelligence settings

Implicit sensitivity slider (0.20 – 0.80) — how aggressively to detect corrections in your replies. Default 0.45. Lower = more prompts (catches more corrections, more false positives). Higher = fewer prompts (misses more, fewer interruptions).
Validation mode — Plausible (default): a cheap model checks if the correction could be true before storing it. Strict: full cross-check, slower but rejects more noise.

Look Under the Hood — Domains tab

See the live domain taxonomy that Veritas has learned from your query patterns.

Tree view

The 10 L0 root domains are always present (code, mathematics, science, legal, medical, finance, writing, analysis, history, general). As you use the app, sub-domains grow beneath them when models consistently report a specific sub-domain and performance diverges from the parent.

Click any node to expand its children. The query count badge shows how many queries have been routed to that domain across your session history.

Candidates view

Domain strings that models have reported but that haven't yet been promoted to full nodes. Each candidate shows:

📊 Candidate fields

raw_string — the exact string a model returned (e.g. "constitutional law")
nearest_node — the closest existing tree node (e.g. "legal")
similarity — edit-distance similarity to the nearest node (0–1)
query count — how many times this string has been reported (needs ≥5 to be considered for promotion)
model count — how many distinct models have reported it (needs ≥2)

Promotion: A candidate becomes a full node when it has ≥5 queries from ≥2 models AND the performance divergence between it and its parent node exceeds the branch-relative threshold. This happens automatically in a background job every 5 minutes.
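
The promotion rule can be read as a three-part check. The sketch below is a hedged illustration: the ≥5 and ≥2 limits come from the text, while the `Candidate` fields for win rates, measuring divergence as an absolute win-rate difference, and passing the branch-relative threshold in as a plain number are assumptions.

```python
from dataclasses import dataclass

MIN_QUERIES = 5  # from the text above
MIN_MODELS = 2   # from the text above


@dataclass
class Candidate:
    raw_string: str
    nearest_node: str
    query_count: int
    model_count: int
    win_rate: float         # hypothetical: performance on this candidate's queries
    parent_win_rate: float  # hypothetical: performance on the parent node


def should_promote(candidate, divergence_threshold):
    """True when the candidate has enough evidence and diverges from its parent."""
    enough_evidence = (
        candidate.query_count >= MIN_QUERIES
        and candidate.model_count >= MIN_MODELS
    )
    # Assumption: divergence is the absolute gap between the two win rates.
    divergence = abs(candidate.win_rate - candidate.parent_win_rate)
    return enough_evidence and divergence > divergence_threshold
```

A background job would run this over every candidate on each pass and promote the ones that return True.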

Context prompts

Each model writes its own recovery prompt — a system message it would send itself to restore full context if the conversation restarts.

Click Generate / View prompt ↗ at the bottom of the Memory panel (right sidebar) to open the modal. The system calls each model whose prompt is stale and asks it to write a prompt incorporating your current corrections, preferences, and top domains, then saves the result.

🔄 When context prompts are auto-sent

A model drops a rule (compliance monitor detects streak ≥ 2)
You start a new chat window after a long gap
Context window pressure is detected mid-conversation

Prompts are checked for staleness every 15 minutes in the background. A prompt is stale if it is older than 24 hours or if a new correction has been added since it was generated. Only stale prompts are regenerated; fresh ones show their saved text immediately.
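
The staleness rule is small enough to express directly. This is a minimal sketch of the two conditions stated above; the function name `is_stale` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

MAX_PROMPT_AGE = timedelta(hours=24)  # from the text above


def is_stale(generated_at, last_correction_at, now):
    """A prompt is stale if it is too old or predates the newest correction.

    generated_at:        when the recovery prompt was written
    last_correction_at:  when the most recent correction was stored, or None
    now:                 current time (passed in so the check is easy to test)
    """
    if now - generated_at > MAX_PROMPT_AGE:
        return True
    return last_correction_at is not None and last_correction_at > generated_at
```

The 15-minute background check would call this once per model and regenerate only the prompts for which it returns True.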

Local models (Ollama)

Run Ollama models locally for free, private inference. Local models go through the same VCG elections as frontier models, but compete only against other local models because local mode is exclusive.

1. Install Ollama

Download from ollama.com and run ollama pull llama3.2 or any model you prefer.

2. Enable in Settings

Toggle Local models in Settings. Veritas will auto-discover models running on the default Ollama port (11434).
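
If you want to check what a discovery step is likely to find, Ollama's HTTP API lists locally installed models at `/api/tags`. The sketch below queries that endpoint with the standard library. Whether Veritas uses this exact endpoint is an assumption, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama port


def parse_model_names(payload):
    """Extract sorted model names from an /api/tags response body."""
    return sorted(model["name"] for model in payload.get("models", []))


def discover_local_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return installed Ollama model names, or [] if Ollama is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return []
```

An empty list usually means Ollama is not running or no model has been pulled yet.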

3. Local mode is exclusive

You can run frontier models OR local models — not both simultaneously. Use the toggle to switch. Local models are ideal for sensitive queries that shouldn't leave your machine.

Bug reporting

Found something broken? Report it in one click — no account, no email required.

Click the 🐛 Report a bug button at the bottom of the left sidebar. A modal opens with a comment field and opt-in checkboxes for including your last 5 messages (for context) and your email (for follow-up).

Privacy: Reports go to a private GitHub repo accessible only to the developer. An anonymous 8-character machine hash is included for correlation — no name, location, or personal data unless you explicitly opt in.
