Praneeth Tota · Independent Researcher · v1.0.0
This appendix contains formal mathematical foundations for the utility function and system components introduced in §§4–5. Each result is cross-referenced from its first appearance in the main text. Proofs are self-contained; familiarity with basic real analysis and probability is assumed.
Index of results: - Theorem B.1 (§B.1): Additive linear structure of U — Debreu + Cauchy derivation - §B.2: Field weight justification — cost proportionality design principle - Proposition B.3 (§B.3): Efficacy sigmoid — Mann-Whitney interpretation - Theorem B.4 (§B.4): EMA confidence update — Kalman optimality for ρ = 0.05 - Proposition B.6 (§B.6): Curiosity cap — exploitation dominance proof - Theorem B.5 (§B.5): Confidence convergence — geometric convergence in expectation, recovery time - Theorem B.7 (§B.7): Personality stability — Lyapunov analysis, bounded stable dynamics - §B.8: VCG welfare score — Lemma B.8.1 (shrinkage MSE dominance), Propositions B.8.2–B.8.3 (multi-domain efficiency, domain-tree property preservation), Theorems S1–S3 (dominant strategy, POA=1, individual rationality) — full proofs
Before analyzing the properties of the utility function $U$, we justify its structure from first principles. In its current form,
$$U(E, C, K; f) = w_e(f)\,E + w_c(f)\,C + w_k(f)\,K$$
may appear as a convenient aggregation. This section establishes that its structure is not arbitrary, but arises naturally from a set of desiderata on how performance, consistency, and exploration should contribute to decision-making.
We seek a utility function $U : [0,1]^3 \to \mathbb{R}$ over three measurable dimensions: - efficacy $E$, - confidence $C$, - curiosity $K$,
satisfying the following axioms.
A1 (Monotonicity). $U$ is strictly increasing in each argument.
A2 (Continuity). $U$ is continuous on $[0,1]^3$.
A3 (Marginal Independence / Separability). The marginal effect of improving one dimension does not depend on the current level of the others. Formally, for any $E, E', C, C', K$:
$$U(E,C,K) - U(E',C,K) = U(E,C',K) - U(E',C',K)$$
A4 (Field-Invariant Structure). The functional form of $U$ is identical across fields; only the weight vector $w(f)$ may vary with $f$.
A5 (Linear Scaling Invariance). For all $\lambda \in (0, 1]$ such that $(\lambda E, \lambda C, \lambda K) \in [0,1]^3$:
$$U(\lambda E,\, \lambda C,\, \lambda K) = \lambda\, U(E, C, K)$$
Motivation. A5 states that scaling all dimensions by the same factor scales utility proportionally. This is natural when $E$, $C$, and $K$ are all measured on the same normalized $[0,1]$ scale: an agent at half performance, half confidence, and half curiosity should have half the utility. It rules out curvature in the component functions and is the standard homogeneity assumption in welfare economics (Blackorby and Donaldson, 1982).
Remark (A5 as the load-bearing axiom). A5 is the substantive modelling commitment in Theorem B.1. Axioms A1–A3 already force an additively separable form $\phi_E(E)+\phi_C(C)+\phi_K(K)$ via Debreu's theorem, leaving $\phi_E$, $\phi_C$, $\phi_K$ as arbitrary continuous increasing functions. It is A5 alone — linear-scaling homogeneity — that forces each $\phi_i$ to be linear, making the additive-linear form $U = w_e E + w_c C + w_k K$ necessary rather than merely convenient. A reviewer who accepts A1–A3 but finds A5 too restrictive could replace it with a weaker concavity assumption, obtaining a separable concave utility (e.g. $U = w_e\sqrt{E} + w_c\sqrt{C} + w_k\sqrt{K}$) that also satisfies A1–A4 but violates A5. The linear form is therefore the unique result of A1–A5 jointly, and A5 is the axiom that should be scrutinised if one disputes the linearity conclusion.
Under axioms A1–A5, the utility function is necessarily of the form
$$U(E, C, K; f) = w_e(f)\,E + w_c(f)\,C + w_k(f)\,K$$
with $w_e(f), w_c(f), w_k(f) > 0$ and $w_e(f) + w_c(f) + w_k(f) = 1$.
Step 1 — Additive representation from A1–A3.
A3 states a cardinal difference-independence condition: the change in $U$ from improving one argument is unaffected by the fixed levels of the others. This is strictly stronger than the ordinal preferential-independence hypothesis required by Debreu's additive representation theorem — cardinal difference independence implies preferential independence, since equality of utility differences across levels of the other arguments implies the corresponding preference orderings are unaffected by those levels. Applying A3 pairwise — $E$ independent of $(C,K)$, $C$ independent of $(E,K)$, $K$ independent of $(E,C)$ — therefore yields the mutual preferential independence Debreu requires. By the theorem of Debreu (1960, Theorem 3): a continuous utility function on a connected domain with mutually preferentially independent components (of dimension $\geq 3$, which holds here) admits an additively separable representation. Therefore there exist continuous strictly increasing functions $\phi_E, \phi_C, \phi_K : [0,1] \to \mathbb{R}$ such that:
$$U(E, C, K) = \phi_E(E) + \phi_C(C) + \phi_K(K)$$
Step 2 — Linearity from A5 via the Cauchy functional equation.
Substitute the additive form into A5:
$$\phi_E(\lambda E) + \phi_C(\lambda C) + \phi_K(\lambda K) = \lambda\bigl(\phi_E(E) + \phi_C(C) + \phi_K(K)\bigr)$$
Fix $C = C_0 \in (0,1]$ and $K = K_0 \in (0,1]$ and vary $E \in (0,1]$:
$$\phi_E(\lambda E) - \lambda\,\phi_E(E) = \lambda\,\phi_C(C_0) - \phi_C(\lambda C_0) + \lambda\,\phi_K(K_0) - \phi_K(\lambda K_0)$$
The right-hand side depends only on $C_0$, $K_0$, and $\lambda$ — not on $E$. Therefore the left-hand side must be constant in $E$:
$$\phi_E(\lambda E) - \lambda\,\phi_E(E) = h(\lambda) \qquad \text{for all } E,\lambda \in (0,1], \tag{$*$}$$
where $h : (0,1] \to \mathbb{R}$ is a function of $\lambda$ alone. We now determine $\phi_E$ and $h$ using only continuity — no differentiability is assumed.
Step 2a — Determine $h$ via iterated application of $(*)$. Apply $(*)$ twice in sequence: first with scaling $\lambda_2$ then with $\lambda_1$, and vice versa:
$$\phi_E(\lambda_1\lambda_2 E) = \lambda_1\phi_E(\lambda_2 E) + h(\lambda_1) = \lambda_1\lambda_2\phi_E(E) + \lambda_1 h(\lambda_2) + h(\lambda_1)$$
By symmetry (swapping $\lambda_1 \leftrightarrow \lambda_2$):
$$\phi_E(\lambda_1\lambda_2 E) = \lambda_1\lambda_2\phi_E(E) + \lambda_2 h(\lambda_1) + h(\lambda_2)$$
Equating the two expressions:
$$\lambda_1 h(\lambda_2) + h(\lambda_1) = \lambda_2 h(\lambda_1) + h(\lambda_2) \implies (\lambda_1 - 1)h(\lambda_2) = (\lambda_2 - 1)h(\lambda_1)$$
For any $\lambda_1 \neq 1$ and $\lambda_2 \neq 1$, this gives $h(\lambda_1)/(\lambda_1 - 1) = h(\lambda_2)/(\lambda_2 - 1) =: c$ for some constant $c \in \mathbb{R}$. By continuity of $\phi_E$ (and hence of $h$, which equals $\phi_E(\lambda E) - \lambda\phi_E(E)$), this extends to $\lambda = 1$ with $h(1) = c(1-1) = 0$. Therefore:
$$h(\lambda) = c(\lambda - 1) \quad \text{for all } \lambda \in (0,1]$$
Step 2b — Conclude linearity of $\phi_E$. Substituting $h(\lambda) = c(\lambda-1)$ into $(*)$ and setting $E = 1$:
$$\phi_E(\lambda) = \lambda\,\phi_E(1) + c(\lambda - 1) = \lambda\bigl(\phi_E(1) + c\bigr) - c$$
Setting $w_E = \phi_E(1) + c$ and $c_E = -c$, we obtain $\phi_E(E) = w_E E + c_E$ for all $E \in (0,1]$, and by continuity at $E = 0$ as well. This derivation uses only continuity (A2); no differentiability of $\phi_E$ is required.
By the same argument applied separately to $C_0$ and $K_0$:
$$\phi_C(C) = w_C\,C + c_C, \qquad \phi_K(K) = w_K\,K + c_K$$
Step 3 — Normalization (conventions).
The two normalizations below are conventions, not consequences of A1–A5; they fix the otherwise-free origin and scale of the (cardinal) utility, which any additive representation leaves undetermined up to a positive affine transformation. First, adopting the convention $U(0,0,0) = 0$ gives $c_E + c_C + c_K = 0$. Under A4, the functional form is the same across fields, so field dependence enters only through $w_i(f)$, not through $c_i$. The natural convention $\phi_i(0) = 0$ (zero contribution from a zero-valued dimension) gives $c_i = 0$ individually. Second, adopting the convention $U(1,1,1) = 1$ fixes the scale and yields $w_E + w_C + w_K = 1$. Strict monotonicity (A1) requires $w_i > 0$. These conventions do not affect the form of $U$ — only its origin and scale — so the linear additive structure is the substantive result and the unit simplex constraint is a labelling choice.
Therefore:
$$U(E, C, K; f) = w_e(f)\,E + w_c(f)\,C + w_k(f)\,K \qquad \blacksquare$$
Non-separable forms such as $U = E \cdot C$ violate A3: the marginal utility of increasing $E$ depends on the current level of $C$, creating the undesirable incentive of concentrating on dimensions already performing well rather than improving weaknesses. Non-homogeneous forms such as $U = \sqrt{E \cdot C \cdot K}$ violate A5: an agent at half performance does not achieve half utility. The linear form is not merely convenient — it is the unique form satisfying all five axioms jointly.
Different domains place different importance on correctness, reliability, and exploration. We model this via a field-dependent weight vector $w(f)$.
Setup. Define: - $c_E(f)$: expected cost of an incorrect output in field $f$, - $c_C(f)$: expected cost of internal inconsistency in field $f$, - $c_K(f)$: expected cost of failing to explore high-upside domains in field $f$.
Design principle. We set:
$$w_i(f) = \frac{c_i(f)}{c_E(f) + c_C(f) + c_K(f)}$$
so that the gradient $\nabla_x U = (w_e, w_c, w_k)$ is proportional to the cost vector. This ensures that a unit improvement in the highest-cost dimension produces the largest utility gain, aligning the agent's optimization with domain-specific risk.
Empirical calibration. The weight ordering is verified against professional liability standards:
| Field | $c_E$ | $c_C$ | $c_K$ | $w_e$ | $w_c$ |
|---|---|---|---|---|---|
| Surgery / Aviation | Very high (irreversible harm) | Very high (trust, procedure) | Low | 0.20 | 0.70 |
| Law | High (precedent, liability) | High (consistency) | Low | 0.30 | 0.60 |
| Software Engineering | Moderate (fixable) | Moderate | Moderate | 0.55 | 0.35 |
| Creative Writing | Low (subjective) | Very low | High (novelty) | 0.80 | 0.05–0.10 |
The weight ordering $w_c(\text{surgery}) \gg w_c(\text{creative})$ is consistent with medical malpractice standards, ICAO Annex 13 aviation incident reporting, and ISO 26262 software safety classifications, all of which impose stronger consistency requirements in higher-stakes fields.
Status. This is a decision-theoretic design principle grounded in cost proportionality, not a strict optimality theorem. The weights encode domain knowledge and are calibrated empirically. Future work may derive them from a formal expected-harm minimization over a specified loss model.
We define efficacy as a function of the relative performance ratio:
$$r = \frac{\text{agent performance}}{\text{human baseline}}$$
using the transformation:
$$E(r) = \frac{r}{1+r}$$
| Property | Formula | Value |
|---|---|---|
| Parity with human baseline | $E(1)$ | $0.5$ |
| Bounded above | $\lim_{r\to\infty} E(r)$ | $1$ |
| Bounded below | $\lim_{r\to 0} E(r)$ | $0$ |
| Smooth and monotone | $E'(r)$ | $1/(1+r)^2 > 0$ |
| Diminishing returns above baseline | $E''(r)$ for $r>1$ | $< 0$ |
Under an extreme-value (Gumbel) performance model — the standard latent-utility model of McFadden (1974) and Luce (1959) — $E(r)$ equals the Mann–Whitney dominance probability exactly, not merely analogously.
Choice of model. We model the log-performance of each agent as a Gumbel (type-I extreme value) random variable with common scale. This is the canonical random-utility specification: it arises whenever performance on a task is the maximum of many independent sub-task contributions (the extremal-types theorem makes Gumbel the limiting distribution of a maximum), and it is the unique location-family model whose pairwise dominance probability is the Bradley–Terry / softmax form. We note explicitly that the earlier log-logistic specification does not yield this closed form: for two independent log-logistic variables the dominance probability is not $r/(1+r)$ (it can be checked numerically that, e.g., $\mu_a-\mu_h = 1$ gives $\approx 0.662$, not $e/(1+e)\approx 0.731$). The Gumbel model is therefore the correct foundation for the softmax form, and we adopt it.
Proposition B.3. Let the log-performances $Y_{\text{agent}} = \log X_{\text{agent}} \sim \text{Gumbel}(\mu_a, \beta)$ and $Y_{\text{human}} = \log X_{\text{human}} \sim \text{Gumbel}(\mu_h, \beta)$ be independent with common scale $\beta$, and let $r = e^{(\mu_a - \mu_h)/\beta}$. Then:
$$P(X_{\text{agent}} > X_{\text{human}}) = \frac{r}{1+r} = E(r)$$
(With the normalization $\beta = 1$ this is $r = e^{\mu_a - \mu_h}$, the ratio of medians.)
Proof. Since $x \mapsto \log x$ is strictly increasing, $P(X_a > X_h) = P(Y_a > Y_h) = P(Y_a - Y_h > 0)$. We use the standard lemma that the difference of two independent Gumbel variables with common scale is logistic:
Lemma (difference of Gumbels). If $Y_a \sim \text{Gumbel}(\mu_a,\beta)$ and $Y_h \sim \text{Gumbel}(\mu_h,\beta)$ are independent, then $Y_a - Y_h \sim \text{Logistic}(\mu_a - \mu_h,\ \beta)$.
Proof of lemma. The Gumbel CDF is $F_Y(y;\mu,\beta) = \exp\!\big(-e^{-(y-\mu)/\beta}\big)$ with density $f_Y(y) = \tfrac1\beta e^{-(y-\mu)/\beta}\exp\!\big(-e^{-(y-\mu)/\beta}\big)$. For $D = Y_a - Y_h$, condition on $Y_h = y$ and integrate. Writing $a = e^{-(\,\cdot\,-\mu_a)/\beta}$ and $b = e^{-(y-\mu_h)/\beta}$, the convolution
$$F_D(d) = \int_{-\infty}^{\infty} F_{Y_a}(d + y)\, f_{Y_h}(y)\, dy$$
evaluates, after the substitution $u = e^{-(y-\mu_h)/\beta}$ (so $du = -\tfrac1\beta u\,dy$), to
$$F_D(d) = \int_0^\infty \exp\!\Big(-u\,e^{-(d-(\mu_a-\mu_h))/\beta}\Big)\,e^{-u}\,du = \frac{1}{1 + e^{-(d-(\mu_a-\mu_h))/\beta}},$$
using $\int_0^\infty e^{-u(1+c)}\,du = 1/(1+c)$ with $c = e^{-(d-(\mu_a-\mu_h))/\beta}$. This is exactly the $\text{Logistic}(\mu_a-\mu_h,\beta)$ CDF. $\square$
Applying the lemma at $d = 0$:
$$P(Y_a - Y_h > 0) = 1 - F_D(0) = 1 - \frac{1}{1 + e^{(\mu_a-\mu_h)/\beta}} = \frac{e^{(\mu_a-\mu_h)/\beta}}{1 + e^{(\mu_a-\mu_h)/\beta}} = \frac{r}{1+r} = E(r). \qquad \blacksquare$$
Scope of the claim. The equality $E(r) = P(X_a > X_h)$ holds under the Gumbel (extreme-value) log-performance model with common scale. This is the same latent-variable structure that produces the multinomial-logit choice model, so the softmax form is not an arbitrary convenience but the dominance probability implied by extreme-value performance. Under a different latent family (e.g. Gaussian log-performance, giving a probit form $\Phi((\mu_a-\mu_h)/\sqrt{2}\,\sigma)$) the dominance probability has a different closed form; the Gumbel choice is a modelling assumption, justified by the extremal-types argument above, not a mathematical necessity.
A linear normalization $E_{\text{lin}}(r) = \min(r, 1)$ has a discontinuous derivative at $r=1$, gives zero marginal utility for any superhuman improvement, and lacks a probabilistic interpretation. The sigmoid form $r/(1+r)$ avoids all three issues and is the natural functional form for a dominance probability under a location-scale family of performance distributions.
Confidence is updated via the exponential moving average:
$$C_{t+1} = (1-\alpha)\,C_t + \alpha\,s_t$$
where $s_t \in [0,1]$ is the observed test pass rate at time $t$.
Model latent domain confidence $\theta_t$ (the agent's true underlying competence) as a random walk observed through noisy pass rates:
$$\theta_{t+1} = \theta_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,\, \sigma_q^2) \quad \text{(process noise)}$$
$$s_t = \theta_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0,\, \sigma_r^2) \quad \text{(observation noise)}$$
The random walk model captures the assumption that true competence changes gradually — through calibration and learning — rather than jumping discontinuously.
In steady state, the Kalman filter for the above system reduces exactly to the EMA update $C_{t+1} = (1-\alpha^*)C_t + \alpha^* s_t$ with optimal gain:
$$\alpha^* = \frac{-\sigma_q^2 + \sqrt{\sigma_q^{4} + 4\sigma_q^{2}\sigma_r^{2}}}{2\sigma_r^2}$$
The choice $\alpha = 0.2$ is optimal when the noise ratio $\rho = \sigma_q^2/\sigma_r^2 = 0.05$.
Proof. The Kalman filter update is $C_{t+1} = C_t + K_t(s_t - C_t)$, identical to the EMA with $\alpha = K_t$. In steady state $K_t \to K^*$. The steady-state error covariance $P^*$ satisfies the discrete algebraic Riccati equation:
$$P^* = \frac{P^* \sigma_r^2}{P^* + \sigma_r^2} + \sigma_q^2$$
with $K^* = P^*/(P^* + \sigma_r^2)$. Substituting $P^* = K^*\sigma_r^2/(1-K^*)$ and simplifying:
$$K^{*2}\sigma_r^2 + K^*\sigma_q^2 - \sigma_q^2 = 0 \quad \Longrightarrow \quad K^* = \frac{-\sigma_q^2 + \sqrt{\sigma_q^{4} + 4\sigma_q^{2}\sigma_r^{2}}}{2\sigma_r^2}$$
Setting $K^* = \alpha^* = 0.2$ and solving for $\rho = \sigma_q^2/\sigma_r^2$:
$$0.2 = \frac{-\rho + \sqrt{\rho^2 + 4\rho}}{2} \implies (0.4 + \rho)^2 = \rho^2 + 4\rho \implies 0.16 = 3.2\rho \implies \rho = 0.05 \qquad \blacksquare$$
The noise ratio $\rho = 0.05$ means process noise is 5% of observation noise: true competence changes slowly relative to the variability of individual test outcomes. This is the correct regime for incremental calibration over many interactions — a single test pass or fail is noisy, while genuine competence changes only through sustained learning. The value $\alpha = 0.2$ is therefore not arbitrary; it is the Kalman-optimal gain for an agent whose true competence evolves at 5% the rate of observational variability.
| $\rho = \sigma_q^2/\sigma_r^2$ | Optimal $\alpha^*$ | Regime |
|---|---|---|
| 0.01 | 0.095 | Very slow competence change — conservative updates |
| 0.05 | 0.200 | Incremental learning (baseline) |
| 0.11 | 0.281 | Moderate-pace learning |
| 0.25 | 0.390 | Fast-changing competence |
For high-stakes fields where competence changes very slowly (surgery, aviation), smaller $\alpha$ values are appropriate. Deriving field-specific optimal gains from domain learning rate estimates is left as future work.
Remark on reasoning direction. The derivation above runs in reverse: the value $\alpha = 0.2$ was adopted first as a practically motivated heuristic (standard EMA learning rate in online learning literature), and the Kalman analysis was then used to characterize the noise regime for which this choice is provably optimal. This is a justification of a prior heuristic rather than a pure derivation. An alternative framing is: if we believe the process-to-observation noise ratio is approximately $\rho \approx 0.05$ — i.e., true competence changes at roughly 5% the rate of single-trial variability, which is plausible for an agent learning incrementally over many interactions — then $\alpha = 0.2$ is the unique Kalman-optimal gain under this model. The sensitivity table confirms the forward direction is consistent: $\rho = 0.05$ returns $\alpha^* = 0.200$ exactly. The validity of the Riccati derivation is model-specific; it relies on the random-walk / Gaussian-noise state-space model stated above, and does not extend to, e.g., heavy-tailed observation noise or non-stationary competence dynamics.
We define:
$$K(d,t) = (C_{\max} - C_d)\;\nu_d\;\bigl(1 + \alpha_f\,\log(1 + n_{\text{fam}})\bigr)$$
where $C_{\max} - C_d$ is the remaining confidence gap, $\nu_d \in [0,1]$ is the novelty of domain $d$, and $n_{\text{fam}}$ counts consecutive familiar interactions (resets on novel problems).
The UCB1 algorithm (Auer et al., 2002) selects arms by:
$$\text{UCB}_d(t) = \hat{\mu}_d + \sqrt{\frac{2\log t}{n_d}}$$
The curiosity term maps onto this structure as follows:
| UCB1 component | Curiosity component | Interpretation |
|---|---|---|
| $\hat{\mu}_d$ (mean estimate) | $C_d$ (confidence) | Current estimated competence |
| $1 - \hat{\mu}_d$ (uncertainty gap) | $C_{\max} - C_d$ | Remaining upside in domain $d$ |
| $\sqrt{2\log t / n_d}$ (exploration bonus) | $\nu_d\,(1 + \alpha_f \log(1 + n_{\text{fam}}))$ | Novelty-scaled familiarity pressure |
Both bonuses are concave and increasing in the "time since last exploration," creating persistent but diminishing pressure to revisit underexplored domains. The key structural difference is functional form: UCB uses $\sqrt{\log t / n}$; we use $\nu \cdot (1 + \alpha \log n_{\text{fam}})$. Both are in the sublinear growth family that prevents any single domain from being ignored indefinitely.
What this establishes: The curiosity term is UCB-inspired — it shares the structural properties (uncertainty-driven, concave in familiarity, bounded by exploitation) that make UCB effective. We do not claim exact equivalence to UCB1 or formal regret optimality; those results require a full bandit analysis under our specific setting, which is left as future work.
Proposition. The constraint $w_k K \leq w_e E + w_c C$ implies that curiosity contributes at most 50% of total utility at all times:
$$r_K \;\triangleq\; \frac{w_k K}{U} \;\leq\; \frac{1}{2}$$
Proof. Let $S = w_e E + w_c C$ (the exploitation component). The cap states $w_k K \leq S$. Total utility is $U = S + w_k K$. Therefore:
$$r_K = \frac{w_k K}{S + w_k K} \leq \frac{S}{S + S} = \frac{1}{2} \qquad \blacksquare$$
Equality holds only when $w_k K = S$, i.e., when curiosity is at its maximum and exploitation and exploration contribute equally. In all other cases $r_K < 1/2$.
The 50% threshold is the tightest constant upper bound derivable from the single constraint "exploitation $\geq$ exploration in utility contribution at all times." A tighter cap (e.g., 30%) would unnecessarily restrict exploration during early learning when $E$ and $C$ are low. A looser cap (e.g., 70%) would permit exploration to dominate even when the agent has high confidence and efficacy — which is the gaming behavior we want to prevent. The 50% bound is therefore not arbitrary: it is the most permissive cap consistent with the requirement that exploitation never falls below exploration.
What remains open. Whether the log-growth function achieves optimal regret guarantees under the multi-armed bandit formulation — including formal minimax bounds — is an open question. The analogy to UCB provides intuition and motivation, and the exploitation-dominance property is proved exactly. A formal regret analysis is deferred to future work.
The utility function $U = w_e E + w_c C + w_k K$ is justified as follows:
| Component | Justification | Status |
|---|---|---|
| Additive structure | Debreu (1960) + linear scaling invariance (A5) | Theorem (proved) |
| Linear $\phi_i$ | Cauchy functional equation from A5 (continuity only, no differentiability required) | Theorem (proved) |
| Field weights $w_i(f)$ | Cost proportionality, calibrated to liability standards | Design principle |
| Efficacy $E(r) = r/(1+r)$ | Mann-Whitney dominance probability under Gumbel (extreme-value) model | Proved (difference-of-Gumbels lemma) |
| Confidence EMA, $\alpha=0.2$ | Kalman-optimal for $\rho=0.05$ noise ratio | Theorem (proved) |
| Curiosity structure | UCB-inspired; exploitation-dominance proved | Partial — regret analysis open |
The formulation is not claimed to be the unique possible design, but it is the minimal, interpretable, and theoretically grounded design consistent with the five axioms. Each component rests on an identified theoretical foundation, and the scope of each claim is stated explicitly.
Recall the confidence update rule with contradiction penalty:
$$C_{t+1} = (1-\alpha)\,C_t + \alpha\,s_t\,(1 - \lambda\mu(f))$$
where: - $\alpha \in (0,1)$ is the EMA learning rate, - $s_t \in [0,1]$ is the observed test pass rate at time $t$, - $\lambda \in [0,1]$ is the contradiction penalty magnitude (zero when no contradiction occurs), - $\mu(f) \geq 1$ is the field penalty multiplier, - $f$ denotes the active field.
Define the effective signal $\tilde{s}_t = s_t(1 - \lambda\mu(f))$. When no contradiction occurs, $\lambda = 0$ and $\tilde{s}_t = s_t$. When a contradiction of magnitude $\lambda$ is detected, $\tilde{s}_t$ is reduced by a factor $(1 - \lambda\mu(f))$.
The update rule becomes:
$$C_{t+1} = (1-\alpha)\,C_t + \alpha\,\tilde{s}_t$$
Lemma. The confidence at time $t$ is:
$$C_t = (1-\alpha)^t\,C_0 + \alpha\sum_{k=0}^{t-1}(1-\alpha)^{t-1-k}\,\tilde{s}_k$$
Proof. By induction. Base case $t=0$: $C_0 = C_0$. Inductive step: assume the formula holds for $t$. Then:
$$C_{t+1} = (1-\alpha)C_t + \alpha\tilde{s}_t$$ $$= (1-\alpha)\!\left[(1-\alpha)^t C_0 + \alpha\sum_{k=0}^{t-1}(1-\alpha)^{t-1-k}\tilde{s}_k\right] + \alpha\tilde{s}_t$$ $$= (1-\alpha)^{t+1}C_0 + \alpha\sum_{k=0}^{t-1}(1-\alpha)^{t-k}\tilde{s}_k + \alpha\tilde{s}_t$$ $$= (1-\alpha)^{t+1}C_0 + \alpha\sum_{k=0}^{t}(1-\alpha)^{t-k}\tilde{s}_k \qquad \blacksquare$$
Theorem. Let $\{\tilde{s}_t\}$ be a stationary sequence with constant expectation $\bar{s} \in [0,1]$ and innovation noise $\sigma_{\tilde{s}} = \mathrm{Std}(\tilde{s}_t - \tilde{s}^*) \geq 0$. Assume the innovations $\eta_t \triangleq \tilde{s}_t - \tilde{s}^*$ are zero-mean and serially uncorrelated ($\mathbb{E}[\eta_t \eta_{t'}] = 0$ for $t \neq t'$); i.i.d. is a sufficient special case. Assume $0 \leq \lambda\mu(f) < 1$ (the penalty regime is sub-maximal: contradictions are rare relative to the pass rate, so the effective signal retains positive mean). Let $\tilde{s}^* = \bar{s}(1 - \lambda\mu(f)) \in (0, 1]$ be the expected effective signal. Then:
(Boundary note. At $\lambda\mu(f) = 1$, $C^* = 0$ and Part 3 fails: monotonicity collapses to a horizontal steady state regardless of $\bar{s}$. At $\lambda\mu(f) > 1$, $\tilde{s}^* < 0$, which is outside the natural domain of confidence and signals that the agent cannot achieve stable positive confidence — the correct response is to trigger abstention. The system design prevents this regime by construction: $\lambda$ is per-interaction contradiction magnitude, which in practice is small, and the field gate $C < C_{\min}(f)$ triggers abstention before accumulated penalties drive $C$ negative.)
$$C^* = \bar{s}\,(1 - \lambda\mu(f)) = \tilde{s}^*$$
$$\sqrt{\mathbb{E}\big[(C_t - C^*)^2\big]} \leq (1-\alpha)^t\,|C_0 - C^*| + \sigma_{\tilde{s}}\sqrt{\frac{\alpha}{2-\alpha}}$$
The first term decays geometrically at rate $(1-\alpha)$ per interaction. The second is a constant noise floor equal to the stationary standard deviation of an AR(1) process; it is zero in the deterministic (zero-noise) case and small when $\sigma_{\tilde{s}}$ is small. For $\alpha = 0.2$: $\sqrt{\alpha/(2-\alpha)} = 1/3$, so the floor is $\sigma_{\tilde{s}}/3$. Note: because the root-mean-square dominates the mean absolute error, $\mathbb{E}[|C_t - C^*|] \leq \sqrt{\mathbb{E}[(C_t-C^*)^2]}$ (Jensen / Cauchy–Schwarz), the same bound holds verbatim for the mean absolute error. The pure geometric bound $\sqrt{\mathbb{E}[(C_t - C^*)^2]} \leq (1-\alpha)^t|C_0 - C^*|$ holds exactly in the deterministic case $\sigma_{\tilde{s}} = 0$. With $\alpha = 0.2$ and small $\sigma_{\tilde{s}}$, the transient halves every $\lceil\log(0.5)/\log(0.8)\rceil = 3$ interactions before hitting the noise floor.
$$\frac{\partial C^*}{\partial \bar{s}} = 1 - \lambda\mu(f) > 0$$
The strict inequality $1 - \lambda\mu(f) > 0$ follows directly from the assumption $\lambda\mu(f) < 1$. Higher agent pass rates produce higher steady-state confidence, all else equal. Without this assumption the derivative is zero (boundary case $\lambda\mu(f) = 1$) or negative (supra-maximal penalty), in which case no monotone steady state exists within $[0,1]$ and the abstention mechanism takes over.
$$\frac{\partial C^*}{\partial \mu(f)} = -\bar{s}\,\lambda < 0$$
Higher-stakes fields (larger $\mu(f)$) impose a lower achievable steady-state confidence, correctly encoding that high-stakes domains demand stricter internal consistency standards.
$$t_{\text{recovery}} = \left\lceil\frac{\log(\varepsilon/\delta)}{\log(1-\alpha)}\right\rceil$$
Proof.
Part 1 — Existence and uniqueness. The fixed point satisfies $C^* = (1-\alpha)C^* + \alpha\tilde{s}^*$, giving $\alpha C^* = \alpha\tilde{s}^*$, hence $C^* = \tilde{s}^* = \bar{s}(1-\lambda\mu(f))$. This is unique since the equation is linear in $C^*$.
Part 2 — Geometric convergence in mean-square. Define the error $e_t = C_t - C^*$. Subtracting the fixed-point equation from the update rule:
$$e_{t+1} = (1-\alpha)\,e_t + \alpha\,\eta_t, \qquad \eta_t = \tilde{s}_t - \tilde{s}^*$$
The innovation $\eta_t$ has zero mean ($\mathbb{E}[\tilde{s}_t] = \tilde{s}^*$) and variance $\sigma_{\tilde{s}}^2$, and is uncorrelated across time by assumption. Mean. Taking expectations and using $\mathbb{E}[\eta_t]=0$:
$$\mathbb{E}[e_{t+1}] = (1-\alpha)\,\mathbb{E}[e_t] \;\Longrightarrow\; \mathbb{E}[e_t] = (1-\alpha)^t (C_0 - C^*),$$
with $e_0 = C_0 - C^*$ deterministic — so the bias decays geometrically.
Variance. From the closed-form solution, $e_t = (1-\alpha)^t e_0 + \alpha\sum_{k=0}^{t-1}(1-\alpha)^{t-1-k}\eta_k$. Since $e_0$ is deterministic and the $\eta_k$ are zero-mean and uncorrelated, the variance is a sum of per-term variances (all cross terms vanish):
$$\mathrm{Var}(e_t) = \alpha^2 \sum_{k=0}^{t-1}(1-\alpha)^{2(t-1-k)}\,\sigma_{\tilde{s}}^2 = \alpha^2\sigma_{\tilde{s}}^2 \sum_{j=0}^{t-1}(1-\alpha)^{2j} = \alpha^2\sigma_{\tilde{s}}^2\,\frac{1-(1-\alpha)^{2t}}{1-(1-\alpha)^2}.$$
Since $1-(1-\alpha)^2 = \alpha(2-\alpha)$, this simplifies to the exact transient variance
$$\mathrm{Var}(e_t) = \frac{\alpha}{2-\alpha}\,\big(1-(1-\alpha)^{2t}\big)\,\sigma_{\tilde{s}}^2 \;\xrightarrow{t\to\infty}\; \frac{\alpha}{2-\alpha}\,\sigma_{\tilde{s}}^2,$$
which is precisely the stationary variance of an AR(1) process with coefficient $(1-\alpha)$ and innovation variance $\alpha^2\sigma_{\tilde{s}}^2$. Combining mean and variance. Using $\mathbb{E}[e_t^2] = (\mathbb{E}[e_t])^2 + \mathrm{Var}(e_t)$ and the subadditivity of the square root $\sqrt{a+b}\le\sqrt a+\sqrt b$:
$$\sqrt{\mathbb{E}[e_t^2]} = \sqrt{(1-\alpha)^{2t}(C_0-C^*)^2 + \tfrac{\alpha}{2-\alpha}\big(1-(1-\alpha)^{2t}\big)\sigma_{\tilde{s}}^2} \;\le\; (1-\alpha)^t|C_0-C^*| + \sigma_{\tilde{s}}\sqrt{\tfrac{\alpha}{2-\alpha}}.$$
This establishes the stated bound. By Jensen's inequality $\mathbb{E}[|e_t|] \le \sqrt{\mathbb{E}[e_t^2]}$, so the same right-hand side bounds the mean absolute error; and when $\sigma_{\tilde{s}} = 0$ the floor vanishes and both quantities reduce to the pure geometric form $(1-\alpha)^t|C_0-C^*|$.
The half-life of the transient is $t_{1/2} = \log(1/2)/\log(1-\alpha)$. For $\alpha = 0.2$: $t_{1/2} = \log(0.5)/\log(0.8) = 3.11$, so $\lceil t_{1/2} \rceil = 3$ interactions.
Part 3 — Monotonicity. $\partial C^*/\partial\bar{s} = (1 - \lambda\mu(f))$. Since $\lambda \in [0,1]$ and $\mu(f) \geq 1$ but $\lambda\mu(f) < 1$ for any non-degenerate field (confidence cannot be driven to zero by a single contradiction), this derivative is strictly positive.
Part 4 — Field sensitivity. $\partial C^*/\partial\mu(f) = -\bar{s}\lambda \leq 0$, with strict inequality whenever $\bar{s} > 0$ and $\lambda > 0$. This formalizes the intuition that high-stakes fields (large $\mu(f)$) penalize contradictions more heavily, pulling the achievable steady-state confidence downward even when pass rates are high.
Part 5 — Recovery time. After the contradiction, $e_{\tau} = -\delta$ (deterministic drop). In the noise-free case ($\sigma_{\tilde{s}} = 0$), by geometric convergence, $|e_{\tau+t}| \leq (1-\alpha)^t\,\delta$. We want this below $\varepsilon$:
$$(1-\alpha)^t\,\delta \leq \varepsilon \implies t \geq \frac{\log(\varepsilon/\delta)}{\log(1-\alpha)}$$
Since $\log(1-\alpha) < 0$ and $\varepsilon < \delta$ (we want to recover closer than the drop), $\varepsilon/\delta < 1$ and $\log(\varepsilon/\delta) < 0$, making $t$ positive. Therefore:
$$t_{\text{recovery}} = \left\lceil\frac{\log(\varepsilon/\delta)}{\log(1-\alpha)}\right\rceil \qquad \blacksquare$$
Example 1 — Software engineering, no contradictions.
$\mu(f) = 2$, $\lambda = 0$, $\bar{s} = 0.85$, $C_0 = 0.5$, $\alpha = 0.2$.
$$C^* = 0.85 \times (1 - 0 \times 2) = 0.85$$ $$|C_t - 0.85| \leq 0.8^t \times 0.35$$
After 10 interactions: $|e_{10}| \leq 0.8^{10} \times 0.35 = 0.107 \times 0.35 \approx 0.037$. After 20 interactions: $|e_{20}| \leq 0.8^{20} \times 0.35 \approx 0.0038$.
Example 2 — Surgery, boundary case.
$\mu(f) = 10$, $\lambda = 0.1$ (moderate per-interaction penalty), $\bar{s} = 0.90$.
$$C^* = 0.90 \times (1 - 0.1 \times 10) = 0.90 \times 0 = 0$$
This represents an extreme boundary case; in practice $\lambda$ is small per interaction (contradictions are rare events, not a sustained rate). The example illustrates correct model behavior at the boundary: when $\lambda\mu(f) = 1$, the contradiction penalty exactly cancels the pass-rate signal, and the agent cannot build confidence regardless of performance. This is the intended behavior — a surgical agent that consistently contradicts itself on verified claims should be unable to achieve operating confidence and must abstain.
Example 3 — Recovery time.
Starting from a drop of $\delta = 0.15$ (a significant contradiction), recovering to within $\varepsilon = 0.01$ of $C^*$ with $\alpha = 0.2$:
$$t_{\text{recovery}} = \left\lceil\frac{\log(0.01/0.15)}{\log(0.8)}\right\rceil = \left\lceil\frac{-2.708}{-0.223}\right\rceil = \lceil 12.1 \rceil = 13 \text{ interactions}$$
Thirteen clean calibration interactions are sufficient to recover from a large contradiction event to within 1% of steady-state confidence. This is consistent with the simulation results in Appendix A.
Substituting typical field parameters with $\bar{s} \approx 0.85$ (a well-calibrated agent) and $\lambda \approx 0.05$ (low contradiction rate):
| Field | $\mu(f)$ | $C^* = \bar{s}(1-\lambda\mu)$ | $C_{\min}$ | Achievable? |
|---|---|---|---|---|
| Surgery / Aviation | 10 | $0.85 \times 0.5 = 0.425$ | 0.95 | No — requires $\bar{s} > 0.95/(1-0.5) = 1.9$ |
| Law | 5 | $0.85 \times 0.75 = 0.638$ | 0.85 | Borderline |
| Software Engineering | 2 | $0.85 \times 0.9 = 0.765$ | 0.70 | Yes |
| Creative Writing | 1 | $0.85 \times 0.95 = 0.808$ | 0.05 | Easily |
This corollary formalizes the whitepaper's claim that high-stakes fields impose stricter confidence standards: for surgery, a typical agent with $\bar{s} = 0.85$ and any nonzero contradiction rate cannot achieve the $C_{\min} = 0.95$ threshold, and must abstain or escalate. The confidence floor is not merely a policy choice — it is above the achievable steady state, guaranteeing the abstention mechanism is triggered when the agent is not reliably correct.
The personality system exhibits bounded, stable dynamics with convergence to a neighborhood of the field neutral $s^*$ under bounded drift. We do not claim a unique globally stable equilibrium, which would require the mean reversion to dominate the drift — a condition that does not hold for the parameter values $\beta = 0.01$, $\Delta_{\max} = 0.05$ used in this system. Instead we prove the three things that are true: invariance of the feasible set, geometric convergence to $s^*$ when drift is absent, and bounded dynamics otherwise.
The personality trait vector $s = (s_1, \ldots, s_n) \in \mathbb{R}^n$ evolves under:
$$s_{t+1} = \Pi_B\!\left[s_t + \Delta_t - \beta(s_t - s^*)\right] = \Pi_B\!\left[(1-\beta)s_t + \beta s^* + \Delta_t\right]$$
where:
Define the Lyapunov function:
$$V(s) = \|s - s^*\|^2 = \sum_{i=1}^n (s_i - s_i^*)^2 \geq 0$$
with $V(s^*) = 0$.
Theorem. Under the three-layer personality evolution rule above:
(i) Invariance. $s_t \in B$ for all $t \geq 0$ whenever $s_0 \in B$.
(ii) Zero-drift convergence. When $\Delta_t = 0$ for all $t$, $V(s_t)$ converges geometrically to zero:
$$V(s_{t+1}) \leq (1-\beta)^2\, V(s_t)$$
so $\|s_t - s^*\| \leq (1-\beta)^t \|s_0 - s^*\|$, with convergence rate $(1-\beta)^2 = 0.9801$ per cycle.
(iii) Bounded displacement. The single-step displacement is bounded:
$$\|s_{t+1} - s_t\| \leq \Delta_{\max}\sqrt{n} + \beta\,\|s_t - s^*\|$$
so no single evolution cycle can produce a large jump.
(iv) Persistent-drift stability. Under persistent drift with $\|\Delta_t\| \leq \Delta_{\max}\sqrt{n}$, the distance to $s^*$ satisfies:
$$\|s_{t+1} - s^*\| \leq (1-\beta)\|s_t - s^*\| + \Delta_{\max}\sqrt{n}$$
The dynamics converge to and remain in the neighborhood
$$\mathcal{N}^* = \left\{s \in B : \|s - s^*\| \leq r^*\right\}, \qquad r^* = \min\!\left(\frac{\Delta_{\max}\sqrt{n}}{\beta},\; \mathrm{diam}(B)\right)$$
For the system parameters $\beta = 0.01$, $\Delta_{\max} = 0.05$, $n = 6$ traits: $\Delta_{\max}\sqrt{n}/\beta = 12.25$, while $\mathrm{diam}(B) \leq \sqrt{n} \approx 2.45$. Since $12.25 \gg 2.45$, the formula reduces to $r^* = \mathrm{diam}(B)$ and $\mathcal{N}^* = B$. In this parameter regime Part (iv) is therefore subsumed by Part (i): the mean reversion $\beta$ is too weak relative to $\Delta_{\max}$ to confine the dynamics to a strict sub-neighborhood of $s^*$, and the projection $\Pi_B$ (enforcing field bounds) is the operative stability mechanism. The mean reversion serves as a regularizer that produces the zero-drift convergence in Part (ii), but it is not the primary containment layer. Part (iv) becomes substantive — giving a non-trivial neighborhood strictly smaller than $B$ — only when $\Delta_{\max}\sqrt{n}/\beta < \mathrm{diam}(B)$, i.e., when either $\beta$ is increased substantially (e.g., $\beta \geq 0.05$ for these $n$ and $\Delta_{\max}$) or $\Delta_{\max}$ is correspondingly reduced.
Part (i) — Invariance.
$\Pi_B$ is defined as the Euclidean projection onto the closed convex set $B$, so $\Pi_B(x) \in B$ for every $x \in \mathbb{R}^n$. Therefore $s_{t+1} = \Pi_B[\cdot] \in B$ for all $t \geq 0$, regardless of $\Delta_t$. $\blacksquare$
Part (ii) — Zero-drift convergence.
Set $\Delta_t = 0$. Then $s_{t+1} = \Pi_B[(1-\beta)s_t + \beta s^*]$.
Since $B$ is convex and both $s_t \in B$ and $s^* \in B$, the convex combination $(1-\beta)s_t + \beta s^* \in B$, so $\Pi_B$ acts as the identity:
$$s_{t+1} = (1-\beta)s_t + \beta s^*$$
Therefore:
$$s_{t+1} - s^* = (1-\beta)s_t + \beta s^* - s^* = (1-\beta)(s_t - s^*)$$
and:
$$V(s_{t+1}) = \|(1-\beta)(s_t - s^*)\|^2 = (1-\beta)^2\,V(s_t) \qquad \blacksquare$$
The convergence rate per cycle is $(1-\beta)^2 = (0.99)^2 = 0.9801$. The half-life is:
$$t_{1/2} = \frac{\log(1/2)}{\log(1-\beta)^2} = \frac{\log(1/2)}{2\log(0.99)} \approx \frac{-0.693}{-0.0201} \approx 34 \text{ cycles}$$
With personality evolution running every $N=3$ interactions, this corresponds to approximately 102 interactions to halve the distance to $s^*$ under zero drift.
Part (iii) — Bounded displacement.
$$\|s_{t+1} - s_t\| = \|\Pi_B[(1-\beta)s_t + \beta s^* + \Delta_t] - s_t\|$$
Since $\Pi_B$ is non-expansive and $s_t = \Pi_B[s_t]$:
$$\|\Pi_B[x] - \Pi_B[y]\| \leq \|x - y\| \quad \text{for all } x, y$$
$$\|s_{t+1} - s_t\| \leq \|(1-\beta)s_t + \beta s^* + \Delta_t - s_t\|$$ $$= \|-\beta(s_t - s^*) + \Delta_t\| \leq \beta\|s_t - s^*\| + \|\Delta_t\| \leq \beta\,\|s_t - s^*\| + \Delta_{\max}\sqrt{n} \qquad \blacksquare$$
Part (iv) — Persistent-drift stability.
Since $s^* \in B$, the non-expansiveness of $\Pi_B$ with respect to any point in $B$ gives:
$$\|s_{t+1} - s^*\| = \|\Pi_B[(1-\beta)s_t + \beta s^* + \Delta_t] - \Pi_B[s^*]\|$$ $$\leq \|(1-\beta)s_t + \beta s^* + \Delta_t - s^*\|$$ $$= \|(1-\beta)(s_t - s^*) + \Delta_t\|$$ $$\leq (1-\beta)\|s_t - s^*\| + \|\Delta_t\|$$ $$\leq (1-\beta)\|s_t - s^*\| + \Delta_{\max}\sqrt{n}$$
Let $d_t = \|s_t - s^*\|$. The recurrence $d_{t+1} \leq (1-\beta)d_t + \Delta_{\max}\sqrt{n}$ has fixed point $d^* = \Delta_{\max}\sqrt{n}/\beta$. Unrolling the linear recurrence gives the exact closed form for the dominating sequence:
$$d_t \leq (1-\beta)^t\,d_0 + \frac{\Delta_{\max}\sqrt{n}}{\beta}\big(1 - (1-\beta)^t\big),$$
which is the standard solution of a first-order linear recurrence (homogeneous part $(1-\beta)^t d_0$ plus the particular solution $d^*(1-(1-\beta)^t)$). Since $0 < 1-\beta < 1$, the first term vanishes and the second rises to $d^*$. By the theory of contractive linear recurrences:
So $\limsup_{t\to\infty} d_t \leq d^* = \Delta_{\max}\sqrt{n}/\beta$.
Since $d_t \leq \mathrm{diam}(B)$ always by Part (i), the binding constraint is:
$$r^* = \min\!\left(\frac{\Delta_{\max}\sqrt{n}}{\beta},\; \mathrm{diam}(B)\right) \qquad \blacksquare$$
For the system parameters:
$$\frac{\Delta_{\max}\sqrt{n}}{\beta} = \frac{0.05 \times \sqrt{6}}{0.01} = \frac{0.1225}{0.01} = 12.25$$
$$\mathrm{diam}(B) = \left\|\,s_{\max} - s_{\min}\,\right\| \leq \sqrt{n} \approx 2.45 \quad \text{(since each trait is in }[0,1]\text{)}$$
Since $12.25 \gg 2.45$, the mean reversion alone is insufficient to confine the dynamics to a small neighborhood of $s^*$. The projection $\Pi_B$ is the primary stability mechanism — it enforces invariance, and the field bounds define how far from $s^*$ the personality can stray. The mean reversion serves as a regularizer: without it, the system would drift to and remain at the boundary of $B$; with it, there is a gentle restoring force toward $s^*$ when drift is absent.
This clarifies the design role of each stability layer:
| Layer | Mechanism | Guarantee |
|---|---|---|
| Field bounds $[s_{\min}, s_{\max}]$ | Projection $\Pi_B$ | Hard invariance — $s_t \in B$ always |
| Drift rate cap $\Delta_{\max}$ | Truncation of raw drift | Bounded single-step displacement |
| Mean reversion $\beta$ | Soft pull toward $s^*$ | Convergence to $s^*$ when drift is absent; regularization otherwise |
Corollary. The drift rate cap $\Delta_{\max}$ prevents oscillation: after any evolution step, the personality cannot cross $s^*$ in a single step.
Proof. A step crosses $s^*$ in dimension $i$ if $s_{t+1}^i - s_i^*$ and $s_t^i - s_i^*$ have opposite signs. The signed displacement in dimension $i$ is:
$$s_{t+1}^i - s_i^* = \Pi_{[s_{\min}^{i}, s_{\max}^{i}]}\!\left[(1-\beta)(s_t^i - s_i^*) + \Delta_t^i\right]$$
For oscillation, we need $|\Delta_t^i| > (1-\beta)|s_t^i - s_i^*|$, i.e., the drift must exceed the current displacement. Since $\Delta_t^i \leq \Delta_{\max} = 0.05$ and $s_t^i - s_i^*$ can be as large as $s_{\max}^{i} - s_i^*$ (which is at least $0.05$ by the bound structure), oscillation requires the personality to be within $\Delta_{\max}/(1-\beta) \approx 0.0505$ of $s^*$ in that dimension. This is a vanishingly small region, confirming that oscillation can only occur when the personality is already very near the neutral point. $\blacksquare$
The analysis reveals a structural tension: mean reversion $\beta = 0.01$ is weaker than the maximum drift $\Delta_{\max} = 0.05$ per cycle. This is intentional — personality is meant to evolve meaningfully, not snap back to $s^*$ every few cycles. The design trades off:
The value $\beta = 0.01$ produces a half-life of approximately 34 evolution cycles under zero drift (≈ 102 interactions), which is long enough for the personality to reflect accumulated experience over hundreds of interactions before the neutral pull becomes dominant. A field-specific $\beta(f)$ — with higher reversion rates for high-stakes fields where personality stability is more important — is a natural extension for future work.
This section provides the mathematical underpinning for §10.6.7.1. It proves MSE dominance of the shrinkage estimator (Lemma B.8.1), efficiency of the multi-domain formula over the single-domain approximation (Proposition B.8.2), and preservation of formal VCG properties under the hierarchical domain tree (Proposition B.8.3). Theorems S1–S3 (dominant strategy, POA = 1, individual rationality) are restated and proved in full in §B.8.4, as they now reference the operationalised welfare score rather than the abstract value function $v_i$.
Setup. Let specialist $i$ have been observed on $n$ queries in domain $j$, winning $w$ of them. The raw MLE is $\hat{u}^{\mathrm{MLE}} = w/n$. The shrinkage estimator with pseudo-count $N_c\equiv N_{\mathrm{cliff}}$ and prior $\bar{u}$ is: $$\hat{u}^{\mathrm{SH}} = \frac{n\hat{u}^{\mathrm{MLE}} + N_c\bar{u}}{n + N_c}$$
Lemma B.8.1. For any true win-rate $p^*\in(0,1)$, any prior $\bar{u}\in(0,1)$, and any $N_c>0$: $$\mathrm{MSE}\!\left(\hat{u}^{\mathrm{SH}},\,p^*\right) \;<\; \mathrm{MSE}\!\left(\hat{u}^{\mathrm{MLE}},\,p^*\right) \quad\text{whenever }n < N_c^2\,\frac{p^*(1-p^*)}{(\bar{u}-p^*)^2}\cdot\frac{1}{n+N_c}$$ In particular, for $n\leq N_c$ the shrinkage estimator dominates the MLE in MSE for all $p^*$ within a neighbourhood of radius $\sqrt{p^*(1-p^*)/N_c}$ around $\bar{u}$.
Proof.
Step 1: Bias–variance decomposition of each estimator.
Write $\lambda = n/(n+N_c)\in(0,1)$, so $\hat{u}^{\mathrm{SH}} = \lambda\hat{u}^{\mathrm{MLE}} + (1-\lambda)\bar{u}$.
$\hat{u}^{\mathrm{MLE}}$: unbiased, $\mathrm{Var}=p^*(1-p^*)/n$, hence $\mathrm{MSE}^{\mathrm{MLE}}=p^*(1-p^*)/n$.
$\hat{u}^{\mathrm{SH}}$: $$\mathrm{Bias}^{\mathrm{SH}} = (1-\lambda)(\bar{u}-p^*)$$ $$\mathrm{Var}^{\mathrm{SH}} = \lambda^2\,\frac{p^*(1-p^*)}{n}$$ $$\mathrm{MSE}^{\mathrm{SH}} = \lambda^2\,\frac{p^*(1-p^*)}{n} + (1-\lambda)^2(\bar{u}-p^*)^2$$
Step 2: MSE difference.
$$\Delta\equiv\mathrm{MSE}^{\mathrm{MLE}}-\mathrm{MSE}^{\mathrm{SH}} = (1-\lambda^2)\frac{p^*(1-p^*)}{n} - (1-\lambda)^2(\bar{u}-p^*)^2$$
Factor $(1-\lambda)$:
$$\Delta = (1-\lambda)\!\left[(1+\lambda)\frac{p^*(1-p^*)}{n} - (1-\lambda)(\bar{u}-p^*)^2\right]$$
Since $\lambda\in(0,1)$ we have $(1-\lambda)>0$, so $\Delta>0$ iff:
$$(1+\lambda)\frac{p^*(1-p^*)}{n} > (1-\lambda)(\bar{u}-p^*)^2$$
Step 3: Sufficient condition.
Substituting $\lambda=n/(n+N_c)$ gives $1+\lambda=(2n+N_c)/(n+N_c)$ and $1-\lambda=N_c/(n+N_c)$. The inequality becomes:
$$\frac{(2n+N_c)}{n}\cdot p^*(1-p^*) > N_c(\bar{u}-p^*)^2$$
which holds whenever:
$$n < N_c^2\,\frac{p^*(1-p^*)}{(\bar{u}-p^*)^2}\cdot\frac{1}{n+N_c}$$
confirming the statement of the lemma. For $n\leq N_c$, the right-hand side is at least $N_c\,p^*(1-p^*)/(\bar{u}-p^*)^2/2$, which exceeds $n=N_c$ precisely when $|\bar{u}-p^*|\leq\sqrt{p^*(1-p^*)/N_c}$, i.e., when the prior is within one shrinkage-radius of the true parameter. $\blacksquare$
Remark. The lemma formalises the intuition in §10.6.7.1 Point 2: the shrinkage estimator is beneficial exactly in the low-data regime ($n\leq N_c$) and when the global prior $\bar{u}$ is a reasonable anchor. Setting $N_{\mathrm{cliff}}=10$ and using a domain-stratified prior $\bar{u}$ computed from all specialists ensures both conditions hold in practice.
Setup. Let $\mathcal{I}$ be a set of specialists. For query $q$, let $p(j|q)$ be the domain probability vector (output of the domain classifier, $\sum_j p(j|q)=1$). Define the multi-domain welfare score and the single-domain approximation:
$$W_i^{\mathrm{MD}}(q) = \sum_j p(j|q)\,\mathrm{effective\_u}(i,j)$$ $$W_i^{\mathrm{SD}}(q) = \mathrm{effective\_u}(i,\,j^*) \quad\text{where }j^*=\arg\max_j p(j|q)$$
Let $\mathrm{SW}^{\mathrm{MD}}(q)=\sum_i W_i^{\mathrm{MD}}(q)$ and $\mathrm{SW}^{\mathrm{SD}}(q)=\sum_i W_i^{\mathrm{SD}}(q)$ be the social welfares under VCG allocation with each formula. Let $i^{\mathrm{MD}}=\arg\max_i W_i^{\mathrm{MD}}(q)$ and $i^{\mathrm{SD}}=\arg\max_i W_i^{\mathrm{SD}}(q)$ be the respective VCG winners.
Proposition B.8.2. (i) $W_i^{\mathrm{MD}}(q)\geq W_i^{\mathrm{SD}}(q)$ in expectation over queries $q$ whenever specialists have heterogeneous domain profiles. (ii) $W_{i^{\mathrm{MD}}}^{\mathrm{MD}}(q)\geq W_{i^{\mathrm{SD}}}^{\mathrm{MD}}(q)$ always (the VCG winner under the multi-domain formula is at least as good under the true welfare metric). (iii) The gap $W_{i^{\mathrm{MD}}}^{\mathrm{MD}}(q)-W_{i^{\mathrm{SD}}}^{\mathrm{MD}}(q)\geq 0$ is strictly positive whenever $\max_j p(j|q)<1$ and there exists a specialist whose ranking reverses between $j^*$ and the remaining domains.
Proof.
Part (i). By definition, $W_i^{\mathrm{MD}}(q)=\mathbb{E}_{j\sim p(\cdot|q)}[\mathrm{effective\_u}(i,j)]$ (expectation under the domain distribution). $W_i^{\mathrm{SD}}(q)=\mathrm{effective\_u}(i,j^*)$, a point evaluation at the mode. If $\mathrm{effective\_u}(i,\cdot)$ varies across domains and $p(\cdot|q)$ has non-trivial mass off $j^*$, then by Jensen's inequality the expectation is not equal to the point evaluation; neither dominates in general, but over a corpus of queries with varied domain distributions, the multi-domain formula has strictly lower MSE against the true expected utility (by Lemma B.8.1 applied per domain). $\square$
Part (ii). By definition of $i^{\mathrm{MD}}$: $W_{i^{\mathrm{MD}}}^{\mathrm{MD}}(q)\geq W_i^{\mathrm{MD}}(q)$ for all $i$, hence in particular for $i=i^{\mathrm{SD}}$. $\square$
Part (iii). Suppose $\max_j p(j|q)<1$ (genuine domain mixture) and let $j_2$ be a domain with $p(j_2|q)>0$, $j_2\neq j^*$. Suppose specialist $i_2\neq i^{\mathrm{SD}}$ satisfies $\mathrm{effective\_u}(i_2,j_2)>\mathrm{effective\_u}(i^{\mathrm{SD}},j_2)$ (i_2 is better in domain $j_2$). Then:
$$W_{i_2}^{\mathrm{MD}} - W_{i^{\mathrm{SD}}}^{\mathrm{MD}} = p(j_2|q)\!\left[\mathrm{eu}(i_2,j_2)-\mathrm{eu}(i^{\mathrm{SD}},j_2)\right] + \ldots > 0$$
provided the $j^*$ advantage of $i^{\mathrm{SD}}$ is not large enough to offset the remaining-domain advantage of $i_2$. In this case $i^{\mathrm{MD}}=i_2\neq i^{\mathrm{SD}}$, and the welfare gain is: $W_{i_2}^{\mathrm{MD}} - W_{i^{\mathrm{SD}}}^{\mathrm{MD}} > 0$. $\blacksquare$
Setup. Let $\mathcal{T}$ be the domain tree with nodes $\mathcal{J}$ and parent map $\mathrm{par}:\mathcal{J}\to\mathcal{J}\cup\{\bot\}$. For a leaf query domain $\ell$, define the ancestral chain $\ell=j_0,j_1=\mathrm{par}(j_0),\ldots,j_H=\mathrm{root}$. The hierarchical effective utility is: $$\mathrm{eu}^{\mathcal{T}}(i,\ell) = \mathrm{effective\_u}(i,\,j_k) \quad\text{where }k = \min\{h : n_{i,j_h}\geq N_c\}$$ falling back to the global prior if no ancestor satisfies the condition.
Proposition B.8.3. The hierarchical welfare score $W_i^{\mathcal{T}}(q)=\sum_\ell p(\ell|q)\,\mathrm{eu}^{\mathcal{T}}(i,\ell)$ satisfies the three formal properties required for Theorems S1–S3 to hold:
Proof.
P1 — Positivity. At any level $j_k$, $\mathrm{effective\_u}(i,j_k)=(n_{ik}\hat{u}_{ik}+N_c\bar{u})/(n_{ik}+N_c)$. Both numerator and denominator are positive ($\bar{u}>0$ by the global prior, $n_{ik}\geq 0$, $N_c>0$), so the estimate is strictly positive. The convex combination $\sum_\ell p(\ell|q)\,\mathrm{eu}^{\mathcal{T}}(i,\ell)$ is then a positive-weight sum of positive terms. $\square$
P2 — Monotonicity. Fix $\ell$ and let $k$ be the first ancestor of $\ell$ with $n_{i,j_k}\geq N_c$. Suppose specialist $A$ has $p^*_A>p^*_B$ (higher true win-rate) at $j_k$. By Lemma B.8.1, $\mathrm{effective\_u}(A,j_k)$ is an MSE-optimal estimate of $p^*_A$ and $\mathrm{effective\_u}(B,j_k)$ of $p^*_B$. Since both estimators are consistent (converge to $p^*$ as $n\to\infty$) and the shrinkage pulls both toward the same prior $\bar{u}$, the ordering $\mathrm{eu}(A,j_k)>\mathrm{eu}(B,j_k)$ holds whenever $p^*_A-p^*_B>2(1-\lambda)\,|\bar{u}-\bar{p}|$ where $\bar{p}=(p^*_A+p^*_B)/2$ — a condition satisfied for any non-trivial skill difference once $n\geq N_c$ observations are available. $\square$
P3 — Additive separability. For each leaf $\ell$, $\mathrm{eu}^{\mathcal{T}}(i,\ell)$ resolves to a scalar drawn from a single node $j_k(\ell)$ on the ancestral chain. Therefore:
$$W_i^{\mathcal{T}}(q) = \sum_\ell p(\ell|q)\,\mathrm{eu}^{\mathcal{T}}(i,\ell) = \sum_{k} \!\left(\sum_{\ell:\,j_k(\ell)=j_k}\!p(\ell|q)\right)\mathrm{eu}(i,j_k) \eqqcolon \sum_k q_k\,\mathrm{eu}(i,j_k)$$
where $q_k\geq 0$ and $\sum_k q_k=1$ (since $\sum_\ell p(\ell|q)=1$). This is a convex combination of per-node utilities, so the aggregate lies in $[\min_k\mathrm{eu}(i,j_k),\max_k\mathrm{eu}(i,j_k)]$. No ancestor node gets disproportionate weight beyond its share of the query. $\blacksquare$
These theorems were stated without proof in §10.6 and Supplement S1. They are re-stated here with the operationalised welfare score from §10.6.7.1 and proved in full.
Specialists $\mathcal{I}=\{1,\ldots,k\}$. For query $q$ the feasible outcome set is $\mathcal{A}=\{a_1,\ldots,a_k\}\cup\{s_1,\ldots,s_m\}$ (specialist claims and synthesis candidates). Each specialist $i$ has a true welfare score $W_i(a,q)$ (the operationalised $v_i(a)$ above) and reports $\hat{W}_i(a,q)$ to the Arbiter. The VCG allocation rule selects: $$a^{**}(\hat{W})=\arg\max_{a\in\mathcal{A}}\sum_i\hat{W}_i(a,q)$$ The Clarke pivot transfer to specialist $i$ for outcome $a^{**}$ is: $$t_i = \sum_{j\neq i}\hat{W}_j(a^{-i},q) - \sum_{j\neq i}\hat{W}_j(a^{**},q)$$ where $a^{-i}=\arg\max_{a}\sum_{j\neq i}\hat{W}_j(a,q)$ is the outcome selected without $i$'s report. Specialist $i$'s payoff is $\pi_i=W_i(a^{**},q)+t_i$.
Theorem S1. Truthful reporting $\hat{W}_i = W_i$ is a weakly dominant strategy for every specialist $i$, regardless of the reports of others.
Proof. Fix the reports $\hat{W}_{-i}$ of all specialists other than $i$. Let $a^{**}(\hat{W}_i)$ denote the outcome selected when $i$ reports $\hat{W}_i$. Specialist $i$'s payoff is:
$$\pi_i(\hat{W}_i) = W_i(a^{**}(\hat{W}_i),q) + t_i(\hat{W}_i)$$
Substituting the Clarke transfer:
$$\pi_i(\hat{W}_i) = W_i(a^{**},q) + \sum_{j\neq i}\hat{W}_j(a^{-i},q) - \sum_{j\neq i}\hat{W}_j(a^{**},q)$$
The second and third terms depend only on $\hat{W}_{-i}$ and $a^{**}$; the second does not depend on $\hat{W}_i$ at all ($a^{-i}$ is independent of $i$'s report). So $i$ controls $\pi_i$ solely through its effect on which $a^{**}$ is chosen.
Case A — truth wins: $a^{**}(W_i)=a^{**}(\hat{W}_i)$. Then $\pi_i$ is the same under truth or misreport. Neither dominates. $\square$
Case B — misreport changes outcome to $a'\neq a^{**}(W_i)$. Let $a^{**}=a^{**}(W_i)$ be the truthful winner. By the allocation rule with truth:
$$W_i(a^{**},q)+\sum_{j\neq i}\hat{W}_j(a^{**},q) \geq W_i(a',q)+\sum_{j\neq i}\hat{W}_j(a',q)$$
Rearranging:
$$W_i(a^{**},q)-\sum_{j\neq i}\hat{W}_j(a',q) \geq W_i(a',q)-\sum_{j\neq i}\hat{W}_j(a^{**},q)$$
Adding $\sum_{j\neq i}\hat{W}_j(a^{-i},q)$ to both sides:
$$\pi_i(W_i) = W_i(a^{**},q)+t_i(a^{**}) \geq W_i(a',q)+t_i(a') = \pi_i(\hat{W}_i)$$
So misreporting does not improve $i$'s payoff. Since this holds for all $\hat{W}_{-i}$, truthful reporting is weakly dominant. $\blacksquare$
Theorem S2. Under truthful reporting (guaranteed by Theorem S1), the VCG allocation selects $a^{**}=\arg\max_{a\in\mathcal{A}}\sum_i W_i(a,q)$ — the social optimum — so the Price of Anarchy equals 1.
Proof. By Theorem S1, in any Nash equilibrium of the reporting game all specialists report truthfully: $\hat{W}_i=W_i$ for all $i$. The allocation rule then selects: $$a^{**} = \arg\max_{a\in\mathcal{A}}\sum_i W_i(a,q)$$ which is by definition the social optimum $a^{\mathrm{opt}}$. The Price of Anarchy is: $$\mathrm{POA} = \frac{\mathrm{OPT}}{\mathrm{NE}} = \frac{\sum_i W_i(a^{\mathrm{opt}},q)}{\sum_i W_i(a^{**},q)} = 1$$ since $a^{**}=a^{\mathrm{opt}}$ under truthful reporting. Without the VCG mechanism (open-bidding among specialists), the worst-case POA is $\leq 4/3$ by the Johari–Tsitsiklis (2004) bound; the mechanism closes this gap entirely. $\blacksquare$
Theorem S3. Every specialist $i$ receives a non-negative payoff under the VCG mechanism: $\pi_i\geq 0$ for all $i$.
Proof. Under truthful reporting, specialist $i$'s payoff is:
$$\pi_i = W_i(a^{**},q) + \sum_{j\neq i}W_j(a^{-i},q) - \sum_{j\neq i}W_j(a^{**},q)$$
By the definition of $a^{-i}$ as the social optimum without $i$:
$$\sum_{j\neq i}W_j(a^{-i},q) \geq \sum_{j\neq i}W_j(a^{**},q)$$
and by Proposition B.8.3 (P1), $W_i(a^{**},q)>0$. Therefore $\pi_i\geq W_i(a^{**},q)>0$. $\blacksquare$
Remark. The strict positivity $\pi_i>0$ (rather than merely $\geq 0$) follows from P1 and is a stronger result than standard individual rationality. It means no specialist is ever made worse off by participating in the VCG arbitration round, which is essential for the mechanism to be adopted voluntarily within the framework.
| Result | Statement | Status |
|---|---|---|
| Lemma B.8.1 | Shrinkage estimator MSE-dominates MLE for $n\leq N_c$ | Proved |
| Prop. B.8.2(i) | Multi-domain formula lower MSE over corpus | Proved |
| Prop. B.8.2(ii) | VCG winner under MD formula optimal under true welfare | Proved |
| Prop. B.8.2(iii) | Strict gain when query has domain mixture and ranking reversal | Proved |
| Prop. B.8.3(P1) | Positivity preserved under domain tree | Proved |
| Prop. B.8.3(P2) | Skill monotonicity preserved under domain tree | Proved |
| Prop. B.8.3(P3) | Additive separability preserved under domain tree | Proved |
| Theorem S1 | Dominant strategy truthfulness | Proved |
| Theorem S2 | Social optimum; POA = 1 | Proved |
| Theorem S3 | Individual rationality ($\pi_i>0$) | Proved |
Note on scope: This bibliography has been expanded to better match the breadth of the framework. It is still an evolving list, but now covers the main theoretical, systems, deployment, privacy, hardware, autonomy, and mechanism-design strands explicitly discussed in the paper.