Whitepaper · Appendix C · Numerical worked examples for 5 domains

Appendix C — Real-World Applications: Worked Examples

Praneeth Tota  ·  Illinois Institute of Technology  ·  v1.0.0



This appendix contains the full worked numerical examples for the five domains introduced in §2. Each example shows the attribute definitions, field weight tables, option comparisons, and explicit utility calculations. The condensed summaries in §2 provide the practical intuition; this appendix provides the mathematical grounding.

The utility function framework is not limited to language model calibration. The same architecture — field-weighted multi-attribute utility, contradiction detection, confidence-gated decisions, and adaptive weight evolution — applies naturally to any autonomous system that must make real-time decisions under competing objectives and uncertainty. This section illustrates five domains, each with a worked numerical example, to make the framework's practical mechanics concrete.

Two properties distinguish utility-governed agents from conventional rule-based or single-objective systems:

Nuanced trade-offs. Unlike rule-based systems ("if speed > threshold, brake"), utility agents evaluate all objectives simultaneously and select the action maximising their weighted sum. A slightly slower but substantially safer route is not a failure mode — it is the correct mathematical outcome when the safety weight is high.

Calibrated uncertainty. Confidence $C$ in the utility function encodes the agent's self-assessed reliability. When $C$ falls below the field-specific minimum $C_{\min}(f)$, the agent abstains or escalates rather than producing a low-quality output. An autonomous vehicle uncertain about road conditions abstains from a lane change rather than proceeding at reduced confidence.


C.1 Autonomous Vehicles — Multi-Objective Route and Manoeuvre Selection

Domain deep-dive: Self-Driving Vehicles · Autonomous Systems

An autonomous vehicle must balance safety, efficiency, and passenger comfort in real time. The utility function operates at the manoeuvre level (lane changes, following distance, speed selection) and at the route level (highway vs surface streets, rerouting around incidents).

Attributes and field weights

For the autonomous_driving field in standard mode:

$$U = w_s \cdot s + w_e \cdot e + w_c \cdot c$$

where $s$ = collision avoidance probability, $e$ = journey time efficiency, $c$ = ride comfort (inverse of harsh braking/acceleration events).

| Mode | $w_s$ (safety) | $w_e$ (efficiency) | $w_c$ (comfort) | $C_{\min}$ |
|---|---|---|---|---|
| Standard | 0.70 | 0.20 | 0.10 | 0.85 |
| Emergency transport | 0.55 | 0.40 | 0.05 | 0.80 |
| School zone | 0.90 | 0.05 | 0.05 | 0.95 |

Worked example — lane change decision

| Attribute | Option A: Aggressive lane change | Option B: Stay in lane |
|---|---|---|
| Safety $s$ | 0.60 | 0.95 |
| Efficiency $e$ | 0.90 | 0.50 |
| Comfort $c$ | 0.40 | 0.80 |

$$U_A = (0.60 \times 0.70) + (0.90 \times 0.20) + (0.40 \times 0.10) = 0.42 + 0.18 + 0.04 = \mathbf{0.64}$$

$$U_B = (0.95 \times 0.70) + (0.50 \times 0.20) + (0.80 \times 0.10) = 0.665 + 0.10 + 0.08 = \mathbf{0.845}$$

Decision: Option B. Although Option A is much faster, the safety weight makes it mathematically inferior. In emergency-transport mode ($w_s = 0.55$, $w_e = 0.40$), the same attributes yield $U_A = 0.71$, $U_B = 0.7625$ — Option B still wins, but the gap narrows from 20.5pp to roughly 5pp, reflecting the legitimately higher weight on speed. In a school zone ($w_s = 0.90$), $U_A = 0.605$, $U_B = 0.92$ — the gap widens to 31.5pp, correctly encoding near-absolute safety priority.
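The lane-change calculation is small enough to reproduce directly. The sketch below (dictionary and function names are illustrative, not the framework's API) evaluates both options under each mode's weight vector:

```python
# Field-weighted additive utility from C.1. Weights and attribute scores
# are the values in the tables above; helper names are illustrative.
MODES = {
    "standard":  {"s": 0.70, "e": 0.20, "c": 0.10},
    "emergency": {"s": 0.55, "e": 0.40, "c": 0.05},
    "school":    {"s": 0.90, "e": 0.05, "c": 0.05},
}

OPTIONS = {
    "A_lane_change": {"s": 0.60, "e": 0.90, "c": 0.40},
    "B_stay":        {"s": 0.95, "e": 0.50, "c": 0.80},
}

def utility(weights, attrs):
    """Additive field-weighted utility: U = sum_i w_i * x_i."""
    return sum(weights[k] * attrs[k] for k in weights)

def best_option(mode):
    """Return (winning option, per-option utilities) for a given mode."""
    w = MODES[mode]
    scores = {name: utility(w, attrs) for name, attrs in OPTIONS.items()}
    return max(scores, key=scores.get), scores
```

Running `best_option("standard")` selects `B_stay` with utilities 0.64 vs 0.845, matching the hand calculation; switching the mode key changes the outcome margins without touching the decision logic.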

Confidence gate. If sensor fusion uncertainty drives $C$ below $C_{\min} = 0.85$, the vehicle abstains from the lane change and signals the driver — rather than executing a manoeuvre it cannot verify is safe.

Adaptation over time. The assertions store accumulates verified patterns: "this motorway stretch has high merge conflict rate at 17:30–18:30." Future decisions on the same route retrieve this prior automatically, tightening $w_s$ during that window without manual reconfiguration.


C.1.1 Edge Deployment Note — Battery and Power Constraints

The Micro-Expert Architecture has a distinct advantage for in-vehicle deployment beyond cost: it is the only architecture that fits within automotive compute power envelopes. A monolithic frontier model on a datacenter GPU (H100: 700W, A100: 400W) cannot run on-board in any production vehicle. Domain-specialist models on Jetson-class embedded hardware can:

| Component | Hardware | Power draw | Role |
|---|---|---|---|
| Perception specialist | Jetson AGX Orin | 15–60 W | Object detection, lane marking, sign reading |
| Motion planning specialist | Jetson Orin NX | 10–25 W | Trajectory optimisation, collision avoidance |
| Traffic rules specialist | Jetson Orin NX | 10–25 W | Traffic law, edge cases, emergency protocols |
| Total system compute | — | ~110 W | vs 700 W for a single H100 |

The three specialists run in a pipeline, not sequentially: perception runs continuously, motion planning consumes its output as it arrives, rules checks run in parallel. Total added latency is the pipeline stage depth, not the sum of all inference times. Each specialist is independently updateable via software — a traffic law update does not require revalidating the perception model.
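The pipelining claim can be made concrete with a toy steady-state model. The per-stage times below are assumed, illustrative figures, not measured latencies:

```python
# Toy pipeline-latency model. In a pipeline, only the first frame pays the
# full depth (sum of stage times); afterwards a result emerges once per
# slowest-stage interval, so throughput is bounded by max(stage), not sum.
stages_ms = {"perception": 30, "planning": 20, "rules": 15}  # assumed values

first_result_ms = sum(stages_ms.values())           # cold-start: full depth
steady_state_interval_ms = max(stages_ms.values())  # one result per slowest stage
```

Under these assumed numbers the first result arrives after 65 ms, but in steady state a new decision is produced every 30 ms — the behaviour the paragraph above describes, as opposed to a sequential 65 ms per frame.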

Published benchmarks confirm that domain-fine-tuned 7B specialists match or exceed general 70B models on their target tasks (§10.9.3). The Micro-Expert Architecture is therefore not merely cheaper for server deployments — for battery-constrained autonomous systems, it is the only viable architecture that achieves frontier-quality domain inference.

C.2 Drone Delivery — Dynamic Route Selection Under Environmental Uncertainty

Domain deep-dive: Autonomous Systems

A delivery drone must weigh delivery speed against energy consumption and collision risk — where risk is a function of current wind speed, airspace traffic, and battery state.

Attributes and field weights

$$U = w_s \cdot s + w_d \cdot d + w_\eta \cdot \eta$$

where $s$ = airspace safety margin, $d$ = on-time delivery probability, $\eta$ = energy efficiency (range preserved for return).

| Mode | $w_s$ | $w_d$ | $w_\eta$ | $C_{\min}$ |
|---|---|---|---|---|
| Standard | 0.50 | 0.30 | 0.20 | 0.75 |
| Storm warning active | 0.80 | 0.10 | 0.10 | 0.90 |
| Battery < 30% | 0.40 | 0.20 | 0.40 | 0.80 |

Worked example — route selection with approaching storm

| Attribute | Option A: Direct high-altitude route | Option B: Low-altitude detour |
|---|---|---|
| Safety $s$ | 0.45 | 0.88 |
| Delivery speed $d$ | 0.90 | 0.55 |
| Energy efficiency $\eta$ | 0.70 | 0.60 |

Standard mode: $U_A = (0.45\times0.50)+(0.90\times0.30)+(0.70\times0.20) = 0.225+0.27+0.14 = \mathbf{0.635}$; $U_B = (0.88\times0.50)+(0.55\times0.30)+(0.60\times0.20) = 0.44+0.165+0.12 = \mathbf{0.725}$. Option B wins by 9pp.

Storm warning mode ($w_s=0.80$): $U_A = (0.45\times0.80)+(0.90\times0.10)+(0.70\times0.10) = \mathbf{0.520}$, $U_B = (0.88\times0.80)+(0.55\times0.10)+(0.60\times0.10) = \mathbf{0.819}$. Gap widens to 30pp.

Decision: Option B in both modes, but the framework's response is proportionate — a mild preference for the safe route under normal conditions, a strong preference once the storm warning activates. If wind exceeds a threshold that drives $C$ below $C_{\min} = 0.90$, the drone returns to base rather than proceeding on a route it cannot safely evaluate.
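The confidence gate composes naturally with the utility maximisation: the gate is checked first, and only then is the argmax taken. A minimal sketch (the `"RETURN_TO_BASE"` sentinel and function names are illustrative assumptions):

```python
# Confidence-gated route selection for C.2. If self-assessed confidence
# falls below the field minimum, the drone abstains instead of choosing.
def select_route(options, weights, confidence, c_min):
    if confidence < c_min:
        return "RETURN_TO_BASE"  # abstain: route cannot be safely evaluated
    return max(options, key=lambda name: sum(
        weights[k] * options[name][k] for k in weights))

STORM_W = {"s": 0.80, "d": 0.10, "eta": 0.10}  # storm-warning weights
ROUTES = {
    "A_direct": {"s": 0.45, "d": 0.90, "eta": 0.70},
    "B_detour": {"s": 0.88, "d": 0.55, "eta": 0.60},
}
```

With confidence above the 0.90 threshold this returns `B_detour` ($U_B = 0.819 > U_A = 0.520$); below it, the abstention path fires before any route is scored.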


C.2.1 Edge Deployment Note — Drone Power Budget

A delivery drone's total system power budget (motors, sensors, compute, communications) is typically 200–500W. A Jetson Orin NX running a 7B specialist model consumes 10–25W — leaving the vast majority of the budget for flight. A datacenter GPU at 400–700W exceeds the entire drone power budget before any other system draws power.

A single 7B routing + planning specialist on a Jetson Orin NX therefore enables on-board intelligent routing decisions that would otherwise require offloading to a ground server (adding network latency and a failure mode when connectivity drops). Battery life improves directly: reducing compute power from hypothetical 100W (GPU) to 25W (Jetson) extends flight time by a meaningful fraction of the total battery budget.

C.3 Smart Home Energy Management — Comfort vs Cost vs Carbon

Domain deep-dive: Energy Systems

A smart home agent controls HVAC, appliances, EV charging, and lighting to balance occupant comfort against electricity cost and carbon footprint — objectives that change moment to moment as grid prices, renewable availability, and occupant schedules shift.

Attributes and field weights

$$U = w_c \cdot c + w_k \cdot k + w_g \cdot g$$

where $c$ = comfort level, $k$ = cost efficiency (inverse of electricity spend), $g$ = inverse grid carbon intensity.

| Mode | $w_c$ | $w_k$ | $w_g$ | Trigger |
|---|---|---|---|---|
| Standard | 0.40 | 0.40 | 0.20 | Normal grid conditions |
| Peak pricing event | 0.20 | 0.65 | 0.15 | Price > 3× baseline |
| High solar availability | 0.40 | 0.20 | 0.40 | Solar generation > 80% of load |
| Occupant comfort override | 0.75 | 0.15 | 0.10 | Explicit preference signal |

Worked example — afternoon appliance scheduling

| Attribute | Option A: Run appliances now (3 pm) | Option B: Defer to off-peak (11 pm) |
|---|---|---|
| Comfort $c$ | 0.90 | 0.65 |
| Cost efficiency $k$ | 0.30 | 0.90 |
| Carbon $g$ | 0.35 | 0.75 |

Standard mode: $U_A = (0.90\times0.40)+(0.30\times0.40)+(0.35\times0.20) = \mathbf{0.550}$. $U_B = (0.65\times0.40)+(0.90\times0.40)+(0.75\times0.20) = \mathbf{0.770}$. Deferral wins.

Peak pricing: $U_A = \mathbf{0.428}$, $U_B = \mathbf{0.828}$. The gap widens from 22pp to 40pp — correct, as the cost-efficiency weight rises to 0.65. When the occupant explicitly signals that they need laundry done now, the comfort override ($w_c = 0.75$) makes Option A optimal ($U_A = 0.755$ vs $U_B = 0.698$) — the agent defers to the expressed preference rather than optimising silently against it. Cross-session learning accumulates patterns ("Thursday 11pm reliably coincides with a grid carbon dip"), and future scheduling decisions retrieve this prior automatically.
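The trigger column of the mode table maps directly to a weight-selection function. A sketch under the assumption that the comfort override takes precedence over price and solar triggers (consistent with the text's "defers to expressed preference"; the signal names are illustrative):

```python
# Mode selection for C.3: map observable grid/occupant signals to the
# weight vector rows of the mode table. Precedence order is an assumption.
def select_mode(price_ratio, solar_fraction, comfort_override):
    if comfort_override:                 # explicit preference signal wins
        return {"c": 0.75, "k": 0.15, "g": 0.10}
    if price_ratio > 3.0:                # peak pricing event
        return {"c": 0.20, "k": 0.65, "g": 0.15}
    if solar_fraction > 0.80:            # high solar availability
        return {"c": 0.40, "k": 0.20, "g": 0.40}
    return {"c": 0.40, "k": 0.40, "g": 0.20}  # standard
```

The agent then plugs the returned weights into the same additive utility — the triggers change only the parameters, never the decision formula.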


C.4 Energy Grid Load Balancing — Stability vs Cost vs Renewables

Domain deep-dive: Energy Systems

A grid management agent balances load across generation sources in real time, with decisions required in seconds and errors that cascade across interconnected infrastructure. High-stakes field parameters ($C_{\min}=0.95$ under surge) ensure the agent escalates to human operators when forecast reliability drops rather than committing large generation decisions under uncertainty.

Attributes and field weights

$$U = w_\sigma \cdot \sigma + w_e \cdot e + w_r \cdot r$$

where $\sigma$ = grid frequency stability, $e$ = generation cost efficiency, $r$ = renewable utilisation fraction.

| Mode | $w_\sigma$ | $w_e$ | $w_r$ | $C_{\min}$ |
|---|---|---|---|---|
| Normal operations | 0.50 | 0.30 | 0.20 | 0.80 |
| Demand surge (>15%) | 0.80 | 0.10 | 0.10 | 0.95 |
| Renewable surplus | 0.35 | 0.25 | 0.40 | 0.75 |

Worked example — unexpected demand spike

| Attribute | Option A: Gas peaker plant | Option B: Demand response + battery |
|---|---|---|
| Stability $\sigma$ | 0.95 | 0.75 |
| Cost efficiency $e$ | 0.30 | 0.80 |
| Renewable utilisation $r$ | 0.10 | 0.70 |

Normal operations: $U_A = \mathbf{0.585}$, $U_B = \mathbf{0.755}$. Demand response wins — cleaner and cheaper, stability trade-off is acceptable.

Demand surge mode ($w_\sigma=0.80$): $U_A = (0.95\times0.80)+(0.30\times0.10)+(0.10\times0.10) = \mathbf{0.800}$. $U_B = (0.75\times0.80)+(0.80\times0.10)+(0.70\times0.10) = \mathbf{0.750}$. Decision flips. The gas peaker becomes optimal because grid stability now dominates — a 5pp reversal that directly encodes the shift in $w_\sigma$ from 0.50 to 0.80. The same architecture produces the right decision in both contexts without separate rule sets for each scenario.
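The decision flip is worth verifying mechanically: only the weight vector changes between the two calls below (a minimal sketch; the dictionaries and function name are illustrative):

```python
# Reproduces the C.4 decision flip: the argmax changes when w_sigma
# rises from 0.50 (normal) to 0.80 (demand surge).
SOURCES = {
    "A_gas_peaker":      {"sigma": 0.95, "e": 0.30, "r": 0.10},
    "B_demand_response": {"sigma": 0.75, "e": 0.80, "r": 0.70},
}

def dispatch(weights):
    """Select the generation option maximising the weighted utility."""
    return max(SOURCES, key=lambda name: sum(
        weights[k] * SOURCES[name][k] for k in weights))
```

`dispatch` with the normal-operations weights picks demand response; with the surge weights it picks the gas peaker — the same formula producing both context-correct answers.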


C.5 Dynamic Pricing — Revenue vs Retention vs Market Share

Domain deep-dive: Dynamic Pricing

A dynamic pricing agent for a platform (ride-sharing, e-commerce, SaaS) must balance short-term revenue against customer satisfaction and long-run market share. Conventional surge pricing maximises immediate revenue but can be catastrophically bad for retention when applied without a utility trade-off framework.

Attributes and field weights

$$U = w_r \cdot r + w_s \cdot s + w_m \cdot m$$

where $r$ = revenue per transaction (normalised), $s$ = predicted customer satisfaction, $m$ = market share trajectory signal.

| Mode | $w_r$ | $w_s$ | $w_m$ | Trigger |
|---|---|---|---|---|
| Standard | 0.50 | 0.30 | 0.20 | Normal conditions |
| Competitive threat | 0.25 | 0.35 | 0.40 | Competitor price drop > 10% |
| Supply crunch | 0.65 | 0.20 | 0.15 | Supply < 40% of demand |
| New market entry | 0.20 | 0.40 | 0.40 | New geography or category |

Worked example — surge demand event

| Attribute | Option A: Surge +40% | Option B: +10% with loyalty reward |
|---|---|---|
| Revenue $r$ | 0.90 | 0.55 |
| Satisfaction $s$ | 0.25 | 0.80 |
| Market share $m$ | 0.30 | 0.75 |

Standard mode: $U_A = (0.90\times0.50)+(0.25\times0.30)+(0.30\times0.20) = \mathbf{0.585}$. $U_B = (0.55\times0.50)+(0.80\times0.30)+(0.75\times0.20) = \mathbf{0.665}$. Moderate pricing with loyalty reward wins.

Supply crunch mode ($w_r=0.65$): $U_A = \mathbf{0.680}$, $U_B = \mathbf{0.630}$. Surge pricing becomes optimal — the supply constraint makes moderate pricing financially untenable and the utility framework reflects this correctly. A flat "never surge" policy would be suboptimal in genuine crunch conditions; a flat "always surge on demand" policy destroys customer relationships in normal conditions. The utility framework selects the right option in each context through the same formula.

The assertions store accumulates segment-level elasticity data: "business travellers have satisfaction elasticity 0.3; leisure travellers 0.8." Future pricing decisions use segment-specific confidence-weighted $s$ values rather than a population average, improving both revenue and retention simultaneously.
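One way to operationalise the segment-level priors is a confidence-weighted blend between the population-average satisfaction and the segment-adjusted prediction. The blending rule and function name below are assumptions for illustration; only the quoted elasticities (0.3 and 0.8) come from the text:

```python
# Segment-specific satisfaction estimate for C.5. A segment's elasticity
# prior discounts satisfaction under a price increase; confidence in that
# prior controls how far we move away from the population baseline.
def segment_satisfaction(base_s, price_increase, elasticity, confidence):
    """Blend the elasticity-adjusted prediction with the baseline,
    weighted by confidence in the segment prior."""
    adjusted = base_s - elasticity * price_increase
    return confidence * adjusted + (1 - confidence) * base_s
```

With a 40% price increase and a well-established prior (confidence 0.9), a business traveller (elasticity 0.3) retains markedly higher predicted satisfaction than a leisure traveller (elasticity 0.8), so the same surge can be optimal for one segment and not the other.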



C.6 AI Data Centers — Revenue per Watt, Fleet Heterogeneity, and Specialist Serving

Domain deep-dive: AI Data Centers

For an AI data center or GPU cloud operator, the framework is valuable only if it improves economics. The relevant variables are not raw benchmark score alone, but cost per useful query, revenue per watt, and utilisation across a heterogeneous fleet. That makes this domain a natural fit for the Micro-Expert Architecture.

Operator objective

One useful operator-facing utility is:

$$U = w_m \cdot m + w_u \cdot u + w_q \cdot q$$

where $m$ = margin per served query, $u$ = fleet utilisation, and $q$ = delivered query quality on the target specialist workload.

| Attribute | Monolithic frontier serving | Routed specialist serving |
|---|---|---|
| Margin $m$ | 0.45 | 0.80 |
| Utilisation $u$ | 0.55 | 0.78 |
| Quality $q$ | 0.92 | 0.88 |

With weights $(w_m, w_u, w_q) = (0.40, 0.30, 0.30)$, the routed specialist stack scores $U = 0.812$ versus $0.625$ for the monolith. The point is not that the specialist always wins on absolute quality, but that on the right query class it can dominate on the operator’s real objective function.

Why mixed fleets matter

A mixed fleet containing H100s, A100s, A40s, L40S-class cards, and consumer-adjacent GPUs is difficult to monetise if every customer must be served by one giant general model. Specialist routing changes that. A lower-tier GPU that cannot profitably host a broad 70B model may still be ideal for a fine-tuned 7B domain specialist with high utilisation and lower SLA burden. The framework therefore gives product meaning to hardware that might otherwise be commercially awkward.

LoRA multi-tenancy and tiered SLAs

The architecture also creates a clean path for LoRA multi-tenancy. A shared base specialist can stay resident while customer- or vertical-specific adapters rotate through it, improving effective utilisation and reducing cold-start overhead. Operators can then offer distinct products: frontier broad-model inference on scarce top-end hardware, and specialist guaranteed-quality inference on cheaper pools at lower headline price but potentially higher gross margin.


C.7 Self-Driving Companies — Modular Safety Cases, Shadow Mode, and Fleet Learning

Domain deep-dive: Self-Driving Vehicles · Autonomous Systems

For self-driving and autonomy companies, the framework’s strongest benefits are independent updateability, auditable decision traces, and formal abstention under uncertainty. Those are exactly the properties that current end-to-end black-box stacks struggle to demonstrate in certification and incident review.

Component decomposition

A natural three-way split mirrors the C.1.1 deployment table: a perception specialist (object detection, lane marking, sign reading), a motion planning specialist (trajectory optimisation, collision avoidance), and a traffic rules specialist (traffic law, edge cases, emergency protocols).

Updating one does not change the others. That matters because a regulator or internal safety board can narrow the revalidation scope to the changed component plus its interfaces, instead of re-opening the entire stack.

Worked framing — shadow mode as blue-green deployment

Autonomy teams already run shadow mode: a candidate model computes decisions without acting on them, and its outputs are compared to the live path. In this framework that becomes a formal blue-green deployment protocol. The candidate specialist is the green deployment, the production specialist is blue, and promotion depends on utility and safety metrics rather than ad hoc review alone.
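A promotion rule under this framing might look like the following sketch. The threshold values, metric names, and the specific rule (green must match or exceed blue on safety before any utility comparison) are illustrative assumptions, not a prescribed protocol:

```python
# Blue-green promotion gate for a shadow-mode candidate specialist.
# "Blue" is the production model; "green" is the candidate running in
# shadow mode. Safety is checked first and is never traded for utility.
def promote(blue_metrics, green_metrics, min_safety=0.95, min_gain=0.01):
    if green_metrics["safety"] < max(min_safety, blue_metrics["safety"]):
        return False  # candidate regresses on safety: never promote
    # Require a strict utility improvement margin to avoid noisy flips.
    return green_metrics["utility"] >= blue_metrics["utility"] + min_gain
```

The point of formalising the rule is auditability: promotion decisions become reproducible from logged metrics rather than depending on ad hoc review.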

Abstention and remote escalation

The confidence gate is also directly relevant. If road conditions, sensor fusion, or map consistency push confidence below the field threshold, the system should not “answer anyway.” It should defer: refuse the manoeuvre, slow, or escalate to a remote human operator depending on the operational design domain. This is not only good engineering; it aligns with how safety-critical autonomy is increasingly expected to behave under regulatory scrutiny.

Fleet learning without full-stack retraining

Edge cases encountered by one vehicle can be stored as structured assertions and later turned into DPO-weighted corrections for the relevant specialist only. A rare pedestrian behaviour in San Francisco improves the perception specialist fleet-wide without perturbing traffic rules or motion planning weights. That is the fleet-scale version of the paper’s cross-session learning claim, and it is significantly easier to certify than end-to-end full-stack retraining after every safety-relevant discovery.


C.8 Advantages Over Conventional Approaches

The worked examples above make two structural advantages concrete:

Nuanced trade-offs without explicit rules. A conventional system requires a human to pre-enumerate every conflict scenario and write a resolution rule for each. The utility framework resolves conflicts automatically through the weighted sum — the drone selects the safe route in storm conditions without a "storm mode" rule having been pre-written, because the confidence-weighted shift in $w_s$ does the work. The number of scenarios a rule-based system must handle grows combinatorially with the number of objectives; the utility framework scales to arbitrarily many objectives with a single additive formula whose structure is proved to be the unique correct one under five axioms (Theorem B.1).

Calibrated uncertainty as a first-class decision variable. Conventional systems treat uncertainty as noise to be filtered before the decision is made. The utility framework treats it as an explicit input: $C$ tracks the agent's self-assessed reliability, and $C_{\min}(f)$ creates a hard gate below which the agent abstains. An autonomous vehicle uncertain about road conditions does not guess — it declines to execute the manoeuvre. A grid management agent uncertain about demand forecasts escalates to a human rather than committing a large generation change. In safety-critical domains this property is not a refinement; it is the primary design requirement, and it falls out of the utility architecture by construction.

The field-specific weight vectors and minimum competence thresholds make both advantages domain-calibrated rather than generic. The same architecture that tolerates more risk in dynamic pricing ($C_{\min}=0.60$) demands near-certainty in grid management ($C_{\min}=0.95$) — not through separate codebases, but through the same utility function with different field parameters derived from domain-specific liability and safety standards (§5).

C.8.1 Cross-Session Memory and Error Non-Repetition

Conventional AI systems are stateless between sessions — the same vehicle makes the same route-planning mistake on Tuesday that it made on Monday. The assertions store gives the utility-governed agent persistent, decay-weighted memory: the autonomous vehicle that learned "high merge conflict at junction X at 17:30–18:30" on Monday applies that prior on Tuesday without being retrained. The three-layer learning architecture (§8) means errors detected today actively suppress the same errors tomorrow through DPO-weighted correction injection. For a drone that has learned a particular flight corridor has unpredictable wind shear, that knowledge persists across flights and informs future route selection automatically. No conventional rule-based or single-session system has this property without explicit manual rule entry.

C.8.2 Interpretable Decision Rationale — Regulatory Readiness

Because $U = w_s \cdot s + w_e \cdot e + w_c \cdot c$ with named, weighted components, the agent can always produce a human-readable decision audit: "Option B was selected because its safety score (0.95) with weight 0.70 outweighed the 32% efficiency advantage of Option A at weight 0.20. Total utility: 0.845 vs 0.640." Neural systems cannot produce this audit by construction. For regulated domains — aviation certification (DO-178C), automotive safety (ISO 26262), medical device approval (FDA 510k), financial algorithm compliance (MiFID II) — interpretability is a regulatory requirement, not a design preference. The utility function provides it by construction, with no additional instrumentation required.
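Generating such an audit is mechanical once the decision is additive. A minimal sketch (the output format is illustrative, not the framework's specified audit schema; assumes a two-option comparison):

```python
# Human-readable decision audit for an additive utility decision (C.8.2).
def audit(weights, attrs_by_option):
    """Return (winner, rationale string) for a two-option comparison."""
    u = {name: round(sum(weights[k] * a[k] for k in weights), 3)
         for name, a in attrs_by_option.items()}
    winner = max(u, key=u.get)
    runner_up = min(u, key=u.get)  # with two options, min is the rejected one
    lines = [f"Selected {winner}: total utility {u[winner]} vs {u[runner_up]}."]
    for k, w in weights.items():
        contrib = round(attrs_by_option[winner][k] * w, 3)
        lines.append(f"  {k}={attrs_by_option[winner][k]} at weight {w} "
                     f"contributes {contrib}")
    return winner, "\n".join(lines)
```

Applied to the C.1 lane-change numbers, this yields the winner and a per-attribute contribution breakdown — exactly the trace a regulator or incident reviewer would ask for.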

C.8.3 Principled Abstention at the Competence Boundary

The $C_{\min}(f)$ gate is not a confidence hedge — it is a formal model of the agent's own competence boundary. Below the threshold the agent does not produce a low-confidence answer; it categorically refuses to act and escalates. An autonomous vehicle uncertain about road conditions does not slow down and proceed cautiously — it refuses the lane change and signals the driver. A grid management agent uncertain about demand forecasts does not make a smaller generation adjustment — it escalates to a human operator. Neural systems produce output regardless of internal reliability; there is no formal mechanism preventing generation of a confident answer on a query the model has never reliably encountered. For safety-critical systems, the distinction between "lower-quality output" and "no output with escalation" is the difference between a well-designed system and a dangerous one.

C.8.4 Principled Cross-Domain Deployment from a Single Architecture

The same codebase deploys across all five domains in this appendix by changing only the field configuration — the weight vector, penalty multiplier $\mu(f)$, and competence threshold $C_{\min}(f)$. These parameters are not arbitrary engineering choices: they are derived from domain-specific liability standards (§5.1) — medical malpractice thresholds, ICAO Annex 13 aviation certification, ISO 26262 automotive safety classifications. A regulatory body auditing the autonomous vehicle deployment can evaluate the weight choices against existing automotive safety standards rather than treating them as opaque model internals. Cross-domain regulatory compliance with a unified, auditable parameterisation is a structural property of the architecture, not an add-on.

C.8.5 Additional Structural Advantages

| Advantage | Mechanism | Formal grounding | Contrast with conventional systems |
|---|---|---|---|
| Curiosity with guaranteed exploitation dominance | Exploration bonus capped at 50% of total utility; cap tightens automatically as competence falls | Proposition B.6 — proved exactly | ε-greedy and UCB have no domain-calibrated, competence-relative guarantee; they may explore when the agent is already unreliable |
| VCG-aligned multi-agent coordination | Clarke pivot transfers make truthful capability reporting each agent's dominant strategy; socially optimal task allocation follows | Theorem S1 (truthfulness), Theorem S2 (PoA = 1) | Conventional multi-agent systems suffer strategic misreporting: agents overstate capabilities to receive preferred assignments; no formal coordination guarantee without mechanism design |
| Predictable personality stability — safety certifiable | Behavioural tendencies evolve within formally proved bounds; field-specific floors and ceilings enforced by projection | Theorem B.7 — Lyapunov stable, stays in field bounds under bounded drift | Neural systems offer no formal guarantee against post-deployment behavioural shift; "generally behaves well" is not a safety-case statement |

Praneeth Tota · Ph.D. Computer Science (Algorithmic Game Theory) · Illinois Institute of Technology
praneethtota.github.io · Whitepaper: CC BY 4.0
AUA Framework v1.0.0