How the Adaptive Utility Agent framework provides the adaptive orchestration layer that current EMS and DERMS architectures are missing — a single formula that balances grid stability, cost, and carbon across every operating state, with confidence-gated escalation when forecasts are unreliable.
The energy transition is producing a grid that existing AI approaches were not designed for. By 2025, renewables exceed one-third of the global generation mix: solar, wind, and battery storage whose output is variable, distributed, and often bidirectional. At the same time, demand is becoming more complex: EVs charge at unpredictable times, rooftop solar turns consumers into prosumers, and smart appliances can shed or shift load on millisecond timescales.
Current approaches address pieces of this problem but not the whole:
The framework provides the orchestration layer that resolves this: a single utility function that holds all competing objectives simultaneously, with weights that shift automatically with grid state, and a confidence gate that escalates to human operators when the system's forecasts are too uncertain to act on.
The energy grid operates in fundamentally different modes — and the optimal decision changes completely between them. A battery storage dispatch decision that is correct under normal load may be dangerously wrong during a frequency deviation event. The framework handles this with a single formula whose weights shift with detected grid state:
U = w_σ(f) · Stability + w_c(f) · Cost + w_r(f) · Renewable utilization
f: grid state (normal, demand surge, frequency deviation, renewable surplus...)
Decision rule: act if C ≥ C_min(f), escalate to human operator otherwise
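A minimal Python sketch of the formula and decision rule above. The two weight profiles use the values quoted on this page for normal load and demand surge. The function names, the 0-to-1 score scale, and the idea of passing C_min as an argument are illustrative assumptions, not the AUA API.

```python
# Weight profiles per grid state f: (w_sigma, w_cost, w_renew).
# Values are the ones quoted on this page; other states would be added the same way.
WEIGHTS = {
    "normal": (0.40, 0.40, 0.20),
    "demand_surge": (0.80, 0.15, 0.05),
}


def utility(state: str, stability: float, cost: float, renewable: float) -> float:
    """U = w_sigma(f)*Stability + w_c(f)*Cost + w_r(f)*Renewable utilization.

    Each score is assumed to be normalised to [0, 1], higher is better.
    """
    w_s, w_c, w_r = WEIGHTS[state]
    return w_s * stability + w_c * cost + w_r * renewable


def decide(confidence: float, c_min: float) -> str:
    """Act if C >= C_min(f); otherwise escalate to a human operator."""
    return "act" if confidence >= c_min else "escalate"
```

The same `utility` function serves every grid state; only the entry looked up in `WEIGHTS` changes.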
These are worked examples from §2.3–2.4 of the full whitepaper:
Normal load. Battery storage is preferred over the gas peaker. Demand response is incentivized for off-peak shifting. Cost and stability are in balance. From §2.4.
Demand surge. The stability weight rises to w_σ=0.80. The gas peaker becomes optimal even though it costs more, because grid stability now dominates. The decision flips from battery/renewable to peaker dispatch. From §2.4.
Renewable surplus. High solar/wind output threatens curtailment. The renewable utilization weight rises, so battery charging, EV charging, and flexible industrial loads are dispatched to absorb the surplus. No separate "curtailment avoidance rule" is required.
Peak pricing. The cost weight rises from 0.40 to 0.65 during a peak pricing event. Appliance scheduling shifts to off-peak automatically. From §2.3.
One formula, all grid states. The gas peaker dispatch example from the whitepaper is the clearest demonstration: under normal load, battery/renewable is optimal; under demand surge, the gas peaker becomes optimal. The decision flips because w_σ shifts from 0.40 to 0.80 — not because a separate "surge mode program" is activated. No new rules are written when a new grid condition is encountered; a new weight profile is added and the classifier detects the condition.
From §2.4 of the full whitepaper: "The C_min=0.95 gate under surge conditions ensures the agent escalates to a human operator when demand forecasts are unreliable, rather than committing a large generation decision under uncertainty."
This is the most operationally important property the framework adds to grid management. Current ML forecasting models have no mechanism to detect when they are operating outside their training distribution and escalate accordingly. They produce a forecast number regardless of whether that number is reliable. A grid operator relying on an ML forecast during an extreme weather event has no signal from the model that the forecast may be substantially wrong.
| Grid state | C_min | Behavior below threshold |
|---|---|---|
| Normal operating conditions | 0.70 | Hold prior dispatch; log uncertainty |
| Peak demand / high load | 0.85 | Reduce automated dispatch; alert ops |
| Demand surge / freq deviation | 0.95 | Halt automated decisions; human required |
| Extreme event / outlier data | 0.95 | Mandatory human override; do not dispatch |
The 0.95 threshold under surge conditions means the system will only commit a large generation dispatch decision — switching a gas peaker on, shedding industrial load, committing battery discharge — when its internal consistency on the demand forecast is extremely high. When it is not, it escalates to a human operator with the full context: current demand signal, forecast uncertainty, which specialist produced the estimate, and what the conflicting signals are.
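The gate and the escalation report described above could be sketched as follows. The thresholds and fallback behaviors are copied from the C_min table. The `Escalation` class, its field names, and the `gate` signature are hypothetical and only illustrate the kind of context an operator would receive.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# (C_min, behaviour below threshold) per grid state, from the table on this page.
GATE = {
    "normal": (0.70, "hold prior dispatch; log uncertainty"),
    "peak_demand": (0.85, "reduce automated dispatch; alert ops"),
    "demand_surge": (0.95, "halt automated decisions; human required"),
    "extreme_event": (0.95, "mandatory human override; do not dispatch"),
}


@dataclass
class Escalation:
    """Context handed to the operator (field names are illustrative)."""
    state: str
    fallback: str
    demand_signal_mw: float
    forecast_confidence: float
    specialist: str
    conflicting_signals: List[str] = field(default_factory=list)


def gate(state: str, confidence: float, demand_signal_mw: float,
         specialist: str, conflicts: Optional[List[str]] = None) -> Optional[Escalation]:
    """Return None when automated dispatch may proceed, else an Escalation."""
    c_min, fallback = GATE[state]
    if confidence >= c_min:
        return None
    return Escalation(state, fallback, demand_signal_mw, confidence,
                      specialist, list(conflicts or []))
```

Returning a structured object rather than a bare flag keeps the forecast confidence, its source, and the conflicting signals together in one report.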
Why this matters for grid operators. A wrong dispatch decision during a surge event can cascade: over-committing battery discharge leaves nothing for the next frequency deviation; under-dispatching a peaker leaves frequency unsupported. The confidence gate is not a performance penalty — it is the mechanism that prevents the AI layer from making large, hard-to-reverse decisions under conditions where it is genuinely uncertain. The human operator who receives the escalation is better positioned to act on a structured uncertainty report than on a confident forecast that happens to be wrong.
At the residential edge, the framework operates as a Home Energy Management System (HEMS) that learns from occupant behavior and adapts to grid signals without requiring manual reconfiguration.
During a peak pricing event (detected from utility time-of-use signal), the cost weight rises from 0.40 to 0.65 automatically. Appliance scheduling shifts to off-peak: dishwasher, laundry, EV charging, and battery pre-conditioning are rescheduled to the lowest-cost window. No occupant action required; no rule for "peak pricing behavior" needed.
From the whitepaper: "When the occupant signals an explicit preference, the system defers — not by overriding the utility function, but by activating a comfort-override weight profile where w_c=0.75." The comfort override is a weight profile shift, not a rule exception. The formula continues to govern; the weights reflect what the occupant currently values.
Normal conditions (cheap overnight rate):
w_cost=0.40, w_comfort=0.40, w_renew=0.20
→ charge at full rate overnight, minimize cost
Renewable surplus (midday solar peak):
w_cost=0.25, w_comfort=0.25, w_renew=0.50
→ shift charging to solar peak window
→ battery absorbs surplus, reduces curtailment
Peak demand event:
w_cost=0.65, w_comfort=0.25, w_renew=0.10
→ defer charging to post-peak window
→ optionally participate in V2G if SOC sufficient
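The three weight profiles above can be sketched as a small window-selection routine. The weights are the ones listed; the candidate windows and their scores are made-up illustrative values on a 0-to-1 scale, and `best_window` is a hypothetical helper rather than an AUA function.

```python
# Weight profiles from the example above.
PROFILES = {
    "normal":            {"cost": 0.40, "comfort": 0.40, "renew": 0.20},
    "renewable_surplus": {"cost": 0.25, "comfort": 0.25, "renew": 0.50},
    "peak_demand":       {"cost": 0.65, "comfort": 0.25, "renew": 0.10},
}

# Hypothetical scores in [0, 1] for each candidate charging window (higher is better).
WINDOWS = {
    "overnight":    {"cost": 0.9, "comfort": 0.9, "renew": 0.2},
    "midday_solar": {"cost": 0.6, "comfort": 0.5, "renew": 1.0},
    "evening_peak": {"cost": 0.1, "comfort": 0.7, "renew": 0.3},
}


def best_window(state: str, windows: dict = WINDOWS) -> str:
    """Pick the charging window with the highest weighted utility for this state."""
    weights = PROFILES[state]

    def score(name: str) -> float:
        return sum(weights[k] * windows[name][k] for k in weights)

    return max(windows, key=score)
```

With these illustrative scores, the normal and peak-demand profiles both pick the overnight window, while the renewable-surplus profile moves charging to the midday solar window.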
The assertions store accumulates occupant-specific usage patterns: typical wake time, preferred evening temperature, recurring appliance schedules. Future decisions inject these patterns automatically without the occupant needing to reconfigure schedules. When patterns change — a new EV, a work-from-home schedule shift, a seasonal change — the correction loop detects the deviation from expected consumption and updates the priors within a few calibration cycles.
At grid scale, the framework operates as the decision layer within a Distributed Energy Resource Management System (DERMS) or Virtual Power Plant (VPP) aggregator, coordinating millions of DERs as a unified grid participant.
A VPP aggregates distributed assets — residential batteries, commercial HVAC, EV fleets, industrial flexible loads — and dispatches them as a single grid resource. The orchestration challenge is that each asset has its own constraints (battery SOC, occupant comfort, EV departure time) and the dispatch decision must satisfy all of them simultaneously while meeting the grid service commitment. This is exactly the multi-objective, context-adaptive decision problem the framework is designed for.
When frequency deviates from its nominal 50/60 Hz, the stability weight rises to 0.80 or higher. Battery assets with sufficient SOC are dispatched in priority order by confidence of available capacity. Assets below the confidence threshold are excluded from dispatch: it is better to under-deliver on ancillary services than to dispatch unreliable assets.
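A minimal sketch of this confidence-ordered selection, under stated assumptions: the `Asset` fields, the `min_soc` and `min_confidence` defaults, and the function name are hypothetical, chosen only to illustrate the ordering and exclusion logic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Asset:
    name: str
    available_kw: float
    soc: float         # state of charge, 0..1
    confidence: float  # confidence in the reported available capacity, 0..1


def select_for_frequency_response(assets: List[Asset], target_kw: float,
                                  min_soc: float = 0.2,
                                  min_confidence: float = 0.95) -> Tuple[List[Asset], float]:
    """Dispatch eligible assets in descending confidence until the target is met.

    Assets below either threshold are skipped, so the total may fall short of
    target_kw (under-delivery is preferred to dispatching unreliable assets).
    """
    eligible = [a for a in assets
                if a.soc >= min_soc and a.confidence >= min_confidence]
    eligible.sort(key=lambda a: a.confidence, reverse=True)
    chosen: List[Asset] = []
    total = 0.0
    for asset in eligible:
        if total >= target_kw:
            break
        chosen.append(asset)
        total += asset.available_kw
    return chosen, total
```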
Renewable surplus shifts the renewable utilization weight upward. Flexible loads (EV charging, heat pumps, industrial processes) are dispatched to absorb surplus in the window before curtailment occurs. The assertions store accumulates which assets reliably respond — unreliable assets are progressively down-weighted in future dispatch.
Under wholesale market price signals, cost weight rises. The system identifies the optimal bid quantity given current portfolio confidence — not the maximum technically available, but the maximum reliably deliverable. Confidence-gated dispatch prevents committing capacity that may not actually be available.
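One possible reading of "maximum reliably deliverable" is to derate each asset's capacity by its confidence and drop assets below a floor. This is an assumption made for illustration; the source does not specify the calculation, and `reliable_bid_kw` and its 0.85 floor are hypothetical.

```python
from typing import Iterable, Tuple


def reliable_bid_kw(portfolio: Iterable[Tuple[float, float]],
                    min_confidence: float = 0.85) -> float:
    """Sum confidence-derated capacity over assets at or above the floor.

    portfolio: (available_kw, confidence) pairs.
    """
    return sum(kw * conf for kw, conf in portfolio if conf >= min_confidence)


def technical_max_kw(portfolio: Iterable[Tuple[float, float]]) -> float:
    """Upper bound that ignores confidence, shown only for comparison."""
    return sum(kw for kw, _ in portfolio)
```

For a portfolio of `[(100, 0.9), (50, 1.0), (200, 0.5)]` the technical maximum is 350 kW, while the derated bid is about 140 kW because the low-confidence 200 kW asset is excluded.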
The assertions store accumulates SOC degradation patterns per battery asset. Assets showing inconsistent capacity delivery, which can indicate battery aging or BMS errors, are flagged by the contradiction detector and their confidence ratings are updated downward. Dispatch prioritizes the highest-confidence assets for critical grid services.
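The downward confidence update could look like the sketch below. The update rule (multiplicative decay on a shortfall, slow additive recovery otherwise) and every parameter value are assumptions for illustration; the source does not state how the contradiction detector adjusts ratings.

```python
def update_confidence(confidence: float, committed_kwh: float, delivered_kwh: float,
                      tolerance: float = 0.05, decay: float = 0.8,
                      recovery: float = 0.02) -> float:
    """Lower an asset's confidence when it delivers less than it committed.

    A shortfall beyond `tolerance` is treated as a contradiction between the
    reported and realised capacity. Consistent delivery restores confidence slowly.
    """
    if committed_kwh <= 0:
        return confidence  # nothing was committed, so there is nothing to compare
    shortfall = (committed_kwh - delivered_kwh) / committed_kwh
    if shortfall > tolerance:
        return confidence * decay
    return min(1.0, confidence + recovery)
```

Repeated shortfalls compound, which matches the idea of unreliable assets being progressively down-weighted in future dispatch.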
The clearest illustration of the framework's value in grid management is the gas peaker vs. battery storage decision under different grid states. From §2.4 of the full whitepaper:
Normal load:
w_σ=0.40, w_cost=0.40, w_renew=0.20
→ Battery storage + demand response optimal
→ Gas peaker offline (expensive, high-carbon)
→ Formula selects: battery dispatch
Demand surge (w_σ rises to 0.80):
w_σ=0.80, w_cost=0.15, w_renew=0.05
→ Gas peaker becomes optimal
→ Battery reserved for frequency support
→ Formula selects: peaker dispatch
Result: same formula, different weights → correct decision in each context.
No "surge mode program". No separate dispatch rules for each grid state.
Current EMS architectures require either separate rule sets for each operating state (which become inconsistent and hard to maintain as grid conditions evolve) or a single ML model that must somehow learn to behave differently across all operating states without explicit objective differentiation. The framework resolves this structurally: the objective function is explicit, the weights encode what matters in each state, and the transition between states is governed by the field classifier detecting the current grid condition.
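The flip in the worked example can be reproduced numerically. The weights below are the ones given for normal load and demand surge; the per-option scores for battery and peaker are invented for illustration (0-to-1 scale, higher is better) and are not taken from the whitepaper.

```python
# (w_sigma, w_cost, w_renew) per grid state, from the worked example.
WEIGHTS = {
    "normal": (0.40, 0.40, 0.20),
    "demand_surge": (0.80, 0.15, 0.05),
}

# Hypothetical (stability, cost, renewable) scores per dispatch option.
OPTIONS = {
    "battery": (0.60, 0.80, 0.90),
    "peaker": (0.95, 0.30, 0.10),
}


def score(option: str, state: str) -> float:
    """Weighted sum of the option's scores under the given state's weights."""
    return sum(w * s for w, s in zip(WEIGHTS[state], OPTIONS[option]))


def select(state: str) -> str:
    """Return the option with the highest utility for this grid state."""
    return max(OPTIONS, key=lambda option: score(option, state))
```

With these scores, battery wins under normal load (0.74 vs 0.52) and the peaker wins under demand surge (0.81 vs 0.645). Only the weight lookup differs between the two calls.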
The renewable utilization weight can be configured to reflect a carbon intensity signal — rising when grid carbon intensity is high (lots of gas/coal on the margin) and falling when carbon intensity is low (lots of solar/wind). Under a high-carbon-intensity signal, the framework shifts toward demand reduction and storage — consuming less grid power when it is dirtiest. Under low carbon intensity, the framework shifts toward demand acceleration — charging EVs, running flexible industrial loads, absorbing renewable surplus. No carbon-specific rules required; the renewable utilization weight carries the signal.
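One way to wire a carbon intensity signal into the weights is a clamped linear mapping, sketched below. The mapping, the gCO2/kWh bounds, and the weight range are all assumptions made for illustration; the source only states the direction of the adjustment.

```python
from typing import Tuple


def carbon_adjusted_weights(base: Tuple[float, float, float], intensity: float,
                            lo: float = 100.0, hi: float = 500.0,
                            w_min: float = 0.10, w_max: float = 0.50
                            ) -> Tuple[float, float, float]:
    """Raise w_renew as grid carbon intensity (gCO2/kWh) rises.

    base is (w_sigma, w_cost, w_renew). The stability and cost weights are
    rescaled proportionally so the three weights still sum to 1.
    """
    frac = min(1.0, max(0.0, (intensity - lo) / (hi - lo)))
    w_r = w_min + frac * (w_max - w_min)
    w_s, w_c, _ = base
    scale = (1.0 - w_r) / (w_s + w_c)
    return (w_s * scale, w_c * scale, w_r)
```

Starting from the normal profile (0.40, 0.40, 0.20), an intensity at or above the upper bound gives roughly (0.25, 0.25, 0.50), and one at or below the lower bound gives roughly (0.45, 0.45, 0.10).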
Current ML forecasting models produce forecast errors that are not fed back into the model between retraining cycles. The same forecast error — underestimating Monday morning demand ramp, overestimating holiday load reduction — repeats daily until the next quarterly retraining cycle.
The framework's correction loop changes this at two timescales:
When a demand forecast diverges significantly from actual demand (a detectable contradiction between predicted and realized values), the correction is stored in the assertions store and injected into the next forecast session. "Forecast for this node type underestimates morning ramp by ~8% on cold weekdays — apply upward adjustment." The next forecast on a cold weekday morning immediately applies the correction without waiting for a retraining cycle.
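The session-level correction can be sketched as a lookup keyed by forecast context. The `Context` fields, the 5% trigger threshold, and the `AssertionsStore` interface here are hypothetical; the actual store's schema is not described on this page.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Context:
    """Conditions under which a forecast was made (illustrative fields)."""
    node_type: str
    cold: bool
    weekday: bool
    period: str  # e.g. "morning_ramp"


class AssertionsStore:
    def __init__(self) -> None:
        self._adjustments: Dict[Context, float] = {}

    def record(self, ctx: Context, predicted_mw: float, actual_mw: float,
               threshold: float = 0.05) -> None:
        """Store a multiplicative correction when the relative error is significant."""
        error = (actual_mw - predicted_mw) / predicted_mw
        if abs(error) > threshold:
            self._adjustments[ctx] = 1.0 + error

    def apply(self, ctx: Context, forecast_mw: float) -> float:
        """Adjust the next forecast made under the same context, if a correction exists."""
        return forecast_mw * self._adjustments.get(ctx, 1.0)
```

Recording a forecast of 100 MW against an actual of 108 MW stores a factor of 1.08, so the next forecast in the same context is raised by about 8% without retraining.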
Accumulated forecast errors become DPO training pairs — (corrected forecast, original forecast) — weighted by the field penalty multiplier for energy decisions. For high-stakes grid decisions (frequency response commitment, peak demand dispatch), the penalty multiplier is high and the forecast correction is trained against harder. The model's weights improve between quarterly retraining cycles rather than only at them.
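A sketch of how accumulated corrections might be turned into weighted preference pairs. The `prompt`/`chosen`/`rejected` keys follow the preference-dataset layout commonly used by DPO tooling such as TRL. The `weight` field, the multiplier values, and the function names are assumptions and do not describe AUA's actual export format.

```python
import json
from typing import Dict, List

# Hypothetical field penalty multipliers per decision type.
PENALTY = {"frequency_response": 3.0, "peak_dispatch": 2.0, "routine": 1.0}


def to_dpo_pair(prompt: str, original_forecast: str, corrected_forecast: str,
                decision_type: str) -> Dict[str, object]:
    """Build one (corrected, original) preference pair with a stakes-based weight."""
    return {
        "prompt": prompt,
        "chosen": corrected_forecast,   # what the model should have produced
        "rejected": original_forecast,  # what it actually produced
        "weight": PENALTY.get(decision_type, 1.0),
    }


def export_jsonl(pairs: List[Dict[str, object]]) -> str:
    """Serialise pairs as JSON Lines, one pair per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)
```

How the weight is consumed depends on the fine-tuning setup; it could scale the per-example loss or drive oversampling of high-stakes pairs.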
The practical value for grid operators. Existing AI-based forecasting research reports 20–30% RMSE reduction from better model architectures. The correction loop adds a different kind of improvement: instead of reducing initial forecast error, it targets repeated forecast error patterns, meaning errors the model makes consistently across similar conditions. A 69.6% reduction in repeated errors (the framework's simulation result on software engineering, directly analogous) applied to demand forecasting would mean that a recurring morning-ramp underestimate that happened 10 times last month would happen about 3 times next month (10 × (1 − 0.696) ≈ 3), improving further each cycle.
AUA v1.0 handles field routing with stability weights, safety abstention policy (c_min=0.95 halts automated dispatch), and a full audit log. SCADA and DER integration is via the AUA REST API.
pip install adaptive-utility-agent
aua init my-energy-systems-agent --preset math --tier macbook
cd my-energy-systems-agent
aua doctor
# aua_config.yaml
specialists:
  - name: mathematics
    model: qwen-coder-7b-awq
    port: 11434
    field: mathematics
safety:
  abstention_enabled: false
  require_arbiter_for_high_risk: true
  min_confidence_for_direct_answer: 0.75
security:
  encryption: {enabled: true, key_secret: AUA_ENCRYPTION_KEY}
audit:
  enabled: true
  hash_chain: true
Generate your encryption key with either python3 -c "import os; print(os.urandom(32).hex())" or openssl rand -hex 32. Both produce a 64-character hex string. See Tutorial §12.4 for key management.
aua serve
curl -X POST http://localhost:8000/query \
-H "Authorization: Bearer $AUA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "...", "session_id": "demo"}'
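The same request can be built from Python with only the standard library. This sketch mirrors the curl call above (endpoint, headers, and JSON fields are taken from it). The function names are hypothetical, and the response is returned as raw text because its schema is not shown on this page.

```python
import json
import urllib.request


def build_query_request(prompt: str, session_id: str, token: str,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"prompt": prompt, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_query(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request to a running `aua serve` instance and return the raw body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A typical call reads the token from the `AUA_TOKEN` environment variable, passes it to `build_query_request`, and hands the result to `send_query` once the server is running.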
| AUA v1.0 provides | You bring |
|---|---|
| Multi-specialist routing + utility scoring | Domain-specific specialist models |
| Arbiter + contradiction detection | Domain-specific quality criteria |
| Correction loop + DPO pair export | Fine-tuning infrastructure (TRL, Axolotl, …) |
| Blue-green deployment + rollback | Evaluation datasets for your domain |
| Append-only audit log with hash chain | SCADA / DER system integration |
| Prometheus + Grafana + OTEL | Your monitoring infrastructure |
Full instructions: AUA Tutorial · Framework v1.0 · GitHub ↗