
feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting) #5127

Open
TheodoreSpeaks wants to merge 8 commits into staging from fix/use-hosted-key-agent

Conversation

@TheodoreSpeaks

Collaborator

Summary

  • Route hosted LLM/agent provider calls through the shared hosted-key framework (BYOK → user key → platform key pool), gated by the hosted-key-llm flag (default off)
  • Add a rate-limiter mode: 'none' so LLM calls acquire platform keys without queueing or rate limiting
  • Emit hosted-key metrics (used/cost/failed) for streaming + non-streaming LLM calls, mirroring tools
  • Centralize streaming cost in createStreamingExecution (shared settleStreamingLlmCost), applying the cost multiplier that the per-provider streaming paths previously omitted; Gemini's bespoke streams call the same helper
  • Per-provider hosting config (envKeyPrefix + byokProviderId); pricing stays per-model via calculateCost
  • Share classifyHostedKeyFailure across tools/providers (fixes auth-error misclassification)
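The per-provider hosting entry can be pictured as a small lookup table. This sketch assumes ProviderHostingConfig carries exactly the two fields named above (envKeyPrefix, byokProviderId); the provider ids, the prefix strings, the `HOSTING` map, and `lookupHosting` (a stand-in for getProviderHosting) are hypothetical placeholders, not the values in apps/sim/providers/models.ts.

```typescript
// Sketch only: field names come from the PR text; all values are hypothetical.
interface ProviderHostingConfig {
  envKeyPrefix: string // pool keys assumed to be named `${envKeyPrefix}_1`, `_2`, ...
  byokProviderId: string // id used to look up a workspace BYOK key
}

// Hypothetical entries. Pricing is deliberately absent: it stays per-model.
const HOSTING: Record<string, ProviderHostingConfig> = {
  openai: { envKeyPrefix: 'OPENAI_API_KEY', byokProviderId: 'openai' },
  anthropic: { envKeyPrefix: 'ANTHROPIC_API_KEY', byokProviderId: 'anthropic' },
}

// Providers without an entry are simply not pool-hosted.
function lookupHosting(providerId: string): ProviderHostingConfig | undefined {
  // hasOwnProperty avoids treating inherited keys such as "toString" as providers
  return Object.prototype.hasOwnProperty.call(HOSTING, providerId) ? HOSTING[providerId] : undefined
}
```

Keeping pricing out of this entry matches the summary: the hosting config only answers "which keys, which BYOK id", while cost comes from calculateCost per model.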

Type of Change

  • New feature

Testing

Tested manually; bun run lint clean, tsc --noEmit 0 errors, check:api-validation:strict passed, unit tests passing (hosted-cost, byok, rate-limiter, streaming-execution, tools)

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 18, 2026


The latest updates on your projects.

1 Skipped Deployment
Project: docs; Deployment: Skipped; Actions: Skipped; Updated (UTC): Jun 18, 2026 5:06pm


@cursor

cursor Bot commented Jun 18, 2026


PR Summary

High Risk
Changes API key selection, billing gates, and streaming cost settlement across many providers; misconfiguration or flag rollout could affect charges, metrics, or which keys are used.

Overview
Adds a hosted-key-llm feature flag (env HOSTED_KEY_LLM, default off) that routes hosted LLM API key resolution through the same framework as tools: BYOK → user key → platform key pool, with mode: 'none' so LLM calls skip the FIFO queue and per-workspace rate limits.

Key resolution in getApiKeyWithBYOK uses per-provider hosting config (envKeyPrefix, byokProviderId) and returns hostedKeyEnvVar when a pool key is acquired; legacy getRotatingApiKey paths remain when the flag is off or no keys are configured.
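The precedence described above can be sketched as a pure function. Everything here is hypothetical: the real getApiKeyWithBYOK reads the flag, the workspace BYOK store, and the rate limiter itself, whereas this sketch takes them as injected inputs (`acquirePoolKey` standing in for acquireKey with mode: 'none', `legacyRotatingKey` for getRotatingApiKey). The legacy branch is simplified to follow the flowchart in the Greptile review.

```typescript
// Hypothetical sketch of the key-resolution precedence; names are illustrative.
interface ApiKeyResolution {
  apiKey: string
  isBYOK: boolean
  hostedKeyEnvVar?: string // set only when a platform pool key was acquired
}

interface ResolutionInput {
  workspaceId?: string
  flagEnabled: boolean // hosted-key-llm
  hasHostingConfig: boolean // provider has envKeyPrefix/byokProviderId
  byokKey?: string // workspace BYOK key, if one is stored
  userKey?: string // request.apiKey supplied by the caller
  acquirePoolKey: () => { key: string; envVar: string } | null // stands in for acquireKey(mode: 'none')
  legacyRotatingKey: () => string // stands in for getRotatingApiKey
}

function resolveApiKey(input: ResolutionInput): ApiKeyResolution {
  const legacy = (): ApiKeyResolution => ({ apiKey: input.legacyRotatingKey(), isBYOK: false })
  // No workspace, flag off, or provider not hosted: keep the legacy path.
  if (!input.workspaceId || !input.flagEnabled || !input.hasHostingConfig) return legacy()
  // 1. A workspace BYOK key wins.
  if (input.byokKey) return { apiKey: input.byokKey, isBYOK: true }
  // 2. A user-supplied key is used as-is and never billed against the pool.
  if (input.userKey) return { apiKey: input.userKey, isBYOK: false }
  // 3. Platform pool key: the only branch that sets hostedKeyEnvVar.
  const pooled = input.acquirePoolKey()
  if (pooled) return { apiKey: pooled.key, isBYOK: false, hostedKeyEnvVar: pooled.envVar }
  // 4. No pool keys configured: fall back to legacy rotation.
  return legacy()
}
```

Injecting the lookups keeps the precedence testable without a database or real env keys; the property worth asserting is that hostedKeyEnvVar appears only in branch 3.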

Billing and metrics: non-streaming responses bill when a platform hosted key was used (not only legacy hosted-model lists), apply the cost multiplier, and call emitHostedKeyUsage. Streaming uses settleStreamingLlmCost on drain (including Gemini’s bespoke streams) and recordHostedStreamFailure for mid-stream errors. Client-supplied hostedKey is stripped in executeProviderRequest; the server sets it only after acquiring a pool key.
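The billing rule for one response can be sketched as below, assuming linear per-million-token pricing and that the multiplier and cost metric apply only when a platform pool key served the call, as the thread describes. The types, the multiplier value, and `settleLlmCost` are hypothetical; only the metric payload shape (tool, key, costTotal) follows the openai/core.ts hunk quoted later in the thread. BYOK zeroing is left out.

```typescript
// Hypothetical sketch; pricing shape, multiplier, and callback are assumptions.
interface TokenUsage { input: number; output: number }
interface ModelPricing { inputPerMillion: number; outputPerMillion: number } // USD per 1M tokens

interface SettleOptions {
  model: string
  hostedKeyEnvVar?: string // present only when a platform pool key served the call
  multiplier: number // platform cost multiplier
  recordCostCharged: (m: { tool: string; key: string; costTotal: number }) => void
}

function tokenCost(usage: TokenUsage, pricing: ModelPricing): number {
  return (usage.input * pricing.inputPerMillion + usage.output * pricing.outputPerMillion) / 1_000_000
}

function settleLlmCost(usage: TokenUsage, pricing: ModelPricing, opts: SettleOptions): number {
  const base = tokenCost(usage, pricing)
  // Not a pool-key call: no multiplier and no hosted-key metric.
  if (!opts.hostedKeyEnvVar) return base
  const costTotal = base * opts.multiplier
  // Metric is labelled with the model name in the `tool` field.
  opts.recordCostCharged({ tool: opts.model, key: opts.hostedKeyEnvVar, costTotal })
  return costTotal
}
```

With pricing of 2 and 8 USD per million tokens and 1,000 input / 500 output tokens, the base cost is 0.006 USD; a 1.5 multiplier turns that into 0.009 USD, and only the pool-key call emits a metric.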

Shared helpers in hosted-cost.ts (calculateHostedCost, classifyHostedKeyFailure, emitHostedKeyUsage) replace duplicated tool logic and improve auth vs. rate-limit error classification. The rate limiter now probes _1.._N env keys when _COUNT is unset, and key selection is refactored into selectKey.
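Key discovery and selection for mode: 'none' might look like the sketch below. It is hypothetical: it assumes pool keys are named `${prefix}_1`, `${prefix}_2`, and so on; this `resolveEnvKeys` stops at the first missing index and takes an explicit `maxProbe` cap (one way to address the missing upper bound the Greptile review points out); and `createRoundRobin` is a stand-in for selectKey with no queue and no per-workspace bucket.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical: returns the env var NAMES of configured pool keys.
function resolveEnvKeys(prefix: string, env: Env, maxProbe = 50): string[] {
  const names: string[] = []
  const countRaw = env[`${prefix}_COUNT`]
  const count = countRaw !== undefined ? Number.parseInt(countRaw, 10) : NaN
  if (Number.isInteger(count) && count > 0) {
    // _COUNT set: check exactly indices 1..count, skipping empty slots.
    for (let i = 1; i <= count; i++) {
      if (env[`${prefix}_${i}`]) names.push(`${prefix}_${i}`)
    }
    return names
  }
  // _COUNT unset: probe upward until the first gap, never past maxProbe.
  for (let i = 1; i <= maxProbe; i++) {
    const name = `${prefix}_${i}`
    if (!env[name]) break
    names.push(name)
  }
  return names
}

// mode: 'none' style selection: plain round-robin, no waiting, no buckets.
function createRoundRobin(keys: string[]): () => string | null {
  let next = 0
  return () => {
    if (keys.length === 0) return null
    const key = keys[next % keys.length]
    next = (next + 1) % keys.length
    return key ?? null
  }
}
```

Returning env var names rather than secret values mirrors the PR's hostedKeyEnvVar, which lets metrics identify which pool key was used without logging the key itself.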

Reviewed by Cursor Bugbot for commit 4d27d1b. Bugbot is set up for automated code reviews on this repo.

Comment thread apps/sim/lib/api-key/byok.ts
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Contributor

Greptile Summary

This PR routes LLM provider calls through the shared hosted-key framework (BYOK-first → platform pool, mode:'none' bypass), gated by the hosted-key-llm feature flag. It centralizes streaming cost settlement in a new settleStreamingLlmCost helper, applies the platform cost multiplier uniformly across streaming and non-streaming paths, and emits used/cost/failed metrics for all hosted-key LLM calls.

  • Key resolution (byok.ts): BYOK key wins → user-supplied key → acquireKey(mode:'none') from the platform pool → legacy getRotatingApiKey fallback. The sanitizeRequest strip + server-side re-set pattern prevents client injection of hostedKey.
  • Streaming cost seam (streaming-execution.ts): tapStreamTermination wraps the base stream and calls settleStreamingLlmCost on drain (recomputes cost with the multiplier, emits recordCostCharged); recordHostedStreamFailure adds a second outer tap for mid-stream error recording. Gemini's bespoke stream paths call settleStreamingLlmCost directly in their own drain callbacks.
  • Shared utilities (hosted-cost.ts): classifyHostedKeyFailure, calculateHostedCost, and emitHostedKeyUsage consolidated from duplicated per-file copies; the classifier now handles message-embedded status codes that were previously misclassified.
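A classifier that also recognizes status codes embedded in error messages can be sketched like this. The category names, the fields inspected (status, statusCode, message), and the regex are assumptions for illustration, not the shared classifyHostedKeyFailure.

```typescript
// Hypothetical categories; the real classifier's labels may differ.
type HostedKeyFailure = 'auth' | 'rate_limit' | 'server' | 'unknown'

function extractStatus(err: unknown): number | undefined {
  if (typeof err !== 'object' || err === null) return undefined
  const e = err as { status?: unknown; statusCode?: unknown; message?: unknown }
  // Prefer structured fields when an SDK provides them.
  for (const candidate of [e.status, e.statusCode]) {
    if (typeof candidate === 'number') return candidate
  }
  // Fall back to a standalone 4xx/5xx token inside the message text.
  if (typeof e.message === 'string') {
    const match = e.message.match(/\b([45]\d{2})\b/)
    if (match) return Number(match[1])
  }
  return undefined
}

function classifyFailure(err: unknown): HostedKeyFailure {
  const status = extractStatus(err)
  if (status === 401 || status === 403) return 'auth'
  if (status === 429) return 'rate_limit'
  if (status !== undefined && status >= 500) return 'server'
  return 'unknown'
}
```

The message regex is a heuristic: it only matches a standalone three-digit 4xx/5xx token, so text such as "took 450ms" is not misread as a status.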

Confidence Score: 4/5

Safe to merge behind the flag; the only live risks are monitoring metric inaccuracies on edge cases, not billing or data correctness.

The cost-settlement and billing logic is sound for the normal (non-cancelled) streaming flow. The tapStreamTermination.onDrain callback fires on client-initiated stream cancel per the WHATWG Streams spec (cancel resolves pending reads with done=true), so settleStreamingLlmCost runs with partially accumulated tokens: a monitoring concern rather than a correctness or billing issue. The resolveEnvKeys probing loop also has no upper-bound guard.

apps/sim/providers/streaming-execution.ts — the tapStreamTermination cancel-path behaviour warrants a second look; the comment claims cancel runs neither callback but the implementation does not actually guard onDrain from firing on cancel-induced done=true.
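The cancel-path concern can be made concrete with a minimal termination tap over a WHATWG ReadableStream (global in modern browsers and Node 18+). This is not the PR's tapStreamTermination: the callback names mirror the review text, and the `cancelled` flag is one possible guard (an assumption) that keeps a consumer cancel from being reported as a drain.

```typescript
// Hypothetical sketch, not the PR's implementation.
interface TerminationCallbacks {
  onDrain?: () => void // source reached done=true with no consumer cancel
  onError?: (err: unknown) => void // source errored mid-stream
}

function tapStreamTermination<T>(source: ReadableStream<T>, cb: TerminationCallbacks): ReadableStream<T> {
  const reader = source.getReader()
  let cancelled = false
  return new ReadableStream<T>({
    async pull(controller) {
      try {
        const result = await reader.read()
        // cancel() below cancels the inner reader, which resolves this pending
        // read with done=true; the flag separates that from a genuine drain.
        if (cancelled) return
        if (result.done) {
          cb.onDrain?.() // in this sketch a throwing onDrain is treated as a stream error
          controller.close()
          return
        }
        controller.enqueue(result.value)
      } catch (err) {
        if (cancelled) return
        cb.onError?.(err)
        controller.error(err)
      }
    },
    async cancel(reason) {
      cancelled = true // set synchronously, before the pending inner read settles
      await reader.cancel(reason)
    },
  })
}
```

With this guard, a client disconnect runs neither callback, which matches the behaviour the author's comment describes; without the flag, the `result.done` branch would run onDrain on cancel.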

Important Files Changed

  • apps/sim/providers/streaming-execution.ts: New file adds tapStreamTermination, recordHostedStreamFailure, and settleStreamingLlmCost. Correctly sequences cost settlement after stream drain and separates failure recording, but the onDrain callback fires on client-initiated cancel (WHATWG Streams cancel resolves pending reads with done=true), causing partial-token cost metrics to be emitted on disconnect.
  • apps/sim/providers/index.ts: Core orchestration layer updated to resolve hosted keys, thread hostedKey through the request (stripping any client-supplied value first), emit recordUsed on dispatch, wrap streams with recordHostedStreamFailure, and call emitHostedKeyUsage for non-streaming responses. Logic is sound.
  • apps/sim/lib/api-key/byok.ts: Adds the new getApiKeyWithBYOK hosted-key path: BYOK wins, then user-supplied key, then platform pool via acquireKey(mode:'none'). Falls back to legacy on flag-off or missing pool keys. Return type extended to ApiKeyResolution (backward-compatible).
  • apps/sim/lib/api-key/hosted-cost.ts: New shared utility consolidating calculateHostedCost, classifyHostedKeyFailure, and emitHostedKeyUsage. The classification now handles message-embedded status codes. Well-tested and clean.
  • apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts: Adds a mode:'none' fast path bypassing the FIFO queue and per-actor buckets. The resolveEnvKeys auto-discovery probes _1.._N without an upper bound. The selectKey refactor correctly de-duplicates the round-robin logic.
  • apps/sim/providers/models.ts: Adds the ProviderHostingConfig interface and hosting entries for OpenAI, Anthropic, Google, Mistral, Fireworks, Together, Baseten, and Ollama Cloud. Structure is clean and getProviderHosting is straightforward.
  • apps/sim/providers/gemini/core.ts: Gemini cost settlement migrated to settleStreamingLlmCost in all three streaming paths. The accumulated-token approach is mathematically equivalent to the old additive cost; the hosted-key multiplier and metric now apply uniformly.
  • apps/sim/providers/openai/core.ts: Both streaming paths now pass hostedKey and cached into createStreamingExecution, delegating cost settlement to tapStreamTermination.onDrain. Cost is now correctly settled on the tool path.
  • apps/sim/tools/index.ts: Migrated classifyHostedKeyFailure, calculateToolCost, and emitHostedKeyUsage to shared hosted-cost.ts helpers. Pure refactor; the new shared classifier also handles message-embedded codes the old version missed.
  • apps/sim/lib/core/rate-limiter/hosted-key/types.ts: Adds the NoRateLimit interface and mode:'none' to HostedKeyRateLimitMode. The EnforcedRateLimitConfig union correctly excludes none for internal rate-limited methods.
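The equivalence claimed for gemini/core.ts holds whenever pricing is linear in token counts. Writing $p_{\text{in}}$ and $p_{\text{out}}$ for per-token prices (an assumption; cached-input or tiered pricing would need extra terms) and $t^{\text{in}}_i$, $t^{\text{out}}_i$ for the tokens counted in step $i$ of a streamed execution:

$$
\sum_i \left(p_{\text{in}}\,t^{\text{in}}_i + p_{\text{out}}\,t^{\text{out}}_i\right)
= p_{\text{in}} \sum_i t^{\text{in}}_i + p_{\text{out}} \sum_i t^{\text{out}}_i .
$$

Multiplying both sides by the hosted-key multiplier $m$ preserves the equality, so applying $m$ once to the cost of the accumulated tokens matches applying it to every per-step increment, up to floating-point rounding.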

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[executeProviderRequest] --> B{workspaceId set?}
    B -- No --> G[Legacy getRotatingApiKey]
    B -- Yes --> C{hosted-key-llm flag on + provider has hosting config?}
    C -- No --> G
    C -- Yes --> D{BYOK key in workspace?}
    D -- Yes --> E[Use BYOK key / isBYOK=true]
    D -- No --> F{User-provided key?}
    F -- Yes --> H[Use user key / no pool billing]
    F -- No --> I[acquireKey mode:'none' / round-robin pool key]
    I -- success --> J[hostedKeyEnvVar set / request.hostedKey threaded]
    I -- no keys --> G
    J --> K[provider.executeRequest]
    K -- StreamingExecution --> L{BYOK?}
    L -- Yes --> M[zeroCostForBYOK]
    L -- No, hostedKey set --> N[recordUsed / wrap: recordHostedStreamFailure]
    N --> O{uses createStreamingExecution?}
    O -- Yes --> P[tapStreamTermination onDrain: settleStreamingLlmCost → recordCostCharged]
    O -- No, Gemini bespoke --> Q[settleStreamingLlmCost in drain callback → recordCostCharged]
    K -- ProviderResponse --> R{hostedKey set?}
    R -- Yes --> S[emitHostedKeyUsage: recordUsed + recordCostCharged]
    K -- Error --> T[recordFailed]

Reviews (4): Last reviewed commit: "fix(providers): strip client-supplied ho..."

Comment thread apps/sim/providers/index.ts Outdated
Comment on lines +231 to +234
} else if (hostedKeyEnvVar) {
// Hosted key used: record usage now; cost is settled on stream drain
// inside createStreamingExecution (the single streaming cost seam).
hostedKeyMetrics.recordUsed({

Contributor


P2 Inaccurate comment — Gemini does not use createStreamingExecution

The comment says "cost is settled on stream drain inside createStreamingExecution (the single streaming cost seam)", but Gemini uses a bespoke createStreamingResult helper and calls settleStreamingLlmCost directly inside its own stream callback. createStreamingExecution is not involved in the Gemini path at all. The comment should note that Gemini settles cost in its own drain callback (or reference both seams).


@TheodoreSpeaks

Collaborator Author

@greptile review

…-agent

# Conflicts:
#	apps/sim/lib/api-key/byok.ts
#	apps/sim/lib/core/config/feature-flags.ts
@TheodoreSpeaks

Collaborator Author

@greptile review

@TheodoreSpeaks

Collaborator Author

Addressed review:

  • Bugbot (user key bypassed by pool): getApiKeyWithBYOK now returns a user-provided request.apiKey (not billed) before acquiring a platform pool key, matching tool hosted-key precedence (5f9046e57).
  • Greptile (inaccurate comment): corrected the streaming-cost comment to note gemini settles via its bespoke finalizer, not createStreamingExecution.

Also fixed CI: completed provider test mocks (@/providers/utils isCachedInput; @/lib/core/config/env getEnv/isTruthy/isFalsy) that the new streaming wiring exposed. Full suite green locally (8868 tests).

Comment thread apps/sim/providers/openai/core.ts
tool: response.model,
key: hostedKeyEnvVar as string,
costTotal: response.cost.total,
})



Hosted metrics omit tool costs

Medium Severity

For non-streaming hosted-key responses, emitHostedKeyUsage runs right after the token-cost calculateCost call, but sumToolCosts is applied to response.cost.total only later. The CloudWatch hosted-key cost metric therefore excludes tool charges even when the returned response total includes them.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 5f9046e.

@TheodoreSpeaks

Collaborator Author

Two more Bugbot findings reviewed:

  • High — tool streaming skips hosted settlement: valid and fixed (8d6b768c9). The post-tool streaming paths (createStream: ({ output }) => …) never call finalizeTiming(), so hooking settlement there missed them. Moved settlement to fire on actual stream drain in createStreamingExecution (wraps the returned stream; single point, covers simple + post-tool paths for every provider). Added a regression test that a stream which never calls finalizeTiming still settles cost + emits recordCostCharged on drain.

  • Medium — hosted metrics omit tool costs: by design, not a bug. hostedKeyMetrics.recordCostCharged for an LLM call is the LLM token cost (label tool: <model>). Tool costs are emitted separately by each hosted tool's own path (label tool: <tool.id>); folding sumToolCosts into the LLM metric would double-count hosted tools and mis-attribute cost to the model dimension. Streaming and non-streaming are consistent here. response.cost.total (LLM+tools) remains the billing/trace total — that's a separate channel from this CloudWatch observability metric.
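The double-counting argument is easiest to see with numbers. In this hypothetical example (labels and amounts invented; integer micro-USD so the arithmetic is exact), one LLM call and one hosted tool each emit a metric, and the two designs are compared against the response's billing total.

```typescript
// Hypothetical illustration; labels and amounts are made up.
interface CostMetric { tool: string; costMicroUsd: number }

const sumMetrics = (ms: CostMetric[]): number => ms.reduce((s, m) => s + m.costMicroUsd, 0)

const llmTokenCost = 4_000 // token cost of one LLM call
const hostedToolCost = 1_000 // charge emitted by the hosted tool's own path

// Design described in the reply: the model-labelled metric carries token cost only.
const separateMetrics: CostMetric[] = [
  { tool: 'example-model', costMicroUsd: llmTokenCost },
  { tool: 'example_tool', costMicroUsd: hostedToolCost },
]

// Alternative: also fold tool costs into the model-labelled metric.
const foldedMetrics: CostMetric[] = [
  { tool: 'example-model', costMicroUsd: llmTokenCost + hostedToolCost },
  { tool: 'example_tool', costMicroUsd: hostedToolCost },
]

// Billing/trace total on the response (LLM + tools), a separate channel.
const responseCostTotal = llmTokenCost + hostedToolCost
```

Summing the separate metrics reproduces the billing total (5,000), while the folded variant reports 6,000: the tool charge appears twice and 1,000 of it is attributed to the model dimension.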

Comment thread apps/sim/providers/index.ts Outdated
Comment thread apps/sim/providers/streaming-execution.ts
@TheodoreSpeaks

Collaborator Author

Addressed the two streaming-metric symmetry findings (ed64de979):

  • The stream-drain wrapper now records a hosted-key failure on mid-stream error (via classifyHostedKeyFailure), so a failed hosted stream emits recordFailed to match the recordUsed fired at dispatch. A client cancel runs neither (it's an abort, not a key failure), and cost is intentionally not settled on failure (no charge for a failed/partial response). Added a regression test for the error path.
  • Net streaming semantics: recordUsed = attempt, recordCostCharged = successful completion, recordFailed = mid-stream error — so completed = Used − Failed (no silent over-count).
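The accounting can be checked with a small tally under the semantics stated here: recordUsed per attempt, recordCostCharged per completed stream, recordFailed per mid-stream error, and nothing extra for a client cancel. This is a hypothetical model of the counters, not the metrics code.

```typescript
// Hypothetical model of the three hosted-key counters per stream outcome.
type StreamOutcome = 'completed' | 'errored' | 'cancelled'

interface MetricTally { used: number; costCharged: number; failed: number }

function tallyOutcomes(outcomes: StreamOutcome[]): MetricTally {
  const t: MetricTally = { used: 0, costCharged: 0, failed: 0 }
  for (const o of outcomes) {
    t.used++ // recordUsed fires at dispatch for every hosted stream
    if (o === 'completed') t.costCharged++ // cost settled on drain
    if (o === 'errored') t.failed++ // recordFailed on mid-stream error
    // 'cancelled': neither callback runs, per the stated semantics
  }
  return t
}
```

Under these assumptions, Used − Failed equals the number of charged streams when no stream is cancelled; a cancelled stream still counts toward Used − Failed without a matching charge.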

Full suite green (8870 tests), tsc 0, lint 16/16.

Comment thread apps/sim/providers/gemini/core.ts
@TheodoreSpeaks

Collaborator Author

Addressed the gemini failure-metric finding (c96a2f80c) at the root rather than per-provider:

Moved hosted-stream failure recording to the provider-agnostic chokepoint — executeProviderRequest now wraps the returned stream with recordHostedStreamFailure, so a mid-stream error emits recordFailed for every provider, including gemini and any other bespoke (non-createStreamingExecution) stream. Cost-on-success stays settled per-provider; createStreamingExecution now only handles the cost-drain leg. Added/updated tests for the wrapper's success and error paths.

This generalizes the previous per-provider failure handling, so it won't recur for other stream shapes.

Comment thread apps/sim/providers/index.ts
@TheodoreSpeaks

Collaborator Author

Fixed the High (4d27d1bad): hostedKey is server-only, so sanitizeRequest now strips any client-supplied value up front. executeProviderRequest is the sole authority that sets it — only when it actually acquires a platform pool key — so a client-supplied hostedKey on a BYOK/user-key request can no longer reach streaming settlement (no bogus cost multiplier or hosted-key metrics). Added a regression test asserting the provider never receives a client-supplied hostedKey.
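The server-only rule for hostedKey amounts to "strip first, then set only on acquisition". A minimal sketch with a hypothetical request shape and helper names (`stripHostedKey`, `prepareRequest`); it assumes hostedKey carries the acquired pool key's env var name, which the thread does not state.

```typescript
// Hypothetical request shape; only the hostedKey handling matters here.
interface ProviderRequest {
  model: string
  apiKey?: string
  hostedKey?: string // server-only: identifies the acquired pool key (assumed)
}

// Drop any client-supplied hostedKey without mutating the caller's object.
function stripHostedKey(req: ProviderRequest): ProviderRequest {
  const clean: ProviderRequest = { ...req }
  delete clean.hostedKey
  return clean
}

// The server sets hostedKey only after it actually acquired a pool key.
function prepareRequest(req: ProviderRequest, acquiredEnvVar?: string): ProviderRequest {
  const clean = stripHostedKey(req)
  return acquiredEnvVar ? { ...clean, hostedKey: acquiredEnvVar } : clean
}
```

Because the strip happens before any branch, a forged value on a BYOK or user-key request never reaches code that applies the multiplier or emits hosted-key metrics.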

@TheodoreSpeaks

Collaborator Author

bugbot run

@cursor cursor Bot left a comment



✅ Bugbot reviewed your changes and found no new issues!

1 issue from previous review remains unresolved.


Reviewed by Cursor Bugbot for commit 4d27d1b.

@waleedlatif1 waleedlatif1 deleted the branch staging July 1, 2026 05:43
@waleedlatif1 waleedlatif1 reopened this Jul 1, 2026