feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting)#5127
feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting)#5127TheodoreSpeaks wants to merge 8 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Key resolution in Billing and metrics: non-streaming responses bill when a platform hosted key was used (not only legacy hosted-model lists), apply the cost multiplier, and call Shared helpers in Reviewed by Cursor Bugbot for commit 4d27d1b. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR routes LLM provider calls through the shared hosted-key framework (BYOK-first → platform pool,
Confidence Score: 4/5Safe to merge behind the flag; the only live risks are monitoring metric inaccuracies on edge cases, not billing or data correctness. The cost-settlement and billing logic is sound for the normal (non-cancelled) streaming flow. The apps/sim/providers/streaming-execution.ts — the Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[executeProviderRequest] --> B{workspaceId set?}
B -- No --> G[Legacy getRotatingApiKey]
B -- Yes --> C{hosted-key-llm flag on + provider has hosting config?}
C -- No --> G
C -- Yes --> D{BYOK key in workspace?}
D -- Yes --> E[Use BYOK key / isBYOK=true]
D -- No --> F{User-provided key?}
F -- Yes --> H[Use user key / no pool billing]
F -- No --> I[acquireKey mode:'none' / round-robin pool key]
I -- success --> J[hostedKeyEnvVar set / request.hostedKey threaded]
I -- no keys --> G
J --> K[provider.executeRequest]
K -- StreamingExecution --> L{BYOK?}
L -- Yes --> M[zeroCostForBYOK]
L -- No, hostedKey set --> N[recordUsed / wrap: recordHostedStreamFailure]
N --> O{uses createStreamingExecution?}
O -- Yes --> P[tapStreamTermination onDrain: settleStreamingLlmCost → recordCostCharged]
O -- No, Gemini bespoke --> Q[settleStreamingLlmCost in drain callback → recordCostCharged]
K -- ProviderResponse --> R{hostedKey set?}
R -- Yes --> S[emitHostedKeyUsage: recordUsed + recordCostCharged]
K -- Error --> T[recordFailed]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[executeProviderRequest] --> B{workspaceId set?}
B -- No --> G[Legacy getRotatingApiKey]
B -- Yes --> C{hosted-key-llm flag on + provider has hosting config?}
C -- No --> G
C -- Yes --> D{BYOK key in workspace?}
D -- Yes --> E[Use BYOK key / isBYOK=true]
D -- No --> F{User-provided key?}
F -- Yes --> H[Use user key / no pool billing]
F -- No --> I[acquireKey mode:'none' / round-robin pool key]
I -- success --> J[hostedKeyEnvVar set / request.hostedKey threaded]
I -- no keys --> G
J --> K[provider.executeRequest]
K -- StreamingExecution --> L{BYOK?}
L -- Yes --> M[zeroCostForBYOK]
L -- No, hostedKey set --> N[recordUsed / wrap: recordHostedStreamFailure]
N --> O{uses createStreamingExecution?}
O -- Yes --> P[tapStreamTermination onDrain: settleStreamingLlmCost → recordCostCharged]
O -- No, Gemini bespoke --> Q[settleStreamingLlmCost in drain callback → recordCostCharged]
K -- ProviderResponse --> R{hostedKey set?}
R -- Yes --> S[emitHostedKeyUsage: recordUsed + recordCostCharged]
K -- Error --> T[recordFailed]
Reviews (4): Last reviewed commit: "fix(providers): strip client-supplied ho..." | Re-trigger Greptile |
| } else if (hostedKeyEnvVar) { | ||
| // Hosted key used: record usage now; cost is settled on stream drain | ||
| // inside createStreamingExecution (the single streaming cost seam). | ||
| hostedKeyMetrics.recordUsed({ |
There was a problem hiding this comment.
Inaccurate comment — Gemini does not use
createStreamingExecution
The comment says "cost is settled on stream drain inside createStreamingExecution (the single streaming cost seam)", but Gemini uses a bespoke createStreamingResult helper and calls settleStreamingLlmCost directly inside its own stream callback. createStreamingExecution is not involved in the Gemini path at all. The comment should note that Gemini settles cost in its own drain callback (or reference both seams).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
@greptile review |
…-agent # Conflicts: # apps/sim/lib/api-key/byok.ts # apps/sim/lib/core/config/feature-flags.ts
|
@greptile review |
|
Addressed review:
Also fixed CI: completed provider test mocks ( |
| tool: response.model, | ||
| key: hostedKeyEnvVar as string, | ||
| costTotal: response.cost.total, | ||
| }) |
There was a problem hiding this comment.
Hosted metrics omit tool costs
Medium Severity
For non-streaming hosted-key responses, emitHostedKeyUsage runs right after token calculateCost, but sumToolCosts is applied to response.cost.total only later. CloudWatch hosted-key cost therefore excludes tool charges even when the returned response total includes them.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 5f9046e. Configure here.
|
Two more Bugbot findings reviewed:
|
|
Addressed the two streaming-metric symmetry findings (
Full suite green (8870 tests), tsc 0, lint 16/16. |
…t the chokepoint (covers gemini)
|
Addressed the gemini failure-metric finding ( Moved hosted-stream failure recording to the provider-agnostic chokepoint — This generalizes the previous per-provider failure handling, so it won't recur for other stream shapes. |
…ed-key cost/metrics
|
Fixed the High ( |
|
bugbot run |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 4d27d1b. Configure here.


Summary
hosted-key-llmflag (default off)mode: 'none'so LLMs acquire platform keys with no queue/rate limitingcreateStreamingExecution(sharedsettleStreamingLlmCost), applying the cost multiplier the per-provider streaming paths previously omitted; gemini uses the same helperhostingconfig (envKeyPrefix+byokProviderId); pricing stays per-model viacalculateCostclassifyHostedKeyFailureacross tools/providers (fixes auth-error misclassification)Type of Change
Testing
Tested manually;
bun run lintclean,tsc --noEmit0 errors,check:api-validation:strictpassed, unit tests passing (hosted-cost, byok, rate-limiter, streaming-execution, tools)Checklist