Describe the bug
During an unattended "continue" session, Copilot CLI (Claude Opus 4.8) fabricated a multi-turn conversation that never happened. The model invented user statements, invented its own answers to those statements, and then executed a tool call (read_powershell on a shell that was never started) based on the fabricated dialogue. The user had not typed anything for ~90 internal events. The confabulation then snowballed via the CLI's automatic thinking-exhausted-continuation mechanism, producing a long stream of chat-like content the user never authored.
The fabricating turn was NOT truncated (5240 output tokens, well below the 32000 output-token ceiling) and was generated ~0.1s BEFORE the user's next real (unrelated) message arrived, i.e. it was produced with no real user input driving it.
We ruled out external causes: the auto-loaded memory/recall file (clean, structured notes), hooks (passive postToolUse, no injection), API errors (zero in this session), a nearby garbled tool output (the numbers were actually read correctly), and token truncation. The confabulation appears self-originated. The true internal reasoning is inside the encrypted/signed reasoningOpaque block and is not decryptable client-side.
Trace identifiers for the fabricated turn (for server-side lookup):
- messageId: a47f4ce8-d4db-4248-af30-f37e4254d5b0
- requestId: 1FC7:1F6481:34F883:3EB4FA:6A41D6C2
- serviceRequestId: 1c3b0020-188b-4c00-959b-4327a526e265
- apiCallId: msg_01AcV4mbgU3Bj1oW57AwDYHs
- Session ID: 787462bd-b057-4073-9ef9-6cec95fe221f
- UTC timestamp: 2026-06-29T02:23:22.626Z
Affected version
1.0.65
Steps to reproduce the behavior
Note: not deterministic — confabulation is probabilistic. Observed once, in the only session that entered a long unattended autonomous loop.
- Start a fresh session in a directory whose AGENTS.md auto-runs a startup script that surfaces an "open problem" to investigate (here: a memory recall script that reported one real failed scheduled-job record).
- Send a single vague prompt: "continue".
- Do NOT intervene. With effortLevel=max and broad pre-approved tool permissions, the model runs a long unattended autonomous loop (~14 read-only tool calls, ~90 internal events, zero user input).
- Observe: in one turn, the model's reasoning summary invents user dialogue (statements the user never made, an invented user request, and a self-contradiction between "about to offer options" and "already asked him"), then issues a read_powershell call on a shellId that was never started (returns null).
- From then on, the CLI auto-injects "Please continue from where you left off." (source: thinking-exhausted-continuation) each time a mega-response hits the 32000 output-token ceiling, extending the already-fabricated narrative into a long fake conversation.
Expected behavior
The model should never invent the human side of a conversation, and must not execute tool calls based on fabricated user requests. During a long autonomous run with no real user messages, the CLI should insert a hard checkpoint / require re-anchoring rather than auto-continuing via thinking-exhausted-continuation.
Requests:
- PRIMARY — Using the trace IDs above, please investigate the server-side UNENCRYPTED reasoning for this turn and tell me where exactly the model started to deviate from reality, and what internally drove it to invent user dialogue. Even a root-cause summary (without full text) would help me judge whether this was an explicable failure or an unexpected defect.
- If possible, share an unencrypted excerpt of the reasoning around the deviation point (messageId a47f4ce8...) so I can assess the plausibility of the trigger myself. I am the affected user requesting it specifically to understand this anomaly.
- Guardrail: after N consecutive turns with no real user message, insert a hard checkpoint instead of auto-continuing.
- Clarify whether thinking-exhausted-continuation is intended to keep firing during long unattended autonomous loops with no human in the loop.
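To make the guardrail request concrete, here is a minimal sketch of the counting logic I have in mind. It is not based on the CLI's actual code: the class name, the method names, and the `USER` source label are all hypothetical, and only the `thinking-exhausted-continuation` string comes from the session log.

```python
from dataclasses import dataclass

# Hypothetical labels for where a turn's input came from; not the CLI's real schema.
USER = "user"
AUTO_CONTINUE = "thinking-exhausted-continuation"


@dataclass
class ContinuationGuard:
    """Counts consecutive turns that had no real user message."""

    max_unattended_turns: int  # the "N" from the guardrail request
    unattended_turns: int = 0

    def record_turn(self, source: str) -> None:
        # Only a genuine user message re-anchors the session.
        if source == USER:
            self.unattended_turns = 0
        else:
            self.unattended_turns += 1

    def may_auto_continue(self) -> bool:
        # False means: stop and wait for the human instead of injecting
        # "Please continue from where you left off."
        return self.unattended_turns < self.max_unattended_turns


guard = ContinuationGuard(max_unattended_turns=3)
guard.record_turn(USER)
for _ in range(3):
    guard.record_turn(AUTO_CONTINUE)
print(guard.may_auto_continue())  # False
```

The key design point is that auto-injected continuations must not reset the counter; only input actually typed by the user should. A suitable value for N is an open question.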
Additional context
- Model: claude-opus-4.8
- Settings: effortLevel=max, contextTier=long_context
- Operating system: Windows
- Shell: PowerShell 5.1
- Terminal: Windows Terminal
- Session time (UTC): 2026-06-29 02:18:13 to 04:09:16
- The encrypted reasoningOpaque block for the fabricating turn is 11998 bytes (Anthropic-signed), not decryptable client-side.
- Full events.jsonl (203 events) and a readable Markdown export of the session are available on request.