improve timeout behavior#738
Conversation
standard API times out in 10s, and SIP dial APIs defaults to 30 we were not using a timeout correctly, which could affect retry behavior
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
| if create.wait_until_answered: | ||
| # Dialing a phone and waiting for an answer takes longer than a | ||
| # normal call, and the request must outlast ringing. | ||
| client_timeout = aiohttp.ClientTimeout(total=_dial_timeout(timeout, create)) | ||
| elif timeout: | ||
| # obey user specified timeout | ||
| client_timeout = aiohttp.ClientTimeout(total=timeout) |
There was a problem hiding this comment.
🚩 Timeout precedence change for create_sip_participant when both timeout and wait_until_answered are set
The old code at livekit-api/livekit/api/sip_service.py:788-794 (base) checked if timeout: first, always honoring the user's explicit timeout even when wait_until_answered=True. The new code at livekit-api/livekit/api/sip_service.py:788-796 checks if create.wait_until_answered: first, which means an explicit user timeout is now silently raised to the ringing floor (~32s) if it's shorter. This is a breaking behavioral change for callers who previously set e.g. timeout=5 with wait_until_answered=True to quickly abort a dialing attempt. The dial_timeout docstring documents this ('A shorter one is raised to the floor'), but existing callers may not expect it.
Was this helpful? React with 👍 or 👎 to provide feedback.
| _pin_ringing_timeout(transfer) | ||
| client_timeout = aiohttp.ClientTimeout(total=_dial_timeout(timeout, transfer)) |
There was a problem hiding this comment.
📝 Info: transfer_sip_participant always applies dial timeout regardless of whether ringing occurs
At livekit-api/livekit/api/sip_service.py:833-834, pin_ringing_timeout and dial_timeout are applied unconditionally for every transfer, unlike create_sip_participant which only applies them when wait_until_answered=True. This is reasonable since TransferSIPParticipantRequest has no wait_until_answered field (the proto definition confirms this) and a transfer inherently dials a new destination. However, this means every transfer gets a minimum ~32s per-call timeout, which combined with failover (3 attempts) could result in ~99s worst case. This is by design but worth noting for users who might be surprised by a long-running transfer_sip_participant call.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
this is expected. the session timeout was previously too long
| DialRequest = Union[ | ||
| CreateSIPParticipantRequest, | ||
| TransferSIPParticipantRequest, | ||
| AcceptWhatsAppCallRequest, | ||
| ] |
There was a problem hiding this comment.
🚩 DialWhatsAppCallRequest not covered by dial timeout despite having ringing_timeout
The DialRequest union in livekit-api/livekit/api/_dial_timeout.py:10-14 includes CreateSIPParticipantRequest, TransferSIPParticipantRequest, and AcceptWhatsAppCallRequest, but excludes DialWhatsAppCallRequest. The proto definition confirms DialWhatsAppCallRequest has a ringing_timeout field but no wait_until_answered field, which suggests it returns immediately (the server handles ringing asynchronously). With the new 10s default session timeout, this should be fine if the server returns quickly, but worth confirming that DialWhatsAppCall doesn't block server-side for the ringing duration.
Was this helpful? React with 👍 or 👎 to provide feedback.
standard API times out in 10s, and SIP dial APIs defaults to 30
we were not using a timeout correctly, which could affect retry behavior