Pull requests: vllm-project/vllm

Pull requests list

[Bugfix][Frontend] Include usage in non-streaming GenerateResponse bug Something isn't working frontend
#47247 opened Jul 1, 2026 by aoshen02 Collaborator
4 tasks
add sycl path for Mhc
#47245 opened Jul 1, 2026 by xiaolong-intel Draft
4 tasks
[Core] Make sleep-mode backend capability flags communicator-agnostic ready ONLY add when PR is ready to merge/full CI is needed v1
#47243 opened Jul 1, 2026 by matteso1 Contributor
[CI/Build] Fix LoRA testing ready ONLY add when PR is ready to merge/full CI is needed
#47242 opened Jul 1, 2026 by jeejeelee Member
4 tasks
[do not merge][test only][xpu][ci]Debug Intel B50 agent ci/build intel-gpu Related to Intel GPU
#47240 opened Jul 1, 2026 by zxd1997066 Contributor Draft
4 tasks
[BugFix][Spec Decode] Compact shared topk indices buffer after first MTP draft step bug Something isn't working deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding v1
#47238 opened Jul 1, 2026 by TheEpicDolphin Collaborator
Revert "Remove more unnecessary load_weights methods" (#47058) deepseek Related to DeepSeek models llama Related to Llama models mistral Related to Mistral models multi-modality Related to multi-modality (#4194) qwen Related to Qwen models speculative-decoding
#47233 opened Jul 1, 2026 by vllm-agent Contributor Draft
Revert "[Platform] Replace torch.cuda.mem_get_info with torch.accelerator.get_memory_info" (#44825) cpu Related to CPU backends intel-gpu Related to Intel GPU kv-connector multi-modality Related to multi-modality (#4194) nvidia v1
#47232 opened Jul 1, 2026 by vllm-agent Contributor Draft
[XPU][CI] Add tests/v1/e2e/general/test_correctness_sliding_window.py in Intel GPU CI ci/build intel-gpu Related to Intel GPU
#47231 opened Jul 1, 2026 by zxd1997066 Contributor
4 tasks done
fix(serve): return HTTP 422 instead of 500 for image/media URL fetch errors frontend gpt-oss Related to GPT-OSS models multi-modality Related to multi-modality (#4194)
#47230 opened Jul 1, 2026 by aoright
[DSV4] Better MXFP8 quantization kernel ready ONLY add when PR is ready to merge/full CI is needed
#47229 opened Jul 1, 2026 by zyongye Member
fix(deepseek_v4): resolve auto kv-cache-dtype to fp8_ds_mla on SM120 deepseek Related to DeepSeek models nvidia
#47228 opened Jul 1, 2026 by hclsys Contributor
[Doc] add AI Runway to integrations documentation Improvements or additions to documentation
#47227 opened Jul 1, 2026 by robert-cronin
4 tasks done
[LoRA] Integrate flashinfer MoE LoRA nvidia
#47226 opened Jul 1, 2026 by jeejeelee Member Draft
4 tasks
[Perf][Model] Build Phi4MM Conformer streaming mask on target device
#47225 opened Jun 30, 2026 by Juice-XIJ
4 tasks done
[Bugfix] Fix online-quant MoE loading zero weights after #47058 bug Something isn't working
#47221 opened Jun 30, 2026 by mgoin Member
[AMD][EPLB] Enable EPLB for Quark OCP MXFP4 MoE ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm
#47220 opened Jun 30, 2026 by okorzh-amd Contributor
4 tasks
[Bugfix] Include SM12x in _is_fa4_supported() compute capability check bug Something isn't working
#47218 opened Jun 30, 2026 by tgmerritt
[Bugfix][Gemma4] Keep image bidirectional attention within the sliding window bug Something isn't working v1
#47217 opened Jun 30, 2026 by lucianommartins Contributor