multi-gpu

Here are 149 public repositories matching this topic...

ConfettiFX / The-Forge

The Forge Cross-Platform Framework: PC Windows, Steam Deck (native), Ray Tracing, macOS / iOS, Android, Xbox, PS4, PS5, Switch, Quest 2

Updated Jul 3, 2025
C++

NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

text-to-speech deep-learning tensorflow multi-node speech-synthesis speech-recognition seq2seq speech-to-text neural-machine-translation sequence-to-sequence language-model multi-gpu float16 mixed-precision

Updated May 11, 2021
Python

v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

Updated Feb 1, 2026
Python

rbbrdckybk / dream-factory

Multi-threaded GUI manager for mass creation of AI-generated art with support for multiple GPUs.

machine-learning ai generative-art image-generation multi-gpu multithreaded nvidia-gpu ai-art stable-diffusion

Updated Aug 9, 2024
Python

omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs

stencil gpu julia parallel cuda stencil-codes multi-gpu staggered-grids multi-xpu xpu

Updated Apr 24, 2026
Julia

FZJ-JSC / tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

hpc gpu mpi cuda multi-gpu supercomputing nccl exascale-computing sc23 sc21 nvshmem isc22 sc22 isc23 isc24 sc24 isc25 sc25 isc26

Updated Jun 26, 2026
Cuda

seasonSH / DocFace

Face recognition system for ID photos

tensorflow biometrics face-recognition multi-gpu face-verification

Updated Oct 17, 2018
Python

NickLucche / stable-diffusion-nvidia-docker

GPU-ready Dockerfile to run the Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.

docker image-generation nvidia-docker multi-gpu stable-diffusion

Updated Jun 21, 2024
Python

lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.

c c-plus-plus gpu mpi cuda qcd multi-gpu

Updated Jun 27, 2026
C++

tamerthamoqa / facenet-pytorch-glint360k

A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.

pytorch face-recognition facenet multi-gpu triplet-loss lfw-dataset pretrained-model vggface2-dataset

Updated Sep 16, 2021
Python

helmholtz-analytics / heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

python data-science machine-learning hpc gpu numpy mpi parallel-computing distributed-computing pytorch scientific-computing data-analytics high-performance-computing tensors multi-gpu mpi4py array-api

Updated Jul 1, 2026
Python

raketenkater / ggrun

Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.

golang metal vulkan cuda self-hosted moe inference-server multi-gpu openai-api llm llamacpp llama-cpp local-llm gguf speculative-decoding localllama ollama-alternative

Updated Jun 29, 2026
Go

TorchDR / TorchDR

TorchDR - PyTorch Dimensionality Reduction

python data-science cuda pytorch embeddings dimensionality-reduction tsne manifold-learning umap large-scale multi-gpu optimal-transport similarity-search spectral-embedding embedding-evaluation data-vizualisation neighbor-embedding affinity-matrix

Updated Apr 1, 2026
Python

eth-cscs / ImplicitGlobalGrid.jl

Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid

gpu julia mpi cuda distributed stencil-codes multi-gpu staggered-grids julia-mpi-wrapper

Updated Jul 1, 2026
Julia

bharatsingh430 / py-R-FCN-multiGPU

Code for training py-faster-rcnn and py-R-FCN on multiple GPUs in Caffe

faster-rcnn object-detection multi-gpu

Updated Jun 6, 2017
Jupyter Notebook

GPUSPH / gpusph

The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics

multi-platform hpc gpu multi-node cuda cfd sph multi-gpu fsi

Updated Jan 28, 2024
C++

papuSpartan / stable-diffusion-webui-distributed

Chains stable-diffusion-webui instances together to facilitate faster image generation.

distributed-computing multi-gpu stable-diffusion automatic1111 stable-diffusion-webui stable-diffusion-webui-plugin

Updated Feb 24, 2025
Python

guotong1988 / BERT-pre-training

Multi-GPU pre-training on one machine for BERT without Horovod (data parallelism)

nlp tensorflow bert multi-gpu

Updated Dec 27, 2025
Python

celerity / celerity-runtime

High-level C++ for Accelerator Clusters

hpc multi-gpu sycl cluster-computing

Updated May 28, 2026
C++

defilantech / LLMKube

Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM, TGI, mlx-server. Multi-GPU sharding, model caching, OpenAI-compatible endpoints. Apache-2.0, run across homelab and on-prem fleets, actively developed.

Updated Jul 1, 2026
Go
