29. Instruct.KR Summer 2025 Meetup
Multi-node Distributed Inference
❑ Multi-Node Multi-GPU [42, 43]
(tensor parallel plus pipeline parallel inference)
§ If your model is too large to fit on a single node, you can combine
tensor parallelism with pipeline parallelism.
§ The tensor parallel size is the number of GPUs to use within each
node, and the pipeline parallel size is the number of nodes to use.
§ E.g., if you have 16 GPUs in 2 nodes (8 GPUs per node), you can set
the tensor parallel size to 8 and the pipeline parallel size to 2.
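Following the vLLM distributed serving docs [42], a multi-node deployment of this shape can be sketched as below; the model name and head-node IP are placeholders, and the Ray port is the default, not something fixed by the slide:

```shell
# On the head node: start a Ray cluster.
ray start --head --port=6379

# On each worker node: join the cluster (<HEAD_NODE_IP> is a placeholder).
ray start --address=<HEAD_NODE_IP>:6379

# On the head node: serve across 2 nodes x 8 GPUs.
# TP 8 keeps tensor-parallel communication within a node; PP 2 spans the 2 nodes.
vllm serve <MODEL> \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2
```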
❑ TP 16 vs. TP 8 + PP 2
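The trade-off between the two layouts can be illustrated with a small sketch; the rank-to-node mapping below (ranks 0-7 on node 0, 8-15 on node 1) is an assumption for illustration, not something vLLM exposes in this form:

```python
# Sketch: which parallel layouts force collectives across the node boundary?
# Assumes 2 nodes x 8 GPUs with consecutive global ranks per node.

def tp_groups(world_size, tp_size):
    """Partition consecutive ranks into tensor-parallel groups."""
    return [list(range(i, i + tp_size)) for i in range(0, world_size, tp_size)]

def crosses_node(group, gpus_per_node=8):
    """A group spanning nodes pays inter-node bandwidth on every all-reduce."""
    return len({rank // gpus_per_node for rank in group}) > 1

# TP 16: a single all-reduce group spanning both nodes.
print([crosses_node(g) for g in tp_groups(16, 16)])  # [True]

# TP 8 + PP 2: all-reduce stays intra-node; only the pipeline's
# point-to-point activation transfer crosses the node boundary.
print([crosses_node(g) for g in tp_groups(16, 8)])   # [False, False]
```

This is why TP 8 + PP 2 is often preferred over TP 16 when inter-node bandwidth is much lower than intra-node (e.g. NVLink) bandwidth: tensor parallelism issues frequent all-reduces, while pipeline parallelism only passes activations between stages.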
4. Production Deployment
[42] https://docs.vllm.ai/en/v0.9.2/serving/distributed_serving.html
[43] https://blog.vllm.ai/2025/02/17/distributed-inference.html