PyTorch

GitHub - deepseek-ai/profile-data: Analyze computation-communication overlap in V3/R1.

Detailed profiling data from a training and inference framework is shared, highlighting communication-computation overlap strategies with PyTorch Profiler visualizations. The framework implements DualPipe with MoE layers across different configurations, including EP64/TP1 for training and EP32/TP1 for prefilling, demonstrating balanced routing and micro-batch optimization techniques.

GitHub - deepseek-ai/DualPipe: A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

DualPipe is a bidirectional pipeline parallelism algorithm that optimizes computation-communication overlap in neural networks by achieving full overlap of forward and backward phases. The solution, presented in the DeepSeek-V3 Technical Report, reduces pipeline bubbles and requires implementation of custom overlapped forward-backward methods for specific modules.