2025-02-17

GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library optimized for Mixture-of-Experts (MoE) and expert parallelism, providing high-throughput and low-latency GPU kernels. The library supports both intranode and internode communication, with kernels specialized for asymmetric-domain bandwidth forwarding (for example, from the NVLink domain to the RDMA domain) and for low-latency inference decoding. It also supports FP8 and RDMA networks.

Related articles

NetBSD on a JavaStation

A detailed account of reviving a vintage JavaStation computer, transforming it from a non-functional state to running NetBSD through network booting. The narrative covers troubleshooting steps, configuration details, and successful implementation of RARP, TFTP, and NFS services.

Tailscale is pretty useful

Tailscale creates a virtual private network enabling secure remote access to devices and file sharing without traditional port forwarding. The service offers features like device-to-device connectivity, Taildrop for easy file transfers, and VPN capabilities through Mullvad integration.

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data

DeepSeek has released smallpond, a distributed compute framework built on DuckDB, capable of processing 110.5TiB of data in 30 minutes. The framework leverages Ray Core for distribution and DeepSeek's 3FS storage system, offering a simpler alternative to traditional distributed systems while maintaining high performance. This development showcases DuckDB's growing adoption in AI workloads and demonstrates various approaches to scaling analytical databases.

Let's code a TCP/IP stack, 1: Ethernet & ARP

A detailed guide explains how to build a TCP/IP stack from scratch, focusing on implementing Ethernet and ARP protocols in userspace Linux. The implementation uses TUN/TAP devices for intercepting network traffic and demonstrates successful ARP request handling, serving as an educational resource for deep network programming.
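As a rough illustration of the ARP request handling mentioned above, here is a minimal Python sketch that packs and parses an Ethernet frame carrying an ARP packet (IPv4 over Ethernet) and builds a reply to a request. It is not the guide's code: the guide's own implementation and language are not shown here, and the helper names `build_arp`, `parse_arp`, and `answer_request` are hypothetical. A real stack would read and write these frames through a TUN/TAP device, which this sketch leaves out.

```python
import struct

ETH_P_ARP = 0x0806                          # EtherType for ARP
ARP_REQUEST, ARP_REPLY = 1, 2
ETH_HDR = struct.Struct("!6s6sH")           # dst MAC, src MAC, EtherType
ARP_PKT = struct.Struct("!HHBBH6s4s6s4s")   # htype, ptype, hlen, plen, op, sha, spa, tha, tpa
BROADCAST = b"\xff" * 6

def build_arp(op, eth_dst, sha, spa, tha, tpa):
    """Build an Ethernet frame carrying an ARP packet (hypothetical helper)."""
    eth = ETH_HDR.pack(eth_dst, sha, ETH_P_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4
    arp = ARP_PKT.pack(1, 0x0800, 6, 4, op, sha, spa, tha, tpa)
    return eth + arp

def parse_arp(frame):
    """Return the ARP fields as a dict, or None if the frame is not IPv4-over-Ethernet ARP."""
    if len(frame) < ETH_HDR.size + ARP_PKT.size:
        return None
    _dst, _src, ethertype = ETH_HDR.unpack_from(frame, 0)
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, op, sha, spa, tha, tpa = ARP_PKT.unpack_from(frame, ETH_HDR.size)
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    return {"op": op, "sha": sha, "spa": spa, "tha": tha, "tpa": tpa}

def answer_request(frame, my_mac, my_ip):
    """Reply to an ARP request that asks for my_ip; ignore everything else."""
    pkt = parse_arp(frame)
    if pkt is None or pkt["op"] != ARP_REQUEST or pkt["tpa"] != my_ip:
        return None
    # The reply goes straight back to the requester, with our MAC as the sender address.
    return build_arp(ARP_REPLY, pkt["sha"], my_mac, my_ip, pkt["sha"], pkt["spa"])

# Example values (made up for illustration).
HOST_MAC, HOST_IP = b"\x02\x00\x00\x00\x00\x01", bytes([10, 0, 0, 1])
STACK_MAC, STACK_IP = b"\x02\x00\x00\x00\x00\x02", bytes([10, 0, 0, 4])
request = build_arp(ARP_REQUEST, BROADCAST, HOST_MAC, HOST_IP, b"\x00" * 6, STACK_IP)
reply = answer_request(request, STACK_MAC, STACK_IP)
```

The request is sent to the Ethernet broadcast address with a zeroed target hardware address; the reply fills in the stack's MAC and is addressed directly to the requester.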

Netboot Windows 11 with iSCSI and iPXE

An in-depth guide demonstrates how to netboot Windows 11 using iSCSI and iPXE, enabling Windows to run from a NAS instead of local storage. The solution allows gaming on Windows while maintaining Linux as the primary OS, providing a practical workaround for AAA games that restrict virtual machine usage.

GitHub - deepseek-ai/profile-data: Analyze computation-communication overlap in V3/R1.

Detailed profiling data from a training and inference framework is shared, highlighting communication-computation overlap strategies with PyTorch Profiler visualizations. The framework implements DualPipe with MoE layers across different configurations, including EP64/TP1 for training and EP32/TP1 for prefilling, demonstrating balanced routing and micro-batch optimization techniques.

The FFT Strikes Back: An Efficient Alternative to Self-Attention

FFTNet introduces an approach to sequence processing based on the Fast Fourier Transform, achieving O(n log n) complexity in the sequence length n, compared with the O(n²) complexity of standard self-attention. The framework employs spectral filtering and a modReLU activation to capture long-range dependencies efficiently, and reports better results than self-attention baselines on the Long Range Arena and ImageNet benchmarks.
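To make the idea of FFT-based token mixing concrete, here is a small pure-Python sketch: transform a sequence to the frequency domain, multiply by a per-frequency filter, apply modReLU, and transform back. It is a toy under stated assumptions, not FFTNet's implementation: it mixes a single scalar channel rather than embeddings, the filter and bias are fixed inputs rather than learned parameters, the sequence length must be a power of two, and the names `fft`, `ifft`, `mod_relu`, and `spectral_mix` are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT in O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse FFT: conjugate-sign transform scaled by 1/n."""
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def mod_relu(z, bias):
    """modReLU: apply ReLU to the magnitude (plus a bias) and keep the phase."""
    mag = abs(z)
    if mag == 0.0:
        return 0j
    return max(mag + bias, 0.0) * z / mag

def spectral_mix(seq, filt, bias):
    """Mix a real-valued sequence globally via FFT -> filter -> modReLU -> inverse FFT."""
    spectrum = fft(seq)
    filtered = [mod_relu(s * f, bias) for s, f in zip(spectrum, filt)]
    return [v.real for v in ifft(filtered)]

# An all-ones filter with zero bias leaves the sequence unchanged (up to rounding).
x = [1.0, 2.0, 3.0, 4.0]
y = spectral_mix(x, [1.0] * 4, 0.0)
```

Each output position depends on every input position through the frequency-domain product, which is how a transform of this kind provides global mixing without forming an n-by-n attention matrix.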

GitHub - Hawzen/hdp: What would happen if we didn't use TCP or UDP?

An experiment explores the feasibility of creating and transmitting custom network protocols across different operating systems and the internet, revealing significant challenges with OS compatibility and network infrastructure limitations. Results demonstrate that while custom protocols can work locally, they face major obstacles when traversing NAT gateways, firewalls, and cloud providers, ultimately suggesting TCP/UDP remain the most practical choices.