AI Infrastructure

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data

DeepSeek has released smallpond, a distributed compute framework built on DuckDB, capable of processing 110.5TiB of data in 30 minutes. The framework leverages Ray Core for distribution and DeepSeek's 3FS storage system, offering a simpler alternative to traditional distributed systems while maintaining high performance. This development showcases DuckDB's growing adoption in AI workloads and demonstrates various approaches to scaling analytical databases.

GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

Fire-Flyer File System (3FS) is a high-performance distributed storage solution optimized for AI workloads, featuring strong consistency and disaggregated architecture. The system achieves impressive throughput of 6.6 TiB/s in read operations across 180 storage nodes, while supporting diverse workloads from data preparation to inference caching.

GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library optimized for Mixture-of-Experts (MoE) and expert parallelism, providing high-throughput GPU kernels and low-latency operations. The library supports both intranode and internode communication, offering specialized kernels for asymmetric-domain bandwidth forwarding and low-latency inference decoding, with comprehensive support for FP8 and RDMA networks.