2025-02-27

GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

Fire-Flyer File System (3FS) is a high-performance distributed storage solution optimized for AI workloads, featuring strong consistency and a disaggregated architecture. The system reaches an aggregate read throughput of 6.6 TiB/s across 180 storage nodes and supports workloads ranging from data preparation to inference caching.
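
As a rough back-of-the-envelope check, the aggregate figure quoted above can be converted into an average per-node read rate. The sketch below assumes binary units (1 TiB = 1024 GiB) and an even spread of load across the 180 storage nodes, which the summary does not state; the function name is illustrative and not part of 3FS.

```python
TIB = 2**40  # bytes in one tebibyte
GIB = 2**30  # bytes in one gibibyte


def per_node_throughput_gib(aggregate_tib_per_s: float, nodes: int) -> float:
    """Average per-node throughput in GiB/s, assuming load is spread evenly."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return aggregate_tib_per_s * TIB / nodes / GIB


# Figures taken from the summary: 6.6 TiB/s across 180 storage nodes.
per_node = per_node_throughput_gib(6.6, 180)
print(f"{per_node:.1f} GiB/s per storage node")  # prints "37.5 GiB/s per storage node"
```

This is only an average; real per-node rates depend on hardware, network links, and how requests are distributed.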

Related articles

Zen 5's AVX-512 Frequency Behavior

AMD's Zen 5 processor introduces full-width AVX-512 datapaths with impressive performance at high clock speeds, demonstrating significant improvements over Intel's Skylake-X implementation. The architecture employs sophisticated IPC throttling and clock management techniques to handle heavy AVX-512 workloads, maintaining optimal performance while avoiding fixed frequency offsets.

3,200% CPU Utilization

An in-depth analysis of a critical Java performance issue where unprotected concurrent TreeMap modifications led to 3,200% CPU utilization. The investigation revealed how thread interleaving can create infinite loops in red-black trees, with experiments across multiple programming languages demonstrating similar vulnerabilities.

Why We Designed TigerBeetle's Docs from Scratch | TigerBeetle Blog

TigerBeetle rebuilt their documentation site from scratch, moving away from Docusaurus to achieve better performance, simplicity, and integration with their zero-dependency philosophy. The new implementation uses Zig and Pandoc, resulting in a 10x reduction in footprint while maintaining functionality and adding features like integrated search and offline capabilities.

Distributed Systems Programming Has Stalled

An analysis of distributed systems programming models reveals limitations in current approaches: external-distribution, static-location, and arbitrary-location paradigms. Despite advancements in distributed systems over the last decade, programming models haven't fundamentally improved, leading to ongoing challenges with concurrency, fault tolerance, and versioning.

Xcode constantly phones home

The article discusses performance issues with Xcode builds caused by unnecessary connections to Apple's servers during the 'Gather provisioning inputs' phase. The author discovers that blocking certain Apple domains through Little Snitch significantly improves build times while exploring Xcode's seemingly unnecessary tracking and analytics connections.

GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library optimized for Mixture-of-Experts (MoE) and expert parallelism, providing high-throughput GPU kernels and low-latency operations. The library supports both intranode and internode communication, offering specialized kernels for asymmetric-domain bandwidth forwarding and low-latency inference decoding, with comprehensive support for FP8 and RDMA networks.

Rob Ricci (@ricci@discuss.systems)

A Mastodon server dedicated to computer systems research and professional discussions, focusing on operating systems, distributed systems, networks, and databases within the fediverse ecosystem.

The Best Way to Use Text Embeddings Portably is With Parquet and Polars

A deep dive into efficient storage and retrieval of text embeddings using Parquet files and the polars library, demonstrated through an analysis of Magic: The Gathering cards. The article explores alternatives to vector databases for smaller datasets, highlighting how combining Parquet files with polars offers zero-copy operations and fast similarity searches.

What's New in Emacs 30.1?

Emacs 30.1 introduces significant improvements including a new completion preview mode, tree-sitter sexp command enhancements, and better touch screen support. The release also features native JSON improvements, buffer-local file watching, and automated org protocol registration.

GitHub - deepseek-ai/FlashMLA

FlashMLA is a high-performance MLA decoding kernel optimized for Hopper GPUs, achieving up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in computation-bound scenarios. The implementation supports BF16 and paged kvcache, requiring CUDA 12.3+ and PyTorch 2.0+.