Performance

3,200% CPU Utilization

An in-depth analysis of a critical Java performance issue where unprotected concurrent TreeMap modifications led to 3,200% CPU utilization. The investigation revealed how thread interleaving can create infinite loops in red-black trees, with experiments across multiple programming languages demonstrating similar vulnerabilities.

GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

Fire-Flyer File System (3FS) is a high-performance distributed storage system optimized for AI workloads, featuring strong consistency and a disaggregated architecture. It reaches an aggregate read throughput of 6.6 TiB/s across 180 storage nodes and supports workloads ranging from data preparation to inference caching.

Why We Designed TigerBeetle's Docs from Scratch | TigerBeetle Blog

TigerBeetle rebuilt their documentation site from scratch, moving away from Docusaurus to achieve better performance, simplicity, and integration with their zero-dependency philosophy. The new implementation uses Zig and Pandoc, resulting in a 10x reduction in footprint while maintaining functionality and adding features like integrated search and offline capabilities.

Xcode constantly phones home

The article traces slow Xcode builds to network connections made to Apple's servers during the 'Gather provisioning inputs' phase. The author finds that blocking certain Apple domains with Little Snitch significantly improves build times, and also examines Xcode's seemingly unnecessary tracking and analytics connections.

The Best Way to Use Text Embeddings Portably is With Parquet and Polars

A deep dive into efficient storage and retrieval of text embeddings using Parquet files and the Polars library, demonstrated through an analysis of Magic: The Gathering cards. The article explores alternatives to vector databases for smaller datasets, highlighting how combining Parquet files with Polars offers zero-copy operations and fast similarity searches.

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

A novel approach for rendering radiance fields using adaptive sparse voxels and rasterization, achieving high-fidelity results at over 100 FPS without neural networks. The method introduces efficient voxel allocation across different detail levels and implements a custom rasterizer using ray direction-dependent Morton ordering, eliminating common rendering artifacts.
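As a rough illustration of the Morton ordering mentioned above, the sketch below interleaves the bits of integer voxel coordinates into a single sortable key. The function names, the default 10-bit coordinate width, and the way the ray direction is folded in (mirroring every axis along which the ray travels in the negative direction) are assumptions made for illustration, not the paper's rasterizer.

```python
def part1by2(v: int, bits: int = 10) -> int:
    """Spread the low `bits` bits of v so that bit i moves to bit 3*i."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave x, y, z into one key: x uses bits 0,3,6..., y bits 1,4,7..., z bits 2,5,8..."""
    return (
        part1by2(x, bits)
        | (part1by2(y, bits) << 1)
        | (part1by2(z, bits) << 2)
    )


def directional_key(x: int, y: int, z: int, ray_dir, bits: int = 10) -> int:
    """Hypothetical direction-dependent variant (an assumption, not the paper's code).

    Each axis with a negative ray component is mirrored before encoding,
    so the key order changes with the octant of the ray direction.
    """
    top = (1 << bits) - 1
    fx = x if ray_dir[0] >= 0 else top - x
    fy = y if ray_dir[1] >= 0 else top - y
    fz = z if ray_dir[2] >= 0 else top - z
    return morton3d(fx, fy, fz, bits)


# Sort a few voxel coordinates by their key for a ray heading toward -x.
voxels = [(0, 0, 0), (3, 0, 0), (1, 2, 0)]
ordered = sorted(voxels, key=lambda v: directional_key(*v, ray_dir=(-1, 1, 1), bits=2))
```

Because there are only eight sign combinations for a ray direction, a scheme like this needs at most eight distinct orderings.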

The First Wasm_of_ocaml Release is Out!

Wasm_of_ocaml, a fork of the Js_of_ocaml compiler that translates OCaml bytecode to WebAssembly, has shipped its first feature-complete release, version 6.0.1. The compiler outperforms Js_of_ocaml while remaining compatible with it, showing 2x-8x improvements in benchmarks and leveraging WasmGC for better JavaScript interoperability.

When Imperfect Systems are Good, Actually: Bluesky’s Lossy Timelines

Bluesky implemented a 'Lossy Timelines' system to improve performance by intentionally dropping some timeline updates for users who follow many accounts. This solution reduced fanout latency by 96% and eliminated hot shard issues in their database clusters. The approach demonstrates how embracing imperfection in system design can lead to better scalability and performance.
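The idea of dropping some fanout writes for users who follow many accounts can be sketched as a probabilistic filter. The `REASONABLE_LIMIT` value, the function names, and the exact form of the keep probability are assumptions made for illustration; Bluesky's production logic may differ.

```python
import random

# Hypothetical threshold: below this many follows, no writes are dropped.
REASONABLE_LIMIT = 2_000


def keep_probability(num_follows: int, limit: int = REASONABLE_LIMIT) -> float:
    """Probability that a timeline write is kept for a user with num_follows follows."""
    if num_follows <= 0:
        return 1.0
    return min(limit / num_follows, 1.0)


def should_write(num_follows: int, rng=random.random, limit: int = REASONABLE_LIMIT) -> bool:
    """Decide whether to write one post into this user's timeline.

    `rng` returns a float in [0, 1); it is injectable so the decision
    can be made deterministic in tests.
    """
    return rng() < keep_probability(num_follows, limit)
```

Under these assumptions, a user following 4,000 accounts keeps roughly half of the incoming writes, while a user following 100 accounts keeps all of them.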

Alex's blog

A technical analysis reveals Kafka's limitations as a job queue, highlighting potential unfairness in job distribution among workers, especially at low volumes. The worst-case scenario formula shows how jobs can be unevenly distributed, leading to inefficient resource utilization. Traditional message brokers may be more suitable for low-volume job queuing until Kafka implements KIP-932.
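The unfairness described above follows from each partition being read by exactly one consumer in a group. The sketch below counts jobs per worker for a given set of job placements; the round-robin partition assignment and all names are simplifying assumptions, and it does not reproduce the article's worst-case formula.

```python
def assign_partitions(num_partitions: int, num_workers: int) -> dict:
    """Map each partition to one worker (simplified round-robin, an assumption)."""
    return {p: p % num_workers for p in range(num_partitions)}


def jobs_per_worker(job_partitions, num_partitions: int, num_workers: int) -> dict:
    """Count how many pending jobs each worker owns.

    `job_partitions` lists the partition each pending job landed in.
    """
    owner = assign_partitions(num_partitions, num_workers)
    load = {w: 0 for w in range(num_workers)}
    for p in job_partitions:
        load[owner[p]] += 1
    return load


# Three jobs that happen to land in partitions owned by the same worker:
unlucky = jobs_per_worker([0, 4, 8], num_partitions=12, num_workers=4)
# Four jobs spread across partitions owned by different workers:
lucky = jobs_per_worker([0, 1, 2, 3], num_partitions=12, num_workers=4)
```

In the unlucky case one worker processes all three jobs sequentially while the other three sit idle, which is the kind of imbalance that matters most when job volume is low.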

Go 1.24 Release Notes

Go 1.24 brings runtime performance improvements, including a new map implementation based on Swiss Tables and more efficient memory allocation, which together reduce CPU overhead by 2-3% on average. The release also adds support for ML-KEM post-quantum cryptography, FIPS 140-3 compliance mechanisms, and new testing tools for concurrent code.

0+0 > 0: C++ thread-local storage performance

An in-depth analysis of thread-local storage (TLS) performance in C++, examining how different implementations and contexts affect access speed. Core findings show that TLS access is fastest in executables without constructors, while shared libraries and constructors significantly degrade performance due to complex initialization and addressing mechanisms.

searchcode.com’s SQLite database is probably 6 terabytes bigger than yours 2025/02/16 (1949 words)

A developer details the migration of searchcode.com's database from MySQL to SQLite, resulting in what might be the world's largest SQLite database at 6.4TB. The migration involved implementing BTRFS compression, upgrading to a powerful server with an Intel Xeon CPU, and successfully maintaining performance across all operations.