2025-02-17

0+0 > 0: C++ thread-local storage performance

An in-depth analysis of thread-local storage (TLS) performance in C++, examining how the implementation and the context in which a variable is defined affect access speed. The core finding: TLS access is fastest for variables without constructors defined in the main executable, while variables with constructors, or TLS accessed from shared libraries, are significantly slower because of lazy per-thread initialization and more complex address computation.
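To make the distinction concrete, here is a minimal C++ sketch of the two kinds of TLS variables being compared. It is not taken from the article, and the names tls_counter, tls_name, bump_counter, count_on_new_thread and name_length_on_new_thread are hypothetical. A thread_local int with a constant initializer needs no per-thread setup, while a thread_local std::string has a non-trivial destructor (and in general a dynamic initializer), so each thread must run setup code on first use and compilers typically guard its accesses with an initialization check. The sketch only demonstrates per-thread semantics; it does not measure speed, and the code actually generated depends on the compiler, the TLS model, and whether the code lives in an executable or a shared library.

```cpp
#include <cstddef>
#include <string>
#include <thread>

// Trivially initialized TLS: a plain int with a constant initializer.
// In the main executable the compiler can usually address this directly
// relative to the thread pointer, with no function call and no guard.
thread_local int tls_counter = 0;

// TLS with a non-trivial destructor: each thread's copy must be set up
// (and its destructor registered) the first time that thread uses it,
// so accesses typically go through an initialization check.
thread_local std::string tls_name = "worker";

// Increments the calling thread's private counter and returns the new value.
int bump_counter() {
    return ++tls_counter;
}

// Runs `n` increments on a fresh thread and returns that thread's final count.
// The new thread starts from its own zero-initialized copy of tls_counter.
int count_on_new_thread(int n) {
    int result = 0;
    std::thread t([&] {
        for (int i = 0; i < n; ++i) result = bump_counter();
    });
    t.join();
    return result;
}

// Appends to tls_name on a fresh thread; only that thread's copy changes.
std::size_t name_length_on_new_thread() {
    std::size_t len = 0;
    std::thread t([&] {
        tls_name += "-1";
        len = tls_name.size();
    });
    t.join();
    return len;
}
```

Calling count_on_new_thread(5) returns 5 no matter how often the calling thread has already called bump_counter(), and the caller's tls_name stays "worker" after name_length_on_new_thread() runs, because every thread gets its own copy of each variable.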

Related articles

0.14.0 Release Notes

Zig 0.14.0 introduces major updates including expanded cross-compilation capabilities, improved target support, and incremental compilation features aimed at reducing edit/compile/debug cycle latency, along with significant build system upgrades and language changes.

tigerbeetle/docs/internals/ARCHITECTURE.md at main · tigerbeetle/tigerbeetle

An in-depth technical overview of TigerBeetle, a specialized database designed for high-throughput financial transactions with strong consistency guarantees and durability. The system implements a single-threaded, deterministic architecture using static memory allocation and LSM trees, optimized for write-heavy workloads under extreme contention.

Why fastDOOM is fast

A detailed exploration of fastDOOM, a highly optimized version of DOOM achieving up to 48% better performance through 3,042 commits of incremental improvements. Victor Nieto's project demonstrates remarkable optimization across different CPU architectures and video modes, with particular attention to Mode Y versus Mode 13h implementations.

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data

DeepSeek has released smallpond, a distributed compute framework built on DuckDB, capable of processing 110.5TiB of data in 30 minutes. The framework leverages Ray Core for distribution and DeepSeek's 3FS storage system, offering a simpler alternative to traditional distributed systems while maintaining high performance. This development showcases DuckDB's growing adoption in AI workloads and demonstrates various approaches to scaling analytical databases.

GitHub - takara-ai/go-attention: A full attention mechanism and transformer in pure go.

Frontier Research Team at takara.ai introduces a pure Go implementation of attention mechanisms and transformer layers, featuring high performance and zero dependencies. The library offers efficient dot-product attention, multi-head attention support, and complete transformer layer implementation, making it ideal for edge computing and real-time processing.

Begrudgingly choosing CBOR over MessagePack

An analysis comparing the CBOR and MessagePack serialization formats concludes that CBOR is technically superior despite MessagePack's greater popularity. The comparison covers efficiency, simplicity, and implementation, with CBOR showing advantages in encoding/decoding speed and in its unified type system built on tags.

Smart Pointers Can't Solve Use-After-Free

Smart pointers in C++ cannot fully prevent use-after-free vulnerabilities due to internal raw pointers in types beyond user control. Examples with std::vector, std::span, and std::lock_guard demonstrate how iterator invalidation and pointer mismanagement can still lead to memory safety issues regardless of smart pointer usage.
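The std::vector case can be sketched in a few lines of C++. This is an illustration under our own assumptions, not the article's code, and read_first_after_growth is a hypothetical name. Storing std::unique_ptr elements does not protect a reference or iterator into the vector's buffer: push_back may reallocate, leaving any earlier reference dangling. The dangerous version appears only in comments, since running it would be undefined behavior; the executable part keeps an index instead and re-reads the element after the vector has grown.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hazard (shown in comments only, never executed):
//   std::vector<std::unique_ptr<int>> v;
//   v.push_back(std::make_unique<int>(42));
//   auto& first = v[0];                   // reference into v's buffer
//   v.push_back(std::make_unique<int>(7)); // may reallocate the buffer
//   int x = *first;                       // use-after-free: `first` dangles,
//                                         // even though elements are smart pointers
//
// Safer sketch: keep an index, which stays meaningful across reallocation,
// and re-derive the element only after all mutations are done.
int read_first_after_growth(std::size_t extra) {
    std::vector<std::unique_ptr<int>> v;
    v.push_back(std::make_unique<int>(42));
    const std::size_t first = 0;
    for (std::size_t i = 0; i < extra; ++i) {
        v.push_back(std::make_unique<int>(static_cast<int>(i)));
    }
    return *v[first];  // re-index after the vector has stopped growing
}
```

The design choice is simply to avoid holding anything that points into the container's storage across an operation that may reallocate it; the smart pointers manage the ints, not the vector's own buffer.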

The Internals of PostgreSQL

A comprehensive technical guide explaining the internal mechanisms and subsystems of the PostgreSQL database system, covering versions 17 and earlier. The document serves as an educational resource detailing process architecture, query processing, concurrency control, and other crucial aspects of database management, authored by Hironobu SUZUKI.

Xcode constantly phones home

An investigation reveals how Xcode's unnecessary connections to Apple's servers can significantly slow down build times, particularly during the 'Gather provisioning inputs' phase. The post details how blocking specific connections through Little Snitch can improve build performance and reduce unwanted analytics collection by Xcode.