2025-01-24

Announcing Spiral

The post traces the evolution of data systems through three distinct eras and argues that we are now entering an AI-driven 'Third Age' that demands machine-scale outputs. To meet that demand, Spiral introduces Vortex, a new columnar file format, along with a database system designed for AI workloads that promises improved performance and security. The platform aims to bridge the gap between traditional data systems and the needs of modern AI infrastructure.

Related articles

Defeating Nondeterminism in LLM Inference

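A deep dive into the causes of nondeterminism in LLM inference finds that the primary culprit is variation in batch size, not GPU concurrency combined with floating-point arithmetic, as is commonly assumed. When server load changes the batch size, kernels change the order of their floating-point reductions, so the same request can produce different results. The article presents solutions for achieving deterministic results through batch-invariant kernels and demonstrates a successful implementation with minimal performance impact.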
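The mechanism can be illustrated with a toy split-reduction in Python. This is a simplified sketch, not the article's kernels. The chunk-size heuristic and all function names are hypothetical. A kernel that picks its reduction strategy from the batch size returns different sums for the same row depending on how many other rows share the batch. Fixing the strategy makes the result batch-invariant.

```python
# Toy illustration of batch (in)variance in a reduction kernel.
# The chunking heuristic is hypothetical, not taken from any real library.


def sequential_sum(values):
    """Left-to-right float accumulation (avoids Python's compensated sum())."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_row(row, chunk_size):
    """Split-reduction: sum fixed-size chunks, then sum the partial results."""
    partials = [sequential_sum(row[i:i + chunk_size])
                for i in range(0, len(row), chunk_size)]
    return sequential_sum(partials)


def batch_variant_row_sums(batch):
    # Hypothetical heuristic: a small batch leaves cores idle, so the kernel
    # splits each row into smaller chunks to expose more parallelism. The
    # reduction order, and therefore the rounding, now depends on batch size.
    chunk_size = 2 if len(batch) < 4 else 1024
    return [reduce_row(row, chunk_size) for row in batch]


def batch_invariant_row_sums(batch, chunk_size=2):
    # Batch-invariant: the chunk size is fixed, so each row is reduced in the
    # same order no matter how many other rows share the batch.
    return [reduce_row(row, chunk_size) for row in batch]


if __name__ == "__main__":
    row = [1e16, 1.0, -1e16, 1.0]  # exact sum is 2.0; rounding loses part of it
    print(batch_variant_row_sums([row])[0])       # 0.0 (row alone in the batch)
    print(batch_variant_row_sums([row] * 8)[0])   # 1.0 (same row, batch of 8)
    print(batch_invariant_row_sums([row])[0])     # 0.0
    print(batch_invariant_row_sums([row] * 8)[0]) # 0.0
```

Neither variant is "more correct" here; both round. The point is that the invariant version gives the same answer for a row regardless of server load.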

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data

DeepSeek has released smallpond, a distributed compute framework built on DuckDB that sorted 110.5 TiB of data in about 30 minutes in a GraySort benchmark. The framework uses Ray Core for distribution and DeepSeek's 3FS storage system. It offers a simpler alternative to traditional distributed systems while maintaining high performance. The release highlights DuckDB's growing adoption in AI workloads, and the article discusses several approaches to scaling analytical databases.

POLL: Trust in Firefox and Mozilla is Gone - Let's Talk Alternatives

Mozilla's recent source code changes, which removed its 'we don't sell your data' promise, have severely damaged user trust. A survey shows that 90% of Firefox users either distrust or doubt the organization. Several privacy-focused alternative browsers already exist, including LibreWolf and Waterfox, along with emerging projects such as Ladybird.

Teslas Monitor Everything—Including You | WIRED

Modern Tesla vehicles are equipped with extensive surveillance capabilities, including multiple cameras and sensors that collect significant amounts of data about the car's surroundings and occupants. While Tesla claims to protect user privacy through data anonymization and limited collection practices, investigations have revealed concerning privacy breaches and employee misuse of customer data. Privacy experts express skepticism about Tesla's data protection measures and policy transparency.

The FFT Strikes Back: An Efficient Alternative to Self-Attention

FFTNet introduces a novel approach to sequence processing that uses the Fast Fourier Transform. This gives it O(n log n) complexity in sequence length, compared with the quadratic O(n²) complexity of standard self-attention. The framework uses spectral filtering and modReLU activations to capture long-range dependencies efficiently, and it reports superior performance on the Long Range Arena and ImageNet benchmarks.
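The core idea, replacing pairwise attention scores with a global FFT-based mixing step, can be sketched in pure Python. This is a simplified, single-channel illustration and not FFTNet's actual architecture. The static `spectral_filter` and scalar `bias` are assumptions standing in for the paper's learned, adaptive parameters, and all function names are hypothetical. Every output position depends on every input position, yet the cost is dominated by the O(n log n) FFT rather than an O(n²) score matrix.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in x])]


def mod_relu(z, bias):
    """modReLU: ReLU(|z| + bias) * z / |z|; rescales the magnitude, keeps the phase."""
    r = abs(z)
    if r == 0.0 or r + bias <= 0.0:
        return 0j
    return (r + bias) * z / r


def spectral_mix(tokens, spectral_filter, bias):
    """Global token mixing in O(n log n): FFT, filter, modReLU, inverse FFT.

    `tokens` holds one scalar feature per position; `spectral_filter` and
    `bias` are hypothetical stand-ins for learned parameters.
    """
    freq = fft(tokens)
    activated = [mod_relu(f * w, bias) for f, w in zip(freq, spectral_filter)]
    # Keep the real part; exact when the filtered spectrum stays conjugate-symmetric.
    return [v.real for v in ifft(activated)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, 0.5]
    identity = [1.0] * len(x)
    # With an all-ones filter and zero bias, modReLU leaves every frequency
    # unchanged, so the layer reproduces its input up to rounding.
    print([round(v, 6) for v in spectral_mix(x, identity, 0.0)])
```

Taking the real part after the inverse FFT is a simplification. It is exact when the filter is real and symmetric (`w[k] == w[n - k]`), because modReLU maps conjugate frequency pairs to conjugate pairs.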

GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library optimized for Mixture-of-Experts (MoE) and expert parallelism, providing high-throughput GPU kernels and low-latency operations. The library supports both intranode and internode communication, offering specialized kernels for asymmetric-domain bandwidth forwarding and low-latency inference decoding, with comprehensive support for FP8 and RDMA networks.

New Zealand Company’s ‘Impossible-to-Hack’ Security Turns Out to Be No Security at All

Teammate App, a New Zealand-based compliance software company, suffered a major security breach that exposed over 2.9 million records, including sensitive user data. The company had claimed its security was 'impossible-to-hack'. When notified of the vulnerability, the CEO dismissed the concerns and accused the researcher of harassment. The exposed database contained user credentials, employee information, and accessible company documents.

Martin Escardo (@MartinEscardo@mathstodon.xyz)

Concerns have emerged about potential US government interference with academic platforms such as arXiv, GitHub, and university IT systems, particularly in connection with DEI policies and federal funding. ArXiv relies on cloud-based infrastructure and depends on federal funding through Cornell University, which raises questions about its vulnerability. However, bulk download options exist for preserving its data.