A deep dive into the causes of nondeterminism in LLM inference argues that the primary culprit is not floating-point arithmetic by itself but batch-size variation: server load changes the batch size, and most kernels change their reduction order with it. The article presents solutions for achieving deterministic results through batch-invariant kernels, demonstrating a successful implementation with minimal performance impact.
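The mechanism is easy to demonstrate in miniature. The toy NumPy sketch below (not the article's code) sums the same float32 values serially and then in chunks, mimicking how a different batch size induces a different reduction order on a GPU; because float addition is not associative, the two results need not be bitwise equal.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Serial reduction: one running accumulator, left to right.
s_serial = np.float32(0.0)
for v in x:
    s_serial += v

# "Batched" reduction: sum 100 chunks, then sum the partial sums --
# the same numbers in a different association order.
s_chunked = x.reshape(100, 100).sum(axis=1).sum()

print(s_serial == s_chunked)  # typically False: same data, different order
```

The discrepancy is tiny per reduction, but once a nondeterministic logit shifts a sampled token, whole generations diverge; batch-invariant kernels fix the reduction order regardless of batch size.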
DeepSeek has released smallpond, a distributed compute framework built on DuckDB, capable of processing 110.5 TiB of data in 30 minutes. The framework leverages Ray Core for distribution and DeepSeek's 3FS storage system, offering a simpler alternative to traditional distributed systems while maintaining high performance. This development showcases DuckDB's growing adoption in AI workloads and demonstrates various approaches to scaling analytical databases.
Mozilla's recent source code changes removing the 'we don't sell your data' promise have severely damaged user trust, with one survey showing that 90% of Firefox users polled either distrust or doubt the organization. Multiple privacy-focused browser alternatives exist, including Librewolf, Waterfox, and emerging projects like Ladybird, offering users various options for secure browsing.
Fire-Flyer File System (3FS) is a high-performance distributed storage solution optimized for AI workloads, featuring strong consistency and disaggregated architecture. The system achieves impressive throughput of 6.6 TiB/s in read operations across 180 storage nodes, while supporting diverse workloads from data preparation to inference caching.
Modern Tesla vehicles are equipped with extensive surveillance capabilities, including multiple cameras and sensors that collect significant amounts of data about the car's surroundings and occupants. While Tesla claims to protect user privacy through data anonymization and limited collection practices, investigations have revealed concerning privacy breaches and employee misuse of customer data. Privacy experts express skepticism about Tesla's data protection measures and policy transparency.
FFTNet introduces a novel approach to sequence processing using Fast Fourier Transform, achieving O(n log n) complexity compared to traditional self-attention's quadratic complexity. The framework employs spectral filtering and modReLU activation to efficiently capture long-range dependencies, demonstrating superior performance on Long Range Arena and ImageNet benchmarks.
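As a rough NumPy sketch of the idea (not the paper's architecture; the filter and bias here are hypothetical stand-ins for learned parameters): transform the sequence with an FFT, apply an elementwise spectral filter and a modReLU activation in the frequency domain, and transform back, giving O(n log n) token mixing.

```python
import numpy as np

def modrelu(z, b):
    # modReLU on complex values: shift the modulus by a bias b,
    # zero out entries whose shifted modulus is negative, keep the phase.
    mag = np.abs(z)
    scale = np.maximum(mag + b, 0.0) / (mag + 1e-9)
    return z * scale

def fft_mix(x, filt, bias):
    # O(n log n) mixing: FFT over the sequence axis, learned spectral
    # filter, modReLU nonlinearity, inverse FFT back to token space.
    z = np.fft.rfft(x, axis=0)            # (n//2 + 1, d) spectrum
    z = modrelu(z * filt, bias)           # spectral filtering + activation
    return np.fft.irfft(z, n=x.shape[0], axis=0)

n, d = 128, 16
rng = np.random.default_rng(1)
x = rng.standard_normal((n, d))
filt = rng.standard_normal((n // 2 + 1, d))   # hypothetical learned filter
y = fft_mix(x, filt, bias=-0.1)
print(y.shape)  # (128, 16)
```

Because every frequency component mixes information from the whole sequence, a single spectral pass gives global receptive field at O(n log n) cost, which is the source of the claimed advantage over quadratic self-attention.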
DeepGEMM is a CUDA library offering efficient FP8 matrix multiplications with fine-grained scaling, supporting both normal and Mixture-of-Experts GEMMs. The lightweight library matches or exceeds the performance of expert-tuned libraries, featuring runtime compilation and Hopper tensor core optimization, while maintaining a simple ~300-line core kernel.
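"Fine-grained scaling" means each small block of a tensor carries its own scale factor, so one outlier cannot blow the narrow FP8 range for everything else. The NumPy sketch below is a conceptual stand-in for the real CUDA kernels: it quantizes 128-element blocks with per-block scales (rounding to integers as a crude proxy for FP8 casting) and applies both sides' scales during accumulation.

```python
import numpy as np

FP8_MAX = 448.0  # max finite value of the e4m3 FP8 format

def quantize_blocks(a, block=128):
    # Per-block (fine-grained) scaling: each 1 x `block` slice of a row
    # gets its own scale derived from its own max magnitude.
    m, k = a.shape
    a = a.reshape(m, k // block, block)
    scale = np.maximum(np.abs(a).max(axis=-1, keepdims=True) / FP8_MAX, 1e-12)
    q = np.clip(np.round(a / scale), -FP8_MAX, FP8_MAX)  # crude FP8 stand-in
    return q, scale

def gemm_dequant(qa, sa, qb, sb):
    # Accumulate block by block, applying both operands' scales --
    # conceptually what the scaled tensor-core GEMM does on the GPU.
    m, nb, _ = qa.shape
    out = np.zeros((m, qb.shape[0]))
    for i in range(nb):
        out += (qa[:, i] * sa[:, i]) @ (qb[:, i] * sb[:, i]).T
    return out

rng = np.random.default_rng(2)
a = rng.standard_normal((4, 256))
b = rng.standard_normal((8, 256))
qa, sa = quantize_blocks(a)
qb, sb = quantize_blocks(b)
approx = gemm_dequant(qa, sa, qb, sb)
exact = a @ b.T
print(np.max(np.abs(approx - exact)))  # small quantization error
```

The per-block scales are exactly what makes the scheme robust for MoE workloads, where different experts' weight slices can have very different magnitudes.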
DeepEP is a communication library optimized for Mixture-of-Experts (MoE) and expert parallelism, providing high-throughput GPU kernels and low-latency operations. The library supports both intranode and internode communication, offering specialized kernels for asymmetric-domain bandwidth forwarding and low-latency inference decoding, with comprehensive support for FP8 and RDMA networks.
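The dispatch/combine pattern DeepEP accelerates can be shown in a toy single-process form (this is only the data-movement pattern; the library performs the same shuffle across GPUs and nodes over NVLink/RDMA, and the routing indices and weights below are hypothetical).

```python
import numpy as np

def dispatch(tokens, topk_idx, num_experts):
    # Copy each token to every expert its router picked, remembering
    # where each copy came from so it can be routed back.
    buckets = [[] for _ in range(num_experts)]
    origins = [[] for _ in range(num_experts)]
    for t, experts in enumerate(topk_idx):
        for e in experts:
            buckets[e].append(tokens[t])
            origins[e].append(t)
    return buckets, origins

def combine(expert_out, origins, weights, n_tokens, dim):
    # Route each expert's outputs back to the owning token and
    # accumulate them under the router weights.
    out = np.zeros((n_tokens, dim))
    for e, (ys, idxs) in enumerate(zip(expert_out, origins)):
        for y, t in zip(ys, idxs):
            out[t] += weights[t, e] * y
    return out

tokens = np.arange(8, dtype=float).reshape(4, 2)
topk_idx = [[0], [1], [0], [1]]   # hypothetical top-1 routing
weights = np.ones((4, 2))         # hypothetical router weights
buckets, origins = dispatch(tokens, topk_idx, 2)
# Stand-in experts: expert e just scales its tokens by (e + 1).
expert_out = [[(e + 1) * x for x in b] for e, b in enumerate(buckets)]
out = combine(expert_out, origins, weights, 4, 2)
```

In a real MoE deployment both functions become all-to-all communication, which is why DeepEP's bandwidth-aware kernels and low-latency decoding path matter.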
A New Zealand-based compliance software company, Teammate App, suffered a major security breach exposing over 2.9 million records including sensitive user data, despite claiming 'impossible-to-hack' security. When notified about the vulnerability, the CEO dismissed the security concerns and accused the researcher of harassment, even as the exposed database contained user credentials, employee information, and accessible company documents.
Concerns are emerging about potential US government interference with academic platforms like arXiv, GitHub, and university IT systems, particularly regarding DEI policies and federal funding. ArXiv's cloud-based infrastructure and dependence on federal funding through Cornell University raise questions about its vulnerability, though bulk download options exist for data preservation.