A detailed account of reviving a vintage JavaStation computer, transforming it from a non-functional state to running NetBSD through network booting. The narrative covers troubleshooting steps, configuration details, and successful implementation of RARP, TFTP, and NFS services.
Tailscale creates a virtual private network enabling secure remote access to devices and file sharing without traditional port forwarding. The service offers features like device-to-device connectivity, Taildrop for easy file transfers, and VPN capabilities through Mullvad integration.
DeepSeek has released smallpond, a distributed compute framework built on DuckDB, capable of processing 110.5 TiB of data in 30 minutes. The framework uses Ray Core for distribution and DeepSeek's 3FS storage system, offering a simpler alternative to traditional distributed systems while maintaining high performance. The release reflects DuckDB's growing adoption in AI workloads and illustrates several approaches to scaling analytical databases.
A detailed guide explains how to build a TCP/IP stack from scratch, focusing on implementing Ethernet and ARP protocols in userspace Linux. The implementation uses TUN/TAP devices for intercepting network traffic and demonstrates successful ARP request handling, serving as an educational resource for deep network programming.
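As a rough illustration of the ARP handling described above, here is a minimal Python sketch that recognises an ARP request inside a raw Ethernet frame and builds the matching reply. It is not the guide's implementation: it leaves out opening the TUN/TAP device entirely, and the function name `build_arp_reply` and its parameters are hypothetical. Only the field layout (Ethernet II header, IPv4-over-Ethernet ARP body) follows the standard wire format.

```python
import struct

# Wire formats, network byte order ("!" also disables padding).
ETH_HEADER = struct.Struct("!6s6sH")          # dst MAC, src MAC, EtherType
ARP_PACKET = struct.Struct("!HHBBH6s4s6s4s")  # htype, ptype, hlen, plen, op, sha, spa, tha, tpa

ETHERTYPE_ARP = 0x0806
ARP_REQUEST, ARP_REPLY = 1, 2


def build_arp_reply(frame: bytes, our_mac: bytes, our_ip: bytes):
    """Return a reply frame if `frame` is an ARP request for our_ip, else None.

    Hypothetical helper for illustration; our_mac is 6 bytes, our_ip is 4 bytes.
    """
    if len(frame) < ETH_HEADER.size + ARP_PACKET.size:
        return None
    _dst, _src, ethertype = ETH_HEADER.unpack_from(frame, 0)
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, op, sha, spa, _tha, tpa = ARP_PACKET.unpack_from(
        frame, ETH_HEADER.size
    )
    # Only handle Ethernet (htype 1) carrying IPv4 (ptype 0x0800) addresses.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    if op != ARP_REQUEST or tpa != our_ip:
        return None
    # The reply goes straight back to the requester: swap sender and target.
    eth = ETH_HEADER.pack(sha, our_mac, ETHERTYPE_ARP)
    arp = ARP_PACKET.pack(1, 0x0800, 6, 4, ARP_REPLY, our_mac, our_ip, sha, spa)
    return eth + arp
```

In a userspace stack, the frame would come from a read on the TAP device's file descriptor and the returned bytes would be written back to it; that plumbing is platform-specific and omitted here.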
An in-depth guide demonstrates how to netboot Windows 11 using iSCSI and iPXE, enabling Windows to run from a NAS instead of local storage. The solution allows gaming on Windows while maintaining Linux as the primary OS, providing a practical workaround for AAA games that restrict virtual machine usage.
Fire-Flyer File System (3FS) is a high-performance distributed storage solution optimized for AI workloads, featuring strong consistency and disaggregated architecture. The system achieves impressive throughput of 6.6 TiB/s in read operations across 180 storage nodes, while supporting diverse workloads from data preparation to inference caching.
Detailed profiling data from a training and inference framework is shared, highlighting communication-computation overlap strategies with PyTorch Profiler visualizations. The framework implements DualPipe with MoE layers across different configurations, including EP64/TP1 for training and EP32/TP1 for prefilling, demonstrating balanced routing and micro-batch optimization techniques.
FFTNet introduces a novel approach to sequence processing using the Fast Fourier Transform, reducing complexity from the O(n^2) of standard self-attention to O(n log n). The framework employs spectral filtering and modReLU activation to efficiently capture long-range dependencies, and reports superior performance on the Long Range Arena and ImageNet benchmarks.
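To make the pipeline named above more concrete (FFT, spectral filter, modReLU, inverse FFT), here is a small pure-Python sketch for a single scalar sequence. It is an illustration under assumptions, not FFTNet's code: the function names are hypothetical, the filter is passed in as a fixed list rather than learned, the sequence length must be a power of two, and modReLU is taken in its usual form, ReLU(|z| + b) * z / |z|.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT in O(n log n); len(x) must be a power of two.

    The inverse transform is left unnormalised (the caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def modrelu(z, bias):
    """Threshold the magnitude of a complex value while keeping its phase."""
    mag = abs(z)
    if mag == 0.0:
        return 0j
    return max(mag + bias, 0.0) * (z / mag)


def spectral_mix(seq, filt, bias=0.0):
    """Mix a real sequence globally: FFT -> per-frequency filter -> modReLU -> inverse FFT."""
    n = len(seq)
    spectrum = fft([complex(v) for v in seq])
    activated = [modrelu(c * w, bias) for c, w in zip(spectrum, filt)]
    mixed = fft(activated, inverse=True)
    return [v.real / n for v in mixed]
```

Every output position depends on every input position through the transform, which is how this style of mixing gets global context without forming an n-by-n attention matrix. With an all-ones filter and zero bias the function returns the input unchanged, up to floating-point error.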
DeepGEMM is a CUDA library offering efficient FP8 matrix multiplications with fine-grained scaling, supporting both normal and Mixture-of-Experts (MoE) GEMMs. The lightweight library matches or exceeds the performance of expert-tuned libraries, featuring runtime compilation and Hopper tensor core optimization, while maintaining a simple ~300-line core kernel.
An experiment explores the feasibility of creating and transmitting custom network protocols across different operating systems and the internet, revealing significant challenges with OS compatibility and network infrastructure limitations. Results demonstrate that while custom protocols can work locally, they face major obstacles when traversing NAT gateways, firewalls, and cloud providers, ultimately suggesting TCP/UDP remain the most practical choices.