2025-02-04

DeepSeek research suggests Huawei's Ascend 910C delivers 60% of Nvidia H100 inference performance

DeepSeek researchers report that Huawei's Ascend 910C processor achieves roughly 60% of the Nvidia H100's inference performance, a result that could reduce China's dependence on Nvidia GPUs despite export sanctions. While the chip shows promise for inference workloads and leaves room for further gains through manual optimization, it still trails Nvidia's established ecosystem in the reliability and stability needed for long-running training jobs.

Related articles

LM2: Large Memory Models

A novel Large Memory Model (LM2) architecture enhances Transformers with an auxiliary memory module, significantly outperforming existing models in multi-hop inference and numerical reasoning tasks. The model demonstrates a 37.1% improvement over RMT and 86.3% over Llama-3.2 on the BABILong benchmark while maintaining strong performance on general tasks.

ewintr.nl

A detailed walkthrough of building a budget-friendly AI workstation with 48GB of VRAM for running local LLMs, assembled for around 1,700 euros from second-hand Tesla P40 GPUs. The setup runs a range of AI models locally at 5-15 tokens per second depending on model size, while staying independent of cloud-based AI services.

wingolog

An in-depth exploration of generational garbage collection reveals the unexpected result that generational collectors can perform worse than whole-heap collectors in benchmark tests. The analysis examines factors including nursery size, write barriers, and collection frequency, and questions the conventional wisdom that generational GC is superior.

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

A new benchmark based on the NPR Sunday Puzzle challenge evaluates AI models' reasoning capabilities using general knowledge rather than specialized expertise. OpenAI's o1 performs best on the benchmark, while the analysis surfaces distinctive failure patterns in models such as DeepSeek R1 and identifies optimal reasoning lengths for different systems.