Machine Learning
A comprehensive guide detailing the differences between OpenAI's reasoning models (o-series) and GPT models, emphasizing their complementary strengths in complex problem-solving versus straightforward execution. The o-series models excel at strategic planning, decision-making, and handling ambiguous information, while GPT models are optimized for speed and cost-efficiency in well-defined tasks.
A novel Large Memory Model (LM2) architecture enhances Transformers with an auxiliary memory module, significantly outperforming existing models in multi-hop inference and numerical reasoning tasks. The model demonstrates a 37.1% improvement over RMT and 86.3% over Llama-3.2 on the BABILong benchmark while maintaining strong performance on general tasks.
Google has unveiled Gemini 2.0 Flash, a high-performance AI model that reportedly outperforms recent reasoning models from DeepSeek (R1) and OpenAI (o3-mini), marking a significant advance in model capabilities and in the competition among major AI labs.
NVIDIA engineers utilized the DeepSeek-R1 model with inference-time scaling to automatically generate optimized GPU attention kernels, achieving results that sometimes surpassed human-engineered solutions. The experiment demonstrates how AI models can leverage additional computational resources during inference to evaluate multiple outcomes and select optimal solutions for complex programming tasks.
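The generate-and-verify loop behind inference-time scaling can be sketched in a few lines. This is a minimal best-of-N illustration, not NVIDIA's actual pipeline: `fake_generate` and the identity scorer are toy stand-ins for a model call and a kernel benchmark.

```python
import random

def best_of_n(generate, score, n=8, seed=0):
    """Inference-time scaling: sample n candidates, keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-in: pretend each sample is a candidate kernel's measured speedup.
def fake_generate(rng):
    return rng.uniform(0, 1)

best = best_of_n(fake_generate, score=lambda c: c, n=16)
```

Spending more compute here just means raising `n` and letting the verifier (`score`) pick the winner, which is the essence of evaluating multiple outcomes at inference time.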
Novel research demonstrates how large language models can improve their forecasting abilities through self-play and outcome-driven fine-tuning, achieving 7-10% better prediction accuracy without human-curated samples. The approach brings smaller models (Phi-4 14B and DeepSeek-R1 14B) to performance levels comparable to GPT-4 in forecasting tasks.
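The outcome-driven loop can be sketched as follows. This is a hypothetical illustration: `sample_rationales` stands in for the model's self-play sampling, and only rationales whose final prediction matched the realized outcome are kept as fine-tuning data.

```python
def build_finetune_set(questions, outcomes, sample_rationales, k=4):
    """Keep self-generated rationales whose final prediction matched reality."""
    data = []
    for q in questions:
        for rationale, prediction in sample_rationales(q, k):
            if prediction == outcomes[q]:
                data.append((q, rationale))
    return data

# Toy sampler: each question yields k (rationale, prediction) pairs.
def toy_sampler(q, k):
    return [(f"reasoning #{i} for {q}", i % 2 == 0) for i in range(k)]

dataset = build_finetune_set(["will it rain?"], {"will it rain?": True}, toy_sampler)
```

No human-curated labels enter the loop: the resolved outcome itself filters the model's own reasoning traces into training data.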
Recent studies demonstrate that transformer models can master skills without explicit training, simply by observing related tasks — an instance of emergent behavior with profound implications for how future AI systems are developed and understood.
A novel language model architecture scales test-time computation by reasoning in latent space with a recurrent block that can be unrolled to arbitrary depth at inference, achieving gains equivalent to a 50B-parameter compute budget without specialized training data or large context windows.
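The core idea — re-applying one block to a latent state, with the unroll count chosen at inference time — can be sketched with a toy recurrence. This is hypothetical: the actual architecture uses a trained transformer core, not a single tanh layer.

```python
import math
import random

def latent_reasoning(x, W, r):
    """Unroll the same recurrent block r times in latent space;
    r is chosen at inference time, trading extra compute for quality."""
    h = [0.0] * len(x)
    for _ in range(r):
        # Recurrent core with input injection: h <- tanh(W h + x).
        h = [math.tanh(sum(w * hj for w, hj in zip(row, h)) + xi)
             for row, xi in zip(W, x)]
    return h

rng = random.Random(0)
W = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(4)]
x = [rng.gauss(0, 1) for _ in range(4)]
shallow, deep = latent_reasoning(x, W, 1), latent_reasoning(x, W, 32)
```

The same weights serve every depth, which is why no long chain-of-thought context window is needed: extra "thinking" lives in the hidden state, not in tokens.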
Andrej Karpathy's deep dive into LLMs covers the complete lifecycle from pretraining to post-training, explaining tokenization, neural network architectures, and fine-tuning processes. The comprehensive guide explores how LLMs process information, handle hallucinations, and utilize reinforcement learning to improve performance and reasoning capabilities.
LIMO challenges conventional wisdom by achieving superior mathematical reasoning capabilities using only 817 training samples, outperforming models trained on 100x more data. The research introduces the Less-Is-More Reasoning Hypothesis, suggesting that complex reasoning can emerge through minimal but precise demonstrations when domain knowledge is well-encoded during pre-training.
OpenAI's Sora, a text-to-video AI model, can create highly realistic and accurate 60-second videos from text descriptions, showcasing remarkable consistency and potential to revolutionize video content creation. Sora's ability to understand physical motion and time, along with its grasp of the real world, represents a significant advancement in AI-generated media.
Self-play training in simulation has produced a breakthrough in autonomous driving, achieving state-of-the-art performance across multiple benchmarks without using human driving data. Using the Gigaflow simulator, the system demonstrated exceptional robustness, averaging 17.5 years of continuous driving between incidents while maintaining naturalistic behavior.
A comprehensive exploration of reasoning LLMs focuses on four main approaches: inference-time scaling, pure reinforcement learning, supervised finetuning with RL, and pure supervised finetuning with distillation. The article analyzes DeepSeek R1's development pipeline and compares it with OpenAI's o1, highlighting how reasoning capabilities can emerge through different training methodologies. Practical insights are provided for developing reasoning models on limited budgets, including alternative approaches like journey learning and small-scale implementations.
Google announces general availability of Gemini 2.0 Flash across its AI products, introduces new experimental 2.0 Pro model with enhanced coding capabilities, and launches cost-efficient 2.0 Flash-Lite model. The updates include improved performance benchmarks, expanded context windows up to 2 million tokens, and multimodal capabilities, with more features planned for release in coming months.
A groundbreaking paper demonstrates how a $6 AI model, running on a laptop, achieves near state-of-the-art performance using only 1,000 training examples and innovative inference-time scaling techniques. The research reveals simple yet effective methods for controlling AI model thinking time and highlights the accelerating pace of AI development through cost-effective experimentation.
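The "thinking time" control the paper describes can be sketched as budget enforcement on the reasoning span. This is a hedged illustration: the token names and the exact mechanism here are assumptions, not the paper's implementation.

```python
def enforce_thinking_budget(tokens, budget, end_token="</think>"):
    """Truncate the model's reasoning span to at most `budget` tokens,
    then force the end-of-thinking marker so answering begins."""
    kept = []
    for tok in tokens:
        if tok == end_token or len(kept) >= budget:
            break
        kept.append(tok)
    return kept + [end_token]

trace = ["step1", "step2", "step3", "step4", "</think>", "answer"]
short = enforce_thinking_budget(trace, budget=2)
```

The converse direction — extending thinking by suppressing the end marker until a minimum budget is spent — follows the same pattern, which is what makes thinking time a tunable inference-time knob.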
DeepRAG, a novel framework for large language models, combines reasoning with retrieval-augmented generation by modeling the retrieval process as a Markov Decision Process. The system demonstrates a 21.99% improvement in answer accuracy through strategic decomposition of queries and dynamic knowledge retrieval, addressing the challenge of factual hallucination in LLMs.
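The decision being formalized — at each decomposed subquery, retrieve or rely on parametric knowledge — can be sketched as a toy policy. The names here are hypothetical, and the real system learns this decision rather than using a dictionary lookup.

```python
def answer_subqueries(subqueries, parametric, corpus):
    """For each subquery (one MDP step), act: answer from parametric
    memory when possible, otherwise take the retrieval action."""
    facts, retrievals = {}, 0
    for q in subqueries:
        if q in parametric:
            facts[q] = parametric[q]
        else:
            facts[q] = corpus[q]  # retrieval action: costly but grounded
            retrievals += 1
    return facts, retrievals

parametric = {"capital of France": "Paris"}
corpus = {"capital of France": "Paris", "mayor of Paris": "Anne Hidalgo"}
facts, n_retrievals = answer_subqueries(
    ["capital of France", "mayor of Paris"], parametric, corpus)
```

Retrieving only when parametric knowledge runs out is what curbs hallucination without paying retrieval cost on every step.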
OmniHuman is an advanced AI system capable of generating realistic human videos with diverse visual and audio styles, supporting various aspect ratios and body proportions. The system excels at producing high-quality animations driven by music, speech, or video inputs, handling complex gestures and accommodating varied body poses, including singing performances.