OpenAI Releases o1 Model, Significantly Boosting Complex Reasoning and Problem-Solving Capabilities

Eldergate

On September 12, 2024, OpenAI launched its next-generation o1 model series, beginning with o1-preview and o1-mini. Both models are built for complex reasoning. OpenAI used reinforcement learning to train them to work through a longer internal chain of thought before answering, which produced large gains in mathematics, programming, and science. At launch, both models became available to ChatGPT Plus and Team users, with Enterprise and Edu access following a week later, and to API developers in usage tier 5. o1-mini is a smaller, faster, and cheaper alternative aimed at coding and math.

Outstanding Benchmark Performance

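In OpenAI's published benchmarks, the o1 series leads by wide margins. However, several widely quoted figures belong to the full o1 model, which was not released at launch, rather than to o1-preview. On the AIME 2024 math competition, o1 averaged 74.4% with a single sample per problem, compared with 44.6% for o1-preview and 9.3% for GPT-4o. On GPQA Diamond, a set of graduate-level science questions, o1-preview scored 73.3% versus 50.6% for GPT-4o. OpenAI reports that o1 was the first model to outperform the PhD-level experts it recruited to answer the same questions. In competitive programming, o1 ranked in the 89th percentile on Codeforces problems, while o1-preview reached the 62nd percentile. OpenAI also reported 78.2% on the MMMU multimodal benchmark for o1; o1-preview could not accept image inputs at launch. The 75.7% ARC-AGI score sometimes attributed to o1-preview was actually reported for OpenAI's later o3 model, and independent testing by the ARC Prize team placed o1-preview at roughly 20%.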

The Efficient Design of o1-mini

As a lightweight model, o1-mini is optimized for coding and math. On those tasks it matches or exceeds o1-preview at a faster speed and roughly 80% lower cost, scoring 70.0% on AIME compared with o1-preview's 44.6%. Because its specialization leaves it with less broad world knowledge, it trails on general science questions, scoring 60.0% on GPQA Diamond. That makes it best suited to high-volume STEM workloads that do not depend on extensive general knowledge. At launch, neither model supported web browsing or file and image uploads in ChatGPT; OpenAI said it planned to add such features in later updates.

Training Methodology and System Architecture

The core innovation of the o1 series is the use of reinforcement learning to train the model's chain of thought. The model still generates text token by token, but before producing its visible answer it works through a hidden sequence of reasoning tokens, which can run to tens of thousands of tokens on hard problems. Much as a person tackles a problem step by step, the model learns to break tricky steps into simpler ones, try a different approach when one is not working, and recognize and correct its own mistakes. OpenAI reports that performance improves both with more reinforcement learning during training and with more time spent thinking at inference. According to OpenAI's system card, the models were trained on a mix of publicly available data and proprietary data obtained through partnerships.

Availability, Pricing, and Safety Evaluation

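At launch, ChatGPT Plus and Team users could send 30 messages per week to o1-preview and 50 per week to o1-mini; OpenAI later raised these limits to 50 messages per week and 50 per day, respectively. API pricing for o1-preview is $15 per million input tokens and $60 per million output tokens; o1-mini costs $3 and $12, respectively. On safety, OpenAI evaluated the models with automated tests and external red-teaming under its Preparedness Framework and rated o1-preview as medium risk overall: medium for chemical, biological, radiological, and nuclear (CBRN) threats and for persuasion, and low for cybersecurity and model autonomy. That keeps it below the "high" threshold at which the framework bars deployment, though OpenAI says it will continue to strengthen its safeguards.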
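The API prices above lend themselves to a quick back-of-the-envelope estimate. The following is a minimal Python sketch that assumes the launch-era per-million-token rates quoted in this article and OpenAI's documented practice of billing the hidden reasoning tokens at the output-token rate. The function name `estimate_cost` and the token counts in the example are hypothetical, and actual reasoning-token usage varies widely from request to request.

```python
# Launch-era o1 API prices in USD per million tokens, as quoted in the article.
PRICES_PER_MILLION = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "o1-mini": {"input": 3.00, "output": 12.00},
}


def estimate_cost(model: str, input_tokens: int, visible_output_tokens: int,
                  reasoning_tokens: int) -> float:
    """Estimate the USD cost of a single request (hypothetical helper).

    Hidden reasoning tokens are never shown to the user but are billed at
    the output rate, so they are added to the visible output before pricing.
    """
    rates = PRICES_PER_MILLION[model]
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * rates["input"] + billed_output * rates["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens, 500 visible answer tokens,
    # and 8,000 hidden reasoning tokens.
    for name in PRICES_PER_MILLION:
        cost = estimate_cost(name, 2_000, 500, 8_000)
        print(f"{name}: ${cost:.4f}")
```

With these hypothetical counts, o1-mini costs one-fifth as much as o1-preview, because both of its rates are five times lower. The example also shows how the unseen reasoning tokens can dominate the bill even when the visible answer is short.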