12-24-Daily AI News Daily


## Aivora AI Daily 2025/12/24
>  `AI Daily`

### **Today's Summary**

GPT-5.2 scored 75% on ARC-AGI-2, surpassing the human baseline for the first time, as OpenAI makes a big year-end move. Domestic models are making strides together: ERNIE (Wenxin) took first place among domestic models on LMArena, and MiniMax-generated web pages finally look good. Annual reports are trending and year-end lists are out, so it's a good time to review how much AI you used this year and get ready to hustle even more next year.


## ⚡ Quick Navigation
- [📰 Today's AI News](#todays-ai-news) - Latest Updates at a Glance

> 💡 **Tip**: [**Aivora**](https://aivora.cn?utm_source=daily_news&utm_medium=mid_ad&utm_campaign=content) is your go-to if you want to experience the latest AI models (Claude, GPT, Gemini) mentioned in the article first-hand but don't have an account. Get started in a minute, with worry-free after-sales.

## **Today's AI News**
### **👀 Just One Sentence**
GPT-5.2 scored 75% on ARC-AGI-2, directly surpassing the human baseline. This is kind of a big deal.
### **🔑 3 Keywords**
#GPT5.2CrushesIt #DomesticModelsCounterattack #AnnualReportsGoViral

---
## **🔥 Top 10 Major News**
### 1. [GPT-5.2 Surpasses Human Baseline on ARC-AGI-2, Scoring 75%](https://x.com/gdb/status/2003570781192957991)
GPT-5.2 just shattered expectations on ARC-AGI-2, hitting a whopping 75%! The model, GPT-5.2 X-High, surpassed the human baseline, a hurdle that had held while the best previous scores sat just over 60%. That is 15 percentage points above the previous SOTA, at a cost of less than $8 per problem. Greg Brockman personally retweeted the result, a show of strength from OpenAI at year-end.
![AI News Image](https://pbs.twimg.com/media/G84FbvNWUAAkVZK?format=png&name=orig)
### 2. [ChatGPT Annual Report Launched, Sam Altman Complains He Didn't Make Top 1%](https://x.com/sama/status/2003419371432214548)
ChatGPT's 'Your Year with ChatGPT' annual report is out, and it's got everyone talking, especially Sam Altman! OpenAI pushed the report to users, showing how much they chatted with ChatGPT and how many images they generated this year. Interestingly, some users found that just 11,000 messages were enough to enter the global Top 1%, which suggests that most people don't actually use it that heavily. The funniest part? Sam Altman himself tweeted, 'Didn't make Top 1%, a bit disappointed.' Seriously, boss, are you too busy for your own creation?
![AI News Image](https://pbs.twimg.com/media/G8zBqDIXYAAz41p?format=jpg&name=orig)
### 3. [Replit Directly Embeds ChatGPT, Code Without Switching Tabs](https://x.com/gdb/status/2003535410383978728)
Replit just made coding with ChatGPT a breeze by directly embedding it, letting you code without switching tabs! Previously, when using ChatGPT to write code, you had to copy and paste it into an IDE to run. Now, with Replit directly integrated, you describe your requirements, and it helps you run the application immediately. No need to configure environments or switch windows, shortening the path from 'idea' to 'something runnable.' For those who quickly validate ideas, this combo is quite sweet.
### 4. [ERNIE-5.0 Ranks First Among Domestic Models on LMArena, 23 Points Higher Than Previous Version](https://x.com/op7418/status/2003394479697592740)
ERNIE-5.0-Preview-1203 just made a splash for Baidu, ranking first among domestic models on the LMArena text leaderboard! It surpassed Qwen and scored 23 points higher than its previous version, mainly thanks to its creative writing and its handling of difficult instructions. More importantly, Baidu is no longer holding back for big releases and is instead shipping frequent small iterations, a strategic shift worth noting.
![AI News Image](https://pbs.twimg.com/media/G818pR0agAMDwni?format=jpg&name=orig)
### 5. [MiniMax M2.1 and GLM-4.7 Released on the Same Day, Front-end Aesthetic Capabilities Are Explosive](https://x.com/op7418/status/2003505367909843292)
MiniMax M2.1 and GLM-4.7 dropped on the same day, and their front-end aesthetic capabilities are absolutely explosive! Asking AI to build webpages used to produce ugly, unusable designs. This time, the pages generated by MiniMax M2.1 even restyle the mouse cursor and show real design flair. GLM-4.7 isn't bad either: it has minor CSS Grid issues but performs well overall. Domestic models have finally 'gotten it' when it comes to aesthetics, likely through RL training on well-designed webpage data.
<video controls preload="metadata" playsinline style="max-width:100%; height:auto;" src="https://video.twimg.com/amplify_video/2003505081250197504/vid/avc1/2068x1080/Tf_BcKv0N3picxm5.mp4?tag=21"></video>
### 6. [Tongyi Open-sources Fun-Audio-Chat 8B, Understands Your Emotions and Helps You Get Things Done](https://www.bestblogs.dev/article/865433ed)
Tongyi just open-sourced Fun-Audio-Chat 8B, a model that not only understands your emotions but can also help you get things done! This is no ordinary voice chat model. It can perceive emotions from your tone and speaking speed: if you're angry, it'll comfort you; if you're anxious, it'll guide you through deep breaths. Even more impressive, it supports Speech Function Call. You just say, 'Check my schedule for tomorrow,' and it calls the function for you directly. It uses an end-to-end architecture with low latency, and the 8B model is already open source.
### 7. [Gemini 3 Flash So Fast It Can Play Pictionary](https://x.com/GeminiApp/status/2003550229724037402)
Gemini 3 Flash is so incredibly fast, it can literally play Pictionary! Google showed off its speed: you're still drawing, and it has already guessed it. This real-time response capability is a must-have for scenarios requiring immediate feedback (e.g., real-time translation, game NPCs). Achieving this level of speed optimization indicates that Google has put serious effort into inference efficiency.
<video controls preload="metadata" playsinline style="max-width:100%; height:auto;" src="https://video.twimg.com/amplify_video/2003545425031364613/vid/avc1/1920x1080/2pI7DoGH46bE_K-W.mp4?tag=21"></video>
### 8. [Zhihu's Annual AI Product List Released: Doubao Ranks First, Cursor Ushers in the Year of Agents](https://x.com/op7418/status/2003387833701011939)
Zhihu's Annual AI Product List is out, and it's a useful reference, especially for its take on Cursor ushering in the year of Agents! Domestically, Doubao took first place with its low-threshold voice mode, and DeepSeek benefited from its early-year surge. Overseas, Gemini surged ahead with its year-end release, while Claude remains unshakable in the programming domain. But the most noteworthy entry is Cursor: it essentially defined this year's interaction paradigm for Agents, and it kicked off trends such as context engineering and multi-model hybrid calling.
![AI News Image](https://pbs.twimg.com/media/G812lmGaEAAKk4a?format=jpg&name=orig)
### 9. [Baoyu's Deep Dive: Is AI a Bubble or Tomorrow? The Answer is Both](https://x.com/dotey/status/2003382215720235414)
Baoyu's deep dive asks: Is AI a bubble or tomorrow? His answer: it's both! In the past three years, AI companies' market value has increased by $10 trillion, and OpenAI's valuation growth is higher than the GDP of most countries. A bubble? In the short term, definitely. But history tells us that when the internet bubble burst, the fiber optics remained; when the biotech craze passed, the new drugs remained. Bubbles burst, but infrastructure doesn't disappear. For ordinary people, the advice is not to worry about valuations: start using AI first.
![AI News Image](https://pbs.twimg.com/media/G81xcjIXsAA463u?format=jpg&name=orig)
### 10. [LLMs Still Struggle with Web API Calls, But a Solution Has Been Found](https://x.com/omarsar0/status/2003570764868649154)
LLMs still struggle with Web API calls, but guess what? Someone found a solution! Everyone assumed code models would call APIs reliably, but actual tests show that no open-source model can solve more than 40% of tasks, with URL hallucination rates ranging from 14% to 39%. The reason is that Web APIs differ too much from ordinary function calls: HTTP methods, long URLs, and nested parameter types are more than models can reliably remember. The good news is that researchers have proposed a constrained decoding scheme that converts OpenAPI specifications into regular expression constraints, reportedly boosting accuracy by 90%.
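
To make the idea of "OpenAPI specification to regular expression" more concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the toy spec, the `api.example.com` server, and the helper names (`path_to_regex`, `spec_to_regex`, `is_valid_request_line`) are all hypothetical. It also only checks a finished `METHOD URL` string, whereas real constrained decoding would apply the constraint token by token during generation.

```python
import re

# Hypothetical, heavily simplified OpenAPI-style spec (an assumption for illustration).
TOY_SPEC = {
    "servers": [{"url": "https://api.example.com/v1"}],
    "paths": {
        "/users/{user_id}": {"get": {}, "delete": {}},
        "/users": {"get": {}, "post": {}},
    },
}


def path_to_regex(path: str) -> str:
    """Turn a templated path such as /users/{user_id} into a regex fragment."""
    parts = re.split(r"(\{[^{}]+\})", path)
    fragments = []
    for part in parts:
        if part.startswith("{") and part.endswith("}"):
            # Assumption: path parameters are simple alphanumeric tokens.
            fragments.append(r"[A-Za-z0-9_\-]+")
        else:
            fragments.append(re.escape(part))
    return "".join(fragments)


def spec_to_regex(spec: dict) -> re.Pattern:
    """Build one regex that accepts only 'METHOD URL' lines allowed by the spec."""
    base = re.escape(spec["servers"][0]["url"])
    alternatives = []
    for path, operations in spec["paths"].items():
        methods = "|".join(sorted(m.upper() for m in operations))
        alternatives.append(f"(?:(?:{methods}) {base}{path_to_regex(path)})")
    return re.compile("|".join(alternatives))


def is_valid_request_line(spec_regex: re.Pattern, line: str) -> bool:
    """Check a complete request line against the spec-derived constraint."""
    return spec_regex.fullmatch(line) is not None


constraint = spec_to_regex(TOY_SPEC)
print(is_valid_request_line(constraint, "GET https://api.example.com/v1/users/42"))   # True
print(is_valid_request_line(constraint, "POST https://api.example.com/v1/users/42"))  # False
print(is_valid_request_line(constraint, "GET https://api.example.com/v2/users"))      # False
```

The second and third calls are the kind of wrong-method or hallucinated-URL output that such a constraint would rule out, since neither string can be produced by any operation in the spec.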

---
## **📌 Worth Noting**
**[Products]**

- **Open WebUI** keeps shipping updates; the local AI interface, which supports both Ollama and the OpenAI API, now has 118k stars.
- **Claude Code Templates Tool** has been released, offering a command-line tool for configuring and monitoring Claude Code.

**[Open Source]**

- **exo** is making waves, letting you build AI clusters with everyday devices. This 37k-star project means models can run on your phones, computers, and even watches!
- **LEANN** offers local RAG that saves 97% of storage, promising fast, accurate, and 100% private operation.
- **vllm-omni**, produced by the vLLM team, is an inference framework for omni-modal models.

**[Research]**

- **RewardScope** is an RL reward hacking detection tool that monitors reward components in real time and detects state loops and boundary exploitation.

**[Others]**

- The **Life K-Line Open Source Project** has gone viral! Enter your birth characters to generate a life fortune chart; multiple open-source versions are already available on GitHub.

---
## **❓ Related Questions**
### How to experience ChatGPT's annual report feature?
The ChatGPT annual report (Your Year with ChatGPT) is currently being rolled out to users in the US, UK, Canada, New Zealand, and Australia, and it requires the 'Save memory' and 'Chat history' features to be enabled. Domestic users may run into account registration and access restrictions. **Aivora** offers a straightforward **solution**: we provide ready-made ChatGPT Plus account services. Accounts are delivered as soon as you order and are ready to use right away, so you can skip the hassle of payment and registration issues. We provide stable exclusive accounts with worry-free after-sales. Visit [aivora.cn](https://aivora.cn) to view the complete list of AI account services.

---
## **AI Account Express Delivery: [Aivora](https://aivora.cn)**
- ✅ **Express Delivery**: Delivered as soon as you order, no waiting, start your AI journey immediately.
- ✅ **Stable and Reliable**: Carefully selected high-quality exclusive accounts, no fear of bans, worry-free after-sales.
- ✅ **Comprehensive Categories**: Popular AI tool accounts such as ChatGPT Plus, Claude Pro, Midjourney, Poe, Suno, etc., are all available.
- ✅ **High Cost-Performance**: More favorable prices than official subscriptions, with the same premium service.

🚀 **Visit [aivora.cn](https://aivora.cn) now to purchase your AI assistant and unleash unlimited creativity!**

Last updated on 2026/01/14 11:09:36
