Chat with Wiki - What are the technical improvements in large AI models in 2025: parameters, reasoning ability, or training corpora?

Answer

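The technical improvements in large AI models in 2025 may show up in the following areas:

Video generation: Several advanced AI models launched in 2024 can generate high-quality video from text input, a significant improvement over the models available in 2023.
Model scale and performance: Smaller models now deliver stronger performance. In 2022, the smallest model scoring above 60% on MMLU was PaLM, with 540 billion parameters; by 2024, Microsoft's Phi-3-mini reached the same threshold with just 3.8 billion parameters.
Reasoning: Although mechanisms such as chain-of-thought reasoning have significantly improved the performance of large language models, these systems still struggle with problems that require logical reasoning, such as arithmetic and planning, especially on instances larger than those they were trained on.
AI agents: On RE-Bench with a short time budget (two hours), top AI systems score four times higher than human experts, but as the time budget increases human performance overtakes AI, outscoring it two to one at 32 hours.
Algorithmic change: The emergence of DeepSeek marks an inflection point in compute efficiency: by optimizing its algorithmic architecture, it significantly improved compute utilization. Large models released in 2025 also tend toward lower parameter counts, which makes local deployment on AI devices feasible. The training of DeepSeek's reasoning model focuses on reinforcement learning, which improves the model's reasoning ability.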
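
The parameter figures in the model-scale point can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `fold_reduction` is hypothetical, and the two parameter counts are the ones quoted above (540 billion for PaLM, 3.8 billion for Phi-3-mini). It reproduces the roughly 142-fold reduction stated in the cited report excerpt.

```python
def fold_reduction(larger_params: float, smaller_params: float) -> float:
    """Return how many times larger the first model is than the second.

    Hypothetical helper for illustration; not part of any library.
    """
    if smaller_params <= 0:
        raise ValueError("smaller_params must be positive")
    return larger_params / smaller_params


# Parameter counts as quoted in the answer above.
palm_params = 540e9       # PaLM, smallest model above 60% on MMLU in 2022
phi3_mini_params = 3.8e9  # Phi-3-mini, reached the same threshold by 2024

ratio = fold_reduction(palm_params, phi3_mini_params)
print(f"PaLM has about {ratio:.0f}x the parameters of Phi-3-mini")
# prints: PaLM has about 142x the parameters of Phi-3-mini
```

The ratio only compares parameter counts; it says nothing about training data or compute, which the answer treats separately.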

Content generated by AI large model, please carefully verify (powered by aily)

References

2025 AI Index Report, Stanford.pdf

7. High-quality AI video generators demonstrate significant improvement. In 2024, several advanced AI models capable of generating high-quality videos from text inputs were launched. Notable releases include OpenAI’s SORA, Stable Video 3D and 4D, Meta’s Movie Gen, and Google DeepMind’s Veo 2. These models produce videos of significantly higher quality compared to those from 2023.

8. Smaller models drive stronger performance. In 2022, the smallest model registering a score higher than 60% on MMLU was PaLM, with 540 billion parameters. By 2024, Microsoft’s Phi-3-mini, with just 3.8 billion parameters, achieved the same threshold. This represents a 142-fold reduction in over two years.

9. Complex reasoning remains a problem. Even though the addition of mechanisms such as chain-of-thought reasoning has significantly improved the performance of LLMs, these systems still cannot reliably solve problems for which provably correct solutions can be found using logical reasoning, such as arithmetic and planning, especially on instances larger than those they were trained on. This has a significant impact on the trustworthiness of these systems and their suitability in high-risk applications.

10. AI agents show early promise. The launch of RE-Bench in 2024 introduced a rigorous benchmark for evaluating complex tasks for AI agents. In short time-horizon settings (two-hour budget), top AI systems score four times higher than human experts, but as the time budget increases, human performance surpasses AI, outscoring it two to one at 32 hours. AI agents already match human expertise in select tasks, such as writing Triton kernels, while delivering results faster and at lower costs.

Chapter 2: Technical Performance

2.1 Overview of AI in 2024

The Technical Performance chapter begins with a high-level overview of significant model releases in 2024 and reviews the current state of AI technical performance.

4.5 Historical Updates

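[The AI "visual Turing" era has arrived! ByteDance's OmniHuman generates video directly from a single image plus audio](https://mp.weixin.qq.com/s/0OYlkcxoFvx6Z9IN-aq90w): OmniHuman, a new technology from ByteDance, generates lifelike video from a single image and an audio clip, overcoming the limitations of earlier techniques. Through mixed multimodal training it addresses the scarcity of high-quality data, supports images of any size, and produces natural human motion.

[Jiazi Guangnian: 2025, DeepSeek opens the first year of AI algorithmic change](https://waytoagi.feishu.cn/record/S5Jtrlw9neyXMccQ6CAcZsxHnXu): The emergence of DeepSeek marks an inflection point in compute efficiency. By optimizing its algorithmic architecture, it significantly improved compute utilization and broke the conventional belief that compute is everything. Parameter counts of foundation models have also reached an inflection point: large models released in 2025 tend toward lower parameter counts, which makes local deployment on AI devices feasible. The report further stresses that 2025 is the first year of algorithmic change, opened by DeepSeek's reasoning model, whose training focuses on reinforcement learning and thereby improves the model's reasoning ability.

[After R1: what changes and what stays the same in prompting techniques](https://mp.weixin.qq.com/s/-51tjTWRdi19sEBCQMe1sw): In the R1 era, the key to prompting AI is supplying enough background information rather than building a complex structure. Plain, simple language still works, but a prompt that carries too little information rarely produces a good result. Examples and frameworks can help organize your thinking, but the outcome ultimately depends on how you think and express yourself. Use the Johari window to analyze what information is needed, avoid over-instructing, and let the AI think freely to get more creative and effective output. "What helps is not the technique, but your thinking!"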

2025 AI Index Report, Stanford.pdf

1.3 Notable AI Models (Chapter 1: Research and Development)

...models. Parameter counts have risen sharply since the early 2010s, reflecting the growing complexity of their architecture, greater availability of data, improvements in hardware, and proven efficacy of larger models. High-parameter models are particularly notable in the industry sector, underscoring the substantial financial resources available to industry to cover the computational costs of training on vast volumes of data. Several of the figures below use a log scale to reflect the exponential growth in AI model parameters and compute in recent years.

[Figure 1.3.11: Number of parameters of select notable AI models by sector, 2012–24. Axes: publication date; number of parameters (log scale). Sectors: Academia, Industry, Industry–academia, Government, Research collective. Labeled models: DeepSeek-V3, Qwen2.5-72B, Mistral Large 2, Llama 2-70B, PaLM (540B), Megatron-Turing NLG 530B, GPT-3 175B (davinci), BERT-Large, Transformer, ERNIE 3.0 Titan, RoBERTa Large. Source: Epoch AI, 2025 | Chart: 2025 AI Index report]