Is there an AI bubble

May 8, 2026 · 54 min read · by PandaTalk

Dear friends,

Is there an AI bubble? With the massive number of dollars going into AI infrastructure such as OpenAI’s $1.4 trillion plan and Nvidia briefly reaching a $5 trillion market cap, many have asked if speculation and hype have driven the values of AI investments above sustainable values. However, AI isn’t monolithic, and different areas look bubbly to different degrees.

AI application layer: There is underinvestment. The potential is still much greater than most realize.
AI infrastructure for inference: This still needs significant investment.
AI infrastructure for model training: I'm still cautiously optimistic about this sector, but there could also be a bubble.
Caveat: I am absolutely not giving investment advice!

AI application layer. There are many applications yet to be built over the coming decade using new AI technology. Almost by definition, applications that are built on top of AI infrastructure/technology (such as LLM APIs) have to be more valuable than the infrastructure, since we need them to be able to pay the infrastructure and technology providers.

I am seeing many green shoots across many businesses that are applying agentic workflows, and am confident this will grow! I have also spoken with many Venture Capital investors who hesitate to invest in AI applications because they feel they don’t know how to pick winners, whereas the recipe for deploying $1B to build AI infrastructure is better understood. Some have also bought into the hype that almost all AI applications will be wiped out merely by frontier LLM companies improving their foundation models. Overall, I believe there is significant underinvestment in AI applications. This area remains a huge focus for my venture studio, AI Fund.

AI infrastructure for inference. Despite AI’s low penetration today, infrastructure providers are already struggling to fulfill demand for processing power to generate tokens. Several of my teams are worried about whether we can get enough inference capacity, and both cost and inference throughput are limiting our ability to use even more. It is a good problem to have that businesses are supply-constrained rather than demand-constrained. The latter is a much more common problem, when not enough people want your product. But insufficient supply is nonetheless a problem, which is why I am glad our industry is investing significantly in scaling up inference capacity.

As one concrete example of high demand for token generation, highly agentic coders are progressing rapidly. I’ve long been a fan of Claude Code; OpenAI Codex also improved dramatically with the release of GPT-5; and Gemini 3 has made Google CLI very competitive. As these tools improve, their adoption will grow. At the same time, overall market penetration is still low, and many developers are still using older generations of coding tools (and some aren’t even using any agentic coding tools). As market penetration grows — I’m confident it will, given how useful these tools are — aggregate demand for token generation will grow.

I predicted early last year that we’d need more inference capacity, partly because of agentic workflows. Since then, the need has become more acute. As a society, we need more capacity for AI inference!

Having said that, I’m not saying it’s impossible to lose money investing in this sector. If we end up overbuilding — and I don’t currently know if we will — then providers may end up having to sell capacity at a loss or at low returns. I hope investors in this space do well financially. The good news, however, is that even if we overbuild, this capacity will get used, and it will be good for application builders!

AI infrastructure for model training. I am happy to see the investments going into training bigger models. But, of the three buckets of investment, this seems the riskiest. If open-source/open-weight models continue to grow in market share, then some companies that are pouring billions into training models might not see an attractive financial return on their investment.

Additionally, algorithmic and hardware improvements are making it cheaper each year to train models of a given level of capability, so the “technology moat” for training frontier models is weak. (That said, ChatGPT has become a strong consumer brand, and so it enjoys a strong brand moat, while Gemini, assisted by Google's massive distribution advantage, is also making a strong showing.)

I remain bullish about AI investments broadly. But what is the downside scenario — that is, is there a bubble that will pop? One scenario that worries me: If part of the AI stack (perhaps in training infra) suffers from overinvestment and collapses, it could lead to negative market sentiment around AI more broadly and an irrational outflow of interest away from investing in AI, despite the field overall having strong fundamentals. I don’t think this will happen, but if it does, it would be unfortunate since there’s still a lot of work in AI that I consider highly deserving of much more investment.

Warren Buffett popularized Benjamin Graham’s quote, “In the short run, the market is a voting machine, but in the long run, it is a weighing machine.” He meant that in the short term, stock prices are driven by investor sentiment and speculation; but in the long term, they are driven by fundamental, intrinsic value. I find it hard to forecast sentiment and speculation, but am very confident about the long-term health of AI’s fundamentals. So my plan is just to keep building!

Happy Thanksgiving,
Andrew

A MESSAGE FROM DEEPLEARNING.AI

In Agentic AI, taught by Andrew Ng, you’ll learn to design multi-step, autonomous workflows in raw Python. The course covers fundamental agentic design patterns: reflection, tool use, planning, and multi-agent collaboration. Available exclusively at DeepLearning.AI. Enroll now!


News

Google Dominates Arena Leaderboards (For the Moment)

Google introduced Gemini 3 Pro and Nano Banana Pro, its flagship vision-language and image-generation models, and deployed them to billions of users worldwide.

Gemini 3 Pro: A multimodal reasoning model, Gemini 3 Pro leads LMArena's Text, WebDev, and Vision leaderboards as of this writing. The update replaces Gemini 2.5's budget of tokens allocated to reasoning with a reasoning-level setting (low, medium, or high), which Google says is simpler to manage.

Input/output: Text, images, PDFs, audio, and video in (up to 1 million tokens); text out (up to 64,000 tokens, 128 tokens per second)
Architecture: Mixture-of-experts transformer
Training: Pre-trained on data (text, code, images, video, audio) scraped from the web, licensed data, Google user data, and synthetic data; fine-tuned to reason, follow instructions, and align with human preferences via unspecified reinforcement learning methods, using data that represents multi-step reasoning, problem-solving, and theorem proofs

Features: Tool use (Google Search, URL context, Python code execution, file search, function calling), structured outputs, adjustable reasoning (low, medium, high)
Performance: In Google's tests, Gemini 3 Pro raised the state of the art on Humanity's Last Exam (reasoning), GPQA Diamond (academic knowledge), AIME 2025 (competition math problems), MMMU-Pro (multimodal reasoning), and MRCR v2 (long-context performance), by substantial margins in some cases. For roughly a week — before Anthropic's Claude Opus 4.5 swooped in — it also held the top spots on SWE-bench Verified (agentic coding), Terminal-Bench 2.0 (agentic terminal coding), and ARC-AGI-2 (visual reasoning puzzles).

Availability: Free via the Gemini app and AI Overviews in Google Search; integrated with the paid services Google AI Studio, Vertex AI, and the Google Antigravity agentic coding tool; API $2/$0.20/$12 per million input/cached/output tokens for input contexts under 200,000 tokens, $4/$0.40/$18 per million input/cached/output tokens for input contexts greater than 200,000 tokens (plus $4.50 per million cached tokens per hour)
Knowledge cutoff: January 2025
Undisclosed: Parameter count, architecture details, training methods

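To make the tiered pricing above concrete, the arithmetic can be put into a small helper. This is an illustrative sketch only: the function name is hypothetical, it assumes cached tokens are billed at the cache rate in place of the input rate, and it ignores the separate hourly cache-storage fee, so actual billing may differ.

```python
def gemini3_pro_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate USD cost of one Gemini 3 Pro API request from the listed prices."""
    if input_tokens <= 200_000:
        in_rate, cache_rate, out_rate = 2.00, 0.20, 12.00   # $/million tokens
    else:
        in_rate, cache_rate, out_rate = 4.00, 0.40, 18.00
    uncached = input_tokens - cached_tokens
    return (uncached * in_rate + cached_tokens * cache_rate + output_tokens * out_rate) / 1e6

# A 100k-token prompt producing 10k output tokens:
print(round(gemini3_pro_cost(100_000, 10_000), 3))  # 0.32
```

Note how output tokens dominate the bill at these rates, which matters for the long reasoning traces discussed below.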
Yes, but: Gemini 3 Pro uses a lot of tokens to achieve its outstanding performance. Completing the Artificial Analysis Intelligence Index, a weighted average of 10 benchmarks, cost $1,201, second only to Grok 4 ($1,888). It also tends to produce incorrect output in cases where it could decline to answer. Tested on the Artificial Analysis Omniscience Hallucination Rate, the proportion of wrong answers out of all non-correct attempts including refusals, Gemini 3 Pro (88 percent) scored far higher than Claude Sonnet 4.5 (48 percent) and GPT-5.1 High (5 percent).

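Reading that metric's definition literally, a tiny helper makes the arithmetic concrete. The function name and counts here are hypothetical; this is just the ratio described above, wrong answers over all non-correct attempts (wrong answers plus refusals).

```python
def hallucination_rate(wrong: int, refusals: int) -> float:
    """Wrong answers as a share of all non-correct attempts (wrong + refusals)."""
    non_correct = wrong + refusals
    return wrong / non_correct if non_correct else 0.0

# A model that answers incorrectly 88 times and refuses 12 times across its
# 100 non-correct attempts scores 0.88 -- it rarely defers when unsure.
print(hallucination_rate(88, 12))  # 0.88
```

A lower score on this metric rewards refusing over guessing, which is why a model can top accuracy leaderboards yet fare poorly here.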
Nano Banana Pro: Google also launched Nano Banana Pro (also known as Gemini 3 Pro Image), which currently tops Artificial Analysis’ Text-to-Image and Image Editing leaderboards. Nano Banana Pro uses Gemini 3 Pro’s reasoning and knowledge when producing and editing images, generating up to two intermediate images to refine composition and logic before producing the final image. It’s designed to excel at text generation and to maintain up to 5 consistent characters across multiple generations. It grounds images using Google search to make factually accurate infographics, maps, and the like and translates or alters text within images while preserving artistic style.

Input/output: Text or images in (up to 1 million tokens, up to 14 reference images); images out (up to 64,000 tokens; 1024x1024, 2048x2048, or 4096x4096 pixel resolution)
Architecture: Based on Google Gemini 3 Pro
Training: Same as Google Gemini 3 Pro

Features: Outputs watermarked using SynthID; default reasoning that refines composition before final output; integration with Google Search and creative tools like Adobe and Figma; editing of multiple characters, text, and doodles (user sketches on images)
Performance: In Google's human evaluations, Nano Banana Pro earned higher ratings in all tasks tested compared to OpenAI GPT-Image 1, Gemini 2.5 Flash Image, ByteDance Seedream v4, and Black Forest Labs Flux Pro Kontext Max. In a test of text rendering, Nano Banana Pro (1,198 Elo) outperformed the next-best GPT-Image 1 (1,150 Elo). Producing infographics, Nano Banana Pro (1,268 Elo) outperformed Gemini 2.5 Flash Image (1,162 Elo).

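For intuition about what those Elo gaps imply, one can apply the standard Elo expected-score formula. This is a simplification: arena-style leaderboards use Elo-like rating schemes that are related but not necessarily identical, so treat the numbers as rough head-to-head preference probabilities.

```python
def elo_win_prob(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Nano Banana Pro (1,198) vs. GPT-Image 1 (1,150) in text rendering:
print(round(elo_win_prob(1198, 1150), 3))  # 0.569
```

A 48-point gap thus corresponds to being preferred roughly 57 percent of the time, a real but modest edge.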
Availability: Via the Gemini app (globally) when selecting Thinking and Create Images (quotas based on tier, free tier included), AI Mode in Google Search (only for U.S.-based Google AI Pro and Ultra subscribers), Google Ads, Google Workspace (Slides and Vids), NotebookLM, Gemini API, Google AI Studio, Vertex AI, and Google Antigravity; API $0.0011 per input image, $0.134 (1024x1024 or 2048x2048 pixel resolution) or $0.24 (4096x4096 pixel resolution) per output image
Knowledge cutoff: January 2025
Undisclosed: Parameter count, architecture details, training methods

Behind the news: Google rolled out Gemini 3 Pro and Nano Banana Pro more broadly than Anthropic’s August launch of Claude Opus 4.1 or OpenAI’s early-November launch of GPT-5.1. Rather than leading with an API and a handful of new apps, Google pushed its new models into services that reach over 2 billion people each month, including Google Search’s AI Overview, Gmail, Docs, Sheets, and Android. At the same time, it launched Antigravity, an agentic coding platform that competes with tools like Cursor and Claude Code.

Why it matters: After trailing OpenAI and Anthropic on many benchmarks for months, Google now leads on many of them (despite a partial upset by Claude Opus 4.5, which arrived a week later). For developers who are evaluating which model to use, this could change their default option. Broadly, benchmark leadership has shifted multiple times in 2025, which suggests that no single company has established a durable technical lead.

We’re thinking: While Gemini 3 Pro defines the state of the art for more than a dozen popular benchmarks — this week, at least! — Google’s market power and edge in distribution may matter more. Its ability to deploy to billions of users instantly through its established products provides a wide moat that most competitors, apart from Apple with its iPhone empire, may find difficult to traverse purely by releasing better models.


Microsoft and Anthropic Form Alliance

Having recently revised its agreement with longtime partner OpenAI, Microsoft pledged to invest billions of dollars in Anthropic, one of OpenAI's top competitors.

What’s new: Microsoft, Anthropic, and Nvidia formed a partnership. Microsoft and Nvidia will invest up to $10 billion and $5 billion, respectively, in Anthropic. Microsoft will make Anthropic models available on its cloud platform, and Anthropic will purchase $30 billion of inference processing on Microsoft’s infrastructure. Further terms, including whether some of the investments are optional or conditional on Anthropic’s performance, were undisclosed.

How it works: The deal makes Anthropic’s Claude the only top model family to be available on all three leading cloud services: Microsoft, Google, and Amazon. It also gives Anthropic’s valuation a big boost. Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.1 are available in a preview on Microsoft Foundry. Microsoft also integrated the models into Excel’s agent mode, enabling them to build, edit, and evaluate spreadsheets.

Anthropic committed to buy inference capacity on Azure and to contract up to 1 gigawatt of additional capacity on Azure's Nvidia Grace Blackwell and Vera Rubin hardware at an undisclosed price. This is similar to the "tens of billions" in capacity Anthropic contracted to buy from Google in October. Nvidia and Anthropic will work together to develop Anthropic models to work on Nvidia hardware and to optimize Nvidia GPUs for Anthropic models. Claude previously ran primarily on Amazon or Google hardware. The investments value Anthropic at about $350 billion, up from its $183 billion valuation in September, according to CNBC.

Behind the news: Microsoft’s 2022 partnership with OpenAI set the stage for Anthropic’s 2023 alliance with Amazon, matching one startup AI company with an established cloud provider. But Anthropic’s later agreements with Google and OpenAI’s recapitalization and restructuring of its relationship with Microsoft made it easier for Microsoft and Anthropic to find common ground.

An October revision of the earlier agreement between Microsoft and OpenAI gave Microsoft a 27 percent stake in OpenAI’s new, for-profit subsidiary and 20 percent of OpenAI’s revenue until that company achieves AGI, as determined by a panel of experts. Microsoft can use OpenAI’s models until 2032, but that right is not exclusive, and OpenAI can work with cloud providers for some operations.

In September, Microsoft made Claude models available in its Copilot coding assistants and Microsoft 365 productivity suite. Subsequently, it allowed them to access documents and emails stored in its cloud. As early as fall 2023, Microsoft sought to reduce its dependence on OpenAI and develop its own cutting-edge AI capabilities. A year later, the relationship had frayed as OpenAI sought to restructure and forged a separate cloud deal with Oracle. Meanwhile, Microsoft hired Inflection AI co-founder Mustafa Suleyman to integrate its AI technology into consumer products.

In October 2023, Anthropic agreed to train its models exclusively on Amazon’s infrastructure for up to $4 billion. The same month, Anthropic partnered with Google for $2 billion, making Google its inference partner for Claude.

Why it matters: A few years ago, OpenAI was the rising AI star in need of processing power, and Microsoft needed both technology to compete with peers and customers for its Azure platform. Their partnership, in which Microsoft invested $17 billion over a few rounds, served both companies. Today, however, OpenAI needs more processing power than Microsoft will provide, while Microsoft needs to diversify its AI offerings. Meanwhile, Anthropic's models have become so popular, especially among the business customers that Microsoft typically caters to, that they make a good match for Microsoft's cloud offerings. An investment in Anthropic, even at a heightened valuation, puts Microsoft (and Nvidia) in line to benefit as AI continues to go mainstream.

We’re thinking: Wheeling and dealing aside, developers increasingly have access to the model they want, on the cloud platform they want. This is good news for everyone who hates being locked into a single choice.


Record Labels Back AI-Music Startup

A music-generation newcomer emerged from stealth mode with licenses to train generative AI models on music controlled by the world's biggest recording companies.

What’s new: Klay Vision, based in Los Angeles, became the first AI company to sign licensing agreements with all three major record labels — Sony Music Entertainment (SME), Universal Music Group (UMG), and Warner Music Group (WMG) — and the publishing companies that own the rights to the underlying compositions their recordings are based on. The agreements, whose financial terms are undisclosed, authorize Klay to train generative models on music whose copyrights are owned by those companies. The startup plans to launch a subscription streaming platform that enables listeners to customize existing music while compensating copyright owners, and it aims to cut similar deals with independent record labels, publishers, artists, and songwriters.

How it works: Unlike music generators that produce original music according to a text prompt, Klay’s system will allow users to alter existing recordings interactively, for instance, changing their mix or style, in a manner the company calls “active listening.” Klay is building a model trained on licensed recordings only. It provided no details about how the model was built or its capabilities. In addition, the company has developed an attribution system that identifies recordings that contribute to the model’s output, enabling it to compensate copyright owners.

Payments likely will be dispensed on a per-stream basis. In recent negotiations between record labels, including UMG and WMG, and AI startups, including Klay, Suno, Udio, ElevenLabs, and Stability AI, the labels pushed for the sort of per-play compensation paid by streaming services rather than lump-sum licensing, Financial Times reported.

Klay’s leadership team combines AI cred, record-industry savvy, and digital music distribution experience. It includes Björn Winckler, who contributed to DeepMind’s Lyria music generator; Thomas Hesse, formerly a president at SME; and Brian Whitman, who became a principal scientist at Spotify after that company acquired a music data startup he founded.

Behind the news: The partnership between Klay and the music-industry powers follows years of litigation in which copyright owners have sued AI companies over alleged copyright violations. Klay was founded in 2021 and "set out to earn the trust of artists and songwriters," according to its CEO Ary Attie. In October 2024, UMG announced a "strategic collaboration" with Klay. Klay took the following year to build a licensing framework that would enable artists, record labels, and music publishers to control the use of their intellectual property by AI models and to be compensated for music generated by models trained on their works.

AI hit the mainstream music scene in 2023 as fans cloned the voices of artists including Drake and The Weeknd, Oasis, Eminem, and The Beach Boys to produce recordings of songs the singers themselves never sang. The experimental pop artist Grimes seized the moment to enable her fans to use her voice in their own productions.

In 2024, the startups Suno and Udio launched services that offered text-to-music to anyone with a web browser. Their offerings created songs in virtually any style, complete with lyrics, based on prompts that described the desired song’s style, subject matter, and other attributes.

Last year, SME, UMG, and WMG filed suits against Suno and Udio for alleged infringement of their intellectual property. In summer 2025, a fake band called Velvet Sundown racked up more than 500,000 streams on Spotify. The uploader didn't disclose that the music was generated, but online sleuths discovered the ruse based on artifacts typical of generated output.

In mid-November, UMG and WMG settled with Udio, which agreed to disable downloads of generated music and build its own streaming service, and partnered with Stability AI to develop AI-powered tools for professional musicians, songwriters, and producers. This week, WMG settled with Suno, but SME’s and UMG’s lawsuits are ongoing.

Why it matters: The market for AI-generated music is still taking shape, but judging by events to date, it has a promising future. Suno, for the time being, aims to build a market for generated music under the assumption that training AI systems on copyright-protected recordings is fair use, an assumption that will require a court decision or a change in the law to confirm. Klay's strategy contrasts sharply with that approach. Instead, Klay focused on obtaining licenses and compensating copyright owners, which gives it legal protection against claims of copyright infringement as well as goodwill and support from the music industry.

We're thinking: The difference between music-generation pioneers and Klay echoes the situation circa 2000, when a startup called Napster gave music fans the means to distribute music files, which it claimed was fair use. Apple launched iTunes in 2001 as an industry-friendly distribution service that provided a legitimate alternative. iTunes made it easier for listeners to play what they wanted to hear, it gave copyright owners revenue, and the industry welcomed it. Similarly, Klay aims to give the music industry a way to make money on generated music in a way that complements, rather than cannibalizes, its existing business.


Toward Steering LLM Personality

Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.

What’s new: Runjin Chen and colleagues at Anthropic, UT Austin, UC Berkeley, and the AI safety labs Constellation and Truthful AI identified persona vectors, or patterns in a large language model’s layer outputs that correspond to specific character traits. They built an automated pipeline to attenuate or amplify these vectors using natural-language descriptions.

最新消息: 来自 Anthropic、德克萨斯大学奥斯汀分校、加州大学伯克利分校以及 AI 安全实验室 Constellation 和 Truthful AI 的 Runjin Chen 及其同事识别出了人格向量(persona vectors),即大语言模型层输出中对应于特定性格特征的模式。他们构建了一个自动化管道,利用自然语言描述来减弱或放大这些向量。

Key insight: Averaging the outputs of a particular layer while a model processes several examples that exhibit a trait (like “evil”) produces a representation of the trait (as well as anything else the outputs have in common, such as a particular language or sentence structure). To produce a representation of the trait alone, you can subtract the average representation of its opposite from the average representation of the trait (which removes the common features). The resulting representation can be used as a lever to control the model's personality. For instance, adding it to the model’s internal state while it generates output can amplify the trait, while subtracting it can attenuate it.

关键洞察: 当模型处理若干表现出某种特征(如“邪恶”)的示例时,对特定层的输出进行平均,可以产生该特征的表征(以及这些输出共有的其他任何内容,如特定的语言或句子结构)。为了仅产生该特征的表征,可以从该特征的平均表征中减去其相反特征的平均表征(这会去除共有特征)。由此产生的表征可以用作控制模型个性的杠杆。例如,在模型生成输出时将其添加到模型的内部状态可以放大该特征,而减去它则可以减弱该特征。

How it works: The authors’ pipeline takes a trait as input and calculates the corresponding persona vector from a target large language model (LLM), specifically Qwen2.5-7B or Llama-3.1-8B. Given a trait’s name (such as evil) and description (“evil refers to behavior that reflects clear, harmful intent toward others . . . ”), Claude 3.7 Sonnet generated pairs of system prompts that emphasized the trait and its opposite (for instance, “You are an evil AI” and “You are a helpful AI”). In addition, it generated a set of 40 neutral questions (such as “What is your view on how to treat animals?”) that would provoke a response that reflected the prompted trait.

运作方式: 作者的管道以一个特征作为输入,并从目标大语言模型(LLM)(具体为 Qwen2.5-7B 或 Llama-3.1-8B)计算相应的人格向量。 给定一个特征的名称(如“邪恶”)和描述(“邪恶是指反映出对他人有明显有害意图的行为……”),Claude 3.7 Sonnet 生成了成对的系统提示词,分别强调该特征及其对立面(例如,“你是一个邪恶的 AI”和“你是一个乐于助人的 AI”)。此外,它还生成了一组 40 个中性问题(如“你对如何对待动物有什么看法?”),这些问题会引发反映提示特征的回答。

Given each of the contrasting system prompts and a question, the target LLM generated 10 responses. The authors computed the difference in the average representation of responses that exhibited the trait (“They should suffer and die”) and those that did not (“We should treat them with kindness”). They call this difference the persona vector.

针对每一个对比系统提示词和一个问题,目标 LLM 生成 10 个回答。作者计算了表现出该特征的回答(“它们应该受苦并死去”)与未表现出该特征的回答(“我们应该善待它们”)在平均表征上的差异。他们称这种差异为人格向量。
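The difference-of-means computation above can be sketched in a few lines. The snippet below is a toy numpy illustration, not the authors' code: random vectors stand in for real layer activations, and `shared` and `trait` are made orthogonal purely to keep the demonstration clean.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Features common to all responses (e.g., language, sentence structure)
# and the direction that encodes the trait itself.
shared = rng.normal(size=d)
trait = rng.normal(size=d)
trait -= (trait @ shared) / (shared @ shared) * shared  # orthogonalize for a clean demo

# Toy layer outputs: every response carries the shared component plus noise;
# trait-exhibiting responses also carry the trait direction.
with_trait = np.stack([shared + trait + 0.1 * rng.normal(size=d) for _ in range(50)])
without_trait = np.stack([shared + 0.1 * rng.normal(size=d) for _ in range(50)])

# Averaging each set and subtracting cancels the shared component,
# leaving (approximately) the trait direction: the persona vector.
persona_vector = with_trait.mean(axis=0) - without_trait.mean(axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(persona_vector, trait))   # close to 1: the trait direction survives
print(cos(persona_vector, shared))  # close to 0: the shared features cancel
```

In the real pipeline, the averaged vectors come from a chosen layer of Qwen2.5-7B or Llama-3.1-8B while it generates trait-exhibiting versus trait-free responses; the arithmetic is the same.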

Results: The authors extracted persona vectors for three traits: evil, sycophancy, and the tendency to hallucinate. They used the persona vectors to test three things: to what degree the system prompts induced the traits, to what degree they could steer LLM behavior, and to what degree they could predict how fine-tuning on a particular dataset would affect the LLM’s expression of a trait. They used GPT-4.1-mini to measure an LLM’s trait expression, a score that evaluated a trait’s intensity in the LLM’s response.

结果: 作者提取了三种特征的人格向量:邪恶、阿谀奉承和产生幻觉的倾向。他们使用人格向量测试了三件事:系统提示词在多大程度上诱发了这些特征,它们在多大程度上可以引导 LLM 的行为,以及它们在多大程度上可以预测特定数据集上的微调对 LLM 特征表达的影响。他们使用 GPT-4.1-mini 来测量 LLM 的特征表达,该分数评估了 LLM 回答中特征的强度。

They monitored prompt-induced behavioral shifts by selecting a layer and comparing its outputs (after the last prompt token) to the persona vector. Overall, they found that the more similar the two vectors, the higher the trait expression.

他们通过选择一个层并将其输出(在最后一个提示 Token 之后)与人格向量进行比较,来监测提示词诱发的行为转变。总的来说,他们发现两个向量越相似,特征表达就越高。
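A toy sketch of that monitoring signal (numpy stand-ins, not the authors' code; in the real pipeline the activation would be the chosen layer's output after the final prompt token, and `persona_vector` would come from the extraction described earlier):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
persona_vector = rng.normal(size=d)  # stand-in for an extracted persona vector
base_state = rng.normal(size=d)      # stand-in for a trait-free activation

def trait_similarity(activation, persona):
    """Cosine similarity between an activation and the persona vector,
    used as a monitor for prompt-induced behavioral shifts."""
    return float(activation @ persona / (np.linalg.norm(activation) * np.linalg.norm(persona)))

# Mixing more of the persona direction into the activation raises the
# similarity, mirroring the finding that higher similarity goes with
# higher trait expression.
for strength in (0.0, 0.5, 1.0, 2.0):
    act = base_state + strength * persona_vector
    print(strength, round(trait_similarity(act, persona_vector), 3))
```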

They steered LLM behavior during generation by adding persona vectors to a layer’s outputs, or subtracting them, to amplify or attenuate a trait. Subtracting persona vectors at inference reduced the average trait expression, but it also degraded performance on MMLU. In contrast, when they added a persona vector during fine-tuning, the LLM showed reduced trait expression without degraded MMLU performance. Adding the vector during fine-tuning, rather than subtracting it, essentially removed the LLM's incentive to learn to produce activations ever more similar to the persona vector in order to fit the fine-tuning data.

他们在生成过程中通过向层输出添加或减去人格向量来引导 LLM 行为,以放大或减弱某种特征。在推理时减去人格向量虽然降低了平均特征表达,但也损害了在 MMLU 上的表现。相比之下,在微调时添加人格向量,LLM 的特征表达降低,且 MMLU 性能未受影响。在微调期间添加(而非减去)该向量,实质上消除了 LLM 为拟合微调数据而学习产生与人格向量越来越相似的激活的动机。
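The steering arithmetic itself is simple. Below is a minimal numpy sketch under toy assumptions: in practice the vector is added to (or subtracted from) a transformer layer's hidden states during generation, and trait expression is judged by an LLM, not by the dot-product proxy used here.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
persona_vector = rng.normal(size=d)  # stand-in for an extracted persona vector
hidden_state = rng.normal(size=d)    # stand-in for a layer output mid-generation

def trait_score(hidden):
    """Toy proxy for trait expression: projection onto the trait direction."""
    return float(hidden @ persona_vector)

alpha = 2.0  # steering strength
amplified = hidden_state + alpha * persona_vector   # amplify the trait
attenuated = hidden_state - alpha * persona_vector  # attenuate the trait

print(trait_score(attenuated), trait_score(hidden_state), trait_score(amplified))
```

Adding raises the projection and subtracting lowers it by construction; the empirical findings above concern what this does to generated text and benchmark scores.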

The authors compared the responses of the LLM prior to fine-tuning with the ground truth in 8 fine-tuning datasets to predict how the fine-tuning data would affect the LLM’s trait expression. Specifically, they generated responses to the fine-tuning data and captured the outputs of a particular layer while processing the responses. They also captured the outputs of the same layer while the LLM processed the ground truth. Then they measured the difference and computed the similarity between the difference and the persona vector. The higher the similarity, the more the fine-tuning data increased the LLM’s trait expression after fine-tuning.

作者将微调前 LLM 的回答与 8 个微调数据集中的基本事实(ground truth)进行了比较,以预测微调数据将如何影响 LLM 的特征表达。具体来说,他们生成了对微调数据的回答,并在处理回答时捕获了特定层的输出。他们还在 LLM 处理基本事实时捕获了同一层的输出。然后他们测量了差异,并计算了该差异与人格向量之间的相似度。相似度越高,微调数据在微调后增加 LLM 特征表达的程度就越大。
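Under the same toy assumptions, the screening score reduces to projecting the mean activation shift (ground truth minus the base model's own responses) onto the persona direction. The snippet below uses synthetic activations, and `projection_shift` is a hypothetical helper name, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
persona_unit = rng.normal(size=d)
persona_unit /= np.linalg.norm(persona_unit)  # unit-length persona direction

def projection_shift(resp_acts, truth_acts, persona_unit):
    """Mean activation difference (ground truth minus the base model's own
    responses), projected onto the persona direction. The higher the value,
    the more the dataset is predicted to push the model toward the trait."""
    diff = truth_acts.mean(axis=0) - resp_acts.mean(axis=0)
    return float(diff @ persona_unit)

# Synthetic layer activations: the base model's responses, plus two candidate
# fine-tuning datasets, one whose labels lean toward the trait and one neutral.
responses = rng.normal(size=(100, d))
trait_leaning = responses + 0.8 * persona_unit + 0.1 * rng.normal(size=(100, d))
neutral = responses + 0.1 * rng.normal(size=(100, d))

print(projection_shift(responses, trait_leaning, persona_unit))  # clearly positive
print(projection_shift(responses, neutral, persona_unit))        # near zero
```

Ranking candidate datasets (or individual samples) by this score is what enables the screening use case described below.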

Why it matters: This work gives machine learning engineers a tool for managing an LLM’s personality proactively. Instead of discovering that an LLM has become sycophantic only after fine-tuning, they can use persona vectors to screen fine-tuning data beforehand and flag entire datasets or individual samples that are likely to cause unwanted shifts. This makes the fine-tuning process more predictable, as one can forecast possible persona shifts, and the outputs safer.

重要意义: 这项工作为机器学习工程师提供了一种主动管理 LLM 个性的工具。他们不必等到微调后才发现 LLM 变得阿谀奉承,而是可以使用人格向量预先筛选微调数据,并标记可能导致不良转变的整个数据集或单个样本。这使得微调过程更可预测(因为可以预测可能的人格转变),输出也更安全。

We’re thinking: Representing an LLM’s personality traits as vectors offers a tool to adjust its personality. This suggests that even high-level behavioral tendencies in LLMs may be structured and editable.

我们的思考: 将 LLM 的性格特征表示为向量,提供了一种调整其个性的工具。这表明,即使是 LLM 中高层次的行为倾向,也可能是结构化的且可编辑的。

━━━ fin ━━━

If you read this far — thank you.
Come tell me what you thought on X.