🇨🇳 Hacker News (Simplified Chinese) • June 21, 2025 at 11:14 AM

AbsenceBench: Models Struggle to Identify Missing Information

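1. Introduces AbsenceBench, a benchmark that evaluates how well large language models detect information deliberately removed from a document.

2. Covers three domains: numerical sequences, poetry, and GitHub pull requests.

3. Even state-of-the-art models such as Claude-3.7-Sonnet reach an F1-score of only 69.6% at an average context length of 5K tokens.

4. The analysis attributes the difficulty to the Transformer attention mechanism, which cannot attend directly to "gaps" where content is absent, so models struggle to identify what is missing.

5. Shows that models perform at superhuman levels when retrieving information that is present, yet their performance drops sharply on the closely related task of detecting missing information.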
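
To make the F1-score in the summary concrete, here is a minimal Python sketch of an omission-detection task on a numerical sequence. It is an illustration only, not the benchmark's code: the helper names `make_omission_task` and `f1_score`, the 20% omission rate, and line-level exact matching are all assumptions, and the real scoring procedure may differ.

```python
import random
from collections import Counter


def make_omission_task(lines, omit_fraction=0.2, seed=0):
    """Delete a random subset of lines; return (modified_lines, omitted_lines).

    Hypothetical helper: the omission rate and sampling scheme are assumptions.
    """
    rng = random.Random(seed)
    n_omit = max(1, int(len(lines) * omit_fraction))
    omitted_idx = set(rng.sample(range(len(lines)), n_omit))
    modified = [ln for i, ln in enumerate(lines) if i not in omitted_idx]
    omitted = [ln for i, ln in enumerate(lines) if i in omitted_idx]
    return modified, omitted


def f1_score(predicted, gold):
    """Exact-match F1 between the lines a model reports and the truly omitted lines."""
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    # Multiset intersection counts each correctly reported line at most once.
    true_pos = sum((pred_counts & gold_counts).values())
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(pred_counts.values())
    recall = true_pos / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# A toy numerical sequence standing in for one of the three domains.
original = [str(n) for n in range(10, 30)]
modified, omitted = make_omission_task(original)

# A made-up model answer: it finds half of the omissions and adds one false alarm.
prediction = omitted[: len(omitted) // 2] + ["999"]
print(f"omitted={omitted} f1={f1_score(prediction, omitted):.3f}")
```

In this setup the model would be shown both `original` and `modified` and asked to list what was removed. Precision penalizes lines reported as missing that were never removed, while recall penalizes removed lines the model failed to report.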
