YuyaoGe's Website
Adversarial Attack
Gated Differentiable Working Memory for Long-Context Language Modeling
Abstract: Long contexts challenge transformers: attention scores dilute across thousands of tokens, critical information is often lost in the middle, and models struggle to adapt to novel patterns at inference time.
Lingrui Mei, Shenghua Liu, Yiwei Wang, Yuyao Ge 葛钰峣, Baolong Bi, Jiayu Yao, Jun Wan, Ziling Yin, Jiafeng Guo, Xueqi Cheng
Jan 19, 2026
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Abstract: Recent advances in reinforcement learning (RL) have significantly improved the complex reasoning capabilities of large language models (LLMs). Despite these successes, existing methods mainly focus on single-domain RL (e.g., mathematics) with verifiable rewards (RLVR), and their reliance on purely online RL frameworks restricts the exploration space, thereby limiting reasoning performance.
Baolong Bi, Shenghua Liu, Yiwei Wang, Siqian Tong, Lingrui Mei, Yuyao Ge 葛钰峣, Yilong Xu, Jiafeng Guo, Xueqi Cheng
Nov 15, 2025
Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models
Abstract: Prompt-based adversarial attacks have become an effective means to assess the robustness of large language models (LLMs). However, existing approaches often treat prompts as monolithic text, overlooking their structural heterogeneity: different prompt components contribute unequally to adversarial robustness.
Yujia Zheng, Tianhao Li, Haotian Huang, Tianyu Zeng, Jingyu Lu, Chuangxin Chu, Yuekai Huang, Ziyou Jiang, Qian Xiong, Yuyao Ge 葛钰峣, Mingyang Li
Aug 3, 2025
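Paper Sharing | Diverse Decoding Strategies Lead to LLM Jailbreaks
In this paper, the authors propose a new dataset, MaliciousInstruct; a method for evaluating the toxicity of model responses; an attack that manipulates decoding hyperparameters, called generation exploitation; and an alignment strategy, called generation-aware alignment.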
Yuyao Ge 葛钰峣
Apr 9, 2024
1 min read
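Paper Sharing
Softmax Regression and Its Optimization Problem
This post is part of the author's series of course notes on Deep Learning Systems, the CMU course taught by Tianqi Chen and J. Zico Kolter.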
Yuyao Ge 葛钰峣
Mar 21, 2024
3 min read
Notes
Attack based on data: A novel perspective to attack sensitive points directly
Adversarial attacks on time-series classification models have been widely explored, and many attack methods have been proposed. But there is not a …
Yuyao Ge 葛钰峣, Zhongguo Yang, Lizhe Chen, Yiming Wang, Chengyang Li