YuyaoGe's Website
Adversarial Attack
Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models
Abstract: Prompt-based adversarial attacks have become an effective means to assess the robustness of large language models (LLMs). However, existing approaches often treat prompts as monolithic text, overlooking their structural heterogeneity: different prompt components contribute unequally to adversarial robustness.
Yujia Zheng, Tianhao Li, Haotian Huang, Tianyu Zeng, Jingyu Lu, Chuangxin Chu, Yuekai Huang, Ziyou Jiang, Qian Xiong, Yuyao Ge 葛钰峣, Mingyang Li
Aug 3, 2025
PDF · Cite · DOI · arXiv
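To make the premise of the paper above concrete, here is a toy sketch of component-wise prompt perturbation: the prompt is split into labelled components and only one component is perturbed at a time so their robustness can be compared. The component names, the character-swap perturbation, and the commented-out scoring call are illustrative assumptions, not the paper's actual protocol.

```python
import random

def perturb(text: str, rate: float = 0.05) -> str:
    """Randomly swap a small fraction of letters as a crude perturbation."""
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and random.random() < rate:
            chars[i] = random.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

# Assumed dissection of a prompt into components (not taken from the paper).
prompt_components = {
    "instruction": "You are a helpful assistant. Answer concisely.",
    "task": "Classify the sentiment of the following review.",
    "demonstration": "Review: 'Great battery life.' -> positive",
    "query": "Review: 'The screen cracked after a week.' ->",
}

for name, text in prompt_components.items():
    attacked = dict(prompt_components)
    attacked[name] = perturb(text)        # perturb only this component
    full_prompt = "\n".join(attacked.values())
    # score = evaluate_llm(full_prompt)   # hypothetical robustness metric
    print(f"perturbed component: {name}\n{full_prompt}\n")
```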
Paper Sharing | Broad Decoding Strategies Lead to Large Language Model Jailbreaks
In this post, the authors propose a new dataset, MaliciousInstruct; a method for evaluating the toxicity of model responses; an attack that manipulates decoding hyperparameters, called generation exploitation; and an alignment strategy, generation-aware alignment.
Yuyao Ge 葛钰峣
Apr 9, 2024
1 min read
Paper Sharing
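As a minimal sketch of the decoding-hyperparameter idea described in the post above: instead of editing the prompt, the attack sweeps sampling settings and keeps the worst-case completion. The model name, the hyperparameter grid, and the prompt placeholder below are illustrative assumptions, not the paper's exact configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed example; any open chat model works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "..."  # a harmful instruction drawn from a benchmark such as MaliciousInstruct
inputs = tok(prompt, return_tensors="pt")

candidates = []
# Sweep decoding hyperparameters rather than modifying the prompt itself.
for temperature in (0.7, 1.0, 1.5):
    for top_p in (0.7, 0.9, 1.0):
        out = model.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            max_new_tokens=128,
        )
        text = tok.decode(out[0], skip_special_tokens=True)
        candidates.append((temperature, top_p, text))

# A separate toxicity / harmfulness scorer would then select the worst-case
# completion among `candidates`; that scorer is out of scope for this sketch.
```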
Softmax Regression and Its Optimization
This post is part of a series of notes from my study of the Deep Learning Systems course taught by Tianqi Chen and J. Zico Kolter at CMU.
Yuyao Ge 葛钰峣
Mar 21, 2024
3 min read
Notes
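For orientation, the central object in that post is the softmax cross-entropy loss of a linear classifier; the formulation below is the standard one rather than a quote from the notes.

$$
\ell(z, y) = -\log \frac{\exp(z_y)}{\sum_{j=1}^{k} \exp(z_j)} = \log \sum_{j=1}^{k} \exp(z_j) - z_y,
\qquad
\frac{\partial \ell}{\partial z} = \operatorname{softmax}(z) - e_y,
$$

where $z = \Theta^\top x$ are the logits of the linear model, $y$ is the true class, and $e_y$ is the one-hot vector for $y$.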
Attack based on data: A novel perspective to attack sensitive points directly
Adversarial attacks on time-series classification models have been widely explored and many attack methods have been proposed. But there is not a …
Yuyao Ge 葛钰峣, Zhongguo Yang, Lizhe Chen, Yiming Wang, Chengyang Li
PDF · Cite · Dataset · DOI