Tong Wu
(765)-720-4989 tongwu@princeton.edu
I am a PhD candidate at Princeton University, advised by Prof. Prateek Mittal. During my PhD, I have worked as a research intern at Zoom, Microsoft, and NEC Labs. Previously, I earned my Bachelor's and Master's degrees from Washington University in St. Louis, where I was advised by Prof. Yevgeniy Vorobeychik.
My research focuses on the safety challenges posed by increasingly powerful and intelligent large language models (LLMs). I am particularly interested in developing simple, scalable methods grounded in rigorous theoretical principles. Specifically, my work spans:
* Equal contribution
Effectively Controlling Reasoning Models through Thinking Intervention
Tong Wu, Chong Xiang, Jiachen T. Wang, Prateek Mittal
Preprint
Paper
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou
ICLR 2025
Paper
Code
Certifiably Robust RAG against Retrieval Corruption
Chong Xiang*, Tong Wu*, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal
Preprint
Paper
Code
Privacy-Preserving In-Context Learning for Large Language Models
Tong Wu*, Ashwinee Panda*, Jiachen T. Wang*, Prateek Mittal
ICLR 2024
Paper
Code
Poster
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Feiran Jia, Tong Wu, Xin Qin, Anna Squicciarini
Preprint
Paper
Uncovering Adversarial Risks of Test-Time Adaptation
Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
ICML 2023
Paper
Project
Code
Defending against Physically Realizable Attacks on Image Classification
Tong Wu, Liang Tong, Yevgeniy Vorobeychik
ICLR 2020 Spotlight Presentation
Paper
Code
Video
Slides
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
Jiachen T. Wang, Tong Wu, Dawn Song, Prateek Mittal, Ruoxi Jia
NeurIPS 2024 Spotlight Presentation
Paper
Code
Position Paper: Beyond Robustness Against Single Attack Types
Sihui Dai, Chong Xiang, Tong Wu, Prateek Mittal
Preprint
Paper
PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses
Chong Xiang, Tong Wu, Sihui Dai, Jonathan Petit, Suman Jana, Prateek Mittal
USENIX Security 2024
Paper
A Randomized Approach for Tight Privacy Accounting
Jiachen T. Wang, Saeed Mahloujifar, Tong Wu, Ruoxi Jia, Prateek Mittal
NeurIPS 2023
Paper
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, Prateek Mittal
USENIX Security 2023
Paper
Code
Short: Certifiably Robust Perception Against Adversarial Patch Attacks: A Survey
Chong Xiang, Chawin Sitawarin, Tong Wu, Prateek Mittal
VehicleSec 2023
Paper
Video
Slides
Poster
Leaderboard
Best Short/WIP Paper Award Runner-Up
Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation
Tong Wu, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
AISec 2022
Paper
Code
Demo
Adversarial Robustness of Deep Sensor Fusion Models
Shaojie Wang, Tong Wu, Ayan Chakrabarti, Yevgeniy Vorobeychik
WACV 2022
Paper
Code
Systems and methods for defending against physical attacks on image classification
Yevgeniy Vorobeychik, Tong Wu, Liang Tong
US Patent
Patent
Can Optical Trojans Assist Adversarial Perturbations?
Adith Boloor, Tong Wu, Patrick Naughton, Ayan Chakrabarti, Xuan Zhang, Yevgeniy Vorobeychik
AROW Workshop (ICCV 2021)
Paper
REVIEWING:
TEACHING EXPERIENCE:
HONORS & AWARDS: