ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

Published in NeurIPS 2025 (Spotlight)

Recommended citation: Butler, L., Agarwal, A., Kang, J.S., Erginbas, Y.E., Yu, B., Ramchandran, K. "ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs". NeurIPS, 2025.