Justin Kang

I am a PhD candidate at UC Berkeley (EECS), affiliated with BAIR and advised by Prof. Kannan Ramchandran. My research develops efficient algorithms for ML interpretability and attribution — explaining which input features, training data, and interactions drive model predictions in LLMs and other large-scale models.

I am on the job market for research scientist and ML engineer positions starting in 2026. CV / Resume

Research Highlights

The ProxySPEX pipeline — scalable feature interaction explanations for LLMs
  • Interpretability & Attribution: I build scalable tools (SPEX, ProxySPEX) that identify important feature interactions in LLMs, achieving up to 20% better faithfulness than prior methods such as SHAP and scaling to 1000+ input features. Try them out in the shapiq library!
  • Signal Processing → ML: I bring a strong signal processing and information theory perspective to ML problems, which leads to unique algorithmic solutions — including sparse Möbius/Fourier transforms for efficient model explanation.
  • Faithfulness of Explanations: I recently led work, in collaboration with Noah Siegel from Google DeepMind, on evaluating whether LLM self-explanations are faithful to actual model behavior.
  • Award-Winning Research: My work on scheduling in massive random access networks won the 2024 IEEE ComSoc & IT Society Joint Paper Award.
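To make the Möbius-transform idea above concrete, here is a minimal brute-force sketch (not the sparse algorithms from SPEX/ProxySPEX, which avoid this exponential cost) that recovers interaction coefficients from a toy value function `f` invented for illustration:

```python
from itertools import combinations

def mobius_transform(f, n):
    """Brute-force Möbius transform of a set function f over n features.

    Returns a dict mapping each subset S (as a frozenset) to its
    interaction coefficient a(S) = sum over T ⊆ S of (-1)^{|S|-|T|} f(T).
    Exponential in n; sparse methods exist precisely to avoid this.
    """
    coeffs = {}
    for r in range(n + 1):
        for S in combinations(range(n), r):
            S = frozenset(S)
            a = 0.0
            for k in range(len(S) + 1):
                for T in combinations(sorted(S), k):
                    a += (-1) ** (len(S) - k) * f(frozenset(T))
            coeffs[S] = a
    return coeffs

# Toy value function: two main effects plus one pairwise interaction.
def f(S):
    val = 0.0
    if 0 in S:
        val += 1.0
    if 1 in S:
        val += 2.0
    if 0 in S and 2 in S:
        val += 5.0  # interaction between features 0 and 2
    return val

coeffs = mobius_transform(f, 3)
print(coeffs[frozenset({0})])     # 1.0  (main effect of feature 0)
print(coeffs[frozenset({0, 2})])  # 5.0  (pairwise interaction recovered)
```

The transform isolates each main effect and interaction exactly; most subsets get a zero coefficient, which is the sparsity that efficient interaction-recovery algorithms exploit.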

Selected Papers

  1. A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior. Mayne, H.(e), Kang, J.S.(e), Gould, D., Ramchandran, K., Mahdi, A., Siegel, N.Y. Preprint 2026 paper

  2. An Odd Estimator for Shapley Values. Fumagalli, F., Butler, L., Kang, J.S., Ramchandran, K., Witter, R.T. Preprint 2026 paper

  3. ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs. Butler, L.(e), Agarwal, A.(e), Kang, J.S.(e), Erginbas, Y.E., Ramchandran, K., Yu, B. NeurIPS 2025 Spotlight paper

  4. SHAP-Zero explains biological sequence models with near-zero marginal cost for future queries. Tsui, D., Musharaf, A., Erginbas, Y.E., Kang, J.S., Aghazadeh. NeurIPS 2025 paper

  5. SPEX: Scaling Feature Interaction Explanations for LLMs. Kang, J.S.(e), Butler, L.(e), Agarwal, A.(e), Erginbas, Y.E., Pedarsani, R., Ramchandran, K., Yu, B. ICML 2025 paper

  6. Learning to Understand: Identifying Interactions via the Möbius Transform. Kang, J.S., Erginbas, Y.E., Butler, L., Pedarsani, R., Ramchandran, K. NeurIPS 2024 paper · video

  7. The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning. Kang, J.S., Pedarsani, R., Ramchandran, K. TMLR 2024 paper · video

Industry Experience

  • Bosch AI Research — Research Intern, working on autolabeling and data filtering (Summer 2025)
  • Google — Student Researcher, Cloud Platforms Systems Research Group (Summer 2024)
  • Intel — Storage Systems Research Intern, Non-Volatile Memory Solutions Group (previously)

Education

  • PhD in EECS, UC Berkeley (in progress)
  • M.A.Sc. in ECE, University of Toronto — advised by Prof. Wei Yu
  • B.A.Sc. in Engineering Physics, University of British Columbia