Research Publications
Explore our previous research on ensuring reliability and interpretability in AI systems.
December 2025 • arXiv Preprint
Mechanistic Interpretability of Antibody Language Models Using SAEs
Using sparse autoencoders to interpret an antibody language model, revealing biologically meaningful latent features and enabling precise generation control.
Interpretability · Antibody · Sparse Autoencoders
Rebonto Haque, Oliver M. Turnbull, +4 authors · arXiv
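The sparse-autoencoder technique this paper applies can be sketched in a few lines: encode model activations into an overcomplete latent space with a ReLU, decode back, and train against reconstruction error plus an L1 sparsity penalty. Everything below (dimensions, random weights, the L1 coefficient) is illustrative, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent dimension is larger than the model dimension (overcomplete dictionary).
d_model, d_latent = 8, 32
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(0, 0.1, (d_latent, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations, apply ReLU to get sparse latents, decode."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse, non-negative features
    x_hat = z @ W_dec + b_dec                # reconstruction of the activations
    return z, x_hat

x = rng.normal(size=(4, d_model))            # a toy batch of model activations
z, x_hat = sae_forward(x)

recon_loss = np.mean((x - x_hat) ** 2)       # reconstruction term
l1_penalty = np.mean(np.abs(z))              # sparsity term
loss = recon_loss + 1e-3 * l1_penalty        # coefficient is illustrative
```

Interpretability work then inspects which inputs make each latent feature fire; the papers below vary the training objective, not this basic shape.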
December 2025 • arXiv Preprint
Enforcing Orderedness to Improve Feature Consistency
Introducing Ordered Sparse Autoencoders (OSAE), which impose a strict ordering on latent features, improving feature consistency over Matryoshka baselines.
Interpretability · Sparse Autoencoders · Feature Consistency
Sophie L. Wang, Alex Quach, +2 authors · arXiv
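One way to read the Matryoshka baseline this paper compares against: reconstruct using only the first k latent features for several nested prefix sizes k, so earlier features carry more responsibility and tend toward a stable order. The sketch below is a hypothetical illustration of that nested objective, not OSAE itself or the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_latent = 8, 32
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
W_dec = rng.normal(0, 0.1, (d_latent, d_model))

x = rng.normal(size=(4, d_model))            # toy activations
z = np.maximum(x @ W_enc, 0.0)               # sparse latents

# Reconstruct with nested prefixes of the latent features.
prefix_losses = {}
for k in (4, 8, 16, 32):                     # prefix sizes are illustrative
    x_hat_k = z[:, :k] @ W_dec[:k]           # decode with first k features only
    prefix_losses[k] = float(np.mean((x - x_hat_k) ** 2))

# Matryoshka-style training sums the reconstruction losses over prefixes.
total_loss = sum(prefix_losses.values())
```

A strict ordering strengthens this: feature 1 should matter at least as much as feature 2, and so on, which is what makes features comparable across training runs.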
February 2025 • ICLR GEM Bio Workshop 2025
Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
Scaling sparse autoencoders to ESM2-3B to enable mechanistic interpretability of protein structure prediction for the first time, with targeted steering of ESMFold outputs.
Interpretability · Protein Structure · Sparse Autoencoders · ESMFold
Nithin Parsan, David J. Yang, +1 author · arXiv
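The "targeted steering" mentioned above typically means shifting a model's activations along a single SAE decoder direction to amplify one interpretable feature. A minimal sketch of that idea, with a random toy decoder standing in for a trained one (the feature index and scale are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_latent = 8, 32
W_dec = rng.normal(0, 0.1, (d_latent, d_model))  # stand-in for a trained SAE decoder

def steer(x, feature_idx, alpha):
    """Shift activations x along one unit-normalized decoder direction."""
    direction = W_dec[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return x + alpha * direction                 # broadcasts over the batch

x = rng.normal(size=(4, d_model))                # toy activations
x_steered = steer(x, feature_idx=3, alpha=2.0)   # amplify feature 3
```

In the structure-prediction setting, the steered activations are fed back through the model so that changes in the output fold can be attributed to one feature.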