Research Publications
Explore our previous research on ensuring reliability and interpretability in AI systems.
December 2025 • arXiv Preprint
Mechanistic Interpretability of Antibody Language Models Using SAEs
Using sparse autoencoders to interpret an antibody language model, revealing biologically meaningful latent features and enabling precise generation control.
Interpretability · Antibody · Sparse Autoencoders
Rebonto Haque, Oliver M. Turnbull, +4 authors · arXiv
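The sparse-autoencoder technique this paper applies can be sketched in a few lines: encode model activations into an overcomplete latent space with a ReLU, decode back, and train against reconstruction error plus an L1 sparsity penalty. Everything below (dimensions, random weights, the L1 coefficient) is illustrative, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent dimension is larger than the model dimension (overcomplete dictionary).
d_model, d_latent = 8, 32
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(0, 0.1, (d_latent, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations, apply ReLU to get sparse latents, decode."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse, non-negative features
    x_hat = z @ W_dec + b_dec                # reconstruction of the activations
    return z, x_hat

x = rng.normal(size=(4, d_model))            # a toy batch of model activations
z, x_hat = sae_forward(x)

recon_loss = np.mean((x - x_hat) ** 2)       # reconstruction term
l1_penalty = np.mean(np.abs(z))              # sparsity term
loss = recon_loss + 1e-3 * l1_penalty        # coefficient is illustrative
```

Interpretability work then inspects which inputs make each latent feature fire; the papers below vary the training objective, not this basic shape.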
December 2025 • arXiv Preprint
Enforcing Orderedness to Improve Feature Consistency
Introducing Ordered Sparse Autoencoders (OSAE), which impose a strict ordering on latent features, improving feature consistency over Matryoshka baselines.
Interpretability · Sparse Autoencoders · Feature Consistency
Sophie L. Wang, Alex Quach, +2 authors · arXiv
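One way to read the Matryoshka baseline this paper compares against: reconstruct using only the first k latent features for several nested prefix sizes k, so earlier features carry more responsibility and tend toward a stable order. The sketch below is a hypothetical illustration of that nested objective, not OSAE itself or the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_latent = 8, 32
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
W_dec = rng.normal(0, 0.1, (d_latent, d_model))

x = rng.normal(size=(4, d_model))            # toy activations
z = np.maximum(x @ W_enc, 0.0)               # sparse latents

# Reconstruct with nested prefixes of the latent features.
prefix_losses = {}
for k in (4, 8, 16, 32):                     # prefix sizes are illustrative
    x_hat_k = z[:, :k] @ W_dec[:k]           # decode with first k features only
    prefix_losses[k] = float(np.mean((x - x_hat_k) ** 2))

# Matryoshka-style training sums the reconstruction losses over prefixes.
total_loss = sum(prefix_losses.values())
```

A strict ordering strengthens this: feature 1 should matter at least as much as feature 2, and so on, which is what makes features comparable across training runs.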
February 2025 • ICLR GEM Bio Workshop 2025
Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
Scaling sparse autoencoders to ESM2-3B to enable mechanistic interpretability of protein structure prediction for the first time, with targeted steering of ESMFold outputs.
Interpretability · Protein Structure · Sparse Autoencoders · ESMFold
Nithin Parsan, David J. Yang, +1 author · arXiv
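The "targeted steering" mentioned above typically means shifting a model's activations along a single SAE decoder direction to amplify one interpretable feature. A minimal sketch of that idea, with a random toy decoder standing in for a trained one (the feature index and scale are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_latent = 8, 32
W_dec = rng.normal(0, 0.1, (d_latent, d_model))  # stand-in for a trained SAE decoder

def steer(x, feature_idx, alpha):
    """Shift activations x along one unit-normalized decoder direction."""
    direction = W_dec[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return x + alpha * direction                 # broadcasts over the batch

x = rng.normal(size=(4, d_model))                # toy activations
x_steered = steer(x, feature_idx=3, alpha=2.0)   # amplify feature 3
```

In the structure-prediction setting, the steered activations are fed back through the model so that changes in the output fold can be attributed to one feature.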