The interpretability tools built for language models — sparse autoencoders, activation steering, concept probing — are proving equally effective on physics simulators, protein models, and music generators. The convergence suggests something universal about how foundation models organize knowledge.
concept-engineering, representation-engineering, physics-foundation-models