Comgra: A Tool for Analyzing and Debugging Neural Networks
Abstract: Neural networks are notoriously difficult to inspect. We introduce comgra, an open-source Python library for use with PyTorch. Comgra extracts data about the internal activations of a model and organizes it in a graphical user interface (GUI). It can show both summary statistics and individual data points, compare early and late stages of training, focus on individual samples of interest, and visualize the flow of the gradient through the network. This makes it possible to inspect the model's behavior from many different angles and to save time by rapidly testing different hypotheses without having to rerun the model. Comgra has applications in debugging, neural architecture design, and mechanistic interpretability. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at https://github.com/FlorianDietz/comgra.
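Comgra's own API is documented in its repository and is not reproduced here. As a rough illustration of the kind of data such a tool collects, the following sketch uses only standard PyTorch hooks (`register_forward_hook` on modules and `register_hook` on tensors) to record summary statistics of each submodule's activations and the norm of the gradient flowing back into them. The function name `record_activation_stats`, the toy model, and the squared-output loss are hypothetical choices made for illustration; this is not how comgra itself is implemented.

```python
import torch
import torch.nn as nn


def record_activation_stats(model: nn.Module, inputs: torch.Tensor) -> dict:
    """Hypothetical helper: one forward/backward pass, collecting per-module stats."""
    stats = {}
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # Summary statistics of this module's forward activation.
            entry = stats.setdefault(name, {})
            entry["act_mean"] = output.detach().mean().item()
            entry["act_std"] = output.detach().std().item()
            # Tensor hook: fires during backward with the gradient w.r.t. this output.
            if output.requires_grad:
                def grad_hook(grad):
                    entry["grad_norm"] = grad.norm().item()
                output.register_hook(grad_hook)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root container, whose name is ""
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        # Placeholder loss, assumed only so that a backward pass exists.
        loss = model(inputs).pow(2).mean()
        loss.backward()
    finally:
        # Always detach the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return stats


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
stats = record_activation_stats(model, torch.randn(16, 4))
for name, entry in stats.items():
    print(name, entry)
```

A dedicated tool adds what this sketch lacks: storing such values across many training steps and individual samples, and browsing them interactively instead of printing them.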