CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning (2306.17462v2)
Abstract: We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, and model generalization and robustness. These methods are included in the toolbox as PyTorch implementations for NVIDIA computing systems. The toolbox includes training and inference code as well as model weights. We believe it is by far the most complete toolbox for visual-linguistic causal reasoning. We hope that the toolbox and benchmark will serve the growing research community by providing a flexible toolkit for re-implementing existing methods and developing new causal reasoning methods. Code and models are available at https://github.com/HCPLab-SYSU/CausalVLR. The project is under active development by HCP-Lab's contributors, and we will keep this document updated.