Proposal Report for the 2nd SciCAP Competition 2024 (2407.01897v1)
Abstract: In this paper, we propose a method for document summarization using auxiliary information. This approach effectively summarizes descriptions related to specific images, tables, and appendices within lengthy texts. Our experiments demonstrate that leveraging high-quality OCR data and information initially extracted from the original text enables efficient summarization of the content related to the described objects. Based on these findings, we enhanced popular text generation models by incorporating additional auxiliary branches to improve summarization performance. Our method achieved top scores of 4.33 and 4.66 in the long caption and short caption tracks, respectively, of the 2024 SciCAP competition, ranking highest in both categories.