
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition (2107.12090v2)

Published 26 Jul 2021 in cs.CV

Abstract: Although text recognition has significantly evolved over the years, state-of-the-art (SOTA) models still struggle in in-the-wild scenarios due to complex backgrounds, varying fonts, uncontrolled illumination, distortions and other artefacts. This is because such models depend solely on visual information for text recognition, thus lacking semantic reasoning capabilities. In this paper, we argue that semantic information plays a role complementary to visual information. More specifically, we additionally utilize semantic information by proposing a multi-stage multi-scale attentional decoder that performs joint visual-semantic reasoning. Our novelty lies in the intuition that for text recognition, the prediction should be refined in a stage-wise manner. Therefore our key contribution is in designing a stage-wise unrolling attentional decoder where non-differentiability, invoked by discretely predicted character labels, needs to be bypassed for end-to-end training. While the first stage predicts using visual features, subsequent stages refine on top of it using joint visual-semantic information. Additionally, we introduce multi-scale 2D attention along with dense and residual connections between different stages to deal with varying scales of character sizes, for better performance and faster convergence during training. Experimental results show our approach to outperform existing SOTA methods by a considerable margin.
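
The abstract notes that discretely predicted character labels block gradients between decoder stages. One common way to sidestep this is a Gumbel-softmax relaxation, in which a later stage receives a probability-weighted mix of character embeddings instead of a hard argmax. The abstract does not say which technique the authors use, so the following is only a minimal sketch under that assumption; `gumbel_softmax`, `soft_embedding`, the logits, and the embedding table are all hypothetical names and values chosen for illustration.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over character classes (sums to 1)."""
    noisy = []
    for x in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((x + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(probs, embedding_table):
    """Mix character embeddings by probability instead of taking an argmax."""
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


random.seed(0)
# Hypothetical stage-1 (visual-only) scores for a 3-character alphabet.
stage1_logits = [2.0, 0.5, -1.0]
# Hypothetical 2-dimensional character embeddings, one row per character.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

probs = gumbel_softmax(stage1_logits, tau=0.5)
# This vector is what a later, semantic stage would consume.
semantic_input = soft_embedding(probs, embedding_table)
```

In an automatic-differentiation framework, every operation above is differentiable with respect to the logits, which is what would let a later stage's loss reach the first stage. This plain-Python version only illustrates the forward computation.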

Overview of LaTeX Author Guidelines for ICCV Proceedings

The presented paper offers meticulous guidelines for authors preparing submissions for the International Conference on Computer Vision (ICCV), with specific attention to using LaTeX for document creation. While the paper primarily serves a practical role in facilitating adherence to conference formatting protocols, it also provides insight into the broader considerations involved in academic publication processes, including dual submission policies, blind review nuances, and mathematical notation.

Content Specification

The document highlights several critical aspects of formatting and submission:

  1. Language and Submission Policies: It mandates English as the submission language and provides detailed instructions on dual submission protocols to maintain the integrity and originality of conference materials.
  2. Paper Length and Review Process: Authors are advised on the eight-page limit for the main content, excluding references, with stern warnings against attempting to manipulate formatting to extend beyond the prescribed limit. This ensures equitable treatment of all submissions within the review process.
  3. Formatting and Style Guidelines: The guidelines specify a two-column layout, font types and sizes, and precise margin settings. The LaTeX template also prints a ruler in the margin so that reviewers can refer to specific lines without ambiguity.
  4. Mathematical Notation and Blind Review Details: From rigorous section and equation numbering to appropriate citation practices during blind review, the guidelines ensure clarity and impartiality in scientific communication.
  5. Technical Elements: The paper addresses handling figures and illustrations, emphasizing the importance of ensuring clarity in printed formats, which is particularly pertinent given the technical nature of ICCV contributions.

Implications and Future Perspectives

While the paper itself does not present novel research findings, its utility in ensuring standardization across submissions should not be underestimated. Consistent formatting improves accessibility and readability, permitting reviewers and the broader scientific community to focus on content quality and contribution without being distracted by inconsistencies in presentation.

The guidelines also serve an educational role, equipping both novice and seasoned researchers with frameworks essential for successful scientific writing. As the landscape of AI and computer vision continues to evolve, future iterations of these guidelines may incorporate considerations for ethical AI research, open-access dissemination, or the inclusion of multimedia elements, reflecting changing standards in technology and publication.

In conclusion, although the guidelines are straightforward, they set out the basic practices needed for high-quality academic discourse and ensure that the technical rigor of ICCV submissions is matched by equally rigorous presentation standards.

Authors (6)
  1. Ayan Kumar Bhunia (63 papers)
  2. Aneeshan Sain (40 papers)
  3. Amandeep Kumar (14 papers)
  4. Shuvozit Ghose (10 papers)
  5. Pinaki Nath Chowdhury (37 papers)
  6. Yi-Zhe Song (120 papers)
Citations (54)