- The paper introduces a framework for offering HTML alongside PDF on arXiv, improving accessibility for users with disabilities.
- The paper's survey finds that only 30% of respondents who rely on assistive technologies report full accessibility without external aid.
- The paper recommends using arXiv's control of the submission process to promote structured HTML and inclusive open science.
The paper "A framework for improving the accessibility of research papers on arXiv.org" examines the challenges of making research hosted on arXiv accessible and proposes possible solutions. The authors argue that although arXiv has worked to lower linguistic, financial, and institutional barriers to access, these efforts alone do not include people with disabilities. They contend that true openness requires accessibility for everyone, regardless of visual, auditory, or cognitive impairment.
The paper addresses several key barriers to accessibility, including the technological constraints that follow from the predominance of PDF, a format with few native accessibility features. The authors note that fewer than 3% of PDFs meet comprehensive accessibility criteria, despite legal and institutional mandates such as the White House open-access policy for federally funded research in the USA and European initiatives like Plan S for publicly funded research, which call for equivalent access.
The authors propose offering HTML alongside the existing PDF and TeX formats as a pivotal step toward better accessibility. HTML is more adaptable and supports semantic enrichment, which improves machine readability and the effectiveness of assistive technologies such as screen readers. The paper also stresses how technically difficult it is to convert LaTeX, which is widely used in scientific communities, into accessible HTML, given LaTeX's extensibility and its lack of native HTML output.
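To illustrate why semantic HTML is easier for software to work with, here is a minimal sketch using only Python's standard-library `html.parser`. It recovers a document's heading outline and flags images that have no `alt` attribute, the kind of structure a screen reader depends on. The class name `AccessibilityAudit` and the sample markup are hypothetical; this is not the paper's method or part of arXiv's tooling.

```python
from html.parser import HTMLParser


class AccessibilityAudit(HTMLParser):
    """Hypothetical checker: collect the heading outline and flag <img> tags without alt text."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []        # list of (heading level, heading text)
        self.missing_alt = []    # src values of images lacking an alt attribute
        self._level = None       # level of the heading currently being read, if any
        self._buffer = []        # text fragments inside the current heading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._buffer = []
        elif tag == "img" and "alt" not in attrs:
            # alt="" is allowed (decorative image); only a missing attribute is flagged.
            self.missing_alt.append(attrs.get("src") or "<no src>")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._buffer).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)


if __name__ == "__main__":
    # Toy document, not taken from the paper.
    sample = """
    <html lang="en"><body>
    <h1>Example paper</h1>
    <h2>Introduction</h2>
    <img src="fig1.png" alt="Bar chart of toy data">
    <h2>Results</h2>
    <img src="fig2.png">
    </body></html>
    """
    audit = AccessibilityAudit()
    audit.feed(sample)
    print(audit.outline)      # [(1, 'Example paper'), (2, 'Introduction'), (2, 'Results')]
    print(audit.missing_alt)  # ['fig2.png']
```

The same information is much harder to recover from an untagged PDF, where headings and figures are often just positioned text and graphics with no explicit structure.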
The research combined quantitative surveys with qualitative interviews. The survey revealed a large accessibility gap for users who rely on assistive technologies: only 30% reported full accessibility without external aid. Respondents identified PDF formatting as a significant barrier, and users of assistive technologies preferred HTML over PDF for its accessibility, adaptability, and machine readability.
Interviews with stakeholders, including researchers, accessibility experts, and legal specialists, reinforced these findings, emphasizing PDF's limitations and HTML's potential to substantially improve access for researchers with disabilities. However, interviewees were skeptical that such a shift could happen without sustained advocacy and systemic adoption of standardized accessibility practices.
The authors suggest that arXiv use its control over the submission pipeline to encourage structured submissions that convert cleanly to HTML, and they call for cultural change and author engagement to close the remaining gaps. They envision an ecosystem in which HTML complements, and may eventually surpass, PDF as the primary format, aligning arXiv with accessibility mandates and improving research dissemination.
In conclusion, the paper argues that well-formatted HTML is a necessary, inclusive step for open science, and that arXiv's unique position allows it to influence broader systemic change in the accessibility of academic publishing. The authors note ongoing efforts and call for collaboration to make further progress. They also present a scoring system that compares PDF and HTML against a range of accessibility criteria, concluding that HTML fundamentally outperforms PDF in supporting accessible technology ecosystems.