Evaluation of software impact designed for biomedical research: Are we measuring what's meaningful? (2306.03255v1)

Published 5 Jun 2023 in cs.SE and q-bio.OT

Abstract: Software is vital for the advancement of biology and medicine. Analysis of usage and impact metrics can help developers determine user and community engagement, justify additional funding, encourage additional use, identify unanticipated use cases, and help define improvement areas. However, there are challenges associated with these analyses including distorted or misleading metrics, as well as ethical and security concerns. More attention to the nuances involved in capturing impact across the spectrum of biological software is needed. Furthermore, some tools may be especially beneficial to a small audience, yet may not have compelling typical usage metrics. We propose more general guidelines, as well as strategies for more specific types of software. We highlight outstanding issues regarding how communities measure or evaluate software impact. To get a deeper understanding of current practices for software evaluations, we performed a survey of participants in the Informatics Technology for Cancer Research (ITCR) program funded by the National Cancer Institute (NCI). We also investigated software among this community and others to assess how often infrastructure that supports such evaluations is implemented and how this impacts rates of papers describing usage of the software. We find that developers recognize the utility of analyzing software usage, but struggle to find the time or funding for such analyses. We also find that infrastructure such as social media presence, more in-depth documentation, the presence of software health metrics, and clear information on how to contact developers seem to be associated with increased usage rates. Our findings can help scientific software developers make the most out of evaluations of their software.

Summary

  • The paper reveals that traditional metrics like downloads and citations do not fully capture the true impact of biomedical research software.
  • It analyzes ITCR program data to show that enhanced documentation and active social media presence correlate with increased software usage.
  • The paper advocates for hypothesis-driven, transparent evaluation methods that blend quantitative metrics with qualitative user feedback and ethical considerations.

Evaluation of Software Impact in Biomedical Research

This paper critically examines how the impact of software developed for biomedical research is evaluated. Leveraging survey responses and software data from the Informatics Technology for Cancer Research (ITCR) program funded by the NCI, the authors investigate the methods currently used to assess the utility and community adoption of scientific software. They highlight the risks of relying solely on common metrics such as downloads and citations to denote software impact, advocating instead for a more nuanced approach that weighs the interpretive limits of individual data points and accounts for ethical concerns.

Key Findings and Discussions

The paper identifies significant barriers that developers encounter when attempting to gauge the impact of their software: limited time and funding, technical issues, privacy concerns, and a lack of knowledge about effective evaluation methods. Although developers recognize the importance of measuring impact for securing funding and informing future development, they struggle to quantify these outcomes systematically.

A noteworthy aspect of the paper is its exploration of infrastructure elements, such as documentation, social media presence, and developer contact information, and how these correlate with increased software usage. The authors find that tools with comprehensive documentation and an active social media presence, notably on Twitter, are reported as used in scholarly articles at higher rates. This suggests a strong link between transparency, user engagement, and the perceived utility of scientific software.
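Repository health signals of the kind the paper associates with usage can often be gathered automatically. Below is a minimal, illustrative sketch that pulls a few public health metrics for a project from the GitHub REST API; the example repository is a placeholder, and which signals are meaningful will depend on the tool and its audience.

```python
import requests

def repo_health_metrics(owner: str, repo: str) -> dict:
    """Fetch a few public health signals for a GitHub repository.

    Uses the public GitHub REST API (unauthenticated calls are
    rate-limited). The fields chosen here are illustrative proxies,
    not an evaluation framework endorsed by the paper.
    """
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}", timeout=10
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "open_issues": data["open_issues_count"],  # includes open pull requests
        "last_push": data["pushed_at"],            # recency of development activity
    }

# Placeholder repository used purely for illustration.
print(repo_health_metrics("octocat", "Hello-World"))
```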

Metrics and Their Implications

The paper emphasizes the necessity of hypothesis-driven metric selection that aligns with the intended software use cases. This approach mitigates the risks posed by biased or misaligned metrics, which can distort the true impact and utility of a tool. Developers are encouraged to pursue metrics that speak to both tool optimization and broader community impact. Furthermore, evaluations should incorporate qualitative signals, such as the substance of user feedback, which can indicate engagement and satisfaction.

There is a clear differentiation between metrics that serve internal development objectives—like usability and performance assessment—and those that address external validation needs—such as community acceptance and evidence of continued support. By recognizing these distinct goals, developers can tailor their evaluation frameworks to inform specific project objectives or community engagements.
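One way to operationalize both the hypothesis-driven selection above and the internal/external distinction is to keep a small registry that ties each metric to the question it tests and its intended audience. The following Python sketch is hypothetical; the metric names and hypotheses are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    hypothesis: str  # the question this metric is meant to answer
    audience: str    # "internal" (development) or "external" (validation)

# Hypothetical registry; real entries should reflect the tool's intended
# use cases rather than whatever happens to be easiest to count.
METRICS = [
    Metric("median_task_completion_time",
           "Recent interface changes reduce time-to-result for new users",
           "internal"),
    Metric("papers_reporting_usage",
           "The tool is adopted beyond the developing lab",
           "external"),
    Metric("median_issue_response_days",
           "The project shows evidence of continued support",
           "external"),
]

def by_audience(audience: str) -> list[Metric]:
    """Select the metrics aimed at a given evaluation goal."""
    return [m for m in METRICS if m.audience == audience]

for m in by_audience("external"):
    print(f"{m.name}: {m.hypothesis}")
```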

Challenges in Current Evaluation Practices

The authors discuss several challenges that complicate the evaluation of software impact. These include the tendency of metrics to become less representative of true usage once they are treated as optimization targets, a dynamic the authors liken to Goodhart's Law. The paper also addresses the ethical and privacy concerns associated with data collection, highlighting the need for compliance with regulations such as the GDPR and recommending transparency with users about what is tracked.
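As an illustration of privacy-conscious collection, a tool could default to opt-in telemetry and record only coarse, anonymized events. The sketch below is hypothetical: the environment variable and event names are invented, and any real implementation would need review against the applicable regulations.

```python
import hashlib
import os
import uuid

# Hypothetical opt-in flag; telemetry stays off unless the user enables it.
TELEMETRY_ENABLED = os.environ.get("MYTOOL_TELEMETRY", "0") == "1"

# A random per-installation ID avoids tying events to personal data.
_INSTALL_ID = uuid.uuid4().hex

def record_event(event: str):
    """Build a coarse, anonymized usage event, or return None when opted out.

    Only the event name and a truncated hash of the installation ID are
    kept; no usernames, file paths, or dataset contents are recorded.
    """
    if not TELEMETRY_ENABLED:
        return None
    return {
        "event": event,  # e.g. "analysis_started"
        "installation": hashlib.sha256(_INSTALL_ID.encode()).hexdigest()[:16],
    }

print(record_event("analysis_started"))
```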

Additionally, the authors argue that while metrics like citation counts are useful, they are not universally applicable across all forms of software, particularly for tools handling sensitive clinical data or those primarily acting as infrastructural underpinnings. A nuanced approach is needed to appreciate the varied forms of software impact across different domains and user communities.
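Where literature-based signals are appropriate, mention counts can complement formal citations. The rough sketch below queries the Europe PMC search API for records matching a tool's name; the tool name is a placeholder, and raw hit counts would still need manual curation to separate genuine usage from passing mentions or name collisions.

```python
import requests

def literature_mentions(tool_name: str) -> int:
    """Count Europe PMC records matching a tool's name.

    This is a coarse proxy: hits can include passing mentions and
    name collisions, so counts should be curated before being
    reported as evidence of usage.
    """
    resp = requests.get(
        "https://www.ebi.ac.uk/europepmc/webservices/rest/search",
        params={"query": f'"{tool_name}"', "format": "json", "pageSize": 1},
        timeout=10,
    )
    resp.raise_for_status()
    return int(resp.json()["hitCount"])

# Placeholder tool name used purely for illustration.
print(literature_mentions("ExampleSeqTool"))
```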

Future Directions and Recommendations

The paper suggests that more sophisticated metrics could greatly enhance how software impact is understood and demonstrated. A critical future direction, as the authors describe it, is streamlining assessment frameworks so that developers can capture meaningful interactions without succumbing to over-optimization. Such advances could ultimately help funders appreciate the full scope of a software tool's contributions to the scientific and medical communities.

In conclusion, this paper underscores the complexity inherent in evaluating the impact of biomedical software and calls for a multi-faceted approach to metric development that aligns with the broad range of user needs and ethical considerations. Insights from this research could inform both funding policies and the design of future software tools, promoting sustained innovation and dissemination in biomedical research.
