Magic Markup: Maintaining Document-External Markup with an LLM (2403.03481v1)
Abstract: Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has an execution semantics, this means it is an opt-in feature that is hard to support properly. Today, LLMs offer a new method: metadata can be bound to entities in changing text using a model's human-like understanding of semantics, with no requirements on the document structure. This method expands the applications of document annotation, a fundamental operation in program writing, debugging, maintenance, and presentation. We contribute a system that employs an intelligent agent to re-tag modified programs, enabling rich annotations to automatically follow code as it evolves. We also contribute a formal problem definition, an empirical synthetic benchmark suite, and our benchmark generator. Our system achieves an accuracy of 90% on our benchmarks and can replace a document's tags in parallel at a rate of 5 seconds per tag. While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications.
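To make the re-tagging problem concrete, here is a minimal sketch of a document-external tag and a purely textual baseline for carrying it across an edit. This is not the paper's LLM-based system: it swaps in a character-level diff from Python's standard `difflib`, and the names `Tag`, `map_offset`, and `retag` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Tag:
    """Hypothetical document-external annotation: offsets plus metadata."""
    start: int  # inclusive character offset into the document
    end: int    # exclusive character offset into the document
    note: str   # metadata stored outside the document itself


def map_offset(opcodes, pos: int) -> int:
    """Map a character offset in the old text to an offset in the new text."""
    for op, i1, i2, j1, j2 in opcodes:
        if i1 <= pos < i2:
            if op == "equal":
                return j1 + (pos - i1)
            # The character was replaced or deleted: snap to the block start.
            return j1
    # pos lies at or past the end of the old text: map to the end of the new.
    return opcodes[-1][4] if opcodes else 0


def retag(old: str, new: str, tag: Tag) -> Tag:
    """Diff-based baseline (an assumption, not the paper's method)."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    opcodes = matcher.get_opcodes()
    start = map_offset(opcodes, tag.start)
    # Map the last tagged character, then step one past it.
    end = map_offset(opcodes, tag.end - 1) + 1
    return Tag(start, end, tag.note)


if __name__ == "__main__":
    old = "x = 1\ny = 2\n"
    new = "# header\nx = 1\ny = 2\n"
    tag = Tag(old.index("y = 2"), old.index("y = 2") + 5, "needs review")
    moved = retag(old, new, tag)
    print(new[moved.start:moved.end])  # prints "y = 2"
```

A baseline like this follows text that is moved but otherwise unchanged. It has no notion of what the tagged entity means, so a tag on code that is itself rewritten or renamed may drift or collapse, which is the case the semantic approach described in the abstract is aimed at.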