Magic Markup: Maintaining Document-External Markup with an LLM (2403.03481v1)

Published 6 Mar 2024 in cs.CL

Abstract: Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has an execution semantics, this means it is an opt-in feature that is hard to support properly. Today, LLMs offer a new method: metadata can be bound to entities in changing text using a model's human-like understanding of semantics, with no requirements on the document structure. This method expands the applications of document annotation, a fundamental operation in program writing, debugging, maintenance, and presentation. We contribute a system that employs an intelligent agent to re-tag modified programs, enabling rich annotations to automatically follow code as it evolves. We also contribute a formal problem definition, an empirical synthetic benchmark suite, and our benchmark generator. Our system achieves an accuracy of 90% on our benchmarks and can replace a document's tags in parallel at a rate of 5 seconds per tag. While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications.
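To make the re-tagging problem concrete, here is a minimal sketch of document-external markup: a tag is stored outside the text as a character span plus a label, and must be re-anchored whenever the text changes. This is not the paper's system, which uses an LLM agent. It is a plain diff-based stand-in built on Python's standard `difflib`, and the names `Tag` and `retag` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Metadata kept outside the document, bound to a character span."""
    start: int  # inclusive offset into the document text
    end: int    # exclusive offset into the document text
    label: str


def retag(old_text: str, new_text: str, tag: Tag) -> Optional[Tag]:
    """Carry a tag from old_text to new_text using a character-level diff.

    Returns None when none of the tagged characters survive the edit.
    This is a syntactic heuristic, not a semantic one: it cannot follow
    an entity that was renamed or rewritten.
    """
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    new_start: Optional[int] = None
    new_end: Optional[int] = None
    for block in matcher.get_matching_blocks():
        # Intersect this matching block (in old_text coordinates) with the tag.
        lo = max(block.a, tag.start)
        hi = min(block.a + block.size, tag.end)
        if lo < hi:
            shift = block.b - block.a  # how far this block moved
            if new_start is None:
                new_start = lo + shift
            new_end = hi + shift
    if new_start is None or new_end is None:
        return None
    return Tag(new_start, new_end, tag.label)


if __name__ == "__main__":
    old = "x = compute(a)\nprint(x)\n"
    new = "# header\nx = compute(a)\nprint(x)\n"
    start = old.index("compute")
    moved = retag(old, new, Tag(start, start + len("compute"), "hot path"))
    if moved is not None:
        print(new[moved.start:moved.end], moved.label)
```

A diff-based approach like this handles insertions and deletions around a tag, but it has no notion of meaning, which is the gap the abstract's semantic, LLM-driven method is meant to address.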

