
FACTTRACK: Time-Aware World State Tracking in Story Outlines (2407.16347v2)

Published 23 Jul 2024 in cs.CL

Abstract: While accurately detecting and correcting factual contradictions in LLM outputs has become increasingly important as their capabilities improve, doing so is highly challenging. We propose a novel method, FACTTRACK, for tracking atomic facts and addressing factual contradictions. Crucially, FACTTRACK also maintains time-aware validity intervals for each fact, allowing for change over time. At a high level, FACTTRACK consists of a four-step pipeline to update a world state data structure for each new event: (1) decompose the event into directional atomic facts; (2) determine the validity interval of each atomic fact using the world state; (3) detect contradictions with existing facts in the world state; and finally (4) add new facts to the world state and update existing atomic facts. When we apply FACTTRACK to contradiction detection on structured story outlines, we find that FACTTRACK using LLaMA2-7B-Chat substantially outperforms a fair baseline using LLaMA2-7B-Chat, and achieves performance comparable to a GPT-4 baseline. Moreover, when using GPT-4, FACTTRACK significantly outperforms the GPT-4 baseline.
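The world-state update loop described in the abstract can be sketched as follows. This is a minimal illustration inferred only from the abstract, not the paper's actual implementation: the `AtomicFact` fields, the integer event-index intervals, and the policy of closing a conflicting fact's interval at the new fact's start time are all assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AtomicFact:
    """A hypothetical atomic fact with a time-aware validity interval."""
    subject: str
    relation: str
    value: str
    start: int                  # first event index at which the fact holds
    end: Optional[int] = None   # None = still valid (open interval)

    def active_at(self, t: int) -> bool:
        return self.start <= t and (self.end is None or t < self.end)


@dataclass
class WorldState:
    facts: List[AtomicFact] = field(default_factory=list)

    def contradictions(self, new: AtomicFact) -> List[AtomicFact]:
        # Step (3): existing facts about the same (subject, relation) whose
        # validity interval overlaps the new fact but whose value differs.
        return [f for f in self.facts
                if f.subject == new.subject
                and f.relation == new.relation
                and f.value != new.value
                and f.active_at(new.start)]

    def update(self, new: AtomicFact) -> List[AtomicFact]:
        # Step (4): close each conflicting fact's interval at the new
        # fact's start time (the fact changed, rather than being wrong),
        # then add the new fact with an open interval.
        conflicts = self.contradictions(new)
        for f in conflicts:
            f.end = new.start
        self.facts.append(new)
        return conflicts


world = WorldState()
world.update(AtomicFact("Alice", "location", "Paris", start=0))
conflicts = world.update(AtomicFact("Alice", "location", "London", start=3))
# The Paris fact's interval is closed at event 3 instead of being
# rejected as an error -- this is the "change over time" behavior.
```

In the full method, steps (1) and (2) (decomposing an event into directional atomic facts and inferring each fact's validity interval) would be performed by an LLM; the sketch above hard-codes their outputs to keep the control flow visible.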

