What's Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs
Abstract: Best practices for high-conflict conversations such as counseling or customer support almost always include recommendations to paraphrase the previous speaker. Although paraphrase classification has received widespread attention in NLP, paraphrases are usually considered independent of context, and common models and datasets are not applicable to dialog settings. In this work, we investigate paraphrases in dialog (e.g., Speaker 1: "That book is mine." becomes Speaker 2: "That book is yours."). We provide an operationalization of context-dependent paraphrases and develop a training procedure for crowd workers to classify paraphrases in dialog. We introduce a dataset of utterance pairs from NPR and CNN news interviews annotated for context-dependent paraphrases. To enable analyses of label variation, the dataset contains 5,581 annotations on 600 utterance pairs. We present promising results with in-context learning and with token classification models for automatic paraphrase detection in dialog.
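The 600 utterance pairs carry 5,581 annotations in total, about 9.3 per pair on average, so each pair has a distribution of judgments rather than a single gold label. A minimal sketch of how such repeated crowd labels might be summarized per pair follows. It assumes a hypothetical record format of `(pair_id, annotator_id, is_paraphrase)` with binary labels; the released dataset's schema and the authors' own aggregation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical record format (an assumption, not the dataset's actual schema):
# (pair_id, annotator_id, is_paraphrase)
Annotation = Tuple[str, str, bool]


@dataclass
class PairSummary:
    n_annotations: int
    paraphrase_share: float  # soft label: fraction of "paraphrase" votes
    majority: bool           # hard label; a tie counts as "not a paraphrase"


def aggregate(annotations: List[Annotation]) -> Dict[str, PairSummary]:
    """Group crowd labels by utterance pair and summarize each group."""
    votes: Dict[str, List[bool]] = defaultdict(list)
    for pair_id, _annotator_id, is_paraphrase in annotations:
        votes[pair_id].append(is_paraphrase)

    summaries: Dict[str, PairSummary] = {}
    for pair_id, labels in votes.items():
        share = sum(labels) / len(labels)
        summaries[pair_id] = PairSummary(len(labels), share, share > 0.5)
    return summaries


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the dataset.
    toy = [
        ("pair-1", "a1", True), ("pair-1", "a2", True), ("pair-1", "a3", False),
        ("pair-2", "a1", False), ("pair-2", "a4", True),
    ]
    for pid, summary in aggregate(toy).items():
        print(pid, summary)
    # Average annotations per pair for the totals given in the abstract.
    print(round(5581 / 600, 2))
```

Keeping the soft `paraphrase_share` next to the majority label preserves annotator disagreement that a single hard label would discard; the tie-breaking rule is an arbitrary choice made for illustration.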