AI Research Assistant for Computer Scientists

Synthesize the latest research on any AI/ML/CS topic

The user is asking about studies regarding frustration or aggression caused by voice assistants mishearing trigger words. The best approach to answer this is to search for relevant research papers. The arXiv search tool is suitable for this purpose. I will use keywords related to voice assistants, trigger words, mishearing, and user frustration in the search query.
Searching arXiv for "voice assistant trigger word errors user frustration"
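For readers who want to reproduce a keyword search like this outside the assistant, the sketch below queries the public arXiv Atom API directly. The endpoint and query parameters are arXiv's; the `search_arxiv` helper and its result handling are hypothetical conveniences, and this is not the assistant's internal search tool.

```python
# Minimal sketch: keyword search against the public arXiv API (Atom feed).
# Illustrative only -- not the tool actually used by the assistant above.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def search_arxiv(query: str, max_results: int = 10):
    """Return (arxiv_id, title, summary) tuples for a free-text query."""
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    results = []
    for entry in feed.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="").strip()
        arxiv_id = entry.findtext(f"{ATOM}id", default="").rsplit("/", 1)[-1]
        summary = entry.findtext(f"{ATOM}summary", default="").strip()
        results.append((arxiv_id, title, summary))
    return results

for arxiv_id, title, _ in search_arxiv("voice assistant trigger word errors user frustration"):
    print(arxiv_id, title)
```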
I have reviewed the abstracts from the initial search and identified several papers that appear highly relevant to the user's query about frustration or aggression caused by voice assistants mishearing trigger words or experiencing other errors. To get a detailed answer, I need to examine the full content of these papers. I will use the paper prompter tool to extract specific findings related to user frustration, aggression, or negative psychological responses resulting from voice assistant errors, prioritizing the papers that seem most promising based on their titles and abstracts. I will start by prompting the top three most relevant papers.
Prompting "A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures": "Summarize the findings related to user frustration or negative reactions resulting from voice assistant failures, specifically including failures caused by mishearing or overcapturing user input." (2303.00164)
Prompting "Can Voice Assistants Be Microaggressors? Cross-Race Psychological Responses to Failures of Automatic Speech Recognition": "Describe the psychological responses observed in users when voice assistants experience automatic speech recognition failures, focusing on any findings related to frustration, aggression, or negative emotional states." (2302.12326)
Prompting "Look at Me When I Talk to You: A Video Dataset to Enable Voice Assistants to Recognize Errors": "Detail any findings or observations in this paper regarding user frustration caused by voice assistant errors and the potential for recognizing such frustration or errors from user reactions." (2104.07153)
I have conducted an initial search and reviewed the abstracts. I then prompted the full content of three highly relevant papers that discuss user frustration and negative psychological responses stemming from voice assistant errors, including mishearing and overcapture. The information gathered from these steps provides a solid basis to answer the user's query regarding studies on frustration caused by voice assistants mishearing trigger words. While aggression was not a primary focus in the papers I prompted, frustration is well-documented. I have gathered sufficient information from the most promising sources found in the initial search. Therefore, I will now synthesize this information to generate the final response.
Synthesized final response with models/gemini-2.5-flash-preview-04-17
Done in 1m 2s
Processed 122,265 tokens in 5 steps

Numerous studies have investigated user interactions with voice assistants, and a subset of this research specifically addresses the negative consequences of system failures, including those stemming from misheard trigger words or erroneous Automatic Speech Recognition (ASR). While the term "aggression" is not commonly used in the examined literature to describe user responses, "frustration" and other negative psychological states are frequently reported outcomes of such technical deficiencies.

User Frustration and Negative Reactions to Voice Assistant Errors

Research indicates that voice assistant errors are a significant source of user frustration and can detrimentally impact user trust and overall experience. A mixed-methods paper investigating user trust after voice assistant failures categorized different types of failures, including those related to the system incorrectly capturing user input, referred to as "Perception" failures (Baughan et al., 2023). These Perception failures encompass issues like noisy channels interfering with input, overcapture (the system listening for too long), truncation (the system cutting off input too early), and transcription errors (mishearing words).

Quantitative analysis in this paper revealed that Overcapture failures were particularly harmful, resulting in the lowest reported trust scores regarding the voice assistant's perceived ability and benevolence compared to other failure types (Baughan et al., 2023). Qualitatively, users described Overcapture as aggravating, annoying, and a waste of time. Transcription errors, where the system mishears words or struggles with variations in speech (accents, names, foreign languages), were also found to negatively impact perceptions of benevolence and contributed to frustration and annoyance (Baughan et al., 2023). While Transcription errors did not statistically impact perceived ability as severely as Incorrect Actions, they were more detrimental to trust than errors due to noisy channels or truncation. Truncation errors, where the system stops listening prematurely, were also described as aggravating and annoying, increasing the time users needed to complete tasks (Baughan et al., 2023).

The aggregate effect of these Perception failures, stemming directly from the system's inability to accurately process user speech (including trigger words and subsequent commands), leads users to sometimes abandon the task for a period or simplify their interactions to avoid scenarios prone to such errors (Baughan et al., 2023).

Psychological Responses to ASR Failures

Beyond general frustration, studies have explored deeper psychological responses to ASR failures, particularly considering known biases in these systems. Research has shown that language technologies, including ASR, can exhibit differential error rates across demographic groups, notably higher rates for Black speakers compared to white speakers (Wenzel et al., 2023). A paper examining cross-race psychological responses to ASR failures found that Black participants interacting with a voice assistant exhibiting a high error rate reported significantly lower levels of positive affect, higher levels of self-consciousness, and reduced individual and collective self-esteem compared to Black participants in a low error rate condition (Wenzel et al., 2023). These findings were interpreted as consistent with the psychological impact of experiencing racial microaggressions, where persistent errors are perceived as subtle acts of bias reinforcing marginalization (Wenzel et al., 2023). While the paper used a Negative Affect scale (which includes items related to frustration), it did not find a statistically significant difference in negative affect between Black and white participants in the high error condition, although errors generally increased negative feelings for both groups. Aggression was not a measured response in this specific paper (Wenzel et al., 2023).
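The disparity at issue here is usually quantified as word error rate (WER) computed separately per speaker group. The sketch below is a minimal illustration of such a per-group comparison, assuming hypothetical reference/hypothesis transcript pairs and generic group labels; it is not the evaluation pipeline of any cited study.

```python
# Illustrative sketch: comparing ASR word error rate (WER) across speaker groups.
# Transcripts below are hypothetical; real audits use matched, representative corpora.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical (reference, ASR hypothesis) pairs, keyed by speaker group.
samples = {
    "group_a": [("turn on the kitchen lights", "turn on the kitchen lights")],
    "group_b": [("turn on the kitchen lights", "turn of the kitchen light")],
}
for group, pairs in samples.items():
    wer = sum(word_error_rate(r, h) for r, h in pairs) / len(pairs)
    print(f"{group}: mean WER = {wer:.2f}")
```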

Another paper noted that users are "often frustrated by voice assistants' frequent errors" and explored the feasibility of recognizing these errors and the resulting user frustration from visual cues such as facial reactions (Cuadra et al., 2021). While recognizing errors and frustration from soundless video was challenging, the paper suggested that it is possible and warrants further investigation, potentially through multimodal analysis combining audio and visual data (Cuadra et al., 2021).

Implications for Voice Assistant Design

The documented user frustration and negative psychological responses highlight the critical need for robust and accurate wake word detection and ASR systems. Ongoing research focuses on improving these core components to mitigate errors. Efforts include developing more efficient and accurate voice trigger detection models (Higuchi et al., 2020; Zhang et al., 2022; R et al., 2021), improving performance in noisy environments (Bonet et al., 2021), mitigating false triggers caused by acoustically similar sounds (Chen et al., 2021; Garg et al., 2021), and improving ASR performance for diverse speakers, including those with dysfluent speech or different linguistic backgrounds (Mitra et al., 2021; Wu et al., 2020). Furthermore, research into conversational error recovery mechanisms, such as allowing users to repeat or reformulate commands, aims to provide pathways for users and systems to recover gracefully from misinterpretations (Nguyen et al., 2021; Fazel-Zarandi et al., 2019; Galbraith et al., 2023). Addressing issues like overcapture and truncation through improved endpoint detection is also crucial for reducing frustration associated with input processing failures (Mallidi et al., 2018; Buddi et al., 2023).
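To make the engineering target concrete, the following is a minimal PyTorch sketch of the kind of small-footprint wake-word classifier these papers optimize: a few 1D convolutions over log-Mel features producing a trigger probability. The `TinyWakeWordNet` name, the layer sizes, and the 40-band/100-frame input are assumptions for illustration, not a reimplementation of any cited architecture.

```python
# Minimal sketch of a small-footprint wake-word (voice trigger) classifier.
# Generic illustration in PyTorch; not a reimplementation of any cited paper.
import torch
import torch.nn as nn

class TinyWakeWordNet(nn.Module):
    """1D-convolutional classifier over log-Mel frames -> P(wake word)."""

    def __init__(self, n_mels: int = 40, channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, channels, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            # Depthwise + pointwise pair keeps the parameter count small.
            nn.Conv1d(channels, channels, kernel_size=5, stride=2, padding=2, groups=channels),
            nn.Conv1d(channels, channels, kernel_size=1),
            nn.ReLU(),
        )
        self.head = nn.Linear(channels, 1)

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, n_mels, time) -> probability the clip contains the trigger phrase.
        h = self.encoder(log_mel)   # (batch, channels, time')
        h = h.mean(dim=-1)          # global average pool over time
        return torch.sigmoid(self.head(h)).squeeze(-1)

# Usage: score a hypothetical 1-second clip of 40-band log-Mel features (100 frames).
model = TinyWakeWordNet()
clip = torch.randn(1, 40, 100)
print(model(clip))  # fire the assistant only if this exceeds a tuned threshold
```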

In conclusion, research confirms that voice assistant errors, including the mishearing of trigger words and subsequent commands, are a significant cause of user frustration and contribute to negative user experiences and reduced trust. While aggression is not a commonly documented outcome, the psychological impacts can be more profound, particularly for users who experience differential error rates. Continued advancements in ASR, wake word detection, and dialogue management are essential to mitigate these issues and improve the usability and perceived fairness of voice assistants.

References (16)
1.
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures
Baughan et al., 2023 | 11 citations
Despite huge gains in performance in natural language understanding via large language models in recent years, voice assistants still often fail to meet user expectations. In this study, we conducted a mixed-methods analysis of how voice assistant failures affect users' trust in their voice assistants. To illustrate how users have experienced these failures, we contribute a crowdsourced dataset of 199 voice assistant failures, categorized across 12 failure sources. Relying on interview and survey data, we find that certain failures, such as those due to overcapturing users' input, derail user trust more than others. We additionally examine how failures impact users' willingness to rely on voice assistants for future tasks. Users often stop using their voice assistants for specific tasks that result in failures for a short period of time before resuming similar usage. We demonstrate the importance of low stakes tasks, such as playing music, towards building trust after failures.
2.
Can Voice Assistants Be Microaggressors? Cross-Race Psychological Responses to Failures of Automatic Speech Recognition
Wenzel et al., 2023 | 8 citations
Language technologies have a racial bias, committing greater errors for Black users than for white users. However, little work has evaluated what effect these disparate error rates have on users themselves. The present study aims to understand if speech recognition errors in human-computer interactions may mirror the same effects as misunderstandings in interpersonal cross-race communication. In a controlled experiment (N=108), we randomly assigned Black and white participants to interact with a voice assistant pre-programmed to exhibit a high versus low error rate. Results revealed that Black participants in the high error rate condition, compared to Black participants in the low error rate condition, exhibited significantly higher levels of self-consciousness, lower levels of self-esteem and positive affect, and less favorable ratings of the technology. White participants did not exhibit this disparate pattern. We discuss design implications and the diverse research directions to which this initial study aims to contribute.
3.
Look at Me When I Talk to You: A Video Dataset to Enable Voice Assistants to Recognize Errors
Cuadra et al., 2021 | 8 citations
People interacting with voice assistants are often frustrated by voice assistants' frequent errors and inability to respond to backchannel cues. We introduce an open-source video dataset of 21 participants' interactions with a voice assistant, and explore the possibility of using this dataset to enable automatic error recognition to inform self-repair. The dataset includes clipped and labeled videos of participants' faces during free-form interactions with the voice assistant from the smart speaker's perspective. To validate our dataset, we emulated a machine learning classifier by asking crowdsourced workers to recognize voice assistant errors from watching soundless video clips of participants' reactions. We found trends suggesting it is possible to determine the voice assistant's performance from a participant's facial reaction alone. This work posits elicited datasets of interactive responses as a key step towards improving error recognition for repair for voice assistants in a wide variety of applications.
4.
Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection
Higuchi et al., 2020 | 17 citations
We propose a stacked 1D convolutional neural network (S1DCNN) for end-to-end small footprint voice trigger detection in a streaming scenario. Voice trigger detection is an important speech application, with which users can activate their devices by simply saying a keyword or phrase. Due to privacy and latency reasons, a voice trigger detection system should run on an always-on processor on device. Therefore, having small memory and compute cost is crucial for a voice trigger detection system. Recently, singular value decomposition filters (SVDFs) has been used for end-to-end voice trigger detection. The SVDFs approximate a fully-connected layer with a low rank approximation, which reduces the number of model parameters. In this work, we propose S1DCNN as an alternative approach for end-to-end small-footprint voice trigger detection. An S1DCNN layer consists of a 1D convolution layer followed by a depth-wise 1D convolution layer. We show that the SVDF can be expressed as a special case of the S1DCNN layer. Experimental results show that the S1DCNN achieve 19.0% relative false reject ratio (FRR) reduction with a similar model size and a similar time delay compared to the SVDF. By using longer time delays, the S1DCNN further improve the FRR up to 12.2% relative.
5.
WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger
Zhang et al., 2022 | 2 citations
End-to-end models have gradually become the main technical stream for voice trigger, aiming to achieve an utmost prediction accuracy but with a small footprint. In present paper, we propose an end-to-end voice trigger framework, namely WakeupNet, which is basically structured on a Transformer encoder. The purpose of this framework is to explore the context-capturing capability of Transformer, as sequential information is vital for wakeup-word detection. However, the conventional Transformer encoder is too large to fit our task. To address this issue, we introduce different model compression approaches to shrink the vanilla one into a tiny one, called mobile-Transformer. To evaluate the performance of mobile-Transformer, we conduct extensive experiments on a large public-available dataset HiMia. The obtained results indicate that introduced mobile-Transformer significantly outperforms other frequently used models for voice trigger in both clean and noisy scenarios.
6.
EfficientWord-Net: An Open Source Hotword Detection Engine based on One-shot Learning
R et al., 2021 | 3 citations
Voice assistants like Siri, Google Assistant, Alexa etc. are used widely across the globe for home automation, these require the use of special phrases also known as hotwords to wake it up and perform an action like "Hey Alexa!", "Ok Google!" and "Hey Siri!" etc. These hotwords are detected with lightweight real-time engines whose purpose is to detect the hotwords uttered by the user. This paper presents the design and implementation of a hotword detection engine based on one-shot learning which detects the hotword uttered by the user in real-time with just one or few training samples of the hotword. This approach is efficient when compared to existing implementations because the process of adding a new hotword in the existing systems requires enormous amounts of positive and negative training samples and the model needs to retrain for every hotword. This makes the existing implementations inefficient in terms of computation and cost. The architecture proposed in this paper has achieved an accuracy of 94.51%.
7.
Speech Enhancement for Wake-Up-Word detection in Voice Assistants
Bonet et al., 2021 | 10 citations
Keyword spotting and in particular Wake-Up-Word (WUW) detection is a very important task for voice assistants. A very common issue of voice assistants is that they get easily activated by background noise like music, TV or background speech that accidentally triggers the device. In this paper, we propose a Speech Enhancement (SE) model adapted to the task of WUW detection that aims at increasing the recognition rate and reducing the false alarms in the presence of these types of noises. The SE model is a fully-convolutional denoising auto-encoder at waveform level and is trained using a log-Mel Spectrogram and waveform reconstruction losses together with the BCE loss of a simple WUW classification network. A new database has been purposely prepared for the task of recognizing the WUW in challenging conditions containing negative samples that are very phonetically similar to the keyword. The database is extended with public databases and an exhaustive data augmentation to simulate different noises and environments. The results obtained by concatenating the SE with a simple and state-of-the-art WUW detectors show that the SE does not have a negative impact on the recognition rate in quiet environments while increasing the performance in the presence of noise, especially when the SE and WUW detector are trained jointly end-to-end.
8.
FakeWake: Understanding and Mitigating Fake Wake-up Words of Voice Assistants
Chen et al., 2021 | 19 citations
In the area of Internet of Things (IoT) voice assistants have become an important interface to operate smart speakers, smartphones, and even automobiles. To save power and protect user privacy, voice assistants send commands to the cloud only if a small set of pre-registered wake-up words are detected. However, voice assistants are shown to be vulnerable to the FakeWake phenomena, whereby they are inadvertently triggered by innocent-sounding fuzzy words. In this paper, we present a systematic investigation of the FakeWake phenomena from three aspects. To start with, we design the first fuzzy word generator to automatically and efficiently produce fuzzy words instead of searching through a swarm of audio materials. We manage to generate 965 fuzzy words covering 8 most popular English and Chinese smart speakers. To explain the causes underlying the FakeWake phenomena, we construct an interpretable tree-based decision model, which reveals phonetic features that contribute to false acceptance of fuzzy words by wake-up word detectors. Finally, we propose remedies to mitigate the effect of FakeWake. The results show that the strengthened models are not only resilient to fuzzy words but also achieve better overall performance on original training datasets.
9.
Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation
Garg et al., 2021 | 9 citations
We present a unified and hardware efficient architecture for two stage voice trigger detection (VTD) and false trigger mitigation (FTM) tasks. Two stage VTD systems of voice assistants can get falsely activated to audio segments acoustically similar to the trigger phrase of interest. FTM systems cancel such activations by using post trigger audio context. Traditional FTM systems rely on automatic speech recognition lattices which are computationally expensive to obtain on device. We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features. The proposed joint model yields an average 18% relative reduction in false reject rate (FRR) for the VTD task at a given false alarm rate. Moreover, our model suppresses 95% of the false triggers with an additional one second of post-trigger audio. Finally, on-device measurements show 32% reduction in runtime memory and 56% reduction in inference time compared to non-streaming version of the model.
10.
Analysis and Tuning of a Voice Assistant System for Dysfluent Speech
Mitra et al., 2021 | 24 citations
Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (i.e., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.
11.
See what I'm saying? Comparing Intelligent Personal Assistant use for Native and Non-Native Language Speakers
Wu et al., 2020 | 59 citations
Limited linguistic coverage for Intelligent Personal Assistants (IPAs) means that many interact in a non-native language. Yet we know little about how IPAs currently support or hinder these users. Through native (L1) and non-native (L2) English speakers interacting with Google Assistant on a smartphone and smart speaker, we aim to understand this more deeply. Interviews revealed that L2 speakers prioritised utterance planning around perceived linguistic limitations, as opposed to L1 speakers prioritising succinctness because of system limitations. L2 speakers see IPAs as insensitive to linguistic needs resulting in failed interaction. L2 speakers clearly preferred using smartphones, as visual feedback supported diagnoses of communication breakdowns whilst allowing time to process query results. Conversely, L1 speakers preferred smart speakers, with audio feedback being seen as sufficient. We discuss the need to tailor the IPA experience for L2 users, emphasising visual feedback whilst reducing the burden of language production.
12.
User-Initiated Repetition-Based Recovery in Multi-Utterance Dialogue Systems
Nguyen et al., 2021 | 3 citations
Recognition errors are common in human communication. Similar errors often lead to unwanted behaviour in dialogue systems or virtual assistants. In human communication, we can recover from them by repeating misrecognized words or phrases; however in human-machine communication this recovery mechanism is not available. In this paper, we attempt to bridge this gap and present a system that allows a user to correct speech recognition errors in a virtual assistant by repeating misunderstood words. When a user repeats part of the phrase the system rewrites the original query to incorporate the correction. This rewrite allows the virtual assistant to understand the original query successfully. We present an end-to-end 2-step attention pointer network that can generate the rewritten query by merging together the incorrectly understood utterance with the correction follow-up. We evaluate the model on data collected for this task and compare the proposed model to a rule-based baseline and a standard pointer network. We show that rewriting the original query is an effective way to handle repetition-based recovery and that the proposed model outperforms the rule based baseline, reducing Word Error Rate by 19% relative at 2% False Alarm Rate on annotated data.
13.
Investigation of Error Simulation Techniques for Learning Dialog Policies for Conversational Error Recovery
Fazel-Zarandi et al., 2019 | 14 citations
Training dialog policies for speech-based virtual assistants requires a plethora of conversational data. The data collection phase is often expensive and time consuming due to human involvement. To address this issue, a common solution is to build user simulators for data generation. For the successful deployment of the trained policies into real world domains, it is vital that the user simulator mimics realistic conditions. In particular, speech-based assistants are heavily affected by automatic speech recognition and language understanding errors, hence the user simulator should be able to simulate similar errors. In this paper, we review the existing error simulation methods that induce errors at audio, phoneme, text, or semantic level; and conduct detailed comparisons between the audio-level and text-level methods. In the process, we improve the existing text-level method by introducing confidence score prediction and out-of-vocabulary word mapping. We also explore the impact of audio-level and text-level methods on learning a simple clarification dialog policy to recover from errors to provide insight on future improvement for both approaches.
14.
An Analysis of Dialogue Repair in Virtual Voice Assistants
Galbraith et al., 2023 | 2 citations
Language speakers often use what are known as repair initiators to mend fundamental disconnects that occur between them during verbal communication. Previous research in this field has mainly focused on the human-to-human use of repair initiator. We proposed an examination of dialogue repair structure wherein the dialogue initiator is human and the party that initiates or responds to the repair is a virtual assistant. This study examined the use of repair initiators in both English and Spanish with two popular assistants, Google Assistant and Apple's Siri. Our aim was to codify the differences, if any, in responses by voice assistants to dialogues in need of repair as compared to human-human dialogues also in need of repair. Ultimately the data demonstrated that not only were there differences between human-assistant and human-human dialogue repair strategies, but that there were likewise differences among the assistants and the languages studied.
15.
Device-directed Utterance Detection
Mallidi et al., 2018 | 48 citations
In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabling wake-word free follow-up queries. Consider the example interaction: "Computer, play music", "Computer, reduce the volume". In this interaction, the user needs to repeat the wake-word (Computer) for the second query. To allow for more natural interactions, the device could immediately re-enter listening state after the first query (without wake-word repetition) and accept or reject a potential follow-up as device-directed or background speech. The proposed model consists of two long short-term memory (LSTM) neural networks trained on acoustic features and automatic speech recognition (ASR) 1-best hypotheses, respectively. A feed-forward deep neural network (DNN) is then trained to combine the acoustic and 1-best embeddings, derived from the LSTMs, with features from the ASR decoder. Experimental results show that ASR decoder, acoustic embeddings, and 1-best embeddings yield an equal-error-rate (EER) of 9.3%, 10.9% and 20.1%, respectively. Combination of the features resulted in a 44% relative improvement and a final EER of 5.2%.
16.
Efficient Multimodal Neural Networks for Trigger-less Voice Assistants
Buddi et al., 2023 | 4 citations
The adoption of multimodal interactions by Voice Assistants (VAs) is growing rapidly to enhance human-computer interactions. Smartwatches have now incorporated trigger-less methods of invoking VAs, such as Raise To Speak (RTS), where the user raises their watch and speaks to VAs without an explicit trigger. Current state-of-the-art RTS systems rely on heuristics and engineered Finite State Machines to fuse gesture and audio data for multimodal decision-making. However, these methods have limitations, including limited adaptability, scalability, and induced human biases. In this work, we propose a neural network based audio-gesture multimodal fusion system that (1) Better understands temporal correlation between audio and gesture data, leading to precise invocations (2) Generalizes to a wide range of environments and scenarios (3) Is lightweight and deployable on low-power devices, such as smartwatches, with quick launch times (4) Improves productivity in asset development processes.
Are there any studies that refer to the phenomenon of criminal activity beginning with lower-stakes deviant behavior, such as serial killers who start off hurting animals? Could there be a risk of VBA users getting frustrated and engaging in low-risk aggression at the VBA, and that aggression eventually seeping into other areas of their lives, where it impacts their interpersonal relationships?
The user is asking about the phenomenon of criminal activity starting with lower-stakes deviant behavior and whether frustration with voice assistants (VBAs) and subsequent low-risk aggression towards them could potentially escalate and affect interpersonal relationships. Given the available tools are focused on searching and analyzing arXiv papers (primarily computer science), the first part of the question about criminal behavior progression is likely outside the scope of what I can find directly. However, the second part, regarding user frustration and aggression towards VBAs and potential transfer to interpersonal relationships, relates to Human-Computer Interaction (HCI) and psychology, which might be covered in relevant arXiv papers. I will perform two parallel searches on arXiv: one for 'user frustration voice assistant' to address the core issue of negative user experiences with VBAs and another for 'aggression human computer interaction' to see if the concept of aggression towards technology or in HCI contexts is discussed. This will help determine if there are studies that touch upon the user's concern about the potential for low-risk aggression towards VBAs having broader impacts.
Searching arXiv for "user frustration voice assistant"
Searching arXiv for "aggression human computer interaction"
I have identified a paper (1808.01076) from the previous search that directly addresses the potential link between aggression towards artificial agents (robots) and broader antisocial tendencies, which is highly relevant to the user's second question about aggression towards VBAs impacting interpersonal relationships. While the first part of the user's question about criminal behavior progression is likely outside the scope of arXiv, understanding the findings of paper 1808.01076 is crucial for the second part. Therefore, I will use the paper prompter tool to get more detailed information about this paper.
Prompting "Verbal Disinhibition towards Robots is Associated with General Antisociality": "Summarize the methodology and key findings of this paper regarding the link between verbal disinhibition/aggression towards artificial agents (robots) and general antisociality in broader interactions. Does it suggest a causal link or a correlation?" (1808.01076)
I have gathered information about user frustration with voice assistants and explored research on aggression in human-computer interaction, specifically focusing on a paper that correlates aggression towards robots with general antisocial behavior. I need to synthesize this information to answer the user's query, acknowledging the limitations of the search to the arXiv database and the specificity of the user's hypothetical scenario.
Synthesized final response with models/gemini-2.5-flash-preview-04-17
We ran into an issue with gemini-2.5-flash-preview-04-17, retrying with gemini-2.5-pro-preview-03-25...
Synthesized final response with models/gemini-2.5-pro-preview-05-06
Done in 1m 40s
Processed 130,995 tokens in 4 steps

The query poses two distinct questions. The first concerns the "gateway hypothesis" in criminology: the idea that engagement in lower-stakes deviant behavior can precede more serious criminal acts, exemplified by serial killers purportedly starting with animal cruelty. The second asks whether frustration stemming from Voice-Based Assistants (VBAs) mishearing trigger words, and the low-risk aggression towards the VBA that may follow, could escalate or transfer, thereby adversely impacting users' interpersonal relationships.

The provided research corpus, primarily sourced from arXiv, focuses on Human-Computer Interaction (HCI), user experiences with voice assistants, and aggression in digital contexts. As such, it can substantially address the second part of your query. However, the criminological "gateway hypothesis" falls outside the typical scope of this research domain.

The "Gateway Hypothesis" in Criminology

The specific criminological theory regarding a progression from lower-stakes deviant behaviors (e.g., animal cruelty) to more severe criminal acts (e.g., serial homicide) is a complex subject extensively studied within criminology, psychology, and forensic science. The available arXiv research, which centers on computational and HCI-related topics, does not contain studies directly investigating or validating this criminological hypothesis. Answering this part of your query would necessitate a review of literature from those specialized fields, which is beyond the purview of the current dataset.

User Frustration and Negative Affect from Voice-Based Assistant (VBA) Malfunctions

Research extensively documents that malfunctions in VBAs, including mishearing trigger words or commands, are a significant source of user frustration and can lead to a range of negative psychological outcomes.

A mixed-methods paper (Baughan et al., 2023) investigating user trust after voice assistant failures identified various failure types, including "Perception" failures, which encompass the system incorrectly capturing user input. These include issues like noisy channels, overcapture (system listening too long), truncation (system cutting off input too early), and transcription errors (mishearing words). The paper found that Overcapture failures were particularly detrimental, resulting in the lowest reported trust scores concerning the VBA's perceived ability and benevolence. Users qualitatively described Overcapture as "aggravating," "annoying," and a "waste of time." Transcription errors, where the system mishears words or struggles with speech variations (e.g., accents, names, foreign languages), also negatively impacted perceptions of benevolence and contributed to user "frustration" and "annoyance." While not impacting perceived ability as severely as "Incorrect Action" failures, transcription errors were more damaging to trust than errors due to noisy channels or truncation. Truncation errors were similarly described as "aggravating" and "annoying," increasing task completion time. The cumulative effect of these Perception failures often led users to temporarily abandon tasks or simplify their interactions to avoid error-prone scenarios.

Furthermore, research into cross-race psychological responses to failures in Automatic Speech Recognition (ASR) systems (Wenzel et al., 2023) highlights more profound impacts. Given that ASR can exhibit disparate error rates across demographic groups (e.g., higher for Black speakers than white speakers), this paper explored the psychological effects. Black participants interacting with a high-error-rate voice assistant reported significantly lower positive affect, higher self-consciousness, and reduced individual and collective self-esteem compared to Black participants in a low-error-rate condition. These findings were interpreted as consistent with the psychological impact of experiencing racial microaggressions, where persistent system errors are perceived as subtle acts of bias reinforcing marginalization. While the paper used a Negative Affect scale (which includes frustration-related items), it underscores that the psychological toll of ASR failures can extend beyond simple frustration to impact self-perception and emotional well-being, particularly when these failures intersect with societal biases.

These studies confirm that VBA malfunctions are a potent source of negative user experience, leading to frustration, reduced trust, and, in some contexts, more complex adverse psychological responses.

Aggression Towards Artificial Agents and its Relation to Broader Antisocial Tendencies

The question of whether aggression directed towards technology, such as VBAs or robots, might correlate with or indicate broader antisocial tendencies has been explored. One paper, "Verbal Disinhibition towards Robots is Associated with General Antisociality" (Strait et al., 2018), directly investigated this.

Methodology:

The researchers aimed to determine if verbal aggression towards robots was an isolated phenomenon or part of a larger pattern of antisocial behavior. They used Twitter as a data source for unsupervised human-agent interactions, focusing on two high-profile robots with Twitter accounts (Bina48 and Sophia). Forty independent Twitter users were selected: 20 who had posted at least one "abusive" tweet directed at one of the robots and 20 who had posted "non-abusive" tweets at them. "Abusive" content was defined as dehumanizing material, including objectification, sexualization, racist remarks, or generally offensive comments (e.g., calling the robot stupid, expressing violent/hostile intent). For each of these 40 users, 50 additional tweets (25 before and 25 after the target robot-directed tweet) were collected, totaling 2,000 tweets for analysis. These general tweets were then coded for abusiveness using the same criteria to determine the frequency of abuse in each user's broader Twitter communication. An ANOVA was conducted to compare the frequency of abusive content in general tweets between the "abusive towards robots" group and the "non-abusive towards robots" group.

Key Findings:

The paper found a significant association: users who were abusive towards the robots were significantly more frequently abusive in their general tweeting (M = .15 frequency of abuse in general tweets) compared to users who were non-abusive towards the robots (M = .03 frequency of abuse in general tweets). There was a significant main effect of "user type" on the frequency of dehumanizing content in users' broader Twitter communications.
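To illustrate the shape of this comparison, the sketch below runs a one-way ANOVA (via scipy) on fabricated per-user abuse frequencies chosen only to sit near the reported group means; it is not the study's data or its exact analysis.

```python
# Toy illustration of the between-groups comparison described above.
# Per-user abuse frequencies are fabricated, chosen only to sit near the
# reported group means (M = .15 vs M = .03); not the study's actual data.
from scipy import stats

abusive_toward_robots     = [0.10, 0.22, 0.08, 0.18, 0.14, 0.20, 0.12, 0.16]
non_abusive_toward_robots = [0.02, 0.00, 0.06, 0.04, 0.03, 0.01, 0.05, 0.03]

f_stat, p_value = stats.f_oneway(abusive_toward_robots, non_abusive_toward_robots)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value indicates the two user types differ in how often their
# general tweets contain abusive content -- an association, not a causal claim.
```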

Nature of the Link (Correlation vs. Causation):

It is critical to note that this paper demonstrates a correlation (or association), not a causal link. The research design was observational; it identified individuals based on pre-existing behavior (abusing a robot) and then observed their other behaviors (general tweeting). This methodology can identify that two behaviors co-occur or are linked but cannot establish that one causes the other. The authors themselves discuss that the observed association could stem from aggression towards robots being linked to a more stable antisocial personality trait or resulting from a temporary state of general negative affect. Both interpretations suggest an underlying factor that is correlated with both types of abusive behavior, rather than aggression towards robots directly causing or leading to broader antisocial tweeting.
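The point can be made concrete with a toy simulation: if a single latent trait drives both robot-directed abuse and general abusiveness, the two behaviors correlate even though neither causes the other. The weights and sample size below are arbitrary assumptions for illustration only.

```python
# Toy simulation: a shared latent trait induces correlation between two behaviors
# without any causal path between them. Parameters are arbitrary and illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_users = 10_000

antisociality = rng.normal(size=n_users)                         # latent trait
robot_abuse   = 0.8 * antisociality + rng.normal(size=n_users)   # behavior 1
general_abuse = 0.8 * antisociality + rng.normal(size=n_users)   # behavior 2 (no direct link to behavior 1)

r = np.corrcoef(robot_abuse, general_abuse)[0, 1]
print(f"correlation despite no direct causal path: r = {r:.2f}")  # roughly 0.4 with these weights
```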

Examining the Potential for Escalation and Impact on Interpersonal Relationships

The core of your second question is whether frustration-induced, low-risk aggression towards a VBA could escalate or transfer ("seep into other areas"), negatively impacting interpersonal relationships. Based on the provided research, there is no direct evidence to support this specific escalatory pathway.

The paper by Strait et al. (2018) suggests that individuals exhibiting verbal aggression towards robots may already possess broader antisocial tendencies. This implies that aggression towards technology might be another manifestation of a pre-existing disposition, rather than the technology interaction serving as a catalyst or training ground that creates or escalates aggression which then transfers to human interactions. The paper did not investigate whether interacting aggressively with a robot causes an increase in subsequent interpersonal aggression or if this behavior starts with technology and then spreads.

While research clearly shows that VBA malfunctions cause user frustration and negative affect (Baughan et al., 2023; Wenzel et al., 2023), the leap from this frustration to overt, low-risk aggression towards the VBA, and then a subsequent escalation and transfer of this aggression to interpersonal relationships, is a multi-step hypothesis not directly substantiated by the available studies. The psychological impacts noted in Wenzel et al. (2023) (e.g., lower self-esteem, negative affect) are significant and could conceivably have indirect ramifications on an individual's mood and well-being, which might, in turn, affect their interactions. However, a direct causal chain leading to increased interpersonal aggression as a learned or escalated behavior from VBA interactions is not established in this corpus.

The frustration experienced by users is a response to system failure. While this frustration is a negative emotional state, the current research does not offer evidence that venting this frustration on a VBA acts as a "gateway" behavior that then cultivates or normalizes aggression in human-to-human contexts. It is more plausible, based on Strait et al. (2018), that individuals with pre-existing aggressive or antisocial tendencies might express these tendencies towards various targets, including technology.

Another paper, "This robot stinks! Differences between perceived mistreatment of robot and computer partners" (Carlson et al., 2017), found that human observers perceived mistreatment directed by a confederate towards a robot differently than towards a computer, feeling more sympathy for the robot and believing it to be more emotionally capable. This suggests that humans may ascribe some level of social presence or animacy to robots, which could influence reactions to their "mistreatment," but it doesn't directly address user-initiated frustration-aggression cycles and their transference.

Conclusion

In summary, while the criminological "gateway hypothesis" is outside the scope of the provided HCI-focused research, studies on voice-based assistants robustly confirm that system malfunctions, such as mishearing trigger words, are a significant source of user frustration and can lead to various negative psychological states. Research has also found a correlation between verbal aggression directed at artificial agents (like robots on social media) and a higher propensity for abusive language in users' general online communications. However, this is an association, suggesting that such behaviors may stem from common underlying antisocial traits rather than indicating a causal pathway where aggression towards technology trains or escalates into interpersonal aggression. The specific hypothesis that frustration-induced, low-risk aggression towards a VBA could directly lead to an escalation of aggressive behaviors impacting interpersonal relationships is not directly supported by the current body of research presented.

References (4)
1.
A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures
Baughan et al., 2023 | 11 citations
Despite huge gains in performance in natural language understanding via large language models in recent years, voice assistants still often fail to meet user expectations. In this study, we conducted a mixed-methods analysis of how voice assistant failures affect users' trust in their voice assistants. To illustrate how users have experienced these failures, we contribute a crowdsourced dataset of 199 voice assistant failures, categorized across 12 failure sources. Relying on interview and survey data, we find that certain failures, such as those due to overcapturing users' input, derail user trust more than others. We additionally examine how failures impact users' willingness to rely on voice assistants for future tasks. Users often stop using their voice assistants for specific tasks that result in failures for a short period of time before resuming similar usage. We demonstrate the importance of low stakes tasks, such as playing music, towards building trust after failures.
2.
Can Voice Assistants Be Microaggressors? Cross-Race Psychological Responses to Failures of Automatic Speech Recognition
Wenzel et al., 2023 | 8 citations
Language technologies have a racial bias, committing greater errors for Black users than for white users. However, little work has evaluated what effect these disparate error rates have on users themselves. The present study aims to understand if speech recognition errors in human-computer interactions may mirror the same effects as misunderstandings in interpersonal cross-race communication. In a controlled experiment (N=108), we randomly assigned Black and white participants to interact with a voice assistant pre-programmed to exhibit a high versus low error rate. Results revealed that Black participants in the high error rate condition, compared to Black participants in the low error rate condition, exhibited significantly higher levels of self-consciousness, lower levels of self-esteem and positive affect, and less favorable ratings of the technology. White participants did not exhibit this disparate pattern. We discuss design implications and the diverse research directions to which this initial study aims to contribute.
3.
Verbal Disinhibition towards Robots is Associated with General Antisociality
Strait et al., 2018 | 4 citations
The emergence of agentic technologies (e.g., robots) in increasingly public realms (e.g., social media) has revealed surprising antisocial tendencies in human-agent interactions. In particular, there is growing indication of people's propensity to act aggressively towards such systems - without provocation and unabashedly so. Towards understanding whether this aggressive behavior is anomalous or whether it is associated with general antisocial tendencies in people's broader interactions, we examined people's verbal disinhibition towards two artificial agents. Using Twitter as a corpus of free-form, unsupervised interactions, we identified 40 independent Twitter users who tweeted abusively or non-abusively at one of two high-profile robots with Twitter accounts (TMI's Bina48 and Hanson Robotics' Sophia). Analysis of 50 of each user's tweets most proximate to their tweet at the respective robot (N=2,000) shows people's aggression towards the robots to be associated with more frequent abuse in their general tweeting. The findings thus suggest that disinhibition towards robots is not necessarily a pervasive tendency, but rather one driven by individual differences in antisociality. Nevertheless, such unprovoked abuse highlights a need for attention to the reception of agentic technologies in society, as well as the necessity of corresponding capacities to recognize and respond to antisocial dynamics.
4.
This robot stinks! Differences between perceived mistreatment of robot and computer partners
Carlson et al., 2017 | 6 citations
Robots (and computers) are increasingly being used in scenarios where they interact socially with people. How people react to these agents is telling about the perceived animacy of such agents. Mistreatment of robots (or computers) by co-workers might provoke such telling reactions. The purpose of this study was to discover if people perceived mistreatment directed towards a robot any differently than toward a computer. This will provide some understanding of how people perceive robots in collaborative social settings. We conducted a between-subjects study with 80 participants. Participants worked cooperatively with either a robot or a computer which acted as the "recorder" for the group. A confederate either acted aggressively or neutrally towards the "recorder." We hypothesized that people would not socially accept mistreatment towards an agent that they felt was intelligent and similar to themselves; that participants would perceive the robot as more similar in appearance and emotional capability to themselves than a computer; and would observe more mistreatment. The final results supported our hypothesis; the participants observed mistreatment in the robot, but not the computer. Participants felt significantly more sympathetic towards the robot and also believed that it was much more emotionally capable.