AI Research Assistant for Computer Scientists
Numerous studies have investigated user interactions with voice assistants, and a subset of this research specifically addresses the negative consequences of system failures, including those stemming from misheard trigger words or erroneous Automatic Speech Recognition (ASR). While the term "aggression" is not commonly used in the examined literature to describe user responses, "frustration" and other negative psychological states are frequently reported outcomes of such technical deficiencies.
User Frustration and Negative Reactions to Voice Assistant Errors
Research indicates that voice assistant errors are a significant source of user frustration and can detrimentally impact user trust and overall experience. A mixed-methods paper investigating user trust after voice assistant failures categorized different types of failures, including those related to the system incorrectly capturing user input, referred to as "Perception" failures (Baughan et al., 2023). These Perception failures encompass issues like noisy channels interfering with input, overcapture (the system listening for too long), truncation (the system cutting off input too early), and transcription errors (mishearing words).
Quantitative analysis in this paper revealed that Overcapture failures were particularly harmful, resulting in the lowest reported trust scores regarding the voice assistant's perceived ability and benevolence compared to other failure types (Baughan et al., 2023). Qualitatively, users described Overcapture as aggravating, annoying, and a waste of time. Transcription errors, where the system mishears words or struggles with variations in speech (accents, names, foreign languages), were also found to negatively impact perceptions of benevolence and contributed to frustration and annoyance (Baughan et al., 2023). While Transcription errors did not statistically impact perceived ability as severely as Incorrect Actions, they were more detrimental to trust than errors due to noisy channels or truncation. Truncation errors, where the system stops listening prematurely, were also described as aggravating and annoying, increasing the time users needed to complete tasks (Baughan et al., 2023).
The aggregate effect of these Perception failures, stemming directly from the system's inability to accurately process user speech (including trigger words and subsequent commands), leads users to abandon the task temporarily or to simplify their interactions to avoid scenarios prone to such errors (Baughan et al., 2023).
Psychological Responses to ASR Failures
Beyond general frustration, studies have explored deeper psychological responses to ASR failures, particularly considering known biases in these systems. Research has shown that language technologies, including ASR, can exhibit differential error rates across demographic groups, notably higher rates for Black speakers compared to white speakers (Wenzel et al., 2023). A paper examining cross-race psychological responses to ASR failures found that Black participants interacting with a voice assistant exhibiting a high error rate reported significantly lower levels of positive affect, higher levels of self-consciousness, and reduced individual and collective self-esteem compared to Black participants in a low error rate condition (Wenzel et al., 2023). These findings were interpreted as consistent with the psychological impact of experiencing racial microaggressions, where persistent errors are perceived as subtle acts of bias reinforcing marginalization (Wenzel et al., 2023). While the paper used a Negative Affect scale (which includes items related to frustration), it did not find a statistically significant difference in negative affect between Black and white participants in the high error condition, although errors generally increased negative feelings for both groups. Aggression was not a measured response in this specific paper (Wenzel et al., 2023).
Another paper noted that users are "often frustrated by voice assistants' frequent errors" and explored the feasibility of recognizing these errors and the resulting user frustration from visual cues such as facial reactions (Cuadra et al., 2021). While recognizing errors and frustration from soundless video was challenging, the paper suggested that it is possible and warrants further investigation, potentially through multimodal analysis combining audio and visual data (Cuadra et al., 2021).
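To make the idea of multimodal error detection concrete, the sketch below frames it as a standard supervised classification problem with late fusion of per-interaction visual and audio feature vectors. This is only an illustrative setup under assumed inputs (randomly generated stand-in features and labels, and a logistic regression classifier from scikit-learn); it is not the model or feature set used by Cuadra et al. (2021).

```python
# Hypothetical sketch: framing error/frustration detection as supervised
# classification over per-interaction feature vectors. Feature extraction
# (e.g., facial action units, prosodic statistics) is assumed to happen
# upstream; the arrays below are random stand-ins, not real features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200                                   # number of recorded interactions
visual = rng.normal(size=(n, 17))         # stand-in for facial-cue features
audio = rng.normal(size=(n, 13))          # stand-in for prosodic/audio features
labels = rng.integers(0, 2, size=n)       # 1 = assistant error occurred, 0 = no error

# Late fusion: concatenate the modality features and train one classifier.
fused = np.hstack([visual, audio])
clf = LogisticRegression(max_iter=1000)

# Compare visual-only against audio+visual performance, echoing the idea
# that adding audio to soundless video could make error recognition easier.
print("visual only :", cross_val_score(clf, visual, labels, cv=5).mean())
print("audio+visual:", cross_val_score(clf, fused, labels, cv=5).mean())
```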
Implications for Voice Assistant Design
The documented user frustration and negative psychological responses highlight the critical need for robust and accurate wake word detection and ASR systems. Ongoing research focuses on improving these core components to mitigate errors. Efforts include developing more efficient and accurate voice trigger detection models (Higuchi et al., 2020; Zhang et al., 2022; R et al., 2021), improving performance in noisy environments (Bonet et al., 2021), mitigating false triggers caused by acoustically similar sounds (Chen et al., 2021; Garg et al., 2021), and improving ASR performance for diverse speakers, including those with dysfluent speech or different linguistic backgrounds (Mitra et al., 2021; Wu et al., 2020). Furthermore, research into conversational error recovery mechanisms, such as allowing users to repeat or reformulate commands, aims to provide pathways for users and systems to recover gracefully from misinterpretations (Nguyen et al., 2021; Fazel-Zarandi et al., 2019; Galbraith et al., 2023). Addressing issues like overcapture and truncation through improved endpoint detection is also crucial for reducing frustration associated with input processing failures (Mallidi et al., 2018; Buddi et al., 2023).
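As a concrete illustration of the overcapture/truncation trade-off that endpoint detection must manage, the following minimal sketch implements a simple energy-based endpointer with a configurable "hangover" (the amount of trailing silence tolerated before the system stops listening). The threshold, frame rate, and toy signal are assumptions chosen purely for illustration and do not reflect the methods in the cited papers (e.g., Mallidi et al., 2018; Buddi et al., 2023).

```python
# Minimal sketch (not the cited papers' method): an energy-based endpointer.
# A short hangover risks truncation (cutting the user off mid-utterance);
# a long hangover risks overcapture (listening well past the end of speech).
import numpy as np

def detect_endpoint(frame_energies, threshold=0.1, hangover_frames=30):
    """Return the frame index at which the endpointer stops listening,
    or None if no endpoint is declared within the given frames."""
    silent_run = 0
    speech_seen = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            speech_seen = True
            silent_run = 0
        elif speech_seen:
            silent_run += 1
            if silent_run >= hangover_frames:
                return i          # declare end of utterance here
    return None

# Toy signal: 1 s of speech followed by 2 s of silence at 100 frames/s.
energies = np.concatenate([np.full(100, 0.5), np.full(200, 0.01)])
print(detect_endpoint(energies, hangover_frames=30))   # ends promptly (~frame 130)
print(detect_endpoint(energies, hangover_frames=150))  # lingers: overcapture-like
```

Tuning this single hangover parameter already shows why endpointing is hard: any fixed setting trades truncation errors against overcapture errors, which is why the cited work pursues smarter, context-aware endpoint detection.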
In conclusion, research confirms that voice assistant errors, including the mishearing of trigger words and subsequent commands, are a significant cause of user frustration and contribute to negative user experiences and reduced trust. While aggression is not a commonly documented outcome, the psychological impacts can be more profound, particularly for users who experience differential error rates. Continued advancements in ASR, wake word detection, and dialogue management are essential to mitigate these issues and improve the usability and perceived fairness of voice assistants.
The query poses two distinct questions. The first concerns the "gateway hypothesis" in criminology: the idea that engagement in lower-stakes deviant behavior can precede more serious criminal acts, exemplified by the claim that serial killers often start with animal cruelty. The second asks whether frustration stemming from Voice-Based Assistants (VBAs) mishearing trigger words, and the resulting low-risk aggression towards the VBA, might escalate or transfer and thereby adversely impact users' interpersonal relationships.
The provided research corpus, primarily sourced from arXiv, focuses on Human-Computer Interaction (HCI), user experiences with voice assistants, and aggression in digital contexts. As such, it can substantially address the second part of your query. However, the criminological "gateway hypothesis" falls outside the typical scope of this research domain.
The "Gateway Hypothesis" in Criminology
The specific criminological theory regarding a progression from lower-stakes deviant behaviors (e.g., animal cruelty) to more severe criminal acts (e.g., serial homicide) is a complex subject extensively studied within criminology, psychology, and forensic science. The available arXiv research, which centers on computational and HCI-related topics, does not contain studies directly investigating or validating this criminological hypothesis. Answering this part of your query would necessitate a review of literature from those specialized fields, which is beyond the purview of the current dataset.
User Frustration and Negative Affect from Voice-Based Assistant (VBA) Malfunctions
Research extensively documents that malfunctions in VBAs, including mishearing trigger words or commands, are a significant source of user frustration and can lead to a range of negative psychological outcomes.
A mixed-methods paper (Baughan et al., 2023) investigating user trust after voice assistant failures identified various failure types, including "Perception" failures, which encompass the system incorrectly capturing user input. These include issues like noisy channels, overcapture (system listening too long), truncation (system cutting off input too early), and transcription errors (mishearing words). The paper found that Overcapture failures were particularly detrimental, resulting in the lowest reported trust scores concerning the VBA's perceived ability and benevolence. Users qualitatively described Overcapture as "aggravating," "annoying," and a "waste of time." Transcription errors, where the system mishears words or struggles with speech variations (e.g., accents, names, foreign languages), also negatively impacted perceptions of benevolence and contributed to user "frustration" and "annoyance." While not impacting perceived ability as severely as "Incorrect Action" failures, transcription errors were more damaging to trust than errors due to noisy channels or truncation. Truncation errors were similarly described as "aggravating" and "annoying," increasing task completion time. The cumulative effect of these Perception failures often led users to temporarily abandon tasks or simplify their interactions to avoid error-prone scenarios.
Furthermore, research into cross-race psychological responses to failures in Automatic Speech Recognition (ASR) systems (Wenzel et al., 2023) highlights more profound impacts. Given that ASR can exhibit disparate error rates across demographic groups (e.g., higher for Black speakers than white speakers), this paper explored the psychological effects. Black participants interacting with a high-error-rate voice assistant reported significantly lower positive affect, higher self-consciousness, and reduced individual and collective self-esteem compared to Black participants in a low-error-rate condition. These findings were interpreted as consistent with the psychological impact of experiencing racial microaggressions, where persistent system errors are perceived as subtle acts of bias reinforcing marginalization. While the paper used a Negative Affect scale (which includes frustration-related items), it underscores that the psychological toll of ASR failures can extend beyond simple frustration to impact self-perception and emotional well-being, particularly when these failures intersect with societal biases.
These studies confirm that VBA malfunctions are a potent source of negative user experience, leading to frustration, reduced trust, and, in some contexts, more complex adverse psychological responses.
Aggression Towards Artificial Agents and its Relation to Broader Antisocial Tendencies
The question of whether aggression directed towards technology, such as VBAs or robots, might correlate with or indicate broader antisocial tendencies has been explored. One paper, "Verbal Disinhibition towards Robots is Associated with General Antisociality" (Strait et al., 2018), directly investigated this.
Methodology:
The researchers aimed to determine if verbal aggression towards robots was an isolated phenomenon or part of a larger pattern of antisocial behavior. They used Twitter as a data source for unsupervised human-agent interactions, focusing on two high-profile robots with Twitter accounts (Bina48 and Sophia). Forty independent Twitter users were selected: 20 who had posted at least one "abusive" tweet directed at one of the robots and 20 who had posted "non-abusive" tweets at them. "Abusive" content was defined as dehumanizing material, including objectification, sexualization, racist remarks, or generally offensive comments (e.g., calling the robot stupid, expressing violent/hostile intent). For each of these 40 users, 50 additional tweets (25 before and 25 after the target robot-directed tweet) were collected, totaling 2,000 tweets for analysis. These general tweets were then coded for abusiveness using the same criteria to determine the frequency of abuse in each user's broader Twitter communication. An ANOVA was conducted to compare the frequency of abusive content in general tweets between the "abusive towards robots" group and the "non-abusive towards robots" group.
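For readers who want to see the shape of this group comparison, the sketch below runs a one-way ANOVA on per-user abuse frequencies for two groups of 20 users each, mirroring the analysis described above. The frequencies are simulated around the reported group means purely for illustration; this is not the authors' code or data.

```python
# Illustrative sketch of the described analysis, not the authors' code or data:
# a one-way ANOVA comparing per-user frequency of abusive content in general
# tweets between the two groups (20 users each). Values are simulated around
# the reported group means (M = .15 vs. M = .03) for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
abusive_group = np.clip(rng.normal(0.15, 0.08, size=20), 0, 1)      # abusive towards robots
non_abusive_group = np.clip(rng.normal(0.03, 0.03, size=20), 0, 1)  # non-abusive towards robots

f_stat, p_value = stats.f_oneway(abusive_group, non_abusive_group)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print("group means:", abusive_group.mean().round(3), non_abusive_group.mean().round(3))
```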
Key Findings:
The paper found a significant association: users who were abusive towards the robots were significantly more frequently abusive in their general tweeting (M = .15 frequency of abuse in general tweets, roughly five times the M = .03 observed for users who were non-abusive towards the robots). This corresponded to a significant main effect of "user type" on the frequency of dehumanizing content in users' broader Twitter communications.
Nature of the Link (Correlation vs. Causation):
It is critical to note that this paper demonstrates a correlation (or association), not a causal link. The research design was observational; it identified individuals based on pre-existing behavior (abusing a robot) and then observed their other behaviors (general tweeting). This methodology can identify that two behaviors co-occur or are linked but cannot establish that one causes the other. The authors themselves discuss that the observed association could stem from aggression towards robots being linked to a more stable antisocial personality trait or resulting from a temporary state of general negative affect. Both interpretations suggest an underlying factor that is correlated with both types of abusive behavior, rather than aggression towards robots directly causing or leading to broader antisocial tweeting.
Examining the Potential for Escalation and Impact on Interpersonal Relationships
The core of your second question is whether frustration-induced, low-risk aggression towards a VBA could escalate or transfer ("seep into other areas"), negatively impacting interpersonal relationships. Based on the provided research, there is no direct evidence to support this specific escalatory pathway.
The findings of Strait et al. (2018) suggest that individuals exhibiting verbal aggression towards robots may already possess broader antisocial tendencies. This implies that aggression towards technology might be another manifestation of a pre-existing disposition, rather than the technology interaction serving as a catalyst or training ground that creates or escalates aggression which then transfers to human interactions. The paper did not investigate whether interacting aggressively with a robot causes an increase in subsequent interpersonal aggression, or whether this behavior starts with technology and then spreads.
While research clearly shows that VBA malfunctions cause user frustration and negative affect (Baughan et al., 2023; Wenzel et al., 2023), the leap from this frustration to overt, low-risk aggression towards the VBA, and then a subsequent escalation and transfer of this aggression to interpersonal relationships, is a multi-step hypothesis not directly substantiated by the available studies. The psychological impacts noted by Wenzel et al. (2023) (e.g., lower self-esteem, negative affect) are significant and could conceivably have indirect ramifications on an individual's mood and well-being, which might, in turn, affect their interactions. However, a direct causal chain leading to increased interpersonal aggression as a learned or escalated behavior from VBA interactions is not established in this corpus.
The frustration experienced by users is a response to system failure. While this frustration is a negative emotional state, the current research does not offer evidence that venting this frustration on a VBA acts as a "gateway" behavior that then cultivates or normalizes aggression in human-to-human contexts. It is more plausible, based on Strait et al. (2018), that individuals with pre-existing aggressive or antisocial tendencies might express these tendencies towards various targets, including technology.
Another paper, "This robot stinks! Differences between perceived mistreatment of robot and computer partners" (Carlson et al., 2017), found that human observers perceived mistreatment directed by a confederate towards a robot differently than towards a computer, feeling more sympathy for the robot and believing it to be more emotionally capable. This suggests that humans may ascribe some level of social presence or animacy to robots, which could influence reactions to their "mistreatment," but it does not directly address user-initiated frustration-aggression cycles and their transference.
Conclusion
In summary, while the criminological "gateway hypothesis" is outside the scope of the provided HCI-focused research, studies on voice-based assistants robustly confirm that system malfunctions, such as mishearing trigger words, are a significant source of user frustration and can lead to various negative psychological states. Research has also found a correlation between verbal aggression directed at artificial agents (like robots on social media) and a higher propensity for abusive language in users' general online communications. However, this is an association, suggesting that such behaviors may stem from common underlying antisocial traits rather than indicating a causal pathway where aggression towards technology trains or escalates into interpersonal aggression. The specific hypothesis that frustration-induced, low-risk aggression towards a VBA could directly lead to an escalation of aggressive behaviors impacting interpersonal relationships is not directly supported by the current body of research presented.