- The paper introduces DSP.Ear, a framework that leverages low-power DSP co-processors to enable continuous audio sensing with 3–7x longer battery life than CPU-only execution.
- It employs techniques such as admission filters, behavioral locality detection, and selective CPU offloading to minimize unnecessary computation.
- Empirical evaluations demonstrate that DSP.Ear sustains full-day operation in 80–90% of scenarios, underscoring its practical impact on mobile sensing efficiency.
DSP.Ear: Leveraging Co-Processor Support for Continuous Audio Sensing on Smartphones
The paper "DSP.Ear: Leveraging Co-Processor Support for Continuous Audio Sensing on Smartphones," by Petko Georgiev et al., tackles the energy-consumption challenge of continuous audio sensing on smartphones. By exploiting the Digital Signal Processing (DSP) co-processors integrated into modern mobile devices, the DSP.Ear framework achieves substantial power savings while maintaining robust audio-sensing performance.
Overview of DSP.Ear Framework
The DSP.Ear system capitalizes on the low-power DSP co-processors found in contemporary smartphones, such as Qualcomm's Hexagon DSP, to facilitate the continuous and simultaneous execution of multiple audio inference algorithms with minimal impact on battery life. The framework integrates five primary audio pipelines: ambient noise classification, gender recognition, speaker counting, speaker identification, and emotion recognition. Each of these pipelines extracts contextually relevant information from the audio environment, providing comprehensive user behavior insights.
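One plausible way to picture this architecture is a shared front-end that extracts audio features once per frame and feeds all five inference pipelines. The sketch below is illustrative only: the function names, the stub classifiers, and the single-dictionary feature representation are assumptions, not the paper's actual API.

```python
from typing import Callable, Dict

def extract_features(frame: bytes) -> dict:
    """Placeholder front-end: a real implementation would compute
    acoustic features (e.g. MFCCs, pitch) once and share them."""
    return {"frame": frame}

# Stub classifiers standing in for the five pipelines described in the paper.
PIPELINES: Dict[str, Callable[[dict], str]] = {
    "ambient_noise": lambda feats: "noisy",
    "gender": lambda feats: "female",
    "speaker_count": lambda feats: "2",
    "speaker_id": lambda feats: "unknown",
    "emotion": lambda feats: "neutral",
}

def process_frame(frame: bytes) -> Dict[str, str]:
    """Run every pipeline on features computed once per frame."""
    feats = extract_features(frame)  # computed a single time, shared by all
    return {name: clf(feats) for name, clf in PIPELINES.items()}
```

Computing the feature front-end once and fanning out to multiple classifiers is what makes running five pipelines concurrently on a low-power DSP plausible.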
Key Optimizations and Techniques
The DSP.Ear framework employs several optimizations that collectively extend the operational battery life of mobile devices:
- Pipeline Execution on DSP: The system achieves high energy efficiency by executing most computational tasks directly on the DSP, thereby minimizing the reliance on the main CPU. This approach leverages the inherent low-power consumption characteristics of the DSP for routine audio processing tasks.
- Admission Filters: Utilization of lightweight admission filters helps in eliminating silent or irrelevant audio frames early in the processing pipeline. For example, this includes real-time checks for silent environments or non-speech scenarios, preventing unnecessary computation and saving power.
- Behavioral Locality Detection: The framework capitalizes on the locality of human behaviors by employing similarity detectors. When consecutive audio segments exhibit similar characteristics, the framework bypasses redundant classification steps and propagates previous results. This optimization allows significant computational savings, particularly for prolonged, homogeneous audio patterns.
- Selective CPU Offloading: While the DSP manages much of the continuous processing, complex tasks such as detailed emotion recognition and speaker identification sometimes necessitate CPU involvement. By selectively offloading these tasks based on memory constraints and computational requirements, DSP.Ear strikes a balance between DSP and CPU utilization.
- Cross-Pipeline Optimizations: The system efficiently shares intermediate results across multiple pipelines where applicable. For instance, gender detection output can prune the subset of speaker identification models to be evaluated, effectively reducing the overall computation load.
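Three of the optimizations above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the thresholds (`SILENCE_RMS`, `SIMILARITY_EPS`), the RMS-energy silence test, the normalized-distance similarity measure, and the model dictionaries are all hypothetical stand-ins.

```python
import numpy as np

SILENCE_RMS = 0.01     # hypothetical energy threshold for the admission filter
SIMILARITY_EPS = 0.15  # hypothetical distance bound for "similar" segments

def admit_frame(samples: np.ndarray) -> bool:
    """Admission filter: drop near-silent frames before any feature
    extraction or classification runs."""
    rms = np.sqrt(np.mean(samples.astype(float) ** 2))
    return bool(rms > SILENCE_RMS)

def classify_with_locality(features, prev_features, prev_label, classifier):
    """Behavioral locality: if the new segment's features are close to
    the previous segment's, propagate the cached label and skip the
    full classifier."""
    if prev_features is not None:
        dist = np.linalg.norm(features - prev_features) / (
            np.linalg.norm(prev_features) + 1e-9)
        if dist < SIMILARITY_EPS:
            return prev_label       # reuse previous result, no inference
    return classifier(features)     # fall back to full classification

def speaker_models_for(gender, models):
    """Cross-pipeline pruning: use the gender pipeline's output to
    restrict which speaker-identification models are evaluated."""
    return [m for m in models if m["gender"] == gender]
```

Each function saves energy the same way: it decides cheaply whether expensive downstream work (feature extraction, classification, or evaluating every speaker model) can be skipped.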
Evaluation and Results
Empirical evaluations of DSP.Ear show substantial gains in both power consumption and system performance. The framework delivers 3 to 7 times longer battery life than CPU-only solutions, and its optimizations yield a further 2 to 3 times efficiency improvement over a naive DSP-based implementation. Notably, analysis against a large-scale dataset of 1320 Android users shows that DSP.Ear can sustain continuous operation for a full day in 80–90% of usage scenarios, even when other smartphone applications are running concurrently.
Implications and Future Directions
DSP.Ear has significant practical implications for applications that require continuous ambient-sound monitoring without frequent recharging, including health monitoring, security, user-behavior analysis, and context-aware computing. Theoretically, the research sets a precedent for offloading sensor-data processing to co-processors in mobile devices for enhanced efficiency.
Future Developments:
- Enhanced Model Support: Expanding the range of supported classification models on DSP through efficient memory utilization or advanced model compression techniques can further enhance the framework’s versatility.
- Broader Sensor Integration: Incorporating additional sensors (e.g., accelerometers, gyroscopes) into the DSP.Ear framework could enable more comprehensive and energy-efficient multi-modal sensing.
- Adaptive Sensing Techniques: Implementing real-time adaptive techniques to dynamically adjust sensing and processing parameters based on context and user activity could further optimize energy consumption and responsiveness.
Conclusion
The work presented in DSP.Ear demonstrates a robust solution to the energy constraints faced by continuous audio sensing applications on smartphones. Through strategic integration of DSP capabilities and innovative computational optimizations, the framework sets the stage for future advancements in energy-efficient, context-aware mobile applications. The research paves the way for leveraging co-processing technologies to achieve seamless and sustainable mobile sensing in real-world scenarios.