Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks (1709.05254v2)

Published 15 Sep 2017 in cs.LG and cs.CE

Abstract: Learning to detect fraud in large-scale accounting data is one of the long-standing challenges in financial statement audits or fraud investigations. Nowadays, the majority of applied techniques refer to handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios and fraudsters gradually find ways to circumvent them. To overcome this disadvantage and inspired by the recent success of deep learning, we propose the application of deep autoencoder neural networks to detect anomalous journal entries. We demonstrate that the trained network's reconstruction error obtainable for a journal entry and regularized by the entry's individual attribute probabilities can be interpreted as a highly adaptive anomaly assessment. Experiments on two real-world datasets of journal entries show the effectiveness of the approach, resulting in high f1-scores of 32.93 (dataset A) and 16.95 (dataset B) and fewer false positive alerts compared to state-of-the-art baseline methods. Initial feedback from chartered accountants and fraud examiners underpinned the quality of the approach in capturing highly relevant accounting anomalies.

Summary

The paper introduces a deep autoencoder approach to detect both global and local anomalies in accounting datasets.
It reports f1-scores of 32.93 and 16.95 on real SAP ERP datasets while achieving 100% recall for synthetic anomalies.
The study demonstrates that deeper network configurations produce fewer false positives than unsupervised baseline methods such as PCA, HDBSCAN, LOF, and OC-SVM.

An Analysis of Anomaly Detection in Accounting Data Using Deep Autoencoders

The paper "Detection of Anomalies in Large-Scale Accounting Data using Deep Autoencoder Networks" presents a detailed study of how deep learning can be applied to accounting data, in particular to detect anomalies that may indicate fraudulent activity. It addresses the limitations of traditional rule-based systems, which rely heavily on predefined fraud scenarios and struggle to generalize beyond known cases.

Methodology and Experimentation

The authors propose a novel approach using deep autoencoder networks to identify anomalous journal entries within large-scale accounting datasets. The core of this research lies in employing the reconstruction error from these networks as a metric for anomaly detection, capturing deviations both at the global attribute level and in the co-occurrence of attribute combinations. This dual approach allows for the identification of both global anomalies, which involve rare individual attribute values, and local anomalies, which consist of unusual combinations of otherwise common attribute values.
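The two signals described above can be illustrated with a minimal sketch in plain Python. It assumes the network's reconstruction is already available as a vector of probabilities and uses mean binary cross-entropy as the reconstruction error, a common choice for binary inputs. The `rarity` term stands in for the attribute probabilities mentioned in the abstract, and `anomaly_score` is a hypothetical weighted blend; the paper's exact scoring formula may differ. All attribute names and helper functions are hypothetical.

```python
import math
from collections import Counter


def reconstruction_error(x, x_hat, eps=1e-7):
    """Mean binary cross-entropy between a binary input vector and its reconstruction."""
    total = 0.0
    for target, prob in zip(x, x_hat):
        prob = min(max(prob, eps), 1.0 - eps)  # clip so log() never receives 0
        total -= target * math.log(prob) + (1 - target) * math.log(1.0 - prob)
    return total / len(x)


def attribute_probabilities(entries):
    """Relative frequency of each value, computed separately for every attribute."""
    n = len(entries)
    probs = {}
    for col in entries[0]:
        counts = Counter(entry[col] for entry in entries)
        probs[col] = {value: count / n for value, count in counts.items()}
    return probs


def rarity(entry, probs):
    """Average of (1 - p) over an entry's attribute values: high for rare (global) outliers."""
    return sum(1.0 - probs[col][value] for col, value in entry.items()) / len(entry)


def anomaly_score(rec_error, rare, alpha=0.5):
    """Hypothetical weighted blend; alpha trades reconstruction error against rarity."""
    return alpha * rec_error + (1.0 - alpha) * rare


# Toy journal entries with hypothetical attribute names.
entries = [
    {"doc_type": "SA", "currency": "EUR"},
    {"doc_type": "SA", "currency": "EUR"},
    {"doc_type": "SA", "currency": "EUR"},
    {"doc_type": "KR", "currency": "USD"},
]
probs = attribute_probabilities(entries)
print(rarity(entries[0], probs))  # 0.25: both values are common
print(rarity(entries[3], probs))  # 0.75: both values are rare
print(reconstruction_error([1, 0, 1], [0.9, 0.1, 0.8]))  # small: close reconstruction
```

The rarity term only sees individual values: an entry pairing `SA` with `USD` would score a moderate 0.5 even if that combination never occurs in the ledger. Catching such local anomalies is the job of the reconstruction error, since an autoencoder trained on regular entries reconstructs unseen combinations poorly.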

The authors ran their experiments on two datasets extracted from SAP ERP systems. The datasets were pre-processed by encoding their categorical attributes as binary vectors so that they could serve as input to the networks. The paper evaluates a range of autoencoder architectures, from shallow to deep configurations, in terms of precision, recall, and f1-score. The best results came from the deepest autoencoder configuration, which achieved f1-scores of 32.93 on dataset A and 16.95 on dataset B.
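The encoding step can be illustrated with a minimal one-hot sketch in plain Python. The attribute names (`company`, `currency`) and the helpers `build_vocabulary` and `one_hot` are hypothetical; the paper's actual preprocessing of SAP fields may differ in detail.

```python
def build_vocabulary(entries):
    """Assign a fixed vector position to every (attribute, value) pair seen in the data."""
    vocab = {}
    for entry in entries:
        for col in sorted(entry):
            key = (col, entry[col])
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def one_hot(entry, vocab):
    """Encode one journal entry as a binary vector with a single 1 per known attribute value."""
    vec = [0] * len(vocab)
    for col, value in entry.items():
        idx = vocab.get((col, value))
        if idx is not None:  # values unseen when building the vocabulary stay all-zero
            vec[idx] = 1
    return vec


entries = [
    {"company": "C1", "currency": "EUR"},
    {"company": "C2", "currency": "EUR"},
]
vocab = build_vocabulary(entries)
print(one_hot(entries[0], vocab))  # [1, 1, 0]
print(one_hot(entries[1], vocab))  # [0, 1, 1]
```

Because every distinct attribute value gets its own position, the input width grows with the number of distinct values, so real journal entries become wide, sparse binary vectors.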

Results and Comparative Analysis

The findings suggest that deep autoencoders detect anomalies with higher precision than state-of-the-art unsupervised techniques such as PCA, HDBSCAN, LOF, and OC-SVM. Notably, the AE 9 architecture, the deepest configuration evaluated, retained 100% recall of the synthetic anomalies while producing substantially fewer false positives than the benchmark methods.
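The trade-off described above, full recall with fewer false positives, can be made concrete with a small sketch of how precision, recall, and f1-score follow from the set of entries flagged for review. The scores, threshold, and helper names are hypothetical toy values, not data from the paper.

```python
def flag(scores, threshold):
    """Indices of entries whose anomaly score reaches the alert threshold."""
    return {i for i, s in enumerate(scores) if s >= threshold}


def precision_recall_f1(flagged, true_anomalies):
    """Standard detection metrics computed from sets of entry indices."""
    tp = len(flagged & true_anomalies)  # correctly flagged anomalies
    fp = len(flagged - true_anomalies)  # false-positive alerts
    fn = len(true_anomalies - flagged)  # missed anomalies
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


scores = [0.1, 0.9, 0.2, 0.8, 0.7]  # toy anomaly scores, not data from the paper
true_anomalies = {1, 3}
p, r, f = precision_recall_f1(flag(scores, 0.5), true_anomalies)
print(p, r, f)  # recall is 1.0, but one false positive lowers precision to 2/3
```

Raising the threshold to 0.75 here removes the false positive while keeping recall at 1.0. In practice, a model whose scores separate anomalies more cleanly from regular entries allows such a threshold to be chosen without losing recall.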

Quantitatively, the results indicate that deeper network configurations help the model learn the complex patterns present in ledger data, which improves detection precision. Qualitatively, many of the anomalies flagged by the model corresponded to non-compliant activities that might indicate fraud or errors, such as transactions involving unusual currency changes or document types.

Implications and Future Directions

The research has both practical and theoretical implications. Practically, it gives auditors and forensic examiners a tool that could make fraud detection in financial audits more efficient and effective, by reducing false positives and flagging significant anomalies for further review. Theoretically, it extends deep learning into a domain traditionally dominated by manual rules and heuristic methods, showing that neural networks can learn the regularities of accounting data without hand-specified fraud scenarios.

Looking forward, the paper points to further applications of deep learning in forensic accounting, including adversarial autoencoder architectures and a closer investigation of the latent-space representations learned from accounting data. Such work could deepen the understanding of both regular and anomalous patterns, make anomaly detection frameworks more robust, and help them adapt to ever-evolving fraud strategies.

Overall, this research is a significant step toward integrating state-of-the-art machine learning techniques into mainstream audit processes, promising both more accurate fraud detection and lower operational overhead in financial audits.

GitHub

GitiHubi/deepAI: Detection of Accounting Anomalies using Deep Autoencoder Neural Networks. A lab prepared for NVIDIA's GPU Technology Conference 2018 that walks through the detection of accounting anomalies using deep autoencoder neural networks. Most of the lab content is based on Jupyter Notebook, Python, and PyTorch.