
Customer Support Ticket Escalation Prediction using Feature Engineering (2010.06145v1)

Published 10 Oct 2020 in cs.SE and cs.LG

Abstract: Understanding and keeping the customer happy is a central tenet of requirements engineering. Strategies to gather, analyze, and negotiate requirements are complemented by efforts to manage customer input after products have been deployed. For the latter, support tickets are key in allowing customers to submit their issues, bug reports, and feature requests. If insufficient attention is given to support issues, however, their escalation to management becomes time-consuming and expensive, especially for large organizations managing hundreds of customers and thousands of support tickets. Our work provides a step towards simplifying the job of support analysts and managers, particularly in predicting the risk of escalating support tickets. In a field study at our large industrial partner, IBM, we used a design science research methodology to characterize the support process and data available to IBM analysts in managing escalations. We then implemented these features into a machine learning model to predict support ticket escalations. We trained and evaluated our machine learning model on over 2.5 million support tickets and 10,000 escalations, obtaining a recall of 87.36% and an 88.23% reduction in the workload for support analysts looking to identify support tickets at risk of escalation. Finally, in addition to these research evaluation activities, we compared the performance of our support ticket model with that of a model developed with no feature engineering; the support ticket model features outperformed the non-engineered model. The artifacts created in this research are designed to serve as a starting place for organizations interested in predicting support ticket escalations, and for future researchers to build on to advance research in escalation prediction.

Citations (6)

Summary

  • The paper’s main contribution is using structured data-based feature engineering to forecast ticket escalations effectively.
  • It evaluates multiple machine learning models, notably Random Forest, using metrics such as precision, recall, F1-score, and AUC.
  • The study demonstrates practical integration into support workflows, enabling proactive intervention and better resource allocation.

The paper "Customer Support Ticket Escalation Prediction using Feature Engineering" (2010.06145) builds a machine learning model to predict whether a customer support ticket will escalate. Predicting escalation matters to support organizations because it enables proactive intervention, better resource allocation, shorter resolution times, and improved customer satisfaction.

The core contribution of the paper lies in its emphasis on feature engineering using structured data attributes from customer support tickets. Unlike approaches that might rely heavily on ticket text, this paper deliberately excludes textual data to explore the predictive power of readily available structured information. This is a practical consideration for many systems where extracting meaningful features from text can be complex or resource-intensive.

The methodology involves collecting historical customer support ticket data, performing extensive feature engineering, training and evaluating several classification models, and identifying the most impactful features and best-performing model for this prediction task.

Problem Definition

The problem is framed as a binary classification task: given the current state of a customer support ticket, predict whether it will eventually be escalated (a binary outcome: Yes/No). Escalation is typically defined by the movement of a ticket to a higher level of support expertise or management involvement.

Feature Engineering

The paper highlights the importance of domain-specific feature engineering from structured data. It groups the engineered features into several categories:

  1. Ticket Attributes: Static attributes of the ticket itself, available upon creation or early in its lifecycle. Examples include initial priority, ticket category, product/service affected, reported severity.
  2. Customer Attributes: Information about the customer submitting the ticket. Examples include customer support level (e.g., premium, standard), customer region, size of the customer organization.
  3. Temporal Features: Features derived from timestamps. Examples include time elapsed since ticket creation, day of the week or hour of the day the ticket was created or last updated, duration since last interaction.
  4. Interaction Features: Features capturing the activity and interaction around the ticket. Examples include the number of comments or updates, the number of times the assigned support engineer has changed, the number of days the ticket has been open.
  5. Historical Features: Features summarizing the past behavior related to the ticket or customer. Examples might include the number of previous tickets submitted by the customer, the average resolution time for similar tickets, or the escalation rate for tickets of the same category/product.

The paper stresses that generating these features often requires joining data from different sources within the support system and potentially other internal systems (like customer databases).
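
As a rough illustration of that joining step, the sketch below merges a ticket table with a customer table and derives two historical features from each customer's earlier tickets only. The table layout and all column names (`customer_id`, `support_level`, `escalated`, and so on) are hypothetical and not taken from the paper.

```python
import pandas as pd

# Hypothetical toy data; the columns are illustrative, not the paper's schema.
tickets = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4, 5],
    "customer_id": ["a", "a", "b", "a", "b"],
    "created_at": pd.to_datetime(
        ["2020-01-01", "2020-01-05", "2020-01-06", "2020-02-01", "2020-02-03"]
    ),
    "escalated": [0, 1, 0, 0, 1],
})
customers = pd.DataFrame({
    "customer_id": ["a", "b"],
    "support_level": ["premium", "standard"],
})


def add_historical_features(tickets, customers):
    """Join customer attributes and count each customer's earlier tickets."""
    # Sort by time so "previous" really means "created earlier".
    df = tickets.sort_values("created_at").merge(
        customers, on="customer_id", how="left"
    )
    # Number of tickets the same customer opened before this one.
    df["prev_tickets"] = df.groupby("customer_id").cumcount()
    # Escalations among those earlier tickets (the current ticket is excluded,
    # so the label does not leak into its own feature).
    running = df.groupby("customer_id")["escalated"].cumsum()
    df["prev_escalations"] = running - df["escalated"]
    return df


features = add_historical_features(tickets, customers)
print(features[["ticket_id", "support_level", "prev_tickets", "prev_escalations"]])
```

Subtracting the current ticket's own label from the running sum keeps the feature limited to information that would have been available when the ticket was opened.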

Machine Learning Models and Evaluation

The authors evaluate several standard classification algorithms:

  • Logistic Regression
  • Naive Bayes
  • Decision Tree
  • Random Forest
  • Support Vector Machines (SVM)

Given that escalated tickets are likely a minority class (class imbalance), the paper emphasizes the use of evaluation metrics beyond simple accuracy. Key metrics include:

  • Precision: Out of all tickets predicted as escalated, what fraction actually escalated?
  • Recall (Sensitivity): Out of all truly escalated tickets, what fraction were correctly predicted?
  • F1-score: The harmonic mean of Precision and Recall, providing a single metric balancing both.
  • Area Under the ROC Curve (AUC): Measures the ability of the model to distinguish between the two classes across various probability thresholds.
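
To make these definitions concrete, here is a minimal pure-Python sketch that computes the four metrics on a small made-up set of labels and scores. The data and the 0.5 decision threshold are assumptions for illustration; in practice a library routine such as those in scikit-learn would normally be used.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = escalated) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall, f1 = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Precision, recall, and F1 depend on the chosen threshold, while AUC is computed from the raw scores and therefore summarizes ranking quality across all thresholds.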

The evaluation showed that models, particularly tree-based ones like Random Forest, performed well using the engineered features. Temporal and interaction features, such as the time a ticket has been open or the number of engineer reassignments, were found to be significant predictors of escalation.

Practical Implementation and Application

Implementing this research in a real-world customer support system would involve several steps:

  1. Data Extraction and Integration: Set up processes to extract raw ticket data, customer data, and interaction logs from the support system(s). This might involve database queries, APIs, or data warehousing solutions.
  2. Feature Engineering Pipeline: Build a pipeline to transform the raw data into the engineered features identified in the paper. This requires defining the logic for each feature (e.g., calculating time differences, counting events, aggregating historical data).
    
    # Sketch of a simple feature engineering step
    import pandas as pd
    
    def create_features(tickets_df, now=None):
        """Return engineered features for each ticket row."""
        df = tickets_df.copy()  # avoid mutating the caller's DataFrame
        # Reference time; assumes 'created_at' holds timezone-naive timestamps
        now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    
        # Example: Temporal feature - Time since creation in hours
        df['created_time'] = pd.to_datetime(df['created_at'])
        df['time_since_creation_hours'] = (now - df['created_time']).dt.total_seconds() / 3600.0
    
        # Example: Interaction feature - Number of updates
        # Assuming update_log is a list; missing or non-list values count as 0
        df['num_updates'] = df['update_log'].apply(lambda x: len(x) if isinstance(x, list) else 0)
    
        # Add other features as described in the paper
    
        # Categorical columns still need encoding before most models can use them
        feature_columns = ['initial_priority', 'customer_support_level',
                           'time_since_creation_hours', 'num_updates']  # ... plus any others
        return df[feature_columns]
    
    # Assuming you have a pandas DataFrame 'raw_tickets_data'
    engineered_features = create_features(raw_tickets_data)
  3. Model Training: Train a chosen classification model (e.g., Random Forest) on historical data with engineered features and known escalation outcomes. XGBoost is not listed as evaluated in this specific paper, but it often performs well on structured data and is a strong candidate in practice. Address class imbalance using techniques such as oversampling (e.g., SMOTE), undersampling, or cost-sensitive learning.
  4. Model Deployment: Deploy the trained model as a service (e.g., via an API). This service would take the current structured data of a ticket as input, run it through the feature engineering steps (using the current time as a reference), and return the predicted probability of escalation.
    
    # Sketch of a prediction service
    import joblib  # assumption: the trained model was saved with joblib
    import pandas as pd
    
    # Assume the model (e.g., a RandomForestClassifier or your chosen model)
    # is already trained and saved; 'path/to/model.pkl' is a placeholder path
    loaded_model = joblib.load('path/to/model.pkl')
    # Assume feature engineering logic is implemented in create_features
    
    def predict_escalation(ticket_data_row):
        # Convert raw ticket data to features
        ticket_df = pd.DataFrame([ticket_data_row])  # one-row DataFrame for feature engineering
        features = create_features(ticket_df)
    
        # Predict probability; column 1 is the positive class (escalation),
        # assuming the model was trained on 0/1 labels
        escalation_probability = loaded_model.predict_proba(features)[:, 1]
    
        return float(escalation_probability[0])
    
    # Example usage:
    # ticket_info = {'created_at': '2023-10-27 10:00:00', 'initial_priority': 'High', ...}
    # prediction = predict_escalation(ticket_info)
    # print(f"Predicted escalation probability: {prediction}")
  5. Integration with Support Workflow: Integrate the prediction service into the support system. This could be:
    • A real-time flag on the ticket dashboard indicating the risk level (e.g., Low, Medium, High).
    • An alert triggered for support managers or senior engineers when a ticket's predicted risk crosses a certain threshold.
    • Automated task assignments based on risk.
    • Visualizations of risk across the entire ticket queue (as suggested by references like [Montgomery2017Tool]).
  6. Monitoring and Retraining: Continuously monitor the model's performance in the production environment. Ticket attributes, customer behavior, and support processes evolve, so regular retraining of the model on fresh data is necessary to maintain accuracy.
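
The training and workflow-integration steps above can be sketched together as follows. This is a minimal illustration on synthetic data, not the paper's setup: the two features, the rule that generates escalations, the use of `class_weight="balanced"` as the cost-sensitive option, and the Low/Medium/High thresholds are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic tickets: escalation becomes likelier the longer a ticket stays
# open and the more often it is reassigned (an assumed rule for illustration).
rng = np.random.default_rng(0)
n = 2000
hours_open = rng.exponential(48.0, n)
reassignments = rng.poisson(1.0, n)
logit = -5.0 + 0.02 * hours_open + 0.8 * reassignments
escalated = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([hours_open, reassignments])
X_train, X_test, y_train, y_test = train_test_split(
    X, escalated, test_size=0.25, stratify=escalated, random_state=0
)

# class_weight="balanced" reweights the rare escalated class instead of
# resampling; it is one of several ways to handle class imbalance.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # probability of escalation


def risk_level(p, medium=0.3, high=0.6):
    """Map a probability to a dashboard flag; thresholds are hypothetical."""
    if p >= high:
        return "High"
    if p >= medium:
        return "Medium"
    return "Low"


flags = [risk_level(p) for p in proba]
print("escalation rate:", escalated.mean())
print("recall at 0.5:", recall_score(y_test, (proba >= 0.5).astype(int)))
print("flag counts:", {lvl: flags.count(lvl) for lvl in ("Low", "Medium", "High")})
```

In a real deployment the thresholds would be tuned against the cost of a missed escalation versus the cost of reviewing a false alarm.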

Considerations and Limitations

  • Data Quality: The quality and consistency of structured data in the support system are critical for feature engineering success.
  • Domain Expertise: Effective feature engineering relies heavily on understanding the nuances of the support process and customer interactions.
  • Exclusion of Text Data: While a practical choice for this paper, excluding ticket descriptions and communication threads means potentially missing valuable predictive signals embedded in the text. A hybrid approach combining structured and text features could potentially yield better results but adds complexity.
  • Interpretability: Understanding why a ticket is predicted to escalate can be important for support staff. Some models (like Logistic Regression or Decision Trees) are more interpretable than others (like complex Random Forests or SVMs). Feature importance analysis from the trained model can help provide insights.
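
As a small illustration of the feature importance analysis mentioned in the last point, the sketch below trains a Random Forest on synthetic data in which only one feature drives the label, then ranks the features by the model's impurity-based importances. The feature names and the labeling rule are made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends only on hours_open; the second
# feature is pure noise (both are assumptions for illustration).
rng = np.random.default_rng(42)
n = 500
hours_open = rng.uniform(0, 200, n)
unrelated_noise = rng.normal(size=n)
y = (hours_open > 120).astype(int)

X = np.column_stack([hours_open, unrelated_noise])
feature_names = ["hours_open", "unrelated_noise"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# feature_importances_ holds impurity-based importances that sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances describe the model as a whole; explaining why one specific ticket was flagged would need a per-prediction technique, which is beyond this sketch.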

In summary, this paper provides a practical blueprint for predicting customer support ticket escalation using readily available structured data and robust feature engineering, offering a valuable starting point for organizations looking to implement such a system without immediately diving into complex natural language processing of ticket text.