AI Knowledge Assist System

Updated 14 October 2025
  • AI Knowledge Assist System is a computational framework that extracts domain-specific QA pairs from call transcripts using a fine-tuned LLaMA model.
  • It employs cosine similarity and DBSCAN clustering to deduplicate and summarize QA pairs, achieving an F1 score above 90% in representative QA recommendation.
  • Automated updates with Kubeflow on Google Vertex AI ensure scalable, real-time deployment for customer-facing conversational agents.

An AI Knowledge Assist System is a computational framework that enables automated extraction, organization, and retrieval of domain-specific knowledge, often as structured question–answer (QA) pairs, to support knowledge-focused applications such as conversational agents, chatbots, and contact center automation. By leveraging LLMs, modern Retrieval Augmented Generation (RAG) methodologies, and clustering techniques, these systems transform unstructured conversational data into a deduplicated, maintainable knowledge base that can be immediately deployed to eliminate cold-start problems in customer-facing intelligent agents (Laskar et al., 9 Oct 2025).

1. System Architecture and Pipeline

The AI Knowledge Assist system adopts a three-stage architecture to systematically process historical customer–agent call transcripts into a usable knowledge base:

  1. Knowledge Extraction Module: Historical conversation transcripts are processed using a fine-tuned LLM (specifically, LLaMA-3.1-8B) trained to extract information-seeking questions and their corresponding agent answers. The model is prompted to rewrite each QA pair so that the extracted entry is clear, stand-alone, and independently interpretable, even when the original transcript is noisy or fragmentary.
  2. Clustering and Deduplication: Extracted QA pairs are embedded as dense vectors. Pairwise cosine distances between question embeddings are computed, and the DBSCAN algorithm is applied to automatically cluster semantically similar QA pairs. This stage removes redundancy, ensuring the resulting knowledge base is concise and comprehensive.
  3. Recommendation and Knowledge Base Construction: Each cluster is then summarized by an LLM to generate representative QA pairs. The generated representative pairs are selected to populate the final knowledge base used by downstream conversational agents. The entire pipeline is automated and orchestrated with Kubeflow on the Google Vertex AI Platform, supporting continuous operation and regular updating as new transcripts are ingested.

The pipeline can be summarized by the following formal steps:

  • Extraction: $\{(Q_i, A_i)\} = \mathrm{LLM}(T; \theta)$, where $T$ is a transcript and $\theta$ the model parameters.
  • Clustering (cosine distance): $\mathrm{dist}(q_i, q_j) = 1 - q_i \cdot q_j$, assuming unit-normalized question embeddings $q_i$, $q_j$.
  • Recommendation: $R_k = \mathrm{LLM}(C_k; \theta)$, where $C_k$ is the $k$-th cluster and $R_k$ its representative QA pairs.
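
A minimal sketch of how these three steps compose, assuming hypothetical helper functions (`extract_qa_pairs`, `cluster_questions`, `summarize_cluster`) that are fleshed out in the sections below; none of these names come from the paper:

```python
def build_knowledge_base(transcripts: list[str]) -> list[dict]:
    # Step 1 -- Extraction: {(Q_i, A_i)} = LLM(T; theta) for every transcript T
    qa_pairs = [qa for t in transcripts for qa in extract_qa_pairs(t)]

    # Step 2 -- Clustering: group semantically similar questions (DBSCAN over cosine distances)
    clusters = cluster_questions(qa_pairs)

    # Step 3 -- Recommendation: R_k = LLM(C_k; theta), one representative QA pair per cluster
    return [summarize_cluster(cluster) for cluster in clusters]
```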

2. Methodology for Knowledge Extraction and Deduplication

Knowledge extraction is performed by fine-tuning the LLaMA-3.1-8B model to identify high-value, information-seeking QA pairs directly from conversation text. Prompt engineering is employed to ensure that the model outputs QA pairs that are standalone and context-aware.
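
As a concrete illustration, the extraction step could be driven by a prompt of the following shape; the wording, the `llm_generate` call, and the `parse_json_list` helper are assumptions for this sketch, not the paper's actual prompt or API:

```python
# Hypothetical prompt template; `llm_generate` and `parse_json_list` are placeholder helpers.
EXTRACTION_PROMPT = """\
You are given a customer-agent call transcript.
Extract every information-seeking question asked by the customer together with the
agent's answer. Rewrite each pair so it is clear, standalone, and interpretable
without the surrounding transcript. Return a JSON list of question/answer objects.

Transcript:
{transcript}
"""

def extract_qa_pairs(transcript: str) -> list[dict]:
    # Call the fine-tuned model (placeholder) and parse its JSON output.
    response = llm_generate(EXTRACTION_PROMPT.format(transcript=transcript))
    return parse_json_list(response)  # each item: {"question": ..., "answer": ...}
```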

After initial extraction, embeddings for each question are computed in a high-dimensional vector space. Cosine similarity is then used to calculate pairwise distances, and DBSCAN clustering groups questions exhibiting high semantic overlap. The system uses the clustering algorithm to automate the discovery of repeated or rephrased user queries, ensuring only a non-redundant, consolidated representation enters the knowledge base.
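
A sketch of this clustering step using scikit-learn's DBSCAN over a precomputed cosine-distance matrix; the `embed_questions` helper and the `eps`/`min_samples` values are assumptions, not reported settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_questions(qa_pairs: list[dict], eps: float = 0.25, min_samples: int = 2) -> list[list[dict]]:
    # Embed each question (placeholder helper); embeddings assumed unit-normalized, shape (n, d).
    q = embed_questions([qa["question"] for qa in qa_pairs])
    # Pairwise cosine distance: dist(q_i, q_j) = 1 - q_i . q_j (clipped to avoid tiny negatives).
    dist = np.clip(1.0 - q @ q.T, 0.0, None)
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed").fit_predict(dist)

    # Group QA pairs by cluster label; DBSCAN noise points (-1) are kept as singleton clusters.
    clusters = [[qa for qa, lbl in zip(qa_pairs, labels) if lbl == k]
                for k in sorted(set(labels)) if k != -1]
    clusters += [[qa] for qa, lbl in zip(qa_pairs, labels) if lbl == -1]
    return clusters
```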

The clustering process is quantitatively evaluated by the Silhouette Score, a standard metric indicating the cohesion and separation of clusters.
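
Continuing the sketch, the Silhouette Score can be computed on the same precomputed distance matrix (noise points excluded, since the metric requires every retained sample to belong to a cluster):

```python
import numpy as np
from sklearn.metrics import silhouette_score

def clustering_quality(dist: np.ndarray, labels: np.ndarray) -> float | None:
    # Drop DBSCAN noise points; silhouette needs every retained sample in a cluster.
    mask = labels != -1
    if len(set(labels[mask])) < 2:  # undefined for fewer than two clusters
        return None
    return silhouette_score(dist[np.ix_(mask, mask)], labels[mask], metric="precomputed")
```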

Representative QA pairs for each cluster are selected using an LLM conditioned on the cluster’s set of QA pairs, ensuring that the most canonical and clear examples are retained for deployment.
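
The recommendation step can be illustrated with a cluster-summarization prompt of the following shape; again, the wording and the `llm_generate`/`parse_qa` helpers are placeholders rather than the paper's implementation:

```python
# Hypothetical prompt for condensing one cluster of near-duplicate QA pairs.
SUMMARIZE_PROMPT = """\
Below are semantically similar question-answer pairs extracted from call transcripts.
Write ONE representative question and ONE complete, standalone answer that cover
the shared intent of the cluster.

{qa_pairs}
"""

def summarize_cluster(cluster: list[dict]) -> dict:
    formatted = "\n\n".join(f"Q: {qa['question']}\nA: {qa['answer']}" for qa in cluster)
    return parse_qa(llm_generate(SUMMARIZE_PROMPT.format(qa_pairs=formatted)))  # placeholder helpers
```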

3. LLM Fine-Tuning and Performance

The backbone LLM is LLaMA-3.1-8B, selected for its favorable trade-off between computational efficiency, inference speed, and accuracy. Fine-tuning is performed on annotated internal call transcripts over three epochs, optimizing both for knowledge extraction and the downstream recommendation step. Reported hyperparameters include a tuned learning rate and a context length of 8,000 tokens (input and output combined).
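
An illustrative fine-tuning configuration reflecting the reported setup (LLaMA-3.1-8B backbone, three epochs, 8,000-token combined context); the model identifier, learning rate, scheduler, and batch size below are placeholders, since the paper's exact values are not reproduced here:

```python
from dataclasses import dataclass

@dataclass
class FineTuneConfig:
    base_model: str = "meta-llama/Llama-3.1-8B"   # assumed Hugging Face identifier
    num_epochs: int = 3                            # as reported
    max_seq_length: int = 8_000                    # input + output tokens combined, as reported
    learning_rate: float = 2e-5                    # placeholder; the paper only states the LR was tuned
    lr_scheduler: str = "cosine"                   # placeholder choice
    per_device_batch_size: int = 4                 # placeholder choice

config = FineTuneConfig()
```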

Performance benchmarks include:

  • Knowledge extraction: Knowledge-Assist-8B-SFT achieves precision, recall, and F1 of approximately 84.9%.
  • Representative QA recommendation: Achieves an F1 of ~91.8%, outperforming larger closed-source models (e.g., GPT-4o-Mini, Gemini).
  • End-to-end system evaluation with LLM-based (GPT-4o) and human judges confirms >90% accuracy on information-seeking QA tasks.

These metrics demonstrate the fine-tuned LLM’s capability to both extract relevant knowledge and to compress clusters into high-quality, deployable QAs.

4. Automated Updating and Deployment in Contact Centers

The system is deployed as a self-updating pipeline within the Google Vertex AI Platform, orchestrated using Kubeflow. As new call transcripts are generated, they are periodically ingested, processed, and used to update the knowledge base, thereby ensuring that the FAQ or QA repository remains current.
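
A hedged sketch of how such a refresh could be expressed as a Kubeflow Pipelines (KFP v2) definition and compiled for Vertex AI Pipelines; the component bodies, URIs, and pipeline name are placeholders, not the production pipeline:

```python
from kfp import dsl, compiler

@dsl.component
def extract_step(transcripts_uri: str) -> str:
    # Placeholder: run the fine-tuned LLaMA-3.1-8B extraction over newly ingested transcripts.
    return transcripts_uri + "/qa_pairs.jsonl"

@dsl.component
def cluster_step(qa_uri: str) -> str:
    # Placeholder: embed questions, run DBSCAN, write cluster assignments.
    return qa_uri.replace("qa_pairs", "clusters")

@dsl.component
def recommend_step(clusters_uri: str) -> str:
    # Placeholder: summarize each cluster into a representative QA pair.
    return clusters_uri.replace("clusters", "knowledge_base")

@dsl.pipeline(name="ai-knowledge-assist-refresh")
def knowledge_refresh(transcripts_uri: str):
    qa = extract_step(transcripts_uri=transcripts_uri)
    clusters = cluster_step(qa_uri=qa.output)
    recommend_step(clusters_uri=clusters.output)

# Compile to a spec that can be scheduled as a recurring Vertex AI pipeline run.
compiler.Compiler().compile(knowledge_refresh, "knowledge_refresh.yaml")
```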

A reference-free LLM-based judging protocol is employed to assess the accuracy and relevance of newly extracted QA pairs. If a newly proposed QA pair is significantly different from the existing set (by an embedding-similarity threshold), it is either automatically integrated or flagged for optional manual review. This design enables reliable removal of obsolete content and seamless addition of new knowledge, bridging the “cold start” gap that often plagues chatbot deployments in organizations lacking pre-existing structured Q&A knowledge.
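
A minimal sketch of this update gate, assuming unit-normalized question embeddings and an illustrative similarity threshold (the actual threshold value is not stated):

```python
import numpy as np

def classify_new_qa(new_q_embedding: np.ndarray, kb_q_embeddings: np.ndarray,
                    threshold: float = 0.80) -> str:
    # Embeddings are assumed unit-normalized; `threshold` is an illustrative value.
    if kb_q_embeddings.size == 0:
        return "novel"  # empty knowledge base: integrate (or route to optional review)
    max_sim = float(np.max(kb_q_embeddings @ new_q_embedding))
    return "novel" if max_sim < threshold else "near_duplicate"
```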

The system is thus suitable for immediate chatbot deployment in contact centers, supporting rapid on-boarding and adaptive updating as conversational practices evolve.

5. Practical Benefits and Implications

By automating the creation and maintenance of QA knowledge bases from unstructured conversational data, the AI Knowledge Assist system offers several advantages:

  • Accelerated Deployment: Organizations can deploy RAG-powered conversational agents without pre-existing FAQ datasets, achieving >90% accuracy in information-seeking question answering.
  • Dynamic Maintenance: The self-updating mechanism ensures continuous adaptation to new queries and shifting information needs.
  • Model Efficiency: The architecture demonstrates that a carefully fine-tuned, lightweight LLM combined with clustering can outperform larger, non-domain-tuned closed-source models in knowledge extraction and QA recommendation.
  • Scalability and Generalization: The three-stage pipeline can be applied across industries wherever large volumes of historical conversation data exist, enabling broad applicability for knowledge assistive technologies.

This system exemplifies the state-of-the-art in enterprise AI knowledge management and provides a model for future, automated, and adaptive knowledge base construction in customer-facing AI deployments (Laskar et al., 9 Oct 2025).
