
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences (2306.07906v1)

Published 13 Jun 2023 in cs.CL and cs.AI

Abstract: We present WebGLM, a web-enhanced question-answering system based on the General LLM (GLM). Its goal is to augment a pre-trained LLM with web search and retrieval capabilities while being efficient for real-world deployments. To achieve this, we develop WebGLM with strategies for the LLM-augmented retriever, bootstrapped generator, and human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), through which WebGLM is enabled with accuracy, efficiency, and cost-effectiveness advantages. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. We conduct multi-dimensional human evaluation and quantitative ablation studies, which suggest the outperformance of the proposed WebGLM designs over existing systems. WebGLM with the 10-billion-parameter GLM (10B) is shown to perform better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B) in human evaluation. The code, demo, and data are at \url{https://github.com/THUDM/WebGLM}.

WebGLM: Enhancing Question Answering with Web Search and Human Preferences

The paper presents WebGLM, a web-enhanced question-answering system built on the General Language Model (GLM). It integrates web search and retrieval capabilities into a pre-trained LLM to make the system efficient and practical for real-world deployment. The approach builds on, and addresses limitations observed in, OpenAI's WebGPT, with a focus on improving accuracy, efficiency, and cost-effectiveness.

Key Contributions

WebGLM introduces three primary components: the LLM-augmented retriever, the bootstrapped generator, and the human preference-aware scorer. These components work collaboratively to enhance the system’s performance:

  1. LLM-augmented Retriever: This retriever operates in two stages—coarse-grained web search followed by fine-grained LLM-augmented retrieval. It combines web search engines with dense vector retrievers that distill the LLM's natural ability to adopt relevant references.
  2. Bootstrapped Generator: Using in-context learning, the generator bootstraps a high-quality dataset (WebGLM-QA) of long-form, quoted answers, demonstrating that LLMs combined with citation-based filtering can produce quality training data without relying heavily on human experts.
  3. Human Preference-aware Scorer: This component is trained using feedback from online QA forums to align the system with human preferences. It judiciously selects the highest-scored answer from generated candidates, reflecting human-like evaluation.
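The three components above form a retrieve–generate–score pipeline. A minimal sketch of that flow is below; the embedding, candidate generation, and scorer here are toy stand-ins (WebGLM's actual dense retriever is distilled from LLM reference-adoption signals, and its scorer is trained on online QA-forum feedback), so treat this as an illustration of the control flow, not the paper's implementation:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; in WebGLM this would be a dense
    # retriever distilled from the LLM (assumption for illustration).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fine_grained_retrieve(question, pages, k=2):
    # Stage 2 of the retriever: rerank coarse web-search results
    # by similarity to the question, keep the top-k references.
    q = embed(question)
    ranked = sorted(pages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def select_best(candidates, scorer):
    # Human preference-aware scoring: pick the highest-scored answer
    # among the generator's candidates.
    return max(candidates, key=scorer)
```

A usage example: `fine_grained_retrieve("what is the capital of france", pages, k=1)` returns the page most similar to the question, and `select_best(answers, reward_model)` mimics the final reranking step, where `reward_model` would be the preference-trained scorer.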

Experimental Evaluation

WebGLM's performance was assessed through human evaluations and quantitative ablation studies. Notable results include:

  • The system using a 10-billion-parameter GLM outperformed similar-sized WebGPT (13B) and was on par with WebGPT (175B) in various human-led assessments.
  • The retriever demonstrated superior relevance and usability compared to existing methods, benefiting significantly from LLM-augmented distillation.
  • The bootstrapped generator and human preference-aware scorer provided competitive advantages in fluency, correctness, and citation accuracy metrics.

Implications and Future Directions

The implementation of WebGLM suggests promising improvements in QA systems through intelligent use of existing web resources and human-like assessment features. It points towards greater accessibility in deploying sophisticated QA models without the necessity of vast computational resources or extensive manual data curation.

Tighter integration of web retrieval and QA systems could further aid domains that require up-to-date information, such as the legal, financial, and health sectors. Moreover, developing more nuanced human preference-aligned models would likely push AI systems toward an increasingly human-centric approach.

In conclusion, WebGLM provides a thoughtful interplay between machine learning innovations and practical deployment strategies, paving the way for more efficient and effective web-enhanced QA systems. Future research could explore scaling these methods and integrating them with ever-more sophisticated forms of human feedback to refine their performance further.

Authors (9)
  1. Xiao Liu (402 papers)
  2. Hanyu Lai (11 papers)
  3. Hao Yu (195 papers)
  4. Yifan Xu (92 papers)
  5. Aohan Zeng (19 papers)
  6. Zhengxiao Du (22 papers)
  7. Peng Zhang (641 papers)
  8. Yuxiao Dong (119 papers)
  9. Jie Tang (302 papers)
Citations (79)