SiliconFriend: AI Companion Chatbot
- SiliconFriend is an AI companion chatbot that leverages the MemoryBank framework to provide long-term, personalized, and contextually coherent interactions.
- It employs a retrieval-augmented memory system with components for storage, retrieval, and decay to enhance user-specific dialog and empathy.
- The system demonstrates significant advances in memory recall and adaptive dialog generation across both open- and closed-source large language models.
SiliconFriend is a long-term AI companion chatbot built upon the MemoryBank framework, specifically engineered to endow LLMs with anthropomorphic long-term memory capabilities. It is designed for domains such as psychological counseling and personal companionship where sustained, contextually coherent interaction and adaptation to user personality are critical. SiliconFriend’s architecture, methodology, and evaluation demonstrate significant advances in empathy, memory recall, and user-adaptive dialog generation across both open- and closed-source LLM platforms (Zhong et al., 2023).
1. MemoryBank Architecture and Mechanisms
MemoryBank provides SiliconFriend with a fine-grained, retrieval-augmented, and dynamically updated memory system. Its architecture consists of three principal components: Memory Storage, Memory Retriever, and Memory Updater.
- Memory Storage archives:
- Raw, timestamped multi-turn conversations.
- Daily event summaries of user interactions.
- Rolling global summaries spanning all sessions.
- Daily and aggregated user-personality portraits.
- Memory Retriever employs a dual-tower dense retrieval model using an encoder $E$; it transforms each memory piece $m$ into a vector $h_m = E(m)$ and the live context $c$ into $h_c = E(c)$, retrieving the top-$K$ nearest memories using a FAISS index. This permits contextually relevant recall at each dialog turn.
- Memory Updater implements a forgetting/strengthening protocol inspired by the Ebbinghaus Forgetting Curve. Each memory tracks:
- $t$: Time since creation or last recall.
- $S$: Discrete memory strength (initialized to $1$).
- Retention follows the rule $R = e^{-t/S}$.
- Memory is pruned when $R$ falls below a fixed threshold. Each retrieval strengthens the memory ($S \leftarrow S + 1$, $t \leftarrow 0$).
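The decay and reinforcement rules can be sketched in a few lines of Python. This is a minimal illustration of the retention rule $R = e^{-t/S}$ with a hypothetical pruning threshold of 0.1; the field names and threshold are illustrative, not the paper's implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    t: float = 0.0   # time since creation or last recall
    S: float = 1.0   # discrete memory strength, initialized to 1

def retention(m: Memory) -> float:
    """Ebbinghaus-style retention: R = exp(-t / S)."""
    return math.exp(-m.t / m.S)

def tick(memories: list, elapsed: float = 1.0, threshold: float = 0.1) -> list:
    """Advance time for all memories, then prune those whose retention
    has fallen below the threshold."""
    for m in memories:
        m.t += elapsed
    return [m for m in memories if retention(m) >= threshold]

def recall(m: Memory) -> Memory:
    """Each retrieval strengthens the memory: S <- S + 1, t <- 0."""
    m.S += 1.0
    m.t = 0.0
    return m
```

Note how a stronger memory (larger $S$) decays more slowly, so frequently recalled facts survive pruning while stale ones are forgotten.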
Per-turn data flow:
- User utters $u_t$; the utterance is appended to the raw log.
- Memory Retriever encodes $u_t$; queries FAISS for the top-$K$ memories.
- Prompt to LLM combines task instructions, retrieved memories (raw and summarized), global summaries, user portraits, and recent history.
- LLM generates response $r_t$.
- Memory Updater logs $(u_t, r_t)$, updates decay and strengths, and periodically triggers summarization and personality updates.
This per-turn loop fully specifies the system's behavior, supporting rigorous reproducibility.
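The per-turn flow can be sketched end to end. In this sketch a toy bag-of-words overlap stands in for the MiniLM/Text2vec embeddings and FAISS index of the real system, and `llm` is a hypothetical text-to-text callable:

```python
def embed(text: str) -> set:
    """Toy stand-in for a dense encoder: a bag-of-words set."""
    return set(text.lower().split())

def top_k(query: str, memories: list, k: int = 3) -> list:
    """Toy stand-in for FAISS: rank memories by word overlap with the query."""
    scored = sorted(memories,
                    key=lambda m: len(embed(m) & embed(query)),
                    reverse=True)
    return scored[:k]

def dialog_turn(user_utterance: str, memories: list, history: list, llm):
    # 1. Append the utterance to the raw log.
    history.append(("user", user_utterance))
    # 2. Retrieve the top-k contextually relevant memories.
    retrieved = top_k(user_utterance, memories)
    # 3. Assemble the prompt from instructions, memories, and recent history.
    prompt = "\n".join(
        ["You are a long-term AI companion."]
        + [f"Memory: {m}" for m in retrieved]
        + [f"{role}: {text}" for role, text in history[-6:]]
    )
    # 4. Generate the response and log it for the Memory Updater.
    response = llm(prompt)
    history.append(("assistant", response))
    return response, retrieved
```

In the real system step 2 would embed the query with the configured encoder and search a FAISS index, and step 4 would call the chosen LLM backend.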
2. Integration with LLMs
SiliconFriend supports both closed- and open-source LLMs and decouples memory mechanisms from core model weights.
- Closed-source (ChatGPT): Interacts via OpenAI's API from a lightweight orchestrator, which merges user input, retrieved memories, and constructed prompts before submission.
- Open-source (ChatGLM, BELLE): Operates on in-house infrastructure using Python inference APIs. LangChain intermediates embedding, FAISS-based retrieval, and prompt assembly. Embedding models are MiniLM (English) and Text2vec (Chinese).
- Orchestration structure:
```
[User] → [Orchestrator] →
            ├─> [Memory Retriever: Embedding + FAISS]
            ├─> [Memory Store: raw logs, summaries, portraits]
            └─> [LLM Inference: OpenAI API or local ChatGLM/BELLE]
        ← [Response]
        → [Memory Updater: log / decay / strengthen / summarize]
```
This modular deployment achieves broad compatibility and extensibility, and keeps the memory mechanism cleanly separated from language modeling.
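The decoupling can be made concrete with a backend-agnostic interface: the orchestrator only needs a text-to-text callable, so closed- and open-source models plug in behind the same signature. This is a minimal sketch assuming an OpenAI-v1-style client object and a ChatGLM-style `.chat()` method; the wrapper names are illustrative:

```python
from typing import Callable

# The orchestrator depends only on this interface, never on a concrete model.
LLMBackend = Callable[[str], str]

def make_openai_backend(client, model: str = "gpt-3.5-turbo") -> LLMBackend:
    """Wrap an OpenAI-style client (assumed v1 chat-completions API)."""
    def generate(prompt: str) -> str:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content
    return generate

def make_local_backend(model, tokenizer) -> LLMBackend:
    """Wrap a local ChatGLM/BELLE-style model exposing a .chat() method."""
    def generate(prompt: str) -> str:
        response, _history = model.chat(tokenizer, prompt, history=[])
        return response
    return generate
```

Swapping ChatGPT for ChatGLM or BELLE then changes only which factory is called; the retriever, store, and updater are untouched.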
3. Implementation: LLM Tuning and System Flow
- Base models: ChatGPT (proprietary), ChatGLM 6.2B, BELLE 7B (open).
- Empathy fine-tuning: Open-source models undergo parameter-efficient LoRA finetuning on 38,000 psychological dialog pairs:
- LoRA injects low-rank adapters into linear layers: for a weight matrix $W_0 \in \mathbb{R}^{d \times k}$, the layer is revised as $W_0 + BA$, with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and rank $r \ll \min(d, k)$; $W_0$ remains frozen.
- Emotional scope includes anxiety, grief, relationship issues, enabling empathic tone, active listening, positive reframing.
- Pipeline: Post-finetuning, MemoryBank is integrated with no further parameter updates, employing retrieval-augmented prompting for dynamic, memory-aware responses.
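The LoRA update can be illustrated with plain-Python matrix arithmetic (a minimal sketch of the $W_0 + BA$ form, not the paper's training code): the frozen weight $W_0$ is augmented with a low-rank product $BA$, and because $B$ is conventionally initialized to zero, the adapted layer initially reproduces the base layer exactly.

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def lora_layer(W0, B, A, x):
    """y = (W0 + B @ A) x, where W0 is frozen and only the low-rank
    factors B (d x r) and A (r x k) receive gradient updates."""
    W = matadd(W0, matmul(B, A))
    return matmul(W, x)
```

With $r = 1$ on a $2 \times 2$ layer, the adapter adds only $d + k = 4$ trainable numbers instead of $dk = 4$ per full matrix; at realistic transformer dimensions this gap is what makes the finetuning parameter-efficient.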
Real-time logic:
- User message traverses orchestrator, Memory Retriever fetches relevant logs, summaries, personality profiles.
- Prompt is constructed with all contextually relevant components.
- LLM infers response; output returned to user and Memory Updater.
- Decay, memory strength, and summarizations are atomically updated per dialog episode or at daily boundaries.
4. Evaluation Methodologies and Metrics
Qualitative Assessment
Deployed on a web platform for early user trials, SiliconFriend was benchmarked against a baseline ChatGLM without MemoryBank or empathy tuning. Real-world dialogs demonstrated:
- Enhanced empathic phrasing and personal relevance.
- Accurate recall of user-specific details (e.g., “Your girlfriend’s birthday is tomorrow…”).
- Personality-adaptive suggestions.
Quantitative Simulation
Using ChatGPT to simulate 15 synthetic users over 10 dialog days (with ≥2 topics per day), SiliconFriend was evaluated against 194 probing questions (balanced between English and Chinese).
Metrics:
- Memory Retrieval Accuracy: Correct memory recall (binary).
- Response Correctness: Judged as 0/0.5/1.
- Contextual Coherence: Judged as 0/0.5/1.
- Model Ranking Score: a normalized score derived from per-question rankings of the model variants (higher is better).
| Model | Retrieval Acc. | Correctness | Coherence | Ranking |
|---|---|---|---|---|
| ChatGLM (Eng) | 0.809 | 0.438 | 0.680 | 0.498 |
| BELLE (Eng) | 0.814 | 0.479 | 0.582 | 0.517 |
| ChatGPT (Eng) | 0.763 | 0.716 | 0.912 | 0.818 |
| ChatGLM (Chn) | 0.840 | 0.418 | 0.428 | 0.510 |
| BELLE (Chn) | 0.856 | 0.603 | 0.562 | 0.565 |
| ChatGPT (Chn) | 0.711 | 0.655 | 0.675 | 0.758 |
Retrieval accuracy consistently exceeds 0.75, demonstrating MemoryBank's cross-model efficacy. ChatGPT exhibits superior correctness and coherence (reflecting underlying model strength), while BELLE attains highest accuracy in Chinese due to bilingual tuning.
5. Dialog Examples and Behavioral Case Studies
- Psychological Companionship:
- User expresses feeling lost post-breakup.
- Baseline: General advice (“talk with friends or seek help”).
- SiliconFriend: Contextually nuanced, empathic, and memory-referential (“…I remember you enjoy journaling…”).
- Memory Recall:
- Longitudinal tracking (e.g., recalling a book previously recommended, accurately identifying that heap sort was not previously discussed).
- Personality-Tailored Suggestions:
- “Linda” (introverted/ambitious): AI recommends a low-key, growth-oriented event (art-history lecture).
- “Emily” (open-minded/curious): Suggests new experiences aligned with prior expressed interests (dance workshop).
6. Strengths, Limitations, and Future Directions
Strengths
- Implements a biologically inspired, anthropomorphic long-term memory with selective forgetting and reinforcement.
- Generalizes across both open and closed LLM architectures by decoupled, plug-and-play retrieval systems.
- Bilingual dialog capabilities with empirically validated gains in empathy and personalized interaction.
- Methodologically rigorous evaluation combining both qualitative and quantitative axes.
Limitations
- The forgetting model uses only a single scalar strength $S$, and decay is uniform; more complex memory dynamics are not yet modeled.
- Retention/pruning threshold and update frequency require manual tuning.
- Only text memory is supported; no direct handling of multimodal information.
- Token budget can be exceeded with large memory—domain scaling requires further research.
Future Developments
- Incorporation of advanced forgetting schedules (e.g., spacing effect, overlearning).
- Hierarchical and topic-weighted memory indexing.
- Integrating user feedback as reinforcement during memory updates.
- Addition of multimodal (audio, image) episodic memory.
- End-to-end co-training for retrieval and generation systems.
MemoryBank and SiliconFriend together constitute a substantive advance in the development of AI companions with robust, human-like long-term memory, closing the gap between stateless conversational agents and contextually adaptive, memory-driven assistants (Zhong et al., 2023).