xLAM-7B-FC-R: Function-Calling 7B Transformer

Updated 14 May 2026

xLAM-7B-FC-R is a function-calling specialized 7B-parameter transformer that interleaves natural language with JSON schema for precise API tool use.
It utilizes a balanced training pipeline combining execution-verified synthetic data and general instructions, enhancing both reasoning and function execution.
Robust architectural optimizations and multi-stage quality controls enable xLAM-7B-FC-R to rival larger models on benchmarks like the Berkeley Function-Calling Leaderboard.

xLAM-7B-FC-R is a function-calling–specialized 7B-parameter large action model within the xLAM family, developed to advance open-source AI agent capabilities. Built atop the DeepSeek-Coder-7B-instruct backbone, it integrates end-to-end fine-tuning for high-fidelity JSON-style function calls interleaved with self-consistent “thought” traces. The model is distinguished by a unified, execution-verified dataset, architectural optimizations for API tool-use, and empirical performance rivaling significantly larger proprietary baselines (Zhang et al., 2024).

1. Model Architecture

xLAM-7B-FC-R inherits the dense Transformer architecture of DeepSeek-Coder-7B, comprising $L=32$ decoder-only Transformer layers, each with hidden dimension $H=4096$, feed-forward inner size $D_{ff}=16{,}384$, and $A=16$ self-attention heads. No mixture-of-experts (MoE) layers are present in the FC variants. The multi-head attention at each layer $\ell$ receives input representations $H^{(\ell-1)} \in \mathbb{R}^{T \times H}$ and computes:

$Q = H^{(\ell-1)} W_Q,\quad K = H^{(\ell-1)} W_K,\quad V = H^{(\ell-1)} W_V$

$\text{head}_i = \text{softmax}(Q_i K_i^\top / \sqrt{d_k}) V_i$

$\text{MHA}(H^{(\ell-1)}) = [\text{head}_1; \ldots; \text{head}_A] W_O$

This is followed by a gated GeLU feed-forward layer,

$\text{FFN}(x) = W_2 (\text{GeLU}(W_1 x)) + x$

Layer normalization precedes each sublayer, and residual connections are incorporated post-sublayer.
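
As a rough sanity check on the "7B" label, the stated hyperparameters can be turned into a weight count. This is a back-of-the-envelope sketch only: it assumes square $H \times H$ projection matrices, the two-matrix FFN shown in the formula above, and it ignores embeddings, layer norms, and biases.

```python
# Hyperparameters as stated in the text above.
L = 32          # decoder layers
H = 4096        # hidden dimension
D_FF = 16_384   # feed-forward inner size
A = 16          # attention heads

# Per-head dimension d_k, used in the softmax scaling factor sqrt(d_k).
d_k = H // A

# Attention weights: W_Q, W_K, W_V, W_O, each assumed to be H x H.
attn_params = 4 * H * H

# Feed-forward weights: W_1 (H x D_FF) and W_2 (D_FF x H), matching the
# two-matrix FFN formula. A gated variant would add a third matrix.
ffn_params = 2 * H * D_FF

per_layer = attn_params + ffn_params
total = L * per_layer  # excludes embeddings, layer norms, and biases

print(f"d_k = {d_k}")
print(f"per-layer weights = {per_layer:,}")
print(f"all layers = {total:,} (~{total / 1e9:.2f}B)")
```

Under these assumptions the layer stack alone accounts for roughly 6.4B weights, which is in line with a 7B-class model once embeddings are added.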

2. Training Data and Pipeline

The model employs a unified training corpus composed of three primary sources:

Cleaned and Augmented Agent Data: Tool-usage and web-agent trajectories standardized to a JSON schema (task, tools, format, few-shot setups, stepwise traces).
Synthetic Function-Calling Data: 60,000 high-quality, execution-verified samples generated by APIGen across 3,673 real-world APIs spanning 21 categories.
General Instruction Tuning: Approximately 20–30% sourced from DialogStudio and Data Provenance, filtered for non-commercial licensing and rated by Mixtral-8x22B and DeepSeek-V2 models to remove repetitious or low-quality dialogue.

During fine-tuning for the FC-R variant, each minibatch is composed of 50% execution-verified function-calling data and 50% agent/general instruction data, sampled evenly. Optimization uses full-model PyTorch FSDP on NVIDIA H100s with a batch size of 128 sequences per GPU (1,024 effective across 8 GPUs) and sequence lengths up to 4,000 tokens. The learning rate follows a cosine decay schedule peaking at $2 \times 10^{-5}$ after a 100-step warmup, with weight decay 0.1 and dropout 0.1 in the FFN. Total training exposure is approximately 150 billion tokens over three epochs.
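
The learning-rate schedule and batch arithmetic described above can be sketched as follows. The peak rate, warmup length, per-GPU batch size, GPU count, and sequence length come from the text; the linear warmup shape, the decay floor of zero, and the `total_steps` argument are assumptions made for illustration, since the source does not state them.

```python
import math

PEAK_LR = 2e-5      # peak learning rate from the text
WARMUP_STEPS = 100  # warmup length from the text


def lr_at(step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to min_lr (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


# Effective batch: 128 sequences per GPU on 8 GPUs.
effective_batch = 128 * 8
# Upper bound on tokens per optimizer step at the 4,000-token length cap.
max_tokens_per_step = effective_batch * 4000

print(effective_batch, max_tokens_per_step)
print(lr_at(0, 1000), lr_at(99, 1000), lr_at(1000, 1000))
```

With a hypothetical 1,000-step run, the rate climbs to the peak at step 99 and decays to the floor at the final step.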

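The loss function includes both the standard cross-entropy over target tokens, written here in its usual form,

$\mathcal{L}_{\text{CE}} = -\sum_{t=1}^{T} \log p_\theta(y_t \mid y_{<t}, x)$

and a pairwise ranking loss for DPO alignment, written here in its usual logistic form,

$\mathcal{L}_{\text{rank}} = -\log \sigma\big(s(x, y^{+}) - s(x, y^{-})\big)$

where $s$ denotes the pre-softmax sequence score, $y^{+}$ the preferred response, and $y^{-}$ the rejected response.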
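
The two loss terms named above can be illustrated numerically. This is a minimal sketch that assumes the textbook forms (summed token-level negative log-likelihood and a logistic pairwise loss on a score margin). The function names are hypothetical and the exact formulation used in training may differ.

```python
import math


def cross_entropy(target_probs):
    """Summed negative log-likelihood of the reference tokens.

    target_probs[t] is the model's probability for the reference token at step t.
    """
    return -sum(math.log(p) for p in target_probs)


def pairwise_ranking_loss(score_preferred, score_rejected):
    """Logistic pairwise loss: -log(sigmoid(score_preferred - score_rejected))."""
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


print(cross_entropy([0.5, 0.5]))
print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))
```

The ranking term shrinks as the preferred response's score pulls ahead of the rejected one and grows when the order is reversed.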

3. Function-Calling Integration

xLAM-7B-FC-R employs a unified JSON-based schema for function calling, interleaving "thought" reasoning strings and "tool_calls" in the generated output. Instead of allocating a separate API head, the model learns during supervised fine-tuning to emit target sequences combining natural language and JSON special tokens. A minimal output template is a single JSON object holding a "thought" string and a "tool_calls" list. At inference, standard token-by-token autoregressive decoding proceeds over the merged vocabulary, with no additional special parameters for API function calls: each token is drawn from $p_\theta(y_t \mid y_{<t}, x)$ given the prompt $x$ and the tokens generated so far. This method yields high-fidelity, schema-conforming function call outputs interleaved with structured reasoning.
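
To make the output format concrete, the sketch below parses one such JSON object and rejects calls to undefined tools or arguments. Only the "thought" and "tool_calls" keys come from the text; the "name" and "arguments" keys inside each call, the `get_weather` tool, and the `parse_output` helper are illustrative assumptions.

```python
import json

# Hypothetical tool registry: tool name -> set of allowed argument names.
TOOLS = {"get_weather": {"city", "unit"}}


def parse_output(text: str, tools: dict) -> dict:
    """Parse a model output and reject calls to undefined tools or arguments."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("output must be a JSON object")
    if not isinstance(obj.get("thought"), str):
        raise ValueError("missing or non-string 'thought'")
    calls = obj.get("tool_calls")
    if not isinstance(calls, list):
        raise ValueError("'tool_calls' must be a list")
    for call in calls:
        if not isinstance(call, dict):
            raise ValueError("each tool call must be a JSON object")
        name = call.get("name")
        if name not in tools:
            raise ValueError(f"undefined tool: {name!r}")
        # Any argument name outside the tool's declared set is rejected.
        unknown = set(call.get("arguments", {})) - tools[name]
        if unknown:
            raise ValueError(f"undefined arguments for {name}: {sorted(unknown)}")
    return obj


sample = (
    '{"thought": "The user wants the weather in Paris.", '
    '"tool_calls": [{"name": "get_weather", '
    '"arguments": {"city": "Paris", "unit": "celsius"}}]}'
)
parsed = parse_output(sample, TOOLS)
print(parsed["tool_calls"][0]["name"])
```

Because the calls are ordinary generated text, a post-hoc check of this kind is one simple way for a caller to confirm schema conformance before executing anything.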

4. Empirical Performance

On the Berkeley Function-Calling Leaderboard v2 (as of 2024-09-03), xLAM-7B-FC-R attains an Overall Accuracy of 80.18%, outpacing many larger open-source models. The breakdown is as follows:

Evaluation Metric	xLAM-7B-FC-R (%)
Overall Accuracy	80.18
AST-only, simple	70.52
AST-only, multiple	78.22
AST-only, parallel	73.88
AST-only, parallel-multiple	68.50
Executable, simple	95.21
Executable, multiple	90.00
Executable, parallel	88.00
Executable, parallel-multiple	77.50
Relevance: ignore-irrelevance	79.54
Relevance: detect-relevance	80.49

Comparisons: GPT-4-0125 (function-call prompt) yields 81.78% overall; Gorilla-OpenFunctions-v2 achieves 79.10%; GPT-3.5 Turbo (FC) records 75.41%. Notably, xLAM-7B-FC-R, at 7B parameters, demonstrates competitive or superior performance relative to open-source baselines of substantially greater scale (Zhang et al., 2024).
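
The gaps implied by these figures can be computed directly. The snippet uses only the overall accuracies quoted above and reports differences in percentage points.

```python
XLAM_OVERALL = 80.18  # xLAM-7B-FC-R overall accuracy from the table

# Baseline overall accuracies quoted in the comparison paragraph.
baselines = {
    "GPT-4-0125 (function-call prompt)": 81.78,
    "Gorilla-OpenFunctions-v2": 79.10,
    "GPT-3.5 Turbo (FC)": 75.41,
}

# Positive values mean xLAM-7B-FC-R scores higher than the baseline.
gaps = {name: round(XLAM_OVERALL - score, 2) for name, score in baselines.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} points")
```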

5. Key Design Optimizations

Critical design decisions contributing to model effectiveness include:

Unified JSON Schema: Standardizing all trajectories into a ["thought","tool_calls"] format with explicit module boundaries ensures persistent API schema adherence.
Prompt-format and Paraphrase Augmentation: Employing stochastic shuffling of tool lists and paraphrased formatting instructions mitigates overfitting to prompt structure, improving generalization.
Multi-stage Quality Verification: Pre-training data undergoes rigorous filtering—rule-based checks (undefined tools/arguments), LLM-driven hallucination detection, and human-in-the-loop trajectory rating—to excise approximately 15% of low-quality samples.
APIGen Synthesis: The synthetic dataset, in which every API call is execution-verified, improves parallel-call accuracy by 4%.
Balanced Mini-batch Composition: Equal sampling of function-calling and agent/general instruction data in minibatches yields substantial gains on both AST and executable function metrics, while maintaining language understanding capacity.
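
The balanced mini-batch composition in the last item can be sketched as a simple sampler. It assumes that "sampled evenly" means the non-function-calling half is split equally between agent and general-instruction data. The pool contents and the `sample_minibatch` helper are hypothetical.

```python
import random


def sample_minibatch(fc_pool, agent_pool, general_pool, batch_size, rng):
    """Draw half the batch from function-calling data and split the
    other half evenly between agent and general-instruction data."""
    if batch_size % 4 != 0:
        raise ValueError("batch_size must be divisible by 4 for an exact split")
    half, quarter = batch_size // 2, batch_size // 4
    batch = (
        rng.sample(fc_pool, half)
        + rng.sample(agent_pool, quarter)
        + rng.sample(general_pool, quarter)
    )
    rng.shuffle(batch)  # avoid a fixed source order inside the batch
    return batch


rng = random.Random(0)
fc = [("fc", i) for i in range(100)]
agent = [("agent", i) for i in range(100)]
general = [("general", i) for i in range(100)]

batch = sample_minibatch(fc, agent, general, batch_size=8, rng=rng)
counts = {}
for source, _ in batch:
    counts[source] = counts.get(source, 0) + 1
print(counts)
```

Fixing the ratio per batch, rather than per epoch, keeps every gradient step exposed to both function-calling and general-language examples.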

6. Context and Significance

xLAM-7B-FC-R exemplifies the impact of a systematic data engineering and training pipeline on function-calling performance for autonomous AI agents. By enforcing schema unification, incorporating execution-verified synthetic data (50% of fine-tuning), and applying multi-stage quality control, the model achieves accuracy levels near those of state-of-the-art proprietary models within a compact 7B-parameter dense Transformer. This outcome indicates that scalable, unified methodologies can democratize high-fidelity function calling even at modest parameter counts, advancing open-source alternatives for AI agent systems (Zhang et al., 2024).

References

Zhang et al. (2024). xLAM: A Family of Large Action Models to Empower AI Agent Systems.
