RexUniNLU: Universal NLU Framework

Updated 3 March 2026

RexUniNLU is a universal NLU framework that unifies information extraction and classification through a recursive extraction paradigm guided by an Explicit Schema Instructor.
It runs a recursive pipeline with schema-specific query construction, isolated prompts, and custom position IDs and attention masks, so that schema-based decoding stays consistent and type-correct.
The framework reports state-of-the-art results on full-shot, few-shot, and multi-modal benchmarks in Chinese and English.

RexUniNLU is an encoder-only neural framework introducing a recursive extraction paradigm with an Explicit Schema Instructor (ESI) to achieve universal natural language understanding (NLU). It unifies information extraction (IE) and text classification (CLS) tasks within a single architecture, covering arbitrary extraction schemas—spanning from named entity recognition (NER) and relation extraction (RE) to previously unsolved quadruple and quintuple schemas—as well as CLS and multi-modal understanding. RexUniNLU formalizes true Universal Information Extraction (UIE) and applies schema constraints at each decoding step, ensuring consistency and type correctness for both IE and CLS, and demonstrates state-of-the-art results across diverse NLU tasks and languages (Liu et al., 2024).

1. Formal Foundation of Universal Information Extraction

RexUniNLU redefines UIE to generalize beyond previous models, which are limited to extracting fixed-arity tuples such as subject–object–relation triples. Its UIE objective addresses a schema of arbitrary arity $n$, where extraction corresponds to identifying a sequence of span–type pairs along root-to-leaf paths in a schema tree $\mathbf{C}^n$. Let $\mathbf{x}$ denote the input token sequence and $\mathbb{A}$ the set of annotated tuples $(\mathbf{s}, \mathbf{t})$, where $\mathbf{t} = [t_1,\dots,t_n]$ is a type path and $\mathbf{s} = [s_1,\dots,s_n]$ lists the corresponding spans.

The probabilistic extraction objective is

$\max \;\prod_{(\mathbf{s},\mathbf{t})\in\mathbb{A}} p\bigl((\mathbf{s},\mathbf{t})\mid\mathbf{C}^n,\mathbf{x}\bigr) = \max \;\prod_{(\mathbf{s},\mathbf{t})\in\mathbb{A}} \prod_{i=1}^n p\!\bigl((s_i,t_i)\mid(\mathbf{s},\mathbf{t})_{<i},\mathbf{C}^n,\mathbf{x}\bigr),$

where $(\mathbf{s},\mathbf{t})_{<i}$ denotes the pairs extracted at depths $1$ through $i-1$.
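As a sanity check on the chain-rule factorization, the sketch below assumes a model has already produced the conditional probability of each span–type pair given its prefix (the numbers are made up for illustration, not model outputs). The log-likelihood of one tuple is then the sum of its per-depth log-probabilities, and maximizing the product over $\mathbb{A}$ is equivalent to maximizing the sum of these tuple log-likelihoods. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def tuple_log_likelihood(step_probs: Sequence[float]) -> float:
    """log p((s,t) | C^n, x) as the sum of per-depth log p((s_i,t_i) | prefix, C^n, x)."""
    return sum(math.log(p) for p in step_probs)

def corpus_log_likelihood(tuples: List[Sequence[float]]) -> float:
    """Log of the product over all annotated tuples in A."""
    return sum(tuple_log_likelihood(steps) for steps in tuples)

# Hypothetical per-depth probabilities for two annotated tuples of arity n = 2.
annotated = [
    [0.9, 0.8],  # p(depth-1 pair), p(depth-2 pair | depth-1 pair)
    [0.7, 0.5],
]
ll = corpus_log_likelihood(annotated)
print(round(math.exp(ll), 3))  # 0.252, i.e. 0.9 * 0.8 * 0.7 * 0.5
```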

This general formulation subsumes common tasks:

NER ($n=1$): extract entities as single spans.
RE ($n=2$): extract subject–object–relation tuples.
Event Extraction ($n=2$ or $3$): extract event triggers and their argument roles.
Quadruple/Quintuple Extraction ($n\ge3$): higher-arity schemas previously unsupported by UIE.
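To make arity concrete, the sketch below encodes a few schema trees $\mathbf{C}^n$ as nested Python dictionaries and enumerates their root-to-leaf type paths $\mathbf{t}$. The nested-dict encoding is a hypothetical choice for illustration, and the type names are invented. The deepest path length is the arity $n$, i.e. the number of recursion steps needed. Writing a relation as `founded(organization)` is only a naming convention here, and `hyper_relation` (a relation plus a qualifier) is an assumed example of an $n=3$ schema.

```python
from typing import Dict, List, Optional

# A schema tree C^n: each key is a type; its value holds the types that may be
# extracted conditioned on a span of that type (an empty dict marks a leaf).
Schema = Dict[str, dict]

def type_paths(schema: Schema, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Enumerate every root-to-leaf type path t = [t_1, ..., t_n]."""
    prefix = prefix or []
    paths: List[List[str]] = []
    for type_name, children in schema.items():
        path = prefix + [type_name]
        if children:
            paths.extend(type_paths(children, path))
        else:
            paths.append(path)
    return paths

def arity(schema: Schema) -> int:
    """Length of the deepest path, i.e. the number of recursion steps needed."""
    return max(len(path) for path in type_paths(schema))

# Hypothetical schemas; the type names are invented for illustration.
ner = {"person": {}, "location": {}}
relation = {"person": {"founded(organization)": {}}}
event = {"attack(trigger)": {"attacker": {}, "target": {}}}
hyper_relation = {"person": {"educated_at(organization)": {"degree": {}}}}

print(type_paths(relation))  # [['person', 'founded(organization)']]
print(arity(ner), arity(relation), arity(event), arity(hyper_relation))  # 1 2 2 3
```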

Classification tasks are modeled as a degenerate case in which a special “[CLST]” token serves as the span for the entire input. Because the span is fixed, only the type path has to be predicted, giving the objective

$\prod_{i=1}^n \prod_{t_i\in\mathbb{A}_i\mid\mathbf{t}_{<i}} p\!\bigl(t_i\mid \mathbf{t}_{<i},\mathbf{C}^n,\mathbf{x}\bigr).$

This covers single- and multi-label classification, NLI, and multiple-choice MRC, and it extends to multi-modal inputs by including non-text features.

2. Model Architecture: Recursive Pipeline with Explicit Schema Instructor

The recursive pipeline operates as follows:

Query Construction: At recursion step $i$, construct

$Q_i = [\text{CLS}]\,[\text{P}]\,p_i\,[\text{T}]\,t_i^1\,[\text{T}]\,t_i^2\cdots[\text{Text}]\,\mathbf{x},$

where $p_i$ encodes the previously extracted pairs and $t_i^1, t_i^2, \dots$ are the eligible types at depth $i$. This forms the ESI prompt, which explicitly constrains extraction or classification to the schema.

Encoder: A transformer encoder (e.g., DeBERTa-v2) processes $Q_i$ using custom position IDs $P_i$ and attention masks $M_i$ to achieve "Prompts Isolation," preventing information leakage between schema branches and allowing blocks to attend only to relevant segments.
Score Matrix: The encoder outputs $h_i \in \mathbb{R}^{L \times d}$ are fed to two FFNN heads (query and key), and rotary position embeddings (RoPE) inject the relative position between tokens:

$Z_i^{j,k} = \bigl(\text{FFNN}_q(h_i^j)\bigr)^\top \mathbf{R}(P_i^k-P_i^j)\, \text{FFNN}_k(h_i^k)\;\otimes\;M_i^{j,k}$

Decoding: After thresholding $Z_i$ at $\delta$, the binary matrix $\widetilde Z_i$ is decoded via three token-linking operations: head–tail links mark span boundaries, head–type links connect a span's first token to a type marker, and type–tail links connect that type marker to the span's last token.
Recursion: Newly extracted pairs $Y_i$ are used as prefixes for the subsequent query $Q_{i+1}$; recursion halts when no new extractions are made.
Isolation Mechanism: Disjoint position ID intervals for different “[P]” blocks, and attention masks blocking cross-prefix or cross-type communication, strictly enforce schema separation.
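The Score Matrix and Decoding steps above can be sketched in NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: single linear maps stand in for $\text{FFNN}_q$ and $\text{FFNN}_k$, RoPE uses the common rotate-half layout with base 10000, the mask is applied as an elementwise product (the $\otimes$ in the formula), and the decoding rule, which keeps a typed span only when its head–tail, head–type, and type–tail links are all set, is an assumed reading of the three linking operations. The names `rope`, `score_matrix`, and `decode_typed_spans` are hypothetical.

```python
import numpy as np
from typing import List, Tuple

def rope(x: np.ndarray, pos: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate feature pairs (x[:, i], x[:, i + d/2]) of row j by angle pos[j] * freq_i."""
    half = x.shape[1] // 2                       # feature dimension d must be even
    freqs = base ** (-np.arange(half) / half)    # one frequency per rotated pair
    angles = pos[:, None] * freqs[None, :]       # (L, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

def score_matrix(h: np.ndarray, pos: np.ndarray, mask: np.ndarray,
                 w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Z[j, k] = q_j^T R(P_k - P_j) k_k, then multiplied elementwise by mask M."""
    q = rope(h @ w_q, pos)   # linear maps stand in for FFNN_q / FFNN_k (assumption)
    k = rope(h @ w_k, pos)
    return (q @ k.T) * mask

def decode_typed_spans(z_bin: np.ndarray, text_idx: List[int],
                       type_idx: List[int]) -> List[Tuple[int, int, int]]:
    """Return (head, tail, type) index triples whose three links are all set."""
    found = []
    for hd in text_idx:
        for tl in text_idx:
            if tl < hd or not z_bin[hd, tl]:         # head-tail link: span [hd, tl]
                continue
            for ty in type_idx:
                if z_bin[hd, ty] and z_bin[ty, tl]:  # head-type and type-tail links
                    found.append((hd, tl, ty))
    return found

rng = np.random.default_rng(0)
L, d = 6, 8
h = rng.normal(size=(L, d))
w_q, w_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
pos, mask = np.arange(L), np.ones((L, L))
z = score_matrix(h, pos, mask, w_q, w_k)
z_thresholded = z > 0.0   # thresholding at delta; 0.0 is an arbitrary choice here
# Only position differences matter: shifting every position ID leaves Z unchanged.
print(np.allclose(z, score_matrix(h, pos + 100, mask, w_q, w_k)))  # True

# Toy binary matrix (indices are arbitrary): text tokens 0-3, one type marker at 4.
z_bin = np.zeros((5, 5), dtype=bool)
z_bin[1, 2] = z_bin[1, 4] = z_bin[4, 2] = True
print(decode_typed_spans(z_bin, [0, 1, 2, 3], [4]))  # [(1, 2, 4)]
```

The shift check shows that the scores depend only on the differences $P_i^k - P_i^j$, which is what allows custom position IDs to control how near or far different segments appear to one another.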

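The Recursion step can be traced with the neural parts stubbed out. In this sketch, `build_query` assembles a token-level $Q_i$ following the Query Construction template (serializing each prefix pair as `type : span` is an assumption), and `fake_extract` is a hypothetical stand-in for the encode, score, and decode steps that looks answers up in a hand-written table. Only the control flow is meant to mirror the description: pairs found at step $i$ become prefixes for step $i+1$, and a branch ends when a step extracts nothing or a leaf type is reached.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]          # (span, type)
Schema = Dict[str, dict]        # nested dict; {} marks a leaf type

def build_query(prefix: List[Pair], types: List[str], text: str) -> List[str]:
    """Token-level Q_i = [CLS][P] p_i [T] t_i^1 [T] t_i^2 ... [Text] x."""
    tokens = ["[CLS]", "[P]"]
    for span, type_name in prefix:          # p_i: previously extracted pairs
        tokens += [type_name, ":"] + span.split()
    for type_name in types:                 # eligible types at this depth
        tokens += ["[T]", type_name]
    return tokens + ["[Text]"] + text.split()

# Hypothetical gold table standing in for the model: prefix -> pairs it would extract.
GOLD: Dict[Tuple[Pair, ...], List[Pair]] = {
    (): [("Steve Jobs", "person"), ("Apple", "organization")],
    (("Steve Jobs", "person"),): [("Apple", "founded(organization)")],
}

def fake_extract(query: List[str], prefix: Tuple[Pair, ...],
                 types: List[str]) -> List[Pair]:
    """Stub for encode/score/decode; a real model would read only `query`."""
    return [(s, t) for s, t in GOLD.get(prefix, []) if t in types]

def recursive_extract(schema: Schema, text: str,
                      prefix: Tuple[Pair, ...] = ()) -> List[Tuple[Pair, ...]]:
    """Return every complete root-to-leaf tuple of (span, type) pairs."""
    if not schema:                          # leaf type reached: prefix is a full tuple
        return [prefix]
    types = list(schema)
    query = build_query(list(prefix), types, text)
    results: List[Tuple[Pair, ...]] = []
    for span, type_name in fake_extract(query, prefix, types):   # Y_i
        results += recursive_extract(schema[type_name], text,
                                     prefix + ((span, type_name),))
    return results                          # empty when this step extracts nothing

schema = {"person": {"founded(organization)": {}}, "organization": {}}
text = "Steve Jobs founded Apple ."
print(build_query([], list(schema), text))
for tup in recursive_extract(schema, text):
    print(tup)
```

This sketch issues one query per prefix, which sidesteps cross-prefix leakage entirely; the Isolation Mechanism is what lets several [P] blocks share a single query without interfering with one another.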
3. Training Objectives and Decoding

Distinct loss functions are employed for IE and CLS:

IE Training (Circle Loss):

$\mathcal{L}_i = \log(1+\sum_{\hat{Z}_i^j=0}e^{\overline{Z}_i^j}) + \log(1+\sum_{\hat{Z}_i^k=1}e^{-\overline{Z}_i^k}),$

where $\overline{Z}_i$ is the flattened $Z_i$ and $\hat{Z}_i\in\{0,1\}$ is the ground-truth mask. The total IE loss is $\mathcal{L}_{\mathrm{IE}} = \sum_i \mathcal{L}_i$.
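A minimal NumPy sketch of the per-step loss, assuming $Z_i$ holds raw (pre-sigmoid) scores and $\hat Z_i$ the 0/1 gold link matrix of the same shape. Prepending a 0 before `np.logaddexp.reduce` supplies the "1 +" inside each logarithm and avoids overflow for large scores. The function names are hypothetical.

```python
import numpy as np

def circle_loss(z: np.ndarray, z_gold: np.ndarray) -> float:
    """L_i = log(1 + sum_{gold=0} e^{z}) + log(1 + sum_{gold=1} e^{-z}) over flattened scores."""
    flat = z.reshape(-1)
    gold = z_gold.reshape(-1).astype(bool)
    # logaddexp over [0, a, b, ...] equals log(1 + e^a + e^b + ...).
    neg_term = np.logaddexp.reduce(np.concatenate(([0.0], flat[~gold])))
    pos_term = np.logaddexp.reduce(np.concatenate(([0.0], -flat[gold])))
    return float(neg_term + pos_term)

def ie_loss(step_scores, step_golds) -> float:
    """L_IE = sum over recursion steps i of L_i."""
    return sum(circle_loss(z, g) for z, g in zip(step_scores, step_golds))

z = np.array([[2.0, -1.0], [0.5, -3.0]])
gold = np.array([[1, 0], [0, 0]])
print(round(circle_loss(z, gold), 4))
```

Each term pushes gold links above 0 and non-links below 0, so the loss approaches zero only when every positive score is well above every negative one.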

CLS Training/Decoding:
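- Apply a sigmoid to $Z_i$, producing $\widehat Z_i$ with entries in $(0,1)$.
- Single-label: the prediction at the [CLST] position $j$ is $\hat y = \arg\max_y(\widehat Z_i^{j,y} \times \widehat Z_i^{y,j})$, where $y$ ranges over the label-type positions.
- Multi-label: a label $y$ is predicted when both $\widehat Z_i^{j,y}$ and $\widehat Z_i^{y,j}$ exceed $\delta$ (e.g., 0.9).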
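The two CLS decoding rules can be sketched directly on a score matrix. The layout is an assumption for illustration: the [CLST] token sits at index `cls_pos`, and each candidate label is referred to by the index of its type marker. The default $\delta = 0.9$ follows the example value in the text, and the function names are hypothetical.

```python
import numpy as np
from typing import List

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def decode_single_label(z: np.ndarray, cls_pos: int, label_pos: List[int]) -> int:
    """argmax_y sigmoid(Z[j, y]) * sigmoid(Z[y, j]), with j the [CLST] position."""
    p = sigmoid(z)
    scores = [p[cls_pos, y] * p[y, cls_pos] for y in label_pos]
    return label_pos[int(np.argmax(scores))]

def decode_multi_label(z: np.ndarray, cls_pos: int, label_pos: List[int],
                       delta: float = 0.9) -> List[int]:
    """Keep label y only when both link directions exceed delta."""
    p = sigmoid(z)
    return [y for y in label_pos if p[cls_pos, y] > delta and p[y, cls_pos] > delta]

# Toy scores: [CLST] at index 0, label markers at indices 1-3 (hypothetical layout).
z = np.zeros((4, 4))
z[0, 2], z[2, 0] = 5.0, 5.0
z[0, 3], z[3, 0] = 3.0, 4.0
print(decode_single_label(z, 0, [1, 2, 3]))  # 2
print(decode_multi_label(z, 0, [1, 2, 3]))   # [2, 3]
```

Requiring both directions in the multi-label rule means a label is kept only when the [CLST]-to-label and label-to-[CLST] links agree.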

4. Experimental Protocol and Benchmarks

RexUniNLU is pre-trained on approximately 30 million samples (Chinese and English), including distant supervision for NER/RE (9.6M), supervised IE (NER, RE, EE, ABSA), and CLS (sentiment, NLI, match, MRC). English pre-training draws from OntoNotes, NYT, SciERC, SQuAD, HellaSwag, HyperRED, and COQE.

Downstream tasks include:

Chinese IE: CMeEE-NER, Youku (NER); ACE05, CoNLL04, NYT, SciERC, CoAE2016 (RE); ACE05, CASIE, CCKS (EE); pCLUE, CMRC2018 (MRC IE); 14-res, 15-res, 16-res (ABSA); HyperRED (quadruple); Camera-COQE (quintuple).
Chinese CLS: Toutiao (general), NLPCC14-SC (sentiment), AFQMC (match), OCNLI (NLI), C³ (MRC).
English IE: ACE04, ACE05-Ent/Rel, CoNLL03, CoNLL04, NYT, SciERC, ACE05-Evt, CASIE, 14-res, 15-res, 16-res, HyperRED, Camera-COQE.
Multi-modal NLU: PPN benchmark (20 document types), evaluated with Entity Strict F1.

Standard metrics include various strict F1 scores (Entity, Relation, Triplet, Quadruple, Universal), Trigger, Argument, and Sentiment F1.
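The strict F1 variants share one pattern and differ only in what counts as an item (an entity, a relation triple, a quadruple, and so on). A minimal sketch, assuming micro-averaging over a corpus and exact tuple matching. Spans are written as surface strings for brevity, and the sentence id keeps identical tuples from different sentences distinct. Benchmark-specific conventions (for example, how trigger and argument matches are defined) may differ, and `strict_prf` is a hypothetical name.

```python
from typing import Hashable, Iterable, Tuple

def strict_prf(pred: Iterable[Hashable], gold: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Micro precision, recall, F1: a prediction counts only if it equals a gold item exactly."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Items carry a sentence id so identical tuples in different sentences stay distinct.
gold = {(0, "Steve Jobs", "founded", "Apple"), (1, "Apple", "located_in", "Cupertino")}
pred = {(0, "Steve Jobs", "founded", "Apple"), (1, "Apple", "located_in", "California")}
print(strict_prf(pred, gold))  # (0.5, 0.5, 0.5)
```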

5. Quantitative Results and Performance Analysis

Summary of Key Benchmark Results

Model	IE Avg	CLS Avg	All Avg	Modality	PPN Entity F1
PromptCLUE (mT5-B)	50.92	76.23	63.85	text	—
mT5-ZSAC	60.42	76.78	68.22	text	—
SiameseUniNLU (RoB)	60.42	76.01	68.22	text	—
RexUniNLU-Base	68.64	80.97	74.81	text	34.83
RexUniNLU-Large	69.24	81.65	75.45	text	40.96
MRexUniNLU	—	—	—	text+layout+image	66.84

RexUniNLU demonstrates:

Full-shot gains: +8–10 points over previous unified models across 12 tasks.
Few/Zero-shot: gains of up to +42 points in zero-shot IE+MRC; e.g., 63.37 (0-shot, RexUniNLU-Large) vs. 49.07 (SiameseUniNLU) / 38.74 (mT5).
Complex Schemas: +8 points over T5-UIE on quintuples (Camera-COQE); +1–2 points from additional pre-training on event extraction.
Few-shot (English): 1-shot F1 on CoNLL03: 89.07 (RexUIE-EN) vs. 79.65 (USM).
Zero-shot comparison: CoNLL++ (NER): 76.77 (RexUIE-EN) vs. 58.40 (ChatGPT).

In multi-modal (text+layout+image) NLU, MRexUniNLU achieves 66.84 Entity F1 on PPN, outperforming the text-only RexUniNLU models (34.83 and 40.96) and layout-only variants.

Ablations show performance drops when removing Prompts Isolation (−0.52), RoPE (−0.82), or both (−1.92), supporting both design choices. Gains also grow with schema complexity: the relative F1 gain correlates positively with $\log(C/S)$, where $C$ is the number of schema leaf types and $S$ is the training-set size (Liu et al., 2024).

6. Strengths, Limitations, and Directions for Further Research

Strengths:

Unified encoder-only framework supporting all main IE and CLS schema types, as well as multi-modal and multilingual tasks.
Explicit Schema Instructor enforces type constraints and mitigates incorrect extraction, critical in low-data and complex schemas.
Recursive decoding accommodates arbitrary schema arity without the computational cost of generative approaches.

Limitations:

High pre-training cost due to reliance on large IE/MRC corpora; lighter pre-training or adapter modules could improve efficiency.
Inference currently requires enumeration over all schema paths, limiting efficiency in rare-type queries; dynamic pruning or learned schema selection is a prospective improvement.
Modalities beyond text, layout, and image (e.g. audio, video), wider language coverage, and open-schema IE remain open challenges.
Incorporation of continual schema learning for evolving ontologies is an ongoing direction.

7. Significance and Outlook

RexUniNLU introduces a principled, scalable method for universal NLU, bridging longstanding divides between information extraction and classification. Its recursive, schema-constrained inference and generalization to complex and multimodal schemas provide a foundation for robust universal NLU, with performance validated under full-shot, few-shot, zero-shot, and multi-modal regimes on numerous benchmarks in Chinese and English (Liu et al., 2024).


References

Liu et al. (2024). RexUniNLU: Recursive Method with Explicit Schema Instructor for Universal NLU.
