
RexUniNLU: Universal NLU Framework

Updated 3 March 2026
  • RexUniNLU is a universal NLU framework that unifies information extraction and classification by leveraging a recursive extraction paradigm with an Explicit Schema Instructor.
  • It employs a recursive pipeline with explicit schema queries, prompt isolation through custom position IDs and attention masks, and RoPE-based token-pair scoring to keep schema-based decoding consistent and type-correct.
  • The framework reports state-of-the-art results on full-shot, few-shot, zero-shot, and multi-modal benchmarks in Chinese and English.

RexUniNLU is an encoder-only neural framework for universal natural language understanding (NLU) built on a recursive extraction paradigm with an Explicit Schema Instructor (ESI). It unifies information extraction (IE) and text classification (CLS) within a single architecture. It covers arbitrary extraction schemas, from named entity recognition (NER) and relation extraction (RE) to quadruple and quintuple schemas that earlier UIE models could not handle, as well as CLS and multi-modal understanding. RexUniNLU gives a formal definition of Universal Information Extraction (UIE), applies schema constraints at each decoding step to keep both IE and CLS outputs consistent and type-correct, and reports state-of-the-art results across diverse NLU tasks and languages (Liu et al., 2024).

1. Formal Foundation of Universal Information Extraction

RexUniNLU redefines UIE to generalize beyond previous models limited to extracting fixed-arity tuples, such as subject–object–relation triples. The RexUniNLU UIE objective addresses an arbitrary schema of arity $n$, where extraction corresponds to identifying a sequence of span–type pairs along root-to-leaf paths in a schema tree $\mathbf{C}^n$. Let $\mathbf{x}$ denote the input token sequence and $\mathbb{A}$ the set of annotated tuples $(\mathbf{s}, \mathbf{t})$, where $\mathbf{t} = [t_1,\dots,t_n]$ is a type path and $\mathbf{s} = [s_1,\dots,s_n]$ the corresponding spans.

The probabilistic extraction objective maximizes the likelihood of the annotated tuples, factorized by depth:

$$\max \prod_{(\mathbf{s},\mathbf{t})\in\mathbb{A}} p\bigl((\mathbf{s},\mathbf{t})\mid\mathbf{C}^n,\mathbf{x}\bigr) = \max \prod_{(\mathbf{s},\mathbf{t})\in\mathbb{A}} \prod_{i=1}^{n} p\bigl((s_i,t_i)\mid(\mathbf{s},\mathbf{t})_{<i},\mathbf{C}^n,\mathbf{x}\bigr)$$

where $(\mathbf{s},\mathbf{t})_{<i}$ denotes all extracted pairs up to depth $i-1$.
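As a worked instance, take a relation tuple ($n=2$) from the hypothetical sentence "Alice works for Acme.", with $s_1 = \text{Alice}$, $t_1 = \text{person}$, $s_2 = \text{Acme}$, and $t_2 = \text{work for (organization)}$ (illustrative types chosen for this example, not taken from the paper). Since $(\mathbf{s},\mathbf{t})_{<1}$ is empty and $(\mathbf{s},\mathbf{t})_{<2} = (s_1,t_1)$, the inner product expands to two factors:

$$p\bigl((\mathbf{s},\mathbf{t})\mid\mathbf{C}^2,\mathbf{x}\bigr) = p\bigl((s_1,t_1)\mid\mathbf{C}^2,\mathbf{x}\bigr)\cdot p\bigl((s_2,t_2)\mid(s_1,t_1),\mathbf{C}^2,\mathbf{x}\bigr)$$

The first factor scores the subject span and its type; the second scores the object span and relation type given that subject. Taking logarithms turns the objective into a sum of such per-depth log-probabilities.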

This general formulation subsumes common tasks:

  • NER ($n=1$): extract entities as single spans.
  • RE ($n=2$): extract subject–object–relation tuples.
  • Event Extraction ($n=2$ or $3$): event-trigger and argument role extraction.
  • Quadruple/Quintuple Extraction ($n\ge3$): higher-arity schemas previously unsupported by UIE.
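To make the schema tree $\mathbf{C}^n$ and its type paths concrete, here is a minimal Python sketch. It assumes a hypothetical nested-dictionary encoding (each key is a type, each value the sub-schema allowed at the next depth, `None` for a leaf); neither this encoding nor the helper names `type_paths` and `arity` come from the paper. Under this view, the arity $n$ is the length of the longest root-to-leaf path.

```python
from typing import Dict, List, Optional

# Hypothetical schema encoding (an assumption, not the paper's format):
# each key is a type t_i; its value is the sub-schema of types allowed at the
# next depth, or None when the type is a leaf.
Schema = Dict[str, Optional["Schema"]]


def type_paths(schema: Schema, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Enumerate every root-to-leaf type path t = [t_1, ..., t_n]."""
    prefix = prefix or []
    paths: List[List[str]] = []
    for type_name, children in schema.items():
        path = prefix + [type_name]
        if children:  # internal node: descend to the next depth
            paths.extend(type_paths(children, path))
        else:  # leaf: a complete type path
            paths.append(path)
    return paths


def arity(schema: Schema) -> int:
    """Arity n of a schema: the length of its longest root-to-leaf path."""
    return max(len(path) for path in type_paths(schema))


# Illustrative schemas (type names are made up for this example).
ner_schema: Schema = {"person": None, "location": None}
re_schema: Schema = {
    "person": {"work for (organization)": None, "live in (location)": None},
}

print(arity(ner_schema), arity(re_schema))  # 1 2
print(type_paths(re_schema))
```

Under this encoding, a quadruple or quintuple schema is simply a deeper nesting; the enumeration code does not change.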

Classification tasks are modeled as a degenerate case in which a special “[CLST]” token span encodes the entire input, yielding a likelihood over label types only:

$$\prod_{i=1}^{n} \prod_{t_i\in\mathbb{A}_i\mid\mathbf{t}_{<i}} p\bigl(t_i\mid \mathbf{t}_{<i},\mathbf{C}^n,\mathbf{x}\bigr).$$

This covers single- and multi-label classification, NLI, and multiple-choice MRC, and extends to multi-modal cases by including non-text features.
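For flat classification ($n = 1$), $\mathbf{t}_{<1}$ is empty, so the classification objective reduces to one factor per gold label in $\mathbb{A}_1$:

$$\prod_{t_1\in\mathbb{A}_1} p\bigl(t_1\mid\mathbf{C}^1,\mathbf{x}\bigr),$$

which is a single factor for single-label classification and several factors for multi-label classification.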

2. Model Architecture: Recursive Pipeline with Explicit Schema Instructor

The recursive pipeline operates as follows:

  • Query Construction: At recursion step $i$, construct

$$Q_i = [\text{CLS}]\,[P]\,p_i\,[T]\,t_i^1\,[T]\,t_i^2\cdots[\text{Text}]\,\mathbf{x},$$

where $p_i$ encodes the pairs extracted at earlier depths and $t_i^1, t_i^2, \dots$ are the types eligible at depth $i$. This is the ESI prompt: it explicitly restricts extraction or classification to the types the schema allows at that step.

  • Encoder: A transformer encoder (e.g., DeBERTa-v2) processes $Q_i$ using custom position IDs $P_i$ and attention masks $M_i$ to achieve "Prompts Isolation," preventing information leakage between schema branches and allowing each block to attend only to relevant segments.
  • Score Matrix: The encoder outputs $h_i \in \mathbb{R}^{L \times d}$ (for a query of $L$ tokens) are fed into two FFNN heads (query and key), and rotary position embeddings (RoPE) encode the positional offset between tokens $j$ and $k$:

$$Z_i^{j,k} = \bigl(\text{FFNN}_q(h_i^j)\bigr)^\top \mathbf{R}(P_i^k-P_i^j)\,\text{FFNN}_k(h_i^k) \otimes M_i^{j,k}$$

  • Decoding: After thresholding $Z_i$ at $\delta$, the binary matrix $\widetilde Z_i$ is decoded via three token-linking operations: head–tail (span detection), head–type (type assignment), and type–tail (linking the type to the span's tail token).
  • Recursion: The newly extracted pairs $Y_i$ become the prefix of the next query $Q_{i+1}$; recursion halts when a step yields no new extractions.
  • Isolation Mechanism: Disjoint position ID intervals for different “[P]” blocks, and attention masks blocking cross-prefix or cross-type communication, strictly enforce schema separation.
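Prompt isolation and the score-matrix formula can be sketched together in NumPy. This is an illustration under stated assumptions, not the authors' code: the FFNN heads are collapsed to single weight matrices `w_q` and `w_k`; the mask rule (each [P] block sees itself plus the shared [CLS] and text tokens, while different blocks never see each other) is our reading of the isolation description; masked pairs are filled with `-inf` rather than multiplied by zero so they can never pass a threshold; and RoPE uses the common base-10000 frequency schedule. All function names are hypothetical.

```python
import numpy as np


def isolation_mask(prompt_lengths, text_len):
    """Boolean mask M[j, k] for a query laid out as [CLS] block_0 ... block_{B-1} text.

    Assumed rule: tokens inside a [P] block see their own block plus the shared
    [CLS] and text tokens; [CLS] and text tokens see everything.
    """
    seg = [-1]  # -1 marks shared positions ([CLS] and text)
    for b, n in enumerate(prompt_lengths):
        seg += [b] * n
    seg += [-1] * text_len
    seg = np.array(seg)
    same_block = seg[:, None] == seg[None, :]
    shared = (seg[:, None] == -1) | (seg[None, :] == -1)
    return same_block | shared


def rope(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x (shape L x d, d even) by pos * frequency."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,)
    angle = pos[:, None] * inv_freq[None, :]      # (L, d/2)
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def score_matrix(h, w_q, w_k, pos, mask):
    """Z[j, k] = (W_q h_j)^T R(P_k - P_j) (W_k h_k), with masked pairs set to -inf."""
    q = rope(h @ w_q, pos)  # R(P_j) q_j
    k = rope(h @ w_k, pos)  # R(P_k) k_k
    z = q @ k.T             # (R(P_j) q_j)^T (R(P_k) k_k) = q_j^T R(P_k - P_j) k_k
    return np.where(mask, z, -np.inf)


rng = np.random.default_rng(0)
mask = isolation_mask([3, 2], text_len=4)  # 1 + 3 + 2 + 4 = 10 tokens
h = rng.normal(size=(10, 8))
w_q, w_k = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
pos = np.arange(10)  # sequential IDs give each block its own disjoint interval
z = score_matrix(h, w_q, w_k, pos, mask)
print(np.isneginf(z[1, 4]))  # True: block 0 cannot link to block 1
```

Because the two rotations compose into $\mathbf{R}(P^k - P^j)$, shifting every position ID by the same constant leaves the unmasked scores unchanged; only relative offsets matter.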

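Query construction and recursion can be sketched in plain Python. Assumptions to flag: the schema is a hypothetical nested dictionary, the `span (type)` rendering of the prefix $p_i$ is illustrative, the model is replaced by a hypothetical `step` callback (here a toy gazetteer lookup), and a branch whose step extracts nothing before reaching a leaf is dropped. The isolation mechanism implies that several [P] blocks can share one query; this sketch issues one query per prefix for simplicity.

```python
from typing import Callable, Dict, List, Optional, Tuple

Schema = Dict[str, Optional["Schema"]]  # hypothetical nested-dict schema tree
Pair = Tuple[str, str]                  # (span text, type)
# A step function maps (query, eligible types) to the pairs Y_i it extracts.
StepFn = Callable[[str, List[str]], List[Pair]]


def build_query(prefix: List[Pair], types: List[str], text: str) -> str:
    """Linearize Q_i = [CLS][P] p_i [T] t_i^1 [T] t_i^2 ... [Text] x."""
    p_i = "; ".join(f"{span} ({typ})" for span, typ in prefix)
    type_part = "".join(f"[T]{t}" for t in types)
    return f"[CLS][P]{p_i}{type_part}[Text]{text}"


def recursive_extract(schema: Schema, text: str, step: StepFn,
                      prefix: Optional[List[Pair]] = None) -> List[List[Pair]]:
    """Walk the schema tree depth by depth; each new pair becomes part of the
    prefix of the next query, and a branch stops when a step extracts nothing."""
    prefix = prefix or []
    types = list(schema)  # types eligible at this depth
    results: List[List[Pair]] = []
    for span, typ in step(build_query(prefix, types, text), types):
        path = prefix + [(span, typ)]
        children = schema.get(typ)
        if children:  # more depths to fill: recurse with the longer prefix
            results.extend(recursive_extract(children, text, step, path))
        else:  # reached a leaf type: the tuple is complete
            results.append(path)
    return results


# Toy stand-in for the encoder and decoder: a hypothetical gazetteer lookup.
GAZETTEER = {"person": ["Alice"], "work for (organization)": ["Acme"]}


def toy_step(query: str, types: List[str]) -> List[Pair]:
    text = query.split("[Text]", 1)[1]
    return [(span, t) for t in types for span in GAZETTEER.get(t, []) if span in text]


re_schema: Schema = {"person": {"work for (organization)": None, "live in (location)": None}}
print(recursive_extract(re_schema, "Alice works for Acme.", toy_step))
# [[('Alice', 'person'), ('Acme', 'work for (organization)')]]
```

The second query in this run is `[CLS][P]Alice (person)[T]work for (organization)[T]live in (location)[Text]Alice works for Acme.`, so the eligible types at depth 2 are exactly the children of the type extracted at depth 1.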
3. Training Objectives and Decoding

Distinct loss functions are employed for IE and CLS:

  • IE Training (Circle Loss):

$$\mathcal{L}_i = \log\Bigl(1+\sum_{\hat{Z}_i^j=0}e^{\overline{Z}_i^j}\Bigr) + \log\Bigl(1+\sum_{\hat{Z}_i^k=1}e^{-\overline{Z}_i^k}\Bigr),$$

where $\overline{Z}_i$ is the flattened $Z_i$ and $\hat{Z}_i\in\{0,1\}$ is the ground-truth mask. The total IE loss is $\mathcal{L}_{\mathrm{IE}} = \sum_i \mathcal{L}_i$.

  • CLS Training/Decoding:

    • Apply a sigmoid to $Z_i$, producing $\widehat Z_i \in (0,1)$.
    • Single-label: the prediction at position $j$ is

    $$\hat y = \arg\max_y \bigl(\widehat Z_i^{j,y} \times \widehat Z_i^{y,j}\bigr)$$

    • Multi-label: both directions, $\widehat Z_i^{j,y}$ and $\widehat Z_i^{y,j}$, are thresholded at $\delta$ (e.g., 0.9).
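The IE loss and the CLS decoding rules can be sketched in NumPy. `circle_loss` follows the displayed formula, computed with a log-sum-exp for numerical stability. In the decoders, the prediction position `j` and the label positions are hypothetical inputs, and the multi-label rule that a label is kept only when both directions exceed $\delta$ is our reading of "both directions are thresholded", not a confirmed detail.

```python
import numpy as np


def _log1p_sum_exp(v):
    """Numerically stable log(1 + sum(exp(v))), computed as logsumexp over [0, v]."""
    a = np.concatenate([[0.0], v])
    m = a.max()
    return m + np.log(np.exp(a - m).sum())


def circle_loss(z, target):
    """L_i = log(1 + sum_{neg} e^{z}) + log(1 + sum_{pos} e^{-z}) over the flattened Z_i."""
    z = np.ravel(z).astype(float)
    pos = np.ravel(target).astype(bool)
    return _log1p_sum_exp(z[~pos]) + _log1p_sum_exp(-z[pos])


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_single(z, j, label_pos):
    """Single-label: argmax over labels y of sigmoid(Z[j, y]) * sigmoid(Z[y, j])."""
    p = sigmoid(z)
    scores = [p[j, y] * p[y, j] for y in label_pos]
    return label_pos[int(np.argmax(scores))]


def decode_multi(z, j, label_pos, delta=0.9):
    """Multi-label (assumed rule): keep y when both directions exceed delta."""
    p = sigmoid(z)
    return [y for y in label_pos if p[j, y] > delta and p[y, j] > delta]


# Toy 4x4 score matrix: position 0 is the prediction position j,
# positions 2 and 3 are two hypothetical label tokens.
z = np.zeros((4, 4))
z[0, 2], z[2, 0] = 5.0, 5.0   # label at position 2: both directions confident
z[0, 3], z[3, 0] = 4.0, -4.0  # label at position 3: only one direction confident
print(decode_single(z, 0, [2, 3]), decode_multi(z, 0, [2, 3]))  # 2 [2]
```

The loss pushes every negative score below 0 and every positive score above 0 without a fixed margin, which is why a single threshold can separate them at decoding time.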

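The token-linking step of the recursive pipeline (thresholding $Z_i$ at $\delta$, then reading head–tail, head–type, and type–tail links) can be sketched as follows. The source names the three links but not how they combine; this sketch assumes a typed span is emitted only when all three links fire, with link directions taken literally from their names, and that the text and type-marker positions are known from the query layout. The function name and toy positions are hypothetical.

```python
import numpy as np


def token_link_decode(z, delta, text_positions, type_positions):
    """Decode typed spans from Z_i via head-tail, head-type and type-tail links.

    Assumed rule: span [head, tail] gets the type at position tau when
    Z[head, tail], Z[head, tau] and Z[tau, tail] all exceed delta.
    """
    linked = z > delta  # binary matrix: the thresholded Z_i
    spans = []
    for head in text_positions:
        for tail in text_positions:
            if tail < head or not linked[head, tail]:
                continue  # head-tail link: span boundaries
            for tau in type_positions:
                if linked[head, tau] and linked[tau, tail]:  # head-type and type-tail links
                    spans.append((head, tail, tau))
    return spans


# Toy example: type markers at positions 1-2, text tokens at positions 3-7.
z = np.full((8, 8), -10.0)
z[3, 4] = z[3, 1] = z[1, 4] = 5.0  # span 3..4 linked to the type at position 1
z[6, 6] = z[6, 2] = 5.0            # span 6..6 lacks the type-tail link
print(token_link_decode(z, 0.0, range(3, 8), [1, 2]))  # [(3, 4, 1)]
```

Requiring all three links lets one score matrix encode span boundaries and span types at once, and a span whose type links are incomplete (here the span at position 6) is discarded.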
4. Experimental Protocol and Benchmarks

RexUniNLU is pre-trained on approximately 30 million samples (Chinese and English), including distant supervision for NER/RE (9.6M), supervised IE (NER, RE, EE, ABSA), and CLS (sentiment, NLI, match, MRC). English pre-training draws from OntoNotes, NYT, SciERC, SQuAD, HellaSwag, HyperRED, and COQE.

Downstream tasks include:

  • Chinese IE: CMeEE-NER, Youku (NER); ACE05, CoNLL04, NYT, SciERC, CoAE2016 (RE); ACE05, CASIE, CCKS (EE); pCLUE, CMRC2018 (MRC IE); 14-res, 15-res, 16-res (ABSA); HyperRED (quadruple); Camera-COQE (quintuple).
  • Chinese CLS: Toutiao (general), NLPCC14-SC (sentiment), AFQMC (match), OCNLI (NLI), C³ (MRC).
  • English IE: ACE04, ACE05-Ent/Rel, CoNLL03, CoNLL04, NYT, SciERC, ACE05-Evt, CASIE, 14-res, 15-res, 16-res, HyperRED, Camera-COQE.
  • Multi-modal NLU: PPN benchmark (20 document types), evaluated with Entity Strict F1.

Standard metrics include various strict F1 scores (Entity, Relation, Triplet, Quadruple, Universal), Trigger, Argument, and Sentiment F1.

5. Quantitative Results and Performance Analysis

Summary of Key Benchmark Results

| Model | IE Avg | CLS Avg | All Avg | Modality | Entity F1 (PPN) |
|---|---|---|---|---|---|
| PromptCLUE (mT5-B) | 50.92 | 76.23 | 63.85 | text | – |
| mT5-ZSAC | 60.42 | 76.78 | 68.22 | text | – |
| SiameseUniNLU (RoB) | 60.42 | 76.01 | 68.22 | text | – |
| RexUniNLU-Base | 68.64 | 80.97 | 74.81 | text | 34.83 |
| RexUniNLU-Large | 69.24 | 81.65 | 75.45 | text | 40.96 |
| MRexUniNLU | – | – | – | text+layout+image | 66.84 |

RexUniNLU demonstrates:

  • Full-shot gains: +8–10 points over previous unified models across 12 tasks.
  • Few/Zero-shot: Up to +42 points gain in IE+MRC (zero-shot), e.g., 63.37 (0-shot, RexLarge) vs. 49.07 (Siamese) / 38.74 (mT5).
  • Complex Schemas: +8 points over T5-UIE on quintuples (Camera-COQE); +1–2 points from additional pre-training on event extraction.
  • Few-shot (English): 1-shot F1 on CoNLL03: 89.07 (RexUIE-EN) vs. 79.65 (USM).
  • Zero-shot comparison: CoNLL++ (NER): 76.77 (RexUIE-EN) vs. 58.40 (ChatGPT).

In multi-modal (text+layout+image) NLU, MRexUniNLU achieves 66.84 Entity F1 on PPN, outperforming the text-only and layout-only variants of RexUniNLU.

Ablation analysis shows performance drops without Prompts Isolation (−0.52), without RoPE (−0.82), and without both (−1.92), supporting these architectural choices. Gains also grow with schema complexity: the relative F1 gain correlates positively with $\log(C/S)$, where $C$ is the number of schema leaf types and $S$ is the training-set size (Liu et al., 2024).

6. Strengths, Limitations, and Directions for Further Research

Strengths:

  • Unified encoder-only framework supporting all main IE and CLS schema types, multi-modal, and multi-language tasks.
  • Explicit Schema Instructor enforces type constraints and mitigates incorrect extraction, critical in low-data and complex schemas.
  • Recursive decoding accommodates arbitrary schema arity without the computational cost of generative approaches.

Limitations:

  • High pre-training cost due to the reliance on large IE/MRC corpora; lighter pre-training or adapter modules could improve efficiency.
  • Inference currently enumerates all schema paths, which limits efficiency for rare-type queries; dynamic pruning or learned schema selection are possible improvements.
  • Modalities beyond text, layout, and image (e.g., audio, video), wider language coverage, and open-schema IE remain open challenges.
  • Incorporating continual schema learning for evolving ontologies is an ongoing direction.

7. Significance and Outlook

RexUniNLU introduces a principled, scalable method for universal NLU, bridging longstanding divides between information extraction and classification. Its recursive, schema-constrained inference and generalization to complex and multimodal schemas provide a foundation for robust universal NLU, with performance validated under full-shot, few-shot, zero-shot, and multi-modal regimes on numerous benchmarks in Chinese and English (Liu et al., 2024).

References (1)
