Australian Open Legal QA
- Australian Open Legal QA is a retrieval-augmented generation system that leverages case-based reasoning and neural embeddings to answer complex legal questions over Australian law.
- It integrates structured retrieval with contextual prompt augmentation for LLMs to produce factually grounded and evidential responses.
- The system employs hybrid similarity measures over intra-embeddings and inter-embeddings to retrieve the most relevant judgments, statutes, and case snippets.
The Australian Open Legal QA (ALQA) system represents a retrieval-augmented generation (RAG) architecture specifically tailored for legal question answering over Australian law, leveraging sophisticated neural embeddings and case-based reasoning (CBR) cycles. Its core capability is the integration of structured retrieval and contextual prompt augmentation for LLMs, facilitating the production of factually grounded and evidentially traceable responses for complex legal inquiries. ALQA derives its foundation from the CBR-RAG framework, operationalized on a corpus of open court judgments, statutes, and regulations from the Australian legal domain (Wiratunga et al., 2024).
1. System Architecture and CBR-RAG Integration
ALQA employs a pipeline where user-submitted legal questions initiate a case-based retrieval, referencing an indexed case-base composed of previous question-answer pairs, supporting legal snippets, and extracted entities. The CBR cycle integrates only the Retrieve and Reuse phases: retrieval leverages neural embedding-based similarity with no adaptation or revision of retrieved content. The support snippets, entity metadata, or full case tuples from retrieved cases—depending on the context format—are concatenated to the new input question $q'$, producing an LLM prompt of the form $p' = \text{context} \oplus q'$. This prompt is passed to a generative LLM (such as Mistral-7B) and yields the answer $a'$. The prescribed architecture ensures that factuality and domain-relevance requirements inherent to legal practice are structurally embedded by grounding generation in carefully selected precedent and statutory context (Wiratunga et al., 2024).
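The Retrieve-and-Reuse flow described above can be sketched as follows. The `Case` structure and `build_prompt` helper are illustrative names chosen for this sketch, not the authors' implementation; the example case is invented for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Case:
    question: str   # q: previous question text
    snippet: str    # s: supporting legal-document excerpt
    entities: list  # e: extracted statutes / case names / parties
    answer: str     # a: held out, used for evaluation only

def build_prompt(new_question, retrieved, full_case=False):
    """Reuse step: concatenate retrieved context onto the new question.

    full_case=False -> "support-only" context (snippets alone);
    full_case=True  -> "full-case" context (question + snippet + entities).
    """
    parts = []
    for c in retrieved:
        if full_case:
            parts.append(f"Q: {c.question}\nContext: {c.snippet}\n"
                         f"Entities: {', '.join(c.entities)}")
        else:
            parts.append(c.snippet)
    context = "\n---\n".join(parts)
    return f"{context}\n\nQuestion: {new_question}\nAnswer:"

# Toy example (fabricated case, for illustration only)
case = Case("What is the limitation period for contract claims?",
            "Limitation of Actions Act 1974 (Qld) s 10 provides ...",
            ["Limitation of Actions Act 1974 (Qld)"],
            "Six years from the date the cause of action accrued.")
prompt = build_prompt("How long do I have to sue on a contract?", [case])
```

The resulting string is what would be sent to the generative LLM in place of the bare question.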
2. Case-Base Representation, Embedding Vocabulary, and Similarity Knowledge Containers
Each element in the case-base is represented as a tuple $(q, s, e, a)$, where $q$ is the question text, $s$ is a legal-document support snippet, $e$ comprises named entities (statutes, case names, parties) extracted from $s$, and $a$ is the answer text (used for evaluation only). Retrieval operates over neural embeddings generated through two distinct encoders: intra-embeddings optimized for question-to-question comparisons, and inter-embeddings tailored for matching questions to support snippets or entities.
Similarity knowledge containers in hybrid retrieval scenarios maintain three “buckets” for Q-to-Q, Q-to-S, and Q-to-E scores. A weighted sum of these similarity measures determines the ranking and selection of relevant cases; default weights for the intra, support, and entity matches are as reported by Wiratunga et al. (2024).
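The weighted combination of the three buckets can be sketched as below. The function name, the dictionary layout of precomputed case embeddings, and the weight values are placeholders for this sketch (the paper's default weights are not reproduced here).

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_similarity(q_intra, q_inter, case, weights):
    """Combine the three similarity 'buckets' (Q-Q, Q-S, Q-E) into one score.

    `case` holds precomputed embeddings: 'q' (intra-encoded question),
    's' and 'e' (inter-encoded snippet and entities).
    `weights` = (w_qq, w_qs, w_qe); illustrative values, not the defaults
    reported by Wiratunga et al. (2024).
    """
    w_qq, w_qs, w_qe = weights
    return (w_qq * cosine(q_intra, case["q"])
            + w_qs * cosine(q_inter, case["s"])
            + w_qe * cosine(q_inter, case["e"]))
```

With normalised weights, a case identical to the query under all three buckets scores exactly 1.0, which makes the weighted sum easy to sanity-check.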
| Case Element | Embedding Type | Role in Retrieval |
|---|---|---|
| Question ($q$) | Intra-embedding | Q↔Q, attribute match |
| Support snippet ($s$) | Inter-embedding | Q↔S, open IR match |
| Entities ($e$) | Inter-embedding | Q↔E, entity match |
3. Formal Retrieval Formulations
The formal definition of the retrieval modes is as follows, for a new question $q'$ and a case base of tuples $(q_i, s_i, e_i, a_i)$:
- Intra-embeddings (Q–Q only): $\mathrm{sim}_{QQ}(q', q_i) = \cos\big(E_{\text{intra}}(q'),\, E_{\text{intra}}(q_i)\big)$
- Inter-embeddings (Q–S or Q–E): $\mathrm{sim}_{QS}(q', s_i) = \cos\big(E_{\text{inter}}(q'),\, E_{\text{inter}}(s_i)\big)$, and analogously $\mathrm{sim}_{QE}(q', e_i)$
- Hybrid retrieval (weighted sum): $\mathrm{sim}(q', c_i) = w_1\,\mathrm{sim}_{QQ} + w_2\,\mathrm{sim}_{QS} + w_3\,\mathrm{sim}_{QE}$, with $w_1 + w_2 + w_3 = 1$
- Context selection: the top-$k$ ranked cases are used to extract either “support-only” context ($s_i$) or “full-case” context ($q_i \oplus s_i \oplus e_i$).
This configuration allows for granular control over the conceptual alignment and evidentiary relevance of retrieved material.
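A minimal ranking sketch of the three retrieval modes, assuming embedding vectors are already L2-normalised (so the dot product equals cosine similarity) and using illustrative weights; `rank_cases` and the dictionary field names are inventions of this sketch.

```python
import numpy as np

def rank_cases(q_intra, q_inter, case_base, mode="hybrid",
               weights=(0.5, 0.3, 0.2), k=3):
    """Score every case under the chosen retrieval mode; return top-k indices.

    case_base: list of dicts with pre-normalised vectors 'q', 's', 'e'.
    weights are illustrative, not the paper's defaults.
    """
    scores = []
    for case in case_base:
        s_qq = float(q_intra @ case["q"])   # Q-Q, intra encoder
        s_qs = float(q_inter @ case["s"])   # Q-S, inter encoder
        s_qe = float(q_inter @ case["e"])   # Q-E, inter encoder
        if mode == "intra":
            score = s_qq
        elif mode == "inter":
            score = s_qs
        else:                               # hybrid weighted sum
            score = weights[0]*s_qq + weights[1]*s_qs + weights[2]*s_qe
        scores.append(score)
    # Stable descending sort so ties break by case-base order
    return np.argsort(-np.asarray(scores), kind="stable")[:k].tolist()
```

The `mode` switch mirrors the granular control over conceptual alignment (intra) versus evidentiary relevance (inter, hybrid) noted above.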
4. Neural Embeddings: General-Purpose versus Domain-Specific
General-purpose embeddings are instantiated via BERT and AnglEBERT. BERT serves as the canonical pretrained model with masked language modeling and next-sentence prediction objectives. AnglEBERT, by contrast, is contrastively fine-tuned on generic passage pairs, employing a search-oriented prompt to produce inter-embeddings $E_{\text{inter}}$, with an empty prompt for intra-embeddings $E_{\text{intra}}$. LegalBERT provides domain specificity, based on the BERT architecture but further pre-trained on 12 GB of statutes and case law (UK/EU/US), resulting in richer embeddings for polysemous legal terms.
ALQA evaluates all three embedding types in their inter, intra, and hybrid modes, directly passing the retrieved contexts as verbatim text to the LLM. No further embedding is carried out in the generation phase; the LLM relies solely on these augmented textual prompts for answer synthesis (Wiratunga et al., 2024).
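The asymmetric prompted/unprompted encoding can be sketched as follows. The `toy_encoder` (a bag-of-characters vector) stands in for a real BERT/AnglEBERT/LegalBERT encoder purely to keep the sketch runnable, and the search-prompt wording here is an assumed generic variant, not a quoted string from the paper.

```python
import numpy as np

def toy_encoder(text, dim=64):
    """Stand-in for a real transformer encoder: a normalised
    bag-of-characters vector. Replace with an actual model in practice."""
    v = np.zeros(dim)
    for ch in text.lower():
        v[ord(ch) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Assumed generic search prompt (illustrative wording)
SEARCH_PROMPT = "Represent this sentence for finding relevant passages: "

def intra_embed(text):
    # Empty prompt: question-to-question (Q-Q) embedding space
    return toy_encoder(text)

def inter_embed(text):
    # Search prompt prepended: question-to-snippet/entity embedding space
    return toy_encoder(SEARCH_PROMPT + text)
```

The key point is that the same encoder yields two distinct embedding spaces depending on whether the search prompt is prepended, so intra and inter similarities are computed in different geometries.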
5. Datasets, Evaluation Protocols, and Empirical Results
The ALQA dataset comprises 2,084 cleaned GPT-4–generated Q-A-snippet triads, referencing 785 unique legal acts. Notably, 57% of cases lack explicit legal-act mentions, necessitating semantic retrieval. For testing, 32 complex Q-A pairs were synthesized by combining parent cases under a shared legal act and prompting the LLM to generate intersectional queries.
Retrieval evaluation treats the two parent cases per test as ground truth. The metric is the Retrieval@$k$ $F_1$-score, computed over varying $k$. Hybrid retrieval with AnglEBERT and its tuned weight vector attains the best retrieval performance (Wiratunga et al., 2024).
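With exactly two parent cases as ground truth per test query, Retrieval@k F1 reduces to a short calculation; this is a straightforward sketch, not the authors' evaluation code.

```python
def retrieval_f1(retrieved_ids, ground_truth_ids):
    """F1 between the retrieved top-k set and the ground-truth case IDs.

    Precision = hits / k; recall = hits / |ground truth| (here, 2 parents).
    """
    hits = len(set(retrieved_ids) & set(ground_truth_ids))
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved_ids)
    recall = hits / len(ground_truth_ids)
    return 2 * precision * recall / (precision + recall)
```

For example, retrieving both parents within a top-3 list gives precision 2/3 and recall 1, hence F1 = 0.8.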
Generation is assessed by computing cosine similarity between the LLM response embedding and the reference answer embedding. The hybrid AnglEBERT configuration with full-case context achieves a mean cosine similarity of approximately $0.9141$, corresponding to a 1.94% gain over the No-RAG baseline. This improvement is statistically significant at the 95% level versus No-RAG and hybrid LegalBERT, and at the 90% level versus hybrid BERT (Wiratunga et al., 2024).
6. Domain Adaptation and Operationalization for Australian Legal QA
ALQA’s underlying corpus is sourced from open access Australian judgments, statutes, and regulatory materials. The high incidence of absent legal-act metadata in cases renders conventional lexical indexing insufficient, making robust neural embeddings indispensable. Entity extraction procedures must be aligned with Australian legal descriptors, including distinctions between Commonwealth and State statutes and local court nomenclature.
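Aligning entity extraction with Australian statute naming could start from a citation pattern like the one below. This regex is a heuristic sketch of my own, not the system's extractor; it targets the common "Title Act Year (Jurisdiction)" form covering Commonwealth and State/Territory statutes, and will miss unconventional titles.

```python
import re

# Heuristic pattern for Australian statute citations, e.g.
# "Privacy Act 1988 (Cth)" or "Limitation of Actions Act 1974 (Qld)".
# First word must be capitalised; short connectors (of/and/the) may follow.
STATUTE_RE = re.compile(
    r"[A-Z][a-z']+ (?:(?:[A-Z][a-z']+|of|and|the) )*"
    r"Act \d{4} \((Cth|NSW|Qld|Vic|WA|SA|Tas|NT|ACT)\)"
)

def extract_statutes(text):
    """Return every statute citation matched in the text, in order."""
    return [m.group(0) for m in STATUTE_RE.finditer(text)]
```

Entities found this way would populate the $e$ component of case tuples even where the source case never names the act in structured metadata.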
For full case law retrieval, the approach can extend beyond snippet extraction, leveraging full judgments identified via embedded citations. Embedding strategies, including hybrid retrieval weights and fine-tuning procedures for LegalBERT or AnglEBERT, may be recalibrated to prioritize Australian legal style and terminology. The introduction of an Australian-specific retrieval prompt (“Represent this sentence for finding relevant Australian case passages: …”) fortifies the inter-embedding relevance.
Downstream, LLMs should be selected or fine-tuned to generate outputs consistent with Australian legal drafting conventions, potentially employing corpora such as AustLII. Supplementing prompts with provenance data (e.g., case names, paragraph references) enhances traceability requirements.
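Supplementing a retrieved snippet with provenance can be as simple as appending a pinpoint reference; the helper name, the case name, and the exact citation layout below are illustrative (loosely AGLC-style), not a prescribed format.

```python
def add_provenance(snippet, case_name, citation, paragraphs):
    """Append a pinpoint source line so the LLM can cite its evidence.

    paragraphs: paragraph numbers, rendered as [12] [14] etc.
    """
    pin = " ".join(f"[{p}]" for p in paragraphs)
    return f"{snippet}\n(Source: {case_name} {citation} {pin})"

# Illustrative usage with a fabricated case name and citation
annotated = add_provenance("The duty of care arises here.",
                           "Smith v Jones", "[2020] HCA 5", [12, 14])
```

Prompting with such annotated snippets lets the generated answer carry traceable references back to specific paragraphs of specific judgments.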
ALQA systems may also integrate CBR-RAG with hierarchical precedent retrieval (High Court > Federal Court > State courts), offer user-controlled toggling between support-only and full-case evidence, and facilitate incorporation of local legal style, ensuring high factuality and alignment with jurisdictional requirements (Wiratunga et al., 2024).
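One way to realise hierarchical precedent retrieval is a small re-ranking bonus proportional to court tier, so higher courts win near-ties without overriding clearly better matches. The tier map, bonus size, and function name are assumptions of this sketch, not part of the described system.

```python
# Hypothetical tier map; codes follow common Australian court abbreviations.
COURT_TIER = {"HCA": 3, "FCAFC": 2, "FCA": 2, "NSWSC": 1, "VSC": 1}

def hierarchy_rerank(scored_cases, bonus=0.02):
    """Re-rank retrieved cases, adding a small tier-proportional bonus
    to each similarity score (High Court > Federal Court > State courts).

    scored_cases: list of dicts with 'score' and 'court' keys.
    """
    return sorted(
        scored_cases,
        key=lambda c: c["score"] + bonus * COURT_TIER.get(c["court"], 0),
        reverse=True,
    )
```

With a 0.02 bonus per tier, a High Court case trailing a State decision by 0.01 in raw similarity would still be ranked first; larger similarity gaps remain decisive.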