Quire Architecture Overview
- Quire Architecture is a dual-faceted design that integrates secure provenance tracking for mobile OS operations with an exact accumulator for high-precision posit arithmetic.
- The security variant employs lightweight HMAC signatures and call-chain tracking, adding minimal IPC/RPC overhead while ensuring end-to-end attestation.
- The numeric component leverages a wide fixed-point accumulator to dramatically reduce rounding errors in fused dot products, enhancing computational accuracy.
The term "quire architecture" refers to either of two distinct mechanisms: (1) the provenance-tracking architecture designed for mobile operating systems, particularly Android, to provide lightweight, verifiable provenance of inter-process and remote procedure calls; and (2) a wide, exact accumulator used in posit arithmetic units for high-precision summation in hardware. Each is foundational to its domain: OS-level provenance in security, and precision in fused operations and matrix computation in numerical computing.
1. Provenance-Aware Quire Architecture for Smartphone OS
1.1 Design Goals and Threat Model
The prime motivation for the provenance-aware quire architecture is to endow privileged operations—both local IPC and network RPC—with unforgeable, end-to-end provenance metadata. Its principal design criteria are:
- Provenance Awareness: Every sensitive operation is associated with a full call chain of initiating principals (UID, PID tuples), enabling services to enforce least-privilege policies analogous to Java stack inspection.
- Low Overhead: Modifications are restricted to cross-process IPC and add ≤ tens of microseconds per operation, avoiding per-instruction instrumentation or kernel modifications.
- Backward Compatibility: Legacy apps operate unmodified; only those explicitly opting in—by marking AIDL methods with an "auth" flag—incur the overhead or participate in provenance protocols.
- End-to-End Attestation: Provenance and integrity metadata are projected from local IPC out to remote RPC endpoints, so that remote servers can authenticate both the source device and the full causal chain of a sensitive request.
The threat model assumes apps may be malicious or buggy, may attempt confused deputy attacks, and may forge call chains or messages, but grants trust exclusively to the OS kernel and selected system services. Network attackers are assumed present—thus remote servers must rely on strong OS-backed attestation (Dietz et al., 2011).
1.2 Core Mechanisms: Call-Chain Tracking and Lightweight Signatures
1.2.1 IPC Call-Chain Tracking
A call chain is represented as a linked list or array of (UID, PID) tuples, prepending the invoker at each Binder-mediated cross-process call. On receipt, the full call chain allows the callee to enforce privilege checks by intersecting the privileges of every principal in the chain, mirroring stack inspection in managed runtimes. A "dropCallers()" primitive enables an app to remove upstream principals when acting on its own behalf.
Data Structure Example:
| Principal Tuple | Chain Role |
|---|---|
| (UID_A, PID_A) | Initiator |
| (UID_B, PID_B) | Intermediate |
| ... | ... |
| (UID_Y, PID_Y) | Callee |
Call chain tracking is implemented by extending the proxy/stub code autogenerated by AIDL, and does not involve kernel driver changes—only Java layer code modification.
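The chain-intersection check and the dropCallers() primitive described above can be sketched in a few lines. This is a minimal illustration, not the Quire implementation: the `CallChain` class, the `GRANTS` table, the permission name, and the UID/PID values are all hypothetical, and the real system performs this work in AIDL-generated Java proxy/stub code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Principal = Tuple[int, int]  # (UID, PID)


@dataclass(frozen=True)
class CallChain:
    """Immutable call chain; the most recent invoker is stored first."""
    principals: Tuple[Principal, ...] = ()

    def prepend(self, invoker: Principal) -> "CallChain":
        # Called once per cross-process hop.
        return CallChain((invoker,) + self.principals)

    def drop_callers(self, me: Principal) -> "CallChain":
        # Forget upstream principals when acting on one's own behalf.
        return CallChain((me,))


def effective_permissions(chain: CallChain,
                          grants: Dict[int, FrozenSet[str]]) -> FrozenSet[str]:
    """Intersect the permissions of every UID that appears in the chain."""
    if not chain.principals:
        return frozenset()
    sets = [frozenset(grants.get(uid, ())) for uid, _pid in chain.principals]
    return frozenset.intersection(*sets)


def is_allowed(chain: CallChain, permission: str,
               grants: Dict[int, FrozenSet[str]]) -> bool:
    return permission in effective_permissions(chain, grants)


# Hypothetical principals: a trusted service and an untrusted app.
GRANTS = {10001: frozenset({"INTERNET"}), 10002: frozenset()}
TRUSTED, UNTRUSTED = (10001, 400), (10002, 512)

# The untrusted app calls the trusted service, which then requests INTERNET.
deputy_chain = CallChain().prepend(UNTRUSTED).prepend(TRUSTED)
own_chain = deputy_chain.drop_callers(TRUSTED)
```

With these values, `is_allowed(deputy_chain, "INTERNET", GRANTS)` is false because the untrusted initiator lacks the permission, while the same check on `own_chain` succeeds because the service has explicitly taken responsibility for the request.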
1.2.2 Lightweight Signature Scheme
To efficiently authenticate arbitrary messages within the system, each app principal holds a shared secret with a privileged AuthorityManager system service. Statements are signed using HMAC-SHA1, with per-app, session-issued keys. Statement verification always occurs inside AuthorityManager, which alone knows the relevant keys, ensuring apps cannot spoof or forge another app's statements:
- Signing: $\sigma = \mathrm{HMAC}_{K_P}(m)$, where $K_P$ is the session key for principal $P$.
- Verification: AuthorityManager recomputes the HMAC using the provided UID/PID and message $m$, and accepts only if the computed tag matches $\sigma$.
Performance is highly favorable compared to public-key infrastructure: signing takes ≈20 μs + 15 μs/KB, verification ≈556 μs + 96 μs/KB.
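The signing and verification flow can be sketched with Python's standard `hmac` module. The class below is a simplified stand-in for AuthorityManager, assuming an in-memory key table keyed by (UID, PID); the method names echo the getKey and verifyStatement RPCs mentioned in this article, but the signatures and key length are assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

Principal = Tuple[int, int]  # (UID, PID)


class AuthorityManager:
    """Toy model: the only component that knows every session key."""

    def __init__(self) -> None:
        self._keys: Dict[Principal, bytes] = {}

    def get_key(self, uid: int, pid: int) -> bytes:
        # Issue one random session key per principal (20 bytes is an assumption).
        return self._keys.setdefault((uid, pid), os.urandom(20))

    def verify_statement(self, uid: int, pid: int,
                         message: bytes, tag: bytes) -> bool:
        key = self._keys.get((uid, pid))
        if key is None:
            return False
        expected = hmac.new(key, message, hashlib.sha1).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, tag)


def sign_statement(key: bytes, message: bytes) -> bytes:
    """App-side signing with the principal's own session key."""
    return hmac.new(key, message, hashlib.sha1).digest()


# Hypothetical principals and message.
authority = AuthorityManager()
key_a = authority.get_key(10001, 400)
authority.get_key(10002, 512)
statement = b"purchase: item=42"
tag = sign_statement(key_a, statement)
```

Because only the model's AuthorityManager holds all keys, a tag produced by one principal does not verify under another principal's identity, and any change to the message invalidates the tag.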
1.3 Integration with Android Binder and RPC Attestation
The architecture is realized by:
- Modifying the AIDL compiler to interpose on "auth" methods for transparent call chain and statement marshaling/unmarshaling.
- Running AuthorityManager as a system-UID service with protected key material, providing getKey and verifyStatement RPCs.
- Network provenance is maintained by a NetProvider service holding a hardware-protected TLS key. Network RPCs are wrapped to export the local call chain and any authenticated statements, projected into HTTP headers, and delivered over client-authenticated TLS.
Remote services thus verify not only device authenticity but also operation traceability to a specific, authenticated app chain (Dietz et al., 2011).
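As a rough illustration of projecting provenance into HTTP headers, the sketch below serializes a call chain and a list of signed statements and recovers them on the receiving side. The header names, the JSON-plus-base64 encoding, and the statement fields are all assumptions made for this example; the source does not specify a wire format, and the client-authenticated TLS transport is not modeled here.

```python
import base64
import json
from typing import Dict, List, Tuple

# Hypothetical header names, chosen only for this sketch.
CHAIN_HEADER = "X-Quire-Call-Chain"
STATEMENTS_HEADER = "X-Quire-Statements"


def _encode(obj) -> str:
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def _decode(value: str):
    return json.loads(base64.urlsafe_b64decode(value.encode("ascii")))


def export_provenance(chain: List[Tuple[int, int]],
                      statements: List[dict]) -> Dict[str, str]:
    """Device side: turn the local call chain and statements into headers."""
    return {CHAIN_HEADER: _encode(chain), STATEMENTS_HEADER: _encode(statements)}


def import_provenance(headers: Dict[str, str]):
    """Server side: recover the chain as (UID, PID) tuples plus the statements."""
    chain = [tuple(pair) for pair in _decode(headers[CHAIN_HEADER])]
    return chain, _decode(headers[STATEMENTS_HEADER])


# Hypothetical chain and statement; the tag is a placeholder hex string.
chain = [(10001, 400), (10002, 512)]
statements = [{"uid": 10002, "pid": 512, "message": "click", "tag": "00ff"}]
headers = export_provenance(chain, statements)
```

A server receiving `headers` over the authenticated channel would call `import_provenance(headers)` and then apply its own policy to the recovered chain.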
1.4 Security Analysis
- Confused Deputy Defense: At every chain hop, resource accesses are permitted only if all principals in the chain possess the required privilege, preventing privilege escalation via intermediate services.
- End-to-End Provenance: The OS ensures Android Binder reliably supplies credentials which cannot be spoofed by userland code; net-attested provenance is backed by device TLS keys.
- Integrity of Statement Flow: A compromised intermediate cannot modify upstream statements: a valid HMAC can be generated only by the originating app (which holds its session key) or by AuthorityManager, and verified only by AuthorityManager, so no other user-space code can forge or alter it.
2. Quire Architecture in Posit Arithmetic Units
2.1 Concept and Mathematical Role
In posit arithmetic, a "quire" is a wide, fixed-point accumulator that enables exact, unrounded accumulation of products, notably in dot products and matrix multiplications. Unlike IEEE-754 FMA units, which round after every fused multiply-add, the quire accumulates all partial results exactly, performing only a single rounding when exporting the final posit result.
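For an N-bit posit, the quire widths listed below follow $Q_{\text{bits}} = N^2/2$ for N ≤ 32, sufficient to accumulate the sum of squares of all possible posit values without risk of overflow; the 64-bit entry instead follows the $16N$ sizing of the 2022 Posit Standard, which matches the 1024-bit quire used by Big-PERCIVAL: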
| Posit Width N | Quire Width Q<sub>bits</sub> |
|---|---|
| 8 | 32 |
| 16 | 128 |
| 32 | 512 |
| 64 | 1024 |
The quire is a 2's-complement integer, logically representing a wide fixed-point sum. Only at the final normalization/rounding stage does one convert the quire sum back into a posit (Sharma et al., 2020, Mallasén et al., 2021, Mallasén et al., 2023).
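The behaviour of a two's-complement fixed-point accumulator can be emulated with Python integers. The sketch below assumes a 512-bit register with 240 fraction bits (an illustrative split for a 32-bit posit, not taken from the source), uses ordinary Python floats in place of posits, and rounds the final result to a double rather than to a posit.

```python
from fractions import Fraction
from typing import Sequence

Q_BITS = 512                 # register width for this sketch
FRAC_BITS = 240              # assumed number of fraction bits
MASK = (1 << Q_BITS) - 1


def to_fixed(value: Fraction) -> int:
    """Scale an exact value to the fixed-point grid; fail if it does not fit."""
    scaled = value * (1 << FRAC_BITS)
    if scaled.denominator != 1:
        raise ValueError("value needs more fraction bits than the register has")
    return scaled.numerator


def to_signed(register: int) -> int:
    """Interpret the raw register contents as a two's-complement integer."""
    return register - (1 << Q_BITS) if register >> (Q_BITS - 1) else register


def quire_dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    register = 0
    for x, y in zip(xs, ys):
        product = Fraction(x) * Fraction(y)          # exact product
        register = (register + to_fixed(product)) & MASK  # wraps like hardware
    exact = Fraction(to_signed(register), 1 << FRAC_BITS)
    return float(exact)                              # the only rounding step


def naive_dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y                                 # rounds at every step
    return acc


# Products are 2**60, 1 and -2**60: the 1 is lost when rounding at each step.
xs = [2.0 ** 30, 1.0, -(2.0 ** 30)]
ys = [2.0 ** 30, 1.0, 2.0 ** 30]
```

Here `naive_dot(xs, ys)` loses the small term to cancellation, whereas `quire_dot(xs, ys)` keeps it because nothing is rounded until the accumulator is read out.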
2.2 Microarchitectural Integration
2.2.1 PERCIVAL and Big-PERCIVAL (RISC-V/Xposit)
The quire resides in the Posit Arithmetic Unit (PAU), in parallel with standard ALUs and FPUs. Instruction sequences for a fused dot product typically involve:
- QCLR.S: clearing the quire.
- QMADD.S: accumulating into the quire (without intermediate rounding).
- QROUND.S: normalizing and rounding the final fixed-point sum into a posit.
For N=32, the quire is 512 bits; for N=64 (Big-PERCIVAL), it is 1024 bits. Each PAU supports only a single global quire accumulator, so each dot product must be "bracketed" by QCLR.S and QROUND.S: the quire is cleared before the first QMADD.S and rounded out before another accumulation can begin (Mallasén et al., 2021, Mallasén et al., 2023).
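The bracketing discipline can be modeled in software. The class below is a behavioural sketch only: its methods are named after the QCLR.S, QMADD.S and QROUND.S instructions, but it stores the quire as an exact rational, takes Python floats instead of posits, and rounds to a double on readout. The `gemm` helper is a hypothetical illustration of how a single shared accumulator forces the reduction loop to be innermost.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[float]]


class PositUnitModel:
    """One global quire per unit, mirroring the single-accumulator constraint."""

    def __init__(self) -> None:
        self._quire = Fraction(0)

    def qclr(self) -> None:
        self._quire = Fraction(0)

    def qmadd(self, a: float, b: float) -> None:
        self._quire += Fraction(a) * Fraction(b)   # exact, no rounding

    def qround(self) -> float:
        return float(self._quire)                  # single rounding on export


def gemm(pau: PositUnitModel, a: Matrix, b: Matrix) -> Matrix:
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            pau.qclr()                             # open the bracket
            for t in range(inner):                 # reduction must be innermost
                pau.qmadd(a[i][t], b[t][j])
            c[i][j] = pau.qround()                 # close the bracket
    return c


pau = PositUnitModel()
product = gemm(pau, [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

Interleaving two output elements would require a second accumulator, which this model, like the hardware described above, does not provide.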
2.2.2 CLARINET/Melodica
Melodica instantiates the quire as a segmented (e.g., 32-bit slices), deeply pipelined accumulator in a five-stage in-order pipeline. Key operations include:
- FCVT.R.P: initializing the quire from a posit.
- FMA.P, FMS.P, FDA.P, FDS.P: fused multiply/add/subtract into the quire.
- FCVT.P.R: export the summed, rounded value to a posit register.
Segment-level "zero flags" and pipelined addition allow high throughput. There is no register file visibility for the quire; all accumulation is mediated by the functional unit via the defined mnemonics (Sharma et al., 2020).
2.3 Accuracy, Performance, and Hardware Cost
- On 256×256 matrix multiplication (inputs ∈ [–1,1]), mean squared error (MSE) relative to IEEE-754 single precision:
  - IEEE-754 single precision: baseline MSE
  - Posit32 with quire: MSE about 4 orders of magnitude lower
  - Posit32 without quire: MSE about 2 orders of magnitude lower
- Latency and throughput: GEMM with Posit32+quire matches single-precision FP performance (13.9 s runtime), faster than double-precision (15.0 s), due to the fused-dot product instruction sequence (Mallasén et al., 2021).
- Hardware cost: For 32-bit posit, the quire PAU incurs ~2.5× area and power overhead versus IEEE-754 FPU. For 64-bit posit, the cost of a 1024-bit quire MAC datapath is dominant (up to 29 781 LUTs, 24 DSPs) (Mallasén et al., 2023).
2.4 Conjugate Gradient and Numerical Convergence
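Accumulating dot products using the quire reduces the classic error bound of $O(ku)$ for k-term dot products (where $u$ is the unit round-off) to a single $O(u)$ rounding error, regardless of k:

$$\left|\mathrm{fl}(x^{T}y) - x^{T}y\right| \le \frac{ku}{1-ku}\,|x|^{T}|y| \quad\longrightarrow\quad \left|\mathrm{fl}_{\text{quire}}(x^{T}y) - x^{T}y\right| \le u\,\left|x^{T}y\right|$$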
This reduction is empirically shown to decrease the number of CG iterations required for convergence by 5–15% compared to double-precision floats (Mallasén et al., 2023).
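The gap between per-step rounding and a single final rounding can be observed numerically. The sketch below uses IEEE-754 single precision as a stand-in for a 32-bit posit (an assumption made because Python has no built-in posit type), emulating float32 rounding with the `struct` module; the input vector is an arbitrary illustrative choice.

```python
import struct
from fractions import Fraction
from typing import Sequence

U32 = Fraction(1, 2 ** 24)   # unit round-off of IEEE-754 single precision


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rounded_dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = to_f32(acc + to_f32(x * y))   # rounding after every operation
    return acc


def exact_dot(xs: Sequence[float], ys: Sequence[float]) -> Fraction:
    return sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))


def fused_dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Exact accumulation, then one rounding (via double, which is negligible here).
    return to_f32(float(exact_dot(xs, ys)))


k = 10_000
xs = [to_f32(0.1)] * k
ys = [1.0] * k
exact = exact_dot(xs, ys)
stepwise_error = abs(Fraction(rounded_dot(xs, ys)) - exact)
fused_error = abs(Fraction(fused_dot(xs, ys)) - exact)
```

For this input the fused error stays within `U32 * abs(exact)`, while the stepwise error is far larger because it accumulates one rounding per term.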
2.5 Instruction Set and Pipeline Implications
Posit/quire architectures introduce new instructions into RISC-V ISAs, for instance:
- PLW, PSW: posit load/store
- QMADD.S, QMSUB.S: fused accumulate/subtract
- QROUND.S: quire → posit
- FCVT.R.P / FCVT.P.R: conversions between posit register and quire
Pipeline constraints mandate strict serialization: only one accumulation (dot-product) can proceed at a time, and loop ordering for matrix operations must account for the global nature of the quire resource (Mallasén et al., 2023, Mallasén et al., 2021, Sharma et al., 2020).
3. Design Trade-Offs and Limitations
Both security and numeric quire architectures embody design trade-offs:
- Security Quire (Android): Modest IPC latency increase (~20% for two-hop chains; +~70–145 μs per round-trip), and ~6 ms RPC overhead (insignificant compared to network latency). All changes are confined to the Java/AIDL layer and require no kernel refactoring. Legacy apps remain unaffected (Dietz et al., 2011).
- Numeric Quire (Posit PAU): Area and power scale quadratically with posit width due to quire sizing. Segmenting and pipelining the accumulator alleviate per-stage adder width but the overall width remains a primary cost driver. The quire’s strict serialization elevates scheduling burdens; only one accumulation can proceed at a time, impacting kernel tiling or interleaving (Mallasén et al., 2021, Mallasén et al., 2023).
4. Example Applications
4.1 Provenance-Backed OS
- Click-Fraud–Resistant Ads: The OS signs each UI MotionEvent with an HMAC, and apps relay click events together with their provenance chain to ad components. The server receives the full chain and the signed UI event, ensuring that only valid user touches trigger payments.
- Micropayment ("PayBuddy") Service: The app submits an HMAC-signed purchase order, payment app verifies the signature and prompts the user, and the NetProvider delivers the request and call chain over TLS to the payment gateway. All entities see verified statements and chains; no participant can forge an upstream result (Dietz et al., 2011).
4.2 Scientific and Engineering Computation
- High-Precision Dot Products: GEMM and CG solvers experience order-of-magnitude reductions in rounding error, and, for CG, up to 15% reduction in iteration count.
- Mixed Posit/Float Execution: Compiler flows (e.g., Xposit LLVM backend) can target mixed arithmetic code paths, leveraging quire operations natively via dedicated opcode spaces and register files (Mallasén et al., 2021, Mallasén et al., 2023).
- FPGA/ASIC Deployment: Implementations on Xilinx and TSMC technologies report clock frequencies of up to 100 MHz standalone or 25 MHz in-CPU; area impact is significant but justified by the precision gains for long accumulations (Sharma et al., 2020, Mallasén et al., 2023).
5. Comparative Summary Table
| Domain | Quire Function | Core Strength | Core Limitation |
|---|---|---|---|
| OS Provenance (Dietz et al., 2011) | Call-chain + signature | Defends against privilege escalation, enables end-to-end RPC attestation | Small fixed IPC/RPC overhead; opt-in required for apps |
| Posit Numeric (Mallasén et al., 2021, Mallasén et al., 2023, Sharma et al., 2020) | Wide exact accumulator | Dramatically reduced numerical error, exact single-rounding semantics | Quadratically increasing area/power; pipeline serialization on dot product |
6. Significance and Ongoing Developments
Quire architectures have become foundational in two disparate communities. In mobile OS security research, they underpin end-to-end provenance enforcement for both local and remote requests with quantifiable, minimal overhead. In computer arithmetic, they enable posit-based computation to realize error-limited, highly accurate linear algebra, at the cost of increased hardware area and complexity.
Current directions in posit arithmetic quire research include open-source toolchain integration (e.g., LLVM/Xposit), parameterized implementation for varying N/es, and benchmarking on scientific workloads with stringent accuracy and precision demands. In the provenance domain, the focus is on extending coverage to a broader range of OS services and network protocols while investigating further reductions in attestation latency (Mallasén et al., 2021, Mallasén et al., 2023, Dietz et al., 2011, Sharma et al., 2020).