Lieb-Robinson Bounds in Quantum Systems
- Lieb-Robinson bounds are mathematical limits that define an effective light cone, restricting the speed of information propagation in non-relativistic quantum systems.
- They are derived using operator theory and commutator estimates to establish exponential decay of correlations over distance and time.
- Applications include justifying locality approximations in quantum simulations and informing the design of quantum communication protocols.
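The bound summarized in these bullets is commonly stated in the following standard form (a textbook formulation, not quoted from this text): for observables A_X and B_Y supported on regions X and Y,

```latex
\big\| [A_X(t),\, B_Y] \big\| \;\le\; C\, \|A_X\|\, \|B_Y\|\, e^{-\mu\,\big(d(X,Y) - v\,|t|\big)}
```

where A_X(t) is the Heisenberg-evolved observable, d(X,Y) is the lattice distance between the supports, v is the Lieb-Robinson velocity, and C, μ are constants depending on the interaction. Outside the cone d(X,Y) > v|t| the commutator is exponentially small, which is the "effective light cone" described above.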
The paper “Semantic Reasoning with Differentiable Graph Transformations” proposes what the authors call an Embedding-Based Reasoner (EBR). At its heart EBR is nothing more exotic than a small STRIPS-style symbolic planner whose facts, predicates and rules have all been lifted into a shared real-valued vector space so that matching and application of rules are differentiable. What follows is a concise but information-rich walkthrough of the model architecture, the formal notion of a trainable graph transformation rule, the link to Description Logic, the training objective and algorithm, two of the toy examples from the paper (complete with embedding dimensions and threshold values), and finally a sketch of the inference pseudocode.
- Overall model architecture
  - Vocabulary. The system assumes a fixed maximum number of distinct predicates (nodes and edges) n. In practice the authors use 300-dimensional GloVe vectors as initial embeddings for all node labels, and initialize edge-type embeddings randomly in the same 300-dimensional space.
  - Facts and states. At each reasoning step i the system maintains a Boolean "truth" vector f_i ∈ ℝ^n, where f_i[k] = 1 if the k-th predicate (node or edge) is present in the current semantic graph and 0 otherwise. Reasoning always starts from f_0 = 1^n (every given fact is assumed true at step zero).
  - Rules as linear operators. Each rule is factored into two parts: a similarity (matching) operator S_i and a propagation operator R_i. Conceptually, f_{i+1} ← R_i (S_i f_i), so that S_i ("does the current graph contain a subgraph matching the rule's LHS?") gates which facts survive, and R_i "wires" the surviving facts into the rule's RHS (the CREATE predicates). Because both S_i and R_i are built from differentiable matrix multiplications, the entire chain f_T = R_{T−1} S_{T−1} ⋯ R_0 S_0 f_0 is one big differentiable function of the rule embeddings, thresholds, and weights.
- Formal definition of a graph transformation rule
- Embedding notation. Let e_j ∈ ℝ^d be the embedding of the j-th entity (node) in the vocabulary, and r_k ∈ ℝ^d be the embedding of the k-th relation (edge) type. In the paper d = 300.
- Matching (scoring) function. For the precondition of rule i we collect all LHS node embeddings into a matrix P_i ∈ ℝ^{d×n} (columns are the e_j of each required node slot), and the current state's node embeddings into F_i ∈ ℝ^{d×n} (columns are the e's of each fact slot). We also maintain a binary mask M_i ∈ {0,1}^{n×n} that enforces which node slots in P_i are allowed to match which slots in F_i (this encodes the subgraph-isomorphism pattern). Finally, each slot has its own trainable threshold t_i ∈ ℝ^n, which we broadcast into a matrix T_i ∈ ℝ^{n×n} by repeating along the rows. The similarity matrix S_i ∈ ℝ^{n×n} is then

  S_i = M_i ⊙ Softmax(P_i^T F_i − T_i)   (1)

  where Softmax acts column-wise and ⊙ is entrywise multiplication. If a dot product P_i[:,a]·F_i[:,b] falls below the threshold t_i[a], the entry (a,b) is driven toward zero.
- Propagation matrix. Suppose the rule's LHS has p slots and its RHS has q slots. We define a binary mask W_i ∈ {0,1}^{n×n} which has ones for all pairs (b→c) connecting each matched LHS slot b to each new RHS slot c, and zeros elsewhere. We then learn a single scalar weight w_i per rule and define

  R_i = w_i W_i.   (2)
- Chaining facts through a rule. If at step i the current truth vector is f_i ∈ ℝ^n, then after applying rule i we compute

  f_{i+1} = R_i (S_i f_i).   (3)

  In practice one interleaves matching of nodes (Eq. (1)) with matching of edges (an identical computation producing S_i^r, R_i^r over the r_k embeddings), and updates a pair (f_i, f_i^r) of truth vectors simultaneously.
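Equations (1)-(3) can be sketched as a single soft rule-application step in numpy. This is a minimal illustration, not the authors' code: the function names, the tiny dimensions (d = 3 instead of 300, n = 4), and the random embeddings are all made up for the demo.

```python
import numpy as np

def softmax_cols(X):
    """Column-wise softmax, numerically stabilized (Eq. (1) applies Softmax per column)."""
    Z = np.exp(X - X.max(axis=0, keepdims=True))
    return Z / Z.sum(axis=0, keepdims=True)

def apply_rule(f, F, P, M, t, W, w):
    """One soft rule application, f_{i+1} = R_i (S_i f_i).

    f : (n,)   current truth vector
    F : (d,n)  fact-slot embeddings as columns
    P : (d,n)  rule LHS slot embeddings as columns
    M : (n,n)  binary subgraph-pattern mask
    t : (n,)   per-slot thresholds, repeated along rows to form T_i
    W : (n,n)  binary LHS-to-RHS wiring mask
    w : float  learned scalar rule weight
    """
    S = M * softmax_cols(P.T @ F - t[:, None])  # Eq. (1): masked, thresholded attention
    R = w * W                                   # Eq. (2): weighted wiring matrix
    return R @ (S @ f)                          # Eq. (3): propagate gated facts

# tiny demo with made-up shapes
rng = np.random.default_rng(0)
d, n = 3, 4
F = rng.normal(size=(d, n))
P = rng.normal(size=(d, n))
M = np.ones((n, n))
t = np.full(n, 0.6)
W = np.zeros((n, n)); W[3, 0] = 1.0   # wire matched LHS slot 0 to RHS slot 3
f1 = apply_rule(np.ones(n), F, P, M, t, W, w=1.0)
```

Only slot 3 of `f1` can become active here, because `W` wires nothing into the other slots; that is exactly the "CREATE" side of a rule.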
- Connection with Description Logic. Each MATCH predicate such as person>0.6(a) is read as an ABox assertion ("the individual filling variable a is an instance of the class Person"), but with approximate matching: embedding(joe)·embedding(person) > 0.6. In DL notation one could write joe ≈ person ⇔ e_joe·e_person > t. The CREATE predicates introduce new individuals and role assertions in the standard DL sense. By tying every rule's LHS and RHS into the same embedding space, we effectively learn DL axioms that generalize across synonyms and related concepts.
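The thresholded-dot-product reading of an ABox assertion is a one-liner. The 3-dimensional embeddings below are invented for illustration (the paper uses 300-dimensional GloVe vectors):

```python
import numpy as np

def approx_instance_of(e_ind, e_cls, t=0.6):
    # soft ABox assertion: individual is approximately an instance of the
    # class iff the embedding dot product exceeds the threshold t
    return float(e_ind @ e_cls) > t

e_joe = np.array([0.9, 0.1, 0.2])     # hypothetical embedding of "joe"
e_person = np.array([0.8, 0.2, 0.1])  # hypothetical embedding of "person"
holds = approx_instance_of(e_joe, e_person)  # dot product 0.76 > 0.6
```

Raising the threshold t tightens the class: the same pair fails at t = 0.9, which is how the learned per-slot thresholds control how far a rule generalizes across related concepts.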
- Training procedure
- Data. We are given a single (or a small set of) fact-graph(s), represented as f_0, and a target goal pattern g ∈ {0,1}^n specifying which predicates must be true at the end of the chain.
- Objective. After T steps the model predicts f_T. We do the same for the relation masks to obtain f_T^r. Let g and g^r be the Boolean goal vectors for nodes and edges. We minimize the binary cross-entropy

  L = −[ g·log(f_T) + (1−g)·log(1−f_T) ] − [ g^r·log(f_T^r) + (1−g^r)·log(1−f_T^r) ]

  which encourages true goal predicates to have high activation and false ones low.
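The node-side term of this loss is a few lines of numpy. The epsilon clip is an implementation detail added here to keep the logarithms finite; it is not mentioned in the text:

```python
import numpy as np

def bce(goal, pred, eps=1e-7):
    """Binary cross-entropy between a Boolean goal vector and a predicted
    truth vector, summed over all predicate slots."""
    p = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return -np.sum(goal * np.log(p) + (1 - goal) * np.log(1 - p))

g  = np.array([1.0, 0.0, 1.0])   # goal: predicates 0 and 2 should be true
fT = np.array([0.9, 0.1, 0.8])   # predictions that agree with the goal
good_loss = bce(g, fT)
bad_loss  = bce(g, np.array([0.1, 0.9, 0.2]))  # predictions that disagree
```

The relation-side term has the identical form over g^r and f_T^r, and the two are simply summed.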
- Parameters. We learn
  - all rule embeddings (the columns of each P_i and P_i^r),
  - thresholds t_i for each MATCH slot,
  - rule weights w_i,
  - relation thresholds t_i^r and rule weights w_i^r.
- Optimization. Back-propagate through the entire matrix chain (Eq. (3) and its relation analogue) and update all parameters with Adam (Kingma & Ba).
- Toy examples and empirical performance
A. One-rule learning (single step).
  - Facts: person(a), spouse(a,b), person(b), be(a,c), first-lady(c), so f_0 has five ones at the corresponding indices.
  - Goal: person(a), profession(a,b), president(b)
  - Empty template rule: MATCH (a),(a,b),(b),(a,c),(c) CREATE (b),(b,d),*(d)
  - Learned rule (all embeddings are 300-dimensional; thresholds clipped to ≥0.6): MATCH person>0.600(a), first-lady>0.600(b), person>0.600(c), be>0.6363(a,b), spouse>0.6339(a,c) CREATE (b), president(d), profession(b,d)
  - After one application, f_1 matches g perfectly.
B. Two-rule chaining.
  - Facts: fruit(a), be(a,b), round(b), be(a,c), delicious(c)
  - Goal: fruit(a), be(a,b), apple(b)
  - Two templates: MATCH (a),(a,b),(b),(a,c),(c) CREATE (b),and(b,c),(c), then MATCH *(a),and(a,b),(b) CREATE (c),(c,d),(d)
  - Learned first-rule thresholds ≈ 0.695 for both be-edges; learned second-rule thresholds ≈ 0.9 on the and-edge and ≈ 0.6 on round and delicious.
  - Chaining the two rules yields the goal.
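The chaining pattern in example B can be illustrated with hard 0/1 wiring masks alone, ignoring the soft matching step entirely. This is a deliberately coarse sketch: the 4-slot vocabulary, the clipping, and the omission of S_i are all simplifications made here, and earlier facts are not persisted across steps.

```python
import numpy as np

# toy vocabulary of 4 predicate slots:
# [0] fruit-facts  [1] round/delicious-facts  [2] intermediate "and" node  [3] apple
f0 = np.array([1.0, 1.0, 0.0, 0.0])

# rule 1 wires the matched fact slots 0 and 1 into the intermediate slot 2
R1 = np.zeros((4, 4)); R1[2, 0] = R1[2, 1] = 1.0
# rule 2 wires the intermediate slot 2 into the goal slot 3
R2 = np.zeros((4, 4)); R2[3, 2] = 1.0

f1 = np.minimum(R1 @ f0, 1.0)   # clip so truth values stay Boolean
f2 = np.minimum(R2 @ f1, 1.0)   # slot 3 ("apple") is now derived
```

The derivation only succeeds because rule 2's input slot coincides with rule 1's output slot; that alignment is exactly what the learned and-edge threshold (≈ 0.9) enforces in the real model.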
Although the paper presents only toy tasks, these experiments demonstrate that randomly initialized embeddings and thresholds can converge to exact symbolic rules via purely differentiable optimization.
- Inference pseudocode
    initialize all rule embeddings P_i, P_i^r, thresholds t_i, t_i^r, weights w_i, w_i^r
    freeze fact embeddings e_j and relation embeddings r_k
    for each training epoch:
        f  ← f_0        // initial truth vector from the given facts
        fr ← f_0^r
        for i = 0 … T−1:
            // MATCH nodes
            S_i ← M_i ⊙ Softmax( P_i^T F − broadcast(t_i) )
            // MATCH relations (identical form over relation embeddings)
            S_i^r ← M_i^r ⊙ Softmax( (P_i^r)^T F^r − broadcast(t_i^r) )
            // PROPAGATION
            f  ← (w_i W_i) (S_i f)
            fr ← (w_i^r W_i^r) (S_i^r fr)
        compute loss
        L = −[ g ⊙ log(f) + (1−g) ⊙ log(1−f) ] − [ g^r ⊙ log(fr) + (1−g^r) ⊙ log(1−fr) ]
        backprop L; update P_i, t_i, w_i, and their relation counterparts
    end epoch
// At test time one simply runs the same chain (without gradient updates) // and reads off f_T to see which predicates are inferred.
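The test-time chain (nodes only, relation stream omitted for brevity) can be rendered as a short numpy function. All shapes, names, and the random rule parameters below are hypothetical; in the trained model the tuples would hold the learned embeddings, thresholds, and weights.

```python
import numpy as np

def softmax_cols(X):
    """Column-wise softmax, numerically stabilized."""
    Z = np.exp(X - X.max(axis=0, keepdims=True))
    return Z / Z.sum(axis=0, keepdims=True)

def infer(f0, F, rules):
    """Run the frozen inference chain and return f_T.

    rules: list of (P, M, t, W, w) tuples, one per reasoning step."""
    f = f0.copy()
    for P, M, t, W, w in rules:
        S = M * softmax_cols(P.T @ F - t[:, None])  # MATCH
        f = (w * W) @ (S @ f)                       # PROPAGATION
    return f

# demo: two chained random rules over a 5-predicate vocabulary, d = 3
rng = np.random.default_rng(1)
d, n = 3, 5
F = rng.normal(size=(d, n))

def rand_rule():
    W = (rng.random((n, n)) > 0.5).astype(float)
    return rng.normal(size=(d, n)), np.ones((n, n)), np.full(n, 0.6), W, 1.0

fT = infer(np.ones(n), F, [rand_rule(), rand_rule()])
```

Reading off which entries of `fT` are high tells you which predicates the chain has inferred, exactly as the comment above describes.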
In summary, EBR recasts symbolic graph-transformation reasoning as a sequence of masked attention + linear mappings in an embedding space. Each rule is nothing but a small collection of trainable vectors plus thresholds; rule application is soft subgraph matching via attention (Eq. 1) followed by a learned “wiring” matrix (Eq. 2). Described this way, it becomes possible to learn entire chains of first-order inference rules purely by gradient descent while retaining an explicit MATCH/CREATE presentation of each rule.