
Transferable Interaction Pattern (TIP)

Updated 4 July 2026
  • TIP is a reusable interaction abstraction that preserves invariant structures across diverse domains such as web agents, IoT networks, and human guidance.
  • It decouples semantic interaction patterns from concrete implementations, enabling late binding and dynamic re-grounding in various applications.
  • Practical implementations like SkillMigrator and the Intent Protocol demonstrate efficient retrieval, negotiation, and schema adaptation to optimize operational performance.

Transferable Interaction Pattern (TIP) denotes a reusable interaction abstraction whose operational meaning is preserved while its concrete bindings are re-instantiated in a new setting. In current arXiv usage, the term has at least two explicit technical realizations and one closely related antecedent. In web agents, a TIP is a stored skill record $k=(\iota_k,\sigma_k,\Phi_k,\tau_k)$ that pairs an induced interaction program with a slot schema and a structural sketch so that the skill can be retrieved by layout similarity and grounded on a live page (He et al., 16 Jun 2026). In industrial IoT, TIP is the name of “The Intent Protocol,” but the protocol is also interpretable as a transferable interaction pattern because it lets nodes express desired capabilities, discover compatible providers, negotiate cryptographic contracts, adapt schemas in WebAssembly sandboxes, and communicate without hard-wiring endpoint addresses or schemas (Mosquera, 25 May 2026). In human spatial guidance, the exact acronym does not appear, yet interaction patterns are formalized as equivalence classes of agent–environment behaviors under future-equivalence and geometric symmetry, yielding a transfer-oriented pattern library for planning (Mettler et al., 2013).

1. Conceptual Scope

A TIP is defined by what remains invariant when a concrete interaction is moved to another environment. In the web-agent setting, transfer is achieved by not binding to brittle DOM IDs or a specific URL, but instead storing the structure of the interaction, the semantic roles of the fields, and a cleaned accessibility-tree skeleton. In the IoT setting, transfer is achieved by shifting from address-based networking to intent-based interactions: a requester states “I need a capability of this type, with this schema and QoS,” and the network resolves a provider and any needed schema translation. In the human-guidance setting, transfer arises from invariances or symmetries in the interactions between an agent and its environment, so that behavior segments that are equivalent in future consequence and geometric form can be reused as the same interaction pattern (He et al., 16 Jun 2026, Mosquera, 25 May 2026, Mettler et al., 2013).

Across these literatures, a common distinction separates interaction semantics from realization details. Web TIPs separate slot semantics from concrete refs. The Intent Protocol separates capability and schema requirements from physical endpoints. Symmetry-based guidance patterns separate the behavioral role of a trajectory segment from its absolute coordinates. This suggests that TIP is best understood not as one fixed data structure, but as a family of abstractions for late binding, re-grounding, and reuse.

A recurring misconception is that transferable patterns are merely compressed action traces. The web-agent formulation explicitly rejects this: TIPs are not per-trajectory macros, because they strip concrete refs and literal action sequences while retaining an operation template, slot schema, and structural sketch. The IoT formulation likewise is not a routing shortcut over preconfigured endpoints; it replaces endpoint selection with runtime discovery, scoring, contract formation, and schema adaptation. The guidance formulation is not a library of open-loop maneuvers; it is grounded in closed-loop agent–environment dynamics and in subgoal structure (He et al., 16 Jun 2026, Mosquera, 25 May 2026, Mettler et al., 2013).

2. Formal Representations

The most explicit TIP definition appears in SkillMigrator, where each skill is stored as

$$k = (\iota_k, \sigma_k, \Phi_k, \tau_k).$$

Here $\iota_k$ is a one-sentence intent, $\sigma_k$ is an operation template from a finite set $\Sigma$, $\Phi_k$ is a slot schema, and $\tau_k$ is a structural sketch derived from a cleaned accessibility tree. The slot schema is

$$\Phi_k = \{\xi_1, \dots, \xi_m\},$$

and each slot carries a key, a descriptor $d_\xi$, and a synonym set $T_\xi$. The stored sketch $\tau_k$ is a small labelled tree whose nodes carry role and name, and it is used for tree-edit-distance retrieval rather than for direct execution (He et al., 16 Jun 2026).
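The record above can be pictured as a small immutable data structure. The following is a minimal sketch, assuming Python dataclasses; the class names, field names, template label, and example values are illustrative and are not taken from SkillMigrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One slot xi: a key, a descriptor d_xi, and a synonym set T_xi."""
    key: str
    descriptor: str
    synonyms: frozenset = frozenset()


@dataclass(frozen=True)
class SketchNode:
    """Node of the structural sketch tau_k: role and name only, no element refs."""
    role: str
    name: str
    children: tuple = ()


@dataclass(frozen=True)
class TIP:
    """Skill record k = (intent, template, slots, sketch)."""
    intent: str          # iota_k: one-sentence intent
    template: str        # sigma_k: drawn from a finite template set Sigma
    slots: tuple         # Phi_k: the slot schema
    sketch: SketchNode   # tau_k: used for retrieval, not for execution


def sketch_size(node: SketchNode) -> int:
    """Count the nodes of a sketch tree."""
    return 1 + sum(sketch_size(child) for child in node.children)


# Hypothetical example record for a "create a new item" form.
create_item = TIP(
    intent="Create a new item by filling a form and submitting it.",
    template="fill_and_submit",
    slots=(
        Slot("title", "name or title of the item", frozenset({"name", "subject"})),
        Slot("body", "long description of the item", frozenset({"description"})),
    ),
    sketch=SketchNode("form", "new item", (
        SketchNode("textbox", "Title"),
        SketchNode("textbox", "Description"),
        SketchNode("button", "Create"),
    )),
)
```

Note that nothing in the record refers to a URL or a DOM identifier, which is the property the definition relies on for transfer.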

In The Intent Protocol, the transferable unit is the declarative intent and its associated temporary contract. The network is modeled as a set of nodes

[…]

with each node advertising capabilities

[…]

A capability is represented as

[…]

while a requester issues

[…]

The TIP engine resolves the intent against discovered capabilities and produces a temporary contract, described as a cryptographic contract attaching keys, QoS, and schema agreement. Provider choice is based on a multi-criteria score

[…]

with weights summing to $1$ and normalized utilities in $[0,1]$ (Mosquera, 25 May 2026).
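A multi-criteria score of this kind can be illustrated with a weighted sum of normalized utilities. This is a sketch under stated assumptions: the additive form, the criterion names, and the example numbers are all assumptions made for illustration and are not confirmed as the protocol's actual scoring function.

```python
# Hypothetical criterion names; the additive form is an assumption.
CRITERIA = ("match", "proximity", "trust", "availability")


def score(utilities: dict, weights: dict) -> float:
    """Weighted sum of utilities in [0, 1] with weights that sum to 1."""
    if abs(sum(weights[c] for c in CRITERIA) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for c in CRITERIA:
        if not 0.0 <= utilities[c] <= 1.0:
            raise ValueError(f"utility {c!r} must lie in [0, 1]")
    return sum(weights[c] * utilities[c] for c in CRITERIA)


def select_provider(candidates: dict, weights: dict) -> str:
    """Return the node ID whose utilities give the highest score."""
    return max(candidates, key=lambda node_id: score(candidates[node_id], weights))


weights = {"match": 0.4, "proximity": 0.2, "trust": 0.3, "availability": 0.1}
candidates = {
    "node-a": {"match": 1.0, "proximity": 0.5, "trust": 0.5, "availability": 1.0},
    "node-b": {"match": 0.5, "proximity": 1.0, "trust": 1.0, "availability": 1.0},
}
best = select_provider(candidates, weights)
```

With these example weights, the better-trusted and closer node wins even though its functional match is weaker, which shows how requester preferences shift the outcome.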

In the human-guidance framework, interaction patterns are defined through two equivalence relations over symbolic trajectories. Goal or causal equivalence is

[…]

and geometric symmetry equivalence is defined through a Lie-group action. Interaction patterns are then the double-equivalence classes, which collect behavior segments that lead to the same subgoal or future evolution and are transformable into one another by symmetry. A plausible TIP-oriented reading of this framework is that transfer is guaranteed when the same symmetry group, subgoal type, and local dynamical regime remain applicable (Mettler et al., 2013).

3. TIPs in Web-Agent Skill Transfer

SkillMigrator uses TIPs to make web skills reusable across websites and domains by matching layout structure rather than specific element references. A typical example is the shared interaction pattern among Shopify “Add new product,” GitLab “Open a new issue,” and Postmill “Create a new forum”: all can be captured as a fill-and-submit interaction despite differing labels and site context. The transferable representation combines a one-sentence intent, an operation template such as “fill-and-submit template,” a slot schema with descriptors and synonym sets, and a structural sketch derived from the accessibility snapshot (He et al., 16 Jun 2026).

Retrieval is performed by scoring each skill $k$ for a subtask and live snapshot:

[…]

The text term uses a frozen sentence encoder, Sentence-BERT, over the subtask, a live page summary, and a rich TIP description

[…]

The layout term is a normalized tree-edit-distance similarity computed with APTED:

[…]

Skill mode is entered only if the top score exceeds a threshold; otherwise execution falls back to ReAct-style primitive mode (He et al., 16 Jun 2026).
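The retrieval-and-gate logic can be sketched as follows. The convex combination of the two terms, the normalization by the larger tree size, and all parameter names are assumptions made for illustration. The tree-edit distance and text similarity are taken as precomputed inputs rather than computed with APTED or Sentence-BERT.

```python
def layout_similarity(edit_distance: float, size_a: int, size_b: int) -> float:
    """Map a tree-edit distance to [0, 1]; normalizing by the larger tree is an assumption."""
    denom = max(size_a, size_b)
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - edit_distance / denom)


def combined_score(text_sim: float, layout_sim: float, lam: float = 0.5) -> float:
    """Blend text and layout similarity; the convex mix and lam are assumptions."""
    return lam * text_sim + (1.0 - lam) * layout_sim


def choose_mode(scores: dict, threshold: float):
    """Enter skill mode only when the top score exceeds the threshold."""
    if not scores:
        return ("primitive", None)
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return ("skill", best)
    return ("primitive", None)


# Hypothetical skill library scored against one subtask and snapshot.
scores = {
    "create_item": combined_score(0.6, layout_similarity(2, 10, 8)),
    "search": combined_score(0.3, layout_similarity(7, 10, 9)),
}
```

Calling `choose_mode(scores, 0.6)` selects the `create_item` skill, while a stricter threshold such as `0.9` falls back to primitive mode.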

Grounding proceeds in two stages. Stage A binds slot values by solving a Hungarian assignment between slots and an instantiation dictionary, using

[…]

Any still-unbound slots are matched to candidate spans extracted from the subtask text. Stage B binds slots to live controls using descriptors $d_\xi$ and another Hungarian assignment over […], restricted by role compatibility. Execution is deterministic from the operation template: for a fill-and-submit template, the system fills all bound slots and then clicks a submit-like button, with a fallback to the policy for required slots that remain unbound (He et al., 16 Jun 2026).
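The slot-to-control binding of Stage B is an assignment problem. The sketch below uses an exhaustive search over permutations as a plainly named stand-in for the Hungarian algorithm. It finds the same optimum on small inputs but does not scale. The token-overlap similarity is likewise a stand-in for whatever similarity the system actually uses, and the tuple layouts are hypothetical.

```python
from itertools import permutations


def similarity(descriptor: str, label: str) -> float:
    """Token Jaccard overlap; a stand-in for an embedding-based similarity."""
    a, b = set(descriptor.lower().split()), set(label.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def bind_slots(slots, controls):
    """Assign each slot to a distinct role-compatible control with maximal total similarity.

    slots:    list of (key, descriptor, role)
    controls: list of (ref, label, role)
    Returns {slot key: control ref}, or {} when no compatible assignment exists.
    Exhaustive search replaces the Hungarian algorithm here (small inputs only).
    """
    best, best_total = {}, -1.0
    for perm in permutations(range(len(controls)), len(slots)):
        total, compatible = 0.0, True
        for (key, descriptor, role), j in zip(slots, perm):
            ref, label, control_role = controls[j]
            if role != control_role:
                compatible = False
                break
            total += similarity(descriptor, label)
        if compatible and total > best_total:
            best_total = total
            best = {slots[i][0]: controls[j][0] for i, j in enumerate(perm)}
    return best


slots = [
    ("title", "product title name", "textbox"),
    ("price", "price amount", "textbox"),
]
controls = [
    ("e7", "Price", "textbox"),
    ("e3", "Title", "textbox"),
    ("e9", "Save", "button"),
]
binding = bind_slots(slots, controls)
```

For larger problems, `scipy.optimize.linear_sum_assignment` solves the same problem in polynomial time; role incompatibility can then be encoded as a prohibitively large cost.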

Empirically, SkillMigrator reduces average LLM-action count while maintaining comparable success. On WebArena, PolySkill reports average SR […] and […], whereas SkillMigrator reports average SR […] and […], corresponding to a […] reduction in LLM actions versus PolySkill and […] versus ReAct at […]. On Mind2Web cross-domain, PolySkill (+Update) reports SR […], […], Reuse […], while SkillMigrator (+Update) reports SR […], […], Reuse […]. The abstract summarizes the overall effect as a […] to […] reduction in average LLM-action count on successful trajectories across both WebArena and Mind2Web at matched success rate (He et al., 16 Jun 2026).

Benchmark | Baseline | TIP-based result
WebArena | PolySkill: SR […], […] | SkillMigrator: SR […], […]
Mind2Web cross-domain | PolySkill (+Update): SR […], […], Reuse […] | SkillMigrator (+Update): SR […], […], Reuse […]

The ablations clarify what carries the transfer. Text-only retrieval lowers WebArena SR from […] to […] and Mind2Web cross-domain SR from […] to […], indicating that layout is essential. Removing slot synonyms lowers Mind2Web cross-domain SR from […] to […]. Removing the gate slightly reduces SR while further lowering […], showing that the gate mainly trades aggressive skill use against safety (He et al., 16 Jun 2026).

4. TIP as Intent-Based IoT Interoperability

In the IoT protocol setting, TIP replaces address-based networking with intent-based interactions. A requester does not send to a fixed URI or IP:port; it declares a needed capability, desired schema, constraints, and weights, and the TIP engine resolves a provider and any necessary schema translation. This makes the same high-level interaction portable across a different plant, subnet, or cloud region, because the semantics are carried in the intent and contract rather than in physical routing details (Mosquera, 25 May 2026).

Discovery follows a dual-phase pattern: fast local discovery through multicast DNS and DNS Service Discovery, and scalable global lookup through a Kademlia Distributed Hash Table. Local capabilities are published via PTR/SRV/TXT records, with TXT carrying metadata such as schema, version, and security options. Global lookup hashes a capability identifier into a DHT key and uses the XOR metric

$$d(x, y) = x \oplus y$$

with $k$-buckets and iterative lookup over the […] closest nodes. The dual-phase active discovery algorithm first checks a local cache, then launches AsyncBrowseMdns(C_id) and AsyncDHTLookup(C_id) in parallel with timeout […], collects results, deduplicates by node ID, and updates the cache. Discovery yields a candidate set of nodes whose capabilities match the requested capability ID and appear potentially compatible in schema and QoS (Mosquera, 25 May 2026).
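The dual-phase discovery steps (cache check, parallel local and global lookup under a timeout, deduplication by node ID, cache update) can be sketched with `asyncio`. The two lookup coroutines are hypothetical stubs standing in for mDNS browsing and a Kademlia lookup; the record layout, the timeout value, and the rule that the local record wins on duplicates are assumptions.

```python
import asyncio


async def browse_mdns(cap_id: str):
    """Hypothetical stub for local mDNS/DNS-SD browsing."""
    await asyncio.sleep(0.01)
    return [{"node_id": "n1", "source": "mdns"}, {"node_id": "n2", "source": "mdns"}]


async def dht_lookup(cap_id: str):
    """Hypothetical stub for a global DHT lookup."""
    await asyncio.sleep(0.02)
    return [{"node_id": "n2", "source": "dht"}, {"node_id": "n3", "source": "dht"}]


async def discover(cap_id, cache, timeout=0.5, local=browse_mdns, remote=dht_lookup):
    """Return candidate records for cap_id, consulting the cache first."""
    if cap_id in cache:
        return cache[cap_id]

    # Launch both phases in parallel and wait at most `timeout` seconds.
    tasks = [asyncio.create_task(local(cap_id)), asyncio.create_task(remote(cap_id))]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    # Deduplicate by node ID; the local result is visited first and wins ties.
    merged = {}
    for task in tasks:
        if task in done and task.exception() is None:
            for record in task.result():
                merged.setdefault(record["node_id"], record)

    candidates = list(merged.values())
    cache[cap_id] = candidates
    return candidates


cache = {}
found = asyncio.run(discover("temperature.read", cache))
```

A lookup that exceeds the timeout is simply dropped from the result, so a slow DHT does not block an answer that local discovery can already provide.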

Selection is adaptive rather than static. The utility combines functional match, cost or proximity, trust, and availability, with the weights derived from requester preferences through the Analytic Hierarchy Process. Network proximity is modeled by

[…]

and trust is discounted for poorly observed nodes through

[…]

Reputation itself decays exponentially toward a neutral value […] with a half-life of […] days:

[…]

This selection machinery makes provider choice depend on current conditions and policy rather than on hardcoded routes (Mosquera, 25 May 2026).
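Exponential decay toward a neutral value with a given half-life has a standard closed form, which the sketch below implements. The neutral value of 0.5, the 30-day half-life, and the observation-count discount are placeholder assumptions, not the protocol's parameters or its actual trust formula.

```python
def decayed_reputation(r0: float, age_days: float,
                       neutral: float = 0.5, half_life_days: float = 30.0) -> float:
    """Move r0 toward `neutral`, halving the gap every `half_life_days`.

    Both default parameter values are placeholders chosen for illustration.
    """
    return neutral + (r0 - neutral) * 0.5 ** (age_days / half_life_days)


def discounted_trust(reputation: float, n_observations: int,
                     neutral: float = 0.5, k: float = 5.0) -> float:
    """Shrink reputation toward neutral when few observations back it.

    The confidence factor n / (n + k) is an assumed form of the discount.
    """
    confidence = n_observations / (n_observations + k)
    return neutral + confidence * (reputation - neutral)
```

For example, `decayed_reputation(0.9, 30)` gives 0.7 under these placeholder values, since the gap of 0.4 above neutral halves after one half-life. A node with no observations is scored at the neutral value regardless of its stored reputation.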

A central feature is schema translation as a first-class protocol element. When provider and requester schemas differ, the runtime consults a translator registry, parses a TOML adapter definition, generates WebAssembly Text, compiles it to .wasm, and caches the adapter keyed by a hash of […]. Execution occurs in Wasmtime: the host copies the CBOR-serialized payload into WASM linear memory, calls the exported transform function, and reads back the translated result. The examples include a Celsius-to-Fahrenheit module and a pulse-to-milliliters transform, with linear maps of the form

[…]

Because WebAssembly sandboxes trap on out-of-bounds accesses, the paper treats this as safe execution of arbitrary translation logic without compromising the TIP daemon (Mosquera, 25 May 2026).
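The adapter lookup and the linear transforms mentioned above can be illustrated in plain Python. This sketch runs on the host rather than in a WebAssembly sandbox, so it shows only the registry and the arithmetic, not the isolation. The schema names, the registry keyed by schema pair, and the pulse calibration factor are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearAdapter:
    """A scale-and-offset transform between two schemas."""
    scale: float
    offset: float = 0.0

    def transform(self, value: float) -> float:
        return self.scale * value + self.offset


# Hypothetical registry keyed by (provider schema, requester schema).
ADAPTERS = {
    ("temp.celsius", "temp.fahrenheit"): LinearAdapter(scale=1.8, offset=32.0),
    # 2.25 mL per pulse is an invented calibration constant.
    ("flow.pulses", "flow.milliliters"): LinearAdapter(scale=2.25),
}


def translate(value: float, provider_schema: str, requester_schema: str) -> float:
    """Translate a reading, passing it through unchanged when schemas match."""
    if provider_schema == requester_schema:
        return value
    return ADAPTERS[(provider_schema, requester_schema)].transform(value)
```

A missing adapter raises `KeyError` in this sketch, which mirrors the requirement that a suitable adapter must exist before two mismatched schemas can interoperate.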

Security is integrated into the packet format and negotiation flow. TIP uses a fixed […]-byte header in front of a CBOR payload, with Sequence Number, Timestamp, Checksum (CRC32), and a 64-byte Ed25519 signature. Confidentiality is established through X25519 Diffie–Hellman key exchange following a Noise Protocol pattern, producing

[…]

which is then used with ChaCha20-Poly1305 AEAD encryption. Anti-replay combines timestamp validation,

[…]

with an LRU nonce cache based on Seq XOR Timestamp (Mosquera, 25 May 2026).
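The two anti-replay checks, a timestamp window and an LRU cache of nonces derived from Seq XOR Timestamp, can be sketched as below. The class name, the millisecond units, the window size, and the cache capacity are assumptions chosen for illustration.

```python
from collections import OrderedDict


class ReplayGuard:
    """Reject stale frames and frames whose nonce was seen recently."""

    def __init__(self, max_skew_ms: int = 5000, capacity: int = 1024):
        self.max_skew_ms = max_skew_ms   # assumed acceptance window
        self.capacity = capacity         # assumed LRU size
        self._seen = OrderedDict()

    def accept(self, seq: int, timestamp_ms: int, now_ms: int) -> bool:
        # 1. Timestamp validation: the frame must be close to the local clock.
        if abs(now_ms - timestamp_ms) > self.max_skew_ms:
            return False
        # 2. Nonce check: Seq XOR Timestamp must not be in the cache.
        nonce = seq ^ timestamp_ms
        if nonce in self._seen:
            self._seen.move_to_end(nonce)
            return False
        self._seen[nonce] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)   # evict the least recently used nonce
        return True


guard = ReplayGuard(max_skew_ms=5000, capacity=2)
first = guard.accept(seq=1, timestamp_ms=1_000_000, now_ms=1_000_100)
replayed = guard.accept(seq=1, timestamp_ms=1_000_000, now_ms=1_000_200)
stale = guard.accept(seq=2, timestamp_ms=900_000, now_ms=1_000_100)
```

The two checks complement each other: the cache only needs to remember nonces for as long as the timestamp window would still admit them, so a bounded cache can suffice when it is sized for the expected traffic within that window.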

The implementation is split across a Rust-based orchestrator (tip-core) and C++ edge nodes (tip-edge-cpp) using libsodium, with CoAP over UDP carrying the TIP binary frame as application/octet-stream. On Raspberry Pi 4 nodes and Windows gateways, intent matching and multi-criteria scoring over […] nodes completes in […] ms; Wasmtime-based translation for a simple Celsius-to-Fahrenheit transform takes on average […], including memory copies; X25519 key exchange costs approximately […]; and ChaCha20-Poly1305 encryption is “sub-microsecond” per payload. The abstract characterizes the result as sub-millisecond translation overhead and robust resilience under industrial conditions (Mosquera, 25 May 2026).

5. Symmetry-Grounded Interaction Patterns in Human Guidance

The 2013 guidance framework provides the most behaviorally grounded precursor to TIP. Guidance is modeled as the closed-loop dynamics of an agent interacting with its environment:

[…]

yielding

[…]

Continuous trajectories are quantized into a symbolic alphabet, and interaction patterns are then defined from the resulting symbolic histories through future-equivalence and geometric symmetry. Under this construction, an interaction pattern is a set of behavior segments that have the same role in the task and are transformable into one another by the symmetries of the agent–environment dynamics (Mettler et al., 2013).

The geometric component is made concrete through Lie-group actions. For a Dubins vehicle in obstacle fields, the joint agent–environment state is invariant under planar $SE(2)$, and adding mirror symmetry can merge “approach subgoal from left” and “from right” into a single pattern. This shows that the granularity of a pattern library depends on the chosen symmetry group, and it explains how qualitatively similar behaviors in different locations can be treated as the same pattern (Mettler et al., 2013).

The framework organizes behavior hierarchically. At the low level are piecewise affine modes with dynamics

[…]

with three identified modes: one for start, one for coasting, and one for approaching. At the middle level are interaction patterns or subgoals indexed by […]. At the high level is the sequence of subgoals and transitions among them. These levels are unified in a hierarchical hidden Markov model with a continuous state, two switching variables, and transition laws for mode and subgoal changes (Mettler et al., 2013).

The learning pipeline proceeds in five stages: symbolic representation and subgoal identification; subgoal clustering and trajectory segmentation; geometric symmetry analysis; dynamical characterization via piecewise affine models; and meta-behavioral analysis using a learned time-to-go function and wavefront propagation. Repelling manifolds identify decision boundaries between patterns, while attracting manifolds identify subgoals. The resulting structure supports a pattern library that is much smaller than the full state space and can be used for high-level planning (Mettler et al., 2013).

A plausible TIP interpretation of this framework is that transfer occurs when the new environment preserves the applicable symmetry group, supplies subgoals of the same type, and keeps the agent in the same local dynamical regime. Under those conditions, a learned pattern such as “approach-gap-from-left” can be rotated or translated into a new environment and re-instantiated without relearning its internal dynamical structure (Mettler et al., 2013).
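The idea that a pattern can be rotated or translated into a new environment can be made concrete with planar rigid motions. The sketch below treats a behavior segment as a list of sampled 2D points ending at its subgoal, which simplifies the symbolic trajectories used in the framework; the canonical frame and the tolerance are assumptions.

```python
import math


def se2_apply(points, theta, tx, ty):
    """Rotate by theta, then translate by (tx, ty): a planar rigid motion."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def mirror(points):
    """Reflect across the x-axis (swaps left-hand and right-hand approaches)."""
    return [(x, -y) for x, y in points]


def canonicalize(points):
    """Move the subgoal (last point) to the origin and the start onto the +x axis."""
    gx, gy = points[-1]
    shifted = [(x - gx, y - gy) for x, y in points]
    x0, y0 = shifted[0]
    return se2_apply(shifted, -math.atan2(y0, x0), 0.0, 0.0)


def same_pattern(a, b, tol=1e-9, allow_mirror=False):
    """True when b is a rotated/translated (optionally mirrored) copy of a."""
    def close(p, q):
        return len(p) == len(q) and all(
            math.hypot(xp - xq, yp - yq) <= tol for (xp, yp), (xq, yq) in zip(p, q))
    ca, cb = canonicalize(a), canonicalize(b)
    return close(ca, cb) or (allow_mirror and close(canonicalize(mirror(a)), cb))


# A segment that bulges to one side while approaching the subgoal at (3, 0).
segment = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5), (3.0, 0.0)]
moved = se2_apply(segment, theta=1.2, tx=5.0, ty=-3.0)
flipped = mirror(segment)
```

Here `same_pattern(segment, moved)` holds, while `same_pattern(segment, flipped)` holds only with `allow_mirror=True`, which illustrates how enlarging the symmetry group merges left and right approaches into one pattern.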

6. Comparative Significance, Limitations, and Open Directions

The three literatures converge on a common architectural idea: transfer depends on storing the right invariants. In web agents, those invariants are operation templates, slot descriptors, synonym sets, and structural sketches. In the Intent Protocol, they are capability identifiers, desired schemas, constraints, weights, cryptographic contracts, and declarative adapters. In human guidance, they are subgoals, symmetry groups, piecewise affine modes, and validity regions. The shared effect is late binding: slot-to-ref in web agents, provider-to-intent in IoT, and trajectory-instance-to-pattern in guidance (He et al., 16 Jun 2026, Mosquera, 25 May 2026, Mettler et al., 2013).

The differences are equally important. A web TIP is a stored skill with deterministic execution over primitive tools. An IoT TIP is a protocol-centered interaction flow with discovery, scoring, contract formation, translation, and security. A guidance TIP is an equivalence class of closed-loop behaviors tied to geometric invariances. This suggests that “Transferable Interaction Pattern” is a cross-domain conceptual family rather than a single standardized artifact.

Each formulation also makes its own assumptions explicit. In SkillMigrator, layout-conditioned retrieval can fail when similar accessibility-tree structures correspond to different semantics, when visual cues not captured by the text snapshot matter, or when the operation-template set $\Sigma$ does not cover interactions such as drag-and-drop or complex modals. Dynamic and highly stateful pages can invalidate stored assumptions, and threshold selection for the gate can be delicate in open-world settings (He et al., 16 Jun 2026). In The Intent Protocol, public-key distribution, revocation, and authorization are not fully specified; schema adapters must exist and be semantically correct; hard real-time guarantees are not claimed; Kademlia performance depends on network conditions and stability; and complex multi-party coordination is not fully treated (Mosquera, 25 May 2026). In the guidance framework, only a few environments are analyzed, time-to-go learning is limited in scope, perceptual mechanisms remain abstracted, and broader validation across richer environments and modalities is left open (Mettler et al., 2013).

The proposed future directions are correspondingly diverse. Web-agent work points toward multimodal cues, broader template coverage, more flexible repairability, and integration with tree-search or model-based planning. IoT work points toward richer semantics for capabilities, automatic adapter synthesis, higher-level authorization frameworks, and formal verification of negotiation and security protocols. Guidance work points toward eye-tracking, brain-imaging interpretation, richer sensory modalities, 3D environments, dynamic obstacles, and robotic implementation (He et al., 16 Jun 2026, Mosquera, 25 May 2026, Mettler et al., 2013).

Taken together, these works establish TIP as a rigorous abstraction for reusing interactions under changed realizations. Whether the realization change is a new website, a new subnet, or a new obstacle layout, the essential move is the same: represent the interaction by its invariant structure, then recover a concrete instantiation at runtime through retrieval, matching, negotiation, or symmetry.
