ConcreteNet: In-Materio & Neural Grounding

Updated 15 December 2025

ConcreteNet is a dual-domain system that integrates distributed in-materio sensing via nano-doped concrete and a neural network for dense 3D visual grounding.
The architecture employs advanced reservoir computing, smart paints, and robust communication protocols to enable real-time, scalable environmental monitoring.
Its neural variant leverages transformer-based fusion, contrastive training, and multi-view ensembling to achieve state-of-the-art performance in point cloud segmentation.

ConcreteNet refers to two independent, technically distinct systems within contemporary research: (1) a distributed, in-materio sensing and computing network built from nano-doped concrete blocks and smart paints for architectural-scale embedded intelligence (Adamatzky et al., 2018); and (2) an end-to-end neural network for dense 3D visual grounding with natural-language input, focused on precise instance segmentation in point clouds (Unal et al., 2023). Both embody advanced sensory fusion, distributed computation, and localized decision-making, but within different domains and physical substrates.

1. Architectural-Scale In-Materio Computing with ConcreteNet

ConcreteNet in the context of buildable computational material design is a cyber-physical system in which computation, sensing, and communication are directly embedded into construction materials. This is accomplished by engineering concrete blocks and panels with nanoscale dopants and responsive paints to function as massive-parallel information processors. Each block serves as a physical reservoir computer capable of environmental inference and collective behavior (Adamatzky et al., 2018).

Nano-material–Concrete Composites and Smart Paints

The substrate consists of a Portland cement matrix doped with:

0.5–5 wt % semiconducting oxide nanoparticles (TiO₂, CdS)
5–10 wt % carbonaceous additives (CNTs, graphene flakes), yielding bulk conductivity σ ≃ 10⁻³–10⁻¹ S/m and permittivity ε_r ≃ 5–15
1–5 wt % sub-percolation metallic shavings (Al, Fe)

Electrostatic and conduction behavior at 1 kHz–1 MHz follows: $\nabla\!\cdot[\sigma(\mathbf r)+j\omega\varepsilon_0\varepsilon_r(\mathbf r)]\,\nabla V(\mathbf r)=0$ Memristive ionic motion is modeled as: $\frac{dw}{dt} = \alpha V - \beta w,\quad I(t)=[\sigma_0 + \gamma w(t)]V(t)$ Smart paints combine TiO₂ or ZnO base with organic binders and conductive tracers, achieving spectral responsivity peaks of R₀ ≃ 0.1 A/W at λ₀ ≃ 380 nm. Photovoltage output is given by: $E_{\rm ph}(t) = \eta\int R(\lambda)\Phi(\lambda,t)d\lambda,\quad I_{\rm out}(t)=\kappa E_{\rm ph}(t)$

Fabrication and Calibration

Optimal nanoparticle dispersion is accomplished by ultrasonication and superplasticizers in standardized blocks (200 × 200 × 50 mm, cured 28 days). Calibration employs in-materio training: stimulation by input waveforms, electrode response recording across n ≈ 16–64 nodes, and ridge regression with solution: $\min_{W_{\rm out}} \Vert W_{\rm out} X - y \Vert^2 + \lambda \Vert W_{\rm out} \Vert^2$

2. Neuromorphic Computational Principles

ConcreteNet leverages reservoir computing principles for distributed information processing. Each block maintains a state vector x(t) ∈ ℝⁿ measured by embedded electrodes, updated as: $x(t+1)=(1-\alpha)x(t)+\alpha f(W_{\rm in}u(t)+W_{\rm res}x(t))$ Readout mapping is fixed post-training: $y(t) = W_{\rm out}x(t)$

Input matrices W_in, reservoir connectivity W_res (sparse, 5% density, scaled spectral radius ρ < 1), and nonlinearity f (tanh or piecewise linear) create fading memory. Memristive nonlinearity within the concrete enhances higher-order mixing; typical memory capacities are 20–50 time steps per block.

3. Distributed Network Topology and Communication

Blocks are assembled in 2D or 3D mesh topologies, interconnected via conductive mortar, flex-ribbon connectors, or optional IEEE 802.15.4 (ZigBee) wireless mesh. The block-to-block adjacency matrix A encodes direct physical connectivity: $A_{ij} = \begin{cases} 1 & \text{if blocks %%%%0%%%% and %%%%1%%%% are directly connected} \ 0 & \text{otherwise} \end{cases}$

Communication employs TDMA protocols, exchange of compressed state summaries, and multi-hop routing. Degree per block is $d_i = \sum_j A_{ij}$ .

4. Sensing, Fusion, and Decision Making

Each block senses multiple modalities: photo-sensor signals $I_{\rm ph}(t)$ , temperature $\Delta R_T(t)$ , and strain $\Delta R_S(t)$ , processed as

$v_i(t) = f_{\rm pre}[I_{\rm ph},\,\Delta R_T,\,\Delta R_S]$

Local reservoir state is updated and mapped to decision output $y_i(t) = W_{\rm out,i} x_i(t)$ . Distributed consensus averaging performs data fusion: $z_i(t+1) = \frac{1}{d_i+1} \left( z_i(t) + \sum_j A_{ij} z_j(t) \right)$ Hierarchical cluster heads aggregate local outputs for central decision layers. Reservoir-based classifiers operate globally or via majority voting.

5. Performance, Scalability, and Applications

Typical block computational capacity: n ≈ 50–200 nodes ( $\sim$ 100-dim state space), bandwidth $\Delta f$ ≈ 1 kHz, update latency $\approx$ 1 ms, energy per block 50–200 mW. Meshes scale to $N\cdot n$ state space; multi-hop latency 1–10 ms per hop.

Applications span:

Structural health monitoring (damage localization via impedance change mapping)
Distributed climate control (block-level temperature/humidity sensors, airflow regulation)
Occupancy detection & security (pattern recognition based on reflective/vibration signals)
Energy harvesting/self-powering (block-level photovoltaic and piezoelectric mechanisms)
Interactive façades (gesture/light recognition using reservoir classifiers)

6. Dense 3D Visual Grounding: ConcreteNet as Neural Architecture

A technically distinct system named ConcreteNet (Unal et al., 2023) achieves state-of-the-art performance in dense 3D visual grounding. This implementation takes an RGB point cloud and free-form description, generating precise point-level segmentation of the referred object. Main stages:

3D Instance Candidate Generation: Sparse-conv UNet (2 cm voxels, D=128 features) yields semantic scores $s$ , centroid offsets $x$ , heatmaps $h$ (with loss $\mathcal{L}_{cen}$ , $γ=25$ ). Candidate kernels built by local NMS; dynamic convolution reconstructs masks $z^{(i)}$ , supervised by BCE and Dice losses, yielding instance embeddings $e_i$ .
Language Encoding: Description $D$ tokenized with MPNet transformer, projecting tokens to ℝ^128.
Verbo-Visual Fusion & Mask Selection: All instance and token embeddings undergo transformer-based fusion using:
- Bottom-up Attentive Fusion (BAF): 6-layer transformer decoder with distance-masked self-attention radii ( $r = [1.0, 2.5, ∞, ∞, ∞, ∞]$ ), enabling spatial relational attention.
- Contrastive training: Pulls true instance and sentence embedding together with cosine similarity loss ( $τ = 0.3$ ).
- Learned Global Camera Token (GCT): Explicit token for viewpoint handling, supervised via L₂ loss on predicted yaw/pitch.
- Multi-view ensembling (TTA): Input rotated by $K=5$ yaw angles, aggregated with point-wise majority voting over masks.

Overall loss: $\mathcal{L}_{total} = \mathcal{L}_{can} + \mathcal{L}_{mask} + \mathcal{L}_{sel} + \lambda_{con}\mathcal{L}_{contrast} + \lambda_{cam}\mathcal{L}_{cam}$ Training uses AdamW ( $lr=3\times10^{-4}$ , batch size $4\times32$ , 400 epochs).

7. Empirical Evaluation and Impact

On the ScanRefer benchmark, ConcreteNet (Unal et al., 2023) yields Acc@50 IoU of 43.84% (unique: 75.62%, multiple: 36.56%), outperforming prior box-based approaches by over 9 points at 50% IoU; multi-view ensembling increases this to 46.53%. Ablations demonstrate contributions of each module, with BAF (+5.57 on multiple subset), contrastive loss, and GCT, true camera pose input pushing Acc@50 to 51.18%. Implementation details specify sparse-conv UNet backbones, kernel instance segmentation, MPNet-based language embeddings, and transformer fusion heads. Inference is single-pass per mask, best results use 5-way yaw TTA.

ConcreteNet as a material-embedded computer represents a paradigm for scalable built-environment intelligence (Adamatzky et al., 2018); as a neural network model, it sets new benchmarks for natural-language-driven 3D scene understanding (Unal et al., 2023). Each advances distributed sensing, context-dependent fusion, and decision protocols in their respective fields.

PDF Markdown Chat (Pro)

References (2)

On buildings that compute. A proposal (2018)

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding (2023)

Whiteboard

Generate a whiteboard explanation of this topic.

Follow Topic

Get notified by email when new papers are published related to ConcreteNet.