Data-Driven Near-Well Modeling

Updated 23 January 2026
  • Data-driven near-well models are computational frameworks that replace or augment classical near-well approximations using supervised machine learning and physics-informed constraints.
  • They leverage high-fidelity simulation, field measurements, and inversion results to accurately parameterize local pressure, shear stress, flux, and production influences.
  • The integration with numerical simulators and automatic differentiation enhances predictive accuracy and operational efficiency while enabling robust uncertainty quantification.

A data-driven near-well model is an approach to representing the physics and operational impact of wells in subsurface simulation, production prediction, or geosteering workflows through supervised or hybrid learning from high-fidelity simulation, field data, or inversion results. The paradigm covers a wide domain: near-wall turbulence modeling for high-Reynolds-number channel flows (Xue et al., 2024), transient pressure solution in reservoir PINN frameworks (Walter et al., 12 Jul 2025), ensemble-based well index correction in compositional or multiphase simulators (Schultzendorff et al., 16 Jan 2026), geosteering uncertainty assimilation (Rammay et al., 2022), and production forecast from vertical logs (Guevara et al., 2017). Common to all is the replacement or augmentation of classical physics-based local models (wall functions, Peaceman-type indices, steady boundary equations) with machine learning models trained on high-resolution or aggregated data, embedded with physical constraints, and tightly integrated with numerical simulation and optimization infrastructure.

1. Core Principles and Mathematical Frameworks

Data-driven near-well models are constructed to replace or augment the analytical or empirical formulas typically used to represent near-well singularities, fluxes, or production influences in coarse-grid or operational models. The foundational equations are application-dependent but share a principle: locally parameterize the pressure, shear stress, flux, or production using supervised ML (e.g., NNs, PINNs, kernel machines), conditioned on the grid state and physical regimes.

  • Reservoir Simulation: The classical Peaceman well index relates cell pressure $p_i$ and bottomhole pressure $p_w$ through $q = \text{WI} \cdot \lambda (p_i - p_w)$. The data-driven surrogate replaces $\text{WI}$ with a neural-network prediction $\widetilde{\text{WI}} = 10^{\mathcal{N}(x;\theta)}$, where the feature vector $x$ contains the local cell state (pressure, permeability, saturation, geometry, injected volume), trained on fine-scale solutions (Schultzendorff et al., 16 Jan 2026).
  • Near-Wall Turbulence: Shear velocity $u_\tau$ near a wall is mapped via a constrained NN as $u_\tau = \text{linear}(y_{\text{pred}})$, with input features engineered to enforce log-layer scaling (e.g., $\varphi_1 = u/(1000y)$) and learned on IDDES data (Xue et al., 2024).
  • PINN Pressure Diffusion: Fluid pressure $p(\mathbf{x},t)$ around a well is represented via a composite network solution obeying the pressure-PDE residual, where the well source/sink is encoded as a smoothed Gaussian with equivalent radius, and solution continuity is enforced via decomposed nested PINNs (Walter et al., 12 Jul 2025).

In all cases, input scaling, selection, and network architecture are informed by the underlying physics, and ML model outputs are coupled as differentiable operators into simulators using automatic differentiation for seamless integration with nonlinear solvers.
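The surrogate relation for the reservoir-simulation case can be sketched as follows; `well_rate` and the constant stand-in surrogate are illustrative names, not identifiers from the cited work. The only structure taken from the text is the Peaceman-type rate law $q = \text{WI} \cdot \lambda (p_i - p_w)$ with the well index predicted in $\log_{10}$ space:

```python
import numpy as np

def well_rate(surrogate, features, mobility, p_cell, p_bh):
    """Peaceman-type well rate q = WI * lambda * (p_i - p_w), with the
    well index predicted by an ML surrogate in log10 space,
    WI = 10**N(x; theta), so positivity is guaranteed by construction."""
    log10_wi = surrogate(features)      # N(x; theta)
    wi = 10.0 ** log10_wi               # data-driven well index
    return wi * mobility * (p_cell - p_bh)

# Dummy surrogate standing in for a trained network (hypothetical):
constant_surrogate = lambda x: 0.0      # WI = 10**0 = 1
q = well_rate(constant_surrogate, features=None, mobility=2.0,
              p_cell=250.0, p_bh=200.0)  # 1 * 2 * (250 - 200) = 100
```

The $10^{\mathcal{N}}$ parameterization is what lets the same network span well indices differing by orders of magnitude without violating positivity.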

2. Training Data Sources, Feature Engineering, and Physical Constraints

Training data for data-driven near-well modeling is typically derived from high-fidelity simulation or field measurement under physically relevant parameter regimes:

  • Fine-Scale Simulations: For well-index correction, fine-scale radial or sector simulations are run over ensembles of permeability, pressure, and multiphase properties; coarse-grid cell states are extracted as features, and the "truth" well index is computed as $\text{WI}_{\text{true}} = q_{\text{true}}/(\lambda(p_i - p_w))$ (Schultzendorff et al., 16 Jan 2026).
  • Hybrid Physics Data: For near-wall modeling, the dataset consists of shear velocity computations from IDDES on high-resolution LBM grids (e.g., a $96\times96\times96$ mesh, $19,600$–$2,000$ samples), capturing only near-wall panels ($y^+ < 200$) and reducing sample volume by three orders of magnitude compared to DNS (Xue et al., 2024).
  • Production Logs: For production prediction, vertical well logs (gamma ray, density, resistivity, etc.) are standardized and functionally decomposed (fPCA), scores are spatially interpolated to horizontal wells, and output labels are cumulative production at predetermined time intervals (Guevara et al., 2017).
  • Field Assimilation/Inversion: In strategic geosteering, the prior layer boundaries and resistivities ($z_i$, $\rho_i$) are sampled as ensembles from Gaussian processes, and observed logs are assimilated against a DNN proxy via the FlexIES smoother, correcting for model error in real time (Rammay et al., 2022).

Physical constraints are embedded through input feature selection (enforcing non-dimensional scaling), structured output mapping (e.g., normalization to [0,1]), physics-informed loss functions (weighted mean-square error, PDF loss correction), and hard boundary/initial condition encoding via multiplicative multipliers in PINNs (Walter et al., 12 Jul 2025, Xue et al., 2024).
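The hard boundary/initial-condition encoding mentioned above can be illustrated with a one-line construction: the raw network output is multiplied by a factor that vanishes where the condition must hold, so the condition is satisfied exactly for any weights. The specific multiplier $1 - e^{-t}$ below is an illustrative choice, not the one from the cited paper:

```python
import math

def constrained_pressure(net, x, t, p_init):
    """Hard initial-condition encoding for a PINN: the raw network
    output is scaled by a multiplier that is zero at t = 0, so
    p(x, 0) = p_init holds exactly regardless of the network weights.
    (A common PINN construction; the multiplier here is illustrative.)"""
    multiplier = 1.0 - math.exp(-t)   # = 0 at t = 0, -> 1 for large t
    return p_init + multiplier * net(x, t)

# With any network, the initial condition holds by construction:
dummy_net = lambda x, t: 123.45       # arbitrary untrained output
p0 = constrained_pressure(dummy_net, x=0.5, t=0.0, p_init=200.0)  # 200.0
```

Because the condition is built into the solution ansatz rather than penalized in the loss, the optimizer never trades it off against the PDE residual.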

3. Model Architectures, Loss Functions, and Integration

Neural architectures range from compact fully-connected MLPs for well-index prediction (input dimensions $3$–$12$; three to four hidden layers of $32$–$64$ neurons; tanh activations; output is log-index) (Schultzendorff et al., 16 Jan 2026), to PINNs with four hidden layers of $40$ neurons each (softplus output for physical positivity; hard-constraint multiplier for BC/IC) (Walter et al., 12 Jul 2025), to moderate networks (e.g., $158$ parameters for near-wall shear stress) (Xue et al., 2024).
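A minimal NumPy sketch of the compact well-index MLP described above; the layer sizes, tanh activations, and log-index output convention follow the text, while the initialization scheme and function names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a fully connected network; `sizes` follows the
    reported architecture range, e.g. [8, 32, 32, 32, 1] for well-index
    prediction (input dimension 3-12, hidden widths 32-64)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_log_wi(params, x):
    """tanh hidden layers; linear output interpreted as log10(WI)."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b).squeeze()      # log10 well index

params = init_mlp([8, 32, 32, 32, 1])
log_wi = mlp_log_wi(params, np.zeros(8))
wi = 10.0 ** log_wi                   # always positive by construction
```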

Training protocols involve:

  • Supervised regression loss (MSE) on transformed targets (e.g., log-scale for well index)
  • Regularization via weight decay ($\sim 10^{-5}$)
  • PDF-weighted loss for sparse training data penalization
  • Ensemble data splits (typically $80/20$ train/validation)
  • Early stopping on validation MSE
  • Data normalization and feature padding for variable grid geometries.
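The protocol above can be sketched end-to-end on synthetic data. The log-target MSE, $\sim 10^{-5}$ weight decay, 80/20 split, and validation-based early stopping follow the list; the linear model, learning rate, and patience value are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for (features, log10 WI_true) pairs from fine-scale runs:
X = rng.standard_normal((200, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.01 * rng.standard_normal(200)

# 80/20 train/validation split, as in the protocol above.
n_train = 160
Xtr, Xva, ytr, yva = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

w = np.zeros(4)
decay, lr, patience = 1e-5, 0.05, 20      # weight decay ~1e-5 per the text
best_val, best_w, bad = np.inf, w.copy(), 0

for epoch in range(500):
    # Gradient of MSE on the (log-scale) target, plus weight-decay term:
    grad = 2 * Xtr.T @ (Xtr @ w - ytr) / n_train + 2 * decay * w
    w -= lr * grad
    val_mse = np.mean((Xva @ w - yva) ** 2)
    if val_mse < best_val - 1e-9:
        best_val, best_w, bad = val_mse, w.copy(), 0
    else:
        bad += 1
        if bad >= patience:               # early stopping on validation MSE
            break
```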

Integration protocols embed the trained network into simulation infrastructure:

  • TensorFlow SavedModel export, loaded via C++ API in OPM Flow, wrapped as AD operators for direct Jacobian computation (Schultzendorff et al., 16 Jan 2026)
  • Differentiable mapping between LBM units and physical inputs/outputs in turbulence modeling (Xue et al., 2024)
  • Sequential domain-decomposition in PINNs, where multiple independently trained networks are superimposed with matching weights to enforce continuity and solve multiscale pressure fields (Walter et al., 12 Jul 2025).
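Why the AD wrapping matters can be seen in a toy differentiable well operator: because the well index itself depends on cell pressure through the network, the Jacobian entry picks up a chain-rule term that a constant-WI model lacks. The surrogate `N` below is a hypothetical analytic stand-in for a network whose derivative the AD framework would supply automatically:

```python
import math

# Toy surrogate N(p) -> log10 WI, with its derivative (stands in for a
# trained network plus its AD-computed Jacobian):
N  = lambda p: 0.001 * p
dN = lambda p: 0.001

def q_and_dqdp(p_cell, p_bh, mobility):
    """Well rate q = WI * lambda * (p_i - p_w) and its pressure
    derivative. Since WI = 10**N(p_i), the Jacobian gains the extra
    chain-rule term ln(10) * WI * dN/dp * (p_i - p_w)."""
    wi = 10.0 ** N(p_cell)
    q = wi * mobility * (p_cell - p_bh)
    dq = mobility * (wi + (p_cell - p_bh) * math.log(10) * wi * dN(p_cell))
    return q, dq

q, dq = q_and_dqdp(250.0, 200.0, 1.0)

# Sanity check against a central finite difference:
eps = 1e-4
q_p, _ = q_and_dqdp(250.0 + eps, 200.0, 1.0)
q_m, _ = q_and_dqdp(250.0 - eps, 200.0, 1.0)
assert abs(dq - (q_p - q_m) / (2 * eps)) < 1e-4
```

Exporting exact Jacobians like this is what lets the surrogate sit inside a Newton loop without degrading nonlinear convergence.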

4. Validation, Performance Metrics, and Comparative Evaluation

Model fidelity is quantified via joint metrics tailored to the application:

| Application | Key Metrics | Typical Performance |
|---|---|---|
| Well-index correction | MAE, max pressure error | $<1.1$ bar on a $100\times100$ grid vs. $>20$ bar classical error (Schultzendorff et al., 16 Jan 2026) |
| Near-wall modeling | ARE, DNS error, log-law | $\approx$1–4% ARE for shear velocity; $<$5% error in velocity/stress profiles (Xue et al., 2024) |
| PINN pressure | MAE, MSR, composite error | $\text{AE}_{\max}$: 0.11 (after 3 domains); $\text{MAE}_w \sim 10^{-2}$ (Walter et al., 12 Jul 2025) |
| Production prediction | LOO-RMSE, Pearson $r$ | RMSE 0.52–0.74 for ML vs. 0.70–0.76 for kriging; $r$ up to 0.77 for ML (Guevara et al., 2017) |
| Geosteering inversion | PICP, CRPS, CI width | PICP 90–95% (ideal) with FlexIES vs. 60% (classical); CRPS reduced by 10–20% (Rammay et al., 2022) |

These results consistently reveal one–two orders of magnitude improvement over classical models (Peaceman analytic, DNS-based, or kriging baselines), especially in regimes of multiphase, anisotropic, transient, or sparse data where classical assumptions fail.
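The uncertainty metrics in the table, such as PICP, are simple to state precisely. A minimal sketch (function name and example values are illustrative):

```python
import numpy as np

def picp(y_true, lower, upper):
    """Prediction Interval Coverage Probability: the fraction of
    observations falling inside their predicted intervals. For a
    well-calibrated 90% interval, PICP should be close to 0.90;
    values far below indicate overconfident uncertainty estimates."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Hypothetical example: 3 of 4 observations fall inside their intervals.
coverage = picp([1.0, 2.0, 3.0, 10.0],
                lower=[0.5, 1.5, 2.5, 3.5],
                upper=[1.5, 2.5, 3.5, 4.5])   # -> 0.75
```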

5. Interpretation, Impact, and Limitations

The adoption of data-driven near-well models yields several operational benefits:

  • Efficient replacement of computationally expensive local grid refinement or wall-resolved meshes, with a $10^3$ reduction in training data requirements via physics-informed constraints (Xue et al., 2024).
  • Seamless integration of ML operators as differentiable, auto-diff'ed components in simulation frameworks, enabling direct use in Newton or ensemble optimization algorithms without modification of solver logic (Schultzendorff et al., 16 Jan 2026).
  • Improved accuracy in near-field pressure, production, or log inversion with negligible runtime penalty ($\approx$5% CPU overhead from TensorFlow API calls) (Schultzendorff et al., 16 Jan 2026).
  • Robust handling of sparse data and unstructured grids, and generalization across $\text{Re}_\tau$ up to $1.0 \times 10^6$ (Xue et al., 2024).
  • Bayesian uncertainty quantification and adaptive confidence intervals in real-time field workflows (Rammay et al., 2022).

Limitations include:

  • Dependence on comprehensive high-fidelity training ensembles for coverage of parameter regimes; extrapolation risk if external conditions diverge from training (Schultzendorff et al., 16 Jan 2026).
  • Constrained physical generality (tested for channel flow, zero–pressure-gradient settings, not yet generalized to separation, multiphase reactions, or complex geometries) (Xue et al., 2024).
  • For PINN-based pressure inference, reliance on multiple nested domain decompositions and on careful selection of the "equivalent radius" shrinkage factor $b$ to control error convergence (Walter et al., 12 Jul 2025).
  • Additional bookkeeping for normalization, feature engineering, and grid boundary padding in heterogeneous settings.

Possible extensions involve the incorporation of PDE-residual penalties, multi-fidelity data fusion, spanwise/streamwise coupling in turbulence models, application to fractured/multilateral wells, and active learning for online adaptation of network parameters (Xue et al., 2024, Schultzendorff et al., 16 Jan 2026).

6. Application Domains and Future Directions

Data-driven near-well models are now foundational in subsurface simulation, production forecasting, near-wall turbulence modeling, and geosteering workflows.

Extending these approaches to multi-fidelity simulation, adaptive grid refinement, coupled geomechanical flows, and dynamically evolving well trajectories in fractured or multiphase media is a focus of current research. Integration of active learning, multi-objective optimization, and compositional/thermal physics into the near-well ML frameworks is expected to further enhance predictive fidelity and operational flexibility.

7. Summary Table: Representative Data-Driven Near-Well Model Frameworks

| Reference | Context | Model Type | Key Features |
|---|---|---|---|
| (Xue et al., 2024) | LBM near-wall turbulence | PINN w/ PDF weighting | Physics-informed, log-scaling |
| (Schultzendorff et al., 16 Jan 2026) | CO$_2$/multiphase flow | NN well index | Integrated AD, ensemble training |
| (Walter et al., 12 Jul 2025) | PINN pressure diffusion | Sequential PINNs | Nested domains, composite solution |
| (Rammay et al., 2022) | Geosteering inversion | DNN proxy + FlexIES | Uncertainty-aware, model-error correction |
| (Guevara et al., 2017) | Production forecasting | ML regression/fPCA | Vertical log features, LOO-RMSE |

The emergence of data-driven near-well models marks a transition toward hybrid simulation environments, combining physical law constraints with adaptive, data-centric surrogates. Their demonstrated computational efficiency, fidelity, and robustness position them as essential tools for large-scale, realistic subsurface and fluid-dynamics modeling.
