Data-Driven Near-Well Modeling

Updated 23 January 2026
  • Data-driven near-well models are computational frameworks that replace or augment classical near-well approximations using supervised machine learning and physics-informed constraints.
  • They leverage high-fidelity simulation, field measurements, and inversion results to accurately parameterize local pressure, shear stress, flux, and production influences.
  • The integration with numerical simulators and automatic differentiation enhances predictive accuracy and operational efficiency while enabling robust uncertainty quantification.

A data-driven near-well model is an approach to representing the physics and operational impact of wells in subsurface simulation, production prediction, or geosteering workflows through supervised or hybrid learning from high-fidelity simulation, field data, or inversion results. The paradigm covers a wide domain: near-wall turbulence modeling for high-Reynolds-number channel flows (Xue et al., 2024), transient pressure solution in reservoir PINN frameworks (Walter et al., 12 Jul 2025), ensemble-based well index correction in compositional or multiphase simulators (Schultzendorff et al., 16 Jan 2026), geosteering uncertainty assimilation (Rammay et al., 2022), and production forecast from vertical logs (Guevara et al., 2017). Common to all is the replacement or augmentation of classical physics-based local models (wall functions, Peaceman-type indices, steady boundary equations) with machine learning models trained on high-resolution or aggregated data, embedded with physical constraints, and tightly integrated with numerical simulation and optimization infrastructure.

1. Core Principles and Mathematical Frameworks

Data-driven near-well models are constructed to replace or augment the analytical or empirical formulas typically used to represent near-well singularities, fluxes, or production influences in coarse-grid or operational models. The foundational equations are application-dependent but share a principle: locally parameterize the pressure, shear stress, flux, or production using supervised ML (e.g., NNs, PINNs, kernel machines), conditioned on the grid state and physical regimes.

  • Reservoir Simulation: The classical Peaceman well index relates cell pressure $p_i$ and bottomhole pressure $p_w$ through $q = \text{WI} \cdot \lambda (p_i - p_w)$. The data-driven surrogate replaces $\text{WI}$ with a neural-network prediction $\widetilde{\text{WI}} = 10^{\mathcal{N}(x;\theta)}$, where the feature vector $x$ contains the local cell state (pressure, permeability, saturation, geometry, injected volume), trained on fine-scale solutions (Schultzendorff et al., 16 Jan 2026).
  • Near-Wall Turbulence: Shear velocity $u_\tau$ near a wall is mapped via a constrained NN as $u_\tau = \text{linear}(y_{\text{pred}})$, with input features engineered to enforce log-layer scaling (e.g., $\varphi_1 = u/(1000y)$) and learned on IDDES data (Xue et al., 2024).
  • PINN Pressure Diffusion: Fluid pressure $p(\mathbf{x},t)$ around a well is represented via a composite network solution obeying the pressure-PDE residual, where the well source/sink is encoded as a smoothed Gaussian with equivalent radius, and solution continuity is enforced via decomposed nested PINNs (Walter et al., 12 Jul 2025).

In all cases, input scaling, selection, and network architecture are informed by the underlying physics, and ML model outputs are coupled as differentiable operators into simulators using automatic differentiation for seamless integration with nonlinear solvers.
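The surrogate relation for the reservoir-simulation case can be sketched as follows; `well_rate` and the constant stand-in surrogate are illustrative names, not identifiers from the cited work. The only structure taken from the text is the Peaceman-type rate law $q = \text{WI} \cdot \lambda (p_i - p_w)$ with the well index predicted in $\log_{10}$ space:

```python
import numpy as np

def well_rate(surrogate, features, mobility, p_cell, p_bh):
    """Peaceman-type well rate q = WI * lambda * (p_i - p_w), with the
    well index predicted by an ML surrogate in log10 space,
    WI = 10**N(x; theta), so positivity is guaranteed by construction."""
    log10_wi = surrogate(features)      # N(x; theta)
    wi = 10.0 ** log10_wi               # data-driven well index
    return wi * mobility * (p_cell - p_bh)

# Dummy surrogate standing in for a trained network (hypothetical):
constant_surrogate = lambda x: 0.0      # WI = 10**0 = 1
q = well_rate(constant_surrogate, features=None, mobility=2.0,
              p_cell=250.0, p_bh=200.0)  # 1 * 2 * (250 - 200) = 100
```

The $10^{\mathcal{N}}$ parameterization is what lets the same network span well indices differing by orders of magnitude without violating positivity.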

2. Training Data Sources, Feature Engineering, and Physical Constraints

Training data for data-driven near-well modeling is typically derived from high-fidelity simulation or field measurement under physically relevant parameter regimes:

  • Fine-Scale Simulations: For well-index correction, fine-scale radial or sector simulations are run over ensembles of permeability, pressure, and multiphase properties; coarse-grid cell states are extracted as features, and the "truth" well index is computed as $\text{WI}_{\text{true}} = q_{\text{true}}/(\lambda(p_i - p_w))$ (Schultzendorff et al., 16 Jan 2026).
  • Hybrid Physics Data: For near-wall modeling, the dataset consists of shear velocity computations from IDDES on high-resolution LBM grids (e.g., a $96\times96\times96$ mesh, $19,600$–$2,000$ samples), capturing only near-wall panels ($y^+ < 200$) and reducing sample volume by three orders of magnitude compared to DNS (Xue et al., 2024).
  • Production Logs: For production prediction, vertical well logs (gamma ray, density, resistivity, etc.) are standardized and functionally decomposed (fPCA), scores are spatially interpolated to horizontal wells, and output labels are cumulative production at predetermined time intervals (Guevara et al., 2017).
  • Field Assimilation/Inversion: In strategic geosteering, the prior layer boundaries and resistivities ($z_i$, $\rho_i$) are sampled as ensembles from Gaussian processes, and observed logs are assimilated against a DNN proxy via the FlexIES smoother, correcting for model error in real time (Rammay et al., 2022).

Physical constraints are embedded through input feature selection (enforcing non-dimensional scaling), structured output mapping (e.g., normalization to [0,1]), physics-informed loss functions (weighted mean-square error, PDF loss correction), and hard boundary/initial condition encoding via multiplicative multipliers in PINNs (Walter et al., 12 Jul 2025, Xue et al., 2024).
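The hard boundary/initial-condition encoding mentioned above can be illustrated with a one-line construction: the raw network output is multiplied by a factor that vanishes where the condition must hold, so the condition is satisfied exactly for any weights. The specific multiplier $1 - e^{-t}$ below is an illustrative choice, not the one from the cited paper:

```python
import math

def constrained_pressure(net, x, t, p_init):
    """Hard initial-condition encoding for a PINN: the raw network
    output is scaled by a multiplier that is zero at t = 0, so
    p(x, 0) = p_init holds exactly regardless of the network weights.
    (A common PINN construction; the multiplier here is illustrative.)"""
    multiplier = 1.0 - math.exp(-t)   # = 0 at t = 0, -> 1 for large t
    return p_init + multiplier * net(x, t)

# With any network, the initial condition holds by construction:
dummy_net = lambda x, t: 123.45       # arbitrary untrained output
p0 = constrained_pressure(dummy_net, x=0.5, t=0.0, p_init=200.0)  # 200.0
```

Because the condition is built into the solution ansatz rather than penalized in the loss, the optimizer never trades it off against the PDE residual.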

3. Model Architectures, Loss Functions, and Integration

Neural architectures range from compact fully-connected MLPs for well-index prediction (input dimensions $3$–$12$; three to four hidden layers of $32$–$64$ neurons; tanh activations; output is log-index) (Schultzendorff et al., 16 Jan 2026), to PINNs with four hidden layers of $40$ neurons each (softplus output for physical positivity; hard-constraint multiplier for BC/IC) (Walter et al., 12 Jul 2025), to moderate networks (e.g., $158$ parameters for near-wall shear stress) (Xue et al., 2024).
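A minimal NumPy sketch of the compact well-index MLP described above; the layer sizes, tanh activations, and log-index output convention follow the text, while the initialization scheme and function names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a fully connected network; `sizes` follows the
    reported architecture range, e.g. [8, 32, 32, 32, 1] for well-index
    prediction (input dimension 3-12, hidden widths 32-64)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_log_wi(params, x):
    """tanh hidden layers; linear output interpreted as log10(WI)."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b).squeeze()      # log10 well index

params = init_mlp([8, 32, 32, 32, 1])
log_wi = mlp_log_wi(params, np.zeros(8))
wi = 10.0 ** log_wi                   # always positive by construction
```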

Training protocols involve:

  • Supervised regression loss (MSE) on transformed targets (e.g., log-scale for well index)
  • Regularization via weight decay ($\sim 10^{-5}$)
  • PDF-weighted loss for sparse training data penalization
  • Ensemble data splits (typically $80/20$ train/validation)
  • Early stopping on validation MSE
  • Data normalization and feature padding for variable grid geometries.
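The protocol above can be sketched end-to-end on synthetic data. The log-target MSE, $\sim 10^{-5}$ weight decay, 80/20 split, and validation-based early stopping follow the list; the linear model, learning rate, and patience value are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for (features, log10 WI_true) pairs from fine-scale runs:
X = rng.standard_normal((200, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.01 * rng.standard_normal(200)

# 80/20 train/validation split, as in the protocol above.
n_train = 160
Xtr, Xva, ytr, yva = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

w = np.zeros(4)
decay, lr, patience = 1e-5, 0.05, 20      # weight decay ~1e-5 per the text
best_val, best_w, bad = np.inf, w.copy(), 0

for epoch in range(500):
    # Gradient of MSE on the (log-scale) target, plus weight-decay term:
    grad = 2 * Xtr.T @ (Xtr @ w - ytr) / n_train + 2 * decay * w
    w -= lr * grad
    val_mse = np.mean((Xva @ w - yva) ** 2)
    if val_mse < best_val - 1e-9:
        best_val, best_w, bad = val_mse, w.copy(), 0
    else:
        bad += 1
        if bad >= patience:               # early stopping on validation MSE
            break
```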

Integration protocols embed the trained network into simulation infrastructure:

  • TensorFlow SavedModel export, loaded via C++ API in OPM Flow, wrapped as AD operators for direct Jacobian computation (Schultzendorff et al., 16 Jan 2026)
  • Differentiable mapping between LBM units and physical inputs/outputs in turbulence modeling (Xue et al., 2024)
  • Sequential domain-decomposition in PINNs, where multiple independently trained networks are superimposed with matching weights to enforce continuity and solve multiscale pressure fields (Walter et al., 12 Jul 2025).
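Why the AD wrapping matters can be seen in a toy differentiable well operator: because the well index itself depends on cell pressure through the network, the Jacobian entry picks up a chain-rule term that a constant-WI model lacks. The surrogate `N` below is a hypothetical analytic stand-in for a network whose derivative the AD framework would supply automatically:

```python
import math

# Toy surrogate N(p) -> log10 WI, with its derivative (stands in for a
# trained network plus its AD-computed Jacobian):
N  = lambda p: 0.001 * p
dN = lambda p: 0.001

def q_and_dqdp(p_cell, p_bh, mobility):
    """Well rate q = WI * lambda * (p_i - p_w) and its pressure
    derivative. Since WI = 10**N(p_i), the Jacobian gains the extra
    chain-rule term ln(10) * WI * dN/dp * (p_i - p_w)."""
    wi = 10.0 ** N(p_cell)
    q = wi * mobility * (p_cell - p_bh)
    dq = mobility * (wi + (p_cell - p_bh) * math.log(10) * wi * dN(p_cell))
    return q, dq

q, dq = q_and_dqdp(250.0, 200.0, 1.0)

# Sanity check against a central finite difference:
eps = 1e-4
q_p, _ = q_and_dqdp(250.0 + eps, 200.0, 1.0)
q_m, _ = q_and_dqdp(250.0 - eps, 200.0, 1.0)
assert abs(dq - (q_p - q_m) / (2 * eps)) < 1e-4
```

Exporting exact Jacobians like this is what lets the surrogate sit inside a Newton loop without degrading nonlinear convergence.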

4. Validation, Performance Metrics, and Comparative Evaluation

Model fidelity is quantified via joint metrics tailored to the application:

| Application | Key Metrics | Typical Performance |
|---|---|---|
| Well-index correction | MAE, max pressure error | $<1.1$ bar on a $100\times100$ grid vs. $>20$ bar classical error (Schultzendorff et al., 16 Jan 2026) |
| Near-wall modeling | ARE, DNS error, log-law | $\approx$1–4% ARE for shear velocity; $<$5% error in velocity/stress profiles (Xue et al., 2024) |
| PINN pressure | MAE, MSR, composite error | $\text{AE}_{\max}$: 0.11 (after 3 domains); $\text{MAE}_w \sim 10^{-2}$ (Walter et al., 12 Jul 2025) |
| Production prediction | LOO-RMSE, Pearson $r$ | RMSE 0.52–0.74 for ML vs. 0.70–0.76 for kriging; $r$ up to 0.77 for ML (Guevara et al., 2017) |
| Geosteering inversion | PICP, CRPS, CI width | PICP 90–95% (ideal) with FlexIES vs. 60% (classical); CRPS reduced by 10–20% (Rammay et al., 2022) |

These results consistently reveal one–two orders of magnitude improvement over classical models (Peaceman analytic, DNS-based, or kriging baselines), especially in regimes of multiphase, anisotropic, transient, or sparse data where classical assumptions fail.
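The uncertainty metrics in the table, such as PICP, are simple to state precisely. A minimal sketch (function name and example values are illustrative):

```python
import numpy as np

def picp(y_true, lower, upper):
    """Prediction Interval Coverage Probability: the fraction of
    observations falling inside their predicted intervals. For a
    well-calibrated 90% interval, PICP should be close to 0.90;
    values far below indicate overconfident uncertainty estimates."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Hypothetical example: 3 of 4 observations fall inside their intervals.
coverage = picp([1.0, 2.0, 3.0, 10.0],
                lower=[0.5, 1.5, 2.5, 3.5],
                upper=[1.5, 2.5, 3.5, 4.5])   # -> 0.75
```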

5. Interpretation, Impact, and Limitations

The adoption of data-driven near-well models yields several operational benefits:

  • Efficient replacement of computationally expensive local grid refinement or wall-resolved meshes, with a $10^3$ reduction in training data requirements via physics-informed constraints (Xue et al., 2024).
  • Seamless integration of ML operators as differentiable, auto-diff'ed components in simulation frameworks, enabling direct use in Newton or ensemble optimization algorithms without modification of solver logic (Schultzendorff et al., 16 Jan 2026).
  • Improved accuracy in near-field pressure, production, or log inversion with negligible runtime penalty ($\approx$5% CPU overhead from TensorFlow API calls) (Schultzendorff et al., 16 Jan 2026).
  • Robust handling of sparse data and unstructured grids, and generalization across $\text{Re}_\tau$ up to $1.0 \times 10^6$ (Xue et al., 2024).
  • Bayesian uncertainty quantification and adaptive confidence intervals in real-time field workflows (Rammay et al., 2022).

Limitations include:

  • Dependence on comprehensive high-fidelity training ensembles for coverage of parameter regimes; extrapolation risk if external conditions diverge from training (Schultzendorff et al., 16 Jan 2026).
  • Constrained physical generality (tested for channel flow, zero–pressure-gradient settings, not yet generalized to separation, multiphase reactions, or complex geometries) (Xue et al., 2024).
  • For PINN-based pressure inference, reliance on multiple nested domain decompositions and on careful selection of the "equivalent radius" shrinkage factor $b$ to control error convergence (Walter et al., 12 Jul 2025).
  • Additional bookkeeping for normalization, feature engineering, and grid boundary padding in heterogeneous settings.

Possible extensions involve the incorporation of PDE-residual penalties, multi-fidelity data fusion, spanwise/streamwise coupling in turbulence models, application to fractured/multilateral wells, and active learning for online adaptation of network parameters (Xue et al., 2024, Schultzendorff et al., 16 Jan 2026).

6. Application Domains and Future Directions

Data-driven near-well models are now foundational in subsurface simulation, production forecasting, near-wall turbulence modeling, and geosteering workflows.

Extending these approaches to multi-fidelity simulation, adaptive grid refinement, coupled geomechanical flows, and dynamically evolving well trajectories in fractured or multiphase media is a focus of current research. Integration of active learning, multi-objective optimization, and compositional/thermal physics into the near-well ML frameworks is expected to further enhance predictive fidelity and operational flexibility.

7. Summary Table: Representative Data-Driven Near-Well Model Frameworks

| Reference | Context | Model Type | Key Features |
|---|---|---|---|
| (Xue et al., 2024) | LBM near-wall turbulence | PINN w/ PDF weighting | Physics-informed, log-scaling |
| (Schultzendorff et al., 16 Jan 2026) | CO$_2$/multiphase flow | NN well index | Integrated AD, ensemble training |
| (Walter et al., 12 Jul 2025) | PINN pressure diffusion | Sequential PINNs | Nested domains, composite solution |
| (Rammay et al., 2022) | Geosteering inversion | DNN proxy + FlexIES | Uncertainty-aware, model-error correction |
| (Guevara et al., 2017) | Production forecasting | ML regression/fPCA | Vertical log features, LOO-RMSE |

The emergence of data-driven near-well models marks a transition toward hybrid simulation environments, combining physical law constraints with adaptive, data-centric surrogates. Their demonstrated computational efficiency, fidelity, and robustness position them as essential tools for large-scale, realistic subsurface and fluid-dynamics modeling.
