The Bag-of-Keywords (BoK) auxiliary loss is a framework for interpretable open-domain dialogue response generation, designed to produce higher-quality and more transparent conversational agents. It augments standard language modeling with an explicit intention representation: the model is compelled to predict a concise set of core keywords summarizing the semantic content of the forthcoming response, which permits post-hoc inspection of its internal decision process. The method is compatible with both encoder–decoder and decoder-only transformer models, and empirical evidence shows improvements on both automatic and human-aligned metrics. BoK also enables a novel reference-free evaluation paradigm that uses the BoK-augmented loss as a quality metric (Dey et al., 17 Jan 2025).
1. Mathematical Foundations of BoK Loss
The BoK loss is an auxiliary objective integrated into standard dialogue language modeling, leveraging a cross-entropy loss over a small, linguistically meaningful set of keywords extracted from each target response. The approach extends two pre-existing losses:
Language Modeling (LM) Loss: For a target response $u_t = (u_{t,1}, \ldots, u_{t,T})$ given dialogue history $D_{<t}$ and optional context $C_t$,

$$\mathcal{L}_{\mathrm{LM}} = -\sum_{n=1}^{T} \log p(u_{t,n} \mid u_{t,<n}, D_{<t}, C_t; \theta)$$
Bag-of-Words (BoW) Loss: Predicts all tokens in $u_t$ from a context summary $\phi_t$ in an order-agnostic fashion,

$$\mathcal{L}_{\mathrm{BoW}} = -\sum_{w \in u_t} \log p(w \mid \phi_t)$$
BoK Loss: Restricts the auxiliary task to a small per-turn keyword set $K_t$,

$$\mathcal{L}_{\mathrm{BoK}} = -\sum_{w \in K_t} \log p(w \mid \phi_t)$$

where $K_t$ is derived by extracting the top $|K_t|$ keywords from $u_t$ using the unsupervised YAKE! algorithm.
The final training objective is the weighted sum

$$\mathcal{L}_{\mathrm{BoK\text{-}LM}} = \mathcal{L}_{\mathrm{LM}} + \lambda\,\mathcal{L}_{\mathrm{BoK}}$$

with $\lambda > 0$ controlling the weight on the auxiliary keyword loss.
The BoK prediction head is instantiated as a single-layer feed-forward network projecting $\phi_t$ (typically the decoder’s [BOS]-token hidden state) into a vocabulary-sized vector, followed by softmax. During training, if YAKE! extracts no keywords, a sentinel token <nok> is injected to keep gradients consistent.
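As a concrete sketch, the combined objective can be assembled as follows. This is a minimal PyTorch illustration with random stand-in activations; the dimensions, the λ value, and the keyword ids are all hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_dim, seq_len = 100, 32, 10
lam = 0.4  # weight λ on the auxiliary keyword loss (illustrative value)

torch.manual_seed(0)

# BoK head: a single feed-forward layer projecting the [BOS] hidden state
# to vocabulary-sized logits, as described above.
bok_head = nn.Linear(hidden_dim, vocab_size)

# Stand-ins for quantities a real transformer would produce:
lm_logits = torch.randn(seq_len, vocab_size)        # per-position LM logits
targets = torch.randint(0, vocab_size, (seq_len,))  # target response tokens
phi_t = torch.randn(hidden_dim)                     # decoder [BOS] hidden state
keyword_ids = torch.tensor([3, 17, 42])             # ids of YAKE!-style keywords

# Standard language-modeling cross-entropy (L_LM).
loss_lm = F.cross_entropy(lm_logits, targets)

# BoK loss (L_BoK): negative log-probability of each keyword under the
# head's softmax over the vocabulary.
log_probs = F.log_softmax(bok_head(phi_t), dim=-1)
loss_bok = -log_probs[keyword_ids].sum()

# Combined objective: L_LM + λ · L_BoK.
loss = loss_lm + lam * loss_bok
```

Because both components are ordinary differentiable losses, the sum can be handed directly to any standard optimizer.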
2. Keyword Extraction and Motivating Intention Representation
BoK’s interpretability relies on explicit identification of each response’s semantic gist. Keyword extraction is performed by YAKE! (Yet Another Keyword Extractor), a statistics-based unsupervised method. For every ground-truth utterance, YAKE! ranks word or subword candidates using features such as local frequency, casing, and positionality, typically selecting the top eight tokens.
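YAKE! itself combines several statistical features; the toy extractor below is a deliberately simplified stand-in (frequency plus first-position tie-breaking, with an ad-hoc stopword list) meant only to illustrate the extraction step. A real pipeline would call the YAKE! library instead.

```python
from collections import Counter

# Tiny illustrative stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "what", "would", "me"}

def toy_keywords(text: str, top_k: int = 8) -> list[str]:
    """Rank non-stopword tokens by frequency, breaking ties by first position.

    A much-simplified stand-in for YAKE!, which also uses casing and other
    statistical features.
    """
    words = [w.strip("?.,!").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    counts = Counter(words)
    # Earliest occurrence of each word (reversed so early indices win).
    first_pos = {w: i for i, w in reversed(list(enumerate(words)))}
    ranked = sorted(counts, key=lambda w: (-counts[w], first_pos[w]))
    return ranked[:top_k]

print(toy_keywords("What would the roses cost me?"))  # → ['roses', 'cost']
```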
The BoK head thus learns, for each turn, to predict these distilled core ideas (even when it cannot reconstruct the entire utterance), offering a compact yet sufficient summary of response intention. When inspecting model reasoning, a developer can extract the top-8 predicted tokens (the highest BoK softmax entries), yielding an explicit “intention bag” that reveals, before decoding, which semantic elements the reply intends to address.
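To illustrate the inspection step, the sketch below recovers a top-3 “intention bag” from hypothetical BoK-head logits over a toy five-word vocabulary; the logit values and vocabulary are invented for the example.

```python
import torch

# Hypothetical BoK-head logits over a toy vocabulary; in practice this vector
# is vocabulary-sized and produced by the trained BoK head for the turn.
vocab = ["dozen", "price", "dollars", "hello", "weather"]
bok_logits = torch.tensor([2.0, 1.5, 1.2, -0.5, -1.0])

# Softmax is monotone in the logits, so the top-k logits identify the top-k
# probability mass: the model's "intention bag" for the upcoming reply.
top = torch.topk(bok_logits, k=3).indices.tolist()
intention_bag = [vocab[i] for i in top]
print(intention_bag)  # → ['dozen', 'price', 'dollars']
```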
Qualitatively, the predicted keywords are shown to match human expectations. For example, given “What would the roses cost me?”, T5-BoK predicts {dozen, price, dollars, ...}, and its generated reply “$20 per dozen” is semantically aligned with this intention (Dey et al., 17 Jan 2025).
3. Integration in Transformer Architectures
BoK is architecture-agnostic across the popular dialogue generation paradigms:
Encoder–Decoder (e.g., T5): The encoder ingests the dialogue history and any external conditions, and the decoder autoregressively produces the response. The BoK head attaches to the decoder’s initial hidden state and predicts keywords derived from the response, with gradients flowing into both encoder and decoder parameters.
Decoder–Only (e.g., DialoGPT): The model observes the dialogue as a single continuous prefix, with the BoK head again attached to the decoder's BOS hidden state.
In both classes, training aims to minimize the combined $\mathcal{L}_{\mathrm{LM}} + \lambda \mathcal{L}_{\mathrm{BoK}}$ loss via standard gradient descent. At inference time, the BoK head provides a transparent “plan” vector alongside the response, enabling post-hoc semantic introspection.

4. Empirical Evaluation

BoK-augmented models were assessed on the DailyDialog (general chit-chat) and Persona-Chat (persona-grounded open-domain) datasets, comparing against plain LM and BoW-augmented baselines.

Performance gains:

| Model | BLEU-4 (DailyDialog) | USL-H (DailyDialog) | BLEU-4 (Persona-Chat) | Dial-M (Persona-Chat) |
|---|---|---|---|---|
| T5 (vanilla) | 12.05 | 0.6718 | — | — |
| T5-BoK | 13.24 | 0.6793 | — | — |
| DialoGPT (vanilla) | 11.68 | — | — | — |
| DialoGPT-BoK | 14.92 | — | — | — |

Other notable metrics:

- T5-BoK achieves BLEU-3 of 19.19 vs. 18.29 (vanilla).
- BoK increases USL-H specificity by +0.31 (DialoGPT, DailyDialog).
- Persona-Chat: BoK yields Dial-M 17.72 vs. 16.67 (plain) and increases USL-H further relative to both LM and BoW.

Human evaluations:

- BoK-trained models generate more informative and interactive replies.
- Human annotators preferred BoK responses for informativeness, with win margins of 44%.

5. Interpretability and Post-hoc Analysis

The BoK head’s output is explicitly interpretable as the system’s “intention vector.” After generation, a practitioner can observe the top-$n$ tokens $\arg\max_{w} \alpha_{t,w}$ (softmax over the BoK head), verifying semantic coherence with the actual reply.

This mechanism delivers transparency not attainable under purely autoregressive LM training, effectively surfacing the model’s “plan” (what content will be mentioned or omitted) before or alongside text generation. This facilitates error analysis, semantic debugging, and explanations for users and developers.

6. BoK-LM Loss as a Reference-Free Metric

The combined BoK-LM loss, $\mathcal{L}_{\mathrm{BoK\text{-}LM}}$, can serve as a reference-free dialogue evaluation metric: on a given context–response pair, one computes the total loss without access to a gold reference. Lower losses indicate higher response quality.
Empirically, BoK-LM matches or surpasses a suite of metrics (BERTScore, BLEURT, Dial-M, USR, HolisticEval) on USR, GRADE (ConvAI2/DailyDialog), PredictiveEngage, and FED benchmarks. Its Pearson/Spearman correlations with human judgements rank in the top tier; BoK-LM consistently outperforms BoW-LM as a metric, confirming the interpretive value of distilling key tokens over matching the full response vocabulary.
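A minimal sketch of using the combined loss as a reference-free score follows, assuming the per-pair loss components have already been computed by a trained model; the λ default and all numbers are invented for illustration.

```python
def bok_lm_score(loss_lm: float, loss_bok: float, lam: float = 0.4) -> float:
    """Reference-free quality score: the combined BoK-LM loss (lower = better)."""
    return loss_lm + lam * loss_bok

# Two hypothetical candidate responses with precomputed loss components:
candidates = {"reply_a": (2.1, 3.0), "reply_b": (2.8, 5.5)}
scores = {name: bok_lm_score(lm, bok) for name, (lm, bok) in candidates.items()}

# The lower-loss candidate is predicted to be the better response.
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))  # → reply_a 3.3
```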
7. Significance, Limitations, and Prospects
BoK loss operationalizes model-agnostic interpretability for dialogue generation, balancing improved specificity with actionable introspection. By restricting the auxiliary task to succinct salient tokens, the approach yields both higher response quality and an explicit semantic summary for each turn, strengthening dialog system transparency in open-domain settings.
Limitations include dependence on the quality of keyword extraction and the inability to recover phrase-level or multiword semantic relations within the keywords. A plausible implication is that extending to phrasal or concept-based “key elements” could further enhance both generative control and interpretability.
BoK’s success as both a learning signal and a reference-free evaluation metric suggests broad applicability to contexts requiring interpretable neural generation, post-hoc model auditing, or deployments in settings where developers and end users require explicit justification of model actions (Dey et al., 17 Jan 2025).