- The paper introduces a two-stage curriculum learning framework that dynamically adjusts loss weights for embedding, recovery, and steganalysis tasks.
- It employs both discrete and continuous scheduling methods to shift focus during training based on predefined curricula and loss dynamics.
- Experimental results on datasets like ALASKA2, VOC2012, and ImageNet demonstrate improved imperceptibility and decoding accuracy over fixed-weight schemes.
Deep learning-based image steganography frameworks often employ multiple loss functions during training to ensure the invisibility of the hidden message, the accuracy of message recovery, and resistance to steganalysis. These losses typically include an embedding loss (measuring the difference between the cover image and the stego image), a recovery loss (measuring the difference between the original and recovered secret message), and a steganalysis loss (adversarially training against a steganalyst to make the stego image undetectable).
A common approach is to combine these losses using fixed weights. However, this can be suboptimal because the training dynamics of different tasks (embedding, recovery, steganalysis) vary throughout the training process. The paper "TSCL: Multi-party loss Balancing scheme for deep learning Image steganography based on Curriculum learning" (2504.18348) addresses this by proposing a Two-stage Curriculum Learning loss scheduler (TSCL) to dynamically balance these multiple losses.
The core idea behind TSCL is to adapt the loss weights based on both a predefined curriculum (task importance) and the observed training progress (loss dynamics). This acknowledges that different aspects of the steganography task become more or less important at different stages of training, and that tasks may learn at different speeds.
The TSCL scheme consists of two distinct phases:
- A Priori Curriculum Control Stage:
  - This stage ensures the model builds capabilities layer by layer, focusing on fundamental requirements before refining performance on more complex adversarial objectives.
- Loss Dynamics Control Stage:
  - After the initial curriculum stage, training enters this phase, where loss weights are adjusted dynamically based on how quickly each loss is decreasing.
  - The rationale is that a slowly decreasing loss indicates a task the model currently finds difficult, so its weight should be increased to give it more training focus in the next iteration.
  - The learning speed of task $k$ at iteration $t$ is measured by the ratio of the loss at iteration $t-1$ to the loss at iteration $t-2$: $r_k(t) = L_k(t-1) / L_k(t-2)$. A value close to 1 means slow progress, while a smaller value means faster progress.
  - These dynamic ratios are then multiplied by a set of predefined a priori coefficients ($\alpha_k$) to maintain the relative importance established in the first stage. The weight for task $k$ at iteration $t$ becomes $\lambda_k(t) = \alpha_k \, r_k(t)$. The paper sets these coefficients to decreasing values (specifically 1, 0.8, 0.4 in experiments) to reflect the overall importance hierarchy.
  - This stage provides an adaptive mechanism to balance the learning of different tasks based on their current optimization difficulty, preventing simple tasks from overfitting while difficult tasks lag behind.
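The Stage 2 rule above can be illustrated with a minimal Python sketch. The function name `dynamic_weights`, the task names, the `eps` guard, and the assignment of the coefficients 1, 0.8, 0.4 to embedding, recovery, and steganalysis in that order are assumptions made for illustration; the paper's own code may differ.

```python
from typing import Dict

# A priori coefficients from the paper's experiments (1, 0.8, 0.4).
# Mapping them to these task names in this order is an assumption of this sketch.
PRIOR = {"embedding": 1.0, "recovery": 0.8, "steganalysis": 0.4}


def dynamic_weights(
    prev: Dict[str, float],       # losses at iteration t-1
    prev_prev: Dict[str, float],  # losses at iteration t-2
    prior: Dict[str, float] = PRIOR,
    eps: float = 1e-12,           # division-by-zero guard (not from the paper)
) -> Dict[str, float]:
    """Weight per task = a priori coefficient * (loss[t-1] / loss[t-2])."""
    return {k: prior[k] * prev[k] / (prev_prev[k] + eps) for k in prior}


# Embedding loss halved (ratio 0.5), recovery barely moved (ratio 0.98),
# steganalysis stalled (ratio 1.0).
w = dynamic_weights(
    prev={"embedding": 0.10, "recovery": 0.49, "steganalysis": 0.60},
    prev_prev={"embedding": 0.20, "recovery": 0.50, "steganalysis": 0.60},
)
print(w)
```

In this toy case the fast-improving embedding task has its weight cut to about 0.5, while the stalled tasks keep weights close to their a priori coefficients (about 0.784 and 0.4), which is the rebalancing behavior the stage is meant to produce.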
Practical Implementation:
Implementing TSCL involves integrating this dynamic weight calculation logic into the training loop of a deep learning steganography model. Assuming a standard adversarial training setup with an encoder, decoder, and steganalyst, the training loop structure would involve:
- Calculating the three primary losses: encoding loss ($L_{\text{enc}}$), decoding loss ($L_{\text{dec}}$), and steganalysis loss ($L_{\text{steg}}$). Note that $L_{\text{steg}}$ is used differently for updating the encoder/decoder (minimize detection likelihood) and the steganalyst (maximize detection likelihood).
- At each training iteration (or epoch, as described in the paper's evaluation), determining the current epoch number.
- Based on whether the current epoch falls within the A Priori Curriculum Control stage or the Loss Dynamics Control stage, calculating the weights for each loss:
  - If in Stage 1: Use the predefined schedule (discrete steps or continuous function) to get the weights.
  - If in Stage 2: Calculate the loss ratio for each task based on the losses from the two previous iterations/epochs and multiply by the a priori coefficients.
- Combine the losses using the calculated weights: $L_{\text{total}} = \lambda_{\text{enc}} L_{\text{enc}} + \lambda_{\text{dec}} L_{\text{dec}} + \lambda_{\text{steg}} L_{\text{steg}}$. (The steganalyst has its own objective, typically minimizing its own detection loss with respect to its own parameters, using samples labeled as 'stego' and 'cover'.)
- Perform backpropagation and optimizer steps using the weighted total loss for the encoder/decoder and the steganalyst's loss for the steganalyst.
- Crucially, maintain a history of the individual loss values (e.g., averaged per epoch) to calculate the loss ratios needed for Stage 2.
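The stage switch in the steps above can be sketched as a small scheduler class. This is an illustrative sketch under stated assumptions, not the paper's implementation: the class name `TSCLScheduler`, the stage boundary of 4 epochs, the milestone weights of the discrete curriculum, and the mapping of the coefficients 1, 0.8, 0.4 to task names are all hypothetical. Geometrically decaying numbers stand in for real epoch-averaged losses so the example runs without a model.

```python
from typing import Dict, List

TASKS = ("embedding", "recovery", "steganalysis")


class TSCLScheduler:
    """Two-stage loss-weight scheduler (illustrative; all values are placeholders)."""

    def __init__(self, stage1_epochs: int,
                 milestones: Dict[int, Dict[str, float]],
                 prior: Dict[str, float]) -> None:
        self.stage1_epochs = stage1_epochs  # hypothetical stage boundary
        self.milestones = milestones        # epoch -> weights (discrete schedule, must contain 0)
        self.prior = prior                  # a priori coefficients for Stage 2
        self.history: List[Dict[str, float]] = []  # epoch-averaged losses

    def record(self, epoch_losses: Dict[str, float]) -> None:
        self.history.append(dict(epoch_losses))

    def weights(self, epoch: int) -> Dict[str, float]:
        if epoch < self.stage1_epochs or len(self.history) < 2:
            # Stage 1: take the latest milestone at or before this epoch.
            start = max(e for e in self.milestones if e <= epoch)
            return dict(self.milestones[start])
        # Stage 2: a priori coefficient times the ratio of the last two losses.
        prev, prev_prev = self.history[-1], self.history[-2]
        return {k: self.prior[k] * prev[k] / max(prev_prev[k], 1e-12) for k in TASKS}


sched = TSCLScheduler(
    stage1_epochs=4,
    milestones={
        0: {"embedding": 1.0, "recovery": 0.5, "steganalysis": 0.0},
        2: {"embedding": 1.0, "recovery": 1.0, "steganalysis": 0.2},
    },
    prior={"embedding": 1.0, "recovery": 0.8, "steganalysis": 0.4},
)

# Stand-in for training: each task's epoch loss decays geometrically.
base = {"embedding": 0.40, "recovery": 0.70, "steganalysis": 0.69}
decay = {"embedding": 0.5, "recovery": 0.9, "steganalysis": 1.0}

log: List[Dict[str, float]] = []
for epoch in range(6):
    w = sched.weights(epoch)
    losses = {k: base[k] * decay[k] ** epoch for k in TASKS}
    # In a real loop this weighted sum would be backpropagated through the
    # encoder/decoder; the steganalyst would be updated with its own loss.
    total = sum(w[k] * losses[k] for k in TASKS)
    sched.record(losses)
    log.append(w)
    print(epoch, w, round(total, 4))
```

Keeping the scheduler separate from the model makes it easy to swap the discrete milestones for a continuous function of the epoch without touching the rest of the training loop.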
Implementation Considerations and Trade-offs:
- Hyperparameter Tuning: The performance of TSCL is sensitive to the choice of curriculum schedule parameters (function types for continuous schedules, step sizes/magnitudes for discrete schedules) and the a priori coefficients ($\alpha_k$). These need to be tuned based on the specific model architecture, dataset, and desired performance characteristics (balancing imperceptibility, accuracy, and security).
- Computational Overhead: The computational cost of calculating loss ratios and updating weights is minimal compared to the cost of forward and backward passes through the neural networks.
- Choosing Curriculum Functions/Schemes: The experiments show that the continuous sinusoidal function improves PSNR but might decrease accuracy, while discrete schemes can improve both depending on settings. The "unfixed iteration step size and weight adjustment amplitude" discrete scheme performed well in the paper's tests. Practitioners may need to experiment with different schedules.
- Loss History: Maintaining and accessing loss history adds a small memory overhead but is necessary for Stage 2. The paper calculates ratios based on epoch-averaged losses, simplifying the history requirement compared to per-batch tracking.
- Generalizability: The TSCL principle (prioritizing based on curriculum and adapting based on dynamics) can be applied to other multi-task learning or multi-loss optimization problems beyond steganography where task importance and learning difficulty change over time.
- Task Definition: Clearly defining the loss functions and how they contribute to the overall objectives (imperceptibility, recovery, security) is crucial for setting up the curriculum and the a priori coefficients effectively.
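The loss-history point above can be made concrete with a small tracker that averages per-batch losses over an epoch and retains only the two most recent epoch averages, which is all the Stage 2 ratio needs. The class name `EpochLossMeter` and its methods are hypothetical, and the sketch assumes each batch loss is already a mean over the batch (for example, a Python float obtained from a framework tensor).

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class EpochLossMeter:
    """Averages batch losses per epoch and keeps only the last two epoch averages."""

    def __init__(self) -> None:
        self._sum: Dict[str, float] = defaultdict(float)
        self._count = 0
        self.epochs: Deque[Dict[str, float]] = deque(maxlen=2)

    def update(self, batch_losses: Dict[str, float], batch_size: int) -> None:
        # Weight each batch mean by its size so a short last batch is not over-counted.
        for name, value in batch_losses.items():
            self._sum[name] += value * batch_size
        self._count += batch_size

    def end_epoch(self) -> Dict[str, float]:
        if self._count == 0:
            raise ValueError("end_epoch() called before any update()")
        avg = {name: total / self._count for name, total in self._sum.items()}
        self.epochs.append(avg)  # the deque silently drops the oldest entry
        self._sum.clear()
        self._count = 0
        return avg


meter = EpochLossMeter()
meter.update({"embedding": 0.2, "recovery": 0.6}, batch_size=32)
meter.update({"embedding": 0.4, "recovery": 0.2}, batch_size=32)
first = meter.end_epoch()

meter.update({"embedding": 0.15, "recovery": 0.3}, batch_size=64)
second = meter.end_epoch()

# Ratio used by the Stage 2 rule: most recent epoch over the one before it.
ratios = {k: meter.epochs[-1][k] / meter.epochs[-2][k] for k in first}
print(first, second, ratios)
```

Because the deque is bounded, memory use stays constant regardless of how many epochs are trained.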
The experimental results on the ALASKA2, VOC2012, and ImageNet datasets show that TSCL generally improves imperceptibility (SSIM, MS-SSIM, PSNR, RMSE) and decoding accuracy compared to a fixed-weight baseline. Security against steganalysis showed mixed results, improving on ALASKA2 but slightly decreasing on ImageNet, which highlights the inherent trade-offs in steganography and the difficulty of balancing all objectives at once. The visual results further support that TSCL leads to stego images with fewer noticeable artifacts. The paper also reports that the two-stage approach is more effective than using either stage in isolation, pointing to a synergy between structured curriculum learning and dynamic adaptation.