BioTune: Bio-Inspired Transfer Learning Framework
- BioTune is a bio-inspired transfer learning framework that uses evolutionary algorithms to determine which CNN layers to freeze and fine-tune.
- It formulates fine-tuning as a discrete–continuous optimization problem, leveraging genetic operators to efficiently adjust learning rates and layer selection.
- Experimental evaluations demonstrate that BioTune outperforms traditional methods by improving accuracy by up to 9.7% while reducing computational cost through selective parameter updates.
BioTune is a bio-inspired evolutionary fine-tuning framework for transfer learning in convolutional neural networks (CNNs). It is designed to identify optimal strategies for selective transfer by jointly determining which network blocks to freeze and how to allocate learning rates across layers, thereby maximizing performance and minimizing computational cost. BioTune addresses the complexities of transfer learning, particularly when navigating discrepancies between source and target domains, by formulating the fine-tuning configuration as a combined discrete–continuous optimization problem solved with an evolutionary algorithm (EA) (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
1. Motivation and Conceptual Foundation
Conventional transfer learning approaches typically either freeze all but the last few layers or fine-tune every layer, following rule-of-thumb heuristics. Such rigid strategies can be suboptimal, particularly under domain shift, either under-adapting to or overfitting the target task. The layer-freezing decision, intrinsically combinatorial, interacts in a high-dimensional search space with learning-rate settings. Gradient-based hyperparameter optimization methods are ill-suited to this mixed discrete–continuous search domain.
BioTune’s core innovation is the use of evolutionary optimization to explore this configuration space, leveraging the population diversity and global search properties of EAs. Each candidate solution encodes: (1) continuous “importance indices” for each block and (2) a global freezing threshold. By evolving populations of such configurations with genetic operators and momentum-based adoption (drawn from Particle Swarm Optimization, PSO), BioTune efficiently identifies which layers to fine-tune and how aggressively to update them (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
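The momentum-based adoption step borrowed from PSO can be sketched as a velocity update that pulls a candidate genome toward a better solution. The snippet below is a generic PSO-style rule for illustration; the function name, coefficients (`inertia`, `pull`), and prototype choice are assumptions, not the paper's exact operator:

```python
import random

def adopt(child, velocity, prototype, inertia=0.7, pull=1.5, rng=None):
    """Momentum-based adoption: move `child` toward `prototype`
    (e.g. the current best genome), retaining part of its velocity."""
    rng = rng or random.Random(0)
    new_child, new_velocity = [], []
    for c, v, p in zip(child, velocity, prototype):
        v_i = inertia * v + pull * rng.random() * (p - c)  # PSO-style velocity update
        c_i = min(1.0, max(0.0, c + v_i))                  # keep each gene in [0, 1]
        new_child.append(c_i)
        new_velocity.append(v_i)
    return new_child, new_velocity

child, vel = adopt([0.2, 0.9], [0.0, 0.0], prototype=[0.8, 0.4])
```

Each gene drifts toward the prototype's value while the clamping keeps the genome inside the unit hypercube expected by the decoder.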
2. Mathematical Formulation
The pre-trained model is partitioned into $B$ functional blocks. The goal is to discover a configuration $\nu^*$ that maximizes validation accuracy on the target domain:

$$\nu^* = \arg\min_{\nu} \Phi(\nu)$$

Each configuration $\nu \in [0,1]^{B+2}$ includes per-block importance indices $\nu_b$ and a freezing threshold $\varepsilon_f$. For each block $b$:
- Selection mask: $S_b = 1$ if $\nu_b > \varepsilon_f$ (block is fine-tuned), else $S_b = 0$ (block is frozen)
- Importance weight: $W_b = 10^{2(\nu_b - 0.5)}$
- Learning-rate multiplier: $\eta_b = S_b \cdot W_b$
- Block-wise learning rate: $\lambda_b = \eta_b \cdot \lambda^0_b$
Blocks where $\eta_b = 0$ are frozen, yielding parameter and computation reduction. Validation accuracy, averaged over $N_s$ random seeds/folds, is converted to a minimization fitness:

$$\Phi(\nu) = 1 - \frac{1}{N_s} \sum_{s=1}^{N_s} \mathrm{Acc}_s(\nu)$$

Lower $\Phi(\nu)$ indicates higher validation accuracy (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
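The decode and fitness formulas above can be sketched in plain Python. This is a minimal illustration of the stated equations; the function and variable names are my own:

```python
def decode(nu, eps_f, base_lrs):
    """Decode per-block genes nu[b] in [0, 1] into selection masks
    and block-wise learning rates, per the BioTune formulation."""
    masks, lrs = [], []
    for nu_b, lr0 in zip(nu, base_lrs):
        s_b = 1 if nu_b > eps_f else 0       # selection mask S_b
        w_b = 10 ** (2 * (nu_b - 0.5))       # importance weight W_b in [0.1, 10]
        eta_b = s_b * w_b                    # learning-rate multiplier eta_b
        masks.append(s_b)
        lrs.append(eta_b * lr0)              # lambda_b; 0 means block is frozen
    return masks, lrs

def fitness(val_accs):
    """Minimization fitness: 1 minus mean validation accuracy over N_s runs."""
    return 1.0 - sum(val_accs) / len(val_accs)

masks, lrs = decode([0.9, 0.2, 0.5], eps_f=0.3, base_lrs=[1e-3, 1e-3, 1e-3])
# block 0 is fine-tuned aggressively, block 1 is frozen (nu_b <= eps_f),
# block 2 keeps its base rate (W_b = 10^0 = 1)
```

Note how a gene of exactly 0.5 maps to a neutral multiplier of 1, so evolution perturbs rates symmetrically in log space around the base learning rate.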
3. BioTune Optimization Algorithm and Pseudocode
BioTune’s search process consists of evolutionary population-based optimization with hybrid operators. The main steps are:
- Generate $N_s$ stratified data folds.
- Initialize a population of $N_p$ individuals sampled uniformly in $[0,1]^{B+2}$.
- For each individual $\nu$:
  - Decode the selection mask $S_b$ and importance weights $W_b$.
  - Apply block-wise learning rates ($\lambda_b = \eta_b \cdot \lambda^0_b$); freeze block $b$ if $\eta_b = 0$.
  - Fine-tune the model per fold for up to 30 epochs; record validation accuracy.
  - Compute and aggregate the fitness $\Phi(\nu)$.
- Iterate for $N_g$ generations using:
  - Elitism: preserve the $N_e$ best individuals, with local exploitation via random perturbation.
  - Crossover: generate offspring by linear interpolation and momentum-based adoption toward parents and prototypes.
  - Mutation: adaptively perturb genes with magnitude linked to parental fitness.
  - Selection: form the next generation, update the best solution $\nu^*$, and stop early if no improvement occurs.
- Fine-tune using $\nu^*$ on the full training set and evaluate on the test set.
BioTune pseudocode:
```
Input:  pre-trained model M(w⁰), base learning rates λ⁰,
        search params (N_p, N_g, N_e, N_s)
Output: best configuration ν*

1. Generate N_s stratified folds of the training data.
2. Initialize population P₀ of N_p individuals ν ∈ [0,1]^{B+2}.
3. For each ν:
   a. Decode η_b(ν) = S_b · W_b:
      - S_b = 1 if ν_b > ε_f; else 0
      - W_b = 10^{2(ν_b − 0.5)}
   b. Apply rates λ_b = η_b · λ⁰_b; freeze block b if η_b = 0.
   c. Fine-tune on each fold, record validation accuracy,
      repeat over N_s seeds, compute Φ(ν).
4. Sort P₀ by Φ; store best ν*.
   For g in 0…N_g−1:
   a. Elitism: perturb N_e elites, keep the best.
   b. For the rest: crossover, mutation, adoption; evaluate offspring.
   c. Form next generation, update ν*, stop early if no improvement.
5. Fine-tune the full model with ν*; evaluate on the test set.
```
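The evolutionary loop can be made concrete with a toy implementation. The real fitness evaluation fine-tunes a CNN per fold, so a cheap surrogate fitness stands in here, and the operators (interpolation crossover, fitness-scaled mutation) are simplified from the paper's description; all names and constants below are my own:

```python
import random

def decode(nu, eps_f):
    """Genome -> per-block learning-rate multipliers eta_b."""
    return [(10 ** (2 * (v - 0.5)) if v > eps_f else 0.0) for v in nu]

def surrogate_fitness(nu):
    """Toy stand-in for 1 - mean validation accuracy (real use: fine-tune a CNN)."""
    target = [0.8, 0.2, 0.6, 0.9]  # pretend-optimal configuration
    return sum((a - b) ** 2 for a, b in zip(nu, target))

def evolve(n_pop=20, n_gen=30, n_elite=2, n_blocks=4, eps_f=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_blocks)] for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=surrogate_fitness)
        nxt = [g[:] for g in pop[:n_elite]]            # elitism: keep best unchanged
        while len(nxt) < n_pop:
            p1, p2 = rng.sample(pop[: n_pop // 2], 2)  # select fitter parents
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            # mutation magnitude linked to parental fitness
            step = 0.1 * min(1.0, surrogate_fitness(p1))
            child = [min(1.0, max(0.0, g + rng.uniform(-step, step))) for g in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=surrogate_fitness)
    return best, decode(best, eps_f)

best, etas = evolve()
```

Because elites are copied forward unperturbed, the best fitness is monotonically non-increasing across generations; the mutation step also shrinks as parents improve, narrowing the search near convergence.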
4. Layer-Freezing, Genome Encoding, and Learning-Rate Scaling
The genetic representation (“genome”) in BioTune comprises:
- A continuous index $\nu_b \in [0,1]$ for each block $b$, determining its importance for target adaptation.
- A single threshold $\varepsilon_f$ that acts globally: blocks with $\nu_b > \varepsilon_f$ are fine-tuned, otherwise frozen.
- The importance weight $W_b = 10^{2(\nu_b - 0.5)}$ assigns a dynamic learning-rate multiplier per block, allowing scaling from $0.1\times$ up to $10\times$ the base rate, rather than a static or heuristic assignment.
This enables both a binary (freeze/update) selection as well as continuous granularity for the degree of adaptation. As a result, the method provides both parameter-efficiency and interpretability regarding which model components are essential for transfer to the new task (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
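Given a decoded selection mask, the resulting parameter efficiency follows directly. The helper below shows how the percentage of trainable parameters (as reported in the results table) can be computed; the block sizes and gene values are invented for illustration:

```python
def trainable_fraction(block_sizes, nu, eps_f):
    """Percent of parameters fine-tuned, given per-block genes and threshold."""
    total = sum(block_sizes)
    trainable = sum(n for n, v in zip(block_sizes, nu) if v > eps_f)
    return 100.0 * trainable / total

# hypothetical per-block parameter counts for a small backbone
sizes = [200_000, 1_200_000, 7_000_000, 15_000_000, 2_000_000]
nu = [0.1, 0.25, 0.7, 0.9, 0.6]   # evolved importance indices (made up)
pct = trainable_fraction(sizes, nu, eps_f=0.3)  # only the last three blocks train
```

Raising $\varepsilon_f$ freezes more blocks, so the threshold gene directly trades adaptation capacity against compute.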
5. Hyperparameters and Experimental Settings
BioTune operates with the following hyperparameters, which balance accuracy and efficiency:
- Population size $N_p$
- Elite count $N_e$
- Generations $N_g$
- Random seeds per fitness evaluation $N_s$
- Epochs per evaluation: up to 30 (early-stopping patience 3)
- Mutation/perturbation step size
- No data augmentation; images resized and normalized per ImageNet conventions
Experiments spanned nine image classification datasets across digit, object, fine-grained, and medical domains, using ResNet-50 as the primary backbone and cross-validated over DenseNet-121, VGG-19, and Inception-v3 (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
6. Performance Analysis and Comparative Evaluation
BioTune outperformed full fine-tuning (FT), AutoRGN, LoRA, Gradual Unfreezing, L¹-SP, and L²-SP on 8 of 9 benchmark datasets. Results highlight:
- Substantial improvements on fine-grained (Flowers-102, +6.7%) and specialist (FGVC-Aircraft, +9.7%; ISIC2020, +5.1%) datasets compared to FT.
- Comparable or better performance relative to AutoRGN and LoRA, with BioTune surpassing both on 7 of 9 tasks and adapting its percentage of trainable parameters according to domain similarity.
- Parameter efficiency: BioTune selectively updates as little as 30% of parameters (MNIST, ISIC2020) or up to >99% for greater domain shift (SVHN, FGVC-Aircraft).
- Cross-architecture superiority: gains are consistent across ResNet-50, DenseNet-121, VGG-19, and Inception-v3, with Inception-v3, for example, reaching 89.4% accuracy tuning only ~66% of its parameters (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
Summary of Test-Set Performance on ResNet-50:
| Dataset | FT Acc. | AutoRGN Acc. | LoRA Acc. | BioTune Acc. | % Trainable |
|---|---|---|---|---|---|
| MNIST | 98.96 | 99.00 | 98.51 | 99.13 | 29.97% |
| USPS | 97.05 | 96.91 | 96.92 | 97.57 | 36.86% |
| SVHN | 95.56 | 96.08 | 95.46 | 95.85 | 100.0% |
| CIFAR-10 | 95.65 | 96.05 | 95.17 | 96.09 | 100.0% |
| STL-10 | 97.33 | 96.92 | 97.46 | 97.50 | 64.93% |
| Flowers-102 | 85.33 | 85.50 | 86.01 | 91.68 | 99.12% |
| FGVC-Aircraft | 58.68 | 57.94 | 54.78 | 64.40 | 99.96% |
| DTD | 68.03 | 65.70 | 68.17 | 69.27 | 64.89% |
| ISIC2020 | 78.91 | 79.48 | 80.91 | 82.90 | 29.93% |
These results demonstrate the adaptability of BioTune to various tasks and data characteristics (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
7. Ablation Studies and Key Empirical Findings
Ablation analyses revealed several critical factors in BioTune’s design:
- Optimization algorithm: The hybrid memetic/EA approach consistently outperformed vanilla GA, DE, and PSO variants, reaching lower fitness more rapidly.
- Importance-weight function: Exponential scaling of learning rates ($W_b = 10^{2(\nu_b - 0.5)}$) produced significantly better fitness (0.069) than discriminative, scaled, or normalized alternatives (≈0.12).
- Fitness function: Accuracy-based fitness ($1 -$ mean validation accuracy) proved superior for driving evolution to either variance-regularized or loss-based alternatives.
- Population size trade-off: Increasing $N_p$ and $N_g$ improves outcomes at increased computational cost; the settings used offer a balanced trade-off.
- Per-generation data fraction: Accuracy with only 10% of training data per generation approaches that of the full set (90.5% vs. 91.1%) at substantially reduced compute (1.6 h vs. 11.4 h), supporting data-efficient optimization (Davila et al., 16 Jan 2026, Colan et al., 21 Aug 2025).
References
- "Bio-inspired fine-tuning for selective transfer learning in image classification" (Davila et al., 16 Jan 2026)
- "Transfer learning optimization based on evolutionary selective fine tuning" (Colan et al., 21 Aug 2025)