LoF Scale: Marine Biofouling Assessment
- Level of Fouling is a standardized six-point ranking system that categorizes marine biofouling based on visible slime and macrofouling coverage.
- Automated assessments employ image classification, semantic segmentation, and LLMs to accurately predict LoF categories and quantify fouling extent.
- Hybrid pipelines combining quantitative imaging and LLM-driven interpretability enhance accuracy at class boundaries despite challenges with dataset imbalance.
The Level of Fouling (LoF) scale is a standardized six-point ranking system used to categorize the severity of marine biofouling on vessel hulls and related submerged surfaces. The system quantifies fouling in terms of the presence of slime (biofilm) and the proportionate surface coverage by visible macrofouling organisms. The LoF scale serves as a benchmark for ecological risk assessment, management of biofouling operations, and for the development and evaluation of automated biofouling detection and classification systems (Hamilton et al., 28 Jan 2026).
1. Definition and Formal Structure of the LoF Scale
The LoF scale, as defined by Davidson et al. (2019), employs six discrete categories (0–5). Each category corresponds to explicit criteria based on visible slime and the percentage cover by macrofouling organisms:
| LoF | Description | Macro-fouling Coverage |
|---|---|---|
| 0 | No slime, no macrofouling | 0% |
| 1 | Slime layer present, no visible macrofouling | 0% |
| 2 | Sparse macrofouling (patchy or isolated) | 1–5% |
| 3 | Moderate macrofouling patches | 6–15% |
| 4 | Extensive macrofouling (majority still clean) | 16–40% |
| 5 | Heavy macrofouling (very heavy coverage) | 41–100% |
Assignment to a LoF category follows a binary-decision flow: (1) Is slime visible? If no, LoF 0; if yes, proceed to (2) Is macrofouling present? If no, LoF 1; if yes, estimate the macrofouling coverage and assign LoF 2–5 according to the thresholds above (Hamilton et al., 28 Jan 2026).
2. Mathematical Quantification and Decision Rules
Automated measurement of LoF relies on pixel-wise estimates of surface coverage, typically from semantic segmentation masks. The coverage for macrofouling is given by:
where is the set of pixels predicted as macrofouling, and the set corresponding to the entire hull (including Clean, Slime, Macrofouling). An identical formulation applies to slime coverage.
The LoF category is assigned from coverage values via:
This mapping underpins all automated LoF assessments in computer vision and machine learning pipelines (Hamilton et al., 28 Jan 2026).
3. Datasets and Label Distributions in Automated LoF Assessment
Automated LoF classification relies on curated, expert-labeled datasets. A documented dataset from the New Zealand Ministry for Primary Industries contains 762 images with the following distribution:
| LoF | Image Count |
|---|---|
| 0 | 7 |
| 1 | 263 |
| 2 | 70 |
| 3 | 113 |
| 4 | 126 |
| 5 | 183 |
An 80%/20% train/test split was applied across all model evaluations. A significant class imbalance exists, with relatively fewer samples for LoF 0 and 2 (Hamilton et al., 28 Jan 2026). This skew affects model calibration, particularly at intermediate LoF levels.
4. Methods for Automated Assessment
Multiple pipelines have been employed for mapping images to LoF scores:
- Raw Image Classification: ResNet-18 and ResNet-50 architectures (ImageNet-pretrained) classify raw RGB images using cross-entropy loss. Preprocessing using HSV color channels and Canny edge detection improved accuracy for intermediate LoF classes, with test accuracy rising from 60.22% to 62.72%.
- Semantic Segmentation: The SegFormer transformer segmenter outputs pixel-wise labels for Water, Clean, Slime, and Macrofouling. Area proportions are computed, and LoF is inferred via the pixel coverage rule. However, SegFormer tended to output extreme-class predictions (100% slime or macrofouling), leading to over-prediction at LoF 1 and 5 and instability at the intermediates.
- Evaluation Metrics: Class-wise and overall accuracy, precision, recall, and F1 score were computed as per standard definitions, using true positive, false positive, and false negative counts for each LoF class.
- Multimodal LLMs: Large multimodal LLMs were evaluated in zero-shot setups, using both baseline and role-framed, structured prompts encoding the LoF definitions and decision tree. Addition of official LoF guideline text via retrieval-augmented prompts yielded modest alignment gains (Hamilton et al., 28 Jan 2026).
A summary of classifier and segmentation model performance:
| Approach | Accuracy (%) | Notable Characteristics |
|---|---|---|
| ResNet-18/50 RGB | 60.2 | Strong on extremes (LoF 1,5); weaker for LoF 2–4 |
| ResNet-18/50 HSV+ | 62.7 | Improved intermediate LoF separation |
| SegFormer | Unstable | Over-prediction of extreme categories |
| LLM (zero-shot) | 51.1 | Accurate on extremes; over-classification at boundaries |
LLM performance was highly prompt-dependent; initial prompts classified only 5.1% of images, while detailed prompts increased coverage to 94.8% (Hamilton et al., 28 Jan 2026).
5. Prompting Strategies and Zero-shot Multimodal LLMs
Vision-enabled LLMs (e.g., GPT-4V via OpenRouter) received input images and contextual system prompts delineating LoF scale and thresholds. Two templates were examined:
- Baseline Prompt: Provided LoF definitions, requested LoF score, justification, and invasive species note.
- Final System Prompt: Simulated expert role, specified decision-tree and coverage thresholds, and required outputs for LoF rating, estimated coverage, species, and biosecurity risk.
Injecting official LoF guideline excerpts further grounded the LLM to domain standards. However, this had a limited effect, as detailed prompts were sometimes ignored due to length. Conservative prompt calibration increased LoF 1 precision to 75.5% at a cost to overall accuracy (42.7%)—a plausible implication is heightened cautiousness led to systematic under-classification at higher LoF levels (Hamilton et al., 28 Jan 2026).
6. Hybrid and Integrated Assessment Pipelines
Hybrid strategies exploit complementary strengths of segmentation and LLM approaches. The pipeline involves:
- Segmenting the image (SegFormer) to compute and .
- Preliminary LoF assignment via the deterministic decision rule.
- Supplying the original image and coverage percentages to a multimodal LLM with a domain-formulated prompt.
- LLM refines border estimates, delivers textual justifications, species identification, and risk rating.
Mathematically, the hybrid LoF estimate is:
with denoting the image input. Such integration allows rigorous coverage-based assignments supplemented by LLM interpretability, with improved transparency at LoF class boundaries (Hamilton et al., 28 Jan 2026).
7. Significance and Limitations
The LoF scale enables reproducible, standardized biofouling severity assessment critical for regulatory, ecological, and operational contexts. Automated systems grounded in the LoF framework face challenges at class boundaries due to image framing, variability in fouling appearance, and dataset imbalance. Computer vision classifiers are robust at extremes but may misclassify intermediate categories; segmentation models offer explainability at the expense of stability; and LLMs provide textual reasoning, but their outputs are sensitive to prompt design.
The convergence of quantitative segmentation with LLM-driven interpretability via hybrid models offers a promising direction for scalable, explainable, and accurate marine biofouling assessment on the LoF scale (Hamilton et al., 28 Jan 2026).