SCB-Dataset: Classroom Behavior & NLP Benchmark

Updated 30 November 2025
  • SCB-Dataset comprises multiple datasets with detailed annotations for student classroom behavior and English–Thai translation, serving as benchmarks for object detection and NLP tasks.
  • The datasets utilize rigorous annotation protocols with YOLO-style and COCO-JSON formats, ensuring quality labeling across diverse classroom scenarios.
  • Applications include automated classroom analytics, enhanced engagement modeling, and improved detection algorithms through attention modules and transformer architectures.

The term SCB-Dataset refers to several distinct, technically significant datasets across multiple domains, most notably education-focused behavior detection in classroom scenarios and natural language processing (specifically English–Thai machine translation). The predominant usage in current literature, with broad impact on computer vision for education, is the Student Classroom Behavior Dataset. This family of datasets provides large-scale, real-world, multi-class annotations of classroom behaviors, establishing public benchmarks and enabling empirical advances in automated classroom analytics.

1. Taxonomy and Variants

SCB-Dataset as a term primarily refers to a series of publicly released datasets targeting the fine-grained recognition of student (and sometimes teacher) behaviors via object detection or spatio-temporal models in classroom imagery. Distinct variants include:

| Variant | Size / Annotations | Classes | Notable Paper / Release Year |
|---|---|---|---|
| SCB-Dataset (YOLOv7 version) | 4,200 images, 18,400 boxes | Hand-raising, reading, writing (3) | (Yang, 2023) |
| SCB-Dataset (BRA + YOLOv7) | 4,001 images, 11,248 boxes | Standing, sitting, speaking, listening, walking, raising hands, reading, writing (8) | (Yang et al., 2023) |
| SCB-Dataset3 | 5,686 images, 45,578 boxes | Hand-raising, reading, writing, using phone, bowing head, leaning table (6) | (Yang et al., 2023) |
| SCB-ST-Dataset4 | 757,265 images, 25,810 boxes | Hand-raising, reading, writing (3) (spatio-temporal clips) | (Yang et al., 2023) |
| SCBehavior / SCB-DETR Dataset | 1,346 images, 9,911 boxes | Writing, reading, looking up, turning head, raising hand, standing, discussing (7) | (Wang et al., 10 Oct 2024) |
| SCB-Dataset (Student+Teacher) | 13,330 images, 122,977 boxes | 19 behaviors (student+teacher); 12 detection, 14 classification | (Yang, 2023) |
| SCB Synthesis (for text editing) | 1,000,000 synthetic images | Style, content, background (text STE) | (Bao et al., 17 Nov 2025) |
| SCB (Seeing Culture Benchmark) | 1,065 images, 3,178 MCQ | Visual reasoning, segmentation (cultural artifacts) | (Satar et al., 20 Sep 2025) |
| SCB-MT-EN-TH-2020 | 1,000,000+ parallel segments | N/A (MT corpus: English–Thai) | (Lowphansirikul et al., 2020) |

The term "SCB-Dataset" is thus highly overloaded; in education-centric vision, it universally denotes Student Classroom Behavior datasets, while in other contexts (e.g., (Bao et al., 17 Nov 2025, Satar et al., 20 Sep 2025, Lowphansirikul et al., 2020)) it is either part of a STE data synthesis construct, a cultural reasoning VQA benchmark, or denotes the "SCB" corpus for English–Thai MT.

2. Dataset Creation and Annotation Protocols

The canonical SCB-Dataset (Yang, 2023, Yang et al., 2023, Yang et al., 2023, Yang et al., 2023, Yang, 2023, Wang et al., 10 Oct 2024) is constructed from frame-level annotation of real-world classroom videos sourced from platforms such as bjyhjy, 1s1k, youke.qlteacher, and youke-smile.shec. Key aspects:

  • Sampling Strategy: Representative frames (3–15 per video) are drawn across a wide spectrum of camera angles (front/side/back), seating densities (occluded/dense), and learning stages (kindergarten to high school).
  • Annotation Guidelines: Bounding boxes are drawn tightly around each individual exhibiting a behavior of interest. For hand-raising, boxes include the hand and adjacent torso; reading/writing boxes encompass the upper body and the book/desk. Ambiguous or low-visibility instances (less than 20% visible) are excluded. Every annotation undergoes dual review to ensure label accuracy (Yang, 2023).
  • Label Distribution: SCB-Dataset3 (Yang et al., 2023) expands to six major behaviors, with detailed counts (e.g., reading: 18,667, hand-raising: 11,213).
  • Format: YOLO-style text files or COCO-JSON are prevalent; each image’s annotations are delivered as one file specifying the class index and normalized box coordinates (Yang, 2023, Yang et al., 2023, Yang et al., 2023, Yang, 2023); see the parsing sketch after this list.
  • Advanced Datasets: SCB-ST-Dataset4 (Yang et al., 2023) introduces automated spatio-temporal expansion from image datasets: per-frame labels from a seed image are propagated across video segments to construct short behavior clips, minimizing manual effort.
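
A minimal sketch of reading one such YOLO-style label file is shown below; the file name and class ordering are illustrative assumptions rather than part of any released SCB-Dataset distribution.

```python
# Minimal sketch: parsing one YOLO-style annotation file as described above.
# CLASS_NAMES and the example path are hypothetical, not part of the release.
from pathlib import Path

CLASS_NAMES = ["hand_raising", "reading", "writing"]  # assumed 3-class ordering

def load_yolo_labels(label_path):
    """Each line: '<class_id> <x_center> <y_center> <width> <height>',
    with box coordinates normalized to [0, 1] relative to the image size."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        if not line.strip():
            continue
        class_id, xc, yc, w, h = line.split()
        boxes.append({
            "behavior": CLASS_NAMES[int(class_id)],
            "x_center": float(xc),
            "y_center": float(yc),
            "width": float(w),
            "height": float(h),
        })
    return boxes

# Usage (hypothetical file name):
# print(load_yolo_labels("labels/classroom_0001.txt"))
```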

3. Dataset Splits, Balancing, and Class Distribution

Splitting is typically 80% training, 20% validation (Yang, 2023, Yang et al., 2023, Yang, 2023). No explicit test set is given except in SCBehavior (Wang et al., 10 Oct 2024), which reports train/val/test (6,413/2,565/933 boxes). Class imbalance is persistent; for instance, in (Yang, 2023), reading is under-represented relative to writing and hand-raising. Balancing is achieved via class-weighted losses:

$$w_c = \frac{1}{n_c}$$

where $n_c$ is the number of annotated instances of class $c$.
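
As a minimal sketch, these inverse-frequency weights can be computed from per-class instance counts and passed to a class-weighted loss; the "writing" count and the use of PyTorch's CrossEntropyLoss below are illustrative assumptions, not details reported in the papers.

```python
# Minimal sketch of inverse-frequency weighting w_c = 1 / n_c.
# Reading and hand-raising counts follow the SCB-Dataset3 figures quoted above;
# the "writing" count is assumed for illustration.
import torch
import torch.nn as nn

class_counts = {"reading": 18667, "hand_raising": 11213, "writing": 9500}

weights = torch.tensor([1.0 / n for n in class_counts.values()])
weights = weights / weights.sum()  # optional: normalize so weights sum to 1

# One common way to apply w_c: a class-weighted cross-entropy loss.
criterion = nn.CrossEntropyLoss(weight=weights)
```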

In more complex datasets (Yang, 2023), up/down-sampling is employed during frame extraction to increase rare class frequency. Final splits, especially for classification tasks, are explicit and stratified ((Yang, 2023), Table 3).
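
A minimal sketch of such a stratified 80/20 split, assuming one dominant behavior label per image (the file names and labels below are illustrative):

```python
# Minimal sketch of an 80/20 stratified split using scikit-learn.
# Toy data only; real splits would use the dataset's per-image behavior labels.
from sklearn.model_selection import train_test_split

image_paths = [f"img_{i:04d}.jpg" for i in range(15)]
labels = ["reading"] * 7 + ["writing"] * 4 + ["hand_raising"] * 4

train_imgs, val_imgs, train_labels, val_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=0
)
print(len(train_imgs), len(val_imgs))  # 12 training images, 3 validation images
```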

4. Evaluation Protocols and Metrics

Detection evaluation is standardized on precision, recall, and mean Average Precision (mAP) at specified Intersection-over-Union (IoU) thresholds:

$$\text{Precision} = \frac{TP}{TP+FP} \qquad \text{Recall} = \frac{TP}{TP+FN}$$

$$\text{AP}_c(t) = \int_0^1 P_c(r)\,dr, \qquad \text{mAP@0.5} = \frac{1}{N_\text{classes}}\sum_{c=1}^{N_\text{classes}} \text{AP}_c(0.5)$$

mAP@0.5:0.95 (COCO-style) sweeps IoU thresholds from 0.50 to 0.95. For behavior similarity, (Yang et al., 2023) and (Yang et al., 2023) introduce the Behavior Similarity Index (BSI), quantifying the visual overlap between classes. For classification, macro-averaged F1, precision, and recall are reported (Yang, 2023).
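
For concreteness, the sketch below approximates per-class AP as the area under the precision–recall curve and averages the per-class values into mAP@0.5; the detection scores, TP/FP flags, and the two assumed AP values are illustrative, and a full evaluator would first match predictions to ground truth at the chosen IoU threshold.

```python
# Minimal sketch of the metrics above. A full evaluator would first match
# predictions to ground truth at IoU >= 0.5 to mark each detection TP or FP;
# here the TP/FP flags, scores, and two per-class AP values are illustrative.
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP_c = area under the precision-recall curve built from score-sorted
    detections (simple step integration; no VOC/COCO interpolation)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    deltas = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(deltas * precision))

# Example for one class: 5 detections against 4 ground-truth boxes.
ap_reading = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5], is_tp=[1, 1, 0, 1, 0], num_gt=4
)

# mAP@0.5 is the mean of per-class AP values (the other two are assumed here).
ap_per_class = {"reading": ap_reading, "hand_raising": 0.87, "writing": 0.74}
map_50 = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP@0.5 = {map_50:.3f}")
```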

5. Baseline Results and Model Comparisons

SCB-Dataset benchmarks have driven empirical advances in detection accuracy across multiple network architectures:

| Model Variant | mAP@0.5 (%) | Key Details | Source Paper |
|---|---|---|---|
| YOLOv7 (baseline) | 77.2 | Standard, SCB-Dataset (3 classes) | (Yang, 2023) |
| YOLOv7 + BRA | 78.8 | Bi-level Routing Attention | (Yang, 2023) |
| YOLOv7 + Wise-IoU v3 | 79.0 | +1.8% over baseline with improved IoU loss | (Yang, 2023) |
| YOLOv7-BRA | 87.1 | 8-class, strong fusion with SlowFast | (Yang et al., 2023) |
| YOLOv7x (SCB3-S) | 80.3 | Best in SCB3-S (3-class) | (Yang et al., 2023) |
| YOLOv7x (SCB-ST) | 86.8 | Hand-raising class, spatio-temporal dataset | (Yang et al., 2023) |
| SlowFast | 96.9 (hand-raising) | Highest AP for hand-raising, weaker for writing | (Yang et al., 2023) |
| YOLOv7 (teacher behaviors) | 94.0 | Largest scale, 12 detection classes | (Yang, 2023) |
| SCB-DETR | 62.6 (mAP) | Multi-scale deformable transformer, 1.5% gain | (Wang et al., 10 Oct 2024) |

A prominent trend is that models augmented with attention modules (e.g., BRA), improved bounding-box regression losses (Wise-IoU), multi-model fusion, or transformer-based architectures outperform vanilla object detectors, particularly in crowded and occluded scenes.

6. Availability and Access

All variants of SCB-Dataset are openly available for academic use.

Some variants (e.g., SCB Synthesis for STE (Bao et al., 17 Nov 2025)) provide code for synthetic data generation and scripts for complex multi-attribute group construction.

7. Research Impact and Future Directions

The SCB-Dataset series fills a critical void in large-scale, behavior-resolved datasets for classroom analytics, enabling:

  • Benchmarking novel detection architectures (YOLOv7, multi-scale transformer, multi-model fusion) in education.
  • Fair comparison on realistic, noisy, and occluded data spanning student age, pose, and class context.
  • Progress in advanced applications including engagement modeling, teacher effectiveness feedback, and context-aware content recommendation.

Remaining challenges include class imbalance, under-representation of rare or confusable behaviors (notably “writing,” “bowing head,” “leaning table”), and the need for spatio-temporal action annotation beyond per-frame boxes. New directions cited in the literature include the expansion of annotated university scenes (Yang et al., 2023), joint temporal tracking, class-balanced training objectives, and integration with large vision–language models for holistic multi-modal AI in education.

Other "SCB" datasets are domain-specific but methodologically relevant: scene text editing disentanglement (Bao et al., 17 Nov 2025), visual-cultural reasoning (Satar et al., 20 Sep 2025), and high-resource MT (Lowphansirikul et al., 2020) further demonstrate the impact of careful multi-attribute annotation and robust open dataset construction.


References:

  • "Student Classroom Behavior Detection based on Improved YOLOv7" (Yang, 2023)
  • "Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion" (Yang et al., 2023)
  • "SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior" (Yang et al., 2023)
  • "Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion" (Yang et al., 2023)
  • "SCB-Dataset: A Dataset for Detecting Student and Teacher Classroom Behavior" (Yang, 2023)
  • "Multi-Scale Deformable Transformers for Student Learning Behavior Detection in Smart Classroom" (Wang et al., 10 Oct 2024)
  • "TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing" (Bao et al., 17 Nov 2025)
  • "Seeing Culture: A Benchmark for Visual Reasoning and Grounding" (Satar et al., 20 Sep 2025)
  • "scb-mt-en-th-2020: A Large English-Thai Parallel Corpus" (Lowphansirikul et al., 2020)