INTER-EXPLIC Corpus: Multimodal Interaction Data

Updated 26 January 2026
  • INTER-EXPLIC corpus is a comprehensive multimodal dataset capturing French classroom interactions with synchronized video, audio, and sensor data.
  • It segments explicative and collaborative sequences into a tripartite structure (ouverture, noyau, clôture) and annotates them across verbal, paraverbal, and non-verbal tiers.
  • The corpus supports manual analysis and machine learning, advancing studies in teaching gestures and interactive pedagogical phenomena.

The INTER-EXPLIC corpus is a research-grade collection of multimodal classroom interaction data focusing on explicative and collaborative sequences in French as a Foreign Language (FLE) and French as a First Language (FLM) educational contexts. Developed at Université de Toulouse II in 2006 and maintained through the TechnéLAB infrastructure at Université de Poitiers, INTER-EXPLIC is characterized by its rich, synchronized capture of speech, gesture, prosody, gaze, and digital traces. Its primary aim is to enable rigorous, granular analyses of the semiotics of teaching and learning, serving both as a reference for manual annotation and as a testbed for machine learning approaches targeting the automatic processing of complex interactive pedagogical phenomena (Rançon et al., 19 Jan 2026).

1. Scope, Composition, and Recording Context

INTER-EXPLIC comprises approximately 30 hours of recorded authentic classroom sessions, corresponding to around 25 distinct teaching events primarily involving FLE/FLES instructors and students ranging from B1 to C1 proficiency. The data include over 100 manually coded explanatory episodes, with each such sequence annotated in a tripartite scheme: ouverture (problematisation), noyau (core explanation), and clôture (reception or ratification). The corpus encompasses explanation sequences, collaborative meaning-negotiation, lexical reformulation, and multiparty question–answer rounds. Settings include TechnéLAB’s activity room (7×9 m), outfitted for advanced multimodal capture (Rançon et al., 19 Jan 2026).
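
For downstream processing, such an episode can be represented as a small structured record. The sketch below is a minimal illustration only; the field names and the session identifier are hypothetical, not the corpus's release format.

```python
# Minimal sketch of one explanatory episode and its tripartite phases.
# Field names and the session identifier are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phase:
    label: str      # "ouverture", "noyau", or "clôture"
    start_s: float  # onset, seconds from session start
    end_s: float    # offset, seconds from session start

@dataclass
class ExplanatorySequence:
    session_id: str
    phases: List[Phase] = field(default_factory=list)

    def duration(self) -> float:
        """Total duration of the episode in seconds."""
        return max(p.end_s for p in self.phases) - min(p.start_s for p in self.phases)

# Example: one sequence with the three canonical phases
seq = ExplanatorySequence(
    session_id="FLE_B2_session07",  # hypothetical identifier
    phases=[
        Phase("ouverture", 120.0, 122.0),  # problematisation
        Phase("noyau", 122.0, 138.0),      # core explanation
        Phase("clôture", 138.0, 141.0),    # reception/ratification
    ],
)
print(seq.duration())  # 21.0
```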

2. Multimodal Data Acquisition and Infrastructure

Data are acquired using a multi-tiered, tightly synchronized system. Video is captured from six fixed HD cameras and two PTZ cameras, ensuring coverage of instructor, learners, and group interactions, with synchronization via SMPTE timecode. Audio streams are separately recorded using lavalier microphones for all principal participants and ceiling-mounted directional microphones for ambient sound, with all streams digitized at 48 kHz. Eye-tracking (Tobii Pro Glasses) is deployed on both instructors and selected students, providing temporally aligned gaze vectors. Locomotor and spatial data are collected through BLE (Bluetooth Low Energy) bracelets, while digital traces from interactive whiteboards are also logged. Backup is performed via automatic FTP to redundant university servers (Rançon et al., 19 Jan 2026).
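
As an illustration of how SMPTE-timecoded video can be aligned with 48 kHz audio, the sketch below converts a timecode string to seconds and maps it to an audio sample index. The 25 fps frame rate and the example timecode are assumptions for illustration, not documented corpus parameters.

```python
# Minimal alignment sketch: SMPTE timecode -> seconds -> audio sample index.
# Frame rate (25 fps) and the example timecode are assumptions.
def smpte_to_seconds(tc: str, fps: int = 25) -> float:
    """Convert an SMPTE timecode 'HH:MM:SS:FF' to seconds."""
    hh, mm, ss, ff = (int(x) for x in tc.split(":"))
    return hh * 3600 + mm * 60 + ss + ff / fps

def seconds_to_audio_sample(t: float, sample_rate: int = 48_000) -> int:
    """Map a time offset to the corresponding audio sample index."""
    return round(t * sample_rate)

tc = "00:02:03:12"                 # hypothetical camera timecode
t = smpte_to_seconds(tc)           # 123.48 s
print(seconds_to_audio_sample(t))  # 5927040
```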

3. Annotation Conventions and Transcription Schemes

Annotation follows an ICOR-derived transcription scheme, distinguishing three principal tiers: VERB (verbal), PARA (paraverbal, i.e., prosody), and NONV (non-verbal modalities). Each annotation is time-aligned (mm:ss.d) to the master time-code. Speaker turns are marked (e.g., MIC for instructor), with square brackets denoting overlapping speech and chevrons marking co-speech gestures. Gestural and postural events use double parentheses, per Mondada, for descriptive notes. Silences are measured and marked as SIL (t), and prosodic focus is noted by underlining or capitalization.

A multi-level annotation table details the segmentation of verbal, paraverbal, and non-verbal resources:

| Label | Modality   | Description                                       |
|-------|------------|---------------------------------------------------|
| VERB  | Verbal     | Transcribed words, disfluencies, pauses           |
| PARA  | Paraverbal | Prosody: pitch, boundary tones, rhythm, pauses    |
| NONV  | Non-verbal | Kinesic, proxemic, iconographic support           |
| DEI   | NONV       | Deictic: pointing gesture                         |
| ICO   | NONV       | Iconic: mimic gesture of form or action           |
| MET   | NONV       | Metaphoric: gesture for an abstract notion        |
| EMB   | NONV       | Emblem: conventional gesture                      |
| BAT   | NONV       | Beat: rhythmic gesture, no semantic value         |
| BUT   | NONV       | Butterworth: gesture of lexical search/hesitation |
| SIL   | PARA       | Measured silence                                  |
| ACC   | PARA       | Accentuation (focus/stress marking)               |
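
For machine processing, this label inventory can be encoded as a simple lookup. The sketch below is an assumed mapping derived from the table above, not an official schema file distributed with the corpus.

```python
# Assumed lookup table mapping annotation labels to their top-level tier,
# derived from the label inventory above.
ANNOTATION_TIERS = {
    "VERB": "VERB",  # transcribed words, disfluencies, pauses
    "PARA": "PARA",  # prosody: pitch, boundary tones, rhythm, pauses
    "NONV": "NONV",  # kinesic, proxemic, iconographic support
    "DEI": "NONV",   # deictic (pointing) gesture
    "ICO": "NONV",   # iconic gesture of form or action
    "MET": "NONV",   # metaphoric gesture for an abstract notion
    "EMB": "NONV",   # emblem (conventional gesture)
    "BAT": "NONV",   # beat (rhythmic, no semantic value)
    "BUT": "NONV",   # Butterworth (lexical search/hesitation)
    "SIL": "PARA",   # measured silence
    "ACC": "PARA",   # accentuation (focus/stress marking)
}

def tier_of(label: str) -> str:
    """Return the top-level tier (VERB/PARA/NONV) for a label."""
    return ANNOTATION_TIERS[label]
```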

Example annotation schema:

  • [VERB] qu’est-ce que ça veut dire (“what does that mean”)
  • [PARA.SIL] measured silence
  • [NONV.DEI] <((pointe l’écran avec index droit))> (“points at the screen with the right index finger”)
  • [NONV.ICO] iconic gesture

These conventions operationalize multi-modal and conversation-analytic frameworks to support both fine-grained manual examination and computational feature extraction (Rançon et al., 19 Jan 2026).
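
As a sketch of how such feature extraction might proceed, the code below parses bracketed annotation lines of the form shown above into (tier, subtype, content) triples and tallies them per tier. The line syntax is an assumption modelled on the example schema, not the official export format.

```python
# Minimal parsing sketch; the bracketed line syntax is an assumption
# modelled on the example annotation schema above.
import re
from collections import Counter

LINE_RE = re.compile(
    r"^\[(?P<tier>VERB|PARA|NONV)(?:\.(?P<subtype>\w+))?\]\s*(?P<content>.*)$"
)

def parse_line(line: str):
    """Return (tier, subtype, content) for one annotation line, or None."""
    m = LINE_RE.match(line.strip())
    return None if m is None else (m.group("tier"), m.group("subtype"), m.group("content"))

lines = [
    "[VERB] qu'est-ce que ça veut dire",
    "[PARA.SIL] measured silence",
    "[NONV.DEI] <((pointe l'écran avec index droit))>",
]
parsed = [p for p in (parse_line(l) for l in lines) if p]
print(Counter(tier for tier, _, _ in parsed))  # Counter({'VERB': 1, 'PARA': 1, 'NONV': 1})
```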

4. Analytical Frameworks and Theoretical Foundations

INTER-EXPLIC adopts a tripartite explanatory discourse model (ouverture–noyau–clôture) based on Baker (1992), Barbieri et al. (1990), and Fasel Lauzon (2014). It extends conversation analysis (Kerbrat-Orecchioni 2001, Mondada 1995) for microstructure and overlap in speaking turns, incorporates the didactics of gesture (Tellier 2008, Rançon 2011) to classify teacher PMG (posturo-mimo-gestualité), and leverages the multimodal temporality framework (Mondada 2010) for action sequencing. Gesture classification draws on classic co-verbal gesture typologies (Cosnier & Kerbrat-Orecchioni 1987; Colletta 2000), enriched by Ferré (2004) and Mondada (2008), segmenting gestures into deictic, iconic, metaphoric, emblematic, beat, and hesitation-related (Butterworth) categories (Rançon et al., 19 Jan 2026).

5. Corpus Statistics and Empirical Profiles

The corpus totals approximately 30 hours, spanning 25 sessions and 105 manually identified explanatory sequences, with average durations of 2 s (opening), 16 s (nucleus), and 3 s (closure) per sequence. Aggregate annotation counts are:

  • Verbal (VERB): 2,350 (43%)
  • Paraverbal (SIL/ACC): 1,200 (22%)
  • Non-verbal (gestures): 1,900 (35%)
  • Total: 5,450 annotated events
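
These shares follow directly from the raw counts; a quick arithmetic check:

```python
# Verify that the reported modality percentages match the raw counts.
counts = {"VERB": 2350, "PARA": 1200, "NONV": 1900}
total = sum(counts.values())  # 5450
shares = {k: round(100 * v / total) for k, v in counts.items()}
print(total, shares)  # 5450 {'VERB': 43, 'PARA': 22, 'NONV': 35}
```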

Annotated examples underscore the fusion of modalities. For instance, one sequence features synchronous verbal inquiry, measured silences, deictic and iconic gestures, and active scaffolding of conceptual understanding through gesture and speech. Another excerpt documents collaborative paraphrase with gesture trajectories mirroring the pedagogical function, evidencing the corpus’s fitness for both microanalytic studies and automated learning pipelines (Rançon et al., 19 Jan 2026).

6. Accessibility and Research Applications

INTER-EXPLIC is not freely available but may be accessed upon request to specified contacts at TECHNE (EA 6316) or the Maison des Sciences de l’Homme de Poitiers. Use cases include:

  • Automatic classification of discourse segments (opening/nucleus/closure).
  • Recognition and typology of co-verbal gestures (deictic, iconic, metaphoric).
  • Extraction of multimodal features for predicting student comprehension.
  • Comparative studies of didactic explicitation strategies (FLE vs. FLM).

The design intentionally aligns with machine learning needs, offering synchronized, multi-layered data and formal annotations compatible with supervised or semi-supervised algorithms. A plausible implication is that INTER-EXPLIC can catalyze research at the intersection of multimodal learning analytics, teaching gesture studies, and discourse-oriented AI systems (Rançon et al., 19 Jan 2026).
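
As an illustration of the first use case, the sketch below trains a small supervised baseline that labels segments as ouverture/noyau/clôture from simple per-segment features. The feature set and toy values are assumptions for illustration, not the corpus's released features or any published baseline.

```python
# Toy supervised baseline for discourse-segment classification.
# Features per segment (assumed): [duration_s, n_words, n_gestures, n_silences]
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = [
    [2.1, 5, 1, 0],    # opening
    [15.8, 42, 6, 2],  # nucleus
    [3.0, 7, 1, 1],    # closure
    [1.8, 4, 0, 0],    # opening
    [17.2, 51, 8, 3],  # nucleus
    [2.6, 6, 2, 0],    # closure
]
y = ["ouverture", "noyau", "clôture"] * 2

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=2)  # 2-fold cross-validation on toy data
print(scores.mean())
```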

References (1)
