TrackStudio: Markerless Motion Tracking Toolkit
- TrackStudio is a GUI-based, integrated toolkit for markerless motion tracking in behavioral, clinical, and biomechanical research.
- Its modular, end-to-end pipeline incorporates 2D keypoint detection and 3D triangulation to achieve synchronized and reproducible motion capture.
- The system’s user-friendly design and extensibility empower non-expert users to perform calibration, feature extraction, and visualization with high accuracy.
TrackStudio is a GUI-based, integrated toolkit designed to facilitate markerless motion tracking in behavioral, clinical, and biomechanical research. By consolidating mature open-source modules within a modular, end-to-end pipeline, TrackStudio enables users—regardless of programming expertise—to execute 2D and 3D tracking, calibration, preprocessing, feature extraction, and visualization tasks directly from raw video data. The system is optimized for both non-expert usability and robust performance across diverse experimental conditions using webcams or machine-vision cameras, addressing a longstanding gap in accessible, reliable motion tracking solutions.
1. System Architecture and Modular Pipeline
TrackStudio orchestrates a seven-stage data pipeline that forms its architectural backbone. Each stage is implemented as an interchangeable module, facilitating extensibility:
- Video Acquisition: Raw camera streams are ingested via OBS profiles, supporting both consumer-grade webcams and high-resolution machine-vision cameras.
- Synchronization: Alignment across multiple views is achieved either through hardware timestamps (e.g., LabRecorder) or, preferably, via an LED indicator that toggles state synchronously across all cameras; detecting the on/off sequence in each view enables temporal alignment in post-processing (see the sketch after this list).
- Camera Calibration: Calibration utilizes ChArUco boards, with OpenCV routines wrapped by Anipose to estimate each camera's intrinsic matrix $K_c$ and extrinsic parameters $[R_c \mid t_c]$, which together define the projection $\Pi_c(\mathbf{X}) = K_c [R_c \mid t_c]\,\mathbf{X}$ of a world point $\mathbf{X}$ to pixel coordinates.
- 2D Keypoint Detection: MediaPipe Hands is deployed per frame per camera, detecting 2D coordinates $\mathbf{x}_{c,k,t}$ for each landmark $k$ in view $c$ at time $t$.
- 3D Triangulation: 2D detections are fused into 3D points by minimizing the total reprojection error across cameras, $\hat{\mathbf{X}}_{k,t} = \arg\min_{\mathbf{X}} \sum_{c=1}^{C} \lVert \mathbf{x}_{c,k,t} - \Pi_c(\mathbf{X}) \rVert^2$. Triangulation is performed via Anipose's least-squares solver (“triangulate”).
- Preprocessing and Feature Extraction: Modules can compute log dimensionless jerk (LDJ; see the sketch at the end of this section), joint angles, limb segment volumes, velocity, and acceleration. Feature sets are designed for direct statistical analysis.
- Visualization: Final modules generate overlays, tiled 3D previews, and interactive trajectory plots. All outputs are formatted for publication or secondary analysis.
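To make the LED approach concrete, here is a minimal sketch of post-hoc onset detection, assuming OpenCV and a user-chosen region of interest around the LED; the helper name, ROI, and threshold are illustrative, not TrackStudio's internal code:

```python
import cv2

def led_onset_frame(video_path, roi, threshold=128):
    """Index of the first frame whose mean brightness inside `roi`
    (x, y, w, h around the LED) exceeds `threshold`."""
    x, y, w, h = roi
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if gray[y:y + h, x:x + w].mean() > threshold:
            cap.release()
            return index
        index += 1
    cap.release()
    return None
```

Aligning the views then amounts to dropping each video's frames before its detected onset, so that frame 0 coincides with the LED turning on in every camera.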
Each stage is fully GUI-driven (Tkinter) and requires no text editing or code modifications. The workflow is organized into tiles (“Configure,” “Video Trimming,” “Camera Calibration,” “2D/3D Annotation,” “2D/3D Video Labeling”) with all operations mirrored in the user’s designated saving directory, keeping raw and processed data segregated.
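As an example of the feature-extraction stage, here is a minimal sketch of velocity-based log dimensionless jerk, following the standard formulation of Balasubramanian et al. (2015); the source does not specify TrackStudio's exact variant, so this form is an assumption:

```python
import numpy as np

def log_dimensionless_jerk(velocity, fs):
    """Velocity-based LDJ. `velocity` is a 1-D speed profile
    (e.g., 3D trajectory magnitude); `fs` is the sampling rate in Hz."""
    dt = 1.0 / fs
    duration = velocity.size * dt
    jerk = np.gradient(np.gradient(velocity, dt), dt)  # d^2 v / dt^2
    integral = np.sum(jerk ** 2) * dt                  # approximate integral of jerk^2
    return -np.log((duration ** 3 / velocity.max() ** 2) * integral)
```

Sign and normalization conventions vary across implementations, so absolute LDJ values are only comparable within one pipeline.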
2. Markerless Tracking Algorithms and Quantitative Metrics
The core tracking pipeline integrates MediaPipe’s hand-landmark regression, which applies a palm detector, a 21-point landmark regressor, and temporal smoothing per frame and camera to yield 2D coordinates $\mathbf{x}_{c,k,t}$. For 3D reconstruction, the known projection matrices $P_c = K_c [R_c \mid t_c]$ allow minimization of the total reprojection error over candidate 3D points $\mathbf{X}$, as in the triangulation objective above; a minimal sketch follows.
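The sketch below shows linear (DLT) triangulation for a single landmark seen in C cameras; Anipose's “triangulate” additionally applies nonlinear least-squares refinement, so treat this as conceptual rather than TrackStudio's actual solver:

```python
import numpy as np

def triangulate_dlt(points_2d, projections):
    """points_2d: (C, 2) pixel coordinates; projections: C matrices of shape 3x4."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])  # each view contributes two
        rows.append(v * P[2] - P[1])  # linear constraints on the 3D point
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                        # null-space solution, homogeneous coords
    return X[:3] / X[3]               # dehomogenize
```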
Reprojection residuals $e_t$ are computed per timepoint and rescaled from pixels to millimetres: $e_t = s \cdot \frac{1}{C} \sum_{c=1}^{C} \lVert \mathbf{x}_{c,t} - \Pi_c(\hat{\mathbf{X}}_t) \rVert$, where $s$ is the millimetres-per-pixel factor obtained from calibration.
Temporal stability is quantified using inter-frame correlation, calculated over 3D trajectory magnitudes sampled every 5 ms (excluding direction reversals). Pearson’s $r$ is computed between successive epochs $x$ and $y$: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$.
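A short sketch of this stability metric under simplifying assumptions (fixed epoch length, reversal exclusion omitted); the epoch length is a hypothetical parameter:

```python
import numpy as np

def interframe_correlation(magnitude, epoch_len):
    """Mean Pearson r between successive epochs of a 3D trajectory
    magnitude series sampled every 5 ms."""
    n = magnitude.size // epoch_len
    epochs = magnitude[:n * epoch_len].reshape(n, epoch_len)
    rs = [np.corrcoef(epochs[i], epochs[i + 1])[0, 1]
          for i in range(n - 1)]
    return float(np.mean(rs))
```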
These procedures yield quantitative and reproducible metrics for benchmarking tracking fidelity and system stability.
3. Calibration Protocols and Error Control
TrackStudio’s calibration window wraps Anipose’s “calibrate” command and leverages OpenCV’s solvePnP and bundle adjustment with ChArUco boards. The user specifies marker size, square size, and grid dimensions through the GUI. Calibration videos must show the board at several depths and angles (20 s with the board visible to all cameras; 40–60 s of close-ups per camera). The algorithm extracts board corners, solves for the extrinsics $R_c, t_c$ and lens distortion coefficients ($k_1, k_2, p_1, p_2, k_3$ in OpenCV’s model), and writes the configuration to a calibration.toml file. The process returns the mean reprojection error per frame; values below 0.8 px are required for sub-millimetre 3D accuracy.
A plausible implication is that meticulous calibration and adherence to the manual’s recommendations (e.g., printing the board at 100% on matte stock, ensuring high-contrast and occlusion-free imaging) are necessary to achieve near-optimal tracking precision.
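For reference, the calibration step that the GUI wraps can be approximated with aniposelib’s documented API; the board dimensions and file names below are placeholders that TrackStudio would normally populate from the GUI fields:

```python
from aniposelib.boards import CharucoBoard
from aniposelib.cameras import CameraGroup

# ChArUco board geometry, as entered in the calibration window
board = CharucoBoard(7, 10,                # squares along x and y
                     square_length=25,     # mm, measured on the printout
                     marker_length=18.75,  # mm
                     marker_bits=4, dict_size=50)

cgroup = CameraGroup.from_names(['camA', 'camB', 'camC'], fisheye=False)
cgroup.calibrate_videos([['calib-camA.mp4'],
                         ['calib-camB.mp4'],
                         ['calib-camC.mp4']], board)
cgroup.dump('calibration.toml')            # intrinsics, extrinsics, distortion
```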
4. Graphical User Interface Design and Usability
The system’s Python/Tkinter GUI divides operations into seven core panels aligning with pipeline stages. Upon launch, users select the raw-video directory, body part (left hand, right hand, or full body), file extension, and camera-suffix pattern (e.g., “-camA”). All subsequent actions—video trimming (manual or LED-driven), calibration, 2D/3D annotation, video labeling—are accessible via sequential buttons, but remain optional and logically ordered.
The workflow enforces separation of raw and processed data. The entire 3D tracking process can be initiated in under five clicks, requiring no programming or configuration file editing. This design significantly lowers operational barriers for non-expert users requiring markerless motion data in behavioral or biomechanical studies.
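An illustrative Tkinter skeleton of this tile layout (hypothetical; the actual TrackStudio GUI is considerably richer):

```python
import tkinter as tk

root = tk.Tk()
root.title("TrackStudio")
tiles = ["Configure", "Video Trimming", "Camera Calibration",
         "2D/3D Annotation", "2D/3D Video Labeling"]
for name in tiles:
    # Each tile launches one pipeline stage; stages stay optional and ordered.
    tk.Button(root, text=name, width=24,
              command=lambda n=name: print(f"run stage: {n}")).pack(pady=4)
root.mainloop()
```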
5. Performance Evaluation Under Diverse Conditions
TrackStudio was validated in three experimental setups:
| Setup | Participants | Cameras | Mean $r$ | Mean LDJ | 3D error $e$ (mm) | % frames > 10 mm |
|---|---|---|---|---|---|---|
| Seated (object manipulation) | 45 | 3 × Logitech Brio @60 Hz | 0.980±0.004 | 10.55±0.36 | 13.6±10 | 0.58±0.17 |
| Supine MRI mockup | 26 | 4 × webcams (low light, head-bore) | 0.999±0.0001 | 8.36±0.33 | 8.95±4.1 | 0.55±0.16 |
| Mixed (multi-day) | 5 | 5 × FLIR Blackfly S (hand/face/arms) | 0.999±0.0001 | — | hand: 4.41±0.53, face: 9.52±3.0, arms: 17.6±5.65 | <0.01 |
Mean inter-frame correlations exceeded 0.98 in all cases. Mean triangulated hand-position error ranged from 4.41 mm (machine-vision cameras) to 13.6 mm (consumer webcams), with errors above 10 mm occurring in fewer than 0.6% of frames. These values compare favourably with reported benchmarks in the markerless tracking literature, e.g., the 13–34 mm mean per-joint error of "Learnable Triangulation" [Iskakov et al. 2019]. The toolkit retained stable tracking under occlusion, low light, MRI bore constraints, and unconstrained everyday object use.
6. Extensibility to Other Body Regions and Model Integration
The architecture’s modularity permits extension beyond hand tracking. Switching the tracked body part amounts to switching the 2D model: MediaPipe Hands for hands, OpenPose for body or face keypoints. In multi-day experiments, OpenPose’s face-landmark model enabled triangulation of 68 facial points, while shoulder and elbow tracking combined MediaPipe’s hand model with a dedicated upper-arm keypoint model. Planned support for MoveNet and HandDAGT suggests further extensibility to additional body regions.
No code or pipeline logic changes are required by the user when switching target body parts; the selection is conducted entirely via the GUI.
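Conceptually, the body-part selector can be thought of as a registry mapping GUI choices to 2D backends; the mapping and helper below are illustrative, not TrackStudio’s actual code:

```python
# Hypothetical registry: GUI body-part choice -> 2D keypoint backend
DETECTOR_BY_PART = {
    "left hand":  "mediapipe_hands",
    "right hand": "mediapipe_hands",
    "full body":  "openpose_body",
    "face":       "openpose_face",   # 68-point facial landmarks
}

def backend_for(body_part):
    """Resolve the 2D model used by the annotation stage."""
    return DETECTOR_BY_PART[body_part]
```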
7. Practical Implementation and Workflow Pseudocode
TrackStudio provides a comprehensive user guide focusing on three pillars: synchronization, calibration, and optimal recording practices. Recommended practices include LED-based synchronization (one LED per view, dark background, high contrast, alternating on/off for ~1 minute), multi-angle calibration, fast storage and interfaces (SSD, USB 3.x), high and consistent frame rates (60–240 Hz), and avoidance of patterned clothing to reduce keypoint confusion.
A high-level representation of the workflow, as provided in the toolkit, is:
function TRACKSTUDIO_PIPELINE(raw_dir, save_dir, body_part):
    CONFIGURE(raw_dir, save_dir, body_part)
    # Optional LED-based trimming and synchronization
    for each trial_folder in raw_dir:
        VIDEOS = load_videos(trial_folder)
        if use_LED_sync: VIDEOS = trim_and_sync(VIDEOS)
    # Calibration is required only when 3D output is requested
    if calibrate_3D:
        CALIB_MODEL = calibrate_cameras(calib_videos, board_params)
    # Per-trial annotation, triangulation, features, visualization
    for each trial in save_dir/videos-raw:
        KEYPOINTS2D = mediapipe_annotate(trial, body_part)
        visualize_2d(trial, KEYPOINTS2D)
        KEYPOINTS3D = FEATURES = none      # stay empty in 2D-only runs
        if calibrate_3D:
            KEYPOINTS3D = anipose_triangulate(KEYPOINTS2D, CALIB_MODEL)
            FEATURES = extract_motion_features(KEYPOINTS3D)
            visualize_3d(trial, KEYPOINTS3D)
        SAVE_RESULTS(KEYPOINTS2D, KEYPOINTS3D, FEATURES)
This schematic reflects a direct and accessible end-to-end path from video acquisition to publication-ready motion features and visualizations. Each processing block encapsulates either an established open-source library or a custom module, ensuring accessibility and future adaptability.
8. Significance and Prospective Directions
TrackStudio constitutes a turnkey solution for reliable markerless tracking in environments ranging from laboratory object manipulation to clinical MRI contexts. Its validated performance, extensibility, and user-centered interface render it a significant practical tool for researchers lacking access to marker-based systems or custom software solutions. Forthcoming releases are expected to provide macOS and Linux support as well as integrate additional 2D pose models. A plausible implication is that TrackStudio will accelerate the adoption of markerless motion capture in both applied and research domains by standardizing best practices and lowering technical barriers for high-quality data acquisition.