PyGress: Python Proficiency Visualization
- PyGress is a web-based tool that automatically analyzes and visualizes code proficiency by mapping Python constructs to CEFR levels from A1 to C2.
- It uses a full-stack pipeline with PyDriller, pycefr, and Plotly to extract commit histories, compute delta-scores, and generate interactive visualizations.
- Empirical evaluations on OSS projects such as django-silk and pandas-profiling reveal proficiency trends and highlight opportunities for AST diffing optimizations.
PyGress is a web-based system for automatically analyzing and visualizing the progression of code proficiency in Python open-source software (OSS) projects. It employs the pycefr analyzer to evaluate Python source code constructs according to the Common European Framework of Reference (CEFR), quantifying developer proficiency from beginner (A1) to advanced (C2) levels. By submitting a GitHub repository, users obtain interactive, project- and contributor-specific visualizations of proficiency distributions and their evolution over time, enabling a nuanced assessment of expertise dynamics in collaborative Python OSS environments.
1. System Architecture and Workflow
PyGress is implemented as a full-stack, end-to-end pipeline comprising five principal phases: repository ingestion, commit-history extraction, code preprocessing and diffing, proficiency scoring, and interactive visualization. User interaction begins with submission of a GitHub repository URL through a Flask-based front end. The back end invokes PyDriller to traverse the Git commit history, extract commit metadata (author, timestamp, file changes), and, for each modified Python file, generate "before" and "after" code snapshots. These snapshot pairs are analyzed by pycefr, which parses the corresponding ASTs and classifies Python constructs according to CEFR levels (A1–C2).
The resulting counts are processed to compute per-commit delta-scores, aggregating new construct insertions per level and tracking them by contributor and time window. Aggregated scores are transformed into interactive charts via Plotly and presented in the browser.
The pipeline is represented as follows:
```text
[User Browser]
   |
   → Flask Front-end
   |
   → Back-end Controller
   |
   → PyDriller (commit extraction)
   |
   → Code Diffing ("before"/"after" snapshots)
   |
   → pycefr Engine (proficiency scoring)
   |
   → Aggregator (time series, smoothing)
   |
   → Visualization Module (Plotly)
   ↓
[User Browser: Interactive Charts]
```
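As a concrete illustration of the front half of this pipeline, the sketch below shows a minimal Flask submission endpoint. The route name, template, and the `run_pipeline` stub are illustrative assumptions, not PyGress's actual API; in the real system the stub would invoke the PyDriller/pycefr stages described above.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

def run_pipeline(repo_url):
    # Placeholder for the real pipeline: PyDriller mining -> diffing ->
    # pycefr scoring -> aggregation. Returns toy level counts here.
    return {"A1": 10, "B1": 4, "C1": 1}

RESULT_TEMPLATE = "<h1>Proficiency</h1><p>{{ scores }}</p>"

@app.route("/analyze", methods=["POST"])
def analyze():
    repo_url = request.form["repo_url"]   # GitHub repository URL from the form
    scores = run_pipeline(repo_url)
    return render_template_string(RESULT_TEMPLATE, scores=scores)
```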
2. CEFR-Based Proficiency Modeling
The core of PyGress's analytic capability is its CEFR-aligned proficiency model, implemented via pycefr. Python constructs are mapped to CEFR levels as follows:
| CEFR Level | Construct Examples |
|---|---|
| A1–A2 | Basic (if, for, nested lists) |
| B1–B2 | Intermediate (break, list comprehensions) |
| C1–C2 | Advanced (generators, metaclasses) |
Given a file version $v$, let $c_\ell(v)$ denote the count of constructs at level $\ell \in \{A1, \dots, C2\}$. For each commit $k$ modifying file $f$, PyGress calculates:

$$\Delta c_\ell^{(k)} = \max\bigl(c_\ell(v_{\text{after}}) - c_\ell(v_{\text{before}}),\ 0\bigr)$$

This expression captures only added constructs (deletions are clamped to zero). The per-commit proficiency vector is:

$$\mathbf{p}^{(k)} = \bigl(\Delta c_{A1}^{(k)}, \Delta c_{A2}^{(k)}, \dots, \Delta c_{C2}^{(k)}\bigr)$$

Normalized proportion vectors can be formed:

$$\hat{p}_\ell^{(k)} = \frac{\Delta c_\ell^{(k)}}{\sum_{\ell'} \Delta c_{\ell'}^{(k)}}$$

and weighted scalar scores can be computed with arbitrary weights $w_\ell$:

$$s^{(k)} = \sum_{\ell} w_\ell \, \Delta c_\ell^{(k)}$$

The prototype uses $w_\ell = 1$ for all $\ell$ (equal weighting), but the design permits alternative weighting or nonlinear transformation schemes.
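The scoring above can be sketched in a few lines. The level counts here are toy inputs standing in for pycefr output; the functions mirror the delta, normalization, and weighted-score definitions just given.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def delta_vector(before, after):
    """Added constructs per level, with deletions clamped to zero."""
    return {lvl: max(after.get(lvl, 0) - before.get(lvl, 0), 0) for lvl in LEVELS}

def normalized(delta):
    """Proportion of added constructs falling at each level."""
    total = sum(delta.values())
    return {lvl: (v / total if total else 0.0) for lvl, v in delta.items()}

def weighted_score(delta, weights=None):
    """Scalar score; the prototype's default is equal weighting."""
    weights = weights or {lvl: 1 for lvl in LEVELS}
    return sum(weights[lvl] * delta[lvl] for lvl in LEVELS)

# Toy before/after level counts for one modified file in one commit.
before = {"A1": 5, "B1": 2}
after = {"A1": 7, "B1": 1, "C1": 3}   # B1 count dropped, so its delta clamps to 0
d = delta_vector(before, after)
```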
3. Commit History Processing and AST Diffing
Commit and file-level analysis is conducted by invoking PyDriller’s RepositoryMining API, restricted to Python file modifications. For each commit and file, the tool acquires complete ASTs of the file before and after the change. The AST snapshots enter the pycefr engine, generating level-wise construct counts. The difference operation yields $\Delta c_\ell^{(k)}$, representing newly introduced constructs only.
Example code used internally (lightly corrected: PyDriller exposes the before/after file snapshots as `source_code_before` and `source_code`, not via the `diff` attribute, which holds the unified diff string):

```python
from pydriller import RepositoryMining

miner = RepositoryMining(repo_path, only_modifications_with_file_types=['.py'])
for commit in miner.traverse_commits():
    for mod in commit.modifications:
        before_snapshot = mod.source_code_before  # full file content before the change (None for added files)
        after_snapshot = mod.source_code          # full file content after the change
        # Feed both snapshots to pycefr for level-wise construct counts
```

(In PyDriller 2.x, `RepositoryMining` is renamed `Repository` and `commit.modifications` becomes `commit.modified_files`.)
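The counting-and-differencing step itself can be illustrated with the standard-library `ast` module. The level mapping below is a toy stand-in; pycefr's actual rule set is far richer, and this sketch only shows the mechanics of classifying nodes and subtracting counts.

```python
import ast
from collections import Counter

# Toy stand-in for pycefr's rules: map a few AST node types to CEFR levels.
NODE_LEVEL = {
    ast.If: "A1",
    ast.For: "A1",
    ast.Break: "B1",
    ast.ListComp: "B1",
    ast.Lambda: "B2",
    ast.GeneratorExp: "C1",
}

def level_counts(source):
    """Count classified constructs in one file snapshot."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        lvl = NODE_LEVEL.get(type(node))
        if lvl:
            counts[lvl] += 1
    return counts

before = "for x in xs:\n    if x:\n        pass\n"
after = before + "ys = [x * 2 for x in xs]\n"
# Counter subtraction drops non-positive entries, i.e. clamps deletions to zero.
added = level_counts(after) - level_counts(before)
```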
4. Aggregation, Time-Series Construction, and Trend Detection
Per-commit proficiency vectors are grouped by committer and timestamp. For a set of commits $K_t$ within time window $t$, the aggregate project-level vector is:

$$\mathbf{P}_t = \sum_{k \in K_t} \mathbf{p}^{(k)}$$

Contributor-specific time series are constructed similarly. Smoothing is applied using a moving average of window size $m$:

$$\tilde{\mathbf{P}}_t = \frac{1}{m} \sum_{j=0}^{m-1} \mathbf{P}_{t-j}$$

Trend detection may take the form of linear regression fits for each CEFR level $\ell$:

$$\tilde{P}_{t,\ell} \approx \alpha_\ell + \beta_\ell \, t$$
The fitted slopes permit identification of progression rates and of periods in which proficiency complexity increases, at the project or individual level.
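The aggregation, smoothing, and trend steps can be sketched with plain Python. The commit records below are invented, and the helpers are minimal illustrations of the formulas in this section, not PyGress's implementation.

```python
def aggregate(per_commit_scores, window_of):
    """Sum per-commit counts for one CEFR level into time windows."""
    totals = {}
    for ts, count in per_commit_scores:
        w = window_of(ts)
        totals[w] = totals.get(w, 0) + count
    return [totals[w] for w in sorted(totals)]

def moving_average(series, m):
    """Trailing moving average of window size m (shorter at the start)."""
    return [sum(series[max(0, i - m + 1): i + 1]) / min(i + 1, m)
            for i in range(len(series))]

def trend_slope(series):
    """Ordinary least-squares slope of the series against its index."""
    n = len(series)
    xbar = (n - 1) / 2
    ybar = sum(series) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

# Toy records: (date, C1-level insertions), windowed by (year, month).
commits = [((2020, 1, 5), 2), ((2020, 1, 20), 1), ((2020, 2, 3), 4), ((2020, 3, 9), 6)]
series = aggregate(commits, window_of=lambda ts: ts[:2])   # [3, 4, 6]
smoothed = moving_average(series, m=2)
slope = trend_slope(smoothed)   # positive slope: rising advanced-construct usage
```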
5. Visualization Strategies
The visualization module utilizes Plotly.js via Python bindings to offer several chart types:
- Spider (radar) charts: Show aggregate or contributor-specific proficiency across all CEFR levels.
- Time-slider graphs: Stacked area charts of smoothed proficiency vectors over time, with interactive controls for temporal navigation.
- Planned heatmap view: Module-by-level matrices, colored by normalized construct counts, to capture per-module proficiency nuances.
Plotly event callbacks in the Flask front end enable interactivity such as hover, zoom, and filtering by contributor.
6. Implementation and Deployment
Backend components utilize Python 3.10, Flask, PyDriller, and pycefr. Frontend elements are implemented with Jinja2 templates, Bootstrap, and Plotly for interactive chart rendering. Dedicated scripts transform aggregated results to JSON compatible with Plotly. The repository is structured into /pygress_backend (data extraction, scoring), /pygress_frontend (Flask app, templates, static assets), and includes Docker/Docker Compose files for containerized deployment.
Setup options are:

Containerized (Docker Compose):

```shell
git clone https://github.com/MUICT-SERU/PyGress.git
cd PyGress
docker-compose up --build
```

Local (virtual environment):

```shell
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
flask run
```
7. Empirical Evaluation, Observed Patterns, and Limitations
PyGress has been empirically evaluated on three Python OSS projects—django-silk, pandas-profiling, and pytest-ansible—spanning 2014–2024. The findings include:
- Dominance of A1–A2 constructs across all projects, consistent with Python’s design emphasis on readability and straightforward syntax.
- django-silk exhibited substantial C1–C2 usage in its initial years, indicative of early architectural decisions involving advanced Python features.
- pandas-profiling showed a surge in advanced constructs correlating with major refactor efforts during 2020–2021.
- pytest-ansible maintained low C1–C2 frequencies over its history, reflecting preferences for maintainable simplicity.
Highly proficient contributors were identified via annual sums of C1 + C2 insertions, often corresponding to key project maintainers.
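The contributor ranking just described reduces to a simple aggregation: sum C1 and C2 insertions per author per year and take the maximum. The records below are invented for illustration.

```python
from collections import defaultdict

records = [  # (author, year, level, insertions) -- toy data
    ("alice", 2020, "C1", 5), ("alice", 2020, "C2", 2),
    ("bob",   2020, "C1", 1), ("alice", 2021, "C2", 4),
]

advanced = defaultdict(int)
for author, year, level, n in records:
    if level in ("C1", "C2"):
        advanced[(author, year)] += n   # annual C1 + C2 sum per contributor

top = max(advanced, key=advanced.get)   # highest advanced-construct contributor-year
```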
Performance benchmarking on repositories of 10,000 commits indicated a processing time of approximately 3 seconds per commit, with AST analysis as the primary computational bottleneck.
Current limitations include pycefr’s necessity of re-parsing entire files due to diffing granularity, limited per-module granularity (pending heatmap implementation), and lack of independent, human-rated proficiency ground truth. Future work aims to optimize diff-level AST extraction, extend CEFR-based classification to languages beyond Python (e.g., JavaScript via jscefr), incorporate advanced trend and anomaly detection, and validate insights through user studies with OSS maintainers.
PyGress is openly available at https://github.com/MUICT-SERU/PyGress, with a demonstration video at https://youtu.be/hxoeK-ggcWk.