Prompt-with-Me: In-IDE Prompt Management
- Prompt-with-Me is a structured prompt management framework embedded in IntelliJ IDEA that organizes, refines, and reuses prompts as engineering artifacts.
- It automates language improvements, similarity detection, and taxonomy-based annotation to ensure high-quality and secure prompt handling.
- Empirical studies show improved developer efficiency, high usability, and reliable automation in LLM-driven software engineering workflows.
Prompt-with-Me is a structured prompt management framework integrated directly into the software engineering workflow via an IntelliJ IDEA plugin. Unlike ad hoc or external prompt repositories, the system emphasizes the in-situ organization, classification, refinement, and reuse of prompts as first-class engineering artifacts. Its architecture incorporates automated prompt classification, template extraction, refinement suggestions, anonymization, and comprehensive taxonomy-based annotation, all designed to facilitate scalable, secure, and high-quality prompt engineering for LLM-driven software development.
1. System Architecture and Integration
Prompt-with-Me is implemented as a JetBrains IntelliJ IDEA plugin, embedding all prompt management operations within the development environment. The architecture features:
- Prompt Library: Central repository for storing, searching, reusing, and templating prompts and derived templates.
- Support Services:
  - Optimizor: Suggests language refinements (spelling/grammar), anonymizes sensitive data, and detects duplicates.
  - Template Generator: Uses similarity detection to extract parameterized templates from similar prompts.
- Automated Classification Pipeline: Each prompt is labeled across four dimensions (intent, author role, SDLC stage, prompt type) using on-the-fly LLM API calls.
- Separation of Concerns: UI, service, and data layers are cleanly separated; UI in tool windows/dialogs, persistent data in local SQLite, advanced refinement/annotation through Dockerized HTTP services.
- Workflow Alignment: All prompt actions occur within the natural software engineering context—no need to leave the coding environment.
This design ensures structured prompt management is immediately available to developers, tightly coupled with real-world coding and artifact production.
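The data layer described above can be sketched as follows. This is a minimal illustration of a local SQLite prompt store, assuming a hypothetical table and column layout (not the plugin's actual schema):

```python
import sqlite3

# Hypothetical schema: one row per prompt, with the four taxonomy labels
# stored alongside the text so the library can be searched and filtered.
conn = sqlite3.connect(":memory:")  # the plugin persists to a local file
conn.execute("""
    CREATE TABLE prompts (
        id     INTEGER PRIMARY KEY,
        text   TEXT NOT NULL,
        intent TEXT,   -- e.g. 'Code Review'
        role   TEXT,   -- e.g. 'Software Developer'
        sdlc   TEXT,   -- e.g. 'Testing'
        ptype  TEXT    -- e.g. 'Zero-shot'
    )
""")
conn.execute(
    "INSERT INTO prompts (text, intent, role, sdlc, ptype) VALUES (?, ?, ?, ?, ?)",
    ("Review this function for edge cases.", "Code Review",
     "Software Developer", "Testing", "Zero-shot"),
)

# Filter the library by any taxonomy dimension.
rows = conn.execute(
    "SELECT text FROM prompts WHERE sdlc = ?", ("Testing",)
).fetchall()
```

Keeping labels in ordinary columns is what makes the taxonomy queryable rather than decorative: any dimension can drive a search or a template suggestion.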
2. Prompt Taxonomy and Semantic Annotation
A central innovation is the use of a four-dimensional taxonomy, automatically applied to every prompt:
| Taxonomy Dimension | Description | Example Values |
|---|---|---|
| Intent (“Why”) | Underlying purpose | Best Practices, Code Review |
| Author Role (“Who”) | Role of prompter | Software Developer, Manager |
| SDLC Phase (“When”) | Software lifecycle context | Coding, Testing, Design |
| Prompt Type (“How”) | Structural formulation | Template-based, Zero-shot |
Each dimension contextualizes the prompt’s function, target audience, project stage, and formulation style. The taxonomy enables systematic library organization, targeted reuse, and quality control, supporting varied team workflows and facilitating maintainability.
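A label across the four dimensions can be modeled as a simple typed record. The class and field names below are illustrative, with example values taken from the table above:

```python
from dataclasses import dataclass
from enum import Enum

class SDLCPhase(Enum):
    DESIGN = "Design"
    CODING = "Coding"
    TESTING = "Testing"

@dataclass(frozen=True)
class PromptAnnotation:
    """One label per taxonomy dimension, attached to every stored prompt."""
    intent: str            # "Why"  - e.g. "Best Practices", "Code Review"
    author_role: str       # "Who"  - e.g. "Software Developer", "Manager"
    sdlc_phase: SDLCPhase  # "When" - software lifecycle context
    prompt_type: str       # "How"  - e.g. "Template-based", "Zero-shot"

label = PromptAnnotation(
    intent="Code Review",
    author_role="Software Developer",
    sdlc_phase=SDLCPhase.CODING,
    prompt_type="Zero-shot",
)
```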
A study of 1,108 real-world prompts confirmed that state-of-the-art LLMs (e.g., Mistral-Small) can annotate these dimensions with substantial human agreement, quantified as Fleiss’ κ = 0.72.
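The agreement statistic is the standard Fleiss’ κ over a ratings matrix. A minimal sketch of the computation (the toy matrix below is illustrative, not the study’s data):

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j;
    every item must be rated by the same number of raters n."""
    N = len(counts)      # items
    n = sum(counts[0])   # raters per item
    k = len(counts[0])   # categories
    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # Chance agreement from marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 2 raters, 2 categories, 3 items, unanimous on every item.
kappa = fleiss_kappa([[2, 0], [2, 0], [0, 2]])  # → 1.0 (perfect agreement)
```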
3. Prompt Refinement and Template Extraction
Prompt-with-Me enhances and deduplicates prompts through several automated strategies:
- Language Improvement: Prompts are passed through automated spelling and grammar correction systems, producing suggestions with associated confidence scores.
- Security-focused Anonymization: Local NER models detect sensitive entities (e.g., keys, emails, passwords) and replace them with “[REDACTED]”, maintaining security with ~0.95–0.99 confidence.
- Similarity Detection: An ensemble of Levenshtein distance (40%), Jaccard similarity (30%), and cosine similarity on character n-grams (30%) is used. Prompts exceeding 70% similarity are flagged as potential duplicates.
- Template Extraction: Stable (shared) segments of text are retained and variable regions are replaced with parameterized placeholders (e.g., {language}, {value}) using an LLM-based analysis, yielding reusable templates that reduce copy-paste duplication and improve consistency.
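The similarity ensemble described above can be sketched directly, using the stated weights (40/30/30) and the 70% duplicate threshold; the three helper implementations are straightforward stand-ins, not the plugin’s code:

```python
from collections import Counter
from math import sqrt

def levenshtein_sim(a, b):
    """1 - edit_distance / max_length, via the standard DP recurrence."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))

def jaccard_sim(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def cosine_ngram_sim(a, b, n=3):
    grams = lambda s: Counter(s[i:i + n] for i in range(len(s) - n + 1))
    ga, gb = grams(a), grams(b)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm = (sqrt(sum(v * v for v in ga.values()))
            * sqrt(sum(v * v for v in gb.values())))
    return dot / norm if norm else 0.0

def ensemble_similarity(a, b):
    return (0.4 * levenshtein_sim(a, b)
            + 0.3 * jaccard_sim(a, b)
            + 0.3 * cosine_ngram_sim(a, b))

def is_duplicate(a, b, threshold=0.70):
    return ensemble_similarity(a, b) >= threshold
```

Mixing edit distance (character order), Jaccard (token overlap) and n-gram cosine (local structure) makes the flag robust to superficial rewording that any single metric would miss.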
This multi-step approach transforms a flat prompt library into a hierarchically structured, template-rich asset that supports rapid, error-resistant prompting in complex engineering scenarios.
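The anonymization flow can be illustrated with a simplified stand-in. The system uses local NER models; the plain regexes below (pattern names and coverage are hypothetical) only show the detect-and-replace step:

```python
import re

# Illustrative patterns only; the plugin's NER models cover more entity
# types (keys, emails, passwords) with ~0.95-0.99 confidence.
PATTERNS = {
    "email":   re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

def anonymize(prompt: str) -> str:
    """Replace every detected sensitive span with the [REDACTED] marker."""
    for pattern in PATTERNS.values():
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt

safe = anonymize("Contact alice@example.com, key sk-abcdef1234567890XYZ")
```

Running redaction locally, before any text reaches an LLM API, is what lets the library remain shareable without leaking credentials.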
4. Empirical Study: Classification, Usability, and Efficiency
Evaluation is based on both taxonomy labeling (annotation accuracy and inter-rater reliability) and an 11-participant user study:
- Taxonomy Annotation: Agreement was high (Fleiss’ κ = 0.72), confirming that the taxonomy dimensions are well-defined and machine-annotatable.
- Usability: Mean System Usability Scale (SUS) = 73 (“good”).
- Cognitive Load: Mean NASA Task Load Index (NASA-TLX) = 21; mental demand was the highest subscale, but overall load remained low.
- Developer Feedback: Open-ended responses consistently identified template extraction and duplicate detection as major contributors to reduced repetitive work and improved prompt quality.
- Classification Reliability: Weighted F1 scores for SDLC, intent, and type assignment exceeded 0.87, reflecting high model accuracy in practical settings.
The system’s tight integration, low cognitive load, and automated refinements translated into strong developer acceptance and measurable workflow efficiency gains.
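The weighted-F1 metric cited above is the support-weighted mean of per-class F1 scores; a minimal sketch of the computation (toy labels, not the study’s data):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n_cls
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n_cls / total) * f1   # weight by class support
    return score

score = weighted_f1(
    ["sdlc:Coding", "sdlc:Coding", "sdlc:Testing", "sdlc:Testing"],
    ["sdlc:Coding", "sdlc:Testing", "sdlc:Testing", "sdlc:Testing"],
)
```

Weighting by support matters here because taxonomy labels are unbalanced in real prompt collections; an unweighted macro average would over-penalize rare classes.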
5. Future Directions and Broader Implications
The research concludes with actionable recommendations for the next generation of prompt management tooling:
- Deeper Workflow Integration: Versioning support, CI/CD linkage, and collaborative shared prompt repositories are needed to scale up to enterprise and large-team contexts.
- Customizable Taxonomies: The four-dimensional taxonomy works generally, but more granular or domain-adapted extensions may improve applicability in specialized fields.
- Transparent and Reversible Changes: Integrated audit trails for all refinements and suggestions to increase trust and control.
- Context-Aware Optimizations: Leveraging project-specific metadata to provide more relevant refinement or deduplication cues.
- Collaboration-first Features: Shared libraries, in-line comments, and robust version control for prompt artifacts to align with established software engineering practices.
Such enhancements are expected to further reduce cognitive load, increase prompt asset quality, and stimulate adoption in industrial workflows.
6. Significance for LLM-Driven Software Engineering
Prompt-with-Me formalizes prompt management as an integral part of the software artifact lifecycle, not an external add-on or ephemeral asset. Automated, taxonomy-driven classification and template extraction support rigorous reuse, security, and quality standards. The empirical studies validate that such systems can substantially reduce the manual, repetitive burden on engineers, improve consistency, and maintain prompt collections at scale. This approach signals a move toward prompt engineering as first-class, manageable, and maintainable software engineering practice, opening the way for robust, scalable adoption of LLMs in industry-integrated development environments (Li et al., 21 Sep 2025).