Prakriti200 Dataset Overview
- Prakriti200 is a bilingual dataset designed for Ayurvedic trait assessment using a standardized 24-item questionnaire.
- It employs a deterministic rule-based mapping algorithm to score and classify responses into Vata, Pitta, and Kapha doshas.
- The dataset supports research in computational intelligence, predictive modeling, and personalized health analytics with clear demographic insights.
Prakriti200 is a bilingual (English-Hindi) dataset comprising responses from 200 individuals to a standardized Prakriti Assessment Questionnaire. Designed in accordance with classical Ayurvedic principles and AYUSH/CCRAS guidelines, it systematically evaluates the physical, physiological, and psychological traits of participants via 24 mandatory multiple-choice items. Each item is mapped to one of the canonical Ayurvedic doshas—Vata, Pitta, or Kapha—enabling automated scoring and classification of constitutional types. The data collection was performed remotely across India using a Google Forms infrastructure with automated backend scoring and structured export, supporting a range of applications in computational intelligence, personalized health analytics, and Ayurvedic research (Singh et al., 5 Oct 2025).
1. Composition and Collection Protocol
Prakriti200 includes records from 200 participants, primarily young adults (age 15–72 years; mean ≈ 21.5 years; 81% aged 18–25; 63.5% female, 36.5% male). Recruitment was conducted remotely across India using a Google Forms deployment. The bilingual interface ensured accessibility; all items were mandatory and phrased to minimize bias. Automated scoring and mapping to dosha-specific scores were performed on the backend, with anonymized responses stored in Google Sheets and exported as a UTF-8 encoded Excel file. Data cleaning excluded approximately 1% of entries as implausible, leaving a dataset without missing values.
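The exact exclusion rules behind the ~1% removal are not described. A completeness and plausibility pass over exported records might look roughly like the Python sketch below. The option codes `A`/`B`/`C`, the age bounds, and the helper names are hypothetical placeholders, not the authors' criteria.

```python
from typing import Dict, List

QUESTION_KEYS = [f"Q{i}" for i in range(1, 25)]
ALLOWED_CODES = {"A", "B", "C"}  # hypothetical option codes, one per choice
MIN_AGE, MAX_AGE = 10, 100       # hypothetical plausibility bounds


def is_valid_record(record: Dict[str, object]) -> bool:
    """Return True if a raw record passes these illustrative checks."""
    # All 24 items are mandatory, so each must hold a recognised option code.
    if any(record.get(key) not in ALLOWED_CODES for key in QUESTION_KEYS):
        return False
    # Age must parse as an integer within the assumed plausible range.
    try:
        age = int(record.get("Age", ""))
    except (TypeError, ValueError):
        return False
    return MIN_AGE <= age <= MAX_AGE


def clean(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the records that pass is_valid_record."""
    return [r for r in records if is_valid_record(r)]
```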
2. Questionnaire Taxonomy and Dosha Mapping
The Prakriti Assessment Questionnaire contains 24 items, evenly distributed across three domains:
- Physical Features (Q1–Q8): body weight, height, bone structure, muscle development, skin texture, complexion, hair texture, body frame.
- Physiological Features (Q9–Q16): appetite strength, thirst, digestion quality, bowel habits, sleep quantity, sleep pattern, energy levels, cold/heat tolerance.
- Psychological Features (Q17–Q24): temperament, patience, concentration span, memory, speech rate, decision-making style, emotional reactivity, adaptability.
Each item employs a uniform three-option format, with each response pre-mapped to a single dosha (Vata, Pitta, or Kapha) on the backend. Dosha labels are hidden from participants to prevent response bias.
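The item layout can be captured as a small data structure. The Python sketch below encodes the three domain ranges given above and a hypothetical codebook in which every item's options are coded `A`/`B`/`C`. It assumes, for illustration only, that the three options of each item map to three different doshas; the real codes and assignments are defined in the dataset's codebook, and all names here are hypothetical.

```python
from collections import Counter
from typing import Dict

DOSHAS = ("Vata", "Pitta", "Kapha")

# Domain ranges from the questionnaire taxonomy (Q1-Q8, Q9-Q16, Q17-Q24).
DOMAINS = {
    "Physical": range(1, 9),
    "Physiological": range(9, 17),
    "Psychological": range(17, 25),
}

# Hypothetical codebook: item -> {option code -> dosha}. The real option codes
# and their dosha assignments come from the dataset's codebook.
HYPOTHETICAL_CODEBOOK: Dict[int, Dict[str, str]] = {
    item: dict(zip("ABC", DOSHAS)) for item in range(1, 25)
}


def domain_of(item: int) -> str:
    """Return the domain that a question number belongs to."""
    for name, items in DOMAINS.items():
        if item in items:
            return name
    raise ValueError(f"Q{item} is outside Q1-Q24")


def check_codebook(codebook: Dict[int, Dict[str, str]]) -> None:
    """Check the assumed structure: 24 items, each offering one option per dosha."""
    if sorted(codebook) != list(range(1, 25)):
        raise ValueError("codebook must cover Q1-Q24")
    for item, options in codebook.items():
        if Counter(options.values()) != Counter(DOSHAS):
            raise ValueError(f"Q{item} does not map its three options to three doshas")


check_codebook(HYPOTHETICAL_CODEBOOK)
```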
| Domain | Number of Items | Example Features |
|---|---|---|
| Physical | 8 | Weight, Height, Skin, Bone Structure |
| Physiological | 8 | Appetite, Sleep, Digestion |
| Psychological | 8 | Temperament, Memory, Adaptability |
3. Scoring Algorithm
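Doshas are scored using a deterministic rule-based mapping for each response. For each item $i \in \{1, \dots, 24\}$ and dosha $d \in \{\text{Vata}, \text{Pitta}, \text{Kapha}\}$, a fixed weight $w_{i,d} \in \{0, 1\}$ is assigned, representing whether the selected response corresponds to dosha $d$. The raw dosha score is then calculated as:

$$S_d = \sum_{i=1}^{24} w_{i,d} = \sum_{i=1}^{24} \mathbb{1}\left[M_i(r_i) = d\right],$$

where $r_i$ is the participant's response to item $i$, $M_i$ is the response mapping that assigns each permissible response of item $i$ to a dosha, and $\mathbb{1}[\cdot]$ denotes the indicator function. No further normalization is applied, resulting in a maximum possible score of 24 per dosha. The dominant dosha is assigned as:

$$\hat{d} = \arg\max_{d \in \{\text{Vata}, \text{Pitta}, \text{Kapha}\}} S_d.$$

Mixed (dual-dominant) types are recorded in the case of score ties (e.g., Vata–Pitta). The codebook provides the precise mapping $M_i$ for each item and its corresponding permissible responses in both languages.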
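The scoring rule and tie handling above can be sketched in Python as follows. The codebook is a hypothetical placeholder (every item maps `A`/`B`/`C` to Vata/Pitta/Kapha), and the function and constant names are illustrative. Dual-type labels reuse the spellings listed for the `Dominant_Dosha` field; the label for a three-way tie is an assumption, since the source does not describe that case.

```python
from typing import Dict

DOSHAS = ("Vata", "Pitta", "Kapha")

# Hypothetical placeholder codebook: item -> {option code -> dosha}.
# The real per-item mapping M_i is defined in the dataset's codebook.
HYPOTHETICAL_CODEBOOK: Dict[int, Dict[str, str]] = {
    item: dict(zip("ABC", DOSHAS)) for item in range(1, 25)
}

# Dual-type spellings as listed for the Dominant_Dosha field.
DUAL_LABELS = {
    frozenset({"Vata", "Pitta"}): "Vata–Pitta",
    frozenset({"Pitta", "Kapha"}): "Pitta–Kapha",
    frozenset({"Kapha", "Vata"}): "Kapha–Vata",
}


def dosha_scores(responses: Dict[int, str],
                 codebook: Dict[int, Dict[str, str]]) -> Dict[str, int]:
    """Compute S_d = sum_i 1[M_i(r_i) = d] for each dosha d."""
    scores = {d: 0 for d in DOSHAS}
    for item, answer in responses.items():
        # Each response maps to exactly one dosha, so w_{i,d} = 1 for that d only.
        scores[codebook[item][answer]] += 1
    return scores


def dominant_dosha(scores: Dict[str, int]) -> str:
    """Return the arg-max dosha, or a dual label when two doshas tie for the top."""
    top = max(scores.values())
    leaders = [d for d in DOSHAS if scores[d] == top]
    if len(leaders) == 1:
        return leaders[0]
    if len(leaders) == 2:
        return DUAL_LABELS[frozenset(leaders)]
    # Three-way tie: not covered by the source; this label is an assumption.
    return "Vata–Pitta–Kapha"


if __name__ == "__main__":
    # Items 1-12 -> Pitta, 13-18 -> Vata, 19-24 -> Kapha under the placeholder codebook.
    answers = {i: "B" if i <= 12 else "A" if i <= 18 else "C" for i in range(1, 25)}
    s = dosha_scores(answers, HYPOTHETICAL_CODEBOOK)
    print(s, dominant_dosha(s))  # {'Vata': 6, 'Pitta': 12, 'Kapha': 6} Pitta
```

Using a lookup table for dual labels, rather than joining names in a fixed order, keeps the output consistent with whatever spellings the dataset's codebook prescribes.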
4. Data Format, Structure, and Accessibility
The dataset is distributed as a UTF-8 Excel file (Prakriti_Dataset.xlsx), which can be readily converted to CSV or JSON. Principal fields include:
- Participant_ID
- Age (years)
- Gender (Male/Female/Other)
- Location (state/city)
- Q1 ... Q24: string codes for responses
- Vata_Score, Pitta_Score, Kapha_Score (integers in the range 0–24)
- Dominant_Dosha (including dual types: Vata–Pitta, Pitta–Kapha, Kapha–Vata)
The aforementioned codebook details the response mappings and question texts (English/Hindi), ensuring reproducibility for downstream users. The dataset is publicly available via IEEE DataPort.
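A typed record with basic range checks can make the field list above concrete for downstream users. The Python sketch below assumes the exported column headers are exactly `Participant_ID`, `Age`, `Gender`, `Location`, `Q1` through `Q24`, `Vata_Score`, `Pitta_Score`, `Kapha_Score` and `Dominant_Dosha`; the class and helper names are hypothetical. Because the document spells dual types in both name orders (e.g., Vata–Pitta and Pitta–Vata), the check accepts any pair of distinct doshas joined by an en dash or a hyphen.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import List

DOSHAS = {"Vata", "Pitta", "Kapha"}
GENDERS = {"Male", "Female", "Other"}


@dataclass
class PrakritiRecord:
    participant_id: str
    age: int
    gender: str
    location: str
    answers: List[str]  # Q1..Q24 response codes, in item order
    vata_score: int
    pitta_score: int
    kapha_score: int
    dominant_dosha: str

    def validate(self) -> None:
        """Raise ValueError if the record violates the documented field ranges."""
        if len(self.answers) != 24:
            raise ValueError("expected 24 answers")
        for score in (self.vata_score, self.pitta_score, self.kapha_score):
            if not 0 <= score <= 24:
                raise ValueError(f"score {score} outside [0, 24]")
        if self.gender not in GENDERS:
            raise ValueError(f"unexpected gender {self.gender!r}")
        # Accept one dosha, or two distinct doshas joined by an en dash or hyphen.
        parts = re.split(r"[–-]", self.dominant_dosha)
        if not (1 <= len(parts) <= 2 and set(parts) <= DOSHAS
                and len(set(parts)) == len(parts)):
            raise ValueError(f"unexpected label {self.dominant_dosha!r}")


def from_row(row: dict) -> PrakritiRecord:
    """Build a record from one exported row (column names are assumed)."""
    return PrakritiRecord(
        participant_id=str(row["Participant_ID"]),
        age=int(row["Age"]),
        gender=str(row["Gender"]),
        location=str(row["Location"]),
        answers=[str(row[f"Q{i}"]) for i in range(1, 25)],
        vata_score=int(row["Vata_Score"]),
        pitta_score=int(row["Pitta_Score"]),
        kapha_score=int(row["Kapha_Score"]),
        dominant_dosha=str(row["Dominant_Dosha"]),
    )


def to_json(record: PrakritiRecord) -> str:
    """Validate, then serialize; ensure_ascii=False keeps non-ASCII text such as Hindi readable."""
    record.validate()
    return json.dumps(asdict(record), ensure_ascii=False)
```

Rows read with `csv.DictReader` from a CSV export are plain strings, which is why `from_row` converts the numeric fields with `int`.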
5. Statistical Characterization
Descriptive statistics show that Pitta dominance prevails, with the following dosha-type frequencies:
| Dosha Type | Count | Percentage |
|---|---|---|
| Pitta | 97 | 48.5 % |
| Pitta–Kapha | 44 | 22 % |
| Pitta–Vata | 27 | 13.5 % |
| Vata | 14 | 7 % |
| Kapha | 14 | 7 % |
| Kapha–Vata | 4 | 2 % |
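The table is internally consistent, which can be checked with a few lines of Python. The sketch below recomputes the percentages from the reported counts and also derives the share of participants whose label includes Pitta (single or dual type), a figure implied by, though not stated in, the table.

```python
# Dosha-type counts as reported in the table above.
COUNTS = {
    "Pitta": 97,
    "Pitta–Kapha": 44,
    "Pitta–Vata": 27,
    "Vata": 14,
    "Kapha": 14,
    "Kapha–Vata": 4,
}

total = sum(COUNTS.values())
percentages = {label: 100 * n / total for label, n in COUNTS.items()}

# Share of participants whose label includes Pitta (single or dual type).
pitta_any = 100 * sum(n for label, n in COUNTS.items() if "Pitta" in label) / total

print(total)                 # 200
print(percentages["Pitta"])  # 48.5
print(pitta_any)             # 84.0
```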
Trait prevalence observations include: 67.5% selecting medium body weight; 62% medium height; 60% reporting fair/reddish complexion; 38.5% indicating strong appetite; 48.5% deep sleepers; 54.5% reporting low/variable thirst.
Dosha score distributions are summarized as:
- Vata_Score: mean ≈ 8.3; median ≈ 8; variance ≈ 6.5
- Pitta_Score: mean ≈ 11.4; median ≈ 11; variance ≈ 7.2
- Kapha_Score: mean ≈ 7.2; median ≈ 7; variance ≈ 5.8
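The summary does not state whether the variances are sample (n − 1) or population (n) estimates. Anyone recomputing them from `Vata_Score`, `Pitta_Score` and `Kapha_Score` may want to report both. The Python sketch below uses the standard `statistics` module; the helper name and the toy values are illustrative, not dataset records.

```python
import statistics
from typing import Dict, Sequence


def summarize(scores: Sequence[int]) -> Dict[str, float]:
    """Mean, median, and both common variance conventions for one score column."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # Divides by n - 1 (unbiased sample estimate).
        "sample_variance": statistics.variance(scores),
        # Divides by n.
        "population_variance": statistics.pvariance(scores),
    }


if __name__ == "__main__":
    toy_vata_scores = [6, 8, 8, 10]  # illustrative values, not dataset records
    print(summarize(toy_vata_scores))
```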
Correlation analysis reveals a moderate positive association between Pitta scores and medium appetite, a negative correlation between Vata scores and sleep depth, and generally weak inter-dosha correlations due to the rule-based orthogonality of item mapping.
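The source does not state which correlation measure was used. One common choice is Pearson's r between a dosha score and an ordinally coded trait. The Python sketch below implements it with the standard library, assuming a hypothetical 0/1/2 coding for sleep depth and illustrative score values. For ordinal items, Spearman's rank correlation (Pearson's r computed on ranks) is a frequently used alternative.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical ordinal coding of a sleep-depth item: 0 = light, 1 = moderate, 2 = deep.
    sleep_depth = [0, 0, 1, 2, 2]
    vata_score = [12, 10, 8, 6, 5]  # illustrative values, not dataset records
    print(round(pearson_r(vata_score, sleep_depth), 3))
```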
6. Research Applications and Use Cases
Prakriti200 serves multiple roles across disciplines:
- Computational Intelligence: benchmark for supervised learning (dosha prediction); unsupervised clustering for constitutional type discovery.
- Predictive Modeling: feature basis for predicting lifestyle or disease predisposition in external health datasets.
- Personalized Health Analytics: integration potential with multimodal sensor data (e.g., facial images, pulse waveform) for hybrid assessments.
- Fairness & Bias Analysis: demographic bias investigation in dosha score distribution across age and gender strata.
- Pedagogical Uses: demonstration of bilingual, digitally deployed questionnaire techniques for Ayurveda research in data science education.
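For the dosha-prediction benchmark, a majority-class baseline gives a floor against which learned models can be compared. The Python sketch below (function name hypothetical) computes it from the dosha-type counts reported in Section 5.

```python
from collections import Counter
from typing import Iterable, Tuple


def majority_baseline(labels: Iterable[str]) -> Tuple[str, float]:
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


if __name__ == "__main__":
    # Reconstruct the 200 labels from the reported dosha-type counts.
    reported = {"Pitta": 97, "Pitta–Kapha": 44, "Pitta–Vata": 27,
                "Vata": 14, "Kapha": 14, "Kapha–Vata": 4}
    labels = [label for label, n in reported.items() for _ in range(n)]
    print(majority_baseline(labels))  # ('Pitta', 0.485)
```

Because the labels are a deterministic function of the 24 responses and the codebook, a model given all 24 answers can in principle reproduce them exactly; the benchmark is therefore more informative when models see only a subset of items or are validated against external labels.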
A plausible implication is that the dataset provides a reference model for the design of standardized, bias-minimized, self-report trait assessments in both biomedical and computational health contexts.
7. Methodological Constraints and Future Directions
Several constraints shape the interpretive boundaries of Prakriti200:
- Sample Representativeness: The cohort is predominantly young (81% are 18–25), limiting generalizability to broader demographics.
- Rule-Based Assessment: Dosha labels reflect questionnaire-derived mappings rather than comprehensive clinical evaluation (no pulse or direct clinical diagnosis).
- Data Completeness: All questions are mandatory; coherence checks led to exclusion of ~1% implausible records.
- Best Practices: Users are advised to treat dominant dosha labels as proxies for constitutional type, validate predictive models externally, and corroborate findings with additional clinical or multimodal data in translational deployments.
Future dataset expansions target increased demographic diversity, incorporation of image-based and physiological modalities, and development of API/JSON access endpoints. This approach is intended to foster reproducible research and enable rigorous computational studies in Ayurvedic informatics (Singh et al., 5 Oct 2025).