Prakriti200 Dataset Overview

Updated 13 February 2026

Prakriti200 is a bilingual dataset designed for Ayurvedic trait assessment using a standardized 24-item questionnaire.
It employs a deterministic rule-based mapping algorithm to score and classify responses into Vata, Pitta, and Kapha doshas.
The dataset supports research in computational intelligence, predictive modeling, and personalized health analytics with clear demographic insights.

Prakriti200 is a bilingual (English-Hindi) dataset comprising responses from 200 individuals to a standardized Prakriti Assessment Questionnaire. Designed in accordance with classical Ayurvedic principles and AYUSH/CCRAS guidelines, it systematically evaluates the physical, physiological, and psychological traits of participants via 24 mandatory multiple-choice items. Each item is mapped to one of the canonical Ayurvedic doshas—Vata, Pitta, or Kapha—enabling automated scoring and classification of constitutional types. The data collection was performed remotely across India using a Google Forms infrastructure with automated backend scoring and structured export, supporting a range of applications in computational intelligence, personalized health analytics, and Ayurvedic research (Singh et al., 5 Oct 2025).

1. Composition and Collection Protocol

Prakriti200 includes records from 200 participants, primarily young adults (age 15–72 years; mean ≈ 21.5 years; 81% between 18–25; 63.5% female and 36.5% male). Recruitment was conducted remotely across India using a Google Forms deployment. The bilingual interface ensured accessibility, and data were collected with all items mandatory and phrased to minimize bias. Automated scoring and mapping to dosha-specific scores was performed on the backend, with anonymized responses stored in Google Sheets and exported as a UTF-8 encoded Excel file. Rigorous data cleaning excluded approximately 1% of entries for implausibility, resulting in a dataset without missing values.

2. Questionnaire Taxonomy and Dosha Mapping

The Prakriti Assessment Questionnaire contains 24 items, evenly distributed across three domains:

Physical Features (Q1–Q8): body weight, height, bone structure, muscle development, skin texture, complexion, hair texture, body frame.
Physiological Features (Q9–Q16): appetite strength, thirst, digestion quality, bowel habits, sleep quantity, sleep pattern, energy levels, cold/heat tolerance.
Psychological Features (Q17–Q24): temperament, patience, concentration span, memory, speech rate, decision-making style, emotional reactivity, adaptability.

Each item employs a uniform three-option format, with each response pre-mapped to a single dosha (Vata, Pitta, or Kapha) on the backend. Dosha labels are hidden from participants to prevent response bias.

Domain	Number of Items	Example Features
Physical	8	Weight, Height, Skin, Bone Structure
Physiological	8	Appetite, Sleep, Digestion
Psychological	8	Temperament, Memory, Adaptability
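The item ranges above can be expressed as a small lookup. This is a minimal sketch; the names `DOMAINS` and `domain_of` are illustrative and not part of the dataset or its codebook.

```python
# Item ranges per domain, as listed in the questionnaire taxonomy above.
DOMAINS = {
    "Physical": range(1, 9),          # Q1-Q8
    "Physiological": range(9, 17),    # Q9-Q16
    "Psychological": range(17, 25),   # Q17-Q24
}


def domain_of(item: int) -> str:
    """Return the domain name for a questionnaire item number (1-24)."""
    for name, items in DOMAINS.items():
        if item in items:
            return name
    raise ValueError(f"item must be in 1..24, got {item}")


print(domain_of(12))
```

A lookup like this is convenient when aggregating responses per domain, for example to compare physical and psychological items separately.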

3. Scoring Algorithm

Doshas are scored using a deterministic rule-based mapping for each response. For each item $i$ and dosha $d$, a fixed weight $w_{i,d} \in \{0,1\}$ is assigned, representing whether the selected response $r_i$ corresponds to dosha $d$. The raw dosha score $S_d$ is then calculated as:

$S_d = \sum_{i=1}^{24} w_{i,d} \cdot \mathbb{1}(r_i = o_{i,d})$

where $o_{i,d}$ is the response mapping for item $i$ and dosha $d$, and $\mathbb{1}(\cdot)$ denotes the indicator function. No further normalization is applied, resulting in a maximum possible score of 24 per dosha. Dominant dosha is assigned as:

$d^{*} = \arg\max_{d \in \{\text{Vata},\, \text{Pitta},\, \text{Kapha}\}} S_d$

Mixed (dual-dominant) types are recorded in the case of score ties (e.g., Vata–Pitta). The codebook provides the precise mapping $o_{i,d}$ for each item and its corresponding permissible responses in both languages.
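The scoring rule can be sketched in a few lines of Python. The real option codes and item-to-dosha mapping live in the dataset's codebook and are not reproduced here, so the codebook layout, the option codes "A"/"B"/"C", and the toy responses below are assumptions made purely for illustration. How tied labels are ordered, and how a three-way tie would be labeled, are also assumptions of this sketch.

```python
DOSHAS = ("Vata", "Pitta", "Kapha")


def score_responses(responses, codebook):
    """Count, per dosha, how many selected options map to it.

    responses: dict of item number -> selected option code
    codebook:  dict of item number -> {option code -> dosha} (hypothetical layout)
    """
    scores = {d: 0 for d in DOSHAS}
    for item, option in responses.items():
        scores[codebook[item][option]] += 1
    return scores


def dominant_dosha(scores):
    """Return the top-scoring dosha; tied doshas are joined with an en dash."""
    top = max(scores.values())
    tied = [d for d in DOSHAS if scores[d] == top]
    return "–".join(tied)


# Hypothetical codebook: every item maps A -> Vata, B -> Pitta, C -> Kapha.
toy_codebook = {i: {"A": "Vata", "B": "Pitta", "C": "Kapha"} for i in range(1, 25)}
# Made-up responses: 12 x "B", 7 x "A", 5 x "C".
toy_responses = {i: "B" if i <= 12 else "A" if i <= 19 else "C" for i in range(1, 25)}

scores = score_responses(toy_responses, toy_codebook)
print(scores, dominant_dosha(scores))
```

Because every response contributes exactly one point to exactly one dosha, the argmax reduces to picking the largest of three counts, and ties surface naturally as more than one dosha sharing the top count.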

4. Data Format, Structure, and Accessibility

The dataset is distributed as a UTF-8 Excel file (Prakriti_Dataset.xlsx), which converts readily to CSV or JSON. Principal fields include:

Participant_ID
Age (years)
Gender (Male/Female/Other)
Location (state/city)
Q1 ... Q24: string codes for responses
Vata_Score, Pitta_Score, Kapha_Score (integers, [0–24])
Dominant_Dosha (including dual types: Vata–Pitta, Pitta–Kapha, Kapha–Vata)

The aforementioned codebook details the response mappings and question texts (English/Hindi), ensuring reproducibility for downstream users. The dataset is publicly available via IEEE DataPort.
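As a rough illustration of working with this schema, the sketch below validates one record and converts records to JSON or CSV using only the Python standard library. It assumes the rows have already been loaded from the Excel file into dictionaries (reading .xlsx itself needs a third-party reader), and that column names match the field list above; the helper names and the sample record are hypothetical.

```python
import csv
import io
import json

SCORE_FIELDS = ("Vata_Score", "Pitta_Score", "Kapha_Score")


def validate_record(rec):
    """Return a list of problems found in one participant record (empty if none)."""
    problems = []
    for q in range(1, 25):
        if not rec.get(f"Q{q}"):
            problems.append(f"missing response for Q{q}")
    for field in SCORE_FIELDS:
        value = rec.get(field)
        if not isinstance(value, int) or not 0 <= value <= 24:
            problems.append(f"{field} out of range: {value!r}")
    return problems


def to_json(records):
    """Serialize records to JSON, keeping any Hindi text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize records to CSV text, using the first record's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up record for illustration only; not taken from the dataset.
sample = {"Participant_ID": "P001", "Age": 21, "Gender": "Female"}
sample.update({f"Q{q}": "A" for q in range(1, 25)})
sample.update({"Vata_Score": 7, "Pitta_Score": 12, "Kapha_Score": 5,
               "Dominant_Dosha": "Pitta"})

print(validate_record(sample))
```

Passing `ensure_ascii=False` keeps non-ASCII text such as Hindi question wording human-readable in the JSON output.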

5. Statistical Characterization

Descriptive statistics indicate that Pitta-dominant types predominate, with the following dosha-type frequencies:

Dosha Type	Count	Percentage
Pitta	97	48.5 %
Pitta–Kapha	44	22 %
Pitta–Vata	27	13.5 %
Vata	14	7 %
Kapha	14	7 %
Kapha–Vata	4	2 %
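The percentages in the table follow directly from the counts. A short check, using only the counts reported above (the variable names are illustrative):

```python
# Counts copied from the dosha-type frequency table.
counts = {
    "Pitta": 97,
    "Pitta–Kapha": 44,
    "Pitta–Vata": 27,
    "Vata": 14,
    "Kapha": 14,
    "Kapha–Vata": 4,
}

total = sum(counts.values())
percentages = {label: 100 * n / total for label, n in counts.items()}

# Share of participants whose dominant type involves Pitta, alone or in a dual type.
pitta_share = 100 * sum(n for label, n in counts.items() if "Pitta" in label) / total

print(total, percentages, pitta_share)
```

The counts sum to 200, matching the stated sample size, and types involving Pitta (single or dual) account for 168 of the 200 participants.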

Trait prevalence observations include: 67.5% selecting medium body weight; 62% medium height; 60% reporting fair/reddish complexion; 38.5% indicating strong appetite; 48.5% deep sleepers; 54.5% reporting low/variable thirst.

Dosha score distributions are summarized as:

Vata_Score: mean ≈ 8.3; median ≈ 8; variance ≈ 6.5
Pitta_Score: mean ≈ 11.4; median ≈ 11; variance ≈ 7.2
Kapha_Score: mean ≈ 7.2; median ≈ 7; variance ≈ 5.8

Correlation analysis reveals a moderate positive association between Pitta scores and medium appetite, a negative correlation between Vata scores and sleep depth, and generally weak inter-dosha correlations due to the rule-based orthogonality of item mapping.

6. Research Applications and Use Cases

Prakriti200 serves multiple roles across disciplines:

Computational Intelligence: benchmark for supervised learning (dosha prediction); unsupervised clustering for constitutional type discovery.
Predictive Modeling: feature basis for predicting lifestyle or disease predisposition in external health datasets.
Personalized Health Analytics: integration potential with multimodal sensor data (e.g., facial images, pulse waveform) for hybrid assessments.
Fairness & Bias Analysis: demographic bias investigation in dosha score distribution across age and gender strata.
Pedagogical Uses: demonstration of bilingual, digitally deployed questionnaire techniques for Ayurveda research in data science education.
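For the fairness and bias use case, a first step is often to compare mean dosha scores across demographic strata. The sketch below shows one way to do this with the standard library; the function name and the toy records are hypothetical and carry no information about the actual dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_scores_by_group(records, group_field, score_fields):
    """Return {group value: {score field: mean score}} for the given records."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_field]].append(rec)
    return {
        group: {field: mean(r[field] for r in rows) for field in score_fields}
        for group, rows in groups.items()
    }


# Made-up records for illustration only; not taken from the dataset.
toy_records = [
    {"Gender": "Female", "Vata_Score": 8, "Pitta_Score": 10, "Kapha_Score": 6},
    {"Gender": "Female", "Vata_Score": 6, "Pitta_Score": 12, "Kapha_Score": 6},
    {"Gender": "Male", "Vata_Score": 9, "Pitta_Score": 8, "Kapha_Score": 7},
]

by_gender = mean_scores_by_group(
    toy_records, "Gender", ("Vata_Score", "Pitta_Score", "Kapha_Score")
)
print(by_gender)
```

The same function can stratify by age band or location by passing a different `group_field`, provided the records carry that field. Given the cohort's age and gender skew, small strata should be interpreted with caution.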

A plausible implication is that the dataset provides a reference model for the design of standardized, bias-minimized, self-report trait assessments in both biomedical and computational health contexts.

7. Methodological Constraints and Future Directions

Several constraints shape the interpretive boundaries of Prakriti200:

Sample Representativeness: The cohort is predominantly young (81% are 18–25), limiting generalizability to broader demographics.
Rule-Based Assessment: Dosha labels reflect questionnaire-derived mappings rather than comprehensive clinical evaluation (no pulse or direct clinical diagnosis).
Data Completeness: All questions are mandatory; coherence checks led to exclusion of ~1% implausible records.
Best Practices: Users are advised to treat dominant dosha labels as proxies for constitutional type, validate predictive models externally, and corroborate findings with additional clinical or multimodal data in translational deployments.

Future dataset expansions target increased demographic diversity, incorporation of image-based and physiological modalities, and development of API/JSON access endpoints. This approach is intended to foster reproducible research and enable rigorous computational studies in Ayurvedic informatics (Singh et al., 5 Oct 2025).

References (1)

Prakriti200: A Questionnaire-Based Dataset of 200 Ayurvedic Prakriti Assessments (2025)
