Bridging the Gap in Bangla Healthcare: Machine Learning Based Disease Prediction Using a Symptoms-Disease Dataset
Abstract: Increased access to reliable health information is essential for non-English-speaking populations, yet resources in Bangla for disease prediction remain limited. This study addresses this gap by developing a comprehensive Bangla symptoms-disease dataset containing 758 unique symptom-disease relationships spanning 85 diseases. To ensure transparency and reproducibility, we also make our dataset publicly available. The dataset enables the prediction of diseases based on Bangla symptom inputs, supporting healthcare accessibility for Bengali-speaking populations. Using this dataset, we evaluated multiple machine learning models to predict diseases based on symptoms provided in Bangla and analyzed their performance on our dataset. Both soft and hard voting ensemble approaches combining top-performing models achieved 98\% accuracy, demonstrating superior robustness and generalization. Our work establishes a foundational resource for disease prediction in Bangla, paving the way for future advancements in localized health informatics and diagnostic tools. This contribution aims to enhance equitable access to health information for Bangla-speaking communities, particularly for early disease detection and healthcare interventions.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.