A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets (2305.18486v4)
Abstract: The development of LLMs such as ChatGPT has attracted considerable attention recently. However, their evaluation on benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by these models against the ground truth. In this paper, we present a thorough evaluation of ChatGPT's performance on diverse academic datasets, covering tasks such as question answering, text summarization, code generation, commonsense reasoning, mathematical problem solving, machine translation, bias detection, and ethical considerations. Specifically, we evaluate ChatGPT across 140 tasks and analyze the 255K responses it generates on these datasets, making our work the largest evaluation of ChatGPT on NLP benchmarks to date. In short, our study aims to identify the strengths and weaknesses of ChatGPT across a variety of tasks and to provide insights for future research using LLMs. We also report a newly observed emergent ability to follow multi-query instructions, which we found mostly in ChatGPT and other instruction-tuned models. Our extensive evaluation shows that even though ChatGPT can perform a wide variety of tasks, and may achieve impressive performance on several benchmark datasets, it is still far from reliably solving many challenging tasks. By providing a thorough assessment of ChatGPT's performance across diverse NLP tasks, this paper sets the stage for the targeted deployment of ChatGPT-like LLMs in real-world applications.
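The core difficulty the abstract points to — scoring free-form generative output against fixed ground truth — is commonly handled by normalizing both strings and computing exact-match and token-level F1, as in standard extractive-QA evaluation. The sketch below is a minimal illustration of that idea only; the normalization rules and sample strings are assumptions for demonstration, not the paper's actual evaluation pipeline.

```python
# Minimal sketch of scoring a generative answer against a gold answer,
# in the style of standard QA evaluation. Illustrative only.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# A verbose generative answer fails exact match but earns partial
# credit once both strings are normalized and compared token-wise.
pred = "The answer is the Eiffel Tower."
gold = "Eiffel Tower"
print(exact_match(pred, gold))         # False: extra tokens block exact match
print(round(token_f1(pred, gold), 2))  # 0.67: token overlap gives partial credit
```

This gap between exact match and token overlap on verbose outputs is precisely why evaluating chat-style models against ground-truth labels is harder than evaluating span-extraction models.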
Authors: Md Tahmid Rahman Laskar, M Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty, Jimmy Xiangji Huang