Min-GPT: A Small Dataset Approach to Language Modeling
Keywords:
Min-GPT, Natural Language Processing (NLP), Normalization, Tokenization, Text Generation
Abstract
This paper develops Min-GPT, a compact version of GPT that can process and generate coherent sentences from a limited dataset. The miniature model demonstrates the adaptability and efficiency of language models, showing that a reduced architecture can still handle essential language generation tasks. Min-GPT probes the capabilities of a scaled-down GPT architecture on Natural Language Processing (NLP) tasks using minimal data and computational resources. By reducing both the number of parameters and the dataset size, the paper evaluates how the core elements of the GPT architecture behave in a compact model, making sophisticated language models more accessible and practical for smaller-scale applications.
The implemented model uses a character-level tokenization scheme and a transformer architecture with self-attention and multi-head attention. It incorporates a vocabulary derived from the dataset, together with token embeddings and position embeddings to capture contextual relationships. The model was trained with a batch size of 64 and a block size of 258, using 6 attention heads and 6 transformer layers with a dropout rate of 0.2. Optimization was performed with the AdamW optimizer at a learning rate of 3e-4. Evaluation used perplexity as the metric of text-generation quality.
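To make this configuration concrete, the following PyTorch sketch shows how a model with these hyperparameters could be assembled. The paper does not publish its code, so the class and variable names (MinGPT, Block, n_embd) and the embedding width of 384 are illustrative assumptions; only the batch size, block size, head count, layer count, dropout rate, and AdamW learning rate come from the text above.

import torch
import torch.nn as nn

# Hyperparameters reported in the paper; n_embd is an assumed value.
block_size = 258
n_layer = 6
n_head = 6
dropout = 0.2
n_embd = 384  # assumption: not reported in the paper, chosen to be divisible by n_head

class Block(nn.Module):
    """One transformer block: multi-head self-attention followed by a feed-forward layer."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ff = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout))

    def forward(self, x):
        # Causal mask: each position may attend only to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.ff(self.ln2(x))
        return x

class MinGPT(nn.Module):
    """Character-level GPT with token and position embeddings."""
    def __init__(self, vocab_size):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block() for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)   # (B, T, n_embd)
        x = self.ln_f(self.blocks(x))
        return self.head(x)                         # (B, T, vocab_size) logits

# Training setup from the paper: batch size 64, AdamW with learning rate 3e-4.
# vocab_size is derived from the dataset's character set; 65 is a placeholder.
model = MinGPT(vocab_size=65)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

The position-embedding table is sized to the block size, which is why generation (shown later) must crop the context to the most recent block_size characters.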
The model was tested by generating text sequences from an initial prompt. The results indicate that, despite the reduced scale, Min-GPT retains fundamental NLP capabilities, producing coherent sentences that adhere to learned patterns. However, limitations were observed in output diversity and in capturing long-term dependencies, owing to the restricted dataset size. Loss-versus-batch plots were also used to monitor training progress and assess convergence.
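For reference, perplexity is conventionally computed as the exponential of the mean cross-entropy loss, and sampling proceeds autoregressively from the prompt while cropping the context to the block size. The sketch below illustrates both under those assumptions; it reuses the hypothetical MinGPT interface from the previous listing rather than the paper's actual evaluation code.

import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, idx, targets):
    """Perplexity = exp(mean cross-entropy) of the model's next-character predictions."""
    logits = model(idx)                                            # (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    return math.exp(loss.item())

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=258):
    """Autoregressive sampling: extend the prompt one character at a time."""
    for _ in range(max_new_tokens):
        logits = model(idx[:, -block_size:])                       # crop to the context window
        probs = F.softmax(logits[:, -1, :], dim=-1)                # next-character distribution
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx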
This study highlights the viability of compact transformer-based models in resource-limited settings and demonstrates the trade-offs between computational efficiency and linguistic expressiveness. Future enhancements could include fine-tuning pre-trained language models and employing data augmentation techniques to improve text diversity and fluency.