Neural Probabilistic Language Modeling: Overcoming the Curse of Dimensionality with Distributed Representations
Keywords:
Curse of dimensionality, Distributed representation, Neural networks, Natural language processing, Statistical language modeling

Abstract
Statistical language modeling is the task of learning the probability of word sequences in a language. This task is made difficult by the "curse of dimensionality": the number of possible word sequences grows exponentially with sequence length, so a sequence encountered at test time is likely to differ from every sequence seen during training, making its probability hard to estimate. Traditional n-gram models obtain generalization by concatenating very short, overlapping word sequences observed in the training data. We propose an alternative approach that fights this problem by learning distributed word representations, which allow the model to generalize by capturing semantic similarity between words. In this framework, the model learns simultaneously (1) a distributed representation for each word and (2) a probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a previously unseen sequence receives high probability if it is composed of words semantically similar to words in sequences seen during training. To handle the complexity of training models with millions of parameters, we use neural networks to model the probability function. Experimental results on two large text corpora show that this method significantly outperforms traditional n-gram models while taking advantage of longer contexts.
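The architecture described above can be sketched as a single forward pass: context words are mapped through a shared embedding matrix, the embeddings are concatenated and fed to a hidden layer, and a softmax over the vocabulary yields the next-word distribution. The following is a minimal illustrative sketch; all sizes and variable names (V, m, n, h) are assumptions chosen for readability, not values from the paper.

```python
import numpy as np

# Minimal sketch of a neural probabilistic language model forward pass.
# Sizes below are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)

V = 10   # vocabulary size
m = 4    # dimension of each distributed word representation
n = 3    # context length: predict the next word from the previous n words
h = 8    # hidden layer size

C = rng.normal(0, 0.1, (V, m))      # shared embedding matrix: one row per word
H = rng.normal(0, 0.1, (h, n * m))  # hidden-layer weights
d = np.zeros(h)                     # hidden-layer bias
U = rng.normal(0, 0.1, (V, h))      # output weights
b = np.zeros(V)                     # output bias

def next_word_probs(context):
    """P(w_t | w_{t-n}, ..., w_{t-1}) over the whole vocabulary."""
    x = C[context].reshape(-1)          # concatenate the context embeddings
    a = np.tanh(H @ x + d)              # hidden layer
    logits = U @ a + b
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# Distribution over the next word given a 3-word context (word ids 1, 5, 2).
p = next_word_probs([1, 5, 2])
```

Because the embedding matrix `C` is shared across all context positions, words that occur in similar contexts are pulled toward similar representations during training, which is what lets the model assign reasonable probability to unseen sequences.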