Publication Type : Conference Proceedings
Publisher : Springer Singapore
Source : Algorithms for Intelligent Systems
Url : https://doi.org/10.1007/978-981-33-4604-8_24
Campus : Amaravati
School : School of Computing
Year : 2021
Abstract : Word embeddings are the basic building blocks for many natural language processing tasks. Word embedding models represent both the syntax and semantics of a word in a vector space. The vector representations require more space if words are represented using a high dimensionality. The optimal dimensionality of a word embedding partly assists in the structure learning of neural network models for solving natural language processing (NLP) tasks like classification and part-of-speech (PoS) tagging. The embedding dimension chosen by a word embedding model has a significant impact on the generated word vectors, and high-dimensional embeddings have a heavy memory footprint. In this paper, we propose a model to predict the optimal word embedding size for a given corpus so that the word embeddings can be used to learn the structure of a neural network for typical NLP tasks and the embeddings can be used in memory-constrained devices. A random forest-based regression model is used to predict the word embedding size. Experiments are conducted to identify the features that have a significant effect on the dimensionality of word embeddings.
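The abstract describes a random forest-based regression model that maps corpus-level features to a predicted embedding dimension. The following is a minimal sketch of that idea using scikit-learn; the specific features (vocabulary size, token count, average sentence length) and the training values are illustrative assumptions, not the paper's actual feature set or data.

```python
# Hypothetical sketch: predicting an optimal word embedding dimension
# from corpus-level features with a random-forest regressor.
# Feature names and all numbers below are illustrative assumptions.
from sklearn.ensemble import RandomForestRegressor

# Assumed corpus features: [vocabulary size, token count, avg sentence length]
X_train = [
    [10_000,     500_000, 12.0],
    [50_000,   2_000_000, 15.0],
    [200_000, 10_000_000, 18.0],
    [5_000,      100_000, 10.0],
]
# Assumed "optimal" embedding dimensions observed for those corpora
y_train = [50, 100, 300, 25]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict an embedding dimension for a new, unseen corpus
pred = model.predict([[80_000, 3_000_000, 14.0]])[0]
print(round(pred))
```

Because a random forest averages leaf values, the predicted dimension always falls within the range of the training targets, which keeps the prediction in a plausible band even for unusual corpora.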
Cite this Research Publication : Korrapati Sindhu, Karthick Seshadri, Dimensionality Prediction for Word Embeddings, Algorithms for Intelligent Systems, Springer Singapore, 2021, https://doi.org/10.1007/978-981-33-4604-8_24