Publication Type : Conference Proceedings
Publisher : Springer Singapore
Source : Algorithms for Intelligent Systems
Url : https://doi.org/10.1007/978-981-33-4604-8_24
Campus : Amaravati
School : School of Computing
Year : 2021
Abstract : Word embeddings are the basic building blocks for many natural language processing tasks. Word embedding models represent both the syntax and semantics of a word in a vector space. The vector representations require more space if words are represented using a high dimensionality. The optimal dimensionality of a word embedding partly assists in the structure learning of neural network models for solving natural language processing (NLP) tasks like classification and part-of-speech (PoS) tagging. The embedding dimension chosen by a word embedding model has a significant impact on the generated word vectors, and high-dimensional embeddings have a heavy memory footprint. In this paper, we propose a model to predict the optimal word embedding size for a given corpus so that the word embeddings can be used to learn the structure of a neural network for typical NLP tasks and the embeddings can be used in memory-constrained devices. A random forest-based regression model is used to predict the word embedding size. Experiments are conducted to identify the features that have a significant effect on the dimensionality of word embeddings.
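The abstract describes a random forest-based regression model that maps corpus-level features to a predicted embedding dimension. The following is a minimal sketch of that idea using scikit-learn; the specific features (vocabulary size, token count, average sentence length) and the training values are illustrative assumptions, not the paper's actual feature set or data.

```python
# Hypothetical sketch: predicting an optimal word embedding dimension
# from corpus-level features with a random-forest regressor.
# Feature names and all numbers below are illustrative assumptions.
from sklearn.ensemble import RandomForestRegressor

# Assumed corpus features: [vocabulary size, token count, avg sentence length]
X_train = [
    [10_000,     500_000, 12.0],
    [50_000,   2_000_000, 15.0],
    [200_000, 10_000_000, 18.0],
    [5_000,      100_000, 10.0],
]
# Assumed "optimal" embedding dimensions observed for those corpora
y_train = [50, 100, 300, 25]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict an embedding dimension for a new, unseen corpus
pred = model.predict([[80_000, 3_000_000, 14.0]])[0]
print(round(pred))
```

Because a random forest averages leaf values, the predicted dimension always falls within the range of the training targets, which keeps the prediction in a plausible band even for unusual corpora.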
Cite this Research Publication : Korrapati Sindhu, Karthick Seshadri, Dimensionality Prediction for Word Embeddings, Algorithms for Intelligent Systems, Springer Singapore, 2021, https://doi.org/10.1007/978-981-33-4604-8_24