Publication Type:

Conference Paper

Source:

2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, Jaipur, India (2016)

ISBN:

9781509020294

URL:

https://ieeexplore.ieee.org/document/7732267

Keywords:

Algorithm design and analysis, Clustering algorithms, computational modeling, Context, Dictionaries, Dictionary learning, dictionary learnt word representations, Encoding, expanding systems, K-means, K-SVD, learning word representations, Natural language processing, NLP, OMP, Orthogonal matching pursuit, Pattern matching, PMI, semantic properties, Semantics, Sparse coding, sparse coding algorithm, Syntactic properties, teaching computers, Terms, use language, words clustering

Abstract:

Language is a means of communication through words, and it helps us gain better insight into the world. Natural Language Processing (NLP) mainly concentrates on developing systems that allow computers to communicate with people using everyday language. One of the challenges inherent in NLP is teaching computers to recognize the way humans learn and use language. Word representations capture the syntactic and semantic properties of words. The main purpose of this work is therefore to find sets of words that have similar semantics by matching the contexts in which the words occur. In this work we explore a new method for learning word representations using sparse coding, a technique usually applied to signals and images. We use an efficient sparse coding algorithm, Orthogonal Matching Pursuit (OMP), to generate a sparse code for each input term vector. The input term vectors are then classified by grouping the terms that have the same sparse code into one class. The K-Means algorithm is also used to classify the input term vectors that have semantic similarities. Finally, the paper compares the two approaches to determine whether the sparse code or K-Means gives the better representation. The results show that the sparse code yields an improved set of similar words compared to K-Means, because the SVD used as part of dictionary learning captures the latent relationships that exist between the words.
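
As a rough illustration of the grouping step described in the abstract, the following Python sketch runs a plain Orthogonal Matching Pursuit over a small fixed dictionary and groups the terms whose sparse codes use the same atoms. It is not the authors' implementation. The dictionary, the term vectors, the sparsity level `k`, and the helper names (`solve`, `omp`, `group_by_support`) are hypothetical; the dictionary is written by hand rather than learnt with K-SVD; and "same sparse code" is interpreted here as "same set of active atoms", which is an assumption.

```python
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def omp(atoms, x, k):
    """Select up to k atoms greedily, refitting all coefficients at each step."""
    support, coef = [], []
    residual = x[:]
    for _ in range(min(k, len(atoms))):
        scores = [abs(dot(a, residual)) for a in atoms]
        best = max((j for j in range(len(atoms)) if j not in support),
                   key=lambda j: scores[j])
        if scores[best] < 1e-12:
            break  # nothing left to explain
        support.append(best)
        # Least-squares refit on the chosen atoms via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], x) for i in support]
        coef = solve(gram, rhs)
        approx = [sum(c * atoms[j][d] for c, j in zip(coef, support))
                  for d in range(len(x))]
        residual = [xi - ai for xi, ai in zip(x, approx)]
    return dict(zip(support, coef))  # atom index -> coefficient


def group_by_support(term_vectors, atoms, k):
    """Group terms whose sparse codes activate the same set of atoms."""
    groups = defaultdict(list)
    for term, vec in term_vectors.items():
        code = omp(atoms, vec, k)
        groups[tuple(sorted(code))].append(term)
    return dict(groups)


# Hypothetical toy data: three dictionary atoms and four term vectors.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
term_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
    "bus": [0.0, 0.2, 0.8],
}
groups = group_by_support(term_vectors, atoms, k=2)
```

In this toy setting, `groups` maps each tuple of active atom indices to the terms that share it. A K-Means baseline, as mentioned in the abstract, would instead cluster the raw term vectors directly.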

Cite this Research Publication

Remya Rajesh, Gargi, S., and Samili, S., “Clustering of Words Using Dictionary-learnt Word Representations”, in 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, 2016.