In recent years, social networking has become a technological marvel woven into everyday life. Facebook operates the world's leading web-based social networking system, with over 2.19 billion users (as of the first quarter of 2018). As its popularity has increased, more individuals from all age groups have joined this growing phenomenon. As a result, code-mixed data has become common on social media. The aim of our project was to identify the different languages present in code-mixed data. We compared word embedding methods, such as the Continuous Bag of Words (CBOW) and Skip-Gram models, for generating feature vectors. These vectors were given as input to machine learning algorithms such as Support Vector Machine, Logistic Regression, K-Nearest Neighbors, Gaussian Naive Bayes, AdaBoost, and Random Forest, which yielded good cross-validation scores. Precision, recall, F-score, and micro and macro averaging were used as evaluation measures.
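As a rough illustration of the evaluation measures named above, the following sketch computes per-class precision, recall, and F-score for word-level language tags, then combines them by macro and micro averaging. It is not the authors' code: the function names, the language tags ("en", "hi", "kn"), and the toy labels are all hypothetical, and macro F-score is taken here as the mean of the per-class F-scores, which is one common convention.

```python
from collections import Counter


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return safe_div(2 * precision * recall, precision + recall)


def macro_micro(y_true, y_pred):
    """Return (per_class, macro, micro); each score is (P, R, F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, guess in zip(y_true, y_pred):
        if gold == guess:
            tp[gold] += 1
        else:
            fp[guess] += 1  # guessed label gains a false positive
            fn[gold] += 1   # gold label gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        per_class[label] = (p, r, f_score(p, r))

    # Macro: unweighted mean of per-class scores (every class counts equally).
    macro = tuple(
        sum(scores[i] for scores in per_class.values()) / len(labels)
        for i in range(3)
    )

    # Micro: pool the counts over all classes, then compute the scores once.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = safe_div(tp_all, tp_all + fp_all)
    micro_r = safe_div(tp_all, tp_all + fn_all)
    micro = (micro_p, micro_r, f_score(micro_p, micro_r))
    return per_class, macro, micro


# Hypothetical gold and predicted tags for eight words (illustration only).
y_true = ["en", "en", "en", "en", "hi", "hi", "kn", "kn"]
y_pred = ["en", "en", "en", "hi", "hi", "hi", "en", "kn"]

per_class, macro, micro = macro_micro(y_true, y_pred)
print("per class:", per_class)
print("macro (P, R, F1):", macro)
print("micro (P, R, F1):", micro)
```

Macro averaging treats each language equally, so a rare language with poor recall pulls the score down; micro averaging weights each word equally and, when every word carries exactly one tag, reduces to plain accuracy.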
I. Chaitanya, Madapakula, I., Gupta, S. K., and Thara, S., “Word Level Language Identification in Code-Mixed Data using Word Embedding Methods for Indian Languages”, in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 2018.