Twitter users share information at scale, producing a large amount of data every day. However, the short length of tweets creates severe problems for applications in information retrieval (IR) and natural language processing (NLP). In this paper, we propose a novel framework for tweet segmentation in batch mode, called HybridSeg. When tweets are split into meaningful segments, downstream applications can easily extract and preserve their semantic or contextual information. HybridSeg finds the best segmentation of a tweet by maximizing the total stickiness score of its candidate segments. The stickiness score is influenced by two factors: global context and local context. For the local context, we propose and evaluate two models that consider the grammatical properties and the term interdependence within a batch of tweets. Experiments on datasets show that segmentation quality improves when both global and local contexts are considered. By comparing the experimental results, we show that local grammatical features are more important for capturing local context than term interdependence. We also illustrate that segmentation quality can be improved further by applying a part-of-speech method.
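The idea of choosing the segmentation that maximizes the total stickiness score can be illustrated with a small dynamic program. This is only a minimal sketch under stated assumptions, not the HybridSeg implementation: the function names `segment` and `toy_stickiness`, the `max_len` limit, and all scores in `TOY_SCORES` are hypothetical and made up for illustration. In the framework described above, the stickiness score would instead be derived from global and local context.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, ...]


def segment(
    tokens: List[str],
    stickiness: Callable[[Segment], float],
    max_len: int = 3,
) -> Tuple[List[List[str]], float]:
    """Split tokens into consecutive segments with maximal total stickiness.

    best[i] holds the best total score for the first i tokens, and
    back[i] holds the start index of the last segment in that solution.
    """
    n = len(tokens)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every candidate last segment of length 1..max_len.
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the chosen segments.
    segments: List[List[str]] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tokens[start:end])
        end = start
    segments.reverse()
    return segments, best[n]


# Hypothetical scores, invented for this example only.
TOY_SCORES: Dict[Segment, float] = {
    ("new", "york"): 1.5,
    ("new", "york", "city"): 2.5,
}


def toy_stickiness(seg: Segment) -> float:
    """Assumed scoring: known phrases get their toy score, single words
    a small default, and unknown multi-word segments a penalty."""
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    return 0.1 if len(seg) == 1 else -1.0


segments, total = segment(["new", "york", "city", "is", "great"], toy_stickiness)
print(segments)  # [['new', 'york', 'city'], ['is'], ['great']]
```

With these toy scores, the three-word phrase is kept as one segment because its score exceeds the sum of any finer split. Swapping in a different scoring function changes the result without touching the search itself.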
Sa Kukku, Reghu, Rb, and Gaina, K. Gc, “Tweet segmentation and its application using random walk and part-of-speech methods”, International Journal of Control Theory and Applications, vol. 9, pp. 7497-7501, 2016.