Abstract Phishing is a web-based criminal act. Phishing sites lure naive online users into revealing sensitive information by camouflaging themselves as trustworthy entities. Phishing is considered a troublesome threat in the field of electronic commerce. Because phishing webpages are short-lived and phishing techniques advance rapidly, maintaining blacklists or white-lists, or relying solely on heuristics-based approaches, is not particularly effective. The impact of phishing can be largely mitigated by adopting a suitable combination of all these techniques. In this study, the characteristics of legitimate and phishing webpages were investigated in depth, and based on this analysis, we proposed heuristics to extract 15 features from such webpages. The heuristic results were fed as input to a trained machine learning algorithm to detect phishing sites. Before applying the heuristics to a webpage, the system runs two preliminary screening modules. The first module, the Preapproved Site Identifier, checks the webpage against a private white-list maintained by the user. The second module, the Login Form Finder, classifies the webpage as legitimate when no login form is present. These modules reduce superfluous computation in the system and also lower the false positive rate without compromising the false negative rate. Using all of these modules, we are able to classify webpages with 99.8% precision and a 0.4% false positive rate. The experimental results indicate that this method is efficient for protecting users from online identity attacks.
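The two screening modules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the return labels, and the rule that a "login form" is any `<form>` containing a password input are assumptions made here for illustration, since the abstract does not give the exact criteria.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _PasswordFieldDetector(HTMLParser):
    """Sets found=True when an <input type="password"> appears inside a <form>.

    Assumption: this is only a stand-in for the paper's Login Form Finder,
    whose real detection rules are not given in the abstract.
    """

    def __init__(self):
        super().__init__()
        self._form_depth = 0
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._form_depth += 1
        elif tag == "input" and self._form_depth > 0:
            # Attribute values may be None for valueless attributes.
            if (dict(attrs).get("type") or "").lower() == "password":
                self.found = True

    def handle_endtag(self, tag):
        if tag == "form" and self._form_depth > 0:
            self._form_depth -= 1


def has_login_form(html):
    """Return True if the page contains a form with a password field."""
    detector = _PasswordFieldDetector()
    detector.feed(html)
    return detector.found


def screen(url, html, whitelist):
    """Run the two preliminary checks before any heuristic feature extraction.

    Returns "legitimate" when either screening module clears the page, and
    "needs_heuristics" when the page must go on to the feature-based classifier.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host in whitelist:          # hypothetical Preapproved Site Identifier
        return "legitimate"
    if not has_login_form(html):   # hypothetical Login Form Finder
        return "legitimate"
    return "needs_heuristics"
```

Only pages that return `"needs_heuristics"` would proceed to the 15-feature extraction and the trained classifier, which is how the screening stage avoids unnecessary computation.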
Cited by 6
Dr. Gowtham R. and Krishnamurthi, I., “A comprehensive and efficacious architecture for detecting phishing webpages”, Computers & Security (Elsevier Advanced Technology) (Impact Factor: 1.158, SCI Indexed), vol. 40, pp. 23-37, 2014.