The performance of any machine learning algorithm applied to a computer vision task depends heavily on the representations extracted from the image. The premise is that different representations can entangle and capture most of the explanatory factors responsible for variation in images, whether rigid, affine, or projective. For this reason, most vision tasks demand extensive attention and effort in designing image pre-processing and feature-extraction pipelines. Hand-engineering features requires subtle domain knowledge and is problem-specific, so researchers rarely arrive at ideal representations, which prevents machine learning algorithms from reaching their full potential.

Recently, there has been a shift from hand-crafting features for vision problems to automatically learning the best representations from an image dataset, in either a supervised or an unsupervised way. The resulting features are not only near-optimal but also highly generic: features learned for one vision problem can be reused across different vision tasks. The learned representations thus act as generic, off-the-shelf features for any high-level vision task without being trained on a problem-specific dataset. An efficient deep convolutional neural network is designed for this purpose. Upper layers of a deep network are expected to represent more abstract concepts that explain the input image, built on top of lower layers that extract low-level features such as edges. Each successive layer of units captures more intricate relationships in the image data, creating a feature hierarchy that shallow models cannot match.
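To make the idea of a feature hierarchy concrete, the following minimal NumPy sketch stacks two convolutional stages: the first applies fixed edge filters (standing in for learned low-level features), the second combines their responses into a more abstract map, and global pooling turns the maps into a fixed-length descriptor that a downstream classifier could consume as an off-the-shelf feature vector. The filters and the `extract_features` helper are illustrative assumptions, not part of any specific network described here; in a real deep network all filters would be learned from data.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear non-linearity applied between layers."""
    return np.maximum(x, 0.0)

# Layer 1: hand-picked edge detectors stand in for learned low-level filters.
edge_h = np.array([[-1.0, -1.0], [1.0, 1.0]])  # responds to horizontal edges
edge_v = np.array([[-1.0, 1.0], [-1.0, 1.0]])  # responds to vertical edges

def extract_features(img):
    """Two-stage hierarchy: edge maps -> combined abstract map -> pooled vector."""
    # Layer 1: low-level edge responses.
    maps1 = [relu(conv2d(img, k)) for k in (edge_h, edge_v)]
    # Layer 2: builds on layer-1 outputs, yielding a more abstract response.
    combined = relu(conv2d(maps1[0], edge_v) + conv2d(maps1[1], edge_h))
    # Global average pooling produces a fixed-length, reusable descriptor.
    return np.array([m.mean() for m in maps1] + [combined.mean()])

# Toy input: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
features = extract_features(img)
print(features.shape)
```

The descriptor has a fixed length regardless of where the square sits in the image, which is the property that lets such pooled deep features transfer across tasks.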