We use a Bag-of-Words framework. We compute improved dense trajectories and encode them as Fisher vectors, which serve as features. Using the training videos, we learn a mapping function that we conjecture captures the principal information about each action. Given a temporally untrimmed video, we project its features along this mapping. The transformed features are passed to a one-vs-all SVM classifier framework to obtain a prediction score for each action in the given video clip.
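The scoring stage described above can be sketched as follows. This is a minimal illustration, not the submission's implementation: the mapping matrix `W_map`, the per-class SVM parameters `svm_W` and `svm_b`, and the random stand-in features are all hypothetical placeholders for quantities that would be learned from the training videos.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, C = 16, 8, 5  # Fisher-vector dim, projected dim, number of action classes

# Placeholder parameters; in practice these are learned from training videos.
W_map = rng.standard_normal((D, d))   # learned mapping (projection) matrix
svm_W = rng.standard_normal((C, d))   # one linear SVM weight vector per class
svm_b = rng.standard_normal(C)        # per-class SVM biases

def score_video(fisher_vec):
    """Project the Fisher vector along the learned mapping, then score it
    with one-vs-all linear SVMs; returns one confidence per action class."""
    z = fisher_vec @ W_map            # projection onto the learned mapping
    return svm_W @ z + svm_b          # per-class SVM decision values

fv = rng.standard_normal(D)           # stand-in Fisher vector for one video
scores = score_video(fv)
pred = int(np.argmax(scores))         # highest-scoring action class
```

The projection reduces the Fisher vector to the subspace conjectured to carry the action-discriminative information, and each class's SVM then scores the projected feature independently.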
O. V. Ramana Murthy and R. Goecke, "UC-HCC submission to THUMOS 2014", THUMOS Challenge: Action Recognition with a Large Number of Classes, 2014.