Deep learning based multimodal with two-phase training strategy for daily life video classification.