CONTENTS
Introduction
Overview of the system
LSTM
C3D
Experiment
Conclusion
References
INTRODUCTION
OVERVIEW OF THE SYSTEM
CONVOLUTIONAL NEURAL NETWORK
Max pooling for spatial invariance
The size of the feature map is reduced.
The network gains the flexibility to detect distorted features.
The strongest features are preserved.
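The pooling properties listed above can be illustrated with a minimal NumPy sketch. The function name `max_pool_2d`, the 2x2 window, and the toy feature maps are assumptions made for illustration only; they are not taken from the slides.

```python
import numpy as np

def max_pool_2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    trimmed = feature_map[:h_out * size, :w_out * size]
    # Split into (h_out, size, w_out, size) windows and take the max of each window.
    windows = trimmed.reshape(h_out, size, w_out, size)
    return windows.max(axis=(1, 3))

# Toy 4x4 feature map with one strong response per 2x2 window.
feature_map = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 3, 0, 0],
    [0, 0, 4, 0],
])
# Same responses, each displaced by one pixel inside its window (a small distortion).
shifted = np.array([
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 4],
])

pooled = max_pool_2d(feature_map)
pooled_shifted = max_pool_2d(shifted)
```

Both inputs pool to the same 2x2 map: the size is halved in each dimension, the strongest response in each window survives, and the small displacement does not change the result.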
RECURRENT NEURAL NETWORK
LSTM : A SPECIAL RNN
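Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
$W_f$ = Weight
$x_t$ = New input
$h_{t-1}$ = Previous hidden state
$b_f$ = Bias
Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
Output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$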
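The forget, input, and output gates named above can be sketched as one time step of a standard LSTM cell. This is a minimal NumPy sketch, not the presenters' implementation: the function `lstm_step`, the dictionary layout of the weights, the candidate-state and cell-update equations, and all sizes are assumptions based on the textbook LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps a gate name ("f", "i", "o", "c") to a (hidden, hidden + input) matrix,
    b maps the same names to (hidden,) bias vectors.
    """
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ concat + b["i"])       # input gate
    o_t = sigmoid(W["o"] @ concat + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ concat + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde            # keep part of the old state, add new
    h_t = o_t * np.tanh(c_t)                      # exposed hidden state
    return h_t, c_t

# Hypothetical sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W = {g: rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size)) for g in "fioc"}
b = {g: np.zeros(hidden_size) for g in "fioc"}

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):      # a sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, W, b)
```

The forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate decides how much of the cell state reaches the hidden state passed to the next step.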
3D CONVOLUTIONAL NEURAL NETWORKS (C3D)
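A 3D convolution slides its kernel over time as well as over height and width, so one filter responds to motion across neighbouring frames. The sketch below is a naive single-channel NumPy version meant only to show the operation; the function `conv3d_valid`, the toy clip size, and the averaging kernel are assumptions, and a real C3D network stacks many learned filters.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive single-channel 3D cross-correlation, no padding, stride 1."""
    t, h, w = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):          # time
        for j in range(out.shape[1]):      # height
            for k in range(out.shape[2]):  # width
                out[i, j, k] = np.sum(clip[i:i + kt, j:j + kh, k:k + kw] * kernel)
    return out

# Toy clip: 16 frames of 8x8 pixels (hypothetical size, far smaller than real video).
clip = np.random.default_rng(0).random((16, 8, 8))
# 3x3x3 averaging kernel; a trained network would learn these weights.
kernel = np.ones((3, 3, 3)) / 27.0

features = conv3d_valid(clip, kernel)
```

With a 3x3x3 kernel and no padding, each dimension shrinks by two, so the 16x8x8 clip produces a 14x6x6 feature volume.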
SUPPORT VECTOR MACHINE
Hyperplanes
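A linear SVM separates two classes with a hyperplane w . x + b = 0 and labels a point by the side it falls on. The sketch below assumes a hand-picked 2-D hyperplane purely for illustration; the values of `w` and `b` and the helper names are hypothetical, and a trained SVM would learn them by maximising the margin.

```python
import numpy as np

# Hypothetical hyperplane w . x + b = 0 in 2-D (values chosen only for illustration).
w = np.array([1.0, -1.0])
b = 0.0

def decision_function(X, w, b):
    """Signed distance of each row of X from the hyperplane."""
    return (X @ w + b) / np.linalg.norm(w)

def predict(X, w, b):
    """Label +1 on the positive side of the hyperplane, -1 on the negative side."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

X = np.array([[2.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
distances = decision_function(X, w, b)
labels = predict(X, w, b)
```

The sign of the distance gives the class, and its magnitude shows how far the point lies from the decision boundary.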
EXPERIMENT
The CNN features of the faces are taken from the fc6 layer of the
VGG16-Face model, fine-tuned on the FER2013 facial emotion
database.
C3D Architecture
CONCLUSION
Accuracy of 59.02%
REFERENCES
[1] Tran, D., Bourdev, L., Fergus, R., Torresani, L. and Paluri, M. 2015. Learning spatiotemporal
features with 3D convolutional networks. In 2015 IEEE International Conference on Computer
Vision (ICCV). 4489-4497. IEEE.
[2] Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S.,
Saenko, K. and Darrell, T. 2015. Long-term recurrent convolutional networks for visual
recognition and description. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition. 2625-2634.
[3] Yao, A., Shao, J., Ma, N. and Chen, Y. 2015. Capturing AU-Aware Facial Features and
Their Latent Relations for Emotion Recognition in the Wild. ACM ICMI.
[4] Eyben, F., Wöllmer, M. and Schuller, B. 2010. openSMILE: the Munich versatile
and fast open-source audio feature extractor. In Proceedings of the 18th ACM international
conference on Multimedia. 1459-1462. ACM.
[5] Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R. and Pal, C. 2015. Recurrent
neural networks for emotion recognition in video. In Proceedings of the 2015 ACM on
International Conference on Multimodal Interaction. 467-474. ACM.
[6] Dhall, A., Goecke, R., Lucey, S. and Gedeon, T. 2012. Collecting large, richly annotated
facial-expression databases from movies. IEEE Multimedia.
[7] Liu, M., Wang, R., Li, S., Shan, S., Huang, Z. and Chen, X. 2014. Combining Multiple
Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild. ACM ICMI.
[8] Dhall, A., Goecke, R. and Gedeon, T. 2015. Automatic Group Happiness Intensity
Analysis. IEEE Transactions on Affective Computing.
[9] Kahou, S. E., Pal, C., Bouthillier, X., Froumenty, P., Gülçehre, Ç., Memisevic, R.
and Mirza, M. 2013. Combining modality specific deep neural networks for emotion
recognition in video. In Proceedings of the 15th ACM on International conference on
multimodal interaction. 543-550. ACM.
[10] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R.,
Guadarrama, S. and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature
embedding. In ACM MM.
[11] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D. and
Rabinovich, A. 2015. Going deeper with convolutions. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition. 1-9.
[12] He, K., Zhang, X., Ren, S. and Sun, J. 2015. Deep residual learning for image
recognition. arXiv preprint arXiv:1512.03385.
[13] Simonyan, K. and Zisserman, A. 2014. Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556.
[14] Parkhi, O. M., Vedaldi, A. and Zisserman, A. 2015. Deep face recognition. In
British Machine Vision Conference (Vol. 1, No. 3, p. 6).
[15] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K. and Fei-Fei, L. 2009. Imagenet: A
large-scale hierarchical image database. In Computer Vision and Pattern
Recognition. CVPR. 248-255. IEEE.
[16] Carrier, P. L., Courville, A., Goodfellow, I. J., Mirza, M. and Bengio, Y. 2013.
FER-2013 face database. Technical report, 1365, Université de Montréal.
[17] Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H. and
Schmidhuber, J. 2009. A novel connectionist system for unconstrained handwriting
recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5),
855-868.
[18] Sak, H., Senior, A. W. and Beaufays, F. 2014. Long short-term memory recurrent
neural network architectures for large scale acoustic modeling. In INTERSPEECH.
338-342.
[19] Kim, B. K., Dong, S. Y., Roh, J., Kim, G. and Lee, S. Y. 2016. Fusing Aligned and
Non-Aligned Face Information for Automatic Affect Recognition in the Wild: A Deep
Learning Approach. In Computer Vision and Pattern Recognition. CVPR.
[20] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R. and Fei-Fei, L. 2014.
Large-scale Video Classification with Convolutional Neural Networks.
[21] Ng, J., Hausknecht, M., Vijayanarasimhan, S., Monga, R., Vinyals, O. and Toderici,
G. 2015. Beyond Short Snippets: Deep Networks for Video Classification. In
Computer Vision and Pattern Recognition. CVPR. 4694-4702. IEEE.
[22] Sharma, S., Kiros, R. and Salakhutdinov, R. 2016. Action Recognition using Visual
Attention. Workshop track - ICLR.
[23] Kaya, H., Gürpinar, F., Afshar, S. and Salah, A. A. 2015. Contrasting and
Combining Least Squares Based Learners for Emotion Recognition in the Wild. In
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction.
459-466. ACM.
[24] Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T. and
Saenko, K. 2015. Sequence to sequence - video to text. In Proceedings of the IEEE
International Conference on Computer Vision. 4534-4542.
[25] Pan, P., Xu, Z., Yang, Y., Wu, F. and Zhuang, Y. 2015. Hierarchical Recurrent
Neural Encoder for Video Representation with Application to Captioning. arXiv
preprint arXiv:1511.03476.
[26] Graves, A., Mohamed, A. R. and Hinton, G. 2013. Speech recognition with deep
recurrent neural networks. In 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing. 6645-6649. IEEE.
[27] Dhall, A., Goecke, R., Joshi, J., Hoey, J. and Gedeon, T. 2016. EmotiW 2016:
Video and Group-level Emotion Recognition Challenges. ACM ICMI 2016.
[28] Li, J., Wang, T. and Zhang, Y. 2011. Face Detection using SURF Cascade.
In ICCV Computer Vision Workshops.
[29] Fernández, S., Graves, A. and Schmidhuber, J. 2007. An application of recurrent
neural networks to discriminative keyword spotting. In International Conference on
Artificial Neural Networks. 220-229. Springer Berlin Heidelberg.