RECURRENT CONVOLUTIONAL NEURAL NETWORK REGRESSION FOR CONTINUOUS PAIN INTENSITY ESTIMATION IN VIDEO

Jing Zhou (1,2), Xiaopeng Hong (*,1), Fei Su (2), and Guoying Zhao (1)

1. Faculty of Information Technology and Electrical Engineering, University of Oulu, Finland
2. Beijing University of Posts and Telecommunications
Email: {xhong, gyzhao}@ee.oulu.fi
1. Abstract
➢ Automatic pain intensity estimation is significant to the healthcare and medical fields
➢ We propose a recurrent convolutional neural network based regression framework that is end-to-end and real-time, and achieves promising accuracy on the UNBC-McMaster Shoulder Pain Expression Archive Database

2. Motivations
➢ Traditional static methods extract features from the frames of a video separately, which leads to unstable changes and spurious peaks among adjacent frames
➢ Our solution: consider a sufficiently long history of frames, limit the scale of the parameters within the model, and rely on the recurrent convolutional neural network (RCNN) [1]
3. Our Method
➢ A video sequence is converted into a sequence of frame vectors by face warping (Fig. 1)
Table 1: A summary of the main network configurations. D, n, k, s and p stand for the feature dimension of each frame, the number of frames in one sample, the kernel size, the stride and the pooling size in the related layers, respectively. 'Conv' and 'MaxP' are short for convolutional and max pooling, respectively.

Layer   | Configurations
Input   | D: 713, n: 30, Channels: 3
Conv 1  | 256 3×3 kernels, s: 1
MaxP 1  | p: 4×1, s: 4×1
RCL 2   | 256 3×3 kernels, s: 1, T: 3
MaxP 2  | p: 4×1, s: 4×1
RCL 3   | 256 3×3 kernels, s: 1, T: 3
MaxP 3  | p: 4×4, s: 4×4
RCL 4   | 256 3×3 kernels, s: 1, T: 3
MaxP 4  | p: 2×2, s: 2×2
RCL 5   | 256 3×3 kernels, s: 1, T: 3
MaxP 5  | p: 1×1, s: 1×1
Output  | Pain intensity
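As a quick sanity check of the configurations in Table 1, the feature-map size of one sample can be traced through the pooling stages. The size-preserving ('same'-padded) convolutions and floor-division pooling assumed below are reasonable conventions, not details stated on the poster.

```python
# Trace the (D x n) feature-map size through the MaxP stages of Table 1.
# Assumptions: 3x3 convolutions and RCLs preserve size ('same' padding),
# and pooling uses floor division; the paper's exact rounding may differ.
def trace_shapes(d=713, n=30, pools=((4, 1), (4, 1), (4, 4), (2, 2), (1, 1))):
    shapes = [(d, n)]
    for ph, pw in pools:
        d, n = d // ph, n // pw  # each MaxP layer has stride equal to its pool size
        shapes.append((d, n))
    return shapes

shapes = trace_shapes()  # [(713, 30), (178, 30), (44, 30), (11, 7), (5, 3), (5, 3)]
```

Under these assumptions, the 713×30 input shrinks to a 5×3 map (with 256 channels) before the output layer.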
3. Our Method (Cont.)
➢ Input: vector sequences of AAM-warped facial images
➢ A sliding-window strategy obtains fixed-length input samples for the recurrent network
➢ Carefully designed RCNN architecture (Fig. 2, left)
➢ Output: continuous-valued pain intensity

For continuous-valued predictions:
➢ Activation function of the output layer: a linear function
➢ Loss function: mean squared error (MSE)
➢ Training: minimize the loss by back-propagation through time

Figure 1: Framework of the proposed pain intensity estimation approach. (Warped faces are flattened and concatenated into frame vectors FV_1, …, FV_n, …, FV_N; the RCNN then outputs the pain intensity of each frame.)

4. Experimental Results
➢ UNBC-McMaster Shoulder Pain Expression Archive Database [2]: 25 subjects, 48,398 frames of 320×240 pixels, 16-level PSPI ground truth
➢ Leave-one-subject-out cross validation (25-fold)
➢ Performance measures: average mean squared error (MSE) and Pearson product-moment correlation coefficient (PCC)
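The sliding-window sampling of fixed-length inputs can be sketched as follows; the window length n = 30 comes from Table 1, while the stride of one frame is an assumption (the poster does not state it).

```python
import numpy as np

# Sketch of the sliding-window strategy: cut a video's frame-vector
# sequence into fixed-length samples of n consecutive frames.
def sliding_windows(frame_vectors, n=30, stride=1):
    T = len(frame_vectors)
    # one sample per valid window start position
    return [frame_vectors[t:t + n] for t in range(0, T - n + 1, stride)]

video = np.zeros((100, 713))            # 100 frames, D = 713 per frame vector
samples = sliding_windows(video, n=30)  # 71 overlapping 30-frame samples
```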
Table 2: Comparative results.
Figure 3: An example sequence. (The frames marked by '×' on the ground truth are frames No. 150, 210, 260, 290, 330, and 390.)
Figure 2: Left: the overall architecture of the RCNN. Right: unfolding an RCL.
➢ Each RCL consists of several iterative convolutions whose weights are shared among the hidden layers across T+1 time steps
➢ Unfolding an RCL therefore yields a feedforward subnetwork with a depth of T+1 (Fig. 2, right)
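This unfolding can be illustrated with a minimal 1-D toy (illustrative kernel sizes and a ReLU are assumptions; the actual RCLs in Table 1 are 2-D with 256 channels): the same feedforward and recurrent kernels are reused at every one of the T+1 steps, as in Liang & Hu [1].

```python
import numpy as np

def conv1d_same(x, k):
    # size-preserving 1-D convolution
    return np.convolve(x, k, mode="same")

# Unfolded recurrent convolutional layer: T+1 steps with shared weights.
def rcl_forward(x, k_ff, k_rec, T=3):
    h = np.maximum(conv1d_same(x, k_ff), 0.0)  # t = 0: feedforward path only
    for _ in range(T):                          # t = 1..T reuse the same kernels
        h = np.maximum(conv1d_same(x, k_ff) + conv1d_same(h, k_rec), 0.0)
    return h

x = np.ones(8)
out = rcl_forward(x, k_ff=np.array([0.1, 0.2, 0.1]), k_rec=np.array([0.0, 0.5, 0.0]))
```

With T = 3 the loop runs three times, so the unfolded subnetwork has depth T+1 = 4, matching Fig. 2 (right).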
Methods                 | MSE  | PCC
PTS [3]                 | 2.59 | 0.36
DCT [3]                 | 1.71 | 0.55
LBP [3]                 | 1.81 | 0.48
(DCT+LBP)/RVR [3]       | 1.39 | 0.59
2Standmap [4]           | 1.42 | 0.55
Hessian Histograms [5]  | 3.76 | 0.25
Gradient Histograms [5] | 4.76 | 0.34
Hess+Grad [5]           | 3.35 | 0.41
CNN (VGG-face) + SVR    | 1.70 | 0.43
Our method              | 1.54 | 0.65
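The two measures reported in Table 2 can be computed per test sequence as sketched below (a straightforward NumPy rendering of the standard definitions, not the authors' evaluation code):

```python
import numpy as np

# Mean squared error between ground-truth and predicted pain intensities.
def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

# Pearson product-moment correlation coefficient.
def pcc(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.corrcoef(y_true, y_pred)[0, 1]

# A perfect prediction gives MSE 0 and PCC 1; any linear rescaling of the
# prediction keeps PCC at 1 but changes MSE.
```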
Our method achieves:
➢ the best PCC
➢ the 3rd lowest MSE
25 fps testing speed on:
➢ an NVIDIA Tesla K80 GPU
➢ 320 GB RAM
➢ a 2.3 GHz Intel Xeon E5-2650 CPU
5. Take Home Messages
➢ We propose a real-time recurrent convolutional neural network based regression framework for pain intensity estimation
➢ We show that using a deep recurrent neural network for pain estimation is promising
References
[1] M. Liang and X. Hu. Recurrent convolutional neural network for object recognition. In Proc. CVPR, 2015.
[2] P. Lucey, J. F. Cohn, K. M. Prkachin, P. E. Solomon, S. Chew, and I. Matthews. Painful monitoring: Automatic pain monitoring using the UNBC-McMaster shoulder pain expression archive database. Image and Vision Computing, 30(3):197–205, 2012.
[3] S. Kaltwang, O. Rudovic, and M. Pantic. Continuous pain intensity estimation from facial expressions. In Advances in Visual Computing, pages 368–377, 2012.
[4] X. Hong, G. Zhao, S. Zafeiriou, M. Pantic, and M. Pietikäinen. Capturing correlations of local features for image representation. Neurocomputing, 184:99–106, 2016.
[5] C. Florea, L. Florea, and C. Vertan. Learning pain from emotion: Transferred hot data representation for pain intensity estimation. In ECCV Workshops, 2014.