RECURRENT CONVOLUTIONAL NEURAL NETWORK REGRESSION FOR CONTINUOUS PAIN INTENSITY ESTIMATION IN VIDEO

Jing Zhou¹,², Xiaopeng Hong*,¹, Fei Su², and Guoying Zhao¹

1. Faculty of Information Technology and Electrical Engineering, the University of Oulu, Finland
2. Beijing University of Posts and Telecommunications
Email: {xhong, gyzhao}@ee.oulu.fi

1. Abstract

• Automatic pain intensity estimation is significant for the healthcare and medical fields.
• We propose a recurrent convolutional neural network based regression framework that is end-to-end, runs in real time, and achieves promising accuracy on the UNBC-McMaster Shoulder Pain Expression Archive Database.

2. Motivations

• Traditional static methods extract features from each frame of a video separately, which causes unstable changes and spurious peaks among adjacent frames.
• Our solution:
  – Consider a sufficiently large number of historical frames
  – Limit the scale of the parameters within the model
  – Rely on the recurrent convolutional neural network (RCNN) [1]

3. Our Method

(Pipeline, Fig. 1: a video sequence → face warping → warped faces flattened and concatenated into frame vectors FV 1 … FV N → RCNN → pain intensity of each frame.)
Table 1: A summary of the main network configurations. D, n, k, s, and p stand for the feature dimension of each frame, the number of frames in one sample, the kernel size, the stride, and the pooling size in the related layers, resp. 'Conv' and 'MaxP' are short for convolutional and max pooling, resp.

Layer     Configurations
Input     D: 713, n: 30, Channel: 3
Conv 1    256 3 × 3 kernels, s: 1
MaxP 1    p: 4 × 1, s: 4 × 1
RCL 2     256 3 × 3 kernels, s: 1, T: 3
MaxP 2    p: 4 × 1, s: 4 × 1
RCL 3     256 3 × 3 kernels, s: 1, T: 3
MaxP 3    p: 4 × 4, s: 4 × 4
RCL 4     256 3 × 3 kernels, s: 1, T: 3
MaxP 4    p: 2 × 2, s: 2 × 2
RCL 5     256 3 × 3 kernels, s: 1, T: 3
MaxP 5    p: 1 × 1, s: 1 × 1
Output    Pain intensity
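As a sanity check on Table 1, the shape of the feature map reaching the output layer can be traced by hand. This is an illustrative sketch, not the authors' code: it assumes 'same'-padded 3 × 3 convolutions (stride 1 keeps the size) and floor-division pooling, since the poster does not state the border handling.

```python
# Hypothetical sketch: propagate the input shape (D=713 features x n=30
# frames) through the pooling stages of Table 1. Conv/RCL layers are
# assumed to preserve the spatial size ('same' padding, stride 1), so
# only the max-pooling layers shrink the map.

def pool(size, p):
    """Max pooling with pool size == stride reduces a dimension by floor division."""
    return size // p

feat, time = 713, 30                  # Input: D x n
for p_feat, p_time in [(4, 1),        # MaxP 1
                       (4, 1),        # MaxP 2
                       (4, 4),        # MaxP 3
                       (2, 2),        # MaxP 4
                       (1, 1)]:       # MaxP 5
    feat, time = pool(feat, p_feat), pool(time, p_time)

print(feat, time)  # map size reaching the output layer: 5 x 3
```

Under these assumptions the feature dimension shrinks 713 → 178 → 44 → 11 → 5 while the temporal dimension shrinks 30 → 30 → 30 → 7 → 3 before the fully connected output.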

4. Experimental Results

• UNBC-McMaster Shoulder Pain Expression Archive Dataset [2]: 25 subjects; 48,398 frames of 320 × 240 pixels; 16-level PSPI ground truth
• Leave-one-subject-out cross-validation (25-fold)
• Performance measures:
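The 25-fold leave-one-subject-out (LOSO) protocol can be sketched as follows; the subject IDs here are placeholders, not the dataset's actual labels:

```python
# Illustrative LOSO sketch: each fold tests on one subject's sequences
# and trains on the remaining 24 subjects.

def loso_folds(subject_ids):
    """Yield (train_subjects, test_subject) pairs, one fold per subject."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out

subjects = list(range(25))
folds = list(loso_folds(subjects))
print(len(folds))  # 25 folds, one per subject
```

Because the held-out subject never appears in the training folds, the reported accuracy reflects generalization to unseen people rather than unseen frames of known people.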


• For continuous-valued predictions:
  – Activation function of the output layer: linear
  – Loss function: mean squared error (MSE)
  – Training: minimize the loss by back-propagation through time (BPTT)
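The regression head described above reduces to a linear output with an MSE loss. A minimal NumPy sketch (with invented values; the real targets are frame-level PSPI scores):

```python
import numpy as np

# Minimal sketch of the regression objective: linear output activation
# with a mean-squared-error loss. The gradient w.r.t. the predictions is
# what gets propagated backwards through time during training.

def mse_loss(pred, target):
    """Return the MSE and its gradient w.r.t. the predictions."""
    diff = pred - target
    loss = np.mean(diff ** 2)
    grad = 2.0 * diff / diff.size
    return loss, grad

pred = np.array([0.0, 1.5, 3.0])     # hypothetical network outputs
target = np.array([0.0, 2.0, 2.0])   # hypothetical PSPI labels
loss, grad = mse_loss(pred, target)
print(round(loss, 4))  # (0 + 0.25 + 1.0) / 3 ≈ 0.4167
```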


  – Average mean squared error (MSE)
  – Pearson product-moment correlation coefficient (PCC)
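Both measures are standard and can be computed directly in NumPy; the example values below are invented for illustration:

```python
import numpy as np

# Sketch of the two reported measures: average MSE and the Pearson
# product-moment correlation coefficient (PCC) between predicted and
# ground-truth pain intensities.

def avg_mse(pred, truth):
    return float(np.mean((pred - truth) ** 2))

def pcc(pred, truth):
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(pred, truth)[0, 1])

pred = np.array([0.0, 1.0, 2.0, 4.0])
truth = np.array([0.0, 1.0, 3.0, 4.0])
print(avg_mse(pred, truth), pcc(pred, truth))
```

Note that the two measures capture different failure modes: MSE penalizes absolute errors, while PCC rewards predictions that track the temporal shape of the ground-truth intensity curve even if they are offset.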






Figure 1: Framework of the proposed pain intensity estimation approach.

• Input: vector sequences of AAM-warped facial images
• A sliding-window strategy obtains fixed-length input samples for the recurrent network
• Carefully designed RCNN architecture (Fig. 2, left)
• Output: continuous-valued pain intensity
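The sliding-window sampling can be sketched as below. This is an assumption-laden illustration: the stride of 1 frame is not stated on the poster, which only specifies fixed-length windows of n = 30 frames of dimension D = 713 (Table 1).

```python
import numpy as np

# Hypothetical sketch: cut a video of N frame vectors (each of dimension D)
# into overlapping fixed-length samples of n consecutive frames, which are
# the inputs to the recurrent network.

def sliding_windows(frames, n):
    """frames: (N, D) array of per-frame feature vectors -> (N-n+1, n, D)."""
    N = frames.shape[0]
    return np.stack([frames[i:i + n] for i in range(N - n + 1)])

video = np.random.rand(100, 713)   # e.g. N=100 frames, D=713 as in Table 1
samples = sliding_windows(video, n=30)
print(samples.shape)  # (71, 30, 713)
```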


Table 2: Comparative results.


Figure 3: An example sequence. (The frames marked by 'X' on the ground truth are Nos. 150, 210, 260, 290, 330, and 390.)


Figure 2: Left: the overall architecture of the RCNN. Right: unfolding an RCL.

• Each RCL consists of several iterative convolutions sharing weights in the hidden layers among T + 1 time steps.
• Unfolding an RCL therefore turns the layer into a feedforward subnetwork with a depth of T + 1 (Fig. 2, right).
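The unfolding can be illustrated with a toy NumPy sketch of one RCL in the style of [1]: the feedforward response is computed once, then the same recurrent kernel is reapplied for T iterations, so the unfolded layer is a weight-shared feedforward chain of depth T + 1. The 1-D signals, length-3 kernels, and ReLU here are simplifications of the actual 2-D, 256-channel layers.

```python
import numpy as np

# Toy sketch of one recurrent convolutional layer (RCL), unfolded.

def conv1d_same(x, k):
    """'Same'-padded 1-D correlation with a length-3 kernel."""
    xp = np.pad(x, 1)
    return np.array([np.dot(xp[i:i + 3], k) for i in range(len(x))])

def rcl(x, w_ff, w_rec, T):
    relu = lambda v: np.maximum(v, 0.0)
    ff = conv1d_same(x, w_ff)          # feedforward response, time step 0
    state = relu(ff)
    for _ in range(T):                 # T recurrent iterations, shared w_rec
        state = relu(ff + conv1d_same(state, w_rec))
    return state                       # effective feedforward depth: T + 1

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
out = rcl(x, w_ff=np.array([0.1, 0.5, 0.1]),
          w_rec=np.array([0.0, 0.2, 0.0]), T=3)
print(out.shape)  # same length as the input
```

Because the recurrent kernel is shared across iterations, the effective receptive field grows with T while the parameter count stays fixed, which is exactly the property the Motivations section relies on.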

Methods                   MSE     PCC
PTS [3]                   2.59    0.36
DCT [3]                   1.71    0.55
LBP [3]                   1.81    0.48
(DCT+LBP)/RVR [3]         1.39    0.59
2Standmap [4]             1.42    0.55
Hessian Histograms [5]    3.76    0.25
Gradient Histograms [5]   4.76    0.34
Hess+Grad [5]             3.35    0.41
CNN (VGG-face) + SVR      1.70    0.43
Our method                1.54    0.65

• Our method achieves:
  ➢ the best PCC
  ➢ the 3rd lowest MSE
• 25 fps testing speed on:
  ➢ an NVIDIA Tesla K80 GPU
  ➢ 320 GB RAM
  ➢ a 2.3 GHz Intel Xeon E5-2650 CPU

Take Home Messages
• We propose a real-time recurrent convolutional neural network based regression framework for pain intensity estimation.
• We show that using a deep recurrent neural network for pain estimation is promising.

[1] M. Liang and X. Hu. Recurrent convolutional neural network for object recognition. In Proc. CVPR, 2015.
[2] P. Lucey, J. F. Cohn, K. M. Prkachin, P. E. Solomon, S. Chew, and I. Matthews. Painful monitoring: Automatic pain monitoring using the UNBC-McMaster shoulder pain expression archive database. Image and Vision Computing, 30(3):197–205, 2012.
[3] S. Kaltwang, O. Rudovic, and M. Pantic. Continuous pain intensity estimation from facial expressions. In Advances in Visual Computing, pages 368–377, 2012.
[4] X. Hong, G. Zhao, S. Zafeiriou, M. Pantic, and M. Pietikäinen. Capturing correlations of local features for image representation. Neurocomputing, 184:99–106, 2016.
[5] C. Florea, L. Florea, and C. Vertan. Learning pain from emotion: Transferred hot data representation for pain intensity estimation. In ECCV Workshops, 2014.