SOURCE CELL PHONE CAMERA IDENTIFICATION BASED ON SINGULAR VALUE DECOMPOSITION

Gökhan Gül¹, İsmail Avcıbaş²

¹ Faculty of Engineering, University of Kiel, Kiel, Germany
² Electrical and Electronics Engineering Department, Baskent University, Turkey
E-mail(s): [email protected], [email protected]

ABSTRACT

Micro and macro statistical features based on Singular Value Decomposition (SVD) are proposed for source cell-phone identification. The performance of the proposed features is evaluated with naïve and informed classifiers, both for original images and for images subjected to several manipulations. The results are compared with the state of the art, and the SVD based features are observed to be comparable to their counterparts in the literature at reduced complexity.

Index Terms— Image forensics, source identification, singular value decomposition.

1. INTRODUCTION

With the ever increasing availability of cell-phones equipped with cameras, identification of the image source is becoming necessary for legal and security reasons. Image source identification requires an understanding of the physics and processes of the image formation pipeline. This pipeline is similar for almost all digital cameras, although many of the details are kept as proprietary information by each manufacturer. The digital camera pipeline consists of a lens system, sampling filters, a color filter array, an imaging sensor, and a digital image processor [1]. After the light enters the camera through the lens, it passes through a combination of filters, including at least infra-red and anti-aliasing filters, to ensure maximum visible quality. The light is then focused onto the imaging sensor. Digital cameras deploy charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) imaging sensors. Capturing color images requires a separate sensor for each color component; due to cost considerations, however, most digital cameras use a single sensor along with a color filter array (CFA). The CFA arranges pixels in a pattern so that each element has a different spectral filter. CFA patterns are most commonly composed of red-green-blue (RGB) or cyan-magenta-yellow (CMY) color components. The missing color values for each pixel are obtained through a demosaicing operation. This is followed by other forms of processing such as white point correction, image sharpening, aperture correction, gamma correction, and compression. Although the stages described here are standard in a digital camera pipeline, the exact processing details in each stage vary from one manufacturer to another, and even between camera models of the same company [2].

There are differences between digital cameras and cell-phone cameras. While their imaging pipelines are similar, there are significant differences in quality. Cell-phone cameras produce lower quality images for several reasons: they have lower resolution, a fixed f-number, and small aperture stops; their flashes are weak due to power constraints; and their analog-to-digital conversion (ADC) typically uses 10 bits instead of the 12 bits common in conventional digital cameras [3].

There have been a number of works on camera forensics in the literature. Camera identification based on sensor noise is investigated in [4], where it is demonstrated that photo-response nonuniformity noise (PRNU) is unique to each camera. Camera identification based on demosaicing artifacts is investigated in [5], [6], which exploit the tell-tale effect left by the proprietary interpolation algorithm used for the missing color values. The authors in [7] use intrinsic lens radial distortion for camera identification, as this distortion varies from one manufacturer to another. Another method uses the sensor dust characteristics of digital cameras [8]; the dust pattern may be stable on the sensor surface, since most digital cameras do not offer a built-in solution for sensor dust removal. The authors in [3] extract a number of features from cell-phone images based on the assumption that the manufacturer-specific processing pipeline, the camera noise, and the CCD array nonuniformity leave tell-tale effects on the images. Their work in [3] differs from the others in that they employ both feature level and decision level fusion.

Our approach is based on the assumption that image rows/columns will exhibit the CFA interpolation and sensor noise characteristics in the form of relative linear (in)dependency, since CFA interpolation introduces inter-pixel correlations and sensor noise is added in a scene-independent way.

We introduce SVD in Section 2 and SVD based features in Section 3. Experimental results and conclusions are given in Sections 4 and 5, respectively.

2. SINGULAR VALUE DECOMPOSITION

SVD is an extremely powerful tool in linear algebra. It decomposes a matrix A ∈ ℝ^(M×N) into the product of two orthonormal matrices U ∈ ℝ^(M×M), V ∈ ℝ^(N×N) and a diagonal matrix S ∈ ℝ^(M×N) as follows:

A = U S V^T .        (1)

The diagonal elements of S are non-negative and sorted in decreasing order, σ_1 ≥ σ_2 ≥ … ≥ σ_min(M,N), where M, N are the dimensions of A. These elements form a vector called the singular value vector,

Sv = Diag(S) .        (2)

The singular values of a matrix indicate the soft relationship between image rows/columns in terms of linear dependency: singular values tend toward zero as the image rows and/or columns become relatively linearly dependent. Two rows/columns c1 and c2 of a matrix are linearly dependent if they can be written as c2 = K·c1 for some scalar K. Accordingly, we define relative linear dependency between two rows/columns as the closeness of one row/column to a scalar multiple of the other.
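As a concrete illustration, the decomposition of Eqs. (1)-(2) can be reproduced in a few lines of numpy; this is a minimal sketch, and the random 8×8 block below is a stand-in for an image sub-block, not data from the paper.

```python
import numpy as np

# Minimal sketch of Eqs. (1)-(2): decompose a block A and read off its
# singular value vector Sv. The random block is a placeholder input.
A = np.random.rand(8, 8)

U, Sv, Vt = np.linalg.svd(A)   # A = U @ diag(Sv) @ Vt, i.e. Eq. (1)
# numpy returns the singular values already sorted in decreasing order,
# sigma_1 >= sigma_2 >= ... >= sigma_min(M,N), so Sv is Eq. (2) directly.

assert np.allclose(A, U @ np.diag(Sv) @ Vt)   # sanity check
```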

3. SVD BASED FEATURES

A good model of the relative linear dependency of image rows/columns can be sufficient to identify the model of a cell-phone accurately. The different CFA interpolation algorithms of different mobile phones, as well as the noise produced by the semiconductor sensors, can be expected to introduce tell-tale effects both on local neighborhoods and on image macro blocks, including the image itself. Therefore a common characterization should comprise macro and micro statistics. In the following sub-sections, we derive Sv based features to model the relative linear dependency of image macro and micro sub-blocks.

3.1. Micro Statistical Features

To obtain the micro statistical features, images are first divided into sub-blocks of size w×w (w = 3, 4, …, 20). Each singular value is then normalized by the sum of the singular values of its sub-block, to compensate for the different energy levels of different images. To take into account the correlations within and among the image blocks, the blocks are overlapped proportionally to the block size: sub-blocks with w = 3 to 12 are not overlapped, sub-blocks with w = 13 to 15 are 50% overlapped, and sub-blocks with w = 16 to 20 are 75% overlapped. Two different types of features are then derived from this sub-block scheme as follows; a code sketch follows their definitions.

Features of Type 1: These features are the means of the numbers of zeros at index i in the Sv of blocks of size w×w. Let B be the number of overlapped blocks, as defined above according to the block size; the first type of features is defined as

f_w^(1)(i) = (1/B) · Σ_B δ(Sv(i)) ,   w = 3, …, 20 ,  i = 1, …, w ,

where δ(k) = 1 for k = 0 and δ(k) = 0 otherwise.

Features of Type 2: These features are the means of the singular values at index i in the Sv of blocks of size w×w:

f_w^(2)(i) = (1/B) · Σ_B Sv(i) ,   w = 3, …, 20 ,  i = 1, …, w .
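The micro-feature extraction can be sketched as follows; `micro_features` is an illustrative helper, not code from the paper, and testing floating-point singular values for exact zero is a simplification (in practice a small threshold may stand in for exact zero).

```python
import numpy as np

def micro_features(img, w):
    """Illustrative type 1 / type 2 features for one block size w (Sec. 3.1)."""
    # Overlap proportional to block size, as described above
    # (steps are rounded for odd block sizes).
    if w <= 12:
        step = w          # no overlap
    elif w <= 15:
        step = w // 2     # ~50% overlap
    else:
        step = w // 4     # ~75% overlap

    svs = []
    H, W = img.shape
    for r in range(0, H - w + 1, step):
        for c in range(0, W - w + 1, step):
            sv = np.linalg.svd(img[r:r + w, c:c + w], compute_uv=False)
            svs.append(sv / sv.sum())        # normalize by the sum (Sec. 3.1)
    svs = np.asarray(svs)                    # shape (B, w)

    f1 = (svs == 0).mean(axis=0)   # type 1: fraction of zeros at index i
    f2 = svs.mean(axis=0)          # type 2: mean singular value at index i
    return f1, f2
```

Looping w from 3 to 20 and concatenating the outputs yields the candidate micro features.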

Note that type 1 and type 2 features have previously been used in [9] for steganalysis purposes.

3.2. Macro Statistical Features

Besides the first two types of features, macro statistical features are extracted from the entire image as well as from image macro blocks. The derivation of a singular value vector from the entire image, Sve, is straightforward. For the image macro blocks, the following procedure is applied to obtain a unique singular value vector (a code sketch is given at the end of this sub-section):
• Divide the image A into four non-overlapping, equal size sub-blocks (A1, A2, A3, A4).
• Find the mean sub-block As = 0.25·(A1+A2+A3+A4) and subtract As from the image sub-blocks: Bj = Aj − As, for j = 1, 2, 3, 4.
• Calculate the singular value vector Svj of each Bj for j = 1, 2, 3, 4 and determine the average Sva = 0.25·Σj Svj.
• Normalize Sva by the sum of its elements: Svn = (1/K)·Sva, where K = Σi Sva(i).

Using Sve and Svn, respectively, type 3 features are extracted as follows.

Features of Type 3: For the considered Sv (Sve or Svn), let Sv0 be the vector formed by the non-zero elements of Sv. The third type of features is then defined as

f_j^(3)(1) = (1/(MN)) · max(Sv0) / min(Sv0) ,
f_j^(3)(2) = (1/(MNL)) · Σ_i (Sv0^(−1)(i) − μ)² ,
f_j^(3)(3) = (1/(MNL)) · Σ_i Sv0^(−1)(i) ,        (3)
f_j^(3)(4) = (1/(MN)) · Σ_i Sv0(i) · Sv0^(−2)(N/2 − i + 1) ,

where j = 1, 2 denotes the entire image and the image macro blocks respectively, L denotes the length of Sv0, and μ is the mean value, calculated as μ = (1/L)·Σ_i Sv0^(−1)(i).

So far, we have obtained 207 type 1, 189 type 2 and 8 type 3 features for the feature selection. Type 1 features indicate that the absolute linear dependency of an image can be characterized by counting, over the image sub-blocks, the singular values that are zero at a particular index. Type 2 features, on the other hand, model the relative linear dependency of an image. Type 3 features are proposed to characterize the macro statistics. The first type 3 feature is the condition number of Sv, a measure relating the energy and the ill-conditioning of a matrix in terms of invertibility. The second and third features describe the statistical properties of the high frequency singular values, whereas the fourth feature captures the correlation of the first N/2 low frequency singular values.
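A sketch of the macro-feature computation follows; the helper names are illustrative, and since Eq. (3) was reconstructed from a partially garbled original, the exact form of the fourth feature in particular should be read as an assumption.

```python
import numpy as np

def macro_sv(img):
    """Singular value vector for the image macro blocks (Sec. 3.2 procedure)."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    blocks = [img[:h, :w], img[:h, w:2 * w],
              img[h:2 * h, :w], img[h:2 * h, w:2 * w]]
    As = 0.25 * sum(blocks)                               # mean sub-block
    svs = [np.linalg.svd(B - As, compute_uv=False) for B in blocks]
    Sva = 0.25 * sum(svs)                                 # average Sv
    return Sva / Sva.sum()                                # normalized Svn

def type3_features(Sv, M, N):
    """The four type 3 features of Eq. (3); the fourth follows our
    reconstruction of the garbled original and is an assumption."""
    Sv0 = Sv[Sv != 0]              # keep non-zero entries only
    L = len(Sv0)
    inv = 1.0 / Sv0
    mu = inv.mean()                                 # mean inverse singular value
    f1 = (Sv0.max() / Sv0.min()) / (M * N)          # scaled condition number
    f2 = ((inv - mu) ** 2).sum() / (M * N * L)      # variance of inverses
    f3 = inv.sum() / (M * N * L)                    # mean of inverses, rescaled
    half = L // 2                                   # first "N/2" values
    f4 = (Sv0[:half] * Sv0[half - 1::-1] ** -2).sum() / (M * N)
    return f1, f2, f3, f4
```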

4. EXPERIMENTAL RESULTS

4.1. Image Set

We have used 200 JPEG images from each camera model: LG5600 (L1), MotorolaV3 (M1), MotorolaV500 (M2), Nokia5140 (N1), Nokia6230 (N2), Nokia6600 (N3), Nokia7270 (N4), SonyK700 (S1), and SonyK750 (S2), which adds up to 1800 images in total. These images have previously been used in [10]. All images were converted to gray level before feature extraction.

4.2. Classification and Feature Selection

For classification we use a soft margin support vector machine (C-SVM) classifier with an RBF kernel [11]. The SVM parameters that maximize accuracy are determined before the feature selection and classification process (γ = 35, C = 1000). Half of the 1800 images, 100 from each camera, are used for training and the rest for testing. The Sequential Forward Floating Search (SFFS) algorithm is adopted for the feature selection process [12]. The 18 selected features are given in Table 1; a classification sketch follows the table.

TABLE I
SELECTED FEATURES FOR THE CLASSIFICATION OF VARIOUS CELL-PHONE BRANDS

f_4^(2)(2)    f_4^(2)(3)    f_5^(2)(2)    f_6^(1)(2)    f_8^(2)(1)    f_1^(3)(4)
f_8^(2)(2)    f_8^(2)(5)    f_8^(2)(6)    f_9^(2)(1)    f_10^(2)(3)   f_2^(3)(1)
f_12^(2)(4)   f_15^(2)(4)   f_17^(2)(3)   f_20^(1)(2)   f_2^(3)(2)    f_2^(3)(3)
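A minimal sketch of the classification setup follows, using scikit-learn's C-SVM in place of the LIBSVM package of [11]; the feature files are hypothetical placeholders for the 18 selected features per image.

```python
import numpy as np
from sklearn.svm import SVC

# X holds one row of selected features per image, y the camera labels;
# the .npy files are assumed to have been produced by the feature
# extraction helpers sketched earlier (hypothetical file names).
X_train, y_train = np.load("train_feats.npy"), np.load("train_labels.npy")
X_test, y_test = np.load("test_feats.npy"), np.load("test_labels.npy")

clf = SVC(kernel="rbf", gamma=35, C=1000)   # parameters from Sec. 4.2
clf.fit(X_train, y_train)
print("accuracy: %.1f%%" % (100 * clf.score(X_test, y_test)))
```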

4.3. Source Cell-Phone Identification

[Fig. 1. Scatter plot of the best three features (axes: feature1, feature2, feature3).]

To observe the discriminative ability of the features, a scatter plot of the best three features in the reduced feature space is given in Fig. 1, where the blue, red and green colors correspond to the Motorola V3, Sony K700i and Nokia 6230i mobile phones, respectively. Identification among three cameras is often quite successful [10]. Detection performance generally drops as the number of cameras increases; we therefore evaluate the effectiveness of our method directly on nine cameras.

Table 3 gives the confusion matrix of the multi-class classification for 9 different cell-phones. The detection accuracy achieved by the features of Table 1 is 92.4%. In a previous work, Swaminathan et al. achieved 90% detection performance for 9 different camera models; their image set, however, is obtained by cutting five non-overlapping 512×512 sub-images from each image in the image set. This diminishes the variety of the image set, since the sensor noise characteristics of parts of the same image are correlated. Experimental analysis in [13] shows that detection accuracy increases when the images are captured from similar content. In a more closely related work, Çeliktutan et al. [10] reported 91.2% detection performance on the same image set with the best 27 features selected by the SFFS algorithm. It is worth noting that the 27 features of [10] come from rather heterogeneous domains, namely Higher Order Wavelet Statistics (HOWS) [14], Image Quality Measures (IQM) [15] and Binary Similarity Measures (BSM) [16], whereas the proposed features come from the analysis of linearity among image rows and columns. Moreover, in [10] a total of 564 features was obtained from the RGB channels of true color images, whereas in this paper we consider only gray level images, since the extension to the RGB channels is straightforward. It is reported in [3] that extracting features only from gray level images lowers the overall performance by five percent compared to the case when all R-G-B bands are used for feature extraction.

TABLE III
IDENTIFICATION PERFORMANCE OF THE PROPOSED METHOD FOR 9 DIFFERENT CAMERA MODELS

      L1   M1   M2   N1   N2   N3   N4   S1   S2
L1    96    0    0    2    0    2    0    0    0
M1     0   94    5    1    0    0    0    0    0
M2     0    8   91    0    0    1    0    0    0
N1     7    0    0   84    2    0    6    0    1
N2     0    0    0    0   97    3    0    0    0
N3     9    0    0    0    0   91    0    0    0
N4     8    1    0    3    7    0   81    0    0
S1     0    0    0    0    0    0    0  100    0
S2     1    0    0    0    0    0    1    0   98
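As a consistency check, the mean of the diagonal entries of Table 3, (96 + 94 + 91 + 84 + 97 + 91 + 81 + 100 + 98)/9 = 832/9 ≈ 92.4%, reproduces the detection accuracy quoted in the text.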

TABLE IV
IMAGE MANIPULATION TYPES AND PARAMETERS

Brightness          1   5   10   15   25   30   40   50
Contrast            1   5   10   15   25   30   40   50
Upsampling (%)      1   5   10   15   25   30   45
Downsampling (%)    1   5   10   15   25
Cropping (%)        10  20  30   40   50
Rotation (°)        1   3   5    10   15   25
JPEG Compression    40  50  60   70   80   90
Sharpen             Photoshop default
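The manipulations of Table 4 can be generated along the following lines; the paper does not state which tool produced its manipulated set, so the Pillow functions and the parameter-to-factor mappings below are assumptions for illustration only.

```python
from PIL import Image, ImageEnhance, ImageFilter

# Illustrative generation of the Table 4 manipulations with Pillow.
# Mapping a percentage p to an enhancement factor (1 + p/100) is an
# assumption, not the paper's documented setting.
def manipulate(img: Image.Image, kind: str, p: float = 0) -> Image.Image:
    if kind == "brightness":
        return ImageEnhance.Brightness(img).enhance(1 + p / 100.0)
    if kind == "contrast":
        return ImageEnhance.Contrast(img).enhance(1 + p / 100.0)
    if kind in ("upsample", "downsample"):
        s = 1 + p / 100.0 if kind == "upsample" else 1 - p / 100.0
        return img.resize((round(img.width * s), round(img.height * s)))
    if kind == "crop":   # remove p% of each dimension, split over the borders
        dw, dh = int(img.width * p / 200), int(img.height * p / 200)
        return img.crop((dw, dh, img.width - dw, img.height - dh))
    if kind == "rotate":
        return img.rotate(p)
    if kind == "sharpen":
        return img.filter(ImageFilter.SHARPEN)
    raise ValueError(kind)

# JPEG compression at quality q is applied when saving, e.g.:
#   manipulate(im, "rotate", 5).save("out.jpg", quality=q)
```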

TABLE V
OVERALL IDENTIFICATION PERFORMANCES OF THE PROPOSED METHOD AND [10]

              ---- Naïve ----     --- Informed ---
              [10]   Proposed     [10]   Proposed
Brightness    U      0.85         U      0.88
Contrast      0.68   0.80         0.80   0.86
Upsamp.       U      0.22         U      0.82
Downsamp.     0.13   0.17         0.70   0.79
Cropping      0.76   0.24         0.81   0.80
Rotation      0.15   0.20         0.71   0.70
Compress.     0.58   0.64         0.74   0.76
Sharpen       0.58   0.77         0.81   0.78

The overall detection performance reported in [3] for the same 9 cameras using R-G-B images is 96.8%, which is 4.4% higher than the detection performance obtained in this paper. Note, however, that our result is achieved with 18 features extracted from grayscale images, compared to the 192 features extracted from RGB images in [3].

4.4. Robustness Analysis

In this section we investigate the potential of the proposed method under several image manipulations or attacks. After the image formation process, an image may have been subjected to severe post-processing such as JPEG compression or cropping. For instance, one may have compressed an image innocently to comply with web-page requirements, or one may have cropped out undesired details maliciously. In both cases, we may be asked to find the source of the image for legal purposes; source camera identification by forensic analysis must therefore remain possible under image manipulations. Table 4 lists the image manipulation types and parameters considered in this paper.

In the following, we perform two different experiments. In the first experiment, the classifier is not shown any manipulated images during training and is therefore called the naïve classifier; in the second, we train the classifier with both the original and the manipulated images and therefore call it the informed classifier. In the training phase, 900 original images are used for the naïve classifier, while the informed classifier uses 900×46 manipulated plus 900 original images, 42,300 images in total. Note that in neither test do the training and testing images overlap. In the testing phase, the remaining 42,300 images are used for both classifiers.

In Table 5, we compare the overall identification performance of the proposed method with [10], where U denotes that no result is available for that experiment. The results are reported as averages over the parameter settings of each manipulation given in Table 4. Some manipulations, such as downsampling, cropping and rotation, severely drop the identification performance, indicating that the proposed features are not robust enough under geometrical attacks, as was also observed for [10]. Performance improves, however, with informed classification. In particular, for the contrast, scaling and JPEG compression attacks, the proposed SVD based features outperform [10] for both naïve and informed classification.

5. CONCLUSION

We have proposed SVD based forensic features for cell-phone camera source identification. We experimentally showed that the imaging pipeline of each cell-phone model leaves tell-tale effects in the form of linear (in)dependency within image rows/columns. The comparison with the state of the art demonstrates that a comparable level of performance can be obtained with a considerably smaller number of features. These results also suggest a better generalization capability of the proposed method.

Acknowledgement: This work has been supported in part by TUBİTAK Kariyer Project 104E056.

REFERENCES

[1] J. Adams, K. Parulski, and K. Spaulding, "Color processing in digital cameras," IEEE Micro, vol. 18, no. 6, 1998.
[2] H. T. Sencar and N. Memon, "Overview of state-of-the-art in digital image forensics," in Statistical Science and Interdisciplinary Research, Indian Statistical Institute Platinum Jubilee Monograph Series, World Scientific Press, 2008.
[3] O. Çeliktutan, B. Sankur, and İ. Avcıbaş, "Blind identification of source cell-phone model," IEEE Trans. Inf. Forensics Security, vol. 3, no. 3, pp. 553-566, Sep. 2008.
[4] J. Lukáš, J. Fridrich, and M. Goljan, "Digital camera identification from sensor noise," IEEE Trans. Inf. Forensics Security, vol. 1, no. 2, pp. 205-214, Jun. 2006.
[5] M. Kharrazi, H. T. Sencar, and N. Memon, "Blind source camera identification," in Proc. Int. Conf. Image Processing, 2004, vol. 1, pp. 709-712.
[6] Y. Long and Y. Huang, "Image based source camera identification using demosaicking," in Proc. IEEE 8th Workshop Multimedia Signal Processing, Oct. 2006, pp. 419-424.
[7] K. S. Choi, E. Y. Lam, and K. Y. Wong, "Automatic source identification using the intrinsic lens radial distortion," Opt. Express, vol. 14, no. 24, pp. 11551-11565, Nov. 2006.
[8] A. E. Dirik, H. T. Sencar, and N. Memon, "Source camera identification based on sensor dust characteristics," in Proc. Signal Processing Applications Public Security Forensics, Apr. 2007, pp. 1-6.
[9] G. Gül, A. E. Dirik, and İ. Avcıbaş, "Steganalytic features for JPEG compression based perturbed quantization," IEEE Signal Processing Letters, vol. 14, pp. 205-208, 2007.
[10] O. Çeliktutan, B. Sankur, and İ. Avcıbaş, "Blind identification of cellular phone cameras," in Proc. 19th SPIE Electronic Imaging Conf. 6505: Security, Steganography, and Watermarking of Multimedia Contents, San Jose, CA, Jan. 2007.
[11] C.-C. Chang and C.-J. Lin, LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[12] P. Pudil, J. Novovičová, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.
[13] M.-J. Tsai, C.-L. Lai, and J. Liu, "Camera/mobile phone source identification for digital forensics," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Hawaii, USA, Apr. 2007.
[14] S. Lyu and H. Farid, "Detecting hidden messages using higher-order statistics and support vector machines," in Proc. Information Hiding Workshop, 2002.
[15] İ. Avcıbaş, N. Memon, and B. Sankur, "Steganalysis using image quality metrics," IEEE Transactions on Image Processing, 2003.
[16] İ. Avcıbaş, M. Kharrazi, N. Memon, and B. Sankur, "Image steganalysis with binary similarity measures," EURASIP J. on Applied Signal Processing, vol. 2005, no. 17, pp. 2749-2757, Sep. 2005.