
COMPRESSED DOMAIN AUTHENTICATION OF LIVE VIDEO

Razib Iqbal, Shervin Shirmohammadi, and Jiying Zhao

Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER Lab)
School of Information Technology and Engineering (SITE)
University of Ottawa, 800 King Edward Ave., Ottawa, ON, Canada K1N 6N5
[riqbal | shervin | jyzhao] @site.uottawa.ca

ABSTRACT

Video authentication techniques generally do not allow any form of manipulation, which makes them impractical when the content must be adapted. Moreover, the conventional way to adapt and authenticate compressed video content is to perform cascaded decoding and re-encoding operations in trusted intermediary nodes. In this paper, we propose an authentication scheme for live H.264 encoded video bitstreams that verifies integrity at the receiver's side. The proposed design utilizes MPEG-21 gBSD for hard authentication in the compressed domain. The scheme uses content-based authentication data, derived from a hash value, which is embedded as a fragile watermark. The system design and a proof-of-concept implementation are presented in detail, along with a performance evaluation.

Index Terms: Video coding; video adaptation; video authentication; H.264; MPEG-21.

1. INTRODUCTION

To embed a watermark in a live video stream, it can be convenient to compute and embed the digital signature for the stream in small parts. This avoids the time and effort of computing a signature for every individual frame. In this paper we propose to embed the authentication bits in the live video bitstream after the adaptation operations and in the compressed domain. This technique is particularly useful in video surveillance systems, where video data captured by wireless or dispersed cameras is transmitted to distant receivers over a heterogeneous network. Authentication of captured video thus requires embedding signatures in real time so that the integrity of the received content can be verified in a contained environment. To this end, we use the generic Bitstream Syntax Description (gBSD) to select the marking space directly from the compressed video bitstream (i.e. H.264 video) rather than decoding any part of the media.

Digital Item Adaptation (DIA), MPEG-21 Part 7 [1], specifies the syntax and semantics of tools to assist the adaptation of Digital Items (DI). A DI is a bitstream together with all its relevant descriptions (a.k.a. metadata). A generic Bitstream Syntax Schema (gBS Schema) is specified in the MPEG-21 framework to perform adaptation in an intermediary node in a format-independent way. A description conforming to this schema is called a gBSD. The gBSD provides an abstract view of the structure of the bitstream, which can be used in particular when the availability of a specific bitstream schema is not ensured.

H.264 is the latest video coding and compression standard by ITU-T and ISO/IEC. It offers an entropy coding design which includes Context-Adaptive Binary Arithmetic Coding (CABAC) and Context-Adaptive Variable Length Coding (CAVLC). In H.264 video, data is entropy coded. To achieve byte alignment, the encoder pads the bit sequence when necessary.

In our previous work [2], we utilized MPEG-21 gBSD to perform temporal adaptation and encryption of the H.264 bitstream in the compressed domain. In this paper, we present an authentication technique which uses a fragile watermarking scheme. The system utilizes the gBSD to select the marking space. The procedure presented here requires low computational effort and low processing time, and works on top of the existing adaptation framework, where no cascaded decompression-recompression operation is performed.

The rest of the paper is organized as follows: Section 2 briefly reviews the related work. The design of the proposed system is described in Section 3. Section 4 presents the implementation technique and performance of the proposed system. Finally, we draw the conclusion in Section 5.

2. LITERATURE REVIEW

Authentication of compressed video is well represented in the literature; for example, Peng Yin and H.H. Yu proposed semi-fragile watermarking of MPEG videos [3] which can survive bit-rate transcoding.
However, most video watermarking and authentication techniques require cascaded operations within a video adaptation scenario. DCT coefficient based embedding systems [4]-[6] embed binary watermark bits in the DCT domain, derived from different extracted features, for example, a human visual model adapted for a 4×4 DCT block [4], or relations between predicted DCT coefficients and real DCT coefficients [5]. The watermarking method proposed by Qiu et al. [6] embeds a robust watermark into the DCT domain and a fragile watermark into the motion vectors during H.264 compression. J. Zhang and A.T.S. Ho [7] proposed a scheme that uses the tree-structured motion compensation, motion estimation and Lagrangian optimization of the H.264 standard. The authentication information is represented by a binary watermark sequence and embedded into video frames. Dima Pröfrock et al. [8] proposed a new transcoder, which analyses the original H.264 bitstream, computes a watermark, embeds the watermark for hard authentication and generates a new H.264 bitstream. All of these techniques either embed watermarks during the encoding process of the H.264 video [6][7] or employ cascaded decompression and recompression operations [4][5][8] to analyze the H.264 bitstream and embed the watermark. Compared to the above works, our proposed approach is better suited to an adaptation scenario, because it embeds the watermark in the compressed domain during adaptation, without re-encoding.

3. SYSTEM DESIGN

In our framework, we follow the temporal adaptation approach of [2], adapting the H.264 video dynamically and directly from the compressed bitstream by applying the MPEG-21 gBSD. Here, we use the gBSD (in the form of XML) to identify the segments into which the authentication bits can be embedded. With XSLT (a language for transforming XML documents), the complete XML description must be loaded before it can be transformed. This is a shortcoming of the adaptation architecture in live scenarios. To offer live adaptation on top of the existing implementation, the live video stream is therefore processed as small clips in a pseudo-live fashion. In the video recorder, the encoder encodes the raw video input to H.264 video and also generates the gBSD for each clip. A watermark embedder is installed in the adaptation engine to embed authentication bits during the adaptation. The benefits of this approach are: 1) the gBSD is parsed only once while adapting, and 2) authentication bits are computed and embedded in the adapted frame(s). Figure 1 summarizes the video capture, DI generation, adaptation and watermark embedding.

Figure 1. Proposed architecture

3.1. Creation of Compressed Video and gBSD

Adaptation and authentication in a trusted intermediary node requires the DI (H.264 video along with its gBSD) to be available. This DI is the original content for the resource server or for the content provider on the delivery path. In the MPEG-21 framework, generation of the gBSD from binary data is not normatively specified, so the gBSD is generated during the encoding process of the bitstream. Uncompressed video is the input to the encoder. The encoder encodes the raw video to a compressed bitstream conforming to the ITU-T H.264 specification, and also generates the corresponding gBSD. A sample gBSD is shown in Figure 2.

Figure 2. Sample gBSD

3.2. Adaptation

Inside the adaptation engine, adaptation of the resource (i.e. the H.264 bitstream) and of the description (i.e. the gBSD) is performed in two steps. First the gBSD is transformed via XSLT, and then, based on the transformed gBSD, the original bitstream is modified. For the gBSD transformation, an XSL style sheet defines the template rules that describe the resulting document. The XSLT processor parses the gBSD into a tree structure, takes that tree as its input, and generates another tree structure as its output, the adapted gBSD. The next step is the generation of the adapted bitstream using the transformed gBSD. The adaptation module first initializes and parses the adapted gBSD. It then extracts from the video stream the segments identified by the parsed gBSD. The adapted H.264 bitstream is finally generated by discarding the portions that the gBSD associates with specific frames.

3.3. Watermark Embedder

The watermark embedder module is placed inside the adaptation engine to embed authentication bits on the fly while adapting. We propose a 5-step process, described below.

Step-1: Select the marking space – From the gBSD, the marking space can be selected from the available alternatives, such as frame, slice, macroblock, and block (Figure 3).

Figure 3. Marking space selection from gBSD

An application-specific marking space can be selected in a predefined way, in which case a fixed watermark embedder can be designed. Otherwise, if the marking space is selected manually, the watermark embedder should be capable of inserting watermark bits in the selected segment directly in the compressed bitstream. For manual selection, the start and length of each segment need to be defined in the gBSD. Finally, selecting a marking space and applying customized modifications must conform to the H.264 bitstream specification, so that the result remains compliant for a standard player or decoder. In our implemented system, we use the slice data of the frames in each video clip to compute the authentication bits, and embed these bits in the slice header.

Step-2: Calculate the size of the marking space – Calculating the marking space size determines in advance the length of the hash value to be computed from each compressed video clip. In H.264 video coding, a picture can be split into one or several slices, and slice sizes are flexible. Slices are self-contained and can be decoded without using data from other slices. Slice header VLC byte-align bits (minimum 1 bit and maximum 7 bits) can be utilized to embed the authentication bits. However, the available space depends on the capture time, the frame rate, and the total number of byte-align bits in a clip. The size of the marking space ($M_{space}$) can be calculated as follows:

$$M_{space} = CaptureTime \times FrameRate \times VLCByteAlignBits$$
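The formula above can be illustrated with a short calculation. This is only a sketch: the function name and parameter names are hypothetical, and it assumes one slice per frame with the same number of byte-align bits in every slice header, which is a simplification of a real clip.

```python
def marking_space_bits(capture_time_s: int, frame_rate_fps: int,
                       align_bits_per_slice: int) -> int:
    """M_space = CaptureTime x FrameRate x VLCByteAlignBits.

    Assumption (not from the paper): one slice per frame and a constant
    number of byte-align bits per slice header.
    """
    if not 1 <= align_bits_per_slice <= 7:
        raise ValueError("byte-align bits per slice must be between 1 and 7")
    return capture_time_s * frame_rate_fps * align_bits_per_slice


# A 20-second clip at 15 fps, for the two extreme padding lengths.
print(marking_space_bits(20, 15, 7))  # 2100
print(marking_space_bits(20, 15, 1))  # 300
```

In practice the padding length varies from slice to slice, so the actual marking space is the sum of the per-slice values rather than a single product.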

Step-3: Compute the hash value – A hash value can be computed from the individual frame data or from the whole video clip. The size of the hash value depends on the size of the marking space. A private key is an essential input to the hash function (Figure 4.A). Note that more complicated or more robust hash functions require more execution time to compute and embed the authentication bits.

Figure 4. A. Computing the hash value, B. Embedding the authentication bits in Slice
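As an illustration of Step-3, the sketch below computes a keyed hash and truncates it to the marking space size. It uses the classic 32-bit PJW hash, which Section 4 names as the basis of the implemented hash function. The way the key, the data length and the data are combined here (simple concatenation) is an assumption made for illustration, as are the function names; the paper does not specify these details. PJW is a non-cryptographic hash, so this sketch is only a stand-in for whatever function a deployment would choose.

```python
def pjw_hash(data: bytes) -> int:
    """Classic 32-bit PJW hash over a byte string."""
    h = 0
    for byte in data:
        h = ((h << 4) + byte) & 0xFFFFFFFF
        high = h & 0xF0000000
        if high:
            h ^= high >> 24
            h ^= high  # clear the top four bits
    return h


def signature_bits(video_data: bytes, private_key: bytes, n_bits: int) -> list[int]:
    """Return the low n_bits of a keyed PJW hash, most significant bit first.

    Assumption: key, data length (4 bytes, big-endian) and data are
    concatenated before hashing. video_data should already exclude the
    bits reserved for the marking space.
    """
    if not 1 <= n_bits <= 32:
        raise ValueError("a 32-bit hash can supply between 1 and 32 bits")
    keyed = private_key + len(video_data).to_bytes(4, "big") + video_data
    h = pjw_hash(keyed)
    return [(h >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]


bits = signature_bits(b"slice data of one clip", b"logo-image-bytes", 16)
print(len(bits))  # 16
```

A marking space larger than 32 bits would need a hash with a longer output, or several hash values, to fill it.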

Step-4: Insert authentication bits into the marking space – For hard authentication, the authentication data is embedded as a fragile watermark. If the slice header is used to embed the authentication bits, then the frame rate after adaptation plays a significant role, which is discussed in the next section.

Step-5: Optional second level authentication – After embedding the authentication bits, an optional second level of authentication can be applied by scrambling the individual frame data or slice payload, to prevent an intruder from re-computing the signature even though a private key is being used (Figure 4.B). A predefined palette can be used for this purpose, so that scrambled portions can be restored for re-computation of the hash value and/or for viewing.

3.4. Watermark Detector

The watermark detector is a 4-step process. The first step, parsing the adapted gBSD, extracts the marking space from the adapted gBSD to identify each watermarked segment. The second step, restoring frame data, restores the scrambled frame data or slice payload bits so that the original hash value can be computed. The third step, watermark extraction, extracts the authentication bits from the slice header. The final step compares the hash value computed from the slice data with the value extracted from the slice header. To verify received video content, the user needs the private key, the palette (if any) and the adapted gBSD in addition to the video data. Secure transmission of these data is beyond the scope of this research. For video content intended for mass distribution, where authentication is not a priority, it is not necessary to modify the decoder of every client, and thus the adapted gBSD need not be transmitted to the receiver. In that case, typical H.264 players will be able to play the content without any prior knowledge of the modifications made to it, since the content structure conforms to the H.264 bitstream syntax.

4. IMPLEMENTATION AND PERFORMANCE

To generate the DI, we modified the ITU-T reference software implementation JM 9.5 [10]. In the adaptation engine, while adapting the video bitstream, we embed the authentication bits in the slice header VLC byte-align bits. This marking space can be further extended to other entities, such as frame and macroblock, based on the gBSD details.
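The embedding into byte-align bits and the comparison performed by the detector of Section 3.4 can be sketched as a round trip on a single slice. Several things here are assumptions made for illustration and are not taken from the paper: the slice header is modelled as a byte string whose last `vlcn` bits are the alignment padding, HMAC-SHA256 from the Python standard library is swapped in for the paper's PJW-based hash, and all function names are hypothetical. The optional scrambling step is omitted.

```python
import hashlib
import hmac


def _mask(vlcn: int) -> int:
    """Bit mask covering the vlcn padding bits of the last header byte."""
    if not 1 <= vlcn <= 7:
        raise ValueError("vlcn must be between 1 and 7 bits")
    return (1 << vlcn) - 1


def auth_bits(header: bytes, payload: bytes, vlcn: int, key: bytes) -> int:
    """Compute vlcn authentication bits over the slice, ignoring the marking space."""
    mask = _mask(vlcn)
    cleared = header[:-1] + bytes([header[-1] & ~mask & 0xFF])
    digest = hmac.new(key, cleared + payload, hashlib.sha256).digest()
    return digest[0] >> (8 - vlcn)  # keep the top vlcn bits of the first byte


def embed(header: bytes, payload: bytes, vlcn: int, key: bytes) -> bytes:
    """Return a header whose padding bits carry the authentication bits."""
    mask = _mask(vlcn)
    bits = auth_bits(header, payload, vlcn, key)
    return header[:-1] + bytes([(header[-1] & ~mask & 0xFF) | bits])


def verify(header: bytes, payload: bytes, vlcn: int, key: bytes) -> bool:
    """Detector: compare extracted bits with bits recomputed from the slice data."""
    return (header[-1] & _mask(vlcn)) == auth_bits(header, payload, vlcn, key)


header = bytes([0x65, 0x88, 0x80])   # hypothetical header, last 3 bits are padding
payload = b"entropy-coded slice payload"
marked = embed(header, payload, 3, b"private key")
print(verify(marked, payload, 3, b"private key"))  # True
```

Because the header length is unchanged and only padding bits are overwritten, the marked slice has the same size as the original. With at most 7 bits per slice, a single slice offers little protection on its own, which is consistent with the paper computing the signature over the marking space of a whole clip.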
While transforming the video bitstream for adaptation, a gBSD parser scans each line of the adapted gBSD. The syntacticalLabel, start and length values are tokenized field by field from the line buffer. From the gBSD, the length of the slice header VLC byte-align bits (vlcn) of each frame is extracted, and their sum over each video clip is calculated (vlcnT = ∑vlcn) and stored. The total number of bits in a slice ($S_N$) is the sum of the slice header bits ($S_H(N)$) and the slice payload bits ($S_P(N)$), denoted as $S_N = S_H(N) + S_P(N)$, where N is the total number of bits. From the live video stream, we take 20-second video clips of size 128×96 at 15 frames per second (fps) for processing. This means that if no adaptation is required, then in the best case (7 vlcn bits per slice) each clip provides 2100 bits in which to embed the signature, and in the worst case (1 vlcn bit per slice) only 300 bits. If adaptation is applied, a lower target frame rate reduces the size of the marking space, as illustrated in Table 1.

Table 1. Sample vlcnT bits after adaptation

Adapted frame rate (fps)   Worst case (1 bit/slice)   Average case (3 bits/slice)   Best case (7 bits/slice)
1                          20 bits                    60 bits                       140 bits
3                          60 bits                    180 bits                      420 bits
5                          100 bits                   300 bits                      700 bits
10                         200 bits                   600 bits                      1400 bits
15                         300 bits                   900 bits                      2100 bits

Capture frame rate: 30 fps, SQCIF, capture time: 20 s, total frames: 600
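The parsing step described above, and the effect of temporal adaptation on vlcnT, can be sketched with a simplified gBSD. The `gBSDUnit` element and its `syntacticalLabel`, `start` and `length` attributes follow the gBSD vocabulary, but the sample document, the label values `Frame` and `SliceHeaderAlign`, the omission of XML namespaces, and the function names are all assumptions made for illustration. The sketch uses an XML parser rather than the line-by-line tokenizer of the implementation, and it drops frames by position only, whereas a real adaptation must respect the dependencies between frames.

```python
import xml.etree.ElementTree as ET

FRAME_LABEL = "Frame"              # hypothetical label value
ALIGN_LABEL = "SliceHeaderAlign"   # hypothetical label value

SAMPLE_GBSD = """
<gBSD>
  <gBSDUnit syntacticalLabel="Frame" start="0" length="412">
    <gBSDUnit syntacticalLabel="SliceHeaderAlign" start="37" length="3"/>
  </gBSDUnit>
  <gBSDUnit syntacticalLabel="Frame" start="412" length="96">
    <gBSDUnit syntacticalLabel="SliceHeaderAlign" start="441" length="7"/>
  </gBSDUnit>
  <gBSDUnit syntacticalLabel="Frame" start="508" length="120">
    <gBSDUnit syntacticalLabel="SliceHeaderAlign" start="543" length="1"/>
  </gBSDUnit>
</gBSD>
"""


def total_align_bits(gbsd_xml: str) -> int:
    """Sum the lengths of all byte-align units in the description (vlcnT)."""
    root = ET.fromstring(gbsd_xml.strip())
    return sum(
        int(unit.get("length"))
        for unit in root.iter("gBSDUnit")
        if unit.get("syntacticalLabel") == ALIGN_LABEL
    )


def temporally_adapt(gbsd_xml: str, keep_every: int) -> str:
    """Keep every keep_every-th frame unit and drop the others."""
    root = ET.fromstring(gbsd_xml.strip())
    frames = [u for u in root.findall("gBSDUnit")
              if u.get("syntacticalLabel") == FRAME_LABEL]
    for index, frame in enumerate(frames):
        if index % keep_every != 0:
            root.remove(frame)
    return ET.tostring(root, encoding="unicode")


print(total_align_bits(SAMPLE_GBSD))                       # 11
print(total_align_bits(temporally_adapt(SAMPLE_GBSD, 2)))  # 4
```

Dropping the middle frame removes its 7 padding bits from the marking space, which mirrors how a lower target frame rate shrinks vlcnT in Table 1.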

The second step is to compute the hash value ($F_{Hash}$) based on an external private key and the video data. The architecture applies a simple hash function based on PJW Hash [9], which can be replaced by any available advanced hash function. The inputs to the hash function are the video data (except the bits where the authentication bits will be embedded), the length of the data (in bytes) and a private key (PK). For implementation purposes, we used a logo/sample image of arbitrary length (LN) as our private key. The length of the resulting signature is restricted to the size of vlcnT. Authentication bits are then embedded in the vlcn bits accordingly. If the marking space is too short (e.g. less than 100 bits) and the computed authentication signature is longer than that, we propose to use the remaining signature bits as a temporary private key for the next video clip. The embedding process can be further obscured using small operations such as XOR-ing the signature bits with the private key while inserting them in the slice, as denoted below:

$$S_H(N - vlcn + j) = F_{Hash}(j) \oplus PK(i)$$

where 1 ≤ j ≤ vlcn and 1 ≤ i ≤ LN.

The final step is to apply the optional second level authentication by scrambling the slice payload, to hinder the hash value calculation for that frame. In H.264, blocks and macroblocks are not byte-aligned, so an XOR operation with the private key is applied to the last byte of the slice payload ($S_{lbsp}$), as is done for the slice header. The modified slice payload adds another layer of assessment to detect possible attacks. The modification made to the slice payload can be shown as:

$$S_{lbsp} = S_{lbsp} \oplus PK(i)$$

where 1 ≤ i ≤ LN.

Our proposed authentication technique is integrated in the adaptation system which performs temporal adaptation. All the necessary data for authentication are embedded in the compressed bitstream and cannot get lost. Quality degradation of the resulting video is due to frame dropping only, not to the embedding of the authentication bits. A PC with an Intel P4 3.4 GHz CPU and 1 GB RAM, running Windows XP Pro SP2, was used as the media resource/streaming server. Table 2 compares adaptation and watermark embedding performance for live video clips. Adaptation with authentication takes only slightly longer than adaptation alone (13 to 31 ms more per clip). The total delay due to adaptation and embedding is acceptable in such real-time systems.

Table 2. Performance of the watermarking module

Adapted frame rate (fps)   Temporal adaptation   Temporal adaptation + authentication
1                          78 ms                 91 ms
3                          250 ms                269 ms
5                          422 ms                448 ms
10                         844 ms                873 ms
12                         1016 ms               1047 ms

Capture frame rate: 15 fps, SQCIF, capture time: 20 s, total frames: 300

5. CONCLUSION

In the proposed approach, adaptation and watermarking are performed in the compressed domain without any cascading operation. The adapted and watermarked video is H.264 compliant, and the bit rate is not changed by the watermarking. The watermark embedding time is negligible compared to the adaptation time for live video processing, because the total number of frames and the frame size in each video clip are small compared to pre-recorded videos. For authentication, the original H.264 video is not required; rather, a separate authenticator can verify the validity of the received video data. The authenticator is independent of the decoder, so no lag is added while decoding the video. For live video capturing, adaptation and authentication, a hardware-level implementation capable of generating the compressed bitstream and gBSD would speed up the process. Beyond the developments presented above, embedding robust watermarks in the compressed domain for copyright protection is the focus of current study for this research team.

6. REFERENCES

[1] ISO/IEC 21000-7:2004, Information Technology – Multimedia Framework – Part 7: Digital Item Adaptation.

[2] R. Iqbal, S. Shirmohammadi, and A. El Saddik, “A Framework for MPEG-21 DIA Based Adaptation and Perceptual Encryption of H.264 Video”, Proc. SPIE/ACM MMCN, Vol. 6504, pp. 650403-1 – 650403-12, 2007.

[3] Peng Yin and H.H. Yu, “Semi-fragile watermarking system for MPEG video authentication”, Proc. IEEE ICASSP, Vol. 4, pp. IV-3461 – IV-3464, 2002.

[4] M. Noorkami and R. M. Mersereau, “Towards Robust Compressed-Domain Video Watermarking for H.264”, Proc. SPIE Security, Steganography, and Watermarking of Multimedia Contents, Vol. 6072, pp. 489-497, 2006.

[5] G. Wu, Y. Wang, and W. Hsu, “Robust Watermark Embedding Detection Algorithm for H.264 Video”, Journal of Electronic Imaging, Vol. 14, 2005.

[6] G. Qiu, P. Marziliano, A.T.S. Ho, D. He, and Q. Sun, “A Hybrid Watermarking Scheme for H.264/AVC Video”, Proc. Intl. Conf. on Pattern Recognition, Vol. 4, pp. 865-869, 2004.

[7] J. Zhang and A.T.S. Ho, “Efficient Video Authentication for H.264/AVC”, Proc. Intl. Conf. on Innovative Computing, Information and Control, Vol. 3, pp. 46-49, 2006.

[8] D. Pröfrock, H. Richter, M. Schlauweg, and E. Müller, “H.264/AVC Video Authentication Using Skipped Macroblocks for an Erasable Watermark”, Proc. SPIE VCIP, Vol. 5960, pp. 1480-1489, 2005.

[9] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, pp. 434-438, 1986.

[10] http://ftp3.itu.ch/av-arch/jvt-site/reference_software/