2012 IEEE International Conference on Multimedia and Expo Workshops

KINECT-LIKE DEPTH COMPRESSION WITH 2D+T PREDICTION

Jingjing Fu, Dan Miao, Weiren Yu, Shiqi Wang, Yan Lu, Shipeng Li

Media Computing Group, Microsoft Research Asia, Beijing, China
The Institute of Digital Media, Peking University, Beijing, China
Department of EEIS, Univ. of Science and Technology of China, Hefei, China
School of Computer Science and Engineering, Beihang University, Beijing, China

ABSTRACT

Kinect-like depth compression is becoming increasingly important due to the growing requirements on Kinect depth data transmission and storage. Considering the temporal inconsistency of Kinect depth introduced by random depth measurement errors, we propose a 2D+T prediction algorithm that aims to fully exploit the temporal depth correlation and thereby improve Kinect depth compression efficiency. In our 2D+T prediction, each depth block is treated as a subsurface, and its motion trend is detected by comparing it with a reliable 3D reconstruction surface, which is integrated from the accumulated depth information stored in a depth volume. The comparison is carried out under an error-tolerant rule derived from the depth error model. The experimental results demonstrate that our algorithm can remarkably reduce the bitrate cost and the compression complexity, while the visual quality of the 3D reconstruction results generated from our reconstructed depth is similar to that of traditional video compression.

Index Terms— Kinect-like depth, lossy compression, 2D+T prediction, depth volume

I. INTRODUCTION

Thanks to the emergence of Kinect [1], depth data has become much easier to acquire and the relevant applications are developing rapidly. Kinect-like depth compression has become a high-priority problem given the increasing requirements on depth storage and real-time depth transmission in scenarios such as dynamic 3D scene recording and immersive telepresence. Similar to texture video, each Kinect depth frame is composed of a large number of pixels whose values represent the distance between the camera baseline and the object surface, and around 30 depth frames are generated per second. The raw depth data therefore has an enormous size. Compression techniques for RGB texture have been studied for years, and numerous mature solutions are available, such as JPEG [2], MPEG-2 [3], and H.264/AVC [4]. By contrast, depth data compression has been less investigated, since depth data was not widely used due to its high acquisition cost. Since depth can be regarded as a high dynamic range image to some degree, some researchers [5], [6] apply image/video compression methods directly to stereo-generated depth data compression. However, as a low-cost structured-light camera, the Kinect produces depth whose quality is limited by its generation principle and hardware constraints, and which suffers from spatial artifacts and temporal inconsistency. Kinect-like depth is therefore more difficult to compress than RGB texture. Mehrotra et al. [7] propose a near-lossless depth compression scheme that encodes the scaled reciprocals of the depth values pixel by pixel. Although a significant data size reduction is achieved, the strong relationships among neighboring frames are hardly considered in that algorithm. Kinect-like depth is a kind of range data in terms of its physical meaning, so it can be converted to a point cloud for sequential predictive compression [8] or geometry compression based on an octree structure [9]. However, these point cloud compression algorithms focus on static scanned point clouds and cannot handle the replicated geometry information among depth frames. Recently, Kammerl [10] proposed point cloud compression for the Point Cloud Library, in which real-time spatial changes are detected by an XOR comparison of the corresponding octree structures to remove redundant 3D points. The spatial correlation is disrupted by the point discretization and removal, and the position and color information of the remaining 3D points have to be compressed by entropy coding. From another point of view, the depth data can also be represented by a triangular mesh for mesh compression [11]. Several related works exist. Grewatsch et al. [12] presented a mesh-based coding scheme that compresses the 3D depth information using the MPEG-4 3DMC coder. In [13], an adaptive triangular mesh generation algorithm is introduced for depth map coding. In these schemes, extracting the mesh from each raw depth frame costs additional computation, and their compression performance is no better than that of image/video coding [5]. In this paper, we first analyze the generation principle of the Kinect depth and its data characteristics, and then give a theoretical error model on the basis of the possible error sources. Based on this model, we propose a novel depth prediction algorithm that works in collaboration with a standard video codec to fully exploit temporal correlations that are destroyed by defective depth measurement. More specifically, we adopt prediction upon the 3D reconstructed surface to


distinguish the depth variation caused by object motion from the variation caused by random depth measurement errors, and consequently reduce the extra coding cost spent on the depth errors. In this way, the compression ratio is increased without a drop in data effectiveness. The experimental results show that both the bit rate and the complexity are dramatically reduced, and the rendering result of the depth frames reconstructed by our algorithm looks similar to that of the traditional video codec.

The rest of the paper is organized as follows. In Section II, the error sources are discussed and a theoretical error model is presented. Section III describes the framework of our depth encoder. The depth reformation algorithm is introduced in detail in Section IV. In Section V, both the principles of the depth volume and the 2D+T prediction are explained. The experimental results verifying the performance of our coding system are given in Section VI. Finally, Section VII concludes the paper.

II. ANALYSIS ON KINECT-LIKE DEPTH

In comparison with traditional image/video data, Kinect-like depth data possesses special characteristics in the spatial and temporal domains that are not well handled by traditional image/video compression techniques. In this section, we investigate the generating principle [14] of the Kinect depth and analyze the depth data based on its derivation model.

A. Depth generating principle

The Kinect consists of an infrared (IR) laser projector, an IR sensor and an RGB sensor. The IR projector emits a pseudo-random light pattern through a diffractive mask, so that each speckle in the pattern can be distinguished from the others. Depth is derived by triangulation between the observed light pattern and a reference light pattern, which is obtained by capturing a plane at a known distance beforehand and stored in the memory of the sensor. To be more specific, if a speckle is projected on an object whose distance to the sensor differs from that of the reference plane, the speckle's position in the received image will be shifted along the direction of the baseline between the projector $P$ and the perspective center of the infrared camera $S$.

Fig. 1 Schematic representation of the depth-disparity relation

These shifts are measured for all speckles by a simple image correlation procedure, which yields a disparity image. Fig. 1 illustrates the relationship between the distance $D$ of an object point to the IR sensor and the distance $D_r$ of the reference plane. For simplicity, we assume that the origin of the depth coordinate system is located at the perspective center of the IR sensor. According to the similarity of the triangles, the depth of the object is
$$D = \frac{D_r}{1 + D_r \cdot l/(f \cdot b)} \qquad (1)$$
where $f$ is the focal length of the IR sensor, $l$ is the relative shift length (disparity), and $b$ is the base length.

Fig. 2 Example of a Kinect depth image and its corresponding RGB image: (a) depth map after calibration, (b) RGB image, (c) normal map of the Kinect depth, (d) difference between neighboring depth frames.

B. Kinect depth error modeling

The ideal disparity is characterized by its continuity and uniqueness, that is, the disparity varies continuously within object surfaces and has a unique value at each fixed coordinate. For Kinect, the disparity is measured under the influence of several factors, including sensor errors, light condition interference, imaging geometry problems, and disparity normalization. As a result, inaccurate disparity measurement leads to depth value drifting, and disparity detection failures cause depth information loss. Both the spatial continuity and the temporal consistency are destroyed. A typical example is illustrated in Fig. 2. The RGB images and the depth images are captured simultaneously from a fixed Kinect camera with the Kinect for Windows SDK [15]. Some irregular depth holes exist along the intersection between the roof and the wall. A step-shaped fluctuation phenomenon is noticeable in the normal map (see Fig. 2(c)). Even if a static scene is captured by a fixed Kinect, the depth value at a certain coordinate changes from time to time. The differences between neighboring depth frames always have significant values (see Fig. 2(d)).
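To make the triangulation relation in Eq. (1) concrete, the following is a minimal Python sketch that converts a measured disparity shift into depth. It is an illustration only: the function name and the calibration constants (focal length, base length, reference plane distance) are assumed values for demonstration, not parameters reported in the paper.

```python
import numpy as np

def disparity_to_depth(l, f, b, D_r):
    """Eq. (1): D = D_r / (1 + D_r * l / (f * b)).

    l   : relative speckle shift (disparity), may be a NumPy array
    f   : focal length of the IR sensor (pixels)
    b   : base length between projector and IR sensor (metres)
    D_r : distance of the reference plane (metres)
    """
    return D_r / (1.0 + D_r * l / (f * b))

# Illustrative calibration values (assumed, not from the paper):
f, b, D_r = 580.0, 0.075, 2.0
print(disparity_to_depth(np.array([-5.0, 0.0, 5.0]), f, b, D_r))
# a zero shift reproduces the reference distance; positive shifts give nearer points, negative shifts farther ones
```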


Fig. 3 The encoder of our proposed depth compression scheme with 2D+T prediction

In order to model the imperfect Kinect depth, we first formulate the disparity error according to the preceding analysis of the disparity:
$$\tilde{l} = l + r_d + r_n \qquad (2)$$
where $l$ denotes the true disparity and $\tilde{l}$ is the disparity actually used for Kinect depth derivation. $r_d$ is the disparity error introduced by light pattern misidentification, and $r_n$ is the normalization error caused by disparity round-off, with $r_n \in [-l^*, 0]$. Here, $l^*$ is the step size of the disparity normalization. Besides inaccurate disparity measurement, disparity detection failures also occur frequently, and thus the depth information in the corresponding regions is inaccessible. Combining the main factors affecting depth measurement, the relationship between the Kinect depth $\tilde{D}$ and the true depth $D$ can be formulated as
$$\tilde{D} = M \cdot (D + e) \qquad (3)$$
where $M$ denotes the disparity mask, indicating whether the disparity value is valid at that position, and $e$ is the depth measurement error caused by the degradation of the disparity. In the region with valid depth values, the error between the true depth and the output depth is
$$e = \tilde{D} - D = \frac{r_d + r_n}{f \cdot b}\,\tilde{D}\,D \qquad (4)$$
The depth error can be decomposed into an identification error $e_d$ and a normalization error $e_n$ according to their origins. Assuming $\tilde{D} - D \ll \tilde{D}$, the true depth $D$ can be replaced by $\tilde{D}$ in Eq. (4) for approximation. Since the focal length and the base length are both constant for Kinect, $C_0$ is used to represent the constant factor $1/(f \cdot b)$:
$$e = e_d + e_n \qquad (5)$$
with $e_d = C_0 \tilde{D}^2 r_d$ and $e_n = C_0 \tilde{D}^2 r_n$. The upper bound of the normalization error is
$$\Theta(\tilde{D}) = C_0 \tilde{D}^2 l^* \qquad (6)$$
Since the depth difference at a fixed position is mainly caused by the variation of the disparity error, the level of the depth difference is also proportional to the square of the corresponding depth value.
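As a hedged illustration of the error model in Eqs. (2)-(6), the snippet below evaluates the normalization error bound $\Theta(\tilde{D}) = C_0 \tilde{D}^2 l^*$ that later drives the error-tolerant comparisons. The numeric constants are placeholders chosen for demonstration; the actual $f$, $b$ and $l^*$ depend on the device calibration and are not given in the paper.

```python
import numpy as np

def normalization_error_bound(depth, C0, l_step):
    """Eq. (6): Theta(D~) = C0 * D~^2 * l*, the upper bound of the normalization error.
    C0 = 1 / (f * b) is a per-device constant; l_step is the disparity quantization step l*."""
    return C0 * np.square(depth) * l_step

# Placeholder constants for illustration only.
C0 = 1.0 / (580.0 * 0.075)                 # assumed focal length * base length
l_step = 0.125                             # assumed disparity step size l*
depth = np.array([800.0, 2000.0, 4000.0])  # raw Kinect depth values
print(normalization_error_bound(depth, C0, l_step))
# the bound grows with the square of depth, so distant surfaces tolerate larger frame-to-frame differences
```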

III. FRAMEWORK OF KINECT-LIKE DEPTH COMPRESSION WITH 2D+T PREDICTION

In traditional video coding, the original video is usually of high quality and free of artifacts, so the motion inside the video can be estimated precisely by searching for the minimal sum of absolute differences (SAD) between blocks in different frames. As analyzed in Section II, the Kinect depth difference at a fixed position varies in a random manner, and its value is determined by the variation of the disparity mask and the disparity error. Since traditional 2D inter-frame prediction is sensitive to temporal value fluctuation, it may lead to fake motion vectors and significant residual. Based on this fact, we propose 2D+T prediction to detect the motion of each subsurface with reference to the reliable surface generated from depth accumulation. In this way, the long-term inter-frame correlations are exploited as a compensation to the conventional 2D depth prediction. The specific implementation of the compression scheme is depicted in Fig. 3. The k-th depth frame $\tilde{D}_k$ is first passed to the 2D+T prediction module. The 2D+T depth reference $Dr_{k-1}$ is generated from the history of reconstructed depths stored in the depth volume to simulate the reliable 3D surface for motion detection. Following the error-tolerant rules, the stable depth region $Ds_k$ is determined, which indicates that the corresponding object surface is static. After Boolean subtraction of $Ds_k$ from $\tilde{D}_k$, we obtain the moving depth region $Dm_k$ as the residue of the 2D+T prediction. $Dm_k$ is passed to the conventional video encoder, where 2D intra- and inter-frame prediction is carried out. To close the loop, the generated 2D+T reference must be supplied to the video encoder for depth reconstruction.

IV. DEPTH VOLUME FOR REFERENCE GENERATION

As a typical representation of range data, Kinect depth can be utilized to reconstruct the 3D object surface [16] by means of volumetric integration [17] of the point cloud derived from each depth frame. The combination rule of volumetric integration can be described by the following equation:
$$T(x) = \frac{\sum_i w_i(x)\, d_i(x)}{W(x)} \qquad (7)$$

with $W(x) = \sum_i w_i(x)$. Here $d_i(x)$ is the signed distance of each point $x$ to the $i$-th range surface along the line of sight to the sensor, and $w_i(x)$ is a weight function that depends on the angle between the vertex normal and the viewing direction. The continuous implicit function $T(x)$ is represented on a discrete voxel grid, and the isosurface is extracted where $T(x)=0$.

Fig. 4 An example of volumetric integration.

Fig. 4 illustrates the isosurface extraction process. The range images captured at different time slots are $R_{t-1}$ and $R_t$. If the viewing direction is perpendicular to the range surface, the corresponding signed distance is $d_t(x) = x - R_t$ and $w_t(x) = 1$. In terms of the combination rule depicted in Eq. (7), the isosurface is obtained at $x = (R_{t-1} + R_t)/2$. If this derivation is extended to multiple range images, the isosurface lies at the average of the range images,
$$x_0 = \frac{1}{n}\sum_i R_i, \quad \text{with} \quad T(x_0) = 0 \qquad (8)$$
where $n$ is the number of range images. Although the integrated surface can provide a stable reference for depth prediction, the volumetric representation of range data requires a large amount of memory and computation. We therefore propose the depth volume to simulate the 3D volume under the assumption that the viewing direction is perpendicular to all range surfaces. Since the depth is the distance between the object and the baseline plane rather than the camera center, this assumption is satisfied. Accordingly, Eq. (8) can be applied to the reference depth generation.

Listing 1 Depth reference generation

1: for each pixel position pos in the k-th reconstructed depth D*_k
2:   if |D*_k(pos) - D*_{k-1}(pos)| < α · Θ(D*_k(pos)) then
3:     Cnt(pos) ← Cnt(pos) + 1
4:     DV(pos, Cnt(pos)) ← D*_k(pos)
5:   else
6:     Cnt(pos) ← 1
7:     DV(pos, Cnt(pos)) ← D*_k(pos)
8:   if Cnt(pos) ≥ Cnt_threshold(pos) then
9:     Dr_k(pos) ← average of DV(pos, 1..Cnt(pos))

The pseudocode in Listing 1 illustrates the main steps of depth volume loading and reference generation. The depth volume ($DV$) is a three-dimensional depth buffer, in which each voxel stores a history depth value at a certain position ($pos$). First, a comparison is made between the current reconstructed depth $D^*_k$ and the previous reconstructed depth $D^*_{k-1}$ to evaluate the activity of each depth pixel. If the pixel difference is smaller than the product of the constant $\alpha$ and the normalization error bound $\Theta(\cdot)$, the pixel of the current reconstructed depth is regarded as an inactive pixel, whose value is loaded into the depth volume and stored behind the previously recorded depth value; the corresponding counter $Cnt(pos)$ is increased by one. Otherwise, the pixel is treated as an active pixel, whose value is recorded as the fresh value at the front of the array, and the counter at that position is reset to one. If the number of similar depth values at a position is large enough, the average of the history depth values is loaded into the k-th depth reference $Dr_k$. The constant $\alpha$ is proportional to the intensity of light interference. An example of the depth reference generation is shown in Fig. 5. After checking the neighboring reconstructed depth difference, a similarity map is produced to denote whether each pixel is compatible with the previous depth records. In the similarity map, the inactive pixels are shown in black and the active pixels in white. The counter at an active pixel is reset (squares with red boundaries), while the counters at the remaining positions are increased by one. For the positions (squares with blue boundaries) where the counter value is larger than or equal to the counter threshold, the depth values in the corresponding array are averaged to update the depth reference.

Fig. 5 Generation process of the depth reference
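For readers who prefer an executable form, here is a rough Python sketch of Listing 1, written under a few assumptions of mine: the depth volume is a fixed-length per-pixel history buffer, invalid (zero) depth handling is omitted, and the helper names (update_depth_volume, DV, Cnt, Dr) are not from the paper. The default counter threshold of five matches the setting later used in the experiments.

```python
import numpy as np

def update_depth_volume(D_k, D_prev, DV, Cnt, Dr, alpha, C0, l_step, cnt_threshold=5):
    """One pass of Listing 1: load the k-th reconstructed depth into the depth volume
    and refresh the 2D+T depth reference for pixels with enough consistent history.

    D_k, D_prev : current / previous reconstructed depth frames, shape (H, W)
    DV          : depth volume holding per-pixel history values, shape (H, W, L)
    Cnt         : per-pixel counter of consecutive similar values, shape (H, W), int
    Dr          : depth reference, shape (H, W), updated in place
    """
    theta = C0 * np.square(D_k) * l_step             # normalization error bound, Eq. (6)
    inactive = np.abs(D_k - D_prev) < alpha * theta  # pixel agrees with the previous depth
    Cnt[:] = np.where(inactive, Cnt + 1, 1)          # extend the history or restart it
    slot = np.minimum(Cnt - 1, DV.shape[2] - 1)      # slot where the fresh value is stored
    rows, cols = np.indices(D_k.shape)
    DV[rows, cols, slot] = D_k
    for y, x in zip(*np.nonzero(Cnt >= cnt_threshold)):
        Dr[y, x] = DV[y, x, :min(Cnt[y, x], DV.shape[2])].mean()  # average the history
    return Dr
```

Calling this once per frame mirrors steps 1-9 of the listing: once enough consistent samples accumulate at a pixel, their average replaces the noisy instantaneous value in the reference.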


V. ERROR TOLERANT 2D+T PREDICTION

According to the causes of the disparity error, the normalization error $r_n$ is a uniformly distributed random variable, while the identification error $r_d$ approaches a normally distributed i.i.d. variable with finite variance and zero mean. These random errors tend to be eliminated during the depth accumulation, so the generated depth reference can be regarded as the true depth to some degree. Therefore, if the difference between the input depth and the reference is smaller than the disparity error bound, the difference is more likely caused by measurement error rather than by a newly emerging surface. Based on the above analysis, we assign a tolerant range (see Fig. 6) to the reliable reference surface with respect to the depth error model deduced in Section II. If a newly arriving depth value lies within the tolerant range, the surface can be represented by the accumulated history depth information and will be skipped during the traditional 2D coding. If the newly arriving surface is out of range, it is regarded as a moving surface and is passed to the next step for traditional 2D prediction. The prediction is implemented block by block, which is equivalent to motion detection of each subsurface; our approach is therefore more flexible than traditional object-based segmentation methods. Note that the depth reference is generated pixel by pixel, while the prediction is implemented at the block level.

Fig. 6 Error tolerant rule during 2D+T prediction

Our prediction results for the depth sequences "roof" and "player" are shown in Fig. 7. In the sequence "player", a person is playing Kinect, moving and jumping under the instructions of the Xbox. The 2D+T reference grows over time as more and more depth is loaded into the depth volume. In the residue of the 2D+T prediction, the static surface is automatically removed from the depth, except for some unstable boundary blocks. After 2D prediction, only a small-range residual remains.
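The block-level decision described above can be sketched as follows. This is a simplified reading of the error-tolerant rule rather than the authors' implementation: the block size, the "any pixel outside the tolerant range" criterion, and the use of zero for skipped pixels are my assumptions.

```python
import numpy as np

def classify_blocks(D_k, Dr, alpha, C0, l_step, block=16):
    """Error-tolerant 2D+T prediction sketch: split the incoming depth D_k into blocks and
    mark each block as stable (inside the tolerant range around the reference Dr) or moving."""
    h, w = D_k.shape
    tol = alpha * C0 * np.square(Dr) * l_step        # tolerant range derived from Eq. (6)
    moving_mask = np.zeros_like(D_k, dtype=bool)
    for ys in range(0, h - block + 1, block):
        for xs in range(0, w - block + 1, block):
            diff = np.abs(D_k[ys:ys+block, xs:xs+block] - Dr[ys:ys+block, xs:xs+block])
            if np.any(diff > tol[ys:ys+block, xs:xs+block]):
                moving_mask[ys:ys+block, xs:xs+block] = True   # subsurface treated as moving
    Dm_k = np.where(moving_mask, D_k, 0)   # residue handed to the conventional 2D video coder
    return moving_mask, Dm_k
```

Blocks left out of Dm_k are reconstructed from the 2D+T reference, which is what removes the cost of coding pure measurement noise in the static regions.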

Fig. 7 The intermediate prediction results of the depth sequences "roof" (left) and "player" (right). The depths from top to bottom: original depth, 2D+T depth reference, residue of 2D+T prediction, residue after traditional inter prediction.

VI. EXPERIMENTAL RESULTS

Our compression framework can be integrated with state-of-the-art video coding schemes, and the codec can be chosen according to the specified system requirements on coding efficiency and complexity. In our experiments, the standard codec H.264 is adopted for high dynamic range depth coding. In order to evaluate our depth compression scheme, we have carried out a series of experiments on Kinect depth sequences. The objective and subjective comparisons between the results generated by the different coding schemes cover three aspects: 1) coding efficiency; 2) computational complexity; 3) 3D reconstruction quality. Most Kinect-related applications are developed based on 13-bit Kinect depth with a dynamic range from 800 to 4000. Therefore, we focus on the compression of raw 13-bit Kinect depth. In the experiments, we integrate the H.264 extension software (JMkta) [18] into our depth compression framework, since this software can be directly applied to high dynamic range (up to 14-bit) depth coding. The depth sequences "player" and "roof", captured at 30 fps with resolution 640*480, are compressed by our approach and by H.264, respectively. For simplicity, the counter thresholds are uniformly set to five and the constant α is set to one.

Table 1. Rate distortion comparison with the same MSE

         mse_ori   mse_res   mse_pred   KBits/frame
H.264      6.82      8.73      5.21        279.32
Ours      58.25      8.14     96.14        205.97

Table 2. Rate distortion comparison with the same bitrate

         mse_ori   mse_res   mse_pred   KBits/frame
H.264      6.82      8.73      5.21        279.32
Ours      56.26      2.87     96.23        277.20

y we compare the rateTo evaluate the coding efficiency, distortion performance with two meethods: one is to compare the bitrate with the same MSE, thee other is to compare the MSE at the same bitrate. The resullts of “player” are shown


in the above two tables. mse_ori is the MSE of the whole frame, while mse_pred and mse_res indicate the MSE of the stable depth region removed by the 2D+T prediction and of the moving depth region remaining for 2D video coding, respectively. From Tables 1 and 2, we can observe that the coding efficiency of our approach outperforms H.264 under the moving-region distortion versus overall bitrate measurement. The overall distortion of our approach is mainly introduced by the depth measurement error correction in the stable region. Fig. 9 shows the bitrate cost comparison at the same MSE. Compared with H.264, our scheme saves 26% and 53% of the bit cost for "player" and "roof", respectively. Fig. 10 shows the complexity comparison. Compared with H.264, the coding complexity is reduced by 48.7% and 83.3% for these two sequences, respectively. Normal maps are widely adopted to simulate the 3D rendering result. Fig. 8 shows the comparison of the normal maps of the 63rd reconstructed depth frame of "player". We can see that the normal map of the reconstructed depth generated by our approach looks similar to that of the original depth.

Fig. 8 Comparison of the normal maps of depth. Left top: H.264 reconstruction (mse_ori = 5.45, mse_pred = 5.26, mse_res = 7.65); right top: reconstruction with our approach (mse_ori = 55.39, mse_pred = 102.06, mse_res = 7.13); left bottom: our 2D+T reference; right bottom: our 2D+T residue.

Fig. 9 Objective performance comparison of the depth sequences "player" and "roof" on bit cost.

Fig. 10 Complexity comparison of the depth sequences "player" and "roof" on coding time.

VII. CONCLUSIONS

In this paper, we present a novel prediction method for efficient Kinect-like depth compression. The advantages of our proposed algorithm can be summarized in three aspects. Firstly, we take advantage of the physical meaning of Kinect depth to build up the temporal depth correlation; the robust 2D+T prediction and the sensitive traditional 2D prediction are properly integrated to serve the compression of different depth contents. Secondly, the motion trend of each depth region is detected by the prediction, and this information is valuable for further processing beyond compression. Thirdly, our proposed 2D+T prediction can be integrated into nearly all block-based image/video codecs with only tiny modifications to the reconstruction loop.

REFERENCES


[1] Microsoft Kinect. http://www.xbox.com/de-de/kinect
[2] G. K. Wallace, "The JPEG still picture compression standard," Comm. ACM, vol. 34, pp. 30-44, 1991.
[3] Generic Coding of Moving Pictures and Associated Audio (MPEG-2), Part 2: Video, ISO/IEC JTC 1/SC 29/WG 11, 13818-2, ISO, 1995.
[4] JVT, Advanced Video Coding (AVC), ITU-T Rec. H.264 and ISO/IEC 14496-10 (MPEG-4 Part 10), 2004.
[5] S. Grewatsch and E. Muller, "Evaluation of Motion Compensation and Coding Strategies for Compression of Depth Map Sequences," in 49th SPIE Annual Meeting, 2004.
[6] C. Fehn, "Depth-image-based Rendering (DIBR), Compression and Transmission for a New Approach on 3D TV," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, pp. 93-104, 2004.
[7] S. Mehrotra, Z.-Y. Zhang, Q. Cai, C. Zhang and P. Chou, "Low-Complexity, Near-Lossless Coding of Depth Maps from Kinect-Like Depth Cameras," in Proc. MMSP, IEEE, October 2011.
[8] S. Gumhold, Z. Karni, M. Isenburg and H. Seidel, "Predictive Point-Cloud Compression," in Proc. ACM SIGGRAPH 2005 Sketches, Aug. 2005.
[9] R. Schnabel and R. Klein, "Octree-based Point-Cloud Compression," in Proc. Symposium on Point-Based Graphics 2006, Eurographics, pp. 147-156, July 2006.
[10] J. Kammerl, "Development and Evaluation of Point Cloud Compression for the Point Cloud Library," online.
[11] C. Touma and C. Gotsman, "Triangle mesh compression," in Proc. Graphics Interface, pp. 26-34, 1998.
[12] S. Grewatsch and E. Muller, "Fast mesh-based coding of depth map sequences for efficient 3D video reproduction using OpenGL," Visualization, Imaging, and Image Processing, 2005.
[13] B.-B. Chai, S. Sethuraman, H. S. Sawhney, and P. Hatrack, "Depth map compression for real-time view-based rendering," Pattern Recognition Letters, Elsevier, pp. 755-766, May 2004.
[14] J. Fu, S. Wang, Y. Lu, S. Li and W. Zeng, "Kinect-like depth denoising," submitted to ISCAS 2012.
[15] Kinect for Windows SDK. http://research.microsoft.com/en-us/um/redmond/projects/Kinectsdk
[16] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison and A. Fitzgibbon, "KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera," in Proc. UIST 2011, pp. 559-568, Oct. 2011.
[17] B. Curless and M. Levoy, "A Volumetric Method for Building Complex Models from Range Images," in Proc. ACM SIGGRAPH 96, pp. 303-312, 1996.
[18] JM 14.2 KTA 1.0, http://iphome.hhi.de/suehring/tml/