Proc. Internet Multimedia Management Systems, Boston, Mass., 6-7 November 2000, SPIE Vol. 4210, pp. 82-93.

Visual Feature Discrimination versus Compression Ratio for Polygonal Shape Descriptors

Jörg Heuer, Francesc Sanahuja, André Kaup
{Joerg.Heuer, Francesc.Sanahuja, Andre.Kaup}@mchp.siemens.de
Siemens Corporate Technology, Information and Communications, 81730 Munich, Germany

ABSTRACT

In the last decade several methods for low-level indexing of visual features have appeared. Most often they were evaluated with respect to their discrimination power using measures like precision and recall, since the targeted application was indexing of visual data within databases. During the standardization process of MPEG-7 the view on indexing of visual data changed to also take communication aspects into account, where coding efficiency is important. Even if the descriptors used for indexing are small compared to the size of images, there can be several descriptors linked to one image, characterizing different features and regions. Besides the importance of a small memory footprint for transmitting the descriptor from sender to receiver and for storage in a database, search and filtering can potentially be sped up dramatically by reducing the dimensionality of the descriptor, provided the matching metric can be adjusted. Based on a polygon shape descriptor proposed for MPEG-7, this paper compares the discrimination power of the descriptor against its memory consumption. Different methods based on quantization are presented and their effect on the retrieval performance is measured. Finally an optimized computation of the descriptor is presented.

Keywords: Image Retrieval, Shape Retrieval, Shape Coding, Descriptor Coding

1. INTRODUCTION

In recent years the amount of multimedia content hosted locally or on the internet has been growing rapidly. To handle this increasing amount of data, content management for multimedia archives becomes more important. Usually these tools use meta information to manage and index the multimedia data. The generation of the meta information is a time-consuming process, and with a dedicated content management system the contained meta information usually can only be interpreted by this system and cannot be exchanged with other systems. To circumvent this limitation MPEG [1] has started to standardize an exchange format for such meta information called MPEG-7. One part of the information contained within an MPEG-7 description are visual descriptors for low-level features of still images and video sequences.

In general, visual descriptors have been studied intensively in the literature for more than a decade. To compare and evaluate descriptors, most often measures like precision and recall have been used to specify the retrieval performance: the percentage of relevant images in the retrieved set and the percentage of relevant images retrieved relative to the relevant images in the database. For the evaluation of descriptors within the scope of MPEG-7 these measures were extended with respect to the efficiency of transmission, so besides the discriminating power also the size has been identified as a critical attribute of a descriptor.

In this paper we investigate in detail the optimization of a shape descriptor based on a polygonal shape representation as proposed in [2][3] with respect to the discriminating power and the size of the descriptor. Besides the polygon-based shape descriptor, a contour-based descriptor based on the scale space representation [4][5] was also investigated within MPEG-7. Compared to the polygon-based descriptor the latter characterizes the relative position and curvature of contour turning points. That descriptor is computed by a non-reversible transformation, so it can be used for retrieval but not for visualization of a contour approximation, as is the case for the polygon-based one. The matching of these descriptors is performed by defining groupings and computing the normalized correlation between the contours in the tangent resp. scale space with respect to an assignment process. This differs from shape descriptors based, for instance, on a wavelet representation of the contour [6], for which only the correlation of the coefficients has to be computed to compare contours.

This paper is divided into the following sections: The next section explains in more detail the basic principle of the polygon-based shape descriptor using the tangent space representation. In section 3 the test criteria defined and used within the MPEG-7 experiments are specified. Section 4 introduces different optimization algorithms for an efficient lossy encoding of the descriptor. Section 5 discusses the results of these algorithms with respect to the retrieval performance and the descriptor size. Finally a conclusion is drawn.

2. A POLYGON BASED SHAPE DESCRIPTOR

For computation of a shape descriptor based on the polygonal representation of the contour it is desirable to approximate the original contour while preserving the perceptual appearance at a level sufficient for object recognition or retrieval. To achieve this, an appropriate approximation (or curve evolution) method was proposed in [2]. The curve evolution method achieves the task of shape simplification in a parameter-free way, i.e., the process of evolution compares the significance of vertices of the contour based on a relevance measure. Since any digital curve can be regarded as a polygon without loss of information (with possibly a large number of vertices), it is sufficient to study evolutions of polygonal shapes. The basic idea of the proposed evolution of polygons is very simple:

• In every evolution step, a pair of consecutive line segments s1, s2 is substituted with a single line segment joining the endpoints of s1 ∪ s2.

The key property of this evolution is the order of the substitutions. The substitution is done according to a relevance measure K given by

$$K(s_1, s_2) = \frac{\beta(s_1, s_2)\, l(s_1)\, l(s_2)}{l(s_1) + l(s_2)}$$

where β(s1, s2) is the turn angle at the common vertex of segments s1, s2 and l is the length function normalized with respect to the total length of the polygonal curve C. The evolution algorithm assumes that vertices surrounded by segments with a high value of K(s1, s2) are important while those with a low value are not. A cognitive motivation of this property is given in [2], where also a detailed description of the discrete curve evolution can be found.

The signal-based attributes of this algorithm are a simplification of the shape complexity where, unlike shape simplification using diffusion processes, no dislocation of the remaining features occurs (see Figure 1). This is an important feature of the evolution process since this descriptor can also be used for localization as mentioned in the previous section. For retrieval it is important that the shape descriptor computed by the evolution process is robust with respect to noise. This is the case since the small segment pairs caused by noise result in small values of the relevance measure K, so these segment pairs are removed at an early stage of the evolution process. A more formal justification of the above properties can be found in [2].

Figure 1: Fine-to-coarse simplification of the original contour (top left), preserving the perceptual information. The vertices of the simplified contour are also vertices of the original contour.
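A minimal Python sketch of this discrete curve evolution, assuming a closed contour given as a list of 2D vertex tuples; the function names and the threshold parameter k_t are illustrative, and an efficient implementation would only update the relevance of the two neighbors of a deleted vertex instead of recomputing all values per step:

```python
import math

def relevance(p_prev, p, p_next, total_length):
    """Relevance K(s1, s2) of the vertex p between segments s1 = (p_prev, p)
    and s2 = (p, p_next); lengths are normalized by the total contour length."""
    l1 = math.dist(p_prev, p) / total_length
    l2 = math.dist(p, p_next) / total_length
    # turn angle at the common vertex (angle between the two segment directions)
    a1 = math.atan2(p[1] - p_prev[1], p[0] - p_prev[0])
    a2 = math.atan2(p_next[1] - p[1], p_next[0] - p[0])
    beta = abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))
    return beta * l1 * l2 / (l1 + l2)

def evolve(polygon, k_t):
    """Discrete curve evolution: repeatedly delete the least relevant vertex of the
    closed polygon until all remaining vertices have relevance >= k_t."""
    pts = list(polygon)
    while len(pts) > 3:
        n = len(pts)
        total = sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))
        rel = [relevance(pts[i - 1], pts[i], pts[(i + 1) % n], total) for i in range(n)]
        i_min = min(range(n), key=rel.__getitem__)
        if rel[i_min] >= k_t:
            break
        del pts[i_min]  # substitute s1, s2 by the segment joining their endpoints
    return pts
```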

Depending on the threshold Kt of the relevance measure used to terminate the evolution process, more or fewer contour details are preserved in the shape descriptor. This is investigated in more detail in section 4. The descriptor itself is encoded using the following differential variable length encoding scheme [8][9]:

• The vertices of the processed polygon are encoded differentially, which corresponds to encoding the segments of the polygon. If localization is needed, the first vertex is additionally encoded in absolute x and y coordinates.
• For the descriptor a dynamic range of the coordinate values is specified.
• Each segment is coded by specifying the octant it lies in, the major component and the minor component. The octant specifies the major and minor components and their signs, while the value of the major component also specifies the dynamic range of the minor component (Figure 2).

Figure 2: Bit representation of a segment. The dynamic range of the X and Y components is encoded in the header of the contour bitstream.
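The sketch below only illustrates the octant/major/minor decomposition of a differential segment; the octant numbering and the derivation of the minor component's dynamic range from the major one are simplified here and do not reproduce the normative bit layout of [8][9]:

```python
def decompose_segment(prev_vertex, vertex):
    """Decompose a differential segment into an octant index and its major and
    minor components, as sketched in Figure 2. The actual bit allocation
    (octant bits, dynamic range of the minor component) is omitted."""
    dx = vertex[0] - prev_vertex[0]
    dy = vertex[1] - prev_vertex[1]
    # illustrative octant index built from the signs and the dominating component
    octant = ((dx < 0) << 2) | ((dy < 0) << 1) | (abs(dy) > abs(dx))
    major, minor = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
    return octant, major, minor
```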

Matching of the polygon-based shape descriptor is done by comparing convex parts. This is motivated by the observation that in the evolution process the contours of visual parts become convex at a certain stage. Since this stage might not be common to all visual parts and also differs from the stage reached when computing the shape descriptor, a comparison of groups of convex arcs is performed for matching. The key idea is thus to find the right correspondence between the visual parts. It is assumed that a single visual part (i.e., a convex arc) of one curve can correspond to a sequence of consecutive convex and concave arcs of the second curve. Thus, the correspondence can be a one-to-many or a many-to-one mapping, but never a many-to-many mapping. This assumption is justified by the fact that a single visual part should match its noisy versions, which can be composed of sequences of consecutive convex and concave arcs, and by the fact that a visual part obtained at a higher stage of evolution should match the arc it originates from. Since maximal convex arcs determine visual parts, this assumption guarantees preservation of visual parts (without explicitly computing them). In [3] the matching is described in detail, whereas here only the metric of the matching between two descriptors D1, D2 representing closed polygons O1 and O2 is described. For each pair of closed polygons O1, O2 a mapping between corresponding arcs Cη1, Cη2 can be defined which minimizes the cost function SC:

$$S_C(O_1, O_2) = \min_{f(O_1,O_2)} \sum_{C_{\eta,1} \in O_1} S_a\!\left(C_{\eta,1},\, f(O_1,O_2)(C_{\eta,1})\right)$$

where f(O1,O2) is the correspondence between O1 and O2 and Sa is a similarity measure of arcs that will be defined below. The arcs Cη1, Cη2 consist of only convex or only concave arcs, and a one-to-many or many-to-one mapping is allowed within f(O1,O2). The mapping which minimizes the measure SC can be computed using dynamic programming. To compare matching results based on SC, the similarity measure is normalized to the length of the contour of the query image.

The similarity measure of arcs Sa, which is part of the similarity measure SC of the contour descriptors, is based on the computation of the L1-norm in the tangent space representation of the arcs. A polygonal arc Cη is represented by a step function T(Cη) in the tangent space; for illustration, see Figure 3. Let c, d be simple polygonal arcs and T(c), T(d) their tangent space representations. The arc similarity measure is given by

$$S_a(c, d) = D_{L_1}(T(c), T(d)) \cdot \max(l(c), l(d)) \cdot \max\!\left(\frac{l(c)}{l(d)}, \frac{l(d)}{l(c)}\right)$$

where D_{L1} denotes the L1-norm in the tangent space (see Figure 3) and l is the relative arc length of an arc with respect to the boundary length of its contour. Observe that either c or d is a convex or a concave arc according to the definition of SC.
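A Python sketch of the tangent space representation and of Sa as defined above, assuming arcs given as lists of 2D points; the alignment of the absolute starting orientation of the two turning functions, which the full matching procedure handles, is omitted, and all function names are illustrative:

```python
import math

def tangent_space(arc):
    """Tangent space representation of a polygonal arc: a step function of the
    segment direction angle over the arc length normalized to [0, 1]."""
    lengths = [math.dist(arc[i], arc[i + 1]) for i in range(len(arc) - 1)]
    total = sum(lengths)
    angles, breakpoints, s = [], [0.0], 0.0
    for i, l in enumerate(lengths):
        angles.append(math.atan2(arc[i + 1][1] - arc[i][1], arc[i + 1][0] - arc[i][0]))
        s += l / total
        breakpoints.append(s)
    return breakpoints, angles, total

def l1_distance(t_c, t_d):
    """L1 distance between two step functions given as (breakpoints, values)."""
    bp = sorted(set(t_c[0]) | set(t_d[0]))
    def value(t, s):
        for i in range(len(t[1])):
            if s < t[0][i + 1]:
                return t[1][i]
        return t[1][-1]
    return sum((bp[i + 1] - bp[i]) *
               abs(value(t_c, 0.5 * (bp[i] + bp[i + 1])) -
                   value(t_d, 0.5 * (bp[i] + bp[i + 1])))
               for i in range(len(bp) - 1))

def arc_similarity(c, d, contour_len_c, contour_len_d):
    """S_a(c, d) as defined above; l() is the arc length of c (resp. d)
    relative to the boundary length of its contour, passed in explicitly."""
    bc, ac, len_c = tangent_space(c)
    bd, ad, len_d = tangent_space(d)
    lc, ld = len_c / contour_len_c, len_d / contour_len_d
    return l1_distance((bc, ac), (bd, ad)) * max(lc, ld) * max(lc / ld, ld / lc)
```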

Figure 3: L1 norm D_{L1}(T(c), T(d)) in the tangent space for two arcs c and d.

3. TEST CRITERIA

For the evaluation an MPEG-7 test data set of in total 3450 shapes was used. The test data set can be divided into three main parts with respect to the following objectives:

Part A: robustness to scale (A1) and rotation (A2)
Part B: performance of similarity-based retrieval
Part C: robustness to small non-rigid transformations due to motion

For the retrieval performance only the recall is measured, where recall is the ratio of the number of retrieved relevant shapes to the number of relevant shapes in the database; precision is not considered. To evaluate the discriminating power in relation to the descriptor size, the average size of the shape descriptor over the database is also computed.

In detail the data set is composed of:

A-1 Robustness to scaling: the database includes 420 shapes; 70 basic shapes and 5 shapes derived from each basic shape by scaling the digital images with factors 2, 0.3, 0.25, 0.2, and 0.1. Each of the 420 images was used as a query image and the number of correct matches was counted in the top 6 retrieved images. Thus, the best possible result is 2520 matches.

A-2 Robustness to rotation: the database includes 420 shapes; the 70 basic shapes are the same as in part A-1, with 5 shapes derived from each basic shape by rotation (in the digital domain) with angles of 9, 36, 45 (composed of 9 and 36 degree rotations), 90 and 150 degrees. Each of the 420 images was used as a query image and the number of correct matches was counted in the top 6 retrieved images. Thus, the best possible result is 2520 matches.

B Similarity-based retrieval: the total number of images in the database is 1400: 70 classes of various shapes, each class with 20 images. Each image was used as a query, and the number of similar images (belonging to the same class) was counted in the top 40 matches (bulls-eye test). Since the maximum number of correct matches for a single query image is 20, the total number of correct matches is 28000.

C Motion and non-rigid deformations: the goal of this experiment was to examine the performance of the descriptors in the retrieval of non-rigid shapes obtained by motion. As an example of such shapes, 200 frames of a swimming bream fish were selected. These 200 frames plus a database of marine animals with 1100 shapes form the data set. Bream-000 was used as the query, and the number of bream shapes in the top 200 shapes was counted. Thus, the maximal number of possible matches is 200.

Afterwards these recall results are combined in order to give a single value of the algorithm performance. This value can be computed as an overall average over the three parts, i.e. 1/3 A + 1/3 B + 1/3 C, or as an overall average over the number of queries, i.e. 840/2241 A + 1400/2241 B + 1/2241 C, where A is always calculated as (A1 + A2)/2. The values presented in this paper are computed as an average over the three parts; a small sketch of both averaging schemes is given below.
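A minimal sketch of the two averaging schemes, assuming per-part recall values in percent (function and variable names are illustrative; the example values in the last line are the Kt = 0.34 row of Table 1):

```python
def overall_recall(a1, a2, b, c, by_queries=False):
    """Combine the per-part recall values (in %) into a single score.
    A is the mean of the scaling (A1) and rotation (A2) parts."""
    a = (a1 + a2) / 2.0
    if by_queries:
        # weight by the number of queries: 840 for A, 1400 for B, 1 for C
        return (840 * a + 1400 * b + 1 * c) / 2241.0
    return (a + b + c) / 3.0

# e.g. A1=87.81, A2=99.76, B=76.52, C=92.00 gives approximately 87.4 (cf. Table 1)
print(round(overall_recall(87.81, 99.76, 76.52, 92.00), 1))
```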

4. LOSSY CODING OF THE POLYGON BASED DESCRIPTOR

The great amount of multimedia data these descriptors can be applied to makes it necessary to store them in an efficient way. For lossless coding of the vertices of the descriptor a differential encoding scheme was presented in section 2. In order to enable efficient storage while maintaining good retrieval capabilities, this section presents different quantization mechanisms for lossy encoding. The first mechanism is based on a subsampling of the vertices of the contour and thus encoding of a lower number of vertices. This subsampling is achieved by a higher threshold of the relevance measure Kt in the evolution. While this results in a coarser approximation of the contour shape, the effect on the retrieval rate has to be studied. Even though this approach reduces the number of vertices, the positions of the remaining vertices are encoded with the original precision. To evaluate whether this precision is required for retrieval, two quantizers applied prior to the vertex encoding are discussed: an adaptive quantizer based on the segment length, and a uniform one applied to both vertices and segments. While both mechanisms, the subsampling and the quantization of vertices, reduce the descriptor size, it should be mentioned that only the first one affects the dimensionality of the descriptor; especially with respect to indexing and matching this can result in a speed-up of the retrieval process. In the following the subsampling/quantization mechanisms and their adaptation for polygon-based shape descriptors are described in detail.

4.1 Subsampling of vertices

As already discussed in section 2, the degree of contour polygon abstraction achieved by the evolution process also influences the descriptor size. The evolution algorithm removes points of the contour as long as the area changed in the tangent space by this simplification remains under a given threshold. By increasing this threshold, the evolution removes more vertices and thus the set of remaining vertices which have to be stored is smaller. In general this reduction also reduces the size of the descriptor. But since a variable length code is used for the encoding of the segments, a reduction in the number of vertices need not translate linearly into a reduction of the encoded descriptor data. In Figure 3 the relation between the number of vertices and the evolution cost for the given data set can be observed. The lowest relevance threshold Kt for the evolution is 0.34; for this value the contours in the test set are reduced to an average of 22 vertices, which corresponds to a descriptor size of 42 bytes. Even though there is no linear relation between the threshold Kt, the number of vertices and the descriptor size, the relation is monotone: when the number of vertices falls to 12, the descriptor decreases to 24 bytes on average.

Figure 3: Evolution of the number of vertices and the descriptor size (in bytes) according to the evolution threshold (relevance measure threshold Kt).

To find an optimal operation point KOP for the relevance measure it has to be investigated how the contour abstraction affects the shape retrieval and to what extent contour details remain significant for shape recognition. Corresponding experimental results are presented in section 5.

4.2 Uniform quantization of vertex points

The objective of the quantization is to reduce the size of the descriptor by allowing a certain error in the localization of the vertices used for shape approximation. Such a quantizer divides the discrete signal space into quantization intervals and defines a representative point for each interval. Thus in the quantization process each signal, in this case each segment, is represented by the value of the interval it lies in. The quantization implies a loss of precision and is thus an irreversible process.

A uniform quantization of the vertex positions is equivalent to a downsampling in the image space. In principle the matching is invariant to the length of the contour as described in section 2. This is confirmed by tests performed on the scaling part of the test set, which have shown that the matching algorithm is robust against scaling of the shape. But two different approaches of downsampling in the image plane have to be distinguished: downsampling of the object representation and downsampling of the contour representation. The first one was applied to the scaling test set; it causes certain deformations which can lead to a splitting of an object into several unconnected parts. The latter is applicable for the quantization of the polygon-based contour descriptor since it more likely preserves the contour even if segments are no longer separated in the image space. The suggested quantization reallocates each vertex remaining after evolution to a reduced image of size 1/32, 1/16, 1/8, 1/4 or 1/2 of the original. The down-sampling factor, which is coded in the header, is the highest (2, 4, 8, ..., 32) that keeps a similarity condition between the quantized and the evolved vertices. The measure used to test the similarity (2) is local, based on the segments and applied to each pair of neighboring segments; it takes into account the angle between them. Its explicit form is

$$M = \frac{s_1 \times s_2}{s_1 \cdot s_2}$$

where s1 × s2 represents the vector product of the two consecutive segments s1, s2. The similarity is tested through the absolute difference of the measures, assuring that it stays under a given threshold, and additionally assuring that both measures, evolved and quantized, have the same sign. By keeping the difference under a threshold the turning angle is kept at similar values. By assuring that both measures have the same sign, the convex/concave structure is preserved and possible jumps in the tangent space representation are avoided, which is a necessary condition to obtain similar retrieval results before and after quantization.

For quantization the described algorithm operates on the vertex positions to determine the nearest representative of the quantizer. If there are several nearest representatives the upper left-most one is chosen. A variation of this quantizer quantizes not the vertex positions but the segments: similar to the vertex-based quantization, each segment is quantized relative to the quantized representative of the previous one, except for the first segment, so that error propagation in the image space is avoided. In the case that a segment has several nearest representatives, the one representing the segment with the smallest error in the tangent space representation is chosen; hereby the error contribution of the segment length is approximated by the product of the length difference and the angular change to the consecutive segment.
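A simplified Python sketch of the vertex-based variant of this quantization, assuming a closed polygon of 2D points; the tie-breaking rule for several nearest representatives and the segment-based variant described above are omitted, and the function names are illustrative:

```python
import math

def turn_measure(p0, p1, p2):
    """Local measure M for the segment pair (p0->p1, p1->p2): vector (cross)
    product over dot product of the two consecutive segment vectors."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    return cross / dot if dot != 0 else math.copysign(math.inf, cross)

def quantize_vertices(polygon, factor):
    """Relocate every vertex to the nearest point of a grid coarsened by `factor`
    (equivalent to down-sampling the contour representation in the image plane)."""
    return [(round(x / factor) * factor, round(y / factor) * factor) for x, y in polygon]

def highest_admissible_factor(polygon, threshold, factors=(32, 16, 8, 4, 2)):
    """Pick the largest down-sampling factor for which every consecutive segment
    pair keeps |M_evolved - M_quantized| below `threshold` and M keeps its sign,
    so the convex/concave structure of the contour is preserved."""
    n = len(polygon)
    m_orig = [turn_measure(polygon[i - 1], polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    for f in factors:
        q = quantize_vertices(polygon, f)
        m_q = [turn_measure(q[i - 1], q[i], q[(i + 1) % n]) for i in range(n)]
        if all(abs(a - b) < threshold and a * b > 0 for a, b in zip(m_orig, m_q)):
            return f, q
    return 1, list(polygon)
```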

4.3 Adaptive quantization of segments

To maximize the reduction of the descriptor size while maintaining the discrimination power, the quantizer should be designed with respect to the applied matching metric and the signal characteristics of the approximated contour. The matching of contours is computed in the tangent space (Figure 3); the similarity of two arcs is computed by comparing the area under the polygon representations in the tangent space. Both magnitudes (length and angle) are directly related to the matching as described in section 2, so the error caused by the quantizing process should be limited with respect to the segment representation in the tangent space. For simplification we separate the problem by limiting the scope of the quantization from maximal concave and convex arcs to single segments, and we take the area error in the tangent space representation as the error measure. The evolution of the contour is based on a relevance measure depending on the length and angle of a segment in the tangent space. So for segments of the evolved contour the following can be assumed:

• short segments have a large angle with respect to their predecessor; thus the length of these segments should be encoded with a high precision.
• long segments can have either a small or a large angle with respect to their preceding segments; but due to their length, the angle should be coded precisely.

These assumptions lead to the definition of an unequal quantization allowing the same relative error in both coordinates of the segment.

Figure 4: The quantization intervals of the non-uniform (same-relative-error) quantizer, shown as a grid over the X and Y components of a segment.

According to the presented coding scheme based on the coding of an octant, a major and a minor component, the major and minor components are also quantized separately. In this approach the same relative error with respect to the size of the segment is allowed: the permitted error is given as a percentage of the length of each component. This means that the absolute error for the minor component is smaller than that for the major component. Even though the information for the x and y components is encoded separately, Figure 4 shows the resulting quantizer in the two-dimensional space. The quantizer of Figure 4 shows that, as required above, long segments are encoded with a higher precision in the angle while short segments are quantized more precisely with respect to the length. The percentage used to construct the quantizer has to be transmitted in the header of the descriptor, since only the index of the quantization interval is transmitted per segment; it is encoded with two bits since only percentages of 5, 10, 15 and 20% are used.

Encoding the segments of a closed contour using a non-uniform quantizer causes two different kinds of errors in the quantized contour: crossings of segments building a loop, and error propagation towards the last encoded segments. Crossings building a loop, which have a big impact on the tangent space representation, are avoided in the quantization process by checking the relative position of encoded segments. The error propagation is limited in the algorithm by encoding segments differentially with respect to the previously reconstructed segment. This way the localization functionality of the descriptor can still be used, but additional error in the tangent space representation is introduced.
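One possible realization of such a same-relative-error quantizer is a logarithmic quantizer whose step grows with the magnitude of the component; the Python sketch below is an illustration under this assumption and not necessarily the exact quantizer design used in the experiments:

```python
import math

def quantize_component(value, p):
    """Quantize one (major or minor) segment component with a step proportional
    to its magnitude, so the reconstruction error stays roughly within p*|value|.
    Only the integer index would be transmitted; p itself is signalled in the
    header (5, 10, 15 or 20%, i.e. two bits)."""
    if value == 0:
        return None, 0.0  # a zero component would be signalled separately
    q = (1.0 + p) / (1.0 - p)  # ratio between consecutive representatives
    index = round(math.log(abs(value)) / math.log(q))
    reconstruction = math.copysign(q ** index, value)
    return index, reconstruction

def quantize_segment(dx, dy, p):
    """Apply the same relative error to both components separately: the absolute
    error of the (shorter) minor component is therefore smaller."""
    (ix, rx), (iy, ry) = quantize_component(dx, p), quantize_component(dy, p)
    return (ix, iy), (rx, ry)
```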

5. RESULTS

In this section the efficiency of the proposed irrelevance reduction algorithms for encoding the polygon-based descriptor in a lossy way is discussed with respect to the retrieval performance. In the first part the results for the abstraction of the contour polygon by vertex subsampling are discussed to find an appropriate operation point for the relevance measure threshold Kt. Then, based on this operation point, the results of the quantization of the evolved contour representations are discussed.

Different thresholds Kt for the evolution algorithm have been applied to the test set in order to find the optimal operation point which maximizes the retrieval performance versus the descriptor size. As mentioned above, the descriptor can also be used for localization. The range of abstraction applied, especially at high values of the relevance measure, targets only the retrieval functionality; for localization this might in most cases be a too coarse approximation. But as shown in [***], the described vertices can be a subset of the vertices used for localization. Nevertheless, for retrieval these representations still contain enough perceptual information. The threshold for the relevance measure Kt was varied in the range 0.34-1, which corresponds to a range of 12 to 22 vertices per object.

Figure 5: Evolution of the retrieval rate against the descriptor size.

Figure 5 shows the diagram of retrieval rate versus descriptor size when no quantization of the segments or vertices is applied. The variation in descriptor size is obtained only by subsampling the vertex points, achieved by changing the threshold on the relevance measure for the evolution algorithm. The retrieval rate is the average over all test queries as described in section 3. The diagram shows that the retrieval performance increases below Kt ≈ 0.6 if the descriptor detail, and with it the size, is increased; increasing the detail further does not result in a better overall performance anymore. Table 1 complements the curve values of Figure 5; there the results for the matching of the different tests are presented along with a summarized value for the full set. Up to a cost of 0.6 the retrieval rate in all tests is similar. For the cost of 0.34 the scaling test shows a slightly lower value which is compensated by a slightly higher value on the similarity test. When the number of vertices is reduced to 12 (evolution cost equal to 1) the retrieval rate drops by 2%. The evolution threshold which gives the smallest descriptor size at a high retrieval performance was chosen as working point for further experiments: the relevance measure threshold Kt = 0.6, which corresponds to a descriptor size of 31.5 bytes and a retrieval rate of 87.4%.

Evolution Cost | Scaling A1 (%) | Rotation A2 (%) | Similarity B (%) | Deformation C (%) | Average (A+B+C)/3 (%)
0.34           | 87.81          | 99.76           | 76.52            | 92.00             | 87.4
0.47           | 88.69          | 99.76           | 75.96            | 92.50             | 87.6
0.6            | 88.05          | 99.80           | 75.73            | 92.50             | 87.4
1              | 85.27          | 99.08           | 73.73            | 89.00             | 85.0

Table 1: Retrieval rate versus descriptor size of the subsampled descriptor.

In the following the results for the uniform quantization scheme are presented (for computational reasons the results of the similarity part were computed with 10% of the query images). Table 2 shows the retrieval performance of the vertex quantization depending on the allowed distortion (1); Table 3 reflects the behavior of the uniform segment quantization. A threshold of 0 means that there is no difference between the original and the dequantized vertices. Again a behavior similar to the contour evolution can be observed: in the case of vertex quantization the retrieval rate increases up to a certain point (distortion threshold of 0.1) when decreasing the distortion, but a further improvement cannot be achieved by a more precise encoding of the vertices. In general, allowing the same distortion, vertex quantization performs better than segment quantization; this is especially due to the low retrieval rate of the deformation test for segment quantization. If the measure to test the performance of the matching algorithm were based on the number of queries, both algorithms would appear more alike.

Distortion Threshold | Scaling A1 (%) | Rotation A2 (%) | Similarity B (%) | Deformation C (%) | Average (A+B+C)/3 (%)
0.3                  | 85.35          | 99.05           | 74.21            | 92.50             | 86.1
0.2                  | 86.38          | 99.44           | 74.96            | 92.50             | 86.8
0.1                  | 87.69          | 99.84           | 76.07            | 92.50             | 87.6
0 (Original)         | 88.05          | 99.80           | 76.00            | 92.50             | 87.5

Table 2: Retrieval rate versus descriptor size for uniform vertex quantization.

Distortion Threshold | Scaling A1 (%) | Rotation A2 (%) | Similarity B (%) | Deformation C (%) | Average (A+B+C)/3 (%)
0.2                  | 86.7           | 99.0            | 75.46            | 91.00             | 86.43
0.1                  | 87.7           | 99.6            | 75.53            | 92.50             | 87.22
0 (Original)         | 88.05          | 99.8            | 76.00            | 92.50             | 87.5

Table 3: Retrieval rate versus descriptor size for uniform segment quantization.

Table 4 shows the average descriptor sizes for the corresponding distortion thresholds. Also in this case the segment-based quantization performs slightly worse than the vertex-based one: on average the descriptor for segment quantization needs an extra byte.

Distortion Threshold | Descriptor size, Vertices (bytes) | Descriptor size, Segment (bytes)
0.3                  | 20.7                              |
0.2                  | 22.5                              | 23.68
0.1                  | 25.4                              | 26.26
0 (Original)         | 31.9                              | 31.9

Table 4: Descriptor size for quantization.

Table 5 shows the behavior of the non-uniform segment quantizer. The average retrieval rate for this quantizer is notably lower than the rates achieved without quantization (see Figure 6). While the algorithms to avoid crossings in general seem to have an acceptable performance, in the special case of scaling they show a remarkably worse behavior compared to the uniform quantizer, so corrections of vertex positions in down-sampled shapes produce no benefit. On the other hand, first tests have shown that allowing error propagation on the vertex locations without checking for crossings gives a higher performance. So the algorithms to avoid error propagation and crossings seem to cancel the benefit of the non-uniform quantizer. Also, a non-uniform quantizer that is constant in the image space, as used in this case, might not be sufficient: long segments can have a large or a small angle with their neighboring segments, so to restrict the error in the tangent space the quantization of the segment length would also have to be adaptive and not fixed as it is right now.

Figure 6: Retrieval rate versus descriptor size using different mechanisms for subsampling and quantization (subsampled descriptor, vertex-quantized descriptor, segment-quantized descriptor, and non-uniform quantizer).

Finally, Figure 6 compares the global performance of the different methods, and a working point can be chosen in order to maximize the retrieval rate against the descriptor size.

Allowed relative error | Scaling A1 (%) | Rotation A2 (%) | Similarity B (%) | Deformation C (%) | Average (A+B+C)/3 (%)
5%                     | 80.67          | 98.63           | 75.8             | 92.00             | 85.82
10%                    | 79.72          | 97.85           | 74.85            | 92.50             | 85.37

Table 5: Comparison between same-absolute-error and same-relative-error quantizers (results of the non-uniform quantizer).

Overall it can be noticed that it is possible to reduce the descriptor size by about 8 bytes (25%) while keeping the retrieval rate at the same level, by using a distortion threshold of 0.1 for the uniform vertex quantization. Combined with the gain from using an appropriate operation point for the relevance measure threshold, the size can be decreased by 50% compared to the settings used previously in the literature [3]. The determined operation point also depends on the contour set available and especially on the variance of the segmentation algorithms, but for the test set available within MPEG-7 it is appropriate, and the descriptor remains interoperable with descriptors using a higher precision if applicable.

6. CONCLUSIONS

In this paper we have tackled the problem of the size of basic visual descriptors. Based on the example of a polygon-based shape descriptor we have proposed an efficient coding scheme which can be used for retrieval as well as for localization and visualization purposes. To optimize this visual descriptor with respect to its size we investigated the retrieval versus size behavior of several lossy encoding algorithms. The comparison of a non-uniform segment quantizer and a uniform vertex quantizer has shown that the latter outperforms the former. This is partly due to the algorithms for limiting error propagation and crossings which have to be applied in the non-uniform case. We have shown that, using a combination of contour abstraction and vertex quantization, the descriptor size can be reduced by 50% with no loss of discrimination power in comparison to former operation points of the descriptor mentioned in the literature. The experiments were conducted under the assumption that the functionality of localization should also be supported. The performance of a quantization applied in the tangent space, which is adapted more precisely to the matching metric, still has to be studied and compared with the uniform quantization in the image space. But since such a quantization would result in an equivalent contour representation which is most likely not a closed contour, the localization functionality would be lost.

7. REFERENCES

[1] MPEG-7: Context, Objectives and Technical Roadmap, V.12, ISO/IEC JTC1/SC29/WG11/N2861, http://www.cselt.stet.it/mpeg/, July 1999.
[2] L. J. Latecki and R. Lakämper: Polygon Evolution by Vertex Deletion. Proc. of the 2nd Int. Conf. on Scale-Space Theories in Computer Vision, Corfu, Greece, Springer-Verlag, pp. 398-409, September 1999.
[3] L. J. Latecki and R. Lakämper: Contour-based Shape Similarity. Proc. of the 3rd Int. Conf. on Visual Information Systems, Amsterdam, Springer-Verlag, pp. 617-624, June 1999.
[4] F. Mokhtarian and A. K. Mackworth: A Theory of Multiscale, Curvature-Based Shape Representation for Planar Curves. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, no. 8, pp. 789-805, August 1992.
[5] F. Mokhtarian, S. Abbasi, and J. Kittler: Efficient and Robust Retrieval by Shape Content through Curvature Scale Space. In A. W. M. Smeulders and R. Jain, editors, Image DataBases and Multi-Media Search, pp. 51-58, World Scientific Publishing, Singapore, 1997; http://www.ee.surrey.ac.uk/Research/VSSP/imagedb/demo.html.
[6] G. Chuang and C.-C. Jay Kuo: Wavelet Descriptor of Planar Curves: Theory and Applications. IEEE Trans. on Image Processing, vol. 5, no. 1, pp. 56-70, January 1996.
[7] K. Müller and J.-R. Ohm: Contour Descriptor Using Wavelets. WIAMIS'99, Berlin, May 1999.
[8] K. J. O'Connell: Object-Adaptive Vertex-Based Shape Coding Method. IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 1, February 1997.
[9] J. Heuer, A. Kaup, U. Eckhardt, L. J. Latecki, R. Lakämper: Results of Polygon based Contour Shape Descriptor according to CE1. ISO/IEC JTC1/SC29/WG11/M5906, Noordwijkerhout, March 2000.
[***] Can you add the title of your conference contribution?