
Massively Parallel Computing

MPC1011: Matching Local Self-Similarities on GPUs


M. Neumann and D. Ritter

Abstract We present a parallel version of the local self-similarity approach of [SI07]. The goal of this approach is to measure the similarity of different images and to find matches of an image within another image. The approach exploits the fact that the internal layout of local self-similarities correlates across different images. These internal self-similarities are captured within a compact descriptor. The descriptors are computed densely throughout the images at different scales. They account for a certain amount of local and global geometric distortion, which allows the use of rough hand-sketches to find real instances of the object in the image. We show the basic concepts, the parallelization approaches, object detection examples, and a comparison to a CPU version.

1. Introduction

Finding objects in images is used in many applications, e.g. object recognition, object tracking, image-in-image search, hand-sketch search, etc. Existing methods usually use local or global image properties to capture scene information, which are then compared in order to figure out the similarity between images. The assumption these approaches make is that there is an underlying visual property (color, intensity, edges, gradients, etc.) that can be compared between the images. However, this assumption may not hold, because two images can show an instance of the same object even without sharing the same visual properties. Therefore, a local self-similarity descriptor was introduced in [SI07]. This descriptor captures the internal geometric layouts of local self-similarities and also accounts for some local affine deformations.

However, creating the local self-similarity descriptors and matching them is quite a compute-intensive task. Using the increasing compute power of GPUs can therefore reduce the calculation time: many of the operations in the descriptor calculation and the matching phase can be done in a parallel SIMD fashion. The resulting speed-up can then be used to process higher-resolution images, or to use smaller images for real-time object detection applications.

2. Related Work

Image descriptors that take local or global image properties into account are well documented. See [MS05] or [XHE 10] for a comparison of the most popular approaches.
© 2011 The Author(s)

How such a descriptor can be accelerated on a GPU is discussed in [WFG 09]. The local self-similarity descriptor is introduced in [SI07]; in order to make the descriptor sparse, its authors used an approach introduced in [Hoy04]. Using sketches for image retrieval is discussed in [BBGI10]. For the matching phase they used a modified version of the ensemble matching algorithm described in [BI07].

3. Overview of System

In this section we present an overview of the algorithmic steps needed to match two images. Figure 1 shows the different steps of our pipeline, which are described in detail in Section 4. The approach consists of three steps: extracting descriptors from the query image, matching the descriptors across images, and visualizing the results. We implemented the extraction and the matching phase in parallel on the GPU.

We differentiate between two kinds of input images: query images and database images. The query images are searched for in the database images. Since self-similarity may appear at various scales and in different region sizes, we extract self-similarity descriptors at multiple scales of the query image.

The first step is to transform the images to the LAB color space. The subsequent creation of the descriptors is based on [SI07]. We divided the pictures into a 5x5 pixel grid; we will refer to the grid elements as patches. For each of these patches we calculate one local self-similarity descriptor, which contains the 80 final values. To calculate these descriptors, we use a larger surrounding image region (45x45 pixels)

M. Neumann & D. Ritter / MPC1011: Matching Local Self-Similarities on GPUs

around the center of these patches and measure the similarity between the patch and its local environment. To calculate the similarity, we use the SSD (Sum of Squared Differences) between the patch and its surrounding patches in the image region. This results in 1681 values per grid element. These values are then transformed into a log-polar representation with 20 angles and 4 radial intervals (80 bins). In each bin we select the maximal value, which reduces the initial 1681 values to the 80 values of the descriptor. The descriptor values are then normalized to the range [0..1]. However, using all descriptors would yield bad results in the matching phase because of non-informative descriptors. Therefore, we filter out non-informative descriptors and descriptors with high self-similarity. After that filtering, the descriptors are normalized again to the range [0..1].
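The filtering step can be sketched in a few lines. Below is a minimal NumPy version, assuming descriptors are rows of 80 values already normalized to [0..1]; the thresholds follow the values reported later in the results (sums below 20 or above 70 are discarded, with a maximum possible sum of 80):

```python
import numpy as np

def filter_descriptors(descriptors, low=20.0, high=70.0):
    """Drop non-informative descriptors by their value sum.
    A sum below `low` indicates a salient patch without self-similarity;
    a sum above `high` indicates a homogeneous patch with too much of it
    (the maximum possible sum of 80 normalized values is 80)."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    sums = descriptors.sum(axis=1)
    keep = (sums >= low) & (sums <= high)
    return descriptors[keep], keep
```

In the full pipeline the surviving descriptors would then be normalized to [0..1] once more, as described above.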

4. Description of Algorithmic Steps

4.1. SSD

The local self-similarity descriptor creation is described in [SI07], where "local" means that the descriptor is calculated only with the help of its local environment; it is therefore a measure of similarity within that environment. We divided the input picture into a 5x5 pixel grid and defined for each of these grid elements a surrounding box of the size 45x45 pixels. It is necessary to choose an odd side length to get a real center point in this box. For each of those 5x5 patches we then calculate one descriptor.

This section describes how we divided the calculations into blocks and threads on the GPU. The partitioning is also illustrated in Figure 2. To calculate one descriptor we use one block on the GPU, which means we have as many blocks as patches. Furthermore, each block consists of 41 threads, which all start at the top of the surrounding box. Each thread traverses the box from top to bottom and calculates an SSD between its current position and the center of the patch. Note that an SSD is not calculated between single points: these points are only used to center a 5x5 patch on them, and the SSD is then calculated between those 5x5 patches. Because a thread traverses the box, each thread is responsible for calculating 41 SSDs. Since all 41 threads do the same, we get 1681 values for each patch. Together these 1681 values form a distance surface.
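As a sequential reference for the GPU layout above, the distance-surface computation for a single patch can be sketched in plain NumPy (on the GPU, each of the 41 columns of the loop below would be handled by one thread of the patch's block):

```python
import numpy as np

def distance_surface(image, cy, cx, patch=5, region=45):
    """Compute the 41x41 SSD distance surface for the patch centered at
    (cy, cx).  Each entry is the SSD between the center patch and one
    shifted 5x5 patch inside the 45x45 surrounding region."""
    half_p = patch // 2            # 2
    half_r = region // 2           # 22
    n = region - patch + 1         # 41 SSD positions per axis
    center = image[cy - half_p:cy + half_p + 1,
                   cx - half_p:cx + half_p + 1].astype(np.float64)
    surface = np.empty((n, n))
    for dy in range(n):
        for dx in range(n):        # on the GPU: one thread per column dx
            y = cy - half_r + half_p + dy   # center of the shifted patch
            x = cx - half_r + half_p + dx
            shifted = image[y - half_p:y + half_p + 1,
                            x - half_p:x + half_p + 1].astype(np.float64)
            surface[dy, dx] = np.sum((center - shifted) ** 2)
    return surface
```

With region = 45 and patch = 5 this yields the 41 x 41 = 1681 values per grid element described above.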

In the matching phase we compare multiple query descriptors, extracted beforehand at multiple scales, with a single database image. The matching algorithm generates multiple likelihood maps that are 5 times smaller than the database image. To measure the similarity between descriptor values while accounting for their respective positions, we implemented a weighting function that is described in the next section.

Each likelihood map is visualized as a heat map. We also generate a combined heat map that merges all scales into one.

Figure 1: Overview of the different algorithmic steps for matching a query with a database image. The upper part of the picture shows the process on the database image and the lower part shows the process on the query images.



of blocks needed per patch m, which results in n · m launched blocks. Then the variances, which are needed to calculate equation 1, are also computed on the GPU. Finally, equation 1 is applied to the minima of the correlation surface in order to obtain the descriptor.

Figure 2: This picture shows the partitioning into blocks and threads for the SSD calculation.

4.3. Filtering of Non-Informative Descriptors

In order to find the query image in the database image, the first step is to eliminate the non-informative descriptors. According to [SI07], there are two kinds of non-informative descriptors. First, there are descriptors that do not capture any self-similarity: descriptors where the center patch is salient, so that no similarity can be found in its environment. The second kind contains a high self-similarity, which happens if the patch lies in a large homogeneous area. In order to find both kinds, our approach is to eliminate the descriptors where the sum of the descriptor values is below a certain threshold (saliency) and those where the sum is above a certain threshold (homogeneity).

4.4. Matching

After creating the descriptors, we determine the similarity between them. To do this, we implemented a GPU device function that applies a sigmoid function to the L1 distance between the 80 values of two descriptors. The sigmoid function gives us a similarity probability in the range [0..1] for the two descriptors. Our basic concept for matching a query image with a database image is to lay the descriptors of the query image over the descriptors of the database image. After building such an overlay, we measure the similarity of this overlay. Because our query image is smaller than the database image, we shift the query image over the database image. Our partitioning is similar to the one used to calculate the SSDs: for each of these overlays we use one block.
One block has as many threads as the query image has patches in x-direction. Again, each thread traverses the overlay in y-direction. In each traversal step, the thread calculates the similarity between the query descriptor and the underlying database descriptor and adds this probability to a sum_thread.x variable. After each thread has reached the height of the query image, a single thread sums up all sum_thread.x variables into a sum_final variable. The sum_final value is then written into the result at all positions used in the overlay. Note that some descriptors are not used; at those positions sum_final is not written. For the write-back into the result matrix we used an atomic maximum function. This is necessary because each descriptor position is included in several overlays; thus, only the overlays with the highest probabilities are written to the result matrix. Figure 4 shows this overlay-and-shift approach.
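The overlay scoring described above can be sketched sequentially as follows; the sigmoid parameters are illustrative assumptions, as the paper does not state the values it uses:

```python
import numpy as np

def descriptor_similarity(d1, d2, steepness=4.0, midpoint=10.0):
    """Sigmoid of the L1 distance between two 80-value descriptors,
    yielding a similarity probability in [0..1].  steepness and
    midpoint are placeholder values, not the paper's."""
    l1 = np.abs(d1 - d2).sum()
    return 1.0 / (1.0 + np.exp(steepness * (l1 - midpoint)))

def overlay_score(query, database, oy, ox):
    """Similarity of one overlay of the query descriptor grid on the
    database grid at offset (oy, ox).  query and database are
    (H, W, 80) arrays; on the GPU, one block handles one overlay and
    each thread sums one column of it."""
    qh, qw, _ = query.shape
    total = 0.0
    for x in range(qw):              # one thread per column
        for y in range(qh):          # each thread traverses in y
            total += descriptor_similarity(query[y, x],
                                           database[oy + y, ox + x])
    return total
```

The final result map would then keep, per position, the maximum score over all overlays covering it, which corresponds to the atomic write-back described above.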

Figure 3: The descriptor creation process (source: [SI07]).
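The binning stage of this descriptor creation process can be sketched sequentially as below, assuming the 80-bin assignment mask `bin_index` has already been computed; Section 4.2 explains the transformation and the minimum trick used here:

```python
import numpy as np

def descriptor_from_ssd(ssd_surface, bin_index, var_noise, var_auto, n_bins=80):
    """Collect the per-bin minima of the SSD surface (an atomic minimum
    on the GPU) and apply the exponential of equation (1) only to those
    80 values.  Since exp(-x/c) decreases monotonically in x, the bin
    maximum of the correlation surface equals exp applied to the bin
    minimum of the SSD values."""
    denom = max(var_noise, var_auto)
    bin_min = np.full(n_bins, np.inf)
    for ssd, b in zip(ssd_surface.ravel(), bin_index.ravel()):
        bin_min[b] = min(bin_min[b], ssd)
    return np.exp(-bin_min / denom)
```

Computing the exponential on 80 values instead of all 1681 surface elements yields the same descriptor as the direct evaluation.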

4.2. Log-Polar Transformation

The distance surface resulting from the SSD is now transformed into a so-called correlation surface. This is done by applying equation 1 to every 5x5 patch q:

S_q(x, y) = exp( -SSD_q(x, y) / max(var_noise, var_auto(q)) )    (1)

Here var_noise is a constant and var_auto(q) is the maximal variance in a small region around the center of the patch q. After the transformation into the correlation surface, a mapping to log-polar coordinates is done: every correlation surface element is mapped to one bin of the log-polar representation. The log-polar representation has 80 bins (20 angles, 4 radial intervals). The resulting 80 descriptor values are the maxima of the bins. The descriptor values are normalized to the range [0..1]. Figure 3 shows this process.

Instead of applying equation 1 to all elements of the correlation surface and then finding the maximum per bin, we map the minima of the distance surface to the bins and calculate the costly exponential function only on the resulting 80 descriptor values. This works because, due to the negated exponent, the maximum of equation 1 corresponds to the minimum of the SSD values.

In order to parallelize this step, a mapping mask is first computed once on the GPU. This mask is used as a look-up table for the position of each correlation surface element in the log-polar bins. The mapping itself also runs in parallel. Because only the minimum value per bin has to be stored in the final descriptor, we used the atomic minimum function. As thread layout, we chose a fixed number of threads per block in order to be flexible with respect to changes in the size of the correlation surface. The grid size was determined by the number of patches n and the number


Figure 4: This illustration shows how two ensembles of image descriptors are matched. First, the descriptors are positioned on top of each other; the green and the blue descriptors form an overlay. Then a probability for this overlay is calculated and written into the result. After that, the overlay is shifted in x- and y-direction. This is done in parallel for multiple blocks; in this example we need 6 blocks, each with 3 threads.

4.5. Visualization

For visualization we implemented a heat map, using a function that transfers a probability value into heat-map colors. We generate a quadripartite output image: the first part is the original database image, the second part is the original query image, the third part is a visualization of the generated query descriptors, and the fourth part shows the resulting heat map. This quadripartite view is very useful because one can easily see how adjusting parameters influences the result. Note that the quadripartite image is generated for every query image scale. From the created heat maps we also produce a combined heat map that represents the average of all heat maps.
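One possible transfer function for such a heat map is a common blue-to-red ramp; this is an illustrative choice, as the paper does not specify its exact color mapping:

```python
def heat_color(p):
    """Map a probability in [0, 1] to an RGB heat-map color:
    blue -> cyan -> green -> yellow -> red.  Values outside [0, 1]
    are clamped."""
    p = min(max(p, 0.0), 1.0)
    if p < 0.25:    # blue to cyan: green channel ramps up
        return (0, int(4 * p * 255), 255)
    if p < 0.5:     # cyan to green: blue channel ramps down
        return (0, 255, int((1 - 4 * (p - 0.25)) * 255))
    if p < 0.75:    # green to yellow: red channel ramps up
        return (int(4 * (p - 0.5) * 255), 255, 0)
    return (255, int((1 - 4 * (p - 0.75)) * 255), 0)  # yellow to red
```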

Figure 5: Illustration of the neighborhood search with radius 1.
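A minimal sketch of the neighborhood search that Figure 5 illustrates, described in the following paragraph; both sigmoid parameter sets are illustrative assumptions, not values from the paper:

```python
import numpy as np

def value_similarity(d1, d2, steepness=4.0, midpoint=10.0):
    """Sigmoid of the L1 distance between two 80-value descriptors
    (placeholder parameters)."""
    l1 = np.abs(d1 - d2).sum()
    return 1.0 / (1.0 + np.exp(steepness * (l1 - midpoint)))

def position_weight(dy, dx, steepness=1.0, midpoint=1.5):
    """Probability of similar positions: decreases with grid distance
    via a sigmoid (again with assumed parameters)."""
    return 1.0 / (1.0 + np.exp(steepness * (np.hypot(dy, dx) - midpoint)))

def neighborhood_match(query_desc, database, dby, dbx, radius=1):
    """Best product of value similarity and position weight over the
    database descriptors within `radius` grid cells of (dby, dbx)."""
    best = 0.0
    h, w = database.shape[:2]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = dby + dy, dbx + dx
            if 0 <= y < h and 0 <= x < w:
                best = max(best,
                           value_similarity(query_desc, database[y, x])
                           * position_weight(dy, dx))
    return best
```

Multiplying the two probabilities lets a slightly displaced but well-matching descriptor still contribute, while distant matches are damped by the position term.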

To account for local deformations, we integrated a neighborhood search, shown in Figure 5. This means that when we build an overlay, each thread compares multiple query descriptors within a specified radius with their corresponding database descriptors. With increasing distance the similarity should decrease; therefore we used a sigmoid function. We integrated this approach by multiplying the probability of similar descriptor values with the probability of similar descriptor positions.

5. Results

As we mentioned before, our output image is a collage of four images. In this section we present some of our results. We used the ETHZ shape dataset [dat11] for the query and database images in Figures 6, 7, 8, and 9. In the early stages of our implementation we used the pictures shown in Figure 10; this was a simplified testing environment because there we extracted the query image directly from the database image. We also implemented a visualization of the used and the


discarded descriptors: the used ones are white and the discarded ones are black. Figure 6 (e) shows such a visualization for a database image.

Figures 6, 7, 8, and 9 all show results of a more challenging apple-logo retrieval test. For the query image we used a simple black-and-white apple logo. The large homogeneously colored areas in the query do not provide enough self-similarity information; descriptors whose summed-up values are below 20 or above 70 are discarded (the maximum is 80). Therefore, the resulting descriptors cover only the edges of the apple. For the neighborhood search we achieved the best results with radius 2. This means our matching algorithm has a tolerance of a 25x25 pixel square at each point (5 pixels width for the center patch plus 2x5 pixels to the right, left, bottom and top). The algorithm only produces a good match if the pattern in the query is of a similar size as in the database image.

6. Discussion

To create a self-similarity descriptor, it is necessary to have a surrounding box around a grid element. We have chosen a side length of 45 pixels for this square box. Because of that, an image must have a size of at least 45x45 pixels for a single descriptor to be extracted. Note that the surrounding box causes a 20-pixel-wide border on the images where no descriptors are calculated. The maximum size of a query image, as well as of a database image, is restricted to 1280x1280 pixels. That is because NVIDIA GPUs only allow up to 65536 blocks (√65536 · 5 = 1280). Furthermore, the query image needs to be smaller than the database image: if the query image had the same size as the database image, there would be only one overlay, and a match could not be detected if the pattern is located in different areas of the two input pictures.

We measured the performance of our parallel descriptor creation against a CPU version that is included in the OpenCV package [ope11].
For the comparison we calculated the descriptors at multiple scales, once with our implementation and once with the implementation in OpenCV. Figure 11 shows that our speedup increases with larger images, because more blocks are used with increasing image size. The maximum speedup is a factor of 32 at an image size of 1024x1024 pixels. Figure 12 shows that our descriptor implementation scales very well across different GPUs: the GTX 480 with 480 cores is almost twice as fast as the GTX 285 with 240 cores. We also measured the time for matching two images at different query scales; this is shown in Figure 13. An unintuitive observation is that the time first increases up to a query size of 400x400 pixels and then decreases again. This is because with a larger query size the number of blocks decreases while the number of threads increases.

Figure 11: Comparison of our descriptor implementation with the OpenCV implementation. The speedup of our version increases with larger images; the maximum speedup is a factor of 32.

Figure 12: This graph shows how the calculation time for the descriptors varies on different GPUs. On the GTX 480, which has twice as many cores as the GTX 285, the performance gain is around a factor of 2.

7. Conclusion and Future Work

We presented a parallel version of the local self-similarity descriptor introduced in [SI07]. Using the GPU, we achieved a 32x speedup on a single GTX 480 in comparison to the OpenCV CPU implementation. This speedup can be used for processing higher-resolution images or for object detection in real-time applications. However, there are still many improvements that can be made. In order to improve the matching result, the sparseness measure from [Hoy04] could be used to refine the

Figure 13: Comparison for matching two images at different query scales. The database image size is fixed. The GTX 480 is about two times faster than the GTX 285.



Figure 6: Pictures (a)-(d) show the quadripartite output images. Such a quadripartite result picture is useful for adjusting parameters because one can easily see how a changed descriptor influences the heat map. Image (e) shows a visualization of the database descriptors, where only the white descriptors are used. Image (f) shows a combined heat map of the heat maps from images (a)-(d).




Figure 7: These pictures show another apple-logo recognition. A match is only possible if the patterns in the query and the database image are of almost the same size.

matching results. Also, there are many parameters that could be tweaked to improve the results. Regarding performance, one could use more threads per thread block for the SSD calculation; thereby our descriptor algorithm could handle images larger than 1280x1280 pixels. The matching could be accelerated as well.

References
[BBGI10] Bagon S., Brostovski O., Galun M., Irani M.: Detecting and sketching the common. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (2010), pp. 33-40.
[BI07] Boiman O., Irani M.: Detecting irregularities in images and in video. International Journal of Computer Vision 74, 1 (Jan. 2007), 17-31.
[dat11] ETHZ - Computer Vision Lab: Datasets. http://www.vision.ee.ethz.ch/datasets/index.en.html, 2011. [Online; accessed 21-March-2011].
[Hoy04] Hoyer P. O.: Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 5 (2004), 1457-1469.
[MS05] Mikolajczyk K., Schmid C.: A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 10 (2005), 1615-1630.
[ope11] Welcome - OpenCV Wiki. http://opencv.willowgarage.com/wiki/, 2011. [Online; accessed 21-March-2011].
[SI07] Shechtman E., Irani M.: Matching local self-similarities across images and videos. 2007 IEEE Conference on Computer Vision and Pattern Recognition (June 2007), 1-8.
[WFG 09] Wang Y., Feng Z., Guo H., He C., Yang Y.: Scene recognition acceleration using CUDA and OpenMP. Computer, 3 (2009), 1422-1425.
[XHE 10] Xiao J., Hays J., Ehinger K. A., Oliva A., Torralba A.: SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2010), IEEE, pp. 3485-3492.



Figure 8: Our approach also works with quite small patterns in real-life pictures.




Figure 9: Here, pictures (a)-(c) match correctly. Only in the final map is the head area slightly mismatched.



Figure 10: For this set of result pictures we used a cutout of the database image as the query image. We used this query image in the early implementation stage.

