
Joint Segmentation of Images and Depth Maps

Michele De Benedet, Riccardo Gasparetto Stori, Mauro Piazza

Abstract

In this project, we developed algorithms for the segmentation of images with depth maps. The clustering-based approaches we extended are K-means and mean shift: the former is a well-known centroid-based procedure, while the latter is a density-based clustering technique. Both methods begin from a set of starting points and then iteratively update the centroids of the clusters. The difference lies in how the clusters are updated: the first method, K-means, computes a simple average, whereas the second, mean shift, moves in the direction of a density gradient. Segmentation is finally applied to the original image, assigning each pixel to the best cluster.

Figure 2: Depth map, normalized

1. Data import

Figure 1: Color image


Figure 3: Scatter plot of the above data, with the (x, y, z) components of p on the axes and color on the data points

We were provided with some images, together with their respective disparity maps captured with stereoscopic 3D image acquisition equipment (e.g. the Microsoft Kinect is a widely used solution). After converting the disparity map (the apparent pixel motion between the two stereo images) to the depth map (the physical spatial distance w.r.t. the position of the camera), we converted the RGB data of the color image to the L*a*b* color space, which is perceptually uniform. Therefore, sample vectors

p(u, v) = [ L(u, v)/σ_L, a(u, v)/σ_L, b(u, v)/σ_L, λ·x(u, v)/σ_z, λ·y(u, v)/σ_z, λ·z(u, v)/σ_z ]    (1)

belong to a Euclidean space P, where we can compute norms and distances. The (arbitrary) parameter λ shifts the weight of a sample either towards the color component or towards the spatial component; σ_L and σ_z are the standard deviations of the L and z components respectively and are used for normalization, and u and v are the image coordinates.

2. K-means

K-means clustering is a computationally difficult (NP-hard) problem; here we present the heuristic Lloyd algorithm, which quickly converges to a local optimum. We are given a set P of m data points in an n-dimensional space R^n and an integer k; the problem is to determine a set of k centroids so as to minimize the distance (computed on the p vectors, see Eq. (1)) from each data point to its nearest center.

The algorithm works as follows: in the first iteration, given k starting points μ_1, μ_2, ..., μ_k ∈ R^n (e.g. randomly chosen), for every point x^(i) in the image compute

c^(i) = arg min_j ||x^(i) − μ_j||²,    (2)

i.e. the index of the cluster whose center is closest, which is stored for each sample.

The next step is the update of the cluster centers. We introduce the sets S_j, j = 1, 2, ..., k, where S_j = {x^(i) | c^(i) = j}, and compute

μ'_j = (1/|S_j|) Σ_{i=1}^{|S_j|} s_{j,i},    (3)

where s_{j,i}, i = 1, 2, ..., |S_j|, are the points belonging to the j-th cluster. Equation (3) finds the new barycenter of every cluster j; eventually, the new set of centroids μ'_1, μ'_2, ..., μ'_k is stored and used again in (2). K-means refines its search until a local minimum is found. Due to the discrete nature of the data, we need to choose a stop criterion: the centroid of a cluster is considered fixed as soon as the squared distance between two consecutive centers falls beneath a certain threshold (0.05).

The only parameter we are free to determine is K, the number of clusters. In our implementation K = 10, but other images may need more (or fewer) clusters depending on their structure. Some of the clusters may overlap (e.g. when two or more of the initial μ_i are very close); others, the problem being NP-hard, may not reach the optimum. To avoid this issue, and therefore to obtain a suitable result, many runs must be executed.

Finally, the running time of K-means is O(nkdi): n is the number of pixels, d the dimension of the vectors, k the number of clusters, and i the number of iterations needed until convergence (some tens in our case).

3. Mean shift

The recursive mean shift procedure we implemented searches for optimal segment descriptors, beginning from a selection of scattered starting data points (which can be randomly chosen, uniformly distributed, or selected by the user).

The algorithm iteratively fits a Gaussian kernel over the samples and finds the gradient direction, so that it moves toward the nearest mode. Upon reaching the maximum of the pdf of the cluster points (within a threshold), this maximum is compared with the cluster modes that have already been identified, and added to them if it is sufficiently distant from all the others (to avoid duplicate modes). With random starting points, the number of iterations is set high enough to find all segments with a sufficiently high probability.

Given a starting point y^(1) ∈ P, it is useless to compute the whole Gaussian distribution; the only useful result is the mean shift vector direction and magnitude, computed in O(n):

m(y^(j)) = [ Σ_{i=1}^n p_i exp(−||(y^(j) − p_i)/h||²) ] / [ Σ_{i=1}^n exp(−||(y^(j) − p_i)/h||²) ] − y^(j)    (4)
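The sample vectors of Eq. (1) can be sketched in a few lines of Python. The function below is a minimal illustration, not our exact implementation: the name `build_features` is ours, the L*a*b* conversion is assumed to have been done already, and for brevity the pixel coordinates plus depth stand in for the fully back-projected 3-D coordinates.

```python
import numpy as np

def build_features(lab, depth, lam=1.0):
    """Build the 6-D sample vectors of Eq. (1): colour channels scaled
    by sigma_L, spatial components scaled by lam / sigma_z, so that
    lam trades colour similarity against spatial proximity."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Simplification: use (u, v, depth) as the spatial (x, y, z) triple.
    xyz = np.stack([u, v, depth], axis=-1).astype(float)
    sigma_L = lab[..., 0].std() or 1.0   # normalisation constants
    sigma_z = depth.std() or 1.0
    colour = lab.reshape(-1, 3) / sigma_L
    space = xyz.reshape(-1, 3) * (lam / sigma_z)
    return np.hstack([colour, space])    # one row per pixel, shape (h*w, 6)
```

Setting lam = 0 reduces the vectors to pure colour, while a large lam makes the spatial part dominate, matching the behaviour discussed in Section 5.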

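The Lloyd iteration of Eqs. (2)–(3) can then be sketched as follows; this is a minimal illustration under our own naming (`lloyd_kmeans`), not the exact code used for the experiments.

```python
import numpy as np

def lloyd_kmeans(points, k, max_iter=100, tol=0.05, seed=0):
    """Heuristic Lloyd algorithm: alternate the assignment of Eq. (2)
    and the barycenter update of Eq. (3) until the centroids move
    less than `tol` (the stop criterion of Section 2)."""
    rng = np.random.default_rng(seed)
    mu = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Eq. (2): c(i) = argmin_j ||x(i) - mu_j||^2
        dist = np.linalg.norm(points[:, None, :] - mu[None, :, :], axis=2)
        c = dist.argmin(axis=1)
        # Eq. (3): new barycenter of each cluster (keep old mu if empty)
        mu_new = np.array([points[c == j].mean(axis=0) if np.any(c == j)
                           else mu[j] for j in range(k)])
        shift = np.linalg.norm(mu_new - mu)
        mu = mu_new
        if shift < tol:
            break
    # final assignment, consistent with the returned centroids
    c = np.linalg.norm(points[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
    return c, mu
```

As noted in the text, several restarts with different seeds help to escape poor local optima of this NP-hard problem.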
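Similarly, iterating the Gaussian-weighted mean underlying Eq. (4) climbs to a mode. The sketch below (the name `seek_mode` is ours) stops when the mean shift vector is shorter than a small threshold, as described above.

```python
import numpy as np

def seek_mode(points, start, h=1.0, tol=1e-3, max_iter=500):
    """Climb the kernel density estimate from `start`: move y to the
    Gaussian-weighted mean of the samples until the mean shift vector
    of Eq. (4) (weighted mean minus y) is shorter than `tol`."""
    y = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-np.sum(((y - points) / h) ** 2, axis=1))  # kernel weights
        y_next = (w[:, None] * points).sum(axis=0) / w.sum()  # weighted mean
        if np.linalg.norm(y_next - y) < tol:                  # |m(y)| < tol
            return y_next
        y = y_next
    return y
```

In a full implementation every starting point is climbed this way, and the resulting modes are merged whenever they are closer than the bandwidth, to avoid duplicates.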
The window size h is task dependent and is selected manually, case by case, in order to minimize the estimator variance and find consistent results. With values of h that are too small, the mean shift vectors converge prematurely to nonexistent modes; with values that are too big, separate feature clusters are merged. A more advanced approach would be to implement an adaptive window size, iteratively updated e.g. by minimizing the Asymptotic Mean Integrated Squared Error (AMISE) cost function.

When the mean shift vector converges (within a small threshold), the last point of the basin of attraction is considered to be the optimum and is added to the collection of cluster modes (if it is not already present). After the last iteration, the resulting mode descriptors are tested and saddle points are pruned: a small Gaussian perturbation of zero mean and variance equal to h² is added to each result, which is discarded if the mean shift algorithm does not converge again to the same spot.

4. Clustering

Once we have obtained the optimum points with either of the two algorithms, we need to perform the clustering. It consists in partitioning the points of the image following these rules:

- each point is tested and assigned to the cluster whose centroid is the closest to it (in terms of the p-distance, see Eq. (1));

- P_i ⊆ X, ∀i;

- let P_1, ..., P_k be the clusters; then ∪_{i=1}^k P_i = X, X being the whole image;

- P_i ∩ P_j = ∅, i ≠ j.

The last two rules together mean that no point can stand outside a cluster and that each one belongs to only one set. Each pixel in the original image is assigned to the nearest cluster, computing (in O(n·k)) the argmin of the Euclidean distance between each pixel and each mode. Spatial distance and colour distance are already weighed by the aforementioned ratio λ, so another measure of geometric distance for clustering would be redundant.

Results differed heavily between simulations; this behavior is mainly due to the fact that the initial set of points is random. A better choice could be to split the image into blocks and pick a fixed number of random points inside each block, or to delegate the choice of starting points to an external entity, such as an upper layer or the user.

5. Results

Each algorithm has been tested using different values of λ. In particular, we tested the k-means and the mean shift algorithms with different values of K and bandwidth respectively, reporting all the resulting images in the tables in the following pages. As one would expect, with a smaller value of λ, segmentation appears to be more effective as the K parameter decreases, because all colors are uniformly clustered in a small number of regions. As λ increases instead, spatial coordinates become more relevant for the segmentation; in fact, for λ = 10 the plant and the vase, which are closer to the camera, tend to be fully segmented together, discarding all the information about their respective colors.

Finally, all the segmentation results produced by the mean shift algorithm have been grouped in Tables 3 and 4. As the reader can see, good results have been obtained by using λ = 1 and bandwidth = 0.25 or 0.1. The tests performed with bandwidth greater than 0.25 did not produce good image segmentation, generating an image with only a few clusters. This comes from the fact that a bandwidth set too large tends to erroneously include pixels that are not similar to the ones associated with the current cluster. Setting instead a higher λ, the algorithm tends to give more weight to the spatial coordinates, producing clusters based on the distance from the camera.

It is easy to note that for this image the clustering is performed better by mean shift, because the regions (the vase and the plant, for instance) are uniformly segmented. Furthermore, another advantage of mean shift over k-means is that there is no need to choose the number of clusters in advance: mean shift is likely to find only a few clusters if indeed only a small number exist. However, mean shift can be much slower than k-means, and it still requires the selection of a bandwidth parameter.
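The per-pixel assignment of Section 4 amounts to one argmin per pixel. A minimal sketch (the name `assign_pixels` is ours), assuming the feature vectors of Eq. (1) and the mode/centroid descriptors are given as arrays:

```python
import numpy as np

def assign_pixels(features, descriptors):
    """Assign every pixel to its nearest cluster descriptor in the
    p-distance of Eq. (1). The resulting label array induces a
    partition: every pixel gets exactly one cluster (O(n*k) distances)."""
    dist = np.linalg.norm(features[:, None, :] - descriptors[None, :, :], axis=2)
    return dist.argmin(axis=1)  # label of the nearest descriptor per pixel
```

Because the features already embed the λ-weighted spatial components, no extra geometric distance is needed here.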

[Image grid: rows λ = 0, 0.5, 1, 10; columns K = 5, 10, 15]

Table 1: Results showing the segmentation running k-means while varying the λ and K parameters.

[Image grid: rows λ = 0, 0.5, 1, 10; columns K = 5, 10, 15]

Table 2: Results showing the segmentation running k-means while varying the λ and K parameters.


[Image grid: rows bandwidth = 0.005, 0.01, 0.05, 0.1, 0.25; columns λ = 0, 0.5, 1, 10]

Table 3: Results showing the segmentation running mean shift while varying the λ and bandwidth parameters.


[Image grid: rows bandwidth = 0.005, 0.01, 0.05, 0.1, 0.25; columns λ = 0, 0.5, 1, 10]

Table 4: Results showing the segmentation running mean shift while varying the λ and bandwidth parameters.

References
[1] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: Analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, 2002.
[2] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002.
[3] C. Dal Mutto, P. Zanuttigh, and G. M. Cortelazzo, "Scene segmentation by color and depth information and its applications," University of Padova, 2010.
