

Robot Navigation in Dynamic Environments using Global-Appearance Descriptors. State of the Art.

Yerai Berenguer. Email: yerai.berenguer@goumh.umh.es. PhD program: Tecnologías Industriales y de Telecomunicación


Abstract—Map building and localization are two important abilities that autonomous mobile robots must develop. Much research has been carried out on these topics, and researchers have proposed many approaches to address these problems. This work presents a state-of-the-art report on map building and localization using global-appearance descriptors. In this approach, robots capture visual information from the environment and obtain, usually by means of a transformation, a global-appearance descriptor for each image. Using these descriptors, the robot is able to estimate its location in a previously built map, which is also composed of a set of global-appearance descriptors. Several previous investigations that have led to the approach of this research are summarized in this paper, such as studies that compare several methods of creating global-appearance descriptors. In these works we observe how the continuous optimization of the algorithms has led to better estimations of the robot position within the environment. Finally, a number of future directions on which researchers are currently working are listed. This work has been performed to gain access to the TECNIT doctoral program at the Miguel Hernández University of Elche.

Index Terms—omnidirectional images, global-appearance descriptors, map building, localization


1. Introduction

In recent years the presence of robots in our society has increased considerably. These robots are used to solve countless tasks that facilitate our work, and there are many types of robots with different functions. At first they were designed to replace people in hazardous work, but little by little they have started to be used to perform many other tasks that, due to their complexity or the precision they require, would be impossible for humans to carry out without the help of robots. Many of these robots are autonomous and do not require the help of a human operator to run, unless they suffer a breakdown or the software is incorrectly programmed and needs an update.

Additionally, if they are mobile, they need to collect some information from the surroundings and build a vision of the environment so they can perform autonomously the tasks they have been designed for. To do this, they are equipped with different sensors that allow them to extract the necessary information from the environment. These sensors can be of different types, such as touch sensors, encoders, laser or vision sensors. Using them, the robot must be capable of extracting the information needed for its correct operation, such as knowing what state it is in or knowing how to move around to arrive at a particular target point. Finally, an autonomous robot must be able to adapt its behavior to changing environments. The evolution of the techniques in these areas is documented in works such as [1], which shows the progress in map building, localization and navigation using computer vision up to the mid-1990s; [2] supplements this work with the evolution of the technology from the 1990s to the present.

The remainder of this paper is structured as follows. Section 2 introduces mapping and localization with vision. Section 3 presents map building with visual landmarks. Section 4 shows how map building is carried out using global appearance. Section 5 presents mapping and localization in dynamic environments. Section 6 presents previous projects. Finally, Section 7 outlines the conclusions and future lines of research.

January 23, 2015


2. Mapping and Localization Using Vision

To perform the mapping, localization and navigation of mobile robots it is necessary to use information perceived by the sensory systems of the robot. From the first works on mobile robots [3], the designed control systems have used, to a greater or lesser extent, environmental information collected by the robot. One common type of information collected from the environment is visual information.

This visual information may be obtained by different sensors, such as single cameras [4], stereo cameras [5], configurations with three cameras [6] and, in recent years, vision systems that obtain omnidirectional images [7], which provide a large amount of information with a field of view of 360 degrees around the robot, a high stability of the features that appear in the image and a low cost compared to other sensors (e.g. laser).

Many authors have studied the use of omnidirectional images in map building and localization tasks. Because of the large amount of information contained in these images, it is usually necessary to extract the most relevant part of it, so that this information is helpful to solve these tasks. The solutions are divided into two kinds depending on the method used to extract this information: solutions based on the extraction of local features and solutions based on global appearance. Approaches based on local feature extraction consist of extracting several points, landmarks or regions from each omnidirectional image [8], whereas approaches based on the global appearance of the scenes work with the images as a whole, without extracting any local information [9].

Focusing on the problem of mapping, research has traditionally followed two approaches: metric and topological. Nowadays the border between these approaches is somewhat diffuse and many maps are hybrid; these maps attempt to combine both types of information so that the map has the advantages of the two kinds of methods. Firstly, metric maps try to model the environment with geometric precision. These maps represent some features of the environment, with respect to a reference system, with a degree of uncertainty that can be estimated [10].
Secondly, topological maps are usually graphical models that collect information from several characteristic locations along the environment, together with the connectivity relationships between them, in a compact way [11].
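To make the global-appearance idea concrete, the following Python sketch builds a simple Fourier signature of a panoramic image. It is an illustration, not a descriptor taken from any of the cited works: the magnitude of the row-wise DFT is invariant to column shifts of a panoramic image, i.e. to pure rotations of the robot, which is one reason this family of descriptors suits omnidirectional vision.

```python
import numpy as np

def fourier_signature(panorama, k=16):
    """Global-appearance descriptor sketch: keep the magnitude of the
    first k Fourier coefficients of each row of a grayscale panorama.
    A rotation of the robot shifts the panorama columns, which changes
    only the phase of the DFT, so the magnitude is rotation-invariant."""
    coeffs = np.fft.fft(panorama.astype(float), axis=1)
    return np.abs(coeffs[:, :k])

# Toy panorama and a rotated (column-shifted) copy of it
rng = np.random.default_rng(0)
pano = rng.random((8, 64))
rotated = np.roll(pano, 20, axis=1)

d1 = fourier_signature(pano)
d2 = fourier_signature(rotated)
print(np.allclose(d1, d2))  # True: descriptor unaffected by rotation
```

The whole image contributes to every descriptor element, which is the defining contrast with the local-feature approaches described above.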


3. Map Building Using Visual Landmarks

The problem of SLAM (Simultaneous Localization And Mapping) is considered one of the most complex in the field of mobile robotics, since the mutual dependence between the map and the pose of the robot is a major drawback: an error in the calculation of the pose of the robot induces an error in the map, and vice versa. Until recently, visual SLAM applications have generally consisted of extracting visual landmarks. Several research groups have dedicated their efforts to the creation of visual maps based on this approach [12], [13], [14], [15], [16]. In all of these solutions the SLAM algorithm is based on estimating the position of the robot and the three-dimensional position of these visual points; therefore the size of the state to estimate grows quickly as the robot navigates, captures new scenes and adds landmarks to the map. Consequently, the estimation of 3D maps of visual landmarks requires large amounts of memory and computing resources because of the increasing number of variables to be estimated. In some works, the estimation of 3D maps was made possible using a team of robots [17].

To solve the computational problem described above, some variants of the general paradigm were proposed in the context of visual SLAM. For example, [18] does not estimate the position of any visual landmark. In this work the map is composed of a set of omnidirectional images and the poses from which they were captured. In contrast with the current general trend in the area of SLAM, the elements to estimate do not correspond to any physical element in the environment (e.g. a corner). In this case [18], each landmark is constituted by an omnidirectional image captured from a pose, together with a set of points of interest extracted from the image and their descriptors. Given a map image and an image captured by the robot at a given instant, the relative position and orientation between the two images can be calculated with the seven-point algorithm [19] from a set of correspondences between points of interest in both images.

Moreover, in [18] an architecture based on the extended Kalman filter was presented. It permitted the estimation of a visual map using a robot moving in a plane. The results showed that the computational cost and the amount of memory needed were greatly reduced with this new conception of the map. In [20], different techniques for height estimation are presented. To solve this problem, several omnidirectional images taken at different heights are compared. Each scene is described using a global-appearance descriptor. The authors make use of a set of previously captured


omnidirectional images (map), taken from different heights. When a test image arrives, its global-appearance descriptor is obtained and compared with the descriptors of the map images. As a result, the relative height of the test image can be estimated.
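In its simplest form, the comparison described above is a nearest-neighbour search over the map descriptors. A minimal Python sketch follows; the function name, the descriptor sizes and the distance metric are illustrative assumptions, not the exact procedure of [20]:

```python
import numpy as np

def localize(test_desc, map_descs):
    """Return the index of the map image whose global-appearance
    descriptor is closest (Euclidean distance) to the test descriptor,
    together with that distance."""
    dists = np.linalg.norm(map_descs - test_desc, axis=1)
    return int(np.argmin(dists)), float(dists.min())

# Hypothetical map of 5 poses, each summarized by a 32-element descriptor
rng = np.random.default_rng(1)
map_descs = rng.random((5, 32))
test_desc = map_descs[3] + 0.01 * rng.random(32)  # noisy view from pose 3

idx, dist = localize(test_desc, map_descs)
print(idx)  # 3
```

The returned distance also gives a crude confidence measure: a large minimum distance suggests the robot is far from every mapped pose.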


4. Map Building Using Global Appearance

Methods based on the global appearance of the scenes constitute a robust alternative when creating topological maps and estimating the pose of a mobile robot within these maps, compared with the methods based on the extraction of local points or landmarks. Their strength lies in their ability to represent the environment through high-level features that can be interpreted and handled easily. Previous studies demonstrate the viability of using this type of descriptors for the creation of dense topological maps in indoor environments [21], [22], and a posterior localization on these maps using a probabilistic approach [23], [24]. Many comparative studies have been carried out between different global description methods. Thanks to these studies, the parameters of the methods have been tuned to optimize the computational cost and accuracy of the mapping and localization processes [25]. Several global-appearance descriptors are described in [26], such as the Discrete Fourier Transform (DFT), Principal Components Analysis (PCA), the Histogram of Oriented Gradients (HOG) and Gist descriptors.

The performance of methods based on global appearance and methods based on feature extraction has also been studied on equal terms to estimate visual odometry [27], [28]. Despite its relative simplicity, the application of techniques based on global appearance presents several unresolved challenges when used in real, large environments. First, we must work with optimal descriptors that contain most of the information of the scenes in the smallest possible size, as in [29], which uses a panoramic Gist descriptor for that purpose, and [30], which presents a study where the visual information is reduced by several orders of magnitude, improving the computational cost at the expense of losing precision. Second, we must create highly compact maps, trying to store the minimum information that subsequently permits a precise robot localization at intermediate points and the path planning needed to take the robot to the desired destination points within the map, both in the case of dense maps and in the case of compact maps.
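As a sketch of how such compact representations might be obtained, the following Python fragment projects a set of map descriptors onto their first principal components (PCA, one of the descriptors mentioned above); the numbers of images, descriptor elements and retained components are illustrative assumptions, not values from the cited studies:

```python
import numpy as np

def pca_compress(descriptors, n_components):
    """Project a set of map descriptors onto their first principal
    components to obtain compact representations."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    # SVD of the centered data: the rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, basis, mean

rng = np.random.default_rng(2)
descs = rng.random((20, 64))           # 20 map images, 64-element descriptors
compact, basis, mean = pca_compress(descs, 8)
print(compact.shape)  # (20, 8)
```

Localization can then be performed in the reduced space by projecting each test descriptor with the same basis and mean, trading some precision for memory and speed.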

We must also provide the highest possible robustness to the maps created. Some researchers have begun working on this subject using landmarks, and have tried to solve the problem by recording the stability of the different characteristics of each node along time and extracting the most stable ones [31], or by updating the information of each node, incrementally adding the information that is changing [32]. It is of great interest to broach the problem using global-appearance information and to surpass the concept of "dense static map" that has characterized these appearance-based approaches so far, because of the high computational cost involved and the problems posed to long-term localization by changes in lighting, occlusions, moving objects, etc. It is also interesting to study how localization with 6 degrees of freedom can be performed using only the information from the images captured by an omnidirectional camera, that is, estimating the position and orientation of the mobile robot in space [33], [34], [35].


5. Mapping and Localization in Dynamic Environments

Exploration algorithms allow robots to explore an unknown environment optimally. They are usually designed to make robots return to known positions in order to reduce the uncertainty in their location. Some of the problems found in dynamic environments are moving objects (e.g. people), changes in lighting conditions, etc. These problems require a continuous update of the map; if the environment is large, this update process will be very costly.

Some researchers have tried to approach the exploration problem with algorithms based on potential fields, which are able to direct the process of exploration by a team of robots. One of the main drawbacks of these techniques is the high probability of appearance of local minima where the robots can get trapped. Thus, in [36] a hybrid algorithm is presented, based on several low-level reactive behaviors and the ability to detect any occurrence of local minima in the potential field. Finally, [37] presents a comparison of the main exploration algorithms existing so far. One of the still open problems in this area is to extend these algorithms to the case where there are moving objects or persons in the environment while the robots build the map. These changes in the environment can be short-term changes or long-term changes.

Short-term changes appear when other mobile agents (e.g. people), whose movement is quick compared to that of the robot, are present in the environment. This affects both the mapping and the localization processes. It may be necessary to monitor the dynamic obstacles and predict their movements in order to plan paths that do not interfere with them. One typical solution to this problem consists of identifying the mobile agents in the readings of visual or laser sensors [38] and monitoring them, separating them from the static part of the environment. With the dynamic part, the robot can build a dynamic map that permits predicting the movements of the mobile agents before path planning [39]. To make a good prediction of the movements, it is interesting that the robot has the ability to learn typical motion patterns [40], depending on the type of mobile agent detected.

Long-term changes include the occasional changes in the position of objects such as chairs, tables, etc. These changes may pose a problem for the localization on a previously created map, because the current appearance of the environment may be substantially different [41]. [42] proposes adding new information to the map to take into account long-term changes that have occurred in the environment. Some long-term changes in the environment can pose a challenge not only in terms of SLAM but also in the planning of trajectories. These cases appear when such changes affect not only the appearance but also the topology of the environment [43].
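The Markov-chain prediction of mobile-agent movements, in the spirit of [39], can be sketched in Python as follows; the three-cell corridor and the transition probabilities are invented for illustration, whereas [39] learns such models from sensor data:

```python
import numpy as np

# Hypothetical 3-cell corridor; the transition matrix encodes a learned
# motion pattern of a dynamic agent (rows: current cell, cols: next cell)
T = np.array([[0.2, 0.8, 0.0],
              [0.1, 0.3, 0.6],
              [0.0, 0.2, 0.8]])

belief = np.array([1.0, 0.0, 0.0])  # agent currently observed in cell 0

def predict(belief, T, steps):
    """Propagate the belief over the agent's position through the
    Markov chain for a number of time steps."""
    for _ in range(steps):
        belief = belief @ T
    return belief

print(predict(belief, T, 2))  # probability mass drifts toward cell 2
```

A path planner can then penalize cells where the predicted occupancy probability is high, instead of treating the agent as a static obstacle at its last observed position.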


6. Previous Projects

The first contributions of our research team in this area arose within the DPI2007-61197 project. In the framework of this project many developments were carried out, which allowed us to achieve excellent results in the creation of visual maps. A comparative study of several landmark detection and visual description methods was made from the viewpoint of visual SLAM. This study demonstrated that the right combination of detector and descriptor is a critical factor to obtain accurate visual maps. These studies permitted the creation of three-dimensional maps of the environment using one or more robots.

After that, in the DPI2010-15308 project, a SLAM architecture was developed to create a map of the environment from omnidirectional images while the robot moved along a path contained in a plane. Another line of research, conducted simultaneously with the two aforementioned projects, was the use of global-appearance descriptors of the scenes, as opposed to the extraction and description of a number of landmarks. The results demonstrate that this alternative presents great robustness when creating topological maps and localizing a mobile robot within these maps. Its strength lies in the possibility of representing the environment through high-level features, which can be interpreted and handled easily.


7. Conclusions and Future Research

The problems and improvements mentioned throughout this document open a whole field of research that can be structured in several lines, some of which are already under development in our research group. Among them, we consider the following most relevant:

1. It would be interesting to include the information of the color channels when working with images obtained from the environment. It is possible that these channels provide richer information to the descriptor and permit improving the construction process of the hierarchical map and the subsequent localization, even under severe lighting changes.

2. The properties of the global-appearance algorithms have allowed us to develop algorithms for mapping and localization; thanks to them, the robot is able to build a model which can be used to estimate its position and orientation in the ground plane. However, we believe these descriptors also contain sufficient information to estimate changes in the position and orientation of the robot in space (with 6 degrees of freedom). For this reason, another line of future research would be to study how such movements affect the descriptors, in order to design an algorithm that estimates the position and orientation of the robot in space from the visual information. These algorithms would be applicable to localization in rough terrain and to the control of UAVs (Unmanned Aerial Vehicles).

3. A natural evolution of the mapping and hierarchical localization algorithms would be the development of methods for path planning and visual control. From the information contained in the map, knowing the current position of the robot and the destination points, we should develop: a) a deliberative algorithm that is responsible for calculating the optimal path, and b) a reactive algorithm that is able to avoid collisions with obstacles and to know if the robot is fulfilling the task correctly, using only the visual information to solve these tasks. This future work would aim to solve the problem of topological visual control.


References

[1] G. N. De Souza and A. C. Kak, "Vision for mobile robot navigation: a survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 237-267, 2002.
[2] F. Bonin-Font, A. Ortiz and G. Oliver, "Visual navigation for mobile robots: a survey", Journal of Intelligent and Robotic Systems, vol. 53, pp. 263-296, 2008.
[3] G. Giralt, R. Sobek and R. Chatila, "A multi-level planning and navigation system for a mobile robot; a first approach to Hilare", in Proc. of the 6th International Joint Conference on Artificial Intelligence, pp. 335-337, Tokyo, Japan, 1979. Morgan Kaufmann.
[4] R. Sim and G. Dudek, "Effective exploration strategies for the construction of visual maps", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), vol. 3, pp. 3224-3231, Las Vegas, NV, October 2003. IEEE Press.
[5] R. Sim and J. J. Little, "Autonomous vision-based exploration and mapping using hybrid maps and Rao-Blackwellised particle filters", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 2082-2089, Beijing, 2006. IEEE Press.
[6] N. Ayache and F. Lustman, "Trinocular stereo vision for robotics", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13(1), pp. 73-85, 1991.
[7] N. Winters, J. Gaspar, G. Lacey and J. Santos-Victor, "Omnidirectional vision for robot navigation", in Proc. of the IEEE Workshop on Omnidirectional Vision, pp. 21-28, 2000.
[8] E. Trucco and K. Plakas, "Video tracking: a concise survey", IEEE Journal of Oceanic Engineering, vol. 31(2), pp. 520-529.
[9] L. Payá, Técnicas de descripción de la apariencia global de escenas: aplicación a la creación de mapas y localización de robots móviles. PhD thesis, Miguel Hernández University, Elche, Spain, 2014.
[10] H. P. Moravec, "Sensor fusion in certainty grids for mobile robots", AI Magazine, pp. 61-74, 1988.
[11] F. Werner, F. Maire and J. Sitte, "Topological SLAM using fast vision techniques", in Proc. of the FIRA RoboWorld Congress 2009 on Advances in Robotics, pp. 187-196, Berlin, Heidelberg, 2009. Springer-Verlag.
[12] J. Valls-Miró, G. Dissanayake and W. Zhou, "Vision-based SLAM using natural features in indoor environments", in Proc. of the 2005 IEEE Int. Conf. on Intelligent Networks, Sensor Networks and Information Processing, pp. 151-156, 2005.
[13] J. Valls-Miró, G. Dissanayake and W. Zhou, "Towards vision-based navigation in large indoor environments", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Beijing, China, 2006.
[14] A. J. Davison and D. W. Murray, "Simultaneous localisation and map-building using active vision", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
[15] A. J. Davison, "Real-time simultaneous localisation and mapping with a single camera", in Proc. of the Int. Conf. on Computer Vision (ICCV), 2003.
[16] A. Gil, O. Reinoso, O. Martínez Mozos, C. Stachniss and W. Burgard, "Improving data association in vision-based SLAM", in Proc. of the 2006 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Beijing, China, 2006.
[17] A. Gil, O. Reinoso, M. Ballesta and M. Juliá, "Multi-robot visual SLAM using a Rao-Blackwellized particle filter", Robotics and Autonomous Systems, vol. 58(1), pp. 68-80, 2009.
[18] D. Valiente, A. Gil, F. Amorós and O. Reinoso, "SLAM of view-based maps using SGD", in Proc. of the 10th Int. Conf. on Informatics in Control, Automation and Robotics (ICINCO), vol. 2, pp. 329-336, Reykjavik, Iceland, 29-31 July 2013.
[19] D. Scaramuzza, F. Fraundorfer and R. Siegwart, "Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC", in Proc. of the ICRA, Kobe, Japan, 2009.
[20] F. Amorós, L. Payá, D. Valiente, L. M. Jiménez and O. Reinoso, "Estimación de la altura en aplicaciones de navegación topológicas mediante apariencia global de información visual", Jornadas de Automática, 3-5 Sept., Valencia, Spain, 2014.
[21] L. Payá, L. Fernández, A. Gil and O. Reinoso, "Map building and Monte Carlo localization using global appearance of omnidirectional images", Sensors, vol. 10(12), pp. 11468-11497, 2010.
[22] L. Fernández, L. Payá, O. Reinoso and A. Gil, "Comparison of appearance-based topological mapping methods", Journal of Artificial Intelligence: Theory and Applications, vol. 1(2), pp. 38-47, 2010.
[23] L. Fernández, A. Gil, L. Payá and O. Reinoso, "An evaluation of weighting methods for appearance-based Monte Carlo localization using omnidirectional images", in Proc. of the 2nd Workshop on Omnidirectional Robot Vision, IEEE International Conference on Robotics and Automation (ICRA), pp. 13-18, Anchorage, Alaska, 2010.
[24] L. Fernández, L. Payá, D. Valiente and A. Gil, "Monte Carlo localization using the global appearance of omnidirectional images", in Proc. of the Int. Conf. on Informatics in Control, Automation and Robotics (ICINCO), pp. 439-442, Rome, Italy.
[25] F. Amorós, L. Payá, O. Reinoso and L. Fernández, "Map building and localization using global-appearance descriptors applied to panoramic images", Journal of Computer Information Technology, vol. 2(1), pp. 55-71, 2012.
[26] L. Payá, F. Amorós, L. Fernández and O. Reinoso, "Performance of global-appearance descriptors in map building and localization using omnidirectional vision", Sensors, vol. 14, pp. 3033-3064, 2014.
[27] D. Valiente, L. Fernández, A. Gil, L. Payá and O. Reinoso, "Visual odometry through appearance- and feature-based method with omnidirectional images", Journal of Robotics, 2012.
[28] F. Amorós, L. Payá, O. Reinoso, W. Mayol-Cuevas and A. Calway, "Topological map building and path estimation using global-appearance image descriptors", in Proc. of the Int. Conf. on Informatics in Control, Automation and Robotics (ICINCO), pp. 385-392, Reykjavik, Iceland, 2013.
[29] A. C. Murillo, G. Singh, J. Kosecka and J. J. Guerrero, "Localization in urban environments using a panoramic Gist descriptor", IEEE Transactions on Robotics, vol. 29(1), pp. 146-160.
[30] M. Milford, "Visual route recognition with a handful of bits", in Proc. of Robotics: Science and Systems VIII, Sydney, Australia.
[31] B. Bacca, J. Salvi and X. Cufí, "Appearance-based mapping and localization for mobile robots using a feature stability histogram", Robotics and Autonomous Systems, vol. 59, pp. 285-295, 2011.
[32] F. Dayoub, G. Cielniak and T. Duckett, "Long-term experiments with an adaptive spherical view representation for navigation in changing environments", Robotics and Autonomous Systems, vol. 59, pp. 285-289, 2011.
[33] B. Huhle, T. Schairer, A. Schilling and W. Strasser, "Learning to localize with Gaussian process regression on omnidirectional image data", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 5208-5213, Taipei, 2010.
[34] T. Schairer, B. Huhle, P. Vorst, A. Schilling and W. Strasser, "Visual mapping with uncertainty for correspondence-free localization using Gaussian process regression", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 4229-4235, San Francisco, California, 2011.
[35] F. Amorós, L. Payá, O. Reinoso, W. Mayol-Cuevas and A. Calway, "Global appearance applied to visual map building and path estimation using multiscale analysis", Mathematical Problems in Engineering, vol. 2014, Article ID 365417, Hindawi.
[36] M. Juliá, O. Reinoso, A. Gil, M. Ballesta and L. Payá, "A hybrid solution to the multi-robot integrated exploration problem", Engineering Applications of Artificial Intelligence, vol. 23(4), pp. 473-486, 2010.
[37] M. Juliá, A. Gil and O. Reinoso, "A comparison of path planning strategies for autonomous exploration and mapping of unknown environments", Autonomous Robots, vol. 33(4), pp. 427-444, 2012.
[38] K. O. Arras, O. Martínez Mozos and W. Burgard, "Using boosted features for the detection of people in 2D range data", in Proc. of the IEEE International Conference on Robotics and Automation, pp. 3402-3407, 2007.
[39] F. Rohrmüller, M. Althoff, D. Wollherr and M. Buss, "Probabilistic mapping of dynamic obstacles using Markov chains for replanning in dynamic environments", in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
[40] M. Bennewitz, W. Burgard, G. Cielniak and S. Thrun, "Learning motion patterns of people for compliant robot motion", The International Journal of Robotics Research, vol. 24(1), pp. 31-48, 2005.
[41] F. Dayoub and T. Duckett, "An adaptive appearance-based map for long-term topological localization of mobile robots", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 3364-3369, 2008.
[42] N. Mitsou and C. S. Tzafestas, "Temporal occupancy grid for mobile robot dynamic environment mapping", in Proc. of the Mediterranean Conference on Control & Automation (MED'07), IEEE, 2007.
[43] A. Walcott-Bryant, M. Kaess, H. Johannsson and J. J. Leonard, "Dynamic pose graph SLAM: long-term mapping in low dynamic environments", in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 1871-1878, 2012.