Exploration and Implementation of a Next Generation Telepresence System


Ramachandra Budihal, Navaneeth Mohanan, Sahil A. Anand and Saish Satish Kamat
Abstract: Human communication includes not only spoken language but also non-verbal cues such as hand and body gestures and facial expressions, which we use to communicate our thoughts and feelings and to gather feedback. Today's telepresence systems use two-way audio and video transmission to carry this non-verbal information. In this paper, we introduce a novel Experiential Telepresence System, which possesses cognitive intelligence and is also context-aware, i.e., it is aware of the multiple components of communication, both verbal and non-verbal, and of the ambience in which it communicates, making the telepresence experience far more immersive than that of its peers. This is achieved using a 3-tier architecture comprising a Humanoid Robot, a Cognitive Collective Intelligence Platform on the Cloud and an Experience Centre. Towards the end, a performance analysis coupled with a qualitative analysis of user perception, which in other words measures the Quality of Experience of the system, shows that the acceptability and user experience of our system are far higher than those of traditional telepresence and video conferencing.
Figure 1. Three tools for communication

Index Terms: Experiential Telepresence, cognition, augmented reality, context-awareness, humanoid robot, Affective Interfaces, Tele-operation, Collective Intelligence on Cloud, SLAM, Cloud Robotics, Quality of Experience (QoE, QoX), Quality of Service (QoS), User Experience (UX)

I. INTRODUCTION

Intrinsically, human communication can be broken down into verbal and non-verbal components. Face-to-face communication (Fig. 1) is considered one of the most effective forms of communication, as it propagates both components without restriction [1]. When it comes to long-distance communication, traditional channels like letters and telephones lack the latter component. This gave birth to telepresence systems. These systems ensure that non-verbal communication between individuals does not get hindered by the limitations of the channel between them (Fig. 1). Different implementations of telepresence systems have approached this problem in multiple ways. Companies like Cisco Systems have tackled it by launching products such as Cisco TelePresence in 2006 [2]. Many more companies, like Anybots [3], VGo Communications [4] and Gostai [5], have ventured down the path of teleoperated robots in order to add an element of user interactivity to telepresence.

However, in most current implementations, the channel of communication is a means of transmitting mostly audio and visual data over to a recipient, and that data is interpreted by the recipient himself. Our research focuses on making the channel intelligent, so that it is aware of the multiple components of communication it is transmitting and receiving. This brings us to the concept of Experiential Telepresence. In an Experiential Telepresence System, extra knowledge gathered from diverse sensing systems (sensors + smart apps) is available to the intelligent channel. This extra knowledge is augmented on top of a standard video and audio feed to convey more information than the previously mentioned telepresence systems do (Fig. 1). Currently, our channel is able to interpret emotions, detect faces, recognize speakers and gather environmental information. The idea behind this Experiential Telepresence System originated from a talk presented by Budihal and a team consisting of other authors of this paper at a TED conference in Mysore in 2009. The talk introduced a new model for heritage tourism called E3iT: Engage, Entertain, Educate, immerse and Transform [6]. The model stresses the need for an immersive experience in order to convey the story and history behind a heritage site. In this paper, we discuss the overall architecture and implementation of our Experiential Telepresence System, along with a comparison against a few of its commercial counterparts. Towards the end of the paper, we briefly mention the application areas of our telepresence system.

Ramachandra Budihal is with Wipro Technologies, Bangalore, India, e-mail: rama.budihal@wipro.com. Navaneeth Mohanan is with India Innovation Labs, Bangalore, India, e-mail: navaneeth.mohan@indiainnovationlabs.in. Sahil A. Anand is with India Innovation Labs, Bangalore, India, e-mail: sahil.yousif@indiainnovationlabs.in. Saish Satish Kamat is with India Innovation Labs, Bangalore, India, e-mail: saish.kamat@indiainnovationlabs.in.

II. THE EXPERIENTIAL TELEPRESENCE SYSTEM

A. Overview: Our Experiential Telepresence System is a 3-tier architecture consisting of PRATHAM (a humanoid robot), a Collective Intelligence Platform on the Cloud and an Experience Centre. All three components are connected via the Internet (Fig. 2). The information and knowledge gathered by multiple intelligent agents/systems, which include the humanoid robot, are the primary knowledge-generating sources. This shared knowledge is made available as crowd intelligence by the Collective Intelligence Platform, a knowledge portal responsible for assimilating and disseminating knowledge from multiple robots on a real-time basis. The knowledge generated by the robot is transmitted across to the Experience Centre and is responsible for creating context-awareness in the information delivered. This forms the basis of Cloud Robotics. Cognition and context-awareness is one of the key differentiating features of our Experiential Telepresence System, and it is built in at various levels. At the lowest level, the system is aware of the available network bandwidth and is thus able to scale the level of immersiveness up or down in order to maintain optimal performance (a sketch of such a policy follows this overview). The system is also able to recognize people in its environment using facial recognition, gather specific information such as age and profession through social networking sites, and deliver content in the view most suitable to that person. This gives the user, who has created his/her avatar in the humanoid robot and is connected to the Experience Centre, a more immersive experience. The following sections describe PRATHAM and the Experience Centre.
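To make the bandwidth-aware scaling of immersiveness concrete, the following is a minimal sketch of the kind of tiered policy such a system could apply. The tier contents, thresholds and the feature names are our own illustrative assumptions, not the actual PRATHAM implementation.

```python
# Sketch of bandwidth-aware immersiveness scaling (illustrative only).
# Thresholds and tier contents are assumptions, not PRATHAM's actual policy.

# Feature tiers, from most to least immersive; first field is the minimum
# sustained bandwidth (kbps) the tier is assumed to need.
TIERS = [
    (1500, {"video": "640x480", "audio": True, "ar_overlays": True,  "head_tracking": True}),
    (600,  {"video": "320x240", "audio": True, "ar_overlays": True,  "head_tracking": True}),
    (250,  {"video": "320x240", "audio": True, "ar_overlays": False, "head_tracking": False}),
    (0,    {"video": None,      "audio": True, "ar_overlays": False, "head_tracking": False}),
]

def select_tier(bandwidth_kbps: float) -> dict:
    """Pick the most immersive feature set the measured bandwidth can sustain."""
    for threshold, features in TIERS:
        if bandwidth_kbps >= threshold:
            return features
    return TIERS[-1][1]

if __name__ == "__main__":
    for bw in (2000, 400, 100):
        print(bw, "kbps ->", select_tier(bw))
```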

Figure 3. Anatomy of PRATHAM

Figure 4. Hypothesized emotional response of human subjects following Mori's statements

Figure 2. The Experiential Telepresence System

B. PRATHAM - a Humanoid Robot: PRATHAM stands for Personal Robot And Telepresence Humanoid with Autonomous Mobility. Taking a cue from the popular hypothesis of the Uncanny Valley [7], [8], [9] (Fig. 4), we decided to make PRATHAM a humanoid robot, thus maintaining a social and emotional connection with the people it interacts with.

The humanoid robot itself consists of a three-layered architecture (Fig. 5) that generates all the necessary knowledge primitives before transmitting them to the Experience Centre.

1) Hardware Layer: At the lowest level, the robot consists of a system of sensors and actuators. Sensors are broadly divided into four types: position, navigation, visual and auditory. Position sensors include a GPS and a compass. Navigation sensors comprise a laser SLAM (Simultaneous Localization and Mapping) module and ultrasound sensors. Visual sensors include a combination of a high-resolution camera and a depth-sensing camera. Finally, the auditory sensors are a 6-channel microphone system used for sound analysis. The robot also has two actuators: a mobility platform and a 6DOF head motor system. The mobility platform is a 3-wheeled system comprising two feedback-enabled DC motors that provide differential drive, plus a caster. The 6DOF head motor system is a combination of 3 servo motors connected orthogonally to each other. Together, the two actuators allow a remote user to move the base and the head of the robot.

2) Middleware Layer: The middleware forms the basic software platform, consisting of ROS (Robot Operating System), a hardware abstraction layer, and Ubuntu Linux as our OS. ROS Diamondback is the subsystem used by all our higher-level software modules and by our hardware abstraction layer. ROS allows development of modules in a graph architecture, where each module forms a node of the graph and communication between these nodes takes place through a publish-subscribe or a service (request-reply) methodology. The hardware abstraction layer is a set of drivers written for each hardware module. Each driver, written in ROS, is the entry point for the hardware into the ROS subsystem. The driver also does the necessary semantic conversion of data to and from the hardware, depending on the type and make of the hardware. Ubuntu Linux 10.10 was chosen as the OS, keeping in mind compatibility with ROS Diamondback.
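As an illustration of this publish-subscribe middleware and of the differential-drive base it abstracts, here is a minimal sketch of a ROS node in Python. The topic names, wheel geometry and driver interface are illustrative assumptions rather than PRATHAM's actual code.

```python
#!/usr/bin/env python
# Sketch of a ROS node for a differential-drive base (illustrative only).
# Topic names and wheel geometry are assumptions, not PRATHAM's actual driver.
import rospy
from geometry_msgs.msg import Twist
from std_msgs.msg import Float64

WHEEL_RADIUS = 0.05   # metres (assumed)
WHEEL_BASE = 0.30     # distance between the two drive wheels, metres (assumed)

class DiffDriveNode(object):
    def __init__(self):
        # Publishers feeding the two feedback-enabled DC motors.
        # (queue_size is required by modern rospy; older releases omit it.)
        self.left_pub = rospy.Publisher("left_wheel/velocity", Float64, queue_size=10)
        self.right_pub = rospy.Publisher("right_wheel/velocity", Float64, queue_size=10)
        # Subscribe to velocity commands from the teleoperation/navigation stack.
        rospy.Subscriber("cmd_vel", Twist, self.on_cmd_vel)

    def on_cmd_vel(self, msg):
        # Standard differential-drive kinematics: each wheel's linear speed is
        # the commanded linear velocity -/+ half the wheel base times the
        # commanded angular velocity.
        v_left = msg.linear.x - 0.5 * WHEEL_BASE * msg.angular.z
        v_right = msg.linear.x + 0.5 * WHEEL_BASE * msg.angular.z
        # Convert linear wheel speed (m/s) to wheel angular speed (rad/s).
        self.left_pub.publish(Float64(v_left / WHEEL_RADIUS))
        self.right_pub.publish(Float64(v_right / WHEEL_RADIUS))

if __name__ == "__main__":
    rospy.init_node("diff_drive_base")
    DiffDriveNode()
    rospy.spin()
```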

3) Application Layer: The application layer implements the high-level logic of Experiential Telepresence on the robot. It consists of three subsystems: a video encoder, the Experiential Telepresence Stack and a navigation stack.

a) Video Encoder: Video streaming from the robot is a point-to-point transmission. We have used an open-source H.264 encoder called x264 for streaming video at a resolution of 640x480, which ensures high-quality video streaming over the Internet. The high-resolution camera captures the scene the robot is able to see. This camera is placed exactly in the centre between its two emotion eyes, which gives perhaps the first eye-to-eye contact between the user who has created an avatar in this robot and the subject/person interacting with the robot; this is a very critical and important part of the QoE measure of the communication/interaction.

b) Experiential Telepresence Stack: The Experiential Telepresence Stack is the source of several knowledge primitives that get fused together at the Experience Centre. In this version of the stack we have implemented facial recognition, emotion recognition and synthesis, sound localization and gesture recognition. The facial recognition primitive recognizes multiple faces through the robot's camera. Emotion recognition can recognize the six basic emotions [10], while emotion synthesis uses expression LEDs on the face to show emotions. Sound localization uses the six-microphone array to localize the source of a speaker (a sketch of the underlying principle follows this subsection), and gesture recognition uses the depth camera to interpret basic human gestures.

c) Navigation Stack: The navigation stack handles the control aspects of the humanoid robot. There are two modes of Experiential Telepresence: manual and autonomous. The two modes are explained in detail in the following section.
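To indicate how sound localization with a microphone pair can work in principle, here is a minimal sketch of time-difference-of-arrival (TDOA) estimation via GCC-PHAT. The mic spacing and function names are our illustrative assumptions; PRATHAM's actual six-microphone implementation may differ.

```python
# Sketch of TDOA-based sound localization for one microphone pair
# (illustrative; PRATHAM's six-microphone implementation may differ).
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    r = SIG * np.conj(REF)
    # Phase transform: keep only phase information for a sharper peak.
    cc = np.fft.irfft(r / (np.abs(r) + 1e-12), n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(fs)

def bearing_deg(tau, mic_distance, c=343.0):
    """Far-field bearing of the source, from the TDOA of one mic pair."""
    return np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0)))

# Usage (10 cm mic spacing assumed):
# tau = gcc_phat(mic_left, mic_right, fs=16000, max_tau=0.1 / 343.0)
# angle = bearing_deg(tau, mic_distance=0.1)
```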

Figure 5. PRATHAM's Architecture

C. Experience Centre: As mentioned earlier, a user of this system logs on to the Experience Centre in order to experience a remote location. Fig. 6 shows a user at our Experience Centre. The Experience Centre fuses the data from the various perception primitives of the robot and currently displays it using augmented reality [11]. It consists of specific external aids and a clean visual user interface.
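To hint at how such fusion can be rendered, the following OpenCV sketch overlays recognized-face labels and robot telemetry on a video frame. The input data structures and labels are hypothetical; the paper does not describe the Experience Centre's actual rendering code.

```python
# Sketch of augmenting robot knowledge onto the video feed (illustrative only).
# The face/telemetry inputs are hypothetical, not the Experience Centre's API.
import cv2

def augment_frame(frame, faces, telemetry):
    """Draw face labels and a telemetry line onto a BGR video frame in place."""
    for (x, y, w, h, name) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, name, (x, y - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    # Telemetry (e.g. GPS, temperature, wind) in the lower corner, as in our UI.
    cv2.putText(frame, telemetry, (10, frame.shape[0] - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return frame

# Usage:
# frame = augment_frame(frame, [(120, 80, 64, 64, "Alice")],
#                       "GPS 12.97N 77.59E | 29 C | wind 8 km/h NE")
```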

Figure 6. A user at our Experience Centre.

1) External Aids: To build this fully immersive experience, we found the use of just a desktop monitor and a mouse to be insufficient. In order to make the user oblivious to his current environment and to immerse him in the remote location, we used a head gear (Fig. 6) that displays the perception of the robot. The head-tracking sensors on the head gear detect the user's head orientation, and the robot then mimics it using its 6DOF head motor system (a sketch of this mapping follows). In manual navigation mode, a joystick (Fig. 7) is used to navigate the humanoid robot from the Experience Centre. In addition, the robot is fitted with obstacle detection sensors to provide navigation assistance by overriding the user's control in case of an emergency. The PRATHAM system also has a feature that provides a guided tour to its user [12]: by clicking a location on the map provided, the user makes the robot autonomously navigate to that location using either the laser range-finder or the depth camera.
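The following sketch indicates how head-tracker orientation could be mapped onto the head servos; the angle limits and the send function are our illustrative assumptions, not PRATHAM's actual interface.

```python
# Sketch of mapping head-tracker orientation onto PRATHAM's head servos
# (illustrative; servo limits and the send function are assumptions).

# Assumed mechanical limits of the three orthogonally mounted servos, degrees.
LIMITS = {"yaw": (-90.0, 90.0), "pitch": (-45.0, 45.0), "roll": (-30.0, 30.0)}

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

def head_pose_to_servo_angles(yaw, pitch, roll):
    """Clamp the user's head orientation to angles the head servos can reach."""
    return {axis: clamp(angle, *LIMITS[axis])
            for axis, angle in (("yaw", yaw), ("pitch", pitch), ("roll", roll))}

# In the real system these angles would be sent to the robot over the network,
# e.g. on a ROS topic; send_servo_angles() here is a hypothetical stand-in.
def send_servo_angles(angles):
    print("servo command:", angles)

if __name__ == "__main__":
    # A yaw of 120 degrees gets clamped to the servo's 90-degree limit.
    send_servo_angles(head_pose_to_servo_angles(yaw=120.0, pitch=-10.0, roll=5.0))
```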

Figure 7. Joystick (left), Vuzix iWear VR920 (right)

Figure 9. PRATHAM in an outdoor environment

Figure 8. User Interface Design

2) Visual User Interface: The visual user interface (Fig. 8) performs the task of fusing the additional knowledge originating from the robot. It augments this new knowledge on top of the video feed from the robot's camera to help the user perceive the environment better. At the lower right corner of our UI we show GPS information, which gives the current position and bearing of the robot on a map. At the lower left corner we have navigation assistance controls: as the user nears an obstacle, the navigation assistance warns the user of the direction of the obstacle so that the necessary actions may be taken to avoid it. The user interface also shows the temperature, wind speed and wind direction at PRATHAM's location. In addition, PRATHAM's facial recognition system augments information about the people it sees through the camera, and PRATHAM identifies buildings and structures based on the GPS location and augments information about them.

III. RESULTS

After 8 months of development, PRATHAM was successfully demonstrated at several locations (Fig. 9).

A. Methodology of Measurement: Conventionally, most system benchmarks and measurements were done by subject-matter experts from engineering, and invariably they were concerned with network performance and Quality of Service (system uptime, MTBF, jitter, packet loss, BER, etc., were some of the key measurements). Business executives then started talking about average revenue per user and customer addition and attrition parameters, implementing service mechanisms such as SLAs in the communication and information management systems they managed.

Today, more analysis is sought from the user's perspective. The famous saying by D. R. Scoggin, "The only way to know how customers see your business is to look at it through their eyes", perhaps prepares the ground for involving psychologists and human-behaviour experts to add value in terms of a measure called Quality of Experience (QoE, QoX). Our evaluation therefore has two parts. The first is the normal engineering perspective of measuring system performance and benchmarking, which serves mostly as an objective measure; this alone, however, does not suffice for customer satisfaction. Customer satisfaction derives largely from the user's/customer's perception, which in turn is driven by the overall experience perceived after exposure to the system. This is a purely subjective measure, expressed on the basis of feelings: a measure of the overall value people perceive in a product or concept (a classical example is the success of Apple's iPod against similar products that existed even before the iPod entered the market, where the UX and design elements of the product provided major subjective gains over and above other system innovations). This forms the second part of the evaluation.

Quality of Experience has had many definitions: Wikipedia states it as "a subjective measure of a customer's experiences with a vendor" [13], while K. Kilkki defines it as the "basic character or nature of direct personal participation or observation" [14]; he further breaks it into multiple measures from different user perspectives and brings in the relationship with Quality of Service. Fig. 10, taken from [14], defines it in terms of the components of a communication ecosystem.

B. Performance Analysis:

1) PRATHAM's Benchmark Specifications: Table I shows the benchmark specifications that have emerged after approximately 200 hours of testing.

2) Comparison Against Peers: In order to help position our system with respect to similar systems, Table II shows a comparison of PRATHAM against three popular commercial telepresence systems: QB by Anybots [3], VGo by VGo Communications [4] and Jazz by Gostai [5].

Table II. Comparison of PRATHAM against the QB, VGo and Jazz robots

Figure 10. Key components of a measurement in a communication ecosystem

Table I. PRATHAM's Benchmark Specifications

Table III. Survey Sample Distribution

Table IV. Survey Results

C. Qualitative Analysis of User Perception: A qualitative survey was performed during a workshop and presentation of the Experiential Telepresence System. A total of 40 people participated in the study, rating their experience of using the Experiential Telepresence System in both an indoor and an outdoor environment. Table III gives the details of the study participants. The questions were answered on a seven-point Likert scale (from 1 to 7, where a low score indicates a lower level of engagement, quality, or whatever the measure in question is) and were analyzed using Analysis of Variance (ANOVA) [15], which highlights statistically significant differences in the means between samples from groups (Table IV). One of the outcomes of this analysis is the p-value, which gives the likelihood of a particular difference in the group means arising by chance: if p is close to 1, the difference would very likely show up at random, while if p < 0.05, there is less than a 5% probability that the difference was caused by chance. A minimal sketch of such an analysis follows.
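For illustration, here is how a one-way ANOVA over Likert ratings can be computed with SciPy; the rating values below are invented for the example and are not our survey data.

```python
# Sketch of a one-way ANOVA over 7-point Likert ratings (illustrative only).
# The ratings below are invented for the example; they are NOT our survey data.
from scipy.stats import f_oneway

# Hypothetical "engagement" ratings from two participant groups.
group_a = [6, 7, 5, 6, 6, 7, 5]
group_b = [4, 5, 4, 3, 5, 4, 4]

f_stat, p_value = f_oneway(group_a, group_b)
print("F = %.2f, p = %.4f" % (f_stat, p_value))

# p < 0.05 would indicate less than a 5% probability that the observed
# difference in group means arose by chance.
if p_value < 0.05:
    print("The difference between the group means is statistically significant.")
```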

Users were also asked to rate the Experiential Telepresence System alongside existing telepresence systems on a 10-point scale (a higher score means a better quality of experience). This data was taken only from users who currently use or have used those systems in the past (Table V).

Table V. Comparative user ratings against existing telepresence systems


The above data indicates that user perception of the experience, and acceptance, of an Experiential Telepresence System is far higher than for conventional telepresence systems. A few users felt that greater focus was initially required for the driving task, but they were able to adjust to the controls in a short amount of time. The UI provided information on the obstacles present in the scene, and the navigation assistance meant for obstacle avoidance proved to be of great help when steering in indoor environments. Users were happy with the overall experience and found the activity quite engaging. Please note that, since the number of people surveyed is small, the results are only indicative and do not necessarily prove that our system is better than the other systems mentioned. As we receive feedback from more users, the statistics will become more accurate.

IV. CONCLUSION

In this paper, we have described the overall architecture and implementation of our Experiential Telepresence System. Our research was aimed at improving the user's experience of a remote location through the addition of context-aware visual data over a standard telepresence system. Our immediate focus now is on the implementation of variable-bit-rate video transmission. The current system uses H.264 compression at a constant bit rate (250 kbps). Seamless streaming of video requires high available bandwidth and low traffic on the network; under scenarios where network quality has been poor, frame loss and temporary freezing of the video feed have been observed. Such issues are not desirable in a good telepresence system, and they can be mitigated by changing from constant to variable bit rate, which alters the compression ratio of the video stream subject to a maximum bit rate (limited by the available bandwidth) so that the video plays smoothly over the network without frame loss or freezes. In our current implementation of the Experiential Telepresence System, we have only touched upon the auditory and visual senses of the user. In order to immerse the user further, we need to tap into other senses, such as the olfactory, tactile and gustatory senses, as well. Thus, concepts like haptics [16] and mixed reality are some of the future additions to our Experiential Telepresence Stack. These shall allow the user to feel the real physical environment at the remote location through touch and, at the same time, interact with virtual elements perceived by the robot. Experiential Telepresence Systems have a wide variety of application areas. At India Innovation Labs, our primary study is in the area of digital tourism: the concept allows tourists to remotely experience a tourism site through our experiential system. In addition to digital tourism, Experiential Telepresence may be used for distance education, hospitality at large office campuses [17] and at retail outlets. We thus believe that our architecture will serve as a platform for the next generation of telepresence systems and continue to improve the user's experience.

ACKNOWLEDGMENT

We would like to thank the Board of Trustees of India Innovation Labs for their support, especially Mr. NAPS Rao and Prof. Prahladacharya. We would like to acknowledge our core team, including Mr. Viswanath Buravalla, I. Vijay Kumar, V. R. Venkatesh and B. D. Vijaya, for their unfailing encouragement. A special acknowledgement also goes to our well-wishers from Wipro Technologies, especially Mr. Anant C. D. and Dr. Anurag Srivastava, CTO. Finally, we would like to acknowledge our colleagues Ms. Aarushi Khanna, Mr. Maruthi R. and the students of R.V. College of Engineering and the National Institute of Technology, Karnataka, who have been associated with the development of PRATHAM over the course of the last year.

REFERENCES
[1] A. Chapanis, "Interactive human communication," Scientific American, vol. 232(2), March 1975.
[2] H. S. Lichtman, "A brief history of telepresence," February 2007. [Online]. Available: http://www.telepresenceoptions.com/
[3] Anybots, "Introducing Anybots' QB telepresence robot!" [Online]. Available: https://www.anybots.com/
[4] VGo Communications, "Introducing VGo: secure, simple, affordable," 2010. [Online]. Available: http://www.vgocom.com/
[5] Gostai, "Robotic telepresence." [Online]. Available: http://www.gostai.com/
[6] "The buzz: Ramachandra Budihal augments reality," November 2009. [Online]. Available: http://blog.ted.com/2009/11/06/the_buzz_ramach
[7] "The truth about robotics' uncanny valley - human-like robots and the uncanny valley," Popular Mechanics, January 2010. [Online]. Available: http://www.popularmechanics.com/technology/engineering/robots/4343054
[8] A. P. Saygin, T. Chaminade, and H. Ishiguro, "The perception of humans and robots: Uncanny hills in parietal cortex," CogSci, 2010.
[9] M. Mori, "Bukimi no tani / The uncanny valley," in Energy, 1970, pp. 33-35.
[10] C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan, "Analysis of emotion recognition using facial expressions, speech and multimodal information," in Proceedings of the 6th International Conference on Multimodal Interfaces, ser. ICMI '04. New York, NY, USA: ACM, 2004, pp. 205-211. [Online]. Available: http://doi.acm.org/10.1145/1027933.1027968
[11] R. T. Azuma, "The challenge of making augmented reality work outdoors," in Mixed Reality: Merging Real and Virtual. Springer-Verlag, 1999, pp. 379-390.
[12] K. M. Tsui, M. Desai, H. A. Yanco, and C. Uhlik, "Telepresence robots roam the halls of my office building," HRI Workshop, 2011.
[13] "Quality of experience." [Online]. Available: http://en.wikipedia.org/wiki/Quality_of_experience
[14] K. Kilkki, "Quality of experience in communication systems," Journal of Universal Computer Science, vol. 14, pp. 615-624, 2008.
[15] D. J. Weiss, Analysis of Variance and Functional Measurement. Oxford University Press, October 2005.
[16] A. Ansar, D. Rodrigues, J. P. Desai, K. Daniilidis, V. Kumar, and M. F. M. Campos, "Visual and haptic collaborative telepresence," Computers & Graphics, vol. 25, no. 5, pp. 789-798, 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0097849301001212
[17] K. M. Tsui, M. Desai, H. A. Yanco, and C. Uhlik, "Exploring use cases for telepresence robots," in Proceedings of the 6th International Conference on Human-Robot Interaction, ser. HRI '11. New York, NY, USA: ACM, 2011, pp. 11-18. [Online]. Available: http://doi.acm.org/10.1145/1957656.1957664
