
Recent Research Developments in Learning Technologies (2005)

Efficient video streaming using gesture recognition in small class peer to peer e-learning

Richard Y. D. Xu (1), Jesse S. Jin (2)
(1) Faculty of Information Technology, University of Technology, Sydney, Broadway NSW 2007, Australia
(2) School of Design, Communication & I.T., University of Newcastle, Callaghan NSW 2308, Australia
Synchronous peer to peer e-learning has recently been gaining popularity, as the instructor and students can communicate directly in real time by exchanging multimedia, particularly audiovisual data. Since most peer to peer e-learning applications are PC based and participants are connected through the internet, live video feeds consume a lot of bandwidth and create excessive network traffic. In our work, we have recognized the trend of applying advanced computer vision and pattern recognition (CVPR) technologies to automate real-time e-learning activities. Following this trend, we discuss the use of fast pose recognition methods on both the instructor and student sides. Based on these methodologies, we propose an intelligent framework which uses pose information to control the video streaming effectively.

Keywords: e-learning, computer vision, peer to peer e-learning, video streaming
1. Introduction

The increase in PC processing speed and network capacity has made synchronous peer-to-peer e-learning more accessible to ordinary computer users as a means to connect and collaborate directly with each other.
Amongst the various multimedia elements, video is without doubt one of the most powerful media used in synchronous e-learning. However, video is by nature high in volume and consumes a lot of network bandwidth. Currently, a common approach to bandwidth reduction is to stream video data using the MPEG-4 multilayer model. In each video frame, the moving instructor and the changes on the chalkboard can be coded in a separate layer from the static background. Real-time processing can be achieved through background subtraction algorithms that are robust to changes in lighting conditions, such as Wren et al. [1], and computationally efficient algorithms for moving object segmentation and contour extraction, such as Wu et al. [2]. Although this methodology significantly reduces bandwidth consumption compared with full-frame streaming, it can still be shown to be insufficient for real-time applications on slower networks, such as dial-up connections. Much of the literature has also focused on video transmission over unreliable and narrow-bandwidth networks. These approaches mainly consider video packet scheduling [3] and rate distortion [4].
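The layered-coding idea above can be illustrated with a minimal background-subtraction sketch (pure Python on toy grayscale frames; the threshold value and the toy frame data are our illustrative assumptions, not values from the paper):

```python
# Per-pixel background subtraction: pixels that differ from a static
# background model by more than a threshold form the foreground layer;
# only that layer would need re-streaming on each frame.

def foreground_mask(background, frame, threshold=25):
    """Return a binary mask (1 = foreground) for one grayscale frame."""
    return [[1 if abs(f - b) > threshold else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Toy 4x4 example: a "moving instructor" occupies the top-left corner.
background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[0][0] = frame[0][1] = frame[1][0] = 200  # moving object pixels

mask = foreground_mask(background, frame)
changed = sum(map(sum, mask))
print(changed)           # → 3 foreground pixels to re-encode
print(changed / 16.0)    # → 0.1875, the fraction of the frame streamed
```

A production system would of course use an adaptive, lighting-robust model in the spirit of [1] rather than a fixed threshold; this sketch only shows why the foreground layer is so much cheaper to stream than the full frame.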
At the same time, human pose and gesture analysis has also been maturing in recent times, and several real-time and reliable pose detection methods have emerged. More importantly, some pose recognition methods have recently been used for pedagogical analysis, such as Suganuma [5]. However, since peer to peer e-learning is our area of focus, we need to exclude any overly complex processing algorithms that are not feasible to execute on the ordinary PCs that the vast majority of e-learning users rely on.
In this paper, we propose a framework that applies pose and gesture analysis to both the instructor and the student in an effort to reduce network traffic. Compared with other existing methods, our technique is fast and reliable. In addition, it relies only on a single camera and can be executed on inexpensive hardware, which we argue makes it more suitable for peer to peer e-learning. The rest of this paper is organized as follows: in Sections 2 and 3, we describe our pose analysis implementations for the instructor and student sides respectively. In Sections 4 and 5, we present the outcome and discussions.
© FORMATEX 2005

2. Instructor Side Gesture Analysis

When MPEG-4 multilayer coding is used, the classroom background is streamed only once; the foreground, containing the moving instructor and occasional changes on the chalkboard, is streamed in all subsequent frames. To further accommodate the narrow bandwidth, we propose to stream an animated character (the avatar) that best mimics the real instructor's actions in the live video.
The position of the avatar is determined from human tracking, and its action is interpreted through real-time pose and gesture recognition from the live video in the instructor's classroom. As a consequence, in each subsequent frame, only text information representing the animated character is transmitted.
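The per-frame avatar update can be pictured as a small text message. A minimal sketch follows; the field names and JSON encoding are our illustrative assumptions, not the paper's actual wire format:

```python
import json

def avatar_message(frame_no, x, y, gesture):
    """Serialize one frame's avatar state as a compact text message."""
    return json.dumps({"frame": frame_no, "x": x, "y": y,
                       "gesture": gesture}, separators=(",", ":"))

# Hypothetical frame: avatar at normalized position (0.42, 0.55), writing.
msg = avatar_message(120, 0.42, 0.55, "writing")
print(msg)
print(len(msg.encode("utf-8")), "bytes, versus kilobytes for a video frame")
```

The point of the sketch is the size asymmetry: a few tens of bytes of avatar state per frame replace the compressed foreground video layer.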
The avatar generation method can be described in the following steps:
1. First, we detect the appearance of an instructor, or multiple instructors, using the facial detection model of Viola et al. [6]. This is a simple feature classification algorithm using AdaBoost. We have also tested a few other facial detection algorithms and none matches its efficiency.
2. When an instructor is detected, we apply our fast human tracking method, based on the kernel-based mean-shift of Comaniciu et al. [7] and the fast colour thresholding of Xu et al. [8], which we have found to be an effective safeguard measure for mean-shift object tracking.
3. We then determine the major direction of the instructor's motion, on which a simple decision rule is based: if the motion is horizontal, no gesture recognition is performed, and the animated character is simply moved across the classroom image accordingly.
4. When no horizontal motion is detected, i.e. the instructor has stopped moving, we begin gesture recognition. The algorithm is based on [9], where the method uses the timed motion history image (tMHI) to represent motion from the gradients in successively layered silhouettes. This method provides preliminary pose recognition. The gesture modelling is performed using a Hidden Markov Model (HMM). Currently the system can detect upper body movements including facing front, facing back, and the arm poses; from these sequences of poses, we can recognize gestures such as turning and writing. We have found these recognitions sufficient for the tasks in our application.
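The timed motion history image of step 4 can be sketched in a few lines. This is a pure-Python toy: the decay window and grid values are illustrative assumptions, and the actual method [9] operates on segmented silhouettes at video rate:

```python
# A tMHI stamps the current timestamp onto every pixel of the latest
# motion silhouette and forgets pixels older than a fixed duration, so
# recent motion carries a larger (more recent) timestamp.

DURATION = 0.5  # seconds of history to keep (illustrative value)

def update_tmhi(tmhi, silhouette, timestamp):
    """Update the timed motion history image in place and return it."""
    for r, row in enumerate(silhouette):
        for c, moving in enumerate(row):
            if moving:
                tmhi[r][c] = timestamp            # stamp current motion
            elif tmhi[r][c] < timestamp - DURATION:
                tmhi[r][c] = 0.0                  # forget stale motion
    return tmhi

# A blob moving rightwards along the top row of a 3x3 grid.
tmhi = [[0.0] * 3 for _ in range(3)]
update_tmhi(tmhi, [[1, 0, 0], [0, 0, 0], [0, 0, 0]], timestamp=1.0)
update_tmhi(tmhi, [[0, 1, 0], [0, 0, 0], [0, 0, 0]], timestamp=1.4)
update_tmhi(tmhi, [[0, 0, 1], [0, 0, 0], [0, 0, 0]], timestamp=1.8)
print(tmhi[0])  # → [0.0, 1.4, 1.8]: oldest motion expired, ramp remains
```

The surviving timestamp ramp (increasing left to right) is what the gradient computation in [9] turns into a motion direction, which then feeds the pose sequence for HMM gesture modelling.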
29
3. Student Side Gaze Analysis

During video or even image transmission, when the student on the other side of the network is not watching the screen, network bandwidth is wasted, as video or images are progressively updated and the history information is usually not used in a live e-learning application. However, it is impractical for the student to remember to inform the instructor's PC to stop uploading each time before walking away or writing on paper.
Fig. 1 Student's gaze: the system determines whether the student is watching the screen from his/her pose analysis.
m-ICTE2005 http://www.formatex.org/micte2005

In addition, when multiple students are recipients of the same stream, the bandwidth unused by one student can be used for the others.
Therefore, we have incorporated a gaze detection agent that can efficiently determine whether the student is watching the screen. This, in turn, provides a monitoring mechanism for detecting the student's viewing activity. We placed a 320 x 240 resolution web cam in front of the student's PC for gaze detection. The appropriate signals are sent to the instructor side when the agent detects that the student is not watching, and again when the student has resumed watching.
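Since the agent only needs to notify the instructor side on state transitions, not on every frame, its control logic is tiny. A minimal sketch (the signal names and class shape are our illustrative assumptions):

```python
class GazeAgent:
    """Emit a pause/resume signal only when the watching state changes."""

    def __init__(self):
        self.watching = True  # assume the student starts out watching
        self.signals = []     # stand-in for messages sent to the instructor

    def observe(self, is_watching):
        """Feed one per-frame gaze decision into the agent."""
        if is_watching != self.watching:
            self.watching = is_watching
            self.signals.append("RESUME" if is_watching else "PAUSE")

agent = GazeAgent()
for frame_state in [True, True, False, False, False, True]:
    agent.observe(frame_state)
print(agent.signals)  # → ['PAUSE', 'RESUME']
```

Signalling only on transitions keeps the upstream control traffic negligible compared with the video bandwidth it saves.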
The gaze algorithm first locates the face, again based on Viola et al. [6]. When a face is detected in the student's web cam, we then apply simple line detection and geometry for gaze recognition, as shown in Fig. 1. We have devised simple gaze detection rules based on the above result, such that if a frontal face is detected and the pose of the frontal face is within an angle threshold, the student is taken to be watching; otherwise the student is taken to be not watching. A more reliable method of pose analysis using eye detection is also being studied.
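The watching decision itself reduces to a two-condition rule; a minimal sketch follows (the angle threshold value is an illustrative assumption, as the paper does not state one):

```python
ANGLE_THRESHOLD = 20.0  # degrees of allowed head rotation (illustrative)

def is_watching(frontal_face_detected, head_angle_degrees):
    """Frontal-face-plus-angle gaze rule described in Section 3."""
    return frontal_face_detected and abs(head_angle_degrees) <= ANGLE_THRESHOLD

print(is_watching(True, 5.0))    # → True  (frontal face, small angle)
print(is_watching(True, 45.0))   # → False (head turned past the threshold)
print(is_watching(False, 0.0))   # → False (no frontal face detected)
```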
15
4. Empirical Results

We have constructed a prototype based on the work presented in this paper. The prototype is written in Visual C++ and incorporates several software development kits, including Microsoft® DirectX 8.1, Intel® OpenCV Beta 4 and the Microsoft® Windows Media Format 9 Series SDK for streaming. Some implementation was borrowed from Bob Mottram's human body project [10]. Currently, streaming is achieved by broadcasting over the HTTP protocol, using an image stream for the static classroom background and updated writing regions, a text stream for the generated avatar, and a bidirectional audio stream.
The results are promising. The streaming was very smooth; the occasional delays were caused by the audio streaming rather than the visual data streaming (when audio is turned off, no delays are noticed). The detection agents execute in real time on consumer-grade PCs on both the instructor and student sides.
For qualitative analysis, we consulted several students to evaluate the system and received good feedback regarding the animated character as a replacement for the real lecturer.
32
5. Discussion

While we have achieved promising results, much work is required on the robustness and versatility of the avatar modelling. Future work includes research into efficient gesture recognition with stereo cameras and more complex instructor gesture and event recognition. While a 3D avatar is being considered, its processing complexity has prevented it from being an efficient candidate on the hardware currently available to most ordinary users.
The streaming can be reduced further if the updated writings and the slides can be streamed using text recognition (OCR) techniques. Currently, we have applied off-the-shelf OCR software and received poor results. More research is required in the area of video OCR that can adapt to varying lighting conditions.
45
6. Acknowledgment

This project is supported by an Australian Research Council SPIRT Grant (C00107116).


References

[1] C. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, Pfinder: real time tracking of the human body, IEEE Trans. on Pattern Anal. and Machine Intell., vol. 19, no. 7, pp. 780-785, 1997.
[2] Z. P. Wu, C. Chen, A new foreground extraction scheme for video streams, ACM Multimedia 2001, pp. 552-554.
[3] M. Kalman, P. Ramanathan, and B. Girod, Rate-Distortion Optimized Video Streaming with Multiple Deadlines, Intl. Conf. on Image Processing (ICIP-2003), Barcelona, Spain, September 2003.
[4] J. Chakareski, J. Apostolopoulos, S. Wee, Wai-tian Tan and B. Girod, R-D Hint Tracks for Low-Complexity R-D Optimized Video Streaming, Proc. IEEE Conference on Multimedia & Expo (ICME-2004), Taipei, Taiwan, June 2004.
[5] A. Suganuma, A Real-time Analysis Method of Students' Behavior for a Supporting System of a Distance Lecture, Proc. of IADIS International Conference WWW/Internet, pp. 1183-1186, November 2003.
[6] P. Viola, M. Jones, Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade, Neural Information Processing Systems 14, December 2001.
[7] D. Comaniciu, V. Ramesh, P. Meer, Kernel-Based Object Tracking, IEEE Trans. Pattern Analysis and Machine Intelligence, 25(5): 564-575, 2003.
[8] R. Y. D. Xu, J. G. Allen, J. S. Jin, Robust Mean-shift Tracking with Extended Fast Colour Thresholding, Proceedings of the 2004 International Symposium on Intelligent Multimedia, Video & Speech Processing, Hong Kong, October 2004, pp. 542-545.
[9] G. Bradski and J. Davis, Motion Segmentation and Pose Recognition with Motion History Gradients, Machine Vision and Applications (2002) 13: 174-184.
[10] B. Mottram, Human body project, http://www.fuzzgun.btinternet.co.uk/rodney/humanbody.htm

