
2. LITERATURE REVIEW
2.1 Bare Hand Computer Interaction

Hardenburg and Berard, in their work “Bare Hand Human Computer Interaction” [5] published in the Proceedings of the Workshop on Perceptive User Interfaces, describe techniques for barehanded computer interaction. Techniques for hand segmentation, finger finding, and hand posture classification are discussed. They applied their work to the control of an on-screen mouse pointer for applications such as a browser and a presentation tool. They also developed a multi-user application intended as a brainstorming tool that allows different users to arrange text across the screen space.

Figure 2-1. Application examples of Hardenburg and Berard’s system: finger-controlled a) browser, b) paint, c) presentation, d) multi-user object organization.

Hand segmentation techniques such as stereo image segmentation, color, contour, connected-components algorithms and image differencing are briefly discussed as an overview of existing algorithms. It is pointed out that the weaknesses of the different techniques can be compensated for by combining them, at the cost of computational expense. For their work, they chose a modified image differencing algorithm. Image differencing tries to segment a moving foreground (i.e. the hand) from a static background by comparing successive frames. Additionally, when the current frame is compared to a reference image, the algorithm can also detect resting hands. A further modification to image differencing was to maximize the contrast between foreground and background.
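The sketch below illustrates the general idea of differencing against both the previous frame and a reference image, as described above. It is a minimal outline rather than the authors’ implementation; the threshold value and the morphological clean-up step are assumptions.

# Minimal sketch of image-differencing segmentation (not the authors' exact code).
# The moving hand is separated from a static background by thresholding the
# per-pixel difference between the current frame and both the previous frame
# and a stored reference (background) image; the reference comparison lets
# resting hands remain detected. Threshold values are illustrative assumptions.
import cv2
import numpy as np

DIFF_THRESHOLD = 25  # assumed gray-level threshold

def segment_hand(current, previous, reference):
    """Return a binary foreground mask from three grayscale frames."""
    motion = cv2.absdiff(current, previous)      # moving foreground
    static = cv2.absdiff(current, reference)     # resting foreground
    diff = cv2.max(motion, static)               # combine both cues
    _, mask = cv2.threshold(diff, DIFF_THRESHOLD, 255, cv2.THRESH_BINARY)
    # small morphological opening to suppress pixel noise
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)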

After segmentation, the authors discuss the techniques used for detecting the fingers and hands. They describe a simple and reliable algorithm based on finding fingertips, from which the fingers and eventually the whole hand can be identified. The algorithm is based on a simple model of a fingertip as a circle mounted on a long protrusion. After locating the fingertips, a model of the fingers and eventually of the whole hand can be generated, and this information can be used for hand posture classification.
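As an illustration of such a circle-plus-protrusion test, the following sketch checks whether a candidate pixel in a binary hand mask is surrounded by a filled circle of hand pixels while only a narrow run of hand pixels crosses a surrounding search square. The radii and the accepted run length are assumptions, not the published parameter values.

# Illustrative sketch of a circle-plus-protrusion fingertip test.
import numpy as np

FINGER_RADIUS = 5          # assumed fingertip radius in pixels
SEARCH_RADIUS = 9          # assumed radius of the surrounding search square

def is_fingertip(mask, y, x):
    """mask: binary hand mask (1 = hand). True if (y, x) looks like a fingertip."""
    h, w = mask.shape
    if not (SEARCH_RADIUS <= y < h - SEARCH_RADIUS and
            SEARCH_RADIUS <= x < w - SEARCH_RADIUS):
        return False
    # 1) the inner circle around the candidate must be completely filled
    yy, xx = np.ogrid[-FINGER_RADIUS:FINGER_RADIUS + 1, -FINGER_RADIUS:FINGER_RADIUS + 1]
    circle = (yy ** 2 + xx ** 2) <= FINGER_RADIUS ** 2
    patch = mask[y - FINGER_RADIUS:y + FINGER_RADIUS + 1,
                 x - FINGER_RADIUS:x + FINGER_RADIUS + 1]
    if patch[circle].min() == 0:
        return False
    # 2) along the surrounding search square, only a narrow run of filled pixels
    #    (the finger "protrusion") may cross the boundary
    border = np.concatenate([
        mask[y - SEARCH_RADIUS, x - SEARCH_RADIUS:x + SEARCH_RADIUS + 1],
        mask[y + SEARCH_RADIUS, x - SEARCH_RADIUS:x + SEARCH_RADIUS + 1],
        mask[y - SEARCH_RADIUS:y + SEARCH_RADIUS + 1, x - SEARCH_RADIUS],
        mask[y - SEARCH_RADIUS:y + SEARCH_RADIUS + 1, x + SEARCH_RADIUS],
    ])
    filled = int(border.sum())
    return 2 <= filled <= 2 * FINGER_RADIUS + 3   # roughly one finger width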

Figure 2-2. Finger Model used by Hardenburg and Berard

The end system that was developed had a real-time capacity of around 20-25 Hz. Data from their evaluation show that about 6 frames out of 25 are misclassified when the foreground is fast moving. Positional accuracy was between 0.5 and 1.9 pixels. They concluded in their paper that the system developed was simple, effective, and capable of working under various lighting conditions.

2.2 Using Marking Menus to Develop Command Sets for Computer Vision Based Gesture Interfaces

The authors Lenman, Bretzer and Thrusson present “Using Marking Menus to Develop Command Sets for Computer Vision” [6], published in the Proceedings of the Second Nordic Conference on Human-Computer Interaction. This gesture-based interaction is intended as a partial replacement for present interaction tools such as the remote control and the mouse. Perceptual and multimodal user interfaces are the two main scenarios discussed for gestural interfaces. The perceptual user interface scenario aims for automatic recognition of human gestures integrated with other human expressions such as facial expressions or body movements, while the multimodal user interface scenario focuses more on hand poses and specific gestures that can be used as commands in a command language.

The authors identify three dimensions to be considered in designing gestural command sets. The first dimension is the cognitive aspect, which refers to how easy commands are to learn and remember; command sets should therefore be practical for the human user. The second dimension, the articulation aspect, concerns how easy gestures are to perform and how tiring they are for the user. The last dimension covers technical aspects: the command set must be recognizable with state-of-the-art technology and meet the expectations of upcoming technology.

The authors concentrate on the cognitive side. They consider that having a menu structure is a great advantage because commands can then be easily recognized. Pie menus and marking menus are the two types of menu structure that the authors discuss and explain. Pie menus are pop-up menus whose alternatives are arranged in a radial manner. Marking menus, specifically hierarchic marking menus, are a development of pie menus that allows more complex choices by introducing sub-menus.

As a test, a prototype for hand gesture interaction was implemented. Lenman, Bretzer and Thrusson chose a hierarchic menu system for controlling the functions of a TV, a CD player and a lamp. Their computer vision system is based on a representation of the hand: the system searches for and then recognizes hand poses using a combination of multiscale color feature detection and particle filtering. Hand poses are represented in terms of hierarchies of color image features with qualitative interrelations in terms of position, orientation and scale. Their menu system has three hierarchical levels and four choices. The menus are currently shown on a computer screen, which is inconvenient; in the future an overlay on the TV screen will be presented.
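A hierarchic, pose-driven menu of the kind described above can be sketched as a small lookup structure. The device and command names below are illustrative assumptions and do not reproduce the authors’ actual command vocabulary or four-choice layout; the point is only how recognized pose labels select successive menu levels.

# Minimal sketch of a hierarchic menu driven by recognized hand-pose labels.
MENU = {
    "TV":   {"power": None, "channel up": None, "channel down": None, "volume": None},
    "CD":   {"play": None, "stop": None, "next track": None, "previous track": None},
    "Lamp": {"on": None, "off": None, "dim up": None, "dim down": None},
}

def navigate(menu, pose_sequence):
    """Follow recognized poses ('pose_0', 'pose_1', ...) down the menu
    hierarchy and return the command path that was selected."""
    path, level = [], menu
    for pose in pose_sequence:
        index = int(pose.split("_")[1])      # pose label -> menu slot
        choice = list(level.keys())[index]
        path.append(choice)
        level = level[choice]
        if level is None:                    # leaf reached: issue command
            break
    return path

# Example: pose_0 then pose_2 selects "TV" -> "channel down"
print(navigate(MENU, ["pose_0", "pose_2"]))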

As for their future work, they are attempting to increase the speed and tracking stability of the system in order to achieve more position independence for gesture recognition, increase the tolerance to varying lighting conditions, and improve recognition performance.

2.3 Computer Vision-Based Gesture Recognition for an Augmented Reality Interface

Granum et al. presented “Computer Vision-Based Gesture Recognition for an Augmented Reality Interface” [7], which was published in the Proceedings of the 4th International Conference on Visualization, Imaging and Image Processing. It discusses the different areas, such as gesture recognition and segmentation, that are needed to complete the research and the techniques used for them. There has already been a lot of research on vision-based hand gesture recognition and finger tracking applications. As technology grows, researchers are finding ways for computer interfaces to behave more naturally; capabilities such as sensing the environment through sight and hearing must be imitated by the computer.

This research is applied in a computer-vision interface for an augmented reality system. The computer vision is centered on gesture recognition and finger tracking used as an interface to the PC. Their setup projects a display onto a Place Holder Object (PHO), and with the user’s own hand the system can control the display; movements and gestures of the hand are detected by a head-mounted camera, which serves as the input to the system.

There are two main problem areas, and the presentation of their solutions forms the bulk of the paper. The first is segmentation, which is used to detect the PHO and the hands in the 2D images captured by the camera. Problems in detecting the hands include the varying shape of the hand as it moves and its varying size across different gestures. To solve this, the study used color pixel-based segmentation, which provides an extra dimension compared to gray-tone methods. Color pixel-based segmentation creates a new problem with illumination, which depends on both intensity changes and color changes. This problem is resolved by using normalized RGB, also called chromaticities, but implementing this method creates several issues, one of which is that cameras normally have a limited dynamic intensity range. After the hand pixels are segmented from the image, the next task is to recognize the gesture. This is subdivided into two approaches: the first detects the number of outstretched fingers, and the second handles the point-and-click gesture. For gesture recognition, a simple counting approach is used: a polar transformation is performed around the center of the hand and the number of fingers, which appear roughly rectangular in shape, is counted at each radius; to speed up the algorithm, the segmented image is sampled along concentric circles. The second area of concern is the detection of point-and-click gestures. The same gesture recognition algorithm is used: when only one finger is detected, it represents a pointing gesture, and the tip of that finger defines the actual pointing position. The center of the finger is found at each radius, the values are fitted to a straight line, and this line is followed until the final point is reached.
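A rough sketch of the concentric-circle sampling idea is given below. The radii, number of angular samples, and the majority-vote rule are assumptions for illustration and are not taken from the paper.

# Sketch of finger counting by sampling a binary hand mask along concentric circles.
import numpy as np

def count_fingers(mask, center, radii=range(20, 60, 5), samples=180):
    """mask: binary hand mask (1 = hand). center: (x, y) of the palm center.
    Counts runs of hand pixels crossed by each sampling circle and returns the
    most common count over all radii as the finger estimate."""
    cx, cy = center
    angles = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    counts = []
    for r in radii:
        xs = np.clip((cx + r * np.cos(angles)).astype(int), 0, mask.shape[1] - 1)
        ys = np.clip((cy + r * np.sin(angles)).astype(int), 0, mask.shape[0] - 1)
        ring = mask[ys, xs]
        # a finger appears as a contiguous run of 1s along the circle;
        # count rising edges (0 -> 1 transitions), wrapping around the ring
        edges = np.sum((ring == 1) & (np.roll(ring, 1) == 0))
        counts.append(int(edges))
    if not counts:
        return 0
    return int(np.bincount(counts).argmax())   # majority vote across radii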

Figure 2-3. Polar transformation on a gesture image.

The paper is a research step toward gesture recognition, implemented as part of a computer-vision system for augmented reality. The research has shown qualitatively that the approach can be a useful alternative interface for augmented reality, and that it is robust enough for the augmented reality system.

2.4 Creating Touch-Screens Anywhere with Interactive Projected Displays

Claudio Pinhanez et al., the researchers behind “Creating Touch-Screens Anywhere with Interactive Projected Displays” [8], published in the Proceedings of the Eleventh ACM International Conference on Multimedia, started a few years ago to develop systems which could transform an available physical space into an interactive “touch-screen” style projected display. In this paper, the authors demonstrate a technology named Everywhere Display (ED), which can be used for Human-Computer Interaction (HCI). This particular technology was implemented using an LCD projector with motorized focus and zoom and a computer-controlled pan-tilt zoom camera. They also came up with a low-end version, which they call ED-lite, that functions the same as the high-end version and differs only in the devices used: the low-end version uses a portable projector and an ordinary camera.

Several groups of professionals have been researching and working on new methods of improving present HCI. The most common methods used for HCI are the mouse, keyboard and touch-screen, but these require an external device for humans to communicate with computers. The goal of the researchers was to develop a system that would eliminate the external devices linking the communication of humans and computers. The most popular method under research is computer vision, since it offers a style of interaction similar to human-human interaction, which is the goal of advancing HCI technology. IBM researchers used computer vision to implement ED and ED-lite. With the aid of computer vision, the system is able to steer the projected display from one surface to another, and touch-screen-like interaction is made possible using machine vision techniques and algorithms.

Figure 2-4. Configuration of ED (left), ED-lite (upper right), and sample interactive projected display (bottom right).

The particular application used by IBM for demonstration is a slide presentation using Microsoft PowerPoint. They were able to create a touch-screen-like function using the devices mentioned earlier. The ED unit was installed at ceiling height on a tripod to cover a greater space. A computer controls the ED unit and performs all other functions, such as vision processing for the interaction and running the application software. The specific test conducted was a slide presentation application using Microsoft PowerPoint controlled via hand gestures. There are designated locations in the projected image which the user can use to navigate the slides or to move the content of the projected display from one surface area to another. The user controls the slides by touching buttons superimposed on the projected surface area. With this technology the user interacts with the computer using the bare hand, without input devices attached directly to either the user or the computer.
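One simple way such projected buttons can be activated is to require the tracked fingertip, once mapped into the display’s coordinates, to dwell inside a button rectangle for a short time. This is only a plausible sketch, not IBM’s implementation; the button layout, dwell time, and helper names are assumptions.

# Sketch of dwell-based activation of projected buttons.
import time

BUTTONS = {
    "next_slide": (700, 500, 780, 560),   # (x0, y0, x1, y1) in display pixels
    "prev_slide": (20, 500, 100, 560),
}
DWELL_SECONDS = 0.5
_dwell_start = {}

def hit_test(fingertip):
    """Return the name of the button containing the fingertip, if any."""
    fx, fy = fingertip
    for name, (x0, y0, x1, y1) in BUTTONS.items():
        if x0 <= fx <= x1 and y0 <= fy <= y1:
            return name
    return None

def update(fingertip, now=None):
    """Call once per frame; returns a button name when a 'press' is completed."""
    now = time.monotonic() if now is None else now
    name = hit_test(fingertip)
    if name is None or name not in _dwell_start:
        _dwell_start.clear()                  # reset when leaving or switching buttons
    if name is None:
        return None
    _dwell_start.setdefault(name, now)
    if now - _dwell_start[name] >= DWELL_SECONDS:
        _dwell_start.clear()                  # fire once, then reset
        return name
    return None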

2.5 Interactive Projection

Projector designs are shrinking and are now just at the threshold of being compact enough for handheld use. That is why Beardsley and his colleagues at Mitsubishi Electric Research Labs propose “Interactive Projection” [9], published in IEEE Computer Graphics and Applications. Their work is an investigation of mobile, opportunistic projection, which can turn any surface into a display, a vision of making the world one’s desktop.

The prototype has buttons that serve as the I/O of the device. It also has a built-in camera to detect the input of the user. The paper discusses three broad classes of application for interactive projection. The first class uses a clean display surface for the projected display. The second class creates a projection on a physical surface; this, typically, is what we call augmented reality: the first stage is object recognition, and the next is to project an overlay that gives some information about the object. The last class is to indicate a physical region of interest, which can be used as input to computer vision processing. This is similar to using a mouse to drag a box that selects the region of interest, but the pointing finger is used instead of a mouse.

Figure 2-5. Handheld Projector Prototype

There are two main issues when using a handheld device to create projections. The first is keystone correction, to produce an undistorted projection with the correct aspect ratio. Keystoning occurs when the projector is not perpendicular to the screen, producing a trapezoidal shape instead of a rectangle; keystone correction is used to fix this kind of distortion. The second issue is the removal of the effects of hand motion. The paper describes a technique for keeping a projection static on a surface even when the device is in motion. Distinctive visual markers called fiducials are used to define a coordinate frame on the display surface. Basically, a camera senses the markers and infers the target area in camera image coordinates; these coordinates are transformed to projector image coordinates, and the projection data is mapped into them, giving the right placement of the projection.
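The coordinate chain described above can be sketched as two homographies: one estimated per frame from the observed fiducials (camera image to display surface) and one from a prior calibration (surface to projector image). The marker positions and calibration values below are placeholders; a real system would detect the fiducials in every frame.

# Sketch of mapping camera-image points into projector-image coordinates.
import cv2
import numpy as np

# corresponding points: where four fiducials appear in the camera image,
# and where those same fiducials lie on the display surface (e.g. in millimetres)
camera_pts  = np.float32([[102, 87], [521, 95], [508, 402], [110, 390]])
surface_pts = np.float32([[0, 0], [200, 0], [200, 150], [0, 150]])

H_cam_to_surface, _ = cv2.findHomography(camera_pts, surface_pts)

# assumed projector calibration: surface rectangle -> projector pixel rectangle
projector_pts = np.float32([[0, 0], [1024, 0], [1024, 768], [0, 768]])
H_surface_to_proj = cv2.getPerspectiveTransform(surface_pts, projector_pts)

def camera_to_projector(point_xy):
    """Map one camera-image point into projector image coordinates."""
    p = np.float32([[point_xy]])                          # shape (1, 1, 2)
    on_surface = cv2.perspectiveTransform(p, H_cam_to_surface)
    in_proj = cv2.perspectiveTransform(on_surface, H_surface_to_proj)
    return in_proj[0, 0]

print(camera_to_projector((300, 240)))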

Examples of applications for each of the main classes given above are also discussed. An example of the first class is a projected web browser; this is basically a desktop Windows environment modified so that the display goes to the projector and the input is taken from the buttons of the device. An example application of the second class is projected augmented reality. The third application is a mouse-button hold-and-drag defining a Region of Interest (ROI), just like on a desktop but without the use of a mouse.

2.6 Ubiquitous Interactive Displays in a Retail Environment

Pinhanez et al., in their work “Ubiquitous Interactive Displays in a Retail Environment” [10], published in the Proceedings of ACM Special Interest Group on Graphics (SIGGRAPH): Sketches, propose an interactive display set in a retail environment. It uses a pan/tilt/mirror/zoom camera with a projector, using computer vision methods to detect interaction with the projected image. They call this technology the Everywhere Display projector (ED projector). They propose using it in a retail environment to help customers find a certain product, give them information about it, and tell them where the product is located. The ED projector is installed on the ceiling and can project images on boards that are hung on every aisle of the store. At the entrance of the store, there is a table where a larger version of the product finder is projected. Here, a list of products is projected on the table and the user can move a red wooden slider to find a product. The camera detects this motion and the list scrolls up and down, following the motion of the slider.

Figure 2-6. Setup of the Product Finder

2.7 Real-Time Fingertip Tracking and Gesture Recognition

Professor Kenji Oka and Yoichi Sato of the University of Tokyo, together with Professor Hideki Koike of the University of Electro-Communications, Tokyo, worked on “Real-Time Fingertip Tracking and Gesture Recognition” [11], published by IEEE (Volume 22, Issue 6), which introduced a method for determining fingertip locations in an image frame and measuring fingertip trajectories across image frames. They also propose a mechanism for combining direct manipulation and symbolic gestures based on multiple fingertip motions. Several augmented desk interfaces have been developed recently. DigitalDesk is one of the earliest attempts at an augmented desk interface; using only a charge-coupled device (CCD) camera and a video projector, users can operate a projected desktop application with a fingertip. Inspired by DigitalDesk, the group developed an augmented desk interface called EnhancedDesk that lets users perform tasks by manipulating both physical and electronically displayed objects simultaneously with their own hands and fingers. An example application demonstrated in the paper is EnhancedDesk’s two-handed drawing system. The application uses the proposed tracking and gesture recognition methods, which assign different roles to each hand. The gesture recognition lets users draw objects of different shapes and directly manipulate those objects using the right hand and fingers. Figure 2-7 shows the set-up used by the group, which includes an infrared camera, a color camera, an LCD projector and a plasma display.

Figure 2-7. EnhancedDesk’s set-up

The detection of multiple fingertips in an image frame involves extracting hand regions, finding fingertips and finding the palm’s center. In the extraction of hand regions, an infrared camera was used to measure temperature and to cope with the complicated background and dynamic lighting, since pixel values corresponding to human skin are raised above those of other pixels. In finding fingertips, a search window was defined rather than extracting the whole arm, since the fingertip search process is computationally expensive. Based on the geometrical features of a fingertip, the fingertip-finding method uses normalized correlation with a properly sized template corresponding to the user’s fingertip size.
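The following sketch illustrates fingertip candidate search by normalized correlation. The synthetic circular template, the score threshold, and the non-maximum suppression step are assumptions for illustration; the original system builds its template from the measured fingertip size.

# Sketch of fingertip candidate search with normalized correlation.
import cv2
import numpy as np

def find_fingertip_candidates(hand_mask, fingertip_radius=6, threshold=0.7, max_tips=5):
    """hand_mask: 8-bit binary image (255 = hand) restricted to the search window.
    Returns up to max_tips (x, y) positions with high correlation scores."""
    d = 2 * fingertip_radius + 1
    template = np.zeros((d, d), np.uint8)
    cv2.circle(template, (fingertip_radius, fingertip_radius), fingertip_radius, 255, -1)

    scores = cv2.matchTemplate(hand_mask, template, cv2.TM_CCOEFF_NORMED)
    tips = []
    for _ in range(max_tips):
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)
        if max_val < threshold:
            break
        x, y = max_loc
        tips.append((x + fingertip_radius, y + fingertip_radius))  # centre of the match
        # suppress a neighbourhood around the accepted peak before searching again
        cv2.rectangle(scores, (x - d, y - d), (x + d, y + d), -1.0, -1)
    return tips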

Figure 2-8. Fingertip Detection

Measuring fingertip trajectories involves determining trajectories, predicting fingertip locations, and examining fingertip correspondences between successive frames. In determining possible trajectories, the locations of the fingertips in the subsequent frame are predicted and then compared with the fingertips actually detected. Finding the best combination between these two sets of fingertips determines multiple fingertip trajectories in real time. A Kalman filter is used to predict the fingertip locations in one image frame based on the locations detected in the previous frame.
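A small sketch of this predict-then-match step is given below, using a constant-velocity state model and a greedy nearest-neighbour assignment. The matrices, frame rate, and gating distance are assumptions, and the Kalman measurement-update step is omitted for brevity.

# Sketch of constant-velocity prediction and fingertip correspondence matching.
import numpy as np

DT = 1.0 / 30.0                      # assumed frame interval
F = np.array([[1, 0, DT, 0],         # state: [x, y, vx, vy]
              [0, 1, 0, DT],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)

def predict(states):
    """Advance every fingertip state one frame (prediction step only)."""
    return [F @ s for s in states]

def match(predictions, detections, gate=30.0):
    """Greedy nearest-neighbour assignment of detections to predicted tips."""
    pairs, used = [], set()
    for i, s in enumerate(predictions):
        best, best_d = None, gate
        for j, (dx, dy) in enumerate(detections):
            if j in used:
                continue
            d = np.hypot(dx - s[0], dy - s[1])
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs   # (trajectory index, detection index) correspondences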

Figure 2-9. (a) Detecting fingertips. (b) Comparing detected and predicted fingertips to determine trajectories.

In the evaluation of the tracking method, the group used a Linux-based PC with an Intel Pentium III 500-MHz processor, a Hitachi IP5000 image processing board, and a Nikon Laird-S270 infrared camera. The testing involved seven subjects and experimentally evaluated the reliability improvement gained by considering fingertip correspondences between successive image frames. The method reliably tracks multiple fingertips and could prove useful in real-time human-computer interaction applications. Gesture recognition works well with the tracking method and enables the user to achieve interaction based on symbolic gestures while performing direct manipulation with hands and fingers. Interaction based on direct manipulation and symbolic gestures works by first determining, from the measured fingertip trajectories, whether the user’s hand motion represents direct manipulation or a symbolic gesture. If direct manipulation is detected, the system selects operating modes such as rotate, move, or resize and other control mode parameters. If a symbolic gesture is detected, the system recognizes the gesture type using a symbolic gesture recognizer, in addition to recognizing gesture locations and sizes based on the trajectories.

The group plans to improve the tracking method’s reliability by incorporating additional sensors. Additional sensors are needed because the infrared camera did not work well on cold hands; a solution is to use a color camera in addition to the infrared camera. The group is also planning to extend the system to 3D tracking, since the current system is limited to 2D motion on a desktop.

2.8 Occlusion Detection for Front-Projected Interactive Displays

Hilario and Cooperstock created an occlusion detection system for front-projected displays [12], published by the Austrian Computer Society. Occlusion happens in interactive display systems when a user interacts with the display or inadvertently blocks the projection. Occlusion in these systems can lead to distortions in the projected image and loss of information in the occluded region. Detection of occlusion is therefore essential to prevent unwanted effects, and occlusion detection can also be used for hand and object tracking. The work of Hilario and Cooperstock detects occlusion through a camera-projector color calibration algorithm that estimates the RGB camera response to projected colors, which allows predicted camera images to be generated for the projected scene. The occlusion detection algorithm consists of offline camera-projector calibration followed by online occlusion detection for each video frame. Calibration is used for constructing predicted images of the projected scene; this is needed because Hilario and Cooperstock’s occlusion detection works by pixel-wise differencing of predicted and observed camera images. Their system is used with a single camera and projector, and it assumes a planar Lambertian surface with constant lighting conditions and negligible intra-projector color variation. Calibration is done in two steps. The first is offline geometric registration, which computes the transformation between projector and camera frames of reference, centers the projected image, and aligns the images to the specified world coordinate frame. For geometric registration the paper adopts the approach of Sukthankar et al., in which the projector prewarping transformation is obtained by detecting the corners of a projected and a printed grid in the camera view. The second step in the calibration process is offline color calibration. Because of the dynamics involved, a projected display is unlikely to produce an image whose colors exactly match those of the source image. To determine predicted camera images correctly, the color transfer function from projector to camera must be determined. This is done by iterating through the projection of primary colors of varying intensities, measuring the RGB camera response, and storing it in a color lookup table. Each response is the average RGB color over the corresponding patch pixels, measured over multiple camera images. The predicted camera response can then be computed by summing the predicted camera responses to each of the projected color components. The camera-projector calibration results are used in the online occlusion detection. It is stated, based on their preliminary results, that it is critical to perform general occlusion detection for front-projected display systems.
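A condensed sketch of this calibrate-then-difference pipeline is given below. The per-channel lookup-table interpolation and the fixed differencing threshold are simplifying assumptions rather than the authors’ exact procedure, and geometric registration is assumed to have already warped the projector image into camera coordinates.

# Sketch of occlusion detection by differencing predicted and observed camera images.
import cv2
import numpy as np

def build_lut(projected_levels, measured_responses):
    """For one colour channel: interpolate measured average camera responses at
    the projected intensities into a 256-entry lookup table."""
    return np.interp(np.arange(256), projected_levels, measured_responses).astype(np.uint8)

def predict_camera_image(projector_image_warped, luts):
    """Apply per-channel LUTs to the projector image (already warped into camera
    geometry) to predict what the camera should observe."""
    channels = [cv2.LUT(projector_image_warped[:, :, c], luts[c]) for c in range(3)]
    return cv2.merge(channels)

def detect_occlusion(observed, predicted, threshold=40):
    """Pixel-wise difference between observed and predicted camera images;
    large differences are flagged as occluded regions."""
    diff = cv2.absdiff(observed, predicted)
    distance = diff.astype(np.float32).sum(axis=2)      # summed channel error
    mask = (distance > threshold).astype(np.uint8) * 255
    return cv2.medianBlur(mask, 5)                      # remove isolated pixels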
