
Single-Person and Multi-Party 3D Visualizations for Nonverbal Communication Analysis


Michael Kipp, Levin Freiherr von Hollen, Michael Christopher Hrstka,
and Franziska Zamponi
Augsburg University of Applied Sciences
An der Hochschule 1, 86161 Augsburg, Germany
E-mail: {firstname.lastname}@hs-augsburg.de

Abstract
The qualitative analysis of nonverbal communication increasingly relies on 3D recording technology. However, human analysis of 3D data on a regular 2D screen can be challenging, as 3D scenes are difficult to visually parse. To fully exploit the depth of the 3D data, we propose to enhance the 3D view with a number of visualizations that clarify spatial and conceptual relationships and add derived data such as speed and angles. In this paper, we present visualizations for directional body motion, hand movement direction, gesture space location, and proxemic dimensions such as interpersonal distance, movement and orientation. The proposed visualizations are available in the open-source tool JMocap and are planned to be fully integrated into the ANVIL video annotation tool. The described techniques are intended to make annotation more efficient and reliable and may allow the discovery of entirely new phenomena.

Keywords: annotation tools, interaction analysis, human motion visualization

Figure 1: Direction of body motion is visualized by an arrow, the speed is shown by the size of a disc. The visualization
is only active above a certain motion threshold to avoid visual clutter (grey = no motion).

1. Motivation
The most limiting factor when analyzing human subjects on video or in a 3D view, be it for gesture analysis (cf. McNeill, 1992) or interpersonal process analysis (cf. Bales, 1951), is the missing or hard-to-read depth information. While a 3D viewer allows the analyst to rotate the camera or to view the scene from different angles simultaneously, we as human beings are optimized for watching a single 2D scene. Our visual system is neither made to parse true 3D information nor to integrate multiple views of the same scene. The fact that we have stereoscopic vision is only a minor enhancement to 2D vision. Therefore, the vast amount of data in 3D recordings cannot be trivially mapped to a human-readable visualization. The simultaneous viewing of multiple views of the same scene (e.g. front view, top view, side view) increases the complexity and requires additional cognitive effort to fuse the different views.

The visualizations we propose in this paper aim at a single integrated view with visual enhancements that can be switched on and off depending on the current target of analysis. We also add derived information computed from the underlying 3D data, such as location, speed and angles, to enhance the visual markup. We thus extend the existing 3D visualization of the ANVIL video annotation tool, which can display figures as skeletons in 3D space with color-coded motion trails of the hands (Kipp, in press; Kipp, 2012, 2012b; Heloir et al., 2010).

2. Related Work
Nonverbal communication researchers primarily rely on video data for their analyses. Videos are manually annotated with meaningful data like gesture occurrences according to an annotation manual (cf. McNeill, 1992), and these data can then be quantitatively analyzed. Quek et al. (2002) used computer vision techniques to derive such data automatically (to some degree) and to support qualitative analysis with continuous data visualizations like motion curves, e.g. the position of the hand along the up-axis over time. Motion capture data provides such data without the need for extraction techniques and with much higher precision. In (Heloir et al., 2010) we presented visualizations both as curves and as 3D markups attached to the 3D stick figure that represents the speaker in the 3D scene. Similar techniques are used in the analysis of sports motion (e.g. swimming motion analysis, see Kirmizibayrak et al., 2011) and for 3D computer games (cf. Zammitto, 2008). Unfortunately, the scientific documentation of such visualizations is rare.

Rich motion capture data is especially useful in multi-party interactions, when computing relationships between people in space, e.g. whether they are oriented toward their interlocutor at particular stages of the interaction (Battersby and Healey, 2010). Systems based on motion capture data are intended both to support qualitative analysis with informative visualizations and to automatically annotate data for quantitative analysis (e.g. the PAMOCAT system by Brüning et al., 2012). In this paper we focus on visualizations for qualitative analysis. However, by automatically writing the visualization information into our hand-made annotations, we can also use these data for quantitative analysis.

3. Single-Person Visualizations
Our first suite of visualizations concerns the body and hand motion of single subjects.

3.1 Directional Body Motion
Our first visualization is concerned with body motion. We define body motion as the motion of the hip through space. The motion direction is indicated by an arrow and the magnitude of the speed is shown with a disc whose diameter is proportional to the magnitude (Fig. 1).

To avoid distraction by small movements we define a speed magnitude threshold below which we do not indicate motion. In this case, the arrow vanishes and the disc turns gray.

This visualization can be used when analyzing the movement patterns of a single person or when analyzing crowds.

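The computation behind this cue is straightforward; the following is a minimal sketch in Java (not the actual JMocap code; the class and parameter names, as well as the assumption of positions in meters and a known frame rate, are ours): the hip velocity is estimated by finite differences between consecutive frames, and the arrow and disc are only activated once the speed magnitude exceeds the configurable threshold.

```java
/** Illustrative sketch: estimate hip speed and decide whether the motion cue should be shown. */
public final class BodyMotionCue {

    /** Speed in m/s below which no motion is indicated (assumed to be configurable). */
    private final double speedThreshold;

    public BodyMotionCue(double speedThreshold) {
        this.speedThreshold = speedThreshold;
    }

    /** Hip velocity (m/s) from two consecutive hip positions {x, y, z} and the frame rate. */
    public static double[] velocity(double[] hipPrev, double[] hipCurr, double fps) {
        return new double[] {
            (hipCurr[0] - hipPrev[0]) * fps,
            (hipCurr[1] - hipPrev[1]) * fps,
            (hipCurr[2] - hipPrev[2]) * fps
        };
    }

    /** True if the speed magnitude exceeds the threshold, i.e. the arrow and disc should be active. */
    public boolean isMoving(double[] velocity) {
        double speed = Math.sqrt(velocity[0] * velocity[0]
                + velocity[1] * velocity[1] + velocity[2] * velocity[2]);
        return speed > speedThreshold;
    }
}
```

The disc diameter would then be scaled with the speed magnitude and the arrow aligned with the normalized velocity.
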
3.2 Hand Direction and Speed
Gesture researchers are mainly interested in the movement of the hands. In our data this corresponds most closely to the movement of the wrist through space (as opposed to hand-internal motion, which is currently still hard to capture). In previous work, we visualized the path of the hand motion as a color-coded trail of spheres through space (Heloir et al., 2010). Color-coding allows us to show the different movement phases (preparation, stroke, retraction etc.) along this trail. We also visualized the speed of the motion by adding orthogonal 2D circles whose diameter is proportional to the hand's speed at that point. We have now added a representation in the form of an arrow whose length is proportional to the hand's speed, to make the current direction and speed more visible (Fig. 2).

Figure 2: Direction and speed of hand movement are depicted by an arrow along the tangent of the current movement where the arrow's length is proportional to the hand's speed.

This visualization can be useful in the analysis of single gestures.
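A possible construction of this tangent arrow, again as an illustrative sketch rather than the tool's actual code (the scale constant is an assumed tuning value): the arrow starts at the current wrist position and points along the finite-difference velocity, with its length proportional to the speed.

```java
/** Illustrative sketch: an arrow along the movement tangent whose length encodes wrist speed. */
public final class HandSpeedArrow {

    /** Scale factor mapping speed (m/s) to arrow length (m); an assumed tuning constant. */
    private static final double LENGTH_PER_SPEED = 0.2;

    /**
     * Returns {startX, startY, startZ, endX, endY, endZ} of the arrow for the current frame,
     * computed from the previous and current wrist positions and the frame rate.
     */
    public static double[] arrow(double[] wristPrev, double[] wristCurr, double fps) {
        double vx = (wristCurr[0] - wristPrev[0]) * fps;
        double vy = (wristCurr[1] - wristPrev[1]) * fps;
        double vz = (wristCurr[2] - wristPrev[2]) * fps;
        double speed = Math.sqrt(vx * vx + vy * vy + vz * vz);
        if (speed < 1e-6) {                       // no movement: degenerate arrow at the wrist
            return new double[] { wristCurr[0], wristCurr[1], wristCurr[2],
                                  wristCurr[0], wristCurr[1], wristCurr[2] };
        }
        double len = LENGTH_PER_SPEED * speed;    // arrow length proportional to speed
        return new double[] {
            wristCurr[0], wristCurr[1], wristCurr[2],
            wristCurr[0] + vx / speed * len,      // unit tangent direction scaled by len
            wristCurr[1] + vy / speed * len,
            wristCurr[2] + vz / speed * len
        };
    }
}
```
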

3.3 Gesture Space
In gesture research, the location of the hands during the decisive phases of the gesture (typically stroke and independent hold) is a meaningful aspect of the gesture. McNeill (1992) suggested a scheme called gesture space that decomposes the frontal plane into various sections: the extreme periphery, periphery, center and center-center (Fig. 3).

Figure 3: Gesture space (taken from McNeill, 1992).

With motion capture data, the location of the hands in gesture space can be automatically determined and visualized. For visualization, we attach a planar gesture space grid in front of the figure (Fig. 4). We compute whether a hand is within a section. If this is the case, the section is highlighted in either yellow (right hand), green (left hand) or red (both hands).

Figure 4: Gesture space is attached as a grid to the skeleton. Sections are highlighted whenever a hand is located within it (yellow: right hand, green: left hand, red: both).

We decided to keep the gesture space grid upright at all times, i.e. it does not bend when the figure's upper body bends. This not only corresponds to McNeill's methodology, where a 2D video view is annotated, but also avoids visual motion clutter where the grid would constantly make small tilting movements. Also, if a subject bends, the concept of gesture space is of limited use, and our priority was to make our visualizations as easy to read as possible, adding as little distraction as possible. Of course, our grid does follow the figure as it is always positioned in front of and parallel to the shoulders.

This visualization is useful in gesture research. The automatically detected location in gesture space can easily be exploited for the automatic annotation of gesture location.
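How a hand might be assigned to a section can be sketched as follows; this is an assumption-laden illustration, not the published algorithm: the wrist position is first expressed in the 2D coordinates of the upright grid (lateral and vertical offset from the grid center), and then classified into nested zones whose sizes are placeholder values rather than McNeill's.

```java
/** Illustrative sketch: classify a hand position (in grid coordinates) into a gesture-space zone. */
public final class GestureSpaceGrid {

    public enum Zone { CENTER_CENTER, CENTER, PERIPHERY, EXTREME_PERIPHERY }

    // Half-widths (m) of the nested rectangles, measured from the grid center; assumed values.
    private static final double CENTER_CENTER_HALF = 0.15;
    private static final double CENTER_HALF        = 0.35;
    private static final double PERIPHERY_HALF     = 0.60;

    /**
     * @param u lateral offset of the wrist from the grid center (m), in the plane of the grid
     * @param v vertical offset of the wrist from the grid center (m)
     */
    public static Zone classify(double u, double v) {
        double r = Math.max(Math.abs(u), Math.abs(v));  // Chebyshev distance -> rectangular rings
        if (r <= CENTER_CENTER_HALF) return Zone.CENTER_CENTER;
        if (r <= CENTER_HALF)        return Zone.CENTER;
        if (r <= PERIPHERY_HALF)     return Zone.PERIPHERY;
        return Zone.EXTREME_PERIPHERY;
    }
}
```

The section returned for the right, left, or both hands would then drive the yellow, green, or red highlighting described above.
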

4. Multi-Party Visualizations
The following visualizations concern the relationship between multiple people. They can be used in the context of proxemics (Hall, 1966) and/or when studying social interactions in terms of e.g. F-formations (Kendon, 1990) or micro-territories (Scheflen, 1975).

4.1 Interpersonal Distance
In his theory of proxemics, Edward Hall (1966) introduced interpersonal space as a meaningful aspect of nonverbal communication. He divided the possible distance between two interlocutors into four functionally different spaces: intimate, personal, social, public. The exact sizes of these zones differ across cultures, e.g. sizes are larger in northern European countries and smaller in southern European countries.

In 3D, it can be hard to see how far apart people are from one another unless one looks at them from a bird's eye view, which comes at the cost of having multiple views. Therefore, we visualize distance with an ellipsoid between the feet of the interlocutors that becomes thicker (more circular) if people are closer and slimmer if people are farther away from one another (Fig. 5). Thus, there are two shape cues for distance: the length of the ellipsoid and its thickness. Moreover, we color-code the proxemic zones (Fig. 6), i.e. for each of the four zones the ellipsoid changes to a specific color. Finally, we put the precise distance as text into the ellipsoid.

Figure 5: Distance between figures can be hard to read in 3D. Our proxemic visualization displays a color-coded ellipsoid between the feet of the figures which becomes more circular (thicker) the closer the figures are.

Figure 6: Hall's (1966) proxemic zones are visualized by color and shape. From center to periphery: intimate (red), personal (orange), social (yellow), public (green).

The exact sizes of the zones, e.g. at how many meters the "personal zone" starts and ends, can be changed in a configuration file to keep the visualization adjustable to different cultures (Hall, 1966).

4.2 Relative Body Movement
In a two-person situation (dyad) it is meaningful whether person A is approaching person B, moving away or moving sideways. To visually clarify this relation, we combine the single-person visualization for body motion (disc and arrow, see Sec. 3.1) with a small marker which shows the position of the interlocutor. With each figure having a "little radar" around its hip, it is easy to see how the figure is moving relative to another figure. To clarify which figure the little marker refers to, we color-code the marker. In Fig. 7, figure A has a yellow disc and figure B has a blue disc. On figure A's "radar" disc, figure B is then represented with a blue marker. The relation between the figure's own speed arrow and the marker makes clear whether the figure is approaching the other figure, moving sideways, etc.

Figure 7: To visualize relative movement we use the body motion discs and add a little marker which represents the other figure. The little blue marker on the left figure's disc corresponds to the right figure.
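To make the proxemic computations of Secs. 4.1 and 4.2 concrete, here is a small illustrative sketch (not the JMocap implementation; it assumes positions in meters with a y-up coordinate system, and the zone boundaries are Hall's commonly cited values, which in the tool would come from the configuration file): it derives the ground-plane distance between two figures, maps it to a proxemic zone, and computes the angle between a figure's own movement direction and the direction to its interlocutor, which is what the "radar" marker lets the analyst read off visually.

```java
/** Illustrative sketch: interpersonal distance, Hall zone, and relative-movement cue for a dyad. */
public final class ProxemicsCue {

    public enum Zone { INTIMATE, PERSONAL, SOCIAL, PUBLIC }

    // Zone boundaries in meters (Hall's commonly cited values; assumed to be configurable).
    private static final double INTIMATE_MAX = 0.45;
    private static final double PERSONAL_MAX = 1.2;
    private static final double SOCIAL_MAX   = 3.6;

    /** Distance between two hip positions {x, y, z} projected onto the ground plane (x/z). */
    public static double groundDistance(double[] hipA, double[] hipB) {
        double dx = hipB[0] - hipA[0];
        double dz = hipB[2] - hipA[2];
        return Math.sqrt(dx * dx + dz * dz);
    }

    /** Maps a distance to one of Hall's four proxemic zones. */
    public static Zone zone(double distance) {
        if (distance <= INTIMATE_MAX) return Zone.INTIMATE;
        if (distance <= PERSONAL_MAX) return Zone.PERSONAL;
        if (distance <= SOCIAL_MAX)   return Zone.SOCIAL;
        return Zone.PUBLIC;
    }

    /**
     * Angle (degrees, 0..180) between figure A's movement direction and the direction to figure B,
     * both on the ground plane: ~0 means A is approaching B, ~180 means A is moving away,
     * ~90 means A is moving sideways relative to B.
     */
    public static double approachAngle(double[] velocityA, double[] hipA, double[] hipB) {
        double vx = velocityA[0], vz = velocityA[2];
        double dx = hipB[0] - hipA[0], dz = hipB[2] - hipA[2];
        double vLen = Math.hypot(vx, vz), dLen = Math.hypot(dx, dz);
        if (vLen < 1e-6 || dLen < 1e-6) return 0.0;   // no motion or coincident positions
        double cos = (vx * dx + vz * dz) / (vLen * dLen);
        cos = Math.max(-1.0, Math.min(1.0, cos));     // clamp against rounding errors
        return Math.toDegrees(Math.acos(cos));
    }
}
```
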

4.3 Relative Body Orientation
When two people are communicating with each other, it is meaningful how they are oriented toward each other (or turned away from one another). This can be used to determine the F-formation according to Kendon (1990). Orientation for two people has two aspects: the orientation of a single speaker toward the other (is he facing the other or looking away?) and the total angle between the two (if both are looking away, how much so?). Therefore, we conceived one visualization for each figure and one for both figures which is positioned exactly in the middle between the two (Fig. 8).

The individual figure's visualization shows two arrows: the first arrow points in the direction of the other speaker, the second arrow shows the figure's own upper body orientation. The angle between the two arrows shows how much the figure is averted from or facing his interlocutor. Instead of upper body orientation, the second arrow could show the direction of the head or the eye gaze direction (not implemented).

The second visualization, which is placed in the middle, shows the two upper body orientation arrows of the two speakers together. This represents how much both are averted. If the arrows form a straight line, the speakers face each other. Otherwise, the stronger the divergence from the straight line, the more averted they are. All angles are also shown numerically.

Figure 8: Two visualizations for interpersonal orientation. At the feet of a figure there is one arrow pointing to the other speaker and one arrow for the upper body orientation. Between the figures, two arrows show the "overall" orientation relation, i.e. how much the two face each other, taking both figures' orientations into account.

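The two orientation measures can be sketched as follows (again an illustration under assumptions rather than the tool's code: y-up coordinates, and each figure's upper-body orientation given as a ground-plane facing vector): the first angle expresses how much a figure is averted from its interlocutor, the second expresses how much the dyad as a whole diverges from the face-to-face configuration.

```java
/** Illustrative sketch: orientation angles for a dyad on the ground plane (y-up assumed). */
public final class OrientationCue {

    /** Unsigned angle (degrees, 0..180) between two ground-plane directions. */
    private static double angleBetween(double ax, double az, double bx, double bz) {
        double aLen = Math.hypot(ax, az), bLen = Math.hypot(bx, bz);
        if (aLen < 1e-6 || bLen < 1e-6) return 0.0;
        double cos = (ax * bx + az * bz) / (aLen * bLen);
        cos = Math.max(-1.0, Math.min(1.0, cos));     // clamp against rounding errors
        return Math.toDegrees(Math.acos(cos));
    }

    /** How much figure A is averted from figure B: angle between A's facing direction and the direction A->B. */
    public static double aversion(double[] facingA, double[] posA, double[] posB) {
        return angleBetween(facingA[0], facingA[2], posB[0] - posA[0], posB[2] - posA[2]);
    }

    /**
     * Overall orientation relation: 0 when the two facing directions are exactly opposed
     * (the figures face each other), growing as the configuration diverges from a straight line.
     */
    public static double mutualDivergence(double[] facingA, double[] facingB) {
        // Compare A's facing direction with the reversal of B's facing direction.
        return angleBetween(facingA[0], facingA[2], -facingB[0], -facingB[2]);
    }
}
```
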
5. Conclusion and Future Work
Although 3D recordings have much to offer in terms of information, it is challenging to visualize this information in a way that makes analysis easier and potentially more substantial. In this paper, we have presented six visualizations that aim at supporting the manual annotation and analysis of nonverbal communication.

The presented techniques are implemented in Java and integrated into the publicly available open-source JMocap software (https://code.google.com/p/jmocap). In future work, we will integrate the visualization techniques into the ANVIL video annotation tool (http://www.anvil-software.org) (Kipp, 2001, 2012, 2012b, in press). The challenge will be to offer controls to combine various visualizations easily and to optimize them according to the user's needs. Moreover, it is necessary to ensure that the visualizations are also compatible and configurable with three or more figures without making the view visually cluttered.

In the future, multi-party interactions with three or more people may necessitate new visualizations for group constellations, as in interaction process analysis (Bales, 1951). Another relevant addition would be the automatic analysis and visualization of posture (e.g. open vs. closed). Moreover, our techniques need to be evaluated by annotation/analysis experts in two regards: first, whether the visualizations increase coding reliability, i.e. whether manual annotations become more consistent, and, second, whether the discovery of new phenomena is facilitated or enabled by the richer information and information visualization.

6. References
Bales, R. F. (1951). Interaction Process Analysis. Chicago University Press, Chicago.
Battersby, S. and Healey, P. G. T. (2010). Head and hand movements in the orchestration of dialogue. Proceedings of the 32nd Annual Conference of the Cognitive Science Society.
Brüning, B., Schnier, C., Pitsch, K. and Wachsmuth, S. (2012). Integrating PAMOCAT in the research cycle: linking motion capturing and conversation analysis. Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI '12). ACM, New York, pp. 201-208.
Hall, E. T. (1966). The Hidden Dimension. Doubleday, New York.
Heloir, A., Neff, M. and Kipp, M. (2010). Exploiting Motion Capture for Virtual Human Animation: Data Collection and Annotation Visualization. Proceedings of the LREC Workshop on "Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality". ELDA.
Kendon, A. (1990). Spatial organization in social encounters: the F-formation system. In A. Kendon, Conducting Interaction: Patterns of Behavior in Focused Encounters. Cambridge University Press, pp. 209-237.
Kipp, M. (2001). Anvil - A Generic Annotation Tool for Multimodal Dialogue. Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech), pp. 1367-1370.
Kipp, M. (2012). Annotation Facilities for the Reliable Analysis of Human Motion. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC). ELDA, Paris.
Kipp, M. (2012b). Multimedia Annotation, Querying and Analysis in ANVIL. In M. Maybury (Ed.), Multimedia Information Extraction: Advances in Video, Audio, and Imagery Analysis for Search, Data Mining, Surveillance and Authoring, Chapter 21. John Wiley & Sons, pp. 351-368.
Kipp, M. (in press). ANVIL: A Universal Video Research Tool. In J. Durand, U. Gut, G. Kristofferson (Eds.), Handbook of Corpus Phonology. Oxford University Press.
Kirmizibayrak, C., Honorio, J., Jiang, X., Mark, R. and Hahn, J. (2011). Digital Analysis and Visualization of Swimming Motion. The International Journal of Virtual Reality, Vol. 10 (3), pp. 9-16.
McNeill, D. (1992). Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press.
Quek, F., McNeill, D., Bryll, R., Duncan, S., Ma, X.-F., Kirbas, C., McCullough, K. E. and Ansari, R. (2002). Multimodal human discourse: gesture and speech. ACM Transactions on Computer-Human Interaction, Vol. 9 (3), pp. 171-193.
Scheflen, A. E. (1975). Micro-territories in human interaction. In A. Kendon, R. M. Harris and M. R. Key (Eds.), The Organization of Behavior in Face-to-Face Interaction. The Hague: Mouton Publishers, pp. 159-173.
Zammitto, V. (2008). Visualization techniques in video games. Proceedings of Electronic Visualisation and the Arts (EVA 2008).
