

Representational Gestures Reflect Conceptualization in Problem Solving

Kirsten J. Muser L-SAW 2010
Dr. Padraig O'Seaghdha, Advisor

Dr. Barbara Malt, Second Reader

Lehigh University


The Information Packaging Hypothesis (Alibali, Kita & Young, 2000) holds that gestures play a role in organizing information into conceptual "packages" to be verbalized by the speaker. Representational gestures may be specific for strengthening conceptual representations in working memory. This embodied spatial content may facilitate verbal explanations of both physical and more abstract problems. Experiment 1 was designed to analyze transfer of format-specific gestures. Participants solved the Tower of Hanoi problem in a physical or computerized version and then explained the solution of the same or opposite format. Representational gestures were format-specific in the same conditions but did not show dominance of experienced gestures in transferring to opposite formats. In Experiment 2, participants first performed the standard Tower of Hanoi problem and then explained an analogous Russian Dolls problem while gesture was allowed or precluded. The problem was congruent (direct mapping) or incongruent (inverse size mapping of tower pieces and dolls). In this case, representational gestures did carry over to the analog, but with adaptations. Under congruent mapping, they were modified to "doll-appropriate" holding formations, whereas under incongruent mapping speakers adhered to disc-appropriate grasping gestures. Gesture prohibition impeded fluency in explanations of the Russian Dolls problem. Representational gestures are therefore not only imagistic, but are linked to underlying structures for conveying conceptual material. More broadly, explanation of analogs is a fruitful setting for exploring the cognitive utility of representational gestures.

Representational Gestures Reflect Conceptualization in Problem Solving

Though they are widely observed, some of the gestures we produce may be mere handwaving or floundering to indicate speakers' frustrations with producing words. This would render gesture production an automatic but gratuitous behavior that we do not consciously suppress when it is unnecessary. Conversely, we may produce some gestures when they are not necessary for listener comprehension because they provide cognitive utility for the speaker.

Fig. 1: General language production model

Representational gestures are defined by Chu and Kita (2008) as iconic gestures in which the hands represent placement or movement of an object. These gestures may not be floundering that signals dysfluency, but rather a means of preventing it by giving the speaker access to appropriate mental imagery before the onset of speech. Under this view, representational gestures may serve as an aid for the speaker in alleviating production difficulties. General models of the language production system (e.g., Levelt, 1989) support the idea that non-linguistic ideas are first conceptualized before they are structured as well-formed sentences. A critical question, then, is whether gesture supports conceptualizing the message or formulating the sentence. If moments of conceptual dysfluency reflect demand on a speaker's production system, gestures may help speakers alleviate them by providing a format for remembering more visuospatial details (Goldin-Meadow, Nusbaum, Kelly & Wagner, 2001). This possibility is consistent with the idea that representational gesture supports the organization of conceptual material for later verbalization (see Fig. 1).

We often find our hands moving when we are providing narrations (McNeill, 1992) or explaining how we discovered a solution to a problem (Wagner, Nusbaum, and Goldin-Meadow, 2004). These tasks are quite complex. The mind must take a concept or event and break it up into segments that flow together logically. We spatialize events in time and verbalize them in order to make them accessible to others. However, the English language has a relatively limited spatial vocabulary, which constrains how well it can describe objects in space or indicate directional information (Wagner et al., 2004). Pausing or slowing down speech may occur when a speaker needs to consider the best way to structure statements which convey directional information. Furthermore, by enhancing the ability to conceptualize this non-linguistic information, gesture may alleviate dysfluencies in explaining a problem's solution to somebody else.

By focusing on gesture's role in problem solving, we can address the limits of speech in problem solving, as well as whether speakers compensate for these limits by using gesture to build conceptual representations for speech. Previous research has specifically focused on gesture in the case of explaining how to solve the Tower of Hanoi (Wagner & Goldin-Meadow, 2004; Wagner Cook & Tanenhaus, 2009). The Tower of Hanoi is a puzzle consisting of three or more graduated discs to be moved from a source peg to a goal peg (e.g., Hayes & Simon, 1977). The discs must be moved according to two rules. The first rule states that only one disc can be moved at a time; the second rule prohibits the solver from placing a larger disc on top of a smaller one. Many conceptual features of explaining the steps of this sequential puzzle seem to support representational gesture production. However, the Tower of Hanoi can also be represented in a diverse range of analogs, all of which differ in their respective problem spaces. Solving analogies requires a transfer of relational knowledge from a source problem to a target by finding correspondences between the two analogs (e.g., Yaner & Goel, 2006). Thus, representational gesture processes used in explaining how to solve the Tower of Hanoi may be critically necessary in the case of analogical problem solving. Building conceptual representations is a critical element of solving these problems, for the solver must store in memory the structure of the source analog. By doing this, it becomes easier to track the analogical mapping between two problems, especially if the source analog can be represented in the mind as a concrete image. The solver can "map" the properties of the target analog to the source analog in order to more easily visualize critical aspects of the target analog's problem space. The target problem's steps are now much easier to picture in this "mental map" following transfer between the two formats. However, non-linguistic spatial mental representations of solution-relevant details are difficult to conceptualize for speaking. Given the limited number of words for spatial references in English, speakers may find it desirable to move their hands to "visualize" the concepts that are to be verbalized.

Gesture may be critical for conceptualizing this non-linguistic information to eventually be placed in speech. Specifically, representational gestures may be a reflection of how solvers conceptualize particular problems (Chu & Kita, 2008). In the present study, I will address whether representational gestures help capture solution-relevant details and carry forward to different problem formats. I will evaluate whether speakers produce gestures that reflect thinking in a particular problem format, whether there is transfer of format-specific gestures, and why speakers might prefer to gesture in one format over another. Furthermore, I will evaluate whether format-specific gestures carry forward to facilitate explanations of novel problems, and what these gestures say about how we conceptualize easy versus difficult information. If these format-specific representational gestures are associated with conceptualizing difficult material, then they may provide cognitive utility for speakers.

Embodiment of Spatial Content

Rauscher et al. (1996) showed that levels of gesture tend to increase when the content of speech is spatial. In their study, speakers were prohibited from gesturing when describing a cartoon and were required to use as many uncommon words as possible or to avoid using words that contained a specified letter. This was done to specifically increase dysfluencies and induce speech that would not normally be found in the speaker's usual register. Their results suggested that gestures derived from spatially encoded knowledge facilitate lexical retrieval during grammatical encoding, but only for speech terms that indicated directional information. When gesturing was prevented, participants' ability to describe spatial content was slowed down, but non-spatial descriptions (for example those including idioms "the coyote ended up hoisted by his own petard") were not affected. Here, gesture may have served as a tool to embody spatial content. Preventing gesture may have interfered with access to information in visuospatial memory. This Lexical Retrieval Hypothesis states that the role of gesture is to help generate the surface forms of utterances when speech production is restrained by a limited spatial vocabulary.

Conversely, Alibali et al. (2000) found gesture's utility to be conceptual. In their study, children who were placed in situations of constrained thinking, such as in a Piagetian conservation task, gestured more than in a purely descriptive task. Because verbal responses were comparable across tasks that differed in their degrees of information packaging, the gestures may have played a role in organizing that information. This Information Packaging Hypothesis posits that gestures have cognitive utility for placing specific conceptual information into "packages" that help structure the elicited speech, supporting evidence from McNeill (1992) that gesture plays a role in thinking for speaking by enhancing conceptualization of nonlinguistic material. This form of conceptualization allows speakers to organize a string of concepts into mental representations, which are further broken down into verbalizable units (Kita, 2000). Furthermore, gesture may also influence mental representations of sequential tasks by focusing speakers' attention on particular features of their situations (Alibali & DiRusso, 1999). Under this hypothesis, prohibiting gesture leads to difficulties conceptualizing speech. Thus, when speakers cannot gesture, they may take time to look for different ways to package that information, increasing their number of speech dysfluencies (Alibali et al., 2000).

Conveying Procedural Knowledge

Gesture may be especially useful in instances of communicating information about task execution, where the knowledge to be conveyed is procedural. According to Willingham, Nissen, & Bullemer (1989), procedural knowledge is not always linguistically accessible. This may be due to the conceptual demands associated with breaking down a task into discrete steps that follow logically to achieve a particular goal. Furthermore, this process is especially difficult in attempting to convey that knowledge after the task has been performed, when visual cues have been removed. A question then, is whether "action" information is encoded in gesture when procedural knowledge is conveyed to others, especially when this information embodies spatial content. Space and action descriptions are known to elicit iconic gestures (McNeill, 1992; Rogers, 1978, 1979) which may play a role in conveying how solvers reach the end state of a sequential problem.

Lozano and Tversky (2006) found that limiting speech not only increased the number of gestures needed to convey procedural knowledge, but showed which gestures were most critical for achieving the goal of assembling a piece of furniture. In particular, when speakers were prohibited from speaking, representational gestures that demonstrated how and where to properly place the furniture parts were selectively used over deictic gestures such as finger points or mere indications. These iconic gestures carried semantic content which represented attributes, movement, and relationships between objects in the steps of the sequence. Participants whose speech was constrained showed a selective increase for these iconic gestures to provide as complete a picture of the procedure as possible. From these findings, we can conclude that speaking best conveys descriptive information about how a problem space is represented in the mind, but gesture which models action is selectively chosen for demonstrating actions when speech is not sufficient for creating a mental image in the mind of a listener. Therefore, representational gestures which demonstrate actions may provide us insight for understanding how we represent and conceptualize solution-relevant details of particular problems.

Reflections of Conceptualization

Evidence for the Information Packaging Hypothesis is observed in Wagner Cook and Tanenhaus's (2009) study involving solving the Tower of Hanoi. These researchers found that explaining how to solve a physical version of Tower of Hanoi elicits different representational gestures than when explaining how to solve a computer version. The representational gestures in explaining the physical version took the handshape form of grasping the discs with the thumb and forefinger as opposed to the flatter handshape which resembles holding a mouse. Although the speech patterns were similar across both explanations, those who solved the problem using real objects reproduced features of those actions in their gestures more than those who solved the problem by moving those objects with a mouse on a computer screen. The gestures that accompanied the subsequent explanations of how to solve the problem reflected procedural knowledge of the task, where certain concepts can be expressed in the hands as well as through speech. These gestures represented strategies derived from concrete visuospatial concepts. Because verbal responses were similar across the tasks that differ in the type of information encoded (for example, differences between physical and computer versions), gesture may play a role in organizing that information conceptually.

Wagner Cook, Mitchell, and Goldin-Meadow (2008) also addressed gesture's role for directly encoding visuospatial information by manipulating children's gestures when learning abstract mathematical concepts. Children who were permitted to gesture while learning the mathematical concepts retained that knowledge better than children who were only allowed to speak. These findings suggest that gestures serve as a way to spatially represent new, abstract ideas in visuospatial memory, especially in situations of quantitative reasoning. Expressing information in gesture and speech, instead of speech alone, might produce robust memory representations as a result of strong, concrete motor movements, a form of using the hands to train the mind. These movements encoded procedural concepts that may have been retrieved at a later time, especially for the task of conveying action information to others.

More specifically, kinesthetic movements might focus a speaker's attention on selecting pieces and how they relate to produce complex sequences. A speaker can use representational gestures, or movements of their hands and arms that depict the image they are describing (McNeill, 1992) for the purpose of conceptualizing a complicated spatial image into units for speaking. These were precisely the findings of Hostetter, Alibali, and Kita (2007). If representational gestures are indicative of underlying mental representations, then speakers should gesture more when conceptualizing more difficult information. In the Hostetter et al. (2007) study, when speakers described ambiguous pictures of dots that were scattered (abstract), higher gestural levels emerged than when they described dots connected by concrete, geometric shapes. Representational gestures seemed to indicate speakers' attempts to make their mental representations a bit more concrete before placing them into speech. This evidence supports the claim that representational gestures should occur more often when spatial information needs to be organized. This type of conceptual demand differs considerably from that which involves integrating a series of logical steps to describe spatial information. Since the spoken spatial information is more ambiguous, the speaker must do more to chunk it into appropriate units, and would be more likely to use gesture as a strategy to organize such information. This type of strategy may be especially helpful in cases of conceptualizing novel analogs.

Analogical Transfer

Analogical reasoning involves a transfer of relational knowledge from a source problem to a target problem by finding appropriate correspondences between the two problem spaces. This process involves transferring the relational structure from the source to the target and then aligning it with the target. To adapt the solution of the source to the target, the problem solver must store in memory the structure of the source (Yaner and Goel, 2006). This process involves mapping aspects of visual and conceptual information to target analogs. When a person selects the relevant information, that visuospatial information is matched with that of the target. When both problems share common structural attributes, it is more likely that the solver will be able to move back and forth between them. However, when common objects share only functional relations but differ in object attributes, mapping between the two tasks becomes much more difficult (Chen, Mo, and Honomichl, 2004). An example of this would be the Tower of Hanoi problem and its analog, the Tea Ceremony problem:

In the inns of certain Himalayan villages is practiced a refined tea ceremony. The ceremony involves a host and exactly two guests, neither more nor less. When his guests have arrived and seated themselves at his table, the host performs three services for them. These services are listed in the order of the nobility the Himalayans attribute to them: stoking the fire, fanning the flames, and pouring the tea. During the ceremony, any of those present may ask, "Honored Sir, may I perform this onerous task for you?" However, a person may request of another only the least noble of the tasks which the other is performing. Furthermore, if a person is performing any tasks, then he may not request a task that is nobler than the least noble task he is already performing. Custom requires that by the time the tea ceremony is over, all the tasks will have been transferred from the host to the most senior of the guests. How can this be accomplished?

Rather than thinking about the problem in terms of transferring objects, the solver must refer to the Tower of Hanoi "disc sizes" abstractly as "least noble" and "most noble." Additionally, the solver must refer to leftmost and rightmost pegs as "least senior" and "most senior." Thus, the step "move the smallest disc to the rightmost peg" becomes "have the most senior guest do the least noble task." In this particular situation, the goal is to have all of the tasks shift from the host to the most senior of the guests. The host is analogous to the leftmost Tower of Hanoi peg, while the most senior guest is analogous to the rightmost peg. The least noble task of "stoking the fire" is analogous to the smallest disc while "pouring the tea" is analogous to the largest disc. Therefore, explaining this problem's solution is a matter of stating that the most senior guest must first do the least noble task, the second most senior guest do the second most noble task, and so on. However, the underlying mental representation of the concrete Tower of Hanoi makes it easier for the solver to visualize which guest does which task at exactly the right moment in the sequence. It would be very difficult to solve this conceptual problem and keep track of the steps of the solution without a spatial, mental representation encoded from the original Tower of Hanoi. Therefore, the underlying mental representations of both the Tower of Hanoi and the Tea Ceremony are spatial, despite the fact that they describe a different set of verbalized action sequences (speaking in spatial terms versus non-spatial speech).
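The correspondence described above can be written out as a simple lookup. The following is only an illustrative sketch: the function name `to_tea_ceremony`, the peg labels, and the phrase "the other guest" are my own assumptions and are not taken from the problem materials.

```python
# Correspondences between the two problem spaces, as described in the text.
# Peg labels and wording are illustrative assumptions.
PEG_TO_PERSON = {
    "left": "the host",
    "middle": "the other guest",
    "right": "the most senior guest",
}
DISC_TO_TASK = {
    1: "the least noble task",   # smallest disc
    2: "the middle task",
    3: "the most noble task",    # largest disc
}


def to_tea_ceremony(disc, from_peg, to_peg):
    """Restate one Tower of Hanoi move as a transfer of a task between people."""
    return (
        f"{PEG_TO_PERSON[to_peg]} takes over {DISC_TO_TASK[disc]} "
        f"from {PEG_TO_PERSON[from_peg]}"
    )


# "Move the smallest disc to the rightmost peg" in Tea Ceremony terms.
print(to_tea_ceremony(1, "left", "right"))
```

The spatial move and its verbal counterpart share one underlying structure. Only the surface vocabulary changes, which is why a solver who retains the concrete Tower of Hanoi image can use it to track the Tea Ceremony steps.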

Previous research has examined many of the gestural effects when explaining how to solve a spatial problem like the Tower of Hanoi; however, it has not yet been determined whether there are transfers of gesture across different problem formats. Furthermore, this research has not yet determined whether experience with the same sensorimotor information is critical for facilitating the conceptualization of novel problems. Gestural representations of one problem could also be useful when working with more abstract representations of the same task goal. All of the analogs of the Tower of Hanoi contain similar respective problem spaces, but these might not be very obvious to the solver, depending on how they are stated. Still, thinking about the Tea Ceremony analog brings to mind many of the same conceptual representations as the Tower of Hanoi. Regardless, the problem is highly verbal and does not explicitly state the same amount of spatial information.

Representational gestures may facilitate analogic problem solving by having an impact on working memory. Specifically, they may play a role in illuminating and retrieving mental representations of the source analog to be transferred to the target. Wagner et al. (2004) found that gesturing facilitates recall of visuospatial details. For this to be true, representational gestures must also be linked to processes for conveying conceptual material. If conceptual information underlies gestures associated with describing information in visuospatial memory, those gestures might also be beneficial for solving analogous problems containing more conceptual information. In other words, when the mental representations underlying speech are spatial in nature, gesture may act on those visuospatial details in order to help the solver parse the sequence of actions as he or she explains how to solve an analog.

Transfer of Representational Gestures

So far, we have seen a good deal of supporting evidence for the production of representational gestures in the conceptual phase of speech planning. Furthermore, these representational gestures seem to be a reflection of how problem solvers work in particular problem spaces (Wagner Cook & Tanenhaus, 2009). The present study sought to extend existing research by not only noting how representational gestures reflect specific problem spaces, but whether features of them transfer to different formats. Experiment 1 was designed to determine whether embodied gesture effects transfer across situations with formats of differing concreteness. This experiment extended the findings of Wagner Cook & Tanenhaus (2009) by comparing gestures in solving two different versions of the Tower of Hanoi. In a physical version of the task, solvers manipulated concrete discs of the puzzle with their hands, while the computer version involved clicking and dragging the discs on a screen with a mouse. I examined whether the effects of being in either the physical or computer versions would transfer across changing situations (i.e., whether the gestures from the physical experience transfer when explaining the computer version and vice versa). For participants presented with a computer version of the task following the physical experience, I predicted that the physical action gestures would tend to "carry over" in those explanations, indicating that the speaker called on previous gestural experiences to help facilitate the explanation. On the other hand, if a speaker is placed in the physical situation following the computer experience, he or she might still produce some grasping gestures without actually having the experience of physically manipulating the concrete objects. I will call this an import effect, where previous general knowledge, but not recent concrete experience, is reflected in the approach to the current problem. 
These transfer and import effects between two very similar formats served as a manipulation check to be extended into Experiment 2. Conceptualizing information in different formats should be reflected in the different representational gestures produced by speakers, even when they are speaking about the same thing. I examined whether the mapping between source action gestures formed from the physical Tower of Hanoi could aid a speaker in explaining how to solve an analog.

Experiment 2 addressed the role of gesture in a situation where conceptual demand is raised after transferring to the more verbal format of an analog. Spatial imagery will still be needed for solving a particular analog of the Tower of Hanoi, and I wish to address whether this will engage gesture as it does in concrete problems. If a speaker is given the opportunity to gesture when explaining how to solve the analog, the capacity to verbalize information that is stored in a visuospatial format may be greater (Goldin-Meadow et al., 2001). Thus, when speaking about the abstract analog, the mental representations strengthened by gesture might shift some of the cognitive load from other areas of working memory. When information is more evenly distributed across these modalities, speakers may find it easier to place certain information in the hands and certain information through speech. Such information might be reflective of how speakers are conceptualizing in different problem spaces.

One purpose of Experiment 2 was to expand the study of representational gesture to a more abstract situation. Another purpose of the experiment was to test the degree to which representational gestures are imagistic and are linked to underlying structures for conveying abstract, conceptual material. I manipulated the mental representations of some of the speakers by requiring them to invert the first Tower of Hanoi task in order to be able to solve the analog. This manipulation increased the conceptual demand of the task. If speakers found a benefit in using gesture to handle analogous verbal material that involved spatial inversion, then these gestures would provide insight into how the more complex image is broken down. Overall load may therefore be decreased, making it easier for the speaker to give a thorough explanation of how to solve the problem, providing greater insight into the underlying structure and curbing speech dysfluencies that would result from decreased ability to organize conceptual information.

Experiment 1: Physical – Computer Transfer

Experiment 1 was designed to determine whether the embodied gestures found in Cook and Tanenhaus (2009) would transfer to different formats, while also replicating their format-specific findings. I was interested in comparing format-specific gestures that may differ when comparable speech is used to explain the solution to the Tower of Hanoi. However, my study goes beyond that of Cook and Tanenhaus (2009) because I was also interested in seeing whether strong action gestures from one task carried over to explanations of the alternate task solution. Therefore, I presented some participants with the alternative version of the task they had just completed and asked them how to solve that version. If participants gesture differently despite similar speech in direct explanations of the physical or computer format, my results will build further supporting evidence for the Information Packaging Hypothesis (Alibali et al., 2000). Transferring the gestures when describing the alternative task would be a carryover effect, reflecting retrieval of gestures from the immediately preceding experience with the physical or computer task. More carryover is predicted in physical-computer transfers, because the grasping handshapes seen in the physical format are more specific. In addition to carryover, I predicted an import effect, where the properties of the current task determined the gestures used, rendering the previous experience less relevant.



Participants were Lehigh University undergraduates (n = 32) enrolled in an introductory psychology course. They participated to gain direct experience of the research process. All participants reported that they were fluent in English. None reported recent experience with the Tower of Hanoi.


I used a physical version of the Tower of Hanoi which consisted of three wooden pegs and three wooden discs. Additionally, I used a similar computerized three disc version which presented the Tower of Hanoi task clearly to the participants (Wong, n.d.). Participants solved this version of the puzzle on the computer screen.


Fig 2: Format Transfers

Participants were placed in one of four conditions: physical-to-physical, physical-to-computer, computer-to-computer, and computer-to-physical (see Figure 2). In the physical-to-physical condition, participants worked with the physical version of the puzzle and then explained its solution without any exposure to the computerized version. In the physical-to-computer condition, participants worked with the physical version, and watched the experimenter partially solve the computerized version before explaining the latter. In the computer-to-computer condition, participants worked with only the computerized version and then explained its solution. In the computer-to-physical condition, participants solved the computerized version of the puzzle and then explained the physical version, which they never actually solved themselves. Thus, the experiment used a 2 Experience format (physical, computer) by 2 Explanation format (physical, computer) between-groups design.
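The four conditions are simply the crossing of the two factors. As a minimal sketch (the variable names and dictionary keys are illustrative and not part of the study materials), they can be enumerated as follows:

```python
from itertools import product

FORMATS = ("physical", "computer")

# Each condition pairs the format a participant solved (experience)
# with the format he or she then explained (explanation).
conditions = [
    {"experience": exp, "explanation": expl, "label": f"{exp}-to-{expl}"}
    for exp, expl in product(FORMATS, repeat=2)
]

for c in conditions:
    kind = "same format" if c["experience"] == c["explanation"] else "transfer"
    print(f'{c["label"]}: {kind}')
```

Two of the resulting cells are same-format conditions and two are transfer conditions, which is the contrast that the carryover and import predictions depend on.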


Following consent to be videotaped, each participant was initially presented with either a physical model of the Tower of Hanoi covered by a paper bag, or the computer version with the monitor turned off. In every condition the experimenter uncovered the model (either by removing the bag or turning on the monitor) and explained to the participant that he or she would be solving a common puzzle where all three discs must be transferred from the first peg on the left to the third peg on the right. The experimenter provided the two rules: that only one disc could be moved at a time and that a smaller disc could only be placed on top of a larger disc. To encourage as many participants as possible to solve the puzzle optimally on the first attempt, the experimenter told the participant that the smallest disc must first be moved to the rightmost peg. Once participants indicated that they understood the rules and hint, they proceeded to solve their respective versions of the puzzle.

Participants were videotaped as they solved and later explained the puzzle. If the participant did not solve the puzzle in seven steps the first time, the experimenter explained to them that it could be solved in fewer steps and had the participant repeat the task until they solved it optimally once. After they solved the puzzle, the experimenter either covered the physical model or turned off the monitor so that neither would visually aid the participant during his or her explanation.
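The seven-step criterion follows from the structure of the puzzle: a Tower of Hanoi with n discs requires at least 2^n - 1 moves, so the three-disc version requires seven. As a minimal sketch (the function name `solve_hanoi` and the peg labels are illustrative and were not part of the study materials), the optimal sequence can be generated recursively:

```python
def solve_hanoi(n, source="left", spare="middle", goal="right"):
    """Return the optimal move list for n discs as (disc, from_peg, to_peg) tuples.

    Disc 1 is the smallest. Peg labels are illustrative assumptions.
    """
    if n == 0:
        return []
    # Move the n-1 smaller discs onto the spare peg, move disc n to the goal,
    # then move the n-1 smaller discs from the spare peg onto disc n.
    return (
        solve_hanoi(n - 1, source, goal, spare)
        + [(n, source, goal)]
        + solve_hanoi(n - 1, spare, source, goal)
    )


moves = solve_hanoi(3)
for disc, from_peg, to_peg in moves:
    print(f"move disc {disc} from {from_peg} to {to_peg}")
print(len(moves))  # 2**3 - 1 moves
```

For three discs, the first move in this sequence sends the smallest disc to the rightmost peg, which corresponds to the hint given to participants.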

Participants in the physical-to-physical or computer-to-computer conditions went into their explanations without being exposed to the other format. Participants in the physical-to-computer condition, following their experience with the physical model, were exposed to the computerized version of the task. The experimenter brought up the computer version and explained that the Tower of Hanoi can also be solved on a computer. The experimenter briefly demonstrated a few moves for the participant but did not actually solve the puzzle. Participants in the computer-to-physical condition, following their experience with the computerized version, were exposed to the physical model. The experimenter uncovered the physical model and explained that the Tower of Hanoi can also be presented as a physical puzzle. Again, a few moves were demonstrated, but the experimenter did not actually solve the puzzle.

In every condition, the experimenter asked each participant to imagine that a second grader would be learning how to solve this physical or computer version of the puzzle for the first time, to think about what that would be like if they were the young child and a complete novice to the task. They were told that they would be explaining how they solved the puzzle for the 2nd grader, being very explicit as to which disc they were moving and where they were moving it to. This way, the second grader would be able to follow along as they described each step. Participants, after indicating that they understood the instructions, explained how they solved the puzzle to the video camera that was operated by the experimenter. Following the explanation, the camera was turned off and participants were debriefed and thanked.


Coding of Embodied Gestures

Two main types of representational gestures occurred alongside speech. Static gestures accompanied speech that indicated a disc of a certain size (e.g., first you take the smallest disc). Here, the handshape indicated that the participant was about to work with that particular disc, but did not yet send the disc in a particular direction. Static gestures were coded as grasping and non-grasping. Grasping gestures consisted of the thumb and one or more fingers closing down around an imaginary disc. Such gestures appeared to reference the physical version of the Tower of Hanoi. Non-grasping gestures did not consist of these specific hand shapes. Instead, these gestures indicated that the participant was about to select a certain disc, but did not physically mimic holding it. The hand shapes were flatter, less specific, and thus appeared to reference manipulating a computer mouse.

In addition to seeing how participants would select objects, I was interested in whether they would maintain those specific hand shapes when indicating directional information. Directional gestures sometimes accompanied speech that described how to move an object to a specific location (e.g., ...and move it to the rightmost peg). These gestures sometimes maintained static gesture hand shapes, and sometimes did not. For example, two participants might produce similar static grasping gestures, but one might maintain the handshape and one might not when indicating directional information. Directional gestures were therefore coded as grasping and non-grasping. Grasping directional gestures maintained the grasping hand shape while simultaneously moving the hand in space; non-grasping directional gestures indicated directional information without grasping, and so appeared to reference moving a computer mouse.

Physical Solving

Static and directional gestures were compared across physical and computer explaining conditions after solving the physical Tower of Hanoi problem. Results are presented in Figure 3a. Clearly there is no difference in overall gesture levels as a function of the format being explained (46 Physical vs. 52 Computer). However, there is a clear difference in the distribution of grasping and non-grasping gestures across conditions, X-square (1) = 12.96, p < 0.001. Participants who explained the physical version of the Tower of Hanoi maintained high levels of static grasping gestures, but participants who explained the computer version produced relatively balanced numbers of grasping and non-grasping gestures.

Fig. 3a: Frequencies of static grasping gestures produced by physical solvers

Patterns for directional gestures were similar to those of static gestures (Figure 3b). Again, participants showed clear preferences for format-specific gestures. The pattern is not as distinct as for static gestures, but is significant, X-square (1) = 5.61, p < 0.05. Even when describing direction, the incidence of grasps was high among participants who explained the physical version while there were more non-grasps among participants who explained the computer version.

Fig. 3b: Frequencies of directional gestures produced by physical solvers.

Computer Solving

Turning to the conditions where participants had direct experience with the computer version of the problem, the same analyses were performed. Static and directional gestures were again compared across explained conditions. Results for the static gestures are presented in Figure 4a. Again, overall gesture levels were comparable. Although the distributional pattern resembles that for the physical computer condition, with a preference for grasping gestures in the physical explanation condition, this time it is not significantly different from chance (X-square (1) = 1.25, p = 0.26, ns). Unlike after physical experience of the Tower of Hanoi, format-specific gestures were not as strongly evident after solving the computer version of the Tower of Hanoi.

Fig 4a: frequencies of static gestures produced by computer solvers

Interestingly, participants showed different patterns of directional gestures following the computer experience (Figure 4b). Participants in the physical explaining condition maintained their grasps, whereas those in the computer explaining condition tended to shift to a non-grasping mode, X-square (1) = 12.88, p < 0.007. This outcome suggests that participants were "captured by the mouse" when indicating directional information. This phenomenon may have resulted from format-specific encoding of directional information in order to be able to retrieve from memory the steps in solving the puzzle.

Fig. 4b: Frequencies of directional gestures produced by computer solvers

These findings suggest a strong carryover of grasping gestures from physical experience to physical explanation, replicating the results reported by Wagner Cook and Tanenhaus (2009). Participants also imported grasping gestures in physical explanations following computer experience, though this was significant only for directional gestures (22 grasping vs. 9 non-grasping, X-square (1) = 5.42, p < .05). However, there is little evidence for grasping gestures transferring from physical experience to the computer task. Even in explaining the computer version following computer experience, there were as many static grasps (18) as non-grasps (17). However, an interesting shift occurred for directional gestures. Here, the balance of grasping and non-grasping was maintained after physical experience, but shifted substantially to non-grasping gestures following computer experience (compare Figures 4a and 4b). This pattern is statistically significant (X-square (1) = 8.00, p < .001).
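As an illustration of how a one-degree-of-freedom comparison of this kind can be computed, the sketch below runs a chi-square goodness-of-fit test against an equal split, using the two pairs of counts reported above (22 vs. 9 and 18 vs. 17). Whether this is exactly the procedure used for the reported statistics is an assumption; for 22 vs. 9 the sketch gives a value of about 5.45, close to but not identical to the reported 5.42. The function name is hypothetical.

```python
import math


def chi_square_equal_split(counts):
    """Goodness-of-fit test against an equal split across two categories.

    Returns (statistic, p_value). The p-value formula used here is only
    valid for two categories (one degree of freedom).
    """
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two categories")
    expected = sum(counts) / 2
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Directional gestures, physical explanation after computer experience.
stat, p = chi_square_equal_split([22, 9])
print(f"grasping vs. non-grasping: chi-square(1) = {stat:.2f}, p = {p:.3f}")

# Static gestures, computer explanation after computer experience.
stat_static, p_static = chi_square_equal_split([18, 17])
print(f"static grasps vs. non-grasps: chi-square(1) = {stat_static:.2f}, p = {p_static:.3f}")
```

The near-equal static counts (18 vs. 17) yield a statistic close to zero, which is what "as many static grasps as non-grasps" amounts to in these terms.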

The results from the computer solvers provide us with slightly different reasons as to why conceptualization may differ as a result of having a less concrete experience. Static gestures were relatively balanced across all of the explaining conditions, indicating that participants' static codes were not strongly influenced by experience with the computer format. However, once participants needed to provide directional information, their gestures became very format-specific. Physical explainers adhered to the preference for imported grasps, whereas computer explainers shifted to non-grasping directional gestures. From this, it appears that the static encoding of objects is not very specific following solving a computerized version of the puzzle, but that encoding becomes more format-specific as speakers indicate directional information in their explanations.

Why did computer solvers show format-specific preferences for indicating direction but not for indicating objects? It may be useful to consider this finding from the perspective that indicating directional information in English is relatively difficult. The grasping handshape may be preferred for indicating the selection of objects (static codes), but the grasp is irrelevant when a shape is dragged by a mouse. Thus, speakers are "captured by the mouse" when visualizing how to transfer the correct discs to the correct pegs, echoing the mouse movements in the gestures they produced. Computer solvers, whose hands were formed around the mouse while solving the task, encoded control of the moved disc differently than the physical solvers whose hands physically grasped the discs.

Under the Information Packaging Hypothesis, images which are conceptualized differently should be indicated in different types of representational gestures. In the present study, speakers may have found it easy to indicate objects, but may have found themselves temporarily uncertain when searching for the correct directional move in order to solve the problem. These findings build upon the Information Packaging Hypothesis by showing that speakers produce format-specific gestures that call upon recent experience, particularly for indicating directional information. The recent experience reflected in these speakers' gestures may be the mind's way of unlocking a way to conceptualize information in a different format. If this is true, then perhaps higher conceptual demand might also lead speakers to prefer thinking in recent formats rather than in formats which reflect knowledge that is relevant, but not tied to recent experience. Conceptualizing information in different formats should be reflected in the different representational gestures produced by speakers if they are speaking about the same thing. To test this hypothesis, we turn to Experiment 2.

Experiment 2: Explaining Analogs

Experiment 2 was designed to explore the role of representational gestures in explaining analogs rather than different formats of the same problem. In analogs, imports may be expected based on the requirements of the analog format, but they may also bring carryover from having solved the physical model of the Tower of Hanoi. For this experiment, I chose to use a Russian Dolls analog:

A New York antiques dealer has made arrangements to have a set of three very expensive and fragile antique Russian dolls delivered to Vladimir in Moscow. The dolls will all travel by mail. To minimize the risk of loss, only one doll may be mailed at a time. The dolls are all currently nested in New York City and will be available for shipping in the order largest to smallest. Also, because of a superstition, only a larger doll can ever be sent to a location where there is already another doll, so that the larger doll can "protect" the smaller doll. Vladimir, who is known to be temperamental, has warned that if this rule is not followed the deal will fall through. To meet these requirements, the London Office of the Antiques dealer will act as a go between. I am going to ask you to describe all of the steps that will be needed for all of the dolls to eventually make it to Moscow.
This particular analog was chosen over the Chinese Tea Ceremony for several reasons. The problem space is different from that of the Tower of Hanoi in terms of the objects and locations, but was not as abstract as the Chinese Tea Ceremony. The different "objects" of the Russian Dolls analog (dolls as opposed to discs) may influence the types of produced representational gestures. Specifically, participants may be able to import "doll-holding" gestures over "disc-grasping" gestures.

The second purpose of this experiment was to test the effects of conceptual demand on representational gestures. Half of the participants solved the Tower of Hanoi in its standard setup, where smaller discs were on top of larger ones. Keep in mind that solving this Russian Dolls analog involves mentally reversing the traditional Tower of Hanoi, in that larger dolls must be around smaller dolls. Consequently, participants who had only experienced the standard Tower of Hanoi setup were in an incongruent condition relative to the Russian Dolls analog. Participants who solved the Tower of Hanoi in the reversed form did not need to carry out mental reversal. Because their Tower of Hanoi experience directly mapped onto the target analog, these participants were in a congruent condition. I tested the imagistic utility of representational gestures by increasing conceptual demand in the participants who needed to reverse the mental image of the puzzle to solve the analog, that is, I tested whether these gestures were linked to underlying mental structures and were not mere handwaving.

In order to examine the effects of gesture availability, the participants were further divided into two groups, with half being free to gesture and half not allowed to gesture. I predicted that representational gestures would aid speakers in translating the base problem into the Russian Dolls format, especially in the more difficult incongruent condition, and conversely, restricting gesture would have a greater cost to fluency in the same condition. If these predictions are supported, I will conclude that representational gestures are engaged at the conceptual or message level of speech planning and can be carried forward or adapted to different formats.

Fig. 5a: Experiment 2 design


Participants were Lehigh University undergraduates (n = 24) enrolled in an introductory psychology course. They participated to receive direct experience of the research process. All participants were fluent in English and reported no recent experience with the Tower of Hanoi.


Participants were placed into four groups: congruent/gesture permitted, congruent/gesture precluded, incongruent/gesture permitted, and incongruent/gesture precluded. Participants first solved the Tower of Hanoi in its standard form or in a form that was already reversed. Because the reversed form directly mapped onto the analogy, it was considered to be the congruent form. On the other hand, solving the Tower of Hanoi in standard form called for mental inversion to conceptualize the analog. This version of the Tower of Hanoi was thus considered to be the incongruent form.

After solving the Tower of Hanoi in one of these two formats, participants were presented with the Russian Dolls analog and were asked to think out loud through it. They were either permitted to gesture or precluded from gesturing during their explanation. Thus, the experiment used a 2 Congruency (congruent, incongruent) by 2 Gesture (permitted, precluded) between-groups design (refer to Figure 5a).


Fig. 5b: visual aid accompanying oral presentation of the analog

Participants solved a physical version of the Tower of Hanoi which consisted of three wooden pegs and three wooden discs, presented in either reverse (congruent) or normal (incongruent) form. Also, as visual aids for instruction on the analog, a set of three Russian dolls was set up on a sheet of paper on which three city names, New York, London, and Moscow, were printed horizontally (see Fig. 5b).


Each participant was led to a room where a physical model of the Tower of Hanoi was covered by a paper bag. In every condition the experimenter uncovered the model and explained to the participant that he or she would be solving a common puzzle where all three discs must be transferred from the first peg on the left to the third peg on the right. In the incongruent condition, in which participants solved the Tower of Hanoi in its normal form and would have to reverse the puzzle to solve the analog, the experimenter provided two rules: that only one disc could be moved at a time and that a smaller disc could only be placed on top of a larger disc. In the congruent condition, where participants solved the Tower of Hanoi reversed and could directly map onto the analogy, the experimenter modified the second rule so that a larger disc could only be placed on top of a smaller disc. To encourage participants to solve it optimally in as few tries as possible, the experimenter provided the hint that the top disc must first be moved to the rightmost peg. Once participants indicated that they understood the rules and hint, they proceeded to solve their respective versions of the puzzle. If the participant did not solve the puzzle in seven steps the first time, the experimenter explained to them that it could be solved in fewer steps and had the participant repeat the task until they solved it optimally once. Participants were videotaped as they solved the puzzle. After they solved the puzzle, the experimenter covered the model so that it would not visually aid the participant during his or her explanation.

After solving the base problem, the experimenter informed the participants that she was testing how well the task had prepared them for solving an analog of the Tower of Hanoi. The experimenter also explained that the analog was essentially the same problem but stated in a different way. The experimenter read the Russian Dolls analog to the participant and asked them to understand how it related to the Tower of Hanoi version. While the experimenter orally presented the analog, the participant was able to look at a set of Russian dolls in order to see how their containment order was related to the stacking of discs in the Tower of Hanoi. Also, the participant could reference the sheet with the New York, London, and Moscow labels to get a sense of how to visualize them as the three pegs. Following the oral reading of the analogy, the experimenter informed the participant that he or she could think of the places as the pegs and the different sized dolls as the discs. In the incongruent condition, the experimenter added that the analogy differed in the sense that the sizes were reversed, so that the smallest doll must be "beneath the larger dolls."
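The mapping between the two problems can be made concrete with a short Python sketch. It assumes, as the analog states, that the largest (outermost) doll is available first and therefore plays the role of the top disc, and that the left, middle, and right pegs correspond to New York, London, and Moscow. All function and variable names are hypothetical, and the sketch is an illustration rather than part of the experimental procedure.

```python
PEG_TO_CITY = {"left": "New York", "middle": "London", "right": "Moscow"}
SIZE_NAMES = {1: "smallest", 2: "middle", 3: "largest"}


def solve_by_position(n, source, target, spare):
    """Optimal moves as (position, from_peg, to_peg); position 1 starts on top."""
    if n == 0:
        return []
    return (
        solve_by_position(n - 1, source, spare, target)
        + [(n, source, target)]
        + solve_by_position(n - 1, spare, target, source)
    )


def disc_size(position, n, reversed_tower):
    """Size rank of the disc that starts at a given stack position."""
    # Standard tower: smallest disc on top. Reversed tower: largest on top.
    return n + 1 - position if reversed_tower else position


def doll_size(position, n):
    """Size rank of the doll that plays the role of a given stack position."""
    # The outermost (largest) doll is available first, like the top disc.
    return n + 1 - position


n = 3
moves = solve_by_position(n, "left", "right", "middle")
shipments = [
    (SIZE_NAMES[doll_size(pos, n)], PEG_TO_CITY[src], PEG_TO_CITY[dst])
    for pos, src, dst in moves
]
for step, (doll, origin, destination) in enumerate(shipments, start=1):
    print(f"Step {step}: send the {doll} doll from {origin} to {destination}")
```

Under these assumptions, disc size and doll size coincide at every position for the reversed tower (the congruent, direct mapping), whereas for the standard tower the smallest disc corresponds to the largest doll (the incongruent, inverse mapping).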

After the participant indicated that he or she understood the analogy, the experimenter asked him or her to imagine that he or she must explain to someone else how to solve the analog task for the first time, that he or she would think through the problem out loud and explain as though to someone in another room how to do it. The participant was also asked to imagine that this person would have a model of the Tower of Hanoi in front of her and would learn to do it by moving the discs in the same order as she would send the doll. Furthermore, the participant was asked to be specific as to which doll she was sending and the city it was being sent to, so that the person could easily follow along at each step.

If gesture was to be prohibited, the experimenter added to the instructions that she wanted the participant to keep his or her body as still as possible during the explanation so that he or she would be able to focus on the explanation better. The experimenter asked the participant to sit on his or her hands. This was done so that participants would not spontaneously begin to gesture if their hands were unrestricted. Furthermore, I decided that this would be the most "natural" way of suppressing gesture, as many people sit on their hands in informal conversation while speaking from a chair. Participants, after indicating that they understood the instructions, explained how to solve the analog to the video camera that was operated by the experimenter. If the participant gave errors in their explanation that prevented them from continuing, or produced a complete production with errors, he or she was permitted to restart his or her explanation up to two more times. Upon finishing, the experimenter asked if they had any previous experience working with the Tower of Hanoi, and participants' answers were recorded on the video camera. Lastly, the camera was turned off and participants were debriefed and thanked.


Coding of Time and Efficiency

In order to test whether conceptual demand was increased by having to mentally reverse the Tower of Hanoi, all files were coded for how long it took for participants to completely talk through the Russian Dolls analogy. First, I coded the number of restarts participants needed before producing a correct trial. A restart was coded each time a participant completely restarted his or her explanation. Total time was considered as the time it took to completely explain the analogy correctly. In other words, the time of each restart and the time of the correct trial were summed to assess how long it took participants to complete their explanations. Finally, the time of the best complete production was recorded. It is important to note that in the incongruent/gesture precluded condition, many participants never gave a completely correct production, even after two restarts. Thus, their first correct productions were treated as their best complete productions, which for most participants, were nearly always the last ones (with the fewest number of accuracy errors).

Coding of Embodied Gestures

Representational gestures were divided into two main categories: holding and grasping. Holding gestures were produced as though participants were physically holding Russian dolls with two hands, indicating their knowledge of what it feels like to handle this type of object. Thus, holding gestures were a type of import effect, appropriate to Russian dolls and often used despite the recent experience of handling the discs of the physical Tower of Hanoi. Holding gestures were classified as static or directional by the same criteria as in Experiment 1. Static and directional grasping gestures were also coded by Experiment 1 criteria. Hands moving directionally while accompanying spatial information conveyed in speech, but without holding or grasping, were coded as non-specific directional gestures. Gesture repetitions were coded as gestures which repeated when speech phrases repeated. These gestures were also specified as holding, grasping, or non-specific. Finally, non-representational, beat gestures used alongside extraneous speech were coded as "other." Because I was not specifically coding for speech that indicated current states (i.e., so now you have two dolls in London, etc), these gestures were tallied but not considered relevant for the current analysis.

All gestures were coded as either fluency or dysfluency gestures. Dysfluency gestures either lined up with or immediately followed speech dysfluencies, while fluency gestures were produced in conjunction with fluent speech.

Coding of Speech

Speech dysfluencies were coded to indicate speakers' difficulties with the explanations. Silent pauses of one second or longer were coded as hesitations. Expressions that did not add new steps to the solution (e.g., um, like, well, uh) were coded as interjections/filled pauses. Changes in the speaker's choice of dolls (e.g., then you take the largest...the smallest doll...) were coded as size revisions. Similarly, changes in the speaker's choice of destination (e.g., and move it to London...Moscow) were coded as destination revisions. Phrase repetitions were coded for repetitions of at least two complete words of the message.
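Three of these categories can be approximated mechanically, as the Python sketch below illustrates. The coding in this study was done by hand from video; the time-aligned transcript format, the function name, and the sample utterance are all hypothetical, and words such as "like" or "well" would still need a human coder to decide whether they are fillers. Size and destination revisions are left out because they depend on judging the speaker's intent.

```python
FILLERS = {"um", "uh", "like", "well"}  # examples listed in the coding scheme
PAUSE_THRESHOLD_S = 1.0                 # silent pauses of one second or longer


def code_dysfluencies(timed_words):
    """Tally dysfluencies in a list of (word, onset_s, offset_s) tuples."""
    words = [w.lower() for w, _, _ in timed_words]
    counts = {"hesitations": 0, "filled_pauses": 0, "phrase_repetitions": 0}

    # Silent pauses: gap between one word's offset and the next word's onset.
    for (_, _, prev_off), (_, next_on, _) in zip(timed_words, timed_words[1:]):
        if next_on - prev_off >= PAUSE_THRESHOLD_S:
            counts["hesitations"] += 1

    counts["filled_pauses"] = sum(1 for w in words if w in FILLERS)

    # Phrase repetitions: an immediately repeated run of two words.
    i = 0
    while i + 4 <= len(words):
        if words[i:i + 2] == words[i + 2:i + 4]:
            counts["phrase_repetitions"] += 1
            i += 2  # skip the first copy so one repetition is counted once
        else:
            i += 1
    return counts


# Hypothetical utterance: "then um (pause) you take you take the largest doll"
sample = [
    ("then", 0.0, 0.3), ("um", 0.4, 0.6), ("you", 2.0, 2.2),
    ("take", 2.2, 2.5), ("you", 2.6, 2.8), ("take", 2.8, 3.1),
    ("the", 3.1, 3.2), ("largest", 3.2, 3.8), ("doll", 3.8, 4.2),
]
result = code_dysfluencies(sample)
print(result)
```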

Conceptual Demand

To determine whether congruency and gesture preclusion increased conceptual demand, I analyzed the time of participants' best complete productions. Mean times are presented in Fig. 6a. Though a 2x2 between-subjects ANOVA with factors of congruency and gesture revealed no significant effects or interaction (F(1,23) < 1), the data do pattern as predicted, with longer times in the incongruent and gesture precluded conditions.

Fig. 6a: Mean length of speakers' best complete production

These trend patterns also held for the number of attempts participants needed. Mean attempts are presented in Figure 6b. Though a 2x2 between-subjects ANOVA with factors of congruency and gesture revealed no significant interaction (F(1,23)=1.42, p=0.26, ns), the data also pattern as predicted, with more attempts needed in the incongruent and gesture precluded conditions. There were main effects of congruency (F(1,11)=13.02, p < 0.001) and gesture preclusion (F(1,11)=22.25, p < 0.001). We can conclude that these manipulations increased difficulty in explaining how to solve the analog, but they are independent of one another.

Fig 6b: Mean additional attempts needed before best complete production


Total dysfluencies were compared across congruency conditions. Overall dysfluency was assessed by summing the number of hesitations, interjections (filled pauses), revised errors, unrevised errors, and phrase repetitions. Results are presented in Figure 7a. Effects of congruency and gesture preclusion are evident, but these effects were independent of one another. There were far more dysfluencies in the incongruent conditions and the rate of dysfluency was doubled in both levels of congruency when gesture was precluded. Participants benefited by having access to gesture, despite incongruent mapping.

Fig. 7a: Dysfluency as a function of congruency and gesture preclusion

Gesture Preferences

To make an initial assessment of gestures, they were first divided into representational (including holds and grasps) and non-representational and further classified by their alignment with fluent vs. dysfluent speech in both congruent and incongruent conditions. Total representational gesture counts were obtained by summing all static and directional grasps and holds, while non-representational totals were obtained by summing all non-specific and "other" gestures. Results are presented in Figures 7b and 7c. In both congruent and incongruent conditions, representational gestures overwhelmingly accompanied fluent speech. Chi-square analyses show that the concentration of representational gestures in the fluent conditions is significant in both the congruent (X-square (1) = 12.72, p < 0.001) and incongruent (X-square (1) = 5.215, p < 0.05) conditions. These outcomes suggest that representational gestures accompanied successful steps in the solution description.

Fig 7b,c: Gestures aligned with fluent and dysfluent speech

The most revealing analyses turned out to involve the distribution of different kinds of representational gestures. Specifically, there were systematic patterns in the distribution of holds versus grasps, and these varied with conceptual demand. Figure 8a shows a striking association of holding gestures (referencing dolls) with the congruent problem format and of grasping gestures (referencing discs) with the more difficult incongruent problem mapping. A chi-square analysis confirms that this patterning is highly significant, X-square (1) = 28.28, p < 0.001. Participants whose Tower of Hanoi experience directly mapped onto the target analog produced holds almost exclusively. However, participants who needed to mentally reverse their Tower of Hanoi experience produced considerably more grasps than holds. One way to interpret this is that these participants were often forced back to the original disc context to sort out the mapping of dolls to discs, and so their gestures reflected disc-appropriate grasping. In the congruent condition, participants already had the experience of seeing and working with the already-inverted Tower of Hanoi. Their mental representation of this form of the puzzle may have already been encoded. As a result, they may have been able to embellish this "baseline" mental representation with features of the dolls they were speaking about in their solution description. Such details may have been incorporated into their conceptualization and, consequently, in their representational gestures.

Fig 8a: Static representational gesture preferences

These patterns also held when static and directional gestures were separated. Results are presented in Figures 8a and 8b. The separate chi-squares were both significant (X-square (1) = 13.67, p < 0.001, and X-square (1) = 14.80, p < 0.001, respectively). Static and directional holds were dominant with congruent mapping, but static and directional grasps dominated with incongruent mapping. For the most part, grasps and holds were maintained from gesture initiation through directional gestures.

Fig. 8b: Directional representational gesture preferences

Experiment 2 Discussion

The main purpose of the above analyses was to determine whether representational gestures have cognitive utility by facilitating solutions to novel, conceptual problems. Because representational gestures reflect conceptual demands (Hostetter et al., 2007), I hypothesized that precluded gesture would make solving a novel problem with high conceptual demand more costly than if the novel problem had low conceptual demand. The first analyses provided partial support for this hypothesis. An analysis of time of best production patterned as expected but did not yield statistically significant effects. An analysis of number of attempts needed to successfully explain the problem was more sensitive. There were separate effects of gesture preclusion and congruency of base problem format, but these effects were not statistically interdependent. Merely thinking about the problem in a conceptually different format (congruent conditions) was not particularly difficult, but thinking about a problem in a conceptually difficult format (incongruent conditions) was quite challenging. Precluding gesture added to the relative difficulty of the speakers' task at both levels of congruence.

Similar patterns emerged in the analysis of speech dysfluencies. There was a relative benefit to fluency in permitting gesture in both congruent and incongruent-mapping conditions. Allowing gesture with incongruent mapping caused speakers to be about as fluent as those who solved the analog with congruent mapping. Again, however, we cannot conclude that fluency was differentially impaired with high conceptual demand.

My hypothesis was also contingent on the idea that representational, but not beat, gestures are critical for conceptualization. Representational gestures overwhelmingly accompanied fluent speech, whereas any gestures that accompanied dysfluent speech were non-representational. These results, however, are inconclusive as to whether representational gestures necessarily relieve speech dysfluencies because participants did not seem to utilize them during the dysfluent episodes. Thus, the non-representational gestures which accompanied dysfluent speech may have been nothing more than a reflection of dysfluent thinking. However, an alternative view is that representational gestures did not relieve, but prevented, dysfluencies as they were produced alongside fluent speech. Gestures temporally overlap with semantically co-expressive words (McNeill, 1992), but onsets of gestural movements typically are produced before the onset of the related speech. The differences in onset times can range from less than one second up to a few seconds (Morrel-Samuels & Krauss, 1992). It may be that the produced representational gestures preceded the speech they accompanied, and that planning these gestures may be enough to organize such conceptual information into verbalizable groupings as claimed by the Information Packaging Hypothesis. Further analysis of the timing of the onset of representational gestures relative to the onset of speech may help clarify this issue.

One of the most revealing outcomes of Experiment 2 is that representational gestures were quite different in the congruent and incongruent conditions. Holding gestures were imported in congruent-mapping situations, while grasping gestures were carried forward into incongruent-mapping situations. These format-specific gestures appear to reflect differences in how speakers conceptualized the problems. When conceptual demand was low, participants conceptualized the problem more generally, using their outside knowledge of Russian dolls to visualize and retrieve solution-relevant details. When conceptual demand was higher, participants were more likely to conceptualize the analogy through their most recent experience by visualizing a concrete model of the Tower of Hanoi. One explanation which may account for this is that being conscious of the difficult mental reversal imposed a more specific type of cognitive demand. Speakers may have needed to refer to the concrete format of the physical Tower of Hanoi in order to sort out their next moves. On the other hand, speakers who had the benefit of congruent mapping may have found it easier to transfer to the new format, as shown by the importing of holding gestures. This finding seems to be consistent with the findings of Experiment 1 by supporting the likelihood that speakers will gesture in line with the more recent format, provided that the two formats are congruent. In Experiment 1, all of the transfers to different formats involved congruent mapping, i.e., the two formats were different but similar in conceptual difficulty. This may account for why grasping gestures, which reflected the physical experience, did not carry forward to the computerized format, as predicted. If, however, participants needed to explain an incongruent computerized version, we could see a carryover effect similar to that observed in the incongruent condition of Experiment 2. Future studies may want to explore this possibility as it may speak to the cognitive utility of format-specific gestures when strategizing in different formats.

General Discussion

The main question of interest in the present study was whether representational gestures aid conceptualization by transferring across problem formats. To address this question, I developed two tasks that elicited similar utterances, but differed in how speakers were expected to think about the problems. Experiment 1 extended the findings of Wagner Cook and Tanenhaus (2009) by showing that embodied gestures can be imported to opposite formats. Grasping gestures of physical-physical solvers carried forward, while non-grasping gestures were imported to explain a computerized version of the Tower of Hanoi. Following a computer experience, speakers were not format-specific in referring to selected objects but were format-specific in indicating direction of movement. Experiment 2 tested whether representational gestures respond to conceptual demand by facilitating solutions to novel problems. When gesture was precluded, speakers experienced a greater cost to fluency while thinking through a problem with higher conceptual demand. Furthermore, speakers' representational gestures varied in format, even though they all described the same analog. Because representational gestures accompanied fluent speech, the speech content was largely equivalent in this comparison. Holding gestures were imported to facilitate explanations in a straightforward new format, but disc-appropriate grasping gestures were employed when the same problem was conceptually and spatially more complex. These findings support the Information Packaging Hypothesis by demonstrating the association of representational gesture with conceptual planning rather than linguistic formulation. They also extend previous research by specifying when problem solvers draw on recent experience rather than on general knowledge when asked to describe a problem in a different format.

Representational gestures are a way of breaking down a lengthy process into a discrete sequence of steps (Alibali et al., 2000). The Tower of Hanoi is a good example of this type of task, as speakers must solve the problem optimally in 7 steps, with each step contingent on a previous correct step. The cognitive processes behind this task are dictated by its embodied nature. Individuals will think about it differently depending on real-world interactions such as manipulating objects. From this view, reproducing features of these interactions in gesture may be a way of accessing embodied spatial content in the perceptual-motor system (Schwartz & Black, 1999). The utility of this is that the speaker may access solution-relevant details for explaining how to solve the problem. When speakers use representational gestures, solution-relevant details form strong mental representations in the visuospatial system that give solvers insight regarding the most appropriate way to conceptualize the problem. The results of the present study stand in agreement with this view and extend it by indicating what types of conceptualization are reflected in format-specific gestures.
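As an illustrative aside, the seven-step optimum mentioned above for a three-disc tower can be reproduced with the standard recursive solution. The sketch below is only an illustration: the function name `hanoi_moves` and the peg labels "A", "B", and "C" are assumptions introduced here and are not part of the experimental materials.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal list of (disc, from_peg, to_peg) moves for n discs.

    Disc 1 is the smallest disc. Peg labels are hypothetical placeholders.
    """
    if n == 0:
        return []
    # Clear the n-1 smaller discs onto the spare peg, move disc n,
    # then restack the smaller discs on top of it.
    return (
        hanoi_moves(n - 1, source, spare, target)
        + [(n, source, target)]
        + hanoi_moves(n - 1, spare, target, source)
    )


moves = hanoi_moves(3)
for step, (disc, src, dst) in enumerate(moves, start=1):
    print(f"Step {step}: move disc {disc} from {src} to {dst}")
```

For three discs the list contains 2^3 - 1 = 7 moves, and each move is legal only because the moves before it were carried out correctly, which is the sense in which each step is contingent on the previous one.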

In Experiment 1, speakers articulated their movements not only through speech, but also through specific motor plans. Wagner Cook and Tanenhaus (2009) showed that speakers provide reliable perceptual-motor information in gesture when physical objects are not available in the immediate environment. This important finding accounts for why non-grasping gestures were produced after participants solved a computerized version of the Tower of Hanoi. What speakers represented in gesture followed consistent patterns that reflected the mental representation of the task goal. My findings demonstrate this specificity not only after experience with a single task, but also with respect to the most recent task. They indicate that when we are faced with new problems, we immediately conceptualize them according to salient perceptual features of the problem's particular format. It is almost as though the human mind instinctively immerses itself in the current format to be as efficient as possible in retrieving solution-relevant details. These details, which reflect differences in conceptualization, are expressed through representational gestures. I propose, then, that conceptualizing the same task in a different domain involves a subsequent planning of format-specific gestures which reflect how task-relevant features are organized.

However, transferring formats does not always involve a change in conceptualization, as is demonstrated in Experiment 2. Gesture carryover effects were evident when conceptual demand was high, indicating a preference to remain in the format of the previous task rather than to switch to a new one. These findings agree with those of Hostetter et al. (2007), who assert that speakers produce embodied representations of conceptually difficult spatial images for the purposes of organization. In the present study, higher conceptual demand involved reorganizing a complex spatial task into smaller units, and speakers may have relied on their hands to help visualize concrete aspects of the Tower of Hanoi. I propose that, in situations where there is a greater degree of re-organization, our hands must find a way to efficiently examine the new visuospatial details and how they relate within the sequence. In this case, it should be more efficient to first think about how the Tower of Hanoi looks in reversed form before visualizing three Russian dolls being shipped to Moscow. Because the speakers in the congruent mapping condition already had the experience of solving the Tower of Hanoi in reversed form, they had already encoded the perceptual details of what the Tower looked like in reversed form. With this basic mental representation in place, speakers could embellish their conceptualization with more details, incorporating their general knowledge of Russian dolls to make the images more vivid. On the other hand, speakers in the incongruent mapping condition had to mentally formulate the image of the reversed Tower of Hanoi without having any previous experience of it. They needed to give themselves the experience of it mentally because they did not have access to it physically. The resulting grasping gestures reproduced aspects of this concrete mental representation, which provided the appropriate background for approaching the novel problem.

This proposal is consistent with Alibali et al. (1999) in that problem content must influence the resulting mental representation of that problem. The findings of Experiment 2 may be considered as involving gesture mismatches, where gesture and speech do not always convey the same information. An example would be a gesture that represents features of a problem using a different strategy from the one being described in speech, such as the hands moving horizontally in a different direction from the one indicated in speech, as though the speaker were briefly considering an alternative move. Alibali et al. (1999) argue that content is one of many factors that might influence how a problem is represented. I have extended this by showing that the conceptual demand of that content may be another factor that influences mismatches: speakers prefer to represent concrete experiences in gesture while speaking about a new problem. These mismatches may be critical for giving insight into solving the more difficult version of the problem, and suppressing gesture would only restrict access to the mismatch. These findings further the understanding of how people represent problems by showing that strategy choice depends not only on content, but also on difficulty.

Taken together, the findings support the conclusions that representational gestures reflect the conceptual differences that arise between problem formats and aid speakers' fluency in generating descriptions of complex problems. These representational gestures offer valuable insight for understanding how people choose to visualize and construct specific problem-solving strategies. Spatial thinking has an embodied nature which is enhanced by our actions in the world and how those actions build spatial representations in our minds. The ability to conceptualize space is critical for planning speech, not necessarily at the linguistic formulation stage, but at the pre-linguistic message level where thoughts are organized before they are put into words. Thus, in accordance with Chu and Kita (2008), gestures that reflect non-linguistic material are perhaps linked with, but are not a part of, the speech production system (see Figure 1). The representational gestures that accompany speech may be generated from a "growth point" in which images and linguistic categories inter-relate and evolve into gestures and utterances (McNeill, 1992). Under this view, the results of the present study speak for representational gestures as a way to encode action information. My results indicate that representational gestures systematically play a role in thinking about problems in different formats, a process that provides cognitive utility beyond maintaining the flow of speech. Specific features of this phenomenon provide insight not only into how we solve problems, but also into how we conceptualize different problem formats in our minds.

References


Alibali, M. W., Bassok, M., Solomon, K. O., Syc, S. E., & Goldin-Meadow, S. (1999). Illuminating mental representations through speech and gesture. Psychological Science, 10 (4), 327-333.

Alibali, M. W., & DiRusso, A. A. (1999). The function of gesture in learning to count: More than keeping track. Cognitive Development, 14, 37-56.

Alibali, M. W., Kita, S., & Young, A. J. (2000). Gesture and the process of speech production: We think, therefore we gesture. Language and Cognitive Processes, 15 (6), 593-613.

Chen, Z., Mo, L., & Honomichl, R. (2004). Having the memory of an elephant: Long-term retrieval and the use of analogues in problem solving. Journal of Experimental Psychology: General, 133 (3), 415-433.

Chu, M., & Kita, S. (2008). Spontaneous gestures during mental rotation tasks: Insights into the microdevelopment of motor strategy. Journal of Experimental Psychology: General, 137 (4), 706-723.

Goldin-Meadow, S., Nusbaum, H., Kelly, S. D., & Wagner, S. (2001). Explaining math: Gesturing lightens the load. Psychological Science, 12 (6), 516-522.

Hayes, J. R., & Simon, H. A. (1977). Psychological differences among problem isomorphs. In N. J. Castellan, Jr., D. B. Pisoni, & G. R. Potts (Eds.), Cognitive theory (Vol. 2). Hillsdale, NJ: Erlbaum.

Hostetter, A. B., Alibali, M. W., & Kita, S. (2007). I see it in my hands' eye: Representational gestures reflect conceptual demands. Language and Cognitive Processes, 22 (3), 313-336.

Kita, S. (in press). How representational gestures help speaking. In D. McNeill (Ed.), Language and gesture: Window into thought and action (pp. 162-185). Cambridge, UK: Cambridge University Press.

Kotovsky, K., Hayes, J. R., & Simon, H. A. (1985). Why are some problems hard? Evidence from Tower of Hanoi. Cognitive Psychology, 17, 248-294.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.

Lozano, S. C., & Tversky, B. (2006). Communicative gestures facilitate problem solving for both communicators and recipients. Journal of Memory and Language, 55, 47-63.

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.

Morrel-Samuels, P., & Krauss, R. M. (1992). Word familiarity predicts temporal asynchrony of hand gestures and speech. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 615-622.

Rauscher, F. H., Krauss, R. M., & Chen, Y. (1996). Gesture, speech, and lexical access: The role of lexical movements in speech production. Psychological Science, 7 (4), 226-231.

Rogers, W. T. (1978). The contribution of kinesic illustrators toward the comprehension of verbal behavior within utterances. Human Communication Research, 5, 54-62.

Rogers, W. T. (1979). The relevance of body motion cues to both functional and dysfunctional communicative behavior. Journal of Communication Disorders, 12, 273-282.

Schwartz, D. L., & Black, T. (1999). Inferences through imagined actions: Knowing by simulated doing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 116-136.

Wagner, S. M., Nusbaum, H., & Goldin-Meadow, S. (2004). Probing the mental representation of gesture: Is handwaving spatial? Journal of Memory and Language, 50, 395-407.

Wagner Cook, S., Mitchell, Z., & Goldin-Meadow, S. (2008). Gesturing makes learning last. Cognition, 106, 1047-1058.

Wagner Cook, S., & Tanenhaus, M. K. (2009). Embodied communication: Speakers' gestures affect listeners' actions. Cognition, 113, 98-104.

Willingham, D. B., Nissen, M. J., & Bullemer, P. (1989). On the development of procedural knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15 (6), 1047-1060.

Wong, K. (n.d.) LHS Tower of Hanoi. Retrieved from

Yaner, P. W., & Goel, A. K. (2006). Visual analogy: Viewing analogical retrieval and mapping as constraint satisfaction problems. Applied Intelligence, 25, 91-105.