National Science Foundation, Developing AutoTutor for computer literacy and physics, 2001-2005, $1,274,075. Art Graesser is PI.

The Tutoring Research Group at the University of Memphis has developed a computer tutor (called AutoTutor) that simulates the discourse patterns and pedagogical strategies of unaccomplished human tutors. The typical tutor in a school system is unaccomplished in the sense that the tutor has had no training in tutoring strategies and has only introductory-to-intermediate knowledge about the topic. The development of AutoTutor was funded by an NSF grant. The discourse patterns and pedagogical strategies in AutoTutor were based on a previous project that dissected 100 hours of naturalistic tutoring sessions.

AutoTutor is currently targeted for college students in introductory computer literacy courses, who learn the fundamentals of hardware, operating systems, and the Internet. Instead of merely being an information delivery system, AutoTutor serves as a discourse prosthesis or collaborative scaffold that assists the student in actively constructing knowledge. AutoTutor presents questions and problems from a curriculum script, attempts to comprehend learner contributions that are entered by keyboard, answers student questions, formulates dialog moves that are sensitive to the learner’s contributions (such as short feedback, pumps, prompts, assertions, corrections, and hints), and delivers the dialog moves with a talking head. The talking head displays emotions, produces synthesized speech with discourse-sensitive intonation, and points to entities on graphical displays. AutoTutor has seven modules: a curriculum script, language extraction, speech act classification, latent semantic analysis (a statistical representation of domain knowledge), topic selection, dialog management, and a talking head. Evaluations of AutoTutor have shown that the tutoring system improves learning with an effect size that is comparable to that of typical human tutors in school systems, but not as high as that of accomplished human tutors and intelligent tutoring systems. The dialog moves of AutoTutor blend into the discourse context so smoothly that students cannot distinguish whether a speech act was generated by AutoTutor or by a human tutor.

This research will substantially expand the capabilities of AutoTutor by designing the discourse to handle more sophisticated tutoring mechanisms. These mechanisms should further enhance the active construction of knowledge. One enhancement is to get the student to articulate more knowledge, with more formal, symbolic, and precise specification; if the student doesn’t say it, AutoTutor does not consider it covered. Another enhancement is to set up the dialog so that it guides the user in manipulating a 3-dimensional microworld of a physical system; the student attempts to simulate a new state in the physical system by manipulating parameters, inputs, and formulae. The research will develop AutoTutor in the domains of both computer literacy and Newtonian physics, so that we will have some foundation for evaluating the generality of AutoTutor’s mechanisms. AutoTutor has been designed to be generic rather than domain-specific; an authoring tool will be developed that makes it easy for instructors to prepare new material on new topics. After the new versions of AutoTutor are completed, we will evaluate their effectiveness on learning gains, conversational smoothness, and pedagogical quality. During the course of achieving these engineering and educational objectives, the project will conduct basic research in cognitive psychology, discourse processes, computer science, and computational linguistics.

Project Description

General Background: Theory, Research, and Practice

It is widely acknowledged in the field of education that students rarely acquire a deep understanding of the material they are supposed to learn in their courses. Students normally settle for shallow knowledge, such as lists of concepts, a handful of facts about each concept, and simple definitions of key terms. Students lack the deep coherent explanations that organize the shallow knowledge and that fortify the learner for generating inferences, solving problems, and applying their knowledge to practical situations. They lack the skill of articulating and manipulating symbols, formal expressions, and precise quantities. They lack the ability to forecast how a complex system will behave when given different inputs. The acquisition of shallow knowledge is unfortunately reinforced by the normal classroom activities and testing formats. Classroom lectures typically are information delivery systems for shallow knowledge. The teacher’s questions are typically short-answer questions that require only single words or short phrases in the student responses. The format of most examinations consists of multiple choice, true-false, or fill-in-the-blank questions that, once again, tap primarily the shallow knowledge. Given this unfortunate state of affairs, many researchers and teachers in education have been exploring learning environments and pedagogical strategies that promote deep comprehension.

The constructivist movement is the most popular approach to cracking the barrier of shallow knowledge (Biggs, 1996; Bransford, Goldman, & Vye, 1991; Brown, 1988; Chi, deLeeuw, Chiu, & LaVancher, 1994; Palincsar & Brown, 1984; Papert, 1980; Piaget, 1952; Pressley & Wharton-McDonald, 1997; Rogoff, 1990; Vygotsky, 1978). According to this approach, the learner needs to actively construct meanings and knowledge by interacting with the world and other people. Learning environments should stimulate active construction of knowledge and provide feedback on these constructions rather than being mere information delivery systems. Dialectical constructivism stipulates that complex learning primarily occurs through an interaction between learners and their environments, whereas exogenous constructivism emphasizes the constraints of the outside world and endogenous constructivism emphasizes the cognitive and biological constraints of the learner (Moshman, 1982). Constructivist approaches have been so compelling that they have shaped the standards for curriculum and instruction in the United States during the last decade, e.g., Standards for the English Language Arts (NCTE, 1996), Curriculum and Evaluation Standards for School Mathematics (NCTM, 1989), National Science Education Standards (NRC, 1996). One of the central challenges for constructivist theorists is to identify the strategies, processes, practices, and environments that account for the learning gains. Effective pedagogical activities may not be the same for different domains of knowledge and different classes of learners.

One-on-one tutoring is the simplest learning context to investigate because there is only one learner and one teacher. The researcher can track particular pedagogical activities and assess the learning gains without worrying about the impact of the other social agents in group or classroom environments. Moreover, one-on-one human tutoring is extremely effective when compared to typical classroom environments. Cohen, Kulik, and Kulik (1982) performed a meta-analysis on a large sample of studies that compared human-to-human tutoring with classroom controls. The vast majority of the tutors in these studies were untrained in tutoring skills and had moderate domain knowledge; they were peer tutors, cross-age tutors, or paraprofessionals, but rarely accomplished professionals. These “unaccomplished” human tutors enhanced learning with an effect size of .4 standard deviation units, which translates to approximately half a letter grade. Accomplished human tutors do substantially better according to Bloom (1984), who reported an effect size of 2.0 standard deviation units in learning gains (approximately 2 letter grades). So the advantage of human-to-human tutoring over the classroom appears to vary between .4 and 2.0 standard deviation units, depending on the expertise of the tutor.
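The effect sizes above are differences between group means expressed in standard deviation units. As a minimal sketch, assuming the common Cohen's d definition with a pooled standard deviation (the cited studies may have used other variants, such as the control group's standard deviation), an effect size can be computed from two sets of test scores as follows; the function name and the toy scores are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Return the mean difference in pooled standard deviation units."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Toy scores: the means differ by 1 point and both groups have SD 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Note that `stdev` needs at least two scores in each group.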

Computer tutors have implemented some of the pedagogical strategies and discourse patterns of human tutors or ideal learning theories. One advantage of computer tutors is that particular pedagogical strategies can be manipulated experimentally, as opposed to merely observed naturalistically. This makes it easier to determine whether a particular pedagogical component has a causal impact on learning gains. Moreover, computer tutors have proven to be effective. As will be discussed later, AutoTutor implemented the pedagogical strategies and dialog patterns of normal unskilled tutors and produced learning gains of .5 to .6 standard deviation units (Graesser, VanLehn, Rose, Jordan, & Harter, in press). During the last 20 years, several intelligent tutoring systems have been developed that implement sophisticated strategies and mechanisms for promoting learning, such as error identification and correction, building on prerequisites, frontier learning (expanding on what the learner already knows), student modeling (inferring what the student knows and having that guide strategies), and building coherent explanations (Anderson, Corbett, Koedinger, & Pelletier, 1995; Gertner & VanLehn, 2000; Koedinger, Anderson, Hadley, & Mark, 1997; Lesgold, Lajoie, Bunzo, & Eggan, 1992; Sleeman & Brown, 1982; VanLehn, 1990). Those systems that have been successfully implemented (such as VanLehn’s Andes physics tutor and Koedinger’s PACT algebra tutor) have produced learning gains of approximately 1.0 standard deviation unit (one letter grade). It should be noted that most intelligent tutoring systems (ITSs) have either not been fully implemented or have not been assessed for their effect on learning gains, so the field is not quite ready to offer a meaningful meta-analysis.
However, with the data available, it appears that the learning gains of these sophisticated ITSs (1.0 SD) are higher than those of unaccomplished human tutors (.4 SD) but not quite as high as those of accomplished human tutors (2.0 SD). AutoTutor’s performance (.5 to .6 SD) is on par with that of unaccomplished human tutors.

In summary, in-depth analyses of tutorial dialog have uncovered two different mechanisms that potentially explain the effectiveness of tutoring (Corbett, Anderson, Graesser, Koedinger, & VanLehn, 1999; Graesser, Person, & Magliano, 1995). The first is the set of sophisticated tutoring strategies that have been identified in the intelligent tutoring literature. The second is the dialog patterns and natural language that help the tutor scaffold the learner to new levels of mastery. According to Graesser et al. (1995) and the theoretical foundation of AutoTutor-1, there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human tutors (as will be discussed later). According to the performance assessments of ITSs, the sophisticated tutoring strategies move the tutoring process one giant step further. Therefore, the underlying premise of the proposed research is that the ideal computer tutor would embrace both of these mechanisms.

Succinct Overview of Proposed Research

The general goal of the proposed research is to build and test new versions of AutoTutor (hereafter called AutoTutor-2) that combine discourse/natural language mechanisms (D/NLP) and sophisticated intelligent tutoring mechanisms (IT). These two mechanisms are believed to assist the learner in the active construction of knowledge, in the tradition of constructivism. Computer systems will be built for introductory computer literacy and introductory Newtonian physics. After these systems are built, we will evaluate the impact of different learning conditions on learning gains in tests of shallow versus deep knowledge, and on learners’ perception of the learning experience. These outcome measures will be compared in the following learning conditions: (1) a control with no new learning, (2) rereading a chapter from a book, (3) a human tutor selected from the pool of tutors in a university setting, (4) D/NLP alone (AutoTutor-1), (5) D/NLP + IT hybrid (AutoTutor-2), and (6) IT alone (when available). A second general goal is to investigate more specific theories of cognition and discourse that underlie the constructive processes, as will be discussed later in the proposal. For example, how is language coordinated with animation and visual media when a learner manipulates a 3-dimensional microworld of a physical system? A third general goal is to explore new computational architectures that will enhance the capabilities of AutoTutor. For example, how can AutoTutor learn from experience by tuning existing fuzzy production rules and creating new ones? A fourth general goal is to test and develop models in computational linguistics that are needed to understand natural language (NLU) and generate natural language (NLG).

It should be apparent that the proposed research is an interdisciplinary effort that incorporates several fields in addition to education. The PI, co-PIs, senior researchers, and postdoctoral researcher include 5 individuals who are primarily affiliated with cognitive psychology (Graesser, Gholson, Hu, Person, and Wolff), 2 in computer science (Garzon, Kosma), 1 in physics (Franceschetti), and 1 in linguistics (Louwerse). However, nearly all of these 9 faculty have substantial expertise in more than one field, in the spirit of the interdisciplinary Institute for Intelligent Systems at the University of Memphis. Two researchers outside of Memphis have agreed to serve as consultants on our NSF research. Kurt VanLehn is a senior researcher in Computer Science at the Learning Research and Development Center at the University of Pittsburgh. VanLehn has nearly two decades of experience in developing intelligent tutoring systems (including the Andes and Atlas tutoring systems in physics; Gertner & VanLehn, 2000) and recently was program chair of the international Intelligent Tutoring System society. The University of Pittsburgh (VanLehn) is collaborating with the University of Memphis (Graesser, Franceschetti, Hu, Louwerse, and Person) on an ONR/MURI grant that involves building an intelligent tutoring system in qualitative physics, with a different architecture than the proposed AutoTutor-2 (Graesser, VanLehn et al., 2000). James Lester is a professor of Computer Science at North Carolina State University. Lester is one of the pioneers in developing animated pedagogical conversational agents and avatars in tutoring contexts (Lester, Voerman et al., 1999). VanLehn provides critical advice for developing IT, whereas Lester provides advice for D/NLP. The professors in the proposed research will supervise 6 graduate students funded on this project. A computer programmer will also be hired to devote full time to the development of AutoTutor-2.

What is AutoTutor?

The Tutoring Research Group (TRG) at the University of Memphis has developed a fully automated computer system, called AutoTutor, that simulates a typical human tutor (Graesser, Franklin et al., 1998; Graesser, VanLehn, et al., in press; Graesser, Wiemer-Hastings et al., 1999; Person, Graesser, Kreuz et al., in press; Wiemer-Hastings, Graesser et al., 1998). We began developing AutoTutor in September of 1997 when we were funded by an NSF grant in the Learning and Intelligent Systems program (LIS, NSF97-18; grant SBR 9720314, which ends in September 2000). AutoTutor attempts to comprehend student contributions and to simulate the dialog moves of human tutors. AutoTutor-1 currently simulates the dialog moves of normal (unskilled) tutors, whereas in the proposed research we hope to develop AutoTutor-2, which will incorporate more sophisticated tutoring strategies. AutoTutor was developed for college students who take an introductory course in computer literacy. These students learn the fundamentals of computer hardware, the operating system, and the Internet. AutoTutor is written in the Java programming language and currently runs on Pentium computers under the Windows NT operating system.

A brief snapshot of AutoTutor-1 in action should make the nature of AutoTutor concrete. AutoTutor works by having a conversation with the learner. AutoTutor appears as a talking head that acts as a dialog partner with the learner. The talking head delivers AutoTutor’s dialog moves with synthesized speech, intonation, facial expressions, and gestures. The major question (or problem) that the learner is working on is both spoken by AutoTutor and printed at the top of the screen. The major questions are generated systematically from a curriculum script, a module that will be discussed later. AutoTutor’s major questions are not the fill-in-the-blank, true/false, or multiple choice questions that are so popular in the US educational system. Instead, the questions invite lengthy explanations and deep reasoning (e.g., answers to why, how, and what-if questions). The goal is to encourage students to articulate lengthier answers that exhibit deep reasoning, rather than to recite short snippets of shallow knowledge. There is a continuous multi-turn tutorial dialog between AutoTutor and the learner during the course of answering a major question (or solving a problem). Counting the turns of both the learner and AutoTutor, it typically takes 10 to 30 turns to answer a single question from the curriculum script. The learner types his/her contributions on the keyboard during the exchange. For some topics, there are graphical displays and animation, with components that AutoTutor points to. AutoTutor was designed to be a good conversation partner that comprehends, speaks, points, and displays emotions, all in a coordinated fashion.

An example tutorial dialog. The exchange below is one of the shorter tutorial dialogs between a college student and AutoTutor. This student was an above-average student in the computer literacy class.

01 AUTOTUTOR: When you turn on the computer, how is the operating system first activated and loaded into RAM?
02 STUDENT: When the computer is turned on a file is automatically booted up.
03 AUTOTUTOR: Anything else?
04 STUDENT: This file contains the operating system.
05 AUTOTUTOR: What else?
06 STUDENT: The operating system is then read into RAM and activated.
07 AUTOTUTOR: Right. The CPU executes instructions in ROM, which help the operating system to start.
08 STUDENT: After being read from ROM the OS is activated.
09 AUTOTUTOR: Right. Let’s review what we’ve just gone over. The operating system must first be stored on the hard disk. When you turn on the computer, the CPU executes instructions stored in ROM. These instructions help boot the operating system. The operating system is then loaded from the hard disk into RAM.

AutoTutor first asks a how-question in turn 1. Then the student provides answer information in turns 2, 4, and 6. These answers are encouraged by AutoTutor’s pumps in turns 3 and 5 (i.e., “Anything else?”, “What else?”). AutoTutor delivers these pumps in an effort to encourage active learning and question answering on the part of the student. Thus, instead of being an information delivery system that bombards the student with a large volume of information, AutoTutor is a discourse prosthesis that attempts to get the student to do the talking and that explores what the student knows. As discussed earlier, AutoTutor adopts the constructivist view that a key feature of effective learning lies in assisting students in actively constructing subjective explanations and elaborations of the material (Bransford et al., 1991; Chi et al., 1994; Conati & VanLehn, 1999; Pressley et al., 1992) as they answer questions and solve problems that require deep reasoning. At the same time, however, students need to answer enough questions and solve enough problems to understand the constraints of the domain knowledge. It would not be good for the student to flounder unproductively for a long time, so AutoTutor sometimes needs to bring the student back on track by supplying cues and clues that lead to the evolution of a complete answer to the question. These clues include hints, prompts for the student to fill in a word or phrase, and assertions that fill in missing ideas. The student had forgotten the role of ROM in launching the operating system, so AutoTutor brings up ROM in turn 7. The student builds on this suggestion in turn 8. At that point, the important pieces of a good complete answer have been covered, so AutoTutor summarizes the answer in turn 9. AutoTutor periodically gives positive immediate feedback after the student contributions (e.g., “Right.”).
This feedback is not only motivating but also creates the impression that AutoTutor is listening to what the student is communicating. These characteristics of a tutorial exchange are quite similar to discourse patterns in normal tutoring between humans (Graesser & Person, 1994; Graesser et al., Person & Graesser, 1999), as will be described shortly.

Tutorial dialog with unskilled tutors and ideal tutors. AutoTutor incorporated features of tutorial dialog that are prevalent in normal tutoring sessions with unaccomplished human tutors. In previous research projects funded by the Office of Naval Research, Graesser and Person videotaped, transcribed, and analyzed nearly 100 hours of naturalistic tutoring sessions (Graesser & Person, 1994; Graesser, Person, & Magliano, 1995; Person & Graesser, 1999). The corpus of tutoring sessions included (a) graduate students tutoring undergraduates on the fundamentals of research methods and (b) middle school students tutoring younger students in basic algebra. After analyzing this rich corpus, Graesser and Person discovered what tutors do and do not do during most tutoring sessions. Our discoveries were enlightening and often counterintuitive. Whatever these tutors do, it is extremely effective in producing learning gains, as discussed earlier.

Our anatomy of normal tutoring sessions revealed that normal unskilled tutors do not use most of the ideal tutoring strategies that have been identified in education and the intelligent tutoring system enterprise. These strategies include the Socratic method (Collins, 1985), modeling-scaffolding-fading (Collins et al., 1989; Rogoff, 1990), reciprocal teaching (Palincsar & Brown, 1984), anchored situated learning (Bransford et al., 1991), error diagnosis and remediation (Sleeman & Brown, 1982), frontier learning, building on prerequisites (Gagne, 1977), and sophisticated motivational techniques (Lepper et al., 1991). Detailed discourse analyses have been performed on small samples of accomplished tutors in an attempt to identify sophisticated tutoring strategies (Fox, 1993; Hume et al., 1996; Merrill et al., Moore, 1995; Putnam, 1987). However, we discovered that nearly all of these sophisticated tutoring strategies were virtually nonexistent in the unskilled tutoring sessions that we videotaped and analyzed (Graesser et al., 1995; Person & Graesser, 1999). Tutors clearly need to be trained in how to use sophisticated tutoring tactics, because these tactics do not routinely emerge in naturalistic tutoring with untrained tutors. The primary assumption that underlies the proposed research is that the most effective computer tutor will be a hybrid between naturalistic tutorial dialog and ideal pedagogical strategies.

The 5-step dialog frame is one of the prominent dialog patterns in both naturalistic tutoring and many intelligent tutoring systems (Graesser & Person, 1994). The five steps in this frame are presented below.

Step 1: Tutor asks question (or presents problem)
Step 2: Learner answers question (or begins to solve problem)
Step 3: Tutor gives short immediate feedback on the quality of the answer (or solution)
Step 4: Tutor and learner collaboratively improve the quality of the answer
Step 5: Tutor assesses the learner’s understanding of the answer

This 5-step frame has been adopted in AutoTutor-1 and will continue to be adopted in future versions of AutoTutor. The 5-step dialog frame in tutoring is a significant augmentation of the 3-step pattern that is prevalent in classroom instruction. That is, Mehan (1979) and others have reported a 3-step pattern that is often referred to as IRE: Initiation (a question or claim articulated by the teacher), Response (an answer or comment provided by the student), and Evaluation (the teacher evaluates the student contribution). These IRE steps directly correspond to steps 1, 2, and 3 of the 5-step dialog frame for tutoring. Graesser et al. (1995) argued that the advantage of tutoring over the classroom lies primarily in the lengthy multi-turn exchange in step 4. Another possibility might be step 5. However, our anatomy of naturalistic tutoring revealed that tutors only minimally assess the learner’s understanding in step 5. The tutor normally asks “Do you understand?” and the vast majority of student responses are positive (“Yes”), even though most of the students have a vague, incomplete, or incorrect understanding (Person, Graesser et al., 1994). In fact, it is the better students who tend to answer “No” to these comprehension-gauging questions, perhaps because they are more self-regulated learners or have more fine-tuned metacognitive strategies (Hacker, Dunlosky, & Graesser, 1998). An ideal tutor would press the student further by asking follow-up questions that diagnose whether the student truly understands the answer (VanLehn & Martin, 1998).
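The correspondence between the two patterns can be stated compactly. The sketch below, in which the enum and dictionary names are illustrative rather than part of AutoTutor, encodes the 5-step frame and the IRE mapping and recovers the steps that tutoring adds to classroom discourse:

```python
from enum import Enum


class TutorStep(Enum):
    """The 5-step tutoring frame (Graesser & Person, 1994)."""
    ASK_QUESTION = 1               # tutor asks question or presents problem
    LEARNER_ANSWERS = 2            # learner answers or begins to solve
    SHORT_FEEDBACK = 3             # tutor gives short immediate feedback
    COLLABORATIVE_IMPROVEMENT = 4  # multi-turn joint improvement of the answer
    ASSESS_UNDERSTANDING = 5       # tutor assesses learner's understanding


# Classroom IRE pattern (Mehan, 1979) mapped onto the first three steps.
IRE_TO_STEP = {
    "Initiation": TutorStep.ASK_QUESTION,
    "Response": TutorStep.LEARNER_ANSWERS,
    "Evaluation": TutorStep.SHORT_FEEDBACK,
}


def steps_beyond_ire():
    """Return the tutoring steps that have no IRE counterpart."""
    covered = set(IRE_TO_STEP.values())
    return [step for step in TutorStep if step not in covered]


print([step.value for step in steps_beyond_ire()])  # [4, 5]
```

Steps 4 and 5 are exactly the places where, as argued above, tutoring could differ from classroom instruction.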

The mechanism of AutoTutor-1.

It is beyond the scope of this proposal to review all of the components of AutoTutor, but a few of the highlights will convey a general sense of the mechanism.

  1. Curriculum scripts with example problems, deep questions, graphics, and animation. A curriculum script is a loosely ordered set of skills, concepts, example problems, and question-answer units. Most human tutors follow a script-like macrostructure, but briefly deviate from the structure when the student manifests difficulties, misconceptions, and errors. The content of the curriculum script in tutoring (compared with classrooms) has more deep reasoning questions (e.g., why, how, what-if, what-if-not), more problems to solve, and more examples (Graesser et al., 1995). AutoTutor-1 has a curriculum script that organizes the topics of the tutorial dialog. The script includes didactic descriptions, tutor-posed questions, example problems, figures, and diagrams (along with anticipated good responses to each topic). There is also a glossary of technical terms with definitions (i.e., answers to the learner’s “What does X mean?” questions). There were 36 topics (example problems or deep reasoning questions) in AutoTutor-1, 12 each for hardware, the operating system, and the Internet. Each topic is represented simply as a set of words, sentences, or paragraphs in a free text format. Thus, it is easy for a lesson planner to create new topics and content with a simple authoring tool; there is no need to craft the content in structured LISP or Prolog code, which is routinely done when systems are created in the ITS enterprise. Associated with each topic is a focal question, a set of basic noun-like concepts, a set of ideal good answer aspects (each being roughly a sentence of 10-20 words), different forms of expressing or eliciting each ideal answer aspect (i.e., a hint, a prompt, and an assertion), a set of anticipated bad answers (i.e., bugs, misconceptions), a correction for each bad answer, and a summary of the answer or solution. Except for the hints, prompts, and corrections, the preparation of the curriculum script requires no special knowledge on the part of the lesson planner. The system was designed this way so that AutoTutor could be used for a large range of topics (virtually any topic except those that require the precision of mathematics) and so that lesson planners could develop the content with minimal knowledge of discourse or computer science.
  2. Natural language extraction and speech act classification. AutoTutor needs to classify the speech acts of student contributions in order to respond flexibly to what the student types in. AutoTutor segments the string of words and punctuation marks within a learner’s turn into speech act units, relying on punctuation to perform this segmentation. Then each speech act is assigned to one of the following speech act categories: Assertion, WH-question, YES/NO question, Metacognitive comment (“I don’t understand”), Metacommunicative act (“Could you repeat that?”), and Short Response.
  3. Latent Semantic Analysis. The fact that world knowledge is inextricably bound to the process of comprehending language and discourse is widely acknowledged, but researchers in computational linguistics and artificial intelligence have not had a satisfactory approach to handling the deep abyss of world knowledge. Latent semantic analysis (LSA) has recently been proposed as a statistical representation of a large body of world knowledge (Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998). LSA provides the foundation for grading essays, even essays that are not well formed grammatically, semantically, and rhetorically; LSA-based essay graders can assign grades to essays as reliably as experts in composition (Foltz, 1996; Landauer, Foltz, & Laham, 1998). An LSA space is created after processing a large corpus of texts that are relevant to the topic being tutored. LSA uses singular value decomposition to reduce a large word-by-document co-occurrence matrix to K dimensions, where K is approximately 100 to 500. LSA capitalizes on the fact that particular words appear in particular texts (called “documents”). Each word, sentence, or text ends up being a weighted vector on the K dimensions. The “match” (i.e., similarity in meaning, conceptual relatedness) between two words, sentences, or texts is computed as a geometric cosine (or dot product) between the two vectors, with values that typically range from 0 to 1. AutoTutor has successfully used LSA as the backbone for assessing the quality of student assertions, based on matches to good answers and anticipated bad answers (Graesser, Wiemer-Hastings, Wiemer-Hastings, Person, Harter, & TRG, in press).
  4. Selection of Dialog Moves. There needs to be a mechanism for dialog management that has discourse markers and other cues that guide the student in the exchange and that can accommodate virtually any input from the student (Freedman, 1999; Soller et al., 1999). AutoTutor selects dialog moves by using fuzzy production rules and a dialog advancer network (Graesser et al., 1999; Person, Graesser et al., in press). There are different categories of dialog moves: main questions, short feedback (i.e., positive, neutral, negative), pumps (“uh huh”, “tell me more”), prompts (“The primary memories of the CPU are ROM and _____”), hints, assertions, corrections, and summaries. The selection and sequencing of the categories are sensitive to various parameters that are induced from the tutorial dialog. Fuzzy production rules are tuned to (a) the quality of the student’s assertions in the preceding turn, as computed by LSA, (b) global parameters that refer to the ability, verbosity, and initiative of the student, and (c) the extent to which the good answer aspects of the topic have been covered. A dialog advancer network manages the exchange by specifying appropriate discourse markers (e.g., “Moving on”, “Okay”) and dialog move categories within a tutor’s turn in a fashion that is sensitive to the learner’s previous turn. Formally, the dialog advancer network and associated production rules constitute an augmented state transition network. In AutoTutor-1, the next good answer aspect to cover from the curriculum script is selected according to the zone of proximal development. That is, AutoTutor builds on the fringes of what is known in the discourse space between the student and AutoTutor by selecting the good answer aspect that has the highest subthreshold coverage value (i.e., the idea that is almost, but not quite, covered). A topic is finished when all of the aspects have coverage values that meet or exceed a threshold t.
  5. Talking Head with Gestures. Researchers have recently developed animated conversational agents that have speech synchronized with facial expressions and gestures (Cassell et al., 1999; Cassell & Thorisson, 1999; Cohen & Massaro, 1994; Johnson, Rickel, & Lester, in press; Lester et al., 1999; Rickel & Johnson, 1999). Ideally, the computer controls the eyes, eyebrows, mouth, lips, teeth, tongue, cheekbones, and other parts of the face in a fashion that meshes appropriately with the language and emotions of the speaker. Microsoft Agent is currently being used as the talking head with synthesized speech in AutoTutor-1. Parameters of the facial expressions are generated by fuzzy production rules. There are eye blinks, nods and “uh-huh” for back-channel feedback, and hand gestures that try to prompt information out of the learner. Positive, neutral, and negative feedback are expressed by animated facial expressions and synthesized speech with appropriate intonation. AutoTutor-1 should be contrasted with other conversational agents that require considerably more computational power, sometimes as many as 5 independent processors (as is the case in the systems cited above).

It is important to reiterate that AutoTutor’s architecture was designed to make it easy to create a tutoring system for new material. In the proposed research we will develop AutoTutor-1 for physics. This can be accomplished in just 3 steps. First, we need a large corpus of texts on physics in electronic form; the existing LSA program can create an LSA space from the corpus in a day or two. Second, we need a lesson planner to create a curriculum script with (a) deep reasoning questions (and/or example problems to solve) and associated pictures or animations, (b) key concepts and their synonyms, (c) good answer aspects and their associated hints and prompts, and (d) anticipated bad answers and their corrections. All of this material is entered in English, guided by an authoring tool. Third, we need a glossary of terms and their definitions in electronic form. After these three items are supplied, AutoTutor does all of the rest automatically; there is no tinkering and tuning of parameters.

Evaluations of AutoTutor.

AutoTutor has been tested on nearly 200 students in a computer literacy course at the University of Memphis. The tutoring was provided as extra credit in the course, at a point in time after the students had allegedly read the relevant chapters and attended a lecture in the course. AutoTutor thus gave students an opportunity for additional study of the material. Our evaluations of gains in learning and memory were very promising. AutoTutor provided an effect size increment of approximately .5 to .6 SD units when compared to a control condition where students reread yoked chapters in the book or did nothing. This increment in learning gains was found for questions that tap both deep and shallow learning. These results are on par with, if not better than, the .4 SD that occurs in normal human tutoring (Cohen et al., 1982).

For illustration, consider one of the experiments that we conducted on AutoTutor-1. AutoTutor-1 was tested on 36 students in a computer literacy course at the University of Memphis. The students received extra credit for participating in the experiment. For each student, each of the three macrotopics (hardware, operating systems, internet) was assigned to one of three conditions, using a suitable counterbalancing scheme: AutoTutor (student uses AutoTutor to study the macrotopic), Reread (student rereads the chapter for the macrotopic), and no-read Control (student does not restudy the macrotopic). A repeated measures design was used so that we could evaluate aptitude X treatment interactions; that is, we could assess whether AutoTutor is relatively effective for some categories of learners but not others (such as high versus low performers overall). On average, students took 38 minutes to use AutoTutor, which was somewhat less than the 45 minutes assigned in the Reread condition. There were 3 outcome measures. The first was a sample of testbank questions that were actually used in the computer literacy course; these were in an N-alternative multiple-choice format. We discovered that all of these questions were shallow according to Bloom’s taxonomy of cognitive difficulty (Bloom, 1956). The second was a sample of deep multiple-choice questions, one question for each of the 36 topics, that tapped causal inferences and reasoning. Finally, there was a cloze test that had 4 critical words deleted from the ideal answers of each topic; the students filled in these blanks. The proportion of correct responses served as the metric of performance. We also combined all three outcome measures into a composite score. There were significant differences in composite scores among the three conditions, with means of .43, .38, and .36 in the AutoTutor, Reread, and Control conditions, respectively, F(2, 70) = 6.10, p < .05.
Planned comparisons showed the following pattern: AutoTutor > Reread = Control. The effect size of AutoTutor over Control was .50. A repeated measures ANOVA was performed that crossed the three conditions with the three types of subtests. There was a significant main effect of condition, F(2, 70) = 48.03, p < .05, MSe = .038, a significant main effect of test, F(2, 70) = 3.06, p < .05, MSe = .037, and no significant interaction. Aptitude X treatment interactions were not found in this study, but we remain in the hunt for such interactions. These results support the conclusion that AutoTutor had a significant impact on learning gains.
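As a back-of-the-envelope check, assume the reported effect size is the conventional standardized mean difference (the text does not state which standard deviation served as the denominator). The rounded composite means then imply a standard deviation of roughly .14 proportion-correct units; this is a derived figure, not a reported statistic:

```latex
d = \frac{\bar{X}_{\mathrm{AutoTutor}} - \bar{X}_{\mathrm{Control}}}{s}
\quad\Longrightarrow\quad
s = \frac{0.43 - 0.36}{0.50} = 0.14
```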

We have evaluated AutoTutor on the conversational smoothness and the pedagogical quality of its dialog moves in the turn-by-turn tutorial dialog (Person, Graesser, Kreuz et al., in press). When experts rate the quality of AutoTutor’s dialog moves, the mean ratings are positive, but there is clearly room to improve the naturalness and pedagogical effectiveness of its dialog. In a recent study we performed a bystander Turing test on the naturalness of AutoTutor’s dialog moves. How did we do that? We randomly selected 144 tutor moves in the tutorial dialogues between students and AutoTutor-1. We asked 6 human tutors (from the tutor pool on computer literacy at the University of Memphis) to fill in what they would say at these 144 points. So at each of these 144 tutor turns, we had what the human tutor generated and what AutoTutor generated. We subsequently tested a group of 36 computer literacy students as to whether they could discriminate whether these dialog moves were generated by a human or a computer; in fact, half were generated by humans and half by the computer. We found that these students were unable to discriminate whether particular dialog moves had been generated by a computer or a human; the d’ discrimination scores were actually slightly negative (-.08), although not significantly so. This rather impressive outcome supports the claim that AutoTutor is a good simulation of unaccomplished human tutors.
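The d’ score is the standard signal-detection measure: the z-transformed hit rate minus the z-transformed false-alarm rate. In this Python sketch a “hit” is assumed to be a human-generated move judged “human” and a “false alarm” a computer-generated move judged “human”; the log-linear correction for rates of 0 or 1 is one common choice among several and is an assumption here, as are the function names. A d’ near zero (or slightly negative, as reported above) means judges could not tell the two sources apart.

```python
from statistics import NormalDist

def corrected_rate(count, n):
    """Log-linear correction: keeps rates strictly between 0 and 1 so z-scores stay finite."""
    return (count + 0.5) / (n + 1)

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

if __name__ == "__main__":
    # Hypothetical judge: 40 of 72 human moves and 42 of 72 computer moves called "human"
    hits = corrected_rate(40, 72)
    false_alarms = corrected_rate(42, 72)
    print(round(d_prime(hits, false_alarms), 3))
```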

AutoTutor has done a surprisingly good job evaluating the quality of the answers that students type in during the tutorial dialog. AutoTutor attempts to “comprehend” the student input by segmenting the contributions into speech acts and matching the student’s contributions to good answer aspects and bad answers through latent semantic analysis (LSA) (Landauer & Dumais, 1997). Our research revealed that AutoTutor is almost as good as an expert in computer literacy in evaluating the quality of student answers to questions and the quality of contributions in the tutorial dialog (Graesser, Wiemer-Hastings et al., in press; Wiemer-Hastings et al., 1999). For example, 2 graduate student research assistants have a correlation of approximately .5 to .6 when grading the quality of student answers, whereas the correlation between AutoTutor’s LSA component and a graduate student RA is nearly .5. Some critics may not be impressed with interjudge reliability scores of .5 to .6, but it should be noted that the interjudge reliability correlations were only approximately .6 to .7 when Foltz had expert composition teachers grade essays. The goal of this research is not to carefully train a group of experts to optimize their reliability scores (a goal of some research), but rather to obtain a reasonable estimate of the reliability of these scores in a naturalistic context and to observe how well AutoTutor’s LSA component compares. We found that our LSA evaluator of the quality of student contributions was in the arena of graduate student RAs, the individuals who normally grade these answers in a university course.
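The reliability figures above are correlations between two sets of scores for the same student answers (RA versus RA, or RA versus the LSA component). A minimal Python sketch, assuming Pearson’s r is the correlation in question (the text does not name the coefficient); the function name is hypothetical:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two raters' scores on the same items."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if either rater gives every item the same score
    return sxy / sqrt(sxx * syy)
```

Comparing pearson_r(ra_scores, lsa_scores) against pearson_r(ra1_scores, ra2_scores) reproduces the kind of comparison described in the text.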

Limitations of AutoTutor-1.

AutoTutor-1 is a promising first step in developing an automated computer tutor that simulates human tutors. However, there are a number of general limitations that must be acknowledged and that motivated the proposed research. First, AutoTutor-1 was limited to simulating an unaccomplished human tutor. More impressive learning gains are expected if the tutorial dialog modules support more sophisticated tutoring strategies. Second, the natural language processor was limited to lexicons, speech act classification, and LSA. AutoTutor should improve by adding more powerful natural language understanding components that are available in computational linguistics, such as syntactic parsers and semantic analyzers (Allen, 1995; Jurafsky & Martin, 2000). Third, the dialog management of AutoTutor-1 is an augmented state transition network that does not track the goals, beliefs, and shared knowledge of the tutor and learner. This was partly intentional: most student contributions are too vague to make it worthwhile to track their goals, beliefs, and knowledge states. However, it might be worthwhile to explore discourse components and models that have been proposed in the Discourse Research Initiative (1997; Poesio & Traum, 1998). Fourth, it is uncertain how well AutoTutor would hold up in another learning domain. Fifth, there are limitations with Microsoft Agent that prevent it from going the distance in simulating human tutors. For example, the version of Microsoft Agent that was available during the development of AutoTutor-1 cannot point and speak simultaneously. We need to enhance the conversational agent in a fashion that coordinates speech, facial expressions, pointing, and body movements as the agent interacts with animated graphical displays and the learner in a context-sensitive fashion.

Proposed Research

The proposed research will develop and test enhanced versions of AutoTutor that incorporate more sophisticated tutoring mechanisms in an attempt to promote active construction of knowledge. We refer to the enhanced version as AutoTutor-2, although there may be multiple versions that focus on particular tutoring capabilities. For example, one enhanced version will attempt to get the student to articulate all of the important pieces of information; if the student doesn’t say it, then it is not covered. This is in contrast to AutoTutor-1, where it was assumed that a good answer aspect was covered if it was mentioned by either the student or the tutor in the shared discourse space. Efforts will be made to get the student to articulate information in formal language, with precision, or with symbols accurately linked to referents. Another enhanced version will attempt to get the learner to actively manipulate parameters in a 3-D simulation of a microworld for a physical system. Imagine the student manipulating gravity, friction, resistance, and other components of Newtonian physics and then exploring the consequences in a simulation. These two versions of AutoTutor-2 will use natural language and tutorial dialog to scaffold the student toward enhanced articulation and toward understanding simulations of physical systems, respectively. These two versions of AutoTutor-2 are being planned, but we want to leave the door open to exploring other sophisticated tutoring techniques.

In addition to the practical objectives of developing tutorial software and testing its impact on learning gains, the proposed project will advance basic scientific research in several areas: cognitive psychology/science, discourse processing, computer science, and computational linguistics. AutoTutor is a complex system that requires solutions at many levels of cognition (pattern recognition, learning, knowledge representation, reasoning, problem solving), language (lexicon, syntax, semantics), and discourse (speech act classification, dialog planning, dialog management, common ground, repair). It incorporates a number of different computational and quantitative architectures that are suited to the idiosyncratic features of the phenomena being modeled, such as neural networks, fuzzy production rules, latent semantic analysis, and finite state automata. AutoTutor-2 will incorporate new computational architectures, the ability to learn from past tutorial dialogs, and more advanced modules from computational linguistics (context-free syntactic parsers, semantic grammars, dialog planners).

There is a long history of basic research on cognitive mechanisms, but surprisingly little of this research has been tested in complex learning environments (see Hegarty, Narayanan, & Freitas, in press; Mayer, 1997). For example, cognitive scientists know very little about how learners monitor and allocate attention while viewing a complex human-computer interface with a talking head, an animated display of a causal mechanism, a window with the focal question or problem, and a GUI earmarked for the learner’s keyboard input. Therefore, in the proposed research, we plan on collecting eye tracking data while college students use AutoTutor. It is possible to experimentally manipulate features of AutoTutor’s interface and to observe the impact on the learners’ linguistic descriptions, eye tracking profiles, learning gains, and other cognitive measures. For example, how can the synthesized speech, facial expressions, and gestures of the talking head be coordinated with the presentation of the graphical displays and animation? When is there a split attention problem (Sweller, 1988) or cognitive overload from “feature bloat” on the interface?

In summary, there are six major objectives in the proposed 3-year project:

  1. To develop AutoTutor-2 and additional versions that enhance the conversational and pedagogical capabilities of AutoTutor.
  2. To develop AutoTutor for both introductory computer literacy and basic Newtonian physics.
  3. To develop an authoring tool for instructors to prepare new material on AutoTutor.
  4. To test the effectiveness of AutoTutor on learning gains, conversational smoothness, and pedagogical quality.
  5. To conduct basic research on cognitive mechanisms that explore how college students interact with AutoTutor’s complex learning environment.
  6. To conduct basic research in computer science and computational linguistics that potentially improves the computational components of AutoTutor.

It is beyond the scope of this proposal to cover all of these major objectives both comprehensively and in depth. Hopefully, our previous successes in building and testing AutoTutor-1 make a convincing case that the research team can deliver a working system, test its effectiveness, advance basic research, and disseminate the findings to the scientific community. The work on AutoTutor-1, which started in the fall of 1997, has produced 35 publications (in refereed journals, books, and conference proceedings) and 46 presentations at professional societies. Instead of being comprehensive, the remainder of the proposal will focus on particular issues, plans, and methods.

Getting the student to articulate knowledge.

As discussed above, there are pedagogical advantages to having the student articulate knowledge rather than having the knowledge be delivered by the tutor. Therefore, a good answer aspect will be considered covered in AutoTutor only if it is articulated by the student. In some cases, there is pedagogical value in having the knowledge articulated precisely, formally, and with appropriate symbols rather than in the informal language that is typical of conversation (Biber, 1988; Clark, 1996). Indeed, it could be argued that precisification, formalization, and symbolization are critical features of the learning process, and some developers of ITSs in geometry (Koedinger & ) and physics ( ) have directly focused on building GUIs to encourage these processes. Somehow, the dialog management of AutoTutor-2 needs to provide a systematic mechanism for getting the student to articulate the important knowledge.

In order to convey how AutoTutor would promote the student’s articulation of knowledge, a few details are needed about the mechanisms of AutoTutor. Consider first the curriculum script. Presented below is an example focal question, a set of good answer aspects (designated as “\pgood”), different forms of expressing one of the good answer aspects, an anticipated bad answer (designated as “\bad”), and a correction of the bad answer. The exclamation point symbol (!) is part of the mark-up language for the talking head; it designates that a word should be stressed in the synthesized speech.

\topic_Operating_System
\info-8 !Large, !multi-user !computers often work on several jobs !simultaneously. This is known as !concurrent processing. Computers with state-of-the-art !parallel !processing use multiple CPUs to process !several jobs simultaneously. However, the typical computer today has only !one CPU. So here's your !question.
\question-8 How does the operating system of a typical computer process !several jobs simultaneously, with only !one CPU?
\pgood-8-1 The operating system helps the computer to work on several jobs simultaneously by rapidly switching back and forth between jobs.
\pgood-8-2 When there is idle time on one process or job, the operating system takes advantage of this idle time by working on another job.
\pgood-8-3 Timesharing computers use concurrent processing whenever multiple users are connected to the system. A timesharing computer moves from terminal to terminal, checking for input and processing each user's data in turn.
\pgood-8-4 Concurrent processing is common in personal computer operating systems that allow multitasking.
\pelab-8-4 Concurrent processing is common in operating systems that allow multi !tasking.
\phint-8-4-1 When would !concurrent processing be needed in a !personal computer?
\pprompt-8-4-1 Concurrent processing on !several tasks is common in operating systems that allow
\ppromptc-8-4-1 That allow multi !tasking.
\pgood-8-5 Multitasking allows the computer user to issue a command that initiates a process in one application while the user works with other applications.
\bad-8-1 The operating system does one job at a time.
\correction-splice-8-1 The operating system can work on !several jobs at !once.
\summary-8 The operating system !rapidly switches !back and !forth between !jobs. When there is idle time on !one job, the operating system switches to !another job. Multi !tasking allows the computer to work concurrently on !one command while processing !other commands of a single user. !Timesharing allows several !users to use an operating system !simultaneously.

Some of the slots contain content that participates in pattern-matching processes. That is, the student speech acts that are classified as Assertions are matched to the content of the \pgood and \bad slots; if there is a high enough match to one of these slots in the curriculum script (i.e., the LSA cosine meets some threshold and beats the competitors), then there is a successful unification with the slot and that slot is covered. Other slots contain content that is produced as synthesized speech by the talking head. These are the slots with mark-up language (!), such as those that present didactic information (\info), questions (\question), assertions-elaborations (\pelab), hints (\phint), prompts (\pprompt), correct prompt completions (\ppromptc), and corrections (\correction-splice). These speech acts are produced when triggered by the Dialog Advancer Network (DAN) and a set of fuzzy production rules.
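To make the slot mechanics concrete, here is a hedged Python sketch that parses curriculum-script lines in the format shown above into (slot, id, text) triples, strips the exclamation-point stress mark-up before matching, and unifies a student Assertion with the best-matching \pgood or \bad slot if the match clears a threshold. A crude word-overlap (Jaccard) score stands in for the LSA cosine so that the example is self-contained. The regular expression, function names, and threshold are assumptions, not AutoTutor’s actual implementation; header lines such as \topic_Operating_System simply do not match the pattern and are skipped.

```python
import re

# \slot-name-8-4-1 text  ->  slot "slot-name", id "8-4-1", text
SLOT_LINE = re.compile(
    r"^\\(?P<slot>[a-z]+(?:-[a-z]+)*)(?:-(?P<id>\d+(?:-\d+)*))?\s+(?P<text>.*)$"
)

def parse_script(lines):
    """Return (slot, id, text) triples; lines that do not fit the pattern are skipped."""
    entries = []
    for line in lines:
        m = SLOT_LINE.match(line.strip())
        if m:
            entries.append((m["slot"], m["id"], m["text"]))
    return entries

def plain(text):
    """Remove the '!' stress mark-up before pattern matching."""
    return text.replace("!", "")

def jaccard(a, b):
    """Toy stand-in for the LSA cosine: proportion of shared words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def match_assertion(assertion, entries, similarity, threshold):
    """Return (slot, id) of the \\pgood or \\bad entry the assertion unifies with, or None."""
    candidates = [(similarity(assertion, plain(text)), slot, sid)
                  for slot, sid, text in entries if slot in ("pgood", "bad")]
    if not candidates:
        return None
    # The best match must also clear the threshold (ties broken arbitrarily by name)
    score, slot, sid = max(candidates)
    return (slot, sid) if score >= threshold else None
```

In a real system the similarity argument would be the LSA cosine between the assertion and each slot, and the threshold would be the coverage threshold t.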

There are a variety of ways to get the student to articulate the knowledge in AutoTutor-2 at different levels of specification. One method is to have a larger family of hints and prompts associated with any particular good answer aspect A. Each hint or prompt would be designed to elicit a different noun-phrase, prepositional phrase, or clause in the good answer aspect. In essence, hints and prompts would continue to be selected until the missing constituents have been supplied by the student. It would be possible to implement one or more cycles of hint-prompt-assertion when extracting the constituents of aspect A. That is, a hint is first generated in tutor turn N, then a prompt in turn N+2, then an assertion in turn N+4, and then additional cycles until all of the content is articulated by the student. A progressive specificity in hinting mechanisms has been implemented in the ANDES physics tutor (Gertner & VanLehn, 2000) and in the PACT algebra tutor (Koedinger et al., 1997), but these systems have not yet implemented tutorial dialog in natural language. Another method of getting the student to articulate the content more precisely is to raise the LSA threshold (t) for coverage of a good answer aspect A. As the threshold approaches 1.0, the student would be expected to articulate the information in a fashion that closely matches the good answer aspect. An obvious method of encouraging a formal articulation of knowledge is to have the good answer aspects articulated in formal language, as opposed to conversational language. Regarding symbolization, it will be necessary to have a syntactic parser identify noun-phrase constituents and bind these referring expressions to referents in a knowledge structure associated with the topic. This will require a syntactic parser and a semantic analyzer that is capable of resolving anaphoric reference (e.g., binding pronouns and noun-phrases to referents).
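The hint-prompt-assertion cycle can be sketched as a small loop. This Python sketch assumes a hypothetical callback, student_supplies(move, constituent), that reports whether the student’s next turn supplied a given constituent (in AutoTutor this judgment would come from LSA matching against threshold t). The cap max_cycles is also an assumption, since the text leaves the stopping rule open.

```python
MOVES = ("hint", "prompt", "assertion")  # escalating specificity within one cycle

def elicit(constituents, student_supplies, max_cycles=2):
    """Cycle hint -> prompt -> assertion for each constituent of aspect A.

    Tutor moves occupy turns N, N+2, N+4, ...; the student replies in between.
    Returns (tutor moves issued, constituents the student never articulated).
    """
    transcript = []
    missing = []
    for constituent in constituents:
        supplied = False
        for _cycle in range(max_cycles):
            for move in MOVES:
                transcript.append((move, constituent))
                if student_supplies(move, constituent):
                    supplied = True
                    break
            if supplied:
                break
        if not supplied:
            missing.append(constituent)
    return transcript, missing
```

Because an aspect in AutoTutor-2 counts as covered only when the student articulates it, a tutor assertion does not end the loop by itself; only the student’s reply does.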
Computational linguistics has made noticeable progress in automating components of language analysis that lie within the span of a sentence or short discourse segment, such as tagging the part-of-speech of words, identifying the correct sense of words with multiple senses, parsing sentence syntax, connecting adjacent clauses, and extracting information that is relevant to slots in conceptual templates (Allen, 1995; DARPA, 1995; Jurafsky & Martin, 2000; Lehnert, 1997). During the development of AutoTutor-2, we have experimented with Abney’s SCOL parser (1997) and Rose and Lavie’s LCFLEX parser (1998) because these parsers can perform a partial or repaired analysis of student contributions that are not well formed syntactically (which is very often the case). We will integrate more modules from computational linguistics into AutoTutor-2.

Getting the student to use the knowledge in computer literacy and Newtonian physics.

So far AutoTutor has been developed exclusively in the area of computer literacy. In order to assess the generality of AutoTutor, the proposed research will develop learning modules in Newtonian physics. Physics is selected as a tutoring topic because some of the members of this project (Graesser, Franceschetti, Hu, Louwerse, Person) are collaborating with Kurt VanLehn (at the University of Pittsburgh) in building an ITS in qualitative physics. The project is funded by the Office of Naval Research on a MURI grant (N00014-00-1-0600, 2000-2005). This Why2 system has a collaborative dialog in natural language while the student attempts to solve qualitative physics problems and to explain their reasoning (Graesser, VanLehn, et al., in press). The goals, scope, and computational architecture of Why2 are significantly different from those of AutoTutor. The goal of the proposed research is to develop an AutoTutor version for Newtonian physics in order to assess the generality of the AutoTutor architecture.

Another reason for focusing on physics is that it is possible to set up a learning environment that simulates physical events and to analyze how language is coordinated with the learner’s understanding of those events. Deep comprehension is achieved when learners can use their knowledge to forecast what will happen in simulations. Even after many hours of physics instruction, students often continue to have misconceptions about the dynamics of moving objects (e.g., McCloskey, Caramazza, & Green, 1980; Halloun & Hestenes, 1985). For instance, even after training, students may still believe that gravitational force varies in magnitude as a projectile rises and falls (Ploetzner & VanLehn, 1997). This may occur, in part, because formal instruction tends to emphasize quantitative physics knowledge, that is, knowledge that defines functional relations in terms of algebraic and vector-algebraic equations. Only rarely do textbooks address the common misconceptions that plague people’s qualitative physics knowledge. Qualitative physics knowledge is an understanding of the general characteristics of a physical system that can be used to make simple predictions about a system’s behavior under different conditions (Forbus, 1984). Clearly, both kinds of knowledge are needed in the analysis of physical systems.

To acquire qualitative knowledge, learners must understand that the behavior of a physical system is an emergent property of that system. They must recognize that an object’s path through space results from a complex interaction of factors involving gravity, air resistance, wind, elasticity, friction and density. In the classroom setting, problems that ask students to “solve for” a particular value, while clearly necessary for the development of quantitative knowledge, may conceal the dynamic nature of physical systems. Our hypothesis is that qualitative knowledge of physical systems is best fostered by tasks that highlight the complex, interactive nature of physical systems. More specifically, we hypothesize that the emergent properties of a force dynamic system are best learned through the active construction of systems that give rise to those properties. What learners need, then, is a conceptual workbench for interactively examining how force dynamic systems operate under a range of constraints.

Such a workbench is now possible due to recent advances in computer simulation and visualization. To build such a system, we will use a software package called 3D Studio Max, one of the most powerful animation and modeling packages available. Among other features, the program has a robust and very fast motion dynamics system that allows for real-time interaction. The motion dynamics system solves for the motion, energy, and momentum of sets of objects over time. As in the real world, collisions between objects depend on the velocity of the objects and their properties. The key properties assigned to an object are its density, elasticity, static friction, and sliding friction. Static friction specifies how hard it is for an object to start moving on a surface, while sliding friction determines how hard it is for an object to continue moving over a surface. Objects move when acted upon by gravity, wind, or other objects. One other important parameter is air resistance. When an object moves, it encounters air resistance (except in a vacuum), and this resistance grows with the square of the object’s speed. Thus, air resistance imposes an upper limit on the speed of objects falling under gravity, and it also makes objects tumble because it acts on each face of the object.
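The upper limit on falling speed follows directly from quadratic air resistance: drag c·v² grows until it balances the weight m·g, at the terminal speed sqrt(m·g/c). The Python sketch below integrates this with a simple Euler step to show the approach to that limit. It is an illustration under stated assumptions (point mass, constant drag coefficient c, no wind or tumbling), not how 3D Studio Max’s dynamics engine works, and the function names are hypothetical.

```python
import math

def fall(mass, drag_coeff, g=9.81, dt=0.001, duration=20.0):
    """Euler integration of a falling object with quadratic air resistance.

    Acceleration = g - (c * v * |v|) / m. Returns the downward speed (m/s) at the end.
    """
    v = 0.0
    for _ in range(round(duration / dt)):
        drag = drag_coeff * v * abs(v)
        v += (g - drag / mass) * dt
    return v

def terminal_speed(mass, drag_coeff, g=9.81):
    """Speed at which drag balances gravity: m*g = c*v**2 (undefined in a vacuum, c = 0)."""
    return math.sqrt(mass * g / drag_coeff)
```

With mass 1 kg and c = 0.1 kg/m, the simulated speed levels off near sqrt(98.1), about 9.9 m/s; doubling the mass raises the limit by a factor of sqrt(2), which is the kind of relation a learner could discover by manipulating density and air resistance.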

One especially significant feature of 3D Studio Max is its configurability. This flexibility is made possible by a built-in scripting language (accessible to non-programmers) that can be used to control all the parameters of a motion dynamic system, as well as virtually all other parameters in the system, including the user interface, which can be completely reconfigured or replaced. It also allows for live interfacing with external systems. In effect, then, the program can be placed within AutoTutor, seamlessly handling the higher-order math necessary for simulating a dynamical system as well as outputting smoothly shaded graphics.

We hypothesize that interactions with a simulated dynamical system will foster qualitative knowledge of that system. This can be tested in a series of experiments. The training part of each experiment will usually begin with a short presentation of a computer-generated sequence of events. For example, participants might view a ball rolling into a second ball, causing the second ball to roll, or a marble rolling up and off a ramp, landing on a see-saw, and catapulting a box over a barrier. Participants will then be asked to replicate these sequences of events by setting the parameters of the motion dynamics simulator. Participants will have access to one or more of the following parameters:

  1. Point force (1 = one newton)
  2. Gravity (1 = force imparted by gravity at sea level)
  3. Air resistance (0 = vacuum; 100 = air resistance at sea level)
  4. Density of any object (1 = one gram/cubic centimeter)
  5. Elasticity of any object (0 = clay; 1 ≈ super ball)
  6. Sliding friction (0 = frictionless; 1 ≈ sandpaper)

As they test different sets of parameters, participants should discover that this seemingly simple event is quite complex. The system is also quite delicate: even small changes in the parameters can have radical effects. For example, if the density of the first marble is reduced from 1 to 0.6 g/cc, it will bounce back into the air upon hitting the heavier second marble (see Figure 2a). Conversely, if the density of the second marble is reduced from 1 to 0.6 g/cc, the second marble will fly off into the air once it is bumped before returning to the surface. Discovering a set of parameters that results in a particular sequence of events should be an interesting challenge for many of the participants. Once again, AutoTutor’s dialog facility will assist the student in manipulating these parameters and observing what happens.
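A one-dimensional collision model shows why the lighter marble rebounds. The Python sketch below assumes equal-sized marbles (so mass is proportional to density), a head-on collision along a line, and a coefficient of restitution standing in for the elasticity parameter. It ignores friction, rolling, and the vertical motion that lifts a marble off the surface, all of which a full simulator would handle; the function name is hypothetical.

```python
def collide(m1, v1, m2, v2, restitution=1.0):
    """1-D collision with coefficient of restitution e (1 = perfectly elastic, 0 = clay).

    Conserves momentum; the separation speed afterward is e times the approach speed.
    Returns the post-collision velocities (v1_after, v2_after).
    """
    total = m1 + m2
    v1_after = (m1 * v1 + m2 * v2 + m2 * restitution * (v2 - v1)) / total
    v2_after = (m1 * v1 + m2 * v2 + m1 * restitution * (v1 - v2)) / total
    return v1_after, v2_after

if __name__ == "__main__":
    # First marble lighter (0.6 vs 1 g/cc): its velocity reverses, i.e., it rebounds
    print(collide(0.6, 1.0, 1.0, 0.0))
    # Second marble lighter: it leaves faster than the incoming marble was moving
    print(collide(1.0, 1.0, 0.6, 0.0))
```

With equal densities and a perfectly elastic collision, the first marble stops dead and the second takes all of its velocity, which is why the density changes described above produce such qualitatively different outcomes.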

We will investigate how our language for motion dynamic systems might change as our qualitative knowledge of such systems improves. It has been claimed that in colloquial speech we adopt a materialistic and causal world view, whereas scientific physics is acausal, constraint-based, and nonmaterialistic (Chi, Slotta, & de Leeuw, 1994). Indeed, there are many verbs in English that encode the notion of cause (Levin & Rappaport Hovav, 1994), and everyday language often uses individual entities to refer to chains of processes (Van Valin & Wilkins, 1996). We predict, then, that with increases in qualitative knowledge there should be a shift in how people describe motion dynamic systems. First, we predict that verbs encoding the notion of cause (e.g., cause, force, make, get) will be replaced with verbs simply describing a transfer of motion or momentum (e.g., impart, communicate, guide, give). In addition, we predict that as qualitative knowledge increases, people will be more likely to describe relationships in terms of processes than objects.
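One simple way to quantify the predicted verb shift is the share of causal verbs among all causal and transfer verbs in a participant’s description. The Python sketch below uses word lists built by hand from the example verbs in the text plus their inflections; the lists and function name are hypothetical. Because plain word matching would also count nouns such as “force”, a real analysis would need part-of-speech tagging or human coding.

```python
import re

# Hypothetical lists seeded from the example verbs in the text
CAUSAL = {"cause", "causes", "caused", "causing", "make", "makes", "made", "making",
          "get", "gets", "got", "getting", "force", "forces", "forced", "forcing"}
TRANSFER = {"impart", "imparts", "imparted", "imparting",
            "communicate", "communicates", "communicated", "communicating",
            "guide", "guides", "guided", "guiding",
            "give", "gives", "gave", "given", "giving"}

def causal_share(description):
    """Proportion of causal verbs among causal + transfer verb tokens (None if neither occurs)."""
    tokens = re.findall(r"[a-z]+", description.lower())
    causal = sum(t in CAUSAL for t in tokens)
    transfer = sum(t in TRANSFER for t in tokens)
    total = causal + transfer
    return causal / total if total else None
```

Under the hypothesis, this proportion should fall as participants’ qualitative knowledge improves across sessions.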

Building an authoring tool for AutoTutor.

AutoTutor was initially designed to make it easy for a lesson planner to create new topics in computer literacy and to create tutoring systems in other domains. Stated differently, AutoTutor is generic rather than content constrained. The AutoTutor architecture can be used for virtually any domain that has the following characteristics: (1) there is a correct answer to questions or problems and (2) the answer does not require the precision of mathematics or syllogistic reasoning. The latent semantic analysis (LSA) component in AutoTutor is tailored for verbal answers of varying lengths and for assessments of similarity between student input and expected answers. There are only two modules that need to be developed for any domain of tutoring that fits this description: the LSA space and the curriculum script.

It will be straightforward to develop an authoring tool to create the LSA space for a new knowledge domain. In fact, we have already created a web site that instructs users how to develop an LSA space from a corpus of documents. The proposed tool will first request a corpus of electronic texts. If these are unavailable, the tool will need to guide the user to scan in texts, apply optical character recognition (OCR), correct misspellings, cleanse the text of unwanted characters and symbols, and discard unwanted excerpts. The tool will then ask the user to declare some parameters, such as (a) whether to use a sentence, paragraph, or section as the document unit and (b) how many dimensions, K, to use (between 100 and 500 is the typical range). The LSA space is then created, and each word is assigned a vector of values on the K dimensions. This space is the input to AutoTutor.
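
The core of LSA is a truncated singular value decomposition (SVD) of a word-by-document matrix. Each word's vector is its row of U_K times the top K singular values. Each string passed as one element of `docs` plays the role of the document unit in (a), and `k` plays the role of K in (b). This is a minimal, dependency-free sketch, practical only for toy corpora. It uses raw counts, whereas real LSA spaces usually apply a log-entropy weighting first. It also obtains the SVD by power iteration on A·Aᵀ rather than calling a numerical library. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_document_matrix(docs):
    """Rows are words, columns are document units; cells hold raw counts."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[w] for c in counts] for w in vocab]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration plus deflation.

    May return fewer than k pairs if the matrix has lower rank.
    """
    n = len(sym)
    mat = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.01 * j for j in range(n)]  # deterministic, non-uniform start
        lam = 0.0
        for _ in range(iters):
            w = [sum(mat[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                return pairs
            v, lam = [x / norm for x in w], norm
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        mat = [[mat[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return pairs


def lsa_word_vectors(docs, k):
    """Map each word to its k-dimensional LSA vector, i.e. its row of U_k * S_k."""
    vocab, a = term_document_matrix(docs)
    # A A^T is word-by-word: its eigenvectors are the left singular vectors U,
    # and its eigenvalues are the squared singular values.
    aat = [[sum(x * y for x, y in zip(ri, rj)) for rj in a] for ri in a]
    pairs = top_eigenpairs(aat, k)
    return {word: [math.sqrt(lam) * vec[i] for lam, vec in pairs]
            for i, word in enumerate(vocab)}
```

For a realistic corpus with thousands of documents, a sparse SVD routine from a numerical library would replace the power-iteration helper.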

Similarly, it will be straightforward to build an authoring tool that guides the lesson planner in building the curriculum script for a new topic. In fact, a rudimentary authoring tool has already been built. The tool will prompt the user to (a) type in the focal question or problem in English, (b) add any animations or graphical displays in particular media formats, (c) type in good answer aspects (sentences) in English, (d) prepare hints, prompts, and elaborations for each good answer aspect, (e) type in expected bad answers, (f) prepare corrections (splices) for the bad answers, (g) enter a glossary of words and definitions (which is usually available electronically), (h) type in key words and their synonyms, (i) type in a summary answer or solution, and (j) add mark-up language that specifies emphasized words and pauses. Once this is done, AutoTutor does all of the rest, because the fuzzy production rules, the dialogue advancer network, the good answer aspect selector, and the talking head are all generic.
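
One way an authoring tool might store slots (a) through (j) is as a simple record per topic. This also makes it easy to report which slots are still empty. The class and field names below are hypothetical. Treating the question, good answer aspects, key words, and summary as mandatory is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AnswerAspect:
    """A good answer aspect, slot (c), with its hints, prompts, and elaborations, slot (d)."""
    text: str
    hints: list[str] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)
    elaborations: list[str] = field(default_factory=list)


@dataclass
class CurriculumTopic:
    """Hypothetical record holding slots (a)-(j) for one curriculum script topic."""
    question: str                                                   # (a) focal question
    media: list[str] = field(default_factory=list)                  # (b) media files
    good_aspects: list[AnswerAspect] = field(default_factory=list)  # (c) and (d)
    corrections: dict[str, str] = field(default_factory=dict)       # (e) bad answer -> (f) splice
    glossary: dict[str, str] = field(default_factory=dict)          # (g) word -> definition
    keywords: dict[str, list[str]] = field(default_factory=dict)    # (h) key word -> synonyms
    summary: str = ""                                               # (i) summary answer
    markup: str = ""                                                # (j) emphasis/pause mark-up

    def missing_slots(self):
        """Names of slots still empty; which slots are mandatory is an assumption."""
        required = {"question": self.question, "good_aspects": self.good_aspects,
                    "keywords": self.keywords, "summary": self.summary}
        return [name for name, value in required.items() if not value]
```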

In the proposed project, we will implement modules in computational linguistics and computational discourse to automate steps d, h, i, and j. A hint generator will attempt to generate a family of hints for any given good answer aspect. This task is well within the scope of natural language generation (see Jurafsky & Martin, 2000) because many of the rules are systematic and the generator would operate on a single sentence (as opposed to multi-sentence texts). However, no generic hint generator has yet been developed, so this would be an advance in computational linguistics and tutorial dialog. Similarly, a prompt generator would generate a family of prompts that attempt to get the learner to fill in various noun phrases, prepositional phrases, main verbs, and clauses. Key words could be induced by consulting word frequency norms and various lexicons; synonyms could be added from an electronic thesaurus or induced from the corpus of texts. Summary generators exist in computational linguistics, although their quality would need to be tested. The mark-up that specifies word stress could be generated on-line in a fashion that is sensitive to the dialogue history. A word is emphasized when it is a content word that is introduced into the discourse space for the first time (Clark, 1996; Givon, 1995; Goldsmith, 1995). For example, when a technical term (e.g., multi-tasking) is first introduced into the discourse space for a given topic, it should be emphasized; it should not be emphasized when it occurs later in the dialog, unless it is contrasted with another term or idea (e.g., batch processing). Pauses should occur after the tutor presents a dialogue move with a high information load. We would, of course, need to test the performance of any module that automatically fills in information in the curriculum script.
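
The emphasis rule described above can be sketched as a small stateful function. A content word is stressed on its first mention in the discourse space for a topic, and later only when it is contrasted. The function-word list, the `*...*` placeholder mark-up, and all names are hypothetical. A real module would need part-of-speech information and would emit whatever mark-up the speech engine accepts. The pause rule is not covered here.

```python
import re

# Hypothetical, deliberately small function-word list; a real module would rely
# on a part-of-speech tagger or a lexicon instead.
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on",
                  "from", "and", "or", "it", "that", "this", "with", "for", "than"}


def mark_emphasis(turn, mentioned, contrasted=frozenset()):
    """Wrap emphasized words in *...* (a placeholder mark-up, not a real tag set).

    A content word is emphasized when it first enters the discourse space, or
    later if it is being contrasted. `mentioned` holds words already introduced
    for this topic and is updated in place.
    """
    out = []
    for token in turn.split():
        word = re.sub(r"[^\w-]", "", token).lower()  # strip punctuation
        is_content = bool(word) and word not in FUNCTION_WORDS
        if (is_content and word not in mentioned) or word in contrasted:
            token = re.sub(r"[\w-]+", lambda m: f"*{m.group(0)}*", token, count=1)
        if is_content:
            mentioned.add(word)
        out.append(token)
    return " ".join(out)
```

The caller keeps one `mentioned` set per topic, so a term introduced in an early turn is no longer stressed in later turns unless it appears in `contrasted`.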

After this tool is developed, we plan to assess how quickly and effectively instructors can generate new curriculum script (C-script) content in computer literacy and other topics by collecting data from a sample of instructors. A particular component or step in the authoring tool would be regarded as problematic if the instructors enter incorrect input for the slot, take a long time to fill the slot, stop using the authoring tool, or explicitly state that there are problems with the slot. We would compare the computer-generated output (for d, h, i, and j) with the output of expert lesson planners; recall and precision scores would be collected as performance measures (using human experts as the gold standard), as is routinely done in computational linguistics.
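
For slots that yield sets of items, such as the key words in slot h, precision and recall against the expert gold standard can be computed as sketched below. This assumes items are compared as case-insensitive exact strings, so near-synonyms get no partial credit. The F1 helper is a common summary measure added for convenience and is not named in the plan above.

```python
def precision_recall(generated, gold):
    """Precision and recall of generated items against an expert gold standard.

    Items are compared as case-insensitive exact strings.
    """
    gen = {g.strip().lower() for g in generated}
    ref = {g.strip().lower() for g in gold}
    hits = len(gen & ref)
    precision = hits / len(gen) if gen else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


generated = ["RAM", "memory", "disk", "cache"]
expert = ["ram", "memory", "volatile"]
print(precision_recall(generated, expert))
```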

New computational models.

We plan to explore new computational architectures for the various modules and to compare their performance with that of the existing ones. Given the length restrictions of this proposal, we cannot describe these in depth; instead, we briefly enumerate some of them.

  1. Speech act classifier. AutoTutor currently classifies the learner’s speech acts into the following categories: Assertion, WH-question, YES/NO question, Metacognitive comment (I don’t understand), Metacommunicative act (Could you repeat that?), and Short Response (okay, yes). We have performed classification with neural networks, syntactic parsers, and frozen expression catalogues. Unfortunately, speech act segmentation has not yet been solved satisfactorily, so we have relied on punctuation to mark junctures between speech acts. We will explore neural networks and dynamical systems models that are sensitive to a broad array of input features, contextual features, and complex interactions among features over time (Kozma, 1996; Kozma & Freeman, in press). We also plan to explore the broader set of speech act categories proposed by the Discourse Research Initiative (1997).
  2. Learning fuzzy production rules. We had to hand-craft the fuzzy production rules (Kosko, 1992; Zadeh, 1997) that select dialog moves, manipulate facial features, and guide various other tasks. For instance, the dialog moves in AutoTutor-1 are generated by 15 fuzzy production rules that are sensitive to the ability of the student and to the dialog history (Person, Graesser et al., in press). AutoTutor-1’s production rules are tuned to the following parameters: (a) the quality of the student’s Assertion in turn N, (b) the ability of the student, based on mean LSA values of previous Assertions in the tutoring session, (c) topic coverage, (d) the verbosity of the student (how many words per turn), and (e) the stage of the dialog for a topic (early, middle, late). For example, consider the following dialog move rule: IF [student Assertion match with good-answer-aspect = HIGH or VERY HIGH], THEN [select <positive feedback> dialog move]. That is, AutoTutor provides Positive Feedback (e.g., “Right”) in response to a high-quality student Assertion. Ideally, AutoTutor would learn these rules from tutoring experience, rather than having the researcher hand-craft them. We will explore induction mechanisms that tune the fuzzy production rules and create new ones (Kozma & Freeman, in press).
  3. Intelligent dialog management. A collaborative exchange between AutoTutor and the learner requires a mutual understanding of the turn-taking process. In human-to-human conversations, speakers signal to listeners that they are relinquishing the floor and that it is the listener’s turn to say something (Clark, 1996; Nofsinger, 1991; Sacks et al., 1978). However, human-to-computer conversations lack many of the subtle signals inherent to human conversations. When conversational agents lack turn-taking signals, learners do not know when or whether they are supposed to respond, and they are sometimes confused when the tutor generates elaborations, prompts, hints, and other dialog moves. We have developed a dialog advancer network (DAN) in AutoTutor-1 that, according to our performance data, does a fairly impressive job of managing the conversation (Person, Graesser, & TRG, in press; Person, Graesser, Harter et al., in press). The DAN is an augmented state transition network that generates discourse markers, dialogue move categories, and frozen expressions in a manner that is sensitive to the learner’s previous turn. After the student speaks in turn N, the tutor classifies the student input into speech act categories and produces a response that adapts to what the student said. If the student produces a frozen expression that asks the tutor to repeat itself (“Could you say that again?”), AutoTutor-1 produces a discourse marker (“Once again”) and repeats itself. If the student asks a YES/NO question, AutoTutor-1 answers the question and then goes on. If the student produces an Assertion, AutoTutor-1 gives evaluative feedback and then advances the conversation with another dialog move. The DAN specifies AutoTutor’s dialog move options for every student turn category; altogether, it has 78 legal dialog pathways. The DAN solved nearly all of AutoTutor’s turn-taking problems in the usability tests.
In spite of the impressive performance of the DAN, we plan to pursue more intelligent dialog management systems, such as COLLAGEN (Rich & Sidner, 1998). These systems plan dialogue moves dynamically on the basis of the knowledge states, goals, and beliefs that the tutor infers about the student and that the tutor believes are in the common ground. Unfortunately, the language of learners is extremely vague and underspecified, so the application of these models may have limited value. The primary challenge in discourse management is likely to be handling the vague assertions of students rather than accurately inferring the specific knowledge states of learners. In fact, Graesser et al.’s (1995) in-depth analysis of human tutorial dialog revealed that tutors have only a crude, approximate sense of what students know. The student and tutor live in frighteningly different mental spaces, rather than enjoying an intense meeting of the minds.
  4. A talking head that coordinates speech, intonation, facial expressions, and gestures. A conversational agent is an important feature of AutoTutor because it concretely grounds the conversation between the tutor and learner. A talking head also provides a separate channel of cues for giving mixed feedback to the learner. For example, when a learner’s contribution is incorrect or vague, the speech is often positive and polite whereas the face has a puzzled expression; this conflicting message satisfies both pedagogical and politeness constraints, so it is preferable to a threatening spoken message such as “That’s wrong” or “I’m having trouble understanding you.” Nonverbal facial cues are known to be an important form of back channel feedback during tutoring (Fox, 1993; Graesser et al., 1995; Person et al., 1994), as well as in other contexts of conversation (Clark, 1996). Similarly, pitch, pause, duration, amplitude, and intonation contours are among the intonation cues that signal back channel feedback, affect, and emphasis (Brennan & Williams, 1995; Ladd, 1996). Unfortunately, Microsoft Agent has technical limitations that present serious challenges in coordinating speech, intonation, facial expressions, and gestures. For example, the current version of Microsoft Agent does not allow AutoTutor to point and display facial expressions while it produces synthesized speech. When we try to generate good prompts (which encourage the student to fill in a word or phrase), there are jerky transitions between delivering a spoken prompt with rising pitch, exhibiting an encouraging facial expression, and gesturing to the student to input a word. It would be better to deliver these components simultaneously in a well-timed, ballistic fashion. Moreover, the grain size of Microsoft Agent’s intonation and facial parameters is too crude to produce the subtle facial expressions that we desire.
We have developed a prototype talking head that computes these components in parallel, at a virtually arbitrary grain size, with low-level parameters computed on the fly in a fashion that is sensitive to higher-level parameters (Drumwright & Garzon, 2000). The parallel computation is implemented in Java 3D with a neurofuzzy controller (Kosko, 1992).
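
Items 1 and 3 can be illustrated together, because the DAN first classifies each student speech act and then looks up the tutor's move options for that category. The sketch below is a deliberately crude stand-in, not the DAN itself, which is an augmented state transition network with 78 pathways. It segments on punctuation, as described in item 1, consults a tiny frozen-expression catalogue, and falls back on surface cues. Only the first three table rows paraphrase pathways described in item 3. The remaining rows, the catalogue entries, and all move names are hypothetical.

```python
import re

# Tiny, hypothetical frozen-expression catalogue; the real one is much larger.
FROZEN = {
    "could you repeat that": "Metacommunicative",
    "could you say that again": "Metacommunicative",
    "i don't understand": "Metacognitive",
    "okay": "ShortResponse", "yes": "ShortResponse", "no": "ShortResponse",
}
WH_WORDS = {"what", "why", "how", "when", "where", "who", "which"}

# Toy dialog-advancer table: student speech act -> ordered tutor move types.
# Only the first three rows paraphrase the text; the rest are assumptions.
DAN = {
    "Metacommunicative": ["discourse_marker:Once again", "repeat_last_move"],
    "YesNoQuestion": ["answer_question", "advance"],
    "Assertion": ["evaluative_feedback", "advance"],
    "WHQuestion": ["answer_question", "advance"],
    "Metacognitive": ["hint"],
    "ShortResponse": ["advance"],
}


def segment(turn):
    """Split a turn into speech acts at sentence-final punctuation, keeping it."""
    return [s.strip() for s in re.findall(r"[^.?!]+[.?!]?", turn) if s.strip()]


def classify(act):
    """Frozen expressions first, then question form, else Assertion."""
    key = act.lower().rstrip(".?! ")
    if key in FROZEN:
        return FROZEN[key]
    words = key.split()
    if act.endswith("?"):
        return "WHQuestion" if words and words[0] in WH_WORDS else "YesNoQuestion"
    return "Assertion"


def tutor_plan(turn):
    """Classify each speech act in the turn and look up the tutor's move options."""
    return [(category, DAN[category]) for category in map(classify, segment(turn))]
```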

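The rule quoted in item 2 can be sketched with trapezoidal membership functions over an LSA match score in [0, 1]. HIGH and VERY HIGH are combined with the standard fuzzy OR (maximum), and the move whose rule fires most strongly is selected. The breakpoints, the two extra rules for neutral and negative feedback, and all names are assumptions for illustration. AutoTutor-1's 15 rules also condition on ability, coverage, verbosity, and dialog stage, which this sketch ignores.

```python
# Hypothetical breakpoints (a, b, c, d) on an LSA match score in [0, 1].
SETS = {
    "LOW": (0.0, 0.0, 0.2, 0.4),
    "MEDIUM": (0.2, 0.4, 0.5, 0.7),
    "HIGH": (0.5, 0.7, 0.8, 0.9),
    "VERY_HIGH": (0.8, 0.9, 1.0, 1.0),
}


def trapezoid(x, a, b, c, d):
    """Membership rising from 0 at a to 1 at b, flat to c, falling to 0 at d."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def select_feedback(match):
    """Fire the feedback rules for one Assertion and return the strongest move."""
    mu = {name: trapezoid(match, *params) for name, params in SETS.items()}
    strengths = {
        # From the text: IF match = HIGH or VERY HIGH THEN positive feedback.
        "positive_feedback": max(mu["HIGH"], mu["VERY_HIGH"]),  # fuzzy OR = max
        # Assumed companion rules, for illustration only.
        "neutral_feedback": mu["MEDIUM"],
        "negative_feedback": mu["LOW"],
    }
    return max(strengths, key=strengths.get), strengths
```

Learning these rules, as proposed, would amount to adjusting the breakpoints and rule set from tutoring data instead of fixing them by hand.
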
Testing the effectiveness of AutoTutor.

Earlier in the proposal, we described several methods that we adopted when we assessed the performance of AutoTutor-1. These methods included assessments of learning gains, of the conversational smoothness and pedagogical quality of AutoTutor’s dialog moves, and of LSA’s performance, as well as a bystander Turing test. When assessing learning gains, we used a repeated measures design that assigned learning conditions to macrotopics (hardware, software, Internet) according to a counterbalancing scheme; this permitted us to examine Aptitude X Treatment interactions. These methods and measures will also be used when evaluating the performance of AutoTutor-2. However, we plan to expand the assessment procedures in three fundamental ways.
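
A cyclic Latin square is one standard way to implement such a counterbalancing scheme. The sketch below assumes one condition per macrotopic and rotates the assignment across participants. The condition labels are placeholders, and this is not necessarily the exact scheme used in the earlier studies, which may also have counterbalanced presentation order.

```python
def counterbalance(conditions, topics, participant):
    """Assign conditions to macrotopics by rotating a cyclic Latin square.

    Each participant receives every condition once, each on a different topic;
    over any len(topics) consecutive participants, every condition is paired
    with every topic exactly once.
    """
    n = len(topics)
    if len(conditions) != n:
        raise ValueError("this sketch assumes one condition per macrotopic")
    shift = participant % n
    return {topic: conditions[(i + shift) % n] for i, topic in enumerate(topics)}


TOPICS = ["hardware", "software", "Internet"]
CONDITIONS = ["AutoTutor", "reread", "no-new-learning control"]
for p in range(3):
    print(p, counterbalance(CONDITIONS, TOPICS, p))
```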

  1. More learning conditions. The following learning conditions will be compared: (1) no-new-learning control, (2) rereading a chapter from a book, (3) human tutoring, with the tutor selected from the pool of tutors in a university setting, (4) AutoTutor-1, (5) AutoTutor-2, and (6) other contrast controls. AutoTutor-2 will contain the enhancements that involve more ideal pedagogical mechanisms, as discussed above. AutoTutor-1 has the dialog and natural language processing (D/NLP) capacities of unaccomplished tutors, but not the more sophisticated ideal pedagogical mechanisms. In some experiments, it will be possible to have versions that have the ideal mechanisms, minus the D/NLP. In others, we plan to lesion (i.e., turn off) particular modules of the tutoring system and assess how performance compares to that of the complete AutoTutor. For example, how effective is AutoTutor when we turn off the talking head (substituting bubble print)? Or the animations? Or the simulations of physical events? Or the attempts to get the student to articulate knowledge precisely? Effect sizes will be computed relative to the various control conditions (1, 2, and 3). The effect size on learning gains (measured in standard deviation units) can be computed as [mean (AutoTutor) – mean (Control)] / [SD (Control)]. As the research evolves, it may become unnecessary to include condition 1. Interestingly, conditions 1 and 2 have not differed significantly in our previous research.
  2. Additional outcome measures. In the domain of physics, we plan to add outcome measures that are routinely used in assessments of Newtonian physics, including the same set of tests that VanLehn has used to evaluate the Andes ITS, which is being used at the Naval Academy. We also need a better test of deep comprehension for both computer literacy and physics. We plan to use a 3-alternative, forced-choice test that adopts principles of qualitative physics (see Graesser, Olde, & Lu, 2000). It is assumed that there is a set of N components in a system, which are connected by a network of -, +, and 0 relations. Suppose that component C is affected in some fashion (e.g., increased input, broken, initiated). How would this event propagate its effects to other components in the system? Specifically, how would it influence another component X? Would X increase, decrease, or stay the same? The alternatives on the test item would be increase, decrease, and same.
  3. Eye tracking. We plan to use our eye tracking equipment to explore how attention is managed in AutoTutor. An eye tracking lab has recently been set up at the University of Memphis, and we have collected data on the processing of illustrated texts about equipment that malfunctions (Graesser, Olde, et al., 2000). The lab has an ASL Model 501 eye tracker with a head-mounted unit. This equipment can be used to investigate several questions in both applied and basic science. For example, the conversational agent is presumed to play a central role in directing attention during tutorial dialog because it is frequently speaking, displaying emotions, pointing, gazing, and directing attention to graphical displays. However, this is an empirical question that has not yet been systematically investigated in the literature on conversational agents. Perhaps the agenda of the conversational agent is superseded by more stimulus-driven features, such as motion, color, abrupt changes in luminance, and new objects presented on the display (Pashler, 1998; Yantis, 1998). We will conduct controlled experiments that investigate the conditions under which attention is guided in a goal-driven versus a stimulus-driven fashion. Is attention to specific screen locations more likely to be guided by the gazes of the animated agents, by deictic gestures, by speech, by graphic displays, by text, or by some combination of these inputs? The plan is to systematically manipulate features of AutoTutor as independent variables and to measure the impact on eye movements and other measures of visual attention.

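The reasoning behind the qualitative physics items in item 2 can be sketched as sign propagation over the component network. Composing relations along a path multiplies their signs, and a 0 relation blocks the change. This is a hypothetical illustration, not the item-construction procedure of Graesser, Olde, and Lu (2000). It also flags the case where two paths imply opposite effects, which a well-formed three-alternative item would presumably avoid. The example system and its component names are invented.

```python
from collections import deque


def qualitative_effect(edges, source, target, change=+1):
    """Effect on `target` of a change (+1 increase, -1 decrease) at `source`.

    edges: component -> list of (neighbor, relation), relation in {+1, -1, 0}.
    Returns "increase", "decrease", "same", or "ambiguous" (conflicting paths).
    """
    signs = {source: {change}}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, relation in edges.get(node, ()):
            if relation == 0:  # a 0 relation transmits no change
                continue
            for s in list(signs[node]):
                implied = s * relation
                seen = signs.setdefault(neighbor, set())
                if implied not in seen:  # each node holds at most {+1, -1}
                    seen.add(implied)
                    queue.append(neighbor)
    found = signs.get(target, set())
    if not found:
        return "same"
    if len(found) > 1:
        return "ambiguous"
    return "increase" if found == {1} else "decrease"


# Hypothetical system: pedaling speeds the chain, braking slows the wheel.
EDGES = {
    "pedal": [("chain_speed", +1)],
    "chain_speed": [("wheel_speed", +1)],
    "brake": [("wheel_speed", -1)],
    "wheel_speed": [("bell", 0)],
}
```
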
Extensive use of multimedia has its advantages, but there are also liabilities from “feature bloat” to the extent that a multimedia show splits the attention of the student and overloads working memory (Shneiderman, 1998; Sweller & Chandler, 1994). The coordination of visual and auditory information will be guided by recent research on multimedia, education, and cognitive science, to the extent that such research is available. For example, spoken narration (i.e., what AutoTutor says) needs to be presented simultaneously with the corresponding pictures (Mayer & Moreno, 1998), text and pictures must be in spatial and temporal contiguity (Moreno & Mayer, 1999), and lengthy narrative messages should not be delivered in both speech and print simultaneously (Kalyuga, Chandler, & Sweller, 1999). Surprisingly, however, the available research on the cognitive processing of multimedia is limited to short instruction periods of 100 to 300 seconds. We therefore need to collect data on how students cognitively interact with AutoTutor’s multimedia over the course of an hour. One approach is to conduct experiments that systematically manipulate the speech, graphics, animations, and methods of coordinating these components. What is the impact of the alternative multimedia presentation methods on cognitive measures of attention, comprehension, memory, and reasoning? Another approach is to collect eye tracking data from a sample of students while they use AutoTutor. The pattern of eye movements reveals the particular regions of the interface that the learner attends to while interacting in this complex learning environment.
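
A common first step in analyzing such data is to assign each fixation to an area of interest (AOI) on the interface and compute the share of fixation time each AOI receives. The sketch assumes that fixations have already been extracted by the eye-tracking software as (x, y, duration) records. The AOI layout, coordinates, and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # screen position in pixels
    y: float
    duration_ms: float


# Hypothetical interface layout: AOI name -> (left, top, right, bottom) in pixels.
AOIS = {
    "agent": (0, 0, 200, 250),
    "graphic": (220, 0, 800, 400),
    "text_input": (0, 420, 800, 600),
}


def dwell_proportions(fixations, aois=AOIS):
    """Share of total fixation time falling in each AOI.

    A fixation inside several overlapping AOIs is credited to the first one
    listed; fixations outside every AOI are counted under "other".
    """
    totals = dict.fromkeys(aois, 0.0)
    totals["other"] = 0.0
    for f in fixations:
        hit = next((name for name, (left, top, right, bottom) in aois.items()
                    if left <= f.x <= right and top <= f.y <= bottom), "other")
        totals[hit] += f.duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {name: ms / grand_total for name, ms in totals.items()}
```

Comparing these proportions across manipulated versions of AutoTutor is one way to test whether the agent or stimulus-driven features capture attention.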

Personnel

Most of the personnel in the proposed research participated in the existing NSF grant that developed AutoTutor-1, so this research team has a proven track record. All of the faculty are members of the interdisciplinary Institute for Intelligent Systems (IIS) at the University of Memphis. Dr. Graesser (the PI) has expertise in cognitive psychology/science, discourse processing, artificial intelligence, and computational linguistics. He is a professor in Psychology, an adjunct professor in Computer Science, director of the Center for Applied Psychological Research, and co-director of the IIS. Graesser is currently editor of the journal Discourse Processes, has served as program chair for Division C of the American Educational Research Association, and serves on the editorial boards of several journals: Journal of Educational Psychology, Journal of Experimental Psychology: General, Cognitive Science (just invited), Cognition and Instruction, Scientific Studies of Reading, and International Journal of Speech Technology. Graesser has published 2 books, 8 edited books, over 100 articles in refereed journals, and over 150 articles in books or conference proceedings.

Dr. Garzon, professor and chair of Computer Science, has worked for over 10 years on the analysis, modeling, and development of neural networks and complex systems; he played a major role in developing the conversational agent in AutoTutor. Dr. Kozma, an assistant professor in Computer Science, has investigated neural networks, fuzzy systems, and complex dynamical systems. Dr. Hu is an associate professor of Psychology, with extensive experience in software development and mathematical modeling. Dr. Gholson is a professor of Psychology, with expertise in cognitive development, educational psychology, and multimedia processing. Dr. Person is an assistant professor of Psychology at Rhodes College and has conducted a decade of research on tutoring. Dr. Wolff is a new assistant professor at the University of Memphis and is a newcomer to the AutoTutor project. His expertise is in cognitive science, with a focus on language processing and links between animation and linguistic descriptions. Dr. Franceschetti, a professor of Physics, has conducted research in complex systems, self-organizing systems, and DNA computing. Dr. Louwerse is a new postdoc at the University of Memphis; he has a background in linguistics, semantics, and discourse processing. These faculty members will supervise 6 graduate students funded on this project. A computer programmer will also be hired to work full time on the development of AutoTutor-2.