Reports – Reflective Code

Linguistic Networks

The following is a report I wrote for my Artificial Intelligence course in November 2006. It discusses Franklin Chang’s article entitled “Symbolically speaking: A connectionist model of sentence production.”

F. Chang developed neural networks to produce proper sentences from basic messages. His networks were implemented on LENS neural network software. This research was performed around the year 2000 and a paper detailing his research and findings was published as an article in Cognitive Science in 2002. In this paper, I will explain, summarize, and analyze his article.

Background

Because Chang’s models are neural networks mimicking human linguistic abilities, comprehension of his work relies on familiarity with linguistics and connectionist models. Linguistics will be addressed as necessary throughout this paper. A neural network is a computing model comprised of nodes (also called units) grouped into layers. The nodes between two layers are connected by weights. As nodes are activated, the activations of each layer are passed forward to the next layer by the connecting weights. Layers are connected in a forward-feeding fashion so that there are no loops. The value of a weight determines how much of the activation from the sending node is transferred to the receiving node. The total of all the activations received by a node determines its activation value. The result is a network of nodes that pass activations forward through the network. Activations originate from the input layers, which serve as the input for the network. These activations feed-forward to the output layers, whose activations serve as the output of the network. Learning occurs as weights are adjusted to correct the actual output values to match the desired output values.

Abstract

Chang’s article discusses and compares several connectionist models for properly constructing sentences appropriate to given messages. To achieve this goal, the models need to learn the proper vocabulary and grammar for expressing messages. Traditional approaches have relied heavily on statistics, employing simple recurrent networks (SRN). This approach theorizes that by learning the statistical regularities of a sufficiently large training set, an SRN model can generalize to correctly produce sentences for similar messages not found in the training set. However this approach generally doesn’t generalize as well as desired. While it can correctly learn the training set, the SRN model poorly applies what it has learned to novel situations. To overcome this difficulty, Chang develops the new dual-path model. The goal of the dual-path model is to utilize symbols in its processing. This model separates the actual words from the context in which they are used. This abstraction allows the dual-path model to learn how the contextual symbols of a message are arranged independently of learning the contents of the symbols.

Summary

A message is an event comprised of an action (verb) and the related participants. “The participants in an event [are] classified into one of three event roles: agent, patient, goal. The agent [is] the cause of the action, the goal [is] the final location for the object, and the patient [is] the object in motion or the affected object.” (612) For example, in the simplified sentence the man give the boy a plate, the underlying message would be comprised of the action give, the agent the man, the patient a plate, and the goal the boy. These messages are abstracted from any surface sentence structure. Additional information is also included in the message, such as whether a participant is a definite participant (marked by the in a sentence), and various event semantics. Event semantics indicate features in a message. In our example, the man give the boy a plate, there is a cause (the man initiates the action), a motion (the plate is moved), and a transfer (the boy comes to possess the plate). These event semantics are important for distinguishing between possible uses of a verb.

The models are challenged with building proper sentences, something that takes years for a human to become proficient at. There are two aspects to this problem, correct lexical selection and proper sentence-level ordering. The lexical aspect involves representing each feature in the message with the correct word in the sentence i.e., choosing the correct vocabulary word to express a concept. For example, when the feature is DOG is in the message, the program must output the correct vocabulary word dog in the sentence. At the sentence-level, the words must be placed in the correct positions and order. To further illustrate our example, if the action is BARK, then the correct sentence would be a dog bark, not bark a dog. It is important that the network learns these two aspects independently of each other. Otherwise it may not be able to place words into novel positions that the network hasn’t experienced before.

Simple Recurrent Network

The Production Simple Recurrent Network (Prod-SRN) is the first network model employed by Chang. This model has three layers feeding the hidden layer: cword, message, and context. The hidden layer feeds an output layer: word. The model is recurrent, each cycle producing the next word of the sentence. Before the cycles begin, the message layer is activated with the features present in the message. These activations are static and do not change once initially set. Before each cycle, the cword layer is activated to represent the previous word in the sentence. Also, the previous activations of the hidden layer are copied to the context layer. This allows the context layer to serve as a memory of the previous cycle. During each cycle, activations feed-forward through the network. The resulting activations of the word layer indicate the word produced by the network. During learning, weights are modified using back-propagation to help match the actual output to the desired output. This process is repeated until the end of the sentence is produced (indicated by an end-of-sentence marker).

The results of this model are disappointing. In Chang’s research, the Prod-SRN model reaches 99% accuracy in learning the training set. However Prod-SRN performs very poorly on the test set, never topping 13% accuracy. It seems the knowledge learned by the Prod-SRN model is not of much use outside the training set. It does not learn lexical selection separately from sentence-level placement. The two are too tightly bound, hindering its ability to place words into new positions. This hindrance is due to the single mechanism used for learning both the lexical and sentence-level aspects of sentence construction. In short, Prod-SRN generalizes poorly.

Dual-Path Model

When Chang set out to design an improved model, he wanted it to be better suited to generalize in novel situations. He devised the dual-path model, with two distinct paths through the network. These two paths give the model two mechanisms with which to learn sentence construction. One pathway is the sentence-level pathway. The first half of the sentence-level pathway sequentially leads from the cword layer, to the cwhat layer, to the cwhere layer, to the hidden layer. Just as in the Prod-SRN model, the cword layer is activated with the previous word of the sentence. The cwhat layer represents the feature associated with the previous word. The cwhere feature represents the role associated with the previous word. This structure causes the activations reaching the hidden layer in the sentence-level pathway to be abstracted from the actual word, symbolically representing only the role of the current word in the message. There is one cwhat node for each possible feature and one cwhere node for each possible role. The cwhat–cwhere weights are set according to the message and are static throughout a sentence’s production. For example, if the action is GIVE then the weight between the ACTION node in the cwhere layer and the GIVE node in the cwhat layer would be set as positive. And if the agent is MAN then the weight between the AGENT node in the cwhere layer and the MAN node in the cwhat layer would be set as positive. All other unused weights between the cwhat and cwhere layers are set to zero. These bindings represent knowledge of the message in the network.

The second half of the sentence-level pathway is a reversal of the first half. The hidden layer leads to the where layer, then to the what layer, and finally to the word layer. The weights between the where and what layers are a mirror copy of the weights between the cwhere and cwhat layers. This way the hidden layer can receive role activations from the first half of the pathway, and send role activations through the second half. The activated roles are translated back into the related features by the where-what weights.

The other pathway in the dual-path model influences lexical selection. It leads from the cword layer, to the ccompress layer, to the hidden layer, to the compress layer, and finally out to the word layer. This pathway learns in a manner similar to the Prod-SRN model, except that the layers are smaller because the dual-path model does not solely rely upon this pathway as a means of sentence construction.

Three additional layers feed the hidden layer. The cwhere2 layer contains the summation of previous cwhere activation levels. This serves as a memory of roles previously activated during sentence production. The context layer is a copy of the hidden layer’s previous activations. This also serves as a memory, just as in the Prod-SRN model. Finally, the events-semantics layer encodes the event semantics of the message. There is a unit for each possible event semantic, and appropriate units are set as active during the duration of sentence production.

Test Results

The dual-path model is also tested and compared to the Prod-SRN model. Not only does it learn the training sets just as well as the Prod-SRN model, it also correctly constructs about 80% of the sentences in the testing set. This is much better than the 13% accuracy achieved by the Prod-SRN model. The success of the dual-path model is owed to the symbolic processing achieved by separate mechanisms for lexical selection and sentence-level ordering. This allows the dual-path model to determine the placement of the words independently from selection of the words. The result is a markedly improved ability to generalize compared to the Prod-SRN model.

Two additional models are created as modifications of the dual-path model. They both differ from the dual-path model in only one way. This serves to highlight the importance of particular features in the dual-path model. The no-event-semantics model is identical to the dual-path model, except that it lacks the event-semantics layer. Without event semantics, the model losses information about how the message is to be constructed. In practice, it learns the training set as well as any other model, but produces sentences in the testing set with only 50% accuracy. This decrease in performance highlights the importance of event semantics in sentence construction.

The final model was the linked-path model. It differs from the dual-path model in that it has a link between the hidden layer and the what layer. This violates the dual-path architecture because it circumvents the symbol abstraction provided by the sentence-level path. Sometimes a network can perform better with extra layers or links between layers added for complexity. This alternate model was developed to test if the success of the dual-path model is primarily attributed to its extra complexity, or to the two separate pathways. The results show that the linked-path model learns the training set just as well as the other two models, but can only achieve 70% accuracy on the testing set. In fact after a certain point, the accuracy of the linked-path model decreases with additional training. This suggests the linked-path model starts to memorize the training set, a problem that is avoided in the dual-pathway architecture. It can be concluded that the dual-path model’s success is due to its separation of lexical and sentence-level processing.

Additional testing and analysis of the dual-path model suggests that it learns in ways similar to humans. When the system is damaged (by randomly changing weights) in particular regions that correspond to regions of the brain in impaired patents, it is found to generate errors similar to the human patients’ errors. Other testing shows that as the dual-path model learns, it over-generalizes and sequentially constrains its over-generalizations in a fashion similar to children. As the dual-path model learned, it over-generalizes the use of verbs into improper constructions because the sentence-level pathway allowed it to place verbs a number of ways. But as continues to learn, the lexical pathway begins to constrain the over-generalizations to prevent particular verbs from being used improperly. This same pattern of behavior can be observed in children. The similarity between human examples and the dual-path model is a natural result of the dual-path model being designed specifically to mimic human-like use of symbols in sentence construction.

Analysis

Chang’s work is impressive because he successfully unites several fields of study into a fully functional model. His accurate linguistics model is constructed as a compromise between two connectionist schools of thought, one statistics based, the other modularly based. Chang borrows from both, building a modular sentence-level pathway and a statistical lexical pathway. I think the most innovative feature is the representation of the message as bindings between the word and where layers. Although not discussed in the article, I imagine that the most difficult aspect of the project would have been the generation of valid messages and sentences to as training and testing sets.

Chang’s analysis of the dual-path model reveals close similarities to observed human linguistic abilities, development, and hindrances. Therefore, it helps confirm the linguistic theories that it is built upon. It raises interesting questions about what other seemingly complex human behavior can be mimicked given an insightful model. Further research could be conducted with an expanded domain, one with an increased lexicon (Chang only used 63 words) and a greater assortment of event semantics to produce a greater variety sentences. Other extensions might involve adding noun-verb agreement or verb-tenses. Additionally, the system could be reversed to extract a message from a sentence.

WinOthello – A C# implementation of Reversi

The following is a report on my C# program, WinOthello. I designed it for my Artificial Intelligence course in November 2006.

Download (57 KB) the game or the source code (85 KB).
Provided under a permissive open source license.
.NET Framework 2.0 required.

Abstract
As an exploration of artificial intelligence and particularly game state space search, I developed WinOthello to simulate Othello games. This application allows human and computer players to interact with a virtual game board displayed on the user’s screen. The primary objective of WinOthello was to devise an adept computer player that proves challenging to average human players. This goal was achieved via a depth-limited minimax search of the game space and quickened with the application of alpha-beta pruning.

What is Othello?
Othello is a strategic two-person board game. Players take turns placing pieces on an eight-by-eight board. Each piece has two distinct faces: a light face and a dark face. The first player is designated the dark player and places pieces on the board dark side up. The second player is the light player and places pieces light side up. Pieces can only be placed in positions that outflank the opponent’s pieces. Outflanking occurs when a player places a piece that causes a contiguous line of one or more of the opponent’s pieces to be bounded by the player’s pieces. All of the opponent’s outflanked pieces are then flipped over, and the turn ends. If a player can not place a piece, then play passes to the other player. The game ends when neither player can place any more pieces. The player with the most corresponding pieces on the board at the end of the game wins.

Why Othello?
I choose to explore Othello because it is a perfect candidate for applying state space search. This falls under the game playing field of artificial intelligence. Othello is suitable because it is a two-person game. One player can be viewed as trying to maximize his score, while the second tries to minimize the first player’s score. This is a foundational concept of minimax. Additionally, Othello provides well-defined moves, meaning that for any game state, a list of all possible immediate moves can be generated. The generation of moves allows the projection of later game states, another requirement of minimax. In Othello pieces can only be placed and can not be moved or removed. This means that only a finite number of moves are possible, which simplifies a minimax search. Another convenience is that Othello is purely strategic and doesn’t rely upon chance. This makes the game easy to accurately model. All of these factors make Othello an excellent choice for applying AI techniques.

Goals
My primary goal was to develop an effective Othello computer player. I planned on using a depth-limited minimax search to find good moves. As a further enhancement, I wanted to apply alpha-beta pruning to reduce the search time. The resulting computer player should decide upon a move within a reasonable amount of time, and win almost every game against an average human player. This portion of my project is referred to as the engine and directly relates to artificial intelligence.

My secondary goal was to develop a pleasant Othello GUI. The board should be displayed to the user and allow simple clicks to place pieces. Game settings are to be set at the start of each game. Additional features could include hints for human players and a game history allowing the undo of a move. This portion of my project is referred to as the GUI and isn’t directly related to artificial intelligence.

Approach
Minimax is at the heart of my Othello project. Minimax involves systematically searching possible future gamestates to decide the best move in the current situation. Starting with the current gamestate as the root node, child nodes are created for each currently plausible move. This generation of nodes continues recursively until a specified depth or an endgame situation. After node generation, each of the bottom nodes is evaluated. The higher a node’s value, the more advantageous it is to the current player (MAX). The lower a node’s value, the more advantageous it is to the opposing player (MIN). These values bubble up to their parent nodes. A parent node chooses the move leading to the maximum child node if the node’s game state represents MAX’s turn. If it is MIN’s turn, the parent node chooses the move leading to the minimum child node. This assumes that MAX will always wants to choose a move that maximizes his score, and MIN will choose a move than minimizes MAX’s score. Once the root node chooses a maximizing move, that move is chosen for play.

As an optimization, I extended the minimax search with alpha-beta pruning. This greatly reduced computation time without changing the result of the search. Alpha-beta pruning works by halting the generation and evaluation of nodes for portions of the search that will not be chosen anyway. To make the pruning more efficient, moves are sorted before evaluation with the most promising moves checked first. This increases the likelihood that poor moves will be pruned from the search. The decreased computation time allowed me to increase the depth of search. In general, greater search depths should result in better move decisions.

Static Evaluator
I implemented a very simple static evaluator. It simply counts the number of pieces MAX has on the board, and subtracts the number of pieces that MIN has on the board. The resulting number reflects the relative advantage that MAX has over MIN, with a high number indicating a higher advantage. If the gamestate is an endgame situation, a value of 100 is returned for MAX’s win, and -100 for MAX’s loss (or a tie).

Plausible Move Generator
The plausible move generator results a list of all possible moves available to a given player for a given gamestate. This turned out to be a tricky section of code that scans outward from each board location for possible ways to outflank the opponent. Each possible move location is also associated with a value indicating the number of pieces that would be flipped by the move. This value is used to sort the moves for effective alpha-beta pruning.

Results
WinOthello is an effective application of artificial intelligence. The engine implements minimax searching and prunes the search space to greatly reduce the computation time. Despite this efficiency, there is still a trade off between difficulty and quickness. The computer player’s difficulty is directly related to the depth of the minimax search. The greater the difficulty, the deeper the search and the longer it takes. A depth of six takes only seconds on modern computer hardware. I am relatively new to Othello and do not consider myself an expect player by any means. So if I can regularly beat the computer player, I consider it to be a weak player. Increasing the difficulty to 10 makes the computer a much more difficult opponent. However, at this depth, it can take the program about a minute for it to choose a move. It is debatable if this is a reasonable amount of time. It seems that, in WinOthello’s current form, challenging game-play and quick computation are mutually exclusive.

The GUI is simple, understandable, and easy to use. Many of the features outlined as goals were implemented. However, the undo feature hasn’t made it into the game yet. I suspect that it would be fairly easy to add, but I haven’t investigated its implementation yet. An additional feature I added near the end of WinOthello’s development was a game tree viewer. This function displays a tree containing each node traversed in the last minimax search performed by the program. The tree shows rudimentary information such as if a node is MAX or MIN, the value of the nodes, the fraction of each nodes’ children that were actually searched, and the best move selected from the node’s children. I consider this feature incomplete as it was added late for debugging purposes. I decided to keep it for its educational value.

Comparison
Many other people have developed Othello playing programs. Logistello was a strong Othello computer program written in the 1990s by Michael Buro. This program rose to prominence during its time, winning computer Othello tournaments and soundly defeating the world champion, Takeshi Murakami, in August 1997. It used a more complex static evaluator that considered Othello strategies such as mobility and parity. Logistello also used a book of opening moves that guided its moves at the beginning of the game. It even learned, expanding its book as it played each game. WinOthello does not even begin to approach the complexity of Logistello. Although Logistello is far superior, WinOthello was still a useful exercise in applying and exploring artificial intelligence. A quick comparison to Logistello suggests that WinOthello could greatly benefit from an enhanced static evaluator.

Program Architecture
WinOthello is written in the object-orientated language C#. I will only discuss the architecture of the project’s engine because the GUI is not relevant to artificial intelligence. The board is represented by a board class. The class contains a pair of two-dimensional Boolean arrays, each eight-by-eight, one array for each player. Each piece on the board is represented by a true value in the corresponding array. Various operations can be performed on the board, such as getting a position’s color, testing the possibility of using a location for a move, getting the players’ scores, making a move, copying the board, running the static evaluator, and running the plausible move generator. The game knowledge is primarily represented by this class. Most other classes in the engine operate on this board class.

A player object represents each player. This class indicates if a player is human or computer, the difficulty level, and the player color. This class contains the recursive searching method used to get a computer player’s next move.

A move object also exists. It is used to store the moves found by the plausible move generator. Each move contains the location of the move on the board, a value representing the number of pieces immediately flipped as a result of the move, and a random number used to randomize the order of moves with the same value.

The game object serves as an interface for the GUI to access the engine and controls most aspects of the simulated Othello game. Methods are provided to start the game and allow players to move. Values such as the location of pieces, the scores, and the current turn can also be accessed through the game object. Because of WinOthello’s modular design, a completely different interface can be written that uses the exact same engine without having to rewrite any code game-specific code.

Extensions
Far more could be done with WinOthello. The static evaluator could be enhanced with better heuristics. Various parts of the program could be optimized. Some sort of memory could be added to quicken searches by recalling previously generated moves. An undo feature could be incorporated. Alternative interfaces could be implemented, such as a command line or web interface.

Reflection
My Othello journey has been both exciting and disappointing. I was happy to make quick progress in developing a GUI and working game engine, and thrilled to see it actually make good moves. However, my chosen approach proved limited. The simple static evaluator seems to be the program’s weak point. I choose it because I wanted to keep the design simple, and because I assumed that the program would choose good moves if it was allowed to search the game space deep enough. However time proved to constrain the search depth more than I anticipated. The program isn’t allowed to search as deep as it needs to effectively use such a simple static evaluator.

The greatest difficulty was in trying to design an elegant program structure. I started rewriting the engine several times, once completely rewriting it. It could have been beneficial to map out the program in detail before I started coding. However, given the experimental nature of the assignment, I think my approach was appropriate. I wasn’t sure of what would work best until I had tried it.

An interesting question raised by my project is what constitutes artificial intelligence? My program uses a brute force method to find an ideal solution. Such systematic probing does not seem very humanlike. Even a “smarter” Othello program works by searching a game space and evaluating game states based on preprogrammed heuristics. It seems that all the intelligent behavior stems from the programmer. Any program’s “intelligence” is a result of its design. Programs are simply tools imbued by their creators with a distinctly artificial form of intelligence.

Genius – Made or Born?

The following is a report I wrote for my Artificial Intelligence course in September 2006. It discusses an article by David Dobbs entitled “How to be a genius.” This report also references an email exchange (source not available) between Douglas Hofstadter and David Dobbs.

In the email conversation between Howard Rutiezer and Douglas Hofstadter, Rutiezer presents an article by David Dobbs while Hofstadter rebuttals it with his own observations. The topic of discourse is the relationship between innate intelligence or talent, and greatness in terms of success or recognition. Dobbs contends that although some people are inherently smarter or more talented than other people, their gifts do not guarantee their greatness in terms of success. Furthermore, people who achieve greatness are not necessarily more gifted than usual. Dobbs argues that greatness is achieved by fervent and lengthy hard work, and is not dependant on a person’s inherent traits. Hofstadter strikes back, presenting his personal observations and experiences as counterexamples to Dobbs’ viewpoint. Hofstadter asserts that hard work alone can not achieve greatness; it is necessary to also be specially gifted.

Dobbs starts with a narrative, explaining how he enjoyed natural intelligence and writing talent in his youth. But as he matured, he discovered that his natural talents would not carry him far unless he intently dedicated himself to his work. He cites studies in which people who have achieved greatness are shown to have high IQs on average, but not remarkably high IQs. This indicates that another factor is at work, an individual’s drive and willingness to work. According to Dobbs, some factors leading to greatness are constructed as a child. An environment encouraging children to explore and learn is thought to contribute to the child’s potential for greatness. Even then, Dobbs believes that someone who has achieved greatness is rarely exceptional outside of their field of expertise. Their greatness is focused in one area of practice that they have dedicated themselves to, and often doesn’t spillover into unrelated areas of life. Thus, greatness is the result of our (sometimes average) inherent talent being fully realized through massive amounts of dedicated hard work.

However, Hofstadter very much disagrees with Dobbs. As a graduate student, he found himself unable to ascend his understanding of math above a certain level of abstraction, despite his strenuous efforts to overcome this barrier. Thus, he believes that although hard work will certainly pay off, no amount of hard work can ensure greatness for someone who simply was born without the related capacity for greatness. He also contends that there are many who achieve greatness with only a modest amount of effort; they are carried almost entirely by their inherent abilities. Therefore Hofstadter believes that greatness is in large part determined by the abilities one is born with, and dedicated work plays a small role in unlocking that potential.

Who is correct? Is genius made or born? The issue is widely debated. Hofstadter mostly draws upon his personal observations and experiences, while Dobbs also supports his assertions with several large studies. Personally, I believe that both natural talent and dedicated work are important for achieving success. But, like many things in life, there are obvious expectations. Some people are natural cooking connoisseurs or chess masters and put forth only modest amounts of effort. Perhaps we aren’t all capable of being so exceptional, but most people can be reasonably successful with enough hard work and only modest natural talent.

Cognitive Architectures and General Intelligent Systems

The following is a report I wrote for my Artificial Intelligence course in September 2006. It discusses Pat Langley’s article entitled “Cognitive Architectures and General Intelligent Systems.”

In his article, Langley claims that artificial intelligence is a fragmented field that has lost sight of its original goal: to create general intelligent systems. Such a system would resemble humans in their ability to reason and make intelligent decisions, and be applicable in various environments. But because of the difficulty in designing such a system, much of the artificial intelligence field has become focused on much more specific tasks, such as computational linguistics or computer vision. Langley asserts that more attention should be refocused on the development of general intelligent systems.

There are three primary paradigms that attempt to achieve general intelligent systems: multi-agent systems, blackboard systems, and cognitive architecture. Multi-user agent systems resemble object-orientated programming; each agent has a role and defined interfaces for communicating with other agents, much like objects in a programming language like Java. The next paradigm, the blackboard system, is similar to multi-user systems in that modular agents are used. However, they communicate indirectly via a shared resource, which makes the agents less directly dependent upon each other. Because it allows for greater flexibility than a multi-user agent system, the blackboard system is thought to be more similar to cognitive thought.

Cognitive architecture is the third paradigm. This approach emphasizes interdependence – the modules constrain each other. This high degree of interdependence is considered to best model cognitive thought. Langley uses the cognitive architecture Icarus as an illustration of a Cognitive architecture. Icarus represents knowledge as propositions. For example, the proposition “(current-street me A)” could represent “I am currently on street A” within a system. This analytical representation allows Icarus to match conditions with existing knowledge and supplement its knowledge with the result of the condition.

The Icarus architecture stores long-term and short-term memories for an agent. Long-term memory consists of conceptual memory and skill memory. Conceptual memories are logical predicates that are used to make deductions. Skill memories are a set of steps that are taken to achieve a goal. The short-term memory consists of belief memories, goal/interaction memory, and two buffers: perceptual buffer and motor buffer. The perceptual buffer is refreshed each cycle with information about the environment, such as the existence and properties of objects. The motor buffer consists of all the actions the agent will perform that cycle. Belief memory is built by applying what is known in the perceptual buffer to the conceptual memory to produce inferences. When the conditions of a concept are contained in the belief memory, the result of the concept can be added into the belief memory. The goal/intention memory attempts to use skill memories combined with the information accumulated in belief memory to execute skills that reach a desired effect. When the conditions of a skill are satisfied by the belief memory, the actions specified by the skill may be executed. Much of Langley’s article is devoted to the explanation of this interdependent architecture.

Two important aspects of the Icarus architecture are its problem solving and skill learning abilities. Icarus can achieve goals for which it does not already have a skill by using other skills and concepts. Icarus will attempt to use backward chaining to execute other goals (concepts or skills) until it is able to achieve its original goal. This is a remarkably human approach: when confronted with a problem with an unknown solution, the system applies what it already knows and is capable of doing to the problem in the hopes of discovering a series of steps that lead to the solution. Additionally, whenever Icarus reaches a goal by problem solving, a new skill is saved that contains the subgoals used to achieve the goal. Since these new skills are available for future use, these skills can be utilized by later attempts at problem solving. It remembers the solution to a previously unknown problem so that if the system encounters the problem again, it can draw upon its experience to quickly arrive at the solution. Incremental increases in skill memory occur when these later attempts successfully use a previously learned skill and incorporate it into the new memory.

Langley advocates for advances in the development of general intelligent systems, and describes cognitive architecture as a potential avenue for achieving such a system. He familiarizes the reader with cognitive architecture by quickly explaining the structure and mechanisms of the Icarus architecture. He finishes by acknowledging other promising architectures and by identifying issues, like episodic memory, which need more research. The resulting article raises awareness about the desire for general intelligent systems and the potential of cognitive architectures.