Machines that think, dream, aspire: ever since John McCarthy coined the term ’artificial intelligence’ in 1956, mankind has striven to create sentience.
To an extent, we’ve succeeded. Intelligent systems assist our elderly, vacuum our floors, and guide our web searches. Automated telephone assistants are nearly as easy to talk to as real people, and children’s toys learn to recognize voices. But do these accomplishments indicate true intelligence, or are they simply ingenious technological tricks?
The answer depends on whose definitions you’re using.
AI researchers distinguish between weak AI and strong AI. Weak AI refers to any system that responds appropriately to its environment; strong AI refers to a system capable of performing any intellectual task that a human being can do. A virtual stroll through the most recent International Robot Exhibition [1] will convince even hardened skeptics that weak AI is already among us. But although specialized systems are able to recognize faces, respond to spoken commands, and detect high-radiation areas in nuclear reactors, strong AI, along with anything resembling sentience, continues to elude us.
One barrier to creating genuine, human-level intelligence lies in the inherently specialized nature of machinery. Consider NASA’s Mars Exploration Rover. This $25 million device can navigate uneven terrain, identify and avoid obstacles, collect dust particles, and abrade rock surfaces. The Rover took years to design, required 18 months to assemble, and is arguably one of the most sophisticated pieces of equipment ever allowed to perform autonomously. But if you plop it on a park bench and instruct it to unwrap a stick of chewing gum, it’s going to be as useless as a jackhammer in a china shop. Ask it to play chess, and success is even less likely.
Even situated learners like MIT’s Kismet [2] and Cog [3] robots have a difficult time adapting to new situations. Why? Because transferring knowledge from one domain to another requires generalization. Successful generalization is a hallmark of human intelligence, but for robots and computers it’s notoriously difficult. Understanding why requires an idea of what goes on under the hood of today’s most popular AI algorithms. Let’s look at some of them.
Brute Force Intelligence: Decision Trees, Knowledge Bases, and Deductive Reasoning
In its broadest sense, agency can be described as a search through decision-space. Intelligent agents, humans included, seek actions that will produce a desirable outcome. One way to do this is to use a decision tree: a branching representation of decisions and their consequences. In a decision tree, the robot considers all possible actions that might be taken given its current state. It then considers which actions might be taken from each newly attained state, and follows the branches downward until it discovers a path that leads to its desired objective.
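The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `search`, the toy world, and its two actions ("add 1" and "double") are invented for the example, and a breadth-first strategy is just one of several ways to walk the tree.

```python
from collections import deque

def search(start, is_goal, successors):
    """Breadth-first walk through a decision tree.

    successors(state) yields (action, next_state) pairs.
    Returns the list of actions that reaches a goal state, or None.
    """
    frontier = deque([(start, [])])   # states waiting to be explored
    visited = {start}                 # states already placed on the frontier
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None

# Hypothetical toy world: the "state" is a number, and the agent
# may either add one to it or double it.
def toy_successors(n):
    if n < 100:                       # keep the toy world finite
        yield "add 1", n + 1
        yield "double", n * 2

plan = search(1, lambda n: n == 10, toy_successors)
print(plan)
```

Note that the search only works because `toy_successors` can say exactly what every action does. That requirement is the weak point discussed next.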
Decision trees can be extremely powerful. They were the basis for Newell and Simon’s Logic Theorist, the first computer program capable of proving mathematical theorems [4]. They were also the primary tool used by Deep Blue, the chess program that defeated grandmaster Garry Kasparov. (Afterwards, critics claimed that Deep Blue’s victory was less the result of clever programming than of its massively parallel ability to evaluate 200 million board configurations per second [5]. Perhaps the most astounding aspect of the match wasn’t that Deep Blue defeated a human grandmaster, but that it was only able to beat him twice out of six games.)
The problem with decision trees is that they require complete a priori knowledge of the environment. A chess game is predictable and easily defined. It begins with a known configuration and it is possible to predict which new configuration will result from any given move. For machines equipped with modern technology, navigating the decision space is child’s play.
The real world, in contrast, is dynamic and unpredictable. A robot charged with the task of locating a fallen candy wrapper and placing it in a garbage can must be able to recognize the wrapper under different lighting conditions, from different angles, and against different backgrounds. It must also be able to tell the difference between a leaf, a candy wrapper and a $10 bill.
These sound like elementary tasks, but they are not. The number of different ways a candy wrapper might appear to a robot’s onboard camera is extremely large. The number of possible configurations of trees, rocks, flower beds, children’s toys and garbage cans in a randomly selected park is equally daunting, and there is no guarantee that the configuration on any given day will be the same as it was the day before. The robot’s motors may become more or less efficient due to changes in temperature and humidity or to corrosive build-up on its moving parts. The terrain the robot traverses might be dry or wet, dusty or sticky from spilled soda pop, and its view might be occluded by fog or driving rain.
If throwing away a candy wrapper sounds like a simple task to you, that is only because your brain is magnificently optimized for dealing with the inherent messiness of our world.
Closely related to decision trees are knowledge bases: extensive libraries of information intended for use in deductive reasoning. Whereas a decision tree begins with an assessment of the current situation and explores known courses of action, a knowledge base enables the synthesis of new ideas through propositional logic. Such machines ‘learn’ by adding newly acquired factoids to the knowledge base and ‘deduce’ by applying a sequence of logical rules to accepted factoids. The abilities exhibited by C-3PO in Star Wars and by Lt. Commander Data in Star Trek (most notably their tendency to calculate the odds of succeeding at a proposed task) are characteristic of knowledge-based systems.
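A minimal sketch of this kind of deduction follows, assuming the simplest possible rule format: each rule says "if all of these factoids are known, add this one." The function name `forward_chain` and the factoids themselves are invented for illustration, and real knowledge-based systems use far richer logic.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new factoids appear.

    facts: an iterable of strings already accepted as true.
    rules: a list of (premises, conclusion) pairs, where premises
           is a frozenset of strings.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule applies when every one of its premises is known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical factoids for a wheeled outdoor robot.
rules = [
    (frozenset({"is_raining"}), "ground_is_wet"),
    (frozenset({"ground_is_wet", "robot_has_wheels"}), "traction_is_poor"),
    (frozenset({"traction_is_poor"}), "reduce_speed"),
]

derived = forward_chain({"is_raining", "robot_has_wheels"}, rules)
print(sorted(derived))
```

Each pass through the rules may unlock conclusions that feed later rules, which is how "reduce_speed" is reached from two starting factoids that never mention speed at all.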
Knowledge bases work well when constrained to specific, carefully defined topic areas. Unfortunately, as the number of factoids increases, so does the number of possible deductions, the vast majority of which are accurate-yet-useless statements such as “Green is a color. Red is a color. Therefore red and green are both colors.” Searching for useful deductions becomes very much like searching for a needle in a haystack the size of the Empire State Building.
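To get a feel for how quickly the haystack grows, consider only the most trivial deduction: pairing up two factoids of the same kind. The sketch below counts those pairs; the helper name `pairwise_deductions` and the color list are made up for the example, and a real system would generate many other kinds of deduction on top of these.

```python
from itertools import combinations

def pairwise_deductions(facts):
    """Return every 'X and Y are both colors' statement for a list of colors."""
    return [f"{a} and {b} are both colors" for a, b in combinations(facts, 2)]

colors = ["red", "green", "blue", "yellow"]
print(len(pairwise_deductions(colors)))   # 4 factoids give 6 pairs

# The number of pairs among n factoids is n * (n - 1) / 2.
for n in (10, 100, 1000):
    print(n, "factoids ->", n * (n - 1) // 2, "pairwise deductions")
```

A thousand factoids already yield nearly half a million true-but-useless pairings, before any deduction involving three or more factoids is counted.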
The question of scale thus becomes increasingly important for rational systems. Current research strives to reduce the size of the decision-space by grouping perceptual states together or by cutting off unprofitable lines of search. This proves to be a rather delicate process. Cut off too few branches of the search, and the decision space remains intractable. Cut off too many, and you might discard precisely the solution you were looking for.
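The trade-off can be seen with one of the crudest pruning rules available: refuse to look more than a fixed number of moves ahead. The sketch below is an assumption-laden toy, not a description of any particular research system. The number puzzle, its two actions, and the `stats` counter are all invented for the example.

```python
def depth_limited(state, is_goal, successors, limit, stats):
    """Depth-first search that abandons any branch deeper than `limit`."""
    stats["expanded"] += 1            # count every state we look at
    if is_goal(state):
        return []
    if limit == 0:
        return None                   # branch pruned here
    for action, nxt in successors(state):
        rest = depth_limited(nxt, is_goal, successors, limit - 1, stats)
        if rest is not None:
            return [action] + rest
    return None

# Hypothetical toy world: start at 1, try to reach 10.
def toy_successors(n):
    yield "add 1", n + 1
    yield "double", n * 2

def reached_ten(n):
    return n == 10

shallow_stats = {"expanded": 0}
shallow = depth_limited(1, reached_ten, toy_successors, 3, shallow_stats)

deep_stats = {"expanded": 0}
deep = depth_limited(1, reached_ten, toy_successors, 4, deep_stats)

print("limit 3:", shallow, "after", shallow_stats["expanded"], "states")
print("limit 4:", deep, "after", deep_stats["expanded"], "states")
```

With a limit of three moves the search is cheap but comes back empty-handed, because the shortest solution takes four. Each extra move of lookahead roughly doubles the size of the full tree in this toy world, which is why the cutoff is so tempting and so risky.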
Fictional computers like HAL from 2001: A Space Odyssey have used heuristics to prune the search tree—with disastrous results. Unable to locate an action sequence that permitted the accomplishment of its mission in the face of opposition, HAL 9000 settled for the best solution it could find: murdering the Discovery’s crew.
Asimov’s Laws of Robotics were designed to prevent this sort of mishap. Interestingly, the Three Laws of Robotics were not a decision-making process in and of themselves. Rather, they were an evaluation criterion: a yardstick used to accept or reject actions proposed by the decision-making algorithm. As Asimov’s stories demonstrate, however, even these rules are not infallible. Giving a machine the power to think must of necessity also give it the power to make mistakes.
Emergent Intelligence: Bayesian Learning and Neural Networks
Generalization, the application of past information to new situations, requires clustering. New situations must be recognizably similar to past experiences; otherwise, previously acquired knowledge becomes useless.
That sounds reasonable enough, but how do you teach an OCR program that the word “Literature” written in cursive script is functionally equivalent to the same word written in block letters? How do you teach a voice recognition program to distinguish between male and female voices, but not between hoarse and healthy voices? Attempts to enumerate such rules explicitly have met with little success.
One approach favored by AI researchers is called Bayesian learning. In this approach, each environmental state is represented as a point in feature-space. (A feature may be anything from the number of corners on an alphanumeric character to the number of centimeters between a robot and the nearest wall.) The algorithm is then presented with a set of training data which includes concrete values for each feature, accompanied by a desired classification.
As an example, the features for a weather-forecasting algorithm might include the temperature, humidity, wind speed, and barometric pressure at the target location. The training data provides a classification (sunny, rainy, or cloudy) for several points in the feature space. When presented with new data, the Bayesian learner compares the input to its internal map of the feature space in order to determine the correct classification.
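Here is a rough sketch of the weather example, using a Gaussian naive Bayes classifier as a stand-in for the Bayesian learner. That choice is an assumption made for brevity, as are the function names and the training numbers, which are invented and use only two of the four features (temperature in degrees Celsius and percent humidity).

```python
import math
from collections import defaultdict

def train(samples):
    """Summarize each class by its prior, per-feature mean, and variance."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))                 # one tuple per feature
        means = [sum(col) / n for col in columns]
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6   # avoid zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model

def classify(model, features):
    """Return the label with the highest log-posterior for this point."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

# Invented training data: (temperature, humidity) -> observed weather.
training = [
    ((30, 20), "sunny"), ((28, 30), "sunny"), ((32, 25), "sunny"),
    ((18, 90), "rainy"), ((20, 85), "rainy"), ((16, 95), "rainy"),
    ((22, 60), "cloudy"), ((24, 65), "cloudy"), ((21, 55), "cloudy"),
]

model = train(training)
print(classify(model, (29, 28)))
print(classify(model, (17, 92)))
```

The "internal map" here is nothing more than a mean and a spread for each feature within each class. A new reading is assigned to whichever class makes it look least surprising.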
Neural networks offer a related approach to generalization. A neural network is a matrix of nodes connected by lines of activation. Each node ’fires’ if its input values exceed a programmatically defined threshold.
In one common type of net, a set of input values is fed into the first layer of nodes. These nodes fire (or not) in response, causing the next layer of nodes to activate, and so forth. A set of output nodes provides a classification for the input data.
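The layered firing just described can be sketched as follows. The weights and thresholds are set by hand purely for illustration (a trained network would learn them), and the names `fires`, `forward`, and `xor_net` are invented for the example. The tiny network reports whether exactly one of its two inputs is on.

```python
def fires(inputs, weights, threshold):
    """A node fires (outputs 1) if its weighted input exceeds its threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def forward(inputs, layers):
    """Feed the inputs through each layer in turn and return the outputs."""
    activations = list(inputs)
    for layer in layers:
        activations = [fires(activations, w, t) for w, t in layer]
    return activations

# Each node is a (weights, threshold) pair; values chosen by hand.
xor_net = [
    [([1, 1], 0.5),      # hidden node A: fires if either input is on
     ([1, 1], 1.5)],     # hidden node B: fires only if both inputs are on
    [([1, -1], 0.5)],    # output node: fires if A fired and B did not
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", forward([a, b], xor_net)[0])
```

No single threshold node can compute this "one or the other but not both" function on its own, which is why the hidden layer in the middle is needed.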
Like Bayesian learners, a neural network requires a training set, which it uses to adjust the strength of the connections between each pair of nodes. Influential input values will be assigned strong connections. If one of the input values turns out to be irrelevant to the task at hand, the neural network will ’ignore’ it by dropping the connection entirely.
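A single-node sketch shows the idea, using the classic perceptron update rule as a stand-in for whatever training method a real network would employ. The data set is invented so that the correct answer depends only on the first input. The second input is irrelevant, and its connection deliberately starts out at full strength.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) if the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, weights, bias=0, rate=1, epochs=10):
    """Nudge each connection whenever the node gives the wrong answer."""
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Only connections from active inputs are adjusted.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Invented training set: the target simply copies the first input.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_perceptron(samples, weights=[0, 1])
print("connection strengths:", weights, "bias:", bias)
```

In this toy case the connection from the irrelevant input ends at zero while the useful one grows. With noisier data or other training methods, an unhelpful connection usually shrinks toward zero rather than vanishing outright.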
Although inspired by neurological principles, neural nets have relatively little in common with the human brain. As Luc Reid points out in his March 2010 article “Future Brains,” thought is as much a chemical process as an electrical one. Dozens of neurotransmitters such as dopamine carry messages across synaptic gaps, and hormones such as cortisol and testosterone further modulate the brain’s activity, making communication between neurons far more complex than a simple numeric connection. Real neurons also have a firing spike that builds, peaks, and fades over time. They fire asynchronously and must ‘recharge’ between spikes, factors that are seldom incorporated in computer simulations. Brian Trent’s 2009 article, “Eternal Lives on Hard Drives,” describes some of the challenges involved in digitally duplicating the brain.
The greatest advantage (and simultaneously the greatest drawback) of neural networks is their incredible ability to learn things their designers never anticipated. A well-structured neural network can invent its own way of perceiving the universe, clustering inputs into groups that are utterly unintuitive to humans and yet extremely efficient in performing the task at hand. For example, rather than viewing age and ethnicity as separate character traits, a neural net designed to predict a person’s education level based on a photograph might lump both factors into a single feature labeled ’facial characteristics’. The same network might also learn to distinguish between several different posture profiles, including whether the subject’s weight rests primarily on the left or right foot and whether the subject’s shoulders are slouched. In this sense, neural networks are not unlike the aliens in Niven and Pournelle’s sf classic The Mote in God’s Eye, constructing custom devices capable of performing many functions at once.
Given the eerie unpredictability of neural networks once they have been set loose, it’s not surprising that many of fiction’s greatest AIs make use of them. Mycroft, the self-aware computer befriended by Manuel Garcia in Heinlein’s The Moon Is a Harsh Mistress, used associative neural networks to process data. Lt. Commander Data’s positronic brain incorporated a neural network which presumably worked in conjunction with his knowledge base, a fairly common arrangement in today’s expert systems. The CPU of Schwarzenegger’s Terminator also relied on neural nets.
The Search For Sentience
Will mankind’s creations ever take on a life of their own? Will they experience biographies as epic and poignant as that of Isaac Asimov’s Bicentennial Man? Possibly. But if true synthetic sentience ever arrives, it will probably look and behave far differently than we expect.
Throughout history, mankind has relied on the technology of the day as the medium for artificial intelligence. The works of Homer and Plato refer to talking statues made of bronze or clay. Edward S. Ellis’ 1868 story The Steam Man of the Prairies features a mechanical man powered by the technological innovations of his time [6]. Tik-Tok from L. Frank Baum’s Oz stories was described as a ‘clockwork man’. In the early 1920s Edmond Hamilton wrote of a computer brain that ran on atomic power [7], and the prominence of computing in the late 1950s led to a flurry of fictional AIs powered by electronics.
More recently, Robert J. Sawyer’s WWW: Wake series postulates an emergent awareness within the connections of the World Wide Web. Others have speculated about AIs based on nanotechnology and quantum computing. If this pattern holds, it is likely that when man first creates a device of equal intellectual capacity to his own, it will be based on an as-yet-uninvented technology.
1. Cool Robots from iREX (via Singularity Hub) <http://singularityhub.com/2009/12/04/cool-robots-from-irex-2009-pics-and-video/>
2. Kismet (Sociable Machines Project, MIT) <http://www.ai.mit.edu/projects/humanoid-robotics-group/kismet/kismet.html>
3. Cog (Humanoid Robotics Group, MIT) <http://www.ai.mit.edu/projects/humanoid-robotics-group/cog/>
5. Deep Blue (Chess Computer) <http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)>
6. Edward S. Ellis, The Steam Man of the Prairies <http://en.wikipedia.org/wiki/The_Steam_Man_of_the_Prairies>