IJCNN
INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS
NEUROCONTROL AND NEUROBIOLOGY WITH AEROSPACE APPLICATIONS
SPECIAL SESSION PRESENTATIONS
IEEE Catalog Number: 92CH3114-6
ISBN: Softbound Edition 0-7803-0559-0; Casebound Edition 0-7803-0560-4; Microfiche Edition 0-7803-0561-2
Library of Congress Catalog No. 91-59048
*This paper gives personal views, not the views of NSF or NASA. As government work, however, it is in the public domain. Many thanks are due to C. Lau and A. Penz for making this possible.
U.S. Government work not protected by U.S. copyright.
Neurocontrol and Neurobiology: New Developments and Connections
Paul J. Werbos, Room 1151, National Science Foundation*, Washington, D.C. 20550
and
Andras J. Pellionisz, NASA Ames Research Center*, 261-3, Moffett Field, CA 94035
ABSTRACT
The past two years have seen major progress in neurocontrol, particularly in the use of complex reinforcement learning schemes which appear ever more relevant to understanding neurobiology. For example, at McDonnell-Douglas, controllers which combine adaptive critic networks (i.e., approximate dynamic programming) with the use of backpropagation in real time have solved difficult control problems -- resistant to classical methods and simpler ANNs -- which are crucial to the feasibility of building an airplane (the National Aerospace Plane, NASP) able to reach earth orbit. Such developments led to a joint NSF-McDonnell workshop in October 1990, and to a new book [1] which provides extensive implementation details. As these details emerged -- particularly in relation to planning, chunking and real-time adaptation of time-lagged recurrent networks -- parallels to neurobiology have grown stronger, and have begun to point to empirical possibilities of importance to neuroscience. This has led to thoughts of NSF-NASA-NIH-NIMH-(?) collaboration, in facilitating what could become a Newtonian revolution in neuroscience, with cognitive implications as well. This paper will elaborate on each of these points in turn. Because of page limits, it will summarize important conclusions and leave it to the citations to provide more details and the reasons behind those conclusions.
RECENT PROGRESS IN NEUROCONTROL IN GENERAL
Artificial Neural Networks (ANNs) have performed four types of tasks in control applications. First, they have been used in subordinate roles -- for sensor fusion, pattern recognition, etc. -- in systems where the control signals were not generated by an ANN. Such applications have been very common and useful, but they do not meet the definition of neurocontrol. Second, ANNs have been used to "clone" human experts (or to clone automatic controllers too slow to use in real time). People have often trained ANNs to reproduce the actions of a human, taking as input the current sensor data available to the human; however, because good human controllers (like good automatic controllers) are highly sensitive to dynamics, it works better to treat these applications as an exercise in dynamic modeling, system identification, or emulation of the human expert. Third, ANNs have been used to make a plant follow a desired trajectory, stay at a desired set-point, or follow a desired reference model. This can be done in a direct way ("direct inverse control") or in an indirect way. The direct way fits the biologists' notion of learning the mapping from spatial coordinates (in which the desired path is encoded) to motor coordinates (the actions required to follow that path), but it works better if dynamics are accounted for. In the indirect method, one defines an error function or disutility function which combines some measure of tracking error, jerkiness of movement, etc., and then uses an optimization method to minimize that error over time. Narendra reports that the indirect method works better, and he has a stability proof for it [1]. Finally, ANNs have been used for optimization over time, through two classes of neurocontrol design -- the backpropagation of utility (U) and the adaptive critic family of designs.
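As a minimal sketch of the direct approach, the Python fragment below learns an inverse model from random excitation and then uses it as the controller. Everything in it is a hypothetical illustration rather than a design from [1]: the scalar plant x(t+1) = 0.8 x(t) + 0.5 u(t), the uniform random actions, and the linear inverse model u(t) = a x(t) + b x(t+1). Feeding x(t) into the inverse model, and not only the desired next state, is one simple way of accounting for dynamics.

```python
import random


def plant(x, u):
    """Hypothetical plant, assumed for illustration: x(t+1) = 0.8*x(t) + 0.5*u(t)."""
    return 0.8 * x + 0.5 * u


def fit_inverse_model(n_samples=200, seed=0):
    """Direct inverse control: excite the plant with random actions, then fit
    u(t) ~ a*x(t) + b*x(t+1) by least squares (a model of the inverse mapping)."""
    rng = random.Random(seed)
    sxx = sxy = syy = sxu = syu = 0.0
    x = 0.0
    for _ in range(n_samples):
        u = rng.uniform(-1.0, 1.0)
        x_next = plant(x, u)
        # Normal-equation sums for the two regressors x(t) and x(t+1).
        sxx += x * x
        sxy += x * x_next
        syy += x_next * x_next
        sxu += x * u
        syu += x_next * u
        x = x_next
    det = sxx * syy - sxy * sxy
    a = (sxu * syy - syu * sxy) / det
    b = (syu * sxx - sxu * sxy) / det
    return a, b


def track(setpoint, steps=10):
    """Use the learned inverse model as the controller: ask it for the action
    that should carry the current state to the set-point."""
    a, b = fit_inverse_model()
    x = 0.0
    for _ in range(steps):
        u = a * x + b * setpoint
        x = plant(x, u)
    return x
```

For this noise-free, invertible, linear plant the fit recovers a = -1.6 and b = 2.0 almost exactly, so `track(1.0)` reaches the set-point after one step. Measurement noise, nonlinearity, or a plant whose inverse is not unique would break that neat result, which is consistent with Narendra's preference for the indirect method.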
All four types of application have seen great advances in the past few years. Many tricks have been discovered which are crucial to the real-world success of such methods [1]. Miller has reported very accurate tracking of a real Puma robot arm, and rapid real-time adaptation to new conditions. Accurate Automation, in work funded by NSF, reports great improvements in simulated control of the main arm of the space shuttle (a problem on which AI and classical methods have been tried at length); the director of the NASA follow-on program has authorized tests on the real arm, and says he expects a 10-fold improvement in productivity in use of the arm.
On the other hand, there have been many failures, too, due most often to inadequate system identification (i.e., poor "emulators") in those designs which include a prediction component. However, methods have been tested which can overcome that problem. In chemical industry applications -- involving real-world refinery data and wastewater treatment plant data -- prediction errors have been reduced by an order of magnitude, using a robust training scheme with time-lagged recurrent networks [1]. There is now a basis for developing even better prediction networks in future research. (The relevant theory can be understood as an advance within the literature of dynamic system identification in control theory, which in turn goes far beyond the static least-squares, iid concepts used by most researchers.) Improved optimization in chemical processing and combustors could be extremely valuable from an environmental point of view.
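The contrast between conventional one-step training and training a model on its own multi-step predictions can be sketched in Python. This is an assumed, simplified illustration in the spirit of the robust training cited from [1], not the actual scheme used there: the plant x(t+1) = 0.9 x(t) + u(t), the measurement y(t) = x(t) + Gaussian noise, and the grid search over the coefficient are all hypothetical choices. The simulation-mode model feeds its own prediction back as its next input, which makes it a very small time-lagged recurrent network.

```python
import random


def make_data(n=2000, a_true=0.9, noise=1.0, seed=1):
    """Hypothetical plant x(t+1) = a_true*x(t) + u(t), observed as y(t) = x(t) + noise."""
    rng = random.Random(seed)
    x, us, ys = 0.0, [], []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        ys.append(x + rng.gauss(0.0, noise))
        us.append(u)
        x = a_true * x + u
    return us, ys


def fit_one_step(us, ys):
    """Conventional fit: regress y(t+1) on the *measured* y(t) and on u(t)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(len(ys) - 1):
        p, q, target = ys[t], us[t], ys[t + 1]
        s11 += p * p
        s12 += p * q
        s22 += q * q
        r1 += p * target
        r2 += q * target
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det  # coefficient on y(t)


def simulation_error(a, us, ys):
    """Run the model on its own predictions: xhat(t+1) = a*xhat(t) + b*u(t),
    xhat(0) = y(0). For fixed a, xhat(t) = a**t * y(0) + b*s(t), so the best b
    has a closed form."""
    decay, s = [], []
    d, acc = 1.0, 0.0
    for u in us:
        decay.append(d)
        s.append(acc)
        acc = a * acc + u
        d *= a
    free = [y - dt * ys[0] for y, dt in zip(ys, decay)]
    b = sum(f * st for f, st in zip(free, s)) / sum(st * st for st in s)
    return sum((f - b * st) ** 2 for f, st in zip(free, s))


def fit_robust(us, ys):
    """Pick the coefficient that minimizes the multi-step (free-run) error."""
    grid = [0.5 + 0.005 * k for k in range(99)]  # a in [0.5, 0.99]
    return min(grid, key=lambda a: simulation_error(a, us, ys))
```

With these settings the one-step estimate of the 0.9 coefficient is pulled far below its true value by the noise in the measured y(t) used as an input, while the simulation-mode estimate stays close to 0.9. This is the kind of effect that a static least-squares, iid view of the problem does not capture.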
Unfortunately, many researchers have continued to reinvent the wheel and have thereby failed to learn the lessons of the past, particularly in the case of direct inverse control and supervised control ("cloning") -- two approaches which appear extremely simple until one begins to analyze their performance and limitations. Some researchers have also used "new" methods which are (old) special cases of the backpropagation of utility; often they arrive at such designs by ignoring the link between present actions u(t) and later state variables R(t+...), or they develop simple plant emulators by perturbing the inputs and outputs of the plant to be controlled. There have also been interesting and successful applications of vibration controllers, and the like, which are special cases of Kawato's Feedback Error Learning (FEL) scheme [2].
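Feedback Error Learning, as it is usually described, uses the output of a conventional feedback controller as the error signal for adapting a feedforward controller, so that the feedforward part gradually takes over. A minimal sketch, assuming a hypothetical plant x(t+1) = 0.8 x(t) + 0.5 u(t), a sinusoidal reference, a two-feature linear feedforward model and hand-picked gains (none of these come from [2]):

```python
import math


def run_fel(steps=4000, eta=0.1, k_fb=1.0, period=8.0):
    """Feedback Error Learning on a hypothetical plant x(t+1) = 0.8*x(t) + 0.5*u(t).
    Returns the learned feedforward weights and the tracking error at each step."""
    def desired(t):
        return math.sin(2.0 * math.pi * t / period)

    w = [0.0, 0.0]
    x = 0.0
    errors = []
    for t in range(steps):
        phi = [desired(t), desired(t + 1)]    # features of the desired trajectory
        u_ff = w[0] * phi[0] + w[1] * phi[1]  # feedforward (learned inverse model)
        u_fb = k_fb * (desired(t) - x)        # conventional feedback controller
        # FEL rule: the feedback command serves as the error for the feedforward part.
        w = [w[i] + eta * u_fb * phi[i] for i in range(2)]
        errors.append(abs(desired(t) - x))
        x = 0.8 * x + 0.5 * (u_ff + u_fb)
    return w, errors
```

The exact inverse of the assumed plant corresponds to feedforward weights (-1.6, 2.0); as the weights approach those values, the feedback command, and with it the tracking error, shrinks toward zero.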
PROGRESS IN OPTIMIZATION AND REINFORCEMENT LEARNING
In addition, there has been substantial progress in the use of optimization networks and reinforcement learning -- the only classes of neural networks of any sort with any plausibility as models of entire organic brains. (Many researchers have used direct inverse control as a model of lower motor functions in the brain; however, Uno, Kawato et al. [2] have shown through experiments that the lower motor system does have an adaptive optimization capability which cannot be explained on that basis. Further experiments along these lines will be crucial.)
In 1988, there were only four working examples of the backpropagation of utility [2]: Jordan and Kawato's simulated robot arms, Widrow's simulated truck-backer-upper, and Werbos's official DOE model of the natural gas industry. Now there are probably dozens, including Narendra's adaptive control scheme [1], McAvoy's Model Predictive Control scheme used in the chemical industry [1], Fukuda's robot arm controller [3], Hwang's electric power scheme, etc. The backpropagation of utility can be very useful in engineering applications and in developing basic theory, but it clearly does not describe biological systems. It can be used only in three modes: the backpropagation through time (BTT) mode, which is not a true real-time method; the forward perturbation mode, whose cost scales as N^2, where N measures the size of the network; and truncation approaches, which ignore cross-time linkages and are therefore limited in capability. Random search methods have also been used to bypass the convergence problems with BTT, in an off-line mode, but there are other ways to deal with those problems [1,2].
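The trade-off between the BTT mode and the forward perturbation mode can be seen on a toy problem. In the hypothetical sketch below (the plant x(t+1) = 1.1 x(t) + 0.5 u(t), the controller u(t) = -k x(t), the quadratic cost and the 20-step horizon are all assumptions), both functions compute the same derivative of total cost with respect to the controller parameter k. The BTT version must store the whole trajectory and then sweep backward, so it cannot run in real time. The forward version carries dx(t)/dk along as the system runs, but in a network it must carry one such sensitivity for every combination of state variable and weight, which is the source of its extra scaling with network size.

```python
def simulate(k, a=1.1, b=0.5, r=0.1, x0=1.0, horizon=20):
    """Roll out x(t+1) = a*x(t) + b*u(t) with u(t) = -k*x(t); return states, actions, cost."""
    xs, us, cost = [x0], [], 0.0
    for t in range(horizon):
        u = -k * xs[t]
        x_next = a * xs[t] + b * u
        cost += x_next ** 2 + r * u ** 2
        us.append(u)
        xs.append(x_next)
    return xs, us, cost


def grad_btt(k, a=1.1, b=0.5, r=0.1, x0=1.0, horizon=20):
    """Backpropagation through time: forward pass, then one backward sweep."""
    xs, us, _ = simulate(k, a, b, r, x0, horizon)
    grad = 0.0
    gx = 2.0 * xs[horizon]  # ordered derivative of cost w.r.t. the final state
    for t in reversed(range(horizon)):
        gu = 2.0 * r * us[t] + b * gx     # direct cost of u(t) plus its effect via x(t+1)
        grad += gu * (-xs[t])             # u(t) = -k*x(t), so du/dk = -x(t)
        direct = 2.0 * xs[t] if t > 0 else 0.0
        gx = direct + a * gx + gu * (-k)  # ordered derivative w.r.t. x(t)
    return grad


def grad_forward(k, a=1.1, b=0.5, r=0.1, x0=1.0, horizon=20):
    """Forward perturbation: carry s = dx(t)/dk along in real time."""
    x, s, grad = x0, 0.0, 0.0
    for _ in range(horizon):
        u = -k * x
        du = -x - k * s
        x_next = a * x + b * u
        s_next = a * s + b * du
        grad += 2.0 * x_next * s_next + 2.0 * r * u * du
        x, s = x_next, s_next
    return grad
```

Both results agree with a central finite-difference estimate of the same derivative.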
In the meantime, advanced adaptive critics -- systems which combine a sort of secondary reinforcement scheme (or "Critic" network) with some use of generalized backpropagation in real time -- have grown from a theoretical class of methods [2] into an important real-world option. At this point, four groups have developed working systems (McDonnell-Douglas, Jordan-Jacobs, BehavHeuristics, Jameson), two of which are substantial real-world applications, and the process has only just begun. (The Grossberg/Levine reinforcement learning schemes [4] could also be seen as a kind of advanced adaptive critic, in which "gated dipoles" take over the role of backpropagation, based on an assumption of linearizability like that in [5].)
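A minimal sketch of the combination described here, a Critic adapted as a secondary reinforcement signal and backpropagation used to adapt the Action network, is given below. It is written in the style of Heuristic Dynamic Programming, one member of the adaptive critic family, and every specific is an assumption for illustration: a hypothetical linear plant with a quadratic cost to be minimized, a one-weight Critic J(x) = c x^2, a linear Action network u = -k x, a known plant model for the backpropagation step, and states sampled at random instead of along trajectories. The small problem is chosen so the learned values can be checked against the exact dynamic programming (Riccati) solution.

```python
import random

# Hypothetical plant x(t+1) = A*x(t) + B*u(t) and cost U = x^2 + R*u^2 (to be minimized).
A, B, R, GAMMA = 0.8, 0.5, 0.5, 0.8


def train_adaptive_critic(steps=20000, alpha=0.05, beta=0.002, seed=0):
    """HDP-style sketch: Critic J(x) = c*x^2, Action network u = -k*x,
    and a known plant model used to backpropagate from J to the action."""
    rng = random.Random(seed)
    c, k = 0.0, 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        u = -k * x
        x_next = A * x + B * u
        utility = x * x + R * u * u
        # Critic: move J(x) toward U + GAMMA*J(x_next) (secondary reinforcement).
        target = utility + GAMMA * c * x_next * x_next
        c += alpha * (target - c * x * x) * x * x
        # Action: derivative of U + GAMMA*J(x_next) with respect to u, taken
        # through the Critic and the plant model, then chained to k via du/dk = -x.
        dq_du = 2.0 * R * u + GAMMA * 2.0 * c * x_next * B
        k -= beta * dq_du * (-x)
    return c, k


def riccati_reference(iterations=2000):
    """Exact discounted LQ solution (J = c*x^2, u = -k*x), used only as a check."""
    c = 0.0
    for _ in range(iterations):
        c = 1.0 + GAMMA * A * A * c - (GAMMA * A * B * c) ** 2 / (R + GAMMA * B * B * c)
    return c, GAMMA * A * B * c / (R + GAMMA * B * B * c)
```

The Critic uses a faster learning rate than the Action network, so that the Action network is always adapted against a reasonably current estimate of J.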
The McDonnell-Douglas story is especially interesting. The composite materials group at McAir -- not the ANN group -- began by studying the very difficult materials fabrication problems which affect not only NASP, but the cost of all kinds of advanced aircraft. McAir has played a leading role in composite materials technology, but was unable to use affordable mass-production techniques to make parts of high enough quality; they had spent millions trying all kinds of classical methods, first-principles models, AI, etc., to no real avail. After hearing about neurocontrol techniques which could be explained in terms of classical control concepts [6], they decided to experiment with ANNs too. Direct inverse control did not work either, even on a simplified version of the problem. But they did not quit. The Barto-Sutton-Anderson adaptive critic worked well on the simplified problem, but could not learn fast enough (i.e., could not scale) on the full version; therefore, based on [2], they put together an advanced adaptive critic which, for the first time, produced real parts in continuous time up to their specifications [1,7]. After this success, and the development of the basic software, a wide variety of applications was developed, ranging from a reconfigurable flight control system for the F-15 to thermal control for the NASP.
White and Sofge, who led the work at McDonnell-Douglas, have expressed the personal view that a neural thermal control system will be necessary to enable the NASP to reach its true objective: a speed high enough to reach orbit as an airplane, an achievement which could open up the solar system to true economical settlement by human beings. NASP still faces additional control challenges (and political obstacles); the editors of [1] still hope to provide test problems representing these challenges, so as to allow you -- the reader -- to make a major contribution to this important event in human history.
White and Sofge have since moved to MIT and Neurodyne, where they are working on new applications, but McDonnell-Douglas still has capabilities in this area. In their earlier work, they took only the first step beyond the Barto design: (1) instead of a scalar Critic, outputting a single reinforcement signal J, they used a multicomponent Critic [1,2]; (2) instead of using A_R-P to adapt the Action network or controller, they used an Action-Dependent Adaptive Critic [1], which allowed them to use backpropagation; (3) based on our discussions of [2] and certain technical problems [1], they differentiated the inputs and outputs of the Critic with respect to time. But larger problems and better control will require designs with prediction or emulator components [1], and work in that area has only just begun. There is currently far more interest in these designs and their potential applications than there are people capable of making them work. Also, their success has stimulated an expansion of the relevant theory [1], so that there is still a backlog of things waiting to be tried.
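Item (2) can be sketched in Python under hypothetical assumptions (linear plant, quadratic cost, random states, exploration noise on the actions, and made-up learning rates). The Action-Dependent Critic Q(x, u) = q1 x^2 + q2 x u + q3 u^2 is trained from observed transitions, and the Action network u = -k x is adapted by backpropagating dQ/du through the Critic, so no plant model is used by the learning rules; the plant equation appears only to generate data. The multicomponent Critic and the time-differentiation of items (1) and (3) are not shown.

```python
import random

# Hypothetical plant x(t+1) = A*x(t) + B*u(t) and cost U = x^2 + R*u^2 (to be minimized).
A, B, R, GAMMA = 0.8, 0.5, 0.5, 0.8


def features(x, u):
    """For a linear plant and quadratic cost, Q(x, u) is exactly quadratic."""
    return (x * x, x * u, u * u)


def q_value(theta, x, u):
    return sum(t * f for t, f in zip(theta, features(x, u)))


def train_adac(steps=200000, alpha=0.1, beta=0.0002, noise=0.5, seed=0):
    """Action-Dependent Adaptive Critic sketch: model-free Critic Q(x, u),
    Action network u = -k*x adapted through dQ/du."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    k = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        u = -k * x + rng.uniform(-noise, noise)  # exploratory action
        x_next = A * x + B * u                    # plant simulated only to produce data
        utility = x * x + R * u * u
        # Critic: TD update of Q(x, u) toward U + GAMMA*Q(x', Action network's choice at x').
        target = utility + GAMMA * q_value(theta, x_next, -k * x_next)
        delta = target - q_value(theta, x, u)
        theta = [t + alpha * delta * f for t, f in zip(theta, features(x, u))]
        # Action: backpropagate dQ/du through the Critic at u = -k*x, then du/dk = -x.
        u_pol = -k * x
        dq_du = theta[1] * x + 2.0 * theta[2] * u_pol
        k -= beta * dq_du * (-x)
    return theta, k


def riccati_reference(iterations=2000):
    """Exact discounted LQ feedback gain, used only to check the learned k."""
    c = 0.0
    for _ in range(iterations):
        c = 1.0 + GAMMA * A * A * c - (GAMMA * A * B * c) ** 2 / (R + GAMMA * B * B * c)
    return GAMMA * A * B * c / (R + GAMMA * B * B * c)
```

The learned gain k should approach the exact Riccati gain, which is computed here only as a check and is never used by the learning rules.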
GENERAL IMPLICATIONS FOR NEUROBIOLOGY AND SCIENCE POLICY
The neural network community has often talked about interdisciplinary cooperation between engineering and neurobiology, but there is still a tendency to think in terms of cooperation rather than unification. People often use very similar-looking models but totally different standards of validation. It is very easy to fall into a kind of routine or habitual mindset, and thereby fail to appreciate the enormous potential and the concrete reality of the opportunity now before us. For that reason, we will talk about some of the generalities first -- with an eye to policy issues -- before talking about specific new connections and views of the brain.
Some biologists despair of ever understanding the brain in a truly scientific way, analogous to physics; they despair of the sheer complexity of the brain, and the diversity of its lower-level systems. However, even in physics, the phenomena are impossibly complex; the trick (as in Newton's breakthrough) is to identify the dynamic principles behind the phenomena, which -- in the case of the brain -- involves the basic mechanisms of learning. Certain brain circuits are indeed ad hoc, complex, and different from person to person based on genetic factors; however, those components or circuits which involve higher intelligence are highly modular and adaptive [8], and should be understandable in generalized, engineering terms.
Since the brain as a whole system is a neurocontroller [8,9,10], this requires the development of families of neurocontrol designs capable of replicating brain-like capabilities. It requires an iterative kind of process, in which we are not afraid to make conjectures, test them, and continuously revise our views of the brain. It requires continuous two-way exchanges between neurocontrol engineers and experimental biologists, working in interdisciplinary teams. Engineering viability must be used more systematically as an additional standard of validation and source of inspiration for brain models and for brain theory in general. Cognitive neuroscience is also important, in encouraging those kinds of experiments which most reveal higher-order capabilities of the system, though true reverse engineering can do likewise, and comparisons between different classes of vertebrates will also be useful. New, more interdisciplinary funding mechanisms will be crucial to prevent the loss of these opportunities through the routine, habitual tendencies of bureaucracies staffed by specialists.
Despite the scientific importance of all this, certain disease-oriented lobby groups in Washington have sometimes asked about its potential importance for medicine. In fact, the understanding of learning or plasticity in the brain will be crucial to our understanding of how new connections are formed in the brain [12]; that, in turn, will be crucial to our future ability to graft new cells into the brain (cells from bioreactors [2], using technology resulting from recent discoveries at Johns Hopkins, where NSF has funded some work in tissue engineering), so as to enable enhanced mental performance in the aging or the disabled. There are many other potential benefits to a scientific understanding of the brain, but this one high-risk opportunity is by itself of enormous medical significance.
A NEW VIEW OF THE BRAIN
A fascinating pattern has begun to emerge in recent months; for reasons of space, however, we will extract only a few highlights from [1,9,10] and from a forthcoming joint paper on the cerebellum to be given at Rutgers this spring in an IEEE/Biomedical symposium.
As mentioned above, the advanced adaptive critics are the only known neural network designs potentially able to replicate the main capabilities of the brain, and there is reason to doubt that anyone will ever devise an alternative [9,10]. These designs require a use of generalized backpropagation, but not of the popularized special case so often questioned by biologists; there is good reason to believe that generalized backpropagation is biologically plausible [9,10]. (The original form of this method was in fact inspired by Freud [10].) In fact, the demonstrated capabilities of biological systems leave us only three possibilities, all of which should be pursued more vigorously: (1) to look for generalized backpropagation in the brain, which will require improved instrumentation to consider a wide range of possibilities [9,10]; (2) to develop a much stronger engineering understanding and application of the Grossberg/Levine designs, whose engineering viability is still unproven; (3) to invent something totally new, but defensible and workable from an engineering point of view, for the relevant class of problems. Again, the third possibility is very hard to imagine, for a variety of reasons [9,10].
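In Werbos's usage, generalized backpropagation rests on the chain rule for ordered derivatives, which applies to any system of quantities computed in a fixed order, not only to layered networks. For each quantity z_i, the ordered derivative of the final target is its direct partial derivative plus the sum, over later quantities z_j that depend on z_i, of the ordered derivative with respect to z_j times the local partial of z_j with respect to z_i. A minimal sketch, assuming a made-up three-step ordered system (the entries of SYSTEM are hypothetical): a single backward sweep yields the ordered derivative of the last quantity with respect to every earlier one.

```python
import math


def _tanh_sum(z):
    return math.tanh(z[2] + z[0])


# A hypothetical ordered system: z0, z1 are inputs; each later quantity depends
# only on earlier ones. Each entry is (forward function, local partials).
SYSTEM = [
    (lambda z: z[0] * z[1], lambda z: {0: z[1], 1: z[0]}),                  # z2
    (_tanh_sum, lambda z: {0: 1.0 - _tanh_sum(z) ** 2,
                           2: 1.0 - _tanh_sum(z) ** 2}),                    # z3
    (lambda z: z[3] ** 2 + 3.0 * z[1], lambda z: {3: 2.0 * z[3], 1: 3.0}),  # z4 = target
]


def forward(inputs, system):
    z = list(inputs)
    for f, _ in system:
        z.append(f(z))
    return z


def ordered_derivatives(z, system, n_inputs):
    """grad[i] = ordered derivative of the last quantity with respect to z[i]."""
    grad = [0.0] * len(z)
    grad[-1] = 1.0
    # Visit quantities in reverse order, so grad[i] is complete before it is used.
    for i in reversed(range(n_inputs, len(z))):
        _, partials = system[i - n_inputs]
        for j, p in partials(z).items():
            grad[j] += grad[i] * p
    return grad
```

A central finite-difference check on the two inputs gives the same values as the backward sweep.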
Taking the first approach, it now appears likely [1] that advanced adaptive critics could solve truly challenging planning problems (like robot navigation through general cluttered workspaces without training on a specific location of obstacles, or like SDI) only if the Critic networks were simultaneous-recurrent networks. This, in turn, requires a design involving iterations-within-iterations and highly precise clocks, a design which sounds biologically implausible and leads to decision times much longer than the effective sampling time of human motor control. However, empirical work by Freeman and by Llinas, among others, has shown that cycles within cycles and highly precise clocks do exist in the higher centers of the brain [1,10], and that the higher centers are too slow to fully explain motor control.
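A simultaneous-recurrent network, in the sense used here, settles to a fixed point for each input by repeatedly feeding its outputs back into itself, and only the settled output is used. The hypothetical two-unit sketch below (the weights W and U and the tanh units are assumptions, with W kept small enough that the inner iteration is a contraction) shows the iterations-within-iterations structure: every outer step, such as one control decision, requires a full inner relaxation.

```python
import math

# Hypothetical weights; the row sums of |W| are below 1, so y -> tanh(W y + U x)
# is a contraction and the inner iteration converges.
W = [[0.3, -0.2], [0.1, 0.4]]
U = [[1.0, 0.5], [-0.5, 1.0]]


def srn_step(y, x, w, u):
    """One inner update of all units from the previous inner outputs."""
    return [math.tanh(sum(w[i][j] * y[j] for j in range(len(y))) +
                      sum(u[i][k] * x[k] for k in range(len(x))))
            for i in range(len(w))]


def srn_relax(x, w=W, u=U, tol=1e-10, max_iter=1000):
    """Inner iteration: settle y for a fixed input x. Returns (y, iterations used)."""
    y = [0.0] * len(w)
    for it in range(1, max_iter + 1):
        y_new = srn_step(y, x, w, u)
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new, it
        y = y_new
    return y, max_iter


def run_outer_loop(inputs):
    """Outer iteration: one full relaxation per time step (e.g. per decision)."""
    return [srn_relax(x) for x in inputs]
```

Counting the inner steps returned for each input gives a rough sense of why such a design needs a fast internal clock relative to the decision rate.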
In the higher centers, it is clear (from the classic work of Olds) that the limbic system acts as a secondary reinforcement (Critic) network. It is clear that the cerebral cortex performs the prediction and state-estimation (or "working memory") functions of an emulator or prediction network, among others. It is clear that the problem of long decision times is solved by linking these higher systems to a faster, modular lower control system, in which the cerebellum plays a central role. Lisberger has measured latencies as small as 15 ms in the lower system [14].
At the level of the cerebral cortex, it seems puzzling how system identification can be performed -- requiring time-lagged recurrence -- when the techniques now used to adapt such networks involve forms of backpropagation (parallel to those used in backpropagating utility) which are not biologically plausible; however, using an Error Critic design [1], based on an advanced adaptive critic, it is possible to solve this problem. The Error Critic design fits very nicely with the evolutionary fact that the 6-layer neocortex resulted from a merger of older general cortex with additional layers coming from the limbic (Critic) system.
In the lower motor system, there are stereotyped and nonadaptive postprocessors (or "motor pools" [8]), controlled by a fast adaptive lower control system. The cerebellum is essentially a feedforward Action network with two hidden layers: (1) a granule cell layer, containing a huge number of cells, reflecting the known fact that many hidden units are needed with feedforward nets in complex, general nonlinear situations [1]; (2) a Purkinje cell layer. The output layer consists of cells in the deep cerebellar nuclei and in the vestibular nucleus (which is wired up as if it were one of the deep nuclei, even though it is physically outside the cerebellum).
This arrangement fits the observation in neurocontrol that two hidden layers are often needed for effective inverse control. Mapping the specific learned functions of specific cells (à la Hubel and Wiesel) requires geometric analysis similar to that used in robotics, but more complex [11]; this will be useful in studies of learning, which will be our focus below.
Pellionisz and Llinas [11] argued that the deep and Purkinje layers are both adapted in response to training signals sent from the olive to the cerebellum over climbing fibers. Recent work by Houk and Barto [13] elaborates on this for the Purkinje layer. Lisberger [14] and Robinson [15] have verified the adaptation of the output layer in response to climbing fiber signals. If these signals represent something like the derivatives of secondary reinforcement (J) with respect to the excitation levels of the output cells, then the adaptation of the Purkinje cells as described by Houk and Barto appears to fit an exact, electrical implementation of backpropagation through the upper part of this Action network: changes are proportional to the inputs to the Purkinje cell, to a global measure of the cell's excitation/output, and to the derivative of J with respect to the cell's output (from the climbing fiber). Normally, one would also have to know the weight W_ij from Purkinje cell i to deep cell j, but the peculiar geometry of the cerebellum converts that term into an unnecessary scaling factor (due to the many-to-one wiring and the fixed sign of the weights). Other mechanisms for implementing backpropagation in biology would not restrict the geometry so much, but this arrangement allows very fast adaptation, which is crucial to the high-speed coordination role of the cerebellum.
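The claim that the many-to-one, fixed-sign wiring turns W_ij into a mere scaling factor can be checked numerically. In the hypothetical sketch below, each sigmoid Purkinje cell projects to exactly one deep cell through the same inhibitory weight w_pd = -1, J is a made-up quadratic, and the climbing-fiber signal is assumed to carry dJ/dy_j for the corresponding deep cell; the sizes and weights are arbitrary. The local rule multiplies the cell's input, its own activity term p(1 - p), and the climbing-fiber signal, and it equals the true gradient of J divided by the shared weight.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(g, v, target_of, w_pd):
    """g: granule-cell outputs; v[i][k]: granule-to-Purkinje weights;
    target_of[i]: the single deep-nucleus cell receiving Purkinje cell i;
    w_pd: shared fixed-sign (inhibitory) Purkinje-to-deep weight."""
    p = [sigmoid(sum(vik * gk for vik, gk in zip(vi, g))) for vi in v]
    y = [0.0] * (max(target_of) + 1)
    for i, j in enumerate(target_of):
        y[j] += w_pd * p[i]
    return p, y


def climbing_fiber_signals(y, desired):
    """Assumed: dJ/dy_j for J = -0.5 * sum_j (y_j - desired_j)^2."""
    return [-(yj - dj) for yj, dj in zip(y, desired)]


def local_rule(g, p, cf, target_of):
    """Input g_k, times the cell's own activity term p(1 - p), times the
    climbing-fiber signal of the one deep cell it projects to."""
    return [[gk * pi * (1.0 - pi) * cf[target_of[i]] for gk in g]
            for i, pi in enumerate(p)]
```

Because w_pd is shared, a local rule never needs its value, only its known sign, which fixes the direction of the weight change.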
Houk appears to be puzzled by the fact that the synapses from the climbing fibers to the Purkinje cells appear to excite those cells, as well as train them (as discussed in [11] and [12]); however, if the signals on those fibers do indeed represent derivatives of J, then an opportunistic optimal controller would in fact modify current actions immediately, without waiting for adaptation of the network. There is an analogy here to the bias/calibration parameter used by McAvoy in Model Predictive Control [1], which increases the overall robustness of the system. An optimal use of this trick would probably require slight changes in the adaptation procedure (varying with the current level of bias adjustment).
Surprisingly, the Purkinje layer appears to be a time-lagged recurrent layer ("multistable" or "sticky") within an Action network. Thus it appears to perform something like state estimation. According to Pellionisz and Llinas [16], it also shows many of the capabilities of a Prediction network. Yet it clearly does not have the kind of wiring one would expect to generate a vector of predictions of sensor inputs; again, it is just a layer of an Action network. Once again, the need for very fast operation can explain the unusual architecture. The Error Critic design [1] can easily be modified to handle this kind of recurrence, when given a backpropagation signal (from the climbing fibers) to drive the whole process. The important but unknown role of the basket cells [8] may involve such an additional adaptation mechanism for the Purkinje cells. On the other hand, at this low level of control, a simple truncation design (ignoring cross-time effects) might be adequate.
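The difference between a truncation design and one that keeps cross-time effects can be sketched for a single hypothetical time-lagged recurrent unit, h(t) = tanh(w_in x(t) + w_rec h(t-1)), with a squared-error loss (the unit, the loss and the numbers are assumptions). The full derivative carries dh/dw_in forward through the recurrence; the truncated version treats h(t-1) as a constant, which is cheaper but gives a different, approximate gradient when w_rec is large.

```python
import math


def loss(xs, targets, w_in, w_rec):
    """L = 0.5 * sum_t (h(t) - target(t))^2, h(t) = tanh(w_in*x(t) + w_rec*h(t-1)), initial h = 0."""
    h, total = 0.0, 0.0
    for x, tgt in zip(xs, targets):
        h = math.tanh(w_in * x + w_rec * h)
        total += 0.5 * (h - tgt) ** 2
    return total


def gradients(xs, targets, w_in, w_rec):
    """Return (full, truncated) dL/dw_in for the same hypothetical recurrent unit."""
    h_prev, s_prev = 0.0, 0.0  # s = dh/dw_in, carried forward in real time
    full = trunc = 0.0
    for x, tgt in zip(xs, targets):
        h = math.tanh(w_in * x + w_rec * h_prev)
        d = 1.0 - h * h
        s = d * (x + w_rec * s_prev)  # includes the cross-time path through h(t-1)
        s_trunc = d * x               # truncation: treats h(t-1) as a constant
        full += (h - tgt) * s
        trunc += (h - tgt) * s_trunc
        h_prev, s_prev = h, s
    return full, trunc
```

With w_rec = 0.9 the two gradients differ noticeably, and only the full one matches finite differences; with w_rec = 0 the truncation error would vanish.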
A central problem lies in explaining how the olive cells themselves are adapted. Houk and Barto discuss this adaptation in verbal terms, but the engineering translation is far from obvious. (The existence of such adaptation is enough by itself to rule out a variety of models.) There are two obvious choices: (1) training signals sent backwards along (or parallel to) the climbing fibers (!!); (2) fibers from the output layer of the cerebellum to the olive, enabling the calculation of training signals within the olive. Demonstrating and understanding these two alternatives is clearly the next important task ahead of us in the lower motor system. If the first of these options should apply, then the origin of these training signals may lead us back to a more complex story, involving the red nucleus and the tectum. Better wiring diagrams of the olive, the red nucleus and the deep cerebellar nuclei may be crucial to making sense of all this.
All of these possibilities are clearly leading us towards an exact link between well-defined mathematical operations and specific sites in the brain -- something which, when consolidated, would truly constitute a Newtonian revolution in neuroscience.
REFERENCES
1. D. White & D. Sofge, Eds., Handbook of Intelligent Control. Van Nostrand, 1992.
2. W. Miller, R. Sutton & P. Werbos, Eds., Neural Networks for Control. MIT Press, 1990.
3. T. Fukuda, T. Shibata, F. Arai, M. Tokita & T. Mitsuoka, Neuromorphic sensing and control for robotic manipulator - position, force and impact control. IJCNN, 1991, p. II A-1001.
4. D. Levine & S. Leven, Eds., Motivation, Emotion, and Goal Direction in Neural Networks. Erlbaum, 1992.
5. P. Werbos, Backpropagation and neurocontrol: a review and prospectus. IJCNN, 1989, p. I-209.
6. P. Werbos, Elements of intelligence. Cybernetica (Namur), No. 3, 1968.
7. D. Sofge & D. White, Neural network based process optimization and control. IEEE Proceedings Conf. Decision and Control (CDC Hawaii). IEEE, 1990.
8. W. Nauta & M. Feirtag, Fundamental Neuroanatomy. W.H. Freeman, 1986.
9. P. Werbos, Neurocontrol, biology and the mind. IEEE Proceedings SMC, October 1991.
10. P. Werbos, The Cytoskeleton: why it may be crucial to human learning and neurocontrol. Nanobiology, Vol. 1, No. 1.
11. A. Pellionisz & R. Llinas, Tensor network theory..., Neuroscience, Vol. 16, p. 245-274, 1985.
12. D. Purves, Body and Brain: a Trophic Theory of Neural Connections. Harvard U. Press, 1988.
13. J. Houk & A. Barto, Distributed sensorimotor learning. In G. Stelmach & J. Requin, Eds., Tutorials in Motor Behavior II. Elsevier, 1992.
14. S. Lisberger, The neural basis for learning of simple motor skills. Science, Vol. 242, p. 728-735, 1988.
15. A.E. Luebke & D.A. Robinson, Climbing fiber intervention interferes with motor learning in the vestibuloocular reflex of the cat. Submitted to J. Neurophysiol., Jan. 1992.
16. A. Pellionisz & R. Llinas, Brain modeling by tensor network theory and computer, the cerebellum: distributed processor for predictive coordination. Neuroscience, Vol. 4, p. 323, 1979.
III-378