This thesis presents a theory of infant cognition, a computer model embodying this theory, a set of experiments chosen to test the model, and results which show the close functional equivalence attained by the model with respect to very early human learning.
The theory is based on the assumption that there exists a genetic pool of resources available to the infant from birth, enabling him to employ sensory information to develop knowledge about his environment and hence learn to behave in context-dependent ways.
Attention was paid mainly to the problem of the nature of early knowledge with respect to its representation, acquisition, interpretation, and consequent adaptation.
The model contains three separate functional levels of memory.
The representational structure used was the Production Rule (a Condition-Action pair); a number of such Rules comprise the entire Production System, which equates to human Long Term Memory.
A Production Rule could be regarded as an encoding of experiential knowledge, all possible behaviours of the model being dependent on the set of Rules currently contained in the System.
A Rule is activated by stimulus symbol structures maintained in a temporary Attention Span (Short Term Memory), and results in internal cognitive and/or external responses.
The model contains a set of innate Learning algorithms - encodings of psychological learning laws - which operate upon the Production Rules to produce transformations within the data base, and a set of Reinforcing Mechanisms which serve to alter the status of existent Rules. Thus the Production System is subject to adaptation such that the model constantly updates its impressions of the external world.
The set of experimental data chosen to test the model was taken from live psychological experiments conducted by Siqueland and Lipsitt. These experiments demonstrate the model's ability to learn by strengthening, weakening and adapting its responses subject to environmental reinforcements. The results of several simulation runs show that the model's learning curves (the mean percent increase or decrease in elicitation of the positive response) could be made to closely resemble those obtained from the live three-day-old human infants.
This model presents a valuable insight into the behaviour of adaptive systems within a total cognitive framework.
I wish to proffer my sincere thanks to the following:
Professor Bob Hopgood for his advice, constant and constructive criticism, unobtrusive direction of my research, all the effort and time expended upon supervising me, and most of all for the faith he has always had in my successful completion.
Dr. Mike Elstob for his encouragement when I most needed it.
Dr. Ronan Sleep for expressing alternative viewpoints.
Professor Mike Pitteway and all the members of Brunel's Computer Science Department for their wide and varied conversation, in particular, Mrs. Audrey Beck for her mere presence.
Mrs. Renee Clarke and Angela Simmons for their patience and typing abilities.
Last, but by no means least, to my husband Stephen for his wisdom, tolerance and encouragement, and to my son Shoban for all the love and data he supplied.
The figures in the original copy of the thesis were hand drawn. They have all been redrawn, hopefully without making any errors.
Any theory, proposed within any science, is a method of devising explanations that may be made to fit a number of observable facts. A theory of cognition is necessary if we wish to explain and understand human cognition at a level which is more profound than our everyday, common-sense understanding of what it means to think (Dodwell, 1971).
A philosopher's function has long been the analysis of thought. This assumption that thought exists, i.e. that humans think, has led to the postulation of cognitive processes and cognitive structures, the interaction of both giving, or attempting to give, a near exhaustive account of all observable human behaviour.
The objective of this thesis is the presentation of a theory of cognitive development commencing at the naive infant state. A model of this theory will also be programmed and executed on a computer, with a set of routines defining the cognitive processes and well-defined data structures representing the cognitive structures.
The role of the psychologist is the investigation of behaviour. He must propose a set of principles or a model which he may then use to explain all, or most, of his findings. To date, such attempts at modelling cognition have been, of necessity, crude and piecemeal, the best models being special purpose and providing underlying principles for particular facets of human action - such as verbal learning and recall, concept formation and problem solving.
The cognitive psychologist (a cognitive theory may be held without any reference to cognitive processes or structures (Beilin, 1971)) assumes the position that cognition, that is some form of mediating process between the sensory input and the observable response output, exists. Such a standpoint usually leads to the characterisation of an active process organism (Beilin, 1971), literally the spirit in the machine, although such an overt mentalist view is not strictly necessary. One may always fall back on specifying innate neural circuitry and organisational laws to account for this hidden entity. There are differing paths of research left open to the cognitive investigator.
He may assume a neurophysiological basis for his cognitive model (Hebb, 1949; Sokolov, 1963; Cunningham, 1972, 1974) or not (Becker, 1970; Lindsay, 1973; Hayes-Roth, 1974). He may investigate particular cognitive skills such as verbal learning (Gregg, 1972; Simon & Feigenbaum, 1964), visual processing, also known as the area of pattern recognition (Selfridge, 1959; Dodwell, 1964; Uhr, 1966), protocol analysis (Newell et al., 1959; Newell & Simon, 1961, 1972), theorem-proving (Gelernter, 1959; Gelernter et al., 1960), game-playing (Samuel, 1959; Newell & Simon, 1963; Waterman, 1970) and natural language (Winograd, 1971; Wilks, 1971). He may investigate the nature of the underlying processes required for the acquisition of skill in a particular task (Young, 1973; Klahr & Wallace, 1976). He may study cognition as leading to the growth and development of knowledge. The empiricist holds, for instance, that knowledge is derived totally from experience. The idealist argues that some knowledge must be innate and independent of experience. The developmental psychologist holds that knowledge is acquired in a series of well-constructed and ordered stages. Piaget (Piaget, 1929, 1949, 1953, 1955, 1956) attempted to bridge this gap with a theory of genetic epistemology. He proposed structuralism with genesis, as opposed to either genesis without structure or structuralism without genesis.
However, some fundamental questions still remain unanswered, namely:
The model to be presented in this thesis proposes some answers to these questions. It does not attempt to present a structural equivalence but does attempt to present a functional equivalence to the workings of the human brain.
The subject area investigated was the manner in which the human infant employed sensory information to develop knowledge about its environment and hence to learn to behave in context-dependent ways. Attention was paid mainly to the problem of the nature of early knowledge with respect to its acquisition, representation, interpretation and consequence.
The model assumes the existence of a genetic pool of system resources available to the infant from its moment of conception. Although a structural equivalence was not attempted, the proposed architecture is in keeping with known neuro-physiological constraints.
The model consists of a 3-tier memory system:
(Figure 1.1 presents an overview of the model).
Within this framework a set of innately acquired processes conjoin to produce adaptive behaviour and the growth of knowledge. These processes are:
The chosen representational schema is a Production Rule (Newell & Simon, 1972; Moran, 1973; Young, 1973; Newell et al., 1976; Klahr & Wallace, 1976), which is basically a Condition-Action pair. In this system, a Reflexive Rule constitutes the basic form of a rule, it being:
Conditional stimulus ⇒ <response>
and a more complex form of the Rule being:
Conditional stimulus set ⇒ <response set> Expected stimulus set
The reflexive rule portrays an S-R bonding, whilst the more complex form portrays an S-R-S bonding.
An activated rule generates a set of responses to be externally executed by the model, and a set of expectational stimuli which are written back into STM to serve as perceptual alerts.
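To make this representation concrete, the sketch below expresses a Production Rule and its activation against the Attention Span (STM) in Python. It is an illustrative sketch only - the thesis model itself was written in Fortran - and the class, field and function names are hypothetical, not those of the original program; activation is treated here as simple subset matching, without the original conflict-resolution detail.

    # Illustrative sketch (not the original Fortran): a Condition-Action pair
    # with an optional expected stimulus set, matched against STM symbols.
    from dataclasses import dataclass

    @dataclass
    class ProductionRule:
        condition: frozenset                    # conditional stimulus set
        responses: tuple = ()                   # response set, executed externally
        expectations: frozenset = frozenset()   # expected stimulus set (S-R-S form)
        strength: float = 1.0                   # status, altered by reinforcement

        def matches(self, stm):
            # A Rule is activated when its conditional stimuli are present in STM.
            return self.condition <= stm

    def activate(rules, stm):
        # Fire every Rule whose condition is satisfied by the symbols in STM;
        # expectational stimuli are written back into STM as perceptual alerts.
        fired = [rule for rule in rules if rule.matches(stm)]
        for rule in fired:
            stm |= rule.expectations
        return fired, stm

    # A reflexive Rule (S-R) and a more complex Rule (S-R-S):
    reflex = ProductionRule(condition=frozenset({"touch_palm"}), responses=("grasp",))
    feeding = ProductionRule(condition=frozenset({"nipple_contact"}),
                             responses=("suck",),
                             expectations=frozenset({"taste_of_milk"}))
    fired, stm = activate([reflex, feeding], {"nipple_contact"})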
The Learning mechanism, incorporated within the Cognitive Processes, serves to adapt existent Rules to accommodate newly acquired experiential information. It does so by employing two basic laws for extension and creation of new Rules. These are:
Thus the Production System is self-extensible, allowing for the constant generation of new representational schemata. The acquisition of knowledge proceeds, therefore, in an evolutionary manner, with each newly created Rule bearing symbols in common with its creator. The Reflexive Rules form the foundation of the system, but no S-R or R-S bonding is irreversible. Thus, there exists a variety of Rules containing symbols in common with each other, redundancy being an essential part of learning.
The consequence of activating a Rule is, through the Learning Mechanism, the creation of a new, competitive Rule, it being a subtly altered copy of its father. No need for change exists within the system other than for this reason alone, i.e. the arousal of a formerly quiescent memory culminating in a new near-copy of itself.
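Read computationally, this says that every firing of a Rule spawns a slightly mutated near-copy of it. The sketch below (continuing the hypothetical Python representation above, not the original Fortran) shows one possible such alteration - here, the inclusion of one further symbol currently held in STM - chosen purely for illustration.

    import random
    from dataclasses import replace

    def spawn_near_copy(rule, stm):
        # On activation, create a new, competitive Rule: a subtly altered copy
        # of its 'father'. The particular alteration used here (adding one
        # symbol now present in STM) is an assumption for illustration only.
        extra = set(stm) - set(rule.condition)
        if not extra:
            return None
        new_condition = frozenset(rule.condition | {random.choice(sorted(extra))})
        return replace(rule, condition=new_condition)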
The environmental confirmation of an internally generated expectation constitutes a solution to the Learning Mechanism.
By incorporating the process of Reinforcement, the Learning Mechanism is able to:
There also exist, within the Learning Mechanism, refining algorithms which remove spurious symbols from a Rule, thereby making it more specific in its definition, and which insert symbols into a Rule, thereby making it more general in its definition.
The Reinforcement process also alters the status of activated Rules if they serve to enhance or diminish any need level within the system.
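In the same hypothetical Python terms, reinforcement and the refining algorithms might be sketched as follows. The numerical update is an assumption; the comments follow the thesis's own characterisation of specificity and generality.

    from dataclasses import replace

    def reinforce(rule, delta):
        # Alter the status (here, a strength value) of an activated Rule
        # according to whether it enhanced (+) or diminished (-) a need level.
        rule.strength = max(0.0, rule.strength + delta)
        return rule

    def make_more_specific(rule, spurious):
        # Refining algorithm: remove spurious symbols from a Rule, thereby
        # making it more specific in its definition.
        return replace(rule, condition=frozenset(rule.condition - set(spurious)))

    def make_more_general(rule, additions):
        # Refining algorithm: insert symbols into a Rule, thereby making it
        # more general in its definition.
        return replace(rule, condition=frozenset(rule.condition | set(additions)))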
Thus, the Drive Process lends direction to the processes of learning by its alliance to the Learning Mechanism. It also lends affect to Rules by incorporating internally generated Drive elements (known as primary stimuli, which are emitted by the activation of a Drive) into newly created Rules.
The processes available to the model are believed to be functional equivalents of the processes available to the human infant during stage 1 of the sensori-motor period of development. Whether, at the commencement of each stage, pre-formed processes should be made available is debatable. For example, to propose internally generated deterministic links, as opposed to expectational stimuli (which act as probable links due to their presence in STM), would be necessary if one wished to simulate simple skill-acquisition behaviour. These questions are discussed further in the latter parts of the thesis.
The model was programmed in Fortran and executed on the ICL 1906A computer under George IV at the Atlas Computer Laboratory, and on an ICL 1903A under George III at Brunel University.
The experiments chosen for simulation were taken from live psychological experiments which were reported in sufficient detail for data to be extracted and results to be compared with those of the simulation runs. They were chosen to show the following features in the model:
It is hoped that this model will help bridge the gap between Cognitive Psychology and Computer Science.
The computer provides a sophisticated analogy to the human brain. A computer scientist may make use of the principles of computer science when he proposes a theory in any scientific field. Principles such as:
A computer model allows the experimenter to look inside his subject and discover why the model behaved as it did at any time. One may not do so with human subjects (even verbal recall being subject to psychological constraints) and hence may only draw conclusions from externally observable responses.
This model may be used as a valuable psychological tool, the experimenter defining his own genetic processes and observing the responses of the system, or merely as an investigation into adaptive behavioural systems.
The human being at infancy is an extremely helpless and dependent life-form. He possesses, in varying degrees of maturity, the efferent and afferent systems of his species, but his use of them is seen to be uncoordinated, crude and generally lacking in control and precision.
How then does this seemingly naive organism acquire the complex behaviour patterns attributable to the average human adult? Is it merely a matter of maturation, of practice, or is there a learning process involved?
Learning could simply be the acquisition of various sensory-motor connections, i.e. cells in the sensory system acquiring linkages via cortical tissue to corresponding cells in the motor system. Or it could be attributed to a set of autonomous cortical mechanisms which operate upon sensory, cortical and motor cells and bring about specific co-ordinations, such that a certain pattern of cortical activity, determined by the cells that are currently reverberating, controls the behaviour of the individual. Each cortical network then acquires a meaning or some association with specific patterns of stimulus energy, the overall excitation determining the resultant action.
There are basically two types of learning to be observed in the human (Hebb, 1949). Early learning as seen in the infant, and later learning as in the adult.
Early learning is of the slow, continuous type; its foundation is the gradual acquisition, in infancy, of sensori-motor connections. Sensori-motor intelligence, as defined by Piaget (Piaget, 1949), comprises complex structures or 'configurations' which, far from being static and non-historical, constitute 'schemata' that grow out of one another by means of successive differentiations and integrations. Piaget saw this type of learning as occupying the first two years of the infant's life and laying the foundations for the child's decreasing dependence on sensory stimulation.
Later learning is of the non-continuous type, involving insight (Insight may be defined as a sudden change in behaviour, or a conscious experience one may have (Eureka! I have it!) - the operation of intelligence in finding a solution.) and occurring as a single jump (Hebb, 1949). Certainly, early learning may be seen in the mature adult, but later learning has not been observable in the infant.
Learning, then, involves the use of experience if it results in the development of behaviour such that each succeeding form appears more complex, more general, more successful and more stabilised than the earlier form. It implies a growth in understanding such that abstract concepts may be applied to real world circumstances. It is a natural order of progression that brings the human mind into contact with the fruits of experience.
If one wishes to stipulate processes which bring about learning, then one must define what form these processes take and upon what fabric they operate so as to produce the phenomenon of learning.
One may propose mechanisms for the development of cognitive ability, as opposed to specifying pre-formed structures of understanding which become operational when provided with sufficient experiential information. The former may be called developmental rules or laws, cognitive or pre-cognitive processes, or learning mechanisms, but they each attempt to explain the evolution of behaviour by postulating innate processes which serve to structure information such that understanding of what this information conveys is ever amplified and made certain.
Such mechanisms must involve co-ordinating or associating processes of some kind, including various elements: sensory, drive and motor, such that they may appear in various combinations (Adcock, 1964).
The first, and the best documented type of learning is that achieved through Associational Processes. In particular, when stimulus patterns appear together (temporally) consistently they tend to become associated such that the response elicited by one may be transferred to the other. Pavlov (1927) and Skinner (1953) investigated the learning achieved through stimuli presented to the organism within short time intervals of each other such that they appear to be temporally contiguous. The law states, in general terms, that if two events occur close together in time, the organism will act as if the first event was somehow associated with the second (Scott, 1968). Thus the properties of any stimuli comprising the one configuration are transferred to the other. Siqueland and Lipsitt (Siqueland & Lipsitt, 1966) have shown that stimulus to stimulus transfer is possible in the human neonate, but only if the experiments are clustered, i.e. all the trials being concluded within a period of hours rather than days.
Secondly, the rapidity and duration of the association is dependent on the number of repetitions. This may be called the law of habit or strength (Scott, 1968). Thus, if the same events are made contiguous a sufficient number of times, the habit may be made relatively permanent, though this is dependent on the age of the subject.
The strength of the association may also be affected, beneficially or adversely, by reinforcement. If a reward is made contingent upon the occurrence of a particular response, the rate of emergence and stabilisation of that response and its associations may be considerably influenced. If the reinforcement is pleasurable or associated with the satiation of some primitive need (such as the mother's face or food) then the association may be strengthened. If the reinforcement is frightening or painful (a sudden loud noise or an electric shock) then the response may become actively inhibited or it may become aversive. Similarly, the organism may also learn to do nothing in particular situations (autistic) if the contingent reinforcement was negative.
The more learned an association becomes, then, obviously, the more difficult it becomes to break the habit. Thus in extinguishing trials (where the conditioned response is to be gradually decreased) the faster the conditioning, the slower will be the extinguishing sequence.
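These classical laws - contiguity, repetition, reinforcement and extinction - can be caricatured by a single strength value updated trial by trial. The sketch below is a toy Python illustration with assumed parameters, not a model taken from Pavlov, Skinner or Scott.

    def update_association(strength, paired, reinforcement=0.0,
                           learn_rate=0.2, extinction_rate=0.05):
        # Toy associative-strength update (assumed functional form).
        # paired:        True if the two events occurred contiguously this trial.
        # reinforcement: positive for reward, negative for aversive outcomes.
        if paired:
            # Repetition strengthens the habit towards a ceiling of 1.0,
            # modulated by any contingent reinforcement.
            strength += learn_rate * (1.0 + reinforcement) * (1.0 - strength)
        else:
            # Extinction trials weaken the association gradually; the better
            # learned it is, the more trials are needed to extinguish it.
            strength -= extinction_rate * strength
        return max(0.0, min(1.0, strength))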
Further association results in the freezing of habituation: once a stimulus has been conditioned to, even if it is then consistently repeated, the organism shows no decrement in response or attention. Cessation of attendance, or habituation, usually occurs with repeated patterns of stimulation, presumably when the subject becomes bored, and the stimulus gradually loses its signalling power.
(Due to repeated exposure of a constant stimulus (s1), some central representation or schema of s1 comes to be established. Attentive behaviour declines so long as there is a match between the external event and the internal representation. However, if a novel stimulus is introduced, expectation is violated, and so that the new features be assimilated, there is an increase in attentive behaviour (McGurk, 1974).)
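The comparator account sketched in this parenthesis can be given a minimal computational reading, shown below; it is an illustrative toy with assumed parameters, not McGurk's model.

    def attend(schema, stimulus, attention, decrement=0.1, recovery=0.5):
        # Toy habituation/dishabituation step; schema and stimulus are sets of
        # feature symbols, attention lies in [0, 1].
        if set(stimulus) == set(schema):
            # External event matches the internal representation:
            # attentive behaviour declines (habituation).
            attention = max(0.0, attention - decrement)
        else:
            # A novel stimulus violates expectation: attention rises so that the
            # new features may be assimilated, and the schema is updated.
            attention = min(1.0, attention + recovery)
            schema = set(stimulus)
        return schema, attention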
This leads to another type of mechanism for learning, namely that which is associated with expectation or a self-generated prediction. If two events are repeatedly observed to be temporally consecutive, then the occurrence of the former sets up an expectation for the occurrence of the latter, i.e. it acts as a signal for the subsequent event. The setting up of such expectations enables the organism to set up a whole sequence of temporally related structures, so that if the associations be well-formed, they may act as a strategy - a sequence of actions initiated by some recognisable signal. Hebb (Hebb, 1949) referred to these as phase-sequences and stated that prompt learning is possible when the stimulation sets off well-organised phase-sequences but not otherwise. Later learning, of the sudden-jump type, depended, therefore, on the setting up of such well-organised phase-sequences during early learning, and involved the interaction of two or perhaps three organised activities (Hebb, 1949). This was seen as learning to learn (Holland, 1974; Scott, 1968) by the formation of useful learning sets - comprising useful habits - rather than learning not to learn, which comprises a series of inactive habits.

Piaget's (Piaget, 1949; Kessen, 1971) learning mechanisms were also based on the subject's expectation and constitute structural re-arrangements contingent upon perturbations. An event was perturbing only if it perturbs the domain within which it exists, and Piaget stated that the subject looks neither at what is too familiar, because he is in a way surfeited with it, nor at what is too new because this does not correspond to anything in his schema (Piaget, 1953). In his theory, learning could only occur if perturbation took place, and such a perturbation had to be, in some way, optimal. New structures were then created, through the accommodating process, by successive differentiation and integration of structures already in existence. There is considerable evidence that avoidance behaviour can be elicited in animals confronted by novel objects (Fiske & Maddi, 1961). Behaviour is instigated by the degree of incongruity between sensory input and some standard within the organism representing information already coded and stored within the brain ... if the discrepancy between input and centrally present storage is too great, the organism finds itself unable to deal with the situation and is impelled to flee (Schaffer & Emerson, 1964). The experiences that the organism has encountered and learned about define his subsequent behaviour patterns. Hunt (Hunt, 1960) suggests that infants exposed to a large number of environmental objects do not usually display fear of strangers.
In the previous section, specific learning mechanisms were discussed which involved connections of some sort being produced between sensory, drive and motor cells. However, learning may be ascribed to the organisational processes occurring as a result of fluctuating mental activity. Sokolov (Sokolov, 1963) proposed a set of autonomous centralised responses which superimposed organisation upon the neural structures. These were known as the Generalized and Localized Orientation Response (OR) and the Defence Response (DR). The generalized OR was characterised by an overall rise of neural activity (a state of excitation) and a localized OR was characterised by the excitation of that part of the CNS related to a particular sensory organ. The generalized OR was relatively lengthy, lasting from a few seconds to many minutes, had a long latent period (of up to 30 seconds), was rapidly habituated and recovered slowly over a period of hours or days. The localized OR was relatively short-lived (of up to 10-15 seconds), had a short latency, was very resistant to habituation and recovered in a few minutes. In the generalized OR, a repetitive stimulus often caused sleepiness. This could be due to the overall inhibition of attention through habituation, this general inactivity leading to sleep. The OR, though an autonomous reflex, was dependent in its strength and duration upon certain properties of the stimulus: intensity, duration and signal value. (Conditioned stimuli have high signal value.) A conditioned stimulus elicits a very strong OR, thereby inhibiting habituation to the conditioned stimulus. If an habituated stimulus is made into a conditioned stimulus, it may again elicit a strong OR (Lynn, 1966). However the conditioned stimulus, if it had been negatively reinforced, can also elicit a very strong DR. An OR may also change into a DR if the eliciting stimulus is allowed to recur a number of times after habituation sets in. The DR lowers the excitation in the CNS, thereby reducing the powers of attention of the organism. The DR ensures that the positive feedback effect of an OR (causing increasing excitation) may be brought under control, thereby leading the system back to a state of equilibrium. The DR, however, though brought about by habituation, cannot itself be habituated, and its effect is to restore and preserve equilibrium, not to depress it (Berlyne, 1960).
When the CNS is alerted by an OR, its ability to attend, or focus its attention, is greatly increased, due to the state of heightened awareness. A stimulus, therefore, has a lower threshold for entry (its impact energy is increased) and hence can be assimilated more quickly and readily. However, once it is thoroughly assimilated, it loses the powers of excitation which required the organism to learn about it. In a DR, learning has already taken place, and the remembered past experience has acquired connotations which require it to be ignored.
Sokolov's postulation of autonomous central mechanisms bringing about learning by imposing sensitivity, selectivity and ordering upon the sensory set and by imposing organisation upon the neural structures, serves to dilute the anthropomorphic qualities of learning mechanisms. The activities that control the attentional processes therefore, are a combination of the sensory set and the state of excitation of the organism, making attention neither mystical, animistic, nor yet undefinable. Hebb (Hebb, 1949) refers to them as the autonomous control processes controlling behaviour without being part of the current afferent excitation, and being part of the genetic make-up of the organism.
They are, undeniably, hypothetical mechanisms but have no flavour of animism and hence are acceptable and respectable (Hebb, 1949).
Having discussed specific processes for learning, whether mechanistic or autonomous, one may now ask whether any such operational processes exist, acting upon stored information in such a way as to produce motivated behaviour in infants.
In adult behaviour, a strategy, or the execution of a series of planned moves, may easily be observed as being the result of learning and experience. A desired outcome (Miller, Galanter & Pribram, 1960) may be defined and accomplished through intention, which is definable as the uncompleted part of a plan undergoing execution. Such goal-seeking behaviour does not seem incongruous in an average adult. But can an infant understand, let alone compute, a value for the discrepancy between his current state and the state being tested for (Woodward, 1971)?
Some alternative explanation seems necessary at the outset, one which, through practice, leads to the selection and execution of strategies and such goal-oriented behaviours.
The newborn infant, in addition to responses which may be elicited by specific patterns of environmental stimulation, possesses also a number of gross, undirected movements in his behavioural repertoire. The alert infant usually indulges in such undirected responses (senseless thrashings of infants) and during the course of his indulgencies, may happen, accidentally, to brush an object with his palm. This causes the elicitation of a grasp reflex such that the object may be retained in his fist for some short time. Or again, he may accidentally bring his finger near his mouth causing a rooting reflex which may end in his finger inside his mouth which again elicits another reflex - the sucking reflex. Through such experiences, he learns to associate his bodily movements with certain patterns of re-afferent stimulation. It may be some months before he has the motor control required to maintain his grasp, or to keep his finger within his mouth to be sucked, but the sensori-motor connections exist and if exercised, he can grasp the object and retain it in his fist as he had done, accidentally in the past. It is through the learning in connection with undirected movements that the infant acts upon his environment and produces changes in it (Woodward, 1971). This is different from the later trial-and-error behaviour of humans where a number of responses are consciously attempted, until at last one produces a successful outcome. Repetitions of this action, producing the same success, serve to reinforce it thereby enabling the individual to acquire a stabilised, successful strategy. The learning in the infant is not the formation of a conscious strategy, but the associations produced through accidental, undirected movements causing environmental alterations which impress upon the infant. It is through the co-ordinations of reflex exercises with chance, undirected movements, that the infant is able to evoke the stimulus himself which causes activation of the reflex exercise. In learning this co-ordination, he becomes no longer dependent on chance contacts but has the sensori-motor connections necessary to bring about further co-ordinations leading to more complex action sequences.
For instance, if his finger is brought near his mouth and is accidentally sucked whilst he is hungry, it may serve to briefly allay his hunger and provide temporary pleasure. What at first was an undirected movement may, through such reinforcement, become directed. Thus when the infant is hungry again, he may remember the pleasure of finger-sucking and endeavour to repeat it. Piaget (Piaget, 1953) conjectures that the infant may even learn to direct his finger to his mouth by means of cues such as the chin, bedclothes etc., somewhat akin to rats learning paths in a maze.
All knowledge of the external world is conveyed to the organism through its sensory organs. The beginnings of cognition lie in the sensations which are conveyed to the central cognitive processes. It has been suggested by psychologists that the initial sensational data is transformed prior to operative thought. This transformation is seen as being effected by the Perceptual Process. Thus, when two objects of unequal size are perceived such that the larger one is seen as the smaller, the perceptual process may be operated upon by logical thought such that the optical illusion is corrected. This correction of sensory data was speculated upon by Helmholtz (Helmholtz, 1925), who suggested that such phenomena were indicative of discrepancies between the sensory data and the cognitive construct. The implication is that somehow the organism obtains more data upon the world than is conveyed by its sensory organs (Gibson & Gibson, 1955). This difference has become attributed to the perceptions, and leads to the theory that the finished product of sensation is something other than sensation - it has become known as a percept. Helmholtz attributed this intervention to unconscious inference, which in effect uses past experience to interpret sensory data. He believed that intellectual knowledge modified the sensory data by transforming them into percepts. Hering (Kohler, 1929) attributed such transformations to innate neural organisation, not activities of the intellect. The structuring of the sensory data was effected by intrinsic physiological mechanisms that spontaneously differentiated the products of sensation so as to lend them meaning and content.
It could also be argued that all the information is available in the sensory data by way of variations, shadings and subtleties of energy (Gibson & Gibson, 1955). Thus, there need basically be no difference between sensation and the finished percept.
When regarding the communication process between a perceiving organism and its environment, the organism need not be thought of as a passive recipient of information, such as a tape-recorder. Indeed, in this interaction the subject is active. His action is not a reflexive (that is, passive) response to the environment by virtue of the operation of some instinctual program (Beilin, 1971). Rather, that portion of the knowledge which is attended to, is determined in part by the nature and status of the organism at the time he receives the data. The sensations are the source of knowledge, and the initial synthesis of this knowledge leads to the construction of the percept - the perceived item. Percepts are then acted upon to yield those cognitive structures, which, if not already present within the cognitive framework, may be incorporated into it, so as to represent new items of information. These structures are assimilated into the existent cognitive structures (Beilin, 1971).
Perception need not be thought of as a prior or different process to cognition. Piaget, in fact, denies the differentiation between perceptual and cognitive knowledge and introduces cognition into perceptual activity. Although he does not deny the existence of perceptual activity, he sees that such activity does not lead to knowledge construction without the intervention of operativity (Furth, 1969; Beilin, 1971), i.e. perception merely becomes one aspect of the process by which cognitive knowledge develops. Cognitive knowledge is derived from the organisations of sensory data performed by the perceptual process. Perceptual data, in itself, cannot lend any new knowledge to the organism without being incorporated into its current knowledge structures. Piaget holds that correction of optical illusions cannot be carried out purely by the perceptions alone but needs the intervention of cognition to do so. Perception, then, is an activity which produces knowledge when reason, thought or some understanding process enters into it - or, as Piaget asserts, when it is assimilated into the existent structures.
Further, Piaget states that whilst perceptual knowledge is probabilistic in nature, cognitive knowledge deals with firmly held concepts. Bower (Bower, 1967) saw perceptual existence as being qualitatively different from conceptual existence and asked if conceptual existence developed from perceptual existence. One of Bower's findings was that infants of 7-8 weeks responded to objects in their absence if the absence was of short duration and the speed of disappearance of the object was slow, showing that some kind of existence belief was already present.
However, what kind of activity is perceptual activity? Is perception a creative process or is it a discriminative process? (Gibson & Gibson, 1955)
The notion that perception is basically a constructive act rather than a receptive or purely analytic one is quite old (Neisser, 1967). With reference in particular to hallucinations and illusions, a man who sees things that are not present must be constructing them for himself (Neisser, 1967). Neisser argues that perceptual activity must involve categorisation on the basis of past experience. The perceptual processes construct chunks out of the raw material of sensation, out of which the central cognitive processes may synthesise different products. Such categorisations may be on the basis of innate recognition of stimulus wholes (chunks) or purely based on what has been previously experienced. Neisser argues, in fact, that Perhaps we experience familiarity to the extent that the present act of visual synthesis is identical to an early one. Thus the act of synthesising sensory data is seen as having a physiological trace, the memory of the trace (the process of construction) being revived when the same features appear again in the sensory data. This precludes mere template-matching; recognition occurs rather through the matching of the constructive process itself.
However, Neisser talks of the crude, holistic and parallel primary processes and the deliberate manipulation of information in the secondary processes.
By positing the existence of some sort of physiological tracing, be it the trace of a perceptual construction or a memory trace, which is caused by past experiences, the arrival of new sensations serves to arouse those memories which have associated (or similar) sensory elements, thereby enriching the sensations. As a result the memories concerned accrue further associations (linkages), giving rise to more complex and diversified experiences (Postman, 1955). Consequently it may often be rather hard to say how much of one's apperceptions as derived by the sense ... is due directly to sensation, and how much of them, on the other hand is due to experience ... (Helmholtz, 1925).
In this aspect of lending itself to new experiences, perception may be seen as constructive - a construct of the interplay of sensations and of experience. It is discriminative, in as much as it discriminates within the sensory data currently available. It leads to directed movements on the part of the organism (Woodward, 1971). Von Holst (Gyr, 1966), in fact, sees perception as the end product of a comparison or summation process between efferent and afferent signals.
Perceptual activity serves to structure sensory elements into wholes such that as the infant is exposed to more and more sensory input, he gradually begins responding to these wholes in his environment. The adult when supplied with only a few sensory clues is able to identify the whole of the object. He uses his past experience to construct the object required. Miller (Miller, 1956) expressed this as the ability to chunk information such that individual sensory elements are structured into more and more complex wholes. Perceptual learning involves the ability to bind bits of information into a system such that representation for complex patterns is made possible (Adcock, 1964).
Development may also be attributed to the physical maturation of the infant's sensory system. Thus as his receptors develop, what they are able to transmit becomes richer in detail. Increasing motor control further enables him to adjust his receptors so as to increase or decrease stimulation as required.
On the premise that from the onset sensory information contains all the necessary details, perceptual learning would be a matter of deriving further cues. Thus, an individual, who initially performs the same response for a wide range of objects, will, through learning, increase the specificity of his identifying response.
Piaget (Piaget, 1949) asserts that perceptual constancies of shape and size are learned by development of perceptual activity. Reisen (Reisen, 1968) maintains that the development of form and pattern vision involves learning via stimulus contiguities.
The acquisition and development of knowledge, experience and forms of understanding would be meaningless unless they constituted some means to an end. (Aristotle's claim is that all things may be understood only in terms of their end or telos.) The end may be said to be cognition, or the process of thought resulting in further thought, in externalised responses and in the further development of the structures of thought themselves. The representation of knowledge in the form of cognitive structures or schemata (Piaget, 1929) is merely the carriage; motivation or cognitive drive gives direction to the carriage, and thinking gives it velocity and purpose. Cognition without reference to cognitive drive and cognitive constructs would be as meaningless as cognitive drive and representation without cognition. They are inseparable, mutually inclusive and constitute all which characterises the human's thinking abilities.
The fundamental unit of cognition has long been known as the concept. What then is a concept? A concept of x is an understanding of the purpose and being of x (Toulmin, 1971; Hamlyn, 1971). We say someone has the concept of x if by his behaviour he shows that he knows what x is for, what it leads to, and everything else that may stand as being x or similar to x. An adult may show the concept of x as formally representable in words.
The learning process may be visualised as a set of necessary steps towards the formalised use of one's sets of concepts; towards the further acquisition of concepts; and towards the modification of currently existent concepts.
Knowing what a concept is, then, leads one to the question of how it may be formed. With regard to the acquisition of knowledge by the infant, this cannot be seen as the means of attaining that inner goal - a concept. He progresses quite contentedly, and at some time he may find that his current beliefs are insufficient for explaining some facet of his environment. He may then be motivated toward altering one or many of his beliefs so as to incorporate this new feature. His steady progress may in fact turn out to be a series of discontinuous stages, where he undertakes, among other things, the occasional pursuit of what may .... turn out to be a blind alley which leads to satisfying discoveries (Hamlyn, 1971).
In Section 2.2 perception and the perceptual structures were portrayed as constituting the beginnings of intellectual activity. Intelligence or intellectual operations (Cognition pertains to the activity of intellectual operations upon experiential data. Intelligence may be viewed as a qualitative description of cognition.) act upon the structures of perception to create concepts (Hamlyn, 1971). Piaget (Piaget, 1969) speaks of the prefiguring of intellect by perception, due to their sharing sensory-motor roots. The percepts formed in the sensori-motor period preceding the commencement of true intellectual operations define the substance upon which intellect may have its beginnings (Piaget, 1953). As concepts are developed from the perceptual fabric, new perceptual data is constantly being assimilated into the developing network, so that the interaction of perceptual and cognitive activity constitutes the cognitive capacity at all times. To put it another way, the perceptual set (what is perceived) must always determine the direction of thought. Again, as Piaget says, there are no definable boundaries between the properties of the assimilated object and the structures of the assimilating subject (Piaget, 1969).
Piaget viewed cognitive development as the successive differentiation and integration of cognitive schemata. Such accommodational processes were executed on the principle of least effort and least cost. But although he verges on defining these processes in executional detail, the content of his perceptual and cognitive structures remains rather vague.
Minsky, in his frame theory of cognitive structures, intimated that these frames held experiential content and were joined to other frames through terminal connections (Minsky, 1975). He proposed a network of frames as constituting the knowledge base, but here too the precise content of a structure remained vague. Chomsky (Malcolm, 1971), in proposing that the use of language is innovative .... potentially infinite in scope .... free from control of detectable stimuli .... appropriate to a situation, implies that such structures are far beyond the stimulus-response configurations suggested by the Behaviourist School; yet are some of them elicited by the appropriate environmental circumstance? That ideas owe something to verbalisation is not an uncommon theory. But the pre-verbal infant is still able to form some belief structures with regard to his world, whatever their content may be. May such pre-verbal constructs be thought of as concepts? Urmson, with reference to linguamorphic processes, states that recognising may be equated with naming on sight (Toulmin, 1971), i.e. performing a verbal response. Infants, however, respond initially with purely bodily responses, and yet show some recognition of an object. The question with reference to the nature of the concept, then, is:
With reference to the content of a concept, this question becomes further complicated when viewing the fabric of the brain itself. How may an idea be constructed from a set of neurons, axons, dendrites, synapses, etc.? Hebb attempted to answer this with his theory of behavioural organisation (Hebb, 1949). Cells (individual neurons or neuron clusters) became associated with each other on a temporal basis - the repetition of certain stimulus patterns leading to cell-assemblies being formed. Assemblies, once formed, tended to reverberate together. Assemblies could have conceptual linkages when the idea represented by a cell in one is contained in another. Thus, though temporal contiguity led to the construction of assemblies, the assemblies themselves, when sufficiently well-formed, became concepts (representing a particular idea) and tended to form conceptual linkages with other assemblies.
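Hebb's temporal-contiguity postulate is commonly summarised by a simple weight-strengthening rule. The sketch below is a generic toy version of that rule in Python (assumed names and learning rate), not a reconstruction of the cell-assembly theory itself.

    import itertools

    def hebbian_step(weights, active_cells, rate=0.1):
        # Strengthen every pairwise linkage between cells (or cell clusters)
        # that are reverberating together on this occasion; repeated temporal
        # contiguity thus gradually builds the linkages of a cell-assembly.
        for pre, post in itertools.permutations(active_cells, 2):
            weights[(pre, post)] = weights.get((pre, post), 0.0) + rate
        return weights

    # Example: repeated co-activation of 'a' and 'b' builds their association.
    w = {}
    for _ in range(10):
        w = hebbian_step(w, ["a", "b"])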
Concept formation, then, may be theorised with reference to a neural model, though it is difficult to see how the higher-order logical operations may be brought about. The physical content of a structure acquires, somehow, ideas with regard to environmental objects and events. Whether this is through the attachment (association) of a verbal response, or whether concepts may be held prior to this process, is debatable.
Having discussed the nature and formation of a concept, the problem now faced is what constitutes conceptual development? It was proposed that the operations of intellectual or logical processes of cognition upon the perceptual data produced a concept. A theory of cognitive development would be required to answer the questions:
Cognitive developmental theories must imply a continuously adaptive process occurring within the existent cognitive structures. Is the adaptive process directed towards any end (minimise inhibition, maximise excitation and such like) or is it a process occurring due to the interactions of perceptual data and cognitive processes (such as hydrogen and oxygen atoms combining under particular values of pressure and temperature to yield water - an automatic process constituting structural changes in both)?
The externalised behaviour with regard to cognitive development is the subject's ability to:
For Piaget, that which changes during the genesis of knowledge is the relation between the knowing object and the object known (Beilin, 1971). His genetic theory of knowledge acquisition holds that knowledge is constructed out of the intrinsically given structures, interacting with currently obtained environmental information.
To Freud, all cognition was based on impulse. A mode of thought was, from the onset, employed in the service of gratifying the internally instinctual drives. The early hallucinatory behaviour of infants was, therefore, seen as the beginnings of thought (Kessen, 1971). Maturational changes occurring in the instinctual drives led to progressive adaptation of responses such that any conflict caused by the changed drives could be resolved. Acceptable satisfaction constituted gratification such that the drive impulse was momentarily chilled. Development occurred as a result of conflict and the maturing mental drives interacting with environmental information constituted the cause for conflict. The demand for change in psychoanalytic theory was caused by the imbalance between the need of an innate drive to be fulfilled and an environment which did not fulfill this need unless a change in the response and in the mental knowledge set was made. This imbalance constituted the theory of disequilibrium and cognitive development occurred as a result of cognitive disequilibrium. Thus cognitive development was shaped, of necessity, around the nature of the world.
Piaget's theory of cognitive disequilibrium was based, also, on a set of affects or drives. However, affects were formed around known structures and did not constitute an instinctual need. In all behaviour the motives and energising dynamisms reveal affectivity, while the techniques and adjustments of the means employed constitute the cognitive sensory-motor or rational aspect (Piaget, 1969). Affectivity cannot, however, lead to the creation of new cognitive structures and this is the essential difference between his theory and psychoanalytic theories of cognitive disequilibrium.
The need for change, in Piaget's eyes, was a perturbing event, i.e. an event which puts a current schema (retrieved for application to the situation) into a condition of disequilibrium. This, then, is the motivation for structural change. The infant may in these circumstances assimilate the event into a current schema, he may alter the schema through the process of accommodation, or he may not effect any action at all. Piaget's theory gives development as being a progressive and discontinuous structuration .... of behaviour (Kessen, 1971), and therefore inevitable and non-motivated. However, there are some difficulties as to how all organisms may identify perturbing features as belonging to a specific domain, and how features come to exist in the same domain. Further, Piaget states that an assessment is made of gain or loss - by the infant? He talks of the reinforcement of the feeling of one's own power - can we really talk about an infant having a feeling of power (Kessen, 1971)?
Piaget also attributes the notion of value to an action. The infant, if he feels a resistance to a particular activity, will evaluate the object (believed to cause the resistance) highly in his need to overcome the object.
The question of the organism recognising the need for change therefore does not arise. Accommodation is an automatic process instigated by environmental circumstances causing cognitive perturbations. Assimilation is an automatic process instigated by a feature needing recognition and incorporation within cognitive structures.
Hebb (Hebb, 1949) similarly used those assemblies (cell associations) set up in the process of perceiving an object, to lead to concept formation. The concept of an object .... is an irregular cycle, each phase of which is the activity of a cerebral cell-assembly. If a large enough part .... is aroused, the whole becomes active. One-trial learning occurred as a result of a number of these cycles becoming associated, and associations could come about independently of the two events (constituting the concepts) ever being temporally contiguous. The need for alterations in structures (or in formations of new linkages) resulted from the spontaneous activity of cells, and repetitious activities leading to permanent changes. The alterations were inherent in the pattern of the perceptual activity, and not brought about by conscious or unconscious agents.
Hebb's model is a step away from the structures and processes usually associated with cognitive functioning. There are no spirits in his machine and his theory lacks the usual flavour of anthropomorphism.
Most schools of Psychology accept that there are some innate structures (the idea of process being inherent in the word structure within this connotation) which lend themselves to the process of development.
What these structures constitute and how deeply they may affect development has been a philosophical point of debate for some time.
The Behavioural School, though deeply environmentally orientated, still accepts that certain associations may be innate (reflexive structures) and that the process of effecting links between stimulus and response sets is also innate. Learning took place as the result of subject-object interactions, the innate recognition of stimulus strength, signal value and repetition causing linkages between sensations and response mechanisms. Habit formation, as a result of stimulus properties and temporal associations within sets of stimulation, resulted in the entire range of complex human behaviour. Empirical evidence for such a claim was obtained from the performance of lower-order animals in highly artificial and controlled conditions.
Gestalt Psychologists rely heavily on innate laws of organisation which serve to bind sensory elements into whole configurations. The pattern impressed upon a field is the simplest possible one that can best express the structure of the field, this simplest one also being the best equilibrated. These laws are seen as being independent of development and underlie the entire range of cognitive development. Perceptual structures are therefore the same in the infant as in the adult. Further, Gestalt Psychology proposes that past experience does not lend meaning to structures, so that the reasoning processes for interpreting current events are relatively uninfluenced by past knowledge. But it does impose organisation upon perceptions, this being the only effect it possesses.
In contrast, Genetic Psychologists place great weight on the role of innate structures in the development of cognition. If higher structures arise through transformation of the innate structures, then the instinctive pattern is not interesting just as an original concatenation of elements; more important are the underlying structures which will reappear in the transform (Taylor, 1971). The concept of innateness may also arise in the form of pre-cognitive processes made available to the species from infancy - as in Piaget's theory, the processes of assimilation and accommodation, both tending to equilibration. The role of assimilation is to ensure that what is incorporated into the knowledge set is a function of what is already known (or, in the Reappearance Hypothesis (Neisser, 1967), what is remembered is a function of what is known). The innate, therefore, is forever combining with the new experiential information to lead to structural alterations in the knowledge base.
Some fundamental questions are:
Cognition is visualised as a process of developing the relatively amorphous structures of the brain into crystalline forms, such that ordering and precision become apparent in the knowledge structures.
The perceptions act upon sensory information to produce perceptual information. Perceptual Learning involves the enrichment of information, the formation of wholes, the formation of specific responses to previously indiscriminable configurations, the ability to form perceptual strategies such that the range of stimulation may be substantially increased.
Perception develops hand in hand with cognition: the simple measurement of illusions shows the existence of modifications with age that would be inexplicable without a close affinity between perception and intellectual activity (Piaget, 1949).
The need for development has been expressed in differing ways by the different schools of thought. The major reason given was that of resolving disequilibrium between what was perceived and what was known through previous apperceptions.
Developmental stages imply either the unfolding of pre-formed processes through maturation, or a gradual accumulation of knowledge such that new abilities become operative at particular points dependent on the level of acquired knowledge.
The production of an intelligent artifact has long been one of the aims of Artificial Intelligence workers. The sub-aim, therefore, is to plumb the secrets of the human mind to discover the reason for the individualistic, adaptive behaviour that the human performs so easily. For this, we need to delve into the seeming complexity and sophistication of human cognition and to emerge with some theories as to the origin and development of human, intelligent behaviour.
So that one may point at his creation and say Here is my Intelligent Artifact!, some definition of what constitutes intelligence is required.
Turing (Turing, 1950) bypassed the need to precisely define intelligence as a concept, and suggested alternatives which would serve as criteria for detecting the presence of some minimal level of intelligence. This has come to be known as Turing's Test, wherein:
Three people, A, B and C are involved in a game. C plays the interrogator who is in communication with, but not in view of A and B (via a teletypewriter, say). By way of questioning A and B, C must determine their identity. If for one of A or B an artifact is substituted, and it succeeds in convincing C of its humanity, then it may be deemed intelligent.
In order to pass such a test, the artifact must possess input, output and mediating channels of a sufficiently high level of complexity, so as to be able to process natural language input in terms of context, and respond, again in natural language, within a sufficiently human-like response time.
Today it may be possible for certain systems to pass this test, if the subject matter they were to be questioned on was kept fairly specific. Colby's Parry (Colby et al., 1971) may satisfy C as being a paranoiac (even if C were a psychologist). Certain chess playing programs (Greenblatt et al., 1967) may satisfy county chess champions as to their ability to play by beating them. Certain theorem proving systems (Gelernter et al., 1959) may convince C as to their human-student identity, and so on. The field of Artificial Intelligence has certainly produced fairly complex learning systems, but most of these are special purpose, make no claims to being human-like and lack the adaptability that characterises human performance.
Parry, for instance, would suffer in its knowledge and assessment of the current economic crisis. (However, so would most average human beings - a hallmark of Artificial Intelligence criteria for intelligence has been an exceptionally high estimate of average human intelligence.)
Although the major concern in Artificial Intelligence research has been to produce intelligent artifacts, not all their creators have claimed that they do so by modelling the human framework - hence, possibly, the term Artificial Intelligence.
The major concern of this thesis, though, is the modelling of psychological processes as they are believed to occur in the human and the representation of human thought processes in the form of computer programs. However, it is proposed that learning systems cannot truly portray intelligence unless they are able to adapt their behaviour to differing environments. As such, many famous models of complex human behaviour in particular situations should be excluded as being not of the family Intelligent Learning Systems.
Any system that can obtain information about its environment, and use it to evolve those behaviour patterns which enable its survival and need-gratification within any environment, may be classified as a learning system.
The acquisition and use of knowledge about unknown sequential environments has been studied in the artificial intelligence and control literature as the problem of 'learning machines'. (Gaines, 1976).
In order that the machine may learn from its information intake, it must have some method of internal storage, retrieval upon recognition and structural change such that it may remember past experience, revive its memories as a function of current experience and change its ideas if needs so ordain.
A Learning machine .... is any device whose actions are influenced by past experience (Nilsson, 1965).
This implies some form of thinking in the machine (or some similar process - retention, recognition, revival and re-structuring). This vital question, can a machine think? (Feigenbaum and Feldman, 1963), has given rise to intensive study into the general topic of learning machines.
The mass production of such machines has been grossly hampered by many restraints, such as the general non-availability of sensory organs capable of performing similarly to their biological counterparts, and the immense storage requirements needed to emulate the storage of data by the brain.
Such restraints have led to investigations into specific problem areas (areas that perhaps the brain does not have to cope with due to its superior, evolutionary status) in the hope that the restraining fetters will be unloosed and a learning machine will eventually result.
Some of these fields of research have proved fruitful and others have not. However, they all have contributed knowledge so that what we do not know about the human mind, albeit still disappointingly vast, is becoming gradually less.
One such field concerns the ways in which data may be abstracted from the enormous amount of environmental information that is available at any instant. This is largely the problem of formulating some method of identifying regularity, similarity and degree of import within the data so that recognition of figure from ground, patterns etc becomes possible.
As Andrew stated In any non-trivial application of a learning system the number of discriminable configurations of the input signals is likely to be literally astronomical (Andrew, 1959). Applying restrictions to the input would imply some calculation being performed, based on a hedony measure (a measure of the degree of goal-achievement .... which precisely reflects the merit of the current mode of operation (Andrew, 1959)), such that parameters in the mathematical function relating input to output variables may be dynamically adjusted.
In order that the machine be motivated towards achieving some definable outcome, it must be provided with some end-points or goals. In human problem solving tasks, the goal is usually well-defined. Well-defined goals may be further reduced to a series of sub-goals, which, if attained in order of generation, lead one to the eventual goal. Behaviour, in the machine, is always directed, the task of the machine being to evolve a strategy that may lead it to its goal - the goodness of a particular strategy being computed with respect to the effort involved, the time taken to achieve the goal and how well the goal is achieved. The selection of a strategy at any point depends further on the amount of learning that has been achieved - learning can effect changes in the sub-goal hierarchy, serving to re-order or modify sub-goals in the light of new knowledge. It is necessary that the system has criteria for generating, destroying or modifying sub-goals such that policy changes may be effected at any point. There must also be some measure as to the distance of a sub-goal from its main goal such that certain sub-goals may be achieved more easily than others. However, ill-defined problems are of a different nature altogether and, in the case of the infant, is there a problem at all to be recognised, be it well or ill defined? Klopf (Klopf, 1975) gives a possible alternative to explicit goal-identification. He sees each neuron as seeking to maximise the amount of excitation and minimise the amount of inhibition that it is receiving. The system has, therefore, a goal implicit in the foundations of its structure and functions, serving to effect overall goal-directed behaviour.
It has long been assumed that the human possesses the ability to construct ideas from the material of sensation (Helmholtz, 1925), and that these ideas, when well-formed, may function separately through the operation of mental activity, such that what is perceived is always influenced by the ideas associated with what has been sensed. It is the ideas that hold knowledge and, when activated, yield that knowledge for the subject's use. Knowledge comes not from sensation itself, but from relations among ideas that associate the elements of sensation (Beilin, 1971). Knowledge, then, is the product of cognitive activity upon the representational structures. In order to model cognition and cognitive activity, we need to propose a form for these representational structures that defines their content and the mode in which they may be operated upon. Piaget (Piaget, 1949, 1953) proposed a schema as the representational structure. A schema was a well defined sequence of physical or mental actions (Beard, 1969) and various forms of these were proposed, the most basic (least complex) form being the sensori-motor schema which was simultaneously perceptual and motor (Beard, 1969). A schema was used for Template matching where, if one possessed a schema, it could be used for matching against the currently obtained experience.
In terms of an information processing system (the human being one) we need an image, which is simply any stored information that is sufficient in kind and amount to enable the response term to be generated, and an index, any stored information sufficient to get from the stimulus to the stored image (Simon, 1972). In EPAM, for instance, the representational structure was a pair of S-R elements which could be interpreted as if S then do R. But since a stimulus does not always give a deterministic response, the output being defined by the current context of the stimulus appearance, some more general representation is needed of the type R(A,B) which defines a two-termed relation. It may be used to represent, for example, a board position in the game of tic-tac-toe. Thus Ai(Pj, Vij) would read Attribute i of position j has the value Vij as used by Williams (Simon, 1972).
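Purely as an illustration of this notation (the names and values below are invented, not those of Williams or Simon), such a two-termed relation might be held as a mapping from an (attribute, position) pair to a value, so that Ai(Pj, Vij) can be asserted and queried directly:

# Illustrative sketch only: the relation Ai(Pj, Vij) read as
# "attribute i of position j has the value Vij", applied to tic-tac-toe.
from typing import Dict, Tuple

Relation = Dict[Tuple[str, int], str]

def assert_attribute(relations: Relation, attribute: str, position: int, value: str) -> None:
    # Record Ai(Pj, Vij).
    relations[(attribute, position)] = value

def query(relations: Relation, attribute: str, position: int) -> str:
    # Retrieve the stored value, or 'unknown' if no such relation has been asserted.
    return relations.get((attribute, position), "unknown")

board: Relation = {}
assert_attribute(board, "occupant", 5, "X")   # the centre square is taken by X
assert_attribute(board, "occupant", 1, "O")   # a corner square is taken by O
print(query(board, "occupant", 5))            # -> X
print(query(board, "occupant", 9))            # -> unknown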
Becker (Becker, 1970, 1972) visualised a schema for representing experiential information. A schema constituted the sequence:
[ Event1 → series of actions ⇒ Event2]
and read: if on receiving stimuli corresponding to the known Event1, then perform the given series of actions in that order and Event2 may then be expected to occur. Newell and Simon (Newell & Simon, 1972) used a linear Production System composed of a number of Production Rules. Each was of the form:
Condition ⇒ Action
the Condition being the stimulus set defining a given environmental condition and the Action being the series of actions associated with that event. Minsky (Minsky, 1975) proposed a frame which he portrayed as a data structure for representing a stereotyped situation. He suggested that each frame be a network of nodes and relations such that all information pertaining to that frame was held within it and represented a concept.
On a lower level of representation, Hebb (Hebb, 1949) saw the neuron as the basic unit for holding information, which information could be obtained by stimulating each neuron (or a cluster which he called a cell) to the point of firing - releasing energy along its output links. Similarly, McCulloch and Pitts (McCulloch & Pitts, 1949) had the logical neuron, where each neuron had a specified threshold for activation, the attainment of that threshold (by input energy) firing the neuron as an all-or-nothing activity. Cunningham (Cunningham, 1972) also used the Hebbian structure but called it an element (more like an assembly of cells) which, again, was activated by energy along its input links and transferred such activity along its output links to connected elements.
Fogel, Owens and Walsh (Fogel et al, 1966) represented the entire memory as a finite-state machine composed of different states, each state transform defining a mode of activity. However, to represent every state transformation that is possible in the human brain, and to define every type of input activity as a function of each and every state, would require an enormous state-space, and this, perhaps, is one of the constraints upon using finite-state machines. Ashby (Ashby, 1947, 1952) portrayed memory as an enormous matrix, such that given a known input pattern, there occurred a mutual interaction between the input and the matrix, which transformed the matrix to some form, slightly different from but similar to its form before the transformation. Ashby stated that all sufficiently large systems will become filled with self-reproducing forms, i.e. they tend to preserve themselves by acquiring a cycle of reproduction which produces forms similar to the original. This implies that the brain, which is a very large system, even if it were assembled partly at random, would still tend towards understandable behaviour due to this ability to generate a large number of near self-reproducing forms.
Models of human thinking have proliferated in the field of psychology and now in the field of Artificial Intelligence. The advent of the electronic computer meant that such models could now be programmed and executed on the computer so as to simulate thought processing. However, the computer is more than just a tool for simulation. It served also to crystallise previously vague hypotheses by forcing upon the model builder the necessity to define the representational data-structures, data for input, those routines which were to simulate human cognition and the learning algorithms. It brought also its own vocabulary with terms such as a program, supervisor, serial or parallel processors, sub-routines, etc., which presented psychologists with the precise tools that they had long required for giving structure and life to their proposed models. The crystallisation of psychological processes was made possible by the computer and the goal of artificial intelligence research .... is .... to construct computer programs which exhibit behaviour that we call 'intelligent behaviour' when we observe it in human beings (Feigenbaum & Feldman, 1963).
This ties in closely with Simon's observations on human behaviour. A human being can think, learn and create because the program his biological endowment gives him, together with the changes in that program by interaction with his environment after birth, enables him to think, learn and create .... Clearly this will not be a program .... that calls for highly stereotyped and repetitive behaviour independent of the stimuli coming from the environment and the task to be completed. It will be a program that makes the system's behaviour highly conditional on the task environment - on the task goals and on the clues extracted from the environment that indicate whether progress is being made toward those goals. It will be a program that analyses, by some means, its own performance, diagnoses its failures, and makes changes that enhance its future effectiveness (Simon, 1960).
If that which underlies the creation of intelligence was seen to be an adaptive biological program, would this constitute a dalliance with mentalism, an approach scorned by many traditional psychologists? The Genetic Psychologist has, however, long been laying claim to such a genetic endowment program. A program, in fact, which now can be programmed and executed on a computer and whose performance may be observed and analysed, with or without recourse to traditional human behaviour (without recourse if one wishes to produce an artifact with sufficient competence in specific fields of endeavour).
The student of psychological processes who wishes to test his model on a computer needs only to propose the structure of his underlying genetic program, to define its modes of interaction with environmental data, to define processes for coding experiential information into specified forms of a data structure and lastly, to define an overall architecture for his concept of the human mind (the structure of the fabric).
The next section will deal with some particular models of human thought processing, whose creators wished to extend the boundaries of knowledge upon such aspects as concept formation, utilisation of conceptual knowledge, the structure and organisation of representational and operative knowledge (behaviour), and the underlying cognitive mechanisms which effect learning and understanding in the human infant and adult.
Three models of cognition have been chosen for study and appraisal. The reasons for choice were these:
that the models be general-purpose; that they could reasonably be expected to (or have proved to) generate human-like reactions; that they should throw light upon such areas as concept formation, knowledge acquisition and development, behavioural development and organisation and encoding of experiences so as to produce understanding and adaptivity.
Three such models have been chosen. That others exist is known, but these in particular may suffice to serve the intentions that were earlier discussed.
The first two models are basically performance-type models, in that they may exhibit performances similar to those of the human infant. The third is more a competence-type model, which has to perform sufficiently competently (not necessarily human-like) in order to satisfy its creator.
Hebb postulated a theory of perceptual learning in which he intended to show how a fixed, structural memory trace could go hand in hand with a system that could recognise percepts independently of the activation of specific neural elements (Dodwell, 1971).
When using a neurophysiological base for a model of cognition, one is forced to use the known brain fabric and properties to construct data structures and those organisational processes which serve to form these structures.
Hebb envisaged cells - i.e. individual neurons or specific clusters of neurons - as having an innate capacity for firing, if stimulated above an activational threshold point, by external and internal stimulation.
The firing of a number of cells equated to a perception, the form of the perception indicated by the pattern of the activated cells through time.
One could therefore imagine a stimulus energy wave front impacting upon the brain causing cell after cell to fire, some cells firing simultaneously (or nearly simultaneously) and others firing in particular phase sequences. Those cells which fired simultaneously tended to become associated (Hebb proposed a structural association - the growth induced by contiguous firing causing integrations in what were previously anatomically disorganised cells), the strength of the association dependent on the number of times such cells were induced to fire simultaneously.
If two cells A and B became associated in this way, they formed what Hebb termed a cell-assembly after which A and B could never be considered independently of one another.
The process of cognitive growth constituted the formation of more and more complex cell-assemblies, and behaviour was dependent on the cell-assemblies which were active at any time.
Further, when two assemblies fired one after another and such a pattern was sufficiently often repeated, then a phase-sequence was formed, where if the former fired, the potential for the latter firing was considerably heightened. These phase-sequences constituted learning-sets such that the activation of one set put in motion conditions for the firing of the next and so on. This could be visualised as the process of learning-to-learn (Holland, 1974) and what could be learned (or which new assembly could be formed) was dependent on the previously acquired learning-set.
The property of a cell-assembly was such that it could continue reverberating even after the stimulus source (or pattern of activation which induced it to fire initially) had ceased. As can be seen in Figure 3.1, cells A, B, C, D, E constitute a cell-assembly, each cell having a number of input and output links. If A was made to fire by an external source, it induced B to fire which in turn fired C, D, E and again A. This could go on for an indeterminate period of time, such activity ceasing only if this assembly caused another to be set in activation, thereby losing the energy to continue, or through interference from external sources. It is not clear if Hebb intended that:
However, the definition of a cell-assembly such as ABCDE meant that A was more certain to transfer activity along its B link than any other output link. Thus every output seemed to have a probability factor associated with it, strong links having a greater probability for activation than weak ones (See Figure 3.2). From what little is known of neurons today, the firing of a neuron is an all-or-nothing activity and thus if cell A (in Figure 3.2) fires due to the input links i1, i2, and i3 being activated, its output value is not dependent on the separate values of i1, i2 or i3. Thus in reality i1 = i2 = i3 and all transfer links must carry the same energy value. (Hebb in his time could not have known this and in any case never specifically states input or output values, merely firing or non-firing of cells.) The output links have some probability measures p1, p2 and p3 of being activated but once activated, transfer an equal amount of energy, i.e. equal to the total output along each link (= i1).
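The interpretation given above can be sketched, very loosely and not as Hebb himself formulated it, as follows: each cell fires in an all-or-nothing manner once its threshold is reached, every output link carries the same energy, and each link has its own probability of being activated, strong links being the more probable:

# A loose sketch of all-or-nothing firing with probabilistic output links.
import random

class Cell:
    def __init__(self, name, threshold=1.0):
        self.name = name
        self.threshold = threshold
        self.links = []                      # (target cell, activation probability)

    def connect(self, target, probability):
        self.links.append((target, probability))

    def fire(self, input_energy, depth=3):
        # Output does not depend on the separate input values: all-or-nothing.
        if input_energy < self.threshold or depth == 0:
            return
        print(self.name, "fires")
        for target, p in self.links:
            if random.random() < p:          # strong links are selected more often
                target.fire(1.0, depth - 1)  # every link transfers the same energy

a, b, c = Cell("A"), Cell("B"), Cell("C")
a.connect(b, 0.9)                            # a strong, well-practised link
a.connect(c, 0.2)                            # a weak link
b.connect(a, 0.5)                            # closing the loop allows brief reverberation
a.fire(1.0)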
If, for instance, the cell A constituted the perceptual element line (activated on seeing a line) and B constituted the element vertex, then the percept triangle should be formed as a result of the eyes transferring vision from vertex to vertex and line to line. However, Hebb states that the complex percept formed as a result of associating its elements, is something other than the sum of all the parts. Figure 3.3 shows the perception of a triangle ABC. Layer 1 constitutes the visual cortex and bears a number of cells a, b and c, a corresponding to the vertex A of the triangle, b to vertex B and c to vertex C. When the vertex A is fixated upon (the infant having an innate tendency to fixate on corners and then transfer vision from vertex to vertex - so when at A, the probability of next fixating on B equals the probability of fixating on C) then all the a cells in the cortex are simultaneously activated (although some b and c cells may also reverberate). When A, B and C are looked at successively in any order, amorphous cell-assemblies corresponding to A-B, B-C, C-A, A-B-C and the lines AB, BC, CA are set up in layer 2 (the horizontal line BC in fact should form a fairly strong cell-assembly due to innate tendencies to scan horizontally). Eventually, through formations in layers 3, 4, etc., a cell-assembly corresponding to the triangle T itself will be constructed in some layer n. This results from the inter-facilitation of the intermediary assemblies, due to several sightings of different triangular structures at varying distances and angles of vision. However, Hebb clearly states that the eventually formed concept of the triangle, denoted by the cell-assembly T, is essentially a new one, by no means a sum or hooking together of a, b and c .... this is something other than the sum of its parts. We can say, then, that the schema T is not merely made up of the six elements of the triangle (3 vertices and 3 lines) but is somehow imbued with other knowledge.
The ultimate concept of the triangle T allows for conceptual activity to occur, i.e. allows for seeing more into an object than what the sensations have recorded. It is therefore an activity not controlled purely by sensation and occurring only when the concept has become well-formed. Concepts, once formed, could also become linked to other concepts having elements in common. This was what Hebb termed conceptual linkages forming between cell-assemblies which were not dependent on contiguity of firing.
As can be seen in Figure 3.4, a concept A1 has been formed by the association of the cell-assemblies A, B and C. A concept A2 has been formed by D, E and C. They both have the assembly C in common. This, says Hebb, provides a basis of prompt association. Further, he states that the perception of an actual object .... involves more than one phase cycle. It must be a hierarchy: of phases, phase cycles, and a cycle or series of cycles. (An object is sensed through different modes: touch, sight, etc. Thus its perception involves the activation of several assemblies, such that several phase cycles are in activity simultaneously. Each time the object is perceived, the same cycle may, therefore, recur.) Thus concepts may be associated by being activated in the same cycle (in different phases, i.e. different activational sequences corresponding to each mode of sensation, say) which concepts also have subsystems (like C) in common. This defines a more effective link than merely contiguous phase cycles.
Hebb states then that The prompt learning of maturity is not an establishing of new connections but a selective reinforcement of connections already capable of functioning. A concept is also not unitary. Each concept has a central core but relies on associated fringes which supplement the central concept. The fringe content that is aroused depends on the context of the perception, and thus a concept may be used independently of environmental circumstances, such generalisation becoming possible only with the extent of the accrued fringes. The accruing of a fringe is a slow learning process, and allows for the individual to proceed from the particular to the general.
A cycle of activity (cell-assemblies functioning in sequence) may be broken and the cortical organisation disrupted if the environment changes its pattern of behaviour. In conditioning experiments then, a cycle which is set up, such as press bar (one assembly) and obtain food (the expected and second assembly in sequence) may be broken by not issuing food contingent upon bar press. Thus the expected assembly (food) initially fires, but gradually diminishes as the environment does not confirm its occurrence. The neural pattern then alters such that the probability of the food assembly firing after the bar press assembly gradually grows less and is, then, completely extinguished (the phase sequence is destroyed).
Motivational factors such as a hunger drive were present in innate neural circuitry (inherent cell-assemblies) and such circuitry became activated as a result of the onset of the need. The activation increases along with the need, resulting in a destructive interference process with the ordinary patterns of neural activity. Thus what is perceived becomes more and more distorted by non-satiation of a need. Gratification results in lowering the destructive activity and normal perception is resumed upon satiation.
Such innate drive assemblies may also evoke specific forms of behaviour, since each phase in the phase sequence may have its specific motor facilitation. Hebb also sees a state of restlessness being induced by continuation of activity of the drive circuitry. This restlessness is not directed but, by disrupting normal activity patterns, creates an instability of direction. Directedness results only from learning established as a result of such instability. Such learning may be effected through the setting up of phase sequences prior to need activity (recognising the onset of the need before it occurs) which evoke directed behaviour (the baby learns to turn his head towards the nipple, when formerly the sucking reflex was activated only by touching the mouth or surrounding areas) - behaviour that may previously have only been evoked by a specific stimulus (a reflex activity).
To summarise, Hebb's model had the following features:
There are certain factors lacking in Hebb's model, knowledge of which has largely increased since Hebb's initial publication of his theory (1949).
Habituation (Response decrement to repeated stimulus presentation, that decrement not being due solely to peripheral processes such as sensory adaptation, effector fatigue, or changes in arousal (Chibucos, 1974)) was seen by Hebb as certain changes .... resulting from repeated or prolonged stimulation. In his model, cell-assemblies once fired tended to continue for a very short period after cessation of the stimulus. He suggests, as an explanation for the non-firing of an assembly when repeatedly stimulated, that if nonetheless the arousing sensory stimulation persists ...., forcing a continued activity, the tendency would be to induce a change of frequency properties in the assembly. The facilitation delivered to other cerebral systems would then be changed, which means some change in perception. The reason for habituation, then, is a change in perception. This contradicts current theories of habituation as an active inhibition rather than a change in perception. The subject perceives the same object but loses interest in it and, if presentation is still continued, may then even act aversively, factors which Hebb's model cannot easily account for.
In Hebb's theory a transient memory was the result of the continued reverberation of a cell for some short while after stimulation has ceased.
Consider cells firing at time t0 facilitating cells firing at t1, on to t2, t3, etc. In order that the transient memory be available for questioning, it must be accessed before the cells at t0 cease reverberating, the cells at t1 being only a subset of those at t0 together with new stimulus occurrences that occurred at t1. Only the original assembly at t0 holds the entire information of what was perceived at t0. In experiments on transient memories, subjects asked to recall a character array which had been exposed for only 50 ms are unable to do so fully. However, if a known auditory stimulus (defined previously to the subject) was associated with a particular row and was given after presentation of the visual array, then recall of that row was 100%. By Hebb's model, then, the original assembly at t0 had to be revived in order to attach (associate) the auditory stimulus which, through a process of reinforcement, served to strengthen the assembly and all its elements thereof. However, why should the assembly at t0 be available under certain circumstances and not others?
This brings one to the pre-attentive and central-attentive processes (Neisser, 1967). In Hebb's model, perception of an event was defined by the stimulus energy emanating from the event, the structuring that had occurred so far in the brain and the current activity pattern of these structures. To attend to something, however, implies a focusing process upon parts of stimulation serving to inhibit other parts which had been perceived but then actively ignored. The commonplace model for such a selective process, today, is that of transferring information from the transient modules to modules less transient. In Hebb's model, it would imply the phase sequence set into motion by the perceived object, that which is attended to being dependent on the subset which had been somehow better perceived (or better known to the system, corresponding to better cell-assembly definitions) than others. This does not account for selective remembering after the perceived event, i.e. for being able to recall an object when reinforced upon a later (in the order of milliseconds, of course) occasion, and not remembering if no reinforcement was made. The phenomenon of backward masking (Neisser, 1967) too could be accounted for by Hebb's model if the delay between the two presentations was always below some minimum level. Then, contiguous elements may serve effectively to dominate one another dependent on how well they were known and upon other factors such as intensity of stimulation. But masking can occur with delays of up to 100 milliseconds, implying that temporal contiguity is not the only masking factor, and hence masking cannot easily be explained by the model.
In Hebb's theory, the fundamental unit of data was the cell. In Cunningham's theory it is the element which Cunningham equates in structural status to Hebb's cell-assembly.
The tendency to proceed from an undifferentiated, structurally chaotic fabric into order and precision took place as a result of learning. Learning constituted the processes of differentiation and co-ordination within and between structures. This Cunningham visualised as being similar to Piaget's theory of assimilation and accommodation. Further, the structuring of the fabric was controlled by centrally autonomous responses which served to lower or heighten the input energy value of the stimulus wave-front.
Similar to Hebb, however, behaviour at all times was defined by the flow of activity within the structures initiated by the external stimulus energy. But unlike Hebb, elements in the path of flow of stimulus energy did not always become activated. Activation depended on the threshold value of each element, threshold values changing through experience and learning. Thus elements representing commonly perceived elements had lower threshold values than others (due to their greater occasion for reverberating).
Elements which became active simultaneously tended to become associated. However, associations resulted in the formation of a new memory element which served to conduct energy from one to another in a more determinate manner. In Figure 3.5a the two reflex links AB and CD, at the onset, function independently. However, if they happened to function simultaneously, then a new element X was formed, linking input C to output B (Figure 3.5b).
Those elements which are reverberating at any time constitute the highest level of functional memory and make up the Attention Span. Those elements which are active but reverberating below threshold constitute the secondary level and are the most probable for reverberation in the next time instant. Those which are not active constitute a third level of memory which has potential for future use.
Memory built upon memory could be effected as a result of memory elements being linked together. In Figure 3.6 the links AB and CD have a common memory element X. The links A'B' and C'D' have a common element X'. If these two systems reverberated together, then they became associated through the memory element Y. If now X and Y reverberated together, they became associated through Z, and so on.
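A minimal sketch of this hierarchical association (the class and its automatic naming are inventions for illustration, not Cunningham's notation) might run as follows:

# Memory built upon memory: simultaneous reverberation creates a new element
# which thereafter co-ordinates the two structures that produced it.
class Element:
    counter = 0

    def __init__(self, name=None):
        if name is None:
            Element.counter += 1
            name = "M" + str(Element.counter)   # an automatically named memory element
        self.name = name

def associate(first, second):
    # Two structures reverberating together yield a new co-ordinating element.
    new = Element()
    print(new.name, "now co-ordinates", first.name, "and", second.name)
    return new

x  = associate(Element("AB"), Element("CD"))        # the common element X
x2 = associate(Element("A'B'"), Element("C'D'"))    # the common element X'
y  = associate(x, x2)                               # X and X' reverberate together: Y
z  = associate(y, Element("another system"))        # and so on: Z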
Co-ordination between systems is brought about by the creation and activity of the memory elements. Differentiation within a system (such as the walking activity becoming differentiated into running, jumping, etc) is brought about by the feedback from reflex activity into the sensory system - reafferent stimulation - resulting in new element-assemblies or new elements being formed.
Co-ordination and differentiation constitute the process of accommodation - new structural and behavioural accomplishments - within the system. The assimilatory process is inherent in the reverberation of elements by external stimulation and by the passage of such activity through the system.
The Attention Span may be momentarily increased or decreased by the centralised Orientation and Defense Responses. However, since the Attention Span must, by definition, always occupy a certain fraction of the total active area available, it must, as a result of accommodation, gradually increase. Cunningham speculates as to the maximum being Miller's (Miller, 1956) seven plus or minus two chunks.
The learning mechanisms in Cunningham's model are basically:
The major interest in Cunningham's theory, disregarding its neurophysiological basis and similarities to Hebb's work, is the programming of the model to be run on the computer using Piaget's observations and developmental theory as a basis of comparison and evaluation. Commencing with only the reflex activities and connections being specified between every input and output element, Cunningham foresaw the model developing its behaviour in a manner akin to the human infant - though he only considers the auditory and vocal channels of sensory input.
Even with such a restricted version, Cunningham believes he achieved Stage 1, Stage 2 and Stage 3 behaviour. (What these stages imply will be discussed in Section 9.2.3.) The interesting factor is the basic simplicity in design of the model; the tendency to see how far it can develop without adding an increased degree of complexity. Simple designs can account for complex behaviour (as was demonstrated by Rosenblatt's perceptron which was basically a linear learning system and no more (Rosenblatt, 1958)).
Unlike Hebb's theory, where learning progressed as a result of steady structural alterations and growth, Cunningham visualised learning as an all-or-nothing structural alteration - not a subliminal increment of a newly learned item. To learn a new thing, however, a learning set must already have been formed, since a structural alteration is only produced as a result of currently reverberating elements.
Classical conditioning, in Cunningham's model, could be viewed in terms of co-ordination: the insertion of a new element co-ordinating a pattern of sensory input (the conditioned stimulus) to a pre-existing stimulus-response pair of elements (the unconditioned stimulus and response). The stimulus, once conditioned, forces an orienting response upon the system (see Figure 3.7).
Positive reinforcement may be seen as that newly created element serving to co-ordinate those structures that it links together. A positively reinforcing event then is that which has the potential for creating new co-ordinating elements. A negatively reinforcing event creates elements which channel activity away from the structures that it co-ordinates.
Extinction may only occur through competition, i.e. a previously formed structure being competed against by another which serves to channel activity away from the original structure. However, extinction of a conditioned response may also occur by presentation of a reinforcement indiscriminately - not contingent upon the response occurring. It is difficult, then, to see how Cunningham's model could cope with this form of extinction. Extinction by punishment of a once correct response will progress slowly if the punishment lowers attention span and lessens the chances of competing co-ordinations.
In terms of concept formation, those structures which are sufficiently complex (in organisation) and well-formed (self supporting) are defined as a concept since they can maintain an independent existence. Conceptual operations involve co-ordination and differentiation of two large complexes to yield a new combined concept, and implies prompt activity with regard to structural alterations. When complexes interact with other complexes in a consistent, predictable, and independent manner then the subject has developed a true concept. Each complex may then be regarded as a unitary functioning unit with deterministic output.
Basically Cunningham's model is a performance model. (As opposed to a model of competence, which must provide those actions which define some average level of competence.) It may be used to obtain the average performances of an infant provided the average stimulation is input. It features:
The model is conspicuous for its absence of any innate drive or need - direction for behaviour being determined by the direction of the flow of activity through the structures.
The only learning mechanism available, other than the autonomous responses, is the creation of a new element, conditional upon simultaneous activity in existing elements.
Firstly, it must be noted that Becker proposed a theory for the encoding and application of experiential information by a goal-oriented organism, designed to survive in and adapt to any environmental situation. Thus, though it had human characteristics, it need not necessarily be seen to have been provided with human-like cognitive processes and structures. Becker, in fact, claims to simulate the performance of middle-level cognition and does not, for instance, consider autonomous reflex systems or higher order logical operations.
This middle-level cognitive organism is provided with a functioning efferent and afferent system. The afferent system is capable of knowing which sensory organ conveyed the input stimulation, and the organism also possesses investigatory processes which actively search its environment for specific stimuli when ordered to do so by cognition.
The cognitive structure for holding information is a schema which is basically composed of an antecedent and consequent event.
[Event1 → actions ⇒ Event2]
Each event is a partially ordered set of kernels such as A → B, the small arrow defining a condition preceding an action, i.e. A is a conditional set of afferent kernels and B is a list of efferent commands to be executed on obtaining A. A kernel is the smallest formal structure capable of expressing an event or a relation - and finally tends to become an ordered n-tuple of nodes, where a node is equivalent to a conceptual link and is a nest of two-way pointers serving to join elements in different kernels. The major difference in Becker's schema is that the Condition-Action pair forms the left hand side (where usually the Condition alone is the lhs and the Action the rhs), with the goal as the right hand side. Thus each schema may be seen as a program for attaining the goal contained in its right hand side and is an active, not passive, data-structure.
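For illustration only (the field names and the rattle example are hypothetical), such a schema may be rendered as an active data structure whose left hand side can be tested against the contents of STM:

# A schema as an antecedent event, a series of efferent actions, and a
# consequent event which serves as the schema's goal.
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Kernel = Tuple[str, ...]                    # a kernel treated simply as a tuple of nodes

@dataclass
class Schema:
    antecedent: FrozenSet[Kernel]           # conditional set of afferent kernels (lhs)
    actions: List[Kernel]                   # efferent commands executed on obtaining the lhs
    consequent: FrozenSet[Kernel]           # the predicted event, i.e. the goal (rhs)

    def applicable(self, stm: FrozenSet[Kernel]) -> bool:
        # The schema may be applied once its antecedent is wholly present in STM.
        return self.antecedent <= stm

grasp = Schema(
    antecedent=frozenset({("see", "rattle"), ("hand", "free")}),
    actions=[("reach", "rattle"), ("close", "hand")],
    consequent=frozenset({("holding", "rattle")}),
)
print(grasp.applicable(frozenset({("see", "rattle"), ("hand", "free")})))   # True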
Becker uses two memory modules:
The cognitive processes, by acting on these memory modules, produce experiential data so that the organism may always relate its currently obtained information to its past experiences. Once such an experiential base has been formed, the organism is always striving to attain a number of short-term or long-term goals at any precise instant in time. (The naive organism, however, has no motivation for directing its actions and it is difficult to see how it will ever get off the ground, as it were!)
There are a number of cognitive processes and sub-processes.
The Goal Monitoring Process scans STM and assesses each kernel as to its desirability, neutrality or non-desirability with respect to the system's current short term (sub-goal kernels generated by Schema Application) and long term (predictive goals - rhs of applied schema) goals. Each kernel is then given an individual Relevancy Weight (its relevance to the system).
The Schema Application Process is next called, which attempts to find the closest matching schema from LTM with respect to the contents of STM. This occurs when LTM is not naive. When LTM is naive (empty) an unguided STM-LTM encoding process is called and schema application cannot occur. In order to locate schemata in LTM, an LTM Search Sub-Process must be called. Having obtained a schema from LTM, the Analogic Matching Sub-Process must be called to decide how well the obtained schema matches the contents of STM. There are certain criteria that must be observed with regard to the matching process:
The Analogic Matching Sub-Process will emerge with a heuristic measure for each schema which indicates how well that schema meets its requirements. The schema actually chosen for application, then, must be the one with the best heuristic measure. The LTM Search Sub-Process will, therefore, output the schema which should be applied. The Schema Application Process now looks at the constituents of the schema, i.e. its left and right hand sides.
If the left hand side has already been attained, then the Application Process is over. If it has not, then the following question must be asked:
Is the right hand side of the schema desirable?
If it is not, then the Goal Monitoring Process is warned such that it may evaluate any kernels remaining on the left hand side of the schema (which may enter STM at some later time) as being undesirable to the system.
If it is, then the remainder of the left hand side kernels (that have not been attained as yet) are set up as a list of sub-goals to be obtained and the Goal Pursuit Sub-Process is called up. Now, there are four ways in which a sub-goal kernel may be obtained:
If, after scanning STM, a kernel cannot be found, then the LTM Search Sub-Process must be recalled, but this time to obtain those schemata containing any one of the required sub-goal kernels in their right hand sides. The left hand side of such a schema may then be applied, such that these kernels may be obtained. The left hand side kernels now constitute the new sub-goals and the Schema Application Process must be recalled, which in turn may call up the Goal Pursuit Sub-Process. Thus, sub-goal attainment may require recursive calling of these processes before all sub-goals are finally obtained and the left hand side of the original schema is fully attained.
Once the left hand side of a schema is fully attained, then the goal is deposited in STM and tagged as an internal prediction.
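The recursion described above may be sketched as follows; the schemata, their contents and the simple first-found search are all illustrative simplifications of Becker's processes:

# Goal Pursuit, much simplified: a sub-goal kernel not present in STM is sought
# by applying a schema which contains that kernel in its right hand side; the
# schema's left hand side kernels then become the new sub-goals.
def pursue(goal, stm, schemata, depth=0):
    if goal in stm:
        return True
    for lhs, actions, rhs in schemata:            # each schema: (lhs kernels, actions, rhs kernels)
        if goal in rhs and all(pursue(k, stm, schemata, depth + 1) for k in lhs):
            print("  " * depth + "apply " + str(actions) + " -> predict " + goal)
            stm.add(goal)                         # the goal is deposited in STM as a prediction
            return True
    return False

schemata = [
    ({"see rattle", "hand free"}, ["reach", "grasp"], {"holding rattle"}),
    ({"holding rattle"},          ["shake"],          {"hear noise"}),
]
stm = {"see rattle", "hand free"}
print(pursue("hear noise", stm, schemata))        # attains "holding rattle" first, then "hear noise"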
LTM Modification Processes also exist which serve to create new schemata on the basis of new information obtained from the Schema Application Process and the Analogic Matching Sub-Process. This serves to continuously refine structures in LTM such that they tend towards a more correct definition of the environmental events that they mirror.
The Unguided STM - LTM Encoding Process serves to create new schemata when none exist in LTM. This is done almost on a random basis, consequence being defined as the most relevant kernel to be found immediately occurring after an efferent kernel in STM.
This constitutes the fundamental flaw in Becker's system: that there is no motivation for a proper foundation to be laid such that reasonable behaviour patterns may emerge. This was substantiated, in fact, by a project currently underway at Queen Mary's College by Mott (Mott, 1976). It was found that some basis must be present for:
Becker's system has an inherent sense of causality, being motivated to see events as antecedent and consequent. The representation of well defined strategies becomes possible, allowing the organism to evaluate problems in terms of the strategies it may have developed in the past.
The Learning Mechanisms provided to the system via the cognitive processes are, admittedly, extremely artificial. However, this again is eminently justifiable in terms of the aims of Becker's research.
The interesting factors in Becker's system are his:
Production Systems originated with Floyd (Floyd, 1961) and were first used by Newell (Newell & Simon, 1972) as a notation for protocol analysis and hence for modelling psychological processes. Since then, they seem to have come into vogue, with a number of models employing Production Systems (Moran, 1973; Young, 1973; Waterman, 1970, 1975; Rychener, 1975; Anderson, 1976) occurring in the succeeding years.
The Production System consists of an ordered set of Rules of the form:
i α → j β
where i is one of the symbols in the input alphabet and j is one of the symbols in the system's repertoire. The two symbols α and β may be thought of as states in a finite-state machine. Thus the Rule could be described as:
Given the input symbol i, when the machine is in state α, the symbol j will be output and the machine will move to state β.
It is possible for any of the symbols i, α, j or β to be null.
A Production Rule, then, is basically a Condition-Action pair, usually stored in the permanent data-base (equating to human Long Term Memory). It may be invoked by symbols contained in some temporary working store (equating to Short Term Memory).
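A minimal, purely illustrative interpreter for such a system (the Rules themselves are invented) makes the recognise-act cycle concrete: on each cycle the ordered set of Rules is scanned, and the first Rule whose Condition is satisfied by STM fires, writing the symbols of its Action back into STM:

# A Condition-Action pair fires when its Condition is satisfied by the
# symbols currently held in the working store (STM).
rules = [
    (frozenset({"turn-head-left", "nipple"}), frozenset({"suck"})),        # higher Rules take precedence
    (frozenset({"touch", "left-cheek"}),      frozenset({"turn-head-left"})),
]

def cycle(stm, rules):
    # One recognise-act cycle: fire the first Rule whose Condition matches STM.
    for condition, action in rules:
        if condition <= stm:
            print("fired:", sorted(condition), "->", sorted(action))
            return stm | action               # the Action's symbols are written back into STM
    return stm                                # no Rule matched; STM is unchanged

stm = frozenset({"touch", "left-cheek"})
stm = cycle(stm, rules)                       # the second Rule fires, adding turn-head-left
stm = cycle(stm | {"nipple"}, rules)          # the first Rule now fires, adding suck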
Given the latest input symbol, say K, resident in STM, whilst the machine is in state γ, the following processes are instigated:
A Production System may be considered adaptive if there exist transformation operators which act on existent Rules to create new ones for insertion into the data-base.
A Production Rule has often been compared to a Stimulus-Response pair. However, it is more than an S-R pair for the following reasons:
A Production System can be more powerful than a finite-state machine for the following reasons:
A Production Rule may be seen as the embodiment of a Piagetian schema containing simultaneously perceptual and motor components, the perceptual components being an encoding of some past perceived configuration and the motor components defining the resultant actions that were undertaken.
Newell and Simon see far more into the role of Production Systems: we confess to a strong premonition that the actual organization of human programs closely resembles the production system organization and, further, might well express the kernel of truth that exists in the S-R position (Newell & Simon, 1972). In Newell's system STM was so arranged that it reflected a temporal ordering upon its constituent symbols, with older symbols being moved to the right and new symbols being inserted into the left-most position. Thus, as processing proceeded, the older symbols tended to fall out from the right hand end of STM. Symbols in STM were also subject to temporal decay, such that decayed symbols could be written over by the newly occurring symbols. Newell foresaw the process as proceeding in parallel, all Rules being scanned and selected or rejected simultaneously. Thus search-time was independent of search-space. However, he did define a serial ordering upon Rules, higher Rules having greater capacity for selection than the lowly ones. As Anderson stated, it is hard to imagine a mechanism that would serially order productions but at the same time access them in parallel in a time independent of number (Anderson, 1976). Newell's system also incorporated a chunking process where an n-symbol string could be concatenated into one symbol structure. An internal symbol denoting this chunked string was then written back into STM to replace the original n. However, Newell's chunking grouped symbols arbitrarily, contrary to Miller's hypothesis that chunking could only occur on very familiar configurations.
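The temporal ordering and fall-out just described can be sketched as follows (the capacity, the simple fall-out rule and the chunking call are simplifications assumed for illustration, not Newell's own code):

# STM as a temporally ordered store: new symbols enter at the left, older
# symbols drift to the right, and the oldest fall out once capacity is exceeded.
from collections import deque

class NewellSTM:
    def __init__(self, capacity=7):
        self.capacity = capacity
        self.symbols = deque()                   # left-most = newest, right-most = oldest

    def insert(self, symbol):
        if len(self.symbols) == self.capacity:
            lost = self.symbols.pop()            # the oldest symbol falls out on the right
            print("fell out:", lost)
        self.symbols.appendleft(symbol)

    def chunk(self, n, label):
        # Concatenate the n left-most symbols into one internal symbol structure.
        chunked = tuple(self.symbols.popleft() for _ in range(n))
        self.symbols.appendleft((label, chunked))

stm = NewellSTM(capacity=3)
for s in ["A", "B", "C", "D"]:                   # inserting D pushes the oldest symbol, A, out
    stm.insert(s)
stm.chunk(2, "chunk1")                           # D and C replaced by a single chunk symbol
print(list(stm.symbols))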
To date, the majority of Production System models have represented static processes, i.e. what Rules are required to define the problem area considered such that the required degree of skill may be exhibited by the model. The question:
What transformation processes are required that a minimal set of Rules may be operated upon to create Rules which produce the required behaviour? has rarely been considered.
With the exception of Waterman (Waterman, 1970, 1975), truly adaptive Production Systems have been non-existent. Waterman used Productions for embodying heuristic rules such that they could be dynamically manipulated by a supervisory training program which created several instantiations of each general Rule. If a Rule, when activated, resulted in a defined error situation, then a new Rule was created by using the training information. The new Rule was inserted above the error-producing Rule.
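The essential adaptive step can be sketched as below; the Rule representation, the error test and the corrected Rule are all hypothetical simplifications of Waterman's scheme:

# When the Rule that fires produces the wrong action, a corrected Rule built
# from the training instance is inserted immediately above the offending Rule.
def first_matching(rules, stm):
    for index, (condition, action) in enumerate(rules):
        if condition <= stm:
            return index
    return None

def train(rules, stm, correct_action):
    index = first_matching(rules, stm)
    if index is not None and rules[index][1] != correct_action:
        new_rule = (frozenset(stm), correct_action)   # built from the training information
        rules.insert(index, new_rule)                 # placed above the error-producing Rule

rules = [(frozenset({"light"}), "press-bar")]
train(rules, {"light", "tone"}, "withhold")           # a defined error situation
print(rules)                                          # the corrective Rule now precedes the old one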
In Anderson's ACT, the Production System comprised the operative component of cognition, a Rule being fired if its Condition side matched any elements in the set of active elements in LTM. The Action side served to dynamically alter LTM, which itself was a network of interconnected nodes and links. In ACT, therefore, actions were undertaken as the result of a particular configuration of memory which resulted in transforming the original data-base. In his suggestions for production inducers in ACT, Anderson suggests that the sequence of transformations should be interpreted causally, such that a preceding configuration be taken as the cause of the succeeding one. Thus if Rule P1 was activated as a result of the first configuration, and Rule P2 as a result of the second, then P1 should form the Condition side and P2 the Action side of a new Rule. Thereafter, whenever P1 was active, P2 was reproduced. The model was therefore said to be capable of learning desired activation contingencies in memory, and should presumably acquire a set of Rules which defined a set of phase-sequences for LTM.
The interesting feature is that, given the strong resemblance of Production Systems to human Long Term Memory structures, no one has yet encoded any of the psychological laws of learning as the required set of production inducers to create an adaptive Production System.
The learning components defined in the theory presented in this thesis attempt to set right this grave lack of intercourse between Artificial Intelligence research scientists and psychologists.
Cognitive models have been proposed which constitute complete, or near-complete, solutions in restricted and subject-specific domains. The task is to find complete, or satisfactorily complete solutions in unrestricted, varying and noisy domains.
As Lindsay (Lindsay, 1973) stated the most striking characteristic of human behaviour is its variability, both within and between persons. To produce a system which produced deterministic behaviour would not constitute a solution to the cognitive problem, though it would constitute the perfect answer for an artificially intelligent system. The goals for psychologists and artificial intelligence workers may, therefore, be different. The former needs to find theories of human cognitive abilities, and the latter to produce systems capable of surviving in and adapting to varying environments.
A model of performance produces behaviour which is reasonably characteristic of humans. It is, therefore, a function of every experience it has known, but gives individualistic performances due to the initial setting of the parameters available. A model of competence will merely perform competently in its environment, the competence measure being judged by an external observer (its creator) and subject to the environment it is placed in. Models of competence usually favour unswerving processes that must inevitably choose the same path, but models of performance offer a potential richness in behaviour by modelling their processes within a single formalism. A program that is sensitive both to current contexts and to its past experience evaluates solutions on the basis of both and offers the variability implicit in human behaviour. Insight is, after all, dependent on what has been previously learned - the learning set - and hence different levels of insight must be the result of different learning sets.
It is only through the process of proposing a psychological theory, building the model, and having the means of evaluating its performance, that human thought and behaviour may be understood by the student of psychology or artificial intelligence.
The entire cognitive system is viewed as an autonomous body comprising three functional levels of memory expressed as different storage modules, and a set of innate processes which serve to structure and exchange information within and to and from each module, and which coordinate and control the flow of information through the entire system.
The model's every interaction with its external world enables it to obtain experiential data, which when fully processed, aid towards the construction of an adaptive knowledge base. It is the content of the structures in the knowledge base which, when activated, determine the range of possible behaviours (internal and external) which may be executed by the model.
Figure 4.1 presents the major components of the model.
The memory modules comprise:
The memory processes are:
Input is in the form of internally recognizable units of information which are termed stimuli. Each stimulus consists of a stimulus name followed by a list of classifiers and a list of attributes. Each classifier denotes mode of input and primary or secondary value of a stimulus. The attributes pertain to position, intensity, etc.
The Perceptual Process inserts each stimulus into a Stimulus Symbol Table, and returns an internal symbolic identifier denoting the unique position of each stimulus in the Stimulus Symbol Table. Thus the internal encoding of a stimulus represents its name plus its known classifiers and attributes.
SSTM contains a string of symbols each representing a stimulus facet after environmental input has occurred. Thus we have an input:
Stimulus Name / Classification List / Attributes list → Internal symbolic identifier
For example:
TOUCH / Tactual / Left-cheek, Soft → Integer table position
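A minimal sketch of this encoding, assuming a simple list-backed Stimulus Symbol Table and a crude name-based notion of the nearest equivalent entry, might be:

# The Perceptual Process returns an integer identifier: the table position of
# the stimulus if it is known, otherwise that of its nearest equivalent entry.
class StimulusSymbolTable:
    def __init__(self):
        self.entries = []                               # (name, classifiers, attributes)

    def encode(self, name, classifiers, attributes):
        for position, (n, c, _) in enumerate(self.entries):
            if n == name and c == classifiers:
                return position                         # the stimulus is already known
        for position, (n, _, _) in enumerate(self.entries):
            if n == name:
                return position                         # unknown: nearest equivalent by name
        self.entries.append((name, classifiers, attributes))
        return len(self.entries) - 1                    # a wholly new stimulus gains a new entry

table = StimulusSymbolTable()
print(table.encode("TOUCH", ["Tactual"], ["Left-cheek", "Soft"]))   # -> 0 (new entry)
print(table.encode("TOUCH", ["Tactual"], ["Left-cheek", "Soft"]))   # -> 0 (known entry)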
For stimuli which are already known to the model, the table position gives the actual entry position. If the stimulus is unknown, then the table position gives its nearest equivalent entry. Initially, SSTM contains these symbols in the order of input. However, each symbol may be regarded as having a particular entrance priority for transmission into STM. These priorities are dependent on:
The preceding five categories of symbols are identified by comparing items in SSTM with those already in STM. The remaining symbols in SSTM, therefore, do not contain copies in STM. These are each assigned a priority dependent on how well known (given by the Stimulus Symbol Table) they are to the model and are input in order of their novelty value, i.e. less known ones entering first.
Symbols already contained in STM (but not having copies in SSTM) now become candidates for overwriting dependent on:
SSTM - STM transference continues until such time as:
If transmission ceases before SSTM is exhausted then the remaining symbols become inaccessible to the system and are lost from SSTM.
After SSTM - STM transfer, STM may contain newly input symbols, symbols resident from a previous input cycle and internally generated expectation symbols from the previous cycle.
STM therefore contains a sequence of independent entities, each denoting an internally recognizable unit of stimulus information. However, it is a completely unordered list, each symbol containing a time-of-entry field and a tag field which is used for keeping priority information or information upon its usage to date (whether the symbol has been attended to).
The size of STM is the number of symbols that it may contain in any given time interval and remains constant within that time interval.
The symbols resident in STM constitute the current attention span of the system and serve to evoke past experiential encodings stored in LTM. A symbol may activate a permanent memory structure in LTM if it matches any symbol contained in the Contingency descriptor field (Conditional field) of that structure. Thus partial matches (only a subset of the field being matched) and multiple matches (one symbol evoking more than one structure) may occur.
A limit may be imposed upon the number of structures that may be chosen as candidates for activation in any cycle. This usually serves to restrict the search to those structures contained in the higher levels of LTM, and hence reduces the search space and search time involved.
However, since multiple matches may occur, some process for resolving conflict must exist. This is done by assigning each structure a value for its likelihood of activation known as its inherent Worth.
The Worth value of a structure is computed dependent on:
The computed Worth values define a probability value for the activation of the structure. The highest Worth structure will therefore have the greatest probability for activation, but due to the probability function involved, will not always be selected.
A structure chosen for activation may eliminate other candidates from the list since a symbol may only activate one structure (the chosen structure inhibiting the other structures evoked by that symbol). Thus only a subset of the candidate set may actually be activated. Further, there exists a parameter for controlling the number of structures that may be activated simultaneously within any one time interval. Thus, the first chosen may be the only activated structure. If more than one structure be chosen for activation, then their action fields are said to be executed in parallel. There exist no processes, therefore, for resolving conflicting actions (thus a left head turn may be activated at the same time as a right head turn), such conflicts being believed to be resolved at the muscular level.
The result of a structural activation is the execution of its action field. This may constitute one or more external responses and/or one or more internally generated expectation symbols.
The responses are written out into an output field, each consisting of a numeric identifier. Prior to environmental output, these identifiers are decoded into Response names. They may, for example, be decoded into the response elements:
TURN HEAD LEFT, MOVE RIGHT HAND
The internally generated symbols have now to be written into STM. If these exceed the available STM capacity then the excess symbols are considered lost to the model. If an internal symbol is already in STM (from a prior generation) then it is overwritten by the new generation (and its time of entry is updated). Next, all STM symbols which have been attended to in the current cycle are overwritten. Next, those symbols which have not been attended to are overwritten, and finally, the remaining internal symbols are written into the empty STM positions still available until capacity is exceeded.
Internal expectational symbols may only be those symbols (constituting a stimulus) which are already known to the Perceptual Process and entered in the Stimulus Symbol Table.
When written into STM they serve to give entrance priority to matching symbols in SSTM. They may therefore be seen as perceptual alerts, the Perceptual Process having a lower threshold of input for that which is expected than for that which is not (confirmed expectations getting the highest input priority for STM entry).
That structure chosen first for activation competes with the most significant previously activated structure for the position of the most significant activation to date.
The previously chosen significant structure has its Worth value decremented each interval by a decay factor. Thus new activations with a high enough Worth may overwrite old activations whose Worth has decayed.
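A sketch of this competition follows (the multiplicative form of the decay and its value are assumptions made for illustration; the thesis does not specify the form of the decrement here):

    # Sketch: the most significant activation to date decays each interval and
    # may be displaced by a newly chosen structure of sufficiently high Worth.
    DECAY_FACTOR = 0.9   # assumed per-interval decay

    def update_most_significant(current, current_worth, new, new_worth):
        current_worth *= DECAY_FACTOR              # old significance fades
        if current is None or new_worth > current_worth:
            return new, new_worth                  # new activation takes over
        return current, current_worth

    print(update_most_significant("P_old", 0.5, "P_new", 0.48))   # P_new wins after decay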
The system also maintains a history of the expectational symbols generated to date. Once a symbol is confirmed it is taken off the list. Otherwise it may be kept reverberating until its Worth value becomes too insignificant for it to remain on the list.
The Learning Mechanism creates new structures for insertion into LTM by applying transformation operators upon currently activated structures and the most significant structure to date. These operators serve to form near copies of a structure by expanding (generalizing) or contracting (instantiating) them on the basis of the information currently in STM and the changes in need levels within the system. Structures already existent in LTM may be subject to positional change by the Reinforcement Processes on the basis of confirmed expectations, altered need levels and the nature of newly input symbols into STM.
These processes describe a complete input - update - output cycle executed by the model. It varies from a pure recognize-act cycle due to the learning processes occurring within the cycle. Each such cycle constitutes a psychological quantum of time. Thus time is incremented by that interval comprising a psychological quantum of time.
Each memory structure is a Production Rule, a number of such Rules comprising the entire Production System in LTM. In contrast to the nature of stored information in SSTM and STM, a Production Rule once created remains for relatively lengthy periods. The reason it cannot be regarded as being permanent is due to the limitations imposed upon LTM capacity. Thus, once the limit is reached, Rules get deleted through competition from newly created Rules. Rules are selected for deletion if they have remained unused over a long period of time.
A Production Rule is basically a two-tuple comprising a Contingency - Response pair. Each Rule is a potential candidate for activation if it bears symbols in the Contingency field which match any symbol in STM.
A Contingency is defined by a list of symbols S1, S2, ..., Sn and may also be a null-list.
A Response is defined by a list of response symbols r1, r2, ..., rm and a list of internally generated expectations S1', S2', ...., Sℓ' either or both of which may also be null.
A Production Rule may therefore be thought of as, in its simplest form, a stimulus-response pair (S-R).
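Purely as a sketch (the field names, and the use of a class at all, are assumptions of this illustration), such a two-tuple and its partial-matching behaviour might be written:

    from dataclasses import dataclass, field
    from typing import List

    # Sketch: a Production Rule as a Contingency - Response two-tuple, with the
    # Response split into external response elements and expectation symbols.
    @dataclass
    class ProductionRule:
        contingency: List[str] = field(default_factory=list)    # S1 ... Sn (may be empty)
        responses: List[str] = field(default_factory=list)      # r1 ... rm (may be empty)
        expectations: List[str] = field(default_factory=list)   # S1' ... Sl' (may be empty)

        def is_candidate(self, stm_symbols):
            # a partial match (any one contingency symbol present in STM) suffices
            return any(s in stm_symbols for s in self.contingency)

    rule = ProductionRule(["TOUCH-LEFT-CHEEK"], ["TURN HEAD LEFT"], ["NIPPLE-IN-MOUTH"])
    print(rule.is_candidate({"TOUCH-LEFT-CHEEK", "HUNGER"}))     # True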
Depending upon the familiarity of an environmental event, several structures may become candidates for activation upon its occurrence. However, conflict is resolved by applying the Worth Function, thus narrowing down the alternatives and allowing the eventual choice to be on a probabilistic but contextual basis.
The properties of a Production System that are relevant to cognitive modelling will be further discussed in Sections 5.4 and 5.4.1. Suffice to say now that:
The Learning Mechanism provides a means by which new Rules may be acquired. This acquisition process is necessary if the model wishes to:
The learning processes enable the model to evoke a series of Rules, each reflecting the degree of criteriality of a particular stimulus symbol to the definition of a particular contingency by virtue of the Rule's status within the System. They also enable the model to execute those response elements which generate the most consistent consequences. The rules for generalizing and instantiating a Rule, together with the fact that a modified Rule always creates a new Rule rather than an altered substitution of itself, mean that the model may identify variations within similar environmental definitions. Thus a Rule commencing with a general description of a certain class of events may, through the Learning Mechanism, develop a number of descendant Rules which define individual component events of the class originally defined.
A Reinforcement process enables the system to alter its estimate of Rules dependent on incongruities between environmental feedback and expected consequences.
The Rule may also be thought of as defining:
an environmental event - responses to be performed by the model - the environmental consequences to be expected.
A Rule encoded in this manner has indirect means of invoking other Rules within the Production System.
Thus a Rule of the form:
S1 S2 .... Sn ⇒ <r1, r2, ..., rm> S1' S2' ... Sℓ'
may be considered as indirectly capable of activating any Rule of the form:
Si ⇒
where Si ⊂ {S1', S2', ..., Sℓ'}
The Production System is initially considered composed of a number of Reflexive Rules of the form:
S ⇒ <r>
where the symbol S innately activates the response element r.
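An initial Production System of this reflexive kind might look as follows (a sketch only; the particular stimulus and response names are illustrative assumptions):

    # Sketch of an innate, reflexive Production System: each Rule has the form
    # S => <r>, a single stimulus symbol innately activating a response element.
    REFLEXIVE_RULES = [
        ("TOUCH-LEFT-CHEEK",  "TURN HEAD LEFT"),    # rooting reflex, left
        ("TOUCH-RIGHT-CHEEK", "TURN HEAD RIGHT"),   # rooting reflex, right
        ("NIPPLE-IN-MOUTH",   "SUCK"),              # sucking reflex
        ("PALM-PRESSURE",     "GRASP"),             # grasp reflex
    ]

    def reflexive_responses(stm_symbols):
        # every Rule whose stimulus symbol appears in STM is activated
        return [r for s, r in REFLEXIVE_RULES if s in stm_symbols]

    print(reflexive_responses({"TOUCH-LEFT-CHEEK", "HUNGER"}))   # ['TURN HEAD LEFT']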
The Production System is built up incrementally from its initial state by the Learning Mechanism which creates and adds one or more new Rules in any time interval. Thus LTM may be thought of as equating to human long term memory, wherein the arousal of a stored memory from its previously quiescent state results in the creation of a subtly altered copy of itself and in the alteration of the links it has with other stored memories.
As Rychener stated, "A P can be seen as a generalization of the notion of a stimulus-response path, where stimulus has been generalized to include internal symbol structures, and where response has become a sequence of internal symbolic manipulations and signals associated with motor commands" (Rychener, 1976).
A Rule is positively reinforced when:
A Rule is negatively reinforced when:
By using an ordered set of Production Rules, the Learning Mechanism is able to keep a history of its estimate of each Rule inherent in the position of the Rule itself.
The entire Production System by virtue of its dynamic alterations, may be seen as becoming more and more in tune with its environment and thereby constructing a more and more accurate model of its world and maintaining a history of its interactions with this world.
The Learning Mechanism incorporates the following fundamental algorithms required for creating new Rules.
If two Rules are obeyed together within the same time interval, say:
P1: S1..Sn ⇒ <r1 .... rm> S1' ....Sℓ'
P2: S1..Sα ⇒ <r1 .... rβ> S1' ....Sγ'
then new Rules may be created by the addition of one element of one stimulus set to the stimulus set of the other Rule:
P3: S1..Sn Si ⇒ <r1 .... rm>
where
Si ⊂ S1 ... Sα
and
P4: S1..Sα Sj ⇒ <r1 .... rβ>
where
Sj ⊂ S1 ... Sn
by the addition of one element of the response set of one Rule to the response set of the other:
P5: S1..Sn ⇒ <r1 .... rm ri>
where
ri ⊂ r1 ... rβ
and
P6: S1..Sα ⇒ <r1 .... rβ rj>
where
rj ⊂ r1 ... rm
and by adding an expectational stimulus from one Rule into the other, giving:
P7: S1..Sn ⇒ <r1 .... rm> S1' ... Sℓ' Si
where
Si ⊂ {S1' ... Sγ'}
Note that in all cases the expectational stimulus set has been eliminated from the associated process.
In each case the element for combination is chosen at random from the appropriate set.
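A sketch of this law of combination is given below (a simplified reading; the dictionary layout and the decision to build the P3, P5 and P7 forms in one pass are assumptions of the illustration). The symmetric forms P4 and P6 follow by calling combine(p2, p1).

    import random

    # Sketch of the combination law: two Rules obeyed in the same time interval
    # may each lend a randomly chosen element to a near-copy of the other.
    # A rule is a dict with 'contingency', 'responses' and 'expectations' lists.
    def combine(p1, p2):
        new_rules = []
        if p2["contingency"]:                       # P3 form: borrow a stimulus element
            si = random.choice(p2["contingency"])
            new_rules.append({"contingency": p1["contingency"] + [si],
                              "responses": list(p1["responses"]),
                              "expectations": []})  # expectational set eliminated
        if p2["responses"]:                         # P5 form: borrow a response element
            ri = random.choice(p2["responses"])
            new_rules.append({"contingency": list(p1["contingency"]),
                              "responses": p1["responses"] + [ri],
                              "expectations": []})  # expectational set eliminated
        if p2["expectations"]:                      # P7 form: borrow an expectation
            sd = random.choice(p2["expectations"])
            new_rules.append({"contingency": list(p1["contingency"]),
                              "responses": list(p1["responses"]),
                              "expectations": p1["expectations"] + [sd]})
        return new_rules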
If the most significant activation to date was the Rule:
P1: S1..Sn ⇒ <r1 .... rm> S1' ....Sℓ'
and if the current contents of STM are:
S1 ... Sα
then a new Rule is created of the form:
P2: S1..Sn ⇒ <r1 .... rm> S1' ....Sℓ' Si
where
Si ⊂ S1 ... Sα
since the activation of P1 is believed to have led to the environmental event resulting in the current contents of STM.
Only one Si of the stimulus set within STM is chosen for combination with the expectational stimulus set of P1. This is chosen at random.
In special cases, if ∃ an Sj ⊂ S1 ... Sα where Sj is a novel stimulus, then this has precedence for combination in the creation process.
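A sketch of this step follows (the dictionary layout and the is_novel predicate are assumptions; STM is assumed non-empty):

    import random

    # Sketch: the most significant Rule to date gains one expectational symbol
    # drawn from the current contents of STM, a novel symbol taking precedence.
    def extend_expectations(most_significant, stm_symbols, is_novel):
        novel = [s for s in stm_symbols if is_novel(s)]
        chosen = random.choice(novel) if novel else random.choice(list(stm_symbols))
        return {"contingency": list(most_significant["contingency"]),
                "responses": list(most_significant["responses"]),
                "expectations": most_significant["expectations"] + [chosen]}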
It is necessary for the learning process to refine an environmental definition by arriving at that Rule which contains only the set of criterial elements.
The Law of Redundancy serves to eliminate non-criterial information on the basis of currently perceived information.
If the Rule activated at T0 was:
P1: S1..Sn ⇒ <r1 .... rm> S1' ....Sℓ'
and ∃ any Si ⊂ S1 ... Sn not belonging to the stimulus set currently contained in STM (a non matching element), then a new Rule P2 is created where Si is eliminated from the stimulus set.
P2: S1..Sn-1 ⇒ <r1 .... rm>
such that
Si ⊄ S1 .... Sn-1
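A sketch of the Law of Redundancy (assuming, for illustration, that one non-matching element is eliminated per application):

    # Sketch: a contingency element not currently matched in STM is eliminated,
    # yielding a descendant Rule P2 with the non-criterial element removed.
    def apply_redundancy(p1, stm_symbols):
        unmatched = [s for s in p1["contingency"] if s not in stm_symbols]
        if not unmatched:
            return None                       # nothing non-criterial to eliminate
        si = unmatched[0]                     # assumed: one element removed at a time
        return {"contingency": [s for s in p1["contingency"] if s != si],
                "responses": list(p1["responses"]),
                "expectations": []}           # expectational set dropped, as in P2 above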
This law serves to add criterial information onto a newly created Rule.
If the Rule activated was:
P1: S1..Sn ⇒ <r1 .... rm> S1' ....Sℓ'
and ∃ in STM any Si where Si ⊄ S1 ... Sn then a new Rule is created:
P2: S1..Sn Si ⇒ <r1 .... rm>
If Si is a Primary and there exists another Sk in STM where Sk ⊄ S1 ... Sn, then Si is chosen for addition over Sk, the Primary taking precedence.
Further, if Si is a Primary, then the new Rule created is of the form:
P3: S1..Sn Si ⇒ <r1 .... rm> S1' ... Sℓ'
the Rule being allowed to include the expectational stimulus set belonging to P1.
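A sketch of the Law of Insufficiency (the set of Primary names and the dictionary layout are assumptions of the illustration):

    # Sketch: an STM symbol absent from the Rule's contingency is added,
    # a Primary (HUNGER or PAIN) taking precedence over other symbols.
    PRIMARIES = {"HUNGER", "PAIN"}

    def apply_insufficiency(p1, stm_symbols):
        extra = [s for s in stm_symbols if s not in p1["contingency"]]
        if not extra:
            return None
        primaries = [s for s in extra if s in PRIMARIES]
        si = primaries[0] if primaries else extra[0]
        return {"contingency": p1["contingency"] + [si],
                "responses": list(p1["responses"]),
                # only the Primary form (P3 above) retains the expectational set
                "expectations": list(p1["expectations"]) if si in PRIMARIES else []}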
These laws serve to alter the contents of the Production System incrementally, on the basis of the current contents of STM (reflecting the environmental information available to the model) and on the past significant actions undertaken by the model.
However, these laws constitute a minimal set of learning algorithms. For a more varied and complex set of behaviours to evolve, it may be necessary to formulate a more comprehensive set of algorithms.
A diagram of effects is given below showing the flow of information through the Learning Mechanism (LM), and its area of control over the internal state of the model.
The Drive Process controls and regulates the activities of the basic needs within the model, the two considered being Hunger and Pain.
The Drive Process computes the level of Hunger and Pain during each time interval. An internal Drive stimulus - called a Primary - is emitted during any time interval dependent on the computed need level. The two Primary stimuli are HUNGER and PAIN, each being emitted according to the levels of Hunger and Pain within the model. Both are emitted on a probability basis where
P(HUNGER) ∝ Hunger level
P(PAIN) ∝ Pain level
and P(HUNGER) = (maximum content - present content) / maximum content
To view the need level as having content is analogous to viewing the system as a food storing, consuming and expelling machine. The content of the machine defines, at any time, the level of Hunger, where content ∝ 1/(need level), i.e. the greater the content, the more food there is in the machine and hence the lower the level of the allied need.
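A sketch of the emission rule for the HUNGER Primary, using the probability defined above (function name and sampling scheme are assumptions):

    import random

    # Sketch: emit the HUNGER Primary with probability
    # (maximum content - present content) / maximum content.
    def maybe_emit_hunger(present_content, maximum_content):
        p_hunger = (maximum_content - present_content) / maximum_content
        return "HUNGER" if random.random() < p_hunger else None

    # a half-empty store emits HUNGER on roughly half of the intervals
    emissions = sum(maybe_emit_hunger(50.0, 100.0) == "HUNGER" for _ in range(1000))
    print(emissions)    # about 500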
It is obviously difficult to compute factors such as:
However, the model requires:
The Drive Process influences the Learning Mechanism through the Law of Insufficiency (where addition of a Primary takes precedence) and through the Reinforcement process. In this case the following steps are taken:
If the Rule activated at T0 was:
P1: S1..Sn ⇒ <r1 .... rm> S1' ....Sℓ'
where ∃ an Si ⊂ S1 ... Sn and Si is a Primary and
Hunger level (T1) > Hunger Level (T0)
(the hunger has increased) then P1 is demoted as a result of having failed to alleviate the allied need.
However, if Hunger level (T1) < Hunger level (T0) then P1 is promoted as a successful Rule, (the same applies to Pain).
The Pain level is treated slightly differently. If at T0 the Rule activated was P1, and Pain level (T1) > Pain level (T0) then a new Rule is created of the form:
P2: S1..Sn ⇒ *<r1 .... rm>
An Inhibit Rule of the type P2, if chosen for activation at T0, states:
Inhibit the set of responses belonging to P2. If any ri ⊂ r1 ... rm exists in any other Rule chosen for activation at T0, then inhibit the corresponding Rule.
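One possible rendering of the effect of an Inhibit Rule on the set of activated Rules is sketched below (the 'inhibit' flag and dictionary layout are assumptions):

    # Sketch: an Inhibit Rule suppresses its own responses and any co-activated
    # Rule that shares a response element with it.
    def apply_inhibit(activated_rules):
        inhibited = set()
        for rule in activated_rules:
            if rule.get("inhibit"):
                inhibited.update(rule["responses"])
        return [rule for rule in activated_rules
                if not rule.get("inhibit")
                and not inhibited.intersection(rule["responses"])]

    rules = [{"responses": ["CRY"], "inhibit": True},
             {"responses": ["CRY", "KICK"]},
             {"responses": ["SUCK"]}]
    print(apply_inhibit(rules))    # only the SUCK Rule survives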
Figure 4.2 serves as a summary to the behaviour of the model.
It may be seen as proceeding through the following steps:
A number of components in this model are believed to be subject to physical maturation. Maturational processes have an effect upon cognitive development since they impose, at any time, limits upon the rate and quantity of information available for internal processing.
The quantity of information that may be perceived, (input to SSTM), attended (input to STM via the communication channel), invoked (LTM search space), activated (number of structures that may be activated simultaneously) and processed (the set of learning algorithms available) is subject to the level of maturation achieved by the memory buffers (their physical size) and the memory processes.
Further, the rate of decay of a memory item may also involve maturational factors, such that the early infant possesses very little capacity for retention (decay rate is high both for items in STM and significant structures from past activations), whilst adults possess far greater capacity for maintaining reverberation. The latter is subject to debate, though, since better retention may purely be due to the presence of rehearsing algorithms which deliberately rehearse items in STM and hence keep them reverberating longer.
It is obviously difficult to know exactly how maturing components aid development. However, it is presumed that:
Such memory components have been defined as parameters to the model such that their actual value may be specified prior to commencing any experiments.
The size of SSTM may be specified as containing N items of stimulus information during the input phase. N defines a maximum capacity for SSTM, and at any time the input stimulus information may be ≤ N items. If input exceeds N items, then those items from the (N + 1)th item onwards are disregarded.
The communication channel bears the following characteristics:
Its capacity is specified upon input and may be any integer value C, where Cmax defines the upper limit. In any time interval C items may be transmitted from SSTM to STM, where C ≤ Cmax. Once this capacity is exceeded, transmission ceases and left-over items in SSTM are destroyed. C may be altered by the Perceptual Process as an autonomous defence function. This happens when any need level becomes acute: channel capacity is reduced to effectively 1, where the only item of information that may be input is that Primary stimulus generated by the Drive Process associated with the acute need level.
STM capacity is also subject to maturational processes from some initial birth capacity to some maximum which may or may not correspond to Miller's magical number seven plus or minus two chunks (Miller, 1956).
Pascual-Leone (Pascual-Leone, 1970) visualised this measure as growing in an all-or-none manner as a function of age in normal subjects. He considered it to be a quantitative measure, characteristic of each Piagetian developmental stage, the difference between the measures in any two adjacent stages being a constant and indicative of the intelligence of the subject.
In terms of the model, STM capacity is specified as some quantity C where Cmin ≤ C ≤ Cmax and C is not necessarily integer. Thus if Ci < C < Cj where Ci and Cj are the nearest integers to C, then the probability of Cj being chosen is (C - Ci) and P(Ci) = (Cj - C).
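A sketch of this probabilistic resolution of a non-integer capacity (function name assumed):

    import random

    # Sketch: a non-integer STM capacity C is resolved, each interval, to one of
    # the two nearest integers with P(Cj) = C - Ci and P(Ci) = Cj - C.
    def resolve_capacity(c):
        ci = int(c)        # nearest integer below C
        cj = ci + 1        # nearest integer above C
        return cj if random.random() < (c - ci) else ci

    samples = [resolve_capacity(3.7) for _ in range(1000)]
    print(samples.count(4) / 1000.0)    # roughly 0.7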
It must be remembered that the capacity of a memory buffer does not imply the physical presence of a set area of storage space of a specified size. Rather, if one thinks in terms of reverberational items, then the rate of reverberation of an item defines the memory buffer which it occupies. Thus SSTM - STM transmission may be thought of as raising the level of reverberation of N items wherein only C items may achieve the higher rate of reverberation, simultaneously (C ≤ N). Thus maturation involves the growth in the capacity for maintaining a number of simultaneously reverberating items and not an actual physical extension.
The capacity of LTM defines the number of Production Rules that may be maintained simultaneously in LTM. Once this capacity is reached, structures have to be deleted in order that new structures may be inserted. Deleting structures may be thought of as equivalent to the process of forgetting, whence the memory is no longer available to access. Structures are chosen for deletion on the basis of the number of times they have been activated and their time of residence in LTM.
The model does not cater for maturational processes with regard to the learning algorithms. If such a feature existed it would involve the unveiling of pre-set algorithms according to the age and developmental level of the subject - e.g. chunking and recoding algorithms (discussed further in 9.2.1).
A psychological quantum of time defines one trial or one executional cycle of the model from stimulus entry to response output. Each unit of time is considered to be Δt seconds long. The question is whether Δt is subject to development. Supposing it were, it would be reasonable to have it contracting as development progresses, such a contraction not affecting the maximal set of stimuli that may be simultaneously recorded. Further, it may be altered by autonomous responses such as an Orientation Response (dilating Δt) and a Defence Response (contracting Δt) (see Cunningham, 1972).
The cognitive theory presented in this thesis assumes that memory has three differing functional levels. A transient sensory buffer contains sensory stimuli upon registration of stimulus energy upon the sensory organs. The Perceptual Process transforms the input into a set of internally recognizable symbol structures which are then ordered and input into STM. Symbols in STM evoke past experiential encodings in LTM. Each evoked encoding has an intrinsic value computed for it which denotes its likelihood of activation. Activated structures may inhibit other structures bearing symbols in common. An activation may result in the generation of a set of external responses and/or internal symbols which are written back into STM. A set of genetically encoded learning algorithms serves to create new experiential encodings and to alter the status of existing structures within LTM. The theory assumes that these processes occur within a psychological quantum of time, which interval defines a complete input - update - output cycle.
When an individual is born into this world as an infant, he faces the awesome task of having to understand the nature of the world about him. In order to do so, and taking a cognitive viewpoint, he constructs an internal model - one part of which is concerned with understanding and construing the physics of his world, and the other part of which is concerned with understanding himself, which is reducible to understanding his model. Both parts must serve his physical and psychological self, and hence the two must, by definition, have much in common.
To understand a human's psychological self, then, we are faced with two alternatives:
The first approach is undertaken by psychoanalysts, philosophers, and many psychologists. The second approach is taken by Artificial Intelligence workers and some psychologists.
In attempting a computer simulation of human cognition, the model-builder must define a structure for his model and laws that it may follow in order to perform its function.
He is faced with two alternatives:
Whichever alternative he takes, his model must be able to answer those questions that interest him about the process that he is modelling. The purpose of a cognitive model is that it may take the position of a human, and when put to any test must perform in such a way as to (i) appear more or less like the human, and (ii) produce responses to that test which, when analysed, lend insight into the model that it has built. Thus, not only do we have the results of the test at hand (being the output of the model) but we can also dissect our model in order to determine why it behaved the way it did (to window in on the structure of its model of the world at the time of undertaking the test).
At all times, though, the model-builder must keep in mind that his model is, after all, only a model, and to read too much into the reasons it performed the way it did, might give a completely misleading insight into the relative human behaviour.
The model presented in this thesis was based:
It is considered reasonably difficult for a computer simulation to be discarded as totally inadequate if a large number of the answers it provides bear a striking resemblance to the real-world situation. The model presented in this thesis has, to date, performed sufficiently like a human infant for the resemblance not to be totally coincidental, and the objective is to allow it to progress through more and more experimental situations and to observe, discuss and analyse its behaviour throughout. This is in the hope that it will predict some aspect of human behaviour in particular situations of which, at the time, the model-builder had no cognizance, and hence justify its creation.
The purpose of the cognitive model is to construct a model of its world, given a set of endowed processes and brain architecture, and an encoded description of its outside world as defined by the programmer. Its mode of perceiving, conceptualising, and hence building its own knowledge base and learning to understand the representational knowledge therein, i.e. to convert it into operational knowledge, is all given in the form of a suite of computer programs and data structures. The design of the program was influenced strongly by certain beliefs held on the nature of perception, cognition, and learning as undertaken by the human infant.
The following sections will attempt to give a psychological interpretation of the computer model and the theories upon which the model was built.
The theory adopted in this thesis is that Perception may be thought of as a process prior to cognition which, though it performs the primary stages of cognition, is of a different nature to cognition, possessing genesis and structural laws of its own.
Perceptual activity is portrayed in this model as made up of the following characteristics:
These are processes unique to Perception (although the processes of cognition may perform similar functions) and may be thought of as innate laws of Perceptual Organisation.
It is of a different nature to cognition, in that the knowledge produced by Perceptual Activity is of a purely speculative nature, unlike the operative processes of cognition.
Perceptual Learning is not a feature of this model since there are no chunking processes incorporated into Perception. However, in the final chapter, a method for introducing such a process will be described and discussed.
The Perceptual Process is seen as maintaining its own temporary storage module (SSTM) and its own, more permanent Knowledge Base. The Knowledge Base, though it is obviously of the same fabric as the rest of memory, differs from LTM in one important aspect: only the Perceptual Process may enter any data into this memory block and it is only accessible by the Perceptual Process during the act of Perception (though at other stages it is accessible to the Cognitive Processes). It provides a long term memory of every sensory element known to the organism and hence provides Perceptual Activity with an inherent knowledge of how to deal with a sensory element that has just been encountered. In other words, it defines the type of Perception being employed.

Perception, being a differentiating process, must be able to differentiate the mass of sensory data received into some coherent structure. It does so by classifying the data into mode of input (the sensory channel which obtained the data) and into primary or secondary value. During this process, each sensory element is incorporated into the Perceptual Knowledge Base. In its constructive phase (strictly speaking, all the different phases conjoin so as to appear as one process performing the different types of action), the Perceptions encode each element into its internally identifiable form and assimilate the data totally into the Knowledge Base. This process of incorporating a previously unknown element, or a new attribute of an element, into its correct place in the Knowledge Base is seen as identical to Piaget's Assimilatory process. Assimilation here is the active incorporation of a perceived element into the Perceptual Knowledge Base in such a way that all its connectives are correctly observed (the element may be thought of as being connected to other similar components in the base; each time a new element is assimilated, all its ends must be connected, and any broken ends - links - resulting from the insertion of the new element must themselves be re-connected). Once an element is assimilated it is thereafter known to the system in a unique, internal form - which may be thought of as being equivalent to a percept. (Eventually, the percept must be more than just the transform of one sensory element, and if we think of chunking items then the percept must equate to "chunks".)

It is the comparative process available to the Perceptions which enables them to discriminate amongst the total set of sensory data received. The Perceptions are alerted by the central processes of cognition to look out for the occurrence of certain significant sensory elements which have been predicted by the central processes. Any matching sensory element is immediately identified by the Perceptions as being of greater significance than non-matching elements. However, a number of priorities are observed by the Perceptions, dependent on the current state of the system (its expectations and its needs) and on the nature of the sensory input. The selection of the Perceptual Set is always, therefore, a discriminative process, certain elements being actively passed on to the central attentive processes, and certain being passed on dependent on how much attentive capacity is currently available. A stimulus may therefore be sometimes discarded, sometimes sent directly through for attention and sometimes sent through dependent on currently available capacity, i.e.
the threshold for entry of a stimulus varies as a function of the current state of the Perceptions and of Perceptual Activity, being raised or lowered as deemed necessary. The Perceptions have no direct influence upon cognition, although what is currently perceived must, to some extent, define the Attentional Set. However, if the organism is in a state of acute need, then the Perceptions may influence cognition to a large extent by serving to cut down drastically upon the selective capacity of cognition. They do so by reducing the Attentional Set to one stimulus element only.
In this model Perceptual Activity has no specific motor element, other than the indirect influence exerted upon cognition by the determination of the Perceptual Set (and hence, to some extent, the Attentional Set). Perception, therefore, determines the form of cognition that is to be employed, but does not actively direct it.
Perceptual Activity exists always in the human, from infancy to death. Whether it does so in an unaltered form is debatable. The activities undertaken may themselves not be subject to change, but the output produced may be seen to increase in richness due to (i) the physical maturation of the sensory organs, and (ii) the extension of the Perceptual Knowledge Base to cater for more and more experiential knowledge. Since the processes are believed to be genetically encoded, that they themselves can change within a lifetime must be improbable if not impossible. However, this is not to say that the perceptual processes inherited by today's infant do not differ from those of his ancestor of two thousand years ago, due to the processes of evolution and to the rich educational and social structures available to today's infant. What we are capable of perceiving today, and hence cognizing about, must be superior to the perceptual and cognitive abilities of our forefathers, and hence what our descendants may be capable of, provided the species survives, must surpass ours.
When the cognitive structures contained in LTM are activated by the Cognitive Processes, they yield operative, conceptual and cognitive activity. Whilst they remain unactivated they contain Representative Knowledge, and define the total Representative Knowledge available to the organism. When stimulated into activation, they yield:
The initial cognitive structures available to the organism yield Reflexive Activity and are genetically endowed to the system as species-specific knowledge.
The Learning Mechanism, constituting the learning process, may be seen as the means by which the organism acquires further Representational Knowledge, thereby allowing for further Operative Knowledge. His steady progress may be seen as brought about by the steady acquisition of new Representational Structures and by the modification of currently existent structures, the two combining to extend the scope of his knowledge and attune it to his environment.
The nature of knowledge, particularly with regard to the infant, is a fundamental issue. Is the infant capable of conceptualizing from birth, or is this ability brought about through a developmentary process, steadily imbuing his Knowledge Structures with greater and greater certainty and resolution?
A theory of structural genesis proposes the belief that certain structures in the human mind have their origin in the genes. This means that experience merely provides the data upon which the innate structures operate to obtain the initial ideas contained in the human infant's mind.
Innate structures equate, in this thesis, to two different objects. Namely, genetic data structures encoded in the infant which enable him to recognise (through Perceptual Activity) certain invariant patterns of stimulus energy and to which he initially responds purely reflexively, and, a set of genetic processes which operate upon the data structures to create internal activity: perceptual, operative, conceptual and cognitive.
The thesis, incorporating this belief in structural genesis, introduces a model of cognition, which contains a specified brain architecture, preformed data structures and a set of encoded processes, which conjoin and act upon the data of experience, to produce its own model of the world, enabling it to behave in a remarkably infant-like manner.
The development of all physical concepts, of emotions, of morals and of ego must, therefore, have its origins in the innately given structures of the mind. The experiences that the infant undergoes merely define the time and the way in which they develop, their very development being dependent on the level of genetic programming endowed to the infant.
From this theory emerges the notion that intelligence, being the outward aspect of developmentary processes, is determined by the innate programming available to the infant, and only influenced by experiential data as to its outward form of manifestation. One facet of intelligence is the acquisition of a Knowledge Base incorporating those experiential objects that exist in the environment, i.e. a Representational Knowledge Base. It, however, also involves the acquisition of beliefs - a notion as to the purpose and mode of interaction of these known objects and himself. It defines the response outwardly to be employed and, when the Knowledge Base is sufficiently extended, incorporates a belief in the consequence of his actions upon the environment. Thus, besides the formal representation of an object in the mind, there is also acquired some idea as to what it means to the self, and how it may be used to obtain environmental consequences and need gratification. Each object is defined, therefore, by a cluster of structures which may be portrayed as a fringe built up around each central facet - the fringe corresponding to facets of that object in different environmental contexts and in different organismic states. The Knowledge Base is eventually interwoven by fringe and overlapping fringe, where each region of overlap is the association of one object with another, conveying the notion of grouping objects and events together on some emotional, temporal and/or affective basis. It is this grouping together of objects and of structures which gives the possibility of constructing complex organisations of belief and action, which eventually leads to the emergence of intelligent behaviour. It is the creation and evolution (within the system) of grouped cognitive structures, which have their beginnings in genetically endowed sensori-motor reflexes, and the ability to maintain, at all times, some equilibrium between what is thought to exist and what does exist, that allows for the emergence and stabilisation of the higher thought processes.
The problem is therefore to understand how operations arise out of material action, and what laws of equilibrium govern their evolution; operations are thus conceived as grouping themselves of necessity into complex systems ... but these, far from being static and given from the start, are mobile and reversible, and round themselves off only when the limit of the individual and social genetic process that characterises them is reached. (Piaget, 1949)
This is to say that the genetically endowed structures and processes interact with experiential data to form more and more complex structures allowing for complex knowledge representations and increasingly complex forms of activity, which forever re-group themselves reflecting changing individual and environmental relations, until some limit of evolutionary growth is reached - this limit being set by the form of the initial genetic endowment available to the infant.
Those processes which serve to activate the Representational Knowledge structures may be thought of as the processes of Cognition. As stated before, they serve to create Operative, Conceptual and Cognitive activity. The structures which may be activated at any time are bounded by the Perceptual Set currently obtained and by past structures which are still reverberating within the system. Cognition, therefore, is not always bounded purely by Perception, and hence is not an activity bounded by the immediate time. It is this ability to free cognition from perception that allows the organism to tend towards more assumptive and imaginary behaviour, the beginnings of this being expressed in infants' Hallucinatory Behaviour patterns.
The presence of a genetic, cognitive program provides a set of constraints (a framework) within which cognitive development may take place. The actual forms that develop, are dependent on the nutriments of the environment which bring about the realisation of the possibilities that inhere in genetic structure (Beilin, 1971).
Cognitive dysfunctions, then, could be due to inherent malfunctions in the genetic program, to physical malformations or to insufficient sensory stimulation. The latter two may be corrected through surgery and special rehabilitation processes. The first we cannot, as yet, correct.
To proceed from perceptual data activating purely reflexive behaviour, to holding, internally, ideas about objects which may be activated without recourse to the perceptual existence of the object, may be represented by the gradual construction of cognitive structures. These, although originally created out of the reflexive structures, eventually, through successive integration and differentiation, become structures rich in Representational Knowledge and potential operative use.
That process, which, making use of the comparative abilities of Perception, and certain inherent laws of its own, creates new structures and alters old structures so as to dynamically reflect the changing relationship between organism and environment, is, in terms of this model, the Learning Mechanism.
The Learning Mechanism by:
It allows, therefore, for cognitive activity to develop in a progressive and natural manner, always as much in resonance with the environment as possible.
To propose an evolutionary theory of knowledge acquisition is to reject the notion of irreversible bonding processes between elements as being the basis of the acquisition of new Representational Knowledge.
Psychological theories which explain behaviour in terms of the bonding of sensory, motor and drive elements (see Adcock, 1964) in different combinations, so as to produce various modes of behaviour, assert that the bonding process is irrevocable. Once two elements are bound together in a particular configuration, they may not be considered as existing separately and available for further bonding.
However, an element S may be present in several different environmental contexts, certain contexts activating reflexive links and others having to be learned about. Thus S may exist in the conditional set of several data structures, which circumstance leads to an incredibly complex system, even given only a few hours of subjective experience. It further implies that although the Reflexive Structures lend themselves initially to knowledge acquisition, after a while more complex structures are available for creative purposes and the reflexive structures themselves (in their original form) may lapse into disuse.
Experiments on cortical lesions and sectioning have shown that certain fundamental behaviours can be relearned (discriminating between horizontal and vertical stripes, for instance, can be relearned if lost after isolation of the visual cortex (Sperry, 1958)). This constitutes considerable evidence for physiological plasticity in the brain - so much so that even complete sectioning may fail to produce any direct symptoms of malfunction.
Evidence of such versatility and flexibility in neural circuitry led Sperry to speculate on diffusely distributed structures of knowledge, and to reject connectivity theories which specify S-R and S-S bondings.
The nature of knowledge, then, appears to be that the structures are scattered over the data base in several similar forms, such that they may lend themselves to activation in subtly different contexts. They bear a variety of associations, such that if, by massive sectioning, an entire portion is lost, then under most circumstances a sufficient number of similar forms are held that the same structures, or similar forms, may be rebuilt. The entire system can, therefore, be regenerated in some relatively unchanged state, if we suppose that each structure can only lend itself to the creation of some comparatively similar structure.
The formulation of a Production System in which any newly created structure is a copy of the older structure which lent itself to the creation, gives rise to a Knowledge Base wherein the structures of knowledge are widely reproducible, context alterable, diffusely scattered, and such that any original forms are rarely lost. Thus, if any knowledge is lost, it can be relearned in a similar form, given the perceptual experience necessary for the relearning.
Further, due to the elemental constituents of a Rule, each Rule bears a potential relationship to several other Rules - those with common or similar elements. Thus, around each notion extends a nexus of other notions, all conceptually associated but structurally different and at different levels of status within the system. A full understanding of an object is only gained through the accumulation of a nexus of understanding about other things, the one being fully understood only when some proportion of the rest is fully understood.
The Knowledge Base evolves upwards and outwards, employing the basic reflexive structures for their beginnings, and then building upon the new as well as the old, until eventually a structure is produced having no relation with any of its ancestors.
It is only when the organism has lost the power of creation that an item of knowledge lost, is lost forever.
The understanding of an idea must be dependent on the degree of understanding afforded to the surrounding nexus, and hence an idea never suddenly appears, but steadily evolves from its created instance to its first activational instance. It is this ability to gradually learn about things in relation to other things that enables the individual to grow in understanding and in his ability to conceptualise. A thing, when known with regard to many other things (within the Knowledge Base), may then be conceptualised abstractly without reference to the concrete and particular which originally gave it being.
The initial knowledge available to the infant, if one maintains a belief in structural genesis, must be instinctual: those structures which are activated by particular patterns of stimulation and which elicit reflexive response patterns.
The infant may be seen to respond reflexively to a variety of stimulation. The grasp reflex enables him to maintain contact with objects he may accidentally touch. The rooting and sucking reflexes are invaluable in the feeding environment, allowing him to find his source of nutrition and then to employ it. He instinctively cries if in distress, which ensures his mother's attendance; he soon learns to smile, ensuring her pleasure; and Bowlby (Bowlby, 1969) stresses that such instinctual behaviours are set up in such a way in the human infant as to ensure proximity with the mother (Als, 1975). He uses his instinctual knowledge not only to obtain comfort and protection, but also for various more specific functions. The repertoire of possible human newborn elicitors important in this context might include visual behaviours such as eye opening, eye contact and visual following, facial expressions such as smiling, frowning, grimacing and lip-pursing, and vocalisations such as voking, fussing and crying (Als, 1975).
His instinctual behaviours serve to elicit responses from his environment. If the caretaker he is provided with (usually the mother) follows the normal pattern of a caretaker, she behaves in a predictable way to his various behaviours. Thus, if he cries she usually appears near him and lifts him up to soothe and pacify him; if he starts seeking her nipple with crude but directed face movements and lip-pursing, she gives him her breast (or a bottle) to suck, or sometimes just a pacifier. She, in fact, responds to some of his instinctual behaviours in a predictable manner and this could lead to the setting up of expectations on the part of the infant, as to how his mother will respond.
This mechanism of expectation, i.e. the belief that through some action of his own a consequent event may be brought about, leads to a kind of problem-solution environment. The infant, in a particular situation which constitutes, to him, a problem, initiates that behaviour whose expectation constitutes a solution to that problem. Thus, if when hungry he accidentally brought his finger into his mouth, which he then sucked and thereby temporarily eased his need, he learns to do so again when hungry and with no external means of satiation available. The finger-sucking is the solution to the problem of hunger.
This action gives the impression of goal-oriented behaviour but, due to the expectation mechanism, may be carried out without any goals being explicitly set up. The infant is merely behaving in a way that he expects will result in some need gratification.
Similarly, if on some occasion, he kicked his cot accidentally, thereby setting in motion a swinging doll attached to his cot (see Piaget's observations on his child Lucienne - observation 94 (Piaget, 1953)), he may kick his leg at some future time and expect the doll to swing again. So whilst he derives pleasure from watching the doll swing, he will remember what action of his may cause it to do so.
But what nature does his expectation have? Does, for instance, the infant firmly believe that his expectation will be confirmed? Surely, the nature of the environment is such that some flux may, for instance, not allow some part of it to respond in its usual manner. For example if his mother is in the garden she may not respond to his cry readily. Thus, sometimes, if not often, the predictable does not occur. The infant surely must experience such environmental fluctuations which lead to non-confirmation of expectation. Does this totally destroy his partially formed knowledge or does it merely lessen his certainty in it? The latter would seem far more reasonable. In fact, it seems to be reasonable to say that all his early beliefs must stand on extremely shaky ground. He simply does not know enough of his environment to incorporate all its vagaries.
It is proposed, then, that the initially acquired knowledge structures reflect partially formed concepts - these may be termed sub-concepts, indicating a less than firm belief in the outside world due to the immaturity (lack of knowledge) of the infant and his relative inexperience (lack of sufficient time for his environmental interactions).
The question, then, is how are sub-concepts formed and do they ever develop into certain beliefs and if so, how?
Thus, one reason for choosing a Production System representation now becomes more obvious. The fact that new structures may be incorporated into the system with an extremely low probability of activation, and allowed to gradually assert themselves by regeneration such that they steadily increase in probability of activation, allows for a gradual learning to occur. As a structure becomes strengthened it may eventually reach a sufficient status for reverberation. A reverberating (activated) structure causes environmental consequences leading to the enhancement of knowledge (a generated expectation) in the structure. If the consequence be in any way desirable (leading to need satiation or the occurrence of novelty) then, chances are, the structure will get activated more often. Each time it has its expectation confirmed it becomes considerably enhanced, and each time it fails it becomes diminished in its certainty. But so long as it gets occasional confirmations it should continue to be activated in preference to others, since it does lead to some solution, whereas the others may not.
The nature of knowledge is such, then, that it remains below the bounds of certainty. However, as the infant grows older and identifies the circumstances during which his expectations may not be confirmed, he may adapt his structures leading to better behaviours. Thus, if he learns that crying for mother when she is not there is not as successful as crawling to where she is and then crying, then he has generated a more realistic structure capable of eliciting a more predictable response. In this way his structures may gradually evolve so as to incorporate a better definition of an event, thereby increasing the certainty of their consequences. As such, sub-conceptual beliefs may become conceptual or firm beliefs, the goal being a knowledge set incorporating as much truth as is possible with regard to the world and its behaviour.
Alternatively, concepts may be seen as being created by some pre-formed process, activated at a particular developmental stage and incorporating a new function. This function serves to generate deterministic links between structures, such that the nexus of a concept becomes strengthened and of a firm nature.
In a computer simulation of this model, probabilistic links (expectations) as well as deterministic links (state generators) were defined. The results and their implications will be discussed in Chapters 8 and 9.
It is only when representational knowledge yields certain operative knowledge, that truly intelligent behaviour may result. This is believed to occur only after the first 10 months (on an average) of the infant's life and enables him to indulge in active experimentation. He may, therefore, throw a ball to see what will happen to it or just to see it arc and fall. Active experimentation is only possible when the infant has developed hand-eye co-ordination to such a degree that the eye knows what to expect from a particular hand movement. This kind of knowledge must be of a fairly firm nature and it is the build-up of firm knowledge structures which allow for them to be experimented with.
A knowledge structure when not being currently activated has the potential of operation but whilst unactivated is merely representational. It represents some idea of an event. The instinctual knowledge structures are also representational although they only represent sensory motor connections. Any new structures, which must have their origins in the instinctual structures, may themselves be purely sensori-motor or they may be more complex in their representation. A more complex representation is one which defines a context more fully or one which defines a consequence. A representation may also include one or more of the drive elements (primary stimuli) in which case they represent a definition of a need situation.
An activated structure may be thought of as being operative in many different senses:
Finally, the nature of knowledge is such that certain structures may be inherently more significant than others. This could be due to their better definition of an event, to generations of expectations, or because they incorporate primary stimuli, but, at any time, one reverberating structure may be thought of as 'reverberating more' than the others. It therefore has greater potential for restructuring, and any new structures must be created with reference to the most significant structure to date. This is to say that the system's belief in a structure leads it to lend similar beliefs to other structures.
The data structure chosen to represent a cognitive knowledge structure in this model is the Production Rule.
The model commences with a set of innate, hand-wired Production Rules representing reflexive connections, and proceeds to construct, using its innate set of learning laws, a Production System which represents, at any time, its total available behaviour.
It does so by using the experiential information contained in STM and by creating new Rules to represent new perceptions and re-organising old Rules so as to reflect new experiential knowledge.
Each Rule, considered by itself, is an encoded memory image, defining an experience that the organism has undergone. It does so by associating an environmental event (or an object), with a set of actions executed in response to the event by the organism, generating a particular consequence (a second environmental event) which the organism assumes was brought about by his own actions. Each Rule may therefore be considered as a machine defining a state transition. The transition being
given input si at state Si, produce output ri, and transform to state Si + 1
Each Rule gives the state transformation brought about by activating a particular set of inputs at a particular organismic state, and the resultant output.
However, it is not a finite state machine, since each Production Rule is, in itself, extensible, being able to add on or subtract input and output elements. Further, the entire Production System is also extensible, since any alteration to a Rule creates, in effect, a new Rule incorporating the alteration, the old Rule remaining in its original form within the system.
Thus, the Production System may be considered to be an infinite state machine, the only problem being that the larger it becomes, the longer the time taken to search through the system.
The advantages of such a representational system are, however, enormous. Each Rule is flexible. It can lend itself to an infinite number of other Rules in the creative process, each new form bearing an almost identical relationship with its creator, but being subtly different. This is, in essence, Ashby's self-reproducing system (Ashby, 1947), where every new perceptual act automatically creates a new form through the dynamic interaction of the Perceptual Set with the old, activated forms.
This enables the organism to:
All these abilities, for instance, could not be incorporated into a finite state machine representation, since the very word finite denotes boundaries to the representational structure.
Further, the Production System need not be defined in total from the outset. Given the laws for learning, it merely evolves new Rules and keeps on adding to the system. This again is unlike a finite state machine representation, which would require preliminary definitions of every state it would need to hold and of each input-state-output-state transform combination. A system could in no way be said to be capable of learning if every possible experience it had to encounter were given to it from the start.
When a number of Rules become grouped together, such that they define a phase-sequence of activity, they may be thought of as representing an operation, or a strategy to be executed in a particular situation, such that invariant sequences in action may be defined by the system. Thus Rules may, in fact, make up their own machine of behaviour, these machines tending to maintain their form and eventually become self-supporting and unbreakable. In the present definition of the model, an internally generated symbol structure may only designate another Rule for activation with an associated degree of probability. Thus, if two Rules are constantly activated one after the other, it is not through the alteration in nature of the internal symbol, but rather through invariances in the environment. However, if the internal symbol structure was to be defined as a deterministic link to another Rule, then Rules could become clustered together through the inherent belief of the system in certain operations being invariant, rather than an environmental imposition. Such a version has been implemented and will be discussed in Section 9.2.2.
The system defines a Relational Data Base since each Rule may be considered to be related to a set of Rules, the relationship being different in different contexts of usage.
The Production System allows for an evolutionary knowledge base, where the structures are homogeneous and the same internal representation serves to encode information obtained through different sensory modalities and representing different environmental objects and events. It allows for the encoding of a subset of relational facts, where, if critical facts are required (critical with respect to matching items in STM), the extracting process serves to retrieve the implied facts along with the critical facts. Further, the critical facts in the left hand side conditional list serve to label each Rule, such that all Rules with the same labels (or some subset of the label) are potential for retrieval in the same time interval. The critical facts enable the identification of redundant facts (such that specific instances of Rules may be created from the general form) and for the identification of additional criterial evidence (those critical items not contained in the label but present in STM) such that generalisations may be formed from the particular.
Such a general scheme of representation, allowing for an infinite number of similar forms, allows the system to cope with the problems caused by noise (non-criterial environmental information) and by any initial perceptual slowness (caused by immature sensory organs), whereby an adequately representative data set may only slowly be built up. Those Production Rules occupying higher positions in the league table constitute the executive for behaviour currently proposed.
The idea of each Production Rule having an inherent worth with respect to other Rules in the system has been mentioned previously. The worth of a Rule defines its potential for activation, a Rule with higher worth being activated more often than one with a correspondingly lower worth. The worth system therefore defines a league table of Production Rules, where Rules are ordered in decreasing worth, the highest Rule bearing the greatest worth value and the lowest Rule the least. Since the entire system is in a constant state of flux, with new Rules being added, Rules being deleted through disuse and Rules being moved about within the system, the position of any Rule in the league table is constantly altering.
The problem was to define a function for computing the worth of a Rule, such that displacements caused by the constant movements did not push the system into a state of disequilibrium, i.e. the insertion, deletion or movement of a Rule did not destroy the total current worth of the system (the Σ of the worth values of all Rules).
Any Rule may be thought of as being characterised by the quantity and quality of its constituent elements and by its position in the league table. If a Rule was to be displaced upwards, then the displacement distance must be such that:
Thus, the Reinforcement Process for displacing a Rule goes hand in hand with the worth process for computing the inherent worth of the Rule.
A Rule displaced upwards, say from position Pj to Pi, tends to displace all those Rules between Pj and Pi downwards. A Rule displaced downwards from Pi to Pj tends to displace all Rules from Pj onwards, upwards.
If we consider Figure 5.1, and designate the horizontal distance between Rules P2 and P3 as δ, and the vertical depth between level 1 and level 2 as α, then obviously α must be greater than δ, i.e. a Rule moved up one level must have a significantly greater increment in its worth value than a Rule shifted horizontally along the same level.
The worth function, if computed dependent only on the position of a Rule in the league table, must vary as a function of position:

W ∝ f(p)

where p is the position of the Rule in the league table. Any chosen function, e.g. f(p) = p^-½ or f(p) = p^-2, would give approximately the same general shape if we were to consider the graph of worth plotted against f(p). Any such decreasing curve would do, showing the worth of a Rule decreasing as its position in the league table increases. For instance, with f(p) = 1/p^½, the difference between the worth values at positions 1 and 100 would not be as great as had f(p) been 1/p². The gradient must be such that, considering any two points close to each other on the curve, their respective worth values are approximately equal. Further, there must be a significant difference in worth values for two points far apart from each other on the curve.
With f(p) = 1/p², the curve falls off extremely sharply at the beginning and soon tends to give a worth value of zero, whereas with f(p) = 1/p the curve falls less sharply at the beginning while having the same properties at the end of the curve. The function may only be defined after several simulation runs, tying in results with actual experimental data.
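As an illustration only (Python rather than the thesis's FORTRAN, with the candidate functions and the sampled positions being assumptions chosen purely for the comparison), the following sketch contrasts how sharply different f(p) fall between adjacent and widely separated league-table positions:

    # Illustrative comparison of candidate worth functions f(p); the positions
    # sampled (1, 2 and 100) are arbitrary choices for the demonstration.
    import math

    candidates = {
        "1/sqrt(p)": lambda p: 1.0 / math.sqrt(p),
        "1/p":       lambda p: 1.0 / p,
        "1/p^2":     lambda p: 1.0 / (p * p),
    }

    for name, f in candidates.items():
        adjacent = f(1) - f(2)    # two neighbouring positions near the top
        distant  = f(1) - f(100)  # two positions far apart in the table
        print(f"{name:10s} adjacent drop = {adjacent:.4f}  distant drop = {distant:.4f}")

The sharper the function, the larger the drop between even the top two positions; the gentler 1/√p keeps neighbouring worth values close while still clearly separating positions 1 and 100, which is the property argued for above.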
Any adaptation in behaviour must imply learning of some sort. The infant perceives the objects in his environment and modifies the object of his perception by relating it to what he currently knows about his environment. He does so by incorporating the perceived elements into the already existent structures of knowledge. Any modification in the knowledge structure leading to the creation of a new structure reflects a new idea that the infant has formed with relation to his environment.
The acquisition of a new structure and its subsequent strengthening or diminishing of status within the entire set is indicative of a learning process.
When an organism learns, it means that it has adapted its attitude to its environment in some way. This could mean a change in its notions about the object concerned (an alteration in relationship) or a change in how it responds to the object (change in overt behaviour).
In any biological process, adaptation is a necessary activity if survival is to be ensured. The organism must learn about the new things in its environment and must learn how it should relate to them.
As Piaget states, "we can then understand that, superimposed on the direct interpenetration of organism and environment, mental life brings with it indirect interaction between subject and object, which takes effect at ever increasing spatio-temporal distances and along ever more complex paths" (Piaget, 1949).
Once learning is established it is little affected by set and does not seem to need reinforcement (Hebb, 1949). In fact a well learned act (or set of actions) may become so well established as to seem reflexive or instinctual. It is difficult, for instance, for an adult to remember that he had to learn to walk as an infant, which act he performs so automatically and perforce without thought or concentration involved.
Hebb proposed that the characteristics of learning undergo an important change as the animal grows, particularly in the higher mammals; that all learning tends to utilise and build on any earlier learning, instead of replacing it ... so that much early learning tends to be permanent ...; and, finally, that the learning of the mature animal owes its efficiency to the slow and inefficient learning that has gone before, but may also be limited and canalised by it.
Learning must involve the transference of properties belonging to one item of knowledge to another. Thus learning is evolutionary in nature, as Hebb asserted, and involves the gradual build up of knowledge, each one being subtly blended with the one before.
How much information may be considered as being transferred during learning is debatable. Hebb talks of learning as seeming to be half transfer. However, can one really talk of halves and quarters when it is in relation to knowledge and the elements of knowledge?
In terms of knowledge structures, then, learning can be transference of one or more of its elements to one or more separate knowledge structures. Thus structure A if it has been learned through structure B must incorporate one or more of B's constituent elements.
An adult may be seen as transferring preferred elements of knowledge, since his knowledge is of a far more precise nature. Transference in the infant may be much greater and more generalised (Hebb, 1949).
We may learn something only if we experience it. What is not experienced cannot be learned about, though it remains, in potential, learnable. As Riesen states, "we customarily think of the 'educative process' in higher primates as one that results in the gradual accumulation of the fruits of earlier experience over the months and years. Actually, our understanding of how prior learning supplies building blocks for new integrations and discriminations is growing only very gradually" (Riesen, 1958).
Riesen's experiments reveal that what may normally be taken as being innate may, in fact, be gradually learned. His light-deprived chimpanzees did not even produce an eye blink in response to a threatened blow toward the face.
Learning is brought about by the interaction of the innate Cognitive Processes with the structures of knowledge. The Learning Mechanism as is portrayed in the model, uses certain inherent notions to direct the learning process such that the system's understanding of its environment may progressively increase. The goal of learning, if one talks in such terms, is to increase understanding at all times, and is brought about by the gradual build up and grouping of structures.
When a structure acquires a link to another structure, it may be seen as becoming part of some centralised concept. To learn indicates the build up of concepts with larger and better-defined fringes of reference to other concepts. Eventually it must lead to the formation of executable strategies activated upon occurrence of a particular environmental event.
What is the nature of learning in terms of a Production System? The phenomenon displayed is one of a slow learning process where each newly created structure struggles to improve itself such that it may become part of the selected behaviours and from thereon part of the actively preferred set of behaviours. This slow progress of a structure up to the realms of operativity and preferability is indicative of the early learning of the species.
However, the gradual build up of many newly created structures, some of which may be more complex in form than others, may also reflect the later learning process so typical in human adults, i.e. when they eventually do get activated, it appears to be a sudden jump from one form of complexity to a higher one. Thus, when the first sensori-motor structure incorporating an expectational element is activated, it suddenly gives rise to a new form of behaviour: an active form, which expects from its environment a particular consequence. It is a more demanding behaviour, and its appearance seems sudden, although it has been created, stored and then gradually allowed to evolve until eventually it was activated.
The question then becomes:
Are there really two different learning processes?
or
Could it be just the one, but its nature be such that it gives the appearance of two separate forms?
This is a difficult question to answer. The model only employs the one and certainly, it does give the appearance of two distinct processes, rather than one. It gives the appearance of sudden insights (be it however lowly) and also the appearance of a slow, gradual build up of a total repertoire of behaviours.
Admittedly this does not exclude the possibility of two processes being present, one in the infant and two in the adult. However, it does give rise to speculation that it may not necessarily be so, and that the one may account for both. In fact, the quick learning so typified by human adults may be due to the extensive experience the adult has gained, to the maturity of his Learning Mechanism and to the learning sets that he has already acquired.
The adult who remembers a face for years after just one glance may only be able to do so due to his extensive familiarity with a variety of faces and to the development of his learning skill allowing for better retention and recall. Harlow's experiments show that the learning capacity of rhesus monkeys "may be changed out of all recognition by prolonged experience" (Hebb, 1949).
The infant commences learning with relatively little differentiation of his neural structures. His potential for learning is at its maximum in terms of how many structures of knowledge he may develop. One would assume, then, that learning is easiest in infancy, even when one thing learned may grossly contradict another. This may be due to two reasons:
Early learning should predictably be:
such that the habits set up are relatively easily alterable in content and in use.
Later learning should predictably be:
such that a habit is long-lasting and relatively permanent.
These predictions may be confirmed with reference to Stone's experiments on rats (Vince, 1961). He found that:
From this one may conclude that with regard to the learning in an infant:
The rate at which learning may proceed may, therefore, be dependent on:
Beach and Jaynes point out that "anatomical differences, nutritional requirements, sensory sensibility, motor development and previous experience are closely interwoven variables with age and cannot usually be controlled independently of each other" (Vince, 1961), implying the reliance of learning processes upon a number of other variables. Vince asserts that "Early experience may influence behaviour (a) by preventing the acquisition of other types of behaviour which could compete with a habit formed in response to a particular situation; (b) because motivation may be more intense in the young, and (c) because certain types of early experience influence later behaviour by structuring the individuals' perceptual capacities".
In terms of (a), with regard to the particular model presented in this thesis, a well-formed habit need not actively prevent the formation of contrary habits. It merely, by having a high reverberational coefficient, prevents the contradictory structures from themselves becoming sufficiently elevated so as to be activated. This is a natural process, having more to do with the characteristics of the knowledge structures themselves than with an active process of inhibition (although inhibitive structures are present, see Section 6.2.3).
With reference to (b), if any structure is associated with need-satiation, then its probability of reverberation becomes substantially increased. Since need-satiation in the infant is of an extremely basic nature (Hunger and Pain), need-gratification is therefore a powerful reinforcing agent - more so than in the adult.
In the case of (c), those expectations which have become associated with and generated by the stabilised structures, affect perception in such a way as to influence the selective process it undertakes upon the sensory set. Thus, what the infant has learned to expect causes greater orientation than what, in reality, does occur. In the very young infant, he may actually choose to respond to what he thinks has occurred rather than, to what, in reality, has occurred. Learning to learn therefore depends on what has been learned previously. If the underlying habits form a useful learning set, then what is latterly learned may be better than had the learning set constituted inactive processes.
The modification of perception, cognition and behaviour, has, in this model, been attributed to a specific mechanism known as the Learning Mechanism. By effecting structural changes within the cognitive knowledge base, the Learning Mechanism allows for perceptual alterations such as identification and definition of different cue functions, for cognitive alterations such as the creation of new cognitive structures and alterations in status of the old, and for behavioural modifications such as altered responses, stimulation and altered patterns of internally generated consequential beliefs.
The functions of Retention (storage of a memory structure), Remembering (selecting appropriate structures on a contextual basis), and Recall (activating a cognitive structure) have been differentiated from the function of Learning, which in this model incorporates processes for:
Hence the Learning Mechanism, although strictly speaking a part of cognition, has, for ease of functional clarity, been set aside as a separate mechanism, summoned and executed under the control of the Cognitive Processes, but existing as a separate sub-routine.
Laws of Learning have been formulated by many psychologists (Hull, 1920), recognised ones being recency, frequency, vividness, effect, exercise, readiness and assimilation. They may be summed up as:
Learning occurs when a recently occurred event has been attended to, the rapidity and duration of the learning being dependent on such properties (of the event and hence of the stimuli constituting the event) as the intensity of stimulation, the effect it has on the organism, the number of times the elicited response is exercised, the awareness of the organism and the number of times he is exposed to the same event.
Thus certain properties of the features in the environment and certain properties of the organism have been attributed to learning.
The essence of such organismic properties seems to be:
This led to the postulation of a minimal set of laws, minimal in the respect that most human organisms seem to possess them, and in the belief that humans have a varying set of laws according to their genetic endowment, the ones proposed herein constituting a common subset over approximately all average humans.
It is not intended that these laws serve to account for all the behaviours exhibited by the human adult. Rather that they may serve to account for some basic set of behaviours, particularly those found in the human infant.
They are believed to be inherent to the Learning-Mechanism of the average human infant. They are:
We could go further and state laws for intensity, for colour, for threshold, etc. However, the criterion in any model is to observe how far the hypothesis may extend, and in this case to see how far these laws can cater for the model in enabling it to behave in an infant-like manner.
One thing more: the Learning Mechanism has a further process available to it, namely that of Reinforcement, which is based on the belief that the confirmation of a prediction made by the organism is considered important to the organism and results in the strengthening of the belief in that structure which generated the prediction. Conversely, if a prediction is not confirmed, then the organism responds by reducing its belief in the structure which generated the prediction. This allows the model to build up a fairly adequate belief system, maintaining structures in some priority order.
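A minimal Python sketch of this Reinforcement idea is given below (it is not the thesis's FORTRAN routine; the numerical increments and the bound at zero are assumptions made purely for illustration):

    # Illustrative sketch: strengthen a Rule whose prediction was confirmed by
    # the environment, weaken one whose prediction was not.  The increment and
    # decrement values are invented for the example.
    def reinforce(rule_worth, predicted, observed, reward=0.1, penalty=0.1):
        """Return the updated worth of the Rule that generated 'predicted'."""
        if predicted in observed:                  # prediction confirmed
            return rule_worth + reward             # strengthen belief in the Rule
        return max(0.0, rule_worth - penalty)      # weaken belief, never below zero

    # Example: a Rule predicted the stimulus FOOD
    print(reinforce(0.5, "FOOD", {"FOOD", "NOISE"}))   # confirmed: worth increases
    print(reinforce(0.5, "FOOD", {"NOISE"}))           # unconfirmed: worth decreases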
If we consider any organism, a finite time is required by that organism to sense, perceive, cognize and respond to an event. This time interval has been described as a Quantum of Psychological Time (Neisser, 1967) and is defined as that time interval taken by the organism to commence and complete his processing of a sensory event. Any interruption within this quantum results in interference with the process which is currently underway. Thus, within a sensory modality, the processing of each momentary input is interwoven with and influenced by the processing of all other input adjacent to it in time (Neisser, 1967).
Stimuli appearing within the same psychological quantum become confused one with another, such that the properties of one may be attributed to the other and vice versa. Phenomena such as Backward Masking (the obscuring of a stimulus by another stimulus occurring almost immediately after the first and working retroactively on the first) could be attributable to the fact that stimuli within the same time interval, interfere with one another, particularly if they happen to be within the same sensory modality. (This would predict maximal interference when the two stimuli have zero or near zero delays, but this is not always true, masking occurring with delays of up to 100 millisecs (Neisser, 1967) and hence indicating that the masking function may be U-shaped.)
The Law of Temporal Association as defined by this thesis is in accordance with Stroud's theory (Neisser, 1967) on temporal summation within the same quantum of psychological time. This states that successive stimuli will be integrated only if they fall within the same discrete 'psychological moment' and it portrays the organism as progressing through time in discrete steps, each step indicating a subjective interval during which the organism responds to his environment and updates his knowledge base in keeping with newly obtained information.
This law is in accordance with the well documented law of association. In general terms, if two events occur close together in time and are repeated later in the same order, the individual will act as if the first event produced the second. Once the first event occurs, he begins to prepare for the second (Scott, 1968). A corollary to this law states that the strength of the association depends on the number of repetitions or reinforcements. This might also be called the law of habit strength. The notion of causality enables the organism to associate events together if they happened close to each other. Repetitions of such occurrences lead to the same form of the Rule being regenerated, each regeneration of the same form leading to positive reinforcement of that form through the Reinforcement Process.
This law enables the organism to make predictions about the consequence of his behaviour upon the environment. This ability to predict is considered to be the primary way in which any organism may effect control over his environment. With regard to the human species, the ability to predict must be based on some inherent notion of causality. The first experience of causality may be purely accidental, as when the infant sets in motion a swinging doll by shaking his legs in the cot (the doll being attached to the cot). Such an incident illustrates to the infant his impact on his environment. Perhaps this is not a conscious conceptualisation of cause and effect, but rather the beginnings of such a conceptualisation. In fact, it is when the action has become disassociated from the event that the true conceptualisation of causality may be held, i.e. the action now becomes the means to an end, the intention being to cause other events with the same action.
In terms of this thesis, the infant is given the means of identifying and isolating perceptual events and a belief in perceptual existence, e.g. Bower's (Bower, 1967) experiments with infants show that an infant is capable of expressing a belief in the existence of an object which has temporarily disappeared. However, more than just a belief in perceptual existence is given to the infant via the Law of Causality. In enabling the infant to predict a sequence of events, he must first have experienced the sequence already. Thus, in the case of a disappearing object, he may only direct his gaze at the point of expected reappearance if he has been exposed to such an event previously. It is only through the confirmation of his predictions about environmental events that the infant may eventually learn duality, i.e. self and external object. Thus, his initial predictive and associative ability may be taken as being purely phenomenalistic (belief in the empirical data associated with an observed phenomenon) and it is through the strengthening of his belief in his ability to predict effects that the infant learns to isolate his causal action from the environmental effect.
These laws have been lumped together since they incorporate the same mechanism: that of refining a cognitive structure such that it forever tends towards a precise definition of an environmental object or event. It is therefore the problem of isolating and identifying the subset of critical facts (or stimuli) which act as the cue to an organismic response. Initially, the cue stimuli may be mixed in together representationally (within a cognitive structure) with non-criterial stimuli. This may be through inadequate perception (not perceiving sufficiently) or through mistaken notions (believing that stimuli, constituting the description of some former event, make up the definition of some currently occurring event). The two laws herein defined serve to trim away non-criterial elements and to add on further criterial elements. They are postulated through the belief that such refinement processes must exist if the organism is to correctly identify and define an environmental object or event. It is through these laws that it is able to form differential responses to stimuli which originally were undifferentiated and associated within the same class, and which eventually become identified as individual members, hence forming sub-classes of their own.
There is no actual algorithm encoded for discovering the criterial elements in a function (like The Monkey's Uncle program of John Brown (Lindsay, 1973)); rather, identification progresses slowly, but in most cases surely, towards the eventual definition.
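The flavour of this trimming and adding can be suggested by a small Python sketch (the set operations and the single-step refinement are simplifying assumptions; in the model, elements are only trimmed or added after repeated evidence):

    # Illustrative sketch: refine a Rule's conditional set against the current
    # contents of STM.  Elements absent from STM on a successful activation are
    # treated as non-criterial candidates; elements present in STM but missing
    # from the Rule are candidate criterial evidence.
    def refine(condition_set, stm_contents):
        trimmed   = condition_set & stm_contents   # drop unsupported elements
        additions = stm_contents - condition_set   # further criterial evidence
        return trimmed | additions

    rule_conditions = {"TOUCH", "NOISE", "LIGHT"}
    stm             = {"TOUCH", "NOISE", "FOOD"}
    print(refine(rule_conditions, stm))            # LIGHT trimmed, FOOD added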
The model has incorporated into it, a specific Drive Process which serves to identify the level of need associated with any particular drive, to generate Drive Stimuli (in the case of Hunger) and to inform the Learning Mechanism as to any alterations in the level of need.
Drive Stimuli (Primary Stimuli) become incorporated into the cognitive structures via the Drive Law. Primary stimuli are considered to be of a different category from the secondary stimuli (which make up all other stimuli) and hence are treated completely differently from the others.
They are believed to be more effective in that they can cause inhibitive behaviour (as in the case of Pain stimuli) and can be incorporated into cognitive structures once a need has been assuaged, such structures representing a belief that their activation should cause alleviation of the associated need.
This enables the learning process to pattern cognitive structures around the drive elements, such structures becoming the basis of motivation. It could be said that it is through this patterning of drives that personality is born (Adcock, 1964). However, in this model only two Drives have been identified - namely Hunger and Pain. Fear, for instance, has been attributed Drive properties and in fact the complete range of emotions may be representable as Drives, each generating their own stimuli. Alternatively Pain and Pleasure may be taken to be the basis of all emotions. Pain causing fear, despair etc and Pleasure causing hope, happiness, etc, such affects becoming built into cognitive structures through the Drive Law and eventually contributing towards the formation of an entire affect system, perhaps one, which, when well formed, becomes a separate functioning process in itself.
This could be expressed as the Novelty Drive or the Law of Curiosity, though they would all serve the same process in whichever way they were expressed. The Law of Novelty was included since novelty in a perceived stimulus is considered to cause an orienting reflex in the infant, which, of course, enables him to assimilate the novel object and to accommodate his cognitive structures to cater for it in the nearest, previously experienced context.
Novel stimuli become attached to Interrogate Type Rules, which may be thought of as part of the infant's undirected response repertoire. This is based on the belief that it is through these undirected responses, which the infant is constantly indulging in, that new environmental objects get discovered and become identifiable by the infant.
At a particular stage in an organism's development, a fear reaction may be elicited by novel objects, particularly moving novel objects. Hebb (Schaffer, 1966), with regard to this phenomenon, supposed that fear would result if the object was similar to other known objects in enough respects to arouse the habitual processes of perception, but was also sufficiently dissimilar to the stored image as to cause a disruption in the central neural patterns laid down by previous perceptual acts. Schaffer came up with the Incongruity Hypothesis, which stated that "at a certain optimum degree of incongruity, interest is aroused ..., whereas deviation in either direction will give rise to avoidance and withdrawal responses ... the organism finds itself unable to deal with the situation and is impelled to flee" (Schaffer, 1966).
At present the model does not cater for computation of novelty leading to a definition of any optimal measures. The organism is merely imbued with specific interest in novel objects to the extent of accommodating to them via the Law of Novelty. Thus, the model does not ever respond aversively to novelty. However, perhaps the computation of a fear level could be incorporated into the model, though this would constitute a somewhat artificial method of catering for the onset of fear through exposure to novelty. Further, there is evidence to the effect that considerable exposure to particular features (such as a variety of human faces) prevents the later onset of fear in the infant, and Schaffer suggests that the age at onset of fear could be a function of the different kinds of social interaction experienced. However, it is apparent that the manifestation of fear of strangers is a multi-determined event. The perception of unfamiliarity alone, it appears, is not sufficient to bring about fear: other variables, referring both to organismic and to stimulus characteristics, must also be taken into account (Schaffer, 1966).
Conditioning has been defined as that process by which a response comes to be elicited by a stimulus, object or situation other than that to which it is the natural or normal response. The term was originally used of the case where a reflex, normally following on a stimulus A, comes to be elicited by a different stimulus B, through the constant association of B with A (Drever, 1975).
For conditioning to take place, the following are required:
Conditioning takes place as a result of sensory-sensory integration that involves learning according to a contiguity principle (Riesen, 1958), and when there is consistency and repetition in the patterns, sensory-sensory conditioning takes place. With regard to the reinforcing agent, the question is which stimuli may be considered strong enough (attractive, desirable, innately preferable, etc.) to enhance and speed up the conditioning process. A Primary Stimulus? The human face? A brightly coloured object? These have all been used as reinforcing agents in various experiments. Siqueland and Lipsitt (Siqueland and Lipsitt, 1966) used food; Koch (Koch, 1968) used the face and voice of the mother; Koch also used different toys. Koch established that rapidity of conditioning and extinction were a function of the age of the infant, whilst Siqueland and Lipsitt, though using a Drive Stimulus, could not establish the level of the need (associated with the Drive) as a function of rapidity of conditioning and extinction. Riesen's experiments on deprived chimpanzees showed that even the ability to condition, i.e. establish S-S integrations, may in itself have to be learned. Even the use of a painful reinforcing agent (an electric shock) to strengthen an aversive response did not enable a deprived chimpanzee to learn to avoid the shock, whereas in an average chimpanzee such learning is achieved almost immediately.
In terms of the model, conditioning takes place as a result of associations being made between frequently occurring and adjacent events allowing for innate reflexive structures to become co-ordinated with learned structures.
If we have, therefore:
S1 ⇒ <R1>
where S1 ⇒ <R1> is the reflexive pair, and say it represented the innate pair Sight of Food ⇒ <salivate>; then, if the sound of a buzzer consistently preceded the plate of food, an association is created between the food and the buzzer, such that the Learning Mechanism creates the following structures:
P1: Sound of Buzzer ⇒ <listen> Food
P2: Sound of Buzzer ⇒ <salivate>
P3: Sight of Food ⇒ <listen>
whence the unconditioned stimulus-response pair was
Sound of Buzzer ⇒ <listen>
The Production Rule, P1, is created from the law of causality inherent in the Learning Mechanism. This states that if an event E1, is observed to precede an event E2, then E1 is believed to be the cause of E2.
In this case E1 is the buzzer-listen event and E2 is the food-salivate event. This states, then, that the event of listening to the buzzer caused the appearance of food to occur. In these circumstances, obviously the belief is untrue in as much as the two events are completely unrelated, only the design of the experiment having caused any dependence between them. But by constantly repeating the experiment, the subject will firmly believe in their mutual dependence and will become conditioned to the sound of the buzzer.
In order that P2 and P3 be created, the stimuli food and buzzer have to be attended to in the same time interval. Thus they must both be present in STM, together, in order that the subject attends to each on the same occasion. The Rules:
Sound of Buzzer ⇒ <listen>
Sight of Food ⇒ <salivate>
will then be activated simultaneously giving rise, through the law of temporal association, to the new structures:
P2: Sound of Buzzer ⇒ <salivate>
P3: Sight of Food ⇒ <listen>
(Note: other structures may also be formed through the same association, but in this case only P2 and P3 need be considered.)
P2 is known as the conditioned reflex, and the buzzer becomes the conditioned stimulus. Structurally speaking, in this case the new structure is still, basically, only a sensori-motor structure, and no increase in complexity has occurred.
P1, however, is more than a sensori-motor pair and has an added degree of complexity reflected by the generated expectation.
Once P1 is created it initially will have a low probability of activation being a new structure. However, further repetitions of the sequence buzzer, food will lead to the enhancement of P1 raising its probability for activation constantly. Eventually P1 will actually reverberate (become activated ) upon occurrence of the buzzer, and provided the experiment is still in progression, the expectation should be confirmed by the presentation of food. Thus P1 will eventually reach an extremely high status within the system and become a preferred structure.
A structure such as P1 could show conditioning effects without having to create subsidiary structures such as P2 or P3. P1, when becoming preferred, generates strong expectations. This usually causes behaviour to the expected stimulus to be generated even if it did not occur environmentally. Environmental confirmation leads to high priority of attendance to the confirmed stimulus, leading to definite activation of the original reflexive linkage. Thus, even without the conditioned response being directly associated with the conditioned stimulus, it may still be activated through connections between the structures incorporating each.
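The buzzer-food walkthrough above can be summarised in a few lines of Python (an illustration only: the tuple representation of a Rule and the two law functions are simplifications of the model's Production Rules and Learning Mechanism):

    # Illustrative sketch of the conditioning walkthrough above.
    # A Rule is written as (condition, response, expectation); the expectation
    # is None where the Rule generates no internal consequence.
    reflex        = ("Sight of Food",   "salivate", None)   # innate reflexive pair
    unconditioned = ("Sound of Buzzer", "listen",   None)   # unconditioned pair

    def law_of_causality(earlier, later):
        # the earlier event is believed to cause the later one: an expectation
        # of the later event's stimulus is appended to the earlier Rule
        return (earlier[0], earlier[1], later[0])            # gives P1

    def law_of_temporal_association(rule_a, rule_b):
        # Rules active in the same time interval exchange their responses
        return [(rule_a[0], rule_b[1], None),                # gives P2
                (rule_b[0], rule_a[1], None)]                # gives P3

    p1 = law_of_causality(unconditioned, reflex)
    p2, p3 = law_of_temporal_association(unconditioned, reflex)
    print(p1)   # ('Sound of Buzzer', 'listen', 'Sight of Food')
    print(p2)   # ('Sound of Buzzer', 'salivate', None)
    print(p3)   # ('Sight of Food', 'listen', None)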
Conditioning takes place faster and remains preferable longer if reinforcement is made contingent upon elicitation of the conditioned reflex.
A reinforcement which is associated with primary need satiation (such as food) results in a quickened learning process. It may be that if reinforcement is made contingent when the need level is high, then learning should proceed even faster. However, this is as yet speculation with regard to the human infant (Prechtl, 1957) and has not been confirmed experimentally.
The following factors should influence the rate and strength of conditioning:
Extinction refers to the process by which a conditioned reflex may become abolished, so that upon occurrence of the conditioned stimulus, elicitation of the conditioned reflex becomes more and more decreased until it occurs only randomly (the level at which the response may be expected on some normal basis of expectation).
In the model extinction may take place upon:
Sound of Buzzer ⇒ <listen> Food
the process of extinction occurs when food is no longer subsequent to the sound of the buzzer. The generated expectation is, therefore, unconfirmed and leads to the gradual loss in status of the structure P1. Obviously, if P1 is high in status to begin with, then the process of extinction takes place much more slowly than if P1 had been lower in status. This is due to the fact that unconfirmed expectations still have a chance of being attended to, thereby not allowing for decreases in elicitation of the conditioned reflex for some time.
If cessation of reinforcement is instigated then the expectation of reinforcement generated by the conditioned stimulus structure never gets confirmed, and like in the case above, the structure gradually loses its status through non-confirmations.
If reinforcement is made randomly, then extinction may proceed even slower than before due to the fact that occasionally the structure may actually have its expectation confirmed. If the structure was highly placed, then, in certain cases it may never get extinguished. However, if the structure was in some average position with regard to the other structures in the system, and if non-confirmation of expectations occurred a few times in succession, then the structure may be negatively reinforced such that it becomes extinguished, i.e. ceases to be activated. Even in these circumstances, if, randomly, confirmation did occur before the structure was sufficiently displaced, then it may take a while for extinction to occur and there is a small probability that it never does get completely extinguished.
For extinction to occur, then, the structure must be sufficiently negatively reinforced such that its probability of activation is low, before any occurrence upon which positive reinforcement could be made, does occur. If, however, it is still relatively highly placed, and positive reinforcement occurs, then it will delay the process of extinction even further and as stated before, may sometimes result in complete non-extinction.
Extinction depends on:
To say that the human infant is genetically endowed with a Drive System is to say that the human infant possesses a set of instinctive impulses or motivational forces, prompting him or polarising his behaviour towards particular ends. A Hunger Drive, for instance, would tend to force the infant towards satiating his need (or his hungriness). In general terms, a Drive implies a motivational force toward a definable end, the need generated (by the Drive) being that thing necessary for a particular purpose; the Hunger Drive, for instance, generates a need for satiation of the Drive. This is basically different from taking the stance that an instinctual Drive carries with it a definition of that object which can satisfy its generated need, e.g. saying that the Hunger Drive generates a need for food.
The model is endowed only with a set of innate Drives which generate a need that requires satiation, but which do not define the object which may satiate it.
When the Drive is functioning, the satiation of the generated need gradually comes to be of prime importance to the organism. This is similar to Hebb's (Hebb, 1949) picture of an internal Drive being innate neural circuitry which, when activated (the amount of activity being proportional to the generated need), tended to disrupt the normal processes of perception (normal patterns of stimulation). However, due to the vast quantity of differing patterns of stimulation being received by the infant, it is difficult to see how the patterns of firing set up by the Drive circuitry and those set up by external sources could interact to produce some identifiable interference pattern.
The model herein presented generates a specific stimulus element called Hunger when the Drive starts functioning. The stimulus generated remains the same as the Drive increases in activity, all that is changed being the frequency of emission of the stimulus element. Thus, as the generated need increases, so, proportionally, does the number of emitted Hunger stimuli, serving to disrupt the normal process of perception by pervading the entire system with their insidious effect.
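This emission scheme can be sketched in a couple of lines of Python (the rate constant and the integer rounding are assumptions; only the proportionality between need level and emission frequency is taken from the text):

    # Illustrative sketch: the number of Hunger stimuli emitted per time
    # interval grows in proportion to the current need level of the Drive.
    def emit_hunger_stimuli(need_level, stimuli_per_unit_need=1):
        count = int(need_level * stimuli_per_unit_need)
        return ["HUNGER"] * count      # the same element, emitted more often

    print(emit_hunger_stimuli(0))      # [] - no need, no disruption
    print(emit_hunger_stimuli(3))      # ['HUNGER', 'HUNGER', 'HUNGER']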
When the system is in a state of need, those structures related to need satiation (containing the associated primary in their conditional set) obviously obtain a greater probability for activation than those which are not related. When such Rules are initially created, they tend to incorporate the primary stimuli and that object which led to the need-satiation. When an activated Rule leads to alleviation of a need, it gets positively reinforced. Therefore, those structures which are related to need satiation and which are well placed in the league table tend to incorporate those elements which have become identified with alleviating the need. This means that the element identifying a need and the element identifying the means of satisfying the need become associated within the same cognitive structure.
Reflect now on Piaget's assertion: the different needs and interests that motivate the child to apply and develop his schema are conceptually dependent on those schema ... what can 'reinforce' a child's behaviour, that is, satisfy or reduce his needs, is not identifiable apart from his schemas, so that both need and satisfaction are motivations which are 'indissociable from cognitive structure' (Mischel, 1971). An affect is, therefore, born of the association between the need and its satiating means, both contained within the same structure, and hence inherently becoming more desirable to the system during the functioning of the Drive. It is this discovery of the desirability of a Rule with respect to the state of the organism which leads to a system of affects (or emotive feelings), and is not something inherently possessed by the organism. Contrary to Freud's beliefs, affects (the attachment of feeling to a cognitive structure) are not given to the system, attached from the onset to particular structures, but result from the properties of the structure itself. Thus, when hungry, the infant is motivated to use those structures incorporating the belief that they are capable of alleviating his hunger, i.e. he has a need to motivate a particular kind of structure, rather than arbitrarily apply his structures. Further, the presence of a primary stimulus leads to the generation of structures incorporating it in the conditional set, i.e. the infant has a need to develop a particular kind of cognitive structure. He, therefore, has different needs to motivate and develop his cognitive structures, these needs being generated by the Drive System.
The generation of internal primary stimuli by the Drive Process may be seen as similar to Hull's (Mischel, 1971) internal drive stimuli (Sd) which together with the perceived stimulus played a role in the selection of the response.
In this system, internal drive stimuli became incorporated with externally derived stimuli within the same Production Rule through the Drive Law present in the Learning Mechanism. They act together, therefore, to form the conditions upon which responses may be selected. Motivation becomes connected, via learning, with reinforcement and internal factors that determine the 'direction' of behaviour (Mischel, 1971).
Any structures incorporating the Drive Stimuli which, when activated, lead to the alleviation of the need (the need level associated with the particular Drive), become positively reinforced, i.e. they have their affect strengthened. Conversely, an increase in need after such a structure has been activated leads to the negative reinforcement of that structure.
This may be seen as leading to that process through which the infant acquires beliefs about his actions with respect to his environment. A well established cognitive structure is imbued with the belief that its activation will lead to eventual need satiation.
From this one may infer that any response with a need-satiating object (such as food) should become stabilised much more rapidly when the need level is at its highest, i.e. the organism should condition much more quickly when hungry (and using food as reinforcement) than when just satiated. Prechtl (Prechtl,1957) talks of the facilitating effect of hunger on obtaining the head-turning response, although Siqueland and Lipsitt (Siqueland & Lipsitt, 1966) could not show that the level of hunger had any significant effect on the rate of conditioning. Experiments by Stone (Vince,1961) on older rats, showed that their learning performance could be significantly raised above average levels by increasing their level of hunger. This suggests, according to Vince, that some facet of behavioural changes may be due to alterations in level of motivation.
This chapter attempts to present some reasons, taken from current psychological theories, which may serve to justify the chosen architecture of the model.
It presents a theory of development through genetic endowments wherein the infant is born with a set of basic structures and processes. It is the interaction of his basic endowments with the experiential data he obtains which enables him to form a model of his world.
Some predictions as to the nature of learning with regard to the model's architecture and the form that it should take are presented.
The laws for learning proposed in this theory are meant to be a common subset of the capabilities found in the species' initial repertoire. That they may serve to simulate the early learning abilities of the infant is predicted, the results from the actual programmed model serving to stand as justification.
In summary, Figure 5.2 presents the model in two categories: the set of genetic endowment programs, and, the set of genetic endowment structures. The arrowed lines indicate the direction of the flow of information through the model.
There are three primary memory modules available to the system and a number of secondary memory modules maintained by the system:
SSTM serves as an input buffer for the environmental information that is generated in each time interval. Figure 6.1 gives the structure for SSTM. It is a two-dimensional array, where SSTM (I, 1) contains the nearest equivalent symbol position and SSTM (I, 2) contains the actual symbol position in the Stimulus Symbol Table. SSMX defines the maximum number of symbol structures that SSTM can hold during any time interval. As soon as a symbol is entrant, it is assigned a unique position in the Stimulus Symbol Table where the entry key is the name of the symbol structure. Thus SSTM contains each unique internal identifier, rather than the actual symbol itself.
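By way of illustration only (a Python picture rather than the FORTRAN array; the capacity figure and function name are assumptions), SSTM can be thought of as a two-column buffer of at most SSMX entries:

    # Illustrative sketch of SSTM: a buffer of at most SSMX entries, each
    # holding the nearest-equivalent symbol position and the actual symbol
    # position in the Stimulus Symbol Table.
    SSMX = 10                           # assumed maximum capacity
    SSTM = []                           # entries of (nearest_position, actual_position)

    def enter_symbol(nearest_position, actual_position):
        if len(SSTM) < SSMX:            # discard anything beyond the capacity
            SSTM.append((nearest_position, actual_position))

    enter_symbol(3, 7)                  # a symbol whose nearest existing entry is 3
    print(SSTM)                         # [(3, 7)]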
Figure 6.2 gives the design for the Stimulus Symbol Table. Consider an entrant symbol structure with the name TOUCH. TOUCH obviously belongs to a particular mode of entry (Modality) and may be a Primary or Secondary stimulus (TOUCH being, by definition, secondary). It also has distinctive attributes:
What, therefore, actually identifies a stimulus to the organism is not just the entrant symbol, but all its associated properties. Thus, the stimulus TOUCH mode TACTUAL, classification NON-PRIMARY, with attributes LIGHT, LEFT CHEEK, for instance, would be identifiable as being different from the stimulus TOUCH, mode TACTUAL, classification NON-PRIMARY with attributes LIGHT, RIGHT CHEEK.
Therefore, it is not the entry in STIMNM (see Figure 6.2) which uniquely identifies a stimulus, but the associated entry in STIMLS, which is indicated by the pointer MPS. SSTM contains, then, the MPS value of a stimulus. The MPS value, unique to each stimulus, enables access to the attributes list in the list structure STATAR, to the symbol itself in STIMNM, and, through the forward pointer, to the classifications list in the structure STCLAR. The forward pointers from list to list are all used to ensure that the correct build up of the Stimulus Symbol Table is maintained as new symbols, classifications and attributes are added to the system. Each entry in the classifications and attributes lists may have as many positions assigned to it as is required, the requirement being given by the preceding COUNT value. In the STCLAR list no actual terminating value is given to each entry, termination being defined by COUNT and the forward pointer from STIMCL to the next entry in STCLAR. STATAR, however, has as its initial element for each entry a backward pointer to STIMNM, followed by a forward pointer to STIMLS, then the COUNT succeeded by as many attributes as defined by the COUNT value, the terminating element being a forward pointer to the most similar entry in STATAR. This enables similar stimuli (possessing common classifications, say, would be one definition of similarity) to be linked together in STATAR. STIMLS contains all information pertaining to each stimulus (as opposed to each symbol structure), which information is continuously updated by the system. STBKAT is a backward pointer to the entry in STATAR. The other elements contain the following information:
It must be remembered that any entry pertaining to a symbol structure in the lists STCLAR and STATAR may have as many elements as required. The system is therefore easily extensible, allowing for other classification or attribute factors to be defined, such as, for example, a signal value for each stimulus dependent on such properties as whether it is reflexively known, conditioned, etc.
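As an informal illustration (Python rather than the FORTRAN arrays, with a dictionary standing in for the linked structures STIMNM, STIMLS, STCLAR and STATAR), the table can be pictured as one entry per distinct stimulus, keyed by its MPS value:

    # Illustrative sketch: one table entry per stimulus (name, modality,
    # classification and attribute list), keyed by an MPS-like value.
    table = {}
    next_mps = [1]

    def enter_stimulus(name, modality, classification, attributes):
        """Return the MPS value of the stimulus, inserting it if it is new."""
        for mps, entry in table.items():
            if entry["name"] == name and entry["attributes"] == attributes:
                return mps                           # already perceived
        mps = next_mps[0]
        next_mps[0] += 1
        table[mps] = {"name": name, "modality": modality,
                      "classification": classification, "attributes": attributes}
        return mps

    a = enter_stimulus("TOUCH", "TACTUAL", "NON-PRIMARY", ("LIGHT", "LEFT CHEEK"))
    b = enter_stimulus("TOUCH", "TACTUAL", "NON-PRIMARY", ("LIGHT", "RIGHT CHEEK"))
    print(a, b)    # two different MPS values: same name, but different stimuli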
The Stimulus Symbol Table is designed so as to act as a pattern recognition system, capable of holding individual entries per stimulus element or of chunking such items and replacing them with the chunk code when necessary. All that is required to be added, therefore, is a chunking routine. (Some ideas for a chunking routine are discussed in Section 9.2.1).
The memory structures themselves are so designed as to cater for most extensions to the system.
Figure 6.3 gives a design for STM and OTM. As in SSTM, STM contains entries for the nearest equivalent stimulus in STM (*, 1) and the actual stimulus position in STM (*, 2). STMTAG holds the tag value for each entry which defines its priority for remaining in STM, and STMAGE gives the current age of the stimulus since its entry into STM, i.e. the number of consecutive time intervals that it has been in STM. STMCAP defines the maximum number of entries that can be defined in STM in any time interval (current capacity of STM being STPTCY ≤ STMCAP).
The permanent data base for holding encoded cognitive structures is LTM. Each structure is encoded as a Production Rule, the maximum number being defined as PRDLMX.
Figure 6.4 gives the structure for LTM. Each Rule has its list of elements defined by:
The system is able to maintain a list of Production Rules with variable left and right hand sides. To keep storage down, the system was designed to maintain a unique entry per Rule, with its own set of pointers to the different lists, as opposed to having a fixed format in which the maximum expected size is the storage space for each Rule. All contents of a Rule are packed tightly into the three one-dimensional arrays PRSTLA, PRRSRA and PRSTRA.
When a new Rule is inserted into the Production System, it is assigned its entry position in the arrays PRLST, PRNXT etc, where PRLST would be a pointer to the preceding Rule in the system (a new Rule is placed at the end of the System ) and PRNXT would be a pointer to the successor. This does not necessarily mean that PRLST points to the last Rule to be generated, rather to that Rule occupying the position preceding this one.
Its left hand stimulus set is entered from the next free position onwards, preceded by a count, in PRSTLA, a pointer to the count element being inserted into PRSTLF.
The right hand response set is entered from the next free position onwards, preceded by a count, in PRRSRA, a pointer to the count element being inserted into PRRSRG.
The right hand stimulus set is entered from the next free position onwards, preceded by a count, in PRSTRA, a pointer to the count element being inserted into PRSTRG.
The elements in each list (left stimulus, right response, and right stimulus) are ordered in ascending order. This makes the matching process much easier. For example, if the symbol structure designated by the internal unique identifier 1 is being matched, and the first element encountered in the array PRSTLA (accessed by the pointer in PRSTLF) is greater than 1, then that Rule may be abandoned, the search going onto the next one in the system, given by the pointer in PRNXT.
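The early-abandonment test described above can be sketched as follows (Python, with a simple list standing in for PRSTLA and the function name invented):

    # Illustrative sketch of the ordered-list matching described above.  Each
    # Rule's left hand stimulus list is held in ascending order, so the search
    # for a given internal identifier can abandon a Rule early.
    def lhs_contains(left_hand_side, identifier):
        for element in left_hand_side:
            if element > identifier:    # ordered list: identifier cannot follow
                return False            # abandon this Rule, go on to the next
            if element == identifier:
                return True
        return False

    print(lhs_contains([1, 5, 6], 5))   # True
    print(lhs_contains([2, 5, 6], 1))   # False - first element already exceeds 1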
Similarly responses are decoded into internal numbers, a unique number designating each response element. (Numbers starting from 1001 onwards and obtained by the Function RSINNM (RSNMPS).)
Finally, consider the Production Rule
1, 5, 6 ⇒ <1001> 3
to be inserted into the system. Figure 6.5(a) gives the Production System before insertion, and 6.5(b) gives the System after insertion.
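To illustrate how such an insertion might proceed (Python lists standing in for the FORTRAN arrays; the count-prefixed packing follows the description above, while the omission of PRLST, PRNXT and the other bookkeeping arrays is a simplification):

    # Illustrative sketch of inserting the Rule  1, 5, 6 => <1001> 3  into
    # packed arrays.  Each entry is stored as a count followed by its elements,
    # and a per-Rule pointer records where its count element lies.
    PRSTLA, PRRSRA, PRSTRA = [], [], []   # packed left-stimulus, response, right-stimulus lists
    PRSTLF, PRRSRG, PRSTRG = [], [], []   # per-Rule pointers to each count element

    def insert_rule(left, responses, right):
        for packed, pointers, items in ((PRSTLA, PRSTLF, left),
                                        (PRRSRA, PRRSRG, responses),
                                        (PRSTRA, PRSTRG, right)):
            pointers.append(len(packed))  # pointer to the count element
            packed.append(len(items))     # the count itself
            packed.extend(sorted(items))  # elements kept in ascending order

    insert_rule([1, 5, 6], [1001], [3])
    print(PRSTLA, PRSTLF)                 # [3, 1, 5, 6] [0]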
A suite of programs has been designed and encoded in FORTRAN so as to simulate the innate processes available to the infant (see Figure 6.6). The major processes are:
Figure 6.6 serves to describe the flow of execution through the model:
Firstly, at the beginning of every time interval, a new batch of information is generated and entered into SSTM as the current sensory data.
SSTM is acted upon by the Perceptual Process, which encodes the sensory information into uniquely identifiable symbol structures, updates all information pertaining to each symbol structure in the Stimulus Symbol Table, ascertains priorities for movement of data from SSTM to STM and, at this point, if informed by the Drive Process as to some need level acuity, cuts down STM and channel capacity to 1, and then eventually transfers the current Perceptual Set from SSTM to STM.
The Learning Mechanism, using the current information contained in STM and with access to OTM, SIGPR (the most significant Rule activated to date) and STPRED (the list of expected internal symbol structures), creates new Production Rules and Reinforces certain old Rules. It also receives information from the Drive Process as to any alteration in need levels.
The Drive Process computes the new need levels and alerts the Perceptual Process if any need level has exceeded any defined boundary limits.
The Cognitive Processes update SIGPR and PRPRED (by decreasing their worth value by a temporal decremental factor), match the items in STM with the left hand sides of the Rules in LTM (there is a limit to the number of matched Rules that may be retrieved, thereby limiting search time and space), compute the current inherent worth of each matched Rule, choose Rules for activation dependent on their worth values, update STM with any internally generated symbol structures, fill in the current response list and finally update SIGPR, PRPRED (those productions which generated the most significant predictions) and STPRED, dependent on the currently activated set of Rules and currently generated symbol structures.
TNOW is now updated by one psychological quantum of time (which at present is set at 1 unit of time) and the model enters its next cycle of execution.
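The cycle just described could be sketched, very roughly, as the following Python loop; every function here is a trivial placeholder for the corresponding process of Figure 6.6, not the FORTRAN routine itself:

    # Illustrative sketch of the model's cycle of execution.
    def generate_sensory_data(tnow):  return ["stimulus-at-" + str(tnow)]
    def perceptual_process(sstm):     return sstm    # encode and transfer to STM
    def learning_mechanism(stm):      pass           # create and reinforce Rules
    def drive_process(tnow):          pass           # recompute need levels
    def cognitive_processes(stm):     pass           # match, activate, respond

    def run_model(cycles, quantum=1):
        tnow = 0
        for _ in range(cycles):
            sstm = generate_sensory_data(tnow)   # new sensory data enters SSTM
            stm = perceptual_process(sstm)       # Perceptual Set moved to STM
            learning_mechanism(stm)              # Learning Mechanism acts on STM
            drive_process(tnow)                  # Drive Process updates needs
            cognitive_processes(stm)             # Cognitive Processes choose Rules
            tnow += quantum                      # advance one psychological quantum

    run_model(3)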
The stimulus information that is generated in each time interval is dependent on:
The routine SSINNX defines the different test environments that have been specified to the model. It generates stimuli dependent on whether:
Thus, SSINNX must have defined for it, values for:
SSINNX also has defined to it the available sensory information it is to generate in each cycle, the actual item chosen for generation being dependent on the TMST (the current TCYCLE in a trial) value and the previous responses of the model.
If the model is to be tested in some standard environment (as opposed to an experimental one) then the routine INSTIM reads in the sensory input (from a pack of data cards or from an input terminal) and, currently, it is up to the programmer to define his input for each time interval.
The routine expects the input to consist of two blocks: first, the number of stimulus names, followed, for each stimulus, by its name, a classification count and the classification list; and second, the number of stimulus names, followed, for each stimulus, by its name, an attributes count and the attributes list.
For example, the input could be
3 TOUCH 2 5001 5002 NOISE 1 5001 FOOD 1 5001
3 TOUCH 1 4005 NOISE 2 4005 4007 FOOD 1 4006
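To make the format concrete, a small Python sketch that parses one such block is given below (the parser is an illustration of the format only, not the INSTIM routine):

    # Illustrative sketch: parse one block of the input format described above
    # (a count of stimulus names, then a name, a count and a list per stimulus).
    def parse_block(tokens):
        it = iter(tokens)
        result = {}
        for _ in range(int(next(it))):
            name = next(it)
            count = int(next(it))
            result[name] = [next(it) for _ in range(count)]
        return result

    classifications = parse_block("3 TOUCH 2 5001 5002 NOISE 1 5001 FOOD 1 5001".split())
    attributes      = parse_block("3 TOUCH 1 4005 NOISE 2 4005 4007 FOOD 1 4006".split())
    print(classifications)   # {'TOUCH': ['5001', '5002'], 'NOISE': ['5001'], 'FOOD': ['5001']}
    print(attributes)        # {'TOUCH': ['4005'], 'NOISE': ['4005', '4007'], 'FOOD': ['4006']}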
The routines SSCVIN, SSORDR and STIMIN compose the Perceptual Process.
SSCVIN - this sets up the currently generated sensory information in the array SSTMBF. At the end of the routine, SSTMBF uniquely defines each sensory item in its internal form and SSTMBF (*, 1) contains the nearest equivalent sensory item in the Stimulus Symbol Table if this is its first appearance, and (*, 2) contains the actual position of the item in the Stimulus Symbol Table.
In order to encode each sensory item into its uniquely identifiable internal form, the function STINNM is summoned, which returns the position of that symbol structure matching the given sensory item in the table STIMNM (it returns, as an integer value, the STIMNM position).
STINNM assumes that every sensory item to be encountered by the model must be initially preset into STIMNM.
There is, however, a considerable difference between a sensory item defined to the model and the actual stimulus facets that it may represent. The actual stimulus facet is defined by the attribute and classifications entries for each sensory item. Thus:
TOUCH with attributes LIGHT, LEFT CHEEK
is a stimulus facet of the sensory item TOUCH and differs from the stimulus:
TOUCH with attributes LIGHT, RIGHT CHEEK
SSCVIN calls the routine STENTR to insert each stimulus into the Stimulus Symbol Table. To do so, STENTR checks to see if the attributes list defined for that sensory item is already present in the array STATAR. If it is, then the stimulus has already been perceived previously, and the position of its entry in STIMLS (its MPS value) is returned as both the actual and the nearest value for insertion into SSTMBF.
If the attribute list is not present in its defined form, then it is added into that list which is considered as being closest to it. To do so the routine STCNT is called which returns a count for the number of attributes in the input list matching the attribute list currently being scanned in STATAR. A pointer is kept to that list having the greatest STCNT value. After STCNT has terminated its search, the input attribute list is appended to the list immediately succeeding that entry which has been estimated as having the highest matching count. The pointers in the STATAR array are updated to account for the new insertion. SSTMBF (*, 1) contains the position in STIMLS of that entry estimated as the closest match, and SSTMBF (*, 2) contains the position in STIMLS of the newly appended stimulus (see Figure 6.7).
If no attribute list exists for the input sensory item then it is appended at the end of the attribute array STATAR, and a pointer to it is inserted into STIMAT. An entry position is obtained for it in STIMLS and that position value is returned to SSTMBF as the nearest and actual value.
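The nearest-match insertion performed by STENTR and STCNT might be sketched as follows; the plain Python lists stand in for the STATAR/STIMLS arrays, and the 0-based positions are an assumption made purely for illustration.

# Sketch of the nearest-attribute-match insertion performed by STENTR and
# STCNT; plain Python lists stand in for the STATAR/STIMLS arrays.
def enter_stimulus(attr_lists, new_attrs):
    """Return (nearest_position, actual_position) of new_attrs."""
    # Exact match: the stimulus has already been perceived.
    for pos, attrs in enumerate(attr_lists):
        if attrs == new_attrs:
            return pos, pos

    # Otherwise find the existing list sharing most attributes (the STCNT count).
    best_pos, best_count = None, 0
    for pos, attrs in enumerate(attr_lists):
        count = len(set(attrs) & set(new_attrs))
        if count > best_count:
            best_pos, best_count = pos, count

    if best_pos is None:
        # No attribute in common: append at the end of the table.
        attr_lists.append(new_attrs)
        return len(attr_lists) - 1, len(attr_lists) - 1

    # Append immediately after the closest match.
    attr_lists.insert(best_pos + 1, new_attrs)
    return best_pos, best_pos + 1

table = [[4005], [4005, 4007], [4006]]
print(enter_stimulus(table, [4005, 4009]))   # (0, 1): nearest and actual entries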
SSORDR - this routine re-orders the symbol structures designating the stimulus items in SSTMBF and copies them into the array SSTM. In so doing, any multiple occurrences of the same stimulus are removed leaving only one entry per stimulus. SSORDR exits with SSMX containing the number of stimuli in SSTM. One may consider that all the perceptual processing is done on SSTM, the buffer SSTMBF only serving as a computational asset and not existing as part of the defined innate structures within the system.
STIMIN - this is the major routine in the Perceptual Process. It defines the Perceptual Set in every time interval. Appendix 4 gives a flow diagram for STIMIN.
It performs the following functions:
Confirmed internal stimuli = 4
Primary but non-acute stimuli = 3
Non-habituated, recurrent stimuli = 3
Non-confirmed, internal stimuli = 2
Habituated stimuli = 1
Non-attended, old stimuli = 1
Attended, old stimuli = 0
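Assuming these values act as attention priorities when the Perceptual Set is assembled, a minimal sketch of their use might be the following (the category names are illustrative, not the thesis identifiers).

# Assumed encoding of the attention priorities listed above, used to order
# candidate stimuli when the Perceptual Set is assembled (names illustrative).
PRIORITY = {
    "confirmed_internal": 4,
    "primary_non_acute": 3,
    "non_habituated_recurrent": 3,
    "non_confirmed_internal": 2,
    "habituated": 1,
    "non_attended_old": 1,
    "attended_old": 0,
}

def order_candidates(stimuli):
    """stimuli: list of (name, category) pairs; highest priority first."""
    return sorted(stimuli, key=lambda s: PRIORITY[s[1]], reverse=True)

print(order_candidates([("FOOD", "confirmed_internal"),
                        ("R1", "attended_old"),
                        ("HUNGER", "primary_non_acute")]))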
Although the Learning Mechanism (see Figure 6.6) is defined as being executed after the Perceptual Process, in reality it is called upon in two stages: once between SSORDR and STIMIN, and again after STIMIN.
The Learning Mechanism has access to past information contained in OTM, SIGPR, PRPRED and STPRED (see Appendix 2).
New Production Rules are created according to the inherent laws contained within the Mechanism, each law being expressed as a different routine executed under predefined contexts.
There are five laws, each expressed as a single routine within the Learning Mechanism (see Section 4.2.4).
CONCAT - this generates new Rules according to the Law of Causality. It accesses OTM (the set of stimuli which arrived in STM at TNOW-1) and chooses a non-primary stimulus S from the set in OTM.
If the most significant Production Rule (obtained from the contents of SIGPR) was of the form:
S1 S2 S3 ⇒ < R1 >
then a new Rule of the type:
S1 S2 S3 ⇒ < R1 > S
is generated.
At present, CONCAT may only generate Rules from SIGPR if the Rule in SIGPR did not possess an internal stimulus. However, in general, CONCAT may be considered additive. Thus, if the Rule in SIGPR was:
S1 S2 ⇒ < R1 > S3
then the new Rule would be created by adding S to the set:
S1 S2 ⇒ < R1 > S S3
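A minimal sketch of this step of the Law of Causality follows; the dictionary representation of a Rule is an assumption made purely for illustration.

# A sketch of the CONCAT step: a non-primary stimulus S from OTM is appended
# to the expectation (internal stimulus) side of the most significant Rule.
import random

def concat(sig_rule, otm, primaries):
    candidates = [s for s in otm if s not in primaries]
    if not candidates or sig_rule["internal"]:
        return None                          # current restriction on CONCAT
    s = random.choice(candidates)
    return {"lhs": list(sig_rule["lhs"]),
            "responses": list(sig_rule["responses"]),
            "internal": sig_rule["internal"] + [s]}   # inserted at end of System

rule = {"lhs": ["S1", "S2", "S3"], "responses": ["R1"], "internal": []}
print(concat(rule, ["S", "HUNGER"], primaries={"HUNGER", "PAIN"}))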
SLTCHG - this generates new Rules according to the Law of Temporal Contiguity. If the two Rules:
S1 S2 ⇒ < R1 R2 R3 > S11' S12' S13'
S3 S4 ⇒ < R4 R5 R6 > S14' S15'
are activated within the same time interval, they may lead to the generation of any one of three Rules, the Rule type actually created being chosen at random. These are:
S1 S2 S4 ⇒ < R1 R2 R3 >
which involves the addition of a left hand stimulus from one Rule to the other. Thus, as many combinations of this kind of Rule as there are stimuli in the left hand set may be generated.
Or:
S1 S2 ⇒ < R1 R2 R3 R5 >
which involves the addition of a right hand response from one Rule to the other. Thus, again, the number of different combinations depends on the number of responses contained in each Rule.
Or:
S1 S2 ⇒ < R1 R2 R3 > S11' S12' S13' S15'
which involves the addition of a right hand internal stimulus from one Rule to the other.
In every case, the actual one chosen is obtained at random.
The new Rule is inserted at the end of the System.
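A corresponding sketch of SLTCHG, using the same assumed Rule representation; the direction of transfer (from the second Rule into the first) is arbitrary here.

# A sketch of SLTCHG: two Rules active in the same interval are combined in
# one of three ways, chosen at random.
import random

def sltchg(rule_a, rule_b):
    kind = random.choice(["lhs", "responses", "internal"])
    donors = [x for x in rule_b[kind] if x not in rule_a[kind]]
    if not donors:
        return None
    new_rule = {k: list(v) for k, v in rule_a.items()}
    new_rule[kind].append(random.choice(donors))
    return new_rule                          # inserted at the end of the System

a = {"lhs": ["S1", "S2"], "responses": ["R1", "R2", "R3"],
     "internal": ["S11'", "S12'", "S13'"]}
b = {"lhs": ["S3", "S4"], "responses": ["R4", "R5", "R6"],
     "internal": ["S14'", "S15'"]}
print(sltchg(a, b))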
EXTEND - this generates new Rules according to the Law of Insufficiency.
If STM at TNOW contains the set:
S1 S2 S3
and, if any of this set is a non-primary and was not attended (did not match the lhs of a Rule chosen for activation), then, it is added onto one of the Rules that was chosen for activation.
Thus, if the Rule activated (and chosen for addition) was:
S4 ⇒ < R1 >
then, the new Rule:
S1 S4 ⇒ < R1 >
may be created, where S1 is non-primary and was not attended.
EXTEND is used specifically, also, for adding the primary stimulus Hunger onto the left hand stimulus set of an activated Rule, if it is not already present. Thus, if STM at TNOW contains:
S1 S2 H
and, if the Rule activated was:
S4 ⇒ < R1 >
then, H is added to this Rule even if it was attended to, i.e. even if it matches the conditional set of an activated Rule.
The new Rule created is:
S4 H ⇒ < R1 >
EXTEND also caters for the accommodation of novel stimuli.
If at TNOW-1, a Rule activated was of the Interrogational type:
⇒ < R2 >
and STM at TNOW contains a novel stimulus of modality M1, which modality matches with some stimulus S contained in the stimulus set of OTM and which S was not attended, then, a new Rule is created:
S ⇒ < R2 >
ADDHNG - this generates new Rules through influence from the Drive Process.
If STM at TNOW-1 contained:
S1 S2 H S3
(where H = Hunger) and the Rules activated were:
S1 S2 ⇒ < R1 R2 > S11' S12'
S3 ⇒ < R3 R4 > S13' S14'
and: HCOUNT(TNOW) > HCOUNT(TNOW-1) (the Hunger need has been reduced), then the new Rule created is:
S1 S2 H ⇒ < R1 R2 > S11' S12'
or
S3 H ⇒ < R3 R4 > S13' S14'
(one being chosen at random, provided it did not previously contain H).
If STM at TNOW-1 contained:
S1 S2 P S3
and a Rule activated was:
S1 S2 ⇒ < R1 R2 > S11' S12'
and PCOUNT (TNOW) < PCOUNT(TNOW-1) (the Pain level has been reduced), then, a new Rule is created of the form:
S1 S2 P ⇒ < R1 R2 > S11' S12'
PNNGRF - this generates a special kind of Production Rule known as an INHIBIT Rule which, if activated, should not be obeyed (i.e. its responses should be inhibited).
If a stimulus PAIN (P) of modality M is present in STM at TNOW (not being present in OTM), and if, at TNOW-1, a Rule was activated with a stimulus of modality M in its conditional set, e.g.
S1 S2 S3 ⇒ < R1 R2 >
then, it is believed to be responsible for causing PAIN, and a new Rule of the type:
S1 S2 S3 ⇒ * < R1 R2 >
is created, where the * indicates inhibition of R1 and R2. In reality, an INHIBIT Rule is encoded as:
S1 S2 S3 ⇒ <R R1 R2 >
where R is the lowest value assumed by any response in the response set (= 1000). Since the response list is in ascending order, only the first element need be checked to see if it is an INHIBIT Rule.
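A minimal sketch of this encoding convention follows; the marker value 1000 is taken from the text, while the assumption that ordinary response codes all exceed it is illustrative.

# A sketch of the INHIBIT-Rule convention: only the first element of the
# ascending response list need be inspected.
INHIBIT_MARKER = 1000

def is_inhibit_rule(responses):
    """responses: ascending list of integer response codes."""
    return bool(responses) and responses[0] == INHIBIT_MARKER

print(is_inhibit_rule([1000, 1004, 1007]))   # True  - responses to be suppressed
print(is_inhibit_rule([1004, 1007]))         # False - an ordinary Rule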
REINF - the Learning Mechanism, in addition to creating new Rules, also Reinforces certain Rules negatively or positively according to success or failure.
PRDIGS - checks to see if any internally generated stimulus maintained in the list STPRED has been confirmed in STM at TNOW. It removes confirmed internal stimuli from STPRED (by setting the position number of the Production Rule which confirmed it to zero), and if the Production number of the Rule which generated that stimulus (contained in PRPRED) is also the most significant Rule to date (SIGPR), then the worth of the Rule in SIGPR (SGPRW) is set to zero, thereby removing the Rule from its position of significance. The associated Rule in PRPRED is rewarded by the Reinforcement Process. PRDIGS also serves to remove predictions from STPRED if their associated worth values (WTPRED) are significantly lower than that of the most significant prediction. It does so by using a factor WTHCUT - a cut-off point for worth values.
PRRMVL - this routine rewards those Rules which served to reduce a primary need. If any Rule has been activated with a Primary in its conditional set, and if:
HCOUNT(TNOW) ≥ HCOUNT(TNOW-1) or PCOUNT(TNOW) ≤ PCOUNT(TNOW-1)
then that Rule is rewarded.
INTNOV - this rewards any Rules which are believed to have generated a novel stimulus.
If any Rule activated prior to TNOW contained a stimulus S of modality M in its conditional set, and STM at TNOW contains a novel stimulus S' of modality M, then the Rule is rewarded.
PRINBF - if an attempt is made to generate a Rule which is already present in the Production System, then it is subject to positive reinforcement. This is an important mechanism for reward, since learning is made possible even when Rules have not been subject to activation. Learning, therefore, does not always occur on the basis of what has been recalled, but can be effected in the quiescent memory layers as well.
PRINVL - if the Drive Process reports an increase in any need level, such that:
HCOUNT(TNOW) < HCOUNT(TNOW-1) or PCOUNT(TNOW) > PCOUNT(TNOW-1)
and there exists an activated Rule with the associated Primary in its conditional set, then that Rule is subject to negative reinforcement.
CALSIG - if any stimulus in STPRED has been overwritten by a new internal stimulus, prior to confirmation of the stimulus, then the associated Rule (given in PRPRED) is subject to negative reinforcement.
REINF uses three routines for reinforcing Production Rules.
MOVEUP - moves a Rule upwards, the displacement distance being computed as:
New position = Old position - [(Old position * M1)/16] + 1
where M1 is a multiplicative factor, the whole being rounded to the nearest integer.
M1 is defined for each type of movement, having different values for amount moved up when an internal stimulus is confirmed (JPRIGS), amount moved up for reduction in hunger need (JPRLES), amount moved up for causing novelty (JNOVUP) and amount moved up for regenerating an existent Rule (JALCRE).
The different values for each of these Parameters will be further discussed in Section 8.2.3.4.
MOVDW - similarly, this routine negatively reinforces a Production Rule, the displacement distance being computed as:
New position = Old position - [(Old position * M1)/16] + 1
rounded to the nearest integer.
M1 has different values dependent on whether the Rule was punished for increasing the hunger need (JPRGRT), for non-confirmation of an internal stimulus (JPNIGS), or for generating pain (JPNGRF).
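As an illustration, and assuming that for a downward (punishing) movement the displacement term is added rather than subtracted, the two movements might be sketched as follows; clamping to the ends of the System is a further assumption.

# Illustrative positional reinforcement over the formula given above.
def move_up(old_pos, m1):
    return max(1, round(old_pos - (old_pos * m1) / 16 + 1))

def move_down(old_pos, m1, system_size):
    return min(system_size, round(old_pos + (old_pos * m1) / 16 + 1))

print(move_up(32, 8))          # round(32 - 16 + 1) = 17
print(move_down(17, 4, 100))   # round(17 + 4.25 + 1) = 22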
MVUPDW - consider the following situation:
in TNOW-1 a Rule is activated of the type:
P1 : S1 ⇒ < R1 >
which leads to the generation of a new Rule at TNOW, say, of the type:
P2 : S1 ⇒ < R1 > S2
from the Law of Causality.
P2 is inserted into the system at a position one above the last Rule in the system - effectively giving it a very low probability for activation.
If at some further point in time, S1 recurs, then P1 is more likely to be activated than P2. If S2 now recurs in the next time interval, then obviously P2 would have been a more informative Rule to activate than P1. However, if an attempt is now made to regenerate P2, then P2 is moved upwards. It may be argued at this point that P1 should be correspondingly moved downwards, since in reality a better Rule, P2, exists in the system, which should supersede its creator P1.
This routine serves to move an associated Production Rule (P1) downwards every time an attempt is made to regenerate the Rule P2, thereby displacing P2 upwards. P1 is reinforced negatively according to the formula given in MOVDW, while P2 is moved upwards, such that:
New Worth of P1 + New Worth of P2 = Old Worth of P1 + Old Worth of P2
i.e. the total worth of the two Rules is kept the same.
The Cognitive Processes are made up of a number of routines designed to perform the following functions:
CALSIG - this routine maintains a memory of the most significant (greatest worth) Rule activated: its position number in the system is stored in SIGPR and its corresponding worth value in SGPRW. In each time interval, the Rule chosen first for activation has its worth value compared against SGPRW. (The first Rule need not be the one with the greatest worth value, since a worth value only gives a probability for activation, thereby giving an opportunity for secondary Rules to come above the best Rule.) Prior to this comparison, CALSIG decrements SGPRW by a temporal decay factor DECRT. Thus, in each time interval the Rule loses some reverberation or energy and hence some significance to the system. DECRT serves to lessen the memory of a significant event in a natural manner, allowing sufficiently significant successive events to dominate over past memories. If, at any current time interval, the first Rule chosen for activation has a greater worth value than SGPRW, then it overwrites the Rule in SIGPR. The Rule indicated in SIGPR dominates the learning faculties, such that any new item learned about is usually in relation to that Rule, and hence bears elements in common with it.
The list of the most significant predictions generated to date is kept in STPRED. The corresponding Production Rules which served to generate them are maintained in PRPRED, with associated worth values in WTPRED. Similar to the law which cuts SIGPR down to one at the commencement of development, NMPRED may also be restricted to some minimum, which is then allowed to grow as the child develops. In the current implementation, the length of STPRED may be varied by altering the parameter NMPRED. If the most significant prediction in STPRED has a worth value of W1, then the worth of every other prediction must come up to some proportion of W1. This is computed as:
(Worth of Prediction) / (Worth of Best Prediction) ≥ WTHCUT
WTHCUT defines a cut-off point, below which a prediction is considered to be too insignificant (relative to current predicted memories) to reverberate and be kept in memory. The worth of each prediction in WTPRED is also subject to temporal decay, such that each prediction reverberates less over time: the decay factor DCPRED reduces WTPRED by a small amount in each time interval.
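A sketch of this bookkeeping follows, with illustrative values for DCPRED, WTHCUT and NMPRED (the tuple layout for a prediction is also an assumption).

# Sketch of the STPRED/WTPRED bookkeeping: temporal decay by DCPRED, pruning
# below the WTHCUT proportion of the best prediction, and a length limit of
# NMPRED. Assumes positive worth values.
DCPRED, WTHCUT, NMPRED = 0.05, 0.25, 5

def update_predictions(predictions):
    """predictions: list of (stimulus, producing_rule, worth) tuples."""
    decayed = [(s, p, w - DCPRED) for s, p, w in predictions]
    if not decayed:
        return []
    best = max(w for _, _, w in decayed)
    kept = [(s, p, w) for s, p, w in decayed if w / best >= WTHCUT]
    kept.sort(key=lambda entry: entry[2], reverse=True)
    return kept[:NMPRED]

print(update_predictions([("FOOD", 12, 2.0), ("NOISE", 7, 0.4)]))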
SIGPR and STPRED allow the system to maintain a memory of the most significant past events and future expectations, rather than to remember merely on a temporal basis - i.e. on a basis of degree of impact upon the system rather than recency.
Anytime that the contents of SIGPR or STPRED are overwritten by another Rule or by other predictions, then they are automatically considered to have been lost to memory (short term not long term) and the Learning Mechanism takes appropriate action at such points (with regard to Reinforcement procedures as previously explained).
MATCH - this routine serves to compare the contents of STM against the conditional set of Rules encoded in LTM. It scans LTM serially, and hence has a limit for the number of Rules it may choose as being relevant by degree of match. (The cut off point at the moment is ten matched Rules. This, again, could be considered to be a developmental parameter set to increase according to the size of the system and degree of cognitive development.)
Consider the contents of STM to be:
S1 S2 S3 S4
and the Rules in the system to be:
P1 : S1 ⇒ < R1 >
P2 : S1 S2 ⇒ < R1 R2 >
P3 : S2 ⇒ < R2 >
P4 : S3 S4 ⇒ < R3 R4 >
P5 : S3 S5 ⇒ < R5 >
then obviously they are all potential candidates for the matching routine, since they all bear some degree of match to the contents of STM. Consider now the Rule P2. If this is chosen, then effectively S1 and S2 have been used up, i.e. already matched. Thus P1 may be eliminated on the basis of choosing P2. Similarly, P3 may be eliminated on choosing P2, and P5 on choosing P4.
Appendix 5 gives a flow diagram for MATCH. It calls the sub-routine PARMCH which scans through the Production System checking the Rule P in position p against the contents of STM. If a match of some kind is found, then the information is stored in the array MT, where:
MT (*, 1) = production number
MT (*, 2) = count of items on lhs of Rule
MT (*, 3) = number of matching items
MT (*, 4) = 0 if non-primary, 1 if primary element matched
PARMCH ensures that at least some proportion of the number of elements in each Rule is matched before it can be put forward as a relevant Rule. The minimum number of elements that must be matched is defined by the parameters PRNM (for a non-primary Rule) or PRPR (for a primary Rule), and not more than MSNM (for a non-primary) or MSPR (for a primary) elements may remain non-matched.
MATCH exits with a set of relevant Rules for CHOOSE to choose from.
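A sketch of the PARMCH acceptance test is given below; whether PRNM and PRPR are counts or proportions is not spelled out here, so the sketch treats them as minimum counts, and all threshold values are illustrative.

# Sketch of the PARMCH acceptance test over the conditional set of a Rule.
def relevant(lhs, stm, is_primary, min_match=(2, 1), max_unmatched=(2, 3)):
    """lhs: conditional set of the Rule; stm: current contents of STM."""
    matched = len(set(lhs) & set(stm))
    unmatched = len(lhs) - matched
    idx = 1 if is_primary else 0        # second entry applies to primary Rules
    return matched >= min_match[idx] and unmatched <= max_unmatched[idx]

print(relevant(["S1", "S2", "S5"], ["S1", "S2", "S3", "S4"], is_primary=False))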
WORTH - MATCH selects a list of the most relevant Rules for activation based only on their degree of contextual match. The routine WORTH computes the current inherent worth of each Rule. Each Rule must be considered to possess some potential for activation. The question is, how may this potential be determined? The routine WORTH assesses a number of characteristics for each Rule and gives each a measure, such that its total worth may be computed as some function of all the defined characteristics. These characteristics may be defined as:
number of matching elements / total number of elements
With regard to the internal stimulus list, complexity must be proportional to the number of beliefs currently expressed by the Rule.
The worth of a Rule may be expressed as the following relations:
W ∝ 1/p
W ∝ Primary
W ∝ Complexity
W ∝ Relevancy or degree of match
W ∝ 1/Age for criterial stimuli

The formula used by the system is:

WORTH = PRMCNS
        * (No. of elements on lhs + 1) / (No. of elements in STM)
        * (No. of matching elements on lhs + 1)² / (No. of elements on lhs)
        * 1 / (position + PSNCN1)^PSNCN2
        * 1 / Age
        * (1 + PRIGCN * No. of elements in internal stimulus list)
PRMCNS: if the Rule contains a primary, then its worth is enhanced by the multiplicative quantity PRMCNS (generally set to 0.5, i.e. an overall factor of 1.5); a primary Rule is, therefore, 1.5 times as good as an ordinary Rule.
(No. of elements on lhs + 1) / (No. of elements in STM): the complexity of the Rule is defined as a proportion of its own complexity to that of STM (adding 1 ensures the term will never be zero).
(No. of matching elements on lhs + 1)² / (No. of elements on lhs): this defines the relevancy of the Rule as the proportion of its criterial elements to the complexity of its conditional set.
1 / (position + PSNCN1)^PSNCN2: it was decided that 1/p² gave that curve within the family of curves (defined by 1/pⁿ) with the most desired characteristics (hence PSNCN2 = 2). The parameter PSNCN1 merely shifts the vertical axis of the graph towards the right. This means that even considering the first three Rules in the system, the difference between them is lessened by pretending that a number of preceding Rules (the number being the quantity PSNCN1) exist. Thus, the horizontal component δ (see Section 5.4.1, Figure 5.1) is kept more or less the same over all the system. If the parameter PSNCN1 were ignored, then δ over the first few Rules in the system (the first three, for example) would be far more emphasised than the δ measure over the Rules lower down in the system. By setting PSNCN1 = 10, say, it effectively reduces δ over the first few Rules (since now they are assumed to be at positions 11 onwards) and hence destroys the tremendous advantage that Rule 1 has over 2, and 2 over 3, etc.
1 / Age: the Age of each stimulus is given in STMAGE. The potential for activation (and hence for the worth) of a stimulus varies inversely as the length of time it has been resident in STM. Thus the younger a stimulus, the more effect it has on the worth of a Rule.
(1 + PRIGCN * No. of elements in internal stimulus list): the number of internal (expectational) stimuli attached to a Rule indicates its experience. The more beliefs it has about its expected consequences, the more it has been activated by the organism in response to environmental situations. PRIGCN increases the worth of a predictive Rule in proportion to the number of predictions it contains. In case the number of predictions is zero, a +1 quantity has been added so that this term is never zero.
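Gathering these terms, the worth computation might be sketched as follows. Two points are interpretations rather than statements from the text: the primary enhancement is taken as the factor (1 + PRMCNS), following the remark that a primary Rule is 1.5 times as good, and the single 1/Age term is taken as the mean age of the matching (criterial) stimuli. Parameter values are illustrative.

# A sketch of the WORTH computation over one matched Rule.
PRMCNS, PSNCN1, PSNCN2, PRIGCN = 0.5, 10, 2, 0.1

def worth(rule, stm, stm_age, position):
    lhs, internal = rule["lhs"], rule["internal"]
    matching = [s for s in lhs if s in stm]
    primary_term = 1 + PRMCNS if rule["primary"] else 1.0
    complexity = (len(lhs) + 1) / len(stm)
    relevancy = (len(matching) + 1) ** 2 / len(lhs)
    position_term = 1 / (position + PSNCN1) ** PSNCN2
    mean_age = sum(stm_age[s] for s in matching) / len(matching) if matching else 1.0
    experience = 1 + PRIGCN * len(internal)
    return (primary_term * complexity * relevancy * position_term
            * (1 / mean_age) * experience)

rule = {"lhs": ["TOUCH", "HUNGER"], "internal": ["FOOD"], "primary": True}
print(worth(rule, ["TOUCH", "HUNGER", "R1"],
            {"TOUCH": 1, "HUNGER": 2, "R1": 3}, position=1))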
CHOICE - The computed worth of each Rule defines its probability for activation. The Rule with the highest worth, therefore, has the best probability for activation but by no means may be determinately chosen. Consider the following Rules chosen by MATCH with worth values computed by WORTH:
P1 : W1 = 1.0
P2 : W2 = 1.5
P3 : W3 = 0.5
P4 : W4 = 0.2
P5 : W5 = 0.6
The RANDOM process assumes a RANGE of values as defined by the total cumulative worth values of all the Matched Rules. (RANGE = 3.8 in this case.) Considering each Rule, then, P1 has a 1 in 3.8 chance of being activated, P2 being better with a 1.5 in 3.8 chance of activation.
RANDOM is appropriately weighted, therefore, to yield a value in any particular sub-range. Thus P2 has the best probability, but still has a 2.3 in 3.8 chance of not being chosen.
The Rule chosen first (and hence is compared with SIGPR) is the one in whose sub-range the generated random number fitted. Say this was P2.
The routine DLTMPS now computes which of the remaining Rules are still applicable, since a chosen Rule may eliminate other Rules in the matched set. In this instance (since P2: S1 S2 ⇒ < R1 R2 > (see the P2 definition earlier) ) P1 may be eliminated since S1 has been accounted for.
Appendix 6 gives a flow diagram for CHOICE; CHOOSE, RANDOM and DLTMPS are repeatedly called until the criterial set has been exhausted.
CHOICE exits with a number of Rules chosen for activation.
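A sketch of the weighted RANDOM choice over the cumulative range of worth values is given below; it is an illustrative implementation, not the thesis routine.

# Sketch of the weighted RANDOM choice: a value is drawn uniformly over the
# cumulative worth range and the Rule in whose sub-range it falls is chosen.
import random

def choose_rule(matched):
    """matched: list of (rule_id, worth) pairs with positive worths."""
    total = sum(w for _, w in matched)          # the RANGE, 3.8 in the example
    point = random.uniform(0, total)
    running = 0.0
    for rule_id, w in matched:
        running += w
        if point <= running:
            return rule_id
    return matched[-1][0]

matched = [("P1", 1.0), ("P2", 1.5), ("P3", 0.5), ("P4", 0.2), ("P5", 0.6)]
print(choose_rule(matched))   # P2 is the most likely outcome, with probability 1.5/3.8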
UPDENV and UPDSTM - The routine UPDENV updates the environment, i.e. informs it as to the current actions being executed by the system. It performs the following functions:
The routine INSIGS updates STM with the internally generated stimuli. This may increase the size of STM, which increase is allowed up to the currently defined limit for STM capacity (STMCAP). If STM capacity would be exceeded, then the number of activated Rules must be reduced until STM can cater for all internally generated stimuli. (Note - some internal stimuli may already be present in STM and this condition is checked for initially.)
All those stimuli which were currently attended to (formed the criterial set) are tagged as prime candidates for overwriting by the internal stimuli. Appendix 7 gives a flow diagram for INSIGS.
There are two basic Drives ascribed to the system. These are expressed as fluctuations of need levels, the level of Hunger and the level of Pain. The Drive Process serves to compute the level at each time interval and to send out the appropriate signals to the other processes.
(i) HUNGER - Hunger is defined as a Primary stimulus and its frequency of generation is dependent on the level of the associated need. It is defined by the following system variables:
HCOUNT ∝ 1 / Hunger Stimuli
Thus, the lower the value of HCOUNT, the greater the probability of the emission of Hunger stimuli. HCOUNT may be said to portray the quantity of sustenance present in the system (the food level). The higher the value of HCOUNT, the less intense the need for further sustenance, and hence the less actively functioning the Drive.
Three specific processes exist for dealing with Hunger:
UPDHNG - this process serves to update HCOUNT in each time interval, dependent on whether any sustenance has been input or not. The new value for HCOUNT is computed as:
HCOUNTnew = HCOUNTold - DHOUT + DHIN
In order that the system benefits from any input sustenance, obviously DHIN > DHOUT.
HNGACU - checks to see if HCOUNT has reached a point signifying an acute need level, namely if HCOUNT < HMIN. If it has, then the Perceptual Process cuts down STM and channel capacity to one, allowing through only Primary stimuli, or any related stimulus.
ADDHNG - this routine is part of the Learning Mechanism and serves to add on Hunger stimuli to the conditional set of activated Production Rules if HCOUNTnew > HCOUNTold.
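A sketch of the Hunger bookkeeping performed by UPDHNG and HNGACU in each time interval follows; DHIN is added only when sustenance has actually been taken in, and the parameter values are those quoted later for the Short Siqueland runs.

# Sketch of the per-interval Hunger update and acute-level check.
HMAX, HMIN, DHIN, DHOUT = 100.0, 15.0, 0.6, 0.03

def update_hunger(hcount, food_taken):
    hcount = min(HMAX, hcount - DHOUT + (DHIN if food_taken else 0.0))
    acute = hcount < HMIN            # HNGACU: narrow STM/channel capacity to one
    return hcount, acute

hcount, acute = update_hunger(64.0, food_taken=True)
print(round(hcount, 2), acute)       # 64.57 False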
(ii) PAIN - Pain is defined as a Primary stimulus and is defined by the following system variables:
There exist a number of routines common to both need levels. These are:
STMPRM - checks STM to see if a Primary stimulus is contained within and if so returns the value TRUE.
PRRMVL - positively reinforces any Rule believed to have reduced a Primary need.
PRINVL - negatively Reinforces any Rule believed to have increased a Primary need.
PRIMAR - a function which returns the value TRUE if the given stimulus is a Primary (the stimulus being given as a parameter value - its position in STIMLS - and PRIMAR checks the classification list to see if that particular entry is Primary).
Appendices 1 to 7 give all the variables and routines which have been defined to the system.
Experiments 1 and 2 were based on experiments conducted and reported by Siqueland and Lipsitt (Siqueland & Lipsitt, 1966). Previously, conditioning experiments on neonates had failed to establish that any learning abilities were present. Prechtl (Prechtl, 1957) confirmed that infants possessed an innate ability to respond to tactile stimuli around the mouth area with an ipsilateral head turn, such reflexes serving an important function in his feeding history. Siqueland's experiments were clustered, the inter-trial period being only 30 seconds and all trials being conducted in adjacent time intervals. The purpose of simulating these experiments was to show:
Experiment 3 was chosen so as to show the model's capacity for performing adequately in a fairly non-trivial game playing environment. The game of noughts and crosses was chosen, the criterion being for the model to learn, in some reasonable time period, how to draw or win fairly consistently over an average opponent.
Experiment 3 was expected to show the model's ability to acquire a fairly non-trivial skill. In Section 9.2.2 the design of the experiment, the computer simulation runs undertaken, the results and analyses of the results are given.
Experiment 1 was based on Siqueland and Lipsitt's (Siqueland & Lipsitt, 1966) experiments on conditioned head-turning in human newborns. The object of their experiment was to observe how specific components of unconditioned responses may be shaped or strengthened by environmental consequences functioning as reinforcers ..... The focus of the experiment was to evaluate the cumulative effect of reinforcement operations on the response of ipsilateral head movements to tactile stimulation of the face.
The subjects used were 36 full-term newborns, 14 males and 22 females, tested between 40 and 93 hours after birth. The S's were selected from a population of awake, bottle-fed (routine feeding every 4 hours) infants. They were assigned to four groups:
There were 30 training trials consisting of:
Figure 7.1 gives the temporal sequence of the experiment, where b denotes buzzer, t denotes touch, pht denotes positive head turn and fd denotes food.
A trial was defined as the 6 second time interval beginning with onset of the buzzer. There was a 30 second inter-trial interval. (For the purposes of the simulation, a trial was defined as the full thirty second period.)
The buzzer was introduced for two reasons: as an experimental arousal stimulus, and to see if an increase in the frequency of anticipatory responses to the buzzer might occur through reinforcement.
Dextrose presentation occurring 8-10 seconds after the tactile stimulus was to ensure that the S's were never reinforced by chance for turning to tactile stimulation, but controlled for effects of arousal level and sensitisation over trials.
After training all infants received at least 12, but not more than 30, extinction trials. Extinction trials were terminated when the infant underwent 4 consecutive trials without the activation of a head turning response.
The results as reported in their paper seem to clearly indicate a greater frequency of head turning responses with reinforcements in the experimental groups over the controls during both training and extinction periods. Analysis of the extinction data suggests that the pairing of the reinforcement with the left head turn responses for the experimentals resulted in more responding for these groups over the controls after termination of the training period.
These results suggest that the effect of a reinforcing agent stabilises the responses to an otherwise relatively weak eliciting stimulus.
Figure 8.23 contains a table of the results obtained by Siqueland and Lipsitt in experiment 1 taking 30 training and 12 extinction trials.
Simulation 1 was termed the Short Siqueland experiment.
Each trial period was considered to be 30 seconds and consisted of the sequence:
0-5 secs - buzzer
2-5 secs - touch on left cheek
5-9 secs - left head turn response/no response, or right head turn response
13-17 secs - presentation of food if left head turn response observed
The psychological quantum of time was defined as 7½ seconds, which denoted a complete executional cycle for the infant. Considering then 4 cycles/trial, Figure 7.2 gives the sequence of events occurring.
Dependent on the value of HCOUNT, Hunger stimuli may or may not be emitted.
Further, in any cycle one or more insignificant stimuli may be registering on the infant. It is not clear as to how strictly controlled such psychological experiments are. But it seems reasonable that some irrelevant stimuli may also impinge upon the infant during an experiment.
Thus, the following set of input stimuli may be defined:
Buzzer
Left cheek touch
Hunger
Random (i)
Food
Considering each cycle then, the possible input stimuli are:
CYCLE 1  Buzzer, left cheek touch, Hunger
CYCLE 2  Hunger, Food/random(i)
CYCLE 3  Hunger, random(i)
CYCLE 4  Hunger, random(i)
(Hunger is optional in any cycle, dependent on HCOUNT.) In cycle 2, it was assumed for the purposes of the simulation that an ipsilateral head turn was a reflexive response elicited by any tactile stimulus occurring on or around the sensitive regions of the mouth.
Since the buzzer emanated from a speaker placed above the infant's head, it was further assumed that the buzzer could not itself elicit a head turn response in order to orient reflexively toward the direction of the noise.
The following genetic hard-wired Production Rules were defined to the system and encoded prior to commencing the simulation exercise:
1. BUZZER NOISE ⇒ <listen>
2. CHEEK TOUCH ⇒ <turn head left>
3. CHEEK TOUCH ⇒ <turn head right>
4. FOOD IN MOUTH ⇒ <swallow>
Note that Productions 2 and 3 could have been further differentiated into:
LEFT CHEEK TOUCH ⇒ <turn head left>
LEFT CHEEK TOUCH ⇒ <turn head right>
RIGHT CHEEK TOUCH ⇒ <turn head left>
RIGHT CHEEK TOUCH ⇒ <turn head right>
However, for the purposes of simulation 1 of experiment 1, it was felt that just 2 and 3 would suffice.
HCOUNT could be initialised at any value. The following factors were assumed about the infant in order to preset values for the variables DHIN (amount taken in with every swallow), DHOUT (amount automatically lost in every cycle) and HCOUNT initial value to portray low (30 to 90 minutes after feed) and high (120 to 180 minutes after feed) hunger.
Infant's feeding time - 20 minutes
Since the time interval is 7½ seconds:
No. of time intervals taken for the 20-minute feed = 160
If HMAX = 100.0, HMIN = 15.0 and HHIGH = 90.0, then intake/time interval = 100/160 = 0.625
Therefore DHIN = 0.6 units of food.
If we now assume that after a feed, the infant has the maximum quantity stored, i.e. 100 units, and it takes him 6 hours to reach an acute level (5 units) of no sustenance, he must lose, therefore approximately 0.03 units per time interval.
Therefore DHOUT = 0.03 units of food.
If we now take the median value for the time elapsed in testing the high hunger infants (120-180 minutes) then we have the experiments being performed 150 minutes after feeding time.
Thus the initial value for HCOUNT = 64.0 (assuming he commences at 100 units after each feed, there being 1200 time intervals in 150 minutes, hence the infant loses 36 units in that time).
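The same derivation, repeated as a short calculation (the acute level of 5 units and the 6-hour depletion period are the figures assumed in the text above):

# Derivation of DHIN, DHOUT and the initial HCOUNT for the Short Siqueland run.
FEED_MINUTES = 20
CYCLE_SECONDS = 7.5

intervals_per_feed = FEED_MINUTES * 60 / CYCLE_SECONDS      # 160 intervals
dhin = 100.0 / intervals_per_feed                           # 0.625 -> DHIN = 0.6

depletion_intervals = 6 * 60 * 60 / CYCLE_SECONDS           # 2880 intervals
dhout = (100.0 - 5.0) / depletion_intervals                 # ~0.033 -> DHOUT = 0.03

elapsed_intervals = 150 * 60 / CYCLE_SECONDS                # 1200 intervals
hcount_initial = 100.0 - elapsed_intervals * 0.03           # 64.0

print(round(dhin, 3), round(dhout, 3), hcount_initial)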
The parameters for Hunger in the Short Siqueland simulation were:
HCOUNT = 64.0
DHIN = 0.6
DHOUT = 0.03
HMAX = 100.0
HHIGH = 90.0
HMIN = 15.0
The probability of getting a Hunger stimulus at HCOUNT = 64.0 may be given by:
P(H) = (100.0 - 64.0) / 100.0 = 0.36
At the start of the trials, then, there is a 0.36 probability of Hunger being present in the input stimulus set. If the infant takes a number of trials in order to respond positively, the probability measure will increase. However, if he responds positively fairly early and conditions rapidly, then P(H) will correspondingly fall off.
This is obviously rather a crude model of Hunger, but it does serve to approximate the effects of Hunger upon the system.
Siqueland's experiments, like most psychological experiments, were very carefully controlled, such that insignificant stimuli were believed to have been eliminated. However, the average infant in the average feeding environment has no such careful control and hence a number of irrelevant (irrelevant with regard to some significant event) stimuli impinge upon the infant. Yet, he does condition fairly rapidly and form differential responses toward particular events. This indicates some degree of ability to identify and ignore irrelevant stimuli.
The stimulus set Random(i) was introduced as a set of irrelevant input stimuli. If imax is small, however, its probability of occurrence would be fairly high, making it, therefore, of some significance. Thus, a fairly large imax must be defined such that Random(i) remains a noisy but irrelevant stimulus. It was found that imax = 10 (reducing the probability of any one i occurring in adjacent cycles and in adjacent trials) would give a fairly realistic number of irrelevant stimuli being input, interfering with the conditioning process and ensuring that a conditioned response was not merely a fortuitous event.
A number of Production Rules that are believed to be innately present were encoded into LTM prior to the simulation exercise. However, there were some problems involved:
At testing, the infant had already experienced 3-4 days of life, and hence some learning may have already occurred. This is to say that the infant was not totally naive and this would affect the positioning of the Reflexive Rules.
A decision had to be taken then on:
Simulation experiments were run with:
It was also decided that a number of response Rules should be defined, of the type:
⇒ <r1>
⇒ <r2>
etc.
such that the Random stimuli may be paired off with adequate responses from the available response repertoire.
If we were to commence simulation with TNOW = 50 then 30 trials would occupy 30 * 4 executional cycles. Thus, at TNOW = 170, the training trials would have been terminated. The variable TEXTNC defines the time interval in which the extinction trials begin.
Therefore:
TEXTNC = 170
TFIN = 218
assuming a minimum of 12 trials during extinction.
Simulation 2 was termed the Long Siqueland experiment.
The trial period and sequence of events was as defined by Siqueland and Lipsitt (see Figure 7.1).
However, in this experiment, the psychological quantum of time was defined as 5 seconds, thereby giving 6 cycles in each trial period. Figure 7.3 gives the sequence of events for 6 cycles/trial.
The only requirement for the definition of the psychological quantum of time was to provide an intervening cycle between the actual tactile touch and the subsequent presentation of food. This was to show that the model was capable of associating across an intervening interval. Thus any value for a time interval could have been chosen, the only difference being the number of intervening time intervals. The Short and long Siqueland experiments were designed so as to have no intervening interval and to have one intervening interval.
The following set of input stimuli may be defined:
Buzzer
Touch
Hunger
Random (i)
Food
Considering each cycle, the possible input/cycle is:
CYCLE 1  Buzzer, Hunger
CYCLE 2  Touch, Hunger
CYCLE 3  random(i), Hunger
CYCLE 4  Food/random(i), Hunger
CYCLE 5  Hunger, random(i)
CYCLE 6  Hunger, random(i)
The hard-wired Production Rules were defined as being:
1. BUZZER NOISE ⇒ <listen>
2. CHEEK TOUCH ⇒ <turn head left>
3. CHEEK TOUCH ⇒ <turn head right>
4. FOOD IN MOUTH ⇒ <swallow>
The parameters for Hunger, the Random stimulus set and the initial genetic set are as defined in the Short Siqueland experiments.
However, the number of cycles being 6, DHIN and DHOUT have to be correspondingly altered.
Since time interval is 5 secs, number of time intervals/feed = 240. Therefore intake/time interval = 0.41 so:
DHIN = 0.4
DHOUT = 0.02
Commencing again at TNOW = 50, the 30 training trials would occupy 180 time intervals. Therefore:
TEXTNC = 230
TFIN = 302
assuming a minimum of 12 trials during extinction.
Siqueland and Lipsitt designed this experiment to assess the effect of pairing reinforcement differentially with two responses for individual S's. It was assumed that experimental evidence for differential changes in probability of response occurrence for reinforced and non-reinforced head turning responses would represent learned differentiation in human newborns.
The subjects chosen were selected from newborns ranging from 24 to 112 hours of age. Infants were tested 120-180 minutes after feeding (median 150). They were assigned to four groups:
Group YE - Younger, tested 24-48 hours after birth
Group YC - Controls for YE
Group OE - Older, tested 64-112 hours after birth
Group OC - Controls for OE
A baseline trial was introduced to ascertain whether any innate biases existed. Baseline data from each infant determined which of the two responses would be assumed as positive. The response with the higher degree of occurrence indicated an inherent bias for that direction, and reinforcement was paired with the tactile stimulus on the other cheek (i.e. if right biased then infant had to condition to left head turn and vice versa). On alternate trials the tactile stimulus was presented to the right or left cheek of the infant.
There were 48 training and 36 extinction trials, one trial occupying 30 seconds. The sequence of events was defined as:
0-5 seconds - auditory stimulus: "buzzer" paired with right-sided stimulation, "tone" paired with left-sided stimulation
2-5 seconds - cheek touch: right cheek if buzzer / left cheek if tone
5-9 seconds - positive response: left turn to left-cheek touch if right biased, or right turn to right-cheek touch if left biased; other turns, e.g. right turn to right-cheek touch if right biased, assumed contra-lateral
13-17 seconds - presentation of food on a positive head turn and no food on a contra-lateral head turn
The results reported by Siqueland and Lipsitt indicated a reliable shift in the experimental groups from base line to the positive head turn response. There was also an indication that older infants were responding more than the younger infants during extinction. Moreover, the experimentals showed a reliable decrease in the positive response during extinction.
Each trial period was 30 seconds and consisted of the sequence:
0-5 seconds - buzzer/tone
2-5 seconds - touch on left/touch on right
5-9 seconds - left head turn/right head turn
13-17 seconds - food if head turn is positive and no food if contra-lateral
The psychological quantum of time was defined to be 5 seconds giving 6 cycles. Figure 7.3 gives the sequence of events. However, on alternate trials buzzer and left cheek touch were paired and tone and right cheek touch were paired. Therefore a positive response depended on whether right sided or left sided stimulation had occurred. For left sided stimulation, a positive response was left head turn and for right sided stimulation, a positive response was right head turn.
The following set of input stimuli may be defined:
Touch Left
Touch Right
Buzzer
Tone
Food
Hunger
Random(i)
In each cycle, the following stimuli may be entrant:
CYCLE 1  Buzzer/Tone, Hunger
CYCLE 2  Hunger, Touch Right/Touch Left
CYCLE 3  Hunger, random(i)
CYCLE 4  Food/random(i), Hunger
CYCLE 5  Hunger, random(i)
CYCLE 6  Hunger, random(i)
As before, the probability of emission of Hunger stimuli depended on the current value of HCOUNT.
HCOUNT was initialised at 64.0
DHIN was 0.6
DHOUT was 0.03
HMAX was 100.0
HHIGH was 90.0
HMIN was 15.0
P(H) = 0.36 at HCOUNT = 64.0
Ten Random Stimuli (imax=10) were originally defined as input stimuli. This reduced the probability of any random stimuli recurring in adjacent time intervals or in the same time interval over different trials. Thus the possible input stimulus set was Random(1), Random(2), ... Random (10).
An initial bias can be given to the system by the relative positioning of the Production Rules.
1. TOUCH LEFT ⇒ <left turn>
2. TOUCH LEFT ⇒ <right turn>
3. TOUCH RIGHT ⇒ <left turn>
4. TOUCH RIGHT ⇒ <right turn>
Thus, if we had them in the initial order 1, 2, 3, 4 the system would have an overall left bias, the probability of the left turn response being activated, whichever cheek was touched, being the greater. Similarly, the order 1, 3, 2, 4 would give a left bias. The order 2, 1, 4, 3 or 2, 4, 1, 3 would give a right bias. Again, the vertical distance (the distance between the relative position numbers) would give increasingly greater degrees of significance to the bias, corresponding to greater and greater vertical distances. Thus Rules 1 and 3 placed in the higher order positions, with Rules 2 and 4 placed in very much lower order positions, would give the system an extremely significant left bias. Siqueland's experiment seems to indicate that neonates have no significant bias in either direction - indicating that the Rules should be kept close together.
It was decided that the following Rules would be encoded into the Production System to comprise the initial sensori-motor repertoire:
TOUCH LEFT ⇒ <right>
TOUCH LEFT ⇒ <left>
TOUCH RIGHT ⇒ <left>
TOUCH RIGHT ⇒ <right>
and a number of Random Rules of the type
RANDOM 1 ⇒ <nil 0>
RANDOM 1 ⇒ <nil 11>
...
RANDOM 1 ⇒ <nil 20>
plus a few Rules of the type:
⇒ <nil>
The experiment was commenced with TNOW = 50. There were 48 training trials, which gives:
TEXTNC = 338
and 36 extinction trials, giving:
TFIN = 554
The Short Siqueland simulations were characterised by the following input stimuli:
Cycle 1  TOUCH, HUNGER
Cycle 2  FOOD/R(i), HUNGER
Cycle 3  R(i), HUNGER
Cycle 4  R(i), HUNGER
Appendix A contains the relevant details with respect to these simulations, Table A1 giving the input parameter settings for a characteristic run, Table A2 giving the Random input stimulus set and Table A3 giving the state of the Production System at the start of each run. Note that the stimulus Buzzer Noise has been eliminated from the input stimulus set in the actual computer simulation. This was for the following reason: Siqueland stated that the buzzer was introduced for two reasons:
Since this experiment was differentiated into two separate computer simulations, Short and Long Siqueland, it was felt that the Short Siqueland simulation should deal with reason (a) and the Long Siqueland with reason (b). Thus in this series of simulations, the stimulus Buzzer Noise was not input but rather assumed to have awakened the infant, such that the stimulus cheek touch impinged upon an alert system, i.e. the cheek touch stimulus coincided with the commencement of the subject's psychological quantum of time.
A maximum of ten random stimuli were used, such that the probability of any particular one being emitted was 0.1. However, it must be noted that Siqueland conducted his experiments in strictly controlled laboratory conditions. As such, extraneous (not relevant to the experimentally induced stimulus set) stimuli were probably few and far between. This could imply two things:
To use ten random stimuli as constituting the extraneous set was really implying factor (ii).
However, Section 8.1.3 deals with input parameter alterations under which runs made with changes in the number of input random stimuli will be discussed.
A particular random stimulus is a stimulus external to the experimental set and occurs with a 0.1 probability in any relevant cycle.
During the extinction sequence, food was presented arbitrarily independent of the model's behaviour.
There were 30 conditioning trials and 12-18 extinction trials.
The following aspects were chosen as measuring the success of a run:
Initially, r is relatively low due to the absence of any T → ℓα Rules and the presence of R → β type Rules (R being a generic term for any R(i)) in the initial Production System - r, in reality, is an inverse indicator of the amount of noise in the system. If the model has successfully differentiated between the significant (experimental) and the non-significant (extraneous) stimuli, then r should be typically high. If not, r would be low, indicating that the system has been swamped by irrelevancies. A very low value of r signifies that no conditioning has, in fact, even occurred. In a successful run, r should rise slowly to some peak and then stabilise, indicating that some stable point has been reached between the good Rules, the average Rules and the bad Rules.
The chosen run is merely one of many and could be considered to be fairly typical of the whole series of simulations.
Figure 8.1 gives the evolution of the Rule T → ℓF.
Figure 8.2 gives the head-turn responses grouped into three trial blocks.
Figure 8.3 gives the measure r computed against time.
T → ℓF was created at time 68 and inserted into position 32 (one before the last Rule in the system). It evolved to position 1 by time 115 and remained there until the end of the conditioning trials. By this internal indicator, this was an extremely successful run. At time 66, the Reflexive Rule T → ℓ was activated (after 3 consecutive activations of T → r), which generated T → ℓF through the Law of Causality. It was reinforced at time 80 through the activation of T → ℓ which obtained food, thereby reinforcing T → ℓF by attempting to create it again through CONCAT. At time 86, T → ℓF was directly activated, resulting in a positive reinforcement through confirmation of food. At time 114, it was positively reinforced again and hence moved up to position 1. However, even with T → ℓF at position 1, there was a period in which 4 consecutive activations of other Rules such as T → r occurred. For example, with T → ℓF having a worth of 11.85, the Rule T → r was still activated with a worth of 1.32, less than a tenth of the former. T → ℓF shows no signs of extinction, however, from time 170 onwards.
If we now group the thirty trials into blocks of three, we obtain, with respect to the external indicator of success (the positive head-turn response):
0 1 1 1 1 3 2 1 3 3
i.e. in the first three trials, no positive head-turns were elicited (three Reflexive rights being obtained), in the next three, one and so on.
Thus in the last six consecutive trials, six positive head-turns were obtained.
In any experiment performed upon a human infant, the only measure that can be taken with respect to response learning having occurred, is the increase in response occurrence over the training period and a corresponding decrease during extinction.
However, with a cognitive model, one can examine the internal reasons as to the increase in response occurrence over the training period.
Consider the Production system at time 149 (Figure 8.4).
There exist 5 Rules which, when activated, result in a left head-turn response, compared with 35 which could result in a right head-turn response. The conditioning Rule, i.e. T → ℓF, is at the top level and has an associated worth of 11.85 if the stimulus TOUCH is newly entrant in STM. Correspondingly, the last six positive responses were generated by the activation of this particular Rule. It is obvious, though, that activation of any one of the other three Rules would also result in the positive head-turn response, there being no way for an experimenter to know how that response was generated; i.e. T → ℓF may never get formed and left head-turns be obtained only through the Reflexive Rule, it being possible to get an increase in response without actual conditioning having occurred. A model such as this is transparent with regard to its actions and the motivations for such actions.
The reason for the proliferation of Rules resulting in a right head-turn was the number of times the Reflexive Rule T → r was activated in the first half of the trials, and the inconsistent nature of the newly created Rules. For instance, of the first 15 trials, there were 11 right turns compared with 4 left turns (2 Reflexive and 2 by the conditioning Rule). Thus newly created Rules, being dependent on the most significant activation to date, were generally spawned by T → r rather than T → ℓ.
Figure 8.3 shows, however, that although T → r dominated the first half of the training period, the T → ℓ type Rules still managed to assert themselves, due purely to the high worth of the Rule T → ℓF. Each time this Rule was activated, it resulted in food being presented to the model, which, since it confirmed the system's generated expectation, served to strongly reinforce the Rule. Initial reinforcements, prior to direct activation of the Rule, were through the activation of the Reflexive Rule, which, having created T → ℓF, reinforced it each time it tried to recreate it. This mechanism, in fact, forms the basis of the early learning period prior to stabilisation of T → ℓF itself. Thus, at time 149, the worth of T → ℓF alone exceeded the cumulative worth of all the Rules with random stimuli in their conditional sets, showing that the system had clearly differentiated between the significant and the non-significant stimulus classes.
This run indicates that:
Extinction effects were not tremendously obvious. Figure 8.1 shows no decrease in position of the conditioning Rule. Figure 8.2 shows no decrement in elicitation of the positive response. Figure 8.3 shows an increase in r between time 149 and 218, due to an increase in the number of T → ℓ type Rules in the top 10 during extinction. This failure to obtain reliable extinction effects was due to the arbitrary food-presentation technique employed by the simulation. (Siqueland actually presented food each time during extinction.) In this period T → ℓF was activated 12 times, of which 8 were rewarded with food, thereby considerably strengthening the Rule. Section 8.1.3.5 will deal with how reliable extinction effects can be obtained with this simulation series.
We shall now examine a number of runs paying particular regard to the following features:
This results in two kinds of movements:
Figure 8.1 shows an example of the effect of direct reinforcements. T → ℓF inserted originally into position 32 graduates to position 1 by time 115. Figure 8.1 gives the movements of three other Rules, T → ℓ, T → r and TH → ℓF. The Reflexive left Rule does not show much sign of movement. TH → ℓF in contrast, moves up from position 82 to position 3. The Rule T → r, is never subject to any direct negative reinforcements, but moves down from position 4 to 11 through the upwards movements of Rules below. Rules such as TH → ℓF graduating upwards serve to negatively reinforce the Reflexive Rules.
The ways in which a Reflexive Rule may be directly reinforced (since such Rules contain no primary or expectational stimuli) are as follows:
Thus, in the case where an activation of T → ℓ serves to generate T → ℓF, any attempt to do so again results in T → ℓ being moved downwards and T → ℓF being moved upwards.
It could be argued that the evolution of a Rule built directly upon the Reflexive component, but seeming to benefit the system more in a particular environment, should also result in weakening its creator. This is to say that T → ℓ should show more movement than is brought out in this particular run, since its descendant T → ℓF shows much more positive movement in the system.
Figure 8.5 gives the movements of the Reflexive T → ℓ Rule in three other simulation runs. Now, in a typical successful run one would expect the following behaviour: as T → ℓF evolves upwards, T → ℓ should move downwards, and during extinction T → ℓ should move upwards and T → ℓF move correspondingly downwards.
Curve 1 lives up to expectations, T → ℓ moving, from a starting position of 17 at time 50, to 42 by the end of the training period.
Curve 2, similarly, shows T → ℓ moving from an initial position 18 at time 50 to 32 by the end of training - in fact it drops to 37 at time 76.
Curves 1 and 2 also show the expected signs of recovery during extinction, T → ℓ rising to position 3 by time 250. In fact, in these two runs T → ℓF shows no signs of evolving downwards during extinction, but interestingly, T → ℓ shows that same degree of extinction has occurred in as much as the Reflexive response has become strongly positioned again.
Curve 3 is atypical. After an initial downward tendency (position 24 at time 89 after an initial 14) it ascends again to reach position 3 by the end of training. In this run, T → ℓF ascends from position 50 at time 52 to 1 at the end of training. Looking at the model's internal behaviour, the Rule T → ℓF was activated in the last four consecutive TOUCH intervals despite T → ℓ's high status. In fact, in this run, the Reflexive Rule was not solely under the influence of its T → ℓF predecessor, but was also influenced by its other predecessor TH → ℓ. TH → ℓ was activated at time 110 without HUNGER being present in STM. REDUCE reduced the Rule obtaining T → ℓ which positively reinforced the same. This moved T → ℓ from 24 to 8 and then again from 8 to 1 for the identical reason. However, when T → ℓ was itself activated, during the training period, it generated food, thereby enhancing T → ℓF and negatively reinforcing itself down to 3. During extinction, it was activated three consecutive times, each time generating food, thereby displacing itself down from 1 to 2, from 2 to 3 and from 3 to 4, and then twice more moving it from 4 to 7 and lastly from 7 to 8.
Considering indirect movements of a Rule, it is possible, within a large system (100 Rules, for example), for a Rule lying toward the bottom to get promoted a fair distance through the falling out of Rules above it. Thus, in one particular run, the Rule TH → ℓF was inserted into position 99 at time 140 and by time 170 was at position 70, though not subject to any direct reinforcements, i.e. an indirect displacement of nearly 30. However, it is only when a Rule lies towards the lower end of the system that such large indirect movements can take place, since the upper end of the system contains the more consistent Rules which are subject to small, directed movements.
It is commonplace for a Rule to fall a few places or gain a few places indirectly, but movements have not been known to exceed 5 when considering any in the first 20 Rules.
Infantile hallucination has often been quoted as the beginning of mind, the origin of thought (Kessen, 1971). This refers to the situation in which the infant, desiring sustenance but being unable to obtain it directly, is observed to purse its lips and make sucking motions, as if sucking a hallucinatory nipple, thereby obtaining gratification. In other words, the infant has associated the sucking actions of his lips upon having an object placed within them with obtaining food, and has also associated obtaining food with producing desirable effects within himself.
He is, therefore, supposedly capable of conjuring up the feeling of a nipple within his mouth, when under the influences of his hunger need, sucking this hallucinatory nipple and imagining the gratification produced thereby.
It has been found that the model behaves in a similar infantile manner, due to its internal generation of stimulus symbols. When a Rule such as T → ℓF is activated, the stimulus FOOD is written into STM. (Strictly speaking, this is not always so, depending upon the spare capacity in STM.) During extinction, however, FOOD is not always environmentally produced. But any symbol in STM is potentially able to activate a Rule, the system not differentiating between a real stimulus (environmentally produced and hence actually perceived) and an imaginary one (internally generated). Thus it can attend to any stimulus in STM regardless of its origin.
The model has been known to attend to FOOD in its external absence and to activate F → a as a result. (Note: this Rule is, in reality, a shorter version of the two Rules OBJECT ON MOUTH → <SUCK> and OBJECT IN MOUTH → <SWALLOW>.) The model, therefore, has successfully hallucinated its own OBJECT ON MOUTH and responded to it, thereby obtaining imaginary sustenance. But this action in no way enhances HCOUNT, i.e. the model may only obtain temporary gratification, its need level remaining unaffected. Thus eventually, if no sustenance is forthcoming, HCOUNT would reach the minimum allowable value, and activate the system's autonomous defence mechanisms.
There is another circumstance in which the model responds to non-existent stimuli. For example, say STM contains FOOD, TOUCH and S1. Rules such as
P1 : α FOOD →
P2 : β TOUCH →
P3 : γ S1 →
may all be activated, α, β and γ being any other stimulus set. Thus, if α = S2, say, and P1 were activated, then the model could be seen as imagining S2 as being present and hence responding to the condition of S2 and FOOD being concurrently present in STM. Another feature is the fact that externally perceived stimuli may go unattended. Thus, if S1 entered STM at t0, it need not be chosen for attendance. It can, however, remain in STM, be there at t1 and stay on till some time tn, still not having been attended. If, at tn, it was chosen for attendance, it is by no means certain that S1 still exists in the environment, i.e. the conditions defined by S1 may no longer exist. The model is therefore responding to some extinct situation and hence may be seen as hallucinating whereas, in fact, it is not, S1 having been real enough at its time of entrance. This situation could also arise through multiple attendance, wherein S1 may be attended to at t0 and again at tn, being potentially able to attract attention at any point whilst resident in STM.
Old stimuli in STM, whether real or imaginary, do cause problems to the system when they are attended. Since stimuli have a decaying influence on memory (the longer they stay in STM the less chance they have of being attended, due to PROLD), those that do exert a delayed influence usually occupy the conditional set of a Rule at a relatively high level. If such a Rule happens to contain an expectational stimulus then, being activated under conditions which no longer prevail, its expectation is not confirmed and the Rule is demoted. This kind of reinforcement, applied to a Rule which would be correct if activated in the correct circumstances, may lead to alternative Rules being applied during the interval defining the circumstances wherein that Rule should have been activated. However, these are fluctuations that do occur and in the majority of runs do not constitute an insoluble problem, although they may slow down the conditioning process. Perhaps they should be regarded as an intrinsic part of the model's behaviour rather than as random fluctuations.
The problem of learning how to behave in context-dependent ways may be equated to the problem of learning to differentiate between as yet undifferentiated stimuli and hence to assign an adequate response to them. If, for instance, we have the stimulus set Sj (j = 1, 2, ..., n) where each set member is unknown to the system (no Rules exist with S1, ..., Sn in their conditional sets), then, if the system is to learn about Sj, it must assign a set of responses Ri (i = 1, ..., m) to these stimuli. The Sj class of stimuli gives rise, through learning, to the Ri class of responses, each stimulus having some response Ri attached to it (members of the class Ri may be identical to each other).
The model assigns responses to stimuli by the process of associating previously known stimuli with currently obtained, unknown stimuli. If Sj1 ∈ Sj is a novel stimulus, then it is matched for similarity with stimuli in the Stimulus Symbol Table maintained by the Perceptual Process. Each stimulus has a string of attribute elements ek, two stimuli being similar to a degree dependent on the match between the two attribute lists, i.e. on how many ek exist in common. If no ek exists in common, then the stimulus is chained on to the end of the Stimulus Symbol Table and linked to the previously last symbol; thus it would get that symbol's response assigned to it. If it could be established as having similarities to any other symbols, then it would get the response elements of the symbol having the greatest degree of match. This is seen as response learning. However, this would result in the class Sj having a number of common responses between them (since a class implies a great degree of similarity between stimuli in that class). Thus differentiable responses have to be formed, such that eventually the Ri have as few elements in common as necessity demands. This is achieved through the Learning Mechanism, which can, through the SLTCHG process (see Section 6.2.3), switch stimulus and response elements, or add further elements to the response set.
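A minimal sketch of the response-assignment step, assuming the Stimulus Symbol Table can be treated as an ordered list of (symbol, attribute set, response) entries; the entry names and attribute elements are invented for illustration, and the degree of match is taken simply as the number of shared attribute elements ek, as described above.

```python
def assign_response(novel_attrs, symbol_table):
    """symbol_table: ordered list of (name, attribute_set, response) entries,
    standing in for the Perceptual Process's Stimulus Symbol Table."""
    best_entry, best_overlap = None, 0
    for name, attrs, response in symbol_table:
        overlap = len(novel_attrs & attrs)        # number of e_k held in common
        if overlap > best_overlap:
            best_entry, best_overlap = (name, attrs, response), overlap
    if best_entry is None:
        # No e_k in common: chain the novel stimulus onto the end of the table,
        # giving it the response of the previously last symbol.
        inherited = symbol_table[-1][2]
    else:
        # Otherwise it inherits the response elements of the most similar symbol.
        inherited = best_entry[2]
    symbol_table.append(("S_new", novel_attrs, inherited))
    return inherited

table = [("TOUCH", {"tactile", "cheek"}, "<left head-turn>"),
         ("FOOD",  {"oral", "sweet"},    "<swallow>")]
print(assign_response({"tactile", "hand"}, table))   # inherits <left head-turn>
```

Differentiation of these inherited, shared responses would then be left to SLTCHG, which is not sketched here.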
Looking at some simulation runs, we observe the stimulus TOUCH having the following responses assigned to it:
<swallow>, <nil> (nil can correspond to any reasonable system response such as twitch leg), <right head-turn>, <left head-turn>, and multiple responses such as <swallow left head-turn>, <swallow right head-turn>, <left head-turn nil>, <right head-turn nil>.
In fact, as many response elements as exist in the initial response repertoire may be assigned to an entrant stimulus, reasonable Rules eventually evolving upwards and away from the useless ones. The system, therefore, learns to select that Rule, from a number of alternatives, which is most appropriate in any given environmental context. During the training period, this is shown by the evolution of T → ℓF which gradually occupies and stabilises at the highest level. Such a technique serves to effectively inhibit those Rules which contain alternative response elements, without actually employing any mutually inhibitive structure in the Learning Mechanism. Thus, once T → ℓF stabilises at the top level, it is activated far more often than any other T → Rule, each activation serving to stabilise it even further. Providing the environmental conditions do not alter, the model, as it stands, would go on activating the conditioning Rule, behaving in an extremely predictable and stereotyped manner. The basic mechanism of choice here is the actual worth of a Rule, a consistent Rule keeping its worth value equally consistent. The strongest Rule is nearly always chosen in response to a particular stimulus situation, its high worth value effectively sharpening the contrast between itself and other possible alternatives. To return to simulation results, we find the following worth values typical during conditioning (chosen just before extinction commenced):
Rule | Worth value | Response |
---|---|---|
T → ℓF | 11.85 | left, internal food |
T → r | 1.09 | right |
T → ℓ | 2.13 | left |
T → nil | 0.0003 | arbitrary |
T → sℓ | 0.0001 | swallow, left head-turn |
T → r R2 | 0.88 | right head-turn, internal symbol R2 |
The high worth value of T → ℓF effectively separates it from the rest of the T → category, hence acting as an inhibitor on the other members of the same class. The underlying factor for such exclusion is thus the basic set of learning functions under control of the response-contingent reinforcement scheme, not any separate inhibitory-excitatory mechanism.
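The exact selection function is not reproduced here, but a selection made in proportion to worth is consistent with the percentage-chance figures tabulated later in this section, and is enough to show how a high-worth Rule crowds out its alternatives without any explicit inhibitory links. The sketch below uses the worth values from the table above; the rule names are transliterated into ASCII.

```python
import random

# Worth values for the T -> category, taken from the table above.
t_rules = {
    "T->lF":  11.85,
    "T->r":    1.09,
    "T->l":    2.13,
    "T->nil":  0.0003,
    "T->sl":   0.0001,
    "T->rR2":  0.88,
}

def choose_rule(candidates):
    """Pick one candidate Rule with probability proportional to its worth."""
    total = sum(candidates.values())
    r = random.uniform(0.0, total)
    running = 0.0
    for rule, worth in candidates.items():
        running += worth
        if r <= running:
            return rule
    return rule   # numerical safety net

total = sum(t_rules.values())
print(f"T->lF share of total worth: {t_rules['T->lF'] / total:.1%}")   # roughly 74%
print(choose_rule(t_rules))   # nearly always T->lF
```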
It was observed in certain simulation runs that contradictions sometimes existed between the internal and external behaviour of the model. Thus, we have: (i) the model appearing externally conditioned without the corresponding internal conditioning, and (ii) the model being internally conditioned without this being consistently reflected in its external behaviour.
Considering (i), there exist a number of reasons as to why the turn left symbol should be generated. Typical Rules in this class are:
T → ℓF, TH → ℓ, TH → ℓF, T → ℓ, TH → ℓα, T → ℓβ, γ → ℓ
and so on, where α and β are internal stimulus sets and γ is a conditional stimulus set not including TOUCH, but including relevant stimuli such as FOOD and HUNGER. When the model is touched on its left cheek, there therefore exist a number of reasons as to why it should turn left. Of these only two are conditioning Rules, namely T → ℓF and TH → ℓF. When these Rules occupy high status then the model is said to have become strongly internally conditioned, having associated stimulus TOUCH with the generation of FOOD upon turning left. But, as shown, the model can still show itself strongly externally conditioned without having either Rule occupying the higher levels (positions 1, 2 and 3 say). Consider a simulation run where the Production System was:
Position | Rule | Worth |
---|---|---|
1 | R2 → nil | 5.93 |
2 | R3 → nil | 3.33 |
3 | T → r | 2.13 |
4 | R1 → nil | 1.09 |
5 | T → ℓF | 2.18 |
6 | TH → ℓ | 3.35 |
During time 138 to 169, the following Rules were activated on occurrence of TOUCH:
T → ℓF, T → ℓF, T → ℓ, T → ℓr, TH → ℓr, TH → ℓF, T → ℓF
Out of the 7 left head-turns externally executed, only 4 were due to the Rule T → ℓF and one was due to TH → ℓF.
In the non TOUCH intervals, a further 13 left head-turns were activated, 11 of these being due to the Rule H → ℓ i.e. if Hungry turn left. This shows some basic association having occurred between the hunger state and turning left, but absolutely no learning having taken place with regard to correct context-definition. Thus, the model seemed to firmly believe that turning left would somehow ease its hunger, but precisely when and how was not known.
Considering (ii), this situation obviously represents the converse. The model has learned to associate TOUCH, left head-turn and obtaining FOOD but, due to interference from other Rules in the system, cannot always behave in the manner that it knows gives it the best environmental response. We have T → ℓF fluctuating between positions 1, 2 and 3. Considering two simulation runs, we find the following Rules being activated from time 130 to 169 in the first run:
Position | Rule obeyed | Worth | Cumulative Worth of all T→ Rules | Current Worth of T → ℓF | % Chance of activating T → ℓF |
---|---|---|---|---|---|
2 | T→rR7 | 6.67 | 13.47 | 4.27 | 31.70 |
8 | T→r | 0.53 | 13.41 | 6.67 | 49.74 |
3 | T→rR7 | 4.27 | 11.85 | 6.67 | 56.29 |
2 | T→ℓF | 6.67 | 10.44 | 6.67 | 63.89 |
4 | T→rR7 | 2.96 | 16.34 | 11.85 | 72.52 |
1 | T→ℓF | 11.85 | 15.08 | 11.85 | 78.58 |
1 | T→ℓF | 11.85 | 24.48 | 11.85 | 48.41 |
7 | T→rR7 | 1.32 | 19.24 | 4.27 | 22.19 |
7 | T→rR7 | 1.32 | 5.91 | 2.96 | 50.00 |
4 | T→ℓF | 2.96 | 6.63 | 2.96 | 44.65 |
There were 6 negative responses in these 10 cycles, although T→ℓF never dropped below position 4 and most of the time was at position 1.
The second run showed from time 130 to 169, the following situation:
Position | Rule obeyed | Worth | Cumulative Worth of all T→ Rules | Current Worth of T → ℓF | % Chance of activating T → ℓF |
---|---|---|---|---|---|
1 | T→ℓF | 11.85 | 15.35 | 11.85 | 78.00 |
9 | T→ℓ | 0.44 | 15.35 | 11.85 | 78.00 |
1 | T→ℓF | 11.85 | 16.70 | 11.85 | 70.96 |
4 | T→r | 1.48 | 15.42 | 11.85 | 76.85 |
5 | T→rR0 | 2.18 | 11.57 | 6.67 | 57.65 |
2 | T→ℓF | 6.67 | 10.20 | 6.67 | 65.39 |
1 | T→ℓF | 11.85 | 15.61 | 11.85 | 75.91 |
13 | T→rR1 | 0.47 | 15.50 | 11.85 | 76.45 |
1 | T→ℓF | 11.85 | 14.82 | 11.85 | 79.96 |
1 | T→ℓF | 11.85 | 14.15 | 11.85 | 83.74 |
There were three negative responses in ten cycles.
This is merely to show that external and internal factors need not be compatible, an external response sometimes belying the internal cognitive state.
Siqueland and Lipsitt's experiments did not account for the possibility of such a circumstance, a straight count of positive head-turns being taken. Presuming that the model may be functionally analogous to the neonate, these results indicate that the conditioning process depends on several indicators, only one of these being externally visible. However, even with respect to the non-verbal infant, it is felt that experiments could be designed to cater for indicators other than merely the visible increase in frequency of elicitation; for example, the introduction of an extraneous, controlled stimulus and the observation of its interference effects upon the conditioned response. It could be argued that the well-stabilised response could not be shaken but the weakly-formed one might be.
Due to the process of Reinforcement included in the Learning Mechanism, Rules tend to move up and down all the time, the better defined and more consistent Rules showing a constant upward trend to the higher levels, whilst the inaccurate ones gradually move downwards to occupy the lower levels. If we look at some of these learned Rules we may observe a large number of strange beliefs being expressed by the model, resulting from associations between old stimuli (still in STM but no longer in the environment) and other Rules in the system. We obtain Rules such as:
HUNGER → <swallow> TOUCH
i.e. if hungry, then activate the reflex response swallow and then expect to be touched on the cheek - a queer belief to say the least.
Consider:
R1 → <nil> TOUCH
and suppose R1 ≡ visual object S1, and nil ≡ twitch right leg, then we have the belief, if visual object S1 appears, then twitch right leg and expect touch on the cheek to follow.
This could equate to a Rule hypothesized for Piaget's observation 94 (Piaget, 1953; Cunningham, 1972), namely: arbitrary visual object S1 → <shake leg> shaking doll object S2 i.e. the mistaken belief that on sight of any pleasant visual object, if he shakes his leg he can expect to see a particular doll swinging near his cot. Obviously the doll does not swing because the child shakes his leg. However, at some time whilst he was shaking his leg, the wind may have caused the doll to swing, thereby building such a Rule excluding the true causative factor which is, of course, beyond the child's comprehension. A model such as this would keep this Rule at the lower levels, but it must be remembered, that if, by chance, the doll did swing and confirm his expectation (a chance wind again) then the Rule would be strongly reinforced (the lower the Rule, the greater the displacement distance) and be difficult to displace for quite a while.
It may also be pointed out that a number of adult humans persist in holding on to a few superstitious beliefs although these are rarely subject to substantiation. Thus, that we do hold such beliefs and retain them into maturity must be attributed to chance associations and lack of perception as to the actual causative factor.
In one sample run, the Rule R2→n (nil) was initially inserted into position 7 prior to commencing the experiment. This Rule was activated several times in the first few trials, leading to the Rule R2→nT being created and inserted into position 81 at time 75. At time 77, R2 recurred, R2→n re-activated, T followed at time 78 and R2→nT was moved up to position 30. A chance reinforcement one could say. This situation was repeated at times 105, 106 leading to R2→nT being updated to position 13. R2→nT was itself activated at time 113, T was input at time 114 and this Rule was moved up to position 9. At time 129 it was up at position 7 compared with T→ℓF at position 37. At time 156 it was re-activated, T did not occur and the Rule was moved down to 9 (it in fact missed T by just one interval). At time 175, it was reinforced through the activation of its forefather back to position 7. At the end of the run R2 →nT was up at position 8, the highest learned Rule in the system, the next one being T →ℓF down at position 16.
A case of superstition come true?
Firstly a word about stimulus symbol structures. The model uses such internal symbols as TOUCH which correspond to an external stimulus touch. However, this by no means indicates that the model knows precisely what a touch defines, i.e. any stimulus is made up of a number of facets, these being physiological, chemical, physical, etc. That some component is recognised must be taken as obvious if the model can be made to respond in a stable and predictable way to the same environmental condition. Thus, some aspect or aspects of TOUCH must be known if the Rule T→ℓF is to be consistently activated.
However, it is believed that it is only through the build-up of a number of Rules pertaining to the same (similar) condition, that any form of knowledge can arise with respect to that condition. The model at present does not provide for the creation of conceptual structures, but it does provide for firmly held beliefs in the existence of a particular perceived object. As stated, this is through the creation of a variety of Rules pertaining to that perception and to the gradual bringing together and stabilising of a subset of these at a high level in LTM.
We have for example, a snapshot of LTM at time 169 (just prior to extinction) of one particular simulation:
Position | Rule | Worth |
---|---|---|
8 | T→ℓ | 0.53 |
13 | T→ℓF | 0.47 |
14 | TH→ℓ | 0.79 |
15 | H→ℓ | 0.28 |
It could be stated that such a grouping together of similar worth values (0.28 to 0.79) indicates a strong knowledge of the S, R pair TOUCH, left, the drive element HUNGER imbuing it with a further degree of definition.
If the model is expected to acquire skill at a particular task, it must be able to consistently activate a sequence of Rules pertaining to that task. For example, suppose the model had to acquire the skill of obtaining a visual object such as a cup. It must have Rules such as:
VISUAL OBJECT CUP ⇒ <reach out> TOUCH HAND
TOUCH HAND ⇒ <grasp>
which sequence should result in the stimulus symbol CUP IN HAND.
Consider the Rules:
P1: TOUCH (CHEEK) ⇒ <turn head left> FOOD
P2: FOOD ⇒ <swallow>
where P2 can be broken up into
P21: TOUCH (MOUTH) ⇒ <suck> FOOD
P22: FOOD (MOUTH) ⇒ <swallow>
Thus, we have 3 Rules P1, P21 and P22 which should be activated in strict order if gratification is to ensue, i.e. if the model is to obtain any sustenance and thereby reduce its hunger.
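The point that such a sequence needs no explicit links between the Rules can be illustrated with a toy matcher; the Rule triples below are a hypothetical transliteration of P1, P21 and P22, and the environment is assumed to deliver the next stimulus whenever the previous response succeeds.

```python
# Hypothetical transliteration of P1, P21 and P22; each Rule is
# (condition set, response, expectation) and there are no links between them.
rules = [
    ({"TOUCH(CHEEK)"}, "<turn head left>", "FOOD"),
    ({"TOUCH(MOUTH)"}, "<suck>",           "FOOD"),
    ({"FOOD(MOUTH)"},  "<swallow>",        None),
]

def step(stm):
    # Activate the first Rule whose condition is wholly present in STM.
    for condition, response, expectation in rules:
        if condition <= stm:
            return response, expectation
    return None, None

# The environment supplies the stimuli of successive cycles; the 'skill' is
# simply the consistent activation of the appropriate Rule in each cycle.
for stimulus in ("TOUCH(CHEEK)", "TOUCH(MOUTH)", "FOOD(MOUTH)"):
    print(stimulus, "->", step({stimulus}))
```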
In the Short Siqueland simulations, with no deterministic linkages existing between Rules, the following was observed:
In the early part of the training sequence, there was no such sequential activation, the system needing to tune itself initially before being able to identify its more consistent and desirable Rules.
Toward the end of the training sequence, dependent on how well conditioned the system had become, T→ℓF was more or less always activated in cycle 2 and F→s in cycle 3.
Thus, even with no deterministic linkages between Rules, the system was still able to isolate sequences which had to be activated. So long as the environment did not alter (as happens when extinction sets in) the model could be reliably expected to activate these Rules and thereby showed every sign of skill (albeit basic) acquisition.
With extremely strongly positioned Rules, the model will in fact go through the task without obtaining environmental confirmation of its stimuli. Thus eventually the Rules:
Visual object CUP ⇒ <move hand> TOUCH
TOUCH ⇒ <grasp>
would be activated almost as the one equivalent Rule:
CUP ⇒ <move hand grasp>
causing the model to move its hand and grasp purely on perceiving the cup.
There are a number of parameters specifically defined to the model prior to a simulation run (see Table A1 for typical input parameter values).
If the model is sufficiently robust, then it should be more or less immune to alterations in parameter values. Thus, they could perhaps affect the rate of conditioning and extinction, but not entirely impede the learning process.
A number of runs were made specifically to see the effect of parameter alterations upon the model's behaviour. The chosen parameters were:
A measure of the number of extraneous stimuli in the environment is given by the number of Random stimuli allowed in as input, i.e. R(i), i = 1, ..., n.
A number of runs were performed, varying i.
(i) i=1
When i = 1, only one Random stimulus is present in the input set. Thus, it may be expected to exert as much influence as an experimental stimulus. The stimulus, R, was allowed as input in cycle 2 (if FOOD not present), 3 and 4. R, therefore, occurred in every trial, and occurred more than once in any one trial.
Figure 8.6 shows the curve of r vs time, where r = (Σ T→ℓ)/(Σ R→). Initially there were no R→ type Rules in the system due to R not being a reflexively known stimulus. However, by time 59 some learning has occurred with regard to both T and R, and we have r = 4.19. This high value owes itself to the high initial position (2) of the left Reflexive Rule, to the evolution of TH→ℓ and to there being 4 T→ℓ type Rules to consider as compared with only 2 R→ type Rules.
At time 79, r was phenomenally high due to the high position of TH→ℓ (2) and T→ℓ (3), and to there being four relatively well placed T→ℓ type Rules compared to only two, rather lowly placed R→ type Rules.
By time 99, the system has had sufficient time to stabilise itself and from here onward, the gradual lowering of r occurs. This is due to the emergence and evolution of the Rule RH→ℓ and to the gradual deterioration of T→ℓ and TH→ℓ.
In fact, T→ℓF was formed and evolved to position 10 by time 139, but then disappeared from the system through lack of activation.
This run showed that with only one R stimulus, the model had no way of differentiating between the experimental and the extraneous stimuli. In fact, as far as the model was concerned T and R were equally significant.
(ii) i = 6
This reduced the probability of any one R stimulus entering from 1.0 to 0.16. This was equivalent to stating that there were six extraneous stimuli present, each one having a probability of entrance of one-sixth and capable of entering in cycles 2 (if no FOOD), 3,4.
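A sketch of the extraneous-input scheme just described: i extraneous stimuli R1..Ri, each entering independently with probability 1/i, eligible in cycle 2 (only if FOOD is absent) and in cycles 3 and 4. The function name and return convention are illustrative only.

```python
import random

def extraneous_input(cycle, i, food_present=False):
    """Return the list of extraneous stimuli entering STM in this cycle."""
    eligible = cycle in (3, 4) or (cycle == 2 and not food_present)
    if not eligible:
        return []
    # Each of the i extraneous stimuli R1..Ri enters independently with probability 1/i,
    # so with i = 1 the single stimulus R enters every eligible cycle.
    return [f"R{k}" for k in range(1, i + 1) if random.random() < 1.0 / i]

for cycle in range(1, 7):
    print(cycle, extraneous_input(cycle, i=6))
```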
It would be expected that r would be decreased due to the sheer quantity of R(i)→ type Rules, but if conditioning was to occur successfully, then this should stabilise r, thereby obtaining a roughly bell-shaped curve, with its peak around time 169 (just prior to extinction).
As can be seen from the curve of Figure 8.7, a peak occurs at time 169 (r = 3), rising from an initial 0.57 value at time 69, and dropping to 0.07 at time 219. The slight drop between times 129 and 169 was due to a slight drop in T→ℓ and to a slight increase in the number of significant R→ type Rules.
During conditioning, none of the R→ Rules rose above position 5, due to the dominance of T→ℓF and T→ℓ. Thus conditioning occurred and the model differentiated between TOUCH and all R(i) extremely successfully.
During extinction, T→ℓF and T→ℓ fell off (as they should do) leaving room for R→ type Rules to move up. However, the R→ Rules occupying the higher levels understandably fluctuated, since none were consistent enough to dominate over any length of time.
(iii) i = 6
The above example showed that i = 6 may be large enough to prevent any one R stimulus from exerting over-due influence. However, this run served to show the opposite.
The input parameters were the same as for the former run. But this time, one R stimulus did control the system and thereby prevented successful conditioning from occurring.
Consider the following situation:
time 52 : R5 enters STM; → nil was activated; TH → rR5 (from the previous time) and R5 → nil created.
time 53 : R5 enters STM; R5 → nil and TH → rR5 activated; R5 enters STM as an internal stimulus.
time 54 : T enters STM; → nil R5, H → rR5, THR5 → rR5 and R5 → nil R5 created.
time 55 : R5 included in new Rules.
time 56 : R5 included in new Rules.
time 57 : R5 included in new Rules.
time 58 : R5 included in new Rules.
time 64 : R5 still in STM!
As a result, T→ℓF was created at time 56 and could only gain position 6 (from an initial 17) at time 169. Looking at the top 10 Rules just prior to extinction, four incorporate TOUCH but an amazing five incorporate R5, with a Rule such as R5 → nil R5 at position 3.
Thus, even with a variety of six, it is possible to obtain destructive interference from one extraneous stimulus.
(iv) i = 10
It was unusual to find any one extraneous stimulus interfering when the variety was increased to 10. However, one particular run seemed to show that it could occur. In this run, the Rule T → r R7 was the first newly created Rule and was initially placed in position 10. Thereafter it successfully evolved to a peak of position 2 and at time 169 was still highly placed at 5. However, although it did interfere with the conditioning process, it merely slowed down the evolution of T→ℓF rather than stop it altogether.
A number of runs (approximately 50) were made with i = 10, and it would be safe to say that although conditioning may be slowed down by interference from an extraneous stimulus, it could not effectively be stopped.
There were two factors to be noted in the initial starting state of the system. Namely, the size of the initial system, and the positions of the two Reflexive Rules.
(i) Small Initial Production System
It must be remembered that given a small initial number of Rules (say 10), the worth of a newly created Rule (coming into position 11) is much greater relative to the rest of the system than a new Rule being inserted into a larger system. If the cumulative worth of the initial Rules be Wc10 (for 10) and Wc30 (for 30), and the worth of the new Rule is Wi, then
Wi/Wc10 > Wi/Wc30
This gives a distinct advantage to those Rules created towards the beginning of a run, over those created after the system expands.
A corollary to this is that a new Rule inserted into a small initial system lowers the worth of the highest Rule proportional to the rest, more than if the new Rule had been inserted into an initially larger system.
The table below serves to illustrate this feature:
Initial size of system | Initial position of new Rule | Proportional worth of highest Rule (before → after) | Amount lowered |
---|---|---|---|
4 | 5 | 0.46 → 0.42 | 0.04 |
10 | 11 | 0.35 → 0.34 | 0.01 |
15 | 16 | 0.33 → 0.32 | 0.01 |
20 | 21 | 0.312 → 0.309 | 0.003 |
25 | 26 | 0.304 → 0.304 | 0.00 |
When the initial system was limited to 10 Rules, the positioning of the Reflexive Rules was quite critical. For instance, if T→ℓ was at position 4 and T→r at 5, then Wℓ = 2.13 and Wr = 1.48, thereby giving the left a distinct bias over the right turn. This meant, of course, that T→ℓ would have more chance of early activation than T→r. This could, on external appearances, give the impression of successful conditioning, in the absence of T→ℓF in any strong position. This situation did occur a few times (see Section 8.1.2.4).
Further, a small initial system meant that any learned Rule, so long as it had a fair degree of consistency, could easily graduate to the higher levels. Thus we have the case where Rules such as R(i)→γ and α→β R(i), are created early, evolve fast and exert overdue influence upon the model's behaviour (see Section 8.1.3.1).
(ii) Large Systems with Reflexive Rules at Positions 17, 18.
A number of non-matching or filler Rules were placed in the first eight positions, with T→ℓ at 17 and T→r at 18. This gave Wℓ =0.15 and Wr = 0.13, hardly any bias at all. The initial system totalled 66 Rules.
Such an arrangement allows for the following:
A run was executed with the described initial state. At time 169, of the first 10 Rules, 6 were fillers, T→ℓF was at position 1, TH →ℓ at position 2, TF→ℓ at position 5, and H →ℓ at position 9. Such left sidedness showed that the model had extremely successfully learned on which side it obtained satisfaction and had associated the relevant stimuli TOUCH, HUNGER and FOOD together. The Reflexive Right Rule had suffered no reinforcement effects (indirect reinforcements rarely occurring at the higher levels), and was still at 18, whilst T→ℓ had been demoted to position 42.
In the extinction phase, T→ℓF did not drop far, just down to 4, but T→ℓ had been promoted right up to 3 whilst TH →ℓ and TF→ℓ had both dropped out of the first ten. In fact, the Rule at position 1 was R2 → nil T, i.e. an extraneous stimulus had gained ascendancy due to the total inconsistency of the environment.
(iii) Large system with Reflexive Rules at 13, 14.
Wℓ = .24 and Wr = .21, i.e. no significant difference. The initial system totalled 50 Rules with the top 6 being fillers. This was an extremely unsuccessful run due to the model exhibiting an extreme right bias. Thus, at time 169, T→ℓF was down at position 12, the only learned Rule in the top ten being R2→nil T. This was mainly due to R2 → nil having a starting position of 7. However, if the training period had been extended, T→ℓF would have graduated eventually to a dominant position. Thus, an initial right bias may only slow down conditioning but not halt it completely.
The filler Rules and the large system again acted as an effective buffer preventing non-significant Rules from attaining dominant positions.
Another run was made with the Reflexive Rules reversed, thus Wr = .24 and Wℓ = .21. In this run, conditioning successfully occurred and T→ℓF was at position 1 at time 169. T→ℓ had graduated up to position 3, H→ℓ was at 4 and T→r remained at 13. In fact, the evolution of T→ℓ (after it had dropped to 24 through the reinforcement of T→ℓF) was mainly due to Rules such as TH→ℓ and TH→ℓF being activated in the absence of HUNGER, leading to the reduced Reflexive Rule being positively reinforced.
The parameter, PRDLMX sets the limit for the maximum number of Rules that can be held within the system. Normally, this parameter was set at 100, which meant that when the system contained 100 Rules, any new insertion had to be preceded by a deletion, the oldest, most infrequently used Rule being usually deleted (times used/time period).
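A sketch of the deletion step implied by PRDLMX, assuming each Rule carries a creation time and a usage count so that "oldest, most infrequently used" can be read as the lowest usage rate (times used divided by time in the system); the data layout and the tie-breaking are assumptions, not the model's actual bookkeeping.

```python
def insert_with_capacity(rules, new_rule, now, prdlmx=100):
    """rules: dict mapping Rule name -> (created_at, times_used).

    When the Production System already holds PRDLMX Rules, the Rule with the
    lowest usage rate (times used / time it has been in the system) is deleted
    before the new Rule is inserted."""
    if len(rules) >= prdlmx:
        def usage_rate(name):
            created_at, times_used = rules[name]
            return times_used / max(now - created_at, 1)
        del rules[min(rules, key=usage_rate)]
    rules[new_rule] = (now, 0)   # newly created Rule, not yet used

rules = {"T->l": (0, 40), "T->r": (0, 2), "R2->nil": (10, 1)}
insert_with_capacity(rules, "T->lF", now=76, prdlmx=3)
print(rules)   # R2->nil, the least used for its age, has been deleted
```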
Runs were made with PRDLMX = 60. It was found that if T→ℓF was created early, and if the initial system was small (less than ten Rules), then T→ℓF evolved successfully and obtained a position of dominance.
However, if T→ℓF was created late, then, with PRDLMX = 60, it was extremely difficult for the Rule to maintain its existence. In the early life of the Rule, the only way it could evolve was through the re-create mechanism upon activation of T→ℓ. Thus, initially after creation, the Rule itself was rarely activated. Now, if the Rule was placed lower down in the system, then due to its inactivity, it stood a chance of being deleted to make room for a newly created Rule.
In one sample run, T→ℓF was created at time 76 and inserted into position 48. By time 89 it had indirectly moved up to 42. At time 100 it was deleted due to inactivity. At time 108 it was created again and inserted into position 59. By time 129 it was up at 29, again through indirect movements, but at time 132 it was again deleted. At time 148 it was generated for the third time and inserted into 59. At time 164 it received its first positive reinforcement through the Reflexive Rule and went up to 16. At time 168 it moved up to 12, again through T→ℓ. In fact, although T→ℓF did not ascend any further during conditioning, it did during extinction, receiving its first direct activation at time 194 and being moved up to 3. It finished at position 3 at the end of the run.
Thus, it can be said that conditioning can be effected but over a longer training period than 30 trials as stipulated by Siqueland.
The parameter PROLD controls the decay of a stimulus in STM. PROLD = 0.1 means a tenth reduction in the worth of a Rule containing a stimulus which was 1 time quantum old, a hundredth reduction if it was 2 time quanta old, etc. The average setting was found to be 0.01, which meant that decay set in very rapidly. This is obviously a developmental parameter controlling the rate of forgetting an incident from Short Term Memory. Thus, in the 3 day old infant, decay was set to be fairly rapid.
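Read literally, the description above implies that a Rule matched via a stimulus which is some number of quanta old has its worth multiplied by PROLD raised to the power of that age; the sketch below encodes that reading, with the age-0 case (a currently input stimulus) assumed to be undecayed.

```python
def decayed_worth(base_worth, prold, age):
    """Worth contribution of a Rule matched via a stimulus that has sat in STM
    for `age` time quanta: a tenth at age 1 and a hundredth at age 2 when
    PROLD = 0.1, i.e. a factor of PROLD ** age. Age 0 is assumed undecayed."""
    return base_worth if age == 0 else base_worth * (prold ** age)

for prold in (0.10, 0.01, 0.001):
    print(prold, [round(decayed_worth(1.0, prold, age), 6) for age in range(4)])
# With PROLD = 0.10 left-over stimuli retain noticeable influence; with 0.001
# their influence is negligible, matching the behaviour of runs (ii) and (iii) below.
```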
(i) PROLD = 0.01
It was found that left over stimuli exerted influence fairly often on Rule activation. Consider the following sequence:
time 98 : T input; STM = T,H,R8; T→ℓF activated.
time 99 : F,H input; STM = F,H,R8; H→r, F→s activated.
time 100 : R7 input; STM = R7,H,R8; TH→r, R8→nil R8 activated.
At time 100 two left over stimuli activated the highest Rule, namely TOUCH (3 quanta old) and HUNGER (1 quantum old). Note also that R8 was resident for 3 quanta before being generated again internally.
(ii) PROLD =0.10
This meant a less rapid decay rate for left over stimuli in STM.
T was input at time 126 and activated T→ℓF, resulting in F at time 127. However, T left over in STM re-activated T→ℓF at 127 resulting in negative reinforcement since F did not ensue.
In fact, throughout this particular sample run, old stimuli exerted influence over the model's behaviour.
Consider:
time 142 : T,H input; STM = H,T,R8; FH→s, T→ℓF activated.
time 143 : F input; STM = F,T,R8; T→ℓF, F→sR1 activated.
time 144 : R3 input; STM = F,R1,R3; R3→nil T, R1→nil activated.
time 145 : R6 input; STM = F,T,R6; R6→nil, T→ℓF activated.
In one trial interval, therefore, T was matched three times, the last occasion being when T was 4 quanta old.
(iii) PROLD =0.001
This meant that left-over stimuli should exert very little influence over Rule activation.
In fact, a left-over stimulus did not activate a Rule at any point after its actual entry time. Rules were activated only by currently input stimuli.
The majority of Short Siqueland runs were made with JPRIGS = JPNIGS = 4. However, consider the following situation:
Rule P1 is positively reinforced from p2 to p1, at time t0.
P1 is negatively reinforced from p1 down to p2'.
For positive and negative reinforcement to be equivalent
p2 = p2'
It must therefore be ensured that at all times symmetric movements should occur.
Now p1 = p2 - d where d = displacement distance.
d = p2 * (imx/16) + 1
When moving down, if we start from p1, then the new position should be p2, i.e.
p1 + d = p2, whence
d = p2 - p1 = p2 * (imx/16) + 1
so that p2 = (1 + p1) / (1 - imx/16)
and d = 1 + (imx/16) * (1 + p1) / (1 - imx/16)
and we obtain d = 1 + imx * (1 + p1) / (16 - imx)
If we have JPRIGS = 3 and the Rule at position 12, the movement upwards = 1 + 3 * (1 + 12) / (16 - 3) = 1 + 39/13, which gives d = 4, and the new position would be 12 - 4 = 8.
In order to move down from 8 to 12, we have
4 = 1 + JPNIGS * (1 + 8) / (16 - JPNIGS)
and we obtain a JPNIGS value of 4.
In this way a table of equivalent up and down movements could be computed, which would give:
JPRIGS | JPNIGS |
---|---|
1 | 1 |
3 | 4 |
5 | 8 |
7 | 15 |
9 | 19 |
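A small check of the derivation, using the displacement formula obtained above and the worked example's reference position of 12; solving the symmetry condition for JPNIGS gives JPNIGS = 16(d - 1)/p. The rounding and reference positions used to produce the remaining table entries are not stated in the text, so only the JPRIGS = 3 case is reproduced here.

```python
def displacement(position, imx):
    # d = 1 + imx * (1 + p) / (16 - imx), as derived above.
    return 1 + imx * (1 + position) / (16 - imx)

def equivalent_jpnigs(jprigs, position):
    """JPNIGS needed so that a downward move from the promoted position exactly
    undoes an upward move of size d from `position`.
    From d = 1 + JPNIGS * (1 + (position - d)) / (16 - JPNIGS) it follows that
    JPNIGS = 16 * (d - 1) / position."""
    d = displacement(position, jprigs)
    return 16 * (d - 1) / position

d = displacement(12, 3)
print(d, 12 - d)                  # 4.0 8.0 : the Rule moves up from 12 to 8
print(equivalent_jpnigs(3, 12))   # 4.0     : JPNIGS = 4 restores position 12
```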
Some simulation runs were made with these altered values for JPRIGS and JPNIGS.
(i) JPRIGS = 1, JPNIGS = 1
An upward and downward displacement distance of 1 could be expected to slow down the conditioning process considerably and to produce no extinction effects at all. It must be remembered that these parameters do not represent the actual displacement, since the displacement is relative to the starting position.
Figure 8.8a gives the evolution of the conditioning Rule. As can be seen this was not a very successful run. Figure 8.8b gives the count of positive head turns grouped into three-trial blocks. Again no improvement in frequency of elicitation was observable.
At position 10 (time 102) T→ℓF obtained its first direct activation. This sent it up to 9. At time 128 it had dropped to 12 through the movements of surrounding Rules. At this time it was reinforced to 9 through the activation of the Reflexive Rule resulting in FOOD. At time 134 it received its second direct activation moving it from 10 to 9 again. At time 166 it was activated again. However, this time H was present in STM leading to the attempted re-creation of TH→ℓF, thereby demoting it to 13 (the system demotes a Rule which was the forefather of a new Rule). There were no extinction effects observable either, the Rule being at 11 at the end of the experiment.
It could be argued that this run may have worked if T→ℓF had been generated earlier (it being the 42nd new Rule). However, with a greater value for JPRIGS, the two direct reinforcements would have sent it up into the first five, thereby making it a more successful run.
(ii) JPRIGS = 3, JPNIGS = 4
This was almost the same setting as for the average experimental runs (4,4).
Figure 8.9a and 8.9b give the internal and external results.
Conditioning effects were not successful. There were three direct activations moving it up from 16 to 12, from 9 to 6 and from 5 to 4. (If JPRIGS = 4, then equivalent movements would have been 16 to 10, 9 to 5 and 5 to 2 - a much brighter picture).
During extinction the Rule seems to have fared better, moving up to a peak of 2. However, it did drop to 8 towards the end. Not successful here either.
(iii) JPRIGS = 5, JPNIGS = 8
Figures 8.10a and b give the results.
Conditioning was extremely successful, direct activations moving the Rule from 9 to 6, from 5 to 3 and from 3 to 2. By time 106 it was at position 1.
There were no discernible extinction effects, mainly due to the arbitrary presentation of the sweet solution resulting in a large proportion of direct positive reinforcements.
(iv) JPRIGS = 7, JPNIGS = 15
Figure 8.11a and 8.11b give the results.
Good conditioning effects were obtained, direct reinforcements sending T→ℓF from 13 to 7, from 7 to 3 and from 2 to 1.
During extinction, the Rule did manage to fall to 18, but was reinforced, through activations of the Reflexive Rule which obtained FOOD, up to 6 and then from 6 to 3. At the end it was at 4. It could be said that extinction effects could only be obtained if there was a run of activations which obtained no sustenance; but with the present system, the Rule could only move up and down at random.
(v) JPRIGS = 9, JPNIGS = 19
Figures 8.12a and 8.12b give the results.
This was an extremely successful run, both for conditioning and extinction effects.
There was also compatibility between internal and external behaviour, both indicating the same effects.
(vi) JPRIGS = 1, JPNIGS = 8
Figures 8.13a and 8.13b give the results.
No conditioning or extinction effects were observable. The Rule was directly activated twice only, moving it up from 14 to 13 and 13 to 12. (Again, if JPRIGS = 4, the Rule would have moved from 14 to 8, and from 13 to 8. Further, an activation from 14 to 8 would have meant a second activation at 8, moving it to 4.)
During extinction the Rule was activated 8 times, of which five resulted in FOOD. This meant a general Rule inconsistency reflected by its slight up and down movements.
(vii) JPRIGS = 1, JPNIGS = 19
Figures 8.14a and 8.14b give the results.
This run showed that successful conditioning was due to the respective values of both parameters, not merely to JPRIGS. Here although upward reinforcement was scarcely significant, the large downward displacements given to failed Rules served effectively to filter the good Rules up through the system. Demoting the alternatives to a Rule lessens the competition it may otherwise have suffered and hence makes it more likely that the Rule be activated under the appropriate circumstances. In the previous run when JPNIGS was 8, successful conditioning did not occur, the ratio of the two parameters JPRIGS/JPNIGS being too high. However, here, with a lower ratio, successful conditioning did occur serving to show that not only was the absolute value of each parameter important (a large enough JPRIGS would nearly always effect good conditioning independent of JPNIGS) but the ratio also played a significant part.
(viii) JPRIGS = 5, JPNIGS = 1, 19
Figures 8.15a and 8.15b and 8.16a and 8.16b give the results.
With JPNIGS = 1, good conditioning did occur due to the value of JPRIGS being sufficiently high to effect large positive movements. Thus the high ratio (5.0) did not affect the run adversely.
Extinction effects were not observed.
With JPNIGS at 19, both successful conditioning and extinction results were obtained.
(ix) JPRIGS = 1, JPNIGS =1,8
Figures 8.17a and 8.17b and 8.18a and 8.18b give the results.
These two runs serve to show that altering the ratio JPRIGS/JPNIGS does have a direct effect upon the success of a run.
With the ratio at 1.0, very bad conditioning was obtained (see 8.17a), the Rule T→ℓF proceeding no higher than 14 during the entire run.
Lowering the ratio to 0.1 served to effect good conditioning although the absolute value of JPRIGS remained unchanged. Thus even with a large JPRIGS value, a run can prove a failure if the ratio JPRIGS/JPNIGS also remained high.
This series of runs served to show that the absolute value of each parameter was not the only factor contributing towards success. Large enough values for JPRIGS could effect good conditioning whilst large enough values for JPNIGS could effect good extinction. However, if the ratio JPRIGS/JPNIGS was allowed to become too high then conditioning could be affected adversely, resulting in failure. It could be concluded that the value of JPNIGS also affected conditioning success, whilst the value of JPRIGS also affected extinction success.
The channel and STM capacity determine the number of environmental stimuli that can be input into STM at any time, the number of stimuli that can concurrently be kept reverberating in STM and the type of new Rule that can be created.
(i) CHPTCY = STPTCY = 1
Figures 8.19a and 8.19b give the results.
Conditioning occurred, the Rule evolving to position 1 by time 127. Behaviourally, this run was similar to other, normal simulation runs. However, the capacity restrictions had the following effects:
(ii) CHPTCY = STPTCY = 2
Figures 8.20a and 8.20b give the results.
There was good conditioning effected in this run, there being no discernible differences between this run and other average runs.
Extinction effects were not significant although the Rule did drop to 12 before being reinforced back into position 2 by the end of the run.
Altering the value of HCOUNT was equivalent to differing Hunger need levels prior to the start of the experiment. The lower the value of HCOUNT the higher the probability of emitting a HUNGER stimulus in any cycle.
Altering the parameter PRMCNS was equivalent to differing effects of HUNGER upon the computation of the worth function, where for every matching primary WORTH = WORTH * (1 + PRMCNS).
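The primary-matching boost stated above is simple enough to write down directly; the sketch assumes it is applied once per matching primary, which is how the 1.5-times (PRMCNS = 0.5) and 3-times (PRMCNS = 2.0) figures quoted below arise.

```python
def boosted_worth(base_worth, matching_primaries, prmcns):
    # WORTH = WORTH * (1 + PRMCNS) for every matching primary (e.g. HUNGER).
    worth = base_worth
    for _ in range(matching_primaries):
        worth *= (1 + prmcns)
    return worth

print(boosted_worth(1.0, 1, 0.5))  # 1.5 : the usual setting
print(boosted_worth(1.0, 1, 2.0))  # 3.0 : a HUNGER-matching Rule worth three times as much
```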
The creation and evolution of the Rules TH→ℓ, H→ℓ and TH→ℓF was monitored along with the usual indicator T→ℓF.
It was expected that for low values of HCOUNT and for high values of PRMCNS, HUNGER should have a pronounced effect upon behaviour.
(i) HCOUNT = 40.0, PRMCNS = 0.5
The model was being tested 250 minutes after its last feed, assuming it was full at the end of the feed. PRMCNS was maintained at its usual value, which meant that any Rule with a matching Primary had its WORTH increased by a factor of 3/2, i.e. it was worth 1.5 times as much as a non-Primary Rule.
T→ℓF evolved to position 1 by time 112. Figure 8.21a gives the history of all four chosen Rules. As can be seen, T→ℓF, TH→ℓ and H→ℓ all performed extremely well. At the end of the conditioning period they were respectively in positions 1, 2 and 5, showing quite an effective clustering. Their cumulative worth totalled 15.08, giving them a completely dominant hold over behaviour.
Disappointingly, TH→ℓF never showed any signs of evolution upwards.
(ii) HCOUNT = 21.0, PRMCNS =2.0
The model was being tested approximately 330 minutes (5.5 hours) after its last feed. It should be in an equivalent state to quite a hungry infant (neonates usually indulge in 3 hourly feeds).
Figure 8.21a gives the history of the chosen Rules and 8.21b gives the count of left head-turns. Externally the model conditioned fairly well, increasing from 5 positive responses in the first half to 12 in the second.
But T→ℓF was only at position 5 by time 169, TH→ℓF at 12, TH→ℓ at 6 and H→ℓ at 9. This seeming failure was largely due to the interference effects between these four closely allied Rules. For example, if TH→ℓ was activated in the absence of H, then REDUCE attempts to re-create H→ℓ, thereby demoting TH→ℓ and promoting H→ℓ. For the other two, if T→ℓF was activated in the presence of H, then PRDIGS promotes the Rule for generating FOOD but PRINVL demotes it for not having H on its left hand side. In fact, TH→ℓF could be said to be the most completely definitive Rule and, if activated during extreme hunger, should always be promoted.
The cumulative worth of the four Rules at time 169 was 15.13 thereby completely dominating the model's behaviour. The two opposing Rules in the top ten T→ℓr and T→r had a cumulative worth of only 2.79 giving an effective left learned bias to the model. Thus, on this indicator and on the external indicator, conditioning seems to have successfully occurred, and the model has also associated TOUCH, HUNGER and FOOD fairly strongly together, the Rules being clustered together at positions 5 (T→ℓF), 6 (TH→ℓ), 9 (H→ℓ) and 12 (TH→ℓF).
(iii) HCOUNT = 64.0, PRMCNS = 2.0
The model was being tested 150 minutes after its last feed and hence HUNGER should not affect it greatly. However, PRMCNS has been set high, giving any matching HUNGER Rule a worth three times its normal value.
Figure 8.22 gives the evolution of the Rules. As expected TH→ℓF and H→ℓ did not fare too well. T→ℓF received very little interference and once it reached the higher levels, it remained there through its consistency. TH→ℓ fared quite well attaining position 10 by time 169. H→ℓ fared better during extinction, having been created late, and was up at 8 at time 249.
At low hunger, then, the model performed as it was expected to even with the high PRMCNS setting.
A total of 15 experimental runs were made, where each run commenced at a different starting point. This effectively gave an average over a number of different simulated infants, each chosen random number indicating a unique simulated infant.
These results show that the model conditioned extremely successfully, reproducing the learning curve obtained in Siqueland's experiments on live 3-day-old infants.
The mean percent response over all trials is presented in Table 1 of Figure 8.23 in blocks of three trials for the ten blocks of training trials and four blocks of extinction trials. Siqueland's results are given in row 2 of Table 1.
Graph 1 shows the learning curves obtained, the dotted line being that obtained by Siqueland.
These simulations were characterised by the following possible set of input stimuli:
Cycle 1 : NOISE, HUNGER
Cycle 2 : TOUCH, HUNGER
Cycle 3 : R(i), HUNGER
Cycle 4 : R(i)/FOOD, HUNGER
Cycle 5 : R(i), HUNGER
Cycle 6 : R(i), HUNGER
Much more was required of the model's learning ability in order that it could condition successfully to the tactile stimulus.
In these simulations, the extraneous stimuli play a far more vital part. By being input in the intervening cycle they could be expected to divert the model sufficiently that it was unable to associate T and F through all the environmental noise. Expectational stimuli are not goals, in that they cannot be unerringly sought after whatever the surrounding circumstances. If the stimuli attended to in the intervening cycle activated a Rule of greater worth than that activated by the T stimulus, then F would arrive too late to reinforce any T-activated Rule. Thus, if the events occurring in the environment are sufficiently distracting, then the model must attend to them and, in its attending, will, due to the present design constraints of the Learning Mechanism, forget that T occurred in some previous cycle. This is far more natural, surely, than if the model were allowed to construct a goal tree and pick its undeviating way along the channels until each goal was achieved?
The design constraints were such that:
It was felt that WTHCUT would enable T→ℓF to be reinforced even if a more significant Rule was reverberating simultaneously, so long as T→ℓF had graduated to a sufficiently high level.
However, limiting SIGPR to 1 meant that there was a risk of the creation of T→ℓF being delayed, given a sufficiently distracting environment. The delay in its creation meant, of course, that N could begin to dominate the system and effectively delay the creation even more.
For instance, if N→ℓnT was created early enough (and it was discovered that it was, obviously, always created within the first trial), then it could graduate to a high position and become potentially able to be activated in the T cycle; i.e. the model could attend only to the buzzer and totally ignore TOUCH if N→ℓnT had a worth significantly higher than that of T→ℓF.
The model, then, was required to do far more in these simulations as compared to the earlier series, in order to prove itself capable of learning.
Extinction effects were not expected to be particularly significant due to the inequality between JPRIGS and JPNIGS (4 and 6). If the model conditioned to N (as well as or instead of to T) then it was not expected to lose its conditioned response after the training period had ended.
Figure 8.24a gives the history of the Rules T→ℓ, T→r, N→ℓnT, T→ℓF. Figure 8.24b gives the count of positive responses in 3-block trials.
Table 1 in Appendix B gives the input parameter values for this run, and Table 2 the starting state of the Production System.
There were 5 stimuli used in the Random input set, giving R1, R2, R3, R4 and R5, each with a probability of entry of 0.2 during cycles 3, 4, 5 and 6.
Figure 8.24b shows that the frequency of elicitation of the positive response increased from 7 in the first ten trials, to 9 in the second ten and 10 in the last ten trials. There was also a slight decrease, to 8, at the start of extinction.
Figure 8.24a shows that internally the model had successfully identified the significant stimuli, namely, TOUCH, NOISE, FOOD and HUNGER.
As expected, a significant increase of anticipatory responses to the buzzer occurred, the Rule N→ℓnT graduating from position 42 at time 52 to position 2 by the end of training. This Rule expressed the belief that the tactile stimulus delivered on the left cheek is always preceded by the auditory buzzer stimulus. Thus, upon hearing and attending to the buzzer, the model expects next to be touched on its cheek. This is an obvious and relatively easy connection to make, since N and T are input in temporally adjacent cycles. Thus the model merely has to carry the expectation over from its generated cycle into the next for confirmation. N was, in fact, identified very early by the model as a significant stimulus, leading to a number of Rules of the type N→ℓnα, Nδ→ℓnα and β→γN. For example:
(i) Rules of the type N→ℓnα could be created in many ways. Consider the following sequence:
Time | i/p | STM | Rules activated | Most significant rule | Rules created |
---|---|---|---|---|---|
50 | N | N | N→ℓn | N→ℓn | - |
51 | T,H | T,H,N | N→ℓn, T→r | N→ℓn | N→ℓnT, TH→r, TN→r |
52 | R1 | H,T,R1 | R1→nil | R1→nil | N→ℓnR1 |
This sequence of activities led, in fact, to the creation of N→ℓnR1, N→ℓnT, TN→r all within three cycles.
Consider:
Time | i/p | STM | Rules activated | Most significant rule | Rules created |
---|---|---|---|---|---|
110 | N,H | R3,H,N | N→ℓnT | N→ℓnT | NH→ℓnT |
111 | T | R3,H,T | T→ℓF | T→ℓF | - |
In this sequence EXTEND inserted H into the activated Rule N→ℓnT.
When N→ℓnT attained a high position, it could be kept reverberating over a number of cycles (if, say, it was activated in cycles 1 and 2), in which case EXTEND kept inserting unattended stimuli present in STM into the most significant Rule. This could create a number of Rules of the type Nδ→ℓnT.
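A sketch of the EXTEND step as it appears in these examples: an unattended, left-over stimulus in STM is inserted into the condition of the most significant Rule, so N→ℓnT plus a left-over H yields NH→ℓnT. The (condition set, action, expectation) representation and the function signature are illustrative assumptions.

```python
def extend(significant_rule, left_over_stimulus):
    """Insert an unattended STM stimulus into the condition of the most
    significant Rule, producing a new, more specific Rule."""
    condition, action, expectation = significant_rule
    return (condition | {left_over_stimulus}, action, expectation)

# Time 110 in the table above: N->lnT is most significant and H is unattended in STM.
new_rule = extend(({"N"}, "<ln>", "T"), "H")
print(new_rule)   # condition {'N', 'H'}, action '<ln>', expectation 'T', i.e. NH->lnT
```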
(ii) Rules of the type β→γN could be caused by:
Time | i/p | STM | Rules activated | Most significant rule | Rules created |
---|---|---|---|---|---|
53 | R3 | H,R3,R1 | R3→nil, R1→nil | R1→nil (from time 52) | - |
54 | R2,H | H,R3,R2 | R2→nil | R1→nil | R1→nil R3 |
55 | R3 | H,R3,R2 | R3→nil, R2→nil | R1→nil | R1→nil R2, HR2→nil |
56 | N,H | H,N,R2 | Nδ→n | R1→nil | R1→nil N R2 R3→nil NHδ→n |
In this case R1→nil dominated the system over many cycles (five in all) leading to a number of R1→nilα type Rules, one of these being R1→nilN. Incidentally, R1→nil had a high initial worth purely due to the hardwiring of the initial Rules in the system. R1→nil was placed in position 1 with a worth of 0.59 followed by 9 filler Rules. (See Table 2 in Appendix B).
At time 209 there were four Rules containing N in the top ten (of the types N→ℓnα, Nδ→ℓn and β→γN), having a cumulative worth of 2.14, as compared with five T Rules having a cumulative worth of 1.42.
The early identification of N as a significant stimulus resulted, therefore, in:
However, the model did, extremely successfully, also condition to the tactile stimulus. This is a far more difficult condition to identify due to the fact that T and F were separated by an intervening cycle. The model could, for instance, easily be led to believe that N created F due to its early identification of N as being an important stimulus. A more likely event, of course, was that the model would believe that F was brought about by whatever action it performed in that intervening cycle.
Other simulation runs do show that the Rule N→ℓF could be created and end up at a fairly high level.
In this run, the following situation did occur:
Time | i/p | STM | Rules activated | Most significant rule | Rules created |
---|---|---|---|---|---|
57 | T | T,N,R2 | T→ℓ | R1→nil | R1→nil T |
58 | R4,H | T,R4,H | HR2→nil, R4→nil | T→ℓ | T→ℓR4 |
59 | F | T,F,H | F→s | F→s | T→ℓF |
99 | T,H | T,H,N | T→ℓ | R4→nil | R4→nil T |
100 | R4 | R4,H,N | R4→nil | R4→nil | TH→ℓ |
101 | F | H,F,R4 | F→s, R4→nil | R4→nil | R4→nil F, R4→s nil |
Thus, in the early sequence T→ℓ was successfully carried over in the intervening cycle to allow CONCAT to form the Rule T→ℓF. But the latter sequence showed that the model cannot always successfully reinforce this belief due to interference from Rules activated in the intervening cycle.
It can be seen, then, that it is by no means an easy task for the model to associate the presentation of F with its reflexive left movement to the tactile stimulus. The fact that it does do so (T→ℓF being in position 1 by time 155) is due to:
Figure 8.24a shows the evolution of the two Rules N→ℓnT and T→ℓF. Although the model conditions first to the auditory stimulus, this does not prevent it from identifying the association between T and F and conditioning to the tactile stimulus. Thus, although the model attended to N strongly, this early conditioning does not interfere with its attendance to T. However, there were some runs which showed that N could interfere destructively with T; these will be discussed in Section 8.2.2.1. But the fact that they may peacefully co-exist is brought out quite conclusively, this run merely being a sample of a number of successful runs.
Extinction effects were not very good, the Rule dropping down to position 9 at time 263, but recovering again by the end.
As was expected N→ℓnT did not show any signs of extinction since the model correctly assumed strong connections between N and T but not N and F.
There was some interference from the extraneous stimuli. At time 209, when the model was strongly conditioned to both T and N, the cumulative worth of the R→α type Rules was 1.38 (considering the top ten only) whilst N→ℓnβ type Rules had a cumulative worth of 0.8 and the T→ℓγ type Rules scored 0.38. In fact at this time R2→nil N was at the top dominating both N→ℓnT and T→ℓ (it dropped down to 4 by the end). Rules of the type Ri→α did shuffle up and down the system, but no one Rule was able to maintain any lengthy period of dominance due to its inherent inconsistency.
We shall now, as was done with the Short Siqueland simulations, examine some interesting features of the model. Obviously those features already discussed will not be reviewed again, other than to say that they also emerged in the Long Siqueland simulations.
The major interference with TOUCH was caused by buzzer NOISE.
The principal points were:
The conditions upon success imposed by the interference effects from N meant that:
The interesting feature is that despite such interference T→ℓF usually managed to reach a relatively high position before the end of the run. Due to the arbitrary food presentation scheme, it stood to be positively reinforced even during extinction. This showed that the model's ability to condition to T was slowed down by interference from the buzzer but not necessarily fully blocked.
Figures 8.25a-f show a series of runs in which T→ℓF was slowed down considerably by interference from N→ℓnT, yet still managed to improve even during extinction, suggesting that, had the training period been extended, the model might have successfully conditioned to TOUCH.
In the Long Siqueland simulations, a cycle was set at 5 seconds as opposed to 7½ seconds in the previous simulations.
This meant an intervening period between the input of T and the input of F (contingent of course upon a left head-turn).
This led to a number of interesting problems:
Several runs were executed allowing for a T→ℓ to occur either in cycle 2 or in cycle 3. However, there were no significant differences between these and other runs with regard to the evolution of the T→ℓF Rule. Hence, it was decided to check only in cycle 2 for a positive head-turn.
Again, most of the parameters of interest were subject to alteration as in Section 8.1.3. However, there were some parameters particular to these simulations which require further discussion.
Namely:
Several runs were conducted with the following order of appearance:
CYCLE 1 : NOISE, HUNGER
CYCLE 2 : NOISE, TOUCH, HUNGER
CYCLE 3 : R(i), HUNGER
CYCLE 4 : FOOD/R(i), HUNGER
CYCLE 5 : R(i), HUNGER
CYCLE 6 : R(i), HUNGER
This meant that the buzzer was sounded sometime after the commencement of a time quantum such that the first two seconds coincided with the last two seconds of quantum 1 and the next three seconds coincided with the first three seconds of quantum 2.
This also meant that N would exert a large amount of control over the Learning Mechanism, since, by appearing twice it could, of course, be attended to twice and hence had more chance to overshadow and even completely dominate the T stimulus input in Cycle 2.
The Rules that may interfere with T→ℓF are N→ℓnT and N→ℓnTN. There were four sample runs considered.
This run had a very small initial set of Rules (10). This meant that although the N→ℓnα Rules would interfere, if T→ℓF was generated early, then conditioning should occur. Figures 8.26 a and b show that conditioning to T does occur but that N does interfere quite destructively. In fact the decline of T→ℓF from time 199 may be tied directly to the advance of N→ℓnNT which was at position 1 at time 229.
Extinction occurred successfully, N→ℓnN being able to use this to graduate to position 2 at time 279. The system was almost swamped by a high proportion of N Rules showing that the buzzer had been interpreted as being of high significance. At time 229, there were 10 Rules containing N in the top twenty compared with 4 containing T of which one was the Reflexive right Rule. In fact, N was attended to so strongly that F→sN was generated early and allowed to evolve to position 2 by time 229 and maintained this high position at the end of extinction.
In this run Rules such as NH→ℓn, TNH→ℓn and their creations along with the N→ℓnα type Rules competed strongly such that T→ℓF was never even generated. The usual sequence was:
| | Input | Rules activated | Rules created |
|---|---|---|---|
| Cycle 1 | N | N→ℓnα | |
| Cycle 2 | N, T | NT→ℓnβ | N→ℓnT, TNH→ℓn |
| Cycle 3 | R(i) | R(i)→nγ | NT→ℓnR(i) |
At time 117, the first activation of the Reflexive Left Rule occurred (in fact this was the first attendance to T) but T→ℓF was not created due to an N→ℓnα Rule occupying SIGPR at the time. In fact, during the training period, T→ℓ was never able to occupy SIGPR and hence T→ℓF remained un-created.
There was no increase in frequency of elicitation of the positive head-turn response.
The initial positions of T→ℓ and N→ℓn were changed to 11 and 28 respectively.
The major problem again was the Rule N→ℓnTN. As before, this Rule was generated early (inserted into position 57) and was in position 1 by time 106.
The method of computing worth values meant that this Rule (having two predictive stimuli), once it occupied a dominant position, tended to be activated in both cycles 1 and 2. Due to this, the Rule TN→ℓnTN also got generated and had potential for activation in both cycles. Thus T→ℓF had its chances for activation even further reduced.
The following table gives the percentage chances for the activation of T→ℓF considering the first 15 Rules.
Time | % Chance for activation | Position of T→ℓF |
---|---|---|
129 | 1.76 | 12 |
209 | 4.90 | 5 |
289 | 10.73 | 2 |
Although the % chance for activation remained low, the Rule did manage to evolve into the higher levels and was in position 2 at the end.
The count of positive head-turns was extremely low:
Conditioning (C) followed by extinction (E): 2 2 1 0 1 0 0 1 0 2 0 2 0
It was argued at this point that a naive organism would not be able to keep two predictions reverberating concurrently and that it would be more realistic to cut down the number of expectational stimuli to one.
Thus, the maximum number of stimuli in the expectational set of any Rule was reduced to one.
This run had the above change implemented. The Rule TN→ℓnT interfered, again having potential for activation in cycles 1 and 2. The table below shows the % chances for activation of the 2 Rules considering the cumulative worth of the top 20 Rules.
Time | % Chance for activation of T→ℓF | Position | % Chance for activation of TN→ℓnT | Position |
---|---|---|---|---|
129 | 0.50 | 19 | 41.90 | 2 |
209 | 1.23 | 11 | 55.56 | 2 |
289 | 1.28 | 12 | 38.78 | 2 |
T→ℓF, obviously, fared even worse in this run, never rising above a 2% chance for activation. The Rule TN→ℓnT had, at its peak, a near 56% chance and N→ℓnT had, at its peak, a 30% chance, both seeming considerably better to the model than T→ℓF.
Changing the order of stimulus input and having N input in the first two cycles proved far too great a distraction to the model. It could be said that when the N source is so dominant, the model conditions strongly to the buzzer and only very weakly to the tactile stimulus. If this state and behaviour could be allied with that of the human infant, then the fact that Siqueland's infants did condition successfully meant that:
An R(i) stimulus was, as explained earlier, an extraneous stimulus. To have i=5 meant that there were five extraneous stimuli present along with the experimental set, any one being certain of entry in the absence of an experimental stimulus, i.e. P(ΣR(i)) = 1.0.
In the case of the average human infant in his normal feeding environment, it would be true to state that i > 1 and that 0 ≤ P(R(i)) ≤ 1.
In the case of strictly controlled laboratory experiments, values for i and P(R(i)) are far more difficult to estimate.
It would be easy to state, for instance, that Siqueland and Lipsitt conducted their experiments in an extremely well controlled environment such that i = 0 and P(R(i)) = 0. This would, of course, reduce the Long Siqueland simulations almost back to the Short Siqueland simulations there being no stimuli entrant in cycle 3. (However, the model could attend to left-over stimuli in STM and this could prove slightly distracting.)
It may be observed that the average human infant in his normal home environment could take several days to learn to correctly orient to the touch of his mother's nipple. But, the fact remains that he does learn eventually to do so.
A series of runs was implemented varying P(R(i)) in the range 0 ≤ P(R(i)) ≤ 1.
Figures 8.27a and b give the results of this run. No R(i) was allowed in cycle 3 but H could be entrant dependent on the value of HCOUNT.
The model successfully conditioned to T, N proving no deterrent in the process.
In the intermediary cycle the model attended, in the majority of trials, to the left-over T stimulus except when H had been newly input. It was rare that it attended to either a left-over R(i) stimulus or to an internally generated stimulus.
Once T→ℓF had become established at the top, it was usually activated in both cycles 2 and 3. Its slight decline just before the end of the training trials was due partly to:
During extinction T→ℓF had dropped down to position 7 by the end of the run.
The count of positive head-turns also shows a slight decline during extinction from 5 in the first six trials to 3 in the last 6. This showed that conditioning could be effected rapidly and strongly if no R(i) were allowed in during the intermediary cycle.
Figures 8.28a and 8.28b give the results of this run.
As can be observed, conditioning was effected but was a much slower process than in the previous run.
The model initially conditioned to N reinforced by T, and then conditioned to T also reinforced by F. The sequence:
| | STM | Activated Rule | SIGPR | Reverberating expectation |
|---|---|---|---|---|
| Cycle 1 | N | N→ℓnT | N→ℓnT | T |
| Cycle 2 | N, T | T→ℓF | T→ℓF | F |
| Cycle 3 | F, H, R(i) | T→ℓF | T→ℓF | F |
| Cycle 4 | F | F→s | F→s | - |
was usually obtained. However, if in cycle 3 no stimulus was input, then either N→ℓnT or T→ℓF (both being highly positioned) were activated, the reverberating expectations being either T or F. If, in cycle 3, T→ℓF was again activated, then we obtain:
| | Generated expectation | Confirmed expectation | Rule rewarded |
|---|---|---|---|
| Cycle 1 | T | - | - |
| Cycle 2 | F | T | N→ℓnT up |
| Cycle 3 | F | F | T→ℓF up |
| Cycle 4 | - | F | T→ℓF up |
Thus in some trials, T→ℓF could be rewarded twice. If in cycle 3 N→ℓnT was activated then it was subject to positive and negative reinforcement, thereby effectively maintaining it in the same place.
This run was a good example of modifying the speed of learning by the introduction of extraneous stimuli. Comparing the two curves 8.27a and 8.28a, it can be seen that the second curve has a much less steep gradient, indicating that conditioning was effected at a slower rate.
The value of PROLD defines the extent to which a left-over stimulus in STM could exert its influence over Rule activation. The lower the value of PROLD, the greater the aging effect of a stimulus, the formula being WORTH = WORTH * PROLD^T, where T = number of cycles the stimulus has spent in STM.
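The following minimal sketch, in Python, illustrates the aging computation described above; the function and argument names are assumptions made for illustration and are not taken from the model's source.

```python
# A minimal sketch of the aging formula WORTH = WORTH * PROLD^T.
# 'prold' and 'cycles_in_stm' are illustrative names only.

def aged_worth(worth: float, prold: float, cycles_in_stm: int) -> float:
    """Worth of a Rule matched against a left-over stimulus that has sat
    in STM for `cycles_in_stm` cycles."""
    return worth * (prold ** cycles_in_stm)

# A stimulus two cycles old retains 1% of its influence under PROLD = 0.1,
# but only 0.01% under PROLD = 0.01.
print(aged_worth(0.5, 0.1, 2))    # ~0.005
print(aged_worth(0.5, 0.01, 2))   # ~0.00005
```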
Keeping all other parameters constant, the value of PROLD was varied from 0.1 to 0.01, 16 runs being executed to obtain results.
Figure 8.29a represents the count of positive responses for PROLD = 0.1 and 8.29b represents the same for PROLD = 0.01. Figures 8.29c and 8.29d give the positions of T→ℓF and N→ℓnT for PROLD = 0.1 and 0.01.
As can be seen, the increase in frequency of elicitation of the positive response is far greater for 0.01 than for 0.1 indicating that when left-over stimuli exert little or no influence, the conditioning is considerably strengthened.
The two learning curves indicate the same result, six out of eight runs showing a marked degree of success for T→ℓF when PROLD = 0.01 compared with two out of eight for PROLD = 0.1. However, looking at the extinction results for the latter, it was found that T→ℓF was in the first ten in four out of eight runs, indicating that conditioning was being slowed down rather than stopped.
Figure 8.30 (Tables a and b) shows variation of PROLD again, but this time with PROLD = 0.1 extraneous stimuli were only allowed in 50% of the time, whilst with PROLD = 0.01 extraneous stimuli were allowed in 100% of the time.
Again Table 8.30b shows an improvement over Table 8.30a, indicating that when left-over stimuli could not exert any influence, conditioning was effected far better than when they could, even though the latter runs contained far less extraneous noise.
Alterations in PROLD proved far more significant in this series of simulations, due mainly to the interference from the Rule N→ℓnT. When PROLD was set such that an old stimulus had potential for activation, then once N→ℓnT was highly placed it tended to be activated even in the T cycle. Governing PROLD down decreased the potential for activation of a Rule when any of its conditional stimuli had not been currently entrant. This meant, generally, that the last activated Rule was the most significant one, and it effectively also decreased the chance of keeping T→ℓF reverberating in SIGPR over the intermediary cycle. Thus governing PROLD down too far could influence conditioning adversely, as shown by run 2 (Table b) where T→ℓF was at position 85 at the end of the training trials.
The deconditioning effects were not very pronounced in a large number of the Long Siqueland runs due to the non-equivalent settings of JPRIGS and JPNIGS. The computations made in Section 8.1.3 gave a table of more equivalent values for these two parameters. Several runs were made to observe the extinction effects.
The conditioning process should not be significantly affected but extinction effects should prove far more significant with these settings.
Figure 8.31a gives the results of a series of eight runs, each with a different starting point, i.e. the number of random numbers missed, giving effectively a different start for the model.
Summarizing these results it may be observed that the frequency of elicitation of a positive head-turn increased from 58.3% to 95.8%. Extinction effects were equally apparent, the frequency of elicitation declining from 83.3% to 58.3%. Considering individual runs, however, brings out the fact that extinction did not always occur mainly due to the arbitrary method of glucose-presentation used during the extinction trials.
This was the most successful set of runs obtained, demonstrating quite conclusively that conditioning and extinction effects may be obtained with the Long Siqueland simulations.
It could be expected from these settings that extinction effects would be satisfactorily obtained. However, the ratio 5/31 could prove too small to effect good extinction results: the upward movement of relevant Rules must be fast enough to balance the downward movement of no-longer-consistent Rules, and the values here are clearly non-equivalent.
Figure 8.31b summarizes the results over a series of eight runs. During training the frequency of elicitation of the positive response rose from 58.3% to 91.7%. During extinction it declined from 79.2% to 70.7%, not very conclusive extinction results. Looking at the performance of T→ℓF over the eight runs, some steep drops were notable, in particular position 2 to 98, 1 to 21, 1 to 66 and 2 to 20; i.e. internally, the conditioned response seems to have been successfully extinguished, the reason for the poor external results being the number of Rules causing a left head-turn remaining towards the top of the System.
Figure 8.31c gives the results over a series of eight runs. It may be observed that Tables a and c are reasonably similar. In Table a the counts of head-turns during training and extinction totalled 174 and 74; in Table c the equivalent counts were 173 and 84. Obviously Table a fared better than Table c with respect to external extinction effects.
In fact during extinction, considering these runs alone, the frequency of elicitation declined from 87.5% to 83.3%, hardly a significant amount.
These runs seem to show that the absolute value of JPRIGS - the upward displacement parameter - has some effect upon extinction; not merely JPNIGS as could be expected.
These runs seem to show that the Long Siqueland simulations could demonstrate excellent conditioning and extinction results. They also confirm that it is not only the absolute values of the two parameters which affect learning, but that each plays a part in the converse effect, i.e. JPRIGS influences extinction whilst JPNIGS influences conditioning. Thus the ratio JPRIGS/JPNIGS must indeed have influence upon learning.
Alterations in HCOUNT were equivalent to starting the model off at differing need levels. Alterations in PRMCNS influence the effect a Primary stimulus could have upon the Rule activation mechanism (WORTH = WORTH * (1 + PRMCNS) for each matching Primary).
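A correspondingly small sketch of the Primary-stimulus boost quoted above, again with illustrative names only:

```python
# Sketch of WORTH = WORTH * (1 + PRMCNS), applied once per matching Primary.
# The function and argument names are assumptions for illustration.

def boosted_worth(worth: float, prmcns: float, matching_primaries: int) -> float:
    """Apply the multiplicative boost once for each matching Primary stimulus."""
    for _ in range(matching_primaries):
        worth *= (1.0 + prmcns)
    return worth

# With one matching HUNGER stimulus, PRMCNS = 2.1 roughly triples a Rule's
# worth at activation time, whereas PRMCNS = 0.5 gives only a 1.5x boost.
print(boosted_worth(0.04, 2.1, 1))   # ~0.124
print(boosted_worth(0.04, 0.5, 1))   # 0.06
```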
Several runs were executed varying each parameter in turn to note the effects of individual and paired alterations.
It was hoped that the model would demonstrate increasing speed of learning with increasing need levels, the speed being comparatively enhanced by increasing values of PRMCNS.
Figure 8.32a gives the count of positive head-turns taken over three trial blocks as usual.
This indicates a decrease in the frequency of elicitation of the required response over training, followed by a recovery at the start of extinction and then another decrease towards the end of the experiment.
An HCOUNT of 88.0 represents a low-hunger infant, one who was being tested just after having had a full feed. Thus, one would expect such an infant not to condition readily due to the choice of reinforcing agent - namely, a sweet solution.
This was simulating an infant being tested 150 minutes after his last feed, assuming he had a full feed at the time. An HCOUNT of 64.0 represents, therefore, a middle-hunger infant.
Figure 8.32b summarises the results, showing a marked increase in the frequency of elicitation of the positive response. During extinction, a slight decrease was also observed.
Figure 8.32c gives the results taken over 3 runs. Again, a notable increase could be observed, but with no extinction effects.
Figure 8.32d gives the results taken over a set of 8 runs. This time, no notable increase could be observed over the training period, the number of times a positive response could be obtained remaining at a probability of 0.5 over the last thirty trials.
This simulates the infant being tested 330 minutes after his last feed.
Table 8.32e gives the results over a set of 8 runs.
The conditioning was evident but not very good, the frequency increasing from 10 over the third block of thirty trials to 18 over the last block of thirty trials. Figure 8.32g gives the positions of the Rules T→ℓF and TH→ℓF over the 8 runs.
T→ℓF was successful (in the top 6) in three runs, whilst TH→ℓF was successful also in 3 runs. Only in one run were they successful simultaneously, the former being in position 1 after 210 trials whilst the latter was in position 5. In this same run, T→ℓF dropped to 5 during extinction but TH→ℓF rose to 2.
The PRMCNS value was raised such that Rules of the type TH→ were given a distinct advantage.
Figure 8.32f gives the results, showing only a slight increase in the frequency of elicitation of the positive response. Figure 8.32h gives the evolution of the Rules T→ℓF and TH→ℓF, the former graduating to positions 1 and 4 in two runs only whilst the latter occupied positions 7, 1, 1 and 3 in four runs. In terms of the internal indicator of success, then, this set of runs was significantly better than the previous ones, but falls far short of the middle-hungry infant simulations.
The reason for this was the competition between the two Rules TH→ℓ and T→ℓF, a competition which increased the hungrier the model became.
Looking at these runs in detail, the problem was:
In both circumstances, then, T→ℓ got demoted and hence decreased enormously the chances for reinforcing either Rule through regeneration attempts.
Further, if the Rule TH→ℓ did manage to be activated, then it resulted in the generation of TH→ℓF thereby demoting itself.
The cumulative result of these fluctuations was that as training progressed, the worth of the Rules of the type T→ℓ decreased whilst the worth of the Rules of the type T→rβ increased, i.e. the model tended to turn right more often than left.
Looking in detail at a successful run the following points were observed:
Rule | Position | Worth |
---|---|---|
T→ℓ | 20 | 0.10 |
T→γ | 35 | 0.04 |
TH→ℓ | 40 | 0.11 |
TH→r | 50 | 0.08 |
TH→rR8 | 60 | 0.12 |
Thus TH→ℓF had a 33.8% chance of being activated. Its high worth value was due to PRMCNS = 2.1 whereas a value of 0.5 would have reduced it to 0.07.
T→ℓF was not generated due to T being always input alongside H, thereby generating TH→ℓ and TH→ℓF through ADDHNG. Therefore interference effects were minimal.
It was originally felt that lowering HCOUNT and correspondingly forcing PRMCNS higher would force the model to condition strongly and to bring up the Rule TH→ℓF along with T→ℓF.
The simulations showed that this was simply not the case. In the range mid (44.0) to high (21.0) hunger, interference between the two conditioning Rules resulted in much poorer conditioning than was effected in the range mid to low (88.0) hunger. However, looking at the positioning of the relevant Rules, one could say that
It would be true to say, upon interpretation of the model's internal state, that the hunger state could actually slow down the speed of learning, i.e. interference between Rules could be destructive enough to keep all or most of the relevant Rules from ascending to any dominant level.
The conclusion drawn from these observations was that in the range mid to high hunger (as portrayed by the computed settings for HCOUNT) hunger was simply not dominant enough to effect speedy learning. Thus it could be argued that for HCOUNT values below 21.0 the need would be immediately responded to and speedy learning effected. This is borne out by the present representation of need within the model: even at high hunger, the stimulus HUNGER was not always emitted. If, for instance, p(H) = 1.0, then T→ℓF could never even be created and hence TH→ℓF would have no competing Rules strong enough to prevent conditioning.
Clearly the model could successfully condition to a tactile stimulus even if the reinforcement is delayed allowing for an intervening time interval.
As in the Short Siqueland Simulations, these results indicate that pairing a tactile stimulus with presentation of a sweet solution resulted in the increase in frequency of elicitation of the reinforced response, namely the left head-turn. Figure 8.33a gives the mean percent response for a series of runs in blocks of three trials giving ten blocks for training and four for extinction. Row 1 gives the simulation results and Row 2 gives the results obtained by Siqueland and Lipsitt (Experiment 1, Table 1).
Figure 8.33b presents the two learning curves, the dotted one being that obtained in Siqueland and Lipsitt (Experiment 1, Fig. 1).
The effect of the buzzer was to set up a strong expectation for the occurrence of the tactile stimulus, thereby strengthening the probability of responding to the tactile stimulus in the intervening period between its occurrence and the presentation of the sweet solution. Observation of the Production System showed that in all cases the model had identified the auditory stimulus as being of extreme significance and had conditioned strongly to both stimuli.
These results suggest that a normally weak eliciting stimulus (Siqueland & Lipsitt, 1966) could be transformed into a strong stimulus eliciting a stable response by pairing it with a suitable reinforcing agent.
The deprivation results were not altogether satisfactory. They seem to indicate that conditioning may be effected during low hunger, medium hunger and very high hunger, but that in the range medium to high hunger the learning process itself could be disrupted by the inconsistent nature of the Drive function. This could imply that when the Drive was constantly emitting internal Primary stimuli, conditioning would be rapidly effected relative to the process observed at low and medium hunger levels. Obviously at an acute need level no learning at all could be established, due to the autonomous defence mechanism supplied to the model which effectively takes over the system until such time as the level is brought down.
This is in keeping with Siqueland and Lipsitt's results which could not show any reliable effects upon conditioning during high hunger as compared with the low hunger group.
The positive response was designated RS+, being that response which was contralateral to the most frequent response RS- obtained during baseline trials.
The baseline trials were constructed so as to present a tactile stimulus to alternate cheeks on alternate trials. It was then noted whether a positive response (left-turn to left-cheek and right-turn to right-cheek) was elicited or not, that response elicited most often indicating the existent bias. Thus if there were more right-turns to right cheek-touch than left-turns to left cheek-touch then the infant was assumed to possess a right bias.
In the model a bias was given by the initial placements of the Reflexive Rules within the Production System. Thus the arrangement:
P1 : TR → r
P2 : TR → ℓ
P3 : TL → r
P4 : TL → ℓ
would give an initial right bias, TR → r and TL → r being higher than TR → ℓ and TL → ℓ and hence having a greater probability for activation. It was decided that the initial positions should be:
Position | Reflexive Rule | Worth |
---|---|---|
11 | TR→ r | 0.32 |
12 | TR→ ℓ | 0.27 |
13 | TL→ r | 0.24 |
14 | TL→ ℓ | 0.21 |
which gave the model a very slight overall right bias.
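As a quick check on this, assuming the bias simply reflects the summed worths of the competing Reflexive Rules: the right-turn Rules total 0.32 + 0.24 = 0.56, whilst the left-turn Rules total 0.27 + 0.21 = 0.48, hence the slight overall right bias.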
Thus RS+ was a left head-turn, RS- being a right head-turn. The positive stimulus was a left cheek-touch which, if it elicited a left head-turn, was paired with the presentation of the reinforcing agent. However, the negative stimulus (a right cheek-touch) was not reinforced even if it elicited its ipsilateral, positive response (a right head-turn).
The positive stimulus was always preceded by the auditory stimulus BUZZER, whilst the negative stimulus was preceded by the auditory stimulus TONE. On alternate trials, then, BUZZER or TONE was sounded, followed by the left or right tactile stimulus respectively.
Thus the sequences:
TRIAL n:
CYCLE 1: BUZZER, HUNGER
CYCLE 2: TCHLF, HUNGER
CYCLE 3: R(i), HUNGER
CYCLE 4: FOOD/R(i), HUNGER
CYCLE 5: R(i), HUNGER
CYCLE 6: R(i), HUNGER

TRIAL n+1:
CYCLE 1: TONE, HUNGER
CYCLE 2: TCHRG, HUNGER
CYCLE 3: R(i), HUNGER
CYCLE 4: R(i), HUNGER
CYCLE 5: R(i), HUNGER
CYCLE 6: R(i), HUNGER
defined the total experiment, the trials being alternated giving a total of 60 training trials.
The evolution of the Rule TL→ℓF was taken as the internal indicator of success whilst the count of the positive responses was taken as the external indicator of success.
The measure of learning was the mean percent increase in elicitation of RS+; RS- was expected to show a decrease in frequency of elicitation.
A maximum of ten random stimuli were used.
Appendix C gives the relevant details of this run, Table C1 giving the input parameter settings for the chosen run and Table C2 giving the initial Production System.
Figure 8.34 gives the creation and evolution of the TL→ℓF Rule. Figure 8.35 gives the count of the positive head-turn responses RS+ and the count of RS- grouped into three-trial blocks. Figure 8.36 shows the percentage opportunity afforded to the model for turning left (Σα→ℓ) and turning right (Σβ→r) taken over the training period, i.e. it shows the evolution of the two ratios:
(Σ worth of Rules of type α→ℓ) / (Σ worth of all Rules) * 100 and
(Σ worth of Rules of type β→r) / (Σ worth of all Rules) * 100
over the time period 50 - 410.
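A minimal sketch of how these two percentages could be computed, assuming the Production System is available as a simple list of (response, worth) pairs; the representation and names are assumptions for illustration only.

```python
# Sketch of the two ratios plotted in Figure 8.36. Each entry of `system` is
# assumed to be (response, worth), where response is 'left', 'right' or other.

def turn_percentages(system):
    """Return (% of total worth held by left-turn Rules, the same for right)."""
    total = sum(worth for _, worth in system)
    left = sum(worth for resp, worth in system if resp == 'left')
    right = sum(worth for resp, worth in system if resp == 'right')
    return 100.0 * left / total, 100.0 * right / total

# e.g. a System of total worth 40.4, of which 2.33 lies in left-turn Rules and
# 0.67 in right-turn Rules, gives roughly (5.8, 1.7), as at time 369.
```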
The Rule TL→ℓF was generated at time 114 from the sequence:
| Time | Input | Rules obeyed | and Worth | Rule created | Most significant Rule | and Worth |
|---|---|---|---|---|---|---|
| 111 | H, R4, TL | TL→ℓ | 0.44 | - | TL→ℓ (overwrites BZ→ℓn) | 0.44 |
| 112 | H, R4, TL | R4→nℓ4 | 0.11 | TL H→ℓ | TL→ℓ | 0.22 |
| 113 | H, R4, F | F→s, H→nℓ | 0.04, 0.01 | TL→ℓR4 | TL→ℓ | 0.11 |
| 114 | H, R4, R9 | R9→nℓ9 | 0.01 | TL→ℓF | TL→ℓ | 0.05 |
In fact, TL→ℓ remained in SIGPR until time 116, when it was overwritten by TN→ℓn (TONE → listen). TL→ℓF was directly activated at time 123 from position 90 (with a worth of 0.01), which resulted in a direct PRDIGS reinforcement up to position 50. At time 138 it was reinforced through attempted re-creation to position 19, and then again to 10 through direct activation at time 171.
It moved down two places (through movements of other Rules around it) but was reinforced again to position 11 through activation of the Reflexive Rule. At time 209 it moved up to 5 through direct activation, and similarly up to 2 at time 233. It attained position 1 at time 269 through PRDIGS.
By the internal measure of success used, this was an extremely successful run, the model quite distinctly differentiating between a tactile stimulus on its left cheek and one on its right cheek, associating the left head-turn response (upon receiving a left cheek-touch) with the presentation of food and becoming strongly conditioned with respect to the positive response. In contrast the right head-turn responses (TR→r and TL→r) remained at almost their original positions moving down slightly from 11 to 13 and from 15 to 16 by the end of the training period. TL→ℓ moved down from 12 to 31 by the end, a far sharper decrease, and at time 209 was at position 65, which constituted a tremendous deterioration in its worth value.
Figure 8.34 shows also the history of the Rules TL→ℓ, TL→r, TR→r, BZ→ℓn TL and TN→ℓn TR (TONE → <Listen> TOUCH RIGHT).
As can be observed, the model conditioned strongly to the stimuli BUZZER and TONE, and upon receiving the appropriate auditory stimulus set up strong expectations for the occurrence of the corresponding tactile stimulus. Conditioning to the left cheek-touch occurred independent of this although there was the expected tendency for the three Rules to compete for the highest level.
Extraneous stimuli were allowed in with 1.0 probability during cycles 3 (if no FOOD), 4, 5 and 6 in the Buzzer trial and during cycles 3, 4, 5 and 6 in the Tone trial. Thus they had increased opportunity (over the previous simulations) to exert control over the model's behaviour. In fact the decline in the elicitation of the positive response (see Figure 8.35) was due to the activation of the Rule TL→rR2, which was at position 5 (from an initial position of 75) at time 387. Rules of this type tended to reverberate for a long while in SIGPR due to (i) their high worth (1.96 at time 387) and (ii) the fact that it generally took a long time for confirmation to occur. In the latter part of the experiment this was not of great significance, since the creation of new Rules had distinctly fallen off, most of the permutations and combinations having already been satisfied. However, during the early training period, a Rule of this type reverberating for a number of cycles in SIGPR does actively prevent Rules such as TL→ℓ or TL→ℓF from occupying SIGPR and hence can delay the creation and evolution of the conditioning Rule.
Looking now at Figure 8.35, the external measure of success does not appear terribly conclusive. Taking blocks of nine we obtained counts of 4,5 and 4 RS+ responses and 4,4 and 2 RS- responses, i.e. there was a shift in occurrence of the head-turn to the negative stimulus (TONE and right cheek-touch) from 4 to 2 but no shift in occurrence of the head-turn to the positive tactile stimulus (4 to 4).
The model demonstrated a higher incidence of RS+ head-turns to the positive stimulus than of RS- head-turns to the negative stimulus (13 out of 27 against 10 out of 27) during training.
During extinction there was a decrease in RS+ from 4 in the first 6 trials to 1 in the last 6. However, RS- showed a similar decrement from 2 to 0 in six trial blocks.
Figure 8.36 gives the history of the two ratios: the percentage measure of elicitation of a left head-turn and the percentage measure of elicitation of a right head-turn. The following Table demonstrates the difference in the two measures and the total worth of all Rules at the time of computation.
Time | % of Right Turns | % of Left Turns | Difference (% Left - % Right) | Total Worth of system |
---|---|---|---|---|
50 | 6.72 | 5.80 | -0.9 | 8.3 |
129 | 21.20 | 19.60 | -1.6 | 5.5 |
209 | 5.02 | 7.86 | +2.9 | 15.7 |
289 | 1.70 | 2.78 | +1.1 | 34.1 |
369 | 1.65 | 5.76 | +4.1 | 40.4 |
449 | 5.30 | 8.00 | +2.7 | 39.2 |
A negative value for the difference indicates a right bias whilst a positive value indicates a left bias. It may be observed that the model shifts from a right bias at time 129 to a left bias by time 209. At time 369 just prior to the extinction trials, the System contains 5.8% of Rules which make it execute a left head-turn as compared to 1.7% of Rules which make it turn right. The total worth of the System increased also from an initial value of 8.3 to 40.4 at time 369. This was due to three reasons:
The initial decrease in cumulative worth between time 50 and 129 was due to the fall of R1→nℓ1 which was originally at position 1 (with a worth of 5.93) and which declined to position 8 (with a worth of 0.53).
This run clearly demonstrates that the model was capable of learning to differentiate between two eliciting stimuli, conditioning to the positive stimulus and habituating to the negative stimulus.
Many Siqueland II simulations were run, the execution time required being 280 seconds.
Of seven runs, six indicated a shift in the occurrence of the head-turn response to the positive stimulus and six indicated a decrease in elicitation of the response to the negative stimulus. In all seven runs, the un-biased response to the positive stimulus was reinforced with the sweet solution. During extinction, the non-reinforced response showed a decrease in 4 out of 7 runs, whilst the reinforced response showed a decrease in 1 out of 7. Thus extinction results were not particularly conclusive.
Better extinction results were attempted by raising JPNIGS (from 9 to 17) and by lowering JPRIGS (from 7 to 5).
Figures 8.37a and 8.37b give the internal and external measures of success.
Externally the model seemed not to have extinguished the conditioned response, consistently turning left in 9 out of 9 trials.
However, internally it could be observed that TL→ℓF actually declined from position 5 (at end of training) to 19 by the end of the extinction trials.
The reason for this incompatibility was the delay in displacing TL→ℓF from its superior position. In fact the Rule was moved down from 4 to 6 at time 508 (through EXTEND attempting to re-create TL H→ℓ) and from 8 to 17 at time 517 (through CALSIG for an unconfirmed prediction). The delay was due to two reasons:
With a lower setting for JPRIGS (7 to 5) it was expected that conditioning would be effected slower and that extraneous stimulus Rules would find it more difficult to rise to the higher levels.
Figures 8.37c and 8.37d give an example of an extremely unsuccessful run. The number of random numbers missed was 77 (as compared with 20 for the above run) and hence effectively portrayed a different infant but with the same genetic setting.
This run was unsuccessful due to the reasons shown in the following Table:
Time | Rules activated | Rules created | SIGPR |
---|---|---|---|
51 | TL→ℓ | - | TL→ℓ |
52 | R1→nℓ1 | TL→R1 | R1→nℓ1 |
53 | F→s | R1→nℓ1 | R1→nℓ1 |
75 | TL→ℓ | H R1→nℓ1 F | R1→nℓ1 |
76 | TL→ℓ | TL H→ℓ | R1→nℓ1 |
77 | F→s | HR1→nℓ1 R7 | R1→nℓ1 |
87 | TL→ℓ | - | TL→ℓ |
88 | R1→nℓ1 | - | R1→nℓ1 |
89 | F→s | R1→nℓ1F | R1→nℓ1 |
99 | TL→ℓ | - | TL→ℓ |
100 | R5→nℓ5 | TL H→ℓ R5 | R5→nℓ5 |
101 | F→s | H R5→nℓ5 F | R5→nℓ5 |
This sequence describes five separate occasions upon which TL→ℓ could have fathered TL→ℓF but each time an extraneous stimulus interfered. The result was that by time 111 TL→ℓ had dropped its worth from an initial 0.21 to 0.06. In contrast TL→r had a worth of 0.18. At time 123 when the TOUCH LEFT stimulus recurred the following worth values were computed:
Rule | Worth |
---|---|
TL→r | 0.21 |
TL→ℓ | 0.07 |
TLH→ℓ | 0.11 |
TL→ℓR5 | 0.05 |
TL→r R3 | 0.01 |
The chance for TL→ℓ being activated was a mere 15.5% whilst TL→r had a 46.6% chance in contrast.
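As a check on how such percentages arise, assuming the chance for activation is simply a Rule's worth taken as a fraction of the summed worths of the competing Rules listed: 0.07 / (0.21 + 0.07 + 0.11 + 0.05 + 0.01) = 0.07 / 0.45 ≈ 15.6% for TL→ℓ, and 0.21 / 0.45 ≈ 46.7% for TL→r, in close agreement with the quoted figures.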
If, therefore, there is a delay in creating TL→ℓF initially then, the chances for creating it thereafter correspondingly drop. That is to say that if the model fails to identify a recurring feature early on during training, and it constitutes merely one of many recurring features present, then it becomes correspondingly more and more difficult for it to be able to isolate that feature. This task is further complicated by the introduction of random elements which may, through coincidence, appear regular and which serve to mask even further the more significant and more truly regular feature.
Another change in the starting point for the random numbers missed (55) produced the run whose results are given in Figures 8.38a and 8.38b.
As can be seen this was an extremely successful run.
This serves to show that, given two organisms with the same genetic starting points and placing them in two identical environments, one can obtain totally opposing results.
Raising the upward displacement parameter from 5 to 7 should enhance conditioning. However, the ratio JPRIGS/JPNIGS has also been raised from 5/17 to 7/17. This may have a slight debilitative effect on the learning process.
Figures 8.39 a,b,c and d give the results from two sample runs.
They both show extremely strong conditioning internally and externally. In fact, conditioning has been effected much faster than in those runs with settings at 5 and 17. This was to be expected with the higher upward displacement value.
However, the extinction results differ greatly. In the first sample (Figure 8.39a) TL→ℓF declined to position 32 by the end of the run. In the second sample (c) it was still up at position 1.
If we look at its history of activations and the environmental results obtained, we find:
Sample 1: 0011
Sample 2: 111011
where 0 denotes an activation of TL→ℓF without the benefit of FOOD and 1 denotes a rewarded activation.
Thus, in sample 1, out of 9 possible opportunities for activation, TL→ℓF was activated only four times. It received two early demotions via CALSIG, but the following confirmation bumped it up to 17. However, it also received a demotion from EXTEND which sent it down to 32, though the last confirmation brought it back up to 17.
In sample 2 it was activated 6 times out of 9, of which 5 were rewarded. Even the one non-confirmation did not result in demotion, since FOOD remained in the list of significant predictions through 12 whole cycles until the next activation of TL→ℓF.
In all previous simulations, food was presented randomly during the extinction trials. As the results show this did not give very good extinction effects even with suitable settings for the parameters JPRIGS and JPNIGS.
It was decided to attempt a series of runs in which food was totally absent during extinction trials and not input under any circumstances. It was expected that these runs would show reliable extinction effects, the internal results being far better than those previously obtained and the external results at least marginally better.
There was also an alteration made in the initial Production System (see Table C2 in Appendix C), F→s replacing R1→nℓ1 at the top of the list, followed by four filler Rules, R1→nℓ1 now being placed at position 30. This was in order to reduce the distinct control exerted by R1 at the top, which had been quite evident in many previous simulations.
Figure 8.40a summarizes the external results for RS+ over a set of 7 runs, each with a different random number as starting point. Clearly conditioning has occurred, there being a shift in RS+ from 47.6% to 71.4%. During extinction RS+ shows a decrement from 57.1% to 38.1%. These extinction results are a definite improvement on the Short Siqueland results (a downward shift from 82.2% to 73.3%) and on the Long Siqueland results (85% to 67%). Runs 55 to 77 taken individually do not show strong conditioning or extinction effects. However, of 7 runs only 1 (Run 33) actually shows an increment in RS+ during extinction.
Figure 8.40b gives the internal state of the model with regard to the two Rules TL→ℓF and TL H→ℓF. TL→ℓF was strongly positioned in 5 out of 7 runs. In these well-conditioned circumstances, TL→ℓF shows a strong downward trend during extinction (3 to 16, 2 to 26, 2 to 57, 1 to 25 and 3 to 57), a far more positive set of results than previously obtained. In no run does TL→ℓF show an upward trend during extinction, a situation which had occurred often in the past.
Clearly, then, extremely good extinction results may be obtained by withholding food completely during extinction.
The stimulus which elicited a positive response followed by glucose presentation was termed the positive stimulus and that which did not get rewarded was the negative stimulus.
It was required that:
The measure of learning, therefore, was the shift in percent occurrence of RS+ and RS- over trials as a function of differentially reinforcing the two responses during training.
Figure 8.41 summarizes the results obtained by comparing the percent occurrence of RS+ and RS- over training and extinction trials.
The table presents the mean percent occurrence of RS+ and RS- during training and extinction, separately in blocks of three trials. Row 1 gives the results over a set of six simulation runs, all with identical parameter settings but each with a unique starting point.
It was expected that if conditioning had occurred, RS+ should show a higher total incidence than RS-; the totals obtained were 100 out of 180 for RS+ against 89 for RS-.
The shift in RS+ during training was from 38.8% to 66.7%.
RS- shifted from 50% to 55.6%. RS+ showed a distinct upward trend whilst RS- remained average in elicitation (it never rose above 61% or dropped below 33%).
These runs clearly showed that the relative response to two eliciting stimuli was influenced by differential reinforcement such that one may be made far stronger than the other. Thus, the model showed it could:
The computer simulations of the experiments conducted by Siqueland and Lipsitt demonstrate the following features of the model. Given only an initial Reflexive repertoire, the model was able to display:
The alteration of what were considered to be maturational parameters and the observation of subsequent effects upon the model demonstrated that although slight deviations from the norm occurred, these were more in the nature of slowing down acquisition effects rather than halting the learning process altogether.
Experiments with parameter adjustments with respect to the Production System produced valuable insight into the behaviour of a dynamically altering data-base. It was observed that:
It was also noted that reliable extinction effects could not be obtained by implementing the method used by Siqueland and Lipsitt. This seemed to indicate that:
It was suggested that experiments undertaken on human infants should be designed so as to show the strength or stability of a conditioned-response, e.g. whether it be easily or gradually distracted by the introduction of a source of environmental noise.
The model presented in this thesis attempts to provide a complete system framework within which neo-psychological processes undertake the task of constructing an adaptive knowledge base.
It differs from the main stream of Artificial Intelligence research in its attempt to present a number of aspects of human learning within a comprehensive system's architecture. The psychological processes discussed in Chapter 2 acted as a number of constraints upon various features of the architecture. It was seen from the review of the AI literature in Chapter 3 that very few attempts have been made to study psychological processes within the overall framework of the human cognitive system, the tendency being to study interesting processes in isolation rather than in the context of human behaviour in general.
The tendency has also been to try and find answers to the question:
What internal cognitive constructs should be proposed in order to generate a particular form of external behaviour? rather than to explore the question:
What processes underlie the formation of cognitive constructs required to generate the observed behaviour?
There is a fundamental difference between these two questions. The first tries to capture the encoded algorithms required to simulate a variety of state transitions expressed by the system. The second attempts to formulate learning laws which generate, of their own accord, constructs which when activated generate the series of required state transitions. It is a difference in level of viewpoint, the one being more fundamental than the other. It was required that the proposed model should account very closely for chosen features of data that have been obtained from experiments with human neonates.
Chapter 8 confirmed these expectations, the results from a number of simulation runs showing that the model's learning curve could be made to correspond very closely with that of the neonate.
This thesis also offers further insight into the usefulness of a Production System as a control structure for the encoding of experiential information. The form of the Production Rule used showed the potential richness of Production Systems for providing architectural alternatives for the encoding and elaborating processes. It was seen to be possible, using automated Rule elaboration and modification processes, to produce an evolutionary knowledge base wherein each Rule represented contextual and behavioural knowledge.
The goal of the system was inherent within these automated modification processes, and was such that the knowledge base should be increased in order to extend the scope of activity of the model with respect to its environment.
The key to the future use of the model lies in two of its basic features. Namely:
If, however, one wishes to investigate further the question as to the underlying processes required for the creation of more advanced constructs, then the model provides an eminently suitable test bed upon which to try out the proposed functions.
Extensions to the model may be made in a relatively simple way in order to investigate other psychological processes, not in isolation, but within a suitable, total cognitive framework.
At present the model uses a 3-tier memory structure, each sub-structure differing from another by the level of reverberation of its information items, the stored form of these information items, and by their times of occupation.
SSTM may be said to contain extremely transient items of perceived information, where perception is equivalent to that process responsible for depositing information in SSTM and for selecting the set for transference into STM. STM is a less temporary module containing items of information capable of activating encoded information structures in LTM. This is seen as a similar process to attention where perceived information is capable of invoking memories already stored within LTM. The invoking of a structure in LTM resulting in its activation leads automatically to the creation of a near copy of itself to be stored in LTM. This is seen as a similar process to learning where learning results in the creation and storage of a subtly altered copy of an already activated structure.
With the aid of these memory structures (SSTM, STM and LTM), the encoded information structures (Production Rules) and the control functions that direct and coordinate the flow of activity through the model, specific phenomena attributed to activities of memory may be investigated.
There are two phenomena in particular which have already been subject to investigation (Pinkus, 1970), namely Recoding and Chunking. However, they have been studied as isolated functions rather than as processes occurring within the overall cognitive system.
A function such as Recoding, i.e. the replacement of a group of elements by an internally generated identifier which is a coded representation of the original sequence, should be investigated as constituting only one aspect of the overall Learning Mechanism.
Given the sequence S1, S2, S3 in STM, the problem is:
S1 S2 S3 ⇒ < > S1′, where S1′ ≡ S1 S2 S3 and is written back into STM. However, S1′ in this case is an internally generated stimulus symbol but not an expectational stimulus symbol, i.e. S1′ ∉ E, where E = environmental stimulus set.
As another example, Chunking would involve SSTM, the Perceptual Process and the Stimulus Symbol Table. It would require that the Perceptual Process could associate stimulus symbols together on a temporal basis (such a function already existing in the Cognitive Processes) such that eventually a well recognised sequence could be replaced by an internally generated identifier. The difference, of course, between Recoding and Chunking would be that the chunked information provides a pointer to its identifier, and the identifier would possess a backward pointer to the information. The saving would then be in the SSTM - STM transfer and in STM storage. However, for Recoding the backward pointer would not exist, there having to be a decoding process to regenerate the encoded information.
Thus Chunking would be a more natural extension to the model and more in keeping with its design philosophy than Recoding. However, both could be implemented as extensions to the Perceptual Process and to the Learning Mechanism and studied within the overall framework of cognition and behaviour.
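As a sketch of how Chunking might be grafted onto the model, assuming a simple table-based implementation (the table names, identifier format and STM representation are illustrative, not part of the existing design):

```python
# A minimal sketch of Chunking: a well-recognised temporal sequence of stimulus
# symbols is replaced in STM by a single internally generated identifier which
# keeps a backward pointer to the original sequence.

chunk_table = {}       # sequence of symbols -> internal identifier
expansion_table = {}   # internal identifier -> original sequence (backward pointer)

def chunk(sequence):
    """Return the internal identifier for `sequence`, creating one if needed."""
    key = tuple(sequence)
    if key not in chunk_table:
        ident = "S'%d" % (len(chunk_table) + 1)
        chunk_table[key] = ident
        expansion_table[ident] = key   # Chunking keeps this; Recoding would not
    return chunk_table[key]

stm = ['S1', 'S2', 'S3']
stm = [chunk(stm)]                     # STM now holds one internal symbol
print(stm, expansion_table[stm[0]])    # ["S'1"] ('S1', 'S2', 'S3')
```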
An attempt was made to see if the model could acquire skill in a non-trivial problem area, the area considered being the game of Noughts and Crosses (Tic-Tac-Toe).
The measure of learning was taken as the increase in the number of wins as accomplished by the model. This should be reflected internally by the set of Production Rules which eventually graduated to the higher levels, the evolved set representing the moves required for winning under different contingencies.
STM size was reduced to 2. This meant that at any time T0, STM would contain ST0 (the stimulus newly input at T0) and S'T-1 (an internally generated stimulus from the previous time interval).
Input was always 1, the opponent always making only one move. Output was always 1, the system's response, and if the Rule activated contained an internal stimulus this was written into STM.
Stimulus symbols in STM were set to decay extremely rapidly, such that only the current input symbols could be maintained in STM, i.e. decay was 100% in every time interval.
The sequence of events, then, with respect to STM updating was:
Time Interval | Input | STM Contents | Rule Activated | STM Update |
---|---|---|---|---|
T0 | ST0 | ST0S'T-1 | ST0S'T-1 → r1 S'T0 | ST0S'T0 |
T1 | ST1 | ST1S'T0 | ST1S'T0 → r2 S'T1 | ST1S'T1 |
Thus in each time interval STM was made to contain the current input stimulus and the internally generated stimulus from the previous time interval.
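A small sketch of this updating scheme, assuming an activated Rule can be reduced to a pair (external response, internal symbol) returned by some choice function; all names here are illustrative.

```python
# Sketch of the STM updating scheme tabulated above for the game variant.

def play_interval(prev_stm, opponent_move, choose_rule):
    """One time interval of the game variant.

    Decay is 100%, so only the previous interval's internal symbol survives;
    STM becomes the current input plus that symbol, one Rule is activated
    against it, and STM is rewritten as (current input, new internal symbol).
    """
    internal_prev = prev_stm[1] if len(prev_stm) > 1 else None
    stm = [opponent_move] + ([internal_prev] if internal_prev is not None else [])
    response, internal_new = choose_rule(stm)
    return [opponent_move, internal_new], response
```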
The major difference was, of course, the form of the internally generated stimulus. All S' ∉ E where E = environmental stimulus set. Thus S' was not actually an expectation but rather a link to the next Rule to be activated. It could be thought of as a prediction, the actual generated symbol not embodying the predicted move (as in the original model) but serving to link to those Rules containing the predicted symbol. This led to a different form of a Rule:
Si αi → ri βi
where
Si - the opponent's move;
αi - backward pointer to the set of Rules containing αi on their right-hand sides, effectively predicting Si as the next opponent move;
ri - external response by the model;
βi - forward pointer to the set of Rules containing βi on their left-hand sides, selecting a set of possible future opponent moves.
However, this could easily be equated to the previous Rule definition S1 S2 ... Sn → r1 r2 ... rm S'1 ... S'l, where Si αi ⊂ (S1, S2, ..., Sn) and ri ⊂ (r1, r2, ..., rm) with i = m = 1, and βi ⊂ (S'1, S'2, ..., S'l) with i = l = 1.
The routine CONCAT, therefore, was altered to generate its own internal identifiers on the basis of the previous moves executed and the current state of the game.
The current states of the game corresponded to different Primary Drive settings (where the Primary Drive = the game being played), and each defined setting (Win, Fail, Draw, Continue or Illegal Move) resulted in the reinforcement processes being invoked.
Unlike before, the Learning Mechanism could only be called by the above game states occurring. When called, it resulted in connecting two Rules upon a causality scheme and was in fact CONCAT with slightly different parameters.
Thus if the two Rules obeyed in times T-2 and T-1 were:
S1 → r1
S2 → r2
then create at T0:
S1 → r1 α1
S2 α1 → r2
which linked the two Rules on a causal basis.
There was a slight variance though, for, if a Rule of the form S1 → r1 α1 already existed in LTM, then only one new Rule was created, it being:
S2 α1 → r2
implying that a causal connection had already been partially built up by the model.
If the last two Rules activated were:
S1 → r1 β1
S2 → r2
then they were connected to form:
S2 β1 → r2
the causal connection being completed.
This may be seen as the activity of EXTEND where if S1 → r1 β1 was activated, then β1 was written into STM. S2 enters STM externally, activating S2 → r2, the Rule being extended to create S2 β1 → r2.
If the last two Rules activated were:
S1 α1 → r1
S2 β1 → r2
then create:
S1 α1 → r1 β1
This again is a result of the law of Causality wherein the activation of S1 α1 → r1 was believed to have caused the occurrence of β1.
If the last two Rules activated were:
S1 → r1
S2 → r2 β2
then create:
S1 → r1 β1
S2 β1 → r2
NOTE that, as before, EXTEND eliminates predictive symbols, allowing the new Rule to form its own connections.
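The following simplified sketch illustrates this linking step for the first and third cases only; the Rule representation and identifier generator are assumptions made for illustration and do not reproduce CONCAT or EXTEND themselves.

```python
# A simplified sketch of the causal-linking step. Rules are modelled as
# (condition symbols, response, prediction) triples.

import itertools

_link_ids = ("L%d" % n for n in itertools.count(100))   # internal identifiers

def link(rule_prev, rule_last, ltm):
    """Connect the Rules obeyed at T-2 and T-1 with a shared internal symbol."""
    (c1, r1, p1), (c2, r2, _p2) = rule_prev, rule_last
    if p1 is None:
        # Case 1: neither Rule linked yet -> create S1 -> r1 a1 and S2 a1 -> r2
        alpha = next(_link_ids)
        ltm.append((c1, r1, alpha))
        ltm.append((c2 + [alpha], r2, None))   # the old prediction, if any, is dropped
    else:
        # Case 3: S1 already predicts p1 -> complete the link with S2 p1 -> r2
        ltm.append((c2 + [p1], r2, None))
    return ltm

# e.g. linking ['1'] -> '2' with ['5'] -> '9' creates ['1'] -> '2' L100 and
# ['5', 'L100'] -> '9'.
print(link((['1'], '2', None), (['5'], '9', None), []))
```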
REINF was called upon if any defined game state occurred.
The Production System was initialised with the set of Rules:
→ 1, → 2, → 3, → 4, → 5, → 6, → 7, → 8, → 9
The board squares were numbered:
1 2 3
4 5 6
7 8 9
where each ri denoted a board position that the model could activate as an external response. Thus a → 2 resulted in an X (if the opponent's moves were O's) in board position 2.
Thus the model was completely naive with respect to the game being played upon onset of the experiment.
At any time interval STM contained the opponent's current move and the internally generated stimulus from the previous time interval.
The Rules that could match against STM therefore took one of three forms:
→ ri (null left-hand-side Rule)
S2 → ri (matching the opponent's move)
S2 α1 → ri (matching the opponent's move and the internally generated stimulus)
In any time interval the choice process could exit with a number of matching Rules.
Only one Rule may be chosen for activation, the Rule chosen being based on its worth value.
Thus any Rule i could be chosen a fraction s of the time, where

s = (wi / pi²) / Σ (wj / pj²)   (the sum running from j = 1 to j = n)

where

pi = position of the i-th Rule in LTM; wi = complexity of the i-th Rule.

(The complexity of a Rule being equivalent to its information content and being defined as:

w = 1 for a → ri Rule
w = 2 for an S1 → ri Rule
w = 3 for an S1 αi → ri Rule
w = 3 for an S1 → ri βi Rule
w = 3 for an S1 α1 → ri βi Rule)
Thus the Rule with the most information content got preference.
A Rule was chosen, therefore, on a probability basis, preference being given to highly placed and more complex Rules.
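A sketch of this choice process under the scoring given above; the representation of a matching Rule as a (rule, position, complexity) triple is an assumption for illustration only.

```python
# Sketch of the probabilistic choice: probability proportional to w / p**2,
# with w the Rule's complexity and p its position in LTM.

import random

def choose_rule(matching):
    """matching: list of (rule, position, complexity); return one rule drawn
    with probability proportional to complexity / position**2."""
    scores = [w / (p ** 2) for _, p, w in matching]
    rules = [rule for rule, _, _ in matching]
    return random.choices(rules, weights=scores, k=1)[0]

# A complex Rule placed high in LTM (small p, large w) is strongly preferred
# over a simple Rule far down the System.
print(choose_rule([('S a -> r b', 3, 3), ('-> r', 40, 1)]))
```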
The worth process is as defined in Section 6.2.4 but with a set of slightly different parameter settings.
New Rules were generated and old Rules reinforced on the basis of previous Rule activation and the current state of the game.
The Production System was seen to expand as the game progressed. Consider the following sequence of events, occurring some time after the game had been in progress. After five time intervals the game concluded in a draw, with 5 new Rules being created in the process, 2 of them having been enhanced:
| Time | Opponent's last move | STM | Rule activated | Rules created | Promotion | Demotion | State |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | → 2 | 1 → 2 | - | - | Continue |
| 2 | 5 | 5 | 5 → 9 400 | - | - | - | Continue |
| 3 | 6 | 6 400 | 6 → 4 300 | 1 → 2 100, 5 100 → 9 | - | - | Continue |
| 4 | 3 | 3 300 | 3 → 7 | 6 → 4 300, 3 300 → 7 | - | - | Draw |
In the early intervals fewer connections would be made between Rules and more Rules would be created of the form Si → ri. After the first Draw, the first Rules of a connecting type would be created, of the forms Si → ri βi and Sj βi → rj. From then on, if these Rules were activated they would lead to connections being made even while the game state was Continue. Thus, as play progressed, the complexity of the average Rule should also increase, the complex Rules slowly ascending with each successive Win or Draw. The bad complex Rules should similarly descend with each Fail.
For each game configuration the model should develop at least one sequence of Rules (good machine) which when activated allow it to either Win or Draw. Typically there should exist several such machines, each one defining slightly differing configurations and leading to Wins or Draws.
As more and more games are played the number of machines the model has developed should increase, whilst the better machines (leading to a Win) should gradually evolve and displace those leading only to Draws.
Take the sequence of opponent moves 1, 5 and 9. Considering the first two moves, the first thing the model has to learn is that 1 and 5 followed by a 9 leads to a Fail. Thus it should evolve the set of Rules:
1 → ri
5 → 9
(All other 5 → Rules leading to Fails and being demoted.) Having achieved this, the opponent would then switch his third move, say from 9 to 6, allowing him a Win on next playing 4. Thus the model has to form the set:
5 → 9
6 → 4
to enable it to continue the game. It now has the sequence:
1 → ri
5 → 9
6 → 4
but is still unable to connect these since no Win or Draw state has been accomplished. However, the Fails result in the demotion of the bad moves, effectively allowing for the indirect evolution of the good moves.
The opponent, now unable to play 4 can resort to trying for the diagonal, say, playing 3. The model has only two alternatives left, one leading to a Fail and the other to a Draw. Once it creates:
3 → 7
it has ended the game in a Draw and can now connect the last two Rules. These being:
6 → 4 100
3 100 → 7
and the model directly reinforces these Rules. This increases the chances of playing 6 → 4 100 as opposed to 6 → 4 or 6 → ri, thereby enabling it to create 5 → 9 200, 6 200 → 4 100 and so on, until all the Rules have been connected.
Thus the machine:
1 → 2 300
5 300 → 9 200
6 200 → 4 100
3 100 → 7
can be formed enabling it to Draw. Obviously many such machines exist (e.g. 1 → 5 onwards).
Typically, if a series of configurations G1, G2, ..., Gn is played, each one being repeated a sufficient number of times, then the model should evolve a set of machines (sequences of Rules) M11, M12, ..., M1m, M21, ..., M2ℓ, ..., Mn1, ..., Mnp.
Figure 9.1 gives an example of a good machine for the sequence of moves 1, 5, 6 and 7 concluding in a Draw. Figure 9.2 gives a better machine which enables it to Win.
Figure 9.3 demonstrates an approximate break-down of the winning games for a series of game configurations.
The most likely sequence of Rules after 5,000 moves had been made was:
1 → 5 106
2 → 3 103
4 → 6
i.e. a Drawing machine.
However, after 10,000 moves the machine was:
1 → 4
5 → 9 101
2 101 → 3
7 → 6
i.e. a Winning machine.
Conflicting situations may arise which result in inhibiting learning or slowing it down quite effectively. Consider:
1 → 2 5 → 9 7 → 3
If the opponent now played 4 he would Win, leading to 7 → 3 being negatively reinforced. However, had the model not played 3, the opponent playing 3 after his 7 would also have lost the game for the model; thus 7 → 3 is really a good move, but it occurs in an ambiguous situation where the opponent has an option of two Winning moves. The model has to backtrack to 1 → 2 being the bad move, and play some alternative to 2 to prevent this from occurring.
This could eventually be accomplished, but it is obviously a difficult situation to learn about since the model is totally unequipped with any rules of the game, winning or losing sequences, etc. It may only learn through trial and error, and under these circumstances may take a large number of games to evolve its best machines, if it is able to do so at all.
The measure of learning could be taken as the shift in the number of wins and draws over n trials, where 1 trial = 1 complete game configuration and n trials = n repetitions of the identical configuration.
Figure 9.4 gives the results over several different games played against the model. Each curve represents the performance (number of wins only) over n trials.
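Purely as an illustration of how this measure could be computed, consider the sketch below. The outcome list is invented data, and the definition of the shift as the difference between the two halves of the trial sequence is an assumption made for the sake of the example.

# W = Win, D = Draw, F = Fail; one entry per trial of the same configuration.
outcomes = ["F", "F", "D", "F", "D", "D", "W", "D", "W", "W"]

def shift(outcomes, counted=("W", "D")):
    # Shift in the number of counted outcomes between the first and second half of the trials.
    half = len(outcomes) // 2
    first, second = outcomes[:half], outcomes[half:]
    return sum(o in counted for o in second) - sum(o in counted for o in first)

print(shift(outcomes))            # shift in Wins plus Draws
print(shift(outcomes, ("W",)))    # shift in Wins only, as plotted in Figure 9.4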
This experiment served to show that, given no initial knowledge whatsoever as to the rules of the game and the moves required to win it under particular configurations, the model was capable of evolving its own set of drawing or winning finite-state machines (FSMs). These FSMs served to recognise defined configurations, being activated Rule by Rule once the configuration had occurred. The last Rule in the FSM, when activated, forced the game to a draw or enabled the model to win. However, in these experiments the model had to be matched against a below-average opponent.
This game may be seen as being analogous to the feeding situation. Since the correct associations can only be made if food is given, some food has to be given if learning is to ensue. Similarly, if the model is to learn how to respond within a given game configuration, then it must be allowed to win a few times. Pitted against an expert, the model could never win, and hence would never learn, similar to the infant given no food and eventually dying of malnutrition.
Hence the opponent had to be defined as a below-average player, since the area to be investigated was the learning capacity of the model rather than that of the opponent.
Given a particular experiment, the opponent executes an unvarying algorithm which enables him to choose his next move dependent on the model's response.
The algorithms used for the opponent's strategy were in the routine TRY which looked at a row, column or diagonal:
1, 5, 9   1, 2, 3   1, 4, 7   4, 5, 6   7, 8, 9   2, 5, 8   3, 6, 9
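The exact decision logic of TRY is not reproduced here, but the kind of scan it performs over the seven triples listed above may be sketched as follows. This is a hedged reconstruction: the preference ordering is an assumption chosen to give a below-average opponent, not the algorithm of the original routine.

LINES = [(1, 5, 9), (1, 2, 3), (1, 4, 7), (4, 5, 6), (7, 8, 9), (2, 5, 8), (3, 6, 9)]

def try_move(own, theirs):
    # own / theirs are sets of squares (1-9) held by the opponent and the model.
    free = lambda sq: sq not in own and sq not in theirs
    # First preference: complete a line already holding two of the opponent's own squares.
    for line in LINES:
        if sum(sq in own for sq in line) == 2:
            for sq in line:
                if free(sq):
                    return sq
    # Otherwise extend any line of its own not yet blocked by the model.
    for line in LINES:
        if any(sq in own for sq in line) and not any(sq in theirs for sq in line):
            for sq in line:
                if free(sq):
                    return sq
    # Otherwise take the first free square.
    return next((sq for sq in range(1, 10) if free(sq)), None)

# e.g. try_move(own={1, 5}, theirs={2}) returns 9, completing the 1-5-9 diagonal.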
Piaget, on observing the behaviour of infants, amassed a vast quantity of observational data from which emerged his epistemological theory of cognitive development.
A developmental theory states that the behaviour of the infant passes through well-defined stages, in each stage a new form becoming exposed that enables the infant to progress in its behavioural development. The sequence of stages .... reflects the tasks, or skills, that the child masters in the course of its 'development', 'enculturation', or 'education'.... The earlier stages in development will .... be, not 'temporal predecessors' but 'necessary prerequisites' of the later stages (Toulmin, 1971).
That there appear to be stages in development may be due equally to an expression of the child's intrinsic 'law-governed' nature or to the progressively more complex tasks that the child is required to master (Toulmin, 1971).
With regard to knowledge acquisition in terms of stages, each stage defines the things that the child may learn about and hence each stage leads the child to a particular end result with regard to its overt abilities.
A behavioural abnormality would hence be the extent to which a child deviates from, and contravenes, the assumptions created by the standard sequences of observed child behaviour.
It is usually assumed that a succeeding stage cannot be commenced until the end result of the preceding stage has been achieved. This is to say that it is only when a defined amount of development has taken place that the new form of behaviour may be enabled to arise. The emergence of this new form identifies the commencement of the succeeding stage.
The following section attempts to relate the behaviour of the model to these early developmental stages.
The developmental stages considered are as defined by Piaget (Piaget, 1953).
Using Piaget's classification system for developmental growth in the child, the following initial four stages may be defined (see also Cunningham, 1972):
Stage 1 : Reflex Exercise                 birth - 1 week
Stage 2 : Primary Circular Reactions      2 weeks - 3 months
Stage 3 : Secondary Circular Reactions    4 months - 7 months
Stage 4 : Familiar Procedures             8 months - 10 months
It must be remembered that the duration of each stage will vary from child to child, and the times specified (by Piaget) were meant to be indicative of an average duration. Piaget stated that the ordering must be precisely followed, such that transitions from Stage 1 to Stage 3 cannot take place without passing through Stage 2.
The characteristic of each stage may sometimes depend on the maturational level of the infant. Thus, in Stage 5 for instance (Active Experimentation), active experimentation may only commence when the infant has acquired some mode of locomotion, e.g. crawling or walking.
Stage 1 presumes that the infant is equipped at birth with a set of sensori-motor connections. This is already present in the model as a set of Rules contained in LTM of the form:
S ⇒ <R> and ⇒ <R>
which may stand for
Touch on Palm ⇒ <grasp> and ⇒ <move head>, etc.
Therefore, at the onset the model has a set of responses some of which are linked to specified sensory elements and others which may possess innate predispositions toward a certain sensory element or elements.
A sensory element capable of matching a conditional element of a Rule innately possesses a certain degree of specificity. A non-specific sensory element may acquire specificity through the Learning Mechanism.
There may also be certain maturational features constraining initial development, in terms of this model, features such as the size of STM, the capacity of the communication channel, size of LTM and the laws present as algorithms in the Learning Mechanism.
Section 8.1.3.6 showed the constraining effect upon learning of a cut-down in STM size and channel capacity.
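Such maturational constraints may be thought of as slowly changing parameters. The sketch below is illustrative only: the names STMCAP and CHCAP echo the parameter listing in the Appendix, but the growth schedule itself is an assumption made for the sake of the example.

# Illustrative only: maturational constraints expressed as slowly increasing parameters.
params = {"STMCAP": 1, "CHCAP": 1, "LTM_SIZE": 500}

def mature(params, day):
    # Let STM and channel capacity creep upwards as the infant matures.
    params["STMCAP"] = min(5, 1 + day // 7)    # one extra STM slot per week, capped at 5
    params["CHCAP"] = min(3, 1 + day // 14)    # channel capacity grows more slowly, capped at 3
    return params

for day in (0, 7, 21, 35):
    print(day, mature(dict(params), day))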
A characteristic of Stage 1 behaviour is the occurrence of closed loops, where a response serves to elicit the stimulus which activated it in the first place. This could be expressed as Rules of the form S1 → r1 S1. Such Rules may be formed as a result of STM incapacity, whereby only one stimulus may be contained in any one time interval, and by the physical inability of the infant to seek out other features of its environment. Thus it remains focused on S1 (until it inadvertently commits a physical move or until S1 disappears), resulting in the formation of S1 → r1 S1.
This reflex activity may only cease when the infant loses contact with the object or when a competitive stimulus element enters STM.
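The closed loop S1 → r1 S1 may be illustrated by the following toy cycle (assumed names, not the model's code): with an STM of capacity one, the response keeps re-presenting the very stimulus that activated it, so the same Rule fires on every interval.

stm = ["S1"]                      # STM of capacity 1
rule = {"condition": "S1", "response": "r1", "emits": "S1"}

for interval in range(4):
    if rule["condition"] in stm:
        print(f"T{interval}: fire {rule['condition']} -> {rule['response']}")
        stm = [rule["emits"]]     # the response regenerates S1; nothing else fits in STM
# in the model, this cycling only ceases when S1 disappears
# or a competing stimulus element displaces it from STM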
It was also demonstrated in Section 8.1.2.2 that hallucinatory behaviour could ensue as a result of activation of Rules of the form S1 → R1 S1 leading to activation of S2 matching Rules even in the absence of the environmental confirmation of S2.
The simulation runs on experiments 1 and 2 also demonstrated that the behaviour of the model was a close copy of the behaviour of the 3-day-old human infant. They showed too that characteristics belonging to succeeding stages could be brought about in a preceding stage by enforced training methods. However, this could only occur given the presence of the initial Reflexive Rules, the speed of acquisition being partly due to the position (or the strength) of the Reflexive Rule required for bootstrapping the more Complex Rules.
Behaviour in the second stage is dominated by what Piaget called the Primary Circular Reactions. An example of this is the reaction between the sucking and grasping reflexes, whereby grasping an object may lead to sucking behaviour even though the mouth is not being stimulated (Cunningham, 1972).
Cunningham states that it is only in Stage 2 that the infant is capable of conducting reflex exercises in parallel. This implies that whereas in Stage 1 only one stimulus configuration could be attended to, in Stage 2 more than one stimulus configuration could be processed within the same time interval. This is indicative of two characteristic forms: activation of a reflex exercise in the absence of its activating sensory agent, and parallel activation of Rules.
Both of these characteristic forms are demonstrated by the model. In the former instance, Rules of the type:
Object in Palm ⇒ grasp Object in Mouth
when activated could cause a Rule of the type:
Object in Mouth ⇒ suck
to be activated even before the object has been transferred to the mouth.
Parallel activations of the Rules S1 → r1 and S2 → r2 would cause Rules of the form S1 → r2, S2 → r1, S1S2 → r1, S1S2 → r2, S1S2 → r1r2, etc. to be formed. Since Rules may be activated even on partial matches, then S1 could activate S1S2 → r1, S1S2 → r2, or S1S2 → r1r2. If S1S2 → r1r2 corresponded to:
Object in hand Object in Mouth ⇒ grasp suck
then both reflexes could be activated even in the absence of S2 ≡ Object in Mouth.
In the second instance, again, Rules of the type S1 → r1r2 could cause parallel activation of reflexes. However, it must be remembered that it requires two Rules to be activated within the same time interval for the Learning Mechanism to create such forms. If this were only allowed in Stage 2 then obviously the form could not be constructed prior to commencement of Stage 2. A more likely explanation, in keeping with the philosophy behind this model, would be that these forms are created from the onset but, due to the reinforcement scheme, do not become candidates for activation (through low worth) until they are sufficiently well placed. This may require anything of the order of days, due to the variety of configurations being input to the infant. Thus, although isolated examples of these advanced forms may be present during Stage 1, it is only after the Rules become stabilised in the higher levels that the behaviour may be consistently expressed.
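The gating effect suggested here, where compound Rules exist from the onset but are only expressed once repeated reinforcement has raised their worth, may be sketched as follows. The threshold and increment values are arbitrary illustrations, not parameters of the model.

rules = {"S1->r1": 0.9, "S2->r2": 0.9, "S1 S2->r1 r2": 0.1}
THRESHOLD = 0.5

def candidates(stm):
    # A Rule is a candidate only if its worth is high enough and its conditions are in STM.
    return [name for name, worth in rules.items()
            if worth >= THRESHOLD
            and all(s in stm for s in name.split("->")[0].split())]

print(candidates(["S1", "S2"]))        # the compound Rule is present but still dormant
for _ in range(5):                     # repeated parallel activation reinforces it
    rules["S1 S2->r1 r2"] += 0.1
print(candidates(["S1", "S2"]))        # the compound form can now be expressed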
This characterises an evolutionary mode of development through continuous growth processes rather than discontinuous jumps from one mode to another by the introduction of new processes.
This view is further substantiated by the fact that the one example of Stage 2 behaviour in Stage 1 occurs within the feeding environment. This is the one consistent series of configurations known to the infant and hence more rapid acquisition may occur here than in any other environment. The emergence of a more complex form in the feeding environment is thus not a local adaptation but a natural consequence of learning occurring within a continuously recurring, consistent environment. This is borne out by Siqueland and Lipsitt's experiments, which confirm that learning of the Stage 2 type can take place in Stage 1 provided the infant is subjected to a suitably consistent and recurrent environment.
Further, the average size of STM is also maintained as a gradually increasing parameter. If STM size were initially too small (1, for example) then the model would be extremely unlikely to execute any primary circular reactions, irrespective of the positioning of its Rules. Thus the execution of Stage 2 behaviour becomes more and more likely as STM size gradually increases and the infant matures physically.
A typical example of Stage 3 behaviour is given by Piaget, where he observes his child form a coordination between a leg-shaking activity she had previously acquired and the swinging of a doll on her cot. This led her to look for her doll each time she shook her legs, and to shake her legs each time she chanced to see her doll.
In terms of the model this could be formed in two phases. In phase 1 she happens to be shaking her legs (caused by some object S1) and then catches sight of her doll swinging.
Thus:
P1: S1 ⇒ shake legs          in T1
P2: Swinging doll ⇒ look     in T2
leads to the creation of
P3: S1 ⇒ shake legs Swinging doll
In phase 2 the two Rules P1 and P2 are activated within the same time interval, leading to P4: Swinging Doll ⇒ shake legs (amongst other possible creations) being formed.
P3 and P4 serve to elicit the required Stage 3 behaviour.
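The two-phase construction of P3 and P4 may be summarised in the following sketch; the names and representation are illustrative only.

P1 = {"cond": ["S1"], "resp": "shake legs"}
P2 = {"cond": ["Swinging doll"], "resp": "look"}

# Phase 1: P2 fires in the interval after P1, so P1's response is credited
# with producing P2's activating stimulus (Laws of Causality / Association):
P3 = {"cond": P1["cond"], "resp": P1["resp"], "expect": P2["cond"]}

# Phase 2: P1 and P2 fire within the same interval, so their condition and
# response elements are cross-combined; one such creation is P4:
P4 = {"cond": P2["cond"], "resp": P1["resp"]}

print(P3)   # S1 => shake legs, Swinging doll
print(P4)   # Swinging doll => shake legs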
However, why is this behaviour not possible in Stages 1 and 2, since the laws of Causality and Association would be capable of working from the beginning?
Again the answer forwarded could be the inevitable delay between the creation of the requisite Rules and their actual consistent activation. It was observed (in Chapter 8) that when the environment had consistent patterns of occurrence, Rules could be rapidly learned and activated. However, in this particular example, the two events described by P1 and P2 would have to occur adjacently a sufficient number of times to allow P3 to move upwards. Obviously this would not always occur and hence P3 would sometimes be subject to direct negative reinforcement. As a result its total upward trend may be extremely slow, made even slower by the rapid upsurge of Rules corresponding to more consistent environmental occurrences. Such Rules also have the tendency to cluster together, most of them achieving activation potential within short times of each other. Thus they give the appearance of sudden, added complexities in behaviour, though in reality the effect is caused by a very natural progression of events.
Up to this stage, the child has, presumably, been learning about his environment and will have, in his repertoire, a number of structures enabling him to manipulate the objects of his world and to predict environmental happenings. However, those structures which are most commonly activated will belong to particular contexts, such that only in a particular situation will a particular response be elicited. In Stage 4, there is a break-away from the normal pattern of behaviour, and one observes the infant attempting old actions (or strategies) in new situations. That it may be a conscious decision to attempt a particular old action in the situation is debatable and, in fact, it is only through trial and error that the correct action is undertaken. However, there is a conscious decision being made about attempting any known old action rather than one in particular.
This is best brought out by the following observation (Observation 122) made by Piaget. Laurent has just seen one of his favourite toys. He locks his gaze upon it and then moves his arm toward it in order to grasp it. However, on this occasion, Piaget has placed an obstacle in front of the toy such that Laurent cannot touch it. To reach directly and grasp the object was his old, familiar strategy. But that has failed and so he attempts a series of new ones. He waves his hands, shakes his head and generally applies various old actions. The toy remains out of reach. Eventually (and randomly, it seems) he reaches out and strikes the obstacle. This obviously results in displacing the obstacle so that now a reach out and grasp sequence will be successful. The success reinforces the attempted strategy, such that, faced with a similar situation, he responds immediately with his newly acquired sequence of actions.
In terms of Production Rules, the original strategy may be written as:
Visual Object ⇒ <reach out> Hand Touch
Hand Touch ⇒ <grasp> Object in Hand
However, the obstacle prevents the Hand Touch stimulus from being generated, though the expectation will probably cause the grasp to occur a few times.
The infant now scans through his Production System, choosing Rules that may apply to the situation. This may involve Rules of the type:
⇒ <R1>
with null left hand side, or Rules like:
S1 ⇒ <R1>
where S1 is some arbitrary visual stimulus object previously encountered.
Eventually he gets to:
⇒ <strike>
which he commits, causing the obstacle to be removed. He only recognises obstacle removal when he enacts:
Object ⇒ <reach out> Hand Touch
Hand Touch ⇒ <grasp> Object in Hand
The new Rules will be built up of the form:
Object ⇒ <strike>
through the Law of Insufficiency and:
Obstacle Object ⇒ <strike>
through the Law of Association. Next:
Object ⇒ <reach out> Hand Touch
is activated, causing
Object Obstacle ⇒ <strike> Object
to be formed.
The Hand Touch expectation is then confirmed, causing positive reinforcement of the last obeyed Rule, and on any future occasion the confirmation of Object on activating the new Rule causes that one to be reinforced too. A new Strategy has therefore been formed, linking the three Rules together through the internally generated stimuli; these Rules will be activated the next time that the object is encountered.
This is not to say that the infant recognises the Object obscured by Obstacle as different from Object on its own. This may be built up, though, by formation of a Rule such as:
Object Obstacle ⇒ <strike> Object
which is activated only when both object and obstacle are in STM concurrently, as opposed to:
Object ⇒ <reach out> Hand Touch
which is activated when only the object is in STM. Obviously mix-ups may occur and the infant may sometimes strike out when he sees the object on its own, or not strike out when the obstacle is present.
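The three-Rule strategy described above, chained through its internally generated stimuli, may be walked through as follows. This is a hedged sketch in which matching is simplified to exact subset matching and the more specific Rule is deliberately placed first; it is not the model's matching algorithm.

rules = [
    ({"Object", "Obstacle"}, "<strike>",    {"Object"}),
    ({"Object"},             "<reach out>", {"Hand Touch"}),
    ({"Hand Touch"},         "<grasp>",     {"Object in Hand"}),
]

stm = {"Object", "Obstacle"}
while True:
    fired = next((r for r in rules if r[0] <= stm), None)   # first Rule whose conditions are all in STM
    if fired is None:
        break
    cond, action, produces = fired
    print(action)
    stm = produces                 # the produced stimuli drive the next Rule
# prints <strike>, <reach out>, <grasp> in sequence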
This concluding chapter has attempted to show to some extent the potential uses of the model developed in this thesis. The author firmly believes that, albeit extremely crude, the model constitutes a firm step towards a systems approach to the investigation of psychological processes and the substantiation of psychological theories of human behaviour.
The model, due to its simple design and philosophy, the large number of psychological constraints imposed upon it, and its realistic level of approach to the problem, should be of use to investigators of cognitive development in general, or of particular aspects of cognition and behaviour.
ACKERMAN J. (1975) 'A Study of the Mother-Infant Relationship in the First Year of Life', Phd Thesis, Political and Social Science Dept., Michigan State University.
ADCOCK C.J. (1964) 'Fundamentals of Psychology', Pelican, 1964.
ALS H. (1975) 'The Human Newborn and his Mother: An Ethological Study of their Interaction', Phd Thesis, Dept. of Psych., University of Pennsylvania.
ANDERSON J.R. (1976) 'Language,Memory and Thought', Lawrence Erlbaum Assocs., N.J.
ANDREW A.M. (1958) 'Machines Which Learn', New Scientist, Nov. 27, 4:1383.
ANDREW A.M. (1959) 'Learning Machines', D.V.Blake and A.M.Uttley (Eds), Procs. of the Symposium on the Mechanisation of Thought Processes,NPL, Teddington, London.
ASHBY W.R. (1947) 'Principles of the Self-Organizing Dynamic System', J. of Gen. Psychol., 37:125-128.
ASHBY W.R. (1952) 'Design for a Brain', Wiley, New York.
BEARD R.M. (1969) 'An Outline of Piaget's Developmental Psychology', Routledge and Kegan Paul, London.
BECKER J.D. (1970) 'An Information Processing Model of Intermediate-Level Cognition', Memo no. 119, Stanford AI Project, Comp. Sci. Dept., Stanford University; also Report no. 2335, Cambridge, Mass., Bolt, Beranek and Newman Inc.
BECKER J.D. (1972) '"Robot" Computer Problem Solving System', Final Progress Report, Camb., Mass., Bolt, Beranek and Newman Inc.
BEILIN H. (1968) 'Cognitive Capacities of Young Children: A Replication', Science 162, 920-921.
BEILIN H. (1971) 'The Development of Physical Concepts', in T.Mischel (Ed) 'Cognitive Development and Epistemology', Academic Press.
BERLYNE D.E. (1960) 'Conflict, Arousal and Curiosity', McGraw-Hill.
BERLYNE D.E. (1965) 'Structure and Direction in Thinking', Wiley, New York.
BORGER R. & SEABORNE A.E.M. (1967) 'The Psychology of Learning', Penguin.
DODWELL P.C. (1964) 'A coupled system for coding and learning in shape discrimination', Psychological Review, Vol 71(2), Mar 1964.
DODWELL P.C. (1971) 'Perceptual Learning and Adaptation', Penguin, 1971.
FOGEL L.J., OWENS A.J, & WALSH M.J. (1966) 'Artificial Intelligence through Simulated Evolution'.
FURTH E.G. (1969) 'Piaget and Knowledge: Theoretical Foundations', Englewood Cliffs, NJ, Prentice-Hall.
GAINES B.R. (1976) 'Behaviour/Structure Transformations Under Uncertainty', Int. J. Man Machine Studies, (1976) 8.
GELERNTER H. (1959) 'Realization of a Geometry Theorem Proving Machine', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
GELERNTER H., HANSON J.R. & LOVELAND D.W. (1960) 'Empirical Exploration of the Geometry Theorem Machine', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
GIBSON J.J. & GIBSON E.J. (1955) 'Perceptual Learning - Differentiation or Enrichment?', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
GINSBERG H. & OPPER S. (1969) 'Piaget's Theory of Intellectual Development', Englewood Cliffs, NJ, Prentice-Hall.
GREENBLATT R.B., EASTLAKE D.E. & CROCKER S.D. (1967) 'The Greenblatt Chess Program', Proc. of the 1967 Jt. Computer Conference, 30, 801-810.
GREGG L.W. (1972) 'Simulation Models of Learning and Behaviour', L.W.Gregg(Ed) 'Cognition in Learning and Memory', John Wiley and Sons Inc.
GYR J.W. , BROWN J.S., WILLEY R. & ZIVIAN A. (1966) 'Computer Simulation and Psychological Theories of Perception', P.C.Dodwell(Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
HAMLYN D.W. (1971) 'Epistemology and Conceptual Development', T.Mischel (Ed) 'Cognitive Development and Epistemology', Academic Press.
HAYES-ROTH F.A. (1974) 'Fundamental Mechanisms of Intelligent Behaviour: The Representation, Organization, Acquisition and Use of Structured Knowledge in Perception and Cognition', Phd Thesis, Psychol. Dept., University of Michigan.
HEBB D.O. (1949) 'The Organization of Behaviour', John Wiley and Sons Inc.
HELMHOLTZ H.VON (1866) 'Concerning the Perceptions in General', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
HELMHOLTZ, H.VON (1925) 'Treatise on Physiological Optics', vol. 1. Dover: New York. 1925.
HOLLAND J. (1974) 'Adaptation in Natural and Artificial Systems'.
HULL C.L. (1920) 'Quantitative Aspects of the Evolution of Concepts', Psychol. Monographs, Vol 28, no.123.
HUNT J. MCV. (1960) 'Experience and the Development of Motivation', Child Dev., Vol 31, 489-504.
HUNT E.B. & HOVLAND C.I. (1961) ' Programming a Model for Human Concept Formation' E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
KESSEN W. (1971) 'Early Cognitive Development: Hot or Cold?', T.Mischel (Ed) 'Cognitive Development and Epistemology', Academic Press.
KLAHR D. & WALLACE J.G. (1976) 'Cognitive Development: An Information Processing View', Lawrence Erlbaum Assocs., NJ.
KOCH J. (1968) 'Conditioned Orienting Reaction to Persons and Things in Two-to-Five Month Old Infants', W.Sluckin (Ed) 'Early Learning and Experience', Penguin, 1971.
KOHLER W. (1929) 'Sensory Organization', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
LINDSAY P.H. & NORMAN D.A. (1972) 'An Introduction to Psychology', Academic Press.
LINDSAY R.L. (1973) 'In Defence of Ad Hoc Systems', R.C.Schank and K.M.Colby (Eds) 'Computer Models of Thought and Language', W.H.Freeman and Co. Ltd.
LYNN R. (1966) 'Attention, Arousal and the Orientation Reaction', Oxford Pergamon Press.
McGURK H. (1974) 'Visual Perception in Young Infants', B.Foss (Ed) 'New Perspectives in Child Development', Penguin, 1974.
MALCOLM N. (1971) 'The Myth of Cognitive Processes and Structures', T.Mischel (Ed) 'Cognitive Development and Epistemology', Academic Press.
MERRIAM E.W. (1975) '"Robot" Computer Problem Solving System' , Quarterly and Final Progress Report for NASA, Contract No. NASW-2749, BBN Report No. 3108.
MILLER G.A.(1956) 'The Magical Number Seven Plus or Minus Two', Psychol. Rev. 63, 81-97.
MILLER G.A., GALANTER E. & PRIBRAM K. (1960) 'Plans and the Structure of Behaviour', Holt, Rhinehart and Winston.
MINSKY M. (1975) 'A Framework for Representing Knowledge', P.H.Winston (Ed) "The Psychology of Computer Vision', McGraw-Hill.
MISCHEL T. (1971) 'Piaget: Cognitive Conflict and the Motivation of Thought' T.Mischel (Ed) 'Cognitive Development and Epistemology', Academic Press.
MORAN T. (1973) 'The Symbolic Imagery Hypothesis: A Production System Model', Phd Thesis, Comp. Sci. Dept., Carnegie-Mellon University.
MOTT D.H. (1976) 'Experiments in Sensory-Motor Learning', Int. Doc., Comp. Sci. Dept., Queen Mary College, London.
MYERS A.E. (1975) 'The Organization of Spontaneous Behaviours in Sleeping Newborn Infants', Phd Thesis, Political and Social Science Dept., University of Michigan.
NEISSER U. (1967) 'Cognitive Psychology', Appleton-Century-Crofts.
NEWELL A. & SIMON H.A. (1959) 'The Simulation of Human Thought'.
NEWELL A. & SIMON H.A. (1961) 'Computer Simulation of Human Thinking', Science, 1961.
NEWELL A. & SIMON H.A. (1963) 'A Program that Simulates Human Thought', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
NEWELL A., SHAW J.C. & SIMON H.A. (1963) 'Chess Playing Programs and the Problem of Complexity', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
NEWELL A., SHAW J.C. & SIMON H.A. (1972) 'Human Problem Solving', Prentice-Hall.
NEWELL A., McDERMOTT J. & MOORE J. (1976) 'The Efficiency of Certain Production System Implementations', Comp. Sci. Dept., Carnegie-Mellon University.
NILSSON N.J. (1965) 'Learning Machines', McGraw-Hill.
NORMAN D.A. (1969) 'Memory and Attention', John Wiley and Sons.
PAVLOV I.P. (1927) 'Conditioned Reflexes', Milford, Oxford.
PASCUAL-LEONE J. (1970) 'A Mathematical Model for the Transition Rule in Piaget's Developmental Stages', Acta Psychol. 32, 301-345.
PIAGET J. (1929) 'The Child's Conception of the World', London, Routledge.
PIAGET J. (1949) 'The Psychology of Intelligence', London, Routledge.
PIAGET J. (1953) 'The Origins of Intelligence in the Child' London, Routledge.
PIAGET J. (1955) 'The Child's Construction of Reality', London, Routledge.
PIAGET J. (1969) 'The Mechanisms of Perception', London, Routledge.
PINKUS A.L. (1970) 'STM: Computer Simulation of Recoding and Chunking Processes', Phd Thesis, NY University.
POSTMAN L. (1955) 'Association Theory and Perceptual Learning', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
PRECHTL F.H.R. (1957) 'The Directed Head-Turning Response and Allied Movements of the Human Baby', Neurology Dept., University of Groningen, Netherlands, Rec. I-XI.
REISEN A.M. (1968) 'Plasticity of Behaviour: Psychological Aspects', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
RICHARDSON J.T.E. (1970) 'Perceptual Development', Int. Doc., Psychol. Dept., Brunel University, London.
RYCHENER M.D. (1975) 'The Student Production System: A Study of Encoding Knowledge in Production Systems', Comp. Sci. Dept., Carnegie-Mellon University.
SALAPATEK P. & KESSEN W. (1966) 'Visual Scanning of Triangles by the Human Newborn', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
SAMUEL A.L. (1959) 'Machine Learning Using the Game of Checkers', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill.
SCHAFFER H.R. (1966) 'The Onset of Fear of Strangers and the Incongruity Hypothesis', W.Sluckin (Ed) 'Early Learning and Early Experience', Penguin, 1971.
SCHAFFER H.R. & EMERSON P.E. (1964) 'The Development of Social Attachments in Infancy', W.Sluckin (Ed) 'Early Learning and Early Experience', Penguin, 1971.
SCOTT J.P. (1968) 'Early Development', W.Sluckin (Ed) 'Early Learning and Early Experience', Penguin, 1971.
SELFRIDGE O.G. (1959) 'Pandemonium: A Paradigm for Learning', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
SIMON H.A. (1960) 'The New Science of Management Decision', Harper and Row, NY.
SIMON H.A. & FEIGENBAUM E.A. (1964) 'An Information Processing Theory of Some Effects of Similarity, Familiarization and Meaningfulness in Verbal Learning', J. of Verbal Learning and Verbal Behav., 3, 386-396.
SIQUELAND E.R. & LIPSITT P.L. (1966) 'Conditioned Head-Turning in Human Newborns', J. of Expt. Child Psychol., 3, 356-376.
SKINNER B.F. (1953) 'Science and Human Behaviour', Macmillan, NY.
SLUCKIN W. & SALZEN E.A. (1961) 'Imprinting and Perceptual Learning', W.Sluckin (Ed) 'Early Learning and Early Experience', Penguin, 1971.
SOKOLOV Y.N. (1963) 'Perception and the Conditioned Reflex', Oxford Pergamon Press.
SPENCE K.W. (1950) 'Cognitive vs. Stimulus-Response Theories of Learning', Psychol. Rev., Vol 57, 159-172.
SPERLING G.(1960) 'The Information Available in Brief Visual Presentations', Psychol. Monographs, 74 (Whole no.11).
SPERRY R.W. (1958) 'Physiological Plasticity', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
TAYLOR S. (1971) 'What is Involved in a Genetic Psychology', T.Mischel (Ed) 'Cognitive Development and Epistemology', Academic Press.
THOMPSON W.R. & HERON W. (1954) 'The Effects of Restricting Early Experience on the Cognitive Capacity of Dogs', W.Sluckin (Ed) 'Early Learning and Early Experience', Penguin, 1971.
TINBERGEN N. (1951) 'External Stimuli', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
TULVING E. & DONALDSON W. (1972) 'The Organisation of Memory', Academic Press.
TURING A.M. (1950) 'Can a Computer Think', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
UHR L. & VOSSLER C. (1961) 'A Pattern Recognition Program that Generates, Evaluates and Adjusts its own Operators', E.A.Feigenbaum and J.Feldman (Eds) 'Computers and Thought', McGraw-Hill, 1963.
VAN HOLST E. (1954) 'Relations Between the CNS and the Peripheral Organs', Brit. J. Animal Behav., Vol 2, 89-94.
VERNON M.D. (1962) 'The Psychology of Perception', Penguin.
VINCE M.A.(1961) 'Developmental Changes in Learning Capacity', W.Sluckin (Ed) 'Early Learning and Experience', Penguin, 1971.
WALK R.D. & GIBSON E.J.(1961) 'A Comparative and Analytical Study of Visual Depth Perception', P.C.Dodwell (Ed) 'Perceptual Learning and Adaptation', Penguin, 1970.
WATERMAN D.A. (1970) 'Generalization Learning Techniques for Automating the Learning of Heuristics', AI, 122-170
WATERMAN D.A. (1975) 'Adaptive Production Systems', Proc. of J. Int. Conf. on AI.
WILKS Y. (1971) 'Grammar, Meaning and the Machine Analysis of Natural Language', Routledge and Kegan-Paul, Boston.
WINOGRAD T. (1971) 'Procedures as a Representation for Data in a Computer Program for Understanding Natural Language' , MAC-TR 84, MIT AI Lab., Phd Thesis.
WOHLWILL J.F. (19 ) 'The Definition and Analysis of Perceptual Learning', Psychol. Rev., Vol 65, 283-295.
WOODWARD W.M. (1971) 'The Development of Behaviour', B.M.Foss (Ed) 'New Perspectives in Child Development', Penguin, 1974.
YOUNG R.M. (1973) 'Children's Seriation Behaviour: A Production System Analysis', Phd Thesis, Comp. Sci. Dept., Carnegie-Mellon University.
****************************************************************
MAIN LOOP GIVING HIERARCHY OF MAIN ROUTINES CALLED
****************************************************************
SSINMX   INPUTS STIMULI FOR NEXT TIME INTERVAL
CKLFTN   CHECKS LEFT HEAD TURN
SSCVIN   SET UP STIMULI FOR NEXT TIME INTERVAL IN SSTMBF
STINNM   GETS POSITION OF STIMULI IN STIMNM
STENTR   CHECKS IF ATTRIBUTE LIST FOR STIMULI ALREADY THERE
SSORDR   REORDERS STIMULI IN SSTMBF AND COPIES INTO SSTM
UPDRPD   CREATE AND REINFORCE PRODUCTIONS
CREATE   CREATE NEW RULES
CALSIG   UPDATE PRODUCTIONS WITH GREATEST CURRENT WORTH
REINF    REINFORCE RULES
STIMIN   MOVES STIMULI FROM SSTM TO STM
STMCHK   CHECKS TO SEE IF SSTM ENTRY IN STM
HNGACU   CHECKS STM ENTRY FOR ACUTE HUNGER
STMPN    CHECKS STM FOR PAIN
STMHNG   CHECKS STM ENTRY FOR HUNGER
STMINT   CHECKS STM ENTRY FOR INTERNALLY GENERATED
STMHAB   CHECKS FOR HABITUATED
STMNLY   SETS STM TO A SINGLE ENTRY HUNGER
STMDEL   DELETE ITEMS FROM SSTM NO LONGER REQUIRED
STMNOV   REORDERS SSTM IN ASCENDING ORDER OF NOVELTY
STMRTV   FIND STM ITEMS OF A PARTICULAR TYPE
STMSLC   SELECT SUBSET OF ITEMS FROM COMPLETE SET
STMUPD   UPDATE TAG AND AGE FIELDS IN STM
STMVIN   MOVE ITEMS FROM SSTM TO STM AND DELETE FROM SSTM
UPDPR2   GENERATE INHIBIT RULES, REWARD RULES
CHOICE   CHOOSE SET OF PRODUCTION RULES TO OBEY
CHPRSV   SAVE STM IN OTM ETC
STUPTM   SET UP TMPSAY IN ASCENDING STIMULUS ORDER
MATCH    FIND PRODUCTIONS THAT MATCH STM
CHOOSE   CHOOSE A PRODUCTION FROM SET THAT MATCHES
DLIMPS   DELETES ITEMS IN TMPSAY ALREADY MATCHED
REMTCH   FIND SET OF PRODUCTIONS THAT STILL MATCH CONDENSED STM
UPDSTM   UPDATE STM WITH INTERNALLY GENERATED STIMULI
LPDINV   UPDATE ENVIRONMENT
INCRDT   INCREMENT TIME TO NEXT TIME INTERVAL
HMIN = 15.0     HCOUNT = 64.0   HMAX = 100.0    HHIGH = 90.0    DHOUT = 0.03
DHIN = 0.6      WTHRAT = 0.05   WTHCUT = 0.01   WPRSIG = 0.1    PCOUNT = 5.0
PCNTLD = 5.0    PRDLMX = 100.0  STMCAP = 5.0    CHCAP = 3.0     STPTCY = 3
CHPTCY = 3      MSNM = 1        PRNM = 1        MSPR = 1        PRPR = 1
IPRGRT = 2      IPRLS = 1       JPRIGS = 4      JPNIGS = 4      JPNGRF = 1
JNOVUP = 1      JALCRE = 4      PRMCNS = 0.5    PSNCN1 = 2.0    PSNCN2 = 2.0
PRIGCN = 1.0    PROLD = 0.01    DECRT = 0.0     DCPRED = 0.0    NMPRED = 1
RANDOM RANDM1 RANDM2 RANDM3 RANDM4 RANDM5 RANDM6 RANDM7 RANDM8 RANDM9 RANDM0
N → ln R6 → nil R6 → nil3 R6 → nil4 R6 → nil5 R6 → nil6 R2 → nil R6 → nil7 R6 → nil8 R6 → nil9 R6 → nil10 R7 → nil12 T → l T → r R3 → nil3 R7 → nil4 R7 → nil5 R7 → nil6 R7 → nil7 R7 → nil8 F → s R7 → nil9 R7 → nil0 R8 → nil2 R8 → nil3 R8 → nil4 R4 → nil R8 → nil5 R8 → nil6 R8 → nil7 R8 → nil8 R8 → nil9 R1 → nil R8 → nil0 R9 → nil2 R9 → nil3 R9 → nil4 R9 → nil5 R5 → nil R9 → nil6 R9 → nil7 R9 → nil8 R9 → nil9 R0 → nil R0 → nil2 R0 → nil3 R0 → nil4 R0 → nil5 R0 → nil6
As in Table A1 except the following values:
DHOUT = 0.02   DHIN = 0.4
R1 → nil R6 → nil2 R6 → nil3 R6 → nil4 R6 → nil5 R2 → nil R6 → nil6 R6 → nil7 R6 → nil8 R6 → nil9 T → l T → r R3 → nil R6 → nil0 R7 → nil2 R7 → nil3 R7 → nil4 F → s R7 → nil5 R7 → nil6 R7 → nil7 R7 → nil8 R8 → nil2 R8 → nil3 N → ln R8 → nil4 R8 → nil5 R8 → nil6 R8 → nil7 R5 → nil R8 → nil8 R8 → nil9 R8 → nil0 R9 → nil2 → nil R9 → nil3 R9 → nil4 R9 → nil5 R9 → nil6
See Table B1.
R11 → nil2 R11 → nil3 R11 → nil4 R11 → nil5 R2 → nil2 R11 → nil6 R11 → nil7 R11 → nil8 R11 → nil9 TR → r TR → l TL → r TL → l R3 → nil3 R11 → nil0 R11 → nil11 R11 → nil12 R11 → nil13 R4 → nil4 R11 → nil14 R11 → nil15 R11 → nil16 R11 → nil17 R5 → nil R11 → nil18 R11 → nil19 R11 → nil20 R12 → nil2 F → s R12 → nil3 R12 → nil4 R12 → nil5 R12 → nil6 BZR → ln R12 → nil7 R12 → nil8 R12 → nil9 R12 → nil0 TNE → ln R12 → nil11 R12 → nil12 R12 → nil13 R12 → nil14 R6 → nil6 R12 → nil15 R12 → nil16 R12 → nil17 R12 → nil18 R7 → nil7 R12 → nil19 R12 → nil20 R13 → nil2 R13 → nil3 R8 → nil8 R13 → nil4 R13 → nil5 R13 → nil6 R13 → nil7 R9 → nil9 R13 → nil8 R13 → nil9 R13 → nil0 R13 → nil11 R0 → nil0 R13 → nil12 R13 → nil13 R13 → nil14 R13 → nil15 → nil R13 → nil16 R13 → nil17 R13 → nil18 R13 → nil19