
Autonomous vision-based navigation: Goal-oriented action planning by transient states prediction, cognitive map building, and sensory-motor learning

Christophe Giovannangeli

Philippe Gaussier

CNRS UMR8051 ETIS Neurocybernetic Team Cergy-Pontoise University - ENSEA 2, avenue Adolphe-Chauvin 95302 Cergy-Pontoise, France

CNRS UMR8051 ETIS Neurocybernetic Team Cergy-Pontoise University - ENSEA 2, avenue Adolphe-Chauvin 95302 Cergy-Pontoise, France Member of the Institut Universitaire de France

[email protected]

Abstract— This article presents a bio-inspired neural network providing planning capabilities in autonomous navigation applications. The proposed architecture (hippocampus model) learns, recognizes and predicts transitions between places for any system able to provide a localization gradient from the current position to each learned place. The recurrent synapses of a cognitive map (prefrontal cortex model) encode the spatiotemporal connectivity of the performed transitions. Particular transitions of interest (goal transitions) are associated with the satisfaction of drives. During planning, the diffusion of an activity from the goal transitions in the cognitive map allows the computation of a proximity gradient to the goal from each learned transition. The shortest plan of transitions to reach the goal is computed by merging the cognitive map information with the prediction of the possible transitions (nucleus accumbens model). In parallel, a sensory-motor learning between the performed transitions and the corresponding movements takes place and enables the physical execution of the proposed plan (cerebellum model). Refinements (active forgetting capabilities) are proposed for the cognitive map building. The whole system is tested on a real robot which autonomously learns a stable representation of its environment during a long random walk and proves able to return to the goal from any position in the environment.

I. INTRODUCTION

This work was supported by the Délégation Générale pour l'Armement (DGA), contract no 04 51 022 00 470 27 75, and by the Institut Universitaire de France. J.P. Banquet works in the neurocybernetic team on the neurobiological aspects of the models ([email protected]). G. Désilles is our scientific correspondent from the DGA.

This paper describes our ongoing work on the feasibility of building an autonomous sentinel robot, inspired by neurobiological data, for patrolling and exploration missions in an a priori unknown environment (with the constraint of using only local visual information and no global positioning system). It focuses on the online latent learning of a cognitive map providing our robot with behavioral strategies for planning. Since the discovery of neurons called place-cells in the hippocampus of rodents, whose activity is highly correlated with the position of the rat in its environment [31], our understanding of cognitive navigation mechanisms

[email protected]

has widely increased. Visual information is involved in taxon navigation (going back to a particular landmark) and in place recognition from distant landmarks. Biological models of vision-based navigation use the azimuths of the landmarks [10], [28], [38] or, more rarely, their identity or a conjunction of both [7], [20], [17], [5], [24]. Nevertheless, in spite of massive improvements in computing capabilities and the progress of statistical methods in the classical branches of robotics [12], [2], [37], [34], [9], autonomous navigation in various environments without an artificial localization system (GPS, bird's-eye view) remains a great challenge. Constraints in indoor and outdoor environments are so different that navigation algorithms have always been subdivided into two branches: indoor navigation vs. outdoor navigation [13]. Biomimetic navigation seems to offer a relevant approach to reconcile outdoor and indoor navigation [21], insofar as the different biomimetic models are segregated by the cognitive complexity of the task rather than by their field of application [15]. The proposed neural architecture is a functional point of view of the mammalian brain structures connected to the hippocampus, known to be involved in various cognitive capabilities, especially planning (the ablation of the hippocampus deprives rodents of the capability to plan a route in a maze; see [39] for a review). The next section presents a model of prehippocampal place-cells providing a robust level of localization, decreasing regularly over a large area, without the need for any explicit Cartesian or topological map [17], [24]. Section III details an architecture able to learn to predict transitions between places, to build a cognitive map, and to plan a route to return to a goal [8], [19], [11]. Refinements are proposed to provide the cognitive map building with active forgetting capabilities. Section IV presents an experiment on a real robot in an unknown indoor environment.
The results highlight the stability of the place learning, the latent learning of the cognitive map, the capability to plan transitions that reduce the distance to the goal, and the efficiency of the sensory-motor learning that associates the transitions with the corresponding actions. The conclusion

will open perspectives on a more general use of such an architecture for the learning of autonomous behaviors.

II. PLACE RECOGNITION AND VISUAL NAVIGATION

This section describes a mature and efficient model of prehippocampal visual place-cells, tested on several platforms to perform missions in open indoor and outdoor environments [20], [17], [21], [22].


Fig. 1. Block diagram architecture of the place recognition: Our architecture is composed of a visual system that focuses on points of interest and extracts small images in log-polar coordinates (called local views), a merging layer that compresses the what and where information into a place code, and a place recognition layer.

Our model of entorhinal place-cells is inspired by the "what and where" functional theory of the cortical connectivity downstream of the hippocampus [42]. Fig. 1 summarizes our visual processing chain. A place is defined by a spatial constellation of visual features learned online. A constellation is a set of landmark-azimuth-elevation triplets. The recognition of the landmarks provides the "what" information and models the temporal pathway: ITC (infero-temporal cortex) in humans or Pr (perirhinal cortex) in rats. The absolute position of the landmarks provides the "where" information and models the parietal pathway: Ph (parietal and/or parahippocampal cortices in mammals). The constellation is built in a merging layer which models Pr-Ph, characterizing the synaptic location of the entorhinal cortex (EC). Neither Cartesian nor topological map building is required. On the contrary, the world acts as an outside memory of the static invariants of the proposed attentional vision system [32]. Inasmuch as the learned invariants of a location persist in its neighborhood, localization is possible without map building. Coupled with a sensory-motor system, the place recognition algorithm has already proved sufficient to generate a robust navigation behavior with a trivial landmark extraction method in structured indoor environments [18], [17]. The navigation system has also been optimized for complex open outdoor environments [24], [21].

The first layer of the architecture is a visual system autonomously extracting landmarks from captured images [18], [29]. The images may either be rebuilt from a set of classical images covering the whole 360° of the available visual field, or captured by means of an omnidirectional camera (here a Vstone VS-C42N-TR) to speed up the experiments. The system can also operate without any panoramic reconstruction, as demonstrated in [21], [22]. For the sake of robustness to daylight intensity variations, a gradient image is computed from the CCD input. This gradient image is convolved with a DoG (Difference of Gaussians) filter to detect robust focus feature points at a low resolution. A competition between the feature points, inspired by biology, enables the system to primarily focus on the most activated points (based on a contrast and edge curvature criterion). A small circular image, centered on each focus point, is extracted and transformed into log-polar coordinates to enhance the pattern recognition when small rotations and scale variations occur. In the following, these small log-polar images will be called local views since they contain the visual information in the neighborhood of the focus point. Fig. 2 illustrates the selected focus points, the size of the circular area, and the log-polar local views. Merging of the what and where information (the local-view recognition and its spatial localization in the visual field) is performed in a product space (i.e., a third-order


Fig. 2. Illustration of the landmark extraction mechanism: the gradient of a panoramic image is convolved with a DoG filter. The local maxima of the filtered image correspond to points of interest (centers of the circles). Here, the first eight focus points are displayed. The system focuses on these points to extract local views in log-polar coordinates, learned as landmarks. The system also provides the bearing of the focus points by means of a magnetic compass. For each extracted local view, the identities of the four most recognized landmarks and their recognition levels are given.
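The extraction chain just described (gradient image, DoG filtering, competition between focus points, log-polar local views) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the filter widths, suppression radius, and patch geometry are assumptions.

```python
import numpy as np

def _gauss1d(sigma):
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def _blur(img, sigma):
    # Separable Gaussian convolution with reflect padding.
    k = _gauss1d(sigma)
    pad = len(k) // 2
    row = lambda v: np.convolve(np.pad(v, pad, mode='reflect'), k, 'valid')
    return np.apply_along_axis(row, 0, np.apply_along_axis(row, 1, img))

def focus_points(img, n_points=8, min_dist=10, s1=1.0, s2=2.0):
    """Gradient image -> DoG filtering -> greedy competition (non-maximum
    suppression) keeping the n most activated points, as in Fig. 2."""
    gy, gx = np.gradient(img.astype(float))
    grad = np.hypot(gx, gy)                       # robustness to illumination
    dog = _blur(grad, s1) - _blur(grad, s2)       # Difference of Gaussians
    pts, d = [], dog.copy()
    for _ in range(n_points):
        y, x = np.unravel_index(np.argmax(d), d.shape)
        pts.append((y, x))
        d[max(0, y - min_dist):y + min_dist + 1,
          max(0, x - min_dist):x + min_dist + 1] = -np.inf
    return pts

def log_polar_view(img, cy, cx, r_max=16.0, n_r=12, n_theta=16):
    """Small circular patch around a focus point, resampled in log-polar
    coordinates (tolerance to small rotations and scale changes)."""
    rs = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    ths = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    ys = np.clip((cy + np.outer(rs, np.sin(ths))).round().astype(int), 0, img.shape[0] - 1)
    xs = np.clip((cx + np.outer(rs, np.cos(ths))).round().astype(int), 0, img.shape[1] - 1)
    return img[ys, xs]   # one "local view", learnable as a landmark template
```

Each local view returned here plays the role of one landmark template; recognition can then be a simple correlation between stored and current views.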

Fig. 3. Activity of 5 × 5 place-cells regularly encoded in a working room (7 × 7 meters). The position of a graph corresponds to the place-cell encoded at the corresponding position on the map of the working room. These results verify that the system is efficient for 25 place-cells, with 60 landmarks per place-cell.

tensor compressed into a vector of product neurons; see [35] for more classical sigma-pi units), defining a tensorial place code. Here, the azimuths are provided by a magnetic compass, but we showed that a visual compass can be built by associating each landmark with its shift relative to an arbitrary visual reference [29], [23]. This visual compass can then be merged with odometrical measurements to provide an efficient drift-bounded orientation system [23]. Place-cells finally learn and recognize the current place code; the place-cell activity corresponds to a localization gradient, a monotonic function decreasing with the distance to the learned position. The learning of several locations creates overlapping place fields and also leads to a paving of the space when the learning of new locations is triggered by the detection of low place-cell activities (according to a given threshold), as illustrated by fig. 3 (in this experiment, the vigilance signal was manually controlled). Another predictable mathematical consequence of the what and where merging is the following: the shape of the place field is homothetic with the shape of the environment (i.e., the place fields extend with the distance to the landmarks). For instance, in [24], we presented place fields having a useful radius of about 25 m, which was almost the size of the environment. In [22], we showed that an outdoor loop of 200 m requires the same computational load as an indoor loop of less than 15 m. For navigation, it is first possible to associate each place-cell with a particular movement in order to create a behavioral attraction basin [20], [17], [21]. We have shown that this behavior can be learned by means of an intuitive human-robot interaction in indoor as well as in outdoor environments [22]. The next section focuses on a different navigation strategy which only requires an exploration phase and the discovery of goals to provide the robot with planning capabilities.
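The place coding described above can be sketched with a minimal model: a place is a set of landmark-azimuth pairs, a cell's activity is the average agreement between learned and currently seen azimuths (so it decays smoothly with distance, giving the localization gradient), and a new cell is recruited when no cell passes the vigilance threshold. The angular tuning width and threshold values are assumptions, not the paper's.

```python
import numpy as np

def constellation(pos, landmarks):
    """Azimuth of each visible landmark from position pos (compass frame)."""
    return {name: np.arctan2(p[1] - pos[1], p[0] - pos[0])
            for name, p in landmarks.items()}

class PlaceCells:
    """Sketch of place coding by landmark-azimuth constellations."""

    def __init__(self, vigilance=0.7, sigma=np.radians(30)):
        self.vigilance = vigilance    # recruitment threshold (assumed value)
        self.sigma = sigma            # angular tuning width (assumed value)
        self.cells = []               # each cell: dict landmark -> learned azimuth

    def activity(self, cons):
        """Localization gradient: mean azimuth agreement per learned cell."""
        acts = []
        for cell in self.cells:
            scores = []
            for lm, az in cell.items():
                if lm in cons:
                    d = np.angle(np.exp(1j * (cons[lm] - az)))   # wrapped difference
                    scores.append(np.exp(-d**2 / (2 * self.sigma**2)))
                else:
                    scores.append(0.0)
            acts.append(np.mean(scores))
        return np.array(acts)

    def update(self, cons):
        """Recruit a new place-cell when no cell is active enough."""
        a = self.activity(cons)
        if len(self.cells) == 0 or a.max() < self.vigilance:
            self.cells.append(dict(cons))
            return len(self.cells) - 1     # index of the newly recruited cell
        return int(a.argmax())
```

Moving away from a learned position perturbs the seen azimuths, so the cell's activity decreases monotonically with distance, as required by the architecture of Section III.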
The building of topological maps of places has received considerable interest for planning in robotics [36], [2], [14], [43]. The particularity of our work is to propose a single neural network standing for a model of brain functioning (from the eyes to the cerebellum, through the hippocampus and various cortical areas), which integrates the presented landmark-based place learning, a predictor of the possible transitions, the building of a cognitive map of transitions between places, and a sensory-motor learning between allothetic transitions and idiothetic actions allowing the robot to return to several goals as soon as they are discovered (notion of latent learning) [41], [8], [19].

III. TRANSITION AND COGNITIVE MAP FOR PLANNING

Based on any system providing a localization gradient for a given number of positions in the environment (GPS-based localization, metrical approaches, visual place-cell-based localization, ...), we propose in this section a neural architecture for planning using a cognitive map. The whole architecture is illustrated in fig. 4. Fed with the information of the current place (P^+, binarization of P in fig. 1) and the previous place (P^P(t) = P^+(t − dt)), a neural group (T^P) first learns to recognize transient states (here, transitions between places) and predicts all the possible events according to the current state. The two previously performed transitions are then memorized (in T^C, chronology of the successive transitions) in order to build the cognitive map (C), which simply links each performed transition to the previous one. The cognitive map is also able to associate particular transitions with the satisfaction of a drive (D). When a motivation related to a drive is activated, the corresponding neuron feeds the cognitive map. The recurrent synapses of the cognitive map (ω^CC), encoding the temporal adjacency of two transitions, enable the diffusion of a proximity gradient to the goal: the more activated a neuron is, the closer to the goal the corresponding transition is. By multiplying this gradient with the group providing all the possible transitions from the current state, it is possible to know which transition to realize in order to reach the goal (T^G). In parallel, each performed transition is associated with the movement (M^P) realized during the crossing of the departure place, allowing the robot to physically execute the proposed transition by executing the learned movement (M^T)¹.

A. Transition learning, recognition and prediction

In fig. 4, the group T^P learns, recognizes and predicts transitions between places, by means of the information of the current place P^+ and the previous place P^P(t) = P^+(t − dt). In T^P, each recruited neuron receives a single non-null synapse from P^+, corresponding to the arrival place, and a single non-null connection from P^P, corresponding to the departure place. The activity t^P_k of the k-th neuron is the following:

t^P_k(t) = [ Σ_{i=1..n_P} ω^DP_ik(t) · p^P_i(t) + Σ_{i=1..n_P} ω^AP_ik(t) · p^+_i(t) − θ^R ]^+    (1)

with n_P the number of place-cells, ω^DP_ik the weight of the connection between the k-th neuron of T^P and the i-th neuron of P^P, and ω^AP_ik the weight of the connection between the k-th neuron of T^P and the i-th neuron of P^+. After learning, ω^DP_ik = 1 if the k-th transition starts from place i and ω^DP_ik = 0 otherwise; ω^AP_ik = θ^R if the k-th transition arrives in place i and ω^AP_ik = 0 otherwise. Hence, if a neuron corresponds to the currently performed transition, its activity is 1. Otherwise, if the neuron corresponds to a possible transition, its activity is greater than 0 but less than or equal to 1 − θ^R (which is the activity of a prediction).

Initially, ω^DP_ik = 0 and ω^AP_ik = 0 for all i, k. The modification of ω^DP_ik must enable the prediction of the transition k when the index of the currently activated neuron in P^P is i. In the same way, ω^AP_ik must enable the recognition of the transition k when the index of the currently activated neuron in P^+ is i. The following one-shot Hebbian learning rule is used:

¹Notation: In the following, the elements of a vector X are noted x_k and the dimension of the vector X is noted n_X, corresponding to the number of recruited neurons in the group X.

Fig. 4. Neural architecture for planning: the system learns instantaneous temporal transitions between two successive places and predicts the possible transitions according to the current place. The cognitive map learns in a graph the temporal connection between two successive transitions and can also associate some particular transitions with the satisfaction of a drive. When the motivation for this drive is activated, the corresponding transition neuron is stimulated. Its activity is diffused through the graph and enables the prediction of the best transition according to the current place. The labels of the groups correspond to the putative anatomical structures at the origin of the model.

dω^DP_ik/dt = ε^TP(t) · R^T_k(t) · p^P_i(t)    (2)

and

dω^AP_ik/dt = ε^TP(t) · θ^R · R^T_k(t) · p^+_i(t)    (3)

with ε^TP(t) = 1 − Γ_{0+}( max_{k=1..n_T} t^P_k(t) + θ^R − 1 ), a recruitment signal spiking when the currently performed transition has not yet been learned². If the current transition is unknown (no transition has an activity greater than 1 − θ^R), a new neuron is recruited by means of R^T_k, which specifies the index of the neuron to recruit. n_T must also be incremented. Our proposition is:

R^T_k(t) = 1 − Γ_1( |k − n_T(t − dt) − 1| )    (4)

n_T(t) = n_T(t − dt) + R^T_{n_T(t−dt)+1}(t)    (5)

Since the recruitment signal ε^TP(t) depends on the group activity, the activity of the newly recruited neuron is computed after the learning. Fig. 5 illustrates the dynamics of the group T^P and shows how the group learns, recognizes and predicts transitions between places. Since the learning of new transitions is not restricted to the instants when the current place changes, both effective transitions (XY-like transitions) and reflexive transitions (XX-like transitions) are learned.

²Γ_y(x) = 1 if x ≥ y and 0 otherwise (Heaviside function). By extension, we use the following notation: Γ_{y+}(x) = 1 if x > y and 0 otherwise.
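Equations (1)-(5) can be sketched compactly as follows. This is an illustrative implementation, not the paper's code: the value of θ^R and the dense-matrix bookkeeping are assumptions.

```python
import numpy as np

THETA_R = 0.4   # recognition threshold theta^R (value assumed for this sketch)

class TransitionLayer:
    """Sketch of the group T^P: one-shot learning, recognition and
    prediction of place transitions.  w_dp[i, k] = 1 if transition k
    departs from place i; w_ap[i, k] = theta_R if it arrives in place i."""

    def __init__(self, n_places):
        self.n_places = n_places
        self.w_dp = np.zeros((n_places, 0))   # departure weights, omega^DP
        self.w_ap = np.zeros((n_places, 0))   # arrival weights, omega^AP

    def activity(self, prev_place, cur_place):
        """Eq. (1): activity 1 for the performed transition, in
        (0, 1 - theta_R] for predictions (same departure, other arrival)."""
        p_prev = np.zeros(self.n_places); p_prev[prev_place] = 1.0
        p_cur = np.zeros(self.n_places); p_cur[cur_place] = 1.0
        return np.maximum(p_prev @ self.w_dp + p_cur @ self.w_ap - THETA_R, 0.0)

    def step(self, prev_place, cur_place):
        """Recruit a new transition neuron when the performed transition is
        unknown (recruitment signal of eqs. 2-5), then return the activity."""
        t = self.activity(prev_place, cur_place)
        if t.size == 0 or t.max() <= 1.0 - THETA_R:      # unknown transition
            self.w_dp = np.column_stack([self.w_dp, np.zeros(self.n_places)])
            self.w_ap = np.column_stack([self.w_ap, np.zeros(self.n_places)])
            self.w_dp[prev_place, -1] = 1.0              # one-shot rule, eq. (2)
            self.w_ap[cur_place, -1] = THETA_R           # one-shot rule, eq. (3)
            t = self.activity(prev_place, cur_place)
        return t
```

Replaying a learned sequence makes the performed transition fire at 1 while the other transitions sharing the same departure place fire at 1 − θ^R, which is exactly the prediction level used downstream by the planning readout.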

It could have been more intuitive to only consider effective transitions, but the learning of the reflexive transitions largely increases the efficiency of the cognitive map. If the two sequences ABCDE and FGCHI have occurred, learning the effective as well as the reflexive transitions enables the diffusion of the goal gradient from E to A and to F (the sequences 'AA-AB-BB-BC-CC-CD-DD-DE-EE' and 'FF-FG-GG-GC-CC-CH-HH-HI' share the common element 'CC'), whereas learning the sole effective transitions only enables the diffusion of the gradient from E to A (the sequences 'AB-BC-CD-DE' and 'FG-GC-CH-HI' do not share any common element). Obviously, the prediction of transitions has similarities with the first predictive stage of a particle filter [6]. Indeed, in particle filters, the assumption that the hidden states are defined by a Markov chain is similar to the assumption in our architecture that an experienced transition is likely to happen again. However, our neural architecture supposes that the probability that experienced transitions happen again is equal to 1, whereas particle filters use refined values of the probabilities. It might be interesting in future work to pursue this comparison in order to refine the prediction capabilities of our architecture.

B. Chronology and cognitive map learning

Thanks to the dynamics of the group T^P, it is easy to access the chronology of the performed transitions.
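The benefit of reflexive transitions argued above can be checked with a minimal, hypothetical sketch: transitions become graph nodes, consecutive transitions become edges, and a goal activity is diffused backwards with a decay factor. The decay value and the readout function are illustrative assumptions, not the paper's equations.

```python
def transitions(seq, reflexive):
    """Transition list for a place sequence, with or without XX-like ones."""
    steps = []
    for a, b in zip(seq, seq[1:]):
        if reflexive:
            steps.append(a + a)
        steps.append(a + b)
    if reflexive:
        steps.append(seq[-1] + seq[-1])
    return steps

def build_edges(sequences, reflexive):
    """Cognitive map edges: each performed transition links to the next one."""
    edges = set()
    for seq in sequences:
        ts = transitions(seq, reflexive)
        edges |= set(zip(ts, ts[1:]))
    return edges

def diffuse(edges, goal, gamma=0.9, n_iter=50):
    """Goal proximity gradient: the closer to the goal, the higher the activity."""
    nodes = {n for e in edges for n in e}
    act = {n: 0.0 for n in nodes}
    act[goal] = 1.0
    for _ in range(n_iter):
        for a, b in edges:              # activity flows backwards: successor -> predecessor
            act[a] = max(act[a], gamma * act[b])
        act[goal] = 1.0
    return act

def proposed(predicted, act):
    """T^G-like readout: pick the predicted transition with the highest gradient."""
    return max(predicted, key=lambda tr: act.get(tr, 0.0))
```

With reflexive transitions the sequences 'ABCDE' and 'FGCHI' share the node 'CC', so the gradient diffused from 'EE' reaches 'FF'; without them the two chains stay disconnected, exactly as argued above.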


Fig. 5. Transition learning, recognition and prediction. When the current place changes, the neuron coding this transition is the most activated (activity 1). The other activated neurons (activity less than or equal to 1 − θ^R) correspond to the possible transitions starting from the place occupied at t − dt. The robot performs the sequence A..AB..BC..C twice. During the first trial (upper line), the robot learns the corresponding places and place transitions. During the second trial, the robot is able to recognize and predict all the experienced effective (XY-like) and reflexive (XX-like) transitions.

The group characterizing the chronology of the performed transitions, T^C, is a memory with a small forgetting factor. It simply memorizes the identity of the most activated neuron in T^P each time a new transition is performed. The group identifying the realized transitions is noted T^R: t^R_k = 1 if k = argmax_{i=1..n_T} t^P_i and t^R_k = 0 otherwise. The update equation of T^C is:

t^C_i(t) = max( ε^T(t) · Γ_0( t^R_i(t) − s^B ), λ_T · t^C_i(t − dt) )    (6)

with ε^T(t) a signal spiking each time a transition is realized, and s^B a threshold above which a transition is considered to have been realized (as t^R_i(t) ∈ {0, 1}, s^B < 1 is sufficient). Hence, the activity of a particular neuron is t^C_i = λ_T^k(i), with k(i) the number of time steps since the last time the i-th transition was performed. ε^T(t) spikes each time a transition is realized (i.e., when the most activated transition in T^P has just changed):

ε^T(t) = Σ_{i=1..n_T} [ t^R_i(t) − t^R_i(t − dt) ]^+    (7)

This equation provides the chronology of the performed transitions downstream of the cognitive map. To ensure that the cognitive map only associates the current transition with the previous one, all the neurons of T^C except the two most activated ones are inhibited. The update equation of the cognitive map is the following:

c_k(t) = (1 − ε^T(t)) · g^T_k(t) + ε^T(t) · t^C+_k(t)    (8)

The association in the cognitive map between the current transition and the previous one is performed each time ε^T(t) spikes, in order to reinforce the synapse between the neurons corresponding to the two last performed transitions. T^C+ is simply the binarized activity of T^C (use of a Winner Takes All, as for P and P^+). Otherwise, the cognitive map diffuses the activity of the motivation neurons:

g^T_k(t) = max( max_{i=1..n_T, i≠k} ω^CC_ik(t) · c_i(t), max_{j=1..n_D} ω^MB_jk(t) · d_j(t) )    (9)

The association between a transition and a drive (among the n_D drives) is performed as follows, using the co-activation of a drive neuron and the neuron corresponding to the last realized transition:

dω^MB_ij/dt = ε^T(t) · (1 − ω^MB_ij(t)) · d_i(t) · c_j(t)    (10)

The update equation of the recurrent synapses of the cognitive map is computed as follows:

dω^BB_ij/dt = ε^T(t) · [ ( 1 + γ_T · dR/dt − ω^BB_ij(t) ) · c_i(t) · t^C_j(t) − ω^BB_ij(t) · ( γ_0 + γ_1 · t^C_j(t) + γ_2 · c_i(t) ) ],  i ≠ j    (11)

In this equation, the update of the synapse ij uses the terms t^C_j(t) and c_i(t) when ε^T(t) spikes. Fig. 6 gives the neural scaffold implementing the proposed equations. This architecture is inspired by models of the cortical column [1], [33]. ε^T(t) controls which pathway is enabled during the learning and during the diffusion of the goal activity. The equation also considers three forgetting terms: γ_0 is a weak passive forgetting factor applied to all the connections, whereas the terms γ_1 · t^C_j(t) and γ_2 · c_i(t) correspond to stronger active forgetting modulations, whose effect is illustrated by fig. 7. Fig. 8.a shows all the learned transitions after a long random exploration of the simulated environment. By modulating the prediction of the possible transitions with the cognitive map gradient and applying a competition in the group

Fig. 6. Active and passive synapses according to the neuromodulation: panel a) corresponds to the learning and panel b) to the diffusion of the goal activity. The weight modification (of the recurrent synapses and of the synapses coming from the motivation neurons) occurs when the synapses are silent.


Fig. 7. Active forgetting: when the robot goes from a place A to a place B, all the connections linking XA to AA and all the connections linking BB to BY are subjected to active forgetting. However, the connections linking the successively performed transitions (AA-AB and AB-BB) are more reinforced than forgotten.
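The map update of eq. (11), with its passive and active forgetting terms, can be sketched as one weight-matrix step per ε^T spike. The gain and forgetting constants below are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Illustrative constants for eq. (11): gamma_T (reward gain), gamma_0
# (passive forgetting), gamma_1 and gamma_2 (active forgetting).
GAMMA_T, GAMMA_0, GAMMA_1, GAMMA_2 = 0.0, 0.001, 0.1, 0.1

def update_map(w, c, t_c, dR_dt=0.0):
    """One eps^T spike: reinforce w[i, j] between the co-active previous
    (c_i) and current (t_c_j) transitions; apply passive forgetting
    everywhere and active forgetting on the connections touching the
    currently active transitions (cf. Fig. 7)."""
    grow = (1.0 + GAMMA_T * dR_dt - w) * np.outer(c, t_c)
    decay = w * (GAMMA_0 + GAMMA_1 * t_c[np.newaxis, :] + GAMMA_2 * c[:, np.newaxis])
    w_new = w + grow - decay
    np.fill_diagonal(w_new, 0.0)        # recurrent synapses only link i != j
    return np.clip(w_new, 0.0, None)
```

Repeatedly co-activating a pair drives its weight toward the saturation value, while connections that share a departure or arrival transition with the performed one slowly decay, which is the pruning effect illustrated in fig. 7.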

Fig. 9. Behavioral dynamics of the simulated robot when the motivation is activated (blue point in the environment of fig. 8).

Fig. 8. Transitions and cognitive map after a long random walk in the simulated environment. Each transition is represented by a yellow segment between the departure and arrival places. The dashed part of the segment corresponds to the arrival place of the transition. In each place, the reflexive transition is represented by an empty circle. The left figure represents all the learned transitions. The right figure shows, for each departure place, the most activated transition in the cognitive map when the motivation related to a drive is activated (the blue point in the bottom left corner is the goal). The color and the width are correlated with the activity of the neuron corresponding to the transition in the cognitive map. The cognitive map activity corresponds to a goal proximity gradient: the closer to the goal the transition is, the more activated the corresponding neuron is.

T^G, the best transition to return to the goal is computed (see fig. 8.b).

C. Path integration and sensory-motor learning

In order to physically realize the transitions provided by T^G (the best transitions to return to a goal), the group M^T is devoted to a sensory-motor learning between each performed transition and the movement integrated during the crossing of the departure place. The performed movement, coded as a neural field by the m^P_k, is inferred by means of a path integration mechanism providing the current orientation θ^U(t) [16]:

m^P_k(t) = ( 1 + cos( 2kπ/n_M − θ^U(t) ) ) + [ m^P_k(t − dt) − m^P_max(t − dt) · ε^P(t − dt) ]^+    (12)

with n_M the number of neurons coding the actions (n_M = 61 in our applications), θ^U the orientation of the unitary movement between t − dt and t, and m^P_max(t) the activity of the most activated neuron of the group M^P. ε^P(t − dt) is a binarized signal spiking if the current and the previous locations are different:

ε^P(t) = Γ_{0+}( Σ_{i=1..n_P} [ p^+_i(t) − p^+_i(t − dt) ]^+ )

This signal enables the reset of the integrated movement. The
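The direction-integration field of eq. (12) can be sketched as follows. The population-vector readout is added purely for illustration (it is our assumption, not part of the model).

```python
import numpy as np

N_M = 61   # number of direction-coding neurons (n_M = 61 in the paper)

def integrate_movement(m_prev, theta_u, reset):
    """One step of eq. (12): add a cosine bump tuned to the current heading
    theta_u onto the previous field.  On reset (place change, eps^P spike)
    the field is shifted down by its previous maximum, so integration
    restarts in the new place."""
    prefs = 2.0 * np.pi * np.arange(N_M) / N_M        # preferred directions
    bump = 1.0 + np.cos(prefs - theta_u)
    carry = np.maximum(m_prev - m_prev.max() * reset, 0.0)
    return bump + carry

def decoded_direction(m):
    """Population-vector readout of the mean movement (illustrative only)."""
    prefs = 2.0 * np.pi * np.arange(N_M) / N_M
    return np.angle(np.sum(m * np.exp(1j * prefs)))
```

Integrating several unitary movements accumulates their bumps, so the decoded angle is the mean heading during the crossing of the place; a reset spike wipes the field clean for the next place.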

update of the motor neurons according to the most activated transition to realize is computed as follows:

m^T_k(t) = Σ_{i=1..n_T} ω^TM_ik · t^G_i    (13)

The sensory-motor learning follows a Hebbian learning rule:

dω^TM_ik/dt = ε^TA · t^G_i · m^P_k · ε^P(t)    (14)

with ε^TA the learning speed. Fig. 9 shows the trajectories of the robot, for many starting points, after a long exploration and the discovery of a goal (blue circle in fig. 8). The experiment validates the approach in theoretical conditions (non-ambiguous landmarks): all the trajectories converge toward the goal.

IV. EXPERIMENTS IN A REAL ENVIRONMENT

In this section, we propose experiments in a real environment to show the adaptability of our approach. As in the simulated environment, the robot performs a pseudo-random walk for 45 min. The robot keeps the same orientation during the crossing of a place-cell, and can only head in another direction when it enters a new place-cell. The environment is bounded by small walls which are not visible to the robot camera (see fig. 10.b). The learning of a new place is triggered when the activity of all the place-cells is under a given threshold. Fig. 10.a shows the trajectory of the robot during the random walk, which lasted 45 min, corresponding to 3000 iterations of the architecture. The robot does not move at full speed because the tracking software runs in parallel. Fig. 10.b shows the positions where the place-cells were learned. The learned transitions are characterized by a segment between the departure place and the arrival place (the dashed side of the segment indicates the arrival place). Of course, the robot never accesses the information of the learning positions nor the identity of the places defining the transitions. The place-cell and cognitive map learning remain stable during the random walk.


Fig. 10. a) Trajectory of the robot during the random walk of 42 min (3000 running steps of the neural network). b) Graph of the learned transitions after the random walk in fig. 10.a. The spatial learning remains stable.

Fig. 11. Homing trajectories toward the goal, discovered at the end of the exploration (fig. 10.a). The cognitive map has been built latently and is directly usable as soon as the goal is discovered. The robot is able to plan its actions and to return to the goal from all the positions in the environment.

Finally, the architecture is evaluated (qualitative analysis). At the end of the exploration, the robot is in the upper right corner of the environment, where the discovery of a goal satisfying a drive is simulated. Then, the drive is activated. The robot is placed at different positions and the trajectories are recorded. Fig. 11 shows these trajectories and demonstrates that the robot is able to return to the goal from each position in the explored area. The latent learning performed during the exploration enables the robot to correctly select the transitions which get it closer to the goal. The sensory-motor learning enables the physical execution of these transitions. The robot is able to plan its way back to the goal.

For a quantitative analysis of the results, the trajectories should be compared with the direct path toward the goal. But this paper primarily aims to show, for the first time, that our system is able to reproduce in a real environment the results obtained in simulation. Obviously, in future work, it will be important to use quantitative analysis in order to compare our algorithm with classical probabilistic methods. Nevertheless, our algorithm also has some built-in properties which rarely exist in classical probabilistic methods, such as the adaptability to the size of the environment (the place field shape is homothetic with the distance to the landmarks). In any case, insofar as social rather than industrial robotics is concerned, we claim that the precision of the path is far from being the most important criterion, compared to criteria such as the robustness of the behavior in changing environments, the rapidity of knowledge acquisition [22], the size of the internal representation [40], the capability to deal with correspondence problems [4], or the smoothness of the computed path [30]. For a long time, most approaches have striven to keep a constant precision all along the mission, whereas very accurate positioning is in fact only needed in critical phases. Intelligence is not in optimality; intelligence is in adaptability and robustness.

V. CONCLUSIONS AND PERSPECTIVES

We proposed a neural architecture providing a mobile robot with planning capabilities. Experiments in real unknown indoor environments highlighted the stability of the system and demonstrated the planning capabilities of the robot. However, longer experiments in wider and less controlled environments (changing environments, outdoor environments with changing illumination, presence of unexpected obstacles, etc.) have to be undertaken to firmly validate the approach and generalize its use in robots evolving in an everyday human world. This architecture is inspired by our ongoing work on understanding the role of the mammalian brain structures. The recently discovered grid cells [25] are explained in our latest model [16] as the manifestation of a long-distance path-integration computation in the parietal cortices, running in parallel with a short-distance path-integration mechanism in the hippocampus. The grid-cell activity makes it possible to disambiguate similar visual situations according to the path-integration system, bridging the gap between our topological approach and the metrical approaches proposed in classical robotics. The closing of the hippocampal loop proposed in [16] gives access to anticipation capabilities that can be seen as complementary to the reactive behavior. Indeed, whereas classical sensory-motor architectures only allow performing a goal-oriented action according to the current state (the predictions tell what to do), the closing of the hippocampal loop focuses on anticipating the effects of an action on the current sensory state and on using this knowledge to disambiguate similar visual states (the predictions tell what to see). Moreover, the general architecture is not exclusively devoted to solving navigation problems: our more general model is able to perform complex temporal sequence learning of robot arm movements [3], [26].
We are also currently investigating how temporal sequence learning and spatial learning could be merged into a single control architecture that would remain efficient even if the visual modality becomes temporarily or spatially unreliable (in darkness, for example) [27].
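The planning scheme at the heart of this work — diffusing an activity from the goal through the cognitive map so that each learned state carries a proximity gradient to the goal, then selecting the predicted transition with the highest diffused activity — can be sketched as follows. This is a minimal illustration under our own assumptions (explicit graph representation, decay value, function names), not the authors' neural implementation, which operates on transition cells rather than on places:

```python
# Illustrative sketch of cognitive-map planning by activity diffusion.
# Transitions are edges between learned places; the goal activity spreads
# backwards through the map with a decay factor, so each place's activity
# encodes its proximity to the goal.

def diffuse_goal_activity(transitions, goal, decay=0.9, n_iter=50):
    """transitions: dict mapping each place to the set of reachable places."""
    activity = {place: 0.0 for place in transitions}
    activity[goal] = 1.0  # the goal is the source of the diffused activity
    for _ in range(n_iter):
        for place, successors in transitions.items():
            if place == goal:
                continue  # keep the goal activity clamped at its maximum
            # a place inherits a decayed copy of its best successor's activity
            best = max((activity[s] for s in successors), default=0.0)
            activity[place] = decay * best
    return activity

def select_transition(current, transitions, activity):
    """Among the transitions predicted from the current place,
    pick the one leading closest to the goal."""
    return max(transitions[current], key=lambda s: activity[s])

# Toy map: A -> B -> C -> D (goal), with a shortcut transition A -> C.
cog_map = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}, "D": set()}
act = diffuse_goal_activity(cog_map, goal="D")
print(select_transition("A", cog_map, act))  # "C": the shortcut is closer to D
```

In the toy map, place A can reach C directly, so the shortcut transition receives the higher diffused activity (0.9 versus 0.81 for B with a decay of 0.9) and is selected, mirroring the shortest-plan selection described in the article.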

A video of the experiment of fig. 11 is available on the authors' webpages.

REFERENCES

[1] F. Alexandre, Y. Burnod, F. Guyot, and J.-P. Haton. La colonne corticale, unité de base pour des réseaux multicouche. C. R. Acad. Sci. Paris, 309(III):259–264, 1989.
[2] H. Andreasson and T. Duckett. Topological localization for mobile robots using omni-directional vision and local features. In Proc. IAV 2004, the 5th IFAC Symposium on Intelligent Autonomous Vehicles, Lisbon, Portugal, 2004.
[3] P. Andry, Ph. Gaussier, J. Nadel, and B. Hirsbrunner. Learning invariant sensorimotor behaviors: A developmental approach to imitation mechanisms. Adaptive Behavior, 12(2):117–138, October 2004.
[4] A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer. Real-time visual loop-closure detection. In 2008 IEEE International Conference on Robotics and Automation (ICRA 2008), 2008.
[5] A. A. Argyros, C. Bekris, S. C. Orphanoudakis, and L. E. Kavraki. Robot homing by exploiting panoramic vision. Autonomous Robots, 19(1):7–25, 2005.
[6] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188, 2002.
[7] I. A. Bachelder and A. M. Waxman. Mobile robot visual mapping and localization: A view-based neurocomputational architecture that emulates hippocampal place learning. Neural Networks, 7(6/7):1083–1099, 1994.
[8] J.-P. Banquet, P. Gaussier, J.-C. Dreher, C. Joulain, and A. Revel. Space-time, order, and hierarchy in fronto-hippocampal system: A neural basis of personality. In G. Matthews, editor, Cognitive Science Perspectives on Personality and Emotion, volume 124, pages 123–189, Amsterdam, 1997. North Holland.
[9] R. Benenson, S. Petti, T. Fraichard, and M. Parent. Toward urban driverless vehicles. International Journal of Vehicle Autonomous Systems, 2006.
[10] B. A. Cartwright and T. S. Collett. Landmark learning in bees. Journal of Comparative Physiology, 151:521–543, 1983.
[11] N. Cuperlier, M. Quoy, C. Giovannangeli, Ph. Gaussier, and Ph. Laroque. Transition cells for navigation in an unknown environment. In The Society for Adaptive Behavior SAB'2006, pages 286–297, Roma, Italy, 2006.
[12] A. J. Davison and N. Kita. Sequential localisation and map-building for real-time computer vision and robotics. Robotics and Autonomous Systems, 36:171–183, 2001.
[13] G. N. DeSouza and A. C. Kak. Vision for mobile robot navigation: A survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(2):237–267, 2002.
[14] D. Filliat and J.-A. Meyer. Global localization and topological map-learning for robot navigation. In Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior, pages 131–140, 2002.
[15] M. O. Franz and H. A. Mallot. Biomimetic robot navigation. Robotics and Autonomous Systems, 30:133–153, 2000.
[16] P. Gaussier, J.-P. Banquet, F. Sargolini, C. Giovannangeli, E. Save, and B. Poucet. A model of grid cells involving extra hippocampal path integration and the hippocampal loop. Journal of Integrative Neuroscience, 6:447–476, 2007.
[17] P. Gaussier, C. Joulain, J.-P. Banquet, S. Leprêtre, and A. Revel. The visual homing problem: An example of robotics/biology cross fertilization. Robotics and Autonomous Systems, 30:155–180, 2000.
[18] P. Gaussier, C. Joulain, S. Zrehen, J.-P. Banquet, and A. Revel. Visual navigation in an open environment without map. In Int. Conf. on Intelligent Robots and Systems - IROS'97, pages 545–550, Grenoble, France, September 1997. IEEE/RSJ.
[19] P. Gaussier, A. Revel, J.-P. Banquet, and V. Babeau. From view cells and place cells to cognitive map learning: Processing stages of the hippocampal system. Biological Cybernetics, 86:15–28, 2002.
[20] P. Gaussier and S. Zrehen. PerAc: A neural architecture to control artificial animals. Robotics and Autonomous Systems, 16(2-4):291–320, December 1995.
[21] C. Giovannangeli, P. Gaussier, and G. Désilles. Robust mapless outdoor vision-based navigation. In Proc. of the 2006 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2006), pages 3293–3300, Beijing, China, 2006.
[22] C. Giovannangeli and Ph. Gaussier. Human-robot interactions as a cognitive catalyst for the learning of behavioral attractors. In 16th IEEE International Symposium on Robot and Human Interactive Communication 2007, pages 1028–1033, Jeju, South Korea, 2007.
[23] C. Giovannangeli and Ph. Gaussier. Orientation system in robots: Merging allothetic and idiothetic estimations. In Proc. of the 13th Int. Conf. on Advanced Robotics, pages 349–354, Jeju, South Korea, 2007.
[24] C. Giovannangeli, Ph. Gaussier, and J.-P. Banquet. Robustness of visual place cells in dynamic indoor and outdoor environment. International Journal of Advanced Robotic Systems, 3(2):115–124, June 2006.
[25] T. Hafting, M. Fyhn, S. Molden, M. B. Moser, and E. I. Moser. Microstructure of a spatial map in the entorhinal cortex. Nature, 436:801–806, 2005.
[26] M. Lagarde, P. Andry, and Ph. Gaussier. The role of internal oscillators for the one-shot learning of complex temporal sequences. In Artificial Neural Networks – ICANN 2007, volume 4668 of LNCS, pages 934–943. Springer, 2007.
[27] M. Lagarde, P. Andry, Ph. Gaussier, and C. Giovannangeli. Learning new behaviors: Toward a control architecture merging spatial and temporal modalities. In Workshop on Interactive Robot Learning - International Conference on Robotics: Science and Systems (RSS 2008), 2008.
[28] D. Lambrinos, R. Möller, T. Labhart, R. Pfeifer, and R. Wehner. A mobile robot employing insect strategies for navigation. Robotics and Autonomous Systems, 30:39–64, 2000.
[29] S. Leprêtre, P. Gaussier, and J.-P. Cocquerez. From navigation to active object recognition. In The Sixth Int. Conf. on Simulation of Adaptive Behavior SAB'2000, pages 266–275, Paris, 2000. MIT Press.
[30] E. Magid, D. Keren, E. Rivlin, and I. Yavneh. Spline-based robot navigation. In Proc. of the IEEE/RSJ International Conf. on Intelligent Robots and Systems (IROS 2006), 2006.
[31] J. O'Keefe and L. Nadel. The Hippocampus as a Cognitive Map. Clarendon Press, Oxford, 1978.
[32] J. K. O'Regan and A. Noë. A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5):939–1011, 2001.
[33] N. Rougier. Modèles de Mémoires pour la Navigation Autonome. PhD thesis, Université de Nancy 1, Nancy, France, 2000.
[34] E. Royer, J. Bom, M. Dhome, B. Thuillot, M. Lhuillier, and F. Marmoiton. Outdoor autonomous navigation using monocular vision. In Proc. of the 2005 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'05), pages 3395–3400, 2005.
[35] D. E. Rumelhart and D. Zipser. Feature discovery by competitive learning. Cognitive Science, 9:75–112, 1985.
[36] B. Schölkopf and H. A. Mallot. View-based cognitive mapping and path-finding. Adaptive Behavior, 3:311–348, 1995.
[37] R. Sim, P. Elinas, M. Griffin, and J. J. Little. Vision-based SLAM using the Rao-Blackwellised particle filter. In IJCAI Workshop on Reasoning with Uncertainty in Robotics (RUR), pages 9–16, 2005.
[38] L. Smith, A. Philippides, P. Graham, B. Baddeley, and P. Husbands. Linked local navigation for visual route guidance. Adaptive Behavior, 15(3):257–271, 2007.
[39] L. R. Squire. Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99:143–145, 1992.
[40] S. Thrun. Robotic mapping: A survey. In G. Lakemeyer and B. Nebel, editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, 2002.
[41] E. C. Tolman. Cognitive maps in rats and men. The Psychological Review, 55(4):189–208, 1948.
[42] L. G. Ungerleider and M. Mishkin. Two cortical visual systems. In D. J. Ingle, M. A. Goodale, and R. J. W. Mansfield, editors, Analysis of Visual Behavior, pages 549–586. MIT Press, 1982.
[43] H. Voicu and N. Schmajuk. Latent learning, shortcuts and detours: A computational model. Behavioral Processes, 59:67–86, 2002.