The Ventral Subiculum-VTA Loop: The Key Area of Reinforcement Learning and Memory Detection for a Rewarding Stimulus – A Personal Viewpoint


Introduction
The ventral subiculum region of the hippocampus (vSUB) is involved in a wide range of cognitive and neurological processes. These include reinforcement learning, sensorimotor integration, context-reward associations, and addictive behaviors [1][2][3][4][5]; stress, anxiety, fear, schizophrenia, psychosis, and Parkinson's disease [2]; sleep [6]; regulation of social behavior [7]; and other neurodegenerative diseases such as Alzheimer's disease [8]. Dopamine (DA) neurons of the midbrain, in both the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc), are critically important in reinforcement learning and in the processing of reward-related information [3,9]. Because of their unique, context-dependent firing in response to the presence and absence of both expected and unexpected rewards, these neurons are among the most widely investigated cells in the CNS. By exploiting this unique behavior, researchers have attracted substantial research funding to investigate some of the most active topics in neuroscience, including reinforcement learning, reward processing and the prediction of reward value, addiction and substances of abuse, and the emerging field of neuroeconomics [10][11][12][13][14][15][16].
The VTA and the SNc of the midbrain contain the cell bodies of DAergic (DA-releasing) neurons [17]. Dopamine neurons from each nucleus send differential projections to, and form synapses with, several other cortical and subcortical areas [17][18][19]. The density of these projections varies greatly across targets [18,19]: some brain areas receive dense synaptic innervation, whereas others receive very few fibers. The accumbal region of the striatum (NAc) and the prefrontal cortex (PFC) are among the areas with the heaviest projections from both nuclei, whereas the hippocampus, including the vSUB, receives the fewest DAergic projections, only 15 to 18% of the total mesolimbic DA population. Most of these putatively DAergic projection neurons originate in the VTA rather than in the anteromedial portion of the SNc [17,20]. The literature also describes a ventromedial-dorsolateral segregation of midbrain DA neurons, both in their anatomical distribution and in the nature of their projections, which is argued to influence the likelihood that drugs such as psychostimulants become addictive (reinforcing). For instance, intracranial administration of cocaine or amphetamine produces pronounced addictive behaviors when delivered into the ventromedial VTA, as opposed to when it is applied on the dorsolateral side [20].
It can therefore be speculated that these anatomical disparities in the proportion and nature of mesolimbic DA projections have created an inherent bias in the research interests of neuroscientists, who spend more funding on the mesostriatal DA pathway than on the VTA-hippocampus pathway. A simple literature search illustrates this bias. In the HighWire Press journal database, sponsored by Stanford University, a search for articles containing all of the words "Dopamine, Reinforcement Learning, Reward Processing" anywhere in the text and "Hippocampus" in the title, over the period Jan. 1917 to Oct. 2017, returns only 17 articles. The same query with "ventral tegmental area" in the title returns 31 articles, and with "Nucleus Accumbens" 118 articles. Although article counts are only a rough proxy for research effort, this suggests that studies of reinforcement learning and reward processing focus nearly seven times more on the NAc, and nearly twice as much on the VTA, as on the hippocampus.
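The comparison above is simple arithmetic on the three reported hit counts. The short sketch below recomputes the ratios relative to the hippocampus query; the dictionary and function names are illustrative only, and the counts are exactly those quoted in the text.

```python
# Article counts reported in the text for the HighWire Press query
# ("Dopamine, Reinforcement Learning, Reward Processing" anywhere in the text,
# title term varied, Jan. 1917 to Oct. 2017).
hits_by_title_term = {
    "Hippocampus": 17,
    "ventral tegmental area": 31,
    "Nucleus Accumbens": 118,
}

def ratio_to_reference(counts, reference):
    """Divide every count by the count for the reference title term."""
    base = counts[reference]
    return {term: n / base for term, n in counts.items()}

ratios = ratio_to_reference(hits_by_title_term, "Hippocampus")
for term, r in ratios.items():
    print(f"{term}: {r:.2f}x the hippocampus count")
```

The ratios come out at about 6.94 for the NAc and 1.82 for the VTA, which is why "nearly seven times" and "nearly twice" are the accurate descriptions.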
Nevertheless, sparse anatomical distribution and weaker synaptic innervation do not imply functional insignificance. Sparse input could instead improve the quality of the incoming signal by separating signal from noise: fine-tuning the required information and filtering out unnecessary and redundant inputs so that only the most essential information is retained in the long-term memory compartments of the hippocampus. Indeed, despite the sparse innervation of the vSUB by VTA DA neurons, research shows that the hippocampus-VTA loop is the gateway that encodes the entry of novel information from the outside world into the CNS before this information is stored as long-term memory in the hippocampus [21]. Recent fMRI work in human subjects also showed differential, convergent connectivity among the hippocampus, the NAc, and the VTA, centered on the ventral hippocampus [18,19]. In the top-down arm of the loop, newly arrived information travels from the vSUB to the NAc, then to the ventral pallidum, and finally to the VTA. When this new information arrives, VTA DA neurons are activated and show the novelty-dependent burst firing described above [21]. In the bottom-up arm, DA is released within the hippocampus; once the synapse is sufficiently strengthened, this supports the maintenance of long-term potentiation (LTP) and enhances learning and memory for the information that entered the CNS via the top-down pathway [21].
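The directionality of the two arms can be made explicit with a deliberately simplified graph. In this sketch the region labels are shorthand, the hippocampal target of the bottom-up arm is collapsed onto the vSUB, and the `lesion` helper is a hypothetical illustration of removing one relay; none of this is a model used by the author.

```python
from collections import deque

# Hypothetical, simplified wiring of the hippocampus-VTA loop as described in the
# text: top-down arm vSUB -> NAc -> ventral pallidum (VP) -> VTA, and bottom-up arm
# VTA -> hippocampus (collapsed here onto the vSUB node).
LOOP = {
    "vSUB": ["NAc"],
    "NAc": ["VP"],
    "VP": ["VTA"],
    "VTA": ["vSUB"],
}

def trace(graph, start, goal):
    """Breadth-first search; return the list of regions from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def lesion(graph, region):
    """Return a copy of the graph with one region and all of its connections removed."""
    return {src: [dst for dst in dsts if dst != region]
            for src, dsts in graph.items() if src != region}

top_down = trace(LOOP, "vSUB", "VTA")
bottom_up = trace(LOOP, "VTA", "vSUB")
after_nac_lesion = trace(lesion(LOOP, "NAc"), "vSUB", "VTA")
print(top_down, bottom_up, after_nac_lesion)
```

In this toy wiring, removing any single relay of the top-down arm leaves the VTA unreachable from the vSUB, which is the kind of disconnection the open questions later in the text are concerned with.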
Neurophysiological and neuropharmacological experiments have studied the electrophysiological firing properties of DA neurons extensively, both in vivo and in vitro. Using cellular and molecular assays at the neuronal level (e.g., in fresh brain slices) as well as behavioral assays at the organism level (e.g., firing correlates recorded from awake behaving and/or anesthetized subjects), these experiments have consistently shown that the firing patterns of DA neurons are unique and depend on the contingencies of the environment and associated cues. Decades of work from the laboratory of Wolfram Schultz and colleagues [3] have shown that midbrain DA neurons reliably encode "surprises": they fire a phasic burst of action potentials within 0.5 seconds of the arrival of a novel, unconditioned reward. If, however, the animal is conditioned so that an environmental cue (e.g., a light, a tone, or a bell) reliably precedes reward delivery by roughly 1 second, the phasic burst shifts to the cue, occurring within 0.5 seconds of cue onset, while firing at the time of the now-predicted reward remains at baseline [15]. The firing pattern changes dramatically, however, when the expected reward is withheld from its expected time and/or place.
The neurons report the omission of the expected reward with a brief depression of firing, either complete silence or a marked slowing, known as the pause. Provided all other environmental contingencies remain intact, the pause begins immediately after the learned time of reward delivery and lasts less than 0.5 seconds [15]. Because of this shift in the firing behavior of mesolimbic DA neurons, I prefer to nickname them the "surprise neurons".
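The three signatures described above (a burst to an unexpected reward, a transfer of the burst to a predictive cue, and a pause when a predicted reward is omitted) are commonly summarized by the temporal-difference (TD) reward-prediction-error model. The sketch below is that textbook model, not the author's method; the learning rate `alpha`, the discount `gamma`, the 5-step cue-reward delay, and the "one state per time step after the cue" representation are all assumptions chosen for illustration.

```python
import numpy as np

def run_trial(V, reward, alpha=0.2, gamma=1.0, learn=True):
    """Run one conditioning trial and return (cue_error, errors).

    V[t] is the learned value of the state t steps after cue onset; the reward
    (if any) arrives at the last step, D = len(V) - 1, and the trial then ends.
    """
    D = len(V) - 1
    # Assumption: the pre-cue baseline carries no prediction (value 0) because cue
    # onset is unpredictable, so the error at the cue is the value of the cue state.
    cue_error = gamma * V[0]
    errors = np.zeros(len(V))
    for t in range(len(V)):
        r = reward if t == D else 0.0
        v_next = V[t + 1] if t < D else 0.0  # terminal state (value 0) after reward
        delta = r + gamma * v_next - V[t]    # TD prediction error ("DA signal")
        errors[t] = delta
        if learn:
            V[t] += alpha * delta
    return cue_error, errors

V = np.zeros(6)                                   # cue at step 0, reward at step 5
first_cue, first = run_trial(V, reward=1.0)       # naive animal: unexpected reward
for _ in range(500):                              # conditioning: cue predicts reward
    run_trial(V, reward=1.0)
trained_cue, trained = run_trial(V, reward=1.0, learn=False)  # predicted reward
omit_cue, omitted = run_trial(V, reward=0.0, learn=False)     # reward withheld

print("naive:   cue %+.2f  reward %+.2f" % (first_cue, first[-1]))
print("trained: cue %+.2f  reward %+.2f" % (trained_cue, trained[-1]))
print("omitted: cue %+.2f  reward %+.2f" % (omit_cue, omitted[-1]))
```

In the naive trial the positive error sits at the reward; after conditioning it has moved to the cue and is close to zero at the predicted reward; and on an omission trial the error at the expected reward time is strongly negative, the model's counterpart of the pause.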
The pause of DA neurons is also evident in response to aversive or painful physical stimuli (e.g., an air puff to the hand, hypertonic saline applied to the mouth, or a brief pinch to the tail), even in subjects under deep anesthesia [22,23]. This suggests that these neurons reliably discriminate among appetitive, neutral, and aversive stimuli, as long as the stimuli are sufficiently distinct in physical magnitude and timing [24]. Consistent with this literature, electrophysiological recordings from freshly prepared brain slices have replicated these unique properties of DA neurons [25,26]. In this preparation, both the phasic burst and the "pause" (hyperpolarization) can be triggered pharmacologically once the membrane is sufficiently depolarized. Activation of glutamate-sensitive AMPA receptors, followed by NMDA receptors, evokes burst firing of DA neurons both in vivo and in vitro [27], whereas the pause, described in slices as a hyperpolarization, requires Ca2+ release from intracellular stores (such as the endoplasmic reticulum) through a signaling cascade that activates IP3 receptors.
This cascade is initiated by activation of the metabotropic glutamate receptor subtype 1 (mGluR1), which leads to IP3 receptor activation and, through the resulting Ca2+ release, to opening of the small-conductance Ca2+-activated K+ channel (SK) [26][27][28]. Despite these well-established facts about the unique firing properties of midbrain DA neurons, the proposed role of the vSUB in controlling this behavior, and the dorso-ventral anatomical disparities [29] in the distribution and projections of these neurons, there is no clear-cut evidence on how the firing patterns of DA neurons are altered when, for instance, the gateway for the entry of novel information is lesioned, or when the upper arm of the vSUB-VTA loop degenerates in a neurodegenerative disease. How do VTA DA neurons behave when they are deprived of novelty information, either by blocking the top-down vSUB-VTA pathway or by disrupting the synaptic connections of the vSUB through damage to local interneurons? And how does this blockade affect, for instance, the acquisition and storage of memories for newly arriving information, reinforcement learning, reward processing, and addiction and associated behaviors?
Research is gradually shifting toward circuit-dependent learning plasticity, in an effort to identify the neural substrates of learning and memory at their earliest stage, acquisition; this shift has provided new opportunities to re-investigate the vSUB and its connections with the limbic system. Regarding the role of the vSUB in addiction, a study by Bossert and Stern [1] showed that cocaine-seeking behavior was initiated by stimulating the vSUB with high-frequency electrical stimulation (HFS: 500 pulses at 400 Hz, with a pulse duration of 250 µs) [1]. This HFS releases glutamate in the VTA, activates VTA DA neurons in vivo, and triggers an NMDA-dependent, long-lasting potentiation of synaptic transmission ex vivo [30,31]. Researchers have also shown that memory retrieval and consolidation are coordinated at the junction of the vSUB and the CA1 region of the ventral hippocampus [32][33][34]. Furthermore, it has recently been shown in mice and human subjects that memory retrieval involves a fast, coordinated interplay between sparse, distributed cortico-hippocampal and neocortical networks [33]. To test behaviorally whether reinforcement learning is under the control of the ventral hippocampus, I investigated the circuit in a somewhat complex rat model of intracranial METH-induced conditioned place preference (CPP).
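The HFS protocol quoted above implies a train duration, an inter-pulse interval, and a duty cycle that are not stated explicitly. The sketch below derives them from the three quoted parameters only; the function name is hypothetical, and it assumes a regular train with equal onset-to-onset spacing.

```python
def pulse_train_timing(n_pulses, freq_hz, pulse_width_s):
    """Return (train duration s, inter-pulse interval s, duty cycle) of a regular train."""
    interval_s = 1.0 / freq_hz               # onset-to-onset time between pulses
    duration_s = n_pulses * interval_s       # approximate train length, one interval per pulse
    duty_cycle = pulse_width_s / interval_s  # fraction of each interval the current is on
    return duration_s, interval_s, duty_cycle

# HFS parameters as quoted in the text: 500 pulses at 400 Hz, 250 us per pulse.
duration, interval, duty = pulse_train_timing(500, 400.0, 250e-6)
print(f"duration {duration:.2f} s, interval {interval * 1e3:.1f} ms, duty cycle {duty:.0%}")
# duration 1.25 s, interval 2.5 ms, duty cycle 10%
```

Strictly, the last pulse starts at (n - 1)/f, about 1.2475 s; counting one full interval per pulse gives the rounder 1.25 s figure.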
Consistent with some of these electrophysiological findings, and with Donald Hebb's theory of learning plasticity and the organization of behavior, we found that repeated intracranial conditioning with METH, applied by reverse microdialysis to the top-down (upper) arm of the hippocampus-VTA loop in the sequence ventral hippocampus (including the vSUB), then VTA, then NAc, produced a long-lasting conditioned place aversion to METH. In contrast, the same dose of METH produced a long-lasting place preference when the sequence began with the bottom-up (lower) arm, in the order VTA, ventral hippocampus, NAc [35]. Furthermore, consistent with the literature, the place preference was more pronounced when METH was applied to the NAc than to the other two regions. Keleta and Martinez [35] showed that the aversion depended on NMDA receptor activation, because co-administering MK801 with METH (1:1 concentration ratio) abolished the place aversion [35]. This led us to speculate that the acquisition stage of reinforcement learning is under the direct control of the vSUB. More importantly, low-frequency chemical stimulation of the vSUB with METH may increase the likelihood of D2 receptor activation, which in turn opens a time window for the expression of LTD in VTA DA neurons and for Hebbian-type plasticity of the vSUB-VTA circuit as a whole.
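The text reports place preference and place aversion without defining the score. A common convention, assumed here and not necessarily the one used in [35], is the change in time spent in the drug-paired compartment from the pre-conditioning test to the post-conditioning test. The sketch uses hypothetical helper names and made-up times that are not data from the study.

```python
def cpp_score(pre_test_s, post_test_s):
    """Change in time (s) spent in the drug-paired compartment, post-test minus pre-test.

    Positive values indicate place preference; negative values indicate place aversion.
    """
    return post_test_s - pre_test_s

def classify(score_s, tolerance_s=0.0):
    """Label a CPP score; tolerance_s is a hypothetical dead band around zero."""
    if score_s > tolerance_s:
        return "preference"
    if score_s < -tolerance_s:
        return "aversion"
    return "no effect"

# Hypothetical times (s) for illustration only; these are not data from [35].
print(classify(cpp_score(pre_test_s=300, post_test_s=420)))  # preference
print(classify(cpp_score(pre_test_s=300, post_test_s=190)))  # aversion
```

Under this convention, the MK801 result would appear as a score that no longer falls below zero.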
Thus, on the basis of past and current literature, it can be hypothesized that repeated low-frequency electrical stimulation, or repeated pharmacological inhibition of VTA DA neurons, may help reduce or alleviate drug-seeking behavior through a cellular mechanism involving D2 receptor activation. Because D2 receptors have a high affinity for DA (and its agonists), their repeated activation at vSUB-VTA synapses (in the top-down arm) could induce LTD in VTA neurons, which in the long run would result in Hebbian-type synaptic strengthening of the vSUB-VTA circuit to encode aversive information related to drug-seeking behavior. In this review I take the view that the story behind the dopamine hypothesis of reinforcement learning is incomplete without an intact, functioning vSUB-VTA loop. The loop encodes and modulates information entering the CNS on the basis of the novelty, frequency, and strength of the stimulus before it is stored in the long-term memory compartments of the hippocampus.
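The idea that weak, low-frequency drive depresses a synapse while strong, high-frequency drive potentiates it is captured by a standard textbook model, the Bienenstock-Cooper-Munro (BCM) rule. The sketch below swaps in that rule purely as an illustration: it does not model the D2 receptor mechanism proposed here, it holds the threshold `theta` fixed (the full BCM rule lets it slide with activity), and the values of `theta` and `eta` are hypothetical.

```python
def bcm_dw(pre_rate, post_rate, theta, eta):
    """BCM-style update: dw = eta * pre * post * (post - theta).

    Postsynaptic activity below theta depresses the synapse (LTD-like);
    activity above theta potentiates it (LTP-like).
    """
    return eta * pre_rate * post_rate * (post_rate - theta)

def repeated_stimulation(pre_rate_hz, n_trials=10, w=1.0, theta=5.0, eta=1e-5):
    """Apply the same presynaptic rate repeatedly to a linear unit (post = w * pre).

    theta (Hz) and eta are hypothetical values chosen only for illustration.
    """
    for _ in range(n_trials):
        post = w * pre_rate_hz
        w = max(0.0, w + bcm_dw(pre_rate_hz, post, theta, eta))  # weights stay non-negative
    return w

w_low = repeated_stimulation(pre_rate_hz=1.0)    # weak, low-frequency drive
w_high = repeated_stimulation(pre_rate_hz=20.0)  # strong, high-frequency drive
print(f"low-frequency: w = {w_low:.4f}; high-frequency: w = {w_high:.4f}")
```

Starting from the same weight, repeated low-frequency input ends slightly below 1 (LTD-like) and repeated high-frequency input ends well above 1 (LTP-like).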

Discussion
It is well established that the mesolimbic DA system is strongly implicated in reinforcement learning and addictive behaviors. However, researchers tend to overlook the ventral subiculum, the most inferior segment of the hippocampal formation and the output region of the ventral hippocampus (herein vSUB), as a critically important component of the mesolimbic dopamine system. In concert with the synaptic integration of DAergic input from the VTA, the vSUB is a focal site of novelty detection, i.e., of the neural processes through which the reward value and novelty of sensory stimuli entering the CNS are reliably encoded and fine-tuned before they are stored in the long-term memory compartments of the hippocampus. On this view, the unique firing behavior of DA neurons is under the direct control of the vSUB.
Under normal circumstances, with the loop functioning normally, one can infer that the temporal shift in the firing of DA neurons in response to the presence or absence of both expected and unexpected rewarding stimuli cannot be explained without an intact vSUB-VTA loop. Hence, instead of investing more time and money in the mesostriatal DA system (the NAc and its networks), which is primarily the site of expression and action selection for behaviors that have already been learned and have already gone awry, researchers should shift gears and make the vSUB-VTA loop the main focus of investigation. Because this loop is the initial gate, or checkpoint, that fine-tunes the entry of any salient sensory stimulus from the outside world into the CNS, it is not an exaggeration to integrate this region as part and parcel of the mesolimbic DA system, to treat it as "the trigger locus of reinforcement learning and memory acquisition for novel stimuli entering the CNS", or to redefine it as the "axon hillock loop of novel behaviors and memory acquisition". This fine-tuning and filtering capacity can be compromised in disease states: the breakdown of the vSUB-VTA loop in schizophrenia [2] illustrates how important the loop is for maintaining the stability and functional integrity of the mesolimbic DA system and of adjacent cortical and subcortical regions.
Thus, under normal circumstances and with intact cellular organization, the temporal shift in the firing properties of these neurons can be summarized as conveying three distinct pieces of information. First, they reliably report that "something new" (novel) has just happened. Second, for the sensory information contained within this novelty signal to be conveyed and reported as a new event, whether through inhibition (followed by the expression of LTD) or excitation (followed by the expression of LTP), it must first be funneled through the top-down arm, from the vSUB via the ventral pallidum and the anteromedial bed nucleus of the stria terminalis (amBNST) to VTA DA neurons [21,30,31]. Consistent with intracellular and extracellular electrophysiological recordings from brain slices and from behaving subjects [3,27], this hierarchical, polysynaptic flow of information likely requires glutamatergic neurons that relay the signal from the vSUB via the amBNST to excite VTA DA neurons postsynaptically [30,31]. Third, depending on the strength, familiarity, and frequency of the signal, DA neurons exhibit one of three major types of firing behavior: excitation, no change (neutral), or inhibition. If the stimulus is unfamiliar (novel) and the signal entering through the vSUB onto VTA DA neurons is of sufficiently high frequency, it will excite the neurons and induce a phasic burst, releasing DA in cortical and subcortical areas, selectively potentiating subicular drive to the NAc, and supporting the expression (maintenance) of the motivated behavior associated with the strong signal (i.e., positive reinforcement learning).
Repeated high-frequency stimulation, or synaptic plasticity at excitatory synapses onto VTA DA neurons [36,37], will ultimately promote the expression of LTP and the maintenance of behaviors tagged with this positive reinforcement learning [38]. On the other hand, if the stimulation frequency is low, the signal may inhibit these neurons, potentially through the activation of D2 receptors: VTA DA neurons enter a depression mode (pause) that is encoded as a less motivating or aversive stimulus. This information is relayed to adjacent cortical and subcortical areas to be encoded as aversive (i.e., negative reinforcement learning). Repeated low-frequency stimulation will ultimately promote the expression of LTD and the maintenance of behaviors related to aversive/negative reinforcement learning. For familiar information that has already been stored in the hippocampal long-term memory compartments [39][40][41], VTA DA neurons show no evident change from their baseline firing; they stay neutral [3,15].
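The proposal in the preceding two paragraphs amounts to a three-way rule: familiar input leaves DA firing at baseline, novel and strong input produces a burst, and novel but weak input produces a pause. The toy function below only restates that rule in code; the enum, the function name, and the numeric `threshold` separating "strong" from "weak" input are hypothetical and are not quantities given in the text.

```python
from enum import Enum

class DAResponse(Enum):
    BURST = "phasic burst"   # excitation -> LTP, positive reinforcement
    NEUTRAL = "baseline"     # familiar input -> no change from tonic firing
    PAUSE = "pause"          # inhibition -> LTD, aversive learning

def predicted_da_response(is_novel, strength, threshold=1.0):
    """Toy restatement of the proposed vSUB -> VTA rule.

    is_novel:  True if the stimulus is not already stored in long-term memory
    strength:  strength/frequency of the vSUB signal in arbitrary units (hypothetical)
    threshold: hypothetical cut-off separating 'strong' from 'weak' input
    """
    if not is_novel:
        return DAResponse.NEUTRAL
    if strength >= threshold:
        return DAResponse.BURST
    return DAResponse.PAUSE

for novel, s in [(True, 2.0), (True, 0.3), (False, 2.0)]:
    print(novel, s, predicted_da_response(novel, s).value)
```

Writing the rule this way makes explicit that familiarity is checked before strength: under this proposal, a familiar stimulus leaves firing at baseline regardless of signal strength.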
Synaptic integration of information signaled back and forth between the vSUB and the mesolimbic DA system [42][43][44] is essential for processing contextual and spatial information according to the individual's internal state, the relevance of the information to the organism's survival, and contingencies in the environment [1,17]. Context-dependent reinstatement of heroin (as well as cocaine) seeking can be reduced by selectively inhibiting neuronal excitability within the vSUB with the GABA receptor agonists muscimol and baclofen, whereas applying these agonists to the dorsal subiculum had no effect on reinstatement [1,2]. Under abnormal physiological and psychological conditions, however, the vSUB-VTA loop is disrupted. For instance, disrupted interneuronal regulation of the vSUB leads to exaggerated hyperactivity of VTA DA neurons, as seen in schizophrenia and psychosis [2]. Furthermore, specific lesions of the vSUB corrupt the vSUB-hypothalamic circuitry, which in turn is implicated in neurodegenerative disorders including Alzheimer's disease [8]. Taken together, this suggests that the earliest signs of dysregulation, before the well-established behavior of DA neurons becomes abnormal, may be detectable at the vSUB-VTA juncture.

Conclusion
It is well established that, under normal circumstances and with intact anatomical connectivity, midbrain dopamine neurons in both the SNc and the VTA show unique, context-dependent firing behaviors in response to the presence and absence of both predicted and unpredicted rewards. Although they are tonically and spontaneously active at baseline, as if always on "standby", their behavior earns them the nickname "surprise neurons": they fire in bursts when a reward is presented out of context; after conditioning, they shift to track the timing of environmental cues that precede reward delivery, bursting at the cue rather than at the reward; and, more surprisingly, when a predicted reward is withheld at its expected time, they express their "dismay" by entering a depression mode called the pause. This behavior, which tracks novel events in time rather than the reward per se, remains intact only when the vSUB-VTA loop is intact and fully functional. The impairment of the vSUB evident in psychotic disorders such as schizophrenia and in neurodegenerative diseases such as Alzheimer's disease suggests that the normal, unique behavior of DA neurons is under the control of the vSUB. It also suggests that the earliest signs of abnormality may be revealed within the vSUB before the behavior of DA neurons goes awry. Therefore, taking the ventral subiculum out of the equation of the mesolimbic dopamine system and reinforcement learning, and focusing only on the mesostriatal DA system and the accumbal region, is like searching the jungle when it is already too late. The NAc, the site of expression and maintenance of addictive behaviors, is not a good focal point for understanding addictive behaviors and other psychological disorders.
I believe investigators should shift their focus toward the site where the earliest and newest information is processed: the vSUB-VTA circuitry. By doing so, we may come to understand the biophysiological and neuronal correlates of addictive, psychotic, and social behaviors at their earliest stages. We may also better understand some of society's costliest problems, such as aging and memory-compromising neurodegenerative diseases including Alzheimer's disease, before it is too late.