Training Robots with Synthetic Sentience

In traditional ML applications, networks are prepared for normal operation through a training process. In unsupervised learning, the network is exposed to input media (such as text, sound, or pictures) and learns to make sense of them by grouping input features or by organizing them temporally as sequences. In supervised learning, the network is exposed to input media together with associated labels (such as pictures of animals and their names). The training process adjusts the connection weights between nodes in the network using algorithms that reduce the network's predictive error. This process is typically static: inputs are presented at one edge of the network and outputs appear at the other, with no clocking involved.
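As a point of contrast, the sketch below shows the kind of conventional, clock-free training loop described above: weights are adjusted offline to reduce predictive error on labeled examples. It is a generic illustration in Python/NumPy, not part of the Synthetic Sentience system.

```python
# Minimal sketch of conventional supervised training: weights are adjusted
# offline to reduce predictive error. Purely illustrative, not the document's system.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # input features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                          # labels the network should predict

w = np.zeros(3)                         # connection weights, initially untrained
lr = 0.1
for epoch in range(200):
    pred = X @ w                        # static forward pass: input edge -> output edge
    error = pred - y
    grad = X.T @ error / len(y)         # gradient of the mean squared error
    w -= lr * grad                      # adjust weights to reduce predictive error

print(w)                                # approaches true_w as the error shrinks
```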

Synthetic Sentience networks are not pre-trained. Instead, the robot and its simulation begin processing inputs and producing randomized output behaviors in the operational environment. Because the network is simulated as a spiking neural network, it is inherently clocked, allowing it to associate time with events in the external world.
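To make the "inherently clocked" point concrete, the following sketch steps a single leaky integrate-and-fire neuron through discrete time ticks, so every spike carries a timestamp. The parameters and structure are illustrative assumptions, not the Synthetic Sentience implementation.

```python
# Minimal sketch of a clocked spiking (leaky integrate-and-fire) neuron.
# All constants are illustrative assumptions.
import numpy as np

dt = 1e-3            # simulation clock tick (1 ms)
tau = 20e-3          # membrane time constant
v_thresh = 1.0       # firing threshold
v = 0.0              # membrane potential
spike_times = []

rng = np.random.default_rng(1)
for step in range(1000):                  # every update is tied to a time step
    input_current = rng.uniform(0, 0.12)  # stand-in for sensory input at this tick
    v += dt / tau * (-v) + input_current  # leak toward zero, then integrate input
    if v >= v_thresh:
        spike_times.append(step * dt)     # the spike is associated with a time
        v = 0.0                           # reset after firing

print(spike_times[:5])
```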

The outputs are fed back into the network, both through changes in sensory inputs and internally through trunk lines between processing centers in the brain. This creates non-linear feedback in the network, giving rise to a multitude of oscillating states that in turn trigger outputs that sustain behaviors. States that sustain themselves are said to have high reactance; dampening states, in which behavior dissipates, are said to have low reactance.
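The sustain-versus-dissipate distinction can be illustrated with a toy feedback loop. In this sketch, "reactance" is modeled simply as the gain on the recurrent path, which is an assumption made for illustration only, not the document's actual network dynamics.

```python
# Toy sketch of feedback-driven sustain vs. decay. The feedback gain is a
# hypothetical stand-in for the document's notion of "reactance".
def run_loop(feedback_gain, steps=50):
    activity = 1.0                          # initial burst of output behavior
    trace = []
    for _ in range(steps):
        activity = feedback_gain * activity # output fed back in as input
        trace.append(activity)
    return trace

sustained = run_loop(feedback_gain=1.0)     # high "reactance": behavior persists
dampened = run_loop(feedback_gain=0.7)      # low "reactance": behavior dissipates

print(round(sustained[-1], 4), round(dampened[-1], 4))
```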

The network's plasticity algorithms, including Spike-Timing-Dependent Plasticity (STDP), adjust the weights of the network's connections as a background process. This increases the reactance of some states and decreases that of others, causing trained behaviors to emerge and transients to disappear. Honed behaviors evolve over time with increasing fidelity and apparent purpose.
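A standard pair-based STDP rule illustrates the kind of background weight adjustment involved: a connection is strengthened when the presynaptic spike precedes the postsynaptic one, and weakened when it follows. The amplitudes and time constants below are conventional illustrative values, not those used by Synthetic Sentience.

```python
# Minimal sketch of a pair-based STDP update. Constants are illustrative.
import numpy as np

A_plus, A_minus = 0.01, 0.012     # potentiation / depression amplitudes
tau_plus, tau_minus = 20e-3, 20e-3

def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair (times in seconds)."""
    dt = t_post - t_pre
    if dt > 0:      # pre fired before post: strengthen the connection
        return A_plus * np.exp(-dt / tau_plus)
    else:           # post fired before pre: weaken the connection
        return -A_minus * np.exp(dt / tau_minus)

print(stdp_dw(0.010, 0.015))   # pre leads post by 5 ms -> positive change
print(stdp_dw(0.015, 0.010))   # post leads pre by 5 ms -> negative change
```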

Randomized behavior within the allowed physical limits gradually yields to coordinated (yet perhaps undirected) behavior, such as moving multiple limbs for locomotion or making sounds similar to those already heard in the operating environment. Coordinated behavior arises because mathematical attractors in the network make coordinated states increasingly more likely over time than uncoordinated ones.
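The pull of an attractor can be shown with a tiny Hopfield-style network: a corrupted starting state is drawn back toward a stored "coordinated" pattern through repeated updates. This is a generic illustration of attractor dynamics, not the Synthetic Sentience network itself.

```python
# Sketch of attractor dynamics: repeated updates pull an off-pattern state
# toward a stored ("coordinated") pattern. Generic illustration only.
import numpy as np

pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])   # the "coordinated" state
W = np.outer(pattern, pattern).astype(float)        # Hebbian weight storage
np.fill_diagonal(W, 0.0)

state = pattern.copy()
state[[0, 3, 5]] *= -1                               # start off-pattern ("uncoordinated")
for _ in range(5):                                   # repeated network updates
    state = np.sign(W @ state).astype(int)           # each step falls toward the attractor

print(np.array_equal(state, pattern))                # True: the attractor was reached
```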

Once coordination is established, the robot can actually be shown how to perform its tasks by having a trainer move the robot's limbs through the desired activity. As the desired behavior is repeated in relevant contexts, the plasticity algorithms create mathematical attractors that ultimately make it easier for the system to enter these states than others, and the behavior is learned.
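One way to picture how repetition deepens an attractor is to count how often each guided transition is traversed: after enough demonstrations, the demonstrated sequence becomes the path the system is most likely to follow. The counting scheme below is a hypothetical stand-in for the plasticity process, used only to make the intuition concrete.

```python
# Hypothetical sketch: repeated guided demonstrations strengthen the
# transitions they pass through until they dominate. Not the actual plasticity rule.
import numpy as np

n_states = 4                          # coarse limb postures, say
W = np.ones((n_states, n_states))     # transition strengths, initially uniform
demo = [0, 1, 2, 3]                   # trainer guides the limb through these postures

for _ in range(20):                   # repeat the demonstration in context
    for a, b in zip(demo, demo[1:]):
        W[a, b] += 1.0                # strengthen the demonstrated transition

# After training, the most likely next posture from each state follows the demo.
print([int(np.argmax(W[s])) for s in demo[:-1]])   # [1, 2, 3]
```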

Negative reinforcement for errant behavior may be established by showing the correct behavior, by taking the robot out of the physical states in which the errant behavior occurs, or by associating the errant behavior with behavioral contexts outside the expected operational environment, thereby confining it to non-work time.