Day 2 – How the Brain Inspired AI – From McCulloch–Pitts to Modern Deep Learning

Every time you talk to a voice assistant, watch a recommended video, or receive a diagnosis from an AI medical tool, you are benefiting from an idea that originated not in a silicon laboratory but in the human brain. The history of artificial intelligence is inseparable from neuroscience — and understanding that lineage reveals why today’s deep-learning models look the way they do, behave the way they do, and are limited in the ways they are.

This post traces that journey from its earliest mathematical spark in 1943 to the trillion-parameter foundation models of the 2020s. Along the way we will meet the researchers who refused to give up during two ‘AI winters’, the architectural leaps that unlocked new capabilities, and the open questions that keep neuroscientists and machine-learning engineers in conversation today.

Key Insight The brain did not just inspire AI — it continues to inspire it. Attention mechanisms, sparse representations, memory-augmented networks, and neuromorphic chips are all modern examples of neuroscience shaping the frontier.

1943 — The Artificial Neuron is Born

McCulloch and Pitts: Logic in Biology

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts published ‘A Logical Calculus of the Ideas Immanent in Nervous Activity’. Their central claim was bold: the activity of a neuron could be modelled mathematically as a binary threshold unit — it either fires or it does not. Multiple inputs, each carrying a weight, are summed together; if the sum exceeds a threshold, the neuron outputs a signal.

This was the first formal model that treated biological computation as something a machine could replicate. McCulloch–Pitts neurons could implement Boolean logic (AND, OR, NOT) and, in theory, any computation expressible in propositional logic. The paper planted a seed that would take decades to flower.

The McCulloch–Pitts Neuron in Plain English Imagine a lightbulb connected to several switches. If enough switches are flipped on, the bulb lights up. Adjust which switches matter (weights) and what ‘enough’ means (threshold), and you can build surprisingly complex decision rules from this simple idea.
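The lightbulb analogy translates directly into code. Here is a minimal sketch of a McCulloch–Pitts unit, with hand-chosen weights and thresholds (this is an illustration of the idea, not a historical implementation):

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch–Pitts unit: fire (output 1) iff the weighted input sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Boolean gates as threshold units -- weights and thresholds set by hand,
# exactly as McCulloch and Pitts envisioned (no learning involved):
AND = lambda a, b: mp_neuron([a, b], [1, 1], threshold=2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], threshold=1)
NOT = lambda a:    mp_neuron([a],    [-1],   threshold=0)

print(AND(1, 1), OR(0, 1), NOT(1))  # prints: 1 1 0
```

Note that the weights are fixed by the designer: the model could compute, but it could not yet learn. That gap is what the next two decades filled in.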

1949 — Learning Through Connection: Hebb’s Rule

Canadian psychologist Donald Hebb published ‘The Organization of Behavior’ in 1949 and introduced what became known as Hebb’s rule, summarised by the memorable phrase: ‘Neurons that fire together, wire together.’ The idea is that synaptic connections strengthen when two neurons are repeatedly activated together — a biological account of how the brain might learn from experience.

Hebb’s rule translates cleanly into the mathematics of machine learning as a weight-update rule: if a pre-synaptic and a post-synaptic unit are both active, increase the weight between them. While naive Hebbian learning is unstable in practice (weights can grow without bound), it seeded the later development of the delta rule, backpropagation, and Hopfield networks. Every modern learning algorithm owes something to Hebb.
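In update-rule form, Hebbian learning is a one-liner: the weight change is proportional to the product of pre- and post-synaptic activity. A schematic sketch (the activity patterns and learning rate here are made up for illustration):

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Hebb's rule: strengthen w[i, j] when post-synaptic unit i
    and pre-synaptic unit j are active at the same time."""
    return w + lr * np.outer(post, pre)

w = np.zeros((2, 3))                 # 2 post-synaptic, 3 pre-synaptic units
pre = np.array([1.0, 0.0, 1.0])     # pre-synaptic activity pattern
post = np.array([1.0, 0.0])         # post-synaptic activity pattern
w = hebbian_update(w, pre, post)
# Only connections between co-active units grew. Repeated updates keep
# growing those weights without bound -- the instability noted above.
```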

1957 — The Perceptron: First Trainable Network

Frank Rosenblatt at Cornell Aeronautical Laboratory built the Perceptron in 1957 and implemented it on a custom machine called the Mark I. Unlike McCulloch–Pitts neurons, which required hand-crafted weights, the Perceptron could learn its weights from labelled training examples. The learning rule adjusted weights in proportion to errors — a simple but genuinely powerful idea.
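The error-driven rule can be sketched in a few lines. This toy example learns the AND function from labelled examples (the data, learning rate, and epoch count are illustrative choices, not anything Rosenblatt actually trained on):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Rosenblatt's rule: nudge weights in proportion to the prediction error."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            error = target - pred           # -1, 0, or +1
            w += lr * error * xi            # only update when wrong
            b += lr * error
    return w, b

# Learn logical AND from its four labelled input/output pairs:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
```

Unlike the McCulloch–Pitts unit, no human ever sets the weights: they emerge from the errors the model makes on its training data.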

Rosenblatt attracted considerable media attention, with headlines suggesting machines would soon think like humans. This turned out to be premature optimism, setting a pattern of hype and disappointment that would recur throughout AI history.

1969 — The First AI Winter

Minsky, Papert and the XOR Problem

In 1969, Marvin Minsky and Seymour Papert published ‘Perceptrons’, a rigorous mathematical analysis showing that single-layer perceptrons cannot solve non-linearly separable problems — most famously, the XOR (exclusive-or) function. The limitation was specific to single layers: networks with hidden layers could represent XOR in principle, but no one yet knew how to train them.

The book’s pessimistic tone contributed to a dramatic collapse in AI funding through the 1970s. Researchers moved on to symbolic AI (expert systems, logic programming), and neural networks were widely considered a dead end. This period is often called the first AI winter.

The XOR Problem Explained A perceptron draws a straight line to separate two classes. XOR cannot be separated by any single straight line — you need at least two layers and a non-linear boundary. It is the simplest possible demonstration that depth matters.
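The point can be checked directly. With hand-set weights, a two-layer threshold network computes XOR, while no single threshold unit can (a toy sketch, using the same kind of unit as the McCulloch–Pitts example above):

```python
def step(x, w, t):
    """Threshold unit: fire iff the weighted sum reaches t."""
    return 1 if sum(a * b for a, b in zip(x, w)) >= t else 0

def xor_two_layer(a, b):
    h1 = step([a, b], [1, 1], t=1)        # hidden unit computing OR
    h2 = step([a, b], [1, 1], t=2)        # hidden unit computing AND
    return step([h1, h2], [1, -1], t=1)   # output: OR and not AND

# No single straight line separates these four points, but two layers do:
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_two_layer(a, b))   # 0, 1, 1, 0
```

The hidden layer re-maps the four inputs into a space where a straight line suffices — the essence of why depth matters.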

1974–1986 — Backpropagation: The Algorithm That Changed Everything

Paul Werbos derived backpropagation in his 1974 Harvard PhD thesis, though he framed it in the language of control theory and it attracted little attention at the time. David Parker independently rediscovered it in 1982. The algorithm reached mainstream machine learning in 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams published their landmark paper ‘Learning Representations by Back-propagating Errors’ in Nature.

Backpropagation solves the credit-assignment problem: how do you know which weights in a deep network contributed to an error? The answer is the chain rule of calculus, applied in reverse through the network’s layers. Errors propagate backward, each layer receiving a gradient signal telling it how to adjust its weights to reduce the loss.
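Here is a minimal sketch of the idea: a two-layer network trained on XOR with hand-written gradients, where the backward pass is the chain rule applied layer by layer. (Layer sizes, learning rate, seed, and iteration count are arbitrary illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Two-layer network: 2 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

loss_before = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2)

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule applied in reverse, output layer first
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient flowing back into layer 1
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

loss_after = np.mean((out - y) ** 2)   # typically far below the starting loss
```

The line computing `d_h` is the credit assignment step: the output layer’s gradient is pushed backward through `W2` so the hidden layer learns how it contributed to the error.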

With backprop, multi-layer (‘deep’) networks became trainable — at least in principle. In practice, the 1980s hardware was too slow, datasets too small, and gradients too unstable for very deep architectures. But the theoretical foundation was now in place.

1980–1998 — Seeing Like the Brain: Convolutional Networks

Neocognitron and the Hierarchy of Vision

Kunihiko Fukushima’s Neocognitron (1980) was directly inspired by David Hubel and Torsten Wiesel’s 1959 discovery (recognised with a Nobel Prize in 1981) that the mammalian visual cortex is organised hierarchically: simple cells detect edges, complex cells detect patterns, and higher areas recognise objects. Fukushima built an artificial network with alternating ‘S-cells’ (simple, feature-detecting) and ‘C-cells’ (complex, spatially pooling) layers — the conceptual blueprint for convolutional neural networks.

Yann LeCun took the Neocognitron idea, combined it with backpropagation, and trained it on handwritten digits. LeNet-5, published in 1998, was deployed by banks to read cheque amounts — one of the first real commercial uses of a neural network. The convolutional principle — local receptive fields, shared weights, pooling — became the dominant paradigm for computer vision and remains so today.
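Those three ingredients — local receptive fields, shared weights, pooling — can be sketched in a few lines of NumPy. This toy vertical-edge detector echoes Hubel and Wiesel’s ‘simple cells’ (it is an illustration, not LeNet itself):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: the SAME weights slide over every position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # local receptive field
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response in each patch."""
    H, W = fmap.shape
    Ht, Wt = H - H % size, W - W % size          # trim to a multiple of size
    return fmap[:Ht, :Wt].reshape(Ht // size, size, Wt // size, size).max(axis=(1, 3))

# A vertical-edge detector: dark left half, bright right half.
image = np.zeros((6, 6)); image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])            # responds where brightness jumps
features = max_pool(conv2d(image, edge_kernel))  # strong response along the edge
```

Because the kernel is shared across every position, an edge is detected wherever it appears — the translation invariance that made CNNs so effective for vision.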

1997 — Memory in Sequence: Long Short-Term Memory

Recurrent neural networks (RNNs) process sequences by feeding their hidden state back into the network at each time step — a crude model of temporal memory. The problem is that during backpropagation through time, gradients either explode or vanish across long sequences, making it nearly impossible to learn long-range dependencies.

Sepp Hochreiter and Jürgen Schmidhuber introduced the Long Short-Term Memory (LSTM) architecture in 1997. The key innovation was a ‘cell state’ — a kind of conveyor belt that runs through the sequence largely unchanged — regulated by learned gates that decide what to keep, forget, and write. LSTMs could finally learn dependencies spanning hundreds of time steps, enabling practical speech recognition, language modelling, and handwriting recognition.
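A single LSTM step can be sketched as follows. Note that this is the now-standard formulation including the forget gate (added to the architecture shortly after the 1997 paper); sizes, gate ordering, and the random weights are illustrative:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    g = np.tanh(g)                                  # candidate values to write
    c = f * c_prev + i * g   # the 'conveyor belt': mostly carried forward unchanged
    h = o * np.tanh(c)       # gated exposure of the cell state
    return h, c

# Run a random sequence through a 3-unit cell (sizes chosen for illustration):
rng = np.random.default_rng(0)
n_in, n_hid = 2, 3
W = rng.normal(0, 0.1, (4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```

The additive update `c = f * c_prev + i * g` is the whole trick: because the cell state is carried forward by (near-)identity rather than repeated matrix multiplication, gradients survive across long sequences.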

Why LSTMs Matter for the Brain–AI Story Gates are a computational analogy to neuromodulatory signals — the brain’s dopamine and acetylcholine systems modulate which information is stored or discarded in working memory. The LSTM itself was motivated primarily by the vanishing-gradient problem, but the gating analogy has since become a productive bridge between machine learning and cognitive science.

2006 — The Deep Learning Renaissance

By the early 2000s, support vector machines and other shallow methods dominated machine learning benchmarks, and deep networks were once again unfashionable — a second AI winter had settled on neural approaches.

Geoffrey Hinton and Ruslan Salakhutdinov changed the conversation in 2006 with their Science paper on deep belief networks, showing that deep architectures could be trained effectively using an unsupervised pre-training stage (training one layer at a time with restricted Boltzmann machines) followed by supervised fine-tuning. This reframing — that deep networks needed careful initialisation, not just more compute — reinvigorated the field.

Around the same time, Hinton’s group and others began exploiting GPUs for neural network training, providing orders-of-magnitude speedups that made large experiments feasible for the first time.

2012 — AlexNet and the ImageNet Moment

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was a rigorous annual competition in which systems, trained on 1.2 million labelled images, classified photographs into 1,000 categories. In 2012, a team led by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered AlexNet — a deep CNN with five convolutional layers, trained on two GPUs using ReLU activations and dropout regularisation.

AlexNet achieved a top-5 error rate of 15.3%, compared with 26.2% for the second-place entry. The margin was so large that the field effectively reoriented overnight. Computer vision became synonymous with deep learning, and the race to build bigger, deeper networks began in earnest at every major technology company.

2014–2017 — Attention and the Transformer

Neural Attention: a Glimpse at Cognitive Science

Attention mechanisms were first introduced in neural machine translation by Bahdanau et al. (2015). Instead of compressing an entire input sequence into a fixed-length vector, attention allowed the decoder to ‘look back’ at all encoder states and weight them by relevance at each decoding step — a computational echo of how human attention selectively focuses on the most task-relevant parts of a scene.

In 2017, Vaswani et al. at Google published ‘Attention Is All You Need’, introducing the Transformer architecture. The Transformer abandoned recurrence entirely, relying solely on self-attention to model relationships between all positions in a sequence simultaneously. This was parallelisable in ways RNNs are not, making it possible to train on vastly larger datasets.
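The Transformer’s core operation, scaled dot-product self-attention, fits in a few lines (dimensions and the random inputs below are illustrative; a real model adds multiple heads, masking, and learned embeddings):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every position attends to every other."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # relevance-weighted mixture

# Toy example: a sequence of 4 positions with model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)   # same shape as the input: (4, 8)
```

Because every position is compared with every other in one matrix product, the whole sequence is processed in parallel — the property that freed Transformers from the step-by-step computation of RNNs.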

The Transformer became the universal backbone of modern AI: language models (GPT, BERT, LLaMA), image models (ViT), audio models (Whisper), and multimodal systems (CLIP, Gemini) all build on it.

Attention and the Brain The analogy between neural attention mechanisms and the brain’s top-down attentional systems (prefrontal cortex modulating sensory processing) is contested but productive. Both systems selectively amplify relevant signals and suppress noise — a design principle that evolution and engineers converged on independently.

2018–Present — The Era of Large Language Models

BERT (Bidirectional Encoder Representations from Transformers, Google, 2018) and GPT-1 (Generative Pre-trained Transformer, OpenAI, 2018) demonstrated that pre-training large Transformers on unlabelled text, then fine-tuning on specific tasks, produced state-of-the-art results across virtually every natural-language benchmark.

GPT-3 (2020, 175 billion parameters) showed that simply scaling up — more parameters, more data, more compute — produced emergent capabilities that nobody explicitly programmed: arithmetic, code generation, few-shot learning from examples in the prompt. GPT-4, Claude, Gemini, and LLaMA followed, each pushing the frontier further.

Today’s LLMs are not explicitly brain-inspired in their architecture, but the field increasingly looks to cognitive science for guidance on reasoning, planning, tool use, and memory — capabilities that pure scaling alone has not fully delivered.

What Remains Unsolved: Where Neuroscience Still Leads

Despite spectacular progress, AI systems differ from biological intelligence in important ways that remain active research frontiers:

  • Sample efficiency: humans learn to recognise a cat from a handful of examples, while deep networks need millions. Neuroscience research on one-shot learning and memory consolidation during sleep is actively informing machine-learning alternatives.
  • Continual learning: the brain learns new tasks without destroying old ones, but most neural networks suffer from what the field calls ‘catastrophic forgetting’ (or catastrophic interference).
  • Energy efficiency: the human brain runs on roughly 20 watts, whereas a large-scale LLM training run consumes megawatts. Neuromorphic chips (Intel Loihi, IBM TrueNorth) attempt to close this gap by mimicking spiking neural dynamics.
  • Causal reasoning: humans readily distinguish correlation from causation, while current LLMs largely pattern-match. Yoshua Bengio and others argue that integrating causal structure, inspired by how the brain models the world, is essential for the next leap.
  • Embodied cognition: human intelligence is grounded in physical experience. Robotics and embodied AI research draw on the neuroscience of the motor cortex, cerebellum, and proprioception.

History Timeline: 80 Years of Brain-Inspired AI

Year | Milestone | Significance
1943 | McCulloch–Pitts Neuron | First mathematical model of a neuron; Boolean logic in biological circuits.
1949 | Hebb’s Rule | Donald Hebb proposes synaptic learning: ‘Neurons that fire together, wire together.’
1957 | Perceptron | Frank Rosenblatt builds the first trainable artificial neuron — pattern recognition born.
1969 | Minsky & Papert Critique | Perceptrons exposed as limited (no XOR). AI funding collapses — first AI winter.
1974 | Backpropagation Derived | Paul Werbos derives backprop in his PhD thesis; largely ignored at the time.
1980 | Neocognitron | Fukushima’s hierarchical visual model — direct ancestor of convolutional nets.
1986 | Backprop Popularised | Rumelhart, Hinton & Williams revive backprop; multi-layer networks become feasible.
1989 | LeNet | LeCun applies CNNs to digit recognition. Banks start using it for cheque processing.
1997 | LSTM | Hochreiter & Schmidhuber solve the vanishing gradient — sequential memory in RNNs.
2006 | Deep Belief Nets | Hinton & Salakhutdinov show deep networks can be pre-trained — reignites deep learning.
2012 | AlexNet | Krizhevsky’s CNN wins ImageNet by a huge margin. The modern deep learning era begins.
2014 | GANs | Goodfellow et al. invent Generative Adversarial Networks — machines learn to create.
2017 | Transformer | Vaswani et al. publish ‘Attention Is All You Need’ — language AI is transformed.
2018 | BERT & GPT-1 | Large pre-trained language models arrive. Transfer learning goes mainstream.
2020 | GPT-3 | 175B parameters, few-shot learning at scale — AI enters the general public’s awareness.
2022 | ChatGPT / Diffusion | Conversational AI and image generation reach consumer adoption at record speed.
2023–25 | LLM Agents & Reasoning | Models reason, use tools, and act autonomously — echoing cognitive science goals.

Conclusion

The arc from McCulloch–Pitts to ChatGPT is not a story of machines gradually replacing human minds — it is a story of human minds trying, again and again, to understand themselves by building simplified models of their own cognition. Every major breakthrough in AI has either borrowed directly from neuroscience (CNNs from visual cortex, LSTMs from working memory, attention from cognitive psychology) or forced neuroscientists to re-examine what the brain actually does.

The dialogue is far from over. As AI systems grow more capable, they are increasingly used as tools for studying biological intelligence — training models to predict neural activity, comparing internal representations to fMRI data, and using reinforcement learning to understand reward circuits. The brain inspired AI, and AI is now helping us understand the brain.

Whether this virtuous cycle leads to artificial general intelligence, or simply to increasingly useful and efficient tools, depends on questions that are as much philosophical as technical. What matters now is that practitioners understand the history — because the constraints and insights of that history are still shaping the models they build and use every day.

Enjoyed this deep dive? ToolTechSavvy covers AI tools, cloud platforms, and developer workflows for practitioners who want the full picture — not just the hype. Read more at tooltechsavvy.com
