History of AI - Sebastian Bauer

a) Recorded radio lecture

  • Terminology used: "brains"
    • Machines could be described as a brain; it depends on how we use them
    • Universality: a computer is a universal machine -> Can replace any machine of a certain very wide class
      • A computer "only" has to be programmed to imitate a brain, and it will then also be a brain
      • Storage capacity has to be sufficient; Same for speed/performance
      • A larger machine is needed; Not yet available at the time
      • No need for increased complexity; Only more performance needed / scaling
        • This already applies to current AI development: more GPU processing power, data centers, and data needed
      • The more complicated the machine to imitate, the more extensive its programming has to be
        • Does not really apply to the current state of AI
      • -> Computers have to be programmed to follow instructions to act like a brain
  • "Free will" is required to behave like a brain; A programmed computer is deterministic and can't reproduce that
    • Some random element is required to imitate a human brain / thinking
  • Already thoughts about alignment and about machines surpassing human intelligence
  • The computer only follows instructions, there is no thought process

b) A brief history of deep learning

  • Multilayer perceptron -> Feedforward NN / Fully connected
    • Each unit is fully connected to the units of the adjacent layers
    • -> Non-linear activation functions
    • Multiple layers and non-linear activation functions enable us to solve XOR problems
    • -> Hard to train, no training algorithm yet
    • Universal function approximators
  • 2nd AI Winter - late 1990s and 2000s
    • Probably due to the popularity of classical ML: SVMs and random forests
    • NNs were expensive to train; GPUs were not yet used for training
      • This changed after GPUs were used for training in 2012
        • The winning ImageNet entry (AlexNet) was trained on GPUs
  • DL
    • Representation learning capabilities define what DL is all about
      • Raw data -> Automatically discovers the representation needed for detection or classification
      • Not only the size of the network
    • 1989 - CNNs: Representation/feature learning feeding into a multilayer perceptron classifier
      • Can be considered the most popular form of DL
    • Introduced new problems: Vanishing and exploding gradients (see the numeric sketch after this list)
      • For example in deeper networks or RNNs (solution: LSTMs)
  • Hardware and software
    • Hardware
      • Specialized AI hardware, for example TPUs
      • ARM-based inference
      • Specialized GPUs for DL
    • Software
      • Theano - One of the first DL frameworks
      • PyTorch
      • TensorFlow
      • JAX (NumPy on GPUs)
  • Current research trends
    • Self-supervised learning
      • Contrastive learning (I basically use this in my thesis as a fingerprinting gate)
    • Graph neural networks
      • Text/image data -> Graph-structured data
    • Transformer models
      • LLMs
        • Distillation
      • Vision Transformer (ViT)
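
To make the vanishing-gradient problem concrete, here is a minimal numeric sketch (my own illustration, not from the lecture): backpropagation multiplies one sigmoid derivative (at most 0.25) per layer, so the gradient reaching the early layers shrinks geometrically with depth.

```python
import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))

# Push an input through 20 sigmoid "layers" (weight fixed at 1 for simplicity)
# and accumulate the chain-rule product that backprop would compute.
x, w = 0.5, 1.0
grad = 1.0
for _ in range(20):
    a = sigmoid(w * x)
    grad *= a * (1 - a) * w   # one sigmoid derivative per layer
    x = a

print(grad)   # ~1e-13: the gradient has effectively vanished
```

Exploding gradients are the mirror image: with weights much larger than 1, the same product grows without bound, which is one reason LSTMs and careful initialization became important.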

Inventions not mentioned

  • Multimodality
  • Agents
  • Reinforcement Learning
  • Diffusion-based models
  • Deepfakes
  • Text to Video / Image to Video...

From the biological neuron to the first learning algorithm for a single-layer neural network

  • McCulloch & Pitts
    • The binary artificial neuron
      • Can compute logical functions such as AND, NOT, and OR
    • Weights are fixed; a neuron is modeled as a weighted sum + threshold
  • Norbert Wiener
    • 1948, introduced the idea that biological and mechanical systems can be understood as feedback control systems that adapt based on error signals
    • -> Adjust parameters based on error
  • Claude Shannon
    • Information theory, treat inputs as signals carrying information
    • Use probability/statistics to model recognition and classification
  • Frank Rosenblatt: Perceptron
    • Combines all of the above
      • Artificial neuron
      • Feedback / Adaption
      • Statistical pattern recognition
      • Synaptic plasticity (Weight change)

AI Winter (Perceptrons book)

  • Book published in 1969
    • Subject was about the perceptron model by Frank Rosenblatt
  • It is claimed that the pessimistic predictions made by the authors led to the so-called first AI winter of the 1970s
    • The perceptron is unable to perform XOR operations
      • Even though multilayer perceptrons can solve XOR problems
        • But these were hard to build and train given the limitations in computational resources at the time

c) Alfredo Canziani's course

  • Nothing really new compared to the previous lectures

AI history events

McCulloch & Pitts (1943)

  • Neurons acting as logic gates; Mathematical formulation of biological neurons
    • AND, OR, NOT gates
    • Binary neuron
      • As a computational unit
      • \(s=\sum_{n=1}^Nf_nW_n\)
        • Weighted sum of features
      • \(a=[s\gt 0]\)
        • Activation
        • if \(s>0\) returns 1; \(<0\) 0
  • Threshold has to be reached to fire; Weighted sum of inputs
  • Piecewise function definition: \(f(x)=1\) if \(x_1+x_2\geq t\); else 0
  • Different thresholds for different types of gates (see the sketch after this list)
    • 1.5 for and
    • 0.5 for or
    • -0.49 for not
  • Only inspired by biological neurons; Does not mirror them
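
A minimal sketch of these gates in Python (my own illustration, using the thresholds from the notes; all weights are fixed at 1, and the NOT gate uses weight -1):

```python
import numpy as np

def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: weighted sum of fixed weights, then a hard threshold."""
    s = np.dot(inputs, weights)    # s = sum_n f_n * W_n
    return int(s >= threshold)     # fires (1) iff the threshold is reached

# Gates from the thresholds above: 1.5 for AND, 0.5 for OR, -0.49 for NOT
AND = lambda x1, x2: mp_neuron([x1, x2], [1, 1], 1.5)
OR  = lambda x1, x2: mp_neuron([x1, x2], [1, 1], 0.5)
NOT = lambda x:      mp_neuron([x],      [-1],   -0.49)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Nothing is learned here: picking the weights and thresholds by hand is exactly the limitation that Rosenblatt's perceptron later removed.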

Rosenblatt's Perceptron (1958)

  • Key innovation: Weights can be learned from the data
  • Learning algorithm to find weights and thresholds automatically (see the sketch after this list)
  • Inputs are a weighted sum and some activation function is applied afterwards
    • Activation function here is the threshold, neuron fires or doesn't
  • 1960: ADALINE: Differentiable neural model (Backpropagation can be used)
    • Update before the threshold; Can propagate the error back
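
A minimal sketch of the perceptron learning rule (my own illustration; the variable names and the AND example are assumptions, not from the lecture). Misclassified points nudge the weights towards the correct side of the hyperplane; the threshold is learned as a bias term:

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Rosenblatt's rule: w += lr * (target - prediction) * input."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.dot(w, xi) + b > 0)   # threshold activation
            error = yi - pred                    # -1, 0, or +1
            w += lr * error * xi
            b += lr * error
    return w, b

# Learns the linearly separable AND function from data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([int(np.dot(w, xi) + b > 0) for xi in X])   # -> [0, 0, 0, 1]
```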

Minsky and Papert - Critique of Perceptrons (1969)

  • Issue with perceptrons and ADALINE: Only binary classification
    • Unable to solve XOR problems (see the sketch after this list)
  • Hyperplane can only separate into 2 classes
  • -> Start of the first "AI Winter"
    • Perceptrons book: Very limited neurons, unable to solve XOR problems
    • Neural network research decreased
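
A minimal sketch of the XOR limitation (my own illustration, reusing hand-set threshold units in the style of the McCulloch-Pitts gates above): no single threshold unit can separate XOR with one hyperplane, but two layers can, because the hidden layer re-represents the inputs:

```python
step = lambda s, t: int(s >= t)   # hard-threshold activation

def xor_mlp(x1, x2):
    """Two-layer network with hand-picked (not learned) weights."""
    h1 = step(x1 + x2, 0.5)     # hidden unit 1: OR(x1, x2)
    h2 = step(-x1 - x2, -1.5)   # hidden unit 2: NAND(x1, x2)
    return step(h1 + h2, 1.5)   # output: AND(h1, h2) = XOR(x1, x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```

The catch in 1969: these hidden weights are set by hand, and no algorithm existed yet to learn them, which is where backpropagation comes in.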

Backpropagation

  • 1986: Popularized by Rumelhart, Hinton, and Williams (the method had been formulated independently several times before)
  • Basis of all subsequent neural network and DL progress
  • Efficiently trains NNs
  • There are also other algorithms
    • For example Hebbian learning (no error feedback; connections get strengthened depending on usage / co-activation) - see the sketch below
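
A minimal sketch of the Hebbian rule (my own illustration; the single linear neuron and random data are assumptions): unlike backpropagation there is no error signal, so weights grow purely from co-activation of input and output.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))    # 200 random input patterns (illustrative)
w = 0.1 * rng.standard_normal(3)     # small random initial weights
lr = 0.01

for x in X:
    y = w @ x            # linear neuron output
    w += lr * y * x      # Hebb's rule: dw = lr * output * input

print(w)   # grows along correlated input directions; unbounded without normalization
```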

Deep Learning Revolution (2012)

  • ImageNet competition in 2012 made DL really popular
    • Competition: Which system can do the best classification
    • Majority of entries back then were classical ML methods
      • Manual feature engineering
    • The winning deep learning entry (AlexNet) outperformed traditional methods by a large margin

Personal Reflection

  • How often did I pause the video to take notes and look up concepts? Did I understand the content without pausing, or did I need to pause frequently to understand the concepts?
    • There was no need to pause frequently, as I'm already familiar with a lot of the presented content
    • Only had to stop the video to take notes
  • Which concepts were new to me, and which ones did I already know? Which concepts were the most difficult to understand, and which ones were easier? Which one didn't I understand at all?
    • I've never really thought about the existence of learning algorithms other than backpropagation
    • What I'm still somewhat confused about is the definition of deep learning: some sources define it by deep networks alone, while others emphasize the representation learning part that avoids manual feature engineering
  • Which lecturer was easier to understand for me, and why? Did I prefer the style of one lecturer over the other? Did I find one lecture more engaging or informative than the other? Are there other lectures or resources that helped me to understand the history of AI better?
    • Sebastian's videos were by far superior in explaining the history and giving an overview of the AI landscape, but that's not really surprising, as Alfredo's lecture 1 only covered the binary neuron
    • In teaching I prefer to start from the beginning and not introduce a lot of new concepts at once
      • Sebastian's lectures -> A lot of terms / architectures mentioned; Alfredo starts with the biological neuron and goes more in depth
  • Do you think that you are able to learn more about generative AI from lectures like these? Which activities in class do you think are more helpful for learning about generative AI? Do you prefer lectures, discussions, hands-on activities, or something else? How can we design the sessions in this course to make them more engaging and effective for learning about generative AI?
    • I prefer lectures combined with discussions and more visualizations; I'm not a fan of raw mathematical definitions without a hands-on example
    • As this is an introductory course about GenAI, the GenAI landscape should rather be covered in breadth, not in depth, as depth would require deeper theory lessons about AI/DL in general