The diversity of models
- Apr 14, 2025
Neural network architectures are too numerous to list in full, since new models and variations on existing ones appear regularly. They can, however, be grouped into broad families, each with its most common, recognized, and influential architectures. Here is a (non-exhaustive) list organized by category:
1. Basic Neural Networks
Perceptron
The initial model, proposed by Frank Rosenblatt in 1957.
Conceptual basis of many neural networks.
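As an illustration, the classic perceptron learning rule can be sketched in a few lines. This is a minimal sketch, not Rosenblatt's original formulation: the function names, the learning rate, and the choice of the logical AND problem are assumptions made for the example.

```python
def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Train one perceptron with the error-driven update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Step activation: fire when the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Logical AND is linearly separable, so the rule converges on it.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
weights, bias = train_perceptron(and_inputs, and_labels)
predictions = [predict(weights, bias, x) for x in and_inputs]
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why problems such as XOR need the multi-layer networks listed next.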
Multi-Layer Perceptron (MLP)
Architecture composed of dense layers (fully connected).
Each neuron is connected to all neurons in the next layer.
Used in many simple classification or regression cases.
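The idea of "dense" layers can be sketched as a forward pass in plain Python. The layer sizes (4, 8, 3), the ReLU activation, and the random initialization below are assumptions chosen for illustration only.

```python
import random


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights (one row per output neuron) and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layer_sizes = [4, 8, 3]  # input, hidden, output (illustrative sizes)
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]


def mlp_forward(x):
    """ReLU on hidden layers, raw scores on the last layer."""
    hidden = x
    for index, (weights, biases) in enumerate(layers):
        is_last = index == len(layers) - 1
        hidden = dense(hidden, weights, biases, identity if is_last else relu)
    return hidden


output = mlp_forward([0.5, -1.0, 2.0, 0.1])
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

Because every neuron connects to every neuron in the next layer, the parameter count grows with the product of adjacent layer sizes: here (4 × 8 + 8) + (8 × 3 + 3) = 67.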
2. Convolutional Neural Networks (CNN)
LeNet (1998, Yann LeCun)
One of the first CNNs applied to handwritten digit recognition (MNIST).
AlexNet (2012, Alex Krizhevsky et al.)
First CNN to demonstrate breakthrough performance in image recognition (ImageNet).
VGG (2014, Simonyan & Zisserman)
Deep networks using stacked 3x3 convolutional layers.
GoogLeNet (Inception) (2014, Szegedy et al.)
Inception modules allow you to mix different filter sizes in the same layer.
ResNet (2015, He et al.)
Introduces skip connections to facilitate learning very deep networks.
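A skip connection simply adds the block's input back to its output, so the block only has to learn a correction (the "residual"). The sketch below uses plain lists and a hypothetical `transform` callable standing in for the convolutional layers of a real block.

```python
def residual_block(x, transform):
    """Return x + F(x), where F is the block's learned transformation."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_transform(values):
    """A block that has learned nothing yet: F(x) = 0."""
    return [0.0] * len(values)


def small_correction(values):
    """Hypothetical learned residual, for illustration."""
    return [0.1 * v for v in values]


unchanged = residual_block([1.0, 2.0], zero_transform)
adjusted = residual_block([1.0, 2.0], small_correction)
```

When the transformation outputs zero, the block reduces to the identity, which is one common explanation of why stacking many such blocks does not make the network harder to train.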
DenseNet (2016, Huang et al.)
Dense connections between layers to encourage feature reuse.
MobileNet (2017, Howard et al.)
Lightweight CNN for mobile and embedded systems, built on depthwise separable convolutions (a depthwise convolution followed by a pointwise 1x1 convolution).
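The saving from splitting a convolution into a depthwise step and a pointwise step can be checked by counting weights. This is a back-of-the-envelope sketch that ignores biases; the kernel size and channel counts are arbitrary example values.

```python
def standard_conv_params(kernel_size, c_in, c_out):
    """Weights of a regular convolution: one k x k x c_in filter per output channel."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size, c_in, c_out):
    """Weights of a depthwise convolution followed by a pointwise (1x1) one."""
    depthwise = kernel_size * kernel_size * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1x1 convolution mixing channels
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
reduction = standard / separable
```

For a 3x3 kernel mapping 32 channels to 64, the separable version needs 2,336 weights instead of 18,432, roughly an 8-fold reduction.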
EfficientNet (2019, Tan & Le)
Systematic (compound) scaling of network depth, network width, and input image resolution to optimize the performance/complexity ratio.
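Compound scaling ties the three dimensions to a single coefficient φ. The sketch below uses the base constants reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15); the function names are hypothetical, and real implementations also round the results to whole layer and channel counts.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper


def compound_scale(phi):
    """Multipliers applied to the baseline network for a given phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_multiplier(phi):
    """Cost grows roughly with depth * width^2 * resolution^2."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


baseline = compound_scale(0)
scaled = compound_scale(3)
cost_per_step = flops_multiplier(1)
```

The constants are chosen so that α · β² · γ² is close to 2, meaning each increment of φ roughly doubles the computational cost.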
3. Recurrent Neural Networks (RNN)
Simple RNNs
Propagate information along a sequence through recurrent connections (loops).
Struggle with long-term dependencies (vanishing and exploding gradient problem).
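The recurrence and the gradient problem can both be seen in a scalar toy RNN. This is a minimal sketch: the weights `w_x` and `w_h` are arbitrary values rather than learned ones, and the hidden state is a single number instead of a vector.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """h_t = tanh(w_x * x_t + w_h * h_prev + b) for a scalar hidden state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.9, b=0.0):
    """Feed the sequence one element at a time, reusing the hidden state."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


def gradient_through_time(states, w_h=0.9):
    """d h_T / d h_0 is a product of per-step factors w_h * (1 - h_t^2)."""
    grad = 1.0
    for h in states:
        grad *= w_h * (1.0 - h * h)
    return grad


states = run_rnn([1.0] * 50)
long_range_gradient = gradient_through_time(states)
```

With 50 steps, the product of per-step factors (each smaller than 1 in magnitude here) shrinks toward zero, which is why early inputs barely influence the training signal.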
LSTM (Long Short-Term Memory) (1997, Hochreiter & Schmidhuber)
Introduces gates (input, forget, output) and a cell state to mitigate the vanishing gradient problem.
Widely used for natural language processing, time series modeling.
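The role of the gates can be sketched with a scalar LSTM cell. The parameter layout (a dictionary of `(w_x, w_h, bias)` tuples) and the hand-picked values are assumptions for illustration; real cells use learned weight matrices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM; params maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x_t + w_h * h_prev + bias

    forget = sigmoid(pre("forget"))        # how much old memory to keep
    write = sigmoid(pre("input"))          # how much new content to write
    expose = sigmoid(pre("output"))        # how much memory to reveal
    candidate = math.tanh(pre("candidate"))
    c_t = forget * c_prev + write * candidate
    h_t = expose * math.tanh(c_t)
    return h_t, c_t


# Hand-picked gates: forget gate wide open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.7
for x_t in [1.0, -1.0, 0.5] * 10:
    h, c = lstm_step(x_t, h, c, params)
```

With the forget gate open and the input gate closed, the cell state stays at about 0.7 across 30 steps, which illustrates how the additive cell update lets information (and gradients) survive over long spans.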
GRU (Gated Recurrent Unit) (2014, Cho et al.)
Simplifies the LSTM structure (two gates, no separate cell state) while achieving comparable performance on many sequence tasks.
Bidirectional RNN, LSTM, GRU
Process sequences in both directions (past and future).
4. “Transformer” networks and variants
Transformer (2017, Vaswani et al.)
Relies on an attention mechanism to process sequences, without recurrence or convolution.
Revolutionized natural language processing (NLP).
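The core of the attention mechanism, scaled dot-product attention, can be sketched in plain Python. This is a single-head sketch without the learned projections, multi-head splitting, or positional encodings of a full Transformer; the toy token vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)  # how much this query attends to each position
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
        )
    return outputs


# Self-attention: the same toy token vectors act as queries, keys and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Every position is compared with every other position in one step, so no loop over time is needed, unlike in an RNN.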
BERT (Bidirectional Encoder Representations from Transformers) (2018, Devlin et al.)
Transformer encoder pre-trained bidirectionally (each token sees both its left and right context) for NLP tasks.
GPT (Generative Pre-trained Transformer) (2018, Radford et al., OpenAI)
Series of models (GPT, GPT-2, GPT-3, GPT-3.5, GPT-4, etc.) for text generation and other tasks.
Use a unidirectional (left-to-right) autoregressive Transformer.
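"Autoregressive" here means that position i may only attend to positions up to i. A minimal sketch of such a causal mask follows; the function name and the uniform example scores are assumptions for illustration.

```python
import math


def causal_attention_weights(scores):
    """Softmax each row over positions j <= i only; later positions get weight 0."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # the token itself and everything before it
        peak = max(visible)
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        hidden = [0.0] * (len(row) - i - 1)  # masked "future" positions
        weights.append([e / total for e in exps] + hidden)
    return weights


# With identical scores, each token spreads attention evenly over its past.
uniform_scores = [[0.0] * 4 for _ in range(4)]
masked = causal_attention_weights(uniform_scores)
```

The resulting weight matrix is lower triangular, so the prediction for each token cannot peek at the tokens it is supposed to generate.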
RoBERTa, DistilBERT, XLNet, T5, etc.
Variants of, improvements on, or alternatives to BERT/GPT, each optimized for specific use cases (speed, performance, size, etc.).
5. Autoencoder-type neural networks
Classic autoencoder
Compresses the input through a low-dimensional bottleneck and tries to reconstruct it.
Useful for anomaly detection, compression or noise filtering.
Variational Autoencoder (VAE) (2013, Kingma & Welling)
Probabilistic approach to modeling the latent distribution.
Used in image generation, data interpolation, etc.
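Two ingredients of the probabilistic approach can be sketched without a full network: the reparameterization trick for sampling the latent code, and the KL term that pulls the latent distribution toward a standard normal. The encoder outputs (`mu`, `log_var`) are hypothetical fixed values here rather than the output of a trained encoder.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), keeping mu and sigma differentiable."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [1.0, -0.5], [0.0, 0.0]  # hypothetical encoder outputs
z = reparameterize(mu, log_var, rng)
kl_penalty = kl_to_standard_normal(mu, log_var)
```

The training loss of a VAE adds this KL penalty to the reconstruction error; the penalty is zero only when the encoder outputs exactly a standard normal.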
Denoising Autoencoder (DAE)
Learns to remove artificially added noise in order to reconstruct the clean input.
Sparse Autoencoder
Uses a regularization penalty to force sparsity of representations.
Convolutional Autoencoder
Autoencoder applied to images, using convolution layers in the encoder and transposed convolution ("deconvolution") layers in the decoder.
6. Generative Adversarial Networks (GAN)
Basic GAN (2014, Goodfellow et al.)
Two networks (a generator and a discriminator) train competitively to generate samples (images, sounds, etc.).
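The competition between the two networks is expressed through their losses. The sketch below evaluates them on single discriminator outputs (probabilities that a sample is real); it uses the non-saturating generator loss, a common variant, and the function names are assumptions.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating variant: push D(fake) toward 1."""
    return -math.log(d_fake)


# At the theoretical equilibrium the discriminator cannot tell real from fake.
equilibrium_loss = discriminator_loss(0.5, 0.5)
fooled = generator_loss(0.9)   # discriminator believes the fake sample
caught = generator_loss(0.1)   # discriminator spots the fake sample
```

The generator's loss falls as the discriminator is fooled more often, while the discriminator's loss at equilibrium settles at 2 · ln 2.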
DCGAN (Deep Convolutional GAN)
Uses convolutional layers in both the generator and the discriminator; particularly effective for images.
WGAN (Wasserstein GAN), WGAN-GP
Replace the original GAN objective with the Wasserstein distance for more stable training; WGAN-GP adds a gradient penalty.
CycleGAN (2017, Zhu et al.)
Translates images between two domains (e.g. horse ↔ zebra) without requiring paired training examples.
StyleGAN (2018, Karras et al.)
Style-based generator known for highly realistic face images; controls image style through an intermediate latent space.
BigGAN (2018, Brock et al.)
Large-scale GAN trained on ImageNet with excellent generation performance.
7. Graph Neural Networks (GNN)
Graph Convolutional Network (GCN)
Convolution adapted to the structure of a graph.
Used for node classification, link prediction, etc.
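One graph convolution layer can be sketched as "average over neighbors, then apply a dense layer". The sketch follows the commonly used symmetric normalization with self-loops; the three-node path graph, the one-dimensional features, and the identity weight matrix are toy assumptions.

```python
import math


def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """ReLU(A_norm @ H @ W): mix neighbor features, then transform them."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Path graph 0 - 1 - 2, with a signal only on node 0.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]
hidden = gcn_layer(adjacency, features, [[1.0]])
```

After one layer the signal has reached node 1 but not node 2; stacking layers lets information travel further across the graph.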
Graph Attention Network (GAT)
Uses the attention mechanism to weight the influence of neighbors in a graph.
GraphSAGE
Samples and aggregates neighbors to handle very large graphs.
8. Models for time signal processing and time series
WaveNet (2016, DeepMind)
Dilated causal convolution model for raw audio waveform generation (voice, music).
Temporal Convolutional Network (TCN)
Dilated convolution-based approach for capturing long-range temporal dependencies.
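The dilated causal convolutions behind WaveNet and TCNs can be sketched on a 1-D signal. This is a minimal sketch with a fixed kernel of ones and no nonlinearity; the dilation schedule 1, 2, 4 is an example value.

```python
def causal_dilated_conv(signal, kernel, dilation):
    """y[t] = sum_k kernel[k] * signal[t - k * dilation]; samples before t = 0 count as zero."""
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation
            if index >= 0:  # causal: never look at future samples
                total += weight * signal[index]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Number of past samples (including the current one) that can affect one output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Push an impulse through three stacked layers with doubling dilation.
dilations = [1, 2, 4]
response = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for d in dilations:
    response = causal_dilated_conv(response, [1.0, 1.0], d)
span = receptive_field(2, dilations)
```

The impulse at time 0 spreads over exactly 8 output samples, matching the computed receptive field: doubling the dilation at each layer makes the covered time span grow exponentially with depth.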
Transformer applied to time series
Transformer extensions (Informer, LogTrans, etc.) to specifically handle long time series.
9. Reservoir networks and other less common approaches
Echo State Network (ESN)
Recurrent reservoir network where only the output layer is trained (the recurrent weights remain fixed).
Liquid State Machine (LSM)
Reservoir approach built on spiking neural networks, inspired by biological models of the brain.
10. Probabilistic and Bayesian Neural Networks
Bayesian Neural Networks
Incorporate uncertainty in weights by using probability distributions rather than fixed values.
MC Dropout
Keeps dropout active at test time and aggregates several stochastic forward passes to approximate Bayesian behavior.
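MC Dropout can be sketched with a toy linear model: run the same input through the network many times with dropout switched on, then read the mean as the prediction and the spread as an uncertainty estimate. The model, the dropout rate, and the sample count below are assumptions for illustration.

```python
import random
import statistics


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def stochastic_forward(x, weights, rate, rng):
    """Toy linear model with dropout left ON at prediction time."""
    return sum(w * v for w, v in zip(weights, dropout(x, rate, rng)))


def mc_dropout_predict(x, weights, rate=0.5, n_samples=200, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, rate, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


x, weights = [1.0, 2.0, 3.0], [0.5, -0.2, 0.1]
mean, spread = mc_dropout_predict(x, weights)
exact_mean, exact_spread = mc_dropout_predict(x, weights, rate=0.0)
```

With the rate set to 0 every pass is identical and the spread collapses to zero; with dropout on, the spread gives a rough, inexpensive indication of how uncertain the model is.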
11. Diffusion Models
Denoising Diffusion Probabilistic Model (DDPM)
A forward diffusion process gradually buries the image in noise; the model then learns the reverse process to generate images from noise.
Adopted for image generation (e.g. DALL-E 2, Stable Diffusion).
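The forward (noising) half of the process has a closed form, which the sketch below implements. It assumes the linear noise schedule from the DDPM paper (β from 1e-4 to 0.02 over 1000 steps); the function names and the two-value "image" are hypothetical.

```python
import math
import random


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly."""
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: fraction of signal left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sample(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return x_t, noise


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x_t, noise = noise_sample([1.0, -1.0], 999, alpha_bars, random.Random(0))
```

By the last step almost none of the original signal remains, so `x_t` is nearly pure noise. During training, the network is typically asked to predict `noise` from `x_t` and `t`, which is what lets it run the process in reverse.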
Score-Based Generative Models
Learn a score function to estimate the gradient of the log of the data density.
Latent Diffusion Models
Apply diffusion in latent space (like Stable Diffusion) to make training and generation more efficient.
12. Other variants and research trends
Capsule Networks (CapsNets, 2017, Hinton et al.)
"Capsules" attempt to model hierarchies of object parts.
Considered promising for image recognition with less training data.
Neural Ordinary Differential Equations (Neural ODEs)
Interpret forward propagation as solving an ordinary differential equation.
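The link between layers and differential equations can be sketched with a fixed-step Euler solver: each step `h + dt * f(h, t)` has the same shape as a residual block. The `dynamics` function below is a hypothetical stand-in for the learned network, chosen because its exact solution is known; practical Neural ODEs use adaptive solvers.

```python
import math


def euler_solve(f, h0, t0, t1, n_steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler."""
    h, t = h0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        h = h + dt * f(h, t)  # same form as a residual update h + F(h)
        t += dt
    return h


def dynamics(h, t):
    """Stand-in for the learned network: dh/dt = -h, so h(t) = h0 * exp(-t)."""
    return -h


approx = euler_solve(dynamics, 1.0, 0.0, 1.0, 1000)
exact = math.exp(-1.0)
```

Using more steps brings the result closer to the exact solution, which corresponds to treating depth as a continuous quantity rather than a fixed number of layers.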
Siamese Networks
Two (or more) networks sharing the same weights to compare representations (image search, recognition of similar entities).
Meta-Learning / Few-Shot Learning
Learn how to learn from a few samples (e.g. MAML, Prototypical Networks).
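The classification rule of Prototypical Networks can be sketched as "nearest class mean". In the real method the embeddings come from a trained network; here they are hypothetical fixed 2-D vectors, and the class names are made up for the example.

```python
def prototypes(support):
    """Map each label to the mean of its few support embeddings."""
    result = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        result[label] = [
            sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)
        ]
    return result


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Assign the query to the class with the nearest prototype."""
    return min(protos, key=lambda label: squared_distance(query, protos[label]))


# Two classes with two labelled examples each (a "2-way, 2-shot" task).
support_set = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
protos = prototypes(support_set)
first = classify([1.0, 1.0], protos)
second = classify([4.0, 5.0], protos)
```

New classes can be added at test time by supplying a handful of examples, with no retraining of the embedding network.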
Continual Learning
Continuously updating knowledge across a sequence of tasks, transferring what was learned without forgetting previous tasks (catastrophic forgetting).
Conclusion
It is impossible to provide an exhaustive list of “all” neural models, as the field is constantly evolving.
The architectures cited represent the largest or most influential families (historically or currently).
Each family comes in multiple variations and improvements that appear regularly.
