The diversity of models

Apr 14, 2025

There are a huge number of neural network architectures and models. In practice, it is impossible to list them all, as new ones or variations of existing models are regularly created. However, they can be grouped into broad families and the most common, recognized, and influential architectures can be cited. Here is a (non-exhaustive) list organized by category:

1. Basic Neural Networks

Perceptron
- The initial model, proposed by Frank Rosenblatt in 1957.
- Conceptual basis of many neural networks.
Multi-Layer Perceptron (MLP)
- Architecture composed of dense layers (fully connected).
- Each neuron is connected to all neurons in the next layer.
- Used in many simple classification or regression cases.
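To make the perceptron idea concrete, here is a minimal sketch of the classic perceptron learning rule on the logical AND function. The toy dataset, the learning rate, and the epoch count are illustrative assumptions, not values taken from this article.

```python
# Perceptron learning rule on the logical AND function.
# Dataset, learning rate and epoch count are illustrative assumptions.

def predict(weights, bias, x):
    """Return 1 if the weighted sum is above the threshold, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, lr=0.1, epochs=20):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            # Move the decision boundary only when the prediction is wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [0, 0, 0, 1]
w, b = train_perceptron(X, y_and)
predictions = [predict(w, b, x) for x in X]
```

AND is linearly separable, so a single perceptron can learn it. A function such as XOR is not, which is one motivation for stacking layers into an MLP.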

2. Convolutional Neural Networks (CNN)

LeNet (1998, Yann LeCun)
- One of the first CNNs applied to handwritten digit recognition (MNIST).
AlexNet (2012, Alex Krizhevsky et al.)
- First CNN to demonstrate breakthrough performance in image recognition (ImageNet).
VGG (2014, Simonyan & Zisserman)
- Deep networks using stacked 3x3 convolutional layers.
GoogLeNet (Inception) (2014, Szegedy et al.)
- Inception modules allow you to mix different filter sizes in the same layer.
ResNet (2015, He et al.)
- Introduces skip connections to facilitate learning very deep networks.
DenseNet (2016, Huang et al.)
- Dense connections between layers to encourage feature reuse.
MobileNet (2017, Howard et al.)
- Lightweight CNN for mobile and embedded systems, built on depthwise separable convolutions (a depthwise convolution followed by a pointwise 1x1 convolution).
EfficientNet (2019, Tan & Le)
- Systematic (compound) scaling of network depth, network width, and input image resolution to optimize the performance/complexity ratio.
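As a rough illustration of why the depthwise and pointwise split used by MobileNet is lightweight, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The layer sizes (3x3 kernel, 32 input channels, 64 output channels) are arbitrary assumptions, and biases are ignored.

```python
# Weight counts for one convolutional layer (biases ignored).
# The layer sizes below are arbitrary, chosen only for illustration.

def standard_conv_params(kernel, c_in, c_out):
    # Every output channel has its own k x k filter over all input channels.
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

standard = standard_conv_params(3, 32, 64)          # 18432
separable = depthwise_separable_params(3, 32, 64)   # 2336
reduction = standard / separable                    # roughly 7.9x fewer weights
```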

3. Recurrent Neural Networks (RNN)

Simple RNNs
- Sequential information propagation using loops.
- Often struggle with long-term dependencies (vanishing or exploding gradients).
LSTM (Long Short-Term Memory) (1997, Hochreiter & Schmidhuber)
- Introduces gates (input, forget, output) to mitigate the vanishing gradient problem.
- Widely used for natural language processing, time series modeling.
GRU (Gated Recurrent Unit) (2014, Cho et al.)
- Simplifies the LSTM structure while maintaining performance on sequences.
Bidirectional RNN, LSTM, GRU
- Process sequences in both directions (past and future).
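The gradient problem of simple RNNs can be seen in a one-neuron toy model. The sketch below runs a scalar recurrence h_t = tanh(w * h_{t-1} + u * x_t) and tracks how strongly the final state still depends on the initial state. The values of w, u, and the constant input are assumptions chosen for illustration.

```python
import math

def rnn_gradient_through_time(w, u, inputs, h0=0.0):
    """Run h_t = tanh(w*h_{t-1} + u*x_t); return |d h_t / d h_0| after each step."""
    h, grad, history = h0, 1.0, []
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)   # chain rule factor d h_t / d h_{t-1}
        history.append(abs(grad))
    return history

# Illustrative values: recurrent weight 0.9, input weight 0.5, constant input.
grad_history = rnn_gradient_through_time(w=0.9, u=0.5, inputs=[1.0] * 50)
```

Each factor in the product is smaller than 1 in magnitude here, so the influence of early time steps shrinks exponentially. Gated cells such as LSTM and GRU add a more direct path for this signal.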

4. “Transformer” networks and variants

Transformer (2017, Vaswani et al.)
- Relies on an attention mechanism to process sequences, without recurrence or convolution.
- Revolutionized natural language processing (NLP).
BERT (Bidirectional Encoder Representations from Transformers) (2018, Devlin et al.)
- Transformer-based model, trained bidirectionally for NLP tasks.
GPT (Generative Pre-trained Transformer) (2018, Radford et al., OpenAI)
- Series of models (GPT, GPT-2, GPT-3, GPT-3.5, GPT-4, etc.) for text generation and other tasks.
- Use a unidirectional (left-to-right), autoregressive Transformer decoder.
RoBERTa, DistilBERT, XLNet, T5, etc.
- Variants of or successors to BERT and GPT, each targeting a different trade-off (speed, performance, size, etc.).
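The attention mechanism at the core of these models can be sketched in a few lines of NumPy. This is a simplified single-head version that skips the learned query, key, and value projections; the random input and its shape are assumptions. The `causal` flag mimics the one-way, left-to-right masking used by GPT-style models.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    """softmax(q k^T / sqrt(d_k)) v, optionally masking future positions."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    if causal:
        # Position i may only attend to positions j <= i.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens with 8 features each (illustrative)
out, attn = scaled_dot_product_attention(x, x, x, causal=True)
```

With the causal mask, the first token can only attend to itself, so its output equals its own value vector. Without the mask, every token sees the whole sequence, as in BERT-style encoders.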

5. Autoencoder type neural networks

Classic autoencoder
- Reduces the dimension by a bottleneck and tries to reconstruct the input.
- Useful for anomaly detection, compression or noise filtering.
Variational Autoencoder (VAE) (2013, Kingma & Welling)
- Probabilistic approach to modeling the latent distribution.
- Used in image generation, data interpolation, etc.
Denoising Autoencoder (DAE)
- Learns to remove artificially added noise in order to reconstruct the clean input.
Sparse Autoencoder
- Uses a regularization penalty to force sparsity of representations.
Convolutional Autoencoder
- Autoencoder applied to images, built from convolution and transposed convolution (deconvolution) layers.
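The bottleneck idea shared by all of these variants can be sketched with a purely linear autoencoder trained by gradient descent. The synthetic data (8-dimensional points lying on a 2-dimensional subspace), the learning rate, and the step count are assumptions made for the example; real autoencoders add nonlinearities and are usually built with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
data = latent @ mixing                  # 8-D data lying on a 2-D subspace

w_enc = 0.1 * rng.normal(size=(8, 2))   # encoder: 8 -> 2 (the bottleneck)
w_dec = 0.1 * rng.normal(size=(2, 8))   # decoder: 2 -> 8

def reconstruction_loss(x):
    return float(np.mean((x @ w_enc @ w_dec - x) ** 2))

initial_loss = reconstruction_loss(data)
lr = 0.02
for _ in range(3000):
    code = data @ w_enc                 # compressed representation
    residual = code @ w_dec - data      # reconstruction error
    scale = 2.0 / residual.size
    grad_dec = scale * code.T @ residual
    grad_enc = scale * data.T @ (residual @ w_dec.T)
    w_dec -= lr * grad_dec
    w_enc -= lr * grad_enc
final_loss = reconstruction_loss(data)
```

A denoising variant would feed a corrupted copy of `data` to the encoder while still comparing the output against the clean `data`.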

6. Generative Adversarial Networks (GAN)

Basic GAN (2014, Goodfellow et al.)
- Two networks (a generator and a discriminator) train competitively to generate samples (images, sounds, etc.).
DCGAN (Deep Convolutional GAN)
- Use of convolutions for the generator and discriminator, particularly effective for images.
WGAN (Wasserstein GAN), WGAN-GP
- Replace the original GAN objective with one based on the Wasserstein distance for more stable training; WGAN-GP adds a gradient penalty.
CycleGAN (2017, Zhu et al.)
- Translates images between two domains (e.g. horse ↔ zebra) without requiring paired training examples.
StyleGAN (2018, Karras et al.)
- Known for generating highly realistic face images, using a style-based generator that modulates each layer from the latent code.
BigGAN (2018, Brock et al.)
- Large-scale GAN trained on ImageNet with excellent generation performance.
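The competition between generator and discriminator comes down to two loss functions. The sketch below computes them from discriminator outputs (probabilities that a sample is real). It uses the binary cross-entropy form of the basic GAN with the commonly used non-saturating generator loss; the example probabilities are made up for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(x) toward 1 and D(G(z)) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = [0.5, 0.5, 0.5]
loss_at_equilibrium = discriminator_loss(undecided, undecided)   # 2 * ln(2)
```

In training, the two losses are minimized in alternation: one step updates the discriminator, the next updates the generator.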

7. Graph Neural Networks (GNN)

Graph Convolutional Network (GCN)
- Convolution adapted to the structure of a graph.
- Used for node classification, link prediction, etc.
Graph Attention Network (GAT)
- Uses the attention mechanism to weight the influence of neighbors in a graph.
GraphSAGE
- Samples and aggregates neighbors to handle very large graphs.
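As a sketch of what "convolution adapted to a graph" can mean, the layer below follows the widely used GCN propagation rule: add self-loops, normalize the adjacency matrix symmetrically, then mix neighbor features through a weight matrix. The 4-node path graph and the random features and weights are assumptions for the example.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])         # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)    # ReLU

# Illustrative path graph 0 - 1 - 2 - 3 with random features and weights.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # 3 input features per node
weights = rng.normal(size=(3, 2))    # 2 output features per node
hidden = gcn_layer(adjacency, features, weights)
```

Relabeling the nodes only reorders the rows of the output, which is the property that makes such layers suitable for graphs.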

8. Models for time signal processing and time series

WaveNet (2016, DeepMind)
- Model built on dilated causal convolutions for raw audio waveform generation (voice, music).
Temporal Convolutional Network (TCN)
- Approach based on dilated causal convolutions for capturing long-range temporal dependencies.
Transformer applied to time series
- Transformer extensions (Informer, LogTrans, etc.) to specifically handle long time series.
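The appeal of dilated convolutions for long signals is that the receptive field grows quickly with depth. The sketch below computes the receptive field of a stack of dilated layers and implements a single causal dilated convolution in plain Python. The kernel values and the doubling dilation schedule are illustrative assumptions.

```python
def receptive_field(kernel_size, dilations):
    """Number of past samples visible to one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

def causal_dilated_conv(signal, kernel, dilation):
    """y[t] = sum_i kernel[i] * signal[t - i*dilation], with zeros before t = 0."""
    output = []
    for t in range(len(signal)):
        total = 0.0
        for i, weight in enumerate(kernel):
            index = t - i * dilation
            if index >= 0:            # never look at future or missing samples
                total += weight * signal[index]
        output.append(total)
    return output

dilations = [2 ** i for i in range(10)]    # 1, 2, 4, ..., 512
field = receptive_field(2, dilations)      # 1024 samples from only 10 layers
impulse = [1.0] + [0.0] * 7
response = causal_dilated_conv(impulse, kernel=[0.5, 0.5], dilation=4)
# response == [0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```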

9. Reservoir networks and other less common approaches

Echo State Network (ESN)
- Recurrent reservoir network in which only the output (readout) layer is trained; the input and recurrent weights remain fixed.
Liquid State Machine (LSM)
- Reservoir computing approach built from spiking neurons, inspired by biological models of the brain.
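A minimal echo state network can be sketched as follows: the reservoir weights are drawn once and left untouched, and only a linear readout is fitted, here with ridge regression. The reservoir size, the spectral radius of 0.9, the ridge strength, and the sine-wave task are assumptions for the example; practical setups usually also discard an initial washout period.

```python
import numpy as np

rng = np.random.default_rng(0)
n_reservoir = 50

# Fixed random weights: these are never trained.
w_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
w_res = rng.normal(size=(n_reservoir, n_reservoir))
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))   # spectral radius 0.9

def run_reservoir(inputs):
    state = np.zeros(n_reservoir)
    states = []
    for u in inputs:
        state = np.tanh(w_in * u + w_res @ state)
        states.append(state)
    return np.array(states)

# Toy task: predict the next value of a sine wave.
series = np.sin(0.2 * np.arange(301))
states = run_reservoir(series[:-1])
targets = series[1:]

# Only the readout is fitted, in closed form (ridge regression).
ridge = 1e-4
w_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_reservoir),
                        states.T @ targets)
train_mse = float(np.mean((states @ w_out - targets) ** 2))
```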

10. Probabilistic and Bayesian Neural Networks

Bayesian Neural Networks
- Incorporate uncertainty in weights by using probability distributions rather than fixed values.
MC Dropout
- Keeps dropout active at test time and averages several stochastic forward passes to approximate Bayesian behavior.
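MC Dropout is simple to sketch: run the same input through the network many times with dropout still switched on, then read the spread of the outputs as an uncertainty estimate. The tiny two-layer network below uses random, untrained weights purely as a placeholder; the dropout rate and the number of passes are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 16))   # placeholder weights (untrained)
w2 = rng.normal(size=(16, 1))

def forward_with_dropout(x, drop_prob=0.5):
    hidden = np.maximum(x @ w1, 0.0)                 # ReLU layer
    keep = rng.random(hidden.shape) >= drop_prob     # fresh random mask per call
    hidden = hidden * keep / (1.0 - drop_prob)       # inverted dropout scaling
    return hidden @ w2

x = rng.normal(size=(3, 4))                          # 3 inputs with 4 features
samples = np.stack([forward_with_dropout(x) for _ in range(100)])
predictive_mean = samples.mean(axis=0)               # averaged prediction
predictive_std = samples.std(axis=0)                 # spread = uncertainty proxy
```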

11. Diffusion Models

Denoising Diffusion Probabilistic Model (DDPM)
- A forward diffusion process gradually adds noise to the image until it is nearly pure noise; the model learns to reverse this process step by step.
- Adopted for image generation (e.g. DALL-E 2, Stable Diffusion).
Score-Based Generative Models
- Learn a score function to estimate the gradient of the log of the data density.
Latent Diffusion Models
- Apply diffusion in latent space (like Stable Diffusion) to make training and generation more efficient.
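The noising half of a diffusion model has a closed form, sketched below: a sample at step t is a weighted mix of the clean input and Gaussian noise. The linear schedule from 1e-4 to 0.02 over 1000 steps is a common choice assumed here, and the 8x8 random array stands in for an image. The learned reverse (denoising) network is not shown.

```python
import numpy as np

num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)   # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)          # fraction of signal left at step t

def q_sample(x0, t, noise):
    """Sample x_t directly from x_0: sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))     # stand-in for a normalized image
noise = rng.normal(size=x0.shape)
x_early = q_sample(x0, 10, noise)                # still close to the image
x_late = q_sample(x0, num_steps - 1, noise)      # almost pure noise
```

Training then amounts to showing the network `x_t` and `t` and asking it to predict `noise`, which is what lets it run the process in reverse at generation time.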

12. Other variants and research trends

Capsule Networks (CapsNets, 2017, Hinton et al.)
- "Capsules" attempt to model hierarchies of object parts.
- Considered promising for image recognition with less training data.
Neural Ordinary Differential Equations (Neural ODEs)
- Interpret forward propagation as solving an ordinary differential equation.
Siamese Networks
- Two (or more) networks sharing the same weights to compare representations (image search, recognition of similar entities).
Meta-Learning / Few-Shot Learning
- Learn how to learn from a few samples (e.g. MAML, Prototypical Networks).
Continual Learning
- Transfers and continuously updates knowledge across a sequence of tasks without forgetting earlier ones (catastrophic forgetting).
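Of the ideas in this category, the Siamese setup is the easiest to sketch: both inputs pass through the same embedding function with the same weights, and the comparison happens in embedding space. The one-layer embedding with random, untrained weights below is a placeholder assumption; in practice the weights are trained so that similar items end up close together.

```python
import numpy as np

rng = np.random.default_rng(0)
# One weight matrix shared by both branches (random placeholder, untrained).
shared_weights = rng.normal(size=(10, 4)) / np.sqrt(10)

def embed(x):
    return np.tanh(x @ shared_weights)

def siamese_distance(a, b):
    """Euclidean distance between the two embeddings."""
    return float(np.linalg.norm(embed(a) - embed(b)))

anchor = rng.normal(size=10)
near = anchor + 0.01 * rng.normal(size=10)   # slightly perturbed copy
far = rng.normal(size=10)                    # unrelated input
```

Because the weights are shared, the distance is symmetric and identical inputs always map to distance zero, whatever the weights are.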

Conclusion

It is impossible to provide an exhaustive list of “all” neural models, as the field is constantly evolving.
The architectures cited represent the major or most influential families (historically or currently).
Each family comes in multiple variations and improvements that appear regularly.

AI4Cryptos