CNNs
- ConvNet
- ResNet
- EfficientNet
Best for: Image classification and feature extraction
Aliases: ConvNet, ResNet, EfficientNet
How it works
$$y(i,j)=\sum_{m,n}w(m,n)\,x(i+m,j+n)+b$$
A conv layer slides learnable kernels across the input, computing the cross-correlation above, followed by a nonlinearity (typically ReLU). Stacking convolutions with pooling (e.g. max over local windows) shrinks spatial size and builds a feature hierarchy from edges up to object parts. Weight sharing keeps parameter counts low and makes each layer translation-equivariant, while pooling adds approximate translation invariance. Very deep variants (ResNet, EfficientNet) add skip connections $h^{(l+1)}=\mathcal F(h^{(l)})+h^{(l)}$ so gradients can flow through the identity path, letting networks with hundreds of layers train. A final fully-connected head maps pooled features to class logits and is trained with cross-entropy.
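The building blocks described here can be sketched in a few lines of NumPy. This is a minimal single-channel illustration, not a production layer: it uses the usual deep-learning cross-correlation convention (indexing the input at $(i+m, j+n)$), "valid" borders, and a hand-picked difference kernel. The names `cross_correlate2d`, `max_pool2d`, `residual_block`, and `edge_kernel` are hypothetical and chosen for this example only.

```python
import numpy as np

def cross_correlate2d(x, w, b=0.0):
    """Valid-mode cross-correlation: y(i,j) = sum_{m,n} w(m,n) * x(i+m, j+n) + b."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    y = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the kernel with the window anchored at (i, j).
            y[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return y

def relu(z):
    return np.maximum(z, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def residual_block(h, f):
    """Skip connection h_next = f(h) + h; f must preserve the shape of h."""
    return f(h) + h

# Toy input: a 6x6 ramp whose value grows by 1 per column.
x = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[-1.0, 1.0]])  # horizontal difference (illustrative, not learned)

features = relu(cross_correlate2d(x, edge_kernel))   # shape (6, 5)
pooled = max_pool2d(features, size=2)                # shape (3, 2)
skipped = residual_block(pooled, lambda h: 0.5 * h)  # stand-in for a learned F
```

In a real network the kernels are learned, there are many input and output channels, and `f` inside the residual block is itself a stack of convolutions; the lambda here only stands in for that mapping.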
When to use
Image classification and feature extraction where inductive biases of locality and translation invariance help.
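The translation bias mentioned here can be checked numerically. The sketch below assumes a single-channel input and a random 3x3 kernel (both made up for illustration) and shows that shifting the input by one column shifts the feature map by one column, away from the border. The helper `cross_correlate2d` is a hypothetical name, implemented with NumPy's `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def cross_correlate2d(x, w):
    """Valid-mode cross-correlation via sliding windows."""
    windows = sliding_window_view(x, w.shape)   # (out_h, out_w, kh, kw)
    return np.einsum("ijmn,mn->ij", windows, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))   # toy "image"
w = rng.normal(size=(3, 3))   # toy kernel

y = cross_correlate2d(x, w)

# Shift the input one column to the right (np.roll wraps around at the border).
x_shifted = np.roll(x, shift=1, axis=1)
y_shifted = cross_correlate2d(x_shifted, w)

# Away from the wrapped column, the output is the same map shifted by one column.
equivariant = np.allclose(y_shifted[:, 1:], y[:, :-1])
```

Strictly, this property of the convolution is equivariance (the output moves with the input). Pooling over the shifted map is what turns it into approximate invariance.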
Watch out
Typically needs strong data augmentation; data-hungry at scale; Vision Transformers (ViTs) often outperform CNNs when pretrained on very large datasets.
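As a rough illustration of what augmentation can look like, the sketch below applies a random horizontal flip followed by a random crop after zero-padding. The function name `augment`, the padding of 2 pixels, and the 50% flip probability are assumptions for this example. Real pipelines usually combine more transforms and tune them per dataset.

```python
import numpy as np

def augment(image, rng, pad=2):
    """Random horizontal flip, then a random crop back to the original size."""
    h, w = image.shape[:2]
    if rng.random() < 0.5:
        image = image[:, ::-1]  # flip left-right
    # Zero-pad the two spatial axes only; leave any channel axis untouched.
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="constant")
    # Pick a crop offset in [0, 2 * pad]; the upper bound of integers() is exclusive.
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))   # toy RGB image with values in [0, 1)
augmented = augment(image, rng)
```

Flips like this are only appropriate when the label does not depend on left/right orientation, which is worth checking in domains such as medical imaging.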
Common fields
Medical imaging · manufacturing · retail