Neural Networks / MLPs

Best for: Nonlinear supervised learning

How it works

$$a^{(l)}=\sigma\!\left(W^{(l)}a^{(l-1)}+b^{(l)}\right)$$

Stacks affine layers $z^{(l)}=W^{(l)}a^{(l-1)}+b^{(l)}$ followed by a nonlinear activation $a^{(l)}=\sigma(z^{(l)})$, composing many such layers to approximate complex functions. Training minimises a loss $L$ over the data by gradient descent, with gradients computed by backpropagation — the chain rule applied layer by layer: $\delta^{(l)}=(W^{(l+1)})^\top\delta^{(l+1)}\odot\sigma'(z^{(l)})$ and $\frac{\partial L}{\partial W^{(l)}}=\delta^{(l)}\bigl(a^{(l-1)}\bigr)^\top$.

Common fields

Tabular DL · forecasting · embeddings · scientific ML