Why Non-Linearity
Without non-linear activation functions, a neural network of any depth is mathematically equivalent to a single linear transformation. Non-linearity is what gives networks the ability to learn complex patterns, curved decision boundaries, and hierarchical representations.
The universal approximation theorem guarantees that networks with non-linear activations can approximate any continuous function — but only if that non-linearity is present.
↓ Continue reading to unlock the evaluation ↓