Why Non-Linearity

Without non-linear activation functions, a neural network of any depth is mathematically equivalent to a single linear (strictly, affine) transformation, because composing linear layers only ever yields another linear map. Non-linearity is what lets networks learn complex patterns, curved decision boundaries, and hierarchical representations.
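The collapse can be checked numerically. The sketch below is a minimal illustration rather than a reference implementation: the weights, layer sizes, and helper names (`matvec`, `matmul`, `add`, `relu`) are hypothetical values chosen for this example. It builds two stacked layers with no activation, folds them into one weight matrix and one bias vector, and then shows that inserting a ReLU between the layers changes the output.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


# Hypothetical network: 3 inputs -> 4 hidden units -> 2 outputs.
W1 = [[1.0, -2.0, 0.5],
      [0.0, 1.0, -1.0],
      [2.0, 0.0, 1.0],
      [-1.0, 1.0, 0.0]]
b1 = [0.1, -0.2, 0.3, 0.0]
W2 = [[1.0, 0.0, -1.0, 2.0],
      [0.5, 1.0, 0.0, -1.0]]
b2 = [0.0, 1.0]

x = [0.5, -1.0, 2.0]

# Two layers with no activation in between.
deep_out = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same map as a single layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
flat_out = add(matvec(W, x), b)

# Largest coordinate-wise difference; zero up to floating-point rounding.
max_gap = max(abs(p - q) for p, q in zip(deep_out, flat_out))

# With a ReLU between the layers, the single-layer shortcut no longer applies.
relu_out = add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)
relu_gap = max(abs(p - q) for p, q in zip(relu_out, flat_out))
```

Here `max_gap` stays at rounding-error level for any input, since the identity W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2) holds exactly. `relu_gap` is non-zero for this input because two of the hidden pre-activations are negative and get clipped.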

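The universal approximation theorem guarantees that a feedforward network with a suitable non-linear activation and enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy. The guarantee depends on the non-linearity: with purely linear activations, the network can only represent linear functions, however many layers or units it has.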
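To make the approximation idea concrete, here is a small hand-built sketch, not a proof of the theorem and not a trained network. It assumes a one-dimensional target (x squared on [0, 1]) and constructs a one-hidden-layer ReLU network whose weights are set directly so that it interpolates the target at evenly spaced knots. The names `build_relu_approximator` and `max_error` are hypothetical helpers introduced only for this illustration.

```python
def relu(z):
    """Scalar ReLU."""
    return max(0.0, z)


def build_relu_approximator(f, a, b, n_units):
    """Return g(x) = f(a) + sum_i c_i * relu(x - k_i).

    g is the piecewise-linear interpolant of f on n_units equal
    segments of [a, b]; each hidden ReLU unit contributes one kink.
    """
    h = (b - a) / n_units
    knots = [a + i * h for i in range(n_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / h for i in range(n_units)]
    # The first unit sets the initial slope; every later unit adds the
    # change in slope that takes effect at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = f(a)

    def g(x):
        # zip stops after n_units pairs, so the final knot (x = b) is unused.
        return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

    return g


def max_error(f, g, a, b, samples=1000):
    """Largest |f(x) - g(x)| over an evenly spaced sample grid on [a, b]."""
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in xs)


def target(x):
    return x * x


err_4 = max_error(target, build_relu_approximator(target, 0.0, 1.0, 4), 0.0, 1.0)
err_16 = max_error(target, build_relu_approximator(target, 0.0, 1.0, 16), 0.0, 1.0)
```

For this particular target the interpolation error is at most h squared over 4, where h is the segment width, so going from 4 to 16 hidden units shrinks the worst-case error by a factor of 16. An affine model has no kinks to add, so it cannot improve in this way.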
