In this article, you will learn about activation functions through a real-life analogy. You will also learn why they are needed and what types exist.


Activation functions are an important component of neural networks. They help to determine the output of a neural network by applying mathematical transformations to the input signals received from other layers in a network. Activation functions allow for complex non-linear relationships between input and output data points. 

The choice of function depends largely on the problem you are trying to solve with your model and on any constraints imposed by hardware capabilities or time/space limitations. This article will guide you on which activation function to use and when, and will illustrate activation functions with a real-life analogy.

Table of contents

  • What are activation functions?
  • Real-life analogy for activation functions
  • Why use activation functions?
  • Types of activation functions
  • How to choose activation functions?
  • Activation function Python code
  • Summary chart
  • Summary table

What are Activation Functions?

A neural network activation function is a function that introduces nonlinearity into the model.

A neural network has multiple nodes in each layer, and in a fully connected network, every node in one layer is connected to every node in the next. Consider computing the value of the first neuron in the second layer: each neuron's output in the first layer is multiplied by a weight (learned during training), the products are summed, and a bias (also learned) is added to the sum.
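The computation above can be sketched in a few lines of NumPy. The input values, weights, and bias below are made-up numbers chosen only to illustrate the weighted sum:

```python
import numpy as np

# Hypothetical outputs of a 3-neuron first layer
inputs = np.array([0.5, -1.2, 3.0])
# Hypothetical learned weights connecting them to one neuron in the second layer
weights = np.array([0.4, 0.1, -0.7])
bias = 0.2  # learned bias

# Weighted sum plus bias: z = w . x + b
z = np.dot(inputs, weights) + bias
print(z)
```

This `z` is what an activation function then transforms before the value is passed on to the next layer.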

Learn more – What is Deep Learning?


Real-life analogy for Activation Functions

Imagine that a neural network is a hose:

It takes in water (takes some input), carries it somewhere (modifies your input), and pushes the water out (produces some output).

Without an activation function, your hose acts more like a steel pipe: fixed and inflexible. Sometimes that’s good enough; there is nothing wrong with using a pipe to deliver your water.

But if the path to your destination bends, a rigid steel pipe won’t fit, no matter how you rotate it. An activation function is handy here because it allows your function to be more flexible.

In this case, a neural net with an activation function acts like a plastic garden hose. You can bend it to your specific needs and carry your water to many more places that are impossible to reach with a steel pipe.

So, the purpose of an activation function is to add flexibility to your hose (nonlinearity to your neural net).


Why use activation functions?

1. The main objective of activation functions is to add non-linearity to the network so that it can model more intricate and varied relationships between inputs and outputs. In the absence of activation functions, the network could only perform linear transformations, which cannot adequately represent the complexity and nuance of real-world data. Since neural networks need to implement complex mapping functions, non-linear activation functions are required to introduce the nonlinearity that lets a network approximate any function.

2. Normalizing each neuron’s output is another key benefit of activation functions. Depending on the inputs it receives and the weights associated with those inputs, a neuron’s output can range from extremely high to extremely low. Activation functions ensure that each neuron’s output falls inside a defined range, which makes the network simpler to optimize during training.

Types of Activation Functions

Sigmoid activation function

The sigmoid activation function is a mathematical function used in artificial neural networks to classify information. It maps any input onto a value between 0 and 1, which can then be interpreted as a probability or a true/false decision. A common example is an image recognition system deciding whether an object in an image is a cat: if the sigmoid output for that object exceeds 0.5, it is classified as “cat-like”; otherwise, it isn’t. Its advantage over other functions lies in its smoothness: small variations in the input don’t change the output too much, making predictions more stable overall.

Note: This function suffers from vanishing gradient problems.
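A minimal sketch of the sigmoid function, σ(x) = 1 / (1 + e⁻ˣ), applied to the cat/not-cat threshold described above:

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)); squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))   # 0.5, the decision boundary
print(sigmoid(10.0))  # close to 1: confidently "cat-like"
print(sigmoid(-10.0)) # close to 0: confidently not a cat
```

Note that for large |x| the curve flattens out, which is exactly where the vanishing gradient problem mentioned above comes from.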

Tanh activation function

The tanh (hyperbolic tangent) activation function is similar to sigmoid but with some distinct differences: it maps inputs onto values between -1 and 1 rather than 0 and 1, which makes its output zero-centered and allows for more nuance when classifying objects based on their similarity scores across all the features the network considers at once. Tanh also has better gradient properties than sigmoid: its steeper derivative around zero lets gradients propagate back through the layers with less attenuation, allowing faster learning during training. This makes it a common choice when a bounded, zero-centered activation is needed.

Note: This function suffers from vanishing gradient problems.
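A short sketch showing the (-1, 1) range and zero-centered output that distinguish tanh from sigmoid (NumPy ships tanh directly):

```python
import numpy as np

def tanh(x):
    # tanh maps inputs into (-1, 1) and is zero-centered: tanh(0) = 0
    return np.tanh(x)

print(tanh(0.0))   # exactly 0.0, unlike sigmoid's 0.5
print(tanh(2.0))   # positive, approaching 1
print(tanh(-2.0))  # negative, approaching -1
```

Like sigmoid, tanh flattens out for large |x|, which is why it too suffers from vanishing gradients.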

 

Softmax activation function

The softmax function is often described as a combination of multiple sigmoids. We know that sigmoid returns a value between 0 and 1, which can be treated as the probability of a data point belonging to a particular class. Therefore, sigmoids are often used for binary classification problems.

The softmax function can be used for multiclass classification problems. It returns the probability of a data point belonging to each class: softmax(zᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ), so the outputs are non-negative and sum to 1.
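The formula above can be sketched as follows; the three raw scores are made-up values standing in for a network's output layer:

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(scores)
print(probs)        # one probability per class
print(probs.sum())  # the probabilities sum to 1
```

The class with the highest raw score always receives the highest probability, so the prediction is simply `probs.argmax()`.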

ReLU activation function

For hidden layers, ReLU is usually the most effective option, and it is computationally very efficient. However, for inputs less than 0 the output is a constant 0, so the gradient there is zero and the affected neurons can stop learning (the dying ReLU problem).

Note: If you are unsure about your choice of activation function, especially for hidden layers, go for the ReLU function.
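ReLU itself is just max(0, x), which is why it is so cheap to compute; a minimal sketch:

```python
import numpy as np

def relu(x):
    # ReLU: identity for positive inputs, 0 otherwise
    return np.maximum(0.0, x)

out = relu(np.array([-2.0, 0.0, 3.0]))
print(out)  # negative inputs are clipped to 0, positives pass through
```

The clipped zeros are exactly the "dying" region: once a neuron's input stays negative, its gradient is 0 and its weights stop updating.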

Leaky ReLU activation function

Leaky ReLU is the most popular and effective way to solve the dying ReLU problem. It adds a small slope in the negative direction so that neurons with negative inputs keep a non-zero gradient. Leaky ReLU is a variant of ReLU: instead of being 0 for z < 0, it allows a small constant non-zero gradient α (typically α = 0.01).
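The small negative slope described above can be sketched like this, using the typical α = 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # For x > 0 behave like ReLU; for x <= 0 keep a small slope alpha
    # so the gradient never becomes exactly zero.
    return np.where(x > 0, x, alpha * x)

out = leaky_relu(np.array([-2.0, 3.0]))
print(out)  # the negative input becomes -0.02 instead of 0
```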

Exponential linear units (ELU)

The Exponential Linear Unit (ELU) is an activation function that also speeds up the training of neural networks (similar to the ReLU function). The main advantage of the ELU function is that behaving as the identity for positive values avoids the vanishing gradient there and improves the learning properties of the model.

For negative inputs, ELU(x) = α(eˣ − 1), where α is the ELU hyperparameter, normally set to 1.0; it controls the saturation point for large negative inputs. The ELU function does have one drawback, though: the exponential makes it more computationally expensive than ReLU.

ELU can output negative values, which brings the mean unit activation closer to zero; this reduces the bias shift during training and improves learning speed. ELU is a great alternative to ReLU.
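The piecewise definition above, with the default α = 1.0, can be sketched as:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for x > 0; alpha * (e^x - 1) for x <= 0,
    # saturating smoothly toward -alpha for very negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

out = elu(np.array([-1.0, 0.0, 2.0]))
print(out)  # the negative input maps to e^-1 - 1, about -0.632
```

Unlike leaky ReLU's straight line, the negative branch saturates at -α, which is what bounds how negative an activation can get.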

How to choose activation functions?

| Consideration            | Activation Function                        |
| ------------------------ | ------------------------------------------ |
| Non-linearity            | Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, SELU |
| Differentiability        | Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, SELU |
| Range of output values   | Sigmoid, Softmax                           |
| Computational efficiency | ReLU, Leaky ReLU, ELU, SELU                |
| Resistance to saturation | ReLU, Leaky ReLU, ELU, SELU                |

Other points to remember

  • If the network is being used for binary classification, a sigmoid function with an output range between 0 and 1 would be suitable.
  • For multiclass classification, use the softmax activation function.
  • For other tasks such as anomaly detection, recommendation systems, or reinforcement learning, other activation functions such as the ReLU or the tanh functions may be used, depending on the specifics of the problem.
  • Some activation functions, such as sigmoid and tanh, may saturate at extreme values, leading to slower learning. In such cases, it may be better to use a function that does not saturate, such as ReLU.
  • For hidden layers, the best choice is usually ReLU.
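Putting the guidelines above together, a tiny forward pass with ReLU in the hidden layer and softmax at the output can be sketched as follows. The layer sizes and random weights are illustrative, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical sizes: 4 input features, 5 hidden units, 3 classes
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

hidden = relu(W1 @ x + b1)         # ReLU for the hidden layer
probs = softmax(W2 @ hidden + b2)  # softmax for multiclass output
print(probs)  # one probability per class, summing to 1
```

For binary classification, the softmax output layer would simply be replaced by a single sigmoid unit, per the first bullet above.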

Note: Other activation functions are available besides those listed here, and the choice of the optimal activation function may depend on the specific problem and neural network architecture.