Understanding RNNs
RNNs, or recurrent neural networks, are a class of artificial neural networks in which connections between nodes form a directed graph along a temporal sequence
Now, there are two important terms in this sentence —
- Directed graph
- Temporal sequence
We will come back to these two terms shortly. Let us say we have a sentence as an input —
X = Harry Potter and Hermione Granger invented a new spell
Now, we can apply RNNs to identify people's names in such sentences
A sample output for this sentence would be —
Y = 1 1 0 1 1 0 0 0 0 where each 1 signifies that the corresponding word is part of a person's name
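As a concrete illustration, here is this input/output pair written out as plain Python lists (a minimal sketch; the tokenisation is just a whitespace split):

```python
# The input sentence, split into words (one word per time step)
x = "Harry Potter and Hermione Granger invented a new spell".split()

# One binary label per word: 1 = part of a person's name, 0 = not
y = [1, 1, 0, 1, 1, 0, 0, 0, 0]

for word, label in zip(x, y):
    print(f"{word:10s} -> {label}")
```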
Now let us label this kind of input with a notation: x^&lt;t&gt; denotes the word at position t in the sentence, y^&lt;t&gt; denotes the corresponding output label, and T_x and T_y denote the lengths of the input and output sequences.
Now, there can be i = 1, …, m such sentences for which we would need to find the person names.
We can label each input and output with a superscript (i), so that x^(i)&lt;t&gt; is the t-th word of the i-th sentence and y^(i)&lt;t&gt; is its label.
To solve such problems, we first need a way to represent the words. For this, we create a dictionary (vocabulary) which stores the English words we care about. Let us say we are storing 10,000 words in this dictionary. Now, we can represent every word in the sentence as a one-hot encoded vector which is 10,000 elements long: all zeros except for a single 1 at the index of that word in the dictionary.
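A minimal sketch of this encoding in NumPy (the tiny word-to-index mapping below is just an illustrative assumption; a real dictionary would cover all 10,000 words):

```python
import numpy as np

VOCAB_SIZE = 10_000  # dictionary size from the text

# A toy word -> index mapping for our example sentence
word_to_index = {"harry": 0, "potter": 1, "and": 2, "hermione": 3,
                 "granger": 4, "invented": 5, "a": 6, "new": 7, "spell": 8}

def one_hot(word: str) -> np.ndarray:
    """Return a length-VOCAB_SIZE vector with a single 1 at the word's index."""
    vec = np.zeros(VOCAB_SIZE)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("potter")[:10])  # -> [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
```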
We can create such vectors for all the inputs in the data, which turns this into a supervised learning problem
RNN Architecture
Let us now understand how such a problem is handled by an RNN. The RNN takes each word of the sentence into its hidden layer, one at a time, and outputs whether that word is part of a person's name or not. Also, unlike a normal neural network, it passes its activation output on to the next copy of the network, which is already taking the next word of the sentence as input. This way, information from the previous steps is carried forward and included in the computation for the upcoming word. The network consuming the words of the sentence one by one is what we refer to as a temporal sequence
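A minimal NumPy sketch of one forward step of such an RNN cell (the weight names W_ax, W_aa, W_ya and the tanh/sigmoid activations follow a common convention; the sizes and random initialisation are illustrative assumptions):

```python
import numpy as np

hidden_size, vocab_size = 64, 10_000  # illustrative sizes

# Randomly initialised parameters
W_ax = np.random.randn(hidden_size, vocab_size) * 0.01   # input  -> hidden
W_aa = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
W_ya = np.random.randn(1, hidden_size) * 0.01            # hidden -> output
b_a = np.zeros((hidden_size, 1))
b_y = np.zeros((1, 1))

def rnn_step(x_t, a_prev):
    """One time step: combine the current word with the previous activation."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # new hidden state
    y_t = 1 / (1 + np.exp(-(W_ya @ a_t + b_y)))      # P(word is part of a name)
    return a_t, y_t

# A toy "sentence" of 9 one-hot word vectors (indices are arbitrary here)
sentence = []
for idx in range(9):
    x_t = np.zeros((vocab_size, 1))
    x_t[idx] = 1.0
    sentence.append(x_t)

# Scan the sentence left to right, carrying the activation forward
a = np.zeros((hidden_size, 1))
for x_t in sentence:
    a, y_hat = rnn_step(x_t, a)
    print(float(y_hat))  # per-word probability of being a name
```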
Also, there is a flow of information from left to right in the forward propagation step of an RNN; this is what is referred to as the formation of a directed graph between the nodes. The same directed edges are traversed in the reverse direction during the backward propagation step.
This is how RNNs persist information throughout the network: forward propagation passes activations from left to right, and back propagation passes gradients from right to left.
In back propagation, the network relearns and tries to minimise the errors it has made up to the current step. This happens by minimising a loss function. It is the same loss function used in logistic regression, called the logistic loss or cross-entropy loss —
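Written out in the notation above, the per-time-step loss and its sum over the whole sequence are:

$$\mathcal{L}^{\langle t\rangle}\big(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle}\big) = -\,y^{\langle t\rangle}\log \hat{y}^{\langle t\rangle} - \big(1-y^{\langle t\rangle}\big)\log\big(1-\hat{y}^{\langle t\rangle}\big)$$

$$\mathcal{L} = \sum_{t=1}^{T_y} \mathcal{L}^{\langle t\rangle}\big(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle}\big)$$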
This is how RNNs function; they are essentially a looped version of a simple neural network.
One of the disadvantages of these RNNs is that they are uni-directional in nature. This means, if we have two sentences —
- He said, “Teddy Roosevelt was USA president”
- He said, “Teddy Bears were his favourite soft toy in childhood”
Both of the above sentences contain the word "Teddy". In sentence 1, "Teddy Roosevelt" is a person's name, while in sentence 2, "Teddy Bears" is not. A uni-directional RNN can classify these wrongly because, by the time it reads "Teddy", it has only seen the words to the left. Such cases are a suitable use-case for bidirectional RNNs, which read the sentence in both directions.
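As an illustrative sketch (not from the original post), a bidirectional recurrent layer can be built with Keras' Bidirectional wrapper, which runs one RNN left-to-right and another right-to-left; the layer sizes here are assumptions:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN = 10_000, 20  # illustrative sizes

model = tf.keras.Sequential([
    # One word index per time step, mapped to a dense vector
    tf.keras.layers.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    # Reads the sequence in both directions, so "Teddy" can also
    # see the words that follow it
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(32, return_sequences=True)),
    # Per-word probability that the word is part of a person's name
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```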