Learning Notes on LSTM Using TensorFlow
Preparation
- Data source
- The data is text8.zip, downloaded from mattmahoney.net
- The data is 100M characters; the first 1000 characters are held out for validation and the rest are used for training
- Since characters cannot be used directly in numerical computation, each character is mapped to an integer id that represents it.
- A character's id is its ASCII value minus the ASCII value of the first character in the vocabulary ('a'); see the sketch below
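- A minimal sketch of this preparation step, assuming text8.zip sits in the working directory; reserving the extra vocabulary slot for the space character is my assumption to match the 26+1 vocabulary mentioned later, not something the notes spell out:

```python
import string
import zipfile

def read_data(filename='text8.zip'):
    # text8.zip contains a single ~100M-character lowercase text file
    with zipfile.ZipFile(filename) as f:
        return f.read(f.namelist()[0]).decode('utf-8')

text = read_data()
valid_text = text[:1000]   # first 1000 characters held out for validation
train_text = text[1000:]   # the rest is used for training

first_letter = ord('a')
vocabulary_size = len(string.ascii_lowercase) + 1  # 26 letters + 1 => 27

def char2id(char):
    # id = the character's ASCII value minus the value of 'a';
    # anything outside a-z (i.e. the space) gets the extra slot
    if 'a' <= char <= 'z':
        return ord(char) - first_letter
    return vocabulary_size - 1

def id2char(i):
    # inverse mapping, handy for printing generated text
    return chr(i + first_letter) if i < vocabulary_size - 1 else ' '
```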
- Batch generation
- A batch is a group of characters fed into the TensorFlow graph at the same time
- The batch size is defined as a constant (64), whether the batch comes from the training text or the validation text
- The tutorial extracts batches by sampling uniformly across the text, e.g.:
- In the following text, the text length is 130 and the batch size is 5, so the segment length is 130/5 => 26
- Cursor content in the beginning is [0, 26, 52, 78, 104]
- Cursor content for the next batch is [1, 27, 53, 79, 105]
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
(each of the five cursors initially points at the 'a' that starts its 26-character segment, i.e. positions 0, 26, 52, 78, 104)
- The uniformity is achieved by splitting the whole text into equal-length segments
- Each time a batch is requested, we use the cursor list to look up the characters and generate the batch
- In my example, the first batch is [aaaaa], the next batch is [bbbbb], etc.
- Batch generation can continue forever, even when a cursor moves past its segment or the end of the text
- This is because the cursor index is taken modulo the text length, so once the boundary is passed the index wraps back to the beginning (see the sketch below)
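- A small sketch of the cursor mechanics described above, using the 130-character example text and plain characters rather than one-hot vectors; the names are mine, not the course's exact code:

```python
text = 'abcdefghijklmnopqrstuvwxyz' * 5        # length 130, as in the example
batch_size = 5
segment = len(text) // batch_size               # 130 / 5 => 26
cursor = [offset * segment for offset in range(batch_size)]   # [0, 26, 52, 78, 104]

def next_batch():
    # read one character per cursor, then advance each cursor by one,
    # wrapping around with modulo once it passes the end of the text
    global cursor
    batch = [text[c] for c in cursor]
    cursor = [(c + 1) % len(text) for c in cursor]
    return batch

print(next_batch())   # ['a', 'a', 'a', 'a', 'a']
print(next_batch())   # ['b', 'b', 'b', 'b', 'b']
```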
- But the real batch is a 2D tensor of one-hot rows, like this:
batch index 0: [0, 0, 0, 1, 0, 0, 0, 0, 0]
batch index 1: [0, 0, 0, 0, 1, 0, 0, 0, 0]
batch index 2: [0, 0, 1, 0, 0, 0, 0, 0, 0]
batch index 3: [0, 0, 0, 0, 1, 0, 0, 0, 0]
batch index 4: [0, 0, 0, 1, 0, 0, 0, 0, 0]
- The batch is a tensor with shape (batch_size, vocabulary_size), i.e. (64, 26+1)
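- The step from a list of characters to that 2D tensor can be sketched as follows (numpy is used for illustration, and char2id is repeated from the earlier sketch so this snippet stands alone):

```python
import numpy as np

batch_size = 64
vocabulary_size = 27   # 26 letters + 1

def char2id(char):
    # same mapping as above; the last slot is assumed to be the space
    return ord(char) - ord('a') if 'a' <= char <= 'z' else vocabulary_size - 1

def one_hot_batch(chars):
    # chars: a list of batch_size characters, one per row of the batch
    batch = np.zeros((batch_size, vocabulary_size), dtype=np.float32)
    for row, ch in enumerate(chars):
        batch[row, char2id(ch)] = 1.0
    return batch   # shape (batch_size, vocabulary_size) == (64, 27)
```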
Training Procedure
- The number of training steps is arbitrarily set to 7001 by the course instructor
- For each step, the training code feeds a number of unrolled batches into the graph. This is due to the LSTM's recurrent (RNN) nature: it needs the previous characters in order to compute its predictions
- For example, in one training step we feed the graph a list of length num_unrollings + 1:
[batches[0], batches[1], batches[2], batches[3], ..., batches[num_unrollings]]
- Each batches[i] contains 64 one-hot vectors converted from characters extracted uniformly from the source text
- `a b c d` (first step)
- `d e f g` (second step; note that the last batch of one step reappears as the first batch of the next, as the sketch after this list shows)
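- A sketch of how one step's list of batches could be assembled; the overlap between consecutive steps is my reading of the `a b c d` / `d e f g` example, and next_batch is the cursor-based function sketched earlier:

```python
num_unrollings = 10

class UnrolledBatches:
    def __init__(self, next_batch_fn):
        self._next_batch = next_batch_fn
        self._last_batch = self._next_batch()   # seeds the overlap

    def next(self):
        # returns a list of length num_unrollings + 1; element 0 is the
        # final batch of the previous step, so consecutive steps overlap
        batches = [self._last_batch]
        for _ in range(num_unrollings):
            batches.append(self._next_batch())
        self._last_batch = batches[-1]
        return batches

# usage, with the character-level next_batch from the earlier sketch:
# step_batches = UnrolledBatches(next_batch)
# batches = step_batches.next()   # [batches[0], ..., batches[num_unrollings]]
```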
- Feeding the graph
- In the session training loop: `feed_dict[train_data[i]] = batches[i]`
- In the graph: `train_data.append(tf.placeholder(...blah...))`
- I think that code is a bit of magic. At a high level it can be read as:
- `train_data[i] = batches[i]`
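- Putting the two quoted lines in context, here is a sketch in the TensorFlow 1.x style the notes use (tf.placeholder); shapes and counts follow the earlier sections, and the rest of the graph (cell, loss, optimizer) is omitted:

```python
import tensorflow as tf   # TF 1.x style; under TF 2 this would be tf.compat.v1

batch_size = 64
vocabulary_size = 27
num_unrollings = 10

# in the graph: one placeholder per unrolled position
train_data = []
for _ in range(num_unrollings + 1):
    train_data.append(
        tf.placeholder(tf.float32, shape=[batch_size, vocabulary_size]))

# in the session training loop: bind each placeholder to its batch
def make_feed_dict(batches):
    # batches: the num_unrollings + 1 one-hot batches for this step
    feed_dict = {}
    for i in range(num_unrollings + 1):
        feed_dict[train_data[i]] = batches[i]
    return feed_dict

# session.run(optimizer, feed_dict=make_feed_dict(batches))
```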