TensorFlow Mechanics 101 Learning Notes
What we are doing
Use TensorFlow to build a neural network with two hidden layers and train a classification model to recognize the handwritten digits 0 through 9 (MNIST)
Training Flow Overview
- See "tensorflow/examples/tutorials/mnist/fully_connected_feed.py"
- See "tensorflow/examples/tutorials/mnist/mnist.py"
- fully_connected_feed.py drives the overall training flow
- mnist.py implements the inference(), loss(), training(), and evaluation() functions
- Training uses Stochastic Gradient Descent (SGD)
- The key point in TensorFlow: first define every tensor operation, then construct the graph from those operations, and only then run it in a session (see the sketch after this list)
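A minimal sketch of this define-then-run pattern (TF 0.x-era API, matching these notes). Nothing is computed while the ops are being defined; values only flow inside sess.run():

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])    # define: a 1x2 constant op
b = tf.constant([[3.0], [4.0]])  # define: a 2x1 constant op
product = tf.matmul(a, b)        # define: a matmul op; no numbers computed yet

with tf.Session() as sess:
    print(sess.run(product))     # run: the graph executes here -> [[ 11.]]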
Training One-Iteration In Short
# Part 1. forward tensor
hidden1 = tf.nn.relu(tf.matmul(images, weights1) + biases1)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights2) + biases2)
logits = tf.matmul(hidden2, weights3) + biases3
# Part 2. getting cross entropy
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)
# Part 3. getting gradient
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)
The above is the unit step of training; each iteration runs through these three parts in order. Note that weights1/biases1 through weights3/biases3 must already exist as tf.Variable objects, e.g. as in the sketch below.
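A minimal sketch of how those variables might be created (the layer sizes are illustrative assumptions, not from these notes; the tutorial's mnist.py uses a similar 1/sqrt(fan-in) stddev inside tf.name_scope blocks):

import math
import tensorflow as tf

IMAGE_PIXELS = 28 * 28                       # assumed MNIST input size
HIDDEN1, HIDDEN2, NUM_CLASSES = 128, 32, 10  # assumed layer sizes

# Weights start from a truncated normal; biases start at zero.
weights1 = tf.Variable(tf.truncated_normal(
    [IMAGE_PIXELS, HIDDEN1], stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))))
biases1 = tf.Variable(tf.zeros([HIDDEN1]))
weights2 = tf.Variable(tf.truncated_normal(
    [HIDDEN1, HIDDEN2], stddev=1.0 / math.sqrt(float(HIDDEN1))))
biases2 = tf.Variable(tf.zeros([HIDDEN2]))
weights3 = tf.Variable(tf.truncated_normal(
    [HIDDEN2, NUM_CLASSES], stddev=1.0 / math.sqrt(float(HIDDEN2))))
biases3 = tf.Variable(tf.zeros([NUM_CLASSES]))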
Training Overall Flow In Short
# Part 1. set up the tensor operations for the graph
with tf.Graph().as_default():
    logits = mnist.inference(image_input, flags...)
    loss = mnist.loss(logits, labels)
    train_op = mnist.training(loss, flags.learning_rate)
    eval_correct = mnist.evaluation(logits, labels)
    summary_op = tf.merge_all_summaries()
    init = tf.initialize_all_variables()
    saver = tf.train.Saver()
    sess = tf.Session()
    summary_writer = tf.train.SummaryWriter(...)
    sess.run(init)  # only run once!!
    # Part 2. let the tensors flow through the graph
    for step in xrange(max_step):
        feed_dict = ...  # fill with the next batch of images and labels
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
        summary_str = sess.run(summary_op, feed_dict=feed_dict)  # compute the summary before writing it
        summary_writer.add_summary(summary_str, step)
        summary_writer.flush()
The above is a snapshot of the whole training flow. I found all these tensor operations a little confusing at first: are they actual multi-dimensional arrays, or are they function objects that get called while the session runs? As far as I can tell, neither: names like loss and train_op are symbolic handles to nodes in the graph, and no values exist until sess.run() executes those nodes, as the sketch below shows.
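A minimal sketch that makes the distinction concrete (TF 0.x-era API; printing a tensor shows only graph metadata, never data):

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
y = x * 2.0

print(y)  # Tensor("mul:0", shape=(3,), dtype=float32) -- a graph node, no values

with tf.Session() as sess:
    print(sess.run(y))  # [ 2.  4.  6.] -- values exist only once the graph runs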
API Notes
- tf.name_scope
- tf.truncated_normal
- meaning: Outputs random values from a truncated normal distribution (samples more than 2 standard deviations from the mean are re-drawn)
- usage note: the shape parameter is a 1-D integer tensor or Python list specifying the shape of the output tensor (e.g. [784, 128]); it does not store the generated values
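A quick sketch of the call (the mean/stddev values here are illustrative):

import tensorflow as tf

# a 2x3 tensor of samples from N(0, 0.1^2), truncated at +/- 2 stddev
w_init = tf.truncated_normal([2, 3], mean=0.0, stddev=0.1)

with tf.Session() as sess:
    print(sess.run(w_init))  # different random values on every run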
- tf.nn.relu
- meaning: Computes the rectified linear activation max(features, 0) element-wise
- usage note: features can be a tensor of any shape, not just 1-D; the output has the same shape as the input
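A quick sketch (element-wise, so a 2x2 input gives a 2x2 output):

import tensorflow as tf

x = tf.constant([[-1.0, 2.0], [0.5, -3.0]])
y = tf.nn.relu(x)  # element-wise max(x, 0)

with tf.Session() as sess:
    print(sess.run(y))  # [[ 0.   2. ] [ 0.5  0. ]]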
- tf.matmul
- tf.nn.sparse_softmax_cross_entropy_with_logits
- meaning: Computes softmax cross entropy between logits and labels; measures the probability error in classification tasks where each example belongs to exactly one class
- usage note: MUST NOT input softmax-ed values, because the function applies softmax internally; 'sparse' means the labels are plain class indices (e.g. 3), not one-hot vectors
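A quick sketch using the positional (logits, labels) order these notes use (TF 0.x era; later TF versions require keyword arguments):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 0.2, 3.0]])  # raw scores, NOT softmax-ed
labels = tf.constant([0, 2])             # class indices, NOT one-hot
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(xent)

with tf.Session() as sess:
    print(sess.run(loss))  # a single averaged cross-entropy value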
- tf.train.GradientDescentOptimizer
- tf.merge_all_summaries
- tf.initialize_all_variables
- meaning: Returns an Op that initializes all variables
- usage note: defining the Op does nothing by itself; the variables are only initialized when the Op is run, e.g. sess.run(init)
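A quick sketch (a variable holds no value until the init Op actually runs):

import tensorflow as tf

v = tf.Variable(tf.zeros([3]))        # defining a variable does not initialize it
init = tf.initialize_all_variables()  # just an Op; nothing has happened yet

with tf.Session() as sess:
    sess.run(init)      # now the variables receive their initial values
    print(sess.run(v))  # [ 0.  0.  0.]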