What we are doing

Use TensorFlow to build a neural network with two hidden layers and train a classification model to recognize the digits 0 to 9

Training Flow Overview

  1. Locate “tensorflow/examples/tutorials/mnist/fully_connected_feed.py”
  2. Locate “tensorflow/examples/tutorials/mnist/mnist.py”
  3. The file in 1. drives the overall training flow
  4. The file in 2. implements the ‘inference’, ‘loss’, ‘training’ and ‘evaluation’ functions
  5. Training uses stochastic gradient descent (SGD)
  6. The key point in TensorFlow: we first define every tensor operation, then construct the graph from those operations, and only then run it in a session (see the sketch after this list)
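
For example, the two-phase pattern looks like this (a minimal sketch; the constants are only there for illustration):

import tensorflow as tf

# Phase 1: define operations -- nothing is computed yet
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b  # 'c' is a symbolic tensor, not the value 6.0

# Phase 2: run the graph inside a session
with tf.Session() as sess:
  print(sess.run(c))  # the multiplication actually executes here -> 6.0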

Training One-Iteration In Short

# Part 1. forward pass
hidden1 = tf.nn.relu(tf.matmul(images, weights1) + biases1)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights2) + biases2)
logits = tf.matmul(hidden2, weights3) + biases3

# Part 2. computing the cross-entropy loss
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)

# Part 3. creating the gradient descent training op
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)

The above is a single training step. Each training iteration runs through these three parts in order.
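
Part 1 assumes the weight and bias variables already exist. Below is a minimal sketch of how mnist.py creates the first layer's variables (the sizes 784 and 128 are illustrative assumptions):

import math
import tensorflow as tf

IMAGE_PIXELS = 28 * 28  # a flattened MNIST image has 784 pixels
hidden1_units = 128     # illustrative hidden layer size

with tf.name_scope('hidden1'):
  # truncated normal keeps the initial weights small and bounded
  weights1 = tf.Variable(
      tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                          stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
      name='weights')
  biases1 = tf.Variable(tf.zeros([hidden1_units]), name='biases')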

Training Overall Flow In Short

# Part 1. setting up the tensor operations for the graph
with tf.Graph().as_default():
  logits = mnist.inference(image_input, flags...)
  loss = mnist.loss(logits, labels)
  train_op = mnist.training(loss, flags.learning_rate)
  eval_correct = mnist.evaluation(logits, labels)
  summary_op = tf.merge_all_summaries()
  init = tf.initialize_all_variables()
  saver = tf.train.Saver()
  sess = tf.Session()
  summary_writer = tf.train.SummaryWriter(...)
  sess.run(init)  # run only once!

  # Part 2. let the tensors flow through the graph
  for step in xrange(max_step):
    # feed data ...
    _, loss_value = sess.run([train_op, loss], feed_dict=data_dict)
    summary_str = sess.run(summary_op, feed_dict=data_dict)
    summary_writer.add_summary(summary_str, step)
    summary_writer.flush()

The above is a snapshot of the whole training flow. At first I found all these tensor operations a little confusing: are they actual multi-dimensional arrays, or function objects that get called while the session runs? In fact they are neither: each call only adds a node to the graph and returns a symbolic handle, and no values exist until sess.run() executes the graph, as the sketch below shows.
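
A quick experiment makes the distinction concrete (a minimal sketch): printing a tensor shows only a symbolic handle, while sess.run() produces the actual array.

import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])
y = tf.matmul(x, w)

print(y)  # prints a symbolic Tensor description, no value yet
with tf.Session() as sess:
  print(sess.run(y))  # [[ 11.]] -- the value exists only while the graph runs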

API Notes

  1. tf.name_scope
  2. tf.truncated_normal
    • meaning: Draws random values from a truncated normal distribution
    • usage note: the parameter ‘shape’ is a 1-D array of integers specifying the shape of the output tensor
  3. tf.nn.relu
    • meaning: Computes the rectified linear activation max(features, 0)
    • usage note: ‘features’ can be a tensor of any shape; the op is applied element-wise
  4. tf.matmul
  5. tf.nn.sparse_softmax_cross_entropy_with_logits
    • meaning: Computes the softmax cross entropy between logits and labels; measures the probability error for mutually exclusive classes
    • usage note: MUST NOT be fed already-softmaxed values, because the function applies softmax internally (see the sketch after this list)
  6. tf.train.GradientDescentOptimizer
  7. tf.merge_all_summaries
  8. tf.initialize_all_variables
    • meaning: Returns an op that initializes all variables
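
A few of the APIs above in one place (a minimal sketch, assuming the pre-1.0 TensorFlow API that the tutorial uses; all shapes and values are illustrative):

import tensorflow as tf

with tf.name_scope('demo'):
  # 'shape' [4, 3] specifies the shape of the random output tensor
  w = tf.truncated_normal([4, 3], stddev=0.1)
  # relu is applied element-wise, whatever the input shape
  activated = tf.nn.relu(w)

# pass raw logits; the function applies softmax internally
logits = tf.constant([[2.0, 0.5, 0.1], [0.3, 3.0, 0.2]])
labels = tf.constant([0, 1])  # integer class ids, not one-hot vectors
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)

with tf.Session() as sess:
  print(sess.run(xent))  # one cross-entropy value per example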

References

TensorFlow documentation