Oh! Nope... I have created another network:
best_inputs, best_out, best_scaled_out = create_network()
# Note: tf.trainable_variables() returns every trainable variable created
# so far in the graph, so this list also includes the original network's
# parameters, not just the new network's.
best_network_params = tf.trainable_variables()
Then I create this op:
update_best_network_params = [best_network_params[i].assign(original_network_params[i])
                              for i in range(len(original_network_params))]
Later in my program I invoke:
actor.update_best_network()
And the problem is that it seems to be assigning a reference rather than doing a deep copy.
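To illustrate what I mean, here is a minimal framework-agnostic sketch (plain Python, hypothetical names, no TensorFlow) of the difference between rebinding references and copying values in place. What I want is the value-copy behavior, where the "best" parameters stay frozen even after the original network keeps training:

```python
# Two "networks", each represented as a list of parameter vectors.
original_params = [[1.0, 2.0], [3.0]]
best_params = [[0.0, 0.0], [0.0]]

def update_by_reference(dst, src):
    # The unwanted behavior: each slot of dst is rebound to the SAME
    # underlying object as src, so later changes to src show up in dst.
    for i in range(len(src)):
        dst[i] = src[i]

def update_by_value(dst, src):
    # The intended behavior (what an assign op should do): copy the
    # current values into dst's own storage, leaving the two parameter
    # sets independent afterwards.
    for i in range(len(src)):
        dst[i][:] = src[i]

update_by_value(best_params, original_params)
original_params[0][0] = 99.0   # simulate further training of the original
print(best_params[0][0])       # still 1.0: a true snapshot was taken
```

In TF1, `tf.assign` (or `Variable.assign`) copies values into the target variable's storage when the op is actually run in a session, so the "reference" symptom usually means the op list points at the wrong variables or was never executed.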