Learning MNIST with TensorFlow 2: A Practical Guide to Forward Propagation (Tensors)

最编程 2024-07-24 22:12:38


Contents

Introduction
- Definitions
- Code walkthrough
- Further reading

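Introduction

These are my study notes, based on Long Liangqu's course "Deep Learning and TensorFlow 2: Hands-On Introduction" (《深度学习与TensorFlow 2入门实战》) and supplemented with my own understanding.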

Code

Output

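Definitions

Output function:
$out = relu\{relu\{relu[X@W_{1}+b_{1}]@W_{2}+b_{2}\}@W_{3}+b_{3}\}$
(The code walkthrough below leaves the last layer linear, i.e. it computes $out = h_{2}@W_{3}+b_{3}$ without the outermost relu.)
Prediction, taken as the index of the largest output:
$pred = argmax(out)$
Loss function:
$loss = MSE(out, label)$
Minimizing the loss yields the updated parameters:
$[W^{'}_{1},b^{'}_{1},W^{'}_{2},b^{'}_{2},W^{'}_{3},b^{'}_{3}]$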

Code walkthrough

Import TF

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets

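Suppress irrelevant log output

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

Level '2' hides TensorFlow's INFO and WARNING messages and keeps errors. Setting the variable before importing tensorflow is the safer order, since the native library may read it when it loads.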

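Load the data

(x, y), _ = datasets.mnist.load_data()

This uses the locally cached MNIST dataset if one exists; otherwise it downloads the data from Google's servers.

The inputs are:
x: 60k images of 28x28 pixels
y: the digit (0-9) shown in each of the 60k images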

Slice into batches

train_db = tf.data.Dataset.from_tensor_slices((x,y)).batch(128)

Each iteration yields 128 images as one batch.
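To see how batch(128) splits a dataset, here is a minimal sketch. It assumes a synthetic stand-in of 300 blank images rather than the real 60k MNIST images, so nothing needs to be downloaded; the variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

# Assumption: 300 blank images stand in for the real MNIST arrays.
x = np.zeros([300, 28, 28], dtype=np.uint8)
y = np.zeros([300], dtype=np.uint8)

db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)

# Collect the size of every batch; the last batch holds the remainder.
batch_sizes = [int(xb.shape[0]) for xb, yb in db]
print(batch_sizes)  # [128, 128, 44]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size, which is why later code should not hard-code 128 as the batch dimension.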

Create an iterator

train_iter = iter(train_db)
sample = next(train_iter)
print('batch:', sample[0].shape, sample[1].shape)

This stores the first batch in sample, where sample[0] is the input x and sample[1] is the label y. It prints:

batch: (128, 28, 28) (128,)

Create the weights

w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))

The shape arguments are [dim_in, dim_out] for each weight matrix and [dim_out] for each bias,
so b always has size dim_out.
Figure: MNIST pixel input. The input layer has 784 pixels (28*28).

Figure: the neural network, with
hidden layer 1: 256 nodes
hidden layer 2: 128 nodes
output layer: 10 nodes
[batch, 784] => [batch, 256] => [batch, 128] => [batch, 10]

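tf.Variable() is used here
because tf.GradientTape() will compute the derivatives later. By default, tf.GradientTape() only watches variables created with tf.Variable whose trainable attribute is True (the default). A plain tensor returned by tf.random.truncated_normal() is not watched, so its gradient comes back as None.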
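The difference between a watched tf.Variable and an unwatched plain tensor can be checked directly. This is a small sketch with made-up 2x2 weights (w_var and w_plain are illustrative names, not part of the original code):

```python
import tensorflow as tf

w_var = tf.Variable(tf.ones([2, 2]))  # watched automatically by the tape
w_plain = tf.ones([2, 2])             # plain tensor, not watched

x = tf.constant([[1.0, 2.0]])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(x @ w_var) + tf.reduce_sum(x @ w_plain)

grad_var, grad_plain = tape.gradient(loss, [w_var, w_plain])
print(grad_var)    # a [2, 2] tensor of gradients
print(grad_plain)  # None, because w_plain was never watched
```

A plain tensor can still be tracked by calling tape.watch() on it inside the with block, but wrapping parameters in tf.Variable is the usual choice because they also need to be updated in place.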

The standard deviation is set to 0.1 to keep the initial weights small, which helps avoid exploding gradients.
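As a quick check of what stddev=0.1 means for tf.random.truncated_normal, the sketch below samples one weight matrix and looks at its largest magnitude. The function re-draws values that fall more than two standard deviations from the mean, so the samples should stay within about 0.2; the small tolerance is an assumption to absorb float32 rounding.

```python
import tensorflow as tf

w = tf.random.truncated_normal([784, 256], stddev=0.1)

# Values beyond 2 standard deviations are re-drawn, so |w| stays near or below 0.2.
max_abs = float(tf.reduce_max(tf.abs(w)))
within_bound = max_abs <= 0.2 + 1e-6
print(within_bound)
```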

Set the learning rate

lr = 1e-3

This sets the gradient-descent learning rate to 0.001.
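To illustrate how the learning rate enters a gradient-descent update, here is a toy sketch with a single hypothetical scalar parameter w and the loss w^2. It is not the MNIST update itself, only the same rule w <- w - lr * grad on a case that can be checked by hand.

```python
import tensorflow as tf

lr = 1e-3
w = tf.Variable(2.0)  # hypothetical scalar parameter

with tf.GradientTape() as tape:
    loss = tf.square(w)        # toy loss w^2, whose gradient is 2w

grad = tape.gradient(loss, w)  # 4.0 at w = 2.0
w.assign_sub(lr * grad)        # in-place update: w <- w - lr * grad
print(float(w))                # approximately 1.996
```

assign_sub updates the variable in place, so w remains a tf.Variable and stays watched by the tape on the next step.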

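Create the loop

for step, (x, y) in enumerate(train_db):

For each batch:
x [batch, 28, 28] => [batch, 784]

    x = tf.reshape(tf.cast(x, tf.float32) / 255., [-1, 28*28])

Reshape
Passing -1 for dimension 0 lets TensorFlow infer it from the remaining size, so the batch dimension stays unchanged. load_data() returns uint8 pixels, so x is also cast to float32 and scaled to [0, 1]; otherwise x@w1 would fail with a dtype mismatch against the float32 weights.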
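The effect of -1 in tf.reshape can be seen on two stand-in batches. The sketch assumes zero-filled tensors in place of real images; the second one has 96 rows, which is the size of MNIST's final batch since 60000 = 468 * 128 + 96.

```python
import tensorflow as tf

full = tf.zeros([128, 28, 28])  # a full batch
last = tf.zeros([96, 28, 28])   # the smaller final batch

# -1 asks TensorFlow to infer the first dimension from the total size.
flat_full = tf.reshape(full, [-1, 28 * 28])
flat_last = tf.reshape(last, [-1, 28 * 28])
print(flat_full.shape, flat_last.shape)  # (128, 784) (96, 784)
```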

Open the gradient tape

    with tf.GradientTape() as tape:

h1 = x@w1 + b1

        h1 = x@w1 + tf.broadcast_to(b1, [x.shape[0], 256])

x is [batch, 784]
w1 is [784, 256]
x@w1 gives [batch, 256]
b1 is [256]
so b1 has to become [batch, 256].
(Broadcasting does this automatically, so tf.broadcast_to() is optional.)
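That the explicit tf.broadcast_to() and the implicit broadcast give the same result can be verified on a small example. The sketch assumes a random batch of 4 inputs and a bias filled with 0..255 so that the addition is visible; none of these values come from the original code.

```python
import tensorflow as tf

x = tf.random.normal([4, 784])
w1 = tf.random.truncated_normal([784, 256], stddev=0.1)
b1 = tf.range(256, dtype=tf.float32)  # illustrative non-zero bias

h = x @ w1                                           # [4, 256]
explicit = h + tf.broadcast_to(b1, [x.shape[0], 256])
implicit = h + b1                                    # b1 is broadcast automatically

same = bool(tf.reduce_all(explicit == implicit))
print(same)  # True
```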

Add the nonlinearity

        h1 = tf.nn.relu(h1)

h2 = h1@w2 + b2

        h2 = h1@w2 + b2
        h2 = tf.nn.relu(h2)

Compute the output out

        out = h2@w3 + b3

[batch, 128] => [batch, 10]
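Putting the three layers together, the shape flow [batch, 784] => [batch, 256] => [batch, 128] => [batch, 10] can be checked end to end. This is a sketch under stated assumptions: forward is a hypothetical helper, the input is 4 random vectors rather than real images, and the prediction uses argmax as in the definitions.

```python
import tensorflow as tf

def forward(x, params):
    """Hypothetical helper: three-layer forward pass with a linear last layer."""
    w1, b1, w2, b2, w3, b3 = params
    h1 = tf.nn.relu(x @ w1 + b1)   # [batch, 784] -> [batch, 256]
    h2 = tf.nn.relu(h1 @ w2 + b2)  # [batch, 256] -> [batch, 128]
    return h2 @ w3 + b3            # [batch, 128] -> [batch, 10]

params = [
    tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1)),
    tf.Variable(tf.zeros([256])),
    tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1)),
    tf.Variable(tf.zeros([128])),
    tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1)),
    tf.Variable(tf.zeros([10])),
]

x = tf.random.uniform([4, 784])  # random stand-in for 4 flattened images
out = forward(x, params)
pred = tf.argmax(out, axis=1)    # index of the largest output per row
print(out.shape, pred.shape)     # (4, 10) (4,)
```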

y[batch] => y[batch,10]

        y_onehot = tf.one_hot(y, depth=10)

out is [batch, 10]
y is [batch]
so y is converted with one-hot encoding, which works like this:
y originally holds the digit in each image, e.g. [5,2,1,9,4…]: the first image shows a 5, the second a 2…
It now becomes
$\begin{bmatrix} 0. & 0. & 0. & 0. & 0. & 1. & 0. & 0. & 0. & 0. \\ 0. & 0. & 1. & 0. & 0. & 0. & 0. & 0. & 0. & 0. \\ 0. & 1. & 0. & 0. & 0. & 0. & 0. & 0. & 0. & 0. \\ & & & & \cdots & & & & & \end{bmatrix}$
that is,
row 1 has a 1 at index 5
row 2 has a 1 at index 2
…
The index that matches the label is 1; every other entry is 0.

Usage of tf.one_hot():
tf.one_hot(indices, depth…)

indices: the indices to encode, here y
depth: the encoding depth, here 10 because the digits are 0-9
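The encoding described above can be reproduced for the first three labels of the example, [5, 2, 1]. A minimal sketch:

```python
import tensorflow as tf

y = tf.constant([5, 2, 1])          # first three labels from the example
y_onehot = tf.one_hot(y, depth=10)  # [3] -> [3, 10], float32 by default

print(y_onehot.numpy())
# Taking argmax along each row recovers the original labels.
recovered = tf.argmax(y_onehot, axis=1).numpy().tolist()
print(recovered)  # [5, 2, 1]
```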

Compute the mean squared error

        loss = tf.square(y_onehot - out)
        loss = tf.reduce_mean(loss)

$loss = \frac{1}{N}\sum(y-out)^2$, where $N$ is the number of entries in out (batch × 10)
Suppose out is
$\begin{bmatrix} 0. & 0. & 0. & 0.04 & 0. & 0.94 & 0.02 & 0. & 0. & 0. \\ 0. & 0.02 & 0.82 & 0.09 & 0. & 0. & 0. & 0.06 & 0. & 0.01 \\ & & & & \cdots & & & & & \end{bmatrix}$
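As a worked example, the loss can be computed for just the two rows of out shown above, assuming their labels are 5 and 2 as in the one-hot example. The squared errors sum to 0.0056 for the first row and 0.0446 for the second, and averaging over the 20 entries gives 0.0502 / 20 = 0.00251.

```python
import tensorflow as tf

# The two rows of `out` shown above; labels 5 and 2 are assumed from the one-hot example.
out = tf.constant([
    [0., 0.,   0.,   0.04, 0., 0.94, 0.02, 0.,   0., 0.  ],
    [0., 0.02, 0.82, 0.09, 0., 0.,   0.,   0.06, 0., 0.01],
])
y_onehot = tf.one_hot([5, 2], depth=10)

loss = tf.reduce_mean(tf.square(y_onehot - out))  # mean over all 2 * 10 entries
print(float(loss))  # approximately 0.00251
```

Note that tf.reduce_mean averages over every entry, not per image, so the result is the summed squared error divided by batch × 10.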
