Customizing and Training Neural Network Models with TensorFlow


In the previous post we built several neural network architectures: regression, classification, wide & deep, and self-normalizing networks, using techniques such as batch normalization, dropout, and learning rate scheduling, all based on tf.keras. Now let's take a look at TensorFlow's other APIs.

TensorFlow Overview

TensorFlow was developed by the Google Brain team and open-sourced in 2015, with TensorFlow 2.0 released in 2019. A summary of its API is shown in the figure below:

[Figure mls3_1201: TensorFlow API summary]

At the lowest level, each TensorFlow operation is implemented in highly efficient C++ code. Many operations have multiple implementations called kernels: each kernel is dedicated to a specific device type, such as CPUs, GPUs, or even TPUs (tensor processing units). GPUs can dramatically speed up computations by splitting them into many smaller chunks and running them in parallel across many GPU threads; TPUs are custom ASIC chips built specifically for deep learning operations. The architecture is shown below:

[Figure mls3_1202: TensorFlow architecture]

TensorFlow runs not only on Windows, Linux, and macOS, but also on mobile devices (using TensorFlow Lite), including both iOS and Android. Note that APIs for other languages are also available, if you do not want to use the Python API: there are C++, Java, and Swift APIs. There is even a JavaScript implementation called TensorFlow.js that makes it possible to run your models directly in your browser.
There’s more to TensorFlow than the library. TensorFlow is at the center of an extensive ecosystem of libraries. First, there’s TensorBoard for visualization. Next, there’s TensorFlow Extended (TFX), which is a set of libraries built by Google to productionize TensorFlow projects: it includes tools for data validation, preprocessing, model analysis, and serving (with TF Serving; see Chapter 19). Google’s TensorFlow Hub provides a way to easily download and reuse pretrained neural networks. You can also get many neural network architectures, some of them pretrained, in TensorFlow’s model garden. Check out the TensorFlow Resources and https://github.com/jtoy/awesome-tensorflow for more TensorFlow-based projects. You will find hundreds of TensorFlow projects on GitHub, so it is often easy to find existing code for whatever you are trying to do.
More and more ML papers are released along with their implementations, and sometimes even with pretrained models. Check out https://paperswithcode.com to easily find them.

Basic Concepts

TensorFlow's API revolves around tensors, which flow from operation to operation, hence the name TensorFlow. A tensor is very similar to a NumPy ndarray: it is usually a multidimensional array, but it can also hold a scalar (a simple value such as 42). These tensors will be important when we create custom cost functions, custom metrics, custom layers, and more, so let's see how to create and manipulate them.

Tensors

You can create a tensor with tf.constant(). For example, here is a tensor representing a matrix with two rows and three columns of floats:

python

>>> import tensorflow as tf
>>> t = tf.constant([[1., 2., 3.], [4., 5., 6.]])  # a matrix
>>> t
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

Check its shape and data type:

python

>>> t.shape
TensorShape([2, 3])
>>> t.dtype
tf.float32

Indexing works much like in NumPy:

python

>>> t[:, 1:]
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>
>>> t[..., 1, tf.newaxis]
<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

Most importantly, all sorts of tensor operations are available:

python

>>> t + 10
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>
>>> tf.square(t)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>
>>> t @ tf.transpose(t)
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

Note that writing t + 10 is equivalent to calling tf.add(t, 10) (indeed, Python calls the magic method t.__add__(10), which just calls tf.add(t, 10)). Other operators, like - and *, are also supported. The @ operator was added in Python 3.5, for matrix multiplication: it is equivalent to calling the tf.matmul() function.

Many functions and classes have aliases. For example, tf.add() and tf.math.add() are the same function. This allows TensorFlow to have concise names for the most common operations while preserving well-organized packages.

A tensor can also hold a scalar value; in that case, its shape is empty:

python

>>> tf.constant(42)
<tf.Tensor: shape=(), dtype=int32, numpy=42>

TensorFlow provides essentially all the basic mathematical operations, plus most of the operations you find in NumPy. However, the names are not always the same, and some attributes and behaviors differ, because TensorFlow also needs to support GPUs.
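As a quick, non-exhaustive illustration of those naming differences (these are all standard TensorFlow and NumPy calls):

python

import numpy as np
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
a = np.array([[1., 2., 3.], [4., 5., 6.]])

tf.reduce_mean(t)   # NumPy equivalent: np.mean(a)
tf.reduce_sum(t)    # NumPy equivalent: np.sum(a)
tf.math.log(t)      # NumPy equivalent: np.log(a)
tf.transpose(t)     # NumPy equivalent: a.T (but tf.transpose returns a new, transposed tensor)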

Tensors and NumPy

Tensors play nicely with NumPy: you can create a tensor from a NumPy array and vice versa, and you can even apply TensorFlow operations to NumPy arrays and NumPy operations to tensors:

python

>>> import numpy as np
>>> a = np.array([2., 4., 5.])
>>> tf.constant(a)
<tf.Tensor: id=111, shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>
>>> t.numpy()  # or np.array(t)
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
>>> tf.square(a)
<tf.Tensor: id=116, shape=(3,), dtype=float64, numpy=array([4., 16., 25.])>
>>> np.square(t)
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)

Note that NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit. This is because 32-bit precision is generally more than enough for neural networks, and it runs faster while using less RAM. So when you create a tensor from a NumPy array, make sure to set dtype=tf.float32.
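For example, a minimal sketch of that conversion:

python

import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])              # NumPy defaults to float64
t = tf.constant(a, dtype=tf.float32)    # explicitly request 32-bit floats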

Type Conversions

Type conversions can significantly hurt performance, and they can easily go unnoticed when performed automatically. To avoid this, TensorFlow does not perform any type conversions automatically: it simply raises an exception if you execute an operation on tensors with incompatible types. For example, you cannot add a float tensor and an integer tensor, and you cannot even add a 32-bit float and a 64-bit float:

python

>>> tf.constant(2.) + tf.constant(40)
[...] InvalidArgumentError: [...] expected to be a float tensor [...]
>>> tf.constant(2.) + tf.constant(40., dtype=tf.float64)
[...] InvalidArgumentError: [...] expected to be a float tensor [...]

You can use tf.cast() when you really need to convert types:

python

>>> t2 = tf.constant(40., dtype=tf.float64)
>>> tf.constant(2.0) + tf.cast(t2, tf.float32)
<tf.Tensor: id=136, shape=(), dtype=float32, numpy=42.0>

Variables

The tf.Tensor values we have seen so far are constants: their values cannot be changed. Use tf.Variable() to create a variable:

python

>>> v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
>>> v
<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

A tf.Variable behaves much like a tf.Tensor, but it can also be modified in place using assign() and related methods:

python

v.assign(2 * v)           # v now equals [[2., 4., 6.], [8., 10., 12.]]
v[0, 1].assign(42)        # v now equals [[2., 42., 6.], [8., 10., 12.]]
v[:, 2].assign([0., 1.])  # v now equals [[2., 42., 0.], [8., 10., 1.]]
v.scatter_nd_update(      # v now equals [[100., 42., 0.], [8., 10., 200.]]
    indices=[[0, 0], [1, 2]], updates=[100., 200.])

In practice, when training a neural network you will almost never need to create variables manually; Keras's layers and models handle that for you.

Other Data Structures
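Beyond regular tensors and variables, TensorFlow supports several other data structures. As a brief sketch for reference (all of these are standard TensorFlow types):

python

# Sparse tensors: efficient representation of tensors that are mostly zeros
s = tf.SparseTensor(indices=[[0, 1], [1, 0]], values=[1., 2.], dense_shape=[3, 4])
dense = tf.sparse.to_dense(s)

# Ragged tensors: nested lists of values with varying lengths
r = tf.ragged.constant([[1., 2.], [3.], [4., 5., 6.]])

# String tensors: byte strings (see tf.strings for related operations)
b = tf.constant(b"hello world")

# Tensor arrays: lists of tensors, useful in dynamic loops
arr = tf.TensorArray(dtype=tf.float32, size=2)
arr = arr.write(0, tf.constant([1., 2.]))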

Customizing Models and Training Algorithms

Custom Loss Functions

Suppose you want to implement the Huber loss function yourself with TensorFlow:

python

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss  = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

# then just use it as usual
model.compile(loss=huber_fn, optimizer="nadam")
model.fit(X_train, y_train, [...])

It is best to return a tensor containing one loss per instance rather than the mean loss over all instances: this way, Keras can apply class weights or sample weights when requested.

For better performance, you should use a vectorized implementation, as in this example. Moreover, if you want to benefit from TensorFlow’s graph optimization features, you should use only TensorFlow operations.

Saving and Loading Models That Contain Custom Components

Saving a model containing a custom loss function works fine, because Keras saves the function's name. Whenever you load the model, you need to provide a dictionary that maps the function name to the actual function. More generally, when you load a model containing custom objects, you need to map the names to the objects:

python

model = tf.keras.models.load_model("my_model_with_a_custom_loss", custom_objects={"huber_fn": huber_fn})

If you decorate the huber_fn() function with @tf.keras.utils.register_keras_serializable(), it will automatically be available to the load_model() function: there's no need to include it in the custom_objects dictionary.
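As a minimal sketch of that approach (the file name is just a placeholder):

python

@tf.keras.utils.register_keras_serializable()
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

# no custom_objects needed when loading
model = tf.keras.models.load_model("my_model_with_a_custom_loss")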

Going further:
In the current implementation, any error between -1 and 1 is considered "small". But what if you want a different threshold? One solution is to write a function that creates a configured loss function:

python

def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss  = threshold * tf.abs(error) - threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer="nadam")

Unfortunately, when you save the model the threshold will not be saved with it. This means you will have to specify the threshold value when loading the model (note that the name to use is "huber_fn", the name of the function Keras knows about, not the name of the function that created it):

python

model = tf.keras.models.load_model(
    "my_model_with_a_custom_loss_threshold_2",
    custom_objects={"huber_fn": create_huber(2.0)}
)

You can solve this by creating a subclass of tf.keras.losses.Loss and implementing its get_config() method:

python

class HuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        # the constructor accepts **kwargs and passes them to the parent constructor
        super().__init__(**kwargs)

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss  = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        # first call the parent class's get_config() method, then add the new hyperparameter to that dict
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}


# you can then use it when compiling the model
model.compile(loss=HuberLoss(2.), optimizer="nadam")
# when you save the model, the threshold is saved along with it; when loading, just map the class name to the class itself:
model = tf.keras.models.load_model("my_model_with_a_custom_loss_class", custom_objects={"HuberLoss": HuberLoss})

Custom Activation Functions, Initializers, Regularizers, and Constraints

They all work in much the same way. For example:

python

# custom activation function
def my_softplus(z):
    return tf.math.log(1.0 + tf.exp(z))

# custom Glorot initializer
def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

# custom ℓ1 regularizer
def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

# custom constraint that ensures the weights are all positive
def my_positive_weights(weights):  # return value is just tf.nn.relu(weights)
    return tf.where(weights < 0., tf.zeros_like(weights), weights)


# using them in a layer
layer = tf.keras.layers.Dense(1, activation=my_softplus,
                              kernel_initializer=my_glorot_initializer,
                              kernel_regularizer=my_l1_regularizer,
                              kernel_constraint=my_positive_weights)

# to save the factor along with the model, subclass tf.keras.regularizers.Regularizer
class MyL1Regularizer(tf.keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))

    def get_config(self):
        return {"factor": self.factor}

The activation function will be applied to the output of this Dense layer, and its result will be passed on to the next layer. The layer's weights will be initialized using the value returned by the initializer. At each training step, the weights will be passed to the regularization function to compute the regularization loss, which will be added to the main loss to get the final loss used for training. Finally, the constraint function will be called after each training step, and the layer's weights will be replaced by the constrained weights.
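As a small usage sketch (the file name is only a placeholder), the class-based regularizer above keeps its factor when the model is saved, and you map the class name when reloading:

python

layer = tf.keras.layers.Dense(1, kernel_regularizer=MyL1Regularizer(0.01))

model = tf.keras.models.load_model("my_model_with_a_custom_regularizer",
                                   custom_objects={"MyL1Regularizer": MyL1Regularizer})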

Custom Metrics

Losses and metrics are conceptually not the same thing: losses (e.g., cross entropy) are used by gradient descent to train a model, so they must be differentiable (at least where they are evaluated) and their gradients should not be zero everywhere; it is fine if they are not easily interpretable by humans. In contrast, metrics (e.g., accuracy) are used to evaluate a model: they must be easier to interpret, and they can be non-differentiable or have zero gradients everywhere.
That said, in most cases defining a custom metric function is exactly the same as defining a custom loss function. In fact, we could even use the Huber loss function we created earlier as a metric (Huber is rarely used as a metric in practice; this is just an example). It would work just fine, and persistence would also work the same way, in this case only saving the name of the function, "huber_fn":

python

model.compile(loss="mse", optimizer="nadam", metrics=[create_huber(2.0)])

As you saw in Chapter 3, precision is the number of true positives divided by the number of positive predictions (including both true positives and false positives). Suppose the model made five positive predictions in the first batch, four of which were correct: that’s 80% precision. Then suppose the model made three positive predictions in the second batch, but they were all incorrect: that’s 0% precision for the second batch. If you just compute the mean of these two precisions, you get 40%. But wait a second—that’s not the model’s precision over these two batches! Indeed, there were a total of four true positives (4 + 0) out of eight positive predictions (5 + 3), so the overall precision is 50%, not 40%. What we need is an object that can keep track of the number of true positives and the number of false positives and that can compute the precision based on these numbers when requested. This is precisely what the tf.keras.metrics.Precision class does:

python

>>> precision = tf.keras.metrics.Precision()
>>> precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])
<tf.Tensor: shape=(), dtype=float32, numpy=0.8>
>>> precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])
<tf.Tensor: shape=(), dtype=float32, numpy=0.5>

In this example, we created a Precision object, then we used it like a function, passing it the labels and predictions for the first batch, then for the second batch (you can optionally pass sample weights as well, if you want). We used the same number of true and false positives as in the example we just discussed. After the first batch, it returns a precision of 80%; then after the second batch, it returns 50% (which is the overall precision so far, not the second batch’s precision). This is called a streaming metric (or stateful metric), as it is gradually updated, batch after batch.

At any point, we can call the result() method to get the current value of the metric. We can also look at its variables (tracking the number of true and false positives) by using the variables attribute, and we can reset these variables using the reset_states() method:

python

>>> precision.result()
<tf.Tensor: shape=(), dtype=float32, numpy=0.5>
>>> precision.variables
[<tf.Variable 'true_positives:0' [...], numpy=array([4.], dtype=float32)>,
 <tf.Variable 'false_positives:0' [...], numpy=array([4.], dtype=float32)>]
>>> precision.reset_states()  # both variables get reset to 0.0

If you need a metric with state like this, create a subclass of tf.keras.metrics.Metric. Here is a simple example that keeps track of the total Huber loss and the number of instances seen so far; when asked for the result, it returns the ratio, which is simply the mean Huber loss:

python


# just an example
# a simpler and better implementation would subclass tf.keras.metrics.Mean
class HuberMetric(tf.keras.metrics.Metric):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)  # handles base args (e.g., dtype)
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        self.total = self.add_weight("total", initializer="zeros")
        self.count = self.add_weight("count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        sample_metrics = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(sample_metrics))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))

    def result(self):
        # compute and return the final result, in this case the mean Huber metric over all instances
        # update_state() is expected to be called before result()
        return self.total / self.count

    def get_config(self):
        # make sure the threshold is saved along with the model
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}
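A minimal usage sketch (the training data names are placeholders):

python

model.compile(loss=create_huber(2.0), optimizer="nadam", metrics=[HuberMetric(2.0)])
model.fit(X_train, y_train, epochs=2)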

Custom Layers

You can define custom layers. For example, if your model is a sequence of layers A, B, C, A, B, C, A, B, C, you might want to define a custom layer D containing layers A, B, and C, so the model simplifies to D, D, D.

First, some layers have no weights:

python

# the simplest option is to wrap the function in a tf.keras.layers.Lambda layer
exponential_layer = tf.keras.layers.Lambda(lambda x: tf.exp(x))

This custom layer can then be used like any other layer, using the sequential API, the functional API, or the subclassing API. You can also use it as an activation function, or you could use activation=tf.exp. The exponential layer is sometimes used in the output layer of a regression model when the values to predict have very different scales (e.g., 0.001, 10., 1,000.). In fact, the exponential function is one of the standard activation functions in Keras, so you can just use activation="exponential".

To build a custom layer with weights, create a subclass of tf.keras.layers.Layer:

python

class MyDense(tf.keras.layers.Layer):
    # the constructor takes all the hyperparameters as arguments (here, units and activation)
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        # save the hyperparameters as attributes; activations.get() accepts a function, a standard string ("relu", "swish", ...), or simply None
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")

    def call(self, X):
        # In this case, we compute the matrix multiplication of the inputs X and the layer’s kernel, 
        # we add the bias vector, and we apply the activation function to the result, 
        # and this gives us the output of the layer.
        return self.activation(X @ self.kernel + self.bias)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation": tf.keras.activations.serialize(self.activation)}

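As a quick usage sketch (the layer sizes are arbitrary), this custom layer can be dropped into a model like any built-in layer:

python

model = tf.keras.Sequential([
    MyDense(30, activation="relu"),
    MyDense(1)
])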
To create a layer with multiple inputs, the call() method should receive a tuple containing all the inputs:

python

class MyMultiLayer(tf.keras.layers.Layer):
    def call(self, X):
        X1, X2 = X
        return X1 + X2, X1 * X2, X1 / X2

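A minimal sketch of how such a layer might be wired up with the functional API (the input shapes here are arbitrary):

python

input1 = tf.keras.layers.Input(shape=[2])
input2 = tf.keras.layers.Input(shape=[2])
added, multiplied, divided = MyMultiLayer()((input1, input2))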
If your layer needs to have a different behavior during training and during testing (e.g., if it uses Dropout or BatchNormalization layers), then you must add a training argument to the call() method and use this argument to decide what to do. For example, let’s create a layer that adds Gaussian noise during training (for regularization) but does nothing during testing (Keras has a layer that does the same thing, tf.keras.layers.GaussianNoise):

python

class MyGaussianNoise(tf.keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, X, training=False):
        if training:
            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)
            return X + noise
        else:
            return X

Custom Models

Creating custom model classes was already mentioned in an earlier post.
It's straightforward: subclass tf.keras.Model, create the layers and variables in the constructor, and implement the call() method to do whatever you want the model to do.

For example, suppose we want to build the model represented in Figure 12-3.

[Figure mls3_1203: example custom model with a ResidualBlock layer containing a skip connection]

The inputs go through a first dense layer, then through a residual block composed of two dense layers and an addition operation, then through this same residual block three more times, then through a second residual block, and the final result goes through a dense output layer. Don’t worry if this model does not make much sense; it’s just an example to illustrate the fact that you can easily build any kind of model you want, even one that contains loops and skip connections. To implement this model, it is best to first create a ResidualBlock layer, since we are going to create a couple of identical blocks (and we might want to reuse it in another model):

python

class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu",
                                             kernel_initializer="he_normal")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z

This layer is a bit special since it contains other layers. This is handled transparently by Keras: it automatically detects that the hidden attribute contains trackable objects (layers in this case), so their variables are automatically added to this layer's list of variables. Next, let's use the subclassing API to define the model itself:

python

class ResidualRegressor(tf.keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = tf.keras.layers.Dense(30, activation="relu",
                                             kernel_initializer="he_normal")
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = tf.keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)
        Z = self.block2(Z)
        return self.out(Z)

Just follow the same recipe; there is nothing special here.

We create the layers in the constructor and use them in the call() method. This model can then be used like any other model (compile it, fit it, evaluate it, and use it to make predictions). If you also want to be able to save the model using the save() method and load it using the tf.keras.models.load_model() function, you must implement the get_config() method (as we did earlier) in both the ResidualBlock class and the ResidualRegressor class. Alternatively, you can save and load the weights using the save_weights() and load_weights() methods.
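For example, a minimal sketch of what ResidualBlock's get_config() could look like, assuming you also store n_layers and n_neurons as attributes in the constructor (ResidualRegressor would need the analogous treatment for output_dim):

python

class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.n_layers, self.n_neurons = n_layers, n_neurons  # kept so get_config() can save them
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu",
                                             kernel_initializer="he_normal")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "n_layers": self.n_layers, "n_neurons": self.n_neurons}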

Losses and Metrics Based on Model Internals

For example, let's build a custom regression MLP model composed of a stack of five hidden layers plus an output layer. This custom model will also have an auxiliary output on top of the upper hidden layer. The loss associated with this auxiliary output is called the reconstruction loss: it is the mean squared difference between the reconstruction and the inputs. By adding this reconstruction loss to the main loss, we encourage the model to preserve as much information as possible through the hidden layers, even information that is not directly useful for the regression task itself. In practice, this loss sometimes improves generalization (it is a regularization loss). Here is the code for a custom model with a custom reconstruction loss:

python

class ReconstructingRegressor(tf.keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [tf.keras.layers.Dense(30, activation="relu",
                                             kernel_initializer="he_normal")
                       for _ in range(5)]
        self.out = tf.keras.layers.Dense(output_dim)
        self.reconstruction_mean = tf.keras.metrics.Mean(
            name="reconstruction_error")

    def build(self, batch_input_shape):
        # create an extra dense layer used to reconstruct the model's inputs
        n_inputs = batch_input_shape[-1]
        self.reconstruct = tf.keras.layers.Dense(n_inputs)

    def call(self, inputs, training=False):
        # pass the inputs through all five hidden layers, then through the reconstruction layer to produce the reconstruction
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        # compute the reconstruction loss (mean squared difference between the reconstruction and the inputs)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        if training:
            result = self.reconstruction_mean(recon_loss)
            self.add_metric(result)
        return self.out(Z)

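A brief usage sketch (the dataset names are placeholders): the model compiles and trains like any other, and the reconstruction error appears as an extra metric during training:

python

model = ReconstructingRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_train_scaled, y_train, epochs=5)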
Computing Gradients Using Autodiff

Rather than approximating derivatives numerically, TensorFlow's autodiff computes all the gradients with just one forward pass and one backward pass.
First create two variables, then open a tf.GradientTape context, which automatically records every operation that involves a variable; finally, ask the tape for the gradients of the result z with regard to both variables [w1, w2].

python

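# example function used throughout this section (chosen to match the gradients 36.0 and 10.0 below
# and the stop_gradient variant later on), so the snippet is self-contained:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2
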
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])

# check the result
>>> gradients
[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

# the tape is automatically erased right after you call its gradient() method
# if you need to call gradient() more than once, you must make the tape persistent
# and delete it when you are done with it, to free resources
with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)  # returns tensor 36.0
dz_dw2 = tape.gradient(z, w2)  # returns tensor 10.0, works fine now!
del tape

# by default, the tape only tracks operations involving variables,
# so if you try to compute the gradients of z with regard to anything other than a variable, the result will be None:
c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])  # returns [None, None]


# you can force the tape to watch any tensors you like, to record every operation that involves them
# you can then compute gradients with regard to these tensors, as if they were variables:
with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])  # returns [tensor 36., tensor 10.]
# this can be useful in some cases, for example to implement a regularization loss that penalizes
# activations that vary a lot when the inputs vary little: the loss is based on the gradients of the
# activations with regard to the inputs; since the inputs are not variables, you need to tell the tape to watch them



# in some cases you may want to stop gradients from backpropagating through some part of the network;
# to do this you must use the tf.stop_gradient() function, which returns its input during the forward
# pass (just like tf.identity()) but does not let gradients through during backpropagation
# (it acts like a constant):
def f(w1, w2):
    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)

with tf.GradientTape() as tape:
    z = f(w1, w2)  # the forward pass is not affected by stop_gradient()

gradients = tape.gradient(z, [w1, w2])  # returns [tensor 30., None]

# Finally, you may occasionally run into some numerical issues when computing gradients.
# For example, if you compute the gradient of the square root function at x = 1e-50, the result will be infinite.
# In reality, the slope at that point is not infinite, but it's more than 32-bit floats can handle:
>>> x = tf.Variable(1e-50)
>>> with tf.GradientTape() as tape:
...     z = tf.sqrt(x)
...
>>> tape.gradient(z, [x])
[<tf.Tensor: shape=(), dtype=float32, numpy=inf>]
# To solve this, it's often a good idea to add a tiny value to x (such as 1e-6) when computing its square root.


# You may also hit numerical issues for other functions. For example, if you compute the gradients
# of my_softplus() for large inputs, the result is NaN: due to floating-point precision errors,
# autodiff ends up computing infinity divided by infinity (which returns NaN).
# Fortunately, we can work out analytically that the derivative of softplus is 1 / (1 + 1 / exp(x)),
# which is numerically stable. We can tell TensorFlow to use this stable expression when computing
# the gradients of my_softplus() by decorating it with @tf.custom_gradient and making it return both
# its normal output and a function that computes the derivatives (note that this function will receive
# the gradients backpropagated so far, down to the softplus function; according to the chain rule,
# we must multiply them by this function's gradients):
@tf.custom_gradient
def my_softplus(z):
    def my_softplus_gradients(grads):  # grads = backprop'ed from upper layers
        return grads * (1 - 1 / (1 + tf.exp(z)))  # stable grads of softplus

    result = tf.math.log(1 + tf.exp(-tf.abs(z))) + tf.maximum(0., z)
    return result, my_softplus_gradients

Custom Training Loops

Don't write a custom training loop unless you really need to; it's easy to get wrong.
I'll just jot down a few notes here; I may never actually need this...

python

# First, let's build a simple model. There's no need to compile it, since we will handle the training loop manually:
l2_reg = tf.keras.regularizers.l2(0.05)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", kernel_initializer="he_normal",
                          kernel_regularizer=l2_reg),
    tf.keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

# Next, a small function that randomly samples a batch of instances from the training set
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

# A function that displays the training status, including the step number, the total number of steps,
# the mean loss since the start of the epoch (computed with the Mean metric), and any other metrics:
def print_status_bar(step, total, loss, metrics=None):
    metrics = " - ".join([f"{m.name}: {m.result():.4f}"
                          for m in [loss] + (metrics or [])])
    end = "" if step < total else "\n"
    print(f"\r{step}/{total} - " + metrics, end=end)
# Note: the format spec {:.4f} formats a float with four digits after the decimal point, and \r (carriage return)
# together with end="" ensures the status bar is always printed on the same line.

# Define some hyperparameters, then choose the optimizer, the loss function, and the metrics
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.mean_squared_error
mean_loss = tf.keras.metrics.Mean(name="mean_loss")
metrics = [tf.keras.metrics.MeanAbsoluteError()]

# Ready to build the custom loop!
# Two nested loops: one over the epochs, the other over the batches within each epoch
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        # draw a random batch from the training set
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            # make predictions for this batch
            y_pred = model(X_batch, training=True)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)     # compute the loss: the main loss plus the other losses (here, the regularization losses)

        # compute the gradients of the loss, then use the optimizer to perform a gradient descent step
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        mean_loss(loss)
        # update the mean loss and the metrics (over the current epoch)
        for metric in metrics:
            metric(y_batch, y_pred)

        print_status_bar(step, n_steps, mean_loss, metrics)

    for metric in [mean_loss] + metrics:
        metric.reset_states()       # reset the states of the mean loss and the metrics


TensorFlow Functions and Graphs

Graph Concepts

python

def cube(x):
    return x ** 3

# you can call it with a tensor (or a plain Python value)
>>> cube(2)
8
>>> cube(tf.constant(2.0))
<tf.Tensor: shape=(), dtype=float32, numpy=8.0>

# Now, let’s use tf.function() to convert this Python function to a TensorFlow function
>>> tf_cube = tf.function(cube)
>>> tf_cube
<tensorflow.python.eager.def_function.Function at 0x7fbfe0c54d50>


# This TF function can then be used exactly like the original Python function, 
# and it will return the same result (but always as tensors):
>>> tf_cube(2)
<tf.Tensor: shape=(), dtype=int32, numpy=8>
>>> tf_cube(tf.constant(2.0))
<tf.Tensor: shape=(), dtype=float32, numpy=8.0>
# Under the hood, tf.function() analyzed the computations performed by cube() and generated an equivalent computation graph!

# the decorator syntax works too
@tf.function
def tf_cube(x):
    return x ** 3


# If needed, the original Python function is still available through the TF function's python_function attribute
>>> tf_cube.python_function(2)
8

TensorFlow can optimize the computation graph, pruning unused nodes, simplifying expressions, and more. Once the optimized graph is ready, the TF function efficiently executes the operations in the graph, in the appropriate order (and in parallel when it can). As a result, a TF function will usually run faster than the original Python function, especially when it performs complex computations.

When you write a custom loss function, a custom metric, a custom layer, or any other custom function and use it in a Keras model, Keras automatically converts your function into a TF function. You can tell Keras not to convert your Python functions to TF functions by setting dynamic=True when creating a custom layer or model, or by setting run_eagerly=True when calling the model's compile() method.

By default, a TF function generates a new graph for every unique set of input shapes and data types and caches it for subsequent calls. However, this only applies to tensor arguments: if you pass numerical Python values to a TF function, a new graph will be generated for every distinct value, which can consume a lot of memory.
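A small sketch of that retracing behavior (reusing the tf_cube function defined above):

python

tf_cube(tf.constant(10))   # traces a graph for int32 scalar tensors
tf_cube(tf.constant(20))   # same shape and dtype, so the cached graph is reused
tf_cube(10)                # a Python value: traces a new graph for the value 10
tf_cube(20)                # traces yet another graph for the value 20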

AutoGraph and Tracing

AutoGraph is the step where TensorFlow rewrites Python control flow (for, while, if) into equivalent graph operations; tracing is the step where the rewritten function is called with symbolic tensors to build the graph.
A symbolic tensor is a tensor that has a shape and a data type but no actual value.

[Figure mls3_1204: AutoGraph and tracing]

python

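# An illustrative definition of the function being inspected (the original notes do not show it;
# this assumes a simple TF function containing a Python loop):
@tf.function
def sum_squares(n):
    s = 0
    for i in tf.range(n + 1):
        s += i ** 2
    return s
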
# you can view the source code generated by AutoGraph
tf.autograph.to_code(sum_squares.python_function)

There is also a section about TF function rules, but I don't need it for now, so I'll skip it.