
tf.keras.layers.Dense leads to significant differences between CPU and GPU runs of the model implementation code #67829

Open
PhyllisJi opened this issue May 17, 2024 · 3 comments
Labels: comp:keras (Keras related issues), TF2.14 (for issues related to TensorFlow 2.14.x), type:bug (Bug)

Comments

@PhyllisJi

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

tf 2.14.0

Custom code

Yes

OS platform and distribution

Ubuntu 20.04

Mobile device

No response

Python version

3.10

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.2

GPU model and memory

No response

Current behavior?

When the same model is run on CPU and on GPU, the forward-pass outputs of the whole network differ by more than 0.05. The outputs are consistent until the line fc3_output = tf.keras.layers.Dense(units=10, use_bias=True, name="linear3_mutated")(relu4_output) is added.
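A minimal sketch of how such a CPU/GPU comparison can be made in one script (the helper name compare_devices and this driver are illustrative additions, not part of the original report; it copies the CPU weights onto the GPU model so any difference comes from the kernels rather than from random initialization, and it assumes the os.environ['CUDA_VISIBLE_DEVICES'] = '' line in the repro below is removed so a GPU is visible):

import numpy as np
import tensorflow as tf

def compare_devices(build_model, x):
    # Build on CPU, capture the weights, run one forward pass.
    with tf.device('/CPU:0'):
        cpu_model = build_model(x.shape[1:])
        weights = cpu_model.get_weights()
        cpu_out = cpu_model(x).numpy()
    # Rebuild on GPU with the exact same weights and compare.
    with tf.device('/GPU:0'):
        gpu_model = build_model(x.shape[1:])
        gpu_model.set_weights(weights)
        gpu_out = gpu_model(x).numpy()
    return np.max(np.abs(cpu_out - gpu_out))

x = tf.convert_to_tensor(np.random.random((8, 28, 28, 1)).astype(np.float32))
print(compare_devices(Model_VlysjQxB81qtaIXsA_VkCXmPGmE7aDNP, x))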

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np
import os


os.environ['CUDA_VISIBLE_DEVICES'] = ''

def Model_VlysjQxB81qtaIXsA_VkCXmPGmE7aDNP(input):
    input = tf.keras.Input(shape=input)
    _zeropadding_input = tf.keras.layers.ZeroPadding2D(padding=((0, 0), (0, 0)))(input)
    conv1_output = tf.keras.layers.Conv2DTranspose(filters=6, kernel_size=(5, 5), strides=(1, 1), padding="valid", output_padding=(0, 0), data_format="channels_last", dilation_rate=(1, 1), use_bias=True, name="conv1_mutated")(input)
    relu1_output = tf.nn.relu(conv1_output)
    _zeropadding_relu1_output = tf.keras.layers.ZeroPadding2D(padding=((0, 0), (0, 0)))(relu1_output)
    maxpool1_output = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="valid", data_format="channels_last", name="pool1")(_zeropadding_relu1_output)
    _zeropadding_maxpool1_output = tf.keras.layers.ZeroPadding2D(padding=((0, 0), (0, 0)))(maxpool1_output)
    conv2_output = tf.keras.layers.Conv2D(filters=16, kernel_size=(6, 8), strides=(1, 1), padding="valid", data_format="channels_last", dilation_rate=(1, 1), groups=1, use_bias=True, name="conv2_mutated")(_zeropadding_maxpool1_output)
    relu2_output = tf.math.softsign(conv2_output)
    _zeropadding_relu2_output = tf.keras.layers.ZeroPadding2D(padding=((0, 0), (0, 0)))(relu2_output)
    maxpool2_output = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="valid", data_format="channels_last", name="pool2")(_zeropadding_relu2_output)
    # Rank-indexed permutations (NHWC -> NCHW for rank-4 tensors); the rank-1
    # entry is written as a tuple so every element can be passed to list().
    output_transpose = [(0,), (0, 1), (0, 2, 1), (0, 3, 1, 2), (0, 4, 1, 2, 3)]
    maxpool2_output = tf.transpose(maxpool2_output, list(output_transpose[len(maxpool2_output.shape) - 1]))
    flatten_output = tf.keras.layers.Flatten(data_format="channels_last", name="flatten")(maxpool2_output)
    fc1_output = tf.keras.layers.Dense(units=120, use_bias=True, name="linear1")(flatten_output)
    relu3_output = tf.keras.layers.ThresholdedReLU(theta=0.1, name="relu3_mutated")(fc1_output)
    fc2_output = tf.keras.layers.Dense(units=84, use_bias=True, name="linear2_mutated")(relu3_output)
    relu4_output = tf.math.erf(fc2_output)
    fc3_output = tf.keras.layers.Dense(units=10, use_bias=True, name="linear3_mutated")(relu4_output)
    output_transpose = [(0,), (0, 1), (0, 2, 1), (0, 3, 1, 2), (0, 4, 1, 2, 3)]
    fc3_output = tf.transpose(fc3_output, list(output_transpose[len(fc3_output.shape) - 1]))
    tail_flatten_output = tf.keras.layers.Flatten(data_format="channels_last", name="tail_flatten")(fc3_output)
    tail_fc_output = tf.keras.layers.Dense(units=10, use_bias=True, name="tail_fc")(tail_flatten_output)

    model = tf.keras.models.Model(inputs=input, outputs=tail_fc_output)
    return model


def go():
    with tf.device('/CPU:0'):
        try:
            shape = [1, 1, 28, 28]
            _numpy = np.random.random(shape).astype(np.float32)
            tf_input = tf.convert_to_tensor(_numpy.transpose(0, 2, 3, 1), dtype=tf.float32)
            tf_model = Model_VlysjQxB81qtaIXsA_VkCXmPGmE7aDNP(tf_input.shape[1:])
            tf_output = tf_model(tf_input)
            flag = True
        except Exception:
            flag = False
        return flag


def initialize(model):
    # Load reference weights/biases saved per layer as .npz files and transpose
    # them from the saved layout into the Keras kernel/bias layout.
    module_dir = os.path.dirname(__file__)
    gradient_transpose = [(0,), (1, 0), (2, 1, 0), (2, 3, 1, 0), (2, 3, 4, 1, 0)]
    for layer in model.layers:
        matrix_path = module_dir + '/../initializer/' + layer.name
        if hasattr(layer, 'kernel_initializer'):
            weight_init_path = matrix_path + '/weight.npz'
            weight_init = np.load(weight_init_path)
            weight_init = weight_init['matrix']
            tf_weight = tf.convert_to_tensor(weight_init, dtype=tf.float32)
            tf_weight = tf.transpose(tf_weight, gradient_transpose[len(tf_weight.shape) - 1])
            layer.kernel.assign(tf.keras.initializers.Constant(tf_weight)(layer.kernel.shape))
        if hasattr(layer, 'bias_initializer') and layer.use_bias:
            bias_init_path = matrix_path + '/bias.npz'
            bias_init = np.load(bias_init_path)
            bias_init = bias_init['matrix']
            tf_bias = tf.convert_to_tensor(bias_init, dtype=tf.float32)
            tf_bias = tf.transpose(tf_bias, gradient_transpose[len(tf_bias.shape) - 1])
            layer.bias.assign(tf.keras.initializers.Constant(tf_bias)(layer.bias.shape))

def train(inp, label):
    with tf.device('/CPU:0'):
        shape = inp.shape
        tf_input = tf.convert_to_tensor(inp.transpose(0, 2, 3, 1), dtype=tf.float32)
        tf_model = Model_VlysjQxB81qtaIXsA_VkCXmPGmE7aDNP(tf_input.shape[1:])

        initialize(tf_model)
        tf_output = tf_model(tf_input)
        output_transpose = [(0,), (0, 1), (0, 2, 1), (0, 3, 1, 2), (0, 4, 1, 2, 3)]
        tf_output_trans = tf.transpose(tf_output, list(output_transpose[len(tf_output.shape) - 1])).numpy()

        tf_targets = tf.convert_to_tensor(label)
        with tf.GradientTape() as tape:
            tf_predictions = tf_model(tf_input)
            tf_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(tf_targets, tf_predictions)
        tf_gradients = tape.gradient(tf_loss, tf_model.trainable_variables)
        tf_gradients_dic = {}
        for var, gradient in zip(tf_model.trainable_variables, tf_gradients):
            gradient_transpose = [(0, ), (1, 0), (2, 0, 1), (3, 2, 0, 1), (4, 3, 0, 1, 2)]
            tf_gradient = tf.transpose(gradient, list(gradient_transpose[len(gradient.shape) - 1])).numpy()
            tf_gradients_dic.setdefault(var.name.replace('/', '.')[:-2], tf_gradient)
        return tf_gradients_dic, float(tf_loss.numpy()), tf_output_trans
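A hypothetical driver for the functions above (not part of the original report): it builds the model with its default initializers and runs a single forward pass. train() additionally needs the per-layer weight.npz/bias.npz files under ../initializer/ from the linked repository, so it is not called here.

if __name__ == '__main__':
    # Random NHWC batch of four 28x28 single-channel images.
    x = np.random.random((4, 28, 28, 1)).astype(np.float32)
    model = Model_VlysjQxB81qtaIXsA_VkCXmPGmE7aDNP(x.shape[1:])
    out = model(tf.convert_to_tensor(x, dtype=tf.float32))
    print(out.shape)  # expected: (4, 10)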

Relevant log output

No response

@PhyllisJi
Author

Data from /Users/pinji/Desktop/MoCoDiff/tf-0513/tensorflow-LeNet/LeNet-12-654/case/tensorflow_cpu/output.npz:
Array name: output
[[-0.19694163, -0.09662387, -0.2447289 , ..., -0.03054091, -0.01255638,
0.11781679],
[-0.09944421, -0.1292284 , -0.20278051, ..., -0.01450334, 0.1526829 ,
0.2054872 ],
[ 0.07628068, -0.14517091, -0.16096437, ..., -0.21423727, 0.14428714,
0.09608107],
...,
[-0.20587711, -0.10656235, -0.33036825, ..., -0.13541238, 0.3215404 ,
0.2165656 ],
[-0.11513966, -0.1102111 , -0.3434586 , ..., -0.19525276, 0.08722814,
-0.05503507],
[-0.09822497, -0.10671525, -0.2368006 , ..., -0.16467465, 0.15050802,
0.10809175]]

========================================
Data from /Users/pinji/Desktop/MoCoDiff/tf-0513/tensorflow-LeNet/LeNet-12-654/case/tensorflow_gpu/output.npz:
Array name: output
[[-0.19693479, -0.09660047, -0.24478191, ..., -0.03068957, -0.01259471,
0.1177731 ],
[-0.09948827, -0.12920822, -0.20286846, ..., -0.01475166, 0.15262538,
0.20549278],
[ 0.07635015, -0.14510253, -0.16103148, ..., -0.21449052, 0.14431402,
0.09607705],
...,
[-0.20597562, -0.1065662 , -0.33040133, ..., -0.13553414, 0.32143065,
0.2166584 ],
[-0.1152329 , -0.11016633, -0.34351912, ..., -0.19536608, 0.08703801,
-0.05509032],
[-0.09832728, -0.10680126, -0.2368007 , ..., -0.1649086 , 0.1503351 ,
0.10806311]]

========================================
543 diff: 0.055949583649635315
cpu-543: [-0.1139553 -0.04668004 -0.4390249 0.20775127 0.0899936 0.22854462
-0.46038878 -0.15526699 0.23111053 -0.03327706]
gpu-543: [-0.12328502 -0.0404007 -0.4342668 0.21254592 0.05579592 0.24491714
-0.479826 -0.15429592 0.17516094 -0.05355967]

@tilakrayal added the TF2.14 and comp:keras labels on May 20, 2024
@tilakrayal
Contributor

@PhyllisJi,
I tried to execute the mentioned code. Kindly find the gist of it here. In the given code snippet, you have defined the class and its methods but are not calling them anywhere. Also, please try executing the code with Keras 3.0, which is the default for TensorFlow 2.16, and let us know if you are still facing the same issue. Thank you!
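A quick way to confirm which TensorFlow and Keras versions an environment actually uses (a minimal check, assuming the standalone keras package is importable):

import tensorflow as tf
import keras

print(tf.__version__)     # e.g. 2.16.x
print(keras.__version__)  # 3.x when Keras 3 is the default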

@tilakrayal added the stat:awaiting response label on May 20, 2024
@google-ml-butler bot removed the stat:awaiting response label on May 21, 2024
@PhyllisJi
Author

@PhyllisJi, I tried to execute the mentioned code. Kindly find the gist of it here. In the given code snippet, you have defined the class and its methods but are not calling them anywhere. Also, please try executing the code with Keras 3.0, which is the default for TensorFlow 2.16, and let us know if you are still facing the same issue. Thank you!

Because the issue needs specific data to reproduce, I have put the reproduction code, data, and steps in a repository, which you can clone and run: https://github.com/PhyllisJi/MoCoDiff_Bug/tree/tf-issue%2367829
