TensorFlow: Global and Operation-level Seeds
Elena Daehnhardt | 15 Jan 2022
Introduction
In training Machine Learning models, we want to avoid any ordering biases in the data. In some cases, such as Cross-Validation experiments, it is essential to shuffle the data while keeping the shuffled order identical between different runs or system restarts. We can use global and operation-level seeds to achieve this reproducibility of results.
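For instance, the tf.data API lets us shuffle a dataset reproducibly by passing a seed. Below is a minimal sketch with a toy dataset (the buffer size and seed value are only for illustration); we will look at the seeding mechanics in detail next.

# A minimal sketch: reproducible shuffling of a toy dataset
import tensorflow as tf

tf.random.set_seed(57)

# Ten toy samples standing in for real training data
dataset = tf.data.Dataset.range(10)

# With a fixed seed, the shuffled order repeats on every run
shuffled = dataset.shuffle(buffer_size=10, seed=57)
print(list(shuffled.as_numpy_iterator()))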
Global and Operation-level Seeds
To begin, let's create a mutable tensor with tf.Variable.
# Import TensorFlow
import tensorflow as tf

# Create a variable tensor
tensor = tf.Variable([[[1, 2, 3],
                       [4, 5, 6]],
                      [[7, 8, 9],
                       [10, 11, 12]],
                      [[13, 14, 15],
                       [16, 17, 18]]])
In the code below, we use the assign method to change the first element of the tensor (which is a 2×3 matrix), filling its values with zeros.
# Change the elements of the first tensor element to zeros
tensor[0].assign([[0, 0, 0], [0, 0, 0]])
<tf.Variable 'UnreadVariable' shape=(3, 2, 3) dtype=int32, numpy=
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 7,  8,  9],
        [10, 11, 12]],

       [[13, 14, 15],
        [16, 17, 18]]], dtype=int32)>
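We can quickly confirm that the change happened in place by reading the first element back (a small check, not part of the original example):

# Read the first element back to confirm the in-place change
print(tensor[0].numpy())
# [[0 0 0]
#  [0 0 0]]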
In TensorFlow, we have global and operation-level seeds.
We define the global seed with tf.random.set_seed:
# Set a random seed with value of 57
tf.random.set_seed(57)
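The global seed affects all random operations that follow. As a small sketch (the uniform tensors here are just an illustration), the pair of values below differs within a run but repeats identically across program restarts:

# With only the global seed set, this pair of tensors is the same
# on every fresh run of the program (the two calls differ from each other)
tf.random.set_seed(57)
print(tf.random.uniform(shape=(2,)))
print(tf.random.uniform(shape=(2,)))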
We can set the operation-level seed directly in an operation, for example when shuffling the tensor:
tf.random.shuffle(tensor, seed=57)
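The seed argument is not specific to shuffle; most random operations accept it. For example (just an illustration):

# Operation-level seed passed directly to another random op
print(tf.random.uniform(shape=(2,), seed=57))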
Next, I will go through four scenarios of using these two types of seeds in TensorFlow.
No seeds defined
When we do not define any seeds, TensorFlow picks a random seed for the operation. This results in different values every time we re-run the code, so this scenario does not guarantee any reproducibility of results.
tf.random.shuffle(tensor)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]]], dtype=int32)>
The second run changes the data order:
tf.random.shuffle(tensor)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[13, 14, 15],
        [16, 17, 18]],

       [[ 0,  0,  0],
        [ 0,  0,  0]]], dtype=int32)>
The data order will differ from the first run when we restart the environment.
tf.random.shuffle(tensor)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 7,  8,  9],
        [10, 11, 12]],

       [[13, 14, 15],
        [16, 17, 18]]], dtype=int32)>
tf.random.shuffle(tensor)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]],

       [[ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)>
Only global seed defined
We can define only the global seed with the set_seed function. In this case, TensorFlow deterministically derives an operation seed from the global seed for each random operation. Consecutive calls still produce different results, but after a restart we get exactly the same shuffled data order when we re-run the code.
tf.random.set_seed(seed=57)
tf.random.shuffle(tensor)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]],

       [[ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)>
tf.random.shuffle(tensor)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]]], dtype=int32)>
Thus, we see precisely the same order of data between restarts.
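Note that calling tf.random.set_seed again resets the random sequence, so we can also reproduce the first shuffle within the same session; a small sketch:

# Re-setting the global seed restarts the random sequence,
# so the shuffle right after it always gives the same order
tf.random.set_seed(seed=57)
print(tf.random.shuffle(tensor))
tf.random.set_seed(seed=57)
print(tf.random.shuffle(tensor))  # same order as the previous call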
Operation seed only
When we define only the operation-level seed, TensorFlow combines a default global seed with the specified operation seed. Re-running the code gives the same results, and after a restart we get the same sequence of results.
tf.random.shuffle(tensor, seed=24)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]]], dtype=int32)>
tf.random.shuffle(tensor, seed=24)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]],

       [[ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)>
Both seeds are defined
This is the best case for achieving reproducible randomness. We define both global and operation-level seeds and always get the same sequence of results, even after restarting the environment.
tf.random.set_seed(seed=57)
tf.random.shuffle(tensor, seed=75)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[13, 14, 15],
        [16, 17, 18]],

       [[ 7,  8,  9],
        [10, 11, 12]],

       [[ 0,  0,  0],
        [ 0,  0,  0]]], dtype=int32)>
tf.random.shuffle(tensor, seed=75)
<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[ 0,  0,  0],
        [ 0,  0,  0]],

       [[13, 14, 15],
        [16, 17, 18]]], dtype=int32)>
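Putting it together, here is a hedged sketch of how I would wrap this setup in a small helper for reproducible experiments (the function name and seed values are illustrative, not part of the TensorFlow API):

# A minimal helper for reproducible shuffling; names and seeds are illustrative
def reproducible_shuffle(data, global_seed=57, op_seed=75):
    tf.random.set_seed(global_seed)               # reset the global sequence
    return tf.random.shuffle(data, seed=op_seed)  # seed the operation as well

# Every call, and every program restart, returns the same order
print(reproducible_shuffle(tensor))

Resetting the global seed inside the helper means every call reproduces the same result, not only the first call after a restart.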
Did you like this post? Please let me know if you have any comments or suggestions.
Conclusion
In this post, I have described how we can ensure the required level of reproducibility when using global and operation-level seeds. For writing this post, I have used the TensorFlow documentation and the Udemy course TensorFlow Developer Certificate in 2022: Zero to Mastery.