TensorFlow has gained a lot of attention in the last few years for building deep learning models. It provides simple APIs for creating neural networks that can be scaled efficiently and deployed on different platforms. The term tensor comes from mathematics, where it is defined as follows:
In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. (Wikipedia)
In practical terms, a tensor is a multidimensional array. TensorFlow, in turn, is a Python framework with a comprehensive set of tools and libraries for building machine learning projects. Its connection to tensors is that most of the input, intermediate, and output data in TensorFlow are represented as tensors. In this blog, we will learn the basic operations of TensorFlow and build a simple neural network for linear regression.
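To make the "multidimensional array" view concrete, here is a small sketch (the example values are arbitrary) that creates tensors of rank 0 to 3 and prints their rank, shape, and data type.

```python
import tensorflow as tf

# Tensors of increasing rank (number of dimensions); values are arbitrary examples.
scalar = tf.constant(5)                  # rank 0, shape ()
vector = tf.constant([1.0, 2.0, 3.0])    # rank 1, shape (3,)
matrix = tf.constant([[1, 2], [3, 4]])   # rank 2, shape (2, 2)
cube = tf.zeros((2, 3, 4))               # rank 3, shape (2, 3, 4)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype.name)
```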
TensorFlow constants and variables
TensorFlow has its own types of constants and variables, along with its own functions to declare, define, and initialize them. The main reason is that neural network operations rely heavily on vectorization and broadcasting, and TensorFlow tensors support these (as well as GPU execution and automatic differentiation) in a way that plain Python numbers and lists do not. Constants and variables are created with the constant and Variable functions of the TensorFlow package. In the code snapshot above, TensorFlow is imported as tf, and constants and variables are declared.
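As a minimal sketch of the idea (the values are chosen only for illustration), a constant is immutable, while a variable can be updated in place, which is what makes variables suitable for weights and biases. The scalar multiplication also shows broadcasting at work.

```python
import tensorflow as tf

# A constant: its value cannot be changed after creation.
a = tf.constant([[1, 2], [3, 4]], dtype=tf.int32)

# Broadcasting: the scalar 10 is applied to every element of a.
b = a * 10

# A variable: a mutable tensor, typically used for weights and biases.
w = tf.Variable([[0.5, -0.5], [1.0, 2.0]], dtype=tf.float32)
w.assign_add(tf.ones((2, 2)))  # in-place update: adds 1 to every element

print(b.numpy())
print(w.numpy())
```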
Zeros and Ones
While a neural network operates, we often need matrices whose elements are all initialized to zero or to one. TensorFlow provides the zeros and ones functions to create them in a single line of code. In the code snapshot below, tf.ones is used to create a 3 by 2 matrix of ones with data type int32; a matrix of zeros can be created the same way. If we want a tensor of ones or zeros whose shape matches an already available tensor, we can use the ones_like function provided by TensorFlow. In the code below, A1 and A23 are two constant tensors, and the tensors of ones B1 and B23 are created using the shapes of A1 and A23. The zeros_like function performs the same operation with zeros.
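A short sketch of these functions follows. The names A1, A23, B1, and B23 follow the text, but the contents of A1 and A23 are made up for illustration.

```python
import tensorflow as tf

# 3 x 2 matrices of ones and zeros with dtype int32.
ones_3x2 = tf.ones((3, 2), dtype=tf.int32)
zeros_3x2 = tf.zeros((3, 2), dtype=tf.int32)

# Two example constant tensors (illustrative values).
A1 = tf.constant([1, 2, 3, 4])
A23 = tf.constant([[1, 2, 3], [4, 5, 6]])

# ones_like / zeros_like copy the shape (and dtype) of an existing tensor.
B1 = tf.ones_like(A1)
B23 = tf.ones_like(A23)
Z23 = tf.zeros_like(A23)

print(B1.shape, B23.shape)  # (4,) (2, 3)
```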
The ones and zeros functions are useful when creating weight and bias tensors for forward propagation in neural networks. Apart from ones and zeros, we can also initialize a tensor of any shape with any value using the tf.fill function.
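A minimal sketch of tf.fill, plus a zero-initialized bias vector for a hypothetical layer with 4 units (the layer size is an assumption for illustration):

```python
import tensorflow as tf

# tf.fill(dims, value): a 2 x 3 tensor where every element is 7.
sevens = tf.fill([2, 3], 7)

# Bias for a hypothetical layer with 4 units, stored as a trainable variable.
bias = tf.Variable(tf.zeros([4]))

print(sevens.numpy())
print(bias.numpy())
```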
Implementing a neural network in TensorFlow requires performing various operations on constants and variables. In this section, some of the common operations are briefly discussed with code.
Adding tensors is a simple but frequent operation a programmer may need to perform. Addition in TensorFlow is performed element-wise and can be done using the following code.
a3 stores the element-wise sum of the two tensors. In the printed result below, every element of a3 is ten, because seven is added to three at each position.
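The addition described above can be sketched as follows, assuming two 2 by 2 tensors filled with sevens and threes (the shape is an assumption; the text only gives the values):

```python
import tensorflow as tf

# Two 2 x 2 tensors: one filled with 7s, the other with 3s.
a1 = tf.fill([2, 2], 7)
a2 = tf.fill([2, 2], 3)

a3 = tf.add(a1, a2)  # element-wise addition; a1 + a2 is equivalent
print(a3.numpy())
```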
Multiply and Matrix Multiplication
Tensors can be multiplied in two ways: element-wise multiplication, which works just like the add method, and matrix multiplication. Both are easy to perform in TensorFlow; see the code below.
The shapes of the tensors must be kept in mind when multiplying. For element-wise multiplication, both tensors should have the same shape (or shapes that are compatible under broadcasting). For matrix multiplication, the number of columns of the first tensor must equal the number of rows of the second tensor.
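A small sketch contrasting the two kinds of multiplication and the shape rule for matrix multiplication (all values are illustrative):

```python
import tensorflow as tf

# Element-wise multiplication: both tensors are 2 x 2.
x = tf.constant([[1, 2], [3, 4]])
y = tf.constant([[5, 6], [7, 8]])
elementwise = tf.multiply(x, y)  # same as x * y -> [[5, 12], [21, 32]]

# Matrix multiplication: (2 x 3) @ (3 x 1) -> 2 x 1.
# The 3 columns of m match the 3 rows of n.
m = tf.constant([[1, 2, 3], [4, 5, 6]])
n = tf.constant([[1], [0], [1]])
product = tf.matmul(m, n)  # [[1 + 0 + 3], [4 + 0 + 6]] = [[4], [10]]

print(elementwise.numpy())
print(product.numpy())
```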
When implementing a neural network, we may need to reduce a tensor to a single value (or reduce it along a particular dimension), so that this value summarizes the whole tensor for the next layer or for the output of the network.
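A minimal reduce_sum sketch on an illustrative 2 by 3 tensor, reducing either the whole tensor or a single axis:

```python
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])

total = tf.reduce_sum(t)             # 21: sum of all elements
col_sums = tf.reduce_sum(t, axis=0)  # [5, 7, 9]: sum down each column
row_sums = tf.reduce_sum(t, axis=1)  # [6, 15]: sum along each row

print(total.numpy(), col_sums.numpy(), row_sums.numpy())
```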
In the code above, I have used reduce_sum as an example; however, TensorFlow provides many other reduce functions, some of which are listed in the table below.
|Function|Description|
|---|---|
|reduce_all()|Computes the logical AND of elements across dimensions of a tensor.|
|reduce_any()|Computes the logical OR of elements across dimensions of a tensor.|
|reduce_euclidean_norm()|Computes the Euclidean norm of elements across dimensions of a tensor.|
|reduce_max()|Computes the maximum of elements across dimensions of a tensor.|
|reduce_min()|Computes the minimum of elements across dimensions of a tensor.|
|reduce_mean()|Computes the mean of elements across dimensions of a tensor.|
|reduce_prod()|Computes the product of elements across dimensions of a tensor.|
|reduce_std()|Computes the standard deviation of elements across dimensions of a tensor.|
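The sketch below calls the functions from the table through the tf.math namespace, where all of them are available (several also have top-level aliases such as tf.reduce_max). A float tensor is used on purpose: reduce_mean on an integer tensor returns a truncated integer result.

```python
import tensorflow as tf

t = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

max_val = tf.math.reduce_max(t)              # 6.0
min_val = tf.math.reduce_min(t)              # 1.0
row_means = tf.math.reduce_mean(t, axis=1)   # [2.0, 5.0]: mean of each row
prod_val = tf.math.reduce_prod(t)            # 720.0
std_val = tf.math.reduce_std(t)              # population std of 1..6, about 1.708
norm_val = tf.math.reduce_euclidean_norm(t)  # sqrt(1 + 4 + ... + 36) = sqrt(91)
all_pos = tf.math.reduce_all(t > 0.0)        # True: every element is positive
any_big = tf.math.reduce_any(t > 5.0)        # True: at least one element exceeds 5

print(max_val.numpy(), min_val.numpy(), row_means.numpy(), prod_val.numpy())
print(std_val.numpy(), norm_val.numpy(), all_pos.numpy(), any_big.numpy())
```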
When updating the weights of a neural network, we need to find the minimum (or maximum, or some other optimal value) of the loss or another function being used. The gradient helps us identify it: a gradient of zero marks a stationary point, which is a candidate optimum; a positive gradient means the function can be reduced further by decreasing the variable; and a negative gradient means it can be reduced further by increasing the variable. In simple terms, the gradient gives the rate of change of one variable with respect to a change in another.
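A minimal tf.GradientTape sketch, using the illustrative function y = x² at x = 3, where the derivative 2x equals 6:

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations on trainable variables inside the context manager are recorded.
with tf.GradientTape() as tape:
    y = x ** 2  # y = x^2, so dy/dx = 2x

dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # 6.0 at x = 3
```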
In the code above, we have used a context manager with the GradientTape function to record the operations applied to a variable. The tape tracks how y is computed from x, so it can return the rate of change of y with respect to x. For more details, please visit the GradientTape documentation page.
Neural networks are widely used for image processing. An image is stored as a matrix of numeric values, typically ranging from 0 to 255 for an 8-bit grayscale image. A fully connected (dense) network, however, expects a one-dimensional input, so the image must be reshaped into a one-dimensional tensor before it is fed to the network. The reshape function does exactly this. In the code below, we created a simple grayscale image using random numbers and then converted it into a one-dimensional tensor. The image below the code shows a pictorial representation of the idea for a 2×2 image.
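A sketch of the reshape step, assuming a hypothetical 2×2 grayscale image with integer pixels drawn uniformly from [0, 255) (maxval is exclusive in tf.random.uniform), plus a batch of 28×28 images as a second illustration:

```python
import tensorflow as tf

# Hypothetical 2 x 2 grayscale image with integer pixel values in [0, 255).
image = tf.random.uniform(shape=(2, 2), minval=0, maxval=255, dtype=tf.int32)

# Flatten to 1-D; -1 lets TensorFlow infer the length (here 4).
# Elements are taken row by row.
flat = tf.reshape(image, [-1])

# For a batch, keep the batch dimension and flatten each image.
batch = tf.random.uniform(shape=(8, 28, 28), minval=0, maxval=255, dtype=tf.int32)
flat_batch = tf.reshape(batch, [8, -1])  # shape (8, 784)

print(image.numpy(), flat.numpy(), flat_batch.shape)
```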
Random Number Generation
A common method of initializing weight tensors is to draw their values from a random distribution. TensorFlow provides various random number generators that directly produce a tensor filled with values from a random distribution. In the code for the reshape function, we created an image using a uniform random distribution with an upper bound of 255. Similarly, we can generate random numbers from other distributions with functions such as stateless_binomial. Please visit this page for all random number generator functions.
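A sketch of a few generators. The layer size (3 inputs, 4 units), the standard deviation, and the binomial parameters are illustrative assumptions. Stateless functions take an explicit seed of shape [2], so the same seed always gives the same output.

```python
import tensorflow as tf

tf.random.set_seed(42)  # makes the stateful generators below reproducible

# Weights for a hypothetical dense layer with 3 inputs and 4 units,
# drawn from a normal distribution with a small standard deviation.
weights = tf.Variable(tf.random.normal([3, 4], mean=0.0, stddev=0.05))

# Uniform values in [-0.1, 0.1).
uniform_vals = tf.random.uniform([2, 2], minval=-0.1, maxval=0.1)

# Stateless binomial draws: same seed -> identical output.
seed = [1, 2]
counts = [10.0, 20.0]
probs = [0.8]
binomial_a = tf.random.stateless_binomial(shape=[2], seed=seed, counts=counts, probs=probs)
binomial_b = tf.random.stateless_binomial(shape=[2], seed=seed, counts=counts, probs=probs)

print(weights.numpy())
print(binomial_a.numpy(), binomial_b.numpy())
```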
Losses in Neural Networks
The loss measures how accurate the neural network is for a given prediction task. It also guides the training process: every weight update is based on the loss computed between the predicted output and the target output. Losses are broadly divided by problem type into classification and regression losses. Classification problems mostly use
CosineSimilarity. On the other hand, regression problems use
Huber. For more details, you can visit the TensorFlow documentation page for losses here.
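A small sketch computing two regression losses and one classification loss on made-up values. With errors of 0.5, 0.0, and -1.0, the MSE is (0.25 + 0 + 1) / 3, and the Huber loss with delta=1.0 is (0.125 + 0 + 0.5) / 3, because every error lies within delta and is therefore squared and halved.

```python
import tensorflow as tf

# Regression: targets vs. predictions (errors are 0.5, 0.0 and -1.0).
y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.5, 2.0, 2.0])

mse = tf.keras.losses.MeanSquaredError()(y_true, y_pred)  # (0.25 + 0 + 1) / 3
huber = tf.keras.losses.Huber(delta=1.0)(y_true, y_pred)  # (0.125 + 0 + 0.5) / 3

# Classification: binary labels vs. predicted probabilities.
labels = tf.constant([1.0, 0.0, 1.0])
probs = tf.constant([0.9, 0.2, 0.6])
bce = tf.keras.losses.BinaryCrossentropy()(labels, probs)

print(float(mse), float(huber), float(bce))
```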
Optimizers, as the name suggests, optimize the weight and bias tensors of a neural network to minimize the loss discussed in the previous section. A more detailed comparison of how optimizers work can be found in this article on Medium. The most basic optimizer is gradient descent, which updates the weights and biases using the gradient of the loss, according to the equation below.
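$$W \leftarrow W - \alpha \, \frac{\partial L}{\partial W}$$

Here α is the learning rate and ∂L/∂W is the gradient of the loss L with respect to the weights W; the bias is updated in the same way.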
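The update rule can be sketched by hand with GradientTape on a toy problem, assuming the loss L(w) = (w − 3)², whose minimum is at w = 3, and a learning rate of 0.1:

```python
import tensorflow as tf

# Toy problem: minimize L(w) = (w - 3)^2, whose minimum is at w = 3.
w = tf.Variable(0.0)
alpha = 0.1  # learning rate

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2
    grad = tape.gradient(loss, w)  # dL/dw = 2 (w - 3)
    w.assign_sub(alpha * grad)     # W <- W - alpha * dL/dW

print(w.numpy())  # close to 3.0
```

Each step here shrinks the distance to the minimum by a factor of 0.8, so 100 steps are plenty. tf.keras.optimizers.SGD with the same learning rate and no momentum applies this same update.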
In practice, one of the most commonly used optimizers is Adam. The TensorFlow documentation provides details about every optimizer on this page.
Linear Regression using TensorFlow
Now let’s use the above knowledge to create a simple model that trains the intercept and slope variables of a linear regression; see the code below. The auto-mpg dataset was downloaded from Kaggle, and two columns were extracted from it as NumPy arrays (lines 2-3). The intercept and slope were created as TensorFlow variables, both with an initial value of 0.2 (lines 6-7). A linear_regression function returns the prediction given by the linear regression equation (lines 10-11). A loss function computes and returns the mean squared error (MSE) (lines 14-18). The Adam optimizer then minimizes the loss by adjusting the intercept and slope over 1000 iterations. The loss and the final values of the intercept and slope are printed; see the output image below the code. We can see that the loss decreases gradually.
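A self-contained sketch of these steps follows. It is not the original snapshot: because the auto-mpg columns are not reproduced here, it uses hypothetical synthetic data (y = 2x + 1 plus small noise), and it applies Adam through GradientTape and apply_gradients. The learning rate of 0.05 is also an assumption.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the two auto-mpg columns: y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x_data = rng.uniform(0.0, 1.0, size=100).astype(np.float32)
y_data = (2.0 * x_data + 1.0 + rng.normal(0.0, 0.05, size=100)).astype(np.float32)

# Trainable parameters, both initialized to 0.2 as in the text.
intercept = tf.Variable(0.2, dtype=tf.float32, name="intercept")
slope = tf.Variable(0.2, dtype=tf.float32, name="slope")

def linear_regression(intercept, slope, features):
    # Linear model: prediction = intercept + slope * x
    return intercept + slope * features

def loss_function(intercept, slope, features, targets):
    # Mean squared error between targets and predictions.
    predictions = linear_regression(intercept, slope, features)
    return tf.keras.losses.MeanSquaredError()(targets, predictions)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)
losses = []
for step in range(1000):
    with tf.GradientTape() as tape:
        loss = loss_function(intercept, slope, x_data, y_data)
    grads = tape.gradient(loss, [intercept, slope])
    optimizer.apply_gradients(zip(grads, [intercept, slope]))
    losses.append(float(loss))
    if step % 200 == 0:
        print(f"step {step}: loss = {float(loss):.4f}")

print("intercept:", intercept.numpy(), "slope:", slope.numpy())
```

With real data, only the two arrays need to change; the intercept and slope should then approach the least-squares fit of those columns.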
In this blog, we have discussed some of the TensorFlow basics required to build neural network models. Finally, we created a simple linear regression model to find the optimal values of the intercept and slope. The intention of this blog was to cover some of the basics we need to learn before using high-level APIs to develop neural networks for complex applications.