Alright, so you have built a deep neural network model and now you are ready to move it to production and have your application harness all of its glory! There are many factors to consider, for example: scaling on demand, training new models, upgrading models, and all of the related pipelines that make this happen. Today we will focus on deploying a model on GCP with Google's AI Platform and how you can consume it at scale.
For the purpose of this article, we will use Google Colab to build, train, and export our TensorFlow model; Google AI Platform to host and serve the model; and Google App Engine to consume it.
The Game Plan:

- Use Google Colab to build, train, and export a TensorFlow model.
- Upload the model to a Cloud Storage bucket.
- Load the custom model into Google's AI Platform and prepare it for serving.
- Deploy a Python web app on App Engine to interface with AI Platform and run inference on our hosted model.
- Send data to our front-end application and mark up the predictions.
Let's get started!
Google Colab
Google Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis, and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing free access to computing resources, including GPUs and TPUs. This means you can develop deep learning applications using popular libraries such as Keras, TensorFlow, PyTorch, and OpenCV.
For the purposes of this article, we will use the InceptionV3 ImageNet model with our TensorFlow-backed application, but the same workflow applies to any model you create; only the way you interact with that model and run inference will need to be adjusted. Data collection, feature extraction, model fitting, tuning, and scoring belong to the development stage, so we will leave those for another time. The key takeaway here is that once you have a TensorFlow model that performs as you want, you can export it in the SavedModel format. SavedModel is a standalone serialization format for TensorFlow objects, supported by TensorFlow Serving as well as TensorFlow implementations other than Python. It does not require the original model-building code to run, which makes it useful for sharing or deploying (with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub). We will be deploying this model on Google AI Platform, which implements TensorFlow Serving.
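As a rough sketch of what that export step looks like, here is the stock InceptionV3 ImageNet model from `tf.keras.applications` saved in SavedModel format (the export path and version subdirectory are just conventions you can adapt):

```python
import tensorflow as tf

# Load InceptionV3 with its ImageNet weights; keeping the classification
# head gives us the 1000 ImageNet class scores at prediction time.
model = tf.keras.applications.InceptionV3(weights="imagenet")

# Export in SavedModel format. The resulting directory contains the graph,
# weights, and serving signatures that TensorFlow Serving / AI Platform expect.
export_path = "inception_v3/1"  # numbered subdirectory is a TF Serving convention
model.save(export_path, save_format="tf")
```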
I wrote the following Colab notebook to load, test, and validate the model, and to export it in SavedModel format for consumption on AI Platform. Feel free to load it up and follow along.
The notebook used for all steps in this project can be found here:
Google Cloud Storage
Next, we need to get our model into a GCP storage bucket so that we can load it into the AI Platform.
- Create a Cloud Storage bucket to hold our model.
- Copy the model files over (a sketch of both steps follows below).
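Both steps can be run straight from the notebook. Below is a minimal sketch using the `google-cloud-storage` client library; the project ID, bucket name, and region are placeholders you would swap for your own (the equivalent `gsutil` commands work just as well):

```python
import os
from google.cloud import storage

PROJECT_ID = "my-gcp-project"      # placeholder project ID
BUCKET_NAME = "my-model-bucket"    # placeholder; bucket names must be globally unique
EXPORT_PATH = "inception_v3/1"     # local SavedModel directory from the export step

client = storage.Client(project=PROJECT_ID)

# Create the bucket (skip this call if the bucket already exists).
bucket = client.create_bucket(BUCKET_NAME, location="us-central1")

# Walk the SavedModel directory and upload every file, preserving the
# directory layout so AI Platform can find saved_model.pb and the variables.
for root, _, files in os.walk(EXPORT_PATH):
    for name in files:
        local_path = os.path.join(root, name)
        bucket.blob(local_path).upload_from_filename(local_path)
        print(f"uploaded gs://{BUCKET_NAME}/{local_path}")
```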
Google AI Platform
With our files uploaded, let's go ahead and create a serving instance on AI Platform (formerly ML Engine).
- Create the parent model resource.
- Create a version of the model that points to our SavedModel in Cloud Storage (both steps are sketched below).

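One way to script both steps is through the AI Platform REST API via `google-api-python-client`. This is only a sketch, with the project ID, model name, bucket path, and runtime version as placeholder assumptions; the same can be done with `gcloud ai-platform models create` and `gcloud ai-platform versions create`:

```python
from googleapiclient import discovery

PROJECT = "my-gcp-project"                               # placeholder project ID
MODEL_NAME = "inception_v3"
DEPLOYMENT_URI = "gs://my-model-bucket/inception_v3/1"   # SavedModel location from the previous step

ml = discovery.build("ml", "v1")
project_path = f"projects/{PROJECT}"

# 1. Create the parent model resource.
ml.projects().models().create(
    parent=project_path,
    body={"name": MODEL_NAME, "regions": ["us-central1"]},
).execute()

# 2. Create a version of that model pointing at the SavedModel in Cloud Storage.
ml.projects().models().versions().create(
    parent=f"{project_path}/models/{MODEL_NAME}",
    body={
        "name": "v1",
        "deploymentUri": DEPLOYMENT_URI,
        "runtimeVersion": "2.1",   # assumed serving runtime; match your TF version
        "framework": "TENSORFLOW",
        "pythonVersion": "3.7",
    },
).execute()
```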
Okay, our model is online and ready to serve!
Google App Engine
Let's deploy our application on Google's App Engine so we can interface with our model.
For this section I have built a Python application that takes a URL as input; the URL must end with an image extension (jpg, jpeg, png). The application pre-processes the image (resizing, scaling) in the backend so that it maps to the inputs the model expects; a sketch of that step follows below.
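The exact preprocessing depends on the model; for InceptionV3 it boils down to resizing to 299x299 and scaling pixel values to the [-1, 1] range the network was trained on. Here is a sketch of that backend helper (the function name and libraries are my own choices, not the only way to do it):

```python
from io import BytesIO

import numpy as np
import requests
from PIL import Image

def preprocess_image(url, target_size=(299, 299)):
    """Download an image URL and shape it into InceptionV3's expected input."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")
    image = image.resize(target_size)
    # Scale pixels from [0, 255] to [-1, 1], matching InceptionV3 preprocessing.
    array = np.asarray(image, dtype=np.float32) / 127.5 - 1.0
    return array.tolist()  # JSON-serializable for the prediction request body
```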
The data is then passed via an API call to AI Platform, where our model is hosted and inference is run. The result is returned to the application, which parses it, extracts the predictions, and marks up the output for end-user consumption (a sketch of this call follows below). By default, the limit for ML Engine API requests is 1.57 MB (1,572,864 bytes). This can be raised by contacting Google Support, but it is something to be aware of when selecting images and when pre-processing them for the network, since the request must stay below this limit. For true production workloads the limit should be raised much higher.
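The prediction call itself can be made with `google-api-python-client` against the model's default version. Again, this is a sketch with placeholder project and model names; depending on your serving signature, each prediction may come back as a flat list of 1000 probabilities or as a dict keyed by the output layer name:

```python
from googleapiclient import discovery

PROJECT = "my-gcp-project"     # placeholder project ID
MODEL_NAME = "inception_v3"

def predict(instance):
    """Send one preprocessed image to AI Platform and return the raw prediction."""
    ml = discovery.build("ml", "v1")
    # Omitting /versions/... routes the request to the model's default version.
    name = f"projects/{PROJECT}/models/{MODEL_NAME}"
    response = ml.projects().predict(
        name=name,
        body={"instances": [instance]},
    ).execute()
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"][0]
```

Mapping the returned scores back to human-readable ImageNet labels for the markup step can be done with a labels file or with `tf.keras.applications.inception_v3.decode_predictions`.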
Testing
Let's head over to the App Engine URL and do some testing. The initial ramp-up of the application and prediction might take a few seconds, but once the UI comes up you can start testing with the URLs below or use your own. Ensure you are able to reach the image in your own browser before testing it in the application; the goal is to have only the image load so the application knows exactly what to take in and work with.
Banana:

Guitar:

Great White Shark:

Coffee cup shaped like a pill bottle:

Interestingly, in this last prediction the photograph is of a coffee mug designed to look like a pill bottle. You can see from the predictions that the neural network was able to pick up on features of both items. Although it correctly classified the image as a coffee mug, it also had some level of confidence that the image contained a pill bottle, which is true!
We hope you enjoyed this article on one approach to operationalizing your TensorFlow model on Google Cloud Platform and building an image classifier. Interested in learning more about machine learning, TensorFlow, code development, or Google Cloud Platform?
Reach Out To Us!
//Take The First Step