I started working on a project that combines a number of intelligent APIs and machine learning pieces. One of the things I have to accomplish is to extract the text from the images that are being uploaded to the storage. For this part of the project I planned to use the Microsoft Cognitive Services Computer Vision API. Here is the relevant extract from my architecture diagram.
Let’s get started by provisioning a new Azure Function.
I named my Function App quoteextractor and selected Consumption Plan as the Hosting Plan instead of App Service Plan. If you choose Consumption Plan, you are billed only for what you use, i.e. only when your function executes. On the other hand, if you choose App Service Plan, you are billed monthly based on the plan you pick, even if your function executes only a few times a month or not at all. I selected West US as the Location because the Cognitive Services subscription I have uses the API endpoint in West US. I also created a new Storage Account. Click the Create button to create the Function App.
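If you prefer scripting over clicking through the portal, the same provisioning can be done from the Azure CLI. This is a rough, hedged sketch; the resource group and storage account names are hypothetical, and the flags may differ slightly across CLI versions:

```
az group create --name quotes-rg --location westus
az storage account create --name quotestorage --resource-group quotes-rg --sku Standard_LRS
az functionapp create --name quoteextractor --resource-group quotes-rg --storage-account quotestorage --consumption-plan-location westus
```

The --consumption-plan-location flag is what opts you into the pay-per-execution Consumption Plan instead of an App Service Plan.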
After the Function App is created successfully, create a new function in the quoteextractor Function App. Click Functions on the left-hand side and then click New Function in the right-side window to create a new function, as shown in the screenshot below.
The idea is to trigger the function whenever a new image is added/uploaded to the blob storage. To filter down the list of templates, select C# as the Language and then select BlobTrigger - C# from the template list. I also changed the name of the function to QuoteExtractor and changed the Path parameter of the Azure Blob Storage trigger to quoteimages. quoteimages is the name of the container the function will bind itself to; whenever a new image or item is added to that container, the function will get triggered.
Now change the Storage account connection, which is basically a connection string for your storage account. To create a new connection, click the new link and select the storage account you want your function associated with. The storage account I am selecting is the same one that was created at the time of creating the Function App. Once you select the storage account, you will be able to see the connection key in the Storage account connection dropdown. If you want to view the full connection string, click the show value link.
Click the Create button to create the function. You will now be able to see your function name under the Functions section. Expand your function and you will see three other segments named Integrate, Manage and Monitor.
Click on Integrate, and under Triggers update the Blob parameter name from myBlob to quoteImage, or whatever name you prefer. This name is important because it is the same parameter name I will be using in my function. Click Save to save the settings.
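Under the hood, the portal writes these trigger settings into the function's function.json file. Here is a sketch of roughly what it should contain at this point; the connection setting name is generated by the portal and will differ in your app, and I am assuming the default {name} suffix on the path:

```json
{
  "bindings": [
    {
      "name": "quoteImage",
      "type": "blobTrigger",
      "direction": "in",
      "path": "quoteimages/{name}",
      "connection": "quoteextractor_STORAGE"
    }
  ],
  "disabled": false
}
```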
The storage account created at the time of creating the Function App still does not have a container. Add a container with the name used in the path of the Azure Blob Storage trigger, which is quoteimages. Make sure you set the Public access level to Blob. Click OK to create the container in your storage account.
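If you would rather create the container from code than from the portal, here is a minimal sketch using the WindowsAzure.Storage SDK; connectionString is assumed to hold the storage connection string from the earlier step:

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Parse the connection string and get a client for the blob service.
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();

// Create the container the trigger binds to, if it does not exist yet.
CloudBlobContainer container = blobClient.GetContainerReference("quoteimages");
await container.CreateIfNotExistsAsync();

// Blob-level public read access, matching the portal setting above.
await container.SetPermissionsAsync(new BlobContainerPermissions
{
    PublicAccess = BlobContainerPublicAccessType.Blob
});
```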
With this step, you should be done with all the configuration required for the Azure Function to work properly. Now let's get the Computer Vision API subscription. Go to https://azure.microsoft.com/en-in/try/cognitive-services/ and, under the Vision section, click Get API Key next to Computer Vision API. Agree to the terms and conditions and continue. Log in with your preferred account to get the subscription ready. Once everything goes well, you will see your subscription key and endpoint details.
The response I get from the API is in JSON format. I can either parse the JSON in the function itself or save it directly in CosmosDB or any other persistent storage. I am going to put the whole response in CosmosDB in raw format. This is an optional step, as the idea is to see how easy it is to use the Cognitive Services Computer Vision API with Azure Functions. If you are skipping this step, then you have to tweak the code a bit so that you can see the response in the log window of your function.
Provision a new CosmosDB and add a collection to it. By default there is a database named ToDoList with a collection called Items that you can create right after provisioning CosmosDB, from the getting started page. If you want, you can also create a new database and add a new collection to it, which makes more sense here.
To create a new database and collection, go to Data Explorer and click New Collection. Enter the details and click OK to create the database and collection.
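If you prefer, the database and collection can also be created from code with the same DocumentDB SDK the function uses later. A hedged sketch, assuming the database is called VisionAPIDB and the collection ImgRespCollcetion to match the code below:

```csharp
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

DocumentClient client = new DocumentClient(new Uri(EndpointUrl), PrimaryKey);

// Create the database and the collection only if they do not exist yet.
await client.CreateDatabaseIfNotExistsAsync(new Database { Id = "VisionAPIDB" });
await client.CreateDocumentCollectionIfNotExistsAsync(
    UriFactory.CreateDatabaseUri("VisionAPIDB"),
    new DocumentCollection { Id = "ImgRespCollcetion" });
```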
Now we have everything ready. Let's get started with the CODE!
Let's start by creating a new function called SaveResponse, which takes the API response string as a parameter. In this method I am going to use the Azure DocumentDB library to communicate with CosmosDB, which means I have to add that library as a reference to my function. To add it as a reference, navigate to the KUDU portal by adding scm to the URL of your function app, like so: https://quoteextractor.scm.azurewebsites.net/. This will open up the KUDU portal.
Go to Debug console and click CMD. Navigate to site/wwwroot/<Function Name>. In the command window, use the mkdir command to create a folder named bin.
```
mkdir bin
```
You can now see the bin folder in the function app directory. Click the bin folder and upload the library (in this case Microsoft.Azure.Documents.Client.dll).
In the very first line of the function, use the #r directive to add a reference to the external assembly. You can then use the library in the function with the help of a using directive.
#r "D:\home\site\wwwroot\QuoteExtractor\bin\Microsoft.Azure.Documents.Client.dll"
Right after adding the reference to the DocumentDB assembly, add the namespaces with using directives. Later in the function you will also need to make HTTP POST calls to the API endpoint, so pull in the System.Net.Http namespaces as well.
```csharp
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using System.Net.Http;
using System.Net.Http.Headers;
```
The code for the SaveResponse function is very simple: it just uses the DocumentClient class to create a new document from the response we receive from the Vision API. Here is the complete code.
```csharp
private static async Task<bool> SaveResponse(string APIResponse)
{
    bool isSaved = false;
    const string EndpointUrl = "https://imgdocdb.documents.azure.com:443/";
    const string PrimaryKey = "QCIHMgrcoGIuW1w3zqoBrs2C9EIhRnxQCrYhZOVNQweGi5CEn94sIQJOHK3NleFYDoFclB7DwhYATRJwEiUPag==";
    try
    {
        // Create a new document in CosmosDB from the raw API response.
        DocumentClient client = new DocumentClient(new Uri(EndpointUrl), PrimaryKey);
        await client.CreateDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("VisionAPIDB", "ImgRespCollcetion"),
            new { APIResponse });
        isSaved = true;
    }
    catch (Exception)
    {
        //Do something useful here.
    }
    return isSaved;
}
```
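One caveat before moving on: hardcoding EndpointUrl and PrimaryKey is fine for a quick demo, but in a real deployment you would keep them in the Function App's application settings and read them at runtime, roughly like this (the setting names here are hypothetical):

```csharp
// Hypothetical setting names; add them under the Function App's Application settings.
string endpointUrl = Environment.GetEnvironmentVariable("CosmosDB_Endpoint");
string primaryKey = Environment.GetEnvironmentVariable("CosmosDB_Key");
```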
Back to the code: notice the database and collection names in the call to CreateDocumentAsync, where CreateDocumentCollectionUri returns the collection URI based on the database and collection names. The last parameter, new {APIResponse}, is the response received from the Vision API.
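For reference, UriFactory just builds the relative URI from the two names, so you never have to concatenate the dbs/... and colls/... segments by hand:

```csharp
// Produces the relative URI "dbs/VisionAPIDB/colls/ImgRespCollcetion".
Uri collectionUri = UriFactory.CreateDocumentCollectionUri("VisionAPIDB", "ImgRespCollcetion");
```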
Create a new function and call it ExtractText. This function takes the Stream object of the image, which we can easily get from the Run function. The method converts the stream into a byte array with the help of another method, ConvertStreamToByteArray, and then sends the byte array to the Vision API endpoint to get the response. Once the API responds successfully, I save the response in CosmosDB. Here is the complete code for the ExtractText function.
```csharp
private static async Task<string> ExtractText(Stream quoteImage, TraceWriter log)
{
    string APIResponse = string.Empty;
    string APIKEY = "b33f562505bd7cc4b37b5e44cb2d2a2b";
    string Endpoint = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr";

    HttpClient client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", APIKEY);

    // Auto-detect the language and the orientation of the text in the image.
    Endpoint = Endpoint + "?language=unk&detectOrientation=true";

    byte[] imgArray = ConvertStreamToByteArray(quoteImage);
    HttpResponseMessage response;
    try
    {
        using (ByteArrayContent content = new ByteArrayContent(imgArray))
        {
            // POST the raw image bytes to the OCR endpoint.
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            response = await client.PostAsync(Endpoint, content);
            APIResponse = await response.Content.ReadAsStringAsync();
            log.Info(APIResponse);
            //TODO: Perform check on the response and act accordingly.
        }
    }
    catch (Exception)
    {
        log.Error("Error occurred");
    }
    return APIResponse;
}
```
There are a few things in the above function which need our attention. You get the subscription key and the endpoint when you register for the Cognitive Services Computer Vision API. Notice that the endpoint I am using ends with ocr, which is important because I want to read the text from the images I am uploading to the storage. The other parameters I am passing are language and detectOrientation. language has the value unk, which stands for unknown and tells the API to auto-detect the language, while detectOrientation checks the text orientation in the image. The API responds in JSON format, which this method returns to the Run function, and that output becomes the input for the SaveResponse method. Here is the code for the ConvertStreamToByteArray method.
```csharp
private static byte[] ConvertStreamToByteArray(Stream input)
{
    // Copy the stream into memory in 16 KB chunks and return the full byte array.
    byte[] buffer = new byte[16 * 1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
```
Now comes the method that ties everything together: the Run method. Here is how the `Run` method looks.
```csharp
public static async Task<bool> Run(Stream quoteImage, string name, TraceWriter log)
{
    string response = await ExtractText(quoteImage, log);
    bool isSaved = await SaveResponse(response);
    return isSaved;
}
```
I don't think this method needs any explanation. Let's execute the function by adding a new image to the storage and see whether we get the log in the Logs window and a new document in CosmosDB. Here is a screenshot of the Logs window after the function was triggered by uploading a new image to the storage.
In CosmosDB, I can see a new document added to the collection. To view the complete document, navigate to Document Explorer and check the results. This is how my view looks:
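As mentioned earlier, I am storing the raw response, but if you want just the recognized text, the OCR payload nests regions, lines and words. Here is a hedged sketch that flattens it with Newtonsoft.Json (available to Azure Functions by default); FlattenOcrText is a hypothetical helper, not part of the function above:

```csharp
using System.Linq;
using Newtonsoft.Json.Linq;

private static string FlattenOcrText(string apiResponse)
{
    // The OCR response nests regions -> lines -> words; collect every word's text.
    JObject ocr = JObject.Parse(apiResponse);
    var words = ocr["regions"]
        .SelectMany(region => region["lines"])
        .SelectMany(line => line["words"])
        .Select(word => (string)word["text"]);
    return string.Join(" ", words);
}
```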
To save you all a little time, here is the complete code.
#r "D:\home\site\wwwroot\QuoteExtractor\bin\Microsoft.Azure.Documents.Client.dll" using System.Net.Http.Headers; using System.Net.Http; using Microsoft.Azure.Documents; using Microsoft.Azure.Documents.Client; public static async Task<bool> Run(Stream quoteImage, string name, TraceWriter log) { string response = await ExtractText(quoteImage, log); bool isSaved = await SaveResponse(response); return isSaved; } private static async Task<string> ExtractText(Stream quoteImage, TraceWriter log) { string APIResponse = string.Empty; string APIKEY = "b33f562505bd7cc4b37b5e44cb2d2a2b"; string Endpoint = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr"; HttpClient client = new HttpClient(); client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", APIKEY); Endpoint = Endpoint + "?language=unk&detectOrientation=true"; byte[] imgArray = ConvertStreamToByteArray(quoteImage); HttpResponseMessage response; try { using(ByteArrayContent content = new ByteArrayContent(imgArray)) { content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream"); response = await client.PostAsync(Endpoint, content); APIResponse = await response.Content.ReadAsStringAsync(); //TODO: Perform check on the response and act accordingly. } } catch(Exception) { log.Error("Error occured"); } return APIResponse; } private static async Task<bool> SaveResponse(string APIResponse) { bool isSaved = false; const string EndpointUrl = "https://imgdocdb.documents.azure.com:443/"; const string PrimaryKey = "QCIHMgrcoGIuW1w3zqoBrs2C9EIhRnzZCrYhZOVNQweIi5CEn94sIQJOHK2NkeFYDoFcpB7DwhYATRJwEiUPbg=="; try { DocumentClient client = new DocumentClient(new Uri(EndpointUrl), PrimaryKey); await client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri("VisionAPIDB", "ImgRespCollcetion"), new {APIResponse}); isSaved = true; } catch(Exception) { //Do something useful here. } return isSaved; } private static byte[] ConvertStreamToByteArray(Stream input) { byte[] buffer = new byte[16*1024]; using (MemoryStream ms = new MemoryStream()) { int read; while ((read = input.Read(buffer, 0, buffer.Length)) > 0) { ms.Write(buffer, 0, read); } return ms.ToArray(); } }