OCR with Google Cloud Vision API

The first part of this guide covers PHP; the second part shows how to do the same thing in Python.

Requirements
  1. Enable the Cloud Vision API for your Google Cloud project
  2. Generate an API key from the API console
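
The scripts below take the key either hard-coded (PHP) or as a command-line argument (Python). Another option is to keep it out of the source entirely. Here is a minimal sketch, assuming you export the key in an environment variable named VISION_API_KEY (a name chosen here for illustration only). It builds the same images:annotate URL the PHP script calls, with the key passed as the key query parameter:

```python
import os
from urllib.parse import urlencode

# Same endpoint the PHP and Python scripts in this guide call.
ENDPOINT_URL = "https://vision.googleapis.com/v1/images:annotate"


def annotate_url(env_var="VISION_API_KEY"):
    """Return the images:annotate URL with the API key read from an env var.

    VISION_API_KEY is a hypothetical variable name used for this sketch.
    Raises RuntimeError if the variable is unset or empty.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your Cloud Vision API key")
    # urlencode escapes the key instead of pasting it in verbatim
    return ENDPOINT_URL + "?" + urlencode({"key": api_key})
```

Keeping the key in the environment also keeps it out of version control.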

PHP Script

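This is how you use the Google Vision API in PHP. The script takes an image uploaded through a form field named photo, base64-encodes it and POSTs it to the images:annotate endpoint. Simply change the type to any supported feature (in this case, we're using TEXT_DETECTION), pass the necessary data, and boom: Google's artificial intelligence right in front of you.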

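<?php

// Replace with the API key generated in the API console
$api_key = 'YOUR_API_KEY';
$cvurl = "https://vision.googleapis.com/v1/images:annotate?key=" . urlencode($api_key);
// Feature to request; swap in any other Vision feature type
$type = "TEXT_DETECTION";

// Expects an image uploaded through a form field named "photo"
if (!empty($_FILES['photo']['name'])) {
    if ($_FILES['photo']['error'] !== UPLOAD_ERR_OK) {
        die('Error: ' . $_FILES['photo']['error']);
    }

    // Reject files over roughly 4 MB before sending them
    if ($_FILES['photo']['size'] > 4024000) {
        die('Your file\'s size is too large.');
    }

    // Read the uploaded file and convert it to base64
    $fname = $_FILES['photo']['tmp_name'];
    $data = file_get_contents($fname);
    $base64 = base64_encode($data);

    // Build the request body; json_encode handles quoting and escaping
    $r_json = json_encode(array(
        "requests" => array(
            array(
                "image" => array("content" => $base64),
                "features" => array(
                    array("type" => $type, "maxResults" => 200)
                )
            )
        )
    ));

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $cvurl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTPHEADER, array("Content-Type: application/json"));
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $r_json);
    $json_response = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $curl_error = curl_error($curl);
    curl_close($curl);

    // curl_exec returns false on a network failure; don't echo the URL,
    // since it contains the API key
    if ($json_response === false) {
        die("Error: Vision API request failed: $curl_error");
    }
    if ($status != 200) {
        die("Error: Vision API request failed with status $status");
    }

    header('Content-Type: application/json');
    echo $json_response;
}
?>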
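
For comparison, the same request body can be built in Python as a plain dictionary and serialized with json.dumps, which takes care of quoting. This is a minimal sketch; the function name build_annotate_body and the MAX_IMAGE_BYTES constant are illustrative, and the limit simply mirrors the PHP script's own 4024000-byte check rather than an official Vision API limit. Passing a different feature_type (for example DOCUMENT_TEXT_DETECTION) is the "change the type" step described above.

```python
import json
from base64 import b64encode

# Mirrors the size check in the PHP script; not an official API limit.
MAX_IMAGE_BYTES = 4024000


def build_annotate_body(image_bytes, feature_type="TEXT_DETECTION", max_results=200):
    """Return the JSON request body for one image as a string."""
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("Your file's size is too large.")
    body = {
        "requests": [
            {
                # The API expects the raw image bytes base64-encoded as text
                "image": {"content": b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }
    return json.dumps(body)
```

The resulting string is what gets POSTed to the annotate URL with a Content-Type: application/json header, just as the cURL calls in the PHP script do.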

Extras -- Doing it in Python

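Requirements
  1. Install Python 3 and the requests library (pip install requests)
  2. Enable the Cloud Vision API and generate an API key, as for the PHP version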

Note:
Open a terminal (cmd on Windows) and run $ python --version to make sure Python is installed correctly. On some systems the Python 3 interpreter is called python3 instead.
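
The script below needs Python 3. It relies on extended unpacking (api_key, *image_filenames = ...) and on makedirs(..., exist_ok=True), which arrived in Python 3.2. It also needs the third-party requests package. Here is a small sketch that checks both from inside Python; the function name missing_requirements is illustrative:

```python
import sys


def missing_requirements():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # makedirs(..., exist_ok=True) was added in Python 3.2
    if sys.version_info < (3, 2):
        problems.append("Python 3.2 or newer is required")
    try:
        import requests  # noqa: F401  (only checking that it imports)
    except ImportError:
        problems.append("the requests package is missing: pip install requests")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```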

The Python way
  1. Download the script from the Gist listed under Resources and save it as cloudvisreq.py. As usual, I'm shamelessly copying it here as a mirror in case Gist is down.
from base64 import b64encode
from os import makedirs
from os.path import join, basename
from sys import argv
import json

import requests  # third-party: pip install requests

ENDPOINT_URL = 'https://vision.googleapis.com/v1/images:annotate'
RESULTS_DIR = 'jsons'  # each image's JSON response is saved here
makedirs(RESULTS_DIR, exist_ok=True)

def make_image_data_list(image_filenames):
    """
    image_filenames is a list of filename strings.
    Returns a list of request dicts in the format the Vision API expects.
    """
    img_requests = []
    for imgname in image_filenames:
        with open(imgname, 'rb') as f:
            # the API wants the raw image bytes as base64 text
            ctxt = b64encode(f.read()).decode()
            img_requests.append({
                'image': {'content': ctxt},
                'features': [{
                    'type': 'TEXT_DETECTION',
                    'maxResults': 1
                }]
            })
    return img_requests


def make_image_data(image_filenames):
    """Returns the whole annotate request body as JSON-encoded bytes"""
    imgdict = make_image_data_list(image_filenames)
    return json.dumps({"requests": imgdict}).encode()


def request_ocr(api_key, image_filenames):
    """POST all images in a single images:annotate request"""
    response = requests.post(ENDPOINT_URL,
                             data=make_image_data(image_filenames),
                             params={'key': api_key},
                             headers={'Content-Type': 'application/json'},
                             timeout=60)  # seconds; an arbitrary cap
    return response


if __name__ == '__main__':
    # check the argument count before unpacking, otherwise a bare
    # "python cloudvisreq.py" crashes with a ValueError
    if len(argv) < 3:
        print("""
            Please supply an api key, then one or more image filenames

            $ python cloudvisreq.py api_key image1.jpg image2.png""")
    else:
        api_key, *image_filenames = argv[1:]
        response = request_ocr(api_key, image_filenames)
        if response.status_code != 200 or response.json().get('error'):
            print(response.text)
        else:
            for idx, resp in enumerate(response.json()['responses']):
                # save to JSON file
                imgname = image_filenames[idx]
                jpath = join(RESULTS_DIR, basename(imgname) + '.json')
                with open(jpath, 'w') as f:
                    datatxt = json.dumps(resp, indent=2)
                    print("Wrote", len(datatxt), "bytes to", jpath)
                    f.write(datatxt)

                # print the plaintext to screen for convenience
                print("---------------------------------------------")
                # a per-image failure or an image with no text comes back
                # without 'textAnnotations', so check before indexing
                if 'error' in resp:
                    print("    Error:", resp['error'].get('message'))
                    continue
                if not resp.get('textAnnotations'):
                    print("    No text found in", imgname)
                    continue
                t = resp['textAnnotations'][0]
                print("    Bounding Polygon:")
                print(t['boundingPoly'])
                print("    Text:")
                print(t['description'])
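  2. Run the script in a terminal, passing your API key followed by one or more image files:

    $ python cloudvisreq.py API_KEY image1.jpg image2.png

  Each image's full response is saved as jsons/<image filename>.json (for example jsons/image1.jpg.json), and the detected text and its bounding polygon are printed to the screen.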
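
Each run leaves one JSON file per image in the jsons folder, holding that image's entry from the API's responses list. Here is a minimal sketch for reading one back later, assuming the file layout the script writes. It returns the full detected text (the first textAnnotations entry, the same one the script prints) and collapses its boundingPoly into an axis-aligned box. Vertex coordinates equal to 0 can be left out of the JSON, hence the .get(..., 0) defaults. The function names load_ocr_result and bounding_box are hypothetical.

```python
import json


def bounding_box(poly):
    """Collapse a boundingPoly dict into (min_x, min_y, max_x, max_y)."""
    vertices = poly.get("vertices", [])
    if not vertices:
        return None
    # A coordinate of 0 may be missing from the JSON, so default to 0
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def load_ocr_result(json_path):
    """Return (text, box) from a saved per-image response file.

    Returns ("", None) when the API found no text, and raises RuntimeError
    if the saved response carries a per-image error.
    """
    with open(json_path, encoding="utf-8") as f:
        resp = json.load(f)
    if "error" in resp:
        raise RuntimeError(resp["error"].get("message", "unknown error"))
    annotations = resp.get("textAnnotations")
    if not annotations:
        return "", None
    first = annotations[0]
    return first.get("description", ""), bounding_box(first.get("boundingPoly", {}))
```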


Resources
  1. PHP script courtesy of http://terrenceryan.com
  2. More detailed Python implementation -- https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
  3. Google's official Cloud Vision API text detection samples (Python) -- https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/python/text

Aiman Baharum

More about this blog https://github.com/aimanbaharum/random-wiki/wiki

Kuala Lumpur, Malaysia http://www.aimanbaharum.com

Subscribe to Knowledge Log
