OCR with Google Cloud Vision API

The first part of this guide is on PHP, whereas the second part is on how to implement it in Python.

  1. Enable Cloud Vision API
  2. Generate an API key from the API console
PHP Script

This is how you use Google Vision API in PHP. Simply change the type to anything applicable (in this case, we're using TEXT_DETECTION API), pass some necessary data, and boom, muh Google artificial intelligence right in front of you.


$api_key = YOUR_API_KEY;
$cvurl = "https://vision.googleapis.com/v1/images:annotate?key=" . $api_key;

if ($_FILES['photo']['name']) {
	if(!$_FILES['photo']['error']) {
		$valid_file = true;
		if($_FILES['photo']['size'] > (4024000)) {
			$valid_file = false;
			die('Your file\'s size is too large.');

		if($valid_file) {
			//convert it to base64
			$fname = $_FILES['photo']['tmp_name'];
			$data = file_get_contents($fname);
			$base64 = base64_encode($data);
			$r_json ='{
			  	"requests": [
					  "image": {
					    "content":"' . $base64. '"
					  "features": [
					      	"type": "' .$type. '",
							"maxResults": 200

			$curl = curl_init();
			curl_setopt($curl, CURLOPT_URL, $cvurl);
			curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
			curl_setopt($curl, CURLOPT_HTTPHEADER, array("Content-type: application/json"));
			curl_setopt($curl, CURLOPT_POST, true);
			curl_setopt($curl, CURLOPT_POSTFIELDS, $r_json);
			$json_response = curl_exec($curl);
			$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);

			if ( $status != 200 ) {
			    die("Error: $cvurl failed status $status" );

			echo $json_response;
	else {
		echo "Error";
		die('Drror:  '.$_FILES['photo']['error']);

Extras -- Doing it in Python

  1. Install Python
  2. Enable Cloud Vision API

Open up cmd and issue a command $ python --version to make sure Python is installed correctly

The Python way
  1. Download and save this file as cloudvisreq.py. But, as usual, I'm gonna shamelessly copy the script here as a mirror in case Gist is down.
from base64 import b64encode
from os import makedirs
from os.path import join, basename
from sys import argv
import json
import requests

ENDPOINT_URL = 'https://vision.googleapis.com/v1/images:annotate'
RESULTS_DIR = 'jsons'
makedirs(RESULTS_DIR, exist_ok=True)

def make_image_data_list(image_filenames):
    image_filenames is a list of filename strings
    Returns a list of dicts formatted as the Vision API
        needs them to be
    img_requests = []
    for imgname in image_filenames:
        with open(imgname, 'rb') as f:
            ctxt = b64encode(f.read()).decode()
                    'image': {'content': ctxt},
                    'features': [{
                        'type': 'TEXT_DETECTION',
                        'maxResults': 1
    return img_requests

def make_image_data(image_filenames):
    """Returns the image data lists as bytes"""
    imgdict = make_image_data_list(image_filenames)
    return json.dumps({"requests": imgdict }).encode()

def request_ocr(api_key, image_filenames):
    response = requests.post(ENDPOINT_URL,
                            params={'key': api_key},
                            headers={'Content-Type': 'application/json'})
    return response

if __name__ == '__main__':
    api_key, *image_filenames = argv[1:]
    if not api_key or not image_filenames:
            Please supply an api key, then one or more image filenames

            $ python cloudvisreq.py api_key image1.jpg image2.png""")
        response = request_ocr(api_key, image_filenames)
        if response.status_code != 200 or response.json().get('error'):
            for idx, resp in enumerate(response.json()['responses']):
                # save to JSON file
                imgname = image_filenames[idx]
                jpath = join(RESULTS_DIR, basename(imgname) + '.json')
                with open(jpath, 'w') as f:
                    datatxt = json.dumps(resp, indent=2)
                    print("Wrote", len(datatxt), "bytes to", jpath)

                # print the plaintext to screen for convenience
                t = resp['textAnnotations'][0]
                print("    Bounding Polygon:")
                print("    Text:")
  1. Run the script in terminal

    $ python cloudvisreq.py API_KEY image1.jpg image2.png

  1. PHP script courtesy of http://terrenceryan.com
  2. More detailed Python implementation -- https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
  3. Google's official Cloud Vision API docs -- https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/python/text
Show Comments

Get the latest posts delivered right to your inbox.