Sunday, November 6, 2011

Google Dictionary API example in Python - gets primaries and webDefinitions

As pointed out earlier by Google, it has an official dictionary API.

The response that comes from the server is a JSON string. I wrote two scripts: define.py (which gets the meaning and converts it to a dictionary) and pretty_print.py (which prints the meanings in a readable way).
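To make the target structure concrete, here is a small sketch of the dictionary shape that get_meaning in define.py builds, as read from the code in this post. The sample entries and the helper name count_meanings are made up for illustration; they are not actual API output.

```python
# Hypothetical sample of the structure returned by define.get_meaning().
# The definitions below are invented for illustration, not real API output.
sample = {
    'primaries': {
        # label (e.g. part of speech) -> list of [meaning, examples];
        # examples is a list of strings, or None when there are none
        'noun': [
            ['a made-up first meaning', ['an example sentence']],
            ['a made-up second meaning', None],
        ],
    },
    # flat list of definition strings
    'webDefinitions': ['a made-up web definition'],
}


def count_meanings(all_meanings):
    """Return (number of primary meanings, number of web definitions)."""
    primary = sum(len(entries) for entries in all_meanings['primaries'].values())
    return primary, len(all_meanings['webDefinitions'])


print(count_meanings(sample))
```

Keeping each meaning next to its examples in one small list is what lets pretty_print.py print them together.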

These scripts are part of a GUI application that is under development. You can also get the sources from https://github.com/shadyabhi/godict

define.py

#!/usr/bin/python2

import json
import urllib
import re

def asciirepl(match):
    #Rewrite a \xNN escape (not valid in JSON) as \u00NN
    return '\\u00' + match.group()[2:]

def get_meaning(query):
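    #Quote the query so that spaces and special characters are URL-safe
    p = urllib.urlopen('http://www.google.com/dictionary/json?callback=a&q='+urllib.quote(query)+'&sl=en&tl=en&restrict=pr,de&client=te')
    page = p.read()[2:-10] #Strip the a(...) wrapper, as the JSON is returned as a function call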
    
    #Replace \xNN hex escapes, which are not valid JSON, with \u00NN escapes
    p = re.compile(r'\\x(\w{2})')
    ascii_string = p.sub(asciirepl, page)

    #Now decoding cleaned json response
    data = json.loads(ascii_string)
    
    #Return None when webDefinitions is missing. ??Yet to check whether it is always present??
    if "webDefinitions" not in data:
        return None

    no_of_meanings = len(data['webDefinitions'][0]['entries']) 
    all_meanings = dict()
    all_meanings['primaries'] = dict()
    all_meanings['webDefinitions'] = list()

    if 'primaries' in data:
        #Creating a list() for each type: adj, verb, noun
        for bunch in data['primaries']:
            #This list contains meanings and examples
            all_meanings['primaries'][bunch['terms'][0]['labels'][0]['text']] = list()
            means = all_meanings['primaries'][bunch['terms'][0]['labels'][0]['text']]
            
            for i in range(len(bunch['entries'])):
                #Keep only the chosen meanings; other entry types can be related terms
                if bunch['entries'][i]['type'] != "meaning": continue
                meaning = bunch['entries'][i]['terms'][0]['text']
                try:    
                    example = list()
                    #Examples start with ZERO index
                    for i_ex in range(0, len(bunch['entries'][i]['entries'])):
                        example.append(bunch['entries'][i]['entries'][i_ex]['terms'][0]['text'])
                        
                except (KeyError, IndexError):
                    #This meaning has no examples
                    example = None
                means.append([meaning, example])
                
    #Web definitions
    for meaning in data['webDefinitions'][0]['entries']:
        all_meanings['webDefinitions'].append(meaning['terms'][0]['text'])
    
    return all_meanings
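
The cleanup step in get_meaning can be tried without hitting the server. The following is a minimal sketch in Python 3 syntax. It assumes the response looks like a({...},200,null), which is what the [2:-10] slice above implies. The sample body and the helper name clean_jsonp are made up for illustration.

```python
import json
import re

# Hypothetical response body in the shape the script expects: a JSONP call
# a(<json>,200,null) whose JSON contains a \xNN escape, which JSON forbids.
raw = 'a({"query":"it\\x27s","webDefinitions":[]},200,null)'


def clean_jsonp(body):
    """Strip the a(...) wrapper and rewrite \\xNN escapes as \\u00NN."""
    payload = body[2:-10]  # drop leading 'a(' and trailing ',200,null)'
    return re.sub(r'\\x([0-9a-fA-F]{2})', r'\\u00\1', payload)


data = json.loads(clean_jsonp(raw))
print(data['query'])
```

Unlike the \w{2} pattern in define.py, this sketch matches only hex digits, so a stray \x followed by other characters is left untouched.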

The test script for the above module.

pretty_print.py

#!/usr/bin/python2

import define
import sys
import httplib
import xml.dom.minidom


means = define.get_meaning(sys.argv[1])

if means is not None:
    #Short Summary
    for sec in means['primaries'].keys():
        meanings = means['primaries'][sec]
        print sec, "\n---------------"
        for m in meanings:
            print "\n\t", m[0]
            #m[1] is None when a meaning has no examples
            for e in m[1] or []: print "\t\t--",e
    #Web Definitions
    print "\nWeb Definitions","\n---------------"
    for defs in means['webDefinitions']:
        print "\t",defs
else:
    print "Word not found. These are the suggestions"
    data = """ 
    <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
    <text> %s </text>
    </spellrequest>
    """

    word_to_spell = sys.argv[1]
    con = httplib.HTTPSConnection("www.google.com")
    con.request("POST", "/tbproxy/spell?lang=en", data % word_to_spell)
    response = con.getresponse()

    dom = xml.dom.minidom.parseString(response.read())
    dom_data = dom.getElementsByTagName('spellresult')[0]

    for child_node in dom_data.childNodes:
        #Each child holds the whitespace-separated suggestions for one word
        result = child_node.firstChild.data.split()
        print result

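The suggestion parsing at the end of pretty_print.py can be checked offline as well. This is a rough sketch in Python 3 syntax. The sample reply is a simplified, assumed shape (one child element per misspelled word, with whitespace-separated suggestions as its text), not captured server output. The helper name extract_suggestions is hypothetical.

```python
import xml.dom.minidom

# Assumed, simplified example of a spell-check reply: one <c> element per
# misspelled word, with tab-separated suggestions as its text.
sample_reply = (
    '<spellresult>'
    '<c>hello\thelp\thell</c>'
    '</spellresult>'
)


def extract_suggestions(xml_text):
    """Return a list of suggestion lists, one per non-empty child element."""
    dom = xml.dom.minidom.parseString(xml_text)
    result = dom.getElementsByTagName('spellresult')[0]
    suggestions = []
    for node in result.childNodes:
        # Skip nodes without content, whose firstChild is None
        if node.firstChild is not None:
            suggestions.append(node.firstChild.data.split())
    return suggestions


print(extract_suggestions(sample_reply))
```

The firstChild check is an addition over the script above: it avoids an AttributeError when the reply contains an empty element.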

When I execute pretty_print,


In a few days, I plan to build a GUI for this that also reminds you of the words you have searched, and hence helps improve vocabulary.

2 comments:

  1. This may violate Google's policy since Google does not allow this kind of abuse.

  2. What is a use case for this app? I am simply trying to upload my own python dict and then query it via GET api.
