Skip to main content

NLBean - Make your database understand English

Natural language processing (a.k.a. NLP) is a stream of artificial intelligence and computational linguistics. In theory, it is the most attractive method of human-computer interaction; but as natural language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it, implementing Natural Language Processing has infact been one of the most sought after conundrums in the computing world. This article presents an abstract introduction to natural language processing and further discusses implementing the same to query databases.

What is a Natural Language Processing (NLP)…?

Natural language processing is the collection of techniques employed to enable the computers to understand the languages spoken by humans. The concept linguistic analysis and processing originated with efforts in the United States in the 1950s, wherein the intent was to use computers to automatically translate texts from foreign languages into English. Since computers had proven their ability to do arithmetic much faster and more accurately than humans, it was thought to be only a short matter of time before computers demonstrated the remarkable capacity to process human spoken languages. When computer based translation failed to yield accurate translations even after recurring efforts, automated processing of human languages was concluded to be far more complex than originally assumed. Hereafter natural language processing was recognized as a new field of study, devoted to developing algorithms and software for intelligently processing language data. Over the past 50 years, the field of natural language processing has advanced considerably and several algorithms have been developed, which process language grammar and syntax.

What is Natural Language Database Query (NLDQ)…?

Thinking a little innovative around the implementations of natural language processing, one can imagine a plethora of its applications, including a natural language processor to query databases. Natural language database query (NLDQ) is a subset of natural language processing (NLP) that deals with natural language inquiries against structured databases. The quintessence of natural language database querying (NLDQ) is to transform natural language requests into SQL or some other database query language, which could be further used to perform extractions from standard databases. As of today, there are quite some implementations which transform regular English sentences into well-formed queries. Following are some of the viable options in this segment – Commercial
  • Semantra
  • ELF English Query
Educational
  • Nchiql - a Chinese natural language database querying system
  • TELL-ME - a VAX/VMS based prototype natural language database querying system
Another workable option and one of my favorite open source projects in the arena of natural language database querying (NLDQ) is NLBean. Although the code is very much crude and experimental, yet it does work fairly well. The implementation could be extended, customized to identify varied organizational domain terms and used to render an easy to use interface for our business users who struggle to understand standard database query languages. The following screenshot depicts the standard interface rendered by NLBean v5.0 –

(Click on the image to zoom)

References
  • Download the latest version of NLBeans here.
  • Further details on NLBeans can be found here.

Comments

Popular posts from this blog

Shard – A Database Design

Scaling Database is one of the most common and important issue that every business confronts in order to accommodate the growing business and thus caused exponential data storage and availability demand. There two principle approaches to accomplish database scaling; v.i.z. vertical and horizontal. Regardless of which ever scaling strategy one decides to follow, we usual land-up buying ever bigger, faster, and more expensive machines; to either move the database on them for vertical scale-up or cluster them together to scale horizontally. While this arrangement is great if one has ample financial support, it doesn't work so well for the bank accounts of some of our heroic system builders who need to scale well past what they can afford. In this write-up, I intend to explain a revolutionary and fairly new database architecture; termed as Sharding, that some websites like Friendster and Flickr have been using since quite sometime now. The concept defines an affordable approach t...

FAINT - Search for faces

Lately, I've been playing around a bit with facial pattern recognition algorithms and their open source implementations. I came across many reference implementation but a very few were implemented in Java, and the Eigenfaces algorithm by far happens to be the best amongst them all. During my research around the said topic i happened to stumble-upon an implementation called FAINT (The Face Annotation Interface - http://faint.sourceforge.net). Faint by far the best facial pattern recognition API and as you must have already guessed, it implements the Eigenfaces algorithm. Now enough of theory talks, how about implementing an example with faint...? Here is one for all you face-recognition enthusiasts. The following example simply searches for faces in a given photograph and thumbnails them. Now, I know thats not face recognition; but be a little creative here. Once you have the facial thumbnails extracted, its never a big deal to look further in the Faint API and find methods which ca...

Is Java String really immutable...?

In many texts String is cited as the ultimate benchmark of Java's various immutable classes. Well, I'm sure you'd have to think the other way once you have read this article. To start with, let's get back to the books and read the definition of immutability. The Wikipedia defines it as follows - 'In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.' I personally find this definition good as it mentions that an immutable instance's state should not be allowed to be modified after it's construction. Now keeping this in the back of our minds, let's decompile Java's standard String implementation and peep into the hashCode() method - public int hashCode() { int h = hash; if (h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i h = 31*h + val[off++]; } hash = h; } return h; } A detailed ...