Monday, August 4, 2008

NLBean - Make your database understand English

Natural language processing (NLP) is a branch of artificial intelligence and computational linguistics. In theory, it is the most attractive method of human-computer interaction. In practice, because understanding natural language seems to require extensive knowledge about the outside world and the ability to manipulate it, implementing NLP has in fact been one of the most sought-after and most difficult problems in the computing world. This article presents a high-level introduction to natural language processing and then discusses applying it to query databases.

What is Natural Language Processing (NLP)?

Natural language processing is the collection of techniques that enable computers to understand the languages spoken by humans. The concept of linguistic analysis and processing originated with efforts in the United States in the 1950s, where the intent was to use computers to automatically translate texts from foreign languages into English. Since computers had proven their ability to do arithmetic much faster and more accurately than humans, it was thought to be only a matter of time before they could process human languages as well. When computer-based translation failed to yield accurate results even after repeated efforts, the automated processing of human languages was recognized to be far more complex than originally assumed. From then on, natural language processing was treated as a new field of study, devoted to developing algorithms and software for intelligently processing language data. Over the past 50 years, the field has advanced considerably, and several algorithms have been developed that process language grammar and syntax.

What is Natural Language Database Query (NLDQ)?

Thinking a little more broadly about the uses of natural language processing, one can imagine a plethora of applications, including a natural language processor for querying databases. Natural language database query (NLDQ) is a subset of NLP that deals with natural language inquiries against structured databases. The essence of NLDQ is to transform natural language requests into SQL or some other database query language, which can then be used to retrieve data from standard databases. Today there are several implementations that transform regular English sentences into well-formed queries. Following are some of the viable options in this segment:

Commercial
  • Semantra
  • ELF English Query
Educational
  • Nchiql - a Chinese natural language database querying system
  • TELL-ME - a VAX/VMS based prototype natural language database querying system
Another workable option, and one of my favorite open source projects in the arena of NLDQ, is NLBean. Although the code is rather crude and experimental, it works fairly well. The implementation could be extended and customized to recognize an organization's domain terms, and used to provide an easy-to-use interface for business users who struggle with standard database query languages. The following screenshot depicts the standard interface rendered by NLBean v5.0:

[Screenshot: the standard interface rendered by NLBean v5.0]
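
To make the idea of turning an English request into SQL a little more concrete, here is a minimal sketch in Python. It is not how NLBean works internally; it is a toy keyword matcher written only for illustration. The table and column names in SCHEMA, the business terms in SYNONYMS, and the to_sql function are all hypothetical, and the only sentence shape it handles is "... columns ... table [where column comparator value]".

```python
import re

# Hypothetical schema: table name -> column names (assumption for this sketch).
SCHEMA = {"employee": ["name", "salary", "department"]}

# Hypothetical domain vocabulary: business terms -> schema names.
SYNONYMS = {
    "employees": "employee",
    "staff": "employee",
    "pay": "salary",
    "wage": "salary",
    "dept": "department",
}

# English comparison phrases -> SQL operators.
COMPARATORS = {"greater than": ">", "more than": ">", "less than": "<", "equal to": "="}

CONDITION = re.compile(r"(\w+) (" + "|".join(COMPARATORS) + r") (\w+)")


def normalize(word):
    """Map a domain term to its schema name, or return the word unchanged."""
    return SYNONYMS.get(word, word)


def to_sql(question):
    """Translate a very restricted English request into (sql, params)."""
    text = question.lower().strip(" ?.")
    head, _, tail = text.partition(" where ")

    # Find the table and the requested columns among the words of the request.
    words = [normalize(w) for w in re.findall(r"[a-z_]+", head)]
    table = next((w for w in words if w in SCHEMA), None)
    if table is None:
        raise ValueError("no known table mentioned in the request")
    columns = [w for w in words if w in SCHEMA[table]]
    sql = "SELECT {} FROM {}".format(", ".join(columns) or "*", table)

    # Optional condition; the value is passed as a parameter, not inlined.
    params = []
    if tail:
        match = CONDITION.fullmatch(tail)
        if match is None:
            raise ValueError("condition not understood: " + tail)
        column = normalize(match.group(1))
        if column not in SCHEMA[table]:
            raise ValueError("unknown column: " + column)
        value = match.group(3)
        params.append(int(value) if value.isdigit() else value)
        sql += " WHERE {} {} ?".format(column, COMPARATORS[match.group(2)])
    return sql, params


if __name__ == "__main__":
    print(to_sql("Show the name and pay of staff where pay more than 50000?"))
```

Extending the SYNONYMS table is the toy equivalent of teaching the system an organization's domain terms: "staff" and "pay" are not in the schema, yet the request still resolves to the employee table and the salary column. A real NLDQ system needs far more than this, such as proper parsing, joins and ambiguity handling.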

References
  • Download the latest version of NLBean here.
  • Further details on NLBean can be found here.
