
Computer Standards & Interfaces 35 (2013) 470–481


How to make a natural language interface to query databases accessible to everyone: An example

Miguel Llopis ⁎, Antonio Ferrández
Dept. Languages and Information Systems, University of Alicante, Spain

⁎ Corresponding author. E-mail addresses: mll9@alu.ua.es (M. Llopis), antonio@dlsi.ua.es (A. Ferrández).

Article info: Available online 12 October 2012.

Keywords: Natural language interface; Relational database; Ontology extraction; Concept hierarchy; Query-authoring services.

Abstract: Natural Language Interfaces to Query Databases (NLIDBs) have been an active research field since the 1960s. However, they have not been widely adopted. This article explores some of the biggest challenges and approaches for building NLIDBs and proposes techniques to reduce implementation and adoption costs. The article describes {AskMe*}, a new system that leverages some of these approaches and adds an innovative feature: query-authoring services, which lower the entry barrier for end users. Advantages of these approaches are proven with experimentation. Results confirm that, even when {AskMe*} is automatically reconfigurable against multiple domains, its accuracy is comparable to domain-specific NLIDBs.

1. Introduction

A natural language interface to query databases (NLIDB) is a system that allows users to access information stored in a database by typing requests expressed in some natural language [1,2,22], such as English, Spanish, etc.

NLIDBs have been a field of investigation since the 1960s [2]. There have been many interesting theories and approaches about how an NLIDB could be built in order to improve its accuracy [3], how to make NLIDBs more open in terms of the natural language expressions that they accept [4,16], or even how to make them guess the real intent of a user who is trying to construct a query where some pieces are missing [21]. We will analyze these approaches in this article.

While the research work on NLIDBs has led to many different systems being implemented in academic and research environments (e.g. [2–5,8,9,17–19]), it is difficult to find many of these systems being used in business environments or being commercialized by companies expanding across various market segments or domain niches [22].

In this article, we will explore previous NLIDB systems and classify them based on the different approaches that they implement. At the same time, we will explain which of these approaches lead to reduced costs at different stages of the NLIDB lifecycle. Finally, we will look at how we have implemented our proposals to minimize implementation, configuration, portability and learning costs, by analyzing the implementation of {AskMe*}, an ongoing NLIDB research work.

2. Classification of existing NLIDBs

As we outlined in the previous section, there have been many different approaches to the construction of NLIDBs, and there are various ways of classifying them. In this article, we will explore two of the most common taxonomies for the classification of NLIDBs, which appear across various overview articles about the NLIDB field (e.g. [2,22]) and are complemented by our own research observations:

- Based on user interface: textual NLIDBs vs. graphical NLIDBs.
- Based on domain-dependency: domain-dependent vs. domain-independent NLIDBs.
  ο As part of the previous classification, we will divide these NLIDBs into subcategories based on their degree of portability and reconfiguration capabilities. This particular classification is not something that we have found in previous work per se, but rather a pattern that we have extracted from the characteristics of the systems that we have analyzed and the previous research papers on the field of NLIDBs that we have taken into account for our work.

In the next sections, we will explore the idiosyncrasies of each of these approaches. It is important to emphasize that we do not claim one of these approaches to be better than the others, as each of them has its advantages and disadvantages [2,22]. However, we will evaluate the convenience of each of these approaches in regards to the main goal of our research work: optimizing the costs of NLIDBs.

2.1. NLIDBs by their user interface: textual interfaces vs. graphical interfaces

One of the biggest questions in the space of NLIDBs through the decades has been the dilemma of choosing a textual or a graphical user interface to build the system. Each of these two alternatives has its own advantages and disadvantages that are worth considering, as described in [2,22]:

- Textual NLIDBs: examples of this type of NLIDB are HEY [18], AT&T [19], LUNAR [2,24] or PRECISE [3].
  ο Advantage:
    ■ The user is not required to learn any additional language.
  ο Disadvantages:
    ■ The linguistic coverage of the system is not obvious.
    ■ Overlap of linguistic and conceptual failures.
- Graphical NLIDBs: an example of this type of NLIDB is NL-Menu [30].
  ο Advantage:
    ■ Easy to dynamically constrain query formulation based on user selections, in order to only build valid queries.
  ο Disadvantages:
    ■ Lack of flexibility in query formulation.
    ■ Expressive power reduced to the user interface design (less expressive power than a textual natural language).

While most of the NLIDBs built in the past can be classified in one of these two categories, there is an intermediate option between both, which consists of combining the expressive power of a textual NLIDB with the visual feedback to the user provided by a graphical NLIDB, as we presented in our previous work [23]. This can be achieved by including query-authoring services such as syntax coloring, text completions or keyword highlighting as part of the system design; to the best of our knowledge, our proposal is the first NLIDB that incorporates these features into the design of the system. Moreover, {AskMe*} helps the user to make valid queries by automatically distinguishing between linguistic and conceptual failures.

2.2. NLIDBs by their degree of portability and re-configurability: domain-dependent vs. domain-independent NLIDBs

A second taxonomy in NLIDB classification can be made by considering the different approaches for how an NLIDB relates to the knowledge domain of the database that is being queried.

- Domain-dependent NLIDBs: These NLIDBs need to "know" particularities about the underlying domain entities and restrictions in order to work.
  ο Non-Reconfigurable: Many of the NLIDBs in this group are designed ad hoc for a particular problem domain (database). An example in this category is LUNAR [2,24].
  ο Reconfigurable: Another group of NLIDBs are domain-dependent but can be reconfigured towards being used to query a database that belongs to a different domain. In most cases, this reconfiguration consists of remapping domain entities and terms from the DB in the query DSL.¹ This very often requires the intervention of a technical user in order to perform these adjustments. Examples in this category include AT&T [19] and ASK [2,22].
  ο Auto-reconfigurable: This bucket is the most interesting from a cost-saving perspective [23,30], as it allows NLIDBs that are knowledgeable about the underlying domain data (and can therefore provide more accurate information, error messages, etc.) and at the same time enables non-technical users to connect to multiple databases without the need for manual reconfiguration. The system knows who (the database connection string) and what (entities, properties, data types, etc., generally captured in an underlying source of knowledge, such as ontologies) to ask in order to learn how to deal with the user queries. Examples of this category include HEY [18], GINLIDB [29], FREyA [28] and an NLIDB for the CINDI virtual library [4].
- Domain-independent NLIDBs: There are many other NLIDBs that allow the user to write queries in a natural language and that do not store any knowledge about the underlying domain; they simply translate NL queries into SQL queries and execute them against the underlying database [22]. Since the system does not know anything about the domain, it is not able to warn the user about conceptual errors in the query (entity–property mismatch, data type mismatch, etc.), and therefore the error-catching will happen in the database, thus making the system slower and less user-friendly when the query is ill-formed. An example of an NLIDB system in this category is PRECISE [3].

¹ Domain Specific Language.

The problem of portability of NLIDBs is, from our perspective, one of the most critical ones to be solved. By itself, the cost of developing an NLIDB can be very high, and in most of the approaches taken for creating NLIDBs, the resulting systems are tightly coupled to the underlying databases [22].

In the last few years, there have been interesting approaches to the design of NLIDBs that are database-independent (e.g. [3,4]), in the sense that they can cope effectively with queries targeting different domains without requiring substantial reconfiguration efforts. One of the best examples of this approach is PRECISE [3]. This system combines the latest advances in statistical parsers with a new concept of semantic tractability. This approach allows PRECISE to easily become highly reconfigurable. In addition, this was one of the first NLIDB systems that used the parser as a plug-in, so it could be changed with relative ease in order to leverage the newest advances in the parser space.

An interesting advantage of adapting the parsing process to each of the knowledge domains that the system connects to is that analyzing an input question in NLIDB systems is often based on part-of-speech (POS) tagging, followed by a syntactic analysis (partial or full) and finally a more or less precise semantic interpretation. Although there are broadly accepted techniques for POS tagging (e.g. [5–7]) and syntactic analysis (e.g. [6]), techniques for semantic parsing are still very diverse and ad hoc. In an open-domain situation, where the user can ask questions on any topic, this task is often very difficult and relies mainly on lexical semantics only. However, when the domain is limited (as is the case of an NLIDB), the interpretation of a question becomes easier, as the space of possible meanings is smaller and specific templates can be used [8]. It has been demonstrated [9] that meta-knowledge of the database, namely the schema of the database, can be used as an additional resource to better interpret the question in a limited domain.

Another interesting existing solution, based on the creation of a new NLIDB every time that the system is connected to a new database, is the system developed for the CINDI virtual library [4], which is based on the use of semantic templates. The input sentences are syntactically parsed using the Link Grammar Parser [10], and semantically parsed through the use of domain-specific templates. The system is composed of a pre-processor and a run-time module. The pre-processor builds a conceptual knowledge base from the database schema using WordNet [13]. This knowledge base is then used at run-time to semantically parse the input and create the corresponding SQL query. The system is meant to be domain-independent and has been tested with the CINDI database, which contains information on a virtual library.

The improvements that our research work in {AskMe*} provides in regards to the portability problem space are described in the next sections.

3. Most significant costs in NLIDBs

Building an NLIDB system and bringing it into production has a significant cost [2,22,27,28,30]. This cost can be analyzed and divided across the different stages of the NLIDB lifecycle: system implementation, deployment and configuration, and finally system users' adoption.

- System implementation: Creating an NLIDB is not a trivial task; it represents an engineering effort that must be taken into account when

considering the creation of an NLIDB [2]. During the implementation phase, and independently of the planning methodology being used, the costs can mostly be divided into the following three categories:
  ο Design: The design of the system is costly; it is in this phase when various decisions must be taken: whether the system is designed to be domain-dependent or independent, what the different modules should look like (lexer, syntactic and semantic parsers, translation to SQL, etc.). This design phase might require weeks, even months, of engineering and architectural work [22].
  ο Development: Even when there are tools and frameworks to assist in the process, creating a natural language interface is a laborious task. Being able to provide high expressive power while also processing queries efficiently is hard [2,28].
  ο Testing: In order to create a system that can be reliable, efficient and error-free, it is important to invest significantly in testing it: unit testing independent modules of the system, verifying the robustness and validity of the system when integrating its various pieces, or validating the usability of the system in end-to-end scenarios or queries are just some of the testing activities that must be done in this phase [2].
- Deployment and configuration: This phase comprises the different activities required to deploy and adapt the system, once it has been fully implemented and tested, for real use in a concrete enterprise. It includes, among others, the following tasks: deploying system components, configuring connections between components, connecting the system to the domain database, mapping database entities to system keywords, training the system to understand users' expressions, and ensuring robustness and high availability of the deployed system [22].
- Users' learning process: Last but certainly not least, once the system has been deployed to an enterprise environment, it has to be accepted and understood by end users. This is not a trivial process; in fact, making the system easy to understand, learn and use for the target user must be considered the most important principle from the design stage and across all the other phases: the most complete and sophisticated NLIDB is worthless if users are not happy and satisfied while using and interacting with it, or, even more, if they reject using it because they do not like it. Thus, the learning process must be made smooth and compelling for the users, and this implies that a few different factors must be taken into account: users' learning curve for NLI constructions, database entities and relationships, etc. may be slow without help from the system; users' learning curve for the system's graphical user interface may require additional learning effort; and users need to be trained in order to be able to troubleshoot the most frequent system errors by themselves (connectivity issues, user access, etc.) [27].

It is due to all the costs enumerated previously that we believe that an NLIDB, in order to be successfully and widely adopted in the real-world enterprise, has to be designed once; be portable and able to target different databases and knowledge domains; be easily reconfigurable to connect to a different database without the need for specialized deployment or reconfiguration steps that end users cannot understand; and, finally, must allow users to be productive with the system from the first day of use, while implementing a mechanism for letting users learn more advanced concepts of the system as they use it.

4. Contributions of our approach compared to previous related work

The main improvements of our proposal compared to other existing systems are the significant reduction of costs: implementation and reconfiguration costs are optimized due to the dynamic nature of the system, and learning costs for end users are greatly reduced as well, thanks to the use of query-authoring services.

Some other NLIDB systems developed in the past few years include GINLIDB [29], WASP [22,25] and NALIX [22,26]. GINLIDB represents an interesting attempt at creating a fully auto-reconfigurable or "generic interactive" (as the "G" and "I" letters in the acronym stand for) approach to the creation of NLIDBs. This system has been an inspiration for the work developed in {AskMe*}; however, our system tries to go a step beyond what GINLIDB accomplished in auto-reconfiguration of the NLIDB. While GINLIDB lets the user define custom mappings between words in their input queries and actual database entities by means of graphical menus that are displayed after a query with errors or ambiguities has been introduced by the user, {AskMe*} attempts to provide richer query-authoring services, which are aimed at helping users to easily learn how to ask questions in a new domain, by providing query suggestions, error highlighting and domain-specific error descriptions, as we will describe later.

WASP (Word Alignment-based Semantic Parsing) is a system developed at the University of Texas by Yuk Wah Wong [25]. While the system is designed to address the broader goal of constructing "a complete, formal, symbolic, meaningful representation of a natural language sentence", it can also be applied to the NLIDB domain. A predicate logic (Prolog) was used as the formal query language. WASP learns to build a semantic parser given a corpus of natural language sentences annotated with their correct formal queries. The strength of WASP comes from the ability to build a semantic parser from annotated corpora. This approach is beneficial because it uses statistical machine translation with minimal supervision; therefore, a grammar does not have to be manually developed for each different domain. Despite this strength, WASP also has two weaknesses. The first is that the system is based solely on the analysis of a sentence and its possible query translation, and the database part is therefore left untouched. There is a lot of information that can be extracted from a database, such as the lexical notation, the structure, and the relations within it; not using this knowledge prevents WASP from achieving better performance, and this is an aspect that {AskMe*} tries to improve, as we will see later. The second problem is that the system requires a large amount of annotated corpora before it can be used, and building such corpora requires a large amount of work [22].

NALIX is a "Natural Language Interface for an XML Database" [26]. The database used for this system is an extensible markup language (XML) database with Schema-Free XQuery as the database query language. Schema-Free XQuery is a query language designed mainly for retrieving information in XML. The idea is to use keyword search for databases; however, pure keyword search certainly cannot be applied, so some richer query mechanisms are added [26]. Given a collection of keywords, each keyword has several candidate XML elements to relate to. All of these candidates are added to the MQF (Meaningful Query Focus), which will automatically find all the relations between these elements. The main advantage of Schema-Free XQuery is that it is not necessary to map a query onto the exact database schema, since it will automatically find all the relations given certain keywords. In NALIX, the transformation process is done in three steps: generating a parse tree, validating the parse tree, and translating the parse tree to an XQuery expression. This approach is being leveraged by our system as well, in the sense that user queries are validated by {AskMe*} before being executed against the database, thanks to the information available from the database schema; but with {AskMe*} we try to go beyond this in order to provide richer query-authoring services to the user, making the "writing a query" step more interactive and educational for the user, as we will describe later.

One of the first natural language interfaces that provides a notion of suggestions to the user in order to author a query is OWLPath [27]. This system suggests to the user how to complete a query by combining the knowledge of two ontologies, namely the question and the domain ontologies. The question ontology plays the role of a grammar, providing the basic syntactic structure for building sentences. The domain ontology characterizes the structure of the application-domain knowledge in terms of concepts and relationships. The system then makes suggestions

based on the content of the question ontology and its relationships with the domain ontology. Once the user has finished formulating the natural language query, OWLPath transforms it into a SPARQL query and issues it to the ontology repository. In the end, the results of the query are shown back to the user. This is an interesting approach in natural language interfaces to query ontologies that was published just a few months before the first publication about {AskMe*} [23]. While both systems leverage ontologies in order to provide the user with suggestions on how to complete their queries, the systems are considerably different in a few aspects: OWLPath is a natural language interface to query ontologies, while {AskMe*} is a natural language interface to query databases that leverages ontology generation as a technique to capture the characteristics and semantics of the underlying database schema. In addition, while OWLPath provides query suggestions or auto-completions for terms that exist in the underlying ontology, it does not provide error information that is specific to the domain in case a query contains errors, which is something that {AskMe*} tries to emphasize in order to educate users and help them learn how to use the system and understand the logical model of the underlying domain.

Another interesting and recent approach that inspires our work is FREyA [28], which combines syntactic parsing with the knowledge encoded in ontologies in order to reduce the customization effort. If the system fails to automatically derive an answer, it will generate clarification dialogs for the user. The user's selections are saved and used for training the system in order to improve its performance over time. While this is an interesting approach that inspires {AskMe*} in its principles, it differs from our research work in the sense that {AskMe*} focuses on helping users create valid queries from the beginning, as opposed to FREyA's approach of letting them introduce wrong queries and helping the system correct them by means of clarification dialogs, which are used for auto-correcting errors in the future.

{AskMe*} is the first NLIDB system that proposes the combination of textual NLIDBs with rich query-authoring services (syntax coloring, error squiggles, tooltips, etc.). This provides a substantial improvement in the user experience when writing queries, especially in regards to query accuracy, solving both linguistic failures and conceptual failures, which could not be fully solved by the use of menu-based user interfaces either. The use of query-authoring services helps to reinforce the conceptual center of the dialog between the user and the NLIDB around the domain entities in focus.

In order to achieve this, we capture domain-specific information in concept-hierarchy ontologies any time the system is connected to a new database. The system automatically generates the syntactic and semantic parsing templates and the rest of the components needed in order to provide query-authoring services. In addition, the system is fully auto-reconfigurable without the need for any specialized knowledge. This is a significant improvement compared to the existing portable solutions mentioned before, because it makes the entire reconfiguration process fully transparent to the end users, as opposed to having to perform reconfiguration steps for entity mapping, disambiguation, etc. This is a substantial improvement not only because of the amount of extra work that is saved in the reconfiguration steps, but also because it enables the system to be automatically managed without user intervention. This represents a step towards the democratization of NLIDBs, as users fitting a non-technical profile will be able to use the system on their own throughout the entire system lifecycle, from the very early steps of adoption and deployment of the system in a real-world production environment to the management, reconfiguration and diagnosis steps across multiple domains, to which the system is able to adapt itself automatically. In this sense, the role of query-authoring services is also fundamental, because they enable the user to perform the few manual reconfigurations needed, if any, driven by intuitive real-time hints in the query-authoring process.

5. {AskMe*}: an NLIDB that reduces adoption, portability and users' learning costs

{AskMe*} is a database-independent NLIDB that uses a template-based approach for the dynamic generation of the lexer and the syntactic and semantic parsers. Fig. 1 shows the different modules of the system.

An exhaustive description of every component of the system is out of the scope of this paper; instead, we will focus on describing the most relevant techniques that enable the proposed improvements of the system compared to other state-of-the-art systems: dynamic generation of the system and query-authoring services.

In order to make this analysis easier to follow, we will use a case study and complement the description of each of these components with its application to a given domain. We will use a sub-set of Northwind [14] (see Fig. 2), a canonical example of a relational database

Fig. 1. {AskMe*}'s high level architecture.



Fig. 2. Sub-set of Northwind database schema.

that captures the domain of a fictitious trading company, containing information about products, orders, suppliers, employees, etc.

5.1. Ontology builder

The first operation performed once {AskMe*} is connected to a database is to search for the ontology representing that domain in the ontology repository. This repository consists of a dictionary that stores ontology references for any given tuple <Server, Database> that the system has been connected to.

If the ontology for that particular domain does not exist, the ontology generation process is triggered. This process [10] analyzes the database catalog and schema in order to build the ontology that captures the domain entities, properties, relationships and constraints. {AskMe*} uses OWL for representing ontologies. In order to build the ontology for each database, and to keep the system within a manageable range of data volume, only the minimal information needed from the database is stored in the ontology. Concretely, entity names, properties and value types are mapped from the database into the ontology, while the actual data is not. The reason that motivates this decision is that we are using OWL as a way to represent the domain characteristics (entity names, entity properties, relationships, etc.) of the underlying database; however, the actual data is much bigger in size than the schema and also changes more often than the schema. Therefore, we decided to perform an analysis of the database schema that allows us to capture the nature of the domain, while the actual data retrieval part of each user query execution is performed directly against the database, after validating that all domain restrictions are satisfied by the user query, for which we leverage the domain representation captured in the OWL ontology. As a matter of fact, if a user query that complies with all the domain restrictions stored in OWL is then executed against the database and the result indicates that there has been a change in the underlying domain that makes the OWL ontology out of date, a new XML–OWL generation process is triggered automatically, in order to refresh the domain ontology and keep it accurate at any time with respect to the underlying database.
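A minimal sketch of this repository lookup follows, assuming a hypothetical OntologyBuilder facade; the dictionary keyed by the <Server, Database> tuple mirrors the behavior described above, and all names are illustrative:

```python
# Sketch of the ontology-repository lookup: a dictionary keyed by the
# <Server, Database> tuple; generation is only triggered on a cache miss.
class OntologyRepository:
    def __init__(self, builder):
        self._builder = builder            # hypothetical ontology-generation facade
        self._cache = {}                   # (server, database) -> OWL ontology

    def get(self, server, database):
        key = (server, database)
        if key not in self._cache:         # unknown domain: trigger generation
            self._cache[key] = self._builder.build(server, database)
        return self._cache[key]            # known domain: reuse the stored ontology
```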
In order to build the ontology capturing the mapping described above, {AskMe*} leverages OWLminer's approach [7], which consists of implementing the algorithm known as Feature and Relation Selection (FARS) [11]. FARS is a multi-relational feature and relation selection algorithm that uses target tables and attributes in order to create join chains with other tables, using foreign keys as links. The algorithm also uses the Levenshtein distance [12] as a metric for determining whether features are related or not. This metric is based on the closeness between the text and the feature values of the dataset. During this approximate search, every set of input texts from the set of relations and tables in the given database is analyzed. The result of this analysis is a set of attributes that meet the following constraint: all members must be columns (properties) within the current database table (entity), as described in Table 1.

After this first-level search has been performed for a given table, the next steps consist of finding the cross-table relationships, as well as taxonomic and non-taxonomic relations and dependencies, in order to make the ontology grow in this dimension too. The attributes identified in the previous step are now used to analyze and discover the set of corresponding tables. As part of this process, the primary and foreign keys are also identified.

The output of the feature and relation selection algorithm described in Fig. 3 is represented as an XML document where the first-level nodes in the tree represent tables. An example of the output of this algorithm can be found in Fig. 4, based on the Northwind schema described previously.

Table 1
Database–ontology mapping.

Database component → OWL component
- Table/entity → Class.
- Column → Functional property.
- Column metadata → OWL property restriction:
  - Data type → All values from restriction.
  - Mandatory/non-nullable → Cardinality() restriction.
  - Nullable → MaxCardinality() restriction.
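The join-chain construction step of FARS can be illustrated with a small sketch; the schema dictionary below is an assumed stand-in for the real catalog metadata, not the authors' implementation:

```python
# Minimal sketch of FARS-style join-chain construction [11]: starting from a
# target table, follow foreign-key links breadth-first to the other tables.
from collections import deque

FOREIGN_KEYS = {                      # table -> tables reachable via a foreign key
    "OrderDetail": ["Order", "Product"],
    "Order": ["Customer", "Employee"],
    "Product": ["Category", "Supplier"],
}

def join_chains(target):
    """Enumerate join chains (paths of FK links) rooted at the target table."""
    chains, queue = [], deque([[target]])
    while queue:
        chain = queue.popleft()
        for neighbor in FOREIGN_KEYS.get(chain[-1], []):
            if neighbor not in chain:          # avoid cycles
                queue.append(chain + [neighbor])
                chains.append(chain + [neighbor])
    return chains

for chain in join_chains("OrderDetail"):
    print(" -> ".join(chain))   # e.g. OrderDetail -> Order -> Customer
```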

Fig. 3. Feature and relation selection algorithm.

As can be seen in Fig. 4, all the entities in the previously described sample based on Northwind are captured in a custom XML tree structure. This XML tree contains not only the entity (table) names but also the columns and column types for each table, as well as primary-key and foreign-key information.

The next step in the extraction process consists of converting this XML tree into an ontology that can be used to generate all the information needed by the query-authoring services. In order to achieve this, the XML tree is processed and each table node is converted into an OWL class in the resulting document. For this sample we will only focus on the foreign-key relations for Category, Product, Supplier, Order and OrderDetail, but all other object properties would be represented in this OWL document as well. In a similar way, each foreign key is expressed as an OWL object property in which the two related classes are connected using domain and range attributes (see Fig. 5).

Fig. 5. OWL capturing classes and relations from Northwind.

By using this approach, the building process of the OWL ontology is accelerated, and the use of background knowledge also helps to extract the required knowledge from the database. This approach is considerably better in cost (time and space) than simply mirroring the database schema to the ontology, based on multiple experiments as described in [11].
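The table-to-class and foreign-key-to-object-property conversion can be sketched with the rdflib library; the two-table schema dictionary and the property-naming convention below are illustrative assumptions, not the system's actual code:

```python
# Illustrative sketch of the XML-tree-to-OWL conversion described above,
# using rdflib; the namespace and naming convention are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

NW = Namespace("http://example.org/northwind#")

# Tiny stand-in for the extracted XML tree: table -> [(fk_column, referenced_table)]
schema = {
    "Product": [("CategoryID", "Category"), ("SupplierID", "Supplier")],
    "OrderDetail": [("OrderID", "Order"), ("ProductID", "Product")],
}

g = Graph()
g.bind("owl", OWL)

for table, foreign_keys in schema.items():
    g.add((NW[table], RDF.type, OWL.Class))          # each table node becomes an OWL class
    for fk_column, referenced in foreign_keys:
        prop = NW[f"has{referenced}"]                # each FK becomes an object property
        g.add((prop, RDF.type, OWL.ObjectProperty))
        g.add((prop, RDFS.domain, NW[table]))        # the two classes are related via
        g.add((prop, RDFS.range, NW[referenced]))    # domain and range attributes

print(g.serialize(format="xml"))                     # OWL as RDF/XML
```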
5.2. Dynamic parser generation

After building the ontology that captures the overall characteristics of the database domain, the next step consists of automatically building the parsers that will help understand users' queries and translate them into SQL queries to be executed against the database.

As described previously, {AskMe*} is fully auto-reconfigurable and can be pointed at multiple domains, while at the same time being able to offer domain-specific features such as lexical, semantic and conceptual error detection. The key to these capabilities resides in the ability to perform this dynamic parser generation at all three levels: lexical, syntactic and semantic.

5.2.1. Lexicon

A lexicon is formed by the set of terms that can be understood by the system; that is, the set of terms that have a special meaning in a given NLIDB. This particularly means the set of entities and properties that have been identified in the database schema. In the case of {AskMe*}, these terms are also captured in the domain ontology.

In order to build the lexicon, we combine the set of nouns derived from the domain knowledge contained in the database, namely the entity and property names, with general-knowledge vocabulary terms, mostly verbs, adjectives and adverbs. We retrieve these general-knowledge vocabulary terms from WordNet [13], a large lexical database of English. This database classifies nouns, verbs, adjectives and adverbs into sets of cognitive synonyms. Cognitive synonyms (also named in WordNet as "synsets" [13]) are terms which belong to a different syntactic category (i.e. nouns, verbs, etc.) but represent related concepts; an example of a set of cognitive synonyms could be "approximation" (noun), "approximated" (adjective) and "approximate" (verb). Thanks to these cognitive synonym sets, we are also able to complement the existing set of domain-specific nouns (entities and properties from the domain ontology) with a large number of synonyms in the system lexicon. This aspect is very important, as it allows the lexer to automatically accept terms that, even when they are not the exact noun used in the underlying database schema, represent the same concept for the user. For example, the database may contain a property called "Telephone" for the entity "Customer", while the user probably refers to it simply as "Phone". The lexer is able to recognize "phone" as a valid term as well. In case a term is not in WordNet, such as "ProductID", several heuristics are applied (e.g. splitting a term into several terms when there is an uppercase letter in the middle of a term in lowercase letters: "Product + ID"). Finally, the user can review this lexicon in order to add or suppress synonyms (e.g. the term "Emp" is not in WordNet, so the user could add synonyms such as "Employee").

Following the example of Northwind described previously (see Fig. 2), the domain-specific lexicon of nouns built from the ontology (WordNet synonyms in parentheses) is presented in Table 2. Note that it contains Entities and Properties as specialized terms; this classification is not relevant to the lexicon itself, but will be used later for semantic analysis, as we will describe.

By having the dynamic lexicon generation process, {AskMe*} can implement an interesting feature such as the lexical error detection

Fig. 4. Generated custom XML tree containing all Northwind entities in the sample.

capability. Once the system has been configured, a user can start typing in a query and it will be processed by the lexer first. Every time a white space is added to the buffer, the lexer analyzes the term that goes immediately before this white space and decides whether it is valid from the lexical perspective or not. If the term does not appear in the lexicon, the lexer will tag it as an invalid lexical item. This tag information is automatically retrieved by the query-authoring services component, which will underline the invalid term with red squiggles in the query bar, making it evident to the user that the underlined part of the query is wrong, even before he finishes writing it, and offering tooltip information about the invalid term (Fig. 6).

Fig. 6. Lexical error squiggles and tooltip error information.

In the example shown in Fig. 6, a query about "projects" is provided by the user. The system analyzes this query in real time and determines that "projects" is the entity that needs to be found in the underlying domain. In order to do this, {AskMe*} looks for this entity in the lexicon (from Table 2) and determines that it does not exist. As a result, the system notifies the user about the error in the query by adding red squiggles as an underline to the term that has not been found in the lexicon. When the user places the mouse on this term, a tooltip containing additional information about the error is displayed.

The other query-authoring service offered by {AskMe*} at the lexer level is the completion suggestions mechanism, which offers, in a dropdown pop-up menu that appears below the word that the user is currently typing, the set of suggested words that contain the portion typed by the user as a fragment. This helps the user to remember the exact word that he is trying to write, and also to autocomplete it, making him write queries faster (Fig. 7).

Fig. 7. Completions for supplier properties starting with "Co".
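Both services rest on the generated lexicon. A minimal sketch of the lexicon construction, the CamelCase-splitting heuristic and the completion lookup follows, using NLTK's WordNet interface; the helper names and the tiny schema-term list are illustrative, not the system's actual code:

```python
# Sketch of lexicon construction and completion lookup described above.
import re
from nltk.corpus import wordnet as wn

def split_identifier(term):
    """Heuristic from the paper: split CamelCase terms ("ProductID" -> "Product ID")."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term)

def synonyms(term):
    """Collect WordNet lemma names for all synsets of a term."""
    return {l.name().replace("_", " ").lower()
            for s in wn.synsets(term) for l in s.lemmas()}

def build_lexicon(schema_terms):
    lexicon = set()
    for term in schema_terms:
        for word in split_identifier(term).split():
            lexicon.add(word.lower())
            lexicon |= synonyms(word)        # e.g. "telephone" also admits "phone"
    return lexicon

def completions(fragment, lexicon):
    """Suggest lexicon words containing the typed fragment."""
    return sorted(w for w in lexicon if fragment.lower() in w)

lexicon = build_lexicon(["Supplier", "ContactName", "Telephone"])
print("projects" in lexicon)       # False -> would be squiggled as a lexical error
print(completions("co", lexicon))  # e.g. candidate completions for "Co..."
```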

Table 2
Northwind's lexicon with WordNet synonyms. Entities (and synonyms), followed by their properties (and synonyms):

- Product (merchandise, ware): product ID (product identifier), product name (product denomination), supplier ID (provider identifier), category ID (type identifier, class identifier), quantity per unit, unit price (unit cost), units in stock, units in order, reorder level, discontinued.
- Order (command): order ID (order identifier), employee ID (worker identifier), order date (command date), required date (due date), shipped date, ship via, freight (cargo), ship name, ship address, ship city (town, municipality), ship region, ship postal code (ship zip code), ship country.
- Order details: order ID (order identifier), product ID (product identifier, ware identifier, merchandise identifier), unit price (unit cost), quantity (amount), discount (deduction, reduction, allowance).
- Categories (types, classes): category ID (type identifier, class identifier), category name (category denomination, class name, class denomination, type denomination, type name), description (representation, information), picture (photo, photograph, image).
- Suppliers (dealers, providers, vendors): supplier ID (dealer identifier, provider identifier, vendor identifier), company name (business name, enterprise name), contact name (correspondent name), contact title (correspondent appellation), address (direction, domicile), city (town, municipality), region (territory, district), postal code (zip code), country (nation, state), phone (telephone), fax (facsimile), home page.

5.2.2. Syntactic parser

{AskMe*} leverages the Link Grammar Parser [10] for the core syntactic parsing operations. The Link Grammar Parser is a syntactic parser of English based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a "constituent" representation of a sentence (showing noun phrases, verb phrases, etc.), like the one shown in Fig. 8.

Fig. 8. Constituent tree for the query "Suppliers that are not in United States".

The parser has a dictionary of about 60,000 word forms. It covers a wide variety of syntactic constructions, including many rare and idiomatic ones. The parser is robust: it is able to skip over portions of the sentence that it cannot understand and assign some structure to the rest of the sentence. It is able to handle unknown vocabulary and make intelligent guesses from context and spelling about the syntactic categories of unknown words. It has knowledge of capitalization, numerical expressions, and a variety of punctuation symbols.

A full description of the Link Grammar is out of scope for this article; however, it is noteworthy that, by using the Link Grammar API, the totality of this parser's capabilities can be leveraged in {AskMe*}, thus enabling our efforts to focus on other innovative areas, such as the combination of query-authoring services within the proposed NLIDB, as well as the portability of the system.

The concurrency mechanisms implemented on top of the Link Grammar Parser API are based on event notifications for all the syntactic parser events: every time the parser processes and tags a fragment of the input query, an event is generated, containing information about the syntactic classification of each token. This is a key component for driving the syntactic query-authoring service that {AskMe*} implements: syntactic error squiggles (green). These squiggles warn the user about syntactic errors in a query, even before the query authoring has been fully completed (Fig. 9).

Fig. 9. Syntactic error squiggles and tooltip error information.
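The event-notification mechanism can be sketched as a simple publish/subscribe arrangement; the class and field names below are illustrative, since the real implementation sits on top of the Link Grammar Parser API:

```python
# Sketch of the event-notification pattern described above: the parser publishes
# a tagging event per fragment; the query-authoring services component
# subscribes to it in order to draw squiggles in real time.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TokenEvent:
    token: str
    start: int          # character offset in the query bar
    end: int
    tag: str            # syntactic classification, or "error"

class SyntacticParserEvents:
    def __init__(self):
        self._subscribers: List[Callable[[TokenEvent], None]] = []

    def subscribe(self, handler: Callable[[TokenEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: TokenEvent) -> None:
        for handler in self._subscribers:
            handler(event)

def squiggle_renderer(event: TokenEvent) -> None:
    # React to error tags even before the query has been fully authored.
    if event.tag == "error":
        print(f"underline chars {event.start}-{event.end} ({event.token!r}) in green")

events = SyntacticParserEvents()
events.subscribe(squiggle_renderer)
events.publish(TokenEvent("whose", 14, 19, "error"))
```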
5.2.3. Semantic parser

The third parsing step applied to an input query is semantic parsing. In {AskMe*}, given its dynamic domain-specific knowledge acquisition nature, it is feasible to find that a certain query is valid according to the lexical and syntactic analysis, but does not represent a concept that fits into the current domain. For example, the query "Name and date of the customers from the country where most orders were made in 2010" could be lexically and syntactically valid: all the terms in the sentence may be present in the dynamic lexicon, and the syntactic construction and order of words match one of the valid categories of phrases in the Link Grammar Parser. However, the concept of Date may not exist for the entity Customer. This is definitely an error in the input query: a semantic error.

In order to detect this kind of error, the semantic parsing step is applied to the input query. The semantic parser is guided by the use of semantic templates, which are filled with the concepts captured in the domain ontology. The set of rules that are modeled by these dynamically-generated semantic templates is the following:

- Entity–Property correspondence: This rule enforces that all the requested properties for an entity in a query are indeed part of the current domain schema.
- Cross-entities relationships: This rule is applied to queries that contain multiple sub-phrases, and its purpose is to enforce that there exists a foreign-key relationship in the database schema between the entities in the query.

- Entities' default attributes: There are cases in which the query is valid from a lexical, syntactic and semantic analysis, but it does not specify which attributes must be present in the result. For instance, the query in Table 3, "Products that were ordered by more than 100 customers in 2010", does not specify which product properties we are interested in. This semantic rule does not invalidate a given input query, but rather imposes that the resulting SQL query must return all the product attributes that are not-null in the database schema, such as the Product ID, Product Name, Price, etc. This information, as we explained previously, was captured in the domain ontology as an OWL cardinality metadata attribute (a minimal sketch of this fallback is shown below).
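A sketch of this fallback under an assumed schema dictionary; the names are hypothetical stand-ins for the OWL cardinality metadata the system actually consults:

```python
# Illustrative sketch of the default-attributes rule: when a query names no
# properties, project every non-nullable attribute recorded in the ontology.
NON_NULLABLE = {
    "Product": ["ProductID", "ProductName", "UnitPrice"],
}

def default_projection(entity, requested_properties):
    """Fall back to the entity's non-nullable attributes when none are requested."""
    return requested_properties or NON_NULLABLE[entity]

columns = default_projection("Product", [])
print(f"SELECT {', '.join(columns)} FROM Product")
# SELECT ProductID, ProductName, UnitPrice FROM Product
```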

Some examples of these rules are presented and analyzed in Table 3.

Table 3
Examples of semantic rules behavior.

1. Schema relationships: customers–orders and orders–products. Input query: "Products from customers whose last name is Llopis." Result: Fail. Reason: there is no existing relationship between products and customers in this domain.
2. Schema relationships: customers–orders and orders–products. Input query: "Products that were ordered by more than 100 customers in 2010." Result: Success. Reason: products are related to orders, and every order references a customer.

In the case that one or more of these semantic requirements are not met by the input query, the semantic analysis reports errors. These errors are notified to the system in the form of events. The query-authoring services component is subscribed to these semantic events, in the same way as it is to the lexical and syntactic ones, and therefore notifies the user visually about the issue by highlighting the portions of the input query that cause the inconsistency. When the user hovers with the mouse over these highlighted regions, a tooltip containing a description of the inconsistency comes up. This description is also template-based; see Table 4.

Table 4
Examples of template-based semantic error messages.

- Entity–property mismatch: "Entity A" does not contain a property called "Property A" (where Entity A and Property A are the values in a query).
- Missing relationship: "Entity A" and "Entity B" are not related to each other.
SQL query must return all the product attributes that are obtained from tile Official Airline Guide (OAG) in June 1992 and current
not-null in the database schema, such as the Product ID, Product at that time. The database includes information for 46 cities and 52 air-
Name, Price, etc. This information, as we explained previously, ports in the US and Canada. The largest table in the expanded database,
was captured in the domain ontology as an OWL cardinality meta- the flight table, includes information on 23,457 flights. A complete ref-
data attribute. erence about the ATIS domain can be found at [15]. The selection of
ATIS was motivated by three concerns. First, a large corpus of ATIS
Some examples of these rules are presented and analyzed in Table 3. sentences already exists and is readily available. Second, ATIS provides
In the case that one or more of these semantic requirements are not met an existing evaluation methodology, complete with independent train-
by the input query, the semantic analysis would report errors. These er- ing and test corpora, and scoring programs. Finally, evaluation on a
rors are notified to the system in the form of events. The query-authoring common corpus makes it easy to compare the performance of the sys-
services component is subscribed to these semantic events, in the same tem with those based on different approaches. Our experiments utilized
way as it is to the lexical and syntactic ones, and would therefore notify the 448 context independent questions in the ATIS “Scoring Set A”,
the user in a visual way about the issue, by highlighting the portions of which is one of the sets of questions of the ATIS benchmark, generally
the input query that cause the inconsistency. When the user hovers the most commonly used for the evaluation of other systems, and the
with the mouse over these highlighted regions, a tooltip containing a one that lets us compare with most of them.
description of the inconsistency comes up. This description is also {AskMe*} produced an accuracy rate of 94.8%. System accuracy
template-based, see Table 4. rate is calculated based on the equation in Fig. 11.

Fig. 10. Examples of queries from ATIS and results obtained with {AskMe*}.

Fig. 11. System accuracy equation.
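The system accuracy rate presumably relates correctly interpreted questions to the total number of benchmark questions, along the lines of:

\[
\text{accuracy (\%)} = \frac{\#\ \text{queries correctly interpreted}}{\#\ \text{queries in the benchmark}} \times 100
\]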



Table 5
Accuracy comparison using ATIS between various NLIDB systems.

System:   HEY [16] | SRI [17] | PRECISE [3] | {AskMe*} | MIT [18] | AT&T [19]
Accuracy: 92.5     | 93       | 94          | 94.8     | 95.5     | 96.2

Table 5 contains a comparison of the results obtained by {AskMe*} in the ATIS benchmark to other state-of-the-art systems. In some cases, as displayed in Fig. 10, some of the failures are due to domain-specific information or query shortcuts (such as "tomorrow" → Date–Time, etc.) which {AskMe*} does not support yet, because other functional work, such as domain-portability or query-authoring services, was prioritized higher.

These results confirm that, even when {AskMe*} is a fully reconfigurable system that can be targeted to multiple knowledge domains, its accuracy results against a particular domain are very similar to the results for other state-of-the-art systems which are tailored to the underlying domain.

6.2. Effectiveness of query authoring services for a concrete domain

The second experiment that we use to evaluate our system measures how query-authoring services improve the overall usability of the system by enabling early detection of query errors. In order to do that, we asked a set of ten users to write fifty queries per user in a given domain. These users were completely new to the system and did not have any previous knowledge about the underlying domain. We gave them an initial description of the Northwind database, without schema representation or concrete entity/property names, and let them query the system in an exploratory way. This description was as simple as explaining to them that the database contained information about products, product categories, orders, order details and suppliers. During this process, users are very likely to introduce mistakes in most of the queries they come up with for the first time. We captured traces for all of these queries and recorded in which stage of the parsing process the errors were raised.

Our results indicate that, from the set of fifty input queries per user, almost 90% contained errors, of which roughly 80% could be detected before they were translated into SQL and, therefore, before being executed against the database. This results in significant improvements in terms of latency for wrong queries: thanks to the query-authoring services that {AskMe*} implements, they are locally detected by the system instead of being translated into SQL and executed against the database.

The results of this experiment show that while an important amount of errors (23%) are lexical errors (usually things like typos), and 26% of them correspond to syntactic errors (mostly ill-formed sentences in the English language), most of the errors are semantic (51%). In order to help minimize the probability of lexical errors in a query, the system provides auto-completion for entities and properties, as well as auto-correction of typos based on distance-editing algorithms (see the sketch after Table 6). Table 6 shows some of the most interesting queries written by users and how {AskMe*} guided them towards the right query.

In terms of semantic error distribution classified by the main semantic rules that {AskMe*} implements, this evaluation determines that 51% of the errors fall under the entity–property mismatch rule, thus being the most common semantic error; 41% correspond to queries trying to refer to a missing relationship that does not exist in the domain; and the remaining 8% represent semantic errors due to the query specifying invalid values in property conditions.
6.3. Portability of the system across multiple domains

Our third experiment focuses on evaluating the portability of the system. For this purpose, we have created a script that simulates the user actions through the visual interface. In this test, the system is connected to three different databases that we have previously configured: ATIS, AdventureWorksDB [20] and Northwind [14]. For each of these database connections, a custom benchmark made up of fifty different queries that are relevant to the corresponding domain (ATIS as described in the first experiment, Northwind as described through different sections of this paper, and Adventure Works as shown in Fig. 12) is executed against the system, asserting that the query-authoring services work as expected and that the resulting SQL query is generated as expected as well.

Finally, the test also evaluates the behavior when the system is connected to a database that had already been connected before, checking that the ontology generation process is not kicked off again, but rather the existing ontology for that source is pulled back from the store and brought into the current connection context. The results of this experiment indicate that there is not any loss in accuracy after a reconnection to a different database, and the results are the same as if the system had only been connected to a single database for its lifetime. This means that the same results observed in the first and second experiments apply to the scenario of multiple database reconnections, without degrading the overall accuracy of the system after connecting to multiple domains.
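The test script can be sketched as follows; the askme facade, the load_benchmark() helper and the regeneration flag are hypothetical names for the behavior described above, not a real API:

```python
# Sketch of the portability test script: for each configured database, run a
# fifty-query benchmark and assert that the expected SQL is produced.
DATABASES = ["ATIS", "AdventureWorksDB", "Northwind"]

def run_portability_test(askme, load_benchmark):
    for db in DATABASES:
        askme.connect(db)                                  # generates or reuses the ontology
        for nl_query, expected_sql in load_benchmark(db):  # fifty query pairs per domain
            assert askme.translate(nl_query).sql == expected_sql, f"{db}: {nl_query}"
    # Reconnect to an already-known database: the ontology must come from the
    # repository instead of being generated again.
    askme.connect(DATABASES[0])
    assert not askme.ontology_was_regenerated
```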

Table 6
Sample queries fixed by user interaction with query services.

1. User query: "List all categories of products". Corrected query: "List all categories of products". How was it fixed? Auto-correction of typos at the lexical level (distance-editing algorithm comparing to "known" valid tokens).
2. User query: "Products from customers whose last name is Llopis". Corrected query: "Products ordered by customers whose last name is Llopis". How was it fixed? A semantic error tooltip is displayed in the user query; the user learns that the relation products–customers is transitive, via order details and orders (which contain the customer ID), and follows this guidance in order to end up with a valid query.
3. User query: "Products from whose last name is Llopis". Corrected query: "Products from customers whose last name is Llopis". How was it fixed? The syntactic parser detects a syntactic error when the user types "whose", as the entity is missing. By providing an error squiggle and tooltip, the user is able to identify the missing piece in the query and correct it in order to complete the rest of the query.
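The distance-editing auto-correction in the first row of Table 6 can be sketched with a standard Levenshtein distance [12]; the token list and the threshold are illustrative:

```python
# Sketch of Levenshtein-based auto-correction against known valid tokens.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def autocorrect(token, known_tokens, max_distance=2):
    """Return the closest known token within the edit-distance budget."""
    best = min(known_tokens, key=lambda k: levenshtein(token, k))
    return best if levenshtein(token, best) <= max_distance else token

print(autocorrect("categorys", ["categories", "products", "suppliers"]))  # categories
```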

Fig. 12. Adventure Works database simplified schema used in the third experiment.

accuracy of the system. The evaluation process showed how, despite the asking different questions about different aspects of the same entity,
system is not specific to any concrete domain, the result of 94.8% of which will result, again, in another important usability shift for
accuracy against the ATIS benchmark is relatively good compared to {AskMe*} [21]. The main drawback of using this kind of resolution is
other existing state-of-the-art systems, both domain-dependent and its low precision. However, we plan to overcome the low precision
independent. of anaphora and ellipsis resolution by means of benefiting of query
Furthermore, this approach enables full portability of the system authoring services.
without any reconfiguration steps needed for the system to successfully
execute queries against any new database. Extra mapping reconfigu-
Acknowledgments
rations, user preferred ways to refer to elements of the domain-model,
can be done through easy user interface gestures such as right-clicking
This research has been partially funded by the Valencia Govern-
elements (i.e. words) of a given query. We believe that the simplification
ment under Project PROMETEO/2009/119, and by the Spanish Gov-
of the reconfiguration process when connecting to new database
ernment under Project Textmess 2.0 (TIN2009-13391-C04-01) and
schemas is a very important step towards the democratization of NLIDBs
TIN2012-31224.
in real world setup, as it enables non-technical users to be able to fully
control the system through its entire lifecycle.
In addition, it enables the construction of a customized textual query environment in which a set of query-authoring services can be provided to the user to help author and disambiguate queries. These query-authoring services play a fundamental role in the system's usability, making it possible to detect query errors early, as demonstrated in the evaluation section, where we observed that around 80% of the queries that contained errors could be detected before they were actually translated into SQL, resulting in a more efficient, lower-latency, user-interactive system. The classification of these errors by the parsing stage in which they are detected, as shown in the evaluation, gives us the possibility to selectively focus on improving the quality and functionality of the query-authoring services at each stage of the parsing process, in order to maximize the investment in relation to the gain in the overall user experience. Finally, it is worth remarking that {AskMe*} also helps the user to make valid queries by automatically distinguishing between linguistic and conceptual failures.
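To illustrate what such a per-stage classification might look like, the Python sketch below tags every detected error with the parsing stage that raised it, so failures can be counted per stage before any SQL is generated. The stage names, the exception shape and the aggregation helper are our own assumptions for illustration, not {AskMe*}'s internals.

from enum import Enum

class ParseStage(Enum):
    # Stages at which query-authoring services can flag an error,
    # before the query is ever translated into SQL.
    LEXICAL = "lexical"        # unknown or misspelled token
    SYNTACTIC = "syntactic"    # malformed structure, e.g. a missing entity
    SEMANTIC = "semantic"      # well-formed text that violates the schema

class QueryError(Exception):
    def __init__(self, stage: ParseStage, message: str, span: tuple):
        super().__init__(message)
        self.stage = stage     # which stage detected the problem
        self.span = span       # (start, end) offsets for squiggles/tooltips

def errors_per_stage(errors):
    # Aggregate detected errors by stage, mirroring the per-stage
    # breakdown reported in the evaluation section.
    counts = {stage: 0 for stage in ParseStage}
    for error in errors:
        counts[error.stage] += 1
    return counts

Under this scheme, the third example of Table 6 would surface as QueryError(ParseStage.SYNTACTIC, "entity missing before 'whose'", (14, 19)), and the per-stage counts indicate where additional investment in query-authoring services pays off most.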
Based on our very positive evaluation results for early error detection thanks to the use of query-authoring services, as future work we are trying to maximize this benefit by experimenting with new query-authoring services and improving the existing ones. Moreover, we will add anaphora and ellipsis resolution capabilities to {AskMe*}. Anaphora and ellipsis resolution are an active research field in the space of NLIDBs; this capability enables users to dramatically reduce the number of words they must type when asking different questions about different aspects of the same entity (for instance, a follow-up question such as "And their prices?" could be interpreted against the entity of the previous query), which will result, again, in another important usability improvement for {AskMe*} [21]. The main drawback of this kind of resolution is its low precision. However, we plan to overcome the low precision of anaphora and ellipsis resolution by leveraging the query-authoring services.

Acknowledgments

This research has been partially funded by the Valencia Government under Project PROMETEO/2009/119, and by the Spanish Government under Projects Textmess 2.0 (TIN2009-13391-C04-01) and TIN2012-31224.

References

[1] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995.
[2] L. Androutsopoulos, Natural language interfaces to databases—an introduction, Journal of Natural Language Engineering 1 (1995) 29–81.
[3] A. Popescu, A. Armanasu, O. Etzioni, D. Ko, A. Yates, PRECISE on ATIS: semantic tractability and experimental results, in: Proceedings of the National Conference on Artificial Intelligence (AAAI), 2004, pp. 1026–1027.
[4] N. Stratica, L. Kosseim, B.C. Desai, Using semantic templates for a natural language interface to the CINDI virtual library, Data & Knowledge Engineering 55 (1) (2004) 4–19.
[5] D. Jurafsky, J. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition and Computational Linguistics, Prentice Hall, 2000.
[6] C. Manning, H. Schutze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
[7] H. Santoso, S. Haw, Z.T. Abdul-Mehdi, Ontology extraction from relational database: concept hierarchy as background knowledge, Knowledge-Based Systems 24 (3) (2011) 457–464.
[8] M. Watson, NLBean(tm) version 4: a natural language interface to databases, www.markwatson.com.
[9] R. Bartolini, C. Caracciolo, E. Giovanetti, A. Lenci, S. Marchi, V. Pirrelli, C. Renso, L. Spinsanti, Creation and use of lexicons and ontologies for NL interfaces to databases, in: Proceedings of the International Conference on Language Resources and Evaluation, vol. 1, 2006, pp. 219–224.
[10] D. Sleator, D. Temperley, Parsing English with a link grammar, in: Proceedings of the Third International Workshop on Parsing Technologies, 1991.
[11] B. Hu, H. Liu, J. He, X. Du, FARS: multi-relational feature and relation selection approach for efficient classification, in: Proceedings of the Advanced Data Mining and Applications Conference, vol. 1, 2008, pp. 73–86.

[12] V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics – Doklady 10 (8) (1966) 707–710.
[13] G.A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (11) (1995) 39–41.
[14] Northwind, http://msdn.microsoft.com/en-us/library/aa276825(SQL.80).aspx.
[15] M. Bates, S. Boisen, J. Makhoul, Developing an evaluation methodology for spoken language systems, in: Proceedings of the Speech and Natural Language Workshop, vol. 1, 1990, pp. 102–108.
[16] H. Young, S. Young, A data-driven spoken language understanding system, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, vol. 1, 2003, pp. 583–588.
[17] R.C. Moore, D.E. Appelt, SRI's experience with the ATIS evaluation, in: Proceedings of the Workshop on Speech and Natural Language, 1990, pp. 147–148.
[18] V. Zue, J. Glass, D. Goddeau, D. Goodine, L. Hirschman, M. Phillips, J. Polifroni, S. Seneff, The MIT ATIS system: February 1992 progress report, in: Proceedings of the Workshop on Speech and Natural Language, 1992, pp. 84–88.
[19] D. Hindle, An analogical parser for restricted domains, in: Proceedings of the Workshop on Speech and Natural Language, 1992, pp. 150–154.
[20] Adventure Works, http://msdn.microsoft.com/en-us/library/ms124659.aspx.
[21] J.L. Vicedo, A. Ferrandez, Importance of pronominal anaphora resolution in question answering systems, in: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000, pp. 555–562.
[22] N. Nihalani, S. Silakari, M. Motwani, Natural language interface for database: a brief review, International Journal of Computer Science Issues 8 (2) (2011) 600–608.
[23] M. Llopis, A. Ferrandez, {AskMe*}: reducing the costs of adoption, portability and learning process in a natural language interface to query databases, in: Proceedings of the 8th International Workshop on Natural Language Processing and Cognitive Science, vol. 1, 2011, pp. 75–89.
[24] W.A. Woods, R.M. Kaplan, B.N. Webber, The Lunar Sciences Natural Language Information System: Final Report, BBN Report 2378, 1972.
[25] Y.W. Wong, Learning for semantic parsing using statistical machine translation techniques, Technical Report UT-AI-05-323, University of Texas, Austin, 2005.
[26] Y. Li, H. Yang, H.V. Jagadish, NaLIX: an interactive natural language interface for querying XML, in: Proceedings of the International Conference on Management of Data, 2005, pp. 900–902.
[27] R. Valencia-Garcia, F. Garcia-Sanchez, D. Castellanos-Nieves, J.T. Fernandez-Breis, OWLPath: an OWL ontology-guided query editor, IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans 41 (1) (2011) 121–136.
[28] D. Damljanovic, M. Agatonovic, H. Cunningham, Natural language interfaces to ontologies: combining syntactic analysis and ontology-based lookup through the user interaction, in: Proceedings of the 7th Extended Semantic Web Conference, 2010, pp. 106–120.
[29] P.R. Devale, A. Deshpande, Probabilistic context free grammar: an approach to generic interactive natural language interfaces to databases, Journal of Information, Knowledge and Research in Computer Engineering 1 (2) (2010) 52–58.
[30] H.R. Tennant, K.M. Ross, M. Saenz, C.W. Thompson, J.R. Miller, Menu-based natural language understanding, in: Proceedings of the 21st Annual Meeting of the ACL, 1983, pp. 151–158.

Miguel Llopis is a Ph.D. student at the Department of Software and Computing Systems in the University of Alicante (Spain). His research interests include Natural Language Processing, Question Answering and Domain-Specific Languages. He has written various papers in journals and participated in international conferences related to his research topics. Besides his Ph.D. studies and research activity, Miguel works as a Program Manager in the SQL Server team at Microsoft Corporation (Redmond, Washington). Contact him at mll9@alu.ua.es.

Antonio Ferrández is a full-time Lecturer at the Department of Software and Computing Systems in the University of Alicante (Spain). He obtained his Ph.D. in Computer Science from the University of Alicante (Spain). His research interests are Natural Language Processing, Anaphora Resolution, Information Extraction, Information Retrieval and Question Answering. He has participated in numerous projects and in agreements with private companies and public organizations related to his research topics. Finally, he has supervised Ph.D. theses and participated in many papers in journals and conferences related to his research interests. Contact him at antonio@dlsi.ua.es.
