Question Answering
Question answering is a subfield of natural language processing (aka
NLP, aka computational linguistics), which itself is a specialized area
of artificial intelligence.
The general idea is to be able to answer questions written in English,
by finding the answer in a collection of documents (which may be web
pages or plain text). Unlike traditional web search engines, however,
the goal is to find a specific answer, rather than a collection of
hundreds of possibly irrelevant links.
While there are many ways to approach this task, the following
framework is commonly used:
- Analyze and classify the question, determining what type of answer
is being sought. This is commonly done by examining words in the
question (for example, "who is" suggests that the answer will be
a name, "how many" suggests that it will be a number, etc.).
- Using information retrieval techniques (such as those employed
in a typical web search engine), construct a subset of available
documents which are likely to contain the answer. Typically
this is done by searching for terms drawn from the question.
- Analyze the retrieved documents, and search for entities of
the type determined in the first step.
- If an appropriate entity is found, return that entity as a
response.
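The four steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not any particular system's implementation: the cue phrases, the stopword list, the regular expressions standing in for entity recognition, the word-overlap retrieval score, and the sample documents are all made up for the example.

```python
import re

# Assumed cue phrases mapping question wording to an expected answer type.
QUESTION_CUES = [("how many", "NUMBER"), ("who is", "NAME")]

# Assumed stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "on", "to", "who", "how",
             "many", "what", "does", "did", "do", "have", "was", "were"}

# Crude regex stand-ins for entity recognition (assumption for this sketch).
ENTITY_PATTERNS = {
    "NUMBER": re.compile(r"\b\d+(?:,\d{3})*\b"),
    "NAME": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
}

def classify_question(question):
    """Step 1: guess the answer type from cue words in the question."""
    q = question.lower()
    for cue, answer_type in QUESTION_CUES:
        if re.search(r"\b" + re.escape(cue) + r"\b", q):
            return answer_type
    return "UNKNOWN"

def query_terms(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def retrieve(question, documents, limit=2):
    """Step 2: keep the documents sharing the most terms with the question."""
    terms = set(query_terms(question))
    scored = []
    for doc in documents:
        score = len(terms & set(re.findall(r"[a-z0-9]+", doc.lower())))
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]

def extract_candidates(answer_type, documents):
    """Step 3: collect entities of the expected type from the documents."""
    pattern = ENTITY_PATTERNS.get(answer_type)
    if pattern is None:
        return []
    candidates = []
    for doc in documents:
        candidates.extend(pattern.findall(doc))
    return candidates

def answer(question, documents):
    """Step 4: return the first matching entity, or None if there is none."""
    answer_type = classify_question(question)
    retrieved = retrieve(question, documents)
    candidates = extract_candidates(answer_type, retrieved)
    return candidates[0] if candidates else None

# Made-up sample collection, purely for illustration.
SAMPLE_DOCUMENTS = [
    "The committee has 12 members.",
    "Ada Lovelace wrote notes on the Analytical Engine.",
    "Bread is made from flour and water.",
]

if __name__ == "__main__":
    print(answer("How many members does the committee have?", SAMPLE_DOCUMENTS))
```

Returning the first candidate is the simplest possible choice; ranking the candidates is one obvious place for refinement.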
Obviously many refinements to this framework are possible :-), most
of which are actively being explored; question answering is a popular
research topic, thanks in part to the TREC and TAC series of
conferences.
My own work is still getting started, but in general I'll be working on
contextual question answering. The goal is to be able to answer a
sequence of questions, in which subsequent questions can refer back to
previous ones, and possibly also to previous answers.
For example, the following question sequence was used in the TREC 10
conference:
- Which museum in Florence was damaged by a major bomb
explosion in 1993?
- On what day did this happen?
- Which galleries were involved?
- How many people were killed?
- Where were those people located?
- How much explosive was used?
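One very simple way to let a follow-up question "refer back" is to carry the content words of earlier questions forward into the retrieval query. The Python sketch below shows only that naive idea; it is not a description of my approach or of any TREC system, and the `QuestionContext` class, its `expand` method, and the stopword list are hypothetical names invented for the illustration.

```python
import re

# Assumed stopword list, just large enough for the example questions.
STOPWORDS = {"a", "an", "the", "in", "on", "by", "was", "were", "which",
             "what", "did", "this", "those", "how", "many", "much", "where"}

def content_terms(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

class QuestionContext:
    """Hypothetical helper that remembers terms from earlier questions."""

    def __init__(self):
        self.history_terms = []

    def expand(self, question):
        """Return the question's own terms followed by remembered terms."""
        terms = content_terms(question)
        expanded = list(terms)
        for term in self.history_terms:
            if term not in expanded:
                expanded.append(term)
        # Remember this question's terms for later follow-ups.
        for term in terms:
            if term not in self.history_terms:
                self.history_terms.append(term)
        return expanded

if __name__ == "__main__":
    context = QuestionContext()
    context.expand("Which museum in Florence was damaged by a major bomb "
                   "explosion in 1993?")
    print(context.expand("On what day did this happen?"))
```

With this, a search for "On what day did this happen?" would still include terms such as "florence" and "bomb". Resolving what "this" or "those people" actually refers to is a much harder problem that this sketch does not attempt.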
Information about question answering, natural language processing,
and artificial intelligence in general can be found in many
places on the Web:
- An incomplete collection of existing QA systems
(in alphabetical order by name :-):
- An equally incomplete collection of QA/NLP/CL research groups
(again, alphabetized by name :-):
- Google search results:
- TAC publication archives:
- TAC 1 (2008) -- coming soon ("TAC 2008 proceedings papers will be
available at the end of February, 2009") (QA track)
- TREC publication archives:
- NTCIR publication archives:
- Research group publication archives:
- Personal publication archives (alphabetized by surname :-):
- Bill Wilson's Natural Language Processing Dictionary
- The ACL Anthology calls itself "A Digital Archive of Research Papers
in Computational Linguistics"; it contains papers from several
journals and conferences (including ACL, EACL, COLING and ANLP among
others)
- Recent papers in computation and language and information retrieval,
from the arXiv.org e-Print archive
- arXiv's Computing Research Repository
- CiteSeer
- The Cross Language Evaluation Forum (CLEF)
- A PowerPoint presentation on predictions about the future of QA,
by the chief linguist of Ask.com (formerly "Ask Jeeves")
- The Association for Computational Linguistics maintains the
ACL NLP/CL Universe, a compendium of resources and directories of
research groups and companies working in the field; they also have
several special interest groups, including but not limited to the
following:
- SIGDAT (Linguistic data and corpus-based approaches to NLP)
- SIGMOL (Mathematics of Language)
- SIGNLL (Natural Language Learning)
- SIGPARSE (Natural Language Parsing)
- SIGSEM (Computational Semantics)
- The Natural Language Software Registry
- openNLP is "an organizational center for open source projects related
to natural language processing". They also maintain a collection of
links to NLP resources
- An annotated collection of resources for statistical NLP and
corpus-based computational linguistics from Stanford's NLP Group
- Language Technology World describes itself as "the most comprehensive
WWW information service and knowledge source on the wide range of
technologies that deal with human language"
- Kenji Kita's collection of speech and language resources includes a
collection of software tools for NLP
- Some text-processing utilities by Hans Paijmans
- The CMU-Cambridge statistical language modelling toolkit
- GATE, the General Architecture for Text Engineering
- The RASP (Robust Accurate Statistical Parsing) Project
- Resources for corpus-based linguists (i.e., get your corpora here :-)
- Another source of (generally unannotated) corpora is the growing
collection of web sites devoted to electronic texts
- Despite the name, ScientificPsychic.com has some useful links about
the English language, including a formal description of English grammar
- The CMU AI Repository, and their NLP collection
- The home page of ACM's SIGART special interest group
- The Journal of Artificial Intelligence Research, and their links to
other AI services
- The Journal of Machine Learning Research has published special
editions on shallow parsing and machine learning methods for text and
images
- Finally, for anyone working in Lisp, the classic reference is
Guy L. Steele Jr.'s Common Lisp the Language, 2nd Edition, now
available online!
- On a related note, you may also be interested in my collections of
resources on computer science and programming
Last Update: 2009/02/01
______________________________
[Steven's home page]