Cyc/FAQ/2.0

Unofficial Cyc FAQ Version 2.0

David Whitten Jul 31, 1997, 3:00:00 AM
The Unofficial, Unauthorized Cyc Frequently Asked Questions Information Sheet.

Written by David Whitten, with various input from other net citizens

If you think of questions that are appropriate for this FAQ, or would
like to improve an answer, please send email to me at whi...@netcom.com

*** Copyright:

Copyright (c) 1994,1995 by David Whitten. All rights reserved.
Portions copyright (c) by MCC and Cycorp.

This FAQ may be freely redistributed in its entirety without
modification provided that this copyright notice is not removed. It
may not be sold for profit or incorporated in commercial documents
(e.g., published for sale on CD-ROM, floppy disks, books, magazines,
or other print form) without the prior written permission of the
copyright holder. Permission is expressly granted for this document
to be made available for file transfer from installations offering
unrestricted anonymous file transfer on the Internet. Should this file
be made available, a courtesy mail message to whi...@netcom.com with
access information would be appreciated.

This article is provided AS IS without any express or implied warranty.

*** Topics Covered:

[1] Introduction
[1-1] What is Cyc ?
[1-2] How do I find this FAQ ?
[2] Who is doing Cyc ?
[2-1] Who is sponsoring it ?
[2-2] Who is Douglas Lenat ?
[2-3] Who else was previously or is currently working on Cyc ?
[2-4] How do I contact the authors myself ?
[3] When did they start Cyc and when will it be completed ?
[4] Why are they creating Cyc ?
[5] Where can I get the source code ?
[6] What do they use to make it work ?
[6-1] What programming language is it written in ?
[6-2] What machines does it run on ?
[6-3] How large is Cyc? How has the size changed over the project?
[7] How does it work ?
[7-1] How do they store common sense in a computer ?
[7-2] How do they input common sense ?
[7-3] What theoretical foundation is behind Cyc ?
[7-4] What is the difference between Cyc and an Expert System ?
[7-5] What are they doing in Natural Language Processing ?
[8] What are Cyc's capabilities right now?
[8-1] Can Cyc reason about data not stored in Cyc format databases?
[8-2] If a robot had a radio link to Cyc, what could it do ?
[8-3] When do they expect Cyc to be able to read ?
[8-4] Are you able to converse with Cyc ?
[8-5] Will Cyc pass the Turing Test any time soon ?
[9] Cyc standards with ANSI or ISO standards
[9-1] What are the details of the functional interface to CycL ?
[9-2] Can an SQL query be used to ask Cyc a question?
[9-3] Can Cyc generate an SQL query to find information it needs ?
[9-4] Can Cyc interact with HTML ?
[9-5] Does Cyc have a KIF interface ?
[A] Acknowledgements
[B] Bibliography

Search for [#] to get to question number # quickly.

*** Recent changes:

;;; 1.0: 15-DEC-94 djw Initial release
;;; 2.0: 25-SEP-95 djw Update, include info about Cyc 10

----------------------------------------------------------------
Subject: [1] Introduction


Certain questions and topics come up frequently in the various artificial
intelligence discussion groups about the Cyc program.

This file/article is an attempt to gather these questions and their answers
together as a convenient reference for AI researchers, students, hobbyists,
and practitioners. I post it whenever I notice a newbie asking about Cyc on
one of the newsgroups I read. I hope this will cut down on network traffic,
and increase the enjoyment of these newsgroups for the regular readers
by eliminating the necessity to read and respond to the same questions over
and over again. It may even answer some questions that readers may not have
thought about yet, and hopefully will stimulate new discussions by
increasing the total amount of information available.

Currently this FAQ covers the obvious questions and answers, but I plan to
add new questions and answers as they become common.

----------------------------------------------------------------
Subject: [1-1] What is Cyc ?

Cyc is the name of a very large, multi-contextual knowledge base and
inference engine, the development of which started at the Microelectronics
and Computer Technology Corporation (MCC) in Austin, Texas during the early
1980s.

Over the past eleven years the members of the Cyc team have added to the
knowledge base a huge amount of fundamental human knowledge: facts, rules
of thumb, and heuristics for reasoning about the objects and events of
modern everyday life.

Cyc is an attempt to do symbolic AI on a massive scale. It is not based on
numerical methods such as statistical probabilities, nor is it based on
neural networks or fuzzy logic. All of the knowledge in Cyc is represented
declaratively in the form of logical assertions. Cyc presently contains
approximately 400,000 significant assertions, which include simple
statements of fact, rules about what conclusions to draw if certain
statements of fact are satisfied (true), and rules about how to reason with
certain types of facts and rules. New conclusions are derived by the
inference engine using deductive reasoning.

Its avowed purpose is to break the software brittleness bottleneck
once and for all by constructing a foundation of basic "common sense"
knowledge -- a sort of semantic substratum of terms, rules, and
relations -- that will enable a variety of knowledge-intensive
products and services. Cyc is intended to provide a "deep" layer of
understanding that can be used by other programs (such as
domain-specific expert systems) to make them more flexible. To date,
Cyc has made possible ground-breaking pilot applications in the areas
of heterogeneous database browsing and integration, captioned image
retrieval, and natural language processing.

----------------------------------------------------------------
Subject: [1-2] How do I find this FAQ ?

This FAQ is posted semi-regularly on the comp.ai and comp.ai.nat-lang
newsgroups, and is also available from its author (whi...@netcom.com).
This version is not currently available from archive sites, WWW sites, or
ftp sites.

A previous version of this FAQ that has been HTML-ized is available at
http://www.mcs.net/~jorn/html/ai/cycfaq.html
and it is anticipated that this version will be available there as well.

A simple text listing of it is available at:
http://www.mcs.com/~drt/software/cycfaq

----------------------------------------------------------------
Subject: [2] Who is doing Cyc ?

Much of the Cyc work has been done at the Microelectronics and
Computer Technology Corporation in Austin, Texas. In January of 1995,
a new independent company named Cycorp was created to further the work
done on the Cyc project. Cycorp continues to be based in Austin,
Texas.

----------------------------------------------------------------
Subject: [2-1] Who is sponsoring it ?

The development of Cyc has been supported by several organizations,
including Apple, Bellcore, DEC, DoD, Interval, Kodak, and Microsoft.

----------------------------------------------------------------
Subject: [2-2] Who is Douglas Lenat ?

Doug Lenat is one of the world's leading computer scientists and is head of
the Cyc Project at MCC and President of Cycorp. He has been a Professor of
Computer Science at Carnegie-Mellon University and Stanford University.

He is a prolific author, whose publications include the books

Knowledge Based Systems in Artificial Intelligence (1982, McGraw-Hill)
Building Expert Systems (1983, Addison-Wesley)
Knowledge Representation (1988, Addison-Wesley)
Building Large Knowledge Based Systems (1989, Addison-Wesley)

His 1976 Stanford thesis earned him the biennial IJCAI Computers and
Thought Award in 1977.

He was named one of America's brightest scientists under the age of 40 in
the December 1984 Science Digest.

In 1986, he was elected as Councilor of AAAI.

----------------------------------------------------------------
Subject: [2-3] Who else was previously or is currently working on Cyc ?

Here is a complete list of everyone who has worked with Cyc for more
than one continuous year since 1988, plus present employees. Many
other people have made sporadic or episodic contributions to Cyc.
Present employees of Cycorp -- full-time, part-time, occasional, or
currently on leave -- are marked with *.

Paul Blair (1991-1993) is a graduate student in Philosophy who worked on
Cyc as a knowledge enterer. His contributions to the project included
writing a clear introduction to CycL, Cyc's representation language. Paul
is currently continuing his graduate studies in New York City.

Judy Bowman (1986-1994) was the secretary for the Cyc project during most of
its existence at MCC.

Rupert Brauch (1992-1994) holds a bachelor's degree in Philosophy and
contributed to Cyc as a knowledge enterer. He is now pursuing a
graduate degree in Computer Science at Stanford.

*Kathy Burns (1993-present) is a member of the Cycorp Technical Board,
and directs Cycorp's natural language processing development effort.
She holds a bachelor's degree in Linguistics from the University of
Texas at Austin, and has done graduate study in Linguistics at McGill
University. In recent months Kathy (in conjunction with Keith
Goolsbey) has completely rebuilt Cyc's NL facilities to make them
compatible with the new version of the Cyc system (Cyc 10). She has
made important contributions to Cyc's database browsing and retrieval
application, has composed Cyc HTML interface and help pages, and has
done a significant amount of knowledge entry.

*Lisa Colvin (1995-present) holds a BA in philosophy from Tufts
University, and an MA in Linguistics from the University of Texas at
Austin. She works on Cycorp's natural language processing development
effort and has done knowledge entry in a variety of domains.

*Tony Davis (1995-present) holds a BA in Linguistics and Applied Math
from U.C. San Diego, and will soon receive a PhD in Linguistics from
Stanford. Before becoming a Cycorp employee he worked as a
computational linguist for MITRE Corp., where he developed methods for
ambiguity resolution. He works on Cycorp's natural language
processing development effort and is especially interested in verb
semantics.

Mark Derthick (1988-1994) holds a PhD in Computer Science from
Carnegie-Mellon University. He developed and maintained a variety of
interface, browsing, and knowledge entry tools, including the Heuristic
Level to Epistemological Level translator that proved to be a crucial
addition to the earlier (pre-1991) frame-based version of Cyc. With Karen
Pittman, he played a major role in developing the Cyccess pilot
application, which demonstrated the value of using Cyc to integrate the
data contained in disparate structured information sources such as database
tables and spreadsheets. Mark is currently working on a data
representation and visualization project at CMU.

*David Gadbois (1990-present) is a member of the Cycorp Technical Board.
He holds a BA in Plan II, a BS in Mathematics, and an MS in Computer
Science from the University of Texas at Austin. He is presently
finishing a PhD in Computer Science there with a dissertation on static
analysis and compilation of rule system programs. He has written many
of Cyc's system and network interfaces, manages Cycorp's computer
systems, and helped design and implement Cyc database integration
facilities. He is presently developing applications based on the
database facilities.

Lila Ghemri (1992-1995) holds a PhD in Computational Linguistics from
the University of Bristol, England. She worked on the Cyc natural language
processing effort, did a significant amount of knowledge entry, and
played an important part in expanding Cyc's English language lexicon.
In the Spring of 1995, she and her family moved to New York.

*Keith Goolsbey (1990-present) is a member of Cycorp's Technical
Board. He holds a Bachelor's degree in Electrical Engineering and a
Master's degree in Computer Science, both from the University of Texas
at Austin. He has done much knowledge entry (most notably regarding
Cyc's treatment of quantities and scalar intervals), and has
constructed practical HTML interfaces for Cyc, including a
browsing/editing interface. In recent months Keith has played a
dominant role in designing and implementing Cyc 10, including new
versions of the Cyc inference engine (with Kenneth Murray), the Cyc
Common Lisp to C translator, and the Cyc natural language processing
facilities (with Kathy Burns). He has also implemented a distributed
version of Cyc that allows the inference engine to derive conclusions
using several, physically distinct KBs containing knowledge about
different conceptual domains.

Ramanathan V. Guha (1987-1994) holds an MS in Mechanical Engineering
from UC-Berkeley, and a PhD in Computer Science from Stanford. From
1990 to 1994 he was the technical director of the Cyc project. He has
written several important papers and technical reports, including (as
co-author) the book Building Large Knowledge Based Systems (1989,
Addison-Wesley). His most notable contributions included converting
Cyc from a frame-based system to one in which the fundamental data
objects are logical assertions, and the implementation of logical
contexts (microtheories) as a method for structuring the knowledge in
the Cyc knowledge base. Guha left the Cyc project at the end of 1994
and now works for Apple.

*Bill Jarrold (1990-present) holds a BS in Cognitive Science from MIT. He
has done a great deal of knowledge entry work for Cyc, most notably in the
domains of weather and naive spatial relations. He is involved in the
design and implementation of test suites for the Cyc knowledge base, the
inference engine, and the underlying source code. Bill is currently
pursuing a PhD degree in Counseling Psychology at the University of Texas
at Austin.

Kate Joly (1992-1994) holds a Bachelor's degree in Linguistics from UC-Santa
Cruz. She made important contributions to Cyc's natural language
processing effort, including writing most of the syntactic parsing
templates for translating from English to CycL.

*Fritz Lehmann (1995-present) specializes in building "ontological bridges"
between different knowledge bases, thesauri, and standards. He has
concentrated on semantic integration of differently arranged
databases. He edited the book "Semantic Networks in Artificial
Intelligence" and has written articles on the mathematical structure
of taxonomies as well as practical issues involved in giving an
ontological basis to the fields and codes contained in existing
data interchange standards. He does some outside consulting on ontology-
based methods, and has done detailed work on names, addresses,
and documents. He is working on deep health-related modelling issues.

Liz Lempert (1992-1994) holds a Bachelor's degree in Symbolic Systems from
Stanford. She has done a great deal of knowledge entry for Cyc in a wide
variety of domains. She is now pursuing graduate studies in Boston.

Bill MacCartney (1994-1996) holds a Bachelor's degree in
Philosophy from Princeton. In addition to doing knowledge entry, he
has helped with porting Cyc from Common Lisp to C, and has worked on
Cyc interface development in C++ and HTML. In recent months, he
played an important part (with Keith Goolsbey) in devising the
algorithm for Cyc's approach to distributed inferencing.

Alan McKendree (1991-1994) holds a BS in Mathematics from the University of
Texas at Austin. He worked on Cyc as a knowledge enterer and a system
support specialist.

Kathy Mitchell (1991-1995) holds a Bachelor's degree in Computer
Science from Texas A & M and an MS in Computer Science from the
University of Texas at Austin. She did much knowledge entry for Cyc
in just about every domain. She played an important part in testing
and debugging the Cyc captioned image retrieval pilot application, and
also worked on the Cyc natural language processing effort. In the
Spring of 1995 she and her husband moved to Portland.

*Kenneth Murray (1987-present) is a member of Cycorp's Technical
Board. He holds a PhD in Computer Science from the University of
Texas at Austin. He has contributed to Cyc as a knowledge enterer in
a wide variety of domains. With Keith Goolsbey, he is responsible for
maintaining the Cyc inference engine, and he has written several of
the heuristic modules designed to make Cyc's inference engine more
efficient. He has also implemented one component of a CycL to SQL
translator, making a major contribution to the Cyc 10 database
browsing and retrieval application.

Deborah Nichols (1992-1994) did knowledge entry for Cyc in several domains.
She is now pursuing a PhD in Philosophy at the University of Texas at
Austin.

*Karen Pittman (1987-present) is a member of Cycorp's Technical Board,
and plays a prominent role in directing general knowledge entry work
on Cyc, training new knowledge enterers, scoping out new knowledge
domains, and planning application-specific knowledge entry tasks. She
is responsible for a great deal of the knowledge presently in the Cyc
knowledge base. She holds an MS in Botany from the University of
Texas at Austin, and is pursuing an MS in Computer Science (also at
UT). With Mark Derthick, she played a major role in developing the
Cyccess pilot application, which demonstrated the value of using Cyc
to integrate the data contained in disparate structured information
sources such as database tables and spreadsheets. More recently, she
has coordinated Cycorp's work on the Cyc 10 database browsing and
retrieval application.

*Dexter Pratt (1989-1994,1996-present) holds a BS in Chemistry from Yale.
Now an independent software developer and consultant, he was a pioneer in
the development of the Lisp machine workstation in the early 1980s, when he
worked for LMI. He contributed to Cyc in many ways, including writing
and maintaining interfaces, system software development and maintenance,
helping to port Cyc from Common Lisp to C, and knowledge entry in a variety
of domains. With Nick Siegel, he played a major role in developing Cyc's
captioned image retrieval pilot application.

Wanda Pratt (1990-1993) holds an MS in Computer Science from the University
of Texas at Austin. She contributed to Cyc as a knowledge enterer and a
system software developer and maintainer. Her knowledge entry work covered
many different domains. She is now pursuing a PhD in Computer Science at
Stanford.

Wei-Min Shen (1989-1991) holds a PhD in Computer Science from
Carnegie-Mellon University. He did knowledge entry for Cyc and explored
his interest in machine learning. He is now pursuing a career in
university teaching and research.

*Mary Shepherd (1984-present) is a Sociologist by training. She has
contributed to Cyc as a knowledge enterer, and for much of the past eleven
years has dealt with the onerous administrative and personnel tasks
necessary to keep the Cyc team functioning. She is presently the
administrative manager of Cycorp.

*Nick Siegel (1988-present) is a member of Cycorp's Technical Board.
He holds a BA in History of Religions from Creighton University and an
MA in Cultural Anthropology from the University of Texas at Austin.
His contributions have included planning and directing knowledge entry
tasks, training new knowledge enterers, designing tests for the Cyc
system, and implementing and maintaining various HTML interface tools.
With Dexter Pratt, he played a major role in developing Cyc's
captioned image retrieval pilot application. More recently, he
implemented a database "meta query" browser that allows naive users to
quickly determine the types of knowledge contained in a set of data
tables.

*Kevin Smith (1995-present) holds a BS in Symbolic Systems from Stanford
University, with a concentration in Natural Language. Before becoming
a member of the Cycorp technical staff he worked in Japan as an English
teacher and language learning specialist. He works on Cycorp's natural
language processing development effort and has also worked as a knowledge
enterer. He is especially interested in the interdependence between
cultural knowledge and language.

Srinija Srinivasan (1993-1995) holds a Bachelor's degree in Symbolic
Systems from Stanford. She did a great deal of knowledge entry for
Cyc in a wide variety of domains, most notably in the area of human
emotional states. She now works for Yahoo.

Jamie Stephens (1992-1994) worked on Cyc as a knowledge enterer and system
support specialist.

Dan Torosian (1990-1993) worked on Cyc as a knowledge enterer. A
professional jazz musician (clarinet, saxophone), he is now pursuing his
musical career full-time in Austin, Texas.

Ginger Webb (1990-1991) holds a Master's degree in French Linguistics from
the University of Texas at Austin. She worked on Cyc as a knowledge
enterer, contributing to several different domains.

Alan Kay, Michael Lesk, John McCarthy, Marvin Minsky, Tom Murphy, Bob
Simpson, Pat Hayes, Marvin Weinberger, and Steve Chenoweth have all
provided useful comments and met with the Cyc team at various times.

----------------------------------------------------------------
Subject: [2-4] How do I contact the authors myself ?

The Cyc group maintains a low profile on the Internet. As it would be easy
to be deluged by email, they have chosen not to publicize their addresses.
If you wish to discuss the ideas behind Cyc, or the philosophy of the Cyc
project, you probably will get a faster response by posting to the comp.ai,
the comp.ai.philosophy, or the comp.ai.nat-lang newsgroups.

There are many talented people who read these groups, and it is possible
that someone not affiliated with the Cyc project will be able to answer
your questions.

----------------------------------------------------------------
Subject: [3] When did they start Cyc and when will it be completed ?

The Cyc project began as a dream to create a computerized encyclopedia.
When Alan Kay, one of computing's legendary figures, was at Atari's research
center, he asked Doug Lenat for something original to add to this project.
After Atari hit financial difficulties, Doug Lenat relocated his idea to MCC.
Initially, Cyc was based on a re-implementation of RLL (the frame-based
language underlying Eurisko) which was similar to the simultaneously but
independently developed KRL.

The original ten year funding period for the Cyc project at MCC was
supposed to end in 1994, but was extended for one year to the close of
1995.

From the perspective of some people on the Cyc team, asking when Cyc will
be completed is sort of like asking of a person, "When will s/he be
finished?" A more pertinent question is, when will it be useful? The
practical answer is, when it can solve the problems and perform the tasks
its builders would like for it to perform. The Cyc team believes that Cyc
is now ready to be used in some interesting and useful applications.

----------------------------------------------------------------
Subject: [4] Why are they creating Cyc ?

The Cyc team doesn't believe there is any shortcut toward being
intelligent or creating an artificial intelligence based agent.
Addressing the need for a large body of knowledge complete with
content and context may only be done by manually organizing and
collating information. This knowledge includes heuristic, rule of
thumb problem solving strategies, as well as facts that can only be
known to a machine if it is told.

Much of the useful common sense knowledge needed for life is
prescientific and has therefore not been analyzed in detail. Thus a
large part of the work of the Cyc project is to formalize common
relationships and fill in the gaps between the highly systematized
knowledge used by specialists in the modern world.

----------------------------------------------------------------
Subject: [5] Where can I get the source code ?

It is not free, nor is it freely available. If you or your company are
willing to become a corporate sponsor of the Cyc development effort, you
will be able to have the same access to the internal details and internal
documentation available to the other sponsors (excluding, of course,
information that is proprietary to particular sponsors). This is not an
option for everyone, and Cycorp reserves the right to determine if they
will accept sponsorship by your company. As the current sponsors have
invested a considerable sum of money in developing Cyc, please do not
pursue this option unless you or your company are willing to make a similar
contribution. Serious inquiries regarding collaboration or sponsorship may
be sent to:

Doug Lenat
Cycorp, Inc.
3500 West Balcones Center Drive
Austin, Texas 78759

While the intent is to make Cyc widely available (so that it will become
the standard representation and reasoning system), Cycorp is committed to
protecting the intellectual property rights of those who have invested in
Cyc's development.

----------------------------------------------------------------
Subject: [6] What do they use to make it work ?

The Cyc system itself (the knowledge base, inference engine, interface
modules, etc.) is a fairly large, complex piece of software. However, it
now runs on what is basically stock, off-the-shelf hardware. Cyc can be
run in a networked mode (information provided on one machine is available
to all the other machines) or on a stand-alone workstation serving several
users at once.

----------------------------------------------------------------
Subject: [6-1] What programming language is it written in ?

There are both Common Lisp and C versions of Cyc. Most development is
currently done in Common Lisp running on Symbolics Lisp machines.
Lisp source code is translated into C, using a Common Lisp to C
translator developed by the Cyc team, to produce source code that can
be compiled by a variety of standard ANSI C compilers. Using the
standard HTML-based interface tools, it is virtually impossible for a
user to tell whether a given Cyc image is running in Common Lisp or C.

----------------------------------------------------------------
Subject: [6-2] What machines does it run on ?

The C version of Cyc is intended to run on any system that provides an
ANSI C compiler, virtual memory with at least 150 Mb of swap space, and
at least a 32-bit flat virtual address space.

As of January 1995, C versions of Cyc have been compiled and tested on the
following OS/hardware combinations:

UNIX OS:
    Sun Sparc
    DEC Alpha
Apple System 7 OS:
    Macintosh Powerbook
    Macintosh Quadra
    Power Macintosh

If there is sufficient demand from a sponsor company, the Cyc team
will produce a C version of Cyc running under Microsoft Windows NT.

The Common Lisp version runs on Symbolics Lisp machines, and under Lucid on
Sparc 10s (with memory requirements similar to those for the C version).

----------------------------------------------------------------
Subject: [6-3] How large is Cyc? How has the size changed over the project?

This answer depends on several factors. The primary factors are
the language used and the implementation of the language used to build
the system. Other differences include the operating system support
which is provided to the executable. Different versions of LISP or C
may greatly influence these numbers.

The average size of an executable image is between 150 Mbytes and 200
Mbytes. This figure includes the inference engine and the knowledge base,
but does not include temporary files or space used when importing external
data.

The size of the Cyc knowledge base has fluctuated over the years as the
axioms and facts stored have changed. It has decreased when axioms were
generalized, which resulted in fewer total axioms. Adding a new context or
microtheory increases the size, although much of the information needed to
define a new context is already available in other parts of the knowledge
base.

As stated earlier, there are more than 400,000 significant assertions, of
which fewer than 30,000 are rules (inference IF-THEN statements). There are
currently over 500 microtheories (long-lived contexts) defined within Cyc
(see section [7]).

Adequate partial solutions have been defined for representing and reasoning
with the most commonly occurring situations that deal with Time, Belief,
Substances, Causality and Possibility, etc. Much of this has been outlined
in the book _Building Large Knowledge-Based Systems_.

A rough breakdown of the 30,000 Constant terms in the Cyc KB is:
  40%    Categories
            0.5%   Categories of categories
                   Categories of individuals:
            3.0%     Categories of Intangible objects, Information Bearing
                     objects, Numbers, and Physical attributes, etc.
           18.5%     Categories of Tangible objects, Living Things, Artifacts
           18.0%     Categories of Script types
                       Actions by one person: Physiological Actions, Problem
                         Solving and Planning, Work/Hobby/etc. Actions
                       Actions by more than one person: Communication, Rites
                         of Passage, Trade and Commerce, etc.
                       Actions of Natural Phenomena (Weather, etc.)
  15%    Predicates and Functions
                   Unary: see Categories and Attributes
           12.0%   Binary
            2.0%   Ternary
            1.0%   Quaternary or more
  10%    Attributes
  15%    Lexical objects (words, parts of speech, tense, number, gender)
  15%    Proper Nouns (specific people, places, languages, events, etc.)
   1.5%  Microtheories (long lived contexts)
   3.5%  Misc. and Sundry

Another breakdown would be by the rules or formulae:

  25%  Taxonomic Information (like type constraints on predicates)
  35%  Partonomic relations
          5%  general relationships
         15%  what kind of parts (physical/anatomical/subEvents) might
              various types of objects have?
         15%  what kind of actors are involved in various script types?
   5%  Information about specific people, places, etc.
  10%  Lexical information
          8%  linguistic properties of different word senses
          2%  denotations of word senses
  10%  More complex information interrelating script types, people and
       tangible objects.
  10%  General topics (time, space, intentions, stuff, numbers, etc.)
   5%  Misc. and sundry formulae
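
As a quick arithmetic check on the two breakdowns above, the following
Python sketch adds up the listed percentages. It assumes that the entries
listed without a top-level percentage are sub-items of the heading above
them (for example, that 0.5 + 3.0 + 18.5 + 18.0 make up the 40% for
Categories). The dictionary layout and the function name check are only
illustrative.

```python
# Percentages transcribed from the two breakdowns in this section.
# Each entry is (top-level share, list of sub-shares or None).
# Assumption: unlabelled sub-items belong to the heading above them.
constants = {
    "Categories": (40.0, [0.5, 3.0, 18.5, 18.0]),
    "Predicates and Functions": (15.0, [12.0, 2.0, 1.0]),
    "Attributes": (10.0, None),
    "Lexical objects": (15.0, None),
    "Proper Nouns": (15.0, None),
    "Microtheories": (1.5, None),
    "Misc. and Sundry": (3.5, None),
}
formulae = {
    "Taxonomic": (25.0, None),
    "Partonomic": (35.0, [5.0, 15.0, 15.0]),
    "Specific people, places": (5.0, None),
    "Lexical": (10.0, [8.0, 2.0]),
    "Complex interrelations": (10.0, None),
    "General topics": (10.0, None),
    "Misc.": (5.0, None),
}


def check(breakdown):
    """Return the top-level total after verifying each sub-breakdown."""
    for name, (share, parts) in breakdown.items():
        if parts is not None and sum(parts) != share:
            raise ValueError(f"{name}: parts sum to {sum(parts)}, not {share}")
    return sum(share for share, _ in breakdown.values())


print(check(constants), check(formulae))
```

Under that reading, every sub-breakdown matches its heading and both
top-level columns total 100 percent.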

----------------------------------------------------------------
Subject: [7] How does it work ?

In the old days (before 1991), Cyc's representation language (CycL) was
primarily a frame-based language, the Cyc KB was thought of as a set of
unit/slot/entry triples, and inferencing was done pretty much by
inheritance. This led to a set of increasingly baroque add-ons and
work-arounds, such as encoding higher-arity predicates as entries which
were tuples, having variant forms of predicates (in which the only
difference was the order of the arguments), and placing more and more stress
on frame-oriented editing interfaces to navigate around in the knowledge
base.
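
A minimal sketch may make the frame-based view and its work-arounds more
concrete. The Python below is purely illustrative and is not Cyc code: the
store, the function put, and the slot names fatherOf, hasFather, and gave
are all hypothetical.

```python
from collections import defaultdict

# Hypothetical frame store: each unit (frame) maps slot names to entries.
frames = defaultdict(lambda: defaultdict(list))


def put(unit, slot, entry):
    """Record one unit/slot/entry triple."""
    frames[unit][slot].append(entry)


# A binary relation fits naturally, but it is filed under its first argument.
put("Fred", "fatherOf", "Sally")

# The same fact is invisible from Sally's frame unless a variant predicate
# with the arguments in the other order is stored as well.
put("Sally", "hasFather", "Fred")

# A relation with three arguments has no natural slot, so the extra
# arguments are packed into a tuple entry.
put("Fred", "gave", ("Sally", "Book1"))

print({slot: entries for slot, entries in frames["Fred"].items()})
```

Each fact has to be filed under one particular unit, which is what leads to
the duplicated predicates and tuple-valued entries described above.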

The Cyc team now thinks of the Cyc KB as a "sea" of assertions, with each
assertion being no more "about" its first argument than its last one. For
example, if one says that Fred is Sally's father, this is now regarded as
being just as much a statement "about" Sally as about Fred. Inference has
broadened out into general logical deduction, with AI's well-known named
inference engines (such as inheritance, automatic classification, etc.)
being just special cases that might or might not get treated specially in
any particular implementation of the Cyc system; in any event, the persons
entering knowledge do not need to cater to that, or even know about it. So
one way to visualize the Cyc KB is as a circle filled with assertions; a
circular "assertion sea". Above this sea (or outside it, from a
two-dimensional perspective) sit all the "constants". Attached to each
constant is a bundle of thin wires or strings. The other ends are
attached to all the assertions, in the sea, that mention that constant
anywhere. Moreover, each of the assertions in the sea can itself be
treated as a constant, if you want, and have its own wires reaching to
other assertions which mention it.
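
The "wires" picture amounts to an index from each constant to every
assertion that mentions it. The following Python sketch shows one way such
an index could look. It is an assumption-laden illustration, not Cyc's
implementation: the class AssertionSea, its methods, and the predicate
names fatherOf and believes are hypothetical, and assertions are modeled
simply as nested tuples.

```python
from collections import defaultdict


def mentioned(term):
    """Yield every constant and every nested assertion inside a tuple term."""
    for part in term:
        yield part
        if isinstance(part, tuple):
            yield from mentioned(part)


class AssertionSea:
    """Hypothetical 'sea' of assertions indexed by the terms they mention."""

    def __init__(self):
        self.assertions = set()
        self.wires = defaultdict(set)  # term -> assertions mentioning it

    def add(self, assertion):
        """Add an assertion and string a wire to everything it mentions."""
        if assertion in self.assertions:
            return False
        self.assertions.add(assertion)
        for term in mentioned(assertion):
            self.wires[term].add(assertion)
        return True

    def mentioning(self, term):
        """Return all assertions that mention the given constant or assertion."""
        return self.wires.get(term, set())


sea = AssertionSea()
fact = ("fatherOf", "Fred", "Sally")
sea.add(fact)
# An assertion can itself be mentioned by another assertion.
sea.add(("believes", "Sally", fact))

print(sea.mentioning("Sally"))
print(sea.mentioning(fact))
```

In this sketch the fatherOf assertion is reachable equally from Fred and
from Sally, and the assertion itself can be looked up like any constant.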

Inference rules in Cyc can now be thought of as ways of saying that if you
have certain assertions in the sea (a set of them, that match a certain
pattern) then you are justified in adding a particular new assertion. Each
time an assertion is added, wires are automatically strung to all the
constants that are mentioned anywhere inside the assertion, and "ripples"
of its adding may cause yet other inferences to occur, yet other new
assertions to get dumped into the sea, etc. Sometimes one of the new
assertions is the answer someone was waiting for, for some problem;
sometimes one of the inference procedures reaches a contradiction and has
to cope with that.
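
The "ripple" behaviour described above resembles what is generally called
forward chaining. Here is a minimal Python sketch of that general technique,
assuming assertions are plain tuples and each rule is a function that looks at
the sea and proposes new assertions. The predicates and rule names are
invented for illustration and are not taken from the Cyc KB.

```python
def forward_chain(facts, rules):
    """Keep applying rules until no rule can add a new assertion."""
    sea = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # list() finishes reading the sea before anything is added to it
            for new in list(rule(sea)):
                if new not in sea:
                    sea.add(new)
                    changed = True  # a ripple: other rules may now fire
    return sea


def parent_from_father(sea):
    """If (father child dad) is in the sea, propose (parent child dad)."""
    for a in sea:
        if a[0] == "father":
            yield ("parent", a[1], a[2])


def grandparent(sea):
    """Two chained parent assertions justify a grandparent assertion."""
    parents = [a for a in sea if a[0] == "parent"]
    for _, child, middle in parents:
        for _, middle2, elder in parents:
            if middle == middle2:
                yield ("grandparent", child, elder)


result = forward_chain(
    {("father", "Sally", "Fred"), ("father", "Fred", "George")},
    [parent_from_father, grandparent],
)
```

The sketch leaves out contradiction handling entirely; it only shows how one
added assertion can trigger further additions.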

CycL, the Cyc representation language, is essentially a form of First
Order Predicate Calculus (FOPC) with equality, augmentations for
default reasoning, skolemization, and some second-order features
(e.g., quantification over predicates is allowed in some
circumstances). Like FOPC, CycL allows using ForAll (universal
quantification), ThereExists (existential quantification), and
LogImplication (material implication), as well as the other common
ways of combining variables and logical expressions such as LogAnd
(conjunction), LogOr (disjunction), and LogNot (negation). It uses a
form of circumscription, includes the unique names assumption, and can
make use of the closed world assumption where appropriate.

Cyc currently does not store most of the information you would find in a
dictionary, encyclopedia, or an almanac. For example, Cyc may not know
that Birendra Bir Vikram Shah Dev is the current king of Nepal, or that
Kathmandu is its capital city. It does know what the characteristics of a
capital city are, and it knows the significance of being a head of state.

----------------------------------------------------------------
Subject: [7-1] How do they store common sense in a computer ?

See sections [1-1] and [7], above.

Each assertion in Cyc (a statement of fact or a "rule-of-thumb") is
located in (or associated with) a specific microtheory or context. Each
microtheory captures one "fairly adequate" solution to some knowledge
representation area (knowledge domain). These solutions may address general
areas like representing and reasoning about space, common devices, time,
substances, agents, and causality or specific areas like weather,
manufacturing a particular thing, and walking.

Different areas may have several different microtheories, since the way an
area is perceived or modeled may be different. Different points of view,
different assumptions, different levels of granularity, and even what
distinctions are important or not important may be significant enough to
require creating a separate microtheory. A microtheory may be considered to
be a smaller and more modular knowledge-base within Cyc, which is
specialized on a particular topic.
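
One way to picture this is a knowledge base partitioned by microtheory, where
the answer to a question depends on which microtheories are consulted. The
Python sketch below is an illustration under that assumption only. The
microtheory names and the "shapeOf" predicate are hypothetical, and real
microtheories are considerably richer than a dictionary of sets.

```python
from collections import defaultdict


class MicrotheoryKB:
    """Toy knowledge base in which every assertion lives in a microtheory."""

    def __init__(self):
        self.contents = defaultdict(set)  # microtheory name -> assertions

    def assert_in(self, mt, assertion):
        """Place an assertion in one specific microtheory."""
        self.contents[mt].add(assertion)

    def holds(self, assertion, mts):
        """True if the assertion is present in any consulted microtheory."""
        return any(assertion in self.contents.get(mt, set()) for mt in mts)


kb = MicrotheoryKB()
# Two models of the same area at different levels of granularity
kb.assert_in("LocalSurveyingMt", ("shapeOf", "Ground", "Flat"))
kb.assert_in("AstronomyMt", ("shapeOf", "Ground", "Curved"))

flat_for_surveyor = kb.holds(("shapeOf", "Ground", "Flat"), ["LocalSurveyingMt"])
flat_for_astronomer = kb.holds(("shapeOf", "Ground", "Flat"), ["AstronomyMt"])
```

Both assertions can coexist because neither microtheory claims to be the
single correct model; each is adequate for its own purpose.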

The important thing to realize is that neither the Cyc team, nor Cyc itself
claims to have a unified theory of time, space, and the universe. Nor does
it embody some great master Laws of Thought. What they do have is a suite
of specialized microtheories whose union covers the most common cases.

----------------------------------------------------------------
Subject: [7-2] How do they input common sense ?

The Cyc team's basic knowledge browsing and editing tools consist of
an extensive and growing set of HTML pages. This scheme provides
maximal standardization and portability across platforms. In effect,
this means that anyone with access to a WWW browser, the correct URL,
and security clearance can, from anywhere in the world, browse in or
edit the Cyc knowledge base. These tools allow the user to view
assertions in the knowledge base and perform a variety of operations,
including adding assertions, removing assertions, creating new
constants, killing constants, renaming constants, setting inference
performance parameters (e.g., forward or backward propagation for
rules), asking for conclusions to be derived (if possible), and
viewing the inference chains that resulted in particular conclusions.

As these tools are for in-house development, there is no public World
Wide Web site available. There currently is some discussion of
providing a subset of the Cyc database on an example Web site.
Should this happen, the address will be publicized. This may be
available before the end of 1995.

There is also a variety of test suites that are run periodically to test
the integrity of the knowledge base and the functioning of the inference
engine. The Cyc team expects to give more emphasis to regular, automatic
testing of the system now that product development has begun.

----------------------------------------------------------------
Subject: [7-3] What theoretical foundation is behind Cyc ?

Cyc is not a theoretical effort, although there has been a lot of
theory used in its construction. The Cyc team prefers to think of the
project as an engineering effort. The primary focus of the Cyc
project is to actually start consolidating a cohesive knowledge bank.
Any theoretical issues which have been addressed have been directly
motivated by the requirements of solving specific problems.

The Cyc team believes that a hand-encoded effort using symbolic logic may
express a significant fraction of the fundamental human knowledge typically
shared by most people. This bootstrap process is greatly enhanced by
the redundant nature of knowledge. Most knowledge uses and re-uses the
same basic ideas and relationships in many different ways. The day to day
entering of knowledge is not based on ethereal definitions of elaborate
Causality and Time-Space-Intelligence collections. Most data is as plebeian
as 'living organisms have to eat to stay alive' or 'broom handles tend to
be made of wood'.

It is hoped that as the Natural Language effort continues (see [7-5],
below), more knowledge may be entered by persons typing in assertions in
English, and eventually by having Cyc 'read' source materials for itself,
bothering its human attendants only when disambiguation is required.

----------------------------------------------------------------
Subject: [7-4] What is the difference between Cyc and an Expert System ?

The knowledge in Cyc is more densely interrelated, Cyc has more information
about the common attributes of the world, and Cyc has a broader focus than
any individual expert system.

A typical expert system uses highly detailed knowledge about a single,
tightly-focused domain. Cyc encodes general knowledge about many different
domains, viewed from a variety of perspectives. Based on the bodies of
information (microtheories) it uses in inferencing, Cyc may draw differing
conclusions.

Cyc may be thought of as a tool for building expert systems and other
programs that use a rule-based knowledge representation. It supports
and uses both forward and backward chaining and the dynamic creation
of terms (Skolemization). Cyc has an integrated argumentation-based
truth maintenance system to provide logical reasoning as well as
supporting non-monotonicity.
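
Backward chaining, in its general form, starts from the question and works
back toward known facts. The Python sketch below shows that general technique
on ground (variable-free) assertions. It is not Cyc's inference engine, and
the facts, rules, and depth limit are assumptions chosen for illustration.

```python
def backward_chain(goal, facts, rules, depth=10):
    """Return True if goal is a fact or can be derived within 'depth' steps.

    rules is a list of (conclusion, [premises]) pairs over ground assertions.
    """
    if goal in facts:
        return True
    if depth == 0:
        return False  # give up rather than search forever
    for conclusion, premises in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, depth - 1) for p in premises
        ):
            return True
    return False


facts = {("isa", "Fido", "Dog")}
rules = [
    (("isa", "Fido", "Mammal"), [("isa", "Fido", "Dog")]),
    (("isa", "Fido", "Animal"), [("isa", "Fido", "Mammal")]),
]

is_animal = backward_chain(("isa", "Fido", "Animal"), facts, rules)
is_plant = backward_chain(("isa", "Fido", "Plant"), facts, rules)
```

Unlike forward chaining, nothing is added to the knowledge base here; only the
assertions needed to answer the question are examined.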

----------------------------------------------------------------
Subject: [7-5] What are they doing in Natural Language Processing ?

The Cyc NL system is unique in having access to a very large, declaratively
represented common sense knowledge base. Cyc helps the natural language
system handle word/phrase disambiguation, and also provides a target
internal representation language (CycL) that can be used to do interesting
things, such as inference. A substantial portion of the Cyc natural
language processing system (the lexicon and many semantic rules) is
actually represented in the Cyc knowledge base; Cyc "knows about" words
just like it "knows about" cars or trees. Syntactic parsing is carried out
by application of phrase-structure rules to an input string. Semantic rules
are applied to the output of the syntax module. It is in the
application of the semantic rules that the knowledge in the knowledge
base is proving especially advantageous.

Most of the Cyc pilot applications developed in the recent past have
some NL component in their interfaces. The captioned image retrieval
application, for example, accepts queries in English, and allows
captioners to describe new images to the system using English
sentences. The Cyc NL team is currently expanding the lexicon,
extending the parser, and adding new semantic capabilities to the
system.

----------------------------------------------------------------
Subject: [8] What are Cyc's capabilities right now?

The basic hypothesis behind symbolic Artificial Intelligence is that it
is possible to simulate intelligence in a particular "microworld" by
manipulating a set of symbols that represent that "microworld".
The Cyc team believes this involves picking a particular task domain
and then solving the problems encountered in that task domain by
a combination of
1. Defining appropriate symbol-manipulation techniques
2. Building an adequate symbol set for representation
3. Finding some general purpose reasoning mechanism
4. Building a reasoner using that mechanism

Cyc's chosen "microworld" is more or less every particular task domain,
but only down to a pre-expert level of detail. The symbol set and symbol
manipulations should be biased by the special cases encountered in modeling
the regularities in the world as encountered by the 'common man'.

----------------------------------------------------------------
Subject: [8-1] Can Cyc reason about data not stored in
Cyc-format databases ?

Yes. In the previous version of Cyc (Cyc 9) a pilot application named
Cyccess was developed. Through Cyccess, Cyc could interface with
structured information sources (SIS) such as databases or
spreadsheets.

Cyccess used Cyc to understand the contents of structured sources, to
retrieve information, and to pose queries that depended on a
combination of Cyc knowledge and the data in the SIS.

After the information in a SIS was appropriately linked to assertions
in Cyc, all the Cyc inferencing, guessing, and consistency checking
capabilities were available. An interesting implication of this is
that Cyc could use specific facts or time-sensitive information
without duplicating it within the Cyc knowledge base.
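
The idea of linking a structured source to assertions, rather than copying its
contents, can be sketched as follows. This is a rough Python illustration and
not a description of how Cyccess worked. The spreadsheet contents, the LINK
description, and both predicates are invented for the example.

```python
import csv
import io

# Stand-in for an external spreadsheet holding time-sensitive data (invented)
SHEET = """city,temperature_c
Kathmandu,18
Paris,12
"""

# Hypothetical link: the predicate a row stands for, then its argument columns
LINK = ("currentTemperatureC", "city", "temperature_c")


def sis_assertions(text, link):
    """Read the source on demand and present each row as an assertion."""
    predicate, *columns = link
    for row in csv.DictReader(io.StringIO(text)):
        yield (predicate, *(row[c] for c in columns))


def temperatures_in(country, kb, sis):
    """Combine KB knowledge (which cities are where) with external readings."""
    cities = {a[1] for a in kb if a[0] == "cityInCountry" and a[2] == country}
    return {
        a[1]: a[2]
        for a in sis
        if a[0] == "currentTemperatureC" and a[1] in cities
    }


kb = {("cityInCountry", "Kathmandu", "Nepal"), ("cityInCountry", "Paris", "France")}
nepal_readings = temperatures_in("Nepal", kb, sis_assertions(SHEET, LINK))
```

The readings never enter the kb set; they are fetched from the source each
time the question is asked, so they can change without any knowledge entry.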

More recently, the Cyc team has built a production version of a
database browsing and retrieval application for one of Cycorp's
sponsors.

----------------------------------------------------------------
Subject: [8-2] If a robot had a radio link to Cyc, what could it do ?

The robot would not have a sense of self in the same way that a human does.
Cyc has predicates that refer to the symbol 'Cyc', but since it would not have
a representation of itself in space, it couldn't be immediately usable
as a control program for a robot. The CycL representation language is
capable of representing many things about time and space, but few predicates
are organized in the immediate form necessary for real time control.

Another issue is that Cyc cannot sense itself in space.
It does not currently embody any information about its location.
Since Cyc does not have any sensors, it depends solely upon data input
through knowledge enterers. The Cyc KB however, has the spatial knowledge
necessary to allow the robot to navigate. It would be necessary to define
the axioms the Cyc program would need to interpret input from sensors in
terms of these spatial knowledge axioms.

It is not part of the Cyc effort to build a real time planner which would
be necessary in giving a robot autonomous control.

----------------------------------------------------------------
Subject: [8-3] When do they expect Cyc to be able to read?

The subject of Natural Language or English input is complex. There is
some implication in reading that there is a corresponding level of
assimilation of new knowledge, as well as an understanding of what is read.

Cyc does have some ability to convert English text into the language it uses
to store knowledge. The English to CycL translator is not so robust that it
can read news articles from a newsfeed and learn from them.

Kathy Burns is leading the effort of the NL group and has substantially
rewritten many components of the existing system. The new version of the
Natural Language system is anticipated to be incorporated into existing
applications using Cyc by the end of this year. These include image
retrieval from text requests and limited natural language query processing.

----------------------------------------------------------------
Subject: [8-4] Are you able to converse with Cyc ?

You cannot converse with Cyc as you would to a person. Generation of
natural language text such as English is part of Cyc's continuing development
plan, but it currently is not capable of conversation.
Conversation involves many ellipses and false starts that people process
easily, but that complicate the task immensely for a program.

This is not to say you cannot ask questions of Cyc. The English to CycL
translator is developed sufficiently to be the basis of several Cyc
applications. CycL to English generation is one of the focuses of
the Natural Language group.

----------------------------------------------------------------
Subject: [8-5] Will Cyc pass the Turing Test any time soon ?

The Turing Test is a test developed by the computer scientist Alan Mathison
Turing (1912-1954). It proposed a method to determine if a computer program
should be classified as intelligent. Essentially, the test involves two
input devices, one to a computer, and the other to a human. If an observer,
using the input devices, cannot determine which device communicates with
the machine, and which communicates with the human, the computer is said
to have 'passed' the Turing test.

The Cyc Team is interested in creating an Artificial Intelligence based
agent. If in the process of creating this agent a program is created that
could pass the Turing Test, it would be very satisfying. The type of
common knowledge that is being put into Cyc would, in general, be useful
to a program which attempted to pass the Turing Test. It is not expected
that the Cyc program will pass the Turing Test in the near future.

----------------------------------------------------------------
Subject: [9] Cyc standards with ANSI or ISO standards

It is important to the Cyc team that Cyc applications and interfaces
(eventually) provide support for any commonly accepted knowledge
interchange and knowledge representation standards capable of expressing
the kinds of assertions and heuristics found in Cyc. Some of Cyc's
capabilities may be more complex than those encoded in existing standards.
One of the long term goals of Cyc's builders is to influence the shape of
proposed knowledge interchange and knowledge representation standards. The
Cyc team intends to make a major contribution to the development of a
basic, "core" knowledge foundation that could be used by other projects and
applications which need access to knowledge about consensus reality.

----------------------------------------------------------------
Subject: [9-1] What are the details of the functional interface to CycL ?

The Functional Interface (FI) of Cyc is a set of procedures that can
be used by external programs to query Cyc for conclusions or general
information. It currently consists of twenty-seven operations, with
the following having the most general utility:


FI-FIND given a name string
return the Cyc constant having that name

FI-CREATE given a name string
create and return a Cyc constant having that name

FI-KILL given a term
return a modified knowledge base in which the term no longer
exists and any assertions of which it was a component
are no longer true

FI-RENAME given a constant
given a string (new constant name)
change the constant's name to the new name and return the
new constant

FI-ASSERT given a logical formula,
given a knowledge base subset
produce a modified knowledge base with the formula as
an axiom and return a status notification

FI-UNASSERT given a logical formula,
given a knowledge base subset
produce a modified knowledge base with the formula not an
axiom (the formula may still be a theorem, i.e., follow
from other axioms) and return a status notification

FI-JUSTIFY given a (concluded, cached) formula
given a knowledge base subset
return a list of the cached formula-subset pairs that
together justify the previous derivation of the
formula as a theorem

FI-ASK given a logical formula, possibly including free variables,
given a knowledge base subset
return a binding list of variables that will make the formula
true (optional parameters which can be varied include
whether or not to backchain, the desired number of bindings,
the clock time allowed for the operation, and how deeply in
the search tree to search)

FI-CONTINUE-LAST-ASK (no inputs)
continue the last call to FI-ASK from the search state
where it terminated, looking for (more, new) bindings,
and return a binding list (optional parameters which
can be varied are the same as those for FI-ASK). This
is a new feature, possible because in Cyc 10, unlike
Cyc 9, inference search state is explicitly maintained.

FI-DENOTATION given a natural language string
return a list of Cyc constants which (according to the Cyc
lexicon) constitute possible denotations for the string
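
To make the shape of these operations concrete, here is a toy in-memory
stand-in written in Python. It is emphatically not the Cyc Functional
Interface: the class ToyKB, the method names, the argument order, and the
reading of "knowledge base subset" as a microtheory name are all assumptions
made for illustration, and only six of the operations are imitated.

```python
class ToyKB:
    """Hypothetical miniature imitation of a few FI-style operations."""

    def __init__(self):
        self.constants = set()
        self.axioms = {}  # subset (microtheory) name -> set of formulas

    def fi_create(self, name):
        self.constants.add(name)
        return name

    def fi_find(self, name):
        return name if name in self.constants else None

    def fi_assert(self, formula, mt):
        self.axioms.setdefault(mt, set()).add(formula)
        return True  # status notification

    def fi_unassert(self, formula, mt):
        self.axioms.get(mt, set()).discard(formula)
        return True

    def fi_kill(self, term):
        """Remove the term and every formula it was a component of."""
        self.constants.discard(term)
        for formulas in self.axioms.values():
            formulas.difference_update({f for f in formulas if term in f})

    def fi_ask(self, pattern, mt, max_bindings=None):
        """Match a flat pattern; terms starting with '?' are free variables."""
        results = []
        for formula in sorted(self.axioms.get(mt, set())):
            if len(formula) != len(pattern):
                continue
            bindings = {}
            for want, have in zip(pattern, formula):
                if want.startswith("?"):
                    if bindings.setdefault(want, have) != have:
                        break  # same variable bound to two different terms
                elif want != have:
                    break
            else:
                results.append(bindings)
                if max_bindings is not None and len(results) >= max_bindings:
                    break
        return results


kb = ToyKB()
for name in ("father", "Sally", "Tom", "Fred"):
    kb.fi_create(name)
kb.fi_assert(("father", "Sally", "Fred"), "FamilyMt")
kb.fi_assert(("father", "Tom", "Fred"), "FamilyMt")
before = kb.fi_ask(("father", "?child", "Fred"), "FamilyMt")
kb.fi_kill("Tom")
after = kb.fi_ask(("father", "?child", "Fred"), "FamilyMt")
```

The toy fi_ask only looks up stored formulas; the optional parameters listed
above for FI-ASK (backchaining, time limit, search depth) have no counterpart
here apart from max_bindings.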


The present implementation of the FI includes an external telnet
server. This means that applications (and users) can telnet directly
to Cyc, evaluate FI operations, and get the results.

Currently, Cycorp has not chosen to publicize any port where this interaction
may take place. Cycorp machines are privately owned. This telnet capability
of the software is not an invitation to unauthorized access.

----------------------------------------------------------------
Subject: [9-2] Can an SQL query be used to ask Cyc a question?

No. Since the SQL model presumes a relational abstraction consisting of
tables with rows and columns, it does not translate well into the Sea of
Assertions abstraction (or even the older Frame based abstraction). The
Cyc model is much richer than the SQL model, and thus too many distinctions
within the knowledge base probably could not be expressed. In essence, an
SQL query would be too coarse grained to pose an adequately meaningful
question to the Cyc knowledge base. (However, it is possible that
object-oriented versions of SQL would be capable of posing meaningful
questions to Cyc).

----------------------------------------------------------------
Subject: [9-3] Can Cyc generate an SQL query to find
information it needs ?

Yes. This is part of the capability that has been recently incorporated
into Cyc to use knowledge stored externally to the Cyc knowledge base. To
translate a CycL expression into an SQL database query, it is necessary to
describe the database schema to Cyc using CycL. This procedure can be used
to tell Cyc the meaning of the various rows and columns (fields) in any
given SQL database. After this has been done, Cyc can use both its own
internal rules of thumb and the data contained in the external database to
derive new conclusions.
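
The translation step can be pictured with a small Python sketch using the
standard sqlite3 module. It assumes a much simpler schema description than
anything expressed in CycL: a dictionary mapping a two-argument predicate to a
table and two columns. The SCHEMA dictionary, the "capitalCity" predicate, the
table, and its rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema description: predicate -> (table, arg1 column, arg2 column)
SCHEMA = {"capitalCity": ("countries", "name", "capital")}


def to_sql(pattern, schema):
    """Turn (predicate, arg1, arg2) into a parameterised SELECT statement.

    Arguments written like '?city' are variables to be retrieved; any other
    argument is a fixed value that becomes a WHERE condition.
    """
    predicate, *args = pattern
    table, *columns = schema[predicate]
    wanted = [c for c, a in zip(columns, args) if a.startswith("?")]
    fixed = [(c, a) for c, a in zip(columns, args) if not a.startswith("?")]
    sql = f"SELECT {', '.join(wanted) or '1'} FROM {table}"
    if fixed:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c, _ in fixed)
    return sql, [a for _, a in fixed]


# A stand-in external database with invented contents
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE countries (name TEXT, capital TEXT)")
db.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("Nepal", "Kathmandu"), ("France", "Paris")],
)

sql, params = to_sql(("capitalCity", "Nepal", "?city"), SCHEMA)
rows = db.execute(sql, params).fetchall()
```

Table and column names come only from the trusted schema description, while
values from the question are passed as parameters rather than pasted into the
SQL text.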

----------------------------------------------------------------
Subject: [9-4] Can Cyc interact with HTML ?

Yes. There are existing tools to query Cyc and edit knowledge in Cyc
that are HTML based. Cyc dynamically generates HTML pages in response
to user queries. (See sections [6-1] and [7-2])

Cyc does not yet 'web-surf' or access gopher clients to learn about the
world by using available Internet resources. Many of these resources
are not in a form that Cyc can currently interpret, but Cyc's builders
hope that as its natural language understanding abilities improve it
will be able to assimilate knowledge from sources on the Internet.

----------------------------------------------------------------
Subject: [9-5] Does Cyc have a KIF interface ?

KIF stands for Knowledge Interchange Format. It, as well as other standards
such as Conceptual Graphs, is intended to be a means of transmitting rules
and facts which are stored in an Expert System Shell, or other knowledge
base system. This format is intended to be a linear form of the logical
assertions stored internally using symbols and other complex data
structures.

Since knowledge in Cyc is stored in this assertion format and CycL is
also a form of linearly expressing assertions of a knowledge base, there
are no theoretical restrictions that would prevent CycL from being expressed
as KIF assertions.
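
As a rough illustration of why such a translation looks feasible, the Python
sketch below prints a nested expression in a KIF-like prefix notation. The
operator names on the left are the CycL names listed in section [7]; the
mapping to KIF operators, the tuple encoding, and the "isa" example are
assumptions made for this sketch, and it is not an existing Cyc facility.

```python
# Assumed correspondence between CycL operator names and KIF operators
CYCL_TO_KIF = {
    "ForAll": "forall",
    "ThereExists": "exists",
    "LogImplication": "=>",
    "LogAnd": "and",
    "LogOr": "or",
    "LogNot": "not",
}


def to_kif(expr):
    """Serialise a nested tuple expression as a linear, parenthesised string."""
    if isinstance(expr, str):
        return expr  # a constant or a variable such as ?x
    head, *rest = expr
    op = CYCL_TO_KIF.get(head, head)  # ordinary predicates pass through as-is
    if op in ("forall", "exists"):
        variable, body = rest
        return f"({op} ({variable}) {to_kif(body)})"
    return "(" + " ".join([op] + [to_kif(r) for r in rest]) + ")"


rule = (
    "ForAll",
    "?x",
    ("LogImplication", ("isa", "?x", "Dog"), ("isa", "?x", "Mammal")),
)
kif_text = to_kif(rule)
# kif_text == "(forall (?x) (=> (isa ?x Dog) (isa ?x Mammal)))"
```

Features such as default reasoning and microtheories have no place in this
mapping, which is one reason a real interface would need more than a change
of notation.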

KIF currently is not a published standard, although there is a draft
standard available, under the auspices of the American National Standards
Institute (ANSI). The Cyc team is interested in this area, yet
there is currently no official interface developed in Cyc to allow Cyc to
generate KIF or interpret KIF as input.

----------------------------------------------------------------
Subject: [A] Acknowledgements

This FAQ has been based on magazine articles and books published by the Cyc
team, notably Doug Lenat and R.V. Guha, and personal communication with Doug
Lenat and Nick Siegel.

Any mistakes in it are my sole responsibility, although this is not a
warranty. I would appreciate a note about any inaccuracies or
misrepresentations herein. This FAQ could not be created without the
generosity of the Cyc team in sharing information with the computing
community about the methods and philosophy they have been using.

----------------------------------------------------------------
Subject: [B] Bibliography of Expert Systems books, introductions,
documentation, periodicals, and conference proceedings.

Davidson, Clive
Common Sense and the Computer, New Scientist
April 2, 1994

Lenat, D.B., Guha, R.V., Pittman, K., Pratt, D., and Shepherd, M.
Cyc: toward programs with common sense. Communications of the ACM
August 1990/ Vol 33. No 8

Guha, R.V. and Lenat, D.B.
Cyc: a midterm report. A.I. Magazine, Fall 1990

Lenat, D.B. and Guha, R.V.
Building Large Knowledge Based Systems, Addison Wesley,
Reading Mass, 1990

Lenat, D.B. and Guha, R.V.
Enabling Agents to Work Together, Communications of the ACM
July 1994/ Vol 37. No. 7

Lenat, D.B.
Steps to Sharing Knowledge. In Toward Very Large Knowledge
Bases. Edited by N.J.I. Mars. IOS Press, 1995.

Lenat, D.B.
Artificial Intelligence. Scientific American, September
1995.

----------------------------------------------------------------