Accessing Information from Globally Distributed Knowledge Repositories (Extended Abstract)

Alfred V. Aho
Department of Computer Science
Columbia University
[email protected]
Abstract

This paper discusses some of the major technical obstacles standing in the way of achieving cost-effective universal access to multimedia information stored in globally distributed knowledge repositories. Opportunities for contributions from the database research community are highlighted.
expanding
and users are influencing
alized communication
market
The
purpose
key
technical
Businesses are integrating
healthcare, merce
email,
are emerging
new
infrastructure.
entertainment, as important
Digital and
goal
areas.
com-
Un-
some
personal
and
issues
that
positioned
of the
if we
are
information the
to
database
attack.
The
from
and
- universal
access
The
remainder
in more
detail.
Scalability
The
most
sign
and
ture
architecture
that
but
in the price/performance of processors and computer memories, as well as exponential growth in the number of people and the amount of traffic carried on the
on a global
scale. striking to it
information
The terabytes
is becoming beginning of the
volume
Tens
information
and
to support
how
much
access
millions of
various
of
people on-line indicate
10 per
cent
Web
kinds
of
Internet.
is growing of
set
services
new
to the
information of
a rich
Estimates than
high
information
number
more
had
current
the
available. U.S.
gets and
information
of the
of 1996
of on-line
rate. of
infrastructure
rapidly
is growing
a signifi-
planet.
multimedia
how
de-
of effective to
connectivity
aspects
are
to
infrastruc-
creation
it was not engineered
infrastructure
staggering
on the
issues
is how
services
near-universal
interactive
population
the
telecommunications
of evolvable
the
problem
these
information
supports
people
reliability,
at the
discuss
communications
for
that
will
technical
of the
most
paper
a scalable
and
current
connected
of knowledge
evolution
browsing
challenging
marks
The
and
implement
fraction
The
integration
of this
3
cant
media
quality
- searching
and
Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.
on
some
be solved universal
come
integration
arrival of the information age. In the past decade we have continued to see order of magnitude improvements
PODS ’96, Montreal, Quebec, Canada © 1996 ACM 0-89791-781-2/96/06...$3.50
has a
highlight
must
is well
of multiple
- information
high
the
focus
we discuss
- organization
resolved, however, is the question of who will pay for the new services and how. Rapid improvements in computer hardware, software, are accelerating
is to
that
of affordable
shall
that
- systems
on an
libraries,
electronic
application
technology
U.S. as
- scalability
services whereby
their operations
paper
community
information
communications
the
problems
for person-
homes, and remote locations with equal ease. An increasingly diverse set of new services needs to be supported as a significant fraction of the populace gains access to the
the
such
Issues
problems
We
research
the
international scale and people want to access their information environments and coworkers from their offices,
gains
in
or television.
of this
achieve
services.
people can get access to others and the information they need in the form and media they want, any time, any where.
New
- integration
is developing
and information
home
appliance
telephone,
The
2
of this infrastructure.
A rapidly
every
information
computer,
to
from business, technology,
Virtually of
are highlighted.
The goal of developing an information infrastructure is to provide affordable universal access to multimedia information services on a global scale. Strong forces evolution
form
dis-
1 Introduction
edu
Internet.
cles standing
Distributed
at
pages are
a
and
already
available
over
System by
NASA
data
the
This
from
and
a comparable
the
genome.
The
amount
frastructure applications mand,
grow
and
tain
constraints
time
work
the
protocols
of-service vices.
and
At
the
storage
times
level
capable
and
of
the
storage
devices
net-
However,
will
slower The
require
to ensure
information
such
in seconds,
also
to end
organization
from
disparate
The
integration
In
have
data-
are two
secondary
use
own
to
types
involving
dealing
with
needs
and
preferred data
asking
to
text.
These
new
types
model
and
that
access
the
database
for handling
tabular
extend develop
to the
implementation
interactive
textual
multimedia
appropriate tradeoffs
operators for
the
new
new
with
we
are
multiple
forced types.
becomes to
deal
We need
much
We
with a data
The
model
its
own
more
has its own jargon
medicine,
systems difficult
and
This
appropriate
classi-
When
we
disparate
knowl-
arise.
At
the
terms
should
or we
among
question
problem”.
of
have
and
knowledge
areas?
and
fields
music
works.
problems
aggregated
the
and
from
concepts
In
the
is some-
effect,
ontology
can
We
need
sources.
analyze
the
At
data
wide
complex
when
one
involving
2
like
for
we are
all
of
to
and
have
towards
the
that dis-
a single
creating
systems
process
tools
interconnected as using
different
approaches
sys-
interop-
so application
information
are
the
model
a “wrapper”
from
the
approach and
filter
of formats
Translators place,
convertors
to but will
for In
and
translates
notation
dif-
the
is to use agents information
level, needed another.
the for
the
informa-
integrating
the
media-
disparate
systems
use
for representing convert
Some
it is likely continue
to
ap-
source
called
from
existing
dealing one
each
of the
and conventions are
representation
explored sources.
is defined that
into
representational
variety
being
information
the source
to collect
is taking and
to work
gather
Another
types.
can integrate
like
among
with
from
tors
data
would
is fitted
multimedia
would
access from
as easy to perform
an integrating
does
and
we
heterogeneous
model.
has
level,
information
proach, tion
relational
community
that
even
domains.
information
composite
future
knowledge
an
For example,
“ontology
interfaces
with
varying
has
more
is
their
what
is an
systems
data. problem
of growing
domains
taxonomies,
the
user
Several
data
types. The
for
of
notations,
many
the
make
programs
those
widely
type
research
topics
organizing
heterogeneous
the
erable
video
methods.
for
level,
what
At
and
is a significant
have
Each
methods
understandable
Internet.
Media
and
with
to buffering
to combine
images,
characteristics. operators
developed not
audio,
and
an
are
endeavor
information
called
would
are starting
in
integration
distinctive
describe
times
challenge. documents
types
appropriate
Integration
literature,
schemes
ferent
Digital
of
browsers,
knowledge.
dance,
combined
or
these
of video
types
data appliances
manner
of human
ganizational
storage
containing
delivery
data
as
as a megabyte
creation
types
and
and
edge domains,
users.
of multimedia
data
problem.
interconnect
devices
devices
of Multiple
and objects
multimedia
interfaces, data
knowledge
field
fication
tem.
Integration
user
of organizing
parate
4
information
pleasing
The
chemistry,
need
of data. these
approaches
smooth
new
of
Organization
their
juke
systems
or as large
The
these
esthetically
way
be
as disk
than
new
the
we
thus
tools,
Every
ser-
such
and
algorithms
of
capabilities.
challenging
as electronic
form
multimedia
quality-
will
petabytes likely
systems
as disks.
caching
in cer-
information such
the
in our
Knowledge
video
New
privacy
of holding
of magnitude
such
and
5
within
diverse
like
describe
accommodate
of text
design
displaying
For example,
infrastructure
the most
silos.
devices
the
applications
measured
orders
audio
to meet
is tertiary-storage
or tape
deau-
the
need
we would
can
research.
on
natural.
multimedia
of security
in-
and
to be delivered
the
authoring
quality-of-service
it to appear
for
future,
access
about
video
with
We
byte
we
stored
efficiently
presentation
multimedia
movies
of data.
can
as a single
requires
healthcare.
devices
boxes
and
for
physical
take
three
has
sensitive
the immediate will
stringent
transmission
video
Guarantees for
repositories. that
The
at
video-intensive
Delivering
are needed
commerce
knowledge
and
which
of information
structures
a
information
with
be synchronized
requirements
necessary
by the
types
information with
of video.
in
centers
of information
carried
more
to
a movie,
stored
languages
small
with
are expected
teleconferencing,
puts
needs
of
producing
world
dramatically
on the
audio
and
genome
the
instruction.
information
requirements
The
amount
as
video
combined
of multimedia
for
such
and
be System
size.
of traffic
will
of information oceans
all forms associated different
is expected
and
countries
throughout
to produce
the
will
Information
universities
human
decade,
land,
and
Observing
to relaunched
of a petabyte
of unprecedented
various
dio
of this
atmosphere,
sources
Data
database
end
information
other
EOS
Earth
of satellites
a third
about
earth.
the
the
about
annually
The
Internet.
a collection
towards
to generate the
the
(EOS),
need
data
from
standardization for
translators
foreseeable
future.
a
The
classification
cilitates
its
world,
storage
schemes approaches,
and
new
forms
classification
and
keeping
of knowledge.
scheme
in the have
widely
database been
schema
maintaining keeping
classes.
It
approaches
is well
to attack
the
this
in the
infacili-
scheme
trying
to generalize
these
arising
from
the
information
Systems
The
global
systems
Integration information
integration
To
facilitate
the
introduction
challenge
faced
creation
of
in the
components
Open
in-
are critical
for
system
Many
existing
tate
and
systems
tional of lines
but
tems. systems
nents
7
and
help
Information
With
the
with
data
of new
quality
is likely
database
problem data
can
plete.
The
corruption
invalid
systems. grated, source, An
sources
When users
the
can
for
the
open
of
with
integrity. lineage
or data data
problem the
approach
of information
that
can
retrieve
of multimedia
sources
databases
research
areas.
through tools
by content Some
queries of
Designing with
technical
as
Effective
a combination
graphics.
querying
such
values.
multimedia
interactive
maps, correlate
characteristics
answered
search
examine
repositories.
or approximate open
California?” hybrid
and then
tools
a variety
is a significant
Universal
a set
these
kinds
of
challenge.
are inteof’ which
– its
assurance
is to origins,
trace
Access is both a technical
On top of the basic providers could offer appeal to specialized increased willingness what goes into the
incomderived
assigned One
databases,
information infrastructure scope and scale.
with
of reasons,
new information
inconsistent
is
best
to
of
“What
and a societal
one:
everyone at an affordable price? At the technical level, this would involve asking what services could be provided at what cost. The database community has studied system and query optimization questions for years. However, the issues for the global
problem
plagued
when
area
are still
Our final question
to believe. research
from
in
need
searching
or
require
less
decline
cities?”
knowledge
for
shape,
and
may
much of
What facilities and services should be part of the basic information infrastructure that should be available to
compo-
existing
inconsistent,
the
for index-
use
quality
prescribed
interoperable
9
sys-
specifications
a variety
with
algorithms
combinations
causes
as these
user
various
data
be
capabilities
of all
system
already
spread
well
browsing
interoperability.
is used to populate
are confronted
if any,
For
incorrect,
systems
important
information record
be
are
data.”
Sct-
access to
The
tasks the
air
such
for searching
or concept
to get worse.
systems
of “dirty
legacy from
data,
the
process
methods
Quality
explosion
Existing the
these
system
facilitate
techniques
certain
and other
is a need
texture,
them.
definitions
tests
assure
among
color,
may
in
the
images,
imprecisely
existing
is the
organizations
unambiguous From
where
using
facili-
with
“What
for boolean
are
influencing
and
the
tech-
as “What
Computer
stores.
retrieval
questions
and
billions
data
“What
factors
There
interna-
of interoperable
conformance
would
out
the
and
many
interwork
international
are needed.
construct
that
to
aspects
precise,
existing
throw
how and
various
the
transportation
satellite
in-
that
or
as SQL
data
information
public
data
well-defined
contains
we cannot learn
interfaces
we can
Because
of national
Clear,
have
interfaces
infrastructure
must
are addressing
not
expose
integration.
of software,
number
do
do not
information
systems, A
systems
often
such of
effective
data.
efficient
questions
such
textual
concepts:
layer
interoperability
languages
we need
multimedia
number?”
record-based
Answering
the
levels
information
Foundations
has developed
strategies
the physical
telephone
Query
the
has developed
precise
textbook
precise are
evolvability.
terfaces
the
of
Many
biggest and
between
by
of keywords.
we need interoperability
interfaces
Smith’s
cost
sys-
the
services
from
is Jane
to find
community
ing and searching
by engineers.
new
infrastructure
applications.
is
used
accuracy
items.
of distributed
answering
community
Evolution
infrastructure
of new technology,
at all levels to the
and
be the
Browsing
ways
ocean
conventional
tems.
6
for
ence?”
hiertypes
problems
can
determine
information
and
database
niques
of individual
heterogeneous
to
of particular
growing
The
of ap-
view
complete
information
languages
Searching
schema
problem
with
This
query
We need effective of
versions
worth
of distributed
problem
A number
address
of the
different
8
as an important
evolution
versions
and
tegration
to
and
or reliability
applica-
performing
community.
taken
integrating
The
annotations.
conchang-
existing
recognized
and users
accom-
However,
when
been
to
business
can cause
working
fa-
changing
to evolve
changing
old programs
proaches
and
need
working.
problem
archy,
a rapidly
to stop
has
cluding
of information
In
procedures
evolution
ties,
access.
integrated
ditions,
tions
organization
classification
modate
ing the
and and
If
of
to enhanced
information
educational,
and
average
sources,
3
increased
infrastructure information service customized services that would communities with presumably an to pay. The tradeoff is between basic infrastructure and what is services.
is going economic,
individual,
are ones of vastly
then
to and
become health
information
essential well
being
services
for
the
of
the
need
to
be
universal
world
and
affordable.
of
haves
to
information.
access and
enhanced
and
is likely
and
Otherwise,
have-nots The
services,
we’ll
distinction
however,
to be a subject
have
differentiated well
a
their
between
is not
of much
by
basic
understood
debate
in the
near
future.
10 Conclusions
We have
taken
major
technical
of
nations
all
global there
on are
will
have
the
as
their
well.
respective
transcend
individual is
global this
kind
impact.
the
of
research
community
is well
for
property,
total
would
issues shaped
by
communities, various
of sets
investment
is likely over
have
in
that
examination for
of work
have
degree
The
new
positioned
we
to some
critical
of dollars
It is also the kind
must
safeguards
several
infrastructure
of billions
that
that
questions
tradeoffs
information
hundreds
have
out
are being
from
little
affordability/universality
the
include
of the
forces
Since
we
point
communities.
relatively
affordable
questions
communities
but
services,
should
investigated
technical
conflicting
basic
and
of intellectual
all
being
sometimes there
we
the
of expression.
level,
already
of
if people
While
These
of freedom
some
be met
universal
protection
technical
mentioned
of
must
nontechnical
privacy,
guarantees
At
a
issues,
substantial
answered
view
that
infrastructure.
technical
individual and
top-down
information
focused be
a
challenges
in
to
run
the next
enormous
of the
into
decade, economic
the database
research
to conduct.