WebSeer: An Image Search Engine for the World Wide Web
Michael J. Swain, Charles Frankel, and Vassilis Athitsos
Department of Computer Science
The University of Chicago
Chicago, Illinois 60637
{swain, frankel, vassilis}@cs.uchicago.edu

Abstract
Because of the size of the World Wide Web and its inherent
lack of structure, finding what one is looking for can be
a challenge. In fact, some of the most highly visited Web
sites are search engines. However, while Web pages typically
contain both text and images, most currently available
search engines only index text. This paper describes
WebSeer, a system for locating images on the Web. WebSeer
uses image content in addition to associated text to
index images; the image analysis is designed to complement
the information obtained from the text.
1 Introduction
The explosive growth of the World Wide Web has
proven to be a double-edged sword. An immense amount
of material is now easily accessible on the Web, but locating
specific information remains a difficult task. While
there has been some success in developing search engines
for text, search engines for other media on the Web (images,
video, and sounds) are just starting to appear, and
are extremely primitive. WebSeer uses information derived
from analyzing the image content to complement the textual
information associated with an image and information
derived from the image header. This additional information
is used to create a context in which image analysis algorithms
can effectively operate. Image analysis algorithms
are then used to classify the image within a taxonomy of
types (photograph, portrait, computer-generated graphic,
etc.) and to extract useful semantic information such as
the scale of a portrait (close-up, half-body shot, full-body
shot, etc.). In addition, duplicate images are detected using
an image checksum and other information.
Like text search engines, WebSeer does not have to access
the original data to respond to a query; all analysis of
the image and surrounding text is done off-line during the
creation of the database. In this way, WebSeer can give fast
query responses to a possibly huge number of users.
Submitted to CVPR ’97. This work has been supported by grants
from the Office of Naval Research and the National Science Foundation,
and by a donation from IBM.
2 Related Work
In recent years, there has been a great increase of interest
in content-based indexing into image databases, both
in the research community and to some extent in emerging
commercial applications. The first system to attract considerable
attention was the Query By Image Content (QBIC)
system from IBM Almaden [2]. This system, which has
continued to evolve, allowed the user to find images similar
to a given example image using low-level cues such as
color and texture similarity. When manual preprocessing
to segment objects in the images had been performed, similarity
of the shape of the occluding boundary of the object
could be used as well. Markus Stricker and his collaborators
at ETH extended the notion of similarity to include
rough location information in the image as well, which involved
color matching fuzzy regions in the image. This
model of image similarity has been used in other work such
as that carried out by Ramesh Jain and his students, and
the company he founded: Virage. Santini and Jain have
predicted:
The basic operation in query-by-content will be ranking
portions of the database with respect to similarity with the
query.
Other researchers who have made contributions in the
area of image similarity techniques for content-based indexing
into image databases include Carlo Tomasi [6] and
Rosalind Picard [4].
As a paradigm for an image search engine for the World
Wide Web, image similarity has its difficulties. An example
image is not always available, at least for the first query
before images have been returned from the search engine
(which may or may not provide good examples to work
from). In many cases describing specifications verbally is
easier than searching for or drawing an example picture.
A problem with the current approaches to image similarity
is that they are very low-level. Often users’ notions of
image similarity require more interpretation of the image
than provided by typical image similarity metrics such as
color and texture measures. The desired form of similarity
can vary widely depending on the goals of the user. And
the complexity of similarity matching in high-dimensional
feature spaces may make it less practical for applications
such as World Wide Web search engines, which require
rapid response to huge numbers of queries. For existing
methods of similarity retrieval, search time increases exponentially
with the dimensionality of the feature space (see footnote 1) and
logarithmically with the number of images in the database
[10].
One approach that is more promising in a multi-media
database such as the World Wide Web is to supplement
the image content with other types of information associated
with the image. Ogle and Stonebraker’s Cypress system
[3] uses information contained in hand-keyed database
fields to supplement image content information. Srihari’s
Piction system [9] uses the captions of newspaper photographs
containing human faces to help locate the faces.
As in Srihari’s work, we intend to use text associated with
the image to guide interpretation of the image. Since much
useful information about the content of the image can be
gleaned from the associated text, the image analysis need
only complement the information obtained from the text.
The image analysis algorithms can therefore be incomplete, in that on their own,
they would not provide enough information to give a comprehensive
description of the contents of the image.
Independently of WebSeer, a project at Columbia University
called WebSeek has recently come on-line [7].
WebSeek is another project aimed at studying content-based
indexing for the World Wide Web, and it provides
a search tool for the Web. WebSeek performs a semiautomated
classification of the images on the Web into a
hierarchy of categories, using associated text and filename
cues. Searching is done by browsing or searching through
the categories. Color histogram similarity matching can
be used to find images of similar color within a category
or over the entire catalogue. The researchers have also
shown that color histogram analysis is capable of differentiating
among some of the broad-level categories, and
that it can be useful in differentiating photographs from
computer-generated graphics.
3 WebSeer
To best understand the description of WebSeer that
follows, it is helpful to have seen an example of
the system in action. An example search is described
in detail in the next section. You can experience
the latest version of WebSeer on-line at
http://infolab.cs.uchicago.edu/webseer.
Footnote 1: To be more precise, White and Jain claim that the run-time complexity
of the similarity retrieval methods depends on the intrinsic dimensionality
of the dataset, which means that if the data lies within a lower-dimensional
subspace of the feature space, then it is the dimensionality of
the subspace that is relevant to the running time of the algorithms.
Figure 1: An image search query.
3.1 An Example Search
Suppose a user is interested in finding small, close-up
images of Rebecca De Mornay, and types Rebecca De
Mornay as the search text, and makes selections as displayed
in Figure 1. The results page interface, shown in
Figure 2, is laid out as follows. Thumbnails of the resulting
images are displayed above a bar which indicates the
size of the original image. Clicking on the image loads the
original image into the browser. Clicking on the page icon
to the right of the image loads the page which contains the
image.
The results were obtained as follows. WebSeer searches
for images whose associated text contains any of the words
contained in the query, after eliminating words that are especially
common (e.g. connecting words such as and and
the). The results are sorted by the summed weight of the
words associated with that image. For instance, if one image
has associated words Rebecca, with a weight of 3 (because
it occurs in the alternate text associated with the image),
and Mornay with a weight of 1 (because it occurs
in the HTML title of the page containing the image), that
image would have a summed weight of 4. The other selections
indicate that only small images (height or width < 150 pixels,
file size < 10K) that were determined to be photographs and
that contain exactly one face whose height is at least 50% of
the image height should be returned. Lastly, the results are sorted,
first by the weighted sum of associated words, and then,
since close up was selected, by the detected face size (with
the largest faces appearing first).
Figure 2: Results of the query shown in Figure 1.
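The ranking described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WebSeer's implementation: the record layout (`word_weights`, `face_height_pct`), the stop-word list, and the function names are all hypothetical.

```python
# Hypothetical sketch of the ranking step; every name here is illustrative.
STOP_WORDS = {"and", "the", "of", "a"}  # assumed list of especially common words


def summed_weight(query, word_weights):
    """Sum the weights of the query words found in an image's associated text."""
    terms = [w.lower() for w in query.split() if w.lower() not in STOP_WORDS]
    return sum(word_weights.get(t, 0) for t in terms)


def rank(query, images, close_up=True):
    """Sort by summed word weight, then (for close-ups) by detected face size."""
    scored = [(summed_weight(query, img["word_weights"]), img) for img in images]
    matches = [(s, img) for s, img in scored if s > 0]  # any query word matches
    matches.sort(
        key=lambda pair: (pair[0], pair[1]["face_height_pct"] if close_up else 0),
        reverse=True,
    )
    return [img for _, img in matches]


images = [
    {"name": "a.gif", "word_weights": {"rebecca": 3, "mornay": 1}, "face_height_pct": 60},
    {"name": "b.gif", "word_weights": {"rebecca": 3, "mornay": 1}, "face_height_pct": 80},
    {"name": "c.gif", "word_weights": {"mornay": 1}, "face_height_pct": 90},
]
ranked = rank("Rebecca De Mornay", images)
print([img["name"] for img in ranked])
```

The first two images both reach the summed weight of 4 used in the text, so the tie between them is broken by face size; the third image matches only one word and ranks last despite having the largest face.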
4 How WebSeer Works
Information for finding images on the World Wide Web
can come from two sources: the associated text and the
image itself. WebSeer uses information from both sources.
Cues from the Text and HTML Source Code. An HTML
document is a structured document. Understanding the
structure of a particular document can reveal valuable information
about the images contained on that page. There
are several places relevant information about image content
may be located within the document. In order of the likelihood
of the text being useful in an image search, these
include image file names, image captions, alternate text
(displayed when the image cannot be displayed), hyperlinks,
and HTML titles. While this source of information
is not the primary focus of our research, it is extremely important
to the effectiveness of the image search tool. The
information extracted from the image content is designed
to complement the information that is available from the
surrounding text.
5 Image Content Analysis
Although it is clear that the context provided by the surrounding
text can produce an effective image search engine,
examining the images themselves can substantially
enrich the search. The following attributes can be easily
obtained from the image header: image size, file type
(JPEG, GIF, GIF89, etc.), file size, file date, and grayscale
vs. color. Obtaining information beyond that readily available
in the header requires analysis of the image itself.
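To illustrate how cheaply header attributes can be read, the following sketch parses the fixed 13-byte header of a GIF file. It covers GIF only, and the function name and returned field names are assumptions made for this example rather than anything taken from WebSeer.

```python
import struct


def read_gif_header(data):
    """Return basic attributes from the first 13 bytes of a GIF file."""
    if len(data) < 13 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Logical screen descriptor: width and height (little-endian 16-bit),
    # a packed flags byte, background color index, pixel aspect ratio.
    width, height, packed, _bg, _aspect = struct.unpack("<HHBBB", data[6:13])
    has_global_table = bool(packed & 0x80)
    # The low three bits give the global color table size as 2 ** (n + 1).
    table_entries = 2 ** ((packed & 0x07) + 1) if has_global_table else 0
    return {
        "file_type": data[:6].decode("ascii"),
        "width": width,
        "height": height,
        "global_color_table_entries": table_entries,
        "file_size": len(data),  # size of the bytes passed in
    }


# A hand-built header for a 204 x 107 image with a 256-entry color table.
sample = b"GIF89a" + struct.pack("<HHBBB", 204, 107, 0xF7, 0, 0)
info = read_gif_header(sample)
print(info)
```

No pixel data is decoded here, which is why attributes of this kind are inexpensive compared with the content analysis described next.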
5.1 Classifying Photographs and Drawings
The algorithm classifies images into photographs and
artificial images, where the terms “artificial images” and
“drawings” are used to denote all images that are not photographs.
There are some borderline cases. The most common
is images that contain a photograph and an artificial
part, which could be, for example, a computer-generated
background or superimposed text. Currently, the algorithm
is not designed to handle such images consistently.
The algorithm can be split into two independent modules.
One module consists of tests that separate photographs
from drawings. After the image has been submitted
to all the tests, the decision-making module decides
how it should be classified.
5.1.1 Tests
In every test an image gets a positive real number as a
score. The images are represented by three matrices, corresponding
to their red (R), green (G) and blue (B) color
bands, whose entries are integers from 0 to 255. Photographs
and drawings tend to score in different ranges.
Here are the tests that have proven to be the most useful:
• The band difference test: We pick a threshold T between
0 and 255 and initialize the counter C to 0. For
every pixel in the image, let r, g, b be its respective
values in the R, G, B matrices. Let m be the largest
number among r, g, b, and let n be the smallest among
them. If $m - n \ge T$, we increase the counter C by
one. The score of the image is $C/S$, where S is the number
of pixels in the image. For thresholds T over 50,
we expect drawings to score higher than photographs,
because the colors used in drawings are usually highly
saturated. We can actually submit an image to this test
more than once, with a different value of T each time.
• The farthest neighbor test: We pick a threshold T between
0 and 765 and initialize counters C and S to 0.
For each inner pixel P of the image, let (r, g, b) be its
color vector (r, g, b are defined as in the band difference
test). We consider the pixel’s top, bottom, left
and right neighbors. For each neighbor, let (r', g', b')
be its color vector. Let $d = |r - r'| + |g - g'| + |b - b'|$.
We pick the neighbor for which this d is maximum. If
$d > 0$ we increase S by one. If $d \ge T$ we also increase
C by one. The score of the image is $C/S$. This
test turns out to be useful, because in general color
changes across pixels tend to be more abrupt in drawings.
Therefore, artificial images tend to score higher
than photographs in this test, when $T \ge 100$.
• The color test: We count the number of different colors
in the image. Drawings tend to have fewer distinct
colors than photographs, at least in GIF images.
• The most common colors test: We fix a number N
between 1 and 256. We find the N most common
colors in the image. The score is the fraction of pixels
in the image that have one of those colors. For N
smaller than 10, drawings tend to score much higher
than photographs.
• The color table test: A color table is a three-dimensional
table of size $16 \times 16 \times 16$. Each color (r, g, b)
corresponds to entry $(\lfloor r/16 \rfloor, \lfloor g/16 \rfloor, \lfloor b/16 \rfloor)$
in the color table (where $\lfloor x \rfloor$ is the floor of x).
To initialize a color
table, first we set all its entries to 0. Then, we pick a
set of images, and for each of those images we do the
following: For each color (r, g, b), let f be the fraction
of pixels that has that color in the image. We add
f to the entry corresponding to (r, g, b) in the color
table.
We have created four such color tables, corresponding
to the categories of GIF photographs, GIF drawings,
JPEG photographs and JPEG drawings. For each of
the color tables we only used images that belonged to
the corresponding group. The number of images used
in each color table varied from 300 to 900.
The actual test is the following: We pick the two color
tables that correspond to the image format (GIF or
JPEG) of the current image. We set A and N to 0.
For each color occurring in the image, let f be the fraction
of pixels that have that color. Let a be the entry in
the drawings color table that corresponds to that color.
Let n be the respective entry in the photographs color
table. We set $A = A + af$ and $N = N + nf$. The
final score is $N/(N+A)$. It follows from the definition
of the score that photographs are expected to score
higher. The assumption we make in this test is that
some colors occur more frequently in drawings, and
other colors occur more frequently in photographs.
• The neighbor table test: This test is similar to the
previous one. We create a one-dimensional table of
size 766. The entries are indexed from 0 to 765. To
initialize the table, first we set its entries to 0. Then,
for each of a set of images, for each number M from 0
to 765 we find the score s of the image in the farthest
neighbor test with the threshold $T = M$, and add s
to the entry indexed by M in the table.
We create four tables, as in the color table test. To test
an image, we set A and N to 0, and for each number
M from 0 to 765, we find the score s of the image in
the farthest neighbor test with the threshold $T = M$.
Let a and n be the entries corresponding to M in the
drawings and the photographs table respectively. We
set $A = A + as$ and $N = N + ns$. The final score is
again $N/(N+A)$.
• The band difference table test: Identical to the previous
test, but using the band difference test instead of
the farthest neighbor test.
• The narrowness test: We look at the ratio of rows to
columns. In photographs, the ratio tends to be between
0.5 and 2, whereas in drawings it is often less
than 0.5.
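Several of the tests listed above are simple enough to sketch directly. The following Python is a minimal illustration, not the authors' code: it assumes an image is a list of rows of (r, g, b) tuples, it reads the threshold comparisons as "at least T", and the handling of degenerate cases (a completely flat image, or colors absent from both tables) is an assumption the text does not address.

```python
from collections import Counter

# Assumed representation: an image is a list of rows of (r, g, b) tuples (0-255).


def band_difference_score(image, t):
    """Fraction of pixels whose max-min band difference is at least t."""
    pixels = [p for row in image for p in row]
    c = sum(1 for p in pixels if max(p) - min(p) >= t)
    return c / len(pixels)


def farthest_neighbor_score(image, t):
    """C/S over inner pixels, using the largest L1 distance to a 4-neighbor."""
    c = s = 0
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            p = image[y][x]
            neighbors = (image[y - 1][x], image[y + 1][x],
                         image[y][x - 1], image[y][x + 1])
            d = max(sum(abs(a - b) for a, b in zip(p, q)) for q in neighbors)
            if d > 0:
                s += 1
                if d >= t:
                    c += 1
    return c / s if s else 0.0  # assumption: a completely flat image scores 0


def most_common_colors_score(image, n):
    """Fraction of pixels covered by the n most common colors."""
    counts = Counter(p for row in image for p in row)
    return sum(k for _, k in counts.most_common(n)) / sum(counts.values())


def color_fractions(image):
    """Map each color to the fraction of pixels that have it."""
    counts = Counter(p for row in image for p in row)
    total = sum(counts.values())
    return {color: k / total for color, k in counts.items()}


def build_color_table(images):
    """Accumulate per-image color fractions into a sparse 16x16x16 table."""
    table = Counter()
    for image in images:
        for (r, g, b), f in color_fractions(image).items():
            table[(r // 16, g // 16, b // 16)] += f
    return table


def color_table_score(image, photo_table, drawing_table):
    """N / (N + A); higher values suggest a photograph."""
    a = n = 0.0
    for (r, g, b), f in color_fractions(image).items():
        key = (r // 16, g // 16, b // 16)
        a += drawing_table[key] * f
        n += photo_table[key] * f
    return n / (n + a) if n + a else 0.5  # assumption: unseen colors are neutral


RED, WHITE, GRAY = (255, 0, 0), (255, 255, 255), (120, 125, 130)
drawing = [[RED, RED, WHITE], [RED, WHITE, WHITE], [WHITE, WHITE, WHITE]]
photo = [[GRAY] * 3 for _ in range(3)]
photo_table = build_color_table([photo])
drawing_table = build_color_table([drawing])
print(band_difference_score(drawing, 100), farthest_neighbor_score(drawing, 100))
print(color_table_score(photo, photo_table, drawing_table))
```

The two toy images are far too small to be realistic, but they show the expected direction of each score: the saturated, sharp-edged "drawing" scores high on the band difference and farthest neighbor tests, while the flat gray "photo" scores high on the color table test.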
5.1.2 Decision making
Yali Amit [1] describes how to use multiple decision trees
to classify objects into certain categories. We use multiple
decision trees to classify an image. Each tree is binary,
and each node is either a leaf, or otherwise it has a
test field, which describes the next test the image should
be submitted to, and the threshold, such that if the image
scores below that threshold it should move to the left child,
otherwise it should move to the right child. The trees differ
from each other in the tests they submit the image to, as
well as the order in which those tests occur in the tree. We
use separate sets of trees for GIFs and JPEGs. The training
set was about 900 GIF drawings, 350 GIF photographs,
350 JPEG drawings and 400 JPEG photographs.
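The tree structure just described maps naturally onto a small data type. The sketch below is hypothetical: the thresholds and score names are invented, images are represented as dictionaries of precomputed test scores, and the majority vote used to combine the trees is an assumption, since the text does not say how the trees' outputs are merged.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # "photograph" or "drawing"


@dataclass
class Node:
    test: Callable[[dict], float]  # returns the image's score on one test
    threshold: float
    left: Union["Node", Leaf]      # followed when score < threshold
    right: Union["Node", Leaf]     # followed otherwise


def classify_with_tree(tree, image):
    """Walk one tree from the root to a leaf and return the leaf's label."""
    node = tree
    while isinstance(node, Node):
        node = node.left if node.test(image) < node.threshold else node.right
    return node.label


def classify(trees, image):
    """Assumed combination rule: strict majority vote for 'photograph'."""
    votes = [classify_with_tree(t, image) for t in trees]
    photo_votes = votes.count("photograph")
    return "photograph" if 2 * photo_votes > len(votes) else "drawing"


# Invented trees and scores, for illustration only.
trees = [
    Node(lambda im: im["band_diff"], 0.2, Leaf("photograph"), Leaf("drawing")),
    Node(lambda im: im["n_colors"], 64, Leaf("drawing"), Leaf("photograph")),
    Node(lambda im: im["band_diff"], 0.5, Leaf("photograph"),
         Node(lambda im: im["n_colors"], 16, Leaf("drawing"), Leaf("photograph"))),
]
image = {"band_diff": 0.4, "n_colors": 12}
print(classify(trees, image))
```

Separate lists of trees would be kept for GIF and JPEG images, as the text describes.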
5.1.3 Results
The algorithm was tested on photographs and drawings,
both GIF and JPEG, not included in the training set, with
the following results:
Image category    Number of images    Number of errors    Error rate
GIF drawings      895                 28                  0.031
GIF photos        333                 49                  0.147
JPEG drawings     330                 41                  0.124
JPEG photos       278                 8                   0.029
In building the decision trees we tried to tailor them to
the actual frequency of each category on the Web. Based
on the images we index, it seems that GIF drawings occur
at least 10 times as frequently as GIF photographs, and
JPEG photographs occur at least 3 times as frequently as
JPEG drawings. So, the overall error rate for GIF images
is 0.042 and the overall error rate for JPEGs is 0.053.
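The overall rates can be reproduced by weighting each category's error rate by its relative frequency. The short check below assumes the frequency ratios are exactly 10:1 and 3:1, the lower bounds quoted in the text; the function name is illustrative.

```python
def overall_error(rates_and_weights):
    """Frequency-weighted mean of per-category error rates."""
    total = sum(w for _, w in rates_and_weights)
    return sum(r * w for r, w in rates_and_weights) / total


# (error rate, relative frequency) per category, from the table in this section.
gif = overall_error([(28 / 895, 10), (49 / 333, 1)])   # drawings : photos = 10 : 1
jpeg = overall_error([(41 / 330, 1), (8 / 278, 3)])    # drawings : photos = 1 : 3
print(round(gif, 3), round(jpeg, 3))
```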
5.2 Locating Faces
The current version of WebSeer uses a face-finder developed
by Rowley et al. [5], which searches for upright
faces oriented towards the camera. The efficiency of the
face finder is improved by searching for faces only in images
determined to be photographs, and by detecting possible
face locations using color cues [11] prior to pattern
analysis using multiple neural networks. Henry Rowley
has provided us with an implementation of the face finder,
including the color-based skin detector, which we have integrated
into WebSeer.
6 The WebSeer Implementation
WebSeer was implemented with three guiding principles
in mind: First, WebSeer must have acceptable performance.
We need to allow for extremely high speed
searches, as we expect a large number of people to be using
our system simultaneously. Since indexing occurs off-line,
the performance of the indexing engine is less crucial than
that of the search engine. Nonetheless, indexing the entire
web in a reasonable amount of time requires the processing
to proceed quite quickly. For example, assuming there are
10 million unique images on the web, indexing them in two
weeks would require eight images to be processed every
second. Crawler speed is also important. Preliminary results
indicate that for every 100 HTML pages on the Web,
there are 40 (unique) GIF images and one (unique) JPEG
image. In contrast to the file size of HTML pages, which
averages around 6k, the average file size for GIF files is
11k, and the average file size for JPEGs is 35k.
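The throughput figure follows from simple arithmetic, which the snippet below reproduces. The second calculation, the volume of data fetched per 100 HTML pages, is an extra derived number that combines the averages quoted above; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(num_images, days):
    """Images per second needed to index num_images within the given days."""
    return num_images / (days * SECONDS_PER_DAY)


rate = required_rate(10_000_000, 14)
print(f"{rate:.2f} images/second")

# Kilobytes downloaded per 100 HTML pages, using the quoted averages:
# 100 pages at 6k, 40 GIFs at 11k, and 1 JPEG at 35k.
kb_per_100_pages = 100 * 6 + 40 * 11 + 1 * 35
print(kb_per_100_pages, "KB per 100 pages")
```

By these figures, images account for somewhat under half of the bytes a crawler must fetch, even though pages outnumber them.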
Second, we tried to incorporate standard commercial
software and hardware whenever possible. Much work has
been put into developing advanced database engines, and
WebSeer’s ability to leverage this technology adds significantly
to its power. Microsoft’s Visual C++ development environment
was used for much of this project. Microsoft’s
Foundation Classes in conjunction with the Standard Template
Libraries (STL) provided many useful functions and
abstractions during the development of this project. On
the hardware front, the use of relatively inexpensive PCs
allowed us to ramp up our storage and processing capacities
quickly and within reasonable cost. Third, the project
should provide a basis for experimentation. We foresee
WebSeer evolving in the following ways:
• better image understanding algorithms
• advanced text indexing capabilities
• improved interactions between the image understanding
and text indexing algorithms
• more complex transformations from form query submissions
to search actions taken
6.1 WebSeer Components
To facilitate this type of research, the project is divided
in such a way that each component can be worked on independently.
Additionally, we wish to facilitate the incorporation
of relevant new technologies as they become
available. The WebSeer project is composed of four major
executables. With the exception of the Image Processing
Server, which runs on Unix, all executables are written
in C++ and run on a Microsoft Windows NT 3.51 platform.
1. The WebSeer Crawler crawls the web downloading
both HTML pages and images. The crawler is
multi-threaded so that the delay downloading pages
is spread over multiple threads. Each thread is connected
to both a thread on an image processing server
(using TCP/IP) and to the database (using the ODBC
2.0 database protocol) which contains the indexed information.
Each new image which is encountered by
the crawler is sent to the WebSeer Image Processing
Server which analyzes the image content and returns
the results to the crawler.
2. The WebSeer Image Processing Server is a C++ executable
which runs on SunOS/Solaris/AIX. The Image
Processing Server is multi-threaded and runs
on multiple machines (with different operating systems).
A series of image understanding algorithms
are run on each image and the results are returned
to the crawler. The image understanding algorithms
are run in an order that is designed to minimize the
amount of computation required for each image. The
photograph/graphic detector, for example, is designed
as a decision tree with the fastest algorithms run first.
Additionally, face detection, an algorithm which requires
a relatively large amount of computation, is
run only when the image is determined to be a photograph.
3. The WebSeer CGI script is called when the user submits
(POSTs) a query from the WebSeer form. This
script opens a TCP/IP connection to the WebSeer
Search Server, and formats the results for display to
the user.
4. The WebSeer Search Server accepts requests from
the WebSeer CGI script and performs the appropriate
searches based on the form fields which the user
has filled in.
The WebSeer Crawler largely obeys the Robot Exclusion
Protocol. The Protocol is dependent on system administrators
including a robots.txt file on their web site, which
lists areas of the web site which robots should not visit.
Most robots are designed to download only HTML files,
and so visiting a directory which includes images would
be inappropriate. For this reason, some robots.txt files exclude
directories which contain only image files. Since
the WebSeer Crawler is designed to download images, we
obey the restrictions specified by the robots.txt file when
deciding whether to download an html file, but not when
downloading an image file.
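The download policy described above can be sketched with Python's standard `urllib.robotparser` module. The user-agent string, the list of image extensions, and the `may_download` helper are assumptions made for this example; the original crawler was written in C++ and its actual checks are not documented here.

```python
from urllib.robotparser import RobotFileParser

IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg")  # assumed set of image file types


def may_download(url, robots, user_agent="WebSeer"):
    """Apply robots.txt rules to HTML pages but not to image files."""
    if url.lower().endswith(IMAGE_EXTENSIONS):
        return True  # images are fetched regardless of robots.txt
    return robots.can_fetch(user_agent, url)


# A small robots.txt, parsed from memory rather than fetched over the network.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /images/",
    "Disallow: /private/",
])

print(may_download("http://example.com/index.html", robots))
print(may_download("http://example.com/private/index.html", robots))
print(may_download("http://example.com/images/logo.gif", robots))
```

The third URL lies in an excluded directory but is still downloaded because it is an image, which is the behavior the text describes.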
The WebSeer Crawler indexes each new html page as it
is downloaded. As each image is encountered in the HTML
file (either through an <img src=this_image.gif>
tag or an <a href=this_image.gif> tag), the appropriate
text is extracted in the form of whole words. These
words may be present in any of the ways described above.
Some of the words are more likely to contain information
about the content of the image they refer to. For this
reason, we weight the words according to the likelihood
that they contain useful information. Words contained in
the title of the HTML page, for example, have a lower
weight than those in the ALT tag of an image. When a
user performs a search the summed weights of the matching
words are used as one criterion for sorting the resulting
images. These words and their weights are stored in a single
database table.
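A minimal sketch of this word extraction, using Python's standard `html.parser`, is given below. The weights for ALT text (3) and page title (1) follow the example in Section 3.1; the file-name weight, the class name, and the tokenization rule are assumptions, and only `<img>` tags are handled.

```python
import re
from html.parser import HTMLParser

# ALT and title weights follow the example in Section 3.1;
# the file-name weight is an assumption.
WEIGHTS = {"alt": 3, "title": 1, "filename": 2}


def words(text):
    """Split text into lowercase alphabetic words."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]


class ImageWordIndexer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_words = []
        self.images = {}  # image URL -> {word: summed weight}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and attrs.get("src"):
            entry = self.images.setdefault(attrs["src"], {})
            stem = attrs["src"].rsplit("/", 1)[-1].rsplit(".", 1)[0]
            for w in words(stem):
                entry[w] = entry.get(w, 0) + WEIGHTS["filename"]
            for w in words(attrs.get("alt") or ""):
                entry[w] = entry.get(w, 0) + WEIGHTS["alt"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_words.extend(words(data))

    def result(self):
        """Add title words to every image; call once after feeding the page."""
        for entry in self.images.values():
            for w in self.title_words:
                entry[w] = entry.get(w, 0) + WEIGHTS["title"]
        return self.images


page = ('<html><head><title>Mornay fan page</title></head>'
        '<body><img src="pics/rdm1.gif" alt="Rebecca"></body></html>')
indexer = ImageWordIndexer()
indexer.feed(page)
index = indexer.result()
print(index)
```

Each (image, word, weight) triple produced this way corresponds to one row of the single database table mentioned above.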
Image understanding algorithms are run whenever an
image is encountered. The table below indicates the fields
which WebSeer currently saves for each image.
Field Name                         Sample Data
File Name                          http://www.cdt.org/images/cdtlgo.gif
Source URL                         http://www.cdt.org/index.html
Color Depth                        8 bit color
File Size                          3,129 bytes
File Type                          gif
Image Width                        204
Image Height                       107
Is Image a Photograph?             No
Is Image an Image Map?             No
Is Image included as a Reference?  No
Number of Faces                    0
Three fields contain information about how the image
was included in the HTML document. The Is Image
a Photograph field contains the results of the photograph/drawing
algorithm described in Section 5.1. Is Image an
Image Map indicates whether the image appeared in the
HTML code with an ISMAP tag. These images appear to
the user as clickable images.
Is Image included as a Reference? indicates whether
the image appears as part of an HTML document, or whether
only a reference is included. If an image is part of an
HTML document, users viewing that document will immediately
be able to see that image. If a reference is included,
the user may have to click on some part of the document in
order to view the image. References to images are common
when the images are large and/or slow to download.
The Number of Faces field indicates the number of faces
which were detected in the given image. Each detected
face has four attributes associated with it: horizontal position,
vertical position, height, and width. Since an image
may contain a number of different faces, the attributes
(fields) associated with each particular face are saved in a
separate face table.
The Largest Face Size field saves the height of the
largest face detected in the given image, as a percentage of
image height. Although this information is also contained
in the face table, saving the information in the image table
speeds some searches by eliminating the need to perform a
JOIN of the image table with the face table.
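The trade-off between the two tables can be illustrated with an in-memory SQLite database. The schema below is a guess at a plausible layout based on the fields listed above, with hypothetical column names; WebSeer itself used a commercial database accessed through ODBC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    file_name TEXT, width INTEGER, height INTEGER,
    num_faces INTEGER,
    largest_face_pct REAL  -- denormalized copy, avoids a JOIN at query time
);
CREATE TABLE face (
    image_id INTEGER REFERENCES image(id),
    x INTEGER, y INTEGER, height INTEGER, width INTEGER
);
""")


def add_image(conn, file_name, width, height, faces):
    """Insert an image and its faces; each face is (x, y, height, width)."""
    largest = max((f[2] for f in faces), default=0) * 100.0 / height
    cur = conn.execute(
        "INSERT INTO image (file_name, width, height, num_faces, largest_face_pct)"
        " VALUES (?, ?, ?, ?, ?)",
        (file_name, width, height, len(faces), largest),
    )
    conn.executemany(
        "INSERT INTO face (image_id, x, y, height, width) VALUES (?, ?, ?, ?, ?)",
        [(cur.lastrowid, *f) for f in faces],
    )


add_image(conn, "portrait.jpg", 150, 200, [(30, 20, 120, 90)])
add_image(conn, "group.jpg", 400, 300, [(10, 10, 60, 50), (200, 20, 45, 40)])

# Query using only the image table.
fast = conn.execute(
    "SELECT file_name FROM image WHERE largest_face_pct >= 50").fetchall()
# Equivalent query that recomputes the value through a JOIN.
slow = conn.execute(
    "SELECT i.file_name FROM image i JOIN face f ON f.image_id = i.id "
    "GROUP BY i.id HAVING MAX(f.height) * 100.0 / i.height >= 50").fetchall()
print(fast, slow)
```

Both queries return the same rows; the first avoids touching the face table at all, which is the saving the text refers to.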
6.2 Crawler Efficiency and Robustness
In order to index the enormous amount of data on the
web the WebSeer system must be both efficient, so that it
can index images quickly, and robust, so that it can run
unattended for long periods of time.
The WebSeer Crawler is separated into a number of different
executables with efficiency in mind. The crawler
is a multi-threaded Windows NT executable which establishes
connections to the database and the image processing
server during startup. The crawler is designed so that
multiple copies of the crawler can be run on different PCs
simultaneously. The amount of required communication
between threads on a single crawler and between crawlers
on different machines is kept to a minimum. The advantage
is that while some threads are busy waiting to download
images or waiting for the image processing server to return
results, other threads can be parsing html files or creating
thumbnails of the images.
Each thread on each crawler acts largely as an independent
entity. If a crawler thread is connected to an image
processing server thread, and that image processing server
machine crashes (e.g. runs out of memory), then only that
crawler thread will be in jeopardy. Crawler threads connected
to image processing servers on different machines
will not be affected. Additionally, if a crawler thread notices
that the connection to its image processing server is
broken, the crawler thread will first attempt to reconnect
to the same image processing server and, if that fails, it
will attempt a connection to a different image processing
server.
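The reconnection policy in the last paragraph amounts to a short failover loop. The sketch below is hypothetical: the server names and the `connect` callable are placeholders, and a real crawler thread might pass something like `socket.create_connection` with (host, port) pairs instead of the fake function used here.

```python
def reconnect(connect, current, servers):
    """Try the current server first, then each other known server in turn."""
    candidates = [current] + [s for s in servers if s != current]
    for server in candidates:
        try:
            return server, connect(server)
        except OSError:
            continue  # this server is unreachable; try the next one
    raise ConnectionError("no image processing server reachable")


def fake_connect(server):
    """Stand-in for a real connection attempt; 'ips-1' is simulated as down."""
    if server == "ips-1":
        raise ConnectionRefusedError(server)
    return f"connection to {server}"


server, connection = reconnect(fake_connect, "ips-1", ["ips-1", "ips-2", "ips-3"])
print(server, connection)
```

Because each thread runs this loop independently, a crash of one image processing server only interrupts the threads that were attached to it.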
7 Scaling Up
There are an estimated 55 to 60 million Web pages
(HTML documents). Our preliminary experiments indicate
there may be about one-half as many unique images
as Web pages, meaning about 30 million images to index.
Rough calculations suggest that to index the entire
Web, WebSeer’s database would be about 5 GB, and storing
thumbnails would take up about 45 GB of disk space.
Crawling the Web to index all the images will require
downloading them all. Our current multi-threaded Web
crawler can download many pages per second, running on a
200MHz Pentium Pro PC attached to a dual T1 line shared
with the rest of the University. The commercial database
has been the bottleneck up to this point, though as we optimize
its operation, transfer some of the data to custom data
structures, and increase the capabilities of its host, we may
find that connectivity to the Internet is the bottleneck. The
cost of image analysis has not been a limiting factor, because
we are careful to limit expensive processing such as
face-finding to only restricted areas of a small fraction of
the images, and because the image processing can be distributed
over many processors and machines (we are currently
employing a Sun Sparc 20 with four 125 MHz HyperSparc
processors, and an IBM RS/6000 G30 with four
75 MHz PowerPC 601 processors).
8 Future Work
Immediate research goals include the improvement of
the photo/drawing classifier, and the face-finder, and extending
the image duplicate detector to detect images that
are almost identical (e.g. transformed via cropping or by
adding a border).
We are currently working on improving
the photo/drawing classification in two directions: Finding
more tests, and handling mixed images. We are considering
various new tests, including looking at the shapes of
connected regions of the same color (they are expected to
be regular shapes in drawings), looking at the color transitions
(transitions from pure red to pure green, for example,
are expected in drawings but not in photographs) and looking
for straight lines. We are also considering various ways
of splitting an image into appropriate regions, which can
then be classified as photographs or drawings independent
of each other. This would allow us to classify many mixed
images in a meaningful way, like mainly photograph, or
mainly drawing.
We view the photograph vs. drawing distinction as
a starting point toward a more general image taxonomy.
We are working on identifying a taxonomy that fits users’
needs and is constructed of image classes that can be
reliably identified. Some of these categories may include
advertisements, geographic maps, cartoons, graphs,
landscapes, city/country scenes [4], night scenes, sunsets,
scenes with foliage, and so on.
We believe that we need a close interaction between the
image understanding algorithms and the associated text indexing
algorithms in order to successfully categorize images.
Srihari’s theory of visual semantics [8] provides useful
insight into some of the challenges of integrating text
indexing with image understanding algorithms. A closer
interaction between text and image processing will also
provide significant improvements in indexing speed. Recognizing
faces in an image is an example of an image
understanding algorithm which requires a relatively large
amount of processing time. By only running those algorithms
or models on images whose results are likely to be
useful, we believe that we can save significant processing
time per image.
As more functionality is added to WebSeer, the interface
should get simpler, not more complicated. The selections
of some image content combination boxes, for example,
could be assumed from the text of the query that the user
enters. Consider the query Bill and Hillary Clinton. If Bill
and Hillary can be recognized as first names, this information
can be used to direct the search towards images in
which two people were detected.
One of the goals of WebSeer is to encourage work
within other groups on problems posed by image search
engine technology. One of our first projects along these
lines will be to sponsor a competition for the face detector.
We will provide training data, ground truth, and evaluation.
The winning implementation will become part of WebSeer.
The approach and a considerable fraction of technology
from WebSeer should be applicable to other multi-media
databases containing structured text and images, including
newspaper or magazine archives, CD-ROMs, and video
annotated by closed captions.
References
[1] Y. Amit, D. Geman, and K. Wilder. Recognizing
shapes from simple queries about geometry. 1995.
[2] M. Flickner, H. Sawhney, W. Niblack, J. Ashley,
Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee,
D. Petkovic, D. Steele, and P. Yonker. Query by image
and video content: The QBIC system. Computer,
28:23–32, 1995.
[3] V. Ogle and M. Stonebraker. Chabot: Retrieval from
a relational database of images. IEEE Computer,
28(9):40–48, September 1995.
[4] R.W. Picard and T. P. Minka. Vision texture for annotation.
Journal of Multimedia Systems, 3:3–14, 1995.
[5] H. A. Rowley, S. Baluja, and T. Kanade. Neural
network-based face detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition, pages 203–208, 1996.
[6] Y. Rubner and C. Tomasi. Coalescing texture descriptors.
In Proceedings of the ARPA Image Understanding
Workshop, pages 927–936, 1996.
[7] J. R. Smith and S. Chang. Searching for images and
videos on the World-Wide Web. Technical report, Center
for Image Technology for New Media, August
1996.
[8] R. Srihari. Linguistic context in vision. In Proceedings
of the Workshop on Context-Based Vision, 1995.
[9] R. K. Srihari. Automatic indexing and content-based
retrieval of captioned images. IEEE Computer,
28(9):49–56, 1995.
[10] D. A. White and R. Jain. Algorithms and strategies
for similarity retrieval. Technical Report VCL-
96-101, Visual Computing Laboratory, University of
California, San Diego, 1996.
[11] J. Yang and A. Waibel. Tracking human faces in real-time.
Technical Report CMU-CS-95-210, School
of Computer Science, Carnegie Mellon University,
1995.

 

 
