Content-Based Retrieval Concept (information science)


Introduction

Because of the demand for efficient management of images, much attention has been paid to image retrieval over the past few years. Traditional search engines commonly use text-based image retrieval (Ratha et al., 1996), in which a query is represented by keywords that are usually identified and classified by human beings. Since people understand a particular image differently, consistency is difficult to maintain. As a database grows larger, describing and classifying its images becomes arduous, because most images are complicated and contain many different objects. There has therefore been a trend towards content-based retrieval systems, which try to retrieve images directly and automatically based on their visual content.

A similarity-based image retrieval system extracts the content of a query example q and compares it with that of each database image during querying. The answer to this query may be one or more images that are the most similar to q. Similarity retrieval works effectively even when the user cannot express a query precisely; in this case, it is no longer necessary to retrieve an image exactly matching the query example. Hence, similarity retrieval has more practical applications than exact matching does.

Content-Based Image Retrieval Systems

In a typical content-based image retrieval system, a query is specified by example: a sample image or sketch is provided. The system then extracts appropriate visual features that describe the image, and matches these features against the features of the images stored in the database. This type of query is easy to express and formulate, since the user does not need to be familiar with the syntax of any special-purpose image query language. The main advantage is that the retrieval process can be implemented automatically (Chen, 2001). The scope of this article is circumscribed to image abstraction and retrieval based on image content.

Human beings have a unique ability to recognize complex features in an image by utilizing the attributes of shape, texture, color, and spatial information. Many researchers analyze the color, texture, object shape, and spatial attributes of images and use them as image features. Therefore, one of the most important challenges in building an image retrieval system is the choice and representation of the visual attributes. A brief overview of the commonly used visual attributes (shape, texture, color, and spatial relationship) follows.

Commonly Used Image Features in Content-Based Image Retrieval Systems

Shape characterizes the contour of an object and identifies the object in a meaningful form (Gevers & Smeulders, 2000; Zhang & Lu, 2002). Traditionally, shapes are described through a set of features such as area, axis orientation, certain characteristic points, and so forth. These systems retrieve a subset of images that satisfy certain shape constraints. In shape retrieval, the degree of similarity between two images is taken as the distance between the corresponding feature points.

The color attribute may simplify object identification and extraction in image retrieval (Galdino & Borges, 2000; Gevers & Smeulders, 2000). Color provides multiple measurements at a single pixel of the image, and often enables classification without complex spatial decision-making. Any difference between colors is then evaluated as a distance between the corresponding color points: the color-based retrieval system measures the similarity of two images by their distance in color space.

The texture attribute depicts the surface of an image object (Yao & Chen, 2002; Zhang & Tan, 2003). Intuitively, the term refers to properties such as smoothness, coarseness, and regularity of an image object. Generally, this structural homogeneity does not come from the presence of a single color or intensity, but requires the interaction of various intensities within a region.

Retrieval by spatial constraints facilitates a class of queries based on the 2-D arrangement of objects in an image (Chang, Erland & Li, 1989; Chang & Li, 1988; Chang, Shi & Yan, 1987; Lee & Hsu, 1992). The query is composed by placing sketches, symbols, or icons on a plane, where every symbol or icon is predefined for one type of object in an image. The relationships between the objects can be broadly classified as either directional (also referred to as projective) (Chang & Li, 1988; Chang, Shi & Yan, 1987) or topological (Lee & Hsu, 1992). Directional relationships are based on the relative location and the metric distance between two image objects. Topological relationships are based on set-theoretical concepts like union, intersection, disjunction, and so forth. Spatial information is a higher-level, more specific attribute. For example, facial features are frequently presented in terms of spatial information (Sadeghi, Kittler & Messer, 2001).
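The two relationship classes above can be made concrete with a small sketch. The function names and the summarization of each object by a centroid and a bounding box are illustrative assumptions, not constructs from the cited systems: a directional relation is read off the relative centroid positions, and a topological relation off the set-theoretic interaction of bounding boxes.

```python
# Illustrative sketch (hypothetical helper names): classifying the spatial
# relationship between two image objects, each summarized by its centroid
# and its axis-aligned bounding box (x0, y0, x1, y1).

def directional(a_centroid, b_centroid):
    """Directional (projective) relation of b relative to a, from centroid offsets."""
    ax, ay = a_centroid
    bx, by = b_centroid
    horiz = "east" if bx > ax else "west" if bx < ax else ""
    vert = "south" if by > ay else "north" if by < ay else ""  # y grows downward
    return (vert + "-" + horiz).strip("-") or "coincident"

def topological(a_box, b_box):
    """Topological relation from bounding boxes: disjoint, contains, or overlap."""
    ax0, ay0, ax1, ay1 = a_box
    bx0, by0, bx1, by1 = b_box
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax0 <= bx0 and ay0 <= by0 and bx1 <= ax1 and by1 <= ay1:
        return "contains"
    return "overlap"
```

A spatial query such as "a tree to the west of a house" can then be checked by evaluating such predicates over the objects extracted from each database image.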

Briefly, the color attribute depicts the visual appearance of an image, characterized by the luminance and chrominance histograms of the image. The texture attribute refers to three components: bi-dimensional periodicity, mono-dimensional orientation, and complexity, obtained through Wold decomposition. The shape attribute sketches the geometrical properties of objects in images. The spatial attribute represents the relative position relationships between objects of an image.

Typical Image Retrieval Systems

This section briefly overviews the image retrieval systems based on the most commonly used image features: color, shape, texture, and spatial content.

The Color-Based Image Retrieval Systems

Generally, the color-based image retrieval system does not find the images whose colors are exactly matched, but images with similar pixel color information. This approach has proven very successful in retrieving images, since the concept of the color-based similarity measure is simple and the conventional algorithms are very easy to implement. Besides, this feature is robust to noise and rotation in images.
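The conventional algorithm alluded to here can be sketched as a quantized color histogram compared by an L1 distance; the function names and the choice of 4 bins per channel are illustrative assumptions, not a prescription from the literature cited above.

```python
# Illustrative sketch: similarity of two images as the distance between
# their quantized RGB color histograms. Bin counts and function names
# are assumptions for the example.

def color_histogram(pixels, bins=4):
    """Quantize each (r, g, b) pixel into one of bins**3 cells; return a
    normalized histogram (a point in color space)."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    n = float(len(pixels))
    return [h / n for h in hist]

def histogram_distance(h1, h2):
    """L1 distance between histograms: 0 for identical color distributions,
    up to 2 for completely disjoint ones."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Because the histogram discards pixel positions, a rotated copy of an image yields an identical histogram, which is exactly the rotation robustness noted above; by the same token, the histogram carries no local structure.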

However, this feature can only be used to take the global characteristics into account rather than the local one in an image, such as the color difference between neighboring objects in an image. For example, if a landscape image with blue sky on the top and green countryside at the bottom is employed as a query example, the system that retrieves the images with similar structures based on these global features often gives very unsatisfactory results. In addition, the color-based image retrieval system often fails to retrieve the images that are taken from the same scene in which the query example is also taken from under different time or conditions, for example, the images of a countryside taken at dusk or dawn under a clear or a cloudy sky. In another scenario, the same scene may be imaged by different devices. Using one image taken by one device as the query example may fail to find the same scene taken by other devices.

The Shape-Based Image Retrieval Systems

A shape-based image retrieval system is used to search for images containing objects that are similar to the objects specified by a query. Since an object can in most cases be formed from a set of shapes (e.g., a car can be made of some small rectangles and circles), most similar objects have a high correlation in their set of shapes (Gevers & Smeulders, 2000; Zhang & Lu, 2002). The shape-based image retrieval system extracts the shapes of objects from images by segmentation and classifies the shapes, where each shape has its own representation that is invariant to scaling, rotation, and translation.

A well-performing content-based image retrieval system should meet several criteria on shape representation and similarity measure. Firstly, the representation of a shape should be invariant to scale, translation, and rotation. Secondly, the similarity measure between shape representations should conform to human perception; that is, perceptually similar shapes should have highly similar measures. Thirdly, the shape representation should be compact and easy to derive, and the calculation of the similarity measure should be efficient.
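The first criterion can be illustrated with a moment-based descriptor. The sketch below computes Hu's first moment invariant, phi1 = eta20 + eta02, on a set of points sampled from a filled object region; it is exactly invariant to translation and rotation, and approximately invariant to scale for densely sampled regions. Treating the object as a plain point set is a simplifying assumption for the example.

```python
# Illustrative sketch: Hu's first moment invariant phi1 = eta20 + eta02
# on a set of (x, y) points sampled from a filled object region.
import math

def phi1(points):
    n = len(points)
    cx = sum(x for x, _ in points) / n   # centroid: gives translation invariance
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)   # second central moments
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu00 = float(n)                      # zeroth moment of the point set
    eta20 = mu20 / mu00 ** 2             # normalization: gives scale invariance
    eta02 = mu02 / mu00 ** 2             # for densely sampled filled regions
    return eta20 + eta02                 # trace term: rotation invariant
```

Since mu20 + mu02 is the trace of the covariance of the point set, it is unchanged by rotation, which is why phi1 satisfies the rotation part of the criterion without any contour alignment step.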

However, locating and recognizing objects in images is a real challenge. One of the obstacles is separating the objects from the background. Difficulties come from discrimination, occlusions, poor contrast, viewing conditions, noise, complicated objects, complicated backgrounds, and so forth. Moreover, the shape-based image retrieval system can only deal with images that have simple object shapes. For complex object shapes, the region-based method has to build a binary sequence using smaller grid cells so that more accurate results can be obtained; nevertheless, the storage of indices and the retrieval time may increase tremendously.

The Texture-Based Image Retrieval Systems

Literally, texture relates to the arrangement of the basic constituents of a material. In digital images, texture describes the spatial interrelationships of the image pixels. Texture similarity can often be useful in distinguishing areas of objects in images with similar color, such as sky and sea, or leaves and grass. Texture queries can be formulated in a manner similar to color queries, by selecting an example of the desired texture from a palette or by supplying an example query image. The system then returns the images that are most similar to the query example in texture measures.
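One common family of texture measures summarizes the co-occurrence of gray levels at a fixed pixel offset. The sketch below is a simplified version of the gray-level co-occurrence contrast statistic: it is low for smooth regions and high for coarse ones, so two images (or segments) can be compared by the distance between such measures. The function name and the single-offset simplification are assumptions for the example.

```python
# Illustrative sketch: a gray-level co-occurrence "contrast" texture measure
# over a 2-D grayscale image (list of rows of intensities), for one offset.

def glcm_contrast(image, dx=1, dy=0):
    """Mean squared intensity difference between pixels at offset (dx, dy):
    0 for a perfectly smooth region, large for coarse or high-frequency texture."""
    rows, cols = len(image), len(image[0])
    total, pairs = 0.0, 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                total += (image[y][x] - image[ny][nx]) ** 2
                pairs += 1
    return total / pairs
```

A retrieval system would typically compute several such statistics at several offsets and orientations, forming a feature vector per image or per segmented texture region.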

Texture analysis is a real challenge. One way to perform content-based image retrieval using texture as the cue is to segment an image into a number of different texture regions and then run a texture analysis algorithm on each segment. However, segmentation can sometimes be problematic for image retrieval. In addition, texture is quite difficult to describe and is subject to differences in human perception. No satisfactory quantitative definition of texture exists at this time.

The Spatial-Based Image Retrieval Systems

There are two kinds of spatial-based image retrieval systems: retrieval by spatial relationships (RSR) and spatial access methods (SAMs). The RSR image retrieval system retrieves images from a database that are similar to the query sample based on the relative position relationships between the objects in the images. A physical image can thus be regarded as a symbolic image, in which each object is attached to a symbolic name; the centroid coordinates of each object with reference to the image frame are extracted as well. By searching the symbolic images, the corresponding physical images can then be retrieved and displayed. Image retrieval is thereby simplified to a search over symbolic images.

Chang, Shi, and Yan (1987) used a 2D string representation to describe a symbolic image. Objects and their spatial relationships in a symbolic image can be characterized by a 2D string, and an image query can be specified as a 2D string too. Consequently, the problem of image retrieval turns into the problem of 2D string matching. Subsequently, a great number of other image representations derived from the 2D string appeared, such as the 2D G-string (Chang, Erland & Li, 1989), 2D B-string (Lee, Yang & Chen, 1992), 2D C-string (Lee & Hsu, 1992), and so forth. These representations adopt the description of orthogonal projection to delineate the spatial relationships between objects.
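The idea can be sketched as follows: each symbolic image is reduced to the order of its object names along the x-axis and along the y-axis, and a query matches an image when both of its orderings appear as subsequences of the image's orderings. This is a simplified stand-in for true 2D string matching, which also encodes relational operators between symbols; the function names are assumptions for the example.

```python
# Illustrative sketch: a simplified 2D-string-style representation of a
# symbolic image, given as a list of (name, x, y) object centroids, and
# query matching by subsequence containment on both axis projections.

def two_d_string(objects):
    """Project object names onto the x-axis and y-axis orderings."""
    by_x = [name for name, x, _ in sorted(objects, key=lambda o: o[1])]
    by_y = [name for name, _, y in sorted(objects, key=lambda o: o[2])]
    return by_x, by_y

def is_subsequence(query, target):
    """True if all symbols of query occur in target in the same order."""
    it = iter(target)
    return all(sym in it for sym in query)

def matches(query_objects, image_objects):
    qx, qy = two_d_string(query_objects)
    tx, ty = two_d_string(image_objects)
    return is_subsequence(qx, tx) and is_subsequence(qy, ty)
```

Matching thus never touches the physical images at query time, which is the point of the symbolic representation: only short strings are compared.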

The SAM image retrieval systems manage large collections of points (or rectangles or other geometric objects) in main memory or on disk so that range queries can be answered efficiently. A range query specifies a region in the address space, requesting all the data objects that intersect it. SAMs divide the whole space into several disjoint sub-regions, each holding no more than P points (a point may represent a rectangle), where P is usually the capacity of a disk page. Inserting a new point may cause a region to split further. The split methods can be classified according to the attributes of the split (Gottschalk, Turney & Mudge, 1987).
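The insert-until-overflow-then-split behavior can be sketched with a small bucket quadtree, one of many possible SAMs; the class name, the choice of a quadrant split, and the in-memory representation are assumptions for the example rather than the specific method of the cited work.

```python
# Illustrative sketch: a bucket point quadtree in the spirit of SAMs.
# Each region holds at most P points and splits into four sub-regions on
# overflow; a range query visits only regions intersecting the rectangle.

class Region:
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity      # P: max points per region (disk-page size)
        self.points = []
        self.children = None

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
        elif len(self.points) < self.capacity:
            self.points.append(p)
        else:                         # overflow: split region, redistribute points
            self._split()
            for q in self.points + [p]:
                self._child_for(q).insert(q)
            self.points = []

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [Region(x0, y0, mx, my, self.capacity),
                         Region(mx, y0, x1, my, self.capacity),
                         Region(x0, my, mx, y1, self.capacity),
                         Region(mx, my, x1, y1, self.capacity)]

    def _child_for(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def range_query(self, qx0, qy0, qx1, qy1):
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or x1 < qx0 or qy1 < y0 or y1 < qy0:
            return []                 # region disjoint from query: prune subtree
        hits = [p for p in self.points
                if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        if self.children is not None:
            for c in self.children:
                hits += c.range_query(qx0, qy0, qx1, qy1)
        return hits
```

In an image retrieval setting, the stored points would typically be low-dimensional feature vectors or object centroids, and the pruning at disjoint regions is what keeps range queries efficient on large collections.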

The color attribute is the most intuitive and straightforward for the user. Texture analysis systems are often developed to perform filtering in the transform domain in order to obtain feature images; this approach focuses on the global frequency content of an image, whereas many applications require the analysis to be localized in the spatial domain. More generally, the use of global color or texture features for the retrieval of images tends to be misleading, especially in homogeneous image collections. Although shape and texture are essential visual attributes for deriving potentially useful semantic information, their benefits for efficient image retrieval are less well understood than those of color.

Future Trends

Many visual attributes have been explored, such as color, shape, texture, and spatial features. For each feature, there exist multiple representations that model the human perception of the feature from different perspectives. There is a demand for developing an image content description to organize the features. The features should not only be associated with the images, but also be invoked at the right place and the right time, whenever they are needed to assist retrieval.

Human beings tend to apply high-level concepts in their daily lives. However, most features that current computer vision techniques automatically extract from images are low-level. To narrow this semantic gap, it is necessary to link the low-level features to high-level concepts. On the high-level side, the system should allow the user to easily feed his or her evaluation of the current retrieval results back to the computer. Different people are unlikely to give an identical name to the same object; therefore, generating the representation by automatically extracting the objects from an original image is very difficult. Consequently, with current techniques of image understanding and recognition, the spatial relationships between objects cannot be extracted automatically without human interaction. More recent research therefore emphasizes "interactive systems" and the "human in the loop".

Due to the subjectivity of image perception, it is difficult to define a good criterion for measuring the similarity among images; that subjectivity prevents us from defining objective evaluation criteria. Hence, it is urgent to find an appropriate way of evaluating system performance that guides the research effort in the correct direction.

Establishing a well-balanced large-scale test bed is an important task too. A good test bed must be huge in scale for testing the scalability (for multidimensional indexing), and be balanced in image content for testing image feature effectiveness and overall system performance.

Human beings are the ultimate end users of the image retrieval system. This topic has attracted increasing attention in recent years, aiming at exploring how humans perceive image content and how such a "human model" can be integrated into image retrieval systems. Recently, more studies of human perception have focused on the psychophysical aspects of human perception.

Conclusion

A successful image retrieval system requires the seamless integration of the efforts of multiple research communities. To achieve fast retrieval and make the system truly scalable to large image collections, an effective multidimensional indexing module is an indispensable part of the whole system. The interface collects information from the users and displays the retrieval results back to them in a meaningful way. To communicate with the user in a friendly manner, the query interface should be graphics-based.

In the iconic image representation, an icon is used to represent an object in an image. The iconic image representation has two advantages. First, once the images in the database are analyzed, it is not necessary to analyze an image again during query processing. Second, since the size of a symbolic image is much smaller than that of the original image, this representation is well suited to distributed database environments where a large number of image transmissions between distant nodes are required.

Typically, the color, shape, and texture attributes provide a global description of images, but they fail to consider the meaning of the portrayed objects and the semantics of scenes. The descriptions of objects and the relative positions among objects provide a spatial configuration and a logical representation of images. These approaches to content-based image retrieval can therefore be considered complementary, and combining them is a natural direction.

KEY TERMS

Color Feature: Analyzing the color distribution of pixels in an image.

Geometric Hashing: A technique for identifying an object in a scene, together with its position and orientation.

Query by Example: An image retrieval paradigm in which a sample image or sketch is provided as the query.

Shape Feature: Characterizing the contour of an object that identifies the object in a meaningful form.

Spatial Feature: Symbolizing the arrangement of objects within the image.

Symbolic Image: Consisting of a set of objects; each object stands for an entity in a real image.

Texture Feature: Depicting the surface of an image object.