
Analyse factorielle des correspondances pour lindexation et la recherche dinformation dans une grande base de données dimages


DOCUMENT INFORMATION

Basic information

Title: Analyse Factorielle Des Correspondances Pour L'Indexation Et La Recherche D'Information Dans Une Grande Base De Données D'Images
Author: Nguyen-Khang Pham
Supervisors: Israël-César LERMAN, Professeur Émérite; Djamel Abdelkader ZIGHED, Professeur; Georges QUÉNOT, Chargé de Recherche CNRS; Annie MORIN, Maître de Conférences; Patrick GROS, Directeur de Recherche INRIA; Quyet-Thang LÊ, Maître de Conférences
University: Université de Rennes 1
Field: Informatique
Document type: thèse (PhD thesis)
Year: 2009
City: Rennes

Format
Pages: 156
File size: 2.87 MB

Structure

  • 1.1 Introduction (11)
  • 1.2 Description of the visual content (12)
    • 1.2.1 Extraction of visual features (13)
    • 1.2.2 Local description of images (17)
    • 1.2.3 Formation of image signatures (19)
    • 1.2.4 Discussion (21)
  • 1.3 Image similarity (22)
    • 1.3.1 Vector signatures (22)
    • 1.3.2 Set-of-vectors signatures (24)
    • 1.3.3 Signatures summarizing local descriptors (25)
  • 1.4 Methods inspired by textual data analysis (27)
    • 1.4.1 LSA (27)
    • 1.4.2 PLSA and LDA (28)
  • 1.5 Human-machine interaction (29)
    • 1.5.1 Query specification (29)
    • 1.5.2 Visualization (30)
    • 1.5.3 Relevance feedback (31)
  • 1.6 Evaluation metrics (31)
  • 1.7 Summary (32)
  • 2.1 Image retrieval by signature (35)
  • 2.2 Multi-dimensional indexing (36)
    • 2.2.1 Structure (36)
    • 2.2.2 Search algorithm (37)
    • 2.2.3 Data partitioning (38)
    • 2.2.4 Space partitioning (39)
    • 2.2.5 Space-filling curves (40)
    • 2.2.6 Pyramid-tree (41)
  • 2.3 Accelerated sequential search (42)
    • 2.3.1 VA-file (42)
    • 2.3.2 IQ-tree, GC-tree, and PRC-tree (43)
    • 2.3.3 Inverted file (43)
  • 2.4 Approximate search (44)
    • 2.4.1 Approximation of the data representation (45)
    • 2.4.2 Approximation of the search technique (45)
    • 2.4.3 Clustering and classification (46)
    • 2.4.4 Techniques based on the MEDRANK algorithm (47)
  • 2.5 Summary (49)
  • 3.1 Introduction (51)
  • 3.2 Factorial correspondence analysis (AFC) (51)
    • 3.2.1 Principle (52)
    • 3.2.2 General scheme of factorial correspondence analysis (56)
    • 3.2.3 Implementation of the computations (62)
  • 3.3 Example (64)
  • 3.4 Adapting AFC for image retrieval (66)
    • 3.4.1 Construction of the visual words (66)
    • 3.4.2 Construction of the contingency table (68)
    • 3.4.3 Improving image retrieval with AFC (68)
  • 3.5 Accelerating image retrieval with AFC (70)
    • 3.5.1 AFC indicators (71)
    • 3.5.2 Inverted files (72)
    • 3.5.3 Search algorithm (74)
  • 3.6 Experiments (77)
    • 3.6.1 Image databases (77)
    • 3.6.2 Construction of the visual words (78)
    • 3.6.3 Contingency tables (79)
    • 3.6.4 Baseline methods (79)
    • 3.6.5 Similarity/dissimilarity measures (80)
    • 3.6.6 Evaluation measures (80)
    • 3.6.7 Statistical tests (81)
    • 3.6.8 Exhaustive search (81)
    • 3.6.9 Approximate search using inverted files (88)
  • 3.7 Summary (90)
  • 4.1 AFC for large contingency tables (95)
    • 4.1.1 Incremental AFC (95)
    • 4.1.2 Parallel incremental AFC (98)
    • 4.1.4 Experiments (101)
  • 4.2 Combining AFC with other methods (102)
    • 4.2.1 Contextual dissimilarity measure (102)
    • 4.2.2 Integrating the CDM into AFC-based approximate search (104)
    • 4.2.3 Random forest (104)
    • 4.2.4 Oblique trees (105)
    • 4.2.5 Image retrieval with random forests (107)
    • 4.2.6 Experiments (107)
  • 4.3 Summary (109)
  • 5.1 Introduction (111)
  • 5.2 Projection onto the factorial plane (112)
  • 5.3 Discovery of image themes (114)
    • 5.3.1 Image representation quality (115)
    • 5.3.2 Coupling of views (115)
    • 5.3.3 Theme discovery (116)
  • 5.4 Summary (118)
  • 1.4 Different types of image signatures, their mathematical formulation,
  • 2.1 General problem of the SS-tree: there is no partitioning without over-
  • 3.1 Simultaneous representation of documents and words on the same plane
  • 3.2 Interest points detected by a Hessian-Affine detector
  • 3.3 A SIFT descriptor computed from the region around an interest point (the circle): image gradient (left), and the interest-point descriptor (right)
  • 3.4 Projection of 4 categories (Faces, Motorbikes, Airplanes, and Cars) of the
  • 3.5 The sequence of eigenvalues of AFC on the Caltech-4 database
  • 3.6 Images taken from the Caltech-4 database
  • 3.7 Images taken from the Nistér-Stewénius database
  • 3.8 Precision-recall curves of the different methods with the similarity
  • 3.9 Precision-recall curves of AFC with different similarity measures
  • 4.1 Computation time of the filtering step and of the refinement step in
  • 4.2 Parallel computation of image frequencies on GPU: each thread
  • 5.1 Visualization of documents and words with the Bi-Qnomis tool
  • 5.2 Visualization of meta-keys by a hyperbolic tree
  • 5.3 Projection of images and visual words onto the factorial plane 12
  • 5.4 Simultaneous visualization of images and words
  • 5.5 Visualization of meta-keys
  • 5.6 Interactive information extraction: representation quality and contri-
  • 5.7 The visual words characterizing the face theme
  • 5.8 Information extraction to find good axes: two images well represented by the 2 axes 6 (red: negative) and 7 (green: positive)
  • 5.9 Information extraction to find good axes: two images well represented by the 2 axes 11 and 12
  • 5.10 Theme discovery by projecting onto the right factorial plane: a sub-
  • 5.11 Theme discovery by projecting onto the right factorial plane: another
  • 5.12 Theme discovery by projecting onto the right factorial plane: yet another

Content

Introduction

This chapter focuses on the principles of content-based image retrieval (RIC, from the French "recherche d'images par le contenu"), which involves searching a database for images that closely match a query image by automatically extracting visual information from the images. This task is complex for two main reasons.

Sensory gap. The sensory gap is the gap separating the real object from its digital representation, for example in the form of an image or of a description of that image.

Semantic gap. The semantic gap expresses the difference between the information extracted from the visual content of the images and the interpretation of that content by the user in a given context.

In a typical RIC system, the visual content of the images in the database is extracted and represented by image signatures, which form a signature database. To search for images, the user provides an example image known as a query, and the system represents this query by its signature. Similarity measures between the query signature and all the images in the database are computed and compared, typically producing a ranked list of similar images. However, as the database grows, examining all the images becomes prohibitive. To address this, most RIC systems employ an indexing scheme that makes the search more efficient by examining only a small portion of the database; Chapter 2 discusses indexing techniques. Additionally, some recent systems integrate user relevance feedback to refine the search and produce more meaningful results, enabling a progressive retrieval process through user interaction.

Figure 1.1 Diagram of a typical content-based image retrieval system. The user then indicates which documents are relevant to his need. The system can use this information either quantitatively (returning more documents similar to the relevant ones) or qualitatively (returning documents similar to the relevant documents before the others).
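The exhaustive query-by-example loop described above can be sketched as follows; the toy signatures, image names, and the choice of Euclidean distance are illustrative assumptions only, not the thesis's actual setup:

```python
import math

# Hypothetical toy signature database: each image is reduced to a small
# feature vector (e.g. a normalized color histogram).
database = {
    "img_01": [0.7, 0.2, 0.1],
    "img_02": [0.1, 0.8, 0.1],
    "img_03": [0.6, 0.3, 0.1],
}

def euclidean(x, y):
    """L2 distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def search(query, db, k=2):
    """Rank every database image by its distance to the query signature
    (an exhaustive scan) and return the k nearest ones."""
    ranked = sorted(db.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]

print(search([0.65, 0.25, 0.10], database))  # ['img_01', 'img_03']
```

This is exactly the scan that becomes prohibitive for large databases, motivating the indexing schemes of Chapter 2.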

A RIC system aims to address two key challenges: first, how to describe the visual content of an image mathematically, and second, how to assess the similarity between two images from their abstract descriptions.

Solutions to the first problem are presented in Section 1.2, which deals with the description of the visual content of the image. The second problem is detailed in Section 1.3. We then briefly describe the methods inspired by textual data analysis, before turning to human-machine interaction and the evaluation metrics.

Description of the visual content

Extraction of visual features

There is no universal or exact definition of what constitutes a feature. The exact definition often depends on the problem or on the type of application. We therefore adopt the following definition of features:

Definition 1.1 (Visual feature) A visual feature (or feature for short) of an image is defined as an abstraction of the visual information of the image that is relevant to computational tasks related to a given application (e.g., image classification, image retrieval).

Features are extracted either globally over an entire image, or locally over a small group of pixels (a region). The result of a (global or local) feature extraction step is called a feature descriptor.

Definition 1.2 (Feature descriptor) We call the mathematical description of an image, or of a local region of the image, after a feature extraction step, its feature descriptor (or descriptor for short).

Descriptors often take the form of a vector in a vector space, R^D, called the feature space.

In the case of global extraction, a single descriptor is obtained per image, whereas a local description yields a set of local descriptors for each image.

In this section we present some of the feature types most commonly used to compute descriptors: color, texture, shape, interest points, and spatial relations.

From the onset of content-based image retrieval (CBIR) research, color spaces have been investigated for how well they match the human visual system. While most images are represented in the RGB color space, this model is seldom used for indexing and searching images, as it does not align well with color perception. Alternative color spaces such as HSV (Hue, Saturation, Value) and CIE L*a*b* offer more effective solutions for image retrieval.

CIE L*u*v* (proposed by the Commission Internationale de l'Éclairage) correspond better to human perception (i.e., distances in the color space are similar to the differences humans perceive between colors) and are therefore more widely used.

Color moments have been used in several image retrieval systems such as QBIC [FSN

The use of color moments, particularly when an image contains a single object, has proven effective for representing color distributions. By using only nine moments (three per color component), this method offers a compact representation compared with other color features. However, because of this compactness, the discriminative power of color moments may be reduced.

First proposed in [SB91], color histograms have become primary descriptors for image content and have been used in QBIC [FSN

VisualSEEK and PicToSeek are both effective for small databases, but they face a problem when analyzing the color distributions of images: different images can yield similar histograms. This issue becomes more pronounced with larger databases. To address it, Pass et al. proposed joint histograms, which enhance the discrimination of color histograms by integrating additional features such as color, edge density, and texture. Moreover, traditional color histograms do not account for spatial information.

Pixels with the same color are often not similar, as they can belong to different objects or uniform regions. Based on this observation, Pass et al. propose a method to incorporate spatial information into color histograms, distinguishing between coherent (uniform color regions) and incoherent color vectors. For each class of pixels, a color histogram is generated. While coherent color vectors yield better results thanks to the inclusion of spatial information, there is no distinction among the pixels within the same class. Additionally, determining the class of a pixel requires several parameters that may vary across images.

Color correlograms are another approach to incorporating spatial information into color histograms. They characterize not only the pixel distributions but also the spatial correlation of color pairs. Compared with color histograms and coherent color vectors, color correlograms provide the best results, but their computational cost is higher.

Weighted color histograms enhance traditional color histograms and coherent color vectors by adaptively weighting each pixel's contribution. This weighting is based on a local measure of color non-uniformity computed in the pixel's neighborhood. Weighted color histograms strike a balance between the robustness and the complexity of color-based descriptors.

Texture is a crucial property of images, capturing the granularity and repetitive patterns of surfaces. For instance, grass, brick walls, and flower petals exhibit distinct textures. One of the earliest significant works on this topic is Haralick's descriptors for automatic texture classification. Texture plays an important role in various fields, including aerial and medical imaging.

Texture representations fall into two categories depending on the application domain: structural and statistical methods. Structural methods, including morphological operators and adjacency graphs, describe textures by identifying structural primitives and their placement rules, which makes them highly effective for regular textures. In contrast, statistical methods characterize textures through the statistical distribution of image intensity.

Tamura's descriptors [TMY78] are used in QBIC [FSN

In various studies [MP90, JF91, Uns95, MM96, LWW00, DV02], texture features have been extracted using techniques such as wavelet transforms, Fourier transforms, and Gabor filters. Markov random field-based descriptors have also been employed [MJ92, PS00].

Most of the previous texture descriptors assume that the images are captured under the same conditions. They are not invariant to affine transformations, such as rotation and scaling. If we consider the features

Figure 1.3 Pollen images in the pollen recognition application (source: http://www-sop.inria.fr/orion/ASTHMA)

one can cite [Low04, MS04, BTVG06], which use the detection of interest points in the image.

Shape is a crucial attribute of segmented objects or regions in an image. Unlike color and texture features, shape features are typically extracted after an image segmentation step. Because precise and robust segmentation is hard to achieve, and the definition of the relevant objects or regions is often ambiguous, the use of shape features is generally restricted to specific settings where the objects or regions are clearly defined.

[PF86, ASBH90, KSP95]) and region-based approaches (statistical moments

Recent developments in shape representation include the use of discrete curve evolution to simplify outlines, robust shape matching that accommodates certain geometric transformations, and a contour-based approach that represents shapes as sequences of concave and convex segments. In this context, shape similarity is evaluated with a dynamic programming algorithm.

Using the amplitude and phase of Fourier coefficients makes it possible to describe shapes. In this context, dynamic time warping, rather than the Euclidean distance, allows precise matching even in the presence of a limited phase shift.

Regions and objects with similar color and texture properties can be distinguished by introducing spatial constraints. For instance, a region of blue sky and a region of ocean may have similar color histograms, but their spatial locations differ. The spatial location of regions (or objects) and the relationships between the regions of an image are therefore valuable for image retrieval.

Chang et al. [CSY87], based on the theory of symbolic projections. Its variants and extensions are the 2DG-string [CJL88], the 2DC-string [LH90], the 2DB-string [LYC92], and the 2DBe-string [Wan03].

Moreover, other models capture the spatial relations between local features such as interest points, and represent object categories or visual classes by a combination of local descriptors and their spatial distributions.

Spatial relationship models between local entities range from completely independent models, where each feature describes a distinct region, to fully connected models such as constellation models. Intermediate models include star topologies and hierarchical structures, where entities depend spatially on their neighbors. Spatial information can also be integrated in a post-processing stage to refine the results.

Local description of images

Local descriptors can be extracted from an image by dividing it into regions, typically a simple partition into rectangles of equal size. However, this basic approach does not yield globally coherent regions at a finer resolution. Two common methods for identifying regions of interest are interest point detection, which identifies salient regions, and image segmentation, which divides the image into non-overlapping regions. Ideally, segmentation would isolate distinct objects, such as cars, horses, or balls, but current technical limitations make this difficult.

The interest points traditionally used for stereo vision are also used in image retrieval. They are determined in such a way that a point found in one image will also be found in another image that differs slightly from the first.

The significance of such special points comes from their compact representation of the important regions of the image, which leads to efficient indexing, and from their discriminative power, especially in object retrieval.

One of the earliest studies on the subject used a Harris detector to localize rotation-invariant interest points. It was later shown that descriptors cannot be invariant to scale changes if the extracted interest points are not themselves scale invariant. Consequently, several detectors have been proposed to achieve scale invariance of the interest points. Automatic scale selection is performed by choosing the extrema of a scale function, such as normalized Laplacians or differences of Gaussians. Another detector proposed in the literature also addresses this issue.

Salient points have also been extracted by exploring the components of wavelet transforms at various scales. To achieve invariance to affine transformations, Mikolajczyk et al. proposed an adapted Harris detector and an iterative algorithm for detecting interest points that are invariant to scale changes and affine transformations.

A discussion of the advantages and disadvantages of the different types of interest point detectors used in image retrieval can be found in [GB02]. Comparative evaluations of various interest point detectors are conducted in [SMB98, SMB00, MTS

To use interest points effectively, the region surrounding each point must be characterized. The characterization of an interest point is evaluated at a chosen scale in the surrounding area. Various descriptors have been proposed in the literature, including Shape Context, Steerable Filters, the Scale-Invariant Feature Transform (SIFT), PCA-SIFT, and the Gradient Location and Orientation Histogram

(GLOH) [MS05]. A comparative study evaluating the performance of descriptors is presented in [MS05]. Among these descriptors, the SIFT descriptor is the most

Region-based segmentation is the most commonly used approach for partial query paradigms. Segmenting an image is crucial for obtaining a region-based representation. The advantage of region segmentation in content-based image retrieval lies in its closeness to the detection of objects of a semantic nature. The ability to describe and characterize the shapes present in an image is essential. Most image segmentation techniques fall into two broad categories: edge-based approaches and region-based approaches. Edge-based methods rely on discontinuities to partition an image by detecting isolated points, lines, and edges, with the regions then inferred from their contours.

Methods based on this approach include local filters such as the Canny edge detector, as well as energy minimization models like active contours (snake models) and balloon models.

Algorithms of the second category analyze the homogeneity of certain characteristics such as intensity, color, and texture, using methods like thresholding, clustering, region growing, and split-and-merge segmentation. Each approach has its own advantages and disadvantages, and combining them can improve segmentation results. A significant advancement in segmentation is the graph-based approach, in which the segmentation problem is framed as a graph partitioning problem. In this setting, the set of vertices consists of the pixels, and the weight of an edge connecting two vertices (pixels) represents a measurable, non-negative dissimilarity between them. Segmentation then yields a partition of the vertex set into components, each component corresponding to a connected component of the graph.

The graph partitioning problem is known to be NP-complete.

NP-completeness poses challenges for image indexing and retrieval. Approximate methods have been proposed, such as the normalized cut criterion and partitioning based on a predicate measuring the evidence of a boundary between regions. Complete techniques for image indexing and retrieval through region segmentation are discussed in the literature.

Formation of image signatures

1 is the core of RIC systems. The signatures of an image are constructed from its visual features and are of two types: feature vectors and feature distributions (cf. Figure 1.2). As we will present in detail later, histograms and region-based signatures can be considered as sets of weighted vectors. When the sum of the weights equals 1, these sets are equivalent to discrete distributions (discrete in the sense that the supports are finite).

1 In the literature, the two terms signature and descriptor are used interchangeably. By convention, we use the term signature for the description of a whole image, while the term descriptor denotes the description of a certain part of the image (cf. local extraction). In the case of global extraction, the signature of an image is its computed descriptor.

Global extraction often describes images through feature vectors, while local extraction produces a set of local descriptors viewed as a distribution. Each local descriptor is a vector characterizing the neighborhood of a specific pixel (interest point) or a segmented region. Direct use of the whole set consists in representing an image by its local descriptors. With this representation, searching for a query image in a database requires an individual search for each local descriptor of the query image, with the results merged by a voting technique.

We now present the construction of image signatures based on summaries of local descriptors, and the relation between histogram-type summaries and region-based signatures. Let the local descriptors be s_{i,j} ∈ R^D, where 1 ≤ i ≤ M (the number of images) and 1 ≤ j ≤ n_i (the number of local descriptors of image i). To construct a basic histogram, the space R^D is divided into a fixed number of cells, and the percentage of descriptors s_{i,j} falling in each cell is computed. Assuming there are k cells, the histogram is treated as a k-dimensional vector [f_1, f_2, ..., f_k], where f_l is the frequency of descriptors in cell l.

This is exactly a vector quantization of the local descriptors. The cells are treated as symbolic variables. In [SZ03b, WAC

Visual words are analogous to the textual words of texts, allowing text-processing techniques to be applied to images represented by visual-word descriptors. These descriptors can be modeled using vector models with weighting, or simpler histogram models. A key advantage of this approach is that images are represented by simple vectors, as in global feature extraction.
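A minimal sketch of the vector-quantization step just described, assuming the vocabulary of visual words (cell centers) has already been learned, e.g. by k-means over all local descriptors; all values here are toy data:

```python
# Toy bag-of-visual-words construction: assign each local descriptor to
# its nearest visual word and return the frequency vector [f_1, ..., f_k].

def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cell center) closest to the descriptor."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: d2(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Vector-quantize the local descriptors and return the fraction of
    descriptors falling in each cell."""
    counts = [0] * len(vocabulary)
    for s in descriptors:
        counts[nearest_word(s, vocabulary)] += 1
    n = len(descriptors)
    return [c / n for c in counts]

vocab = [(0.0, 0.0), (1.0, 1.0)]                          # k = 2 visual words
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]  # 4 local descriptors
print(bow_histogram(descs, vocab))                        # [0.5, 0.5]
```

The resulting frequency vector is the simple fixed-dimension representation referred to above.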

Such a representation simplifies subsequent tasks such as analysis and image retrieval. However, this processing of histograms as frequency vectors does not account for the localization of the cells. For certain proximity measures between two distributions, such as the Earth Mover's Distance or the Mallows distance, the localization of the cells must be taken into account. When these measures are applied, a histogram is mathematically defined as a collection of pairs {(c_1, f_1), (c_2, f_2), ..., (c_k, f_k)}, also known as a weighted vector set, where c_l ∈ R^D is the center of cell l. A histogram is thus a specific type of distribution in that it is discrete.

When a histogram is represented as {(c_l, f_l)} for l = 1, 2, ..., k, an adaptive extension of the histogram can be implemented by adjusting the values of c_l and f_l, allowing the number of cells, k, to vary with the image. This is the basis of region-based signatures, as used in [DMK

Let S_i = {s_{i,1}, s_{i,2}, ..., s_{i,n_i}} be the set of local descriptors of image i. By applying a clustering algorithm, such as k-means, to the set S_i, the descriptors are grouped into k_i clusters. If c_{i,l} denotes the centroid of cluster l, the signature of image i is {(c_{i,1}, f_{i,1}), ..., (c_{i,k_i}, f_{i,k_i})}. The number of clusters k_i varies from one image to another.
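The adaptive signature above can be sketched with a plain Lloyd's k-means over one image's descriptors; the descriptor values and the fixed k are illustrative assumptions (the thesis lets k_i vary per image):

```python
import random

# Region-type signature sketch: cluster the local descriptors of one image
# and return {(c_l, f_l)}: centroids plus the fraction of descriptors per cluster.

def kmeans_signature(descriptors, k, iters=25, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(descriptors, k)   # initial centroids: random descriptors

    def nearest(p):
        return min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in descriptors:
            clusters[nearest(p)].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(dim) / len(cl) for dim in zip(*cl))

    weights = [0.0] * k
    for p in descriptors:
        weights[nearest(p)] += 1.0 / len(descriptors)
    return list(zip(centroids, weights))

descs = [(0.0, 0.1), (0.1, 0.0), (9.9, 10.0), (10.1, 9.9)]
sig = kmeans_signature(descs, k=2)
print(sig)  # two centroids, near (0.05, 0.05) and (10.0, 9.95), each with weight 0.5
```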

Note that the distributions extracted from a set of local descriptors can take other forms, for example a continuous probability density [DV02] or a spatial stochastic model [LW04].

Discussion

Various methods for extracting visual features and constructing image signatures have been proposed. While global features are suitable for describing the overall content of an entire image, local features are better at representing finer details. The choice of an appropriate representation therefore depends on the scale of the main content. In this regard, a hybrid representation could be an interesting option.

Over the years, there has been a shift from global representations, such as color histograms and global shape descriptors, to local representations that use local features and descriptors such as interest points, region-based features, spatial models, and local shape characterization. Local features often correspond to the most significant components of an image, enabling a direct association of semantics with the components (regions, objects) of the image. Although segmentation aims to recognize the objects in an image, it remains an open problem. Consequently, alternative approaches such as interest points have proven more effective. When local descriptors are used to describe image content, their distributions can be exploited either discretely, through vector quantization or region-based representation, or continuously, via probability densities or more complex spatial stochastic models. While vector quantization applies a fixed set of cells to all images without considering their localization, treating them as symbolic variables (visual words), region-based representation generates cells adaptively for each image.

Spatial information is either integrated in the signature construction phase to form complex representations, or used in a post-processing step that verifies the coherence of the matching of local descriptors.

In summary, reducing the sensory gap in tandem with the semantic gap must continue

Image similarity

Vector signatures

When images are represented by simple feature vectors, the most widely used dissimilarity measure is undoubtedly the Euclidean distance 2 (or norm

2 We speak of similarity measures in the general case, and more specifically of dissimilarity measures in the sense that the measure is zero for two identical objects and increases as the

Figure 1.4 Different types of image signatures, their mathematical formulation, the distances used, and the techniques related to the formulation of signatures and to the computation of distances (region-based signatures, bag-of-words, probability densities; text-based methods, K-L divergence; vector quantization, density estimation).

L2). Let x, y ∈ E ≡ R^D be two vectors in a space of dimension D; the Euclidean distance between x and y is defined as follows:

    d_{L2}(x, y) = sqrt( Σ_{j=1}^{D} (x_j − y_j)^2 )    (1.1)

where x_j is the j-th component of x and y_j is the j-th component of y.

The Euclidean distance is a particular case of a more general distance, the Minkowski distance (or L_p norm). Let x, y ∈ R^D be two vectors in a space of dimension D; the Minkowski distance of order p between x and y is obtained by:

    d_{Lp}(x, y) = ( Σ_{j=1}^{D} |x_j − y_j|^p )^{1/p}    (1.2)

where |·| denotes the absolute value. In the case p = 2 we have the Euclidean distance (L2), and in the case p = 1 the Manhattan distance (L1). When p tends to infinity, we obtain the Chebyshev distance:

    d_{L∞}(x, y) = max_{1 ≤ j ≤ D} |x_j − y_j|

To enhance robustness, various alternatives to the Minkowski distance (L_p) have been proposed; one of the most notable is the Mahalanobis distance. This distance takes into account uncertainties in the vectors as well as possible correlation between their components. The Mahalanobis distance between two vectors x and y with covariance matrix Σ is defined by d_M(x, y) = sqrt((x − y)^T Σ^{-1} (x − y)), where x^T denotes the transpose of x.
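The distances defined above can be sketched directly; to keep the example dependency-free, the 2×2 inverse covariance matrix is passed in by hand (with the identity, the Mahalanobis distance reduces to the Euclidean one):

```python
import math

def minkowski(x, y, p):
    """Minkowski (L_p) distance; p=1 gives Manhattan, p=2 Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Limit of L_p as p tends to infinity."""
    return max(abs(a - b) for a, b in zip(x, y))

def mahalanobis2(x, y, inv_cov):
    """Mahalanobis distance for D=2, given the inverse covariance matrix."""
    d0, d1 = x[0] - y[0], x[1] - y[1]
    q = (d0 * (inv_cov[0][0] * d0 + inv_cov[0][1] * d1)
         + d1 * (inv_cov[1][0] * d0 + inv_cov[1][1] * d1))
    return math.sqrt(q)

x, y = (1.0, 2.0), (4.0, 6.0)
print(minkowski(x, y, 1))   # 7.0  (Manhattan)
print(minkowski(x, y, 2))   # 5.0  (Euclidean: a 3-4-5 triangle)
print(chebyshev(x, y))      # 4.0
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis2(x, y, identity))  # 5.0: reduces to Euclidean when cov = I
```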

Distances measured in a linear space are often of lower quality than those measured on a non-linear manifold. A significant challenge is to replace the Euclidean distance by the geodesic distance when the data lie on a non-linear manifold. The assumption is that the non-linear subspace is better suited to visual perception than the original linear space; similarity measures may therefore be more relevant when evaluated on the non-linear manifold. This idea has been applied to image ranking

04, VL05]. Typical methods for manifold-based learning are locally linear embedding (LLE), Isomap, and multidimensional scaling (MDS) [dST03].

Set-of-vectors signatures

Region-based signatures, and histograms when cell localization is taken into account, can be viewed as sets of weighted vectors. The dissimilarity measures for these types of signatures rely on a distance between two sets of vectors, which is less straightforward than a distance between two simple vectors.

First, let us recall that an image signature is represented as a set of weighted vectors, {(c_1, f_1), (c_2, f_2), ..., (c_k, f_k)}, where c_l denotes the feature vector of region l, the representative of cluster l, or the location of cell l (cf. Section 1.2.3), with its associated weight (or frequency) f_l. Let two signatures be denoted:

    S^(1) = { (c^(1)_1, f^(1)_1), ..., (c^(1)_{k_1}, f^(1)_{k_1}) },    S^(2) = { (c^(2)_1, f^(2)_1), ..., (c^(2)_{k_2}, f^(2)_{k_2}) }

A natural approach to measuring the similarity between two signatures is to pair the vectors c^(1)_i and c^(2)_j, and then aggregate the distances between the paired vectors into a distance between the two sets of vectors. One pairing method assigns a weight a_{i,j} to each pair (c^(1)_i, c^(2)_j), where 1 ≤ i ≤ k_1 and 1 ≤ j ≤ k_2

[WLW01 ℄.Lepoidsa i,j indiquel'importanedel'assoiationduveteur c (1) i au veteur c (2) j Ces poids a i,j doivent vérier ertaines ontraintes :

Une des motivations pour et appariement ou (soft mathing) est de diminuer l'eet de la segmentation d'images inorrete sur la qualité de la reherhe d'images.

Once the weights are determined, the distance between S^{(1)} and S^{(2)} is aggregated from the pairwise vector distances:

d_{ens}(S^{(1)}, S^{(2)}) = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} a_{i,j} \, d(c^{(1)}_i, c^{(2)}_j),  (1.5)

where d(\cdot, \cdot) is any distance between two single vectors.

A common heuristic is to choose the weights a_{i,j} that minimize the distance of equation 1.5 under the constraints on the a_{i,j}:

d_{ens}(S^{(1)}, S^{(2)}) = \min_{a_{i,j}} \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} a_{i,j} \, d(c^{(1)}_i, c^{(2)}_j).

This distance is exactly the Mallows distance in the case of discrete distributions [Mal72].

Another matching scheme yields the Hausdorff distance, which is also used in image retrieval. Each vector c^{(1)}_i of S^{(1)} is paired with its closest vector in S^{(2)}, and the distance from S^{(1)} to S^{(2)} is the maximum of these nearest-neighbor distances. To make the measure symmetric, the same quantity is computed with the roles of S^{(1)} and S^{(2)} exchanged, and the larger of the two is kept. The Hausdorff distance is thus defined as

d_H(S^{(1)}, S^{(2)}) = \max \Big\{ \max_i \min_j d(c^{(1)}_i, c^{(2)}_j), \; \max_j \min_i d(c^{(1)}_i, c^{(2)}_j) \Big\}.
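The symmetric definition above can be sketched directly (assuming the Euclidean distance between vectors):

```python
import numpy as np

def hausdorff(S1, S2):
    """Symmetric Hausdorff distance between two sets of vectors
    (given as rows of S1 and S2), using the Euclidean distance."""
    S1, S2 = np.atleast_2d(np.asarray(S1, float)), np.atleast_2d(np.asarray(S2, float))
    # Pairwise distance matrix: D[i, j] = ||S1[i] - S2[j]||
    D = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=2)
    forward = D.min(axis=1).max()   # max_i min_j
    backward = D.min(axis=0).max()  # max_j min_i
    return float(max(forward, backward))
```

Note that the Hausdorff distance ignores the weights f_l; it depends only on the vectors themselves.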

Other distances based on soft-matching techniques that are widely used for set-of-vectors signatures are the EMD (Earth Mover's Distance) and the IRM distance. The EMD minimizes the cost of transforming one distribution into the other, subject to constraints on the allowed moves. When the weights f^{(1)}_i and f^{(2)}_j are probabilities (e.g., normalized histograms or probability densities), the EMD is equivalent to the Mallows distance. The IRM distance uses the "most similar, highest priority" (MSHP) principle for pairing regions: the shorter the distance d(c^{(1)}_i, c^{(2)}_j) between a pair of regions, the larger the corresponding weight a_{i,j}.
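In the one-dimensional case the EMD has a closed form: it equals the area between the two cumulative distribution functions. A sketch for weighted point sets (the function name is illustrative):

```python
import numpy as np

def emd_1d(u_vals, u_w, v_vals, v_w):
    """1-D Earth Mover's Distance between two weighted point sets.
    For 1-D distributions the EMD equals the area between the two CDFs."""
    u_vals, u_w = np.asarray(u_vals, float), np.asarray(u_w, float)
    v_vals, v_w = np.asarray(v_vals, float), np.asarray(v_w, float)
    u_w, v_w = u_w / u_w.sum(), v_w / v_w.sum()     # normalize weights
    xs = np.sort(np.concatenate([u_vals, v_vals]))  # merged support
    # CDF of each distribution evaluated on the merged support
    cdf_u = np.array([u_w[u_vals <= x].sum() for x in xs])
    cdf_v = np.array([v_w[v_vals <= x].sum() for x in xs])
    return float(np.sum(np.abs(cdf_u - cdf_v)[:-1] * np.diff(xs)))
```

In higher dimensions the EMD requires solving a transportation problem (a linear program), which is considerably more expensive.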

Signatures summarizing local descriptors

Local descriptors are summarized either discretely, by histograms of visual words or region-based signatures, or continuously, by probability density functions.

When vector quantization is used to build a visual vocabulary, images can be represented in a vector-space model with tf*idf weighting [SB88], or more simply by a histogram of visual words [ZZRS00, SZ03b, NS06, JHS07, LS07]. With the vector-space model, the most frequently used similarity measure is the cosine similarity, which measures the similarity of two vectors by the cosine of their angle. Let x, y \in R^D be two D-dimensional vectors; the similarity between x and y is

sim_{mv}(x, y) = \cos(x, y) = \frac{\langle x \cdot y \rangle}{\|x\| \, \|y\|},  (1.11)

where \langle x \cdot y \rangle is the scalar product of x and y, and \|\cdot\| is the Euclidean norm. For the histogram model, a similarity measure was proposed in [ZZRS00]:

sim_{mh}(x, y) = \frac{1}{1 + dist_{mh}(x, y)},  (1.12)

where dist_{mh}(x, y) is the histogram distance defined in [ZZRS00] (equation 1.13).
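Equation 1.11 can be sketched in a few lines:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (equation 1.11)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Because it depends only on the angle, the measure is invariant to the overall scale of the histograms, which is convenient when images contain different numbers of descriptors.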

Probability density functions of the distribution of local descriptors can also be estimated and used as signatures. The dissimilarity between two images is then the divergence between the distributions represented by these densities. A commonly used divergence is the Kullback-Leibler divergence (K-L divergence) [KL51]. The K-L divergence, also known as relative entropy, is an asymmetric measure between two distributions f(\cdot) and g(\cdot), defined by

d_{KL}(f, g) = \int_{-\infty}^{+\infty} f(x) \log \frac{f(x)}{g(x)} \, dx  (1.14)

d_{KL}(f, g) = \sum_x f(x) \log \frac{f(x)}{g(x)}  (1.15)

in the continuous and discrete cases respectively [DV02, MSB02].
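A sketch of the discrete case (equation 1.15), with a small epsilon added to avoid taking the logarithm of zero on empty histogram bins (the epsilon is an implementation choice, not part of the definition):

```python
import numpy as np

def kl_divergence(f, g, eps=1e-12):
    """Discrete Kullback-Leibler divergence (equation 1.15).
    f and g are histograms, normalized internally; eps avoids log(0)."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    f, g = f / f.sum(), g / g.sum()
    return float(np.sum(f * np.log((f + eps) / (g + eps))))
```

Since d_KL(f, g) differs from d_KL(g, f) in general, a symmetrized version (e.g., the sum of the two directions) is often used in practice.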

In summary, similarity can be measured on simple feature vectors, on sets of descriptors, or on summaries of local descriptors. The main advantage of representing images by simple vectors is the ease of performing algebraic and geometric operations on them; however, many such representations fail to capture the details needed to convey complex semantics. Conversely, the complexity of similarity measures between sets of descriptors complicates the retrieval problem. Summarizing local descriptors into histograms, with the associated similarity measures, strikes a good balance between preserving image detail and keeping the representation parsimonious.

Methods inspired by textual data analysis

LSA

Latent semantic analysis (LSA) projects the documents of a corpus into a reduced space called the latent space. It relies on the singular value decomposition of the data matrix, here the term-document matrix X (with or without tf*idf weighting), and identifies the linear subspace that captures most of the variance of the corpus:

X = U \Sigma V^T,  (1.16)

where U and V are orthogonal matrices (U^T U = V^T V = I) and \Sigma is a diagonal matrix whose diagonal elements are the singular values of X.

The LSA approximation of X is computed by setting the (N - K) smallest singular values to zero, yielding \tilde{\Sigma}. The new representation of the documents in the latent space is given by \tilde{U} \tilde{\Sigma}.

This approach can provide a significant compression of large collections. Moreover, the reduced dimensions can capture certain linguistic notions such as synonymy and polysemy.
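The truncation described above can be sketched with NumPy's SVD (keeping the K largest singular values):

```python
import numpy as np

def lsa(X, K):
    """Project the rows of X into a K-dimensional latent space via
    truncated SVD: the result corresponds to U~ Sigma~ above."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :K] * s[:K]   # keep the K largest singular values

X = np.random.default_rng(0).random((8, 5))   # 8 documents, 5 terms
Z = lsa(X, 2)                                  # 2 latent dimensions
```

Since NumPy returns the singular values in decreasing order, the first latent dimensions are the ones that capture the most variance.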

PLSA and LDA

A probabilistic version of LSA is probabilistic latent semantic analysis (PLSA), which replaces the linear algebra of LSA with a probabilistic model in which the distribution of words within a document is multinomial. It relies on a mixture decomposition derived from a latent variable model, and its parameters are estimated with the Expectation-Maximization (EM) algorithm.

Indeed, PLSA introduces a latent variable z \in Z = \{z_1, z_2, \ldots, z_K\} and models the joint probability P(d, v), for d \in D and v \in V, by

P(d, v) = \sum_{z \in Z} P(z) P(d \mid z) P(v \mid z).

The log-likelihood of the corpus is defined by

L = \sum_{d \in D} \sum_{v \in V} X(d, v) \log P(d, v),  (1.20)

where X(d, v) is the number of occurrences of word v in document d. L is maximized by an EM algorithm whose E-step computes the posterior of the latent variable,

P(z \mid d, v) = \frac{P(z) P(d \mid z) P(v \mid z)}{\sum_{z'} P(z') P(d \mid z') P(v \mid z')},

and whose M-step re-estimates P(z), P(d \mid z) and P(v \mid z) from the expected counts X(d, v) P(z \mid d, v).

Each value of the latent variable z \in Z corresponds to a theme. A word is generated from a theme according to the per-theme word distribution P(v \mid z), so different words of a document may be generated by different themes, and a document may therefore belong to several themes. PLSA represents each document by its theme distribution P(z \mid d). As reported in [05, LS07], the benefits of PLSA are dimensionality reduction and an improved semantic representation of images.

Although PLSA is a useful method for probabilistic modeling of text, it has known limitations. In PLSA, a document is represented by a theme distribution P(z \mid d), where d is an index into the documents of the training set. This leads to two problems: (i) the number of model parameters grows linearly with the size of the corpus, which can cause overfitting; and (ii) PLSA cannot assign probabilities to new documents outside the training set. For this reason it is not a true generative model.

To address these issues, Blei et al. proposed latent Dirichlet allocation (LDA), a generative model in which the number of parameters does not grow with the corpus size. Because of the model's complexity, exact inference of its parameters is intractable, so variational approximation methods are used to estimate them.

PLSA is a particular case of LDA in which a maximum a posteriori estimator is used and the Dirichlet distribution is assumed uniform [GK03].

Note that the number of themes for PLSA and LDA cannot in general be very large; otherwise the probabilities become too small to be meaningful. Moreover, the EM algorithm does not always reach a global optimum, but rather a local one.

Human-machine interaction

Query specification

Users can search for images in a database in several ways The most common methods include category browsing, keyword queries, sketch-based queries, and example-based queries Category browsing allows users to navigate through images organized into different categories based on their semantics or visual content Keyword queries focus on retrieving images associated with specific descriptive terms In sketch-based and example-based queries, users specify their request by either drawing a sketch or providing an example image, prompting the system to suggest images with similar visual characteristics.

3. Hofmann proposed in [Hof99a] a method to compute the theme distributions P(z | d) for new documents by running another EM algorithm that keeps fixed the per-theme word distributions P(v | z) obtained during training.

The first two types of queries relate to the semantic description of images, which is outside the scope of this discussion. We focus only on the last two types of queries.

The sketch-based query lets the user draw a rough sketch of the desired image, either with a graphic editor provided by the CBIR system or with other software. Queries can be formulated by drawing several objects with given properties such as color, texture, and shape. In most cases a rough sketch suffices, and the query can then be refined from the returned results.

The example-based query allows users to submit a request by providing a sample image The system first extracts the characteristics of the query image and represents them internally Then, it retrieves images from the database that share similar features with the query The main advantage of using an example-based query is that users do not need to provide an explicit description of the target image; the system generates this description automatically This type of query is suitable for applications where the target is an image containing the same object or a set of objects under different projection conditions Most operational systems support this method of query formulation.

The group image query feature allows users to select multiple images, enabling the system to identify images that best match the characteristics of the example group This method enhances precision by specifying relevant characteristic variations while eliminating irrelevant ones from the query Additionally, group properties can be refined by incorporating negative examples Several recently developed systems provide responses to queries that include both positive and negative examples.

Visualization

The presentation of image search results is perhaps one of the important factors in the acceptance and popularity of an image retrieval system. We characterize a few common visualization schemes:

Ranked by relevance. The most popular way to present search results is ranking by relevance, adopted by Google and Yahoo! for their image search engines. Results are ordered by a numerical measure of relevance to the query.

Ranked by time (chronological). The returned images appear in chronological order. Google's Picasa 4, for personal collections, provides an option to visualize images this way over time.

Clustered. Clustering images by their metadata or visual content has been an active research topic in recent years. Clustering of results provides an intuitive form of presentation and is also used to improve system performance [CWK05].

Hierarchical. Metadata associated with images can be organized in a tree structure, such as WordNet, which supports hierarchical visualization of results. This is particularly useful for archives, especially in educational contexts.

Composed. A combination of two or more of the visualization types above. Hierarchical clustering and concept-graph visualization are examples of composed visualization techniques.

Relevance feedback

Human perception of similarity between images is subjective, semantic, and task-dependent While content-based methods show promise for image retrieval, results relying solely on visual feature similarity are often neither perceptually nor semantically satisfying Furthermore, each type of visual feature tends to capture only a single aspect of image properties, making it difficult for users to specify how different aspects are combined To address these issues, relevance feedback—a traditional technique in text-based information retrieval systems—can establish a weak link between high-level concepts and low-level features.

Relevance feedback is a supervised learning technique used to enhance the effectiveness of information retrieval systems The core concept involves utilizing positive and negative examples provided by the user to improve system performance When a query is submitted, the system initially returns a list of images ranked by a predefined similarity measure The user then indicates which images are relevant (positive examples) or not relevant (negative examples) to the query The system then refines the results based on this feedback and presents a new list of images to the user.
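As a sketch of this feedback loop, a Rocchio-style update borrowed from text retrieval (the parameter values below are illustrative assumptions, not the thesis's method) moves the query vector toward the positive examples and away from the negative ones:

```python
import numpy as np

def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query refinement: shift the query toward the centroid
    of relevant examples and away from the centroid of non-relevant ones."""
    q = alpha * np.asarray(query, float)
    if len(positives):
        q = q + beta * np.mean(np.asarray(positives, float), axis=0)
    if len(negatives):
        q = q - gamma * np.mean(np.asarray(negatives, float), axis=0)
    return q
```

The refined query is then re-submitted to the similarity search, and the loop repeats until the user is satisfied.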

Evaluation metrics

Content-based image retrieval is fundamentally an information retrieval problem, so the evaluation metrics commonly employed are those of information retrieval. The two most popular measures are precision and recall.

Precision. This measure is the percentage of the returned images that are relevant to the query.

Recall. Recall (or sensitivity) is the percentage of all the relevant images in the database that are returned.

When the query is an image, the relevance of the returned images is highly subjective. Rather than returning a set of relevant images, most image retrieval systems return a list of images ranked by their relevance to the query. Precision and recall are then evaluated on the set of the top k returned images, where k is called the scope.

Precision and recall have been shown to follow an inverse relationship as a function of the scope: precision decreases while recall increases as the scope grows.

Note, however, that for a given scope k, the precision over the first k returned images (denoted P_k) is proportional to the recall (R_k) at the same scope.

Traditionally, the results of an information retrieval system are summarized by precision-recall curves or precision-scope curves.

To obtain a precision-recall curve, the precision is recorded at each relevant image returned and then interpolated at 11 standard recall points: 0, 0.1, 0.2, ..., 1. The interpolation uses the rule

p(r) = \max_{r' \geq r} \{ p(r') \},

where p(r) is the precision at the point where recall equals r.
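The 11-point interpolation can be sketched as follows (assuming a binary relevance flag for each ranked result):

```python
import numpy as np

def interpolated_pr_curve(relevance, n_relevant_total):
    """11-point interpolated precision-recall curve.
    relevance: list of 0/1 flags over the ranked result list."""
    rel = np.asarray(relevance, float)
    hits = np.cumsum(rel)
    ranks = np.arange(1, len(rel) + 1)
    prec = hits / ranks                # precision after each result
    recall = hits / n_relevant_total   # recall after each result
    points = np.linspace(0.0, 1.0, 11)
    # p(r) = max over the precisions observed at recall >= r
    return [max((p for p, r in zip(prec, recall) if r >= q), default=0.0)
            for q in points]
```

The interpolation rule makes the curve monotonically non-increasing, which matches the ideal curve described below.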

An ideal precision-recall curve is parallel to the recall axis and constant at 1 (i.e., the precision is always equal to 1 whatever the recall).

To evaluate how well the system ranks the relevant images, two numerical metrics widely used in the information retrieval community complement the graphical measures such as precision-recall curves: the average precision and the average normalized rank (ANR).

The average precision (AP) is the area under the precision-recall curve; it averages the precision values obtained at each relevant image returned. The arithmetic mean of the average precisions computed over several queries is called the mean average precision (mAP).

The ANR measure for a query is given by

ANR = \frac{1}{M \, M_{per}} \left( \sum_{i=1}^{M_{per}} rang(i) - \frac{M_{per}(M_{per}+1)}{2} \right),  (1.25)

where M is the number of images in the database, M_{per} is the number of relevant images for the query, and rang(i) is the rank of the i-th relevant image.

By construction, the ANR equals 0 when all the relevant images are returned first. The ANR ranges from 0 to 1, and a value of 0.5 corresponds to a random ordering; the smaller the ANR, the better the ranking. As with the mean average precision, the arithmetic mean of the ANR values computed over several queries is called the MANR (Mean Average Normalized Rank).
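Equation 1.25 can be sketched directly:

```python
def average_normalized_rank(ranks, M):
    """ANR (equation 1.25): 0 when all relevant images come first,
    about 0.5 for a random ordering.
    ranks: 1-based ranks of the relevant images; M: database size."""
    m_per = len(ranks)
    return (sum(ranks) - m_per * (m_per + 1) / 2) / (M * m_per)
```

For example, if the 3 relevant images of a 100-image database are returned at ranks 1, 2 and 3, the ANR is 0; at ranks 98, 99 and 100 it is close to 1.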

Summary

In this chapter, we reviewed the building blocks of a content-based image retrieval system: similarity and dissimilarity (distance) measures, methods inspired by textual data analysis, human-machine interaction, and evaluation metrics. We emphasized techniques for describing visual characteristics and measures of image similarity based on these visual features. Details on the indexing of image signatures and on efficient search techniques are presented in Chapter 2.

The most commonly used visual characteristics in content analysis are color, texture, shape, and spatial relationships Visual feature extraction can be performed either globally on an entire image or locally on a group of pixels These two approaches result in different types of image signatures, such as vectors, sets of vectors, visual word sets, and probability density distributions of local descriptors Spatial relationships between regions or objects are often represented by a 2D string or probabilistic model Local descriptors are generally more effective than global descriptors for describing visual content.

The techniques for measuring similarities and distances between image signatures depend on the type of signatures used Distances between vector-type signatures can be calculated easily, while calculating distances between sets of vector signatures is more complex The summary of local descriptors based on visual words strikes a good balance between precision (as local descriptors are used for image description) and speed (as distance calculations are very fast) in content-based image retrieval systems Additionally, techniques for processing textual data can be applied to images to enhance search quality These are the main reasons we chose this representation of images in our thesis work.

Indexing and image retrieval in a large database

Image retrieval by signature

The goal of a content-based image retrieval system is to find images whose signatures closely match a given query signature based on a specific measure of similarity This process, known as similarity search, fundamentally differs from exact search methods used in traditional databases Two primary methods for similarity-based retrieval are outlined in the literature.

Range search (\varepsilon-search). The \varepsilon-search retrieves the images located at a distance of at most \varepsilon from the query image.

k-nearest-neighbor search. The k-nearest-neighbor search retrieves the k images closest to the query image according to the similarity measure associated with the image signatures.

The main drawback of image retrieval based on ε is the uncertainty regarding the number of images returned; a small ε value may yield few results, while a large ε can result in an overwhelming number of images In contrast, searching for k nearest neighbors ensures that k images are always included in the results, although some may be less relevant to the query Generally, k nearest neighbor searches are preferred over radius-based searches, despite both methods having their pros and cons When an image is represented by a unique signature, the k nearest neighbors of that signature directly correspond to the k most similar images to the query image, allowing users to view and assess the quality of the search results Therefore, we focus on the retrieval of k nearest neighbors in this context.

The naive approach to similarity search computes, for the signature of the query image, the similarity measure against every signature in the database; this is called a sequential (or exhaustive) search. Its complexity is proportional to the number of signatures in the database, i.e., O(M), where M is the number of images. When the database is large, exhaustive search becomes prohibitively expensive, and database techniques are used to accelerate it. The general principle is to limit the number of signatures compared with the query by eliminating groups of signatures through filtering rules. The performance of indexing techniques depends heavily on the nature of the signatures and of the similarity measures employed.
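The O(M) baseline that indexing techniques try to beat can be sketched in a few lines (assuming vector signatures and the L2 distance):

```python
import numpy as np

def knn_sequential(signatures, query, k):
    """Exhaustive k-nearest-neighbor search: O(M) distance computations.
    Returns the indices of the k signatures closest to the query (L2)."""
    signatures = np.asarray(signatures, float)
    d = np.linalg.norm(signatures - np.asarray(query, float), axis=1)
    return np.argsort(d)[:k]
```

Every indexing method described below must return the same answer as this scan while visiting far fewer signatures.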

Multi-dimensional indexing

Structure

Vector indexing methods are based on the principle of hierarchically grouping the data space. Structurally, they resemble a B-tree [BM72, Com79]. To process queries efficiently, the vectors are stored in the leaves of a tree such that vectors that are close to one another in the signature space lie in the same leaf. Each vector is stored in a single leaf, i.e., there is no duplication of vectors.

These tree structures are organized hierarchically. Each internal node holds a set of keys and pointers to a subtree or a leaf. Each key is the representation of a region of the multidimensional space, depending on the structure used: in an R-tree, the key is a hyper-rectangle enclosing all the keys and vectors of the subtree, while in an SS-tree it is a hyper-sphere.

Grouping the vectors of a region of space into a single leaf enables a two-step search: the first step identifies the leaves likely to contain vectors satisfying the query, and the second computes the distance between the query and the vectors of the selected leaves. This drastically reduces the number of vectors that have to be compared with the query.

Many indexing structures have been developed for multidimensional retrieval. They can generally be divided into two families: data-partitioning methods (R-tree, X-tree, SS-tree, SR-tree, TV-tree, ...), which split the data space according to the data distribution, and space-partitioning methods, which divide the space along predefined hyperplanes regardless of the actual data values and store the vectors in the appropriate cells.

Search algorithm

Before describing in detail the k-nearest-neighbor search algorithm over multi-dimensional index structures, we need a few important notions:

Definition 2.1 (MinDist and MaxDist). Let r be a query vector in a vector space E of dimension D. Let R be a region of E, characterized by its key in the tree (depending on the structure used). Let d : E \times E \to R^+ be the distance used for the k-nearest-neighbor search of r. Then

MinDist(r, R) = \min_{x \in R} d(r, x), \qquad MaxDist(r, R) = \max_{x \in R} d(r, x).

Intuitively, MinDist is the smallest distance between a query and a region, and MaxDist is the largest.

k-nearest-neighbor search algorithms use pruning rules to discard entire branches of the tree whenever the corresponding region cannot contain points closer than the current candidates.

There are two filtering rules. The first states that for two regions R_1 and R_2, if

MaxDist(r, R_1) \leq MinDist(r, R_2),  (2.1)

then R_2 can be eliminated. This rule assumes that there are at least k vectors in each region.

The second filtering rule states that if

MinDist(r, R) \geq d(r, x_k),  (2.2)

where x_k is the k-th nearest neighbor found so far, then R can be eliminated.
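For hyper-rectangular regions (as in R-trees), MinDist and MaxDist have simple closed forms, which a sketch of rule (2.2) can build on:

```python
import numpy as np

def min_dist(r, lo, hi):
    """MinDist between query r and the hyper-rectangle [lo, hi] under L2:
    distance to the closest point of the rectangle (0 if r lies inside)."""
    r, lo, hi = (np.asarray(a, float) for a in (r, lo, hi))
    closest = np.clip(r, lo, hi)
    return float(np.linalg.norm(r - closest))

def max_dist(r, lo, hi):
    """MaxDist: distance from r to the farthest corner of the rectangle."""
    r, lo, hi = (np.asarray(a, float) for a in (r, lo, hi))
    farthest = np.where(np.abs(r - lo) > np.abs(r - hi), lo, hi)
    return float(np.linalg.norm(r - farthest))

def can_prune(r, region, kth_dist):
    """Pruning rule (2.2): discard the region if MinDist >= d(r, x_k)."""
    lo, hi = region
    return min_dist(r, lo, hi) >= kth_dist
```

The clip operation picks, dimension by dimension, the rectangle point closest to the query, which is exactly the minimizer in Definition 2.1.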

Two classical algorithms retrieve the k nearest neighbors: the RKV algorithm and the HS algorithm. The RKV algorithm, originally proposed for R-trees, is a branch-and-bound depth-first traversal of the tree. Its filtering rule is strengthened by a more restrictive distance, MinMaxDist, defined as the minimum of the MaxDist over the faces of the hyper-rectangle; this assumes that every face of the hyper-rectangle contains at least one vector, which holds for R-trees by construction.

The HS algorithm traverses the tree in an order that does not depend on the level of the nodes. It maintains a list of active regions (LRA), initialized with the region corresponding to the root of the tree and its MinDist, and repeats the following operations:

Select the region R with the smallest MinDist in the LRA.

If R corresponds to a leaf, search for the nearest neighbors within R.

Otherwise, compute the MinDist of the regions corresponding to the subtrees of the node associated with R and insert them into the LRA.

It has been shown that HS visits fewer nodes and leaves than RKV. Its drawback is that it needs more memory to maintain the LRA; in the worst case the required space is O(n), where n is the number of regions.
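The best-first loop above can be sketched over a generic tree; the accessor functions (min_dist, leaf_vectors, children) are assumptions standing in for the structure-specific operations, and the LRA is a priority queue:

```python
import heapq

def hs_search(root, query, k, min_dist, leaf_vectors, children, distance):
    """Sketch of the HS best-first k-NN traversal.
    The LRA is a min-heap ordered by MinDist; best is a max-heap of the
    k closest vectors found so far (distances stored negated)."""
    lra = [(0.0, 0, root)]   # (MinDist, tie-breaker, region)
    counter = 1
    best = []
    while lra:
        dmin, _, region = heapq.heappop(lra)
        if len(best) == k and dmin >= -best[0][0]:
            break            # rule (2.2): no remaining region can improve
        kids = children(region)
        if not kids:         # leaf: compare the query with its vectors
            for v in leaf_vectors(region):
                heapq.heappush(best, (-distance(query, v), v))
                if len(best) > k:
                    heapq.heappop(best)
        else:
            for child in kids:
                heapq.heappush(lra, (min_dist(query, child), counter, child))
                counter += 1
    return sorted((-d, v) for d, v in best)
```

Because the LRA is popped in increasing MinDist order, the first time the pruning test fires, every region still in the queue can also be discarded, which is why the loop simply breaks.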

Data partitioning

All data-partitioning indexing methods derive from the R-tree, originally developed for indexing 2D data in geographic information systems. The R-tree relies on minimum bounding rectangles (MBRs): an MBR is a multidimensional interval of the data space (a multidimensional rectangle with sides parallel to the axes) and is the minimal approximation of the set of enclosed points. Hence each (D - 1)-dimensional face of an MBR contains at least one vector.

Various methods can be used to decide which rectangles should be merged or split at each level of the tree.

The main issue with R-trees is that the bounding rectangles may overlap. The search may then have to traverse several paths of the tree to find the nearest neighbors, which is inefficient.

The R+-tree [SSH86, SRF87], a variant of the R-tree, avoids this problem with a strategy called forced split, which splits overlapping regions in two; the number of nodes can then grow exponentially. The X-tree [BKK96] is an extension of the R-tree developed to handle multidimensional data directly; overlap between regions is avoided using a split history and supernodes with enlarged capacities.

The SS-tree, another extension of the R-tree, uses bounding spheres instead of rectangles. Although the bounding spheres are not minimal, they are defined by the centroid of the points and a radius large enough to enclose all the vectors of the subtree. Describing a region by just a center and a radius makes spatial relationships fast to evaluate.

Although spheres can improve search performance, bounding spheres occupy more space than bounding rectangles with high-dimensional data, which reduces search efficiency. Moreover, a general problem of the SS-tree is that a partitioning without overlap is not always possible (cf. Figure 2.1).

Figure 2.1: General problem of the SS-tree: no overlap-free partitioning is possible.

To address this issue, the SR-tree uses the intersection of a bounding sphere and a bounding rectangle. According to its authors, spheres are better suited than rectangles for nearest-neighbor search under the L2 metric; however, spheres are harder to maintain and tend to create more overlapping regions. Combining sphere and rectangle mitigates these drawbacks.

In the SR-tree, the MinDist is defined as the maximum of the MinDist to the MBR and the MinDist to the bounding sphere, while MinMaxDist is not defined for spheres, so the MinMaxDist of the MBRs is used. Note that these values are only bounds, not the exact MinDist and MaxDist of the intersection region (the exact MinDist can be larger than the computed one, and the exact MaxDist smaller), and that no knowledge about the sphere is exploited to determine the MinMaxDist.

The TV-tree is an extension of the R-tree based on the observation that not all dimensions contribute equally to search efficiency. Lin et al. divide the dimensions into three classes: those that can be ignored, those that must be used, and those used to rank the search. A significant drawback of this technique is that it requires precise knowledge of the data distribution along each dimension, so that some dimensions can be used independently of the others.

Space partitioning

Space-partitioning techniques such as the grid file [NHS84], the K-D-B-tree [Rob81] and the LSD^h-tree [Hen98] divide the space into disjoint rectangular regions, arranged more or less regularly, without taking the data distribution into account. The generated regions are organized either in a hash table (grid file) or in a tree structure such as the kd-tree. This approach yields a complete and disjoint partitioning of the space.

However, a complete partitioning usually produces more regions than necessary: in high dimension, the number of regions often exceeds the number of vectors, so many regions are empty. Moreover, kd-trees are usually unbalanced. The K-D-B-tree addresses this with a forced-split strategy similar to that of the R-tree family.

The LSD^h-tree [Hen98] is an adaptive kd-tree. Unlike R-trees and kd-trees, the description of the regions in the LSD^h-tree is designed to minimize memory usage, and the high-level internal nodes are assumed to be pinned in main memory to speed up the search. A major drawback of space-partitioning methods is that the number of neighboring regions grows exponentially with the number of dimensions; in addition, most regions are empty, which makes the search inefficient when the dimension is high (D > 16).

Space-filling curves

Space-filling curves, such as the Hilbert curve and the Z-order curve, belong to the family of space-partitioning methods. These curves map the D-dimensional space onto a one-dimensional space, i.e., they define a function R^D \to R.

In practice, discrete approximations of the curves are used to partition the space into cells. At each step, one dimension is chosen and the space is split in two; the choice of dimension depends on the curve used. If p is the number of steps (the depth of the partitioning, fixed by the user), the space is divided into 2^p cells of equal hypervolume, numbered from 1 to 2^p. cellule(x) is the function that computes the cell in which the vector x falls:

cellule : R^D \to [1; 2^p].

The vectors are thus transformed into integers in [1; 2^p], and any one-dimensional indexing technique can then index them. Suppose a B+-tree [BM72, Com79] is used.
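For the Z-order curve, the cell function amounts to interleaving the bits of the coordinates; a sketch for points in [0, 1)^D (the function name is illustrative):

```python
def z_order_cell(x, p, D):
    """Cell number of vector x under a Z-order (Morton) partitioning of
    [0, 1)^D at depth p: the first p bits obtained by interleaving the
    binary expansions of the coordinates, cycling over the dimensions."""
    bits = 0
    for step in range(p):
        dim = step % D                         # dimension split at this step
        level = step // D + 1                  # times this dimension was split
        coord_bit = int(x[dim] * (1 << level)) & 1
        bits = (bits << 1) | coord_bit
    return bits + 1                            # cells are numbered 1..2^p
```

The resulting integer key is what the one-dimensional index (e.g., a B+-tree) stores for each vector.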

To answer k-nearest-neighbor or range queries, the MinDist and MaxDist between the query and the cells are considered. The challenge is to evaluate these distances efficiently without enumerating all the cells of a given interval. This can be done with a divide-and-conquer algorithm that recursively splits the interval in two. Although this algorithm greatly reduces the number of useless cells, the number of cells to visit generally grows exponentially with the parameter p (with p \geq D).

JBF05 has expanded the ε-near research into a statistical analysis that eliminates cells with a low probability of obtaining responses This enhancement in research methodology shows improvement over traditional ε-near approaches However, this method relies on the assumption that the distribution of distorted images generated by a set of transformations follows a Gaussian distribution, with independence among the components.

This is assumed to hold in the case of copy detection. In image retrieval, by contrast, the distribution of distorted images is unknown.

Pyramid-tree

The pyramid-tree [BBK98] is a space-partitioning technique designed to counter the curse of dimensionality [Bel61].

Two main phenomena explain the degradation of performance in high-dimensional spaces. The first is equidistance: as the dimensionality grows, all vectors tend to become equidistant from one another, so the nearest neighbor of a query is almost as far away as every other vector in the database. It then becomes necessary to compare all the vectors in the database with the query, and a sequential search is more suitable.

The second is sparsity: as the dimensionality grows, the space looks increasingly empty. The number of vectors contained in a sphere of constant radius tends rapidly to zero; in other words, the vectors lie increasingly far from one another.

Like space-filling curves, the pyramid-tree transforms vectors from the D-dimensional space into keys in a one-dimensional space and uses a B+-tree to index the keys. Unlike the other methods, the pyramid-tree stores the vectors together with their keys in the leaves of the B+-tree.

The inverse transformation is therefore unnecessary, and the refinement step can be carried out without additional accesses. The partitioning in a pyramid-tree divides the vector space into 2D pyramids (two per dimension) whose apexes meet at the center of the space; each pyramid is then cut by hyperplanes parallel to its base. The number of regions produced by this partition thus grows linearly with the dimensionality. The strategy is designed for ε-range queries in high-dimensional spaces: an ε-range query is processed by identifying all the pyramids that intersect the query, then determining all the relevant regions through a computationally intensive algorithm involving numerous distance tests.

The corresponding leaves of the B+-tree are then fetched, and all the vectors contained in the retrieved leaves are compared to the query.
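The key computation can be sketched following the usual pyramid-tree construction (the exact formulas below are an assumption for illustration, not quoted from the thesis): the pyramid number is determined by the dimension in which the point lies farthest from the centre, and the key adds the point's "height" within that pyramid.

```python
def pyramid_value(x):
    """One-dimensional key of a point x in [0,1]^D for a pyramid-tree.

    The point falls in one of 2*D pyramids whose apexes meet at the
    centre of the space; the key is the pyramid number plus the point's
    height inside that pyramid."""
    D = len(x)
    # Dimension along which the point is farthest from the centre.
    j_max = max(range(D), key=lambda j: abs(x[j] - 0.5))
    height = abs(x[j_max] - 0.5)
    pyramid = j_max if x[j_max] < 0.5 else j_max + D
    return pyramid + height
```

Since the integer part identifies the pyramid and the fractional part the height, all points of one pyramid occupy a contiguous key range in the B+-tree.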

The pyramid-tree is one of the few indexing structures claimed to be unaffected by dimensionality: it has been shown that, for uniform distributions and constant-radius queries, its performance actually improves as the dimensionality increases. Its main drawback is that it only supports ε-range queries, not k-nearest-neighbor queries.

Accelerated sequential search

VA-file

Weber et al. introduced an accelerated sequential-search technique known as the VA-file, which uses two files: an approximation file holding a quantized, compressed version of the vectors, and a file containing the original vectors. Neither of the two files is sorted.

Quantizing the vectors amounts to building an irregular grid over the data space. The resolution of the grid in dimension i is 2^{b_i}, where b_i is the number of bits used to quantize dimension i. The total number of bits used to quantize a vector is b = Σ_{i=1}^{D} b_i; in other words, the method partitions the space into 2^b cells.
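A minimal sketch of this quantization step (illustrative; the bit-packing layout is an assumption): each component in [0,1) is mapped to its slice number on a grid of 2^{b_i} slices, and the slice numbers are concatenated into one b-bit code.

```python
def va_approximation(x, bits):
    """VA-file-style approximation of a vector x with components in [0, 1]:
    dimension i is quantized onto 2**bits[i] slices, and the slice numbers
    are packed into a single b-bit code, b = sum(bits)."""
    code = 0
    for v, b in zip(x, bits):
        slice_no = min(int(v * (1 << b)), (1 << b) - 1)  # clamp v == 1.0
        code = (code << b) | slice_no
    return code
```

The approximation file stores one such code per vector, in the same order as the file of original vectors.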

The k-nearest-neighbor search proceeds in two stages. In the first stage (filtering), the approximation file is read sequentially and the classical filtering rules are applied to the MinDist and MaxDist of the corresponding cells. The candidate vectors retained are then sorted by increasing MinDist, and their coordinates are read by random access to the second file, which contains the original vectors (refinement stage). This stage stops as soon as a MinDist is found that is greater than or equal to the distance between the query and the current k-th nearest neighbor.
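The two-stage search can be sketched as follows (an illustrative simplification: the approximation file is modelled as an iterable of per-vector cell bounds, and the second file as a `fetch_vector` callable — both hypothetical names, not the thesis's interface):

```python
import heapq
import math

def va_knn(query, approximations, fetch_vector, k):
    """Two-stage k-NN search in the style of the VA-file.

    approximations: iterable of (vector_id, lo, hi), the bounds of the
    cell each vector was quantized into; fetch_vector: reads the original
    vector from the second file (simulated by a callable here)."""
    # --- Stage 1: sequential filtering on the approximation file.
    candidates = []          # (MinDist, id) pairs surviving the filter
    maxdists = []            # max-heap (negated) of the k smallest MaxDists
    kth_maxdist = math.inf
    for vid, lo, hi in approximations:
        # MinDist: distance to the query clamped into the cell.
        mind = math.dist(query, [min(max(q, l), h)
                                 for q, l, h in zip(query, lo, hi)])
        # MaxDist: distance to the farthest corner of the cell.
        maxd = math.dist(query, [l if q - l > h - q else h
                                 for q, l, h in zip(query, lo, hi)])
        if mind <= kth_maxdist:
            candidates.append((mind, vid))
        heapq.heappush(maxdists, -maxd)
        if len(maxdists) > k:
            heapq.heappop(maxdists)
        if len(maxdists) == k:
            kth_maxdist = -maxdists[0]
    # --- Stage 2: refinement, visiting candidates by increasing MinDist.
    candidates.sort()
    result = []              # max-heap of (-distance, id), size <= k
    for mind, vid in candidates:
        if len(result) == k and mind >= -result[0][0]:
            break            # no remaining candidate can improve the result
        d = math.dist(query, fetch_vector(vid))
        heapq.heappush(result, (-d, vid))
        if len(result) > k:
            heapq.heappop(result)
    return sorted((-d, vid) for d, vid in result)
```

The early break in stage 2 is exactly the stopping rule described above: once the next MinDist reaches the current k-th nearest distance, no further random access can change the result.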

The gain of the VA-file over a plain sequential search lies mainly in the compression rate, since the complexity of sequentially loading a large file is linear in its size. The method improves considerably when the approximation file fits entirely in main memory. A major difficulty, however, lies in the random-access refinement stage. When the resolution of the approximation is low (i.e., b is too small), the number of candidate vectors remaining after filtering grows, cancelling the benefit of compression. Conversely, if many bits are used for quantization, the approximation file becomes too large to fit in memory and must be reloaded at each query; the loading overhead and the extra cost of the MinDist and MaxDist computations then no longer justify its use. It has been shown that, in this case, the VA-file is again slower than a sequential search [AG01].

The performance of the VA-file depends strongly on the quality of the approximation, since the filtering stage relies on it. An improvement of the VA-file for non-uniform distributions, called the VA+-file [FTAEA00], transforms the data into the Karhunen-Loève (KL) domain by principal component analysis (PCA). Quantization is performed in the KL domain, and the quantization steps are determined by a k-means algorithm.

IQ-tree, GC-tree, and PRC-tree

A drawback of the VA-file is that every approximate vector must be examined during the filtering stage. To reduce the number of approximate vectors examined during the search, several techniques combining the VA-file with tree-based indexing have been proposed.

The IQ-tree [00] is a three-level index tree organized in three distinct files. The first level is a tree-structured index (cf. Section 2.2.1) whose leaves contain minimum bounding rectangles (MBRs) and pointers to the second-level pages, which contain quantized vectors; the quantization is based on the density of vectors inside the MBRs. The third level of the IQ-tree contains the pages of original vectors. The GC-tree [CC02] partitions the D-dimensional space into 2^D cells to identify clusters (dense regions) and outliers (sparse regions) based on the local density of the subspaces; the clusters are recursively partitioned. In the GC-tree, the approximation is used to represent dense regions, while for a sparse region the original vectors are used. The PRC-tree [CZZ07] transforms the vectors into the KL domain, as the VA+-file does, and uses an R-tree to organize the quantized vectors; the quantization uses the first axes of a PCA.

The search for the k nearest neighbors in the IQ-tree and the PRC-tree is carried out in a single pass, guided by the MinDist, with access to the original vectors maintained during the search. The main difference between the IQ-tree and the PRC-tree is that the MBRs of the IQ-tree are defined in the original space, whereas those of the PRC-tree are computed from the quantized vectors. The search in the GC-tree, in contrast, proceeds in several stages, as in the VA-file.

Inverted file

When images are represented in a vector-space model, the inverted-file approach is one of the most suitable. Initially used in text retrieval, where a query is a set of words, this approach also applies to image retrieval once a visual-word model is used to represent the images.

Suppose that all the vectors representing the images in the database are normalized. The cosine similarity between a query r = [r_1 r_2 ... r_D] (r is also normalized) and an image z = [z_1 z_2 ... z_D] of the database is computed as:

sim(r, z) = Σ_{j=1}^{D} r_j z_j    (2.3)

The cosine similarity in formula 2.3 depends only on the dimensions j where r_j ≠ 0: for a given query, only those dimensions contribute, and within a selected dimension only the images whose coordinate in that dimension is non-zero are involved. Based on this observation, an inverted file is built for each dimension j; it contains the pairs <i, z_ij> (with z_ij ≠ 0), where i denotes the i-th image of the database and z_ij its coordinate in dimension j.

To answer a query r = [r_1 r_2 ... r_D], the inverted files corresponding to the dimensions where r_j ≠ 0 are read. For each inverted file j, the similarity between the query r and every image i appearing in the file is accumulated by the term r_j · z_ij.
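A minimal sketch of this accumulation scheme (illustrative data layout: images are given as sparse dictionaries holding only their non-zero coordinates):

```python
from collections import defaultdict

def build_inverted_file(images):
    """images: {image_id: {dim: coordinate}} with only non-zero entries.
    Returns one posting list per dimension, of pairs (image_id, z_ij)."""
    inverted = defaultdict(list)
    for img_id, coords in images.items():
        for dim, z in coords.items():
            inverted[dim].append((img_id, z))
    return inverted

def cosine_search(inverted, query):
    """Accumulate sim(r, z) = sum_j r_j * z_ij, reading only the inverted
    lists of the dimensions where the (normalized) query is non-zero."""
    scores = defaultdict(float)
    for dim, r_j in query.items():
        for img_id, z_ij in inverted.get(dim, []):
            scores[img_id] += r_j * z_ij
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Only the postings of the query's non-zero dimensions are ever touched, which is what makes the inverted file attractive for sparse visual-word signatures.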

This technique is generalized in [NS06] to Minkowski distances of any order p, under the assumption that the vectors are normalized with respect to the p-norm (recall formula 1.2 of the previous chapter for the Minkowski distance of order p).

Approximate search

Approximating the data representation

Approximating the data representation consists in transforming the data into a space of lower dimensionality than the original one and searching in the transformed space. The most common transformation is principal component analysis (PCA). Another is multi-dimensional hashing, which searches among the points obtained by projecting the vectors of the original space: the vectors are transformed by several hashing functions chosen so as to maximize the probability that vectors similar in the original space are grouped together, each function assigning the same value to similar vectors. The implementation of multi-dimensional hashing proposed in the literature first embeds the database vectors into a Hamming cube, where the Hamming distance serves as the similarity measure and corresponds to the L1 distance in the original space.

As a result, the approach applies only to images whose similarity is estimated by the L1 distance.
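The embedding into the Hamming cube can be sketched with the classical unary code (an assumption here for illustration — the text does not spell out the encoding): an integer coordinate v in [0, C] becomes v ones followed by C − v zeros, so the Hamming distance between two embeddings equals their L1 distance.

```python
def hamming_embed(x, C):
    """Embed a vector of integer coordinates in [0, C] into {0,1}^(C*D):
    coordinate v becomes v ones followed by C - v zeros, so the Hamming
    distance between two embeddings equals their L1 distance."""
    bits = []
    for v in x:
        bits.extend([1] * v + [0] * (C - v))
    return bits

def hamming(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(u != v for u, v in zip(a, b))
```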

Approximating the search technique

Two main approaches can be identified for making the nearest-neighbor search approximate in order to reduce response time. The first, proposed in [LS02], transforms the k-nearest-neighbor problem into an ε-range problem within a suitable indexing structure. A radius ε for the k nearest neighbors is first estimated from a sample of the data kept in main memory; the radius is then corrected by a factor intended to reduce the dependency between ε and the chosen sample, yielding a corrected value of ε.

An ε-range search with the corrected radius is then performed over all the data.

Let R be the set of vectors contained in the regions intersected by the hypersphere of corrected radius ε. The final step searches for the k nearest neighbors within R and returns them as an approximation of the exact k-nearest-neighbor result. This technique reduces input/output costs during the search, but its quality depends on the size of the initial sample: according to the experiments reported in [LS02], the smaller the sample, the larger the estimation error on ε, and the error also grows with the number of nearest neighbors requested. Moreover, the size of R is not controlled, so it may turn out smaller than the number of nearest neighbors to be computed.

The second approach acts on the search step itself within the indexing structure, so as to reduce the number of candidates compared with the query. Known as geometric approximation of the query, it keeps the exact search technique but modifies the query. In this spirit, the ε-nearest-neighbor search has been proposed.

An ε-nearest neighbor is a vector whose distance to the query is at most (1 + ε) times the distance of the exact nearest neighbor:

Definition 2.2 (ε-nearest neighbor). Let ∆ = {S_i}_{i∈[1;M]} be a database of M signatures and d : ∆ × ∆ → R+ a dissimilarity measure. A signature S_εnn is an ε-nearest neighbor of a query signature S_r if d(S_r, S_εnn) ≤ (1 + ε) d(S_r, S_nn), where S_nn is the exact nearest neighbor of S_r.

This query is easily transposed to most of the exact methods seen above: it suffices to replace, in the filtering rules (cf. Section 2.2.2), MinDist by MinDist + ε and MaxDist by MaxDist − ε. For multi-dimensional indexing structures, this amounts to shrinking the bounding regions, which reduces the overlap rate and strengthens the selectivity of the filtering stage.

Geometric approximation methods rest on a simple principle, but choosing ε to balance time against result quality is difficult: a small ε preserves precision but does not always reduce response time significantly, while a large ε cuts search time sharply at the risk of degrading accuracy.

Clustering and classification

Clustering and classifying images are preprocessing steps that reduce response time and improve result quality when searching large image databases or annotating automatically; they act as approximations of the search technique. The principle is to group the data offline with a clustering algorithm; each resulting cluster is stored in its own file on disk. At search time, the clusters most relevant to the query are selected first, the corresponding files are loaded into memory, and the individuals in those files are scanned sequentially. The approximation lies in the cluster-selection stage: the more clusters are selected, the more accurate the result.

The main drawback of this approach is that it is effective only when the data are strongly clustered, which is not always the case, particularly in some image-retrieval applications. In addition, the offline clustering phase is expensive, typically O(M²), and quickly becomes prohibitive for large datasets.

A first way to reduce the number of clusters examined is simply to stop the search early: the clusters are ordered by their proximity to the query, then an arbitrary number of them are examined. This approach trades result quality for speed and relies on fast clustering algorithms; during the search, the clusters are ranked by the distance between their center and the query.
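The cluster-based scheme can be sketched as follows (illustrative: clusters are given as (centroid, members) pairs, and `n_probe` is the arbitrary number of closest clusters examined — a hypothetical parameter name):

```python
import math

def cluster_knn(query, clusters, k, n_probe):
    """Approximate k-NN: rank clusters by centroid-to-query distance and
    scan only the n_probe closest ones sequentially.

    clusters: list of (centroid, [vectors]) pairs, e.g. from k-means."""
    ranked = sorted(clusters, key=lambda c: math.dist(query, c[0]))
    candidates = [v for _, members in ranked[:n_probe] for v in members]
    return sorted(candidates, key=lambda v: math.dist(query, v))[:k]
```

The larger `n_probe`, the closer the result is to the exact one, at the cost of scanning more files.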

More recent work selects clusters probabilistically, which allows the precision of a nearest-neighbor search to be controlled a priori. The DBIN (density-based indexing) method introduces a clustering algorithm based on density estimation with an EM (Expectation-Maximization) algorithm: the data distribution is modeled as a mixture of Gaussians, each component corresponding to a cluster, and EM estimates the parameters of the Gaussians. During the search, the clusters are ranked by the probability that the query belongs to them and are visited in that order. The results of sequentially scanning the clusters visited so far are used to update the probability that a not-yet-visited cluster contains a vector closer than the current nearest neighbor.

The search stops when this probability falls below a threshold fixed by the user. The drawback of the method is tied directly to the use of the EM algorithm: its complexity is rather high, and the number of clusters is limited, whereas some real datasets consist of a large number of small clusters.

Berrani et al. proposed an approximation method that combines a geometric approximation of the enclosing hyperspheres with precision control. The method bounds the probability that a vector present in the exact result is missing from the approximate one: the radii of the enclosing hyperspheres are reduced according to the imprecision rate specified by the user, so as to increase the number of clusters filtered out. Experiments showed the method to be 6.71 times faster than a sequential search with only a 1% loss of precision. It was, however, evaluated on medium-dimensional data (24 dimensions); in high dimensions, vectors concentrate near the surfaces of the hyperspheres, making the radius reduction less effective. Moreover, the reduced radii are precomputed offline for a user-defined imprecision rate and must be recomputed whenever this value changes.

Techniques based on the MEDRANK algorithm

Fagin et al. proposed a very fast nearest-neighbor retrieval method based on random projections of the data and a rank-aggregation algorithm called MEDRANK. The method selects an arbitrary number K of random lines (K < D, where D is the data dimensionality), projects the data onto the chosen lines, and orders the projected values on each line.

This yields K ordered lists. Given a query r, it is projected onto the same random lines, and the position of its projection in each list i is bracketed by two cursors, l_i and r_i. The search then walks each list in both directions, always advancing the cursor whose value is closer to the query's projection. The first vector encountered in more than K/2 lists is taken as the nearest neighbor, the second vector so encountered as the second nearest neighbor, and so on; the algorithm stops when the k-th nearest neighbor has been identified. A variant, OMEDRANK, advances both cursors l_i and r_i at each step without testing which is closer to the query's projection, thereby saving comparisons.
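A sketch of the MEDRANK walk (illustrative: random lines drawn from a Gaussian, ties and exhausted lists handled simplistically; not the authors' reference implementation):

```python
import random

def medrank(data, query, K, k, seed=0):
    """Sketch of MEDRANK: project the data and the query onto K random
    lines, walk each ordered list outwards from the query's position,
    and return the first k vectors seen in more than K/2 lists."""
    rng = random.Random(seed)
    D = len(query)
    lines = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
    proj = lambda v, u: sum(a * b for a, b in zip(v, u))
    lists, cursors, q_proj = [], [], []
    for u in lines:
        order = sorted(range(len(data)), key=lambda i: proj(data[i], u))
        q = proj(query, u)
        # first position whose projected value is >= the query's
        r = next((j for j in range(len(order))
                  if proj(data[order[j]], u) >= q), len(order))
        lists.append(order)
        cursors.append([r - 1, r])   # left and right cursors around q
        q_proj.append(q)
    votes, found = {}, []
    while len(found) < k:
        advanced = False
        for order, cur, u, q in zip(lists, cursors, lines, q_proj):
            l, r = cur
            left_ok, right_ok = l >= 0, r < len(order)
            # advance the cursor whose projected value is closer to q
            if left_ok and (not right_ok or
                            q - proj(data[order[l]], u)
                            <= proj(data[order[r]], u) - q):
                i, cur[0] = order[l], l - 1
            elif right_ok:
                i, cur[1] = order[r], r + 1
            else:
                continue
            advanced = True
            votes[i] = votes.get(i, 0) + 1
            if votes[i] * 2 > K and i not in found:
                found.append(i)       # seen in more than half of the lists
                if len(found) == k:
                    break
        if not advanced:
            break                     # all lists exhausted
    return found
```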

The method has the advantage of reading only a small fraction of the data (e.g., 5%), which makes it very fast. Its main drawback is that the accuracy of the result depends on the number K of random lines as well as on their quality: the larger K, the better the precision, but the slower the algorithm.

Lejsek et al. proposed in [LA04, LAJA09] an indexing method called PvS, which consists in building PvS-indexes. The construction of a PvS-index first takes a random line and projects the data onto it (level 1).

The data are then partitioned into segments of equal size, with or without overlap. Within each segment the data are projected a second time onto another random line (level 2), and the projection continues until all the random lines have been used. The values projected at the last level are stored in a B+-tree.

The PvS-indexes are thus structured as trees. During retrieval, at each level the segment closest to the query's projected value is selected, until a segment at the last level is reached. The B+-tree associated with this segment then plays the same role as an ordered list in the MEDRANK algorithm.

PvS indexing intuitively reduces the response time and improves the precision of the MEDRANK algorithm through repeated re-projection. However, partitioning the data into segments can separate vectors that are far apart along one line but close along others: once the first line is used for the initial partitioning, the separated vectors can no longer share a segment, and recall degrades.

The PvS indexing method has been used for image copy detection, with a voting algorithm on local descriptors in which the nearest neighbors of a descriptor are retrieved for matching. This application does not require all the nearest neighbors of a descriptor but only good ones (i.e., precision), since the local-descriptor matching step does not depend on retrieving them all.

Summary

This chapter gave a brief overview of indexing methods designed to speed up nearest-neighbor search. It first presented multi-dimensional indexing techniques, covering both data-partitioning and space-partitioning methods. The performance of these methods, however, degrades rapidly as the dimensionality increases, which motivated the accelerated sequential-search methods.

We then presented several approximate search techniques, which trade result quality for response time. A first family transforms the search into an ε-range search whose radius is estimated from a data sample. Other techniques model or represent the data in a space smaller than the original one, for instance by dimensionality reduction (PCA) or random projection; these not only accelerate the search but can also improve result quality by reducing noise. Techniques that reduce the number of vectors compared with the query include geometric query approximation, clustering-based methods, and rank-aggregation methods: geometric query approximation increases the number of clusters filtered out and thus speeds up the search, while rank-aggregation methods such as MEDRANK and PvS indexing combine random projection and rank aggregation to accelerate the search considerably.

All the methods discussed in this chapter have their advantages and drawbacks. The next chapter introduces an approach that combines several of them so as to get the best of each: the goal is to accelerate the search without degrading result quality. The approach relies on factorial correspondence analysis, originally developed for the analysis of textual data in which documents are represented by bags of words.

Factorial correspondence analysis for image indexing and retrieval

Factorial correspondence analysis

Adapting FCA to image retrieval

Accelerating image retrieval with FCA

Experiments

FCA for large contingency tables

Combining FCA with other methods

Discovering image topics

