
(Master's thesis) Trademark image retrieval based on scale, rotation, translation invariant features




DOCUMENT INFORMATION

Basic information

Title: Trademark Image Retrieval Based on Scale, Rotation, Translation Invariant Features
Author: Nguyen Tien Dung
Supervisor: Dr. Le Thanh Ha
Institution: University of Engineering and Technology, Vietnam National University
Major: Information Technology
Document type: Master's thesis
Year of publication: 2014
City: Hanoi
Format
Number of pages: 64
File size: 2.14 MB

Structure

  • Chapter 1: Introduction
  • Chapter 2: Related work
  • Chapter 3: Background
    • 3.1. Pre-processing
    • 3.2. Object description
    • 3.3. Feature vectors extraction
      • 3.3.1. Discrete Fourier Transform (DFT)
      • 3.3.2. Log-polar transform
    • 3.4. Measure of similarity
      • 3.4.1. Euclidean distance
      • 3.4.2. Mahalanobis distance
      • 3.4.3. Chord distance
  • Chapter 4: Proposed method
    • 4.1. Pre-processing
    • 4.2. Visual shape objects extraction
    • 4.3. Scale, rotation, translation invariant features
    • 4.4. Measure of similarity
  • Chapter 5: Experiments and results
    • 5.1. Implementation
    • 5.2. Test results for exact copy actions
    • 5.3. Test results for scaling action
    • 5.4. Test results for rotating actions
    • 5.5. Test results for mirror actions
    • 5.6. Test results for partial copy actions
    • 5.7. Test results for random query trademark
    • 5.8. Testing summary
  • Chapter 6: Conclusion

Contents

Introduction

A trademark is defined as a word, design, picture, or symbol that identifies and distinguishes a company's products or services. There are four main types of trademarks based on visual complexity: word-in-mark (only text), device-mark (only graphical elements), composite-mark (a combination of text and graphics), and complex-mark (intricate images). Examples of these trademark types can be found in Fig 1.

Fig 1 Some trademark image samples

Every company or financial organization aims to create a unique and meaningful logo that accurately represents its identity. A well-designed logo not only attracts consumer attention to products and services but also enhances market viability. Additionally, it is crucial to avoid consumer confusion through clear and distinctive branding.

The expansion of global markets has led to a significant increase in trademark registrations, with millions of trademarks submitted worldwide. This surge presents challenges for trademark registration authorities, as they struggle to ensure distinctiveness and avoid duplication among existing trademarks. The traditional manual classification methods used to assess these applications are proving to be inefficient, making it difficult for officials to determine whether a trademark is already registered or infringes on existing copyrights. Consequently, there is an urgent need for automated technology to streamline the trademark registration process and enhance the accuracy of trademark searches.

Current techniques for trademark distinctness checks include various image processing methods, with Content-Based Image Retrieval (CBIR) being the most widely used. These techniques focus on analyzing visual features such as color, texture, and shape to determine similarities among trademarks. A trademark may consist of text, graphics, images, and colored textures, which can be broken down into components for effective comparison. CBIR systems utilize color histograms for color feature extraction and emphasize shape features, which are crucial in these applications. While several approaches exist for image retrieval, including improved pattern matching algorithms and broader text-based searches, some methods, such as shape-based techniques for logo retrieval, have proven inadequate for comprehensive trademark analysis.

This thesis presents a novel method for trademark image retrieval, in which the input trademark image is decomposed into dominant visual shape images. For each shape image, a feature vector is generated that remains invariant to scale, rotation, and translation. The similarity between two trademarks is then assessed using these feature vectors. The related manuscript, titled “Trademark Image Retrieval Based on Scale, Rotation, Translation, Invariant Features,” was published in the proceedings of the 2013 IEEE RIVF International Conference on Computing and Communication Technologies, held from November 10-13, 2013.

This thesis is structured into several key chapters: Chapter 1 serves as the introduction, while Chapter 2 reviews related work. Chapter 3 discusses the background of relevant problems, and Chapter 4 details the proposed method. Chapter 5 describes the installation of Visual Studio 2010 with OpenCV 2.4.2 on Windows 7 and presents the experimental results. Finally, Chapter 6 concludes the thesis, and the Appendix includes the complete source code for easy reference.

Related work

In recent years, researchers have proposed various solutions to reduce the workload of trademark registration offices. Chen, Sun, and Yang suggested a two-step method for computing feature vectors, which involves partitioning an object region from a rotated image into 16 equal sections and constructing an entropy vector by calculating the information entropy of each section. This shape-based retrieval technique demonstrates strong performance, with good invariance to rotation, translation, scale, noise, and thickness, while also satisfying human visual perception. However, a single-feature retrieval system does not adequately capture the many facets of trademark similarity. To enhance this approach, other methods have been explored in which single features such as Zernike moments and invariant moments [3, 5, 6] are combined with other features.

Experimental results presented in these works show that the Zernike moment method demonstrates consistent performance and robust invariance to translation, rotation, and scale, along with low sensitivity to noise, making it effective for trademark image retrieval. However, existing methods face challenges in accurately retrieving images with geometric deformations and aligning with human visual perception. In contrast, a new approach utilizing cosine distance and normalized distance measures offers improved accuracy by normalizing feature vectors and integrating both global and local features. This method not only streamlines the trademark image database search process but also enhances similarity calculations, ultimately yielding better retrieval results. Additionally, the incorporation of four shape features, including invariant moments and entropy histograms, further strengthens the proposed retrieval framework.

Recent advancements have integrated nine feature vectors from low-order color moments in the HSV color space with low-order Hu moments and eccentricity, as derived from gray shape-region images using Rui and Huang's technique (1998). These features undergo Gauss normalization, allowing for flexible adjustment of their weights. Experimental results demonstrate that this multi-feature combination approach outperforms single-feature methods. Additionally, ten invariant moments have been utilized as shape features for trademark images, enhanced through an ensemble of RBF neural networks trained to minimize localized generalization error for trademark recognition. Currently, that system processes black-and-white trademark images, with plans to expand its capabilities to include color trademark images in future research.

In [2, 7], the proposed feature combinations differ considerably, yet each is reported to perform well.

Both [4] and [2] begin with the same step of partitioning the object region into equidistant concentric circles, but they differ in the subsequent phase. Study [4] derives a feature vector F from the Zernike moments of the corresponding regions, whereas study [2] integrates 200 region feature vectors with contour features identified through the enhanced SUSAN algorithm of Hong and Jiang [15]. Additionally, Iwanaga et al. [7] introduced a modified angle-distance pair-wise histogram, which combines angle and distance histograms of trademark objects. This system surpasses traditional moment-based and independent histograms in performance, as evidenced by experiments on registered trademark databases, and it is both robust and easy to construct.

The integration of Zernike moments (ZM) with Scale-Invariant Feature Transform (SIFT) features represents a cutting-edge approach in image retrieval. This method begins by extracting and sorting Zernike moments from the retrieved images based on similarity, forming candidate images. Subsequently, SIFT features are utilized to match the query image with these candidates accurately. This technique not only maintains high precision and recall compared to single Zernike moments but also enhances retrieval speed over using SIFT features alone. Its effectiveness makes it particularly suitable for trademark image retrieval systems, significantly improving performance. Additionally, Tuan N.G et al. introduced a novel method leveraging the discriminative properties of trademark images to enhance text recognition accuracy, demonstrating notable improvements over traditional methods. This contribution addresses an essential aspect of trademark image recognition.

Current approaches to trademark comparison overlook both partial and mirrored trademarks, focusing solely on original trademarks or noise-free standard databases. These methods treat trademark images as complete objects, neglecting the intricate visual details that contribute to partial similarity detection. Additionally, calculating the distance between features is crucial for assessing similarity levels among images. Consequently, each proposed solution aims to establish a more suitable measurement for trademark similarity.

To address the limitations of existing methods, a novel content-based trademark recognition approach is introduced, comprising four key stages: first, trademark images are pre-processed and converted into binary format; second, dominant shape objects are extracted from these binary images; third, the RBRC algorithm is utilized to derive rotation-invariant, scale-invariant, and translation-invariant features from the shape objects; finally, Euclidean distance is employed to assess the similarity between images, enabling the retrieval of the ten most similar trademark images to a given query. This study specifically targets the Vietnamese composite-mark database.

Background

Pre-processing

Converting gray scale to binary image

Segmentation is the process of dividing an image into distinct regions or contours that correspond to different objects. This is typically achieved by identifying shared properties within regions or by recognizing differences between them. A fundamental property that can be utilized for segmentation is intensity, making thresholding, a method that separates light and dark areas, a natural approach for segmenting these regions.

Thresholding transforms grey-level images into binary images by setting all pixels below a specified threshold to zero and all pixels above that threshold to one. If \( g(x, y) \) represents the thresholded version of \( f(x, y) \) at a global threshold \( T \), then

\[ g(x, y) = \begin{cases} 1 & \text{if } f(x, y) \ge T, \\ 0 & \text{otherwise,} \end{cases} \]

which effectively delineates the areas of interest in the image.
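As a concrete example, this fixed global thresholding can be performed with OpenCV's threshold function; the following is a minimal sketch, and the threshold value 128 and the function name are arbitrary illustrations rather than values used in the thesis.

#include <opencv2/opencv.hpp>
using namespace cv;

// Minimal sketch: fixed global thresholding of a grey-level image.
Mat globalThreshold(const Mat &grey, double T = 128)
{
    Mat binary;
    // Pixels with intensity above T become 255 (treated as 1); the rest become 0.
    threshold(grey, binary, T, 255, THRESH_BINARY);
    return binary;
}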

Thresholding primarily focuses on pixel intensity without considering the relationships between pixels, leading to challenges in accurately identifying contiguous regions. This method may include extraneous pixels not belonging to the target area while also missing isolated pixels, particularly near boundaries. The accuracy of thresholding diminishes with increased noise, as pixel intensity may not reflect the true characteristics of the region. Consequently, users often struggle to find the right balance, risking the loss of significant areas or the inclusion of unwanted background pixels. Additionally, shadows can complicate the process, often being misclassified as part of lighter or darker objects, further complicating accurate segmentation.

Global thresholding can be problematic due to variations in illumination across a scene, leading to certain areas appearing brighter while others remain in shadow. These lighting changes can obscure the true characteristics of the objects within the image, making it difficult to achieve accurate segmentation.

To address uneven illumination in images, we can implement local thresholds that adapt across the image rather than relying on a single global threshold. This approach allows for a smoother variation of the threshold, improving the segmentation result.

Automated methods for finding thresholds

To establish a global threshold or adjust a local threshold for a specific area, we typically analyze the histogram to identify distinct modes representing the foreground and background. It is important to remember that a histogram can be treated as a probability distribution:

\[ p(g) = \frac{n_g}{n}, \]

that is, the number of pixels \( n_g \) having greyscale intensity \( g \) as a fraction of the total number of pixels \( n \).

To effectively identify an object that is brighter than its background and occupies a specific fraction \( 1/p \) of an image, we can establish a threshold by determining the intensity level at which the desired fraction of the image pixels falls below that value. This threshold can be easily obtained from the cumulative histogram of the image,

\[ c(g) = \frac{1}{n} \sum_{g'=0}^{g} n_{g'}. \]

Simply set the threshold \( T \) such that \( c(T) = 1/p \). (Or, if we are looking for a dark object on a light background, set \( c(T) = 1 - 1/p \).)
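A minimal sketch of this p-tile rule, assuming an 8-bit single-channel image; the function name is illustrative, and the caller passes whichever fraction (1/p or 1 - 1/p) applies to the object being sought.

#include <opencv2/opencv.hpp>
using namespace cv;

// Find the smallest T whose cumulative histogram value c(T) reaches targetFraction.
int ptileThreshold(const Mat &grey, double targetFraction)
{
    int histSize = 256;
    int channels[] = {0};
    float range[] = {0, 256};
    const float *ranges[] = {range};
    Mat hist;
    calcHist(&grey, 1, channels, Mat(), hist, 1, &histSize, ranges);

    double total = (double)grey.total(), cum = 0.0;
    for (int t = 0; t < histSize; t++) {
        cum += hist.at<float>(t);
        if (cum / total >= targetFraction)
            return t;                     // c(t) just reached the desired fraction
    }
    return histSize - 1;
}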

A straightforward method for determining an appropriate threshold involves identifying the local maxima (modes) in a histogram and locating the valleys (minima) between them. However, this approach has two significant drawbacks: the presence of noise in the histogram can lead to numerous local minima and maxima, and when two separate distributions are combined, the resulting histogram may not exhibit clearly distinct modes. To address the noise issue, histograms are typically smoothed before being analyzed for separate modes.

Thresholding in image processing involves two groups of pixels with overlapping value ranges, making classification challenging. The goal is to minimize the misclassification of background pixels as foreground and vice versa. This is achieved by reducing the area under the histogram that falls on the opposite side of the threshold for each region.

The challenge we face is the absence of individual histograms for each region, as we only have access to a combined histogram. It is important to note that the point of minimum overlap, where the misclassified areas of the distributions are equal, does not necessarily align with the valley in the combined histogram. This discrepancy can occur, for instance, when one cluster has a wide distribution while the other has a narrow one. To address this issue, we can conceptualize the values in the two regions as two distinct clusters. Specifically, let \( \mu_B(T) \) represent the mean of all pixels below a certain threshold, and \( \mu_O(T) \) denote the mean of all pixels above that threshold. Our objective is to find a threshold that lies midway between the two cluster means:

\[ T = \frac{\mu_B(T) + \mu_O(T)}{2}. \]

The algorithm begins by estimating the background mean, \( \mu_B(T) \), using the average of the four corner pixels, while \( \mu_O(T) \) is calculated as the average of the remaining pixels. A threshold is then established midway between \( \mu_B(T) \) and \( \mu_O(T) \) to categorize pixels based on their intensity proximity to these means. The estimates for \( \mu_B(T) \) and \( \mu_O(T) \) are subsequently updated by computing the means of the pixels on either side of the threshold. This iterative process continues until convergence is achieved. Although this method is effective when the distributions have similar spreads, it struggles with scenarios where the variances of the distributions differ significantly.
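A minimal sketch of this iterative scheme, assuming an 8-bit single-channel cv::Mat; the corner-pixel initialization follows the description above, while the function name and the 0.5 intensity-level convergence tolerance are illustrative assumptions.

#include <opencv2/opencv.hpp>
#include <cmath>
using namespace cv;

// Iterative threshold selection: initialize the background mean from the four corner
// pixels and the object mean from the remaining pixels, then alternate between
// classifying pixels and re-estimating the two means until the threshold settles.
double iterativeThreshold(const Mat &grey)
{
    double muB = (grey.at<uchar>(0, 0) + grey.at<uchar>(0, grey.cols - 1) +
                  grey.at<uchar>(grey.rows - 1, 0) +
                  grey.at<uchar>(grey.rows - 1, grey.cols - 1)) / 4.0;
    double muO = (mean(grey)[0] * grey.total() - 4.0 * muB) / (grey.total() - 4.0);
    double T = 0.5 * (muB + muO), prevT = -1.0;

    while (std::fabs(T - prevT) >= 0.5) {       // assumed convergence tolerance
        prevT = T;
        double sumB = 0, sumO = 0;
        int nB = 0, nO = 0;
        for (int y = 0; y < grey.rows; y++)
            for (int x = 0; x < grey.cols; x++) {
                uchar v = grey.at<uchar>(y, x);
                if (v < T) { sumB += v; nB++; } else { sumO += v; nO++; }
            }
        if (nB) muB = sumB / nB;
        if (nO) muO = sumO / nO;
        T = 0.5 * (muB + muO);                  // threshold midway between the means
    }
    return T;
}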

To achieve optimal clustering results, it is essential to set a threshold that minimizes overlap between clusters, thereby ensuring they are as tight as possible. While we cannot alter the underlying distributions, we can adjust the threshold that separates the clusters. Adjusting the threshold affects the spread of each cluster; increasing one will decrease the other. The objective is to identify the threshold that minimizes the combined spread of the clusters, which can be quantified through the within-class variance, defined as the weighted sum of the variances of each individual cluster:

\[ \sigma^2_W(T) = n_B(T)\,\sigma^2_B(T) + n_O(T)\,\sigma^2_O(T), \]

where \( n_B(T) \) and \( n_O(T) \) are the fractions of pixels below and above the threshold, \( \sigma^2_B(T) \) is the variance of the pixels in the background (below the threshold), \( \sigma^2_O(T) \) is the variance of the pixels in the foreground (above the threshold), and \( [0, N-1] \) is the range of intensity levels.

Calculating the within-class variance for each class at every possible threshold can be computationally intensive. However, a simpler approach involves subtracting the within-class variance from the total variance of the combined distribution, resulting in what is known as the between-class variance.

With \( \sigma^2 \) denoting the combined variance and \( \mu \) the combined mean, the between-class variance is

\[ \sigma^2_{\mathrm{between}}(T) = \sigma^2 - \sigma^2_W(T) = n_B(T)\,[\mu_B(T) - \mu]^2 + n_O(T)\,[\mu_O(T) - \mu]^2. \]

It is important to note that the between-class variance is simply the weighted variance of the cluster means about the overall mean. Substituting \( \mu = n_B(T)\,\mu_B(T) + n_O(T)\,\mu_O(T) \) and simplifying, we obtain

\[ \sigma^2_{\mathrm{between}}(T) = n_B(T)\,n_O(T)\,[\mu_B(T) - \mu_O(T)]^2. \]

So, for each potential threshold \( T \) we:

  • Separate the pixels into two clusters according to the threshold.

  • Find the mean of each cluster.

  • Square the difference between the means.

  • Multiply by the number of pixels in one cluster times the number in the other.

The optimal threshold for distinguishing the two clusters therefore depends only on the difference between their means (weighted by the cluster sizes), eliminating the need to compute differences between individual intensities and the cluster means. This threshold maximizes the between-class variance, which is equivalent to minimizing the within-class variance.
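A minimal sketch of this maximization over all candidate thresholds, assuming an 8-bit single-channel image. In the thesis implementation Otsu's method is simply invoked through OpenCV's THRESH_OTSU flag (see the appendix code); the loop below is only meant to make the between-class-variance computation concrete, and the function name is illustrative.

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Otsu's method: pick the threshold T that maximizes the between-class variance
// n_B(T) * n_O(T) * (mu_B(T) - mu_O(T))^2.
int otsuThreshold(const Mat &grey)
{
    // Histogram of the 256 intensity levels
    std::vector<double> hist(256, 0.0);
    for (int y = 0; y < grey.rows; y++)
        for (int x = 0; x < grey.cols; x++)
            hist[grey.at<uchar>(y, x)]++;

    double total = (double)grey.total(), sumAll = 0;
    for (int i = 0; i < 256; i++) sumAll += i * hist[i];

    double sumB = 0, wB = 0, bestVar = -1;
    int bestT = 0;
    for (int t = 0; t < 256; t++) {
        wB += hist[t];                       // pixels at or below t
        if (wB == 0) continue;
        double wO = total - wB;              // pixels above t
        if (wO == 0) break;
        sumB += t * hist[t];
        double muB = sumB / wB;
        double muO = (sumAll - sumB) / wO;
        double betweenVar = wB * wO * (muB - muO) * (muB - muO);
        if (betweenVar > bestVar) { bestVar = betweenVar; bestT = t; }
    }
    return bestT;
}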

Object description

Objects in images are represented as collections of pixels, necessitating the description of pixel group properties through numerical descriptors for effective recognition. For these descriptors to be useful, they must fulfill four key properties: completeness, congruence, invariance, and compactness. Completeness ensures that two objects share the same descriptors only if they have identical shapes, while congruence allows for the recognition of similar objects through similar descriptors. Invariance is crucial, particularly rotation-invariant descriptors, which aid in recognizing objects regardless of orientation, scale, or viewpoint changes. Lastly, descriptors should be compact, conveying the essence of an object efficiently without unnecessary information. Despite the absence of a universal set of complete and compact descriptors for all objects, optimal recognition performance is achieved through the careful selection of properties tailored to specific applications. Object characterization can be achieved through region descriptors, which focus on pixel arrangements within an area, and shape descriptors, which emphasize pixel arrangements along the perimeter, a distinction commonly utilized in image analysis. Techniques such as region growing and differentiation are employed to derive descriptors for an object's boundary.

Feature vectors extraction

The Fourier Transform decomposes an image into its sine and cosine components, effectively transforming it from the spatial domain to the frequency domain. This allows any function to be represented as a sum of (possibly infinitely many) sine and cosine functions, making it a powerful tool for image analysis. The two-dimensional Fourier Transform applies this concept to images, revealing their frequency characteristics.

In image processing, the spatial domain is represented by the image value f, while its frequency domain counterpart is denoted by F; the transformation produces complex numbers. This complex data can be visualized either through a real image and a complex image or through magnitude and phase images. However, the magnitude image is primarily significant, as it holds essential information regarding the geometric structure of the image. If modifications are made to the image in these forms and a re-transformation is required, it is crucial to preserve both the magnitude and phase components.

The log-polar transform is a method for converting two-dimensional images from Cartesian to polar coordinates, represented as \( (x, y) \leftrightarrow r e^{i\theta} \), where \( r = \sqrt{x^2 + y^2} \) and \( \theta = \arctan(y/x) \). To express these polar coordinates relative to a specified center point \( (x_c, y_c) \), the transformation involves taking the logarithm, resulting in \( \rho = \log\sqrt{(x - x_c)^2 + (y - y_c)^2} \) and \( \theta = \arctan\big((y - y_c)/(x - x_c)\big) \). In image processing, a scaling factor \( m \) is often applied to \( \rho \) to optimize the representation of significant features within the constraints of available image memory. Figure 2 illustrates a square object alongside its log-polar space encoding.

Fig 2 The log-polar transform maps (𝐱, 𝐲) into (𝐥𝐨𝐠(𝐫), 𝛉)

The log-polar transform enables the creation of two-dimensional invariant representations of object views by centering the transformed image's center of mass at a fixed point in the log-polar plane. This method effectively addresses the challenge of recognizing shapes, such as squares, that vary in size and orientation. In the log-polar plane, size discrepancies are represented as shifts along the log(r) axis, while rotation differences are reflected as shifts along the θ-axis. By recentering the transformed squares to a specific position in the log-polar plane, all squares can be represented identically, achieving invariance to two-dimensional rotation and scaling.

Fig 3 Log-polar transform of rotated and scaled squares: size goes to a shift on the 𝐥𝐨𝐠(𝐫) axis and rotation to a shift on the 𝛉 − 𝐚𝐱𝐢𝐬
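To make the mapping concrete, the following is a minimal sketch of a log-polar resampling for an 8-bit single-channel image about its centre; the output size, nearest-neighbour sampling, and function name are illustrative simplifications (OpenCV's cvLogPolar, used in the appendix code, performs the same job with interpolation).

#include <opencv2/opencv.hpp>
#include <cmath>
using namespace cv;

// Each output cell (theta, rho) samples the Cartesian pixel at
// x = xc + exp(rho)*cos(theta), y = yc + exp(rho)*sin(theta).
Mat logPolarSketch(const Mat &src, int nRho = 256, int nTheta = 360)
{
    Mat dst(nTheta, nRho, CV_8UC1, Scalar::all(0));
    Point2f c(src.cols / 2.0f, src.rows / 2.0f);
    double rhoMax = std::log(std::sqrt((double)c.x * c.x + (double)c.y * c.y));

    for (int t = 0; t < nTheta; t++) {
        double theta = 2.0 * CV_PI * t / nTheta;
        for (int r = 0; r < nRho; r++) {
            double rho = rhoMax * r / nRho;          // log of the radius
            double radius = std::exp(rho);
            int x = cvRound(c.x + radius * std::cos(theta));
            int y = cvRound(c.y + radius * std::sin(theta));
            if (x >= 0 && x < src.cols && y >= 0 && y < src.rows)
                dst.at<uchar>(t, r) = src.at<uchar>(y, x);
        }
    }
    return dst;
}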

Measure of similarity

In the realm of digital multimedia, content-based image retrieval (CBIR) involves creating a database of images, each represented by feature vectors derived from color, shape, and texture. When a user submits a query image, the system calculates similarity measures between the query and the stored images to identify and display the most similar results.

To determine the proximity of one vector to another, a suitable similarity measurement must be chosen. This problem can be transformed into calculating the discrepancy between two vectors, \( x \) and \( y \), in a d-dimensional space \( R^d \). Various distance measurements can be employed to compute this discrepancy, enabling the comparison of vector similarities.

The Euclidean distance between \( x, y \in R^d \) is computed by

\[ \delta(x, y) = \lVert x - y \rVert = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2}. \]

A similar measurement called the city-block distance, which takes fewer operations, is computed by

\[ \tau_1(x, y) = \lVert x - y \rVert_1 = \sum_{j=1}^{d} \lvert x_j - y_j \rvert. \quad (21) \]

Another distance measurement, called the supremum norm, is computed by

\[ \tau_\infty(x, y) = \lVert x - y \rVert_\infty = \max_{1 \le j \le d} \lvert x_j - y_j \rvert. \]

The Mahalanobis distance between two vectors \( x \) and \( y \) with respect to the training patterns \( \{x_i\} \) is computed by

\[ \delta_2(x, y) = (x - y)^t S^{-1} (x - y), \quad (23) \]

where the mean vector \( u \) and the sample covariance matrix \( S \) of the sample \( \{x_i \mid 1 \le i \le n\} \) of size \( n \) are computed by

\[ S = \frac{1}{n} \sum_{i=1}^{n} (x_i - u)(x_i - u)^t \quad \text{with} \quad u = \frac{1}{n} \sum_{i=1}^{n} x_i. \]

The chord distance between two vectors \( x \) and \( y \) measures the distance between the projections of \( x \) and \( y \) onto the unit sphere, and can be computed by

\[ \delta_3(x, y) = \left\lVert \frac{x}{\lVert x \rVert} - \frac{y}{\lVert y \rVert} \right\rVert. \]

A simple computation leads to \( \delta_3(x, y) = 2\sin(\alpha/2) \), with \( \alpha \) being the angle between vectors \( x \) and \( y \).

A similar measurement is based directly on the angle between vectors \( x \) and \( y \), defined as \( \alpha = \arccos\big(\langle x, y \rangle / (\lVert x \rVert\,\lVert y \rVert)\big) \).
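A minimal sketch of three of these measures on std::vector<double> inputs; the function names are illustrative, and the two vectors are assumed to have the same dimension d.

#include <cmath>
#include <vector>

// Euclidean (L2) distance
double euclidean(const std::vector<double> &x, const std::vector<double> &y)
{
    double s = 0;
    for (size_t j = 0; j < x.size(); j++) s += (x[j] - y[j]) * (x[j] - y[j]);
    return std::sqrt(s);
}

// City-block (L1) distance, eq. (21)
double cityblock(const std::vector<double> &x, const std::vector<double> &y)
{
    double s = 0;
    for (size_t j = 0; j < x.size(); j++) s += std::fabs(x[j] - y[j]);
    return s;
}

// Chord distance: Euclidean distance between the vectors projected onto the unit sphere
double chord(const std::vector<double> &x, const std::vector<double> &y)
{
    double nx = 0, ny = 0;
    for (size_t j = 0; j < x.size(); j++) { nx += x[j] * x[j]; ny += y[j] * y[j]; }
    nx = std::sqrt(nx); ny = std::sqrt(ny);
    double s = 0;
    for (size_t j = 0; j < x.size(); j++) {
        double d = x[j] / nx - y[j] / ny;
        s += d * d;
    }
    return std::sqrt(s);
}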

Proposed method

Pre-processing

In the initial stage of image processing, images are resized so that the shorter side is 300 pixels and converted to grayscale. Subsequently, Otsu's method is applied to convert these images into binary trademarks, effectively minimizing the weighted within-class variance while maximizing the between-class variance. Otsu's algorithm is one of several automated thresholding techniques; others include finding peaks and valleys, clustering (K-means variation), mixture modeling, and multispectral thresholding.

Visual shape objects extraction

In this stage, connected contours are extracted from the binary image using Suzuki's algorithm, with each detected contour represented as a vector of points. These vectors are organized into a hierarchical structure that provides information about the image's topology. The texture of the image influences the number of contours identified.

There are four options for contour retrieval: (i) extracting only the extreme outer contours, (ii) obtaining all contours without hierarchical relationships, (iii) organizing all contours into a two-level hierarchy, and (iv) retrieving all contours while reconstructing a comprehensive hierarchy of nested contours.
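These four options correspond to OpenCV's contour retrieval flags (CV_RETR_EXTERNAL, CV_RETR_LIST, CV_RETR_CCOMP, CV_RETR_TREE); the following minimal sketch uses the OpenCV 2.4 C++ API, and the function name is illustrative (the thesis appendix uses CV_RETR_LIST).

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Extract contours from a binary image with one of the four retrieval modes:
//   CV_RETR_EXTERNAL - only the extreme outer contours
//   CV_RETR_LIST     - all contours, no hierarchy
//   CV_RETR_CCOMP    - all contours, two-level hierarchy
//   CV_RETR_TREE     - all contours, full nested hierarchy
void extractContours(const Mat &binary, int mode,
                     std::vector<std::vector<Point> > &contours,
                     std::vector<Vec4i> &hierarchy)
{
    Mat work = binary.clone();   // findContours modifies its input in OpenCV 2.4
    findContours(work, contours, hierarchy, mode, CV_CHAIN_APPROX_SIMPLE);
}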

In this research, we focused on extracting shape objects from binary trademark images, addressing the challenge of noise that often results in the extraction of unwanted contours. To mitigate this issue, we implemented a filtering process that effectively removes noise contours, as dominant shape contours typically have a significantly larger area. Our analysis revealed that most trademarks in our database feature one or two key shape objects that are vital to a company's identity. Consequently, we developed an algorithm designed to extract up to two dominant shape objects from a binary image, consisting of four main steps and a function called FilterContours. This function operates based on two thresholds, T1 = 3.82 and T2 = 81707, identified through experimental analysis of the trademark images in our database.

Algorithm: FilterContours(List of Contours)

1. Compute each contour area using the Green formula presented in [29].

2. Sort the extracted contours in descending order of contour area.

3. Remove noise shape contours from the trademark image; keep only the 2nd and 3rd biggest-area contours.

4. Remove one of the kept contours with the FilterContours function:

4.1. If the area of the 2nd contour is more than T1 times bigger than that of the 3rd contour and the area of the 2nd contour is less than T2, then the 3rd contour is removed and the 2nd contour is retained.

4.2. Else, if the area of the 2nd contour is greater than T2, then the 2nd contour is deleted and the 3rd contour is kept.

4.3. Else, both contours are maintained.

Return (one or two contours)
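A minimal C++ sketch of the decision logic above. Contour areas are computed with OpenCV's contourArea (which implements the Green formula the thesis cites as [29]); the sorted-area indexing of the 2nd and 3rd largest contours is an assumption about the implementation, and the handling of fewer than three contours is added only so the sketch is self-contained.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>
using namespace cv;

static bool byAreaDesc(const std::vector<Point> &a, const std::vector<Point> &b)
{
    return contourArea(a) > contourArea(b);
}

// Keep one or two dominant shape contours according to the thresholds above.
std::vector<std::vector<Point> > filterContours(std::vector<std::vector<Point> > contours,
                                                double T1 = 3.82, double T2 = 81707)
{
    std::sort(contours.begin(), contours.end(), byAreaDesc);   // step 2: sort by area, descending

    std::vector<std::vector<Point> > kept;
    if (contours.size() < 3) {                // fewer than three contours: keep what exists (assumption)
        kept = contours;
        return kept;
    }
    const std::vector<Point> &c2 = contours[1];   // 2nd largest
    const std::vector<Point> &c3 = contours[2];   // 3rd largest
    double a2 = contourArea(c2), a3 = contourArea(c3);

    if (a2 > T1 * a3 && a2 < T2)              // step 4.1: keep only the 2nd
        kept.push_back(c2);
    else if (a2 > T2)                         // step 4.2: keep only the 3rd
        kept.push_back(c3);
    else {                                    // step 4.3: keep both
        kept.push_back(c2);
        kept.push_back(c3);
    }
    return kept;
}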

Scale, rotation, translation invariant features

Each extracted shape object generates a corresponding feature vector that must remain invariant to rotation, scale, and translation in order to generalize copy actions such as duplication and resizing. This thesis employs the RBRC algorithm, which consists of three key steps: performing a two-dimensional Fourier transform (DFT), representing the magnitude of the Fourier transform in log-polar coordinates, and applying the DFT again. The RBRC algorithm ensures that our method is invariant to scale, rotation, and translation, in line with the Fourier-Mellin approach and the polar Fourier representation. Previous approaches have utilized phase correlation techniques with the polar representation to tackle the challenges posed by translated and rotated objects, which marks a significant distinction in methodology.

The RBRC algorithm employs the Discrete Fourier Transform (DFT) in its final step, whereas the Fourier-Mellin method utilizes phase correlation. The invariance of the RBRC algorithm to rotation, translation, and scale is explained through the theory of the log-polar transform (LPT) and the representation of the magnitude of the DFT in log-polar coordinates, which are summarized below for reading convenience.

The log-polar transform is a nonlinear and nonuniform sampling method used to convert an image from Cartesian coordinates \( I(x, y) \) to log-polar coordinates \( I_{LP}(\rho, \theta) \) [24].

The mathematical expression of the LPT procedure is shown below:

\[ \rho = \log\sqrt{(x - x_c)^2 + (y - y_c)^2}, \qquad \theta = \arctan\frac{y - y_c}{x - x_c}, \quad (25) \]

where \( (x_c, y_c) \) is the center pixel of the transformation in the Cartesian coordinates.

Here \( (x, y) \) denotes the sampling pixel in Cartesian coordinates, while \( (\rho, \theta) \) denotes the log-radius and angular position in log-polar coordinates. Consider a scaled and rotated image \( g(x', y') \) derived from the original image \( f(x, y) \) with scale factor \( a \) and a rotation of \( \alpha \) degrees.

In log-polar coordinates, \( f(\rho, \theta) \rightarrow g(\rho', \theta') \), we have

\[ \rho' = \rho + \log a, \qquad \theta' = \theta + \alpha. \]

These expressions show that scaling and rotation in the Cartesian domain correspond to pure translation in the log-polar domain. However, when the original image is translated by \( (x_0, y_0) \), the corresponding log-polar coordinates do not shift uniformly; the log-polar image itself changes.

The equations above indicate that even slight translations result in a modified log-polar image. To address this issue, the algorithm first applies a Fourier transform and then takes the log-polar transform of the magnitude spectrum. This ensures that the magnitudes of the Fourier transforms of the original and translated images remain invariant to translation while still capturing the effects of rotation and scaling. This is clarified below through the representation of the magnitude of the Discrete Fourier Transform (DFT) in log-polar coordinates.

When an image is rotated and translated, the rotation rotates its power spectrum by the same angle, while the magnitude of the spectrum is unaffected by the translation, as supported by various studies.

Let \( F_1(\xi, \eta) \) and \( F_2(\xi, \eta) \) be the Fourier transforms of images \( f_1(x, y) \) and \( f_2(x, y) \), respectively.

We are interested in the three cases below.

The first case: if \( f_2 \) differs from \( f_1 \) only by a displacement \( (x_0, y_0) \), i.e. \( f_2(x, y) = f_1(x - x_0, y - y_0) \), then

\[ F_2(\xi, \eta) = e^{-j2\pi(\xi x_0 + \eta y_0)}\, F_1(\xi, \eta), \]

so the two magnitude spectra are identical.

The second case: if \( f_2(x, y) \) is a translated and rotated replica of \( f_1(x, y) \) with translation \( (x_0, y_0) \) and rotation \( \theta_0 \), then

\[ f_2(x, y) = f_1(x\cos\theta_0 + y\sin\theta_0 - x_0,\; -x\sin\theta_0 + y\cos\theta_0 - y_0). \quad (34) \]

The DFTs of \( f_1 \) and \( f_2 \) are related as shown below:

\[ F_2(\xi, \eta) = e^{-j2\pi(\xi x_0 + \eta y_0)}\, F_1(\xi\cos\theta_0 + \eta\sin\theta_0,\; -\xi\sin\theta_0 + \eta\cos\theta_0). \quad (35) \]

It is supposed that \( M_1 \) and \( M_2 \) are the magnitudes of \( F_1 \) and \( F_2 \), respectively. They are related as shown:

\[ M_2(\xi, \eta) = M_1(\xi\cos\theta_0 + \eta\sin\theta_0,\; -\xi\sin\theta_0 + \eta\cos\theta_0). \quad (36) \]

The Fourier magnitude spectra are transformed to polar representation:

\[ M_1(\rho, \theta) = M_2(\rho, \theta - \theta_0), \quad (37) \]

where \( \rho \) and \( \theta \) are the radius and angle in the polar coordinate system, respectively.

In the last case: if \( f_1 \) is a translated, rotated, and scaled version of \( f_2 \) with scale factor \( a \), the Fourier magnitude spectra are transformed to log-polar representations and related by

\[ M_1(\log\rho, \theta) = M_2(\log\rho - \log a,\; \theta - \theta_0). \]

It is clear that scale, rotation, and translation of an image are represented as a translation in the frequency domain in log-polar coordinates. This translation is eliminated by applying the DFT one more time.
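A minimal sketch of the three RBRC stages, composed from standard OpenCV calls (cv::dft for the Fourier transform and the C-API cvLogPolar for the log-polar mapping, as in the appendix code); the output size and scale constants and the function names are illustrative rather than the thesis's exact parameters.

#include <opencv2/opencv.hpp>
using namespace cv;

// Magnitude of the 2D DFT of a single-channel float image.
static Mat dftMagnitude(const Mat &srcFloat)
{
    Mat planes[] = { srcFloat.clone(), Mat::zeros(srcFloat.size(), CV_32F) };
    Mat complexImg;
    merge(planes, 2, complexImg);
    dft(complexImg, complexImg);
    split(complexImg, planes);
    Mat mag;
    magnitude(planes[0], planes[1], mag);
    return mag;
}

// RBRC: DFT magnitude -> log-polar mapping -> DFT magnitude again.
// Scaling and rotation become shifts in log-polar space; the final DFT magnitude
// removes those shifts, so the result is a scale-, rotation- and
// translation-invariant feature image.
Mat rbrcFeatures(const Mat &shapeObject)
{
    Mat f;
    shapeObject.convertTo(f, CV_32F);

    Mat mag1 = dftMagnitude(f);                       // stage 1: |DFT|

    CvMat magHdr = mag1;                              // stage 2: log-polar mapping
    CvMat *lp = cvCreateMat(360, 425, CV_32FC1);
    cvLogPolar(&magHdr, lp, cvPoint2D32f(mag1.cols / 2.0f, mag1.rows / 2.0f),
               40, CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS);
    Mat logPolar = cvarrToMat(lp, true);
    cvReleaseMat(&lp);

    return dftMagnitude(logPolar);                    // stage 3: |DFT| again
}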

Fig 5 Illustration of three stages of the proposed method

Our method consists of three key stages. Initially, the query image is created by merging two images from our database. Following pre-processing, the query image is converted into a binary format. In the object extraction stage, we identify and extract the two dominant shape objects. Finally, we derive two feature vectors from these shape objects for further analysis.

Measure of similarity

An act of copying a trademark can be done by:

• One trademark is scaled, rotated, or translated from the other

• One trademark is combined from a part of the other

• One trademark is mirrored from the other

To identify copied trademark images, we develop a trademark similarity measure utilizing feature vectors. Each trademark input image is represented by one or two feature vectors. Let \( I \) and \( I' \) denote the two trademark images, with \( F_i \) and \( F'_j \) representing their respective feature vectors, where \( i = 1, 2 \) and \( j = 1, 2 \).

We propose that the degree of similarity between two trademarks, denoted \( S(I, I') \), is determined by the smallest distance between their respective feature vectors:

\[ S(I, I') = \min_{i,\,j} \, dist(F_i, F'_j). \]

To calculate this distance, we used the Euclidean distance, which provides a quantitative measure of similarity between the two feature sets.
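A minimal sketch of this similarity measure, assuming each trademark is represented by one or two RBRC feature images stored as cv::Mat of identical size; the smaller the returned distance, the more similar the trademarks. The function name is illustrative.

#include <opencv2/opencv.hpp>
#include <vector>
#include <limits>
#include <algorithm>
using namespace cv;

// S(I, I') is taken as the smallest Euclidean distance between any feature
// vector of I and any feature vector of I'.
double trademarkDistance(const std::vector<Mat> &featuresI,
                         const std::vector<Mat> &featuresIprime)
{
    double best = std::numeric_limits<double>::max();
    for (size_t i = 0; i < featuresI.size(); i++)
        for (size_t j = 0; j < featuresIprime.size(); j++)
            best = std::min(best, norm(featuresI[i], featuresIprime[j], NORM_L2));
    return best;
}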

Experiments and results

Implementation

OpenCV is an open-source library licensed under BSD that features hundreds of computer vision algorithms. It has a modular structure, which means the package includes several shared or static libraries, providing a wide range of functionalities through the available modules.

• core - a compact module defining basic data structures, including the dense multi-dimensional array Mat and basic functions used by all other modules

• imgproc - an image processing module that includes linear and non-linear image filtering, geometrical transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on

• video - a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms

• calib3d - basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction

• features2d - salient feature detectors, descriptors, and descriptor matchers

• objdetect - detection of objects and instances of the predefined classes (for example, faces, eyes, mugs, people, cars, and so on)

• highgui - an easy-to-use interface to video capturing, image and video codecs, as well as simple UI capabilities

• gpu - GPU-accelerated algorithms from different OpenCV modules

The following illustrates the installation procedure of the OpenCV library and how it can be integrated with Microsoft Visual Studio 2010.

• Download OpenCV 2.4.2 at www.sourceforge.net/projects/opencvlibrary/

The folder C:\OpenCV2.4.2 contains the source files necessary for OpenCV, which must be compiled for the Microsoft Visual Studio 2010 development environment. To facilitate this process, the CMake utility can be utilized to generate the appropriate build files for Microsoft Visual Studio 2010.

• Download and install the CMake utility from www.cmake.org/files/v2.8/cmake-2.8.2-win32-x86.exe

• Open CMake and select the source directory for the OpenCV source files, i.e. C:\OpenCV2.4.2. Select the build directory, for instance C:\OpenCV2.4.2\Build

• Once the source and build directories are selected, press the Configure button, specify the generator Microsoft Visual Studio 10, and hit Finish

After clicking the Finish button, CMake will analyze all source files and present all variables utilized within them, highlighting these variables in red initially. To resolve any issues, press the Configure button repeatedly until all red entries are eliminated.

• Press the Generate button and the build files for Microsoft Visual Studio 2010 will be generated in C:\OpenCV2.4.2\Build

• After verifying that the build files have been generated, start Microsoft Visual Studio 2010, open the solution C:\OpenCV2.4.2\Build\OpenCV.sln, and build it. If the build is successful, it generates the required OpenCV binaries and the installation is done.

Test results for exact copy actions

In this experiment, a trademark image from the testing database is utilized as a direct query without any modifications. As illustrated in Fig 7, the retrieval results showcase the five most similar trademarks, organized in descending order of similarity, based on the input query image displayed in the Query Image column.

Our method effectively identifies exact matches of trademarks within our database and also showcases visually similar trademarks from the testing database, as illustrated in Fig 7.

Fig 7 Results for exact copy tests

Test results for scaling action

In this experiment, we rescale the query image at scaling ratios of 0.7, 0.6, 0.5, and 0.4.

Figure 8 illustrates that scaling actions with a ratio greater than 0.5 are accurately identified. However, the test result for a scaling ratio of 0.4 is incorrectly recognized, as the original trademark image is absent from the five retrieved images.

Fig 8 Result for scaling tests

To assess the impact of scaling actions on recognition rates, we applied various scaling ratios to all trademark images in the testing database. The accuracy rate was determined as the ratio of correctly recognized images to the total number of tests conducted. As shown in Table I, the accuracy rate declines significantly when the scaling ratio falls below 0.5.

TABLE I ACCURACY RATE OF SCALING ACTION

We also assess our approach on a trademark that has been scaled to 0.7 and translated to the corners of the input image. This mimicry is detected with a high accuracy rate, as illustrated in Fig 9, where all query image results are accurately recognized.

Test case Query image Retrieved trademark images

Fig 9 Results for translation and scaling tests

Test results for rotating actions

To assess the effectiveness of our proposed method for trademark rotation copy actions, we rotated the query trademark by 30 degrees in each instance, and the results are illustrated in Fig 10. Notably, all cases presented were accurately recognized.

Degree Query image Retrieved trademark images

Fig 10 Results for rotation tests

Test results for mirror actions

The mirror copy actions were evaluated by vertically flipping the query trademark, with several results displayed in Fig 11. All cases presented were accurately recognized.

Test case Query image Retrieved trademark images

Fig 11 Results for mirror tests

Test results for partial copy actions

Partial copy actions involve taking a segment of a trademark and merging it with other elements to create a new trademark. To assess our proposed method for these types of actions, we generate query images that reflect the same approach. A query image is formed by incorporating at least one segment of a trademark image from the testing image database. Figure 12 illustrates the query results for various test cases related to this partial copy action.

1. Combine two parts of two trademarks: no modification to either part

2. Combine two parts of two trademarks: rotate one part 30° and rescale the other at scale ratio 0.7

3. Combine two parts of two trademarks: rotate one part 30° and horizontally flip the other

4. Combine two parts of two trademarks: vertically flip one part and rescale the other at scale ratio 0.7

5. Combine one part of a trademark with an arbitrary shape

Figure 12 illustrates that all instances of partial trademark copies have been successfully identified, with the retrieved trademarks that include elements of the corresponding query image appearing at the top of the results list.

Test case Query image Retrieved trademark images

Fig 12 Results for partial copy tests

Test results for random query trademark

Trademark Query image Retrieved trademark images

Fig 13 Results for random tests

In this experiment, we analyzed several well-known trademarks as query images. As illustrated in Fig 13, all retrieved trademarks exhibit visual similarities to their corresponding query trademarks. Notably, since the query trademark images are absent from the testing database, only those trademarks that share a similar visual shape with the query trademarks appear in the results.

Testing summary

Table II presents the accuracy percentages from the experiments conducted, detailing various actions such as exact copy, translating, rotating, scaling, and mirroring. For each action, all trademarks in the testing database were individually used as query trademarks, and the recognition accuracy was calculated as the ratio of correct recognitions to the total number of tests. The proposed method achieved an average accuracy rate of 96.8% (with a scale ratio greater than 0.5), demonstrating its robustness across different trademark copy actions and making it suitable for real-world content-based trademark recognition systems.

TABLE II RESULTS OF OUR PROPOSED METHOD

Conclusion

This thesis introduces an innovative trademark image retrieval method that enhances the process of identifying similar trademarks. The approach begins by isolating the dominant visual shapes of the input query trademark, followed by the creation of feature vectors for each shape using a scale-, rotation-, and translation-invariant algorithm. These feature vectors facilitate the calculation of similarity measures between trademarks, allowing for the retrieval of the five most similar trademark images from a predefined database. Extensive experiments were conducted to simulate various trademark copying scenarios, including exact copies, scaling, rotating, and mirroring. The results demonstrate the robustness and practical applicability of the proposed method across different types of trademark copying actions.

[1] C.K Chen, Q.Q Sun and J.Y Yang, “Binary Trademark Image Retrieval Using Region Orientation Information Entropy,” International Conference on Computational Intelligence and Security Workshops, Harbin, China, vol 167, pp 295-298, Dec 2007

[2] Z Hong and Q Jiang, “Hybrid Content-based Trademark Retrieval using Region and Contour Features,” 22nd International Conference on Advanced Information Networking and Applications Workshops, Okinawa, Japan, vol 82, pp 1163-1168, March 2008

[3] B.J Zou and M.P Umugwaneza, “Shape-based Trademark Retrieval using Cosine Distance Method,” Eighth International Conference on Intelligent Systems Design and Applications, Kaohsiung, Taiwan, vol 161, pp 498-503 (2008)

[4] L Li, D Wang and G.Cui, “Trademark Image Retrieval using Region Zernike Moments,” Second International Symposium on Intelligent Information Technology Application, Shanghai, China vol 330, pp 301-305, Dec 2008

[5] You Fu-cheng and Z Cong, “The Technique of Color and Shape-based Multi- feature Combination of TradeMark Image Retrieval,” International Conference on Multimedia Technology (ICMT), Ningbo, China, pp 1-5, Oct 2010

[6] W.W Lai, W.Y.N Wing, P.K.C Patrick and S.Y Daniel, “Trademark classification by shape using ensemble of RBFNNs,” International Conference on Machine Learning and Cybernetics (ICMLC), Qingdao, China, vol 1, pp 391-396, July 2010

[7] T Iwanaga, H Hama, T Toriu, P Tin and T.T Zin, “A Modified Histogram Approach to Trademark Image Retrieval,” International Journal of Computer Science and Network Security, vol 11, pp 56-62, April 2011

[8] K Ranjeet, R.C Tripathi and M.D Tiwari, “A Comprehensive Study on Content based Trademark,” International Journal of Computer Applications, vol 13, pp 18-

[9] V N Akriti, K.G Arpit and R.C Tripathi, “Content based Trademark Retrieval by Integrating Shape with Colour and Texture Information,” International Journal of Computer Applications, vol 22, pp 40-45, May 2011

[10] Z WANG and K.HONG, “A Novel Approach for Trademark Image Retrieval by Combining Global Features and Local Features,” Journal of Computational Information Systems, vol 8, pp 1633-1640, April 2012

[11] N Otsu, “A Threshold Selection Method from Gray-Level Histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol 9, pp 62-66, January 1979

[12] S SUZUKI and K ABE, “Topological Structural Analysis of Digitized Binary Images by Border Following,” Computer Vision, Graphics, and Image Processing, vol 30 , pp 32-46, April 1985

[13] D.L Antonella, M Giovanni and A.S James, “A Rotation and Scale Invariant Descriptor for Shape Recognition,” Proceedings of 2010 IEEE 17th International Conference on Image Processing, Hong Kong, China, vol 17, pp 257-260, September 2010

[14] Q.Q SUN, C.K CHEN and Y.J LIU, “Binary Trademark Image Retrieval Using Distance Distribution Information Entropy,” Computer Engineering and Applications, vol 43, pp 71-73, 2007

[15] S M Smith and Brady, “SUSAN - A new Approach to Low Level Image Processing,” International Journal of Computer Vision, vol 23, 1995

[16] B.J Zou, Y.Y Yao and L Zhang, “A New Algorithm for Trademark Image Retrieval Based on Sub-block of Polar Coordinates,” Entertainment Computing, Issue: ICEC, Shang Chai, China, vol 4740, pp 91-97, September 2007

[17] M , Zhou G., Geng, Wei na, “How to Adjust the Weights in Image Retrieval” The Technique of Image Retrieval, pp 211-215 2007

[18] C.S Rao, S.S Kumar and B.C Mohan, “Content Based Image Retrieval Using Extract Legendre Moment And Support Vector Machine,” The International Journal of Multimedia Its Applications, vol 2, pp 69-79, May 2010

[19] B.S Reddy and B.N Chatterji, “An FFT-Based Technique for Translation Rotation, and Scale-Invariant Image Registration,” IEEE Transactions on Image Processing, vol 5, pp 1266-1271, Aug 1996

[20] G Wolberg and Zokai, “Robust Image Registration Using Log-polar Transform,” International Conference on Image Processing, Vancouver, Canada, vol 1, pp 493-

[21] V.J Traver and F Pla, “The Log-polar Image Representation in Pattern Recognition Tasks,” Pattern Recognition and Image Analysis, vol 2652, pp 1032-

[22] J.N Sarvaiya, S Patnaik and K Kothari, “Image Registration Using Log Polar Transform and Phase Correlation to Recover Higher Scale,” Journal of Pattern Recognition Research, vol 7, pp 90-105, February 2012

[23] T.H Huy and R Goecke, “Optical Flow Estimation Using Fourier Mellin Transform,” IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, vol 78, pp 1-8 June 2008

[24] R Matungka, “Studies on Log-Polar Transform for Image Registration and Improvements Using Adaptive Sampling and Logarithmic Spiral,” Ph.D. dissertation, Electrical and Computer Engineering, Ohio State University, 2009. Available online through the OhioLINK ETD database

[25] X Guo, Z Xu, Y Lu and Y Pang, “An Application of Fourier-Mellin Transform in Image Registration,” International Conference on Computer and Information Technology, Shanghai, China, vol 5, pp 619-623, Sept 2005

[26] Y Keller, A Averbuch and M Israeli, “Pseudo-polar Based Estimation of Large Translations Rotations and Scalings in Images,” IEEE Transactions on Image Processing, vol 14, pp 12-22, Jan 2005

[27] N.G Tuan, L.T Ha and N.T.N Thanh, “Text Recognition Method for Trademark Images,” The 12th International Conference on Electronics, Information, and Communication (ICEIC 2013), vol 267, pp 201-203, 2013

[28] [Online] Available: http://www.noip.gov.vn/, November 2012

[29] [Online] Available: http://www.math24.net/greens-formula.html

[30] C.C Chen, H.T Chu, “Similarity Measurement Between Images”, Computer Software and Applications Conference, vol.2, pp 41-42,July 2005

[31] Bryan S Morse, “Lecture note Thresholding”, Brigham Young University, January,

[32] G.Bradski, A.Kaehler, “Learning OpenCV”,Sep 2008

[33] K.Ranjeet, R.C.Tripathi, M.D.Tiwari, “A Comprehensive Study on Content Based Trademark Retrieval System”, International Journal of Computer Applications, vol

[34] Open Source Computer Vision Library: http://opencv.org

Appendix

This section provides the source code for the thesis, organized into two main parts: the first part includes a variety of implemented functions, while the second part consists of the main program that utilizes these functions.

Images are scaled down so that the smaller side is 300 pixels and converted into grayscale. The images are then converted into binary trademarks using Otsu's method.

Input All trademark images in database

Idea Apply resize function in openCV

Source code

void RESIZEIMAGE(Mat &src, Mat &dst) {
    // Scale so that the smaller side becomes 300 pixels; the comparison operator
    // below was lost in the original listing and is restored here.
    double a = (double)300 / src.rows, b = (double)300 / src.cols;
    double r = src.rows < src.cols ? a : b;
    resize(src, dst, Size(), r, r, 1);
}

Using Suzuki's algorithm, we extract shape objects defined by connected contours from binary images. Observations indicate that the dominant shape contours typically occupy a significantly larger area than the noise contours.

Input Binary trademark images in previous stage

Output A number of shape object images

The presence of noise in input trademark images often leads to the extraction of multiple noise contours as shape objects. To address this issue, a filtering process is implemented to eliminate these noise contours. Given that most trademarks in our database typically feature one or two dominant shapes, we propose an algorithm designed to extract up to two dominant shape objects from a binary image. This algorithm consists of four main steps and includes a function called FilterContours, which is specifically responsible for isolating the two dominant shape objects. The operation of FilterContours relies on two critical thresholds: T1 = 3.82 and T2 = 81707.

Source code

void EXTRACTOBJ(Mat &src, Mat &obj1, Mat &obj2) {
    // The template arguments below were stripped from the original listing and are restored.
    vector<vector<Point> > contours;
    vector<Vec4i> hierarchy;
    contours.clear();
    hierarchy.clear();

    Mat bw;
    threshold(src, bw, 1, 255, THRESH_OTSU);
    findContours(bw, contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

    Scalar color((rand() & 255), (rand() & 255), (rand() & 255));
    // The loop that computes and sorts the contour areas and the FilterContours test
    // (thresholds T1 = 3.82 and T2 = 81707) were garbled in the original listing.
    // The surviving branch below handles the case where the 2nd largest contour
    // exceeds T2, so only the 3rd largest contour (index l-3 after sorting, where
    // l is the number of contours) is kept:
    {
        obj1 = Mat::zeros(src.size(), CV_8UC1);
        drawContours(obj1, contours, l-3, color, -1, 8, hierarchy, 1);
        threshold(obj1, obj1, 1, 255, THRESH_OTSU);
    }
    // In the remaining branch both dominant contours are kept: obj1 is drawn from
    // contour index l-2 and obj2 from contour index l-3, each followed by the same
    // Otsu thresholding to produce a binary shape image.
}

Scale, rotation, translation invariant features extraction

Description For each extracted shape object, a corresponding feature vector is created.

Input Each extracted shape object

Output Scale-invariant, rotation-invariant, and translation-invariant features

Idea Apply the RBRC algorithm in [13], which is composed of three steps: a two-dimensional Fourier transform (DFT), representation of the magnitude of the Fourier transform in log-polar coordinates, and a second DFT.

Source code

void DFT(Mat &src, Mat &dst) { int M = getOptimalDFTSize( src.rows ); int N = getOptimalDFTSize( src.cols );

Mat padded; copyMakeBorder(src, padded, 0, M - src.rows, 0, N - src.cols, BORDER_CONSTANT, Scalar::all(0));

Mat planes[] = {Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F)};

// The statements below were summarized as prose in the original listing; this is the
// standard OpenCV sequence they describe: merge the planes into a complex image, take
// the DFT, split it back, compute the magnitude, switch to a logarithmic scale, crop to
// even dimensions, and locate the centre for the quadrant swap that follows.
Mat complexImg; merge(planes, 2, complexImg); dft(complexImg, complexImg); split(complexImg, planes); magnitude(planes[0], planes[1], planes[0]); dst = planes[0]; dst += Scalar::all(1); log(dst, dst); dst = dst(Rect(0, 0, dst.cols & -2, dst.rows & -2)); int cx = dst.cols/2; int cy = dst.rows/2; Mat tmp;

Mat q0(dst, Rect(0, 0, cx, cy));

Mat q1(dst, Rect(cx, 0, cx, cy));

Mat q2(dst, Rect(0, cy, cx, cy));

Mat q3(dst, Rect(cx, cy, cx, cy)); q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3); q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2); normalize(dst, dst, 0, 1, CV_MINMAX);

}

void POLARTRANSFORM(Mat &src, Mat &dst) {
    CvMat src1_hdr = src;            // C-API header for src; this declaration was lost in the original listing
    CvMat *src1 = &src1_hdr;
    CvMat *dst1 = cvCreateMat(360, 425, CV_32FC1);
    cvLogPolar(src1, dst1, cvPoint2D32f(src1->cols/2, src1->rows/2),
               78, CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS);
    dst = cvarrToMat(dst1, true);
    cvReleaseMat(&dst1);             // added: release the temporary C matrix

}

void RBRC(Mat &src, Mat &dst) {
    // The body of RBRC (DFT of the shape image, log-polar transform of the magnitude,
    // then DFT again) is not reproduced in the original listing.
}

Matching by measure of similarity and retrieval Trademark Images

Description In order to recognize the copied trademark image, we derive a trademark similarity measure based on its feature vectors

Output The five most similar trademark images

After generating feature vectors, a trademark input image is represented by one or two feature vectors. Let \( I \) and \( I' \) denote the two trademark images, with \( F_i \) and \( F'_j \) representing their respective feature vectors, where \( i = 1, 2 \) and \( j = 1, 2 \). The similarity degree of the two trademarks, \( S(I, I') \), is determined by the smallest distance between the feature vectors, denoted \( dist(F_i, F'_j) \). To calculate this distance, we utilize the Euclidean distance formula.

Source code

void VECTORDISTANCE(vector<Mat> featurevectors, Mat &padded, vector<double> &distance)
{
    // The template arguments and the body of the loop were lost in the original listing;
    // the element types (Mat features, double distances) and the Euclidean comparison
    // below are assumptions consistent with the description above.
    Mat dst;
    RBRC(padded, dst);
    for (size_t i = 0; i < featurevectors.size(); i++)
        distance.push_back(norm(dst, featurevectors[i], NORM_L2));
}
