
Trademark Image Retrieval Based on Scale, Rotation, Translation Invariant Features (Master's Thesis)


Structure

  • Chapter 1: Introduction
  • Chapter 2: Related work
  • Chapter 3: Background
    • 3.1. Pre-processing
    • 3.2. Object description
    • 3.3. Feature vectors extraction
      • 3.3.1. Discrete Fourier Transform (DFT)
      • 3.3.2. Log-polar transform
    • 3.4. Measure of similarity
      • 3.4.1. Euclidean distance
      • 3.4.2. Mahalanobis distance
      • 3.4.3. Chord distance
  • Chapter 4: Proposed method
    • 4.1. Pre-processing
    • 4.2. Visual shape objects extraction
    • 4.3. Scale, rotation, translation invariant features
    • 4.4. Measure of similarity
  • Chapter 5: Experiments and results
    • 5.1. Implementation
    • 5.2. Test results for exact copy actions
    • 5.3. Test results for scaling action
    • 5.4. Test results for rotating actions
    • 5.5. Test results for mirror actions
    • 5.6. Test results for partial copy actions
    • 5.7. Test results for random query trademark
    • 5.8. Testing summary
  • Chapter 6: Conclusion

Content

Introduction

A trademark is defined as a word, design, picture, or complex symbol representing a company's product or service. There are four main types of trademarks categorized by visual complexity: word-in-mark (text only), device-mark (graphic elements), composite-mark (text and graphics), and complex-mark (intricate images).

Fig 1 Some trademark image samples

Every company or financial organization aims to have a unique and meaningful logo that reflects its identity and offers exclusive rights. An effective logo not only attracts consumers to its products or services but also enhances market viability by minimizing consumer confusion.

The expansion of global markets has led to a surge in businesses and an increasing number of trademark registrations worldwide, necessitating distinctiveness from existing trademarks. With millions of trademarks already registered and many applications pending, trademark authorities face significant challenges in processing these requests. The traditional manual classification methods used to identify similar trademarks are proving to be inefficient and labor-intensive, making it difficult for officials to determine if a trademark is duplicated or infringes on existing copyrights or licenses. This situation highlights the urgent need for automated technology to streamline the trademark registration process and enhance accuracy.

Various techniques are employed for trademark distinctness checks, with Content Based Image Retrieval (CBIR) being the most recognized method. CBIR utilizes visual features such as color, texture, and shape to identify similarities among trademarks, which can include combinations of text, graphics, and colored textures. The extraction of color features through color histogram techniques is essential, as is the consideration of shape features in CBIR applications. While numerous approaches exist for image retrieval, including improved pattern matching algorithms and broader methods that search text files, challenges remain, particularly with existing shape-based techniques for logo retrieval, which may not adequately address the complexities of trademark distinctness.

This thesis presents a novel method for trademark image retrieval that involves separating the input trademark image into dominant visual shapes. For each shape, a feature vector is generated that is invariant to scale, rotation, and translation. The similarity between two trademarks is then assessed using these feature vectors. This research is detailed in the manuscript titled "Trademark Image Retrieval Based on Scale, Rotation, Translation Invariant Features," which was published in the proceedings of the 2013 IEEE RIVF International Conference on Computing and Communication Technologies, held from November 10-13, 2013.

This thesis is structured into several key chapters: Chapter 1 introduces the thesis, while Chapter 2 reviews related works. Chapter 3 discusses the background of the relevant problems, and Chapter 4 details the proposed method. Chapter 5 outlines the installation of OpenCV 2.4.2 with Visual Studio 2010 on Windows 7, alongside the presentation of experimental results. Finally, Chapter 6 concludes the thesis, and the Appendix includes the complete source code for the reader's convenience.

Related work

In recent years, researchers have introduced various solutions to reduce the workload of trademark registration offices. Chen, Sun, and Yang [1] proposed a two-step method for computing feature vectors, where the object region from a rotated image is divided into 16 equal parts, and an entropy vector is generated by calculating the information entropy of each region. This shape-based retrieval technique demonstrates effective performance and strong invariance to rotation, translation, scale, noise, and thickness, and accords with human visual perception. However, the reliance on a single-feature retrieval system limits its ability to capture multiple aspects of similarity. To enhance this approach, alternative methods such as single-feature Zernike moments and invariant moments have been explored.

In [3, 5, 6], each of these features is combined with other features.

Experimental results show that the method discussed in [4] demonstrates consistent performance and strong invariance to translation, rotation, and scale, along with low noise sensitivity, enhancing its robustness. However, existing trademark image retrieval methods still face challenges, particularly in handling geometric deformations, accuracy, and alignment with human visual perception. In contrast, the Zernike moment method presented in [10] enables rapid trademark retrieval. A novel approach introduced in [3] utilizes cosine distance and normalized distance measures, which standardize feature vectors to unit length, ensuring invariance to in-plane scaling transformations. This method combines cosine and Euclidean distances for improved accuracy by integrating global features, such as invariant moments and eccentricity, with local features like entropy and distance histograms. It first indexes the trademark image database to narrow the search scope and reduce computation time, then calculates the total similarity of the feature vectors. Additionally, [3] explores four shape features, leveraging both global and local characteristics for enhanced retrieval performance.

Recent advancements have integrated nine feature vectors derived from low-order color moments in HSV color space with low-order Hu moments and eccentricity, utilizing Rui and Huang's (1998) technique for gray shape-region images. This approach employs Gauss normalization, allowing for flexible adjustment of feature weights. Experimental results demonstrate that this multi-feature combination outperforms single-feature methods. Additionally, ten invariant moments have been utilized as shape features for trademark images, enhanced through an ensemble of RBFNNs trained to minimize localized generalization error [6]. Currently, that system handles only black-and-white trademark images, with plans to extend it to color trademark images in future research.

In [2, 7], the proposed feature combinations are quite different, and each is reported to perform well. The equidistant partitioning of regions based on concentric circles [14] forms the first step of these methods.

The implementations of the second step differ between [4] and [2]: while [4] generates a feature vector F based on the Zernike moments (ZM) of the corresponding region, [2] integrates 200 region feature vectors with contour features derived from the enhanced SUSAN algorithm of Hong and Jiang. Iwanaga et al. [7] introduced a modified angle-distance pair-wise histogram, which utilizes both angle and distance histograms of trademark objects, demonstrating superior performance over traditional moment-based and independent histograms, including those based on angle, distance, and color. Experiments on registered trademark databases yielded impressive results, showcasing the robustness of this approach, which is also noted for its simplicity in constructing the distance-angle pair-wise histogram for trademark objects.

Another approach integrates Zernike moments with SIFT features, enhancing the accuracy and retrieval speed of trademark image searches. Initially, Zernike moments are extracted and sorted by similarity to form candidate images, followed by precise matching of the query image using SIFT features. This method outperforms traditional single-feature techniques, maintaining high precision and recall while improving retrieval efficiency. Additionally, Tuan N.G. et al. [27] introduced a method that leverages the discriminative properties of trademark images for enhanced text recognition, demonstrating significant improvements in accuracy compared to conventional methods. This advancement addresses key aspects of trademark image recognition.

Many existing approaches to trademark comparison overlook both partial and mirrored trademarks, focusing solely on original trademarks without filtering out noise elements. Additionally, these methods treat trademark images as complete objects, neglecting the intricate visual shapes within them, which hinders their ability to identify partial similarities. Moreover, accurately calculating the distance between features is crucial for assessing similarity levels among images. Consequently, each of these solutions attempts to propose a suitable measurement to address these limitations.

To address the limitations of existing methods, we propose a content-based trademark recognition approach consisting of four key stages: first, we preprocess and downscale trademark images into binary format; second, we extract dominant shape objects from these binary images; third, we apply the RBRC algorithm to obtain features that are invariant to rotation, scale, and translation; and finally, we utilize Euclidean distance to assess the similarity between images, retrieving the ten most similar trademarks to the queried image. This study specifically targets the Vietnamese composite-mark database.

Background

Pre-processing

Segmentation is the process of dividing an image into distinct regions or contours that correspond to different objects. This technique typically involves identifying shared properties within regions or, alternatively, recognizing contours by highlighting the differences between various areas.

The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding, the separation of light and dark regions.

Thresholding transforms grayscale images into binary images by converting all pixels below a specified threshold to zero and all pixels above that threshold to one. If \( g(x, y) \) represents the thresholded version of \( f(x, y) \) at a global threshold \( T \),
\[ g(x, y) = \begin{cases} 1 & \text{if } f(x, y) \ge T, \\ 0 & \text{otherwise,} \end{cases} \]
the result is a simplified binary representation that highlights significant features in the image.

Thresholding considers only pixel intensity, neglecting the relationships between pixels, which can lead to inaccuracies in identifying contiguous regions. The method often includes extraneous pixels outside the desired area while missing isolated pixels within the region, particularly near boundaries. These challenges increase with noise, as it distorts pixel intensity, making it harder to represent the true characteristics of the region. Consequently, adjusting the threshold can result in losing significant parts of the region or capturing excessive background noise. Additionally, shadows can complicate the process by being misclassified as part of dark objects against lighter backgrounds.

Global thresholding can be problematic due to varying illumination across a scene, resulting in certain areas appearing brighter while others remain in shadow. This variation in lighting can occur independently of the actual objects present in the image.

To address uneven illumination in images, we can utilize local thresholds rather than relying on a single global threshold. This approach enables the threshold to vary smoothly across different regions of the image, producing a more reliable segmentation under uneven lighting.

Automated methods for finding thresholds

To establish a global threshold or adjust a local threshold for a specific area, we typically analyze the histogram to identify distinct modes representing the foreground and background. A histogram can be treated as a probability distribution:
\[ h(g) = \frac{n_g}{n}, \]
that is, the number of pixels \( n_g \) having greyscale intensity \( g \) as a fraction of the total number of pixels \( n \).

To effectively identify an object that is brighter than its background and occupies a specific fraction \( 1/p \) of the image, we can establish a threshold by determining the intensity level at which the desired percentage of image pixels falls below this value. This threshold can be conveniently derived from the cumulative histogram of the image:
\[ c(T) = \frac{1}{n} \sum_{g=0}^{T} n_g = \sum_{g=0}^{T} h(g). \]

Simply set the threshold \( T \) such that \( c(T) = 1/p \) (or, if we are looking for a dark object on a light background, \( c(T) = 1 - 1/p \)).
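
A minimal sketch of this rule for an 8-bit grayscale image (the helper name and the quantile parameter q are ours; q is set to 1/p or 1 − 1/p as described above):

#include <opencv2/core/core.hpp>

// Find the smallest threshold T whose cumulative histogram value c(T)
// reaches a target quantile q (a sketch, not code from the thesis).
int quantileThreshold(const cv::Mat &gray, double q)
{
    CV_Assert(gray.type() == CV_8UC1);
    int hist[256] = {0};
    for (int y = 0; y < gray.rows; y++)
        for (int x = 0; x < gray.cols; x++)
            hist[gray.at<uchar>(y, x)]++;

    double n = (double)gray.total(), c = 0.0;
    for (int T = 0; T < 256; T++) {
        c += hist[T] / n;          // c(T): fraction of pixels with intensity <= T
        if (c >= q) return T;
    }
    return 255;
}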

A straightforward approach to determine an appropriate threshold involves identifying the local maxima (modes) in a histogram and locating the valleys (minima) between them. However, this method faces two significant challenges: the potential noise in the histogram can lead to multiple local minima and maxima, necessitating a smoothing process before mode detection. Additionally, when two distinct distributions are combined, the resulting histogram may not exhibit clear, separate modes.

Thresholding challenges arise from the overlap of value ranges between two groups of pixels. Our objective is to reduce the classification error of background pixels being misidentified as foreground pixels, and vice versa. To achieve this, we focus on minimizing the area under the histogram that corresponds to one region on the opposite side of the threshold.

The challenge lies in the absence of individual histograms for each region, as we only have a combined histogram. It is important to note that the point of minimum overlap, where the misclassified areas of the distributions are equal, does not always align with the valley in the combined histogram. This discrepancy can occur when one cluster exhibits a wide distribution while the other has a narrow one. To address this issue, we can treat the values in the two regions as distinct clusters. Specifically, we define \( \mu_B(T) \) as the mean of all pixels below the threshold and \( \mu_O(T) \) as the mean of all pixels above the threshold, aiming to identify a threshold \( T \) that lies midway between the two means:
\[ T = \frac{\mu_B(T) + \mu_O(T)}{2}. \]

To effectively differentiate between background and other pixels, the algorithm begins by estimating the background mean, \( \mu_B(T) \), using the average of the four corner pixels, while \( \mu_O(T) \) is calculated as the average of the remaining pixels. A threshold is then established midway between \( \mu_B(T) \) and \( \mu_O(T) \), allowing pixels to be classified by which mean their intensity is closer to. The algorithm iteratively updates the estimates of \( \mu_B(T) \) and \( \mu_O(T) \) by recalculating the means of the pixels on either side of the threshold until convergence is achieved. While this approach is effective when the distributions have similar spreads, it struggles in scenarios where the variances differ significantly.
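
A compact sketch of this iterative scheme for an 8-bit grayscale image (the helper name and the convergence tolerance are ours):

#include <opencv2/core/core.hpp>
#include <cmath>

// Iterative mean-based threshold selection as described above (a sketch).
double iterativeThreshold(const cv::Mat &g)
{
    CV_Assert(g.type() == CV_8UC1);
    // Seed the background mean from the four corner pixels, the object mean from the rest.
    double muB = (g.at<uchar>(0, 0) + g.at<uchar>(0, g.cols - 1) +
                  g.at<uchar>(g.rows - 1, 0) + g.at<uchar>(g.rows - 1, g.cols - 1)) / 4.0;
    double muO = (cv::sum(g)[0] - 4.0 * muB) / (double)(g.total() - 4);

    double T = (muB + muO) / 2.0, Tprev;
    do {
        Tprev = T;
        double sB = 0, nB = 0, sO = 0, nO = 0;
        for (int y = 0; y < g.rows; y++)
            for (int x = 0; x < g.cols; x++) {
                double v = g.at<uchar>(y, x);
                if (v < T) { sB += v; nB++; } else { sO += v; nO++; }
            }
        if (nB > 0) muB = sB / nB;            // recompute the means on each side
        if (nO > 0) muO = sO / nO;
        T = (muB + muO) / 2.0;                // new threshold midway between them
    } while (std::fabs(T - Tprev) > 0.5);     // stop at convergence
    return T;
}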

To achieve optimal clustering results, it is essential to set a threshold that tightens each cluster and reduces their overlap. While the underlying distributions remain unchanged, we can modify the separation point by adjusting the threshold. Altering the threshold affects the spread of the clusters, increasing one while decreasing the other. The objective is to find the threshold that minimizes the overall spread. This can be quantified by defining the within-class variance as the weighted sum of the variances of each cluster:

\[ \sigma_W^2(T) = n_B(T)\,\sigma_B^2(T) + n_O(T)\,\sigma_O^2(T), \]
where \( \sigma_B^2(T) \) is the variance of the pixels in the background (below threshold), \( \sigma_O^2(T) \) is the variance of the pixels in the foreground (above threshold), \( n_B(T) \) and \( n_O(T) \) are the fractions of pixels below and above the threshold, and \( [0, N-1] \) is the range of intensity levels.

Calculating the within-class variance for each class at every possible threshold can be computationally intensive. However, a simpler approach exists: subtracting the within-class variance from the total variance of the combined distribution yields the between-class variance:

\[ \sigma_{Between}^2(T) = \sigma^2 - \sigma_W^2(T) = n_B(T)\left[\mu_B(T) - \mu\right]^2 + n_O(T)\left[\mu_O(T) - \mu\right]^2, \]
where \( \sigma^2 \) is the combined variance and \( \mu \) is the combined mean. The between-class variance is thus the weighted variance of the cluster means around the overall mean. Substituting \( \mu = n_B(T)\,\mu_B(T) + n_O(T)\,\mu_O(T) \) and simplifying, we obtain
\[ \sigma_{Between}^2(T) = n_B(T)\,n_O(T)\left[\mu_B(T) - \mu_O(T)\right]^2. \]

So, for each potential threshold \( T \) we:

 • Separate the pixels into two clusters according to the threshold.

 • Find the mean of each cluster.

 • Square the difference between the means.

 • Multiply by the number of pixels in one cluster times the number in the other.

This depends only on the difference between the means of the two clusters, eliminating the need to compute individual intensity differences for each pixel. The optimal threshold is the one that maximizes the between-class variance or, equivalently, minimizes the within-class variance.
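
A direct sketch of this search over all candidate thresholds (the helper name is ours; OpenCV's THRESH_OTSU flag performs the equivalent computation):

#include <opencv2/core/core.hpp>

// Otsu's threshold: maximize n_B(T) * n_O(T) * (mu_B(T) - mu_O(T))^2 (a sketch).
int otsuThreshold(const cv::Mat &g)
{
    CV_Assert(g.type() == CV_8UC1);
    double hist[256] = {0};
    for (int y = 0; y < g.rows; y++)
        for (int x = 0; x < g.cols; x++)
            hist[g.at<uchar>(y, x)]++;
    for (int i = 0; i < 256; i++) hist[i] /= (double)g.total();  // probabilities

    double sumAll = 0.0;
    for (int i = 0; i < 256; i++) sumAll += i * hist[i];

    int best = 0;
    double bestVar = -1.0, nB = 0.0, sumB = 0.0;
    for (int T = 0; T < 256; T++) {
        nB   += hist[T];                 // weight of the cluster at or below T
        sumB += T * hist[T];
        double nO = 1.0 - nB;            // weight of the cluster above T
        if (nB <= 0.0 || nO <= 0.0) continue;
        double muB = sumB / nB, muO = (sumAll - sumB) / nO;
        double between = nB * nO * (muB - muO) * (muB - muO);
        if (between > bestVar) { bestVar = between; best = T; }
    }
    return best;
}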

Object description

In image analysis, objects are represented as collections of pixels, and their recognition relies on describing the properties of these pixel groups through object descriptors. For effective recognition, descriptors should possess four key properties: they should form a complete set, ensuring that identical shapes yield identical descriptors; they should be congruent, allowing similar objects to be recognized through similar descriptors; they should exhibit invariant properties, such as being rotation, scale, and position invariant, which is crucial for recognizing objects from different viewpoints; and finally, they should be compact, efficiently representing the unique essence of an object with minimal information. Despite the lack of a universal set of complete and compact descriptors for all objects, optimal recognition performance is achieved through the careful selection of properties tailored to specific applications. Object characterization can be achieved through two forms of descriptors: region descriptors, which focus on the arrangement of pixels within an area, and shape descriptors, which emphasize the pixel arrangement along the perimeter or boundary. Techniques such as region growing and differentiation are commonly used to identify edges and obtain object boundary descriptors.

Feature vectors extraction

The Fourier Transform decomposes an image into its sine and cosine components, effectively transforming it from the spatial domain to the frequency domain. This mathematical technique rests on the fact that any function can be precisely approximated by a sum of infinitely many sine and cosine functions, making it essential for image analysis. In the case of two-dimensional images, the Fourier Transform provides a comprehensive representation of the image's frequency components.

In image processing, the transformation of an image from its spatial domain (f) to its frequency domain (F) yields complex numbers. This transformation can be represented through either a real image and a complex image, or through magnitude and phase images. However, the magnitude image is of primary interest, as it encapsulates all essential information about the image's geometric structure. To modify the image in these forms and subsequently retransform it, however, both the magnitude and phase information must be preserved.
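
For reference, for an \( M \times N \) image \( f(x, y) \) the 2D DFT and the magnitude used throughout this thesis take the standard form
\[ F(k, l) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi\left(\frac{kx}{M} + \frac{ly}{N}\right)}, \qquad |F(k, l)| = \sqrt{\operatorname{Re}^2 F(k, l) + \operatorname{Im}^2 F(k, l)}. \]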

The log-polar transform is a method for converting two-dimensional images from Cartesian to polar coordinates, represented as \( (x, y) \leftrightarrow r e^{i\theta} \), where \( r = \sqrt{x^2 + y^2} \) and \( \theta = \arctan(y/x) \). To define the polar coordinates relative to a center point \( (x_c, y_c) \), we use logarithmic scaling, where \( \rho = \log\sqrt{(x - x_c)^2 + (y - y_c)^2} \) and \( \theta = \arctan((y - y_c)/(x - x_c)) \). This transformation is particularly useful in image processing, allowing us to manage image memory effectively by applying a scaling factor \( m \) to \( \rho \). Figure 2 illustrates a square object alongside its representation in log-polar space.

Fig 2 The log-polar transform maps (x, y) into (log(r), θ)

The log-polar transform enables the creation of two-dimensional invariant representations of object views by centering the transformed image's center of mass at a fixed point in the log-polar plane. For instance, when recognizing various shapes as "squares," the log-polar transform effectively addresses size and rotation discrepancies. Size variations in the (x, y) plane translate to shifts along the log(r) axis, while rotation differences correspond to shifts along the θ-axis in the log-polar plane. By recentering the transformed centers of these squares to a specific fixed position, all squares are represented identically in the log-polar plane, achieving invariance to two-dimensional rotation and scaling.

Fig 3 Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis

Measure of similarity

In the realm of digital multimedia, content-based image retrieval (CBIR) involves creating a database of images, each represented by feature vectors derived from color, shape, and texture. When a user submits a query image, the system calculates similarity measurements between the query and the stored images to identify and return the most similar ones.

To determine the closeness of two vectors, a suitable similarity measurement must be chosen. This involves calculating the discrepancy between the vectors \( x \) and \( y \) in the space \( \mathbb{R}^d \). Various distance measurements can be utilized for this purpose.

The Euclidean distance between \( x, y \in \mathbb{R}^d \) is computed by
\[ \tau_2(x, y) = \lVert x - y \rVert_2 = \Bigl( \sum_{j=1}^{d} (x_j - y_j)^2 \Bigr)^{1/2}. \]

A similar measurement, the city-block distance, which takes fewer operations, is computed by
\[ \tau_1(x, y) = \lVert x - y \rVert_1 = \sum_{j=1}^{d} \lvert x_j - y_j \rvert. \tag{21} \]
Another distance measurement, the supremum norm, is computed by
\[ \tau_\infty(x, y) = \lVert x - y \rVert_\infty = \max_{1 \le j \le d} \lvert x_j - y_j \rvert. \]

The Mahalanobis distance between two vectors \( x \) and \( y \) with respect to the training patterns \( \{x_i\} \) is computed by
\[ \delta_2(x, y) = (x - y)^t S^{-1} (x - y), \tag{23} \]
where the mean vector \( u \) and the sample covariance matrix \( S \) of the sample \( \{x_i \mid 1 \le i \le n\} \) of size \( n \) are computed by
\[ S = \frac{1}{n} \sum_{i=1}^{n} (x_i - u)(x_i - u)^t, \qquad u = \frac{1}{n} \sum_{i=1}^{n} x_i. \]

The chord distance between two vectors \( x \) and \( y \) measures the distance between the projections of \( x \) and \( y \) onto the unit sphere, and can be computed by
\[ \delta_3(x, y) = \left\lVert \frac{x}{\lVert x \rVert} - \frac{y}{\lVert y \rVert} \right\rVert_2 . \]
A simple computation leads to \( \delta_3(x, y) = 2\sin(\alpha/2) \), with \( \alpha \) being the angle between vectors \( x \) and \( y \). A similar measurement based on the angle between vectors \( x \) and \( y \) is defined as
\[ \delta_4(x, y) = \alpha = \arccos\frac{x \cdot y}{\lVert x \rVert\, \lVert y \rVert}. \]
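
As a small illustration, these measures map directly onto OpenCV's norm and dot-product operations (a sketch; the function names are ours):

#include <opencv2/core/core.hpp>
#include <algorithm>
#include <cmath>

// Distance measures between two feature vectors stored as CV_32F matrices (a sketch).
double euclidean(const cv::Mat &x, const cv::Mat &y) { return cv::norm(x, y, cv::NORM_L2); }
double cityblock(const cv::Mat &x, const cv::Mat &y) { return cv::norm(x, y, cv::NORM_L1); }
double supremum (const cv::Mat &x, const cv::Mat &y) { return cv::norm(x, y, cv::NORM_INF); }

// Angle alpha between x and y; the chord distance is then 2 * sin(alpha / 2).
double angleBetween(const cv::Mat &x, const cv::Mat &y)
{
    double c = x.dot(y) / (cv::norm(x) * cv::norm(y));
    return std::acos(std::max(-1.0, std::min(1.0, c)));  // clamp rounding error
}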

Proposed method

Pre-processing

In the initial stage, images are resized so that the shorter side is 300 pixels and converted to grayscale. Subsequently, Otsu's method is applied to transform these images into binary trademarks by minimizing the weighted within-class variance or, equivalently, maximizing the between-class variance. Otsu's algorithm is one of five automated thresholding approaches, alongside finding peaks and valleys, clustering (K-means variation), mixture modeling, and multispectral thresholding.
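
A minimal sketch of this stage with OpenCV 2.4 (the helper name is ours; the THRESH_OTSU flag computes the threshold exactly as derived in Chapter 3):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>

// Pre-processing sketch: shrink so the shorter side is 300 px, grayscale, Otsu binarize.
void preprocess(const cv::Mat &input, cv::Mat &binary)
{
    double r = 300.0 / std::min(input.rows, input.cols);  // factor for the shorter side
    cv::Mat shrunk, gray;
    cv::resize(input, shrunk, cv::Size(), r, r, cv::INTER_LINEAR);
    cv::cvtColor(shrunk, gray, CV_BGR2GRAY);              // assumes a BGR color input
    cv::threshold(gray, binary, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
}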

Visual shape objects extraction

In this stage, Suzuki's algorithm [12] is utilized to extract shape objects represented by connected contours from the binary image. Each detected contour is stored as a vector of points, which are then organized into a hierarchical structure that reflects the image's topology. The texture of the image plays a crucial role in determining the number of contours identified.

Four modes of contour retrieval are available: (i) extracting only the extreme outer contours, (ii) retrieving all contours without hierarchical relationships, (iii) organizing all contours into a two-level hierarchy, and (iv) reconstructing a comprehensive hierarchy of nested contours. These four modes map directly onto OpenCV's contour-retrieval flags, as the sketch below shows.
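
In OpenCV 2.4 terms (a sketch; note that findContours modifies its input image):

#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

// The four retrieval modes described above, selected by the mode flag.
void retrieveContours(cv::Mat &binary)
{
    std::vector<std::vector<cv::Point> > contours;
    std::vector<cv::Vec4i> hierarchy;  // [next, previous, first child, parent] per contour

    // (i)   CV_RETR_EXTERNAL - only the extreme outer contours
    // (ii)  CV_RETR_LIST     - all contours, no hierarchy
    // (iii) CV_RETR_CCOMP    - all contours in a two-level hierarchy
    // (iv)  CV_RETR_TREE     - the full hierarchy of nested contours
    cv::findContours(binary, contours, hierarchy,
                     CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);
}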

In this research, we focused on shape object extraction from binary trademark images. We initially extracted several shape object images; however, the presence of noise in the input images resulted in many noise contours being included. To address this issue, we implemented a filtering process to eliminate these noise contours. Our observations indicated that dominant shape contours typically occupy a significantly larger area than noise contours. Given that most trademarks in our database consist of one or two key shape objects critical to a company's reputation, we developed an algorithm designed to extract up to two dominant shape objects from a binary image. This algorithm comprises four main steps and includes a function called FilterContours, which is essential for isolating the two dominant shape objects. The FilterContours operation relies on two thresholds, T1 = 3.82 and T2 = 81707, determined through experiments conducted on each trademark image in our database.

Algorithm: FilterContours(List of Contours)

Input: List of Contours
Output: Dominant Shape Contours

Procedure:

1. Compute each contour area using the Green formula presented in [29].

2. Sort the extracted contours in descending order of contour area.

3. Remove noise shape contours in the trademark image; keep only the 2nd and 3rd biggest-area contours.

4. Remove one of the kept contours as follows:
   4.1. if (the area of the 2nd contour is T1 times bigger than that of the 3rd contour, and the area of the 2nd is less than T2) then the 3rd contour is removed and the 2nd contour is retained;
   4.2. else if (the area of the 2nd contour is greater than T2) then the 2nd contour is deleted and the 3rd contour is kept;
   4.3. else both contours are maintained.

return (One or Two Contours)
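
A compact OpenCV sketch of the procedure above (contourArea implements the Green-formula area of [29]; skipping the largest contour as the outer image border is our reading of the appendix code):

#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <vector>

typedef std::vector<cv::Point> Contour;

static bool byAreaDesc(const Contour &a, const Contour &b)
{
    return cv::contourArea(a) > cv::contourArea(b);
}

// FilterContours sketch with the thesis thresholds T1 = 3.82 and T2 = 81707.
std::vector<Contour> filterContours(std::vector<Contour> contours)
{
    const double T1 = 3.82, T2 = 81707.0;
    std::vector<Contour> kept;
    std::sort(contours.begin(), contours.end(), byAreaDesc);   // step 2
    if (contours.size() < 3) return kept;                      // need the 2nd and 3rd
    const Contour &c2 = contours[1], &c3 = contours[2];        // step 3
    double a2 = cv::contourArea(c2), a3 = cv::contourArea(c3);
    if (a2 > T1 * a3 && a2 < T2)      kept.push_back(c2);      // rule 4.1
    else if (a2 > T2)                 kept.push_back(c3);      // rule 4.2
    else { kept.push_back(c2); kept.push_back(c3); }           // rule 4.3
    return kept;
}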

Scale, rotation, translation invariant features

In this study, we generate a feature vector for each extracted shape object, ensuring its invariance to rotation, scale, and translation so as to generalize copy actions such as duplication and resizing. We employ the RBRC algorithm [13], which consists of three key steps: applying a two-dimensional Fourier transform (DFT), converting the Fourier magnitude into log-polar coordinates, and executing another DFT. This approach achieves scale, rotation, and translation invariance, in line with the Fourier-Mellin method and polar Fourier representation. While previous methods integrate phase correlation with polar representation to tackle the challenges of translated and rotated objects, the RBRC algorithm distinguishes itself by utilizing a DFT in its final step instead of phase correlation. To support the invariance of the RBRC algorithm, we discuss the log-polar transform (LPT) in relation to the DFT magnitude below.

The log-polar transform (LPT) is a nonlinear and nonuniform sampling technique that converts images from Cartesian coordinates \( I(x, y) \) to log-polar coordinates \( I_{LP}(\rho, \theta) \). This transformation is essential in image processing for better handling of scale and rotation variations. The LPT procedure can be written as
\[ \rho = \log\sqrt{(x - x_c)^2 + (y - y_c)^2}, \qquad \theta = \arctan\frac{y - y_c}{x - x_c}. \]

In this transformation, \( (x_c, y_c) \) denotes the center pixel in Cartesian coordinates, \( (x, y) \) the sampling pixel, and \( (\rho, \theta) \) the log-radius and angular position in log-polar coordinates. Now consider a scaled and rotated image \( g(x', y') \) derived from the original image \( f(x, y) \) with scale factor \( a \) and rotation by \( \alpha \) degrees.

In log-polar coordinates, \( f(\rho, \theta) \rightarrow g(\rho', \theta') \), and we have
\[ \rho' = \rho + \log a, \qquad \theta' = \theta + \alpha. \]

Scaling and rotation in the Cartesian domain thus correspond to pure translation in the log-polar domain. When the original image instead undergoes a translation by \( (x_0, y_0) \), no such simple correspondence holds in log-polar coordinates.

The equations above indicate that even slight translations produce a substantially modified log-polar image. To address this limitation, the algorithm first applies a Fourier transform, followed by a log-polar transform of the magnitude spectrum. The magnitudes of the Fourier transform are invariant to translation while still capturing the effects of rotation and scaling. These concepts are clarified below through the representation of the magnitude of the Discrete Fourier Transform (DFT) in log-polar coordinates.

The Fourier transform's rotation and translation properties indicate that the power spectrum of an image rotates in alignment with the image's orientation, while its magnitude is unchanged by translation.

Let \( F_1(\xi, \eta) \) and \( F_2(\xi, \eta) \) be the Fourier transforms of images \( f_1(x, y) \) and \( f_2(x, y) \), respectively.

We are interested in the three cases below:

The first case: if \( f_2 \) differs from \( f_1 \) only by a displacement \( (x_0, y_0) \), then
\[ F_2(\xi, \eta) = e^{-j2\pi(\xi x_0 + \eta y_0)}\, F_1(\xi, \eta), \]
so the two transforms have the same magnitude.

The second case: if \( f_2(x, y) \) is a translated and rotated replica of \( f_1(x, y) \) with translation \( (x_0, y_0) \) and rotation \( \theta_0 \), then
\[ f_2(x, y) = f_1(x\cos\theta_0 + y\sin\theta_0 - x_0,\; -x\sin\theta_0 + y\cos\theta_0 - y_0). \tag{34} \]
The DFTs of \( f_1 \) and \( f_2 \) are related as shown below:
\[ F_2(\xi, \eta) = e^{-j2\pi(\xi x_0 + \eta y_0)}\, F_1(\xi\cos\theta_0 + \eta\sin\theta_0,\; -\xi\sin\theta_0 + \eta\cos\theta_0). \]

Suppose that \( M_1 \) and \( M_2 \) are the magnitudes of \( F_1 \) and \( F_2 \), respectively. They are related as shown:
\[ M_2(\xi, \eta) = M_1(\xi\cos\theta_0 + \eta\sin\theta_0,\; -\xi\sin\theta_0 + \eta\cos\theta_0). \tag{36} \]
The Fourier magnitude spectra are transformed to polar representation:
\[ M_1(\rho, \theta) = M_2(\rho, \theta - \theta_0), \tag{37} \]
where \( \rho \) and \( \theta \) are the radius and angle in the polar coordinate system, respectively.

In the last case: if \( f_1 \) is a translated, rotated and scaled version of \( f_2 \), the Fourier magnitude spectra are transformed to log-polar representations and related by
\[ M_1(\log\rho, \theta) = M_2(\log\rho - \log a,\; \theta - \theta_0), \]
where \( a \) is the scale factor.

It is obvious that scale, rotation, and translation of the image are thus represented as a translation in the frequency domain in log-polar coordinates. This translation is eliminated by applying the DFT one more time.

Fig 5 Illustration of three stages of the proposed method

Our method consists of three key stages. First, a query image is created by merging two images from our database. Next, this image undergoes pre-processing and is converted into a binary format. Subsequently, the two dominant shape objects are extracted during the object extraction phase. Finally, feature vectors are derived from these shape objects.

Measure of similarity

An act of copying a trademark can be done in one of several ways:

 • One trademark is scaled, rotated, or translated from the other.

 • One trademark is combined from a part of the other.

 • One trademark is mirrored from the other.

To identify copied trademark images, we develop a trademark similarity measure based on feature vectors. Each trademark input image is represented by one or two feature vectors. Let \( I \) and \( I' \) denote two trademark images, with \( F_i \) and \( F'_j \) representing their respective feature vectors, where \( i = 1, 2 \) and \( j = 1, 2 \).

We propose that the degree of similarity between two trademarks, denoted \( S(I, I') \), is the smallest distance between their respective feature vectors, \( dist(F_i, F'_j) \), where \( F_i \) and \( F'_j \) belong to different images. To calculate this distance, we use the Euclidean distance, which provides a quantitative measure of similarity between the two feature vectors.
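
In symbols (a compact restatement of the rule above, with \( F'_j \) denoting the feature vectors of \( I' \)):
\[ S(I, I') = \min_{i,\, j} \, dist(F_i, F'_j), \qquad dist(F, F') = \lVert F - F' \rVert_2. \]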

Experiments and results

Implementation

OpenCV is an open-source library licensed under BSD that features hundreds of computer vision algorithms. It has a modular structure, so the package includes several shared or static libraries. The following modules are available:

 • core - a compact module defining basic data structures, including the dense multi-dimensional array Mat, and basic functions used by all other modules

 • imgproc - an image processing module that includes linear and non-linear image filtering, geometrical image transformations (resizing, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on

 • video - a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms

 • calib3d - basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction

 • features2d - salient feature detectors, descriptors, and descriptor matchers

 • objdetect - detection of objects and instances of the predefined classes (for example, faces, eyes, mugs, people, cars, and so on)

 • highgui - an easy-to-use interface to video capturing, image and video codecs, as well as simple UI capabilities

 • gpu - GPU-accelerated algorithms from different OpenCV modules

The following illustrates the installation procedure of the OpenCV library and its integration with Microsoft Visual Studio 2010.

 • Download OpenCV 2.4.2 at www.sourceforge.net/projects/opencvlibrary/

The folder C:\OpenCV2.4.2 contains the source files for OpenCV, which must be built for a specific development environment, namely Microsoft Visual Studio 2010. To generate the necessary build files for this environment, the CMake utility can be utilized.

 • Download and install the CMake utility from www.cmake.org/files/v2.8/cmake-2.8.2-win32-x86.exe

 • Open CMake and select the source directory for the OpenCV source files, i.e. C:\OpenCV2.4.2. Select the build directory, for instance C:\OpenCV2.4.2\Build

 • Once the source and build directories are selected, press the Configure button, specify the generator Microsoft Visual Studio 10, and hit Finish

After clicking the Finish button, CMake scans all source files and lists the variables utilized within them, initially highlighted in red. To resolve any issues, continue pressing Configure until all red entries are eliminated.

 • Press the Generate button, and the build files for Microsoft Visual Studio 2010 will be generated in C:\OpenCV2.4.2\Build

 • After verifying that the build files have been generated, start Microsoft Visual Studio 2010, open the solution C:\OpenCV2.4.2\Build\OpenCV.sln, and build it. If the build succeeds, it generates the required OpenCV binaries, and the setup is done.

Test results for exact copy actions

In this experiment, we utilize an unaltered trademark image from the testing trademark database as the query trademark. Figure 5 illustrates the retrieval results of our method, showcasing the five most similar trademarks retrieved in descending order of similarity based on the input query image displayed in the Query Image column.

Our method effectively identifies exact trademark matches within our database and also showcases visually similar trademarks from the testing trademark database, as illustrated in Fig 7.

Test results for scaling action

In this experiment, we rescale the query image at scaling ratios of 0.7, 0.6, 0.5, and 0.4.

Figure 8 illustrates that scaling actions above a rate of 0.5 are accurately identified. However, the test result for a scaling rate of 0.4 is incorrectly recognized, as the original trademark image is absent from the five retrieved images.

Fig 8 Result for scaling tests

To assess the impact of scaling actions on recognition rates, we applied various scaling ratios to all trademark images in the testing database. The accuracy rate was determined by calculating the ratio of correct recognitions to the total number of tests. As shown in Table I, the accuracy rate decreases significantly when the scaling ratio falls below 0.5.

TABLE I ACCURACY RATE OF SCALING ACTION

We assess our approach by analyzing a trademark that has been scaled by 0.7 and translated to the corners of the input image. This imitation process achieves a high accuracy rate, as demonstrated in Fig 9, where all query image results are accurately recognized.

Test case Query image Retrieved trademark images

Fig 9 Results for translation and scaling tests

Test results for rotating actions

To assess the effectiveness of our proposed method for trademark rotating copy actions, we rotated the query trademark by 30 degrees for each evaluation. The results, illustrated in Fig 10, demonstrate that all cases were accurately recognized.

Degree Query image Retrieved trademark images

Fig 10 Results for rotation tests

Test results for mirror actions

The mirror copy actions were evaluated by vertically flipping the query trademark, with several results illustrated in Fig 11. All cases presented were accurately recognized.

Test case Query image Retrieved trademark images

Fig 11 Results for mirror tests

Test results for partial copy actions

Partial copy actions involve taking a segment of a trademark and merging it with others to create a new trademark. To assess our proposed method for these copy actions, we generate a query image that reflects the same approach. This query image includes at least one component of a trademark from the testing image database. Figure 12 illustrates the query results for various test cases related to this partial copy action:

1. Combine two parts of two trademarks: no modification to any part.

2. Combine two parts of two trademarks: rotate one part 30° and rescale the other at scale rate 0.7.

3. Combine two parts of two trademarks: rotate one part 30° and horizontally flip the other.

4. Combine two parts of two trademarks: vertically flip one part and rescale the other at scale rate 0.7.

5. Combine one part of a trademark with an arbitrary shape.

Figure 12 illustrates that all instances of partial trademark copies are accurately identified, with the retrieved trademarks featuring elements of the queried trademark image appearing prominently at the top of the results list.

Test case Query image Retrieved trademark images

Fig 12 Results for partial copy tests

Test results for random query trademark

Trademark Query image Retrieved trademark images

Fig 13 Results for random tests

In this experiment, we gathered well-known trademarks to serve as query trademarks. As illustrated in Fig 13, all retrieved trademarks exhibit visual similarities to their corresponding query trademarks. Notably, the query trademark images are absent from the testing trademark database, so only trademarks with comparable visual shapes to the query trademarks appear in the results.

Testing summary

Table II presents the accuracy percentages of the various experiments, covering exact copy, translating, rotating, scaling, and mirroring actions. For each action, trademarks from the testing database were individually used as query trademarks, and the recognition accuracy was calculated by dividing the number of correct recognitions by the total number of tests. The proposed method achieved an average accuracy rate of 96.8% (with a scale ratio greater than 0.5), demonstrating its robustness across multiple trademark copy actions and its suitability for real-world content-based trademark recognition systems.

TABLE II RESULTS OF OUR PROPOSED METHOD

Conclusion

This thesis introduces a trademark image retrieval method that begins by isolating the dominant visual shapes of input query trademarks. Each shape is transformed into a feature vector using a scale-, rotation-, and translation-invariant algorithm, enabling accurate similarity measurement between trademarks. The retrieval process identifies the five most similar trademark images from a predefined database. Extensive experiments were conducted to simulate various trademark copying actions, including exact copying, scaling, rotating, and mirroring. The results demonstrate the robustness and practical applicability of the proposed method across these different trademark manipulation scenarios.

References

[1] C.K. Chen, Q.Q. Sun and J.Y. Yang, "Binary Trademark Image Retrieval Using Region Orientation Information Entropy," International Conference on Computational Intelligence and Security Workshops, Harbin, China, vol. 167, pp. 295-298, Dec. 2007.

[2] Z. Hong and Q. Jiang, "Hybrid Content-based Trademark Retrieval using Region and Contour Features," 22nd International Conference on Advanced Information Networking and Applications Workshops, Okinawa, Japan, vol. 82, pp. 1163-1168, March 2008.

[3] B.J. Zou and M.P. Umugwaneza, "Shape-based Trademark Retrieval using Cosine Distance Method," Eighth International Conference on Intelligent Systems Design and Applications, Kaohsiung, Taiwan, vol. 161, pp. 498-503, 2008.

[4] L. Li, D. Wang and G. Cui, "Trademark Image Retrieval using Region Zernike Moments," Second International Symposium on Intelligent Information Technology Application, Shanghai, China, vol. 330, pp. 301-305, Dec. 2008.

[5] You Fu-cheng and Z. Cong, "The Technique of Color and Shape-based Multi-feature Combination of Trademark Image Retrieval," International Conference on Multimedia Technology (ICMT), Ningbo, China, pp. 1-5, Oct. 2010.

[6] W.W. Lai, W.Y.N. Wing, P.K.C. Patrick and S.Y. Daniel, "Trademark classification by shape using ensemble of RBFNNs," International Conference on Machine Learning and Cybernetics (ICMLC), Qingdao, China, vol. 1, pp. 391-396, July 2010.

[7] T. Iwanaga, H. Hama, T. Toriu, P. Tin and T.T. Zin, "A Modified Histogram Approach to Trademark Image Retrieval," International Journal of Computer Science and Network Security, vol. 11, pp. 56-62, April 2011.

[8] K. Ranjeet, R.C. Tripathi and M.D. Tiwari, "A Comprehensive Study on Content based Trademark," International Journal of Computer Applications, vol. 13, pp. 18-.

[9] V.N. Akriti, K.G. Arpit and R.C. Tripathi, "Content based Trademark Retrieval by Integrating Shape with Colour and Texture Information," International Journal of Computer Applications, vol. 22, pp. 40-45, May 2011.

[10] Z. Wang and K. Hong, "A Novel Approach for Trademark Image Retrieval by Combining Global Features and Local Features," Journal of Computational Information Systems, vol. 8, pp. 1633-1640, April 2012.

[11] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, pp. 62-66, January 1979.

[12] S. Suzuki and K. Abe, "Topological Structural Analysis of Digitized Binary Images by Border Following," Computer Vision, Graphics, and Image Processing, vol. 30, pp. 32-46, April 1985.

[13] A. Di Lillo, G. Motta and J.A. Storer, "A Rotation and Scale Invariant Descriptor for Shape Recognition," Proceedings of the 2010 IEEE 17th International Conference on Image Processing, Hong Kong, China, pp. 257-260, September 2010.

[14] Q.Q. Sun, C.K. Chen and Y.J. Liu, "Binary Trademark Image Retrieval Using Distance Distribution Information Entropy," Computer Engineering and Applications, vol. 43, pp. 71-73, 2007.

[15] S.M. Smith and J.M. Brady, "SUSAN - A New Approach to Low Level Image Processing," International Journal of Computer Vision, vol. 23, 1995.

[16] B.J. Zou, Y.Y. Yao and L. Zhang, "A New Algorithm for Trademark Image Retrieval Based on Sub-block of Polar Coordinates," Entertainment Computing - ICEC 2007, Shanghai, China, vol. 4740, pp. 91-97, September 2007.

[17] Zhou G. and Geng Wei-na, "How to Adjust the Weights in Image Retrieval," The Technique of Image Retrieval, pp. 211-215, 2007.

[18] C.S. Rao, S.S. Kumar and B.C. Mohan, "Content Based Image Retrieval Using Exact Legendre Moments And Support Vector Machine," The International Journal of Multimedia & Its Applications, vol. 2, pp. 69-79, May 2010.

[19] B.S. Reddy and B.N. Chatterji, "An FFT-Based Technique for Translation, Rotation, and Scale-Invariant Image Registration," IEEE Transactions on Image Processing, vol. 5, pp. 1266-1271, Aug. 1996.

[20] G. Wolberg and S. Zokai, "Robust Image Registration Using Log-polar Transform," International Conference on Image Processing, Vancouver, Canada, vol. 1, pp. 493-.

[21] V.J. Traver and F. Pla, "The Log-polar Image Representation in Pattern Recognition Tasks," Pattern Recognition and Image Analysis, vol. 2652, pp. 1032-.

[22] J.N. Sarvaiya, S. Patnaik and K. Kothari, "Image Registration Using Log Polar Transform and Phase Correlation to Recover Higher Scale," Journal of Pattern Recognition Research, vol. 7, pp. 90-105, February 2012.

[23] T.H. Huy and R. Goecke, "Optical Flow Estimation Using Fourier Mellin Transform," IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, vol. 78, pp. 1-8, June 2008.

[24] R. Matungka, Ph.D. dissertation on the Log-Polar Transform for image registration, with improvements using adaptive sampling and logarithmic spirals, Ohio State University, 2009. Available online via OhioLink.

[25] X. Guo, Z. Xu, Y. Lu and Y. Pang, "An Application of Fourier-Mellin Transform in Image Registration," International Conference on Computer and Information Technology, Shanghai, China, vol. 5, pp. 619-623, Sept. 2005.

[26] Y. Keller, A. Averbuch and M. Israeli, "Pseudo-polar Based Estimation of Large Translations, Rotations and Scalings in Images," IEEE Transactions on Image Processing, vol. 14, pp. 12-22, Jan. 2005.

[27] N.G. Tuan, L.T. Ha and N.T.N. Thanh, "Text Recognition Method for Trademark Images," The 12th International Conference on Electronics, Information, and Communication (ICEIC 2013), vol. 267, pp. 201-203, 2013.

[28] [Online]. Available: http://www.noip.gov.vn/, November 2012.

[29] [Online]. Available: http://www.math24.net/greens-formula.html

[30] C.C. Chen and H.T. Chu, "Similarity Measurement Between Images," Computer Software and Applications Conference, vol. 2, pp. 41-42, July 2005.

[31] B.S. Morse, "Lecture Notes: Thresholding," Brigham Young University, January.

[32] G. Bradski and A. Kaehler, "Learning OpenCV," O'Reilly Media, Sept. 2008.

[33] K. Ranjeet, R.C. Tripathi and M.D. Tiwari, "A Comprehensive Study on Content Based Trademark Retrieval System," International Journal of Computer Applications, vol.

[34] Open Source Computer Vision Library: http://opencv.org

Appendix

This section presents the source code for the thesis, organized into two main parts: the first part includes a variety of implemented functions, while the second part comprises the main program that utilizes these functions.

Description: Images are scaled down so that the smaller side is 300 pixels and converted into grayscale. The images are then converted into binary trademarks using Otsu's method.

Input: All trademark images in the database

Idea: Apply the resize function in OpenCV

Source code (the comparison below lost its "<" during extraction; reconstructed so the smaller side maps to 300 px):

void RESIZEIMAGE(Mat &src, Mat &dst)
{
    double a = (double)300 / src.rows, b = (double)300 / src.cols;
    // The smaller side needs the larger factor; pick it so that side becomes 300 px.
    double r = src.rows < src.cols ? a : b;
    resize(src, dst, Size(), r, r, 1);
}

Description: Using Suzuki's algorithm, connected contours are extracted from binary images. Dominant shape contours typically possess significantly larger areas than noise contours.

Input: Binary trademark images from the previous stage

Output: A number of shape object images

Idea: The presence of noise in input trademark images often leads to the extraction of unwanted noise contours as shape objects. To address this issue, a filtering process is implemented to eliminate these noise contours. Our analysis of the trademark database reveals that most trademarks feature one or two dominant shapes. Consequently, we propose an algorithm designed to extract up to two dominant shape objects from a binary image. This algorithm consists of four main steps and utilizes the FilterContours function, which isolates the one or two dominant shape objects. The FilterContours operation relies on two specific thresholds: T1 = 3.82 and T2 = 81707.

Source code (parts of the original listing were garbled by extraction, including a loop header and the branch conditions; the reconstruction below follows the FilterContours rules above, with contours sorted by ascending area so that indices l-2 and l-3 hold the 2nd and 3rd largest contours, the largest being the outer image border):

static bool byAreaAsc(const vector<Point> &a, const vector<Point> &b)
{
    return contourArea(a) < contourArea(b);
}

void EXTRACTOBJ(Mat &src, Mat &obj1, Mat &obj2)
{
    vector<vector<Point> > contours;
    vector<Vec4i> hierarchy;

    Mat bw;
    threshold(src, bw, 1, 255, THRESH_OTSU);
    findContours(bw, contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

    Scalar color((rand() & 255), (rand() & 255), (rand() & 255));
    sort(contours.begin(), contours.end(), byAreaAsc);
    size_t l = contours.size();
    double a2 = contourArea(contours[l-2]);   // 2nd largest area
    double a3 = contourArea(contours[l-3]);   // 3rd largest area

    if (a2 > 3.82 * a3 && a2 < 81707)         // rule 4.1: keep only the 2nd
    {
        obj1 = Mat::zeros(bw.size(), CV_8UC1);
        drawContours(obj1, contours, (int)(l-2), color, -1, 8, hierarchy, 1);
        threshold(obj1, obj1, 1, 255, THRESH_OTSU);
    }
    else if (a2 > 81707)                      // rule 4.2: keep only the 3rd
    {
        obj1 = Mat::zeros(bw.size(), CV_8UC1);
        drawContours(obj1, contours, (int)(l-3), color, -1, 8, hierarchy, 1);
        threshold(obj1, obj1, 1, 255, THRESH_OTSU);
    }
    else                                      // rule 4.3: keep both
    {
        obj1 = Mat::zeros(bw.size(), CV_8UC1);
        drawContours(obj1, contours, (int)(l-2), color, -1, 8, hierarchy, 1);
        threshold(obj1, obj1, 1, 255, THRESH_OTSU);

        obj2 = Mat::zeros(bw.size(), CV_8UC1);
        drawContours(obj2, contours, (int)(l-3), color, -1, 8, hierarchy, 1);
        threshold(obj2, obj2, 1, 255, THRESH_OTSU);
    }
}

Scale, rotation, translation invariant features extraction

Description: For each extracted shape object, a corresponding feature vector is created.

Input: Each extracted shape object

Output: Scale-, rotation-, and translation-invariant features

Idea: Apply the RBRC algorithm in [13], which is composed of three steps: a two-dimensional Fourier transform (DFT), a log-polar transform of the Fourier magnitude, and a second DFT.

Source code (the middle of the original listing was flattened into prose by extraction; restored here following the standard OpenCV DFT example it was based on):

void DFT(Mat &src, Mat &dst)
{
    // Pad to the optimal DFT size for speed.
    int M = getOptimalDFTSize(src.rows);
    int N = getOptimalDFTSize(src.cols);
    Mat padded;
    copyMakeBorder(src, padded, 0, M - src.rows, 0, N - src.cols,
                   BORDER_CONSTANT, Scalar::all(0));

    // Real plane plus a zero imaginary plane form the complex input.
    Mat planes[] = {Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F)};
    Mat complexI;
    merge(planes, 2, complexI);
    dft(complexI, complexI);
    split(complexI, planes);

    // Magnitude spectrum on a logarithmic scale: log(1 + |F|).
    magnitude(planes[0], planes[1], planes[0]);
    dst = planes[0];
    dst += Scalar::all(1);
    log(dst, dst);

    // Crop to even dimensions and locate the center.
    dst = dst(Rect(0, 0, dst.cols & -2, dst.rows & -2));
    int cx = dst.cols / 2;
    int cy = dst.rows / 2;

    // Swap quadrants so that the origin of the spectrum is at the image center.
    Mat tmp;
    Mat q0(dst, Rect(0, 0, cx, cy));
    Mat q1(dst, Rect(cx, 0, cx, cy));
    Mat q2(dst, Rect(0, cy, cx, cy));
    Mat q3(dst, Rect(cx, cy, cx, cy));
    q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);
    q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);

    normalize(dst, dst, 0, 1, CV_MINMAX);
}

void POLARTRANSFORM(Mat &src, Mat &dst)
{
    // Wrap the input as a legacy CvMat header for the C API (no data copied).
    CvMat src1 = src;
    CvMat *dst1 = cvCreateMat(360, 425, CV_32FC1);
    // 78 is the log-polar scale factor m; outliers are filled.
    cvLogPolar(&src1, dst1, cvPoint2D32f(src.cols / 2, src.rows / 2),
               78, CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS);
    dst = cvarrToMat(dst1, true);   // deep copy back into a cv::Mat
    cvReleaseMat(&dst1);
}

void RBRC(Mat &src, Mat &dst)
{
    // The three steps described above: DFT magnitude, log-polar transform,
    // and a second DFT that removes the remaining translation.
    Mat f, lp;
    DFT(src, f);
    POLARTRANSFORM(f, lp);
    DFT(lp, dst);
}

Matching by measure of similarity and trademark image retrieval

Description: In order to recognize copied trademark images, we derive a trademark similarity measure based on feature vectors.

Output: The five most similar trademark images

After generating feature vectors, a trademark input image is represented by one or two feature vectors. Let \( I \) and \( I' \) represent two trademark images, with their corresponding feature vectors denoted as \( F_i \) and \( F'_j \), where \( i = 1, 2 \) and \( j = 1, 2 \). We propose that the similarity degree between the two trademarks, denoted \( S(I, I') \), is determined by the smallest distance between the feature vectors, one from the set \( \{F_i\} \) and one from the set \( \{F'_j\} \), represented as \( dist(F_i, F'_j) \). To compute this distance, we utilize the Euclidean distance formula.

Source code (the parameter list and loop body were truncated by extraction; the reconstruction below assumes the stored feature vectors are Mat objects and the distances are collected in a vector of doubles):

void VECTORDISTANCE(vector<Mat> &featurevectors, Mat &padded, vector<double> &distance)
{
    Mat dst;
    RBRC(padded, dst);   // feature image of the query trademark
    for (size_t i = 0; i < featurevectors.size(); i++)
        distance.push_back(norm(featurevectors[i], dst, NORM_L2));  // Euclidean distance
}
