Plant identification
Manual plant identification
In the manual approach to plant identification, botanists observe a plant and use its characteristics to identify the species with a dichotomous key expressed in technical terms.
Dichotomous keys are essential tools in botanical classification: the user answers a series of questions about the attributes of an unknown tree until its species is identified. This manual approach is commonly used in the botany community; Figure 1.1 shows a group of students engaged in plant identification. However, it is often time-consuming and requires precise answers, and the complexity of botanical terminology makes it less accessible to the general public.
Figure 1.1 Botany students identifying plants using manual approach [13].
Table 1.1 Example dichotomous key for leaves [14].
1a. Needle leaves: go to 2
1b. Non-needle leaves: go to 3
2a. Needles are clustered: Pine
2b. Needles are in singlets: Spruce
3a. Simple leaves (single leaf): go to 4
3b. Compound leaves (made of "leaflets"): go to 7
4a. Smooth edged: go to 5
4b. Jagged edge: go to 6
5a. Leaf edge is smooth: Magnolia
5b. Leaf edge is lobed: White Oak
6a. Leaf edge is small and tooth-like: Elm
6b. Leaf edge is large and thorny: Holly
7a. Leaflets attached at one single point: Chestnut
7b. Leaflets attached at multiple points: Walnut
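To make the mechanics of a dichotomous key concrete, the following minimal Python sketch encodes Table 1.1 as a lookup structure and walks it from step 1. The names `LEAF_KEY` and `identify` are illustrative assumptions, not part of any cited tool.

```python
# Table 1.1 encoded as a dictionary: each step maps an answer ("a" or "b")
# to either the next step number (int) or a species name (str).
LEAF_KEY = {
    1: {"a": 2, "b": 3},                      # needle / non-needle leaves
    2: {"a": "Pine", "b": "Spruce"},          # clustered / single needles
    3: {"a": 4, "b": 7},                      # simple / compound leaves
    4: {"a": 5, "b": 6},                      # smooth / jagged edge
    5: {"a": "Magnolia", "b": "White Oak"},   # smooth / lobed edge
    6: {"a": "Elm", "b": "Holly"},            # tooth-like / thorny edge
    7: {"a": "Chestnut", "b": "Walnut"},      # one / several attachment points
}


def identify(answers, key=LEAF_KEY, start=1):
    """Follow the key with a sequence of 'a'/'b' answers and return the species."""
    step = start
    for answer in answers:
        outcome = key[step][answer]
        if isinstance(outcome, str):
            return outcome      # a species name ends the walk
        step = outcome          # otherwise continue at the indicated step
    raise ValueError("not enough answers to reach a species")


# Non-needle, simple, jagged, small tooth-like edge
print(identify(["b", "a", "b", "a"]))
```

Every answer must be exact for the walk to end at the right species, which is one reason the manual approach is demanding for non-experts.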
Plant identification based on semi-automatic graphic tool
In the above approach, the scientific terms are difficult to remember.
Plant identification systems designed for non-botanists must prioritize user-friendliness and intuitiveness. One effective method is to use icons to represent terms, making the system accessible to a wider audience. IDAO exemplifies this approach with a Graphical User Interface (GUI) in which users describe biological features through iconic symbols. Users select specific organ features by clicking on the corresponding icons, and the system returns a list of predicted plants based on their selections. This visual and interactive method enhances the overall user experience.
The main graphical interface of IDAO features icons that represent the characteristics of leaves, fruits, and flowers. While this language-independent design enhances accessibility, it also has drawbacks: selecting suitable icons is time-consuming, and the limited number of available icons may not adequately represent all plant species.
Automated plant identification
The rapid advancement of technology, including mobile devices, digital cameras, and powerful computers, has made automated plant identification from images increasingly feasible. Recent studies have focused on extracting visual characteristics from plant images and predicting plant names using image processing, computer vision, and machine learning techniques. The input to these methods can be a single image or a set of images of various plant organs. Research by Bonnet et al. indicates that while automatic plant identification may not match expert-level accuracy, it performs significantly better than beginners in plant taxonomy. Successful applications such as Pl@ntNet and Leafsnap allow users to upload plant images and receive a list of predicted species, making the identification process simple, fast, and intuitive. Despite these advancements, challenges remain, and the accuracy of automated identification still falls short of user expectations.
Plants possess various organs, each contributing uniquely to their identification. Current methods for plant identification fall into two main types: those using images of a single organ and those using images of multiple organs. The subsequent sections analyze the methods proposed for both single-organ and multiple-organ plant identification.
Figure 1.3 Snapshots of Leafsnap (left) and Pl@ntNet (right) applications.
Automatic plant identification from images of single organ
Introducing the plant organs
Among the different organs of a plant, those most widely used for automatic plant identification are, in decreasing order of use, leaf, flower, fruit, branch, stem and entire plant.
Leaf: among the different organs of a plant, the leaf is the most widely used because leaves are usually present all year round, are numerous, are easily collected, and are usually
A review of 120 studies on plant identification reveals that 107 focus on leaf organs, while only 15 examine other plant parts such as flowers and fruits. Most research uses leaf images captured against simple backgrounds, with only 12 studies addressing complex backgrounds. This has led to the creation of extensive datasets, such as Flavia, ICL, and Swedish, which predominantly feature leaf scans. Experimental findings indicate that leaf scans yield better results than other plant organs. There are two primary types of leaves: single leaves and compound leaves, the latter consisting of multiple leaflets. However, identification from leaves remains challenging because their appearance varies with environmental factors and developmental stages.
Figure 1.4 Some types of leaves: a, b) leaves on simple and complex backgrounds of Acer pseudoplatanus L.; c) a single leaf of Cercis siliquastrum L.; d) a compound leaf of Sorbus aucuparia L.
Flowers are a prominent organ in plant identification due to their distinct appearances, which remain stable regardless of weather, plant development, or other factors. They can be categorized into two types: single flowers and inflorescences. According to botanical experts, flower images are a valuable resource for identifying plants, and experimental results indicate that flowers are often recognized more accurately than other plant organs, even against complex backgrounds. Key characteristics extracted from flower images, such as color, symmetry, petal count, and size, play a crucial role in plant identification. Two challenges nevertheless arise with the flower organ.
First, flowers usually exist for only a short time, such as a few days or a few weeks in a year, and some plants do not blossom for several years. Second, the color and 3D shape of the flowers of the same plant may vary significantly.
In addition to leaves and flowers, other plant organs such as fruits, stems, branches, and even entire plants play a role in plant identification. However, research focusing on these organs remains limited because of the challenges associated with their analysis. Studies involving stem images in particular highlight the difficulties faced in this area of research.
Figure 1.5 illustrates various types of flower inflorescences, showing how flowers are structured and interconnected on the plant. The characteristics of plant stems vary significantly with the age of the tree, as depicted in Figure 1.6, which presents examples of stems at different developmental stages. Identifying plants based solely on the stem can therefore be challenging. Branch images often include other organs such as leaves and flowers, while images of the entire plant are typically captured from a distance.
Figure 1.7 and Figure 1.8 show some examples of branch and entire images.
In automatic plant identification from images, leaves and flowers are the most commonly used organs. While stems, fruits, branches, and other plant parts can also aid identification, their lower discriminative power often means they must be combined with leaves and flowers to improve overall identification accuracy.
Figure 1.6 The visual diversity of the stem of the Crataegus monogyna Jacq.
Figure 1.7 Some examples of branch images.
Figure 1.8 The entire views for Acer pseudoplatanus L.
General model of image-based plant identification
Various methods for image-based plant identification follow a common model, which consists of three key steps: preprocessing, feature extraction, and classification/prediction.
Image-based plant species identification begins with analyzing images of plant organs. Effective image preprocessing is crucial because it enhances image quality and highlights the features relevant to subsequent analysis. This step significantly increases the likelihood of accurate identification results. Key tasks in this step include enhancing image quality, normalizing images, and segmenting them.
Figure 1.9 Fundamental steps for image-based plant species identification.
Feature extraction is the process of transforming input data into a set of features.
Feature extraction plays a crucial role in image identification systems by highlighting the unique characteristics of input images. This process not only reduces the amount of information to be handled but also enhances the distinctiveness of the features. Selecting the right features is essential for achieving high identification accuracy. For plant species representation, both handcrafted and deep-learned features can be used effectively.
Training uses analytical techniques to process data and categorize samples into distinct groups, and it frequently relies on machine learning methods. In certain deep learning approaches, feature extraction and training are integrated into a single network.
In the following sections, we analyze in detail the three main steps: preprocessing, feature extraction and training methods.
Preprocessing techniques for images of plants
Plant images commonly include additional objects or backgrounds that may obscure the plant organs of interest. To extract plant characteristics effectively, preprocessing techniques such as image segmentation are typically employed. The primary goal of image segmentation is to isolate the region of interest (ROI) from the background, a task that becomes particularly challenging with complex backgrounds. Depending on user requirements, image segmentation can be categorized into automatic segmentation, which requires no user intervention, and semi-automatic or interactive segmentation, which involves some level of user manipulation.
The study in [23] presents an automatic marker-controlled watershed segmentation method for isolating leaves from complex backgrounds. In this approach the markers are selected automatically, which improves the accuracy and efficiency of plant image segmentation.
The Otsu thresholding method, combined with erosion operations, is effective for segmenting large leaf images that are centered within the frame. An automatic segmentation approach using spectral and spatial techniques is presented in [22], but it struggles with leaves that exhibit multiple dominant colors. Additional automated segmentation techniques for complex leaves and leaflets, employing the Otsu algorithm and k-means clustering, are discussed in [11, 21]. Segmentation methods also extend to flower images: [24] uses color clustering and domain knowledge for automatic flower region identification, and [25] implements fully-object segmentation through propagation.
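As an illustration of the thresholding step mentioned above, the sketch below implements Otsu's method in plain Python: it picks the gray level that maximizes the between-class variance of the histogram. The function name `otsu_threshold` and the toy pixel values are assumptions made for this example; the cited works may combine the threshold with erosion or other steps not shown here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and the remaining pixels the other.
    `pixels` is a flat sequence of integer gray levels in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their gray levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixel in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy image: a dark region (gray level 10) and a bright region (gray level 200)
toy_pixels = [10] * 50 + [200] * 50
t = otsu_threshold(toy_pixels)
mask = [p > t for p in toy_pixels]   # True marks the brighter class
```

Whether the leaf corresponds to the darker or the brighter class depends on the acquisition setup, so the polarity of the mask is left to the caller.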
Automatic segmentation methods have the benefit of requiring no user involvement, and they often achieve good results when the object to be segmented is centrally located or occupies a substantial portion of the image. However, their performance may falter with complex backgrounds, where reliable segmentation cannot be assured. Recent deep learning techniques such as Mask R-CNN produce good segmentation results, but they require substantial computational resources.
In semi-automatic plant image segmentation, users provide cues that guide the segmentation process. Various methods, including GrabCut, have been developed for segmenting plant images in this way.
Interactive mean shift, guided active contour, and watershed methods are also commonly used. In one such process, the user outlines a region within the leaf, from which the color model of the leaf is estimated. The distance of each pixel from this model is then calculated, and segmentation is performed using polygon model evaluation and active contour segmentation.
A framework for analyzing compound leaves has been proposed that estimates the number and shape of leaflets through a segmentation technique based on multiple region-based active contours. A semi-automated approach based on the GrabCut method has also been developed; it highlights the significance of dataset preprocessing by showing that the segmented dataset yields higher identification results than the non-preprocessed and cropped datasets. Further work includes interactive segmentation of flowers using color models and graph-cut techniques, as well as a marker-controlled watershed segmentation method for accurately determining leaf regions. Notably, this marker-based watershed transform achieved the best results in the ImageCLEF 2012 plant identification competition, showing the effectiveness of these segmentation methods for plant images with complex backgrounds.
Figure 1.10 compares the teams in the ImageCLEF 2012 competition that used automatic segmentation methods, indicated in red, with those that used semi-automatic techniques, shown in blue. The experimental findings show that interactive segmentation frequently outperforms automatic segmentation.
Figure 1.10 Accuracy of plant identification based on leaf images on complex background in the ImageCLEF 2012 [21].
Segmentation methods are employed to extract a Region of Interest (ROI) and compute the minimum bounding box of the target object. Leaf preprocessing includes geometric transformation and petiole removal, which standardize the leaf orientation with the petiole positioned below and the leaf apex above. Removing the petiole improves the performance of the leaf recognition system. In addition, images are rotated and cropped to focus on a single leaf with its apex at the top. This thesis uses interactive segmentation for leaf-based plant identification on complex backgrounds.
Feature extraction
Feature extraction methods are divided into two main categories: hand-designed features and deeply-learned features. Hand-designed features rely on expert knowledge to select the most relevant characteristics for a specific problem, whereas deeply-learned features use machine learning techniques to learn and extract features from plant images automatically.
Hand-designed features for plant identification encompass color, texture, shape, and organ-specific characteristics. Leaves, for example, exhibit specific features such as vein structure, margin, and tooth. According to Table 1.2, leaf shape is the most significant factor in leaf identification, while flower identification relies heavily on both shape and color. For stems, texture features are commonly used. The table also indicates that researchers frequently combine multiple feature types for each plant organ, as no single feature is sufficient to distinguish all categories.
In [32], the authors extracted different types of features, namely HSV values, MR8 filter responses, Scale-Invariant Feature Transform (SIFT), and Histogram of Oriented Gradients (HOG), on a dataset of 17 flower categories. They then applied an SVM classifier with various linearly weighted kernels to optimize feature selection on a larger dataset of 102 species, achieving a high recognition rate. In other work, color and shape features were extracted from flower images, with Principal Component Analysis (PCA) employed for species discrimination. Other researchers extracted HOG features and applied an SVM for classification. Another study introduced a flower image retrieval tool based on Regions of Interest (ROI) that uses color histograms and two shape-based features, Centroid-Contour Distance and Angle Code Histogram, to characterize the shape of flowers; it was evaluated on a dataset of 14 plant species. Aakif et al. extracted three feature types: morphological features (aspect ratio, eccentricity, roundness, and convex hull), Fourier descriptors, and shape-defining features, achieving 96% accuracy with an artificial neural network (ANN) on both the Flavia and ICL datasets. Another study proposed extracting geometrical and moment features from leaf contours for plant identification. Du et al. used the Pyramid of Histograms of Orientation Gradients (PHOG) to extract leaf shape information, beginning with Canny edge detection and then calculating orientation gradients at various pyramid levels. A novel contour-based shape descriptor was also introduced; it is invariant to translation, rotation, scaling, and symmetry, and it is easy to implement and fast to compute. Lastly, a study used leaf vein and shape features for plant leaf identification, extracting 21 leaf features and achieving an accuracy of 97.19% on a leaf image dataset of 32 plant species. In [38], Mounie et al. combined leaf salient points and leaf contour
Table 1.2 Methods of plant identification based on hand-designed features.
Ref., year | Organ | Segmentation | Features | Classifier
[23], 2008 | Leaf | Automatic marker-controlled watershed | Zernike moments | Moving center hypersphere
[28], 2012 | Leaf | Mode based leaf segmentation | Shape | Random Forest
[44], 2012 | Leaf | - | Contour-based shape descriptor | Nearest neighbor
 | | GrabCut | Complex network, tooth, morphological | Random Forest
 | | Watershed | Shape, texture, color | Support Vector Machine
[50], 2014 | Leaf | Effective robust kernelized fuzzy C-means | Geometric, tooth | Support Vector Machine
 | | | Morphological features, shape-defining feature, Fourier descriptor |
[51], 2015 | Leaf | Threshold | Shape, morphology, texture, color | Support Vector Machine
[52], 2015 | Leaf | - | Histograms of Curvature over Scale, hand-crafted shape, statistical features | Random Forest
[53], 2015 | Leaf | - | Multiscale-arch-height descriptor | Nearest neighbor
[54], 2015 | Leaf | Mixture-of-Gaussians | Inner-distance shape context | Nearest neighbor
[55], 2016 | Leaf | - | Relative sub-image sparse coefficient, gray level co-occurrence matrix | Nearest neighbor
 | | | Haralick texture, Gabor features, shape, color histograms, co-occurrence matrices |
 | | | Entropy sequence, Zernike moments, Hu's invariants, aspect ratio, rectangularity, form factor and circularity |
 | | | Feature extraction with Convolutional neural network |
[57], 2006 | Flower | Graph cut | Color, shape, texture | Nearest neighbor
[58], 2008 | Flower | Graph cut | Colour, histogram of gradient orientations, SIFT | Support vector machine
[18], 2011 | Flower | Interactive | Color and shape features of the whole flower region / pistil/stamen area |
[59], 2010 | Fruit | - | Colour, texture | Minimum distance classifier
[19], 2012 | Fruit | Split-and-merge algorithm | Colour histogram, texture, shape | Support vector machine
The Radial Basis Probabilistic Network and Support Vector Machine (SVM) methods are enhanced through a late fusion approach. The kernel descriptor introduced by Bo et al. has proven effective in various object recognition tasks. Recently, Le et al. applied this kernel descriptor to leaf identification, followed by an SVM classifier. Experiments were conducted on two datasets: Flavia, with 32 species, and a subset of ImageCLEF 2013 containing simple-background leaf images of 126 species. The identification accuracies were 97.5% and 58.0%, respectively. The proposed descriptor surpasses existing state-of-the-art methods such as SURF and achieves excellent results on the Flavia dataset. However, performance on ImageCLEF 2013 is hindered by the diverse species and complex backgrounds. While the method of Le et al. performs well on simple backgrounds, it is not invariant to scale and rotation. This thesis aims to address these limitations and to apply the method to leaf images with complex backgrounds.
In conclusion, hand-designed features demonstrate promising results across various datasets; however, their accuracy significantly declines when applied to a large number of species or complex background images.
Deep learning, a specialized subset of machine learning, has gained significant popularity over the last decade due to advancements in computing power and the availability of vast amounts of data, including images, audio, and text from the internet. For images, this technique uses Convolutional Neural Networks to learn feature representations.
Convolutional Neural Networks (CNNs) have achieved significant success in various areas of computer vision, including object detection, segmentation, and image classification. Unlike traditional methods, CNNs learn features from data automatically, without predefined feature extraction. They build a hierarchical representation of features, starting from basic elements such as lines and curves in the initial layers and progressively extracting more complex components in subsequent layers. This multi-layered approach enables computers to interpret intricate data, making CNNs a cornerstone of deep learning.
Prominent CNN architectures include AlexNet, VGG, GoogLeNet, and ResNet. A typical CNN consists of several key components: convolutional layers, non-linear activation layers such as ReLU (Rectified Linear Unit), pooling layers, and fully-connected layers.
Convolutional layer: This is the most important component in the CNN, which also represents the idea of building local connections instead of connecting all the
Figure 1.11 Feature visualization of convolutional net trained on ImageNet from [61].
A CNN processes an input image by applying small filters to local pixel neighborhoods, producing a set of output images (feature maps), one per filter. This convolution operation captures local features within the image, enabling the network to learn and recognize patterns. The filters are randomly generated at the beginning and are adjusted gradually throughout the training process.
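The sliding-filter operation can be sketched in a few lines of plain Python. The example below is a simplified illustration that assumes a single-channel image, stride 1 and no padding. Like most CNN libraries it computes a cross-correlation, that is, the filter is not flipped. The filter values are fixed by hand here, whereas in a real network they are learned.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # weighted sum of the local window whose top-left corner is (r, c)
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4x4 image with a vertical edge between the second and third columns
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written 2x2 filter that responds to dark-to-bright vertical edges
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, vertical_edge)
print(feature_map)   # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is non-zero only where the window straddles the edge, which is the sense in which a filter captures a local feature.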
The non-linear activation layer introduces non-linearity into the model after the linear computations of the convolution layer. It most commonly uses the ReLU function, although other non-linear activation functions such as Sigmoid and Tanh, which bound the output values, can also be used. ReLU is preferred because it is straightforward to implement and fast to compute. It converts all negative values to zero, so the output map has the same dimensions as the input from the previous layer and differs only in that negative values are replaced with zeros.
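A minimal sketch of ReLU applied element-wise to a feature map is given below; the function name `relu` and the sample values are illustrative only.

```python
def relu(feature_map):
    """Replace every negative value with zero; the shape of the map is unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


activated = relu([[-2, 3], [0, -1]])
print(activated)   # [[0, 3], [0, 0]]
```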
The pooling layer is typically positioned after the convolution and ReLU layers. Its primary function is to reduce the size of the output map while retaining the essential information of the input map, which decreases the number of parameters and improves computational efficiency. Like the convolution layer, the pooling layer scans the image with a sliding window, but instead of performing a convolution it samples the data by selecting a representative value from each window. Common types of pooling include max-pooling, min-pooling, and average-pooling.
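Max-pooling, the most common variant, can be sketched as follows. The window size and stride of 2 are assumptions chosen for the example.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value of each size x size window, moving by `stride`."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


pooled = max_pool2d([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
])
print(pooled)   # [[4, 2], [2, 8]]
```

Replacing `max` with `min` or with an average over the window gives min-pooling or average-pooling.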
The fully connected layer operates like the layers of a traditional neural network: every value of the input map is connected to every node in the subsequent layer. However, the preceding layers have already reduced the size of the input maps considerably while preserving the information essential for identification. As a result, computing the identification with a feed-forward model is simpler and more efficient than in a traditional neural network.
Deep learning methods typically require extensive training data because millions of parameters must be optimized. Real-world scenarios often lack sufficient data, which makes it difficult to train models effectively. Transfer learning has emerged as a popular solution to this problem: it allows practitioners to reuse knowledge gained by previously trained models to tackle new problems without starting from scratch. A prevalent approach is to start from Convolutional Neural Networks (CNNs) pre-trained on large datasets such as ImageNet, which includes 1.2 million images across 1,000 classes, in order to improve model performance even with smaller datasets.
Training methods
Training is a crucial step in data mining: it builds classification functions and predictive models from existing data. Common training methods use machine learning techniques, including support vector machines (SVM), K-nearest neighbor (KNN), random forest (RF), probabilistic neural networks (PNN), moving median center hyperspheres (MMC), artificial neural networks (ANN), learning vector quantization, genetic algorithms, moving center hypersphere classifiers (MCH), Bayesian classifiers, and fuzzy relevance vector machines (FRVM).
These classification methods applied for plant identification are listed in Table 1.2.
Among the different classifiers, the SVM is the most widely used thanks to its superior performance. The following paragraphs describe this classification method in detail.
Support Vector Machine (SVM) is a supervised learning classification algorithm introduced by Cortes and Vapnik in 1995. Originally developed for binary classification, SVM has since been adapted for multi-class classification tasks. It is widely recognized as one of the most effective classification methods in computer science, particularly for handling large datasets and high-dimensional spaces.
Linear Support Vector Machines (SVM) classify data by finding a hyperplane that separates the samples into two classes according to their labels. Given a training set \( (x_1, y_1), \dots, (x_n, y_n) \), where each sample \( x_i \) is a vector in \( \mathbb{R}^d \) and \( y_i \in \{-1, 1\} \) is its class label, the objective is to maximize the margin, that is, the distance between the hyperplane and the nearest data points. In two-dimensional space the hyperplane is a straight line, in three-dimensional space it is a plane, and in general it is a \( (d-1) \)-dimensional affine subspace of \( \mathbb{R}^d \). The hyperplane is the set of points \( x \) that satisfy \( w \cdot x + b = 0 \),
where \( w \) is the normal vector to the hyperplane. The quantity \( |b| / \|w\| \) is the distance from the hyperplane to the origin, measured along the normal vector \( w \).
There are two types of margins in Support Vector Machines (SVM): hard-margin and soft-margin. A hard margin is applicable when the training data is linearly separable, in which case two parallel hyperplanes can be chosen that separate the two classes while maximizing the distance between them. This distance is referred to as the "margin," and the hyperplane that lies halfway between the two is known as the maximum-margin hyperplane. The data points closest to this hyperplane are termed support vectors.
The two parallel hyperplanes are defined by the equations \( w \cdot x + b = 1 \) and \( w \cdot x + b = -1 \). Points on or above the boundary \( w \cdot x + b = 1 \) are classified as belonging to class 1, while those on or below \( w \cdot x + b = -1 \) are classified as class -1. The distance between these two boundaries, \( 2 / \|w\| \), is the width of the margin.
To maximize this distance, \( \|w\| \) must be minimized, subject to constraints that prevent data points from falling inside the margin. For samples with \( y_i = 1 \), the condition \( w \cdot x_i + b \geq 1 \) must hold, while for \( y_i = -1 \), the condition \( w \cdot x_i + b \leq -1 \) must hold.
Figure 1.13 Hyperplane separates data samples into 2 classes.
Combining the two formulas we have [73]: \( y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \dots, n \) (1.4)
So the optimization problem is: minimize \( \|w\| \) subject to \( y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \dots, n \).
Once the SVM classifier has been obtained, each new sample \( x \) can be classified using the sign function [71]: \( \mathrm{sign}(w \cdot x + b) \) (1.5)
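The decision rule of Equation 1.5 and the hard-margin constraint of Equation 1.4 can be checked numerically. In the sketch below, the values of `w`, `b` and the two samples are hypothetical and chosen by hand; they are not the result of training. A decision value of exactly zero is assigned to class 1 by convention.

```python
import math


def decision_value(w, b, x):
    """Compute w . x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Equation 1.5: the predicted label is the sign of w . x + b."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the hyperplanes w . x + b = 1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating hyperplane and two labelled samples
w, b = [1.0, 1.0], -3.0
samples = [([3.0, 2.0], 1), ([1.0, 0.5], -1)]

predictions = [classify(w, b, x) for x, _ in samples]
# Equation 1.4: every sample must satisfy y_i (w . x_i + b) >= 1
constraints_hold = all(y * decision_value(w, b, x) >= 1 for x, y in samples)
```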
When a hard-margin hyperplane cannot be found, the soft-margin technique is used. It introduces slack variables \( \xi_i \geq 0 \) that measure how far each sample violates the margin, so that some samples may lie inside the margin or be misclassified. The constraint becomes \( y_i (w \cdot x_i + b) \geq 1 - \xi_i \) for \( i = 1, \dots, n \).
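For a fixed hyperplane, the smallest non-negative slack that satisfies the soft-margin constraint is \( \xi_i = \max(0, 1 - y_i (w \cdot x_i + b)) \). The sketch below evaluates it for three hand-picked samples; the hyperplane and the samples are hypothetical values used only for illustration.

```python
def slack(w, b, x, y):
    """Smallest xi >= 0 such that y (w . x + b) >= 1 - xi."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


w, b = [1.0, 1.0], -3.0   # hypothetical hyperplane

outside_margin = slack(w, b, [3.0, 2.0], 1)   # constraint met, no slack needed
inside_margin = slack(w, b, [2.0, 1.5], 1)    # correct side, but inside the margin
misclassified = slack(w, b, [1.0, 1.0], 1)    # wrong side of the hyperplane
```

A slack between 0 and 1 corresponds to a sample that is correctly classified but lies inside the margin, while a slack greater than 1 corresponds to a misclassified sample.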
Nonlinear classification maps the feature vectors into a higher-dimensional space in which linear classification is possible. The mapping is performed by a function \( \Phi \) that transforms data from the d-dimensional input space to a feature space \( F \), that is, \( \Phi: \mathbb{R}^d \to F \). The original training samples \( (x_i, y_i) \) are thus converted into \( (\Phi(x_i), y_i) \). A classification model is then built by finding a linear hyperplane in this new space using the transformed data \( (\Phi(x_1), y_1), \dots, (\Phi(x_n), y_n) \).
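A small example may help to see why such a mapping is useful. In the sketch below, one class lies near the origin and the other surrounds it, so no straight line separates them in the plane. The quadratic map \( \Phi(x_1, x_2) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \) is one possible choice, assumed here for illustration, and the hyperplane in the mapped space is chosen by hand rather than learned.

```python
import math


def phi(x):
    """Map a 2-D point to a 3-D feature space with quadratic terms."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def classify(w, b, z):
    """Linear decision rule applied in the mapped space."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else -1


# Inner points (label -1) are surrounded by outer points (label +1)
samples = [
    ((0.5, 0.0), -1), ((0.0, -0.5), -1),
    ((2.0, 0.0), 1), ((0.0, 2.0), 1), ((-2.0, 0.0), 1), ((0.0, -2.0), 1),
]

# In the mapped space the plane z1 + z3 = 1 (the unit circle in the input
# space) separates the classes; w and b are hand-picked, not learned.
w, b = [1.0, 0.0, 1.0], -1.0
predictions = [classify(w, b, phi(x)) for x, _ in samples]
labels = [y for _, y in samples]
```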
Plant identification from images of multiple organs
Early fusion techniques for plant identification from images of multiple organs
This section explores plant identification studies that use early fusion techniques. In a study on the LifeCLEF 2015 dataset, which includes seven plant organs, the authors implemented an early fusion method. First, a Convolutional Neural Network (CNN) was used to extract domain-generic features from all plant images. Then, a separate CNN was used for each organ to learn subset-specific features. Both feature types were extracted from layer 20 of the CNNs and combined into a single feature vector. An SVM classifier was then trained on these combined features.
The authors in [78] explored early and late fusion techniques for integrating local and global features of leaves. Early fusion combines the features at the output of the FC6 layer, before the Softmax layer where class scores are computed, while late fusion is applied to the Softmax outputs of each CNN, as illustrated in Figure 1.16.
Figure 1.15 Early fusion method in [77].
The research detailed in [79] employed multi-column deep convolutional neural networks (MCD-CNN) to analyze multiple plant organs, using AlexNet as a feature extractor for each organ. Features were taken from the final fully connected layer (FC8) and merged before being passed to a Softmax classifier, which outputs a probability for each species. In this thesis, we have reimplemented this approach in order to compare it with our proposed methods.
This early fusion technique is not always feasible because in many cases the given features might not be compatible due to differences in the nature of the modalities. Concatenation of the feature vectors may also lead to a feature vector with very high dimensionality, thus increasing the computational cost.
Figure 1.16 Different types of fusion strategies [78].
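The core of early fusion, concatenating per-organ feature vectors and feeding the result to a single classifier, can be sketched as follows. The feature values, the weights of the linear layer and the helper names are hypothetical; in the cited works the features come from CNN layers and the classifier is trained.

```python
import math


def early_fusion(features_by_organ, organ_order):
    """Concatenate the per-organ feature vectors in a fixed organ order."""
    fused = []
    for organ in organ_order:
        fused.extend(features_by_organ[organ])
    return fused


def linear_scores(weights, bias, x):
    """One score per species: a dot product with each weight row plus a bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical features for one observation (two leaf values, one flower value)
features = {"leaf": [0.2, 0.8], "flower": [0.5]}
fused = early_fusion(features, ["leaf", "flower"])

# Hypothetical classifier for two species acting on the fused 3-D vector
weights = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
bias = [0.0, 0.1]
probabilities = softmax(linear_scores(weights, bias, fused))
```

The length of the fused vector is the sum of the per-organ lengths, which is why concatenation becomes costly when each organ contributes thousands of dimensions.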
Late fusion techniques for plant identification from images of multiple organs
In plant identification studies, late fusion methods typically operate on ranks or on scores. For rank-based fusion, techniques such as Borda-fuse, BordaMNZ, and the Borda count algorithm are commonly employed. Various late fusion schemes for merging image feature similarity scores from ranking lists have also been explored, with research investigating fusion methods such as inverse rank position, Borda count, and leave-out strategies, specifically for leaf and flower organs.
Most studies apply late fusion at the score level to identify plant species from the identification results of various organs. Score-level fusion can be divided into three categories: transformation-based, classification-based, and density-based approaches. Transformation-based methods normalize the matching or confidence scores and then fuse them with rules such as max, product, or sum to obtain a final score, which makes them straightforward to implement. Classification-based approaches treat the multiple scores as a feature vector and apply a classifier to distinguish the categories, which is more complex. Density-based approaches can achieve optimal fusion provided that the probability density function of the scores for each class is estimated accurately, which makes them particularly effective for verification tasks.
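A basic Borda count can be sketched as follows. Each organ-level classifier provides a ranked list of species, and a species earns more points the higher it is ranked. The point scheme used here (n points for first place among n candidates, down to 1 for last place) and the alphabetical tie-break are assumptions of this sketch, and the variants cited above differ in such details.

```python
def borda_count(rankings):
    """Fuse ranked lists (best first) into one list ordered by total points."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, species in enumerate(ranking):
            # first place earns n points, last place earns 1 point
            points[species] = points.get(species, 0) + (n - position)
    # highest total first; ties are broken alphabetically
    return sorted(points, key=lambda s: (-points[s], s))


# Hypothetical rankings produced from leaf, flower and fruit images
leaf_ranking = ["A", "B", "C"]
flower_ranking = ["B", "A", "C"]
fruit_ranking = ["B", "C", "A"]

fused_ranking = borda_count([leaf_ranking, flower_ranking, fruit_ranking])
print(fused_ranking)   # ['B', 'A', 'C']
```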
Plant identification studies usually focus on the transformation-based approach. The commonly used late fusion methods are the average of probabilities [35], max rule
In various studies, multiple Convolutional Neural Networks (CNNs) are used and their results are combined with common rules, such as averaging the outputs. Averaging computes, for each image, the mean of the class predictions from the different models, which then forms a new set of predictions. Other rules used in this process include the sum rule, min rule, product rule, and majority voting rule.
In their study, the authors implemented several fusion techniques, starting with the Leave Out algorithm for processing queries made of multiple leaf-scan images. They also employed a weighted fusion approach over local features, designed for images with complex backgrounds.
$$P(C_k) = \sum_{f \in F} w(f)\, P(C_k^f) \qquad (1.7)$$
where $F$ is a set of local features, $P(C_k^f)$ is the probability of class $C_k$ for feature $f \in F$, and $w(f) = \max$
To handle multi-image queries with complex backgrounds, the author divides the approach into two levels: single-view and multi-organ. At the single-view level, where the query images show the same organ, a max rule is used. At the multi-organ level, which involves different organs, the weighted fusion of Equation 1.7 is used, with the set of local features replaced by a set of organs. This methodology reflects the differing significance of the various organs in the analysis.
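The two-level scheme can be sketched as a max rule within each organ followed by a weighted sum across organs. This is only an illustration of the idea: the organ weights below are fixed, hypothetical values, not the weights defined in the cited work, and the function names are invented for the example.

```python
def fuse_single_view(image_scores):
    """Single-view level: max rule over images of the same organ.

    image_scores: list of score vectors, one per image, same species order.
    """
    return [max(column) for column in zip(*image_scores)]

def fuse_multi_organ(organ_scores, organ_weights):
    """Multi-organ level: weighted sum of per-organ score vectors.

    organ_scores: dict organ -> fused score vector for that organ.
    organ_weights: dict organ -> weight (hypothetical values in this sketch).
    """
    n_classes = len(next(iter(organ_scores.values())))
    fused = [0.0] * n_classes
    for organ, scores in organ_scores.items():
        weight = organ_weights[organ]
        for k, score in enumerate(scores):
            fused[k] += weight * score
    return fused

# Hypothetical query: two leaf images and one flower image, two species.
query = {
    "leaf": [[0.5, 0.5], [0.8, 0.2]],
    "flower": [[0.1, 0.9]],
}
weights = {"leaf": 0.4, "flower": 0.6}

per_organ = {organ: fuse_single_view(images) for organ, images in query.items()}
final_scores = fuse_multi_organ(per_organ, weights)
```

With these values the leaf images fuse to [0.8, 0.5], and the final scores are approximately [0.38, 0.74], so the second species is preferred once the flower evidence is weighted in.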
In [74], the weighted fusion method is also applied to the organs ($o$), namely Leaf ($L$), Flower ($f$), Fruit ($F$), and Branch ($B$), to compute the final confidence score for a species $S$:
where $n_o$ is the number of images of organ $o$ in a query, $w_o$ is the weighted confidence for organ $o$, and $C_o(x_i, S)$ is the confidence score that image $x_i$ of organ $o$ belongs to species $S$.
The study in [93] also used a weighted fusion method similar to those in [17, 74].
Pl@ntNet is the first tool that identifies plants based on multiple organs [6, 94].
The plant identification process begins by analyzing images of individual organs and then integrates these results into a final identification. Each organ type is assigned a different weight, with flowers weighted most heavily because they are more discriminative than leaves. The weights for each organ are determined by empirical optimization. Research indicates that including at least one flower image in the identification query improves recognition performance. Late fusion techniques are used to combine the predictions from the various images. Studies have demonstrated that identifying plants using multiple organs significantly outperforms methods relying on a single organ.
Score-level fusion techniques are both feasible and practical, with transformation-based methods favored for their simplicity and effectiveness. However, there is a lack of plant identification studies that use classification-based fusion techniques. This thesis aims to address this gap by employing not only straightforward transformation-based methods but also classification-based techniques. The proposed approach combines several fusion methods at the score level, with the expectation that this combination will yield better results than relying on a single method.
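To make the idea of classification-based fusion concrete, the sketch below concatenates per-organ score vectors into one feature vector and trains a classifier on it. A nearest-centroid classifier is used here purely as a simple stand-in so that the example stays self-contained; it is not the classifier proposed in this thesis, and the training scores and labels are made up.

```python
def concat_scores(score_lists):
    """Build one feature vector from the per-organ score vectors."""
    return [s for scores in score_lists for s in scores]

def train_centroids(vectors, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, vec):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))

# Hypothetical training data: (leaf scores, flower scores, true species).
training = [
    ([0.9, 0.1], [0.6, 0.4], "A"),
    ([0.7, 0.3], [0.8, 0.2], "A"),
    ([0.4, 0.6], [0.1, 0.9], "B"),
    ([0.2, 0.8], [0.3, 0.7], "B"),
]
vectors = [concat_scores([leaf, flower]) for leaf, flower, _ in training]
labels = [label for _, _, label in training]
centroids = train_centroids(vectors, labels)

# The leaf scores alone slightly favor "A", but the fused vector is closer to "B".
query_vector = concat_scores([[0.55, 0.45], [0.3, 0.7]])
predicted = predict(centroids, query_vector)
```

Unlike the fixed sum, product, or max rules, the decision here is learned from labeled score vectors, which is what makes the approach more complex to set up.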
Plant identification studies in Vietnam
In recent years, Vietnam has seen significant advances in automatic plant identification research, with positive outcomes. Notably, Le et al. have contributed extensively to this field, with multiple studies on plant identification. Their participation in the ImageCLEF 2013 competition addressed the identification of plants from various plant organs.
The study conducted three experiments on plant identification methods. The first experiment employed the GIST feature with the k-nearest neighbor classifier for various organs. The second experiment focused on flower and entire-plant images, using color and texture features. In the third experiment, SURF features and an SVM classifier were applied. Although the initial results were modest compared to other teams, the authors went on to develop a two-part plant identification system, comprising a semi-automatic graphic tool and a leaf-based automatic identification method. The leaf identification method achieved promising results, with an accuracy of 84.68% on the Flavia dataset using HOG features and SVM classification. Additionally, the authors created an Android application for plant identification based on leaf analysis.
The combination of SURF features with a bag-of-words model and an SVM classifier then reached an accuracy of 95.94%. The authors also explored a kernel descriptor for leaf identification, achieving an accuracy of 97.5% on the Flavia dataset. Overall, the identification results have improved consistently over time.
The study [100] focuses on image classification to distinguish between leaf and non-leaf images, using GIST features with an SVM classifier. The experimental results, obtained on the Leafsnap dataset together with additional data collected from the internet, show a classification accuracy of 98%, demonstrating the success of the method.
Some studies related to identification of Vietnamese medicinal plants such as [101,
Nguyen et al. developed a method for retrieving medicinal plants using binary signs that represent the color distribution of images through a series of binary bits, allowing the similarity between two images to be calculated. Their experiments, conducted on three species of medicinal plants, showed accuracy rates varying between 45% and 98%. Additionally, a VGG16-based deep learning model was employed to extract features from a dataset collected by the authors, which included 10,000 images of 10 different plant species in Vietnam, sourced from both mobile phone captures in natural environments and online resources.
Applying the Light Gradient Boosting Machine classifier achieved an accuracy of 93.6%. In a separate study, researchers used deep learning networks, including Inception-based networks and MobileNets, together with transfer learning, and tested their methods on a self-collected dataset of 20 medicinal plants comprising 5,816 images. This dataset featured leaves photographed against a white background, and the resulting accuracy was 98.5%. Additionally, a mobile application for medicinal plant identification has been deployed.
A recent study on plant identification from leaf images has been conducted on various plant species in Vietnam. Thanh et al. developed a CNN model that classifies leaves based on their vein shapes. Their experiments, using a self-collected dataset from the Mekong Delta that includes 52 plant species, along with the Flavia and Swedish datasets, achieved an accuracy exceeding 95%.
Various studies have focused on identifying the flowers and fruits of Vietnamese plant species. One study outlines a methodology that includes K-means segmentation, extraction of Hu moment features, and an SVM classifier. The researchers conducted experiments on self-collected data covering 10 common flower species in Vietnam, with an average of 200 images per species, and achieved an accuracy of 73.93%. Nguyen [106] fine-tunes the AlexNet network on a fruit image dataset built with 40 different species, and has built an identification application for mobile phones. The accuracy achieved is 65.49%.
Research on plant identification in Vietnam is currently limited and focuses primarily on leaf analysis using hand-designed feature-based methods. Recent studies have begun to explore deep learning techniques applied to a small number of Vietnamese plant species, ranging from 3 to 52, often with plant organs shown against simple backgrounds. These datasets are typically self-collected or sourced from the internet. However, the effectiveness of these methods on other benchmark datasets remains unproven, and there is a notable scarcity of studies addressing images with complex backgrounds or combinations of plant organs. This thesis aims to develop plant identification methods that use images of multiple plant organs, specifically targeting the identification of medicinal plants in Vietnam.
Plant data collection and identification systems
A large image dataset plays a crucial role in building an automatic recognition system.
However, collecting plant images is time-consuming and labor-intensive for the following reasons. The first reason is the large number of species (e.g.,
The LifeCLEF 2017 dataset is one of the largest image collections available, featuring over 1.8 million images representing 10,000 different plant species. However, it still covers only a fraction of the vast number of species found on our planet.
Research indicates that using multiple plant organs yields better results than relying on a single organ. However, capturing images of these organs can be challenging because they appear at different growth stages, which may lead to incomplete data. Additionally, plant data collection often faces a long-tail problem, where abundant data exists for common species but little or none is available for less common ones. This results in significant class imbalance in datasets, as seen in LifeCLEF 2018, where the number of images per species varies from just 1 to 1245. Furthermore, the majority of plant data collection is conducted by botanists during field trips, focusing primarily on biological aspects.
Botanists often capture images of specific observations, such as damaged plants or their surrounding environments. Before these images can be used in automated plant identification systems, they must undergo a validation process. Additionally, each botanist may maintain a personal collection of these images.
There are two primary methods for collecting plant images: manual acquisition and crowdsourcing. The manual approach involves collectors visiting specific locations to capture images of the desired plant organs, which is time-consuming and labor-intensive. In contrast, the crowdsourcing method relies on contributions from a large online community, whose members collectively gather and share plant images, resulting in a more efficient and comprehensive collection process.
This section analyzes various plant information collection systems, emphasizing their functionalities for user contributions, commonly referred to as the Contributions function. We categorize these tools into two groups, explicit and implicit systems, based on how they gather data from users. An overview of the available plant data collection tools is presented in Table 1.3, which summarizes their social network structure, identity definition for new observations, system type, information collected on observed plants, additional functionalities, user statistics, supported platforms, and any associated fees.
Table 1.3 A summary of available crowdsourcing systems for plant information collection.
Information collected: species name, images with organ tags, GPS, date, time. Functions: browse plants in the database; support automatic plant identification based on images of organs; suggest species names for other users. Users: 3,000,000 users. Platforms: iOS, Android, Web; cloud-based.
Information collected: species name, images, descriptive information, GPS, date, time. Functions: browse plants in the database. Users: 389,024 registered users. Platforms: iOS, Android, Web; cloud-based.
Information collected: species name, images, GPS, date, time. Functions: browse plants in the database.
Information collected: species name, images of leaf, GPS, date, time. Functions: browse plants in the database; support automatic plant identification based on images of leaf. Users: 1,000,000 users since 2011. Platforms: iOS; client-server. Fee: free.
Information collected: species name, images of leaf, GPS, date, time. Functions: browse plants in the database; support automatic plant identification based on images of leaf.
Information collected: species name, images, short description.
Pl@ntNet is a crowdsourcing platform designed for plant data collection and identification, developed by scientists from four French research organizations and the Tela Botanica network. Available on iOS, Android, and the web, it allows users to upload images of plants and determine their species in three ways: by entering the species name, by using the system's automatic identification feature, or by seeking assistance from the community. Users can also provide GPS location data for their images and contribute further by evaluating the quality of others' uploads or suggesting species names.
Figure 1.17 Some snapshot images of Pl@ntNet.
The system has gained significant popularity among users, notably thanks to its connection to the Tela Botanica network, Europe's largest botanical social network with over 37,358 registered members of varying expertise in botany. The system also offers engaging features such as a comprehensive browse function and automatic plant identification. The browse function provides extensive information on plants, including images of various plant organs such as flowers, fruits, leaves, and bark, with species descriptions linked to both Tela Botanica and Wikipedia.
The automatic plant identification system prompts users to indicate the type of plant organ present in their images, such as leaves or flowers. Using an image-based identification method, the system then returns a list of matching plants, helping users learn about the specific plant of interest.
Pl@ntNet is a highly successful crowdsourcing tool dedicated to plant identification, with a database of 3.6 million images and extensive information on numerous plant species. In contrast, iNaturalist supports a broader range of 13 biological groups, including plants, and operates across three platforms: iOS, Android, and Web. As of January 2017, iNaturalist had recorded over 1.16 million observations of 33,421 plant species, contributed by 42,134 observers and 8,495 identifiers. Users can submit observations with multiple images and suggest species names, which may be verified through community consensus or expert input. The platform also encourages users to provide additional details such as date, time, and GPS coordinates, and links to resources such as Wikipedia and the Encyclopedia of Life for comprehensive plant information.
Another crowdsourcing system for biodiversity is iSpot [111, 112], which is also an explicit system. Launched in June 2009 by the Open University (UK) with support from OPAL (The OpenScience Laboratory), iSpot is available on Web and Android. Like iNaturalist, the system supports other groups besides plants, such as birds and fish.
In each group, members are assigned different weights based on their identification contributions that have been accepted by the community. Members with higher weights have a greater influence on the identifications adopted by the community. This system operates similarly to the Contribution function used by iNaturalist.
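The effect of member weights on a community identification can be sketched as a weighted vote. This is a hypothetical illustration only: it does not reproduce the actual reputation algorithm of iSpot or iNaturalist, and the user names, weights, and function name are invented for the example.

```python
def weighted_identification(votes, reputation, default_weight=1.0):
    """Pick the species with the largest total weight of supporting votes.

    votes: list of (user, proposed species) pairs.
    reputation: dict user -> weight (hypothetical values); users not listed
    receive default_weight.
    """
    totals = {}
    for user, species in votes:
        weight = reputation.get(user, default_weight)
        totals[species] = totals.get(species, 0.0) + weight
    winner = max(totals, key=totals.get)
    return winner, totals

# Hypothetical observation with one experienced and two new members voting.
votes = [
    ("expert_1", "Quercus alba"),
    ("novice_1", "Quercus rubra"),
    ("novice_2", "Quercus rubra"),
]
reputation = {"expert_1": 5.0, "novice_1": 1.0, "novice_2": 1.0}
winner, totals = weighted_identification(votes, reputation)
```

In this example the single vote of the highly weighted member outweighs the two votes of new members, so Quercus alba is selected.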
Implicit systems have emerged alongside explicit systems and are characterized by gathering user information indirectly. A notable example is Leafsnap, launched in May 2011 for iOS, which specializes in plant identification from leaf images. Users capture a photo of a leaf against a white background, and the system automatically determines whether a leaf is present. If one is detected, Leafsnap applies leaf recognition techniques to produce a list of potential matches. Users can then review this list and select the plant that most closely resembles their specimen, with additional details such as images and timestamps provided.
GPS coordinates are automatically sent to the Leafsnap dataset. Scientists can use this information to track how the numbers and ranges of plants change over time.
This system covers only 185 tree species of the Northeastern United States.
Another implicit system is Flora Finder [113]. Flora Finder was created in 2013 by Otago Innovation Limited (the University of Otago technology transfer team) and the University of Otago Botany Department in partnership with MEA
Flora Finder is an iOS app that uses image-recognition technology to identify plant species by their leaves, similar to Leafsnap. If automatic recognition fails, users can submit a request to the system, and botanists from the University of Otago Botany Department identify the plant manually. This tool is limited to plants in New Zealand and uses only leaf images. Moreover, it charges users a fee.
Plantifier is a free crowdsourcing plant recognition tool for iOS and Android [114].
It was created by a Belgian company, TrendsCo, and is available in English, Dutch,
MyGarden.org is a community-driven platform that enables users to upload photos of unknown plants along with a brief description for identification. To receive responses, users must have a MyGarden.org account. The platform encourages users to submit images of the entire plant rather than just dried leaves or picked flowers. Users can also support the identification process by suggesting possible plant names or voting on answers provided by fellow gardening enthusiasts.
Over 100,000 individuals have submitted photos of unidentified plants for recognition, benefiting from expert analysis that supports accurate identification. However, the process can be time-consuming and relies on the availability of other users. Additionally, the current system does not let users explore plant information within the dataset.
Crowdsourcing is a promising trend for biodiversity data collection, enabling large volumes of data to be gathered faster and at lower cost than with traditional methods. This approach could transform ecological monitoring practices. Nevertheless, several challenges remain to be addressed, including the need for a biodiversity crowdsourcing system that facilitates cross-country collaboration.