Random Forest Machine Learning Technique for Automatic Vegetation Detection and Modelling in LiDAR Data
Fayez Tarsha Kurdi1*, Wijdan Amakhchan2 and Zahra Gharineiat3
1Institute of Integrated and Intelligent Systems, Griffith University, Australia
2Faculté des Sciences et Techniques de Tanger
3 School of Civil Engineering & Surveying, Faculty of Health, Engineering and Sciences, University of Southern Queensland, Australia
Submission: May 12, 2021; Published: June 04, 2021
*Corresponding author: Fayez Tarsha Kurdi, Research Fellow, Institute of Integrated and Intelligent Systems, Griffith University, Nathan, QLD 4111, Australia
How to cite this article: Fayez Tarsha K, Wijdan A, Zahra G. Random Forest Machine Learning Technique for Automatic Vegetation Detection and
Modelling in LiDAR Data. Int J Environ Sci Nat Res. 2021; 28(2): 556234. DOI: 10.19080/IJESNR.2021.28.556234
Machine learning techniques have gained a distinguished position in the automatic processing of Light Detection and Ranging (LiDAR) data. They represent a current research topic in the remote sensing domain. This paper presents one supervised machine learning method, called Random Forest. The algorithm is discussed, and its primary applications in automatic vegetation extraction and modelling from LiDAR data are presented.
Keywords: LiDAR; Random forest; Classification; Modelling
Nowadays, Light Detection and Ranging (LiDAR) data holds a prominent position among remote sensing data. Automatic vegetation detection and modelling in forest and urban areas is one of the important envisaged applications of LiDAR data. In fact, automatic tree detection in LiDAR data belongs to the broader topic of automatic LiDAR data classification. The scanned scene consists of different man-made objects such as buildings, bridges, roads and dams. Furthermore, the studied zone may contain natural classes such as vegetation, terrain, rivers, and lakes. In order to model the project area, its LiDAR point cloud has to be classified according to the main classes. Once the classification is achieved successfully, the next step is to model each class separately. Concerning vegetation detection and modelling, classic approaches employed in the literature include the RANdom SAmple Consensus (RANSAC) algorithm [3,4], the local maximum algorithm, the surface growing algorithm and multiple echo analysis, the voxel layer single tree modelling algorithm, the morphological algorithm, and the analysis of full-waveform LiDAR data.
Recently, machine learning has enhanced the automatic processing of LiDAR data. This technique quickly became widespread and now occupies a major position relative to the classical approaches. This paper presents one machine learning method that is widely applied for automatic vegetation recognition and modelling in the LiDAR data field: Random Forest (RF). The paper aims to summarize the principle of this technique in addition to its main applications in the LiDAR data field.
In the next section, the Random Forest technique is discussed.
RF is a supervised ensemble learning algorithm used for classification and regression in predictive modelling. It gathers the predictions of several decision trees and returns as the final output either the mode of the predicted classes (the class that appears most often among the decision tree results) for classification, or the mean prediction for regression.
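The aggregation step described above can be illustrated with a minimal Python sketch. The function names, the class labels, and the numeric values are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Return the mode (most frequent class label) of the per-tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Return the mean of the per-tree numeric predictions."""
    return mean(tree_outputs)

# Hypothetical outputs of five decision trees for one LiDAR point
votes = ["vegetation", "building", "vegetation", "terrain", "vegetation"]
# Hypothetical per-tree estimates of a tree height, in metres
heights = [12.0, 11.5, 12.5, 12.0, 12.0]

label = aggregate_classification(votes)   # class chosen by majority vote
height = aggregate_regression(heights)    # averaged regression estimate
```

In classification the forest output is the most voted class, while in regression it is the average of the individual tree estimates.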
RF works as follows. The dataset is first split into two parts, the training set and the test set. Multiple samples are then randomly selected from the training set, and a decision tree is built for each sample, dividing each node into two daughter nodes using the best split. This splitting step is repeated until the trees are grown. Finally, each tree votes with its prediction, and the most voted prediction is selected as the final result (Figure 1).
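The procedure can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the cited studies: the Gini impurity is assumed as the split criterion, the two features (height above ground and number of returns) and the synthetic data are hypothetical, and all function names are introduced here for illustration only.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, n_features, rng):
    """Search a random subset of features for the split with the lowest weighted Gini."""
    best = None  # (score, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({r[f] for r in rows})[:-1]:  # skip the max so both sides are non-empty
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, n_features, rng, depth=0, max_depth=5):
    """Recursively divide each node into two daughter nodes; leaves store a class label."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    split = best_split(rows, labels, n_features, rng)
    if split is None:  # the sampled features are constant, so no split is possible
        return majority(labels)
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [l for l, g in zip(labels, go_left) if g],
                      n_features, rng, depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [l for l, g in zip(labels, go_left) if not g],
                       n_features, rng, depth + 1, max_depth)
    return (f, t, left, right)

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Random sample of the training set, drawn with replacement (bootstrap)
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_features, rng))
    return forest

def predict_forest(forest, row):
    """Each tree votes; the most voted class is the final result."""
    return majority([predict_tree(tree, row) for tree in forest])

# Synthetic points: [height above ground (m), number of returns] (hypothetical features)
data_rng = random.Random(42)
data = [([data_rng.uniform(5.0, 20.0), data_rng.randint(2, 4)], "vegetation") for _ in range(30)]
data += [([data_rng.uniform(0.0, 0.5), 1], "ground") for _ in range(30)]
data_rng.shuffle(data)
train, test = data[:48], data[48:]  # training set and test set

forest = fit_forest([r for r, _ in train], [l for _, l in train])
accuracy = sum(predict_forest(forest, r) == l for r, l in test) / len(test)
pred_veg = predict_forest(forest, [10.0, 3])
pred_ground = predict_forest(forest, [0.0, 1])
```

Drawing a different random sample for each tree, and a random subset of features at each split, is what makes the trees differ from one another so that their votes are worth combining.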
The main hyperparameters in Random Forest are used either to increase the predictive power of the model or to make the model faster. In this context, a higher number of trees can increase performance and make the predictions more stable, but at the cost of a longer processing time. Furthermore, tuning the maximum number of features considered when splitting an internal node, together with the minimum number of samples required at a leaf, may improve the algorithm's performance. Once the training step is completed, the trained model can be applied to a dataset that was not used for training. This procedure allows its predictions to be estimated and compared with the expected values.
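As an illustration, the hyperparameters mentioned above map onto the `n_estimators`, `max_features`, `min_samples_leaf`, and `n_jobs` arguments of scikit-learn's `RandomForestClassifier`. The sketch below assumes scikit-learn is installed and uses a synthetic dataset as a stand-in for per-point LiDAR features; the specific values are arbitrary choices, not recommendations from the cited literature.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-point LiDAR features (not real data)
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)

# Hold out a part of the data that is not used for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,      # more trees: more stable predictions, longer runtime
    max_features="sqrt",   # number of features considered at each split
    min_samples_leaf=2,    # minimum number of samples required at a leaf
    n_jobs=-1,             # train trees in parallel to make the model faster
    random_state=0,
)
model.fit(X_train, y_train)

# Compare the predictions on the held-out data with the expected values
accuracy = accuracy_score(y_test, model.predict(X_test))
```

Changing `n_estimators` and re-running the evaluation on the held-out set is a simple way to observe the trade-off between prediction stability and processing time.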
In the literature, many authors applied RF exclusively to LiDAR data [13,14], whereas other authors used additional data together with the LiDAR point cloud as input for the RF algorithm [15,16]. From another viewpoint, several applications have been realized on LiDAR data using the RF technique. Yu et al. suggested an approach for estimating tree characteristics such as height, diameter, and stem volume using LiDAR data. To attain this goal, RF is used as a classifier. Levick et al. fused the Digital Surface Model (DSM) calculated from the LiDAR point cloud with field-measured wood volume using the RF algorithm. Chen et al. used a feature selection method and the RF algorithm for forested landslide detection. For this purpose, the Digital Terrain Model (DTM) and the slope model were established for the scanned scene, and the selected features were calculated at the pixel level. The same principle was applied by Guan et al. to classify the city components in urban zones.
RF has been broadly used for vegetation detection in forest and urban areas. Niemeyer et al. classified the scanned city elements by integrating an RF classifier into a Conditional Random Field (CRF) framework. Moreover, Man et al. extracted grasses and trees in urban areas using airborne LiDAR and hyperspectral data. RF and object-based classification methods were employed together to extract the distribution map of urban vegetation. Other studies [20,21] underlined the efficiency of RF for vegetation detection in forest and urban areas. Huang & Zhu developed an approach for fusing hyperspectral images and LiDAR data based on RF. In this context, each feature is ranked by RF, and the most useful features are selected as inputs to RF for data classification.
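The idea of ranking features with RF and then classifying with the most useful ones can be sketched as follows. This is a generic illustration that assumes scikit-learn and its impurity-based `feature_importances_` attribute; it is not the specific fusion procedure of the cited work, and the synthetic dataset and the choice of three retained features are assumptions made only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for stacked LiDAR and hyperspectral features (not real data).
# With shuffle=False the informative features are the first three columns.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

# Step 1: fit a first forest and rank every feature by its importance
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = ranker.feature_importances_
ranking = np.argsort(importances)[::-1]  # feature indices, most important first

# Step 2: keep the top-ranked features and train the final classifier on them
top_k = ranking[:3]
final_model = RandomForestClassifier(n_estimators=100, random_state=0)
final_model.fit(X[:, top_k], y)
```

Training the final model on fewer, better-ranked features reduces the input dimension, which can matter when hyperspectral bands add many correlated features to the LiDAR attributes.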
RF is an efficient machine learning technique that can be used for automatic vegetation extraction and modelling from LiDAR data in forests and urban zones. In this context, LiDAR data can be used exclusively or together with supplementary data such as field measurements and hyperspectral data.
Wang Y, Weinacker H, Koch B, Sterenczak K (2008) LiDAR point cloud based fully automatic 3D single tree modelling in forest and evaluations of the procedure. Int Archives Photogrammetry Remote Sens Spatial Inform Sci XXXVII: 45-51.