European Hazelnut Yield Prediction Model for Low and High Production Years, Using a Neural Network and Multispectral Images
Cañete-Salinas P1*, Ogass K1, Saavedra-Pérez N1, Espinosa-Ackerknecht C2, Urzua J1, Guajardo J1, Errázuriz-Montanares I1, Garrido-Faúndez P1 and Acevedo-Opazo C1
1Faculty of Agrarian Sciences, University of Talca, Chile
2ERDE Technology and Applied Engineering SPA, Chile
Submission: November 28, 2024; Published: December 12, 2024
*Corresponding author: Cañete-Salinas P, Acevedo-Opazo C, Faculty of Agrarian Sciences, University of Talca. Avenida Lircay s/n, Talca, Chile
How to cite this article: Cañete-Salinas P, Ogass K, Saavedra-Pérez N, Espinosa-Ackerknecht C, U, et al. European Hazelnut Yield Prediction Model for Low and High Production Years, Using a Neural Network and Multispectral Images. Ecol Conserv Sci. 2024; 4(4): 555644.DOI:10.19080/ECOA.2024.04.555644
Abstract
A study was carried out to develop and validate a yield prediction model for European hazelnut (cvs. Tonda di Giffoni and Barcelona) using a multilayer perceptron (MLP) neural network fed with spectral bands extracted from Sentinel-2 imagery during the period of greatest vegetative expression of hazelnut (January in the southern hemisphere). The research was conducted in a commercial European hazelnut orchard located in Freire (Araucanía region), Chile, during six consecutive seasons (2016 to 2021). Yield records from these seasons, together with multispectral information collected by the Sentinel-2 satellite, were used to develop the yield estimation model. The yield information was classified into years of low production (“OFF years”) and years of high production (“ON years”), according to the agronomic behavior of this fruit tree. The results show a strong linear correlation between observed and estimated yield for both the model development and validation processes, with R2 of 0.99 and 0.93, respectively. The model predicted yield with a Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) of 432 and 342 kg/ha for model development, and of 759 and 589 kg/ha for validation. The results suggest that the MLP technique is a promising tool for predicting the commercial yield of hazelnut and that it could be improved by incorporating historical yield information and using autonomous learning techniques.
Keywords: Multilayer perceptron; European hazelnut; Neural network model; Spectral band; Yield estimation
Abbreviations: MLP: Multilayer Perceptron; RMSE: Root Mean Square Error; MAE: Mean Absolute Error; CBMA: Copula-Based Bayesian Model; NDVI: Normalized Difference Vegetation Index; EVI: Enhanced Vegetation Index; AVI: Advanced Vegetation Index; GNDVI: Green Normalized Difference Vegetation Index
Introduction
European hazelnut is one of the most important nut species worldwide Bozoğlu et al. [1] and has experienced sustained growth in recent years in Chile Ascari et al. [2], reaching 37,000 ha in 2022. World hazelnut consumption is 750 to 850 thousand tons, destined for the production of chocolates and confectionery An et al. [3]. This high demand, together with the positioning of companies such as Ferrero in the country, explains the exponential growth of this fruit tree in Chile.
In temperate climates such as that of Chile, hazelnut yields fluctuate between 2,000 and 3,500 kg per hectare Bozoğlu et al. [1]. This wide range of variation is due to a recurrent problem in hazelnuts known as alternate bearing, in which years of high production (“ON”) are followed by years of low production (“OFF”) Ascari et al. [2]. This is a serious problem because the expected yield at the end of the season cannot be known in advance, which generates many logistical problems during harvest Saavedra-Pérez et al. [4]. In addition, the temperate climates of the world have been negatively affected by climate change, which has altered the phenological events of several species, including European hazelnut, and caused high variability in the expected yields of this species Von Bennewitz et al. [5]; An et al. [3].
Yield estimation in fruit trees is essential information for crop scheduling and productivity-related decision making Ji et al. [6]; Saavedra-Pérez et al. [4]. Yield can be predicted by direct methods, such as fruit counting or detection, or by indirect methods that consider auxiliary information related to yield, such as agroclimatic information, agronomic management, and vegetation indices He et al. [7]. Estimates at the grower level are usually based on visual assessment at flowering or before harvest, without considering either the spatial variability of the orchard or its production history. This approach generates estimates with high prediction errors, especially in extensive plantations or in trees with dense crowns Anderson et al. [8], both of which are characteristic of European hazelnut production in Chile.
Remote sensing has been proposed as an alternative for yield estimation models in different fruit species Li et al. [9], with vegetation indices being the most widely used information He et al. [7]. Several yield prediction studies have been conducted on European hazelnut based on multiple layers of information, such as edaphoclimatic variables, agronomic management, and plant leaf architecture Bregaglio et al. [10]; Bregaglio et al. [11]; Bregaglio et al. [12]. However, there is no information on yield estimates that consider vegetation indices and their correlations according to the biennial productive variability of this fruit species Azarenko et al. [13].
Regarding methodologies for building prediction models, artificial intelligence models are among the most widely used, as they can estimate non-linear phenomena from a series of layers of information Nosratabadi et al. [14]. One of the main tools is the neural network, which has a high level of prediction for time series data, the most widely used system being the multilayer perceptron (MLP) Ahmed [15]. This system allows the combination of massive data, considering the main interactions between the variables used for the development of the proposed models Ahmed [15]. Several examples of its use for the development of predictive yield models can be found in annual crops. Ahmed [15], using information from the Food and Agriculture Organization and the World Data Bank, estimated maize yield using the MLP with climatic information on temperature and precipitation, together with the previous year’s yield, with an error of 0.11, the lowest among similar studies. Bazrafshan et al. [16] estimated wheat yield in 4 locations in Iran, using a 30-year database of fertilizer, yield, and climate information, and MLP in conjunction with a copula-based Bayesian model (CBMA), obtaining a mean absolute error (MAE) of less than 0.13, better than the existing models in the literature. Moreover, the model covered between 89 and 94% of the observations. This type of model has also been successfully applied in species such as rice, where Gandhi et al. [17] estimated the yield of this crop from climatic and yield information from 27 districts of Maharashtra state in India, reaching an accuracy of 97.5% in the estimation, with a sensitivity of 96.3 and a specificity of 98.1. Similarly, Wang et al. [18] estimated sugar yield with an accuracy of 95%, a recovery of 96%, a mean absolute error (MAE) of 0.04% and a root mean square error (RMSE) of 0.006.
Considering the high spatial variability of European hazelnut and the problems of alternate production of this species, it is essential to predict the yield using extensive methodologies, considering large areas of plantations with this fruit tree. Therefore, the objective of the present work is to estimate the production of European hazelnut in orchards with high vegetative expression, using vegetation indices in years of high (ON) and low (OFF) production.
Materials and Methods
Description of study site
The experiment was conducted in a commercial hazelnut field located in the commune of Freire, Araucanía Region, Chile (38°95’78’’S, 72°43’96’’W), during six seasons (2016 to 2021). The experiment was conducted on cvs. Tonda di Giffoni and Barcelona, both planted at a spacing of 5 x 4 m and irrigated with 2 drippers of 2 L/h per tree. The mean annual temperature at the site was 8.4 ºC, and the mean minimum temperature of the coldest month was 1.5 ºC. The total annual average rainfall for the last 10 years was 1,250 mm. The soil is an Andisol, derived from volcanic ash.
Experimental design
For the development of the proposed models, the average yield values of 14 European hazelnut blocks were used (Table 1). Yields were expressed per hectare to standardize the variability among the blocks evaluated. Each block represents a production unit of European hazelnut cvs. Tonda di Giffoni and Barcelona (Table 1). Each block was delimited with the Google Earth “polygons” tool and later validated in QGIS 3.32 Lima.
Satellite images extraction
Satellite images from the Sentinel-2 MSI sensor Drusch et al. [19] were obtained during the month of January, when hazelnut trees reach their maximum vegetative expression under the productive conditions of the Araucanía region. The images were downloaded using the “Semi-Automatic Classification Plugin” available in QGIS 3.32 Lima. Only images from clear (cloud-free) days were selected, and they were geometrically and radiometrically corrected. In addition, the images were atmospherically corrected using the Sen2Cor algorithm Louis et al. [20] to obtain the surface reflectance of the vegetation cover. The period of highest vegetative expression was chosen because of the close relationship between productivity and the availability of photosynthetically active shoots in the plants.
Model develop and validation process
The multilayer perceptron (MLP) technique, developed by Rosenblatt [21], was used. This model passes a set of input data through one or more hidden layers, whose connections generate the desired output for the phenomenon or variable to be predicted. The larger the database, the better the results delivered by the model. This type of model allows the use of parametric and non-parametric data to understand the dynamics of the data and the environment.
Prior to the use of MLP, the data and spectral bands to be used were filtered. For this purpose, a Pearson correlation matrix was computed between the yields per site of all seasons and the spectral bands obtained in the sampling, to determine the degree of association between them. In turn, a forward selection analysis was performed to rank the linear importance of each independent variable, the variable selected at each step being the one that leaves the least residual variance in yield, as represented by equation (1) Thompson [22]. Table 2 shows the Sentinel-2 bands selected after this process.
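A form of equation (1) consistent with the definitions that follow is the standard partial F ratio used in forward selection. It is written here as an assumption, and the symbol SS_k (the regression sum of squares after the k-th variable has been added) is introduced only for illustration:

```latex
F_s = \frac{SS_k - SS_{k-1}}{MSE_k} \tag{1}
```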
where Fs is the F ratio of the forward linear model, SSk-1 is the regression sum of squares with (k-1) degrees of freedom, and MSEk is the residual mean square error.
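The Pearson screening step can be sketched as follows. This is a minimal illustration, not the procedure actually run in the study: the reflectance and yield values are invented, and the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented per-block values, for illustration only (not data from the study).
yield_kg_ha = [1800.0, 2400.0, 2900.0, 3300.0, 2100.0]
bands = {
    "B6": [0.21, 0.26, 0.30, 0.33, 0.24],
    "B8": [0.30, 0.36, 0.41, 0.45, 0.33],
    "B8A": [0.32, 0.39, 0.44, 0.48, 0.35],
}

# Rank bands by the absolute strength of their linear association with yield.
ranking = sorted(
    ((name, pearson_r(values, yield_kg_ha)) for name, values in bands.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Bands with a weak association would be candidates for removal before the forward selection step.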
The multilayer perceptron (MLP) technique was used to predict the commercial yield of several European hazelnut seasons from different spectral bands captured by the Sentinel-2 MSI sensor (Table 2). The 2016, 2017, 2020 and 2021 seasons were used to develop the model, while the yields of the 2018 and 2019 seasons were used for the model validation process. In addition to the spectral bands, the differentiation between “ON years” and “OFF years” in hazelnut was incorporated into the model to correct the yield results based on the agronomic behavior of the orchards. In the development of the MLP model, the hyperbolic tangent was used as the activation function of the hidden layer. The structure of the generated neural network is shown in Figure 1.

The activation function generates a series of neurons or nodes, which in turn generate hidden layers. These layers are generated from the weight that each variable represents in the development of the actual yield estimation model (yn), represented by blue lines in Figure 1. Taking this weight into account, the network adjusts the weights connecting the input variables, the nodes of the hidden layers and the output layer (estimated value; y’n), using equation (2), based on the methodology used by Ahmed [15].
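A minimal sketch of the forward computation of such a network is given below, assuming a single hidden layer with hyperbolic tangent activation and a linear output node. The function name `mlp_forward`, the number of hidden nodes and all weight values are hypothetical; in practice the weights would be obtained by training, which is not shown.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron.

    Hidden units apply the hyperbolic tangent; the output unit is linear.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical inputs: three band reflectances plus an ON/OFF flag (1 = ON year).
x = [0.26, 0.36, 0.39, 1.0]
# Hypothetical weights for two hidden nodes; real values would come from training.
w_hidden = [[0.5, -0.2, 0.8, 0.3], [-0.4, 0.9, 0.1, -0.6]]
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.7]
b_out = 0.05

y_est = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(f"estimated (scaled) yield: {y_est:.4f}")
```

The output is on an arbitrary scaled axis here; mapping it back to kg/ha would depend on how yield was normalized, which the text does not specify.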
For model validation, the statistical analysis used to compare predicted and observed hazelnut yield was adapted from Mayer & Butler [23] and included the following statistical criteria: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), equations (3) and (4). In addition, the slope of the 1:1 linear relationship (S) and the coefficient of determination (R2) were used as fit parameters.
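In their standard form, which is assumed here to correspond to equations (3) and (4), the two error statistics are:

```latex
\begin{align}
RMSE &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{obs},i} - y_{\mathrm{est},i}\right)^{2}} \tag{3} \\
MAE &= \frac{1}{n}\sum_{i=1}^{n}\left|y_{\mathrm{obs},i} - y_{\mathrm{est},i}\right| \tag{4}
\end{align}
```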
where yobs is the yield value determined in the field, yest is the value estimated by the MLP model tested, and n is the number of observations.
Results and Discussion
Figure 2 shows the division of the database used in the development process. Since the database is limited in terms of blocks and monitoring seasons, 70% of the data was used for model training, 23% for model testing and the remaining 7% was kept as a reserve. This division was adopted as a strategy given the limited amount of information available, since the ideal is to leave as much information as possible for model training. As more information is incorporated into the MLP, the estimation should improve and a smaller share of the data can be devoted to training.
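The 70/23/7 partition can be illustrated with a short sketch. It assumes 56 observations (14 blocks over the 4 development seasons), random assignment, and rounding to whole observations; none of these details is stated in the text, and the function name `split_indices` is hypothetical.

```python
import random

def split_indices(n, fractions=(0.70, 0.23, 0.07), seed=0):
    """Shuffle n observation indices and split them into train/test/holdout."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    holdout = idx[n_train + n_test:]  # whatever remains is kept as the reserve
    return train, test, holdout

# Assumed size: 14 blocks x 4 development seasons = 56 observations.
train, test, holdout = split_indices(56)
print(len(train), len(test), len(holdout))
```

Under these assumptions the partition contains 39 training, 13 testing and 4 reserve observations.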

The contribution of each variable to the development of the model is presented in Figure 3. Band 8A (865 nm) is the most important, with a contribution of 24%, followed by bands 8 (842 nm) and 6 (740 nm) with 19 and 18%, respectively. These bands lie in the near-infrared region and detect changes in the vegetative expression of the plant, which would explain their importance in the construction of the model, since they are important for detecting changes in foliage growth with the Sentinel-2 sensor Drusch et al. [19]. The study carried out by Saavedra-Pérez et al. [4] demonstrates the close relationship between vegetation indices calculated from satellite images and yield. Among these indices, the Enhanced Vegetation Index (EVI) and the Advanced Vegetation Index (AVI) stand out; both are built from spectral bands between 665 and 865 nm, which are the bands most closely related to yield in this research. The “ON years” and “OFF years” classification contributes 9% to the development of the model, which shows the relevance of identifying yield behavior patterns and their interaction with vegetative expression. Thus, a higher vegetative expression during the agronomic season is associated with a higher number of shoots, which support a higher number of glomerules (hazelnut female flowers) and would also increase pollen availability, favoring fruit set and final yield Ascari et al. [2].
Figure 4 shows the linear correlation between the actual yield and the yield estimated by the model for the data used in the development and validation processes. A total of four seasons were used to develop the model, with the average yields of each block. The model showed a high R2 of 0.98 and a slope of 0.97, close to 1, in the 1:1 correlation, and a coefficient of determination of 0.88 for the relationship between estimated and observed yield. Therefore, the developed model can predict the average yield of the orchards from a satellite image obtained in the month of January, when vegetative expression in European hazelnut is highest. The validation process, carried out with two different seasons, also showed a high R2 of 0.96 and a slope of 1.08, close to 1, in the 1:1 correlation, and a coefficient of determination of 0.64 for the relationship between estimated and observed yield, corroborating the predictive capacity of the model. Although the model tends to be more erratic during validation, this may improve as more seasons or historical yield records are integrated.
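The slope and coefficient of determination of an observed-versus-estimated comparison can be computed as sketched below. The choice of regressing estimated on observed yield is an assumption, the function name `fit_line` is hypothetical, and the yield values are invented for illustration.

```python
def fit_line(observed, estimated):
    """Least-squares line estimated = slope * observed + intercept, with R^2."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_o
    ss_res = sum(
        (e - (slope * o + intercept)) ** 2 for o, e in zip(observed, estimated)
    )
    ss_tot = sum((e - mean_e) ** 2 for e in estimated)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2

# Invented block yields in kg/ha (not data from the study).
observed = [1700.0, 2200.0, 2600.0, 3100.0, 3500.0]
estimated = [1850.0, 2150.0, 2700.0, 3000.0, 3400.0]

slope, intercept, r2 = fit_line(observed, estimated)
print(f"slope = {slope:.2f}, intercept = {intercept:.0f} kg/ha, R2 = {r2:.2f}")
```

A slope near 1 with a small intercept indicates that estimates track observations along the 1:1 line, while R2 describes how tightly the points cluster around the fitted line.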
By using the MLP technique, it was possible to generate better yield estimates through autonomous learning of the proposed models, and it would even be possible to incorporate other variables measured in the field that are directly associated with yield, such as vegetative expression and plant physiological response. To characterize the yield estimated by the model during the development and validation processes, Table 3 presents the statistics used. RMSE was 432 and 559 kg/ha, while MAE was 342 and 475 kg/ha, for the development and validation processes, respectively. In addition, the percentage error relative to the average yield of all the seasons used in both processes was estimated, giving errors of 12 and 18% for the development and validation processes, respectively, considering an average yield of 2,648 kg/ha across all seasons.
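The statistics above can be reproduced with a few lines of code. The sketch assumes that the percentage errors were obtained by dividing MAE by the average yield, which the text does not state explicitly; the function names are hypothetical.

```python
import math

def rmse(observed, estimated):
    """Root mean square error between paired observations."""
    return math.sqrt(
        sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)
    )

def mae(observed, estimated):
    """Mean absolute error between paired observations."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def percent_of_mean(error, mean_yield):
    """Express an absolute error as a percentage of the mean yield."""
    return 100.0 * error / mean_yield

MEAN_YIELD = 2648.0  # kg/ha, average yield over all seasons reported in the text

# Reported MAE values (kg/ha) for development and validation.
dev_pct = percent_of_mean(342.0, MEAN_YIELD)
val_pct = percent_of_mean(475.0, MEAN_YIELD)
print(f"development: {dev_pct:.1f}%  validation: {val_pct:.1f}%")
```

Under this assumption the ratios are about 12.9% and 17.9%, close to the reported 12 and 18%.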





Regarding the use of this type of model in hazelnut or other species, Anastasiou et al. [24] evaluated the use of the Normalized Difference Vegetation Index (NDVI) and the Green Normalized Difference Vegetation Index (GNDVI) to estimate yield in grapevines, using satellite and proximal sensing. They obtained an adjusted R2 ranging from 0.33 to 0.80 and modeling errors of approximately 26%, a poorer performance than that achieved in our study. In apple trees, Bai et al. [25] studied the relationship between yield and NDVI, finding an R2 of 0.71 and an RMSE of 16.4 kg/tree. This error is greater than that achieved in our study: considering that a tree yields on average 80 to 120 kg, the error would be 14 to 20%, which would be greater when scaled to the hectare. In hazelnut, Bregaglio et al. [10] developed a yield prediction model called GROSS-MS, based on plant photosynthesis and climatic behavior, which obtained a relative RMSE of 25.03% and a model efficiency (EF) of 0.52, a higher error than that observed in the present study (for both the development and validation processes). Although the results of the present model are promising, it uses only multispectral information. We believe that incorporating plant physiological information, shoot growth and plant size in combination with vegetation indices will make it possible to develop a more precise model than the one proposed in this work, this being the next step in the research and development of predictive yield models for European hazelnut. Among the variables that can be incorporated, the space used by the plant and the canopy volume stand out, according to the results obtained by Saavedra-Pérez et al. [4].
Regarding the use of MLP as a yield modeling technique, there are no previous experiences of yield prediction in European hazelnut. However, there are some experiences of yield prediction in annual crops. In corn, Ahmed [15], using MLP, obtained an R2 of 0.96 and an error of 0.11, improving on the predictions of other models in this crop. Bazrafshan et al. [16] worked with MLP in wheat yield prediction, obtaining an RMSE of 0.1520 and a relative RMSE of 12%, similar to the development process of our model. Wang et al. [18] estimated sugar yield with an RMSE of 0.006% and an MAE of 0.04%, much lower errors than those obtained in this work. All the models developed using MLP were able to reduce the estimation error and improve the accuracy of yield prediction, so the use of historical data and extensive yield databases would considerably improve yield estimation in European hazelnut. The use of large databases is what distinguishes those studies from our model. Although 4 seasons were used for its development and 2 for validation, the incorporation of more seasons or of historical yields from previous ones should increase the predictive capacity of the model and reduce the error.
In yield prediction, in addition to obtaining correct estimates and low average errors, it is also important to obtain a good spatial estimation of yield. Figures 5 and 6 show the actual yield (A) and the yield estimated by the model (B) during the 2018 and 2019 seasons, respectively.
In 2018 (Figure 5) yields were higher, with a maximum of 3,600 kg/ha in some sectors and a minimum of 1,600 kg/ha. The central area of the field had the highest yields, and the south-east and south-west areas had the lowest (A). The model (B) detects the sectors with the highest yield in the central zone, but it is somewhat erratic in determining the areas of low yield, where it overestimates.
In the 2019 season (Figure 6), yields were lower, ranging from 1,200 to 3,000 kg/ha. The same pattern as in the previous season is observed, with a central zone of higher yield and lower yield in the south-east and south-west (A). For this season, the model (B) performs better and detects a spatial structure similar to that of the actual yield.


Although the model has a great predictive capacity and can detect a similar spatial structure in yield, it can improve as more information is added. In addition, other variables of interest measured in the field, such as the space used by the plant or plant volume, could be used to improve the performance of the model, according to Saavedra-Pérez et al. [4].
Conclusion
The model developed by neural networks, using the MLP technique, can predict the yield of European hazelnut with a high correlation coefficient and a low mean error, using spectral information extracted during the peak vegetative expression season (January in the southern hemisphere). Models generated from neural networks can be significantly improved by using a large amount of historical yield information, which would allow for better prediction, characterizing the spatial and temporal behavior of yield and its relationship with “ON years” and “OFF years” production.
References
- Bozoğlu M, Başer U, Topuz BK, Eroğlu NA (2019) An overview of hazelnut markets and policy in Turkey. Kahramanmaraş Sütçü İmam Üniversitesi Tarım ve Doğa Dergisi 22(5): 733-743.
- Ascari L, Siniscalco C, Palestini G, Lisperguer MJ, Huerta ES, et al. (2020) Relationships between yield and pollen concentrations in Chilean hazelnut orchards. European Journal of Agronomy 115: 126036.
- An N, Turp MT, Türkeş M, Kurnaz ML (2020) Mid-term impact of climate change on hazelnut yield. Agriculture 10(5): 159.
- Saavedra-Pérez N, Cañete Salinas P, Ogass K, Espinosa Ackerknecht C (2023) Characterization of yield spatial variability of European hazelnut (Corylus avellana L), using auxiliary variables of high spatial resolution. IEEE International Conference on Automation/XXVI Congress of the Chilean Association of Automatic Control (ICA-ACCA), Valdivia, Chile, 2023.
- Von Bennewitz E, Ramírez C, Muñoz D, Cazanga-Solar R, Lošak T, et al. (2019) Phenology, pollen synchronization and fruit characteristics of European hazelnut (Corylus avellana L.) cv. Tonda de Giffoni in three sites of central Chile. Revista de la Facultad de Ciencias Agrarias UNCuyo 51(2): 55-67.
- Ji Z, Pan Y, Zhu X, Wang J, Li Q (2021) Prediction of crop yield using phenological information extracted from remote sensing vegetation index. Sensors 21(4): 1406.
- He L, Fang W, Zhao G, Wu Z, Fu L, et al. (2022) Fruit yield prediction and estimation in orchards: A state-of-the-art comprehensive review for both direct and indirect methods. Computers and Electronics in Agriculture 195: 106812.
- Anderson NT, Walsh KB, Wulfsohn D (2021) Technologies for forecasting tree fruit load and harvest timing-from ground, sky and time. Agronomy 11(7): 1409.
- Li B, Lecourt J, Bishop G (2018) Advances in non-destructive early assessment of fruit ripeness towards defining optimal time of harvest and yield prediction-A review. Plants 7(1): 3.
- Bregaglio S, Orlando F, Forni E, De Gregorio T, Falzoi S, et al. (2016) Development and evaluation of new modelling solutions to simulate hazelnut (Corylus avellana L.) growth and development. Ecological Modelling 329: 86-99.
- Bregaglio S, Giustarini L, Suarez E, Mongiano G, De Gregorio T (2020) Analysing the behaviour of a hazelnut simulation model across growing environments via sensitivity analysis and automatic calibration. Agricultural Systems 181: 102794.
- Bregaglio S, Fischer K, Ginaldi F, Valeriano T, Giustarini L (2021) The HADES yield prediction system–a case study on the Turkish hazelnut sector. Frontiers in Plant Science 12: 665471.
- Azarenko AN, McCluskey RL, Chambers WC (2004) Does canopy management help to alleviate biennial bearing in 'Ennis' and 'Montebello' hazelnut trees in Oregon? In VI International Congress on Hazelnut 686: 237-242.
- Nosratabadi S, Ardabili S, Lakner Z, Mako C, Mosavi A (2021) Prediction of food production using machine learning algorithms of multilayer perceptron and ANFIS. Agriculture 11(5): 408.
- Ahmed S (2023) A Software Framework for Predicting the Maize Yield Using Modified Multi-Layer Perceptron. Sustainability 15(4): 3017.
- Bazrafshan O, Ehteram M, Moshizi ZG, Jamshidi S (2022) Evaluation and uncertainty assessment of wheat yield prediction by multilayer perceptron model with Bayesian and copula Bayesian approaches. Agricultural Water Management 273: 107881.
- Gandhi N, Petkar O, Armstrong LJ (2016) Rice crop yield prediction using artificial neural networks. In 2016 IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR), pp. 105-110.
- Wang P, Hafshejani BA, Wang D (2021) An improved multilayer perceptron approach for detecting sugarcane yield production in IoT based smart agriculture. Microprocessors and Microsystems 82: 103822.
- Drusch M, Del Bello U, Carlier S, Colin O, Fernandez V, et al. (2012) Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sensing of Environment 120: 25-36.
- Louis J, Debaecker V, Pflug B, Main-Knorn M, Bieniarz J, et al. (2016) Sentinel-2 Sen2Cor: L2A processor for users. In Proceedings living planet symposium, pp. 1-8.
- Rosenblatt F (1958) The Perceptron: A Theory of Statistical Separability in Cognitive Systems, Cornell Aeronautical Laboratory, Report No. VG1196-G-1.
- Thompson ML (1978) Selection of variables in multiple regression: Part I. A review and evaluation. International Statistical Review/Revue Internationale de Statistique, pp. 1-19.
- Mayer DG, Butler DG (1993) Statistical validation. Ecological Modelling 68(1-2): 21-32.
- Anastasiou E, Balafoutis A, Darra N, Psiroukis V, Biniari A, et al. (2018) Satellite and proximal sensing to estimate the yield and quality of table grapes. Agriculture 8(7): 94.
- Bai X, Li Z, Li W, Zhao Y, Li M, et al. (2021) Comparison of machine-learning and CASA models for predicting apple fruit yields from time-series Planet imageries. Remote Sensing 13(16): 3073.