Mohsen Shahhosseini; Guiping Hu

doi:10.19080/IJESNR.2020.25.556161

Short Communication

Machine Learning Models for Corn Yield Prediction: A Survey of Literature

Mohsen Shahhosseini and Guiping Hu*

Department of Industrial and Manufacturing Systems Engineering, Iowa State University, USA

Submission: July 27, 2020; Published: August 06, 2020

*Corresponding author: Guiping Hu, Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, Iowa, USA

How to cite this article: Mohsen S, Guiping H. Machine Learning Models for Corn Yield Prediction: A Survey of Literature. Int J Environ Sci Nat Res. 2020; 25(3): 556161. DOI: 10.19080/IJESNR.2020.25.556161

Abstract

The ability to predict crop yields enables timely and effective decision making for crop management and regional agriculture system planning. Corn is the largest field crop in the U.S., and hence significant effort has been devoted to predicting corn yields through various means. The present survey reviews studies that used machine learning models and their variations to predict corn yield.

Keywords: Agriculture system planning; Crop management; Environmental data; Deep neural networks; Spatial resolution

Introduction

Agriculture and its related industries contribute significantly to the U.S. economy, providing 11% of total U.S. employment and contributing $1.05 trillion to U.S. gross domestic product (GDP) in 2017 [1]. Crop yield prediction is of great importance because it can deliver insightful information for improving crop management and, subsequently, the U.S. and global economy. In 2019, corn was the most produced crop in the U.S. [2], and with the increasing demand for corn throughout the country, predicting corn production is essential. The present survey summarizes multiple well-known studies that predict corn yield using machine learning (ML) models. We first present the most common data preprocessing tasks performed in the literature, and then provide a brief summary of the developed ML models as well as their numerical results.

Data Preprocessing Tasks

The most common data preprocessing tasks in the corn yield prediction literature include handling the year-over-year increasing trend in corn yield, feature selection, imputing missing data, and reconciling the different spatial resolutions of environmental data sets (soil and weather).

Corn yield trend

Historical corn yields throughout the country demonstrate an increasing trend. This trend derives from improved genetics (cultivars), improved management, and other technological advances such as farming equipment. Generally, the yearly trend in corn yields is addressed with one of two approaches. The first adds the trend back into the developed model as a linear component [2-7]. The second uses recurrent neural network variations that are inherently able to capture the time dependency in the response variable [8,9].
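The first approach can be illustrated with a minimal sketch: fit a straight line to yield versus year, model the detrended residuals, and add the line back to the predictions. The function names and the yield values below are hypothetical and chosen only for illustration; the cited studies may implement the linear component differently.

```python
import numpy as np

def fit_linear_trend(years, yields):
    """Least-squares line: yield ~ intercept + slope * year."""
    slope, intercept = np.polyfit(years, yields, deg=1)
    return slope, intercept

def detrend(years, yields, slope, intercept):
    """Remove the fitted linear trend, leaving residual yields."""
    return yields - (intercept + slope * years)

def add_trend_back(years, residual_predictions, slope, intercept):
    """Restore the linear trend to predictions made on detrended yields."""
    return residual_predictions + (intercept + slope * years)

# Hypothetical yields (bu/acre): a rising trend plus year-to-year noise.
years = np.arange(2000, 2010, dtype=float)
noise = np.array([3, -2, 1, -4, 2, 0, -1, 4, -3, 0], dtype=float)
yields = 130.0 + 2.0 * (years - 2000) + noise

slope, intercept = fit_linear_trend(years, yields)
residuals = detrend(years, yields, slope, intercept)

# In practice an ML model would predict the residuals; here the true
# residuals are reused only to show that the round trip is consistent.
restored = add_trend_back(years, residuals, slope, intercept)
```

Working on residuals lets the downstream model focus on weather and soil effects rather than on the technology-driven increase over time.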

Missing data

Missing data treatment strategies depend on the nature of the developed data sets. Some studies impute the missing data with statistical measures [9,10], whereas others use expert knowledge to either impute the missing data through data aggregation or remove the affected records from the data set [4,7,11].
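The two strategies can be sketched as follows, assuming the statistical measure is the column median and that removal means dropping any record with a missing value. Both choices and the small feature matrix are assumptions for illustration, not details taken from the cited studies.

```python
import numpy as np

def impute_with_median(X):
    """Replace each NaN with the median of its column (NaNs ignored)."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def drop_incomplete_rows(X):
    """Keep only the records that have no missing values."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

# Hypothetical feature matrix with two missing entries.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 40.0],
])

X_imputed = impute_with_median(X)
X_complete = drop_incomplete_rows(X)
```

Imputation keeps every record at the cost of introducing estimated values, while removal keeps only observed values at the cost of a smaller data set.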

Spatial aggregations

A common issue when developing initial data sets arises from ingesting data from different sources, each with a different spatial resolution. Hence, an important preprocessing task is spatial aggregation, which brings the data sets to a common resolution. The most common solution in the literature is to use a statistical average or median of the nearest neighbors' values to reconcile the spatial resolutions of the different data sets [3-8, 12-16].
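A nearest-neighbor average of this kind might look like the sketch below. It assumes planar (projected) coordinates with Euclidean distance and a fixed number of neighbors k; the function name, coordinates, and values are hypothetical.

```python
import numpy as np

def knn_spatial_average(source_xy, source_values, target_xy, k=3):
    """Average the values of the k nearest source points for each target."""
    source_xy = np.asarray(source_xy, dtype=float)
    source_values = np.asarray(source_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # Pairwise Euclidean distances, shape (n_targets, n_sources).
    diffs = target_xy[:, None, :] - source_xy[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k closest source points for every target.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return source_values[nearest].mean(axis=1)

# Hypothetical grid cells (x, y) carrying a weather or soil variable.
grid_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
grid_values = [1.0, 2.0, 3.0, 100.0]

# Hypothetical target locations, e.g. field or county centroids.
near_origin = knn_spatial_average(grid_xy, grid_values, [(0.2, 0.2)], k=3)
far_corner = knn_spatial_average(grid_xy, grid_values, [(9.0, 9.0)], k=1)
```

Replacing `mean` with `np.median` gives the median variant mentioned above.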

Machine Learning Models

Various ML models have been designed to predict corn yields throughout the literature, but generally, they can be categorized into five main groups.

Regression-based models

Assuming a linear relationship between the independent and dependent variables, some studies built linear regression models to predict corn yield [6,16]. Other regression-based models in the literature include stepwise linear regression [14] and a linear discriminant analysis (LDA) model [17].
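As a minimal illustration of the linear regression case, the sketch below fits an ordinary least-squares model with NumPy. The two features (rainfall and temperature), the coefficients used to generate the data, and the function names are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted intercept and slopes to new feature rows."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

# Hypothetical features: growing-season rainfall (mm), mean temperature (C).
X = np.array([
    [400.0, 20.0],
    [450.0, 21.0],
    [500.0, 19.0],
    [550.0, 22.0],
    [600.0, 20.0],
])
# Synthetic, noise-free yields generated from a known linear rule.
y = 50.0 + 0.2 * X[:, 0] - 1.5 * X[:, 1]

coef = fit_linear_regression(X, y)
y_hat = predict_linear(coef, X)
```

Because the synthetic data are exactly linear, the fit recovers the generating coefficients; real yield data would leave a nonzero residual.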

Classification and regression tree models

The use of single-tree models in the literature has been limited because tree ensemble models perform better. The most common tree-based model has been the M5 prime regression model, an extension of the regression tree that allows linear regression functions at the nodes [18,19].
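The idea of a tree with linear functions at its nodes can be shown with a deliberately simplified sketch: a single split on one feature, with a separate least-squares line fitted on each side. This is not the M5 prime algorithm itself, which also involves growing, pruning, and smoothing a full tree; the function names and data are hypothetical.

```python
import numpy as np

def _fit_line(x, y):
    """Least-squares line on one side of a split; returns (coef, sse)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(((A @ coef - y) ** 2).sum())
    return coef, sse

def fit_one_split_model_tree(x, y, min_leaf=3):
    """Pick the threshold that minimizes the total SSE of two linear fits."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x[i] == x[i - 1]:
            continue  # cannot split between identical feature values
        left, sse_left = _fit_line(x[:i], y[:i])
        right, sse_right = _fit_line(x[i:], y[i:])
        total = sse_left + sse_right
        if best is None or total < best[0]:
            best = (total, (x[i - 1] + x[i]) / 2.0, left, right)
    _, threshold, left, right = best
    return threshold, left, right

def predict_model_tree(model, x):
    """Route each sample to a side, then apply that side's linear model."""
    threshold, left, right = model
    x = np.asarray(x, dtype=float)
    return np.where(x <= threshold,
                    left[0] + left[1] * x,
                    right[0] + right[1] * x)

# Hypothetical piecewise-linear response with a change point at x = 5.
x = np.arange(10, dtype=float)
y = np.where(x < 5, 2.0 * x, 30.0 - x)

model = fit_one_split_model_tree(x, y)
y_hat = predict_model_tree(model, [2.0, 7.0])
```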

Tree ensemble models

Tree ensemble models provide better prediction accuracy than single trees and can capture nonlinear patterns. Random forest and extreme gradient boosting (XGBoost) have been used more than other tree ensemble models in the literature [3,20].
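To show how an ensemble of trees captures a nonlinear pattern, the sketch below implements plain gradient boosting with one-split regression trees (stumps) under squared error. It is a heavily simplified relative of XGBoost, without regularization or second-order terms, and is not the XGBoost library. The data, function names, and hyperparameters are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Fit each new stump to the residuals of the current ensemble."""
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_rounds):
        threshold, left_val, right_val = fit_stump(x, y - pred)
        pred = pred + learning_rate * np.where(x <= threshold, left_val, right_val)
        stumps.append((threshold, left_val, right_val))
    return base, learning_rate, stumps

def predict_boosted(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    pred = np.full(len(x), base)
    for threshold, left_val, right_val in stumps:
        pred = pred + learning_rate * np.where(x <= threshold, left_val, right_val)
    return pred

# Hypothetical nonlinear response that a single straight line cannot fit.
x = np.linspace(0.0, 10.0, 40)
y = 150.0 + 20.0 * np.sin(x)

model = boost(x, y)
y_hat = predict_boosted(model, x)
mse_baseline = ((y - y.mean()) ** 2).mean()
mse_boosted = ((y - y_hat) ** 2).mean()
```

A random forest differs in that its trees are trained independently on bootstrap samples and averaged, rather than fitted sequentially to residuals.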

Neural network models

Like tree ensemble models, neural networks can handle nonlinear patterns and produce reasonably accurate predictions. Many recent studies use variations of neural network models, ranging from back-propagation neural networks (BPNN) [10] to deep neural networks (DNN) [5,11-13,15], long short-term memory (LSTM) networks [8], and convolutional neural network (CNN) models (Khaki et al., 2020).
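The simplest member of this family, a back-propagation network with one hidden layer, can be sketched in a few lines of NumPy. The architecture, learning rate, number of epochs, and toy data below are assumptions made for illustration and do not reproduce any cited model.

```python
import numpy as np

def train_bpnn(X, y, hidden=8, lr=0.05, epochs=500, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        pred = (h @ W2 + b2).ravel()
        err = pred - y
        losses.append(float((err ** 2).mean()))
        # Backward pass: gradients of the mean squared error.
        g_pred = (2.0 / n) * err[:, None]
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (1.0 - h ** 2)
        g_W1 = X.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def predict_bpnn(params, X):
    """Forward pass with trained weights."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Hypothetical scaled feature and a nonlinear (quadratic) response.
X = np.linspace(-1.0, 1.0, 50)[:, None]
y = X[:, 0] ** 2

params, losses = train_bpnn(X, y)
y_hat = predict_bpnn(params, X)
```

Deep, recurrent, and convolutional variants keep the same forward and backward logic but change the layer structure.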

General ensemble models

Some studies combine several ML models to create an ensemble that outperforms its individual members. The base models can be as simple as regression trees or as complex as deep neural networks [4,7].
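One simple way to combine base models is a weighted average of their predictions. The sketch below assumes weights inversely proportional to each model's validation mean squared error, which is only one of several possible schemes and is not necessarily the one used in the cited studies. All names and numbers are hypothetical.

```python
import numpy as np

def inverse_mse_weights(val_preds, y_val):
    """Weights proportional to 1 / validation MSE, normalized to sum to 1.

    val_preds has shape (n_models, n_samples). Assumes no model has zero MSE.
    """
    mse = ((val_preds - y_val) ** 2).mean(axis=1)
    inv = 1.0 / mse
    return inv / inv.sum()

def ensemble_predict(preds, weights):
    """Weighted average of base-model predictions."""
    return weights @ preds

# Hypothetical validation yields and two biased base models.
y_val = np.array([100.0, 110.0, 120.0, 130.0])
val_preds = np.vstack([
    y_val + 2.0,  # model A overpredicts by 2 (MSE = 4)
    y_val - 4.0,  # model B underpredicts by 4 (MSE = 16)
])

weights = inverse_mse_weights(val_preds, y_val)
combined = ensemble_predict(val_preds, weights)
mse_combined = ((combined - y_val) ** 2).mean()
```

Here the ensemble is scored on the same data used to set the weights purely to keep the example short; a held-out set would be needed for an honest estimate.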

Summary of Results

The following table summarizes the studies that used ML models to predict US corn yields along with the numerical results of their best developed model (Table 1).

Conclusion

We presented a summary of studies that use machine learning models to predict corn yields. We explained the most common preprocessing tasks performed to prepare data for building machine learning models. The ML models developed throughout the literature were categorized into five general groups, and a summary of the studies that predicted U.S. corn yields was presented. Reviewing studies that use simulation crop models and remote sensing to predict corn yields is a possible direction for future research.

IJESNR.MS.ID.556161
