Agriculture plays a crucial role in feeding the world, and selecting the right crop for a given soil type can significantly improve productivity. With Machine Learning and Streamlit, we can build a Smart Crop Selection App that predicts the most suitable crop based on soil properties such as Nitrogen (N), Phosphorus (P), Potassium (K), and pH levels.
In this blog post, we will walk you through how to use Streamlit to build and deploy a Smart Crop Selection web application.
Streamlit is an open-source Python library that makes it easy to build and share interactive web applications for data science and machine learning projects. It requires minimal code and allows quick visualization of models and data.
Before building the app, install the required dependencies.
Run the following command in your terminal:
pip install streamlit pandas numpy scikit-learn joblib xgboost
This installs:
Create a directory structure:
smart_crop_selection/
├── app.py # Streamlit web app
├── data/
│ ├── soil_measures.csv # Dataset
├── models/
│ ├── crop_model.pkl # Trained ML model
│ ├── label_encoder.pkl # Encode data labels
│ ├── scaler.pkl # Scale data inputs
├── scripts/
│ ├── preprocess.py # Data preprocessing functions
│ ├── train_model.py # Model training script
├── requirements.txt # List of dependencies
├── README.md # Project documentation
We need to preprocess the soil dataset before training the model.
preprocess.py
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
import joblib
def preprocess_data(df, fit_scaler=True, fit_encoder=True):
"""
Preprocesses the input dataset:
- Encodes the target variable
- Standardizes numerical features
Parameters:
df (pd.DataFrame): Input data containing soil parameters and crop labels
fit_scaler (bool): Whether to fit a new StandardScaler (True for training, False for inference)
fit_encoder (bool): Whether to fit a new LabelEncoder (True for training, False for inference)
Returns:
tuple: (Preprocessed feature matrix X, Encoded target variable y, scaler, label_encoder)
"""
# Separate features and target
X = df.drop(columns=['crop']) # Ensure 'crop' is the target variable
y = df['crop']
# Encode the target variable
label_encoder = LabelEncoder()
if fit_encoder:
y_encoded = label_encoder.fit_transform(y)
joblib.dump(label_encoder, 'models/label_encoder.pkl') # Save encoder
else:
label_encoder = joblib.load('models/label_encoder.pkl')
y_encoded = label_encoder.transform(y)
# Standardize numerical features
numerical_features = ['N', 'P', 'K', 'ph']
X[numerical_features] = X[numerical_features].astype('float64')
scaler = StandardScaler()
if fit_scaler:
X_scaled = X.copy()
X_scaled[numerical_features] = scaler.fit_transform(X[numerical_features])
joblib.dump(scaler, 'models/scaler.pkl') # Save scaler
else:
scaler = joblib.load('models/scaler.pkl')
X_scaled = X.copy()
X_scaled[numerical_features] = scaler.transform(X[numerical_features])
return X_scaled, y_encoded, scaler, label_encoder
def preprocess_input(input_df):
"""
Prepares new input data for model prediction.
Parameters:
input_df (pd.DataFrame): User input containing soil parameters
Returns:
pd.DataFrame: Scaled input ready for prediction
"""
scaler = joblib.load('models/scaler.pkl') # Load pre-trained scaler
numerical_features = ['N', 'P', 'K', 'ph']
input_df[numerical_features] = input_df[numerical_features].astype('float64')
input_df[numerical_features] = scaler.transform(input_df[numerical_features])
return input_df
We use XGBoost to predict the best crop based on soil parameters.
train_model.py
import pandas as pd
import joblib
import xgboost as xgb
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import accuracy_score
from .preprocess import preprocess_data
# Load dataset
df = pd.read_csv('data/soil_measures.csv')
# Preprocess data
X_scaled, y_encoded, scaler, label_encoder = preprocess_data(df, fit_scaler=True, fit_encoder=True)
# Split the data with stratification
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_encoded, test_size=0.2, stratify=y_encoded, random_state=42)
# Define parameter grids for XGBoost
xgb_params = {
'max_depth': [10, 20],
'learning_rate': [0.001, 0.1, 1],
'n_estimators': [100, 200, 500],
'subsample': [0.7, 0.9]
}
# Perform hyperparameter tuning using cross-validation
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# XGBoost tuning
xgb_model = xgb.XGBClassifier(eval_metric='mlogloss', verbosity=1)
xgb_grid = GridSearchCV(estimator=xgb_model, param_grid=xgb_params, cv=kf, scoring='accuracy', n_jobs=1, verbose=1)
xgb_grid.fit(X_train, y_train)
# Best model from GridSearchCV
best_xgb_model = xgb_grid.best_estimator_
# Evaluate model
y_pred = best_xgb_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Best XGBoost Model Accuracy: {accuracy:.4f}')
# Save the best model
joblib.dump(best_xgb_model, 'models/crop_model.pkl')
app.py
import streamlit as st
import pandas as pd
import numpy as np
import joblib
from scripts.preprocess import preprocess_input
from sklearn.metrics import accuracy_score
# Load trained model and label encoder
crop_model = joblib.load('models/crop_model.pkl')
label_encoder = joblib.load('models/label_encoder.pkl')
# Streamlit UI
st.title("🌱 Smart Crop Selection Web App")
st.markdown("Enter soil parameters to get the best crop recommendation.")
# Sidebar inputs
st.sidebar.header("Input Soil Parameters")
N = st.sidebar.number_input("Nitrogen (N)", min_value=0.0, max_value=160.0, value=30.0)
P = st.sidebar.number_input("Phosphorus (P)", min_value=0.0, max_value=160.0, value=60.0)
K = st.sidebar.number_input("Potassium (K)", min_value=0.0, max_value=250.0, value=50.0)
pH = st.sidebar.number_input("pH Level", min_value=0.0, max_value=10.0, value=6.0)
# Convert inputs to DataFrame
input_data = pd.DataFrame([[N, P, K, pH]], columns=['N', 'P', 'K', 'ph'])
# Preprocess input
test_input = preprocess_input(input_data)
# Prediction
if st.sidebar.button("Predict Crop"):
predicted_crop_index = crop_model.predict(test_input)[0]
recommended_crop = label_encoder.inverse_transform([predicted_crop_index])[0]
# Calculate accuracy on training data
training_data = pd.read_csv('data/soil_measures.csv')
X_train, y_train, _, _ = preprocess_input(training_data.drop(columns=['crop'])), training_data['crop'], None, None
y_train_encoded = label_encoder.transform(y_train)
y_train_pred = crop_model.predict(X_train)
accuracy = accuracy_score(y_train_encoded, y_train_pred) * 100
# Display results
st.subheader("🌾 Recommended Crop:")
st.write(f"**{recommended_crop}** with an accuracy of **{accuracy:.2f}%**")
streamlit run app.py
app.py
as the main file.You can access the deployed version of this app: https://smartcropselection.streamlit.app/
The deployment code of the project is available here.
The data exploratory analysis and model development insights are discussed in detail in my previous post.
With Streamlit and Machine Learning, we have successfully built a Smart Crop Selection App that predicts the best crop based on soil parameters. 🚜🌱
Try deploying your own version and improve it by adding weather data or more soil parameters!