Perhaps you can review this video:
https://statquest.org/xgboost-part-1-xgboost-trees-for-regression/
I believe the origin of that image is from the DataCamp course "Extreme Gradient Boosting With XGBoost", and for full context the entirety of that slide is:
Linear Base Learner:
- Sum of linear terms
- Boosted model is weighted sum of linear models (thus is itself linear)
- Rarely used

Tree Base Learner:
- Decision tree
- Boosted model is weighted sum of decision trees (nonlinear)
- Almost exclusively used in XGBoost
The chapter in question is specifically about regression with XGBoost.
Just sharing my thoughts as a fellow learner; please correct me if I'm wrong.
I will just focus on the difference between the two learners (linear vs nonlinear).
In both cases, since the boosted model is an ensemble of weak learners, the regression prediction is the weighted sum of the outputs of all those weak models.
I believe the weight in the case of gradient boosted trees is the learning rate, i.e. the eta parameter in XGBoost; I'm not sure whether the same applies to the gblinear booster.
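To see where that weight comes in, here is a toy sketch of plain gradient boosting for squared error (not XGBoost's exact algorithm, which adds regularization and second-order information; the data, eta value, and stump depth below are made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy gradient boosting for squared error: each stage fits the residuals,
# and its contribution is scaled by the learning rate (eta).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.3
pred = np.full_like(y, y.mean())      # f_0: start from the mean (base prediction)
learners = []

for _ in range(50):
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    learners.append(tree)
    pred += eta * tree.predict(X)     # each learner's contribution is shrunk by eta

# The final prediction is the base value plus the eta-weighted sum of all learners
manual = y.mean() + eta * sum(t.predict(X) for t in learners)
print(np.allclose(manual, pred))      # True
```

Each stage fits the current residuals and its output is shrunk by eta, so the final prediction really is an eta-weighted sum of the weak learners plus the base prediction.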
So in the case of a linear base learner, each weak learner models a linear function y = Xβ + ε. Even if we take a weighted sum of all these linear functions, the result can be simplified algebraically into a single linear function.
Essentially, the model capacity of an ensemble of linear models is no better than that of a single linear model when it comes to capturing complex non-linear relationships.
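A quick numerical sanity check of that algebra (the coefficients and weights below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # 5 samples, 3 features (made-up data)

# Two "weak" linear models with arbitrary coefficients
beta1 = np.array([0.5, -1.0, 2.0])
beta2 = np.array([1.5, 0.3, -0.7])
w1, w2 = 0.3, 0.3                      # learning-rate-style weights

# Weighted sum of the two linear models' predictions ...
ensemble_pred = w1 * (X @ beta1) + w2 * (X @ beta2)

# ... equals a single linear model with the combined coefficients
single_pred = X @ (w1 * beta1 + w2 * beta2)

print(np.allclose(ensemble_pred, single_pred))   # True
```

The weighted sum of the two linear models is exactly the same as one linear model whose coefficient vector is the weighted sum of the individual coefficient vectors.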
In contrast, for the tree base learner, each of the weak learners is a decision tree.
Decision trees by themselves are already able to model non-linear relationships, so naturally their weighted sum is also able to model complex non-linear relationships.
Adding more trees to the ensemble increases the model capacity, but it also increases the risk of overfitting.
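To make the contrast concrete, here is a small sketch on synthetic data (not from the course; the target, hyperparameters, and train/test split are just placeholders) comparing the two boosters on a deliberately non-linear target. I would expect gbtree to reach a much lower test error than gblinear here:

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=1000)   # non-linear target

X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

for booster in ["gblinear", "gbtree"]:
    model = XGBRegressor(booster=booster, n_estimators=100, learning_rate=0.1)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{booster}: test MSE = {mse:.4f}")
```

The gblinear booster can only ever produce a linear fit to sin(x), while the tree booster can approximate the curve, which is essentially the point the slide is making.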