Statistically reinforced machine learning for nonlinear interactions of factors and hierarchically nested spatial patterns
A data-driven approach employing machine learning holds promise for applications in ecological informatics. Machine learning can automatically find unexpected relationships between variables and thereby help generate hypotheses. Although they recognize this advantage, many ecologists may feel uncomfortable relying fully on a data-driven approach, which differs considerably from the hypothesis-driven approach (i.e., frequentist statistics). Using several examples from our ongoing projects, we discuss both the advantages and the caveats of applying machine learning in ecology. First, we introduce an emerging technique that blends machine learning and frequentist statistics: statistically reinforced machine learning. This modeling approach has the potential to discover nonlinear relationships and interactions on the basis of statistical significance, without requiring the user to specify a priori which variables interact. We show an example in which machine learning identified the interaction of elevational gradient, biogeoclimatic class, and forest coverage as the most important among more than 40,000 possible three-way interactions for explaining freshwater biodiversity patterns across Switzerland. Second, we introduce how to use machine learning to model spatial patterns that are hierarchically nested across multiple scales while accounting for spatial autocorrelation, which is largely overlooked.
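To give a sense of the search space behind the three-way interaction example, the short sketch below counts candidate interactions. It assumes that a three-way interaction is an unordered triple of distinct predictor variables, which the abstract does not state; the function names `n_three_way` and `smallest_p_exceeding` are hypothetical and introduced only for illustration. Under that assumption, it finds the smallest number of candidate variables that would yield more than 40,000 such triples. This is an arithmetic illustration, not a description of the study's actual variable set.

```python
from math import comb


def n_three_way(p: int) -> int:
    """Count unordered triples of distinct variables among p candidates.

    Assumption: a three-way interaction is any such triple, i.e. C(p, 3).
    """
    return comb(p, 3)


def smallest_p_exceeding(threshold: int) -> int:
    """Return the smallest p for which C(p, 3) is strictly above threshold."""
    p = 3  # at least three variables are needed to form one triple
    while n_three_way(p) <= threshold:
        p += 1
    return p


p_min = smallest_p_exceeding(40_000)
print(p_min, n_three_way(p_min))  # prints: 64 41664
```

Under the stated assumption, a candidate set of roughly 64 variables is already enough to exceed 40,000 possible three-way interactions, which illustrates why specifying the interacting variables a priori quickly becomes impractical.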