In:
Biometrika, Oxford University Press (OUP), Vol. 107, No. 3 (2020-09-01), pp. 723-735
Abstract:
We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number $d$ of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of order $O\{n^{2/(pr+1)}\}$, where $p\in [1,2]$ depends on the true function and $r > 1$ depends on the type of spline. Such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity. The asymptotic analysis shows that the proposed smoothing spline estimator can decrease $q$ to around $O\{n^{1/(pr+1)}\}$ when $d\leq pr+1$. Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
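The core idea described in the abstract — choosing basis functions centred at approximately equally spaced observations rather than at uniformly random ones — can be illustrated with a minimal sketch. The function below is an assumption-laden illustration, not the authors' implementation: it sorts a one-dimensional predictor and picks the $q$ observations closest to equally spaced ranks.

```python
import numpy as np

def select_equally_spaced_basis(x, q):
    """Return indices of q observations that are roughly equally
    spaced over the range of the predictor x.
    Illustrative sketch only (1-D case); not the paper's algorithm."""
    order = np.argsort(x)  # ranks of the observations
    # q target ranks, spread evenly from the smallest to the largest
    positions = np.linspace(0, len(x) - 1, num=q).round().astype(int)
    return order[positions]  # indices into the original sample

# usage: from n = 1000 observations, keep q = 30 basis centres
rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
idx = select_equally_spaced_basis(x, 30)
```

The selected observations then serve as the centres of the $q$ retained basis functions, giving the $O(nq^2)$ fitting cost mentioned above with a far smaller $q$ than random selection would need.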
Type of Medium:
Online Resource
ISSN:
0006-3444, 1464-3510
DOI:
10.1093/biomet/asaa019
Language:
English
Publisher:
Oxford University Press (OUP)
Publication Date:
2020
ZDB-ID:
1119-8
ZDB-ID:
1470319-1
SSG:
12