where \(\mathsf A=(\mathsf A_i,j)=\kappa (\varvecx_i,\varvecx_j)\), \(i,j=1,\ldots ,n\), is the so-called interpolation (or collocation or simply kernel) matrix. The uniqueness of the solution of (1) is guaranteed as long as \(\kappa \) is strictly positive definite. For a more general formulation of the interpolant that involves conditionally positive definite kernels, and for a complete overview concerning kernel-based approximation, we refer the reader e.g. to [2].
kernel-based approximation methods using matlab pdf 53
As a consequence, the power function gives information about how the interpolation error relates to the node distributions. Indeed, as for polynomial and spline interpolation, the approximation quality strongly depends on the distribution of the scattered data. In view of this, starting with an initial set of nodes, many adaptive strategies have been studied in order to construct well-behaved interpolation designs, i.e. interpolation sets which provide an accurate reconstruction and, preferably, affordable computational costs. In particular, the so-called greedy approaches match such purposes by iteratively adding new points to the interpolation set. The iterative rule is based on minimizing a pointwise upper bound for the interpolant. Precisely, those strategies rely on the residual of the interpolant (f-greedy) or on the value of the power function (p-greedy); refer to [6,7,8,9] for a general overview. Despite the fact that we only focus on these two schemes, other strategies that combine the two above, known as \(f \cdot p\) and f/p greedy, are available; we refer the reader to [10,11,12]. These methods fall under the context of knot insertion algorithms, which have been studied also in the context of adaptive least-squares approximation [1, 21].
Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties facilitating separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to other kernel-based methods, K-OPLS features enhanced interpretational capabilities; allowing detection of unanticipated systematic variation in the data such as instrumental drift, batch variability or unexpected biological variation.
The Kernel-OPLS method [21] is a recent reformulation of the original OPLS method to its kernel equivalent. K-OPLS has been developed with the aim of combining the strengths of kernel-based methods to model non-linear structures in the data while maintaining the ability of the OPLS method to model structured noise. The K-OPLS algorithm allows estimation of an OPLS model in the feature space, thus combining these features. In analogy with the conventional OPLS model, the K-OPLS model contains a set of predictive components Tp and a set of Y-orthogonal components To. This separate modelling of Y-predictive and Y-orthogonal components does not affect the predictive power of the method, which is comparable to KPLS and least-squares SVMs [22]. However, the explicit modelling of structured noise in the feature space can be a valuable tool to detect unexpected anomalies in the data, such as instrumental drift, batch differences or unanticipated biological variation and is not performed by any other kernel-based method to the knowledge of the authors. Pseudo-code for the K-OPLS method is available in Table 1. For further details regarding the K-OPLS method, see Rantalainen et al. [21].
Implementations of various kernel-based methods are available in the literature for the R and MATLAB environments. Among the R packages available on CRAN [23], a few relevant examples include kernlab (kernel-based regression and classification), e1071 (including SVMs) and PLS (implementing a linear kernel-based implementation of the PLS algorithm). kernlab provides a number of kernel-based methods for regression and classification, including SVMs and least-squares SVMs, with functionality for n-fold cross-validation. The e1071 package contains functions for training and prediction using SVMs, including (randomised) n-fold cross-validation. The PLS package includes an implementation of both linear PLS as well as a linear kernel-based PLS version. This enables more efficient computations in situations where the number of observations is very large in relation to the number of features. The PLS package also provides a flexible cross-validation functionality.
The K-OPLS method can be used for both regression as well as classification tasks and has optimal performance in cases where the number of variables is much higher than the number of observations. Typical application areas are non-linear regression and classification problems using omics data sets. Properties of the K-OPLS method make it particularly helpful in cases where detecting and interpreting patterns in the data is of interest. This may e.g. involve instrumental drift over time in metabolic profiling applications using e.g. LC-MS or when there is a risk of dissimilarities between different experimental batches collected at different days. In addition, structured noise (Y-orthogonal variation) may also be present as a result of the biological system itself and can therefore be applied for the explicit detection and modelling of such variation. This is accomplished by interpretation of the Y-predictive and the Y-orthogonal score components in the K-OPLS model. The separation of Y-predictive and Y-orthogonal variation in the feature space is unique to the K-OPLS method and is not present in any other kernel-based method.
The K-OPLS algorithm has been implemented as an open-source and platform-independent software package for MATLAB and R, in accordance with [21]. The K-OPLS package provides functionality for model training, prediction and evaluation using cross-validation. Additionally, model diagnostics and plot functions have been implemented to facilitate and further emphasise the interpretational strengths of the K-OPLS method compared to other related methods.
Kernel methods have previously been applied successfully in many different pattern recognition applications due to the strong predictive abilities and availability of the methods. The K-OPLS method is well suited for analysis of biological data, foremost through its innate capability to separately model predictive variation and structured noise. This property of the K-OPLS method has the potential to improve the interpretation of biological data, as was demonstrated by a plant NMR data set where interpretation is enhanced compared to the related method KPLS. In conjunction with the availability of the outlined open-source package, K-OPLS provides a comprehensive solution for kernel-based analysis in bioinformatics applications. 2ff7e9595c
Comments