Multidimensional adaptive P‐splines with application to neurons' activity studies

The receptive field (RF) of a visual neuron is the region of the space that elicits neuronal responses. It can be mapped using different techniques that allow inferring its spatial and temporal properties. Raw RF maps (RFmaps) are usually noisy, making it difficult to obtain and study important features of the RF. A possible solution is to smooth them using P‐splines. Yet, raw RFmaps are characterized by sharp transitions in both space and time. Their analysis thus asks for spatiotemporal adaptive P‐spline models, where smoothness can be locally adapted to the data. However, the literature lacks proposals for adaptive P‐splines in more than two dimensions. Furthermore, the extra flexibility afforded by adaptive P‐spline models is obtained at the cost of a high computational burden, especially in a multidimensional setting. To fill these gaps, this work presents a novel anisotropic locally adaptive P‐spline model in two (e.g., space) and three (space and time) dimensions. Estimation is based on the recently proposed SOP (Separation of Overlapping Precision matrices) method, which provides the speed we look for. Besides the spatiotemporal analysis of the neuronal activity data that motivated this work, the practical performance of the proposal is evaluated through simulations, and comparisons with alternative methods are reported.


INTRODUCTION
A fundamental characteristic of neurons of the mammalian visual system, as of neurons involved in other sensory systems, is that they have what is known as a receptive field (RF). In the case of a visual neuron, the RF is the small area of the visual field that the neuron "sees" (i.e., the area that elicits neuronal responses). This RF has both spatial and temporal properties, and their study is fundamental to understanding how visual information is processed in the brain. The RF can be mapped using different techniques, such as reverse cross-correlation. These techniques allow studying how visual neurons process sensory stimuli from different positions in their visual field. In brief, a sensory stimulus can cause a neuron to produce sudden changes in its membrane potential known as "spikes"; from these responses, it is possible to infer properties of the RF (e.g., its spatiotemporal properties: where and when a sensory stimulus produces a response). Yet, neuronal responses usually produce noisy signals, as they encode information by means of spike trains of variable length and frequency, and therefore the spatial and temporal properties of the RF might not be clearly defined. Postprocessing the noisy neural signals through smoothing techniques yields RF maps (RFmaps) with desirable properties and paves the way for the study of important characteristics of RFs, such as their size, center, or boundaries, as well as for comparisons among neurons and/or experimental conditions. Penalized splines (O'Sullivan, 1986; Eilers & Marx, 1996) have become one of the most popular smoothing methods in semiparametric regression for the estimation of curves and surfaces. Penalized splines combine a low-rank basis with a penalty term. This term controls the smoothness of the estimated function, and its influence is determined by a smoothing parameter.
In the classic penalized spline approach, this parameter is global in the sense that it provides a constant amount of smoothing (i.e., classic penalized splines rely on assuming a smooth transition of the covariate effect across its whole domain). This might be a serious drawback in situations where the curves/surfaces are nonhomogeneous, showing rapid changes in some regions while being rather smooth in others (Ruppert & Carroll, 2000). Indeed, this is the case of the neuronal activity study that motivated the work described in this paper. Details regarding the experiment and the data are given in Section 2, but we advance that, in this setting, classic penalized spline models are not able to properly recover the sharp spatiotemporal transitions present in the data, which ask for spatially adaptive smoothness.
In the case of univariate smoothing, a wide range of solutions to adapt smoothness locally to the data has been proposed. Ruppert and Carroll (2000) based their approach on locally varying smoothing parameters and a multivariate cross-validation (CV) approach for their selection. In Krivobokova et al. (2008), the varying smoothing (variance) parameters are modeled as a log-penalized spline, yielding a hierarchical mixed model; they use a Laplace approximation of the marginal likelihood for parameter estimation. More recently, Yue et al. (2014) introduced a class of adaptive smoothing spline models derived by solving certain stochastic differential equations with finite element methods. The task becomes more complicated when trying to achieve adaptivity in two dimensions. As far as we are aware, all attempts at two-dimensional adaptive smoothing have been made in the context of isotropic smoothing, i.e., the same amount of smoothing is used in both dimensions. For example, Lang and Brezger (2004) use locally adaptive smoothing parameters incorporated through a smoothness prior with spatially adaptive variances; Yue and Speckman (2010) improve on this work by proposing a prior with a spatially adaptive variance component and placing a further Gaussian Markov random field prior on this variance function. A different approach is taken by Jang and Oh (2011), who replace the global smoothing parameter with a smooth step function that can be extended to higher dimensions. In Krivobokova et al. (2008), adaptivity in two dimensions is achieved by modeling the spatially variable smoothing parameters using (penalized) low-rank radial basis functions, and Reiss et al. (2017) use adaptive smoothing and a nonisotropic penalty, but the adaptivity is only in one direction, allowing only one smoothing parameter to vary along the other dimension.
However, all these approaches are either computationally very demanding or unstable, or do not include general smoothing structures such as anisotropy.
In this work, we present a general framework for anisotropic multidimensional adaptive smoothing in the context of P-spline models (Eilers & Marx, 1996), where B-spline bases are combined with discrete difference penalties on the coefficients. Our proposal relies on the construction of locally adaptive penalties by assuming a different smoothing parameter for each coefficient difference in the penalty. Its use is not restricted to Gaussian responses (as is the case for most of the existing approaches) because it can be easily extended to responses within the exponential family of distributions. One of the possible drawbacks of adaptive smoothing is the computational cost of having to estimate, or select by CV methods or information criteria, multiple smoothing parameters. We solve this issue by using the connection between P-splines and generalized linear mixed models (Currie & Durban, 2002) and the recently developed SOP (Separation of Overlapping Precision Matrices) method (Rodríguez-Álvarez et al., 2019). SOP allows estimating P-spline models with overlapping penalty matrices (or overlapping precision matrices) and uses restricted maximum likelihood (REML) to select the smoothing parameters (variance parameters). For a discussion of the benefits and disadvantages of REML-based and CV-based smoothing parameter selection, we refer the reader to Krivobokova and Kauermann (2007) and Krivobokova (2013).
The rest of the paper is organized as follows. In Section 2, we describe the study that motivated this work. Section 3 presents our approach for the construction of locally adaptive multidimensional anisotropic penalties. The empirical performance of the approach is evaluated in a simulation study in Section 4. In Section 5, results for the neurons' activity study are shown, and we conclude with a discussion. Details on estimation and computation, extended simulations, and extra results for the neuronal activity data are available as Supporting Information.

MOTIVATING EXAMPLE: NEURONS' ACTIVITY STUDY
The work described in this paper was motivated by the research and electrophysiology studies conducted by Francisco Gonzalez, professor of Ophthalmology at the University of Santiago de Compostela (Galicia, Spain). The data we deal with here derive from experiments conducted to study neurons' activity in the visual cortex, in particular, the spatial and temporal properties of RFs. A detailed explanation of the electrophysiological experiment and the reverse cross-correlation technique discussed here can be found in Rodríguez-Álvarez et al. (2012) (and references therein). That paper was also the starting point of this work. Schematically, the experiment is as follows (Figure 1). The subject (a monkey) views two monitors, one for each eye. On each monitor, there is a square area of dimension 16 × 16 (i.e., 256 spatial/grid locations). In a pseudorandom way, stimuli are delivered at these spatial locations (Figure 1A). The experiment is conducted under two different experimental conditions: the stimulus can correspond to the flash of a bright ("ON") or dark ("OFF") spot. While the stimuli are being delivered, the activity of the neuron is recorded (Figure 1B). When a spike occurs, say at time 0, the location of the stimulus at different prespike times (−20, −40, …, −320 ms; for this experiment, a stimulus is not expected to produce a response (spike) after 320 ms) is recovered (Figure 1C). Unfortunately, the way the experiment is performed does not make it possible to know which of these stimuli (and thus locations) is responsible for the spike (neuron's response), and all stimuli are considered potentially responsible. As such, the number of spike occurrences attributed to the location of the stimulus at each of the different prespike times is increased by one (Figure 1D, red bold numbers).
However, since stimuli are randomly delivered/presented many times during the experiment, this allows determining where and when a stimulus produces a neuron's response. Summarizing, for each neuron (the complete experiment contains data for 17 different neurons), eye (left or right), and experimental condition ("ON" or "OFF"), the reverse cross-correlation technique provides a dataset consisting of a series of 16 matrices (one for each prespike time) of dimension 16 × 16 (256 grid positions that represent the square area). Each cell of each of the 16 matrices contains the number of spike occurrences attributed to that location at the corresponding prespike time. In addition, there is an extra matrix of dimension 16 × 16 with the number of stimulus presentations at each particular location of the square area. The graphical representation of each of the 16 matrices (normalized, i.e., each cell is divided by the number of stimulus presentations at that location) is called an RF map (RFmap) and can be regarded as a representation of the firing rate of the neuron. Figure 2 depicts the evolution over time of the raw RFmap for a particular neuron (denoted by FAU3), eye (right), and experimental condition ("ON"). Despite the noisy data, it can be observed that, for most prespike times, the firing rate is uniform (shows no structure), but there is a clear increase in the firing rate around −60 ms for a central area of the visual field. In other words, the results suggest that only stimuli in this central area produce a response of the neuron and that the response occurs around 60 ms after the stimulus is presented. This area corresponds to the RF of the neuron. In this paper, we aim to use P-spline models to provide smoothed (denoised) versions of RFmaps such as those shown in Figure 2. Yet, note that the raw RFmaps show that the transition from outside the RF into the RF is very sharp and that the maps are structured only for a short period (between −40 and −80 ms).
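To make the normalization just described concrete, the following is a minimal sketch of how a single raw RFmap frame is formed by dividing spike-occurrence counts by the number of stimulus presentations. The counts here are simulated stand-ins for real recordings; the value ranges are illustrative assumptions, only the 16 × 16 grid size comes from the experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data mimicking one prespike time of the experiment:
# Y holds spike-occurrence counts on the 16 x 16 grid, and N holds the
# number of stimulus presentations at each grid location.
Y = rng.poisson(lam=3.0, size=(16, 16))            # spike occurrences
N = rng.integers(low=50, high=80, size=(16, 16))   # stimulus presentations

# Raw RFmap frame: counts normalized by presentations (a crude firing-rate
# estimate). One such frame is obtained for each prespike time.
rfmap = Y / N
```

Stacking the 16 such frames (one per prespike time) gives the noisy spatiotemporal array that the paper proposes to smooth.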
This suggests the need for multidimensional adaptive smoothing, which was also pointed out in the discussion of Rodríguez-Álvarez et al. (2012).

MULTIDIMENSIONAL ADAPTIVE PENALTY
Our proposal for the construction of locally adaptive penalties in more than one dimension builds upon the same principles as the adaptive penalty in one dimension (see, e.g., Rodríguez-Álvarez et al., 2019, and references therein). In consequence, to make the presentation of the new results more readable, we first briefly focus on the one-dimensional case, and then we move on to the multidimensional setting. For clarity and brevity, we describe in full detail the rationale for the adaptive penalty in two dimensions and relegate the generalization to the three-dimensional case to Web Appendix C.

[Figure 1: The animal views two monitors (A) with a fixation target. Within a square area over the neuron's visual field, a bright ("ON") or dark ("OFF") spot is flashed at different positions in a pseudorandom manner. Neuron spikes are recorded while the stimulus is delivered (B). When a spike is produced (at time 0), the stimulus position at several prespike times (−20, −40, …, −320 ms) is read (C), and the corresponding position count is increased by one (D, red bold numbers). This figure appears in color in the electronic version of this article, and any mention of color refers to that version.]

Adaptive penalty in one dimension
Let y = (y_1, …, y_n)^⊤ be a vector of observations, and consider the (simple) generalized model g(μ_i) = f(x_i) (i = 1, …, n), where μ_i = E(y_i | x_i), Var(y_i | x_i) = φ ν(μ_i), g(⋅) is the link function, and f(⋅) is a smooth and unknown function. Here, ν(⋅) is a specified variance function, and φ is the dispersion parameter that may be known or unknown. In P-splines (Eilers & Marx, 1996), the function f(x) is modeled as a linear combination of B-spline basis functions, i.e., f(x) = Σ_{j=1}^{c} θ_j B_j(x), and smoothness is ensured by penalizing the differences of order q of the coefficients associated with adjacent B-spline basis functions, i.e., the penalty takes the following form:

λ Σ_{j=q+1}^{c} (Δ^q θ_j)² = λ θ^⊤ D_q^⊤ D_q θ,   (1)

where Δ^q forms differences of order q, θ = (θ_1, θ_2, …, θ_c)^⊤, and D_q is the matrix representation of Δ^q. Finally, λ is the smoothing parameter that controls the trade-off between fidelity to the data (when λ is small) and smoothness of the function estimate (when λ is large). Note that Equation (1) penalizes all coefficient differences, Δ^q θ_j (j = q + 1, …, c), by the same smoothing parameter (see Web Figure 1a). Implicit in Equation (1) is thus the assumption that the same amount of smoothing is needed across the whole domain of the covariate. The locally adaptive penalty relaxes this assumption by assuming a different smoothing parameter for each coefficient difference:

Σ_{j=q+1}^{c} λ_{j−q} (Δ^q θ_j)² = θ^⊤ D_q^⊤ diag(λ) D_q θ,   (2)

where λ = (λ_1, …, λ_{c−q})^⊤ is a vector of smoothing parameters. That is, the adaptive penalty defined in (2) allows the amount of smoothing (driven by the smoothing parameters λ_j) to vary locally depending on the covariate values. This is graphically illustrated in Web Figure 1(b).
To reduce the complexity of the adaptive penalty in (2) (there are as many smoothing parameters as coefficient differences, i.e., c − q), the vector of smoothing parameters λ is further replaced by a smooth version of it based on τ = (τ_1, …, τ_p)^⊤ (with p < (c − q) so as to ensure that the number of smoothing parameters is reduced) using a B-spline basis expansion, i.e.,

λ = Ψ τ,   (3)

where Ψ is the B-spline design matrix of dimension (c − q) × p. Plugging the right-hand side of (3) into (2), the locally adaptive penalty is expressed as

θ^⊤ D_q^⊤ diag(Ψ τ) D_q θ = Σ_{l=1}^{p} τ_l θ^⊤ D_q^⊤ diag(ψ_l) D_q θ,   (4)

where ψ_l denotes the l-th column of Ψ.
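The difference penalties in (1) and (2) can be sketched numerically as follows. This is an illustrative construction with arbitrary smoothing-parameter values, not the paper's estimation procedure (where the parameters are selected by REML via SOP):

```python
import numpy as np

def diff_matrix(c, q):
    """q-th order difference matrix D_q of shape (c - q, c)."""
    return np.diff(np.eye(c), n=q, axis=0)

c, q = 10, 2
D = diff_matrix(c, q)

# Standard P-spline penalty (1): one global smoothing parameter lam.
lam = 5.0
P_global = lam * D.T @ D

# Adaptive penalty (2): one smoothing parameter per coefficient difference
# (arbitrary illustrative values here).
lam_vec = np.linspace(0.1, 10.0, c - q)
P_adapt = D.T @ np.diag(lam_vec) @ D

# Sanity check: with a constant vector of smoothing parameters, the
# adaptive penalty reduces to the global one.
P_check = D.T @ np.diag(np.full(c - q, lam)) @ D
```

Each row of `D` encodes one difference Δ^q θ_j, so weighting the rows individually via `np.diag(lam_vec)` is exactly what makes the penalty locally adaptive.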

Adaptive penalty in two dimensions
In the two-dimensional case, interest lies in the generalized model

g(μ_i) = f(x_{1i}, x_{2i}) (i = 1, …, n),   (6)

where x = (x_1, x_2)^⊤ is a two-dimensional covariate vector, and f(⋅, ⋅) is a smooth and unknown bivariate surface, defined over covariates x_1 and x_2. When it comes to extending the P-spline principles to two dimensions, we first model the bidimensional surface in terms of B-splines. This is accomplished by the tensor product of two marginal B-spline bases, i.e., f(x_1, x_2) = Σ_{j=1}^{c_1} Σ_{k=1}^{c_2} θ_{jk} B_{1j}(x_1) B_{2k}(x_2), where B_{1j}(⋅) and B_{2k}(⋅) are the marginal B-spline basis functions for, respectively, x_1 and x_2. In matrix form, model (6) becomes

g(μ) = B θ,   (7)

where B = B_2 □ B_1 = (B_2 ⊗ 1_{c_1}^⊤) ⊙ (1_{c_2}^⊤ ⊗ B_1), with B_1 = [b_{1;ij}], b_{1;ij} = B_{1j}(x_{1i}), and B_2 = [b_{2;ik}], b_{2;ik} = B_{2k}(x_{2i}). Here, 1_c is a column vector of ones of length c, ⊗ denotes the Kronecker product, ⊙ the element-wise (Hadamard) product, and □ the "box" product (the face-splitting product or row-wise Kronecker product; Eilers et al., 2006; Slyusar, 1999). Finally, μ = (μ_1, …, μ_n)^⊤ and θ = (θ_{11}, …, θ_{c_1 1}, …, θ_{c_1 c_2})^⊤.

[Figure 3: Illustration of the tensor product of marginal B-spline basis functions (B_{1j}(x_1) B_{2k}(x_2)) and the anisotropic penalty (based on coefficient differences along covariates x_1 and x_2) defined in (8). The top row shows the landscape of nine cubic B-spline tensor products (a portion of a full basis) and highlights why forming coefficient differences along x_1 and x_2 (i.e., on, respectively, the columns and rows of the matrix of coefficients) ensures smoothness along the corresponding covariate. The bottom row schematically illustrates the coefficient differences (arrows) and the smoothing parameters acting on them. In both cases, red is used for covariate x_1 and blue for covariate x_2.]

As for the one-dimensional case, in the two-dimensional setting, smoothness is achieved by penalizing coefficient differences.
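The row-wise Kronecker ("box") product used above to build the tensor-product design matrix B can be sketched as follows; the random matrices are stand-ins for the marginal B-spline design matrices B_1 and B_2 (building actual B-spline bases is omitted for brevity):

```python
import numpy as np

def box_product(B1, B2):
    """Row-wise Kronecker ("box") product: row i equals kron(B2[i], B1[i])."""
    n = B1.shape[0]
    # out[i, k, j] = B1[i, j] * B2[i, k]; flattening (k, j) reproduces kron.
    return np.einsum('ij,ik->ikj', B1, B2).reshape(n, -1)

# Stand-ins for the marginal design matrices B_1 (n x c1) and B_2 (n x c2).
rng = np.random.default_rng(0)
n, c1, c2 = 6, 4, 5
B1 = rng.random((n, c1))
B2 = rng.random((n, c2))

B = box_product(B1, B2)  # n x (c1 * c2) tensor-product design matrix
```

Note that only the n rows are "multiplied pointwise", so B has n rows and c_1 × c_2 columns, matching the length of θ.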
In particular, the anisotropic penalty in two dimensions is defined as

θ^⊤ (λ (I_{c_2} ⊗ D_1^⊤ D_1) + λ̃ (D_2^⊤ D_2 ⊗ I_{c_1})) θ,   (8)

where D_s (s = 1, 2) are difference matrices of possibly different orders q_s, and λ and λ̃ are the smoothing parameters (Eilers & Marx, 2003). Before proceeding, it is worth seeing the vector θ as a (c_1 × c_2) matrix of coefficients, Θ = [θ_{jk}]; the rows and columns of Θ correspond to the regression coefficients in the x_1 and x_2 directions, respectively. Thus, θ^⊤ (I_{c_2} ⊗ D_1^⊤ D_1) θ forms (the sum of squares of) differences of order q_1 on each column of the matrix of coefficients Θ; it is thus responsible, in combination with the smoothing parameter λ, for the smoothness along covariate x_1. Similarly, θ^⊤ (D_2^⊤ D_2 ⊗ I_{c_1}) θ forms (the sum of squares of) differences of order q_2 on each row of the matrix of coefficients, controlling, jointly with λ̃, the smoothness along covariate x_2. Figure 3 gives more insight into (the reasoning behind) the anisotropic penalty just presented and helps in presenting the adaptive penalty in two dimensions that follows.
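A minimal numerical sketch of the anisotropic penalty matrix in (8) follows, with purely illustrative smoothing-parameter values (`lam1` and `lam2` playing the roles of λ and λ̃):

```python
import numpy as np

def diff_matrix(c, q):
    """q-th order difference matrix of shape (c - q, c)."""
    return np.diff(np.eye(c), n=q, axis=0)

c1, c2, q1, q2 = 6, 7, 2, 2
D1, D2 = diff_matrix(c1, q1), diff_matrix(c2, q2)

lam1, lam2 = 1.0, 10.0  # illustrative values; estimated by REML in the paper

# Anisotropic penalty matrix of (8): the first Kronecker term penalizes
# differences down the columns of the coefficient matrix (covariate x1),
# the second penalizes differences along its rows (covariate x2).
P = lam1 * np.kron(np.eye(c2), D1.T @ D1) + lam2 * np.kron(D2.T @ D2, np.eye(c1))
```

Both Kronecker terms are (c_1 c_2) × (c_1 c_2), so P acts directly on the vectorized coefficients θ.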
By considering two different smoothing parameters λ and λ̃ (i.e., anisotropy), the penalty in (8) permits a different amount of smoothing for x_1 and x_2. However, in some circumstances, this flexibility may not be enough for the model to capture "local" behaviors in the data; all coefficient differences (along x_1 or along x_2) are penalized by the same smoothing parameter (λ or λ̃), and thus the same amount of smoothing is assumed along each covariate. Following the same reasoning as in the one-dimensional case, we propose to overcome the (possible) lack of flexibility of penalty (8) by considering a different smoothing parameter for each coefficient difference, and we do so separately for the coefficient differences along x_1 and along x_2. The idea is graphically exemplified in Figure 4, which shows clearly that our approach gives rise to two matrices of smoothing parameters, Λ = [λ_{jk}] of dimension (c_1 − q_1) × c_2, and Λ̃ = [λ̃_{jk}] of dimension c_1 × (c_2 − q_2). Recall that c_s is the dimension of the marginal B-spline bases and q_s is the penalty order (s = 1, 2). In particular, the smoothing parameters in Λ act on the coefficient differences along x_1 and thus control the amount of smoothing along that covariate, but permitting it to vary locally. The same applies to Λ̃, which (adaptively) controls the amount of smoothing along x_2. With this in mind, our two-dimensional anisotropic adaptive penalty takes the following form:

θ^⊤ ((I_{c_2} ⊗ D_1)^⊤ diag(λ) (I_{c_2} ⊗ D_1) + (D_2 ⊗ I_{c_1})^⊤ diag(λ̃) (D_2 ⊗ I_{c_1})) θ,   (9)

where λ = vec(Λ) and λ̃ = vec(Λ̃), and vec(⋅) denotes the vectorization of a matrix (stacking its columns). A possible drawback of the adaptive penalty defined in (9) is the number of smoothing parameters involved, which equals (c_1 − q_1) × c_2 + c_1 × (c_2 − q_2), i.e., the total number of coefficient differences along x_1 and x_2. This may give rise, in addition to undersmoothing and unstable computations, to prohibitively long computing times. We propose to reduce the complexity of the multidimensional adaptive penalty in (9) through a reduction of the number of smoothing parameters.
In a similar manner to the one-dimensional case presented in Section 3.1, this is done by considering a smoothed (and smaller) version of them. The underlying assumption is that smoothing parameters that are spatially proximate are more likely to be similar than those farther apart. Before proceeding, note that we now have two matrices of smoothing parameters, Λ and Λ̃ (see also Figure 4). It seems then reasonable to smooth them using the tensor product of marginal B-spline bases, and we do it separately for Λ and Λ̃. Taking advantage of the "data" (smoothing parameters) being in an array structure, we write

vec(Λ) = (Ψ_2 ⊗ Ψ_1) τ,   (10)

vec(Λ̃) = (Ψ̃_2 ⊗ Ψ̃_1) τ̃,   (11)

where Ψ_1, of dimension (c_1 − q_1) × p_1, Ψ_2, of dimension c_2 × p_2, Ψ̃_1, of dimension c_1 × p̃_1, and Ψ̃_2, of dimension (c_2 − q_2) × p̃_2, are B-spline design matrices. Plugging the right-hand sides of Equations (10) and (11) into (9), and after some algebraic operations, we obtain our proposal for the adaptive penalty in two dimensions:

θ^⊤ (Σ_{l=1}^{p_1 p_2} τ_l (I_{c_2} ⊗ D_1)^⊤ diag(ψ_l) (I_{c_2} ⊗ D_1) + Σ_{m=1}^{p̃_1 p̃_2} τ̃_m (D_2 ⊗ I_{c_1})^⊤ diag(ψ̃_m) (D_2 ⊗ I_{c_1})) θ,   (13)

where ψ_l and ψ̃_m denote, respectively, the l-th and m-th columns of Ψ = Ψ_2 ⊗ Ψ_1 (see (10)) and Ψ̃ = Ψ̃_2 ⊗ Ψ̃_1 (see (11)).
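The reduction of the number of smoothing parameters in (10), and the resulting adaptive penalty block for the x_1 direction (the x_2 block is analogous), can be sketched as follows. Random positive matrices stand in for the B-spline design matrices Ψ_1 and Ψ_2, and all values are purely illustrative:

```python
import numpy as np

def diff_matrix(c, q):
    """q-th order difference matrix of shape (c - q, c)."""
    return np.diff(np.eye(c), n=q, axis=0)

c1, c2, q1 = 8, 8, 2
D1 = diff_matrix(c1, q1)

# The full adaptive penalty along x1 needs (c1 - q1) * c2 = 48 smoothing
# parameters; instead, they are expressed through a small tensor-product
# basis. Random positive matrices stand in for Psi_1 and Psi_2.
p1, p2 = 3, 3
rng = np.random.default_rng(0)
Psi1 = rng.random((c1 - q1, p1))   # (c1 - q1) x p1
Psi2 = rng.random((c2, p2))        # c2 x p2
tau = rng.random(p1 * p2)          # reduced smoothing parameters

lam = np.kron(Psi2, Psi1) @ tau    # vec(Lambda): one value per difference
C1 = np.kron(np.eye(c2), D1)       # all q1-order differences along x1
P1 = C1.T @ np.diag(lam) @ C1      # adaptive penalty block along x1
```

With p_1 = p_2 = 3, the 48 per-difference smoothing parameters along x_1 are driven by only 9 free parameters, which is exactly the complexity reduction the tensor-product expansion buys.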

Simplifications and generalizations
The two-dimensional adaptive penalty presented in the previous section (expression (13)) is the most general one: A different smoothing parameter is assumed for each coefficient difference along both 1 and 2 . However, several simplifications may be made according to the data at hand. These are discussed and presented in detail in the Supporting Information (Web Appendix B), where we also cover the extension of the adaptive penalty to three dimensions (Web Appendix C).

Estimation and inference
To estimate model (6) subject to the adaptive penalty defined in (13) (as well as the simplifications and the extension to the three-dimensional case discussed in the Supporting Information), we adopt the equivalence between P-splines and generalized linear mixed models for estimation and inference (e.g., Currie & Durban, 2002; Wand, 2003). For brevity, the technical details and a discussion of computational aspects are given in Web Appendix D, but we note here that the model reparameterization we use gives rise to precision matrices for the random effects that are linear in the inverse of the variance parameters (i.e., the precision parameters). This feature allows using the SOP method (Rodríguez-Álvarez et al., 2019) for estimation. One of the main advantages of SOP is that it is computationally very efficient when it comes to estimating variance parameters; this is essential in our setting, where the number of variance parameters (or, equivalently, smoothing parameters) to be estimated may be very large.

[Figure 4: Illustration of the adaptive penalty in two dimensions. Separately for x_1 (left-hand plot) and x_2 (right-hand plot), each coefficient difference is penalized by a different smoothing parameter. The arrows (and colors) schematically represent different coefficient differences jointly with the smoothing parameters acting on them (Λ for the coefficient differences along x_1 and Λ̃ for those along x_2).]

SIMULATION STUDY
This section reports the results of a simulation study conducted to evaluate the empirical performance of the multidimensional adaptive penalties described in Section 3. For conciseness, the study concentrates on the two-dimensional case, and only the full adaptive penalty specification is evaluated (expression (13)). We compare the performance of our proposal with that described in Krivobokova et al. (2008) and implemented in the R-package AdaptFit (Krivobokova, 2012). A nonadaptive two-dimensional P-spline model (i.e., a model with a standard anisotropic penalty; see (8)) is also included in the comparison.

Scenarios and setup
Three different scenarios are considered in this study. The first scenario is classical in the context of adaptive P-splines in two dimensions, and it has been considered, among others, by Lang and Brezger (2004), Crainiceanu et al. (2007), and Krivobokova et al. (2008). The second and third scenarios correspond to highly varying functions with sharp peaks, where adaptive smoothing seems a sensible choice.

The true two-dimensional functions used in each scenario are shown in Web Figure 3. For all scenarios, a total of R = 250 replicates are performed, and we consider sample sizes n ∈ {300, 500, 1000, 2000}. For the P-spline models with and without adaptive smoothing, we use second-order differences (q_s = 2) and marginal cubic B-spline bases of dimension c_s = 12 (Scenario I) and c_s = 20 (Scenarios II and III) to represent the two-dimensional functions (s = 1, 2); larger basis dimensions are considered for Scenarios II and III due to the complexity of the simulated functions. For the adaptive approach, we consider the full adaptive penalty and choose p_s = p̃_s = 5 (s = 1, 2; see (10) and (11)). This gives rise to a total of 50 (2 × 5²) smoothing parameters (or variance components). For the proposal by Krivobokova et al. (2008) (hereafter denoted as AdaptFit), we use 144 (12 × 12) knots for the two-dimensional function in Scenario I and 400 (20 × 20) knots for Scenarios II and III. Also, for the three scenarios, we consider 25 (5 × 5) knots for the variance parameters. Here, low-rank radial basis functions are used, and the knots are selected based on the clara algorithm by Kaufman and Rousseeuw (1990).

Results
For brevity, the majority of graphical and numerical results are provided in Web Appendix E, and we focus here on the main findings. We note that, for Scenarios II and III, AdaptFit presents severe convergence problems (e.g., for Scenario II, for all response distributions, and for Scenario III, for the Poisson and Bernoulli distributions, it does not converge in any run). To avoid misleading conclusions about the performance of AdaptFit, results for these two scenarios are not shown. Figure 5 shows violin plots of log(MSE) for all the scenarios, response distributions, sample sizes, and noise levels considered in the study. MSE is computed at the observed covariate values; for Gaussian data, the true linear predictor is chosen as the target, and for Poisson and binary data, the MSE is computed on the response scale. We start with the results for Scenario I. To our surprise, in our study, the best approach is the P-spline model with the standard anisotropic penalty, followed closely by the adaptive approach proposed in this paper. AdaptFit is the one performing the worst. This result somewhat contradicts those reported in Lang and Brezger (2004), where the adaptive approach (different from the one proposed here) performs better than the nonadaptive counterpart. We highlight that the results we obtain for AdaptFit are in concordance with those presented in Krivobokova et al. (2008) (medians of log(MSE) of −3.53 and −3.79, respectively, for n = 300), which, in turn, outperform the results in both Lang and Brezger (2004) and Crainiceanu et al. (2007). If we focus on the results for Scenarios II and III, we see that the full two-dimensional adaptive approach proposed in this paper performs the best for (almost) all response distributions, sample sizes, and noise levels. Differences are particularly remarkable (in favor of the adaptive approach) for moderate to large sample sizes and, in the Gaussian case, for low noise levels.
Results regarding coverage probabilities (averaged over the whole surface) are shown in Web Table 1. In general, average coverage probabilities are close to the nominal level, except for small sample sizes and large noise levels in the Gaussian case (Scenarios II and III) and for the Bernoulli distribution in Scenario III. In these cases, average coverage probabilities are around 90%. Results for the adaptive and nonadaptive P-spline models are very similar, but at the cost of wider confidence intervals for the nonadaptive approach (see Web Figure 4). Despite the results on coverage probabilities being similar on average, when looking at the pointwise coverage probability maps/surfaces (see Web Figures 5-11), we observe that, for Scenarios II and III, the nonadaptive approach presents very low coverage probabilities (in some cases lower than 10%) around the areas where the sharp peaks occur, which is likely where the interest lies (the nonadaptive approach has trouble capturing the sharp peaks, see Web Figures 13-21, and this translates into poor coverage probabilities). Finally, results regarding computing times (in seconds and on the logarithmic scale) are presented in Web Figure 12. As could have been expected, in all cases, fitting the nonadaptive P-spline model is faster than using the adaptive alternatives, but our approach outperforms AdaptFit.

[Figure 5: For the simulation study, violin plots of log(MSE) across R = 250 replicates for Scenarios I-III, different response distributions (Gaussian, Poisson, and Bernoulli), levels of noise, and sample sizes (n). "2D P-spline" stands for the two-dimensional P-spline model with the standard anisotropic penalty, "2D Adapt. P-spline" for the model considering the full adaptive penalty in two dimensions proposed in this paper, and "2D AdaptFit" for the proposal by Krivobokova et al. (2008).]
To give some numbers, for one of the worst-case situations (Scenario III, n = 2000, and the Bernoulli distribution), the standard anisotropic two-dimensional P-spline model requires a median of around 6.7 s, whereas the proposed adaptive approach requires around 32.6 s. In any case, computing times are more than reasonable. However, we are aware that these values cannot be considered benchmarks: It is expected that computing times will increase significantly for other (and more "generous") basis specifications (see also Web Appendix D for a discussion of computational aspects).

NEURONS' ACTIVITY STUDY: RESULTS
We now present the results for the study discussed in Section 2. Recall that the objective is to produce smoothed (denoised) versions of RFmaps (see Figure 2). To that aim, we adopt a Poisson model that expresses the neuronal response as a smooth function of both space (row and column position) and time:

Y_rct ~ Poisson(N_rc λ_rct), with log(λ_rct) = f(r, c, t),   (18)

where Y_rct denotes the number of spike occurrences attributed to stimulus presentations at row r and column c of the square area (r, c = 1, …, 16) for prespike time t (t = −20, …, −320), N_rc is the total number of stimulus presentations during the experiment at row r and column c, and λ_rct is the intensity parameter (or firing rate). For fitting model (18), we consider the full three-dimensional adaptive penalty (see Web Appendix C). Also, we fit a model with the standard anisotropic penalty (i.e., nonadaptive smoothing). In both cases, we take advantage of the array structure of the data, and generalized linear array methods (GLAMs; Currie et al., 2006) are used. Besides, we use second-order differences (q_s = 2) and marginal cubic B-spline bases of dimension c_s = 11 to represent the spatiotemporal surface. For the adaptive approach, we choose p_s = p̃_s = 6 (s = 1, 2, 3), yielding a total of 648 (3 × 6³) smoothing parameters (or variance components). These values are chosen to provide enough flexibility to the models. Figure 6 depicts (for the same neuron, eye, and experimental condition shown in Figure 2) the raw as well as the estimated (denoised) time series of RFmaps (firing rates) using both approaches. For conciseness, results are not shown for the whole sequence of prespike times, but for five frames, from 20 to 100 ms prespike (the complete results can be found in Web Appendix F). As noted before, neuronal responses usually produce noisy signals, making it difficult to study the spatial and temporal properties of the RF (see top row of Figure 6).
Smoothing with the proposed adaptive approach yields clearly defined structures and time development of the RF (see the middle row of Figure 6), whereas smoothing without the adaptive penalty yields a poorly structured RF (see the bottom row of Figure 6). More precisely, the results provided by the adaptive approach show that the RFmap begins to be structured about 80 ms prespike and peaks at approximately 60 ms, when a clear central area of high values (firing rates) appears that lasts until 40 ms. Outside the time range from −80 to −40 ms, the RFmaps show no structure and a uniform firing-rate pattern (see Web Figure 34). In other words, and in concordance with the raw RFmaps, the adaptive approach clearly shows that the spike emission (neuronal response) is coupled with the stimulus from −80 to −40 ms (i.e., the time between sensory stimulus and neuronal response spans from 40 to 80 ms) and that the response is restricted to a small area (this area being the visual RF of the neuron), centered and stabilized over time in the space covered by the stimulus presentation. By contrast, the smoothed RFmaps provided by the nonadaptive approach show almost no visible structure, either spatial or temporal (the approach recovers neither the peak nor the temporal pattern; see also Web Figures 35 and 37, where some cross-sections are shown jointly with 95% pointwise confidence intervals). As such, these results do not allow the neuron's RF to be detected. Extra results can be found in Web Appendix F, where we also include (approximate) standard errors for the smoothed RFmaps.
Regarding computing times, in the absence of adaptive smoothing, the SOP method takes about 3 min, which increases to 39 min with the adaptive penalty. In terms of degrees of freedom or effective dimensions (EDs), and despite the oversmoothed results of the nonadaptive approach, the ED is lower for the adaptive than for the nonadaptive counterpart (36.4 and 58.4, out of 1331, respectively). Using SOP, the total ED is obtained by adding the EDs associated with each smoothing/variance parameter in the model (plus the dimension of the fixed part); details can be found in Rodríguez-Álvarez et al. (2019). For (multidimensional) adaptive P-spline models, most of the EDs (or, equivalently, the variance parameters) are almost zero, corresponding to a locally linear fit (i.e., they "allow" the model to better adapt to the data regardless of whether adaptivity is needed). According to the (conditional) Akaike information criterion (cAIC) proposed by Donohue et al. (2011), the best model is the one using the adaptive penalty (15,523.8 vs. 15,650.2 for the standard anisotropic penalty).
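The ED bookkeeping just described can be illustrated on a toy single-penalty problem. The sketch below is ours and deliberately simplified (a random full-rank matrix stands in for a B-spline basis; the ED algebra is identical): the ED of a penalized least-squares fit is the trace of its hat matrix, and it interpolates between the basis dimension (no penalty) and the dimension of the penalty null space, here 2 for a second-order difference penalty.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, q = 200, 20, 2

B = rng.normal(size=(n, c))          # stand-in for a B-spline basis (assumption)
D = np.diff(np.eye(c), n=q, axis=0)  # q-th order difference matrix
P = D.T @ D                          # penalty; its null space has dimension q = 2

def effective_dimension(lam):
    # ED = trace of the hat matrix of the penalized least-squares fit
    H = B @ np.linalg.solve(B.T @ B + lam * P, B.T)
    return np.trace(H)

# ED decreases monotonically from c = 20 (lam -> 0) toward q = 2 (lam -> inf)
for lam in (1e-8, 1.0, 1e8):
    print(lam, effective_dimension(lam))
```

In the SOP setting, the model carries one such component per variance parameter, and the per-component EDs add up to the total (plus the fixed-part dimension).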

DISCUSSION
FIGURE 6 For the visual receptive field study: level plot of the observed and smoothed ON-RFmaps (firing rates) for the right eye of neuron FAU3. First row: observed. Second row: estimates with locally adaptive smoothing. Third row: estimates without locally adaptive smoothing.

This paper presents a novel approach for anisotropic multidimensional adaptive penalties in the context of P-spline models. The construction and rationale of the new adaptive penalties are described in full detail for the two-dimensional case, but the extension to three dimensions is also covered. Besides, we discuss several simplifications that can easily be implemented if desired. Model estimation is based on the connection between P-spline models and generalized linear mixed models, and the practical implementation is based on the recently proposed SOP method. However, nothing, in principle, precludes the use of other estimation methods/algorithms, provided that they can deal with precision/penalty matrices that are linear in the precision/smoothing parameters. Both the simulation study and the applications shown in the paper and the Supporting Information highlight the added value of using, when needed, the proposed multidimensional adaptive P-spline models. Also, the simulation study suggests that, when adaptive smoothing is not necessary, the extra flexibility afforded by the adaptive penalties does not translate into an important loss of efficiency. The gain is remarkable for the study that motivated this work: in the absence of adaptivity, results are too smooth and do not appropriately recover the RF of the neuron, which may lead to erroneous conclusions. Although it has not been covered here, accurate (smoothed) RFmaps will facilitate the study of important characteristics of RFs. For instance, smoothed RFmaps may help compare the results of bright ("ON") with those of dark ("OFF") spot responses in such a way that overlapping in space and time can be determined.
This, in turn, would allow inferring the functional properties of the cell. Also, comparisons between the results obtained from the monocular stimulation of each eye (right and left) could provide information about ocular dominance.
Of course, the flexibility of multidimensional adaptive P-spline models comes at the cost of increased computing time. For applications of the adaptive approach in two dimensions, computing times are kept at a reasonable level, even for rather large B-spline bases and numbers of smoothing parameters. However, the increase is substantial in the three-dimensional case. Even in the nonadaptive case, multidimensional P-splines can be very computationally demanding, especially if large basis dimensions are used. The SOP method alleviates the computational cost, which can otherwise be very large with alternative estimation methods. We note that we used the high-level R language to implement the SOP method for the adaptive approaches described in the paper; we expect that low-level languages (e.g., C or Fortran) could further improve computing times. This would also make computationally feasible the estimation of more complex models, such as those described in Rodríguez-Álvarez et al. (2012), where, for the visual RF study, the authors jointly estimate and compare RFmaps under different experimental conditions.
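The GLAM machinery mentioned earlier (Currie et al., 2006) is a key device for keeping the array computations tractable. The following minimal NumPy sketch (ours, not the authors' R implementation) illustrates the idea for evaluating a fitted trivariate surface on the data grid: the tensor-product basis is applied one margin at a time, so the large Kronecker-product matrix, formed below only to check the result, is never needed in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3 = 16, 16, 16       # data grid: rows, columns, prespike times
c1, c2, c3 = 11, 11, 11       # marginal basis dimensions
B1 = rng.normal(size=(n1, c1))   # stand-ins for the marginal B-spline bases
B2 = rng.normal(size=(n2, c2))
B3 = rng.normal(size=(n3, c3))
Theta = rng.normal(size=(c1, c2, c3))   # coefficient array

# GLAM-style evaluation: contract one margin at a time; the full
# (n1*n2*n3) x (c1*c2*c3) Kronecker-product basis is never formed.
T = np.einsum('ia,abc->ibc', B1, Theta)
T = np.einsum('jb,ibc->ijc', B2, T)
F = np.einsum('kc,ijc->ijk', B3, T)

# Same result as the explicit (4096 x 1331) Kronecker computation
K = np.kron(B1, np.kron(B2, B3))
assert np.allclose(F.reshape(-1), K @ Theta.reshape(-1))
```

The sequential contractions cost far fewer operations and far less memory than materializing the Kronecker product, which is what makes bases of this size workable in the first place.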
A limitation of the proposal presented here is that, although relaxed, smoothness is still assumed, for both the functions to be estimated and the smoothing parameters. Therefore, it is not suitable for functions with abrupt changes or discontinuities (see, e.g., Liu & Guo, 2010, for a one-dimensional adaptive approach in these settings). The study of appropriate multidimensional penalties (and efficient estimation strategies) for these situations represents an interesting topic for research. Finally, it would also be worth exploring the implementation of the approach presented in the paper within a Bayesian hierarchical framework.

ACKNOWLEDGMENTS
This research was funded by projects MTM2017-82379-R (AEI/FEDER, UE) and PID2019-104901RB-I00 (AEI), by the Ramón y Cajal Grant RYC2019-027534-I, by the Basque Government (BERC 2018-2021), by the Spanish Ministry of Science, Innovation, and Universities (BCAM Severo Ochoa accreditation SEV-2017-0718), by ISCIII (RETICS RD16/0008/0003), by Xunta de Galicia (accreditation ED431G 2019/02), and by the European Regional Development Fund. We are grateful to Vanda Inácio for helpful comments and discussions. We are also grateful to the peer referees and associate editor for their constructive comments on the paper.
Funding for open access charge was provided by Universidade de Vigo/CISUG.

OPEN RESEARCH BADGES
This article has earned Open Data and Open Materials badges. Data and materials are available at https://bitbucket.org/mxrodriguez/madaptsop.

DATA AVAILABILITY STATEMENT
The data that support the findings in this paper are available in the Supporting Information of this article.