: Joint Maximum Likelihood Estimation of Motion and T1 Parameters from Magnetic Resonance Images in a Super-resolution Framework: a Simulation Study

. Magnetic resonance imaging (MRI) based T 1 mapping allows spatially resolved quan-tiﬁcation of the tissue-dependent spin-lattice relaxation time constant T 1 , which is a potential biomarker of various neurodegenerative diseases, including Multiple Sclerosis, Alzheimer disease, and Parkinson’s disease. In conventional T 1 MR relaxometry, a quantitative T 1 map is obtained from a series of T 1 -weighted MR images. Acquiring such a series, however, is time consuming. This has sparked the development of more efﬁcient T 1 mapping methods, one of which is a super-resolution reconstruction (SRR) framework in which a set of low resolution (LR) T 1 -weighted images is acquired and from which a high resolution (HR) T 1 map is directly estimated. In this paper, the SRR T 1 mapping framework is augmented with motion estimation. That is, motion between the acquisition of the LR T 1 -weighted images is modeled and the motion parameters are estimated simultaneously with the T 1 parameters. Based on Monte Carlo simulation experiments, we show that such an integrated motion/relaxometry estimation approach yields more accurate T 1 maps compared to a previously reported SRR based T 1 mapping approach.


Introduction
T 1 mapping is a quantitative Magnetic Resonance Imaging (MRI) technique that generates maps of the tissue-specific spin-lattice relaxation time T 1 [1].There is growing evidence that T 1 mapping can be applied to detect subtle microscopic tissue damage, with potential for earlier diagnosis of various brain diseases including multiple sclerosis [2][3][4], epilepsy [5] and Alzheimer's disease [6].Despite these promising results, T 1 mapping currently remains a research tool and is not yet part of routine clinical assessment.The main obstacle for clinical adaptation of T 1 mapping is that conventional T 1 mapping techniques require long scan times to achieve adequate accuracy, precision and spatial resolution.
The gold standard method for T 1 mapping, the inversion recovery method, is presented in Fig. 1.When an object is placed in a strong magnetic field, its nuclear spins align to this magnetic field, resulting in a net magnetic moment oriented in the so-called longitudinal direction (i.e., the direction parallel to the external magnetic field, corresponding with the z-direction in Fig. 1).Next, this equilibrium state is disturbed by applying a 180 • radio frequency (RF) pulse, which inverts the longitudinal magnetization.After this pulse, the spins start to relax back towards the equilibrium state with a time constant T 1 [7].After inversion time TI, the longitudinal component is tipped into the transverse plane by a 90 • RF pulse, after which the (T 1 -weighted) MR signal is measured.In this way, T 1 -weighted images are acquired at different inversion times.Subsequently, a T 1 map is estimated by voxel-wise fitting a parametric model to these images.Since many images are required in such an acquisition scheme, conventional T 1 mapping suffers from long acquisition times.
A simple way to reduce acquisition time is to lower the number of T 1 -weighted images.This, however, results in a loss of precision in the estimated T 1 map.Alternatively, a large number of T 1 -weighted images can be acquired in a short acquisition time by reducing the acquisition time of each individual T 1 -weighted image by lowering their spatial resolution.Commonly, this is done by acquiring multi-slice images where the slice-thickness is much larger than the spatial resolution within B Figure 1.A: Inversion Recovery Sequence: The longitudinal net nuclear magnetization vector is inverted by a 180 • pulse.After inversion time TI, the longitudinal component is tipped into the transverse plane by a 90 • pulse, after which the (T 1 -weighted) MR signal is measured.By acquiring multiple MR signals (images) at different inversion times, the recovery of the longitudinal magnetization towards its equilibrium value can be sampled.The time between two repetitions of the sequence, i.e. the time between the inversion pulses, is called the repetition time TR.B: Effect of the inversion recovery sequence on the net nuclear longitudinal magnetization vector M z as seen in the RF-rotating frame.B 0 represents the external magnetic field vector.the slice, i.e., the through-plane resolution is much lower than the in-plane resolution.Additionally, increasing the slice thickness increases the signal-to-noise ratio (SNR) of the T 1 -weighted images, as signal strength scales linearly with imaged volume.However, thicker slices also suffer from increased partial volume effects, which arise when different tissues occur within a single voxel.In summary, reducing the acquisition time in conventional T 1 mapping is clearly a trade-off in which faster scanning comes at the cost of either a lower precision or a lower spatial resolution and increased partial volume effects, of the resulting T 1 map.
To improve this trade-off, a super-resolution reconstruction (SRR) method was recently proposed that estimates a 3D high resolution T 1 map with isotropic voxel size from a set of low resolution T 1weighted multi-slice images with different slice orientations and anisotropic voxel size [8].These low resolution images are acquired at a high in-plane resolution and a low through-plane resolution.It was shown that this method indeed provides a better trade-off between resolution, precision and acquisition time than direct high-resolution acquisition [9].In this approach, motion was compensated for by adjusting the transformation parameters constituting the motion operator in a preprocessing step, and fixing these parameters in the SR-T1 estimation routine that followed.Fixing the motion parameters, however, may lead to inaccurate (i.e., biased) T 1 maps since no feedback mechanism is present in the SR-T1 estimation routine that can undo incorrect fixation of motion parameters.As such, errors that potentially exist in the motion estimation step might propagate into the T1 estimation.At the same time, another recent work has proposed a unified Maximum Likelihood framework for simultaneous motion and T 1 estimation in non-super-resolution T 1 mapping [10].It was demonstrated that the joint incorporation of the relaxation model, the motion model as well as the data statistics provide substantially more accurate motion and T 1 parameter estimates.In the present paper, we explore, by means of simulation experiments, the potential of combining both approaches, resulting into joint Maximum Likelihood estimation of T 1 and motion in a super-resolution framework.
The remainder of this paper is organized as follows.In Section 2, the image acquisition model, the proposed joint Maximum Likelihood estimator (MLE) and its implementation are described.Section 3 describes the simulation experiments, of which the results are presented and discussed in Section 4. Finally, in Section 5 conclusions are drawn.

Theory
The proposed method starts from a set of N T 1 -weighted multi-slice images, each with a different slice direction.The multi-slice images, which are assumed to have a high in-plane resolution and a low through-plane resolution, will be referred to as the low resolution (LR) images.That is, the slice thickness, or through-plane voxel size, is larger than the in-plane voxel size, leading to anisotropic voxels.The method then estimates a high-resolution (HR) T 1 map with isotropic voxels from a set of LR multi-slice T 1 -weighted images and simultaneously estimates the motion between the acquisition of these LR images.
In the derivation of the imaging model, we will assume that the LR T 1 -weighted images are acquired with a multi-slice inversion recovery (IR) conventional spin echo (SE) sequence, being the gold standard sequence for T 1 mapping [11][12][13].

MR imaging model
Let T 1 = (T 1 (j)) ∈ R Nr×1 be the vector containing the values of the unknown T 1 map at the HR grid points {x j } (with x j ∈ R 3×1 and j the HR voxel index, j = 1, . . ., N r ).Furthermore, let s n ∈ R Ns×1 , with n = 1, . . ., N , denote the vector containing the intensities of the noiseless LR T 1 -weighted multi-slice image with slice direction n, acquired with inversion time TI n , and sampled at the LR grid points {y nl } (with y nl ∈ R 3×1 and l the LR voxel index, l = 1, . . ., N s ).To derive the mathematical relation between the HR T 1 map of interest, T 1 , and the LR image s n , we now first introduce the virtual, noise free, HR T 1 -weighted image r n = (r n (j)) ∈ R Nr×1 , which is assumed to be acquired with the same inversion time TI n as s n and sampled at the (nonrotated) HR grid points of T 1 .
Then, r n can be modeled as a function of T 1 and a quantity ρ = (ρ j ) ∈ R Nr×1 , which is proportional to the proton density [14]: where we have assumed a perfect inversion pulse of 180 • and a repetition time TR T 1 .In the remainder of this paper, T 1 and ρ will be referred to as relaxation model parameters to be estimated.
Mathematically, we can now express the LR image s n as the result of applying a sequence of operators on the virtual HR image r n : with M θn ∈ R Nr×Nr , G n ∈ R Nr×Nr , B ∈ R Nr×Nr and D ∈ R Ns×Nr linear operators that describe, respectively, unintended motion, a known geometric transformation, spatially invariant blurring, and downsampling.In this work, we assume the unintended motion M θn to be rigid, parameterized by with t xn , t yn , t zn the translation parameters and α n , β n , γ n the Euler angles of three elementary rotation matrices that describe rotation around the x, y and z axis, respectively.The superscript T in Eq. (3) denotes the transpose operation.In the present work, we use the same implementation of the rigid motion operator M θ as in [10], where they used the fact that rotation matrices can be decomposed as the product of three shear matrices.Each of the shearings can be implemented efficiently with Fast Fourier Transforms (FFT).Translation is implemented using the FFT as well.The operator G n applies a known geometric transformation that models the image acquisition with a specific slice direction.More specifically, operator G n models the SRR acquisition, in which multiple LR T 1 -weighted images at different orientations are acquired by rotation of the acquisition plane for each image around one fixed encoding axis.In our implementation, operator G n is a simplified version of M θ that models rotation around one fixed encoding axis.However, whereas M θn models the effect of unintended motion, which is unknown and has to be estimated from the data, the geometric transformation G n is known and determined by the prescribed slice direction of the LR image s n .The blurring operator B describes the point spread function (PSF) of the MRI acquisition process, which can be modeled as tensor product of the PSF in three orthogonal directions: the through-plane (i.e., slice-selection) direction and the two in-plane directions, which are known as the phase-and frequency-encoding direction.We currently consider the PSF in the through-plane direction only.This through-plane PSF depends on the slice selection method.In this incipient work, the operators B and D describing spatially invariant blurring and downsampling, respectively, are combined into one operator D that performs downsampling by averaging along the through-plane direction [15].Details about this specific D operator (and corresponding adjoint operator) are given in Appendix A.
For convenience of expression, we define A n = DG n M θn , and rewrite Eq. (2) as with A n = (a n (l, j)) ∈ R Ns×Nr .By combining Eqs. ( 1) and ( 4), the noiseless signal in voxel l of the LR T 1 -weighted image can be described in terms of the HR maps T 1 = (T 1 (j)) and ρ = (ρ(j)): In this work, we will assume that magnitude images are acquired, as is common for spin echo IR sequences.The voxel intensities of magnitude images reflect only the magnitude of the longitudinal magnetization, disregarding polarity [16].Hence, in the absence of noise, the magnitude images are described by |s n |, with | • | the point-wise modulus operator.
Obviously, real-world images will be subject to noise.In this work, the noise is assumed to be additive, zero mean Gaussian noise.It has been shown that this is a valid assumption when the signalto-noise ratio of the magnitude data is sufficiently high (> 3) [17][18][19][20], which is typically the case for the LR images.Hence, if we denote the acquired LR magnitude images by sn ∈ R Ns×1 , our image acquisition model can be described as with e n ∈ R Ns×1 a vector containing zero mean Gaussian noise contributions.

The joint Maximum Likelihood estimator
Having derived the imaging model in subsection 2.1, this subsection will describe the model-based SRR framework that estimates an HR ρ and T 1 map (ρ and T 1 , respectively) simultaneously with the motion parameters θ n , n = 1, . . ., N , from a set of LR images {s 1 , s2 , . . ., sN }, using a joint Maximum Likelihood estimator (MLE).The MLE is chosen because it is asymptotically unbiased, efficient (i.e., most precise) and consistent [21].The MLE fully exploits prior knowledge on the statistical distribution of the data.In our case, the data consists of the voxels of the LR images.These voxels can be modeled as random variables that, due to the presence of noise, fluctuate about their expected values which are described by the model |A n r n | that was derived in subsection 2.1.Assuming additive, zero-mean Gaussian distributed noise, the PDF of a voxel sn (l), with l = 1, . . ., N s , of the image sn is given by: with σ the standard deviation of the noise, which in this work is assumed to be spatially and temporally invariant.Assuming all voxels of all T 1 -weighted LR images statistically independent, the joint PDF of all voxels is given by: with s = (s T 1 , . . ., sT N ) T and θ = (θ T 1 , . . ., θ T N ) T .To simplify the notation, let us define the parameter vector τ = (T T 1 , ρ T , θ T ) T .To construct the MLE of τ , the likelihood function L(τ |s), which is the joint PDF of Eq. ( 8) regarded as a function of the unknown parameter vector τ (with s fixed), is needed.The MLE τML of the parameter vector τ from measured data s is that value of τ that maximizes the likelihood function L(τ |s), or equivalently, the so-called log-likelihood function L s(τ |s) log L(τ |s), with respect to τ , i.e., τML = arg max It follows from Eq. ( 8) that the log-likelihood function can be written as Hence, the ML estimator τML is equal to the ordinary (unweighted) least-squares estimator: with The non-linear optimization problem (11) can be solved using the alternating minimization method, also known as the cyclic block-coordinate descent (cBCD) method [22,23].In this method, the parameter vector τ is split into blocks and the cost function J(τ ) is successively minimized with respect to each block in a cyclic order.In our case, we use a split into two blocks that contain the motion parameters and relaxation model parameters, respectively.In this way, the large-scale optimization problem ( 11) is separated into more easily solvable problems [10].Moreover, it can be shown that this cBCD method assures a convergence property where J(τ ) decreases at every iteration [22].Convergence to at least a local minimum is guaranteed [24].In summary, the joint MLE is obtained by the following iterative recursive procedure: t) , ρ(t) , θ) (P.1) with θ(0) = θ ini , ρ(0) = ρ ini and T1 (0) = T 1ini the initial values of the parameters θ, ρ and T 1 , respectively.By its definition, this procedure produces a nonincreasing sequence of cost function values [23].The procedure is terminated when the number of iterations exceeds t max or when E (t) < E min , where ), and consecutive iterations are started from E (0) = r E min , with r ∈ R >1 .The pseudo-code of the joint MLE algorithm is presented in Algorithm 1.

Implementation
For the proposed joint MLE algorithm, in which the non-linear optimization problem is solved in alternating fashion between problems (P.1) and (P.2), the motion estimation problem (P.1) adopts a particularly simple structure when the relaxation model parameters are fixed.Assuming no dependence of {θ n } N n=1 through index n, as is done here, the motion estimation problem can be decoupled into N independent minimization problems that can be evaluated in parallel.In the absence of additional information, a natural choice for the initialization of the motion parameters in the first iteration is a zero-motion initial condition such that the rigid motion operator M θ ini = I, with I the identity matrix.Initial values ρ ini and T 1ini were obtained by voxel-wise NLLS fitting the modulus of the relaxation model in Eq. ( 1) to the upsampled LR images with a Levenberg-Marquardt [25] algorithm, using the MATLAB routine lsqnonlin.Upsampling was performed using the adjoint operator sequence G T n D T acting on the LR images sn , followed by application of | • |, the point-wise modulus operator, to regain magnitude images.The cost functions of LS problems (P.1) and (P.2) were minimized with a trust-region Newton algorithm using the MATLAB routine fminunc.Explicit analytical gradients were supplied for LS problem (P.1), while both explicit analytical gradients and implicit Hessian matrix elements (in the form of a matrix multiplication routine) were supplied for LS problem (P.2).The matrix multiplications in Eq. ( 2) were implemented by splitting the transformation operators M θn and G n in sets of shear operations, each of which can be efficiently applied as a filtering operation in the frequency domain [10].
The computational complexity of the joint MLE algorithm is primarily defined by the FFT operations that are part of the implementation of operators G n and M θn in equation Eq. (2).Using that a Q element 1D FFT has computational complexity of O (Q log 2 (Q)), we derived that a single step of problem (P.1) required O 69M 3 log 2 (M 2 ) + 5M 6 log 2 M 6 floating point operations, given that G n and M θn operate on HR images with isotropic dimensions M × M × M .This includes the operations introduced by the explicit analytical expressions for the gradient of the objective function of problem (P.1) w.r.t.motion parameters θ n .Furthermore, the given number of floating point operations should be multiplied by a factor N − 1, since problem (P.1) is optimized in parallel manner.In addition, a single step of problem (P.2) requires O N • 48M 3 log 2 (M 2 ) + 4M 6 log 2 M 6 floating point operations.This number also includes the additional FFT operations introduced by the analytical gradient and Hessian expression for the objective function w.r.t. the T 1 parameters.The computational requirements of operator D were less demanding with one call of operator D calculated up to 10 times faster than G n , and up to 50 times faster than M θn , for the considered phantom size.

Simulation experiments
In this section, we describe the Monte Carlo (MC) simulation experiments that were carried out to evaluate the performance of the proposed joint MLE and to compare it with: • SRR-T1: SR least squares (LS) estimation without motion correction.In this approach, the motion is simply ignored.That is, the least-squares criterion ( 12) is minimized with respect to the relaxation model parameters only, while fixing the motion parameters at θ = 0.
• SRR-T1-MI: Mutual Information (MI) based registration prior to SR LS estimation.In this approach, the LR images {s 1 , s2 , . . ., sN } are first upsampled by applying the adjoint operator A T n .Next, the upsampled (HR) images are registered using a mutual information image similarity metric [26,27].The motion parameters that result from this procedure are then substituted in the LS criterion (12), which is then minimized with respect to the relaxation model parameters only.
• SRR-T1-PRE: SRR T1 mapping described in [8].In this approach, the motion parameters are estimated in a preprocessing step prior to the estimation of the HR T 1 and ρ map.For this purpose, an iterative model-based motion correction scheme is used.First, the LR images {s 1 , s2 , . . ., sN } are upsampled by applying the adjoint operator A T n .The first time that this upsampling is performed, the motion operator M θn that co-constitutes A is set equal to the identity matrix.Next, a T 1 and ρ map are estimated by voxel-wise fitting the modulus of model (1) to these upsampled images, using the Levenberg-Marquardt algorithm.Based on the thus obtained HR T 1 and ρ maps, LR images are generated using model (5).These LR images are then rigidly aligned with the LR images {s 1 , s2 , . . ., sN } by minimizing their mean squared difference.The resulting motion parameter estimates are then used to update M θn .All steps are repeated until the stopping criterion is met with E min = 10 −4 .The motion parameters are then fixed and a HR T 1 and ρ map is estimated using a LS estimator, with as initial values for T 1 and ρ the values that resulted from the motion correction procedure.Details about the specific simulation settings for each of the described MC simulation experiments are summarized in Table 1.Information about the initialization of (P.1) and (P.2) is also summarized in trust-region Newton i trust-region Newton g,i trust-region Newton g,i trust-region Newton i a During simulations, nine different SNR values were studied ranging from 20 to 100, sampled with step size 10.
b Each of the LR images {θn} N n=1 , with N = 14, had a unique inversion time TIn.The logarithms of these inversion times {TIn} N n=1 were equidistantly spaced between T1 = 0.1s and T14 = 8s.c The upsampled LR images are pairwise registered using MATLAB's imregtform function [28].During the rigid registration process, the number of multi-level image pyramid levels is equal to two, and the first image of the series is chosen as a reference, hence θ1 = 0.The one-plus-one evolutionary optimizer configuration is used, for which the number of iterations is set to a very high value (> 5000) to ensure convergence of the motion parameter estimation.The remaining MI registration parameters are set to the default values of the MATLAB built-in code.Assuming no dependence of the motion parameters {θn} N n=1 through the index n, the different pairwise registration problems can be decoupled into N − 1 sub-problems, which can be implemented very efficiently with MATLAB parallel computing tools.d Preprocessing loop: the iterative model-based motion correction scheme is used, as described in [8].e zero-motion IC: In the absence of additional information, a natural choice for the initialization of the motion parameters in the first iteration is a zero-motion initial condition (IC) such that the rigid motion operator M θ ini = I, with I the identity matrix.f vw-NLLS-LM: voxel-wise NLLS fitting the modulus of relaxation model in Eq. ( 1) to upsampled LR images with Levenberg-Marquardt algorithm, using MATLAB's lsqnonlin routine, with the initial estimate per voxel chosen equal to [ρ, T1] = [0.5, 1.5].Upsampling was performed using the adjoint operator sequence G T n D T followed by application of | • |, the point-wise modulus operator.g Problem (P.2) is solved for the relaxation model parameters only, while fixing the motion parameters at those that result from the initialization procedure.h MATLAB's fminunc routine, implemented with explicit analytical expressions for the gradient of the objective function.i MATLAB's fminunc routine, implemented with explicit analytical expressions for the gradient of the objective function and implicit Hessian matrix elements (in the form of a matrix multiplication routine).
Table 1 for each simulation experiment.To perform realistically adequate simulation experiments, a set of 2D multi-slice IR-SE T 1 -weighted LR images affected by inter-image motion (as in Eq. ( 6)) and noise was modeled from the ground truth T 1 and proton density maps.These ground truth maps were based on a simple cubic 12 × 12 × 12 numerical phantom that was adopted from [8].The phantom maps consisted of distinct regions, representing grey and white matter tissue parameters that were characterized using reported T 1 and ρ (proton density) values in human brain tissue at 3T [29].The ground truth T 1 values for grey and white matter were 1607 ms and 838 ms, respectively, while those for ρ were set at 0.86 and 0.77 for the respective regions.An overview of this numerical phantom, which we refer to as Phantom 1, is shown in Fig. 2. From these ground truth T 1 and ρ maps, a set of N = 14 noiseless LR T 1 -weighted magnitude images was simulated using the forward model (6).The dimensions of each LR image were equal to 12 × 12 × 6.The anisotropy factor (AF) was equal to 2. This AF is defined as the ratio between the through-plane slice thickness and the (isotropic) in-plane voxel size in the frequency encoding and phase encoding direction, see also Fig. 3.For each LR image N , the translational shifts t xn , t yn , t zn and Euler angles α n , β n , γ n that define the rigid motion parameters {θ n } N n=1 in Eq. (3) were generated randomly from a uniform distribution on the interval [−1, 1] (voxel units) and [−5, 5] degrees respectively.The reference image was chosen to be s1 , hence θ 1 = 0.The same set of randomly generated rigid motion parameters was used for all simulation experiments.Furthermore, MATLAB's intrinsic coordinate system is used to represent 3D images, for which the origin is chosen at the center of the 3D image.
To account for wraparound artefacts that stem from the use of FFT to perform rotations, appropriate zeropadding was performed in each direction.Furthermore, it should be noted that since in our simulations the M × M × M discrete sampled image volume to be rotated had an even number of sampling points in each direction (i.e., M was even), the FFT based procedure required an extra multiplication with an exponential phase factor to obtain real values after rotation [30].
To fully recover the HR information, the LR images need to contain complementary information about the phantom.Rotation in image space corresponds to a rotation in frequency domain.As previously argued [31], acquiring the LR images with different slice orientations ensures that each LR image covers a different part of k-space (Fig. 3).In this way, the LR data will contain high spatial frequencies in all three dimensions.This approach results in more effective sampling of k-space than shifting the LR images by subpixel distances along the slice selection direction.In the latter case, the SR reconstruction result relies heavily on the success of recovering the aliased high frequency in the slice direction, since the narrow slice selection frequency band covers exactly the same part Figure 3. Schematic comparison between image space and k-space (2D and 3D view) for a multi-orientation low resolution acquisition.The anisotropy of the voxels in image space is defined by the anisotropy factor, AF = b a with b, the slice thickness, and a, the voxel size in the frequency encoding direction (and phase encoding direction).Since we choose to rotate only around the phase encoding axis, the k-space can only be sampled in a cylinder with diameter 1 a .
of the k-space for each LR image.Similar to the acquisition protocol in [8,32], the rotation of the LR images was performed around the virtual phase encoding axis in increments of 180/N o degrees, with N o the number of slice orientations.Aiming at a short acquisition time, the number of slice orientations was kept low, but sufficiently high to ensure that the k-space is maximally covered.In particular, the number of slice orientations was fixed to 7, corresponding to having two different LR T 1 -weighted magnitude images per slice orientation.An overview of the slice orientations and corresponding inversion times (TI), combined with their coverage of k-space is shown in Fig. 3.Each LR T 1 -weighted magnitude image had a unique inversion time, with TI n ∈ [0.1, . . ., 8]s, where the TIs were sampled equidistantly in log-space.
After fixing the motion parameters and acquisition geometry, downsampling by averaging and the application of the point-wise modulus operator for magnitude images in accordance with (6), zero mean white Gaussian noise was added to the LR images.The noise level was chosen to obtain SNR values between 20 and 100, where SNR is the ratio of the spatial mean of the LR image with the highest TI and the standard deviation of the noise.For each SNR, N MC = 140 realizations of sets of LR images were generated.For each realization, a HR T 1 and ρ map as well as the motion parameters θ were estimated.
The different simulations experiments were implemented in MATLAB [28], and run on a computer with an Intel i7-6850K hexa-core CPU @ 3.6 GHz and 32 GB of RAM.The proposed joint MLE simulation experiment for the numerical phantom required around 4 GB RAM (allocated memory usage), and for a fixed tolerance of E min = 10 −4 , on average about 14 alternating MLE iterations were needed to ensure convergence of the optimization process.In order to run the series of MC simulations quickly and efficiently, the University of Antwerp's High Performance Computing core (HPC) facility CalcUA was used.
Overview of the slice orientations of the LR images and corresponding inversion times TI n (top row) and the k-space sampling strategy (middle and bottom row).The middle row shows the k-space sampling of seven individual LR images, each having a different slice orientation, whereas the bottom row shows the overlap in k-space when those images are combined.The shaded area denotes the sampled k-space, while the white region is not sampled.
To assess the performance of each method to estimate the T 1 map, the following performance measures were used [10]: (a) Relative bias.The bias quantifies the accuracy or, equivalently, the systematic error of the estimator [21].For each voxel, the relative sample bias was calculated as ( T1 − T 1 )/T 1 , where T1 is the sample mean of the N MC estimates T1 and T 1 is the true value.A measure of the overall accuracy of the T 1 map was obtained by calculating the spatial mean of the absolute value of the relative sample bias.
(b) Relative standard deviation.The standard deviation quantifies the precision, or, equivalently, the non-systematic error of the estimator [21].For each voxel, the relative sample standard deviation was calculated as std( T1 )/T 1 , and an overall precision measure was obtained by taking the spatial mean of these relative sample standard deviations.
(c) Relative root-mean-square error (relative RMSE).The RMSE is a measure that incorporates both accuracy and precision.For each voxel, the relative sample RMSE was calculated as ( T1 − T 1 ) 2 /T 1 .An overall RMSE measure was obtained by calculating the spatial mean of these relative sample RMSE values.
By substitution of ρ for T 1 , the performance of each method to estimate the ρ map was assessed in an identical way using measures (a)-(c).
To assess the ability of the proposed method to estimate motion, the following performance measure was used: (d) Motion component root-(mean)-mean-square-error (RMMSE), defined as with [θ n ] j the jth component of θ n and [ θn ] j the sample mean of the N MC estimates [ θn ] j .
To supplement the results of Phantom 1, also a second, more challenging phantom was created (Fig. 4), which consisted of distinct grey and white matter tissue regions that included both uniform regions and checkerboard patterns, combined with horizontal and vertical planar structures.This second phantom will be referred to as Phantom 2. The same ground truth T 1 values as for Phantom 1 were used to characterize grey and white matter voxels.The different Monte Carlo experiments were repeated for Phantom 2 for a fixed spatial SNR = 50, keeping the other simulation settings identical as for Phantom 1.The reconstruction results of both phantoms were visually compared w.r.t.their respective ground truth T 1 parameter maps.For each phantom and for each method, an average T 1 map was calculated by voxel-wise averaging over all N MC reconstruction results for SNR = 50.

Results and Discussion
Figs. 5-7 summarize the statistical performance results that were obtained from the simulation experiments for Phantom 1. Fig. 5(a-c) and Fig. 6(a-c) show the overall relative sample bias, standard deviation and RMSE for all considered estimators of, respectively, T 1 and ρ, as a function of the SNR.
The results clearly show that for the SRR-T1 method without motion correction, the estimation of the relaxation model parameters performs poorly in terms of accuracy and precision, with high relative bias and high relative standard deviation, thereby underlining the importance of a proper motion estimation framework.Furthermore, SRR-T1-MI also shows a poor performance.In terms of precision, SRR-T1-MI performs even worse than SRR-T1.This observation is quite remarkable.It may reflect the limitations of intensity-based image registration when it comes to registering images of largely different contrast, which have been reported earlier [33,34].The poor performance of the SRR-T1-MI method may also be partly due to its implementation.In our implementation, the LR magnitude images were naively upsampled using the adjoint operator A T n followed by application of the pointwise modulus operator.The latter preserves the magnitude characteristic of the resulting HR images after upsampling.However, information loss is inherent and unavoidable in this upsampling process, partially due to the non-existence of an adjoint modulus operation, and this might impede correct intensity-based registration of these HR images in the next step.Finally, the small size of the images considered in our simulation experiment may also play a role, as SRR-T1-MI may be more sensitive to the image size than the other methods considered.Next, the SRR-T1-PRE method clearly improves the estimation results, both in terms of accuracy and precision.For low SNR values, this method shows even a better relative standard deviation than the SRR-T1-JMLE approach, as can be observed from Fig. 5(b) and Fig. 6(b).However, this lower precision of the proposed SRR-T1-JMLE is more than compensated by its higher accuracy, resulting in the superior performance of SRR-T1-JMLE in terms of relative RMSE over the whole range of SNR values.Fig. 7 shows the motion component RMMSE for each of the six rigid motion components as a function of the SNR.SRR-T1-JMLE clearly outperforms the other methods (SRR-T1, SRR-T1-MI, and SRR-T1-PRE) in terms of the motion component RMMSE.This is particularly visible for the rotation parameter components α, β, γ, in the lower bottom half of Fig. 7.It is worth emphasizing once more that the motion parameter problem in the SRR-T1-JMLE method was initialized from a zero-motion initial condition, i.e.M θ ini = I, with I the identity matrix.This also highlights the robustness of the SRR-T1-JMLE method for poor motion initialization scenarios.The quantitative performance measures for Phantom 2, for a fixed spatial SNR = 50, are summarized in Table 2. Results with this new phantom are very similar to those obtained with Phantom 1, underlining the superior performance of SRR-T1-JMLE in terms of relative RMSE and motion component RMMSE.
Finally, to provide a visual comparison of the performance of the different methods, for each method an average T 1 map was calculated by voxel-wise averaging over all N MC reconstruction results for SNR = 50.Fig. 8 shows the resulting average T 1 maps for three orthogonal slices (sagittal, axial and coronal planes) of Phantom 1, accompanied by histograms of the voxel data distribution of the corresponding full 3D average T 1 maps.The number of histogram bins was specified at 200.Note that the grey and white matter phantoms considered have histograms consisting of two distinct peaks corresponding with the ground truth T 1 values of both tissues.From Fig. 8, for Phantom 1, it is clear that only with SRR-T1-PRE and SRR-T1-JMLE, these peaks can be distinguished.Moreover, Fig. 8 clearly shows that the block-wise homogeneous structure of the phantom is best reconstructed by SRR-T1-JMLE.The same observations can be made for Phantom 2, for which the results of the visual comparison are shown in Fig. 9.
Overall, the results clearly demonstrate the superior performance of the SRR-T1-JMLE method in terms of accuracy and relative RMSE for both motion and T 1 and ρ estimation.This is also supported by the visual comparison presented in Fig. 8 and Fig. 9, where SRR-T1-JMLE clearly outperforms the other estimation frameworks.
In the outline of the MR imaging model under paragraph 2.1, the voxel intensity values of the HR images r n were modeled by a three-parameter T 1 model (Eq.( 1)) that depends on T 1 and ρ, for given inversion times TI n [14].It should be noted that if the assumptions that substantiate the choice of this model, i.e. perfect inversion pulse of 180 • and a repetition time TR T 1 , are invalid in practice, the proposed joint MLE method can still be used, but the model should be extended so as to avoid biased results.Such an extension may include the introduction of additional unknown model parameters to be estimated from the data [35], which may have a negative influence on the precision.This well-known trade-off between bias and precision should always been taken into account in model selection.
According to the computational complexity of the SRR-T1-JMLE algorithm described in section 2.3, increasing the volume size of the images, i.e. choosing a larger region-of-interest (ROI), will result in longer computation times, as the number of floating point operations increases.As a possible solution to this problem, the ROI could be split in several blocks (with overlap to avoid edge artifacts), where the HR relaxation parameters are reconstructed in each block separately.This would allow parallelization of estimation problem (P.2), which would considerably reduce the computational complexity, memory consumption and computation time of the SRR-T1-JMLE method.

Conclusion
Quantitative MR T 1 mapping suffers from long acquisition times with high risk for patient motion artefacts, resulting in poor accuracy of estimated T 1 relaxometry parameters.In this work, we explored the potential of augmenting a recently proposed super-resolution reconstruction method for MRI T 1 mapping with simultaneous motion estimation in a maximum likelihood framework.Superresolution reconstruction provides a better trade-off between resolution, precision and acquisition time than conventional direct high-resolution acquisition.By extending super-resolution reconstruction with simultaneous motion estimation, potential bias in the estimated T 1 map caused by motion can be substantially reduced compared to motion correction by preprocessing.By means of Monte Carlo simulation experiments, our newly proposed method was quantitatively compared against a ground-truth T 1 map together with three other approaches.The results were analysed using statistical performance measures and by performing a visual comparison.In conclusion, the results of these simulation experiments demonstrate that our newly proposed joint relaxometry and motion estimation approach yields more accurate T 1 maps than a previously reported SRR based T 1 mapping approach, in which motion registration is applied as a preprocessing step prior to T 1 mapping.Future work will focus on the validation of the proposed joint MLE method on real data scenarios, the development of advanced blurring operators for the acquisition model, and the extension of the motion model to non-rigid or affine motion.In addition, it is also worthwhile to investigate intra-image motion correction strategies and to further customize the alternating optimization scheme.

Appendix A Downsampling and upsampling operators
In the buildup of the forward model (4) encapsulated in operator A, operators B and D describing spatially invariant blurring and downsampling, respectively, are combined into one operator D that performs downsampling by averaging [15].Conventionally, downsampling keeps one sample out of a block and discards the remaining samples, whereas blurring takes into consideration the point spread functions of the MRI acquisition process.Downsampling by averaging is used here.  . . . . . .
. . .where M corresponds with the anisotropy factor AF, defined as the ratio of the through-plane slice thickness and the (isotropic) in-plane voxel size.Implementing the iterative recursive procedure described by problems (P.1) and (P.2) benefits from having adjoint operators.Interestingly, upsampling is the adjoint of downsampling.Conventionally, upsampling with zero insertion is used, we use upsampling with replication.For example, upsampling by a factor of 2 in 1D has matrix form,    where the floor function • gives the largest integer less than or equal to its argument.
Figure1.A: Inversion Recovery Sequence: The longitudinal net nuclear magnetization vector is inverted by a 180 • pulse.After inversion time TI, the longitudinal component is tipped into the transverse plane by a 90 • pulse, after which the (T 1 -weighted) MR signal is measured.By acquiring multiple MR signals (images) at different inversion times, the recovery of the longitudinal magnetization towards its equilibrium value can be sampled.The time between two repetitions of the sequence, i.e. the time between the inversion pulses, is called the repetition time TR.B: Effect of the inversion recovery sequence on the net nuclear longitudinal magnetization vector M z as seen in the RF-rotating frame.B 0 represents the external magnetic field vector.(a) Initial net nuclear longitudinal magnetization in alignment with B 0 , (b) the 180 • inverts the longitudinal magnetization M z , (c)-(d) the longitudinal magnetization M z relaxes and recovers to equilibrium, (e) after an inversion time TI the relaxing longitudinal magnetization M z is tipped into the transverse plane by a 90 • pulse before readout.

Figure 2 .
Figure 2. Phantom 1: Overview of the ground truth HR maps, and visualization of the downsampling along the slice dimension for one LR image.

Figure 4 .
Figure 4. Phantom 2: Overview of the ground truth HR maps, and visualization of the downsampling along the slice dimension for one LR image.

Figure 5 .
Figure 5. Results of the simulation experiments for Phantom 1: (a) relative T 1 bias, (b) relative T 1 standard deviation, (c) relative T 1 RMSE, as a function of SNR.Error bars correspond with the standard error of the spatial mean, but are omitted when their size matches the order of the graph symbol size.

Figure 6 .
Figure 6.Results of the simulation experiments for Phantom 1: (a) relative ρ bias, (b) relative ρ standard deviation, (c) relative ρ RMSE, as a function of SNR.Error bars correspond with the standard error of the spatial mean, but are omitted when their size matches the order of the graph symbol size.

Figure 7 .Figure 8 .
Figure 7. Results of the simulation experiments for Phantom 1, showing the motion component RMMSE for each of the six rigid motion components, as a function of SNR.Error bars are omitted when their size matches the order of the graph symbol size.

Figure 9 .
Figure 9. SRR-T1-JMLE reconstruction result for Phantom 2: Ground Truth (row 1), SRR-T1-PRE (row 2), SRR-T1-JMLE (row 3).On the left, three orthogonal slices of the T 1 map obtained by voxel-wise averaging over all N MC = 140 reconstruction results for SNR=50.On the right, histograms showing the voxel data distribution of the corresponding full 3D T 1 parameter map estimate.The ground truth T 1 values for grey and white matter, 1607 ms and 838 ms respectively, are marked with vertical lines.
For example, downsampling by a factor of 2 in 1D has matrix form, In the spatial (i.e.image) domain, this downsampling from a discrete vector x[n] to y[n] can be written in more compact form asy[n] = Dx[n]

16 )
The upsampling matrix is the transpose of the downsampling matrix in (A.15).We can write the upsampled x[n] from y[n] asx[n] = D T y[n] )