# Wikisource:WikiProject Open Access/Programmatic import from PubMed Central/Robust Time Estimation Reconciles Views of the Antiquity of Placental Mammals

Robust Time Estimation Reconciles Views of the Antiquity of Placental Mammals
Yasuhiro Kitazoe; Hirohisa Kishino; Peter J. Waddell; Noriaki Nakajima; Takahisa Okabayashi; Teruaki Watabe; Yoshiyasu Okuhara, edited by Matthew Hahn
PLoS ONE , vol. 2, iss. p.

## Abstract

### Background

Molecular studies have reported divergence times of modern placental orders long before the Cretaceous–Tertiary boundary and far older than paleontological data. However, this discrepancy may not be real, but rather appear because of the violation of implicit assumptions in the estimation procedures, such as non-gradual change of evolutionary rate and failure to correct for convergent evolution.

### Methodology/Principal Findings

New procedures for divergence-time estimation robust to abrupt changes in the rate of molecular evolution are described. We used a variant of the multidimensional vector space (MVS) procedure to take account of possible convergent evolution. Numerical simulations of abrupt rate change and convergent evolution showed good performance of the new procedures in contrast to current methods. Application to complete mitochondrial genomes identified marked rate accelerations and decelerations, which are not obtained with current methods. The root of placental mammals is estimated to be ∼18 million years more recent than when assuming a log Brownian motion model. Correcting the pairwise distances for convergent evolution using MVS lowers the age of the root about another 20 million years compared to using standard maximum likelihood tree branch lengths. These two procedures combined revise the root time of placental mammals from around 122 million years ago to close to 84 million years ago. As a result, the estimated distribution of molecular divergence times is broadly consistent with quantitative analysis of the North American fossil record and traditional morphological views.

### Conclusions/Significance

By including the dual effects of abrupt rate change and directly accounting for convergent evolution at the molecular level, these estimates provide congruence between the molecular results, paleontological analyses and morphological expectations. The programs developed here are provided along with sample data that reproduce the results of this study and are especially applicable studies using genome-scale sequence lengths.

## Introduction

Despite great progress over the past decade, the evolutionary history of placental mammals remains controversial. While a consensus is emerging on the topology of the evolutionary tree [1][2], although with occasional disagreement [3][4], divergence times remain uncertain. The age of earlier nodes and in particular the root, remain especially uncertain in the absence of definitive placental fossils deeper into the Cretaceous [5][3], [6]. Both paleontological and morphological studies suggest that the radiation of placental orders and super orders occurred close to the Cretaceous–Tertiary (K–T) boundary about 65 million years ago (mya) [6], [7].

In contrast, molecular studies have suggested markedly older origins for many superordinal groups and that some extant orders diversified before the K–T boundary. The root of living placental mammals has been reported to be in the range of 100–140 mya [5], [8][9] even with application of rate-adjustment techniques [5], [9]. The molecular consensus of an old root is becoming strong enough that it may become dogma without a close examination of the assumptions being made in such analyses. The true age of the orders and superorders has important implications for determining the overall paleoecology and biogeography of placental mammals during a period that included the breakup of continents and the extinction of terrestrial dinosaurs. Reconciling the different paleontological and molecular divergence time estimates has important implications for basic methodologies that are central to evolutionary study.

The strength of molecular divergence time studies is their potential to draw information from very long aligned sequences of many species. It is widely assumed that a huge amount of sequence data and the approximate rate constancy of sequence evolution [10] make molecular estimates more reliable than those based solely on fossil data. Such analyses on a genomic scale are generally anticipated to be decisive because they are expected to be free of stochastic noise [11], [12].

However, molecular studies acknowledge the problem of misspecification of the model of sequence change, which may result in seriously biased estimation. The relationships among placental orders do vary according to the data used and the taxa sampled [1][4], [13][9], [14][15]. Most methods of phylogenetic inference rely strongly on probabilistic models of sequence evolution, and neither directly detect nor correct for convergent evolution [16]. When left uncorrected, such “homoplasy” may attract lineages and underestimate certain pairwise distances. This in turn distorts branch (edge) length estimates, which are of primary importance for divergence time estimation. While some methods will detect potential convergent evolution, for example Hadamard conjugations and SplitsTree [17], Kitazoe et al. have developed multidimensional vector space (MVS) representation methods to both detect convergent evolution among estimated distances and to also correct for this bias [18][19].

A further problem facing molecular dating is the evolutionary rate constancy, or lack of constancy, over long periods. Since its proposal in 1965 [20], the molecular clock hypothesis has been one of the most hotly debated subjects in evolutionary biology. It is now widely accepted that a molecular clock tends to hold well for closely related organisms, but breaks down with increasing evolutionary divergence. To adjust for the inevitable fluctuation of evolutionary rates, nonparametric [11], [12], local clock [21], [22], and hierarchical Bayesian [23] methods have been developed. These methods are robust against stochastic fluctuations of evolutionary rates [23][24]. However, they may cause serious problems when pronounced transient changes of rate occur, because they overly smooth such rate changes. In this article, we show the magnitude of such problems using clear worked examples.

To improve the detection of, and robustness to, abrupt rate changes, we have developed a new procedure that minimizes the local variability of the inverse of the evolutionary rate. Just as the effective size of fluctuating populations is represented by the harmonic mean over time, the mean evolutionary rate among lineages is expressed better by the harmonic mean. This approach is especially useful when branch lengths measured in the expected number of substitutions per site (the products of rates and times) are estimated accurately, and there are either rapid transient changes of rate (hence large rate heterogeneity) or a general bias towards a speed up or slow down in rates through time.

Using this new procedure, an analysis of 69 mitochondrial protein sequences (3660 amino acid sites in total) from placental mammals identified a rapid acceleration of evolutionary rate for the lineage directly leading to the common ancestor of Supraprimates and an even more marked one for the lineage leading to Laurasiatheria. This acceleration was followed closely by a strong deceleration, which persisted in nearly all lineages of Laurasiatheria. In contrast, almost all lineages of Afrotheria and Xenarthra seem to have retained rates similar to that of the root. This view is in marked contrast to current rate-change penalty functions. The robustness of the new procedure is assessed using simulations that show the types of change that most concern biologists; speedups or slowdowns through time, transient rate changes, and rate changes that are do not follow a normal or transformed normal distribution, as well as stochastic variation. A revised estimate of the origin of placental mammals is as young as 84 mya, which is much more recent than current estimates using molecular data. The inferred age of deeper splits in the placental tree are compared with the rate of occurrence of new species from the North American fossil record. These two sources of data are far more congruent than is suggested by using current, possibly strongly misleading, dating methods.

## Results

### Revised rates and times of mitochondrial protein sequences from placental mammals

Mitochondrial protein sequences are used widely in phylogenetic studies and have been particularly popular in studying placental mammals. A desirable feature of these data is relatively long sequences, good taxon sampling and very little missing data. Following alignment, we retained 3660 amino acid sites present in all of 62 placental mammals plus seven outgroup taxa (Table S1 gives the accession numbers of protein sequences). In contrast, large nuclear protein datasets of mammals with diverse taxon sampling [25], [5] show a large proportion of missing data. Such missing heterogeneous data may lead to complex systematic errors in distance estimation [26]. This is particularly relevant since MVS analyses work best with low stochastic noise (hence relatively long sequences), diverse taxon sampling with a considerable proportion of taxa showing minimal convergent evolution, and minimal bias of the input distances from sources such as missing data.

We adopted a standard two-step procedure to estimate divergence times. The first step is to estimate the phylogenetic tree with unconstrained branch lengths in units of expected numbers of substitutions per site. Given the problems with convergent evolution in mitochondrial data [5], [2], [15], [18], we used the MVS procedure to correct the input distances for convergent evolution. A variant of the core set approach [19] to MVS was used in this instance (Materials and Methods). Fixing the topology to the tree obtained by MVS, which is very similar to trees obtained by previous authors using varied data sets [27][2], branch lengths were reestimated using maximum likelihood (ML) in the program PAML [28]. We used the JTT [29]+Gamma model with α = 0.5 for these analyses. The second step uses the tree and its branch lengths, some fossil constrained nodes, and a penalty or cost function to dampen rate changes, to infer divergence times of all nodes and evolutionary rates along all branches.

Three cost functions, FADD, FLOG, and FIR, were applied to the MVS and ML trees. These functions penalize the fluctuation of rates, and do this on either a linear scale (here called the ADD function), on the log rate (LOG), or on the inverse rates (IR), respectively (see equations (1), (2), and (4) in Materials and Methods). The tree estimated by MVS and FADD is denoted MVS-FADD and so on (while ML-FADD indicates use of ML branch lengths).

To calibrate these trees, we used eight fossil constraints, all taken from previous studies (see Figure 1 caption) [5], [9]. The divergence times were estimated by then minimizing each cost function subject to these constraints (see Materials and Methods). The CI was calculated using a likelihood interpretation (see Materials and Methods) of the cost function residual analogous to the method used in Multidivtime [23]. This captures the error caused by deviations of the tree's branch lengths away from their expected values and includes the unpredictability of rate changes, which might mimic Brownian motion, for example. It will also incorporate variability arising from ancestral polymorphism. Polymorphism will cause fluctuations of the edge lengths of the tree when the analysis has multiple nodes constrained by fossil or other data [13]. Generally, the time at the most recent common ancestor of sequences from different species is older than the time at speciation. The extent of this difference varies among internal nodes due to both the stochastic nature of the coalescent and factors governing its expected magnitude, such as population size. The cost function does not take account of this bias, but the CI does include the variance arising from ancestral polymorphism at internal nodes.

The MVS-FIR analysis (Figure 1A) estimated the age of the root at 84.2 mya with a 95% confidence interval (CI) of 80.7–88.4 mya (Table 1). This time is much more recent than the result of the ML-FL analysis (Figure 1B), i.e., 122.2 (95% CI, 112.2–144.7) mya. Note that, ML-FL is giving results consistent with that reported in earlier studies [1][2], [30][9], indicating that the younger root for placentals returned using the function FIR (see below) is due to the method and not the data. The difference between the FLOG and the F'LOG function in table 1 is explained in Materials and Methods. It is the FLOG method that most closely approximates the Brownian motion assumed by Multidivtime.

Figure 1 shows the estimated ancestral rates across the tree, while Figure 2 represents the inferred evolutionary rates going from the root to a terminal. The single instance of root to fin whale is shown in figure 2, while figure S1 traces the evolutionary rates inferred by different methods along seven representative lineages. The MVS-FIR tree identified an abrupt acceleration of evolutionary rate near the common ancestor of both Supraprimates and Laurasiatheria, then a very strong acceleration in just the ancestral lineage of Laurasiatheria (Figures 1A and 2A). This acceleration was followed closely by a strong deceleration along nearly all lineages of Laurasiatheria. In contrast, almost all lineages of Afrotheria and Xenarthra have retained rates close to that of the root. There are also sporadic later accelerations, for example that among hedgehogs and Moon rat (Figure 1A).

The cost function FIR detected an acceleration–deceleration pattern near the base of Laurasiatheria using both the MVS and ML branch lengths (the red lines of Figure 2), whereas the function FLOG showed a far more flat prediction of generally lower evolutionary rates that led to older root times (the green lines of Figure 2). Even more extreme, the cost function FADD inferred gradually decreasing rates in all deep branches of the MVS and ML trees (the blue lines of Figure 2), and the root time tended to infinity. Table 1 summarizes the root time values estimated by various models (all the node divergence times from these models are listed in Table S2).

Table 1 also compares the inferred times with those obtained from r8s, a popular program for estimating absolute rates [11], [12]. This program has two options of for its cost functions, FADD and F′LOG (equation (3) in Materials and Methods). It also takes account of stochastic noise caused by the finite length of the sequences used to derive the branch lengths using a penalized-likelihood approach. The fitting to the estimated branch lengths is expressed by the log likelihood, while the weight for the cost function is estimated using cross validation. In our mitochondrial dataset, with its long sequences, the root time values changed little due to the cross validation effect (the reason is explained in Materials and Methods). Further, root divergence time estimates returned by r8s do not show confidence intervals in table 1, since r8s only assesses the sampling variance of node times due to stochastic errors in branch lengths due to finite sequence length (it does this via the bootstrap) and does not include the effect of stochastic variation of evolutionary rate. This source of error remains as sequence lengths go to infinity, and will be dominant with long sequences.

### Agreement with fossil data

Figure 3 represents the chronological distribution of the internal node density for the MVS-FIR tree (Figure 1A) compared with the rate of new species appearing in the well-studied North American fossil record. To avoid potential bias of divergence times in the molecular tree due to species sampling, the scaling constant was determined by a least-squares fit to the fossil data in the period 85–50 mya.

The MVS-FIR analysis reconciles molecular with fossil data in two ways. First, the chronological distribution of the internal node density is clearly largely consistent with the rate of appearance of novel fossil species. This is despite the fact that the paleontological data assessed are limited to the well-studied North American record [7]. Both the MVS-FIR tree and the fossil record suggest accelerating taxonomic diversity near the K–T boundary, rather than a longer slower buildup [27][2], [30][9]. Use of the less robust FLOG criterion seems to suggest a prolonged increase in diversity that agrees far less with the quantitative fossil record. This congruence with fossils does not automatically show that the combination of MVS and FIR is the best way to analyze this data, but it does reframe the discussion of the relationship between the fossil and molecular times into one where the molecular dates are being looked at far more critically.

Second, and more indicatively, the cost function FIR resolves incongruence among the fossil constraints/inferred divergence times in different parts of the molecular tree. The best fossil constraints in Laurasiatheria suggest much older times than constraints in other parts of the tree [5], [31]. The MVS-FIR, MVS-FLOG, and MVS-FADD trees with all constraints give the root as 84.2, 105.0, and ∞ mya, respectively (Table 1). Removing the whale–hippo constraint gives 82.4, ∞, and 91.2 mya; removing the horse–rhino constraint gives, 82.8, 105.0, and ∞ mya; and removing both sets of constraints gave 82.9, 87.2, and ∞ mya. Thus, in this case only the FIR tree is insensitive to “constraint sampling”.

Finally, there is a good fossil calibration for tarsier [5] that was withheld because it is not used in both references 4 and 15. It suggests that human and tarsier split around 50–60 mya and the molecular trees suggest the human-loris split was not much earlier (probably <5 myr earlier). Despite our use of whale–hippo and horse–rhino calibrations, the MVS-FIR tree (Figure 1) gives a human-loris split close to this age, in contrast to earlier studies [5], [9], [24].

### Sensitivity and robustness of FIRassessed by simulations

We highlight two distinct properties of the new function FIR with the help of evolutionary simulations and worked examples. The first property is its ability to detect a transient acceleration of evolutionary rate. Such an effect might be caused by a burst of positive selection and/or a bottleneck in population size. The second is to assess the effects of both stochastic fluctuations and systematic bias on the robustness of estimated times. Here, we model bias in the form of either a general slowdown or a general acceleration of evolutionary rate across the whole tree.

We first modeled a strong instantaneous acceleration as an analogue to what is inferred by FIR to have occurred ancestral branch leading to Laurasiatheria. We simulated a 32-taxon symmetric tree in which a molecular clock holds except for an abrupt elevation (by a factor of 10) of evolutionary rate along a short internal branch (the red line in Figure 4A). This example is useful for demonstrating the efficacy of the FIR function and something similar appears on both the MVS-FIR and ML-FIR mitochondrial trees. The times at the internal nodes were set to 48, 56, 64, and 72 mya, and the root time was set to 80 mya. On this weighted tree, the cost functions were minimized with two constrained node times, which corresponded to two fossil calibration points. Only the FIR function accurately estimated the true divergence times and appeared robust against the abrupt change (Figure 4B). In contrast, the functions FLOG and FADD inferred gradual rate changes (Figure 4E), which are erroneous and lead to overestimation of the root time along with that of many other nodes (Figures 4C and 4D). Table 2 summarizes the root time values inferred by various models. Table 2 includes the result of r8s with the cross-validation method.

We next simulated stochastic rate fluctuations by themselves, plus either a prevailing slowing down or acceleration of rates through time. Such simulations are distinct from a Brownian-type process, and are used to gauge the general robustness of the functions. To impose rate fluctuations on the same basic tree as Figure 4A, but without the abrupt rate change, the age of the root was set to 100 mya. We assumed infinite length of sequences and that the branch lengths are known without uncertainty. Next a branch was randomly selected proportional to its duration in time and all its descendant branch lengths were multiplied by a factor (chosen randomly) from a uniform distribution of range 0.5–1.5. A total of 25 such rate changes were placed on the tree to give the final branch lengths for that tree. These branch lengths were used as the input data and the various cost functions were minimized with on calibrated internal node, and then determined all node times. We repeated this procedure 600 times to obtain the mean and standard error of the root time. We also simulated the two other cases. The only difference was the range of the uniform distribution used. Using a range of 1.0–1.75 the model is biased toward rate acceleration through time, while using and 0.25–1 induces a rate slow down through time (a probable situation in the placentals generally). As table 3 shows, the function FIR gave reasonable estimates with a bias towards deceleration, whereas FADD gave an infinite root time in 139 of 600 samples. These undefined root-time values were set arbitrarily to 200 mya before calculating the mean and standard error.

### Reduced bias in branch length estimation by the MVS model

The MVS model was shown previously to recover the correct tree in a simulation with two strongly convergent lineages [32] that were grouped erroneously by standard methods (including the neighbor joining (NJ) [33] and ML [28] methods). Here, we examined a different question; this is how well the MVS method recovers the true branch lengths, which are of primary importance when estimating divergence times. We simulated the evolution of a sequence of 104 amino acid sites, following the tree depicted in Figure 5. The model of amino acid substitutions used was the JTT [29]. Convergent evolution was then imposed on this data. This was done by sharing parts of the sequences among three ingroup clades plus the outgroup. The red lines in Figure 5 indicate which lineages shared sequence and were therefore subject to a form of convergent evolution.

Pairwise distances were then estimated from the terminal sequences obtained above using the same JTT model. A modified MVS core-set procedure [19] was able to recognize that the tree consisted of three groups of eight sequences which had additive distances within each group and the outgroup. The MVS procedure then converted the distances within the three ingroups into three sets of perfectly additive distances and sequentially combined them to form a single core set (Materials and Methods). The final MVS tree was obtained by modifying the distances between this single ingroup core set and the outgroup following the rules described in Materials and Methods.

The branch lengths recovered by the MVS model reproduced the true values accurately (Figure 6). The distribution around the diagonal line in Figure 6 represents the stochastic fluctuation of the estimated distances; that is, the magnitude of this fluctuation did not change after the simulation was rerun without convergent evolution. In contrast, the NJ and ML trees returned branch lengths that were affected clearly by convergent evolution. The worst affected branch lengths were ancestral to the groups showing convergent evolution (underestimated) or else were leading to the sister groups of the groups affected by convergent evolution (overestimated).

## Discussion

We begin the discussion by examining why the current cost functions, FLOG and FADD, overestimated the age of the root in the worked example with strong rate heterogeneity in the form of a short term highly elevated evolutionary rate. It is also important to examine the profile of the cost function around their minimal values for the age of the root. The functions FLOG and FADD showed asymmetric behavior around the estimated root time, even in worked examples with a perfect molecular clock (data not shown), whereas the FIR profile was symmetric and parabolic in shape. The asymmetry seen with FLOG and FADD increased in response to a general bias towards deceleration of rates through time (Figure 7A). In particular, the function FADD showed a monotonic decrease with respect to increasing age of the root, that is, its estimate tended to infinity. Thus, asymmetric behavior of a cost function seems to be a symptom of unstable estimation of the age of the root. A similar strong asymmetry appeared in the profile of the current cost function with respect to the age of the root of the mitochondrial tree of placental mammals (Figure 7B), irrespective of whether MVS or ML branch lengths were used. Comparing Figures 7A and 7B suggests that bias due to decelerating evolutionary rates occurred in the evolutionary history of placental mammals, and is impacting the ability of current methods to estimate the correct age of the root.

Other approaches to divergence time estimation are under active development. For example, Drummond et al. [34] recently developed a set of Bayesian procedures to jointly estimate divergence times, evolutionary rates, and the tree topology. The uncorrelated variable rate model they use, which assumes independence of evolutionary rates between branches, may also be able to identify instantaneous accelerations. However, the magnitude of rate acceleration may be underestimated because hierarchical Bayes estimates can generally be regarded as shrinkage estimators [35]. An important future direction is to assess the performance of the various estimators in simulations where rate variability is a mixture of an autocorrelated process plus rare, but strong, instantaneous accelerations.

Our refined approaches resolve apparent contradictions between the quantitative molecular and paleontological data of placental mammals. Given such agreement, there is no need for ancillary hypotheses such as the long fuse model [27], [2], [6], [30][9], [36] to explain the lack of any positively identified fossils of modern placental mammals prior to about 75 million years ago. In addition, dates of markedly less than 100 mya for the root of placental mammals also bring into question earlier hypotheses that traditional continental drift models explain the geographic distribution of the four major groups of placental mammals [27][2], [13], [37]. We anticipate that the FIR cost function developed in this paper will provide an improved methodology for a wide range of molecular studies because its robustness to rapid fluctuations of rate is essential to understanding events such as adaptive evolution. For example, because acquisition of new molecular functions can be achieved in a few million years after gene duplications [35], the duration of an inflated evolutionary rate may be surprisingly short, and, correspondingly, difficult to detect. We hope our new methods will be beneficial in such situations and will lead to a richer understanding of molecular evolution.

## Materials and Methods

### Autocorrelated variation of inverse rate

Because the branch lengths of a phylogenetic tree are estimated from the data as the product of the evolutionary rate of a branch and its time duration, these two factors cannot be directly estimated separately. It has become standard to use loose constraints to accommodate uncertainty and it is often wise to exclude constraints if there is no firm basis for them. Information on the age of some internal nodes may be fairly directly available from the fossil record. An example of this is the horse-rhino split [13]. If the molecular clock [10] governs the process of molecular evolution well, we can accurately estimate the times at other nodes given just a single reliable calibration point. However, the assumption of the molecular clock is often rejected by the fit of a clock-like tree to the data.

The most widely used assumption in order to model the evolutionary rate changes away form being constant (or away from a clock) is to introduce a stochastic process. For example, Sanderson [11] suggested the following cost function be minimized:

${\displaystyle F_{ADD}={\mathit {\Sigma }}_{n}(R_{n}-R_{a(n)})^{2}}$

1with rates Rn and Ra(n) being from the nth branch and its ancestral branch a(n), respectively. The exponent of the negative cost function Πn {(2π)1/2σ}−1 exp{−(RnRa(n))2/2σ2} implies that this function can be interpreted assumes a priori that rate changes follow a random walk with independent increments randomly drawn from a normal distribution. Likewise, Thorne et al. [23] considered a cost function of the type

${\displaystyle \mathrm {F_{LOG}} =2{\mathit {\Sigma }}_{n}(\log R_{n}-\log R_{a(n)})^{2}/(T_{n}-T_{a(n)})}$

2in the framework of the hierarchical Bayes procedure. Here Tn is the time interval of the nth branch (the current version of Multidivtime [23] uses rates at the nodes, instead of average rates along branches, as the free parameters). This can be interpreted as an assumption that the log rate undergoes Brownian motion. That is, independent increments of normal random variables, whose variances are proportional to the average time duration between the pair of branches (that is, the average length in time of the two branches). Recently, Sanderson implemented the unweighted cost function

${\displaystyle F_{LOG}^{\prime }=2{\mathit {\Sigma }}_{n}(\log R_{n}-\log R_{a(n)})^{2}}$

3into his program r8s[12].

However, the estimated rates based on any of the above cost functions may overly smooth the change of evolutionary rates when a pronounced transient change of rate has occurred. In turn, biased estimates of evolutionary rates will lead to biased estimates of divergence times. Here we propose a new cost function, which penalizes the local rate deviation from the harmonic mean. For simplicity, we ignore the stochastic variance of branch length [11], [12], [23], which goes to zero with increasingly long sequences. Denote the branch length and inverse rate of the nth branch by Bn and Qn ( = Tn/Bn), respectively. The local rate variability (FIR,n) of two successive branches can be expressed as the variance of Qn and Qa(n) around their average value, An = (QnBn+Qa(n)Ba(n))/(Bn+Ba(n)), which is equal to the inverse of the harmonic mean of Rn ( = Bn/Tn) and Ra(n), with the weights Bn and Ba(n). That is, FIR,n = {(QnAn)2Bn+(Qa(n)An)2Ba(n)}/(Bn+Ba(n)). Because FIR,n is rewritten as FIR,n = (QnQa(n))2Wn with the weight Wn = BnBa(n)/(Bn+Ba(n))2, the inverse-rate cost function, FIR, is defined as the weighted average of (QnQa(n))2 over all ancestor–descendant branch pairs in the tree:

${\displaystyle \mathrm {F_{IR}} ={\mathit {\Sigma }}_{n}\mathrm {F} _{\mathrm {IR} ,n}/{\mathit {\Sigma }}_{n}W_{n}={\mathit {\Sigma }}_{n}(Q_{n}-Q_{a(n)})^{2}W_{n}/{\mathit {\Sigma }}_{n}W_{n},}$

4which contrasts with previous expressions in terms of Rn.

Because FIR places a smaller penalty on abrupt rate changes than do previous models, it can confine an abrupt change close to where it occurred on the tree. In contrast, FADD and FLOGa priori put stress upon the smoothness of evolutionary rate change from one branch to the next, and this may propagate the effect of an abrupt rate change over successive branches. In this article, the variable (Tn) (the divergence times) were estimated by minimizing the cost functions (equations 1–4) subject to the (fossil) constraints without introducing other modifiers such as cross validation.

At first glance, it seems most intuitive to use a penalty such as (rate1−rate2)2; this has been the implicit assumption until now [23] and is the basis of published programs such as r8s and Multidivtime[11], [12], [23]. It rests on the implicit expectation that any reasonable type of smoothing of changes in rate will give a similar answer. However, a simple linear form for the difference in rates is misleading, and this is illustrated by considering the general problem of estimating velocity (rate) given distance (in our case branch length). Going from point a to b at velocity v1, then back at velocity v2, the average velocity (vav) is equal not to the standard arithmetic mean, but to the harmonic mean, 1/vav = (1/v1+1/v2)/2. Further, going from point (node) a to b at velocity (rate) v1, and from point b to c at velocity v2, then the distances traveled (dab and dbc) need no longer be equal so we must use weights. Specifically, vav = (dab+dbc)/(t1+t2), where t1 = dab/v1 and t2 = dbc/v2. As a result, 1/vav = (dab/v1+dbc/v2)/(dab+dbc). This is the same form as the average quantity An used to obtain FIR (equation 4). The use of the harmonic mean becomes important when v1 differs greatly from v2.

### The penalty in hierarchical Bayes and penalized likelihood approaches

The Bayesian approach estimates the divergence times and evolutionary rates in the form of a posterior distribution, which is summarized approximately as P(''B''|''R'',''T'') exp(−λF). Here B, R, and T are the vectors of the estimated branch lengths, evolutionary rates, and divergence times, respectively, while F is the cost function. The value of λ expresses the weight for the penalty, and is called a hyperparameter. Introducing the distribution for the hyperparameter is done via a so-called hyper-prior; the hyperparameter is then estimated concurrently with the posterior distribution. The penalized likelihood approach maximizes log P(''B''|''R'',''T'')λF. The weight for the penalty λ is estimated using cross-validation. When the sequences are long enough, for example concatenated mitochondrial protein sequences, it is often safe to assume that the branch lengths are estimated accurately, that is with minimal stochastic error (but not necessarily without systematic error, something we address in this paper using MVS). Accordingly, it is assumed that the products of rates and times are known exactly. Thus in a situation of long, or very long sequences (e.g. genomic alignments), the results returned by using just the penalty function will converge to those returned by either a Bayesian or a penalized likelihood approach with the same type of cost function.

### MVS core-set correction of biased evolutionary distances

In the MVS method, the additivity of evolutionary distances is converted using the expected orthogonality among branch vectors in a multidimensional Euclidean space [18]. If the estimated pairwise distances do not satisfy additivity, MVS provides an index to measure deviations from orthogonality without specifying a tree structure. This index enables one to diagnose whether a distance matrix is compatible with a tree and to modify the biased distances until they satisfy orthogonality, that is, with a zero value for the index. Here, a revised core-set approach, that is, an extension of [19], is applied to correct the observed pairwise distances for convergent evolution.

The first step of this approach is to divide the set of taxa into subgroups and to correct the distances between pairs in each subgroup. We decompose the taxa based on partitions with both strong biological support and high bootstrap support. The deviation from additivity of pairwise distances is much smaller within each subgroup than across the whole distance matrix. If an anomalously large deviation is observed associated with a single taxon within a subgroup, that taxon is removed temporarily from the analysis. When the only deviations between distances within a subset of taxa are judged to be due to stochastic noise, the pairwise distances are modified by solving the equation of motion (a method using a many body kinetic equation in physics), which uses the index of the deviation from additivity as the potential energy [18], [32]. The modified distance matrix within a subgroup is interpreted as the true additive distance matrix (core set) achieved by minimal modification of the original distances. Generally, the modified distances are very close to the distances estimated by the NJ method for just these taxa. Then, the excluded taxa are deposited into this core set tree by solving the equation of motion with the same potential function. Assuming that convergent evolution and long-branch attraction are the main source of deviation from additivity, we allow for only positive corrections of the biased distances between the core set and excluded taxa (that is, distances can only get bigger).

The second step is a clustering of subgroups. The pairwise distances between two core-set groups are corrected by assuming attractions between them (that is, some distances are underestimated) and by minimal enlargement of the distances between the two groups to satisfy additivity. In this way, the core-set approach used in this article is in the direction opposite to that used in an earlier procedure that proceeds from the whole tree down to subtrees [32]. This revised method seems more appropriate here because the branching pattern within subgroups becomes more apparent and there are smaller deviations from additivity.

Conceptually, the first step of MVS core set is to decompose the whole tree into putative monophyletic groups that show minimal bias internally. The approach developed as part of this particular study resembles k-means clustering, that is, minimizing the deviation (variance) within groups while maximizing the deviation (variance) between groups. At present, the decomposition is done in a supervised, as opposed to an automatic, manner. In the second step, and when it is difficult to connect two groups definitely, we first exclude taxa with the largest deviations from additivity and then complete a new core set for the remaining sequences. We then locate the excluded sequences on the core set tree using minimal modification of distances (a parsimony-like criterion) under the constraint of only enlarging the distances identified as being biased (thus the only biases are assumed to be due to attraction).

When analyzing the mitochondrial protein sequences, we estimated pairwise distances initially under the JTT+Gamma model with α = 0.5. Given moderately high bootstrap support, the placental tree was decomposed into four major groups: Laurasiatheria, Supraprimate, Xenarthra, and Afrotheria. Laurasiatheria was decomposed into five subgroups, namely Cetartiodactyla, Perissodactyla, Carnivora, Chiroptera, and Euliptotyphyla. Supraprimates was decomposed into two subgroups: Primates and Glires. The core-set analysis began by generating additive distances within these subgroups, after which the clustering procedure described above was applied to connect these subgroups. In this analysis, we analyzed sequences of Xenarthra and Afrotheria together in a single core-set. This is because a relatively short internal branch separates them and Xenarthra contains but a single sequence in this analysis. Finally, we tried to modify the distances between placental mammals and the outgroups, but we could not determine the position of the root with confidence because of very strong attractions between many placental mammals and the outgroups. The consensus [25][2], [9] has been that the most probable placement of the root separates Afrotheria from the other placental mammals. The most likely alternative placement for the root is between Xenathra plus Afrotheria (Atlantogenata [27], [38]) and all other placentals (Boreotheria). Recent analyses including LINE sequences and indels [39] support this view. When we place the root in this position, the divergence times, including that of the root, change little (results not shown).

Further details of the current MVS procedure of distance modification are documented in Methods S1. The derived additive distance matrix was converted to a Newick tree file with the help of the NJ method, and this was used as the input data for divergence time estimation by the cost functions.

### Log-likelihood profiles of the cost function with respect to the age of the root

We note that least-squares estimators can be interpreted as ML estimators when errors in the data follow a multi-normal distribution. Accordingly, the least-squares residual can be treated as twice the log-likelihood ratio for the three cost functions GJ(Troot) (J = IR, ADD, or LOG); that is, as GJ(Troot) = N log{FJ(Troot)/FJ(TM)}. Here, FJ(Troot) gives the minimum residual value with all internal node times reestimated, except for the root time Troot, and it becomes truly minimal at Troot = TM. Here, N is the number of terms in the cost function, which is equal to 2t−4, where t is the number of tips on the tree (here t = 62, so N = 120). The 95% CI can be obtained using the threshold of GJ(Troot) = 3.84. Here, the minimum for criterion FADD was arbitrarily set to FADD(Troot) at Troot = 200 because the profile decreased monotonically for all greater values of the root time. In all evaluations, we avoid putting arbitrary constraints near the root of the tree (for example, that the rate at the root is the same as that of one of its descendants) because while arbitrary constraints may bound the minimum, they will also create unknown biases. Note that the MVS-FIR profile is symmetric and parabolic around the estimated root time, whereas those of FLOG and FADD are not. We see the same thing even in worked examples using a perfectly clock-like tree.

Our novel way of estimating CIs jointly takes into account some, but not all, of the sources of error noted in a previous publication [13]. These include errors in edge length estimation (because of a finite sampling of sites), the unpredictability of fluctuating evolutionary rates, and coalescent time. The assumptions of this method, vis-à-vis independence of errors, are the same as those used in programs by Thorne et al. [23]. Note that even if edge lengths have no error due to sampling of sites (e.g., the type of error identified by the bootstrap), they may still show four wide CIs because of the other two factors, which do not go to zero with increasing sequence length. This, in turn, shows that using the sequence bootstrap to infer errors on divergence times [11], [12] is not comprehensive and therefore the CIs are too narrow. We expect that this empirical approach to estimating standard errors will become increasingly accurate with increased numbers of internal nodes and calibration points.

Fortran source code and executable versions of the programs used in this study are downloadable from 〈http://www.kochi-ms.ac.jp/∼ct_cmis/kitazoe/〉. The operation of these programs, using the data in this analysis, is documented in Methods S2.

## Supporting Information

### Methods S1

The procedure of MVS distance modification for placental mammals(0.04 MB PDF)pone.0000384.s001.pdf

### Methods S2

The operation manual of programs for divergence time and MVS analyses(0.04 MB PDF)pone.0000384.s002.pdf

### Figure S1

Trace of evolutionary rates along seven lineages in the MVS-FIR and ML-FIR analyses. The ML-FIR analysis (B) showed a flatter peak rate than did the MVS-FIR tree (A) (the black lines of F in the whale lineage) and inferred a longer period at a lower rate of evolution, producing an older root time. If lineages appear to merge with each other, zoom in to follow their exact path.(0.68 MB TIF)

### Table S1

Sequences and accession numbers used in this paper. The aligned amino acid sequences of the mitochondrial proteins ND1-ND6, ND4L, CO1-CO3, CTYB, and ATP6, are concatenated giving a total length of 3660 sites.(0.09 MB DOC)pone.0000384.s004.doc

### Table S2

Divergence times of the MVS-FIR, ML-FIR, and ML-FLOG analyses. The numbers 1–61 denote the ancestral nodes in Figure 1. The red numbers stand for the positions of the time constraints. Times older than the constraints varied the most, particularly using ML branch lengths.(0.09 MB DOC)pone.0000384.s005.docWe thank Jeffrey L Thorne for his constructive criticism and theoretical insights.

## Figures and Tables

### Table 1

"The age of the root of placental mammals(10.1371/journal.pone.0000384.t001)"
 Tree model Cost function FIR FADD FLOG F'LOG FADD(r8s) F'LOG(r8s) MVS 84.2(80.7, 88.4) Infinity 105.0(97.0,117.0) 91.8(84.5,104.7) 160.5 91.1 ML 106.4(102.4,110.6) Infinity 122.2(112.2,144.7) 112.0(102.0,121.2) 122.2 111.6
The age of the root using MVS and ML branch lengths was estimated after minimizing various penalty functions with the same fossil constraints. All times are in mya, and 95% confidence intervals were estimated by the sum-of-squares method described in Materials and Methods. The times estimated by the r8s program use cross-validation and penalized likelihood methods with essentially the same penalty functions [12], but do not report comprehensive errors.

### Table 2

"Estimated age of the root in the presence of an abrupt acceleration then deceleration of ancestral rates(10.1371/journal.pone.0000384.t002)"
 True value Cost function FIR FADD FLOG F'LOG FADD(r8s) F'LOG(r8s) 80 81.3 161.1 114.5 101.7 160.5 108.7
These estimates are from a worked example with an abrupt acceleration then deceleration (the red line of Figure 4A) which occurs deeper in the tree than the calibration points. The true root age is 80.0 mya. The estimated values are made by minimizing the cost functions (that also appear in table 1), but upon the weighted constrained tree of figure 4A.

### Table 3

"The age of the root inferred by functions FIR, FLOG, and FADD on simulated data with random auto-correlated rate changes(10.1371/journal.pone.0000384.t003)"
 Rate range Cost function FIR FLOG FADD 0.25–1.0 111.3±11.1 125.1±12.7 172.0±33.9 0.5–1.5 100.7±4.4 103.3±3.9 103.6±5.1 1.0–1.75 93.2±2.0 94.0±2.1 94.3±2.4
In these simulations, the length of a branch (on what tree!) is selected randomly proportional to its true duration in time. A rate adjustment (change) factor is then chosen randomly form a uniform distribution and all its descendant branch lengths were then multiplied by this factor. A total of 25 such random rate changes were placed on the tree, then the branch lengths (measured in the product of rate and time) of the weighted tree were passed on to the time estimation procedures. The whole procedure was repeated 600 times to obtain the average root time and standard error. Three different ranges were used for the uniform distribution of rate changes. The first has a range of 0.25 to 1 and represent a strong persistent bias towards rate deceleration. A range of 0.5 to 1.5 gives minimal rate change bias, but retains stochastic fluctuations. Finally a strong acceleration effect is achieved by the use of the range 1.0–1.75. The function FADD gave an age of the root tending to infinity on 139 of the weighted trees simulated under the deceleration model. In such cases the root time is set to 200 mya in order to allow the mean and standard deviation of this cost function to be calculated.Competing Interests: The authors have declared that no competing interests exist.Funding: This work was supported by the Technology Agency (JST) and the Japan Society for the Promotion of Science (JSPS grant SR B-16300086 to YK and HK). PJW acknowledges the JSPS Senior Fellowship Program and NIH grant 5R01LM8626 for support.

## References

1. Stanhope MJ, Waddell VG, Madsen O, de Jong W, Hedges SB, et al. Molecular evidence for multiple origins of Insectivora and a new order of endemic African insectivore mammals. Proc Natl Acad Sci U S A. 1998;95:9967–9972.9707584
2. Springer MS, Stanhope MJ, Madsen O, de Jong WW. Molecules consolidate the placental mammal tree. Trends in Ecology and Evolution. 2004;19:430–438.16701301
3. Easteal S. Molecular evidence for the early divergence of placental mammals. BioEssays. 1999;21:1052–1058.10580990
4. Arnason U, Adegoke JA, Bodin K, Born EW, Esa YB, et al. Mammalian mitogenomic relationships and the root of the eutherian tree. Proc Natl Acad Sci U S A. 2002;99:8151–8156.12034869
5. [see 4] Cite error: Invalid <ref> tag; name "pone.0000384-Waddell2" defined multiple times with different content
6. Benton MJ, Ayala FJ. Dating the tree of life. Science. 2003;300:1698–1700.12805535
7. Alroy J. The fossil record of North American mammals: evidence for a Paleocene evolutionary radiation. Syst Biol. 1999;46:107–118.12078635
8. Archibald JD, Deutschman DH. Quantitative analysis of the timing of the origin and diversification of extant placental orders. J Mammal Evol. 2001;8:107–124.
9. Springer MS, Murphy WJ, Eizirik E, O'Brien SJ. Placental mammal diversification and the Cretaceous–Tertiary boundary. Proc Natl Acad Sci U S A. 2003;100:1056–1061.12552136
10. Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626.5637732
11. Sanderson MJ. Nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol. 1997;14:1218–1231.
12. Sanderson MJ. Estimating absolute rates of molecular evolution and divergence times: A penalized likelihood approach. Mol Biol Evol. 2004;19:101–109. Software r8s, version 1.71 homepage http://ginger.ucdavis.edu/11752195
13. [c.f. 13] Cite error: Invalid <ref> tag; name "pone.0000384-Waddell3" defined multiple times with different content
14. Horai S, Hayasaka K, Kondo R, Tsugane K, Takahata N. Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc Natl Acad Sci U S A. 1995;92:532–536.7530363
15. Nikaido M, Cao Y, Harada M, Okada N, Hasegawa M. Mitochondrial phylogeny of hedgehogs and monophyly of Eulipotyphla. Mol Phylogenet Evol. 2003;28:276–284.12878464
16. Zhang J, Kumar S. Detection of convergent and parallel evolution at the amino acid sequence level. Mol Biol Evol. 1997;14:527–536.9159930
17. Swofford, DL, Olsen GJ, Waddell PJ, Hillis DM. Phylogenetic Inference. In: Hillis DM, Moritz, editors. Molecular Systematics, second edition. Sunderland, Mass.: Sinauer Assoc; 1996. pp. 450–572.
18. Kitazoe Y, Kurihara Y, Narita Y, Okuhara Y, Tominaga A. A new theory of phylogeny inference through construction of multidimensional vector space. Mol Biol Evol. 2001;18:812–828.11319265
19. Okabayashi T, Kitazoe Y, Kishino H, Watabe T, Nakajima N, et al. Core set approach to reduce uncertainty of gene trees. BMC Evol Biol. 2006;6:41:10.1186/1471-2148-6-41.16712735
20. Zuckerkandl E, Pauling L, Bryson V, Vogel HJ, editors. Evolving genes and proteins. New York: Academic Press; 1965. p. 97 p.
21. Kishino H, Hasegawa M. Converting distance to time: application to human evolution. Methods Enzymol. 1990;183:550–70.2314292
22. Yoder AD, Yang Z. Estimation of primate speciation dates using local molecular clocks. Mol Biol Evol. 2000;17:1081–1090.10889221
23. Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol. 1998;15:1647–1657.9866200
24. Hedges SB, Kumar S. Precision of molecular time estimates. Trends in Genetics. 2004;20:242–247.15109778
25. [e.g.3] Cite error: Invalid <ref> tag; name "pone.0000384-Murphy1" defined multiple times with different content Cite error: Invalid <ref> tag; name "pone.0000384-Murphy1" defined multiple times with different content
26. Misawa K, Nei M. Reanalysis of Murphy et al.'s data gives various mammalian phylogenies and suggests overcredibility of Bayesian trees. J Mol Evol. 2003;57(Supple 1):S290–S296.15008427
27. Waddell PJ, Okada N, Hasegawa M. Towards resolving the interordinal relationships of placental mammals. Syst Biol. 1999;48:119–137.12078636
28. Yang Z. PAML: a program for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997;13:555–556.9367129
29. Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992;8:275–282.1633570
30. Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature. 1998;392:917–920.9582070
31. Near TJ, Sanderson MJ. Assessing the quality of molecular divergence time estimates by fossil calibrations and fossil-based model selection. Philos Trans R Soc Lond B Biol Sci. 2004;359:1477–1483.15519966
32. Kitazoe Y, Kishino H, Okabayashi T, Watabe T, Nakajima N, et al. Multidimensional vector space representation for convergent evolution and molecular phylogeny. Mol Biol Evol. 2005;22:704–715.15548750
33. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–425.3447015
34. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. Relaxed phylogenetis and dating with confidence. PLoS Biology. 2006;4:699–710. Software BEAST homepage http://evolve.zoo.ox.uk/beast/
35. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate distribution. Proc Third Berkeley Symp Math Statist Prob. 1956;1:197–206.
36. Bromham L, Ramhaut A, Fortey R, Cooper A, Penny D. Testing the Cambrian explosion hypothesis by using a molecular dating technique. Proc Natl Acad Sci U S A. 1998;95:12386–12389.9770496
37. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155.11073452
38. Waddell PJ, Cao Y, Hauf J, Hasegawa M. Using novel phylogenetic methods to evaluate mammalian mtDNA, including amino acid-invariant sites–LogDet plus site stripping, to detect internal conflicts in the data, with special reference to the positions of hedgehog, armadillo, and elephant. Syst Biol. 1999;48:31–53.12078643
39. Waters PD, Dobigny G, Waddell PJ, Robinson TJ. Evolutionary History of LINE-1 in the Major Clades of Placental Mammals. PLoS ONE. 2007;2(1):e158. doi:10.1371/journal.pone.0000158.17225861