Methods
There is no agreement on a single optimal method to validate a surrogate marker or surrogate endpoint. Multiple statistical frameworks have been proposed for evaluating surrogate markers, including the meta-analytic framework, the principal stratification framework, and the proportion of treatment effect explained (PTE) framework. In the content and videos below, we describe these frameworks and then focus primarily on implementation of the PTE framework. This includes methods that can handle censored data, multiple surrogates, surrogates measured with error, longitudinal surrogates, high-dimensional surrogates, and more. Each method has publicly available software in R to implement estimation and inference. These are described below, with more information about software available here.
Proportion of Treatment Effect Explained (PTE)
First, we describe the proportion of treatment effect explained (PTE). The full phrasing is: the proportion of the treatment effect on the primary outcome that is explained by the treatment effect on the surrogate, but this is often shortened to simply PTE. Let \(Y\) denote the primary outcome, \(S\) denote the surrogate marker, and \(Z\) denote the treatment indicator where \(Z \in \{0,1\}\) (i.e., treatment vs. control) and treatment is assumed to be randomized unless otherwise noted. Let’s assume that higher values of \(Y\) and \(S\) are “better”. The observed data consists of \(\{Y_i, S_i, Z_i\}\) for each person \(i\).
Previous work proposed to evaluate a surrogate marker by defining and estimating the PTE by specifying two regression models, for example:
$$E(Y|Z) = \beta_0 + \beta_1Z$$
$$E(Y|Z,S) = \beta_0^* + \beta_1^*Z + \beta_2S$$
where the PTE is defined as \(R_F = 1-\beta_1^*/\beta_1\) and estimated by plugging in the corresponding regression estimates. Intuitively, the idea is that if one fits the first model and observes a large treatment effect (large \(\beta_1\)) and then fits the second model, essentially adding in the surrogate marker, and the treatment effect is now small or close to 0 (small \(\beta_1^*\)), then \(R_F\) will be close to 1, indicating that the surrogate is capturing the effect of the treatment on \(Y\). This approach is extremely appealing and easy to implement. However, numerous studies have pointed out problems with this approach, one of which is that it relies on both models being correctly specified. Notably, not only is the estimate dependent on correct specification, but the definition of the quantity \(R_F\) itself relies on correct specification.
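To make this concrete, the R sketch below fits both working models to simulated data and forms the plug-in estimate \( \hat{R}_F = 1 - \hat{\beta}_1^*/\hat{\beta}_1 \). The data-generating model and all parameter values here are hypothetical and purely illustrative.

```r
# Plug-in estimate of R_F from the two working regression models
# (simulated data; all parameter values are illustrative assumptions).
set.seed(123)
n <- 500
Z <- rbinom(n, 1, 0.5)                  # randomized treatment indicator
S <- 2 + 1.5 * Z + rnorm(n)             # surrogate marker
Y <- 1 + 1.0 * S + 0.5 * Z + rnorm(n)   # primary outcome
dat <- data.frame(Y = Y, S = S, Z = Z)

fit_unadj <- lm(Y ~ Z, data = dat)      # E(Y | Z)    = beta_0  + beta_1  Z
fit_adj   <- lm(Y ~ Z + S, data = dat)  # E(Y | Z, S) = beta_0* + beta_1* Z + beta_2 S

beta1      <- coef(fit_unadj)["Z"]
beta1_star <- coef(fit_adj)["Z"]
unname(1 - beta1_star / beta1)          # estimated R_F
```

In this simulated example, the total effect of \( Z \) on \( Y \) is about 2.0 (0.5 directly plus 1.5 through \( S \)) while the direct effect is 0.5, so the estimate should land near 0.75.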
An alternative approach is one that aims to capture the same idea with a model-free definition. To introduce this, we use potential outcomes notation in which each person has potential outcomes \( \{Y^{(1)}, Y^{(0)}, S^{(1)}, S^{(0)}\} \), where \( Y^{(g)} \) is the outcome when \( Z = g \) and \( S^{(g)} \) is the surrogate when \( Z = g \). The PTE is defined using a contrast between the actual treatment effect on \( Y \), defined as:
$$ \Delta = E(Y^{(1)} - Y^{(0)}) $$
and the residual treatment effect on \( Y \), defined as:
$$ \Delta_S = E_{S^{(0)}} \left[ E(Y^{(1)} - Y^{(0)} \mid S^{(1)} = S^{(0)} = s) \right] = \int E(Y^{(1)} - Y^{(0)} \mid S^{(1)} = S^{(0)} = s) \, dF_{S^{(0)}}(s) $$
where \( F_{S^{(0)}} \) is the cumulative distribution function of \( S^{(0)} \). The residual treatment effect can be interpreted as the “leftover” treatment effect on \( Y \), after accounting for the treatment effect on \( S \). That is, it is the hypothetical treatment effect on \( Y \) if the distribution of the surrogate in both groups looked like the distribution of the surrogate in the control group. Importantly, the definition of \( \Delta_S \) uses the distribution of \( S^{(0)} \), but in theory, one can select the distribution of \( S^{(1)} \) or some combination of the two. The PTE is then defined as:
$$ R_W = \frac{\Delta - \Delta_S}{\Delta} = 1 - \frac{\Delta_S}{\Delta} $$
where \( \Delta - \Delta_S \) is the treatment effect explained by \( S \). Ideally, this quantity is between 0 and 1, with values close to 0 indicating a poor surrogate (not capturing the treatment effect) and values close to 1 indicating a good surrogate (able to capture the treatment effect). However, \( R_W \) is only guaranteed to be within \( [0,1] \) under certain assumptions, detailed elsewhere. This construction of \( R_W \) highlights a key challenge in surrogate evaluation: when the overall treatment effect, \( \Delta \), is small, identifying a surrogate becomes inherently difficult. In this particular framework, a small treatment effect implies that finding a surrogate capable of explaining a large proportion of such a small effect will be highly challenging. Unlike \( R_F \), the definition of \( R_W \) does not involve any model specification and thus is model-free. In terms of estimation of \( R_W \), both parametric and nonparametric options are available. Overall, this framework offers a single number, the PTE, that aims to quantify the strength of the surrogate marker with respect to capturing the treatment effect on \( Y \). While there is no agreed-upon threshold for a “good enough” surrogate, some have proposed using 0.5 or 0.75 as a threshold for either the point estimate or, more strictly, for the lower bound of the confidence interval.
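As a rough illustration of plug-in estimation of \( R_W \), the sketch below computes \( \hat{\Delta} \), \( \hat{\Delta}_S \), and \( \hat{R}_W \) on simulated data. For simplicity it uses a linear working model for \( E(Y \mid Z = 1, S = s) \); the estimators in the literature (and the associated R packages) typically use more flexible, nonparametric smoothing and rely on the identifiability assumptions referenced above.

```r
# Plug-in estimates of Delta, Delta_S, and R_W (model-free PTE definition).
# Simulated data; the linear working model for E(Y | Z = 1, S = s) is a
# simplifying assumption made only for illustration.
set.seed(456)
n <- 1000
Z <- rbinom(n, 1, 0.5)
S <- 2 + 1.5 * Z + rnorm(n)
Y <- 1 + 1.0 * S + 0.5 * Z + rnorm(n)
dat <- data.frame(Y = Y, S = S, Z = Z)

trt <- subset(dat, Z == 1)
ctl <- subset(dat, Z == 0)

# Overall treatment effect on Y
Delta_hat <- mean(trt$Y) - mean(ctl$Y)

# Working model for E(Y | Z = 1, S = s), fit in the treated arm
mu1_fit <- lm(Y ~ S, data = trt)

# Residual treatment effect: average the treated-arm regression over the
# control-arm surrogate values, then subtract the control-arm mean of Y
Delta_S_hat <- mean(predict(mu1_fit, newdata = data.frame(S = ctl$S))) - mean(ctl$Y)

R_W_hat <- 1 - Delta_S_hat / Delta_hat
c(Delta = Delta_hat, Delta_S = Delta_S_hat, PTE = R_W_hat)
```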
Principal Stratification
Principal stratification is a causal inference method developed for settings where one aims to condition jointly on the potential outcomes of an intermediate variable under both treatment assignments. Surrogate marker evaluation is one application of principal stratification. The framework is based on principal effects, which are defined as comparisons of potential outcomes within a principal stratum:
$$E\left(Y^{(1)} - Y^{(0)} \mid (S^{(1)}, S^{(0)}) = (s_1, s_0)\right).$$
In this framework, for \( S \) to be a “principal surrogate”, we must have (1) Average Causal Necessity (ACN):
$$E( Y^{(1)} \mid S^{(1)} = S^{(0)} = s) = E( Y^{(0)} \mid S^{(1)} = S^{(0)} = s) \quad \mbox{for all} \quad s$$
and (2) Average Causal Sufficiency (ACS):
$$E( Y^{(1)} \mid S^{(1)} = s_1, S^{(0)} = s_0) \neq E( Y^{(0)} \mid S^{(1)} = s_1, S^{(0)} = s_0) \quad \mbox{for all} \quad |s_1 - s_0| > C. $$
Here, \( C \geq 0 \) is a prespecified constant. Essentially, ACN means that if \( S^{(1)} \) and \( S^{(0)} \) are the same, then \( Y^{(1)} \) and \( Y^{(0)} \) should also be the same; and ACS means that if \( S^{(1)} \) and \( S^{(0)} \) are sufficiently different, then \( Y^{(1)} \) and \( Y^{(0)} \) should also be different. Of course, the structure of the ACN and ACS conditions looks quite similar to the components of the \( \Delta_S \) quantity within the PTE framework and indeed, prior work has demonstrated the links between these two frameworks. For example, it has been shown that if ACN holds, then \( \Delta_S = 0 \). The concepts of ACN and ACS can be visualized via the Causal Effect Predictiveness (CEP) curve, which describes the relationship between \( S^{(1)} - S^{(0)} \) and \( Y^{(1)} - Y^{(0)} \). In reality, these quantities are not identifiable because we generally do not have both \( S^{(1)} \) and \( S^{(0)} \) (and \( Y^{(1)} \) and \( Y^{(0)} \)) for the same individual. However, there are some exceptions, such as the crossover trial design and certain vaccine settings where \( S \) measures immune response and there is no possibility of an immune response if given a placebo, i.e., \( S^{(0)} = 0 \) for all individuals, referred to as the constant-biomarker setting. To estimate the CEP curve, identifiability assumptions are typically needed, and in one way or another a parametric assumption is imposed. For example, one approach uses baseline covariates, \( W \), to predict the unobserved \( S \) and then specifies a generalized linear model describing the dependence of \( Y \) on \( S \) and \( Z \) with a parametric link function.
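Because the CEP curve is defined in terms of potential outcomes, it can be drawn directly only when those potential outcomes are known, as in a simulation. The sketch below is a purely illustrative simulation of the constant-biomarker setting described above (all data-generating choices are hypothetical): both potential outcomes are generated for each person and a smoothed CEP curve is plotted.

```r
# Illustrating the CEP curve via simulation: the potential outcomes are
# generated directly (never observable together in real data) under a
# hypothetical constant-biomarker setting where S^(0) = 0 for everyone.
set.seed(789)
n  <- 5000
W  <- rnorm(n)                      # baseline covariate driving the surrogate
S0 <- rep(0, n)                     # no surrogate response under control
S1 <- pmax(0, 1 + W + rnorm(n))     # surrogate response under treatment
Y0 <- 0.5 * W + rnorm(n)            # outcome under control
Y1 <- Y0 + 0.8 * S1                 # treatment effect on Y runs through S1

d    <- S1 - S0                     # causal effect on the surrogate
ce_y <- Y1 - Y0                     # causal effect on the outcome
cep  <- loess(ce_y ~ d)             # smooth estimate of the CEP curve

s_grid <- seq(min(d), max(d), length.out = 100)
plot(s_grid, predict(cep, newdata = data.frame(d = s_grid)), type = "l",
     xlab = "S(1) - S(0)", ylab = "E[ Y(1) - Y(0) | S(1) - S(0) ]",
     main = "Simulated CEP curve")
abline(h = 0, lty = 2)              # under ACN the curve passes near 0 when S(1) = S(0)
```

In this simulation the curve is near zero when \( S^{(1)} - S^{(0)} = 0 \) and increases as the surrogate effect grows, consistent with ACN and ACS; with real data, estimating this curve requires the identifiability and modeling assumptions described above.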
Meta-analytic Framework
The third framework is the meta-analytic framework, which is applicable when multiple studies are available. In this setting, the observed data consists of \( \{Y_{ij}, S_{ij}, Z_{ij}\} \) for individual \( j \) in trial/study \( i \), where \( i=1,\ldots,I \) and \( I \) is the number of trials. This framework, which considers random trial-level intercepts and subject-level correlations, generally relies on the following bivariate model specification:
$$Y_{ij} = \nu_{Yi} + \theta_i Z_{ij} + \epsilon_{Yij} $$
$$S_{ij} = \nu_{Si} + \gamma_i Z_{ij} + \epsilon_{Sij} $$
where \( \nu_{Yi} \) and \( \nu_{Si} \) are trial-specific intercepts, \( \theta_i \) and \( \gamma_i \) are trial-specific treatment effects, \( \{\nu_{Yi}, \nu_{Si}, \theta_i, \gamma_i\} \) is assumed to follow a multivariate normal distribution, and \( \epsilon_{Yij} \) and \( \epsilon_{Sij} \) are correlated error terms assumed to have a bivariate normal distribution. This assumed normality means that, with some algebra, one can derive the expected treatment effect on \( Y \) conditional on the treatment effect on \( S \), which is of course our ultimate goal when using a surrogate marker in a future trial. (Even if \( Y \) and \( S \) are not normally distributed, it may be possible to determine an appropriate transformation to make this assumption reasonable.) Surrogacy is typically quantified using the proportion of variance in the total effect explained by the trial-level random effects associated with the surrogate, denoted as \( R^2_{\text{trial}} \), which is a function of the elements of the random-effects covariance matrix. Larger values of \( R^2_{\text{trial}} \) reflect a stronger surrogate. An advantage of this approach is that it does not require assumptions about relationships between unobserved potential outcomes, which are needed in the PTE and principal stratification frameworks. Disadvantages of this approach include its dependence on the assumption of a normal distribution, which is often unrealistic in practice, and, more critically, the clear requirement for data from multiple trials.
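The full bivariate mixed model is usually fit with specialized software, but a commonly used simplification is a two-stage approach: first estimate trial-specific treatment effects on \( Y \) and on \( S \) with separate regressions within each trial, then regress the estimated effects on \( Y \) against the estimated effects on \( S \) and take the \( R^2 \) of that trial-level regression as an approximation to \( R^2_{\text{trial}} \). The sketch below implements this two-stage approximation on simulated multi-trial data; the data-generating model and all parameter values are hypothetical.

```r
# Two-stage approximation to R^2_trial (illustrative simulated multi-trial data;
# the full approach fits the bivariate mixed model directly).
set.seed(101)
n_trials <- 20
n_per    <- 200
trial_dat <- do.call(rbind, lapply(seq_len(n_trials), function(tr) {
  gamma_tr <- rnorm(1, mean = 1.5, sd = 0.5)               # trial-level effect on S
  theta_tr <- 0.2 + 0.9 * gamma_tr + rnorm(1, sd = 0.15)   # trial-level effect on Y
  Z <- rbinom(n_per, 1, 0.5)
  S <- 2 + gamma_tr * Z + rnorm(n_per)
  Y <- 1 + theta_tr * Z + 0.5 * S + rnorm(n_per)
  data.frame(trial = tr, Z = Z, S = S, Y = Y)
}))

# Stage 1: trial-specific treatment effect estimates on Y and on S
stage1 <- t(sapply(split(trial_dat, trial_dat$trial), function(d) {
  c(theta_hat = unname(coef(lm(Y ~ Z, data = d))["Z"]),
    gamma_hat = unname(coef(lm(S ~ Z, data = d))["Z"]))
}))

# Stage 2: trial-level regression of the effects on Y against the effects on S;
# its R^2 approximates R^2_trial
stage2 <- lm(theta_hat ~ gamma_hat, data = as.data.frame(stage1))
summary(stage2)$r.squared
```

This two-stage sketch ignores the estimation error in the stage-1 effects and the trial-level intercepts; the publicly available software referenced above implements more complete versions of this analysis.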