Michael Kane and Brian Hobbs
Motivation: the "new" way clinical oncology trials are being conducted
The patient heterogeneity problem
Automated subtyping using latent space methods
Case study: predicting patient response by subtype
Case study: diagnosing mis-dosing based on adverse events
New compound is developed and is thought to deliver a (small) benefit over current therapies for a specific histology
A large number of patients are enrolled (at least hundreds)
The response rate of the treatment population is tested against a control group
Targeted therapy (NTRK gene rearrangement)
Very stringent inclusion/exclusion criteria
Effective for other histologies (including breast, colorectal, and neuroblastoma)
8/11 responders for lung cancer in initial study
"A drug that is intended to treat a serious condition AND preliminary clinical evidence indicates that the drug may demonstrate substantial improvement on a clinically significant endpoint(s) over available therapies"
Benefits:
Often means single arm
Smaller populations
May include multiple histologies
Still work within FDA regulation, often including "all-comers"
Biomarker | Tumor Type | Drug | N | ORR (%) | PFS (months) |
---|---|---|---|---|---|
BRAF V600 | NSCLC (>1 line) | Dabrafenib + Trametinib | 57 | 63 | 9.7 |
ALK fusions | NSCLC (prior crizotinib) | Brigatinib | 110 | 54 | 11.1 |
ALK fusions | NSCLC (prior crizotinib) | Alectinib | 225 | 46-48 | 8.1-8.9 |
EGFR T790M | NSCLC (prior TKI) | Osimertinib | 127 | 61 | 9.6 |
BRCA 1/2 | Ovarian (>2 prior) | Rucaparib | 106 | 54 | 12.8 |
MSI-H/MMR-D | Solid Tumor | Pembrolizumab | 149 | 40 | Not reached |
BRAF V600 | Erdheim Chester | Vemurafenib | 22 | 63 | Not reached |
Hobbs, Kane, Hong, and Landin. Statistical challenges posed by basket trials: sensitivity analysis of the vemurafenib study. Accepted to the Annals of Oncology.
> summary(lm(y ~ x1n + x2n - 1, ts1))
Call:
lm(formula = y ~ x1n + x2n - 1, data = ts1)
Residuals:
Min 1Q Median 3Q Max
-7.276 -2.695 0.260 2.341 7.358
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1n 1.1018 0.2266 4.863 4.03e-05 ***
x2n 1.2426 0.2398 5.181 1.69e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.647 on 28 degrees of freedom
Multiple R-squared: 0.6383, Adjusted R-squared: 0.6124
F-statistic: 24.7 on 2 and 28 DF, p-value: 6.572e-07
Consider $n$ training samples $\{x_1, x_2, \ldots, x_n\}$, each with $p$ features.
Given a response vector $\{y_1, y_2, \ldots, y_n\}$, find a function $h: X \to Y$ minimizing $\sum_{i=1}^{n} L(h(x_i), y_i)$ with respect to a loss function $L$.
Construct $h = f \circ g_y$ such that $g_y: X \to X'$ is a latent space projection of the original data, whose geometry is dictated by the response.
Note that f is not parameterized by the response.
Let $X \in \mathbb{R}^{n \times p}$ be a full-rank design matrix with $n > p$, and let $X = U \Sigma V^T$ be the singular value decomposition of $X$.
where $\Gamma$ is a diagonal matrix in $\mathbb{R}^{p \times p}$, $\mathbf{1}$ is a column of ones in $\mathbb{R}^{n}$, and $\varepsilon$ is composed of (sufficiently) i.i.d. samples from a random variable with mean zero and standard deviation $\sigma$.
Under the $\ell_2$ loss function we can find the optimal value of $\Gamma$ among the set of all weight matrices $\tilde{\Gamma}$ with
The matrix $\Gamma$ is $\mathrm{diag}(\beta)$, where $\beta = \Sigma^{-1} U^T Y$ is the vector of slope coefficient estimates of the corresponding linear model.
$X_Y = X V \tilde{\Gamma}$ represents the data in the latent space.
Each column whose corresponding slope coefficient is not zero contributes equally to the estimate of $Y$ in expectation.
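The construction above can be sketched in base R on the iris measurements. This is a minimal illustration, not the authors' implementation: it assumes the weight matrix is taken to be diag(β), centers the data so no intercept is needed, and uses illustrative names (`Z`, `XY`, `ols`) that do not come from the talk.

```r
# Sketch (assumption: weights are diag(beta)); base R only.
X <- scale(as.matrix(iris[, 2:4]), center = TRUE, scale = FALSE)  # n x p design matrix
Y <- iris$Sepal.Length - mean(iris$Sepal.Length)                  # centered response

s <- svd(X)                              # X = U diag(d) V^T
beta <- drop(crossprod(s$u, Y)) / s$d    # beta = Sigma^{-1} U^T Y

Z <- X %*% s$v                           # rotated data XV (equals U Sigma)
XY <- Z %*% diag(beta)                   # latent representation X V diag(beta)

# beta should match the least-squares slopes of Y regressed on XV
ols <- coef(lm(Y ~ Z - 1))
all.equal(unname(ols), beta)

# Row sums of the latent representation are the fitted values of that model
all.equal(unname(rowSums(XY)), unname(fitted(lm(Y ~ Z - 1))))
```

Because the row sums of `XY` reproduce the fitted values, each latent column carries one coefficient-weighted component of the prediction.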
If the distance metric is denoted by the matrix $A \in \mathbb{R}^{p \times p}$ and the distance between any two $1 \times p$ matrices $x$ and $y$ is expressed by
The squared Euclidean distance between two samples $i$ and $j$ in $X_Y$, denoted $X_Y(i)$ and $X_Y(j)$ respectively, is
Proof:
Let Z be a diagonal matrix of standard normals
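The link between distances in the latent space and a metric on the original data can be checked numerically. The sketch below rests on two assumptions made only for this illustration: that the metric is the quadratic form (x − y) A (x − y)ᵀ, and that A = V Γ² Vᵀ with Γ = diag(β). The names `A`, `d_latent`, and `d_quad` are hypothetical.

```r
# Sketch: squared Euclidean distance in the latent space compared with a
# quadratic form on the original (centered) data. A = V Gamma^2 V^T is an
# assumption of this example.
X <- scale(as.matrix(iris[, 2:4]), center = TRUE, scale = FALSE)
Y <- iris$Sepal.Length - mean(iris$Sepal.Length)

s <- svd(X)
beta <- drop(crossprod(s$u, Y)) / s$d
XY <- X %*% s$v %*% diag(beta)           # latent coordinates

A <- s$v %*% diag(beta^2) %*% t(s$v)     # assumed metric matrix

i <- 1
j <- 51
d_latent <- sum((XY[i, ] - XY[j, ])^2)   # squared distance in latent space
diff_ij <- X[i, ] - X[j, ]
d_quad <- drop(t(diff_ij) %*% A %*% diff_ij)  # quadratic form in original space

all.equal(d_latent, d_quad)
```

Under these assumptions the two quantities agree up to floating-point error, since the difference of two latent rows is the original difference multiplied by V diag(β).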
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
>
> fit <- lm(Sepal.Length ~ ., iris[,-5])
> mm <- model.matrix(Sepal.Length ~ ., iris[,-5])
>
> km <- kmeans(mm, centers = 3)
> table(km$cluster, iris$Species)
setosa versicolor virginica
1 21 1 0
2 29 2 0
3 0 47 50
>
> # ...
> table(subgroups$membership, iris$Species)
setosa versicolor virginica
1 50 0 0
2 0 23 0
3 0 27 20
4 0 0 30
> mm <- model.matrix(Sepal.Length ~ ., iris[,-5])
>
> km <- kmeans(mm, centers = 4)
> table(km$cluster, iris$Species)
setosa versicolor virginica
1 0 24 0
2 50 0 0
3 0 0 36
4 0 26 14
>
> # ...
> table(subgroups$membership, iris$Species)
setosa versicolor virginica
1 50 0 0
2 0 23 0
3 0 27 20
4 0 0 30
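The step elided by `# ...` is not shown in the transcripts, and the sketch below does not claim to reproduce it. As one hedged possibility, subgroups could be formed by clustering response-weighted latent coordinates instead of the raw model matrix. K-means with four centers is used purely for illustration, and `latent_groups` is a hypothetical name unrelated to `subgroups$membership`.

```r
# Sketch (assumption): k-means on latent coordinates X V diag(beta).
# This is not the authors' subgrouping procedure.
set.seed(1)                               # k-means starts are random

X <- scale(as.matrix(iris[, 2:4]), center = TRUE, scale = FALSE)
Y <- iris$Sepal.Length - mean(iris$Sepal.Length)

s <- svd(X)
beta <- drop(crossprod(s$u, Y)) / s$d     # slope estimates on the rotated data
XY <- X %*% s$v %*% diag(beta)            # response-weighted latent coordinates

km_latent <- kmeans(XY, centers = 4, nstart = 20)
latent_groups <- km_latent$cluster

# Cross-tabulate the illustrative groups against species
table(latent_groups, iris$Species)
```

Clustering `XY` instead of `mm` means directions with little bearing on the response are down-weighted before distances are computed.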
Clinical trial data is not low-dimensional
Sometimes the predictive information isn't in a linear subspace of the data
Received "accelerated approval"
Subtype response based on baseline characteristics
Variable | Description |
---|---|
AMD19FL | Exon 19 Del. Act. Mut. Flag |
AM858FL | L858R Activating Mut. Flag |
LIVERFL | Mets Disease Site Liver Flag |
DISSTAG | Disease Stage at entry |
NUMSITES | Num. of Mets Disease Sites |
PRTK | Number of Prior TKI |
PRTX | Number of Prior Therapies |
WTBL | Baseline Weight |
SEX | Sex |
1. Improve prediction accuracy:
2. Construct counterfactuals and create synthetically controlled trials.