I love restricted cubic splines, made famous by Frank Harrell (see his approach starting on page 58 here). Dr. Harrell made a package for automating these in R. I’m not aware of an equivalent package for stata. Here’s my approach to making this specific restricted cubic spline in Stata.

The model here is modified Poisson regression using the Zou 2004 method since the outcome is binary. Since it’s coded as a GLM, it’ll be relatively easy to swap out this one specific model for other models, like logistic regression using the appropriate link & family.

It’s good habit to have the probability density of the outcome across the continuum of exposure, so that is plopped on the bottom here.

This code is an ~99% rewritten version of one that is used at the Welch Center at Hopkins, where it has been handed down through generations of doctoral students and post-docs. The main piece that remains from the original version at the Welch Center is the scalar cut. There’s a weird bug in Stata where you can’t embed a less than or equal to sign when calling a scalar cut, so you have to hide the equals sign within the opening and closing tics. Weird.

Enjoy!

## Code to make the figure above

```
webuse fvex, clear
capture drop scalars _all
// Define the OUTCOME after the (first) word "outcome"
global outcome outcome // this database's outcome is called outcome...
// Define the EXPOSURE following the word "exposure"
global exposure age // this has to be a continuous variable
// Define the COVARIATES following word "covariate"
global covariates sex distance
// Define what makes GROUP 1 here after "thing1"
global thing1 if arm==1
// Define what makes GROUP 2 here after "thing2"
global thing2 if arm==2
// need to make the exposure by group for the kernel density plots.
gen exposuresubgroup1 = ${exposure} ${thing1}
gen exposuresubgroup2 = ${exposure} ${thing2}
// need to pluck the median for each group for the xbrcspline below
sum ${exposure} ${thing1}, d
local median1 = r(p50)
sum ${exposure} ${thing2}, d
local median2 = r(p50)
// Make the splines for each group
//
// NOTE: look at the stata output here, the knots will be displayed
// you'll need to list these in the footer
//
// Harrell's method recommends using the # of knots ('k') as 3, 4 or 5
// with n >=100 as 5 and n<30 as 3. It's just a guidelines.
// Details are on page 62 here:
// http://biostat.mc.vanderbilt.edu/wiki/pub/Main/BioMod/notes.pdf
mkspline ${exposure}_spline1=${exposure} ${thing1}, nknots(4) cubic displayknots
mat knots1=r(knots)
mkspline ${exposure}_spline2=${exposure} ${thing2}, nknots(4) cubic displayknots
mat knots2=r(knots)
// model time!
// the models here are modified poisson regressions using sandwich variance estimators
// which is the Zou method from this classic paper:
// https://pubmed.ncbi.nlm.nih.gov/15033648/
//
// First group
glm ${outcome} c.${exposure}_spline1* ${covariates} ${thing1}, ///
fam(poisson) link(log) nolog robust
// robust is required for modified poisson regression by zou's method
// (i.e., modified poisson regression with sandwich variance estimators)
levelsof(${exposure}) ${thing1}
xbrcspline ${exposure}_spline1, values(`r(levels)') ///
ref(`median1') matknots(knots1) ///
eform gen(lpnt1 hr1 lb1 ub1)
// Second group
glm ${outcome} c.${exposure}_spline2* ${covariates} ${thing2}, ///
fam(poisson) link(log) nolog robust
///
levelsof(${exposure}) ${thing2}
xbrcspline ${exposure}_spline2, values(`r(levels)') ///
ref(`median2') matknots(knots2) ///
eform gen(lpnt2 hr2 lb2 ub2)
// should drop the extremes, here the 0.5 and 99.5th percentiles
// but need to define these as scalars
centile ${exposure} ${thing1}, centile(0.5 99.5)
sca cut_a1 = r(c_1)
sca cut_b1 = r(c_2)
centile ${exposure} ${thing2}, centile(0.5 99.5)
sca cut_a2 = r(c_1)
sca cut_b2 = r(c_2)
```

```
/// make the figure
set scheme s1mono // my favorite scheme
twoway ///
/// spline for one group
(line hr1 lpnt1 if lpnt1 >`=scalar(cut_a1)' & ///
lpnt1 < `=scalar(cut_b1)' , yaxis(1) lp(solid) lc(blue) lwidth(medthick) ) ///
```

```
(rarea lb1 ub1 lpnt1 if lpnt1 >`=scalar(cut_a1)' & ///
lpnt1 < `=scalar(cut_b1)' , yaxis(1) color(blue%5)) ///
```

```
(line lb1 lpnt1 if lpnt1 >`=scalar(cut_a1)' & ///
lpnt1 < `=scalar(cut_b1)' , yaxis(1) lp(dash) lc(blue) lwidth(thin) ) ///
```

```
(line ub1 lpnt1 if lpnt1 >`=scalar(cut_a1)' & ///
lpnt1 <`=scalar(cut_b1)' , yaxis(1) lp(dash) lc(blue) lwidth(thin) ) ///
```

```
/// spline for other group
(line hr2 lpnt2 if lpnt2 >`=scalar(cut_a2)' & ///
lpnt1 <`=scalar(cut_b2)' , yaxis(1) lp(solid) lc(red) lwidth(medthick) ) ///
```

```
(rarea lb2 ub2 lpnt2 if lpnt2 >`=scalar(cut_a2)' & ///
lpnt1 <`=scalar(cut_b2)' , yaxis(1) color(red%5)) ///
```

```
(line lb2 lpnt2 if lpnt2 >`=scalar(cut_a2)' & ///
lpnt1 <`=scalar(cut_b2)' , yaxis(1) lp(dash) lc(red) lwidth(thin) ) ///
```

```
(line ub2 lpnt2 if lpnt2 >`=scalar(cut_a2)' & ///
lpnt1 <`=scalar(cut_b2)' , yaxis(1) lp(dash) lc(red) lwidth(thin) ) ///
```

```
(line ub2 lpnt2 if lpnt2 >`=scalar(cut_a2)' & ///
lpnt1 <`=scalar(cut_b2)' , yaxis(1) lp(dash) lc(red) lwidth(thin) ) ///
```

```
/// kernel density for one group
(kdensity exposuresubgroup1 if exposuresubgroup1 >`=scalar(cut_a1)' & ///
exposuresubgroup1 <`=scalar(cut_b1)' , yaxis(2) lp(shortdash) lcolor(blue)) ///
```

```
/// kernel density for other group
(kdensity exposuresubgroup2 if exposuresubgroup2 >`=scalar(cut_a2)' & ///
exposuresubgroup2 <`=scalar(cut_b2)' , yaxis(2) lp(shortdash) lcolor(red)) ///
```

```
, ///
yline(1, lpattern(solid) lcolor(black) axis(1)) ///
/// labels for the left axis, hide the following line if you aren't sure
/// of what the range should be:
ylabel(0.25 "0.25" 0.5 "0.5" 1 "1" 2.5 "2.5" 5 "5", axis(1) angle(0)) ///
/// range for the left axis. the BOTTOM of this range has to be WAAAY lower
/// than the bottom ylabel in the line above in order for it to sit
/// on the way top of the figure.
yscale(r(0.001 2) log axis(1)) /// log scale here
/// ditto for the right axis, but the TOP of the range on the yscale needs
/// to be WAAAY higher than the top ylabel
ylabel(0 "0" 0.01 "0.01" 0.02 "0.02" 0.03 "0.03" 0.04 "0.04" 0.05 "0.05", ///
axis(2) labsize(vsmall) angle(0)) ///
yscale(r(0.0 .2) axis(2)) /// not log scale here
/// you'll notice that the x label doesn't span the entire
/// range of the exposure because the scalar cuts above.
xlabel(20(5)60) ///
/// for the titles, need to put a bunch of spaces so things align
ytitle(" {bf:Risk Ratio (95% CI)}", axis(1)) ///
ytitle("{bf:Probability Density} ", ///
justification(left) axis(2)) ///
xtitle("{bf:Exposure Here}") ///
title("{bf:The Title}") ///
legend(off)
// save as png, change to tif if needed for submission
graph export "mygraph1.png", replace width(1000)
```