{"id":424,"date":"2020-05-31T17:30:50","date_gmt":"2020-05-31T21:30:50","guid":{"rendered":"http:\/\/blog.uvm.edu\/tbplante\/?p=424"},"modified":"2024-12-11T10:52:52","modified_gmt":"2024-12-11T15:52:52","slug":"making-restricted-cubic-splines-in-stata","status":"publish","type":"post","link":"https:\/\/blog.uvm.edu\/tbplante\/2020\/05\/31\/making-restricted-cubic-splines-in-stata\/","title":{"rendered":"Making Restricted Cubic Splines in Stata"},"content":{"rendered":"\n<p>I love restricted cubic splines, made famous by Frank Harrell (see his approach starting on <a href=\"https:\/\/web.archive.org\/web\/20240327112002\/https:\/\/biostat.app.vumc.org\/wiki\/pub\/Main\/BioMod\/notes.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">page 58 here<\/a>). Dr. Harrell made a <a href=\"https:\/\/rdrr.io\/cran\/Hmisc\/man\/rcspline.plot.html\">package for automating these in R<\/a>. I&#8217;m not aware of an equivalent package for Stata. Here&#8217;s my approach to making this specific restricted cubic spline in Stata.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"551\" height=\"397\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/10\/image.png\" alt=\"\" class=\"wp-image-1508\" style=\"width:430px;height:310px\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/10\/image.png 551w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/10\/image-300x216.png 300w\" sizes=\"auto, (max-width: 551px) 100vw, 551px\" \/><\/figure>\n\n\n\n<p>The model here is modified Poisson regression using the <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/15033648\/\">Zou 2004 method<\/a> since the outcome is binary. Since it&#8217;s coded as a GLM, it&#8217;ll be relatively easy to swap out this one specific model for other models, like logistic regression using the appropriate link &amp; family.  It&#8217;s good habit to have the probability density of the outcome across the continuum of exposure, so that is plopped on the bottom here. It wouldn&#8217;t take much work to replace with a histogram.<\/p>\n\n\n\n<p>This is for two groups (group1=blue, group2=red). It wouldn&#8217;t be too hard to make this for just one group, deleting everything having to do with group 2 or having the number 2 in it. Or, duplicate lines for a group 3. Also, sometimes folks like to present a kernel density plot for each outcome, so you&#8217;d just duplicate the kdensity lines and add some code to specify that one of each is for folks with outcome==0 then outcome==1. <\/p>\n\n\n\n<p>One quirk is that the <strong>xbrcspline<\/strong> code depends on a list of all of the possible options for the exposure, which is brought in from <strong>listof <\/strong>to <strong>xbrcspline <\/strong>as a numlist. As you can see in <strong>&#8211;help limits<\/strong>&#8211;, numlists are capped at 2,500 numbers. If you get an error saying &#8220;values() invalid &#8212; invalid numlist has too many elements&#8221;, then you have too many individual options for your exposure. This might be because your exposure is an integer with a huge amount of digits after the decimal place, so your &gt;2,500 observations all come with their own unique value for the exposure. Rounding can help reduce this count, if it analytically and clinically makes sense. (This is okay for BMI, for example, because there isn&#8217;t clinically relevant difference between a bmi of 27.0400558235 and 27.04). To generate a rounded value, plop &#8212;<strong>gen bmi_round=round(bmi, 0.01<\/strong>)&#8211; in the first few lines, right after opening your data. Then use &#8220;bmi_round&#8221; as your exposure variable rather than &#8220;bmi&#8221;.<\/p>\n\n\n\n<p>You also should consider centering your exposure variables at the mean. To do this, use the <strong>&#8211;sum&#8211;<\/strong> command for each variable then <strong>&#8211;gen [variable&#8217;s name]_center = [variable&#8217;s name]\/r(mean)&#8211;<\/strong>. Then use the &#8220;[variable&#8217;s name]_center&#8221; as exposures variables. <\/p>\n\n\n\n<p>This code is a mostly rewritten version of one that is used at the Welch Center at Hopkins, where it has been handed down through generations of doctoral students and post-docs.  <\/p>\n\n\n\n<p>I&#8217;ve recently needed to make these figures using pweights, so I put some comments throughout to simplify that process. <\/p>\n\n\n\n<p><strong>Note<\/strong>: One glitch with WordPress (the blog that hosts this) is that it can&#8217;t render certain code below unless it&#8217;s the final line in a block of code, so I&#8217;ve had to chop it up this script, which should be a single block of code, into several blocks of code. If you copy and paste this into a do file and try to run it, you&#8217;ll notice some blank lines in the &#8220;Make a Figure&#8221; section that you&#8217;ll need to delete. <\/p>\n\n\n\n<p>Enjoy!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Code to make the figure above<\/h2>\n\n\n\n<p><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>*************************************************\n***************load data!************************\n****************and drop any open graphs*********\n*************************************************\nwebuse fvex, clear\ngraph drop _all\nmacro drop _all\n\/\/ if weighted, declare weighting here\n*************************************************\n**************define variables here**************\n*************************************************\n\/\/ Define the OUTCOME, = to 0 or 1, after the (first) word \"outcome\"\nglobal outcome outcome \/\/ this database's outcome is called outcome...\n\/\/ Define the EXPOSURE following the word \"exposure\"\nglobal exposure age \/\/ this has to be a continuous variable\n\/\/ Define the COVARIATES following word \"covariate\"\nglobal covariates sex distance\n\/\/ Define what makes GROUP 1 here after \"thing1\" (blue)\nglobal thing1 if arm==1 \n\/\/ Define what makes GROUP 2 here after \"thing2\" (red)\nglobal thing2 if arm==2\n\n*************************************************\n************auto-generate subgroups and**********\n*****************local macros needed*************\n********************for spline*******************\n*************************************************\n\/\/ need to make the exposure by group for the kernel density plots. \ngen exposuresubgroup1 = ${exposure} ${thing1}\ngen exposuresubgroup2 = ${exposure} ${thing2}\n\n\/\/ The xbrcspine command below needs the middle value of the exposure \n\/\/ for each group. This is often times the median, but if there are \n\/\/ an even number of values for the exposure, the median will be an \n\/\/ average of the two middle values of the exposure variable. The following \n\/\/ code defines the median as a local macro (median1temp) then grabs the\n\/\/ actual value for the exposure closest to that number (median_1). \n\/\/ _pctile can be used with pweight, if needed.\n\/\/\n_pctile exposuresubgroup1, p(50)\n\/\/ _pctile exposuresubgroup1 &#091;pweight=samplingweight], p(50)\nlocal median1temp = r(r1)\ngen mediandiff1=abs(exposuresubgroup1-`median1temp')\ngsort mediandiff1\nlocal median_1 = ${exposure}&#091;_n==1]\n\/\/\n_pctile exposuresubgroup2, p(50)\n\/\/ _pctile exposuresubgroup2 &#091;pweight=samplingweight], p(50)\nlocal median2temp = r(r1)\ngen mediandiff2=abs(exposuresubgroup2-`median2temp')\ngsort mediandiff2\nlocal median_2 = ${exposure}&#091;_n==1]\n\/\/\n\/\/ get rid of these variables, since they are not needed anymore\ndrop mediandiff1 mediandiff2\n\n*************************************************\n*************make splines and view knots!********\n*************************************************\n\/\/ NOTE: look at the stata output here, the knots will be displayed\n\/\/ you'll need to list these in the footer\n\/\/ \n\/\/ Harrell's method recommends using the # of knots ('k') as 3, 4 or 5\n\/\/ with n &gt;=100 as 5 and n&lt;30 as 3. It&#039;s just a guideline. \n\/\/ Details start on page 58 here: https:\/\/web.archive.org\/web\/20240327112002\/https:\/\/biostat.app.vumc.org\/wiki\/pub\/Main\/BioMod\/notes.pdf\n\/\/ \nmkspline ${exposure}_spline1=exposuresubgroup1, nknots(4) cubic displayknots\nmat knots1=r(knots)\nmkspline ${exposure}_spline2=exposuresubgroup2, nknots(4) cubic displayknots\nmat knots2=r(knots)\n\/\/ above won&#039;t work for pweight weighted data. instead, you need \n\/\/ to specifically define the percentiles from the table on pg 5 here:\n\/\/ https:\/\/www.stata.com\/manuals13\/rmkspline.pdf\n\/\/ But basically, \n\/\/ 3 knots, percentiles are at: 10 50 90\n\/\/ 4 knots, percentiles are at: 5 35 65 95\n\/\/ 5 knots, percentiles are at: 5 27.5 50 72.5 95 &lt;- if using this\n\/\/             and you consider the median to be another knot, then \n\/\/             need to specify the 4 knots that aren&#039;t 50th percentile\n\/\/ 6 knots, percentiles are at: 5 23 41 59 77 95\n\/\/ 7 knots, percentiles are at: 2.5 18.33 34.17 50 65.83 81.67 97.5\n\/\/ the code for 4 knots follows in hidden code:\n\/\/\n\/\/_pctile exposuresubgroup1 &#091;pweight=samplingweight], p(5 27.5 72.5 95) \n\/\/return list\n\/\/local gr1knot1 = r(r1)\n\/\/local gr1knot2 = r(r2)\n\/\/local gr1knot3 = r(r3)\n\/\/local gr1knot4 = r(r4)\n\/\/di &quot;Knots for group 1 are at &quot; `gr1knot1&#039; &quot;, &quot; `gr1knot2&#039; &quot;, &quot; \/\/\/\n\/\/`gr1knot3&#039; &quot;, &quot; `gr1knot4&#039; &quot;.&quot;\n\/\/\n\/\/\n\/\/mkspline ${exposure}_spline1=exposuresubgroup1,  \/\/\/\n\/\/knots(`gr1knot1&#039; `gr1knot2&#039; `gr1knot3&#039; `gr1knot4&#039; ) \/\/\/\n\/\/cubic displayknots\n\/\/\n\/\/mat knots1=r(knots)\n\/\/\n\/\/_pctile exposuresubgroup2 &#091;pweight=samplingweight], p(5 27.5 72.5 95) \n\/\/return list\n\/\/local gr2knot1 = r(r1)\n\/\/local gr2knot2 = r(r2)\n\/\/local gr2knot3 = r(r3)\n\/\/local gr2knot4 = r(r4)\n\/\/\n\/\/di &quot;Knots for group 2 are at &quot; `gr2knot1&#039; &quot;, &quot; `gr2knot2&#039; &quot;, &quot; \/\/\/\n\/\/ `gr2knot3&#039; &quot;, &quot; `gr2knot4&#039; &quot;.&quot;\n\/\/\n\/\/\n\/\/mkspline ${exposure}_spline2=exposuresubgroup2,  \/\/\/\n\/\/knots(`gr2knot1&#039; `gr2knot2&#039; `gr2knot3&#039; `gr2knot4&#039;) \/\/\/\n\/\/cubic displayknots\n\/\/\n\/\/mat knots2=r(knots)\n*************************************************\n********Generate models to use in splines********\n*************************************************\n\/\/ model time!\n\/\/ the models here are modified poisson regressions using sandwich variance estimators\n\/\/ which is the Zou method from this classic paper:\n\/\/ https:\/\/pubmed.ncbi.nlm.nih.gov\/15033648\/\n\/\/ GLMs can be used with pweight, if needed.\n\/\/\n\/\/ First group \nglm ${outcome} c.${exposure}_spline1* ${covariates}  ${thing1},  \/\/\/\nfam(poisson) link(log) eform robust\n\/\/ robust is required for modified poisson regression by zou&#039;s method \n\/\/ (i.e., modified poisson regression with sandwich variance estimators)\n\/\/ for pweight or survey data, use:\n\/\/ svy: glm ${outcome} c.${exposure}_spline1* ${covariates}  ${thing1},  \/\/\/\n\/\/ fam(poisson) link(log) eform\nlevelsof(${exposure})  ${thing1}  \/\/ generates r(levels) for next line\nxbrcspline ${exposure}_spline1, values(`r(levels)&#039;) \/\/\/\nref(`median_1&#039;) matknots(knots1) \/\/\/ \neform gen(lpnt1 hr1 lb1 ub1)\n\n\/\/ Second group\nglm ${outcome} c.${exposure}_spline2* ${covariates} ${thing2}, \/\/\/\nfam(poisson) link(log) eform robust\n\/\/ for pweight or survey data, use:\n\/\/ svy: glm ${outcome} c.${exposure}_spline2* ${covariates}  ${thing2},  \/\/\/\n\/\/ fam(poisson) link(log) eform\n\/\/\nlevelsof(${exposure}) ${thing2} \/\/ generates r(levels) for next line\nxbrcspline ${exposure}_spline2, values(`r(levels)&#039;) \/\/\/\nref(`median_2&#039;) matknots(knots2) \/\/\/ \neform gen(lpnt2 hr2 lb2 ub2)\n\n*************************************************\n***********generate extremes to drop*************\n*************************************************\n\/\/ should drop the extremes, here the 0.5 and 99.5th percentiles\n\/\/ but need to define these as local macros.\n\/\/ _pctile can be used with pweight, if needed.\n_pctile exposuresubgroup1, p(0.5 99.5)\nreturn list\nlocal cut_a1 = r(r1)\nlocal cut_b1 = r(r2)\n\n_pctile exposuresubgroup2, p(0.5 99.5)\nreturn list\nlocal cut_a2 = r(r1)\nlocal cut_b2 = r(r2)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>*************************************************\n***************make the figure*******************\n*************************************************\nset scheme s1mono \/\/ my favorite scheme\ntwoway \/\/\/\n\/\/\/ spline for one group\n(line  hr1 lpnt1 if lpnt1 &gt; `cut_a1' &amp; \/\/\/\nlpnt1 &lt; `cut_b1&#039; , yaxis(1) lp(solid) lc(blue) lwidth(medthick) ) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>(rarea lb1 ub1 lpnt1 if lpnt1 &gt; `cut_a1' &amp; \/\/\/\nlpnt1 &lt; `cut_b1&#039; , yaxis(1) color(blue%5)) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>(line  lb1 lpnt1 if lpnt1 &gt; `cut_a1' &amp; \/\/\/\nlpnt1 &lt; `cut_b1&#039;, yaxis(1)  lp(dash) lc(blue) lwidth(thin) ) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>(line  ub1 lpnt1 if lpnt1 &gt; `cut_a1' &amp; \/\/\/\nlpnt1 &lt; `cut_b1&#039; , yaxis(1) lp(dash) lc(blue) lwidth(thin) ) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/\/ spline for other group\n(line  hr2 lpnt2 if lpnt2 &gt; `cut_a2' &amp; \/\/\/\nlpnt2 &lt; `cut_b2&#039; , yaxis(1) lp(solid) lc(red) lwidth(medthick) ) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>(rarea lb2 ub2 lpnt2 if lpnt2 &gt; `cut_a2' &amp; \/\/\/\nlpnt2 &lt; `cut_b2&#039; , yaxis(1) color(red%5)) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>(line  lb2 lpnt2 if lpnt2 &gt; `cut_a2' &amp; \/\/\/\nlpnt2 &lt; `cut_b2&#039; , yaxis(1)  lp(dash) lc(red) lwidth(thin) ) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>(line  ub2 lpnt2 if lpnt2 &gt; `cut_a2' &amp; \/\/\/\nlpnt2 &lt; `cut_b2&#039; , yaxis(1) lp(dash) lc(red) lwidth(thin) ) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/\/ kernel density for one group\n\/\/\/ note: can alternatively install kdens and moremata to \n\/\/\/ use pweighting in kernel density.\n(kdensity exposuresubgroup1 if exposuresubgroup1 &gt; `cut_a1' &amp; \/\/\/\nexposuresubgroup1 &lt; `cut_b1&#039; , yaxis(2) lp(shortdash) lcolor(blue)) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/\/ kernel density for other group\n(kdensity exposuresubgroup2 if exposuresubgroup2 &gt; `cut_a2' &amp; \/\/\/\nexposuresubgroup2 &lt; `cut_b2&#039; , yaxis(2) lp(shortdash) lcolor(red)) \/\/\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>, \/\/\/\nyline(1, lpattern(solid) lcolor(black) axis(1)) \/\/\/\n\/\/\/ labels for the left axis, hide the following line if you aren't sure \n\/\/\/ of what the range should be:\nylabel(0.25 \"0.25\" 0.5 \"0.5\" 1 \"1\" 2.5 \"2.5\" 5 \"5\", axis(1) angle(0)) \/\/\/\n\/\/\/ range for the left axis. the BOTTOM of this range has to be WAAAY lower \n\/\/\/ than the bottom ylabel in the line above in order for it to sit \n\/\/\/ on the way top of the figure. \nyscale(r(0.001 2) log axis(1)) \/\/\/ log scale here\n\/\/\/ ditto for the right axis, but the TOP of the range on the yscale needs \n\/\/\/ to be WAAAY higher than the top ylabel\nylabel(0 \"0\" 0.01 \"0.01\" 0.02 \"0.02\" 0.03 \"0.03\" 0.04 \"0.04\" 0.05 \"0.05\", \/\/\/\naxis(2) labsize(vsmall) angle(0)) \/\/\/\nyscale(r(0.0 .2) axis(2)) \/\/\/ not log scale here\n\/\/\/ you'll notice that the x label doesn't span the entire \n\/\/\/ range of the exposure because the local macro cuts above. \n\/\/\/xlabel(20(5)60) in case you want to specify the x axis\n\/\/\/ for the titles, need to put a bunch of spaces so things align\nytitle(\"                {bf:Risk Ratio (95% CI)}\", axis(1)) \/\/\/\nytitle(\"{bf:Probability Density}                                          \", \/\/\/\n justification(left) axis(2)) \/\/\/\nxtitle(\"{bf:Exposure Here}\") \/\/\/\ntitle(\"{bf:The Title}\") \/\/\/\nlegend(order(1 \"Group 1\" 5 \"Group 2\")) \/\/\/\nname(mygraph1)\n\t\t\t\n*************************************************\n*****************Save figure!********************\n*************************************************\n\/\/ save as png, change to tif if needed for submission\t\t\t\ngraph export \"mygraph1.png\", replace width(1000)<\/code><\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I love restricted cubic splines, made famous by Frank Harrell (see his approach starting on page 58 here). Dr. Harrell made a package for automating these in R. I&#8217;m not aware of an equivalent package for Stata. Here&#8217;s my approach to making this specific restricted cubic spline in Stata. The model here is modified Poisson &hellip; <a href=\"https:\/\/blog.uvm.edu\/tbplante\/2020\/05\/31\/making-restricted-cubic-splines-in-stata\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Making Restricted Cubic Splines in Stata<\/span><\/a><\/p>\n","protected":false},"author":4473,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[477491],"tags":[],"class_list":["post-424","post","type-post","status-publish","format-standard","hentry","category-stata-code"],"_links":{"self":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/users\/4473"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/comments?post=424"}],"version-history":[{"count":84,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/424\/revisions"}],"predecessor-version":[{"id":1998,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/424\/revisions\/1998"}],"wp:attachment":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/media?parent=424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/categories?post=424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/tags?post=424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}