{"id":1330,"date":"2022-10-31T16:35:02","date_gmt":"2022-10-31T20:35:02","guid":{"rendered":"https:\/\/blog.uvm.edu\/tbplante\/?p=1330"},"modified":"2023-02-14T13:28:18","modified_gmt":"2023-02-14T18:28:18","slug":"making-subgroup-analysis-figure-in-stata","status":"publish","type":"post","link":"https:\/\/blog.uvm.edu\/tbplante\/2022\/10\/31\/making-subgroup-analysis-figure-in-stata\/","title":{"rendered":"Making a subgroup analysis figure in Stata"},"content":{"rendered":"\n<p>I was the analyst for the myPACE trial (<a rel=\"noreferrer noopener\" href=\"https:\/\/jamanetwork.com\/journals\/jamacardiology\/article-abstract\/2801001\" target=\"_blank\">published here in JAMA Cardiology<\/a>), and needed to put together a subgroup analysis figure. I didn&#8217;t find any helpful stock code, so I wrote my own. The code uses the Frames feature that was introduced in Stata 16. It will (a) make a new frame with the required variables but no data, (b) generate new dichotomous variables from continuous variables, (c) generate labels, (d) grab the Ns, point estimates (odds ratios here), and 95% CIs from a logistic regression, (e) grab the P-value for the interaction between the primary exposure and the subgroup variable*, (f) write the Ns, point estimates, 95% CIs, and P-value for interaction to the new frame, then (g) switch to the new frame and make the figure below. <em><strong>This script uses local macros, so it needs to be run all at once in a do-file, not line by line. <\/strong><\/em><\/p>\n\n\n\n<p>This uses a stock Stata dataset called &#8220;catheter&#8221;. The outcome of interest\/dependent variable is &#8220;infect&#8221;, the primary exposure\/independent variable of interest is &#8220;time&#8221;, and the subgroups are age, sex, and patient number. This uses logistic regression, but you can easily swap this model out for any other model. 
<\/p>\n\n\n\n<p>*You can get more complex code to format the P-values <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.uvm.edu\/tbplante\/2022\/10\/26\/formatting-p-values-for-stata-output\/\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2022\/11\/image.png\" alt=\"\" class=\"wp-image-1347\" width=\"331\" height=\"234\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2022\/11\/image.png 948w, https:\/\/blog.uvm.edu\/tbplante\/files\/2022\/11\/image-300x213.png 300w, https:\/\/blog.uvm.edu\/tbplante\/files\/2022\/11\/image-768x544.png 768w\" sizes=\"auto, (max-width: 331px) 100vw, 331px\" \/><\/figure>\n\n\n\n<p>Here&#8217;s the code!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>frame reset \/\/ drop all existing frames and data\nwebuse catheter, clear \/\/ load analytical dataset\nversion 16 \/\/ need Stata version 16 or newer\n*\n* Make an empty frame with the variables we'll add later row by row\n* The variables \"rowname\" and \"pval\" will be strings, so when\n* you add to these variables with the --frame post-- command,\n* you need to use quotes. 
\nframe create subgroup str30 rowname n beta low95 high95 str30 pval\n*\n*** Age\n* Need to generate a dichotomous age variable \n* You don't need to do this if the variable is already dichotomous, \n* ordinal, or nominal\ngenerate agesplit = (age&gt;=50) if !missing(age) \/\/ below 50 is 0, 50 and above is 1\n* \n* Now generate a label for the overall grouping\nlocal label_group \"{bf:Age}\" \n* \n* Group 0, below age 50\n* Generate label for this subgroup\nlocal label_subgroup_0 \"Under 50y\" \n* Now run the model for this subgroup\nlogistic infect time if agesplit==0 \n* Now save the N, beta, and 95% CI as local macros.\n* There's lots you can save after a regression, type --return list--, \n* --ereturn list--, and --matrix list r(table)-- to see what's there\nlocal n_0 = e(N) \nlocal beta_0 = r(table)&#091;1,1] \nlocal low95_0 = r(table)&#091;5,1]\nlocal high95_0 = r(table)&#091;6,1]\n* print above local macros to prove you collected them correctly: \ndi \"For `label_subgroup_0', n=\" `n_0' \", beta (95% CI)=\" %4.2f `beta_0' \" (\" %4.2f `low95_0' \" to \" %4.2f `high95_0' \")\" \n*\n* Group 1, at least 50 years old\nlocal label_subgroup_1 \"At least 50y\" \nlogistic infect time if agesplit==1 \nlocal n_1 = e(N)\nlocal beta_1 = r(table)&#091;1,1] \nlocal low95_1 = r(table)&#091;5,1]\nlocal high95_1 = r(table)&#091;6,1]\ndi \"For `label_group', subgroup `label_subgroup_1', n=\" `n_1' \", beta (95% CI)=\" %4.2f `beta_1' \" (\" %4.2f `low95_1' \" to \" %4.2f `high95_1' \")\" \n* \n* Now run the model with an interaction term between the primary exposure\n* and the subgroup. 
\nlogistic infect c.time##i.agesplit\n* Grab the p-value as a local macro saved as pval\nlocal pval = r(table)&#091;4,5] \n* Print that local macro to see that you've grabbed it correctly\ndi \"P-int = \" %4.3f `pval'\n* Format that P-value and save that as a new local macro called pvalue\nlocal pvalue \"P=`: display %3.2f `pval''\"\ndi \"`pvalue'\"\n* \n* Now write your local macros to the \"subgroup\" frame\n* Each \"frame post\" command will add 1 additional row to the frame.\n* We will graph these line by line.\n* First line with overall group name and P-value for interaction:\nframe post subgroup (\"`label_group'\") (.)  (.) (.) (.) (\"`pvalue'\") \n* Now each subgroup by itself:\nframe post subgroup (\"`label_subgroup_0'\") (`n_0')  (`beta_0') (`low95_0') (`high95_0') (\"\") \nframe post subgroup (\"`label_subgroup_1'\") (`n_1')  (`beta_1') (`low95_1') (`high95_1') (\"\") \n* Optional blank line:\nframe post subgroup (\"\") (.) (.) (.) (.) (\"\") \n*\n*** Female sex\n* This is already dichotomous, so don't need to create a new variable\n* like we did for age.\nlocal label_group \"{bf:Sex}\"  \n*group 0, males\nlocal label_subgroup_0 \"Males\" \nlogistic infect time if female==0 \nlocal n_0 = e(N) \nlocal beta_0 = r(table)&#091;1,1] \nlocal low95_0 = r(table)&#091;5,1]\nlocal high95_0 = r(table)&#091;6,1]\n*group 1, females\nlocal label_subgroup_1 \"Females\" \nlogistic infect time if female==1 \nlocal n_1 = e(N) \/\/ N\nlocal beta_1 = r(table)&#091;1,1] \nlocal low95_1 = r(table)&#091;5,1]\nlocal high95_1 = r(table)&#091;6,1]\n*interaction P-value\nlogistic infect c.time##i.female\nlocal pval = r(table)&#091;4,5]\nlocal pvalue \"P=`: display %3.2f `pval''\"\n*write to subgroup frame\nframe post subgroup (\"`label_group'\") (.)  (.) (.) (.) 
(\"`pvalue'\") \nframe post subgroup (\"`label_subgroup_0'\") (`n_0')  (`beta_0') (`low95_0') (`high95_0') (\"\") \nframe post subgroup (\"`label_subgroup_1'\") (`n_1')  (`beta_1') (`low95_1') (`high95_1') (\"\") \nframe post subgroup (\"\") (.) (.) (.) (.) (\"\") \n*\n*** patient\n* need to generate a patient dichotomous variable\ngenerate patientsplit = (patient&gt;=20) if !missing(patient) \/\/ below 20 is 0, 20 and above is 1\nlocal label_group \"{bf:Patient}\" \n*group 0, below 20\nlocal label_subgroup_0 \"Under 20th patient\" \nlogistic infect time if patientsplit==0 \nlocal n_0 = e(N) \nlocal beta_0 = r(table)&#091;1,1] \nlocal low95_0 = r(table)&#091;5,1]\nlocal high95_0 = r(table)&#091;6,1]\n*group 1, 20 and above\nlocal label_subgroup_1 \"At least the 20th patient\" \nlogistic infect time if patientsplit==1 \nlocal n_1 = e(N) \nlocal beta_1 = r(table)&#091;1,1] \nlocal low95_1 = r(table)&#091;5,1]\nlocal high95_1 = r(table)&#091;6,1]\n*interaction P-value (interact with patientsplit, the subgroup variable for this block)\nlogistic infect c.time##i.patientsplit\nlocal pval = r(table)&#091;4,5] \nlocal pvalue \"P=`: display %3.2f `pval''\"\n*write to subgroup frame\nframe post subgroup (\"`label_group'\") (.)  (.) (.) (.) (\"`pvalue'\") \nframe post subgroup (\"`label_subgroup_0'\") (`n_0')  (`beta_0') (`low95_0') (`high95_0') (\"\") \nframe post subgroup (\"`label_subgroup_1'\") (`n_1')  (`beta_1') (`low95_1') (`high95_1') (\"\") \nframe post subgroup (\"\") (.) (.) (.) (.) (\"\") \n*\n*** Now make the figure. 
You'll have to modify this so the number of rows \n*   in your subgroup frame matches the labels and whatnot\nset scheme s1mono \/\/ I like this scheme\n* Change frame to the subgroup frame\ncwf subgroup\n* Generate a row number by the current order of the data in this frame\ngen row=_n\n* Here's the code to make the figure\ntwoway \/\/\/\n(scatter row beta, msymbol(d) mcolor(black) msize(medium)) \/\/\/\n(rcap low95 high95 row, horizontal lcolor(black) lwidth(medlarge)) \/\/\/\n, \/\/\/\nlegend(off) \/\/\/\nxline(1, lcolor(red) lpattern(dash) lwidth(medium)) \/\/\/\ntitle(\"Title\") \/\/\/\nyti(\"Y Title\") \/\/\/\nxti(\"X Title\") \/\/\/\nyscale(reverse) \/\/\/\nyla( \/\/\/\n1 \"`=rowname&#091;1]'\" \/\/\/\n2 \"`=rowname&#091;2]', n=`=n&#091;2]'\" \/\/\/\n3 \"`=rowname&#091;3]', n=`=n&#091;3]'\" \/\/\/\n4 \" \" \/\/\/ blank since it's a blank row\n5 \"`=rowname&#091;5]'\" \/\/\/\n6 \"`=rowname&#091;6]', n=`=n&#091;6]'\" \/\/\/\n7 \"`=rowname&#091;7]', n=`=n&#091;7]'\" \/\/\/\n8 \" \" \/\/\/ blank since it's a blank row\n9 \"`=rowname&#091;9]'\" \/\/\/\n10 \"`=rowname&#091;10]', n=`=n&#091;10]'\" \/\/\/\n11 \"`=rowname&#091;11]', n=`=n&#091;11]'\" \/\/\/\n12 \" \" \/\/\/ blank since it's a blank row\n, angle(0) labsize(small) noticks) \/\/\/\nxla(0.8(.2)2.2) \/\/\/\ntext(1 1.1 \"`=pval&#091;1]'\", placement(e) size(small)) \/\/\/ these are the p-value labels\ntext(5 1.1 \"`=pval&#091;5]'\", placement(e) size(small)) \/\/\/\ntext(9 1.1 \"`=pval&#091;9]'\", placement(e) size(small)) \n*\n* Now export your figure as a PNG file\ngraph export \"myfigure.png\", replace width(1000)<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I was the analyst for the myPACE trial (published here in JAMA Cardiology), and needed to put together a subgroup analysis figure. I didn&#8217;t find any helpful stock code, so I wrote my own. The code uses the Frames feature that was introduced in Stata 16. 
It will (a) make a new frame with the &hellip; <a href=\"https:\/\/blog.uvm.edu\/tbplante\/2022\/10\/31\/making-subgroup-analysis-figure-in-stata\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Making a subgroup analysis figure in Stata<\/span><\/a><\/p>\n","protected":false},"author":4473,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[477491],"tags":[502556,703436,703437],"class_list":["post-1330","post","type-post","status-publish","format-standard","hentry","category-stata-code","tag-stata","tag-subgroup-analysis","tag-subgroup-figure"],"_links":{"self":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/1330","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/users\/4473"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/comments?post=1330"}],"version-history":[{"count":15,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/1330\/revisions"}],"predecessor-version":[{"id":1379,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/1330\/revisions\/1379"}],"wp:attachment":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/media?parent=1330"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/categories?post=1330"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/tags?post=1330"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}