{"id":951,"date":"2021-11-08T12:35:35","date_gmt":"2021-11-08T17:35:35","guid":{"rendered":"https:\/\/blog.uvm.edu\/tbplante\/?p=951"},"modified":"2025-12-17T12:43:20","modified_gmt":"2025-12-17T17:43:20","slug":"mediation-analysis-in-stata-using-iorw-inverse-odds-ratio-weighted","status":"publish","type":"post","link":"https:\/\/blog.uvm.edu\/tbplante\/2021\/11\/08\/mediation-analysis-in-stata-using-iorw-inverse-odds-ratio-weighted\/","title":{"rendered":"Mediation analysis in Stata using IORW (inverse odds ratio-weighted mediation)"},"content":{"rendered":"\n<p>Mediation is a commonly-used tool in epidemiology. Inverse odds ratio-weighted (IORW) mediation was described in <a rel=\"noreferrer noopener\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3954805\/\" data-type=\"URL\" data-id=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3954805\/\" target=\"_blank\">2013 by Eric J. Tchetgen Tchetgen in this publication<\/a>. It&#8217;s a robust mediation technique that can be used in many sorts of analyses, including logistic regression, modified Poisson regression, etc. It is also considered valid if there is an observed exposure*mediator interaction on the outcome. <\/p>\n\n\n\n<p>There have been a handful of publications that describe the implementation of IORW (and its cousin inverse odds weighting, or IOW) in Stata, including this <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/25693776\/\" target=\"_blank\" rel=\"noreferrer noopener\">2015 AJE paper by Quynh C Nguyen<\/a> and this <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/31209086\/\" target=\"_blank\" rel=\"noreferrer noopener\">2019 BMJ open paper by Muhammad Zakir Hossin<\/a> (see the supplements of each for actual code). I recently had to implement this in a REGARDS project using a binary mediation variable and wanted to post my code here to simplify things. <strong><em>The code here is designed for binary exposures and not continuous exposures.<\/em><\/strong> Check out the Nguyen paper above if you need to modify the following code to run IOW instead of IOWR, or if you are using a continuous mediation variable, rather than a binary one. <\/p>\n\n\n\n<p>A huge thank you to Charlie Nicoli MD, Leann Long PhD, and Boyi Guo (almost) PhD who helped clarify implementation pieces. Also, Charlie wrote about 90% of this code so it&#8217;s really his work. I mostly cleaned it up, and clarified the approach as integrated in the examples from Drs. Nguyen and Hossin from the papers above. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">IORW using pweight data (see below for unweighted version)<\/h2>\n\n\n\n<p>The particular analysis I was running uses pweighting. This code won&#8217;t work in data that doesn&#8217;t use weighting. This uses modified Poisson regression implemented as GLMs. These can be swapped out for other models as needed. You will have to modify this script if you are using 1. a continuous exposure, 2. more than 1 mediator, 3. a different weighting scheme, or 4. IOW instead of IORW. See the supplement in the above Nguyen paper on how to modify this code for those changes. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>*************************************************\n\/\/ this HEADER is all you should have to change\n\/\/ to get this to run as weighted data with binary\n\/\/ exposure using IORW. (Although you'll probably \n\/\/ have to adjust the svyset commands in the \n\/\/ program below to work in your dataset, in all \n\/\/ actuality)\n*************************************************\n\/\/ BEGIN HEADER\n\/\/\n\/\/ Directory of your dataset. You might need to\n\/\/ include the entire file location (eg \"c:\/user\/ ...\")\n\/\/ My file is located in my working directory so I just list\n\/\/ a name here. Alternatively, can put URL for public datasets. \nglobal file \"myfile.dta\"\n\/\/\n\/\/ Pick a title for the table that will output at the end.\n\/\/ This is just to help you stay organized if you are running\n\/\/ a few of these in a row. \nglobal title \"my cool analysis, model 1\"\n\/\/\n\/\/ Components of the regression model. Outcome is binary,\n\/\/ the exposure (sometimes called \"dependent variable\" or \n\/\/ \"treatment\") is also binary. This code would need to be modified \n\/\/ for a continuous exposure variable. See details in Step 2.\nglobal outcome mi_incident\nglobal exposure smoking\nglobal covariates age sex\nglobal mediator c.biomarker\nglobal ifstatement if mi_prevalent==0 &amp; biomarker&lt;. &amp; mi_incident &lt;.\n\/\/\n\/\/ Components of weighting to go into &quot;svyset&quot; command. \n\/\/ You might have \nglobal samplingweight mysamplingweightvar\nglobal id id_num \/\/ ID for your participants\nglobal strata mystratavar\n\/\/\n\/\/ Now pick number of bootstraps. Aim for 1000 when you are actually \n\/\/ running this, but when debugging, start with 50.\nglobal bootcount 50\n\/\/ and set a seed. \nglobal seed 8675309\n\/\/ END HEADER\n*************************************************\n\/\/\n\/\/\n\/\/\n\/\/ Load your dataset. \nuse &quot;${file}&quot;, clear\n\/\/\n\/\/ Drop then make a program.\ncapture program drop iorw_weighted\nprogram iorw_weighted, rclass\n\/\/ drop all variables that will be generated below. \ncapture drop predprob \ncapture drop inverseoddsratio \ncapture drop weight_iorw\n\/\/\n*Step 1: svyset data since your dataset is weighted. If your dataset \n\/\/ does NOT require weighting for its analysis, do not use this program. \nsvyset ${id}, strata(${strata}) weight(${samplingweight}) vce(linearized) singleunit(certainty)\n\/\/\n*Step 2: Run the exposure model, which regresses the exposure on the\n\/\/ mediator &amp; covariates. In this example, the exposure is binary so we are \n\/\/ using logistic regression (logit). If the exposure is a normally-distributed \n\/\/ continuous variable, use linear regression instead. \nsvy linearized, subpop(${ifstatement}): logit ${exposure} ${mediator} ${covariates}\n\/\/\n\/\/ Now grab the beta coefficient for mediator. We&#039;ll need that to generate\n\/\/ the IORW weights. \nscalar med_beta=e(b)&#091;1,1]\n\/\/\n*Step 3: Generate predicted probability and create inverse odds ratio and its \n\/\/ weight.\npredict predprob, p\ngen inverseoddsratio = 1\/(exp(med_beta*${mediator}))\n\/\/ \n\/\/ Calculate inverse odds ratio weights.  Since our dataset uses sampling\n\/\/ weights, we need to multiply the dataset&#039;s weights times the IORW for the \n\/\/ exposure group. This step is fundamentally different for non-weighted \n\/\/ datasets. \n\/\/ Also note that this is for binary exposures, need to revise\n\/\/ this code for continuous exposures. \ngen weight_iorw = ${samplingweight} if ${exposure}==0\nreplace weight_iorw = inverseoddsratio*${samplingweight} if ${exposure}==1\n\/\/\n*Step 4: TOTAL EFFECTS (ie no mediator) without IORW weighting yet. \n\/\/ (same as direct effect, but without the IORW)\nsvyset ${id}, strata(${strata}) weight(${samplingweight}) vce(linearized) singleunit(certainty)\nsvy linearized, subpop(${ifstatement}): glm ${outcome} ${exposure} ${covariates}, family(poisson) link(log) \nmatrix bb_total= e(b)\nscalar b_total=bb_total&#091;1,1] \nreturn scalar b_total=bb_total&#091;1,1]\n\/\/\n*Step 5: DIRECT EFFECTS; using IORW weights instead of the weighting that\n\/\/ is used typically in your analysis. \nsvyset ${id}, strata(${strata}) weight(weight_iorw) vce(linearized) singleunit(certainty)\nsvy linearized, subpop(${ifstatement}): glm ${outcome} ${exposure} ${covariates}, family(poisson) link(log)\nmatrix bb_direct=e(b)\nscalar b_direct=bb_direct&#091;1,1]\nreturn scalar b_direct=bb_direct&#091;1,1]\n\/\/\n*Step 6: INDIRECT EFFECTS\n\/\/ indirect effect = total effect - direct effects\nscalar b_indirect=b_total-b_direct\nreturn scalar b_indirect=b_total-b_direct\n\/\/scalar expb_indirect=exp(scalar(b_indirect))\n\/\/return scalar expb_indirect=exp(scalar(b_indirect))\n\/\/\n*Step 7: calculate % mediation\nscalar define percmed = ((b_total-b_direct)\/b_total)*100\n\/\/ since indirect is total minus direct, above is the same as saying:\n\/\/ scalar define percmed = (b_indirect)\/(b_total)*100\nreturn scalar percmed = ((b_total-b_direct)\/b_total)*100\n\/\/\n\/\/ now end the program.\nend\n\/\/\n*Step 8: Now run the above bootstraping program\nbootstrap r(b_total) r(b_direct) r(b_indirect) r(percmed), seed(${seed}) reps(${bootcount}): iorw_weighted\nmatrix rtablebeta=r(table) \/\/ this is the beta coefficients\nmatrix rtableci=e(ci_percentile) \/\/ this is the 95% confidence intervals using the &quot;percentiles&quot; (ie 2.5th and 97.5th percentiles) rather than the default normal distribution method in stata. The rational for using percentiles rather than normal distribution is found in the 4th paragraph of the &quot;analyses&quot; section here (by refs 37 &amp; 38): https:\/\/bmjopen.bmj.com\/content\/9\/6\/e026258.long\n\/\/\n\/\/ Just in case you are curious, here are the the ranges of the 95% CI, \n\/\/ realize that _bs_1 through _bs_3 need to be exponentiated:\nestat bootstrap, all \/\/ percentiles is &quot;(P)&quot;, normal is &quot;(N)&quot;\n\/\/\n\/\/ Here&#039;s a script that will display your beta coefficients in \n\/\/ a clean manner, exponentiating them when required. \nquietly {\nnoisily di &quot;${title} (bootstrap count=&quot; e(N_reps) &quot;)*&quot;\nnoisily di _col(15) &quot;Beta&quot; _col(25) &quot;95% low&quot; _col(35) &quot;95% high&quot; _col(50) &quot;Together&quot;\nlocal beta1 = exp(rtablebeta&#091;1,1])\nlocal low951 = exp(rtableci&#091;1,1])\nlocal high951 = exp(rtableci&#091;2,1])\nnoisily di &quot;Total&quot; _col(15) %4.2f `beta1&#039; _col(25) %4.2f `low951&#039; _col(35) %4.2f `high951&#039; _col(50) %4.2f `beta1&#039; &quot; (&quot; %4.2f `low951&#039; &quot;, &quot; %4.2f `high951&#039; &quot;)&quot;\nlocal beta2 = exp(rtablebeta&#091;1,2])\nlocal low952 = exp(rtableci&#091;1,2])\nlocal high952 = exp(rtableci&#091;2,2])\nnoisily di &quot;Direct&quot; _col(15) %4.2f `beta2&#039; _col(25) %4.2f `low952&#039; _col(35) %4.2f `high952&#039; _col(50) %4.2f `beta2&#039; &quot; (&quot; %4.2f `low952&#039; &quot;, &quot; %4.2f `high952&#039; &quot;)&quot;\nlocal beta3 = exp(rtablebeta&#091;1,3])\nlocal low953 = exp(rtableci&#091;1,3])\nlocal high953 = exp(rtableci&#091;2,3])\nnoisily di &quot;Indirect&quot; _col(15) %4.2f `beta3&#039; _col(25) %4.2f `low953&#039; _col(35) %4.2f `high953&#039; _col(50) %4.2f `beta3&#039; &quot; (&quot; %4.2f `low953&#039; &quot;, &quot; %4.2f `high953&#039; &quot;)&quot;\nlocal beta4 = (rtablebeta&#091;1,4])\nlocal low954 = (rtableci&#091;1,4])\nlocal high954 = (rtableci&#091;2,4])\nnoisily di &quot;% mediation&quot; _col(15) %4.2f `beta4&#039; &quot;%&quot; _col(25) %4.2f `low954&#039; &quot;%&quot;_col(35) %4.2f  `high954&#039; &quot;%&quot; _col(50) %4.2f `beta4&#039; &quot;% (&quot; %4.2f `low954&#039; &quot;%, &quot; %4.2f `high954&#039; &quot;%)&quot;\nnoisily di &quot;*Confidence intervals use 2.5th and 97.5th percentiles&quot;\nnoisily di &quot; rather than default normal distribution in Stata.&quot;\nnoisily di &quot; &quot;\n}\n\/\/ the end.<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">IORW for datasets that don&#8217;t use weighting (probably the one you are looking for)<\/h2>\n\n\n\n<p>Here is the code above, except without consideration of weighting in the overall dataset. (Obviously, IORW uses weighting itself.)  This uses modified Poisson regression implemented as GLMs. These can be swapped out for other models as needed. You will have to modify this script if you are using 1. a continuous exposure, 2. more than 1 mediator, 3. a different weighting scheme, or 4. IOW instead of IORW.  See the supplement in the above Nguyen paper on how to modify this code for those changes.  <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>*************************************************\n\/\/ this HEADER is all you should have to change\n\/\/ to get this to run as weighted data with binary\n\/\/ exposure using IORW. \n*************************************************\n\/\/ BEGIN HEADER\n\/\/\n\/\/ Directory of your dataset. You might need to\n\/\/ include the entire file location (eg \"c:\/user\/ ...\")\n\/\/ My file is located in my working directory so I just list\n\/\/ a name here. Alternatively, can put URL for public datasets. \nglobal file \"myfile.dta\"\n\/\/\n\/\/ Pick a title for the table that will output at the end.\n\/\/ This is just to help you stay organized if you are running\n\/\/ a few of these in a row. \nglobal title \"my cool analysis, model 1\"\n\/\/ Components of the regression model. Outcome is binary,\n\/\/ the exposure (sometimes called \"dependent variable\" or \n\/\/ \"treatment\") is also binary. This code would need to be modified \n\/\/ for a continuous exposure variable. See details in Step 1.\nglobal outcome  mi_incident\nglobal exposure smoking\nglobal covariates age sex\nglobal mediator c.biomarker\nglobal ifstatement if mi_prevalent==0 &amp; biomarker&lt;. &amp; mi_incident &lt;.\n\/\/\n\/\/ Now pick number of bootstraps. Aim for 1000 when you are actually \n\/\/ running this, but when debugging, start with 50.\nglobal bootcount 50\n\/\/ and set a seed. \nglobal seed 8675309\n\/\/ END HEADER\n*************************************************\n\/\/\n\/\/\n\/\/\n\/\/ Load your dataset. \nuse &quot;${file}&quot;, clear\n\/\/\n\/\/ Drop then make a program.\ncapture program drop iorw\nprogram iorw, rclass\n\/\/ drop all variables that will be generated below. \ncapture drop predprob \ncapture drop inverseoddsratio \ncapture drop weight_iorw\n\/\/\n\/\/\n*Step 1: Run the exposure model, which regresses the exposure on the\n\/\/ mediator &amp; covariates. In this example, the exposure is binary so we are \n\/\/ using logistic regression (logit). If the exposure is a normally-distributed \n\/\/ continuous variable, use linear regression instead. \nlogit ${exposure} ${mediator} ${covariates} ${ifstatement}\n\/\/\n\/\/ Now grab the beta coefficient for mediator. We&#039;ll need that to generate\n\/\/ the IORW weights. \nscalar med_beta=e(b)&#091;1,1]\n\/\/\n*Step 2: Generate predicted probability and create inverse odds ratio and its \n\/\/ weight.\npredict predprob, p\ngen inverseoddsratio = 1\/(exp(med_beta*${mediator}))\n\/\/ \n\/\/ Calculate inverse odds ratio weights. \n\/\/ Also note that this is for binary exposures, need to revise\n\/\/ this code for continuous exposures. \ngen weight_iorw = 1 if ${exposure}==0\nreplace weight_iorw = inverseoddsratio if ${exposure}==1\n\/\/\n*Step 3: TOTAL EFFECTS (ie no mediator) without IORW weighting yet. (same as direct effect, but without the IORW)\nglm ${outcome} ${exposure} ${covariates} ${ifstatement}, family(poisson) link(log) vce(robust)\nmatrix bb_total= e(b)\nscalar b_total=bb_total&#091;1,1] \nreturn scalar b_total=bb_total&#091;1,1]\n\/\/\n*Step 4: DIRECT EFFECTS; using IORW weights\nglm ${outcome} ${exposure} ${covariates} ${ifstatement} &#091;pweight=weight_iorw], family(poisson) link(log) vce(robust)\nmatrix bb_direct=e(b)\nscalar b_direct=bb_direct&#091;1,1]\nreturn scalar b_direct=bb_direct&#091;1,1]\n\/\/\n*Step 5: INDIRECT EFFECTS\n\/\/ indirect effect = total effect - direct effects\nscalar b_indirect=b_total-b_direct\nreturn scalar b_indirect=b_total-b_direct\n\/\/scalar expb_indirect=exp(scalar(b_indirect))\n\/\/return scalar expb_indirect=exp(scalar(b_indirect))\n\/\/\n*Step 6: calculate % mediation\nscalar define percmed = ((b_total-b_direct)\/b_total)*100\n\/\/ since indirect is total minus direct, above is the same as saying:\n\/\/ scalar define percmed = (b_indirect)\/(b_total)*100\nreturn scalar percmed = ((b_total-b_direct)\/b_total)*100\n\/\/\n\/\/ now end the program.\nend\n\/\/\n*Step 7: Now run the above bootstraping program\nbootstrap r(b_total) r(b_direct) r(b_indirect) r(percmed), seed(${seed}) reps(${bootcount}): iorw\nmatrix rtablebeta=r(table) \/\/ this is the beta coefficients\nmatrix rtableci=e(ci_percentile) \/\/ this is the 95% confidence intervals using the &quot;percentiles&quot; (ie 2.5th and 97.5th percentiles) rather than the default normal distribution method in stata. The rational for using percentiles rather than normal distribution is found in the 4th paragraph of the &quot;analyses&quot; section here (by refs 37 &amp; 38): https:\/\/bmjopen.bmj.com\/content\/9\/6\/e026258.long\n\/\/\n\/\/ Just in case you are curious, here are the the ranges of the 95% CI, \n\/\/ realize that _bs_1 through _bs_3 need to be exponentiated:\nestat bootstrap, all \/\/ percentiles is &quot;(P)&quot;, normal is &quot;(N)&quot;\n\/\/\n\/\/ Here&#039;s a script that will display your beta coefficients in \n\/\/ a clean manner, exponentiating them when required. \nquietly {\nnoisily di &quot;${title} (bootstrap count=&quot; e(N_reps) &quot;)*&quot;\nnoisily di _col(15) &quot;Beta&quot; _col(25) &quot;95% low&quot; _col(35) &quot;95% high&quot; _col(50) &quot;Together&quot;\nlocal beta1 = exp(rtablebeta&#091;1,1])\nlocal low951 = exp(rtableci&#091;1,1])\nlocal high951 = exp(rtableci&#091;2,1])\nnoisily di &quot;Total&quot; _col(15) %4.2f `beta1&#039; _col(25) %4.2f `low951&#039; _col(35) %4.2f `high951&#039; _col(50) %4.2f `beta1&#039; &quot; (&quot; %4.2f `low951&#039; &quot;, &quot; %4.2f `high951&#039; &quot;)&quot;\nlocal beta2 = exp(rtablebeta&#091;1,2])\nlocal low952 = exp(rtableci&#091;1,2])\nlocal high952 = exp(rtableci&#091;2,2])\nnoisily di &quot;Direct&quot; _col(15) %4.2f `beta2&#039; _col(25) %4.2f `low952&#039; _col(35) %4.2f `high952&#039; _col(50) %4.2f `beta2&#039; &quot; (&quot; %4.2f `low952&#039; &quot;, &quot; %4.2f `high952&#039; &quot;)&quot;\nlocal beta3 = exp(rtablebeta&#091;1,3])\nlocal low953 = exp(rtableci&#091;1,3])\nlocal high953 = exp(rtableci&#091;2,3])\nnoisily di &quot;Indirect&quot; _col(15) %4.2f `beta3&#039; _col(25) %4.2f `low953&#039; _col(35) %4.2f `high953&#039; _col(50) %4.2f `beta3&#039; &quot; (&quot; %4.2f `low953&#039; &quot;, &quot; %4.2f `high953&#039; &quot;)&quot;\nlocal beta4 = (rtablebeta&#091;1,4])\nlocal low954 = (rtableci&#091;1,4])\nlocal high954 = (rtableci&#091;2,4])\nnoisily di &quot;% mediation&quot; _col(15) %4.2f `beta4&#039; &quot;%&quot; _col(25) %4.2f `low954&#039; &quot;%&quot;_col(35) %4.2f  `high954&#039; &quot;%&quot;  _col(50) %4.2f `beta4&#039; &quot; (&quot; %4.2f `low954&#039; &quot;, &quot; %4.2f `high954&#039; &quot;)&quot;\nnoisily di &quot;*Confidence intervals use 2.5th and 97.5th percentiles&quot;\nnoisily di &quot; rather than default normal distribution in Stata.&quot;\nnoisily di &quot; &quot;\n}\n\/\/ the end.<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Mediation is a commonly-used tool in epidemiology. Inverse odds ratio-weighted (IORW) mediation was described in 2013 by Eric J. Tchetgen Tchetgen in this publication. It&#8217;s a robust mediation technique that can be used in many sorts of analyses, including logistic regression, modified Poisson regression, etc. It is also considered valid if there is an observed &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/blog.uvm.edu\/tbplante\/2021\/11\/08\/mediation-analysis-in-stata-using-iorw-inverse-odds-ratio-weighted\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Mediation analysis in Stata using IORW (inverse odds ratio-weighted mediation)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":4473,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[502558,477491],"tags":[696677,696678,696681,696683,696682,350237,696680,502556],"class_list":["post-951","post","type-post","status-publish","format-standard","hentry","category-epidemiology-and-biostatistics","category-stata-code","tag-inverse-odds-ratio-weighting","tag-inverse-odds-ratio-weighted","tag-iorw","tag-iow","tag-iowr","tag-mediation","tag-mediation-analysis","tag-stata","entry"],"_links":{"self":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/951","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/users\/4473"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/comments?post=951"}],"version-history":[{"count":23,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/951\/revisions"}],"predecessor-version":[{"id":2173,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/951\/revisions\/2173"}],"wp:attachment":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/media?parent=951"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/categories?post=951"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/tags?post=951"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}