{"id":225,"date":"2018-06-08T11:22:04","date_gmt":"2018-06-08T15:22:04","guid":{"rendered":"http:\/\/blog.uvm.edu\/tbplante\/?p=225"},"modified":"2020-09-16T09:48:07","modified_gmt":"2020-09-16T13:48:07","slug":"the-confusion-nomenclature-of-epidemiology-and-biostatistics","status":"publish","type":"post","link":"https:\/\/blog.uvm.edu\/tbplante\/2018\/06\/08\/the-confusion-nomenclature-of-epidemiology-and-biostatistics\/","title":{"rendered":"The confusion nomenclature of epidemiology and biostatistics"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">This should be more simple.<\/h2>\n\n\n\n<p>Epidemiology and biostatistics are awash with synonyms, and each institution has its own preferred nomenclature to describe the same general concepts. I started this page as a central place to document the various terms by concept. I plan to revisit and update it over time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Regression<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Fundamentals<\/h3>\n\n\n\n<p>You probably learned the fundamentals of regression in introductory algebra but may not realize it.&nbsp; Remember drawing a graph from a slope-intercept equation? <em><strong>Draw a graph where Y is equal to 1\/4x plus 5<\/strong><\/em>. (<a href=\"https:\/\/www.khanacademy.org\/math\/algebra\/two-var-linear-equations\/graphing-slope-intercept-equations\/v\/graphing-a-line-in-slope-intercept-form\">Here is the relevant Khan Academy Algebra I video about this<\/a>.) You take the general equation:<\/p>\n\n\n\n<p class=\"has-text-align-center\">Y = mx + b<\/p>\n\n\n\n<p class=\"has-text-align-left\">&#8230;where Y is the value on the y-axis, x is the value on the x-axis, m is the slope of the line, and b is where the line crosses the y-axis. 
The equation you will write is:<\/p>\n\n\n\n<p class=\"has-text-align-center\">Y = 1\/4x + 5<\/p>\n\n\n\n<p>&#8230;and you will draw:<\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide is-stacked-on-mobile\" style=\"grid-template-columns:61% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"240\" src=\"http:\/\/blog.uvm.edu\/tbplante\/files\/2018\/06\/algebra-1-figure-300x240.jpg\" alt=\"\" class=\"wp-image-226\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2018\/06\/algebra-1-figure-300x240.jpg 300w, https:\/\/blog.uvm.edu\/tbplante\/files\/2018\/06\/algebra-1-figure.jpg 607w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"has-large-font-size\"><\/p>\n<\/div><\/div>\n\n\n\n<p>Does this sound familiar? A linear regression does the same thing, except that instead of drawing a line from a known equation, you estimate the slope and intercept from data. You&nbsp;<em><strong>regress Y on X<\/strong><\/em>, or:<\/p>\n\n\n\n<p class=\"has-text-align-center\">Y = \u03b2<sub>1<\/sub>x<sub>1<\/sub> +&nbsp;\u03b2<sub>0<\/sub><\/p>\n\n\n\n<p>Filling in the variables, say you want to <em>figure out the predicted cholesterol level for folks of a given age<\/em>. You would&nbsp;<em><strong>regress cholesterol level on age<\/strong><\/em>:<\/p>\n\n\n\n<p class=\"has-text-align-center\">Cholesterol level = \u03b2<sub>1<\/sub>*Age&nbsp;+&nbsp;\u03b2<sub>0<\/sub><\/p>\n\n\n\n<p>Here, \u03b2<sub>1<\/sub>&nbsp;is the slope of the line for age, essentially the same as the m in Y=mx+b, and&nbsp;\u03b2<sub>0<\/sub> is the intercept on the Y-axis, essentially the same as the b in Y=mx+b. 
When you run a regression in Stata, you type<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">regress y x<\/pre>\n\n\n\n<p>or here,<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">regress cholesterol age<\/pre>\n\n\n\n<p>Let&#8217;s say that Stata spits out something like:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">      Source |  xxxxxxxxxxxxxxxxxxxxxxxxxxxx  \n-------------+------------------------------    \n       Model |  xxxxxxxxxxxxxxxxxxxxxxxxxxxx   \n    Residual |  xxxxxxxxxxxxxxxxxxxxxxxxxxxx    \n-------------+------------------------------    \n       Total |  xxxxxxxxxxxxxxxxxxxxxxxxxxxx   \n\n------------------------------------------------------------------------------\n  cholesterol|      coeff       se         t     P&gt;|t|    [95% Conf. Interval]\n-------------+----------------------------------------------------------------\n         age |     0.500     xxxxxxxx    xxxxx   0.000     0.4000      0.60000\n       _cons |     100       xxxxxxxx    xxxxx   0.000     90.000      110.000\n------------------------------------------------------------------------------<\/pre>\n\n\n\n<p>The \u03b2<sub>1<\/sub> coefficient for age is 0.5. The intercept, or&nbsp;\u03b2<sub>0<\/sub>, is 100 (Stata labels it _cons). You would interpret this as&nbsp;<em><strong>cholesterol level = 0.5*age in years + 100<\/strong><\/em>. You could plot this using your Algebra 1 skills.<\/p>\n\n\n\n<p class=\"has-text-align-center\">Cholesterol = 0.5*age + 100<\/p>\n\n\n\n<p>Or you can substitute in actual numbers. What is the predicted cholesterol at age 50? Answer: 0.5*50 + 100 = 125.<\/p>\n\n\n\n<p>If you want to make it more complex and add more variables to explain cholesterol level, it&#8217;s no longer a straight line on a graph, but the concept is the same. A <em><strong>multiple linear regression<\/strong><\/em>&nbsp;adds more X variables. 
You can&nbsp;<em>figure out what a predicted cholesterol level will be for folks by age, sex, and BMI.<\/em> You would&nbsp;<em><strong>regress cholesterol level on age, sex, and BMI.&nbsp;<\/strong><\/em>(You would code sex as 0 or 1, like female = 1 and male = 0.)<\/p>\n\n\n\n<p class=\"has-text-align-center\">Y =&nbsp; \u03b2<sub>1<\/sub>x<sub>1<\/sub> +&nbsp;\u03b2<sub>2<\/sub>x<sub>2<\/sub> +&nbsp;\u03b2<sub>3<\/sub>x<sub>3<\/sub> + \u03b2<sub>0<\/sub><\/p>\n\n\n\n<p>Or,<\/p>\n\n\n\n<p class=\"has-text-align-center\">Y =&nbsp; \u03b2<sub>1<\/sub>*Age&nbsp;+&nbsp;\u03b2<sub>2<\/sub>*Sex&nbsp;+&nbsp;\u03b2<sub>3<\/sub>*BMI&nbsp;+ \u03b2<sub>0<\/sub><\/p>\n\n\n\n<p>You get the idea.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Names of Y and X<\/h3>\n\n\n\n<p>This is what irks me. There are so many synonyms for Y and X variables. Here is a chart that I&#8217;ll update over time with synonyms seen in the wild.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><th>&nbsp;Y<\/th><th>=<\/th><th>x<\/th><\/tr><tr><td>Dependent<\/td><td>&nbsp;<\/td><td>Independent<\/td><\/tr><tr><td>Outcome<\/td><td>&nbsp;<\/td><td>Predictor<\/td><\/tr><tr><td>&nbsp;<\/td><td>&nbsp;<\/td><td>Covariate<\/td><\/tr><tr><td>&nbsp;<\/td><td>&nbsp;<\/td><td>Factor<\/td><\/tr><tr><td>&nbsp;<\/td><td>&nbsp;<\/td><td>Exposure variable<\/td><\/tr><tr><td>&nbsp;<\/td><td>&nbsp;<\/td><td>Explanatory variable<\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>This should be more simple. Epidemiology and biostatistics are awash with synonyms and each institution has its own preferred nomenclature to describe the same general concepts. I started this page as a central place to document the various terms by concept. I&#8217;ll plan on revisiting and updating over time. 
Regression Fundamentals You probably learned the &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/blog.uvm.edu\/tbplante\/2018\/06\/08\/the-confusion-nomenclature-of-epidemiology-and-biostatistics\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;The confusion nomenclature of epidemiology and biostatistics&#8221;<\/span><\/a><\/p>\n","protected":false},"author":4473,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[502558],"tags":[],"class_list":["post-225","post","type-post","status-publish","format-standard","hentry","category-epidemiology-and-biostatistics","entry"],"_links":{"self":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/225","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/users\/4473"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/comments?post=225"}],"version-history":[{"count":6,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/225\/revisions"}],"predecessor-version":[{"id":578,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/225\/revisions\/578"}],"wp:attachment":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/media?parent=225"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/categories?post=225"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/tags?post=225"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}