{"id":1398,"date":"2023-02-23T11:27:35","date_gmt":"2023-02-23T16:27:35","guid":{"rendered":"https:\/\/blog.uvm.edu\/tbplante\/?p=1398"},"modified":"2023-02-27T14:55:52","modified_gmt":"2023-02-27T19:55:52","slug":"generating-overlapping-overlaying-decile-frequency-histograms-in-stata","status":"publish","type":"post","link":"https:\/\/blog.uvm.edu\/tbplante\/2023\/02\/23\/generating-overlapping-overlaying-decile-frequency-histograms-in-stata\/","title":{"rendered":"Generating overlapping\/overlaying decile frequency histograms in Stata"},"content":{"rendered":"\n<p>I recently had a dataset with two groups (0 or 1), and a continuous variable. I wanted to show how the overall deciles of that continuous variable varied by group. Step 1 was to generate an overall decile variable with an &#8211;xtile&#8211; command. Step 2 was to make a frequency histogram. BUT! I wanted these histograms to overlap and not be side-by-side. Stata&#8217;s handy &#8211;histogram&#8211; is a quick and easy way to make histograms by groups using the &#8211;by&#8211; command, but it makes them side-by-side like this, and not overlapping. (Note: see how to use &#8211;twoway histogram&#8211; to make overlapping histograms at the end of this post.) 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-4.png\" alt=\"\" class=\"wp-image-1402\" width=\"411\" height=\"293\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-4.png 717w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-4-300x214.png 300w\" sizes=\"auto, (max-width: 411px) 100vw, 411px\" \/><\/figure>\n\n\n\n<p>I instead used &#8211;collapse&#8211; to count the observations in each decile by group, then overlaid the two groups as bars with semi-transparent colors (written as the color name, a percent sign, and an opacity number, e.g., red%40), like this: <\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-10.png\" alt=\"\" class=\"wp-image-1411\" width=\"428\" height=\"310\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-10.png 717w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-10-300x218.png 300w\" sizes=\"auto, (max-width: 428px) 100vw, 428px\" \/><\/figure>\n\n\n\n<p>Here&#8217;s the code to make both:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>clear all\n\n\/\/ make fake data\nset obs 1000\nset seed 8675309\ngen id=_n \/\/ ID of 1 through 1000\ngen var0or1 = round(runiform())\ngen continuousvalue = 100*runiform()\n\n\/\/ make overall deciles of continuousvalue\nxtile decilesbygroup = continuousvalue, nq(10)\n\n\/\/ now make a frequency histogram of those deciles\nset scheme s1color \/\/ I like this scheme\nhist decilesbygroup, by(var0or1) frequency bin(10)\n\n\/\/ make a variable equal to 1 that we will sum in collapse\ngen countbygroup = 1\n\/\/ now sum that variable by the 0 or 1 indicator and deciles\ncollapse (sum) countbygroup, by(var0or1 decilesbygroup)\n\/\/ now render the count from above as a bar graph:\nset scheme s1color \/\/ I like this scheme\ntwoway \/\/\/\n(bar countbygroup decilesbygroup if var0or1==0, vertical color(red%40)) 
\/\/\/\n(bar countbygroup decilesbygroup if var0or1==1, vertical color(blue%40)) \/\/\/\n, \/\/\/\nlegend(order(1 \"var0or1==0\" 2 \"var0or1==1\")) \/\/\/\ntitle(\"Title!\") \/\/\/\nxtitle(\"Decile of continuousvalue\") \/\/\/\nxla(1(1)10) \/\/\/\nyla(0(10)70, angle(0)) \/\/\/\nytitle(\"N in Decile\")<\/code><\/pre>\n\n\n\n<p>You could also offset the decile values by var0or1 (here, by 0.2 in each direction) and shrink the bar width a bit to get a frequency histogram where the bars are next to each other, like this: <\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-11.png\" alt=\"\" class=\"wp-image-1412\" width=\"451\" height=\"329\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-11.png 708w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-11-300x219.png 300w\" sizes=\"auto, (max-width: 451px) 100vw, 451px\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>clear all\n\n\/\/ make fake data\nset obs 1000\nset seed 8675309\ngen id=_n \/\/ ID of 1 through 1000\ngen var0or1 = round(runiform())\ngen continuousvalue = 100*runiform()\n\n\/\/ make overall deciles of continuousvalue\nxtile decilesbygroup = continuousvalue, nq(10)\n\n\/\/ now make a frequency histogram of those deciles\nset scheme s1color \/\/ I like this scheme\nhist decilesbygroup, by(var0or1) frequency bin(10)\n\n\/\/ offset the decilesbygroup by var0or1 a bit:\nreplace decilesbygroup = decilesbygroup - 0.2 if var0or1==0\nreplace decilesbygroup = decilesbygroup + 0.2 if var0or1==1\n\n\/\/ make a variable equal to 1 that we will sum in collapse\ngen countbygroup = 1\n\/\/ now sum that variable by the 0 or 1 indicator and deciles\ncollapse (sum) countbygroup, by(var0or1 decilesbygroup)\n\/\/ now render the count from above as a bar graph:\nset scheme s1color \/\/ I like this scheme\ntwoway \/\/\/\n(bar countbygroup decilesbygroup if var0or1==0, vertical color(red%40) barwidth(0.4)) 
\/\/\/\n(bar countbygroup decilesbygroup if var0or1==1, vertical color(blue%40) barwidth(0.4)) \/\/\/\n, \/\/\/\nlegend(order(1 \"var0or1==0\" 2 \"var0or1==1\")) \/\/\/\ntitle(\"Title!\") \/\/\/\nxtitle(\"Decile of continuousvalue\") \/\/\/\nxla(1(1)10) \/\/\/\nyla(0(10)70, angle(0)) \/\/\/\nytitle(\"N in Decile\")<\/code><\/pre>\n\n\n\n<p>A few quick notes here: The way that I am specifying the &#8220;bins&#8221; for the histograms here is different from how Stata specifies bins for histograms, since I&#8217;m forcing it to render by decile. Deciles are cut at quantiles of the data, while &#8211;histogram&#8211; with bin(10) uses 10 equal-width bins. If you generate a histogram of &#8220;continuousvalue&#8221; instead of &#8220;decilesbygroup&#8221; as in the example above, you&#8217;ll notice that the two sets of histograms look a bit different from each other: <\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-8.png\" alt=\"\" class=\"wp-image-1406\" width=\"374\" height=\"272\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-8.png 718w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-8-300x218.png 300w\" sizes=\"auto, (max-width: 374px) 100vw, 374px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-9.png\" alt=\"\" class=\"wp-image-1407\" width=\"381\" height=\"277\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-9.png 712w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-9-300x218.png 300w\" sizes=\"auto, (max-width: 381px) 100vw, 381px\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>clear all\n\n\/\/ make fake data\nset obs 1000\nset seed 8675309\ngen id=_n \/\/ ID of 1 through 1000\ngen var0or1 = round(runiform())\ngen continuousvalue = 100*runiform()\n\n\/\/ make overall deciles of continuousvalue\nxtile decilesbygroup = continuousvalue, 
nq(10)\n\n\/\/ now make a frequency histogram of those deciles\nset scheme s1color \/\/ I like this scheme\nhist decilesbygroup, title(\"hist decilesbygroup\") by(var0or1) frequency bin(10) name(a)\nhist continuousvalue, title(\"hist continuousvalue\") by(var0or1) frequency bin(10) name(b)<\/code><\/pre>\n\n\n\n<p>Also, this code will only render frequency histograms, not density histograms, which are the default in Stata. You can also use the &#8211;twoway hist&#8211; command to overlay two histograms, but its bins might not align perfectly with the deciles. On the other hand, &#8211;twoway hist&#8211; lets you plot density histograms. See the example that follows. I suspect that most people will get what they need with the &#8211;twoway hist&#8211; command in Stata. <\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-12.png\" alt=\"\" class=\"wp-image-1417\" width=\"536\" height=\"385\" srcset=\"https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-12.png 975w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-12-300x215.png 300w, https:\/\/blog.uvm.edu\/tbplante\/files\/2023\/02\/image-12-768x551.png 768w\" sizes=\"auto, (max-width: 536px) 100vw, 536px\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>clear all\n\n\/\/ make fake data\nset obs 1000\nset seed 8675309\ngen id=_n \/\/ ID of 1 through 1000\ngen var0or1 = round(runiform())\ngen continuousvalue = 100*runiform()\n\nset scheme s1color \/\/ I like this scheme\ntwoway \/\/\/\n(hist continuousvalue if var0or1==0, bin(10) color(red%40) density) \/\/\/\n(hist continuousvalue if var0or1==1, bin(10) color(blue%40) density) \/\/\/\n, \/\/\/\nlegend(order(1 \"var0or1==0\" 2 \"var0or1==1\")) \/\/\/\ntitle(\"Title!\") \/\/\/\nxtitle(\"Grouping in 10 Bins\") <\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I recently had a dataset with two groups (0 or 1), and a 
continuous variable. I wanted to show how the overall deciles of that continuous variable varied by group. Step 1 was to generate an overall decile variable with an &#8211;xtile&#8211; command. Step 2 was to make a frequency histogram. BUT! I wanted these &hellip; <a href=\"https:\/\/blog.uvm.edu\/tbplante\/2023\/02\/23\/generating-overlapping-overlaying-decile-frequency-histograms-in-stata\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Generating overlapping\/overlaying decile frequency histograms in Stata<\/span><\/a><\/p>\n","protected":false},"author":4473,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[477491],"tags":[703445,703442,703443,703444,502556],"class_list":["post-1398","post","type-post","status-publish","format-standard","hentry","category-stata-code","tag-frequency-histograms","tag-histograms","tag-overlapping-histograms","tag-overlaying-histograms","tag-stata"],"_links":{"self":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/1398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/users\/4473"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/comments?post=1398"}],"version-history":[{"count":6,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/1398\/revisions"}],"predecessor-version":[{"id":1419,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/1398\/revisions\/1419"}],"wp:attachment":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/media?parent=1398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/categories?post=139
8"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/tags?post=1398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}