{"id":970,"date":"2021-11-15T16:01:24","date_gmt":"2021-11-15T21:01:24","guid":{"rendered":"https:\/\/blog.uvm.edu\/tbplante\/?p=970"},"modified":"2023-09-19T11:30:36","modified_gmt":"2023-09-19T15:30:36","slug":"using-statas-frames-feature-to-build-an-analytical-dataset","status":"publish","type":"post","link":"https:\/\/blog.uvm.edu\/tbplante\/2021\/11\/15\/using-statas-frames-feature-to-build-an-analytical-dataset\/","title":{"rendered":"Using Stata&#8217;s Frames feature to build an analytical dataset"},"content":{"rendered":"\n<p>Stata 16 introduced the new Frames functionality, which allows multiple datasets to be stored in memory, with each dataset stored in its own &#8220;Frame&#8221;. This allows for dynamic manipulation of multiple datasets across multiple Frames. Stata is still simplest to use when manipulating a single dataset (or, frame). So, Stata users will probably be interested in building a single dataset\/Frame for a specific analysis that is built from variables taken from multiple datasets\/Frames. <\/p>\n\n\n\n<p>One handy application of Frames is to import non-Stata datasets as separate frames and combine them (really, merge) into a single analytical dataset\/Frame. Before using Frames, I had previously imported non-Stata datasets, saved them locally, then merged them 1 by 1. With Frames, you just import each dataset into its own Frame, and &#8220;merge&#8221; them directly, skipping the intermediate &#8220;save as Stata dta file&#8221; step. <\/p>\n\n\n\n<p>Here&#8217;s my approach to building a single analytical dataset from multiple imported datasets, with frames. We&#8217;ll do this with NHANES data. 
This is a modification of the code on <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.uvm.edu\/tbplante\/2018\/03\/02\/downloading-and-analyzing-nhanes-datasets-with-stata-in-a-single-do-file\/\" target=\"_blank\">this post<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Step 1: Drop (reset) all frames, create new ones to import the new datasets, and run commands within each new frame to import NHANES\/SAS datasets.<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Here's the NHANES website, FYI:\n\/\/ https:\/\/wwwn.cdc.gov\/nchs\/nhanes\/default.aspx\n\/\/\n\/\/ drop all frames from memory. This will delete all unsaved data so be careful!!\nframes reset\n\/\/\n\/\/ make a blank frame called \"DEMO_F\"\n\/\/ you could type \"frame create\" or the brief synonym \"mkf\" for \n\/\/ \"make frame\"\nmkf DEMO_F\n\/\/ ...Then run the SAS import command within it, grabbing the file from the CDC website. \n\/\/ Since this command needs to be run from within the DEMO_F frame,\n\/\/ we can tell Stata to run the command from that frame without \n\/\/ actually changing to it using the \"frame &#091;name]:\" prefix\nframe DEMO_F: import sasxport5 \"https:\/\/wwwn.cdc.gov\/Nchs\/Nhanes\/2009-2010\/DEMO_F.XPT\", clear \n\/\/\n\/\/ ditto for the \"BPQ_F\" and \"KIQ_U_F\" datasets.\nmkf BPQ_F\nframe BPQ_F: import sasxport5 \"https:\/\/wwwn.cdc.gov\/Nchs\/Nhanes\/2009-2010\/BPQ_F.XPT\", clear\n\/\/\nmkf KIQ_U_F\nframe KIQ_U_F: import sasxport5 \"https:\/\/wwwn.cdc.gov\/Nchs\/Nhanes\/2009-2010\/KIQ_U_F.XPT\", clear\n\/\/ \n\/\/ let's see a list of current frames:\nframes dir \n\/\/\n\/\/ Which frame are you using though? \n\/\/ pwf is the present working frame, or the current one in use.\npwf \n\/\/ You'll see that the pwf is \"default\". <\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Step 2: Create an &#8220;analytical&#8221; frame that will contain the data you need to complete your analysis, and copy the variable that links all of your data to this frame. 
Also switch to that new analytical frame. <\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ The \"linking\" variable in this dataset is called \"seqn\", which \n\/\/ we will copy (\"put\") from the DEMO_F frame. \n\/\/ (Your file might have a linking variable called \"id\".)\n\/\/ This creates a new frame called \"analytical\" and also copies the \"seqn\"\n\/\/ variable from DEMO_F to the new analytical frame in one line.\nframe DEMO_F: frame put seqn, into(analytical)\n\/\/\n\/\/ see current list of frames and present working frame:\nframes dir\npwf\n\/\/ Now change from the default frame to the new analytical one.\n\/\/ You can change frames with \"cwf\" for \"change working frame\".\ncwf analytical\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Step 3: Now that you are in the new analytical frame, link all of your frames using the &#8220;linking&#8221; variable (&#8220;seqn&#8221; here). <\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ use the \"frlink\" command to link frames. This can be 1:1 or m:1 linking.\n\/\/\n\/\/ remember that your cwf should be analytical right now. \n\/\/\nfrlink 1:1 seqn, frame(DEMO_F)\nfrlink 1:1 seqn, frame(BPQ_F)\nfrlink 1:1 seqn, frame(KIQ_U_F)\n\/\/ \n\/\/ You are still within the \"analytical\" frame, but now your frames are all \n\/\/ linked or connected to each other. \n\/\/ if you look at your dataset with the --browse-- command, you'll see there\n\/\/ are now new DEMO_F, BPQ_F, and KIQ_U_F variables. These are the \"rows\" for\n\/\/ linked IDs in those other frames, so Stata knows where to look for \n\/\/ variables in those other frames. \n\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Step 4: &#8220;Get&#8221; the specific variables you want from each frame. This is how you merge individual variables from multiple frames. <\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ from my NHANES post, we need to grab weighting variables from DEMO_F\n\/\/ and a BP variable from BPQ_F. 
Just for kicks, we'll also grab self-\n\/\/ reported \"weak kidneys\" from KIQ_U_F. \n\/\/ \n\/\/ remember that your cwf should be analytical right now. \n\/\/\nfrget wtint2yr wtmec2yr sdmvpsu sdmvstra, from(DEMO_F)\nfrget bpq020, from(BPQ_F)\nfrget kiq022, from(KIQ_U_F)\n\/\/\n\/\/ now you have a nice merged dataset! \n\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Step 5 (optional): If you are satisfied with your analytical dataset and no longer need the other frames, you can now drop the &#8220;linking&#8221; variables from the analytical dataset, save your analytical dataset, clear your frames, and reopen your analytical dataset.<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ You can save this analytical frame as a Stata\n\/\/ dataset now if you are done manipulating the other frames. \n\/\/ \n\/\/ You might opt to drop the new linking variables prior to saving\n\/\/ for simplicity.\ndrop DEMO_F BPQ_F KIQ_U_F\n\/\/\n\/\/ FYI, Stata's \"save\" command saves only the pwf, not the other frames.\n\/\/ But in this example, we are done with other frames. \nsave analytical.dta, replace\n\/\/\n\/\/ Drop all other frames so you don't get an annoying pop-up \n\/\/ about unsaved frames in memory. \n\/\/ Be careful! This will drop all data from memory!!\nframes reset \n\/\/\n\/\/ now reopen your previously saved analytical dataset.\nuse analytical.dta, clear<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Bonus: Appending frames<\/h2>\n\n\n\n<p>You might need to append frames. There are some details <a href=\"https:\/\/www.statalist.org\/forums\/forum\/general-stata-discussion\/general\/1505424-can-stata-16-frames-be-appended\">here<\/a> about how to do this. 
I&#8217;m using the &#8211;fframeappend&#8211; command by J\u00fcrgen Wiemers.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ install fframeappend, only need to do once:\nssc install fframeappend\n\/\/ change to the frame you want to append all of your data to,\n\/\/ in this example it's called \"appended\"\n\/\/ you'll have to use \"mkf appended\" if you don't already have one\n\/\/ called that. \ncwf appended\n\/\/ now append your frame \"a\" to the current open frame\nfframeappend, using(a)\n\/\/ now repeat, appending frame \"b\" to the current open frame\nfframeappend, using(b)\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Bonus: Here are steps 1-4 in a single loop using global macros<\/h2>\n\n\n\n<p>For advanced users: Here&#8217;s a loop with some global macros that can be adapted to download several survey cycles. NHANES uses a different letter suffix for files from each cycle; the 2009-2010 files use &#8220;F&#8221;. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>frames reset\nglobal url \"https:\/\/wwwn.cdc.gov\/Nchs\/Nhanes\/\"\nglobal F \"2009-2010\/\"\nglobal files DEMO BPQ KIQ_U\n\nforeach x in F {\n\tforeach y in $files {\n\t\tmkf `y'_`x'\n\t\tframe `y'_`x': import sasxport5 \"${url}${`x'}`y'_`x'.xpt\"\n\t}\n\tframe DEMO_`x': frame put seqn, into(analytical_`x')\n\tcwf analytical_`x'\n\tforeach y in $files {\n\t\tfrlink 1:1 seqn, frame(`y'_`x') \n\t}\n\tfrget wtint2yr wtmec2yr sdmvpsu sdmvstra, from(DEMO_`x')\n\tfrget bpq020, from(BPQ_`x')\n\tfrget kiq022, from(KIQ_U_`x')\n}\nframes dir \npwf \n<\/code><\/pre>\n\n\n\n<p>Here&#8217;s the same as above, but just for a single year.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>frames reset\nglobal dir \"https:\/\/wwwn.cdc.gov\/Nchs\/Nhanes\/2009-2010\/\"\nglobal files DEMO_F BPQ_F KIQ_U_F\n\nforeach y in $files {\n\tmkf `y'\n\tframe `y': import sasxport5 \"${dir}`y'.xpt\"\n}\nframe DEMO_F: frame put seqn, into(analytical)\ncwf analytical\nforeach y in $files {\n\tfrlink 1:1 seqn, frame(`y') \n}\nfrget 
wtint2yr wtmec2yr sdmvpsu sdmvstra, from(DEMO_F)\nfrget bpq020, from(BPQ_F)\nfrget kiq022, from(KIQ_U_F)\nframes dir \npwf <\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Stata 16 introduced the new Frames functionality, which allows multiple datasets to be stored in memory, with each dataset stored in its own &#8220;Frame&#8221;. This allows for dynamic manipulation of multiple datasets across multiple Frames. Stata is still simplest to use when manipulating a single dataset (or, frame). So, Stata users will probably be interested &hellip; <a href=\"https:\/\/blog.uvm.edu\/tbplante\/2021\/11\/15\/using-statas-frames-feature-to-build-an-analytical-dataset\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Using Stata&#8217;s Frames feature to build an analytical dataset<\/span><\/a><\/p>\n","protected":false},"author":4473,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[477491],"tags":[696689,696687,696688,696684,696690,29609,502556],"class_list":["post-970","post","type-post","status-publish","format-standard","hentry","category-stata-code","tag-building-dataset","tag-data-cleaning","tag-dataset","tag-frames","tag-importing-data","tag-merge","tag-stata"],"_links":{"self":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/970","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/users\/4473"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/comments?post=970"}],"version-history":[{"count":25,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/posts\/970\/revisions"}],"predecessor-version":[{"id":1502,"href":"https:\/\/blog.uvm.
edu\/tbplante\/wp-json\/wp\/v2\/posts\/970\/revisions\/1502"}],"wp:attachment":[{"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/media?parent=970"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/categories?post=970"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.uvm.edu\/tbplante\/wp-json\/wp\/v2\/tags?post=970"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}