Note in 2023: We are using R and not Stata this summer so this post doesn’t apply to this year’s projects. Stata is a popular commercial statistical software package that was first released 30+ years ago. It has some really nice features, loads of top-rate documentation, a very active community, and approachable syntax. For beginners, …
Category Archives: Stata code
Table 1 with pweights in Stata
The very excellent table1_mc program will automate generation of your Table 1 for nearly all needs (read about it here), except for datasets using pweight. I’ve been toying around with automating Excel table generation using Stata v16+ Frames features. I recently started working on a database that requires pweighting for analyses, and opted to use …
Getting Python and Jupyter to work with Stata in Windows
Note: this post was written prior to Stata 17, which now allows Python to control Stata and vice versa. In Stata 16, Python could not control Stata but Stata could control Python. Because of this functionality, there’s a more streamlined approach to getting Jupyter to play nicely with Stata 17, detailed here. You’ll still need …
Making Restricted Cubic Splines in Stata
I love restricted cubic splines, made famous by Frank Harrell (see his approach starting on page 58 here). Dr. Harrell made a package for automating these in R. I’m not aware of an equivalent package for Stata. Here’s my approach to making this specific restricted cubic spline in Stata. The model here is modified Poisson …
Use Stata to download the NY Times COVID-19 database and render a Twitter-compatible US mortality figure
Note: this code probably doesn’t work anymore with changes in the NY Times database. I’m keeping it here for historical purposes (4/22/2022). Here’s the figure! Code follows. Comments are in-line below. Some unique strategies in this code: This will automatically download the latest NY Times dataset, but the date of “last day of follow-up” needs …
Output a Stata graph that won’t be clipped in Twitter
Twitter sizing: Twitter does this weird thing where it clips figures that aren’t the correct proportion. I came across this blog post that argues that 1100×628 px is the ‘optimal’ Twitter image size. So, how do you output Stata figures to be 1100×628? Output a Stata figure in Twitter size in 2 steps. Step 1: …
Figure to show the distribution of quartiles plus their median in Stata
Buried in the supplement of a recent paper is a variant of this figure that I’m rather proud of. It shows the distribution of quartiles of BNP and NT-proBNP at baseline on a log scale, by use of beta blockers (BB) at baseline. It also shows the midway point of the medians. It’s a …
Making a publication-ready Kaplan-Meier plot in Stata
This post is a bit dated but is a nice primer on KM curves. For a more recent one that includes printing the HR results on the KM curve, see this post. In the early winter of 2019, we had a paper published in JAMA Network Open using the TOPCAT trial dataset looking at association …
Working with Stata regression results: Matrix/matrices, macros, oh my!
If you make your own Stata programs and loops, you have discovered the wonders of automating output of analyses to tables. Extracting the results from regressions in Stata can be a bit cumbersome. Here’s one step-by-step approach that you might find helpful. The set-up: Let’s use the classic 1978 auto dataset that comes with Stata. …
Make a Table 1 in Stata in no time with table1_mc
What’s in a Table 1? Baseline demographic tables (colloquially known as ‘Table 1’ given their common location) are a core feature of nearly all epidemiologic manuscripts. The columns represent the exposure you are studying. The rows are characteristics of your population that are relevant to your research project. In placebo-controlled RCTs, the columns are drug …