Figure to show the distribution of quartiles plus their median in Stata

Buried in the supplement of a recent paper is a variant of this figure that I’m rather proud of:

It shows the distribution of quartiles of BNP and NT proBNP at baseline on a log scale, by use of beta blockers (BB) at baseline. It also shows the midway point of the medians. It’s a nice figure that shows the increase of BNP with beta blocker administration. My colleagues jokingly called it the “Plante Plot” since I have had it included in several drafts of manuscripts, this is just the first one in which it was published.

The code for it is pretty complex and follows. Steps 1-3 pluck out the ranges of the quartiles and their midpoints for each group and saves them as a CSV file. Steps 4-8 render the figure. You may find it more simple to skip Steps 1-3 and manually enter the ranges of the quartiles and their medians into an Excel file and just open up that excel file in Step 4.

Alternatively, I have a do file that automates this a bit. Just type:

distplot [xaxis variable] [yaxis group name]

Good luck!

// Step 1: load the database
use "029b1b analytic datset.dta", clear

// Step 2: You need quartiles plus the intermediate points of
// each quartile, which is really 8-iles.
// Step 2a: 
// You need the maximum and minimum variables to draw the bounds
// of the bottom and top quartile
// this is by group, here's the first group:
sum baseline_bnp if efstratus50==1 & baselinebb0n1y==0, d
return list // see that r(min) and r(max) are the points needed
// save min and max as macros for 0th bound and 0th bound
local bnp8ile_nobb_0=r(min) // beginning of q1
local bnp8ile_nobb_8=r(max) // end of q4
// Step 2b: 
// now get intermediate bounds for the 8-iles
_pctile baseline_bnp  if efstratus50==1  & baselinebb0n1y==0, percentiles(12.5(12.5)87.5) // this gets bounds by 12.5iles
return list // there they are!
local bnp8ile_nobb_1=r(r1) // middle of q1
local bnp8ile_nobb_2=r(r2) // end of q1/beginning of q2
local bnp8ile_nobb_3=r(r3) // middle of q2
local bnp8ile_nobb_4=r(r4) // end of q2/beginning of q3
local bnp8ile_nobb_5=r(r5) // middle of q3
local bnp8ile_nobb_6=r(r6) // end of q3/beginning of q4
local bnp8ile_nobb_7=r(r7) // middle of q4
// now come up with a label to eventually apply to the figure
// don't use commas in this label since we'll save this
// output as a CSV file and commas will screw up the cell
// structure of a CSV (C=comma)
// step 2c: 
local label_bnp_nobb="Baseline BNP; -BB"

// now repeat for the other groups
sum baseline_bnp if efstratus50==1 & baselinebb0n1y==1, d
return list
local bnp8ile_bb_0=r(min)
local bnp8ile_bb_8=r(max)
_pctile baseline_bnp  if efstratus50==1  & baselinebb0n1y==1, percentiles(12.5(12.5)87.5)
return list
local bnp8ile_bb_1=r(r1)
local bnp8ile_bb_2=r(r2)
local bnp8ile_bb_3=r(r3)
local bnp8ile_bb_4=r(r4)
local bnp8ile_bb_5=r(r5)
local bnp8ile_bb_6=r(r6)
local bnp8ile_bb_7=r(r7)
local label_bnp_bb="Baseline BNP; +BB"

sum baseline_ntprobnp if efstratus50==1 & baselinebb0n1y==0, d
return list
local ntprobnp8ile_nobb_0=r(min)
local ntprobnp8ile_nobb_8=r(max)
_pctile baseline_ntprobnp  if efstratus50==1  & baselinebb0n1y==0, percentiles(12.5(12.5)87.5)
return list
local ntprobnp8ile_nobb_1=r(r1)
local ntprobnp8ile_nobb_2=r(r2)
local ntprobnp8ile_nobb_3=r(r3)
local ntprobnp8ile_nobb_4=r(r4)
local ntprobnp8ile_nobb_5=r(r5)
local ntprobnp8ile_nobb_6=r(r6)
local ntprobnp8ile_nobb_7=r(r7)
local label_ntprobnp_nobb="Baseline NT proBNP; +BB"

sum baseline_ntprobnp if efstratus50==1 & baselinebb0n1y==1, d
return list
local ntprobnp8ile_bb_0=r(min)
local ntprobnp8ile_bb_8=r(max)
_pctile baseline_ntprobnp  if efstratus50==1  & baselinebb0n1y==1, percentiles(12.5(12.5)87.5)
return list
local ntprobnp8ile_bb_1=r(r1)
local ntprobnp8ile_bb_2=r(r2)
local ntprobnp8ile_bb_3=r(r3)
local ntprobnp8ile_bb_4=r(r4)
local ntprobnp8ile_bb_5=r(r5)
local ntprobnp8ile_bb_6=r(r6)
local ntprobnp8ile_bb_7=r(r7)
local label_ntprobnp_bb="Baseline NT proBNP; -BB"

// Step 3: save this to a csv file that we'll open up right away.
// Note: This code goes out of frame on my blog. copy and paste it 
// into a .do file and it'll all appear. 
quietly {
capture log close bnp 
log using "bnprangefigure.csv", replace text name(bnp)
// this is the row of headers:
noisily di "row,label,eight0,eight1,eight2,eight3,eight4,eight5,eight6,eight7,eight8" 
// row 1:
noisily di "1,`label_bnp_nobb',`bnp8ile_nobb_0',`bnp8ile_nobb_1',`bnp8ile_nobb_2',`bnp8ile_nobb_3',`bnp8ile_nobb_4',`bnp8ile_nobb_5',`bnp8ile_nobb_6',`bnp8ile_nobb_7',`bnp8ile_nobb_8'"
// row 2
noisily di "2,`label_bnp_bb',`bnp8ile_bb_0',`bnp8ile_bb_1',`bnp8ile_bb_2',`bnp8ile_bb_3',`bnp8ile_bb_4',`bnp8ile_bb_5',`bnp8ile_bb_6',`bnp8ile_bb_7',`bnp8ile_bb_8'"
// blank row 3:
noisily di "3"
// row 4:
noisily di "4,`label_ntprobnp_nobb',`ntprobnp8ile_nobb_0',`ntprobnp8ile_nobb_1',`ntprobnp8ile_nobb_2',`ntprobnp8ile_nobb_3',`ntprobnp8ile_nobb_4',`ntprobnp8ile_nobb_5',`ntprobnp8ile_nobb_6',`ntprobnp8ile_nobb_7',`ntprobnp8ile_nobb_8'"
// row 5:
noisily di "5,`label_ntprobnp_bb',`ntprobnp8ile_bb_0',`ntprobnp8ile_bb_1',`ntprobnp8ile_bb_2',`ntprobnp8ile_bb_3',`ntprobnp8ile_bb_4',`ntprobnp8ile_bb_5',`ntprobnp8ile_bb_6',`ntprobnp8ile_bb_7',`ntprobnp8ile_bb_8'"
log close bnp

// step 4: open CSV file as active database:
import delim using "bnprangefigure.csv", clear
// note, you may opt to skip steps 1-3 and manually compile the 
// ranges of each quartile and their median into an excel file.
// Use the -import excel- function to open that file up instead.
// IF YOU SKIP OVER STEPS 1-3, your excel file will need the 
/// following columns:
// row - each group, with a blank row 3 to match the figure
// label - title to go to the left of the figure
// eight0 through eight8 - the even numbers are ranges of the
//    quartiles and the odd numbers are the mid-ranges.
//    See my approach in steps 2a-2b on how to get these numbers.

// step 5: steal the labels. note skipping row 3 since it's blank
local label1=label[1]
local label2=label[2] 
local label4=label[4] // NO ROW 3!!
local label5=label[5]

// step 6: pluck the intermediate points of each quartile
// which are 8-iles 1, 3, 5 and 7
// and repeat for each row
local bar1row1=eight1[1]
local bar2row1=eight3[1]
local bar3row1=eight5[1]
local bar4row1=eight7[1]

local bar1row2=eight1[2]
local bar2row2=eight3[2]
local bar3row2=eight5[2]
local bar4row2=eight7[2]

// no row 3 in this figure

local bar1row4=eight1[4]
local bar2row4=eight3[4]
local bar3row4=eight5[4]
local bar4row4=eight7[4]

local bar1row5=eight1[5]
local bar2row5=eight3[5]
local bar3row5=eight5[5]
local bar4row5=eight7[5]

// step 7: pick a different scheme than the default stata one
// I like s1mono or s1color
set scheme s1mono

// step 8: complex graph.
// stata likes to drop the macros ("local" commands) and 
// the things inside of the ticks will be missing if you 
// just run starting at the "graph twoway" below
// step 8a: rbar the ends of the quartiles, which is:
// 0 to 2, 2 to 4, 4 to 6, and 6 to 8
// step 8b: apply the labels
// step 8c: place a vertical bar at the midpoints of the
// quartiles, which are at: 1, 3, 5, and 7. A bug in Stata
// is that a centered label (placement(c)) is actually a smidge
// south still, so the rows are offset by 0.13. You'll notice
// the Y label in the text box is row value minus 0.13 (0.87, etc.)
// to account for that.
// step 8d: adjust the aspect ratio to get the bar character ("|") 
// to fit within the width of the the bar itself.

graph twoway /// step 8a:
(rbar eight0 eight2 row , horizontal) /// 
(rbar eight2 eight4 row , horizontal) ///
(rbar eight4 eight6 row , horizontal) ///
(rbar eight6 eight8 row , horizontal) ///
, ///
yscale(reverse) ///
xscale(log) ///
t2title("Quartiles of BNP or NT-proBNP, EF ≥50%", justification(center)) /// 
xla(1 "1" 10 "10" 100 "100" 1000 "1000" 10000 "10000" 20000 "20000", angle(45)) /// step 8b:
yla(1 "`label1'" 2 "`label2'" 4 "`label4'" 5 "`label5'", angle(horizontal)) ///
ytitle(" ") ///
xtitle("BNP/NTproBNP Range (Log Scale)") ///
legend(off) ///
/// step 8c:
text(0.87 `bar1row1' "|", color(white) placement(c)) ///
text(0.87 `bar2row1' "|", color(white) placement(c)) ///
text(0.87 `bar3row1' "|", color(white) placement(c)) ///
text(0.87 `bar4row1' "|", color(white) placement(c)) ///
text(1.87 `bar1row2' "|", color(white) placement(c)) ///
text(1.87 `bar2row2' "|", color(white) placement(c)) ///
text(1.87 `bar3row2' "|", color(white) placement(c)) ///
text(1.87 `bar4row2' "|", color(white) placement(c)) ///
text(3.87 `bar1row4' "|", color(white) placement(c)) ///
text(3.87 `bar2row4' "|", color(white) placement(c)) ///
text(3.87 `bar3row4' "|", color(white) placement(c)) ///
text(3.87 `bar4row4' "|", color(white) placement(c)) ///
text(4.87 `bar1row5' "|", color(white) placement(c)) ///
text(4.87 `bar2row5' "|", color(white) placement(c)) ///
text(4.87 `bar3row5' "|", color(white) placement(c)) ///
text(4.87 `bar4row5' "|", color(white) placement(c)) ///
/// step 8d:

Making a publication-ready Kaplan-Meier plot in Stata

This post is a bit dated but is a nice primer on KM curves. For a more recent one that includes printing the HR results on the KM curve, see this post.

In the early Winter of 2019, we had a paper published in JAMA: Network Open using the TOPCAT trial dataset looking at association between beta-blocker use at baseline and incident heart failure admissions. We obtained the data from the NIH/NHLBI BioLINCC repository. This is an incredible resource for datasets. With a quick application, IRB, and DUA, you can get world-class datasets from landmark clinical trials like ACCORD, SPRINT, TOPCAT, etc..

In this analysis we needed to put together a Kaplan-Meier plot for Figure 2 (sometimes called a survival plot). Well, technically it’s a cumulative incidence plot since the line starts a 0% and creeps up as events happen rather than starting at 100% and dropping down as events happen.

Features of this plot that are somewhat unique:

  • Lines are not only different colors, one is dashed and one is solid. This is key to avoiding color publication costs in some journals as it can be made greyscale and folks can still figure out which line is which. Even if there wasn’t an issue with color costs, including differing line patterns helps folks who may be colorblind. This is done with the -plot1opts- and -plot2opts- commands.
  • Text label for Log-Rank!
  • Bonus example of how to use italics in text. It’s {it:YOUR TEXT}. This could also have been bold using {bf:YOUR TEXT} instead.
  • Survival table that lines up with the years on the x axis!
  • Pretty legend on 2 rows with custom labels.

Of course, the JAMA folks remade my figure for the actual publication. Womp, womp. I still like how it came out.

What the figure looks like

Code to make this figure

// STEP 1: tell stata that it's time to event data. 
// Use the -stset- command. Syntax:
// stset [time], failure([thing that is 0 or 1 for the event of outcome]) ...
// id([participant id, useful if doing time varying covariates]) ...
// scale([time scale, here 365.25 days=1 year)

stset days_hfhosp, f(hfhosp) id(id) scale(365.25)

// STEP 2: Set the color scheme.
// I don't like the default stata color scheme.
// Try s1color or s1mono

set scheme s1color 

// STEP 3: Here's the actual figure!

sts graph if efstratus50==1 ///
, /// this was a subset of patients with EF >=50%
fail /// this makes the line starts at the bottom of the Y axis instead of the top
by(baselinebb0n1y) /// this is the variable that defines the two groups, bb use at baseline or none
title("Cumulative Heart Failure Hospitalizations") /// Title!
t1title("Among TOPCAT Participants with EF ≥50%") /// subtitle!
xtitle("Year of Follow-Up") /// x label!
ytitle("Cumulative Incidence") /// y label!
yla(0 "0%" .25 "25%" .5 "50%" .75 "75%" 1 "100%", angle(0)) /// Y-label values! Angle(0) rotates them.
xla(0(1)6) /// X-label values! From 0 to 6 years with 1 year intervals in between
text(0.63 3 "Log-Rank {it:p}<0.001", placement(e) size(medium)) /// floating label with italics!
legend(order(1 "No Beta-Blocker" 2 "Beta-Blocker") rows(2)) /// Legend, forced on two rows
plot1opts(lpattern(dash) lcolor(red)) /// this forces the first line to be dashed and red
plot2opts(lpattern(solid) lcolor(blue)) /// this forces the second line to be solid and blue
risktable(0(1)6 , size(small) order(1 "No Beta-Blocker" 2 "Beta-Blocker")) // the numbers under the X axis

// STEP 4: Export the graph as a tiny PNG file for the draft and 
// tif file to upload with the manuscript submission. 
graph export "Figure 2.png", replace width(2000)
graph export "Figure 2.tif", replace width(2000)