Making a horizontal stacked bar graph with -graph twoway rbar- in Stata

Making a horizontal stacked bar graph in Stata

I spent a bit of time making a variation of this figure today. (The data here are made up.) I’m pleased with how it came out. I first tried to use the -graph bar, horizontal- command, but it didn’t give me as much customization as -twoway graph rbar…, horizontal-. I think it looks pretty slick.

Start with an Excel file

I made an Excel file called stacked bar graph data.xlsx that I saved in the same folder as a .do file. I closed Stata and reopened that .do file from Windows explorer so that Stata set the working directory as the same folder that contains the .do file. More importantly, it set the working directory as the same folder that also contains the Excel file.

Group is the number of the individual bars, bottom is the bottom of the first segment of a bar, q1 is the top of the first segment of each bar. The rest should be obvious. I made this for quartiles, hence the q1-4 names. You can tweak the numbers by editing the Excel file, hitting save/close, and rerunning the .do file.

Make sure that you save and close Excel before running the .do file or the .do file won’t run and you will be sad. 

My .do file for making this horizontal stacked bar graph

Here’s my code! I hope it’s useful.

********************************************************************************
******************************IMPORT DATA HERE**********************************
********************************************************************************
import excel "stacked bar graph data.xlsx", firstrow clear // make sure that excel
//                                                           is closed before you
//                                                           run this script!
destring, replace

********************************************************************************
******************************CODE STARTS HERE**********************************
********************************************************************************
capture ssc install scheme-burd, replace // this installs nicer color schemes
// see the schemes here: https://github.com/briatte/burd
set scheme burd4 

graph twoway ///
(rbar bottom q1 group, horizontal) ///
(rbar q1 q2 group, horizontal) ///
(rbar q2 q3 group, horizontal) ///
(rbar q3 q4 group, horizontal) ///
, /// if you modify this file and it stops working, check the placement of this comma
xscale(log) /// make the x axis log scale
xla(0.25 "0.25" 0.5 "0.5" 1 "1" 2.5 "2.5" 5 "5" 10 "10" 25 "25" 50 "50" 100 "100") ///
yla(1 "Group 1" 2 "Group 2" 3 "Group 3" 4 "Group 4") ///
ytitle("Y title") ///
xtitle("X title") ///
text(4 1.2 "Q1", color(white)) /// first number is y second is x.
text(4 4.3 "Q2", color(white)) ///
text(4 20 "Q3", color(white)) ///
text(4 90 "Q4", color(white)) ///
legend(off) /// no legend, aspect is the shape of the figure. 1 is tall and thin.
aspect(.25) 
//
// export the graph as a PNG file
graph export "stacked graph.png", replace width(2000)
// graph export "stacked graph.tif", replace width(2000) // in case you want as a tiff

Downloading and analyzing NHANES datasets with Stata in a single .do file

Learn to love NHANES

NHANES is a robust, nationally-representative cross-sectional study. For the past ~18 years it sampled different communities across the US in 2 year continuous cycles. A few of these years are linked to National Death Index data, so you can assess risk factors at the time of the survey and use time-to-event mortality data to identify novel risk factors for death. Manuscripts using NHANES data have been published across the spectrum of medical journals, all the way up to NEJM. The best part? You can download almost all of NHANES data from the CDC website right now for free. (Here’s a list of all data variables available.)

Manipulating NHANES data is challenging for beginners because of the sheer quantity of individual files and requirement for weighting. Plus, all of the files are in SAS XPT format so you have to download, import, save, and merge before you can even think about starting an analysis. To make this data management task slightly more complex, the CDC sporadically publishes interval updates of the source data files on their website. Files may be updated for errors or removed entirely without you knowing about it. (I strongly recommend subscribing the the NHANES Listserv to get real-time updates.) If you have NHANES files saved locally from 2-3 years ago, there’s a reasonable chance that you are using outdated databases, which could yield some false conclusions. Re-downloading all of the many files every time you want to do a project is a big headache.

Leverage Stata’s internet connectivity to make NHANES analyses easy

I love that Stata will download datasets for you with just a URL. The .do file below shows you how easy it is to download just the needed files on the fly then do some simple analyses. This means that you don’t have to worry about maintaining your personal database of NHANES files. If the source files are updated by the CDC, no worry! Every time you run this .do file, it’ll grab the freshest data files available. If the source data files are removed for inaccuracies, the file won’t run and you’ll be prompted to investigate. For example, the 2011-2012 Folate lab results were withdrawn February 2018. If you tried to download the FOLATE_G file, CDC’s website will give their version of a 404 error and Stata will stop the .do file cold in its tracks.

What this .do file does

In this short script, you’ll see how to 1. Import the NHANES SAS XPT files directly from the CDC website with just the XPT file’s URL, 2. Save data as Stata .dta files, 3. Merge the .dta files, 4. Review basic coding issues, 5. Run an analysis using weighting, and 5. Display data.

In this example, we’ll look at the 2009-2010 NHANES results and apply weighting to estimate the amount of the US population who have been told that they have high blood pressure. Just copy/paste the code below and save into a .do file. Set the working directory to be in the same folder as your .do file. Or, copy/paste the .do file, save it, close Stata, open the .do file through Explorer, and then run! Opening the .do file from Windows Explorer with Stata closed sets the .do file’s parent folder to be Stata’s working directory.

What this .do file doesn’t

You won’t be able to merge files with multiple entries per users. For example, the Prescription Medications – Drug Information questionnaire has a row for each medication and ICD code its use for all participants. You’ll need to use the reshape wide command on those variables before merging. BTW, that code is: .bys seqn: gen j= _n [linebreak] .reshape wide rxddrug rxddrgid rxqseen rxddays, i(seqn) j(j)

This also won’t merge with the National Death Index files, which are hosted elsewhere. The NHANES-NDI linkage website provides an example Stata .do script that would be straightforward to include below.

Also, if you are trying to combine analyses from multiple NHANES cycles (say, combining 2009-2010 with 2011-2012), things get a bit more complicated. You’ll need to append .dta files and consider adjusting the weights.

Finally, if you are interested in using Frames functionality to “merge” a dataset (aka skip saving multiple dta files in the process of building an analytical dataset), then check out this post

// Link to all NHANES datasets: https://wwwn.cdc.gov/nchs/nhanes/default.aspx
// 1. Click on years of interest (e.g., 2009-2010).
// 2. Scroll down, click on data of interest (e.g., demographics).
// 3. Right click on the XPT data file and copy the URL.
// 4. Note that the DOC file containing an overview of the file is right there.
//    Take a peek at its contents and return with questions. 
//
**********************************************************************
*********************download the demographics!!**********************
**********************************************************************
// The demographics file contains the weights.
// You ALWAYS need the demographics file.
//
// Paste the URL for the demographics of interest below:
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DEMO_F.XPT", clear
// NOTE: if you are using Stata 15 or older, above should be "sasxport"
// and not "sasxport5". Ditto for all "sasxport" commands that follow. 
sort seqn // Sort the file in order of the unique identifiers. Not necessary,
//           but rarely having an unsorted file will cause analytic issues.
// Save as a Stata dataset
save "DEMO_F.dta", replace
//
**********************************************************************
*********************download other files*****************************
**********************************************************************
// Let's look at the "questionnaire" for "blood pressure & cholesterol"
// and "kidney conditions - urology"
//
// Lost? Go back to the link on the very first line of this .do file
// and click on the year of interest again (2009-2010), scroll down and
// click "questionnaire". 
//
// Paste the URL for the BP & cholesterol below:
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/BPQ_F.XPT", clear
// Sort and save as an Stata dataset
sort seqn 
save "BPQ_F.dta", replace
//
// We aren't going to use the kidney file in this analysis, but just an
// example of how to merge a second dataset, copy/paste URL for kidney conditions
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/KIQ_U_F.XPT", clear
// Sort and save as an Stata dataset
sort seqn
save "KIQ_U_F.dta", replace
//
**********************************************************************
*********************merge your datsets*******************************
**********************************************************************
clear all // clear all memory
//
// start with the demographics file
// ALWAYS INCLUDE THE DEMOGRAPHICS FILE SINCE IT HAS WEIGHTS
use "DEMO_F.dta", clear
//
// Merge demographics with bp & chol
// The unique identifier in this dataset is "seqn", it may be different in other
// NHANES datasets.
merge 1:1 seqn using "BPQ_F.dta"
drop  _merge // this is a merge variable that needs to be dropped prior to 
//              merging other datasets. This variable will be helpful 
//              in exploring causes of merging issues. 
// Merge with kidney conditions datafile
merge 1:1 seqn using "KIQ_U_F.dta"
drop  _merge
//
// you can now save as a merged data file if you want. 
save nhanesBPall.dta, replace
//
**********************************************************************
*********************do an analysis!!*********************************
**********************************************************************
// YOU NEED TO APPLY WEIGHTING.
// Fortunately, Stata makes this easy with built in survey commands.
// You need to tell it that it's survey data with the "svyset" command then use 
// specific functions designed for weighted analyses.
// To explore available these functions: 
// 1. Check out the drop-down menu in stata, statistics --> survey data
// 2. Stata 13 documentation about this code: 
//            https://www.stata.com/manuals13/svysvyset.pdf
// 3. A helpful video from Stata: https://www.youtube.com/watch?v=lRTl8GKsZTE
//
// USING SVYSET:
// For the more recent continuous NHANES ones, here are the variables needed for
// weighting:
//   weight for interview data: wtint2yr 
//   weight for laboratory (MEC) data: wtmec2yr 
//   Sampling units (PSU): sdmvpsu
//   Strata: sdmvstra
//   Single unit: there isn't one. 
// NOTE: Older NHANES datasets use different variables
//
// We are using interview data (questionnaires), so our command is
svyset sdmvpsu [pweight = wtint2yr], strata(sdmvstra) vce(linearized) singleunit(missing)
//
// DOING ANALYSES:
// Syntax: "svy: COMMAND var" - for a list of commands, type "help svy_estimation"
//
// The 4 basic survey descriptive commands:
// 1. mean
// 2. proportion
// 3. ratio
// 4. total
//
// there are a bunch of other commands, including logistic regression, etc. 
//
// let's see what the mean age was, the variable is "ridageyr"
svy: mean ridageyr
// you see that the mean age of the US population (301 million people) in 2009-2010 was
// 36.7 years. COOL!
//
// NEXT:
// Let's see the amount of people who have been told they have high blood pressure
// VARIABLE: bpq020
// First, look at the documentation for this variable on the source XPT file's DOC page
// https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/BPQ_F.htm
// note that this is coded as 1=yes, 2=no, 7=refused, 9=don't know, .=missing
//
// Only asked if age >=16, so not the entire 301 million US population will be here
//
// let's look at the responses given
svy: proportion bpq020
// RECODE:
// let's make a new variable with 0=no, 1=yes, .=all others
gen toldhtn0n1y =.
replace toldhtn0n1y=0 if bpq020==2 // no (you need a double equals sign after the if)
replace toldhtn0n1y=1 if bpq020==1 // yes
//                                    all others remain as =.
//
// let's see the population on BP meds
svy: proportion toldhtn0n1y
// SO: of the 234 million adults >16 years, 28% have been told that hey have htn
// A little program to spit out the results in english:
matrix htnansw = e(b) // e(b) is where the # of people is stored. Type "ereturn list" to see
//                       which matrices are available. Type "matrix list [name of matrix]" to 
//                       see content of each matrix. This "matrix htnansw = e(b)" command will
//                       save the temporary matrix for e(b) under the permanent matrix
//                       "htnansw" that we can then manipulate.
//
// If you typed "matrix list htnansw", you'd see that the proportion answering "yes"
// is saved in the second column of the first row of this matrix, or [1,2].
//
local yesbp = htnansw[1,2] // pluck out the value of the 1,2 cell in the saved matrix 
//                            (the yes proportion) as a macro to call later
//
// NOTE: You call macros by opening with the tick to the left of number 1 on your keyboard,
// writing the name of the macro, then closing with a traditional apostrophe.
// Read about macros here: https://www.ssc.wisc.edu/sscc/pubs/stata_prog1.htm
//
// save # of people who answered y/n (234 million)
matrix subpop = e(_N_subp) // pluck out the # of people in this population, aka the # of 
//                            americans >=16 years old, as a permanent matrix named "subpop"
//
local population = subpop[1,1] // make a macro plucking the 1,1 cell where the total
//                                # of americans are in this population
//
// how data can be presented:
di "Unrounded " "`yesbp'" // just to prove how you can present it. Note that yesbp is a macro.
di "Rounded " round(`yesbp'*100) // round at the decimal after mult by 100
di "Total population is " `population' // note exponent
di "Total population is " %18.0fc `population' // note no exponent but helpful commas
di "Among Americans >=16 years, " round(`yesbp'*100) "% have been told that they have high BP."
di "Among Americans >=16 years, " round(`yesbp'*`population') " Americans have been told they have high BP."
//
// NOTE: If you run this line-by-line, stata may drop the macros above.  
// Run the script from the very top if you are getting errors in these last few lines. 
//
// Fin.

ClipSpeak: The most user-friendly, simple text-to-speech app ever

I came across the excellent text-to-speech app, ClipSpeak, last week. It’s a freeware program that you can download here.

Installing and opening ClipSpeak

I extracted the .zip file to my desktop and created a shortcut on my Windows Taskbar for ClipSpeak. Open it up and you’ll notice that nothing happened. (You may notice a new yellow icon next to your clock.)

Using ClipSpeak

Open up anything with text. Yes, anything. A browser. A Word document. A .txt file. Highlight what you want read outloud can hit “copy” (Ctrl-C).

WHAT’S THAT SONOROUS NOISE?

Yes, that’s ClipSpeak reading you your copied text. You can right-click on the system icon of ClipSpeak (next to the clock, possibly buried under the ^ symbol) and increase/decrease the speed under Select Voice

Why I love this

This is the most simple program ever. There is no interface. It does what you want it to do. Want it to stop reading? Quit the program and open it back up to clear the audio playback.

MS Word’s new Read Aloud feature: Helpful for dyslexia and typo-finding

New in MS Word 365 is Read Aloud. It’s a pretty straightforward, stripped down/simplified text-to-speech (TTS) tool. If you have dyslexia or otherwise are having a difficult time finding typos in your work, try this feature. It’s very helpful.

To turn it on, just head over to the Review tab and click on the Read Aloud button.

It’ll automatically start reading from your cursor forward. Traditional icons for audio playback will float in the top right corner of your document window. These include play/pause, skip back (one paragraph), ad skip forward (one paragraph).

There is an icon of a dude with radio waves coming out of his head. That’s the settings button.

I like cranking the speed up by a few notches. You can also switch to other installed voices. I’m a fan of David’s sonorous speech. I’d totally pay a few bucks to have a Gilbert Gottfried voice though.

What if you have an older version of MS Word? Well, before this feature was released, I was using WordTalk v4.3. It’s a free, standalone app from the folks at the University of Edinburgh. It has some nice keyboard shortcuts that you can enable. The interface is very klunky though. In a pinch, it’ll do the trick.

2017-09-28 update

After using Read Aloud consistently for the past month, I still enjoy and recommend its use. However, I find it to be a bit glitchy with documents that have many co-author comments or embedded references from Zotero. In these documents, the playback will stop mid-word on occasion.

There are also some keyboard shortcuts for Read Aloud, documented here:

  • CTRL+Alt+Space – Start or quit Read Aloud
  • CTRL+Space – Play/pause
  • CTRL+Left arrow or CTRL+Right arrow – Skip back or forward a paragraph
  • Alt+Left or Alt+Right – Decrease or increase reading speed

2017-11-08 update

I received a new computer that ships with MS Office 2016, not 365. Read Aloud isn’t in Office 2016. It turns out that you can enable the older Text-to-Speech functionality in a pinch. If you don’t have the MS Speech Platform installed, you can download it here.

First, right click on the ribbon and select “customize the ribbon”.

Next, make a new group under the Review tab. I called mine Speak. Then under Choose Commands From, select All Commands. Scroll down to Speak and add it to your new group called Speak. Hit save.

You should now have a Speak button on your Review tab. Highlight what you want it to read and click Speak. Click it again to stop the reading.

You can adjust the reading speed under the Windows Control Panel –> Speech Recognition –> Text to Speech (on the left).

Generic start of a Stata .do file

I took the Stata programming class at the Johns Hopkins School of Public Health during grad school It was taught by Dorry Segev. If you are at the school, I highly, highly, highly recommend taking it and doing all of the assignments in term 4. It saved me many hours of labor in writing up my thesis. It’s a phenomenal class.

One of the biggest takeaways from the class was using a .do file as much as possible when interacting with Stata. As in 99% of the time.

Below is the stock header and footer of every .do file that I make. Steps to success:

  1. Open a blank .do file
  2. Paste the code from below
  3. Save it in the same folder as your dataset
  4. Close Stata
  5. In Windows File Explorer, find your new .do file and open it up then get rolling.

By opening the .do file through file explorer, Stata automatically knows which folder you are working in. Then you don’t have to write the entire directory to start. For example, you can write:

use data.dta, clear

…and not

use c:\windows\users\myname\work\research\001project\data\data.dta, clear


 
********************************************************************************
******************************HEADER STARTS HERE********************************
********************************************************************************
// at the beginning of every do file:
macro drop _all // remove macros from previous work, if any
capture log close // Close any open logs. Capture will ignore a command that gives 
//                   an error. So if there isn't an open log, instead of giving you 
//                   an error and stopping here, it'll just move onto the next line.
clear all // clean the belfries
drop _all // get rid of everything!

log using output.log, replace text // change the name of this to whatever you'd like

// The purpose of this .do file is... [say why you are writing this do file]

version 15 // Every version of Stata is slightly different, but all are backwards 
//            compatible with previous ones. If you open up this do file with a way 
//            newer version, it'll run it in version 14 compatibility mode. Change 
//            this to the current version of Stata that you are using. This will 
//            also keep your code from running on older versions of stata that will 
//            break with new code that it isn't designed to handle. 

set more off, permanently // so you don't have to keep clicking through stata to 
//                           keep it running

set linesize 255 // this keeps longer lines from getting clipped. Helpful for making 
//                  tables.

capture shell md pictures // this makes a folder called pictures in the Windows 
//                           version of stata. Save your pictures here.
capture shell mkdir pictures // ditto, except in the Mac version.

********************************************************************************
******************************IMPORT DATA HERE**********************************
********************************************************************************
* working with stata dta file: 
// use "data.dta", clear // change this with whatever data file you are using and 
//                        remove the double slashes. Note: putting quotes around 
//                        filenames lets you open files with spaces in the name.
* working with excel file: 
// CLOSE YOUR EXCEL FILE BEFORE TRYING TO OPEN IT OR STATA WILL STOP HERE.
// import excel using "name of file.xlsx", firstrow clear // firstrow imports 
//                       the first row of the sheet as variable. Change to the 
//                       appropriate name and delete the double lines as needed. 
* working with csv file:
// import delim "name of file.csv", varname(1) clear // varname imports the # 
//                       designated as the variables. 
******************************************************************************** 
******************************CODE STARTS HERE********************************** 
******************************************************************************** 
// ... you get the idea

********************************************************************************
******************************FOOTER STARTS HERE********************************
********************************************************************************
// At the very end of your .do file: 
log close
// Fin.

Keeping your digital work organized

I’m not an organized person by nature. However, I’m the kind of guy that likes to implement systems to overcome problems. In medical school, I didn’t have any sort of structure for my projects and found it frustrating to find the most recent versions of papers, posters, and other writings. Here’s an approach that I started using in fellowship that works for me.

Step 1: Save all of your work in one folder

This seems pretty obvious, but modern operating systems, cloud-based storage systems, and network drives all compete for your documents. If you are a windows user, I’m willing to bet that you have different projects saved on the Desktop, in your My Documents folder, in your cloud storage, and in your email in/outboxes. You probably also have alternate working drafts of your writings on different computers. This is a recipe for disaster.

Instead, make one folder that is accessible from your usual workstation. Call it Work, or something more creative. Save freaking everything in your Work folder from now until the end of time. 

Step 2: Make subfolders for different types of your work

Within your Work folder, make a subfolder for each flavor of project that you do. I have individual folders for GrantsEditorials, Non research (for blog posts and other miscellaneous work), and Research.

Step 3: Numerically and descriptively label each of your projects in its own sub-subfolder

Let’s say you are about to have your first meeting for a new research project studying the effects of Fro-yo vs. ice cream on cholesterol among adults. All of the relevant documents for your next research project will now be saved in the 001 froyo ice cream folder, seen here:

(You also had a good idea for a maple syrup study that you have also started simultaneously.)

What is the threshold to start a new numerically and descriptively labeled sub-subfolder? Mine is pretty low. If I think I have at least a marginally good idea and have started emailing folks about it, I go ahead and start a new folder. A PDF copy of those initial emails are often the first things that I save in that new numerically and descriptively labeled sub-subfolder. (I loathe MS Outlook but have to use it. Its search is terrible. Saving copies of important emails saves me some headache.)

What kinds of things are you going to save into these folders? That’s up to you. I recommend saving your literature searches, copies of relevant important emails, research proposals, and so on. I like to make sub-sub-subfolders with headings like Data and analysisManuscriptsPDFs (for relevant literature saved from your library’s journal subscriptions), and Presentations. You should probably number these folders to match the overall sub-subfolder number as well.

Step 4: Name each item within your sub-sub-subfolders starting with the a) sub-subfolder name, b) a short descriptor of the thing you are writing, and c) ending with a revision and version number 

Example: You finished all of your awesome dessert-related research and are starting to write up the main findings. You are going to start your first MS Word version of the first draft. I suggest naming it like this: “001 froyo ice cream main findings r00 v01.docx“.

Using version numbers

The r00 (that’s r zero zero) means that this particular draft has never undergone a major revision from a prior draft. The v01 (that’s v zero one) means that it’s the first version ever. You’ll email out a file titled:

001 froyo ice cream main findings r00 v01.docx 

…to your primary collaborator who will send back something with edits, they’ll probably save it as:

001 froyo ice cream main findings r00 v01 BHO edits.docx

…which is nice but doesn’t help with your naming structure. Immediately save that to Work\Research\001 froyo ice cream\Manuscripts\ as version 02, or:

001 froyo ice cream main findings r00 v02.docx

Accept or refuse their edits and save that version as:

001 froyo ice cream main findings r00 v03.docx

…then send this out to more collaborators. You’ll wind up going through somewhere between 5-20 (or more) manuscript versions before you are ready to submit to a journal. What’s nice about this system is that you’ll have no problems figuring out which version of a file you are getting edits on. Remember, at >5 collaborators, the probability of one co-author checking their email <1 time per month approaches 100%. You'll be working on the 10th version of your draft and receive an email from the slow reviewer with edits to version 03. Sticking to this scheme helps you figure out how to integrate their suggestions into the current version by tracking the intermediate changes.

Note: I suggest using two digits (v01-v99) for the revision and version numbers rather than starting with 1 digit (v1-v99). This helps keeping files in alphabetical-numerical order when sorting in your file explorer. Computers aren’t smart enough to figure out that v12 doesn’t come between v1 and v2.

Using revision numbers

For some reason, the New England Journal of Medicine doesn’t accept your manuscript for publication after sending it out for review. Lucky for you, it was sent out for review and the reviewers gave you some high-quality feedback that will improve your paper. Now it’s time to start revision 01! When prepping it for the inevitable JAMA submission, you’ll be working starting on revision 01 and version 01, or:

001 froyo ice cream main findings r01 v01.docx 

In the end, you’ll have a folder looking like this:

Look at how pristine and well-organized this is!

Step 4.5: Use trailing letters for projects with multiple publications

Let’s say that you decide to write both a main findings manuscript and a subgroup analysis manuscript. How to stay organized? Just the 001 Manuscripts folder into two sub-sub-sub subfolders for each publication, with 001a for the main findings and 001b for the subgroup analysis. It’ll look like this:

The naming scheme changes slightly as filenames will start with 001a instead of 001, like this:

Step 5: For the love of all things decent, regularly back up your Work folder!!

I can’t stress this enough. Keeping your Work folder in a commercial cloud storage drive doesn’t count as backing things up. Dropbox only saves things for 30 days in their basic plan. If you aren’t actively backing up your data, you can safely assume it isn’t backed up. One dead hard drive in a non-backed up folder can mean months or years of lost work.

Remember to check your institution’s policies for determining your backup method of choice. If you are working with PHI, saving it on an unencrypted thumb drive on your keychain won’t cut it. If it’s okay with your data policies, I recommend buying a DVD-R spindle or a bunch of cheap thumb drives and setting a calendar date to write your Work folder to a different DVD-R or different thumb drive once per month (i.e., you should have this on a different thumb drive each month). Keep these in a locked, secured, fireproof area that is consistent with the policies of your organization. Better yet, keep them in multiple locked, secured, fireproof areas if possible. For bonus points, get an M-Disc DVD discs and compatible burner, which should be stable for >1,000 years, unlike the usual 10-20 years of conventional burnable DVD-Rs and CD-Rs.

Or, just speak with your friendly neighborhood IT professional about how your institution recommends backing up the data you are working with. Chances are that they have a server that they back up 3 times per day that you can just use for a modest price. It’ll be worth the sorrow of lost data and drafts.

Tomighty: The Java-powered Pomodoro app

I’m a big fan of Tomighty. I have installed and used it extensively for the past year. It’s a simple timer for use with the Pomodoro technique. It helps maximize productivity and focus.

How I use the Pomodoro technique:

  • Pick up a kitchen timer (like the ones crafted to resemble a tomato, or pomodoro in Italian).
  • Get one sticky note and a pen. Stick the note to the right side of your desk. On the top, write “Pomodoros:”, a line down write “To do:”
  • Close your email, put your phone on silent, and get away from all distractions.
  • Set the timer for one Pomodoro (25 min).
  • Work non-stop for that 25 minute span.
    • If a fleeting thought is vying to distract you during that timeframe (‘send this quick email‘, ‘remember to order some more pens‘, ‘double check that that meeting made it to my calendar‘, or ‘send that cute picture of my kid to grandma‘), jot it down on a sticky note for later.
  • When your 25 min is up, count yourself one Pomodoro on the top of your sticky note.
  • Set your timer for inter-Pomodoro time (5 min).
    • Do the simple tasks on your sticky note.
    • Say “BAM” every time you check one of those boxes on your to-do list.
      • Don’t worry about getting through them all in this 5 min span.
  • When that 5 min is up, start your next Pomodoro.

The idea is to maximize the amount of Pomodoros you do in one day, not the duration of time that you work. Aim to get 3 your first day. Try to get up to 8 a week later. Only have an hour of time to work on one day? Do two Pomodoros.

With this technique, you stop counting minutes and hours and instead focus on bite-sized Pomodoros. As the day goes on and you rack up Pomodoros, you have fewer distracting thoughts and more productive time. For me, watching the clock causes a mild, counterproductive agitation. I don’t have this response counting Pomodoros.

This system helps you optimize your 1. Productivity requiring extended focus (writing papers, etc.), and 2. Productivity requiring little focus of silly things that you are putting off for later (quick emails, etc.).

In my opinion, the only drawback of the Pomodoro technique is the kitchen timer. It’s not quiet. Unless you have an office with a door and walls with soundproofing or work from home by yourself, this timer will cause problems with neighbors. Its not-so-quiet countdown/clicking and jarring ring will annoy folks in libraries, coffee shops, or cubicle communities.

Tomighty is a Java-based applet that replicates the tomato timer, written by Célio Cidral Junior. It has a lovely countdown click and time’s-up ring. It’ll keep track of your Pomodoros. It’s free and open source and should work on any computer (PC, Mac, Linux) with Java. Though, not the stock version of Java. I have struck out many times trying to figure out the correct version of Java to install and use. Every time I figure it out, I say “I should write this down.” Then I don’t. Fortunately, there is beta version written in C++ for Windows that doesn’t require Java. Just save the folder on your desktop and run the executable.

Table 1 program

July 2019 update

I have since discovered the excellent table1_mc program and have abandoned using this program. I’m keeping this code here for historical sake. Check out my write-up of table1_mc.

Stata programs to make tables

Here are a few simple Stata programs that will write a CSV file for your Table 1.

This has three parts:

  1. Header (writes the first line of the table with names and the row with Ns),
  2. Program for continuous variables, and
  3. Program for dichotomous variables.

Each time you want to add another line your table, just call the appropriate program followed by the variable of interest.

clear all // get rid of everything in memory
webuse auto.dta, clear

set seed 12345
gen treatment = .
replace treatment = round(runiform()) // make a random treatment 
// variable that's 0 or 1

// Header + second row with Ns
quietly {
capture log close table1 // force closes any tables with the same name
log using "my_table_1.csv", text replace name(table1) //replace will erase 
// any CSV files you started already with the same name
noisily disp ",All,Group 1,Group 2" // line 1
count
local nall=r(N)
count if treatment==0
local ntreatment0=r(N)
count if treatment==1
local ntreatment1=r(N)
noisily disp "N," `nall' "," `ntreatment0' "," `ntreatment1'
log close table1
}

// Program for continuous variables
capture program drop table1_cont // drops any programs with the same name
program define table1_cont
quietly {
syntax varlist
capture log close table1
log using "my_table_1.csv", text append name(table1) // append will 
// keep writing onto existing tables
foreach var of varlist `varlist' {
sum `var'
local `var'mean = r(mean)
local `var'sd = r(sd)
local `var'n=r(N)
sum `var' if treatment==0
local `var'mean0 = r(mean)
local `var'sd0 = r(sd)
local `var'n0=r(N)
sum `var' if treatment==1
local `var'mean1 = r(mean)
local `var'sd1 = r(sd)
local `var'n1=r(N)

noisily disp "`var' (Mean (SD))," ///
%3.1f ``var'mean' " (" %3.1f ``var'sd' "),"  ///
 %3.1f ``var'mean0' " (" %3.1f ``var'sd0' "),"  ///
 %3.1f ``var'mean1' " (" %3.1f ``var'sd1' ")"
} // end varlist loop
log close table1
} // end quietly
end

// program for dichotomous variables
capture program drop table1_dichotomous
program define table1_dichotomous
quietly {
syntax varlist
capture log close table1
log using "my_table_1.csv", text append name(table1)
foreach var of varlist `varlist' {
sum `var'
local `var'n= r(N)
local `var'mean = r(mean)*100
sum `var' if treatment==0
local `var'n0 = r(N)
local `var'mean0 = r(mean)*100
sum `var' if treatment==1
local `var'n1 = r(N)
local `var'mean1 = r(mean)*100
noisily disp "`var' (N (%))," ///
``var'n' " (" %3.1f ``var'mean' "),"  ///
``var'n0' " (" %3.1f ``var'mean0' ")," ///
``var'n1' " (" %3.1f ``var'mean1' ")" 
} 
log close table1 
} 
end 
// now just call these programs as needed: 

table1_cont trunk 
table1_cont weight 
table1_dichotomous foreign 
// and so-on

Here is the output from the example above:

,All,Group 1,Group 2
N,74,43,31
trunk (Mean (SD)),13.8 (4.3),13.0 (3.9),14.8 (4.6)
weight (Mean (SD)),3019.5 (777.2),2906.0 (746.1),3176.8 (804.1)
foreign (N (%)),74 (29.7),43 (32.6),31 (25.8)

If opened with MS Excel, it will look like this: