Using Stata and Graphviz to make social network graphs and hierarchical graphs

I recently had to make a figure that showed the relationships between variables. I tried a few different software packages and ultimately decided that Graphviz was the easiest. Thanks to dreampuf for their Graphviz online program! Because I used this web-based implementation, I didn’t have to install Graphviz on my computer. So for this, we’ll be using this online Graphviz tool: https://dreampuf.github.io/GraphvizOnline

I wrote a basic Stata script that inputs data and then outputs Graphviz code that you can copy and paste right into the Graphviz website above. (I strongly strongly strongly recommend saving your Graphviz code locally on your computer as a text file. On Windows I use Notepad++. Don’t save this code in a word processor because it will typically swap the straight quotes for curly “smart” quotes, which Graphviz won’t read as quotes.) You can then tweak the settings of the outputted Graphviz code to your liking. See all sorts of settings in the left-hand menu here: https://graphviz.org/docs/nodes/

Originally I wanted this to be a network graph (“neato”) but ultimately liked the look of the hierarchical graph (“dot”) best. You can change between graph types using the engine dropdown menu on the top right of the Graphviz online website. You can also change the output format to something that journals will accept, like PNG, on the top right.

Code to output Graphviz code from your Stata dataset

Run the following in its entirety from a Stata do file.

clear all // clear memory
//
// Now input variables called 'start' and 'end#'.
// 'start' is the originating node and 'end#' is
// every node that 'start' connects to.
// If you need additional 'end#' variables, just add them
// using strL (capital L) then the next 'end#' number.
// In this example, there's 1 'start' and 4 'end#'
// so there are 5 total columns. 
input strL start strL end1 strL end2 strL end3 strL end4
"ant" "bat" "cat" "dog" "fox"
"bat" "ent" "" "" ""
"cat" "ent" "fox" "" ""
"dog" "ent" "fox" "" ""
end // end input
// 
// The following code reshapes the data from wide
// to long and drops any rows left blank by that
// reshape (i.e., the empty 'end#' cells).
gen row = _n //make row variable
reshape long end, i(row) j(nodenum) // reshape
drop if end=="" // drop empty cells
keep start end // just need these variables
//
// The following loop renders the current Stata
// dataset as Graphviz code. Since this uses loops
// and local macros, it needs to be run all at once 
// in a do file rather than line by line. 
local max = _N
quietly {
	// Start of graphviz code:
	noisily di ""
	noisily di ""
	noisily di ""
	noisily di ""
	noisily di "// Copy everything that follows"
	noisily di "// and paste it into here:"
	noisily di "// https://dreampuf.github.io/GraphvizOnline"
	noisily di "digraph g {"
	//
	// This prints out the connection between
	// each 'start' node and all connected
	// 'end#' nodes one by one.
	forvalues n = 1/`max' {
		noisily di start[`n'] " -> " end[`n'] ";"
	}
	//
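	// Global graph attributes follow.
	// "bb" is the bounding box of the figure: lower
	// left x, y, then upper right x, y, in points.
	// Note that the Graphviz docs list "bb" as an
	// attribute written by the layout engine, so it
	// may be ignored here; "size" is the documented
	// attribute for capping the figure's dimensions.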
	// There are lots of other settings here: 
	// https://graphviz.org/docs/graph/
	// ...if adding more, just add between the final
	// comma and closing bracket. If adding several
	// additional settings here, separate each with 
	// a comma. 
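	// Note that this string is wrapped in Stata's
	// compound double quotes (a backtick plus quote to
	// open, a quote plus apostrophe to close) so the
	// quotes inside print as characters and are not
	// read as Stata string delimiters.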
	noisily di `"graph [bb="0,0,100,1000",];"' 
	//
	// The next block generates code to render each
	// node. First, we need to reshape long(er) so that
	// all of the 'start' and 'end#' variables are all
	// in a single column, delete duplicates, and 
	// sort them. 
	rename start thing1
	rename end thing2
	gen row = _n
	reshape long thing, i(row) j(nodenum) 
	keep thing
	duplicates drop
	sort thing
	//
	// Now print out settings for each node. These
	// can be fine tuned. Lots of options for 
	// node formatting here: 
	// https://graphviz.org/docs/nodes/
	local max = _N
	forvalues n= 1/`max' {
		noisily di thing[`n'] `" [width="0.1", height="0.1", fontsize="8", shape=box];"'
	}
	// End of graphviz code: 
	noisily di "}"
	noisily di "// don't copy below this line"
	noisily di ""
	noisily di ""
	noisily di ""
	noisily di ""
}
// that's it!
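
One caveat about the edge loop above: it prints node names bare. That is fine for single words like ant or bat, but DOT only accepts unquoted names made of letters, digits, and underscores, so names with spaces or punctuation need double quotes around them. Here is a minimal sketch of one way to do that, using char(34) (the double-quote character). The data are made up and there is only one 'end1' column, so treat it as a starting point rather than a drop-in replacement.

```stata
clear all
// Hypothetical example data: one node name contains a space.
input str20 start str20 end1
"ant hill" "bat"
"bat" "cat"
end
// char(34) is the double-quote character. Gluing it onto each
// side of a name with + prints the name wrapped in quotes,
// for example: "ant hill" -> "bat";
forvalues n = 1/`=_N' {
	di char(34) + start[`n'] + char(34) + " -> " + char(34) + end1[`n'] + char(34) + ";"
}
```

In the full script, the same wrapping would also need to go on the node-settings line, so the names in the edges and the names in the node list still match.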

The above Stata code prints the following Graphviz code in the Stata Results window. This code can be copied and pasted into the Graphviz website linked above. (Make sure to save a backup of this Graphviz code as a txt file on your computer!!) Make sure your Stata window is full size before running the above Stata code. The output width is (usually) determined by the window size, so a narrow window might insert line breaks that you have to delete manually. Likewise, if your node settings get long, Stata will wrap those lines and you’ll have to delete those line breaks too.





// Copy everything that follows
// and paste it into here:
// https://dreampuf.github.io/GraphvizOnline
digraph g {
ant -> bat;
ant -> cat;
ant -> dog;
ant -> fox;
bat -> ent;
cat -> ent;
cat -> fox;
dog -> ent;
dog -> fox;
graph [bb="0,0,100,1000",];
ant [width="0.1", height="0.1", fontsize="8", shape=box];
bat [width="0.1", height="0.1", fontsize="8", shape=box];
cat [width="0.1", height="0.1", fontsize="8", shape=box];
dog [width="0.1", height="0.1", fontsize="8", shape=box];
ent [width="0.1", height="0.1", fontsize="8", shape=box];
fox [width="0.1", height="0.1", fontsize="8", shape=box];
}
// don't copy below this line
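
If the wrapped lines get annoying, one option is to widen Stata's output with "set linesize 255" (255 is the maximum Stata allows). Another is to skip the Results window and write the Graphviz code straight to a text file with Stata's file commands, which also takes care of the backup. Below is a minimal sketch, assuming a hypothetical file name (mygraph.gv) and a cut-down edge list with a single 'end1' column. The graph and node settings lines would be written the same way.

```stata
clear all
// Cut-down example edges (str10 instead of strL to keep
// the sketch simple).
input str10 start str10 end1
"ant" "bat"
"ant" "cat"
"bat" "ent"
end
// "mygraph.gv" is a hypothetical file name. It is saved in
// Stata's current working directory (run -pwd- to see it).
tempname fh
file open `fh' using "mygraph.gv", write replace text
file write `fh' "digraph g {" _n
// Expressions go in parentheses; _n ends each line.
forvalues n = 1/`=_N' {
	file write `fh' (start[`n']) " -> " (end1[`n']) ";" _n
}
file write `fh' "}" _n
file close `fh'
// Show the finished file in the Results window.
type "mygraph.gv"
```

Because the lines go to a file rather than the screen, the window width no longer matters, and the file can be opened in Notepad++ and pasted into the website from there.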






Example figures from outputted code above using the different Graphviz engines

Clicking the dropdown in the top right “engine” toggles between the figures below. You can learn more about these here: https://graphviz.org/docs/layouts/
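
The engine can also be named in the Graphviz code itself through the graph-level "layout" attribute, which might be handy if you want the choice saved with your text file. Here is a one-line sketch of what the Stata script could print. I haven't confirmed whether the online tool gives this attribute priority over the dropdown, so treat that part as an assumption.

```stata
// Hypothetical extra line for the Stata script: print a
// graph-level "layout" attribute naming the engine.
di "graph [layout=neato];"
```

Inside the quietly block this would be "noisily di", and the printed line can go anywhere between the opening "digraph g {" line and the closing bracket.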

Not shown below are “nop” and “nop2”, which don’t render correctly, most likely because they expect a graph whose node positions have already been laid out, which this code doesn’t provide. Some of the others will need to be tweaked to be publication quality, and some of them frankly don’t work with this dataset. For this made-up data, I think dot and neato look great!

Dot (hierarchical or layered drawing of directed graphs, my favorite for this project):

Neato (a nice network graph, called “spring model” layout):

Circo, aka circular layout:

fdp (force-directed placement):

sfdp (scalable force-directed placement):

twopi (radial layout):

osage (clustered graphs):

Patchwork (clustered graph using squarified treemap):