Extracting numbers from strings in Excel

Stata’s great at taking raw numbers and chugging out graphs with minimal edits. Often times you’ll get results that aren’t raw numbers, but instead will exist as a string. Instead, of getting:

…you’ll get

My previous strategy has been to manually extract these numbers into rows and columns. I just came across these two pages of Excel strategies that will pluck the numbers in the second picture and produce the first one!

Step 1: Split strings into separate columns.

Reference: https://support.office.com/en-us/article/split-text-into-different-columns-with-functions-49ec57f9-3d5a-44b2-82da-50dded6e4a68

Here, we’ll split the string at the spaces. There are two spaces, so you’ll end up with three new variables. First, make three new ‘temp’ columns to the right. You’ll need a different code for each section’s destination cell.

Left bit, or “0.80”:

=LEFT(A2, SEARCH(" ",A2,1))

Middle bit, or “(0.70,”:

=MID(A2,SEARCH(" ",A2,1)+1,SEARCH(" ",A2,SEARCH(" ",A2,1)+1)-SEARCH(" ",A2,1))

Right bit, or “0.90)”:

=RIGHT(A2,LEN(A2)-SEARCH(" ",A2,SEARCH(" ",A2,1)+1))

This assumes that your string of interest is sitting cell A2. Change this cell reference as needed. If all goes well, you should have new cells that look like this:

Step 2: Pluck numbers from strings.

Reference: https://www.ablebits.com/office-addins-blog/2017/11/22/excel-extract-number-from-string/

Welp, there’s still some non-numeric text here. Time to pluck out the raw numbers! We’ll pretend that the ‘hrtemp’ cell also has a non-numerical character in it (e.g., a percent sign) for completeness’ sake. (Excel actually considers it a string still, which is why it’s not showing as “0.8”). Make 3 new rows to the right and use this code to extract the raw numbers contained in the string:

=(SUMPRODUCT(MID(0&B2, LARGE(INDEX(ISNUMBER(--MID(B2, ROW(INDIRECT("1:"&LEN(B2))), 1)) * ROW(INDIRECT("1:"&LEN(B2))), 0), ROW(INDIRECT("1:"&LEN(B2))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(B2)))/10))/100

…obviously, you’ll need to change the B2 cell to C2 and D2 as needed. You should get:

Boom! You should be able to copy to strings in subsequent rows by hovering over the bottom right of each cell and dragging down. Now you can “import excel” in stata and use your hr, low95, and high95 variables!