Given this list of cities and postal codes:
TODO New York, NY 10006
Convert it to this CSV format:
TODO
“Hey,” you might say, “there’s already a comma in that data.” True, but it’s just typical punctuation.
If we were to open this list in Excel, we would end up with:
TODO:
So we need to use a simple regex to at least separate the state from the zip code.
Answer
Find (.+?), ([A-Z]{2}) (\d{5})
Replace \1,\2,\3
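If you would rather run this find-and-replace from a script than from a text editor, here is a minimal sketch using Python's `re` module. It is an illustration only: the function name `to_csv` is hypothetical, and I have assumed `\s*` in place of the gaps around the state code so that the pattern works whether or not the input has spaces there.

```python
import re

# Same three captured groups as the Find pattern: city, state, zip code.
# Assumption: \s* tolerates zero or more spaces after the comma and
# before the zip code, so "NY 10006" and "NY10006" both match.
PATTERN = re.compile(r"(.+?),\s*([A-Z]{2})\s*(\d{5})")


def to_csv(line):
    """Rewrite 'City, ST 12345' as 'City,ST,12345' (hypothetical helper)."""
    # \1, \2 and \3 are the backreferences used in the Replace field.
    return PATTERN.sub(r"\1,\2,\3", line)


print(to_csv("New York, NY 10006"))  # New York,NY,10006
```

The space inside "New York" survives because the replacement only rewrites the delimiters between the captured groups, not the contents of the groups themselves.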
Exercise: More complex addresses
Believe it or not, the easily fixed scenario above is one that I’ve seen keep people from making
perfectly usable, explorable data out of text.
However, for most kinds of text lists, the cleanup is a little more sophisticated than one extra comma.
Here’s an example in which we have to deal with street names and addresses:
50 Fifth Ave. New York, NY 10012
100 Ninth Ave. Brooklyn, NY 11416
9 Houston St. Juneau, AK 99999
2800 Springfield Rd. Omaha, NE 55555
Change to:
50,Fifth Ave.,New York,NY,10012
100,Ninth Ave.,Brooklyn,NY,11416
9,Houston St.,Juneau,AK,99999
2800,Springfield Rd.,Omaha,NE,55555
Answer
This is simply breaking each part of the line into its own separate pattern:
1. Street number: consecutive digits at the beginning of the line
2. Street name: A combination of word characters and spaces until a literal period is reached.