Friday, August 14, 2009

Finding email addresses in a text file

I got stuck with a project recently that entailed pulling email addresses out of a flat text file.
This request was surprisingly trickier than I thought because of all the variations of acceptable email addresses. Anyway, I ended up finding something on the net that did most of what I needed, so I'm posting it here in case anybody else needs something similar. However, I do recommend eyeballing the output to verify it, because there are a few problem cases with it. I'll put some more time in on this one later on, but for simple jobs this seems to get it done.


#!/bin/bash
## This is a quickie script to pull email addresses out of a flat text file.
## It reads the text from standard input.
## However, there are some shortcomings and bugs in it that still need
## to be fixed:
## 1) If the email host has more than one dot in its name, the second dot
##    and everything after it get lost. For example, with
##    foo@tampabay.rr.com whatever comes after the .rr is dropped.
## 2) Sometimes the recipient gets mangled. Haven't quite figured out
##    the pattern to this bug, but it will require visual inspection.
##

egrep -o "\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}"
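
Both problems noted in the script comments appear to come from the same spot in the pattern: each `([._-]\w)*` group allows only a single character after a dot, underscore, or hyphen. That is why foo@tampabay.rr.com stops at .rr, and it would also explain a recipient such as john.smith@example.com coming out as smith@example.com. Below is a minimal sketch of one possible fix, not a full email validator. It assumes GNU grep (`\w` and `-o` are extensions), and `extract_emails` is just a hypothetical helper name for illustration.

```shell
#!/bin/bash
# Sketch of a possible fix: each group now ends in \w+ instead of a single \w,
# so a separator may be followed by more than one character.
# extract_emails is a hypothetical helper name; it reads the files given as
# arguments, or standard input when there are none.
extract_emails() {
    grep -Eo '\w+([._-]\w+)*@\w+([._-]\w+)*\.\w{2,4}' "$@"
}

# Example: both addresses should now come out whole, one per line.
printf 'mail foo@tampabay.rr.com or john.smith@example.com\n' | extract_emails
```

The `{2,4}` limit on the last part is kept from the original, so an address whose top-level domain is longer than four characters would still be cut short. Eyeballing the output is still a good idea.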



As I said, I'm working on a better version of this and I'll post it when I can figure one out.

1 comment:

erandasapu said...

I need to know how to search for an email address in a text file and use it to replace the word 'EMAIL' in an HTML file.
http://www.pictube1.blogspot.com/