Friday, August 14, 2009

Finding email addresses in a text file

I got stuck with a project recently which entailed pulling email addresses out of a flat text file.
This request was surprisingly trickier than I thought because of all of the variations of acceptable email addresses. Anyway, I ended up finding something out on the net that did most of what I needed, so I'm posting it here in case anybody else needs something similar. However, I do recommend eyes-on data verification because there are a few problem cases with it. I'll put some more time in on this one later on, but for simple jobs this seems to get it done.


#!/bin/bash
## This is a quickie script to pull email addresses out of a flat text file.
## However, there are some shortcomings and bugs in it that still need
## to be fixed:
## 1) if the email host has more than 1 dot in the name, the second dot
## and everything after it get lost. For example, with foo@tampabay.rr.com
## whatever comes after the .rr is lost
## 2) sometimes the recipient gets mangled. Haven't quite figured out
## the pattern to this bug, but it will require visual inspection
##
## Reads the files named on the command line, or stdin if none are given.

egrep -o "\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}" "$@"



As I said, I'm working on a better one of these and I'll post it when I can figure one out.
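
In the meantime, here is one possible explanation for both bugs, offered as a guess rather than a confirmed diagnosis. In the pattern above, each separator group `([._-]\w)` allows only a single character after the dot, dash or underscore, so a host like tampabay.rr.com or a recipient like john.smith cannot match in full. The sketch below changes those groups to `([._-]\w+)`. The function name extract_emails is made up for illustration, and `\w` and `-o` are GNU grep extensions, so this assumes GNU grep.

```shell
#!/bin/bash
# Sketch only: same pattern as the original script, except that each
# [._-] separator may be followed by one or more word characters.
# extract_emails is a hypothetical helper name, not part of any tool.
extract_emails() {
    # Reads the named files, or stdin if none are given.
    grep -Eo '\w+([._-]\w+)*@\w+([._-]\w+)*\.\w{2,4}' "$@"
}
```

You would call it as `extract_emails addresses.txt` or at the end of a pipe. The last part of the host is still limited to 2 to 4 characters, so longer endings get cut short, and eyeballing the results is still a good idea.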

Thursday, August 13, 2009

Data scrubbing with sed

This is just a quickie. I recently had to do some cleanup work on a database
that had irregular column separators. There were single tabs, multiple tabs
mixed with single white spaces, and multiple white spaces. Here's a quick one-liner
in sed that will clean those up and leave you with just single white spaces.


#!/bin/bash
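# first we strip off the tabs and replace them with white spaces,
# then we squeeze each run of white spaces down to a single one
# (note the two spaces before the * in the second expression)
sed -e 's/\t/ /g' -e 's/  */ /g' "$1"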
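
The `\t` escape in the one-liner above is understood by GNU sed but not by every older sed. A variation that should behave the same way on any POSIX sed uses the `[[:blank:]]` character class, which covers both spaces and tabs. This is a minimal sketch, and the function name squeeze_blanks is made up for illustration.

```shell
#!/bin/bash
# Sketch only: collapse every run of spaces and/or tabs into one space.
# squeeze_blanks is a hypothetical helper name.
squeeze_blanks() {
    # Reads the named files, or stdin if none are given.
    sed 's/[[:blank:]][[:blank:]]*/ /g' "$@"
}
```

Since it handles tabs and spaces in one expression, there is no need for the separate tab-replacement step.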

Wednesday, February 18, 2009

: bad interpreter: No such file or directory

Hello and welcome to this week's edition of Practical Shell Scripting. This is one for the newbies and cross-platform folks. If you've ever worked in a cross-platform development environment, you know there are several obstacles you will run into, with issues like directory paths and permissions. But one of the sneakiest of these obstacles is that Unix does not treat carriage return characters as part of a line ending. In fact they will really foul things up, and this can occur from a couple of different sources. I'll give you some examples of these in a few moments, but first let me lay out the situation that I always encounter.

I write some script to do some arbitrary thing and test it to make sure it works. Then I check it into our revision control system. A few weeks later someone else edits the file and checks it back in, and suddenly it doesn't work. When you try to execute it you get

: bad interpreter: No such file or directory

How can this be? You're looking right at the file and still it won't execute. Here's what happened:
the last person to edit the file did so using a Windows-based editor (like WordPad), and that editor placed a carriage return at the end of each line in the script. The system then goes looking for an interpreter named /bin/bash followed by a carriage return, which doesn't exist. If you cat the file or open it in vi it looks fine, and yet it gives you this error. To see the problem, view the file using `cat -t`, which will show you the hidden characters. The carriage returns look like ^M at the end of each line. There are a couple of fixes for this. On more current Unix distributions there is a utility called dos2unix which will strip the ^M out of the file; on older Unixes you can use a sed command like this:

sed 's/^M//g' FileThatsMessedUp.sh > FileThatsFixed.sh

The trick is that the ^M is not a caret followed by an M. You create it by pressing Ctrl and the letter v at the same time, and then Ctrl and the letter m. Now view your FileThatsFixed.sh using the cat -t command and look for the ^M's. They should be gone.
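
If typing the control character is a nuisance, tr can do the same job because it understands the `\r` escape for a carriage return. This is a minimal sketch, and the function name strip_cr is made up for illustration. Note that it removes every carriage return in the input, not just the ones at the ends of lines, which is normally what you want for a shell script.

```shell
#!/bin/bash
# Sketch only: delete all carriage returns from stdin and write the
# result to stdout. strip_cr is a hypothetical helper name.
strip_cr() {
    tr -d '\r'
}
```

Usage follows the same shape as the sed command: `strip_cr < FileThatsMessedUp.sh > FileThatsFixed.sh`.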

One other sneaky part of this is that FileThatsFixed.sh is a brand new file created by the redirect, so it will not have execute permissions. You'll need to go back and set those with chmod +x. I'm not sure if this will survive porting to HTML, but I'll give it a go and see what happens.

FileThatsMessedUp.sh

#!/bin/bash

echo "foo"
echo "foo"
echo "Just testing"


FileThatsFixed.sh

#!/bin/bash

echo "foo"
echo "foo"
echo "Just testing"


If these don't work for you, just copy a file onto your Windows box, open it up with Notepad or WordPad, and give it a try.

That's it for this week.

Monday, January 12, 2009

A little tee please

Welcome to this week's exciting new episode of Practical Shell Scripting. I'm under the gun to get a project done at work, so this week's is a quickie. If you ever wanted to pipe output to both a file and the screen, here's a fast one-liner that will do it for you.

#!/bin/bash

LOGFILE="test.log"


echo "foo" | tee -a "$LOGFILE"



The trick is tee itself, which prints to stdout as well as to whatever file you give it. The -a is for append, so the log file grows with each run instead of being overwritten.
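
If you log from several places in a script, the one-liner can be wrapped in a small function. This is only a sketch, and the name log_line and its argument order are made up for illustration.

```shell
#!/bin/bash
# Sketch only: print a message to the screen and append it to a log file.
# log_line is a hypothetical helper; the first argument is the log file,
# the rest is the message.
log_line() {
    local logfile="$1"
    shift
    echo "$*" | tee -a "$logfile"
}
```

Calling `log_line test.log "foo"` twice leaves two lines in test.log. Drop the -a and the second call would wipe out the first line.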

Tuesday, January 6, 2009

Inline functions in bash

Hello and welcome to this week's edition of Practical Shell Scripting. This week I want to give a couple of examples of one of the coolest things that you can do with shell scripts. I don't know if there is an actual name for this functionality, but I have always referred to them as inline functions because they remind me of the inline functions from C++. The basic deal is that you can set up one-liners within a shell script to derive a certain value just as if you were running those commands from a shell. Here's a small shell script with a couple of examples to demonstrate what I'm talking about.

#!/bin/bash

##--------CheckForOrangeCount ----------##
CheckForOrangeCount()
{
    # Set the file name from the argument before testing for the file
    FILE_NAME="$1"

    if [ -f "$FILE_NAME" ]; then
        SomeValue=`grep oranges $FILE_NAME | cut -d" " -f2`
        echo "There are $SomeValue oranges"
    else
        echo "No file to check out"
    fi
}

##-------- main----------##

UPTIME=$(cat /proc/uptime | cut -f1 -d' ' | cut -f1 -d'.')
echo "Your system has been up for $UPTIME seconds"



if [ "$1" = "" ]; then
    echo "You got no incoming files to work on"
else
    echo "You've got $1"
    FILE_NAME="$1"
    CheckForOrangeCount "$FILE_NAME"
fi

##--------------------------------------------------------------------------##


There are two instances of these in this piece of code. The first one to show up
tells us how many seconds the system has been running since the last boot.

UPTIME=$(cat /proc/uptime | cut -f1 -d' ' | cut -f1 -d'.')

The next one tells us the number of oranges in a text file, presumably a grocery list of some sort.

SomeValue=`grep oranges $FILE_NAME | cut -d" " -f2`

Notice that there are two different formats here. The first one uses $( ) to retrieve the value for how long the system has been up, while the second one wraps the command in backticks, which is the older syntax. Bash calls both forms command substitution, and they are essentially identical from a functional point of view.

These are two very simple examples, but you can make these as complicated or as simple as you like. Just remember that there's always a good chance that you will need to decipher them at some time in the future.
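
One practical difference between the two formats shows up once you put one inside another. The sketch below runs the same commands both ways. The sample text "oranges 12" and the path /usr/local/bin/tool are made up for illustration, and the path does not need to exist because dirname and basename only work on the string.

```shell
#!/bin/bash
# Sketch only: the same captures written in both formats.

# Simple case: both formats read the same way.
new_style=$(echo "oranges 12" | cut -d" " -f2)
old_style=`echo "oranges 12" | cut -d" " -f2`

# Nested case: $( ) nests as-is, backticks need the inner pair escaped.
nested_new=$(basename "$(dirname /usr/local/bin/tool)")
nested_old=`basename \`dirname /usr/local/bin/tool\``

echo "$new_style $old_style $nested_new $nested_old"
```

Both simple captures give 12 and both nested ones give bin, but the $( ) version is a lot easier to read six months later.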

That's it for this week. As always, if you have any questions or comments please feel free to contact me, and happy coding.