Wednesday, August 18, 2010

Unix cut command examples

I'm sure many of you are familiar with the Unix cut command, but what you may not know is that it can cut not only single fields but also ranges of fields, as well as character columns.
For instance, if you want to see fields 27-30 of a pipe-delimited file, try:
cat file | cut -d"|" -f27-30

and if you just want to see fields 27 and 30 without the stuff in between, try:
cat file | cut -d"|" -f27,30
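
The three selection styles can be tried on a throwaway file. This is a minimal sketch: sample.txt and its contents are made up for illustration, and -c is the cut option that selects character positions rather than delimited fields.

```shell
# build a small pipe-delimited sample file (hypothetical data)
printf '%s\n' 'a|b|c|d|e' '1|2|3|4|5' > sample.txt

# a range of fields: fields 2 through 4
range=$(cut -d"|" -f2-4 sample.txt)

# two separate fields: 2 and 4, skipping 3
pair=$(cut -d"|" -f2,4 sample.txt)

# character columns instead of fields: characters 1 through 3
chars=$(cut -c1-3 sample.txt)

echo "$range"
echo "$pair"
echo "$chars"
```

Note that the character range counts the delimiters too, since -c knows nothing about fields.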

sort records by fields with sort

I just found a very neat little jewel of a use for the unix sort command. Apparently you can use sort with the -t option and specify fields like so:

cat onlyequipment.txt | sort -t"|" -k3 > onlyequipment.sorted.txt

Given this command, sort will order the records of onlyequipment.txt by the 3rd field of the pipe-delimited file. With input such as:

38320|E|STENTM|20100518|1445|CYSTOM|30||
4871|E|US/BX|20100617|0800|US/BX|45||
40359|E|CYSTM1|20100726|1530|CYSTOM|30|
29566|E|STENTM|20100414|0945|CYSTOM|30||
45995|E|US/BX|20100830|1315|US/BX|30|||
44196|E|US/BX|20100609|0800|US/BX|45||3
18699|E|STENTM|20100621|0830|CYSTOM|30||4 WKS C
35816|E||20100805|0800|CYSTOF|0||1Y - F
40880|E||20100316|0815|CYSTOF|0||6 M CYSTO|100316
41071|E|CYSTM1|20100721|1445|CYSTOM|30||3WK PER BLOI
24512|E|US/BX|20100421|0800|US/BX|45||TRUS BX-GIVE

after the records have been sorted and redirected into onlyequipment.sorted.txt, you will get this:

40880|E||20100316|0815|CYSTOF|0||6 M CYSTO|100316
35816|E||20100805|0800|CYSTOF|0||1Y - F
41071|E|CYSTM1|20100721|1445|CYSTOM|30||3WK PER BLOI
40359|E|CYSTM1|20100726|1530|CYSTOM|30|
29566|E|STENTM|20100414|0945|CYSTOM|30||
38320|E|STENTM|20100518|1445|CYSTOM|30||
18699|E|STENTM|20100621|0830|CYSTOM|30||4 WKS C
24512|E|US/BX|20100421|0800|US/BX|45||TRUS BX-GIVE
44196|E|US/BX|20100609|0800|US/BX|45||3
4871|E|US/BX|20100617|0800|US/BX|45||
45995|E|US/BX|20100830|1315|US/BX|30|||

Pretty neat, huh?
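
One detail worth knowing: with the -k option, -k3 makes the sort key run from field 3 to the end of the line, while -k3,3 stops the key at the end of field 3. The exact ordering also depends on the locale, so this sketch pins LC_ALL=C to keep it repeatable. The file name equip.txt and its records are invented for the example.

```shell
# small pipe-delimited sample (hypothetical records)
printf '%s\n' '3|E|US/BX|20100617' '1|E|STENTM|20100518' '2|E|CYSTM1|20100726' > equip.txt

# sort on field 3 only; LC_ALL=C forces plain byte-order comparison
sorted=$(LC_ALL=C sort -t"|" -k3,3 equip.txt)

echo "$sorted"
```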

Saturday, April 3, 2010

A quick Perl script to solve that same problem

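#!/usr/bin/perl
use strict;
use warnings;

## check for existence of the file ##
if (-e "./test2.txt") {
    print "File exists!\n";
} else {
    print "File does not exist\n";
}

print "\n";

## open the file for reading and report the real reason on failure ##
open(my $fh, '<', "test2.txt")
    or die("Error: cannot open file 'test2.txt': $!\n");
my $lnum = 1;

while (my $line = <$fh>) {
    chomp($line);
    ## split the comma-delimited record into its first four columns ##
    my ($fee, $fi, $fo, $fum) = split(/,/, $line);
    $fi = '' unless defined $fi;
    print "COLUMN 2 is: $fi\n";
    ## flag records whose second column starts with a period ##
    if ($fi =~ /^\./) {
        print "found one\n";
    }
    print "$lnum| $line\n";
    $lnum++;
}

close($fh);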
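
For comparison, the same check fits in a single awk command run from the shell. This is only a sketch of an alternative, not a replacement for the script above. It assumes the same comma-delimited layout, and sample.csv is a made-up file.

```shell
# hypothetical comma-delimited input; only the second record has a
# column 2 that starts with a period
printf '%s\n' 'a,b,c,d' 'e,.f,g,h' 'i,j,.k,l' > sample.csv

# print the line number and the record wherever column 2 begins with a period
found=$(awk -F, '$2 ~ /^\./ { print NR "| " $0 }' sample.csv)

echo "$found"
```

The third record has a period at the start of column 3, so it is not reported.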

More on records that have a period at the beginning

#!/bin/bash

## one possible method to find all of the second-column values that start with a period ##
cat "$1" | cut -d"|" -f2 | grep '^\.' > badrecords.txt

## another possible method: let cut read the file directly instead of going through cat
cut -d"|" -f2 "$1" | grep '^\.' > badrecords.txt

## find all records that have a pipe followed by a period
grep '|\.' test1.txt

### the quick sed-style substitute commands, typed in vi, to make global switches ##
:.,$s/|/,/g ## translates all pipes to commas
:.,$s/,/|/g ## translates all commas to pipes
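
The same substitution can be run outside the editor. Here is a minimal sketch with sed on a short sample record; tr does the same job when the swap is one character for another.

```shell
line='38320|E|STENTM|20100518'

# translate every pipe to a comma with sed (the g flag replaces all matches on the line)
commas=$(printf '%s\n' "$line" | sed 's/|/,/g')

# translate the commas back to pipes with tr
pipes=$(printf '%s\n' "$commas" | tr ',' '|')

echo "$commas"
echo "$pipes"
```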

Thursday, April 1, 2010

Finding records in which some words start with a period

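This is a real quickie. How do you find any record in which some of the words begin with a "."?
Here's the answer:
grep -E '(^|[[:space:]])\.' file.txt

The pattern matches a period at the start of the line or right after whitespace, so a period in the middle of a word (as in file.txt) does not count.

If you want to loop through more than one file, try this:

#!/bin/bash
for file in *.txt
do
echo "searching $file"
grep -E '(^|[[:space:]])\.' "$file"
done


To find all of the records that don't have words starting with a ".", try:
grep -v -E '(^|[[:space:]])\.' file.txt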
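
A quick way to check a pattern like this is to run it on a few known lines. The sketch below uses a pattern that treats a word as starting at the beginning of the line or after whitespace. The file words.txt and its contents are invented for the test.

```shell
# hypothetical test file: only lines 2 and 3 contain a word starting with a period
printf '%s\n' 'plain record here' 'this has a .hidden word' \
    '.starts with a period' 'name.ext is not a match' > words.txt

# -E enables extended regexes; -c counts matching lines instead of printing them
with=$(grep -E -c '(^|[[:space:]])\.' words.txt)

# -v inverts the match, so this counts lines with no such word
without=$(grep -E -v -c '(^|[[:space:]])\.' words.txt)

echo "with: $with, without: $without"
```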