This is a quick but very helpful little script I found on the net for padding variables.
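#!/bin/bash
# lpad WORD WIDTH CHAR: pad WORD on the left with CHAR until it is WIDTH characters long
function lpad
{
    local word="$1"
    while [ ${#word} -lt "$2" ]; do
        word="$3$word"
    done
    echo "$word"
}
# rpad WORD WIDTH CHAR: pad WORD on the right with CHAR until it is WIDTH characters long
function rpad
{
    local word="$1"
    while [ ${#word} -lt "$2" ]; do
        word="$word$3"
    done
    echo "$word"
}
lpad 3 5 0        # prints 00003
rpad fooby 20 ^   # pads fooby with ^ out to 20 characters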
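The loop above adds one character per pass, which is fine for short strings. As a rough alternative sketch, bash's printf can build the whole run of padding in one step. The names lpad_fast and rpad_fast are my own, and this assumes the pad string is a single ordinary character.

```shell
#!/bin/bash
# Sketch only: lpad_fast and rpad_fast are hypothetical names, not part of the original script.
# Assumes the fill argument is a single ordinary character such as 0 or ^.

# make_pad COUNT CHAR: print CHAR repeated COUNT times (nothing if COUNT <= 0)
make_pad() {
    local count=$1 fill=$2 pad=""
    if (( count > 0 )); then
        printf -v pad '%*s' "$count" ''   # a run of COUNT spaces
        pad=${pad// /$fill}               # swap every space for the fill character
    fi
    printf '%s' "$pad"
}

lpad_fast() {
    local word=$1 width=$2 fill=$3
    printf '%s%s\n' "$(make_pad $(( width - ${#word} )) "$fill")" "$word"
}

rpad_fast() {
    local word=$1 width=$2 fill=$3
    printf '%s%s\n' "$word" "$(make_pad $(( width - ${#word} )) "$fill")"
}

lpad_fast 3 5 0
rpad_fast fooby 8 ^
```

A word that is already at least as long as the requested width comes back unchanged, same as with the loop version.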
Wednesday, August 17, 2011
Monday, March 7, 2011
Finding Blank spaces and fields in awk and perl
I was recently working on a project where the customer wanted a report that prints the value of one field if another field is blank. Specifically, in this case, if field 24 is blank, print the value in field 2 instead. I believe this particular case was for claimants in insurance records, but it could just as easily have been cell phone vs. main phone, or primary vs. secondary address.
Anyway, here's the regular expression to search for fields that are completely blank. Note the ^ at the start and the $ at the end, which anchor the match to the beginning and the end of the field, and the [ ]*, a bracket expression holding a single space followed by *, which matches zero or more spaces.
##-- this checks if $24 is empty or contains nothing but blanks ##
if ($24 ~ /^[ ]*$/)
{
printf("%s|",$2);
}
else
{
printf("%s|",$24);
}
This same regular expression should work pretty well in Perl with a match test, but I'd try it out before I swear to it.
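To try the idea without a 24-field file, here is a small sketch on made-up three-field records, where field 3 stands in for field 24. The function name blank_fallback and the sample data are assumptions of mine, not from the original report.

```shell
#!/bin/bash
# Sketch only: blank_fallback and the sample records are hypothetical.
# For each pipe-delimited record, print field 3 unless it is empty or all spaces,
# in which case fall back to field 2.
blank_fallback() {
    awk -F'|' '{
        if ($3 ~ /^[ ]*$/)
            print $2
        else
            print $3
    }'
}

printf '%s\n' '1|main|cell' '2|main|   ' '3|main|' | blank_fallback
```

The second and third records fall back to field 2 because their third field is either all spaces or empty.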
Wednesday, August 18, 2010
Unix cut command examples
I'm sure many of you are familiar with the Unix cut command, but what you may not know is that it can cut not only single fields, but also ranges of fields, as well as character columns.
For instance, if you want to see fields 27-30 of a pipe-delimited file, try:
cat file | cut -d"|" -f27-30
and if you just want to see fields 27 and 30 without the rest of the stuff in between, try:
cat file | cut -d"|" -f27,30
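Here is a quick sketch of the same options on a short made-up record, so the results are easy to check by eye. The variable name line and its contents are just for illustration.

```shell
#!/bin/bash
# Sketch only: a hypothetical five-field record.
line='a|b|c|d|e'

echo "$line" | cut -d'|' -f2-4   # a range of fields: b|c|d
echo "$line" | cut -d'|' -f2,4   # two separate fields: b|d
echo "$line" | cut -c1-3         # character columns 1 through 3: a|b
```

Note that -c counts characters, so the delimiter itself takes up a column.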
sort records by fields with sort
I just found a very neat little jewel of a use for the Unix sort command. Apparently you can use sort with the -t option to set the field delimiter and -k to pick the field where the sort key starts (older versions of sort spelled this +2, counting fields from zero), like so:
cat onlyequipment.txt | sort -t"|" -k3 > onlyequipment.sorted.txt
Given this command, sort will order the records of the file onlyequipment.txt by the 3rd field of the pipe-delimited file. Take input such as:
38320|E|STENTM|20100518|1445|CYSTOM|30||
4871|E|US/BX|20100617|0800|US/BX|45||
40359|E|CYSTM1|20100726|1530|CYSTOM|30|
29566|E|STENTM|20100414|0945|CYSTOM|30||
45995|E|US/BX|20100830|1315|US/BX|30|||
44196|E|US/BX|20100609|0800|US/BX|45||3
18699|E|STENTM|20100621|0830|CYSTOM|30||4 WKS C
35816|E||20100805|0800|CYSTOF|0||1Y - F
40880|E||20100316|0815|CYSTOF|0||6 M CYSTO|100316
41071|E|CYSTM1|20100721|1445|CYSTOM|30||3WK PER BLOI
24512|E|US/BX|20100421|0800|US/BX|45||TRUS BX-GIVE
After the file has been sorted and the output redirected into onlyequipment.sorted.txt, you will get this:
40880|E||20100316|0815|CYSTOF|0||6 M CYSTO|100316
35816|E||20100805|0800|CYSTOF|0||1Y - F
41071|E|CYSTM1|20100721|1445|CYSTOM|30||3WK PER BLOI
40359|E|CYSTM1|20100726|1530|CYSTOM|30|
29566|E|STENTM|20100414|0945|CYSTOM|30||
18699|E|STENTM|20100621|0830|CYSTOM|30||4 WKS C
24512|E|US/BX|20100421|0800|US/BX|45||TRUS BX-GIVE
44196|E|US/BX|20100609|0800|US/BX|45||3
4871|E|US/BX|20100617|0800|US/BX|45||
45995|E|US/BX|20100830|1315|US/BX|30|||
Pretty neat, huh?
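One refinement worth knowing: a key given as a single field number runs from that field to the end of the line. If you want the key to be the third field and nothing else, you can give it an end as well. A small sketch, with a function name (sort_by_third_field) and trimmed-down sample records of my own, and LC_ALL=C set so the ordering does not depend on locale:

```shell
#!/bin/bash
# Sketch only: sort_by_third_field and the shortened records are hypothetical.
# -k3,3 limits the sort key to field 3; LC_ALL=C forces plain byte ordering.
sort_by_third_field() {
    LC_ALL=C sort -t'|' -k3,3
}

printf '%s\n' \
    '4871|E|US/BX|20100617' \
    '40359|E|CYSTM1|20100726' \
    '35816|E||20100805' | sort_by_third_field
```

The record with the empty third field sorts first, then CYSTM1, then US/BX.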
Saturday, April 3, 2010
A quick perl script to solve that same problem
#!/usr/bin/perl
## check for existence of a file ##
if (-e "./test2.txt") {
print "File exists! \n";
}else {
print "File does not exist";
}
print "\n ";
#use strict;
open(MYDATA, "test2.txt") or
die("Error: cannot open file 'test2.txt'\n");
my $line;
my $lnum = 1;
while( $line = <MYDATA> ){
chomp($line);
# chop;
($fee, $fi, $fo, $fum) = (split(/,/, $line));
print "COLUMN 2 is: $fi \n ";
if ( $fi =~ /^\./ ) {
print "found one \n";
}
print "$lnum| $line\n";
$lnum++;
}
close MYDATA;
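For comparison, roughly the same check can be sketched as an awk one-liner from the shell. This assumes the same comma-delimited layout as test2.txt; the function name find_dot_fields is my own, and unlike the Perl script it prints only the matching records.

```shell
#!/bin/bash
# Sketch only: find_dot_fields is a hypothetical name.
# Print the line number and record for every line whose second
# comma-separated field starts with a period.
find_dot_fields() {
    awk -F',' '$2 ~ /^\./ { print NR "| " $0 }' "$1"
}

# Try it on a throwaway file with made-up records
tmpfile=$(mktemp)
printf '%s\n' 'a,.b,c,d' 'e,f,g,h' 'i,.j,k,l' > "$tmpfile"
find_dot_fields "$tmpfile"
rm -f "$tmpfile"
```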
More on records that have a period at the beginning
#!/bin/bash
## one possible method to find all of the columns that start with period ##
cat $1 | cut -d"|" -f2 | grep '^\.' > badrecords.txt
## another possible method, letting cut read the file directly
cut -d"|" -f2 "$1" | grep '^\.' > badrecords.txt
## find all records that have a pipe followed by a period
grep '|\.' test1.txt
### the quick vi (ex) commands to make global switches, from the current line to the end of the file ##
## :.,$s/|/,/g   translates all pipes to commas
## :.,$s/,/|/g   translates all commas to pipes
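If you would rather make the same switches from the command line than inside the editor, here is a sketch using sed. The function names pipes_to_commas and commas_to_pipes are made up for this example; both read the files given as arguments, or standard input if there are none.

```shell
#!/bin/bash
# Sketch only: hypothetical helper names wrapping plain sed substitutions.
pipes_to_commas() {
    sed 's/|/,/g' "$@"   # in a basic regular expression, | is a literal pipe
}

commas_to_pipes() {
    sed 's/,/|/g' "$@"
}

echo 'a|b|c' | pipes_to_commas
echo 'a,b,c' | commas_to_pipes
```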
Thursday, April 1, 2010
Finding records in which some words start with a period
This is a real quickie. How do you find any record in which some of the words begin with a "."?
Here's the answer:
grep -E '(^|[[:space:]])\.' file.txt
That pattern matches a period at the start of the line or right after whitespace. If you want to loop through more than one file, try this:
#!/bin/bash
for file in *.txt
do
    echo "searching $file"
    grep -E '(^|[[:space:]])\.' "$file"
done
To find all of the records that don't have words starting with a ".", try:
grep -v -E '(^|[[:space:]])\.' file.txt
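To see what a word-start pattern does and does not catch, here is a small sketch against a few made-up lines. The pattern (^|[[:space:]])\. is one way to say "a period at the start of the line or right after whitespace"; the sample text and the variable names are hypothetical.

```shell
#!/bin/bash
# Sketch only: made-up sample lines, two with a word that starts with a period.
sample=$'call .net support\nend of sentence.\n.hidden file\nversion 1.2'
pattern='(^|[[:space:]])\.'

echo "records with a word starting with a period:"
printf '%s\n' "$sample" | grep -E "$pattern"

echo "records without one:"
printf '%s\n' "$sample" | grep -v -E "$pattern"
```

The lines ending in a period or containing "1.2" are not reported as matches, since their periods come right after a letter or digit rather than at the start of a word.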