
Thursday, December 20, 2012

String-joining the lines of a file in Unix

This week I had to generate a batch of 400 DITA files. The actual ids were handed to me in an Excel sheet. As I was using a scripting language (Cocoon flowscript, which is JavaScript), it would be convenient to transform the id column from Excel into JavaScript array notation. So the first thing I did was copy the content of the first column and paste it into batch2.txt.

batch2.txt
2N7000
BLF1822-10
BT150-800R
BU1506DX
BUT18A
BUX87P
BY229-600

So now I needed a way to transform each line by wrapping the id in double quotes, and then string-join all the lines. As I was a bit rusty in shell scripting, I started looking online. My colleague and I stumbled upon the same approach, which got 99.9% of the job done. The only problem was the comma left after the last id.
for line in `cat batch2.txt`; do echo -n "\"$line\"," ; done >> output.txt

"2N7000","BLF1822-10","BT150-800R","BU1506DX","BUT18A","BUX87P","BY229-600",

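For reference, the leftover comma can also be trimmed after the fact. The following is only a minimal sketch and not the approach this post settles on: the function name join_quoted is made up for illustration, and it uses a while/read loop with printf rather than the for/echo -n loop above, on the assumption that ids may contain spaces and that echo -n is not available in every shell.

```shell
# Hypothetical helper (name is illustrative): quote each line of the file
# given as $1, join the results with commas, and drop the last comma.
join_quoted() {
    joined=""
    # Read line by line so an id containing spaces stays in one piece;
    # the extra test picks up a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        joined="$joined\"$line\","
    done < "$1"
    # ${joined%,} removes a single trailing comma, if there is one.
    printf '%s\n' "${joined%,}"
}
```

Calling join_quoted batch2.txt should print the same line as above, minus the trailing comma.
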
When I woke up this morning I felt restless... surely doing a simple string-join can't be that difficult? So I started reading about a few Unix commands, and it turns out you can use 'sed' to easily do string replacement. The trick I use is to first wrap every id in double quotes, making sure to pass the -n flag to echo so that no newlines are printed between them. Next I just replace each pair of adjacent double quotes "" with "," and that's it.
nxp10009@NXL01366 /c/tmp
$ for line in `cat batch2.txt`; do echo -n "\"$line\""; done | sed 's/""/","/g' >> output.txt

nxp10009@NXL01366 /c/tmp
$ less output.txt
"2N7000","BLF1822-10","BT150-800R","BU1506DX","BUT18A","BUX87P","BY229-600"

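The same result can probably be had without a loop at all. The sketch below is an alternative and not the command used above: it assumes a POSIX sed and paste are available, and the function name join_quoted_paste is made up for illustration. Because the commas come from paste rather than from replacing "", an id that is empty or itself contains double quotes should not confuse the join.

```shell
# Hypothetical helper (name is illustrative): quote every line of the file
# given as $1 with sed, then join the quoted lines with paste.
join_quoted_paste() {
    # s/.*/"&"/ wraps the whole line (&) in double quotes.
    # paste -s joins all input lines into one; -d, sets a comma as separator;
    # the final - tells paste to read from standard input.
    sed 's/.*/"&"/' "$1" | paste -sd, -
}
```

Running join_quoted_paste batch2.txt >> output.txt would then stand in for the for loop and the second sed.
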
1 comment:

  1. I've tried to do it without a for loop, and so far I've come up with these ways:

    1. Most naive: tr -> sed
    tr '\n' ',' < test.txt | sed 's/^/"/;s/,$/"/;s/,/","/g'
    2. Also quite logical: awk
    awk '{if (NR>1) printf(","); printf("\"%s\"",$0)}' test.txt
    3. Cumbersome: sed -> tr
    sed 's/^/"/;:l;N;$!bl;s/\n/","/g;s/$/"/' test.txt | tr -d '\n'
