How to use awk to sum multiple columns (but not all) and sort by the summed values
I hope someone can help me solve this problem with awk and/or sort.
I have a 19-column, tab-delimited file formatted like this (the line beginning with 'gene' is the header):
gene -100 -75 -50 -25 0 25 50 75 100 -100 -75 -50 -25 0 25 50 75 100
mll 0 0 0 2 5 2 0 0 1 0 0 4 8 5 5 4 0 1
mll2 0 0 0 7 10 7 0 0 1 0 0 0 7 10 7 0 0 1
I want to sum columns 2-10, sort the rows by that summed value, and get output like this:
gene -100 -75 -50 -25 0 25 50 75 100 -100 -75 -50 -25 0 25 50 75 100
mll2 0 0 0 7 10 7 0 0 1 0 0 0 7 10 7 0 0 1
mll 0 0 0 2 5 2 0 0 1 0 0 4 8 5 5 4 0 1
I know that if I can make a 20th column holding the sum I need, I can use sort to finish the job:
sort -nk20 file.txt
Thanks in advance!
Two-step solution
This sums columns 2-10 and prints the sum as a 20th column (change the 10 to NF to sum every column after the first):
$ awk -v OFS='\t' 'NR==1{print $0,0; next} {s=0; for (i=2;i<=10;i++) s+=$i; print $0,s}' file
gene -100 -75 -50 -25 0 25 50 75 100 -100 -75 -50 -25 0 25 50 75 100 0
mll 0 0 0 2 5 2 0 0 1 0 0 4 8 5 5 4 0 1 10
mll2 0 0 0 7 10 7 0 0 1 0 0 0 7 10 7 0 0 1 25
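The output of the above can then be piped to sort -nk20, as you suggest. That sorts in ascending order; the header is given a sum of 0 so that it stays on top as long as every data row sums to more than 0.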
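The two steps can also be combined into one pipeline that keeps the header on top regardless of the sums and produces the largest-first order shown in the question. This is a minimal sketch, assuming a tab-delimited file with exactly one header line; the function name sum_sort is hypothetical, and the sum is placed in front of each line rather than in column 20 so that cut -f2- can strip it off afterwards.

```shell
#!/bin/sh
# Hypothetical helper: sum columns 2-10, sort data rows by that sum
# (largest first), keep the header on top, then drop the helper column.
# Assumes a tab-delimited input file with one header line.
sum_sort() {
    infile=$1
    # The header line passes through untouched.
    head -n 1 "$infile"
    # Data rows: prepend the sum of columns 2-10 as a new first field,
    # sort numerically on that field in descending order, then cut it off.
    tail -n +2 "$infile" |
        awk -F'\t' -v OFS='\t' '{ s = 0; for (i = 2; i <= 10; i++) s += $i; print s, $0 }' |
        sort -k1,1nr |
        cut -f2-
}
```

For example, sum_sort file.txt > sorted.txt. Dropping the r from -k1,1nr gives ascending order instead.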
One-step solution
If you want to sum and sort in one step, and you have GNU awk, you can use:
$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} NR==1{print;next} {s=0; for (i=2;i<=10;i++) s+=$i; a[NR]=s; b[NR]=$0} END{for (i in a) print b[i]}' file
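Or, written on multiple lines:
awk '
    BEGIN { PROCINFO["sorted_in"] = "@val_num_asc" }  # for-in visits elements by value, ascending
    NR == 1 { print; next }                            # header first, unchanged
    {
        s = 0
        for (i = 2; i <= 10; i++) s += $i             # sum columns 2-10
        a[NR] = s                                     # sort key
        b[NR] = $0                                    # original line
    }
    END { for (i in a) print b[i] }
' file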
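PROCINFO["sorted_in"] is a feature of GNU awk; in other awks PROCINFO is just an ordinary array, so the final for (i in a) loop will not come out sorted.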
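If GNU awk is not available, a single awk invocation can still do the job by piping the data rows to sort(1) from inside awk. This is a sketch, not part of the original answer: the function name sum_sort_awk is hypothetical, it assumes a tab-delimited file with one header line, and it assumes your awk provides fflush() (gawk, mawk and BWK awk do), which is used so the header is written out before sort prints anything.

```shell
#!/bin/sh
# Hypothetical helper: one awk process, with the sorting delegated to sort(1).
# Assumes a tab-delimited file whose first line is the header.
sum_sort_awk() {
    awk -F'\t' -v OFS='\t' '
        BEGIN { cmd = "sort -k1,1nr | cut -f2-" }  # largest sum first, then drop the sum
        NR == 1 { print; fflush(); next }          # emit the header before sort writes anything
        {
            s = 0
            for (i = 2; i <= 10; i++) s += $i      # sum columns 2-10 only
            print s, $0 | cmd                      # prepend the sum as the sort key
        }
        END { close(cmd) }                         # wait for sort and cut to finish
    ' "$1"
}
```

With GNU awk, the one-step solution gives the same largest-first order if "@val_num_asc" is replaced by "@val_num_desc"; the header is still printed first because it is handled separately.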