sorting - How to use awk to sum multiple columns (but not all) and sort by the summed values


I hope this problem can be solved with awk and/or sort:

I have a 19-column tab-delimited file formatted like so (where the line beginning with 'gene' is the header):

gene  -100 -75 -50 -25  0 25 50 75 100  -100 -75 -50 -25  0 25 50 75 100
mll      0   0   0   2  5  2  0  0   1     0   0   4   8  5  5  4  0   1
mll2     0   0   0   7 10  7  0  0   1     0   0   0   7 10  7  0  0   1

I want to sum columns 2-10, sort the rows by the summed value, and give output like so:

gene  -100 -75 -50 -25  0 25 50 75 100  -100 -75 -50 -25  0 25 50 75 100
mll2     0   0   0   7 10  7  0  0   1     0   0   0   7 10  7  0  0   1
mll      0   0   0   2  5  2  0  0   1     0   0   4   8  5  5  4  0   1

I know that if I can make a 20th column holding the sum I need, I can use sort to finish the job:

sort -nk20 file.txt 

Thanks in advance!

Two step solution

This sums columns 2 through the last field and prints the sum as a 20th column (the header line gets 0):

$ awk 'NR==1{print $0,0;next;} {s=0; for (i=2;i<=NF;i++) s+=$i; print $0,s;}' file
gene  -100 -75 -50 -25  0 25 50 75 100  -100 -75 -50 -25  0 25 50 75 100 0
mll      0   0   0   2  5  2  0  0   1     0   0   4   8  5  5  4  0   1 37
mll2     0   0   0   7 10  7  0  0   1     0   0   0   7 10  7  0  0   1 50

The output of the above can be piped, as you suggest, to sort -nk20.
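The question asks for columns 2-10 only, and its sample output lists the larger sum first, whereas the command above sums through the last field and sort -nk20 sorts ascending. A minimal sketch of one way to cover both points follows. The function name sum_sort_2_10 is hypothetical, and the input is assumed to be tab-delimited with a single header line. It prepends the sum as a temporary key, sorts on that key in descending order, and then removes it again:

```shell
# Hypothetical helper: sum columns 2-10 of a tab-delimited file and
# list the rows by that sum, largest first, keeping the header on top.
sum_sort_2_10() {
    # Header line: print unchanged. Data lines: prepend the sum of
    # fields 2..10 as a temporary sort key, separated by a tab.
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 { print; next }
         { s = 0; for (i = 2; i <= 10; i++) s += $i; print s, $0 }' "$1" |
    {
        IFS= read -r header          # the first line of the stream is the header
        printf '%s\n' "$header"
        # Sort the remaining lines by the key (numeric, descending), then drop the key.
        sort -t "$(printf '\t')" -k1,1nr | cut -f2-
    }
}
```

It would be called as sum_sort_2_10 file.txt. Because the key is removed at the end, the output keeps the original 19 columns.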

One step solution

If you want to sum and sort in one step, and if you have GNU awk, use:

$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} NR==1{print;next} {s=0; for (i=2;i<=NF;i++) s+=$i; a[NR]=s; b[NR]=$0} END{for (i in a)print b[i]}' file

Or, written on multiple lines:

awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"}
     NR==1{print;next}
     {s=0; for (i=2;i<=NF;i++) s+=$i; a[NR]=s; b[NR]=$0}
     END{for (i in a)print b[i]}' file

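PROCINFO is a feature of GNU awk. Setting PROCINFO["sorted_in"] to "@val_num_asc" makes the for (i in a) loop visit the array elements in ascending numeric order of their values.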
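To get the order shown in the question's sample output (largest sum first, over columns 2-10 only), the same idea can be sketched with the "@val_num_desc" traversal order. This is an illustration, not part of the original answer: the function name gawk_sum_sort_desc and the array names sum and row are hypothetical, and GNU awk is assumed to be installed as gawk.

```shell
# Hypothetical helper (requires GNU awk): one-step sum of columns 2-10,
# printing rows from the largest sum to the smallest.
gawk_sum_sort_desc() {
    gawk 'BEGIN { PROCINFO["sorted_in"] = "@val_num_desc" }  # for-in visits largest value first
          NR == 1 { print; next }                             # header passes straight through
          { s = 0; for (i = 2; i <= 10; i++) s += $i          # sum only fields 2..10
            sum[NR] = s; row[NR] = $0 }                       # remember the sum and the full line
          END { for (i in sum) print row[i] }' "$1"
}
```

It would be called as gawk_sum_sort_desc file.txt. The whole file is held in memory until the END block runs, which is fine for small tables but worth keeping in mind for very large ones.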

