Bash script - Construct a single line out of many lines having duplicates in a single column
I have an instrumented log file in which each first-column value (id) appears on 6 lines, as below.
//sc001@1/1/1@1/1,get,clientstart,1363178707755
//sc001@1/1/1@1/1,get,talktosocketstart,1363178707760
//sc001@1/1/1@1/1,get,decoderequest,1363178707765
//sc001@1/1/1@1/1,get-reply,encodereponse,1363178707767
//sc001@1/1/1@1/2,get,decoderequest,1363178708765
//sc001@1/1/1@1/2,get-reply,encodereponse,1363178708767
//sc001@1/1/1@1/2,get,talktosocketend,1363178708770
//sc001@1/1/1@1/2,get,clientend,1363178708775
//sc001@1/1/1@1/1,get,talktosocketend,1363178707770
//sc001@1/1/1@1/1,get,clientend,1363178707775
//sc001@1/1/1@1/2,get,clientstart,1363178708755
//sc001@1/1/1@1/2,get,talktosocketstart,1363178708760
Note: the fields are comma-delimited.
Likewise, there are many duplicated first-column values (ids) in the log file (the example above has 2 ids: //sc001@1/1/1@1/1 and //sc001@1/1/1@1/2). I need to consolidate the log records into the format below:
id,clientstart,talktosocketstart,decoderequest,encodereponse,talktosocketend,clientend
//sc001@1/1/1@1/1,1363178707755,1363178707760,1363178707765,1363178707767,1363178707770,1363178707775
//sc001@1/1/1@1/2,1363178708755,1363178708760,1363178708765,1363178708767,1363178708770,1363178708775
I suppose this is a bash scripting exercise, and I would appreciate expert support with it. I hope there may be a sed or awk solution that is more efficient.
Thanks very much.
One way:

sort -t, -k4,4n file | awk -F, '{a[$1] = a[$1] ? a[$1] FS $NF : $NF} END {for (i in a) print i "," a[i]}'
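Run against the sample log, this should produce the consolidated lines shown below (without the header row; the for (i in a) loop emits the ids in an unspecified order):

//sc001@1/1/1@1/1,1363178707755,1363178707760,1363178707765,1363178707767,1363178707770,1363178707775
//sc001@1/1/1@1/2,1363178708755,1363178708760,1363178708765,1363178708767,1363178708770,1363178708775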
sort orders the file numerically on the last (4th) column, the timestamp; since each id's timestamps increase as its events progress (clientstart through clientend), this puts every id's records into event order. awk then reads the sorted input and builds an array keyed on the 1st field, appending each record's last column to the accumulated value (joined with FS, which is the comma), and the END block prints each id together with its consolidated timestamps.
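If the timestamps cannot be trusted to order the events, a variation that does not depend on sorting is to key each timestamp by (id, event name) and print the columns in a fixed order. This is a minimal sketch, assuming the event name is always field 3 and the timestamp field 4 as in the sample; it also emits the requested header:

awk -F, '
BEGIN {
    # fixed output column order, matching the requested header
    n = split("clientstart,talktosocketstart,decoderequest,encodereponse,talktosocketend,clientend", ev, ",")
    printf "id"
    for (i = 1; i <= n; i++) printf ",%s", ev[i]
    print ""
}
{
    ids[$1] = 1        # remember each id once
    ts[$1, $3] = $4    # timestamp indexed by (id, event name)
}
END {
    # ids come out in an unspecified order; pipe through sort if that matters
    for (id in ids) {
        printf "%s", id
        for (i = 1; i <= n; i++) printf ",%s", ts[id, ev[i]]
        print ""
    }
}' file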