bash - Replacing strings in one file with strings from second file
I've been searching for a couple of days and haven't found the right answer.
I have 2 files like this:
file1:
>contig-100_23331 length_200 read_count_4043
tcag...
>contig-100_23332 length_200 read_count_4508
ttca...
>contig-100_23333 length_200 read_count_184
ttcc...
file2:
>contig-100_23331_cov:_30.9135
>contig-100_23332_cov:_125.591
>contig-100_23333_cov:_5.97537
I want to replace the name lines (>contig... length...) in file1 with the name lines from file2. Note that file2 contains only contig names (no sequences).
I suppose there's a way to do this with sed, but I can't find a solution.
Thanks in advance!
One possibility is to use sed to create a sed script from file2, which is then used on file1:
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' file2 > sed.script
sed -f sed.script file1 > file.out
rm -f sed.script
For the sample file2, sed.script would contain:
s%^>contig-100_23331 %>contig-100_23331_cov:_30.9135 %
s%^>contig-100_23332 %>contig-100_23332_cov:_125.591 %
s%^>contig-100_23333 %>contig-100_23333_cov:_5.97537 %
For the sample file1, the output of the sed processing would be:
>contig-100_23331_cov:_30.9135 length_200 read_count_4043
tcag...
>contig-100_23332_cov:_125.591 length_200 read_count_4508
ttca...
>contig-100_23333_cov:_5.97537 length_200 read_count_184
ttcc...
Some versions of sed may have problems with 23k lines in a sed script. If that's a problem for you, you can generate sed.script, split it (with split) into smaller chunks (e.g. 1000 lines each), and then run sed -f chunk for each of the chunks in turn. That's painful, but necessary in that case. Historically, HP-UX (archaic versions, such as HP-UX 9 or 10) had rather limited versions of sed that could only handle a few hundred commands in a sed script.
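The chunking described above could look roughly like this. It's only a sketch: the scratch directory, the tiny stand-in file1 and file2 (with sequences on their own lines, which is an assumption about the real layout), the chunk. prefix and the file.tmp name are all made up for illustration. The chunk size is 2 only so that the three-line sample script splits into more than one piece; you'd use something like 1000 for real data.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Tiny stand-ins for file1 and file2 (hypothetical sample data).
printf '%s\n' \
  '>contig-100_23331 length_200 read_count_4043' 'tcag...' \
  '>contig-100_23332 length_200 read_count_4508' 'ttca...' \
  '>contig-100_23333 length_200 read_count_184' 'ttcc...' > file1
printf '%s\n' \
  '>contig-100_23331_cov:_30.9135' \
  '>contig-100_23332_cov:_125.591' \
  '>contig-100_23333_cov:_5.97537' > file2

# Generate the full sed script from file2, exactly as before.
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' file2 > sed.script

# Split it into pieces named chunk.aa, chunk.ab, ...
# (2 lines per piece for the demo; e.g. 1000 for a 23k-line script).
split -l 2 sed.script chunk.

# Apply the chunks one after another, each pass reading the
# result of the previous pass.
cp file1 file.out
for chunk in chunk.*; do
    sed -f "$chunk" file.out > file.tmp
    mv file.tmp file.out
done

rm -f sed.script chunk.*
cat file.out
```

Each pass rereads the whole of file.out, so a 23k-line script in 1000-line chunks means roughly two dozen passes over the data. Already-renamed headers are not touched again, because each command only matches a contig name that is followed directly by a space.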
Given that you're using bash, you can avoid the explicit intermediate file by using process substitution:
sed -f <(sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' file2) file1 > file.out
However, you should validate the script before using this notation.
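One way to do that validation, as a rough sketch: generate the script and list every line that is not a well-formed substitute command. The helper names gen_script and bad_commands, the scratch directory and the two sample files (including the deliberately odd `>scaffold_7` entry) are assumptions made up for this example. The check relies on the fact that any line of file2 that does not match the pattern passes through the first sed unchanged and would be handed to the second sed as if it were a command.

```shell
#!/bin/sh
set -eu

# Scratch directory with hypothetical sample inputs.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' \
  '>contig-100_23331_cov:_30.9135' \
  '>contig-100_23332_cov:_125.591' > file2
printf '%s\n' \
  '>contig-100_23331_cov:_30.9135' \
  '>scaffold_7' > file2.bad

# Build the sed script from a names file (same command as in the answer).
gen_script() {
    sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' "$1"
}

# Print every generated line that does NOT have the shape
#   s%^>contig-N_N %>contig-N_N_... %
# The [^%]* part also rejects names containing a stray % delimiter.
# grep exits non-zero when it prints nothing, hence the "|| true".
bad_commands() {
    gen_script "$1" |
        grep -v '^s%\^>contig-[0-9]*_[0-9]* %>contig-[0-9]*_[0-9]*_[^%]* %$' || true
}

good=$(bad_commands file2)
bad=$(bad_commands file2.bad)
echo "file2:     ${good:-no offending lines}"
echo "file2.bad: ${bad:-no offending lines}"
```

For file2.bad this reports `>scaffold_7` as an offending line. Once a names file comes back with no offending lines, the one-line process substitution form does the same job without the intermediate file.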