bash - Replacing strings in one file with strings from a second file
I've been searching for a couple of days and haven't found the right answer.
I have 2 files like this:
file1:

    >contig-100_23331 length_200 read_count_4043
    tcag...
    >contig-100_23332 length_200 read_count_4508
    ttca...
    >contig-100_23333 length_200 read_count_184
    ttcc...

file2:

    >contig-100_23331_cov:_30.9135
    >contig-100_23332_cov:_125.591
    >contig-100_23333_cov:_5.97537

I want to replace the names in the header lines (>contig... length...) of file1 with the names from file2. Note that file2 contains only the contig names (no sequence).
I suppose there's a way to do this with sed, but I can't find a solution.
Thanks in advance!
One possibility is to use sed to create a sed script from file2, which is then applied to file1:
    sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' file2 > sed.script
    sed -f sed.script file1 > file.out
    rm -f sed.script

For the sample file2, sed.script would contain:
    s%^>contig-100_23331 %>contig-100_23331_cov:_30.9135 %
    s%^>contig-100_23332 %>contig-100_23332_cov:_125.591 %
    s%^>contig-100_23333 %>contig-100_23333_cov:_5.97537 %

For the sample file1, the output of the sed processing would be:
    >contig-100_23331_cov:_30.9135 length_200 read_count_4043
    tcag...
    >contig-100_23332_cov:_125.591 length_200 read_count_4508
    ttca...
    >contig-100_23333_cov:_5.97537 length_200 read_count_184
    ttcc...

Some versions of sed may have problems with 23k lines in a sed script. If that's a problem for you, you can generate sed.script, split it into smaller chunks with split (e.g. 1000 lines each), and run sed -f on each of the chunks in turn. That's painful, but possible if necessary. Historically, HP-UX (archaic versions, HP-UX 9 or 10) had rather limited versions of sed that could handle only a few hundred commands in a sed script.
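A minimal sketch of that chunked workflow, assuming bash plus the standard split and sed utilities. The chunk size of 1000 and the chunk. file prefix are arbitrary choices for illustration, and the script recreates the three sample records in a scratch directory so it can be run as-is:

```shell
#!/usr/bin/env bash
# Sketch: apply a large generated sed script in chunks, for sed versions
# that choke on thousands of commands in a single script.
# Assumptions: chunk size 1000 and the "chunk." prefix are arbitrary;
# the input is the three-record sample from the question.
set -euo pipefail

# Work in a scratch directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data from the question (header line, then sequence line).
printf '%s\n' \
  '>contig-100_23331 length_200 read_count_4043' 'tcag...' \
  '>contig-100_23332 length_200 read_count_4508' 'ttca...' \
  '>contig-100_23333 length_200 read_count_184' 'ttcc...' > file1
printf '%s\n' \
  '>contig-100_23331_cov:_30.9135' \
  '>contig-100_23332_cov:_125.591' \
  '>contig-100_23333_cov:_5.97537' > file2

# Step 1: turn each line of file2 into one s%...%...% command.
sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' file2 > sed.script

# Step 2: split the script into pieces of at most 1000 commands
# (named chunk.aa, chunk.ab, ...).
split -l 1000 sed.script chunk.

# Step 3: run each piece over the output of the previous one.
# The glob expands in sorted order, so the chunks are applied in sequence.
cp file1 file.out
for chunk in chunk.*; do
  sed -f "$chunk" file.out > file.tmp
  mv file.tmp file.out
done

cat file.out
```

Each generated command is anchored on the exact contig name followed by a space, so a header that one chunk has already rewritten (which now continues with _cov:) cannot be matched again by a command in a later chunk.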
Given that you're using bash, you can avoid the explicit intermediate file by using process substitution:
    sed -f <(sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' file2) file1 > file.out

However, you should validate the script before using this notation.
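A sketch of one possible validation step, assuming that "valid" simply means every line of file2 was converted into an s%...%...% command. A file2 line that doesn't match the generator's pattern is copied through unchanged, and sed -f would then try to run it as a command. The helper name gen_script is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: validate the generated sed script before using it through
# process substitution. "Valid" is assumed to mean that every line of
# file2 was converted into an s%...%...% command; gen_script is a
# hypothetical helper name.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data from the question.
printf '%s\n' \
  '>contig-100_23331 length_200 read_count_4043' 'tcag...' \
  '>contig-100_23332 length_200 read_count_4508' 'ttca...' \
  '>contig-100_23333 length_200 read_count_184' 'ttcc...' > file1
printf '%s\n' \
  '>contig-100_23331_cov:_30.9135' \
  '>contig-100_23332_cov:_125.591' \
  '>contig-100_23333_cov:_5.97537' > file2

# The generator from the answer, wrapped in a function for reuse.
gen_script() {
  sed 's/^\(>contig-[0-9]*_[0-9]*\)_.*/s%^\1 %& %/' "$1"
}

# Lines the generator did not convert are copied through unchanged and
# would be read by sed -f as a (probably invalid, possibly harmful) command,
# so count them first. grep -c exits with status 1 when the count is 0,
# hence "|| true".
bad=$(gen_script file2 | grep -vc '^s%' || true)

if [ "$bad" -ne 0 ]; then
  echo "generated script has $bad unconverted line(s); not running it" >&2
  exit 1
fi

# The script looks sane: run the one-liner with process substitution.
sed -f <(gen_script file2) file1 > file.out
cat file.out
```

This only catches unconverted lines; it does not check, for example, that the contig names in file2 actually occur in file1.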