python - Parsing XML to match comments and text according to tag ids
(Updated): I added code to match the values according to their ids. Question: why are the matching ids u'1' and u'0', which appear in both dictionaries, not recognized?
(Goal of the code): I'm writing a script that takes commented text from a .docx file and matches it to the comments via the XML tag ids. I've managed to extract the comment tags, the text, and the ids. Now I need to match these up. My strategy is to create two dictionaries: 1) one with the ids as keys and the commented text as values, and 2) a second with the ids as keys and the comments as values.
Then I plan to run through both dictionaries and, if the keys (i.e. the ids) match up, make tuples of the matching commented text/comment pairs. I'm having trouble creating the dictionary: I'm getting an error message saying the syntax for creating the dictionary is invalid. I don't quite understand why. Any ideas?
    from bs4 import BeautifulSoup as Soup

    f = open('transcript.xml', 'r')
    soup = Soup(f)
    #print soup.prettify()

    textdict = {}
    for i in soup.find_all('w:commentrangestart'):
        # variable 'key' is assigned to the tag id
        key = i.parent.contents[1].attrs['w:id']
        # variable 'value' is assigned to the tag's text
        value = ''.join(i.nextSibling.findAll(text=True)
        # key / value pairs are added to the dictionary 'textdict'
        textdict[key] = value
    print textdict

    commentdict = {}
    for i in soup.find_all('w:comment'):
        key = i.attrs['w:id']
        value = ''.join(i.findAll(text=True)
        commentdict[key] = value
    print commentdict

    ## Output
    ## {u'1': u'contradictory news', u'0': u'something news'}
    ## {u'1': u'news; comment; negative', u'0': u'news; comment'}

    ## Added code
    for key in set(textdict) & set(commentdict):
        if textdict[key] == commentdict[key]:
            print 'yay'
You have a syntax error because you didn't close a parenthesis:

    value = ''.join(i.nextSibling.findAll(text=True)
    # ----------------------------------------------^ missing )
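An unclosed parenthesis lets the parser read on into the following line, which would explain why the error seemed to come from the dictionary assignment rather than from the `join` call. Here is a small sketch to check this without running the script, using the built-in `compile()` on two lines taken from the question. The helper name `compiles` is hypothetical, the sketch is written for Python 3, and the reported line and message vary between Python versions, so it only checks whether the source parses at all:

```python
def compiles(source):
    """Return True if the source parses, False on a SyntaxError."""
    try:
        compile(source, '<snippet>', 'exec')
        return True
    except SyntaxError:
        return False

# The line from the question, followed by the dictionary assignment.
broken = "value = ''.join(i.nextSibling.findAll(text=True)\ntextdict[key] = value\n"
# The same two lines with the closing parenthesis added.
fixed = "value = ''.join(i.nextSibling.findAll(text=True))\ntextdict[key] = value\n"

print(compiles(broken))
print(compiles(fixed))
```

Since `compile()` only parses the text, the undefined names `i`, `key`, and `textdict` do not matter here.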
You are missing one a few lines further on too:

    value = ''.join(i.findAll(text=True)
    # ----------------------------------^ missing )
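On the updated question: judging from the printed output, the ids do match in both dictionaries. The added loop prints nothing because it compares the values (`textdict[key] == commentdict[key]`), and a piece of commented text is never equal to its comment. Below is a minimal sketch of the pairing step described in the question, using the two dictionaries from the printed output as sample data. The names `shared_ids` and `pairs` are made up for illustration, and the sketch is written for Python 3, so the `u''` prefixes are dropped:

```python
# Sample data copied from the output shown in the question.
textdict = {'1': 'contradictory news', '0': 'something news'}
commentdict = {'1': 'news; comment; negative', '0': 'news; comment'}

# Ids that are present in both dictionaries.
shared_ids = set(textdict) & set(commentdict)

# One (commented text, comment) tuple per shared id, in sorted id order.
pairs = [(textdict[key], commentdict[key]) for key in sorted(shared_ids)]

for text, comment in pairs:
    print(text, '->', comment)
```

Sorting the shared ids only makes the order of the tuples predictable; if the order does not matter, iterating over the set directly works just as well.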