python - Read only lines that contain certain specific string and apply regex on them
Here's my code. I have a script that reads a file, but not all lines in the file are similar, and I'd like to extract information only from the lines that have "i doc o:".
I've tried an if condition, but it still doesn't work when there are lines that the regex doesn't match:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import re

    def extraire(data):
        ms = re.match(r'(\S+).*?(o:\S+).*(r:\S+).*mid:(\d+)', data)  # heure & mid
        return {'heure': ms.group(1), 'mid': ms.group(2), "origine": ms.group(3), "destination": ms.group(4)}

    tableau = []
    fichier = open("/home/test/file.log")
    f = fichier.readlines()
    for line in f:
        if (re.findall(".*i doc o:.*", line)):
            tableau = [extraire(line) for line in f]
    print tableau
    fichier.close()
And here's an example of lines from the file; here I want the first and fourth lines:
    01:09:25.258 mta messages doc o:nvs:smtp/alarm@yyy.xx r:nvs:sms/+654811 mid:6261
    01:09:41.965 mta messages rep o:nvs:smtp/alarmes.techniques@xxx.de r:nvs:sms/+455451 mid:6261
    01:09:41.965 mta messages rep 6261 ok, accepted (id: 26)
    08:14:14.469 mta messages doc o:nvs:smtp/alarm@xxxx.en r:nvs:sms/+654646 mid:6262
    08:14:30.630 mta messages rep o:nvs:smtp/alarm@azea.er r:nvs:sms/+33688704859 mid:6262
    08:14:30.630 mta messages rep 6262 ok, accepted (id: 28)
From: http://docs.python.org/2/library/re.html
*?, +?, ?? The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against ...
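To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The sample string is only an illustration chosen for this example, and the snippet should run unchanged on Python 2 or 3.

```python
import re

# Illustrative sample string (an assumption, not taken from the log file).
text = "<H1>title</H1>"

# Greedy: ".*" grabs as much as it can, so the match runs to the last ">".
greedy = re.match(r"<.*>", text).group(0)

# Non-greedy: ".*?" grabs as little as it can, so the match stops at the first ">".
lazy = re.match(r"<.*?>", text).group(0)

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
```

The same idea applies to the ".*?" at the start of the findall pattern: it consumes only as much of the line as is needed to reach the marker.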
Also, findall is best used with the entire buffer, and it returns a list, so looping over the matches saves you from having a conditional against each line of the file.
    buff = fichier.read()
    matches = re.findall(".*?i doc o:.*", buff)
    for match in matches:
        tableau = ...
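As a rough, self-contained sketch of that whole-buffer approach, the snippet below runs findall over a string instead of a real file. The sample lines are copied from the question as they appear in this thread. Note one assumption: those lines contain "doc o:" but no preceding "i", so the marker used here is "doc o:"; swap in whatever marker your real log uses.

```python
import re

# Sample lines copied from the question; a real script would use fichier.read().
LOG = """\
01:09:25.258 mta messages doc o:nvs:smtp/alarm@yyy.xx r:nvs:sms/+654811 mid:6261
01:09:41.965 mta messages rep o:nvs:smtp/alarmes.techniques@xxx.de r:nvs:sms/+455451 mid:6261
01:09:41.965 mta messages rep 6261 ok, accepted (id: 26)
08:14:14.469 mta messages doc o:nvs:smtp/alarm@xxxx.en r:nvs:sms/+654646 mid:6262
"""

# Assumption: "doc o:" marks the wanted lines (the thread's pattern is "i doc o:").
MARKER = r".*?doc o:.*"

# "." does not match a newline by default, so each match is one complete line.
doc_lines = re.findall(MARKER, LOG)

for line in doc_lines:
    print(line)
```

Because findall already returns only the matching lines, no per-line if is needed before handing each one to the extraction function.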
Here is my test code; tell me what it's doing that you didn't want?
    >>> import re
    >>> a = """
    ... 01:09:25.258 mta messages doc o:nvs:smtp/alarm@yyy.xx r:nvs:sms/+654811 mid:6261
    ... 01:09:41.965 mta messages rep o:nvs:smtp/alarmes.techniques@xxx.de r:nvs:sms/+455451 mid:6261
    ... 01:09:41.965 mta messages rep 6261 ok, accepted (id: 26)
    ... 08:14:14.469 mta messages doc o:nvs:smtp/alarm@xxxx.en r:nvs:sms/+654646 mid:6262
    ... 08:14:30.630 mta messages rep o:nvs:smtp/alarm@azea.er r:nvs:sms/+33688704859 mid:6262
    ... 08:14:30.630 mta messages rep 6262 ok, accepted (id: 28)"""
    >>> m = re.findall(".*?i doc o:.*", a)
    >>> m
    ['01:09:25.258 mta messages doc o:nvs:smtp/alarm@yyy.xx r:nvs:sms/+654811 mid:6261', '08:14:14.469 mta messages doc o:nvs:smtp/alarm@xxxx.en r:nvs:sms/+654646 mid:6262']
    >>> tableau = []
    >>> for line in m:
    ...     tableau.append( extraire(line) )
    ...
    >>> tableau
    [{'origine': 'r:nvs:sms/+654811', 'destination': '6261', 'heure': '01:09:25.258', 'mid': 'o:nvs:smtp/alarm@yyy.xx'}, {'origine': 'r:nvs:sms/+654646', 'destination': '6262', 'heure': '08:14:14.469', 'mid': 'o:nvs:smtp/alarm@xxxx.en'}]
You can do it in a single line as:
    >>> tableau = [ extraire(line) for line in re.findall( ".*?i doc o:.*", fichier.read() ) ]
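If you would rather keep reading line by line, another option is to let the extraction function itself report a non-match instead of failing. The sketch below is only one way to do that, under some assumptions: `extract_doc_line` and `extract_all` are hypothetical names, the pattern matches on "doc o:" because that is what the sample lines in this thread contain, and the dictionary keys are assigned by what the fields appear to mean (o: as origin, r: as destination), which differs from the mapping in the question's code.

```python
import re

# Assumed layout: timestamp, "doc", origin (o:...), recipient (r:...), mid:<digits>.
LINE_RE = re.compile(r"(\S+).*?doc (o:\S+)\s+(r:\S+)\s+mid:(\d+)")

def extract_doc_line(line):
    """Return a dict for a 'doc' line, or None when the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        # re.match returns None on a non-match; calling .group() on it would raise.
        return None
    return {"heure": m.group(1), "origine": m.group(2),
            "destination": m.group(3), "mid": m.group(4)}

def extract_all(lines):
    """Collect records for matching lines and silently skip the rest."""
    records = []
    for line in lines:
        record = extract_doc_line(line)
        if record is not None:
            records.append(record)
    return records

# Sample lines copied from the question; a real script would iterate over the file.
sample = [
    "01:09:25.258 mta messages doc o:nvs:smtp/alarm@yyy.xx r:nvs:sms/+654811 mid:6261\n",
    "01:09:41.965 mta messages rep o:nvs:smtp/alarmes.techniques@xxx.de r:nvs:sms/+455451 mid:6261\n",
    "01:09:41.965 mta messages rep 6261 ok, accepted (id: 26)\n",
    "08:14:14.469 mta messages doc o:nvs:smtp/alarm@xxxx.en r:nvs:sms/+654646 mid:6262\n",
]

tableau = extract_all(sample)
print(tableau)
```

With a real file, passing the open file object to `extract_all` should work the same way, since iterating over a file yields its lines one at a time.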