Handling dates with Regex in Apache Pig
Assuming the field time looks like 2013-01-01T00:00:00.000Z, piggybank.jar has been imported, and the command EXTRACT has been defined (DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();), what's the best way to extract the fields year, month, day, hour, minute, and second? This is what I have done so far:
data = FOREACH data GENERATE FLATTEN(EXTRACT(time, '(\\d+)-(\\d+)-(\\d+)T(\\d+):(\\d+):(\\d+).(\\S+)')) AS ( year: int, month: int, day: int, hour: int, minute: int, second: int, tail: chararray );
Since Pig 0.11 you can use the DateTime type.
A = LOAD 'data' AS (date:chararray);
B = FOREACH A GENERATE ToDate(date) AS date;
C = FOREACH B GENERATE GetMonth(date) AS month;
The other DateTime functions, listed under "DateTime functions" in the Pig documentation, can be used the same way.
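Applied to the six fields asked about in the question, the DateTime approach might look like the sketch below. It assumes Pig 0.11 or later, a hypothetical input file 'data' with one ISO 8601 timestamp per line, and the relation and field names used here (A, B, C, dt), which are illustrative only. The built-ins ToDate, GetYear, GetMonth, GetDay, GetHour, GetMinute and GetSecond are part of Pig's DateTime functions.

```pig
-- Sketch only: 'data' is an assumed input path holding one ISO 8601
-- timestamp per line, e.g. 2013-01-01T00:00:00.000Z
A = LOAD 'data' AS (time:chararray);

-- Parse the string into a datetime value (Pig 0.11+)
B = FOREACH A GENERATE ToDate(time) AS dt;

-- Pull out each component with the matching Get* built-in
C = FOREACH B GENERATE
        GetYear(dt)   AS year,
        GetMonth(dt)  AS month,
        GetDay(dt)    AS day,
        GetHour(dt)   AS hour,
        GetMinute(dt) AS minute,
        GetSecond(dt) AS second;

DUMP C;
```

Each Get* function returns an int, so no separate cast from chararray is needed, unlike the regex version, where the captured groups start out as strings.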
If you're not working with 0.11, you can write a UDF or resort to the regex you posted.