Haskell - Stack space overflow (possibly related to mapM)
I'm writing a program that creates a shell script containing one command for each image file in a directory. There are 667,944 images in the directory, so I need to handle the strictness/laziness issue properly.
Here's a simple example that gives me a stack space overflow. It does work if I give it more space using +RTS -Ksize -RTS, but it should be able to run with little memory, producing output immediately. I've been reading the material on strictness in the Haskell wiki and the wikibook on Haskell, trying to figure out how to fix the problem, and I think it's one of the mapM commands that is giving me grief, but I still don't understand enough about strictness to sort the problem out.
I've found other questions that seem relevant (Is mapM in Haskell strict? Why does this program get a stack overflow? and Is Haskell's mapM not lazy?), but enlightenment still eludes me.

    import System.Environment (getArgs)
    import System.Directory (getDirectoryContents)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- placeholder; the real angle has to be calculated by reading the file
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    main :: IO ()
    main = do
      putStrLn "#!/bin/sh"
      (indir:outdir:_) <- getArgs
      files <- getDirectoryContents indir
      let imageFiles = filter (`notElem` [".", ".."]) files
      commands <- mapM (genCommand indir outdir) imageFiles
      mapM_ putStrLn commands

EDIT: Test #1
Here's the newest version of the example.

    import System.Environment (getArgs)
    import System.Directory (getDirectoryContents)
    import Control.Monad ((>=>))

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- placeholder; the real angle has to be calculated by reading the file
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      files <- getDirectoryContents indir
      putStrLn $ show (length files)
      let imageFiles = filter (`notElem` [".", ".."]) files
      -- mapM_ (genCommand indir outdir >=> putStrLn) imageFiles
      mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles

I compile it with the command

    ghc --make -O2 amy2.hs -rtsopts

If I run the command

    ./amy2 ~/nosync/galaxyzoo/table2/images/ wombat

I get

    test 1
    Stack space overflow: current size 8388608 bytes.
    Use `+RTS -Ksize -RTS' to increase it.

If I instead run the command ./amy2 ~/nosync/galaxyzoo/table2/images/ wombat +RTS -K20M, I get the correct output... eventually:

    test 1
    667946
    convert /home/amy/nosync/galaxyzoo/table2/images//587736546846572812.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736546846572812.jpeg
    convert /home/amy/nosync/galaxyzoo/table2/images//587736542558617814.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736542558617814.jpeg

...and so on.
This isn't really a strictness issue(*), but an order of evaluation issue. Unlike lazily evaluated pure values, monadic effects must happen in a deterministic order. mapM executes every action in the given list and gathers the results, but it cannot return until the whole list of actions has been executed, so you don't get the same streaming behavior as with pure list functions.
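To see the ordering difference in isolation, here is a minimal sketch. The names produce, consume, collectThenConsume and interleaved are made up for illustration; produce and consume stand in for genCommand and putStrLn and only record events in an IORef log, so the order of effects can be inspected.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for genCommand: logs that element n was produced.
produce :: IORef [String] -> Int -> IO Int
produce logRef n = do
  modifyIORef' logRef (("produce " ++ show n) :)
  return n

-- Hypothetical stand-in for putStrLn: logs that element n was consumed.
consume :: IORef [String] -> Int -> IO ()
consume logRef n = modifyIORef' logRef (("consume " ++ show n) :)

-- mapM runs every producer and collects the results before any consumer runs.
collectThenConsume :: [Int] -> IO [String]
collectThenConsume xs = do
  logRef <- newIORef []
  ys <- mapM (produce logRef) xs
  mapM_ (consume logRef) ys
  reverse <$> readIORef logRef

-- A single mapM_ produces and consumes one element at a time.
interleaved :: [Int] -> IO [String]
interleaved xs = do
  logRef <- newIORef []
  mapM_ (\x -> produce logRef x >>= consume logRef) xs
  reverse <$> readIORef logRef

main :: IO ()
main = do
  collectThenConsume [1, 2] >>= mapM_ putStrLn
  putStrLn "--"
  interleaved [1, 2] >>= mapM_ putStrLn
```

The first log reads produce 1, produce 2, consume 1, consume 2; the second reads produce 1, consume 1, produce 2, consume 2. This toy only shows the order of effects, not the memory or stack usage.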
The easy fix in this case is to run both genCommand and putStrLn inside the same mapM_. Note that mapM_ doesn't suffer from the same issue, since it is not building an intermediate list.

    mapM_ (genCommand indir outdir >=> putStrLn) imageFiles

The above uses the "Kleisli composition operator" >=> from Control.Monad, which is like the function composition operator (.) except for monadic functions. You can also use the normal bind and a lambda.

    mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles

For more complex I/O applications where you want better composability between small, monadic stream processors, you should use a library such as conduit or pipes.
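As a side note on the two spellings above, here is a small sketch showing that the >=> form and the bind-and-lambda form describe the same pipeline. The functions halve and describe are hypothetical and live in the Maybe monad only because its results are easy to print and compare; the question's code uses IO instead.

```haskell
import Control.Monad ((>=>))

-- Hypothetical monadic step: succeeds only for even numbers.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- Hypothetical monadic step: always succeeds.
describe :: Int -> Maybe String
describe n = Just ("got " ++ show n)

-- Kleisli composition: feed the result of halve into describe.
viaKleisli :: Int -> Maybe String
viaKleisli = halve >=> describe

-- The same pipeline written with bind and a lambda.
viaBind :: Int -> Maybe String
viaBind = \n -> halve n >>= describe

main :: IO ()
main = do
  print (viaKleisli 10)              -- Just "got 5"
  print (viaBind 10)                 -- Just "got 5"
  print (viaKleisli 7 == viaBind 7)  -- True (both are Nothing)
```

Which form to use is mostly a matter of taste; the point-free >=> version tends to read better once more than two steps are chained.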
Also, make sure you are compiling with either -O or -O2.
(*) To be exact, it is also a strictness issue, because in addition to building a large intermediate list in memory, laziness causes mapM to build unnecessary thunks and use up stack.
EDIT: It seems the main culprit might be getDirectoryContents. Looking at the function's source code, it does the same kind of list accumulation internally as mapM.
In order to get a streaming directory listing, you need to use System.Posix.Directory, which unfortunately makes the program incompatible with non-POSIX systems (like Windows). You can stream the directory contents by, e.g., using continuation-passing style:

    import System.Environment (getArgs)
    import Control.Monad ((>=>))

    import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
    import Control.Exception (bracket)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- placeholder; the real angle has to be calculated by reading the file
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    streamingDirContents :: FilePath -> (FilePath -> IO ()) -> IO ()
    streamingDirContents root cont = do
      let loop stream = do
            fp <- readDirStream stream
            case fp of
              [] -> return ()  -- an empty name marks the end of the directory
              _ | fp `notElem` [".", ".."] -> cont fp >> loop stream
                | otherwise -> loop stream
      -- bracket takes acquire, release, use (in that order)
      bracket (openDirStream root) closeDirStream loop

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      streamingDirContents indir (genCommand indir outdir >=> putStrLn)

Here's how you could do the same thing using conduit:

    import System.Environment (getArgs)
    import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)

    import Data.Conduit
    import qualified Data.Conduit.List as L
    import Control.Monad.IO.Class (liftIO, MonadIO)
    -- MonadResource and runResourceT come from the resourcet package; if your
    -- conduit version does not re-export them from Data.Conduit, import them
    -- from Control.Monad.Trans.Resource.

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- placeholder; the real angle has to be calculated by reading the file
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    dirSource :: (MonadResource m, MonadIO m) => FilePath -> Source m FilePath
    dirSource root =
      bracketP (openDirStream root) closeDirStream $ \stream -> do
        let loop = do
              fp <- liftIO $ readDirStream stream
              case fp of
                [] -> return ()
                _  -> yield fp >> loop
        loop

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      let files    = dirSource indir $= L.filter (`notElem` [".", ".."])
          commands = files $= L.mapM (liftIO . genCommand indir outdir)

      runResourceT $ commands $$ L.mapM_ (liftIO . putStrLn)

The nice thing about conduit is that you regain the ability to compose pieces of functionality with things like conduit versions of filter and mapM. The $= operator streams stuff forward in the chain and $$ connects the stream to a consumer.
The not-so-nice thing is that the real world is complicated, and writing efficient and robust code requires you to jump through some hoops with resource management. That's why all the operations work in the ResourceT monad transformer, which keeps track of e.g. open file handles and cleans them up promptly and deterministically when they are no longer needed, or e.g. if the computation gets aborted by an exception (this is in contrast to using lazy I/O and relying on the garbage collector to eventually release scarce resources).
However, this means that you a) need to run the final resulting conduit operation with runResourceT and b) need to explicitly lift I/O operations to the transformed monad using liftIO, instead of being able to directly write e.g. L.mapM_ putStrLn.
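The liftIO requirement is not specific to conduit; it shows up with any monad transformer stacked on top of IO. Here is a minimal sketch that swaps in StateT from the transformers package in place of ResourceT, purely to avoid depending on conduit. The function printNumbered is hypothetical.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, execStateT, get, put)

-- Hypothetical action: print a line prefixed with a running counter.
-- The monad is StateT Int IO rather than plain IO, so a bare putStrLn
-- would not type-check here; liftIO lifts it into the transformed monad.
printNumbered :: String -> StateT Int IO ()
printNumbered line = do
  n <- get
  liftIO (putStrLn (show n ++ ": " ++ line))
  put (n + 1)

main :: IO ()
main = do
  -- execStateT runs the transformed computation and returns the final counter.
  next <- execStateT (mapM_ printNumbered ["a", "b", "c"]) 1
  putStrLn ("next line number: " ++ show next)
```

Here execStateT plays the role that runResourceT plays in the conduit version: it is the point where the transformed computation is turned back into a plain IO action.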