Haskell - Stack space overflow (possibly related to mapM)
I'm writing a program that creates a shell script containing one command for each image file in a directory. There are 667,944 images in the directory, so I need to handle the strictness/laziness issue properly.

Here's a simple example that gives me a stack space overflow. It does work if I give it more space using +RTS -Ksize -RTS, but it should be able to run with little memory, producing output immediately. I've been reading the material on strictness in the Haskell wiki and the wikibook on Haskell, trying to figure out how to fix the problem, and I think it's one of the mapM commands that is giving me grief, but I still don't understand enough about strictness to sort the problem out.

I've found some other questions that seem relevant ("Is mapM in Haskell strict? Why does this program get a stack overflow?" and "Is Haskell's mapM not lazy?"), but enlightenment still eludes me.

    import System.Environment (getArgs)
    import System.Directory (getDirectoryContents)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- have to read the file to calculate this for real
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    main :: IO ()
    main = do
      putStrLn "#!/bin/sh"
      (indir:outdir:_) <- getArgs
      files <- getDirectoryContents indir
      let imageFiles = filter (`notElem` [".", ".."]) files
      commands <- mapM (genCommand indir outdir) imageFiles
      mapM_ putStrLn commands

EDIT: Test #1

Here's the newest version of the example.

    import System.Environment (getArgs)
    import System.Directory (getDirectoryContents)
    import Control.Monad ((>=>))

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- have to read the file to calculate this for real
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      files <- getDirectoryContents indir
      putStrLn $ show (length files)
      let imageFiles = filter (`notElem` [".", ".."]) files
      -- mapM_ (genCommand indir outdir >=> putStrLn) imageFiles
      mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles

I compile it with the command

    ghc --make -O2 amy2.hs -rtsopts

If I run it with the command

    ./amy2 ~/nosync/galaxyzoo/table2/images/ wombat

I get

    test 1
    Stack space overflow: current size 8388608 bytes.
    Use `+RTS -Ksize -RTS' to increase it.

If I instead run it with the command

    ./amy2 ~/nosync/galaxyzoo/table2/images/ wombat +RTS -K20M

I get the correct output... eventually:

    test 1
    667946
    convert /home/amy/nosync/galaxyzoo/table2/images//587736546846572812.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736546846572812.jpeg
    convert /home/amy/nosync/galaxyzoo/table2/images//587736542558617814.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736542558617814.jpeg

...and so on.

This isn't really a strictness issue(*), but an order of evaluation issue. Unlike lazily evaluated pure values, monadic effects must happen in a deterministic order. mapM executes every action in the given list and gathers the results, but it cannot return until the whole list of actions has been executed, so you don't get the same streaming behavior as with pure list functions.
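
To make the ordering concrete, here is a minimal sketch that is not part of the original program. It assumes two hypothetical stand-ins, gen and out, that replace genCommand and putStrLn and merely record an event in an IORef, so the order of effects can be inspected without touching the file system.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for genCommand: logs an event, returns a command.
gen :: IORef [String] -> String -> IO String
gen ref file = do
  modifyIORef' ref (("gen " ++ file) :)
  return ("convert " ++ file)

-- Hypothetical stand-in for putStrLn: only logs that output happened.
out :: IORef [String] -> String -> IO ()
out ref cmd = modifyIORef' ref (("out " ++ cmd) :)

-- mapM followed by mapM_, as in the question. Returns the logged
-- events in the order in which they happened.
separatePasses :: [String] -> IO [String]
separatePasses files = do
  ref <- newIORef []
  commands <- mapM (gen ref) files  -- every gen runs before this returns
  mapM_ (out ref) commands
  reverse <$> readIORef ref

main :: IO ()
main = separatePasses ["a.jpeg", "b.jpeg"] >>= mapM_ putStrLn
```

Run on two file names, the log lists both gen events before the first out event. With 667,944 files that means every command string exists in memory before a single line is printed.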

The easy fix in this case is to run both genCommand and putStrLn inside the same mapM_. Note that mapM_ doesn't suffer from the same issue, since it does not build an intermediate list.

    mapM_ (genCommand indir outdir >=> putStrLn) imageFiles

The above uses the "Kleisli composition operator" >=> from Control.Monad, which is like the function composition operator . except that it composes monadic functions. You can also use the normal bind and a lambda.

    mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles

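As a rough check that the two spellings behave the same, the sketch below logs events through an IORef rather than doing real I/O. The names gen, out, withLog, fusedKleisli and fusedLambda are hypothetical helpers made up for this illustration; gen and out stand in for genCommand and putStrLn.

```haskell
import Control.Monad ((>=>))
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for genCommand: logs an event, returns a command.
gen :: IORef [String] -> String -> IO String
gen ref file = do
  modifyIORef' ref (("gen " ++ file) :)
  return ("convert " ++ file)

-- Hypothetical stand-in for putStrLn: only logs that output happened.
out :: IORef [String] -> String -> IO ()
out ref cmd = modifyIORef' ref (("out " ++ cmd) :)

-- Run a traversal against a fresh log and return the events in order.
withLog :: (IORef [String] -> IO ()) -> IO [String]
withLog body = do
  ref <- newIORef []
  body ref
  reverse <$> readIORef ref

-- Single pass using Kleisli composition.
fusedKleisli :: [String] -> IO [String]
fusedKleisli files = withLog $ \ref -> mapM_ (gen ref >=> out ref) files

-- Single pass using bind and a lambda.
fusedLambda :: [String] -> IO [String]
fusedLambda files = withLog $ \ref -> mapM_ (\f -> gen ref f >>= out ref) files

main :: IO ()
main = do
  a <- fusedKleisli ["a.jpeg", "b.jpeg"]
  b <- fusedLambda ["a.jpeg", "b.jpeg"]
  mapM_ putStrLn a
  print (a == b)
```

Both variants log gen and out alternately, one file at a time, so each command is consumed as soon as it is produced and no list of results is built.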
For more complex I/O applications where you want better composability between small, monadic stream processors, you should use a library such as conduit or pipes.

Also, make sure you are compiling with either -O or -O2.

(*) To be exact, it is also a strictness issue, because in addition to building a large, intermediate list in memory, laziness causes mapM to build unnecessary thunks and use up stack.

EDIT: It seems the main culprit might be getDirectoryContents. Looking at the function's source code, it does essentially the same kind of list accumulation internally as mapM.

In order to do a streaming directory listing, we need to use System.Posix.Directory, which unfortunately makes the program incompatible with non-POSIX systems (like Windows). You can stream the directory contents e.g. by using continuation passing style:

    import System.Environment (getArgs)
    import Control.Monad ((>=>))
    import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
    import Control.Exception (bracket)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- have to read the file to calculate this for real
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    -- Call cont on each directory entry as it is read, skipping "." and "..".
    streamingDirContents :: FilePath -> (FilePath -> IO ()) -> IO ()
    streamingDirContents root cont = do
      let loop stream = do
            fp <- readDirStream stream
            case fp of
              [] -> return ()  -- an empty name marks the end of the stream
              _ | fp `notElem` [".", ".."] -> cont fp >> loop stream
                | otherwise -> loop stream
      -- bracket takes its arguments in the order acquire, release, use
      bracket (openDirStream root) closeDirStream loop

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      streamingDirContents indir (genCommand indir outdir >=> putStrLn)

Here's how to do the same thing using conduit:

    import System.Environment (getArgs)
    import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
    import Data.Conduit
    import qualified Data.Conduit.List as L
    import Control.Monad.IO.Class (liftIO, MonadIO)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- have to read the file to calculate this for real
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    dirSource :: (MonadResource m, MonadIO m) => FilePath -> Source m FilePath
    dirSource root =
      bracketP (openDirStream root) closeDirStream $ \stream -> do
        let loop = do
              fp <- liftIO $ readDirStream stream
              case fp of
                [] -> return ()
                _  -> yield fp >> loop
        loop

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      let files    = dirSource indir $= L.filter (`notElem` [".", ".."])
          commands = files $= L.mapM (liftIO . genCommand indir outdir)
      runResourceT $ commands $$ L.mapM_ (liftIO . putStrLn)

The nice thing about conduit is that you regain the ability to compose pieces of functionality, using things like the conduit versions of filter and mapM. The $= operator streams values forward in the chain and $$ connects the stream to a consumer.

The not-so-nice thing is that the real world is complicated, and writing efficient and robust code requires us to jump through some hoops with resource management. That's why all the operations work in the ResourceT monad transformer, which keeps track of e.g. open file handles and cleans them up promptly and deterministically when they are no longer needed, or e.g. if the computation gets aborted by an exception (this is in contrast to using lazy I/O and relying on the garbage collector to eventually release any scarce resources).

However, this means that we a) need to run the final resulting conduit operation with runResourceT, and b) need to explicitly lift I/O operations into the transformed monad using liftIO instead of being able to directly write e.g. L.mapM_ putStrLn.