haskell - Stack space overflow (possibly related to mapM)


I'm writing a program that creates a shell script containing one command for each image file in a directory. There are 667,944 images in the directory, so I need to handle the strictness/laziness issue properly.

Here's a simple example that gives me a stack space overflow. It does work if I give it more space using +RTS -Ksize -RTS, but it should be able to run with little memory, producing output immediately. I've been reading the material on strictness in the Haskell wiki and the Wikibook on Haskell, trying to figure out how to fix the problem, and I think it's one of the mapM commands that is giving me grief, but I still don't understand enough about strictness to sort out the problem.

I've found some other questions on SO that seem relevant (Is mapM in Haskell strict? Why does this program get a stack overflow? and Is Haskell's mapM not lazy?), but enlightenment still eludes me.

    import System.Environment (getArgs)
    import System.Directory (getDirectoryContents)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- would have to read the file to calculate the real value
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    main :: IO ()
    main = do
      putStrLn "#!/bin/sh"
      (indir:outdir:_) <- getArgs
      files <- getDirectoryContents indir
      let imageFiles = filter (`notElem` [".", ".."]) files
      commands <- mapM (genCommand indir outdir) imageFiles
      mapM_ putStrLn commands

Edit: Test #1

Here's the newest version of the example.

    import System.Environment (getArgs)
    import System.Directory (getDirectoryContents)
    import Control.Monad ((>=>))

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- would have to read the file to calculate the real value
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      files <- getDirectoryContents indir
      putStrLn $ show (length files)
      let imageFiles = filter (`notElem` [".", ".."]) files
      -- mapM_ (genCommand indir outdir >=> putStrLn) imageFiles
      mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles

I compile it with the command ghc --make -O2 amy2.hs -rtsopts. If I run it with the command ./amy2 ~/nosync/galaxyzoo/table2/images/ wombat, I get

    test 1
    Stack space overflow: current size 8388608 bytes.
    Use `+RTS -Ksize -RTS' to increase it.

If I instead run it with the command ./amy2 ~/nosync/galaxyzoo/table2/images/ wombat +RTS -K20M, I get the correct output... eventually:

    test 1
    667946
    convert /home/amy/nosync/galaxyzoo/table2/images//587736546846572812.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736546846572812.jpeg
    convert /home/amy/nosync/galaxyzoo/table2/images//587736542558617814.jpeg -rotate 0 -crop 143x143+140+140 wombat/587736542558617814.jpeg

...and so on.

This isn't really a strictness issue(*), but an order of evaluation issue. Unlike lazily evaluated pure values, monadic effects must happen in a deterministic order. mapM executes every action in the given list and gathers the results, but it cannot return until the whole list of actions has been executed, so you don't get the same streaming behavior as with pure list functions.
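To make the ordering concrete, here is a minimal sketch that records when each step runs. The names produce, consume, batched and interleaved are hypothetical stand-ins (produce plays the role of genCommand, consume the role of putStrLn); only base is assumed.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Record an event in the log (newest first).
logEvent :: IORef [String] -> String -> IO ()
logEvent ref e = modifyIORef' ref (e :)

-- Hypothetical stand-in for genCommand: logs when it runs, returns a string.
produce :: IORef [String] -> Int -> IO String
produce ref n = do
  logEvent ref ("produce " ++ show n)
  return ("cmd " ++ show n)

-- Hypothetical stand-in for putStrLn: logs when a result is consumed.
consume :: IORef [String] -> String -> IO ()
consume ref s = logEvent ref ("consume " ++ s)

-- mapM followed by mapM_: every producer runs before the first consumer.
batched :: IO [String]
batched = do
  ref <- newIORef []
  results <- mapM (produce ref) [1, 2, 3]
  mapM_ (consume ref) results
  reverse <$> readIORef ref

-- A single mapM_ pass: producer and consumer alternate per element.
interleaved :: IO [String]
interleaved = do
  ref <- newIORef []
  mapM_ (\n -> produce ref n >>= consume ref) [1, 2, 3]
  reverse <$> readIORef ref

main :: IO ()
main = do
  batched >>= mapM_ putStrLn
  putStrLn "--"
  interleaved >>= mapM_ putStrLn
```

In the batched version all three "produce" events come before any "consume" event, which is why nothing is printed until the whole list has been processed. In the interleaved version each result is consumed as soon as it is produced.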

The easy fix in this case is to run both genCommand and putStrLn inside the same mapM_. Note that mapM_ doesn't suffer from the same issue since it is not building an intermediate list.

    mapM_ (genCommand indir outdir >=> putStrLn) imageFiles

The above uses the "Kleisli composition operator" >=> from Control.Monad, which is like the function composition operator . except for monadic functions. You can also use normal bind and a lambda.

    mapM_ (\filename -> genCommand indir outdir filename >>= putStrLn) imageFiles
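As a small illustration of why the two forms are interchangeable, the sketch below composes two monadic functions in the Maybe monad both ways. The helpers parseInt and reciprocal are made up for this example; they are not part of the program above.

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- Hypothetical helpers, used only for illustration.
parseInt :: String -> Maybe Int
parseInt = readMaybe

reciprocal :: Int -> Maybe Double
reciprocal 0 = Nothing
reciprocal n = Just (1 / fromIntegral n)

-- Composed with the Kleisli operator.
viaKleisli :: String -> Maybe Double
viaKleisli = parseInt >=> reciprocal

-- The same pipeline written with bind and a lambda.
viaBind :: String -> Maybe Double
viaBind = \s -> parseInt s >>= reciprocal

main :: IO ()
main = mapM_ (print . viaKleisli) ["4", "0", "x"]
```

Both definitions give the same result for every input, so choosing between them is a matter of taste; the >=> form just avoids naming the intermediate argument.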

For more complex I/O applications where you want better composability between small, monadic stream processors, you should use a library such as conduit or pipes.

Also, make sure you are compiling with either -O or -O2.

(*) To be exact, it is also a strictness issue, because in addition to building a large, intermediate list in memory, laziness causes mapM to build unnecessary thunks and use up the stack.

Edit: It seems the main culprit might be getDirectoryContents. Looking at the function's source code, it does essentially the same kind of list accumulation internally as mapM.

In order to do a streaming directory listing, we need to use System.Posix.Directory, which unfortunately makes the program incompatible with non-POSIX systems (like Windows). You can stream the directory contents e.g. using continuation passing style:

    import System.Environment (getArgs)
    import Control.Monad ((>=>))

    import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
    import Control.Exception (bracket)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- would have to read the file to calculate the real value
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    -- Call cont on each directory entry as it is read, skipping "." and "..".
    streamingDirContents :: FilePath -> (FilePath -> IO ()) -> IO ()
    streamingDirContents root cont = do
        let loop stream = do
                fp <- readDirStream stream
                case fp of
                    [] -> return () -- readDirStream returns "" at the end
                    _   | fp `notElem` [".", ".."] -> cont fp >> loop stream
                        | otherwise -> loop stream
        -- bracket takes acquire, release, use (in that order)
        bracket (openDirStream root) closeDirStream loop

    main :: IO ()
    main = do
      putStrLn "test 1"
      (indir:outdir:_) <- getArgs
      streamingDirContents indir (genCommand indir outdir >=> putStrLn)
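The callback-plus-bracket shape of the streaming function can be tried without a real directory. The sketch below is an illustration only: streamItems and demo are hypothetical names, and an IORef holding a list stands in for the directory stream. It shows that bracket takes its arguments in the order acquire, release, use, and that the release step runs after the last callback.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef, writeIORef)

-- Feed each item of an in-memory "stream" to the callback,
-- logging when the stream is opened and closed.
streamItems :: IORef [String] -> [String] -> (String -> IO ()) -> IO ()
streamItems logRef items cont =
  bracket acquire release use
  where
    acquire = modifyIORef' logRef ("open" :) >> newIORef items
    release _ = modifyIORef' logRef ("close" :)
    use src = do
      remaining <- readIORef src
      case remaining of
        []       -> return ()
        (x : xs) -> do
          writeIORef src xs
          cont x      -- hand the item to the caller immediately
          use src     -- then continue with the rest

-- Run the stream with a callback that logs each item it receives.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  streamItems logRef ["a", "b"] (\x -> modifyIORef' logRef (("item " ++ x) :))
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= mapM_ putStrLn
```

Because the callback is invoked inside the loop, no list of items is ever accumulated on the caller's side, and bracket guarantees the close step even if the callback throws.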

Here's how to do the same thing using conduit:

    -- Uses the older conduit operators ($=, $$) and the Source type synonym.
    -- MonadResource, bracketP and runResourceT are assumed to be re-exported
    -- by Data.Conduit, as in the conduit release this was written for.
    import System.Environment (getArgs)

    import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)

    import Data.Conduit
    import qualified Data.Conduit.List as L
    import Control.Monad.IO.Class (liftIO, MonadIO)

    genCommand :: FilePath -> FilePath -> FilePath -> IO String
    genCommand indir outdir file = do
      let infile = indir ++ '/':file
      let angle = 0 -- would have to read the file to calculate the real value
      let outfile = outdir ++ '/':file
      return $! "convert " ++ infile ++ " -rotate " ++ show angle ++
        " -crop 143x143+140+140 " ++ outfile

    -- Yield directory entries one at a time; the stream is closed by bracketP.
    dirSource :: (MonadResource m, MonadIO m) => FilePath -> Source m FilePath
    dirSource root =
        bracketP (openDirStream root) closeDirStream $ \stream -> do
            let loop = do
                    fp <- liftIO $ readDirStream stream
                    case fp of
                        [] -> return ()
                        _  -> yield fp >> loop
            loop

    main :: IO ()
    main = do
        putStrLn "test 1"
        (indir:outdir:_) <- getArgs
        let files    = dirSource indir $= L.filter (`notElem` [".", ".."])
            commands = files $= L.mapM (liftIO . genCommand indir outdir)

        runResourceT $ commands $$ L.mapM_ (liftIO . putStrLn)

The nice thing about conduit is that you regain the ability to compose pieces of functionality with things like the conduit versions of filter and mapM. The $= operator streams stuff forward in the chain and $$ connects the stream to a consumer.

The not-so-nice thing is that the real world is complicated, and writing efficient and robust code requires you to jump through some hoops with resource management. That's why all the operations work in the ResourceT monad transformer, which keeps track of e.g. open file handles and cleans them up promptly and deterministically when they are no longer needed or e.g. if the computation gets aborted by an exception (this is in contrast to using lazy I/O and relying on the garbage collector to eventually release any scarce resources).

However, this means that we a) need to run the final resulting conduit operation with runResourceT and b) need to explicitly lift I/O operations to the transformed monad using liftIO instead of being able to directly write e.g. L.mapM_ putStrLn.

