Monads and mutable data structures |

I’m following the Stanford Algorithms MOOC and trying to implement the problems using Haskell. Many of the algorithms require quite a bit of data juggling, and my pure solutions are running way slower than benchmarks people are quoting for imperative languages. This is due to expensive copying and appending.

So I wanted to explore mutable data structures, which took me into the world of Monads. This page benefits from the advice I received to this and other Stack Overflow posts.

QuickSort is an archetypal example of an algorithm using mutability, and will form the basis of the following examples.

ST Monad with mutable array

Consider the following:

	qsort :: (STArray s Int Int) -> Int -> Int -> ST s ()
	qsort arr min mx =
		if mx - min < 1 then
			return ()

		else do
			p <- readArray arr min
			final_i <- foldM (partitioner p) (min+1) [(min+1)..mx]
			swap min $ final_i - 1
			qsort arr min     (final_i-2)
			qsort arr final_i mx

		where
			swap i j = do
				arr_i <- readArray arr i
				arr_j <- readArray arr j
				writeArray arr i arr_j
				writeArray arr j arr_i

			partitioner p i idx = do
				arr_idx <- readArray arr idx
				if arr_idx > p then
					return i
				else do
					swap i idx
					return $ i+1

	main = do
		let
			inputData = [3,1,5,4,2]
	{-
		print $ elems $ runSTArray $ do
			--newListArray :: (MArray a e m, Ix i) => (i, i) -> [e] -> m (a i e)
			state <- newListArray (1, length inputData) inputData
			qsort state 1 (length inputData)
			return state
	-}
		print $ elems $ runST $ do
			--newListArray :: (MArray a e m, Ix i) => (i, i) -> [e] -> m (a i e)
			state <- newListArray (1, length inputData) inputData
			qsort state 1 (length inputData)
			frozen <- freeze state
			return frozen

Starting from main, note how we wrap the main instructions in runST or runSTArray. runST is the more generally useful code, and it needs to be provided with a ‘pure’ value to work. So I freeze the mutable state (in this case an Array) and return, and print, that. runSTArray is special syntax which has the same effect for mutable arrays.

Qsort itself takes the relevant mutable array, undertakes a succession of recursive partitions of the data, and ultimately stops. It does not return anything, but of course leaves the state considerably changed, hence the return type of State s ().

Note how foldM works exactly as foldl, but that it returns a Monad, so I extract the value from it for later use.

IO Monad

Another approach is to use the IO Monad, again with an Array as an example. This is simpler to write and, in particular, avoids the complexity of understanding runST. I believe ST provides more checks of some sort, so that might make it preferable.

	main = do
	    [snip]
	    arr <- newListArray (0, length inputData - 1) inputData :: IO (IOArray Int Int)
	    qsort arr 0 (length inputData - 1)
	    printArray arr

	qsort :: (IOArray Int Int) -> Int -> Int -> IO ()
	[As above]

Mutable Vector

Vectors are normally to be preferred over Arrays. The resulting code is unsurprisingly similar (Haskell is logical if sometimes bewildering!). The main issues I faced were

finding the way to create the mutable vector in the first place, although I suspect that that was mostly due to;
adding the type signature for qsort.

Here is my code

	import Control.Monad
	import Control.Monad.ST
	import qualified Data.Vector as V
	import qualified Data.Vector.Mutable as MV
	--import qualified Data.Vector.Unboxed.Mutable as MV

	main = do
		let
			inputData = [5,1,4,3,2]
			initialImmutableVector = V.fromList inputData

		print $ runST $ do
			mutableState <- V.thaw initialImmutableVector
			qsort mutableState 0 (Prelude.length inputData - 1)
			frozen <- V.freeze mutableState
			return frozen

	qsort :: (MV.STVector s Int) -> Int -> Int -> ST s ()
	qsort vec min mx =
		if mx - min < 1 then
			return ()

		else do
			p <- MV.read vec min
			final_i <- foldM (partitioner p) (min+1) [(min+1)..mx]
			swap min (final_i - 1)
			qsort vec min     (final_i-2)
			qsort vec final_i mx

		where
			swap i j = do
				vec_i <- MV.read vec i
				vec_j <- MV.read vec j
				MV.write vec i vec_j
				MV.write vec j vec_i

			partitioner p i acc = do
				vec_acc <- MV.read vec acc
				if vec_acc > p then
					return i
				else do
					swap i acc
					return $ i+1

IO Vector

TBC

How quick are the various solutions?

Simplest things is to just do ‘:set +s’ in ghci, and then you can see the execution time of anything you run, along with memory usage.

Immutable Vector 8.3 seconds STVector - 0.84 seconds