A Wander through GHC’s New IO library

Simon Marlow

The 100-mile view
• the API changes:
– Unicode
• putStr “A légpárnás hajóm tele van angolnákkal” works! (if your editor is set up right…)
• locale-encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
• changing the encoding on the fly:

    hSetEncoding :: Handle -> TextEncoding -> IO ()
    hGetEncoding :: Handle -> IO (Maybe TextEncoding)

    data TextEncoding
    latin1, utf8, utf16, utf32, … :: TextEncoding
    mkTextEncoding :: String -> IO TextEncoding
    localeEncoding :: TextEncoding
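As a minimal sketch of switching encodings on a Handle, the program below writes the same string under three encodings and counts the bytes that reach the file. The helper name encodedLength and the temporary file name are made up for illustration, and the sketch assumes the directory package that ships with GHC is available.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical helper: write a string using one encoding, then count the
-- bytes on disk by reading the file back through a binary-mode Handle.
encodedLength :: TextEncoding -> String -> IO Int
encodedLength enc str = do
  tmp <- getTemporaryDirectory
  (path, h) <- openTempFile tmp "encoding-demo.txt"
  hSetEncoding h enc               -- change the encoding on the fly
  hPutStr h str
  hClose h
  n <- withBinaryFile path ReadMode $ \hb -> do
         bytes <- hGetContents hb  -- binary mode: one Char per byte
         return $! length bytes    -- force the read before the Handle closes
  removeFile path
  return n

main :: IO ()
main = do
  let s = "l\233gp\225rn\225s"     -- "légpárnás", spelled with escapes
  n8  <- encodedLength utf8 s
  n1  <- encodedLength latin1 s
  e16 <- mkTextEncoding "UTF-16LE" -- look an encoding up by name
  n16 <- encodedLength e16 s
  print (n8, n1, n16)              -- prints (12,9,18)
```

The nine characters include three accented letters, so UTF-8 needs 12 bytes, Latin-1 needs 9, and UTF-16LE (no byte-order mark) needs 18.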

The 100-mile view (cont.)
• Better newline support
– teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

    hSetNewlineMode :: Handle -> NewlineMode -> IO ()

    data Newline = LF   {- "\n" -}
                 | CRLF {- "\r\n" -}

    nativeNewline :: Newline

    data NewlineMode = NewlineMode { inputNL  :: Newline,
                                     outputNL :: Newline }

    noNewlineTranslation = NewlineMode { inputNL = LF,   outputNL = LF }
    universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
    nativeNewlineMode    = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }
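To see the translation at work, here is a small sketch that writes text with CRLF output translation and reads it back twice: once untranslated and once with universalNewlineMode. The helper viaFile, the value crlfOut and the temporary file name are assumptions introduced for the example; it also assumes the directory package is available.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical helper: write a string through a text Handle using one
-- newline mode, then read it back through a second Handle using another.
viaFile :: NewlineMode -> NewlineMode -> String -> IO String
viaFile writeMode readMode str = do
  tmp <- getTemporaryDirectory
  (path, hw) <- openTempFile tmp "newline-demo.txt"
  hSetNewlineMode hw writeMode
  hPutStr hw str
  hClose hw
  out <- withFile path ReadMode $ \hr -> do
           hSetNewlineMode hr readMode
           s <- hGetContents hr
           length s `seq` return s   -- force the read before the Handle closes
  removeFile path
  return out

-- | Always emit "\r\n" on output, whatever the platform default is.
crlfOut :: NewlineMode
crlfOut = NewlineMode { inputNL = LF, outputNL = CRLF }

main :: IO ()
main = do
  -- Read back untranslated: the "\r\n" pairs are visible.
  raw <- viaFile crlfOut noNewlineTranslation "one\ntwo\n"
  print raw      -- prints "one\r\ntwo\r\n"
  -- Read back with CRLF input translation: plain "\n" again.
  cooked <- viaFile crlfOut universalNewlineMode "one\ntwo\n"
  print cooked   -- prints "one\ntwo\n"
```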

The 10-mile view
• Unicode codecs:
– built-in codecs for UTF-8, UTF-16(LE,BE), UTF-32(LE,BE).
– Other codecs use iconv on Unix systems
– Built-in codecs only on Windows (no code pages yet…)
– The pieces for building a codec are provided…

The 10-mile view (cont.)
• Build your own codec: API in GHC.IO.Encoding
    data BufferCodec from to state = BufferCodec {
        encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
        close    :: IO (),
        getState :: IO state,
        setState :: state -> IO ()
      }

    type TextEncoder state = BufferCodec Char Word8 state
    type TextDecoder state = BufferCodec Word8 Char state

    data TextEncoding = forall dstate estate . TextEncoding {
        mkTextDecoder :: IO (TextDecoder dstate),
        mkTextEncoder :: IO (TextEncoder estate)
      }

– Saving and restoring state (getState, setState) is important since Handles support buffering, random access, and changing encodings

The 1-mile view
• Make your own Handles!
    mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev)
                 => dev
                 -> FilePath
                 -> IOMode
                 -> Maybe TextEncoding
                 -> NewlineMode
                 -> IO Handle

– IODevice: type class providing I/O device operations: close, seek, getSize, …
– BufferedIO: type class providing buffered reading/writing
– Typeable: in case we need to take the Handle apart again later
– FilePath: for error messages
– IOMode: ReadMode/WriteMode/…
– why mkFileHandle, not mkHandle?
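The simplest device to hand to mkFileHandle is an ordinary file descriptor, which already has the required instances. The sketch below is written under two assumptions: that mkFileHandle is imported from GHC.IO.Handle, and that the installed base has the three-argument FD.openFile (the third argument requests non-blocking mode; older releases take only two arguments). The name openViaDevice is hypothetical.

```haskell
import qualified GHC.IO.FD as FD
import GHC.IO.Handle (mkFileHandle)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical helper: open a file as a raw FD device, then wrap the
-- device in a Handle ourselves instead of calling System.IO.openFile.
openViaDevice :: FilePath -> IOMode -> IO Handle
openViaDevice path mode = do
  -- Assumption: recent base, where FD.openFile takes a Bool asking for
  -- non-blocking mode as its third argument.
  (fd, _devType) <- FD.openFile path mode True
  -- The FilePath is only used for error messages.
  mkFileHandle fd path mode (Just utf8) noNewlineTranslation

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  (path, h0) <- openTempFile tmp "mkfilehandle-demo.txt"
  hClose h0                       -- we only wanted a fresh file name
  hw <- openViaDevice path WriteMode
  hPutStrLn hw "hello from a hand-made Handle"
  hClose hw
  hr <- openViaDevice path ReadMode
  line <- hGetLine hr
  hClose hr
  removeFile path
  putStrLn line
```

Once built, the Handle works with the usual System.IO operations; nothing downstream needs to know which device sits underneath.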

IODevice
    -- | I/O operations required for implementing a 'Handle'.
    class IODevice a where
      -- | closes the device. Further operations on the device should
      -- produce exceptions.
      close :: a -> IO ()

      -- | seek to the specified position in the data.
      seek :: a -> SeekMode -> Integer -> IO ()
      seek _ _ _ = ioe_unsupportedOperation

      -- | return the current position in the data.
      tell :: a -> IO Integer
      tell _ = ioe_unsupportedOperation

      -- | returns 'True' if the device is a terminal or console.
      isTerminal :: a -> IO Bool
      isTerminal _ = return False

      … etc …

– Default is for the operation to be unsupported

BufferedIO
    class BufferedIO dev where
      newBuffer         :: dev -> BufferState -> IO (Buffer Word8)
      fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
      fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
      emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

– Device gets to allocate the buffer. This allows the device to choose the buffer to point directly at the data in memory, for example.
– 0-versions are non-blocking, non-0 versions must read or write at least one byte (but may transfer less than the whole buffer)

RawIO
    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
      read             :: a -> Ptr Word8 -> Int -> IO Int
      readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
      write            :: a -> Ptr Word8 -> Int -> IO ()
      writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

    readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
    readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
    writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
    writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

Example: a memory-mapped Handle
• Random-access read/write doesn’t perform very well with ordinary buffered I/O.
– Let’s implement a Handle backed by a memory-mapped file
– We need to
1. define our device type
2. make it an instance of IODevice and BufferedIO
3. provide a way to create instances

Example: memory-mapped files
1. Define our device type
    data MemoryMappedFile = MemoryMappedFile {
        mmap_fd     :: FD,
        mmap_addr   :: !(Ptr Word8),
        mmap_length :: !Int,
        mmap_ptr    :: !(IORef Int)
      }
      deriving Typeable

– mmap_fd: ordinary file descriptor, provided by GHC.IO.FD
– mmap_addr, mmap_length: address in memory where our file is mapped, and its length
– mmap_ptr: the current file pointer (Handles have a built-in notion of the “current position” that we have to emulate)
– Typeable is one of the requirements for making a Handle

aside: Buffers
    module GHC.IO.Buffer ( Buffer(..), .. ) where

    data Buffer e = Buffer {
        bufRaw   :: !(ForeignPtr e),
        bufState :: BufferState,   -- ReadBuffer | WriteBuffer
        bufSize  :: !Int,          -- in elements, not bytes
        bufL     :: !Int,          -- offset of first item in the buffer
        bufR     :: !Int           -- offset of last item + 1
      }

[Figure: diagram of a Buffer, labelled bufRaw, bufL, Data, bufR and bufSize]
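The relationship between bufL, bufR and bufSize can be explored with the helper functions that GHC.IO.Buffer exports alongside the record. This is only a sketch: occupancy is a made-up name, and the exact set of Buffer fields has grown in later releases, so the example sticks to bufL, bufR and the exported helpers.

```haskell
import Data.Word (Word8)
import GHC.IO.Buffer

-- | Hypothetical helper: (unread elements, free slots after the data).
occupancy :: Buffer Word8 -> (Int, Int)
occupancy buf = (bufferElems buf, bufferAvailable buf)

main :: IO ()
main = do
  buf0 <- newByteBuffer 16 WriteBuffer   -- bufSize = 16, bufL = bufR = 0
  let filled  = bufferAdd 10 buf0        -- 10 elements added: bufR moves right
      partial = bufferRemove 4 filled    -- 4 elements consumed: bufL moves right
      drained = bufferRemove 6 partial   -- the rest consumed
  print (occupancy buf0)                 -- prints (0,16)
  print (occupancy filled)               -- prints (10,6)
  print (bufL partial, bufR partial)     -- prints (4,10)
  print (bufL drained, bufR drained)     -- prints (0,0)
```

When bufL catches up with bufR the library resets both offsets to zero, which is why the drained buffer has all 16 slots available again.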

Example: memory-mapped files
2. (a) make it an instance of BufferedIO

    instance BufferedIO MemoryMappedFile where
      newBuffer m state = do
        fp <- newForeignPtr_ (mmap_addr m)
        return (emptyBuffer fp (mmap_length m) state)

      fillReadBuffer m buf = do
        p <- readIORef (mmap_ptr m)
        let l = mmap_length m
        if (p >= l)
           then do return (0, buf{ bufL=p, bufR=p })
           else do writeIORef (mmap_ptr m) l
                   return (l-p, buf{ bufL=p, bufR=l })

      flushWriteBuffer m buf = do
        writeIORef (mmap_ptr m) (bufR buf)
        return buf{ bufL = bufR buf }

– fillReadBuffer returns the entire file!
– flush is a no-op: just remember where to read from next

Example: memory-mapped files
2. (b) make it an instance of IODevice
    instance IODevice MemoryMappedFile where
      close = IODevice.close . mmap_fd

      seek m mode val = do
        let sz = mmap_length m
        ptr <- readIORef (mmap_ptr m)
        let off = case mode of
                    AbsoluteSeek -> fromIntegral val
                    RelativeSeek -> ptr + fromIntegral val
                    SeekFromEnd  -> sz + fromIntegral val
        when (off < 0 || off >= sz) $ ioe_seekOutOfRange
        writeIORef (mmap_ptr m) off

      tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

      getSize = return . fromIntegral . mmap_length

      … etc …
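The offset arithmetic in seek can be checked in isolation. The following is a pure model of it, not the slide's code: resolveSeek is a hypothetical name, it returns Either instead of throwing, and it does the arithmetic in Integer rather than Int.

```haskell
import System.IO (SeekMode (..))

-- | Hypothetical pure model of the seek logic: resolve a request to an
-- absolute offset, or report it out of range.
-- Arguments: file size, current position, seek mode, requested value.
resolveSeek :: Int -> Int -> SeekMode -> Integer -> Either String Int
resolveSeek size pos mode val
  | off < 0 || off >= toInteger size = Left ("seek out of range: " ++ show off)
  | otherwise                        = Right (fromInteger off)
  where
    off = case mode of
      AbsoluteSeek -> val
      RelativeSeek -> toInteger pos + val
      SeekFromEnd  -> toInteger size + val

main :: IO ()
main = do
  print (resolveSeek 100 10 AbsoluteSeek 5)     -- Right 5
  print (resolveSeek 100 10 RelativeSeek (-3))  -- Right 7
  print (resolveSeek 100 10 SeekFromEnd (-1))   -- Right 99
  print (resolveSeek 100 10 SeekFromEnd 0)      -- Left "seek out of range: 100"
```

The last case shows a consequence of the `off >= sz` test: with this check, seeking to the exact end of the mapping is rejected.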

Example: memory-mapped files
3. provide a way to create instances

    mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
    mmapFile filepath iomode binary = do
      (fd,_devtype) <- FD.openFile filepath iomode
      sz <- IODevice.getSize fd
      addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
      ptr <- newIORef 0
      let m = MemoryMappedFile { mmap_fd = fd,
                                 mmap_addr = castPtr addr,
                                 mmap_length = fromIntegral sz,
                                 mmap_ptr = ptr }

      let (encoding, newline)
            | binary    = (Nothing, noNewlineTranslation)
            | otherwise = (Just localeEncoding, nativeNewlineMode)

      mkFileHandle m filepath iomode encoding newline

– Open the file and mmap() it
– Call mkFileHandle to build the Handle

Demo…
$ ./Setup configure
Configuring mmap-handle-0.0...
$ ./Setup build
Preprocessing library mmap-handle-0.0...
Building mmap-handle-0.0...
[1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
Registering mmap-handle-0.0...
$ ./Setup register --inplace --user
Registering mmap-handle-0.0...
$ ghc-pkg list --user
/home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
    mmap-handle-0.0

Demo…
$ cat test.hs
import System.IO
import System.Posix.IO.MMap
import System.Environment
import Data.Char

main = do
  [file,test] <- getArgs
  h <- if test == "mmap"
          then mmapFile file ReadWriteMode True
          else openBinaryFile file ReadWriteMode
  sequence_ [ do hSeek h SeekFromEnd (-n)
                 c <- hGetChar h
                 hSeek h AbsoluteSeek n
                 hPutChar h c
            | n <- [ 1..10000] ]
  hClose h
  putStrLn "done"
$ ghc test.hs --make
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...

Timings…
$ time ./test /tmp/words file
done
0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file
$ time ./test /tmp/words mmap
done
0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap
$ time ./test ./words file    # ./ is NFS-mounted
done
10.44s real 0.20s user 0.52s system 6% ./test tmp file
$ time ./test ./words mmap    # ./ is NFS-mounted
done
0.10s real 0.09s user 0.00s system 93% ./test tmp mmap

More examples
• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• Handle that reads from a ByteString/text
• Handle that reads from text

The -1 mile view
• Inside the IO library
– The file-descriptor functionality is cleanly separated from the implementation of Handles:
• GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
• GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
• GHC.IO.Handle has nothing to do with FDs

Implementation of Handle
    data Handle__
      = forall dev enc_state dec_state .
          (IODevice dev, BufferedIO dev, Typeable dev) =>
        Handle__ {
          haDevice     :: !dev,
          haType       :: HandleType,   -- read/write/append etc.
          haByteBuffer :: !(IORef (Buffer Word8)),
          haCharBuffer :: !(IORef (Buffer CharBufElem)),
          haEncoder    :: Maybe (TextEncoder enc_state),
          haDecoder    :: Maybe (TextDecoder dec_state),
          haCodec      :: Maybe TextEncoding,
          haInputNL    :: Newline,
          haOutputNL   :: Newline,
          .. some other things ..
        }
        deriving Typeable

– Existential: packs up the IODevice, BufferedIO, Typeable dictionaries, and codec state is existentially quantified
– Two buffers: one for bytes, one for Chars.
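The existential packaging is easier to see on a toy type. In the sketch below every name (Device, FileDev, MemDev, AnyHandle, summary, fileDescriptor) is hypothetical and stands in for the real classes and for Handle__; it shows how the packed-up dictionaries are used, and how Typeable lets us take the wrapper apart again.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- | Hypothetical stand-in for the IODevice class.
class Device d where
  describe :: d -> String

newtype FileDev = FileDev Int      -- pretend file descriptor
newtype MemDev  = MemDev String    -- pretend in-memory data

instance Device FileDev where describe (FileDev n) = "fd " ++ show n
instance Device MemDev  where describe (MemDev s)  = "memory: " ++ s

-- | Hypothetical stand-in for Handle__: hides the device type but keeps
-- its Device and Typeable dictionaries.
data AnyHandle = forall dev . (Device dev, Typeable dev) => AnyHandle dev

-- | Works for any device, through the packed-up Device dictionary.
summary :: AnyHandle -> String
summary (AnyHandle d) = describe d

-- | Take the wrapper apart again: succeeds only if it holds a FileDev.
fileDescriptor :: AnyHandle -> Maybe Int
fileDescriptor (AnyHandle d) = case cast d of
  Just (FileDev n) -> Just n
  Nothing          -> Nothing

main :: IO ()
main = do
  let handles = [AnyHandle (FileDev 3), AnyHandle (MemDev "abc")]
  mapM_ (putStrLn . summary) handles   -- prints "fd 3" then "memory: abc"
  print (map fileDescriptor handles)   -- prints [Just 3,Nothing]
```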

Where to go from here
• This is a step in the right direction, but there is still some obvious ugliness
– We haven’t changed the external API, only added to it
– There should be a binary I/O layer
• hPutBuf working on Handles is wrong: binary Handles should have a different type
• in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
– FilePath should be an abstract type.
• On Windows, FilePath = String, but on Unix, FilePath = [Word8].
– Should we rethink Handles entirely?
• OO-style layers: binary IO, buffering, encoding
• Separate read Handles from write Handles?
– read/write Handles are a pain
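One way to picture the last question is with thin wrapper types over today's Handle. This is purely an illustrative sketch: ReadHandle, WriteHandle and the functions around them are invented here and are not part of base or of any proposal in these slides. It assumes the directory package is available.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical wrappers: neither type exists in base.
newtype ReadHandle  = ReadHandle Handle
newtype WriteHandle = WriteHandle Handle

openForReading :: FilePath -> IO ReadHandle
openForReading path = ReadHandle <$> openFile path ReadMode

openForWriting :: FilePath -> IO WriteHandle
openForWriting path = WriteHandle <$> openFile path WriteMode

-- Each direction exposes only the operations that make sense for it, so
-- writing to a read Handle becomes a type error, not a run-time failure.
readLine :: ReadHandle -> IO String
readLine (ReadHandle h) = hGetLine h

writeLine :: WriteHandle -> String -> IO ()
writeLine (WriteHandle h) = hPutStrLn h

closeRead :: ReadHandle -> IO ()
closeRead (ReadHandle h) = hClose h

closeWrite :: WriteHandle -> IO ()
closeWrite (WriteHandle h) = hClose h

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  (path, h0) <- openTempFile tmp "split-handles-demo.txt"
  hClose h0                    -- we only wanted a fresh file name
  w <- openForWriting path
  writeLine w "typed by direction"
  closeWrite w
  r <- openForReading path
  line <- readLine r
  closeRead r
  removeFile path
  putStrLn line
```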
