A Questionable Interpretation of the Free Monad

Preamble: My demonstration is a Telegram bot because that's what I've been working with recently, but in principle, the idea I discuss should be doable with any UI toolkit (native/web/Slack bot/whatever). The bindings I'm using can be found here and were written by me, and are half-published, in the sense that the source is in that repository, but not on Hackage, because I've been making too many breaking changes to feel comfortable publishing it there yet. That said, I am mostly happy with how it's turning out. The API there is mostly just the typical MTL/ReaderT pattern with a bunch of types and methods defining how to interact with Telegram. This post is discussing one potentially-large change I'm deliberating, with the experimental API living in this separate example repository, which I walk through in this post.

Consider the following:

main :: IO ()
main = do
  putStrLn "Watch me compute the 'or' function! Choose two bools:"
  x <- readLn
  y <- readLn
  putStrLn $ "Result: " <> show (x || y)

I like this API. Unfortunately, most of my real-world code does not look like this. When I make an application, interactively asking the user for a couple inputs and doing something with them usually involves a couple of orders of magnitude more code.

So I made the following experimental Telegram API, which, modulo the redacted bot token (you can get your own from @botfather), is a runnable example with cabal run:

import qualified Data.Text as T
import TgFree


instance Choosable Bool

main :: IO ()
main = runTg "YOUR_BOT_TOKEN" \_ -> do
  sendMessage "Watch me compute the 'or' function! Choose two bools:"
  x <- askUserChoice "First bool"
  y <- askUserChoice "One more"
  sendMessage $ "Result: " <> T.show (x || y)

Here is a video of what it looks like:

So, how does it work?

The Choosable class specifies how to display the choices of an Enum, with parent classes Bounded and Enum providing the low-level encoding that goes over the network:

class (Bounded e , Enum e) => Choosable e where
  default choiceName :: Show e => e -> Text
  choiceName :: e -> Text
  choiceName = T.show

The meat is runTg. At a high level, it's a loop wrapping the longpolling API to wait for incoming messages, which it then passes to your closure and executes the resulting monad. The sendMessage command is just sending an HTTP request to the sendMessage endpoint with all the fluff shoved under the rug to make my example look pretty.

The askUserChoice lines are more interesting: we are blocking on user input from Telegram a la readLn. At first, you might ask "wait, what if you have more than one user? Is the application stuck?" Well, here's the first difference with readLn: askUserChoice "blocks" in a threaded way, more like await. Further, the resumption is specifically on a response to this particular message: if the user proceeds in some other way, those inputs will be processed as usual.

I ask the reader to pause and consider this for a moment, as it raises several questions: how does one implement this? What happens if the user doesn't respond, or doesn't respond in a reasonable amount of time? Isn't the "application state" being stored in the closure, i.e., if you turn the server off and turn it back on, it will "forget" where the user is? And most importantly: is this even valid, much less a good idea?

Well, some of these I know the answer to, others I don't. I'll address them out of order:

Isn't the "application state" being stored in the closure, i.e., if you turn the server off and turn it back on, it will "forget" where the user is?

Yes. The standard way of doing this is you send the message with the choice list, get a response with that MessageId, store that ChatId/MessageId in a database somewhere representing the partially-completed state of the form, then when a future input comes, you read from the database, see if that input is an interaction related to this partially-completed form, and if so, update the state and send the next part. This results in the various steps of your form living in scattered places around your code file, with the state living in cumbersome constructions of partially-completed form representations. It does work, yes, but it's verbose, bug-prone, and very difficult to read, which is the motivation for creating this new API inspired by readLn. But yes, this new way does come at the cost of the ability to "remember" a partially-completed form state when you turn the application off and on.

What happens if the user doesn't respond, or doesn't respond in a reasonable amount of time?

If they don't respond, the closure sits in memory until you restart the application. That said, it should not be difficult to implement a garbage collector, where either a timeout or a user interaction taking the bot in a different direction would discard old closures. In terms of UX, this could either delete the partially-completed form messages or simply leave them there as inert (not responding to future inputs or giving a friendly error saying to restart the form).

How does one implement this?

This may be obvious to some, but it wasn't obvious to me, and took me a fair bit of thinking and tinkering to get it to work. The typical tactic of defining a MonadTelegram interface with a sendRequest method which you implement for ReaderT TelegramToken IO obviously didn't work, as you wind up with no way to get your hands on the "next" part to store it somewhere for the "await" semantics. So, in order to be able to explicitly manipulate the "next part", I reached for Free:

data TgCommand a where
  SendText :: Text -> (() -> a) -> TgCommand a
  Ask :: [ Text ] -> Text -> (Int -> a) -> TgCommand a
  LiftOldAPI :: forall b a . T.TRequest b -> (b -> a) -> TgCommand a

instance Functor TgCommand where
  fmap f (SendText t n) = SendText t (f . n)
  fmap f (Ask cs t n) = Ask cs t (f . n)
  fmap f (LiftOldAPI x n) = LiftOldAPI x (f . n)

type Tg = Codensity (F.Free TgCommand)

{-# INLINE sendMessage #-}
sendMessage :: Text -> Tg ()
sendMessage m = lift $ F.Free $ SendText m F.Pure

{-# INLINE askUserChoice #-}
askUserChoice :: forall e . Choosable e => Text -> Tg e
askUserChoice t = lift $ F.Free $
  Ask (choiceName @e <$> [ minBound .. maxBound ]) t $ F.Pure . toEnum

sendMessage and askUserChoice are ergonomics for the public API, so the user of the library doesn't have to know anything about Codensity and Free. LiftOldAPI is a portal into my existing API, not relevant to anything here.

I interpret this structure into the Telegram monad (the normal one from the preamble, just an alias for ReaderT _ IO):

type Callback = (T.CallbackQueryId,T.CallbackContent)
type ClosureMap = M.Map (ChatId,MessageId) (Chan Callback)

{-# INLINE interpretTg #-}
interpretTg :: ChatId -> IORef ClosureMap -> Tg a -> T.Telegram a
interpretTg chat_id r = \x -> interpretF chat_id r (lowerCodensity x)

The key thing to note here is IORef ClosureMap. The await mechanism will be implemented with that Chan Callback: when we ask the user for a choice, we'll allocate a Chan, store it in this map at the corresponding (ChatId,MessageId), and block on reading from that Chan.

Here's the full interpreter, split into two parts to show the normal and interesting sections:

interpretF :: ChatId -> IORef ClosureMap -> F.Free TgCommand a -> T.Telegram a
interpretF chat_id r = go
  where
    go :: F.Free TgCommand a -> T.Telegram a
    go = \case
      F.Pure x -> pure x
      F.Free (LiftOldAPI x n) -> do
        v <- T.throwTelegram x
        go $ n v
      F.Free (SendText t n) -> do
        T.throwTelegram $ T.sendMessage T.OutgoingMessage {
            chat_id = chat_id
          , text = t
          , parse_mode = T.GFM
          , disable_notification = False
          , reply_to_message_id = Nothing
          , reply_markup = Nothing
          }
        go $ n ()

The above section is straightforward: you just interpret the commands into the existing Telegram monad and call the next step with whatever the result was supposed to be. The next case is the interesting one, where the await logic is implemented:

      F.Free (Ask choices t n) -> do
        msg <- T.throwTelegram $ T.sendMessage T.OutgoingMessage {
            chat_id = chat_id
          , text = t
          , parse_mode = T.GFM
          , disable_notification = False
          , reply_to_message_id = Nothing
          , reply_markup = Just $ T.InlineKeyboardMarkup $ flip imap choices \i c -> pure
              T.InlineKeyboardButton {
                text = c
              , url = Nothing
              , callback_data = Just (T.show i)
              }
          }
        (callback_query_id,choice) <- liftIOTg do
          chan <- newChan
          modifyIORef' r (M.insert (chat_id,msg.message_id) chan)
          readChan chan <&> fmap \case
            T.GameShortName _ -> error "Unexpected callback content."
            T.Data d -> case readMaybe (T.unpack d) of
              Nothing -> error "Unexpected callback content."
              Just (i :: Int) -> if i < Prelude.length choices
                then i else error "Unexpected callback content"
        T.throwTelegram $
          (,) <$> T.editMessageReplyMarkup chat_id msg.message_id do
                     pure $ T.InlineKeyboardMarkup $ pure $ pure $ T.InlineKeyboardButton {
                         text = choices !! choice
                       , url = Nothing
                       , callback_data = Just "_"
                       }
              <*> T.answerCallbackQuery T.CallbackQueryAnswer {
                      callback_query_id = callback_query_id
                    , text = Nothing
                    , show_alert = False
                    , url = Nothing
                    , cache_time = Nothing
                    }
        go $ n choice

The gist is we send the choice message, allocate a Chan, store it in the IORef ClosurMap for the (ChatId,MessageId) pair so we can find it later, then block on the chan. The penultimate part is just some UX fluff to modify the widget in Telegram to display the user's selected choice and make Telegram happy. The final line is simply interpreting the next step with the decoded result.

The key thing to notice is that so far we've only allocated a Chan and blocked on it: nowhere have we yet sent anything over the Chan! That comes in the implementation of runTg:

runTg :: T.BotToken -> (T.Incoming -> Tg ()) -> IO ()
runTg t f = do
  r :: IORef ClosureMap <- newIORef mempty
  longpollRunner
    do t
    do \case
        T.CallbackQuery { chat = chat , callback_query_id = cid , messageId = mid , content = d } -> do
          m <- liftIOTg $ readIORef r
          case M.lookup (chat.chat_id,mid) m of
            Nothing -> do
              T.throwTelegram $ T.answerCallbackQuery T.CallbackQueryAnswer {
                      callback_query_id = cid
                    , text = Nothing
                    , show_alert = False
                    , url = Nothing
                    , cache_time = Nothing
                    }
              pure ()
            Just c -> liftIOTg do
              modifyIORef' r (M.delete (chat.chat_id,mid))
              writeChan c (cid,d)
        i@(T.New { chat_id = chat_id }) -> do
          let T.Telegram (ReaderT io) = interpretTg chat_id r (f i)
          T.Telegram $ ReaderT \cx ->
            forkIO $ io cx
          pure ()
        _ -> pure ()

In the above, for an incoming CallbackQuery (Telegram lingo for "the user selected a choice from your widget"), look in the closure map for a Chan we stored earlier, and if found, take it out of the map and execute it with the decoded choice! For the other case, when we have an incoming message, note specifically that we don't just interpret directly into the parent monad: rather, we break it apart, throw a forkIO in, and put the pieces back together. This is the part I'm least confident in: I have difficulty proving to myself this is "correct" (whatever that means here), but I also have difficulty proving to myself it's incorrect. To be clear, forkIO has to live somewhere, otherwise we'd just be blocking the whole application. The most convincing argument I can make to myself is when you use the webhooks API (which you typically do for a real server: the longpoll API is typically just used for development), there's a forkIO hiding implicitly in the handler anyway (otherwise your server would only respond to one request at a time).

For the _ -> pure () case, don't worry about that: that's just me saying I don't care to think about a bunch of other superfluous Telegram stuff like the shopping API or how the bot behaves when administering a channel, etc. Obviously for a fuller treatment, that would be handled, but none of that is relevant to the main idea I'm playing with at present.

And now to the final question:

Is this even valid, much less a good idea?

I don't know. That's why I'm writing this, to see what others think of it.

My thoughts are tarnished by my experience with writing interactive bots: in my experience, it is much more cumbersome than writing similar forms for the browser, where typically all the pieces of the form are sent to the server in a single JSON request. For the bot paradigm, writing this sort of interactive flow in the traditional way, where you manually keep track of partially-completed form states in a database, is extremely verbose, bug-prone, and some of the least-readable and least-maintainable Haskell code I've ever written. This experiment is my attempt to make it not verbose: to have all the logic live in one place, and be readable and maintainable at least by a future version of myself. If I have to give up half-completed form states on server restarts, that is a price I am willing to pay. My main concern is what I'm doing feels so unorthodox and new that I might just be crazy and none of this actually works or behaves the way I want at all (though the code does run and seems to behave as I expect...)

For example, an explicit questionable behavior I can enunciate is that this pattern should obviously also apply to asking for textual input, but then that raises the question when you implement n <- askUserText "What is your name?", what happens when the user types "/cancel"? You end up in this quirky state where you're unable to name yourself "/cancel" (that, or the "/cancel" command doesn't work). This does rub me the wrong way as a purist, as you should be able to choose any string you want. But at the same time, it's not like I have a good answer to what to do in this situation without this library, either, so I'm really just taking a problem that was already foisted on the application developer (me) and lamenting that it's now foisted on the library developer (also me). Still, I do hold library code to a higher standard than application code.

Assuming all this is valid and not a terrible mistake, there's plenty more that could be built atop this paradigm: I'd like to generalize Choosable to be more than just a visual representation, to more like a batch of interactions that could be used to produce a value of a type, so you can just build up some fancy type out of existing types, then get the whole interaction sequence complete with error handling for free with deriving Generic.

One reason I think all this smells a bit weird is that you don't have full control over the Telegram UI: when you make a typical native application, you can force the user into a state where they must either do the next step of the form or explicitly choose to back out of it with a "back" button. But with this bot paradigm, the user always has a text box which they can interact with and do other stuff that lives outside of any interaction flow they've initiated. And it's not even just a text box -- they can send you pictures, or gifts, or record video of themselves. When you design a form in the browser, you never have these weird questions like "What if the user records a video of themself?" when you ask them for a Bool. In that sense, I think implementing the idea I'm discussing here for a widget toolkit you have full control over like native UI kits or the browser would be easier to convince onself of correctness.

Finally, another question I'm thinking about but haven't resolved yet is how to fit this into my typical effect framework. I usually just make newtype MonsterM a = MonsterM (ReaderT _ IO a), then interpret all my effects into that in the obvious way, and write my domain logic in (MonadDatabase m , MonadTelegram m , ...) => m (). With the paradigm I've discussed here, I actually have to think about how to wire this up with everything else as it cannot be treated merely as ReaderT _ IO. And if I'm going to use something in the real world, I'd prefer it be simple enough that I don't have to think about it.

Anyway, I hope I expressed my idea clearly. I am looking forward to your feedback!

Postscript galaxy-brain thought: the whole "forgetting on restart" thing could be mitigated if we'd stop using volatile DRAM. If we'd use something like Intel's now-memory-holed Optane tech, you could turn the app off and on without forgetting the heap. Of course, you still forget when you change your code. But perhaps that's incidental as well: after all, when you redefine something in ghci, it doesn't forget the existing values on the heap. But what would developing in that world even look like? Anyway, I digress.