This is a small, barebones demo of the SNMN (Stack Neural Module Network) from this paper, using my own reproduction of the paper from this repository. To use it, choose an image from the list, then choose a question, or type in your own. You can also choose between a model trained with ground-truth layouts and one trained without them. Check out my blog post for more details!
If the prediction takes a long time to run, just reload the page and try again (requests sometimes time out due to slow servers).

Below is the executed program, i.e. the modules used by the network in the order they are applied. For each module you see its name and description, a visualisation of the attention over the question words at that timestep (darker blue means more attention), the module's effect on the stack (i.e. how many inputs it pops and how many outputs it pushes), and a visualisation of the attention over the image currently at the top of the stack. For No-op modules these details are omitted, since No-op does nothing.
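To make the pop/push bookkeeping concrete, here is a minimal sketch of the stack discipline the demo visualises. The module names and pop/push counts are illustrative stand-ins, and the attention maps are just lists of floats rather than real spatial attention tensors:

```python
# Hypothetical sketch of the SNMN-style stack: each module pops some
# attention maps from the stack and pushes its outputs back on.

def apply_module(stack, n_pop, outputs):
    """Pop n_pop attention maps, push the module's outputs, return the popped inputs."""
    inputs = [stack.pop() for _ in range(n_pop)]
    stack.extend(outputs)
    return inputs

stack = []
# A "find"-style module: pops 0 inputs, pushes 1 attention map.
apply_module(stack, 0, [[0.1, 0.7, 0.2]])
# A "transform"-style module: pops 1 input, pushes 1 output.
apply_module(stack, 1, [[0.05, 0.15, 0.8]])
# A No-op: pops 0, pushes 0 -- the stack is unchanged.
apply_module(stack, 0, [])

print(len(stack))   # one map remains: the attention shown at the top of the stack
print(stack[-1])    # the current top-of-stack attention map
```

In the real model the stack operations are soft (differentiable), but the pop/push counts shown in the demo correspond to this same discipline.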