Object property vocabulary (from figure): 4 shapes, 4 colors, 4 sizes.
Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated on their ability to generate one-step image-to-image transformations under latent world dynamics. The framework provides benefits such as the possibility of jointly optimizing for systematic perception and imagination, a range of difficulty levels, and the ability to control the fraction of possible factor combinations used during training. We hope that this benchmark will help advance visual systematic compositionality.
Our benchmark provides episodes consisting of two frames, referred to as the input image and the target image. These observations are procedurally generated from an underlying compositional scene state composed of two objects, each constructed as a combination of several intra-object factors such as color, shape, and size. In this figure, we show a rule called Shape-Swap that swaps the shapes of the two objects in the input scene to generate the target scene. Based on the Shape-Swap rule, we show three tasks with different levels of visual complexity.
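To make the episode format concrete, below is a minimal loading sketch. The file names `input.png` and `target.png` and the example directory path are assumptions for illustration only; the files actually shipped per episode are listed under "Contents of an Episode" below.

```python
# Minimal sketch: load one SVIB episode as an (input, target) image pair.
# File names "input.png" / "target.png" and the example path are assumed
# for illustration; see "Contents of an Episode" for the actual files.
from pathlib import Path

import numpy as np
from PIL import Image


def load_episode(episode_dir):
    """Return the input and target frames of one episode as float arrays in [0, 1]."""
    episode_dir = Path(episode_dir)
    input_img = np.asarray(Image.open(episode_dir / "input.png"), dtype=np.float32) / 255.0
    target_img = np.asarray(Image.open(episode_dir / "target.png"), dtype=np.float32) / 255.0
    return input_img, target_img


# A model should map the input frame to the target frame in one step,
# e.g. swapping the two objects' shapes under the Shape-Swap rule.
# x, y = load_episode("svib_dsprites/shape_swap/train_easy/episode_00000")  # hypothetical path
```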
This is an illustration of an example compositional visual world where an object's appearance can be described by its color and shape. In this example world, the training set would expose all the color and shape primitives individually. However, it would not contain all 16 possible combinations, only a subset of them. Training splits of varying difficulty can be constructed by controlling an α parameter, i.e., the fraction of all possible primitive combinations that are shown in the training episodes. In testing, we present the held-out combinations.
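The sketch below illustrates, under assumed color and shape names, how an α-controlled split over a 4 × 4 vocabulary could be constructed: a fraction α of the 16 combinations is admitted into training and the remainder is held out for testing. The benchmark's actual sampling procedure, which also ensures that every primitive appears individually in training, may differ in detail.

```python
# Sketch: split the 16 (color, shape) combinations into train/test by alpha.
# Color and shape names are placeholders; the benchmark's actual procedure
# (e.g. guaranteeing every primitive appears in training) may differ.
import itertools
import random

COLORS = ["red", "green", "blue", "yellow"]
SHAPES = ["square", "circle", "triangle", "heart"]


def make_split(alpha, seed=0):
    """Return (train_combos, test_combos) with a fraction alpha used for training."""
    combos = list(itertools.product(COLORS, SHAPES))  # all 16 combinations
    rng = random.Random(seed)
    rng.shuffle(combos)
    n_train = int(alpha * len(combos))
    return combos[:n_train], combos[n_train:]


train_combos, test_combos = make_split(alpha=0.6)  # Easy split: 60% of combinations seen
print(len(train_combos), "training combinations;", len(test_combos), "held out for testing")
```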
Our benchmark provides a total of 12 tasks.
Subsets. The 12 tasks are divided into 3 subsets containing 4 tasks each. This division is based on the perceptual complexity of the underlying visual worlds. We refer to these 3 subsets as:
- SVIB-dSprites
- SVIB-CLEVR
- SVIB-CLEVRTex
Rules. Within each of the 3 aforementioned subsets, we provide 4 tasks based on 4 rules of increasing complexity:
Splits. Within each of these 12 tasks, we provide 3 training splits and 1 testing split as follows:
- Easy training split (α = 0.6)
- Medium training split (α = 0.4)
- Hard training split (α = 0.2)
- Testing split containing the held-out combinations
Contents of an Episode. Within each split, each episode has its own directory containing the following files:
To download all the splits of a specific task (out of the 12 tasks), use the following links. We provide the splits associated with all α values, i.e., 0.0, 0.2 (Hard Split), 0.4 (Medium Split), and 0.6 (Easy Split). The following links point to the same dataset as above, but packaged task-wise.
Omni-Composition Datasets. Here, we provide an omni-composition dataset for each of the 3 subsets: SVIB-dSprites, SVIB-CLEVR, and SVIB-CLEVRTex. An omni-composition dataset contains unpaired images that capture all possible combinations of primitives under the visual vocabulary of its corresponding environment.
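As a hedged example of consuming such an unpaired collection, the sketch below simply iterates over image files under an assumed directory layout; the path and file extension are illustrative and not part of the release.

```python
# Sketch: iterate over an omni-composition dataset of unpaired images.
# The directory path and ".png" extension are assumptions for illustration.
from pathlib import Path

import numpy as np
from PIL import Image


def iter_omni_images(root):
    """Yield each unpaired image as a float array in [0, 1]."""
    for path in sorted(Path(root).glob("**/*.png")):
        yield np.asarray(Image.open(path), dtype=np.float32) / 255.0


# The images cover all primitive combinations of the corresponding environment.
# for img in iter_omni_images("svib_clevr_omni"):  # hypothetical path
#     ...
```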