ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph