Paper summary

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

1. What is this paper about?

It proposes CLIPScore, a reference-free metric that achieves high correlation with human evaluation in image captioning.

2. What is better than in previous papers?

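Previous evaluation mainly used reference-based metrics, which compare a generated caption against human-written reference captions. This contrasts with the reference-free way in which humans assess caption quality: they look at the image and the caption directly. The proposed metric works the same way, so it is closer to how humans judge.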

3. What are the important parts of the technique and methods?

The key part is the score itself, which measures how well an image and the words of a caption match, without using any reference captions.
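A minimal sketch of such an image-caption score, under stated assumptions. These notes do not give the formula; as far as I recall, the published CLIPScore is a rescaled, clipped cosine similarity between the CLIP embedding of the image and the CLIP embedding of the caption, with weight w = 2.5. The function names and the toy vectors below are hypothetical; in practice the embeddings would come from a pre-trained CLIP model, which is not loaded here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_emb: Sequence[float],
               caption_emb: Sequence[float],
               w: float = 2.5) -> float:
    """Reference-free score: w * max(cos(image, caption), 0).

    w = 2.5 is assumed from my recollection of the paper, not from
    these notes. Negative similarities are clipped to zero.
    """
    return w * max(cosine_similarity(image_emb, caption_emb), 0.0)


# Toy, made-up embeddings standing in for real CLIP outputs.
image = [0.2, 0.9, 0.1]
good_caption = [0.25, 0.85, 0.05]   # points in nearly the same direction
bad_caption = [0.9, -0.3, 0.2]      # points in a different direction

print(clip_score(image, good_caption))
print(clip_score(image, bad_caption))
```

No reference caption appears anywhere in the computation; only the image and the candidate caption are compared, which is what makes the metric reference-free.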

4. How did they verify it?

5. Is there a debate?

It carries some risk. The score depends on the biases in the pre-training data, so it can reflect real-world conditions such as social biases.