Albino_Jackets t1_jb5cq6x wrote
Reply to [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
The duplicates aren't perfect duplicates, and they are added to make the model's results more robust. It's like how an image of a giraffe rotated 90 degrees is still a giraffe, even though the pixel patterns are no longer the same. The same applies to the Stallone pic: the noise and errors help the model deal with suboptimal image quality.
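
To make the giraffe point concrete, here's a rough sketch of that kind of near-duplicate, assuming NumPy. The `augment` helper, the `noise_std` value, and the random array standing in for a photo are all made up for illustration; nothing here comes from LAION itself.

```python
import numpy as np

def augment(image, rng, noise_std=0.05):
    """Return a rotated, noisy near-duplicate of `image` (values in [0, 1]).

    Hypothetical helper for illustration only.
    """
    rotated = np.rot90(image)  # rotate 90 degrees counterclockwise
    noisy = rotated + rng.normal(0.0, noise_std, size=rotated.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in the valid range

rng = np.random.default_rng(0)
image = rng.random((4, 6))   # stand-in for a real grayscale photo
label = "giraffe"            # the label does not change under augmentation

near_duplicate = augment(image, rng)
print(image.shape, near_duplicate.shape, label)
```

The pixel grid changes (a 4x6 array becomes 6x4, plus noise), but the label attached to it stays the same, which is the whole idea.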