ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity
Dan Ruta [1], Saeid Motiian [2], Baldo Faieta [2], Zhe Lin [2], Hailin Jin [2], Alex Filipkowski [2], Andrew Gilbert [1], John Collomosse [1,2]
[1] University of Surrey, [2] Adobe Research
In International Conference on Computer Vision (ICCV'21)
Abstract
We present ALADIN (All Layer AdaIN); a novel architecture for searching images based on the similarity of their artistic style. Representation learning is critical to visual search, where distance in the learned search embedding reflects image similarity. Learning an embedding that discriminates fine-grained variations in style is hard, due to the difficulty of defining and labelling style. ALADIN takes a weakly supervised approach to learning a representation for fine-grained style similarity of digital artworks, leveraging BAM-FG, a novel large-scale dataset of user generated content groupings gathered from the web. ALADIN sets a new state of the art accuracy for style-based visual search over both coarse labelled style data (BAM) and BAM-FG; a new 2.62 million image dataset of 310,000 fine-grained style groupings also contributed by this work.
Proposed ALADIN architecture for learning a fine-grained style embedding. ALADIN uses a multi-stage encoder in which AdaIN values are aggregated from each encoder layer and passed to the corresponding decoder stages. A concatenation of AdaIN features from encoder layers on the style branch is trained via a dual reconstruction (Lrec) and contrastive loss (Lcon) under weak supervision from project group co-membership. The style encoder/decoder backbone may take the form of several convolutional layers (ALADIN-S) or a VGG-16 backbone (ALADIN-L).
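The style code described above is a concatenation of AdaIN statistics (channel-wise mean and standard deviation) gathered from every encoder layer. A minimal sketch of that aggregation step, using NumPy and hypothetical layer shapes (the function names and channel counts are illustrative, not the authors' implementation):

```python
import numpy as np

def adain_stats(feat):
    """Channel-wise mean and std over spatial dims for one feature map.
    feat: array of shape (C, H, W); returns a vector of length 2*C."""
    mean = feat.mean(axis=(1, 2))
    std = feat.std(axis=(1, 2))
    return np.concatenate([mean, std])

def style_code(layer_feats):
    """Concatenate AdaIN statistics from all encoder layers (hypothetical)."""
    return np.concatenate([adain_stats(f) for f in layer_feats])

# Example: three encoder layers with 64, 128, and 256 channels
feats = [np.random.rand(c, 8, 8) for c in (64, 128, 256)]
code = style_code(feats)
print(code.shape)  # (896,) i.e. 2 * (64 + 128 + 256)
```

Distances between such concatenated statistics vectors then serve as the fine-grained style similarity measure used for search.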
Paper
ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity, Dan Ruta, Saeid Motiian, Baldo Faieta, Zhe Lin, Hailin Jin, Alex Filipkowski, Andrew Gilbert, John Collomosse, In Proc. ICCV 2021
Video
Poster
Coming Soon
Citation
@inproceedings{Ruta:ICCV:2021,
AUTHOR = "Ruta, Dan and Motiian, Saeid and Faieta, Baldo and Lin, Zhe and Jin, Hailin and Filipkowski, Alex and Gilbert, Andrew and Collomosse, John",
TITLE = "ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity",
BOOKTITLE = "International Conference on Computer Vision (ICCV)",
YEAR = "2021",
}