HyperNST: Hyper-Networks for Neural Style Transfer
Dan Ruta [1], Andrew Gilbert [1], Saeid Motiian [2], Baldo Faieta [2], Zhe Lin [2], John Collomosse [1,2]
[1] University of Surrey, [2] Adobe Research
In the European Conference on Computer Vision VISART Workshop, ECCV'22
Abstract
We present HyperNST, a neural style transfer (NST) technique for the artistic stylization of images, based on hyper-networks and the StyleGAN2 architecture. Our contribution is a novel method for inducing style transfer parameterized by a metric space pre-trained for style-based visual search (SBVS). We show for the first time that such a space may be used to drive NST, enabling the application and interpolation of styles from an SBVS system. The technical contribution is a hyper-network that predicts weight updates to a StyleGAN2 pre-trained over a diverse gamut of artistic content (portraits), tailoring the style parameterization on a per-region basis using a semantic map of the facial regions. We show HyperNST to exceed the state of the art in content preservation for our stylized content while retaining good style transfer performance.
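To make the mechanism concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the hyper-network idea: small per-layer heads predict multiplicative offsets for the frozen StyleGAN2 convolution weights, conditioned on a style/semantics embedding, in the spirit of HyperStyle. All names, shapes, and the conditioning vector here are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class WeightDeltaHead(nn.Module):
    # Predicts a weight offset for one frozen conv layer.
    # style_dim and the layer shape are assumptions for illustration.
    def __init__(self, style_dim, out_ch, in_ch, k):
        super().__init__()
        self.fc = nn.Linear(style_dim, out_ch * in_ch * k * k)
        self.shape = (out_ch, in_ch, k, k)

    def forward(self, cond):
        # Offsets near zero preserve the frozen generator's prior.
        return self.fc(cond).view(-1, *self.shape)

def modulate_weights(frozen_weights, heads, cond):
    # HyperStyle-style modulation: W' = W * (1 + delta), per layer.
    return {name: w * (1.0 + heads[name](cond))
            for name, w in frozen_weights.items()}

# Example: a head for a hypothetical 3x3, 512->512 conv, 256-d condition.
head = WeightDeltaHead(style_dim=256, out_ch=512, in_ch=512, k=3)
delta = head(torch.randn(1, 256))  # shape (1, 512, 512, 3, 3)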
Architecture diagram of our approach. Facial semantic segmentation regions are used to condition (via ALADIN) and guide (via a patch co-occurrence discriminator) a HyperStyle model, which embeds a content portrait image into a StyleGAN2 model trained on AAHQ+FFHQ and stylizes it towards the style image using ALADIN style codes.
Blue modules are frozen; orange modules are trained. + represents concatenation along the channel dimension.
(A) denotes the stylization pass losses and (B) the reconstruction pass losses. Not pictured for clarity: the reconstruction pass uses the same semantic regions as the content image for the 256x256x256 semantically arranged ALADIN conditioning.
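The "semantically arranged" conditioning mentioned above can be sketched as follows: the ALADIN style code assigned to each facial region is broadcast into that region's pixels, producing the 256x256 map with 256 channels referenced in the caption. Function and variable names are assumptions for illustration; only the tensor shape comes from the caption.

import torch

def semantic_aladin_conditioning(aladin_codes, seg_map):
    # aladin_codes: (num_regions, 256) one ALADIN code per facial region
    # seg_map:      (256, 256) integer region label per pixel
    # returns:      (256, 256, 256) channels-last conditioning tensor
    h, w = seg_map.shape
    cond = torch.zeros(h, w, aladin_codes.shape[1])
    for r in range(aladin_codes.shape[0]):
        cond[seg_map == r] = aladin_codes[r]  # fill region r with its code
    return cond

# Example with 19 hypothetical face-parsing regions:
cond = semantic_aladin_conditioning(torch.randn(19, 256),
                                    torch.randint(0, 19, (256, 256)))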
Paper
HyperNST: Hyper-Networks for Neural Style Transfer. Dan Ruta, Andrew Gilbert, Saeid Motiian, Baldo Faieta, Zhe Lin, and John Collomosse. In Proc. ECCV'22 VISART Workshop, 2022.
Citation
@inproceedings{Ruta:ECCV:2022,
AUTHOR = "Ruta, Dan and Gilbert, Andrew and Motiian, Saeid and Faieta, Baldo and Lin, Zhe and Collomosse, John",
TITLE = "HyperNST: Hyper-Networks for Neural Style Transfer",
BOOKTITLE = "European Conference on Computer Vision Workshop VISART ECCV'22W",
YEAR = "2022",
}