Neural Style Transfer by Leon Gatys

Online

Aug 26, 2025

Overview

"I remember we were discussing to combine the texture of artworks with photographs and I wasn’t really sure if it would work."

- Leon Gatys

Details

In fine art, especially painting, humans have mastered the skill of creating unique visual experiences by composing a complex interplay between the content and style of an image. Thus far, the algorithmic basis of this process is unknown, and there exists no artificial system with similar capabilities. However, in other key areas of visual perception, such as object and face recognition, near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine the content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.

The key finding of this paper is that the representations of content and style in the Convolutional Neural Network are separable. In other words, it is possible to manipulate both representations independently in order to produce new, perceptually meaningful images. To demonstrate this finding, the authors generate images that mix the content and style representations from two different source images. Specifically, they match the content representation of a photograph depicting the “Neckarfront” in Tübingen, Germany with the style representations of several well-known artworks drawn from different periods of art.

The images are synthesized by finding a single image that simultaneously matches the content representation of the photograph and the style representation of the chosen artwork. While the global arrangement of the original photograph is preserved, the colors and local structures that compose the global scenery are provided by the artwork. This effectively renders the photograph in the style of the painting, such that the appearance of the synthesized image resembles the work of art, even though it retains the same content as the photograph.
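The search for a single image that matches both representations can be written as one optimization problem over the pixels of that image. The following is a sketch in notation commonly used for this method; the symbols (the photograph p, the artwork a, the synthesized image x, the weights α and β, and the layer-l feature responses F and P of x and p) are assumptions introduced here and do not appear in the text above.

```latex
\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x})
  = \alpha \, \mathcal{L}_{\text{content}}(\vec{p}, \vec{x})
  + \beta \, \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}),
\qquad
\mathcal{L}_{\text{content}}(\vec{p}, \vec{x})
  = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```

Under this reading, the ratio α/β sets the trade-off: a larger ratio keeps the synthesized image closer to the photograph's arrangement, while a smaller ratio lets the artwork's colors and local structures dominate.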

The style representation itself is multi-scale, spanning multiple layers of the neural network. In the examples shown, the style representation included layers from the entire network hierarchy. However, style can also be defined more locally by including only a small number of lower layers, leading to distinct visual experiences. When style representations are matched at progressively higher layers in the network, local image structures are matched on increasingly large scales, producing a smoother and more continuous visual effect. As a result, the most visually appealing images are usually created by matching the style representation up to the highest layers in the network.
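One way to picture the choice of layers is as a weighted sum of per-layer mismatches, where setting a layer's weight to zero drops it from the style definition. The sketch below assumes that each layer's style is summarized by a filter-correlation (Gram) matrix and uses a size normalization of 1 / (4 N² M²) for a layer with N filters and M positions. The layer names, function names, and numbers are hypothetical and chosen only for illustration.

```python
def layer_style_error(gram_generated, gram_style, n_filters, n_positions):
    """Squared difference between two Gram matrices, normalized by layer size."""
    total = sum((g - a) ** 2
                for row_g, row_a in zip(gram_generated, gram_style)
                for g, a in zip(row_g, row_a))
    return total / (4.0 * n_filters ** 2 * n_positions ** 2)


def style_loss(layers, weights):
    """Weighted sum of per-layer errors; layers with weight 0 are skipped.

    `layers` maps a layer name to
    (gram_generated, gram_style, n_filters, n_positions).
    """
    return sum(w * layer_style_error(*layers[name])
               for name, w in weights.items() if w)


# Made-up Gram matrices for two hypothetical layers.
layers = {
    "low": ([[2.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]], 2, 2),
    "high": ([[4.0]], [[0.0]], 1, 2),
}

# "Local" style: only the lower layer contributes.
local = style_loss(layers, {"low": 1.0, "high": 0.0})

# Style defined across the whole (toy) hierarchy.
full = style_loss(layers, {"low": 0.5, "high": 0.5})

print(local, full)
```

Restricting the weights to lower layers corresponds to the more local definition of style described above; spreading them up to the highest layers corresponds to matching structure on larger scales.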

In general, the method of synthesizing images that mix content and style from different sources provides a powerful new tool for studying perception and the neural representation of art, style, and content-independent image appearance. This approach allows the design of novel stimuli that introduce two independent, perceptually meaningful sources of variation: the content of an image and its appearance. Such a capability can be applied across a wide range of experimental studies in visual perception, from psychophysics to functional imaging, and even electrophysiological neural recordings.

The work also offers an algorithmic understanding of how neural representations can independently capture both the content of an image and the style in which it is presented. Crucially, the mathematical form of the style representations produces a clear, testable hypothesis about the representation of image appearance down to the single-neuron level. The style representations operate by computing correlations between different types of neurons in the network. This is a biologically plausible process, as such correlation extraction is already known to occur in the visual system, for example through the so-called complex cells in the primary visual cortex (V1). These findings suggest that performing a similar complex-cell-like computation at various processing stages along the ventral stream could provide a mechanism for obtaining a content-independent representation of visual appearance.
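The correlation computation can be illustrated with a small Gram matrix: each entry is the inner product of two filters' responses taken over all spatial positions. The sketch below uses made-up numbers and a hypothetical `gram_matrix` helper; it also checks that rearranging the spatial positions leaves the result unchanged, which is one sense in which such a representation is independent of where things appear in the image.

```python
def gram_matrix(features):
    """Correlate every pair of filters across all spatial positions.

    `features` is a list of N filter responses, each flattened to a list
    of M positions. Entry [i][j] of the result is the inner product of
    filter i and filter j.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


# Toy feature map: 2 filters, 4 spatial positions (made-up values).
features = [[1.0, 2.0, 0.0, 1.0],
            [0.0, 1.0, 3.0, 1.0]]

# Rearrange the spatial positions in the same way for every filter.
order = [2, 0, 3, 1]
shuffled = [[row[k] for k in order] for row in features]

gram = gram_matrix(features)
gram_shuffled = gram_matrix(shuffled)

print(gram)
print(gram == gram_shuffled)
```

Because the sum runs over positions, the matrix records which filters tend to respond together but not where they respond, so the spatial layout of the image is discarded.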

Ultimately, it is remarkable that a neural system trained to perform one of the core computational tasks of biological vision, object recognition, automatically develops image representations that enable the separation of content from style. One explanation is that, in learning object recognition, the network must achieve invariance to all image variations that preserve object identity. Representations that factorize the variation between content and appearance would be extremely advantageous for this task. In this sense, the human ability to abstract content from style, and thereby to create and appreciate art, may be seen as a profound expression of the inference capabilities of our visual system.
