Simon Hutchinson
Generator & Discriminator

Years ago, when I first started teaching composition lessons, I quickly noticed that my students tended to fall into two distinct groups. The first group struggled to produce material, often coming into lessons saying that they had not been able to come up with anything, or with just a measure or two of new material. The second group, in contrast, generated plenty of musical material and needed help shaping it to better convey their expressive ideas. With the first group, our lessons focused on strategies for generating ideas without self-censorship. With the second, our conversations centered on revision and refinement. These two distinct challenges that students face when learning composition resonate with my own (mature?) creative process, in which I very consciously switch between a “generative mode”, freely churning out musical material, and a “revision mode”, in which I sift through what’s worth keeping, consider what needs to be refined, and often abandon a great deal of material and even completed pieces.
What is a GAN?
Generative Adversarial Networks (GANs) are a class of machine learning models that operate through an interplay between two neural networks: a generator and a discriminator. The developers explain GANs through the metaphor of a counterfeiter and a police officer: the counterfeiter/generator tries to produce convincing forgeries (i.e. music, art, etc.), while the police officer/discriminator attempts to determine whether a given example is real or a forgery[1]. If the generator produces something that the discriminator judges authentic, then “bad discriminator. Refine to better catch forgeries” (that is, the model adjusts its internal parameters through backpropagation to improve). If a generator’s output is detected as a forgery, then “bad generator. Refine to make better forgeries.” Over the iterations of the learning process, then, they both improve. The counterfeiter learns to make more convincing fakes, and the police officer becomes more adept at spotting them.
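To make that feedback loop concrete, here is a deliberately tiny sketch of adversarial training, an illustration rather than the architecture from the original paper: a two-parameter “generator” learns to imitate samples from a one-dimensional Gaussian, while a logistic “discriminator” learns to tell real samples from generated ones. The variable names, learning rate, and toy target distribution are all invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: samples from a 1-D Gaussian the generator must imitate.
REAL_MEAN, REAL_STD = 4.0, 0.5

def sample_real(n):
    return rng.normal(REAL_MEAN, REAL_STD, size=n)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator: an affine map of noise, G(z) = a*z + b.
a, b = 1.0, 0.0          # starts out emitting N(0, 1)
# Discriminator: a logistic classifier, D(x) = sigmoid(w*x + c).
w, c = 0.0, 0.0          # starts out with no opinion (D = 0.5 everywhere)

lr, batch, steps = 0.05, 64, 3000
for _ in range(steps):
    # --- Discriminator step: "bad discriminator, refine to catch forgeries" ---
    x_real = sample_real(batch)
    x_fake = a * rng.normal(size=batch) + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    # Gradients of binary cross-entropy: reward D(real) high, D(fake) low.
    gw = np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    gc = np.mean(-(1 - d_real) + d_fake)
    w -= lr * gw
    c -= lr * gc

    # --- Generator step: "bad generator, refine to make better forgeries" ---
    z = rng.normal(size=batch)
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    gx = -(1 - d_fake) * w       # gradient of -log D(fake) w.r.t. each sample
    a -= lr * np.mean(gx * z)    # chain rule through x_fake = a*z + b
    b -= lr * np.mean(gx)

# After training, the generator's output should have drifted toward the
# real distribution's mean, even though it never saw the real data directly;
# it only ever received the discriminator's verdicts.
gen_mean = float(np.mean(a * rng.normal(size=5000) + b))
```

The generator never touches the real data; all of its learning signal arrives indirectly, through the discriminator's judgments, which is the "critique embedded in the architecture" that distinguishes GANs from other model classes.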
To briefly compare, other classes of machine learning models operate differently. Recurrent Neural Networks (RNNs), for example, generate sequences by predicting the next element from the previous ones. Transformer models, like those behind LLMs such as ChatGPT, model relationships among parts across an entire sequence, enabling more coherent long-range structure. GANs are distinct in that they embed a critique process into the architecture itself, a feedback loop between generation and evaluation.
The Artist as Generator and Discriminator
While the Tarantino-esque framing of counterfeiter and police officer captures the dynamic tension at the heart of generative adversarial networks (and perhaps furthers the notion that the purpose of AI is to create counterfeits), to me as a composer, the structure of GANs immediately conjures up an artist’s dual internal roles of creator and curator. As a composition teacher, I try to help the student refine both their ability to create musical material even when they don’t feel “inspired” (their generator) and their critical ear (their discriminator). Over time, the student learns not only to produce a greater volume of expressive ideas as options for their work, but also to evaluate these options more effectively, to hear what is working, and to revise accordingly. In this way, the student’s internal generator and discriminator co-evolve, and, even as a mature artist, these dual internal roles remain.
This process, of course, is not limited to music. Noam Chomsky emphasized the lack of reflection in his 2023 critique of ChatGPT: “Intelligence consists not only of creative conjectures but also of creative criticism. Human-style thought is based on possible explanations and error correction, a process that gradually limits what possibilities can be rationally considered.”[2] Chomsky’s point is that genuine intelligence, whether in music, writing, or scientific reasoning, relies on both generation and critical evaluation.
Human vs. Machine Goals
In a GAN, as in many classes of generative AI, the goal is to produce output that statistically resembles something “real”, be it a photo of a cat, a piece of music, or a Ghibli-style self-portrait. To do so, these generative AIs are trained on vast amounts of data and thereby inherit the biases of their training sets, reinforcing the dominant styles, aesthetics, or representations present in the data. Success is defined by resemblance to the training data, and so innovation is deviation, and deviation is error.
Returning to our human composition student: their goal is rarely to produce music that is indistinguishable from other music; rather, they are pursuing self-expression. Unlike training generative AI, then, the (good) composition teacher’s role is to help the student recognize when their work serves their expressive intentions. We do not guide our students based on the “big data” of all music, but rather through more intimate, contextualized listening (perhaps, continuing the metaphor, with smaller, more personalized datasets).
Curating Abundance
I recently attended what might be called a “noise music” concert, where the performer played a feedback-based electronic instrument. Due to the complexity of a system feeding back on itself, the musician had limited direct control over the sound, and the performance instead became an attempt to guide the sound generation into interesting and expressive places, mostly by adjusting the system whenever the sound diverged from their aesthetic goals. In this case, part of the generative process has been passed to the machine, and the artist performs live acts of discrimination.
Brian Eno articulated this idea in a 1995 interview, where he observed: “An artist is now a curator. An artist is now much more seen as a connector of things, a person who scans the enormous field of possible places for artistic attention, and says, ‘What I am going to do is draw your attention to this sequence of things.’”[3] Whether working with generative systems, improvisational sketches, or algorithmic outputs, the creative process often involves producing more than one needs and then shaping that abundance into something meaningful. Setting aside the conversation on GANs for the moment, Eno’s point perhaps applies even more acutely to the present day, when enormous amounts of content are generated every second by both humans and machines, and our task increasingly becomes one of using our own discriminators to filter this abundance.
In reflecting on GANs through the lens of musical composition, I find a technical analogy and a call to consider both human and machine creative processes. The roles of generator and discriminator mirror the artist’s own internal dialogue between free spontaneity and critique, but we would never hope to optimize human creativity for resemblance. Human creativity, ideally, conveys meaning, and, in a world of generative overabundance, our discrimination must strive for higher goals than fidelity.
[1] Goodfellow, Ian, et al. “Generative Adversarial Networks.” arXiv, 10 June 2014, arXiv:1406.2661.
[2] Chomsky, Noam, Ian Roberts, and Jeffrey Watumull. “The False Promise of ChatGPT.” The New York Times, 8 Mar. 2023, www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html.
[3] Kelly, Kevin. “Gossip Is Philosophy.” Wired, May 1995, www.wired.com/1995/05/eno-2.