In this paper, we use a large-scale dataset of play scripts to propose the novel task of generating theatrical cues from dialogue. Using over one million lines of dialogue and cues, we approach cue generation as a controlled text generation task and show how cues can be used to enhance the impact of dialogue, using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible and attribute-controlled text in highly specialized domains such as play scripts.
Input | GPT2-FT | PPLM+CueDisc |
---|---|---|
OWEN. Yeah. He’s awesome | MEG. Yeah. He is. He has the biggest smile on earth. . . OWEN. Oh | WADSWORTH. You think he likes me, too? OWEN. (Laughs, but then gets serious: ) |
VAL. He’s been looking at me | STEWART. Who is he? VAL. I don’t know, never seen him. | (VAL stares back at JOE, who looks confused.) VAL. I said, “What’s his problem? |
Cues are unspoken lines that can drastically affect how the dialogue is perceived by the audience. A cue can be a trigger for an action, an informative description of the stage, or the thoughts or body language of the characters.
JOHN: I don't know what to do anymore.
(JOHN: turns around and leaves.)
LIZZIE: How do you…? (Putting things together:) No...
POYDRAS: But you also have her eyes.
LIZZIE: (Weeps. Realizes she is looking at her father. Takes a moment.)
Given dialogue lines as input, we generate cues with a fine-tuned GPT-2 model and with the ILM and PPLM frameworks. Beyond cue generation, we extend the PPLM framework with an automatic topic keyword extraction module and an emotion-based attribute model, so that the generated text can be steered toward target topics or emotions.
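To make the keyword-based steering more concrete, the sketch below shows the bag-of-words attribute score used in the general PPLM approach: the log of the total probability the language model assigns to the topic keywords at the next step. It is an illustrative sketch only, not the authors' implementation. The toy distribution, the keyword list, and the function name `bow_log_likelihood` are hypothetical, and in a real system the gradient of this score with respect to the model's hidden states would be used for steering.

```python
import math


def bow_log_likelihood(next_token_probs, keywords):
    """Log of the probability mass that a next-token distribution puts on a keyword list.

    next_token_probs: dict mapping token -> probability (a toy stand-in for
                      a language model's softmax output; hypothetical).
    keywords:         iterable of topic keywords (for example, from a topic model).
    """
    mass = sum(next_token_probs.get(word, 0.0) for word in keywords)
    # A distribution with no mass on any keyword gets the lowest possible score.
    return math.log(mass) if mass > 0 else float("-inf")


# Toy example: a made-up next-token distribution and a made-up keyword list.
toy_probs = {"stage": 0.2, "door": 0.1, "the": 0.5, "laughs": 0.2}
toy_keywords = ["stage", "curtain", "laughs"]
score = bow_log_likelihood(toy_probs, toy_keywords)
```

A higher score means the distribution already favors the target topic, so steering amounts to nudging the model toward distributions that raise this value.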
We use n-gram similarity (LCSR and BI-SIM) and distance metrics to measure the similarity of the generated text to a reference cue corpus. We measure the diversity of the text generated by each model by the number of distinct n-grams (normalized by the length of text) and report the Dist-1, Dist-2, and Dist-3 scores.
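The two families of metrics above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code used in the paper: Dist-n is taken here as the number of distinct n-grams divided by the number of tokens, text is tokenized by whitespace, and LCSR is computed at the character level as the longest common subsequence length divided by the length of the longer string. The function names are hypothetical.

```python
def distinct_n(tokens, n):
    """Dist-n: distinct n-grams divided by the number of tokens (assumed normalization)."""
    if len(tokens) < n:
        return 0.0
    ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return len(ngrams) / len(tokens)


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Longest common subsequence ratio: LCS length over the longer sequence's length."""
    longest = max(len(a), len(b))
    # Two empty sequences are treated as identical.
    return lcs_length(a, b) / longest if longest else 1.0


# Usage on a made-up cue; whitespace tokenization is an assumption.
generated = "she looks at him and she looks away".split()
dist_scores = [distinct_n(generated, n) for n in (1, 2, 3)]
similarity = lcsr("laughs", "laugh")
```

Higher Dist-n values indicate less repetition in the generated text, while a higher LCSR indicates that a generated cue is closer to a reference cue.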
In this paper, we use a large-scale play script dataset and propose the novel task of generating theatrical cues from dialogues. We approach the cue generation problem as a controlled text generation task and use a plug-and-play language model with a cue/dialogue discriminator, LDA-based topic keyword lists, and a multi-label emotion classifier to steer the language model to the desired attributes without re-training the model. Our experiments show that language models can be successfully used to generate plausible and attribute-controlled text in highly specialized domains such as plays. In the future, we plan to explore character and person-based cue and dialogue generation tasks with plug-and-play models.
@article{dirik2021cuegen,
author = {Dirik, Alara and Donmez, Hilal and Yanardag, Pinar},
title = {Controlled Cue Generation for Play Scripts},
journal = {CtrlGen: Controllable Generative Modeling in Language and Vision Workshop at NeurIPS 2021},
year = {2021},
}
This publication has been produced with support from the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No: 118c321). We also acknowledge the support of NVIDIA Corporation through the donation of the TITAN X GPU and GCP research credits from Google.