CtrlGen Workshop at NeurIPS 2021

Controlled Cue Generation for Play Scripts

Bogazici University


Abstract

In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach the problem of cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible and attribute-controlled texts in highly specialized domains such as play scripts.

Input: OWEN. Yeah. He’s awesome MEG. Yeah. He is. He has the biggest smile on earth. . .
GPT2-FT: OWEN. Oh WADSWORTH. You think he likes me, too?
PPLM+CueDisc: OWEN. (Laughs, but then gets serious:)

Input: VAL. He’s been looking at me STEWART. Who is he? VAL. I don’t know, never seen him.
GPT2-FT: (VAL stares back at JOE, who looks confused.)
PPLM+CueDisc: VAL. I said, “What’s his problem?

Table 1: CueGen generates interesting cues given play script dialogue lines.

What is a Cue?

Cues are unspoken lines that can drastically affect how dialogue is perceived by the audience. A cue can be a trigger for an action, an informative description of the stage, or a rendering of a character's thoughts or body language.

JOHN: I don't know what to do anymore.

(JOHN turns around and leaves.)


LIZZIE: How do you…? (Putting things together:) No...

POYDRAS: But you also have her eyes.

LIZZIE: (Weeps. Realizes she is looking at her father. Takes a moment.)

Methodology

Given dialogue lines as input, we use a fine-tuned GPT-2 model together with the ILM and PPLM frameworks to generate cues. In addition to cue generation, we extend the PPLM framework with an automatic topic keyword extraction module and an emotion-based attribute model to generate text with the target topics or emotions. A simplified sketch of a single PPLM steering step follows.
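To make the steering step concrete, below is a much-simplified, illustrative sketch of one PPLM-style perturbation in Python: the last hidden state is nudged along the gradient of the attribute model's log-likelihood before predicting the next token. The full PPLM update instead perturbs the past key/value cache over several steps; the step size, step count, and untrained classifier here are placeholders, not our exact setup.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()
classifier = torch.nn.Linear(model.config.n_embd, 2)  # attribute model: dialogue (0) vs. cue (1)

ids = tokenizer("VAL. He's been looking at me", return_tensors="pt").input_ids
with torch.no_grad():
    hidden = model.transformer(ids).last_hidden_state  # (1, seq_len, n_embd)

# gradient ascent on log p(cue) with respect to a perturbation of the last hidden state
delta = torch.zeros_like(hidden[:, -1, :], requires_grad=True)
for _ in range(3):  # a few small steps, as in PPLM
    attr_logits = classifier(hidden[:, -1, :] + delta)
    log_p_cue = torch.log_softmax(attr_logits, dim=-1)[0, 1]
    (grad,) = torch.autograd.grad(log_p_cue, delta)
    delta = (delta + 0.02 * grad).detach().requires_grad_(True)

next_logits = model.lm_head(hidden[:, -1, :] + delta)  # steered next-token logits
next_token = tokenizer.decode(int(next_logits.argmax(dim=-1)))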

Cue/Dialogue Discriminator

  • We train a single-layer cue/dialogue classifier using 10% of our data
  • We use the trained classifier to steer generation towards a higher log-likelihood of dialogue or cues, as specified by the user (a training sketch follows this list)
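A minimal training sketch for such an attribute model, assuming mean-pooled features from a frozen GPT-2 backbone; the toy examples and hyperparameters below are illustrative rather than our exact configuration.

import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
backbone = GPT2Model.from_pretrained("gpt2-medium").eval()  # frozen feature extractor
classifier = nn.Linear(backbone.config.n_embd, 2)           # 0 = dialogue, 1 = cue

def line_features(text):
    # mean-pool the frozen GPT-2 hidden states of one script line
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = backbone(ids).last_hidden_state  # (1, seq_len, n_embd)
    return hidden.mean(dim=1)                     # (1, n_embd)

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for text, label in [("(Laughs, then exits.)", 1), ("Who is he?", 0)]:  # toy stand-ins for 10% of the corpus
    loss = loss_fn(classifier(line_features(text)), torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()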

Topic Modeling with LDA

  • We use 10% of our dataset for topic modeling and automatic topic keyword extraction
  • A target topic selected by the user is then used to steer the language generation process to maximize the log-likelihood of the extracted keywords of the target topic (see the sketch after this list)
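A small sketch of the keyword extraction step, using scikit-learn's LDA as a stand-in; the toy corpus, topic count, and keyword count are placeholder values.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

lines = ["He has the biggest smile on earth.",
         "Who is he? I don't know, never seen him."]  # in practice, 10% of the dataset

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(lines)
lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(counts)
vocab = vectorizer.get_feature_names_out()

def topic_keywords(topic_id, k=20):
    # top-k keywords of one topic; these feed PPLM's bag-of-words attribute model
    top = lda.components_[topic_id].argsort()[::-1][:k]
    return [vocab[i] for i in top]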

Emotion Classifier

  • We use DeepMoji to create an emoji-labeled subset with 10% of our data
  • We map the predicted emojis to a subset of emotions from Plutchik’s Wheel of Emotions, such as "happy", "angry", "disappointed", and train a multi-label emotion classifier
  • We use the trained classifier to steer the generation towards text with the target emotion (a sketch of the emoji-to-emotion mapping follows)
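An illustrative sketch of the emoji-to-emotion mapping and the multi-label head; the mapping table and emotion subset below are assumptions for demonstration, with the actual emoji labels coming from DeepMoji predictions.

import torch
import torch.nn as nn

EMOTIONS = ["happy", "angry", "disappointed", "sad", "surprised"]  # assumed subset of Plutchik's wheel
EMOJI_TO_EMOTIONS = {"😂": ["happy"], "😠": ["angry"], "😞": ["disappointed", "sad"]}  # illustrative mapping

def emotion_targets(predicted_emojis):
    # multi-hot target vector built from the emojis DeepMoji assigns to a line
    target = torch.zeros(len(EMOTIONS))
    for emoji in predicted_emojis:
        for name in EMOJI_TO_EMOTIONS.get(emoji, []):
            target[EMOTIONS.index(name)] = 1.0
    return target

# multi-label head over the same frozen GPT-2 features as the cue/dialogue classifier
head = nn.Linear(1024, len(EMOTIONS))  # 1024 = GPT-2 medium hidden size
loss = nn.BCEWithLogitsLoss()(head(torch.randn(1, 1024)), emotion_targets(["😞"]).unsqueeze(0))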

Experiments

  • GPT2-FT: Given a line of dialogue as input, we use a fine-tuned GPT-2 medium model to generate text.
  • ILM: ILM enables LMs to infill variable-length spans using both preceding and subsequent text. We fine-tune the GPT-2 small model on dialogue–cue–dialogue line triples, masking the cue during training and sampling since our goal is to generate cues; infilling then uses the preceding and succeeding dialogue lines.
  • PPLM+LDA: We detect the topic of the input dialogue line using LDA and steer the generation towards generating text with the same topic.
  • PPLM+CueDisc: We train a cue/dialogue sentence type discriminator and control the generation process using this classifier.
  • PPLM+Emotion: We train a multi-label emotion classifier and steer the generation process to generate text that reflects the target emotion specified by the user.

We use n-gram similarity (LCSR and BI-SIM) and distance metrics to measure the similarity of the generated text to a reference cue corpus. We measure the diversity of the text generated by each model by the number of distinct n-grams (normalized by the length of the text) and report the Dist-1, Dist-2, and Dist-3 scores. A sketch of both metric families follows.
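For reference, minimal implementations of the two metric families; normalizing Dist-n by the n-gram count and LCSR by the longer sequence is one reasonable reading of the definitions.

def distinct_n(tokens, n):
    # number of distinct n-grams over the number of n-grams (Dist-n)
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

def lcsr(a, b):
    # longest common subsequence length of a and b, divided by the longer length
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / max(m, n, 1)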

Conclusion

In this paper, we use a large-scale play script dataset and propose the novel task of generating theatrical cues from dialogues. We approach the cue generation problem as a controlled text generation task and use a plug-and-play language model with a cue/dialogue discriminator, LDA-based topic keyword lists, and a multi-label emotion classifier to steer the language model towards the desired attributes without re-training the model. Our experiments show that language models can be successfully used to generate plausible and attribute-controlled text in highly specialized domains such as play scripts. In the future, we plan to explore character- and person-based cue and dialogue generation tasks with plug-and-play models.

BibTeX

@article{dirik2021cuegen,
  author    = {Dirik, Alara and Donmez, Hilal and Yanardag, Pinar},
  title     = {Controlled Cue Generation for Play Scripts},
  journal   = {CtrlGen: Controllable Generative Modeling in Language and Vision Workshop at NeurIPS 2021},
  year      = {2021},
}

Acknowledgements

This publication was produced with support from the TUBITAK 2232 International Fellowship for Outstanding Researchers Program (Project No: 118c321). We also acknowledge the support of NVIDIA Corporation through the donation of a TITAN X GPU, and GCP research credits from Google.