How to Make Your Own AI-based Text Generator - Applied AI Part I

This is the first part in a series called Applied AI. It describes how to use pre-made tools, frameworks and libraries to create AI applications, rather than designing and implementing your own neural networks from scratch.

I've always been fascinated by computer-generated text. Twenty years ago there was an IRC bot that would take the messages from a channel and generate new ones from them using Markov chains. The output was often unintelligible, but sometimes the messages were really funny. Later, other text-generating services came along, and they fascinated me just as much.

A couple of years ago, usable machine learning frameworks were released: most notably Torch, based on Lua, and TensorFlow, based on Python (which now also has a JavaScript variant, TensorFlow.js). With them came prepackaged machine learning models released as open source, like the torch-rnn recurrent neural network I'll be using in this blog post.

tl;dr - first, the results! I made a Viking Metal Lyrics AI, and then I used it to create lyrics for a song I wrote. Then I recorded the song and made a video. Here it is:

Let's learn how to make your own AI-driven text generation service!

1. How to Train Your Dragon RNN

A wonderful thing about machine learning models is that you can fine-tune them. If you train a model on general image recognition, you can make it really good at recognizing specific objects by fine-tuning it with just a few images of those objects. In this case we'll take a pre-trained model for text generation and fine-tune it on texts in a specific style.

The RNN we're going to use is made in Torch and uses Python for some utilities. Just getting the correct Python version running on your computer is no small feat, so we're going to skip that completely and do both the training and the text generation inside Docker.

Ingredients

  • Docker installed on your machine.
  • A Docker image with the text generation model (crisbal/torch-rnn). Choose the right version based on whether your machine has CUDA capabilities. In this post we're going with base (no CUDA).
  • A corpus, i.e. a text used to fine-tune the generic model. It can't be too small; at least a couple of hundred kilobytes of text is needed.

Prepare your corpus

I'll show you how based on my Viking Lyrics Generator repo on github: https://github.com/osirisguitar/viking-lyrics

When you build the Docker image, it will copy crawler/lyrics/corpii/viking2.txt into the image. viking2.txt is my corpus, made by crawling selected bands' lyrics from Dark Lyrics; you can replace it with any text of sufficient length (at least a couple of hundred kilobytes).
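If your source material is spread over many files (like one file per crawled song), a small script can stitch them into a single corpus and warn you if it's too small. This is a minimal sketch; the directory and file names are hypothetical, and the 200 kB threshold is just the rule of thumb from above:

```python
from pathlib import Path

def build_corpus(source_dir, output_file, min_bytes=200_000):
    """Concatenate all .txt files in source_dir into one corpus file.

    min_bytes is a rough lower bound: fine-tuning tends to need at least
    a couple of hundred kilobytes of text to give usable results.
    """
    texts = [p.read_text(encoding="utf-8")
             for p in sorted(Path(source_dir).glob("*.txt"))]
    corpus = "\n".join(texts)
    Path(output_file).write_text(corpus, encoding="utf-8")
    size = len(corpus.encode("utf-8"))
    if size < min_bytes:
        print(f"Warning: corpus is only {size} bytes; consider adding more text")
    return size
```

Point it at a folder of crawled lyrics and use the resulting file in place of viking2.txt.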

To build the image, run:
:>docker build -t my-torch-rnn .

Make a folder for the trained RNN:

:>mkdir cv

Then start a container, and exec into it:

:>docker run --name my-torch-rnn -v `pwd`/cv:/root/torch-rnn/cv -p 8899:8899 -d my-torch-rnn

:>docker exec -it my-torch-rnn /bin/bash

Now you are inside the container. The next step is to convert your corpus into files suitable for fine tuning and start the training (i.e. fine tuning):

:root@...>cd /root/torch-rnn/

:root@...>python scripts/preprocess.py --input_txt /app/lyrics/viking2.txt --output_h5 /app/viking2.h5 --output_json /app/viking2.json
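The preprocess step builds a character-level vocabulary and encodes the corpus as integer indices (the .h5 file holds the encoded data, the .json file the vocabulary). A rough Python sketch of the idea, not the actual torch-rnn code:

```python
def preprocess(text):
    """Build a character-level vocabulary and encode the text as integer
    indices, roughly what torch-rnn's preprocess.py does (the real script
    also writes the encoded data to HDF5 and the vocabulary to JSON)."""
    chars = sorted(set(text))
    token_to_idx = {ch: i for i, ch in enumerate(chars)}
    encoded = [token_to_idx[ch] for ch in text]
    return encoded, token_to_idx

encoded, vocab = preprocess("abba")
print(vocab)     # {'a': 0, 'b': 1}
print(encoded)   # [0, 1, 1, 0]
```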

:root@...>th train.lua -input_h5 /app/viking2.h5 -input_json /app/viking2.json -gpu -1

If everything is working, the output will be something like:

Running in CPU mode
Epoch 1.00 / 50, i = 1 / 11300, loss = 4.876984
Epoch 1.01 / 50, i = 2 / 11300, loss = 4.789840
Epoch 1.01 / 50, i = 3 / 11300, loss = 4.685121
Epoch 1.02 / 50, i = 4 / 11300, loss = 4.502914
Epoch 1.02 / 50, i = 5 / 11300, loss = 4.266636
Epoch 1.03 / 50, i = 6 / 11300, loss = 4.046713
Epoch 1.03 / 50, i = 7 / 11300, loss = 3.839543
Epoch 1.04 / 50, i = 8 / 11300, loss = 3.664974
Epoch 1.04 / 50, i = 9 / 11300, loss = 3.490775
Epoch 1.04 / 50, i = 10 / 11300, loss = 3.330319

By default, training runs for 50 epochs (in the output above it has completed about 4% of the first epoch), so training will take quite a while.
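The iteration counts in the log let you estimate how long training will take: with 11300 total iterations over 50 epochs, each epoch is 226 iterations. Time a single iteration and multiply (the per-iteration timing below is a made-up placeholder; measure your own machine):

```python
total_iterations = 11300   # from the training log above
epochs = 50                # torch-rnn's default
iters_per_epoch = total_iterations // epochs
print(iters_per_epoch)     # 226

seconds_per_iter = 2.0     # hypothetical CPU timing, measure your own
hours_total = total_iterations * seconds_per_iter / 3600
print(round(hours_total, 1))  # 6.3 hours at 2 s/iteration
```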

When done, test your fine-tuned RNN:

:root@...>th sample.lua -checkpoint cv/checkpoint_11300.t7 -length 200 -gpu -1 -temperature 0.7
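The -temperature flag controls how adventurous the sampling is: the model's output probabilities are rescaled before each character is drawn. A minimal sketch of the math behind it, on a toy three-character distribution:

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a sampling temperature.
    Low temperature sharpens the distribution (safer, more repetitive text);
    high temperature flattens it (more surprising output, more invented words)."""
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.7, 0.2, 0.1]
print(apply_temperature(probs, 0.5))  # sharper: the likeliest char dominates
print(apply_temperature(probs, 2.0))  # flatter: rarer chars get more weight
```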

2. Turning the RNN into a Web Service

In my repo, there is already a web service implemented with restify. It serves a static site and an API endpoint that returns lyrics generated by the RNN. Try it out at http://localhost:8899 with your container running.

It turned out the hardest part was getting the torch-rnn parts working, so instead of starting with a Node.js base image I extended the torch-rnn image by installing Node. You're free to adapt it any way you want (the repo is under the MIT license).
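At its core, the service just shells out to the sampling script and returns whatever it prints. The repo's version is written in Node.js with restify; this is a hypothetical Python sketch of the same idea, with the actual sample.lua command shown only in the comment:

```python
import subprocess

def generate_lyrics(command):
    """Run a sampling command and return its stdout as text.
    Inside the container, the command would be something like:
    ["th", "sample.lua", "-checkpoint", "cv/checkpoint_11300.t7",
     "-length", "200", "-gpu", "-1", "-temperature", "0.7"]
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

# A web framework's request handler would call generate_lyrics()
# and put the returned text in the HTTP response body.
```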

3. Making a Song

As you've already seen in the intro to this post, I made an actual song using my viking metal-trained RNN. So did the RNN write the lyrics all by itself? Well, not entirely... The thing is that RNNs are not very good at structure, so I had to hand-pick generated lines that fit the song's structure, like the number of syllables per line.

So the content is completely generated by the RNN, but it has been curated by a human, as is a lot of the AI-generated content shared on social media.

Not all such content is even AI-generated. Most of the "I made an AI watch 5000 hours of <genre x> movies" clips are fake: funny, but made up by humans. There is currently no way of consistently getting good results with zero human interaction (like editing out output that doesn't turn out as expected). This is especially true here, where I intentionally chose a rather high temperature (0.71) to get amusing lyrics for my song. Some people who have heard the song complain that the AI is just crappy, but it could easily be made more human and boring (fewer invented words, etc.) with a lower temperature.

4. Improvements

Not only are RNNs bad at structure, they are also bad at context and at staying with a common thread over multiple sentences. Since I did this project, much more powerful text generation models have been created. One of the most powerful (and generally available) is GPT-2. In the next part of this series, I will use GPT-2 to create a Twitter bot in the style of a person's account. I will probably also reimplement the Viking Lyrics Generator with a GPT-2 model instead of the RNN. When that's finished, I'll use it to make lyrics for a second song, probably at a lower temperature to show that AI-generated text can be hard to distinguish from text written by humans.

Elephants in Rooms

  1. Isn't it cheating to not create your own models?

Of course not! That's the point of this series: applied AI. There are plenty of AI tools and frameworks that are ready to use and don't require expertise in AI, machine learning, or model design. Sure, you can design your own models, but that takes serious devotion to the machine learning field.

  2. Why did you use such an old model/Docker image/etc.?

I did this experiment years ago; I just didn't get around to blogging about it.

Next part

The second part of Applied AI will be out soon. Until then, go forth and apply!
