Spam GPT

Andrew Aarestad
Aug 3, 2023

Updated: Nov 14, 2024

Using GPT for text categorization is easy and powerful. We love this use case for large language models and plan to roll this technology into several of our projects. Here's a quick tutorial on how we build fine-tuned categorization models and use them in production. The code on this page is available on GitHub: https://github.com/hypercolor/gpt-spam-filter

Background

What exactly are we doing here? This is a classic supervised learning problem, which means we will do the following:

* Prepare a set of labeled training data
* Use this data to train the model
* Withhold some of the data for validation
* Use the trained model to predict on the validation data to evaluate model performance
* After validation, use the trained model in production to predict on live data
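
As a rough illustration of the hold-out and evaluation steps above, here is a minimal sketch in plain Python. The function names, the toy data, and the keyword rule standing in for a trained model are all made up for this example; they are not taken from the repo.

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle labeled rows and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * validation_fraction)
    # First n_valid shuffled rows are held out, the rest are used for training
    return shuffled[n_valid:], shuffled[:n_valid]

def accuracy(predict, validation_rows):
    """Fraction of validation rows where predict(text) matches the label."""
    if not validation_rows:
        raise ValueError("validation set is empty")
    correct = sum(1 for text, label in validation_rows if predict(text) == label)
    return correct / len(validation_rows)

# Toy (text, label) pairs, invented for illustration only
labeled = [
    ("win a free cruise today", "spam"),
    ("free pills, no prescription", "spam"),
    ("claim your free gift card", "spam"),
    ("free money, click here", "spam"),
    ("thank you for the lovely photos", "ham"),
    ("we will miss her dearly", "ham"),
    ("see you at the reunion", "ham"),
    ("what a wonderful tribute", "ham"),
    ("sending love to the family", "ham"),
    ("rest in peace, old friend", "ham"),
]

train_rows, valid_rows = train_validation_split(labeled)

# Hypothetical stand-in for a trained model: a simple keyword rule
def keyword_model(text):
    return "spam" if "free" in text else "ham"

print(len(train_rows), len(valid_rows), accuracy(keyword_model, valid_rows))
```

In the real workflow the fine-tuned model takes the place of `keyword_model`, and the validation rows are never shown to it during training.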

Categorization with GPT

Using pre-trained models for text categorization is not a new concept, but large language models take things to the next level. To fine-tune a foundation model, we give it a set of prompts and desired completions. In the spam filter example, the desired completion is "spam" or "ham". The language model's weights are then updated so that it produces completions that match the training data. Once we've trained a fine-tuned model, we can use it to categorize our application data. We use the text as the prompt and limit the model to a one-token completion. Since the model was fine-tuned on these specialized completions, it will produce either "spam" or "ham".
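
To make the prompt/completion idea concrete, here is a small sketch of what individual training records could look like. The `\n\n###\n\n` separator and the leading space on the completion follow the conventions described later in this article; the helper names and sample texts are hypothetical.

```python
import json

# Prompt suffix mentioned later in this article
SEPARATOR = "\n\n###\n\n"

def to_training_record(text, is_spam):
    """Build one prompt/completion pair for a labeled comment."""
    return {
        "prompt": text + SEPARATOR,
        # Leading space on the completion, as the prepared data uses
        "completion": " spam" if is_spam else " ham",
    }

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)

# Sample texts invented for illustration
records = [
    to_training_record("Thank you all for the kind words", False),
    to_training_record("Click here to win a free cruise", True),
]
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

Each line of the output is one training example: the text to classify, followed by the label the model should learn to produce.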

Preparing the Training Data

The quickest way to see all the code here is to open the Python notebook in the GitHub repo.

Building the Fine Tune Model

The training data was exported from our application database to JSON format, including our manually-entered "spam_flag" boolean labels. Using pandas, I load these comments, drop any whose text is 2000 characters or longer, remove duplicates, add a new column for the desired completion of "spam" or "ham", and save the prompt-formatted data to a new JSON Lines file.

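import pandas as pd

# Load the exported comments, keeping only the columns we need
allCommentsDf = pd.read_json('data/comments.json')[['id', 'deleted', 'display_name', 'text', 'spam_flag']]
# Keep comments shorter than 2000 characters, then drop repeated texts
df = allCommentsDf[allCommentsDf['text'].str.len() < 2000]
df = df.drop_duplicates(subset=['text'])
# Map the boolean spam_flag to the desired completion label
df['spam'] = ['ham' if x == False else 'spam' for x in df['spam_flag']]

# Rename to prompt/completion columns and write one JSON record per line
# (jsonlFilename is defined earlier in the notebook)
train = df[['text', 'spam']].copy()
train.rename(columns={'spam': 'completion', 'text': 'prompt'}, inplace=True)
train.to_json(f"data/{jsonlFilename}.jsonl", orient='records', lines=True)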

Once the prompt/completion data is ready, run it through OpenAI's CLI data preparation tool. It makes sure no duplicates are present, adds the special suffix each prompt needs (\n\n###\n\n), and adds a leading space to each completion. It also splits the data into the training and validation files used in the next step. Just go with the defaults unless you have a good reason to do otherwise:

!openai tools fine_tunes.prepare_data -f data/{jsonlFilename}.jsonl -q
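
After the tool runs, a quick sanity check on the prepared records can catch formatting problems before a long wait in the training queue. The sketch below is not part of the OpenAI tooling; `find_problems` is a hypothetical helper that checks the three properties described above on JSON Lines input, and the sample records are made up.

```python
import json

SEPARATOR = "\n\n###\n\n"

def find_problems(jsonl_lines):
    """Return a list of human-readable problems found in prepared records."""
    problems = []
    seen_prompts = set()
    for line_number, line in enumerate(jsonl_lines, start=1):
        record = json.loads(line)
        prompt, completion = record["prompt"], record["completion"]
        if not prompt.endswith(SEPARATOR):
            problems.append(f"line {line_number}: prompt is missing the separator suffix")
        if not completion.startswith(" "):
            problems.append(f"line {line_number}: completion has no leading space")
        if prompt in seen_prompts:
            problems.append(f"line {line_number}: duplicate prompt")
        seen_prompts.add(prompt)
    return problems

# In-memory sample records; in practice, read the lines of the prepared .jsonl file
good_lines = [
    json.dumps({"prompt": "Lovely service today" + SEPARATOR, "completion": " ham"}),
    json.dumps({"prompt": "Buy followers cheap" + SEPARATOR, "completion": " spam"}),
]
bad_lines = good_lines + [
    json.dumps({"prompt": "Buy followers cheap" + SEPARATOR, "completion": "spam"}),
    json.dumps({"prompt": "No suffix here", "completion": " ham"}),
]

print(find_problems(good_lines))
for problem in find_problems(bad_lines):
    print(problem)
```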

Now that the training data is finally ready, we can use the fine tune CLI to start the training. This uploads the data to OpenAI's servers and submits a training job to the queue. As far as I know there's no way to get priority access to this training queue, so it may take a long time. For me, it usually takes 2-3 hours for the job to start.

model = "ada"

trainFile = f"data/{jsonlFilename}_prepared_train.jsonl"
validateFile = f"data/{jsonlFilename}_prepared_valid.jsonl"

command = f"OPENAI_API_KEY={apikey} openai api fine_tunes.create -t {trainFile} -v {validateFile} -m {model} --compute_classification_metrics --classification_positive_class \" ham\""
!{command}

Predicting on Live Data

Once the fine tuning job is complete, you can use the resulting model to predict on live text.

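import os
import openai  # legacy SDK (openai<1.0), which provides openai.Completion

def isSpam(comment):
    # Truncate long comments and append the same separator used in training
    res = openai.Completion.create(
        model=os.environ['fine_tune_model_id'],
        prompt=comment[:1500] + "\n\n###\n\n",
        max_tokens=1,
        temperature=0,
        logprobs=2)
    label = res.choices[0].text
    if label == " spam":
        return True
    elif label == " ham":
        return False
    else:
        # Unexpected output is reported and treated as not spam
        print("Error, unexpected model output: {}".format(label))
        return False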

hamPrompt = "I am so Happy we were able to share a few more memories at Grandma and Grandma’s house last summer. You are all in my heart. Thank you cousin Jen for the LOVE"
spamPrompt = "Hello everyone my names are ALEX JACKSON from the UK, I want to use this golden medium to appreciate Doctor Abdul a great spell caster for helping me retrieving back my relationship with my ex lover when he ended and turned back on me for quite a long time now (6 months ago)..."

print("hamPrompt is spam: {}".format(isSpam(hamPrompt)))
print("spamPrompt is spam: {}".format(isSpam(spamPrompt)))
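
The `isSpam` function requests `logprobs=2` but only reads the completion text. One possible use for the log probabilities is a confidence score, so borderline comments can be sent to manual review. The sketch below is an assumption, not code from the repo: it takes a plain dict mapping tokens to log probabilities, which in the legacy Completion response should be available as `res.choices[0].logprobs.top_logprobs[0]`. The function name and sample values are invented.

```python
import math

def spam_probability(top_logprobs):
    """Estimate P(spam) from a token -> log probability dict.

    Only the " spam" and " ham" tokens are considered, and their
    probabilities are renormalized to sum to 1. Returns None if
    neither token is present.
    """
    p_spam = math.exp(top_logprobs.get(" spam", float("-inf")))
    p_ham = math.exp(top_logprobs.get(" ham", float("-inf")))
    total = p_spam + p_ham
    if total == 0:
        return None
    return p_spam / total

# Made-up log probabilities for illustration
confident = {" spam": math.log(0.97), " ham": math.log(0.03)}
borderline = {" spam": math.log(0.55), " ham": math.log(0.45)}

print(spam_probability(confident))
print(spam_probability(borderline))
```

A threshold on this score (for example, auto-hide only above some cutoff and queue the rest for review) would need tuning against your own validation data.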

Limitations

* No GPT model. Yes, this article is called Spam GPT, but you can't actually use GPT 3.5 or GPT 4 in fine tuning. Yet. Supposedly OpenAI will be releasing support for fine tuning these more powerful models later in 2023.
* New spam. This approach is great for detecting the types of spam you are already getting, but not as good when it comes to new types of spam it hasn't seen before. Consider adding open-source spam databases to your training set if you anticipate a wide variety of spam.
* Text length. OpenAI charges per token (a token is roughly a word or a piece of a word), with a max of 2000 tokens. So you can't use this on long-form content. Don't just use the first X words of a post either: spammers will hide their spam behind a normal-looking preamble. Also, given that the cost is per token, consider training the spam filter on an abbreviated version of the content. Can you make it work with only 100 words per prompt?
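
On the text length point, one way to experiment with an abbreviated prompt is to keep words from both the start and the end of a post, so a normal-looking preamble alone can't fill the whole budget. This is only a sketch of one possible approach: the `abbreviate` helper is hypothetical, and whether it preserves accuracy is untested here. If you try it, apply the same abbreviation to the training data and to live text.

```python
def abbreviate(text, max_words=100):
    """Keep at most max_words words, split between the start and end of the text."""
    if max_words < 2:
        raise ValueError("max_words must be at least 2")
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    head = max_words // 2
    tail = max_words - head
    # Drop the middle of the text; keep the opening and closing words
    return " ".join(words[:head] + words[-tail:])

# A made-up 250-word post: w0 w1 ... w249
long_post = " ".join(f"w{i}" for i in range(250))
short_post = abbreviate(long_post, max_words=100)
print(len(short_post.split()))
```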