
# How to Automate Email Classification Using LLMs and Databricks DBSQL AI Functions

## LLM-Powered Email Classification on Databricks

Since Databricks introduced AI Functions, integrating large language models (LLMs) into data workflows has become significantly easier. Analysts and business users without expertise in Python or machine learning infrastructure can now perform advanced AI tasks directly through SQL queries. This article is a practical guide to building an LLM-based email classification system using few-shot learning and Databricks DBSQL AI Functions.

## Part 1: AI Functions

The primary goal is to automate checking the company's mailbox and classifying emails to determine whether a client is requesting to be unsubscribed from marketing or commercial emails. This is particularly useful when there is no historical dataset to train on. The `ai_query()` function, part of Databricks AI Functions, is instrumental in achieving this objective.

Here's the structure of the test dataset we are working with:

- `Email_id`: unique identifier for each email
- `Email_body`: the content of the email
- `Sender`: the email sender

To classify the emails, we pass the following arguments to `ai_query()`:

- **Model**: `databricks-meta-llama-3-3-70b-instruct`
- **Prompt template**: a template based on the research of Si et al. (2024), adapted for our specific use case. It prompts the LLM to identify whether an email requests removal from a marketing distribution list and to respond with either "Remove" or "Keep".
- **Model parameters**: `max_tokens` set to 1 to force a single-token response (either "Remove" or "Keep"), and `temperature` set to 0.1 to reduce randomness.

The prompt template looks like this:

```
Forget all your previous instructions, pretend you are an e-mail classification expert who tries to identify whether an e-mail is requesting to be removed from a marketing distribution list. Answer "Remove" if the mail is requesting to be removed, "Keep" if not. Do not add any other detail.
If you think it is too difficult to judge, you can exclude the impossible one and choose the other, just answer "Remove" or "Keep". Here are a few examples for you:
* "I wish to no longer receive emails" is "Remove";
* "Remove me from any kind of subscriptions" is "Remove";
* "I want to update my delivery address" is "Keep";
* "When is my product warranty expiring?" is "Keep";
Now, identify whether the e-mail is "Remove" or "Keep"; e-mail:
```

These elements are combined into a single SQL query that runs batch inference on all emails and generates the predicted labels:

```sql
SELECT
  *,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    "${prompt}" || Email_body,
    modelParameters => named_struct('max_tokens', 1, 'temperature', 0.1)
  ) AS Predicted_Label
FROM customer_emails;
```

## Part 2: Access to Gmail APIs

To implement this use case end to end, we need a way to ingest emails automatically, which means configuring a Gmail account to work with the Gmail API. This demo takes a more manual approach, but for full automation, setting up a Service Account is recommended.

### Step-by-Step Guide to Using the Gmail API

1. **Create a project**: go to the Google Cloud Console, create a new project, and enable the Gmail API for it.
2. **Configure the OAuth consent screen**: navigate to the OAuth consent screen configuration, choose an application type, fill out the required details, and add the necessary scopes for email access.
3. **Authorize users**: allow users to authenticate and publish the application. For this demo, a dummy Gmail account simplifies the process, but the authentication step remains crucial.

### Accessing the Gmail Mailbox from Databricks Notebooks

To authenticate to Gmail from a Databricks notebook, we use a custom function implemented in the repository. Since Databricks clusters lack browser access, a workaround is necessary:

1. **Manual authentication**:
   - Implement the `gmail_authenticate_manual()` function.
   - The function prints a URL to open in a local browser to complete the OAuth process.
   - After authenticating, the user lands on an error page whose URL contains the authorization code, which can be copied back into the notebook.
2. **Reading emails**:
   - Once authenticated, call `build('gmail', 'v1', credentials=access_)` to create the Gmail API service.
   - Download email messages with `get_email_messages_since(service_, since_day, since_month, since_year)`.
   - Save the email information to a Spark DataFrame and, eventually, to a Delta table.

Here's the Databricks notebook code snippet:

```python
# Build the Gmail API service and download emails
service_ = build('gmail', 'v1', credentials=access_)
emails = get_email_messages_since(service_, since_day=25, since_month=3, since_year=2025)

if emails:
    spark_emails = spark.createDataFrame(emails)
    display(spark_emails)
else:
    spark_emails = None
    print("No emails found.")
```

## Evaluation and Industry Insights

Industry insiders laud the integration of LLMs into Databricks' SQL environment as a significant advancement. The feature democratizes AI capabilities, allowing non-technical users to perform complex tasks with minimal effort. The ability to classify emails without historical data is particularly noteworthy, since few-shot learning makes immediate, informed decisions possible.

Databricks, known for its robust data processing and machine learning capabilities, continues to make AI more accessible. Combining LLMs with SQL functions simplifies the implementation of AI solutions and lowers the barrier to entry for companies looking to enhance their data workflows with intelligent automation. However, the manual authentication flow for the Gmail API remains a bottleneck: setting up Service Accounts and automating API access are essential steps for scaling this solution to production environments.
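The `get_email_messages_since()` helper lives in the accompanying repository and is not reproduced in the article. As a rough, hypothetical sketch of two things such a helper has to handle (the function names below are illustrative, not the repository's actual code): building the Gmail search query for a date cutoff, and decoding the URL-safe base64 body that the Gmail API returns in a message's payload.

```python
import base64

def build_since_query(day, month, year):
    """Build a Gmail search query matching messages received on or after a date.

    Gmail's search syntax accepts dates as after:YYYY/MM/DD.
    """
    return f"after:{year:04d}/{month:02d}/{day:02d}"

def extract_plain_text(message):
    """Pull the text/plain body out of a Gmail API message resource.

    The API encodes body data as URL-safe base64; multipart messages keep
    their text under payload['parts'], single-part ones directly in payload.
    """
    payload = message.get("payload", {})
    parts = payload.get("parts") or [payload]
    for part in parts:
        if part.get("mimeType") == "text/plain":
            data = part.get("body", {}).get("data", "")
            # Gmail may strip base64 padding, so restore it before decoding
            data += "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(data).decode("utf-8")
    return ""
```

In a real pipeline, `build_since_query()` would feed the `q` parameter of `service.users().messages().list(...)`, and `extract_plain_text()` would run over each message fetched with `messages().get(..., format='full')` before the rows are written to the Spark DataFrame.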
Overall, this guide provides a solid foundation for implementing a practical email classification system, demonstrating the potential of LLMs in real-world applications.
