In today’s digital landscape, product teams are increasingly incorporating AI-driven features powered by Large Language Models (LLMs). These advanced capabilities enhance various system components, including search, question-answering, and data extraction.

However, a crucial preliminary step in delivering the right feature at the right time is accurately routing and interpreting user actions. Understanding user intent is therefore fundamental to the success of any LLM and Retrieval-Augmented Generation (RAG) based solution.

In this blog post, we will delve into how LLMs 🤖 can be used to effectively detect and interpret user intent across a diverse range of queries.

What is intent detection and why does it matter?

Intent detection, or intent recognition, is an NLP technique used to identify the purpose or goal behind a user’s query. Historically, this type of classification problem has been central to search and recommendation engines [1].

Some key aspects of intent detection include:

  • Natural language understanding: i.e. grasping the meaning behind what a user is saying.
  • Context analysis: i.e. considering the context in which the user’s query appears in order to accurately detect the intent. The context could be a document, part of a document, a chat, etc.
  • Classification: i.e. assigning pre-defined labels or categories to the user’s input based on the predicted intent.

As you can imagine, this is a crucial step for an LLM-based system using e.g. RAG, for several reasons:

  • Improving the user experience: by understanding users, we can tailor responses and actions to individual needs; personalization also improves the efficiency and relevance of responses to the user’s query.
  • Automation: knowing or predicting the intent allows certain pre-defined routines or tasks to be automated based on the user’s query.
  • Feature activation: intent detection can route the user to different parts of our system based on the predicted intent, bringing in the right context to respond to user queries promptly.

You may have heard about semantic routing [2], which is an adjacent concept. TL;DR: use intent detection to route your users to relevant features or parts of your system to provide them with a tailored and timely experience.
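As a rough sketch of what intent-based routing could look like, consider the snippet below. The intent labels and handler functions are hypothetical, purely for illustration:

```typescript
// Hypothetical intent labels for a food chatbot
type Intent = 'RECIPE_QA' | 'GENERAL_RECOMMENDATION' | 'GENERAL_QA';

// Route each predicted intent to the feature that should handle it
const routes: Record<Intent, (query: string) => string> = {
    RECIPE_QA: (q) => `answering recipe question: ${q}`,
    GENERAL_RECOMMENDATION: (q) => `fetching recommendations for: ${q}`,
    GENERAL_QA: (q) => `general chat: ${q}`,
};

// Fall back to general QA when the predicted intent is unknown
function route(intent: string, query: string): string {
    const handler = routes[intent as Intent] ?? routes.GENERAL_QA;
    return handler(query);
}
```

The fallback branch matters in practice: a classifier will occasionally emit labels you did not anticipate, and a safe default keeps the conversation going.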

Intent Detection Method(s)

Now that we know what intent detection is, what it can be used for, and why it matters, let’s delve a bit further into various methods for detecting intent:

```mermaid
graph LR
    A[Intent Detection Methods]
    A --> B[Rule-based Methods]
    B --> B1[Predefined Rules]
    B --> B2[Keyword Matching]
    A --> C[Traditional NLP Methods]
    C --> C1[Bag-of-Words]
    C --> C2[TF-IDF]
    A --> D[Classification Models]
    D --> D1[Naive Bayes]
    D --> D2[SVM]
    D --> D3[Random Forests]
    A --> G[Sequence Models]
    G --> G1[RNN]
    G --> G2[LSTM]
    G --> G3[GRU]
    A --> E[Transformer-based Methods]
    E --> E1[BERT]
    E --> E2[GPT]
    E --> E3[SetFit]
    E --> E4[FastFit]
    A --> F[Large Language Models]
    F --> F1[GPT-4o]
    F --> F2[Haiku]
    F --> F3[Mistral Large]
```

A rough classification of these various methods is the following:

  • Rule-based
  • Traditional NLP methods
  • Classification models
  • Sequence models
  • Transformer-based models
  • Large Language models

We will not look into all the details of these various methods. If you want to learn more, numerous great sources are available online for each type of method/algorithm mentioned.

However, the table below summarizes some pros and cons of these methods:

| Method | Pros | Cons |
| --- | --- | --- |
| Rule-based Methods | Simple to implement and interpret; easy to maintain for small sets of rules | Limited flexibility and scalability; requires manual updates for new intents and vocabulary |
| Traditional NLP Methods (Bag-of-Words, TF-IDF) | Simple and effective for basic text classification; quick and computationally inexpensive | Ignores word order and context; can lead to loss of meaning |
| Classification Models (Naive Bayes, SVM, Random Forests) | Generally more accurate than rule-based systems; can handle a variety of inputs | Requires feature engineering; may not capture complex linguistic nuances |
| Sequence Models (RNN, LSTM, GRU) | Effective for capturing context and handling long sequences; good at modeling temporal dependencies in text | Computationally intensive; requires large datasets |
| Transformer-based Methods (BERT, GPT, SetFit, FastFit) | State-of-the-art performance on many NLP tasks; capable of understanding complex context and nuances | Requires significant computational power; needs substantial training data |
| Large Language Models (GPT-4o, Haiku, Mistral Large) | High accuracy and versatility across applications; can handle a wide range of tasks without extensive retraining | Very computationally expensive; potential issues with bias and interpretability |
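To make the trade-offs concrete, here is a sketch of the simplest end of the spectrum, a rule-based detector using keyword matching. The keyword lists are illustrative only, and the brittleness is exactly the con listed above: any phrasing outside the keyword lists falls through to the default.

```typescript
// Illustrative keyword lists per intent label (not exhaustive)
const intentKeywords: Record<string, string[]> = {
    RECIPE_TRANSLATE: ['translate', 'in french', 'in spanish'],
    GENERAL_RECOMMENDATION: ['recommend', 'suggest a recipe'],
};

// Return the first intent whose keywords appear in the query,
// defaulting to general QA when no rule fires
function detectIntentByRules(query: string): string {
    const q = query.toLowerCase();
    for (const [intent, keywords] of Object.entries(intentKeywords)) {
        if (keywords.some((k) => q.includes(k))) return intent;
    }
    return 'GENERAL_QA';
}
```

Easy to write and to debug, but every new intent or phrasing means another manual rule, which is why the rest of this post uses an LLM instead.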

Intent Detection Using LLM(s)

Now that we know what intent detection is and why it is useful, let’s look at a real-world example of how you could detect intent using LLMs.

In this example, we will operate a fictional recipe chatbot where people can ask questions about a recipe in a QA fashion and also get recipe recommendations. Our end-users can chat either in the context of a specific recipe or in a more general QA context.

This means that we have two distinct FoodContextTypes to consider based on the user’s actions:

  1. RECIPE: queries in the context of a recipe
  2. GENERAL: more general queries not strictly connected to a specific recipe.

After talking with our product team and some clients, we understand that, to start, we need to be able to detect the following FoodIntentTypes:

  1. RECIPE_QA: Question and answer about a recipe
  2. RECIPE_SUGGEST: Suggestions or changes to a recipe
  3. RECIPE_TRANSLATE: Translate a recipe, or parts of a recipe in some way or form
  4. RECIPE_EXPLAIN: Explain a recipe in more detail, e.g. what type of cuisine it is
  5. GENERAL_QA: General QA with the chatbot
  6. GENERAL_RECOMMENDATION: Queries related to getting recipe recommendations

To spice 🔥 it up a bit, we will be using TypeScript for this example, mainly because I have been doing quite a lot of TS and JS lately at my current workplace.

The libraries we will be using:

  • @langchain/openai
  • @langchain/core
  • zod

To start with, let’s see what entities we have to deal with. In TS, we use enums to enumerate these:

```typescript
/* Enum for chat context */
export enum FoodContextType {
    RECIPE = 'RECIPE',
    GENERAL = 'GENERAL'
}

/* Enum for chat intent */
export enum FoodIntentType {
    RECIPE_QA = 'RECIPE_QA',
    RECIPE_SUGGEST = 'RECIPE_SUGGEST',
    RECIPE_TRANSLATE = 'RECIPE_TRANSLATE',
    RECIPE_EXPLAIN = 'RECIPE_EXPLAIN',
    GENERAL_QA = 'GENERAL_QA',
    GENERAL_RECOMMENDATION = 'GENERAL_RECOMMENDATION'
}
```

As intent detection is a classification problem, we want to use an LLM to predict the user’s intent based on the following:

  • context: the context the user is in
  • userQuery: the actual user question or query.

A query can also carry multiple intents, i.e. allowing or deducing only a single intent might be wrong. For instance, look at the query below:

“Please recommend me a French cooking recipe with instructions in French”

The intents we can deduce from the query above are the following:

  1. GENERAL_RECOMMENDATION: the user wants a recommendation for French cooking 🇫🇷
  2. RECIPE_TRANSLATE: the user wants the recipe instructions in French 🇫🇷 🇫🇷

To model this, we want to use zod [3]. Luckily for us, many LLMs are good at function calling and extracting structured output based on a provided schema.

A zod object for our task could look like this:

```typescript
import { z } from 'zod';

const zDetectFoodIntentResponse = z.object({
    foodIntent: z
      .array(
        z.object({
            foodContextType: z
              .nativeEnum(FoodContextType)
              .describe('Type of context the user is in'),
            foodIntentType: z
              .nativeEnum(FoodIntentType)
              .describe('Predicted food-related intent'),
            reasoning: z
              .string()
              .describe('Reasoning behind the predicted intent')
        })
      )
});

/* Infer type */
type FoodIntentDetectionResponse = z.infer<typeof zDetectFoodIntentResponse>;
```

Many modern LLMs support tool calling / structured output out of the box, and orchestration libraries such as LangChain make it very easy to get started. LangChain fairly recently updated how structured output extraction and function calling are done across several different LLM providers. See more about it here.

To continue, the next step is to create our prompt and build our chain of one or several LLM calls. If you want some tips and tricks on how to do multiple LLM calls for data extraction tasks, see my other blog post. And if you, like me, are a fan of DSPy, check my other post.

Anyhow, see below for the initial starting point of our prompt:

```typescript
export const FoodIntentDetectionPromptTemplate = `
You are an expert restaurateur and online TV chef.
Based on the provided 'context' and 'userQuery', predict the user's 'intent'.
Make sure to follow the instructions.

# Instructions
1. Only use the 'foodContextTypes' specified in the schema.
2. Use the 'foodContextType' to determine what type of 'context' the user is in.
3. Based on the 'foodContextType' and 'userQuery', predict the 'foodIntentType'.
4. If the 'userQuery' is uncertain, unclear, or irrelevant, use 'GENERAL_QA' as the default intent.

# Food Context Input Type
{foodContextType}

# User Context
{context}

# User Query
{userQuery}
`;
```

As prompt engineering is still more of an art than a science (unless you are using frameworks such as DSPy), you will likely need to refine a prompt like the one above for your use case. For this example, it is good enough, so let’s proceed with building our chain.

First, we define a helper class to keep track of our chat messages:

```typescript
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { ChatOpenAI } from '@langchain/openai';

/* MessageRole enum */
export enum MessageRole {
    SYSTEM = 'SYSTEM',
    ASSISTANT = 'ASSISTANT',
    USER = 'USER'
}

/* Messages object */
export class Messages {
    id: string;
    content: string;
    recipe?: string; // present only when chatting in the context of a recipe
    role: MessageRole;
}
```
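Before wiring up the LLM call, note that the context type can be derived directly from the message, since only recipe-scoped chats carry a recipe. The snippet below is a self-contained sketch with local re-declarations of the types:

```typescript
// Local re-declarations so the snippet runs standalone
enum FoodContextType { RECIPE = 'RECIPE', GENERAL = 'GENERAL' }
enum MessageRole { SYSTEM = 'SYSTEM', ASSISTANT = 'ASSISTANT', USER = 'USER' }

interface Message {
    id: string;
    content: string;
    recipe?: string;
    role: MessageRole;
}

// A message with a recipe attached means the user is in a recipe context
function contextOf(message: Message): FoodContextType {
    return message.recipe ? FoodContextType.RECIPE : FoodContextType.GENERAL;
}

const msg: Message = {
    id: '1',
    content: 'How long should I bake it?',
    recipe: 'Classic margherita pizza ...',
    role: MessageRole.USER,
};
```

This is the same deduction the predictIntent function performs; pulling it out makes it trivial to unit test.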

Then we build our predictIntent function:

```typescript
async function predictIntent(message: Messages)
: Promise<FoodIntentDetectionResponse> {
    // unpack message
    const { content, recipe } = message;

    // get userContext: fall back to the recipe when no content is present
    const userContext = content == null && recipe != null ? recipe : content;

    // deduce foodContextType from message
    const foodContextType = !recipe ? FoodContextType.GENERAL : FoodContextType.RECIPE;

    // get user question (assumed here to be the message content)
    const userQuery = content;

    // build chain
    const llm = new ChatOpenAI({
        temperature: 0,
        modelName: 'gpt-4o',
        openAIApiKey: process.env.apiKey
    });

    const chain = ChatPromptTemplate
      .fromTemplate(FoodIntentDetectionPromptTemplate)
      .pipe(llm.withStructuredOutput(zDetectFoodIntentResponse));

    // invoke chain and parse response
    const response = await chain.invoke({
        context: userContext ?? '',
        foodContextType,
        userQuery: userQuery ?? ''
    });

    const parsedResponse = zDetectFoodIntentResponse.safeParse(response);

    if (!parsedResponse.success) {
        throw new Error('Failed to parse response...');
    }

    return parsedResponse.data;
}
```

Not too hard, right? Using this function, we might get output like the following for different queries:

Query 1:

“Can you give me a good dinner recommendation that is fairly quick and easy, preferably Japanese?”

Output 1:

```json
{
    "foodIntent": [
        {
            "foodContextType": "GENERAL",
            "foodIntentType": "GENERAL_RECOMMENDATION",
            "reasoning": "The user is asking for a recommendation of Japanese food that is easy and quick. Due to this, the predicted intent is GENERAL_RECOMMENDATION."
        }
    ]
}
```

Query 2:

“This recipe is great, but I would like to make it vegetarian, and also use the imperial system instead of the metric system for the ingredients”

Output 2:

```json
{
    "foodIntent": [
        {
            "foodContextType": "RECIPE",
            "foodIntentType": "RECIPE_QA",
            "reasoning": "The user ..."
        },
        {
            "foodContextType": "RECIPE",
            "foodIntentType": "RECIPE_TRANSLATE",
            "reasoning": "The user ..."
        }
    ]
}
```

Closing remarks

In this post, we explored intent detection in a bit more detail and why it is important for any AI- or LLM-powered system, with the main goal of increasing the relevance and accuracy of responses to a user’s query in a QA/search-based system.

We also demonstrated how you could use LLMs such as gpt-4o to detect intent in a fictional QA system. Intent detection is not just a technical necessity but a strategic advantage in building intelligent, user-centric LLM-powered systems.

Even though we used gpt-4o in this example, there are many relevant lower-latency alternatives, such as Haiku from Anthropic. And if you have a few hundred labeled examples, other transformer-based approaches such as FastFit or SetFit are also interesting. Thanks for reading, and until next time 👋!