Anyone who has been working with LLMs and generative AI recently has noticed that how you prompt an LLM matters. Slight changes to your prompts can lead to unexpected results, and it is often non-trivial to reuse the same prompts when you switch the underlying LLM. One example is moving from OpenAI to Anthropic while using function calling.

As a result, you often spend quite some time rewriting your prompts, i.e. doing even more prompt engineering. Luckily, there are some interesting frameworks out there, such as DSPy, that focus on ‘programming’ rather than ‘prompting’ your LLMs.

To get a good overview of DSPy, see the references below:

  1. 💻 Intro to DSPy: Goodbye Prompting, Hello Programming!
  2. 💻 DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
  3. 💻 DSPy Deep-Dive

In this post, we will try to use DSPy to extract metadata from recipes. For a recap of our previous approach, see this post.

DSPy - Declarative Self-improving Language Programs

DSPy, or Declarative Self-improving Language Programs, was first introduced in the paper listed as (2) above:


DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline.

DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e. having higher quality and/or avoiding specific failure patterns. DSPy optimizers will “compile” the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. - About DSPy1

  1. The above quote is excerpted from ↩︎

Note the use of optimizing above, which invites an analogy to optimizing neural networks with a framework such as PyTorch or TensorFlow. The nice thing about the optimizer is that DSPy lets us avoid focusing too much on prompt engineering for whatever LLM we choose. Instead, we can compile, i.e. optimize, the underlying instructions to work with any LLM.

In short, the DSPy programming model has the following abstractions:

  1. Signatures, which replace the need for hand-written prompts/fine-tuning.
  2. Modules that implement various prompt engineering techniques such as CoT, ReAct, etc.
  3. Optimizers1 that automate manual prompt engineering based on given metrics.

A DSPy program is a program using (1) - (3) together with data to use in the optimization step. For a more thorough walk-through of DSPy see e.g., (1) and (2) from the introduction section.

In the following sections, we will use this notebook 📓 to examine how you can use DSPy for NER use cases. As in the previous NER post, we want to extract different metadata for food items.

Setup environment

The first thing to do is to load the necessary libraries and do any setup of these libraries.

import dspy
from pydantic import BaseModel, Field
from dspy.functional import TypedPredictor
from IPython.display import Markdown, display
from typing import List, Optional, Union
from dotenv import load_dotenv
from devtools import pprint

# Load environment variables from a .env file and fail early if none were loaded
assert load_dotenv()
# The two OpenAI chat models used in this post
gpt4 = dspy.OpenAI(model="gpt-4-turbo-preview", max_tokens=4096, model_type="chat")
gpt_turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=4096, model_type="chat")

Here we use OpenAI for the example. DSPy seems to support many of the big open and closed source providers. For implementations, see more here and here.

Data

As in this post, we will use the data shown below:

### Chashu pork (for ramen and more)
Chashu pork is a classic way to prepare pork belly for Japanese dishes such as ramen. 
While it takes a little time, it's relatively hands-off and easy, and the result is delicious.

### Ingredients
2 lb pork belly or a little more/less
2 green onions spring onions, or 3 if small
1 in fresh ginger (a chunk that will give around 4 - 6 slices)
2 cloves garlic
⅔ cup sake
⅔ cup soy sauce
¼ cup mirin
½ cup sugar
2 cups water or a little more as needed

### Instructions
Have some string/kitchen twine ready and a pair of scissors before you prepare the pork. 
If needed, trim excess fat from the outside of the pork but you still want a later over the

We will also use the following Pydantic data models 2 as part of the problem:

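# One extracted entity together with the reasoning behind it
class FoodMetaData(BaseModel):
    reasoning: str = Field(description="Reasoning for why the entity is correct")
    value: Union[str, int] = Field(description="Value of the entity")
    entity: str = Field(description="The actual entity i.e. pork, onions etc")

# Container model. Note that it reuses the name FoodMetaData and therefore shadows
# the class above. The annotation below still resolves to the item model because it
# is evaluated before the name is rebound.
class FoodMetaData(BaseModel):
    context: List[FoodMetaData]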

The first model above represents the “reasoning” object as part of the CoT step in the workflow.

class FoodEntity(BaseModel):
    # Adjacent string literals are joined, so no stray whitespace ends up in the descriptions
    food: str = Field(description="This can be both liquid and "
                      "solid food such as meat, vegetables, alcohol, etc")
    quantity: int = Field(description="The exact quantity or amount "
                          "of the food that should be used in the recipe")
    unit: str = Field(description="The unit being used e.g. "
                      "grams, milliliters, pounds, etc")
    physical_quality: Optional[str] = Field(description="The characteristic of the ingredient")
    color: str = Field(description="The color of the food")

class FoodEntities(BaseModel):
    entities: List[FoodEntity]

The second model above is the schema for the actual metadata that we want to extract. Below is the resulting JSON schema for this object:

{
    "properties": {
        "food": {
            "description": "This can be both liquid and solid food such as meat, vegetables, alcohol, etc",
            "title": "Food",
            "type": "string"
        },
        "quantity": {
            "description": "The exact quantity or amount of the food that should be used in the recipe",
            "title": "Quantity",
            "type": "integer"
        },
        "unit": {
            "description": "The unit being used e.g. grams, milliliters, pounds, etc",
            "title": "Unit",
            "type": "string"
        },
        "physical_quality": {
            "anyOf": [
                {
                    "type": "string"
                },
                {
                    "type": "null"
                }
            ],
            "description": "The characteristic of the ingredient",
            "title": "Physical Quality"
        },
        "color": {
            "description": "The color of the food",
            "title": "Color",
            "type": "string"
        }
    },
    "required": [
        "food",
        "quantity",
        "unit",
        "physical_quality",
        "color"
    ],
    "title": "FoodEntity",
    "type": "object"
}
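To illustrate what a schema like the one above gives us, here is a minimal standard-library sketch that checks a candidate JSON object against the `required` and `type` entries of an abbreviated copy of that schema. The names `SCHEMA`, `JSON_TYPES` and `check_entity` are hypothetical and exist only for this illustration; this is not how DSPy or Pydantic validate data internally.

```python
import json

# Abbreviated copy of the FoodEntity schema (descriptions and titles omitted).
SCHEMA = {
    "properties": {
        "food": {"type": "string"},
        "quantity": {"type": "integer"},
        "unit": {"type": "string"},
        "physical_quality": {"anyOf": [{"type": "string"}, {"type": "null"}]},
        "color": {"type": "string"},
    },
    "required": ["food", "quantity", "unit", "physical_quality", "color"],
}

# Map JSON schema type names to Python types.
JSON_TYPES = {"string": str, "integer": int, "null": type(None)}


def check_entity(raw_json: str, schema: dict = SCHEMA) -> list:
    """Return a list of human-readable problems (empty if the object conforms)."""
    obj = json.loads(raw_json)
    problems = []
    # First report every required field that is absent.
    for name in schema["required"]:
        if name not in obj:
            problems.append(f"missing field: {name}")
    # Then check the type of every field that is present.
    for name, spec in schema["properties"].items():
        if name not in obj:
            continue
        allowed = [s["type"] for s in spec.get("anyOf", [spec])]
        value = obj[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = any(
            isinstance(value, JSON_TYPES[t]) and not isinstance(value, bool)
            for t in allowed
        )
        if not ok:
            problems.append(f"{name}: expected {' or '.join(allowed)}")
    return problems


good = '{"food": "sake", "quantity": 1, "unit": "cup", "physical_quality": null, "color": ""}'
bad = '{"food": "sake", "quantity": "2/3", "unit": "cup", "color": ""}'
print(check_entity(good))
print(check_entity(bad))
```

Because `quantity` is typed as an integer, a fractional amount written as text fails the check, which is worth keeping in mind for recipes that use fractions.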

Finally, for the teleprompter/optimizer, we need to provide some training examples 3:

# create some dummy data for training
trainset = [
    dspy.Example(
        recipe="French omelette with 2 eggs, 500grams of butter and 10 grams gruyere", 
        entities=[
            FoodEntity(food="eggs", quantity=2, unit="", physical_quality="", color="white"),
            FoodEntity(food="butter", quantity=500, unit="grams", physical_quality="", color="yellow"),
            FoodEntity(food="cheese", quantity=10, unit="grams", physical_quality="gruyere", color="yellow")
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        recipe="200 grams of Ramen noodles bowl with one pickled egg, 500grams of pork, and 1 spring onion", 
        entities=[
            FoodEntity(food="egg", quantity=1, unit="", physical_quality="pickled", color="ivory"),
            FoodEntity(food="ramen noodles", quantity=200, unit="grams", physical_quality="", color="yellow"),
            FoodEntity(food="spring onion", quantity=1, unit="", physical_quality="", color="white")
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        recipe="10 grams of dutch orange cheese, 2 liters of water, and 5 ml of ice", 
        entities=[
            FoodEntity(food="cheese", quantity=10, unit="grams", physical_quality="", color="orange"),
            FoodEntity(food="water", quantity=2, unit="liters", physical_quality="translucent", color=""),
            FoodEntity(food="ice", quantity=5, unit="milliliters", physical_quality="cold", color="white")
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        # Adjacent string literals are joined, which avoids stray whitespace in the recipe
        recipe="Pasta carbonara, 250 grams of pasta 300 grams of pancetta, "
        "150 grams pecorino romano, 150grams parmesan cheese, 3 egg yolks", 
        entities=[
            FoodEntity(food="pasta", quantity=250, unit="grams", physical_quality="dried", color="yellow"),
            FoodEntity(food="egg yolk", quantity=3, unit="", physical_quality="", color="orange"),
            FoodEntity(food="pancetta", quantity=300, unit="grams", physical_quality="pork", color=""),
            FoodEntity(food="pecorino", quantity=150, unit="grams", physical_quality="goat cheese", color="yellow"),
            FoodEntity(food="parmesan", quantity=150, unit="grams", physical_quality="cheese", color="yellow"),
        ]
    ).with_inputs("recipe"),
    dspy.Example(
        recipe="American pancakes with 250g flour, 1 tsp baking powder, 1 gram salt, 10g sugar, 100ml fat milk", 
        entities=[
            FoodEntity(food="flour", quantity=250, unit="grams", physical_quality="", color="white"),
            FoodEntity(food="baking powder", quantity=1, unit="tsp", physical_quality="", color="white"),
            FoodEntity(food="salt", quantity=1, unit="grams", physical_quality="salty", color="white"),
            FoodEntity(food="milk", quantity=100, unit="mil", physical_quality="fat", color="white"),
        ]
    ).with_inputs("recipe")
]

Signatures

The next step is to create the dspy.Signature objects, where we need to specify an InputField(...) and an OutputField(...). To recap what a Signature is:


A signature is a declarative specification of the input/output behavior of a DSPy module. Signatures allow you to tell the LM what it needs to do, rather than specify how we should ask the LM to do it. - DSPy Signatures1

Below are the Signatures we will be using:

class RecipeToFoodContext(dspy.Signature):
    """You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
    for why the extracted value is the correct value. If you cannot extract the entity, add null"""
    recipe: str = dspy.InputField()
    context: FoodMetaData = dspy.OutputField()

class RecipeToFoodEntities(dspy.Signature):
    """You are a food AI assistant. Your task is to extract food-related metadata from recipes."""
    recipe: str = dspy.InputField()
    entities: FoodEntities = dspy.OutputField()

Notice how modular and sleek these are compared to how they would look in other frameworks. Looking at the actual code, you will see that InputField and OutputField are thin wrappers around Pydantic's Field:

def InputField(**kwargs):
    return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="input"))

def OutputField(**kwargs):
    return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="output"))

Modules

The next thing to do is to select which Modules we want to use. To recap what Modules are:


Each built-in module abstracts a prompting technique (like chain of thought or ReAct). Crucially, they are generalized to handle any [DSPy Signature]. Your init method declares the modules you will use. Your forward method expresses any computation you want to do with your modules. - DSPy Modules1

The Modules that we will be using are:

  1. TypedPredictor
  2. TypedChainOfThought

These are two functional modules that let us specify types via Pydantic schemas, which is useful for structured data extraction. They can be used with either dspy.Functional or dspy.Module. However, before creating the actual modules, we will define one helper function to parse the context call:

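def parse_context(food_context: List[BaseModel]) -> str:
    """Render each extracted item as '<entity>:' followed by its JSON."""
    context_str = ""
    for context in food_context:
        # Each item is one reasoning object with entity, value and reasoning fields
        context_str += f"{context.entity}:\n" + context.model_dump_json(indent=4) + "\n"
    return context_str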

This is mainly to extract the resulting context JSON object as a string for the next step of the chain.
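To make the resulting string format concrete, here is a dependency-free sketch of the same idea. It uses a dataclass as a stand-in for the Pydantic model and `json.dumps` in place of `model_dump_json`; `ContextItem` and `render_context` are hypothetical names introduced only for this illustration, and the sample reasoning text is made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class ContextItem:
    """Stand-in for one reasoning object (same three fields as the Pydantic model)."""
    reasoning: str
    value: Union[str, int]
    entity: str


def render_context(items: List[ContextItem]) -> str:
    """Render each item as '<entity>:' followed by its pretty-printed JSON."""
    return "".join(
        f"{item.entity}:\n{json.dumps(asdict(item), indent=4)}\n" for item in items
    )


items = [
    ContextItem(
        reasoning="The recipe lists 2 cloves of garlic.",
        value="2 cloves",
        entity="garlic",
    )
]
rendered = render_context(items)
print(rendered)
```

Each entity ends up as a labelled JSON block, which is the text the second step of the chain receives as its input.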

Moving on to the actual Modules, using dspy.Module we define it as:

class ExtractFoodEntities(dspy.Module):
    def __init__(self, temperature: int = 0, seed: int = 123):
        super().__init__()
        self.temperature = temperature
        self.seed = seed
        # Step 1: extract entities with reasoning (plain and chain-of-thought variants)
        self.extract_food_context = dspy.TypedPredictor(RecipeToFoodContext)
        self.extract_food_context_cot = dspy.TypedChainOfThought(RecipeToFoodContext)
        # Step 2: turn the extracted context into typed food entities
        self.extract_food_entities = dspy.TypedPredictor(RecipeToFoodEntities)

    def forward(self, recipe: str) -> FoodEntities:
        food_context = self.extract_food_context(recipe=recipe).context
        parsed_context = parse_context(food_context.context)
        # The parsed context is passed as the "recipe" input of the second step
        food_entities = self.extract_food_entities(recipe=parsed_context)
        return food_entities.entities

Or using dspy.Functional we define it as:

from dspy.functional import FunctionalModule, predictor, cot

class ExtractFoodEntitiesV2(FunctionalModule):
    def __init__(self, temperature: int = 0, seed: int = 123):
        super().__init__()
        self.temperature = temperature
        self.seed = seed

    @predictor
    def extract_food_context(self, recipe: str) -> FoodMetaData:
        """You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
        for why the extracted value is the correct value. If you cannot extract the entity, add null"""
        pass

    @cot
    def extract_food_context_cot(self, recipe: str) -> FoodMetaData:
        """You are a food AI assistant. Your task is to extract the entity, the value of the entity and the reasoning 
        for why the extracted value is the correct value. If you cannot extract the entity, add null"""
        pass

    @predictor
    def extract_food_entities(self, recipe: str) -> FoodEntities:
        """You are a food AI assistant. Your task is to extract food entities from a recipe."""
        pass

    def forward(self, recipe: str) -> FoodEntities:
        food_context = self.extract_food_context(recipe=recipe)
        parsed_context = parse_context(food_context.context)
        food_entities = self.extract_food_entities(recipe=parsed_context)
        return food_entities

Using the functional API we can use some nifty decorator functions, i.e. @predictor and @cot. Now that we have our Module, we might want to test it on some example data. DSPy also provides dspy.context, a context manager where you can choose which LLM to use:

extract_food_entities = ExtractFoodEntities()

with dspy.context(lm=gpt4):
    entities = extract_food_entities(
        recipe="Ten grams of orange dutch cheese, 2 liters of water and 5 ml of ice"
    )
    pprint(entities)

This will result in the following entities:

FoodEntities(
    entities=[
        FoodEntity(
            food='orange dutch cheese',
            quantity=10,
            unit='grams',
            physical_quality=None,
            color='orange',
        ),
        FoodEntity(
            food='water',
            quantity=2000,
            unit='milliliters',
            physical_quality=None,
            color='clear',
        ),
        FoodEntity(
            food='ice',
            quantity=5,
            unit='milliliters',
            physical_quality=None,
            color='clear',
        ),
    ],
)

Optimize the program

Now we have all the components we need to start optimizing our program. To recap:


A DSPy optimizer is an algorithm that can tune the parameters of a DSPy program (i.e., the prompts and/or the LM weights) to maximize the metrics you specify, like accuracy. … DSPy programs consist of multiple calls to LMs, stacked together as [DSPy modules]. Each DSPy module has internal parameters of three kinds: (1) the LM weights, (2) the instructions, and (3) demonstrations of the input/output behavior.

Given a metric, DSPy can optimize all of these three with multi-stage optimization algorithms. - DSPy Optimizers1

For the optimization, we will use the BootstrapFewShot 4 and the metric below:

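def validate_entities(example, pred, trace=None):
    """Check if the expected and predicted entities are equal"""
    # The program returns a FoodEntities object, so unwrap its list before comparing
    predicted = getattr(pred, "entities", pred)
    return example.entities == predicted

I.e. we need an exact match for the objects 5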
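As footnote 5 notes, a partial match might be more forgiving than an exact one. The following is a minimal sketch of such a metric using only the standard library: it scores the overlap between expected and predicted entities with an F1 score. The `Entity` stand-in, the choice to compare only `food`, `quantity` and `unit`, and the `0.8` threshold are all assumptions made for illustration, not something taken from DSPy or from the notebook.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """Stand-in for FoodEntity, reduced to the fields compared here."""
    food: str
    quantity: int
    unit: str


def entity_f1(gold: List[Entity], predicted: List[Entity]) -> float:
    """F1 score over entities, matching on (food, quantity, unit)."""
    gold_set, pred_set = set(gold), set(predicted)
    overlap = len(gold_set & pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def partial_match(
    gold: List[Entity],
    predicted: List[Entity],
    trace: Optional[object] = None,
    threshold: float = 0.8,
) -> bool:
    """Metric-style wrapper: accept a prediction whose F1 reaches the threshold."""
    return entity_f1(gold, predicted) >= threshold


gold = [Entity("eggs", 2, ""), Entity("butter", 500, "grams"), Entity("cheese", 10, "grams")]
pred = [Entity("eggs", 2, ""), Entity("butter", 500, "grams"), Entity("cheese", 10, "g")]
print(entity_f1(gold, pred))
```

Here two of three entities match, so the score is 2/3 and the prediction is rejected at a threshold of 0.8, while an exact-match metric would give no credit at all. To use something like this as a DSPy metric, the wrapper would need to take `example` and `pred` and unwrap their entity lists first.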

To run the optimization step we use the compile method:

from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=validate_entities)
compiled_ner = teleprompter.compile(ExtractFoodEntitiesV2(), trainset=trainset)

The compiled program is something we can also store on disk and load again for later use.
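Conceptually, bootstrapping few-shot demonstrations means running the program on the training inputs and keeping the runs that the metric accepts, so they can be shown as examples in later prompts. The sketch below illustrates only that idea in plain Python. It is not DSPy's implementation, and `bootstrap_demos`, `toy_program` and `max_demos` are hypothetical names; the toy program stands in for an LM call.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, expected output)


def bootstrap_demos(
    program: Callable[[str], str],
    metric: Callable[[str, str], bool],
    trainset: List[Example],
    max_demos: int = 4,
) -> List[Example]:
    """Run the program on each training input and keep outputs the metric accepts."""
    demos: List[Example] = []
    for text, expected in trainset:
        predicted = program(text)
        if metric(expected, predicted):
            demos.append((text, predicted))
        if len(demos) >= max_demos:
            break
    return demos


def toy_program(text: str) -> str:
    """Toy stand-in for an LM call: return the first word, lowercased."""
    return text.split()[0].lower()


toy_trainset = [
    ("Eggs 2 pieces", "eggs"),
    ("500 grams butter", "butter"),
    ("Cheese 10 grams", "cheese"),
]
demos = bootstrap_demos(toy_program, lambda gold, pred: gold == pred, toy_trainset)
print(demos)
```

Only the runs that pass the metric survive as demonstrations, which is why a very strict metric on a small training set can leave few or no demonstrations to work with.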

To use the compiled program on our dataset we do:

    entities=[
        FoodEntity(
            food='pork belly',
            quantity=2,
            unit='lb',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='green onions',
            quantity=2,
            unit='items',
            physical_quality='or 3 if small',
            color='',
        ),
        FoodEntity(
            food='fresh ginger',
            quantity=1,
            unit='inch',
            physical_quality='chunk',
            color='',
        ),
        FoodEntity(
            food='garlic',
            quantity=2,
            unit='cloves',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='sake',
            quantity=2,
            unit='⅔ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='soy sauce',
            quantity=2,
            unit='⅔ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='mirin',
            quantity=1,
            unit='¼ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='sugar',
            quantity=1,
            unit='½ cup',
            physical_quality=None,
            color='',
        ),
        FoodEntity(
            food='water',
            quantity=2,
            unit='cups',
            physical_quality='or a little more as needed',
            color='',
        ),
    ],

Not too bad after a couple of hours of coding 🎉. Finally, to inspect the resulting prompt used by the program, each LM has an inspect_history method:


Which outputs the prompt below:

You are a food AI assistant. Your task is to extract food entities from a recipe.

Follow the following format.

Recipe: ${recipe}
Extract Food Entities: ${extract_food_entities}. Respond with a single JSON object. JSON Schema: {"$defs": {"FoodEntity": {"properties": {"food": {"description": "This can be both liquid and solid food such as meat, vegetables, alcohol, etc", "title": "Food", "type": "string"}, "quantity": {"description": "The exact quantity or amount of the food that should be used in the recipe", "title": "Quantity", "type": "integer"}, "unit": {"description": "The unit being used e.g. grams, milliliters, pounds, etc", "title": "Unit", "type": "string"}, "physical_quality": {"anyOf": [{"type": "string"}, {"type": "null"}], "description": "The characteristic of the ingredient", "title": "Physical Quality"}, "color": {"description": "The color of the food", "title": "Color", "type": "string"}}, "required": ["food", "quantity", "unit", "physical_quality", "color"], "title": "FoodEntity", "type": "object"}}, "properties": {"entities": {"items": {"$ref": "#/$defs/FoodEntity"}, "title": "Entities", "type": "array"}}, "required": ["entities"], "title": "FoodEntities", "type": "object"}

pork belly:
{
    "reasoning": "The recipe specifies using 2 lb of pork belly as the main ingredient for the chashu pork.",
    "value": "2 lb",
    "entity": "pork belly"
}
green onions:
{
    "reasoning": "The recipe calls for 2 green onions, or 3 if they are small, to be used in the cooking process.",
    "value": "2 or 3",
    "entity": "green onions"
}
fresh ginger:
{
    "reasoning": "A 1 inch chunk of fresh ginger is required, which will give around 4 - 6 slices for the recipe.",
    "value": "1 in",
    "entity": "fresh ginger"
}
garlic:
{
    "reasoning": "2 cloves of garlic are needed as part of the ingredients.",
    "value": "2 cloves",
    "entity": "garlic"
}
sake:
{
    "reasoning": "⅔ cup of sake is used in the cooking liquid for flavor.",
    "value": "⅔ cup",
    "entity": "sake"
}
soy sauce:
{
    "reasoning": "⅔ cup of soy sauce is added to the cooking liquid, contributing to the dish's savory taste.",
    "value": "⅔ cup",
    "entity": "soy sauce"
}
mirin:
{
    "reasoning": "¼ cup of mirin is included in the recipe for sweetness and depth of flavor.",
    "value": "¼ cup",
    "entity": "mirin"
}
sugar:
{
    "reasoning": "½ cup of sugar is used to sweeten the cooking liquid.",
    "value": "½ cup",
    "entity": "sugar"
}
water:
{
    "reasoning": "2 cups of water (or a little more as needed) are required to create the cooking liquid for the pork.",
    "value": "2 cups",
    "entity": "water"
}

Extract Food Entities: ```json
{
  "entities": [
    {
      "food": "pork belly",
      "quantity": 2,
      "unit": "lb",
      "physical_quality": null,
      "color": ""
    },
    {
      "food": "green onions",
      "quantity": 2,
      "unit": "items",
      "physical_quality": "or 3 if small",
      "color": ""
    },
    {
      "food": "fresh ginger",
      "quantity": 1,
      "unit": "inch",
      "physical_quality": "chunk",
      "color": ""
    },
    {
      "food": "garlic",
      "quantity": 2,
      "unit": "cloves",
      "physical_quality": null,
      "color": ""
    },
    {
      "food": "sake",
      "quantity": 2,
      "unit": "⅔ cup",
      "physical_quality": null,
      "color": ""
    },
    {
      "food": "soy sauce",
      "quantity": 2,
      "unit": "⅔ cup",
      "physical_quality": null,
      "color": ""
    },
    {
      "food": "mirin",
      "quantity": 1,
      "unit": "¼ cup",
      "physical_quality": null,
      "color": ""
    },
    {
      "food": "sugar",
      "quantity": 1,
      "unit": "½ cup",
      "physical_quality": null,
      "color": ""
    },
    {
      "food": "water",
      "quantity": 2,
      "unit": "cups",
      "physical_quality": "or a little more as needed",
      "color": ""
    }
  ]
}
```

Closing remarks

The aim of this post was for me to familiarize myself a bit more with DSPy, which seems to be all the ‘rage’ lately. This is only an initial attempt to get a first understanding of what you can and cannot do. Hopefully, this will give you some insights into how you can get started with DSPy as well.

To summarize my first impressions:

  1. Easy to use and get started with ✅
  2. Nice to not have to spend hours on prompt engineering ✅
  3. Nice to treat this as a “typical” ML problem using optimization ✅
  4. Still evolving and not “production ready” yet ❌
  5. Needs better logging / tracing to make it easier to understand what is happening when you are debugging your programs ❌

All in all, the approach of programming rather than prompting really resonates with me and is in line with the current trend of Compound AI systems. It will be exciting to follow how this package evolves and matures over time, as prompting in its current state is not likely the future of building scalable, non-fragile and resilient systems using LLMs.

  1. called teleprompters before. ↩︎

  2. DSPy seems to be relying on Pydantic for many things in the library. ↩︎

  3. For this you can use dspy.Example ↩︎

  4. BootstrapFewShot is recommended if you have a few samples. ↩︎

  5. A partial match might have been better to account for some randomness in the data extraction. ↩︎