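Signatures

When we assign a task to a language model (LM) in DSPy, we specify the behavior we need as a Signature.

A signature is a declarative specification of the input/output behavior of a DSPy module. Signatures let you tell the LM what it needs to do, rather than specifying how to ask the LM to do it.

You are probably familiar with function signatures, which specify input and output arguments and their types. DSPy signatures are similar, with a couple of differences. A typical function signature only describes things, whereas a DSPy signature declares and initializes the behavior of a module. Field names also matter in DSPy signatures. You express semantic roles in plain English: a question is different from an answer, and a sql_query is different from python_code.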

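Why should I use a DSPy Signature?

For modular, clean code in which LM calls can be optimized into high-quality prompts (or automatic finetunes). Most people coerce LMs into doing tasks by writing long, brittle prompts, or by collecting/generating data for finetuning. Writing signatures is far more modular, adaptive, and reproducible than hand-editing prompts or finetunes. Given your signature, your data, and your pipeline, the DSPy compiler works out how to build a highly optimized prompt for your LM (or how to finetune your small LM). In many cases, we found that compiling produces better prompts than humans write. This is not because DSPy optimizers are more creative than humans, but simply because they can try more possibilities and tune against the metric directly.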

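Inline DSPy Signatures

Signatures can be defined as a short string, with argument names and optional types that define the semantic roles of the inputs/outputs.

  1. Question Answering: "question -> answer", which is equivalent to "question: str -> answer: str" because the default type is always str

  2. Sentiment Classification: "sentence -> sentiment: bool", e.g. True if the sentiment is positive

  3. Summarization: "document -> summary"

Your signatures can also have multiple input/output fields with types:

  1. Retrieval-Augmented Question Answering: "context: list[str], question: str -> answer: str"

  2. Multiple-Choice Question Answering with Reasoning: "question, choices: list[str] -> reasoning: str, selection: int"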

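Tip: For fields, any valid variable name works! Field names should be semantically meaningful, but start simple and don't prematurely optimize keywords! Leave that kind of tuning to the DSPy compiler. For example, for summarization, "document -> summary", "text -> gist", or "long_context -> tldr" are probably all fine.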
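To make the string format above concrete, here is a minimal sketch of how an inline signature breaks down into named, typed fields. It uses only the standard library and is not DSPy's actual parser; parse_inline_signature and its helpers are hypothetical names introduced for illustration.

```python
# Illustrative only: a toy parser for inline signature strings.
# This is NOT DSPy's implementation; all names here are hypothetical.

def _split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def _parse_fields(text: str) -> dict[str, str]:
    """Map each field name to its type string, defaulting to 'str'."""
    fields = {}
    for item in _split_top_level(text):
        name, _, type_name = item.partition(":")
        name = name.strip()
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {name!r}")
        fields[name] = type_name.strip() or "str"
    return fields


def parse_inline_signature(spec: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (input_fields, output_fields) for a string like 'a, b: T -> c'."""
    inputs, arrow, outputs = spec.partition("->")
    if not arrow:
        raise ValueError("an inline signature needs '->' between inputs and outputs")
    return _parse_fields(inputs), _parse_fields(outputs)


inputs, outputs = parse_inline_signature(
    "question, choices: list[str] -> reasoning: str, selection: int"
)
print(inputs)
print(outputs)
```

The sketch keeps type annotations as plain strings; turning them into real Python types is a separate step that this example leaves out.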

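You can also add instructions to an inline signature, and these instructions can use variables at runtime. Use the instructions keyword argument to add instructions to your signature.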

import dspy

toxicity = dspy.Predict(
    dspy.Signature(
        "comment -> toxic: bool",
        instructions="Mark as 'toxic' if the comment includes insults, harassment, or sarcastic derogatory remarks.",
    )
)

Example A: Sentiment Classification

sentence = "it's a charming and often affecting journey."  # example from the SST-2 dataset.

classify = dspy.Predict('sentence -> sentiment: bool')  # we'll see an example with Literal[] later
classify(sentence=sentence).sentiment
Output:
True

Example B: Summarization

# Example from the XSum dataset.
document = """The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."""

summarize = dspy.ChainOfThought('document -> summary')
response = summarize(document=document)

print(response.summary)
Possible Output:
The 21-year-old Lee made seven appearances and scored one goal for West Ham last season. He had loan spells in League One with Blackpool and Colchester United, scoring twice for the latter. He has now signed a contract with Barnsley, but the length of the contract has not been revealed.

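Many DSPy modules (other than dspy.Predict) return auxiliary information by expanding your signature under the hood.

For example, dspy.ChainOfThought also adds a reasoning field that contains the LM's reasoning before it generates the output summary.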

print("Reasoning:", response.reasoning)
Possible Output:
Reasoning: We need to highlight Lee's performance for West Ham, his loan spells in League One, and his new contract with Barnsley. We also need to mention that his contract length has not been disclosed.

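Class-based DSPy Signatures

For some advanced tasks, you need more verbose signatures. This is typically to:

  1. Clarify something about the nature of the task (expressed below as a docstring).

  2. Supply hints about the nature of an input field, expressed as the desc keyword argument of dspy.InputField.

  3. Supply constraints on an output field, expressed as the desc keyword argument of dspy.OutputField.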

Example C: Classification

from typing import Literal

class Emotion(dspy.Signature):
    """Classify emotion."""

    sentence: str = dspy.InputField()
    sentiment: Literal['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'] = dspy.OutputField()

sentence = "i started feeling a little vulnerable when the giant spotlight started blinding me"  # from dair-ai/emotion

classify = dspy.Predict(Emotion)
classify(sentence=sentence)
Possible Output:
Prediction(
    sentiment='fear'
)
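
The Literal annotation on the sentiment field above restricts the output to a fixed set of labels. The following standard-library sketch shows what that constraint means in plain Python. It is not how DSPy validates outputs internally; EmotionLabel and check_label are hypothetical names used only for illustration.

```python
# Illustrative only: checking a value against a Literal type by hand.
# DSPy's own output validation may work differently.
from typing import Literal, get_args

EmotionLabel = Literal["sadness", "joy", "love", "anger", "fear", "surprise"]


def check_label(value: str) -> str:
    """Return value unchanged if it is one of the allowed labels, else raise."""
    allowed = get_args(EmotionLabel)  # tuple of the literal options
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value


print(check_label("fear"))
```

A free-form str output would accept any text, so a label such as "scared" would pass through unchecked; with a Literal type, only the listed labels are valid.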

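Tip: There is nothing wrong with specifying your request to the LM more clearly, and class-based signatures help you do that. However, don't prematurely tune the keywords of your signature by hand. The DSPy optimizers will likely do a better job (and the result will transfer better across LMs).

Example D: A metric that evaluates faithfulness to citations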

class CheckCitationFaithfulness(dspy.Signature):
    """Verify that the text is based on the provided context."""

    context: str = dspy.InputField(desc="facts here are assumed to be true")
    text: str = dspy.InputField()
    faithfulness: bool = dspy.OutputField()
    evidence: dict[str, list[str]] = dspy.OutputField(desc="Supporting evidence for claims")

context = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."

text = "Lee scored 3 goals for Colchester United."

faithfulness = dspy.ChainOfThought(CheckCitationFaithfulness)
faithfulness(context=context, text=text)
Possible Output:
Prediction(
    reasoning="Let's check the claims against the context. The text states Lee scored 3 goals for Colchester United, but the context clearly states 'He scored twice for the U's'. This is a direct contradiction.",
    faithfulness=False,
    evidence={'goal_count': ["scored twice for the U's"]}
)

Example E: Multi-modal image classification

class DogPictureSignature(dspy.Signature):
    """Output the dog breed of the dog in the image."""
    image_1: dspy.Image = dspy.InputField(desc="An image of a dog")
    answer: str = dspy.OutputField(desc="The dog breed of the dog in the image")

image_url = "https://picsum.photos/id/237/200/300"
classify = dspy.Predict(DogPictureSignature)
classify(image_1=dspy.Image.from_url(image_url))

Possible Output:

Prediction(
    answer='Labrador Retriever'
)

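Type Resolution in Signatures

DSPy signatures support various annotation types:

  1. Basic types such as str, int, bool
  2. Typing module types such as List[str], Dict[str, int], Optional[float], Union[str, int]
  3. Custom types defined in your code
  4. Dot notation for nested types, given proper configuration
  5. Special data types such as dspy.Image, dspy.History

Using Custom Types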

import dspy
import pydantic

# Simple custom type
class QueryResult(pydantic.BaseModel):
    text: str
    score: float

signature = dspy.Signature("query: str -> result: QueryResult")

# Nested custom types, referenced with dot notation
class Container:
    class Query(pydantic.BaseModel):
        text: str
    class Score(pydantic.BaseModel):
        score: float

signature = dspy.Signature("query: Container.Query -> score: Container.Score")
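
To illustrate what dot notation involves, here is a minimal standard-library sketch of resolving a dotted name such as Container.Query against a namespace of known types. It is an assumption about the general idea, not DSPy's actual resolution logic; resolve_dotted is a hypothetical helper, and plain classes stand in for the pydantic models.

```python
# Illustrative only: resolving a dotted type name by walking attributes.
# DSPy's real type resolution may differ; resolve_dotted is hypothetical.
from functools import reduce


class Container:
    class Query:
        text: str

    class Score:
        score: float


def resolve_dotted(name: str, namespace: dict) -> type:
    """Look up 'Outer.Inner' by finding 'Outer' in namespace, then getattr."""
    head, *rest = name.split(".")
    if head not in namespace:
        raise NameError(f"unknown type: {head!r}")
    return reduce(getattr, rest, namespace[head])


known_types = {"Container": Container, "int": int}
print(resolve_dotted("Container.Query", known_types))
```

The lookup only succeeds when the outer name is present in the namespace, which is one way to read the "proper configuration" requirement for nested types.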

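Using signatures to build modules and compiling them

While signatures are convenient for prototyping with structured inputs/outputs, that is not the only reason to use them!

You should compose multiple signatures into larger DSPy modules and compile those modules into optimized prompts and finetunes.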