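Output Refinement: BestOfN and Refine

Both BestOfN and Refine are DSPy modules designed to improve the reliability and quality of predictions by making multiple LM calls with different parameter settings. Both modules stop when they have made N attempts or when reward_fn returns a reward at or above threshold.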

BestOfN

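BestOfN is a module that runs the provided module multiple times (up to N) with different temperature settings. It returns the first prediction that passes the specified threshold or, if none does, the prediction with the highest reward.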
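The selection rule described above can be sketched in plain Python, without DSPy or an LM. This is a simplified model, not DSPy's implementation: `best_of_n_sketch`, `fake_module`, and `one_word` are hypothetical names, the reward function takes only the candidate (DSPy's takes `args` and `pred`), and treating a reward equal to the threshold as passing is an assumption consistent with the `threshold=1.0` examples in this document.

```python
from typing import Any, Callable, Optional

def best_of_n_sketch(
    run_once: Callable[[int], Any],
    n: int,
    reward_fn: Callable[[Any], float],
    threshold: float,
) -> Optional[Any]:
    """Return the first candidate whose reward reaches `threshold`, else the best one seen."""
    best_candidate, best_reward = None, float("-inf")
    for attempt in range(n):
        candidate = run_once(attempt)      # stand-in for one LM call with varied settings
        reward = reward_fn(candidate)
        if reward > best_reward:           # remember the highest-reward candidate so far
            best_candidate, best_reward = candidate, reward
        if reward >= threshold:            # assumed pass condition: stop early
            break
    return best_candidate

# Toy candidates standing in for three sampled answers.
answers = ["The capital is Brussels", "Brussels", "It is Brussels"]
calls = []

def fake_module(attempt: int) -> str:
    calls.append(attempt)
    return answers[attempt]

def one_word(answer: str) -> float:
    return 1.0 if len(answer.split()) == 1 else 0.0

picked = best_of_n_sketch(fake_module, n=3, reward_fn=one_word, threshold=1.0)
print(picked, len(calls))  # Brussels 2
```

The second candidate passes the threshold, so the third attempt is never made; if no candidate passed, the highest-scoring one would be returned instead.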

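Basic usage

Say we want the best chance of getting a one-word answer from the model. We can use BestOfN to try several temperature settings and return the best result.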

import dspy

def one_word_answer(args, pred: dspy.Prediction) -> float:
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

best_of_3 = dspy.BestOfN(
    module=dspy.ChainOfThought("question -> answer"), 
    N=3, 
    reward_fn=one_word_answer, 
    threshold=1.0
)

result = best_of_3(question="What is the capital of Belgium?")
print(result.answer)  # Brussels

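Error handling

By default, if the module encounters an error during an attempt, it keeps trying until it reaches N attempts. You can adjust this behavior with the fail_count parameter.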

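# `qa` is the question-answering module from the basic example.
qa = dspy.ChainOfThought("question -> answer")

best_of_3 = dspy.BestOfN(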
    module=qa, 
    N=3, 
    reward_fn=one_word_answer, 
    threshold=1.0,
    fail_count=1
)

best_of_3(question="What is the capital of Belgium?")
# raises an error after the first failure
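To make the effect of fail_count concrete, here is a small stand-alone simulation. It is only a sketch under stated assumptions: `run_attempts` and `flaky` are hypothetical, and the rule "re-raise once the number of failed attempts reaches fail_count, with a default budget of N" is an assumption drawn from the description above rather than from DSPy's source.

```python
from typing import Any, Callable, List, Optional

def run_attempts(
    attempt_fn: Callable[[int], Any], n: int, fail_count: Optional[int] = None
) -> List[Any]:
    """Collect results from up to `n` attempts, re-raising once `fail_count` attempts have failed."""
    allowed = n if fail_count is None else fail_count  # assumed default: tolerate up to n failures
    failures, results = 0, []
    for attempt in range(n):
        try:
            results.append(attempt_fn(attempt))
        except Exception:
            failures += 1
            if failures >= allowed:   # failure budget used up: surface the error
                raise
    return results

def flaky(attempt: int) -> str:
    # Hypothetical module that fails only on its first attempt.
    if attempt == 0:
        raise RuntimeError("transient failure")
    return f"answer-{attempt}"

# Default budget: the first failure is tolerated and later attempts still run.
print(run_attempts(flaky, n=3))

# fail_count=1: the first failure is re-raised immediately.
try:
    run_attempts(flaky, n=3, fail_count=1)
except RuntimeError as err:
    print("stopped early:", err)
```

A low fail_count suits cases where an error signals a real problem (bad credentials, malformed input) that retrying will not fix.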

Refine

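Refine extends the functionality of BestOfN by adding an automatic feedback loop. After each unsuccessful attempt (except the last one), it automatically generates detailed feedback about the module's performance and uses this feedback as hints for subsequent runs.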
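The feedback loop can be illustrated with a toy version that needs no LM. Everything here is a hypothetical stand-in: `refine_sketch`, `toy_module`, and `toy_feedback` are illustrative names, the feedback is a fixed string where Refine would use LM-generated text, and the `>=` threshold comparison is an assumption.

```python
from typing import Callable, List, Optional

def refine_sketch(
    module: Callable[[str, Optional[str]], str],
    question: str,
    n: int,
    reward_fn: Callable[[str], float],
    threshold: float,
    make_feedback: Callable[[str, float], str],
) -> str:
    """Retry `module`, passing feedback on the previous attempt as a hint to the next one."""
    best_answer, best_reward = "", float("-inf")
    hint: Optional[str] = None
    for attempt in range(n):
        answer = module(question, hint)
        reward = reward_fn(answer)
        if reward > best_reward:
            best_answer, best_reward = answer, reward
        if reward >= threshold:
            break
        if attempt < n - 1:   # no feedback after the last attempt: nothing would consume it
            hint = make_feedback(answer, reward)
    return best_answer

feedback_log: List[str] = []

def toy_module(question: str, hint: Optional[str]) -> str:
    # Hypothetical module: verbose by default, terse once it has received a hint.
    return "Brussels" if hint else "The capital of Belgium is Brussels"

def toy_feedback(answer: str, reward: float) -> str:
    # In Refine this text would come from an LM; a fixed template stands in for it here.
    message = f"'{answer}' scored {reward}; answer with a single word."
    feedback_log.append(message)
    return message

def one_word(answer: str) -> float:
    return 1.0 if len(answer.split()) == 1 else 0.0

print(refine_sketch(toy_module, "What is the capital of Belgium?", 3, one_word, 1.0, toy_feedback))
```

The first attempt fails the one-word check, a single piece of feedback is produced, and the second attempt uses it as a hint and passes.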

Basic usage

import dspy

def one_word_answer(args, pred: dspy.Prediction) -> float:
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

refine = dspy.Refine(
    module=dspy.ChainOfThought("question -> answer"), 
    N=3, 
    reward_fn=one_word_answer, 
    threshold=1.0
)

result = refine(question="What is the capital of Belgium?")
print(result.answer)  # Brussels

Error handling

Like BestOfN, Refine by default tries up to N times even if errors occur. You can control this behavior with the fail_count parameter.

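# `qa` is the question-answering module from the basic example.
qa = dspy.ChainOfThought("question -> answer")

# Stop after the first error
refine = dspy.Refine(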
    module=qa, 
    N=3, 
    reward_fn=one_word_answer, 
    threshold=1.0,
    fail_count=1
)

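Comparison: BestOfN vs. Refine

Both modules serve a similar purpose but differ in approach.

BestOfN simply tries different temperature settings and selects the best prediction as defined by reward_fn.
Refine adds a feedback loop: it uses the LM to generate detailed feedback about the module's own performance, based on the previous prediction and the code in reward_fn. This feedback is then used as hints for subsequent runs.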

Practical examples

Ensuring factual accuracy

import dspy

class FactualityJudge(dspy.Signature):
    """Determine if a statement is factually accurate."""
    statement: str = dspy.InputField()
    is_factual: bool = dspy.OutputField()

factuality_judge = dspy.ChainOfThought(FactualityJudge)

def factuality_reward(args, pred: dspy.Prediction) -> float:
    statement = pred.answer
    # DSPy modules take their inputs as keyword arguments matching the signature fields.
    result = factuality_judge(statement=statement)
    return 1.0 if result.is_factual else 0.0

refined_qa = dspy.Refine(
    module=dspy.ChainOfThought("question -> answer"),
    N=3,
    reward_fn=factuality_reward,
    threshold=1.0
)

result = refined_qa(question="Tell me about Belgium's capital city.")
print(result.answer)

Summarization - controlling response length

import dspy

def ideal_length_reward(args, pred: dspy.Prediction) -> float:
    """
    Reward the summary for being close to 75 words with a tapering off for longer summaries.
    """
    word_count = len(pred.summary.split())
    distance = abs(word_count - 75)
    return max(0.0, 1.0 - (distance / 125))

optimized_summarizer = dspy.BestOfN(
    module=dspy.ChainOfThought("text -> summary"),
    N=50,
    reward_fn=ideal_length_reward,
    threshold=0.9
)

result = optimized_summarizer(
    text="[Long text to summarize...]"
)
print(result.summary)
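To see what threshold=0.9 means in words, the reward formula can be evaluated directly on word counts. `length_reward` is a hypothetical helper that repeats the arithmetic of ideal_length_reward on an integer, and the check assumes a reward equal to the threshold counts as passing.

```python
def length_reward(word_count: int) -> float:
    # Same formula as ideal_length_reward, applied to a word count directly.
    distance = abs(word_count - 75)
    return max(0.0, 1.0 - (distance / 125))

# Word counts whose reward reaches the 0.9 threshold.
passing = [w for w in range(0, 301) if length_reward(w) >= 0.9]
print(min(passing), max(passing))  # 63 87

print(length_reward(75))   # 1.0
print(length_reward(200))  # 0.0
```

Under these assumptions, any summary of 63 to 87 words stops the search early, and the reward falls to zero at 200 words or more.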

Migrating from `dspy.Suggest` and `dspy.Assert`

As of DSPy 2.6, BestOfN and Refine are the replacements for dspy.Suggest and dspy.Assert.
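As a rough illustration of what a migration might look like, a condition that older code passed to dspy.Suggest can be rewritten as a reward function. The rule used here (an answer of at most 100 characters) and the name `short_answer_reward` are hypothetical, and SimpleNamespace stands in for a dspy.Prediction so the snippet runs without an LM.

```python
from types import SimpleNamespace

# Older style, shown only as a comment for comparison:
#     dspy.Suggest(len(pred.answer) <= 100, "Keep the answer under 100 characters.")

def short_answer_reward(args, pred) -> float:
    """Reward-function form of the same condition: 1.0 when satisfied, 0.0 otherwise."""
    return 1.0 if len(pred.answer) <= 100 else 0.0

# With DSPy configured, this function would be passed as reward_fn to
# dspy.Refine or dspy.BestOfN together with threshold=1.0.

print(short_answer_reward({}, SimpleNamespace(answer="Brussels")))  # 1.0
print(short_answer_reward({}, SimpleNamespace(answer="x" * 101)))   # 0.0
```

Which module to use is a judgment call: Refine is the closer fit when the old message was meant to steer a retry, since it feeds feedback into later attempts, while BestOfN only resamples.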
