Tutorial: Retrieval-Augmented Generation (RAG)

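Let's walk through a quick example of basic question answering in DSPy, with and without retrieval-augmented generation (RAG). Specifically, let's build a system for answering Tech questions, e.g. about Linux or iPhone apps.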

Install the latest DSPy via pip install -U dspy and follow along. If you're looking instead for a conceptual overview of DSPy, this recent lecture is a good place to start.

Configuring the DSPy environment.

Let's tell DSPy that we will use OpenAI's gpt-4o-mini in our modules. To authenticate, DSPy looks for your OPENAI_API_KEY. You can easily swap this out for other providers or local models.

Recommended: Set up MLflow Tracing to understand what's happening under the hood.

MLflow DSPy Integration

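MLflow is an LLMOps tool that natively integrates with DSPy and offers explainability and experiment tracking. In this tutorial, you can use MLflow to visualize prompts and optimization progress as traces to better understand DSPy's behavior. You can set up MLflow easily by following the four steps below.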


Install MLflow

%pip install "mlflow>=2.20"

Start the MLflow UI in a separate terminal

mlflow ui --port 5000

Connect the notebook to MLflow

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("DSPy")

Enable tracing.

mlflow.dspy.autolog()

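Once you have completed the steps above, you can see traces for each program execution in the notebook. They provide good visibility into the model's behavior and help you understand DSPy's concepts better throughout the tutorial.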

To learn more about the integration, visit the MLflow DSPy documentation.


import dspy

lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

Exploring some basic DSPy Modules.

You can always prompt the LM directly via lm(prompt="prompt") or lm(messages=[...]). However, DSPy gives you Modules as a better way to define your LM functions.

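The simplest module is dspy.Predict. It takes a DSPy Signature, i.e. a structured input/output schema, and gives you back a callable function for the behavior you specified. Let's use the "in-line" notation for signatures to declare a module that takes a question (of type str) as input and produces a response as output.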


qa = dspy.Predict('question: str -> response: str')
response = qa(question="what are high memory and low memory on linux?")

print(response.response)

In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.

- **Low Memory**: This typically refers to the memory that is directly accessible by the kernel. In a 32-bit system, this is usually the first 896 MB of RAM (from 0 to 896 MB). The kernel can directly map this memory, making it faster for the kernel to access and manage. Low memory is used for kernel data structures and for user processes that require direct access to memory.

- **High Memory**: This refers to the memory above the low memory limit, which is not directly accessible by the kernel in a 32-bit system. This area is typically above 896 MB. The kernel cannot directly access this memory without using special mechanisms, such as mapping it into the kernel's address space when needed. High memory is used for user processes that require more memory than what is available in low memory.

In summary, low memory is directly accessible by the kernel, while high memory requires additional steps for the kernel to access it, especially in 32-bit systems. In 64-bit systems, this distinction is less significant as the kernel can address a much larger memory space directly.

Notice how the variable names we specified in the signature defined our input and output argument names and their roles.
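To make the role of those field names concrete, here is a minimal sketch of how an inline signature string could be split into named inputs and outputs. The helper parse_inline_signature is hypothetical and written only for illustration; it is not DSPy's own parser, and it assumes simple type names that contain no commas.

```python
# A simplified, hypothetical parser for inline signatures. DSPy's own parsing
# is more general; this sketch only handles simple type names without commas.
def parse_inline_signature(signature: str):
    """Split 'a: str, b -> c: str' into input and output (name, type) pairs."""
    inputs_part, outputs_part = signature.split("->")

    def parse_fields(part: str):
        fields = []
        for item in part.split(","):
            name, _, type_name = item.partition(":")
            # Fields without an explicit type fall back to str.
            fields.append((name.strip(), type_name.strip() or "str"))
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)


inputs, outputs = parse_inline_signature("question: str -> response: str")
print(inputs)   # [('question', 'str')]
print(outputs)  # [('response', 'str')]
```

The names on the left of the arrow become keyword arguments of the call, and the names on the right become attributes of the returned prediction.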

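Now, what did DSPy do to build this qa module? Nothing fancy in this example, yet. The module passed your signature, LM, and inputs to an Adapter, which is a layer that handles structuring the inputs and parsing structured outputs to fit your signature.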

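Let's see it directly. You can easily inspect the last n prompts sent by DSPy. Alternatively, if you enabled MLflow Tracing above, you can see the full LLM interactions for each program execution in a tree view.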


dspy.inspect_history(n=1)

[2024-11-23T23:16:35.966534]

System message:

Your input fields are:
1. `question` (str)

Your output fields are:
1. `response` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
what are high memory and low memory on linux?

Respond with the corresponding output fields, starting with the field `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## response ## ]]
In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.

- **Low Memory**: This typically refers to the memory that is directly accessible by the kernel. In a 32-bit system, this is usually the first 896 MB of RAM (from 0 to 896 MB). The kernel can directly map this memory, making it faster for the kernel to access and manage. Low memory is used for kernel data structures and for user processes that require direct access to memory.

- **High Memory**: This refers to the memory above the low memory limit, which is not directly accessible by the kernel in a 32-bit system. This area is typically above 896 MB. The kernel cannot directly access this memory without using special mechanisms, such as mapping it into the kernel's address space when needed. High memory is used for user processes that require more memory than what is available in low memory.

In summary, low memory is directly accessible by the kernel, while high memory requires additional steps for the kernel to access it, especially in 32-bit systems. In 64-bit systems, this distinction is less significant as the kernel can address a much larger memory space directly.

[[ ## completed ## ]]
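The bracketed field markers in the trace above are what make the completion machine-readable. As a rough illustration, the sketch below splits such a completion back into named fields. The parse_fields helper and the sample text are made up for this example; DSPy's actual adapter should be assumed to handle many more cases than this.

```python
import re

# Matches headers such as "[[ ## response ## ]]" and captures the field name.
FIELD_HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def parse_fields(completion: str) -> dict:
    """Split a completion into {field_name: text}, dropping the `completed` marker."""
    parts = FIELD_HEADER.split(completion)
    # parts = [text_before_first_header, name1, body1, name2, body2, ...]
    fields = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        if name != "completed":
            fields[name] = body.strip()
    return fields


# Illustrative completion text, not real model output.
raw = """[[ ## reasoning ## ]]
Brace placement depends on the style guide.

[[ ## response ## ]]
Pick one style and stay consistent.

[[ ## completed ## ]]"""

print(parse_fields(raw))
```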

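DSPy has various built-in modules, e.g. dspy.ChainOfThought, dspy.ProgramOfThought, and dspy.ReAct. These are interchangeable with the basic dspy.Predict: they take your task-specific signature and apply general-purpose prompting techniques and inference-time strategies to it.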

For example, dspy.ChainOfThought is an easy way to elicit reasoning from your LM before it commits to the outputs requested in your signature.

In the example below, we'll omit the str types (since the default type is string). Feel free to experiment with other fields and types, e.g. try topics: list[str] or is_realistic: bool.


cot = dspy.ChainOfThought('question -> response')
cot(question="should curly braces appear on their own line?")

Out[4]

Prediction(
    reasoning='The placement of curly braces on their own line depends on the coding style and conventions being followed. In some programming languages and style guides, such as the Allman style, curly braces are placed on their own line to enhance readability. In contrast, other styles, like K&R style, place the opening brace on the same line as the control statement. Ultimately, it is a matter of personal or team preference, and consistency within a project is key.',
    response='Curly braces can appear on their own line depending on the coding style you are following. If you prefer a style that enhances readability, such as the Allman style, then yes, they should be on their own line. However, if you are following a different style, like K&R, they may not need to be. Consistency is important, so choose a style and stick with it.'
)

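Interestingly, asking for reasoning can make the output response shorter in this case. Is this a good thing or a bad thing? It depends on what you need: there's no free lunch, but DSPy gives you the tools to experiment with different strategies extremely quickly.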

By the way, dspy.ChainOfThought is itself implemented in DSPy, using dspy.Predict. If you're curious, this is a good place to call dspy.inspect_history.

Using DSPy well involves evaluation and iterative development.

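You already know a lot about DSPy at this point. If all you want is quick scripting, this much of DSPy already enables a lot. Sprinkling DSPy signatures and modules into your Python control flow is a pretty ergonomic way to get things done with LMs.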

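That said, you're likely here because you want to build a high-quality system and improve it over time. The way to do that in DSPy is to iterate fast by evaluating the quality of your system and using DSPy's powerful tools, e.g. Optimizers.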

Manipulating Examples in DSPy.

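To measure the quality of your DSPy system, you need (1) a set of input values, such as questions, and (2) a metric that can score the quality of an output from your system. Metrics vary widely. Some metrics need ground-truth labels of ideal outputs, e.g. for classification or question answering. Other metrics are self-supervised, e.g. checking faithfulness or lack of hallucination, perhaps using a DSPy program as a judge of these qualities.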

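Let's load a dataset of questions and their (pretty long) gold answers. Since we started this notebook with the goal of building a system for answering Tech questions, we obtained a set of StackExchange-based questions and their correct answers from the RAG-QA Arena dataset.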


import ujson
from dspy.utils import download

# Download question--answer pairs from the RAG-QA Arena "Tech" dataset.
download("https://hugging-face.cn/dspy/cache/resolve/main/ragqa_arena_tech_examples.jsonl")

with open("ragqa_arena_tech_examples.jsonl") as f:
    data = [ujson.loads(line) for line in f]


# Inspect one datapoint.
data[0]

Out[6]

{'question': 'why igp is used in mpls?',
 'response': "An IGP exchanges routing prefixes between gateways/routers.  \nWithout a routing protocol, you'd have to configure each route on every router and you'd have no dynamic updates when routes change because of link failures. \nFuthermore, within an MPLS network, an IGP is vital for advertising the internal topology and ensuring connectivity for MP-BGP inside the network.",
 'gold_doc_ids': [2822, 2823]}

Given a simple dict like this, let's create a list of dspy.Example objects, which is the datatype that carries training (or test) datapoints in DSPy.

When you build a dspy.Example, you should generally specify .with_inputs("field1", "field2", ...) to indicate which fields are inputs. The other fields are treated as labels or metadata.


data = [dspy.Example(**d).with_inputs('question') for d in data]

# Let's pick an `example` here from the data.
example = data[2]
example

Out[7]

Example({'question': 'why are my text messages coming up as maybe?', 'response': 'This is part of the Proactivity features new with iOS 9: It looks at info in emails to see if anyone with this number sent you an email and if it finds the phone number associated with a contact from your email, it will show you "Maybe". \n\nHowever, it has been suggested there is a bug in iOS 11.2 that can result in "Maybe" being displayed even when "Find Contacts in Other Apps" is disabled.', 'gold_doc_ids': [3956, 3957, 8034]}) (input_keys={'question'})
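To illustrate what marking input fields accomplishes, here is a stripped-down stand-in that separates inputs from labels. MiniExample is a hypothetical class written for this sketch only; the real dspy.Example has a richer interface, and its accessors return Example objects rather than the plain dicts used here.

```python
class MiniExample:
    """A hypothetical, stripped-down stand-in for dspy.Example."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = set()

    def with_inputs(self, *keys):
        # Return a copy so that the original example is left unchanged.
        copy = MiniExample(**self._fields)
        copy._input_keys = set(keys)
        return copy

    def inputs(self):
        # Fields the program is allowed to see.
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        # Everything else: gold answers and metadata used only for scoring.
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}


ex = MiniExample(
    question="why igp is used in mpls?",
    response="(gold answer text)",
    gold_doc_ids=[2822, 2823],
).with_inputs("question")

print(ex.inputs())  # {'question': 'why igp is used in mpls?'}
print(ex.labels())  # {'response': '(gold answer text)', 'gold_doc_ids': [2822, 2823]}
```

Keeping the two groups apart is what lets a program be called with only its inputs while a metric still has access to the gold labels.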

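Now, let's divide the data into:

Training (and with it Validation) set:
- These are the splits you typically give to DSPy optimizers.
- Optimizers typically learn directly from the training examples and check their progress using the validation examples.
- It's good to have 30-300 examples each for training and validation.
- For prompt optimizers in particular, it's often better to pass more validation than training examples.
- Below, we'll use 200 in total. MIPROv2 will split them into 20% training and 80% validation if you don't pass a valset.

Development and Test sets: the rest, typically on the order of 30-1000 examples, can be used for:
- development (i.e., you can inspect them as you iterate on your system) and
- testing (final held-out evaluation).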


import random

random.Random(0).shuffle(data)
trainset, devset, testset = data[:200], data[200:500], data[500:1000]

len(trainset), len(devset), len(testset)

Out[8]

(200, 300, 500)
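If you would rather control the training/validation proportion yourself than rely on an optimizer's default, a split along the lines of the 20%/80% ratio mentioned above could be sketched as follows. The split_train_val helper is hypothetical and does not reproduce MIPROv2's internal logic; it only shows the proportions.

```python
import random


def split_train_val(examples, train_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and split it into (train, val) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]


# Integers stand in for the 200 training examples.
train, val = split_train_val(range(200))
print(len(train), len(val))  # 40 160
```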

Evaluation in DSPy.

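What kind of metric suits our question-answering task? There are many choices, but since the answers are long, we may ask: How well does the system response cover all key facts in the gold response? And conversely, how well does the system response avoid saying things that aren't in the gold response?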

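That metric is essentially a semantic F1, so let's load a SemanticF1 metric from DSPy. This metric is actually implemented as a very simple DSPy module that uses whatever LM we're working with.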


from dspy.evaluate import SemanticF1

# Instantiate the metric.
metric = SemanticF1(decompositional=True)

# Produce a prediction from our `cot` module, using the `example` above as input.
pred = cot(**example.inputs())

# Compute the metric score for the prediction.
score = metric(example, pred)

print(f"Question: \t {example.question}\n")
print(f"Gold Response: \t {example.response}\n")
print(f"Predicted Response: \t {pred.response}\n")
print(f"Semantic F1 Score: {score:.2f}")

Question: 	 why are my text messages coming up as maybe?

Gold Response: 	 This is part of the Proactivity features new with iOS 9: It looks at info in emails to see if anyone with this number sent you an email and if it finds the phone number associated with a contact from your email, it will show you "Maybe". 

However, it has been suggested there is a bug in iOS 11.2 that can result in "Maybe" being displayed even when "Find Contacts in Other Apps" is disabled.

Predicted Response: 	 Your text messages are showing up as "maybe" because your messaging app is uncertain about the sender's identity. This typically occurs when the sender's number is not saved in your contacts or if the message is from an unknown number. To resolve this, you can save the contact in your address book or check the message settings in your app.

Semantic F1 Score: 0.33

The final DSPy module call above actually happens inside the metric. You might be curious how it measured the semantic F1 for this example.


dspy.inspect_history(n=1)

[2024-11-23T23:16:36.149518]

System message:

Your input fields are:
1. `question` (str)
2. `ground_truth` (str)
3. `system_response` (str)

Your output fields are:
1. `reasoning` (str)
2. `ground_truth_key_ideas` (str): enumeration of key ideas in the ground truth
3. `system_response_key_ideas` (str): enumeration of key ideas in the system response
4. `discussion` (str): discussion of the overlap between ground truth and system response
5. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response
6. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## ground_truth ## ]]
{ground_truth}

[[ ## system_response ## ]]
{system_response}

[[ ## reasoning ## ]]
{reasoning}

[[ ## ground_truth_key_ideas ## ]]
{ground_truth_key_ideas}

[[ ## system_response_key_ideas ## ]]
{system_response_key_ideas}

[[ ## discussion ## ]]
{discussion}

[[ ## recall ## ]]
{recall}        # note: the value you produce must be a single float value

[[ ## precision ## ]]
{precision}        # note: the value you produce must be a single float value

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Compare a system's response to the ground truth to compute recall and precision of key ideas.
        You will first enumerate key ideas in each response, discuss their overlap, and then report recall and precision.


User message:

[[ ## question ## ]]
why are my text messages coming up as maybe?

[[ ## ground_truth ## ]]
This is part of the Proactivity features new with iOS 9: It looks at info in emails to see if anyone with this number sent you an email and if it finds the phone number associated with a contact from your email, it will show you "Maybe". 

However, it has been suggested there is a bug in iOS 11.2 that can result in "Maybe" being displayed even when "Find Contacts in Other Apps" is disabled.

[[ ## system_response ## ]]
Your text messages are showing up as "maybe" because your messaging app is uncertain about the sender's identity. This typically occurs when the sender's number is not saved in your contacts or if the message is from an unknown number. To resolve this, you can save the contact in your address book or check the message settings in your app.

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## ground_truth_key_ideas ## ]]`, then `[[ ## system_response_key_ideas ## ]]`, then `[[ ## discussion ## ]]`, then `[[ ## recall ## ]]` (must be formatted as a valid Python float), then `[[ ## precision ## ]]` (must be formatted as a valid Python float), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
The system response provides a general explanation for why text messages may show up as "maybe," focusing on the uncertainty of the sender's identity. However, it does not mention the specific iOS features or the potential bug in iOS 11.2 that are highlighted in the ground truth. The ground truth explains that this feature is part of iOS 9's Proactivity features and mentions a specific bug in iOS 11.2, which is not addressed in the system response.

[[ ## ground_truth_key_ideas ## ]]
1. Proactivity features introduced in iOS 9.
2. The system checks emails to associate phone numbers with contacts.
3. "Maybe" is displayed if the number is not saved in contacts.
4. Mention of a bug in iOS 11.2 causing "Maybe" to appear incorrectly.

[[ ## system_response_key_ideas ## ]]
1. Text messages show up as "maybe" due to uncertainty about the sender's identity.
2. Occurs when the sender's number is not saved in contacts or is from an unknown number.
3. Suggests saving the contact or checking message settings.

[[ ## discussion ## ]]
There is some overlap between the ground truth and the system response regarding the uncertainty of the sender's identity and the suggestion to save the contact. However, the system response lacks specific details about the iOS features and the bug mentioned in the ground truth. The ground truth provides a more comprehensive explanation of the "maybe" feature, while the system response is more general and does not address the iOS version specifics.

[[ ## recall ## ]]
0.25

[[ ## precision ## ]]
0.5

[[ ## completed ## ]]
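The recall and precision reported in the trace above can be combined into a single F1 value. Assuming the metric uses the standard harmonic mean (an assumption, although it is consistent with the 0.33 score printed earlier), the arithmetic is:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the judge's output for this example.
print(f"{f1_score(precision=0.5, recall=0.25):.2f}")  # 0.33
```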

For evaluation, you could use the metric above in a simple loop and just average the scores. But for nice parallelism and utilities, we can rely on dspy.Evaluate.


# Define an evaluator that we can re-use.
evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=24,
                         display_progress=True, display_table=2)

# Evaluate the Chain-of-Thought program.
evaluate(cot)

Average Metric: 125.68 / 300 (41.9%): 100%|██████████| 300/300 [00:00<00:00, 666.96it/s]

2024/11/23 23:16:36 INFO dspy.evaluate.evaluate: Average Metric: 125.68228336477591 / 300 (41.9%)

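	question	example_response	gold_doc_ids	reasoning	pred_response	SemanticF1
0	when to use c over c++, and c++ over c?	If you are equally familiar with both C++ and C, it's advisable to...	[733]	C and C++ are both powerful programming languages, but they serve...	Use C when you need low-level access to memory, require high performance...
1	should images be stored in a git repository?	One viewpoint expresses that there is no significant downside, especially...	[6253, 6254, 6275, 6278, 8215]	Storing images in a Git repository can be beneficial for version control...	Images can be stored in a Git repository, but it's important to...	✔️ [0.444]

... 298 more rows not shown ...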

Out[11]

41.89
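For comparison, the "simple loop" alternative mentioned before the evaluator could look roughly like the sketch below. Everything here is a stand-in: average_metric is a hypothetical helper, and the toy program and exact-match metric exist only so the snippet runs without an LM. dspy.Evaluate adds progress display, result tables, and other utilities on top of this basic idea.

```python
from concurrent.futures import ThreadPoolExecutor


def average_metric(program, examples, metric, num_threads=4):
    """Run `program` on every example and return the mean metric as a percentage."""

    def score_one(example):
        prediction = program(example["question"])
        return metric(example, prediction)

    # Threads help when each call spends most of its time waiting on a remote LM.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score_one, examples))
    return 100 * sum(scores) / len(scores)


# Toy stand-ins: the "program" upper-cases the question and the "metric"
# is exact match against the stored label.
toy_devset = [
    {"question": "a", "response": "A"},
    {"question": "b", "response": "B"},
    {"question": "c", "response": "x"},
    {"question": "d", "response": "D"},
]
toy_program = str.upper
toy_metric = lambda example, prediction: float(prediction == example["response"])

print(average_metric(toy_program, toy_devset, toy_metric))  # 75.0
```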

Tracking Evaluation Results in an MLflow Experiment

To track and visualize the evaluation results over time, you can record the results in an MLflow Experiment.

import mlflow

with mlflow.start_run(run_name="rag_evaluation"):
    evaluate = dspy.Evaluate(
        devset=devset,
        metric=metric,
        num_threads=24,
        display_progress=True,
        # To record the outputs and detailed scores to MLflow
        return_all_scores=True,
        return_outputs=True,
    )

    # Evaluate the program as usual
    aggregated_score, outputs, all_scores = evaluate(cot)


    # Log the aggregated score
    mlflow.log_metric("semantic_f1_score", aggregated_score)
    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "Question": [example.question for example in devset],
            "Gold Response": [example.response for example in devset],
            "Predicted Response": outputs,
            "Semantic F1 Score": all_scores,
        },
        artifact_file="eval_results.json",
    )

To learn more about the integration, visit the MLflow DSPy documentation.

So far, we built a very simple chain-of-thought module for question answering and evaluated it on a small dataset.

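Can we do better? In the rest of this guide, we will build a retrieval-augmented generation (RAG) program in DSPy for the same task. We'll see how this can boost the score substantially, then we'll use one of the DSPy Optimizers to compile our RAG program into higher-quality prompts, raising our scores even more.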

Basic Retrieval-Augmented Generation (RAG).

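First, let's download the corpus data that we will use for RAG search. An older version of this tutorial used the full corpus (650,000 documents). To make this fast and cheap to run, we've downsampled the corpus to just 28,000 documents.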


download("https://hugging-face.cn/dspy/cache/resolve/main/ragqa_arena_tech_corpus.jsonl")

Set up your system's retriever.

As far as DSPy is concerned, you can plug in any Python code for calling tools or retrievers. Here, for convenience, we'll just use OpenAI Embeddings and do top-K search locally.

Note: The step below requires that you either run pip install -U faiss-cpu or pass brute_force_threshold=30_000 to dspy.retrievers.Embeddings to avoid faiss.


# %pip install -U faiss-cpu  # or faiss-gpu if you have a GPU


max_characters = 6000  # for truncating >99th percentile of documents
topk_docs_to_retrieve = 5  # number of documents to retrieve per search query

with open("ragqa_arena_tech_corpus.jsonl") as f:
    corpus = [ujson.loads(line)['text'][:max_characters] for line in f]
    print(f"Loaded {len(corpus)} documents. Will encode them below.")

embedder = dspy.Embedder('openai/text-embedding-3-small', dimensions=512)
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=corpus, k=topk_docs_to_retrieve)

Loaded 28436 documents. Will encode them below.
Training a 32-byte FAISS index with 337 partitions, based on 28436 x 512-dim embeddings
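Conceptually, local top-K search ranks every document embedding by its similarity to the query embedding and keeps the best K. The brute-force sketch below shows that idea in pure Python with hand-made 3-dimensional vectors. The cosine and top_k helpers and all the data are illustrative assumptions; the retriever used above works on real 512-dimensional embeddings and may use a FAISS index instead of exhaustive comparison.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return [docs[i] for i in ranked[:k]]


# Hand-made 3-dimensional "embeddings", purely for illustration.
docs = ["kernel memory zones", "iOS contact suggestions", "git large files"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 0.9]]

print(top_k([1.0, 0.2, 0.0], doc_vecs, docs, k=2))
# ['kernel memory zones', 'iOS contact suggestions']
```

Exhaustive comparison like this is simple and exact, but its cost grows linearly with the corpus size, which is why an approximate index becomes attractive for larger corpora.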

Build your first RAG Module.

In the previous guide, we looked at individual DSPy modules in isolation, e.g. dspy.Predict("question -> answer").

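What if we want to build a DSPy program that has multiple steps? The syntax below, based on dspy.Module, lets you connect a few pieces together, in this case our retriever and a generation module, so that the whole system can be optimized.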

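Concretely, in the __init__ method, you declare any sub-modules you'll need, which in this case is just a dspy.ChainOfThought('context, question -> response') module that takes the retrieved context and a question and produces a response. In the forward method, you simply express any Python control flow you like, possibly using your modules. In this case, we first invoke the search function defined earlier and then invoke the self.respond ChainOfThought module.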


class RAG(dspy.Module):
    def __init__(self):
        self.respond = dspy.ChainOfThought('context, question -> response')

    def forward(self, question):
        context = search(question).passages
        return self.respond(context=context, question=question)

Let's use the RAG module.


rag = RAG()
rag(question="what are high memory and low memory on linux?")

Out[16]

Prediction(
    reasoning="High Memory and Low Memory in Linux refer to two segments of the kernel's memory space. Low Memory is the portion of memory that the kernel can access directly and is statically mapped at boot time. This area is typically used for kernel data structures and is always accessible to the kernel. High Memory, on the other hand, is not permanently mapped in the kernel's address space, meaning that the kernel cannot access it directly without first mapping it into its address space. High Memory is used for user-space applications and temporary data buffers. The distinction allows for better memory management and security, as user-space applications cannot directly access kernel-space memory.",
    response="In Linux, High Memory refers to the segment of memory that is not permanently mapped in the kernel's address space, which means the kernel must map it temporarily to access it. This area is typically used for user-space applications and temporary data buffers. Low Memory, in contrast, is the portion of memory that the kernel can access directly and is statically mapped at boot time. It is used for kernel data structures and is always accessible to the kernel. This separation enhances security by preventing user-space applications from accessing kernel-space memory directly."
)


dspy.inspect_history()

[2024-11-23T23:16:49.175612]

System message:

Your input fields are:
1. `context` (str)
2. `question` (str)

Your output fields are:
1. `reasoning` (str)
2. `response` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `context`, `question`, produce the fields `response`.


User message:

[[ ## context ## ]]
[1] «As far as I remember, High Memory is used for application space and Low Memory for the kernel. Advantage is that (user-space) applications cant access kernel-space memory.»
[2] «HIGHMEM is a range of kernels memory space, but it is NOT memory you access but its a place where you put what you want to access. A typical 32bit Linux virtual memory map is like: 0x00000000-0xbfffffff: user process (3GB) 0xc0000000-0xffffffff: kernel space (1GB) (CPU-specific vector and whatsoever are ignored here). Linux splits the 1GB kernel space into 2 pieces, LOWMEM and HIGHMEM. The split varies from installation to installation. If an installation chooses, say, 512MB-512MB for LOW and HIGH mems, the 512MB LOWMEM (0xc0000000-0xdfffffff) is statically mapped at the kernel boot time; usually the first so many bytes of the physical memory is used for this so that virtual and physical addresses in this range have a constant offset of, say, 0xc0000000. On the other hand, the latter 512MB (HIGHMEM) has no static mapping (although you could leave pages semi-permanently mapped there, but you must do so explicitly in your driver code). Instead, pages are temporarily mapped and unmapped here so that virtual and physical addresses in this range have no consistent mapping. Typical uses of HIGHMEM include single-time data buffers.»
[3] «This is relevant to the Linux kernel; Im not sure how any Unix kernel handles this. The High Memory is the segment of memory that user-space programs can address. It cannot touch Low Memory. Low Memory is the segment of memory that the Linux kernel can address directly. If the kernel must access High Memory, it has to map it into its own address space first. There was a patch introduced recently that lets you control where the segment is. The tradeoff is that you can take addressable memory away from user space so that the kernel can have more memory that it does not have to map before using. Additional resources: https://tldp.cn/HOWTO/KernelAnalysis-HOWTO-7.html http://linux-mm.org/HighMemory»
[4] «The first reference to turn to is Linux Device Drivers (available both online and in book form), particularly chapter 15 which has a section on the topic. In an ideal world, every system component would be able to map all the memory it ever needs to access. And this is the case for processes on Linux and most operating systems: a 32-bit process can only access a little less than 2^32 bytes of virtual memory (in fact about 3GB on a typical Linux 32-bit architecture). It gets difficult for the kernel, which needs to be able to map the full memory of the process whose system call its executing, plus the whole physical memory, plus any other memory-mapped hardware device. So when a 32-bit kernel needs to map more than 4GB of memory, it must be compiled with high memory support. High memory is memory which is not permanently mapped in the kernels address space. (Low memory is the opposite: it is always mapped, so you can access it in the kernel simply by dereferencing a pointer.) When you access high memory from kernel code, you need to call kmap first, to obtain a pointer from a page data structure (struct page). Calling kmap works whether the page is in high or low memory. There is also kmap_atomic which has added constraints but is more efficient on multiprocessor machines because it uses finer-grained locking. The pointer obtained through kmap is a resource: it uses up address space. Once youve finished with it, you must call kunmap (or kunmap_atomic) to free that resource; then the pointer is no longer valid, and the contents of the page cant be accessed until you call kmap again.»
[5] «/proc/meminfo will tell you how free works, but /proc/kcore can tell you what the kernel uses. From the same page: /proc/kcore This file represents the physical memory of the system and is stored in the ELF core file format. With this pseudo-file, and an unstripped kernel (/usr/src/linux/vmlinux) binary, GDB can be used to examine the current state of any kernel data structures. The total length of the file is the size of physical memory (RAM) plus 4KB. /proc/meminfo This file reports statistics about memory usage on the system. It is used by free(1) to report the amount of free and used memory (both physical and swap) on the system as well as the shared memory and buffers used by the kernel. Each line of the file consists of a parameter name, followed by a colon, the value of the parameter, and an option unit of measurement (e.g., kB). The list below describes the parameter names and the format specifier required to read the field value. Except as noted below, all of the fields have been present since at least Linux 2.6.0. Some fileds are displayed only if the kernel was configured with various options; those dependencies are noted in the list. MemTotal %lu Total usable RAM (i.e., physical RAM minus a few reserved bits and the kernel binary code). MemFree %lu The sum of LowFree+HighFree. Buffers %lu Relatively temporary storage for raw disk blocks that shouldnt get tremendously large (20MB or so). Cached %lu In-memory cache for files read from the disk (the page cache). Doesnt include SwapCached. SwapCached %lu Memory that once was swapped out, is swapped back in but still also is in the swap file. (If memory pressure is high, these pages dont need to be swapped out again because they are already in the swap file. This saves I/O.) Active %lu Memory that has been used more recently and usually not reclaimed unless absolutely necessary. Inactive %lu Memory which has been less recently used. It is more eligible to be reclaimed for other purposes. 
Active(anon) %lu (since Linux 2.6.28) [To be documented.] Inactive(anon) %lu (since Linux 2.6.28) [To be documented.] Active(file) %lu (since Linux 2.6.28) [To be documented.] Inactive(file) %lu (since Linux 2.6.28) [To be documented.] Unevictable %lu (since Linux 2.6.28) (From Linux 2.6.28 to 2.6.30, CONFIG_UNEVICTABLE_LRU was required.) [To be documented.] Mlocked %lu (since Linux 2.6.28) (From Linux 2.6.28 to 2.6.30, CONFIG_UNEVICTABLE_LRU was required.) [To be documented.] HighTotal %lu (Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Total amount of highmem. Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by user-space programs, or for the page cache. The kernel must use tricks to access this memory, making it slower to access than lowmem. HighFree %lu (Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Amount of free highmem. LowTotal %lu (Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Total amount of lowmem. Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernels use for its own data structures. Among many other things, it is where everything from Slab is allocated. Bad things happen when youre out of lowmem. LowFree %lu (Starting with Linux 2.6.19, CONFIG_HIGHMEM is required.) Amount of free lowmem. MmapCopy %lu (since Linux 2.6.29) (CONFIG_MMU is required.) [To be documented.] SwapTotal %lu Total amount of swap space available. SwapFree %lu Amount of swap space that is currently unused. Dirty %lu Memory which is waiting to get written back to the disk. Writeback %lu Memory which is actively being written back to the disk. AnonPages %lu (since Linux 2.6.18) Non-file backed pages mapped into user-space page tables. Mapped %lu Files which have been mmaped, such as libraries. Shmem %lu (since Linux 2.6.32) [To be documented.] Slab %lu In-kernel data structures cache. 
SReclaimable %lu (since Linux 2.6.19) Part of Slab, that might be reclaimed, such as caches. SUnreclaim %lu (since Linux 2.6.19) Part of Slab, that cannot be reclaimed on memory pressure. KernelStack %lu (since Linux 2.6.32) Amount of memory allocated to kernel stacks. PageTables %lu (since Linux 2.6.18) Amount of memory dedicated to the lowest level of page tables. Quicklists %lu (since Linux 2.6.27) (CONFIG_QUICKLIST is required.) [To be documented.] NFS_Unstable %lu (since Linux 2.6.18) NFS pages sent to the server, but not yet committed to stable storage. Bounce %lu (since Linux 2.6.18) Memory used for block device bounce buffers. WritebackTmp %lu (since Linux 2.6.26) Memory used by FUSE for temporary writeback buffers. CommitLimit %lu (since Linux 2.6.10) Based on the overcommit ratio (vm.overcommit_ratio), this is the total amount of memory currently available to be allocated on the system. This limit is adhered to only if strict overcommit accounting is enabled (mode 2 in /proc/sys/vm/overcommit_ratio). The CommitLimit is calculated using the following formula: CommitLimit = ([total RAM pages] - [total huge TLB pages]) * overcommit_ratio / 100 + [total swap pages] For example, on a system with 1GB of physical RAM and 7GB of swap with a overcommit_ratio of 30, this formula yields a CommitLimit of 7.3GB. For more details, see the memory overcommit documentation in the kernel source file Documentation/vm/overcommit-accounting. Committed_AS %lu The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been used by them as of yet. A process which allocates 1GB of memory (using malloc(3) or similar), but touches only 300MB of that memory will show up as using only 300MB of memory even if it has the address space allocated for the entire 1GB. This 1GB is memory which has been committed to by the VM and can be used at any time by the allocating application. 
With strict overcommit enabled on the system (mode 2 /proc/sys/vm/overcommit_memory), allocations w»

[[ ## question ## ]]
what are high memory and low memory on linux?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
High Memory and Low Memory in Linux refer to two segments of the kernel's memory space. Low Memory is the portion of memory that the kernel can access directly and is statically mapped at boot time. This area is typically used for kernel data structures and is always accessible to the kernel. High Memory, on the other hand, is not permanently mapped in the kernel's address space, meaning that the kernel cannot access it directly without first mapping it into its address space. High Memory is used for user-space applications and temporary data buffers. The distinction allows for better memory management and security, as user-space applications cannot directly access kernel-space memory.

[[ ## response ## ]]
In Linux, High Memory refers to the segment of memory that is not permanently mapped in the kernel's address space, which means the kernel must map it temporarily to access it. This area is typically used for user-space applications and temporary data buffers. Low Memory, in contrast, is the portion of memory that the kernel can access directly and is statically mapped at boot time. It is used for kernel data structures and is always accessible to the kernel. This separation enhances security by preventing user-space applications from accessing kernel-space memory directly.

[[ ## completed ## ]]

Earlier, with the CoT module, we got around 40% semantic F1 on our devset. Will this RAG module score better?

In [18]

evaluate(RAG())

Average Metric: 166.54 / 300 (55.5%): 100%|██████████| 300/300 [00:04<00:00, 61.40it/s]

2024/11/23 23:16:54 INFO dspy.evaluate.evaluate: Average Metric: 166.53601368289284 / 300 (55.5%)

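	question	example_response	gold_doc_ids	reasoning	pred_response	SemanticF1
0	When to use C over C++, and C++ over C?	If you are equally familiar with both C++ and C, it's advisable...	[733]	C should be preferred over C++ primarily in scenarios where simplicity...	Use C over C++ when working on embedded systems, requiring low-level access...	✔️ [0.500]
1	Should images be stored in a Git repository?	One viewpoint holds that there is no significant downside, esp...	[6253, 6254, 6275, 6278, 8215]	Storing images in a Git repository is generally not recommended...	While it is technically possible to store images in a Git repository...	✔️ [0.444]

... 298 more rows not shown ...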
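The per-row SemanticF1 values in the table combine a precision estimate and a recall estimate into a single number. The following is a minimal sketch, assuming the usual harmonic-mean definition of F1. The precision and recall values are hypothetical and are not taken from this run; in DSPy both quantities are judged by an LM rather than computed from string overlap.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, with inputs clamped to [0, 1]."""
    precision = max(0.0, min(1.0, precision))
    recall = max(0.0, min(1.0, recall))
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical judgments for one response (illustrative values only).
score = f1_score(precision=0.4, recall=0.5)
print(f"{score:.3f}")  # 0.444
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a response cannot score well by being exhaustive but imprecise, or precise but incomplete.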

Out[18]

55.51

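Using a DSPy Optimizer to improve your RAG prompt.¶

Off the shelf, our RAG module scores 55%. What are our options for making it stronger? One of the choices DSPy offers is optimizing the prompts in our pipeline.

If your program has many sub-modules, all of them are optimized together. In this case, there is only one: self.respond = dspy.ChainOfThought('context, question -> response')

Let's set up and use DSPy's MIPRO (v2) optimizer. The run below costs around $1.5 (for the medium auto setting) and may take 20-30 minutes, depending on your number of threads.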

In [ ]

tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(RAG(), trainset=trainset,
                           max_bootstrapped_demos=2, max_labeled_demos=2,
                           requires_permission_to_run=False)

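The prompt optimization process here is fairly systematic; you can learn more about it in this paper. Importantly, it is not a magic button. For example, it can easily overfit your training set and generalize poorly to a held-out set, so it is essential to validate our programs iteratively.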
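One way to make the overfitting check concrete is to keep a held-out split and compare scores on both sides. The sketch below uses only the standard library; `split_holdout`, `looks_overfit`, and the 5-point tolerance are hypothetical helpers chosen for illustration and are not part of DSPy.

```python
import random


def split_holdout(examples, holdout_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, holdout) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def looks_overfit(train_score, holdout_score, tolerance=5.0):
    """Flag a program whose training score exceeds its held-out score
    by more than `tolerance` percentage points (an arbitrary threshold)."""
    return train_score - holdout_score > tolerance


train, holdout = split_holdout(range(10))
print(len(train), len(holdout))  # 8 2
```

In this tutorial the role of the held-out split is played by the devset, which the optimizer never sees during compilation.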

Let's check an example here by asking the same question to the unoptimized baseline program, rag = RAG(), and to the prompt-optimized program, optimized_rag = MIPROv2(..)(..).

In [20]

baseline = rag(question="cmd+tab does not work on hidden or minimized windows")
print(baseline.response)

You are correct that cmd+tab does not work on hidden or minimized windows. To switch back to a minimized app, you must first switch to another application and let it take focus before returning to the minimized one.

In [21]

pred = optimized_rag(question="cmd+tab does not work on hidden or minimized windows")
print(pred.response)

The Command + Tab shortcut on macOS is designed to switch between currently open applications, but it does not directly restore minimized or hidden windows. When you use Command + Tab, it cycles through the applications that are actively running, and minimized windows do not count as active. To manage minimized windows, you can use other shortcuts or methods. For example, you can use Command + Option + H + M to hide all other applications and minimize the most recently used one. Alternatively, you can navigate to the application you want to restore using Command + Tab and then manually click on the minimized window in the Dock to bring it back to focus.

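You can use dspy.inspect_history(n=2) to view the RAG prompt before and after optimization.

Concretely, in one run of this notebook, the optimized prompt does the following (note that this may differ on a later rerun).

It constructs the following instruction,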

Using the provided `context` and `question`, analyze the information step by step to generate a comprehensive and informative `response`. Ensure that the response clearly explains the concepts involved, highlights key distinctions, and addresses any complexities noted in the context.

and it includes two fully worked-out RAG examples with synthetic reasoning and answers, e.g. how to transfer whatsapp voice message to computer?.

Let's now evaluate on the overall devset.

In [22]

evaluate(optimized_rag)

Average Metric: 183.32 / 300 (61.1%): 100%|██████████| 300/300 [00:02<00:00, 104.48it/s]

2024/11/23 23:17:21 INFO dspy.evaluate.evaluate: Average Metric: 183.3194433591069 / 300 (61.1%)

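	question	example_response	gold_doc_ids	reasoning	pred_response	SemanticF1
0	When to use C over C++, and C++ over C?	If you are equally familiar with both C++ and C, it's advisable...	[733]	The context provides insights into the strengths and weaknesses...	You should consider using C over C++ in scenarios where simplicity...	✔️ [0.333]
1	Should images be stored in a Git repository?	One viewpoint holds that there is no significant downside, esp...	[6253, 6254, 6275, 6278, 8215]	The context discusses the challenges and considerations of storing...	Storing images in a Git repository is generally considered bad practice...	✔️ [0.500]

... 298 more rows not shown ...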

Out[22]

61.11
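The headline numbers returned by the two evaluation runs can be reproduced from the logged metric sums. This is a small worked check, assuming the reported score is the metric sum divided by the number of examples, expressed as a percentage and rounded to two decimals; `average_percent` is a hypothetical helper name.

```python
def average_percent(metric_sum: float, n_examples: int) -> float:
    """Average per-example metric, as a percentage rounded to two decimals."""
    return round(100 * metric_sum / n_examples, 2)


# Metric sums copied from the two evaluation logs in this tutorial.
baseline_score = average_percent(166.53601368289284, 300)
optimized_score = average_percent(183.3194433591069, 300)
gain = round(optimized_score - baseline_score, 2)

print(baseline_score, optimized_score)  # 55.51 61.11
```

The gain from prompt optimization alone is therefore about 5.6 percentage points on this devset.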

Keeping an eye on cost.¶

DSPy lets you track the cost of your programs, which you can use to monitor how much your LM calls cost. Here we show how to do this.

In [23]

cost = sum([x['cost'] for x in lm.history if x['cost'] is not None])  # in USD, as calculated by LiteLLM for certain providers
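The `is not None` filter matters because some call records carry no price, for instance when the provider is not one that the cost calculation covers. A minimal standalone sketch of the same aggregation, using hypothetical history records in place of lm.history:

```python
def total_cost(history):
    """Sum the 'cost' field of each call record, skipping records with no price."""
    return sum(entry["cost"] for entry in history if entry.get("cost") is not None)


# Hypothetical call records; real entries in lm.history contain many more fields.
history = [
    {"cost": 0.0012},
    {"cost": None},  # a call with no reported cost
    {"cost": 0.0008},
]

print(f"${total_cost(history):.4f}")  # $0.0020
```

Without the filter, a single None entry would make the sum raise a TypeError.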

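Saving and loading.¶

The optimized program has a fairly simple structure internally. Feel free to explore it.

Here, we'll save optimized_rag so that we can load it again later without having to optimize from scratch.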

In [24]

optimized_rag.save("optimized_rag.json")

loaded_rag = RAG()
loaded_rag.load("optimized_rag.json")

loaded_rag(question="cmd+tab does not work on hidden or minimized windows")

Out[24]

Prediction(
    reasoning='The context explains how the Command + Tab shortcut functions on macOS, particularly in relation to switching between applications. It notes that this shortcut does not bring back minimized or hidden windows directly. Instead, it cycles through applications that are currently open and visible. The information also suggests alternative methods for managing minimized windows and provides insights into how to navigate between applications effectively.',
    response='The Command + Tab shortcut on macOS is designed to switch between currently open applications, but it does not directly restore minimized or hidden windows. When you use Command + Tab, it cycles through the applications that are actively running, and minimized windows do not count as active. To manage minimized windows, you can use other shortcuts or methods. For example, you can use Command + Option + H + M to hide all other applications and minimize the most recently used one. Alternatively, you can navigate to the application you want to restore using Command + Tab and then manually click on the minimized window in the Dock to bring it back to focus.'
)
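The pattern in the cell above (create a fresh RAG(), then call load) suggests that the program's architecture lives in code while only its learned state is written to the file. The toy class below illustrates that split with the standard library. It is an analogy under that assumption: `ToyProgram` and its JSON layout are hypothetical and do not reflect DSPy's actual file format.

```python
import json
import os
import tempfile


class ToyProgram:
    """Hypothetical stand-in for a module: structure in code, learned state in attributes."""

    def __init__(self):
        self.instructions = "Answer the question."
        self.demos = []

    def save(self, path):
        # Persist only the learned state, not the class definition.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"instructions": self.instructions, "demos": self.demos}, f)

    def load(self, path):
        # Overwrite this instance's default state with the saved state.
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
        self.instructions = state["instructions"]
        self.demos = state["demos"]


optimized = ToyProgram()
optimized.instructions = "Analyze the context step by step."
optimized.demos = [{"question": "q1", "response": "a1"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_program.json")
    optimized.save(path)
    restored = ToyProgram()  # fresh instance with default state
    restored.load(path)      # now carries the optimized state

print(restored.instructions)  # Analyze the context step by step.
```

One consequence of this design is that the class definition must be available wherever the saved file is loaded.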

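Saving programs in MLflow Experiment

Instead of saving the program to a local file, you can track it in MLflow for better reproducibility and collaboration.

Dependency management: MLflow automatically saves the frozen environment metadata along with the program to ensure reproducibility.
Experiment tracking: With MLflow, you can track the program's performance and cost along with the program itself.
Collaboration: You can share the program and results with your team members by sharing the MLflow experiment.

To save the program in MLflow, run the following code: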

import mlflow

# Start an MLflow Run and save the program
with mlflow.start_run(run_name="optimized_rag"):
    model_info = mlflow.dspy.log_model(
        optimized_rag,
        artifact_path="model", # Any name to save the program in MLflow
    )

# Load the program back from MLflow
loaded = mlflow.dspy.load_model(model_info.model_uri)

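To learn more about the integration, see the MLflow DSPy documentation.

What's next?¶

Improving this task's SemanticF1 score from around 42% to around 61% was fairly easy.

But DSPy gives you paths to keep iterating on the quality of your system, and we have barely scratched the surface.

In general, you have the following tools:

Explore better system architectures for your program, e.g. what if we ask the LM to generate search queries for the retriever? See, for example, the STORM pipeline built in DSPy.
Explore different prompt optimizers or weight optimizers. See the Optimizers docs.
Scale inference-time compute using DSPy Optimizers, e.g. by ensembling multiple optimized programs.
Cut cost by distilling to a smaller LM, via prompt or weight optimization.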
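To illustrate the ensembling item in the list above, here is a minimal sketch of majority voting over several programs. It uses plain callables as hypothetical stand-ins for optimized programs and does not use DSPy's own ensembling utilities, which may work differently.

```python
from collections import Counter


def majority_vote(programs, question):
    """Run every program on the question and return the most common answer.

    Ties are resolved in favor of the answer that was produced first.
    """
    answers = [program(question) for program in programs]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


# Hypothetical stand-ins for three optimized programs returning short answers.
programs = [
    lambda question: "A",
    lambda question: "B",
    lambda question: "A",
]

print(majority_vote(programs, "which option?"))  # A
```

Exact-match voting suits short, categorical answers. Free-form responses like the ones in this tutorial rarely match verbatim, so an ensemble over them would need a different way of comparing or merging outputs.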

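How do you decide which of these to pursue first?

The first step is to look at your system outputs, which lets you identify the sources of lower performance, if any. While doing all of this, keep refining your metric, e.g. by optimizing it against your own judgments, and collect more (or more realistic) data, e.g. from related domains or by putting a demo of your system in front of users.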