```python
# Install the latest version of DSPy
%pip install -U dspy-ai
# Install the Hugging Face datasets library to load the CoNLL-2003 dataset
%pip install datasets
```
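Recommended: Set up MLflow Tracing to understand what's happening under the hood.

MLflow DSPy Integration

MLflow is an LLMOps tool that natively integrates with DSPy and offers explainability and experiment tracking. In this tutorial, you can use MLflow to visualize prompts and optimization progress as traces to understand DSPy's behavior better. You can set up MLflow easily by following the four steps below.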
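- Install MLflow
```python
# Quote the requirement so the shell does not treat ">=" as an output redirect
%pip install "mlflow>=2.20"
```
- Start the MLflow UI in a separate terminal
```bash
mlflow ui --port 5000
```
- Connect the notebook to MLflow
```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("DSPy")
```
- Enable tracing.
```python
mlflow.dspy.autolog()
```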
To learn more about the integration, visit the MLflow DSPy Documentation as well.
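Load and Prepare the Dataset

In this section, we prepare the CoNLL-2003 dataset, which is commonly used for entity extraction tasks. The dataset includes tokens annotated with entity labels such as persons, organizations, and locations.

We will:
- Load the dataset using the Hugging Face `datasets` library.
- Define a function to extract tokens referring to people.
- Slice the dataset to create smaller subsets for training and testing.

DSPy expects examples in a structured format, so we'll also transform the dataset into DSPy `Examples` for easy integration.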
```python
import tempfile
from datasets import load_dataset
from typing import Dict, Any, List
import dspy

def load_conll_dataset() -> dict:
    """
    Loads the CoNLL-2003 dataset into train, validation, and test splits.

    Returns:
        dict: Dataset splits with keys 'train', 'validation', and 'test'.
    """
    # Use a temporary Hugging Face cache directory for compatibility with certain hosted notebook
    # environments that don't support the default Hugging Face cache directory.
    # It is passed as `cache_dir` because `datasets` reads the HF_DATASETS_CACHE environment
    # variable when it is imported, so setting that variable here would have no effect.
    # mkdtemp() keeps the directory after this function returns, since the loaded
    # dataset is memory-mapped from files stored in it.
    cache_dir = tempfile.mkdtemp()
    return load_dataset("conll2003", trust_remote_code=True, cache_dir=cache_dir)
```
```python
def extract_people_entities(data_row: Dict[str, Any]) -> List[str]:
    """
    Extracts entities referring to people from a row of the CoNLL-2003 dataset.

    Args:
        data_row (Dict[str, Any]): A row from the dataset containing tokens and NER tags.

    Returns:
        List[str]: List of tokens tagged as people.
    """
    return [
        token
        for token, ner_tag in zip(data_row["tokens"], data_row["ner_tags"])
        if ner_tag in (1, 2)  # CoNLL entity codes 1 and 2 (B-PER, I-PER) refer to people
    ]

def prepare_dataset(data_split, start: int, end: int) -> List[dspy.Example]:
    """
    Prepares a sliced dataset split for use with DSPy.

    Args:
        data_split: The dataset split (e.g., train or test).
        start (int): Starting index of the slice.
        end (int): Ending index of the slice.

    Returns:
        List[dspy.Example]: List of DSPy Examples with tokens and expected labels.
    """
    return [
        dspy.Example(
            tokens=row["tokens"],
            expected_extracted_people=extract_people_entities(row)
        ).with_inputs("tokens")
        for row in data_split.select(range(start, end))
    ]

# Load the dataset
dataset = load_conll_dataset()

# Prepare the training and test sets
train_set = prepare_dataset(dataset["train"], 0, 50)
test_set = prepare_dataset(dataset["test"], 0, 200)
```
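The integer codes in `ner_tags` index into the CoNLL-2003 label list used by the Hugging Face `conll2003` dataset (`O`, `B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-LOC`, `I-LOC`, `B-MISC`, `I-MISC`), which is why codes 1 and 2 select people. The sketch below decodes a hand-made row without downloading anything; the tag values in it were assigned by hand for illustration and are not read from the dataset. With the real dataset loaded, the names should also be available as `dataset["train"].features["ner_tags"].feature.names`.

```python
# Label order used by the Hugging Face "conll2003" dataset's ner_tags feature
CONLL_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def decode_tags(row: dict) -> list[tuple[str, str]]:
    """Pair each token with the name of its NER label."""
    return [(token, CONLL_LABELS[tag]) for token, tag in zip(row["tokens"], row["ner_tags"])]

def people_tokens(row: dict) -> list[str]:
    """Keep tokens labeled B-PER or I-PER (codes 1 and 2)."""
    return [token for token, label in decode_tags(row) if label.endswith("-PER")]

# Hand-made example row; the tags were assigned by hand for illustration
row = {"tokens": ["Italy", "recalled", "Marcello", "Cuttitta"], "ner_tags": [5, 0, 1, 2]}
print(decode_tags(row))    # [('Italy', 'B-LOC'), ('recalled', 'O'), ('Marcello', 'B-PER'), ('Cuttitta', 'I-PER')]
print(people_tokens(row))  # ['Marcello', 'Cuttitta']
```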
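Configure DSPy and Create an Entity Extraction Program

Here, we define a DSPy program for extracting entities referring to people from tokenized text.

Then, we configure DSPy to use a particular language model (`gpt-4o-mini`) for all invocations of the program.

Key DSPy concepts introduced:
- Signatures: define structured input/output schemas for your program.
- Modules: encapsulate program logic in reusable, composable units.

Specifically, we'll:
- Create a `PeopleExtraction` DSPy Signature to specify the input (`tokens`) and output (`extracted_people`) fields.
- Define a `people_extractor` program that uses DSPy's built-in `dspy.ChainOfThought` module to implement the `PeopleExtraction` signature. The program extracts entities referring to people from a list of input tokens using language model (LM) prompting.
- Use the `dspy.LM` class and the `dspy.settings.configure()` method to configure the language model that DSPy will use when invoking the program.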
```python
class PeopleExtraction(dspy.Signature):
    """
    Extract contiguous tokens referring to specific people, if any, from a list of string tokens.
    Output a list of tokens. In other words, do not combine multiple tokens into a single value.
    """
    tokens: list[str] = dspy.InputField(desc="tokenized text")
    extracted_people: list[str] = dspy.OutputField(desc="all tokens referring to specific people extracted from the tokenized text")

people_extractor = dspy.ChainOfThought(PeopleExtraction)
```
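Because `extracted_people` is annotated as `list[str]`, the LM's raw text answer has to be parsed into a Python list of strings before your code sees it. DSPy's adapters handle this internally; the snippet below is only a simplified stand-in that shows the idea, assuming the answer is written as a JSON or Python list literal. `parse_str_list` is a hypothetical helper, not a DSPy function.

```python
import ast
import json

def parse_str_list(raw: str) -> list[str]:
    """Hypothetical helper: parse an LM answer such as '["Werner", "Zwingmann"]' into list[str]."""
    text = raw.strip()
    try:
        value = json.loads(text)        # JSON list, e.g. ["a", "b"]
    except json.JSONDecodeError:
        value = ast.literal_eval(text)  # Python literal, e.g. ['a', 'b']
    # Enforce the declared output type: a list whose items are all strings
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"expected a list of strings, got: {raw!r}")
    return value

print(parse_str_list('["Werner", "Zwingmann"]'))  # ['Werner', 'Zwingmann']
print(parse_str_list("['Fischler']"))             # ['Fischler']
print(parse_str_list("[]"))                       # []
```

Rejecting anything that is not a list of strings mirrors the declared type: a bare string such as `"Fischler"` would otherwise slip through and break the list comparison used by the metric later on.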
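Here, we tell DSPy to use OpenAI's `gpt-4o-mini` model in our program. To authenticate, DSPy reads your `OPENAI_API_KEY` environment variable. You can easily swap this out for other providers or local models.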
```python
lm = dspy.LM(model="openai/gpt-4o-mini")
dspy.settings.configure(lm=lm)
```
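Define Metric and Evaluation Functions

In DSPy, evaluating a program's performance is critical for iterative development. A good evaluation framework allows us to:
- Measure the quality of our program's outputs.
- Compare outputs against ground-truth labels.
- Identify areas for improvement.

What we'll do:
- Define a custom metric (`extraction_correctness_metric`) that checks whether the extracted entities exactly match the ground truth (the same tokens in the same order).
- Create an evaluation function (`evaluate_correctness`) that applies this metric to a training or test dataset and computes the overall accuracy.

The evaluation function uses DSPy's `Evaluate` utility to handle parallelism and the display of results.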
```python
def extraction_correctness_metric(example: dspy.Example, prediction: dspy.Prediction, trace=None) -> bool:
    """
    Computes correctness of entity extraction predictions.

    Args:
        example (dspy.Example): The dataset example containing expected people entities.
        prediction (dspy.Prediction): The prediction from the DSPy people extraction program.
        trace: Optional trace object for debugging.

    Returns:
        bool: True if predictions match expectations, False otherwise.
    """
    return prediction.extracted_people == example.expected_extracted_people

evaluate_correctness = dspy.Evaluate(
    devset=test_set,
    metric=extraction_correctness_metric,
    num_threads=24,
    display_progress=True,
    display_table=True
)
```
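`extraction_correctness_metric` is strict: it compares the two lists with `==`, so a prediction that finds every person but lists them in a different order, or misses one token, scores zero. If you want partial credit, one option (not used in this tutorial) is a token-level F1 score that treats both lists as multisets. A minimal sketch, assuming plain lists of strings; `token_f1` is a hypothetical name, and you would wrap it in a function with the same `(example, prediction, trace=None)` signature to use it as a DSPy metric.

```python
from collections import Counter

def token_f1(predicted: list[str], expected: list[str]) -> float:
    """Token-level F1 between two token lists, ignoring order (multiset overlap)."""
    if not predicted and not expected:
        return 1.0  # both empty: treat as a perfect match
    overlap = sum((Counter(predicted) & Counter(expected)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)

print(token_f1(["Nadim", "Ladki"], ["Nadim", "Ladki"]))  # 1.0
print(token_f1(["Ladki", "Nadim"], ["Nadim", "Ladki"]))  # 1.0, whereas == gives False
print(token_f1(["JAPAN", "CHINA"], ["CHINA"]))           # 0.666... (precision 1/2, recall 1)
```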
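Evaluate Initial Extractor

Before optimizing our program, we need a baseline evaluation to understand its current performance. This helps us:
- Establish a reference point for comparison after optimization.
- Identify potential weaknesses in the initial implementation.

In this step, we run our `people_extractor` program on the test set and measure its accuracy using the evaluation framework defined earlier.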
```python
evaluate_correctness(people_extractor, devset=test_set)
```
Average Metric: 172.00 / 200 (86.0%): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:16<00:00, 11.94it/s]
2024/11/18 21:08:04 INFO dspy.evaluate.evaluate: Average Metric: 172 / 200 (86.0%)
index | tokens | expected_extracted_people | rationale | extracted_people | extraction_correctness_metric
---|---|---|---|---|---|
0 | [SOCCER, -, JAPAN, GET, LUCKY, WIN, ,, CHINA, IN, SURPRISE, DEFEAT...] | [CHINA] | We extracted "JAPAN" and "CHINA" as they refer to specific countri... | [JAPAN, CHINA] | |
1 | [Nadim, Ladki] | [Nadim, Ladki] | We extracted the tokens "Nadim" and "Ladki" as they refer to speci... | [Nadim, Ladki] | ✔️ [True] |
2 | [AL-AIN, ,, United, Arab, Emirates, 1996-12-06] | [] | There are no tokens referring to specific people in the provided l... | [] | ✔️ [True] |
3 | [Japan, began, the, defence, of, their, Asian, Cup, title, with, a... | [] | We did not find any tokens referring to specific people in the pro... | [] | ✔️ [True] |
4 | [But, China, saw, their, luck, desert, them, in, the, second, matc... | [] | The extracted tokens referring to specific people are "China" and ... | [China, Uzbekistan] | |
... | ... | ... | ... | ... | ... |
195 | ['The', 'Wallabies', 'have', 'their', 'sights', 'set', 'on', 'a', ... | [David, Campese] | The extracted_people includes "David Campese" as it refers to a sp... | [David, Campese] | ✔️ [True] |
196 | ['The', 'Wallabies', 'currently', 'have', 'no', 'plans', 'to', 'ma... | [] | The extracted_people includes "Wallabies" as it refers to a specif... | [] | ✔️ [True] |
197 | ['Campese', 'will', 'be', 'up', 'against', 'a', 'familiar', 'foe',... | [Campese, Rob, Andrew] | The extracted tokens refer to specific people mentioned in the tex... | [Campese, Rob, Andrew] | ✔️ [True] |
198 | ['"', 'Campo', 'has', 'a', 'massive', 'following', 'in', 'this', '... | [Campo, Andrew] | The extracted tokens referring to specific people include "Campo" ... | [Campo, Andrew] | ✔️ [True] |
199 | ['On', 'tour', ',', 'Australia', 'have', 'won', 'all', 'four', 'te... | [] | We extracted the names of specific people from the tokenized text.... | [] | ✔️ [True] |
200 rows × 5 columns
86.0
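The summary line `Average Metric: 172.00 / 200 (86.0%)` is the sum of the per-example metric values divided by the number of examples, scaled to a percentage. Conceptually, `dspy.Evaluate` runs the program over the dev set in parallel threads and averages the metric. The sketch below reproduces only that idea with the standard library; `simple_evaluate`, `toy_program`, and `exact_match` are hypothetical stand-ins and omit DSPy's error handling, progress bar, and result table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def simple_evaluate(program: Callable, devset: Sequence[dict], metric: Callable, num_threads: int = 8) -> float:
    """Run `program` on every example in parallel and return the mean metric as a percentage."""
    def score(example: dict) -> float:
        prediction = program(example["tokens"])
        return float(metric(example, prediction))  # True -> 1.0, False -> 0.0

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score, devset))
    return round(100 * sum(scores) / len(scores), 2)

# Toy stand-ins: a "program" that keeps capitalized tokens, and an exact-match metric
def toy_program(tokens):
    return [t for t in tokens if t[:1].isupper()]

def exact_match(example, prediction):
    return prediction == example["expected"]

devset = [
    {"tokens": ["Nadim", "Ladki"], "expected": ["Nadim", "Ladki"]},
    {"tokens": ["Japan", "began", "the", "defence"], "expected": []},
]
print(simple_evaluate(toy_program, devset, exact_match))  # 50.0
```

The second toy example fails for the same reason row 4 of the table fails: a country name is returned as a person.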
Track Evaluation Results in MLflow Experiment

To track and visualize the evaluation results over time, you can record the results in an MLflow Experiment.
```python
import mlflow

with mlflow.start_run(run_name="extractor_evaluation"):
    # A separate evaluator, so the `evaluate_correctness` defined earlier still
    # returns a single score (and displays its table) in later cells
    evaluate_with_outputs = dspy.Evaluate(
        devset=test_set,
        metric=extraction_correctness_metric,
        num_threads=24,
        display_progress=True,
        # To record the outputs and detailed scores to MLflow
        return_all_scores=True,
        return_outputs=True,
    )

    # Evaluate the program as usual
    aggregated_score, outputs, all_scores = evaluate_with_outputs(people_extractor)

    # Log the aggregated score
    mlflow.log_metric("exact_match", aggregated_score)

    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "Tokens": [example.tokens for example in test_set],
            "Expected": [example.expected_extracted_people for example in test_set],
            # Each output is an (example, prediction, score) triple
            "Predicted": [getattr(prediction, "extracted_people", None) for _, prediction, _ in outputs],
            "Exact match": all_scores,
        },
        artifact_file="eval_results.json",
    )
```
To learn more about the integration, visit the MLflow DSPy Documentation as well.
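Optimize the Model

DSPy includes powerful optimizers that can improve the quality of your system.

Here, we use DSPy's `MIPROv2` optimizer to:
- Automatically tune the program's language model (LM) prompt by (1) using the LM to adjust the prompt's instructions and (2) building few-shot examples from the training dataset that are augmented with reasoning generated from `dspy.ChainOfThought`.
- Maximize correctness on the training set.

This optimization process is automated, saving time and effort while improving accuracy.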
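At a high level, the optimizer searches over combinations of candidate instructions and candidate few-shot demo sets and keeps the combination that scores best on the training data. MIPROv2 proposes the candidates with an LM and explores the combinations with Bayesian optimization; the sketch below swaps that for an exhaustive grid search over a made-up scoring function, so it illustrates only the "score every combination, keep the best" step. All candidate names and scores are hypothetical.

```python
from itertools import product

def grid_search(instructions, demo_sets, score_fn):
    """Score every (instruction, demos) pair and return the best pair and its score."""
    best, best_score = None, float("-inf")
    for instruction, demos in product(instructions, demo_sets):
        score = score_fn(instruction, demos)
        if score > best_score:
            best, best_score = (instruction, demos), score
    return best, best_score

# Toy candidates and a made-up score standing in for "accuracy on the training set"
instructions = ["Extract people.", "Extract contiguous tokens referring to specific people."]
demo_sets = [(), ("demo-1",), ("demo-1", "demo-2")]

def toy_score(instruction, demos):
    return 0.5 + 0.1 * len(demos) + (0.2 if "contiguous" in instruction else 0.0)

best, score = grid_search(instructions, demo_sets, toy_score)
print(best, round(score, 2))
```

Exhaustive search costs one full evaluation per combination, which grows quickly with the number of candidates; that cost is the reason a real optimizer samples combinations instead of trying them all.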
```python
mipro_optimizer = dspy.MIPROv2(
    metric=extraction_correctness_metric,
    auto="medium",
)
optimized_people_extractor = mipro_optimizer.compile(
    people_extractor,
    trainset=train_set,
    max_bootstrapped_demos=4,
    requires_permission_to_run=False,
    minibatch=False
)
```
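Evaluate Optimized Program

After optimization, we re-evaluate the program on the test set to measure improvements. Comparing the optimized and initial results allows us to:
- Quantify the benefits of optimization.
- Validate that the program generalizes well to unseen data.

In this case, the program's accuracy on the test set rises from 86.0% (172/200) to 93.0% (186/200).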
```python
evaluate_correctness(optimized_people_extractor, devset=test_set)
```
Average Metric: 186.00 / 200 (93.0%): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:23<00:00, 8.58it/s]
2024/11/18 21:15:00 INFO dspy.evaluate.evaluate: Average Metric: 186 / 200 (93.0%)
index | tokens | expected_extracted_people | rationale | extracted_people | extraction_correctness_metric
---|---|---|---|---|---|
0 | [SOCCER, -, JAPAN, GET, LUCKY, WIN, ,, CHINA, IN, SURPRISE, DEFEAT...] | [CHINA] | There are no specific people mentioned in the provided tokens. The... | [] | |
1 | [Nadim, Ladki] | [Nadim, Ladki] | The tokens "Nadim Ladki" refer to a specific individual. Both toke... | [Nadim, Ladki] | ✔️ [True] |
2 | [AL-AIN, ,, United, Arab, Emirates, 1996-12-06] | [] | There are no tokens referring to specific people in the provided l... | [] | ✔️ [True] |
3 | [Japan, began, the, defence, of, their, Asian, Cup, title, with, a... | [] | There are no specific people mentioned in the provided tokens. The... | [] | ✔️ [True] |
4 | [But, China, saw, their, luck, desert, them, in, the, second, matc... | [] | There are no tokens referring to specific people in the provided l... | [] | ✔️ [True] |
... | ... | ... | ... | ... | ... |
195 | ['The', 'Wallabies', 'have', 'their', 'sights', 'set', 'on', 'a', ... | [David, Campese] | The extracted tokens refer to a specific person mentioned in the t... | [David, Campese] | ✔️ [True] |
196 | ['The', 'Wallabies', 'currently', 'have', 'no', 'plans', 'to', 'ma... | [] | There are no specific individuals mentioned in the provided tokens... | [] | ✔️ [True] |
197 | ['Campese', 'will', 'be', 'up', 'against', 'a', 'familiar', 'foe',... | [Campese, Rob, Andrew] | The tokens include the names "Campese" and "Rob Andrew," both of w... | [Campese, Rob, Andrew] | ✔️ [True] |
198 | ['"', 'Campo', 'has', 'a', 'massive', 'following', 'in', 'this', '... | [Campo, Andrew] | The extracted tokens refer to specific people mentioned in the tex... | [Campo, Andrew] | ✔️ [True] |
199 | ['On', 'tour', ',', 'Australia', 'have', 'won', 'all', 'four', 'te... | [] | There are no specific people mentioned in the provided tokens. The... | [] | ✔️ [True] |
200 rows × 5 columns
93.0
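Another way to read the two runs is by failures: 28 of the 200 test examples failed before optimization and 14 after, so the optimized program makes half as many mistakes on this test set. A quick check of that arithmetic, using only the counts reported above:

```python
def summarize(correct: int, total: int) -> tuple[float, int]:
    """Return (accuracy in percent, number of failed examples)."""
    return round(100 * correct / total, 1), total - correct

baseline_acc, baseline_errors = summarize(172, 200)    # initial program
optimized_acc, optimized_errors = summarize(186, 200)  # after MIPROv2

print(f"accuracy: {baseline_acc}% -> {optimized_acc}% (+{optimized_acc - baseline_acc:.1f} points)")
print(f"errors: {baseline_errors} -> {optimized_errors} "
      f"({1 - optimized_errors / baseline_errors:.0%} fewer)")
# accuracy: 86.0% -> 93.0% (+7.0 points)
# errors: 28 -> 14 (50% fewer)
```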
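Inspect the Optimized Program's Prompt

After optimizing the program, we can inspect the history of interactions to see how DSPy has augmented the program's prompt with few-shot examples and a rewritten instruction. This step demonstrates:
- The structure of the prompt used by the program.
- How few-shot examples are added to guide the model's behavior.

Use `inspect_history(n=1)` to view the last interaction and analyze the generated prompt.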
```python
dspy.inspect_history(n=1)
```
[2024-11-18T21:15:00.584497] System message: Your input fields are: 1. `tokens` (list[str]): tokenized text Your output fields are: 1. `rationale` (str): ${produce the extracted_people}. We ... 2. `extracted_people` (list[str]): all tokens referring to specific people extracted from the tokenized text All interactions will be structured in the following way, with the appropriate values filled in. [[ ## tokens ## ]] {tokens} [[ ## rationale ## ]] {rationale} [[ ## extracted_people ## ]] {extracted_people} # note: the value you produce must be pareseable according to the following JSON schema: {"type": "array", "items": {"type": "string"}} [[ ## completed ## ]] In adhering to this structure, your objective is: In a high-stakes situation where accurate identification of individuals is critical for regulatory compliance and public health communication, extract contiguous tokens referring to specific people from the provided list of string tokens. Ensure that you output each identified individual as separate tokens without combining multiple tokens into a single value. This task is essential for ensuring clarity and accountability in communications pertaining to EU regulations and health matters. User message: [[ ## tokens ## ]] ["He", "said", "further", "scientific", "study", "was", "required", "and", "if", "it", "was", "found", "that", "action", "was", "needed", "it", "should", "be", "taken", "by", "the", "European", "Union", "."] Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## rationale ## ]] There are no specific people mentioned in the provided tokens. The only reference is to the "European Union," which is an organization rather than a specific person. We therefore conclude that there are no extracted people. 
[[ ## extracted_people ## ]] [] [[ ## completed ## ]] User message: [[ ## tokens ## ]] ["BRUSSELS", "1996-08-22"] Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## rationale ## ]] There are no tokens referring to specific people in the provided list. We only have a location and a date. [[ ## extracted_people ## ]] [] [[ ## completed ## ]] User message: [[ ## tokens ## ]] ["Fischler", "proposed", "EU-wide", "measures", "after", "reports", "from", "Britain", "and", "France", "that", "under", "laboratory", "conditions", "sheep", "could", "contract", "Bovine", "Spongiform", "Encephalopathy", "(", "BSE", ")", "--", "mad", "cow", "disease", "."] Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## rationale ## ]] The extracted tokens refer to specific people mentioned in the text. In this case, "Fischler" is the only token that refers to a specific person. We extract this token as it is contiguous and represents an individual involved in the proposed measures. 
[[ ## extracted_people ## ]] ["Fischler"] [[ ## completed ## ]] User message: [[ ## tokens ## ]] ["Germany", "'s", "representative", "to", "the", "European", "Union", "'s", "veterinary", "committee", "Werner", "Zwingmann", "said", "on", "Wednesday", "consumers", "should", "buy", "sheepmeat", "from", "countries", "other", "than", "Britain", "until", "the", "scientific", "advice", "was", "clearer", "."] Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## rationale ## ]] We extracted the name "Werner Zwingmann" from the tokenized text as it refers to a specific person mentioned in the context of a statement regarding sheepmeat consumption. [[ ## extracted_people ## ]] ["Werner", "Zwingmann"] [[ ## completed ## ]] User message: [[ ## tokens ## ]] ["LONDON", "1996-12-06"] Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`. Response: [[ ## rationale ## ]] There are no tokens referring to specific people in the provided list. The tokens only include a location and a date. [[ ## extracted_people ## ]] [] [[ ## completed ## ]]
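The history above shows the layout of DSPy's chat format: each field is introduced by a `[[ ## field_name ## ]]` header, and the response ends with `[[ ## completed ## ]]`. To make that structure concrete, here is a minimal parser that splits a response in this format into a dict of raw field texts. `split_fields` is hypothetical and not part of DSPy; DSPy's own adapter also coerces types and handles malformed output.

```python
import re

FIELD_HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")

def split_fields(text: str) -> dict[str, str]:
    """Map each [[ ## name ## ]] header to the text after it, stopping at 'completed'."""
    fields = {}
    matches = list(FIELD_HEADER.finditer(text))
    for i, match in enumerate(matches):
        name = match.group(1)
        if name == "completed":
            break
        # The field's text runs until the next header (or the end of the response)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[name] = text[match.end():end].strip()
    return fields

# The final response from the history above
response = (
    "[[ ## rationale ## ]] There are no tokens referring to specific people in the provided list. "
    "The tokens only include a location and a date. "
    "[[ ## extracted_people ## ]] [] [[ ## completed ## ]]"
)
print(split_fields(response))
```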
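Keeping an Eye on Cost

DSPy allows you to track the cost of your programs. The following code sums the cost of all LM calls recorded in `lm.history` so far. Because the optimizer and the evaluations above use the same configured LM, this total includes the calls made during evaluation and optimization, not just those of the final extractor.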
```python
cost = sum(x["cost"] for x in lm.history if x["cost"] is not None)  # cost in USD, as calculated by LiteLLM for certain providers
cost
```
0.26362742999999983
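Each `dspy.LM` instance keeps its own `history` list of dicts, and the snippet above sums their `cost` values while skipping calls for which no cost was reported. If you combine the histories of several LMs (for example `lm.history + other_lm.history`), a per-model breakdown can be useful. The sketch below assumes each entry also carries a `model` key and runs on hand-written stand-in records rather than real history; check the entry keys in your DSPy version before relying on it. `cost_by_model` is a hypothetical helper.

```python
from collections import defaultdict

def cost_by_model(history: list[dict]) -> dict[str, float]:
    """Sum the 'cost' of each history entry per 'model', skipping entries without a cost."""
    totals: dict[str, float] = defaultdict(float)
    for entry in history:
        cost = entry.get("cost")
        if cost is not None:
            totals[entry.get("model", "unknown")] += cost
    return dict(totals)

# Hand-written stand-ins for history entries (values are illustrative only)
fake_history = [
    {"model": "openai/gpt-4o-mini", "cost": 0.002},
    {"model": "openai/gpt-4o-mini", "cost": None},
    {"model": "openai/gpt-4o", "cost": 0.010},
]
print(cost_by_model(fake_history))  # {'openai/gpt-4o-mini': 0.002, 'openai/gpt-4o': 0.01}
```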
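Save and Load Optimized Program

DSPy supports saving and loading programs, enabling you to reuse optimized systems without the need to re-optimize from scratch. This feature is especially useful for deploying your programs in production environments or sharing them with collaborators.

In this step, we'll save the optimized program to a file and demonstrate how to load it back for future use. The JSON file stores the program's optimized state (such as its instructions and few-shot demos), so we first create a fresh `dspy.ChainOfThought(PeopleExtraction)` with the same structure and then load that state into it.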
```python
optimized_people_extractor.save("optimized_extractor.json")

loaded_people_extractor = dspy.ChainOfThought(PeopleExtraction)
loaded_people_extractor.load("optimized_extractor.json")

loaded_people_extractor(tokens=["Italy", "recalled", "Marcello", "Cuttitta"]).extracted_people
```
['Marcello', 'Cuttitta']
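Save Program in MLflow Experiment

Beyond saving the program to a local file, you can track it in MLflow for better reproducibility and collaboration.
- Dependency Management: MLflow automatically saves the frozen environment metadata along with the program to ensure reproducibility.
- Experiment Tracking: With MLflow, you can track the program's performance and cost along with the program itself.
- Collaboration: You can share the program and results with your team members by sharing the MLflow experiment.

To save the program in MLflow, run the following code: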
```python
import mlflow

# Start an MLflow Run and save the program
with mlflow.start_run(run_name="optimized_extractor"):
    model_info = mlflow.dspy.log_model(
        optimized_people_extractor,
        artifact_path="model",  # Any name to save the program in MLflow
    )

# Load the program back from MLflow
loaded = mlflow.dspy.load_model(model_info.model_uri)
```
To learn more about the integration, visit the MLflow DSPy Documentation as well.
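Conclusion

In this tutorial, we demonstrated how to:
- Build a modular, interpretable system for entity extraction using DSPy.
- Evaluate and optimize the system using DSPy's built-in tools.

By leveraging structured inputs and outputs, we ensured that the system was easy to understand and improve. The optimization process allowed us to quickly improve performance without manually crafting prompts or tweaking parameters.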
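Next Steps:
- Experiment with extraction of other entity types (e.g., locations or organizations).
- Explore DSPy's other built-in modules, such as `ReAct`, for more complex reasoning tasks.
- Use the system in larger workflows, such as large-scale document processing or summarization.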
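As a starting point for extracting other entity types, the people-only filter in `extract_people_entities` generalizes to any CoNLL-2003 entity type by matching on label names instead of hard-coded codes. A minimal sketch, assuming the same label order as the Hugging Face `conll2003` dataset; `extract_entities` is a hypothetical helper, and the example row's tags were assigned by hand for illustration. A full pipeline would also need a matching signature and metric for the new entity type.

```python
from typing import Any, Dict, List

# Label order of the Hugging Face "conll2003" ner_tags feature
CONLL_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def extract_entities(data_row: Dict[str, Any], entity_type: str) -> List[str]:
    """Return tokens tagged B-<entity_type> or I-<entity_type>, e.g. entity_type="LOC"."""
    codes = {i for i, label in enumerate(CONLL_LABELS) if label[2:] == entity_type}
    if not codes:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return [tok for tok, tag in zip(data_row["tokens"], data_row["ner_tags"]) if tag in codes]

# Hand-made row; tags assigned by hand for illustration
row = {"tokens": ["Werner", "Zwingmann", "said", "in", "Brussels"], "ner_tags": [1, 2, 0, 0, 5]}
print(extract_entities(row, "PER"))  # ['Werner', 'Zwingmann']
print(extract_entities(row, "LOC"))  # ['Brussels']
```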