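Tutorial: Math Reasoning
Let's walk through a quick example of setting up a `dspy.ChainOfThought` module and optimizing it to answer algebra questions.
Install the latest DSPy via `pip install -U dspy` and follow along.
Recommended: Set up MLflow Tracing to understand what's happening under the hood.
MLflow DSPy Integration
MLflow is an LLMOps tool that integrates natively with DSPy and offers explainability and experiment tracking. In this tutorial, you can use MLflow to visualize prompts and optimization progress as traces, which makes DSPy's behavior easier to understand. Setting up MLflow takes the four steps below.
- Install MLflow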
%pip install "mlflow>=2.20"
- Start the MLflow UI in a separate terminal
mlflow ui --port 5000
- Connect the notebook to MLflow
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("DSPy")
- Enable tracing
mlflow.dspy.autolog()
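Once you have completed the steps above, you can see traces for each program execution in the notebook. They give you clear visibility into the model's behavior and help you understand DSPy's concepts throughout the tutorial.
To learn more about the integration, visit the MLflow DSPy Documentation.
Let's tell DSPy that we will use OpenAI's `gpt-4o-mini` in our modules. To authenticate, DSPy will look for your `OPENAI_API_KEY` environment variable. You can easily swap this out for other providers or local models.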
import dspy
gpt4o_mini = dspy.LM('openai/gpt-4o-mini', max_tokens=2000)
gpt4o = dspy.LM('openai/gpt-4o', max_tokens=2000)
dspy.configure(lm=gpt4o_mini) # we'll use gpt-4o-mini as the default LM, unless otherwise specified
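If `OPENAI_API_KEY` is missing, the problem usually only shows up at the first LM call. The following is a minimal pre-flight sketch, not part of DSPy; the helper name `require_api_key` is hypothetical and it only reads the environment.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise a readable error if it is unset or empty.

    Hypothetical helper for illustration; DSPy does not provide this function.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before calling the LM.")
    return key
```

Calling `require_api_key()` at the top of the notebook would surface a missing key before any evaluation or optimization run starts.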
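Next, let's load some data examples from the MATH benchmark. We'll use the training split for optimization and evaluate on a held-out dev set.
Note that the following step requires: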
%pip install git+https://github.com/hendrycks/math.git
from dspy.datasets import MATH
dataset = MATH(subset='algebra')
print(len(dataset.train), len(dataset.dev))
350 350
Let's inspect one example from the training set.
example = dataset.train[0]
print("Question:", example.question)
print("Answer:", example.answer)
Question: The doctor has told Cal O'Ree that during his ten weeks of working out at the gym, he can expect each week's weight loss to be $1\%$ of his weight at the end of the previous week. His weight at the beginning of the workouts is $244$ pounds. How many pounds does he expect to weigh at the end of the ten weeks? Express your answer to the nearest whole number.
Answer: 221
Now let's define our module. It's extremely simple: just a chain-of-thought step that takes a `question` and produces an `answer`.
module = dspy.ChainOfThought("question -> answer")
module(question=example.question)
Prediction( reasoning="Cal O'Ree's weight loss each week is $1\\%$ of his weight at the end of the previous week. This means that at the end of each week, he retains $99\\%$ of his weight from the previous week. \n\nIf we denote his weight at the beginning as \\( W_0 = 244 \\) pounds, then his weight at the end of week \\( n \\) can be expressed as:\n\\[\nW_n = W_{n-1} \\times 0.99\n\\]\nThis can be simplified to:\n\\[\nW_n = W_0 \\times (0.99)^n\n\\]\nAfter 10 weeks, his weight will be:\n\\[\nW_{10} = 244 \\times (0.99)^{10}\n\\]\n\nNow, we calculate \\( (0.99)^{10} \\):\n\\[\n(0.99)^{10} \\approx 0.904382\n\\]\n\nNow, we can calculate his expected weight after 10 weeks:\n\\[\nW_{10} \\approx 244 \\times 0.904382 \\approx 220.5\n\\]\n\nRounding to the nearest whole number, Cal O'Ree can expect to weigh approximately \\( 221 \\) pounds at the end of the ten weeks.", answer='221' )
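The arithmetic in the prediction above is easy to check directly. The short sketch below recomputes $244 \times 0.99^{10}$; the variable names are chosen here for illustration. Direct computation gives roughly 220.67 rather than the 220.5 quoted in the model's reasoning, and both values round to the same final answer.

```python
start_weight = 244.0      # pounds, taken from the question
weekly_retention = 0.99   # losing 1% per week keeps 99% of the previous weight
weeks = 10

final_weight = start_weight * weekly_retention ** weeks
print(f"{final_weight:.2f}")   # 220.67
print(round(final_weight))     # 221
```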
Next, let's set up an evaluator for the zero-shot module above, before prompt optimization.
THREADS = 24
kwargs = dict(num_threads=THREADS, display_progress=True, display_table=5)
evaluate = dspy.Evaluate(devset=dataset.dev, metric=dataset.metric, **kwargs)
evaluate(module)
Average Metric: 259.00 / 350 (74.0%): 100%|██████████| 350/350 [01:30<00:00, 3.85it/s]
2024/11/28 18:41:55 INFO dspy.evaluate.evaluate: Average Metric: 259 / 350 (74.0%)
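 | question | example_reasoning | example_answer | pred_reasoning | pred_answer | method |
---|---|---|---|---|---|---|
0 | What is the smallest integer value of $c$ such that the function $... | The given function has a domain of all real numbers if and only if... | 1 | To determine the smallest integer value of \( c \) such that the function f... | 1 | ✔️ [True] |
1 | What is the least value of $x$ that is a solution of $|{-x+3}|=7$? | In order to have $|{-x+3}| = 7$, we must have $-x + 3 = 7$ or $-x ... | -4 | To solve the equation \( |{-x+3}|=7 \), we need to consider... | -4 | ✔️ [True] |
2 | Evaluate $\left\lceil -\frac{7}{4}\right\rceil$. | $-\frac{7}{4}$ is between $-1$ and $-2$, so $\left\lceil -\frac{7}... | -1 | To evaluate \(\left\lceil -\frac{7}{4}\right\rceil\), we first need... | -1 | ✔️ [True] |
3 | A triangle has vertices at coordinates $(11,1)$, $(2,3)$ and $(3,7... | We must find the distance between each pair of points by using the formula... | 10 | To find the length of the longest side of the triangle with vertices... | 10 | ✔️ [True] |
4 | Let $f(x) = x + 2$ and $g(x) = 1/f(x)$. What is $g(f(-3))$? | First, we find that $f(-3) = (-3) + 2 = -1$. Then, $$g(f(-3)) = g(... | 1 | To find \( g(f(-3)) \), we first need to evaluate \( f(-3) \). Using... | 1 | ✔️ [True] |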
74.0
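The reported score is the percentage of dev examples the metric marks as correct: 259 of 350 gives 74.0. As a rough mental model of what a threaded evaluator does, here is a self-contained sketch. It is not DSPy's implementation; `exact_match`, `evaluate_program`, and the toy data are hypothetical stand-ins, and the real `dataset.metric` may compare answers more leniently than a string match.

```python
from concurrent.futures import ThreadPoolExecutor

def exact_match(gold, predicted):
    """Hypothetical metric: 1.0 when the stripped answers agree, else 0.0."""
    return float(gold.strip() == predicted.strip())

def evaluate_program(program, devset, metric, num_threads=4):
    """Score `program` on each (question, gold) pair in parallel; return percent correct."""
    def score(pair):
        question, gold = pair
        return metric(gold, program(question))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score, devset))
    return round(100 * sum(scores) / len(scores), 2)

# Toy stand-in for an LM program: a lookup table with one wrong answer.
toy_answers = {"1+1": "2", "2+3": "5", "10-4": "7", "3*3": "9"}
toy_devset = [("1+1", "2"), ("2+3", "5"), ("10-4", "6"), ("3*3", "9")]

print(evaluate_program(toy_answers.get, toy_devset, exact_match))  # 75.0
```

Threads help here because each call to a real program mostly waits on network I/O to the LM provider.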
Tracking Evaluation Results in MLflow Experiment
To track and visualize evaluation results over time, you can record them in an MLflow Experiment.
import mlflow
# Start an MLflow Run to record the evaluation
with mlflow.start_run(run_name="math_evaluation"):
    kwargs = dict(num_threads=THREADS, display_progress=True, return_all_scores=True, return_outputs=True)
    evaluate = dspy.Evaluate(devset=dataset.dev, metric=dataset.metric, **kwargs)
    # Evaluate the program as usual
    aggregated_score, outputs, all_scores = evaluate(module)
    # Log the aggregated score
    mlflow.log_metric("correctness", aggregated_score)
    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "Question": [example.question for example in dataset.dev],
            "Gold Answer": [example.answer for example in dataset.dev],
            "Predicted Answer": outputs,
            "Correctness": all_scores,
        },
        artifact_file="eval_results.json",
    )
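To learn more about the integration, visit the MLflow DSPy Documentation.
Finally, let's optimize our module. Since we want strong reasoning, we'll use the large GPT-4o as the teacher model (used to bootstrap reasoning for the small LM at optimization time), but not as the prompt model (used to craft instructions) or the task model (the model being trained).
GPT-4o will be invoked only a small number of times. The model involved directly in optimization, and in the resulting (optimized) program, will be GPT-4o-mini.
We will also specify `max_bootstrapped_demos=4`, which means we want at most four bootstrapped examples in the prompt, and `max_labeled_demos=4`, which means that, in total between bootstrapped and pre-labeled examples, we want at most four.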
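To make the two limits concrete, here is a small sketch of the budget as described in the paragraph above: bootstrapped demos are capped by `max_bootstrapped_demos`, and the combined total is capped by `max_labeled_demos`. The function `choose_demos` is hypothetical and illustrates only that reading; the optimizer's actual candidate generation is not shown here and may well differ.

```python
def choose_demos(bootstrapped, labeled, max_bootstrapped_demos=4, max_labeled_demos=4):
    """Hypothetical selection: bootstrapped demos first, then labeled ones up to the total cap.

    Assumption for illustration: `max_labeled_demos` acts as the cap on the combined total.
    """
    total_cap = max_labeled_demos
    chosen = list(bootstrapped[:min(max_bootstrapped_demos, total_cap)])
    remaining = max(total_cap - len(chosen), 0)
    chosen += list(labeled[:remaining])
    return chosen

print(choose_demos(["b1", "b2"], ["l1", "l2", "l3"]))
# ['b1', 'b2', 'l1', 'l2']
```

With five bootstrapped demos available and both limits at four, the same function would return four bootstrapped demos and no labeled ones.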
kwargs = dict(num_threads=THREADS, teacher_settings=dict(lm=gpt4o), prompt_model=gpt4o_mini)
optimizer = dspy.MIPROv2(metric=dataset.metric, auto="medium", **kwargs)
kwargs = dict(requires_permission_to_run=False, max_bootstrapped_demos=4, max_labeled_demos=4)
optimized_module = optimizer.compile(module, trainset=dataset.train, **kwargs)
evaluate(optimized_module)
Average Metric: 310.00 / 350 (88.6%): 100%|██████████| 350/350 [01:31<00:00, 3.84it/s]
2024/11/28 18:59:19 INFO dspy.evaluate.evaluate: Average Metric: 310 / 350 (88.6%)
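 | question | example_reasoning | example_answer | pred_reasoning | pred_answer | method |
---|---|---|---|---|---|---|
0 | What is the smallest integer value of $c$ such that the function $... | The given function has a domain of all real numbers if and only if... | 1 | The function \( f(x) = \frac{x^2 + 1}{x^2 - x + c} \) will have... | 1 | ✔️ [True] |
1 | What is the least value of $x$ that is a solution of $|{-x+3}|=7$? | In order to have $|{-x+3}| = 7$, we must have $-x + 3 = 7$ or $-x ... | -4 | The equation \( |{-x+3}|=7 \) implies two possible cases: 1. \(-x ... | -4 | ✔️ [True] |
2 | Evaluate $\left\lceil -\frac{7}{4}\right\rceil$. | $-\frac{7}{4}$ is between $-1$ and $-2$, so $\left\lceil -\frac{7}... | -1 | To evaluate \(\left\lceil -\frac{7}{4}\right\rceil\), we first need... | -1 | ✔️ [True] |
3 | A triangle has vertices at coordinates $(11,1)$, $(2,3)$ and $(3,7... | We must find the distance between each pair of points by using the formula... | 10 | To find the lengths of the sides of the triangle formed by the vertices... | 10 | ✔️ [True] |
4 | Let $f(x) = x + 2$ and $g(x) = 1/f(x)$. What is $g(f(-3))$? | First, we find that $f(-3) = (-3) + 2 = -1$. Then, $$g(f(-3)) = g(... | 1 | To find \( g(f(-3)) \), we first need to evaluate \( f(-3) \). Using... | 1 | ✔️ [True] |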
88.57
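Neat. It was fairly straightforward to improve quality on the held-out set from 74% to over 88%.
That said, for reasoning tasks like this, you will often want to consider more advanced strategies, such as:
- A `dspy.ReAct` module with access to a calculator function or `dspy.PythonInterpreter`
- Ensembling multiple optimized prompts with a majority vote (or an Aggregator module) on top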
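The majority-vote idea in the second bullet can be sketched in plain Python. This is a minimal illustration, not a DSPy API: `majority_vote` and `ensemble_answer` are hypothetical names, and each "program" is assumed to be any callable that maps a question string to an answer string.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer after stripping whitespace; ties go to the earliest seen."""
    normalized = [a.strip() for a in answers if a and a.strip()]
    if not normalized:
        raise ValueError("no answers to vote on")
    return Counter(normalized).most_common(1)[0][0]

def ensemble_answer(programs, question):
    """Ask every program the same question and return the majority answer."""
    return majority_vote([program(question) for program in programs])

# Three toy stand-ins for separately optimized programs.
programs = [lambda q: "221", lambda q: "220", lambda q: " 221 "]
print(ensemble_answer(programs, "How much does Cal weigh after ten weeks?"))  # 221
```

Voting only helps when the programs' errors are not strongly correlated, so in practice the ensemble members would come from different optimization runs or prompts.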
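To see what changed, let's inspect the optimized prompt. Alternatively, if you enabled MLflow Tracing following the instructions above, you can compare the prompts before and after optimization in the rich trace UI.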
dspy.inspect_history()
[2024-11-28T18:59:19.176586]

System message:
Your input fields are:
1. `question` (str)
Your output fields are:
1. `reasoning` (str)
2. `answer` (str)
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## question ## ]]
{question}
[[ ## reasoning ## ]]
{reasoning}
[[ ## answer ## ]]
{answer}
[[ ## completed ## ]]
In adhering to this structure, your objective is: Analyze the `question` provided, and systematically apply mathematical reasoning to derive the `answer`. Ensure to articulate each step of your thought process in the `reasoning` field, detailing how you identify relationships and formulate equations to arrive at the solution.

User message:
[[ ## question ## ]]
If $|4x+2|=10$ and $x<0$, what is the value of $x$?
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:
[[ ## reasoning ## ]]
The equation $|4x+2|=10$ implies two possible equations: $4x+2=10$ or $4x+2=-10$. We solve each equation separately. 1. For $4x+2=10$: \[ 4x + 2 = 10 \\ 4x = 8 \\ x = 2 \] However, $x=2$ does not satisfy the condition $x<0$. 2. For $4x+2=-10$: \[ 4x + 2 = -10 \\ 4x = -12 \\ x = -3 \] The solution $x = -3$ satisfies the condition $x<0$. Therefore, the value of $x$ is $\boxed{-3}$.
[[ ## answer ## ]]
-3
[[ ## completed ## ]]

User message:
[[ ## question ## ]]
Given that $-4$ is a solution to $x^2 + bx -36 = 0$, what is the value of $b$?
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:
[[ ## reasoning ## ]]
The product of the roots of this quadratic is $-36/1=-36$, so the other solution must be $-36/-4=9$. That means that the sum of the solutions is $-4+9=5$. The sum of the solutions is also $-b/1=-b$. Thus, $-b=5$ and $b=\boxed{-5}$.
[[ ## answer ## ]]
-5
[[ ## completed ## ]]

User message:
[[ ## question ## ]]
Richard is building a rectangular playground from 200 feet of fencing. The fencing must entirely enclose the playground. What is the maximum area of this playground?
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:
[[ ## reasoning ## ]]
Let the length of the playground be $l$ and the width be $w$. We have the equation $2l+2w=200 \Rightarrow l + w = 100$. We want to maximize the area of this rectangular playground, which is given by $lw$. From our equation, we know that $l=100-w$. Substituting this in to our expression for area, we have \[(100-w)(w)=100w-w^2\]We will now complete the square to find the maximum value of this expression. Factoring a $-1$ out, we have \[-(w^2-100w)\]In order for the expression inside the parenthesis to be a perfect square, we need to add and subtract $(100/2)^2=2500$ inside the parenthesis. Doing this, we get \[-(w^2-100w+2500-2500) \Rightarrow -(w-50)^2+2500\]Since the maximum value of $-(w-50)^2$ is 0 (perfect squares are always nonnegative), the maximum value of the entire expression is 2500, which is achieved when $w=50$ and $l=100-w=50$ (the playground is a square). Thus, the maximum area of the playground is $\boxed{2500}$ square feet.
[[ ## answer ## ]]
2500
[[ ## completed ## ]]

User message:
[[ ## question ## ]]
Alice and Bob are playing a game. Alice starts first. On Alice's turn, she flips a coin. If she gets a heads, she wins. If not, it becomes Bob's turn. On Bob's turn, he flips a coin. If he gets a tails, he wins. If not, it becomes Alice's turn. What is the probability that Alice wins the game?
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:
[[ ## reasoning ## ]]
Alice has a $1/2$ chance of winning the game on her first turn. If she doesn't, then the probability that she wins the game on her second turn is $1/8,$ since she must not win on her first flip ($1/2$ chance), Bob must not win on his first flip ($1/2$ chance), and then Alice must win on her second flip ($1/2$ chance). The probability that she wins the game on her third turn is $1/32,$ and in general, the probability that she wins the game on her $k^\text{th}$ turn is $(1/2)^{2k-1}.$ Thus, the probability that Alice wins is an infinite geometric series with first term $1/2$ and common ratio $1/4.$ So, the probability that Alice wins the game is $$\frac{\frac12}{1-\frac14} = \boxed{\frac{2}{3}}.$$OR Note that the only difference between the odds of Alice or Bob winning is who goes first. Because Bob goes second, the odds of him winning on his $k^\text{th}$ flip is half of the odds that Alice wins on her $k^\text{th}$ flip, since Alice must first get a tails before Bob gets a chance to win. Thus, if $a$ is Alice's chance of winning, and $b$ is Bob's chance of winning, then $a = 2b.$ Also, since someone must win, $a + b = 1.$ It follows that $a = 2/3$ and $b = 1/3,$ so Alice has a $\boxed{\frac{2}{3}}$ chance of winning the game.
[[ ## answer ## ]]
\frac{2}{3}
[[ ## completed ## ]]

User message:
[[ ## question ## ]]
Find the smallest possible value of $$\frac{(y-x)^2}{(y-z)(z-x)} + \frac{(z-y)^2}{(z-x)(x-y)} + \frac{(x-z)^2}{(x-y)(y-z)},$$ where $x,y,$ and $z$ are distinct real numbers.
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Response:
[[ ## reasoning ## ]]
Let \( S = \frac{(y-x)^2}{(y-z)(z-x)} + \frac{(z-y)^2}{(z-x)(x-y)} + \frac{(x-z)^2}{(x-y)(y-z)} \). To find the minimum value of \( S \), we can use the method of symmetry and the Cauchy-Schwarz inequality. By applying the Cauchy-Schwarz inequality, we have: \[ \left( \frac{(y-x)^2}{(y-z)(z-x)} + \frac{(z-y)^2}{(z-x)(x-y)} + \frac{(x-z)^2}{(x-y)(y-z)} \right) \left( (y-z)(z-x) + (z-x)(x-y) + (x-y)(y-z) \right) \geq \left( (y-x) + (z-y) + (x-z) \right)^2. \] The right-hand side simplifies to: \[ (y-x) + (z-y) + (x-z) = 0, \] which means the right-hand side is \( 0 \). Therefore, we have: \[ S \cdot \left( (y-z)(z-x) + (z-x)(x-y) + (x-y)(y-z) \right) \geq 0. \] This implies that \( S \) is non-negative. Next, we can test specific values for \( x, y, z \) to find the minimum. Let's set \( x = 0, y = 1, z = 2 \): \[ S = \frac{(1-0)^2}{(1-2)(2-0)} + \frac{(2-1)^2}{(2-0)(0-1)} + \frac{(0-2)^2}{(0-1)(1-2)}. \] Calculating each term: 1. The first term: \[ \frac{1^2}{(-1)(2)} = \frac{1}{-2} = -\frac{1}{2}. \] 2. The second term: \[ \frac{1^2}{(2)(-1)} = \frac{1}{-2} = -\frac{1}{2}. \] 3. The third term: \[ \frac{(-2)^2}{(-1)(-1)} = \frac{4}{1} = 4. \] Thus, \[ S = -\frac{1}{2} - \frac{1}{2} + 4 = 3. \] To check if this is indeed the minimum, we can also consider the case when \( x, y, z \) are the roots of a quadratic polynomial. By symmetry and the nature of the expression, we can conclude that the minimum value of \( S \) is indeed \( 3 \). Therefore, the smallest possible value of \( S \) is \( \boxed{3} \).
[[ ## answer ## ]]
3
[[ ## completed ## ]]
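The prompt above delimits every field with a `[[ ## name ## ]]` marker, which makes completions straightforward to split back into fields. The sketch below shows one way to do that with a regular expression. It is an illustration written for this tutorial, not DSPy's own adapter code; `parse_fields` is a hypothetical name.

```python
import re

# Matches markers such as "[[ ## answer ## ]]" and captures the field name.
FIELD_MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")

def parse_fields(completion):
    """Split a completion into {field_name: text}, dropping the `completed` end marker."""
    parts = FIELD_MARKER.split(completion)
    # With one capture group, split() yields [preamble, name1, text1, name2, text2, ...].
    names, texts = parts[1::2], parts[2::2]
    return {name: text.strip() for name, text in zip(names, texts) if name != "completed"}

sample = "[[ ## reasoning ## ]] Solve both cases. [[ ## answer ## ]] -3 [[ ## completed ## ]]"
print(parse_fields(sample))
# {'reasoning': 'Solve both cases.', 'answer': '-3'}
```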