Image Generation Prompt Iteration

This is based on a tweet from @ThorondorLLC.

The tweet is here.

This example takes an initial desired prompt and iteratively refines it until the generated image matches the desired prompt.
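The control flow described above can be sketched without any API calls. This is a minimal illustration, not the notebook's implementation: `refine_prompt`, `fake_generate`, and `fake_critique` are hypothetical names, and the stand-in functions only simulate an image generator and a critic.

```python
from typing import Callable, Tuple


def refine_prompt(
    desired_prompt: str,
    generate: Callable[[str], object],
    critique: Callable[[str, object, str], Tuple[bool, str]],
    max_iter: int = 5,
) -> str:
    """Revise the prompt until the critic reports a match or max_iter is hit."""
    current_prompt = desired_prompt
    for _ in range(max_iter):
        image = generate(current_prompt)
        matches, revised = critique(desired_prompt, image, current_prompt)
        if matches:
            break
        current_prompt = revised
    return current_prompt


# Hypothetical stand-ins so the loop can run offline.
def fake_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


def fake_critique(desired: str, image: object, current: str) -> Tuple[bool, str]:
    # Pretend the image matches once the prompt mentions fog.
    if "fog" in current:
        return True, current
    return False, current + " with fog"


final_prompt = refine_prompt("A quiet forest", fake_generate, fake_critique)
print(final_prompt)
```

Passing the generator and critic in as callables keeps the loop independent of any particular image model or language model.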

This is not how DSPy prompt optimization is normally used, but it is a good example of how to use multimodal DSPy.

A future upgrade would be to create a dataset of initial and final prompts and use it to optimize the prompt generation.
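One possible shape for such a dataset is a list of (initial prompt, final prompt) records stored as JSON Lines. This is only a sketch under that assumption: `PromptPair`, `to_jsonl`, and `from_jsonl` are hypothetical helpers, and the single record reuses the initial and final prompts from the run shown later on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PromptPair:
    initial_prompt: str
    final_prompt: str


def to_jsonl(pairs: List[PromptPair]) -> str:
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs)


def from_jsonl(text: str) -> List[PromptPair]:
    """Parse JSON Lines back into PromptPair records, skipping blank lines."""
    return [PromptPair(**json.loads(line)) for line in text.splitlines() if line.strip()]


pairs = [
    PromptPair(
        initial_prompt="A scene that's both peaceful and tense",
        final_prompt="A serene autumn scene with fog and shadows, capturing both peace and tension.",
    )
]
serialized = to_jsonl(pairs)
restored = from_jsonl(serialized)
print(restored[0].final_prompt)
```

Each record could later be turned into a training example, for instance with `dspy.Example(**asdict(pair)).with_inputs("initial_prompt")`.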

You can install DSPy via:

pip install -U dspy

For this example, we'll use Flux Pro from FAL. You can get an API key here.

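We will also need to install the FAL client, Pillow, and python-dotenv (the PyPI package that provides the `dotenv` module).

pip install fal-client pillow python-dotenv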

Now, let's import the necessary libraries and set up the environment:


# Optional
#os.environ["FAL_API_KEY"] = "your_fal_api_key"
#os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
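Once the keys are set, either in the cell above or through a `.env` file, a quick check can catch a missing key before any API call is made. This is a small sketch: `missing_env_vars` is a hypothetical helper, and the variable names are simply the ones from the commented-out cell above. Which names your installed clients actually read is worth confirming against their documentation.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in env."""
    return [name for name in names if not env.get(name)]


# Names taken from the optional cell above (an assumption, not verified here).
required = ["FAL_API_KEY", "OPENAI_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```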





import dspy

from PIL import Image
from io import BytesIO
import requests
import fal_client

from dotenv import load_dotenv
load_dotenv()

# import display
from IPython.display import display

lm = dspy.LM(model="gpt-4o-mini", temperature=0.5)
dspy.settings.configure(lm=lm)







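def generate_image(prompt):
    """Generate an image with Flux Pro on FAL and return it as a dspy.Image."""
    # Submit the request, then fetch its result by request id.
    request_id = fal_client.submit(
        "fal-ai/flux-pro/v1.1-ultra",
        arguments={
            "prompt": prompt
        },
    ).request_id

    result = fal_client.result("fal-ai/flux-pro/v1.1-ultra", request_id)
    url = result["images"][0]["url"]

    return dspy.Image.from_url(url)

def display_image(image):
    """Download a dspy.Image by URL and show it at 25% of its original size."""
    url = image.url
    # download the image and fail early on an HTTP error
    response = requests.get(url)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content))

    # display at 25% of original size
    display(image.resize((image.width // 4, image.height // 4)))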
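The lookup `result["images"][0]["url"]` raises a bare `KeyError` or `IndexError` if the response has no images. A more defensive version might look like the sketch below. `first_image_url` is a hypothetical helper, the response shape is assumed from the lookup above, and the sample URL is a placeholder.

```python
from typing import Any, Mapping


def first_image_url(result: Mapping[str, Any]) -> str:
    """Return the URL of the first image, with a clear error if it is absent."""
    images = result.get("images") or []
    if not images:
        raise ValueError("result contains no images")
    url = images[0].get("url")
    if not url:
        raise ValueError("first image has no 'url' field")
    return url


# Placeholder response with the assumed shape.
sample = {"images": [{"url": "https://example.com/image.jpg"}]}
print(first_image_url(sample))
```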





check_and_revise_prompt = dspy.Predict("desired_prompt: str, current_image: dspy.Image, current_prompt:str -> feedback:str, image_strictly_matches_desired_prompt: bool, revised_prompt: str")

initial_prompt = "A scene that's both peaceful and tense"
current_prompt = initial_prompt

max_iter = 5
for i in range(max_iter):
    print(f"Iteration {i+1} of {max_iter}")
    current_image = generate_image(current_prompt)
    result = check_and_revise_prompt(desired_prompt=initial_prompt, current_image=current_image, current_prompt=current_prompt)
    display_image(current_image)
    if result.image_strictly_matches_desired_prompt:
        break
    else:
        current_prompt = result.revised_prompt
        print(f"Feedback: {result.feedback}")
        print(f"Revised prompt: {result.revised_prompt}")

print(f"Final prompt: {current_prompt}")

Iteration 1 of 5

[generated image]

Feedback: The image depicts a peaceful autumn scene with people walking among colorful leaves, which aligns with the peaceful aspect of the prompt. However, it lacks any elements that convey tension, making it not fully representative of the desired prompt.
Iteration 2 of 5

Feedback: The image depicts a serene autumn scene with vibrant foliage and a calm river, which aligns well with the idea of peace. However, it lacks explicit elements that suggest underlying tension, making it less effective in conveying both aspects of the desired prompt.
Iteration 3 of 5

[generated image]

Feedback: The image depicts a serene autumn scene with warm colors and soft lighting, which aligns with the peaceful aspect of the desired prompt. However, it lacks elements that evoke tension or unease, making it not fully meet the requirement for a scene that is both peaceful and tense.
Iteration 4 of 5

Final prompt: A serene autumn scene with fog and shadows, capturing both peace and tension.

dspy.inspect_history(5)



[2025-01-17T11:36:55.947579]

System message:

Your input fields are:
1. `desired_prompt` (str)
2. `current_image` (Image)
3. `current_prompt` (str)

Your output fields are:
1. `feedback` (str)
2. `revised_prompt` (str)
3. `image_strictly_matches_desired_prompt` (bool)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## desired_prompt ## ]]
{desired_prompt}

[[ ## current_image ## ]]
{current_image}

[[ ## current_prompt ## ]]
{current_prompt}

[[ ## feedback ## ]]
{feedback}

[[ ## revised_prompt ## ]]
{revised_prompt}

[[ ## image_strictly_matches_desired_prompt ## ]]
{image_strictly_matches_desired_prompt}        # note: the value you produce must be True or False

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `desired_prompt`, `current_image`, `current_prompt`, produce the fields `feedback`, `revised_prompt`, `image_strictly_matches_desired_prompt`.


User message:

[[ ## desired_prompt ## ]]
A scene that's both peaceful and tense

[[ ## current_image ## ]]
<image_url: https://fal.media/files/panda/HLKaiKFc_sN_fwBBN1C_2_1d1a72732e294452afb056a26d6b6c96.jpg>

[[ ## current_prompt ## ]]
A scene that's both peaceful and tense

Respond with the corresponding output fields, starting with the field `[[ ## feedback ## ]]`, then `[[ ## revised_prompt ## ]]`, then `[[ ## image_strictly_matches_desired_prompt ## ]]` (must be formatted as a valid Python bool), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## feedback ## ]]
The image depicts a calm café scene with two individuals engaged in conversation, which conveys a sense of peace. However, the woman's posture and expression suggest an underlying tension, as she appears to be contemplative or distressed. This duality aligns well with the desired prompt of being both peaceful and tense.

[[ ## revised_prompt ## ]]
A scene that captures the duality of peace and tension in a café setting.

[[ ## image_strictly_matches_desired_prompt ## ]]
True

[[ ## completed ## ]]





[2025-01-17T11:37:55.458433]

System message:

Your input fields are:
1. `desired_prompt` (str)
2. `current_image` (Image)
3. `current_prompt` (str)

Your output fields are:
1. `feedback` (str)
2. `image_strictly_matches_desired_prompt` (bool)
3. `revised_prompt` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## desired_prompt ## ]]
{desired_prompt}

[[ ## current_image ## ]]
{current_image}

[[ ## current_prompt ## ]]
{current_prompt}

[[ ## feedback ## ]]
{feedback}

[[ ## image_strictly_matches_desired_prompt ## ]]
{image_strictly_matches_desired_prompt}        # note: the value you produce must be True or False

[[ ## revised_prompt ## ]]
{revised_prompt}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `desired_prompt`, `current_image`, `current_prompt`, produce the fields `feedback`, `image_strictly_matches_desired_prompt`, `revised_prompt`.


User message:

[[ ## desired_prompt ## ]]
A scene that's both peaceful and tense

[[ ## current_image ## ]]
<image_url: https://fal.media/files/elephant/p3M00ca786Ov5ITX8nXUs_7b815b2087e54c1fbd798b9b9e0c5f5e.jpg>

[[ ## current_prompt ## ]]
A scene that's both peaceful and tense

Respond with the corresponding output fields, starting with the field `[[ ## feedback ## ]]`, then `[[ ## image_strictly_matches_desired_prompt ## ]]` (must be formatted as a valid Python bool), then `[[ ## revised_prompt ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## feedback ## ]]
The image depicts a peaceful autumn scene with people walking among colorful leaves, which aligns with the peaceful aspect of the prompt. However, it lacks any elements that convey tension, making it not fully representative of the desired prompt.

[[ ## image_strictly_matches_desired_prompt ## ]]
False

[[ ## revised_prompt ## ]]
A serene autumn scene with elements that suggest underlying tension

[[ ## completed ## ]]





[2025-01-17T11:38:24.032318]

System message:

Your input fields are:
1. `desired_prompt` (str)
2. `current_image` (Image)
3. `current_prompt` (str)

Your output fields are:
1. `feedback` (str)
2. `image_strictly_matches_desired_prompt` (bool)
3. `revised_prompt` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## desired_prompt ## ]]
{desired_prompt}

[[ ## current_image ## ]]
{current_image}

[[ ## current_prompt ## ]]
{current_prompt}

[[ ## feedback ## ]]
{feedback}

[[ ## image_strictly_matches_desired_prompt ## ]]
{image_strictly_matches_desired_prompt}        # note: the value you produce must be True or False

[[ ## revised_prompt ## ]]
{revised_prompt}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `desired_prompt`, `current_image`, `current_prompt`, produce the fields `feedback`, `image_strictly_matches_desired_prompt`, `revised_prompt`.


User message:

[[ ## desired_prompt ## ]]
A scene that's both peaceful and tense

[[ ## current_image ## ]]
<image_url: https://fal.media/files/monkey/QhPlYYa5tob38190Gtn0t_17e963cacc1542d5977031aff954c718.jpg>

[[ ## current_prompt ## ]]
A serene autumn scene with elements that suggest underlying tension

Respond with the corresponding output fields, starting with the field `[[ ## feedback ## ]]`, then `[[ ## image_strictly_matches_desired_prompt ## ]]` (must be formatted as a valid Python bool), then `[[ ## revised_prompt ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## feedback ## ]]
The image depicts a serene autumn scene with vibrant foliage and a calm river, which aligns well with the idea of peace. However, it lacks explicit elements that suggest underlying tension, making it less effective in conveying both aspects of the desired prompt.

[[ ## image_strictly_matches_desired_prompt ## ]]
False

[[ ## revised_prompt ## ]]
A serene autumn scene with elements that evoke a sense of unease or foreboding

[[ ## completed ## ]]





[2025-01-17T11:38:46.424883]

System message:

Your input fields are:
1. `desired_prompt` (str)
2. `current_image` (Image)
3. `current_prompt` (str)

Your output fields are:
1. `feedback` (str)
2. `image_strictly_matches_desired_prompt` (bool)
3. `revised_prompt` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## desired_prompt ## ]]
{desired_prompt}

[[ ## current_image ## ]]
{current_image}

[[ ## current_prompt ## ]]
{current_prompt}

[[ ## feedback ## ]]
{feedback}

[[ ## image_strictly_matches_desired_prompt ## ]]
{image_strictly_matches_desired_prompt}        # note: the value you produce must be True or False

[[ ## revised_prompt ## ]]
{revised_prompt}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `desired_prompt`, `current_image`, `current_prompt`, produce the fields `feedback`, `image_strictly_matches_desired_prompt`, `revised_prompt`.


User message:

[[ ## desired_prompt ## ]]
A scene that's both peaceful and tense

[[ ## current_image ## ]]
<image_url: https://fal.media/files/panda/kNBH7FF1uLk06nXonN0Zr_256e1bf2f71c4dcc8eb94027f09f3452.jpg>

[[ ## current_prompt ## ]]
A serene autumn scene with elements that evoke a sense of unease or foreboding

Respond with the corresponding output fields, starting with the field `[[ ## feedback ## ]]`, then `[[ ## image_strictly_matches_desired_prompt ## ]]` (must be formatted as a valid Python bool), then `[[ ## revised_prompt ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## feedback ## ]]
The image depicts a serene autumn scene with warm colors and soft lighting, which aligns with the peaceful aspect of the desired prompt. However, it lacks elements that evoke tension or unease, making it not fully meet the requirement for a scene that is both peaceful and tense.

[[ ## image_strictly_matches_desired_prompt ## ]]
False

[[ ## revised_prompt ## ]]
A serene autumn scene that includes subtle elements of tension or foreboding, such as dark shadows or an unsettling atmosphere.

[[ ## completed ## ]]





[2025-01-17T11:39:03.586839]

System message:

Your input fields are:
1. `desired_prompt` (str)
2. `current_image` (Image)
3. `current_prompt` (str)

Your output fields are:
1. `feedback` (str)
2. `image_strictly_matches_desired_prompt` (bool)
3. `revised_prompt` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## desired_prompt ## ]]
{desired_prompt}

[[ ## current_image ## ]]
{current_image}

[[ ## current_prompt ## ]]
{current_prompt}

[[ ## feedback ## ]]
{feedback}

[[ ## image_strictly_matches_desired_prompt ## ]]
{image_strictly_matches_desired_prompt}        # note: the value you produce must be True or False

[[ ## revised_prompt ## ]]
{revised_prompt}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `desired_prompt`, `current_image`, `current_prompt`, produce the fields `feedback`, `image_strictly_matches_desired_prompt`, `revised_prompt`.


User message:

[[ ## desired_prompt ## ]]
A scene that's both peaceful and tense

[[ ## current_image ## ]]
<image_url: https://fal.media/files/penguin/AXRlgHfIxbZ26Ipgd64Rn_d012474791904e70bd419945f7179980.jpg>

[[ ## current_prompt ## ]]
A serene autumn scene that includes subtle elements of tension or foreboding, such as dark shadows or an unsettling atmosphere.

Respond with the corresponding output fields, starting with the field `[[ ## feedback ## ]]`, then `[[ ## image_strictly_matches_desired_prompt ## ]]` (must be formatted as a valid Python bool), then `[[ ## revised_prompt ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## feedback ## ]]
The image depicts a foggy forest scene with autumn leaves, creating a peaceful atmosphere. However, the overall mood is somewhat tense due to the fog and shadows, aligning well with the desired prompt of being both peaceful and tense. 

[[ ## image_strictly_matches_desired_prompt ## ]]
True

[[ ## revised_prompt ## ]]
A serene autumn scene with fog and shadows, capturing both peace and tension.

[[ ## completed ## ]]
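The responses in the history above use `[[ ## field ## ]]` headers to separate output fields. DSPy parses these internally. The sketch below is not DSPy's implementation, only an illustration of how such text can be split into fields. `parse_fields` is a hypothetical helper, and the sample response abbreviates the feedback from the last entry above.

```python
import re
from typing import Dict

# A header line such as "[[ ## feedback ## ]]".
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)


def parse_fields(text: str) -> Dict[str, str]:
    """Map each field name to the text between its header and the next one."""
    fields: Dict[str, str] = {}
    headers = list(FIELD_HEADER.finditer(text))
    for i, header in enumerate(headers):
        name = header.group(1)
        if name == "completed":
            break  # end-of-response marker, not a field
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        fields[name] = text[header.end():end].strip()
    return fields


response = """[[ ## feedback ## ]]
The image depicts a foggy forest scene with autumn leaves, creating a peaceful atmosphere.

[[ ## image_strictly_matches_desired_prompt ## ]]
True

[[ ## revised_prompt ## ]]
A serene autumn scene with fog and shadows, capturing both peace and tension.

[[ ## completed ## ]]
"""

parsed = parse_fields(response)
matches_prompt = parsed["image_strictly_matches_desired_prompt"] == "True"
print(parsed["revised_prompt"])
```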