Date

a year ago

Size

960.39 MB

PhotoMaker V2: Improved ID fidelity and greater control over V1

Tutorial Introduction

PhotoMaker is an efficient portrait customization model open-sourced by a Tencent team in 2024. Given one or more portrait photos, it quickly generates customized photos in a chosen artistic style. Beyond personalized portraits, it can change a subject's age or gender and blend the features of several people into a new identity, which makes it a practical AI image generation tool. This tutorial covers PhotoMaker V2, which greatly improves identity consistency and controllability compared with V1.

The environment for this tutorial is already set up. You only need to enter one command to try the demo.

Major improvements in PhotoMaker V2

ID fidelity is further improved, especially for single-image input and Asian faces. Providing more face images still produces better results.
Integration with ControlNet, T2I-Adapter and IP-Adapter makes the generation process more controllable, and the research team provides corresponding scripts for reference. PhotoMaker V2 can also be combined with IP-Adapter-FaceID, InstantID or a character LoRA to achieve better ID consistency.
PhotoMaker V2 inherits the strengths of PhotoMaker V1, such as high-quality and diverse generation and strong text control. It also retains the earlier applications, such as bringing people in old photos or paintings to life, identity mixing, and changing age or gender.

Effect display

How to run

1. After cloning and starting the container, open the workspace

2. Create a new terminal and enter the command `bash run.sh`

3. Once port 8080 appears, click the link next to the API address on the right to open the demo

4. The website opens with the following interface

Upload the portrait image you want to use (you can upload multiple images).
Enter a prompt in English. The model generates images based on this prompt.

Note that the class word describing the subject must be followed by the trigger word `img`, for example `man img`, `woman img`, or `girl img`.
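The trigger-word rule can be checked before a prompt is submitted. The following is a minimal sketch, not the demo's own validation: `check_trigger_word` is a hypothetical helper that only looks at word order in the text, and rejecting prompts with more than one `img` is an assumption made for this sketch.

```python
import re

TRIGGER_WORD = "img"  # trigger word named in this tutorial


def check_trigger_word(prompt, trigger=TRIGGER_WORD):
    """Return the class word that precedes the trigger word, or raise ValueError.

    Hypothetical helper: it inspects plain words only and does not
    reproduce any tokenizer-level check the real demo may perform.
    """
    words = re.findall(r"[A-Za-z0-9']+", prompt.lower())
    positions = [i for i, word in enumerate(words) if word == trigger]
    if not positions:
        raise ValueError(f"prompt has no trigger word '{trigger}'")
    if len(positions) > 1:
        # Assumption of this sketch: exactly one trigger word is expected.
        raise ValueError(f"prompt has more than one trigger word '{trigger}'")
    if positions[0] == 0:
        raise ValueError(f"'{trigger}' must follow a class word such as 'man'")
    return words[positions[0] - 1]


if __name__ == "__main__":
    # The class word directly before the trigger word is reported.
    print(check_trigger_word("portrait photo of a woman img, soft lighting"))
```

A prompt such as "a man wearing a hat" raises `ValueError` here because the trigger word is missing.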

Select a style from the Style template list. Each style is a preset prompt.
Click Submit to generate the image.
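Since each style is a preset prompt, selecting one amounts to wrapping your own prompt in a template. The sketch below illustrates that idea under stated assumptions: the `STYLES` table, its style names, and its wording are hypothetical and are not the presets shipped with the demo.

```python
# Hypothetical style table: names and wording are illustrative only.
# Each entry is (positive template, style-specific negative prompt).
STYLES = {
    "(No style)": ("{prompt}", ""),
    "Example Cinematic": (
        "cinematic still of {prompt}, shallow depth of field, film grain",
        "cartoon, sketch, painting",
    ),
}


def apply_style(style_name, prompt, negative_prompt=""):
    """Merge the user's prompt and negative prompt into the chosen style."""
    # Unknown style names fall back to the pass-through style.
    template, style_negative = STYLES.get(style_name, STYLES["(No style)"])
    positive = template.replace("{prompt}", prompt)
    # Join the non-empty negative parts, style-specific terms first.
    negative = ", ".join(part for part in (style_negative, negative_prompt) if part)
    return positive, negative


if __name__ == "__main__":
    positive, negative = apply_style("Example Cinematic", "a man img", "blurry")
    print(positive)
    print(negative)
```

Note that the trigger word `img` stays inside the user's prompt, so it survives whichever template is applied.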

There are some examples at the bottom of the page. Click one to load it directly.

You can also adjust the advanced settings as needed. The parameters are described below.

Negative Prompt: Specifies features to avoid in the output. For example, entering “bad symmetry, bad quality, low quality, illustration, 3D, 2D, painting, cartoon, sketch, open mouth” tells the model to try to keep these features out of the generated images.
Number of sample steps: Controls how many denoising steps the model takes when generating an image. More steps generally produce higher-quality images because the model has more opportunities to refine the output, at the cost of longer generation time.
Style strength: Controls how strongly the selected style influences the output image. The higher the percentage, the stronger the style's influence.
Number of output images: Sets how many images the model generates in one run.
Guidance scale: Controls how strictly the model follows the prompt. A higher value makes the output match the prompt more closely, which may give more accurate but less varied results.
Seed: Initializes the random number generator and therefore affects the output. Setting a fixed seed makes results repeatable. Checking Randomize seed picks a new seed, and so produces a different image, each time.
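The seed behavior described above can be illustrated without the model. This is a minimal sketch using only Python's standard library: `pick_seed`, `fake_noise`, and the `MAX_SEED` bound are hypothetical names and values chosen for illustration, and `fake_noise` merely stands in for the random starting noise that an image generator draws.

```python
import random

MAX_SEED = 2**31 - 1  # assumed upper bound, for illustration only


def pick_seed(seed, randomize):
    """Return a fresh random seed if 'Randomize seed' is checked, else the given one."""
    return random.randint(0, MAX_SEED) if randomize else seed


def fake_noise(seed, n=4):
    """Stand-in for the generator's initial noise: n Gaussian samples from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    fixed = pick_seed(42, randomize=False)
    # The same seed reproduces the same noise, hence the same image.
    print(fake_noise(fixed) == fake_noise(fixed))
    # A randomized seed almost always gives different noise.
    print(fake_noise(pick_seed(42, randomize=True)))
```

To reproduce a result you like, note the seed that was used and enter it again with Randomize seed unchecked, keeping the other settings the same.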

Discussion and Exchange

🖌️ If you come across a high-quality project, please leave us a message to recommend it! We have also set up a tutorial exchange group. Scan the QR code and add the note [Tutorial Exchange] to join the group, discuss technical issues, and share your results.

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.



