
    What are tokens in AI image generation? (BetterWaifu & Stable Diffusion)


    By gerogero

    Updated: March 11, 2025

    The Power of Words

    The words we use in prompting are called tokens. Each token has its own power, which depends on its frequency of occurrence in the dataset used for training the AI. We’ll get into the technical details in a bit, but first we’ll focus on how to think about tokens practically.

Let’s take a look at an example: I prompt “1girl wearing white oversized coat with >< and outstretched arms”. If each word is a token, we have 11 tokens.

(When typing this prompt into BetterWaifu, you’ll notice the token counter displays 13. This is because <start> and <end> tokens are added by the AI under the hood. More on this in the next section.)

Prepositions like “with,” “for,” “at,” and “in,” and particles like “and,” “a,” and “to” count as tokens. Thus, they also have power and influence the image.

Let’s try the same prompt without any particles or prepositions: “1girl, white oversized coat, ><, outstretched arms”. Commas count as tokens, so we are still at 11 tokens, 13 with the start and end tokens. The result is quite different:
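To make the counting concrete, here is a toy tokenizer sketch. It only splits on whitespace and commas, which is an illustrative assumption: real Stable Diffusion models use a CLIP byte-pair-encoding tokenizer, which can split a single word (such as “1girl”) into multiple tokens, so real counts will often be higher.

```python
import re

def toy_tokenize(prompt):
    """Toy tokenizer: each run of non-space, non-comma characters is one
    token, and each comma is its own token. <start>/<end> are added the
    way the generator does under the hood."""
    words = re.findall(r"[^\s,]+|,", prompt)
    return ["<start>"] + words + ["<end>"]

tokens = toy_tokenize("1girl, white oversized coat, ><, outstretched arms")
print(len(tokens))  # 10 word/comma tokens + <start> + <end> = 12
```

Under this toy scheme the comma-separated prompt comes to 12 tokens; a real BPE tokenizer that splits “1girl” into “1” + “girl” would land on 13, matching the counter.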

I like this result better because it looks like more attention was put on the coat. The comma token demarcates different concepts.

    Now, while using comma-separated tags is my recommended approach to prompting on BetterWaifu, that doesn’t mean removing all prepositions and particles always makes the result better. Sometimes, it’s important to use prepositions to indicate relative positions.

    Here’s a simple example of when a preposition makes all the difference:

    1girl, airplane, white oversized coat, ><, outstretched arms
    1girl inside airplane, white oversized coat, ><, outstretched arms

    In addition to a token’s inherent strength, its position in the prompt is also weighted. Tokens at the beginning have greater weight than tokens at the end. It’s important to understand this, as a weak token at the end of the prompt may have no impact on the image. Conversely, a strong token at the beginning can completely determine the outcome.

To control the strength of a token, you can use the construction (token:1.0), where the number represents the strength of the token: 0 means no influence, 1 is normal weight. I usually don’t go past 1.5. Experimenting with different strength values can help you fine-tune the level of control each token has over your prompts.
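A minimal sketch of how a UI might parse this (token:weight) emphasis syntax. This is an assumption for illustration, not BetterWaifu’s actual parser; real implementations also handle nesting, escaping, and bare parentheses.

```python
import re

def parse_weighted(prompt):
    """Split a prompt into (text, weight) pairs. Spans written as
    (token:1.3) get the given weight; everything else defaults to 1.0."""
    out = []
    pos = 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            out.append((before, 1.0))
        out.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        out.append((tail, 1.0))
    return out

print(parse_weighted("1girl, (white oversized coat:1.3), outstretched arms"))
# [('1girl', 1.0), ('white oversized coat', 1.3), ('outstretched arms', 1.0)]
```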

    Technical Explanation

Tokenizing is a common way to handle text data in AI generation. It converts text into numbers that neural networks can process.

    Stable Diffusion tokenizes a text prompt into a sequence of tokens. For example, it splits the text prompt a cute and adorable bunny into the tokens a, cute, and, adorable, and bunny. Then Stable Diffusion adds <start> and <end> tokens at the beginning and the end of the tokens.

    The resulting token sequence for the above example would be <start>, a, cute, and, adorable, bunny, and <end> (7 tokens).

For easier computation, Stable Diffusion pads or truncates every prompt’s token sequence to a fixed length of 77. If the input prompt has fewer than 77 tokens, <end> tokens are added under the hood until the sequence reaches 77 tokens.
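The fixed-length step above can be sketched in a few lines. The start/end ids (49406/49407) are those commonly used by CLIP tokenizer implementations; the ids in between are hypothetical placeholders for illustration.

```python
START_ID, END_ID = 49406, 49407  # <start>/<end> ids in common CLIP implementations

def pad_to_77(token_ids, pad_id=END_ID, max_len=77):
    """Pad with the <end>/pad id, or truncate, so every sequence is
    exactly 77 tokens long (sketch of the fixed-length conditioning)."""
    if len(token_ids) >= max_len:
        return token_ids[:max_len]
    return token_ids + [pad_id] * (max_len - len(token_ids))

# Hypothetical ids standing in for: <start> a cute and adorable bunny <end>
seq = [START_ID, 1, 2, 3, 4, 5, END_ID]
print(len(pad_to_77(seq)))  # 77
```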

The length of 77 was set to balance performance and computational efficiency. Different software behaves differently when more than 77 tokens are used:

    • The first 77 tokens are retained and the rest are cut out.
    • The entire prompt is broken into chunks of 75, start and end tokens are added, and each chunk is processed in order. This is the method BetterWaifu uses.
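The chunking strategy in the second bullet can be sketched as follows. This is an illustrative assumption of the general approach (75 content tokens per chunk, each wrapped with <start>/<end> to make 77), not BetterWaifu’s exact implementation.

```python
def chunk_prompt(token_ids, chunk_size=75):
    """Break a long token list into chunks of up to 75 tokens, wrapping
    each chunk with <start>/<end> so a full chunk is 77 tokens."""
    START, END = "<start>", "<end>"
    chunks = []
    for i in range(0, len(token_ids), chunk_size):
        chunks.append([START] + token_ids[i:i + chunk_size] + [END])
    return chunks

tokens = [f"t{i}" for i in range(100)]  # a 100-token prompt
chunks = chunk_prompt(tokens)
print([len(c) for c in chunks])  # [77, 27] — one full chunk, one partial
```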
