Better AI Images
Chances are, if you're browsing an article online today, you're looking at an AI-generated image somewhere in that post. These images are typically clichéd and readily recognizable. They often feature a blue hue and depict scenes using robots or futuristic landscapes with brains and glowing cities, reminiscent of science fiction movies or modern cyberpunk themes.
Having your brand or content stand out with AI-generated artwork could be a key differentiator. Larger media companies learned this last year when a collaborative group built a platform called Better Images of AI to promote more engaging representations of AI. Today, our text-to-image models are far more sophisticated, yet some folks may not have put in the work ✨ to get the outputs they need, or they still stick to older tropes.
So for today, I'd like to share my latest experiment using a tool called Ideogram. They promote themselves as 'Helping People Become More Creative' and they do deliver on this promise, especially with images requiring properly spelled and stylized text. This service will become our illustrator agent. Our creative concept artist assistant will be built using OpenAI's latest Assistants API. Our goal is to develop an interactive copilot capable of leveraging human feedback and past experiences to seamlessly ideate and execute our creative requirements.
Architecture & Process
This post will focus on the agentic architecture of our workflow rather than a technical deep dive into the code or the Assistants API behind it. This API (still in beta) is a foundational shift from OpenAI's completions APIs, which required you to manage all messages & tool calls within an LLM's context window, AKA memory. However, feel free to explore the code on GitHub.
Take a moment and zoom into the image. This is not a formal sequence diagram. Each white box (except for the browser tool) represents an LLM-backed Assistant with its own thread. Similar to Custom GPTs, each Assistant can leverage tools such as File Search (retrieval), Code Interpreter, and custom tools (functions). Unlike Custom GPTs, Assistants can leverage Instructions (system prompt) up to 256,000 characters! Simply massive new capabilities y'all.
Why break this out into multiple assistants? Mainly because I am exploring various Panel of Experts architectures. However, there is a more practical answer here: attention via separation of responsibilities. Consider that both the Creative & Magic Prompts assistants need to embody a role with varying levels of knowledge about the customer. Yet only one needs to know about their brand, color schemes, and elements, all of which are required context to translate brand requirements into 3rd party prompts. That context could easily skew the Creative assistant's ability to be abstract or "think outside the box" when creating concepts & illustration instructions.
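To make the "thread as memory" idea concrete, here is a minimal sketch of a single chat turn. It assumes `client` is an instance of the official `openai` Node SDK with v4-style `beta.threads` methods, and the function name `askAssistant` is my own. It is an illustration of the pattern, not the code from the repo.

```javascript
// Sketch only: `client` is assumed to be an `openai` Node SDK instance
// (v4-style beta.threads methods). `askAssistant` is a hypothetical helper.
async function askAssistant(client, assistantId, threadId, text) {
  // Append only the new user message. Earlier turns already live in the
  // thread, so there is no message array to manage on our side.
  await client.beta.threads.messages.create(threadId, {
    role: "user",
    content: text,
  });

  // Run the assistant against the thread and wait for a terminal status.
  const run = await client.beta.threads.runs.createAndPoll(threadId, {
    assistant_id: assistantId,
  });
  if (run.status !== "completed") {
    throw new Error(`Run ended with status: ${run.status}`);
  }

  // Messages are listed newest first by default, so the first assistant
  // message found is the latest reply.
  const page = await client.beta.threads.messages.list(threadId);
  const reply = page.data.find((message) => message.role === "assistant");
  if (!reply) throw new Error("No assistant reply found in thread");

  // Keep only the text parts of the reply.
  return reply.content
    .filter((part) => part.type === "text")
    .map((part) => part.text.value)
    .join("\n");
}
```

With the older completions style you would instead keep every prior message in your own array and resend it on each call.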
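Here is a rough sketch of that separation. The section text and the names `sections`, `assistants`, and `buildInstructions` are all hypothetical placeholders; the point is only that brand context is assembled into one assistant's instructions and never reaches the other.

```javascript
// Hypothetical instruction sections. The real text would be much longer.
const sections = {
  creativeRole: "You are a creative concept artist. Think abstractly.",
  promptRole: "You turn concepts into on-brand Ideogram prompts.",
  brand: "Brand: hand-drawn heavy marker, red and yellow palette.",
};

// Each assistant lists only the sections it is allowed to see.
const assistants = {
  creative: ["creativeRole"],
  magicPrompts: ["promptRole", "brand"],
};

// Join an assistant's sections into one instructions string.
function buildInstructions(name) {
  return assistants[name].map((key) => sections[key]).join("\n\n");
}

console.log(buildInstructions("creative"));
console.log(buildInstructions("magicPrompts"));
```

Keeping the mapping in one place makes it easy to audit which assistant sees which context.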
Read more about the Experts.js framework and how to create your own Multi AI Agent System using OpenAI's new Assistants API.
Everything else is straightforward. The user interface here is a simple Node.js console session supporting an interactive chat. What might not be obvious is why we use a local browser executor. Ideogram, as amazing as it is, still lacks an API. So our agent will display your artwork using a macOS script to open the browser. Since the chat is ongoing, you can provide feedback on any or all Ideogram images, repeating this cycle for as long as it takes to be happy with your creative.
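As a simplified sketch of a local browser executor, the snippet below opens a URL with AppleScript's `open location` command via `osascript`. The function names are my own and the real script in the repo likely does more than this. It only covers the "open the browser" step on macOS.

```javascript
// Build the argument list for macOS `osascript`. `open location` is a
// standard AppleScript command that opens a URL in the default browser.
function openLocationArgs(url) {
  const parsed = new URL(url); // throws on malformed input
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`Refusing to open non-web URL: ${url}`);
  }
  // parsed.href is percent-encoded, so it holds no raw double quotes and
  // JSON.stringify simply wraps it in a quoted string literal.
  return ["-e", `open location ${JSON.stringify(parsed.href)}`];
}

// Run the script. The dynamic import works from both CommonJS and ESM.
// This only does something useful on macOS, where `osascript` exists.
async function openInBrowser(url) {
  const { execFile } = await import("node:child_process");
  return new Promise((resolve, reject) => {
    execFile("osascript", openLocationArgs(url), (error) =>
      error ? reject(error) : resolve()
    );
  });
}
```

Passing arguments as an array to `execFile` avoids shell quoting problems that a single command string would invite.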
Brand Identity
If you do not already have brand guidelines for your illustration needs, here are a few helpful ways to wrangle them together. The brand guidelines are part of our Magic Prompts assistant's instructions. This is where the assistant turns concepts & illustration details into on-brand magic prompts. Use descriptive color names that reflect your brand vs. their technical HEX or RGB values.
Here is a ChatGPT prompt that can help you talk about your brand identity and identify useful guidelines.
Most won't be needed. Try to stay high level. In the next section we will focus more on the artistic and illustration style needs.
Example
The unRemarkable.ai brand embodies simplicity, catering specifically to AI practitioners rather than focusing on the broader origins of the industry in machine learning or data science. It uses a basic color palette and a visual style that leverages visual metaphors. Here is a list of things you may need:
- Red Sharpie: #DF413F
- Yellow Sharpie: #FFC43C
- Hand-drawn with a heavy marker is preferred.
- Avoid paintbrush texture effects.
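If you keep guidelines like these as data, a small helper can render them into the Magic Prompts assistant's instructions. This is a sketch with hypothetical field names. It follows the tip above by emitting the descriptive color names and leaving the HEX values out of the prompt text.

```javascript
// Values copied from the example above. The field names are hypothetical.
const brand = {
  name: "unRemarkable.ai",
  colors: [
    { name: "Red Sharpie", hex: "#DF413F" },
    { name: "Yellow Sharpie", hex: "#FFC43C" },
  ],
  prefer: ["Hand-drawn with a heavy marker."],
  avoid: ["Paintbrush texture effects."],
};

// Render the guidelines as plain text for an assistant's instructions.
// Only color names are emitted; hex values stay in the data for reference.
function brandSection(b) {
  return [
    `Brand: ${b.name}`,
    `Colors: ${b.colors.map((color) => color.name).join(", ")}`,
    ...b.prefer.map((item) => `Prefer: ${item}`),
    ...b.avoid.map((item) => `Avoid: ${item}`),
  ].join("\n");
}

console.log(brandSection(brand));
```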
Brand Illustration Style
We will need a list of detailed illustration styles for your brand to include in our Magic Prompts assistant's instructions. If you are starting from scratch, here is a way you can turn your current artwork or any creative inspiration into descriptive guidelines. I'm going to use Anthropic's artwork as an example.
Describe with Ideogram
Earlier this month, Ideogram hit several new milestones. Included was the ability to describe your images, turning them into detailed prompts which presumably can be used to generate the same image with Ideogram.
Let's get a peek into a few of Anthropic's illustrations from their news posts. I'll feed these into Ideogram, capture their description, then render an image using that description as the prompt.
Anthropic | Ideogram Describe | Ideogram |
---|---|---|
A minimalist illustration of a hand holding a pink-colored circle. The hand is positioned on the left side of the image, and the circle is being gently held between the thumb and index finger. The background is a soft beige color, and the overall design is simplistic and elegant. | ||
A hand holding a clipboard with a checklist. The clipboard is set against a vibrant red background. The checklist contains three squiggly lines, each marked with a check. The hand appears to be in the process of checking off the first line. | ||
A simple, hand-drawn depiction of a rectangular frame. Within this frame, there are three white spheres hanging vertically. One of the spheres is distinctly colored in a shade of orange, making it stand out from the others. The frame is set against a plain, light gray background. | ||
Like most vision-capable LLMs, Ideogram is not capturing the conceptual details of each object needed to entirely recreate the source image. However, this is really good. Notice how it captured the colors and key descriptions? Also, in two cases it correctly inferred enough of the creative style (highlighted in bold) to almost capture Anthropic's visual brand. Take notes when you see patterns like these.
Describe with GPT-4V
Just like Ideogram, ChatGPT users who have access to GPT-4V, OpenAI's vision model, can upload several images and ask the following.
- Minimalist: The images employ a very minimal amount of shapes and colors.
- Modernist: The straightforward depiction with little to no embellishment.
- Geometric: Use of geometric forms to abstract real-life objects.
- Flat: The absence of shading or depth and the use of solid colors.
- Line Art: The artwork relies on the clever use of lines to outline & define.
Instruction Examples
Both the Creative and Magic Prompts agents are going to need feedback loops: examples of what makes a great concept & illustration or Ideogram prompt. Commonly called few-shot prompting or in-context learning, these help "fine tune" your agent's behavior over time. Start with the Creative assistant, whose job it is to come up with a concept, the creative thinking behind it, and a detailed illustration description.
Starting from Scratch
When starting out, it could be helpful to come up with a few examples manually based on styles you like. You can use Ideogram's or GPT-4V's describe capabilities to help you. Focus first on the Creative assistant's concept needs by writing very clear concept names, thinking and illustration descriptions. For example, using Anthropic's first image above.
- Concept: A hand removing a stone from the middle of a structured pile.
- Thinking: Like the puzzle game of Jenga, the hand is grabbing a stone which would cause the ones above it to fall if removed. This illustrates a basic concept of 'Safety' as the stones above could hurt your hand. Or it could illustrate the 'How' of safety and if done wrong could cause negative impacts in other areas.
- Illustration Description: A haphazardly stacked small pile of circular stones in the shape of a triangular pile. The stones get smaller as they are stacked up 3 or 4 high. The pile consists roughly of 7 to 9 stones of varying sizes. The stone being pulled out has a few others resting on top. An arm extends from the left side with a hand holding onto a stone in the middle of the triangular pile indicating some might fall when it is pulled from the pile.
Capturing good Ideograms
While chatting with the assistant, occasionally an amazing concept and illustration will surface. I'll use this Ideogram illustration I really liked when exploring the subject of a post tag called "Emergent Behavior".
When I saw the assistant come up with this idea, I captured the relevant bits and added them to the examples in the Creative assistant's instructions. Note how the concept and illustration description are abstract and lack brand context. That's the goal.
- Concept: Digital Vineyard
- Thinking: Depicts data as vine plants spreading across a digital landscape, symbolizing how information grows and intertwines, creating new pathways and connections, much like vines in a vineyard, representing the organic proliferation of digital networks.
- Illustration Description: This image features two rows of utility poles connected by multiple horizontal wires, with green vines and leaves intertwining with the wires. Glowing orbs representing data are interspersed among the leaves, giving the impression of lights along the wires.
Here is the magic prompt that was created for the creative concept. I added this one along with the concept above to the Magic Prompt assistant's instructions.
- Magic Prompt: A minimalist and abstract illustration, hand-drawn with bold, heavy strokes in black marker on a yellow background. Wires and poles stretch across the canvas, with vines in dark green carrying glowing data nodes, intertwining and expanding, depicting the organic spread of digital networks.
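Captured examples like this can be formatted and appended to each assistant's instructions. The sketch below uses shortened versions of the text above and hypothetical helper names. Note that the Creative example carries no magic prompt or brand details, while the Magic Prompts example pairs the concept with its prompt.

```javascript
// Abbreviated from the example above. Field names are hypothetical.
const captured = {
  concept: "Digital Vineyard",
  thinking: "Data as vines spreading across a digital landscape.",
  illustration: "Two rows of utility poles with vines and glowing orbs on the wires.",
  magicPrompt: "A minimalist, hand-drawn illustration in black marker on a yellow background.",
};

// Example for the Creative assistant: abstract, no brand context.
function creativeExample(e) {
  return [
    `- Concept: ${e.concept}`,
    `- Thinking: ${e.thinking}`,
    `- Illustration Description: ${e.illustration}`,
  ].join("\n");
}

// Example for the Magic Prompts assistant: concept in, on-brand prompt out.
function magicPromptExample(e) {
  return [
    `- Concept: ${e.concept}`,
    `- Illustration Description: ${e.illustration}`,
    `- Magic Prompt: ${e.magicPrompt}`,
  ].join("\n");
}

// Append an example block to existing instructions, separated by a blank line.
function appendExample(instructions, exampleText) {
  return `${instructions.trimEnd()}\n\n${exampleText}\n`;
}

console.log(appendExample("Examples:", creativeExample(captured)));
```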
Using the Creative Assistant
Simply run the npm command below in your terminal to start your creative copilot. Our demo code is written in such a way that any changes to the assistants or their instructions will recreate the underlying OpenAI Assistants, so you can get immediate feedback after adding new instructions such as brand guidelines or examples to learn from.
$ npm run assistant
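One way a demo like this could notice changed instructions is to compare a stable fingerprint of each assistant's config against the one saved from the previous run. This is my guess at a workable mechanism, not the repo's actual code, and the names `fingerprint` and `needsRecreate` are hypothetical.

```javascript
// Serialize with sorted keys so property order does not affect the result.
function fingerprint(value) {
  if (Array.isArray(value)) {
    return `[${value.map(fingerprint).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${fingerprint(value[key])}`)
      .join(",");
    return `{${body}}`;
  }
  // Primitives. String() covers undefined, which JSON.stringify cannot encode.
  return String(JSON.stringify(value));
}

// True when the config differs from what was last used to create the assistant.
function needsRecreate(config, savedFingerprint) {
  return fingerprint(config) !== savedFingerprint;
}

const saved = fingerprint({ name: "Creative", instructions: "v1" });
console.log(needsRecreate({ instructions: "v1", name: "Creative" }, saved)); // false
console.log(needsRecreate({ name: "Creative", instructions: "v2" }, saved)); // true
```

In practice the saved fingerprint would need to persist between runs, for example in a local file or the assistant's metadata.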
Tips & Improvements
Overall this is a very iterative process and I'm still learning to make this assistant work for me using less feedback. Here are some things I recommend if you are doing something similar.
- Look for artistic hints in the prompts generated that work for you. Adding them to your Magic Prompt assistant's instructions as brand guidelines or examples could help future iterations.
- When starting, your Ideogram prompts might not be magic ✨. Try setting Ideogram's "Magic Prompt" to on and see if it helps.
- Leverage Ideogram's editor to correct spelling mistakes using their remix feature with an 80-90% image weight.
- In our examples we focus on illustrations, but this process should work for any creative image type, for example realistic photography.
- Post process with your favorite image editor. Try not to make Ideogram do everything. For example, all my images are color corrected with Pixelmator's replace color feature.
- If you are not on a Mac, feel free to change my usage of AppleScript to something else. ⚠️ Ideogram uses MUI and I found it near impossible to automate their UI with JavaScript.
I'm certain there are numerous tools available for accomplishing this task, and I encourage you to share your methods in the comments. 💞
Other Examples
Here are a few Ideograms created by my assistant while working on this post.
Thanks everyone. Please let me know if you found this useful. I would love to hear from folks who are solving this type of problem with other tools. Remember, to get updates on future posts, you can sign up for my newsletter.