Mastering Image Generation with Google Nano Banana
The Rise of AI Image Generation
Generative AI for image creation has transformed how individuals and businesses produce visual content. These tools empower users to create specific visuals in seconds, bypassing the need for extensive design skills and accelerating tasks that traditionally took hours or days.
The market is filled with advanced image generation models like Stable Diffusion, Midjourney, DALL-E, and Google's own Imagen, each offering distinct advantages. Recently, Google made a significant leap forward with the release of Gemini 2.5 Flash Image, also known as nano-banana.
Image by Author | Gemini (nano-banana self portrait)
Introducing Google's Nano-Banana
Nano-banana is Google's state-of-the-art model for both generating and editing images. Its key features include creating highly realistic images, blending multiple images, maintaining consistent characters across different scenes, and applying targeted, prompt-based transformations. It offers a level of control that surpasses many previous models from Google and its competitors.
This guide will walk you through nano-banana's capabilities, demonstrating how to generate and edit images using both the Google AI Studio platform and the Gemini API in a Python environment.
Getting Started with Nano-Banana
To begin, you'll need a Google account to sign in to Google AI Studio. To use the Gemini API, you must also acquire an API key, which requires a paid plan.
For those who want to use the API with Python, install the required libraries with this command (Pillow provides the PIL module used in the examples below):
```bash
pip install google-genai pillow
```
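Rather than pasting the API key into source code, one common option is to read it from an environment variable. The sketch below assumes the key was exported under the name GEMINI_API_KEY; both that variable name and the helper `load_api_key` are illustrative choices for this example, not something the article or the SDK requires.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is an assumption for this sketch; use
    whatever name you exported your key under.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a readable message instead of an opaque auth error later
        raise RuntimeError(
            f"Set the {var_name} environment variable to your API key."
        )
    return key
```

With this in place, the client shown later in this guide could be created with `genai.Client(api_key=load_api_key())` instead of a hard-coded string.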
Once your account is ready, navigate to Google AI Studio and choose the gemini-2.5-flash-image-preview model, which is the official name for nano-banana.
Generating Your First Image
After selecting the model, you can start a new chat to generate an image. A key principle for achieving the best results is to describe the scene narratively, rather than just listing keywords. This descriptive approach helps the model better understand your vision.
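One way to stay in the habit of writing narrative prompts is to assemble them from a few named ingredients. The following is a minimal sketch of that idea; `build_narrative_prompt` and its parameter names are hypothetical and exist only for this illustration.

```python
def build_narrative_prompt(subject, setting, lighting, camera, mood,
                           style="photorealistic"):
    """Join scene ingredients into full sentences instead of a keyword list."""
    sentences = [
        f"A {style} image of {subject}.",
        f"The scene is set {setting}.",
        f"{lighting}.",
        f"Captured with {camera}.",
        f"The overall mood is {mood}.",
    ]
    return " ".join(sentences)


prompt = build_narrative_prompt(
    subject="an Indonesian batik artisan tracing a motif on indigo cloth",
    setting="on a breezy veranda",
    lighting="Late-morning window light rakes across the fabric",
    camera="an 85 mm lens at f/2",
    mood="focused, tactile, and proud",
)
print(prompt)
```

The output is plain text, so it can be pasted into the AI Studio text box or passed to the API unchanged. Hand-written prompts will usually read more naturally; a template like this mainly helps when generating many variations.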
In the AI Studio chat interface, you can enter your prompt in the text box.
Let's use a detailed prompt to generate a photorealistic image:
A photorealistic close-up portrait of an Indonesian batik artisan, hands stained with wax, tracing a flowing motif on indigo cloth with a canting pen. She works at a wooden table in a breezy veranda; folded textiles and dye vats blur behind her. Late-morning window light rakes across the fabric, revealing fine wax lines and the grain of the teak. Captured on an 85 mm at f/2 for gentle separation and creamy bokeh. The overall mood is focused, tactile, and proud.
Here is the generated image:
The resulting image is highly realistic and accurately reflects the detailed prompt. To generate a comparable image using Python (the output will vary from run to run), you can use the following code snippet:
```python
from io import BytesIO

from google import genai
from IPython.display import display
from PIL import Image

# Replace 'YOUR-API-KEY' with your actual API key
api_key = 'YOUR-API-KEY'
client = genai.Client(api_key=api_key)

prompt = (
    "A photorealistic close-up portrait of an Indonesian batik artisan, "
    "hands stained with wax, tracing a flowing motif on indigo cloth with "
    "a canting pen. She works at a wooden table in a breezy veranda; "
    "folded textiles and dye vats blur behind her. Late-morning window "
    "light rakes across the fabric, revealing fine wax lines and the grain "
    "of the teak. Captured on an 85 mm at f/2 for gentle separation and "
    "creamy bokeh. The overall mood is focused, tactile, and proud."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)

# Collect the raw bytes of every image part in the first candidate
image_parts = [
    part.inline_data.data
    for part in response.candidates[0].content.parts
    if part.inline_data
]

if image_parts:
    image = Image.open(BytesIO(image_parts[0]))
    # image.save('your_image.png')
    display(image)
```
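A response can contain text parts alongside image parts, or no image at all, so it can help to handle both cases when working outside a notebook. The sketch below is one possible way to do that. It relies only on the `candidates`, `content.parts`, `inline_data`, and `text` attributes used above, plus `inline_data.mime_type`; the helper name `save_image_parts`, the output directory, and the file naming are assumptions made for this example.

```python
import mimetypes
from pathlib import Path


def save_image_parts(response, out_dir="outputs", stem="image"):
    """Write every inline image in the first candidate to disk.

    Returns (saved_paths, text_notes) so the caller can also see any
    text the model returned instead of, or next to, an image.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, notes = [], []

    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return saved, notes

    content = candidates[0].content
    parts = (content.parts if content else None) or []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            # Pick the extension from the reported MIME type; fall back to .png
            mime = getattr(inline, "mime_type", None) or "image/png"
            ext = mimetypes.guess_extension(mime) or ".png"
            path = out / f"{stem}_{len(saved)}{ext}"
            path.write_bytes(inline.data)
            saved.append(path)
        elif getattr(part, "text", None):
            notes.append(part.text)
    return saved, notes
```

Calling `saved, notes = save_image_parts(response)` after the request returns the written file paths; if `saved` is empty, `notes` often explains why no image was produced.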
Advanced Image Editing and Manipulation
While nano-banana excels at generating images from scratch, its real power lies in its editing capabilities. Let's explore how to modify the image we just created.
Prompt-Based Editing
We can make a small change by adding reading glasses to the artisan with a simple prompt:
Using the provided image, place a pair of thin reading glasses gently on the artisan's nose while she draws the wax lines. Ensure reflections look realistic and the glasses sit naturally on her face without obscuring her eyes.
The model edits the original image while keeping everything else consistent:
To perform this edit in Python, you provide the base image along with the new prompt:
```python
from PIL import Image

# This code assumes 'client' has been configured from the previous step
base_image = Image.open('/path/to/your/photo.png')
edit_prompt = "Using the provided image, place a pair of thin reading glasses gently on the artisan's nose..."

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[edit_prompt, base_image],
)
```
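The snippet above stops at the response; the edited image is read from it the same way as in the generation step. As a rough sketch, the request and the extraction can be wrapped in one function. `edit_image` is a hypothetical helper name, and the function assumes the response has the same shape as in the earlier example.

```python
MODEL_ID = "gemini-2.5-flash-image-preview"


def edit_image(client, base_image, instruction, model=MODEL_ID):
    """Send one image plus an instruction; return the edited image bytes.

    Returns None when the response contains no image part.
    """
    response = client.models.generate_content(
        model=model,
        contents=[instruction, base_image],
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            return part.inline_data.data
    return None
```

The returned bytes can be opened with `Image.open(BytesIO(...))` exactly as in the first example, or written straight to a file.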
Character Consistency
Let's generate a new scene while keeping the same person. This time, she will be looking at the camera and smiling.
Generate a new and photorealistic image using the provided image as a reference for identity: the same batik artisan now looking up at the camera with a relaxed smile, seated at the same wooden table. Medium close-up, 85 mm look with soft veranda light, background jars subtly blurred.
The result maintains the character's identity in a new pose:
Let's try an even more significant change, where she presents a finished cloth:
Create a product-style image using the provided image as identity reference: the same artisan presenting a finished indigo batik cloth, arms extended toward the camera. Soft, even window light, 50 mm look, neutral background clutter.
Even with a completely different scene, the character remains consistent:
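In Python, one way to approach a series like this is to pass the same reference image with every scene prompt, so each result is anchored to the original portrait rather than to a previous output. The following is a sketch under that assumption; `render_scenes` is a hypothetical helper, and whether anchoring to the original gives better consistency than chaining outputs is something to verify by experiment.

```python
MODEL_ID = "gemini-2.5-flash-image-preview"


def render_scenes(client, reference_image, scene_prompts, model=MODEL_ID):
    """Generate one image per prompt, always passing the same reference.

    Returns a list of (prompt, image_bytes_or_None) pairs in input order.
    """
    results = []
    for prompt in scene_prompts:
        response = client.models.generate_content(
            model=model,
            contents=[prompt, reference_image],
        )
        # Take the first image part, or None if the model returned only text
        image_bytes = next(
            (
                part.inline_data.data
                for part in response.candidates[0].content.parts
                if part.inline_data
            ),
            None,
        )
        results.append((prompt, image_bytes))
    return results
```

Here `scene_prompts` would hold the two prompts above (the smiling portrait and the product-style shot), and `reference_image` the original artisan image opened with PIL.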
Style Transfer
Nano-banana can also transfer the style of an image. Let's change our photorealistic image into a watercolor painting.
Using the provided image as identity reference, recreate the scene as a delicate watercolor on cold-press paper: loose indigo washes for the cloth, soft bleeding edges on the floral motif, pale umbers for the table and background. Keep her pose holding the fabric, gentle smile, and round glasses; let the veranda recede into light granulation and visible paper texture.
The model successfully applies the new style while preserving the subject and composition:
Image Fusion
Finally, let's try fusing an object from one image into another. First, we'll generate an image of a hat:
Now, we'll use a prompt to place this hat on our artisan's head in the watercolor image:
Move the same woman and pose outdoors in open shade and place the straw hat from the product image on her head. Align the crown and brim to the head realistically; bow over her right ear (camera left), ribbon tails drifting softly with gravity. Use soft sky light as key with a gentle rim from the bright background. Maintain true straw and lace texture, natural skin tone, and a believable shadow from the brim over the forehead and top of the glasses. Keep the batik cloth and her hands unchanged. Keep the watercolor style unchanged.
This process merges the two images. You can do this in Python by providing both images and the fusion prompt:
```python
from PIL import Image

# This code assumes 'client' has been configured from the first step
base_image = Image.open('/path/to/your/photo.png')
hat_image = Image.open('/path/to/your/hat.png')
fusion_prompt = "Move the same woman and pose outdoors in open shade and place the straw hat..."

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[fusion_prompt, base_image, hat_image],
)
```
For best results, use no more than three input images; adding more can reduce output quality.
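The three-image guideline can be checked in code before a request is sent. The sketch below builds the `contents` list in the same order as the fusion example and refuses oversized inputs. `build_contents` and `MAX_INPUT_IMAGES` are hypothetical names, and the limit is treated here as the article's recommendation rather than a documented API constraint.

```python
MAX_INPUT_IMAGES = 3  # guideline from this article, assumed not enforced by the API


def build_contents(prompt, images, max_images=MAX_INPUT_IMAGES):
    """Return [prompt, *images], rejecting more images than the guideline."""
    images = list(images)
    if not images:
        raise ValueError("Provide at least one input image.")
    if len(images) > max_images:
        raise ValueError(
            f"Got {len(images)} images; keep it to {max_images} or fewer."
        )
    return [prompt, *images]
```

For the fusion above, `build_contents(fusion_prompt, [base_image, hat_image])` produces the same list that is passed as `contents`.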
Final Thoughts
Google's Gemini 2.5 Flash Image, or nano-banana, is a powerful new tool in the world of AI image generation. Its greatest strength lies in editing existing images, allowing for remarkable transformations while maintaining consistency across a series of visuals.
Experiment with the model yourself. Iteration is key, as the perfect image often comes after a few attempts and prompt refinements.