AI Generates Private and Coherent Synthetic Photo Albums
The Privacy Challenge in Modern AI
Differential privacy (DP) offers a powerful, mathematically rigorous guarantee that sensitive individual information within a dataset is protected during analysis. Since its inception nearly two decades ago, researchers have applied DP to everything from simple statistics to fine-tuning complex AI models. However, requiring organizations to privatize every single analytical tool they use can be a complex, burdensome, and error-prone process.
Generative AI models, such as Gemini, present a more streamlined solution. Instead of modifying each analysis method, these models can create a single, private synthetic version of an entire dataset. This synthetic data captures common patterns without containing unique details from any individual. By using a differentially private training algorithm like DP-SGD to fine-tune the generative model, we can ensure the resulting synthetic dataset is both private and highly representative of the original. This allows any standard analytical technique to be performed on the safe substitute dataset, simplifying privacy workflows.
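To make the DP-SGD idea concrete, here is a minimal NumPy sketch of a single update step. It is an illustration only, not the training code used in this work: the function name, the parameter names, and the flat-vector representation of gradients are all assumptions, and a real system would also track the cumulative privacy budget with a privacy accountant.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update on a flat parameter vector.

    per_example_grads has shape (batch_size, num_params): one gradient per example.
    """
    # L2 norm of each example's gradient, shape (batch_size, 1).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Shrink gradients whose norm exceeds clip_norm; leave smaller ones unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise whose standard deviation is tied to the clipping bound,
    # which limits how much any single example can change the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean

# Hypothetical usage with made-up gradients.
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
new_params = dp_sgd_step(np.zeros(2), grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.0, rng=rng)
```

Clipping bounds each example's influence on the update, and the noise masks whatever influence remains. Together these are what allow a formal DP guarantee to be stated for the fine-tuned model.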
While most research on private synthetic data has focused on simpler outputs like short text or individual images, modern applications demand more. They rely on modeling complex systems and behaviors using multi-modal data, which unstructured text cannot fully capture.
To address this need, we are introducing a new method for privately generating synthetic photo albums. This task goes beyond creating single images, as it requires maintaining thematic coherence and character consistency across a sequence of photos. Our method, which translates image data to text and back, successfully preserves the high-level semantic information needed for effective analysis, all while providing rigorous DP guarantees.
How Our Hierarchical Method Works
Our approach is distinct from other private synthetic image generation techniques in two key ways: we use an intermediate text representation, and we generate the data hierarchically.
Here is a breakdown of the process:
- We begin by generating a structured text representation of each original album. An AI model creates a detailed text caption for every photo, and another model produces an overall text summary for the album.
- Next, we privately fine-tune a pair of large language models (LLMs). The first LLM is trained to generate album summaries, and the second is trained to generate individual photo captions based on a given album summary.
- We use these trained models to generate new, structured representations of photo albums hierarchically. For each album, we first generate a summary and then use that summary as context to generate a detailed text caption for each photo.
- Finally, these generated text representations are converted into sets of images using a text-to-image AI model.
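The generation steps above can be sketched as a short pipeline. This is a hedged outline rather than the actual system: `sample_summary`, `sample_caption`, and `text_to_image` are hypothetical callables standing in for the two privately fine-tuned LLMs and the text-to-image model, and the stand-in lambdas exist only so the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SyntheticAlbum:
    summary: str
    captions: List[str]

def generate_album(sample_summary: Callable[[], str],
                   sample_caption: Callable[[str], str],
                   num_photos: int) -> SyntheticAlbum:
    """Generate the text representation of one album, top down."""
    # Step 1: sample an album-level summary.
    summary = sample_summary()
    # Step 2: sample every photo caption with the same summary as context.
    captions = [sample_caption(summary) for _ in range(num_photos)]
    return SyntheticAlbum(summary=summary, captions=captions)

def render_album(album: SyntheticAlbum,
                 text_to_image: Callable[[str], object]) -> List[object]:
    """Turn each generated caption into an image."""
    return [text_to_image(caption) for caption in album.captions]

# Stand-in callables (assumptions for illustration only).
album = generate_album(
    sample_summary=lambda: "A family day at the beach.",
    sample_caption=lambda summary: f"A photo from: {summary}",
    num_photos=3,
)
images = render_album(album, text_to_image=lambda caption: f"<image of {caption}>")
```

Because the original albums are only touched during private fine-tuning, everything in this sketch operates on model outputs alone.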
Using text as an intermediate step offers several advantages. First, LLMs excel at text generation. Second, the process is inherently privacy-enhancing because describing an image with text is a lossy operation, making it unlikely that synthetic photos will be exact copies of originals. Lastly, it is far more resource-intensive to generate images than text, so this method allows us to filter for content at the text stage before committing computational resources to image creation.
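The text-stage filtering mentioned above might look like the following sketch. The `is_allowed` check and the `text_to_image` callable are hypothetical placeholders; the source does not say what filtering criteria are used.

```python
from typing import Callable, List

def filter_then_render(captions: List[str],
                       is_allowed: Callable[[str], bool],
                       text_to_image: Callable[[str], object]) -> List[object]:
    """Drop unwanted captions first, then pay for image generation only on the rest."""
    kept = [caption for caption in captions if is_allowed(caption)]
    return [text_to_image(caption) for caption in kept]

# Hypothetical usage: count how many times the expensive image step runs.
render_calls = []

def fake_text_to_image(caption: str) -> str:
    render_calls.append(caption)
    return f"<image of {caption}>"

rendered = filter_then_render(
    ["a dog on a beach", "", "a sunset over hills"],
    is_allowed=lambda caption: len(caption.strip()) > 0,  # placeholder criterion
    text_to_image=fake_text_to_image,
)
```

The cheap check runs on every caption, while the expensive image model runs only on captions that pass it.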
Our hierarchical strategy ensures internal consistency, as every photo caption in an album is generated using the same summary as context. This two-step process also saves significant computational resources. Since the training cost of models with self-attention scales quadratically with context length, training two models with shorter contexts is much more efficient than training one model with a very long one.
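A back-of-the-envelope calculation shows the effect. The token counts below are made-up assumptions, and the cost function keeps only the quadratic self-attention term, ignoring constants and every other part of training.

```python
def attention_cost(context_len: int) -> int:
    """Relative self-attention cost for one sequence, up to a constant factor."""
    return context_len ** 2

# Illustrative sizes (assumptions, not figures from this work).
summary_tokens = 200
caption_tokens = 100
photos_per_album = 20

# One model that sees the summary and every caption in a single long sequence.
flat_cost = attention_cost(summary_tokens + photos_per_album * caption_tokens)

# Two models: one sees only the summary; the other sees the summary plus
# a single caption, once per photo.
hierarchical_cost = (
    attention_cost(summary_tokens)
    + photos_per_album * attention_cost(summary_tokens + caption_tokens)
)

print(flat_cost, hierarchical_cost, round(flat_cost / hierarchical_cost, 2))
```

Under these assumed sizes the single long-context model costs 4,840,000 units per album against 1,840,000 for the hierarchical pair, and the gap widens as albums get longer.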
Concurrent work by Wang et al. has also shown how text-based intermediaries can be leveraged to generate differentially private single images using Private Evolution.
Evaluation and Results
We tested our method on the YFCC100M dataset, which contains nearly 100 million images released under Creative Commons licenses. We created "albums" by grouping photos taken by the same user within the same hour. To maintain the DP guarantee, we ensured that no user contributed more than one example to any training set.
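One way to build such albums is sketched below. It rests on two assumptions the text does not confirm: "within the same hour" is read as the same clock hour, and when a user has several albums one is kept at random. The record layout `(user_id, taken_at, photo_id)` is also hypothetical.

```python
import random
from collections import defaultdict
from datetime import datetime

def build_albums(photos, rng):
    """Group photos into per-user hourly albums, then keep one album per user.

    photos: iterable of (user_id, taken_at, photo_id) with taken_at a datetime.
    Returns a dict mapping user_id to a list of photo_ids.
    """
    # Bucket photos by user and clock hour.
    by_user_hour = defaultdict(list)
    for user_id, taken_at, photo_id in photos:
        hour = taken_at.replace(minute=0, second=0, microsecond=0)
        by_user_hour[(user_id, hour)].append(photo_id)

    # Collect each user's candidate albums.
    by_user = defaultdict(list)
    for (user_id, _hour), photo_ids in by_user_hour.items():
        by_user[user_id].append(photo_ids)

    # Keep a single album per user so each user contributes one example.
    return {user_id: rng.choice(albums) for user_id, albums in by_user.items()}

# Hypothetical records for illustration.
records = [
    ("alice", datetime(2010, 5, 1, 14, 5), "p1"),
    ("alice", datetime(2010, 5, 1, 14, 40), "p2"),
    ("alice", datetime(2010, 5, 2, 9, 15), "p3"),
    ("bob", datetime(2010, 6, 3, 18, 30), "p4"),
]
albums = build_albums(records, random.Random(0))
```

Limiting each user to one example means that adding or removing a user changes at most one training record, which is the unit the DP guarantee is stated over.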
To evaluate the resemblance between the original and synthetic albums, we first computed the MAUVE score, a neural embedding-based measure of how closely two text distributions match. The figure referenced in the original article shows strong MAUVE scores between real and synthetic album summaries and photo captions, especially after fine-tuning.
Next, we analyzed the most common topics in the album summaries. The topics were found to be very similar between the real and synthetic data, indicating that the core themes were preserved.
Finally, a direct visual examination of the synthetic photo albums reveals that each album is typically centered on a common theme, just like real photo albums. The examples in the original article demonstrate this thematic coherence.
Conclusion
The demands of modern AI require data that is not only private but also structurally and contextually rich. Our hierarchical, text-as-intermediate method for generating coherent synthetic photo albums shows a viable path for extending the benefits of synthetic data beyond simple text or isolated images.
This methodology opens up exciting new possibilities for privacy-preserving AI innovation. It helps resolve the tension between the need for large-scale, high-quality data and the critical imperative to protect user privacy, paving the way for safer and more advanced AI development.
Acknowledgements
This work is the result of a collaboration between many people at Google Research, including (in alphabetical order by last name): Kareem Amin, Alex Bie, Rudrajit Das, Alessandro Epasto, Weiwei Kong, Alex Kurakin, Natalia Ponomareva, Monica Ribero, Jane Shapiro, Umar Syed, and Sergei Vassilvitskii.