Smarter Prompts and Sharper Vision: AI Research Innovations

2025-07-30 • By Ty Tkacik • 5-minute read

A group led by researchers at Penn State explores optimizations to AI systems' language and image processing in three upcoming publications, which are currently available online. Credit: Ole_CNX/iStock. All Rights Reserved.

Artificial intelligence systems like ChatGPT and Microsoft Copilot often feel magical in their capabilities, but it's easy to forget the complex science working behind the scenes. According to Rui Zhang, an assistant professor of computer science and engineering at Penn State, there is always room to improve and optimize these complex systems.

Zhang and his research team have recently authored three papers introducing innovative methods for processing high-resolution images and automatically crafting better prompts to get superior responses from AI. These papers are available online and will be presented at major upcoming conferences, including the 63rd Annual Meeting of the Association for Computational Linguistics, the 2025 International Conference on Computer Vision, and the 13th International Conference on Learning Representations.

In a recent discussion, Zhang explained his group's work and its potential to enhance AI's efficiency and utility, and offered strategies for everyday users to get more value from their personal AI interactions.

What Is Prompt Engineering and How Can You Improve It?

Q: What is prompt engineering? Are there specific things readers can do to write better prompts for an AI system?

Zhang: Prompt engineering is the art of designing effective inputs, or “prompts,” that guide AI systems like ChatGPT to generate better responses. These systems are highly sensitive to how questions are framed, so a well-crafted prompt can dramatically improve the output. For instance, instead of just asking, “summarize this article,” you could prompt, “summarize this article in three bullet points for a high school student.” This additional context allows the AI to tailor its response more effectively. For everyday users, the best strategies are to be clear, specific, and goal-oriented. Don’t hesitate to experiment with different versions of a prompt to refine the results.
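The contrast between the vague and the specific prompt can be sketched in a few lines of Python. This is only an illustration of the "be clear, specific, and goal-oriented" advice; the function name build_prompt and its parameters are hypothetical and are not part of any tool described in the article.

```python
def build_prompt(task: str, audience: str = "", output_format: str = "") -> str:
    """Combine a task with optional format and audience hints into one prompt.

    Hypothetical helper for illustration only.
    """
    parts = [task.strip().rstrip(".")]
    if output_format:
        # Say what shape the answer should take.
        parts.append(f"in {output_format.strip()}")
    if audience:
        # Say who the answer is for.
        parts.append(f"for {audience.strip()}")
    return " ".join(parts) + "."


vague = build_prompt("Summarize this article")
specific = build_prompt(
    "Summarize this article",
    audience="a high school student",
    output_format="three bullet points",
)

print(vague)
print(specific)
```

Keeping the task, format, and audience as separate pieces makes it easy to try different versions of a prompt and compare the responses.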

Automating Prompts: The Power of GReaTer

Q: What are the benefits of automating and optimizing prompt generation?

Zhang: While effective prompt engineering can significantly boost AI performance, finding the perfect prompt often requires time, experimentation, and deep subject matter expertise. Our research introduces a method called GReaTer, which empowers AI systems to automatically generate and refine their own prompts using gradient-based optimization, a family of algorithms that improve a result step by step by moving in the direction that most reduces a measured error.

Building on this, we developed GReaTerPrompt, a user-friendly, open-source toolkit. It enables models to automatically create and improve prompts for a wide variety of tasks. Automating this process means AI can adapt to new challenges with less human intervention, which improves accuracy, saves time, and reduces costs. This is particularly beneficial for users who lack the time or specific expertise to create an ideal prompt. By making the toolkit open-source, we are providing widespread access to our work for all interested users.
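For readers unfamiliar with the term, the general idea of gradient-based optimization can be shown with a toy example. The sketch below is not GReaTer and does not use the GReaTerPrompt toolkit; it minimizes a simple made-up loss over a single number, whereas a prompt optimizer would apply the same principle to a prompt. All names here are hypothetical.

```python
from typing import Callable


def gradient_descent(
    grad: Callable[[float], float],
    x0: float,
    learning_rate: float = 0.1,
    steps: int = 100,
) -> float:
    """Repeatedly step against the gradient to reduce the loss."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def loss_gradient(x: float) -> float:
    """Gradient of the toy loss (x - 3)^2, which is smallest at x = 3."""
    return 2.0 * (x - 3.0)


best = gradient_descent(loss_gradient, x0=0.0)
print(best)  # very close to 3.0
```

Each step uses the slope of the loss to decide which way to move, which is why this kind of search is usually faster than trying candidates at random.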

Measuring Success: How GReaTer Enhances AI Models

Q: How did you measure the effectiveness of GReaTer? Are there real-world tools that could improve with its implementation?

Zhang: We tested GReaTer on a diverse set of language reasoning and mathematical problem-solving tasks, including answering complex questions, solving logic puzzles, and performing calculations. The results demonstrated that GReaTer significantly boosted performance compared to standard prompting methods. This was especially true for smaller language models that usually struggle with complex tasks due to their limited parameters. In some instances, these GReaTer-optimized smaller models performed on par with much larger ones. Real-world applications that stand to benefit include AI-powered tutors, writing assistants, customer support agents, and any tool that needs to adapt to different users or topics without manual reprogramming.
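A comparison of this kind is often scored with a simple accuracy measure over a fixed set of questions. The sketch below shows one common choice, exact-match accuracy; the article does not state which metric the team used, so treat this as an assumption. The answers in the example are invented placeholders and are not results from the research.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal the reference, ignoring case and outer spaces."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(references)


# Placeholder data, invented for illustration only.
references = ["12", "7", "yes", "blue"]
baseline_outputs = ["12", "9", "no", "blue"]    # answers under a hand-written prompt
optimized_outputs = ["12", "7", "no", "Blue "]  # answers under an optimized prompt

baseline_score = exact_match_accuracy(baseline_outputs, references)
optimized_score = exact_match_accuracy(optimized_outputs, references)
print(baseline_score, optimized_score)
```

Scoring both prompts on the same questions with the same model keeps the comparison fair, since only the prompt changes.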

A New Frontier: High-Resolution Image Understanding with HRScene

Q: What is HRScene, and why do researchers care about “high-resolution image understanding”?

Zhang: HRScene is a new benchmark we created to assess how well modern vision-language models like GPT-4V, Gemini, or Claude can comprehend high-resolution, information-rich images containing millions of pixels. While these models can answer questions about images using natural language, they often fail when faced with large, detailed visuals. High-resolution image understanding is vital because many critical scientific and societal applications rely on spotting subtle, localized details that can be missed by models not built to handle large-scale visual data. HRScene features curated examples from fields like radiology, plant phenotyping, remote sensing, and astronomy. It will help accelerate the development and improve the assessment accuracy of AI systems capable of interpreting complex visuals.

Real-World Impact: From Healthcare to Astronomy

Q: What are the applications of accurate and efficient high-resolution image processing?

Zhang: The potential impact is vast, spanning numerous scientific and social domains. In healthcare, high-resolution AI could help analyze radiology scans like MRIs and CTs more effectively, leading to earlier and more precise diagnoses. In agriculture, it could assist with plant phenotyping—analyzing traits like leaf structure or disease from detailed images—to boost crop yields and promote sustainability. For environmental science and public safety, high-resolution satellite imagery is crucial for disaster monitoring, urban planning, and climate research. Astronomy also stands to benefit, as researchers analyze high-resolution telescope images to find faint or distant celestial objects. AI systems that can reliably process this data could speed up scientific discovery, improve public health tools, and enhance our response to global challenges.

The Team Behind the Breakthroughs

The research included contributions from several Penn State affiliates: Wenpeng Yin, assistant professor of computer science; Yusen Zhang, a computer science doctoral candidate; Sarkar Snigdha Sarathi Das, a computer science doctoral candidate; Wenliang Zheng, a third-year computer science undergraduate student; and several other doctoral candidates and graduate students. Bo Pang and Caiming Xiong from Salesforce also contributed. The research was supported by funding from the U.S. National Science Foundation and Salesforce.

At Penn State, researchers are dedicated to solving real-world problems. Learn more about the importance of federal research funding at Research or Regress.
