Apple researchers have introduced a new model that lets users describe the photo edits they want in plain-language prompts, without touching photo editing software.
The model, called MLLM-Guided Image Editing (MGIE), was developed in collaboration with the University of California, Santa Barbara, and lets users crop, resize, flip, and apply filters to images solely through text instructions.
MGIE can handle both simple and complex editing tasks, including altering specific objects within a photo to change their shape or make them brighter.
The model draws on two capabilities of multimodal large language models (MLLMs): interpreting the user's prompt and imagining what the resulting edit should look like. For instance, a request for a bluer sky in a photo is translated into increasing the brightness of the sky portion of the image.
Editing photos with MGIE is straightforward; simply type out the desired changes. For instance, when editing a picture of a pepperoni pizza, typing “make it more healthy” prompts the addition of vegetable toppings.
Similarly, adjusting a photo of tigers in the Sahara is as simple as instructing the model to “add more contrast to simulate more light,” resulting in a brighter image.
The researchers said that, rather than working from brief and ambiguous instructions, MGIE derives explicit, visually informed intentions from the prompt and uses them to guide the edit.
They evaluated MGIE across several aspects of image editing and reported that it improves performance without compromising efficiency.
The researchers expressed confidence in the potential of their MLLM-guided framework to advance vision and language studies in the future.
Apple has made MGIE available for download on GitHub and has released a web demo on Hugging Face Spaces. The company has not disclosed its plans for MGIE beyond research.
Some image generation platforms, such as OpenAI’s DALL-E 3, can also perform basic photo editing tasks from text inputs.
Adobe, the maker of Photoshop, has its own AI editing model, Firefly. Firefly powers generative fill, which lets users add generated backgrounds to photos.