
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing (2405.04007v1)

Published 7 May 2024 in cs.CV

Abstract: In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario data collected from the internet, which captures the intricacies of user intentions and promotes the practical application of image editing in the real world. (3) High-precision multi-turn editing data annotated by humans, which involves multiple rounds of edits to simulate iterative editing processes. The combination of these diverse data sources makes SEED-Data-Edit a comprehensive and versatile dataset for training language-guided image editing models. We fine-tune a pretrained Multimodal LLM (MLLM) that unifies comprehension and generation with SEED-Data-Edit. The instruction-tuned model demonstrates promising results, indicating the potential and effectiveness of SEED-Data-Edit in advancing the field of instructional image editing. The datasets are released at https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit.
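As a rough illustration of the three-part composition described in the abstract, the sketch below models one editing sample in Python. Every name here (`Source`, `EditTurn`, `EditSample`, `training_pairs`, the file names) is hypothetical and chosen for illustration; it is not the released dataset's actual schema. It also assumes that, in multi-turn data, each round edits the output of the previous round.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Tuple


class Source(Enum):
    """Hypothetical labels for the three data types named in the abstract."""
    AUTOMATED = "automated_pipeline"    # (1) pipeline-generated editing pairs
    REAL_WORLD = "real_world"           # (2) data collected from the internet
    MULTI_TURN = "human_multi_turn"     # (3) human-annotated multi-turn edits


@dataclass
class EditTurn:
    instruction: str    # open-form language instruction
    target_image: str   # path to the image after this edit


@dataclass
class EditSample:
    source: Source
    source_image: str                   # path to the unedited image
    turns: List[EditTurn] = field(default_factory=list)

    def training_pairs(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (input_image, instruction, output_image) for each turn.

        Assumption: each turn is applied to the previous turn's output.
        """
        current = self.source_image
        for turn in self.turns:
            yield current, turn.instruction, turn.target_image
            current = turn.target_image


# Illustrative sample with two rounds of edits (made-up file names).
sample = EditSample(
    source=Source.MULTI_TURN,
    source_image="img_0.jpg",
    turns=[
        EditTurn("remove the dog", "img_1.jpg"),
        EditTurn("make the sky pink", "img_2.jpg"),
    ],
)
pairs = list(sample.training_pairs())
```

Under this layout, single-turn samples from the automated or real-world sources are simply the case with one entry in `turns`, so all three data types can be consumed through the same `training_pairs` interface.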

Authors (5)
  1. Yuying Ge (39 papers)
  2. Sijie Zhao (15 papers)
  3. Chen Li (386 papers)
  4. Yixiao Ge (99 papers)
  5. Ying Shan (252 papers)
Citations (11)
