CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Abstract: Recent image tone adjustment (or enhancement) approaches have predominantly relied on supervised learning to model human-centric perceptual assessment. However, these approaches are constrained by the intrinsic challenges of supervised learning. First, the requirement for expertly curated or retouched images raises data acquisition costs. Second, the target styles they cover are limited to the stylistic variants present in the training data. To overcome these challenges, we propose CLIPtone, an unsupervised learning-based method for text-based image tone adjustment that extends an existing image enhancement method to accommodate natural language descriptions. Specifically, we design a hyper-network that adaptively modulates the pretrained parameters of the backbone model based on a text description. To assess whether the adjusted image aligns with the text description without a ground-truth image, we use CLIP, which is trained on a vast set of language-image pairs and thus encodes knowledge of human perception. The major advantages of our approach are threefold: (i) minimal data collection expenses, (ii) support for a range of adjustments, and (iii) the ability to handle novel text descriptions unseen in training. The efficacy of our approach is demonstrated through comprehensive experiments, including a user study.
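The idea of a hyper-network modulating pretrained backbone parameters from a text description can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the CLIPtone implementation: the backbone is reduced to a single 3x3 color matrix, the hyper-network is one linear map, and `ToneHyperNetwork`, `modulate`, `apply_color_matrix`, and `cosine_similarity` are hypothetical names. The text embedding is a hand-written stand-in vector rather than the output of a real CLIP text encoder, and cosine similarity only stands in for a CLIP-style alignment score, since the abstract does not specify the loss.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (stand-in alignment score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToneHyperNetwork:
    """Hypothetical linear hyper-network: text embedding -> offsets for a 3x3 color matrix."""

    def __init__(self, embed_dim, seed=0, scale=0.01):
        rng = random.Random(seed)
        # One weight row per backbone parameter (9 entries of the color matrix).
        self.weights = [
            [rng.gauss(0.0, scale) for _ in range(embed_dim)] for _ in range(9)
        ]

    def __call__(self, text_embedding):
        flat = [
            sum(w * e for w, e in zip(row, text_embedding)) for row in self.weights
        ]
        return [flat[0:3], flat[3:6], flat[6:9]]


def modulate(pretrained, offsets):
    """Add text-conditioned offsets to the (frozen) pretrained parameters."""
    return [
        [p + o for p, o in zip(p_row, o_row)]
        for p_row, o_row in zip(pretrained, offsets)
    ]


def apply_color_matrix(matrix, pixel):
    """Apply the toy backbone to one RGB pixel in [0, 1], clamping the result."""
    return [
        min(1.0, max(0.0, sum(m * c for m, c in zip(row, pixel)))) for row in matrix
    ]


# Assumed "pretrained" backbone: the identity color transform.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    hyper = ToneHyperNetwork(embed_dim=4)
    text_embedding = [0.5, -1.0, 0.25, 2.0]  # stand-in for an encoded description
    adapted = modulate(IDENTITY, hyper(text_embedding))
    print(apply_color_matrix(adapted, [0.2, 0.5, 0.8]))
```

In an unsupervised setup of the kind the abstract describes, the hyper-network weights would be trained so that the adjusted image scores well against the text description in CLIP's embedding space, with the pretrained backbone parameters serving as the starting point for modulation. The sketch leaves training out entirely.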