- The paper presents a novel dual-adversarial network (MMAN) that improves human parsing accuracy by targeting semantic misplacements and local pixel inconsistencies.
- It integrates two discriminators—Macro for low-resolution semantic correction and Micro for high-resolution detail refinement—via a dual-output generator.
- Experimental results show solid mIoU gains and robust cross-dataset generalization, with state-of-the-art results at the time on benchmarks such as LIP and PASCAL-Person-Part.
Macro-Micro Adversarial Network for Human Parsing: An Expert Overview
The paper introduces the Macro-Micro Adversarial Network (MMAN), a novel architecture that improves human parsing accuracy by using adversarial supervision to explicitly address the semantic and local inconsistencies inherent in pixel-wise classification. Its central contribution is a principled split of adversarial supervision across two scales, correcting inconsistency issues that traditional single-discriminator approaches often fail to tackle effectively.
Overview and Methodology
The MMAN framework integrates two discriminators, Macro D and Micro D, each with a distinct focus within the parsing process: semantic consistency and local detail consistency, respectively. This division of labor targets the two error types that single-discriminator adversarial networks handle poorly. The Macro discriminator examines low-resolution label maps to correct semantic errors such as misplaced body parts, while the Micro discriminator assesses high-resolution patches to suppress local pixel-level artifacts such as noise and fuzzy borders.
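A minimal PyTorch-style sketch of this two-discriminator setup is given below. The layer counts, channel widths, and the choice to condition both discriminators on the input image are illustrative assumptions for exposition, not the authors' published configuration; the point is simply that the Macro discriminator gains a large effective receptive field by operating on a low-resolution map, while the Micro discriminator keeps a small receptive field over high-resolution patches.

```python
# Illustrative sketch only; hyperparameters are assumptions, not the
# paper's published architecture.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Convolutional discriminator that scores overlapping patches."""
    def __init__(self, in_channels, base=64, n_layers=3):
        super().__init__()
        layers = [nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
                  nn.LeakyReLU(0.2, inplace=True)]
        ch = base
        for _ in range(n_layers - 1):
            layers += [nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
                       nn.BatchNorm2d(ch * 2),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch *= 2
        # Final 1-channel map: one real/fake score per patch.
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, label_map, image):
        # Condition on the image by channel-wise concatenation with the
        # (predicted or ground-truth) label map.
        return self.net(torch.cat([label_map, image], dim=1))

num_classes, img_ch = 20, 3  # e.g., LIP: 19 part classes + background
# Macro D: deeper stack over a low-resolution map -> large receptive
# field, penalizing global semantic errors (e.g., swapped limbs).
macro_d = PatchDiscriminator(num_classes + img_ch, n_layers=4)
# Micro D: shallow stack over high-resolution input -> small receptive
# field, penalizing local artifacts such as noise and fuzzy borders.
micro_d = PatchDiscriminator(num_classes + img_ch, n_layers=2)
```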
This dual-discriminator system is supported by a dual-output generator, a variation on the DeepLab-ASPP architecture. The generator produces two segmentation map outputs, directing one to each discriminator. This division not only assigns error correction to the appropriate scale but also keeps each discriminator's task tractable, avoiding the convergence problems that plague adversarial training on full high-resolution label maps.
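The training objective can then be pictured as a per-pixel segmentation loss plus one adversarial term per discriminator. The sketch below is a hypothetical composition assuming the standard cross-entropy/GAN formulation; the `lambda` weights and the use of softmax probability maps as discriminator input are placeholders, not values or choices reported in the paper.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_lr, pred_hr, gt_lr, gt_hr, img_lr, img_hr,
                   macro_d, micro_d, lambda_macro=0.1, lambda_micro=0.1):
    """Per-pixel cross-entropy on both generator outputs, plus one
    adversarial term per discriminator. Loss weights are placeholders."""
    # Supervised segmentation loss on the low- and high-resolution outputs.
    seg = F.cross_entropy(pred_lr, gt_lr) + F.cross_entropy(pred_hr, gt_hr)

    # Adversarial terms: the generator tries to make each discriminator
    # score its predicted probability maps as "real" (target = 1).
    score_macro = macro_d(F.softmax(pred_lr, dim=1), img_lr)
    score_micro = micro_d(F.softmax(pred_hr, dim=1), img_hr)
    adv_macro = F.binary_cross_entropy_with_logits(
        score_macro, torch.ones_like(score_macro))
    adv_micro = F.binary_cross_entropy_with_logits(
        score_micro, torch.ones_like(score_micro))

    return seg + lambda_macro * adv_macro + lambda_micro * adv_micro
```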
Experimental Results and Implications
The empirical evaluation shows that MMAN outperforms several state-of-the-art techniques on human parsing benchmarks, achieving mean Intersection over Union (mIoU) scores of 46.81% on the LIP dataset and 59.91% on the PASCAL-Person-Part dataset. These scores underscore the framework's efficacy in delineating human parts with complex shape structures, with notable gains on classes such as arms and legs.
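For reference, mIoU is the per-class intersection-over-union averaged over classes. The snippet below is the standard confusion-matrix computation, not code from the paper; it skips classes absent from both prediction and ground truth.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Standard mIoU via a confusion matrix.
    pred, gt: integer label arrays of the same shape."""
    k = (gt >= 0) & (gt < num_classes)  # ignore out-of-range labels
    cm = np.bincount(num_classes * gt[k].astype(int) + pred[k].astype(int),
                     minlength=num_classes ** 2).reshape(num_classes,
                                                         num_classes)
    intersection = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - intersection
    with np.errstate(divide='ignore', invalid='ignore'):
        iou = intersection / union      # nan where a class is absent
    return float(np.nanmean(iou))
```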
Also notable is MMAN's generalization to the smaller PPSS dataset, where it surpassed previous models without any dataset-specific fine-tuning. This robustness suggests that the dual-level correction mechanism transfers well across datasets and settings.
Future Directions
The MMAN framework opens several avenues for further work in human parsing and other pixel-wise prediction tasks. Its dual adversarial approach can serve as a template for architectures that supervise specific levels of granularity. Future research could explore adapting each discriminator's focus automatically to the input data, or integrating additional context features to further improve parsing quality.
Moreover, the paper suggests that task-specific adversarial supervision could extend beyond human parsing to other domains where intrinsic inconsistencies pose significant challenges, such as urban scene parsing or medical image segmentation.
In conclusion, MMAN makes a significant contribution to human parsing, demonstrating how structured adversarial supervision can address complex parsing tasks. By integrating adversarial supervision at multiple granularity levels, it improves both local and semantic consistency in the generated label maps and sets a useful precedent for future work in computer vision.