1-D CNN Feature Extractor
- A 1-D CNN feature extractor is a neural network that uses stacked 1-D convolutions to extract localized features from sequential data such as time series and audio signals.
- It combines multiple convolutional, activation, and pooling layers to build hierarchical representations, efficiently downsampling the input for subsequent analysis.
- In fault diagnosis with domain adaptation (e.g., A2CNN for bearings), it is combined with adversarial training and partially shared layers; theoretical work shows that well-chosen depth, kernel sizes, and channel counts make it an efficient exact linear feature extractor.
A one-dimensional convolutional neural network (1-D CNN) feature extractor is a hierarchical neural architecture designed to learn meaningful representations from univariate or multivariate sequential data, such as time series, audio, vibration signals, or other 1-D signals. It uses stacked convolutional layers and channel-wise operations to progressively distill raw input sequences into lower-dimensional, semantically salient features suitable for downstream supervised or unsupervised tasks. Because it discovers localized, hierarchical patterns across the input, the 1-D CNN is useful in applications ranging from condition monitoring to signal processing and function approximation.
1. Formal Structure and Architectural Variants
A generic 1-D CNN feature extractor processes an input vector $x \in \mathbb{R}^{d}$ through a sequence of convolutional, activation, and pooling modules. Each convolutional layer $j$ applies a set of 1-D kernels $w^{(j)}_{c,c'}$ (kernel size $k$, stride $s$) across the $C_{j-1}$ channel outputs of the preceding layer, adds optional channel-wise biases $b^{(j)}_c$, and applies a nonlinearity. With $h^{(0)} = x$ and $C_0 = 1$, this mechanism is formalized as:
$$h^{(j)}_c = \sigma\Big( \sum_{c'=1}^{C_{j-1}} w^{(j)}_{c,c'} *_s h^{(j-1)}_{c'} + b^{(j)}_c \Big), \qquad c = 1, \dots, C_j,$$
where $*_s$ denotes 1-D convolution with stride $s$ and $\sigma$ is the activation, typically ReLU. The spatial dimension $d_j$ and the channel number $C_j$ evolve with depth. Typical practice downsamples by setting the stride equal to the kernel size ($s = k$), so that $d_j = d_{j-1}/s$ (assuming no padding and $d_{j-1}$ divisible by $s$); for $s = 2$ the length halves at each layer. Channel counts $C_j$ first grow and then contract for parameter efficiency (Li et al., 2022).
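The layer update just described (strided multi-channel convolution, per-channel bias, then ReLU) can be sketched in plain Python as follows. This is a minimal, dependency-free illustration: the names `conv1d_layer` and `relu` are hypothetical, and the sketch assumes no padding and the cross-correlation convention (no kernel flip) used by most deep-learning libraries.

```python
from typing import List

Signal = List[float]        # one channel: a sequence of samples
FeatureMap = List[Signal]   # a stack of channels of equal length


def relu(z: float) -> float:
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def conv1d_layer(h: FeatureMap, w: List[List[List[float]]], b: List[float],
                 stride: int) -> FeatureMap:
    """One layer: out[c] = relu(sum over c' of (w[c][c'] correlated with h[c'], stride) + b[c]).

    h:      C_in input channels, each of length L_in
    w:      w[c][c'] is the size-k kernel linking input channel c' to output channel c
    b:      one bias per output channel
    stride: step between windows; stride == k gives non-overlapping windows
    No padding is applied, and the kernel is not flipped (cross-correlation).
    """
    k = len(w[0][0])
    l_out = (len(h[0]) - k) // stride + 1
    out: FeatureMap = []
    for c, kernels in enumerate(w):
        channel = []
        for t in range(l_out):
            start = t * stride
            acc = b[c]
            for cp, kern in enumerate(kernels):
                acc += sum(kern[i] * h[cp][start + i] for i in range(k))
            channel.append(relu(acc))
        out.append(channel)
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]  # C_0 = 1 channel, d = 8
    w = [[[1.0, 1.0]],    # output channel 0: sums of adjacent pairs
         [[1.0, -1.0]]]   # output channel 1: differences (negative, so ReLU zeroes them)
    print(conv1d_layer(x, w, [0.0, 0.0], stride=2))
    # prints [[3.0, 7.0, 11.0, 15.0], [0.0, 0.0, 0.0, 0.0]]
```

With kernel size equal to stride (here both 2), the length-8 input becomes two channels of length 4, matching the halving described above.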
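The A2CNN architecture exemplifies a concrete 1-D CNN feature extractor for bearing fault diagnosis. It operates on FFT amplitude vectors through a hierarchy of five convolution + ReLU + max-pooling blocks followed by two fully connected layers:
- Conv1: 8 filters, kernel=32, stride=2, output: 8 channels
- Pool1: pool=2, stride=2, output: 8 channels
- Conv2: 16 filters, kernel=16, stride=2, output: 16 channels
- Pool2: pool=2, stride=2, output: 16 channels
- Conv3: 32 filters, kernel=8, stride=2, output: 32 channels
- Pool3: pool=2, stride=2, output: 32 channels
- Conv4: 32 filters, kernel=8, stride=2, output: 32 channels
- Pool4: pool=2, stride=2, output: 32 channels
- Conv5: 64 filters, kernel=3, stride=2, output: 64 channels
- Pool5: pool=2, stride=2, output: 64 channels
- FC1: 500 units, ReLU
- FC2: 5 units (softmax label outputs) (Zhang et al., 2018)
Each stride-2 convolution and each pooling layer roughly halves the sequence length (exact values depend on the input length and padding), so the ten downsampling stages shrink the input by a factor on the order of $2^{10}$ before the fully connected layers.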
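A short shape trace makes the downsampling concrete. The sketch below walks a single-channel input through the five convolution/pooling blocks listed above. The input length of 2048 and both padding conventions are assumptions made for illustration (the input length and padding used by Zhang et al. are not given here), and `trace_shapes` and `flattened_size` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# A2CNN feature-extractor layers as listed above: (name, kind, filters, kernel/pool size, stride).
# Pooling rows carry None for filters because they keep the preceding channel count.
A2CNN_LAYERS: List[Tuple[str, str, Optional[int], int, int]] = [
    ("Conv1", "conv", 8, 32, 2), ("Pool1", "pool", None, 2, 2),
    ("Conv2", "conv", 16, 16, 2), ("Pool2", "pool", None, 2, 2),
    ("Conv3", "conv", 32, 8, 2), ("Pool3", "pool", None, 2, 2),
    ("Conv4", "conv", 32, 8, 2), ("Pool4", "pool", None, 2, 2),
    ("Conv5", "conv", 64, 3, 2), ("Pool5", "pool", None, 2, 2),
]


def trace_shapes(input_len: int, padding: str = "same") -> List[Tuple[str, int, int]]:
    """Return (layer name, channels, length) after each layer for a 1-channel input.

    padding="same":  assumed TensorFlow-style convolution, length -> ceil(length / stride)
    padding="valid": no padding, length -> floor((length - kernel) / stride) + 1
    Pooling is always unpadded: floor((length - pool) / stride) + 1.
    """
    channels, length = 1, input_len
    trace = []
    for name, kind, filters, size, stride in A2CNN_LAYERS:
        if kind == "conv":
            channels = filters
            if padding == "same":
                length = -(-length // stride)  # ceiling division
            else:
                length = (length - size) // stride + 1
        else:
            length = (length - size) // stride + 1
        trace.append((name, channels, length))
    return trace


def flattened_size(input_len: int, padding: str = "same") -> int:
    """Number of features handed to FC1 (500 units) after flattening Pool5."""
    _, channels, length = trace_shapes(input_len, padding)[-1]
    return channels * length


if __name__ == "__main__":
    for row in trace_shapes(2048):  # 2048 is a hypothetical input length
        print(row)
    print("FC1 input size:", flattened_size(2048))
```

Under these assumptions, a length-2048 input ends at 64 channels of length 2 (128 features into FC1) with "same" padding, or 64 channels of length 1 (64 features) with "valid" padding, which is why the overall reduction factor is only "on the order of" $2^{10}$.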
2. Theoretical Foundations and Feature Extraction Guarantees
Rigorous analysis shows that deep 1-D multi-channel CNNs, with appropriate depth, kernel size equal to stride, and suitable channel allocation, are expressive enough to serve as exact linear feature extractors. For an input $x \in \mathbb{R}^{d}$ and a dictionary of $n$ target feature vectors $v_1, \dots, v_n \in \mathbb{R}^{d}$, a network of depth $J$ can realize the linear feature map $x \mapsto \big(\langle v_1, x \rangle, \dots, \langle v_n, x \rangle\big)$ via multi-resolution convolutions. With kernel size and stride both equal to $s$, the $j$-th intermediate layer has a receptive field of length $s^j$ and yields all length-$s^j$ patch features of the target vectors, and the explicit bound on the parameter count is considerably more favorable than for fully connected layers (Li et al., 2022). This establishes that deep 1-D CNNs match certain classical transforms (e.g., wavelets, SVD factorizations) in their capacity for structured linear feature extraction.
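The multi-resolution idea can be illustrated with a toy construction. This is a hypothetical sketch for illustration, not the construction in Li et al. (2022): it assumes kernel size = stride = 2, identity activations (no ReLU), and an input length $d$ that is a power of two, and it makes no attempt to match the paper's parameter bound (its first layer alone has $n d/2$ channels). Channel $(i, p)$ of each layer stores, at its own position $p$, the inner product of the $p$-th patch of target $i$ with the matching block of $x$; every layer doubles the patch length until one value per target remains.

```python
from typing import List

Vec = List[float]


def conv_k2_s2(h: List[Vec], w: List[List[Vec]]) -> List[Vec]:
    """Multi-channel 1-D cross-correlation with kernel size 2 and stride 2.

    No bias and identity activation. w[c][c'] is the length-2 kernel from
    input channel c' to output channel c.
    """
    l_out = len(h[0]) // 2
    return [[sum(w[c][cp][t] * h[cp][2 * m + t] for cp in range(len(h)) for t in range(2))
             for m in range(l_out)]
            for c in range(len(w))]


def build_layers(targets: List[Vec]) -> List[List[List[Vec]]]:
    """Hypothetical toy construction: kernels for J = log2(d) layers."""
    n, d = len(targets), len(targets[0])
    assert d >= 2 and d & (d - 1) == 0, "toy construction assumes d is a power of two"
    # Layer 1: channel (i, p) correlates x with the p-th length-2 patch of target i.
    layers = [[[targets[i][2 * p:2 * p + 2]] for i in range(n) for p in range(d // 2)]]
    width = d // 2  # channels per target in the current feature map
    while width > 1:
        half = width // 2
        layer = []
        for i in range(n):
            for q in range(half):
                # Output channel (i, q): tap 0 reads channel (i, 2q), tap 1 reads (i, 2q+1),
                # so at position q it adds two adjacent "diagonal" patch features.
                kern = [[0.0, 0.0] for _ in range(n * width)]
                kern[i * width + 2 * q][0] = 1.0
                kern[i * width + 2 * q + 1][1] = 1.0
                layer.append(kern)
        layers.append(layer)
        width = half
    return layers


def extract_features(x: Vec, layers: List[List[List[Vec]]]) -> Vec:
    """Run the stack; the last layer has one channel per target and length 1."""
    h = [list(x)]
    for w in layers:
        h = conv_k2_s2(h, w)
    return [channel[0] for channel in h]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    targets = [[1.0] * 8, [1.0, 0, 0, 0, 0, 0, 0, -1.0]]
    print(extract_features(x, build_layers(targets)))  # prints [36.0, -7.0]
```

The outputs equal the direct inner products $\langle v_i, x \rangle$; only the diagonal entries of each intermediate feature map carry useful information, which is one reason practical constructions allocate channels more carefully than this toy does.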
3. Feature Map Dimensions and Downsampling
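At each layer, the output feature-map length $L_{\text{out}}$ is determined by the input length $L_{\text{in}}$, kernel size $k$, stride $s$, and padding $p$:
$$L_{\text{out}} = \left\lfloor \frac{L_{\text{in}} + 2p - k}{s} \right\rfloor + 1.$$
For max-pooling with pool size $P$ and stride $s_P$ (no padding):
$$L_{\text{out}} = \left\lfloor \frac{L_{\text{in}} - P}{s_P} \right\rfloor + 1,$$
which reduces to $\lfloor L_{\text{in}}/P \rfloor$ when $s_P = P$.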
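The number of output channels of a convolutional layer equals its number of filters. Setting the stride equal to the kernel size ($s = k$, $p = 0$) gives $L_{\text{out}} = \lfloor L_{\text{in}}/k \rfloor$, the fastest downsampling that still covers every input sample; for $k = 2$ the length halves at each layer, consistent with practical designs in which the spatial dimension is halved per layer. Receptive fields therefore grow geometrically with depth, so the deepest layers capture high-level, long-range structure (Zhang et al., 2018, Li et al., 2022).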
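The two length formulas translate directly into integer arithmetic, since Python's `//` is floor division. This sketch uses hypothetical helper names; it checks the special case of stride equal to kernel size without padding and counts how many kernel-2, stride-2 layers shrink a length-1024 sequence to a single position.

```python
def conv_out_len(l_in: int, k: int, s: int, p: int = 0) -> int:
    """Convolution output length: floor((L_in + 2p - k) / s) + 1."""
    return (l_in + 2 * p - k) // s + 1


def pool_out_len(l_in: int, pool: int, stride: int) -> int:
    """Unpadded max-pooling output length: floor((L_in - P) / s_P) + 1."""
    return (l_in - pool) // stride + 1


def halvings_to_length_one(l_in: int) -> int:
    """Count kernel-2, stride-2, unpadded layers needed to shrink l_in to 1."""
    depth = 0
    while l_in > 1:
        l_in = conv_out_len(l_in, k=2, s=2)
        depth += 1
    return depth


if __name__ == "__main__":
    # With stride == kernel size and no padding, the formula reduces to floor(L_in / k).
    assert all(conv_out_len(L, k, k) == L // k for L in range(1, 300) for k in range(1, 9))
    print(conv_out_len(1024, k=2, s=2))   # prints 512
    print(halvings_to_length_one(1024))   # prints 10
```

The depth count grows logarithmically with the input length, which is the geometric receptive-field growth described above.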
4. Partial Layer Untying and Domain Adaptation
In domain adaptation scenarios such as A2CNN, a separate feature extractor is instantiated for each domain (source $M_s$, target $M_t$). The lower layers (up to layer $l$) are shared across domains, so both extractors use the same low-level filters, while the layers above $l$ are "untied" (they have domain-specific weights), allowing the high-level features to adapt to the target domain. This trades off domain invariance (via the tied layers) against adaptation flexibility (via the untied layers) and supports robust transfer learning under covariate shift (Zhang et al., 2018).
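Partial untying amounts to parameter sharing by reference. In this plain-Python sketch, a layer is a hypothetical list of weights: tied layers are the very same object in both extractors, while untied layers are independent copies. The split point `n_tied` and the choice to initialize the untied target layers from the source weights are assumptions for illustration.

```python
import copy
from typing import List, Tuple

Layer = List[float]  # hypothetical stand-in for one layer's weight tensor


def build_extractors(source: List[Layer], n_tied: int) -> Tuple[List[Layer], List[Layer]]:
    """Pair a source extractor M_s with a target extractor M_t.

    Layers 0..n_tied-1 are tied: M_t holds the same objects as M_s, so an update to
    either is seen by both. The remaining layers are untied: M_t gets independent
    copies, initialized here from the source weights (an assumption).
    """
    target = [layer if i < n_tied else copy.deepcopy(layer) for i, layer in enumerate(source)]
    return source, target


if __name__ == "__main__":
    m_s = [[float(i)] * 3 for i in range(5)]     # five layers, like A2CNN's conv stack
    m_s, m_t = build_extractors(m_s, n_tied=3)  # n_tied = 3 is a hypothetical split point
    m_t[4][0] = 9.0   # adapt an untied (high-level) layer: the source is unaffected
    m_t[0][0] = -1.0  # change a tied (low-level) layer: both extractors see it
    print(m_s[4][0], m_s[0][0])  # prints 4.0 -1.0
```

In a deep-learning framework the same effect is usually obtained by reusing one layer module in both extractors for the tied part and instantiating separate modules for the untied part.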
5. Adversarial Training and Objective Functions
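The 1-D CNN feature extractor can be integrated into an adversarial training scheme that encourages the extracted features to be both discriminative for source labels and invariant across domains. Write $M_s$ and $M_t$ for the source and target extractors, $C$ for the label classifier over $K$ classes, and $D$ for the domain discriminator, which outputs the probability that a feature comes from the source domain. Key loss terms are:
- Source label cross-entropy:
$$\mathcal{L}_{\text{cls}} = -\mathbb{E}_{(x_s, y_s)} \sum_{k=1}^{K} \mathbb{1}[y_s = k] \log C\big(M_s(x_s)\big)_k$$
- Domain discriminator loss (binary cross-entropy):
$$\mathcal{L}_{D} = -\mathbb{E}_{x_s}\big[\log D(M_s(x_s))\big] - \mathbb{E}_{x_t}\big[\log\big(1 - D(M_t(x_t))\big)\big]$$
- Adversarial loss for the target extractor:
$$\mathcal{L}_{\text{adv}} = -\mathbb{E}_{x_t}\big[\log D(M_t(x_t))\big]$$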
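The overall objective combines these loss terms, weighted to balance label discriminability against domain invariance. Training proceeds in stages: the source extractor and classifier are first pre-trained on labeled source data, and then updates of the domain discriminator alternate with adversarial updates of the target feature extractor. Alternatively, gradient reversal layers can implement the adversarial update efficiently within a single backward pass (Zhang et al., 2018).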
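The three loss terms can be evaluated on a mini-batch once the classifier and discriminator outputs are available. This minimal sketch takes those outputs as plain lists of probabilities instead of calling real networks: `d_src` and `d_tgt` are the discriminator's predicted probabilities of "source" for source and target features. The standard ADDA-style forms used here (inverted labels for the target extractor) are an assumption about the exact formulation in Zhang et al. (2018), and the function names are hypothetical.

```python
import math
from typing import Sequence


def label_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Source classification loss: mean of -log p(y_s | x_s) over labeled source samples."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def discriminator_loss(d_src: Sequence[float], d_tgt: Sequence[float]) -> float:
    """Binary cross-entropy for the discriminator, with label 1 = source and 0 = target."""
    src_term = -sum(math.log(d) for d in d_src) / len(d_src)
    tgt_term = -sum(math.log(1.0 - d) for d in d_tgt) / len(d_tgt)
    return src_term + tgt_term


def adversarial_loss(d_tgt: Sequence[float]) -> float:
    """Inverted-label loss for the target extractor: low when target features look like source."""
    return -sum(math.log(d) for d in d_tgt) / len(d_tgt)


if __name__ == "__main__":
    # A discriminator that separates the domains well has a low loss...
    print(discriminator_loss([0.9, 0.95], [0.1, 0.05]))
    # ...which leaves the target extractor with a high adversarial loss to reduce.
    print(adversarial_loss([0.1, 0.05]))
```

In the alternating scheme, one step lowers the discriminator loss with the extractor frozen, and the next lowers the adversarial loss with the discriminator frozen. A gradient reversal layer instead has the extractor ascend the discriminator loss directly, which is a related but not identical objective.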
6. Practical Implementation Notes and Computational Considerations
1-D CNN feature extractors admit efficient implementation:
- A network of depth $J$ suffices to extract $n$-dimensional linear features from a $d$-dimensional input, with an explicit bound on the parameter count (Li et al., 2022).
- Pooling and strides are matched for maximal downsampling and parameter efficiency.
- Channel counts first double and then halve, peaking at an intermediate layer, which keeps memory and computation costs low.
- Post-convolutional features can be fed directly into any fully-connected network, ResNet module, or classifier head for supervised tasks.
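- ReLU activations can be made to act as the identity where needed: adding a sufficiently large positive bias $B$ keeps every pre-activation $z + B$ nonnegative, so $\mathrm{ReLU}(z + B) = z + B$, and the offset is then absorbed into the next layer's bias (Zhang et al., 2018, Li et al., 2022).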
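The bias trick in the last point can be checked numerically. This sketch assumes every pre-activation is at least $-B$; the function name `relu_pass_through` is hypothetical, and the subtraction of $B$ stands in for what a following layer's bias would do in a real network.

```python
from typing import List


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def relu_pass_through(values: List[float], bias: float) -> List[float]:
    """Send values through a ReLU unchanged, assuming every value is >= -bias.

    Adding the bias makes each pre-activation v + bias nonnegative, so ReLU acts as
    the identity; subtracting the bias afterwards removes the offset, which in a real
    network can be folded into the next layer's own bias.
    """
    activated = [relu(v + bias) for v in values]
    return [a - bias for a in activated]


if __name__ == "__main__":
    z = [-3.0, 0.5, 2.0]
    print([relu(v) for v in z])             # plain ReLU: prints [0.0, 0.5, 2.0]
    print(relu_pass_through(z, bias=10.0))  # bias large enough: prints [-3.0, 0.5, 2.0]
    print(relu_pass_through(z, bias=1.0))   # bias too small: prints [-1.0, 0.5, 2.0]
```

The last line shows the failure mode: if $B$ does not bound the negative pre-activations, ReLU still clips them and the layer is no longer linear.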
7. Extensions and Analytical Perspectives
Recent analysis establishes deeper connections between 1-D and 2-D CNN architectures, using a vectorization of matrices to show that multi-layer 2-D convolutions can be reformulated as 1-D convolutions with appropriately structured filters. This provides insight into how CNNs extract singular values and yields new theoretical guarantees on their function approximation rates, which surpass those of shallow or fully connected architectures under certain regularity conditions. Open directions include generalizing these results to stride-1 architectures and to grouped or dilated convolutions, and quantifying the effects of finite precision and stochastic optimization (Li et al., 2022).
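The basic reason a 2-D convolution can be written as a 1-D one is easy to see in the single-channel, stride-1 case. This is a simple illustration of the vectorization idea, not the specific construction in Li et al. (2022), and the function names are hypothetical. After flattening an $H \times W$ image row by row, a $k_h \times k_w$ kernel becomes a 1-D filter of length $(k_h - 1)W + k_w$ whose kernel rows sit $W$ samples apart, with zeros in between. Output positions whose window wraps across a row boundary are discarded.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(img: Matrix, ker: Matrix) -> Matrix:
    """2-D cross-correlation without padding, stride 1."""
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    return [[sum(ker[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]


def conv2d_via_1d(img: Matrix, ker: Matrix) -> Matrix:
    """Same output, computed as a 1-D cross-correlation on the row-major flattened image."""
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    x = [v for row in img for v in row]   # vectorize the image row by row
    f = [0.0] * ((kh - 1) * W + kw)       # structured 1-D filter
    for a in range(kh):
        for b in range(kw):
            f[a * W + b] = ker[a][b]      # kernel rows placed W samples apart
    y = [sum(f[t] * x[n + t] for t in range(len(f)))
         for n in range(len(x) - len(f) + 1)]  # full 1-D cross-correlation
    # Keep only positions n = i*W + j whose window does not wrap across a row.
    return [[y[i * W + j] for j in range(W - kw + 1)] for i in range(H - kh + 1)]


if __name__ == "__main__":
    img = [[float(5 * r + c) for c in range(5)] for r in range(4)]  # 4 x 5 image
    ker = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0]]                      # 2 x 3 kernel
    print(conv2d_valid(img, ker) == conv2d_via_1d(img, ker))        # prints True
```

The zero-padded filter is the "appropriately structured" 1-D filter in this simple case; multi-channel and multi-layer versions require more careful bookkeeping.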