
Generator Assisted Mixture of Experts For Feature Acquisition in Batch (2312.12574v1)

Published 19 Dec 2023 in cs.LG

Abstract: Given a set of observations, feature acquisition is the problem of finding the subset of unobserved features that would enhance accuracy. Prior work has explored such problems in a sequential setting, where the model receives feedback from every newly acquired feature and decides whether to explore more features or to predict. However, sequential acquisition is not feasible in settings where time is of the essence. We consider the problem of feature acquisition in batch, where the subset of features to be queried is chosen based on the currently observed features, acquired as a batch, and then used for prediction. We solve this problem using several technical innovations. First, we use a feature generator to draw a subset of synthetic features for some examples, which reduces the cost of oracle queries. Second, to keep the feature acquisition problem tractable when the observed features are large and heterogeneous, we partition the data into buckets by borrowing tools from locality-sensitive hashing and then train a mixture-of-experts model. Third, we design a tractable lower bound of the original objective. We solve the underlying problem with a greedy algorithm combined with model training. Experiments on four datasets show that our approach outperforms existing methods in terms of the trade-off between accuracy and feature acquisition cost.
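The bucketing-plus-experts step can be illustrated with a minimal sketch. The snippet below assigns examples to buckets by random-hyperplane (SimHash) hashing of their observed features and fits a trivial nearest-class-mean "expert" per bucket. The function names, the choice of SimHash, and the toy experts are illustrative assumptions, not the paper's actual models, training objective, or feature-selection procedure.

```python
import numpy as np

def simhash_bucket(x, hyperplanes):
    """Assign a feature vector to a bucket via random-hyperplane (SimHash) LSH:
    the sign pattern of projections onto random hyperplanes is the bucket index."""
    bits = (x @ hyperplanes.T) > 0                    # one sign bit per hyperplane
    return int("".join("1" if b else "0" for b in bits), 2)

rng = np.random.default_rng(0)
d, n_planes = 16, 3                                   # 3 hyperplanes -> up to 8 buckets
hyperplanes = rng.standard_normal((n_planes, d))

# Toy "observed feature" matrix: rows are examples described by their observed features.
X_obs = rng.standard_normal((100, d))
y = (X_obs[:, 0] > 0).astype(int)                     # toy labels

# Partition examples into buckets; each bucket gets its own expert.
buckets = np.array([simhash_bucket(x, hyperplanes) for x in X_obs])

# Minimal per-bucket "expert": a class-conditional mean predictor standing in
# for whatever predictive model each expert would actually be in the method.
experts = {}
for b in np.unique(buckets):
    mask = buckets == b
    experts[b] = {c: X_obs[mask][y[mask] == c].mean(axis=0)
                  for c in np.unique(y[mask])}

def predict(x):
    """Route an example to its bucket's expert and pick the nearest class mean."""
    e = experts.get(simhash_bucket(x, hyperplanes))
    if e is None:                                     # unseen bucket: arbitrary fallback
        return 0
    return min(e, key=lambda c: np.linalg.norm(x - e[c]))

print(predict(X_obs[0]), y[0])
```

In the paper's setting, each expert would be a trained model for its bucket, and the greedy algorithm (guided by the tractable lower bound and the feature generator) would choose which unobserved features to query in batch for examples routed to that bucket; the sketch above only shows the routing skeleton.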

