Unknown Training Procedures for Proprietary API Embedding Models
Determine how the proprietary API embedding models OpenAI Text-Embedding-v3-Large, Cohere v3 English, and Google Gecko were trained, and in particular whether their training included instruction-following data, so that their instruction-following capabilities in information retrieval settings can be categorized and evaluated accurately.
References
It is mostly unknown what these models' training procedures were---including if they were trained on instructions or not---thus we place them in a distinct category. However, we note that Google's model did explicitly train with instructions, as mentioned in their technical report.
Source: FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions (arXiv 2403.15246, Weller et al., 22 Mar 2024), Section 4.1 (Evaluation Settings), API Models paragraph