Pretraining a specialized tokenizer for camera maps
Develop and pretrain a tokenizer specialized for pixel-wise camera maps (e.g., Perspective Field encodings of the per-pixel up-vector and latitude), rather than reusing the image VAE tokenizer, in order to improve the fidelity of geometric conditioning in the Puffin framework.
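To make the kind of input such a tokenizer would consume concrete, the following is a minimal sketch of a pixel-wise camera map for an ideal pinhole camera. It is an illustration only: the function name `perspective_field`, the axis convention (x right, y down, z forward), the sign conventions for pitch and roll, and the three-channel layout are assumptions made here and are not taken from the Puffin implementation.

```python
import numpy as np

def perspective_field(height, width, vfov_deg, pitch_deg=0.0, roll_deg=0.0):
    """Return an (H, W, 3) map: image-space up-vector (2 channels) + latitude in radians.

    Assumed conventions (hypothetical, for illustration):
      - camera axes: x right, y down, z forward
      - positive pitch tilts the camera upward; roll rotates about the optical axis
    """
    pitch, roll = np.deg2rad(pitch_deg), np.deg2rad(roll_deg)

    # Focal length in pixels from the vertical field of view.
    f = 0.5 * height / np.tan(0.5 * np.deg2rad(vfov_deg))
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0

    # Normalized image coordinates of every pixel.
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - cx) / f
    y = (v - cy) / f

    # World "up" direction expressed in the camera frame.
    up = np.array([
        -np.sin(roll) * np.cos(pitch),
        -np.cos(roll) * np.cos(pitch),
        np.sin(pitch),
    ])

    # Unit viewing ray through each pixel.
    rays = np.stack([x, y, np.ones_like(x)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Latitude: angle between the viewing ray and the horizontal plane.
    latitude = np.arcsin(np.clip(rays @ up, -1.0, 1.0))

    # Up-vector: image-space direction in which a point on the ray moves
    # when displaced along world up (derivative of the pinhole projection).
    up_img = np.stack([up[0] - x * up[2], up[1] - y * up[2]], axis=-1)
    up_img /= np.linalg.norm(up_img, axis=-1, keepdims=True) + 1e-8

    return np.concatenate([up_img, latitude[..., None]], axis=-1)
```

For example, `perspective_field(256, 256, vfov_deg=60, pitch_deg=15)` yields a dense three-channel map with the same spatial layout as an image, which is the kind of input a camera-map tokenizer would be expected to encode.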
References
Pretraining a specialized tokenizer for camera maps is left as future work.
— Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
(2510.08673 - Liao et al., 9 Oct 2025) in Section 6.1 (Implementation Details), Network Configuration