Multi-modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation