TY - JOUR T1 - Diffusion-based Audio-to-Visual Generation for High-Quality Bird Images AU - Toleubekova, Adel AU - Shim, Joo Yong AU - Piao, XinYu AU - Kim, Jong-Kook JO - KIPS Transactions on Software and Data Engineering PY - 2025 DA - 2025/1/30 DO - https://doi.org/10.3745/TKIPS.2025.14.3.135 KW - Audio-to-visual generation KW - Diffusion models KW - Image generation KW - Audio features KW - Multi-modal generation AB - Accurately identifying bird species from their vocalizations and generating corresponding bird images is still a challenging task due to limited training data and environmental noise in audio data. To address this limitation, this paper introduces a diffusion-based audio-to-image generation approach that satisfies both the need to accurately identify bird sounds and generate bird images. The main idea is to use a conditional diffusion model to handle the complexities of bird audio data, such as pitch variations and environmental noise while establishing a robust connection between the auditory and visual domains. This enables the model to generate high-quality bird images based on the given bird audio input. Plus, the proposed approach is integrated with deep audio processing to enhance its capabilities by meticulously aligning audio features with visual information and learning to map intricate acoustic patterns to corresponding visual representations. Experimental results demonstrate the effectiveness of the proposed approach in generating better images for bird classes compared to previous methods