Audio Analysis

MUKU uses AI for:

  • Genre Analysis

  • Mood Analysis

Models

  • Genre Analysis

    • wav2vec 2.0

  • Mood Analysis (Multi-modal Model)

    • Whisper (Speech-to-Text for Lyrics Extraction)

    • BERT (Text-based Mood Analysis)

    • music2emo (Audio-based Mood Analysis)

Dataset Labels

  • Genre Analysis (12 Genres)

    • Blues

    • EDM

    • Folk & Country

    • Funk & Soul

    • Jazz

    • K-pop

    • Latin

    • Metal

    • Pop

    • Rap Hip-Hop

    • Rock

    • R&B

  • Mood Analysis (13 Moods)

    • Aggressive

    • Calm

    • Chilled

    • Dark

    • Energetic

    • Epic

    • Ethereal

    • Happy

    • Romantic

    • Sad

    • Scary

    • Sexy

    • Uplifting
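The two label sets above can be written down as plain lookup tables. This is only a sketch: the label names come from the lists in this section, but the index order and the names `GENRES`, `MOODS`, `GENRE_TO_ID`, and `MOOD_TO_ID` are assumptions made for illustration, not MUKU's actual encoding.

```python
# Label names are taken from the lists above; the index order is an assumption.
GENRES = [
    "Blues", "EDM", "Folk & Country", "Funk & Soul", "Jazz", "K-pop",
    "Latin", "Metal", "Pop", "Rap Hip-Hop", "Rock", "R&B",
]
MOODS = [
    "Aggressive", "Calm", "Chilled", "Dark", "Energetic", "Epic", "Ethereal",
    "Happy", "Romantic", "Sad", "Scary", "Sexy", "Uplifting",
]

# Hypothetical mappings between label names and class indices.
GENRE_TO_ID = {name: index for index, name in enumerate(GENRES)}
MOOD_TO_ID = {name: index for index, name in enumerate(MOODS)}
```

A classifier head would then emit one score per index, and the predicted label is recovered with `GENRES[index]` or `MOODS[index]`.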

Audio Analysis Pipeline

  1. Audio Feature Extraction

Raw audio waveforms are processed by wav2vec 2.0 to extract high-level audio embeddings capturing rhythm, timbre, and structural patterns for genre classification.
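This section does not say how the wav2vec 2.0 embeddings are turned into a genre label. One common approach, shown here purely as an assumption, is to average the per-frame embeddings into a single clip-level vector and pass it through a linear layer with a softmax. The functions `mean_pool`, `softmax`, and `classify_genre` are hypothetical, and the frames and weights are toy values standing in for real model output.

```python
import math


def mean_pool(frames):
    """Average per-frame embedding vectors into one clip-level vector."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify_genre(frames, weights, bias):
    """Assumed linear head: one weight row and one bias per genre."""
    pooled = mean_pool(frames)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy stand-in for encoder output: 2 frames, 2 embedding dimensions.
frames = [[1.0, 0.0], [3.0, 2.0]]
# Toy head with 2 classes; a real head would have 12 rows, one per genre.
probs = classify_genre(frames, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
```

In this toy case the pooled vector is `[2.0, 1.0]`, so the first class receives the higher probability.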

  2. Audio-based Mood Analysis

In parallel, music2emo analyzes acoustic features to estimate emotion and mood-related characteristics directly from the audio signal.

  3. Lyrics Extraction & Analysis

Whisper transcribes sung vocals into text, which is then analyzed by BERT to infer semantic and emotional information from lyrics.
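The audio branch and the lyrics branch take the same input and do not depend on each other, so they can be run side by side. The sketch below shows one way to arrange that. It assumes the models are supplied as plain callables (`audio_model`, `transcribe`, `classify_text` are hypothetical parameter names), and the choice of a thread pool is an illustration rather than a description of how MUKU schedules the work.

```python
from concurrent.futures import ThreadPoolExecutor


def lyrics_branch(audio, transcribe, classify_text):
    """Speech-to-text first, then mood analysis on the transcript."""
    return classify_text(transcribe(audio))


def run_mood_branches(audio, audio_model, transcribe, classify_text):
    """Run the audio-based and lyrics-based mood branches concurrently.

    Returns a tuple: (audio_result, text_result).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        audio_future = pool.submit(audio_model, audio)
        text_future = pool.submit(lyrics_branch, audio, transcribe, classify_text)
        return audio_future.result(), text_future.result()


# Stand-in callables so the sketch runs without any model installed.
audio_result, text_result = run_mood_branches(
    audio=[0.0, 0.1, -0.1],
    audio_model=lambda samples: {"Calm": 1.0},
    transcribe=lambda samples: "placeholder lyrics",
    classify_text=lambda text: {"Sad": 1.0},
)
```

Passing the models in as arguments keeps the orchestration independent of any particular Whisper, BERT, or music2emo interface.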

  4. Multimodal Fusion

Audio-based and text-based representations are combined to produce robust genre and mood predictions.
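The fusion method is not specified here. A simple option, used below only as an illustrative assumption, is late fusion: each branch produces a probability per mood label and the two distributions are blended with a weighted average. The function names `fuse_moods` and `top_mood`, the `audio_weight` parameter, and the example probabilities are all hypothetical.

```python
def fuse_moods(audio_probs, text_probs, audio_weight=0.5):
    """Blend two mood distributions defined over the same labels.

    audio_weight is the share given to the audio branch; the text branch
    receives the remainder (1 - audio_weight).
    """
    if audio_probs.keys() != text_probs.keys():
        raise ValueError("both branches must score the same mood labels")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    text_weight = 1.0 - audio_weight
    return {
        mood: audio_weight * audio_probs[mood] + text_weight * text_probs[mood]
        for mood in audio_probs
    }


def top_mood(probs):
    """Return the label with the highest fused probability."""
    return max(probs, key=probs.get)


# Made-up scores for three of the thirteen moods.
audio = {"Happy": 0.6, "Sad": 0.3, "Calm": 0.1}
text = {"Happy": 0.2, "Sad": 0.7, "Calm": 0.1}
fused = fuse_moods(audio, text)
```

With equal weights the lyrics pull this example toward "Sad", while a higher `audio_weight` such as 0.9 favors "Happy". A learned fusion layer over the embeddings would be an alternative design, but it needs training data that a weighted average does not.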
