Audio Analysis
MUKU uses AI for:
Genre Analysis
Mood Analysis
Model
Genre Analysis
wav2vec 2.0
Mood Analysis (Multi-modal Model)
Whisper (Speech-to-Text for Lyrics Extraction)
BERT (Text-based Mood Analysis)
music2emo (Audio-based Mood Analysis)
Dataset Label
Genre Analysis (12 Genres)
Blues, EDM, Folk & Country, Funk & Soul, Jazz, K-pop, Latin, Metal, Pop, Rap Hip-Hop, Rock, R&B
Mood Analysis (13 Moods)
Aggressive, Calm, Chilled, Dark, Energetic, Epic, Ethereal, Happy, Romantic, Sad, Scary, Sexy, Uplifting
Audio Analysis Pipeline
Audio Feature Extraction
Raw audio waveforms are processed by wav2vec 2.0 to extract high-level audio embeddings capturing rhythm, timbre, and structural patterns for genre classification.
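As a rough illustration of this step, the sketch below pulls frame-level embeddings from a public wav2vec 2.0 checkpoint through the Hugging Face transformers library and averages them over time into one clip-level vector. The checkpoint name "facebook/wav2vec2-base", the 16 kHz sampling rate, the mean-pooling choice, and the helper names are all assumptions made for illustration; MUKU's actual checkpoint and pooling strategy are not described here.

```python
from statistics import fmean


def mean_pool(frames):
    """Average frame-level vectors over time into one clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    # zip(*frames) groups values by embedding dimension across all frames.
    return [fmean(column) for column in zip(*frames)]


def wav2vec_frames(waveform, sampling_rate=16000,
                   checkpoint="facebook/wav2vec2-base"):
    """Return wav2vec 2.0 frame embeddings for a mono waveform.

    `waveform` is a 1-D sequence of floats. The default checkpoint is an
    assumption for illustration, not necessarily the one MUKU uses.
    """
    # Heavy dependencies are imported lazily so mean_pool works without them.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
    model = Wav2Vec2Model.from_pretrained(checkpoint)
    model.eval()

    inputs = extractor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        # Shape: (1, num_frames, hidden_size)
        hidden = model(**inputs).last_hidden_state
    return hidden[0].tolist()
```

A genre classifier would then take `mean_pool(wav2vec_frames(waveform))` as input and map it to one of the 12 genre labels; that classification head is not shown.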
Audio-based Mood Analysis
In parallel, music2emo analyzes acoustic features to estimate emotion and mood-related characteristics directly from the audio signal.
Lyrics Extraction & Analysis
Whisper transcribes sung vocals into text, which is then analyzed by BERT to infer semantic and emotional information from lyrics.
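The lyrics branch can be sketched as two stages chained together: transcription, then text classification. The example below is a minimal outline using Hugging Face transformers pipelines. The Whisper checkpoint "openai/whisper-small" is an assumed choice, and "your-org/bert-mood-classifier" is a hypothetical placeholder for a BERT model fine-tuned on the 13 mood labels; neither is confirmed as what MUKU runs. The function names are likewise illustrative.

```python
def lyrics_mood(audio_path, transcribe, classify):
    """Transcribe a track and score its lyrics for mood.

    `transcribe` maps an audio path to text; `classify` maps text to a
    {mood_label: score} dict. Both are injected so they can be swapped.
    """
    text = transcribe(audio_path).strip()
    if not text:
        # Instrumental track or failed transcription: no text-based signal.
        return {"lyrics": "", "moods": {}}
    return {"lyrics": text, "moods": classify(text)}


def build_hf_components(asr_checkpoint="openai/whisper-small",
                        mood_checkpoint="your-org/bert-mood-classifier"):
    """Build transcribe/classify callables from transformers pipelines.

    `mood_checkpoint` is a hypothetical placeholder, not a real model ID.
    """
    # Imported lazily so lyrics_mood itself has no heavy dependencies.
    from transformers import pipeline

    # chunk_length_s lets Whisper handle audio longer than its 30 s window.
    asr = pipeline("automatic-speech-recognition", model=asr_checkpoint,
                   chunk_length_s=30)
    clf = pipeline("text-classification", model=mood_checkpoint, top_k=None)

    def transcribe(path):
        # Passing a file path requires ffmpeg to be installed.
        return asr(path)["text"]

    def classify(text):
        # truncation=True keeps long lyrics within BERT's input limit.
        scores = clf(text, truncation=True)
        # Some versions wrap single-input results in an outer list.
        if scores and isinstance(scores[0], list):
            scores = scores[0]
        return {item["label"]: item["score"] for item in scores}

    return transcribe, classify
```

Typical use would be `transcribe, classify = build_hf_components()` followed by `lyrics_mood("track.wav", transcribe, classify)`. Injecting the two callables keeps the orchestration logic independent of any particular model.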
Multimodal Fusion
Audio-based and text-based representations are combined to produce robust genre and mood predictions.
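The fusion method itself is not specified above. One common option is late fusion, where each branch outputs a probability per label and the two are blended by a weighted average. The sketch below shows that approach for the mood labels only; the function names, the 0.5 default weight, and the fallback to audio-only scores when no lyrics are available are all assumptions for illustration.

```python
MOODS = [
    "Aggressive", "Calm", "Chilled", "Dark", "Energetic", "Epic", "Ethereal",
    "Happy", "Romantic", "Sad", "Scary", "Sexy", "Uplifting",
]


def fuse_moods(audio_probs, text_probs, audio_weight=0.5):
    """Blend audio-based and text-based mood scores by weighted average.

    Both inputs are {mood_label: probability} dicts; missing labels count
    as 0. The weighting scheme is an assumed design, not MUKU's documented one.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    if not text_probs:
        # No lyrics (e.g. instrumental track): rely on the audio branch only.
        audio_weight = 1.0

    fused = {
        mood: audio_weight * audio_probs.get(mood, 0.0)
        + (1.0 - audio_weight) * text_probs.get(mood, 0.0)
        for mood in MOODS
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("no probability mass on any known mood label")
    # Renormalize so the fused scores sum to 1.
    return {mood: value / total for mood, value in fused.items()}


def top_mood(fused):
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)
```

With equal weights, an audio branch that leans "Happy" (0.6 vs 0.4) and a lyrics branch that leans strongly "Sad" (0.8 vs 0.2) fuse to "Sad", since the text branch is the more confident of the two.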