White-Boxing and Personalization of Music Recommendation Models through Attention Visualization and Retraining

Tasuku Okada; Hiromitsu Shimakawa; Fumiko Harada

Published: Mar 8, 2026

DOI: https://doi.org/10.47577/tssj.v81i1.13491

Keywords:

Attention, Triplet Loss, Retraining, Music Recommendation, BERT, Personalization

Tasuku Okada

Ritsumeikan University

Hiromitsu Shimakawa

Ritsumeikan University

Fumiko Harada

Ritsumeikan University

Abstract

The widespread adoption of music streaming services gives users easy access to a vast number of songs. However, identifying music that matches a user’s situational context or emotional state remains challenging. Recent text-based recommendation approaches using models such as BERT have made significant progress, but when a recommendation fails, users often cannot identify the reason, which leaves a recognition gap between users and the model. To address this issue, this study proposes an explainable and user-adaptive music recommendation method that enables users to understand recommendation rationales and refine the model through feedback. First, the model is pre-trained with Triplet Loss on YouTube comments to capture music-related nuances. It then performs classification-based learning that takes user comments as input and outputs song recommendations. The method visualizes the self-attention weights in BERT as heatmaps, which highlight the words the model focuses on during the recommendation process. Based on these visualizations, users can specify words to be strengthened or weakened. The model is then retrained using feedback-driven triplets, which modifies both the attention distributions and the recommendation results. Experiments examined changes in attention distribution under different learning methods and differences in recommendation results for similar queries, and included a human evaluation. The results show that attention modification for user-specified words succeeds in approximately 60–70% of cases; under conditions reflecting practical use, the success rate rises to around 80%. Post-retraining recommendations achieve high user satisfaction for top-ranked results, with the highest-ranked recommendation attaining an average satisfaction score of 4.15. The proposed method allows users to point out model errors, which steers the model toward one suited to their preferences.
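As a rough illustration of the Triplet Loss objective mentioned in the abstract, the sketch below computes the standard margin-based form, max(0, d(a, p) - d(a, n) + margin), on toy vectors. The Euclidean distance, the margin of 1.0, the function names, and the example vectors are all assumptions made for illustration; the abstract does not state the distance, margin, or embedding setup the authors used.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (a hypothetical value here).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-D "embeddings"; in the paper's setting these would come from BERT.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive, negative)  # already well separated
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped: violates the margin
print(loss_easy, loss_hard)
```

In a feedback-driven setting like the one the abstract describes, the positive and negative examples would presumably be built from the words a user asks to strengthen or weaken, but how the authors construct those triplets is not specified here.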

How to Cite

Okada, T., Shimakawa, H., & Harada, F. (2026). White-Boxing and Personalization of Music Recommendation Models through Attention Visualization and Retraining. Technium Social Sciences Journal, 81(1), 67–81. https://doi.org/10.47577/tssj.v81i1.13491