Spotify built its reputation in the audio-streaming industry on a highly personalized user experience powered by artificial intelligence, and it employed roughly 9,800 people at the end of 2022.
But after three rounds of layoffs in a single year (590 positions in January, 200 in June and another 1,500 recently), Spotify’s bet on AI to improve profit margins in its podcasting and audiobook businesses marks a significant shift in strategy, and one Wall Street believes can succeed.
According to Justin Patterson, an equity research analyst at KeyBanc Capital Markets, “Spotify is leveraging AI throughout its platform, introducing AI DJ to simulate a traditional radio experience in 50 more markets and introducing AI Voice Translation for podcasts. Alongside the introduction of audiobooks for Premium Subscribers, we believe Spotify has multiple opportunities to enhance user engagement and ultimately achieve stronger monetization.”
Shares of Spotify’s parent company, Spotify Technology SA, have surged more than 30% in the past six months and are up more than 135% year to date.
Like many other tech firms, Spotify is retrenching as the surge in demand initially driven by the pandemic fades. The company is also still recovering from the more than $1 billion it poured into podcasting, much of which went to celebrity podcast deals that never materialized and to acquisitions of podcast studios that were later shut down.
In a letter to the company’s staff, posted on its website, Spotify CEO Daniel Ek acknowledged the challenging economic landscape, stating that “economic growth has slowed dramatically, and capital has become more expensive. Spotify is not immune to these economic realities.”
Capitalizing on the AI boom.
In November, Spotify announced a collaboration with Google Cloud to revamp its audiobook and podcast recommendation system using Google Cloud’s Vertex AI Search, which incorporates large language models. These models, the technology behind tools such as ChatGPT, are trained on vast datasets and can generate human-like text based on the patterns they have learned.
Spotify introduced its “AI DJ” feature in February, and it has also used OpenAI’s “Whisper” speech-recognition and translation technology to translate select episodes of English-language podcasts into Spanish, French, and German.
A Spotify spokesperson told CNN in an email that the company intends to expand the technology, depending on feedback from content creators and audiences. The spokesperson also pointed to comments CEO Daniel Ek made during the third-quarter earnings call, in which he repeatedly emphasized “efficiency.”
During that October earnings call, Ek said the primary objective of the company’s AI initiatives is to deepen user engagement, which in turn reduces customer attrition, or churn. Greater engagement, he explained, creates more value for consumers, and that improved value-to-price ratio is what allowed Spotify to raise prices successfully in the previous quarter.
Douglas Anmuth, the Managing Director and Internet Analyst at JP Morgan, noted in a research note that investments in podcasts, coupled with artist-driven advertising investments, have the potential to drive long-term user engagement, underscoring their importance for the platform’s growth and success.
How does the personalization work?
For approximately a decade, Spotify has been delivering an extremely personalized user experience. This level of personalization became possible after the acquisition of music analytics firm The Echo Nest Corp in 2014, which allowed Spotify to combine machine learning and natural language processing techniques.
Spotify’s technology works by building a comprehensive database of songs and artists, recognizing musical elements such as pitch and tempo and mapping connections between artists who share a cultural context. Additional metadata, such as release dates, along with metrics like volume, duration, and a song’s propensity to make someone dance, help determine which songs align with a user’s preferences.
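In principle, that kind of feature-based matching can be illustrated in a few lines of code. The sketch below uses made-up feature names and values rather than Spotify’s internal schema, and simply ranks a tiny catalog by similarity to a listener’s taste profile.

```python
import numpy as np

# Toy catalog: each track described by a few audio features of the kind
# discussed above (tempo, volume, danceability), scaled to 0-1.
# Names and values are illustrative, not Spotify's internal data.
catalog = {
    "Track A": np.array([0.62, 0.80, 0.91]),  # fast, loud, very danceable
    "Track B": np.array([0.35, 0.40, 0.20]),  # slow, quiet, mellow
    "Track C": np.array([0.58, 0.75, 0.85]),
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# A simple "taste profile": the average features of the tracks a user plays most.
user_profile = np.mean([catalog["Track A"], catalog["Track C"]], axis=0)

# Rank the catalog by similarity to that profile.
ranked = sorted(catalog, key=lambda t: cosine(catalog[t], user_profile), reverse=True)
print(ranked)  # Track A and Track C score higher than Track B
```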
Based on this data, Spotify generates playlists like “Daily Mix” and “Discover Weekly.” It also creates playlists like “Time Capsules” and “On Repeat” that gather a user’s most-listened-to songs, either to keep users engaged with their current preferences or to reintroduce them to songs they haven’t heard in a while.
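In spirit, a playlist like “On Repeat” is little more than a ranking of recent play counts, while a throwback playlist favors tracks that were heavily played in the past but not lately. A rough, purely illustrative sketch (the listening history is hypothetical, not Spotify’s actual logic):

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical listening history: (track, date played).
history = [
    ("Song X", date(2023, 12, 1)), ("Song X", date(2023, 12, 2)),
    ("Song Y", date(2023, 12, 2)), ("Song Z", date(2023, 6, 1)),
    ("Song Z", date(2023, 5, 20)), ("Song Z", date(2023, 5, 21)),
]

today = date(2023, 12, 5)
recent_cutoff = today - timedelta(days=30)

recent = Counter(t for t, d in history if d >= recent_cutoff)
older = Counter(t for t, d in history if d < recent_cutoff)

on_repeat = [t for t, _ in recent.most_common(2)]                     # current favorites
throwback = [t for t, _ in older.most_common(2) if t not in recent]   # not heard in a while

print(on_repeat)   # ['Song X', 'Song Y']
print(throwback)   # ['Song Z']
```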
Anil Jain, the Global Managing Director of Strategic Consumer Industries at Google Cloud, said in an email to CNN that its Vertex AI Search technology enables media and entertainment companies to build content discovery capabilities across various types of content, including video, audio, images, and text. However, Jain did not provide specific details about the arrangement with Spotify.
Vertex AI Search takes into account multiple factors when suggesting content to users, including real-time user behavior, content similarity, and content related to users’ search queries.
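That description amounts to blending several relevance signals into a single ranking score. The sketch below shows one generic way such a blend could work; the weights and signal names are illustrative assumptions, not how Vertex AI Search is actually implemented.

```python
# Illustrative only: a generic weighted blend of the three signals described above
# (recent behavior, content similarity, query relevance). Not Google's implementation.
WEIGHTS = {"behavior": 0.4, "similarity": 0.35, "query": 0.25}

def blended_score(signals: dict) -> float:
    """signals maps each factor name to a 0-1 relevance estimate."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

candidates = {
    "True-crime podcast": {"behavior": 0.9, "similarity": 0.7, "query": 0.2},
    "Sci-fi audiobook":   {"behavior": 0.3, "similarity": 0.4, "query": 0.9},
}

ranking = sorted(candidates, key=lambda c: blended_score(candidates[c]), reverse=True)
print(ranking)
```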
Challenges and potential advantages.
Reece Hayden, a senior analyst at ABI Research, is optimistic about the potential of large language models (LLMs) to enhance engagement on Spotify’s platform. He believes that LLMs can contribute to better personalization and recommendations by understanding the entirety of text and video content, rather than relying solely on keywords and metadata. LLMs can assess podcasts comprehensively to determine if they align with user interests and can gain deeper insights into user preferences by analyzing all available user data.
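One generic way to act on whole transcripts rather than keywords is to embed the text and compare it to a profile built from what a user already listens to. The sketch below uses the open-source sentence-transformers library to illustrate the idea; the transcripts and profile text are hypothetical, and this is not Spotify’s pipeline.

```python
# Illustrative sketch, not Spotify's system: embed whole podcast transcripts and
# compare them to a profile built from content the user already enjoys.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

transcripts = {
    "Episode 1": "A deep dive into the history of jazz and improvisation...",
    "Episode 2": "Weekly analysis of football transfers and match tactics...",
}
# Hypothetical profile text assembled from shows the user already listens to.
user_profile = "Interviews with musicians about songwriting and music history."

profile_vec = model.encode(user_profile, convert_to_tensor=True)
for title, text in transcripts.items():
    score = util.cos_sim(model.encode(text, convert_to_tensor=True), profile_vec)
    print(title, float(score))  # Episode 1 should score higher for this listener
```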
However, there are challenges. Running LLMs over every podcast and audiobook is resource-intensive and may not always produce significantly better results than simpler predictive models, and LLMs also raise data privacy concerns and can carry substantial costs.
Hayden expressed confidence in the Whisper translation tool’s ability to translate podcasts, but acknowledged that generative AI like Whisper may make occasional mistakes, resulting in inaccuracies or misinterpretations. He believes that as more data becomes available, translation models like Whisper will rapidly improve their accuracy. Nevertheless, Whisper’s primary strength lies in translating from other languages into English, which may limit its effectiveness for podcasts predominantly recorded in English.
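The into-English limitation Hayden describes is visible in the open-source Whisper package itself, whose built-in translation task only targets English. A minimal usage sketch (the audio file path is a placeholder):

```python
# Minimal sketch using the open-source openai-whisper package
# (pip install openai-whisper; also requires ffmpeg on the system).
# Note: Whisper's built-in "translate" task only translates speech INTO English,
# which is the constraint Hayden points to for English-language podcasts.
import whisper

model = whisper.load_model("base")

# "episode.mp3" is a placeholder path for a non-English podcast episode.
result = model.transcribe("episode.mp3", task="translate")
print(result["text"])  # English translation of the spoken audio
```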