Unleashing the Power of Cloud and AI: Automating Music Discovery with a Smartphone Camera
In this post, we'll explore how cloud computing and artificial intelligence can be harnessed to transform the way you discover and access new music. Learn how to use your smartphone camera to scan album covers, leverage AWS services like S3 and Rekognition to automatically identify the album, and then seamlessly integrate with the Spotify API to start listening. Discover the power of cloud-based solutions and AI-driven technology in enhancing your music experience.
Archit Soni
Amazon Employee
Published Sep 19, 2024
Last Modified Sep 23, 2024
Get ready to take your music experience to the next level! Imagine being able to scan an album art cover and instantly play the corresponding music on Spotify. With the power of Amazon S3, Rekognition, and Lambda, we can bring this innovative idea to life. In this community post, we'll dive into the technical details of how to build a system that uses computer vision to recognize album art covers, which then seamlessly plays the matching song on Spotify. From scanning to streaming, we'll explore the step-by-step process of creating this solution, as well as uncover the possibilities that emerge when AI, cloud technology, and music come together. Building this project is also a fun way to learn about various AWS services and how to integrate them.
- You need an AWS account to deploy this solution. If you don’t have an existing account, you can sign up for one. The instructions in this post use the AWS Region us-east-1. Make sure you deploy your resources in a Region with AWS Machine Learning services available.
- Set up the Boto3 AWS SDK and Python: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
- Before proceeding, make sure you have the necessary permissions to utilize Amazon S3, AWS Lambda, and Amazon Rekognition. You can refer to the AWS documentation on IAM access management to ensure your credentials have the required permissions:
https://docs.aws.amazon.com/IAM/latest/UserGuide/access.html
- The use of these services will incur costs. If you access them through your own AWS account, you will be responsible for paying those costs.
- You will need to provide your own front-end or mobile application for the purpose of uploading the image to Amazon S3. An example mobile application is discussed below.
- For the purposes of an end-to-end solution, we recommend having a front-end set up where your users can upload images that they want detected and labeled. A sample mobile app is provided later in this post. To learn more about front-end deployment on AWS, refer to Front-end Web & Mobile on AWS.
- The picture taken by the user is stored in an Amazon Simple Storage Service (Amazon S3) bucket. This S3 bucket should be configured with a lifecycle policy that expires images shortly after upload (lifecycle rules run on a schedule measured in days, so deletion is not immediate). To learn more about S3 lifecycle policies, see Managing your storage lifecycle.
- This architecture uses an AWS Lambda function that serves as the business logic for this solution. The Lambda function harnesses the power of Amazon Rekognition by using the Boto3 Python API. Amazon Rekognition is a cutting-edge computer vision service that uses machine learning (ML) models to analyze the uploaded images.
- We use Rekognition Custom Labels so that this solution can fit a personalized use case. With the aid of custom labels specifically trained to recognize various album covers, Amazon Rekognition accurately identifies the items present in the images.
- The album names are stored as keys in an Amazon DynamoDB table, along with their Spotify URIs. DynamoDB is a fully managed NoSQL database service. When a user scans an album, Rekognition detects the cover and responds with the label (that is, the album name). Lambda then queries DynamoDB for the corresponding Spotify URI to play the album.
- Spotify is a music streaming platform that also offers an API, enabling developers to create applications that leverage its capabilities. In our use case, we make HTTP requests to Spotify's endpoint to specify which album should be played. This information is retrieved by a Lambda function through a DynamoDB lookup. Once Spotify authorization is obtained, the requested album begins playing.
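The lifecycle policy mentioned in the list above can also be set with Boto3. The following is a minimal sketch, not a definitive configuration: the `uploads/` prefix, the one-day expiration, and the rule ID are assumptions chosen for illustration.

```python
def build_lifecycle_config(prefix="uploads/", days=1):
    """Return an S3 lifecycle configuration that expires objects under `prefix`."""
    if days < 1:
        raise ValueError("S3 lifecycle expiration is expressed in whole days (>= 1)")
    return {
        "Rules": [
            {
                "ID": "expire-scanned-album-covers",  # hypothetical rule name
                "Filter": {"Prefix": prefix},         # hypothetical key prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(s3_client, bucket, config):
    """Attach the lifecycle configuration to the given bucket."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

With real credentials this could be called as `apply_lifecycle(boto3.client("s3"), "your-bucket-name", build_lifecycle_config())`, where the bucket name is a placeholder.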
Amazon Rekognition is a service that makes it easy to add powerful visual analysis to your applications. Rekognition offers pre-trained and customizable Computer Vision (CV) capabilities to allow users to detect information and gain insights from their images. Rekognition Image lets you easily build powerful applications to search, verify, and organize millions of images by classifying objects, scenes, activities, landmarks, faces, dominant colors, and image quality.
For further customization, you can use Amazon Rekognition Custom Labels. With Custom Labels, you can identify scenes and objects in your images that are specific to your business needs. Developing a custom model to analyze images is a significant undertaking that requires time, expertise, and resources, often taking months to complete. With Custom Labels, we take care of the heavy lifting for you.
Rekognition set-up:
- On the Amazon Rekognition Custom Labels console, select 'Projects' from the left sidebar.
- Click 'Create Project' and enter a project name.
- On the Project page, click 'Create Dataset'.
- Select the option 'Start with a training dataset and test dataset' to have more control over the training and testing images.
- Upload images of each album cover you want the model to recognize, photographed from various angles.
- For the training dataset, label the images based on the corresponding album names.
- Click 'Train Model' to start the training process.
- Review the performance metrics to ensure the model can accurately label the test images.
- Once training is successful, click on the model and navigate to the 'Use Model' section.
- Click 'Start' to begin using the custom image recognition model to detect the album covers it was trained on.
- The custom Rekognition model is now set up and ready to use for your application.
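Once the model is running, it can be queried from Python through the Boto3 `detect_custom_labels` call. The sketch below assumes the image is already in S3 and that the label with the highest confidence is the album to use; the function name and the 80% confidence threshold are illustrative choices, not values from the original project.

```python
def top_album_label(rekognition, model_arn, bucket, key, min_confidence=80.0):
    """Return the most confident custom label for an image in S3, or None."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=model_arn,  # ARN of the trained model version
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,  # assumed threshold; tune for your model
    )
    labels = response.get("CustomLabels", [])
    if not labels:
        return None
    best = max(labels, key=lambda label: label["Confidence"])
    return best["Name"]
```

Here `rekognition` would typically be `boto3.client("rekognition")`. Because a running Custom Labels model is billed while it is started, it is worth stopping the model when you are not using it.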
We will create a Lambda function that serves as the business logic for the solution. A mobile or web app uploads album artwork to an S3 bucket, and the upload triggers the Lambda function. The function uses Amazon Rekognition Custom Labels to detect the album name, looks up the album URI in a DynamoDB table, and then calls the Spotify API to play the album.
For our Lambda function to run successfully, it requires an AWS Identity and Access Management (IAM) execution role and policy with the appropriate permissions. Create and attach a Lambda execution role that lets the function perform the necessary actions on Rekognition, S3, and DynamoDB.
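As a rough guide to what such a policy might contain, the sketch below builds a least-privilege policy document as a Python dictionary. It assumes the function only needs to read uploaded images, call the custom model, and read one table; the statement IDs and the function name are illustrative, and your own function may need additional actions.

```python
def build_execution_policy(bucket, table_arn, model_arn):
    """Return an IAM policy document covering the three service calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadUploadedImages",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "DetectAlbumCovers",
                "Effect": "Allow",
                "Action": ["rekognition:DetectCustomLabels"],
                "Resource": model_arn,
            },
            {
                "Sid": "LookUpAlbumUri",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem"],
                "Resource": table_arn,
            },
        ],
    }
```

Passing the result through `json.dumps(..., indent=2)` yields JSON that can be pasted into the IAM policy editor. Logging to CloudWatch is covered separately by the AWS managed policy AWSLambdaBasicExecutionRole.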
Lambda set-up:
- On the Lambda console, choose Functions in the navigation pane.
- Choose Create function.
- Choose Author from scratch.
- Name your function, choose Python 3.8 for Runtime (or a newer Python 3 runtime if 3.8 is no longer offered in the console), and choose Create function.
- Replace the text in Lambda function code with the following sample code and choose Save:
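A minimal sketch of such a handler is shown here, under several assumptions: the environment variable names `MODEL_ARN`, `TABLE_NAME`, and `SPOTIFY_ACCESS_TOKEN` are hypothetical, the access token is assumed to be valid and to carry the `user-modify-playback-state` scope, and the `uri` attribute may hold either a bare album ID or a full `spotify:album:` URI. A production version would also need to refresh the Spotify token through its OAuth flow.

```python
import json
import os
import urllib.request
from urllib.parse import unquote_plus

SPOTIFY_PLAY_URL = "https://api.spotify.com/v1/me/player/play"


def as_album_uri(value):
    """Accept a bare album ID or a full URI and return a Spotify album URI."""
    return value if value.startswith("spotify:") else f"spotify:album:{value}"


def play_on_spotify(context_uri, token):
    """Ask Spotify to start playing the album on the user's active device."""
    request = urllib.request.Request(
        SPOTIFY_PLAY_URL,
        data=json.dumps({"context_uri": context_uri}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def process_record(record, rekognition, table, model_arn, play):
    """Detect the album in one uploaded image, look it up, and play it."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])

    detected = rekognition.detect_custom_labels(
        ProjectVersionArn=model_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
    )
    labels = detected.get("CustomLabels", [])
    if not labels:
        return {"status": "no_match"}
    album = max(labels, key=lambda label: label["Confidence"])["Name"]

    item = table.get_item(Key={"album": album}).get("Item")
    if item is None:
        return {"status": "unknown_album", "album": album}

    uri = as_album_uri(item["uri"])
    play(uri)
    return {"status": "playing", "album": album, "uri": uri}


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised locally without AWS.
    import boto3

    rekognition = boto3.client("rekognition")
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    token = os.environ["SPOTIFY_ACCESS_TOKEN"]  # hypothetical variable name
    results = [
        process_record(
            record, rekognition, table, os.environ["MODEL_ARN"],
            lambda uri: play_on_spotify(uri, token),
        )
        for record in event["Records"]
    ]
    return {"statusCode": 200, "body": json.dumps(results)}
```

Passing the AWS clients and the `play` callback into `process_record` keeps the core logic testable with simple stand-in objects.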
Next, you will create an S3 bucket to store the images you upload. Each upload to this bucket automatically invokes the Lambda function. Complete the following steps to create the bucket and configure the Lambda function:
S3 set-up:
- On the Amazon S3 console, choose Create bucket.
- Enter a unique bucket name.
- On the Lambda console, navigate to the Lambda function you created.
- On the Configuration tab, choose Add trigger.
- Select the trigger type as S3 and choose the bucket you created.
- Set Event type to All object create events and choose Add.
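- Alternatively, if you did not add the trigger from the Lambda console, you can configure the same notification from the S3 side: on the Amazon S3 console, navigate to the bucket you created.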
- Under Properties and Event Notifications, choose Create event notification.
- Enter an event name (for example, Trigger LambdaFunctionName) and set the events to All object create events.
- For Destination, select Lambda Function and choose the Lambda function you created in the prior steps.
- Choose Save.
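The same notification can be expressed in code. The sketch below builds the configuration for "All object create events" and applies it with Boto3; the `.jpg` suffix filter and the function names are assumptions added for illustration. Note that this call replaces the bucket's existing notification configuration, and that, unlike the console flow, it does not grant S3 permission to invoke the function, so that permission must already exist.

```python
def build_notification_config(lambda_arn, suffix=".jpg"):
    """Return an S3 notification configuration that invokes a Lambda on upload."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # all object create events
                "Filter": {
                    "Key": {
                        # Assumed filter: only react to JPEG uploads.
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


def apply_notification(s3_client, bucket, config):
    """Replace the bucket's notification configuration with `config`."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
```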
For the DynamoDB setup, you will create a table to store the mapping between the album names and their corresponding Spotify album URIs. This DynamoDB table will be used by the Lambda function to look up the album URI after detecting the album name using the Rekognition Custom Labels model.
DynamoDB set-up:
- On the DynamoDB console, choose Tables in the navigation pane.
- Choose Create table.
- For Table name, enter a name for the table.
- For Partition key, use ‘album’ (String).
- Verify that all entries on the page are accurate, leave the rest of the settings as default, and choose Create.
- After creating the table, navigate to the 'Items' tab and choose 'Create item'.
- For each album in your Rekognition training dataset, enter the album name as the 'album' partition key and the corresponding Spotify album URI as the 'uri' attribute.
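- You can find the album's Spotify ID by navigating to the album's page on the Spotify website and copying the unique identifier from the URL (e.g. '41GuZcammIkupMPKH2OJ6I' for Astroworld). The full Spotify URI has the form spotify:album:<ID>; store whichever form your Lambda code expects.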
- Repeat this process to add all album names and URIs from your Rekognition training dataset.
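For more than a handful of albums, the items can be written from a script instead of the console. The following sketch assumes a table with the 'album' partition key and 'uri' attribute described above, and stores the full `spotify:album:` URI; the helper names and the URL pattern are illustrative assumptions.

```python
import re

# Matches album links such as https://open.spotify.com/album/<id>?si=...
ALBUM_URL = re.compile(r"open\.spotify\.com/(?:intl-[a-z]+/)?album/([A-Za-z0-9]+)")


def album_uri_from_url(url):
    """Convert a Spotify album page URL into a spotify:album:<id> URI."""
    match = ALBUM_URL.search(url)
    if match is None:
        raise ValueError(f"Not a Spotify album URL: {url}")
    return f"spotify:album:{match.group(1)}"


def save_album(table, album, uri):
    """Store one album-name-to-URI mapping."""
    table.put_item(Item={"album": album, "uri": uri})


def find_album_uri(table, album):
    """Return the stored URI for an album name, or None if it is not present."""
    item = table.get_item(Key={"album": album}).get("Item")
    return item["uri"] if item else None
```

Here `table` would typically be `boto3.resource("dynamodb").Table("your-table-name")`. The album name must match the Rekognition label exactly, since it is used as the lookup key.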
Mobile Application Code
In this section, we will discuss the steps involved in creating the mobile application.
- Select your preferred IDE and language for development. We are using Expo & React Native (JavaScript) to code this app.
Here's a code sample:
This code sample demonstrates the process of capturing a picture, converting it to a base64-encoded string, and then uploading the image to the specified S3 bucket. Once the image is uploaded to S3, it will trigger the Lambda function you created earlier. This Lambda function serves as the entry point for the solution pipeline, kicking off the subsequent steps of the process, such as using Amazon Rekognition Custom Labels to detect the album name, looking up the album URI in a DynamoDB table, and ultimately leveraging the Spotify API to play the album.
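The upload step itself is small. The app in this post is written in React Native, so the Python sketch below is only an illustration of the same decode-and-upload sequence, for example for testing the pipeline from a laptop. The `uploads/` prefix, the random key naming, and the function name are assumptions.

```python
import base64
import uuid


def upload_cover(s3_client, bucket, image_b64, prefix="uploads/"):
    """Decode a base64-encoded JPEG and store it in S3; return the object key."""
    image_bytes = base64.b64decode(image_b64)
    key = f"{prefix}{uuid.uuid4().hex}.jpg"  # random name avoids collisions
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=image_bytes,
        ContentType="image/jpeg",
    )
    return key
```

Calling `upload_cover(boto3.client("s3"), "your-bucket-name", encoded_image)` would create the object and, with the trigger in place, invoke the Lambda function.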
In this post, we've explored how to leverage AWS services to build a solution that can recognize album art covers and play the corresponding music on Spotify. By integrating computer vision, cloud storage, and serverless computing, we've uncovered an exciting way to enhance the music listening experience. From setup to mobile app integration, we've covered the key steps to bring this project to life. The ability to instantly identify album art opens up new possibilities for music enthusiasts and DJs, and for music discovery in general.
While this post focuses on a specific use case, the underlying principles can be applied more broadly wherever visual recognition and cloud automation can create value. As you continue with AWS, we encourage you to dive deeper into the technologies and techniques covered in this post, and check out these additional resources:
- Amazon Rekognition documentation: https://docs.aws.amazon.com/rekognition/
- AWS Lambda documentation: https://docs.aws.amazon.com/lambda/
- Amazon S3 documentation: https://docs.aws.amazon.com/s3/
- Boto3 AWS SDK documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
- AWS Serverless Application Model (SAM) documentation: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.