Unleashing the Power of Cloud and AI: Automating Music Discovery with a Smartphone Camera

Authors: Archit Soni, Tommy Xie, Michaelangelo Battaglia

Introduction

Get ready to take your music experience to the next level! Imagine being able to scan an album art cover and instantly play the corresponding music on Spotify. With the power of Amazon S3, Rekognition, and Lambda, we can bring this innovative idea to life. In this community post, we'll dive into the technical details of how to build a system that uses computer vision to recognize album art covers, which then seamlessly plays the matching song on Spotify. From scanning to streaming, we'll explore the step-by-step process of creating this solution, as well as uncover the possibilities that emerge when AI, cloud technology, and music come together. Building this project is also a fun way to learn about various AWS services and how to integrate them.

Setting up your development environment and AWS account

You need an AWS account to deploy this solution. If you don’t have an existing account, you can sign up for one. The instructions in this post use the AWS Region us-east-1. Make sure you deploy your resources in a Region with AWS Machine Learning services available.
Set up Python and the Boto3 AWS SDK: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
Before proceeding, make sure you have the necessary permissions to use Amazon S3, AWS Lambda, Amazon Rekognition, and Amazon DynamoDB. Refer to the AWS documentation on IAM access management to confirm that your credentials have the required permissions: https://docs.aws.amazon.com/IAM/latest/UserGuide/access.html

Disclaimers

The use of these services will incur costs. If you access them through your own AWS account, you will be responsible for paying those costs.
You will need to provide your own front-end or mobile application for the purpose of uploading the image to Amazon S3. An example mobile application is discussed below.

Architecture

For the purposes of an end-to-end solution, we recommend having a front-end set up where your users can upload images that they want detected and labeled. A sample mobile app is provided later in this post. To learn more about front-end deployment on AWS, refer to Front-end Web & Mobile on AWS.
The picture taken by the user is stored in an Amazon Simple Storage Service (Amazon S3) bucket. This S3 bucket should be configured with a lifecycle policy that deletes the image after usage. To learn more about S3 lifecycle policies, see Managing your storage lifecycle.
This architecture uses an AWS Lambda function that serves as the business logic for this solution. The Lambda function harnesses the power of Amazon Rekognition by using the Boto3 Python API. Amazon Rekognition is a cutting-edge computer vision service that uses machine learning (ML) models to analyze the uploaded images.
We use Rekognition Custom Labels so that this solution can fit a personalized use case. With the aid of custom labels specifically trained to recognize various album covers, Amazon Rekognition accurately identifies the items present in the images.
The album names are stored as keys in an Amazon DynamoDB table, along with their Spotify URIs. DynamoDB is a fully managed NoSQL database service. When a user scans an album, Rekognition detects the cover and responds with the label (i.e., the album name). Lambda then uses DynamoDB to look up the corresponding Spotify URI to play the album.
Spotify is a music streaming platform that also offers an API, enabling developers to create applications that leverage its capabilities. In our use case, we make HTTP requests to Spotify's endpoint to specify which album should be played. This information is retrieved by a Lambda function through a DynamoDB lookup. Once Spotify authorization is obtained, the requested album begins playing.
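The lifecycle policy mentioned above can also be expressed in code. The following is a minimal sketch, assuming you want every uploaded image to expire after one day; the rule ID and the helper name `build_expiration_lifecycle` are illustrative and not part of the original solution.

```python
import json


def build_expiration_lifecycle(days=1, prefix=""):
    """Return an S3 lifecycle configuration that expires objects after `days` days."""
    if days < 1:
        raise ValueError("S3 expiration is expressed in whole days (minimum 1)")
    return {
        "Rules": [
            {
                "ID": "expire-scanned-album-images",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # an empty prefix matches every object
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    config = build_expiration_lifecycle(days=1)
    print(json.dumps(config, indent=2))
    # To apply it (assumes boto3 is installed and credentials are configured):
    #   import boto3
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="your-bucket-name", LifecycleConfiguration=config)
```

Expiration is evaluated in whole days, so images are removed some time after the function has processed them rather than immediately.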

Rekognition Set-Up

Amazon Rekognition is a service that makes it easy to add powerful visual analysis to your applications. Rekognition offers pre-trained and customizable Computer Vision (CV) capabilities to allow users to detect information and gain insights from their images. Rekognition Image lets you easily build powerful applications to search, verify, and organize millions of images by classifying objects, scenes, activities, landmarks, faces, dominant colors, and image quality.

For further customization, you can use Amazon Rekognition Custom Labels. With Custom Labels, you can identify scenes and objects in your images that are specific to your business needs. Developing a custom model to analyze images is a significant undertaking that requires time, expertise, and resources, often taking months to complete. Custom Labels takes care of that heavy lifting for you.

Rekognition set-up:

On the Amazon Rekognition Custom Labels console, select 'Projects' from the left sidebar.
Click 'Create Project' and enter a project name.
On the Project page, click 'Create Dataset'.
Select the option 'Start with a training dataset and test dataset' to have more control over the training and testing images.
Upload images, taken from various angles, of the album covers you want the model to recognize.
For the training dataset, label the images based on the corresponding album names.
Click 'Train Model' to start the training process.
Review the performance metrics to ensure the model can accurately label the test images.
Once training is successful, click on the model and navigate to the 'Use Model' section.
Click 'Start' to begin using the custom image recognition model to detect the album covers it was trained on.
The custom Rekognition model is now set up and ready to use for your application.
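Once the model is running, a DetectCustomLabels response contains a list of labels, each with a confidence score. The sketch below shows one way to pick the most confident album label from such a response. The helper name `best_label`, the 80 percent threshold, and the sample response values are assumptions made for illustration.

```python
def best_label(response, min_confidence=80.0):
    """Return the name of the most confident label, or None if none qualifies."""
    candidates = [
        label
        for label in response.get("CustomLabels", [])
        if label["Confidence"] >= min_confidence
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda label: label["Confidence"])["Name"]


if __name__ == "__main__":
    # Hypothetical response in the shape returned by detect_custom_labels
    sample_response = {
        "CustomLabels": [
            {"Name": "Astroworld", "Confidence": 97.2},
            {"Name": "Some Other Album", "Confidence": 41.5},
        ]
    }
    print(best_label(sample_response))
```

Returning None for low-confidence results lets the caller decide how to handle a photo that does not match any trained album cover.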

Create the Lambda function

Next, create the Lambda function that serves as the business logic for the solution. When the mobile or web app uploads album artwork to the S3 bucket, the upload triggers the function, which uses Amazon Rekognition Custom Labels to detect the album name, looks up the album URI in a DynamoDB table, and calls the Spotify API to play the album.

For the Lambda function to run successfully, it needs an AWS Identity and Access Management (IAM) execution role and policy with the appropriate permissions. Create and attach a Lambda execution role that allows the function to perform the necessary Rekognition, S3, and DynamoDB actions.
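As a rough illustration of what such a policy might contain, the sketch below builds a policy document for the three services the function calls. The helper name `build_lambda_policy` and all ARNs and names in the example are placeholders; adjust the actions to match what your function actually does.

```python
import json


def build_lambda_policy(bucket_name, table_arn, project_version_arn):
    """Return an IAM policy document covering the calls made by the function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Run inference against the trained Custom Labels model
                "Effect": "Allow",
                "Action": ["rekognition:DetectCustomLabels"],
                "Resource": project_version_arn,
            },
            {
                # Rekognition reads the uploaded image with the caller's permissions
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Look up the album; Scan is only needed if the code scans the table
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Scan"],
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_lambda_policy(
        bucket_name="your-bucket-name",  # placeholder
        table_arn="arn:aws:dynamodb:us-east-1:123456789012:table/your-table",  # placeholder
        project_version_arn="your-project-version-arn",  # placeholder
    )
    print(json.dumps(policy, indent=2))
```

The function also needs permission to write logs to Amazon CloudWatch Logs, which the AWS managed policy AWSLambdaBasicExecutionRole provides.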

Lambda set-up:

On the Lambda console, choose Functions in the navigation pane.
Choose Create function.
Choose Author from scratch.
Name your function and choose Python 3.8 for Runtime, and choose Create function.
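Replace the text in the Lambda function code with the following sample code and choose Save. The sample reads its table name, model ARN, bucket name, and Spotify access token from the environment variables DYNAMODB_TABLE_NAME, PROJECT_VERSION_ARN, BUCKET_NAME, and SPOTIFY_API_KEY, which you set on the function's Configuration tab under Environment variables: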

import json
import os
from urllib.parse import unquote_plus

import boto3
import urllib3
from botocore.exceptions import ClientError

# Initialize AWS session and clients
session = boto3.Session()
dynamodb = session.resource('dynamodb')
rekognition = session.client('rekognition')

# Get environment variables
TABLE_NAME = os.environ.get('DYNAMODB_TABLE_NAME')
PROJECT_VERSION_ARN = os.environ.get('PROJECT_VERSION_ARN')
BUCKET_NAME = os.environ.get('BUCKET_NAME')
SPOTIFY_API_KEY = os.environ.get('SPOTIFY_API_KEY')


def lambda_handler(event, context):
    """
    Main Lambda function handler for detecting album covers and playing them on Spotify.
    Args:
        event (dict): Lambda event payload (an S3 object-created notification)
        context (LambdaContext): Lambda context object
    Returns:
        dict: Response object containing status code and response body
    """
    try:
        # Extract the S3 object key from the event. S3 URL-encodes keys in
        # event notifications (for example, spaces arrive as '+').
        obj_name = unquote_plus(event['Records'][0]['s3']['object']['key'])
        # Detect custom labels (album) using Amazon Rekognition
        album = detect_album(obj_name)
        # Get Spotify URI for the detected album
        spotify_uri = get_spotify_uri(album)
        # Play the album on Spotify
        status_code, response_body = play_spotify_album(spotify_uri)
        return {
            "statusCode": status_code,
            "body": response_body
        }
    except Exception as e:
        print(f"Error: {str(e)}")
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)})
        }


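def detect_album(obj_name):
    """
    Detect the album using Amazon Rekognition custom labels.
    Args:
        obj_name (str): S3 object key of the image
    Returns:
        str: Detected album name
    Raises:
        ValueError: If the model returns no labels for the image
    """
    try:
        response = rekognition.detect_custom_labels(
            ProjectVersionArn=PROJECT_VERSION_ARN,
            Image={
                'S3Object': {
                    'Bucket': BUCKET_NAME,
                    'Name': obj_name,
                }
            }
        )
    except ClientError as e:
        print(f"Rekognition error: {e.response['Error']['Message']}")
        raise
    labels = response['CustomLabels']
    if not labels:
        raise ValueError(f"No album detected in image: {obj_name}")
    # Return the label the model is most confident about
    return max(labels, key=lambda label: label['Confidence'])['Name']

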
def get_spotify_uri(album):
    """
    Retrieve Spotify URI for the given album from DynamoDB.
    Args:
        album (str): Album name
    Returns:
        str: Spotify URI for the album
    Raises:
        ValueError: If the table has no entry for the album
    """
    table = dynamodb.Table(TABLE_NAME)
    try:
        # 'album' is the table's partition key, so a direct key lookup
        # avoids scanning the whole table
        response = table.get_item(Key={'album': album})
    except ClientError as e:
        print(f"DynamoDB error: {e.response['Error']['Message']}")
        raise
    item = response.get('Item')
    if item is None:
        raise ValueError(f"No Spotify URI found for album: {album}")
    return item['uri']


def play_spotify_album(uri):
    """
    Play the album on Spotify using the Spotify API.
    Args:
        uri (str): Spotify URI for the album
    Returns:
        tuple: HTTP status code and response body
    """
    url = "https://api.spotify.com/v1/me/player/play"
    headers = {
        "Authorization": f"Bearer {SPOTIFY_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "context_uri": f"spotify:album:{uri}",
        "position_ms": 0  # Start from the beginning of the album
    }
    try:
        with urllib3.PoolManager() as http:
            response = http.request(
                'PUT',
                url,
                body=json.dumps(data).encode('utf-8'),
                headers=headers
            )
        return response.status, response.data.decode('utf-8')
    except urllib3.exceptions.HTTPError as e:
        print(f"Spotify API error: {str(e)}")
        raise
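To exercise the handler's event parsing without uploading a file, you can build a small S3-style event by hand. The sketch below is a simplified stand-in for the real notification payload, which carries many more fields; the helper names `make_s3_put_event` and `extract_object` are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def make_s3_put_event(bucket, key):
    """Build a minimal S3 object-created event; S3 URL-encodes the object key."""
    return {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": quote_plus(key)},
                },
            }
        ]
    }


def extract_object(event):
    """Return (bucket, decoded key) for the first record in an S3 event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


if __name__ == "__main__":
    event = make_s3_put_event("your-bucket-name", "my album cover.jpg")
    print(event["Records"][0]["s3"]["object"]["key"])
    print(extract_object(event))
```

Because the event already carries the bucket name, a handler could read it from the event instead of relying on a separate environment variable.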

Create an S3 bucket to store the images

Next, you will create an S3 bucket to store the images you upload. Each upload automatically invokes the Lambda function. Complete the following steps to create the bucket and configure the trigger:

S3 set-up:

On the Amazon S3 console, choose Create bucket.
Enter a unique bucket name.
On the Lambda console, navigate to the Lambda function you created.
On the Configuration tab, choose Add trigger.
Select the trigger type as S3 and choose the bucket you created.
Set Event type to All object create events and choose Add.
Alternatively, to configure the trigger from the Amazon S3 console instead, navigate to the bucket you created.
Under Properties and Event Notifications, choose Create event notification.
Enter an event name (for example, Trigger LambdaFunctionName) and set the events to All object create events.
For Destination, select Lambda Function and choose the Lambda function you created in the prior steps.
Choose Save.
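The console steps above correspond to a bucket notification configuration. The following sketch shows the shape of that configuration, assuming you only want .jpg uploads to invoke the function; the helper name `build_notification_config` and the ARN in the example are placeholders.

```python
import json


def build_notification_config(lambda_arn, suffix=".jpg"):
    """Return a notification configuration that invokes Lambda on new objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # "All object create events"
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


if __name__ == "__main__":
    config = build_notification_config(
        "arn:aws:lambda:us-east-1:123456789012:function:your-function"  # placeholder
    )
    print(json.dumps(config, indent=2))
    # To apply it (assumes boto3 is installed and credentials are configured):
    #   import boto3
    #   boto3.client("s3").put_bucket_notification_configuration(
    #       Bucket="your-bucket-name", NotificationConfiguration=config)
```

When you configure the trigger through the API rather than the console, the function also needs a resource-based permission that allows Amazon S3 to invoke it.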

DynamoDB Set-Up

For the DynamoDB setup, you will create a table to store the mapping between the album names and their corresponding Spotify album URIs. This DynamoDB table will be used by the Lambda function to look up the album URI after detecting the album name using the Rekognition Custom Labels model.

DynamoDB set-up:

On the DynamoDB console, choose Tables in the navigation pane.
Choose Create table.
For Table name, enter a name for the table.
For Partition key, use ‘album’ (String).
Verify that all entries on the page are accurate, leave the rest of the settings as default, and choose Create.
After creating the table, navigate to the 'Items' tab and choose 'Create item'.
For each album in your Rekognition training dataset, enter the album name as the 'album' partition key and the corresponding Spotify album URI as the 'uri' attribute.
You can find the Spotify URI by navigating to the album's page on the Spotify website and copying the unique identifier from the URL (e.g. '41GuZcammIkupMPKH2OJ6I' for Astroworld).
Repeat this process to add all album names and URIs from your Rekognition training dataset.
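If you have many albums, copying identifiers by hand gets tedious. The sketch below extracts the album identifier from a Spotify album link and builds an item in the shape described above ('album' and 'uri' attributes). The helper names `parse_album_id` and `build_item` and the query string in the example link are made up for illustration.

```python
from urllib.parse import urlparse


def parse_album_id(link):
    """Extract the album identifier from a Spotify album URL or spotify: URI."""
    if link.startswith("spotify:album:"):
        album_id = link.split(":")[2]
    else:
        # Expect a path such as /album/<id>, possibly with a locale prefix
        parts = [part for part in urlparse(link).path.split("/") if part]
        album_id = ""
        if "album" in parts:
            index = parts.index("album")
            if index + 1 < len(parts):
                album_id = parts[index + 1]
    if not album_id:
        raise ValueError(f"Not a Spotify album link: {link}")
    return album_id


def build_item(album_name, link):
    """Build a DynamoDB item keyed by album name, matching the table layout above."""
    return {"album": album_name, "uri": parse_album_id(link)}


if __name__ == "__main__":
    item = build_item(
        "Astroworld",
        "https://open.spotify.com/album/41GuZcammIkupMPKH2OJ6I?si=example",
    )
    print(item)
    # To store it (assumes boto3 is installed and credentials are configured):
    #   import boto3
    #   boto3.resource("dynamodb").Table("your-table-name").put_item(Item=item)
```

The album name must match the Rekognition label exactly, because the Lambda function uses the detected label as the lookup key.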

Sample Mobile App

Mobile Application Code
In this section, we will discuss the steps involved in creating the mobile application.

Select your preferred IDE and language for development. We are using Expo & React Native (JavaScript) to code this app.

Here's some sample code:

// https://docs.expo.dev/versions/latest/sdk/camera/
// Import necessary modules
import { CameraView, useCameraPermissions } from 'expo-camera';
import { useState, useRef } from 'react';
import { Button, StyleSheet, Text, TouchableOpacity, View, Image } from 'react-native';
import AWS from 'aws-sdk';
import 'dotenv/config';   // Import the dotenv module to load environment variables

// Configure AWS SDK
AWS.config.update({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: process.env.AWS_REGION,
});

// Create a single S3 client at module scope so it is reused across renders
const s3 = new AWS.S3();

export default function App() {
  const [facing, setFacing] = useState('back');
  const [permission, requestPermission] = useCameraPermissions();
  const [photoUri, setPhotoUri] = useState(null);
  const cameraRef = useRef(null);

  if (!permission) {
    // Camera permissions are still loading.
    return <View />;
  }

  if (!permission.granted) {
    // Camera permissions are not granted yet.
    return (
      <View style={styles.container}>
        <Text style={{ textAlign: 'center' }}>We need your permission to show the camera</Text>
        <Button onPress={requestPermission} title="Grant Permission" />
      </View>
    );
  }
  // Toggle between the back and front cameras
  function toggleCameraFacing() {
    setFacing(current => (current === 'back' ? 'front' : 'back'));
  }
  // Capture a photo, show a preview, and upload it to S3
  const takePicture = async () => {
    if (cameraRef.current) {
      const photo = await cameraRef.current.takePictureAsync();
      setPhotoUri(photo.uri);

      // Upload the image to S3 and wait for the upload to finish
      await uploadToS3(photo.uri, `image_${Date.now()}.jpg`);
    }
  };

  // Read the captured image as a blob and upload it to the configured S3 bucket
  const uploadToS3 = async (imageUri, fileName) => {
    try {
      const file = await fetch(imageUri).then((response) => response.blob());
      const params = {
        Bucket: process.env.S3_BUCKET_NAME,
        Key: fileName,
        Body: file,
      };

      await s3.upload(params).promise();
      console.log('Image uploaded to S3 successfully');
    } catch (error) {
      console.error('Error uploading image to S3:', error);
    }
  };

  return (
    <View style={styles.container}>
      <CameraView style={styles.camera} ref={cameraRef} facing={facing}>
        <View style={styles.buttonContainer}>
          <TouchableOpacity style={styles.button} onPress={toggleCameraFacing}>
            <Text style={styles.text}>Flip Camera</Text>
          </TouchableOpacity>
          <TouchableOpacity style={styles.button} onPress={takePicture}>
            <Text style={styles.text}>Take Picture</Text>
          </TouchableOpacity>
        </View>
      </CameraView>
      {photoUri && (
        <Image source={{ uri: photoUri }} style={styles.previewImage} />
      )}
    </View>
  );
}
// Styles for the camera view, buttons, and preview image
const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: 'center',
  },
  camera: {
    flex: 1,
  },
  buttonContainer: {
    flexDirection: 'row',
    justifyContent: 'space-between',
    margin: 20,
  },
  button: {
    backgroundColor: '#000',
    padding: 10,
    borderRadius: 5,
  },
  text: {
    fontSize: 18,
    color: '#fff',
  },
  previewImage: {
    width: '100%',
    height: 'auto',
    marginTop: 10,
  },
});

This code sample captures a picture, reads the image file into a blob, and uploads it to the specified S3 bucket. Once the image is uploaded, S3 triggers the Lambda function you created earlier, which starts the rest of the pipeline: Amazon Rekognition Custom Labels detects the album name, a DynamoDB lookup returns the album URI, and the Spotify API plays the album.

Conclusion

In this post, we've explored how to leverage AWS services to build a solution that can recognize album art covers and play the corresponding music on Spotify. By integrating computer vision, cloud storage, and serverless computing, we've uncovered an exciting way to enhance the music listening experience. From setup to mobile app integration, we've covered the key steps to bring this project to life. The ability to instantly identify album art opens up new possibilities for music enthusiasts, DJs, and discovery.

Learn More

While this post focuses on a specific use case, the underlying principles can be applied more broadly wherever visual recognition and cloud automation can create value. As you continue with AWS, we encourage you to dive deeper into the technologies and techniques covered in this post, and check out these additional resources:

Amazon Rekognition documentation: https://docs.aws.amazon.com/rekognition/
AWS Lambda documentation: https://docs.aws.amazon.com/lambda/
Amazon S3 documentation: https://docs.aws.amazon.com/s3/
Boto3 AWS SDK documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

AWS Serverless Application Model (SAM) documentation: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
