
How To Build a VTubing App With Amazon Interactive Video Service and VRoid
Live stream a 3D avatar that mimics your body movements.
- How to render a 3D virtual character
- How to animate a 3D virtual character with your own body movements
- How to live stream your 3D virtual character to Amazon IVS
| About | |
| --- | --- |
| ✅ AWS Level | Intermediate - 200 |
| ⏱ Time to complete | 60 minutes |
| 💰 Cost to complete | Free when using the AWS Free Tier |
| 🧩 Prerequisites | AWS Account |
| 💻 Code Sample | GitHub |
| 📢 Feedback | Any feedback, issues, or just a 👍 / 👎 ? |
| ⏰ Last Updated | 2024-01-12 |
- Part 1 - Download Your 3D Character
- Part 2 - Setup HTML To Display the Camera Feed and Live Stream Controls
- Part 3 - Rendering a Virtual Character With three-vrm
- Part 4 - Animating a Virtual Character With Your Own Body Movements
- Part 5 - Live Stream Your Virtual Character to Amazon IVS
You can optionally integrate with the VRoid Hub API to programmatically download and use other 3D characters from VRoid Hub.
Create a file named index.html. In the <body> element, we first add a <video> element for displaying the front-facing camera feed. This will be useful for seeing how well our avatar mimics our own movements. Additionally, add buttons to join what Amazon IVS calls a stage. A stage is a virtual space where participants exchange audio and/or video. Joining a stage will enable us to live stream our avatar to the stage audience or to other participants in the stage. We also add a modal containing a form for entering a participant token. A participant token can be thought of as a password needed to join a stage; it also tells Amazon IVS which stage someone wants to join. Later in this tutorial, we will explain how to create a stage and a participant token. In the <head> tag, we add some CSS styling files, which you can find in the GitHub repo.
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Simple VTuber Demo for Amazon IVS</title>
<meta name="description" content="Simple VTuber Demo for Amazon IVS" />
<link rel="stylesheet" href="style.css" />
<link rel="stylesheet" href="modal.css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/milligram/1.4.1/milligram.css" />
</head>
<body>
<div class="preview">
<video class="input_video" width="1280px" height="720px" autoplay muted playsinline></video>
</div>
<nav>
<button id="settings-btn" class="button-open"" class="button button-outline">Settings</button>
<button id="join-stage-btn" class="button button-outline">Join stage</button>
</nav>
<section class="modal hidden">
<div>
<h3>Settings</h3>
</div>
<input type="text" id="participant-token" placeholder="Enter a participant token" />
<button class="btn" id="submit-btn">Submit</button>
</section>
<div class="overlay hidden"></div>
</body>
</html>
To render our virtual character, we will use Three.js, its GLTFLoader and OrbitControls modules, and the three-vrm library. Add the following <script> elements inside your <head> element to use these libraries.
<script src="https://unpkg.com/three@0.133.0/build/three.js"></script>
<script src="https://unpkg.com/three@0.133.0/examples/js/loaders/GLTFLoader.js"></script>
<script src="https://unpkg.com/@pixiv/three-vrm@0.6.7/lib/three-vrm.js"></script>
<script src="https://unpkg.com/three@0.133.0/examples/js/controls/OrbitControls.js"></script>
Next, create a new file named app.js, where we will write the code that uses these libraries for the rest of this tutorial. Import it right before the closing </body> tag as follows.
<script src="app.js"></script>
</body>
In app.js, initialize an instance of WebGLRenderer, which we will use to dynamically add a <canvas> element to our HTML. This canvas element will be used to render our avatar. The currentVrm variable will be used later when we animate our avatar.
let currentVrm;
const renderer = new THREE.WebGLRenderer({ alpha: true });
renderer.setSize(window.innerWidth, window.innerHeight);
renderer.setPixelRatio(window.devicePixelRatio);
document.body.appendChild(renderer.domElement);
Next, create a PerspectiveCamera. Its constructor takes the field of view in degrees (how much of the scene is visible), the aspect ratio, and the near and far clipping planes that define how close to or far from the camera the avatar can be and still be rendered. We also create an instance of OrbitControls, which lets us rotate the view of our avatar by clicking and dragging.
const orbitCamera = new THREE.PerspectiveCamera(
35,
window.innerWidth / window.innerHeight,
0.1,
1000
);
orbitCamera.position.set(0.0, 1.4, 0.7);
const orbitControls = new THREE.OrbitControls(orbitCamera, renderer.domElement);
orbitControls.screenSpacePanning = true;
orbitControls.target.set(0.0, 1.4, 0.0);
orbitControls.update();
Next, create a scene and a DirectionalLight to add some light to it. Finally, create an instance of THREE.Clock so that we can use it later for managing and synchronizing the animation of our avatar.
const scene = new THREE.Scene();
const light = new THREE.DirectionalLight(0xffffff);
light.position.set(1.0, 1.0, 1.0).normalize();
scene.add(light);
const clock = new THREE.Clock();
Next, load the VRM file containing our 3D character. Because VRM is based on the glTF format, we can use the GLTFLoader from Three.js together with three-vrm to do just that.
// Load our 3D character from a VRM file via a Cloudfront distribution
const loader = new THREE.GLTFLoader();
loader.crossOrigin = "anonymous";
loader.load(
// Replace this with the URL to your own VRM file
"https://d1l5n2avb89axj.cloudfront.net/avatar-first.vrm",
(gltf) => {
THREE.VRMUtils.removeUnnecessaryJoints(gltf.scene);
THREE.VRM.from(gltf).then((vrm) => {
scene.add(vrm.scene);
currentVrm = vrm;
currentVrm.scene.rotation.y = Math.PI;
});
},
(progress) =>
console.log(
"Loading model...",
100.0 * (progress.loaded / progress.total),
"%"
),
(error) => console.error(error)
);
Next, add <script> elements for the Kalidokit library, the MediaPipe Holistic library, and the camera utility module from MediaPipe to the <head> element in index.html. MediaPipe Holistic is a computer vision pipeline used to track a user’s body movements, facial expressions, and hand gestures. This is useful for animating your digital avatar to mimic your own movements. Kalidokit uses blendshapes for facial animation and kinematics solvers for body movements to create more realistic digital avatars. Blendshapes are a technique used in character animation to create a wide range of facial expressions. Kinematics solvers are algorithms used to calculate the position and orientation of an avatar’s limbs. When animating our avatar (also known as character rigging), a kinematics solver helps determine how a character’s joints and bones should move to achieve a desired pose or animation. In short, MediaPipe Holistic tracks your physical movements while Kalidokit takes those as inputs to animate your avatar. The camera utility module from MediaPipe simplifies the process of providing our front-facing camera input to MediaPipe Holistic, which needs this camera input to do hand, face, and body movement tracking.
<script
src="https://cdn.jsdelivr.net/npm/@mediapipe/holistic@0.5.1635989137/holistic.js"
crossorigin="anonymous"
></script>
<script src="https://cdn.jsdelivr.net/npm/kalidokit@1.1/dist/kalidokit.umd.js"></script>
<script
src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js"
crossorigin="anonymous"
></script>
Next, define an animate function in app.js. In this function, we call requestAnimationFrame, a browser API used to smoothly update and render our avatar in sync with the browser's refresh rate. It ensures fluid motion when tracking and applying the real-time face, body, and hand movements captured from our camera. After defining the function, we also make sure to call it once when app.js loads.
function animate() {
requestAnimationFrame(animate);
if (currentVrm) {
currentVrm.update(clock.getDelta());
}
renderer.render(scene, orbitCamera);
}
animate();
Next, define three rigging helper functions. The rigRotation helper function adjusts the angles of the joints or bones in our avatar's digital skeleton to match our own movements, such as turning the head or bending an elbow. The rigPosition helper function moves the entire character, or parts of it, in the scene to follow our own positional movements, such as shifting from side to side. The rigFace helper function adjusts our avatar's facial structure to mirror our own facial movements, like blinking and mouth movement while speaking.
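The rigFace function below uses lerp and clamp helpers that are not defined in this snippet. A minimal sketch, assuming the Kalidokit UMD build loaded earlier, is to alias them from Kalidokit's exported utilities:

// Helper functions exposed by the Kalidokit UMD build
const clamp = Kalidokit.Utils.clamp;
const lerp = Kalidokit.Vector.lerp;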
const rigRotation = (
name,
rotation = { x: 0, y: 0, z: 0 },
dampener = 1,
lerpAmount = 0.3
) => {
if (!currentVrm) {
return;
}
const Part = currentVrm.humanoid.getBoneNode(
THREE.VRMSchema.HumanoidBoneName[name]
);
if (!Part) {
return;
}
let euler = new THREE.Euler(
rotation.x * dampener,
rotation.y * dampener,
rotation.z * dampener
);
let quaternion = new THREE.Quaternion().setFromEuler(euler);
Part.quaternion.slerp(quaternion, lerpAmount);
};
const rigPosition = (
name,
position = { x: 0, y: 0, z: 0 },
dampener = 1,
lerpAmount = 0.3
) => {
if (!currentVrm) {
return;
}
const Part = currentVrm.humanoid.getBoneNode(
THREE.VRMSchema.HumanoidBoneName[name]
);
if (!Part) {
return;
}
let vector = new THREE.Vector3(
position.x * dampener,
position.y * dampener,
position.z * dampener
);
Part.position.lerp(vector, lerpAmount);
};
let oldLookTarget = new THREE.Euler();
const rigFace = (riggedFace) => {
if (!currentVrm) {
return;
}
rigRotation("Neck", riggedFace.head, 0.7);
const Blendshape = currentVrm.blendShapeProxy;
const PresetName = THREE.VRMSchema.BlendShapePresetName;
riggedFace.eye.l = lerp(
clamp(1 - riggedFace.eye.l, 0, 1),
Blendshape.getValue(PresetName.Blink),
0.5
);
riggedFace.eye.r = lerp(
clamp(1 - riggedFace.eye.r, 0, 1),
Blendshape.getValue(PresetName.Blink),
0.5
);
riggedFace.eye = Kalidokit.Face.stabilizeBlink(
riggedFace.eye,
riggedFace.head.y
);
Blendshape.setValue(PresetName.Blink, riggedFace.eye.l);
Blendshape.setValue(
PresetName.I,
lerp(riggedFace.mouth.shape.I, Blendshape.getValue(PresetName.I), 0.5)
);
Blendshape.setValue(
PresetName.A,
lerp(riggedFace.mouth.shape.A, Blendshape.getValue(PresetName.A), 0.5)
);
Blendshape.setValue(
PresetName.E,
lerp(riggedFace.mouth.shape.E, Blendshape.getValue(PresetName.E), 0.5)
);
Blendshape.setValue(
PresetName.O,
lerp(riggedFace.mouth.shape.O, Blendshape.getValue(PresetName.O), 0.5)
);
Blendshape.setValue(
PresetName.U,
lerp(riggedFace.mouth.shape.U, Blendshape.getValue(PresetName.U), 0.5)
);
let lookTarget = new THREE.Euler(
lerp(oldLookTarget.x, riggedFace.pupil.y, 0.4),
lerp(oldLookTarget.y, riggedFace.pupil.x, 0.4),
0,
"XYZ"
);
oldLookTarget.copy(lookTarget);
currentVrm.lookAt.applyer.lookAt(lookTarget);
};
Next, create the animateVRM function, which receives real-time landmark data from the MediaPipe Holistic library via the results argument. We pass this landmark data to Kalidokit, which solves for the face, pose, and hand rigs, and then call the rigging helper functions we just created to animate the corresponding body parts of our avatar.
const animateVRM = (vrm, results) => {
if (!vrm) {
return;
}
let riggedPose, riggedLeftHand, riggedRightHand, riggedFace;
const faceLandmarks = results.faceLandmarks;
const pose3DLandmarks = results.ea;
const pose2DLandmarks = results.poseLandmarks;
const leftHandLandmarks = results.rightHandLandmarks;
const rightHandLandmarks = results.leftHandLandmarks;
// Animate Face
if (faceLandmarks) {
riggedFace = Kalidokit.Face.solve(faceLandmarks, {
runtime: "mediapipe",
video: videoElement,
});
rigFace(riggedFace);
}
// Animate Pose
if (pose2DLandmarks && pose3DLandmarks) {
riggedPose = Kalidokit.Pose.solve(pose3DLandmarks, pose2DLandmarks, {
runtime: "mediapipe",
video: videoElement,
});
rigRotation("Hips", riggedPose.Hips.rotation, 0.7);
rigPosition(
"Hips",
{
x: -riggedPose.Hips.position.x, // Reverse direction
y: riggedPose.Hips.position.y + 1, // Add a bit of height
z: -riggedPose.Hips.position.z, // Reverse direction
},
1,
0.07
);
rigRotation("Chest", riggedPose.Spine, 0.25, 0.3);
rigRotation("Spine", riggedPose.Spine, 0.45, 0.3);
rigRotation("RightUpperArm", riggedPose.RightUpperArm, 1, 0.3);
rigRotation("RightLowerArm", riggedPose.RightLowerArm, 1, 0.3);
rigRotation("LeftUpperArm", riggedPose.LeftUpperArm, 1, 0.3);
rigRotation("LeftLowerArm", riggedPose.LeftLowerArm, 1, 0.3);
rigRotation("LeftUpperLeg", riggedPose.LeftUpperLeg, 1, 0.3);
rigRotation("LeftLowerLeg", riggedPose.LeftLowerLeg, 1, 0.3);
rigRotation("RightUpperLeg", riggedPose.RightUpperLeg, 1, 0.3);
rigRotation("RightLowerLeg", riggedPose.RightLowerLeg, 1, 0.3);
}
// Animate Hands
if (leftHandLandmarks) {
riggedLeftHand = Kalidokit.Hand.solve(leftHandLandmarks, "Left");
rigRotation("LeftHand", {
z: riggedPose.LeftHand.z,
y: riggedLeftHand.LeftWrist.y,
x: riggedLeftHand.LeftWrist.x,
});
rigRotation("LeftRingProximal", riggedLeftHand.LeftRingProximal);
rigRotation("LeftRingIntermediate", riggedLeftHand.LeftRingIntermediate);
rigRotation("LeftRingDistal", riggedLeftHand.LeftRingDistal);
rigRotation("LeftIndexProximal", riggedLeftHand.LeftIndexProximal);
rigRotation("LeftIndexIntermediate", riggedLeftHand.LeftIndexIntermediate);
rigRotation("LeftIndexDistal", riggedLeftHand.LeftIndexDistal);
rigRotation("LeftMiddleProximal", riggedLeftHand.LeftMiddleProximal);
rigRotation(
"LeftMiddleIntermediate",
riggedLeftHand.LeftMiddleIntermediate
);
rigRotation("LeftMiddleDistal", riggedLeftHand.LeftMiddleDistal);
rigRotation("LeftThumbProximal", riggedLeftHand.LeftThumbProximal);
rigRotation("LeftThumbIntermediate", riggedLeftHand.LeftThumbIntermediate);
rigRotation("LeftThumbDistal", riggedLeftHand.LeftThumbDistal);
rigRotation("LeftLittleProximal", riggedLeftHand.LeftLittleProximal);
rigRotation(
"LeftLittleIntermediate",
riggedLeftHand.LeftLittleIntermediate
);
rigRotation("LeftLittleDistal", riggedLeftHand.LeftLittleDistal);
}
if (rightHandLandmarks) {
riggedRightHand = Kalidokit.Hand.solve(rightHandLandmarks, "Right");
rigRotation("RightHand", {
z: riggedPose.RightHand.z,
y: riggedRightHand.RightWrist.y,
x: riggedRightHand.RightWrist.x,
});
rigRotation("RightRingProximal", riggedRightHand.RightRingProximal);
rigRotation("RightRingIntermediate", riggedRightHand.RightRingIntermediate);
rigRotation("RightRingDistal", riggedRightHand.RightRingDistal);
rigRotation("RightIndexProximal", riggedRightHand.RightIndexProximal);
rigRotation(
"RightIndexIntermediate",
riggedRightHand.RightIndexIntermediate
);
rigRotation("RightIndexDistal", riggedRightHand.RightIndexDistal);
rigRotation("RightMiddleProximal", riggedRightHand.RightMiddleProximal);
rigRotation(
"RightMiddleIntermediate",
riggedRightHand.RightMiddleIntermediate
);
rigRotation("RightMiddleDistal", riggedRightHand.RightMiddleDistal);
rigRotation("RightThumbProximal", riggedRightHand.RightThumbProximal);
rigRotation(
"RightThumbIntermediate",
riggedRightHand.RightThumbIntermediate
);
rigRotation("RightThumbDistal", riggedRightHand.RightThumbDistal);
rigRotation("RightLittleProximal", riggedRightHand.RightLittleProximal);
rigRotation(
"RightLittleIntermediate",
riggedRightHand.RightLittleIntermediate
);
rigRotation("RightLittleDistal", riggedRightHand.RightLittleDistal);
}
};
Next, grab a reference to the <video> element and set up MediaPipe Holistic. We pass the <video> element from our HTML to MediaPipe Holistic so that it can process the camera feed and provide landmark data. Once Holistic finishes processing the camera data from the <video> element, it invokes a callback function with the resulting landmark data. Those results are then passed to the animateVRM function we created earlier to animate our avatar.
let videoElement = document.querySelector(".input_video");
const onResults = (results) => {
// Animate model
animateVRM(currentVrm, results);
};
const holistic = new Holistic({
locateFile: (file) => {
return `https://cdn.jsdelivr.net/npm/@mediapipe/holistic@0.5.1635989137/${file}`;
},
});
holistic.setOptions({
modelComplexity: 1,
smoothLandmarks: true,
minDetectionConfidence: 0.7,
minTrackingConfidence: 0.7,
refineFaceLandmarks: true,
});
// Pass holistic a callback function
holistic.onResults(onResults);
// Use `Mediapipe` utils to get camera
const camera = new Camera(videoElement, {
onFrame: async () => {
await holistic.send({ image: videoElement });
},
width: 640,
height: 480,
});
camera.start();
To live stream our avatar to Amazon IVS, we use the Amazon IVS Web Broadcast SDK, which introduces the following concepts:
- Stage: A virtual space where participants exchange audio or video. The Stage class is the main point of interaction between the host application and the SDK.
- StageStrategy: An interface that provides a way for the host application to communicate the desired state of the stage to the SDK.
- Events: You can use an instance of a stage to communicate state changes, such as when someone joins or leaves it, among other events.
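The SDK must be loaded before the code below will run, and the following snippets also reference a few variables (joinBtn, tokenInput, stage, joining, connected, and the stream variables) that need to be set up ahead of time. A minimal sketch of that setup follows; the script URL version is an assumption, so check the Amazon IVS documentation for the current release. In index.html:

<script src="https://web-broadcast.live-video.net/1.6.0/amazon-ivs-web-broadcast.js"></script>

Then, near the top of app.js:

// Pull the real-time streaming classes off the SDK's global object
const { Stage, LocalStageStream, SubscribeType, StageEvents, ConnectionState } = IVSBroadcastClient;
// References to the controls defined in index.html
const joinBtn = document.getElementById("join-stage-btn");
const tokenInput = document.getElementById("participant-token");
// State shared by the join and leave logic below
let stage, localMic, avatarStageStream, micStageStream;
let joining = false;
let connected = false;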
In app.js, define an init function. Inside it, call captureStream on the renderer's canvas element to capture our rendered avatar as a MediaStream. This is the video we will publish to the stage.
const init = async () => {
const avatarStream = renderer.domElement.captureStream();
};
Next, extend the init function to handle clicks on the Join Stage button. If no participant token has been entered yet, open the settings modal so one can be provided; otherwise, join the stage with the avatar stream.
const init = async () => {
const avatarStream = renderer.domElement.captureStream();
joinBtn.addEventListener("click", () => {
if (tokenInput.value.length === 0) {
openModal();
} else {
joinStage(avatarStream);
}
});
};
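The openModal helper referenced above, along with the wiring for the Settings and Submit buttons, is not shown here. A minimal sketch, assuming the element IDs and the hidden class from the HTML in Part 2, could look like this:

const modal = document.querySelector(".modal");
const overlay = document.querySelector(".overlay");
const settingsBtn = document.getElementById("settings-btn");
const submitBtn = document.getElementById("submit-btn");
// Show or hide the settings modal by toggling the hidden class
const openModal = () => {
  modal.classList.remove("hidden");
  overlay.classList.remove("hidden");
};
const closeModal = () => {
  modal.classList.add("hidden");
  overlay.classList.add("hidden");
};
settingsBtn.addEventListener("click", openModal);
submitBtn.addEventListener("click", closeModal);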
Next, define the joinStage function. In this function, we get a MediaStream from the user's microphone so that we can publish it to the stage along with the avatar video. Publishing is the act of sending audio and/or video to the stage so other participants can see or hear the participant who has joined.
const joinStage = async (avatarStream) => {
if (connected || joining) {
return;
}
joining = true;
joinBtn.addEventListener("click", () => {
leaveStage();
joinBtn.innerText = "Leave Stage";
});
const token = tokenInput.value;
if (!token) {
window.alert("Please enter a participant token");
joining = false;
return;
}
localMic = await navigator.mediaDevices.getUserMedia({
video: false,
audio: true,
});
avatarStageStream = new LocalStageStream(avatarStream.getVideoTracks()[0]);
micStageStream = new LocalStageStream(localMic.getAudioTracks()[0]);
const strategy = {
stageStreamsToPublish() {
return [avatarStageStream, micStageStream];
},
shouldPublishParticipant() {
return true;
},
shouldSubscribeToParticipant() {
return SubscribeType.AUDIO_VIDEO;
},
};
stage = new Stage(token, strategy);
};
Next, listen for stage events so we can react to connection state changes and to participants joining or leaving. We also define a leaveStage function and, finally, join the stage, logging any errors.
// Other available events:
// https://aws.github.io/amazon-ivs-web-broadcast/docs/sdk-guides/stages#events
stage.on(StageEvents.STAGE_CONNECTION_STATE_CHANGED, (state) => {
connected = state === ConnectionState.CONNECTED;
if (connected) {
joining = false;
}
});
const leaveStage = async () => {
stage.leave();
joining = false;
connected = false;
};
stage.on(StageEvents.STAGE_PARTICIPANT_JOINED, (participant) => {
console.log("Participant Joined:", participant);
});
stage.on(
StageEvents.STAGE_PARTICIPANT_STREAMS_ADDED,
(participant, streams) => {
console.log("Participant Media Added: ", participant, streams);
}
);
stage.on(StageEvents.STAGE_PARTICIPANT_LEFT, (participant) => {
console.log("Participant Left: ", participant);
});
try {
await stage.join();
} catch (err) {
joining = false;
connected = false;
console.error(err.message);
}
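To try this out, you need a stage and a participant token. You can create both in the Amazon IVS console, or programmatically; the following Node.js sketch is one way to do it, assuming the @aws-sdk/client-ivs-realtime package is installed and AWS credentials are configured (the region and stage name are placeholders):

// create-token.mjs - creates a stage and a participant token for it
import {
  IVSRealTimeClient,
  CreateStageCommand,
  CreateParticipantTokenCommand,
} from "@aws-sdk/client-ivs-realtime";

const client = new IVSRealTimeClient({ region: "us-east-1" });

// Create a stage (this only needs to be done once)
const { stage } = await client.send(new CreateStageCommand({ name: "vtuber-demo" }));

// Create a participant token for that stage; paste its value into the settings modal
const { participantToken } = await client.send(
  new CreateParticipantTokenCommand({ stageArn: stage.arn, duration: 720 })
);
console.log(participantToken.token);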
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.