😎 FaceMesh – Detect Faces and Expressions in Scratch #
The FaceMesh extension brings real AI-powered face tracking into Scratch.
It lets your Scratch projects react to your expressions, head movements, and gestures – all in real time, right in your browser, with no setup required.
Simple enough for students, powerful enough for creative classrooms. 😊

🌟 Overview #
- Detect 4 Faces: Detect up to 4 faces at once.
- 478 Landmarks: Track 468 facial keypoints (eyes, nose, mouth, chin, etc.) plus 10 iris-specific keypoints.
- Read Coordinates: Get X/Y positions of any face landmark on the Scratch stage.
- Measure: Calculate angles and distances between face landmarks.
- Change Camera Preview: Show, hide, or flip the live camera view to match your setup.
- Choose Input: Analyze the live camera feed or the Scratch stage image directly.
✨ Key Features #
- Multi-face tracking (1–4 faces).
- Friendly dropdown for common face parts.
- Adjustable “classify” intervals for smooth performance.
- Camera preview, transparency, and device controls.
- Works fully in-browser – safe and private.
🚀 How to Use #
- Go to: pishi.ai/play
- Open the Extensions section.
- Select the FaceMesh extension.
- Allow camera access if prompted and check that your video preview appears.
- If no cameras are detected, the input will automatically switch to the stage image instead.
- Once the extension loads successfully, continuous face detection starts automatically in “continuous, without delay” mode.
- Now you can use the position or measurement blocks to make sprites react to your face – move, smile, blink, or tilt your head to control your project!
Tips
- Good lighting helps the model detect your face better.
- Use person no: 1–4 to choose which face to track when multiple faces appear.
- For classrooms or older devices, set the classification interval to 100–250 ms for smooth performance.
🧱 Blocks and Functions #
📍 Position & Count #
Reports the X or Y position of a facial landmark on the stage.
KEYPOINT_MENU_ITEM: choose from the dropdown list (eyes, nose, lips, etc.).
KEYPOINT_NUMBER: enter a keypoint index between 0–477.
Click the image below to see the full-size landmark numbering:
PERSON_NUMBER: selects which face to track (1–4). “1” = first detected face.
Returns empty if no face is detected.
Reports the number of faces currently detected (0–4).
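Curious how a landmark ends up at a stage position? Below is a minimal Python sketch of the kind of mapping involved, assuming MediaPipe-style normalized 0–1 landmark coordinates – it is only an illustration, not the extension’s actual code:

```python
# Rough sketch only: map a normalized landmark to Scratch stage coordinates.
STAGE_W, STAGE_H = 480, 360        # Scratch stage size in pixels

def to_stage(norm_x, norm_y, mirrored=True):
    """Map a normalized FaceMesh landmark (0-1, origin at the top-left of the
    video frame) to Scratch stage coordinates (origin at the centre, Y up)."""
    x = (norm_x - 0.5) * STAGE_W
    y = (0.5 - norm_y) * STAGE_H   # video Y grows downward, stage Y grows up
    if mirrored:                   # "turn video [on]" shows a mirrored preview,
        x = -x                     # so X is flipped to match what you see
    return x, y

# A landmark right of centre in the frame shows up left of centre when mirrored
print(to_stage(0.75, 0.25))        # -> (-120.0, 90.0)
```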
📏 Measurements #
Angle (in degrees) between two landmarks on one face – great for detecting head tilt or nods.
Distance in stage pixels between two landmarks – perfect for mouth-open or eye-blink detection.
Notes:
Default keypoints: 454 (left cheek) and 234 (right cheek).
Coordinates are stage-centered (X ≈ −240…240, Y ≈ −180…180).
When the video is mirrored, the X coordinate values are also flipped to match what you see on-screen.
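For reference, the math behind these two reporters fits in a few lines. Here is a hedged Python sketch, assuming the angle is measured as the direction of the line between the two landmarks in stage coordinates (the extension’s exact convention may differ):

```python
import math

def distance(x1, y1, x2, y2):
    """Straight-line distance in stage pixels between two landmark positions."""
    return math.hypot(x2 - x1, y2 - y1)

def angle(x1, y1, x2, y2):
    """Direction of the line from the first landmark to the second, in degrees
    (0 = the second point is directly to the right of the first)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# With the default keypoints 454 (left cheek) and 234 (right cheek),
# a level head gives an angle near 0; tilting the head moves it away from 0.
print(distance(-60, 0, 60, 0))            # 120.0 stage pixels, cheek to cheek
print(round(angle(-60, 10, 60, -10), 1))  # -9.5 -> a slight head tilt
```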
⚙️ Classification Controls #
- classify [INTERVAL] - Choose how often detection runs:
- every time this block runs
- continuous, without delay
- continuous, every 50–2500 ms
- turn classification [on/off] - start or stop continuous detection.
- classification interval - reports the current interval in milliseconds.
- continuous classification - reports whether continuous detection is “on” or “off”.
- select input image [camera/stage] - choose camera or stage.
- input image - reports the active input source.
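The interval setting is easiest to picture as a repeating timer. Here is a rough Python sketch of the idea – a conceptual model only, not the extension’s implementation:

```python
import time

def run_continuous_classification(detect, interval_ms=250, is_on=lambda: True):
    """Conceptual model: run detection, wait for the chosen interval, and
    repeat while classification is turned on.
    "continuous, without delay" corresponds to interval_ms = 0."""
    while is_on():
        detect()                        # analyse the current camera/stage frame
        time.sleep(interval_ms / 1000)  # longer interval = fewer runs per second
```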
🎥 Video Controls #
- turn video [off/on/on-flipped]
- on: shows the camera preview in a mirrored view (like a typical webcam or mirror).
- on-flipped: shows the camera preview in a non-mirrored view — directions appear as in the real world.
- off: turns off the camera preview. In stage input mode, detection continues to run.
- set video transparency to [TRANSPARENCY] - adjusts how visible the camera preview is:
- 0: fully visible (solid image)
- 100: fully transparent (invisible but active)
- select camera [CAMERA] — chooses among available cameras on your device. The dropdown lists all detected cameras, and the extension switches automatically to the one you select.
😊 Common Keypoints (Handy Numbers) #
Use these shortcut keypoints for common facial regions. You can also enter any index manually from 0–477.
Numbers match the landmark indices used by MediaPipe FaceMesh.
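For convenience, here are the indices used throughout this guide collected in one place (gathered from the tips later on), written as Python constants:

```python
# Landmark indices referenced in this guide (MediaPipe FaceMesh numbering)
FOREHEAD, CHIN          = 10, 152
UPPER_LIP, LOWER_LIP    = 0, 17
MOUTH_CORNERS           = (61, 291)
LEFT_CHEEK, RIGHT_CHEEK = 454, 234
IRIS_KEYPOINTS          = range(468, 478)   # 10 iris centre/edge points
```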
🎓 Educational Uses #
- Explore AI and computer vision concepts visually – understand how computers recognize and track faces.
- Teach coordinate systems by linking head or eye movement to sprite positions on the stage.
- Apply math and geometry to measure angles, distances, and facial symmetry.
- Create interactive art or accessibility projects that respond to expressions or gestures.
🎮 Example Projects #
- Talking Sprite: Measure lip distance to animate a mouth or trigger speech.
- Head-Tilt Controller: Tilt your head left or right to steer a sprite or car.
- Blink to Jump: Detect eyelid closure and make your character jump – a fun no-hands controller!
- Face Counter Game: Start only when a face is detected – bonus points for two faces at once!
- Smile to Win: Create a game that rewards smiles or happy expressions.
- Emoji Match: Copy the displayed emoji’s expression to score points.
🧩 Try it yourself: pishi.ai/play
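To give a feel for the logic behind projects like Talking Sprite or Blink to Jump, here is a hedged Python sketch of simple threshold detection with a little hysteresis so the trigger does not flicker. The pixel values are placeholders – in Scratch you would read the same value from the distance block and tune the thresholds for your own face and camera distance:

```python
def make_blink_detector(closed_below=6.0, open_above=10.0):
    """Return a function that reports True once per blink.
    Feed it the eyelid distance (in stage pixels) every frame; using two
    thresholds keeps it from re-triggering while the value jitters."""
    state = {"closed": False}

    def update(eyelid_distance):
        if not state["closed"] and eyelid_distance < closed_below:
            state["closed"] = True
            return True            # eye just closed -> e.g. make the sprite jump
        if state["closed"] and eyelid_distance > open_above:
            state["closed"] = False
        return False

    return update

# Example frames of eyelid distance: open, open, closed, closed, open
detect_blink = make_blink_detector()
for d in [12, 11, 4, 5, 12]:
    print(detect_blink(d))         # prints False, False, True, False, False
```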
🔧 Tips and Troubleshooting #
- No camera?
• Make sure your camera is connected and browser permission is allowed.
• If the camera is blocked, enable it in your browser’s site settings and reload the page.
• During extension load, if no cameras are detected, the input automatically switches to the stage image so you can still test FaceMesh features.
- No detection?
• continuous classification: Use this reporter to check whether classification is active.
• If it is active, improve lighting and face the camera directly.
• If it is not active, run turn classification [on], then recheck the status with the reporter above.
• In camera input mode, turning the camera off also stops classification – turn the video back on or switch the input to stage.
• In stage input mode, the system classifies whatever is visible on the stage – backdrops, sprites, or images. You can turn off the video completely and still process stage images.
• Stage mode is slower than camera input, so set the classification interval to 100–250 ms for smoother results using the classify [INTERVAL] block.
• In stage mode, “left” and “right” landmarks are swapped because the stage image is not mirrored – the coordinate space represents a real (non-mirrored) view.
• Classification can also restart automatically when you use blocks such as turn video [on], classify [INTERVAL], select camera [CAMERA], or select input image [camera/stage].
- Flipped view?
• turn video [on-flipped]: Use this to show the camera without mirroring. “on” mirrors like a selfie; “on-flipped” shows real left/right orientation.
- Laggy or slow?
• Use classification intervals between 100–250 ms, or close other browser tabs to reduce processing load.
- WebGL2 warning?
• Try Firefox, or a newer device that supports WebGL2 graphics acceleration.
- Analyze stage instead of camera?
• select input image [stage]: Use this to analyze the Scratch stage image instead of a live camera feed.
😎 FaceMesh Specific Tips #
- No face detected? Make sure your full face is visible and well-lit – avoid backlighting from windows.
- Tracking unstable? Keep your head steady and face the camera directly for best accuracy.
- Multiple faces? Use person no: 1–4 to select which face to track. Person 1 is usually the largest/closest face.
- Landmarks jittery? Increase the classification interval (e.g., 150–200 ms) to smooth out rapid fluctuations.
- Need precise eye/iris tracking? Use keypoints 468–477 for iris centers and edges – great for gaze direction or blink detection.
- Mouth-open not detected? Measure the distance between keypoints 0 (upper lip) and 17 (lower lip) – larger values = mouth open.
- Detect head tilt? Calculate angle between keypoints 454 (left cheek) and 234 (right cheek) – deviation from 0° = tilt.
- Detect head nod (up/down)? Measure the vertical distance between keypoints 10 (forehead) and 152 (chin) – distance is largest when facing forward and decreases as you nod up or down. Useful for nod gestures or up/down attention tracking.
- Smile detection? Measure distance between mouth corners (keypoints 61 and 291) – wider = smile.
- Stage mode with photos? Remember landmarks are not mirrored in stage mode – left/right are true anatomical positions.
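Putting a few of these recipes together, here is a hedged Python sketch of what the measurements above look like as formulas. The example positions and thresholds are placeholders, and the tilt calculation is just one reasonable convention – in Scratch you would read the same values from the position, distance, and angle blocks:

```python
import math

def dist(a, b):
    """Stage-pixel distance between two (x, y) landmark positions."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def tilt_deg(a, b):
    """Deviation of the line between two landmarks from horizontal, folded
    into -90..90 degrees so the result does not depend on point order."""
    ang = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
    return (ang + 90) % 180 - 90

# Example stage positions you might read from the position blocks
upper_lip, lower_lip    = (0, -20), (0, -45)     # keypoints 0 and 17
mouth_a, mouth_b        = (-35, -32), (35, -32)  # mouth corners 61 and 291
cheek_left, cheek_right = (80, 5), (-80, -5)     # keypoints 454 and 234

mouth_open = dist(upper_lip, lower_lip) > 18     # bigger gap = mouth open
smiling    = dist(mouth_a, mouth_b) > 60         # wider mouth = smile
head_tilt  = tilt_deg(cheek_left, cheek_right)   # degrees away from level
print(mouth_open, smiling, round(head_tilt, 1))  # True True 3.6
```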
🔒 Privacy and Safety #
- Everything runs locally in your browser.
- No images or video are uploaded anywhere.
- Model files may download once for offline use.
- Always ask a teacher or parent before using the camera.
- You can safely turn video [off] at any time.
🧪 Technical Info #
- Model: MediaPipe Face Mesh
- Framework: TensorFlow.js (latest version) – runs fully in-browser using WebGL2 acceleration
- Faces: up to 4
- Landmarks: 478 total (0–477)
- Coordinates: stage-centered pixels (X right, Y up)
- Mirroring: “on” = mirrored preview, “on-flipped” = true view
- Inputs: camera or stage canvas
- Default keypoints: 454 (left cheek), 234 (right cheek)
- Requires: WebGL2 for best performance
🔗 Related Extensions #
- 🖐️ Hand Pose – detect hand landmarks
- 🕺 PoseNet – track body pose
- 🖼️ Image Trainer – build custom AI models
- 🏫 Google Teachable Machine – import your own TM models
| Feature | MIT Scratch Face Sensing | Pishi.ai FaceMesh |
|---|---|---|
| Detection Type | Simple face rectangle detection (bounding box only). | Advanced 3D face landmark detection with 478 keypoints. |
| Faces Supported | 1 face | Up to 4 faces simultaneously |
| Keypoints / Landmarks | None (just general position and size). | 468 standard landmarks + 10 iris keypoints (eyes, lips, chin, nose, etc.) |
| Expression Tracking | Limited - only “face present” or “moved” detection. | Full facial geometry - can measure smiles, blinks, mouth open, tilt, nod, or eyebrow raise. |
| Input Source | Camera only | Camera or stage image (for analyzing photos or screenshots). In stage mode, camera and stage can also be classified together for combined analysis. |
| Video Controls | Turn video on, on and flipped, or off, and change video preview transparency. | Camera selection support, plus turn video on, on and flipped, or off, and change video preview transparency. |
| Performance Tuning | Fixed speed. | Flexible detection modes - adjustable interval (every 50–2500 ms), continuous detection, or detection on block click. Classification can also be turned on or off at any time for performance control or teaching demonstrations. |
| Privacy | Runs locally in browser. | 100% local, no upload - same privacy level, but works even offline after model load. |
| Educational Focus | Simple introduction to face detection. | Deep exploration of AI and computer vision concepts - geometry, math, and interaction design. |
💡 Why Choose FaceMesh? #
FaceMesh offers a much richer and more precise understanding of faces. It goes beyond detecting that a face exists – it tells you what the face is doing.
- Link expressions and movement directly to sprite behavior.
- Measure angles, distances, and positions for STEM and AI lessons.
- Create gesture-based games or emotion-reactive characters – all inside Scratch.
- Learn real computer vision concepts that scale to professional AI frameworks like MediaPipe.
In short:
MIT’s Face Sensing is great for simple, fun introductions.
Pishi.ai FaceMesh is for creators who want real AI-powered expression tracking – still easy, but dramatically more capable.
