😎 FaceMesh — Detect Faces and Expressions in Scratch #
The FaceMesh extension brings real AI-powered face tracking into Scratch.
It lets your Scratch projects react to your expressions, head movements, and gestures — all in real time, right in your browser, with no setup required.
Simple enough for students, powerful enough for creative classrooms. 😊
🌟 Overview #
- Detect 4 Faces: Detect up to 4 faces at once.
- 478 Landmarks: Track 468 facial keypoints (eyes, nose, mouth, chin, etc.) plus 10 iris-specific keypoints.
- Read Coordinates: Get X/Y positions of any face landmark on the Scratch stage.
- Measure: Calculate angles and distances between face landmarks.
- Change Camera Preview: Show, hide, or flip the live camera view to match your setup.
- Choose Input: Analyze from the live camera or directly from the Scratch stage image.
✨ Key Features #
- Multi-face tracking (1–4 faces).
- Friendly dropdown for common face parts.
- Adjustable “classify” intervals for smooth performance.
- Camera preview, transparency, and device controls.
- Works fully in-browser — safe and private.
🚀 How to Use #
- Go to: pishi.ai/play
- Open the Extensions section.
- Select the FaceMesh extension.
- Allow camera access if prompted and check that your video preview appears.
- If no cameras are detected, the input will automatically switch to the stage image instead.
- Once the extension loads successfully, continuous face detection starts automatically in no-delay mode.
- Now you can use the position or measurement blocks to make sprites react to your face — move, smile, blink, or tilt your head to control your project! (A quick starter script follows the tips below.)
Tips
- Good lighting helps the model detect your face better.
- Use person no: 1–4 when multiple faces appear.
- For classrooms or older devices, run classification at a 100–250 ms interval for smooth performance.
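To confirm everything is working, you can try a quick starter script like this. It is only a minimal sketch — the exact wording of the blocks in the editor may differ slightly:
when green flag clicked
forever
  if <(face count) > (0)> then
    say [I can see you! 😀]
  else
    say [Looking for a face…]
  end
end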
🧱 Blocks and Functions #
📍 Position & Count #
x of keypoint no: [KEYPOINT_MENU_ITEM] person no: [PERSON_NUMBER]
y of keypoint no: [KEYPOINT_MENU_ITEM] person no: [PERSON_NUMBER]
x of keypoint no: [KEYPOINT_NUMBER] person no: [PERSON_NUMBER]
y of keypoint no: [KEYPOINT_NUMBER] person no: [PERSON_NUMBER]
Reports the X or Y position of a facial landmark on the stage.
[KEYPOINT_MENU_ITEM]: choose from the dropdown list (eyes, nose, lips, etc.).
[KEYPOINT_NUMBER]: enter a keypoint index between 0–477.
Click the image below to see the full-size landmark numbering:
[PERSON_NUMBER]: selects which face to track (1–4). “1” = first detected face.
Returns empty if no face is detected.
face count
Reports the number of faces currently detected (0–4).
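For example, a sprite can follow your nose tip whenever a face is detected. This is a minimal sketch using keypoint index 1 (the nose tip, from the handy-numbers list further below); the exact block wording in the editor may differ slightly:
when green flag clicked
forever
  if <(face count) > (0)> then
    go to x: (x of keypoint no: (1) person no: (1)) y: (y of keypoint no: (1) person no: (1))
  end
end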
📏 Measurements #
angle between keypoints: [KEYPOINT_1] and [KEYPOINT_2] object no: [PERSON_NUMBER]
Angle (in degrees) between two landmarks on one face — great for detecting head tilt or nods.
distance between keypoints: [KEYPOINT_1] and [KEYPOINT_2] object no: [PERSON_NUMBER]
Distance in stage pixels between two landmarks — perfect for mouth-open or eye-blink detection.
Notes:
Default keypoints: 152 (chin) and 10 (forehead center).
Coordinates are stage-centered (X ≈ −240…240, Y ≈ −180…180).
When the video is mirrored, the X coordinate values are also flipped to match what you see on-screen.
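As a sketch, mouth-open detection can compare the lip distance against a threshold. The threshold value (20 stage pixels here) is an assumption you will need to tune for your camera and seating distance:
when green flag clicked
forever
  if <(distance between keypoints: (0) and (17) object no: (1)) > (20)> then
    say [Mouth open!]
  else
    say []
  end
end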
⚙️ Classification Controls #
classify [INTERVAL] — choose how often detection runs:
- every time this block runs
- continuous, without delay
- continuous, every 50–2500 ms
turn classification [on/off] — start or stop continuous detection.
classification interval — reports the current interval in milliseconds.
continuous classification — reports whether continuous detection is “on” or “off”.
select input image [camera/stage] — choose the camera or the stage image as the input.
input image — reports the active input source.
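For a classroom-friendly setup, you might slow detection down and then confirm the settings with the reporter blocks. This is a minimal sketch — the exact dropdown label for the interval may differ from what is shown here:
when green flag clicked
classify [continuous, every 250 ms]
turn classification [on]
say (join [Interval: ] (classification interval)) for (2) seconds
say (join [Continuous: ] (continuous classification)) for (2) seconds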
🎥 Video Controls #
turn video [off/on/on-flipped]
on: shows the camera preview in a mirrored view (like a typical webcam or mirror).
on-flipped: shows the camera preview in a non-mirrored view — directions appear as in the real world.
off: turns off the camera preview. In stage input mode, detection continues to run.
set video transparency to [0–100] — adjusts how visible the camera preview is:
0: fully visible (solid image)
100: fully transparent (invisible but active)
select camera [CAMERA] — chooses among the available cameras on your device. The dropdown lists all detected cameras, and the extension switches automatically to the one you select.
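A typical setup script might look like this. It is only a sketch — the transparency value of 30 is just an example for a subtle overlay:
when green flag clicked
select input image [camera]
turn video [on]
set video transparency to (30)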
😊 Common Keypoints (Handy Numbers) #
Use these shortcut keypoints for common facial regions. You can also enter any index manually from 0–477.
- 1: nose tip
- 10: forehead center
- 152: chin
- 473: left eye iris center
- 468: right eye iris center
- 291 / 61: mouth left / right corners
- 0 / 17: upper / lower mid lip
- 362 / 263: left eye inner / outer
- 133 / 33: right eye inner / outer
- 159 / 145: right eye top / bottom
- 386 / 374: left eye top / bottom
Numbers match the landmark indices used by MediaPipe FaceMesh.
🎓 Educational Uses #
- Explore AI and computer vision concepts visually — understand how computers recognize and track faces.
- Teach coordinate systems by linking head or eye movement to sprite positions on the stage.
- Apply math and geometry to measure angles, distances, and facial symmetry.
- Create interactive art or accessibility projects that respond to expressions or gestures.
🎮 Example Projects #
- Talking Sprite: Measure lip distance to animate a mouth or trigger speech.
- Head-Tilt Controller: Tilt your head left or right to steer a sprite or car.
- Blink to Jump: Detect eyelid closure and make your character jump — a fun no-hands controller! (See the sketch after this list.)
- Face Counter Game: Start only when a face is detected — bonus points for two faces at once!
- Smile to Win: Create a game that rewards smiles or happy expressions.
- Emoji Match: Copy the displayed emoji’s expression to score points.
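As one concrete starting point, the Blink to Jump idea above can be sketched with the eye top/bottom keypoints (159 and 145 for the right eye, from the handy-numbers list). The blink threshold of 5 stage pixels is a guess you will need to tune:
when green flag clicked
forever
  if <(distance between keypoints: (159) and (145) object no: (1)) < (5)> then
    change y by (60)
    wait (0.3) seconds
    change y by (-60)
  end
end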
🧩 Try it yourself: pishi.ai/play
🔧 Tips & Troubleshooting #
- No camera?
• Make sure your camera is connected and camera access is allowed in your browser.
• If the camera is blocked, enable it in your browser’s site settings and reload the page.
• During extension load, if no cameras are detected, the input will automatically switch to the stage image so you can still test FaceMesh features.
- No detection?
• continuous classification: Use this reporter to check whether classification is active.
• If it is active, improve lighting and face the camera directly.
• turn classification [on]: Use this block if classification is not active, then recheck the status with the reporter above.
• In camera input mode, turning the camera off also stops classification — turn the video back on or switch the input to stage.
• In stage input mode, the system classifies whatever is visible on the stage — backdrops, sprites, or images. You can turn off the video completely and still process stage images.
• Stage mode is slower than camera input, so use a longer classification interval (e.g., 100–250 ms) for smoother results with this block: classify [INTERVAL]
• In stage mode, “left” and “right” landmarks are swapped because the stage image is not mirrored — the coordinate space represents a real (non-mirrored) view.
• Classification can also restart automatically when you use blocks such as: turn video [on], classify [INTERVAL], select camera [CAMERA], or select input image [camera/stage].
- Flipped view?
• turn video [on-flipped]: Use this to show the camera without mirroring. “on” mirrors like a selfie; “on-flipped” shows the real left/right orientation.
- Laggy or slow?
• Use classification intervals between 100 and 250 ms, or close other browser tabs to reduce processing load.
- WebGL2 warning?
• Try Firefox, or a newer device that supports WebGL2 graphics acceleration.
- Analyze stage instead of camera?
• select input image [stage]: Use this to analyze the Scratch stage image instead of a live camera feed.
😎 FaceMesh Specific Tips #
- No face detected? Make sure your full face is visible and well-lit — avoid backlighting from windows.
- Tracking unstable? Keep your head steady and face the camera directly for best accuracy.
- Multiple faces? Use person no: 1–4 to select which face to track. Person 1 is usually the largest/closest face.
- Landmarks jittery? Increase the classification interval (e.g., 150–200 ms) to smooth out rapid fluctuations.
- Need precise eye/iris tracking? Use keypoints 468–477 for iris centers and edges — great for gaze direction or blink detection.
- Mouth not opening? Measure distance between keypoints 0 (upper lip) and 17 (lower lip) — larger values = mouth open.
- Detect head tilt? Calculate angle between keypoints 454 (left eye) and 234 (right eye) — deviation from 0° = tilt. (See the sketch after this list.)
- Smile detection? Measure distance between mouth corners (keypoints 61 and 291) — wider = smile.
- Stage mode with photos? Remember landmarks are not mirrored in stage mode — left/right are true anatomical positions.
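Building on the head-tilt tip above, a minimal sketch might store the angle in a variable and react when it passes a threshold. The ±15° threshold here is an assumption to tune for your project:
when green flag clicked
forever
  set [tilt v] to (angle between keypoints: (454) and (234) object no: (1))
  if <(tilt) > (15)> then
    say [Tilting one way!]
  end
  if <(tilt) < (-15)> then
    say [Tilting the other way!]
  end
end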
🔒 Privacy & Safety #
- Everything runs locally in your browser.
- No images or video are uploaded anywhere.
- Model files may download once for offline use.
- Always ask a teacher or parent before using the camera.
- You can safely turn video [off] at any time.
🧪 Technical Info #
- Model: MediaPipe Face Mesh
- Framework: TensorFlow.js (latest version) — runs fully in-browser using WebGL2 acceleration
- Faces: up to 4
- Landmarks: 478 total (0–477)
- Coordinates: stage-centered pixels (X right, Y up)
- Mirroring: “on” = mirrored preview, “on-flipped” = true view
- Inputs: camera or stage canvas
- Default keypoints: 152 (chin), 10 (forehead center)
- Requires: WebGL2 for best performance
🔗 Related Extensions #
- 🖐️ Hand Pose — detect hand landmarks
- 🕺 PoseNet — track body pose
- 🧩 Image Trainer — build custom AI models
- 🏫 Google Teachable Machine — import your own TM models
⚖️ FaceMesh vs. MIT Face Sensing #
| Feature | MIT Scratch Face Sensing | Pishi.ai FaceMesh |
|---|---|---|
| Detection Type | Simple face rectangle detection (bounding box only). | Advanced 3D face landmark detection with 478 keypoints. |
| Faces Supported | 1 face | Up to 4 faces simultaneously |
| Keypoints / Landmarks | None (just general position and size). | 468 standard landmarks + 10 iris keypoints (eyes, lips, chin, nose, etc.) |
| Expression Tracking | Limited — only “face present” or “moved” detection. | Full facial geometry — can measure smiles, blinks, mouth open, tilt, nod, or eyebrow raise. |
| Input Source | Camera only | Camera or stage image (for analyzing photos or screenshots). In stage mode, camera and stage can also be classified together for combined analysis. |
| Video Controls | Turn video on, on-flipped, or off, and change video preview transparency. | Camera selection support, as well as turn video on, on-flipped, or off, and change video preview transparency. |
| Performance Tuning | Fixed speed. | Flexible detection modes — adjustable interval (every 50–2500 ms), continuous detection, or detection on block click. Classification can also be turned on or off at any time for performance control or teaching demonstrations. |
| Privacy | Runs locally in browser. | 100% local, no upload — same privacy level, but works even offline after model load. |
| Educational Focus | Simple introduction to face detection. | Deep exploration of AI and computer vision concepts — geometry, math, and interaction design. |
💡 Why Choose FaceMesh? #
FaceMesh offers a much richer and more precise understanding of faces. It goes beyond detecting that a face exists — it tells you what the face is doing.
- Link expressions and movement directly to sprite behavior.
- Measure angles, distances, and positions for STEM and AI lessons.
- Create gesture-based games or emotion-reactive characters — all inside Scratch.
- Learn real computer vision concepts that scale to professional AI frameworks like MediaPipe.
In short:
MIT’s Face Sensing is great for simple, fun introductions.
Pishi.ai FaceMesh is for creators who want real AI-powered expression tracking — still easy, but dramatically more capable.
