😎 FaceMesh – Detect Faces and Expressions in Scratch #
The FaceMesh extension brings real AI-powered face tracking into Scratch.
It lets your Scratch projects react to your expressions, head movements, and gestures – all in real time, right in your browser, with no setup required.
Simple enough for students, powerful enough for creative classrooms. 😊

🌟 Overview #
- Detect 4 Faces: Detect up to 4 faces at once.
- 478 Landmarks: Track 468 facial keypoints (eyes, nose, mouth, chin, etc.) plus 10 iris-specific keypoints.
- Read Coordinates: Get X/Y positions of any face landmark on the Scratch stage.
- Measure: Calculate angles and distances between face landmarks.
- Change Camera Preview: Show, hide, or flip the live camera view to match your setup.
- Choose Input: Analyze the live camera feed or the Scratch stage image directly.
✨ Key Features #
- Multi-face tracking (1–4 faces).
- Friendly dropdown for common face parts.
- Adjustable “classify” intervals for smooth performance.
- Camera preview, transparency, and device controls.
- Works fully in-browser – safe and private.
🚀 How to Use #
- Go to: pishi.ai/play
- Open the Extensions section.
- Select the FaceMesh extension.
- Allow camera access if prompted and check that your video preview appears.
- If no cameras are detected, the input will automatically switch to the stage image instead.
- Once the extension loads successfully, continuous face detection starts automatically in “continuous, without delay” mode.
- Now you can use the position or measurement blocks to make sprites react to your face – move, smile, blink, or tilt your head to control your project!
Tips
- Good lighting helps the model detect your face better.
- Use person no: 1–4 to choose which face to track when multiple faces appear.
- For classrooms or older devices, set the classification interval to 100–250 ms for smooth performance.
🧱 Blocks and Functions #
📍 Position & Count #
Reports the X or Y position of a facial landmark on the stage.
KEYPOINT_MENU_ITEM: choose from the dropdown list (eyes, nose, lips, etc.).
KEYPOINT_NUMBER: enter a keypoint index between 0–477.
Click the image below to see the full-size landmark numbering:
PERSON_NUMBER: selects which face to track (1–4). “1” = first detected face.
Returns empty if no face is detected.
Reports the number of faces currently detected (0–4).
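Curious how a landmark ends up at a stage position? Below is a minimal Python sketch of the kind of mapping involved, assuming MediaPipe-style normalized 0–1 landmark coordinates – it is only an illustration, not the extension’s actual code:

```python
# Rough sketch only: map a normalized landmark to Scratch stage coordinates.
STAGE_W, STAGE_H = 480, 360        # Scratch stage size in pixels

def to_stage(norm_x, norm_y, mirrored=True):
    """Map a normalized FaceMesh landmark (0-1, origin at the top-left of the
    video frame) to Scratch stage coordinates (origin at the centre, Y up)."""
    x = (norm_x - 0.5) * STAGE_W
    y = (0.5 - norm_y) * STAGE_H   # video Y grows downward, stage Y grows up
    if mirrored:                   # "turn video [on]" shows a mirrored preview,
        x = -x                     # so X is flipped to match what you see
    return x, y

# A landmark right of centre in the frame shows up left of centre when mirrored
print(to_stage(0.75, 0.25))        # -> (-120.0, 90.0)
```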
📏 Measurements #
Angle (in degrees) between two landmarks on one face – great for detecting head tilt or nods.
Distance in stage pixels between two landmarks – perfect for mouth-open or eye-blink detection.
Notes:
Default keypoints: 454 (left cheek) and 234 (right cheek).
Coordinates are stage-centered (X ≈ −240…240, Y ≈ −180…180).
When the video is mirrored, the X coordinate values are also flipped to match what you see on-screen.
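For reference, the math behind these two reporters fits in a few lines. Here is a hedged Python sketch, assuming the angle is measured as the direction of the line between the two landmarks in stage coordinates (the extension’s exact convention may differ):

```python
import math

def distance(x1, y1, x2, y2):
    """Straight-line distance in stage pixels between two landmark positions."""
    return math.hypot(x2 - x1, y2 - y1)

def angle(x1, y1, x2, y2):
    """Direction of the line from the first landmark to the second, in degrees
    (0 = the second point is directly to the right of the first)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# With the default keypoints 454 (left cheek) and 234 (right cheek),
# a level head gives an angle near 0; tilting the head moves it away from 0.
print(distance(-60, 0, 60, 0))            # 120.0 stage pixels, cheek to cheek
print(round(angle(-60, 10, 60, -10), 1))  # -9.5 -> a slight head tilt
```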
⚙️ Classification Controls #
- classify [INTERVAL] - Choose how often detection runs:
- every time this block runs
- continuous, without delay
- continuous, every 50–2500 ms
- turn classification [on/off] - start or stop continuous detection.
- classification interval - reports the current interval in milliseconds.
- continuous classification - reports whether continuous detection is “on” or “off”.
- select input image [camera/stage] - choose camera or stage.
- input image - reports the active input source.
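The interval setting is easiest to picture as a repeating timer. Here is a rough Python sketch of the idea – a conceptual model only, not the extension’s implementation:

```python
import time

def run_continuous_classification(detect, interval_ms=250, is_on=lambda: True):
    """Conceptual model: run detection, wait for the chosen interval, and
    repeat while classification is turned on.
    "continuous, without delay" corresponds to interval_ms = 0."""
    while is_on():
        detect()                        # analyse the current camera/stage frame
        time.sleep(interval_ms / 1000)  # longer interval = fewer runs per second
```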
🎥 Video Controls #
- turn video [off/on/on-flipped]
- on: shows the camera preview in a mirrored view (like a typical webcam or mirror).
- on-flipped: shows the camera preview in a non-mirrored view — directions appear as in the real world.
- off: turns off the camera preview. In stage input mode, detection continues to run.
- set video transparency to [TRANSPARENCY] - adjusts how visible the camera preview is:
- 0: fully visible (solid image)
- 100: fully transparent (invisible but active)
- select camera [CAMERA] — chooses among available cameras on your device. The dropdown lists all detected cameras, and the extension switches automatically to the one you select.
😊 Common Keypoints (Handy Numbers) #
Use these shortcut keypoints for common facial regions. You can also enter any index manually from 0–477.
Numbers match the landmark indices used by MediaPipe FaceMesh.
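For convenience, here are the indices used throughout this guide collected in one place (gathered from the tips later on), written as Python constants:

```python
# Landmark indices referenced in this guide (MediaPipe FaceMesh numbering)
FOREHEAD, CHIN          = 10, 152
UPPER_LIP, LOWER_LIP    = 0, 17
MOUTH_CORNERS           = (61, 291)
LEFT_CHEEK, RIGHT_CHEEK = 454, 234
IRIS_KEYPOINTS          = range(468, 478)   # 10 iris centre/edge points
```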
🎓 Educational Uses #
- Explore AI and computer vision concepts visually – understand how computers recognize and track faces.
- Teach coordinate systems by linking head or eye movement to sprite positions on the stage.
- Apply math and geometry to measure angles, distances, and facial symmetry.
- Create interactive art or accessibility projects that respond to expressions or gestures.
🎮 Example Projects #
- Talking Sprite: Measure lip distance to animate a mouth or trigger speech.
- Head-Tilt Controller: Tilt your head left or right to steer a sprite or car.
- Blink to Jump: Detect eyelid closure and make your character jump – a fun no-hands controller!
- Face Counter Game: Start only when a face is detected – bonus points for two faces at once!
- Smile to Win: Create a game that rewards smiles or happy expressions.
- Emoji Match: Copy the displayed emoji’s expression to score points.
🧩 Try it yourself: pishi.ai/play
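To give a feel for the logic behind projects like Talking Sprite or Blink to Jump, here is a hedged Python sketch of simple threshold detection with a little hysteresis so the trigger does not flicker. The pixel values are placeholders – in Scratch you would read the same value from the distance block and tune the thresholds for your own face and camera distance:

```python
def make_blink_detector(closed_below=6.0, open_above=10.0):
    """Return a function that reports True once per blink.
    Feed it the eyelid distance (in stage pixels) every frame; using two
    thresholds keeps it from re-triggering while the value jitters."""
    state = {"closed": False}

    def update(eyelid_distance):
        if not state["closed"] and eyelid_distance < closed_below:
            state["closed"] = True
            return True            # eye just closed -> e.g. make the sprite jump
        if state["closed"] and eyelid_distance > open_above:
            state["closed"] = False
        return False

    return update

# Example frames of eyelid distance: open, open, closed, closed, open
detect_blink = make_blink_detector()
for d in [12, 11, 4, 5, 12]:
    print(detect_blink(d))         # prints False, False, True, False, False
```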
🔧 Tips and Troubleshooting #
- No camera?
• Make sure your camera is connected and browser permission is allowed.
• If the camera is blocked, enable it in your browser’s site settings and reload the page.
• During extension load, if no cameras are detected, the input automatically switches to the stage image so you can still test FaceMesh features.
- No detection?
• continuous classification: Use this reporter to check whether classification is active.
• If it is active, improve lighting and face the camera directly.
• If it is not active, run turn classification [on], then recheck the status with the reporter above.
• In camera input mode, turning the camera off also stops classification – turn the video back on or switch the input to stage.
• In stage input mode, the system classifies whatever is visible on the stage – backdrops, sprites, or images. You can turn off the video completely and still process stage images.
• Stage mode is slower than camera input, so set the classification interval to 100–250 ms for smoother results using the classify [INTERVAL] block.
• In stage mode, “left” and “right” landmarks are swapped because the stage image is not mirrored – the coordinate space represents a real (non-mirrored) view.
• Classification can also restart automatically when you use blocks such as turn video [on], classify [INTERVAL], select camera [CAMERA], or select input image [camera/stage].
- Flipped view?
• turn video [on-flipped]: Use this to show the camera without mirroring. “on” mirrors like a selfie; “on-flipped” shows real left/right orientation.
- Laggy or slow?
• Use classification intervals between 100–250 ms, or close other browser tabs to reduce processing load.
- WebGL2 warning?
• Try Firefox, or a newer device that supports WebGL2 graphics acceleration.
- Analyze stage instead of camera?
• select input image [stage]: Use this to analyze the Scratch stage image instead of a live camera feed.
😎 FaceMesh Specific Tips #
- No face detected? Make sure your full face is visible and well-lit – avoid backlighting from windows.
- Tracking unstable? Keep your head steady and face the camera directly for best accuracy.
- Multiple faces? Use person no: 1–4 to select which face to track. Person 1 is usually the largest/closest face.
- Landmarks jittery? Increase the classification interval (e.g., 150–200 ms) to smooth out rapid fluctuations.
- Need precise eye/iris tracking? Use keypoints 468–477 for iris centers and edges – great for gaze direction or blink detection.
- Mouth-open not detected? Measure the distance between keypoints 0 (upper lip) and 17 (lower lip) – larger values = mouth open.
- Detect head tilt? Calculate angle between keypoints 454 (left cheek) and 234 (right cheek) – deviation from 0° = tilt.
- Detect head nod (up/down)? Measure the vertical distance between keypoints 10 (forehead) and 152 (chin) – distance is largest when facing forward and decreases as you nod up or down. Useful for nod gestures or up/down attention tracking.
- Smile detection? Measure distance between mouth corners (keypoints 61 and 291) – wider = smile.
- Stage mode with photos? Remember landmarks are not mirrored in stage mode – left/right are true anatomical positions.
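Putting a few of these recipes together, here is a hedged Python sketch of what the measurements above look like as formulas. The example positions and thresholds are placeholders, and the tilt calculation is just one reasonable convention – in Scratch you would read the same values from the position, distance, and angle blocks:

```python
import math

def dist(a, b):
    """Stage-pixel distance between two (x, y) landmark positions."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def tilt_deg(a, b):
    """Deviation of the line between two landmarks from horizontal, folded
    into -90..90 degrees so the result does not depend on point order."""
    ang = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
    return (ang + 90) % 180 - 90

# Example stage positions you might read from the position blocks
upper_lip, lower_lip    = (0, -20), (0, -45)     # keypoints 0 and 17
mouth_a, mouth_b        = (-35, -32), (35, -32)  # mouth corners 61 and 291
cheek_left, cheek_right = (80, 5), (-80, -5)     # keypoints 454 and 234

mouth_open = dist(upper_lip, lower_lip) > 18     # bigger gap = mouth open
smiling    = dist(mouth_a, mouth_b) > 60         # wider mouth = smile
head_tilt  = tilt_deg(cheek_left, cheek_right)   # degrees away from level
print(mouth_open, smiling, round(head_tilt, 1))  # True True 3.6
```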
🔒 Privacy and Safety #
- Everything runs locally in your browser.
- No images or video are uploaded anywhere.
- Model files may download once for offline use.
- Always ask a teacher or parent before using the camera.
- You can safely turn video [off] at any time.
🧪 Technical Info #
- Model: MediaPipe Face Mesh
- Framework: TensorFlow.js (latest version) – runs fully in-browser using WebGL2 acceleration
- Faces: up to 4
- Landmarks: 478 total (0–477)
- Coordinates: stage-centered pixels (X right, Y up)
- Mirroring: “on” = mirrored preview, “on-flipped” = true view
- Inputs: camera or stage canvas
- Default keypoints: 454 (left cheek), 234 (right cheek)
- Requires: WebGL2 for best performance
🔗 Related Extensions #
- 🖐️ Hand Pose – detect hand landmarks
- 🕺 PoseNet – track body pose
- 🖼️ Image Trainer – build custom AI models
- 🏫 Google Teachable Machine – import your own TM models
| Feature | MIT Scratch Face Sensing | Pishi.ai FaceMesh |
|---|---|---|
| Detection Type | Simple face rectangle detection (bounding box only). | Advanced 3D face landmark detection with 478 keypoints. |
| Faces Supported | 1 face | Up to 4 faces simultaneously |
| Keypoints / Landmarks | None (just general position and size). | 468 standard landmarks + 10 iris keypoints (eyes, lips, chin, nose, etc.) |
| Expression Tracking | Limited - only “face present” or “moved” detection. | Full facial geometry - can measure smiles, blinks, mouth open, tilt, nod, or eyebrow raise. |
| Input Source | Camera only | Camera or stage image (for analyzing photos or screenshots). In stage mode, camera and stage can also be classified together for combined analysis. |
| Video Controls | Turn video on, on and flipped, or off, and change video preview transparency. | Camera selection support, plus turn video on, on and flipped, or off, and change video preview transparency. |
| Performance Tuning | Fixed speed. | Flexible detection modes - adjustable interval (every 50–2500 ms), continuous detection, or detection on block click. Classification can also be turned on or off at any time for performance control or teaching demonstrations. |
| Privacy | Runs locally in browser. | 100% local, no upload - same privacy level, but works even offline after model load. |
| Educational Focus | Simple introduction to face detection. | Deep exploration of AI and computer vision concepts - geometry, math, and interaction design. |
💡 Why Choose FaceMesh? #
FaceMesh offers a much richer and more precise understanding of faces. It goes beyond detecting that a face exists – it tells you what the face is doing.
- Link expressions and movement directly to sprite behavior.
- Measure angles, distances, and positions for STEM and AI lessons.
- Create gesture-based games or emotion-reactive characters – all inside Scratch.
- Learn real computer vision concepts that scale to professional AI frameworks like MediaPipe.
In short:
MIT’s Face Sensing is great for simple, fun introductions.
Pishi.ai FaceMesh is for creators who want real AI-powered expression tracking – still easy, but dramatically more capable.
