🖐️ HandPose – Detect Hands and Gestures in Scratch #
The HandPose extension brings real-time hand tracking into Scratch.
It lets your projects react to fingers, wrist movement, and simple gestures – right in your browser, with no setup required.
Perfect for classrooms, workshops, and creative coding builds. 🙌

🌟 Overview #
- Detect up to 4 Hands: Track one or more hands simultaneously.
- 21 Landmarks: Wrist, thumb joints, and each finger’s joints and tips.
- Read Coordinates: Get X/Y positions of any hand landmark on the Scratch stage.
- Measure: Calculate angles and distances between hand landmarks.
- Camera Controls: Show, hide, mirror, and choose your camera device.
- Choose Input: Analyze from the live camera or the stage image.
✨ Key Features #
- Multi-hand tracking (1–4 hands).
- Friendly dropdown of joints and fingertips.
- Adjustable “classify” intervals for smooth performance and CPU control.
- Camera preview transparency and mirroring options.
- Fully browser-based – no installation, private and secure.
🚀 How to Use #
- Go to pishi.ai/play.
- Open the Extensions section.
- Select HandPose from the list.
- Allow camera access when prompted and check that the preview appears.
- If no cameras are detected, the input will automatically switch to the stage image instead.
- Continuous detection starts by default at a smooth 100 ms interval.
- Use position and measurement blocks to make sprites react to finger gestures.
Tips:
- Use good, even lighting for best tracking accuracy.
- When multiple hands appear, use hand no: 1–4 in the block dropdown.
- On slower computers, increase interval (e.g., 150–250 ms) to reduce CPU load.
🧱 Blocks and Functions #
📍 Position and Count #
Reports the X or Y position of a specific hand landmark on the stage.
KEYPOINT: choose from the dropdown list (wrist, joints, fingertips).
HAND_NUMBER: selects which hand to track (1–4). “1” = first detected hand.
Returns empty if no hand is detected.
Reports how many hands are currently detected (0–4).
📏 Measurements #
Reports the angle (in degrees) between two landmarks – ideal for detecting finger bend or wrist rotation.
Measures the distance in stage pixels between two landmarks – useful for pinch, expand, or spread gestures.
Notes:
Default keypoints: 0 (wrist) and 12 (middle fingertip).
Coordinates follow the Scratch stage center (X ≈ −240…240, Y ≈ −180…180).
When mirroring is on, X values flip to match the preview view.
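The angle and distance reporters come down to simple 2-D geometry in stage coordinates. A minimal Python sketch of that math, using made-up landmark positions (the extension’s exact angle convention may differ; here 0° means pointing right):

```python
import math

# Hypothetical stage coordinates of two landmarks (Scratch space: X right, Y up)
wrist = (-20, -60)       # keypoint 0
middle_tip = (10, 40)    # keypoint 12

dx = middle_tip[0] - wrist[0]
dy = middle_tip[1] - wrist[1]

distance = math.hypot(dx, dy)             # distance in stage pixels
angle = math.degrees(math.atan2(dy, dx))  # angle in degrees

print(round(distance, 1), round(angle, 1))  # → 104.4 73.3
```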
⚙️ Classification Controls #
- classify [INTERVAL] - Choose how often detection runs:
- every time this block runs
- continuous, without delay
- continuous, every 50–2500 ms
- turn classification [on/off] - start or stop continuous detection.
- classification interval - reports the current interval in milliseconds.
- continuous classification - reports whether continuous detection is “on” or “off”.
- select input image [camera/stage] - choose camera or stage.
- input image - reports the active input source.
🎥 Video Controls #
- turn video [off/on/on-flipped]
- on: shows the camera preview in a mirrored view (like a typical webcam or mirror).
- on-flipped: shows the camera preview in a non-mirrored view — directions appear as in the real world.
- off: turns off the camera preview. In stage input mode, detection continues to run.
- set video transparency to [TRANSPARENCY] — adjusts how visible the camera preview is:
- 0: fully visible (solid image)
- 100: fully transparent (invisible but active)
- select camera [CAMERA] — chooses among available cameras on your device. The dropdown lists all detected cameras, and the extension switches automatically to the one you select.
🖐️ Common Keypoints (Handy Numbers) #
Use these shortcuts for common hand regions, or select any landmark from the dropdown.
0: wrist, 1: base of thumb, 2: thumb joint 2,
3: thumb joint 1, 4: thumb tip,
5–8: index finger joints / tip,
9–12: middle finger joints / tip,
13–16: ring finger joints / tip,
17–20: little finger joints / tip
The menu counts from 0–20 (like MediaPipe indices).
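The 0–20 indexing follows a regular pattern - wrist first, then four indices per finger. A Python sketch that generates the lookup table (the label wording is illustrative, not the exact dropdown text):

```python
# MediaPipe-style hand landmark indices: 0 = wrist, then 4 indices per finger
FINGERS = ["thumb", "index", "middle", "ring", "little"]

KEYPOINTS = {0: "wrist"}
for f, finger in enumerate(FINGERS):
    base = 1 + f * 4            # each finger occupies 4 consecutive indices
    for j in range(4):
        part = "tip" if j == 3 else f"joint {j + 1}"
        KEYPOINTS[base + j] = f"{finger} {part}"

print(KEYPOINTS[4], "/", KEYPOINTS[8], "/", KEYPOINTS[12])
# → thumb tip / index tip / middle tip
```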
🎓 Educational Uses #
- Explore computer vision by visualizing hand joints and movement.
- Teach coordinate systems by mapping finger motion to sprite X/Y.
- Apply geometry and math to calculate angles and distances.
- Create gesture-based interactions like pinching, pointing, or thumbs-up triggers.
🎮 Example Projects #
- Pinch to Click: Detect thumb–index distance to simulate a mouse click.
- Finger Piano: Map fingertips to keys and play notes as you move.
- Thumbs-Up Detector: Trigger actions when the thumb points upward.
- Rock – Paper – Scissors: Recognize hand shapes using landmark distances.
- Hand Controller: Move sprites with wrist X/Y and boost on finger spread.
🧩 Try it yourself: pishi.ai/play
🔧 Tips and Troubleshooting #
- No camera?
• Make sure your camera is connected and browser permission is allowed.
• If the camera is blocked, enable it in your browser’s site settings and reload the page.
• During extension load, if no cameras are detected, the input automatically switches to the stage image so you can still test HandPose features.
- No detection?
• continuous classification: Use this reporter to see whether classification is active.
• If it is active, improve the lighting and keep your hand clearly visible to the camera.
• turn classification [on]: Use this block if classification is not active, then recheck the status with the reporter above.
• In camera input mode, turning the camera off also stops classification - turn the video back on or switch the input to stage.
• In stage input mode, the system classifies whatever is visible on the stage - backdrops, sprites, or images. You can turn the video off completely and still process stage images.
• Stage mode is slower than camera input, so increase your classification interval (e.g., 150–250 ms) for smoother results using this block: classify [INTERVAL]
• In stage mode, “left” and “right” landmarks are swapped because the stage image is not mirrored - the coordinate space represents a real (non-mirrored) view.
• Classification can also restart automatically when you use blocks such as: turn video [on] / classify [INTERVAL] / select camera [CAMERA] / select input image [camera/stage].
- Flipped view?
• turn video [on-flipped]: Use this to show the camera without mirroring. “on” mirrors like a selfie; “on-flipped” shows the real left/right orientation.
- Laggy or slow?
• Use classification intervals between 100–250 ms or close other browser tabs to reduce processing load.
- WebGL2 warning?
• Try Firefox, or a newer device that supports WebGL2 graphics acceleration.
- Analyze stage instead of camera?
• select input image [stage]: Use this to analyze the Scratch stage image instead of a live camera feed.
🖐️ HandPose Specific Tips #
- Hand not detected? Ensure your full hand – including the wrist – is visible in the camera. Spread your fingers slightly; closed fists or motion blur make detection harder.
- Fingers confused? Keep fingers clearly separated and avoid overlapping for accurate tracking of individual fingertips.
- Multiple hands? Use the hand no: 1–4 parameter of the block to choose which hand to track. Hand 1 is typically the largest or closest hand in view.
- Tracking unstable? Keep your hand steady and evenly lit. Avoid strong shadows or very bright reflections on skin.
- Detect pinch gesture? Measure the distance between keypoints 4 (thumb tip) and 8 (index fingertip). A smaller distance indicates a pinch.
- Detect pointing? Check if keypoint 8 (index fingertip) has a greater Y value than keypoint 5 (index base) while the other fingers are bent down.
- Count fingers up? Compare each fingertip’s Y position with its base – if the fingertip Y is higher, that finger is extended.
- Thumbs-up detection? Verify if keypoint 4 (thumb tip) is higher than keypoint 2 (thumb base) while other fingers remain folded.
- Hand orientation? Calculate the angle between keypoint 0 (wrist) and keypoint 12 (middle fingertip) to estimate hand rotation or tilt.
- Using stage mode with photos? In stage mode, landmarks are not mirrored – left and right correspond to true anatomical positions.
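The gesture recipes above are just comparisons on landmark coordinates. A Python sketch with made-up stage coordinates (in a real project these values come from the position reporter blocks; the 20-pixel pinch threshold is an arbitrary assumption to tune):

```python
import math

# Hypothetical landmark positions {index: (x, y)} in Scratch stage space (Y up)
hand = {
    0: (0, -60),    # wrist
    4: (-30, 10),   # thumb tip
    5: (-10, -10),  # index finger base
    8: (-5, 60),    # index fingertip
}

def dist(a, b):
    """Distance in stage pixels between two landmark indices."""
    return math.hypot(hand[a][0] - hand[b][0], hand[a][1] - hand[b][1])

# Pinch: thumb tip (4) close to index fingertip (8); 20 px is a tunable guess
is_pinch = dist(4, 8) < 20

# Pointing up: index fingertip (8) above its base (5) in Scratch coordinates
is_pointing_up = hand[8][1] > hand[5][1]

print(is_pinch, is_pointing_up)  # → False True
```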
🔒 Privacy and Safety #
- Everything runs locally in your browser.
- No images or video are uploaded anywhere.
- Model files may download once for offline use.
- Always ask a teacher or parent before using the camera.
- You can safely turn video [off] at any time.
🧪 Technical Info #
- Model: MediaPipe Hands (HandPose)
- Framework: TensorFlow.js (latest) – runs fully in-browser with WebGL 2
- Detection: Up to 4 hands / 21 landmarks (0 – 20)
- Coordinate System: Scratch stage pixels (X right, Y up)
- Mirroring: “on” = mirrored preview, “on-flipped” = true view
- Input Modes: Camera or Stage canvas
- Default Keypoints: 0 (wrist), 12 (middle fingertip)
- Default Classify Interval: 100 ms
🔗 Related Extensions #
- 😎 FaceMesh – detect face landmarks
- 🕺 PoseNet – track body pose
- 🖼️ Image Trainer – build custom AI models
- 🏫 Google Teachable Machine – import your own TM models

