🖐️ HandPose – Detect Hands and Gestures in Scratch #
The HandPose extension brings real-time hand tracking into Scratch.
It lets your projects react to fingers, wrist movement, and simple gestures – right in your browser, with no setup required.
Perfect for classrooms, workshops, and creative coding builds. 🙌
🌟 Overview #
- Detect up to 4 Hands: Track one or more hands simultaneously.
- 21 Landmarks: Wrist, thumb joints, and each finger’s joints and tips.
- Read Coordinates: Get X/Y positions of any hand landmark on the Scratch stage.
- Measure: Calculate angles and distances between hand landmarks.
- Camera Controls: Show, hide, mirror, and choose your camera device.
- Choose Input: Analyze from the live camera or the stage image.
✨ Key Features #
- Multi-hand tracking (1–4 hands).
- Friendly dropdown of joints and fingertips.
- Adjustable “classify” intervals for smooth performance and CPU control.
- Camera preview transparency and mirroring options.
- Fully browser-based – no installation, private and secure.
🚀 How to Use #
- Go to pishi.ai/play.
- Open the Extensions section.
- Select HandPose from the list.
- Allow camera access when prompted and check that the preview appears.
- If no cameras are detected, the input will automatically switch to the stage image instead.
- Continuous detection starts by default at a smooth 100 ms interval.
- Use position and measurement blocks to make sprites react to finger gestures.
Tips:
- Use good, even lighting for best tracking accuracy.
- When multiple hands appear, use hand no: 1 – 4 in the block dropdown.
- On slower computers, increase interval (e.g., 150–250 ms) to reduce CPU load.
🧱 Blocks and Functions #
📍 Position & Count #
x of keypoint no: [KEYPOINT] hand no: [HAND_NUMBER]
y of keypoint no: [KEYPOINT] hand no: [HAND_NUMBER]
Reports the X or Y position of a specific hand landmark on the stage.
[KEYPOINT]: choose from the dropdown list (wrist, joints, fingertips).
[HAND_NUMBER]: selects which hand to track (1–4). “1” = first detected hand.
Returns empty if no hand is detected.
hand count
Reports how many hands are currently detected (0 – 4).
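The position and count blocks combine naturally – for example, a sprite that follows the first hand's index fingertip. This is a sketch in Scratch-block pseudocode; keypoint 8 (index fingertip) comes from the Common Keypoints list:

```
when green flag clicked
forever
  if <(hand count) > (0)> then
    go to x: (x of keypoint no: [8] hand no: [1]) y: (y of keypoint no: [8] hand no: [1])
  end
end
```

Checking hand count first avoids moving the sprite on empty reports when no hand is detected.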
📏 Measurements #
angle between keypoints: [KEYPOINT_1] and [KEYPOINT_2] object no: [HAND_NUMBER]
Reports the angle (in degrees) between two landmarks – ideal for detecting finger bend or wrist rotation.
distance between keypoints: [KEYPOINT_1] and [KEYPOINT_2] object no: [HAND_NUMBER]
Measures the distance in stage pixels between two landmarks – useful for pinch, expand, or spread gestures.
Notes:
Default keypoints: 0 (wrist) and 12 (middle fingertip).
Coordinates follow the Scratch stage center (X ≈ −240…240, Y ≈ −180…180).
When mirroring is on, X values flip to match the preview view.
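A quick pinch check using the distance block (pseudocode sketch; the 30-pixel threshold is an assumption – tune it for your camera distance and hand size):

```
when green flag clicked
forever
  if <(distance between keypoints: [4] and [8] object no: [1]) < (30)> then
    say [Pinch!] for (0.5) seconds
  end
end
```

Keypoint 4 is the thumb tip and keypoint 8 the index fingertip, so the distance shrinks as the two fingers close together.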
⚙️ Classification Controls #
classify [INTERVAL]
Chooses how often detection runs:
- every time this block runs
- continuous, without delay
- continuous, every 50–2500 ms

turn classification [on/off]
Starts or stops continuous detection.

classification interval
Reports the current interval in milliseconds.

continuous classification
Reports whether continuous detection is “on” or “off”.

select input image [camera/stage]
Chooses the camera or the stage as the input image.

input image
Reports the active input source.
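A typical setup script picks the input and interval once at the start (pseudocode sketch; 150 ms is just an example value within the 50–2500 ms range):

```
when green flag clicked
select input image [camera]
classify [continuous, every 150 ms]
```

Running this under the green flag means detection is configured before any of your gesture scripts start reading positions.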
🎥 Video Controls #
turn video [on/off/on-flipped]
Shows or hides the camera preview. “on” shows a mirrored (selfie-style) preview; “on-flipped” shows the true left/right orientation.

select camera [CAMERA]
Chooses which connected camera device to use.
🖐️ Common Keypoints (Handy Numbers) #
Use these shortcuts for common hand regions, or select any landmark from the dropdown.
- 0: wrist
- 1: base of thumb, 2: thumb joint 2, 3: thumb joint 1, 4: thumb tip
- 5–8: index finger joints / tip
- 9–12: middle finger joints / tip
- 13–16: ring finger joints / tip
- 17–20: little finger joints / tip

The menu counts from 0–20 (matching MediaPipe indices).
🎓 Educational Uses #
- Explore computer vision by visualizing hand joints and movement.
- Teach coordinate systems by mapping finger motion to sprite X/Y.
- Apply geometry and math to calculate angles and distances.
- Create gesture-based interactions like pinching, pointing, or thumbs-up triggers.
🎮 Example Projects #
- Pinch to Click: Detect thumb–index distance to simulate a mouse click.
- Finger Piano: Map fingertips to keys and play notes as you move.
- Thumbs-Up Detector: Trigger actions when the thumb points upward.
- Rock – Paper – Scissors: Recognize hand shapes using landmark distances.
- Hand Controller: Move sprites with wrist X/Y and boost on finger spread.
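As a starting point, the Finger Piano idea can be sketched like this (pseudocode; assumes the Music extension for play note, and the Y threshold of 100 and note 64 are placeholder values to adjust):

```
when green flag clicked
forever
  if <(y of keypoint no: [8] hand no: [1]) > (100)> then
    play note (64) for (0.25) beats
  end
end
```

Duplicate the script with other fingertip keypoints (12, 16, 20) and different notes to build a full set of keys.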
🧩 Try it yourself: pishi.ai/play
🔧 Tips & Troubleshooting #
- No camera?
• Make sure your camera is connected and browser permission is allowed.
• If the camera is blocked, enable it in your browser’s site settings and reload the page.
• During extension load, if no cameras are detected, the input will automatically switch to the stage image so you can still test HandPose features.
- No detection?
• continuous classification: use this reporter to see if classification is active.
• If it is active, improve lighting and make sure your whole hand is in view.
• turn classification [on]: use this block if classification is not active, then recheck the status with the reporter above.
• In camera input mode, turning the camera off also stops classification – turn the video back on or switch the input to stage.
• In stage input mode, the system classifies whatever is visible on the stage – backdrops, sprites, or images. You can turn the video off completely and still process stage images.
• Stage mode is slower than camera input, so increase the classification interval (e.g., 150–250 ms) for smoother results using classify [INTERVAL].
• In stage mode, “left” and “right” landmarks are swapped because the stage image is not mirrored – the coordinate space represents a real (non-mirrored) view.
• Classification can also restart automatically when you use blocks such as turn video [on], classify [INTERVAL], select camera [CAMERA], or select input image [camera/stage].
- Flipped view?
turn video [on-flipped]: shows the camera without mirroring. “on” mirrors like a selfie; “on-flipped” shows the real left/right orientation.
- Laggy or slow?
Use classification intervals between 100–250 ms or close other browser tabs to reduce processing load.
- WebGL2 warning?
Try Firefox, or a newer device that supports WebGL2 graphics acceleration.
- Analyze stage instead of camera?
select input image [stage]: analyzes the Scratch stage image instead of a live camera feed.
🖐️ HandPose Specific Tips #
- Hand not detected? Ensure your full hand – including the wrist – is visible in the camera. Spread your fingers slightly; closed fists or motion blur make detection harder.
- Fingers confused? Keep fingers clearly separated and avoid overlapping for accurate tracking of individual fingertips.
- Multiple hands? Use the hand no: 1–4 parameter of a block to choose which hand to track. Hand 1 is typically the largest or closest hand in view.
- Tracking unstable? Keep your hand steady and evenly lit. Avoid strong shadows or very bright reflections on skin.
- Detect pinch gesture? Measure the distance between keypoints 4 (thumb tip) and 8 (index fingertip). A smaller distance indicates a pinch.
- Detect pointing? Check if keypoint 8 (index fingertip) has a greater Y value (higher on the stage) than keypoint 5 (index base) while the other fingers are bent down.
- Count fingers up? Compare each fingertip’s Y position with its base – if the fingertip Y is higher, that finger is extended.
- Thumbs-up detection? Verify that keypoint 4 (thumb tip) is higher than keypoint 2 (thumb joint 2) while the other fingers remain folded.
- Hand orientation? Calculate the angle between keypoint 0 (wrist) and keypoint 12 (middle fingertip) to estimate hand rotation or tilt.
- Using stage mode with photos? In stage mode, landmarks are not mirrored – left and right correspond to true anatomical positions.
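The finger-counting tip above can be written out per finger and repeated (pseudocode sketch; the variable name fingers up is hypothetical):

```
set [fingers up] to (0)
if <(y of keypoint no: [8] hand no: [1]) > (y of keypoint no: [5] hand no: [1])> then
  change [fingers up] by (1)
end
if <(y of keypoint no: [12] hand no: [1]) > (y of keypoint no: [9] hand no: [1])> then
  change [fingers up] by (1)
end
```

Repeat the same tip-versus-base comparison for keypoints 16/13 (ring) and 20/17 (little finger) to count all four fingers.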
🔒 Privacy & Safety #
- Everything runs locally in your browser.
- No images or video are uploaded anywhere.
- Model files may download once for offline use.
- Always ask a teacher or parent before using the camera.
- You can safely turn video [off] at any time.
🧪 Technical Info #
- Model: MediaPipe Hands (HandPose)
- Framework: TensorFlow.js (latest) – runs fully in-browser with WebGL 2
- Detection: Up to 4 hands / 21 landmarks (0 – 20)
- Coordinate System: Scratch stage pixels (X right, Y up)
- Mirroring: “on” = mirrored preview, “on-flipped” = true view
- Input Modes: Camera or Stage canvas
- Default Keypoints: 0 (wrist), 12 (middle fingertip)
- Default Classify Interval: 100 ms
🔗 Related Extensions #
- 😎 FaceMesh – detect face landmarks
- 🕺 PoseNet – track body pose
- 🖼️ Image Trainer – build custom AI models
- 🏫 Google Teachable Machine – import your own TM models

