- 🖼️ Image Trainer - Build Custom AI Models in Scratch
- 🌟 Overview
- ✨ Key Features
- 🚀 How to Use
- 🧱 Blocks and Functions
- 🎓 Educational Uses
- 🎮 Example Projects
- 🔧 Tips & Troubleshooting
- 🔒 Privacy & Safety
- 🧪 Technical Info
- 🔗 Related Extensions
- 🖼️ Comparison: Image Trainer vs. Google Teachable Machine Extension
- 📚 Learn More
🖼️ Image Trainer – Build Custom AI Models in Scratch #
The Image Trainer extension brings real machine learning into Scratch.
It lets you train your own AI models to recognize images – toys, faces, hand gestures, objects, or anything visible to the camera – all in real time, right in your browser, with no coding or setup required.
Simple enough for students, powerful enough for creative classrooms. ✨
🌟 Overview #
- Train Custom Models: Teach the AI to recognize up to 10 different image labels using your camera or stage images.
- No Pre-training Needed: Start from scratch – show examples, and the AI learns instantly.
- Real-Time Classification: Get instant feedback as the model identifies what it sees.
- Save and Load: Download your trained model data and upload it later to continue your project.
- Change Camera Preview: Show, hide, or flip the live camera view to match your setup.
- Choose Input: Train and classify from the live camera or directly from the Scratch stage image.
✨ Key Features #
- Up to 10 custom trainable labels (numbered 1–10).
- Transfer learning using MobileNet v2 feature extraction for fast training.
- K-Nearest Neighbors (KNN) classification for quick, efficient recognition.
- Adjustable “classify” intervals for smooth performance.
- Camera preview, transparency, and device controls.
- Download and upload learning data to save and restore your trained models.
- Works fully in-browser – safe and private.
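Conceptually, this transfer-learning setup is simple: a frozen MobileNet turns each frame into a feature vector, and a K-Nearest Neighbors lookup matches new vectors against the ones you trained. Below is a minimal Python sketch of the KNN half, using tiny hand-made 2-D vectors in place of real MobileNet embeddings; all names are illustrative, not the extension's actual internals.

```python
import math
from collections import Counter

# Each training example pairs a feature vector (in the real extension,
# a MobileNet v2 embedding of a camera frame) with a label 1-10.
examples = []  # list of (vector, label)

def train(vector, label):
    """One click of 'train label [LABEL]' adds one example."""
    examples.append((vector, label))

def detect(vector, k=3):
    """'detected label': majority label among the k nearest examples."""
    if not examples:
        return None
    nearest = sorted(examples, key=lambda e: math.dist(vector, e[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Teach label 1 ("red toy") and label 2 ("blue toy") with toy 2-D features.
for v in [(0.9, 0.1), (0.8, 0.2), (0.95, 0.05)]:
    train(v, 1)
for v in [(0.1, 0.9), (0.2, 0.8), (0.05, 0.95)]:
    train(v, 2)

print(detect((0.85, 0.15)))  # a red-ish frame -> 1
```

This is why more clicks of "train label" improve accuracy: each click adds one more neighbour for future frames to match against.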
🚀 How to Use #
- Go to: pishi.ai/play
- Open the Extensions section.
- Select the Image Trainer extension.
- Allow camera access if prompted and check that your video preview appears.
- If no cameras are detected, the input will automatically switch to the stage image instead.
- Training Phase:
- Hold an object or strike a pose in front of the camera.
- Click the “train label [1]” block repeatedly (5–20 times) to teach the AI what label 1 looks like.
- Repeat with different objects or poses for labels 2, 3, 4, etc.
- The first training may take a moment – be patient! Subsequent trainings are instant.
- Classification Phase: Once trained, the AI continuously analyzes the camera and reports which label it sees using the “detected label” block.
- Use “when label [1] detected” hat blocks to trigger sprite actions when specific labels are recognized.
Tips
- Train each label from multiple angles and distances for better accuracy.
- Use consistent lighting during training and testing.
- Train at least 5–10 examples per label for reliable recognition.
- For classrooms or older devices, start the classification at 100–250 ms intervals for smooth performance.
🧱 Blocks and Functions #
🎓 Training #
train label [LABEL]
Captures the current camera or stage image and adds it as a training example for the selected label.
[LABEL]: choose a label number from 1 to 10.
How it works:
- Click this block multiple times (5–20 times) while showing the object or pose you want the AI to learn.
- The more examples you provide, the better the AI will recognize that label.
- Each click adds one training sample – you’ll see the train count increase.
- The first training session may take a moment as the model initializes – subsequent trainings are instant.
count of label [LABEL] trains
Reports how many training examples have been captured for the specified label.
Use this to track your training progress or display it on screen.
🧮 Detection #
detected label
Reports the label number (1–10) that the AI currently recognizes in the camera or stage image.
Returns empty if no label is confidently detected or if no training has been done yet.
Example: If the AI sees the object you trained as label 3, this block will report “3”.
when any label detected
Hat block that triggers when any trained label is detected with confidence above the minimum threshold.
Perfect for starting general reactions to any recognized image.
when label [LABEL] detected
Hat block that triggers when the specific label is detected with confidence above the minimum threshold.
[LABEL]: choose a label number from 1 to 10.
Example uses:
- Train label 1 as “red toy” → when label 1 detected, make sprite say “I see red!”
- Train label 2 as “thumbs up” → when label 2 detected, play victory sound.
- Train label 3 as “hand wave” → when label 3 detected, make sprite wave back.
Note: Hat blocks have a brief timeout (100 ms) to prevent multiple rapid triggers.
🎯 Confidence #
set minimum confidence [CONFIDENCE]
Sets the minimum confidence score (0–1) required for a label to be reported as detected.
If the AI’s confidence is below this threshold, the detection will be ignored – meaning the “detected label” block will return empty and hat blocks won’t trigger.
- 0.6 – Default: a good balance between accuracy and responsiveness.
- 0.7–0.9 – Stricter, more accurate detection (fewer false positives).
- 0.3–0.5 – More lenient detection (may increase false positives but catches more borderline cases).
minimum confidence
Reports the current confidence threshold value – useful for showing it on screen or adjusting it dynamically during a project.
Tip: If your project has too many false detections, increase the confidence. If valid detections are being missed, decrease it.
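For a KNN model, one natural confidence measure is the share of the k nearest training examples that vote for the winning label. The Python sketch below shows how a minimum-confidence gate like this could behave; the exact scoring formula the extension uses is an assumption here.

```python
from collections import Counter

def gated_detection(neighbor_labels, min_confidence=0.6):
    """Report the winning label only if its vote share among the k
    nearest neighbours meets the threshold; otherwise report '' (empty),
    mirroring how 'detected label' stays empty below the threshold."""
    if not neighbor_labels:
        return ""
    label, votes = Counter(neighbor_labels).most_common(1)[0]
    confidence = votes / len(neighbor_labels)
    return label if confidence >= min_confidence else ""

print(gated_detection([3] * 7 + [1] * 3, 0.6))  # 0.7 confidence -> 3
print(gated_detection([3] * 6 + [1] * 4, 0.7))  # 0.6 confidence -> ''
```

Raising the threshold (second call) makes the same borderline frame report nothing, which is exactly the "fewer false positives, more missed detections" trade-off described above.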
🗑️ Reset and Manage #
reset label: [LABEL]
Clears all training data for the specified label.
[LABEL]: choose a label number from 1 to 10.
A confirmation prompt will appear to prevent accidental deletion.
Use this when you want to retrain a label from scratch.
reset all labels
Clears all training data for all labels (1–10).
A confirmation prompt will appear to prevent accidental deletion.
Use this to start a completely fresh training session.
Warning: This deletes all your training work! Make sure to download your learning data first if you want to save it.
💾 Save and Load #
download learning data
Downloads your trained model data as a JSON file to your computer.
The filename will be formatted as: imagetrainer-[timestamp].json
Use this to:
- Save your work before closing the project.
- Share trained models with classmates or friends.
- Back up your training progress.
upload learning data
Opens a file picker to load previously saved learning data (JSON file).
Once loaded, the AI will immediately recognize the labels it was trained on, without needing to retrain.
Use this to:
- Continue a project where you left off.
- Load a model trained by someone else.
- Switch between different trained models for different projects.
Note: Only upload JSON files downloaded from the Image Trainer extension.
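The download/upload pair is essentially a JSON round trip of the training examples. The schema below is hypothetical (the real file's structure may differ), but it illustrates the save-then-restore idea, including the imagetrainer-[timestamp].json naming:

```python
import json
import time

def download_learning_data(examples, path=None):
    """Save training examples to a JSON file named like the extension's
    downloads: imagetrainer-[timestamp].json. The schema is illustrative."""
    path = path or f"imagetrainer-{int(time.time())}.json"
    data = {str(label): vectors for label, vectors in examples.items()}
    with open(path, "w") as f:
        json.dump(data, f)
    return path

def upload_learning_data(path):
    """Load a saved file and restore integer labels, so the model can
    recognize its labels again without retraining."""
    with open(path) as f:
        data = json.load(f)
    return {int(label): vectors for label, vectors in data.items()}

trained = {1: [[0.9, 0.1], [0.8, 0.2]], 2: [[0.1, 0.9]]}
saved = download_learning_data(trained, "imagetrainer-demo.json")
restored = upload_learning_data(saved)
assert restored == trained  # the round trip preserves all examples
```

Because only the extracted feature vectors need to be stored (not images), the file stays small and contains no recognizable pictures.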
⚙️ Classification Controls #
classify [INTERVAL]
Chooses how often detection runs:
- every time this block runs
- continuous, without delay
- continuous, every 50–2500 ms
turn classification [on/off]
Starts or stops continuous detection.
classification interval
Reports the current interval in milliseconds.
continuous classification
Reports whether continuous detection is “on” or “off”.
select input image [camera/stage]
Chooses whether the camera or the stage is used as the input image.
input image
Reports the active input source.
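The interval modes boil down to a throttle: a frame is only classified when enough time has passed since the last run. Here is a rough Python sketch of that behaviour (illustrative only, not the extension's code):

```python
import time

class ClassifyThrottle:
    """Allow a classification at most once per interval, like the
    'classify [INTERVAL]' continuous modes (50-2500 ms)."""
    def __init__(self, interval_ms):
        self.interval = interval_ms / 1000.0
        self.last = float("-inf")  # so the first call always runs

    def ready(self, now=None):
        """Return True when enough time has passed to classify again."""
        now = time.monotonic() if now is None else now
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False

t = ClassifyThrottle(250)     # classify every 250 ms
assert t.ready(now=0.00)      # first frame: runs
assert not t.ready(now=0.10)  # only 100 ms later: skipped
assert t.ready(now=0.30)      # 300 ms after the last run: runs
```

A longer interval means fewer classifications per second and a lighter CPU load, which is why slower devices benefit from the 100–250 ms range suggested in the tips.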
🎥 Video Controls #
turn video [on/off/on-flipped]
Shows, hides, or flips the live camera preview. “on” mirrors the view like a selfie; “on-flipped” shows the real left/right orientation; “off” hides the camera.
select camera [CAMERA]
Chooses which connected camera to use when more than one is available.
🎓 Educational Uses #
- Introduce Machine Learning Concepts: Students learn how AI models are trained using examples, not rules.
- Teach classification, pattern recognition, and the importance of training data quality.
- Explore transfer learning – how pre-trained models (MobileNet) can be adapted for new tasks.
- Discuss overfitting, underfitting, and model accuracy through hands-on experimentation.
- Create interactive games and tools that respond to real-world objects, gestures, or faces.
- Build accessible input methods – control projects with hand signs, facial expressions, or custom gestures.
🎮 Example Projects #
- Rock Paper Scissors AI: Train labels 1, 2, 3 for rock, paper, scissors – play against the computer!
- Color Sorter: Train labels for different colored objects – sprite reacts to each color.
- Custom Gesture Controller: Train hand poses (thumbs up, peace sign, fist) to control sprite movement.
- Pet Recognition: Train labels for different pets or toys – make a sprite greet each one differently.
- Smart Classroom Tool: Train labels for “hand raised,” “question,” “ready” – automate classroom responses.
- Emotion Detector: Train labels for happy, sad, surprised faces – create an emotion-responsive character.
- Object Finder Game: Train labels for hidden objects – search and find game with AI validation.
🧩 Try it yourself: pishi.ai/play
🔧 Tips & Troubleshooting #
- No camera?
• Make sure your camera is connected and browser permission is allowed.
• If the camera is blocked, enable it in your browser’s site settings and reload the page.
• During extension load, if no cameras are detected, the input will automatically switch to the stage image so you can still test Image Trainer features.
- No detection?
• continuous classification: use this reporter to check whether classification is active.
• If it is active, improve lighting and face the camera directly.
• If it is not active, run turn classification [on], then recheck the status with the reporter above.
• In camera input mode, turning the camera off also stops classification – turn the video back on or switch input to stage.
• In stage input mode, the system classifies whatever is visible on the stage – backdrops, sprites, or images. You can turn the video off completely and still process stage images.
• Stage mode is slower than camera input, so set your classification interval to 100–250 ms for smoother results using the classify [INTERVAL] block.
• In stage mode, left and right appear swapped relative to the camera preview because the stage image is not mirrored – its coordinate space represents a real (non-mirrored) view.
• Classification can also restart automatically when you use blocks such as turn video [on], classify [INTERVAL], select camera [CAMERA], or select input image [camera/stage].
- Flipped view?
• turn video [on-flipped]: use this to show the camera without mirroring. “on” mirrors like a selfie; “on-flipped” shows the real left/right orientation.
- Laggy or slow?
• Use classification intervals between 100–250 ms or close other browser tabs to reduce processing load.
- WebGL2 warning?
• Try Firefox, or a newer device that supports WebGL2 graphics acceleration.
- Analyze stage instead of camera?
• select input image [stage]: use this to analyze the Scratch stage image instead of a live camera feed.
🖼️ Image Trainer Specific Tips #
- Not detecting accurately? Train more examples (aim for 10–20 per label) from different angles.
- False positives? Increase the minimum confidence threshold (try 0.7–0.8).
- Missing detections? Lower the confidence threshold (try 0.4–0.5) or train more diverse examples.
- First training slow? This is normal – MobileNet loads on the first train. Subsequent trainings are instant.
- Want to start fresh? Use “reset all labels” – but download your data first if you want to keep it!
- Lighting matters: Train and test in similar lighting conditions for best results.
- Background clutter? Use plain backgrounds during training for cleaner recognition.
🔒 Privacy & Safety #
- Everything runs locally in your browser.
- No images or video are uploaded anywhere.
- The model file may be downloaded once and cached for offline use.
- Always ask a teacher or parent before using the camera.
- You can safely turn video [off] at any time.
Image Trainer Privacy Notes:
- Your training data stays on your computer – nothing is uploaded to servers.
- Downloaded learning data is stored locally as a JSON file – you control where it goes.
- No personal images are sent anywhere – all training and classification happen in your browser.
🧪 Technical Info #
- Base Model: MobileNet v2
- Framework: TensorFlow.js + KNN Classifier – runs fully in-browser using WebGL2 acceleration
- Technique: Transfer learning + K-Nearest Neighbors classification
- Labels: 10 trainable labels (1–10)
- Training: Instant after first initialization – no server required
- Inputs: camera or stage canvas
- Default confidence: 0.6 (adjustable 0–1)
- Model loading: Local files or bundled with extension
- Data format: JSON
- Requires: WebGL2 for best performance
🔗 Related Extensions #
- 🏫 Google Teachable Machine – import pre-trained models from Teachable Machine
- 😎 Face Mesh – detect facial landmarks
- 🖐️ Hand Pose – detect hand landmarks
- 🕺 Pose Tracking – track body pose
🖼️ Comparison: Image Trainer vs. Google Teachable Machine Extension #
Both extensions let you use custom-trained AI models in Scratch – but they differ in where and how you train the model.
| Feature | Image Trainer | Google Teachable Machine |
|---|---|---|
| Training Location | Train directly inside Scratch – instant and integrated. | Train on Teachable Machine website, export, then import into Scratch. |
| Ease of Use | All-in-one – no external tools needed. | Requires switching between TM website and Scratch. |
| Labels Supported | 10 labels (1–10) | Unlimited labels (depends on your TM model) |
| Training Speed | Instant after first load – add examples with one click. | Slower – must train on TM website, wait for processing, export, and import. |
| Model Portability | Download/upload JSON files for saving and sharing models. | Import pre-trained TM models via URL or upload. |
| Customization | Limited to KNN + MobileNet v2 (fast and lightweight). | Full Teachable Machine options (deep learning, custom architectures). |
| Best For | Quick, iterative training during live projects – great for classrooms and beginners. | Advanced projects requiring more complex models or many labels. |
| Privacy | 100% local – all training stays in your browser. | Training on TM website (Google) – model data may be processed externally. |
| Educational Focus | Hands-on introduction to ML – see training happen in real time. | Learn professional ML workflows – training, exporting, deploying. |
💡 Why Choose Image Trainer? #
Image Trainer is perfect for live, interactive learning. Students train, test, and iterate instantly – no tab-switching or waiting for uploads.
- Immediate feedback loop for experimentation.
- Fully self-contained – no external accounts or websites.
- Privacy-focused – all data stays on your device.
- Ideal for rapid prototyping and classroom demos.
💡 Why Choose Teachable Machine? #
Teachable Machine is better for advanced or pre-planned models.
- More labels and more powerful training options.
- Share models via URLs – easier collaboration.
- Access to Google’s training infrastructure for more complex models.
In short:
Image Trainer = instant, integrated, beginner-friendly ML inside Scratch.
Teachable Machine = professional workflow for complex, pre-trained models.
