Speech Recognition – Voice Control for Scratch

Table of Contents

🎤 Speech Recognition – Voice Control for Scratch
🌟 Overview
✨ Key Features
🚀 How to Use
🧱 Blocks and Functions
🎓 Educational Uses
🎮 Example Projects
🔧 Tips and Troubleshooting
- 🎤 Speech Recognition Specific Tips
🔒 Privacy and Safety
🧪 Technical Info
🔗 Related Extensions
📚 Learn More

🎤 Speech Recognition – Voice Control for Scratch

The Speech Recognition extension brings real voice control into Scratch.
It lets you capture spoken words and convert them into text – enabling voice commands, dictation, language learning projects, and interactive voice-controlled games – all in real time, right in your browser.
Simple enough for beginners, powerful enough for creative classrooms. ✨

🌟 Overview

Voice to Text: Capture spoken words and use them in your Scratch projects.
Multi-Language Support: Recognize speech in 20 languages (25 menu options counting regional variants), including English, Spanish, French, German, Arabic, Persian, Chinese, Japanese, Korean, and more.
Real-Time Recognition: Get instant transcription of spoken words.
Easy Integration: Use simple blocks to start listening and retrieve recognized text.
Number Conversion: Convert Persian/Arabic digits to English numerals for easier processing.
Browser-Based: Works in Chrome, Microsoft Edge, and most Chromium-based browsers that support the Web Speech API.

✨ Key Features

20 supported languages, with regional variants for English, Spanish, Portuguese, and Chinese (25 options in total).
Built-in browser speech recognition API (Web Speech API).
Simple “listen and wait” block that pauses the script until your speech is captured.
Language switching on the fly for multilingual projects.
Works directly through your browser – fast and responsive.
No setup or API keys required – works automatically with your browser’s built-in API.

🚀 How to Use

Go to: pishi.ai/play
Open the Extensions section.
Select the Speech Recognition extension.
Allow microphone access when prompted by your browser.
Set Language: Use the “set language to [LANGUAGE]” block to choose your desired language (default is English).
Start Listening: Use the “listen and wait” block – the extension will start listening to your microphone and wait for you to speak.
Retrieve Speech: After speaking, the recognized text is stored and can be retrieved using the “speech” reporter block.
Use in Projects: Combine with other blocks to create voice-controlled games, dictation tools, language learning activities, and more!

Tips

Speak clearly and at a normal pace for best recognition accuracy.
Use a quiet environment to minimize background noise interference.
For languages with non-English numerals (Arabic/Persian), use the “convert persian/arabic digits to english” block to normalize numbers.
Works best in Chrome, Edge, and other Chromium-based browsers with Web Speech API support.

🧱 Blocks and Functions

🎤 Voice Capture

listen and wait

Starts listening to the microphone and waits for speech input to be recognized.
This is a blocking command. The script pauses until one of three things happens: speech is detected and transcribed, an error occurs, or listening times out after approximately 60 seconds.

How it works:

When this block runs, the browser starts listening through your microphone.
Speak your words clearly – the recognition system will transcribe what you say.
Once speech is recognized, the text is saved and the block completes.
If no speech is detected or an error occurs, the block will complete with an empty result.

Important: Make sure microphone permissions are enabled in your browser settings.
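The behavior described above can be sketched in TypeScript as a Promise wrapper around a Web Speech API recognizer. This is not the extension's actual source. The names `listenAndWait` and `RecognizerLike` are hypothetical, and the 60-second default is an assumption taken from this article. The sketch starts single-shot recognition and resolves with the first final transcript. It resolves with an empty string on an error, when recognition ends without a result, or on timeout.

```typescript
// Minimal shapes of the Web Speech API pieces this sketch uses (assumed subset).
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onerror: ((event: unknown) => void) | null;
  onend: (() => void) | null;
  start(): void;
  abort(): void;
}

// Hypothetical helper: listen for one utterance and resolve with its text.
// Resolves with "" on error, on an end without a result, or on timeout.
function listenAndWait(
  recognizer: RecognizerLike,
  lang = "en-US",
  timeoutMs = 60_000, // assumed from "approximately 60 seconds"
): Promise<string> {
  return new Promise<string>((resolve) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    let settled = false;

    // Resolve exactly once, cancel the timer, and detach the handlers.
    const finish = (text: string): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      recognizer.onresult = null;
      recognizer.onerror = null;
      recognizer.onend = null;
      resolve(text);
    };

    recognizer.lang = lang;
    recognizer.continuous = false; // single-shot: one utterance per call
    recognizer.interimResults = false; // only report final results
    recognizer.onresult = (event) => {
      const best = event.results[0]?.[0]; // first result, top alternative
      finish(best ? best.transcript.trim() : "");
    };
    recognizer.onerror = () => finish(""); // e.g. "no-speech", "network"
    recognizer.onend = () => finish(""); // stopped without a result

    timer = setTimeout(() => {
      recognizer.abort(); // give up after the timeout
      finish("");
    }, timeoutMs);
    recognizer.start();
  });
}
```

In a browser you would pass `new SpeechRecognition()`, or `new webkitSpeechRecognition()` where only the prefixed name exists. The recognizer is passed in as a parameter so that the logic can be exercised with a fake object outside the browser.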

speech

Reports the last recognized speech text captured by the “listen and wait” block.
Returns the transcribed text as a string, or an empty string if no speech was recognized or an error occurred.

Example:

If you say “hello scratch”, this block will report “hello scratch”.
Use this block to display recognized speech, compare words, trigger actions, or store speech in variables.
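Recognizers may add capital letters or punctuation to the reported text. Comparing it against command words therefore works better after normalizing it. Below is a minimal sketch that assumes English commands. `normalizeSpeech` and `findCommand` are hypothetical helpers, not blocks of this extension.

```typescript
// Punctuation a recognizer may add, including the Arabic comma and question mark.
const PUNCTUATION = /[.,!?;:"'¿¡،؟]/g;

// Lower-case, drop punctuation, and collapse runs of whitespace.
function normalizeSpeech(text: string): string {
  return text.toLowerCase().replace(PUNCTUATION, "").replace(/\s+/g, " ").trim();
}

// Hypothetical helper: return the first command (in list order) that appears
// as a whole word in the transcript, or null if none does.
function findCommand(speech: string, commands: readonly string[]): string | null {
  const words = normalizeSpeech(speech).split(" ");
  for (const command of commands) {
    if (words.indexOf(command) !== -1) return command;
  }
  return null;
}
```

For example, `findCommand("Jump!", ["left", "right", "jump"])` returns `"jump"`. Matching whole words means "leftover" does not trigger "left". In a Scratch project, a similar check is an `if` block built around the `(speech) contains [jump]?` operator.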

🌍 Language Settings

set language to [LANGUAGE]

Sets the language for speech recognition.
LANGUAGE: choose from a dropdown menu of 25 options covering 20 languages, including regional variants.

Supported Languages:

Arabic (العربية)
Persian (فارسی)
English (English)
English – American (en-US)
English – British (en-GB)
German (Deutsch)
Spanish – Spain (Español)
Spanish – Latin America (Español Latinoamericano)
French (Français)
Italian (Italiano)
Portuguese (Português)
Portuguese – Brazilian (Português Brasileiro)
Russian (Русский)
Turkish (Türkçe)
Ukrainian (Українська)
Korean (한국어)
Japanese (日本語)
Chinese – Simplified (简体中文)
Chinese – Traditional (繁體中文)
Hindi (हिंदी)
Bengali (বাংলা)
Indonesian (Bahasa Indonesia)
Azeri (Azəri)
Kazakh (Қазақша)
Uzbek (Oʻzbekcha)

Note: Some languages may be supported in Chrome but not in other browsers.
For example, Persian (فارسی) currently works in Google Chrome but may show a “network” error in Microsoft Edge.
If a language doesn’t recognize speech, try testing in Chrome first.
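A Web Speech API recognizer takes its language as a BCP 47 tag (such as "en-US" or "fa-IR") in its `lang` property, set before listening starts. The sketch below illustrates that idea. The mapping table and the helper names are hypothetical, only a few entries are shown, and the exact tag this extension sends for each menu entry is an assumption.

```typescript
// Hypothetical mapping from some menu entries to BCP 47 language tags.
// The tags the extension really uses for each entry are an assumption.
const LANGUAGE_TAGS: Record<string, string> = {
  "English – American": "en-US",
  "English – British": "en-GB",
  "Persian": "fa-IR",
  "Arabic": "ar-SA",
  "German": "de-DE",
  "Japanese": "ja-JP",
};

// The article says English is the default; "en-US" is an assumed tag for it.
const DEFAULT_TAG = "en-US";

// Look up a menu entry, ignoring inherited keys such as "toString".
function languageTag(name: string): string {
  return Object.prototype.hasOwnProperty.call(LANGUAGE_TAGS, name)
    ? LANGUAGE_TAGS[name]
    : DEFAULT_TAG;
}

// A recognizer only needs the tag assigned before start() is called.
function setLanguage(recognizer: { lang: string }, name: string): void {
  recognizer.lang = languageTag(name);
}
```

Whether a given tag actually works still depends on the browser, as the note above explains for Persian in Edge.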

Example uses:

Create multilingual projects that switch between languages.
Build language learning tools that recognize words in different languages.
Make voice-controlled games accessible to speakers of many languages.

🔢 Number Conversion

convert persian/arabic digits to english

Converts Persian (۰-۹) or Arabic (٠-٩) numerals in the recognized speech text to English digits (0-9).
This is useful for projects that need to process numbers spoken in Persian or Arabic, making them easier to use in math operations or comparisons.

How it works:

Automatically detects Persian digits (۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹) and converts them to (0 1 2 3 4 5 6 7 8 9).
Automatically detects Arabic-Indic digits (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩) and converts them to (0 1 2 3 4 5 6 7 8 9).
The converted text replaces the original speech text – call this block after “listen and wait” if needed.

Example:
If the recognized speech is “شماره ۱۲۳” (number 123 in Persian digits), after conversion the speech text becomes “شماره 123” with English digits.

Tip: Use this block when working with Persian or Arabic speech to ensure numbers are in a format Scratch can easily use for calculations.
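The conversion is a per-character substitution. Each Persian digit (U+06F0 to U+06F9) or Arabic-Indic digit (U+0660 to U+0669) is replaced by its value 0 to 9, and every other character is left alone. Below is a minimal sketch of that logic. The function name is hypothetical, and the extension's own implementation may differ.

```typescript
// Persian digits ۰-۹ are U+06F0..U+06F9; Arabic-Indic digits ٠-٩ are U+0660..U+0669.
const PERSIAN_ZERO = 0x06f0;
const ARABIC_ZERO = 0x0660;

// Hypothetical helper: replace each Persian or Arabic-Indic digit with 0-9,
// leaving every other character unchanged.
function convertDigitsToEnglish(text: string): string {
  return text.replace(/[\u06F0-\u06F9\u0660-\u0669]/g, (digit) => {
    const code = digit.charCodeAt(0);
    // Both blocks are contiguous, so subtracting the block's zero gives the value.
    const zero = code >= PERSIAN_ZERO ? PERSIAN_ZERO : ARABIC_ZERO;
    return String(code - zero);
  });
}
```

The conversion is needed because JavaScript's `Number()` only understands ASCII digits. `Number("۴۲")` is `NaN`, while `Number(convertDigitsToEnglish("۴۲"))` is `42`, so calculations on the raw recognized text would not work.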

🎓 Educational Uses

Language Learning: Create pronunciation practice tools – students speak words and the system checks their speech.
Build vocabulary quizzes that respond to spoken answers.
Develop multilingual storytelling projects where characters respond to voice commands in different languages.
Teach accessibility concepts – show how voice control can help people with disabilities interact with technology.
Explore natural language processing – students learn how machines understand human speech.
Create voice-controlled games and simulations to practice language skills in context.
Build dictation tools for writing practice and creative storytelling.

🎮 Example Projects

Voice Commands Game: Say “left”, “right”, “jump” to control a sprite.
Magic 8-Ball: Ask a question out loud – the sprite responds with an answer.
Voice Calculator: Say math problems like “what is five plus three” – sprite displays the answer.
Language Quiz: Sprite asks “how do you say hello in Spanish?” – student responds with voice.
Story Creator: Dictate a story and watch it appear on screen as you speak.
Color Picker: Say a color name – sprite changes to that color.
Pet Simulator: Give voice commands like “sit”, “play”, “sleep” to a virtual pet.
Multilingual Greeter: Pick a language with “set language to [LANGUAGE]”, speak a greeting in it, and have the sprite respond in that language.
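For the Voice Calculator idea, the project has to turn the recognized text into a calculation. Below is a minimal sketch of one approach. It assumes the transcript contains a "<number> <operator> <number>" sequence somewhere. `solveSpokenMath` is hypothetical and only knows the number words zero to ten plus a few operators. Recognizers may write numbers as digits ("what is 5 + 3"), so numerals and symbols are accepted too.

```typescript
// Hypothetical vocabulary: number words zero to ten and a few operators.
const NUMBER_WORDS = new Map<string, number>([
  ["zero", 0], ["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
  ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9], ["ten", 10],
]);

type Operator = (a: number, b: number) => number;
const OPERATORS = new Map<string, Operator>([
  ["plus", (a, b) => a + b], ["+", (a, b) => a + b],
  ["minus", (a, b) => a - b], ["-", (a, b) => a - b],
  ["times", (a, b) => a * b], ["x", (a, b) => a * b], ["*", (a, b) => a * b],
]);

// A word is a number if it is a known number word or a plain numeral like "12".
function toNumber(word: string): number | null {
  const named = NUMBER_WORDS.get(word);
  if (named !== undefined) return named;
  const value = Number(word);
  return word !== "" && Number.isFinite(value) ? value : null;
}

// Hypothetical helper: find the first "<number> <operator> <number>" triple
// in the transcript and evaluate it; return null if there is none.
function solveSpokenMath(speech: string): number | null {
  const words = speech
    .toLowerCase()
    .replace(/[?!,]/g, "") // drop punctuation the recognizer may add
    .replace(/\.$/, "") // drop a final period but keep decimal points
    .trim()
    .split(/\s+/);
  for (let i = 0; i + 2 < words.length; i++) {
    const a = toNumber(words[i]);
    const op = OPERATORS.get(words[i + 1]);
    const b = toNumber(words[i + 2]);
    if (a !== null && op !== undefined && b !== null) return op(a, b);
  }
  return null;
}
```

In this sketch, "what is five plus three" evaluates to 8, and text with no recognizable calculation returns `null`. The sprite could then say "I didn't understand" instead of showing a wrong answer.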

🧩 Try it yourself: pishi.ai/play

🔧 Tips and Troubleshooting

🎤 Speech Recognition Specific Tips

Microphone not working? Check browser permissions – make sure the site has microphone access.
Not recognizing speech accurately? Speak clearly and reduce background noise. Use a headset or external microphone for better quality.
Speech recognized in the wrong language? Make sure you set the correct language using “set language to [LANGUAGE]” before listening. The extension does not detect the spoken language automatically.
“Speech Recognition not supported” error? This extension requires Chrome, Edge, or another Chromium-based browser with Web Speech API support. Safari and Firefox have limited or no support.
Language not working in your browser? Some languages (like Persian) may not be supported in Edge or other browsers. Try using Chrome for the best compatibility.
Timeout or no speech detected? The system waits about 60 seconds for speech. If nothing is detected, it times out and returns empty text.
Numbers not working correctly? For Persian or Arabic speech, use the number conversion block to normalize digits.
Recognition quality varies? Speech recognition accuracy depends on accent, pronunciation, and speech clarity – results may vary between users.
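Internet connection: In Chrome, the Web Speech API sends audio to Google’s cloud services for most languages, and other browsers such as Edge rely on their vendor’s online service. An internet connection is therefore needed for recognition to work.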
You can stop recognition anytime by stopping the script or reloading the project.

🔒 Privacy and Safety

This extension uses the browser’s built-in Web Speech API. Depending on the browser implementation, it may send audio to the browser vendor’s servers for processing (for example, Google’s servers in Chrome).
Audio is processed for transcription only. The recognized text is returned to your browser and is not stored by this extension.
Pishi.ai and Scratch never receive, store, or transmit your speech recordings or other personal data.
Always review your browser’s privacy settings and permissions for microphone access.

🧪 Technical Info

API: Web Speech API (SpeechRecognition / webkitSpeechRecognition)
Supported Browsers: Chrome, Edge, and most Chromium-based browsers that support the Web Speech API
Languages: 20 languages, 25 options including regional variants
Timeout: Approximately 60 seconds of listening per “listen and wait” call
Recognition Mode: Single-shot recognition (one utterance per listen call; continuous mode not supported)
Internet Required: Yes – Web Speech API typically uses cloud-based processing for transcription
Number Conversion: Supports Persian (۰-۹) and Arabic-Indic (٠-٩) to English (0-9)
Privacy: Audio is processed by the browser API; Pishi.ai and Scratch never store your speech.
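The “Speech Recognition not supported” error comes down to feature detection. Depending on the browser and version, the recognizer constructor is exposed as `SpeechRecognition` or as the prefixed `webkitSpeechRecognition` named in the API line above. Below is a minimal sketch with hypothetical type and function names. The global object is passed in as a parameter, which keeps the sketch testable outside a browser.

```typescript
// Any zero-argument constructor; in a browser this would be the recognizer class.
type RecognizerConstructor = new () => unknown;

// The two global names the Web Speech API recognizer is exposed under.
interface SpeechGlobals {
  SpeechRecognition?: RecognizerConstructor;
  webkitSpeechRecognition?: RecognizerConstructor;
}

// Hypothetical helper: prefer the standard name, fall back to the prefixed
// one, and return null when neither exists (the "not supported" case).
function getRecognitionConstructor(scope: SpeechGlobals): RecognizerConstructor | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}
```

In a page, `getRecognitionConstructor(window as unknown as SpeechGlobals)` returning `null` would be the point to show the "not supported" message instead of trying to listen.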

🔗 Related Extensions

🔊 Text to Speech – convert text to spoken audio
💬 ChatGPT – generate intelligent replies, hold conversations, or build chatbots that understand and respond to your voice.
🌐 Translate – translate recognized speech into other languages for multilingual projects or real-time interpretation.
🏫 Google Teachable Machine – use models you trained in Google Teachable Machine to recognize images, sounds, or poses.

📚 Learn More


Updated on February 5, 2026
