🎤 Speech Recognition – Voice Control for Scratch #
The Speech Recognition extension brings real voice control into Scratch.
It lets you capture spoken words and convert them into text – enabling voice commands, dictation, language learning projects, and interactive voice-controlled games – all in real time, right in your browser.
Simple enough for beginners, powerful enough for creative classrooms. ✨
🌟 Overview #
- Voice to Text: Capture spoken words and use them in your Scratch projects.
- Multi-Language Support: Recognize speech in 25+ languages including English, Spanish, French, German, Arabic, Persian, Chinese, Japanese, Korean, and many more.
- Real-Time Recognition: Get instant transcription of spoken words.
- Easy Integration: Use simple blocks to start listening and retrieve recognized text.
- Number Conversion: Convert Persian/Arabic digits to English numerals for easier processing.
- Browser-Based: Works in Chrome, Microsoft Edge, and most Chromium-based browsers that support the Web Speech API.
✨ Key Features #
- 25+ supported languages with multiple regional variants.
- Built-in browser speech recognition API (Web Speech API).
- Simple listen-and-wait command for synchronous voice capture.
- Language switching on the fly for multilingual projects.
- Works directly through your browser – fast and responsive.
- No setup or API keys required – works automatically with your browser’s built-in API.
🚀 How to Use #
- Go to: pishi.ai/play
- Open the Extensions section.
- Select the Speech Recognition extension.
- Allow microphone access when prompted by your browser.
- Set Language: Use the “set language to [LANGUAGE]” block to choose your desired language (default is English).
- Start Listening: Use the “listen and wait” block – the extension will start listening to your microphone and wait for you to speak.
- Retrieve Speech: After speaking, the recognized text is stored and can be retrieved using the “speech” reporter block.
- Use in Projects: Combine with other blocks to create voice-controlled games, dictation tools, language learning activities, and more!
Tips
- Speak clearly and at a normal pace for best recognition accuracy.
- Use a quiet environment to minimize background noise interference.
- For languages with non-English numerals (Arabic/Persian), use the “convert persian/arabic digits to english” block to normalize numbers.
- Works best in Chrome, Edge, and other Chromium-based browsers with Web Speech API support.
🧱 Blocks and Functions #
🎤 Voice Capture #
listen and wait
Starts listening to the microphone and waits for speech input to be recognized.
This is a blocking command – the script pauses until speech is detected and transcribed, an error occurs, or the timeout of approximately 60 seconds elapses.
How it works:
- When this block runs, the browser starts listening through your microphone.
- Speak your words clearly – the recognition system will transcribe what you say.
- Once speech is recognized, the text is saved and the block completes.
- If no speech is detected or an error occurs, the block will complete with an empty result.
Important: Make sure microphone permissions are enabled in your browser settings.
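The blocking behavior described above can be modelled outside Scratch. The sketch below is only an illustration in Python, not the extension's actual code: `listen_and_wait`, the `results` queue, and `TIMEOUT_SECONDS` are hypothetical names, and a queue stands in for the browser's recognizer.

```python
import queue
import threading

TIMEOUT_SECONDS = 60  # the documentation states "approximately 60 seconds"


def listen_and_wait(results: "queue.Queue[str]", timeout: float = TIMEOUT_SECONDS) -> str:
    """Block until a transcript arrives; return an empty string on timeout."""
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        return ""


# Simulate a recognizer that delivers a transcript shortly after listening starts.
results: "queue.Queue[str]" = queue.Queue()
threading.Timer(0.05, results.put, args=["hello scratch"]).start()
heard = listen_and_wait(results, timeout=1.0)

# Nothing is ever put on this queue, so the call times out with an empty result.
silence = listen_and_wait(queue.Queue(), timeout=0.1)
```

The key point is that the caller always gets a string back: the transcript on success, or an empty string when nothing was heard in time.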
speech
Reports the last recognized speech text captured by the “listen and wait” block.
Returns the transcribed text as a string, or an empty string if no speech was recognized or an error occurred.
Example:
- If you say “hello scratch”, this block will report “hello scratch”.
- Use this block to display recognized speech, compare words, trigger actions, or store speech in variables.
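When comparing the reported text against expected words, it often helps to normalize it first, since a recognizer may return capital letters or punctuation. The following Python sketch shows one way to do that; `normalize` and `match_command` are hypothetical helper names, and the assumption that transcripts can contain punctuation is not something the extension documents.

```python
import string
from typing import List, Optional

# Translation table that deletes ASCII punctuation characters.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(speech: str) -> str:
    """Lowercase the transcript, drop ASCII punctuation, collapse whitespace."""
    return " ".join(speech.lower().translate(_PUNCTUATION).split())


def match_command(speech: str, commands: List[str]) -> Optional[str]:
    """Return the first entry of `commands` that appears as a whole word."""
    words = normalize(speech).split()
    for command in commands:
        if command in words:
            return command
    return None  # empty or unrecognized speech


COMMANDS = ["left", "right", "jump"]
print(match_command("Jump!", COMMANDS))                # jump
print(match_command("Please go LEFT now.", COMMANDS))  # left
print(match_command("", COMMANDS))                     # None
```

Matching whole words rather than substrings avoids false hits such as "leftover" triggering "left".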
🌍 Language Settings #
set language to [LANGUAGE]
Sets the language for speech recognition.
[LANGUAGE]: choose from a dropdown menu of 25+ supported languages and regional variants.
Supported Languages:
- Arabic (العربية)
- Persian (فارسی)
- English (English)
- English – American (en-US)
- English – British (en-GB)
- German (Deutsch)
- Spanish – Spain (Español)
- Spanish – Latin America (Español Latinoamericano)
- French (Français)
- Italian (Italiano)
- Portuguese (Português)
- Portuguese – Brazilian (Português Brasileiro)
- Russian (Русский)
- Turkish (Türkçe)
- Ukrainian (Українська)
- Korean (한국어)
- Japanese (日本語)
- Chinese – Simplified (简体中文)
- Chinese – Traditional (繁體中文)
- Hindi (हिंदी)
- Bengali (বাংলা)
- Indonesian (Bahasa Indonesia)
- Azeri (Azəri)
- Kazakh (Қазақша)
- Uzbek (Oʻzbekcha)
Note: Some languages may be supported in Chrome but not in other browsers.
For example, Persian (فارسی) currently works in Google Chrome but may show a “network” error in Microsoft Edge.
If a language doesn’t recognize speech, try testing in Chrome first.
Example uses:
- Create multilingual projects that switch between languages.
- Build language learning tools that recognize words in different languages.
- Make voice-controlled games accessible to speakers of many languages.
🔢 Number Conversion #
convert persian/arabic digits to english
Converts Persian (۰-۹) or Arabic (٠-٩) numerals in the recognized speech text to English digits (0-9).
This is useful for projects that need to process numbers spoken in Persian or Arabic, making them easier to use in math operations or comparisons.
How it works:
- Automatically detects Persian digits (۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹) and converts them to (0 1 2 3 4 5 6 7 8 9).
- Automatically detects Arabic-Indic digits (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩) and converts them to (0 1 2 3 4 5 6 7 8 9).
- The converted text replaces the original speech text – call this block after “listen and wait” if needed.
Example:
If the recognized speech is “شماره ۱۲۳” (number 123 in Persian digits), after conversion the speech text becomes “شماره 123” with English digits.
Tip: Use this block when working with Persian or Arabic speech to ensure numbers are in a format Scratch can easily use for calculations.
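The conversion itself is a character-for-character substitution. As a minimal sketch in Python (the function name `to_english_digits` is hypothetical, and this is not the extension's own implementation), the two digit ranges can be mapped with a single translation table:

```python
# Persian (Extended Arabic-Indic) digits occupy U+06F0..U+06F9,
# Arabic-Indic digits occupy U+0660..U+0669.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
ASCII_DIGITS = "0123456789"

# One table covering both ranges; every other character is left untouched.
DIGIT_TABLE = str.maketrans(PERSIAN_DIGITS + ARABIC_INDIC_DIGITS, ASCII_DIGITS * 2)


def to_english_digits(text: str) -> str:
    """Replace Persian and Arabic-Indic digits with 0-9."""
    return text.translate(DIGIT_TABLE)


converted = to_english_digits("شماره ۱۲۳")
print(converted)  # شماره 123
```

Letters, spaces, and digits that are already 0-9 pass through unchanged, so the function is safe to call on any transcript.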
🎓 Educational Uses #
- Language Learning: Create pronunciation practice tools – students speak words and the system checks their speech.
- Build vocabulary quizzes that respond to spoken answers.
- Develop multilingual storytelling projects where characters respond to voice commands in different languages.
- Teach accessibility concepts – show how voice control can help people with disabilities interact with technology.
- Explore natural language processing – students learn how machines understand human speech.
- Create voice-controlled games and simulations to practice language skills in context.
- Build dictation tools for writing practice and creative storytelling.
🎮 Example Projects #
- Voice Commands Game: Say “left”, “right”, “jump” to control a sprite.
- Magic 8-Ball: Ask a question out loud – the sprite responds with an answer.
- Voice Calculator: Say math problems like “what is five plus three” – sprite displays the answer.
- Language Quiz: Sprite asks “how do you say hello in Spanish?” – student responds with voice.
- Story Creator: Dictate a story and watch it appear on screen as you speak.
- Color Picker: Say a color name – sprite changes to that color.
- Pet Simulator: Give voice commands like “sit”, “play”, “sleep” to a virtual pet.
- Multilingual Greeter: Choose a language with “set language to [LANGUAGE]”, then speak a greeting – the sprite responds in that language.
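As one illustration of the Voice Calculator idea above, the logic for turning a transcript into an answer can be sketched in Python. Everything here is an assumption made for the example: the `answer` function, the small `NUMBER_WORDS` vocabulary, and the set of supported operations. In Scratch the same steps would be built from list and operator blocks.

```python
import operator
from typing import Optional

NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
OPERATIONS = {
    "plus": operator.add, "+": operator.add,
    "minus": operator.sub, "-": operator.sub,
    "times": operator.mul,
}


def answer(speech: str) -> Optional[int]:
    """Evaluate a spoken '<number> <operation> <number>' question."""
    values = []
    operation = None
    for token in speech.lower().replace("?", "").split():
        if token in NUMBER_WORDS:
            values.append(NUMBER_WORDS[token])
        elif token.isdigit():
            values.append(int(token))  # the transcript may contain digits
        elif token in OPERATIONS and operation is None:
            operation = OPERATIONS[token]
    if operation is None or len(values) != 2:
        return None  # not a question this sketch understands
    return operation(values[0], values[1])


print(answer("What is five plus three?"))  # 8
print(answer("what is 7 times 6"))         # 42
print(answer("hello scratch"))             # None
```

Returning `None` for anything unexpected lets the sprite ask the user to repeat the question instead of showing a wrong answer.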
🧩 Try it yourself: pishi.ai/play
🔧 Tips & Troubleshooting #
🎤 Speech Recognition Specific Tips #
- Microphone not working? Check browser permissions – make sure the site has microphone access.
- Not recognizing speech accurately? Speak clearly and reduce background noise. Use a headset or external microphone for better quality.
- Words recognized in the wrong language? The extension does not detect the language automatically, so make sure you set the correct one using “set language to [LANGUAGE]” before listening.
- “Speech Recognition not supported” error? This extension requires Chrome, Edge, or another Chromium-based browser with Web Speech API support. Safari and Firefox have limited or no support.
- Language not working in your browser? Some languages (like Persian) may not be supported in Edge or other browsers. Try using Chrome for the best compatibility.
- Timeout or no speech detected? The system waits about 60 seconds for speech – if nothing is detected, it will time out and return empty text.
- Numbers not working correctly? For Persian or Arabic speech, use the number conversion block to normalize digits.
- Recognition quality varies? Speech recognition accuracy depends on accent, pronunciation, and speech clarity – results may vary between users.
- Internet connection: In Chrome, the Web Speech API relies on Google’s cloud services for most languages, so an internet connection is needed for recognition to work.
- You can stop recognition anytime by stopping the script or reloading the project.
🔒 Privacy & Safety #
- This extension uses the browser’s built-in Web Speech API, which may send audio to Google’s servers for processing (depending on the browser implementation).
- Audio is processed for transcription only – speech text is returned to your browser and not stored by this extension.
- Recognition is handled by your browser through the Web Speech API – Pishi.ai and Scratch never receive or store your audio.
- No personal data or speech recordings are saved or transmitted by Pishi.ai or Scratch.
- Always review your browser’s privacy settings and permissions for microphone access.
🧪 Technical Info #
- API: Web Speech API (SpeechRecognition / webkitSpeechRecognition)
- Supported Browsers: Chrome, Edge, and most Chromium-based browsers that support the Web Speech API
- Languages: 25+ languages and regional variants
- Timeout: Approximately 60 seconds of listening per “listen and wait” call
- Recognition Mode: Single-shot recognition (one utterance per listen call; continuous mode not supported)
- Internet Required: Yes – Web Speech API typically uses cloud-based processing for transcription
- Number Conversion: Supports Persian (۰-۹) and Arabic-Indic (٠-٩) to English (0-9)
- Privacy: Audio processed by the browser API – Pishi.ai and Scratch never store or transmit your speech.
🔗 Related Extensions #
- 🔊 Text to Speech – convert text to spoken audio.
- 💬 ChatGPT – generate intelligent replies, hold conversations, or build chatbots that understand and respond to your voice.
- 🌐 Translate – translate recognized speech into other languages for multilingual projects or real-time interpretation.
- 🏫 Google Teachable Machine – use your own models trained in Google Teachable Machine to recognize sounds, images, or poses.
