Speech Recognition – Voice Control for Scratch

The Speech Recognition extension brings real voice control into Scratch.
It captures spoken words and converts them into text in real time, right in your browser, so you can build voice commands, dictation tools, language learning projects, and voice-controlled games.
Simple enough for beginners, powerful enough for creative classrooms.


🌟 Overview

  • Voice to Text: Capture spoken words and use them in your Scratch projects.
  • Multi-Language Support: Recognize speech in 25+ languages including English, Spanish, French, German, Arabic, Persian, Chinese, Japanese, Korean, and many more.
  • Real-Time Recognition: Get instant transcription of spoken words.
  • Easy Integration: Use simple blocks to start listening and retrieve recognized text.
  • Number Conversion: Convert Persian/Arabic digits to English numerals for easier processing.
  • Browser-Based: Works in Chrome, Microsoft Edge, and most Chromium-based browsers that support the Web Speech API.

Key Features

  • 25+ supported languages with multiple regional variants.
  • Built-in browser speech recognition API (Web Speech API).
  • Simple listen-and-wait command for synchronous voice capture.
  • Language switching on-the-fly for multilingual projects.
  • Works directly through your browser – fast and responsive.
  • No setup or API keys required – works automatically with your browser’s built-in API.

🚀 How to Use

  1. Go to: pishi.ai/play
  2. Open the Extensions section.
  3. Select the Speech Recognition extension.
  4. Allow microphone access when prompted by your browser.
  5. Set Language: Use the “set language to [LANGUAGE]” block to choose your desired language (default is English).
  6. Start Listening: Use the “listen and wait” block – the extension will start listening to your microphone and wait for you to speak.
  7. Retrieve Speech: After speaking, the recognized text is stored and can be retrieved using the “speech” reporter block.
  8. Use in Projects: Combine with other blocks to create voice-controlled games, dictation tools, language learning activities, and more!

Tips

  • Speak clearly and at a normal pace for best recognition accuracy.
  • Use a quiet environment to minimize background noise interference.
  • For languages with non-English numerals (Arabic/Persian), use the “convert persian/arabic digits to english” block to normalize numbers.
  • Works best in Chrome, Edge, and other Chromium-based browsers with Web Speech API support.

🧱 Blocks and Functions

🎤 Voice Capture

listen and wait

Starts listening to the microphone and waits for speech input to be recognized.
This is a blocking command: the script pauses until speech is detected and transcribed, an error occurs, or the timeout (approximately 60 seconds) elapses.

How it works:

  • When this block runs, the browser starts listening through your microphone.
  • Speak your words clearly – the recognition system will transcribe what you say.
  • Once speech is recognized, the text is saved and the block completes.
  • If no speech is detected or an error occurs, the block will complete with an empty result.

Important: Make sure microphone permissions are enabled in your browser settings.
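
For readers curious about what happens behind the block, here is a minimal TypeScript sketch of how a blocking "listen and wait" could be built on the Web Speech API. It is not the extension's actual source. `listenAndWait`, `RecognizerLike`, and the injected `createRecognizer` factory are hypothetical names, and the 60-second default only mirrors the approximate timeout described above.

```typescript
// Minimal shape of the Web Speech API recognizer members this sketch relies on.
interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: { results: ArrayLike<ArrayLike<{ transcript: string }>> }) => void) | null;
  onerror: ((event: unknown) => void) | null;
  onend: (() => void) | null;
  start(): void;
  abort(): void;
}

// Hypothetical helper: resolves with the transcript, or "" on error, silence, or timeout.
function listenAndWait(
  createRecognizer: () => RecognizerLike,
  lang = "en-US",
  timeoutMs = 60_000, // assumption: mirrors the ~60 second limit in the text
): Promise<string> {
  return new Promise((resolve) => {
    const recognizer = createRecognizer();
    recognizer.lang = lang;
    recognizer.continuous = false; // single-shot: one utterance per call
    recognizer.interimResults = false; // only report the final transcript

    let settled = false;
    const timer = setTimeout(() => {
      recognizer.abort();
      finish("");
    }, timeoutMs);

    // Resolve exactly once, whichever event arrives first.
    function finish(text: string): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(text);
    }

    recognizer.onresult = (event) => finish(event.results[0]?.[0]?.transcript ?? "");
    recognizer.onerror = () => finish("");
    recognizer.onend = () => finish(""); // ended without a result: empty text
    recognizer.start();
  });
}
```

In a Chromium-based browser, the factory could be something like `() => new (window as any).webkitSpeechRecognition()`. Passing the factory in as a parameter is a design choice for this sketch: it keeps the function easy to try out with a fake recognizer, without a microphone.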

speech
Reports the last recognized speech text captured by the “listen and wait” block.
Returns the transcribed text as a string, or an empty string if no speech was recognized or an error occurred.

Example:

  • If you say “hello scratch”, this block will report “hello scratch”.
  • Use this block to display recognized speech, compare words, trigger actions, or store speech in variables.
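
To illustrate the "compare words, trigger actions" idea in text form, the sketch below normalizes the recognized text and looks for a known command word. `matchCommand` is a hypothetical helper written for this page, not part of the extension. In Scratch you would build the same logic with "if" and "contains" blocks.

```typescript
// Hypothetical helper: returns the first known command found in the speech text, or null.
function matchCommand(speech: string, commands: readonly string[]): string | null {
  // Lowercase and split on spaces and common punctuation so "Jump!" matches "jump".
  const words = speech.toLowerCase().split(/[\s.,!?;:]+/).filter(Boolean);
  for (const word of words) {
    if (commands.includes(word)) return word;
  }
  return null;
}

const spokenCommand = matchCommand("Jump!", ["left", "right", "jump"]); // "jump"
```

Matching whole words rather than substrings avoids false hits, such as finding "left" inside "leftover".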

🌍 Language Settings

set language to [LANGUAGE]
Sets the language for speech recognition.
[LANGUAGE]: choose from a dropdown menu of 25+ supported languages and regional variants.

Supported Languages:

  • Arabic (العربية)
  • Persian (فارسی)
  • English (English)
  • English – American (en-US)
  • English – British (en-GB)
  • German (Deutsch)
  • Spanish – Spain (Español)
  • Spanish – Latin America (Español Latinoamericano)
  • French (Français)
  • Italian (Italiano)
  • Portuguese (Português)
  • Portuguese – Brazilian (Português Brasileiro)
  • Russian (Русский)
  • Turkish (Türkçe)
  • Ukrainian (Українська)
  • Korean (한국어)
  • Japanese (日本語)
  • Chinese – Simplified (简体中文)
  • Chinese – Traditional (繁體中文)
  • Hindi (हिंदी)
  • Bengali (বাংলা)
  • Indonesian (Bahasa Indonesia)
  • Azeri (Azəri)
  • Kazakh (Қазақша)
  • Uzbek (Oʻzbekcha)

Note: Some languages may be supported in Chrome but not in other browsers.
For example, Persian (فارسی) currently works in Google Chrome but may show a “network” error in Microsoft Edge.
If a language doesn’t recognize speech, try testing in Chrome first.

 

Example uses:

  • Create multilingual projects that switch between languages.
  • Build language learning tools that recognize words in different languages.
  • Make voice-controlled games accessible to speakers of many languages.
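
The Web Speech API selects its recognition language through a BCP 47 language tag such as `en-US`. The sketch below shows one way a menu choice could be turned into such a tag. Only `en-US` and `en-GB` appear in the list above; the other tags, the `LANGUAGE_TAGS` table, and `resolveLanguageTag` are assumptions for illustration, and the extension's real mapping may differ.

```typescript
// Illustrative table only: the extension's real menu-to-tag mapping is not documented here.
const LANGUAGE_TAGS = new Map<string, string>([
  ["English – American", "en-US"], // tag shown in the supported languages list
  ["English – British", "en-GB"], // tag shown in the supported languages list
  ["Persian", "fa-IR"], // assumed tag
  ["German", "de-DE"], // assumed tag
  ["Japanese", "ja-JP"], // assumed tag
]);

// Hypothetical helper: falls back to English when the label is unknown.
function resolveLanguageTag(menuLabel: string, fallback = "en-US"): string {
  return LANGUAGE_TAGS.get(menuLabel) ?? fallback;
}
```

The fallback reflects the documented default of English. Choosing `en-US` specifically is an assumption of this sketch.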

🔢 Number Conversion

convert persian/arabic digits to english
Converts Persian (۰-۹) or Arabic (٠-٩) numerals in the recognized speech text to English digits (0-9).
This is useful for projects that need to process numbers spoken in Persian or Arabic, making them easier to use in math operations or comparisons.

How it works:

  • Automatically detects Persian digits (۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹) and converts them to (0 1 2 3 4 5 6 7 8 9).
  • Automatically detects Arabic-Indic digits (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩) and converts them to (0 1 2 3 4 5 6 7 8 9).
  • The converted text replaces the original speech text – call this block after “listen and wait” if needed.

Example:
If the recognized speech is “شماره ۱۲۳” (number 123 in Persian digits), after conversion the speech text becomes “شماره 123” with English digits.

Tip: Use this block when working with Persian or Arabic speech to ensure numbers are in a format Scratch can easily use for calculations.
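
The conversion described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the extension's own code, and `convertDigitsToEnglish` is a hypothetical name. It relies on the fact that Persian digits occupy Unicode code points U+06F0 to U+06F9 and Arabic-Indic digits occupy U+0660 to U+0669.

```typescript
// Hypothetical helper: replaces Persian and Arabic-Indic digits with 0-9.
function convertDigitsToEnglish(text: string): string {
  const PERSIAN_ZERO = 0x06f0; // code point of "۰"
  const ARABIC_INDIC_ZERO = 0x0660; // code point of "٠"
  return text.replace(/[\u06F0-\u06F9\u0660-\u0669]/g, (digit) => {
    const code = digit.charCodeAt(0);
    // The regex only matches the two digit ranges, so one comparison tells them apart.
    const zero = code >= PERSIAN_ZERO ? PERSIAN_ZERO : ARABIC_INDIC_ZERO;
    return String(code - zero);
  });
}

const normalized = convertDigitsToEnglish("شماره ۱۲۳"); // "شماره 123"
```

All other characters pass through unchanged, so the surrounding words stay intact and only the digits are normalized.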


🎓 Educational Uses

  • Create pronunciation practice tools: students speak a word and the project compares the recognized text with the expected word.
  • Build vocabulary quizzes that respond to spoken answers.
  • Develop multilingual storytelling projects where characters respond to voice commands in different languages.
  • Teach accessibility concepts – show how voice control can help people with disabilities interact with technology.
  • Explore natural language processing – students learn how machines understand human speech.
  • Create voice-controlled games and simulations to practice language skills in context.
  • Build dictation tools for writing practice and creative storytelling.

🎮 Example Projects

  • Voice Commands Game: Say “left”, “right”, “jump” to control a sprite.
  • Magic 8-Ball: Ask a question out loud – the sprite responds with an answer.
  • Voice Calculator: Say math problems like “what is five plus three” – sprite displays the answer.
  • Language Quiz: Sprite asks “how do you say hello in Spanish?” – student responds with voice.
  • Story Creator: Dictate a story and watch it appear on screen as you speak.
  • Color Picker: Say a color name – sprite changes to that color.
  • Pet Simulator: Give voice commands like “sit”, “play”, “sleep” to a virtual pet.
  • Multilingual Greeter: Choose a language with “set language to [LANGUAGE]”, speak a greeting, and have the sprite respond in that language.
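
As a worked example, the Voice Calculator idea can be sketched as a small parser that finds an operator between two numbers in the recognized text. Everything here is hypothetical and simplified: the names `solveSpokenMath`, `NUMBER_WORDS`, and `OPERATORS` are invented for this page, and the vocabulary covers only whole numbers, the words zero to ten, and three operations.

```typescript
// Hypothetical vocabulary; extend it for the numbers and operators your project needs.
const NUMBER_WORDS = new Map<string, number>([
  ["zero", 0], ["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
  ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9], ["ten", 10],
]);

type BinaryOp = (a: number, b: number) => number;
const addOp: BinaryOp = (a, b) => a + b;
const subtractOp: BinaryOp = (a, b) => a - b;
const multiplyOp: BinaryOp = (a, b) => a * b;

// A recognizer may return words ("plus") or symbols ("+"), so both are listed.
const OPERATORS = new Map<string, BinaryOp>([
  ["plus", addOp], ["+", addOp],
  ["minus", subtractOp], ["-", subtractOp],
  ["times", multiplyOp], ["x", multiplyOp], ["*", multiplyOp],
]);

// Accepts a number word ("five") or a digit string ("5").
function toNumber(token: string): number | null {
  const fromWord = NUMBER_WORDS.get(token);
  if (fromWord !== undefined) return fromWord;
  const parsed = Number(token);
  return Number.isFinite(parsed) ? parsed : null;
}

// Finds the first "<number> <operator> <number>" pattern and evaluates it.
function solveSpokenMath(speech: string): number | null {
  const tokens = speech.toLowerCase().split(/[\s?!.,]+/).filter(Boolean);
  for (let i = 1; i < tokens.length - 1; i++) {
    const op = OPERATORS.get(tokens[i]);
    if (op === undefined) continue;
    const left = toNumber(tokens[i - 1]);
    const right = toNumber(tokens[i + 1]);
    if (left !== null && right !== null) return op(left, right);
  }
  return null; // no recognizable math problem in the text
}

const answer = solveSpokenMath("what is five plus three"); // 8
```

Returning `null` when nothing matches lets the project ask the user to repeat the question instead of showing a wrong answer.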

🧩 Try it yourself: pishi.ai/play


🔧 Tips & Troubleshooting

🎤 Speech Recognition Specific Tips

  • Microphone not working? Check browser permissions – make sure the site has microphone access.
  • Not recognizing speech accurately? Speak clearly and reduce background noise. Use a headset or external microphone for better quality.
  • Speech transcribed in the wrong language? The extension does not detect the language automatically, so set the correct one using “set language to [LANGUAGE]” before listening.
  • “Speech Recognition not supported” error? This extension requires Chrome, Edge, or another Chromium-based browser with Web Speech API support. Safari and Firefox have limited or no support.
  • Language not working in your browser? Some languages (like Persian) may not be supported in Edge or other browsers. Try using Chrome for the best compatibility.
  • Timeout or no speech detected? The system waits about 60 seconds for speech. If nothing is detected, it times out and returns empty text.
  • Numbers not working correctly? For Persian or Arabic speech, use the number conversion block to normalize digits.
  • Recognition quality varies? Speech recognition accuracy depends on accent, pronunciation, and speech clarity – results may vary between users.
  • Internet connection: Browsers typically send audio to a cloud service for transcription (Chrome uses Google’s servers for most languages), so an internet connection is needed for recognition to work.
  • You can stop recognition anytime by stopping the script or reloading the project.
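
The “Speech Recognition not supported” case comes down to whether the browser exposes a recognizer constructor. The sketch below shows one way to check for it, looking for the standard `SpeechRecognition` name first and the prefixed `webkitSpeechRecognition` name second. `findSpeechRecognition` is a hypothetical helper, and the extension's own check may be written differently.

```typescript
type RecognizerConstructor = new () => unknown;

// Hypothetical helper: returns the recognizer constructor from a window-like object, or null.
function findSpeechRecognition(scope: object): RecognizerConstructor | null {
  const globals = scope as Record<string, unknown>;
  const candidate = globals["SpeechRecognition"] ?? globals["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognizerConstructor) : null;
}

// In a browser, globalThis is the window object.
const recognizerCtor = findSpeechRecognition(globalThis);
console.log(
  recognizerCtor !== null
    ? "Speech recognition is available."
    : "Speech Recognition not supported in this environment.",
);
```

Finding a constructor only shows that the API exists. A particular language can still fail at run time, as noted for Persian in Edge.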

🔒 Privacy & Safety

  • This extension uses the browser’s built-in Web Speech API, which may send audio to Google’s servers for processing (depending on the browser implementation).
  • Audio is used for transcription only. The recognized text is returned to your browser and is not stored by this extension.
  • Pishi.ai and Scratch never receive, store, or transmit your audio recordings, recognized speech, or personal data.
  • Always review your browser’s privacy settings and permissions for microphone access.

🧪 Technical Info

  • API: Web Speech API (SpeechRecognition / webkitSpeechRecognition)
  • Supported Browsers: Chrome, Edge, and most Chromium-based browsers that support the Web Speech API
  • Languages: 25+ languages and regional variants
  • Timeout: Approximately 60 seconds of listening per “listen and wait” call
  • Recognition Mode: Single-shot recognition (one utterance per listen call; continuous mode not supported)
  • Internet Required: Yes – Web Speech API typically uses cloud-based processing for transcription
  • Number Conversion: Supports Persian (۰-۹) and Arabic-Indic (٠-٩) to English (0-9)
  • Privacy: Audio processed by the browser API – Pishi.ai and Scratch never store or transmit your speech.

🔗 Related Extensions

  • 🔊 Text to Speech – convert text to spoken audio.
  • 💬 ChatGPT – generate intelligent replies, hold conversations, or build chatbots that understand and respond to your voice.
  • 🌐 Translate – translate recognized speech into other languages for multilingual projects or real-time interpretation.
  • 🏫 Google Teachable Machine – use models you trained in Google Teachable Machine to recognize sounds, images, or poses.
