Simple, hackable offline speech to text

Post by **Jasper** » Tue Feb 27, 2024 5:54 pm

**Apologies to all: I tested this using JammyPup64CE.**
I tried this myself and found the results very promising.

This example used US English as the language model.

Once the application was running, I had my radio on in the background and it automatically began transcribing the dialogue.

Then I opened AbiWord and, as if by magic, the text appeared in the open document.

You will need to load up the DevX.SFS first (tested only in FP95) as you will need to use Python and Git.

Python 3.10 is included in the DevX.

Open a terminal and enter the following commands one at a time, making sure each one has completed before entering the next.

```sh
pip install cffi
pip3 install vosk
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model
```
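The same steps can be gathered into one script that stops at the first failing command, which saves checking each step by eye. This is only a sketch under some assumptions: pip3, git, wget and unzip are on the PATH (for example with the DevX loaded), pip3 is used for both packages so they go to the same Python, and the clone goes into your home folder. The function name install_nerd_dictation is made up for this example.

```sh
# Sketch only: the install steps above, run in order, stopping at the first error.
# Assumes pip3, git, wget and unzip are installed (e.g. DevX.SFS loaded).
MODEL_NAME="vosk-model-small-en-us-0.15"
MODEL_URL="https://alphacephei.com/kaldi/models/${MODEL_NAME}.zip"

# Hypothetical helper name; nothing runs until you call it.
install_nerd_dictation() {
    (
        set -e    # abort the subshell as soon as any command fails
        set -x    # echo each command first, so the failing step is easy to spot
        pip3 install cffi
        pip3 install vosk
        cd "$HOME"    # clone into ~/nerd-dictation rather than the current folder
        git clone https://github.com/ideasman42/nerd-dictation.git
        cd nerd-dictation
        wget "$MODEL_URL"
        unzip "${MODEL_NAME}.zip"
        mv "$MODEL_NAME" model
    )
}
```

Save it as, say, install-nd.sh, load it with ". ./install-nd.sh" and then run "install_nerd_dictation". Because everything runs in a subshell, the "set -e" and the "cd" do not affect your open terminal. If ~/nerd-dictation already exists, git clone refuses and the script stops at that step.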

Once you have completed the above, you are ready to test it yourself.

So have something to say, or play some music, etc.

Then, in the same terminal window, enter this to begin dictation:

```sh
./nerd-dictation begin --vosk-model-dir=./model &
```

You can open AbiWord or a similar program alongside the terminal to see the results.

To end the test, enter the following in a terminal window:

```sh
./nerd-dictation end
```

Impressed??

If so, additional language models are available here:

https://alphacephei.com/vosk/models
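To try one of those other models without replacing the English one, each model can live in its own folder and be chosen with --vosk-model-dir. A sketch, assuming the other models download from the same directory as the English one used above and unpack into a folder named after the zip file (true for vosk-model-small-en-us-0.15, not checked for the others), and that nerd-dictation was cloned into your home folder. model_url and fetch_model are hypothetical helper names.

```sh
# Sketch: download another Vosk model next to the existing ./model folder.
# Assumption: all models sit in the same directory as the English model above.
MODEL_BASE_URL="https://alphacephei.com/kaldi/models"

model_url() {
    # $1 = model name exactly as listed on the models page (without .zip)
    printf '%s/%s.zip\n' "$MODEL_BASE_URL" "$1"
}

fetch_model() {
    (
        set -e
        cd "$HOME/nerd-dictation"
        wget "$(model_url "$1")"
        unzip "$1.zip"    # assumed to unpack into a folder called "$1"
    )
}
```

Usage: "fetch_model NAME", then "./nerd-dictation begin --vosk-model-dir=./NAME &", where NAME is copied from the models page.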

Post by **mikewalsh** » Tue Feb 27, 2024 10:33 pm

@Jasper :-

Hm! "Sounds" interesting....

I may very well try this one out myself. Text-to-speech seems quite common - I have at least 3 Windows TTS apps running happily under WINE, including what used at one time to be the 'industry leader' in this field, TextAloud! (before Dragon NaturallySpeaking came along and took over market dominance) - but the reverse is far less so. I can think of all sorts of uses for a genuine, functionally usable 'dictation' app...

(I never use Abiword myself, but I assume this will output to whatever your default word processor happens to be, yes?)

Thanks for the research, BTW. More power to your search engines!

Mike.

Post by **Jasper** » Wed Feb 28, 2024 7:10 am

@mikewalsh

Please do give it a try. The downloaded files are relatively small (e.g. 50 MB), and you need at least 300 MB of RAM to run the application.

One thing I forgot to mention: make sure you have a working microphone, and adjust its level if required.
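One way to check the microphone before involving nerd-dictation is with the standard ALSA tools: list the capture devices, record a few seconds and play them back. This is a sketch; it tests the microphone at the ALSA level, whereas nerd-dictation records through PulseAudio by default, and the file path and function name are arbitrary choices. The capture level itself can be adjusted in alsamixer (F4 switches to the capture controls).

```sh
# Sketch: confirm that a microphone is detected and actually picks up sound.
# Uses arecord/aplay from alsa-utils; MIC_TEST_FILE is an arbitrary temp path.
MIC_TEST_FILE="${MIC_TEST_FILE:-/tmp/mic-test.wav}"

mic_test() {
    local seconds="${1:-5}"
    # List capture hardware; if no card is listed, ALSA sees no microphone.
    arecord -l || return 1
    echo "Recording for ${seconds} seconds, say something..."
    # -d = duration in seconds, -f cd = 16-bit stereo at 44.1 kHz
    arecord -d "$seconds" -f cd "$MIC_TEST_FILE" || return 1
    # Play it back to judge whether the level needs adjusting.
    aplay "$MIC_TEST_FILE"
}
```

Run "mic_test", or "mic_test 10" for a ten-second recording.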

Yes, any word/text processing application should work.

I remember TextAloud; I used it many years ago to convert PDFs to audiobooks. I had to use copy and paste a lot to create chapters. It worked well and I thought it was a great program. I never really used Dragon, as it required you to read prescribed text to ensure that it understood you.

I think a dictation application would be useful; they are common on mobile devices today and ideal for students, or just for recording voice notes.

Let me know your results; I would be interested.

Post by **mikewalsh** » Wed Feb 28, 2024 1:30 pm

@Jasper :-

O-kay. Had an issue with this last night before turning in, so I've returned to it today. I'm getting this Python error:-

```
root# cd ~./nerd-dictation
bash: cd: ~./nerd-dictation: No such file or directory
root# cd ~/nerd-dictation
root# ./nerd-dictation begin --vosk-model-dir=./model &
[1] 24943
root# Connection failure: Connection refused
pa_context_connect() failed: Connection refused
Traceback (most recent call last):
  File "./nerd-dictation", line 1974, in <module>
    main()
  File "./nerd-dictation", line 1970, in main
    args.func(args)
  File "./nerd-dictation", line 1835, in <lambda>
    func=lambda args: main_begin(
  File "./nerd-dictation", line 1437, in main_begin
    found_any = text_from_vosk_pipe(
  File "./nerd-dictation", line 957, in text_from_vosk_pipe
    import vosk  # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/vosk/__init__.py", line 12, in <module>
    from .vosk_cffi import ffi as _ffi
  File "/usr/local/lib/python3.8/dist-packages/vosk/vosk_cffi.py", line 2, in <module>
    import _cffi_backend
ModuleNotFoundError: No module named '_cffi_backend'
```

No module named "_cffi_backend"..? Where should this be? I can sorta find my way around Python directories, but is this an extra module that needs adding from the repos? Python is an absolute bastard to trouble-shoot, as you're probably aware..!

EDIT:- 'Kay. Installed the _cffi_backend module from the repos. All I'm getting now is this:-

```
root# cd ~/nerd-dictation
root# ./nerd-dictation begin --vosk-model-dir=./model &
[1] 24943
root# Connection failure: Connection refused
pa_context_connect() failed: Connection refused
```

Not even any traceback, so.....I'm stumped. Any ideas? I'm aware that you've updated a ton of stuff in your FP64 9.5.....ANY of which could be affecting this.

Mike.

Post by **Jasper** » Wed Feb 28, 2024 2:23 pm

@mikewalsh

Try this first:

```sh
pip install cffi
```

then move on to:

```sh
pip3 install vosk
```

...just running through it again!

Post by **Jasper** » Wed Feb 28, 2024 2:51 pm

@mikewalsh

Thanks for the feedback. I am embarrassed to say that I was working on a laptop at the time, which was running UpupJammy64, not FP95.

I have corrected my initial post and updated the details.

Post by **mikewalsh** » Wed Feb 28, 2024 3:15 pm

Jasper wrote: Wed Feb 28, 2024 2:51 pm
@mikewalsh

Thanks for the feedback ........ I am embarrassed to say I was working on a laptop at the time which was running UpupJammy64 not FP95

I have corrected my initial post and updated the details.

Aaahh......

Actually, I doubt this would have worked anyway. We desktop guys have one major issue here that you laptop guys don't have.

Laptops all come with a built-in microphone, which is seen by any OS as a global default for the entire system. With a desktop, there IS no built-in microphone. You have to plug in your own microphone, or use webcam microphone(s), or employ a headset, and you have to find a way to specify which one you want to use. And in Puppy, AFAIK (I'm willing to be corrected here!) there is no method for setting a microphone for global use across the system. Everything has to be specified on a per-app basis. Especially when you're an ALSA guy like me. You can keep PulseAudio/PipeWire as far as I'm concerned, because they're just adding further unnecessary complexity.
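When PulseAudio is in use (nerd-dictation's default input needs it), a desktop user can pick a particular microphone rather than relying on a default. A sketch, assuming pactl is installed and that your copy of nerd-dictation has the --pulse-device-name option (run "./nerd-dictation begin --help" to confirm); list_mics and begin_with_mic are made-up names. PulseAudio can also set a system-wide default with "pactl set-default-source NAME". For ALSA-only setups, the repository ships a readme-sox.rst that appears to describe a SoX-based input; read it before relying on it.

```sh
# Sketch: choose a specific PulseAudio source (microphone) for dictation.
# Assumes pactl and nerd-dictation's --pulse-device-name option are available.

list_mics() {
    # One line per source; the second column is the name to pass on.
    # Names ending in ".monitor" capture playback, not a microphone.
    pactl list short sources
}

begin_with_mic() {
    # $1 = source name copied from list_mics
    (
        cd "$HOME/nerd-dictation" || exit 1
        ./nerd-dictation begin --vosk-model-dir=./model --pulse-device-name="$1"
    ) &
}
```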

Never mind..!

Mike.

Post by **Jasper** » Wed Feb 28, 2024 4:09 pm

@mikewalsh

I did not realise that the microphone might not work, as I only tested it on a laptop.

Also, PulseAudio is needed.
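The "pa_context_connect() failed: Connection refused" lines in mikewalsh's output are what PulseAudio client programs typically print when no PulseAudio server is running, so a quick check before "begin" can save some head-scratching. A sketch using the pulseaudio command's --check and --start options; ensure_pulseaudio is a made-up name, and it does not help on systems without PulseAudio installed.

```sh
# Sketch: start a PulseAudio server if one is not already running.
ensure_pulseaudio() {
    if ! command -v pulseaudio >/dev/null 2>&1; then
        echo "pulseaudio not found; nerd-dictation's default input needs it" >&2
        return 1
    fi
    # --check only reports (via exit status) whether a daemon is running;
    # --start launches one in the background if it is not.
    pulseaudio --check || pulseaudio --start
}
```

Usage: "ensure_pulseaudio && ./nerd-dictation begin --vosk-model-dir=./model &".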

Sorry about that!!

Post by **cobaka** » Mon Mar 04, 2024 11:47 am

Hello all

I'm at the beginning of the process to get/run Jasper's speech-to-text application.

Jasper wrote:

You will need to load up the DevX.SFS first (tested only in FP95) as you will need to use Python and Git.

I'm using FossaPup 96. I don't know a lot about DevX (or DevX.sfs). At first Jasper said he did this in Fossa95 (above), but later wrote that he worked in JammyPup.
Well, I'm in Fossa96. I assumed I could get *.sfs files from the menu (Menu -> Setup -> SFS-Load), but clicking it didn't work.

Soooo .... I think I must get DevX.sfs from another place, but where? (see below)
After that: place DevX file in my 'home' directory. For me, I think this should be: /mnt/home/SYSTEM - ie the folder where puppy_fossapup64_9.6.sfs is found. Yes/No? <--<<
Question: Where do I find/download DevX.sfs for Fossa96?

Maybe here?
https://www.mediafire.com/file/j0v9gye5 ... 5.sfs/file <--<< This is "95", not "96". Is that important?

Thanks everyone!

Post by **mikeslr** » Mon Mar 04, 2024 3:11 pm

devx.sfs for F96, from the OP of the F96 thread, https://www.forum.puppylinux.com/viewto ... 882#p85882 > https://rockedge.org/kernels/data/ISO/F ... 64_9.6.sfs

The OP of Puppy threads often provides such links.

Further info and limitations: https://github.com/ideasman42/nerd-dictation. It requires Python 3.6 or newer.
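Both requirements can be checked before starting: Python 3.6 or newer, and the tools used in the instructions. A sketch; python_ok and missing_tools are hypothetical helper names.

```sh
# Sketch: check the prerequisites before installing nerd-dictation.

python_ok() {
    # Exit status 0 if python3 is 3.6 or newer (nerd-dictation's stated minimum).
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

missing_tools() {
    # Print each named command that is not on the PATH; print nothing if all exist.
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Usage: "python_ok && echo 'Python is new enough'", then "missing_tools python3 pip3 git wget unzip" lists anything that still has to be installed or loaded.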

Post by **cobaka** » Mon Mar 04, 2024 9:58 pm

Hi @mikeslr & @Jasper

I found the DevX file (and, embarrassment, embarrassment, when I went to save it I discovered it was already on my HDD).
Yes, in a folder /mnt/sda2/software/Puppy_Linux_masters/Fossa64/
What an unusual place to keep the DevX file. Strange but true!

Loaded DevX. Confirmed by message from Python: ->
python <-- me
Python 3.8.10 (default, Jun 22 2022, 20:18:18) <--<< Python
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

After running "pip install cffi" a few times (and getting a syntax error message each time), I read that pip does not work in the Python shell; it has to be run from the bash shell.
(I'm a novice here). Still trying to get "pip install cffi" to run w/out error. <-- present stage of progress.
Help very welcome while I read everything I can find using Duck-Duck search engine .....

I hope I can get/run speech to text.
cobaka

Post by **cobaka** » Tue Mar 05, 2024 1:46 am

Continuing from the previous post (Tue, Mar 5th at 8:58)
I'm trying to install/run speech to text, following Jasper's success.
I'm still at the beginning of the process - but I'm confident I will get "there".

I have discovered I don't have "pip" or "pip3" on my PC.

I read the following command-line should install pip: sudo apt install python3-pip
The result: sudo: apt: command not found
I need another method to install "pip" or "pip3".
Help appreciated.

Cobaka.

My rig is: Fossa96CE running on a 2012/13 Giga-something 64-bit CPU.
I have DevX loaded from an "official" website.
Python 3.8.10 runs when I type "python" in bash/terminal/CLI.
I'm pretty much a novice at this game.

Post by **Geek3579** » Tue Mar 05, 2024 5:34 am

I attempted to run nerd-dictation in BW64 and got the same error as mentioned earlier. No success with FP95 either, even though it had PulseAudio. The same happened in a DebianDog virtual machine under QEMU.

However, I set up a JammyPup64 virtual machine using QEMU, and it worked first time. The audio passthrough (using the switch -device AC97) seemed to be effective. Now I need to optimise the audio setup, as my first try was littered with output errors. I have since made a frugal install of JP64 and will try it on that setup, where it will have better speed and more memory available.

The instructions do not say that, once nerd-dictation is started, the output goes wherever the cursor is. So if you start the program from the terminal, that is where the text will go, unless you quickly move the cursor to a blank text document before talking. Stopping the application is also an issue, so I wrote a bash script to open another terminal and end the process.
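The script itself isn't shown, but the two problems described (text going wherever the cursor is, and stopping from the same terminal) can be handled with a small start/stop toggle bound to a desktop shortcut or launcher. A sketch, assuming nerd-dictation is in ~/nerd-dictation and pgrep is available; ND_DIR, START_DELAY and the function names are made up for this example.

```sh
# Sketch: one command that starts dictation if idle and ends it if running.
ND_DIR="${ND_DIR:-$HOME/nerd-dictation}"
START_DELAY="${START_DELAY:-5}"    # seconds to click into the target document

dictation_running() {
    # pgrep -f matches against the full command line of running processes.
    pgrep -f "nerd-dictation begin" >/dev/null
}

toggle_dictation() {
    if dictation_running; then
        "$ND_DIR/nerd-dictation" end
    else
        echo "Starting in ${START_DELAY}s: click where the text should appear."
        sleep "$START_DELAY"
        "$ND_DIR/nerd-dictation" begin --vosk-model-dir="$ND_DIR/model" &
    fi
}
```

The running check is deliberately simple: any other process whose command line happens to contain "nerd-dictation begin" would also count as running.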

Post by **Jasper** » Tue Mar 05, 2024 6:50 am

@cobaka

Do this first to update your 'pip'

then in the same terminal window:

Post by **cobaka** » Tue Mar 05, 2024 7:51 am

Hello @Jasper

My puppy seems to lack "pip"

This is what 'bash/terminal' told me:

```
pip install cffi
bash: pip: command not found

python3 -m pip install --upgrade pip
/usr/bin/python3: No module named pip
python3 -m pip install
/usr/bin/python3: No module named pip
```

find / -iname "pip" <== found nothing. Well, maybe 'pip' has some trailing characters beginning with dash "-"
find / -iname "pip-" <== Anything? No ...
find / -iname "pip" | more <=== I won't print the result of this search. I found pages of "pipes" and a few other files, but not one "pip*"

Did 'find' search EVERY storage device? I'm confident it searched /mnt/sda2.

I'm currently searching the web for a way to get/install pip3

o-o-o time passes - then:
I found this command called 'curl' and tried it as you see below.

```
# curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1863k  100 1863k    0     0  2317k      0 --:--:-- --:--:-- --:--:-- 2314k
```

o-o-o
a file called "get-pip.py" is in the "real" root directory. The directory called "/" not "root".

o-o-o to be continued .... o-o-o

cobaka.
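The URL above points at the Python 2.7 copy of get-pip.py, which is consistent with the pip 20.3.4 reported in the next post: 20.3.4 was the last pip release to support Python 2.7. A sketch that picks the copy matching the python3 that will run it; get_pip_url and install_pip are made-up names, and which Python versions have their own copy under https://bootstrap.pypa.io/pip/ is an assumption to check against that page.

```sh
# Sketch: download the get-pip.py that matches the installed python3.
# Assumption: Python 2.7 and 3.5-3.8 have version-specific copies under /pip/X.Y/;
# newer versions use the main get-pip.py.

get_pip_url() {
    case "$1" in
        2.7|3.[5-8]) printf 'https://bootstrap.pypa.io/pip/%s/get-pip.py\n' "$1" ;;
        *)           printf 'https://bootstrap.pypa.io/get-pip.py\n' ;;
    esac
}

install_pip() {
    local ver
    # "major.minor" of the interpreter that pip will be installed for, e.g. 3.8
    ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
    curl -fsSL "$(get_pip_url "$ver")" -o /tmp/get-pip.py || return 1
    python3 /tmp/get-pip.py
}
```

After install_pip, running "python3 -m pip install cffi" instead of plain "pip" makes sure the packages go to that same interpreter.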

Post by **cobaka** » Tue Mar 05, 2024 8:13 am

@Jasper

A summary of this posting: The software is in place, but I haven't plugged in/used a microphone.
The rest of this post describes getting/installing the software. Very routine.
I'll write about using the microphone in a new message.

o-o-o

Following on from previous posting.
I needed to get pip - and I believe I have it. Looky!

```
$ which pip
/usr/local/bin/pip
$ pip --version
pip 20.3.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
```

o-o-o OK. That suggests I got and installed pip.
Now I continue using your (Jasper's) dialog from the original posting.
Thank you for your patience. I'm a novice in this part of the woods ....

o-o-o however - I am now following every step given in your original posting and everything (until now) looks good.
I am in the directory 'nerd-dictation' and ls reveals this:
```
$ ls
changelog.rst hacking.rst _misc package readme.rst readme-ydotool.rst
examples LICENSE nerd-dictation pyproject.toml readme-sox.rst tests
$
```

o-o-o-K, sorry to be verbose. It looks like everything worked (once I got pip working, thanks to you).
I am now at the point (in your instructions) where you wrote:

Once you have completed the above, you are ready to test it yourself.

My next task is to plug a microphone into the desktop and see what happens.
Wish me luck.

cobaka

PS I ran into a minor problem with the wget command: copying and pasting it gave an error message.
I gave up pasting and retyped the line, in the terminal and bingo - it worked.

Post by **cobaka** » Sun Mar 10, 2024 9:53 am

Hello @Jasper and @mikewalsh

The state of the game at the moment.
I thought I had the software installed correctly.
I found a microphone and using gWaveEdit (from the menu) saw the VU meter move to center scale when I spoke etc.
The mic is connecting to Puppy in some fundamental way.

At this point I found that the directory 'nerd-dictation' is in the directory "/" - i.e. the REAL root of the directory tree.
In your setup the directory 'nerd-dictation' is in the directory /root. This is the directory for a USER called 'root'
Looky here:

```
# echo $USER
root
```
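Because the start command uses ./nerd-dictation and ./model, it only works when the terminal is already inside the nerd-dictation folder, wherever that folder happens to be. A sketch that uses absolute paths instead, so it can be run from any directory; ND_DIR and begin_dictation are hypothetical names, and the default of /nerd-dictation matches where the folder ended up here rather than /root.

```sh
# Sketch: start dictation using absolute paths, independent of the current directory.
# Point ND_DIR at the folder created by "git clone".
ND_DIR="${ND_DIR:-/nerd-dictation}"

begin_dictation() {
    if [ ! -d "$ND_DIR/model" ]; then
        echo "No model folder found in $ND_DIR" >&2
        return 1
    fi
    "$ND_DIR/nerd-dictation" begin --vosk-model-dir="$ND_DIR/model" &
}
```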

Well - when I am running from the directory 'nerd-dictation' and run the program I get an error. You can see this under my signature.
Help/observations welcome!

cobaka

The error:

```
# pwd
/
# cd /nerd-dictation/
# ./nerd-dictation begin --vosk-model-dir=./model &
[1] 11879
# Traceback (most recent call last):
  File "./nerd-dictation", line 1974, in <module>
    main()
  File "./nerd-dictation", line 1970, in main
    args.func(args)
  File "./nerd-dictation", line 1835, in <lambda>
    func=lambda args: main_begin(
  File "./nerd-dictation", line 1437, in main_begin
    found_any = text_from_vosk_pipe(
  File "./nerd-dictation", line 957, in text_from_vosk_pipe
    import vosk  # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/vosk/__init__.py", line 12, in <module>
    from .vosk_cffi import ffi as _ffi
  File "/usr/local/lib/python3.8/dist-packages/vosk/vosk_cffi.py", line 2, in <module>
    import _cffi_backend
ModuleNotFoundError: No module named '_cffi_backend'
write() failed: Broken pipe
^C
[1]+  Exit 1                  ./nerd-dictation begin --vosk-model-dir=./model
#
```

What about cffi?
Well - cffi exists. I know only that 'cffi' is the "C" function interface. Nothing more.

```
$ find / -iname "cffi"
/usr/lib/python3/dist-packages/cffi
```

(the end)
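'find / -iname "cffi"' only matches files named exactly cffi, so it cannot show the compiled _cffi_backend module, which is usually a file whose name starts with _cffi_backend and ends in .so. A more direct check is to ask the same python3 that runs nerd-dictation where it finds each module. A sketch; check_cffi_backend is a made-up name.

```sh
# Sketch: report where python3 finds cffi, _cffi_backend and vosk (if at all).
# find_spec locates a top-level module without importing it, so a broken vosk
# install does not stop the report.
check_cffi_backend() {
    python3 - <<'EOF'
import importlib.util
import sys

print("python:", sys.executable, sys.version.split()[0])
for name in ("cffi", "_cffi_backend", "vosk"):
    spec = importlib.util.find_spec(name)
    print(f"{name:15} {spec.origin if spec else 'NOT FOUND'}")
EOF
}
```

If _cffi_backend shows NOT FOUND while cffi points into /usr/lib/python3/dist-packages, installing the repository package (as mikewalsh did earlier) or running "python3 -m pip install --force-reinstall cffi" are reasonable things to try; neither is a confirmed fix for this setup.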

Post by **cobaka** » Mon Mar 18, 2024 7:48 am

Hello @Jasper, @mikewalsh & @mikeslr

Mike - you had a 'go' at running this - but seem to have fallen away in the last week or so.
I notice you re-located discussion to other topics too.
I'm keen to re-activate this topic; I would like (very much) to introduce speech-to-text on Fossa 96CE (and other pups too).

I believe I'm only a few steps away from running speech to text on my desktop.
(See below for detail).

cobaka

Puppy Linux Discussion Forum
