Simple, hackable offline speech to text

Post by **Jasper** » Tue Feb 27, 2024 5:54 pm

Original post deleted by OP, thread may be of interest.

Post by **mikewalsh** » Tue Feb 27, 2024 10:33 pm

@Jasper :-

Hm! "Sounds" interesting....

I may very well try this one out myself. Text-to-speech seems quite common - I have at least 3 Windows TTS apps running happily under WINE, including what used at one time to be the 'industry leader' in this field, TextAloud! (before Dragon Naturally Speaking came along and took over market dominance) - but the reverse is far less so. I can think of all sorts of uses for a genuine, functionally usable 'dictation' app...

(I never use Abiword myself, but I assume this will output to whatever your default word processor happens to be, yes?)

Thanks for the research, BTW. More power to your search engines!

Mike.

Post by **Jasper** » Wed Feb 28, 2024 7:10 am

@mikewalsh

Please do give it a try. The downloaded files are relatively small (e.g. 50 MB) and you need at least 300 MB of RAM to run the application.

One thing I forgot to mention is to ensure that you have a working microphone and, if required, adjust the level.

Yes, any word/text processing application should work.

I remember TextAloud; I used it many years ago to convert PDFs to audio books. I had to use "copy & paste" a lot to create chapters. It did work well and I thought it was a great program. I never really used Dragon Speaking, as it required you to speak prescribed text to ensure that it understood you.

I think a dictation application would be useful, as they are common on mobile devices today and ideal for students or just for recording voice notes.

Let me know your results, I would be interested.

Post by **mikewalsh** » Wed Feb 28, 2024 1:30 pm

@Jasper :-

O-kay. Had an issue with this last night before turning in, so I've returned to it today. I'm getting this Python error:-

Code:

root# cd ~./nerd-dictation
bash: cd: ~./nerd-dictation: No such file or directory
root# cd ~/nerd-dictation
root# ./nerd-dictation begin --vosk-model-dir=./model &
[1] 24943
root# Connection failure: Connection refused
pa_context_connect() failed: Connection refused
Traceback (most recent call last):
  File "./nerd-dictation", line 1974, in <module>
    main()
  File "./nerd-dictation", line 1970, in main
    args.func(args)
  File "./nerd-dictation", line 1835, in <lambda>
    func=lambda args: main_begin(
  File "./nerd-dictation", line 1437, in main_begin
    found_any = text_from_vosk_pipe(
  File "./nerd-dictation", line 957, in text_from_vosk_pipe
    import vosk  # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/vosk/__init__.py", line 12, in <module>
    from .vosk_cffi import ffi as _ffi
  File "/usr/local/lib/python3.8/dist-packages/vosk/vosk_cffi.py", line 2, in <module>
    import _cffi_backend
ModuleNotFoundError: No module named '_cffi_backend'

No module named "_cffi_backend"..? Where should this be? I can sorta find my way around Python directories, but is this an extra module that needs adding from the repos? Python is an absolute bastard to trouble-shoot, as you're probably aware..!

EDIT:- 'Kay. Installed the _cffi_backend module from the repos. All I'm getting now is this:-

Code:

root# cd ~/nerd-dictation
root# ./nerd-dictation begin --vosk-model-dir=./model &
[1] 24943
root# Connection failure: Connection refused
pa_context_connect() failed: Connection refused

Not even any traceback, so.....I'm stumped. Any ideas? I'm aware that you've updated a ton of stuff in your FP64 9.5.....ANY of which could be affecting this.

Mike.
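A note on the two "Connection refused" lines: that is what PulseAudio's client tools print when they cannot reach a running PulseAudio server, so the failure is on the audio side rather than in Python. Here is a minimal sketch for checking this, assuming the `pactl` client is installed. The function name `pulseaudio_status` is made up for illustration and is not part of nerd-dictation.

```python
import shutil
import subprocess


def pulseaudio_status(which=shutil.which, run=subprocess.run):
    """Return a short description of whether a PulseAudio server is reachable.

    'which' and 'run' are parameters only so the check can be tried with
    stand-ins; by default the real system is queried.
    """
    # pactl is PulseAudio's own command-line client.
    if which("pactl") is None:
        return "pactl not found: PulseAudio client tools are not installed"
    # 'pactl info' exits non-zero when it cannot connect to a server.
    result = run(["pactl", "info"], capture_output=True, text=True)
    if result.returncode != 0:
        return "no PulseAudio server reachable: " + result.stderr.strip()
    return "PulseAudio server is running"
```

Calling `print(pulseaudio_status())` on an ALSA-only Puppy would be expected to report one of the first two results. The repository listing also includes a `readme-sox.rst`, which suggests there may be a recording route that does not go through PulseAudio, but that is untested here.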

Post by **Jasper** » Wed Feb 28, 2024 2:23 pm

@mikewalsh

Try this at the beginning

Code:

pip install cffi

then move onto

Code:

pip3 install vosk

................ just running through it again!!

Post by **Jasper** » Wed Feb 28, 2024 2:51 pm

@mikewalsh

Thanks for the feedback ........ I am embarrassed to say I was working on a laptop at the time which was running UpupJammy64 not FP95

I have corrected my initial post and updated the details.

Post by **mikewalsh** » Wed Feb 28, 2024 3:15 pm

Jasper wrote: Wed Feb 28, 2024 2:51 pm
@mikewalsh

Thanks for the feedback ........ I am embarrassed to say I was working on a laptop at the time which was running UpupJammy64 not FP95

I have corrected my initial post and updated the details.

Aaahh......

Actually, I doubt this would have worked anyway. We desktop guys have one major issue here that you laptop guys don't have.

Laptops all come with a built-in microphone, which is seen by any OS as a global default for the entire system. With a desktop, there IS no built-in microphone. You have to plug in your own microphone, or use webcam microphone(s), or employ a headset .....and you have to find a way to specify which one you want to use. And in Puppy, AFAIK (I'm willing to be corrected here!) there is no method for setting a microphone for global use across the system. Everything has to be specified on a per-app basis. Especially when you're an ALSA guy like me.....you can keep PulseAudio/Pipewire as far as I'm concerned, because they're just adding further unnecessary complexity.

Never mind..!

Mike.

Post by **Jasper** » Wed Feb 28, 2024 4:09 pm

@mikewalsh

I did not realise that the microphone might not work, as I only tested it on a laptop.

Also, PulseAudio is needed.

Sorry about that!!

Post by **cobaka** » Mon Mar 04, 2024 11:47 am

Hello all

I'm at the beginning of the process to get/run Jasper's speech-to-text application.

Jasper wrote:

You will need to load up the DevX.SFS first (tested only in FP95) as you will need to use Python and Git.

I'm using FossaPup 96. I don't know a lot about DevX (or DevX.sfs). At first Jasper said he did this in Fossa95 (above), but later wrote that he worked in Jammy-Pup.
Well, I'm in Fossa96. I assumed I could get *.sfs files from the menu: Menu -> Setup -> SFS-Load -> <click> didn't work

Soooo .... I think I must get DevX.sfs from another place, but where? (see below)
After that: place DevX file in my 'home' directory. For me, I think this should be: /mnt/home/SYSTEM - ie the folder where puppy_fossapup64_9.6.sfs is found. Yes/No? <--<<
Question: Where do I find/download DevX.sfs for Fossa96?

Maybe here?
https://www.mediafire.com/file/j0v9gye5 ... 5.sfs/file <--<< This is '95" not "96". Is that important?

Thanks everyone!

Post by **mikeslr** » Mon Mar 04, 2024 3:11 pm

devx.sfs for F96, from the OP of the F96 thread, https://www.forum.puppylinux.com/viewto ... 882#p85882 > https://rockedge.org/kernels/data/ISO/F ... 64_9.6.sfs

The OP of Puppy threads often provides such links.

Further info, limitations: https://github.com/ideasman42/nerd-dictation, requires Python 3.6 (or newer).

Post by **cobaka** » Mon Mar 04, 2024 9:58 pm

Hi @mikeslr & @Jasper

I found the DevX file (and - embarrassment, embarrassment - when I went to save it, I discovered it was already on my HDD).
Yes, in a folder /mnt/sda2/software/Puppy_Linux_masters/Fossa64/
What an unusual place to keep the DevX file. Strange but true!

Loaded DevX. Confirmed by message from Python: ->
python <-- me
Python 3.8.10 (default, Jun 22 2022, 20:18:18) <--<< Python
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

After running pip install cffi a few times (got a syntax error message each time) I read that "pip" does not work in the Python shell, but I should be in the bash shell.
(I'm a novice here). Still trying to get "pip install cffi" to run w/out error. <-- present stage of progress.
Help very welcome while I read everything I can find using Duck-Duck search engine .....

I hope I can get/run speech to text.
cobaka
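On the syntax error: `pip install cffi` is a shell command, so the Python prompt (`>>>`) rejects it as invalid Python. Here is a minimal sketch of how the same thing can be driven from inside Python, by running pip as a module with the interpreter that is already running. The helper names `pip_command` and `run_pip` are made up for illustration.

```python
import subprocess
import sys


def pip_command(*args):
    """Build an argv list that runs pip with the same interpreter as this script."""
    # sys.executable is the path of the running Python, e.g. /usr/bin/python3.
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip and return its exit code (non-zero if pip is missing or fails)."""
    return subprocess.run(pip_command(*args)).returncode
```

For example, `run_pip("--version")` does the same as typing `python3 -m pip --version` at the bash prompt. If pip is not installed at all, it will fail with "No module named pip" either way.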

Post by **cobaka** » Tue Mar 05, 2024 1:46 am

Continuing from the previous post (Tue, Mar 5th at 8:58)
I'm trying to install/run speech to text - following Jasper's success.
I'm still at the beginning of the process - but I'm confident I will get "there".

I have discovered I don't have "pip" or "pip3" on my PC.

I read the following command-line should install pip: sudo apt install python3-pip
The result: sudo: apt: command not found
I need another method to install "pip" or "pip3".
Help appreciated.

Cobaka.

My rig is: Fossa96CE running on a 2012/13 Giga-something 64-bit CPU.
I have DevX loaded from an "official" website.
Python 3.8.10 runs when I type "python" in bash/terminal/CLI.
I'm pretty much a novice at this game.

Post by **Geek3579** » Tue Mar 05, 2024 5:34 am

I attempted to run nerd-dictation in BW64 and got the same error as mentioned earlier. No success with FP95 which had pulseaudio. Similar in a Debian Dog Virtual machine using QEMU.

However, I setup a Jammypup64 virtual machine using QEMU, and it worked first time. The audio passthrough (using switch: -device AC97) seemed to be effective. Now I need to get the audio setup optimised, as my first try was littered with output errors. I have since installed a frugal version of JP64 and will try it on that setup, where it will have better speed and memory availability.

The instructions do not say that the output will be wherever the cursor is, once nerd-dictation is started. So if you start the program from the terminal, that is where the text will go unless you quickly move the cursor to a blank text document before talking. Stopping the application is also an issue, so I wrote a bash script to open up another terminal and end the process.
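The start/stop handling described above could be sketched in Python roughly as follows. It is only a sketch: it assumes it is run from inside the cloned `nerd-dictation` directory with the model in `./model`, and it relies on the `begin` and `end` subcommands of nerd-dictation. The function names and the `delay` parameter are made up for illustration.

```python
import subprocess
import time

NERD_DICTATION = "./nerd-dictation"  # assumption: run from the cloned repository


def begin_command(model_dir="./model"):
    """Command line that starts dictation (same as the one used in this thread)."""
    return [NERD_DICTATION, "begin", f"--vosk-model-dir={model_dir}"]


def end_command():
    """Command line that asks a running 'begin' process to finish."""
    return [NERD_DICTATION, "end"]


def dictate(seconds, delay=5.0, model_dir="./model",
            popen=subprocess.Popen, run=subprocess.run, sleep=time.sleep):
    """Wait 'delay' seconds, dictate for 'seconds' seconds, then stop.

    The delay gives time to click into the target text document, because
    the recognised text is typed wherever the keyboard focus is.
    """
    sleep(delay)
    proc = popen(begin_command(model_dir))
    sleep(seconds)
    run(end_command())
    # Return the exit code of the 'begin' process once it has stopped.
    return proc.wait()
```

`dictate(60)` would then leave five seconds to switch windows, listen for a minute, and stop without needing a second terminal.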

Post by **Jasper** » Tue Mar 05, 2024 6:50 am

@cobaka

Do this first to update your 'pip'

then in the same terminal window:

Post by **cobaka** » Tue Mar 05, 2024 7:51 am

Hello @Jasper

My puppy seems to lack "pip"

This is what 'bash/terminal' told me:

pip install cffi
bash: pip: command not found

python3 -m pip install --upgrade pip
/usr/bin/python3: No module named pip
python3 -m pip install
/usr/bin/python3: No module named pip

find / -iname "pip" <== found nothing. Well, maybe 'pip' has some trailing characters beginning with dash "-"
find / -iname "pip-*" <== Anything? No ...
find / -iname "pip*" | more <=== I won't print the result of this search. I found pages of "pipes" and a few other files, but not one "pip*"

Did 'find' search EVERY storage device? I'm confident it searched /mnt/sda2.

I'm currently searching the web for a way to get/install pip3

o-o-o time passes - then:
I found this command called 'curl' and tried it as you see below.

# curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1863k 100 1863k 0 0 2317k 0 --:--:-- --:--:-- --:--:-- 2314k

o-o-o
a file called "get-pip.py" is in the "real" root directory. The directory called "/" not "root".

o-o-o to be continued .... o-o-o

cobaka.
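A note on the searches: `find -iname` compares the whole file name against a shell-style pattern, ignoring case. A bare "pip" therefore only matches files called exactly pip, while "pip*" also drags in everything that merely starts with those letters. Here is a small sketch of that matching rule using Python's `fnmatch`. The file names are invented examples and `iname_match` is a made-up helper, not a real `find` API.

```python
import fnmatch


def iname_match(filename, pattern):
    """Mimic find's -iname: case-insensitive glob match on the file name."""
    return fnmatch.fnmatchcase(filename.lower(), pattern.lower())


# Invented example names, only to show which pattern catches what.
names = ["pip", "PIP", "pip3", "pip-24.2.dist-info", "pipes.py"]

for pattern in ("pip", "pip-*", "pip*"):
    hits = [name for name in names if iname_match(name, pattern)]
    print(f"{pattern!r:8} -> {hits}")
```

In this sketch "pip" catches only `pip` and `PIP`, "pip-*" catches only the `pip-24.2.dist-info` entry, and "pip*" catches all five, including the unrelated `pipes.py`.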

Post by **cobaka** » Tue Mar 05, 2024 8:13 am

@Jasper

A summary of this posting: The software is in place, but I haven't plugged in/used a microphone.
The rest of this thread describes getting/installing the software. Very routine.
I'll write about using the microphone in a new message.

o-o-o

Following on from previous posting.
I needed to get pip - and I believe I have it. Looky!

$ which pip
/usr/local/bin/pip
$ pip --version
pip 20.3.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)

o-o-o OK. That suggests I got and installed pip.
Now I continue using your (Jasper's) dialog from the original posting.
Thank you for your patience. I'm a novice in this part of the woods ....

o-o-o however - I am now following every step given in your original posting and everything (until now) looks good.
I am in the directory 'nerd-dictation' and ls reveals this:
$ ls
changelog.rst hacking.rst _misc package readme.rst readme-ydotool.rst
examples LICENSE nerd-dictation pyproject.toml readme-sox.rst tests
$

o-o-o-K sorry to be verbose. It looks like everything worked (once I got pip working - thanks to you)
I am now at the point (in your instructions) where you wrote:

Once you have completed the above, you are ready to test it yourself.

My next task is to plug a microphone into the desktop and see what happens.
Wish me luck.

cobaka

PS I ran into a minor problem in the command wget. Copying and pasting gave an error message.
I gave up pasting and retyped the line, in the terminal and bingo - it worked.

Post by **cobaka** » Sun Mar 10, 2024 9:53 am

Hello @Jasper and @mikewalsh

The state of the game at the moment.
I thought I had the software installed correctly.
I found a microphone and using gWaveEdit (from the menu) saw the VU meter move to center scale when I spoke etc.
The mic is connecting to Puppy in some fundamental way.

At this point I found that the directory 'nerd-dictation' is in the directory "/" - i.e. the REAL root of the directory tree.
In your setup the directory 'nerd-dictation' is in the directory /root. This is the directory for a USER called 'root'
Looky here:

Code:

# echo $USER
root

Well - when I am running from the directory 'nerd-dictation' and run the program I get an error. You can see this under my signature.
Help/observations welcome!

cobaka

The error:

Code:

# pwd
/
# cd /nerd-dictation/
# ./nerd-dictation begin --vosk-model-dir=./model &
[1] 11879
# Traceback (most recent call last):
  File "./nerd-dictation", line 1974, in <module>
    main()
  File "./nerd-dictation", line 1970, in main
    args.func(args)
  File "./nerd-dictation", line 1835, in <lambda>
    func=lambda args: main_begin(
  File "./nerd-dictation", line 1437, in main_begin
    found_any = text_from_vosk_pipe(
  File "./nerd-dictation", line 957, in text_from_vosk_pipe
    import vosk  # type: ignore
  File "/usr/local/lib/python3.8/dist-packages/vosk/__init__.py", line 12, in <module>
    from .vosk_cffi import ffi as _ffi
  File "/usr/local/lib/python3.8/dist-packages/vosk/vosk_cffi.py", line 2, in <module>
    import _cffi_backend
ModuleNotFoundError: No module named '_cffi_backend'
write() failed: Broken pipe
^C
[1]+  Exit 1                  ./nerd-dictation begin --vosk-model-dir=./model
#

What about cffi?
Well - cffi exists. I know only that 'cffi' is the "C" function interface. Nothing more.

Code:

$find /  -iname "cffi"
/usr/lib/python3/dist-packages/cffi

(the end)

Post by **cobaka** » Mon Mar 18, 2024 7:48 am

Hello @Jasper, @mikewalsh & @mikeslr

Mike - you had a 'go' at running this - but seem to have fallen away in the last week or so.
I notice you re-located discussion to other topics too.
I'm keen to re-activate this topic; I would like (very much) to introduce speech-to-text on Fossa 96CE (and other pups too).

I believe I'm only a few steps away from running speech to text on my desktop.
(See below for detail).

cobaka

Post by **cobaka** » Mon Aug 19, 2024 11:18 am

Hello all.

I want to get nerd dictation running (after an absence of some months).

I am running FossaPup 64 9.6 CE, distro date -> March 2023 (if that helps).

Further down this thread I see advice about getting pip and cffi.
On my system I have pip-24.2.

Here it is: # find / -iname 'pip'
/usr/local/lib/python3.8/dist-packages/pip

I have cffi
# pip install cffi
Requirement already satisfied: cffi in /usr/lib/python3/dist-packages (1.14.0)
# find / -iname 'cffi_*'
/usr/lib/python3/dist-packages/cffi/cffi_opcode.py

When I run the command: ./nerd-dictation begin --vosk-model-dir=./model &
I get a series of error messages. Tracing these through the last error message is:

File "/usr/local/lib/python3.8/dist-packages/vosk/vosk_cffi.py", line 2, in <module>
import _cffi_backend
ModuleNotFoundError: No module named '_cffi_backend'

In the file vosk_cffi.py the first six lines are:

Code:

# auto-generated file
import _cffi_backend

# Note: the error message reports "ModuleNotFoundError: No module named '_cffi_backend'"

ffi = _cffi_backend.FFI('vosk.vosk_cffi',
    _version = 0x2601,
    _types = b'\x00\x00\x04\x0D\x00\x00\x65\x03\x00\x00\x00\x0F\x00\x00\x1C\x0D\x00\x00\x60\x03\x

Code (above) reads "import _cffi_backend" (This is line 2 mentioned in the error report)

The problem may be the underscore ("_") before "cffi". It appears in line 2.
The file in my file system has no underscore.
Does anyone, especially @Jasper, know what's going on?
I think I'm almost there. Help appreciated.

cobaka
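On the underscore question: the leading underscore is part of the module's real name. `_cffi_backend` is a separate compiled extension module that the cffi package depends on, and it sits next to the `cffi` directory rather than inside it. Finding a `cffi` directory therefore does not prove that the backend is present. On Debian/Ubuntu-based systems the two are packaged separately (`python3-cffi` and `python3-cffi-backend`), which may be what is happening here, though that is a guess about this particular Puppy. Here is a minimal sketch that reports where Python would load each module from. The helper names are made up for illustration.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if absent."""
    # find_spec only locates a top-level module; it does not import it.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def cffi_report():
    """Look up both the pure-Python package and its compiled backend."""
    return {name: module_origin(name) for name in ("cffi", "_cffi_backend")}
```

Running `print(cffi_report())` with the same `python3` that runs nerd-dictation shows whether `_cffi_backend` resolves to a `.so` file or to `None`. If it is `None` while `cffi` is found, the backend is the missing piece.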

Post by **Jasper** » Mon Aug 19, 2024 12:01 pm

@cobaka

Please note, my initial instructions were for Jammypup64.

I tested this on a laptop which has an inbuilt microphone.

Maybe this will help?

Update PIP

Code:

python -m pip install --upgrade pip

then remove the existing package cffi

Code:

pip uninstall cffi

finally, download the cffi package again

Code:

pip install --upgrade --force-reinstall cffi

Post by **cobaka** » Tue Aug 20, 2024 12:07 am

@Jasper

I understand your point - you use JammyPup and I'm trying to install nerd-dictation on FossaPup.

I want to get nerd-dictation working under FossaPup (eventually) because you recommend it but also there is a good review of nerd-dictation here:
https://www.youtube.com/watch?v=Cw1SESc8sdA

As a first step, I will install nerd-dictation with JammyPup on a spare machine I have here.
After I get it running with your configuration I'll return to FossaPup.
This piece of software appears most useful. I suggest every Puppian with a 64-bit machine will find it useful.
Especially this Puppian.
I may install other Puppies too and see if it will run. The review on YouTube suggests it's very useful.

Thanks for your help, Jasper.

Cobaka.

PS: For anyone who can comment about the file-name discrepancy in my preceding posting (leading underscore vs no underscore) - then please speak up. I suspect this is the problem, but I don't see the solution (yet). I think that posting is self-contained - i.e. you will understand the problem after reading just one posting.

Post by **cobaka** » Tue Aug 20, 2024 12:35 am

Here is a review of nerd-dictation on YouTube by arthurPizza

The link:
https://www.youtube.com/watch?v=Cw1SESc8sdA
It's the same link I gave earlier in this thread.
I added it to make this posting self-contained.

Here is a comment by @DigitalMetal about Arthur's review:

<I commented earlier, but I don't see my comment now.
Thanks for this video. I've played with mozilla Deep Speak in the past, but it always seemed kind of bulky and slowly me. although i never did try setting a to use the gpu rather than a cpu. this [nerd-dictation] seems like a simpler solution. in fact i already have it up and running and i'm writing this comment using my voice, which is why nothing seems to be capitalized. [comment: I (cobaka) added the capitals to 'deep speak' to clarify DigitalMetal's meaning somewhat.]

this will definitely be useful for me, and i already have it set up where the keyboard shortcut to turn it on and off. >

and another comment by @arnaudmosse6894:

<This tool is great, but without capitalize it is not usable. Punctiation can be done by hand or in the config file (replace period by .), but capital at the first letter of sentence cannot. >

My observation: Any written communication should be proof-read. The task of capitalisation while proof-reading is a minor inconvenience. I write reports and critiques. When I do this I remember the advice of John Steinbeck, who said something like: Editing becomes difficult after the fourth revision. Steinbeck is right. That's hard advice, but true. I will capitalize at the same time I edit. For me - no problemo.

Here is another observation by @now-you-know-it:

<Yes you are good! The problem here is us "fairly newbies" do not have a clue [about] what you just said and do. Our heads are still a bit to flat. Would it be possible to make a video how to step by step. Show what to type and install the commands etc. Yes a lot of work but for newbies this would be a good learning experience.>

Finally, there are still some bugs. @harshavardhanaradhyahu2870 got the same bug as I got.

<I got an error like this "failed to create a model">

No need to say more.
My challenge is to get this working with FossaPup.

Post by **Grey** » Mon Dec 23, 2024 10:07 am

I'm currently playing with three projects: a voice assistant (Vasisualy; I use the branch that uses vosk instead of Google services), vosk-api, and a speech synthesizer (RHVoice). The assistant is working, and the speech synthesizer is also beyond praise. Specifically, vosk seems to work inside the assistant: it recognizes my commands, but the assistant does not execute them (although the same commands are executed when entered from the keyboard). Maybe I need a good new microphone. The assistant knows how to do interesting things and you can create your own triggers.

@puppy_apprentice Pay attention to this speech synthesizer (RHVoice). The three Polish synthesis modules are quite good.

English and Ukrainian synthesis modules are also good.
I'm currently compiling my own dictionary of Russian words that sound wrong.

Post by **Grey** » Tue Dec 24, 2024 4:37 pm

Grey wrote: Mon Dec 23, 2024 10:07 am
RHVoice

Two or three thoughts. The synthesizer uses Speech Dispatcher for its work. What is better for reading texts from the screen by voice? Foliate and Firefox both work through the dispatcher. The first is a book reader, but by default you cannot select the desired voice through its internal menu. Therefore, oddly enough, Firefox is currently better suited for reading books by voice.

Puppy Linux Discussion Forum
