
Thread: How to make the voice automatically match the sentences in the textbook?

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    How to make the voice automatically match the sentences in the textbook?

    I have some voice files (mp3) that contain the pronunciation of all the text sentences in the English textbooks. Each unit text has one mp3 file, and each text has dozens of sentences. Now I want to achieve the following function:

    While an mp3 file (a unit text voice file) is playing, the software automatically displays (matches) the sentence in the text that corresponds to the voice. In other words, it automatically generates subtitles for the voice from the sentences in the text, just like displaying the lyrics of a song.

    I know it's not an easy task. Any suggestions and hints are welcome. Thanks in advance.
    Last edited by dreammanor; Sep 13th, 2017 at 12:43 PM.

  2. #2
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: How to make the voice automatically match the sentences?

    Last edited by DEXWERX; Sep 13th, 2017 at 10:43 AM.

  3. #3

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences?

    Quote Originally Posted by DEXWERX View Post
    Thank you for the links, DEXWERX. I mean: make the voice and the sentences in the textbooks match automatically.

    The voice files already exist and the sentences in the textbooks already exist. I need the voice and the sentences to be matched automatically while the voice file plays.

    In other words, automatically generate the voice subtitles according to the sentences in the textbooks.

    Note:
    The voice files are the pronunciation of all the text sentences in the English textbooks.
    Last edited by dreammanor; Sep 13th, 2017 at 11:13 AM.

  4. #4
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: How to make the voice automatically match the sentences in the textbook?

    You would need to index them somehow and match the indexes.
    For example, say you have an array of filenames for your mp3s and another array of sentences to display.
    When you play File(x), you display Sentence(x).

  5. #5

  6. #6

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by DataMiser View Post
    You would need to index them somehow and match the indexes.
    For example, say you have an array of filenames for your mp3s and another array of sentences to display.
    When you play File(x), you display Sentence(x).
    Thank you for your reply. The situation is this: each text has one mp3 file, and each text has dozens of sentences. I need to display the sentence that corresponds to what is being read aloud (in the mp3 voice file), just like displaying the lyrics of a song.
    Last edited by dreammanor; Sep 13th, 2017 at 12:44 PM.

  7. #7

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    I'm studying your source code and it looks very interesting. Thank you so much, The trick.

  8. #8
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,205

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by dreammanor View Post
    Thank you for your reply. The situation is this: each text has one mp3 file, and each text has dozens of sentences. I need to display the sentence that corresponds to what is being read aloud (in the mp3 voice file), just like displaying the lyrics of a song.
    In that case you would probably need to match the time code to the sentences

  9. #9
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: How to make the voice automatically match the sentences in the textbook?

    So you have:
    * a bunch of text files
    * each text file has dozens of sentences
    * a bunch of mp3 files
    * each mp3 file corresponds to 1 text file

    Now, given this, you need to display a sentence from the text file at the moment it is spoken in the mp3?

    If that's correct, there is no easy way to do this. You have to do it the way all subtitles work: tag each sentence with a time value within the mp3.
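
Once each sentence carries a time value, picking the sentence to show during playback is a simple lookup. Here is a minimal sketch in Python (chosen only for illustration; the thread itself is about VB6). The cue list, its times, and the function name `sentence_at` are all made up for the example, and the playback position would come from whatever player is used.

```python
from bisect import bisect_right

# Hypothetical cue list: (start time in seconds, sentence). The times are invented.
cues = [
    (0.0, "Good morning, class."),
    (2.5, "Please open your books to page ten."),
    (6.0, "Today we will read a short story."),
]
starts = [start for start, _ in cues]  # must be sorted in ascending order

def sentence_at(position_seconds):
    """Return the sentence whose start time was passed most recently,
    or None if playback has not reached the first cue yet."""
    index = bisect_right(starts, position_seconds) - 1
    return cues[index][1] if index >= 0 else None
```

Calling `sentence_at` from a timer with the current playback position is enough to drive a lyrics-style display. The hard part, as noted above, is producing the time values in the first place.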

  10. #10

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by DataMiser View Post
    In that case you would probably need to match the time code to the sentences
    Yes, I need to match the time code to the sentences.

    Quote Originally Posted by DEXWERX View Post
    So you have:
    * a bunch of text files
    * each text file has dozens of sentences
    * a bunch of mp3 files
    * each mp3 file corresponds to 1 text file

    Now, given this, you need to display a sentence from the text file at the moment it is spoken in the mp3?

    If that's correct, there is no easy way to do this. You have to do it the way all subtitles work: tag each sentence with a time value within the mp3.
    You expressed my meaning very clearly, thank you.

    Yes, I know it's not easy. Now I have a thought:

    1. Use a third-party library (for example, Microsoft Speech API) to convert the voice (mp3) to machine text, which may differ from the text in the textbook.

    2. Automatically tag each sentence in the machine text with a time value within the mp3.

    3. Automatically match the sentences in the machine text to the sentences in the English textbook using a fuzzy algorithm.

    Perhaps this is another form of speech recognition and AI.
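
Step 3 of this plan can be sketched with a standard string-similarity measure. The Python below (used only for illustration) assumes steps 1 and 2 have already produced time-tagged machine text; the `machine` list is invented sample data, not real recognizer output, and `normalize` and `best_match` are hypothetical helper names. It uses `difflib.SequenceMatcher` from the standard library as one possible fuzzy measure, not necessarily the best one.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lower-case the text and replace punctuation with spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())

def best_match(recognized, textbook_sentences):
    """Return (index, score) of the textbook sentence most similar to the recognized text."""
    target = normalize(recognized)
    scores = [SequenceMatcher(None, target, normalize(s)).ratio()
              for s in textbook_sentences]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]

textbook = [
    "Good morning, class.",
    "Please open your books to page ten.",
    "Today we will read a short story.",
]
# Hypothetical recognizer output: (start time in seconds, machine text with errors).
machine = [
    (0.0, "good morning glass"),
    (2.5, "please open your book to page 10"),
    (6.0, "today we will reed a short story"),
]

# Replace each machine sentence with the closest textbook sentence, keeping its time.
subtitles = [(start, textbook[best_match(text, textbook)[0]]) for start, text in machine]
```

In practice the score returned by `best_match` could be checked against a threshold so that badly recognized segments are flagged instead of being matched to the wrong sentence.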
    Last edited by dreammanor; Sep 13th, 2017 at 08:06 PM.

  11. #11

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    Hi The trick, I ran your API.AI, but didn't get any results.

    I clicked on the button M and spoke into the microphone, but the window didn't return anything.

    The following websites are blocked in our country:
    api.ai
    docs.api.ai/docs/reference
    www.youtube.com

  12. #12
    PowerPoster
    Join Date
    Feb 2017
    Posts
    4,995

    Re: How to make the voice automatically match the sentences in the textbook?

    I'm also interested in this topic. I would like to add this functionality to a program, in the very same situation dreammanor describes.
    But I don't want to depend on an internet API; I want to do it locally.
    From a first investigation, it seems that it could be possible with SAPI.

    If anyone has any experience or knowledge about it, please advise. Thanks.

    PS: I need to handle Spanish, but my guess is that it shouldn't be a problem since it's one of the major languages.

  13. #13
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: How to make the voice automatically match the sentences in the textbook?

    If you download the SAPI 5.1 SDK, check the RecoVB Speech recognition example.

    C:\Program Files\Microsoft Speech SDK 5.1\Samples\VB\RecoVB

  14. #14
    PowerPoster
    Join Date
    Feb 2017
    Posts
    4,995

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by DEXWERX View Post
    If you download the SAPI 5.1 SDK, check the RecoVB Speech recognition example.

    C:\Program Files\Microsoft Speech SDK 5.1\Samples\VB\RecoVB
    The question: why 5.1 and not the latest (5.4)?

  15. #15
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by Eduardo- View Post
    The question: why 5.1 and not the latest (5.4)?
    There are no SAPI 5.4 examples for VB6. The MSDN documentation is outdated and refers to the 5.1 examples.

    The 5.4 examples are only for C/C++/C#.

  16. #16
    PowerPoster
    Join Date
    Feb 2017
    Posts
    4,995

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by DEXWERX View Post
    There are no SAPI 5.4 examples for VB6. The MSDN documentation is outdated and refers to the 5.1 examples.

    The 5.4 examples are only for C/C++/C#.
    But I suppose that it could be backward compatible, and perhaps the recognition accuracy has been improved (all guesses).
    I'll start researching the subject when I have the time; for me it's not urgent.
    I'll keep watching the thread for now.
    Thanks.

  17. #17
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by Eduardo- View Post
    But I suppose that it could be backward compatible, and perhaps the recognition accuracy has been improved (all guesses).
    I'll start researching the subject when I have the time; for me it's not urgent.
    I'll keep watching the thread for now.
    Thanks.
    They are backward compatible, and they reference SAPI.dll, which no longer needs to be redistributed (it has been a core OS component since Vista).

  18. #18

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    I spent a day searching for information on the Internet and found that the system I want to build is called "Automatic Subtitle Recognition", and it seems to be beyond my ability.

    Perhaps only Google, Microsoft, IBM, and Apple are able to build such a system. Of course, we can use their APIs in our own software.

  19. #19
    PowerPoster
    Join Date
    Feb 2017
    Posts
    4,995

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by dreammanor View Post
    I spent a day searching for information on the Internet and found that the system I want to build is called "Automatic Subtitle Recognition", and it seems to be beyond my ability.

    Perhaps only Google, Microsoft, IBM, and Apple are able to build such a system. Of course, we can use their APIs in our own software.
    Which part do you think is so difficult?

    One part is getting the text recognized from the audio.
    An important thing in this part is having the timings. I don't know whether SAPI can provide the timing for each word, or at least a time marker every few words. Otherwise, the audio would have to be split into small pieces, and doing all this right would be difficult (maybe seriously so), but it should be possible to find a way to make it work properly.

    The second part is the fuzzy logic to match one text with the other. It establishes a correspondence between the text that was automatically recognized from the audio and the "right" text that you already have written (which can be slightly different).
    That part can be somewhat difficult, but it can be done.
    There are algorithms for comparing similar texts out there. They are the ones used by programs that compare two texts and highlight the differences. In this case the comparison should be a bit "fuzzy", or tolerant.

    I'm not saying it is easy. It can take some time to do.

    In your research, what difficulties did you find?
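
To illustrate how the two parts could fit together, here is a rough Python sketch (for illustration only) of a diff-style alignment. It assumes the recognizer can deliver a start time for each recognized word, which is exactly the open question above; the word/time pairs are invented, and `words` and `sentence_start_times` are hypothetical names. `difflib.SequenceMatcher` is the kind of comparison algorithm used by text-diff tools.

```python
from difflib import SequenceMatcher

def words(text):
    """Split text into lower-case words, ignoring punctuation."""
    return "".join(ch.lower() if ch.isalnum() else " " for ch in text).split()

def sentence_start_times(sentences, recognized):
    """sentences: the "right" text, one string per sentence.
    recognized: list of (word, start_seconds) from the recognizer (assumed available).
    Returns one start time per sentence, or None where no word could be aligned."""
    book_words, owner = [], []
    for i, sentence in enumerate(sentences):
        for w in words(sentence):
            book_words.append(w)
            owner.append(i)  # remember which sentence each word belongs to
    heard = [w.lower() for w, _ in recognized]
    times = [None] * len(sentences)
    matcher = SequenceMatcher(None, book_words, heard, autojunk=False)
    # Matching blocks are runs of words that agree in both texts, in order.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            s = owner[block.a + k]
            if times[s] is None:  # keep the earliest aligned word of each sentence
                times[s] = recognized[block.b + k][1]
    return times

sentences = ["The cat sat.", "It was happy."]
# Invented recognizer output; the first word is misrecognized on purpose.
recognized = [("a", 0.0), ("cat", 0.4), ("sat", 0.8),
              ("it", 1.5), ("was", 1.8), ("happy", 2.1)]
times = sentence_start_times(sentences, recognized)  # [0.4, 1.5]
```

Because the first word of the first sentence was misrecognized, its start time comes from the next aligned word and is slightly late. A real implementation would need some tolerance for that, for example by extrapolating backwards from the first aligned word.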

  20. #20

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by Eduardo- View Post
    Which part do you think is so difficult?

    One part is getting the text recognized from the audio.
    An important thing in this part is having the timings. I don't know whether SAPI can provide the timing for each word, or at least a time marker every few words. Otherwise, the audio would have to be split into small pieces, and doing all this right would be difficult (maybe seriously so), but it should be possible to find a way to make it work properly.

    The second part is the fuzzy logic to match one text with the other. It establishes a correspondence between the text that was automatically recognized from the audio and the "right" text that you already have written (which can be slightly different).
    That part can be somewhat difficult, but it can be done.
    There are algorithms for comparing similar texts out there. They are the ones used by programs that compare two texts and highlight the differences. In this case the comparison should be a bit "fuzzy", or tolerant.

    I'm not saying it is easy. It can take some time to do.

    In your research, what difficulties did you find?
    Hi Eduardo, I am still looking for information about ASR. After I have collected the information, we will discuss the problem together.

  21. #21
    PowerPoster
    Join Date
    Feb 2017
    Posts
    4,995

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by dreammanor View Post
    Hi Eduardo, I am still looking for information about ASR. After I have collected the information, we will discuss the problem together.
    ok..

  22. #22

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by Eduardo- View Post
    Which part do you think is so difficult?

    One part is getting the text recognized from the audio.
    An important thing in this part is having the timings. I don't know whether SAPI can provide the timing for each word, or at least a time marker every few words. Otherwise, the audio would have to be split into small pieces, and doing all this right would be difficult (maybe seriously so), but it should be possible to find a way to make it work properly.

    The second part is the fuzzy logic to match one text with the other. It establishes a correspondence between the text that was automatically recognized from the audio and the "right" text that you already have written (which can be slightly different).
    That part can be somewhat difficult, but it can be done.
    There are algorithms for comparing similar texts out there. They are the ones used by programs that compare two texts and highlight the differences. In this case the comparison should be a bit "fuzzy", or tolerant.

    I'm not saying it is easy. It can take some time to do.

    In your research, what difficulties did you find?
    After a few days of effort, I can't achieve even the first step (MP3 Voice-to-Text). I've looked at the SAPI examples that implement Text-to-Voice, but I don't know how to implement MP3 Voice-to-Text using SAPI.

    I know that Google has developed ASR for YouTube, and that Facebook is now also developing ASR, but these websites are blocked in my country, so I can't test or use their results. I wonder if there are other Voice-to-Text engines that are as good as Google Voice.
    Last edited by dreammanor; Sep 19th, 2017 at 11:45 AM.

  23. #23
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: How to make the voice automatically match the sentences in the textbook?

    The RecoVB example is Voice-to-Text.

    Step 2 would be to use your mp3s in lieu of recording voice.
    Last edited by DEXWERX; Sep 19th, 2017 at 11:21 AM.

  24. #24

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: How to make the voice automatically match the sentences in the textbook?

    Quote Originally Posted by DEXWERX View Post
    The RecoVB example is Voice-to-Text.

    Step 2 would be to use your mp3s in lieu of recording voice.
    Thank you, DEXWERX. I have seen the RecoVB example, but I would like to recognize the text directly from mp3 or wav files rather than from recorded voice.

    Edit: I have started a new thread to discuss Voice-to-Text.
    http://www.vbforums.com/showthread.p...17#post5215817
    Last edited by dreammanor; Sep 19th, 2017 at 11:56 AM.
