The flow of the work is shown in the diagram below:
As the flow diagram shows, we first capture the live audio during a live meeting or online lecture. Next, we convert the speech in that audio to text. In the third step, we apply an artificial intelligence algorithm for spelling correction, which gives us the caption recognized from the live video. The next step is to integrate the caption text with the video again. Once all the steps are done, you will be able to see the captions in our web application. This flow should give you some idea of how our web application works and how captions are created in applications in general.
The image of the code below shows how I get the live video in a browser. We can change the height and width of the video frame by changing the parameters in that code. We can also set the frame rate if we want to change it.
In this part, we explain how to get the live speech from the video.
To keep the recognition running, we set recognition.continuous = true, so the recognizer keeps listening instead of stopping after the first phrase (in the Web Speech API, recognition itself begins when recognition.start() is called). I have also written a function that notifies you when the recognition has started.
To stop the recognition, we set recognition.continuous = false (the Web Speech API also provides recognition.stop() for this). I have also written a function that notifies you when the recognition stops.
To store the recognized text, I use a variable called “content” in the above code snippet. Because the text arrives continuously, I have to keep this variable up to date, so I append each new piece of text to the “content” variable.
To check how accurate the recognized text is before applying the AI algorithm, we print the “content” variable in a text area created with HTML.
In the above code snippet, you can see how I created the text area. In that text area, we print the text stored in the variable “content”. The snippet also contains the code for the buttons that start and stop the speech recognition.
Now we will see how I applied the algorithm for spelling correction.
Algorithm for spelling correction
For the spelling correction task, we use Natural Language Processing (NLP) libraries; NLP is a sub-domain of Artificial Intelligence (AI). Many openly available libraries handle this kind of task. Some of the libraries that can correct spelling in sentences are:
1. TextBlob Library
2. SparkNLP library
3. Spello Library
There are many other libraries, but these are the three I tried. Of these, I found that Spello gave the most accurate output, and it is also very easy to integrate.
About Spello Package:
Spello is a spell correction model formed by combining two models:
1. Phoneme Model
2. Symspell Model
Phoneme Model: It uses the Soundex algorithm (an algorithm for indexing names by sound, as pronounced in English) in the background and suggests correct spellings by using phonetic concepts to identify similar-sounding words.
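To make the phonetic idea concrete, here is a minimal sketch of the classic American Soundex code in plain Python. It only illustrates the technique; Spello's internal implementation may differ, and the function name soundex is our own.

```python
# Digit assigned to each consonant group in American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ("" if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != previous:
            encoded.append(code)
        previous = code  # a vowel resets this, so a repeated code after it is kept
    # Pad with zeros and keep the first letter plus three digits.
    return (first + "".join(encoded) + "000")[:4]


print(soundex("plai"), soundex("play"))
```

Here the misspelling "plai" and the word "play" receive the same code, so a phonetic model can treat one as a candidate correction for the other.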
Symspell Model: It uses the concept of edit distance to suggest correct spellings. Spello gets you the best of both, taking the context of the word into consideration as well. Note that Soundex and SymSpell themselves are rule-based and dictionary-based techniques, not neural networks such as LSTM, BLSTM, or RNN.
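The edit-distance idea can be sketched in a few lines of Python. This is a plain Levenshtein distance with a brute-force search over a small vocabulary, not SymSpell's actual indexing scheme, which precomputes deletions to make lookups fast. The names edit_distance and suggest are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, candidate), candidate)
                    for candidate in vocabulary)
    return [candidate for distance, candidate in scored
            if distance <= max_distance]


print(suggest("wnt", ["want", "play", "cricket"]))
```

A misspelling such as "wnt" is one insertion away from "want", so "want" is the only word in this small vocabulary that falls within the allowed distance.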
This model can correct text in two languages, English and Hindi. Now let’s see how we can use this package for our project.
Step 1: Installation of library
To install the library, we run the command “pip install spello” in a Jupyter notebook or a Colab file; to install it on our own system, we can run the same command in the command prompt.
I used Jupyter Notebook for this process, and this is how we can install the package.
Step 2: Model Initialization
To initialize the installed model, we first need to import it. The code to import the model is given in the code piece below.
This is how we initialize the model; here we pass “en” for the English language.
Step 3: Model Training/Create a new model
After initializing the model, we need to train it. We can train the model by giving it a list of sentences. The code for training the model is given below.
We can give any number of sentences or words in the above manner to train the model.
Step 4: Save the model
After training the model, we need to save the model. The code to save the model is given below.
So by using sp.save() we can save the model.
Step 5: Load Model
Spello also makes several pre-trained models available. We can download one and use it whenever we need it, or we can use our own model, trained with the process above.
We can load the model with this code.
Step 6: Test the model
We can test the model by using the given code.
In the output, we can see that we get three things:
1. Original Text
2. Spell corrected Text
3. A dictionary of the words that were corrected
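Putting Steps 2 to 6 together, a minimal sketch might look like the following. The calls follow the usage shown in Spello's README (SpellCorrectionModel, train, save, load, spell_correct); the helper names and arguments are our own assumptions and are not the code from the original notebook. The import sits inside the functions so that the file can be read and loaded even where Spello is not installed.

```python
def train_and_save(sentences, model_dir):
    """Steps 2-4: initialise an English model, train it, and save it."""
    from spello.model import SpellCorrectionModel  # third-party: pip install spello

    sp = SpellCorrectionModel(language="en")
    sp.train(sentences)   # a list of sentences or words
    sp.save(model_dir)    # README usage: saves the model file into this directory
    return sp


def load_and_correct(model_path, text):
    """Steps 5-6: load a saved or pre-trained model file and correct some text."""
    from spello.model import SpellCorrectionModel

    sp = SpellCorrectionModel(language="en")
    sp.load(model_path)   # path to the saved model file
    return sp.spell_correct(text)


def describe(result):
    """Format the three parts of a spell_correct() result for printing."""
    return "\n".join([
        f"Original text: {result['original_text']}",
        f"Corrected text: {result['spell_corrected_text']}",
        f"Corrections: {result['correction_dict']}",
    ])
```

For example, after training on a couple of sentences with train_and_save, passing the saved model file and a misspelled sentence to load_and_correct returns a dictionary with the keys original_text, spell_corrected_text, and correction_dict, which describe() turns into three printable lines.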
So this is how the spello library works.
Now we need to merge the two parts of the project.
For that, we use a Python Flask application that hosts our algorithm, and we send the text in the “content” variable to it through an API call.
In the code shown in the image above, I send the text as part of the URL, which is then received by the Flask application.
Here, by using the “window.open()” function, we call the URL that is written inside the function.
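Because the text travels inside the URL, spaces and punctuation have to be percent-encoded before the request is made. The sketch below shows that step in Python with the standard library; in the browser, encodeURIComponent() plays the same role. The endpoint http://localhost:5000/correct and the parameter name text are hypothetical and are not taken from the code image above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_correction_url(base_url: str, content: str) -> str:
    """Attach the recognized text to the URL as an encoded query parameter."""
    return f"{base_url}?{urlencode({'text': content})}"


def read_text_from_url(url: str) -> str:
    """Recover the text on the receiving side ("" if the parameter is absent)."""
    params = parse_qs(urlsplit(url).query)
    return params.get("text", [""])[0]


# Hypothetical endpoint, used only for illustration.
url = build_correction_url("http://localhost:5000/correct", "i wnt to plai & sing")
print(url)
print(read_text_from_url(url))
```

Encoding matters here because a raw "&" or "#" in the recognized text would otherwise cut the parameter short.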
Now we will see how we applied the algorithm in Python Flask. In the Flask application, we first get the text from the URL, and then we run the trained model on the received text. This gives us the corrected text.
Then we print the corrected text and check the output.
In the above code, we load the model and apply it to the text that we get from the URL.
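A minimal sketch of such a Flask endpoint is shown below. The route /correct, the query parameter text, and the function names are assumptions for illustration. The spell_correct argument stands for any callable that returns a Spello-style result dictionary, for example the spell_correct method of a loaded SpellCorrectionModel.

```python
def correct_text(text, spell_correct):
    """Run the corrector on the text and return only the corrected string."""
    if not text.strip():
        return ""  # nothing to correct
    return spell_correct(text)["spell_corrected_text"]


def create_app(spell_correct):
    """Build a Flask app whose /correct route corrects the ?text=... parameter."""
    from flask import Flask, Response, request  # third-party: pip install flask

    app = Flask(__name__)

    @app.route("/correct")  # hypothetical route name
    def correct():
        text = request.args.get("text", "")  # text sent in the URL
        corrected = correct_text(text, spell_correct)
        print(corrected)  # print the corrected text to check the output
        # Plain text avoids the browser interpreting the caption as HTML.
        return Response(corrected, mimetype="text/plain")

    return app
```

With a model loaded as sp, create_app(sp.spell_correct).run() would start Flask's development server, and opening the /correct URL with a text parameter would show the corrected text.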
Now we will see the importance of this work.
In the above GIF, we can see that the girl is trying to say something, but we are not able to tell what she is trying to say.
But now we will be able to understand what the girl is trying to say.
The output of our work:
Here are some screenshots of our project.
The above screenshot shows our UI, in which we capture both the video and the audio.
There are two buttons here, “Start” and “Stop”. To start recognizing the sound, we click the “Start” button, and to stop the recognition, we click the “Stop” button.
Here we can see that, after the recognition starts, the recognized text is displayed in the text area.
The recognized text is then sent to Flask for processing. In Flask, the text is collected from the URL and then processed by the trained model.
Then the corrected text is shown below.
In the above image, you can see that the text, after being processed, is displayed on the screen. In the future, we plan to build a much better UI in which the corrected text is displayed directly.
So this is how our project works.
You can see the detailed explanation of our project in the following video link.
Thanks for reading. We hope this content helps you.
For any queries, you can contact us on LinkedIn:
Harshal Faldu: https://www.linkedin.com/in/harshal-faldu-32b647165/
Nipun Parekh: https://www.linkedin.com/in/nipun-parekh-6006a0152/
Hope you have enjoyed reading this blog :)