
Step 2: Convert speech to text with an API and in different languages

The great thing about automatic speech recognition is that models can be built for any language out there; all that is needed is the right dataset. What this means is that in order to build a model in a certain language, you need thousands of hours of audio in that specific language as well as hundreds of hours of perfect transcripts in that language. Using the audio data, engineers can build an acoustic model that contains the specific sounds, and with the transcript data they can build a lexicon that contains the specific words. These two make up the language model, and by applying artificial intelligence and running multiple iterations over this data, the language model becomes better and better at making the right combinations between sounds and words. There is no vendor out there that supports all the languages and dialects of the world, but in theory this is possible, as long as the model can be trained with the right data sets.
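
To give a concrete feel for what calling a speech-to-text API with a language setting can look like, here is a minimal sketch in Python. The endpoint, parameter names, response field, and token are hypothetical placeholders, not the actual Scriptix API; consult your provider's documentation for the real interface.

```python
import requests

# Hypothetical speech-to-text endpoint; replace with your provider's real URL.
API_URL = "https://api.example-stt.com/v1/transcribe"
API_TOKEN = "YOUR_API_TOKEN"  # placeholder credential

def transcribe(audio_path: str, language: str = "nl-NL") -> str:
    """Upload an audio file and request a transcript in the given language."""
    with open(audio_path, "rb") as audio_file:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            data={"language": language},   # assumed parameter selecting the language model
            files={"audio": audio_file},   # assumed multipart upload field
            timeout=300,
        )
    response.raise_for_status()
    return response.json()["transcript"]   # assumed response field

if __name__ == "__main__":
    print(transcribe("interview.wav", language="en-US"))
```

The language code in such a request is what selects the per-language model described above, which is why a model must exist for a language before a vendor can offer it.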

There are many options for automatic speech-to-text software out there, from paid services to free and open-source options. The difference between the two lies mainly in the quality of the output they generate. Paid services such as Scriptix speech to text are aimed at generating the best possible output for the user. To that end, we work together with customers to update and customize models based on their content, which generates much more accurate transcripts. With free services the approach is always a generic one: what you see is what you get. For some use cases that can be just fine, but when accuracy is important, a paid service will surely be the way to go.

Moreover, open-source projects such as Kaldi, which Scriptix also contributes to, may be free, but actually applying the knowledge they contain requires specific expertise. You would need qualified machine learning engineers who know how to build and curate the right data sets in order to make an open-source project such as Kaldi work for you.
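
To illustrate the kind of data curation work this involves, the sketch below pairs audio recordings with their transcripts into a simple training manifest. It is an illustrative Python example only; the directory layout, file naming, and manifest format are assumptions, not a prescribed Kaldi or Scriptix workflow.

```python
import csv
from pathlib import Path

AUDIO_DIR = Path("data/audio")       # assumed: one .wav file per utterance
TRANSCRIPT_DIR = Path("data/text")   # assumed: matching .txt transcript per utterance

def build_manifest(output_path: str = "manifest.csv") -> int:
    """Pair each audio file with its transcript and write a manifest CSV."""
    rows = []
    for wav_path in sorted(AUDIO_DIR.glob("*.wav")):
        txt_path = TRANSCRIPT_DIR / (wav_path.stem + ".txt")
        if not txt_path.exists():
            # Unpaired audio is useless for supervised training; skip and flag it.
            print(f"warning: no transcript for {wav_path.name}")
            continue
        transcript = txt_path.read_text(encoding="utf-8").strip()
        rows.append((str(wav_path), transcript))

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["audio_path", "transcript"])
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    count = build_manifest()
    print(f"wrote {count} paired examples")
```

Checks like the missing-transcript warning above are a small example of why curating thousands of hours of paired data is engineering work rather than a one-off download.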

Free services can be just fine, but they are always limited. For people who occasionally need to process a few minutes of content this can be enough, but for larger content producers that need to process a couple of hours per week, for example, such a restriction does not work. Finally, free services usually do come with a price after all: you give away your data for free.
