The system, called Sinhala Text To Speech (STTS), is a full research project. This documentation briefly describes the functionality of STTS and highlights the importance and benefits of the project. The system lets the user enter Sinhala text and internally converts it into its pronunciation form. The conversion takes place after the user selects the corresponding option (convert to voice). Altogether, the system accepts characters in the Sinhala language (Sinhala fonts) and turns them into sound waves, which are output through a technical device (speakers). The user can select a preferred voice type from three options: child voice, female voice and adult (male) voice. The system offers several benefits to its users. People who cannot read Sinhala but can understand it when spoken are encouraged to use the system, because it lets them overcome that problem easily. If someone needs documents with Sinhala text, he or she can use this system to obtain them. Today there are no systems like this for the Sinhala language.
We use speech as the main medium for communicating with one another in day-to-day life. However, when it comes to interacting with computers, apart from watching and performing actions, the majority of communication nowadays is achieved through reading the computer screen. This involves surfing the internet and reading emails, eBooks, research papers and much more, and it is very time consuming. Moreover, the visually impaired community in Sri Lanka faces considerable trouble communicating with computers because no suitable tool is available for convenient use. As an appropriate solution to this problem, this project proposes an effective tool for text-to-speech conversion that supports speech in the native language.
What is text-to-speech?
Not everybody can read text displayed on a screen or printed on paper. This may be because the person is partially sighted or because they are not literate. These people can be helped by generating speech rather than printing or displaying the text, using a Text-to-Speech (TTS) system to produce the speech for the given text. A TTS system takes written text (from a web page, text editor, clipboard, etc.) as input and converts it to an audible format so that you can hear what is in the text. It identifies and reads aloud what is being displayed on the screen. With a TTS application, one can listen to computer text instead of reading it. This means you can listen to your emails or eBooks while doing something else, which saves valuable time. Apart from saving time and empowering the visually impaired population, TTS can also be used to overcome the literacy barrier of the general public, to improve human-computer interaction through online newspaper reading from the internet, and to enhance other information systems such as learning guides for students, IVR (Interactive Voice Response) systems, automated weather forecasting systems and so on [1] [2].
What is “Sinhala Text To Speech”?
“Sinhala Text To Speech” is the system I selected as my final research project. As a postgraduate student, I selected a research project that converts Sinhala input text into a verbal form.
The term “Text-To-Speech” (TTS) refers to the conversion of input text into a spoken utterance. The input is Sinhala text, which may consist of a number of words, sentences, paragraphs, numbers and abbreviations. The TTS engine should identify these without any ambiguity and generate the corresponding speech sound wave with acceptable quality. The output should be understandable to an average listener without much effort, which means that it should be made as close as possible to natural speech quality.
Speech is produced when air is forced from the lungs through the vocal cords (glottis) and along the vocal tract. Speech can be split into a rapidly varying excitation signal and a slowly varying filter. The envelope of the power spectrum contains the vocal tract information.
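The split into an excitation signal and a filter can be illustrated with a minimal Python sketch. It is not part of the STTS implementation: the impulse-train source, the single two-pole resonator and every numeric value (8 kHz sample rate, 100 Hz pitch, a resonance near 700 Hz) are assumptions chosen only for illustration.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz


def impulse_train(f0_hz, n_samples, sample_rate):
    """Excitation: one unit pulse per pitch period (a crude glottal source)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator standing in for one vocal tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * centre_hz / sample_rate
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of one DFT bin, enough to probe the spectral envelope."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * i) for i, s in enumerate(signal))
    im = sum(s * math.sin(w * i) for i, s in enumerate(signal))
    return math.hypot(re, im)


source = impulse_train(100, 1600, SAMPLE_RATE)           # hypothetical 100 Hz pitch
speech_like = resonator(source, 700, 100, SAMPLE_RATE)   # hypothetical resonance

for freq in (700, 2000):
    print(freq, magnitude_at(source, freq, SAMPLE_RATE),
          magnitude_at(speech_like, freq, SAMPLE_RATE))
```

The excitation has equal energy at both probed harmonics, while the filtered signal is much stronger near the resonance. That peak in the spectral envelope is the kind of vocal tract information the paragraph above refers to.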
The verbal form of the input should be understandable to the listener, which means that the output will be made as close as possible to the natural human voice. The system will provide a few main features. After entering the text, the user can select one of the voice qualities: a woman's voice, a man's voice or a child's voice. The user can also vary the speed of the voice.
My project will bring a few main benefits to the users who intend to use it.
The basic architecture of the project is shown below.
Figure: Basic architecture of the project (inputs: text in Sinhala; voice and speed)
1.3 Why do we need “Sinhala Text To Speech”?
Since most commercial computer systems and applications are developed in English, the use and benefits of those systems are limited to people with English literacy. As a result, the majority of the world cannot take advantage of such applications. This scenario applies to Sri Lanka as well. Though Sri Lankans have high language literacy, computer and English literacy in suburban areas is rather low. The benefits and advantages that can be gained through computers and information systems are therefore kept away from people in rural areas. One way to overcome this is through localization. “Sinhala Text To Speech” will act as a strong platform to boost software localization and to reduce the gap between computers and people.
AIMS AND OBJECTIVES
The main objective of the project is to develop a fully featured, complete Sinhala Text to Speech system that gives a speech output similar to the human voice while preserving the native prosodic characteristics of the Sinhala language. The system will have a female voice, for which there is a huge demand in the current localization software industry. It will act as the main platform for Sinhala Text To Speech, and developers will have the benefit of building end-user applications on top of it. This will benefit the visually impaired population and people with low IT literacy in Sri Lanka by enabling convenient access to information, such as reading emails, eBooks, website content, documents and learning tutors. An end-user Windows application will be developed that acts as a document reader as well as a screen reader.
The aim is to develop a system that is able to read text in Sinhala and convert it into verbal (Sinhala) form. It will also be able to alter the sound waves, meaning that the user can select the voice quality according to his/her preference. There may be three voice selections: a female voice, a male voice and a child's voice. The user can also change the speed of the voice: anyone who needs to hear slower or faster speech can change it according to their needs.
SPECIFIC STUDY OBJECTIVES
Produce a verbal format for the input Sinhala text.
The input Sinhala text, which may be user input or a given text document, is transformed into sound waves, which are then output through speakers. Disabled people will therefore be among the stakeholders who benefit most from the Sinhala Text to Speech system. Undergraduates and researchers who need to consult many references can also send the text to the system, simply listen, and pick up what they need.
The output should be close to natural speech.
The human voice is a complex acoustic signal generated by an air stream expelled through the mouth, the nose or both. Important characteristics of the speech sound are speed, silence, stress and the level of energy output. The tongue and lips, with the help of other articulators in the vocal system, control the air stream appropriately. Many variations of the speech signal are produced by the person's vocal system in order to convey meaning and emotion to the listener, who then understands the message. Many other characteristics of the listener's hearing system also play a part in identifying what is being said.
Identify an efficient way of translating Sinhala text into verbal form.
By developing this system we will be able to identify and propose the most suitable algorithm for translating Sinhala text into verbal form in a fast and efficient manner.
Control the voice speed and the type of voice (e.g. man, woman, child voice, etc.).
Users will be able to select the quality of the sound wave they want. They will also be allowed to reset the speed of the output as needed. People who would like to learn Sinhala as a second language can learn pronunciation properly by changing the speed (reducing and increasing it), which will improve their listening skills.
Small children can be encouraged to learn the language by varying the speed and voice type.
Propose ways in which the current system can be extended further to meet future needs.
This system provides only the basic functions. It can feasibly be enhanced further to satisfy the changing requirements of users. It can be embedded into toys and used to improve children's listening and pronunciation abilities, broadening their speaking capacity.
RELEVANCE OF THE PROJECT
The idea of developing a Sinhala Text To Speech (STTS) engine arose when I was considering the opportunities available for Sinhala-speaking users to grasp the benefits of Information and Communication Technology (ICT). In Sri Lanka more than 75% of the population speaks Sinhala, but it is very rare to find Sinhala software or Sinhala ICT materials on the market. This directly affects the development of ICT in Sri Lanka.
At present a few Sinhala text-to-speech software packages are available, but they have problems with sound quality, font scheme, pronunciation and so on. Because of these problems, developers are reluctant to use them in their applications. My focus is on developing an engine that converts Sinhala words in digitized form into Sinhala pronunciation in an error-free manner. This engine will support the development of applications.
Some applications where STTS can be used
Document reader. Reads an already digitized document (e.g. emails, e-books, newspapers) or a conventional document that has been scanned and passed through an optical character recognizer (OCR).
Aid for disabled persons. The vision- or voice-impaired community can use computer-aided devices to communicate directly with the world. A vision-impaired person can be informed by an STTS system. A voice-impaired person can communicate with others using a keyboard and an STTS system.
Talking books & toys. Producing talking books and toys will boost the toy market and education.
Help assistant. A help assistant that speaks in Sinhala, like the MS Office help assistant.
Automated newscasting. A completely new breed of television networks, with programs hosted by computer-generated characters, may be possible in the future.
Sinhala SMS reader. SMS messages contain many abbreviations. A system that reads those messages aloud will help recipients.
Language education. A high-quality TTS system incorporated into a computer-aided device can be used as a tool for learning a new language. Such tools help the learner improve very quickly, since he/she has access to the correct pronunciation whenever needed.
Traveler's guide. A system located inside a vehicle or mobile device that gives information on the current location and other relevant information, incorporated with GPRS.
Alert systems. Systems that incorporate TTS to draw the attention of the controlled elements, since humans are used to having their attention drawn by voice.
In particular, countries like Sri Lanka that are still struggling to reap the benefits of ICT can use a Sinhala TTS engine as a solution for conveying information effectively. Users who can get the information they need in their native language (i.e. by converting the text to native-language text) will naturally turn their thoughts to the achievable benefits and will be encouraged to use information technology more frequently.
Therefore the development of a TTS engine for Sinhala will bring personal benefits (e.g. aid for the disabled, language learning) from a social perspective, and a definite financial benefit in economic terms (e.g. virtual television networks, the toy industry) for its users.
The system has been developed using the agile software development method. We aimed to develop the solution through short-term goals, which give a sense of accomplishment and make the work easier. Project review was a very useful and powerful way of adding a continuous improvement mechanism. The project supervisors were consulted regularly for reviews and feedback in order to make the right decisions, clear up misunderstandings and carry out future development effectively and efficiently. Good planning and meeting follow-up were crucial to making these reviews a success.
BACKGROUND AND LITERATURE REVIEW
“Text to speech” is a very popular area in the field of computer science, and a good deal of research has been carried out on it. Most of the research focuses on how to produce more natural speech for a given text. Freely available text-to-speech packages exist, but most of this software is developed for the most common languages such as English, Japanese and Chinese. Some software companies also distribute text-to-speech development tools for English. The Microsoft Speech SDK toolkit is one example of a freely distributed toolkit, developed by Microsoft for English.
Nowadays, some universities and research labs carry out research projects on text to speech. Carnegie Mellon University focuses part of its research on text to speech (TTS), and provides open source speech software, toolkits, related publications and important techniques to undergraduate students and software developers alike. TCTS Lab also carries out research in this area; it introduced a simple but general functional diagram of a TTS system [39].
Image credit: Thierry Dutoit.
Figure: A simple but general functional diagram of a TTS system
Before the project was initiated, basic research was done to become familiar with TTS systems and to gather information about existing systems of this kind. Later, a comprehensive literature survey was done in the fields of the Sinhala language and its characteristics, Festival and Festvox, generic TTS architecture, building new synthetic voices, Festival and Windows integration, and how to improve existing voices.
History of Speech Synthesizing
A historical analysis is useful for understanding how current systems work and how they developed into their present form. This section discusses the history of synthesized speech, from mechanical synthesis to today's high-quality synthesizers, along with some milestones in synthesis-related techniques.
Attempts to produce synthetic speech were made over two hundred years ago. In 1779, the Russian professor Christian Kratzenstein explained the physiological differences between five long vowels (/a/, /e/, /i/, /o/ and /u/) and constructed apparatus to produce them. Acoustic resonators similar to the human vocal tract were also built and activated with vibrating reeds.
In 1791, Wolfgang von Kempelen introduced his “Acoustic-Mechanical Speech Machine”, which generated single sounds and combinations of sounds. He described his studies on speech production and the experiments with his speech machine in his publications. The essential components of the machine were a pressure chamber for the lungs, a vibrating reed to act as vocal cords, and a leather tube for the vocal tract action; by controlling the shape of the leather tube he was able to produce different vowel sounds. Consonants were created by four separate constricted passages controlled by the fingers, and a model of the vocal tract including a hinged tongue and movable lips was used for plosive sounds.
In the mid-1800s, Charles Wheatstone built a version of Kempelen's speaking machine that was capable of producing vowels, consonant sounds, some sound combinations and even full words. Vowels were generated using the vibrating reed with all passages closed, and consonants, including nasals, were generated with turbulent flow through an appropriate passage with the reed off.
In the late 1800s, Alexander Graham Bell and his father constructed a similar machine without any significant success. He also changed the vocal tract of his dog by hand to produce sounds, holding it between his legs and making it growl.
No significant improvements in research and experiments with mechanical and semi-electrical analogs of the vocal system were made until the 1960s [38].
The first fully electrical synthesis device was introduced by Stewart in 1922 [17]. It used a buzzer for excitation and two resonant circuits to model the acoustic resonances of the vocal tract. The machine was able to produce single static vowel sounds with the two lowest formants, but it could not produce any consonants or connected utterances. A similar synthesizer was made by Wagner [27]. This device consisted of four electrical resonators connected in parallel and was also excited by a buzz-like source. The outputs of the four resonators were combined in the proper amplitudes to produce vowel spectra. In 1932, two researchers, Obata and Teshima, discovered the third formant in vowels [28]. The first three formants are generally considered to be enough for intelligible synthetic speech.
The first device that could be considered a speech synthesizer was the VODER (Voice Operating DEmonstratoR), introduced by Homer Dudley at the New York World's Fair in 1939 [17] [27] [29]. The VODER was inspired by the VOCODER (Voice CODER), which was developed at Bell Laboratories in the thirties mainly for communication purposes. The VOCODER was built as a voice-transmitting device, as an alternative for low-band telephone lines: it analyzed wideband speech, converted it into slowly varying control signals, sent those over a low-band phone line, and finally transformed those signals back into the original speech [36]. The VODER consisted of touch-sensitive switches to control the voice and a pedal to control the fundamental frequency.
After the demonstration of the VODER clearly showed that a machine could produce a human voice, people became more interested in speech synthesis. In 1951, Franklin Cooper and his associates developed the Pattern Playback synthesizer at Haskins Laboratories [17] [29]. Its method was to reconvert recorded spectrogram patterns into sounds, either in original or modified form. The spectrogram patterns were stored optically on transparent belts.
The formant synthesizer PAT (Parametric Artificial Talker) was introduced by Walter Lawrence in 1953 [17]. It consisted of three electronic formant resonators connected in parallel. Either a buzz or a noise was used as the input signal. It could control the three formant frequencies, voicing amplitude, fundamental frequency and noise amplitude. At about the same time, Gunnar Fant introduced the first cascade formant synthesizer, OVE I (Orator Verbis Electris). In 1962, Fant and Martony introduced an improved synthesizer, OVE II, which contained separate parts to model the transfer function of the vocal tract for vowels, nasals and obstruent consonants. The OVE projects were improved further, resulting in OVE III and GLOVE at the Kungliga Tekniska Högskolan (KTH), Sweden; the present commercial Infovox system is descended from these [30] [31] [32].
The PAT and OVE synthesizers sparked a debate about how the transfer function of the acoustic tube should be modeled: in parallel or in cascade. After studying these synthesizers for a few years, John Holmes introduced his parallel formant synthesizer in 1972. The voice synthesis was so good that the average listener could not tell the difference between the synthesized and the natural one [17]. About a year later he introduced a parallel formant synthesizer developed with the JSRU (Joint Speech Research Unit) [33].
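The difference between the two connection schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: digital two-pole resonators stand in for the analog electronic circuits of PAT and OVE, and the formant frequencies, bandwidths, gains and sample rate are hypothetical values, not figures from those machines.

```python
import math


def resonator(signal, centre_hz, bandwidth_hz, sample_rate=8000):
    """Two-pole digital resonator used here as a stand-in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def cascade(signal, formants):
    """Cascade: each resonator filters the output of the previous one."""
    for centre_hz, bandwidth_hz in formants:
        signal = resonator(signal, centre_hz, bandwidth_hz)
    return signal


def parallel(signal, formants, gains):
    """Parallel: every resonator sees the same input; outputs are weighted and summed."""
    branches = [resonator(signal, c, b) for c, b in formants]
    return [sum(g * branch[i] for g, branch in zip(gains, branches))
            for i in range(len(signal))]


FORMANTS = [(700, 100), (1200, 120), (2500, 160)]  # illustrative values only
impulse = [1.0] + [0.0] * 399                      # single excitation pulse

cascade_out = cascade(impulse, FORMANTS)
parallel_out = parallel(impulse, FORMANTS, gains=[1.0, 0.5, 0.25])
```

In the cascade form the relative formant amplitudes follow from the chain itself, whereas the parallel form needs an explicit gain per formant, which is the extra control the parallel design trades for added flexibility.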
The first articulatory synthesizer was introduced in 1958 by George Rosen at the Massachusetts Institute of Technology (M.I.T.) [17]. The DAVO (Dynamic Analog of the Vocal tract) was controlled by a tape recording of control signals created by hand. The first experiments with Linear Predictive Coding (LPC) were made in the mid-1960s [28].
The first full text-to-speech system for English was developed at the Electrotechnical Laboratory, Japan, in 1968 by Noriko Umeda and colleagues [17]. The synthesis was based on an articulatory model and included a syntactic analysis module with some sophisticated heuristics. Though the speech was intelligible, it was still monotonous.
The MITalk laboratory text-to-speech system was developed at M.I.T. by Allen, Hunnicutt and Klatt in 1979. The system was later also used, with some modifications, in the Telesensory Systems Inc. (TSI) commercial TTS system [17] [34]. Two years later, Dennis Klatt introduced his famous Klattalk system, which used a new, sophisticated voicing source described in more detail in [17]. The technology used in the MITalk and Klattalk systems forms the basis for many synthesis systems today, such as DECtalk and Prose-2000.
In 1976, Kurzweil introduced the first reading aid with an optical scanner. The system was very useful for blind people and could read multifont written text. Although useful, it was too expensive for average customers, but it was used in libraries and service centers for visually impaired people [17].
A considerable number of commercial text-to-speech systems were introduced in the late 1970s and early 1980s [17]. In 1978, Richard Gagnon introduced an inexpensive Votrax-based Type-n-Talk system. Two years later, in 1980, Texas Instruments introduced the linear prediction coding (LPC) based Speak-n-Spell synthesizer, built on a low-cost linear prediction synthesis chip (TMS-5100). In 1982, Street Electronics introduced the Echo, a low-cost diphone synthesizer based on a newer version of the same chip as in Speak-n-Spell (TMS-5220). At the same time, Speech Plus Inc. introduced the Prose-2000 text-to-speech system. A year later, the first commercial versions of the famous DECtalk and the Infovox SA-101 synthesizer were introduced [17].
One of the modern methods applied recently in speech synthesis is hidden Markov models (HMMs). They have been applied to speech recognition since the late 1970s, and have been used in speech synthesis systems for about two decades. A hidden Markov model is a collection of states connected by transitions, with two sets of probabilities for each transition: a transition probability, which gives the probability of taking the transition, and an output probability density function (pdf), which defines the conditional probability of emitting each output symbol from a finite alphabet, given that the transition is taken [35].
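The definition above can be made concrete with a small Python sketch. The two states, the alphabet {"a", "b"} and all probabilities are hypothetical values invented for illustration; the forward recursion used to score a symbol sequence is a standard technique and is not taken from [35].

```python
# Hypothetical two-state model over the output alphabet {"a", "b"}.
# TRANSITIONS[src][dst] = (transition probability, output distribution on that arc)
TRANSITIONS = {
    "s0": {"s0": (0.6, {"a": 0.9, "b": 0.1}),
           "s1": (0.4, {"a": 0.2, "b": 0.8})},
    "s1": {"s0": (0.3, {"a": 0.5, "b": 0.5}),
           "s1": (0.7, {"a": 0.1, "b": 0.9})},
}
START = "s0"  # assumed start state


def sequence_probability(symbols):
    """Forward recursion: total probability of emitting `symbols` from START."""
    alpha = {state: (1.0 if state == START else 0.0) for state in TRANSITIONS}
    for symbol in symbols:
        nxt = {state: 0.0 for state in TRANSITIONS}
        for src, arcs in TRANSITIONS.items():
            for dst, (p_trans, p_out) in arcs.items():
                # Probability of taking the arc, times probability of
                # emitting this symbol given that the arc is taken.
                nxt[dst] += alpha[src] * p_trans * p_out[symbol]
        alpha = nxt
    return sum(alpha.values())


print(sequence_probability(["a", "b", "b"]))
```

Because the transition probabilities leaving each state sum to one, and each output distribution sums to one, the probabilities of all symbol sequences of a given length also sum to one.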
Neural networks have also been used in speech synthesis for about ten years, yet the ways of using them have still not been fully explored. Like HMMs, neural network technology can be used in speech synthesis in a promising way [28].
Fig. 6.1. Some milestones in speech synthesis [38]
6.1.1 History of Finnish Speech Synthesis
Compared to English, the number of Finnish users is quite small, and in the past the development process was time consuming and expensive, even though the Finnish text-processing scheme is simple and text corresponds closely to its pronunciation. Demand has increased with new multimedia and telecommunication applications.
In 1977, the first Finnish address synthesist, SYNTE2 was introduced in Tampere University of Technology. It was the first microprocessor based synthesis system and the first portable TTS system in the universe. Five old ages subsequently an improved SYNTE3 synthesist