Thursday, April 28, 2011

Voice Operated computers thinktank using the Delphi method



Voice controlled computing using AI.
In today’s computing environments, there is a lack of voice controlled computing devices that use AI to interact with the user. This is not to say that these devices do not exist, but rather that existing devices lack the features needed to make them truly voice only. Existing devices either must “hear” what is said or the user must use specific words or phrases to get the device to do a task (with the majority of them only able to take dictation or do searches). For example, on my Android powered phone I must use predefined words such as “call”, “text”, “find” or “search” to get the phone to do any task. Additionally, I must be very specific about what I want in order to get the phone to produce any kind of intelligent output. Have you ever tried to use voice dialing on a BlackBerry? “Call mum,” and the phone responds with: “Did you say call home?” Now it is possible that the phone did not understand me because I have an accent, but then again, if this were truly AI based it should be able to understand me just as a human would (I am actually chuckling to myself, since that last part is not really true: some people choose not to understand me… I have an accent, get over it).

For true voice operation, computing devices should be able to understand or deduce commands no matter what words are used or in what order. They should also be able to respond by querying the speaker for additional information, by speaking the requested information, or by telling the user whether the task was completed successfully. So for example, if I say “What is the weather forecast for Saturday?” my computer should be able to use AI to deduce that I need the weather forecast for Saturday and Saturday only. It should also be able to provide the same information if I only said “Saturday’s weather forecast”. Getting computers to speak naturally has already been achieved, at least on some systems (check out Alex on the Mac… I was blown away).
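To make the "no matter the words or the order" idea a bit more concrete, here is a minimal sketch in Python. It is nowhere near real AI: it just looks for keywords anywhere in the sentence instead of demanding a fixed command word up front. The keyword sets, the intent names and the function names (`deduce`, `find_day`) are all my own assumptions for illustration and do not come from any real phone or assistant.

```python
import re

# Hypothetical vocabulary, assumed purely for illustration.
INTENT_KEYWORDS = {
    "weather_forecast": {"weather", "forecast"},
    "call_contact": {"call", "dial", "phone"},
}
DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}


def find_day(words):
    """Return the first day name mentioned, tolerating a trailing 's'."""
    for word in words:
        if word in DAYS:
            return word
        # "Saturdays" (missing apostrophe) should still mean Saturday.
        if word.endswith("s") and word[:-1] in DAYS:
            return word[:-1]
    return None


def deduce(utterance):
    """Guess the intent and the day from an utterance, ignoring word order."""
    words = re.findall(r"[a-z]+", utterance.lower())
    word_set = set(words)
    # Score each intent by how many of its keywords appear anywhere.
    scores = {intent: len(keywords & word_set)
              for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    intent = best if scores[best] > 0 else None
    return {"intent": intent, "day": find_day(words)}


if __name__ == "__main__":
    print(deduce("What is the weather forecast for Saturday?"))
    print(deduce("Saturdays weather forecast"))
```

Both phrasings land on the same intent and the same day, which is the behaviour I am asking for. A real system would need far more than keyword matching (accents, synonyms, context), so treat this as a toy that shows the goal, not the solution.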

Much as this is all fine and dandy in theory, there are several issues that need to be resolved, or should I say dealt with, before this vision can become a reality. Yes, a lot of work has been done on this subject, and there is still a lot of work needed. What we need at this time is not so much additional research in voice operated computing, natural language processing or artificial intelligence, but rather a paradigm shift - a change in the way that we approach natural language processing and artificial intelligence. Not only do we have to reevaluate this, but we also need to switch gears in our understanding of the role played by natural language processing in scientific theory. Additionally, there are many other areas that will also need changing (at least in the ways we think of them) because they too will be affected by the above-mentioned paradigm shift.

I am sure that some of you reading this will fail to see the relevance of or the need for such a system. Well, for you naysayers, imagine a blind person being able to use a computer just as you would, but with nothing more than a microphone and speakers instead of a keyboard and a pointing device. Or, in your case as a sighted person, imagine being in a car and needing some information on the fly. Would it not be fun if you could boot your computer and get the information you need just by talking to it, and not only that, but talk to it in exactly the same way as you would talk to your peers, or should I say the same way as you would command your secretary or personal assistant (assuming you have one)? Actually, a system such as this eliminates the need for a personal assistant, don’t you think? I guess the question that arises then is: how do we create such a system, given that we are not experts in voice recognition, artificial intelligence, linguistics or, for that matter, psychology (those being the different groups of people that will be involved)?

Did someone just say, “Use the Delphi Method”? That may just have been in my head, but yes, that is the methodology that should be used for studying, or should I say finding a solution to, a problem such as this one. What is the Delphi Method? Good question. The Delphi Method was developed as a means of seeking the opinions of experts on given problems without the need to have them in the same place at the same time. Cool huh!! The Delphi Method uses a group communication structure that facilitates discussion of a specific task (http://pespmc1.vub.ac.be, 2011). This method usually involves anonymity of responses and feedback to the group as a whole or individually, while at the same time giving participants the chance to revise earlier judgment calls (http://pespmc1.vub.ac.be, 2011). The Delphi Method thus queries experts on a subject, and following this the collected information is sent to all involved parties, allowing them to reconsider their previous answers based on the responses of the others.
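For those who think better in code, the round-by-round feedback loop described above can be sketched roughly as follows. This is a toy under loud assumptions: each expert's opinion is boiled down to a single number (say, years until true voice operated computing), the facilitator feeds back only the anonymous group median, and every expert moves a fixed fraction (`pull`) toward that median. Real Delphi studies use questionnaires and written arguments, and real experts do not revise by formula, so the revision rule and all the names here are hypothetical.

```python
from statistics import median


def delphi_round(estimates, pull=0.5):
    """Run one round: share the anonymous median, then let experts revise.

    `pull` is an assumed fraction of the gap to the median that each
    expert closes when reconsidering an earlier answer.
    """
    feedback = median(estimates)
    revised = [e + pull * (feedback - e) for e in estimates]
    return feedback, revised


def run_delphi(estimates, rounds=3, pull=0.5):
    """Return the list of estimates after each round, starting with round 0."""
    history = [list(estimates)]
    for _ in range(rounds):
        _, estimates = delphi_round(estimates, pull)
        history.append(estimates)
    return history


def spread(estimates):
    """Gap between the most extreme opinions, a crude measure of disagreement."""
    return max(estimates) - min(estimates)


if __name__ == "__main__":
    # Four hypothetical experts, answers in years.
    for round_no, answers in enumerate(run_delphi([5, 10, 20, 40])):
        print(round_no, answers, "spread:", spread(answers))
```

Nobody in the loop ever sees who said what, only the group figure, and the spread between answers shrinks each round as people reconsider. That is the consensus-building part of the method in miniature.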

I guess the question on everyone’s mind at this point is: why would you use this method? What is so special about it that does not exist in the other methods (Nominal Group Technique or PMI, Plus-Minus-Interesting)? The beauty of the Delphi Method is that it is very well suited to the discussion of questions or issues that must be tackled by a distributed group of experts, i.e. experts that are not located in the same area or field and cannot logistically be brought together. Additionally, the Delphi Method has the advantage that it can be used, or should be used, in situations that require a consensus to be reached. By far the greatest advantage of the Delphi Method, in my opinion, is the fact that it is very effective when past data is absent. In our case there is a lot of past data, but this data and the relevant parties that need to extract and manipulate it are by no means centrally located.

Additionally, and by no means the least of its advantages, the Delphi Method is very useful when forecasting of new technology is needed (http://web1.msue.msu.edu/msue/imp/modii/iii00006.html), as is the case here.
Furthermore, the Delphi Method allows participants to remain anonymous, which in turn has the advantage of reducing social pressures, personality conflicts and individual dominance issues (http://web1.msue.msu.edu/msue/imp/modii/iii00006.html). The Delphi Method also has the advantage of educating its respondents on all the diverse and interrelated parts of the issue or technology being investigated.

Now to the downside (yes there is a downside to everything):
Much as the Delphi Method is a great tool, it is not always great to use. For one thing, the consensus reached is the opinion of a select few, which is by no means representative of the population as a whole (I am sure you understand this one… if you don’t, there is a statistics class with your name on it). The Delphi Method is also lacking because it has a tendency to create middle of the road positions just so a consensus can be reached. This tendency eliminates extreme positions on the right and left of the norm (http://web1.msue.msu.edu/msue/imp/modii/iii00006.html). Finally, and by no means the least of its worries, the Delphi Method should not be used as the only forecasting tool in the box; using it alone will only lead to a skewed forecast. Phew!!! That was a long one… time for a breather… breather taken, let’s continue...


We have thus far looked at voice-operated computing from the research standpoint, since we have been concerned with how we will achieve this goal. Let’s now look at some factors that will make or break this dream.
Forces For:
Technological:
Did I already mention that voice-operated computing is not a new idea? It has existed for some time now. It is the implementation that is lacking. With that being said, technology is advancing rapidly, and it is thus not a far-fetched idea that one day we will arrive at true voice operated computing. This idea is backed by the massive amount of research and the many research groups that exist within the fields of natural language processing, linguistics and artificial intelligence (AI).
Cultural:
Today’s world is turning us into very lazy people (and I don’t mean that in a bad way). We want to be able to do things faster and with very little effort. Typing will soon be a thing of the past (actually it is a thing of the past if you have the money to shell out for one of those dictation systems), but this goes far beyond that. This system, in my opinion, will lead to a new way of doing things, a new way of human-computer interaction.

Forces Against:
Socio-technological:
I guess the greatest force that would work against this kind of system is the set of socio-technological constraints involved in building it. What do I mean by this? The saying “too many cooks spoil the soup” comes to mind. We have already ascertained that for a system such as this to work it would take more than just hardware engineers and software programmers; it would also take linguists, psychologists and what have you. With this multiplicity of players, each with their own biases and ways and means of doing things, one can only marvel at the chaos that may ensue. In that same light, it is my belief that the reason we have not arrived at such a system yet is that there are way too many players in the game. Much as having many players may be a good thing, it also means that there is a duplication or triplication of efforts.

Financial:
Money, the root of everything evil, rears its ugly head again. It is always fine and dandy to have people working on stuff, but ideas such as these need backing, both financial and otherwise. The big boys such as IBM and Microsoft have funding programs for these kinds of ideas, but then again that also means that they have control over how the projects are developed and implemented. Their open source counterparts, on the other hand, must contend with the time available from volunteers, which is very often not forthcoming. For an idea such as this one to fully work, there should be financial backing with no strings attached, allowing for full creativity of all parties concerned… yeah, that is going to happen.


