I would prefer to use the MS Speech Engine instead of the Sphinx-4 engine (Linux based, right?). So I will build the server side as a SOAP/WSDL web service: the user sends the JPG and WAV files inside an XML request, and the result comes back as XML as well. Everything will be written in ASP.NET using C# 2008. My question is: which camera series do you plan to use for taking the photos? Does it have an SDK or API? Let me know.
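To make the "JPG and WAV in XML" idea concrete, here is a rough sketch of how the payload could be packed and unpacked. It is in Python only for brevity; the real service would be C#. The element names `RecognitionRequest`, `Image` and `Audio` are placeholders I made up, not an agreed schema, and this is the bare message body rather than a full SOAP envelope. The files are base64-encoded because raw binary data cannot be embedded in XML directly.

```python
import base64
import xml.etree.ElementTree as ET


def build_request(jpg_bytes: bytes, wav_bytes: bytes) -> bytes:
    """Pack a JPG and a WAV into one XML document (hypothetical schema)."""
    root = ET.Element("RecognitionRequest")  # placeholder element name
    image = ET.SubElement(root, "Image", {"format": "jpg"})
    image.text = base64.b64encode(jpg_bytes).decode("ascii")
    audio = ET.SubElement(root, "Audio", {"format": "wav"})
    audio.text = base64.b64encode(wav_bytes).decode("ascii")
    return ET.tostring(root, encoding="utf-8")


def parse_request(xml_bytes: bytes) -> tuple[bytes, bytes]:
    """Recover the original JPG and WAV bytes on the receiving side."""
    root = ET.fromstring(xml_bytes)
    jpg_bytes = base64.b64decode(root.findtext("Image"))
    wav_bytes = base64.b64decode(root.findtext("Audio"))
    return jpg_bytes, wav_bytes
```

Base64 makes the payload about a third larger, so for big WAV files we may want to agree on a size limit or compress the audio first.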