The Shtoom Application Layer
----------------------------

Document version $Revision: 1.3 $

Currently Shtoom has four layers: the UI, which talks to the SIPPhone
(which handles the SIP protocol); the RTP layer, which is created and
controlled by the SIPPhone layer; and the audio layer, which delivers
audio to (and receives audio from) the RTP layer.

This is fine for a simple phone, but more complex applications need
something else.

The Application
===============

Instead of connecting directly to the audio layer, the RTP layer will
connect to a shtoom.app instance. A simple example is shtoom.app.phone,
the equivalent of the existing phone. It merely reads audio from the
audio device and passes it to the RTP layer, and writes audio received
from the RTP layer to the audio device.

In addition, the application becomes the main object in the program. It
creates the UI and the SIP layer. This removes the direct link between
the SIP layer and the UI; instead, both talk to the application.

A slightly more complex application would be a simple echo server. When
a call starts, it plays an announcement, records 10 seconds of audio,
then plays that audio back before closing the connection.

The Application Interface
=========================

The application interface is pretty simple. Later on, I might factor
some of the RTP handling out into higher-level events.

    def __init__(self):
        """ Create the application. The application should create the
            SIP listener and any user interface that's needed.
        """

    def acceptCall(self, callcookie, calldesc):
        """ An incoming call. 'callcookie' is a unique identifier used
            for all further dealings with the application about this
            call. 'calldesc' describes the new call, in a magic way
            that's yet to be decided.

            Returns a Deferred: .callback() will be invoked if the call
            is to be accepted, or .errback() if it is to be rejected.
        """

    def startCall(self, callcookie, cb):
        """ Call setup is complete; the call is now live.
            'cb' is a callback that the application invokes when it
            wishes to terminate the call. It is passed a reason for the
            call teardown.
        """

    def endCall(self, callcookie, reason):
        """ The other end has terminated the call.
        """

    def receiveRTP(self, callcookie, payloadType, payloadData):
        """ Pass an RTP packet that was received from the network to
            the application.
        """

    def giveRTP(self, callcookie):
        """ The network layer wants an RTP packet to send.
            Return a 2-tuple of (payloadType, payloadData).
        """

The echo server
===============

When startCall is called, start playing the announcement. While it is
playing, throw away all incoming audio. Possibly also look for DTMF, and
use it to cancel the announcement.

When the announcement is finished, switch to recording mode (recording
either to memory or to a temp file), and use callLater to end the
recording after 10 seconds. During this phase, deliver CN (comfort noise)
RTP packets whenever the network layer asks for a packet via giveRTP.

Once the timer expires, switch to playback mode and start delivering the
recorded audio. Once that's done, hang up the call.

The conferencing server
=======================

A more complex application is the conferencing server. Here, multiple
RTP layers are created (one for each incoming caller), all connected to
the same application. The application keeps track of all the
participants in a conference (or "room", to use the usual term). RTP
received from one participant is queued for delivery to every other
participant in the same conference.

Writing applications
====================

Coding applications for Cisco voice gateways involves writing a large
state machine in Tcl. I don't think I need to go down that path from the
start; YAGNI clearly applies here. On the other hand, the application
layer will probably need a fair amount of refactoring to make writing
these applications as simple as possible. I'm open to ideas here.
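To make the echo server description more concrete, here is a minimal sketch of it as a small Python state machine written against the application interface above. It is an illustration under stated assumptions, not Shtoom code: the EchoApp class, the Phase names, the 'schedule' parameter (a stand-in for Twisted's reactor.callLater, so the sketch runs without a reactor) and the 'announcement' list of pre-encoded (payloadType, payloadData) packets are all hypothetical. acceptCall is left out because it needs a Twisted Deferred, and DTMF handling is not attempted. Comfort noise uses static payload type 13 from the RTP audio/video profile (RFC 3551), with a payload consisting of a single noise-level byte as defined in RFC 3389.

```python
from collections import deque
from enum import Enum, auto

# Comfort noise (CN) is static payload type 13 in the RTP audio/video profile
# (RFC 3551). A CN payload starts with a noise-level byte in -dBov (RFC 3389);
# 127 means -127 dBov, i.e. effectively silence.
PT_CN = 13
CN_PAYLOAD = bytes([127])


class Phase(Enum):
    ANNOUNCE = auto()  # playing the announcement; incoming audio is discarded
    RECORD = auto()    # collecting the caller's audio; we send comfort noise
    PLAYBACK = auto()  # sending the recorded audio back
    DONE = auto()      # playback finished and hangup requested


class EchoApp:
    """Hypothetical echo-server application (a sketch, not Shtoom code)."""

    def __init__(self, schedule, announcement, record_seconds=10):
        self.schedule = schedule            # assumption: stand-in for reactor.callLater
        self.announcement = announcement    # assumption: pre-encoded packets
        self.record_seconds = record_seconds
        self.calls = {}                     # callcookie -> per-call state

    def startCall(self, callcookie, cb):
        self.calls[callcookie] = {
            "phase": Phase.ANNOUNCE,
            "outgoing": deque(self.announcement),
            "recorded": [],
            "hangup": cb,                   # cb(reason) ends the call
        }

    def endCall(self, callcookie, reason):
        # The other end hung up: forget the call.
        self.calls.pop(callcookie, None)

    def receiveRTP(self, callcookie, payloadType, payloadData):
        call = self.calls.get(callcookie)
        # Keep incoming audio only while recording; drop it in every other phase.
        if call is not None and call["phase"] is Phase.RECORD:
            call["recorded"].append((payloadType, payloadData))

    def giveRTP(self, callcookie):
        call = self.calls.get(callcookie)
        if call is None or call["phase"] is Phase.RECORD:
            return (PT_CN, CN_PAYLOAD)
        if call["outgoing"]:
            return call["outgoing"].popleft()
        # Nothing left to send, so the current phase is over.
        if call["phase"] is Phase.ANNOUNCE:
            call["phase"] = Phase.RECORD
            self.schedule(self.record_seconds, self._stopRecording, callcookie)
        elif call["phase"] is Phase.PLAYBACK:
            call["phase"] = Phase.DONE
            call["hangup"]("echo complete")
        return (PT_CN, CN_PAYLOAD)

    def _stopRecording(self, callcookie):
        call = self.calls.get(callcookie)
        # The caller may have hung up before the timer fired.
        if call is not None and call["phase"] is Phase.RECORD:
            call["phase"] = Phase.PLAYBACK
            call["outgoing"] = deque(call["recorded"])
            call["recorded"] = []
```

The 'schedule' callable only has to accept (delay, function, *args), which is the calling convention of reactor.callLater, so a real application could pass that in directly.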
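The conferencing server's fan-out can be sketched in the same style. Again this is hypothetical: ConferenceApp, the 'join' method (how a caller picks a room is still undecided, since calldesc is not yet defined) and the 'max_queued' bound are assumptions, not part of the interface. It does only what the text describes, queuing each received packet for every other participant in the room; it does not mix audio, so packets from people speaking at the same time are interleaved in a listener's queue rather than combined, which a real bridge would need to address.

```python
from collections import defaultdict, deque

PT_CN = 13                 # comfort noise, static payload type (RFC 3551)
CN_PAYLOAD = bytes([127])  # noise level -127 dBov (RFC 3389)


class ConferenceApp:
    """Hypothetical conferencing application (a sketch, not Shtoom code)."""

    def __init__(self, max_queued=50):
        self.max_queued = max_queued      # assumption: per-listener backlog bound
        self.room_of = {}                 # callcookie -> room name
        self.members = defaultdict(set)   # room name -> set of callcookies
        self.queues = {}                  # callcookie -> packets waiting to be sent

    def join(self, callcookie, room):
        """Hypothetical helper: put a call into a room."""
        self.room_of[callcookie] = room
        self.members[room].add(callcookie)
        self.queues[callcookie] = deque(maxlen=self.max_queued)

    def endCall(self, callcookie, reason):
        room = self.room_of.pop(callcookie, None)
        if room is not None:
            self.members[room].discard(callcookie)
            if not self.members[room]:
                del self.members[room]    # last participant left: drop the room
        self.queues.pop(callcookie, None)

    def receiveRTP(self, callcookie, payloadType, payloadData):
        room = self.room_of.get(callcookie)
        if room is None:
            return
        # Queue the packet for everyone else in the same room.
        for other in self.members[room]:
            if other != callcookie:
                self.queues[other].append((payloadType, payloadData))

    def giveRTP(self, callcookie):
        queue = self.queues.get(callcookie)
        if queue:
            return queue.popleft()
        return (PT_CN, CN_PAYLOAD)        # nothing queued: send comfort noise
```

Bounding each queue with deque(maxlen=...) means a participant who falls behind loses the oldest packets instead of accumulating an unbounded backlog.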
Further application events
==========================

Going forward, the application layer will get a set of convenience
methods covering the basic functionality: start playing a fixed file,
start recording to a file (or to memory). Events are then generated when
playback completes (in the first case), when the user hangs up the
phone, or when they do something else notable.

Start simple, and add functionality as it's needed.

After thinking a bit about writing a toy language to define the
applications, I think I prefer sticking with Python for it. We'll see
how this goes in the future.
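As one possible shape for these convenience methods and events, here is a minimal sketch of a per-call helper class in Python. Every name in it (CallHandler, playFile, startRecording, stopRecording, playbackComplete, callEnded, Greeter) is hypothetical, and it assumes the played file already holds raw G.711 mu-law audio (static payload type 0), cut into 160-byte packets: 20 ms at 8000 samples per second with one byte per sample. An application would forward its receiveRTP, giveRTP and endCall calls for a given callcookie to the matching handler.

```python
from collections import deque

PT_PCMU = 0                # G.711 mu-law, static payload type (RFC 3551)
PT_CN = 13                 # comfort noise, static payload type (RFC 3551)
CN_PAYLOAD = bytes([127])  # noise level -127 dBov (RFC 3389)
PACKET_BYTES = 160         # 20 ms at 8000 samples/s, one mu-law byte per sample


class CallHandler:
    """Hypothetical per-call helper: convenience methods plus event hooks."""

    def __init__(self, hangup):
        self.hangup = hangup      # the cb given to startCall; call it with a reason
        self.outgoing = deque()   # packets waiting to be sent
        self.recording = None     # list of received packets while recording

    # Convenience methods

    def playFile(self, path):
        """Queue a file of raw mu-law audio, in packets of up to 20 ms."""
        with open(path, "rb") as f:
            data = f.read()
        for start in range(0, len(data), PACKET_BYTES):
            self.outgoing.append((PT_PCMU, data[start:start + PACKET_BYTES]))

    def startRecording(self):
        self.recording = []

    def stopRecording(self):
        """Stop recording and return the captured (payloadType, payloadData) list."""
        recorded, self.recording = self.recording, None
        return recorded

    # Events: subclasses override the ones they care about

    def playbackComplete(self):
        pass

    def callEnded(self, reason):
        pass

    # Plumbing, driven by the application's receiveRTP/giveRTP/endCall

    def receiveRTP(self, payloadType, payloadData):
        if self.recording is not None:
            self.recording.append((payloadType, payloadData))

    def giveRTP(self):
        if self.outgoing:
            packet = self.outgoing.popleft()
            if not self.outgoing:
                # Fires as the last queued packet is handed to the RTP layer.
                self.playbackComplete()
            return packet
        return (PT_CN, CN_PAYLOAD)

    def endCall(self, reason):
        self.outgoing.clear()
        self.recording = None
        self.callEnded(reason)


class Greeter(CallHandler):
    """Example: hang up as soon as the greeting has been sent."""

    def playbackComplete(self):
        self.hangup("greeting finished")
```

With this split, an application such as Greeter only overrides the events it cares about and never touches packet timing directly, which is in the spirit of starting simple and adding functionality as it's needed.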