2 Nov

The Grand Unified Asterisk API

(This post covers the what of the API rather than the how. The how of APIs – REST vs. RPC, sync vs. async – will be covered in a future post.)

The problem with programming is knowing when to stop.  The key to knowing when to stop is setting goals.  To that end, here are my goals for the new API:

  • After the initial installation, I do not want to have to use the Linux shell
  • I do not want to have to directly edit configuration files
  • I want to be able to script the configuration of a complex Asterisk server instance
  • I want to be able to build a management console that uses API calls to manage every aspect of a system (or cluster of systems)
  • I want to easily write voice applications in real languages: JavaScript (Node.js), Python, Perl, PHP, Clojure, Scala, .NET, etc.
  • I want performance that is 90% of what you can get using the Dialplan
  • I want to be able to build my own version of the monolithic apps currently trapped inside Asterisk (voicemail, queues, parking, etc.)

For a few stretch goals:

  • I want one Asterisk system to be able to automatically mirror another
  • I want Asterisk systems to be able to discover and share load with each other

I’m not being too demanding, am I?

The Five Asterisk API Problem Domains

While there may be eight interfaces that govern the behavior of Asterisk, they really serve only five purposes: configuration, provisioning, introspection, systems management and call control. Configuration sets the ground rules for operation, provisioning handles the lifecycle of dynamic structures like accounts and extensions, introspection lets systems and people on the outside know what’s going on inside, systems management provides tools for controlling running instances, and call control is all about telling Asterisk what to do with channels. Let’s take a look at each of these in some detail.

Configuration

The Way It Is Today

Asterisk has hundreds, possibly thousands, of settings that govern its behavior. Configuration is primarily controlled by configuration files or, in some cases, alternate configuration technologies like the Asterisk Realtime Architecture. The base configuration is read from the static store when the system starts, and again when changes need to be applied to the system. The Command Line Interface (CLI) has some ability to alter runtime configuration, but the changes are not persisted. The Asterisk Manager Interface (AMI) includes a number of actions that allow manipulation of the configuration files, but these are built for bulk operations, not for simply altering individual settings. In effect, there is no programmatic way to alter the configuration of an Asterisk system short of manipulating the configuration files directly and then forcing a configuration reload.

The Way It Should Be

The entire process by which Asterisk is configured is backwards. The configuration file methodology is great for systems administrators: the configuration is easily human-readable, can be copied from system to system with a single command, and can be edited using any of a thousand text editors. From a programmer’s standpoint, it sucks. We should have a means by which both parties can be happy. The system should be capable of starting without any configuration files, using the system defaults. Changes to the configuration should be made programmatically by way of a structured API. Those changes should be persisted to configuration files (or other data stores), which will be read in the next time the system starts up. The CLI should use the configuration API to effect the same kind of change. The use of “reload” should be minimized or eliminated altogether.
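To make that concrete, here is a minimal Python sketch of the idea. It is not anything that exists in Asterisk today: the store boots from built-in defaults with no file at all, applies each change to the running system immediately, and persists only the overrides as ini-style text that is read back in on the next start. `ConfigStore` and the option names are made up for illustration.

```python
import configparser
import io

# Illustrative built-in defaults; real option names would come from each module.
DEFAULTS = {
    "general": {"port": "5060", "max_calls": "100"},
}


class ConfigStore:
    """Runtime configuration that boots from defaults and persists overrides."""

    def __init__(self, defaults):
        # Live values start as a copy of the defaults: no file is required.
        self._live = {section: dict(opts) for section, opts in defaults.items()}
        # Only explicit changes are recorded for persistence.
        self._overrides = configparser.ConfigParser()

    def get(self, section, key):
        return self._live[section][key]

    def set(self, section, key, value, validate=None):
        # Reject bad values before touching live or persisted state.
        if validate is not None and not validate(value):
            raise ValueError(f"invalid value for {section}.{key}: {value!r}")
        # Apply to the running system immediately; no reload needed.
        self._live.setdefault(section, {})[key] = value
        # Record the change so it is read back in on the next startup.
        if not self._overrides.has_section(section):
            self._overrides.add_section(section)
        self._overrides.set(section, key, value)

    def persist(self):
        """Render the overrides as ini-style text a sysadmin can still read."""
        buf = io.StringIO()
        self._overrides.write(buf)
        return buf.getvalue()

    @classmethod
    def load(cls, defaults, text):
        """Start from defaults, then layer previously persisted overrides on top."""
        store = cls(defaults)
        saved = configparser.ConfigParser()
        saved.read_string(text)
        for section in saved.sections():
            for key, value in saved.items(section):
                store.set(section, key, value)
        return store


if __name__ == "__main__":
    store = ConfigStore(DEFAULTS)
    store.set("general", "max_calls", "250", validate=str.isdigit)
    print(store.persist())
```

The file becomes an output of the API rather than its input: an administrator can still read it or copy it to another box, but nobody has to hand-edit it or issue a reload.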

Provisioning

The Way It Is Today

Many of the modules within Asterisk make use of dynamic constructs: data that is specific to the instance and that changes over time. The most obvious examples are account structures: SIP, IAX2 and all of the VoIP channels use dynamic structures to represent and manage external entities (peers, users, etc.). Like the base configuration, these are currently defined in configuration files or the Realtime subsystem.  And like the base configuration, this is a one-way street: Asterisk reads provisioning structures, it does not write them.  This means that programmers have to create an external tool for programmatically managing provisioning data and use the same brute force process for getting those changes loaded into the system.

Another, less obvious example is audio prompts.  Today there is no formal provisioning interface or even a method for cataloging the audio installed on a system: we simply drop files into the file system and hope we have what we need. If you need to carefully manage the audio on the system you will wind up building your own catalog and managing it externally.

The Way It Should Be

Just as with the base configuration, the provisioning interface should provide a programmatic means of managing accounts and other dynamic structures. The system should validate the data, create / modify / destroy the in-memory object and then create / modify / delete the persistent representation of the object. The persistent representation can be stored in any form – text file, database, etc. Events should be generated that include a complete copy of the entity data, such that remote listeners can automatically mirror the changes. Dynamic data related to the entity (think registration information, subscriptions, hints) should be stored in an accessible data store (think Redis) in such a way as to make clustering simple.
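A rough sketch of that flow, assuming a plain dict stands in for the persistent store and a plain callback stands in for the event bus (both assumptions; a database or Redis would play these roles in practice). Every create, modify or delete validates first, updates memory and storage, then publishes an event carrying the full entity, which is enough for a second instance to mirror the first without ever reading its files. `Peer`, `Provisioner` and `Mirror` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Peer:
    # Hypothetical account fields, for illustration only.
    name: str
    secret: str
    context: str = "default"


class Provisioner:
    def __init__(self, store, listeners):
        self._objects = {}            # live, in-memory objects
        self._store = store           # persistent representation (file, DB, ...)
        self._listeners = listeners   # anything that wants to hear about changes

    def _emit(self, action, name, entity):
        # Each event carries a complete copy of the entity, so a remote
        # listener can mirror the change without querying back.
        event = json.dumps({"action": action, "name": name, "entity": entity})
        for listener in self._listeners:
            listener(event)

    def put(self, peer):
        """Create or modify: validate, update memory, persist, then notify."""
        if not peer.name or not peer.secret:
            raise ValueError("name and secret are required")
        action = "modified" if peer.name in self._objects else "created"
        self._objects[peer.name] = peer
        self._store[peer.name] = asdict(peer)
        self._emit(action, peer.name, asdict(peer))

    def delete(self, name):
        """Destroy the in-memory object and its persistent representation."""
        del self._objects[name]
        del self._store[name]
        self._emit("deleted", name, None)


class Mirror:
    """A second instance that rebuilds its peers purely from events."""

    def __init__(self):
        self.peers = {}

    def __call__(self, event):
        msg = json.loads(event)
        if msg["action"] == "deleted":
            self.peers.pop(msg["name"], None)
        else:
            self.peers[msg["name"]] = Peer(**msg["entity"])
```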

Introspection

The Way It Is Today

Introspection allows external systems to query or in some cases passively observe what is happening inside another system.  Today Asterisk has two mechanisms for introspection, AMI and the CLI.  AMI includes a handful of actions that return internal state information (SIPpeers, MailboxStatus, etc.) as well as providing access to a stream of events. AMI’s “Command” action allows AMI to invoke CLI commands, many of which return useful internal state information, but the data is formatted for human rather than machine consumption. At the channel level there are a number of introspective commands including the DumpChan Dialplan application, but as with the CLI commands this is designed for debug output rather than programmatic access.

The Way It Should Be

First and foremost, every introspective mechanism (request/response and unsolicited event) should be capable of formatting data in one or more standard machine-readable formats – JSON, XML, YAML. Anything, so long as it is a real standard for data interchange. Second, all of the request/response functions should be directly callable from any interface (i.e. the CLI, all programmatic interfaces). Third, the function names and parameters should be identical across all interfaces – no more “sip show peers” in CLI and “SIPpeers” in AMI.

Systems Management

The Way It Is Today

Systems management includes things like shutdown, load sharing, backup / restore, heartbeat and the installation/licensing of various add-on modules — the things that allow the operations team to manage one or one thousand Asterisk servers easily. Today the CLI covers some of this. More of it is either done manually from the Linux command shell or using external systems.

The Way It Should Be

We need to sit down with the DevOps teams for current Asterisk shops and find out what their pain points are. At the very least we should include API calls to handle graceful and crash shutdown, backup and restore of persistent configuration and provisioning data, and installation of binary add-ons.
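Backup and restore is the easiest of these to picture once configuration and provisioning live behind an API: both are just data, so a backup can be one versioned document. A minimal sketch, assuming JSON as the archive format; the function names and the `version` field are illustrative, not an existing interface.

```python
import json

BACKUP_VERSION = 1  # bump whenever the archive layout changes


def backup(config, provisioning):
    """Bundle persistent configuration and provisioning data into one document."""
    return json.dumps(
        {"version": BACKUP_VERSION, "config": config, "provisioning": provisioning},
        sort_keys=True,
    )


def restore(archive):
    """Parse a backup and return (config, provisioning), refusing unknown formats."""
    data = json.loads(archive)
    if data.get("version") != BACKUP_VERSION:
        raise ValueError(f"unsupported backup version: {data.get('version')!r}")
    return data["config"], data["provisioning"]
```

Applying the restored data would go through the same configuration and provisioning calls as any other change, so a restore needs no special path into the core.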

Call Control

The Way It Is Today

Call control is the meat and potatoes of telephony – the reason we’re all here.  Ironically, call control really comes down to a very short list of core actions: dial, answer, hangup, transfer, play, record and (for legacy reasons) collect digits. That’s it. Seven verbs. Everything else is gravy.

Today Asterisk calls are controlled using either the Dialplan scripting language or external interfaces like AGI and AMI. Dialplan programming is what it is – as much as I might like to move everyone over to a more elegant and maintainable interface, that is not something that can be done in the short term. So I’m going to ignore it. If you’re happy writing in a cryptic, limited language that started life as a configuration file, more power to you.

AGI is better than Dialplan in that you have access to a real programming language and all the libraries that go with it. The downside to AGI is that it provides only a very narrow view into Asterisk: there are over 300 Dialplan applications and functions, while there are fewer than 50 AGI commands. (All to support those seven basic verbs. Kind of funny, ain’t it.)
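For anyone who hasn’t written one, a classic AGI script looks roughly like this in Python. Asterisk starts the script, writes a block of `agi_*` variables to its standard input ending in a blank line, and from then on the script exchanges one text command for one `200 result=...` reply at a time. This is a minimal sketch with no error recovery; the streams are passed in as parameters (an arrangement chosen here only so the script can be exercised without Asterisk), and `hello-world` is one of the stock Asterisk sound files.

```python
import sys


def read_agi_env(stdin):
    """Read the agi_* variables Asterisk sends first, up to the blank line."""
    env = {}
    while True:
        line = stdin.readline().strip()
        if not line:  # blank line (or EOF) ends the variable block
            break
        key, _, value = line.partition(": ")
        env[key] = value
    return env


def command(stdin, stdout, cmd):
    """Send one AGI command and return the integer from its 'result=' field."""
    stdout.write(cmd + "\n")
    stdout.flush()
    reply = stdin.readline().strip()  # e.g. "200 result=0 endpos=12345"
    code, _, rest = reply.partition(" ")
    if code != "200":
        raise RuntimeError(f"AGI error: {reply}")
    return int(rest.split()[0].split("=", 1)[1])


def main(stdin=sys.stdin, stdout=sys.stdout):
    env = read_agi_env(stdin)
    command(stdin, stdout, "ANSWER")
    # STREAM FILE <file> <escape digits>; "" means no digit interrupts playback.
    command(stdin, stdout, 'STREAM FILE hello-world ""')
    return env


if __name__ == "__main__":
    main()
```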

The Way It Should Be

This is the tough one. Call control breaks down into two domains: channel-level call control and global call control. The Dialplan and AGI are both channel-level call control tools, while AMI is global. Channel-level interfaces are concerned with the first-person operation of a single channel in response to various events. Global interfaces play at a higher level, redirecting channels between various channel-level applications. The Dialplan and classic AGI are both synchronous interfaces, which means that they block (wait for function calls to complete) and execute instructions sequentially until a termination condition occurs. AMI and Async AGI, on the other hand, are asynchronous: function calls return immediately and the results of those calls are presented as events.

Synchronous call control is easy for the application developer but more difficult for the core developer, who has to start and stop execution on various threads as the underlying operations take place. Asynchronous call control is somewhat more complicated for the application developer, who needs to keep track of call state and respond to events in a timely fashion. The upside is that the core system is usually more stable and scalable, as it does not have to internally maintain as much state data or synchronize as many operations.
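The difference between the two styles is easiest to see side by side. In this asyncio sketch a fake channel (hypothetical, not any Asterisk interface) offers a blocking `play_sync`, where the caller simply waits for playback to finish, and a `play_async` that returns a request id at once and reports completion later as a `PlaybackFinished` event, which the application has to pick up and match to its request.

```python
import asyncio
import itertools


class FakeChannel:
    """Hypothetical channel whose playback simply takes a short, fixed time."""

    def __init__(self):
        self.events = asyncio.Queue()   # where asynchronous results show up
        self._ids = itertools.count(1)
        self._pending = set()           # keep references to running tasks

    async def play_sync(self, sound):
        # Synchronous style: the caller is blocked until playback completes.
        await asyncio.sleep(0.01)
        return f"finished {sound}"

    def play_async(self, sound):
        # Asynchronous style: hand back a request id immediately ...
        request_id = next(self._ids)
        task = asyncio.get_running_loop().create_task(self._finish(request_id, sound))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)
        return request_id

    async def _finish(self, request_id, sound):
        await asyncio.sleep(0.01)
        # ... and deliver the outcome later as an event.
        await self.events.put({"event": "PlaybackFinished", "id": request_id, "sound": sound})


async def demo():
    chan = FakeChannel()
    result = await chan.play_sync("welcome")
    request_id = chan.play_async("menu")   # returns before playback ends
    event = await chan.events.get()        # the app must wait for and match the event
    assert event["id"] == request_id
    return result, request_id, event


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The extra bookkeeping in `demo` (remembering the request id and matching the event to it) is exactly the burden the asynchronous style shifts onto the application developer.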

I like asynchronous programming, but I think it may be too much of a burden to pass on to the average application developer.  Call control is the meat of telephony, and making call control approachable to web developers is our number one goal.  The unfortunate result of this mandate is that we need two levels of call control API.  The high level interface should be built specifically for web programmers and should make use of web metaphors wherever possible.  The lower level interface should expose a much richer set of capabilities but at a heightened level of complexity.  It would seem logical to build the web API on top of the rich API.

The Web Call Control API

Web developers are accustomed to building applications that make use of the web’s request/response model: a request comes in from a web browser, their application generates some HTML and sends it back to the browser. The user interacts with the page – perhaps completes a form – and submits it.  The browser posts data back to the server. The app digests the data and generates another blob of HTML in response.  This continues as long as the user cares to interact with the app.

The same metaphor can be applied to realtime communications. The events are a bit different, but the overall flow is identical: a request (in the form of a call) comes in from a phone, the telephony server reads a table that maps the dialed number to a URL. It sends an HTTP request to the URL, passing various call-related information (the DID, Caller ID, etc.) to the app.  The app responds with a short set of instructions: answer the call, play a file, collect some digits.  The telephony server executes the instructions and passes the resulting digits back to the app.  The app processes the data as it would form data from a browser, then responds with more instructions.
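The server side of the first step in that exchange is small. A sketch under stated assumptions: the routing table, the URLs and the field names (`did`, `caller_id`, `digits`) are all made up, and the body uses ordinary form encoding so the app can read it exactly like a browser POST.

```python
from urllib.parse import urlencode

# Hypothetical table mapping dialed numbers (DIDs) to application URLs.
ROUTES = {
    "+15555550100": "https://apps.example.com/ivr",
    "+15555550101": "https://apps.example.com/support",
}


def build_request(did, caller_id, digits=None):
    """Return the URL and form-encoded body the telephony server would POST."""
    url = ROUTES[did]  # the dialed number picks the app, like a URL picks a page
    fields = {"did": did, "caller_id": caller_id}
    if digits is not None:
        # On follow-up requests, digits collected by the last instruction
        # play the role of submitted form data.
        fields["digits"] = digits
    return url, urlencode(fields)
```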

The telephony server is taking on the role of a browser. The web app is written using whatever web technology the developer prefers: Ruby, PHP, Python, etc. The web app has no idea that the device at the other end of the HTTP request is a telephony server. It sees POST and GET values and cookies. It does whatever it does and returns data formatted as some kind of machine-readable text: JSON or XML.
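To show that the app really is just a web app, here is a sketch that uses nothing beyond Python’s standard WSGI conventions: it reads the POSTed form fields and answers with a JSON list of instructions. The field names and the instruction vocabulary (`answer`, `play`, `collect`, `hangup`) are hypothetical, and cookie handling is left out for brevity.

```python
import json
from urllib.parse import parse_qs


def app(environ, start_response):
    """An ordinary WSGI app; it never knows the client is a telephony server."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
    digits = form.get("digits", [None])[0]
    if digits is None:
        # First request for this call: greet the caller and ask for a choice.
        instructions = [
            {"verb": "answer"},
            {"verb": "play", "file": "main-menu"},
            {"verb": "collect", "max_digits": 1},
        ]
    elif digits == "1":
        instructions = [{"verb": "play", "file": "sales"}, {"verb": "hangup"}]
    else:
        instructions = [{"verb": "play", "file": "goodbye"}, {"verb": "hangup"}]
    body = json.dumps(instructions).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because this is plain WSGI, it could be served by `wsgiref`, Gunicorn or any other WSGI server without changes.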

The advantages of this model include the number of tools available for server-side web development and, more importantly, the number of web developers. Asterisk is currently accessible to a very small audience. Web-enabling Asterisk would grow that audience tenfold.

The Core Call Control API

The web API is great because it exposes the key pieces that developers need to graft real-time communications into general purpose applications. What it does not do is expose the kind of interface one would need to build communications-specific or communications-intensive applications: things like full-scale PBX solutions, media gateways, carrier soft-switches, etc. That’s where the low-level call control API comes in. While the web API is pseudo-synchronous (it executes a stack of instructions in order until it hits a branching or termination point), the core API should be fully asynchronous.
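The pseudo-synchronous behavior amounts to a small interpreter on the server side. This sketch walks a JSON instruction list in order, stops at a branching point (`collect`) to hand the digits back to the app, and stops for good at `hangup`. The instruction format is hypothetical, as is `LoggingChannel`, a stand-in that just records what it was asked to do so the sketch runs without Asterisk.

```python
import json


def execute(instructions, channel):
    """Run instructions in order until a branching or termination point.

    Returns the form data to POST back to the app, or None if the call ended.
    """
    for step in json.loads(instructions):
        verb = step["verb"]
        if verb == "answer":
            channel.answer()
        elif verb == "play":
            channel.play(step["file"])
        elif verb == "collect":
            # Branching point: the app decides what happens next.
            return {"digits": channel.collect(step.get("max_digits", 1))}
        elif verb == "hangup":
            channel.hangup()
            return None  # termination point
        else:
            raise ValueError(f"unknown verb: {verb!r}")
    # Assumption: running out of instructions ends the call.
    channel.hangup()
    return None


class LoggingChannel:
    """Stand-in channel that records calls and replays scripted keypresses."""

    def __init__(self, keypresses=""):
        self.log = []
        self._keys = keypresses

    def answer(self):
        self.log.append("answer")

    def play(self, name):
        self.log.append(f"play {name}")

    def collect(self, max_digits):
        digits, self._keys = self._keys[:max_digits], self._keys[max_digits:]
        self.log.append(f"collect {digits}")
        return digits

    def hangup(self):
        self.log.append("hangup")
```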

The easy thing to do here is to build on the concept of Async AGI, but do it right.  Replace the hackish key/value pair format with something capable of expressing richer data structures more easily.  Use better plumbing – some kind of structured message bus.  Document the events and the state transitions that generate them. Include more data with each event, including details from the underlying channel. Provide immutable UUIDs for various structures – channels, bridges, etc.
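Here is what “richer data structures,” “immutable UUIDs” and “documented state transitions” might look like together. This is a sketch, not a proposed wire format: the states, event names and field layout are invented for illustration. A table lists every legal transition and the event it produces, and each event is a nested JSON document carrying the channel’s details plus a UUID assigned once when the channel object is created.

```python
import json
import uuid
from dataclasses import dataclass, field

# Hypothetical documented state machine: each legal transition and its event.
TRANSITIONS = {
    ("down", "ringing"): "ChannelRinging",
    ("down", "up"): "ChannelAnswered",
    ("ringing", "up"): "ChannelAnswered",
    ("ringing", "hungup"): "ChannelHangup",
    ("up", "hungup"): "ChannelHangup",
}


@dataclass
class Channel:
    name: str
    caller_id: str
    state: str = "down"
    # Assigned once at creation and never changed afterwards.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def transition(self, new_state):
        """Move to new_state and return the structured event it generates."""
        event_type = TRANSITIONS.get((self.state, new_state))
        if event_type is None:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        previous, self.state = self.state, new_state
        # Nested structure with channel details, not a flat key/value list.
        return json.dumps({
            "type": event_type,
            "previous_state": previous,
            "channel": {
                "id": self.id,
                "name": self.name,
                "caller_id": self.caller_id,
                "state": self.state,
            },
        })
```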

The events are what spark action on the part of the consuming application. The application then needs to be able to make calls to respond to those events. These could take the form of either REST or RPC-style requests that invoke various actions. Asterisk already has everything it needs to do this. The trick here is in cleaning it up and making it work asynchronously. The seven core call control functions probably break out into twenty or thirty more fine-grained variations on the basic theme.
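On the consuming side, the application keeps its own per-call state and answers each event with a request. In this sketch the requests are REST-style (method, path, body) tuples handed to a `send` callable, which could be an HTTP client or a message-bus publisher; the paths such as `/channels/{id}/answer`, the event names and the event fields are all hypothetical.

```python
import json


class IvrApp:
    """Tracks per-channel state and answers events with REST-style requests."""

    def __init__(self, send):
        self.send = send   # e.g. an HTTP client or message-bus publisher
        self.calls = {}    # channel id -> application-level call state

    def on_event(self, raw):
        event = json.loads(raw)
        chan_id = event["channel"]["id"]
        kind = event["type"]
        if kind == "ChannelRinging":
            self.calls[chan_id] = "answering"
            self.send("POST", f"/channels/{chan_id}/answer", None)
        elif kind == "ChannelAnswered" and self.calls.get(chan_id) == "answering":
            self.calls[chan_id] = "greeting"
            self.send("POST", f"/channels/{chan_id}/play", {"media": "welcome"})
        elif kind == "ChannelHangup":
            self.calls.pop(chan_id, None)  # forget state for finished calls
```

The call state lives in the application, not in the core, which is the trade the asynchronous design makes in exchange for a simpler, more scalable server.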

Where the web API makes it easy to communications-enable other applications, the core API should expose everything necessary to build serious communications applications: phone systems, call center solutions, soft-switch platforms, etc.

Next up: how to make all this happen.