28 Oct

The Eight APIs of Asterisk

(The first of a multi-part series on APIs for Asterisk, this article covers the current interfaces, their strengths and weaknesses.  Check back in a few days for part two: a proposal for an enhanced, unified interface for Asterisk programming.)

At the AstriDevCon this year we talked about a number of aggressive projects to make significant improvements to the platform. At the top of the list were a rewrite of our SIP channel and better APIs for application development. Both efforts are important and, frankly, long overdue. I’ll leave the SIP efforts to others and concentrate on the APIs.

Asterisk currently includes a total of eight interfaces:

  1. The internal C language interfaces
  2. The Dialplan scripting language(s)
  3. The Asterisk Gateway Interface (AGI)
  4. The Asterisk Manager Interface (AMI)
  5. The External IVR interface / protocol
  6. The Asterisk Command Line Interface (CLI)
  7. The outgoing call file spool
  8. The Asterisk configuration files

These interfaces provide various means of interfacing with Asterisk. Unfortunately, they’re inconsistent, awkwardly structured, poorly documented and generally developer-unfriendly. Let’s take a look at the pros and cons of each.

The C API

Asterisk’s internal C API is used to write applications and functions which are then exposed to the Dialplan and other interfaces.  On the positive side, the resulting apps are fast, have virtually no overhead and have access to all of the facilities of Asterisk. Unfortunately this comes at a high price: you must be quite good at C programming; you run the risk of crashing the entire instance if your application encounters a fault; you must live with the threading constraints of the platform; and perhaps most importantly, you have to update your app every time the core API changes.

Keep in mind that Asterisk was developed as an application rather than a platform.  The idea of building a formal internal API wasn’t on the agenda when Mark sat down and started coding the original version of Asterisk. By the time it became apparent that Asterisk was in fact a platform, much of the key functionality was already spread across multiple modules. Over the years some effort has gone into cleaning this up, but a complete refactoring would be difficult and disruptive.

In the early days, the C API was the only way to create truly powerful applications. That’s why things like voicemail, parking, queues and conferencing were built as complex, monolithic structures. They’re difficult to maintain, difficult to enhance and impose a specific set of policy-level decisions on the user – something that’s bad news in a “platform” product.  The kinds of things that should be built at this level are atomic components: building blocks that abstract the complexities without imposing policies.

Pros: powerful, fast, low overhead

Cons: complex, any bug can crash the entire instance, changes from version-to-version, not opaque

The Dialplan

The Dialplan Language started out as “the dial plan” for Asterisk: a simple configuration file that mapped phone numbers to applications. As Asterisk evolved from a simple PBX into a platform, the configuration file evolved into a really ugly scripting language. A better solution would have been to embed an existing scripting language into Asterisk, but at the time (roughly 1999) that would have introduced more overhead than Mark was willing to accept.

As a result, the development team added all of the trappings of a scripting language to the configuration processor: variables, functions, expressions, subroutines, etc. The language that evolved is structurally similar to classical BASIC and includes many of the bad habits of that language.  It is not particularly difficult to learn but is awkward to use.  Where most scripting languages have the advantage of a large and well developed base of extensions / libraries, the Dialplan has only a few primitive mechanisms for interfacing with outside resources (think databases, web services, directories, etc.).

The Dialplan also sees the world from the perspective of an individual Asterisk channel. It has no means of knowing the state of other channels or of the system as a whole. Dialplan scripts are strictly synchronous. Calls to any external facilities (databases, web services, etc.) block and can only be interrupted by the hangup event. Data structures are primitive at best. Many functions set or respond to “magic” variables that are frequently under-documented or undocumented. The overall result is a messy, awkward and utterly proprietary domain-specific language that tends to scare developers away.

For better or for worse, the Dialplan is what many consider to be the primary means of building Asterisk applications and has become deeply incorporated into most Asterisk solutions, which makes it impossible to simply chuck it out. The best option is probably to recommend against using it for new developments / deployments but to continue to support it for the foreseeable future.

Pros: no external dependencies, relatively simple, fast

Cons: horribly arcane syntax, limited tools for accessing external data, reinvents the wheel

The Asterisk Gateway Interface (AGI)

AGI was the first attempt at a means of stepping around the limitations of the Dialplan and exposing the capabilities of Asterisk to outside programming languages.  AGI is modeled on CGI (Common Gateway Interface) – a standard means for generating dynamic web pages using any programming language which supports standard input and standard output.  Invoked from the Dialplan, AGI spawns a process, passes in a number of stock state values (variables) over the stdin channel, then waits for the newly spawned process to respond over stdout.

The AGI interpreter includes a limited vocabulary of functions. Most are duplicates of low-level capabilities exposed by Dialplan applications: play, record, say, etc. Others are used to navigate the Dialplan or to directly execute Dialplan applications. Given enough effort, anything that can be done from the Dialplan can also be done from AGI.  The advantage is that it all happens in the context of a real programming language. Developers don’t need to learn the Dialplan syntax, just the Asterisk vocabulary.
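To make the stdin / stdout handshake concrete, here is a minimal sketch of an AGI script in Python. It assumes the usual shape of the protocol: a block of "agi_name: value" lines ended by a blank line, then one single-line reply such as "200 result=0" per command. The function names and the "hello-world" prompt are placeholders of mine, and multi-line error replies are not handled.

```python
import sys


def read_agi_environment(reader):
    """Collect the 'agi_*: value' lines Asterisk sends first; a blank line ends them."""
    env = {}
    while True:
        line = reader.readline().strip()
        if not line:  # blank line (or EOF) marks the end of the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def parse_agi_reply(line):
    """Turn a single-line reply such as '200 result=0' into (200, '0')."""
    code, _, rest = line.strip().partition(" ")
    if not code.isdigit():
        return None, None  # EOF, or a reply shape this sketch does not handle
    result = None
    for token in rest.split():
        if token.startswith("result="):
            result = token[len("result="):]
            break
    return int(code), result


def agi_command(command, reader, writer):
    """Write one command, flush it, and wait (synchronously) for the reply."""
    writer.write(command + "\n")
    writer.flush()
    return parse_agi_reply(reader.readline())


def run_call(reader, writer):
    """A tiny call flow: answer the channel, then play one prompt."""
    env = read_agi_environment(reader)
    agi_command("ANSWER", reader, writer)
    # 'hello-world' is a placeholder prompt; "" means no escape digits.
    agi_command('STREAM FILE hello-world ""', reader, writer)
    return env
```

A real script would finish with run_call(sys.stdin, sys.stdout). Every command blocks until Asterisk replies, which is exactly the synchronous behavior AGI inherits from the Dialplan.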

One of the major drawbacks to the original AGI concept was the resource and time penalty incurred by invoking a new process to handle each call.  For example, an AGI application written in Perl requires the instantiation of a new Perl interpreter and any required libraries.  This takes time to load, can consume several megabytes of memory and results in yet another process that the CPU must handle.

As a means of eliminating these drawbacks, the AGI application / specification was expanded to include “FastAGI” – a variant that establishes a socket connection for each call, rather than launching a new process.  Stdin and stdout are replaced by the input and output channels for the socket.  This allows developers to create “FastAGI server” applications that consume relatively little in the way of additional resources for each additional call. Another advantage is that the FastAGI server can be hosted on another computer, further reducing local resource contention.
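A FastAGI server can be sketched with nothing more than Python's standard socketserver module. This is an illustration under assumptions, not a production server: commands_for() is a made-up piece of call logic, the server listens on 4573 (the port conventionally used for FastAGI), and replies from Asterisk are read but not inspected.

```python
import socketserver


def commands_for(env):
    """Hypothetical call logic: read the dialed extension back to the caller."""
    extension = env.get("agi_extension", "")
    commands = ["ANSWER"]
    if extension.isdigit():
        commands.append('SAY DIGITS %s ""' % extension)
    commands.append("HANGUP")
    return commands


class FastAGIHandler(socketserver.StreamRequestHandler):
    """One connection per call; the socket stands in for stdin and stdout."""

    def handle(self):
        # Same header block as process-based AGI, ended by a blank line.
        env = {}
        while True:
            line = self.rfile.readline().decode("utf-8", "replace").strip()
            if not line:
                break
            key, _, value = line.partition(":")
            env[key.strip()] = value.strip()

        for command in commands_for(env):
            self.wfile.write((command + "\n").encode("utf-8"))
            self.wfile.flush()
            if not self.rfile.readline():  # empty read: the connection closed
                break


def serve(host="127.0.0.1", port=4573):
    """Run until interrupted; each call is handled on its own thread."""
    with socketserver.ThreadingTCPServer((host, port), FastAGIHandler) as server:
        server.serve_forever()
```

From the Dialplan, a call would be handed to a server like this with something along the lines of AGI(agi://127.0.0.1:4573/demo), where "demo" is an arbitrary path. One long-running process now serves every call, so the interpreter start-up cost is paid once.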

Both the original and the Fast variant of AGI are synchronous in nature: sessions are spawned at the channel level and share the Dialplan’s channel-level view of the world. As a means of overcoming this limitation, the development team recently added a new paradigm called “AsyncAGI” which will be covered in the next section.

Pros: works with any language, many supporting libraries, relatively simple

Cons: limited vocabulary, poor introspective capabilities, channel-level viewpoint, synchronous, can be a resource hog

The Asterisk Manager Interface (AMI)

The Asterisk Manager Interface or AMI is an asynchronous, socket-based mechanism for interfacing external applications with Asterisk. Where AGI and the Dialplan oversee the actions of a single channel, AMI is global – it has a view of, and control over, the entire Asterisk instance. AMI is frequently used to build operator panels, multi-system queueing solutions and other systems that essentially remote-control one or more Asterisk instances.

The interface consists of events, actions and responses.  Events are simply notifications of various activities taking place on the system. They are unsolicited – meaning that the AMI client (program connected to AMI) does nothing to request or generate them.  Actions, on the other hand, are commands issued by the client application, instructing Asterisk to either take an action (transfer a channel, disconnect a channel, etc.) or requesting some system information (a list of active channels, the details of a conference bridge, etc.).  Responses are simply events that are generated in response to an action. They typically contain a payload that indicates the results of the command or the data requested in the query.
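The wire format behind events, actions and responses is easy to show in a few lines of Python. The sketch assumes the classic raw-socket framing: "Key: Value" lines separated by CRLF, with an empty line ending each message. The helper names are mine, and a repeated key simply overwrites the earlier value, which is a simplification.

```python
def format_action(name, **fields):
    """Render an AMI action as 'Key: Value' lines ended by an empty line."""
    lines = ["Action: %s" % name]
    lines += ["%s: %s" % (key, value) for key, value in fields.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(block):
    """Parse one message (the text up to the empty line) into a dict."""
    message = {}
    for line in block.splitlines():
        key, separator, value = line.partition(":")
        if separator:  # skip anything that is not a 'Key: Value' line
            message[key.strip()] = value.strip()
    return message


def classify(message):
    """Tell unsolicited events apart from responses to our own actions."""
    if "Event" in message:
        return "event"
    if "Response" in message:
        return "response"
    return "unknown"
```

A client would send format_action("Login", Username="admin", Secret="secret") after connecting (the credentials here are placeholders), then loop over incoming blocks and dispatch on classify(). The flat key/value layout is also why nested data is so awkward to express.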

Originally AMI was only accessible over a raw TCP socket using a home-grown  key/value pair format. Over time it has been enhanced to support access over TLS, HTTP and HTTPS as transports and both XML and HTML as alternative output formats.

Starting in Asterisk 1.6 the development team added a new feature called AsyncAGI which basically merges the AGI feature set with AMI.  A channel can be handed off from the Dialplan to the AsyncAGI environment using the standard AGI command with the argument “agi:async”.  When this happens, an AMI event is raised that indicates a new session. The AMI client application then sends an AGI command to the channel. When the command completes, a new event is generated indicating the results. This sort of call-and-response behavior continues until the call terminates.

AsyncAGI is, arguably, the most promising of the external interfaces as it combines channel-level call control with a global view of the Asterisk instance. This does introduce certain complications: the AsyncAGI application needs to process every event or risk leaving a call in a hung state. AsyncAGI also pushes certain decisions as to how individual calls should be managed to the far-end application.
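The call-and-response loop can be sketched as a small state machine that maps each incoming event to the next action. Treat the field names as assumptions: I am relying on the 1.6-era naming as I remember it (an "AsyncAGI" event with a SubEvent of Start, Exec or End, answered by an "AGI" action carrying Channel, Command and CommandID), and later releases may name things differently. The three-step script is a placeholder.

```python
from urllib.parse import unquote

SCRIPT = ["ANSWER", 'STREAM FILE hello-world ""', "HANGUP"]  # placeholder call flow


def agi_action(channel, command, command_id):
    """Build the AMI action that runs one AGI command (assumed field names)."""
    return (
        "Action: AGI\r\n"
        "Channel: %s\r\n"
        "Command: %s\r\n"
        "CommandID: %s\r\n\r\n" % (channel, command, command_id)
    )


def decode_result(event):
    """Results are assumed to arrive URL-encoded, e.g. '200%20result%3D0%0A'."""
    return unquote(event.get("Result", "")).strip()


class AsyncAGIController:
    """Tracks, per channel, which script step to send next."""

    def __init__(self, script=SCRIPT):
        self.script = list(script)
        self.next_index = {}

    def on_event(self, event):
        """Return the next action to send for this event, or None."""
        if event.get("Event") != "AsyncAGI":
            return None
        channel = event.get("Channel")
        sub_event = event.get("SubEvent")
        if sub_event == "End":
            self.next_index.pop(channel, None)  # call is over; forget it
            return None
        if sub_event == "Start":
            self.next_index[channel] = 0
        elif sub_event != "Exec":
            return None
        index = self.next_index.get(channel)
        if index is None or index >= len(self.script):
            return None
        self.next_index[channel] = index + 1
        return agi_action(channel, self.script[index], "%s-%d" % (channel, index))
```

Note that the controller has to see every Start, Exec and End event for every channel it owns. Drop one and the call simply sits there, which is the hung-state risk mentioned above.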

Application developers frequently have a love / hate relationship with AMI. It gives the appearance of being an ideal solution to several problems but it has just enough inconsistencies and nuances to drive you mad. The output format for “standard” AMI (as opposed to HTTP AMI) is awkward and makes it difficult to express complex data elements. The vocabulary is limited, forcing developers to invoke CLI commands and parse responses that were designed for human viewers rather than machines. Worst of all, not all AMI events are documented, making it difficult to know what to expect and when to expect it.

Pros: lightweight, accessible from any language, asynchronous

Cons: inconsistent, poorly documented, awkward data format

External IVR Interface

The External IVR Interface (sometimes referred to as the External IVR protocol) was built as a way to enable and manage asynchronous audio playback. It either forks a child process or establishes a socket connection to an external application. The child / remote process then sends commands and receives events using a simple text protocol. The external application can feed a list of files to be played, then continue processing (perhaps doing some kind of data lookup) while those files are playing. Events are generated when playback of a file completes, when DTMF digits are detected or when the caller hangs up.

The advantage of using External IVR to play audio is the granularity of control it provides. Other playback facilities in Asterisk block until playback completes or is interrupted by a digit or hangup event. External IVR lets the external application independently process DTMF events and interrupt playback based on its own criteria. This can be quite useful but the whole process by which it works is somewhat awkward and arcane. It’s also limited to controlling playback only – the rest of your application has to be built using the Dialplan, AGI or some other interface.
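For a feel of the protocol, here is a rough Python sketch of the command and event handling. It leans on my reading of the External IVR documentation: commands such as "S,filename" (replace the playlist) and "A,filename" (append to it), and events shaped like "tag,timestamp[,data]" where the tag is a DTMF digit, "F" for a finished file or "H" for hangup. Verify those letters against your Asterisk version. The react() policy and its file names are invented for illustration.

```python
DTMF_TAGS = set("0123456789*#")


def set_playlist(filename):
    """'S' replaces whatever is queued, interrupting current playback."""
    return "S,%s\n" % filename


def append_to_playlist(filename):
    """'A' queues a file behind whatever is already playing."""
    return "A,%s\n" % filename


def parse_event(line):
    """Split 'tag,timestamp[,data]' into its three parts (data may be empty)."""
    tag, _, rest = line.rstrip("\r\n").partition(",")
    timestamp, _, data = rest.partition(",")
    return tag, timestamp, data


def react(line):
    """Hypothetical policy: a digit cuts playback short and jumps to a menu prompt."""
    tag, _, data = parse_event(line)
    if tag in DTMF_TAGS:
        return [set_playlist("menu-option-" + tag)]
    if tag == "F" and data == "welcome":
        return [append_to_playlist("main-menu")]
    return []  # hangup and everything else: nothing more to send
```

The point is that the digit arrives as an event while audio is still playing, so the application, not Asterisk, decides whether that digit should interrupt anything.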

Pros: asynchronous control over audio playback is good

Cons: awkward, arcane, limited to strictly audio playback control

Asterisk Command Line Interface (CLI)

This is just what it says – the human-accessible interface into Asterisk from the Linux shell. In a perfect world nobody would ever list the CLI as an application programming interface. Unfortunately, there are many useful functions / commands that are accessible only from the CLI, so many AMI programs wind up invoking CLI commands using the “Command” action. Awkward…

Pros: lots of good, useful commands in there

Cons: output formatted for human rather than machine consumption, cheesy way of doing things

Call File Spool

Like the External IVR interface, the call file spool is a one-hit-wonder. It provides a simple way to generate outgoing calls by dropping specially formatted text files into a specific directory. It’s a pretty easy way to do outbound calling, but there are three other ways that are almost as easy and offer other advantages: the manager Originate action, the CLI “channel originate” command and the Originate Dialplan application.  (Besides, the spool module doesn’t work right on Mac OS X.)
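A call file is just a handful of "Key: Value" lines, so generating one from Python is short. The sketch below assumes the commonly used fields (Channel, MaxRetries, RetryTime, WaitTime, Context, Extension, Priority) and the default spool location /var/spool/asterisk/outgoing. The staging directory is my own addition: the file is written there first and then renamed into the spool so that Asterisk never picks up a half-written file. Both directories need to be on the same filesystem for the rename to be atomic.

```python
import os
import tempfile


def render_call_file(channel, context, extension, priority=1,
                     max_retries=2, retry_time=60, wait_time=30):
    """Return the text of a call file that dials `channel` and, once it is
    answered, connects it to `extension` in `context`."""
    fields = [
        ("Channel", channel),
        ("MaxRetries", max_retries),
        ("RetryTime", retry_time),   # seconds between attempts
        ("WaitTime", wait_time),     # seconds to wait for an answer
        ("Context", context),
        ("Extension", extension),
        ("Priority", priority),
    ]
    return "".join("%s: %s\n" % field for field in fields)


def spool_call(text, staging_dir, spool_dir="/var/spool/asterisk/outgoing"):
    """Write the file in staging_dir, then move it into the spool in one step."""
    fd, staged_path = tempfile.mkstemp(suffix=".call", dir=staging_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    final_path = os.path.join(spool_dir, os.path.basename(staged_path))
    os.replace(staged_path, final_path)
    return final_path
```

Something like spool_call(render_call_file("SIP/100", "default", "200"), "/var/spool/asterisk/tmp") would then queue a call, assuming those channel, context and directory names exist on your system.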

Pros: easy way to generate outgoing calls

Cons: there are other, better ways to do the same, not cross platform

Asterisk Configuration Files

Perhaps calling Asterisk’s great wad of configuration files an API sounds even goofier than claiming that the CLI is an API, but there are some operations that can only be accomplished by making changes to the configuration files and reloading. This is especially true for any kind of “provisioning” – managing SIP peers, PSTN interfaces, mailboxes, static conference rooms, music-on-hold, etc. Until configuration is programmatically manageable, the configs will continue to be an API.
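In practice, "provisioning through the config files" tends to look like the following Python sketch: render a stanza, append it to the file, then ask Asterisk to reload. I am assuming a sip.conf-style peer with the common type, host, secret and context options. The helper names are hypothetical, and the reload step is shown only as the argument list for the "asterisk -rx" remote console invocation, not executed.

```python
def render_sip_peer(name, secret, context="default"):
    """Return a sip.conf-style stanza for one dynamically registering peer."""
    lines = [
        "[%s]" % name,
        "type=friend",
        "host=dynamic",
        "secret=%s" % secret,
        "context=%s" % context,
    ]
    return "\n".join(lines) + "\n"


def add_peer(config_path, name, secret, context="default"):
    """Append the stanza to the config file; nothing changes until a reload."""
    with open(config_path, "a") as handle:
        handle.write("\n" + render_sip_peer(name, secret, context))


def reload_argv():
    """Arguments for asking a running Asterisk to re-read its SIP config."""
    return ["asterisk", "-rx", "sip reload"]
```

Nothing here checks for duplicate peers, file locking or syntax errors, and the change only takes effect after the reload. That is about as far from a real provisioning API as you can get.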

Pros: easily human readable

Cons: not programmatically friendly, reloads can impact system operation, isolated to single systems

So What?

With a total of eight ways to interface with Asterisk, you would think that the case would be closed: that programmers would have what they need to successfully build applications using Asterisk. To some degree that’s true, but only if the developer is willing and able to exercise all eight, using eight different syntaxes and researching what can be accomplished from each. Yuck.

Just as bad: the eight APIs we have are mostly useless to the majority of today’s programmers. Web developers are accustomed to working with relatively clean API paradigms like REST from relatively high-level languages like Ruby or Python. Low-level languages, proprietary languages and hackish interfaces are a huge turn-off, especially if they have to simultaneously overcome the learning curve for telephony.

In a better world we could just toss out the mess and start over. Unfortunately we don’t live in a better world, and many people have applications built on the current APIs. In the next article I will outline some ideas as to what I would like to see as the “Grand Unified API” for Asterisk 12.

(Edit: Read the next in this series: Why Bother?)