The personal website of @erikwittern

Analyzing Open API Specifications

May 9th 2016
Originally published at: http://www.apiful.io/intro/2016/05/09/analyzing-api-specifications.html

Third party repositories like APIs Guru collect various API specifications describing interfaces of diverse services. We analyze these specifications to learn about the structure of web APIs.

The data: 260 Open API Specifications

While API providers themselves rarely expose API specifications - such as Open API specifications, RAML, or WADL - third parties have taken it upon themselves to create, maintain, and expose such documents. A great example is the APIs Guru project. It currently contains 260 Open API Specifications for 230 APIs (some APIs have specifications for different versions). The specifications cover some well-known APIs - including a large set of Google APIs, the Twilio API, the Spotify API, the Slack API, and GitHub's API - but also arguably lesser-known APIs, like the BikeWise API or the FunShouts API.

The Analysis

We analyzed these specifications to obtain insights into some of the practices common among APIs. Please note that we do not claim our findings to be generalizable - ultimately, the assessed API specifications may not necessarily represent an unbiased sample. However, they indicate some common practices across APIs.

Path definitions

First, we find 5187 paths defined across all specifications, meaning that, on average, a specification contains about 20 path definitions. Looking at the distribution in Figure 1, we find that a majority of API specifications define between 1 and 15 paths. Only individual specifications define more than 50 paths (the four reported cases of 108 paths all belong to different versions of the "DCM/DFA Reporting And Trafficking" API). The most extreme case is the Trello API with 264 paths. In line with the high number of paths, its specification is over 16,000 lines long. The existence of such long specifications underlines the need for corresponding tools, for example to help humans read them. In the case of Trello, the online documentation keeps the many paths navigable by grouping them based on the resource they address.

Figure 1: Distribution of paths across Open API Specifications
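The path counts above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the tooling used for the analysis: each specification is assumed to be already parsed into a dictionary with a top-level `paths` object, and the names `EXAMPLE_SPECS`, `count_paths`, and `paths_per_spec` are hypothetical.

```python
from statistics import mean

# Hypothetical, hand-made specifications; real input would be parsed
# Open API documents (e.g. loaded with json.load).
EXAMPLE_SPECS = {
    "small-api": {"paths": {"/items": {}, "/items/{id}": {}}},
    "tiny-api": {"paths": {"/status": {}}},
    "no-paths-api": {},
}


def count_paths(spec):
    """Number of entries in the top-level `paths` object (0 if absent)."""
    return len(spec.get("paths") or {})


def paths_per_spec(specs):
    """Map each specification name to its number of path definitions."""
    return {name: count_paths(spec) for name, spec in specs.items()}


path_counts = paths_per_spec(EXAMPLE_SPECS)
average_paths = mean(list(path_counts.values()))

# The average reported in the text follows from the totals given there.
reported_average = 5187 / 260  # roughly 20 paths per specification
```

Feeding the per-specification counts into a histogram would give a distribution of the kind shown in Figure 1.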

Endpoint definitions (path + method)

Moving on, we find that the specifications define 7329 endpoints overall across the 5187 paths. Here, we follow our definition of an endpoint as a combination of a path and an HTTP method, as presented in previous research. The distribution of endpoints across paths is shown in Figure 2. Roughly 90% of paths define one or two endpoints (i.e., feature one or two methods), and 74% of paths define only a single endpoint. This shows how infrequently APIs provide the full set of Create-Read-Update-Delete (CRUD) operations on resources.

Figure 2: Distribution of endpoints across paths
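To make the endpoint definition (path + method) concrete, here is a minimal sketch. It assumes an Open API 2.0 style document in which each path item maps lowercase HTTP method names to operations and may also carry non-method keys such as `parameters`. The example specification and the function names are hypothetical.

```python
from collections import Counter

# Keys of a path item that denote operations (Open API 2.0 method names).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}

# Hypothetical example; `parameters` is a path-level key, not a method.
EXAMPLE_SPEC = {
    "paths": {
        "/pets": {"get": {}, "post": {}, "parameters": []},
        "/pets/{id}": {"get": {}},
        "/health": {"get": {}},
    }
}


def endpoints(spec):
    """Yield one (path, method) pair per endpoint defined in the spec."""
    for path, item in (spec.get("paths") or {}).items():
        for key in item or {}:
            if key.lower() in HTTP_METHODS:
                yield path, key.lower()


all_endpoints = list(endpoints(EXAMPLE_SPEC))

# How many endpoints does each path define?
per_path = Counter(path for path, _ in all_endpoints)

# How many paths define exactly 1, 2, ... endpoints?
distribution = Counter(per_path.values())
```

In this toy example, two paths define a single endpoint and one path defines two, which is the kind of tally Figure 2 summarizes.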

Method distribution

Looking into the HTTP methods of the 7329 endpoints, we find a distribution dominated by GET requests, as shown in Figure 3 - in fact, over 53% (3925 occurrences) of the endpoints define GET requests. These are followed in frequency by POST requests (1738 occurrences, 23%). Endpoints defining PUT and DELETE requests occur with similar frequency (639 occurrences, 8.7%, and 654 occurrences, 8.9%, respectively). Endpoints defining PATCH requests lag a bit behind (371 occurrences, 5%). Finally, we find 2 endpoints defining HEAD requests. Arguably, these findings indicate that a primary purpose of today's web APIs is the retrieval of data that is otherwise not accessible to clients.

Figure 3: Breakdown of HTTP methods among endpoints
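The shares quoted above can be recomputed directly from the reported occurrence counts. The snippet below is a small arithmetic check. The counts are taken from the text, while the variable names are only illustrative. The computed values are rounded to one decimal, so they can differ slightly from the whole-number percentages quoted above.

```python
# Occurrence counts per HTTP method as reported in the text.
REPORTED_COUNTS = {
    "GET": 3925,
    "POST": 1738,
    "DELETE": 654,
    "PUT": 639,
    "PATCH": 371,
    "HEAD": 2,
}

# The per-method counts add up to the total number of endpoints.
total_endpoints = sum(REPORTED_COUNTS.values())

# Share of each method in percent, rounded to one decimal place.
shares = {
    method: round(100 * count / total_endpoints, 1)
    for method, count in REPORTED_COUNTS.items()
}

for method, share in shares.items():
    print(f"{method:6} {REPORTED_COUNTS[method]:5d} {share:5.1f}%")
```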

Payload data definitions

Next, we assess how many endpoints define a payload to be sent in the request body. We find that 2224 of the 7329 endpoints, or 30%, define such a payload. In line with common best practices, over 99% of these payload definitions appear in endpoints defining POST, PUT, or PATCH requests. Endpoints defining POST requests contain payload definitions most frequently, accounting for 53% of all definitions. Only 4 endpoint definitions contain a payload for GET requests and only 15 for DELETE requests. This finding is reassuring with regard to the correct usage of HTTP methods: GET requests should not contain a payload, as they ought to be safe and idempotent, and sending extensive data in them would suggest that these rules are being broken.
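One way to detect payload definitions is sketched below. It rests on an assumption the text does not spell out: that a payload is a parameter with `in: body`, as in Open API 2.0, declared either on the operation or on the enclosing path item. Parameters given as `$ref` are not resolved here, and the example specification and function names are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}

# Hypothetical example: only the POST operation declares a body parameter.
EXAMPLE_SPEC = {
    "paths": {
        "/pets": {
            "get": {"parameters": [{"name": "limit", "in": "query"}]},
            "post": {"parameters": [{"name": "pet", "in": "body"}]},
        }
    }
}


def defines_payload(path_item, operation):
    """True if a path-level or operation-level parameter has `in: body`."""
    params = list(path_item.get("parameters", []))
    params += list(operation.get("parameters", []))
    return any(isinstance(p, dict) and p.get("in") == "body" for p in params)


def payload_endpoints(spec):
    """Return (path, METHOD) pairs for endpoints that define a payload."""
    found = []
    for path, item in (spec.get("paths") or {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-method keys such as `parameters`
            if isinstance(operation, dict) and defines_payload(item, operation):
                found.append((path, method.upper()))
    return found


with_payload = payload_endpoints(EXAMPLE_SPEC)

# Share of endpoints with a payload, from the totals reported in the text.
reported_share = round(100 * 2224 / 7329, 1)
```

Whether form data (`in: formData`) should also count as a payload is a design choice; this sketch leaves it out.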

Query parameter definitions

Finally, we looked at the definition of query parameters in the 7329 endpoints. Figure 4 shows the distribution of query parameter definitions across the endpoints. Overall, 5510 or 75% of the 7329 endpoints define at least one query parameter. As seen in Figure 4, most endpoints define between 1 and 17 query parameters. The large number of endpoints defining 7 query parameters is an interesting anomaly - however, looking into which specifications do so does not reveal an explanation. Only 2 endpoints define more than 50 query parameters. Arguably, even considerably smaller numbers make such endpoints cumbersome to use.

Figure 4: Distribution of query parameters among endpoints

The total number of query parameter definitions across all endpoints is 44,982. Of these, only 1,871 or about 4% are required (as indicated by a corresponding property in the Open API specification). As such, query parameters seem mostly to be used to refine the definition of resources to interact with in a request or to refine their representation (e.g., their ordering or limits to their number). Various opinions exist among developers about the meaning of query parameters, but the results found here show that at least some consensus exists about such parameters being optional.
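The query parameter counts can be sketched in the same style. The snippet assumes Open API 2.0 conventions: query parameters have `in: query`, may be declared on the path item or on the operation (with the operation-level definition overriding a path-level one of the same name), and are marked as mandatory via `required: true`. As before, `$ref` parameters are not resolved, and the example data and names are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}

# Hypothetical example specification.
EXAMPLE_SPEC = {
    "paths": {
        "/search": {
            "get": {
                "parameters": [
                    {"name": "q", "in": "query", "required": True},
                    {"name": "limit", "in": "query"},
                    {"name": "offset", "in": "query"},
                ]
            }
        },
        "/pets": {
            "post": {
                "parameters": [
                    {"name": "pet", "in": "body"},
                    {"name": "dryRun", "in": "query"},
                ]
            }
        },
        "/health": {"get": {}},
    }
}


def query_parameters(path_item, operation):
    """Query parameters of one endpoint; operation-level entries win."""
    merged = {}
    params = list(path_item.get("parameters", []))
    params += list(operation.get("parameters", []))
    for p in params:
        if isinstance(p, dict) and p.get("in") == "query":
            merged[p.get("name")] = p
    return list(merged.values())


def summarize(spec):
    """Count endpoints, endpoints with query parameters, and parameters."""
    stats = {"endpoints": 0, "with_query": 0, "params": 0, "required": 0}
    for item in (spec.get("paths") or {}).values():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            params = query_parameters(item, operation)
            stats["endpoints"] += 1
            stats["with_query"] += bool(params)
            stats["params"] += len(params)
            stats["required"] += sum(1 for p in params if p.get("required") is True)
    return stats


stats = summarize(EXAMPLE_SPEC)

# Share of required query parameters, from the totals reported in the text.
reported_required_share = round(100 * 1871 / 44982, 1)
```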

Looking at the methods of endpoints that contain query parameter definitions, we find the breakdown shown in Figure 5. Most query parameter definitions (60%) appear in endpoints defining GET requests. The relatively high number of parameters in POST, PUT, and PATCH requests is notable, as these requests commonly (see above) also send data in the request body.

Figure 5: Breakdown of query parameter definitions among methods

The Takeaways

What did we learn from the above observations? We gain insights into the typical API by summarizing the numbers: half of the analyzed APIs have 9 or fewer paths, and three quarters have 22 or fewer paths. A majority of these paths relate to a single endpoint definition, featuring a single HTTP method - which is predominantly GET. Only infrequently do we observe APIs that allow the full spectrum of CRUD operations on individual resources. Looking at the data sent in requests, we find that 30% of endpoints define payload data. Practically all of these endpoints relate to POST, PUT, or PATCH requests, which is in line with best practices for API design. Roughly 75% of endpoints define at least one query parameter. More specifically, half of all endpoints define up to 7 parameters, and three quarters define up to 8 parameters. Higher numbers of parameters are uncommon. Most frequently, query parameters are defined for GET requests. And overall, query parameters are to a very large extent optional.

Apart from these common cases, there are also outliers: we observe a few huge APIs with over 200 paths and endpoints, and complex APIs where individual endpoints have more than 50 query parameters. These cases raise questions about good API design, as they likely deter developers from using these APIs.

Overall, though, the numbers presented here show that, in my opinion, even the average API today is of considerable size and complexity - this, again, motivates (our) attempts to refine API specifications and to build better tools around them.