Inferring Web API Descriptions From Usage Data

November 30th, 2015

Originally published at: http://www.apiful.io/intro/2015/11/30/inferring-api-descriptions.html

In a previous post, I argued that research on Web APIs should focus on approaches that observe how APIs are used and, building on those observations, offer unintrusive solutions to developers. A first step in this direction is research we recently published at the HotWeb '15 workshop. The work was conducted together with Philippe Suter, who also works here at IBM Research.

In this work, we address the problem of automatically creating formal Web API descriptions (e.g., Swagger, WADL) from usage data. Such description formats are supported by extensive tooling ecosystems, so once a description is available, many development tasks become easier. For example, Web API descriptions make it possible to automatically generate client and server implementations in various languages, enable automated testing, and provide a basis for consistent documentation. The problem, however, is that creating and maintaining such descriptions typically requires manual effort and is error-prone.

To address this problem, we propose a method that takes observed usage data of a Web API (e.g., its server logs) as input and uses machine learning to infer the corresponding Web API description. In our paper, we describe how we process the input data and how we designed different classifiers to infer the Web API descriptions. We also present findings from applying our method to logs of 10 IBM Watson APIs. Our experiments show that our method performs well, particularly in comparison with an existing open-source tool that has the same goal, but also that noise in the input data significantly affects the results.
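To make the idea of going from logs to a description more concrete, here is a minimal sketch in Python. It is not the method from the paper: the paper uses trained classifiers, while this sketch substitutes a simple hand-written heuristic (numeric or long hex-like path segments are treated as parameters). The log format (`METHOD PATH STATUS`, one request per line), the function names `to_template` and `infer_description`, and the generated `{paramN}` names are all hypothetical. The output is a bare-bones Swagger 2.0 document.

```python
import json
import re
from collections import defaultdict

# Hypothetical log format, one request per line: "<METHOD> <PATH> <STATUS>"
LOG_LINE = re.compile(r"^(?P<method>[A-Z]+) (?P<path>/\S*) (?P<status>\d{3})$")

# Heuristic stand-in for a trained classifier: a path segment is treated as a
# parameter value if it is all digits or a long hex/UUID-like string.
ID_LIKE = re.compile(r"^(\d+|[0-9a-fA-F-]{16,})$")


def is_parameter(segment):
    return bool(ID_LIKE.match(segment))


def to_template(path):
    """Replace parameter-like segments with {paramN} placeholders."""
    parts, params = [], []
    for segment in path.strip("/").split("/"):
        if segment and is_parameter(segment):
            name = f"param{len(params) + 1}"
            params.append(name)
            parts.append("{" + name + "}")
        else:
            parts.append(segment)
    return "/" + "/".join(parts), params


def infer_description(log_lines, title="Inferred API"):
    # path template -> HTTP method -> set of observed status codes
    observed = defaultdict(lambda: defaultdict(set))
    path_params = {}
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not parse (one source of noise)
        path = match["path"].split("?", 1)[0]  # ignore query strings
        template, params = to_template(path)
        observed[template][match["method"].lower()].add(match["status"])
        path_params[template] = params

    paths = {}
    for template, methods in sorted(observed.items()):
        paths[template] = {}
        for method, statuses in sorted(methods.items()):
            paths[template][method] = {
                "parameters": [
                    {"name": p, "in": "path", "required": True, "type": "string"}
                    for p in path_params[template]
                ],
                "responses": {
                    s: {"description": "Observed in logs"} for s in sorted(statuses)
                },
            }
    return {"swagger": "2.0", "info": {"title": title, "version": "0.0.1"}, "paths": paths}


logs = [
    "GET /v1/models 200",
    "GET /v1/models/42 200",
    "GET /v1/models/7 404",
    "POST /v1/models 201",
    "garbage line",
]
print(json.dumps(infer_description(logs), indent=2))
```

Here the two `GET /v1/models/<id>` requests collapse into a single `/v1/models/{param1}` operation with responses `200` and `404`, and the unparseable line is ignored. The heuristic is the weak point: a segment such as a model name would not be recognized as a parameter, which is the kind of decision a learned classifier is meant to handle better.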

This work is an important first step toward delivering on our research agenda and will hopefully pave the way for broader availability of Web API descriptions.

The paper is available online at IEEE Xplore.