Originally published at: http://www.apiful.io/intro/2016/06/01/npm-analysis.html
An HTML version (that looks like the original paper) is available here (created with pdf2htmlEX).
The official version of the paper is available at the ACM digital library.
A first set of analyses concerns the evolution of npm as an ecosystem. Figure 1 shows the superlinear growth of npm with regard to both the number of packages (blue line) and the number of dependencies (green line). Furthermore, the average number of dependencies specified per package is also growing, though not superlinearly (red line).
Figure 1: Packages (blue), overall dependencies (green), and avg. dependencies per package (red) in npm over time
Interestingly, while the number of dependencies specified per package rises continuously, the ratio of packages on npm that are depended upon remains relatively fixed. Figure 2 shows, over time, the percentages of packages that are depended upon by different numbers of other packages. The graph indicates a power-law distribution: a comparatively small percentage of packages is depended upon very frequently, whereas the large majority of packages is not depended upon at all.
Figure 2: Characterizing packages in npm by the number of packages depending on them
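The grouping behind a chart like Figure 2 can be sketched as follows. This is a minimal illustration, not the analysis code from the paper: the package names are hypothetical, and the bucket boundaries (0, 1, 2-9, 10+) are an assumption chosen for the example rather than the ones used in the figure.

```python
from collections import Counter


def dependent_counts(deps):
    """Count, for every package, how many other packages depend on it.

    `deps` maps a package name to the list of packages it depends on.
    """
    counts = {pkg: 0 for pkg in deps}
    for pkg, targets in deps.items():
        for target in set(targets):  # ignore duplicate entries
            if target != pkg:        # ignore self-dependencies
                counts[target] = counts.get(target, 0) + 1
    return counts


def bucket(n):
    """Map a dependent count to a bucket label (boundaries are assumed)."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    if n < 10:
        return "2-9"
    return "10+"


def share_by_bucket(deps):
    """Return the percentage of packages that falls into each bucket."""
    counts = dependent_counts(deps)
    labels = ("0", "1", "2-9", "10+")
    if not counts:
        return {label: 0.0 for label in labels}
    tally = Counter(bucket(n) for n in counts.values())
    return {label: 100.0 * tally[label] / len(counts) for label in labels}


# Hypothetical snapshot of a tiny registry.
example = {
    "core": [],
    "helper": ["core"],
    "app-a": ["core", "helper"],
    "app-b": ["core"],
    "lonely": [],
}
shares = share_by_bucket(example)
```

Running the same computation on a snapshot of the registry for each point in time would yield one column of percentages per date, which is the kind of series Figure 2 plots.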
PageRank over time
Next, coming back to our original motivation, we assessed ways to describe package popularity. Having the package.json files for all versions of all packages, we can reconstruct the npm dependency graph at any point in time (we have data spanning October 1st 2010 to September 1st 2015). Doing so allows us to use graph measures like PageRank to compute the importance of packages. Applying PageRank on a weekly basis allows us to follow how package popularity evolves. For example, Figure 3 shows the inner-npm PageRank of selected utility libraries. Here we see how Underscore.js has remained one of the highest-ranked packages in npm since its publication. Nearly all other utility libraries tend to drop in popularity over time. The exception is lodash, which originally mimicked Underscore's API and was actually able to overtake it with regard to PageRank in May 2015.
Figure 3: PageRank of selected utility packages on npm over time
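The approach described above, reconstructing the dependency graph at a given date and ranking packages on it, can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used in the paper: the release records and package names are hypothetical, the damping factor of 0.85 and the fixed iteration count are conventional defaults, and each package is represented by its latest release at the snapshot date.

```python
from datetime import date

# Hypothetical release records:
# (package, release date, dependencies listed in that release's package.json)
RELEASES = [
    ("util-a", date(2010, 10, 1), []),
    ("util-b", date(2012, 4, 1), []),
    ("app-a", date(2011, 1, 1), ["util-a"]),
    ("app-b", date(2012, 6, 1), ["util-a"]),
    ("app-a", date(2014, 1, 1), ["util-b"]),
    ("app-c", date(2014, 6, 1), ["util-b"]),
]


def snapshot(releases, when):
    """Dependency graph at `when`: latest release of each package up to that date."""
    latest = {}
    for pkg, released, dependencies in sorted(releases, key=lambda r: r[1]):
        if released <= when:
            latest[pkg] = dependencies  # later releases overwrite earlier ones
    return latest


def pagerank(deps, damping=0.85, iterations=50):
    """Power-iteration PageRank; edges point from a package to its dependencies."""
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {pkg: 1.0 / n for pkg in nodes}
    for _ in range(iterations):
        # Rank held by packages without dependencies is spread over all packages.
        dangling = sum(rank[pkg] for pkg in nodes if not deps.get(pkg))
        new_rank = {pkg: (1.0 - damping) / n + damping * dangling / n for pkg in nodes}
        for pkg, targets in deps.items():
            if targets:
                share = damping * rank[pkg] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


early = pagerank(snapshot(RELEASES, date(2012, 12, 31)))
late = pagerank(snapshot(RELEASES, date(2015, 1, 1)))
```

In this toy data, `util-a` outranks `util-b` in the early snapshot and the order flips in the late one, because `app-a` switches its dependency. Repeating the snapshot for every week in the observed period would produce a time series per package like the curves in Figure 3.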
Complexities of package popularity
Figure 4: Different keywords denoting packages with either strong PageRank or high number of dependencies from GitHub
Versioning on npm
Finally, we also looked at the use of versions in npm. npm prescribes the use of semantic versioning. Every version should consist of a triple
Figure 5: Implicit adoption ratios of releases of the express package. Blue circle = major release, red circle = minor release, green circle = patch release. The size of a circle indicates the implicit adoption ratio (largest: 48%).
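The distinction between major, minor, and patch releases used in Figure 5 can be sketched as a comparison of two version strings. This is a simplified illustration, assuming plain `major.minor.patch` versions without pre-release or build suffixes; the function names are hypothetical and not part of any npm tooling.

```python
def parse_version(version):
    """Split a plain 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_type(previous, current):
    """Classify `current` relative to `previous` as major, minor, or patch."""
    prev = parse_version(previous)
    curr = parse_version(current)
    if curr[0] != prev[0]:
        return "major"
    if curr[1] != prev[1]:
        return "minor"
    return "patch"


# Hypothetical version pairs.
kinds = [
    release_type("1.4.2", "2.0.0"),
    release_type("1.4.2", "1.5.0"),
    release_type("1.4.2", "1.4.3"),
]
```

Real version strings on npm can carry suffixes such as pre-release tags, which this sketch does not handle and which would need a full semantic versioning parser.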
Figure 5 illustrates selected releases of the Express application framework. Every circle represents a release. The color of the circle indicates the nature of the release: major, minor, or patch. The size of a circle indicates the implicit adoption ratio of the release. The figure shows that after the release of major version 4, subsequent Express releases targeted both major versions 3 and 4. Since the release of version
Takeaways and outlook
When we started to look into npm, we were motivated solely by package recommendation. We quickly learned, though, that there are many aspects that can yield interesting findings. Our paper is a first attempt to capture these findings scientifically. Since we submitted the paper for peer review, the left-pad incident happened, motivating new, interesting studies. We hope there will be follow-up papers on npm and software ecosystems in general. Their findings will tell us a lot about modern software development and help shape the design of related tools.