Formulas used to calculate the statistics presented in this application

These formulas are based on our large-scale crawl of the top 10,000 Alexa sites, from which we extracted and analyzed JavaScript libraries and code.

Popularity and Impact of a library

Popularity of a library

In a set of sites S, the popularity of a library is simply the fraction of the sites that use it.

$$\begin{equation} pop(src) = \frac{\sum_{s \in S} using(s, src)}{|S|} \text{ where } S \text{ is a set of sites} \end{equation} $$ and $$ using(s, src) = \begin{cases} 1 & \quad \text{if site } s \text{ uses the library } src\\ 0 & \quad \text{otherwise} \end{cases} $$
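As a minimal Python sketch of this formula, assume (hypothetically) that the crawl results are available as a dictionary mapping each site to the set of libraries detected on it. The site names and libraries below are made-up toy data, not crawl results.

```python
from typing import Dict, Set


def using(site_libs: Set[str], src: str) -> int:
    """using(s, src): 1 if the site uses the library src, 0 otherwise."""
    return 1 if src in site_libs else 0


def pop(sites: Dict[str, Set[str]], src: str) -> float:
    """pop(src): fraction of the sites in S that use the library src."""
    return sum(using(libs, src) for libs in sites.values()) / len(sites)


# Hypothetical toy data: site -> libraries found on that site.
sites = {
    "site1.example": {"jquery", "bootstrap"},
    "site2.example": {"jquery"},
    "site3.example": {"react"},
    "site4.example": set(),
}
print(pop(sites, "jquery"))  # 0.5
```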

Impact of a library

The impact of a library is its popularity weighted by the impact of the sites using it:

$$\begin{equation} \label{impact_script} impact(src) = \frac {\sum_{s\in S} impact(s) * using(s, src)}{\sum_{s\in S} impact(s)} \end{equation}$$

The impact of a site is a number in the interval ]0, 1] assigned to each site based on its rank:

$$\begin{equation*} \label{impact_site} impact(s) = 1 - \frac{rank(s) - 1}{|S|} \end{equation*} $$ where $rank(s)$ is the rank of site $s$ among the top Alexa sites.
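A minimal Python sketch of the two formulas above, assuming (hypothetically) that sites are keyed by their rank 1..|S| and that each rank maps to the set of libraries found on that site. The four-site data set is invented for illustration.

```python
from typing import Dict, Set


def site_impact(rank: int, n_sites: int) -> float:
    """impact(s) = 1 - (rank(s) - 1) / |S|, which lies in ]0, 1]."""
    return 1 - (rank - 1) / n_sites


def library_impact(sites: Dict[int, Set[str]], src: str) -> float:
    """impact(src): share of the total site impact carried by sites using src.

    `sites` maps each site's rank (1 = top site, ranks 1..|S|) to the set of
    libraries found on that site.
    """
    n = len(sites)
    total = sum(site_impact(rank, n) for rank in sites)
    weighted = sum(site_impact(rank, n) for rank, libs in sites.items() if src in libs)
    return weighted / total


# Hypothetical toy data: 4 sites, so the site impacts are 1, 0.75, 0.5, 0.25.
sites = {
    1: {"jquery"},
    2: {"jquery", "react"},
    3: set(),
    4: {"react"},
}
print(library_impact(sites, "jquery"))  # (1 + 0.75) / 2.5
print(library_impact(sites, "react"))   # (0.75 + 0.25) / 2.5
```

Both libraries are used by 2 of the 4 sites (popularity 0.5), but jquery gets an impact of 0.7 and react 0.4 because jquery sits on higher-ranked sites.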

Popularity vs Impact of a library

Let's consider the example of 2 libraries A and B with the same popularity (10%) in a set of 10,000 sites. The only difference is that A is used by the sites whose ranks are in the interval [1, 1000] (the topmost sites) and B by the sites whose ranks are in the interval [9001, 10,000]. The following table shows the large difference between the impacts of A and B. This is because A is used by the topmost sites and B by the lowest-ranked ones. Notice that impact is a generalization of popularity: if we assume that all the sites have the same impact, then the impact and the popularity of a library coincide.

Library Popularity Impact
A 10% 18.99910%
B 10% 1.00089%
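The table values follow directly from the formulas. With N = 10,000, the denominator of impact(src) is the sum of all site impacts, (N + 1) / 2 = 5000.5; the numerator is 950.05 for A (ranks 1 to 1000) and 50.05 for B (ranks 9001 to 10,000). The short Python sketch below recomputes both impacts by brute force; it should reproduce the table's figures, up to rounding in the last digit.

```python
def site_impact(rank: int, n_sites: int) -> float:
    """impact(s) = 1 - (rank(s) - 1) / |S|."""
    return 1 - (rank - 1) / n_sites


N = 10_000
# Denominator of impact(src): the sum of all site impacts, equal to (N + 1) / 2.
total = sum(site_impact(r, N) for r in range(1, N + 1))


def impact_of_rank_range(first: int, last: int) -> float:
    """Impact of a library used exactly by the sites ranked first..last."""
    used = sum(site_impact(r, N) for r in range(first, last + 1))
    return used / total


impact_a = impact_of_rank_range(1, 1000)   # library A: ranks 1..1000
impact_b = impact_of_rank_range(9001, N)   # library B: ranks 9001..10,000
print(f"A: popularity {1000 / N:.0%}, impact {impact_a:.5%}")
print(f"B: popularity {1000 / N:.0%}, impact {impact_b:.5%}")
```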

Formulas for statistics of a construct in a script and set of scripts

The core of our work was to establish the popularity of JavaScript constructs, in order to determine whether it is correct, as some researchers do, to consider only a subset of the JavaScript language on the assumption that the other constructs are rarely used. Recall that we downloaded the inline and remote JavaScript files of the top 10,000 Alexa sites.

Occurrence of a construct in a script

To analyze JavaScript constructs, we used the open-source JavaScript parser Esprima. Parsing each JavaScript file yields the list of constructs used in it and how many times each one occurs. This count is called the occurrence of a construct in the script. We denote it as follows:

$occurrence(c, src) \text{ where } c \text{ is a construct and } src \text{, a script }$
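The counting step can be sketched in Python as a walk over the syntax tree. This is an illustration under two assumptions that are not stated in the text: that a "construct" corresponds to an ESTree node type (the tree format Esprima emits), and that the tree is available as nested dicts and lists, as in its JSON form. The tree below is written by hand for `var x = 1; f(x);` rather than produced by Esprima.

```python
from collections import Counter
from typing import Any, Optional


def occurrences(node: Any, counts: Optional[Counter] = None) -> Counter:
    """Count how many times each ESTree node type occurs in a tree of dicts/lists."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "type" in node:
            counts[node["type"]] += 1  # one occurrence of this construct
        for value in node.values():
            occurrences(value, counts)  # recurse into child nodes
    elif isinstance(node, list):
        for item in node:
            occurrences(item, counts)
    return counts


# Hand-written ESTree tree for: var x = 1; f(x);
ast = {
    "type": "Program",
    "sourceType": "script",
    "body": [
        {
            "type": "VariableDeclaration",
            "kind": "var",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {"type": "Identifier", "name": "x"},
                    "init": {"type": "Literal", "value": 1, "raw": "1"},
                }
            ],
        },
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "f"},
                "arguments": [{"type": "Identifier", "name": "x"}],
            },
        },
    ],
}
print(occurrences(ast)["Identifier"])  # 3
```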

Average occurrence of a construct in a set of scripts

Given a set of scripts, we can calculate how many times, on average, a construct appears per script. We call this the average occurrence of the construct in the set of scripts. It is given by the following formula:

$$\begin{equation} average(c, Sr) = \frac{\sum_{src \in Sr} occurrence(c, src)}{|Sr|} \text{ where } Sr \text{ is a set of scripts} \end{equation}$$
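A minimal Python sketch of average(c, Sr), assuming (hypothetically) that each script has already been reduced to a `Counter` of construct occurrences. The construct names are ESTree node types and the counts are invented.

```python
from collections import Counter
from typing import List


def average(c: str, scripts: List[Counter]) -> float:
    """average(c, Sr): total occurrences of construct c divided by |Sr|."""
    # A Counter returns 0 for a construct that a script never uses.
    return sum(script[c] for script in scripts) / len(scripts)


# Hypothetical per-script occurrence counts.
scripts = [
    Counter({"ForStatement": 3, "Identifier": 10}),
    Counter({"Identifier": 4}),
    Counter({"ForStatement": 1, "WithStatement": 2}),
]
print(average("ForStatement", scripts))  # (3 + 0 + 1) / 3
```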

Popularity of a construct in a set of scripts

In the same way, considering a set of scripts, we can deduce how many scripts are using a construct at least once. We call this the $\textbf{popularity of a construct}$ in a set of scripts, which is given by the following formula.

$$\begin{equation} pop(c, Sr) = \frac{\sum_{src \in Sr} using(c, src)}{|Sr|} \text{ where } Sr \text{ is a set of scripts} \end{equation}$$ and $$using(c, src) = \begin{cases} 1 & \quad \text{if } occurrence(c, src) > 0\\ 0 & \quad \text{otherwise } \\ \end{cases} $$
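A minimal Python sketch of pop(c, Sr), under the same hypothetical representation of a script as a `Counter` of construct occurrences (toy counts, not crawl data):

```python
from collections import Counter
from typing import List


def using(c: str, script: Counter) -> int:
    """using(c, src): 1 if construct c occurs at least once in the script."""
    return 1 if script[c] > 0 else 0


def pop(c: str, scripts: List[Counter]) -> float:
    """pop(c, Sr): fraction of the scripts in Sr that use construct c."""
    return sum(using(c, script) for script in scripts) / len(scripts)


# Hypothetical per-script occurrence counts.
scripts = [
    Counter({"ForStatement": 3, "Identifier": 10}),
    Counter({"Identifier": 4}),
    Counter({"ForStatement": 1, "WithStatement": 2}),
]
print(pop("ForStatement", scripts))   # 2 of 3 scripts
print(pop("WithStatement", scripts))  # 1 of 3 scripts
```

With these counts, WithStatement occurs 2/3 times per script on average but is used by only 1/3 of the scripts, which illustrates how average occurrence and popularity can diverge.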

Since each script has an impact, or can be assigned one, weighting the occurrence, the average occurrence and the popularity of a construct by the impact of the script(s) in which it is used yields the following statistics:

Normalized occurrence of a construct in a script

This is simply the occurrence of a construct weighted by the impact of the script in which it is used. It is given by the following formula:

$$\begin{equation*} n\_occurrence(c, src) = occurrence(c, src) * impact(src) \end{equation*}$$

Normalized average occurrence of a construct in a set of scripts

This is the average occurrence weighted by the impact of the scripts in which the construct is used:

$$\begin{equation*} n\_average(c, Sr) = \frac{\sum_{src \in Sr} n\_occurrence(c, src)}{|Sr|} \end{equation*}$$
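A minimal Python sketch of n_occurrence and n_average, assuming (hypothetically) that each script is a pair of its construct counts and its impact. As in the formula above, the sum of normalized occurrences is divided by |Sr|, not by the sum of the impacts. The data are invented.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: (occurrence counts of the script, impact of the script).
Script = Tuple[Counter, float]


def n_occurrence(c: str, script: Script) -> float:
    """n_occurrence(c, src) = occurrence(c, src) * impact(src)."""
    counts, script_impact = script
    return counts[c] * script_impact


def n_average(c: str, scripts: List[Script]) -> float:
    """n_average(c, Sr): sum of normalized occurrences divided by |Sr|."""
    return sum(n_occurrence(c, script) for script in scripts) / len(scripts)


scripts = [
    (Counter({"ForStatement": 3}), 1.0),
    (Counter({"Identifier": 4}), 0.5),
    (Counter({"ForStatement": 1}), 0.25),
]
print(n_average("ForStatement", scripts))  # (3 * 1.0 + 0 * 0.5 + 1 * 0.25) / 3
```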

Impact of a construct in a set of scripts

Intuitively, the impact of a construct is its popularity weighted by the impact of the scripts in which it is used:

$$\begin{equation*} impact(c, Sr) = \frac{\sum_{src \in Sr} impact(src) * using(c, src) }{\sum_{src \in Sr} impact(src)} \end{equation*}$$
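A minimal Python sketch of impact(c, Sr), again assuming (hypothetically) that each script is a pair of its construct counts and its impact, with invented values:

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: (occurrence counts of the script, impact of the script).
Script = Tuple[Counter, float]


def impact(c: str, scripts: List[Script]) -> float:
    """impact(c, Sr): share of the total script impact carried by scripts using c."""
    total = sum(imp for _, imp in scripts)
    return sum(imp for counts, imp in scripts if counts[c] > 0) / total


scripts = [
    (Counter({"ForStatement": 3}), 1.0),
    (Counter({"Identifier": 4}), 0.5),
    (Counter({"ForStatement": 1}), 0.25),
]
print(impact("ForStatement", scripts))  # (1.0 + 0.25) / 1.75
print(impact("Identifier", scripts))    # 0.5 / 1.75
```

If every script is given the same impact, the weights cancel and impact(c, Sr) reduces to pop(c, Sr), mirroring the remark made for libraries.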

Formulas for statistics in a site and set of sites

Categories of scripts

In the previous section, we presented formulas to calculate statistics of a construct in a set of scripts. In the following sections, we apply those formulas to scripts extracted from the top 10,000 sites, classified into 5 categories: inline, local, related, remote and all scripts. To apply those formulas to any category of scripts, one needs to answer 2 main questions:

  • What is the set of scripts for the category?
  • What is the impact of each script in that category?

Below, we consider each category of scripts in turn, answering the 2 questions above.

  • For the inline scripts category, the set of scripts is simply the set of inline scripts found in the top 10,000 sites. Each inline script is assigned the impact of its corresponding site.
  • For the local scripts category, the set of scripts is the union of the local scripts of each site. Each local script is assigned the impact of its corresponding site.
  • For the related scripts category, the set of scripts is simply the union of the local and inline scripts of each site, each local or inline script being assigned the impact of its related site.
  • For the remote scripts category, the set of scripts consists of all the remote scripts extracted from the top 10,000 sites. The formulas here are exactly those described above for a script and a set of scripts.
  • The all scripts category is the union of remote and inline scripts. Each remote script is assigned its own impact (as above) and each inline script the impact of its related site.
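The category rules above can be sketched in Python as a function that returns, for each category, its scripts paired with the impact they are assigned. Everything here is hypothetical: the `Site` record, script ids, and URLs are invented; remote scripts are assumed to be identified by URL and counted once even when several sites include them; and a remote script's impact is computed with the library impact formula, weighting the sites that include it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Site:
    """Hypothetical crawl record for one site."""
    rank: int
    inline: List[str] = field(default_factory=list)  # ids of inline scripts
    local: List[str] = field(default_factory=list)   # ids of local scripts
    remote: List[str] = field(default_factory=list)  # URLs of remote scripts


def site_impact(rank: int, n_sites: int) -> float:
    """impact(s) = 1 - (rank(s) - 1) / |S|."""
    return 1 - (rank - 1) / n_sites


def remote_impact(url: str, sites: List[Site]) -> float:
    """Impact-weighted share of the sites that include the remote script."""
    n = len(sites)
    total = sum(site_impact(s.rank, n) for s in sites)
    return sum(site_impact(s.rank, n) for s in sites if url in s.remote) / total


def categories(sites: List[Site]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each category to its scripts, each paired with its assigned impact."""
    n = len(sites)
    inline = [(sid, site_impact(s.rank, n)) for s in sites for sid in s.inline]
    local = [(sid, site_impact(s.rank, n)) for s in sites for sid in s.local]
    # Each distinct remote URL appears once, with its own impact.
    urls = sorted({url for s in sites for url in s.remote})
    remote = [(url, remote_impact(url, sites)) for url in urls]
    return {
        "inline": inline,
        "local": local,
        "related": inline + local,
        "remote": remote,
        "all": remote + inline,
    }


sites = [
    Site(rank=1, inline=["s1-inline-0"], local=["s1-local-0"],
         remote=["https://cdn.example/a.js"]),
    Site(rank=2, inline=["s2-inline-0"],
         remote=["https://cdn.example/a.js", "https://cdn.example/b.js"]),
]
cats = categories(sites)
print({name: len(scripts) for name, scripts in cats.items()})
```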

Applying formulas to inline scripts

Here, we give an example of how we applied the formulas to inline scripts; the process is the same for all other categories of scripts. We introduce an auxiliary function inlines(s) that returns the set of inline scripts of a site s, where S is the set of sites. The function in_average gives the average occurrence, in_n_average the normalized average occurrence, pop_in the popularity, and impact_in the impact of a construct in the set of the top sites' inline scripts.

$$\begin{equation*} in\_average(c, Sr) = average(c, Sr) \end{equation*} $$ $$\begin{equation*} in\_n\_average(c, Sr) = n\_average(c, Sr) \end{equation*} $$ $$\begin{equation*} pop\_in(c, Sr) = pop(c, Sr) \end{equation*} $$ $$\begin{equation*} impact\_in(c, Sr) = impact(c, Sr) \end{equation*} $$ where $ Sr = \bigcup_{s \in S}^{} inlines(s) $
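A minimal Python sketch of these four inline-script statistics. It assumes (hypothetically) that each site is a dict holding its impact and a list of `Counter`s, one per inline script; `inlines` here is an illustrative stand-in for the auxiliary function of the same name, pairing each inline script with its site's impact. All data are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: (occurrence counts of the script, impact of the script).
Script = Tuple[Counter, float]


def inlines(site: Dict) -> List[Script]:
    """Inline scripts of a site, each assigned the impact of that site."""
    return [(counts, site["impact"]) for counts in site["inline"]]


def in_stats(c: str, sites: List[Dict]) -> Dict[str, float]:
    """in_average, in_n_average, pop_in and impact_in of construct c."""
    sr = [script for s in sites for script in inlines(s)]  # Sr = union of inlines(s)
    n = len(sr)
    total_impact = sum(imp for _, imp in sr)
    return {
        "in_average": sum(cnt[c] for cnt, _ in sr) / n,
        "in_n_average": sum(cnt[c] * imp for cnt, imp in sr) / n,
        "pop_in": sum(1 for cnt, _ in sr if cnt[c] > 0) / n,
        "impact_in": sum(imp for cnt, imp in sr if cnt[c] > 0) / total_impact,
    }


sites = [
    {"impact": 1.0, "inline": [Counter({"ForInStatement": 2}), Counter({"Identifier": 5})]},
    {"impact": 0.5, "inline": [Counter({"ForInStatement": 1})]},
]
print(in_stats("ForInStatement", sites))
```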

For the other categories of scripts, the formulas are exactly the same, applied to the corresponding set of scripts and their impacts.