These formulas are based on our large-scale crawl of the top 10,000 Alexa sites, from which we extracted and analyzed JavaScript libraries and code.
In a set of sites, the popularity of a library is simply the fraction of sites using it.
$$\begin{equation} pop(src) = \frac{\sum_{s \in S} using(s, src)}{|S|} \quad \text{where } s \text{ is a site} \end{equation}$$

and

$$using(s, src) = \begin{cases} 1 & \quad \text{if site } s \text{ is using the library } src\\ 0 & \quad \text{otherwise} \end{cases}$$

The impact of a library is its popularity weighted by the impact of the sites using it:
$$\begin{equation} \label{impact_script} impact(src) = \frac{\sum_{s \in S} impact(s) * using(s, src)}{\sum_{s \in S} impact(s)} \end{equation}$$

The impact of a site is a number in the half-open interval $(0, 1]$, assigned to each site based on its rank:
$$\begin{equation*} \label{impact_site} impact(s) = 1 - \frac{rank(s) - 1}{|S|} \end{equation*}$$

where $rank(s)$ is the rank of site $s$ among the top Alexa sites.

Let's consider the example of 2 libraries, A and B, with the same popularity (10%) in a set of 10,000 sites. The only difference is that A is used by the sites whose ranks are in the interval [1, 1000] (the highest-ranked sites), while B is used by the sites whose ranks are in the interval [9001, 10,000] (the lowest-ranked sites).

Notice that impact is a generalization of popularity: if we assume that all the sites have the same impact, then the impact and the popularity of a library coincide.

The following table shows the large difference between the impacts of libraries A and B: the impact of A is roughly 19 times that of B, because A is used by the highest-ranked sites and B by the lowest-ranked ones.

Library | Popularity | Impact |
---|---|---|
A | 10% | 18.99910% |
B | 10% | 1.00089% |
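
The figures in the table can be checked with a short script. The following Python sketch is an illustration only: the function names `site_impact`, `popularity` and `library_impact` are ours and are not part of the original tooling, and each site is identified simply by its rank.

```python
def site_impact(rank, n_sites):
    """impact(s) = 1 - (rank(s) - 1) / |S|, a value in (0, 1]."""
    return 1 - (rank - 1) / n_sites


def popularity(users, n_sites):
    """Fraction of the sites that use the library."""
    return len(users) / n_sites


def library_impact(users, impacts):
    """Popularity weighted by site impact.

    users   -- set of ranks of the sites using the library
    impacts -- dict mapping each site's rank to impact(s)
    """
    return sum(impacts[r] for r in users) / sum(impacts.values())


N = 10_000  # size of the site set, as in the example above
impacts = {rank: site_impact(rank, N) for rank in range(1, N + 1)}

lib_a = set(range(1, 1001))      # A: sites ranked 1..1000
lib_b = set(range(9001, N + 1))  # B: sites ranked 9001..10,000

impact_a = library_impact(lib_a, impacts)
impact_b = library_impact(lib_b, impacts)
print(f"A: popularity {popularity(lib_a, N):.0%}, impact {impact_a:.4%}")
print(f"B: popularity {popularity(lib_b, N):.0%}, impact {impact_b:.4%}")

# Hypothetical uniform weighting: every site gets the same impact,
# so the impact of a library collapses to its plain popularity.
uniform = {rank: 1.0 for rank in impacts}
uniform_impact_a = library_impact(lib_a, uniform)
```

Under these assumptions, the computed impacts agree with the table up to rounding in the last displayed digit, and with uniform site impacts `library_impact` returns the same 10% as `popularity`.
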
The core of our work was to establish the popularity of JavaScript constructs, and thereby to assess whether it is sound, as some researchers do, to consider only a subset of the JavaScript language on the assumption that the other constructs are rarely used. Recall that we have downloaded inline and remote JavaScript files for the top 10,000 Alexa sites.
In order to analyze JavaScript constructs, we used the open source JavaScript parser Esprima. The goal of parsing each JavaScript file is to build a list of the constructs used in it, together with how many times each of them occurs. This count is called the occurrence of a construct in the script. We denote it as follows:
$occurrence(c, src) \text{ where } c \text{ is a construct and } src \text{ a script}$

If we consider a set of scripts, we can calculate how many times, on average, the construct appears per script. We call this the average occurrence of the construct in the set of scripts. It is given by the following formula.
$$\begin{equation} average(c, Sr) = \frac{\sum_{src \in Sr} occurrence(c, src)}{|Sr|} \text{ where } Sr \text{ is a set of scripts} \end{equation}$$

In the same way, considering a set of scripts, we can compute the fraction of scripts that use a construct at least once. We call this the $\textbf{popularity of a construct}$ in a set of scripts, which is given by the following formula.
$$\begin{equation} pop(c, Sr) = \frac{\sum_{src \in Sr} using(c, src)}{|Sr|} \text{ where } Sr \text{ is a set of scripts} \end{equation}$$

and

$$using(c, src) = \begin{cases} 1 & \quad \text{if } occurrence(c, src) > 0\\ 0 & \quad \text{otherwise} \end{cases}$$

Since each script has an impact, or can be assigned one, we can weight the occurrence, the average occurrence and the popularity of a construct by the impact of the script(s) in which it is used. We obtain the following:
The normalized occurrence of a construct is simply its occurrence weighted by the impact of the script in which it is used. It is given by the following formula.
$$\begin{equation*} n\_occurrence(c, src) = occurrence(c, src) * impact(src) \end{equation*}$$

The normalized average occurrence of a construct is its average occurrence weighted by the impact of the scripts in which it is used.
$$\begin{equation*} n\_average(c, Sr) = \frac{\sum_{src \in Sr} n\_occurrence(c, src)}{|Sr|} \end{equation*}$$

Intuitively, the impact of a construct is its popularity weighted by the impact of the scripts in which it is used.
$$\begin{equation*} impact(c, Sr) = \frac{\sum_{src \in Sr} impact(src) * using(c, src)}{\sum_{src \in Sr} impact(src)} \end{equation*}$$

In the previous section, we presented formulas to calculate statistics of a construct in a set of scripts. In the following sections, we apply those formulas to scripts extracted from the top 10,000 sites, classified into 5 categories: inline, local, related, remote and all scripts. In order to apply those formulas to any category of scripts, one needs to answer 2 main questions:
Below, we consider each category of scripts, answering the 2 fundamental questions above.
Here, we give an example of how we applied the formulas to inline scripts. The process is the same for all other categories of scripts. We introduce an auxiliary function inlines(s) that returns the set of inline scripts of a site s, S being the set of sites. The function in_average gives the average occurrence, in_n_average the normalized average occurrence, pop_in the popularity, and impact_in the impact of a construct in the set of the top sites' inline scripts.
$$\begin{equation*} in\_average(c, Sr) = average(c, Sr) \end{equation*}$$

$$\begin{equation*} in\_n\_average(c, Sr) = n\_average(c, Sr) \end{equation*}$$

$$\begin{equation*} pop\_in(c, Sr) = pop(c, Sr) \end{equation*}$$

$$\begin{equation*} impact\_in(c, Sr) = impact(c, Sr) \end{equation*}$$

where $Sr = \bigcup_{s \in S} inlines(s)$.

For the other categories of scripts, the formulas are exactly the same, applied to the corresponding set of scripts and their impacts.
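
The construct statistics defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used for the crawl: it assumes the parser output is an ESTree-style tree in which every node carries a `type` field (the format Esprima emits), it uses two tiny hand-written trees in place of real parser output, and it assumes, purely for the example, that an inline script inherits the impact of its site. The site names and impact values are made up.

```python
from collections import Counter


def count_constructs(node, counts=None):
    """occurrence(c, src) for every construct c: walk an ESTree-style AST
    (nested dicts and lists) and count nodes by their "type" field."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if isinstance(node.get("type"), str):
            counts[node["type"]] += 1
        for child in node.values():
            count_constructs(child, counts)
    elif isinstance(node, list):
        for child in node:
            count_constructs(child, counts)
    return counts


# In the functions below, each script is a pair (occurrences, impact(src)).
def average(c, scripts):
    return sum(occ[c] for occ, _ in scripts) / len(scripts)


def pop(c, scripts):
    return sum(1 for occ, _ in scripts if occ[c] > 0) / len(scripts)


def n_average(c, scripts):
    return sum(occ[c] * imp for occ, imp in scripts) / len(scripts)


def impact(c, scripts):
    used = sum(imp for occ, imp in scripts if occ[c] > 0)
    return used / sum(imp for _, imp in scripts)


# Hand-written trees standing in for parser output: `eval(x)` and `x`.
ast_call = {"type": "Program", "body": [
    {"type": "ExpressionStatement", "expression": {
        "type": "CallExpression",
        "callee": {"type": "Identifier", "name": "eval"},
        "arguments": [{"type": "Identifier", "name": "x"}]}}]}
ast_plain = {"type": "Program", "body": [
    {"type": "ExpressionStatement",
     "expression": {"type": "Identifier", "name": "x"}}]}

# Hypothetical data: inlines(s) for two sites, with made-up site impacts.
inlines = {"site-1": [ast_call], "site-2": [ast_plain]}
impact_of_site = {"site-1": 1.0, "site-2": 0.5}

# Sr is the union of inlines(s) over all sites; each inline script is
# assumed to inherit the impact of the site it belongs to.
Sr = [(count_constructs(ast), impact_of_site[s])
      for s, asts in inlines.items() for ast in asts]

in_average = average("Identifier", Sr)
in_n_average = n_average("Identifier", Sr)
pop_in = pop("CallExpression", Sr)
impact_in = impact("CallExpression", Sr)
```

Applying the same functions to another category of scripts only requires building `Sr` from that category's scripts and their impacts.
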