Welcome to my visualisation on COVID-19 Mutations Data Analysis.

This visualisation is about the spread of different SARS-CoV-2 variants in the United Kingdom, between Jan 2020 and Feb 2021.

You can view the data by month or by week.

The two charts are linked by the same data. Hover your mouse over a bar or a circle, and the other chart will respond too. The legend works the same way.

This paragraph shows the last selected bar/circle:
Date: ______ Clade: ______ Count: ______

Data source: Nextstrain

How the data is processed: (This part is also included in the PDF.)

All data processing is done in JavaScript. The original data from Nextstrain is a .tsv (tab-separated values) file. Each row of the file is a separate genome sample record with many attributes. Only two of them matter here: the date when the sample was collected, and the clade that the identified virus belongs to. The file is read using d3.tsv(), which returns a JavaScript Promise. The actual data is a list (JS Array) of objects with two attributes, date and clade, and can be accessed in the Promise's .then() callback. The date attribute is also transformed into a string representing either a month (yyyy-mm) or a week (yyyy-Www).
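The date transformation could look roughly like the sketch below. The helper names toMonthKey and toWeekKey are hypothetical, and the sketch assumes that every date is a complete yyyy-mm-dd string and that weeks follow the ISO 8601 numbering; the original code may handle either point differently.

```javascript
// Hypothetical helpers, not taken from the original code.
// Assumes complete dates in yyyy-mm-dd form.

// "2020-03-15" -> "2020-03"
function toMonthKey(dateStr) {
  return dateStr.slice(0, 7);
}

// "2020-03-15" -> "2020-W11" (ISO 8601 week numbering, an assumption)
function toWeekKey(dateStr) {
  const [y, m, d] = dateStr.split("-").map(Number);
  const date = new Date(Date.UTC(y, m - 1, d));
  // ISO weekday: Monday = 1 ... Sunday = 7
  const weekday = date.getUTCDay() || 7;
  // Move to the Thursday of the same ISO week; its year is the ISO week-year.
  date.setUTCDate(date.getUTCDate() + 4 - weekday);
  const isoYear = date.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const week = Math.ceil(((date - yearStart) / 86400000 + 1) / 7);
  return `${isoYear}-W${String(week).padStart(2, "0")}`;
}

console.log(toMonthKey("2020-03-15")); // 2020-03
console.log(toWeekKey("2020-03-15"));  // 2020-W11
console.log(toWeekKey("2021-01-03"));  // 2020-W53
```

Note that with ISO weeks, the first days of January can belong to the last week of the previous year, as the third call shows.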

The data is then grouped first by month/week, and second by clade, using d3.group(), and the number of samples in each group is counted. This step also sorts the dates in ascending order, and the clades in order of appearance.
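As a rough illustration of this grouping step, here is a sketch in vanilla JavaScript that builds the same kind of nested Map (date, then clade) without depending on D3. The function name countByDateAndClade and the sample records are made up for the example, and "order of appearance" is assumed to mean first appearance once the records are sorted by date.

```javascript
// Hypothetical sketch; the page itself uses d3.group() for this step.
function countByDateAndClade(records) {
  // Sort a copy by date key; yyyy-mm and yyyy-Www strings sort chronologically.
  const sorted = [...records].sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : 0
  );
  const counts = new Map(); // date -> Map(clade -> count)
  const cladeOrder = [];    // clades in order of first appearance
  for (const { date, clade } of sorted) {
    if (!counts.has(date)) counts.set(date, new Map());
    const byClade = counts.get(date);
    byClade.set(clade, (byClade.get(clade) || 0) + 1);
    if (!cladeOrder.includes(clade)) cladeOrder.push(clade);
  }
  return { counts, cladeOrder };
}

// Made-up sample records, already reduced to { date, clade }.
const sampleRecords = [
  { date: "2020-02", clade: "19A" },
  { date: "2020-01", clade: "19A" },
  { date: "2020-02", clade: "20A" },
  { date: "2020-02", clade: "19A" },
];
const grouped = countByDateAndClade(sampleRecords);
console.log([...grouped.counts.keys()]);             // [ '2020-01', '2020-02' ]
console.log(grouped.counts.get("2020-02").get("19A")); // 2
console.log(grouped.cladeOrder);                     // [ '19A', '20A' ]
```

A Map keeps its keys in insertion order, which is why sorting the records first is enough to get the dates in ascending order.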

To draw the stacked bars, the monthly/weekly counts are then accumulated within each date group to compute a "baseline" for each bar. This step is similar to d3.stack() but is written in vanilla JavaScript. The baselines serve as the Y positions of the stacked bars. Padding is added to the bars for clarity.
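The baseline computation can be sketched as a running total within each date group. The function name stackCounts, the y0/y1 field names, and the sample counts below are all hypothetical; the values are in data units, before any scale or padding is applied.

```javascript
// Hypothetical sketch of a hand-written stacking step.
// counts: Map(date -> Map(clade -> count)); cladeOrder: array of clade names.
function stackCounts(counts, cladeOrder) {
  const segments = [];
  for (const [date, byClade] of counts) {
    let baseline = 0; // restarts at zero for every date group
    for (const clade of cladeOrder) {
      const count = byClade.get(clade) || 0;
      if (count === 0) continue; // no segment for clades absent in this period
      segments.push({ date, clade, count, y0: baseline, y1: baseline + count });
      baseline += count;
    }
  }
  return segments;
}

// Made-up counts for two months.
const monthlyCounts = new Map([
  ["2020-01", new Map([["19A", 3]])],
  ["2020-02", new Map([["19A", 2], ["20A", 5]])],
]);
const segments = stackCounts(monthlyCounts, ["19A", "20A"]);
console.log(segments[2]);
// { date: '2020-02', clade: '20A', count: 5, y0: 2, y1: 7 }
```

Each segment starts where the previous one in the same date group ended, so y0 is the baseline and y1 - y0 is the segment height.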

To draw the bubble chart, the X and Y positions of each circle are determined by the date and the clade of each group, respectively. The radius of the circle is computed from the sample count using Math.sqrt(). The radius is also clamped to a minimum value of 2 to avoid circles that appear too small.
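A minimal sketch of the radius rule, assuming the radius is derived from the sample count of each group. The function name bubbleRadius and the scale parameter are hypothetical; only Math.sqrt() and the minimum of 2 come from the description above.

```javascript
// Hypothetical sketch of the radius computation.
const MIN_RADIUS = 2;

function bubbleRadius(count, scale = 1) {
  // The square root makes the circle's area, not its radius,
  // grow in proportion to the count.
  return Math.max(MIN_RADIUS, Math.sqrt(count) * scale);
}

console.log(bubbleRadius(1));        // 2 (clamped to the minimum)
console.log(bubbleRadius(100));      // 10
console.log(bubbleRadius(400, 0.5)); // 10
```

Without the square root, a group with 100 times more samples would cover 10,000 times more area, which exaggerates the differences between groups.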