Welcome to my visualisation on COVID-19 Mutations Data Analysis.
This visualisation is about the spread of different SARS-CoV-2 variants in the United Kingdom between January 2020 and February 2021.
The two charts are linked because they share the same data. Hover your mouse over a bar or circle, and the other chart responds as well. The legend works in the same way.
This paragraph shows the last selected bar/circle:
Date: ______
Clade: ______
Count: ______
Data source: Nextstrain
How the data is processed: (This part is also included in the PDF.)
All data processing is done in JavaScript. The original data from Nextstrain is a .tsv (tab-separated values) file in which each row is a separate genome sample record with many attributes. Only two attributes are of concern here: the date on which the sample was collected, and the clade that the identified virus belongs to. The file is read using d3.tsv(), which returns a JavaScript Promise. The actual data, a list (JS Array) of objects with the two attributes date and clade, is available in the callback passed to the Promise's .then() method.
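As a rough illustration of this loading step, the sketch below parses a small TSV string in plain JavaScript and keeps only date and clade, which is what passing a row-conversion function to d3.tsv() would achieve. It assumes the header row names these columns date and clade (hypothetical; the real Nextstrain file may label its clade column differently), and the sample records are made up. Promise.resolve(...).then(...) only mimics the asynchronous access; this is not the author's code.

```javascript
// Plain-JavaScript stand-in for d3.tsv() with a row-conversion function.
// Assumption: the header row has columns named "date" and "clade".
function parseSamples(tsvText) {
  const lines = tsvText.trim().split(/\r?\n/);
  const header = lines[0].split("\t");
  const dateIdx = header.indexOf("date");
  const cladeIdx = header.indexOf("clade");
  // Keep only the two attributes of interest from each record.
  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    return { date: cells[dateIdx], clade: cells[cladeIdx] };
  });
}

// Hypothetical file contents: three columns, two records.
const tsvText = "strain\tdate\tclade\nA\t2020-03-02\t20A\nB\t2020-03-09\t20B\n";

// As with d3.tsv(), the parsed rows are only available inside .then().
Promise.resolve(tsvText)
  .then(parseSamples)
  .then((data) => console.log(data));
```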
The date attribute is also transformed into a string representing either a month (yyyy-mm) or a week (yyyy-Www).
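A minimal sketch of the key conversion, assuming each sample date is a complete yyyy-mm-dd string (records with incomplete dates would need filtering first) and that weeks follow ISO 8601 numbering, as the yyyy-Www notation suggests. The function names are hypothetical; with D3 loaded, d3.utcFormat("%G-W%V") should produce the same week key.

```javascript
// Month key: the first seven characters of "yyyy-mm-dd".
function toMonthKey(dateStr) {
  return dateStr.slice(0, 7);
}

// ISO 8601 week key "yyyy-Www" (weeks start on Monday; week 1 contains
// the year's first Thursday).
function toWeekKey(dateStr) {
  // Work in UTC so local time zones cannot shift the date.
  const d = new Date(dateStr + "T00:00:00Z");
  // ISO weekday: Monday = 1 ... Sunday = 7.
  const day = d.getUTCDay() || 7;
  // Move to the Thursday of the same week; its year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - day);
  const year = d.getUTCFullYear();
  const jan1 = Date.UTC(year, 0, 1);
  const week = Math.ceil(((d - jan1) / 86400000 + 1) / 7);
  return `${year}-W${String(week).padStart(2, "0")}`;
}

console.log(toMonthKey("2020-03-15")); // "2020-03"
console.log(toWeekKey("2021-01-01")); // "2020-W53"
```

Note that 1 January 2021 falls in the last ISO week of 2020, which is why the week key carries its own year rather than reusing the calendar year.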
The data is then grouped first by month/week and then by clade using d3.group(), and the number of samples in each group is counted. In this step the dates are also sorted in ascending order, while the clades are kept in their order of first appearance.
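The grouping and counting can be sketched with plain Maps, giving roughly the nested shape that d3.rollup(data, v => v.length, d => d.date, d => d.clade) returns. This is an illustration under assumptions, not the author's implementation: the input is assumed to be the array of { date, clade } objects after the date keys have been computed, and the function name is hypothetical.

```javascript
// Group { date, clade } samples by date key, then by clade, and count each group.
function countByDateAndClade(samples) {
  const byDate = new Map(); // date key -> Map(clade -> count)
  for (const { date, clade } of samples) {
    if (!byDate.has(date)) byDate.set(date, new Map());
    const byClade = byDate.get(date);
    // A Map keeps insertion order, so clades stay in order of first appearance.
    byClade.set(clade, (byClade.get(clade) ?? 0) + 1);
  }
  // Zero-padded "yyyy-mm" and "yyyy-Www" keys sort chronologically as strings.
  const sortedDates = [...byDate.keys()].sort();
  return new Map(sortedDates.map((date) => [date, byDate.get(date)]));
}

// Usage with hypothetical samples.
const counts = countByDateAndClade([
  { date: "2020-04", clade: "20B" },
  { date: "2020-03", clade: "20A" },
  { date: "2020-04", clade: "20A" },
  { date: "2020-04", clade: "20B" },
]);
console.log(counts);
```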
To draw the stacked bars, the counts within each date group are accumulated to compute a "baseline" for each bar, i.e. the total count of the bars stacked below it. This step is similar to d3.stack() but is written in vanilla JavaScript. The baselines serve as the Y positions of the stacked bars, and padding is added to the bars for clarity.
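A minimal vanilla JavaScript sketch of the baseline computation for one date group, assuming the group's counts are given as a Map from clade to count. Each segment's baseline y0 is the running total of the counts below it, similar to the lower bound d3.stack() computes. The pixel conversion, including chartHeight, pxPerSample and the padding value, is hypothetical and only shows one way padding could be applied.

```javascript
// Running-total baselines for the stacked segments of one date group.
function stackBaselines(cladeCounts) {
  let baseline = 0;
  const segments = [];
  for (const [clade, count] of cladeCounts) {
    // y0 = sum of the counts below this segment; y1 = its top edge.
    segments.push({ clade, count, y0: baseline, y1: baseline + count });
    baseline += count;
  }
  return segments;
}

// Hypothetical conversion to SVG rect attributes (y grows downwards).
// `padding` shortens each segment so neighbouring segments do not touch.
function toRect(segment, chartHeight, pxPerSample, padding = 1) {
  const y = chartHeight - segment.y1 * pxPerSample;
  const height = Math.max(0, segment.count * pxPerSample - padding);
  return { y, height };
}

const segments = stackBaselines(new Map([["20A", 3], ["20B", 5]]));
console.log(segments.map((s) => [s.clade, s.y0, s.y1])); // [["20A", 0, 3], ["20B", 3, 8]]
console.log(toRect(segments[1], 100, 2)); // { y: 84, height: 9 }
```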
To draw the bubble chart, the X and Y positions of each circle are given by the date and the clade of its group, respectively. The radius of each circle is computed using Math.sqrt() of the sample count, so that the circle's area is proportional to the count. The radius is also clamped to a minimum value of 2 to avoid circles that appear too small.
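The bubble layout can be sketched as follows, assuming a nested Map of counts (date key → clade → count) and evenly spaced positions. The function names and the step and scale parameters are hypothetical; only the use of Math.sqrt() and the minimum radius of 2 come from the description above.

```javascript
// Square root of the count makes the circle's area, not its radius,
// proportional to the count; the radius never drops below minRadius.
function bubbleRadius(count, scale = 1, minRadius = 2) {
  return Math.max(minRadius, scale * Math.sqrt(count));
}

// One circle per (date, clade) group: x from the date's index,
// y from the clade's index, `step` pixels apart (hypothetical spacing).
function layoutBubbles(counts, step = 20, scale = 1) {
  const dates = [...counts.keys()];
  const clades = [];
  for (const byClade of counts.values()) {
    for (const clade of byClade.keys()) {
      if (!clades.includes(clade)) clades.push(clade);
    }
  }
  const circles = [];
  for (const [date, byClade] of counts) {
    for (const [clade, count] of byClade) {
      circles.push({
        date,
        clade,
        count,
        cx: (dates.indexOf(date) + 0.5) * step,
        cy: (clades.indexOf(clade) + 0.5) * step,
        r: bubbleRadius(count, scale),
      });
    }
  }
  return circles;
}

const circles = layoutBubbles(
  new Map([
    ["2020-03", new Map([["20A", 4]])],
    ["2020-04", new Map([["20A", 1], ["20B", 16]])],
  ])
);
console.log(circles);
```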