Yesterday I had a Zoom session with a friend to show him around the Wolfram Language, and one of the questions we discussed was whether there is a correlation between death rate per capita and population density for US states. We could hypothesize that there should be one: in essence, higher population density means higher rates of infection, regardless of what the official numbers of confirmed cases say, as testing protocols can differ between states.

Population density for the continental US states

For a warmup, let’s plot the population densities for the continental US states, which is easy to do as this information is available as one of the AdministrativeDivision properties. We can use the convenient EntityValue function, with a proper EntityClass as the first argument, which can be entered by pressing Ctrl+=, typing “us states”, and hitting Enter. You can think of an EntityClass as a collection of entities of a particular type.

image1.png

Since we provided "EntityAssociation" as the third (optional) argument to EntityValue, the result is an association that maps each US state entity to its population density.

image2.png

image3.png
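For readers who cannot see the screenshots, the call described above can be sketched in text form. The class name "AllUSStatesPlusDC" is an assumption about what Ctrl+= resolves “us states” to, and the variable names are made up for illustration. It needs access to the Wolfram Knowledgebase to run.

```wolfram
(* Assumption: "AllUSStatesPlusDC" is the class behind the "us states" free-form input *)
usStates = EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"];

(* The third argument asks for an association: state entity -> population density *)
densities = EntityValue[usStates, "PopulationDensity", "EntityAssociation"];

(* Look at the first few entries *)
Take[densities, 3]
```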

Notice that the population density is not just a plain number, but a number with a unit (people per square mile), represented in the Wolfram Language by an expression with the Quantity head.

image4.png

image5.png

Here, % refers to the result of the previous expression, and FullForm shows how the above expression is stored internally. The front end does a charming job of displaying the entity, quantity, rule, and association nicely.

Having the population density represented as a quantity is very convenient, since we can, for example, convert it into different units:

image6.png

image7.png
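As a small text sketch of the same idea, here is a unit conversion on a stand-alone quantity. The value 100 is purely illustrative and is not the density of any particular state.

```wolfram
(* Illustrative value only, not taken from the Wolfram Knowledgebase *)
d = Quantity[100., "People"/"Miles"^2];

(* Convert to people per square kilometer *)
dKm = UnitConvert[d, "People"/"Kilometers"^2]

(* Strip the unit to get a plain number, roughly 38.6 *)
QuantityMagnitude[dKm]
```

Because the unit travels with the number, the conversion factor never has to be looked up by hand.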

Before we can display these densities on a heat map, we need to do a couple of things. First, we want to remove Alaska and Hawaii, and also DC (as it’s a clear outlier). Second, we want to apply Log to the numerical values, as the difference in densities between tiny Delaware and sparsely populated Wyoming is over 60x!

The first task can be done with KeyDropFrom, which drops the given keys from an association (you can copy and paste the names of the states from the output, or use Ctrl+= again to enter them). The second task can be accomplished with a combination of QuantityMagnitude, which extracts the numerical value from a Quantity, and Log. Both of these functions are Listable (that is, they can be applied to a list or association without an explicit call to Map), and Map[f, expr] applies f to the values when expr is an association. Finally, GeoRegionValuePlot will color the states according to the log of their population density:

image8.png

image9.png
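The two steps can be sketched in text as follows. This version uses KeyDrop, the non-mutating relative of KeyDropFrom, so the original association stays intact. The entity class name and the canonical entity names are assumptions, and the variable names are invented for illustration.

```wolfram
(* Assumption: "AllUSStatesPlusDC" is the class behind the "us states" free-form input *)
densities = EntityValue[
   EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"],
   "PopulationDensity", "EntityAssociation"];

(* Assumed canonical names for the three entities to drop *)
excluded = {
   Entity["AdministrativeDivision", {"Alaska", "UnitedStates"}],
   Entity["AdministrativeDivision", {"Hawaii", "UnitedStates"}],
   Entity["AdministrativeDivision", {"DistrictOfColumbia", "UnitedStates"}]};

(* KeyDrop returns a new association without the excluded keys *)
remaining = KeyDrop[densities, excluded];

(* QuantityMagnitude and Log thread over the association's values *)
logDensities = Log[QuantityMagnitude[remaining]];

(* Color each state by the log of its population density *)
GeoRegionValuePlot[logDensities]
```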

We can combine the above two steps into a function that, given an entity list of countries or administrative divisions (anything that has a population density property), produces the same kind of plot. The optional exclude argument can be used to remove certain areas from the plot.

image10.png

image11.png

image12.png

image13.png

image14.png
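One possible shape for such a function is sketched below. It is a reconstruction from the description, not necessarily the definition shown in the screenshots. The argument pattern and the DeleteMissing step (which guards against entities with no recorded density) are assumptions.

```wolfram
(* A sketch, assuming `entities` is an EntityClass or a list of entities
   with a "PopulationDensity" property; `exclude` defaults to an empty list *)
plotPopulationDensities[entities_, exclude_List : {}] :=
 Module[{densities},
  (* entity -> density association, minus the excluded areas *)
  densities = KeyDrop[
    EntityValue[entities, "PopulationDensity", "EntityAssociation"],
    exclude];
  (* Drop entities without data, then plot the log of the magnitudes *)
  GeoRegionValuePlot[Log[QuantityMagnitude[DeleteMissing[densities]]]]]
```

With this definition, a call such as plotPopulationDensities[EntityClass["Country", "Europe"]] would plot European countries, assuming that entity class name is the one you want.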

Note that we can also use a collection of countries as an argument to plotPopulationDensities, since each country is an Entity object that also has a PopulationDensity property:

image15.png

image16.png

image17.png

image18.png

Death rates per capita vs population density

Now for the main dish: we’re going to use the same dataset as in our earlier blog post on this topic, and use the fact that the AdministrativeDivision column returned by ResourceData contains Entity objects:

image19.png

Let’s take the first row of this dataset as an example (Normal converts a one-element dataset into an association):

image20.png

image21.png

Let’s write a function that calculates two values: population density and COVID-19 deaths per capita. We also wrap these two values in a Callout, so that ListPlot can label each point:

image22.png

image23.png

image24.png
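A function along these lines could look like the sketch below. Since the dataset's column names are not spelled out in the text, this version takes the state entity and the death count as plain arguments. The name densityVsDeaths and the use of the "Population" and "Name" properties are assumptions for illustration.

```wolfram
(* A sketch: returns a labeled point {log density, log deaths per capita}.
   `state` is an AdministrativeDivision entity, `deaths` a non-negative count *)
densityVsDeaths[state_Entity, deaths_] :=
 Callout[
  {Log[QuantityMagnitude[state["PopulationDensity"]]],
   Log[deaths/QuantityMagnitude[state["Population"]]]},
  state["Name"]]

(* Hypothetical usage with a made-up death count *)
densityVsDeaths[
 Entity["AdministrativeDivision", {"NewYork", "UnitedStates"}], 1000]
```

Note that a state with zero deaths yields Log[0], which evaluates to minus infinity, so such points need special handling later.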

Now we can apply this function to each row of the dataset (we’ll also remove a couple of outliers) and plot the result:

image25.png

image26.png

image27.png

Visually, there is a weak correlation between log population density and log death rate per capita, as we expected. Let’s calculate the R-squared explicitly. We can use Cases to extract just the first argument from each Callout. There are two more subtle points: we apply N to the callouts, which calculates the numerical values of all the logarithms, and we only select cases where the second value is Real, since for Wyoming the log of deaths per capita is minus infinity.

image28.png

image29.png

image30.png

image31.png
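The extraction step can be illustrated on a few hypothetical callouts with made-up numbers. The fit here uses LinearModelFit, which is one way to obtain the R-squared and is an assumption about the method. For a straight-line fit it agrees with the squared correlation coefficient.

```wolfram
(* Hypothetical callouts with made-up values; "D" mimics a state with zero deaths *)
callouts = {
   Callout[{Log[10], Log[1/1000]}, "A"],
   Callout[{Log[100], Log[3/1000]}, "B"],
   Callout[{Log[50], Log[1/500]}, "C"],
   Callout[{Log[5], Log[0]}, "D"]};

(* N turns the symbolic logs into reals; -Infinity is not Real, so "D" is skipped *)
points = Cases[N[callouts], Callout[{x_, y_Real}, _] :> {x, y}];

(* Fit y = a + b x and read off the coefficient of determination *)
lm = LinearModelFit[points, x, x];
rSquared = lm["RSquared"]

(* Cross-check: for simple linear regression this equals the squared correlation *)
Correlation[points[[All, 1]], points[[All, 2]]]^2
```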

The R-squared is pretty low, though, at only 22%:

image32.png

image33.png

Finally, let’s plot the fitted line on the chart:

image34.png

image35.png
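Overlaying a fitted line on a scatter plot can be sketched as below, here with a handful of made-up points standing in for the real data. The variable names and axis labels are illustrative assumptions.

```wolfram
(* Made-up points standing in for {log density, log deaths per capita} *)
pts = {{1., 2.1}, {2., 2.9}, {3., 4.2}, {4., 4.8}};

(* Straight-line fit; `fit` can be applied like a function of x *)
fit = LinearModelFit[pts, x, x];

(* Combine the scatter plot and the fitted line in one graphic *)
Show[
 ListPlot[pts, AxesLabel -> {"log density", "log deaths per capita"}],
 Plot[fit[x], {x, Min[pts[[All, 1]]], Max[pts[[All, 1]]]}]]
```

Show keeps the options of its first argument, so the axis labels from ListPlot carry over to the combined plot.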

Download this notebook