
Templates for Bayesian Regressions

2018 April 17

At the Sustainable Development Research (SusDeveR) conference this weekend, I offered some simple tools for performing Bayesian regressions, available in a GitHub repository.

The point of these templates is to make it possible for anyone who is familiar with OLS to run a Bayesian regression. The templates have a chunk at the top to change for your application, and a chunk at the bottom that uses Gelman et al.’s Stan to estimate the posterior parameter distributions.

In general, the area at the top is just to create an output vector and a predictor matrix. Like this:
Constructing yy and XX
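The templates' own code for this step isn't reproduced here. As a minimal sketch, assuming NumPy and a hypothetical `build_design` helper that takes the predictors as a dictionary of columns, building yy and XX with an intercept column (the OLS constant term) might look like this:

```python
import numpy as np

def build_design(y_values, predictors, add_intercept=True):
    """Return the outcome vector yy and predictor matrix XX for a regression.

    y_values: sequence of N outcomes.
    predictors: dict mapping predictor name -> sequence of N values
    (a hypothetical input layout chosen for this sketch).
    """
    yy = np.asarray(y_values, dtype=float)
    columns = [np.asarray(col, dtype=float) for col in predictors.values()]
    if add_intercept:
        # A column of ones plays the role of the OLS constant term
        columns.insert(0, np.ones_like(yy))
    XX = np.column_stack(columns)
    if XX.shape[0] != yy.shape[0]:
        raise ValueError("yy and XX must have the same number of rows")
    return yy, XX

# Hypothetical data: three observations, two predictors
yy, XX = build_design([1.0, 2.0, 3.5], {"rain": [10, 20, 30], "temp": [15, 16, 14]})
print(XX.shape)  # (3, 3)
```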

The template part has all of the Stan code, which (for a Bayesian regression) always has a simple form:
Simple Stan regression model

The last line does all of the work, and just says (in OLS speak) that the error distribution follows a normal distribution. Most of the templates also have a more efficient version, which does the same thing.
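To make that "errors are normal" statement concrete without installing Stan, here is a minimal sketch that samples the posterior of a model of the form yy ~ normal(XX * beta, sigma). It is a stand-in, not the templates' code: it swaps in a plain random-walk Metropolis sampler (Stan itself uses Hamiltonian Monte Carlo), assumes flat priors on beta and log(sigma), and runs on synthetic data.

```python
import numpy as np

def log_posterior(beta, log_sigma, yy, XX):
    """Log posterior, up to a constant, for yy ~ normal(XX @ beta, sigma).

    Assumes flat priors on beta and on log(sigma).
    """
    sigma = np.exp(log_sigma)
    resid = yy - XX @ beta
    return -len(yy) * log_sigma - 0.5 * (resid @ resid) / sigma**2

def metropolis(yy, XX, n_iter=20000, seed=0):
    """Random-walk Metropolis sampler; returns draws of (beta..., sigma)."""
    rng = np.random.default_rng(seed)
    n, k = XX.shape
    # Start at the OLS solution to shorten burn-in
    beta = np.linalg.lstsq(XX, yy, rcond=None)[0]
    resid = yy - XX @ beta
    log_sigma = np.log(np.sqrt(resid @ resid / n))
    # Shape the proposal with the OLS covariance so every direction mixes
    s2 = resid @ resid / (n - k)
    chol = np.linalg.cholesky(s2 * np.linalg.inv(XX.T @ XX))
    scale = 2.38 / np.sqrt(k + 1)
    current = log_posterior(beta, log_sigma, yy, XX)
    draws = np.empty((n_iter, k + 1))
    for i in range(n_iter):
        prop_beta = beta + scale * chol @ rng.standard_normal(k)
        prop_ls = log_sigma + scale * np.sqrt(0.5 / n) * rng.standard_normal()
        proposed = log_posterior(prop_beta, prop_ls, yy, XX)
        # Accept with probability min(1, exp(proposed - current))
        if np.log(rng.uniform()) < proposed - current:
            beta, log_sigma, current = prop_beta, prop_ls, proposed
        draws[i, :k] = beta
        draws[i, k] = np.exp(log_sigma)
    return draws

# Synthetic example: intercept 1.0, slope 0.3, noise sd 0.5
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
XX = np.column_stack([np.ones_like(x), x])
yy = 1.0 + 0.3 * x + rng.normal(0, 0.5, size=200)

posterior = metropolis(yy, XX)[5000:]  # drop burn-in
ols = np.linalg.lstsq(XX, yy, rcond=None)[0]
print("posterior means (intercept, slope, sigma):", posterior.mean(axis=0))
print("OLS estimates (intercept, slope):", ols)
print("95% interval for the slope:", np.percentile(posterior[:, 1], [2.5, 97.5]))
```

With flat priors and well-behaved synthetic data like this, the posterior means land essentially on the OLS estimates. The payoff of keeping the full set of draws comes when the posterior is skewed, long-tailed, or multimodal, because the draws show that shape directly.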

I say in the README what Bayesian regressions are and what they do. But why use them? The simple answer is that we shouldn’t expect the uncertainty on our parameters to be well-behaved. It’s nice if it is, and then OLS and Bayesian regressions will give the same answer. But if the true uncertainty on your parameter of interest is skewed or long-tailed or bimodal, the OLS assumption can do some real harm.

Plus, since Bayesian regressions are just a generalization of MLE, you can set up any functional form you like: laying out multiple nonlinear expressions, estimating intermediate variables, and imposing additional probabilistic constraints, all in one model. Of course, the templates don’t show you how to do all that, but it’s a start.

Water-Energy-Food Flows

2018 February 25

The water-energy-food nexus has become a popular buzz-word in the sustainability field. It aims to capture the idea that water, energy, and food challenges are intertwined, and that shocks to any one can precipitate problems to all three.

I’ve often wondered how closely these three are intertwined, though. Water is certainly needed for energy (for thermoelectric cooling and hydropower), but the reverse link (mostly pumping) seems a lot weaker. Water is also needed for food production, but is food needed for water availability? Energy and food have some links, with a fair amount of energy needed to produce fertilizer and some “food” production actually going to biofuels, but the sizes of these links aren’t clear.

Below is my attempt to show these flows, for the United States:

Water-Energy-Food Flows

It seems to me, based on this, that this is less a nexus than a water-centered system. Every drop of water is fought over for energy, food, and urban production. It’s less an interconnected nexus than a hub with spokes, with water at the center of it all.

– Hydrological flows: Total water (GW+SW) extractions from USGS. Food system only has irrigation and livestock; energy only has thermoelectric. The rest make up the difference.
– Energy system flows: Food system energy from Canning, P. 2010. Energy Use in the U.S. Food System. USDA Economic Research Report Number 94; “In 2010, the U.S. water system consumed over 600 billion kWh, or approximately 12.6 percent of the nation’s energy according to a study by researchers at the University of Texas at Austin.” from “Energy consumption by public drinking water and wastewater utilities, which are primarily owned and operated by local governments, can represent 30%-40% of a municipality’s energy bill.” from; remainder to 100%.
– Biofuels: 18.38e6 m^3 ethanol + 1.7e6 m^3 biodiesel, at a density of 719.7 kg/m^3 is 14.45e6 MT.
– Remainder of food: reports 635 billion pounds consumption, 81% of which was domestically produced.
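The mass conversions in these notes can be checked with a few lines of Python. The biofuel figure reuses the single density given above for both fuels; converting the food figure to metric tons is an extra step not in the original notes, using the exact pound-to-kilogram definition.

```python
# Biofuel volumes (m^3) and the single density used in the note (kg/m^3)
ethanol_m3 = 18.38e6
biodiesel_m3 = 1.7e6
density_kg_per_m3 = 719.7

biofuel_mt = (ethanol_m3 + biodiesel_m3) * density_kg_per_m3 / 1000  # kg -> metric tons
print(f"Biofuels: {biofuel_mt / 1e6:.2f}e6 MT")  # Biofuels: 14.45e6 MT

# Food: 635 billion pounds consumed, 81% of it produced domestically
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound
domestic_food_mt = 635e9 * 0.81 * LB_TO_KG / 1000
print(f"Domestic food: {domestic_food_mt / 1e6:.1f}e6 MT")
```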

Extrapolating the 2017 Temperature

2018 February 5

After NASA released the 2017 global average temperature, I started getting worried. 2017 wasn’t as hot as 2016, but it was well above the trend.

NASA yearly average temperatures and loess smoothed.

Three years above the trend is pretty common, but it makes you wonder: Do we know where the trend is? The convincing curve above is increasing at about 0.25°C per decade, but in the past 10 years, the temperature has increased by almost 0.5°C.

The further back you look, the more certain you are of the average trend, and the less certain of the recent trend. Back to 1900, we’ve been warming at about 0.1°C per decade; over the past 20 years, about 0.2°C per decade; and over the past 10 years, an average of 0.4°C per decade.
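How the trend and its uncertainty depend on the starting year can be sketched with an ordinary least-squares fit over different windows. This uses a synthetic series, not the NASA record, and the helper name `decadal_trend` is hypothetical:

```python
import numpy as np

def decadal_trend(years, temps, start_year):
    """Linear trend since start_year as (trend, standard error), in deg C per decade."""
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    mask = years >= start_year
    coeffs, cov = np.polyfit(years[mask], temps[mask], 1, cov=True)
    # coeffs[0] is the slope per year; scale it (and its SE) to per decade
    return 10 * coeffs[0], 10 * np.sqrt(cov[0, 0])

# Synthetic series (NOT the NASA record): a slow trend that steepens after 1990, plus noise
rng = np.random.default_rng(1)
years = np.arange(1900, 2018)
signal = 0.005 * (years - 1900) + np.where(years > 1990, 0.03 * (years - 1990), 0.0)
temps = signal + rng.normal(0, 0.05, size=years.size)

for start in (1900, 1998, 2008):
    trend, se = decadal_trend(years, temps, start)
    print(f"since {start}: {trend:.2f} +/- {se:.2f} deg C per decade")
```

Longer windows give a smaller standard error but average over older, slower warming; the 10-year window tracks the recent rate with a much wider error band.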

A little difference in the trend can make a big difference down the road. Take a look at where each of these gets you, uncertainty included:

A big chunk of the year-to-year fluctuation in temperature is actually predictable. It’s driven by cycles like ENSO and NAO. I used a nice data technique called “singular spectrum analysis” (SSA), which identifies the natural patterns in data by comparing a time series to itself at a range of offsets. Then you can extract the signal from the noise, as I do below. Black is the total time series, red is the main signal (the first two components of the SSA in this case), and green is the noise.
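A bare-bones version of SSA, assuming NumPy and a window length chosen by hand (24 here, an arbitrary choice for a synthetic series), stacks lagged copies of the series into a trajectory matrix, takes its singular value decomposition, and maps each rank-one piece back to a time series by averaging along anti-diagonals:

```python
import numpy as np

def ssa_components(series, window):
    """Split a series into SSA elementary reconstructions.

    Returns an array with one row per singular value; the rows sum back to
    the original series.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory matrix: column j holds x[j], ..., x[j + window - 1]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = np.zeros((s.size, n))
    for i in range(s.size):
        piece = s[i] * np.outer(u[:, i], vt[i])  # rank-one part of traj
        # Entry (r, c) of the trajectory matrix belongs to time r + c, so
        # average each anti-diagonal (a diagonal of the row-flipped matrix)
        flipped = piece[::-1]
        comps[i] = [flipped.diagonal(d).mean() for d in range(-window + 1, k)]
    return comps

# Synthetic example (not temperature data): trend + 8-step cycle + noise
rng = np.random.default_rng(0)
t = np.arange(120)
series = 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 8) + rng.normal(0, 0.1, t.size)
comps = ssa_components(series, window=24)
signal = comps[:2].sum(axis=0)  # keep the leading components, as the post does
noise = series - signal
```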

Once the noise is gone, we can look at what’s happening with the trend, on a year-by-year basis. Suddenly, the craziness of the past 5 years becomes clear:

It’s not just that the trend is higher. The trend is actually increasing, and fast! In 2010, temperatures were increasing at about 0.25°C per decade, and then that rate began to jump by almost 0.05°C per decade every year. The average from 2010 to 2017 is more like a trend that increases by 0.02°C per decade per year, but let’s look at where that takes us.

If that quadratic trend continues, we’ll blow through the “safe operating zone” of the Earth, the 2°C over pre-industrial temperatures, by 2030. Worse, by 2080, we risk a 9°C increase, with truly catastrophic consequences.

This is despite all of our recent efforts, securing an international agreement, ramping up renewable energy, and increasing energy efficiency. And therein lies the most worrying part of it all: if we are in a period of rapidly increasing temperatures, it might be because we have finally let the demon out, and the natural world is set to warm all on its own.

2018 January 17

I’ve built a new tool for working with county-level data across the United States. The tool provides a kind of clearing-house for data on climate, water, agriculture, energy, demographics, and more! See the details on the AWASH News page.

1 Million Years of Stream Flow Data

2018 January 16

The 9,322 gauges in the GAGES II database are picked for having over 20 years of reliable streamflow data from the USGS archives. Combined, these gauges represent over 400,000 years of data.
They offer a detailed sketch of water availability over the past century, but they miss the opportunity to describe an even fuller portrait.

In the AWASH model, we focus on not only gauged points within the river network and other water infrastructure like reservoirs and canals, but also on the interconnections between these nodes. When we connect gauge nodes into a network, we can infer something about the streamflows between them. In total, our US river network contains 22,619 nodes, most of which are ungauged.

We can use the models and the structure of the network to infer missing years, and flows for ungauged junctions. To do so, we create empirical models of the streamflows for any gauges for which we have a complete set of gauged upstream parents. The details of that, and the alternative models that we use for reservoirs, are a topic for another post. For the other nodes, we look for structures like these:

Structures for which we can infer missing month values, where hollow nodes are ungauged and solid nodes are gauged.

If all upstream values are known, we can impute the downstream value; if the downstream value and all but one of the upstream values are known, we can impute the remaining one; and any values imputed by these rules may in turn allow other values to be imputed. Using these methods, we can impute an average of 44 years for ungauged flows, and an average of 20 additional years for gauged flows. The result is 1,064,000 years of gauged or inferred streamflow data.
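The three rules can be sketched as a loop that keeps applying them until nothing new can be filled in. This assumes simple mass balance (downstream flow equals the sum of its upstream parents, with no losses or local inflows), which is a simplification of what AWASH does; the network and the function name are hypothetical:

```python
def impute_network(parents, flows):
    """Fill missing flows by mass balance: downstream = sum of upstream parents.

    parents: dict node -> list of its upstream parent nodes.
    flows: dict node -> known value, or None if missing.
    """
    flows = dict(flows)
    changed = True
    while changed:  # rule 3: repeat while newly imputed values unlock more
        changed = False
        for child, ups in parents.items():
            missing = [p for p in ups if flows.get(p) is None]
            if flows.get(child) is None and not missing:
                # Rule 1: all upstream known -> impute the downstream value
                flows[child] = sum(flows[p] for p in ups)
                changed = True
            elif flows.get(child) is not None and len(missing) == 1:
                # Rule 2: downstream and all but one upstream known -> impute that one
                known = sum(flows[p] for p in ups if p != missing[0])
                flows[missing[0]] = flows[child] - known
                changed = True
    return flows

# Hypothetical network: A and B join at C; C and D join at E
parents = {"C": ["A", "B"], "E": ["C", "D"]}
flows = {"A": 3.0, "B": None, "C": None, "D": 2.0, "E": 10.0}
print(impute_network(parents, flows))
# {'A': 3.0, 'B': 5.0, 'C': 8.0, 'D': 2.0, 'E': 10.0}
```

Here C can only be filled once E is used (rule 2), and B then follows from the newly known C, which is the chaining that the third rule describes.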

We have made this data available as a Zenodo dataset for wider use.

Economic Damages from Climate Change

2017 June 29

When I tell people I study climate change, sooner or later they usually ask me a simple question: “Is it too late?” That is, are we doomed, by our climate inaction? Or, less commonly, they ask, “But what do we really know?”

With our new paper, Estimating Economic Damage from Climate Change in the United States, I finally have an answer to both of these questions; one that is robust and nuanced and shines light on what we know and still need to understand.

The climate change that we have already committed is going to cost us trillions of dollars: at least 1% of GDP every year until we take it back out of the atmosphere. That is equivalent to three times the total of Trump’s proposed cuts across all federal programs.

If we do not act quickly, that number will rise to 3 – 10% by the end of the century. That includes the cost of deaths from climate change, lost labor productivity, increased energy demands, and coastal property damage. The list of sectors it does not include (because the science still needs to be done) is much longer: migration, water availability, ecosystems, and the continued potential for catastrophic climate tipping points.

But many of you will be insulated from these effects, by having the financial resources to adapt or move, or just by living in cooler areas of the United States which will be impacted less. The worst impacts will fall on the poor, who in the United States are more likely to live in hotter regions in the South and are less able to respond.

Economic damages by income deciles

One of the most striking results from our paper is the extreme impact that climate change will have on inequality in the United States. The poorest 10% of the US live in areas that will lose 7 – 17% of their income on average by the end of the century, while the richest 10% live in areas that will lose only 0 – 4%. Climate change is like a subsidy being paid by the poor to the rich.

That is not to say that more northern states will not feel the impacts of climate change. By the end of the century, all but 9 states will have summers that are hotter and more humid than Louisiana’s. It just so happens that milder winters will save more lives in many states in the far north than heat waves will kill. If you want to dig in deeper, our data is all available, in a variety of forms, on the open-data portal Zenodo. I would particularly point people to the summary tables by state.

Economic damages by county

What excites me is what we can do with these results. First, with this paper we have produced the first empirically grounded damage functions that are driven by causation rather than correlation. Damage functions are the heart of an “Integrated Assessment Model”, the models that are used by the EPA to make cost-and-benefit decisions around climate change. No longer do these models need to rely on outdated numbers to inform our decisions, and our numbers are 2-100 times as large as the ones they currently use.

Second, this is just the beginning of a new collaboration between scientists and policy-makers, as the scientific community continues to improve these estimates. We have built a system, the Distributed Meta-Analysis System, that can assimilate new results as they come out, and with each new result provide a clearer and more complete picture of our future costs.

Finally, there is a lot that we as a society can do to respond to these projected damages. Our analysis suggests that an ounce of protection is better than a pound of treatment: it is far more effective (and cheaper) to pay now to reduce emissions than to try to help people adapt. But we now know who will need that help in the United States: the poor communities, particularly in the South and Southeast.

We also know what needs to be done, because by far the biggest share of these impacts comes from premature deaths. By the end of the century, there are likely to be about as many deaths from climate change as there are currently car crashes (about 9 deaths per 100,000 people per year). That can be stemmed by more air-conditioning, more real-time information and awareness, and ways to cool down the temperature like green spaces and white roofs.

Our results cover the United States, but some of the harshest impacts will fall on poorer countries. At the same time, we hope the economies of those countries will continue to grow and evolve, and the challenges of estimating their impacts need to take this into account. That is exactly what we are now doing, as a community of researchers at UC Berkeley, the University of Chicago, and Rutgers University called the Climate Impacts Lab. Look for more exciting news as our science evolves.

Probabilistic Coupling

2017 May 1

Environmental Modelling & Software has just published my work on a new technique for coupling models: Probabilistic Coupling. My thoughts on coupled models had been percolating for a couple years, before a session at the International Conference on Conservation Biology in 2013 offered me a chance to try it out.

Probabilistic coupling has three main goals:

  • Allowing models to be coupled without distortionary feedback
  • Allowing multiple models to inform the same variable
  • Allowing models to be coupled with different scales
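The paper's formulation is not reproduced here. As a hedged illustration of the second goal only, one standard way for several models to inform the same variable is to treat each model's output as a probability distribution and multiply them; assuming independent normal distributions, the product is again normal, with a precision-weighted mean. The function name is hypothetical.

```python
import numpy as np

def combine_gaussian_estimates(means, sds):
    """Multiply independent normal densities for one variable.

    The product is proportional to a normal whose precision (1 / variance) is
    the sum of the inputs' precisions and whose mean is precision-weighted.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(sds, dtype=float) ** 2
    total = precisions.sum()
    mean = float((precisions * means).sum() / total)
    return mean, float(np.sqrt(1.0 / total))

# Two hypothetical models, equally certain, so the result splits the difference
print(combine_gaussian_estimates([10.0, 14.0], [1.0, 1.0]))  # (12.0, 0.7071067811865476)
```

A less certain model (larger sd) pulls the combined estimate less, and the combined sd is always smaller than any single model's.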

With these three features, the very nature and approach of coupling models can change. Current model coupling requires carefully connecting models together, plugging inputs into outputs, and then recalibrating to recover realistic behavior again. Instead, this allows for what I call “Agglomerated Modeling”, where models are thrown together into a bucket and almost magically sort themselves out.

The code for the model is available within the OpenWorld framework, as the coupling example.

Science and language

2016 February 6

One of the rolling banners at last year’s meeting of the American Geophysical Union had a scantily clad woman and the words “This is what most people think of as a ‘model’”. See, scientists have a communications problem. It’s insidious: you forget how people use words, and then feel attacked when you have to change how you speak.

I have a highly educated editor working with me on the coffee and climate change report, and she got caught up on a word I use daily: “coefficient”. For me, a coefficient is just a kind of model parameter. I replaced all the uses of “coefficient” with “parameter”, but I simultaneously felt like it erased an important distinction and wondered if “parameter” was still not dumbed down enough.

AGU has a small team trying to help scientists communicate better. I think they are still trying to figure out how to help those of us who want their help. I went to their session on bridging the science-policy divide, and they spent a half hour explaining that we have two houses of Congress. Nonetheless, it is a start, and they sent us home with communication toolkits on USB. One gem stood out in particular:

So I will try to reduce the ignorance and political distortions of my devious communication plots, until I can flip the zodiac on this good response loop. Wish me luck.

Tropict: A clearer depiction of the tropics

2016 January 15

Tropict is a set of Python and R scripts that adjust the globe to make land masses in the tropics fill up more visual real estate. It does this by exploiting the ways continents naturally “fit into” each other, splicing out wide areas of empty ocean and nestling the continents closer together.

All Tropict scripts are designed to show the region between 30°S and 30°N. In an equirectangular projection, that looks like this:


It is almost impossible to see what is happening on land: the oceans dominate. By removing open ocean and applying the Gall-Peters projection, we get a clearer picture:
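Tropict's own implementation isn't shown in this post, but the Gall-Peters projection itself is the cylindrical equal-area projection with standard parallels at 45°, so x = R·λ/√2 and y = R·√2·sin φ. A small NumPy sketch (the function name is hypothetical) shows why it gives the tropics more height:

```python
import numpy as np

def gall_peters(lon_deg, lat_deg, radius=1.0):
    """Project longitude/latitude in degrees with the Gall-Peters projection.

    Cylindrical equal-area with standard parallels at 45 degrees:
    x = R * lon / sqrt(2),  y = R * sqrt(2) * sin(lat)
    """
    lon = np.radians(lon_deg)
    lat = np.radians(lat_deg)
    x = radius * lon / np.sqrt(2)
    y = radius * np.sqrt(2) * np.sin(lat)
    return x, y

# Height-to-width ratio of the 30S-30N band across all 360 degrees of longitude
x0, y0 = gall_peters(-180, -30)
x1, y1 = gall_peters(180, 30)
ratio = (y1 - y0) / (x1 - x0)
print(ratio)
```

The band comes out with a height-to-width ratio of 1/π (about 0.32), versus 60/360 (about 0.17) in the equirectangular projection, so the same strip is nearly twice as tall.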


There’s even a nice spot for a legend in the lower-left! Whether for convenience or lack of time, the tools I’ve made to allow you to make these maps are divided between R and Python. Here’s a handy guide for which tool to use:


(1) Supported image formats are listed in the Pillow documentation.
(2) A TSR file is a Tropict Shapefile Reinterpretation file, and includes the longitudinal shifts for each hemisphere.
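The splicing idea, with a separate longitudinal shift for each hemisphere, can be illustrated on a toy grid. This is a hypothetical sketch of the concept, not Tropict's algorithm or the TSR format: it rolls the northern and southern rows by different numbers of columns and then drops columns that are ocean (NaN) in every row.

```python
import numpy as np

def splice_hemispheres(grid, lats, north_shift, south_shift):
    """Roll northern and southern rows by separate column offsets, then drop
    longitude columns that are NaN (open ocean) in every row.

    grid: 2D array indexed [latitude, longitude]; lats: latitude of each row.
    """
    out = np.array(grid, dtype=float)
    north = np.asarray(lats) >= 0
    out[north] = np.roll(out[north], north_shift, axis=1)
    out[~north] = np.roll(out[~north], south_shift, axis=1)
    keep = ~np.all(np.isnan(out), axis=0)
    return out[:, keep]

nan = np.nan
grid = [[1, 2, nan, nan, nan, nan],   # a northern "continent" in columns 0-1
        [nan, nan, nan, 3, 4, nan]]   # a southern one in columns 3-4
spliced = splice_hemispheres(grid, lats=[10, -10], north_shift=0, south_shift=-3)
print(spliced)
```

Shifting the southern row three columns west lines the two land masses up, and the four all-ocean columns disappear.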

Let’s say you find yourself with a NetCDF file in need of Tropiction, called bio-2.nc4. It’s already clipped to between 30°S and 30°N. The first step is to call to create a Tropicted NetCDF:

python ../ subjects/bio-2.nc4 ../bio-2b.nc4

But that NetCDF doesn’t show country boundaries. To show country boundaries, you can follow the example for using draw_map.R:


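## Load the packages used below (ncdf4 for NetCDF, RColorBrewer for palettes)
library(ncdf4)
library(RColorBrewer)
## splicerImage is assumed to come from Tropict's draw_map.R; adjust the path
source("draw_map.R")

## Open the Tropicted NetCDF
database <- nc_open("bio-2b.nc4")
## Extract one variable, then close the file
map <- ncvar_get(database, "change")
nc_close(database)

## Identify the range of values there
maxmap <- max(abs(map), na.rm=TRUE)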

## Set up colors centered on 0
colors <- rev(brewer.pal(11,"RdYlBu"))
breaks <- seq(-maxmap, maxmap, length.out=12)

## Draw the NetCDF image as a background
splicerImage(map, colors, breaks=breaks)
## Add country boundaries
## Add seams where Tropict knits the map together

Here’s an example of the final result, for a bit of my coffee work:


For more details, check out the documentation at the GitHub page!

And just for fun, here were two previous attempts of re-hashing the globe:


I admit that moving Australia and Hawaii into the Indian Ocean was overzealous, but they fill up the space so well!


Here I can still use the slick division between Indonesia and Papua New Guinea, and Hawaii fits right on the edge, but Australia gets split in two.

Enjoy the tropics!

Redrawing boundaries for the GCP

2015 December 20

The Global Climate Prospectus will describe impacts across the globe, at high resolution. That means choosing administrative regions that people care about, and representing impacts within countries. However, choosing relevant regions is tough work. We want to represent more regions where there are more people, but we also want to have more regions where spatial climate variability will produce different impacts.

We now have an intelligent way to do just that, presented this week at the meeting of the American Geophysical Union. It is generalizable, allowing the relative role of population, area, climate, and other factors to be adjusted while making hard decisions about what administrative units to combine.  See the poster here.

Below is the successive agglomeration of regions in the United States, balancing the effects of population, area, temperature and precipitation ranges, and compactness. The map progresses from 200 regions to ten.


Across the globe, some countries are maintained at the resolution of their highest available administrative unit, while others are subjected to high levels of agglomeration.


The tool is generalizable, and able to take any mechanism for proposing regions and scoring them. That means that it can also be used outside of the GCP, and we welcome anyone who wants to construct regions appropriate for their analysis to contact us.
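The post does not spell out the algorithm, so here is only a minimal sketch of the propose-and-score idea, assuming a greedy scheme that repeatedly merges the adjacent pair whose merged region scores best. The `merge` and `cost` functions are hypothetical stand-ins for the population, area, and climate factors (compactness is left out for brevity):

```python
def agglomerate(regions, adjacency, merge, cost, target):
    """Greedily merge neighbouring regions until `target` regions remain.

    regions: dict name -> attributes; adjacency: iterable of (a, b) neighbours;
    merge(attrs_a, attrs_b) -> merged attributes; cost(attrs) -> lower is better.
    """
    regions = dict(regions)
    adjacency = {frozenset(pair) for pair in adjacency}
    while len(regions) > target and adjacency:
        # Propose every adjacent pair and score the region it would create;
        # ties are broken by name so the result is deterministic
        best = min(adjacency,
                   key=lambda p: (cost(merge(*(regions[r] for r in sorted(p)))), sorted(p)))
        a, b = sorted(best)
        new = f"{a}+{b}"
        regions[new] = merge(regions.pop(a), regions.pop(b))
        # Point former neighbours of a or b at the merged region
        rewired = set()
        for pair in adjacency:
            if pair != best:
                renamed = frozenset(new if r in best else r for r in pair)
                if len(renamed) == 2:
                    rewired.add(renamed)
        adjacency = rewired
    return regions

# Hypothetical merge rule and cost: population, area, and temperature range
def merge(x, y):
    return {"pop": x["pop"] + y["pop"], "area": x["area"] + y["area"],
            "tmin": min(x["tmin"], y["tmin"]), "tmax": max(x["tmax"], y["tmax"])}

def cost(r):
    return r["pop"] / 1e6 + 0.5 * (r["tmax"] - r["tmin"])

regions = {
    "A": {"pop": 1.0e6, "area": 10, "tmin": 10, "tmax": 12},
    "B": {"pop": 0.5e6, "area": 20, "tmin": 11, "tmax": 13},
    "C": {"pop": 5.0e6, "area": 5, "tmin": 12, "tmax": 14},
    "D": {"pop": 0.2e6, "area": 30, "tmin": 20, "tmax": 25},
}
result = agglomerate(regions, [("A", "B"), ("B", "C"), ("C", "D")], merge, cost, target=2)
print(sorted(result))  # ['A+B+C', 'D']
```

Region D stays separate because its much warmer climate makes any merge with it expensive. Swapping in a different `cost`, or a different way of proposing pairs, changes which regions survive without touching the loop.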