Link to dashboard: The State of Fraser River Sockeye
In this project, I integrated data about Fraser River salmon populations and developed an interactive data visualization tool that allows fisheries scientists and managers to explore this data at different spatial and temporal scales. The objective of this project is to give fisheries scientists and managers the tools to easily and comprehensively explore trends in salmon data, and to discover features and attributes of that data. This was my capstone project for Tamara Munzner’s graduate course in data visualization at UBC. The full paper can be downloaded here.
I was recruited by the Department of Fisheries and Oceans (DFO) in the summer of 2017 to join a project called “The State of the Salmon” (SoS). The project team (henceforth referred to as the SoS team) is composed of four members: a senior analytical biologist, a junior analytical biologist, a senior statistician, and myself. The other members of the team wanted to consolidate available data on Fraser salmon populations and provide a simple, interactive interface through which managers and researchers can explore that data, identify patterns, and discover how these patterns change over time. My role was to develop a visualization tool that department scientists could use to explore and examine the existing data on Fraser River sockeye populations.
Fisheries managers for the federal Department of Fisheries and Oceans are tasked with determining when and where to conduct fisheries, as well as how many fish to harvest at a time. These decisions are informed by advice from fisheries scientists, who in turn are responsible for understanding the factors that affect fish population dynamics. Routine analysis of this information by fisheries scientists within the DFO is impeded by decentralized datasets and the lack of user-friendly tools with which to explore the available information. There is presently no interface with which scientists can review and examine existing salmon data across species and watersheds. The development of such an interface would provide salmon scientists with a more thorough understanding of salmon in the Fraser river.
The senior biologist on the SoS team expressed a desire to use visualizations to help users browsing the dataset to “understand the state of the salmon in British Columbia”. After further discussion, we decided to narrow our focus to one of several user types within the department in order to characterize the tasks a user might have. There were several end user groups that we were interested in developing a visualization tool for, but we ultimately decided that our target users were senior science managers. These individuals are mostly concerned with high-level tasks and are broad rather than deep consumers of information. More specifically, these users would be interested in the following types of tasks:
1. Explore the existing dataset.
2. Discover trends and features in measures of sockeye population size and health over time.
3. Compare trends and features across different Conservation Units (CUs).
4. Discover similarities between CUs.
5. Explore the topology of CUs within the Fraser river. Managers want to be able to explore the relationship between population health and the location of those populations relative to other populations within the watershed. Similar attributes (such as “red” status) amongst CUs that were grouped together by their position within the watershed could indicate that local factors were affecting sockeye populations.
Science managers in the DFO are distinct from research scientists in that they consume information rather than produce it. They do not usually perform analysis or generate hypotheses about salmon populations; instead, they expect to have a high-level understanding of the short- and long-term trends in population health and abundance.
The solution I developed is a Tableau dashboard with geospatial and temporal representations of salmon population data. The top half of the dashboard depicts CU status and location in both geographic (map) and quasi-geographic (tree plot) representations. CU status is encoded as a glyph that combines shape and hue. The bottom half of the dashboard is a series of three bar plots and two line plots that depict five measures of population size and health. These plots can be manipulated using a series of drop-down filters. The subcomponents of the dashboard are described below.
1.1 Status glyphs
The CU statuses were encoded as glyphs that combined hue and shape to represent each status attribute in the dataset. The DFO uses a red-yellow-green “stoplight” encoding for population health, as described previously in the derived data section. These encodings are potentially problematic, as it would be difficult for individuals with red/green colorblindness to differentiate between them. Adopting a colorblind-friendly color palette (such as a diverging blue-orange scheme) is an obvious solution, but the attributes being encoded take the names of the hues they are encoded by. Using a blue hue to encode the attribute “green” seems undesirably confusing.
Given this constraint, a dual hue/shape encoding seemed to be an appropriate solution. In this encoding, “Green” status is represented by a green circle, “Amber” by an amber square, and “Red” by a red triangle. Using both hue and shape channels allowed me to maintain the color-attribute relationship DFO users are familiar with, while also making the visualization more colorblind-friendly. Another solution would be to vary the luminance of each hue in order to make them distinguishable along that channel. I chose to use shape rather than luminance as I expected to use these statuses as nodes in a tree plot, where they would necessarily assume a shape. Furthermore, these channels do not exclude the use of gradations in luminance, although using three separate channels to encode a single attribute seems unnecessarily redundant.
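The dual encoding can be written down as a small lookup table. The sketch below is only an illustration of the mapping described above: the dashboard itself was built in Tableau, and the `Glyph` type, the `glyph_for` helper, and the hex codes are assumptions made for this example rather than values taken from the dashboard.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    shape: str
    hue: str


# Status-to-glyph pairs as described in the text. The hex codes are
# placeholders chosen for this sketch, not the dashboard's actual colors.
STATUS_GLYPHS = {
    "Green": Glyph(shape="circle", hue="#2ca02c"),
    "Amber": Glyph(shape="square", hue="#ffbf00"),
    "Red": Glyph(shape="triangle", hue="#d62728"),
}


def glyph_for(status: str) -> Glyph:
    """Return the glyph for a CU status, ignoring case and surrounding spaces."""
    return STATUS_GLYPHS[status.strip().capitalize()]


# Each status has its own shape, so statuses stay distinguishable
# even if the hue channel is not perceived.
shapes = [glyph.shape for glyph in STATUS_GLYPHS.values()]
assert len(set(shapes)) == len(shapes)
```

Because shape alone identifies the status, the hue channel can keep the stoplight colors that DFO users already know.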
1.2 Maps
The visualization features two maps: one that is juxtaposed with the tree plot, and another that is situated near the filters for the lower half of the dashboard. Both were designed to emphasize bodies of water while de-emphasizing elements such as human development, place names and topographical features. Including these extraneous details could distract from the purpose of the map, which was to show the location of the CUs both within the province and in relation to one another. In keeping with this simplicity, each map uses only two hues, one for water and the other for land. On the upper map, status glyphs (values determined by the 2017 status assessment) are plotted over the location of CUs. On the lower map, the mark is identical across all CUs. The name of the CU is displayed beneath the glyph or mark, and the user can pan and zoom the map by using a toolbar.
I included maps to provide information about the spatial position of CUs within the province to users. The upper map encoded information about the spatial location of CUs as well as their status, as users reported that they wanted to be able to identify the physical location of each CU. The marks displayed on each map are linked to the other plots they are juxtaposed against. As the upper section of the dashboard (map and tree plot) is meant to give a broad overview of the Fraser system in its entirety, every CU in the dataset has been plotted. In contrast, the lower map marks only the CUs that have been selected by the user through the drop-down menus, as a linked view here helps users rapidly identify the location of the chosen CUs.
1.3 Tree Plot
The upper half of the dashboard also features a quasi-geographic tree plot that represents connectivity between CUs within the larger Fraser watershed. The tree plot uses links and nodes to encode this connectivity. The status glyphs stand in as the nodes, and as described in “Status glyphs”, the shape and color of these nodes encode the status of that CU.
The structure of the tree plot was derived from the geography of the Fraser river. The tree diagram was hand coded by assigning each CU and stream directly downriver of those CUs an X-Y position according to that topological stream order, and their position to river right or to river left of the Fraser mainstem. In order to avoid overlap across CUs that occupied the same stream order, the CUs with the highest stream order value were plotted nearest to the center, and then systems with lower stream order values were plotted immediately outside of those. Edges were drawn from each node to the one directly downstream of it and so on until the terminal node was reached. This formulation maintained characteristics of the geographic space (such as a CU’s right/left positioning relative to the Fraser mainstem) while obscuring details that were not directly related to connectivity, such as absolute distance between systems.
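The hand-coded layout can be approximated in a few lines. This is a sketch of one possible reading of the procedure, not the layout actually used in the dashboard: it assumes that the horizontal offset from the mainstem shrinks as stream order grows, and that the vertical position is the number of links between a node and the terminal node. The node names, stream orders, and the `Node`, `layout`, `edges`, and `path_to_terminal` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    stream_order: int          # assumed: higher value = closer to the center
    side: str                  # "left", "right", or "mainstem"
    downstream: Optional[str]  # None marks the terminal node


def path_to_terminal(nodes: Dict[str, Node], start: str) -> List[str]:
    """Follow downstream links from `start` until the terminal node."""
    path = [start]
    while nodes[path[-1]].downstream is not None:
        path.append(nodes[path[-1]].downstream)
    return path


def layout(nodes: Dict[str, Node]) -> Dict[str, Tuple[int, int]]:
    """Assign each node an (x, y) position from side, stream order, and depth."""
    max_order = max(node.stream_order for node in nodes.values())
    sign = {"left": -1, "right": 1, "mainstem": 0}
    positions = {}
    for node in nodes.values():
        # Highest stream order sits nearest the center; lower orders sit outside.
        x = sign[node.side] * (max_order - node.stream_order + 1)
        # y counts the links between this node and the terminal node.
        y = len(path_to_terminal(nodes, node.name)) - 1
        positions[node.name] = (x, y)
    return positions


def edges(nodes: Dict[str, Node]) -> List[Tuple[str, str]]:
    """One edge from every non-terminal node to the node directly downstream."""
    return [(n.name, n.downstream) for n in nodes.values() if n.downstream is not None]


# Hypothetical five-node river network, invented for illustration only.
network = {
    n.name: n
    for n in [
        Node("mouth", 3, "mainstem", None),
        Node("junction", 3, "mainstem", "mouth"),
        Node("trib_a", 2, "right", "junction"),
        Node("trib_b", 1, "left", "junction"),
        Node("trib_c", 1, "right", "trib_a"),
    ]
}
positions = layout(network)
```

This sketch does not separate two nodes that share a side, stream order, and depth; in a hand-coded layout such collisions would be resolved by nudging positions manually.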
Using a tree plot idiom enabled viewers to explore the topology of the Fraser in a way that the map alone did not. The map and the tree were juxtaposed against each other because each idiom encoded different aspects of the geo-spatial data of the Fraser river. While the map showed the lat-long coordinates of each CU and the layout of those CUs within the watershed as a whole, the tree plot showed the connectivity across the system. Furthermore, the tree plot is a simple, intuitive design that users have probably encountered before. Tracing paths through the plot is simple and reveals information about the relationships between systems that is not readily apparent from the map.
1.4 Bar and line plots
Below the geo-spatial encodings are a series of five bar and line plots. Filters at the top of the plots allow users to select CUs by several attributes (name, status, freshwater adaptive zone), and then select a range of time for which to display data. The plots are juxtaposed and linked, such that using any of the filters above the plots changes all of them simultaneously. Hovering over any point on a plot brings up a tooltip that includes the name of the CU, the year of the selected item, and its exact value. The selected CUs are displayed on the small map to the right of the filters. Each plot displays an average line that is calculated across all items on the plot.
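The filter and average-line behavior is handled by Tableau, but the logic is easy to sketch. The example below is an assumption-laden illustration, not the dashboard's implementation: the record layout, the `apply_filters` and `average_line` helpers, the CU names, and all numbers are invented for the example.

```python
from statistics import mean

# Invented records; the CU names and values are placeholders, not real data.
records = [
    {"cu": "CU A", "status": "Red", "year": 2000, "total_returns": 100},
    {"cu": "CU A", "status": "Red", "year": 2001, "total_returns": 300},
    {"cu": "CU B", "status": "Green", "year": 2000, "total_returns": 500},
    {"cu": "CU B", "status": "Green", "year": 2001, "total_returns": 700},
    {"cu": "CU C", "status": "Red", "year": 2001, "total_returns": 200},
]


def apply_filters(rows, names=None, status=None, years=None):
    """Keep rows matching every filter that is set; None means 'no filter'."""
    selected = []
    for row in rows:
        if names is not None and row["cu"] not in names:
            continue
        if status is not None and row["status"] != status:
            continue
        if years is not None and not (years[0] <= row["year"] <= years[1]):
            continue
        selected.append(row)
    return selected


def average_line(rows, measure):
    """Mean of one measure across every item currently shown on a plot."""
    return mean(row[measure] for row in rows)


# One filtered selection feeds every linked view.
red = apply_filters(records, status="Red")
red_average = average_line(red, "total_returns")
cus_on_lower_map = sorted({row["cu"] for row in red})
```

Deriving the plots, the average line, and the lower map from the same filtered rows is what keeps the views linked. Note that `average_line` raises `statistics.StatisticsError` if the filters leave no rows.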
Attributes relating to yearly population size are encoded as bars, whereas attributes relating to productivity are encoded as lines. This design choice was dictated by the nature of the variables. Salmon exhibit roughly 4-year lifespans, so the populations within a CU exist in four discrete cycles. The size of a return is not predictive of the size of the return one, two or three years later, only of the return four years later. Comparing year-to-year population sizes is therefore somewhat misleading, and encoding the data with a bar for each year emphasizes that these are discrete populations. In contrast, productivity is more dependent on local environmental conditions, which are continuous across years and independent of the total size of the population. Line plots are thus a more appropriate encoding for these continuous variables.
Each CU is associated with a hue, such that all lines and bars associated with that CU are plotted in that hue. These hues are categorical, and the palette is a built-in Tableau palette. While using 16 unique hues to encode categorical attributes could make the visualization indecipherably complex, the small scale of the plots already prohibits users from making meaningful comparisons across more than 40-50 items. Given that this is time-series data, this limits users to looking at only 3-4 CUs at a time, for which categorical hue encodings are appropriate.
The plots are ordered according to SoS team comments about what information fisheries scientists look to when assessing salmon populations. The top plot shows total returns, or the sum of all fish returning to a CU in a given year. According to the SoS team, scientists use this to get a broad overview of the health of the system, and often consult it before more detailed metrics. Below this are plots of the other two population attributes, and below these are two measures of productivity.
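The four-cycle structure can be made concrete with a short sketch. It assumes a strict 4-year life cycle, which is a simplification of the "roughly 4-year" lifespan described above, and the `cycle_line`, `same_cycle`, and `group_by_cycle` helpers are hypothetical names introduced for this example.

```python
from collections import defaultdict

CYCLE_LENGTH = 4  # assumed strict 4-year cycle; real lifespans vary somewhat


def cycle_line(year: int) -> int:
    """Index (0-3) of the cycle that a return year belongs to."""
    return year % CYCLE_LENGTH


def same_cycle(year_a: int, year_b: int) -> bool:
    """True when two return years belong to the same discrete population."""
    return (year_a - year_b) % CYCLE_LENGTH == 0


def group_by_cycle(years):
    """Group return years into their four cycles, preserving input order."""
    groups = defaultdict(list)
    for year in years:
        groups[cycle_line(year)].append(year)
    return dict(groups)


cycles = group_by_cycle(range(2010, 2018))
```

Under this assumption, 2010 and 2014 belong to the same cycle while 2010 and 2011 do not, which is why adjacent bars should not be read as a continuous trend.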
A potential use scenario would be that of a science manager facing questions from non-scientists, from either within or outside of the department. Science managers are often the point of contact for media inquiries and for government officials who are seeking general information about salmon stocks. A common query is for comment on the most recent year of salmon returns, or to describe the health of a population aggregate (such as all populations within the Fraser, all populations located in the greater Vancouver area, etc.).
If a science manager received a call from a member of the press who had a general inquiry about the current state of Fraser river salmon, they could pull up the dashboard and refer first to the map and tree plot, which display status for individual populations across the entire river. The manager might browse these plots and answer that the health of populations varied throughout the watershed, as indicated by the distribution of different status glyphs across these plots, but CUs in the upper watershed tended to have lower status than those in the lower watershed. If the reporter asked if there were areas of concern, the manager could identify the CUs assigned a “red” status on the tree plot, and report those CUs to the reporter.
If the reporter asked for follow-up on these regions or for specific details about these populations, the manager could then select “Red” under the “Select Conservation Status” filter, and see plots of population measures. The manager could then browse the bar and line plots to identify trends, outliers or features across these populations. Examining these plots suggests that population size is highly variable across these CUs, but that all CUs showed a reduction in total recruits beginning in the late 1990s, and these have not recovered.
However, the smoothed productivity residual plot shows that productivity appears to be exhibiting a positive trend across most CUs and returning to the long-term average. The manager could also identify Cultus Lake as an outlier, where this CU has continued to have sharp declines in productivity and population size.
In this scenario, the manager is able to rapidly browse the statuses of CUs across the Fraser river and identify features of that topology. Furthermore, they can filter items and compare them to discover trends or features in the data, and identify outliers. The visualization enables rapid exploration of the existing dataset and comparisons across CUs in a way that was not possible using the department’s existing visualizations. While there are ways to improve the visualization, this represents a significant improvement over current techniques.
This dashboard improves the ability of fisheries scientists within the DFO to explore information about salmon within the Fraser River. Existing techniques limited the ability of fisheries scientists to make comparisons across CUs, which is a task of central importance to their jobs. Providing an interactive visual interface should improve their ability to explore and consume the data. The development of quasi-geographic encodings such as the tree plot allows users to explore the topology of the Fraser River in ways they were not able to before, and provides an important emphasis on connectivity between attributes, which has not been addressed by previous visualizations in the department. The development of more quasi-geographic plots could further the ability of scientists within the department to explore these relationships, and appears to be a promising area for further work. More extensive user surveys and interviews also promise to improve my understanding of the tasks fisheries scientists need to complete as part of their job, and will in turn lead to the development of more effective visualizations.