The Drew Lab at Columbia University


Exploring community ecology using Pokémon Go

(UPDATE 11/30/16: One of the students, Dev Harrington, did such a nice job with this assignment that I want to post it as an example. Well done Dev.)


One of the major questions facing ecologists and resource managers is how much diversity exists in a particular area. This is important not only for describing macroecological trends such as species gradients, but also for allowing the more precise application of limited conservation resources. Additionally, careful analysis of how species are distributed can even help identify some of the processes underlying that pattern.

This raises the question: how do we measure the diversity within a particular area? Ecologists measure diversity using a variety of techniques, but two important ones are species richness and species evenness. Species richness refers to the total number of species in a particular area, while species evenness refers to how evenly the individuals are distributed among those species. Figures 1 and 2 both contain the same number of individuals (N=12); however, Fig. 1 contains only 1 species while Fig. 2 contains 10. We would say that the community represented by Figure 2 has the greater species richness.

Now look at Figs. 3 and 4. Again, both contain 12 individuals and five species, but look at how those individuals are distributed. Fig. 3 has 5 species with the distribution Bellsprout (N=4), Beedrill (N=3), Abra (N=2), Bulbasaur (N=2) and Butterfree (N=1), so 4,3,2,2,1. Figure 4 again has 12 individuals in 5 species, but the distribution is Ekans (N=6), Sandshrew (N=2), Nidoran (N=2), Nidorina (N=1) and Nidoqueen (N=1), so 6,2,2,1,1. We can see from these distributions that the community represented in Figure 3 has the greater species evenness.

Species inventories:

When ecologists do field work one of the most common activities they do is to compile some measure of species diversity. This is often done by looking at the number of species encountered per fixed measure of effort or time. In practice this can mean running a 50m transect tape and seeing how many species are encountered, or sitting and listening for birds for a fixed period of time. When scientists accumulate these inventory data they can then do a number of statistical analyses to investigate the kinds of diversity present and if there are any patterns within that diversity. This is exactly what we are going to do today.

Collecting data:

To investigate these ideas of community diversity and similarity we are going to need to collect data. To do that we are going to play Pokemon Go. For real. Working in pairs, I want you to time yourself for 30 min. During that time I want you and your partner to attempt to catch every Pokemon you encounter. For each capture attempt record the following data: species, observed or caught (i.e. did it run away before you could catch it), Combat Power (CP), number of Pokeballs needed to capture the Pokemon, the kind of catch (nice/great/excellent) and the time that you caught it. I have made a data sheet available here. If you are playing along outside of class, please email me your sheets! We’d love to incorporate your data.

Additionally, I want you to record the level of the player and the approximate locality that you were sampling in. Since we want to replicate effort across all of our data, please do not use incense and try to avoid pokestops that have lures, since those will artificially inflate your encounter rates.
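If you would rather type your observations straight into a spreadsheet-friendly file, here is a minimal sketch of one way to lay out the record. The field names, the `CaptureAttempt` class and the two example rows are my own illustrative assumptions, not the layout of the class data sheet.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class CaptureAttempt:
    """One row of the field sheet (hypothetical column names)."""
    species: str
    caught: bool         # False means observed only (it ran away)
    cp: int
    pokeballs_used: int
    catch_quality: str   # "", "nice", "great" or "excellent"
    time: str            # clock time of the attempt, e.g. "14:05"
    player_level: int
    locality: str


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(CaptureAttempt)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up example rows, for illustration only.
records = [
    CaptureAttempt("Pidgey", True, 112, 1, "nice", "14:05", 12, "Riverside Park"),
    CaptureAttempt("Rattata", False, 54, 3, "", "14:11", 12, "Riverside Park"),
]
csv_text = to_csv(records)
print(csv_text)
```

Keeping one row per capture attempt (rather than one row per species) means the same file can later be summarized into counts for the diversity indices or into presence/absence for the similarity analysis.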

Since our working hypothesis is that there will be a geographic signal in the kinds of species we encounter I want you to remain in one general area. In other words don’t walk from a grassy area down to a riverbank as that linear transect will likely cross multiple ecosystems. For this one, I would rather we have more students sample in different individual ecosystems than those crossing multiple ecosystems. You can sample wherever you want, however I would love if at least two pairs of students sampled:

  1. In Harlem along 125th between Lexington and  Morningside, including the Apollo Theatre.
  2. Central Park’s lower east side  (near the Central Park Zoo, apx. 59th to 65th St.).
  3. Within the American Museum of Natural History (suggested donation, you need not pay for this).
  4. Riverside Park near Grant’s Tomb (the corner of Riverside and Seminary Row).
  5. The Upper West Side (apx. 72nd St to 79th St. between Broadway and Columbus).
  6. Chinatown (between Worth and Canal St. and Bowery and Centre St).
  7. Little Italy (between Canal St. and Kenmare St, and Bowery and Centre St).
  8. Chelsea along the High Line.
  9. Midtown (between 40th and 49th St., and between Broadway and Lexington).

This should give us 18 different transects to look at geographic patterns within New York City. I will also add two from the northern suburbs to see if there are differences there. Additionally, if anyone else wants to join in, we will make our data sets publicly available so we can compare pokecommunities from different areas.

As always, remember to be safe. Work in pairs so at least one of you is not focusing on the screen and, as I tell my son, “When you cross on the green, take your eyes off the screen.”

For those of you playing along at home, we will post our data set here once it is completed.

Data analysis:

We will be using these data to explore three aspects of community ecology: species accumulation curves, species diversity indices and community similarity.

Species accumulation curves: How do we know when we’ve sampled enough? By looking at species accumulation curves ecologists get an idea of how much of the total diversity they’ve captured. A species accumulation curve graphs the cumulative number of species encountered against the number of sampling events. Since, by definition, all species captured during the first sampling event are going to be novel, the species richness of site 1 will be the Y value for site 1 on the species accumulation curve. For each additional site, we add all newly encountered species to the running total. Eventually the curve will approach an asymptote at the true number of species present. If the curve is still steadily increasing at the end of your sampling, it typically means more sampling is required to estimate the total number of species. If, however, the curve has passed the inflection point and appears to be flattening, you have a rough estimate of how much diversity is present.

For example, in site 1 we capture 5 pokemon species; by definition all will be new, so our Y value for site 1 would be 5. In site 2 we capture 2 additional species, so the Y value for site 2 would be 7 (i.e. 5 from site 1, plus 2 from site 2). For site 3 we capture no newly encountered species, so the Y value for site 3 would remain at 7.
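The bookkeeping in that example can be sketched in a few lines of Python. This is only an illustration: the function name `species_accumulation` and the species lists are made up, chosen so that site 1 has 5 species, site 2 adds 2 new ones and site 3 adds none.

```python
def species_accumulation(samples):
    """Return the cumulative number of distinct species after each sample.

    `samples` is a list of sets of species names, in the order the
    sites are added to the curve.
    """
    seen = set()
    curve = []
    for sample in samples:
        seen.update(sample)      # only species not seen before enlarge the set
        curve.append(len(seen))  # Y value for this site
    return curve


# Made-up species lists matching the worked example in the text.
samples = [
    {"Pidgey", "Rattata", "Weedle", "Caterpie", "Zubat"},  # site 1: 5 species
    {"Pidgey", "Zubat", "Spearow", "Ekans"},               # site 2: 2 new species
    {"Rattata", "Weedle", "Ekans"},                        # site 3: nothing new
]
curve = species_accumulation(samples)
print(curve)  # [5, 7, 7]
```

Note that the shape of the curve depends on the order in which sites are added, which is why it matters that we agree on an ordering before compiling the class data.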


A species accumulation curve based on reef fish data in Papua New Guinea (Based on Drew et al. 2012)

Now as a class let’s compile our data to create a species accumulation curve. Add the stations from south to north (so the Chinatown samples are first and the Westchester County samples are last).

Thought questions:

  1. At the end of the sampling period was the curve increasing or flat? What does that mean for our sampling effort?
  2. Was the curve smooth, or were there certain stations which created sudden jumps? What might cause a sudden jump in the rate of species accumulation?

Species Diversity Indices:

There are many methods that ecologists use to help quantify both richness and evenness. One of the most common ways to measure species diversity is the Shannon-Wiener Index:

$$H' = -\sum_{i=1}^{R} p_i \ln p_i$$

Where p_i is the proportion of individuals in the ith species in the data set and R is the number of species. Thus the summation over i = 1, 2, …, R includes all the species in the dataset. The Shannon-Wiener index can also be used to calculate an evenness:

$$E = \frac{H'}{H'_{max}}, \qquad H'_{max} = \ln R$$

Here H'_max is the value H' would take if all R species were equally abundant. The evenness ranges from 1 to 0, with higher numbers being more even and lower numbers reflecting communities that are more skewed. Like all diversity indices, these measures are subject to sampling effort.
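As a check on the by-hand arithmetic, here is a minimal Python sketch that applies the Shannon-Wiener index and its evenness to the two communities from Figs. 3 and 4 (counts 4,3,2,2,1 and 6,2,2,1,1). I am assuming the usual natural-log form of the index and the evenness defined as the index divided by ln(number of species); the function names are my own.

```python
import math


def shannon(counts):
    """Shannon-Wiener index: -sum(p_i * ln(p_i)) over species with n > 0."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def shannon_evenness(counts):
    """Shannon index divided by its maximum, ln(richness)."""
    richness = sum(1 for n in counts if n > 0)
    if richness < 2:
        return 0.0  # evenness is undefined for a single species; 0.0 is a convention used here
    return shannon(counts) / math.log(richness)


fig3 = [4, 3, 2, 2, 1]  # Bellsprout, Beedrill, Abra, Bulbasaur, Butterfree
fig4 = [6, 2, 2, 1, 1]  # Ekans, Sandshrew, Nidoran, Nidorina, Nidoqueen

for name, counts in [("Fig. 3", fig3), ("Fig. 4", fig4)]:
    print(name, round(shannon(counts), 3), round(shannon_evenness(counts), 3))
```

Both communities have the same richness (5), so the higher index for Fig. 3 comes entirely from its more even distribution of individuals.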

An additional measure of diversity is Simpson’s index, which calculates the probability that two individuals drawn at random will be of the same category. It can be calculated as follows:

$$\lambda = \sum_{i=1}^{R} p_i^2$$

Where p_i is the proportion of individuals in the ith species and R is the number of species. Lambda ranges from 0 to 1 with, confusingly, values near 0 being the most diverse. Therefore the index is usually reported as 1-Lambda.
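Here is a small Python sketch of Simpson’s index for the communities in Figs. 3 and 4, assuming the simple form in which Lambda is the sum of the squared species proportions (the function name is my own, and other variants of the index exist).

```python
def simpson_lambda(counts):
    """Probability that two random draws (with replacement) are the same species."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)


fig3 = [4, 3, 2, 2, 1]
fig4 = [6, 2, 2, 1, 1]

for name, counts in [("Fig. 3", fig3), ("Fig. 4", fig4)]:
    lam = simpson_lambda(counts)
    print(name, "Lambda =", round(lam, 3), "1-Lambda =", round(1 - lam, 3))
```

Fig. 4 is dominated by one species, so two random draws are more likely to match; its Lambda is higher and its 1-Lambda is lower than for Fig. 3.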

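These two indices are used to calculate the diversity of an area; to calculate the evenness we use a formula derived by Evelyn Pielou:

$$J' = \frac{H'}{\ln R}$$

Where H' is the Shannon-Wiener index and R is the number of species, so that ln R is the largest value H' can take for that number of species. J' ranges from 1 to 0, with higher numbers representing more even communities.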

Lastly we have Jaccard’s coefficient, which is a way to calculate the similarity of two communities based on presence/absence data.

$$J = \frac{a}{a + b + c}$$

Where J = Jaccard’s similarity index

a = number of species common to (shared by) quadrats,

b = number of species unique to the first quadrat, and

c = number of species unique to the second quadrat

Note that this is a pairwise calculation.
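Because the coefficient is pairwise, a full data set yields one value for every pair of sites. The sketch below computes a / (a + b + c) from the definitions of a, b and c above and loops over all pairs; the site labels and species lists are invented for illustration.

```python
from itertools import combinations


def jaccard(first, second):
    """Jaccard similarity between two sets of species names."""
    a = len(first & second)  # species shared by both sites
    b = len(first - second)  # species unique to the first site
    c = len(second - first)  # species unique to the second site
    if a + b + c == 0:
        return 0.0           # two empty samples: undefined, 0.0 is a convention used here
    return a / (a + b + c)


# Made-up presence data for three samples.
sites = {
    "CHIN1": {"Pidgey", "Rattata", "Zubat"},
    "CHIN2": {"Pidgey", "Rattata", "Weedle"},
    "RIVR1": {"Pidgey", "Magikarp", "Psyduck"},
}

pairwise = {
    (x, y): jaccard(sites[x], sites[y]) for x, y in combinations(sites, 2)
}
for pair, value in pairwise.items():
    print(pair, value)
```

With these made-up lists the two Chinatown samples share 2 of 4 species (J = 0.5), while each shares only 1 of 5 with the riverside sample (J = 0.2).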

  1. Calculate the Simpson’s, Shannon-Wiener (diversity and evenness) and Pielou’s indices for each of the sites in our dataset. Which site was most diverse? Which site was most even? You can do this by hand, using this website, or, if you know R, using the vegan package.

Thought questions:

  1. Why do we have different measures of diversity? What does it tell you when one site is most diverse by one measure while a second site is most diverse by a different measure?
  2. Are there any geographic patterns in diversity? What would this tell us about the distribution of Pokemon in NYC?
  3. If a tourist with only a limited amount of time to play Pokemon Go asked you where to play, what neighborhood would you say was the best place to go? Why?

Community Similarity

We can take the table of pairwise comparisons to generate a community similarity tree. Briefly, this will be a graphical representation where neighborhoods that share more species in common are connected to each other at nodes. Each clade represents a group of communities that are more like each other than they are to any other community. This method of thinking has been extensively used in phylogenetics, where species that are more closely related to each other are on the same clade, while those less closely related are found further out on the tree:


In phylogenetics we look for patterns of evolutionary similarity; in community ecology we look for patterns of biodiversity similarity (figure from

If we look at the above example as a community ecologist and not a phylogeneticist (although, it is possible to be both, don’t let artificial dichotomies keep you from following interesting questions) we would say that communities A and B are more similar to each other than either of them is to community C. Similarly the distance between C and A is equal to the distance between C and B.  We can use this “tree thinking” to visualize geographic patterns within our data.
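To make the “tree thinking” concrete, here is a rough pure-Python sketch of how such a tree can be grown from pairwise Jaccard distances (1 minus the similarity). It uses average-linkage agglomerative clustering, which is my choice for illustration and not necessarily the method the class tools use; the site labels, species lists and function names are all made up.

```python
from itertools import combinations


def jaccard_distance(first, second):
    """1 - Jaccard similarity between two sets of species names."""
    union = len(first | second)
    return 1.0 - len(first & second) / union if union else 0.0


def average_linkage(sites):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = {label: [label] for label in sites}  # cluster name -> member sites
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            pairs = [(m, n) for m in clusters[x] for n in clusters[y]]
            # Cluster distance = mean distance over all between-cluster site pairs.
            d = sum(jaccard_distance(sites[m], sites[n]) for m, n in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        clusters["(" + x + "," + y + ")"] = clusters.pop(x) + clusters.pop(y)
        merges.append((x, y, d))
    return merges


# Made-up presence data for three samples.
sites = {
    "CHIN1": {"Pidgey", "Rattata", "Zubat"},
    "CHIN2": {"Pidgey", "Rattata", "Weedle"},
    "RIVR1": {"Pidgey", "Magikarp", "Psyduck"},
}
merges = average_linkage(sites)
for x, y, d in merges:
    print(x, "+", y, "joined at distance", round(d, 2))
```

In this toy example the two Chinatown samples join first, and the riverside sample joins that pair further out, which is the same A, B, C pattern described in the paragraph above.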

In order to calculate Jaccard’s coefficient you had to create a matrix with rows being sample locations and columns being species. If you know R you can calculate a community similarity tree from that input format using the function ‘vegdist’ in the R package vegan (to compute Jaccard distances) followed by the function ‘hclust’ in base R, and you can assign significance using the function ‘simprof’ in the R package clustsig.

If you do not yet know how to use R, you can use this website to build a tree, and use the “bootstrap” option to assign a level of significance to each node in your community tree. To use this site, however, you are going to have to format your data in FASTA format.

For each site start with a “>” sign, then choose a four letter abbreviation and a number (representing the first or second sample from that neighborhood). Then press return. On the next line you will input the data which will be a string of 1’s and 0’s representing the presence or absence of every species in our data matrix. The last part is to hit return. Then on the next line press “>” again and put in the second site.
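If typing out the 1’s and 0’s by hand sounds error-prone, the formatting steps above can be sketched in Python. This is only an illustration under my own assumptions: the site labels and species lists are made up, and the species columns are ordered alphabetically so that every site uses the same column order.

```python
def to_fasta(sites):
    """Build a presence/absence matrix and write it in the FASTA-style layout.

    `sites` maps a label such as "CHIN1" to the set of species seen there.
    Returns the ordered species list (the matrix columns) and the FASTA text.
    """
    species = sorted(set().union(*sites.values()))
    lines = []
    for label, present in sites.items():
        lines.append(">" + label)  # header line for this site
        lines.append("".join("1" if s in present else "0" for s in species))
    return species, "\n".join(lines) + "\n"


# Made-up presence data for three samples.
sites = {
    "CHIN1": {"Pidgey", "Rattata", "Zubat"},
    "CHIN2": {"Pidgey", "Rattata", "Weedle"},
    "RIVR1": {"Pidgey", "Magikarp", "Psyduck"},
}
species, fasta_text = to_fasta(sites)
print(species)
print(fasta_text)
```

Every site’s string must have the same length and the same species order, otherwise the columns no longer line up and the comparison is meaningless.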

For those playing at home we will compile our data in R and FASTA and post them accordingly.

  1. Did sites taken from within the same neighborhood cluster together?
  2. Was there a geographic pattern present? If so, what was it?
  3. Did you see a habitat influence (e.g. sites in parks clustered together while sites in urban blocks clustered together)?
  4. What communities were the most dissimilar? Why do you think these communities were most different?

While this lab obviously focuses on fake biodiversity, the skills and analytical methods here are broadly applicable to a variety of actual biodiversity. I hope you had fun playing Pokemon, but I also encourage you to go outside with a pair of binoculars or a mask and snorkel* and enjoy the real world diversity that is around us.

*don’t go swimming in the Hudson or East River near NYC. Just sayin’
