Wednesday, March 14, 2018

Nice ggplot with sad data: something happens with women in science


   
Last March 8, millions of women in more than 170 countries around the world joined street protests calling for "a society free of sexist oppression, exploitation and violence". The Spanish strike was one of the largest, with around 5.3 million women taking part. Today, I will use some data (download) on scientific careers in Spain (2016) to make a plot with ggplot2 and discuss it. Here are the graph and the code:



 library(ggplot2)  
 library(scales)  
 dat <- read.csv("wis.csv")  
 dat$pos <-  
  factor(dat$pos, levels = c("pre", "post", "ryc", "ct", "ic" , "pi"))  
 ggplot(dat, aes(x = pos, y = per)) +  
  geom_line(size = 3, aes(group = sex, color = sex)) +  
  geom_point() +  
  geom_text(aes(label = round(per * 100, 1)), vjust = -1, size = 4) +  
  labs(  
   title = "Women in Science \n Spanish National Research Council",  
   x = "research stage",  
   y = "sex ratio",  
   color = " \n",  
   subtitle = "(2016)"  
  ) +  
  scale_color_manual(labels = c("women", "men"),  
            values = c("purple", "orange")) +  
  scale_x_discrete(  
   breaks = c("pre", "post", "ryc", "ct", "ic" , "pi"),  
   labels = c(  
    "Predoctoral",  
    "Postdoctoral",  
    "Ramón y Cajal",  
    "Científico\nTitular",  
    "Investigador\nCientífico",  
    "Profesor de\nInvestigación"  
   )  
  ) +  
  scale_y_continuous(labels = percent) +  
  theme(  
   text = element_text(size = 15),  
   legend.text = element_text(size = 15),  
   axis.text.x = element_text(face = "bold", size = 11, angle = 30),  
   axis.title = element_text(size = 14, face = "bold"),  
   plot.title = element_text(hjust = 0.5, face = "bold", size = 20),  
   plot.subtitle = element_text(hjust = 0.5, size = 19)  
  )  


On the x axis, we can see the career stages at the Spanish National Research Council (known in Spanish as CSIC), from young PhD candidates (left, “Predoctoral”) to the highest level, Research Professor (right, “Profesor de Investigación”). It’s easy to understand: in Spain, more women than men start a scientific career, but only a few of them reach the highest positions at CSIC.

We usually call this the “scissors graph”, since the two curves soon cross each other, reversing the trend shown at the beginning. For the last stages, this imbalance between men and women may be partly due to age (the mean age of Research Professors at CSIC is around 58 years, and things have changed a lot in Spain over the last 30 years). However, the graph shows that the imbalance is still a problem at the earlier stages: just from the PhD candidate stage to the first postdoctoral position, women lose around 15% of positions in favor of men. And this is not only happening in Spain. Only active policies and special measures aimed at these first stages could change the trend for the highest positions in the future!

If you want to know a little more about women in science in Spain, here is a poster made by some friends from the Ecology Department at Universidad de Alcalá.

Saturday, December 9, 2017

Brownian Motion GIF with R and ImageMagick



Hi there!

Last Monday we held a “Scientific Marathon” at the Royal Botanic Garden in Madrid, a kind of mini-conference to talk about our research. I talked about the relationship between fungal spore size and environmental variables such as temperature and precipitation. To make my presentation friendlier, I created a GIF to explain the Brownian Motion model. In evolutionary biology, we can use this model to simulate the random variation of a continuous trait through time. Under this model, closely related species tend to keep similar trait values because of their shared evolutionary history. You can find a lot of information about Brownian Motion models in evolutionary biology everywhere!
Here I will show you how I built a GIF to explain Brownian Motion in my talk using R and ImageMagick.


 # First, we simulate continuous trait evolution by adding in each iteration
 # a random number from a normal distribution with mean equal to 0 and standard
 # deviation equal to 1. We simulate a total of 4 processes, to obtain at first
 # two species and a speciation event in the middle of the simulation, obtaining
 # a total of 3 species at the end.
 df1<- data.frame(0,0)  
 names(df1)<- c("Y","X")  
 y<-0  
 for (g in 1:750){
  df1[g,2] <- g
  df1[g,1] <- y
  y <- y + rnorm(1,0,1)
 }
 #plot(df1$X,df1$Y, ylim=c(-100,100), xlim=c(0,1500), cex=0)  
 #lines(df1$X,df1$Y, col="red")  
 df2<- data.frame(0,0)  
 names(df2)<- c("Y","X")  
 y<-0  
 for (g in 1:1500){  
  df2[g,2] <- g  
  df2[g,1] <- y  
  y <- y + rnorm(1,0,1)  
 }  
 #lines(df2$X,df2$Y, col="blue")  
 df3<- data.frame(750,df1[750,1])  
 names(df3)<- c("Y","X")  
 y<-df1[750,1]  
 for (g in 750:1500){  
  df3[g-749,2] <- g  
  df3[g-749,1] <- y  
  y <- y + rnorm(1,0,1)  
 }  
 #lines(df3$X,df3$Y, col="green")  
 df4<- data.frame(750,df1[750,1])  
 names(df4)<- c("Y","X")  
 y<-df1[750,1]  
 for (g in 750:1500){  
  df4[g-749,2] <- g  
  df4[g-749,1] <- y  
  y <- y + rnorm(1,0,1)  
 }  
 #lines(df4$X,df4$Y, col="orange")  
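The four loops above all do the same thing: they accumulate normally distributed steps. As a rough sketch only (the helper name `simulate_bm` and the object names are my own and are not part of the code above), the same idea can be written with `cumsum`, and `set.seed` can be used if you want to get the same trajectories on every run:

```r
# Hypothetical helper (not part of the original code): simulate one Brownian
# motion trajectory of n generations starting at trait value `start`.
simulate_bm <- function(n, start = 0, sd = 1) {
  steps <- rnorm(n - 1, mean = 0, sd = sd)    # one random step per generation
  data.frame(Y = start + c(0, cumsum(steps)), X = seq_len(n))
}

set.seed(42)                                   # fix the seed so the run is repeatable
lineage_a <- simulate_bm(750)                  # lineage that splits at generation 750
lineage_b <- simulate_bm(1500)                 # lineage that never splits

# The two daughter lineages start from the ancestor's last trait value
split_value <- lineage_a$Y[750]
daughter_1 <- simulate_bm(751, start = split_value)
daughter_2 <- simulate_bm(751, start = split_value)
daughter_1$X <- daughter_1$X + 749             # shift to generations 750..1500
daughter_2$X <- daughter_2$X + 749
```

The resulting data frames have the same `Y` and `X` columns as `df1` to `df4`, so they could be plotted in the same way.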



 # Now, we have to plot each step of the simulation and store the images on our computer.
 # I added some code to make the GIF lighter (plotting just odd generations) and
 # to add a label at the speciation time. Note that, since the Brownian model is a
 # stochastic process, my simulation will be different from yours.
 # You should adjust the labels or repeat the simulation process if you don't
 # like the shape of your plot.
 parp<-rep(0:1, times=7, each= 15)  
 parp<- c(parp, rep(0, 600))  
 for (q in 1:750){
  if (q %% 2 == 1) {
   id <- sprintf("%04d", q+749)
   png(paste("bm",id,".png", sep=""), width=900, height=570, units="px",
     pointsize=18)
   par(omd = c(.05, 1, .05, 1))
   plot(df1$X,df1$Y, ylim=c(-70,70), xlim=c(0,1500), cex=0,
      main=paste("Brownian motion model \n generation=", 749 + q) ,
      xlab="generations", ylab="trait value", font.lab=2, cex.lab=1.5 )
   lines(df1$X,df1$Y, col="red", lwd=4)
   lines(df2$X[1:(q+749)],df2$Y[1:(q+749)], col="blue", lwd=4)
   lines(df3$X[1:q],df3$Y[1:q], col="green", lwd=4)
   lines(df4$X[1:q],df4$Y[1:q], col="orange", lwd=4)
   if (parp[q]==0) {
    text(750, 65,labels="speciation event", cex= 1.5, col="black", font=2)
    arrows(750, 60, 750, 35, length = 0.20, angle = 30, lwd = 3)
   }
   dev.off()
  }
 }

 Now, you just have to use ImageMagick to put all the PNG files together in a GIF using a command like this in a terminal:


 convert -delay 10 *.png bm.gif  
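If you prefer to stay inside R, the terminal command above can also be launched with `system2`. This is only a sketch under a few assumptions: that ImageMagick is installed and reachable as `convert` (on newer installations the command may be `magick` instead), and that the frames are in the working directory. Listing the files by pattern avoids picking up unrelated PNG files, which `*.png` would include.

```r
# Collect only the frames written by the loop above (bm0750.png, bm0752.png, ...)
frames <- sort(list.files(pattern = "^bm[0-9]{4}\\.png$"))

# Same options as the terminal command: a delay of 10/100 s between frames
args <- c("-delay", "10", frames, "bm.gif")

# Only call ImageMagick if it is installed and there is something to convert
if (nzchar(Sys.which("convert")) && length(frames) > 0) {
  system2("convert", args)
}
```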

Et voilà!



Tuesday, December 5, 2017

Computing wind average in an area using rWind


Hi all!

Last week, a researcher asked me how to compute the average wind for an area using rWind. I wrote a simple function to do this using the dplyr library (following the advice of my friend Javier Fajardo). The function will be added to the rWind package as soon as possible. Meanwhile, you can test the results... enjoy!



 # First, load the new function
 library(dplyr)  
 wind.region <- function (X){   
  X[,3] <- X[,3] %% 360   
  X[X[,3]>=180,3] <- X[X[,3]>=180,3] - 360   
  avg<-summarise_all(X[,-1], .funs = mean)   
  wind_region <- cbind(X[1,1],avg)   
  return(wind_region)   
 }   
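The first two lines of the function move the longitudes from 0/360 notation to -180/180 before averaging. This matters for a region like the one in the example, which crosses the Greenwich meridian: in 0/360 notation its longitudes sit at both ends of the scale, and their plain mean would fall far away from the study area. A small sketch of that step on its own (the helper name `wrap_lon` is hypothetical, used only for illustration):

```r
# Hypothetical helper mirroring the two longitude lines in wind.region
wrap_lon <- function(lon) {
  lon <- lon %% 360                           # force values into [0, 360)
  lon[lon >= 180] <- lon[lon >= 180] - 360    # move the western half to negatives
  lon
}

wrap_lon(c(0, 5, 179.5, 180, 350, 359.5))
# gives 0, 5, 179.5, -180, -10, -0.5

# Mean longitude of a region from 10 W to 5 E, before and after wrapping
lon_360 <- c(350, 355, 0, 5)
mean(lon_360)             # 177.5, on the other side of the globe
mean(wrap_lon(lon_360))   # -2.5, inside the region
```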

Once you have loaded the function, let’s do an example...

 # Get some wind data and convert it into a raster to be plotted later  
 library(rWind)  
 library(raster)  
 wind_data<-wind.dl(2015,2,12,0,-10,5,35,45)  
 wind_fitted_data <- wind.fit(wind_data)  
 r_speed <- wind2raster(wind_fitted_data, type="speed")  

Now, you can use the new function to obtain wind average in the study area:


 myMean <- wind.region(wind_data)  
 myMean  

 # Now, you can use wind.fit to get wind speed and direction.  
 myMean_fitted <- wind.fit(myMean)  
 myMean_fitted  

 # Finally, let's plot the results!  
 library(rworldmap)  
 library(shape)  
 plot(r_speed)  
 lines(getMap(resolution = "low"), lwd=4)   
 alpha <- arrowDir(myMean_fitted)  
 Arrowhead(myMean_fitted$lon, myMean_fitted$lat, angle=alpha,   
      arr.length = 2, arr.type="curved")  
 text(myMean_fitted$lon+1, myMean_fitted$lat+1,   
    paste(round(myMean_fitted$speed,2), "m/s"), cex = 2)  


Tuesday, May 9, 2017

niceOverPlot, or when the number of dimensions does matter


  Hi there!

    Over the last few months, my lab-mate Irene Villa (see more of her work here!) and I have been discussing ecological niche overlap. The niche concept dates back to ideas first proposed by ornithologist J. Grinnell (1917). Later on, G.E. Hutchinson (1957) defined the ecological niche of a species as the n-dimensional hyper-volume of ecological factors that define a space where the species can exist indefinitely. These ecological factors can be either abiotic (fundamental niche) or biotic variables, and the intersection of both defines the realized niche, or potential ecological range where the species can survive. The last necessary element for the existence of a species in a particular place (actual presence or distribution) is related to the “movement” ability or dispersal capacity of the species to arrive to those places where ecological factors are suitable. This theory is nicely summarized in the BAM diagram (Biotic-Abiotic-Movement; Soberón, 2007).

    In recent decades, with the development of cartographic information about climatic variables (mainly temperature and precipitation) and mathematical algorithms, many research efforts have been directed towards modeling the ecological niche of species from presence records and bioclimatic information (Elith & Leathwick, 2009). As these techniques were developed, many researchers focused on comparing the ecological niches of two species and the degree of overlap between them. A wide range of literature has been published about this topic (Peterson, 2011) but, in recent years, a general consensus has been reached about the methods to measure this overlap. Briefly, the method consists of two steps: 1) building an environmental space from the bioclimatic values at the presence records and background points with any multivariate technique (usually Principal Component Analysis), and 2) representing both species' niches in this space via kernel density, comparing them using the D or I indices and applying similarity or identity tests to assess significance (Warren et al., 2008).
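To give a feeling for what the D index measures, here is a deliberately simplified sketch: Schoener's D computed on one-dimensional kernel densities of simulated values. This is an illustration under my own assumptions and not the ecospat procedure, which works on a gridded two-dimensional environmental space and corrects for background availability. The function name `schoener_d` and the simulated "species" are hypothetical.

```r
# Simplified 1-D illustration of Schoener's D: D = 1 - 0.5 * sum(|p1 - p2|),
# where p1 and p2 are densities rescaled to sum to 1 over a common grid.
schoener_d <- function(x1, x2, n = 512) {
  rng <- range(c(x1, x2))
  d1 <- density(x1, from = rng[1], to = rng[2], n = n)$y
  d2 <- density(x2, from = rng[1], to = rng[2], n = n)$y
  p1 <- d1 / sum(d1)
  p2 <- d2 / sum(d2)
  1 - 0.5 * sum(abs(p1 - p2))
}

# Simulated values along one environmental axis (not real data)
set.seed(1)
sp1 <- rnorm(200, mean = 0,   sd = 1)
sp2 <- rnorm(200, mean = 0.5, sd = 1)   # similar to sp1
sp3 <- rnorm(200, mean = 6,   sd = 1)   # far away from sp1

d_close <- schoener_d(sp1, sp2)   # high overlap, closer to 1
d_far   <- schoener_d(sp1, sp3)   # almost no overlap, closer to 0
c(close = d_close, far = d_far)
```

D ranges from 0 (no overlap) to 1 (identical densities), so the first pair gets a much higher value than the second.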

    Following this general approach, the R package ecospat (Di Cola et al., 2016) provides an easy way to perform a Principal Component Analysis and to compare the niches of two species using the D overlap index, identity tests, etc. However, Irene and I realized that, when using this approach, in many cases where no niche overlap was found, this was due to the lack of overlap on just one of the two PC axes, while complete overlap was found on the other axis. Taking into account that the two axes do not usually contribute equally to explaining the environmental space (the first axis usually explains much more variability than the following ones), we worked on a function using the ggplot2 R package to create a nice plot for visual exploration of the Principal Component scores on both axes. This plot allows us to examine which factors are or are not producing niche overlap, since it shows three plots in one: each axis separately and the joint distribution.


   Here is the function code (too large to print here). You can download and run it in an R console. The following is an example with two wall lizards from a Mediterranean area. You can download ecological values for the 19 bioclimatic variables at presence points of these two species here.


   
 ##########################
 #######   EXAMPLE   ######  
 ##########################
library(ecospat)
library(ggplot2)
library(grid)
library(gridExtra)
library(gtable)
library(RColorBrewer) 
   
 # Read wall lizards ecological data  
   
 data_Ph_Pl<-read.csv("./data_Ph_Pl.csv")  
   
 # Note: These data represent the values for the 19 bioclimatic variables at occurrence
 # points for two Mediterranean wall lizards, Iberian wall lizard (Podarcis hispanicus,
 # mainly in Iberian Peninsula) and Lilford's wall lizard (Podarcis lilfordi, from the
 # Balearic Islands). Data were downloaded from GBIF* and Worldclim (http://worldclim.org/).
 # The first 1125 rows are the bioclimatic values for P. hispanicus presences, while the
 # next 44 rows represent bioclimatic values for P. lilfordi. The following rows represent
 # available bioclimatic conditions (background) obtained from 10000 random points
 # generated in a buffer of approximately 400 km around presence points. Some inaccuracies
 # (such as 2 Lilford's wall lizard presences in Iberian Peninsula) were not removed (this is
 # just an example!!).
 # *Data searches:  
 # Podarcis hispanicus: GBIF.org (30th April 2017) GBIF Occurrence Download http://doi.org/10.15468/dl.yx4fbg  
 # Podarcis lilfordi: GBIF.org (30th April 2017) GBIF Occurrence Download http://doi.org/10.15468/dl.qqmzq3  
   
 # From these data, we perform a Principal Component Analysis using the "dudi.pca" function
 # (from the "ade4" package, a dependency of "ecospat").
 # We will choose 2 axes for the environmental space representation.
 # Setting scannf = FALSE with nf = 2 keeps two axes without the interactive prompt.

 pca_Ph_Pl <- dudi.pca(na.omit(data_Ph_Pl), center = TRUE, scale = TRUE, scannf = FALSE, nf = 2)
   
 # Now, we can use this result directly with the niceOverPlot function to represent a
 # 2D environmental space in a central plot, and the environmental gradient represented
 # by each axis at the top and right. We must provide the function with the number of presences
 # of each species in the same order as in the input data (nº of presences
 # for Sp1 and nº of presences for Sp2)
   
 niceOverPlot(pca_Ph_Pl, n1=1125 , n2= 44)  

Results from niceOverPlot (Blue area: P. lilfordi; Pink area: P. hispanicus)

      

  This plot can help us interpret the results obtained from niche overlap analysis for these two species. In this example, we can see that, in 2D environmental space, the bioclimatic preferences of the two wall lizard species do not overlap. But notice that this lack of overlap is due to Axis2 of the PCA, which accounts for only around 18% of the contribution to the environmental space. Now, we can focus on the bioclimatic variables involved in this axis to understand which factors are preventing the overlap.
In upcoming posts, we will discuss some alternative ways to address the D overlap index to take into account cases like our example.
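If you want to check how much each axis contributes before reading a plot like this, the proportion of variance per axis is easy to compute. The sketch below uses base R's `prcomp` and the built-in `iris` data purely as stand-ins (they are not the function or the data used above). For an object returned by dudi.pca, the analogous quantity should be available as `pca_Ph_Pl$eig / sum(pca_Ph_Pl$eig)`.

```r
# Stand-in example: PCA on the four numeric iris variables (not the lizard data)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each axis
explained <- pca$sdev^2 / sum(pca$sdev^2)
round(explained, 2)

# Scores on the first two axes for two of the groups
scores <- as.data.frame(pca$x[, 1:2])
scores$group <- iris$Species
two <- droplevels(scores[scores$group %in% c("versicolor", "virginica"), ])

# Range of each group along each axis: a quick check of where they separate
tapply(two$PC1, two$group, range)
tapply(two$PC2, two$group, range)
```

Comparing the ranges (or densities) axis by axis shows whether a lack of overlap in 2D comes from an axis that explains a lot of variability or from a minor one.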

Have fun!

Podarcis lilfordi, from the Balearic Islands

 References


-Soberón, J. (2007). Grinnellian and Eltonian niches and geographic distributions of species. Ecology Letters, 10(12), 1115-1123.

-Elith, J., & Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics, 40, 677-697.

-Peterson, A. T. (2011). Ecological niche conservatism: a time-structured review of evidence. Journal of Biogeography, 38(5), 817-827.

-Di Cola, V., Broennimann, O., Petitpierre, B., Breiner, F. T., D'Amen, M., Randin, C., ... & Pellissier, L. (2016). ecospat: an R package to support spatial analyses and modeling of species niches and distributions. Ecography.

Sunday, December 4, 2016

rWind R package released!

Hi there! 

 Let me introduce rWind, an R package with several tools for downloading, editing and converting wind data from the Global Forecast System (https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs) into other formats, such as raster for GIS! Wind data is a powerful source of information that can be used for many purposes in biology and other sciences: from the design of air routes for airplanes to the study of the dispersal routes of plants or bird migrations. Making this kind of data more accessible to scientists and other users is the objective of ERDDAP (http://coastwatch.pfeg.noaa.gov/erddap/index.html), a web service to dive into a lot of weather and oceanographic databases and download data easily.

 I was using one of the ERDDAP databases specifically to get wind direction and speed from satellite data, the NOAA/NCEP Global Forecast System (GFS) Atmospheric Model (http://oos.soest.hawaii.edu/erddap/info/NCEP_Global_Best/index.html). At first, I followed this wonderful post from Conor Delaney (http://www.digital-geography.com/cloud-gis-getting-weather-data/#.WERgamd1DCL) to download and fix the data to be used as a GIS layer. However, I soon needed to download and modify a lot of wind data, so I started to write some R functions to automate the different tasks. Finally, I decided to put everything together into an R package and upload it to the CRAN repository to make it available to other users who could be interested in this kind of data. Here I give you a reference manual and some R code with a brief tutorial to get familiar with the utilities of the rWind package!

If you have any questions, or you want to report a bug or make a suggestion, please comment on the post or write to me: jflopez@rjb.csic.es or jflopez.bio@gmail.com.

 Enjoy it!



Javier Fernández-López (2016). rWind: Download, Edit and Transform Wind Data from GFS. R package version 0.1.3. https://CRAN.R-project.org/package=rWind


   
 # Download and install "rWind" package from CRAN:  
 install.packages("rWind")  
 # You should install also "raster" package if you do not have it   
   
 library(rWind)  
 library(raster)  
   
 packageDescription("rWind")  
 help(package="rWind")  
   
 # "rWind" is a package with several tools for downloading, editing and transforming wind data from Global Forecast   
 # System (GFS, see <https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs>) of the USA's  
 # National Weather Service (NWS, see <http://www.weather.gov/>).  
   
 citation("rWind")  
   
 # > Javier Fernández-López (2016). rWind: Download, Edit and Transform Wind Data from GFS. R package version 0.1.3.  
 # > https://CRAN.R-project.org/package=rWind  
   
 # First, we can download a wind dataset for a specified date from GFS using the wind.dl function
 # help(wind.dl)

 # Download wind for the Spain region on February 12, 2015, at 00:00
   
 wind.dl(2015,2,12,0,-10,5,35,45)  
   
 # By default, this function generates an R object with downloaded data. You can store it...  
   
 wind_data<-wind.dl(2015,2,12,0,-10,5,35,45)  
   
 head(wind_data)  
   
 # or download a CSV file with the data into your working directory using the type="csv" argument:
   
 getwd()  
 wind.dl(2015,2,12,0,-10,5,35,45, type="csv")  
   
 # If you look inside the wind_data object, you can see that the data are organized in an awkward way, with
 # two rows as headers, a column with date and time, longitude data expressed in 0/360 notation and wind
 # data defined by the two vector components U and V. You can transform these data into a much nicer format
 # using the "wind.fit" function:
 #help(wind.fit)
   
 wind_data<-wind.fit(wind_data)  
   
 head(wind_data)  
   
 # Now, data are organized by latitude, longitude is expressed in -180/180 notation, and the U and V vector
 # components are transformed into direction and speed. You can export the data.frame as a CSV file to be used with GIS software
   
 write.csv(wind_data, "wind_data.csv")  
   
 # Once you have data organized by latitude and you have direction and speed information fields,  
 # you can use it to create a raster layer with wind2raster function to be used by GIS software or to be plotted   
 # in R, for example.  
 # As raster layer can only store one information field, you should choose between direction (type="dir")  
 # or speed (type="speed").  
   
 r_dir <- wind2raster(wind_data, type="dir")  
 r_speed <- wind2raster(wind_data, type="speed")   
   
 # Now, you can use rworldmap package to plot countries contours with your direction and speed data!  
   
 #install.packages("rworldmap")  
 library(rworldmap)   
 newmap <- getMap(resolution = "low")  
   
 par(mfrow=c(1,2))  
   
 plot(r_dir, main="direction")  
 lines(newmap, lwd=4)  
   
 plot(r_speed, main="speed")  
 lines(newmap, lwd=4)  
   
   
 # Additionally, you can use arrowDir and Arrowhead (from "shape" package) functions to plot wind direction  
 # over a raster graph:  
   
 #install.packages("shape")  
 library(shape)   
   
 dev.off()  
 alpha<- arrowDir(wind_data)  
 plot(r_speed, main="wind direction (arrows) and speed (colours)")  
 lines(newmap, lwd=4)  
 Arrowhead(wind_data$lon, wind_data$lat, angle=alpha, arr.length = 0.12, arr.type="curved")  
   
 # If you want a time series of wind data, you can download it using a for loop:
 # First, you should create an empty list where you will store all the data

 wind_serie<- list()

 # Then, you can use wind.dl inside a for loop to download and store wind data for
 # the first 5 days of February 2015 at 00:00 in the Europe region. It could take a while...
    
 for (d in 1:5){  
  w<-wind.dl(2015,2,d,0,-10,30,35,70)  
  wind_serie[[d]]<-w  
 }  
   
 wind_serie  
   
 # Finally, you can use wind.mean function to calculate wind average   
   
 wind_average<-wind.mean(wind_serie)  
 wind_average<-wind.fit(wind_average)  
 r_average_dir<-wind2raster(wind_average, type="dir")  
 r_average_speed<-wind2raster(wind_average, type="speed")  
    
  par(mfrow=c(1,2))  
   
 plot(r_average_dir, main="direction average")  
 lines(newmap, lwd=1)  
   
 plot(r_average_speed, main="speed average")  
 lines(newmap, lwd=1)
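To make the averaging step less of a black box, here is a sketch of the general idea of averaging the U and V components cell by cell across a list of data frames. It is not the code behind wind.mean. The toy numbers and the column names (`lon`, `lat`, `u`, `v`) are made up for illustration and do not match the real downloaded data.

```r
# Toy series: three "days" on the same two grid points (made-up numbers)
series <- list(
  data.frame(lon = c(-10, -9.5), lat = c(35, 35), u = c(1, 2), v = c(0, 1)),
  data.frame(lon = c(-10, -9.5), lat = c(35, 35), u = c(3, 2), v = c(2, 1)),
  data.frame(lon = c(-10, -9.5), lat = c(35, 35), u = c(2, 2), v = c(4, 1))
)

# Add up the U and V columns of every data frame, then divide by the number of days
uv_sum <- Reduce(`+`, lapply(series, function(d) d[, c("u", "v")]))
uv_mean <- uv_sum / length(series)

# Attach the coordinates, which are the same in every element of the list
wind_avg <- cbind(series[[1]][, c("lon", "lat")], uv_mean)
wind_avg
```

This only works if every element of the list covers exactly the same grid in the same order, as is the case when the same region is downloaded for several dates.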