Introducing me

My name is Ann Loraine. I am excited to introduce new people to the R programming language and environment for analyzing data.

My favorite academic topic

My favorite topic is bioinformatics.

What I like about R

I love R because it’s free, open source and fun to use.


Looking at data

I use R a lot for exploratory data analysis. I can easily read data into the R environment, make plots, and quickly answer simple questions about the data.

For early, exploratory analyses, I make a Markdown document like this one that presents summary statistics and plots.

To start, let’s read some data from a local file.

The data come from a file I made by copying and pasting from a table on a U.S. Bureau of Labor Statistics (Department of Labor) web page titled “Employment trends of Hispanics in the U.S. labor force.”

I copied the data from “Table 2. Labor force of the civilian noninstitutional population 16 years and over, 2003–2023, annual averages.”

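The data file has three columns, representing three variables:

  • year,
  • number of Hispanic workers in the labor force,
  • number of non-Hispanic workers in the labor force.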


Read and format data

Read the file:

fn <- "population.txt"
d <- read.delim(fn,header = FALSE)
names(d) = c("Y","H","N")

The data frame has 21 rows and 3 columns, as expected.
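
One way to confirm the shape of a data frame after reading it is to call dim. The sketch below is only an illustration: it swaps population.txt for a tiny inline, tab-separated stand-in, and the values in it are made up rather than taken from the real table, so it runs without the file.

```r
# Made-up, tab-separated stand-in for population.txt (NOT the real figures)
txt <- "2003\t1,000\t9,000\n2004\t1,100\t9,050\n2005\t1,200\t9,100"

# read.delim accepts a text= argument in place of a file name
demo <- read.delim(text = txt, header = FALSE)
names(demo) <- c("Y", "H", "N")

dim(demo)   # number of rows, then number of columns
```

With the real file, the same dim call on d should report the row and column counts stated above.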

Check the mode (atomic type) for each column of data:

types = lapply(d,mode)
types

The previous code chunk shows that the two numeric columns in the data file (population numbers) were read by R as character values, not numeric values.
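
To see why commas matter, here is a small sketch that checks column modes on a made-up data frame. The values and the name demo are illustrative assumptions, not part of the original data.

```r
# Made-up stand-in values (NOT the real table).
# Numbers written with thousands separators can only be stored as text.
demo <- data.frame(Y = c(2003, 2004),
                   H = c("1,000", "1,100"),
                   N = c("9,000", "9,050"),
                   stringsAsFactors = FALSE)

# sapply returns a named character vector with one mode per column
modes <- sapply(demo, mode)
modes
```

The year column comes back as numeric, while the two comma-formatted columns come back as character.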

Before I can proceed, I have to convert character columns into numeric columns.

To do this, I’ll load a library called stringr, which has many useful text (string) manipulation functions. I’ll use one of its functions, str_remove_all, to remove the commas, and then convert the resulting character values to numeric values using as.numeric:

library(stringr)
d$H=as.numeric(str_remove_all(d$H,pattern = ","))
d$N=as.numeric(str_remove_all(d$N,pattern = ","))
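
If stringr is not installed, the same conversion can be sketched in base R with gsub. This is an alternative to the stringr call above, not a replacement for it, and the input values below are made up for illustration.

```r
# Made-up example values formatted with thousands separators
x <- c("1,000", "12,345,678")

# fixed = TRUE treats "," as a literal string, not a regular expression
x_num <- as.numeric(gsub(",", "", x, fixed = TRUE))
x_num

# as.numeric returns NA for anything it cannot parse, so this is a cheap check
anyNA(x_num)
```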

Now the data are ready to visualize using plotting functions.


Plot the data

Average workforce size - all ethnicities

total = (d$H+d$N)/10**6
years = d$Y
names(total) = years
bush = "brown"
obama = "orange"
trump = "gold"
biden = "navy"
p = c(rep(bush,6),rep(obama,8),rep(trump,4),rep(biden,3))
plot(years,total,pch=16,col=p,main="American workforce 2003 - 2023",ylab="millions",xlab="years",las=1,ylim=c(145,170))

Each point is an annual average of the size of the American workforce, in millions. Points are color-coded to indicate the Presidential administration at the time, for context.

Color code:

  • Brown - Bush,
  • Orange - Obama,
  • Gold - Trump,
  • Navy - Biden.

This plot shows that the American workforce grew in almost every year. The exceptions were the Great Recession years 2008 and 2009, and 2020, the start of the SARS-CoV-2 pandemic.
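
The color vector p in the plotting code is built from hard-coded counts (6, 8, 4, 3), which would silently go wrong if a year were added or dropped. As a sketch of an alternative, the colors could be derived from the years themselves using cut. The helper name admin_color and its break points are assumptions introduced here for illustration. They reproduce the same grouping as the rep calls: 2003 to 2008, 2009 to 2016, 2017 to 2020, and 2021 to 2023.

```r
# Hypothetical helper (not in the original code): map each year to the
# color used for the administration assigned to that year in the plots.
admin_color <- function(years) {
  # cut uses right-closed intervals by default:
  # (2002,2008], (2008,2016], (2016,2020], (2020,2023]
  groups <- cut(years,
                breaks = c(2002, 2008, 2016, 2020, 2023),
                labels = c("brown", "orange", "gold", "navy"))
  as.character(groups)
}

p2 <- admin_color(2003:2023)
table(p2)   # how many years fall under each color
```

Passing col=admin_color(years) to plot would then color the points the same way as col=p.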

Average workforce size - Hispanic workers only

total = d$H/10**6
years = d$Y
names(total) = years
bush = "brown"
obama = "orange"
trump = "gold"
biden = "navy"
p = c(rep(bush,6),rep(obama,8),rep(trump,4),rep(biden,3))
plot(years,total,pch=16,col=p,main="Hispanic American workforce 2003 - 2023",
     ylab="millions",xlab="years",las=1,ylim=c(18,33))

Each point is an annual average of the number of Hispanic Americans in the American workforce, in millions. Points are color-coded to indicate the Presidential administration at the time, for context.

Color code:

  • Brown - Bush,
  • Orange - Obama,
  • Gold - Trump,
  • Navy - Biden.

This plot shows that the sub-population grew at what appears to be the same rate every year, with a flattening in 2009 - 2010 and a small decline in 2020. This pattern tracks the one we noticed in the total workforce plot, except that the dips around 2008 and 2020 are not as severe.
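
A visual impression such as "the same rate every year" can be checked numerically by computing year-over-year changes with diff. The sketch below uses made-up workforce sizes, not the real data, only to show the mechanics.

```r
# Made-up workforce sizes in millions (illustrative only, NOT the real data)
size <- c("2018" = 20.0, "2019" = 21.0, "2020" = 20.5, "2021" = 21.5)

# Absolute change from the previous year; names are the later year of each pair
change <- diff(size)

# Change as a percentage of the previous year's value
pct_change <- round(100 * change / head(size, -1), 1)
pct_change

# Years in which the workforce shrank
declined <- names(change)[change < 0]
declined
```

Applied to the real vector total, roughly constant values in change would support a steady growth rate, and negative values would mark the years with declines.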

Hispanic workforce, as a percentage

Are there any obvious trends we can spot with respect to the percentage of workers who are Hispanic in the United States?

Calculate percentages of Hispanic workers:

percents = round(d$H/(d$H+d$N)*100,digits=1)
years = d$Y
names(percents) = years
bush = "brown"
obama = "orange"
trump = "gold"
biden = "navy"
p = c(rep(bush,6),rep(obama,8),rep(trump,4),rep(biden,3))
plot(years,percents,pch=16,col=p,main="Hispanic American workforce, 2003 - 2023",
     ylab="percent of the total workforce",xlab="years",las=1)

The above plot shows that the percentage of workers in America who are Hispanic has risen every year in a nearly straight-line pattern, broken only by a lower-than-trend value for 2011.
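
One way to make "nearly straight line" and "lower-than-trend" more concrete is to fit a least-squares line with lm and look at the residuals. The sketch below uses made-up percentages (a perfect line with one year pushed down), not the real data, so the names yrs and pct and all the values are assumptions for illustration.

```r
# Made-up percentages (illustrative only): a straight line with one low year
yrs <- 2003:2010
pct <- 13 + 0.3 * (yrs - 2003)
pct[yrs == 2007] <- pct[yrs == 2007] - 0.5   # push one year below the trend

fit <- lm(pct ~ yrs)       # least-squares straight line
res <- residuals(fit)      # observed minus fitted values

low_year <- yrs[which.min(res)]   # year furthest below the fitted line
low_year
```

Run against the real percents and years vectors, the most negative residual would identify the year that sits furthest below the overall trend.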