DSPG Annual Symposium 2021

Zoom

Data Science for the Public Good Virtual Symposium 2021

We hope you will join us for our annual Data Science for the Public Good Symposium on August 6! The Symposium is a signature event of the DSPG Forum, established by the University of Virginia Biocomplexity Institute. The Forum brings together a community of scientists, scholars, researchers, and policy-makers hoping to gain insight on using data science to positively transform the areas in which we live, work, and play.

At the annual Symposium, Data Science for the Public Good Young Scholars present their research findings from the 11-week summer program. This year, the Symposium features a nationally recognized keynote speaker and more than 35 students from DSPG Young Scholars Programs at four universities across two U.S. states.

We have planned an engaging and thought-provoking afternoon that provides opportunities to sit back and listen, or if you choose, interact with our keynote speaker in the plenary session and young scholars during research poster sessions.

Instructions for navigating your way to “zoom rooms” for the poster sessions will be shared with registered attendees on the day of the symposium.

Please note that all times are in Eastern Time Zone and you will want to plan your afternoon accordingly.

Agenda

1 p.m. Plenary Session: Welcome and Keynote
2 p.m. Young Scholars Program: Overview and Highlights
2:15 p.m. Break
2:30-4:30 p.m. Poster Sessions

Please join us for our annual Data Science for the Public Good Symposium to be hosted virtually featuring keynote speaker Jeri Mulrow, Vice President and Director of Statistics and Evaluation Sciences at Westat, and this year’s DSPG Young Scholars. *Note: This event is being recorded by audio, video, and photographic means. By attending, you grant the University of Virginia the right to use your voice/likeness in any depiction of this event.

Plenary Session

The Symposium opens with a plenary session, which is followed by poster sessions with this year’s Data Science for the Public Good Young Scholars. You are welcome to join us for a portion or all of the Symposium. We do hope you stay for all of it!

  1. Sallie Keller will open the Symposium and deliver the welcome address.
    She is the Director of the Social and Decision Analytics Division at the University of Virginia’s Biocomplexity Institute; a Distinguished Professor in Biocomplexity, and Professor of Public Health Sciences, UVA School of Medicine.
    See bio for more details.
  2. Jeri Mulrow is this year’s keynote speaker and we’re excited to hear more about this topic: Enabling Data Science to Do Public Good: Are We Making an Impact?
    She is Vice President and Director of Statistics and Evaluation Sciences at Westat.
    See bio for more details.
    Abstract: In a first keynote presentation of its kind, Jeri Mulrow will speak openly and candidly about the state of using data science to do public good in the world. She will tackle questions such as: “Where are we headed? Are we there yet? What do we still need to do?” Ms. Mulrow will highlight key elements, including critical laws and guidance, people skills, technology and tools, and funding that are woven together to provide the essential foundation and building blocks that enable the use of data science to make positive change in society. Our speaker will touch on how the work she has done and contributions she has made over her 30-year career in the federal government have dovetailed with these initiatives, and have helped to achieve progress and positive impact. Still, using data science to serve a public good is unfinished business, and a host of challenges remain, which Ms. Mulrow will discuss in the final portion of her keynote discussion. Audience Q&A will follow this dynamic presentation.

DSPG Young Scholars Program: Overview and Highlights

This year, more than 35 undergraduate and graduate students from four partner universities were involved virtually with postdoctoral fellows and faculty mentors in summer research projects that addressed critical social issues relevant in the world today.

In addition, the Young Scholars program ran in Istanbul and Ankara, Turkey where students gained practical research experience and skills to provide them with more career opportunities.

DSPG Young Scholars program leaders will share highlights from their programs.

  • Gizem Korkmaz, Research Associate Professor, University of Virginia
  • Aaron Schroeder, Research Associate Professor, University of Virginia
  • Claudia Scholz, Director for Research Development, UVA School of Data Science
  • Susan E. Chen, Program Director, VT Data Science for the Public Good and Associate Professor, Department of Agricultural and Applied Economics, Virginia Tech
  • Omar Faison, Assistant Vice President, Research, Virginia State University
  • Heike Hofmann, Professor, Iowa State University

University and Non-profit Partners

It’s been an exciting and eventful year for the Data Science for the Public Good Young Scholars program! UVA is proud to partner with Virginia Tech, Virginia State University, and Iowa State University.

Sponsors

Thank you to our sponsors for their continued support as we foster civic engagement and future discussions on the importance of doing data science to enhance the quality of life where we all live, learn, work, and play:

Avison Young, American Statistical Association (ASA), NORC, RStudio, Sage Publishing, Westat, and Washington Statistical Society

Poster Sessions

In a normal world, in-person attendees would be invited to circulate through the poster sessions and interact with young scholars as they discussed their research projects. This year, as you know, nothing is normal, and we’re trying to simulate the same experience in a virtual setting. Our young scholars have worked really hard this summer to deliver public good and would love to share their research in their virtual “zoom rooms” after the plenary session.

Before the Symposium, learn more about each research project, including brief overview, teaser video, and students involved, by following the links below. On the day of the event, join the zoom room for each project using the links that will be provided to registered attendees.

1. Fostering Data Reuse: Measuring the Usability of Publicly Accessible Research Data

In this project, we investigated factors associated with reuse of publicly accessible research data, which is data that is made freely available on a journal, repository, or other website. Funding agencies, such as NSF, mandate that data be made available to the public. However, it takes time and resources to do so. In order to help data sharers understand the impact of this effort and to understand if those using the data can re-use it, we studied datasets on popular data repositories, such as KNB, Figshare, and Dryad, and used R’s web scraping capabilities to gather information on heavily reused datasets. We gathered metrics like downloads, citations, views, usability scores, metadata information, dataset size, and more from thousands of datasets from six chosen repositories. We used these metrics to understand reuse, which we measured using both the number of downloads and citations. We also analyzed equity of access by utilizing information that some repositories, such as ICPSR and NSF PAR, track on the makeup of their data users and data sharers. If you’d like to learn more, please come to our virtual poster session!

Fellow

Emily Kurtz, University of Minnesota, Political Science (PhD) and Statistics (MS)

Interns

Aditi Mahabal, University of Virginia, College of Arts and Sciences

Akilesh S Ramakrishna, University of Virginia, College of Arts and Sciences & Batten School of Leadership and Public Policy

Mentors

Alyssa Mikytuck, Postdoctoral Research Associate, Biocomplexity Institute, University of Virginia

Gizem Korkmaz, Research Associate Professor, Biocomplexity Institute, University of Virginia

Sarah Nusser, Professor Emeritus, Biocomplexity Institute, University of Virginia

Stakeholder

Martin Halbert, Science Advisor for Public Access, National Science Foundation

Stakeholder: National Science Foundation

2. Manpower Planning Using Skill Data

We study the connection between Army jobs and the skills acquired in comparable civilian jobs to create a unique vector of skills associated with the Army. Veterans are often crowded out in job searches because they don’t know what skills they have acquired during their tenure. We use a unique O*NET crosswalk that connects Army MOS codes with SOC codes, and then connect these to job ads from Burning Glass. This research provides an overview of skills acquired in the Army, which can inform Army veterans about the jobs they may be best suited for and the skills they can list on their resumes.

Fellow

Morgan Stockham, Claremont Graduate University, Department of Economic Sciences

Interns

Asia Porter, Washington University in St. Louis, Department of Sociology

Stephanie Zhang, University of Virginia, College of Arts and Sciences

Mentors

Josh Goldstein, Research Assistant Professor, Biocomplexity Institute, University of Virginia

Aritra Halder, Research Assistant Professor, Biocomplexity Institute, University of Virginia

Joanna Schroeder, Research Specialist, Biocomplexity Institute, University of Virginia

Stakeholder

Andy Slaughter, Senior Research Psychologist, US Army Research Institute

Stakeholder: US Army Research Institute

3. Implementing Text-Based AIs to Investigate and Measure Private Sector Software Innovation

We study the landscape of product innovation in the computer software sector, leveraging publicly available opportunity data, namely news articles obtained from Dow Jones, a business news and data provider. We implement a series of Bidirectional Encoder Representations from Transformers (BERT) neural networks, a sophisticated natural language processing method, for a number of tasks. We developed a BERT classification model to identify news articles describing innovation broadly, using a training set of 600 manually labeled articles and achieving an accuracy rate over 96%. This model was then applied to one year’s worth of news articles about the computer software industry to predict which articles describe innovation. We then applied a different BERT model to this set of predicted innovation articles for named entity recognition, extracting the company and new product names mentioned in them.

Fellow

Digvijay Ghotane, Georgetown University, Department of Public Policy

Interns

Aditi Mahabal, University of Virginia, College of Arts and Sciences

Akilesh S Ramakrishna, University of Virginia, College of Arts and Sciences & Batten School of Leadership and Public Policy

Mentors

Neil Alexander Kattampallil, Research Scientist, Biocomplexity Institute, University of Virginia

Devika Mahoney-Nair, Research Scientist, Biocomplexity Institute, University of Virginia

Gizem Korkmaz, Research Associate Professor, Biocomplexity Institute, University of Virginia

Stakeholder

Gary Anderson, Research & Development Statistics Program, National Center for Science and Engineering Statistics

Stakeholder: National Center for Science and Engineering Statistics

4. Classifying and Measuring Open Source Software Projects on GitHub

Over the past few years, our research group has advanced a number of computational approaches to measure the scope and impact of open source software (OSS), including a method that evaluates the resource costs of source code development in online platforms (e.g., Robbins et al. 2018). The goal of this current project is to address how different software types may impact economic evaluations of OSS. During our 2021 Data Science for the Public Good Young Scholars Summer Program, our team has begun to develop a methodology to help researchers study different software types through the use of computational text analysis. Drawing on 10+ million repositories scraped from GitHub, the world’s largest code hosting platform, we detail an approach that classifies software into categories using the information provided on repositories such as README files and repository descriptions. The categories are based on Fleming’s (2021) proposed classifications of software price indices and another prominent code hosting platform named SourceForge. After detailing these category types, we discuss how we use dictionary-based and unsupervised computational text analysis to classify these GitHub repositories. More specifically, we plan to probabilistically match repositories to predefined categories using text-based similarity metrics. After detailing this methodology, we talk about some potential use cases that this approach may proffer and its potential impact on developing novel economic evaluations of OSS tools.
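As a rough illustration of matching repositories to predefined categories with a text-based similarity metric, the sketch below scores a README against small category dictionaries using bag-of-words cosine similarity. The category names, their term lists, and the sample README are hypothetical, and cosine similarity is only one possible metric; the project's actual dictionaries and matching procedure are not described here. The scores are similarities, not calibrated probabilities.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_categories(readme, category_terms):
    """Return (category, score) pairs sorted from best to worst match."""
    doc = Counter(tokenize(readme))
    scores = {
        name: cosine_similarity(doc, Counter(terms))
        for name, terms in category_terms.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical category dictionaries, for illustration only.
CATEGORY_TERMS = {
    "database": ["database", "sql", "query", "storage"],
    "web": ["web", "http", "server", "browser"],
}

# Hypothetical README text.
readme = "A lightweight SQL database engine with a fast query planner."
ranking = rank_categories(readme, CATEGORY_TERMS)
```

Here `ranking` lists the "database" category first because the README shares three terms with that dictionary and none with the "web" dictionary.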

Fellow

Crystal Zang, University of Pittsburgh Graduate School of Public Health

Interns

Cierra Oliveira, Clemson University

Stephanie Zhang, University of Virginia

Mentors

Brandon Kramer, Postdoctoral Research Associate, Biocomplexity Institute, University of Virginia

Gizem Korkmaz, Research Associate Professor, Biocomplexity Institute, University of Virginia

Stakeholders

Carol Robbins, Senior Analyst, National Center for Science and Engineering Statistics

Ledia Guci, Science Resources Analyst, National Center for Science and Engineering Statistics

Stakeholder: National Center for Science and Engineering Statistics

5. R&D Text Corpora Filtering and Data Mining

We use administrative data for federal grants to discover research topics and their trends in the area of artificial intelligence (AI). Our data source is Federal RePORTER, a database of federally funded research grants that includes project abstracts and other project data such as funding agencies and start years. We filter Federal RePORTER project abstracts for those that describe projects about AI. AI is a complex and hard-to-define theme, so this filtering problem is challenging. We utilized three different filtering methods: 1) an AI term matching method proposed by the Organization for Economic Co-operation and Development (OECD), 2) a method by Eads et al., which utilizes term matching and topic modeling, and 3) a Sentence BERT (bidirectional encoder representations from transformers) method that compares the similarity between the AI Wikipedia page and each grant abstract. Each filtering method produces an AI-themed corpus on which we run a non-negative matrix factorization (NMF) topic model. Using linear regression and visualization, we analyze the topic model results to discover AI research trends in projects that were federally funded.

Fellow

Crystal Zang, University of Pittsburgh Graduate School of Public Health

Interns

Haleigh Tomlin, Washington and Lee University

Cierra Oliveira, Clemson University

Mentors

Joel Thurston, Senior Scientist, Biocomplexity Institute, University of Virginia

Eric Oh, Research Assistant Professor, Biocomplexity Institute, University of Virginia

Stephanie Shipp, Research Professor, Biocomplexity Institute, University of Virginia

Kathryn Linehan, Research Scientist, Biocomplexity Institute, University of Virginia

Stakeholders

John Jankowski, Director of R&D Statistics Program, National Center for Science and Engineering Statistics

Audrey Kindlon, Survey Statistician, National Center for Science and Engineering Statistics

Stakeholder: National Center for Science and Engineering Statistics

6. A Racial Equity Case Study of the Provision of Parks and Other Amenities in Arlington County

We examine the landscape of the provision of parks and their amenities by the Parks and Recreation Department in Arlington County from a racial equity lens. Combining data from the American Community Survey, Arlington Open Data Portal, CoreLogic, and scraped web information, we characterize the extent to which Arlington County is providing services that align with various communities’ needs and desires. In addition to racial (and other demographic) breakdowns, we consider other factors that may influence one’s needs for certain amenities such as car ownership, presence of young children, and type of housing (e.g. single family home or apartment building) and their intersections with race. We then calculate isochrones to determine how long residents of each neighborhood must travel to get to various parks and access certain amenities and overlay the factors described above to determine varying levels of access.

Fellows

Morgan Stockham, Claremont Graduate University, Department of Economics

Digvijay Ghotane, Georgetown University, Department of Public Policy

Interns

Asia Porter, Washington University in St. Louis, Department of Sociology

Madeline Garrett, University of Colorado Boulder, College of Arts and Sciences

Mentors

Eric Oh, Research Assistant Professor, Biocomplexity Institute, University of Virginia

Kathryn Linehan, Research Scientist, Biocomplexity Institute, University of Virginia

Aaron Schroeder, Research Associate Professor, Biocomplexity Institute, University of Virginia

Stakeholder

Jaime Lees, Arlington County Chief Data Officer, Arlington County

Stakeholder: Arlington County

7. What are the certifications that lead to a job in the Skilled Technical Workforce?

In this project, we are studying the nondegree credentials needed for a job in the Skilled Technical Workforce (STW). This work is important because the skilled technical workforce is a fast growing and crucial sector of the US economy, and these STW jobs can offer a path to the middle class for millions of Americans. Despite the importance of the skilled technical workforce, there are massive data gaps between the skilled technical workforce players, namely federal and state governments, employers, and the educational institutions that provide the nondegree credential training. We spent this summer bridging some of these data gaps. First, we used R’s web scraping capabilities to collect the certifications associated with the 133 skilled technical workforce occupations listed in the Department of Labor’s Occupational Information Network (O*NET). Then, we collected the certifications demanded by employers for these occupations in job-ads from Burning Glass Technologies. We used Natural Language Processing techniques to standardize the job-ad certifications using the O*NET certification names. Finally, we used network analysis to visualize the connections between occupations and certifications, thus highlighting paths workers could potentially take in the STW. If you’d like to learn more, please attend our virtual poster session!

Fellow

Emily Kurtz, University of Minnesota, College of Liberal Arts

Interns

Haleigh Tomlin, Washington and Lee University, Department of Sociology

Madeline Garrett, University of Colorado Boulder, College of Arts and Sciences

Mentors

Vicki Lancaster, Principal Scientist, Biocomplexity Institute, University of Virginia

Cesar Montalvo, Postdoctoral Research Associate, Biocomplexity Institute, University of Virginia

Sponsor

Gigi Jones, National Center for Science and Engineering Statistics

Stakeholder: National Center for Science and Engineering Statistics

8. Stroke and COVID Population: A Health Equity Analysis

To date, COVID-19 has claimed the lives of 449,020 Americans and has infected tens of millions more. In doing so, this sweeping pandemic has uncovered systemic flaws that place an unequal share of the burden on racial minorities. In fact, Black, Hispanic, and Indigenous Americans have 1.5x higher infection rates, 4x higher hospitalization rates, and 2.7x higher death rates than White Americans. Data show that this disparity exists across all age groups, with the potential for further devastation in minority communities. Currently, 41% of all new COVID-19 infections occur in persons 35-49 years old. Black and Hispanic Americans aged 35-44 are 8-11x more likely to die following COVID-19 infection than their White counterparts. Despite these startling statistics, racial minorities are receiving COVID-19 vaccines at dramatically lower rates: the majority of states reporting vaccination patterns by race show a 2-4x higher vaccination rate for White than for Black Americans. The causes of these disparities in outcomes and interventions are multifactorial, reflecting the compounding issues of lack of access, health care bias, and negative social determinants of health that fall largely on Black and Hispanic communities. With this in mind, we aim to investigate whether there are racial disparities in medical resource allocation among stroke patients who are COVID-19 positive. This project will use the N3C limited dataset. The focus will be on ischemic stroke, and we will explore whether particular patterns suggest that COVID-19 stroke outcomes, treatments, and COVID care patterns differ along lines of race.

Interns

Ethan Assefa, University of Virginia, College of Arts and Sciences, Department of Psychology

Esau Hutcherson, Howard University, Department of Electrical Engineering & Computer Science

Suliah Apatira, Spelman College, Environmental & Health Science Department

Dahnielle Milton, Spelman College, Chemistry & Biochemistry Department

Rehan Javaid, University of Virginia, School of Engineering & Applied Science, Department of Computer Science

Mentors

Sucheta Sharma, Data Scientist, School of Data Science & iTHRIV, University of Virginia

Dr. Andrew Southerland, Executive Vice Chair, Department of Neurology, School of Medicine, University of Virginia

Donald E. Brown, Senior Associate Dean and Co-Director, School of Data Science, University of Virginia

Stakeholder

Johanna Loomba, Director of Informatics, iTHRIV

Sponsors

Deloitte AI Institute for Government

Oracle for Research

Stakeholders: Deloitte and Oracle

9. Exploring the Skill Content of Jobs in Appalachia

We study the industries that comprise Appalachian labor markets, the jobs these industries provide, and the skills required for these jobs. Our work uses individual-level Integrated Public Use Microdata Series (IPUMS) data and occupation-specific O*NET data to understand the skill content of labor in Appalachian communities. We construct an index of skills for each occupation and then aggregate this to the PUMA level. The result is a PUMA level index that characterizes the skill content of workers in Appalachia. Typifying the current skill endowment of workers in Appalachia will help us understand how a transition away from the current industrial mix towards more green industries, with potentially new skill content, will affect the labor market of these rural communities.
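To make the aggregation step concrete, the sketch below computes a person-weighted mean of an occupation-level skill index within each PUMA. The weighted mean, the occupation names, the index values, and the weights are all assumptions for illustration; the project's actual index construction and aggregation rule are not described here.

```python
from collections import defaultdict


def puma_skill_index(workers, occupation_index):
    """Person-weighted mean occupation skill index for each PUMA.

    workers: iterable of (puma, occupation, person_weight) tuples.
    occupation_index: dict mapping occupation -> skill index value.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for puma, occupation, person_weight in workers:
        if occupation not in occupation_index:
            continue  # skip workers whose occupation has no skill index
        totals[puma] += person_weight * occupation_index[occupation]
        weights[puma] += person_weight
    # Only report PUMAs with positive total weight to avoid dividing by zero.
    return {p: totals[p] / weights[p] for p in totals if weights[p] > 0}


# Hypothetical occupation-level skill index values.
occupation_index = {"miner": 0.4, "nurse": 0.8}

# Hypothetical worker records: (PUMA, occupation, survey person weight).
workers = [
    ("P1", "miner", 100),
    ("P1", "nurse", 300),
    ("P2", "miner", 50),
]

index_by_puma = puma_skill_index(workers, occupation_index)
```

In this toy example, PUMA "P1" gets an index of about 0.7 because nurses carry three times the weight of miners, while "P2" gets the miner value of 0.4.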

Fellow

Timothy Pierce, Virginia Tech, Department of Agricultural and Applied Economics

Interns

Ryan Jacobs, Virginia Tech, Department of Agricultural and Applied Economics

Austin Burcham, Virginia Tech, Computational Modeling and Data Analytics

Yang Cheng, Virginia Tech, Department of Agricultural and Applied Economics

Mentors

Dr. Anubhab Gupta, Virginia Tech, Department of Agricultural and Applied Economics

Dr. Susan Chen, Virginia Tech, Department of Agricultural and Applied Economics

Stakeholder: Agricultural and Applied Economics, Virginia Tech

10. Water Resource Management and Industry and Residential Growth in Floyd County

We studied the factors that affect the water quality and quantity issues existing within Floyd County. We reviewed the relevant literature and held focused discussions with both the stakeholders and relevant experts on the area’s geology and water issues, which informed our data scraping directives. We found that nearly all the residents in the county rely on well and natural spring systems for their water supply, so these systems were a major focus of our findings. Due to the lack of data on the area’s groundwater resources, we developed various models to indirectly estimate the county’s water resources. We used remote sensing data from both the GRACE satellites and the Landsat 8 satellites to develop estimates for the water quantity trends in the area. The GRACE satellite data was used to estimate temporal trends of the water table anomalies for the county. The Landsat 8 satellite imagery was used to develop a neural network model which used Normalized Difference Water Index (NDWI) alongside precipitation, elevation and well water depth values from counties across Virginia to estimate the water table depth for the county. Alongside looking at water quantity, we also studied the water quality issues in the county and probed into factors that might lead to potential contamination of the county’s water resources. We identified the geology of the area alongside potential surface contamination sources and household plumbing issues to be the major sources of contamination in the county. We further identified strategies that the county could utilize for sustainable and efficient use of their water resources for future industrial and residential development.

Fellow

Esha Dwibedi, Virginia Tech, Department of Economics

Interns

Julie Rebstock, Virginia Tech, Department of CMDA and Economics

Ryan Jacobs, Virginia Tech, Department of Agricultural and Applied Economics

John Wright, Virginia State University, College of Engineering and Technology

Mentors

Dr. Sarah M. Witiak, Assistant Professor, College of Natural and Health Sciences, Virginia State University

Dr. Brianna Posadas, Postdoctoral Associate, Department of Agricultural, Leadership, and Community Education, Virginia Tech

Stakeholder

Dawn Barnes, Virginia Cooperative Extension, Floyd County

Stakeholder: Virginia Cooperative Extension, Floyd County

11. Tracking Indicators of the Economic and Social Mobility of the Black Community in Hampton Roads

Hampton Roads is a coastal region of Virginia comprising 10 cities and six counties. It represents most of the Virginia Beach-Norfolk-Newport News metropolitan statistical area (MSA), the 37th largest MSA in the United States. Black families represent 31% of the area’s population, and about 15% of them are below the poverty line. That is nearly double the rate for the general population of Hampton Roads, 8.1% of which is below the poverty line. This project uses publicly available Census data to analyze trends and statistics on key indicators of the economic well-being of the Black community in Hampton Roads. We compare these indicators across the Hampton Roads localities and the Virginia population. We visualize the data in a dashboard that gives regional stakeholders insights for planning policies and activities that positively affect the community.

Fellow

Avi Seth, Virginia Tech, Department of Computer Science

Interns

Matthew Burkholder, Virginia Tech, College of Liberal Arts and Human Sciences

Christina Prisbe, Virginia Tech, Department of Computational Modeling and Data Analytics

Victor Mukora, Virginia Tech, Department of Computational Modeling and Data Analytics

Kwabena Boateng, Virginia State University, College of Engineering and Technology

Mentors

Dr. Isabel Bradburn, Research Director, Department of Human Development and Family Sciences, Virginia Tech

Dr. Chanita Holmes, Assistant Research Professor, Department of Agricultural and Applied Economics, Virginia Tech

Stakeholder

Mallory Tuttle, Associate Director, Virginia Tech Hampton Roads Centers

Stakeholder: Black BRAND, Virginia Cooperative Extension 

12. Service Provision for Vulnerable Transition Aged Youth in Loudoun County, Virginia

Transition Aged Youth (TAY), young adults ages 18-24, encounter numerous difficulties in their transition to adulthood. The transition can be especially difficult for youths “aging out” of foster care or those exiting the juvenile detention system. Motivated by the Loudoun County Human Services Strategic Plan 2019-2024, we identify the availability of services for TAYs in five major areas: education, employment, housing, transportation, and health. This project uses geospatial mapping and interactive trees to identify intra-county variation in service provision and utilization. We also conducted a cross-county analysis comparing Loudoun County with Fairfax County, which is inside Virginia, and Allegany County, which is outside Virginia.

By revealing differences in demographics, we identified the TAY who are disproportionately served.

Fellows

Yang Cheng, Virginia Tech, Department of Agricultural and Applied Economics

JaiDa Robinson, Virginia State University

Interns

Julie Rebstock, Virginia Tech, Department of CMDA and Economics

Austin Burcham, Virginia Tech, Department of CMDA and Economics

Kyle Jacobs, Virginia State University, College of Agriculture

Mentors

Dr. Isabel Bradburn, Research Director, Department of Human Development and Family Sciences, Virginia Tech

Dr. Chanita Holmes, Assistant Research Professor, Department of Agricultural and Applied Economics, Virginia Tech

Stakeholder

Stuart Vermaak, Virginia Cooperative Extension, Loudoun County

Stakeholder: Virginia Cooperative Extension, Loudoun County 

13. Analyzing Vegetative Health using Landsat 8 Satellite Imagery

The Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) are indices developed to assess the vegetative health and water content of plants. These indices can be calculated from different wavelengths of light captured in high-resolution satellite imagery. The goal of this project is to analyze these indices in the New River Valley using 11 bands of reflected light captured by the Landsat 8 satellite. This research combines machine learning forecasting algorithms with a review of the literature on using these indices in areas such as precision agriculture, groundwater detection, coastal flooding, and drought. This project takes raw satellite images of the region and constructs filters, indices, and subsets of the region by decomposing the wavelengths of light collected in each photograph. These subsets were supplied to a feed-forward neural network to obtain a robust prediction model for the New River Valley.
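For readers unfamiliar with these indices, both are normalized differences of two spectral bands. The sketch below uses the common definitions NDVI = (NIR - Red) / (NIR + Red) and, as an assumption, the vegetation-water variant NDWI = (NIR - SWIR) / (NIR + SWIR); another widely used NDWI variant compares the green and NIR bands instead, and the abstract does not say which one the project used. On Landsat 8, red, NIR, and SWIR 1 correspond to bands 4, 5, and 6. The reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b) per pixel, or None where a + b is zero."""
    return [
        (x - y) / (x + y) if (x + y) != 0 else None
        for x, y in zip(a, b)
    ]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(nir, swir):
    """NDWI (NIR/SWIR variant, assumed here) for vegetation water content."""
    return normalized_difference(nir, swir)


# Hypothetical surface reflectance values for three pixels.
red = [0.05, 0.10, 0.30]   # Landsat 8 band 4
nir = [0.45, 0.30, 0.30]   # Landsat 8 band 5
swir = [0.15, 0.30, 0.50]  # Landsat 8 band 6 (SWIR 1)

ndvi_values = ndvi(nir, red)
ndwi_values = ndwi(nir, swir)
```

Both indices range from -1 to 1. In this toy example the first pixel has the highest NDVI and NDWI, which is the pattern expected for dense, well-watered vegetation.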

Fellows

Esha Dwibedi, Virginia Tech, Department of Agricultural and Applied Economics

Avi Seth, Virginia Tech, Department of Computer Science

Interns

Atticus Rex, Virginia Tech, Computational Modeling and Data Analytics

Victor Mukora, Virginia Tech, Computational Modeling and Data Analytics

Mentor

Brianna Posadas, Postdoctoral Associate, Department of Agricultural, Leadership, and Community Education, Virginia Tech

Stakeholder: Agricultural and Applied Economics, Virginia Tech

14. Availability of Services: Evolving Demographics, Housing, and Traffic in Rappahannock County

We used publicly available data from the American Community Survey (ACS) to explore questions and concerns held by stakeholders in Rappahannock County, Virginia. Our work involved the creation of a county profile for Rappahannock that displays information about age, race, income, employment, housing prices, and more. Additionally, we analyzed traffic volume data from the Virginia Department of Transportation to identify areas of increased or decreased traffic in the last ten years (2010-2020). Finally, we aggregated community services and resources into a single dashboard that allows us to visualize the availability of services to residents of the county. Using the county profile, traffic volume data, and the service data, we are able to provide data-driven descriptions of service provision in Rappahannock County, Virginia during the last decade.

Fellow

Timothy Pierce, Virginia Tech, Department of Agricultural and Applied Economics

Interns

Christina Prisbe, Virginia Tech, Computational Modeling and Data Analytics

Mousa Toure, Virginia State University, Computer Science

Mentors

Leonard-Allen Quaye, Ph.D. Student, Department of Agricultural and Applied Economics, Virginia Tech

Dr. Anubhab Gupta, Assistant Professor, Department of Agricultural and Applied Economics, Virginia Tech

Dr. Mulugeta Kahsai, Assistant Professor, Department of Applied Engineering Technology, Virginia State University

Stakeholders

Kenner Love, Virginia Cooperative Extension, Agriculture and Natural Resources Crop & Soil Sciences, Rappahannock County

Stakeholder: Virginia Cooperative Extension, Rappahannock County 

15. Using PICES Data to Visualize District Level Multidimensional Poverty in Zimbabwe

Prior research suggests that poverty in Zimbabwe has increased since the period of crisis began at the turn of the millennium. According to the latest World Bank estimates, almost 49% of the population of Zimbabwe were in extreme poverty in 2020. Our stakeholders seek solutions to the economic situation. They would like more granular information presented in creative ways that allow the user to glean the multidimensional and temporal aspects of poverty in Zimbabwe. The recent availability of household surveys for public use has opened the possibility of using the data to inform evidence-based policy. This project uses data from the Poverty, Income, Consumption and Expenditure Survey (PICES) to provide granular information on poverty in Zimbabwe. We created multidimensional poverty indices (MPI) at the district and province level and decomposed them into components that focus on education, health, employment, housing conditions, living conditions, assets, agricultural assets, and access to services. We provide interactive tools that allow the user to visualize and study each component and understand its contribution to the MPI. We constructed these measures for two waves of data, 2011 and 2017, to show the changes in poverty over time and across regions in Zimbabwe. The composition and decomposition of the MPI in this project can inform evidence-based policy and interventions for poverty reduction.
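To make the idea of building and decomposing a multidimensional poverty index concrete, here is a minimal sketch. It assumes the widely used Alkire-Foster counting approach, which may not be the exact method used in this project. Each household is scored by its weighted deprivations, households at or above a cutoff are counted as poor, and the index is split into each indicator's share. The function name, weights, cutoff, and toy data are all hypothetical.

```python
import numpy as np


def adjusted_headcount(deprivations, weights, cutoff):
    """Return (H, A, M0, contributions) under the assumed Alkire-Foster approach.

    deprivations: households x indicators matrix, 1 = deprived, 0 = not deprived.
    weights: one weight per indicator (normalized here to sum to 1).
    cutoff: minimum weighted deprivation score for a household to count as poor.
    """
    g = np.asarray(deprivations, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    scores = g @ w                       # weighted deprivation score per household
    poor = scores >= cutoff              # which households are multidimensionally poor
    censored = g * poor[:, None]         # ignore deprivations of non-poor households
    headcount = poor.mean()              # H: share of households that are poor
    intensity = scores[poor].mean() if poor.any() else 0.0   # A: average score among the poor
    m0 = (censored @ w).mean()           # M0, equal to H * A
    if m0 > 0:
        contributions = w * censored.mean(axis=0) / m0       # each indicator's share of M0
    else:
        contributions = np.zeros_like(w)
    return headcount, intensity, m0, contributions


# Toy example: 4 households, 3 equally weighted indicators (hypothetical data).
toy = [[1, 1, 0],
       [1, 0, 0],
       [0, 0, 0],
       [1, 1, 1]]
H, A, M0, shares = adjusted_headcount(toy, weights=[1, 1, 1], cutoff=0.5)
```

In this toy case, half of the households are poor, and the indicator shares sum to one. That is the property that lets a dashboard show how much each component contributes to the overall index for a district or province.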

Fellow

Yang Cheng, Virginia Tech, Department of Agricultural and Applied Economics

Interns

Matthew Burkholder, Virginia Tech, College of Liberal Arts and Human Sciences

Atticus Rex, Virginia Tech, Computational Modeling and Data Analytics

Mentors

Sambath Jayapregasham, Research Associate, Department of Agricultural and Applied Economics, Virginia Tech

Susan Chen, Associate Professor, Department of Agricultural and Applied Economics, Virginia Tech

Anubhab Gupta, Assistant Professor, Department of Agricultural and Applied Economics, Virginia Tech

Jeffrey Alwang, Professor, Department of Agricultural and Applied Economics, Virginia Tech

Stakeholders

Dhiraj Sharma, Senior Economist, World Bank

Grown Chirongwe, ZimStat

Stakeholders: World Bank and ZimStat

16. Just the Facts on Educational Attainment

How do the educational paths of minorities, women, or older adults differ from those of the general population in Iowa? What types of jobs do various educational pathways lead to for minorities?

The aim of this project is to develop a series of indicators that identify the post-secondary educational attainment of disproportionately impacted communities in Iowa. The team has investigated data related to educational opportunities, attainment, and outcomes of the identified population groups. These data have been cleaned and integrated into a data pipeline, and will be presented as engaging, unbiased infographics and visuals through a publication series titled Just the Facts.

The team has worked closely with the DHR educational attainment and economic and workforce development teams to collaborate on and share general data resources, such as population, demographics, and languages spoken, that relate specifically to the identified disproportionately impacted populations in Iowa.

Fellow

Amanda Rae, Iowa State University, Graduate Sociology Student and Research Assistant

Interns

Laailah Ali, Washington State University, Undergraduate Major in Economics; Minors in Human Development and Sociology

Max Ruehle, Iowa State University, Undergraduate Majors in Statistics and Data Science

Ellie Uhrhammer, Drake University, Undergraduate Majors in Data Analytics and Mathematics

Mentors

Chris Seeger, Lead Investigator and Professor, Extension Specialist in Geospatial Technologies, Iowa State University

Bailey Hanson, Extension Specialist in Community Data and GIS, GISP, Iowa State University

Sponsors

Tina Shaw, Data & Government Access Officer, Iowa Department of Human Rights

Monica Stone, Administrator, Community Advocacy and Services Division, Iowa Department of Human Rights

Consultants

Sandy Burke, Research Scientist III, Community and Economic Development, Iowa State University

Liesl Eathington, Research Scientist III and Iowa Community Indicators Program Coordinator, Iowa State University

Stakeholder: Iowa Department of Human Rights

17. DHR Just the Facts on Economic and Workforce Development

The mission of the Iowa Department of Human Rights (IDHR) is to empower underrepresented Iowans through advocating for the elimination of economic, social, and cultural barriers to full participation in civic life. To that aim, we analyzed data and created indicators to identify employment and earnings opportunities for disproportionately impacted communities in Iowa. These communities include racial and ethnic minorities, women, and individuals with disabilities. We also developed a web application to explore language usage across the state of Iowa.

Fellow

Joseph Zemmels, Iowa State University, Statistics Ph.D. Student

Interns

Avery Schoen, University of Chicago, Statistics

Dylan Mack, Washington University in St. Louis

Zack Johnson, Iowa State University, Political Science, Undergraduate

Mentors

Chris Seeger, Lead Investigator and Professor, Extension Specialist in Geospatial Technologies, Iowa State University

Bailey Hanson, Extension Specialist in Community Data and GIS, GISP, Iowa State University

Stakeholders

Tina Shaw, Data & Government Access Officer, Iowa Department of Human Rights

Monica Stone, Administrator, Community Advocacy and Services Division, Iowa Department of Human Rights

Other Advisors

Sandy Burke, Research Scientist III, Community and Economic Development, Iowa State University

Liesl Eathington, Research Scientist III and Iowa Community Indicators Program Coordinator, Iowa State University

Stakeholder: Iowa Department of Human Rights

18. Iowa’s Integrated Data System for Decision-Making (Early Childhood Iowa)

The purpose of this project is to build an interactive dashboard for Early Childhood Iowa with the capacity to connect with I2D2 and identified national, state, and local sources. We targeted 22 indicators as the dashboard's primary data, drawing on sources including IDPH, IDSH, and CDC. Because the data for each indicator came in a different format, we implemented several tools to scrape it, and dashboard users can download the data from the dashboard in PDF or CSV format. We also created various visualizations that let dashboard users analyze the data more directly.

Fellow

Tiancheng Zhou, Iowa State University, Computer Science (M.S.)

Interns

Avery Schoen, University of Chicago, Statistics

Dylan Mack, Washington University in St. Louis, Systems Engineering

Sonyta Ung, Iowa State University, Computer Science

Mentor

Todd Abraham, Assistant Director of Data & Analytics, Iowa State University

Stakeholders

Heather Rouse, Iowa’s Integrated Data System for Decision Making & Iowa’s Department of Management

Amanda Winslow, Iowa’s Integrated Data System for Decision Making & Iowa’s Department of Management

Shanell Wagler, Iowa’s Integrated Data System for Decision Making & Iowa’s Department of Management

Stakeholders: Iowa’s Integrated Data System for Decision Making and Iowa’s Department of Management 

19. Supporting Eat Greater Des Moines and Food Rescue in Central Iowa

This project looks at the non-profit organization Eat Greater Des Moines (EGDM) and its food rescue efforts. EGDM takes donations of surplus food from grocery and convenience stores, restaurants, and other locations and transports it to food pantries, non-profits, schools, housing locations, and other organizations that can distribute food to those who need it. In the project, the team used data provided by EGDM and other sources to demonstrate where food rescue currently happens, where it can be expanded, and what areas can benefit most from food rescue. The team has also built a data pipeline and dashboard that EGDM can sustain and will use moving forward to support its food rescue efforts.

Fellow

Matthew Voss, University of Wisconsin-Madison

Interns

Zack Johnson, Iowa State University

Ellie Uhrhammer, Drake University

Saul Varshavsky, Drake University

Mentor

Adisak Sukul, Associate Teaching Professor, Iowa State University

Technical Support

Masoud Nosrati, Iowa State University

Stakeholder

Aubrey Alvarez, Executive, Eat Greater Des Moines

Stakeholder: Eat Greater Des Moines  

20. Quality of Life in Small and Shrinking Cities in Iowa

This project focuses on factors affecting the perception of quality of life in small and shrinking rural communities in Iowa. The goal is to help communities focus their limited resources on improving quality of life rather than using scarce resources to try to grow (as this is unlikely in most towns). Residents and leaders of small rural towns are collaborators and stakeholders of the umbrella NSF project. The team is building a community information ecosystem that will be available through an online web application. This ecosystem makes use of publicly available data and links it to some proprietary data sets to help communities understand, utilize, and collect new data about their towns and peer communities. The ecosystem will use statistical modeling and cutting-edge visualization strategies to make data more accessible to stakeholders in these communities, including city staff, local leaders, and the public.

Fellow

Amanda Rae, Iowa State University

Interns

Laailah Ali, Washington State University

Max Ruehle, Iowa State University

Jack Studier, Iowa State University

Mentor

Heike Hofmann, Professor, Iowa State University

Stakeholders

Kim Zarecor, Iowa State University

Other Advisors

Gina Nichols, Graduate Student, Iowa State University

Stakeholder: Iowa State University

21. Assessing the Impact of Publicly Accessible Research Data: What Can Repositories Tell Us About Data Reuse?

In this project, we aim to better understand what makes a data source reusable by another researcher, focusing on the repository component of the data-sharing ecosystem. We explored a list of data repositories, looked for metrics that indicate reuse of a data source, and analyzed factors associated with higher levels of reuse and potential impact. We used two approaches, API requests and HTML scraping, to extract the metrics from the repositories we were assigned, and used correlation plots to analyze how each metric relates to reuse. Overall, this study is a repository-focused complement to a larger researcher-centered effort to develop a path for accelerating community readiness in creating reusable publicly accessible data products.

Fellow

Tiancheng Zhou, Iowa State University, Computer Science (M.S.)

Interns

Jack Studier, Iowa State University

Saul Varshavsky, Drake University

Sonyta Ung, Iowa State University, Computer Science

Mentor

Dr. Adisak Sukul, Associate Teaching Professor, Iowa State University

Stakeholders

Martin Halbert, Senior Advisor for Public Access, National Science Foundation (NSF)

Stakeholder: Public Data Sources

22. Equity in Access to Parks in Chesterfield, Virginia

Chesterfield County’s Department of Parks and Recreation is concerned with the changing demographics of Chesterfield County as the population increases, ages, and becomes more diverse. As Chesterfield grows, the Department of Parks and Recreation wants to ensure that the people it serves have equitable access to its facilities. For this project, we needed to find a way to define and quantify equity. The next leg of the project is determining equitable access, which involves an understanding of Chesterfield’s parks and their facilities. Our goals were to determine travel time and distance data for each park, create a measure of equity for each park, and rank each park based on its ability to serve its surrounding area. Using Chesterfield Parks’s GeoSpace resources, we were able to obtain detailed location and facilities data on the parks. The parks were ranked by total facilities to provide a rudimentary “quality score” that can be further refined with more time and analysis of other park scoring systems. Our literature review determined that the best variables to research for equity included vulnerable demographics. We collected these using the U.S. Census Bureau’s 5-Year American Community Survey and geospace information to determine the size of vulnerable population groups at the census tract level and to estimate the approximate number of people in these groups who are closest to the parks in their census areas. With this information and further analysis, we can get a better understanding of who is closest to which parks and, with further literature analysis, understand whether the facilities of each park best serve the communities closest to them.

Interns

R. Mousa Touré, Virginia State University

Kyle Jacobs, Virginia State University

Kwabeana Boateng, Virginia State University

Stakeholder: Chesterfield Parks and Recreation

23. Understanding Unemployment in the Prince George and Hopewell Region

The current unemployment rate in the Prince George and Hopewell region is a concern for the Prince George and Hopewell Chamber of Commerce. The literature identifies several barriers to employment; we focused on job demands, transportation, education levels, and skills required for employment. Using data from the American Community Survey, the Virginia Employment Commission, and Jobs EQ for Workforce, the project used exploratory data analysis tools to visualize the distribution of unemployment, taking these four barriers into account. In addition, we explored the demand for labor and occupational gaps in the area. Job openings appear to be unequally distributed: most currently available jobs are closer to the densely populated area, which also has a more diverse set of industries for employment, whereas the sparsely populated area may have a higher travel time to work. This indicates that residents need means of transportation to reach current job openings. We compared the education requirements of current job postings to the education levels of the unemployed. For example, in Hopewell, 318 people with a high school diploma or higher are collecting unemployment benefits, and about 400 job ads require one. This indicates that there may be enough jobs available based on education, but that job seekers may need additional skills to obtain employment. It seems that 40-50% of the current job postings for each area require hard cognitive skills such as cash handling and Microsoft skills, whereas 35% of current job ads appear to require the physical ability to lift 50 lbs or more. The next steps would be to explore the skill sets that the unemployed have. In addition, we would do statistical analysis to estimate the relationship between unemployment and the different barriers to employment.

Interns

John Wright, Virginia State University

JaiDa Robinson, Virginia State University

Stakeholder: Prince George/Hopewell Chamber of Commerce