BeeDC: an R package and globally synthesised and flagged bee occurrence dataset
bioRxiv, 2023•biorxiv.org
Species occurrence data are foundational for research, conservation, and science
communication. But the limited availability and accessibility of reliable data represents a
major obstacle, particularly for insects, which face mounting pressures. We present BeeDC,
a new R package, and a global bee occurrence dataset to address this issue. We
combined> 17.7 million bee occurrence records from multiple public repositories (GBIF,
SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated …
communication. But the limited availability and accessibility of reliable data represents a
major obstacle, particularly for insects, which face mounting pressures. We present BeeDC,
a new R package, and a global bee occurrence dataset to address this issue. We
combined> 17.7 million bee occurrence records from multiple public repositories (GBIF,
SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated …
Abstract
Species occurrence data are foundational for research, conservation, and science communication. But the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >17.7 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeDC R-workflow. Specifically, we harmonised species names following established global taxonomy, country, and collection date and we added record-level flags for a series of potential quality issues. These data are provided in two formats, “completely-cleaned” and “flagged-but-uncleaned”. Our data cleaning process is open and documented for transparency and reproducibility. The BeeDC package and R Markdown are provided, and will be improved and updated regularly. By publishing reproducible R workflows and globally cleaned datasets we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
biorxiv.org
以上显示的是最相近的搜索结果。 查看全部搜索结果