This dataset contains INSDC sequence records that are not associated with environmental sample identifiers or host organisms. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying with the search parameters `environmental_sample=False & host=""`.
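As a rough illustration of the query described above, the sketch below builds a search URL for the ENA Portal API. The `search` endpoint and the `result`, `query`, `fields`, `format` and `limit` parameters are standard parts of that API, but the exact query string (lowercase `false`, `AND` as the joining operator), the `sequence` result type and the field list are assumptions made here for illustration, not the adapter's actual request. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"

# Assumed rendering of the search parameters quoted in the description.
DEFAULT_QUERY = 'environmental_sample=false AND host=""'


def build_search_url(fields, query=DEFAULT_QUERY, limit=10):
    """Return an ENA Portal API search URL (hypothetical helper, no request is sent)."""
    params = {
        "result": "sequence",        # assumed result type for sequence records
        "query": query,
        "fields": ",".join(fields),  # comma-separated list of columns to return
        "format": "tsv",
        "limit": limit,
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(["accession", "scientific_name", "sample_accession"]))
```

The function only assembles the URL, so it can be inspected or adjusted before any data is downloaded.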
EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).
The data was then processed as follows:
1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) and the scientific name were used to identify sequence records corresponding to the same individual (or to a group of organisms of the same species in the same sample). Only one record was kept for each combination of scientific name and sample accession number.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. Records with missing information were excluded: only records associated with a specimen voucher, or records containing both a location AND a date, were kept.
5. Records associated with the same voucher were aggregated together.
6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession was missing. To identify these potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). We then excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family, and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
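The filtering logic in steps 2, 4 and 6 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the adapter's actual implementation: records are assumed to be dictionaries keyed by ENA-style field names, the `specimen_voucher` key and all function names are hypothetical, and the CONTIG/WGS handling (step 3), voucher aggregation (step 5) and taxonomy enrichment (step 7) are left out.

```python
from collections import defaultdict

MAX_GROUP_SIZE = 50  # threshold stated in step 6

# Grouping fields listed in step 6.
GROUP_FIELDS = (
    "scientific_name", "collection_date", "location", "country",
    "identified_by", "collected_by", "sample_accession",
)


def dedupe_by_sample(records):
    """Step 2: keep one record per (scientific name, sample accession).

    Records without a sample accession cannot be matched and pass through.
    """
    seen = set()
    kept = []
    for rec in records:
        sample = rec.get("sample_accession")
        if sample:
            key = (rec.get("scientific_name"), sample)
            if key in seen:
                continue
            seen.add(key)
        kept.append(rec)
    return kept


def has_required_info(rec):
    """Step 4: a specimen voucher, or both a location and a date."""
    has_voucher = bool(rec.get("specimen_voucher"))  # assumed field name
    has_place_and_date = bool(rec.get("location")) and bool(rec.get("collection_date"))
    return has_voucher or has_place_and_date


def drop_large_groups(records, max_size=MAX_GROUP_SIZE):
    """Step 6: drop every group that contains more than max_size records."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(field) or "" for field in GROUP_FIELDS)
        groups[key].append(rec)
    return [rec for group in groups.values() if len(group) <= max_size for rec in group]


def process(records):
    """Apply the three filters in the order of the numbered steps."""
    deduped = dedupe_by_sample(records)
    complete = [rec for rec in deduped if has_required_info(rec)]
    return drop_large_groups(complete)
```

Treating a missing grouping field as an empty string means that records lacking a sample accession still fall into the same group when all their other fields agree, which is the situation step 6 is meant to catch.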
More information is available here: https://github.com/gbif/embl-adapter#readme
You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
GBIF URL: https://www.gbif.org/dataset/d8cd16ba-bb74-4420-821e-083f2bac17c2
Citation: European Bioinformatics Institute (EMBL-EBI), GBIF Helpdesk (2024). INSDC Sequences. Version 1.82. European Nucleotide Archive (EMBL-EBI). Occurrence dataset https://doi.org/10.15468/sbmztx accessed via GBIF.org on 2024-04-29.
| Species | Originally recorded species | Taxon group | Date | Administrative district | Dataset |
|---|---|---|---|---|---|
| Luteoporia albomarginata | | | 2017-02-05 | Ren'ai Township, Nantou County | INSDC Sequences |
| Dischidia formosana Maxim. 風不動 | Dischidia formosana | Angiosperms | 2014-06-23 | Haiduan Township, Taitung County | INSDC Sequences |
| Gymnema sylvestre (Retz.) Schultes 武靴藤 | Gymnema sylvestre | Angiosperms | 2014-06-23 | Changhua County | INSDC Sequences |
| Geliporus exilisporus | | | 2017-02-19 | Ren'ai Township, Nantou County | INSDC Sequences |
| Sphaerius sp. Taiwan DRM-2022b | | | | Wufeng District, Taichung City | INSDC Sequences |
| Sinorchestia taiwanensis | Sinorchestia taiwanensis | Shrimps and crabs | 2017-03-08 | Qigu District, Tainan City | INSDC Sequences |
| Dexipeus sp. CNCCOLVG00006980 | | | 2013-09-03 | Ren'ai Township, Nantou County | INSDC Sequences |
| Dacus 長角實蠅屬 | Dacus formosanus | Other insects | 2014-06-01 | Haiduan Township, Taitung County | INSDC Sequences |
| Dexipeus sp. CNCCOLVG00006974 | | | 2013-09-02 | Ren'ai Township, Nantou County | INSDC Sequences |
| Kishinouyepenaeopsis cornuta 角突仿對蝦 | Parapenaeopsis cornuta | Shrimps and crabs | 2002-07-02 | Tainan City | INSDC Sequences |
| Milesia fissipennis 裂翅蚜蠅 | Milesia fissipennis | Other insects | 2016-05-20 | Ren'ai Township, Nantou County | INSDC Sequences |
| Dexipeus sp. CNCCOLVG00006967 | | | 2013-09-02 | Ren'ai Township, Nantou County | INSDC Sequences |
| Alstonia scholaris (L.) R.Br. 黑板樹 | Alstonia scholaris | Angiosperms | 2014-06-02 | Lukang Township, Changhua County | INSDC Sequences |
| Nephius sp. CNCCOLVG00006868 | | | 2013-08-15 | Ren'ai Township, Nantou County | INSDC Sequences |
| Epimeria sp. 5 MLV-2017 | | | | | INSDC Sequences |
| Rauvolfia verticillata (Lour.) Baill. 蘿芙木 | Rauvolfia verticillata | Angiosperms | 2014-06-02 | Lukang Township, Changhua County | INSDC Sequences |
| Meretrix lusoria 文蛤 | Meretrix lusoria | Snails and shellfish | 2018-08-18 | Shengang Township, Changhua County | INSDC Sequences |
| Phlebiodontia acanthocystis | | | 2017-03-24 | Ren'ai Township, Nantou County | INSDC Sequences |
| Meretrix lusoria 文蛤 | Meretrix lusoria | Snails and shellfish | 2018-08-18 | Shengang Township, Changhua County | INSDC Sequences |
| Staphylococcus aureus | | | 2022-06-01 | Heping District, Taichung City | INSDC Sequences |
| Meretrix lusoria 文蛤 | Meretrix lusoria | Snails and shellfish | 2018-08-30 | Shengang Township, Changhua County | INSDC Sequences |
| Poecilobdella 牛蛭屬 | Poecilobdella nanjingensis | Other invertebrates | 2009-07-02 | Ren'ai Township, Nantou County | INSDC Sequences |
| Staphylococcus aureus | | | 2022-06-01 | Heping District, Taichung City | INSDC Sequences |
| Pholcus fragillimus | Pholcus fragillimus | Spiders | | Xitun District, Taichung City | INSDC Sequences |
| Staphylococcus aureus | | | 2022-06-01 | Heping District, Taichung City | INSDC Sequences |