This dataset contains INSDC sequence records not associated with environmental sample identifiers or host organisms. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with search parameters: `environmental_sample=False & host=""`
EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).
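As a rough illustration of the query described above, the sketch below only builds a request URL for the ENA Portal API and does not send it. The `search` endpoint path, the `result=sequence` value, the `AND` query syntax, and the list of returned fields are assumptions made for this example; the actual request used to prepare the dataset is documented in the embl-adapter repository.

```python
from urllib.parse import urlencode

# Base URL taken from the dataset description; the "search" path is an assumption.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_ena_query_url(limit=10):
    """Build (but do not send) a URL mirroring the documented search parameters."""
    params = {
        "result": "sequence",  # assumed result type
        # Assumed spelling of: environmental_sample=False & host=""
        "query": 'environmental_sample=false AND host=""',
        # Hypothetical field selection for illustration only
        "fields": "accession,scientific_name,collection_date,country,sample_accession",
        "format": "tsv",
        "limit": limit,
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


url = build_ena_query_url()
print(url)
```

Keeping the URL construction separate from the request makes it easy to inspect the encoded query before fetching any data.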
The data was then processed as follows:
1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) and the scientific name were used to identify sequence records corresponding to the same individual (or to a group of organisms of the same species in the same sample). Only one record was kept for each combination of scientific name and sample accession number.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. Records with incomplete information were excluded. Only records associated with a specimen voucher, or records containing both a location AND a date, were kept.
5. The records associated with the same voucher were aggregated together.
6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession number was missing. To identify these potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family, and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
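The filtering logic in the steps above can be sketched as follows. This is a simplified illustration, not the embl-adapter implementation: it covers only steps 1, 2, 4 and 6, treats records as plain dictionaries, and applies the step 6 grouping to every record that survives step 4. The keys `specimen_voucher` and `is_contig`, and the function name `filter_records`, are hypothetical names chosen for this example; the grouping keys and the threshold of 50 come from the text.

```python
from collections import defaultdict

# Grouping fields and threshold as listed in step 6.
GROUP_KEYS = ("scientific_name", "collection_date", "location", "country",
              "identified_by", "collected_by", "sample_accession")
MAX_GROUP_SIZE = 50


def filter_records(records, max_group_size=MAX_GROUP_SIZE):
    """Apply simplified versions of steps 1, 2, 4 and 6 to a list of dicts."""
    # Step 1: exclude human sequences.
    records = [r for r in records if r.get("scientific_name") != "Homo sapiens"]

    # Step 2: for non-contig records with a sample accession, keep one record
    # per (scientific name, sample accession). "is_contig" is a hypothetical flag.
    seen = set()
    deduplicated = []
    for r in records:
        sample = r.get("sample_accession")
        if sample and not r.get("is_contig"):
            key = (r.get("scientific_name"), sample)
            if key in seen:
                continue
            seen.add(key)
        deduplicated.append(r)

    # Step 4: keep records with a voucher, or with both a location and a date.
    # "specimen_voucher" is a hypothetical key name.
    complete = [
        r for r in deduplicated
        if r.get("specimen_voucher")
        or (r.get("location") and r.get("collection_date"))
    ]

    # Step 6: group by the listed fields (missing values become "") and
    # drop every group that contains more than max_group_size records.
    groups = defaultdict(list)
    for r in complete:
        groups[tuple(r.get(k) or "" for k in GROUP_KEYS)].append(r)
    return [r for g in groups.values() if len(g) <= max_group_size for r in g]
```

For example, passing 51 records that share the same name, date and location but have no sample accession removes all 51, while a group of exactly 50 such records is kept.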
More information is available here: https://github.com/gbif/embl-adapter#readme
You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
GBIF URL: https://www.gbif.org/dataset/d8cd16ba-bb74-4420-821e-083f2bac17c2
Citation: European Bioinformatics Institute (EMBL-EBI), GBIF Helpdesk (2024). INSDC Sequences. Version 1.82. European Nucleotide Archive (EMBL-EBI). Occurrence dataset https://doi.org/10.15468/sbmztx accessed via GBIF.org on 2024-04-29.
| Species | Verbatim species name | Taxon group | Date | Administrative district | Dataset |
|---|---|---|---|---|---|
| Nasutitermes 象白蟻屬 | Nasutitermes kinoshitai | Other insects | 2014-04-01 | Fuxing District, Taoyuan City | INSDC Sequences |
| Murina recondita 隱姬管鼻蝠 | Murina recondita | Mammals | | Heping District, Taichung City | INSDC Sequences |
| Epaphiopsis | Epaphiopsis grebennikovi | Beetles | | Datong Township, Yilan County | INSDC Sequences |
| Phanerochaete sordida (P. Karst.) J. Erikss. et Ryvarden, 1978 污濁顯絲菌 | Phanerochaete sordida | Fungi | 2007-11-15 | Heping District, Taichung City | INSDC Sequences |
| Strobilomyces 松塔牛肝菌屬 | Strobilomyces sp. L12-s309 | Fungi | | Yuanshan Township, Yilan County | INSDC Sequences |
| Microtropis fokienensis Dunn 福建賽衛矛 | Microtropis fokienensis | Angiosperms | 2008-03-27 | Datong Township, Yilan County | INSDC Sequences |
| Microtropis fokienensis Dunn 福建賽衛矛 | Microtropis fokienensis | Angiosperms | 2003-08-05 | Heping District, Taichung City | INSDC Sequences |
| Episiphon 蕊象牙貝屬 | Episiphon sp. | Snails and shellfish | 2001-05-19 | Yilan County | INSDC Sequences |
| Sardinella hualiensis 花蓮小沙丁魚 | Sardinella hualiensis | Fishes | 2011-04-21 | Yilan County | INSDC Sequences |
| Nasutitermes parvonasutus 小象白蟻 | Nasutitermes parvonasutus | Other insects | 2015-05-13 | Zhuolan Township, Miaoli County | INSDC Sequences |
| Sardinella hualiensis 花蓮小沙丁魚 | Sardinella hualiensis | Fishes | 2011-04-21 | Yilan County | INSDC Sequences |
| | Hydnophanerochaete odontoidea | | 2001-06-28 | Wulai District, New Taipei City | INSDC Sequences |
| Vincetoxicum 白前屬 | Vincetoxicum atratum | Angiosperms | 2014-06-02 | Taichung City | INSDC Sequences |
| | Ectoedemia sp. Quercus variabilis TW | | 2012-10-10 | Heping District, Taichung City | INSDC Sequences |
| Symphurus 無線鰨屬 | Symphurus microrhynchus | Fishes | 2009-12-30 | Yilan County | INSDC Sequences |
| | Ectoedemia sp. Carpinus Taiwan | | 2012-10-10 | Heping District, Taichung City | INSDC Sequences |
| Sardinella hualiensis 花蓮小沙丁魚 | Sardinella hualiensis | Fishes | 2011-04-21 | Yilan County | INSDC Sequences |
| Symphurus orientalis 東方無線鰨 | Symphurus orientalis | Fishes | 2009-12-30 | Yilan County | INSDC Sequences |
| Sardinella hualiensis 花蓮小沙丁魚 | Sardinella hualiensis | Fishes | 2011-04-21 | Yilan County | INSDC Sequences |
| Symphurus megasomus 巨體無線鰨 | Symphurus megasomus | Fishes | 2007-02-05 | Yilan County | INSDC Sequences |
| Dipsastraea favus 正菊珊瑚 | Dipsastraea favus | Other invertebrates | 2014-07-01 | Wulai District, New Taipei City | INSDC Sequences |
| Sardinella hualiensis 花蓮小沙丁魚 | Sardinella hualiensis | Fishes | 2011-04-21 | Yilan County | INSDC Sequences |
| Symphurus orientalis 東方無線鰨 | Symphurus orientalis | Fishes | 2009-12-30 | Yilan County | INSDC Sequences |
| Favites pentagona 五邊角菊珊瑚 | Favites pentagona | Other invertebrates | 2014-07-01 | Wulai District, New Taipei City | INSDC Sequences |
| Sardinella hualiensis 花蓮小沙丁魚 | Sardinella hualiensis | Fishes | 2011-04-21 | Yilan County | INSDC Sequences |