EGAS00001000644 GoNL aligned sequence data in BAM format.
Data and Resources
This dataset has no data
Additional Info
| Field | Value |
|---|---|
| Title | EGAS00001000644 GoNL aligned sequence data in BAM format. |
| Description | We mapped the data to the UCSC human reference genome build 37 using BWA 0.5.9-r16. We first aligned each read of a pair separately using bwa aln, then used bwa sampe to combine the paired reads into a single alignment, stored as a BAM file. The BAM file was then sorted by genomic position and indexed using PicardTools-1.32 SortSam. To prevent PCR artifacts from influencing the downstream analysis of our data, we used Picard to mark duplicate reads, which were excluded from subsequent analyses. We used GATK IndelRealigner to realign reads around known indels (from the 1000 Genomes (1KG) Pilot). The IndelRealigner generates alternative read alignments using the known indels as candidates and computes the likelihood of the data containing each indel based on the read pileup. Whenever the maximum-likelihood alignment contains an indel, the reads are realigned accordingly. Each base is associated with a Phred-scaled base quality score. Accurate calibration of these scores is crucial because several downstream analysis models use them. We used GATK to recalibrate the base qualities with respect to (i) the sequencing cycle of the base, (ii) the original quality score, and (iii) the dinucleotide context. To minimize issues stemming from mapping problems around indels, we performed a second round of indel realignment with the GATK IndelRealigner, this time by family rather than by individual. For this second round, we considered two sources of candidate indels: 1KG Phase 1 indels and indels aligned by BWA in the GoNL data. |
| Keywords | |
| Contact points | |
| Publisher | |
| Creator | |
| Landing page | |
| Release date | |
| Modification date | |
| Temporal start date | |
| Temporal end date | |
| In Series | |
| Version | |
| Version notes | |
| Identifier | https://ega-archive.org/datasets/EGAD00001001038 |
| Frequency | |
| Provenance | |
| Type | |
| Temporal coverage | |
| Temporal resolution | |
| Spatial coverage | |
| Spatial resolution in meters | |
| Access rights | http://publications.europa.eu/resource/authority/access-right/NON_PUBLIC |
| Other identifier | |
| Theme | |
| Language | |
| Documentation | |
| Conforms to | |
| Is referenced by | |
| Analytics | |
| Applicable legislation | |
| Has version | |
| Code values | |
| Coding system | |
| Purpose | |
| Health category | |
| Health theme | |
| Legal basis | |
| Minimum typical age | |
| Maximum typical age | |
| Number of records | |
| Number of records for unique individuals | |
| Personal data | |
| Publisher note | |
| Publisher type | |
| Trusted Data Holder | |
| Population coverage | |
| Retention period | |
| Health data access body | |
| Qualified relation | |
| Provenance activity | |
| Qualified attribution | |
| Quality annotations | |
| URI | http://umcgresearchdatacatalogue.nl/catalogue_rdf/api/rdf/CollectionEvents/name=EGAS00001000644%20GoNL%20aligned%20sequence%20data%20in%20BAM%20format.&resource=EGAS00001000644 |
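The Description above relies on Phred-scaled base qualities, where a quality Q corresponds to an error probability P through Q = -10 log10(P), and on recalibrating those qualities by sequencing cycle, original quality score and dinucleotide context. The Python sketch below illustrates both ideas under stated assumptions. It is a toy illustration, not GATK's implementation: the function names, the `BaseObservation` record, the Phred+33 (Sanger) quality encoding, the joint binning of all three covariates, and the +1/+2 pseudocount used to compute empirical qualities are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred-scaled quality: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_sanger_qualities(qual_string: str) -> List[int]:
    """Decode a quality string, assuming the Phred+33 (Sanger) ASCII offset."""
    return [ord(c) - 33 for c in qual_string]


class BaseObservation(NamedTuple):
    """Hypothetical per-base record used only for this illustration."""
    reported_q: int    # quality score reported by the sequencer
    cycle: int         # position of the base within the read (sequencing cycle)
    dinuc: str         # previous base + current base, e.g. "AC"
    is_mismatch: bool  # True if the base disagrees with the reference


BinKey = Tuple[int, int, str]


def empirical_quality_table(obs: Iterable[BaseObservation]) -> Dict[BinKey, float]:
    """Toy recalibration: bin bases by (reported Q, cycle, dinucleotide) and
    turn each bin's observed mismatch rate into an empirical Phred quality.

    The +1/+2 pseudocount keeps bins without mismatches finite; it is an
    assumption of this sketch, not the model GATK uses.
    """
    # counts[key] = [number of bases observed, number of mismatches]
    counts: Dict[BinKey, List[int]] = defaultdict(lambda: [0, 0])
    for o in obs:
        key = (o.reported_q, o.cycle, o.dinuc)
        counts[key][0] += 1
        counts[key][1] += int(o.is_mismatch)

    table: Dict[BinKey, float] = {}
    for key, (n_bases, n_mismatches) in counts.items():
        table[key] = error_prob_to_phred((n_mismatches + 1) / (n_bases + 2))
    return table
```

For example, `error_prob_to_phred(0.001)` gives a quality of about 30, and a bin of 98 bases reported at Q30 with no mismatches receives an empirical quality of about 20 under this pseudocount. A real recalibration pass would also mask known variant sites before counting mismatches, so that true polymorphisms are not treated as sequencing errors.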