jeweliop.blogg.se

Bam file format
Bam file format












When reading other textual data formats like SAM or VCF, a one-based index is perhaps more " natural" for humans used to counting digits on their hands. For that matter, why do the VCF and BCF formats, whose relationship is equivalent to that between SAM and BAM, also do this? I am guessing there is a good reason and it is equally applicable to both pairs of sister formats.Īrithmetic on a zero-based coordinate system is less complicated than that on a one-based system, so it appears zero-based is often (not exclusively) used for binary data formats, like BAM or bigBed, or text formats like BED, where computers are used to more efficiently calculate lengths of or set operations on intervals.Ī more complete answer on the benefits of zero-based accounting is available here, which references the original note (EWD831) on this subject from Dijkstra. The two coordinate systems are a very common source of error in bioinformatics, so there must be some reason why these two largely equivalent formats use different systems. The BAM, BCFv2, BED, and PSL formats are using For example, the region between the 3rd and the 7th bases In thisĬoordinate system, a region is specified by a half-closed-half-open The SAM, VCF, GFF and Wiggle formats areĬoordinate system where the first base of a sequence is zero. For example, the region between the 3rd and the 7thīases inclusive is. In this coordinate system, a region is specified byĪ closed interval.

bam file format

This is the relevant section of the SAM format specification (emphasis mine):ġ-based coordinate system A coordinate system where the first base ofĪ sequence is one. Why then does the SAM format use a 1-based coordinate system while BAM uses a 0-based one? They have the exact same information and are used in the same way. BAM files are, at least as far as I know, simply binary compressed versions of SAM files.














Bam file format