
Sequence Alignment/Map Format Specification
The SAM/BAM Format Specification Working Group
18 Nov 2015
The master version of this document can be found at https://github.com/samtools/hts-specs.
This printing is version 354b495 from that repository, last modified on the date shown above.
1 The SAM Format Specification
SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. If present, the header must be prior to the alignments. Header lines start with ‘@’, while alignment lines do not. Each alignment line has 11 mandatory fields for essential alignment information such as mapping position, and a variable number of optional fields for flexible or aligner-specific information.
1.1 An example
Suppose we have the following alignment with bases in lowercase clipped from the alignment. Read r001/1 and r001/2 constitute a read pair; r003 is a chimeric read; r004 represents a split alignment.
Coor     12345678901234  5678901234567890123456789012345
ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
+r001/1        TTAGATAAAGGATA*CTG
+r002         aaaAGATAA*GGATA
+r003       gcctaAGCTAA
+r004                     ATAGCT..............TCAGC
-r003                            ttagctTAGGC
-r001/2                                        CAGCGGCAT
The corresponding SAM format is (footnote 1 explains the FLAG values):
@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:45
r001   99 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTG *
r002    0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
r003    0 ref  9 30 5S6M       *  0   0 GCCTAAGCTAA       * SA:Z:ref,29,-,6H5M,17,0;
r004    0 ref 16 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
r003 2064 ref 29 17 6H5M       *  0   0 TAGGC             * SA:Z:ref,9,+,5S6M,30,1;
r001  147 ref 37 30 9M         =  7 -39 CAGCGGCAT         * NM:i:1
1 The values in the FLAG column correspond to bitwise flags as follows: 99 = 0x63: first/next is reverse-complemented/properly aligned/multiple segments; 0: no flags set, thus a mapped single segment; 2064 = 0x810: supplementary/reverse-complemented; 147 = 0x93: last (second of a pair)/reverse-complemented/properly aligned/multiple segments.
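The decoding in footnote 1 can be reproduced mechanically by testing each bit from the FLAG table in Section 1.4. The Python sketch below is for illustration only. The short labels in FLAG_BITS are our own paraphrases of that table's descriptions, not names defined by the specification.

```python
# Sketch: decode a SAM FLAG into the bits listed in the FLAG table.
# The short labels are our own paraphrases, not spec-defined names.
FLAG_BITS = {
    0x1: "multiple segments",
    0x2: "properly aligned",
    0x4: "unmapped",
    0x8: "next unmapped",
    0x10: "reverse complemented",
    0x20: "next reverse complemented",
    0x40: "first segment",
    0x80: "last segment",
    0x100: "secondary",
    0x200: "fails filters",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """List the labels of all set bits, lowest bit first."""
    if not 0 <= flag < 1 << 16:
        raise ValueError("FLAG must lie in [0, 2^16 - 1]")
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for value in (99, 0, 2064, 147):
    print(f"{value} = {value:#x}: {decode_flag(value)}")
```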
1.2 Terminologies and Concepts
Template A DNA/RNA sequence part of which is sequenced on a sequencing machine or assembled from raw sequences.
Segment A contiguous sequence or subsequence.
Read A raw sequence that comes off a sequencing machine. A read may consist of multiple segments. For sequencing data, reads are indexed by the order in which they are sequenced.
Linear alignment An alignment of a read to a single reference sequence that may include insertions, deletions, skips and clipping, but may not include direction changes (i.e. one portion of the alignment on the forward strand and another portion of the alignment on the reverse strand). A linear alignment can be represented in a single SAM record.
Chimeric alignment An alignment of a read that cannot be represented as a linear alignment (see footnote 2). A chimeric alignment is represented as a set of linear alignments that do not have large overlaps. Typically, one of the linear alignments in a chimeric alignment is considered the “representative” alignment, and the others are called “supplementary” and are distinguished by the supplementary alignment flag. All the SAM records in a chimeric alignment have the same QNAME and the same values for the 0x40 and 0x80 flags (see Section 1.4). The decision regarding which linear alignment is representative is arbitrary.
Read alignment A linear alignment or a chimeric alignment that is the complete representation of the alignment of the read.
Multiple mapping The correct placement of a read may be ambiguous, e.g. due to repeats. In this case, there may be multiple read alignments for the same read. One of these alignments is considered primary. All the other alignments have the secondary alignment flag set in the SAM records that represent them. All the SAM records have the same QNAME and the same values for the 0x40 and 0x80 flags. Typically the alignment designated primary is the best alignment, but the decision may be arbitrary.
1-based coordinate system A coordinate system where the first base of a sequence is one. In this coordinate system, a region is specified by a closed interval. For example, the region between the 3rd and the 7th bases inclusive is [3, 7]. The SAM, VCF, GFF and Wiggle formats use the 1-based coordinate system.
0-based coordinate system A coordinate system where the first base of a sequence is zero. In this coordinate system, a region is specified by a half-closed-half-open interval. For example, the region between the 3rd and the 7th bases inclusive is [2, 7). The BAM, BCFv2, BED, and PSL formats use the 0-based coordinate system.
Phred scale Given a probability $0 < p \le 1$, the phred scale of $p$ equals $-10 \log_{10} p$, rounded to the closest integer.
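The two coordinate conventions and the Phred scale reduce to a few lines of arithmetic. The helper names in this Python sketch are hypothetical. One caveat: Python's round() breaks exact .5 ties toward the even integer, which is only one possible reading of "rounded to the closest integer".

```python
import math


def one_based_to_zero_based(start: int, end: int) -> tuple[int, int]:
    """Closed 1-based [start, end] -> half-open 0-based [start - 1, end)."""
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    return start - 1, end


def zero_based_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Half-open 0-based [start, end) -> closed 1-based [start + 1, end]."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    return start + 1, end


def phred(p: float) -> int:
    """Phred scale of a probability 0 < p <= 1: -10 log10 p, rounded."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return round(-10 * math.log10(p))


def phred_to_prob(q: int) -> float:
    """Inverse mapping, exact only up to the rounding: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


print(one_based_to_zero_based(3, 7))  # (2, 7), i.e. [2, 7)
print(phred(0.001), phred(1.0))       # 30 0
```

Only the end coordinate is unchanged between the two systems. That is why [3, 7] and [2, 7) describe the same five bases.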
1.3 The header section
Each header line begins with the character ‘@’ followed by one of the two-letter header record type codes defined in this section. In the header, each line is TAB-delimited and, apart from @CO lines, each data field follows a format ‘TAG:VALUE’ where TAG is a two-character string that defines the format and content of VALUE. Thus header lines match /^@[A-Z][A-Z](\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$/ or /^@CO\t.*/.
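A header line can be checked directly against the two regular expressions above. This Python sketch compiles them unchanged. The helper names are hypothetical, and the sketch does not check record-specific rules such as required tags.

```python
import re

# The two patterns quoted above, compiled unchanged.
HEADER_RE = re.compile(r"^@[A-Z][A-Z](\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$")
COMMENT_RE = re.compile(r"^@CO\t.*")


def is_valid_header_line(line: str) -> bool:
    """True if the line matches either header-line pattern."""
    line = line.rstrip("\n")
    return bool(HEADER_RE.match(line) or COMMENT_RE.match(line))


def header_fields(line: str) -> tuple[str, dict[str, str]]:
    """Split a non-@CO header line into its record type and {TAG: VALUE}."""
    record_type, *fields = line.rstrip("\n").split("\t")
    return record_type, dict(f.split(":", 1) for f in fields)


print(is_valid_header_line("@SQ\tSN:ref\tLN:45"))  # True
print(header_fields("@SQ\tSN:ref\tLN:45"))          # ('@SQ', {'SN': 'ref', 'LN': '45'})
```

VALUE is split at the first colon only, because the pattern allows further colons inside a value.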
2 A chimeric alignment is primarily caused by structural variations, gene fusions, misassemblies, RNA-seq or experimental protocols. It is more frequent given longer reads. For a chimeric alignment, the linear alignments constituting the alignment are largely non-overlapping; each linear alignment may have high mapping quality and is informative in SNP/INDEL calling. In contrast, multiple mappings are caused primarily by repeats. They are less frequent given longer reads. If a read has multiple mappings, all these mappings are almost entirely overlapping with each other; except the single-best optimal mapping, all the other mappings get mapping quality <Q3 and are ignored by most SNP/INDEL callers.
The following table describes the header record types that may be used and their predefined tags. Tags listed with ‘*’ are required; e.g., every @SQ header line must have SN and LN fields. As with alignment optional fields (see Section 1.5), you can freely add new tags for further data fields. Tags containing lowercase letters are reserved for local use and will not be formally defined in any future version of this specification (see footnote 3).
Tag    Description
@HD    The header line. The first line if present.
  VN*  Format version. Accepted format: /^[0-9]+\.[0-9]+$/.
  SO   Sorting order of alignments. Valid values: unknown (default), unsorted, queryname and coordinate. For coordinate sort, the major sort key is the RNAME field, with order defined by the order of @SQ lines in the header. The minor sort key is the POS field. For alignments with equal RNAME and POS, order is arbitrary. All alignments with ‘*’ in the RNAME field follow alignments with some other value but otherwise are in arbitrary order.
  GO   Grouping of alignments, indicating that similar alignment records are grouped together but the file is not necessarily sorted overall. Valid values: none (default), query (alignments are grouped by QNAME), and reference (alignments are grouped by RNAME/POS).
@SQ    Reference sequence dictionary. The order of @SQ lines defines the alignment sorting order.
  SN*  Reference sequence name. Each @SQ line must have a unique SN tag. The value of this field is used in the alignment records in the RNAME and RNEXT fields. Regular expression: [!-)+-<>-~][!-~]*
  LN*  Reference sequence length. Range: [1, 2^31-1]
  AS   Genome assembly identifier.
  M5   MD5 checksum of the sequence in uppercase, excluding spaces but including pads (as ‘*’s).
  SP   Species.
  UR   URI of the sequence. This value may start with one of the standard protocols, e.g. http: or ftp:. If it does not start with one of these protocols, it is assumed to be a file-system path.
@RG    Read group. Unordered multiple @RG lines are allowed.
  ID*  Read group identifier. Each @RG line must have a unique ID. The value of ID is used in the RG tags of alignment records. Must be unique among all read groups in the header section. Read group IDs may be modified when merging SAM files in order to handle collisions.
  CN   Name of sequencing center producing the read.
  DS   Description.
  DT   Date the run was produced (ISO8601 date or date/time).
  FO   Flow order. The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/
  KS   The array of nucleotide bases that correspond to the key sequence of each read.
  LB   Library.
  PG   Programs used for processing the read group.
  PI   Predicted median insert size.
  PL   Platform/technology used to produce the reads. Valid values: CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT, ONT, and PACBIO.
  PM   Platform model. Free-form text providing further details of the platform/technology used.
  PU   Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
  SM   Sample. Use pool name where a pool is being sequenced.
@PG    Program.
  ID*  Program record identifier. Each @PG line must have a unique ID. The value of ID is used in the alignment PG tag and PP tags of other @PG lines. PG IDs may be modified when merging SAM files in order to handle collisions.
  PN   Program name.
  CL   Command line.
3 Best practice is to use lowercase tags while designing and experimenting with new data field tags or for fields of local interest only. For new tags that are of general interest, raise an hts-specs issue or email samtools-devel@lists.sourceforge.net to have an uppercase equivalent added to the specification. This way collisions of the same uppercase tag being used with different meanings can be avoided.
  PP   Previous @PG-ID. Must match another @PG header’s ID tag. @PG records may be chained using the PP tag, with the last record in the chain having no PP tag. This chain defines the order of programs that have been applied to the alignment. PP values may be modified when merging SAM files in order to handle collisions of PG IDs. The first PG record in a chain (i.e. the one referred to by the PG tag in a SAM record) describes the most recent program that operated on the SAM record. The next PG record in the chain describes the next most recent program that operated on the SAM record. The PG ID on a SAM record is not required to refer to the newest PG record in a chain. It may refer to any PG record in a chain, implying that the SAM record has been operated on by the program in that PG record, and the program(s) referred to via the PP tag.
  DS   Description.
  VN   Program version.
@CO    One-line text comment. Unordered multiple @CO lines are allowed.
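Two rules in the table above are algorithmic: the SO:coordinate ordering and the PP chain between @PG records. The Python sketch below illustrates both under our own simplifying assumptions. Records are reduced to (RNAME, POS) pairs, @SQ order is supplied as a dict, and PP links are supplied as a dict that maps each @PG ID to its PP value (None when PP is absent). All helper names and @PG IDs are hypothetical.

```python
from typing import Optional


def coordinate_sort_key(rname: str, pos: int, sq_index: dict[str, int]) -> tuple[int, int]:
    """SO:coordinate key: @SQ order of RNAME first, then POS.

    Records whose RNAME is '*' sort after all placed records; their relative
    order is arbitrary, so they all share one key here.
    """
    if rname == "*":
        return (len(sq_index), 0)
    return (sq_index[rname], pos)


def pg_chain(start_id: str, pp_of: dict[str, Optional[str]]) -> list[str]:
    """Follow PP links from the @PG ID named on a record.

    Returns program IDs from the most recent to the oldest program.
    """
    chain: list[str] = []
    current: Optional[str] = start_id
    while current is not None:
        if current in chain:
            raise ValueError(f"PP cycle through {current!r}")
        chain.append(current)
        current = pp_of[current]
    return chain


sq_index = {"chr1": 0, "chr2": 1}  # position of each SN among the @SQ lines
records = [("chr2", 5), ("*", 0), ("chr1", 100), ("chr1", 7)]
records.sort(key=lambda r: coordinate_sort_key(r[0], r[1], sq_index))
print(records)  # [('chr1', 7), ('chr1', 100), ('chr2', 5), ('*', 0)]

pp_of = {"aln": None, "sort": "aln", "dedup": "sort"}  # hypothetical IDs
print(pg_chain("dedup", pp_of))  # ['dedup', 'sort', 'aln']
```

Python's sort is stable, so records with equal RNAME and POS keep their input order. The table permits any order for such records.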
1.4 The alignment section: mandatory fields
In the SAM format, each alignment line typically represents the linear alignment of a segment. Each line has 11 mandatory fields. These fields always appear in the same order and must be present, but their values can be ‘0’ or ‘*’ (depending on the field) if the corresponding information is unavailable. The following table gives an overview of the mandatory fields in the SAM format:
Col  Field  Type    Regexp/Range              Brief description
1    QNAME  String  [!-?A-~]{1,254}           Query template NAME
2    FLAG   Int     [0, 2^16-1]               bitwise FLAG
3    RNAME  String  \*|[!-()+-<>-~][!-~]*     Reference sequence NAME
4    POS    Int     [0, 2^31-1]               1-based leftmost mapping POSition
5    MAPQ   Int     [0, 2^8-1]                MAPping Quality
6    CIGAR  String  \*|([0-9]+[MIDNSHPX=])+   CIGAR string
7    RNEXT  String  \*|=|[!-()+-<>-~][!-~]*   Ref. name of the mate/next read
8    PNEXT  Int     [0, 2^31-1]               Position of the mate/next read
9    TLEN   Int     [-2^31+1, 2^31-1]         observed Template LENgth
10   SEQ    String  \*|[A-Za-z=.]+            segment SEQuence
11   QUAL   String  [!-~]+                    ASCII of Phred-scaled base QUALity+33
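As an illustration, the regular expressions and ranges in the table can be applied field by field. This Python sketch (hypothetical function invalid_fields) reports which mandatory fields of a TAB-split line fall outside the table. It checks only syntax and range, not cross-field rules such as the SEQ/CIGAR length constraint described below.

```python
import re

NAMES = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
         "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# String fields: regular expressions copied from the table.
STRING_RE = {
    "QNAME": re.compile(r"[!-?A-~]{1,254}"),
    "RNAME": re.compile(r"\*|[!-()+-<>-~][!-~]*"),
    "CIGAR": re.compile(r"\*|([0-9]+[MIDNSHPX=])+"),
    "RNEXT": re.compile(r"\*|=|[!-()+-<>-~][!-~]*"),
    "SEQ": re.compile(r"\*|[A-Za-z=.]+"),
    "QUAL": re.compile(r"[!-~]+"),
}

# Integer fields: inclusive ranges copied from the table.
INT_RANGE = {
    "FLAG": (0, 2**16 - 1),
    "POS": (0, 2**31 - 1),
    "MAPQ": (0, 2**8 - 1),
    "PNEXT": (0, 2**31 - 1),
    "TLEN": (-(2**31) + 1, 2**31 - 1),
}
INT_RE = re.compile(r"-?[0-9]+")


def invalid_fields(cols: list[str]) -> list[str]:
    """Return names of mandatory fields whose values break the table."""
    if len(cols) < 11:
        raise ValueError("an alignment line has 11 mandatory fields")
    bad = []
    for name, value in zip(NAMES, cols):
        if name in STRING_RE:
            ok = STRING_RE[name].fullmatch(value) is not None
        else:
            lo, hi = INT_RANGE[name]
            ok = INT_RE.fullmatch(value) is not None and lo <= int(value) <= hi
        if not ok:
            bad.append(name)
    return bad
```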
1. QNAME: Query template NAME. Reads/segments having identical QNAME are regarded as coming from the same template. A QNAME ‘*’ indicates the information is unavailable. In a SAM file, a read may occupy multiple alignment lines, when its alignment is chimeric or when multiple mappings are given.
2. FLAG: Combination of bitwise FLAGs (see footnote 4). Each bit is explained in the following table:
   Bit          Description
   1     0x1    template having multiple segments in sequencing
   2     0x2    each segment properly aligned according to the aligner
   4     0x4    segment unmapped
   8     0x8    next segment in the template unmapped
   16    0x10   SEQ being reverse complemented
   32    0x20   SEQ of the next segment in the template being reverse complemented
   64    0x40   the first segment in the template
   128   0x80   the last segment in the template
   256   0x100  secondary alignment
   512   0x200  not passing filters, such as platform/vendor quality controls
   1024  0x400  PCR or optical duplicate
   2048  0x800  supplementary alignment
   • For each read/contig in a SAM file, it is required that one and only one line associated with the read satisfies ‘FLAG & 0x900 == 0’. This line is called the primary line of the read.
4 The manipulation of bitwise flags is described at Wikipedia (see “Flag field”) and elsewhere.
   • Bit 0x100 marks the alignment not to be used in certain analyses when the tools in use are aware of this bit. It is typically used to flag alternative mappings when multiple mappings are presented in a SAM file.
   • Bit 0x800 indicates that the corresponding alignment line is part of a chimeric alignment. A line flagged with 0x800 is called a supplementary line.
   • Bit 0x4 is the only reliable place to tell whether the read is unmapped. If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, and bits 0x2, 0x100, and 0x800.
   • Bit 0x10 indicates whether SEQ has been reverse complemented and QUAL reversed. When bit 0x4 is unset, this corresponds to the strand to which the segment has been mapped. When 0x4 is set, this indicates whether the unmapped read is stored in its original orientation as it came off the sequencing machine.
   • If 0x40 and 0x80 are both set, the read is part of a linear template, but it is neither the first nor the last read. If both 0x40 and 0x80 are unset, the index of the read in the template is unknown. This may happen for a non-linear template or when the index is lost in data processing.
   • If 0x1 is unset, no assumptions can be made about 0x2, 0x8, 0x20, 0x40 and 0x80.
   • Bits that are not listed in the table are reserved for future use. They should not be set when writing and should be ignored on reading by current software.
3. RNAME: Reference sequence NAME of the alignment. If @SQ header lines are present, RNAME (if not ‘*’) must be present in one of the SQ-SN tags. An unmapped segment without coordinate has a ‘*’ in this field. However, an unmapped segment may also have an ordinary coordinate such that it can be placed at a desired position after sorting. If RNAME is ‘*’, no assumptions can be made about POS and CIGAR.
4. POS: 1-based leftmost mapping POSition of the first matching base. The first base in a reference sequence has coordinate 1. POS is set as 0 for an unmapped read without coordinate. If POS is 0, no assumptions can be made about RNAME and CIGAR.
5. MAPQ: MAPping Quality. It equals $-10 \log_{10} \Pr\{\text{mapping position is wrong}\}$, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.
6. CIGAR: CIGAR string. The CIGAR operations are given in the following table (set ‘*’ if unavailable):
   Op  BAM  Description
   M   0    alignment match (can be a sequence match or mismatch)
   I   1    insertion to the reference
   D   2    deletion from the reference
   N   3    skipped region from the reference
   S   4    soft clipping (clipped sequences present in SEQ)
   H   5    hard clipping (clipped sequences NOT present in SEQ)
   P   6    padding (silent deletion from padded reference)
   =   7    sequence match
   X   8    sequence mismatch
   • H can only be present as the first and/or last operation.
   • S may only have H operations between them and the ends of the CIGAR string.
   • For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
   • The sum of lengths of the M/I/S/=/X operations shall equal the length of SEQ.
7. RNEXT: Reference sequence name of the primary alignment of the NEXT read in the template. For the last read, the next read is the first read in the template. If @SQ header lines are present, RNEXT (if not ‘*’ or ‘=’) must be present in one of the SQ-SN tags. This field is set as ‘*’ when the information is unavailable, and set as ‘=’ if RNEXT is identical to RNAME. If not ‘=’ and the next read in the template has one primary mapping (see also bit 0x100 in FLAG), this field is identical to RNAME at the primary line of the next read. If RNEXT is ‘*’, no assumptions can be made on PNEXT and bit 0x20.
8. PNEXT: Position of the primary alignment of the NEXT read in the template. Set as 0 when the information is unavailable. This field equals POS at the primary line of the next read. If PNEXT is 0, no assumptions can be made on RNEXT and bit 0x20.
9. TLEN: signed observed Template LENgth. If all segments are mapped to the same reference, the unsigned observed template length equals the number of bases from the leftmost mapped base to the rightmost mapped base. The leftmost segment has a plus sign and the rightmost has a minus sign. The sign of segments in the middle is undefined. It is set as 0 for a single-segment template or when the information is unavailable.
10. SEQ: segment SEQuence. This field can be a ‘*’ when the sequence is not stored. If not a ‘*’, the length of the sequence must equal the sum of lengths of M/I/S/=/X operations in CIGAR. An ‘=’ denotes the base is identical to the reference base. No assumptions can be made on the letter cases.
11. QUAL: ASCII of base QUALity plus 33 (same as the quality string in the Sanger FASTQ format). A base quality is the phred-scaled base error probability which equals $-10 \log_{10} \Pr\{\text{base is wrong}\}$. This field can be a ‘*’ when quality is not stored. If not a ‘*’, SEQ must not be a ‘*’ and the length of the quality string ought to equal the length of SEQ.
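Several of the rules above are length checks that follow from the CIGAR string. The Python sketch below uses hypothetical helper names. It sums the query-consuming operations (M/I/S/=/X) for comparison with SEQ. It sums the reference-consuming operations (M/D/N/=/X, the same set Section 2 uses for the reference end) to locate the last aligned base. It decodes QUAL by subtracting 33.

```python
import re

CIGAR_RE = re.compile(r"([0-9]+[MIDNSHPX=])+")
OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")
QUERY_OPS = set("MIS=X")      # operations whose lengths add up to len(SEQ)
REFERENCE_OPS = set("MDN=X")  # operations that advance along the reference


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in OP_RE.findall(cigar)]


def query_length(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def reference_span(cigar: str) -> int:
    return sum(n for n, op in parse_cigar(cigar) if op in REFERENCE_OPS)


def decode_qual(qual: str) -> list[int]:
    """Phred base qualities: ASCII code minus 33, as in Sanger FASTQ."""
    return [ord(c) - 33 for c in qual]


def check_lengths(cigar: str, seq: str, qual: str) -> None:
    """Raise ValueError if SEQ/QUAL break the CIGAR, SEQ and QUAL length rules."""
    if cigar != "*" and seq != "*" and query_length(cigar) != len(seq):
        raise ValueError("len(SEQ) differs from the sum of M/I/S/=/X lengths")
    if qual != "*":
        if seq == "*":
            raise ValueError("QUAL is set but SEQ is '*'")
        if len(qual) != len(seq):
            raise ValueError("len(QUAL) differs from len(SEQ)")


# r001/1 from Section 1.1: POS 7, CIGAR 8M2I4M1D3M, 17 bases.
pos, cigar = 7, "8M2I4M1D3M"
check_lengths(cigar, "TTAGATAAAGGATACTG", "*")
print(query_length(cigar), pos + reference_span(cigar) - 1)  # 17 22
print(decode_qual("I5+"))                                      # [40, 20, 10]
```

The same arithmetic gives TLEN for the r001 pair. The second segment ends at 37 + 9 - 1 = 45, so the template spans 45 - 7 + 1 = 39 bases, which matches the TLEN values 39 and -39 in the example.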
1.5 The alignment section: optional fields
All optional elds follow the e TAG:TYPE:VALUE format where e TAG is a two-character r string that t matches
/[A-Za-z][A-Za-z0-9]/. EachTAGcanonlyappearonceinonealignmentline.ATAGcontaininglowercase
lettersarereservedforendusers.Inanoptionaleld,TYPEisasinglecase-sensitiveletterwhichdenesthe
formatofVALUE:
Type  Regexp matching VALUE                                  Description
A     [!-~]                                                  Printable character
i     [-+]?[0-9]+                                            Signed integer (see footnote 5)
f     [-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?                 Single-precision floating number
Z     [ !-~]+                                                Printable string, including space
H     [0-9A-F]+                                              Byte array in the Hex format (see footnote 6)
B     [cCsSiIf](,[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)+    Integer or numeric array
For an integer or numeric array (type ‘B’), the first letter indicates the type of numbers in the following comma-separated array. The letter can be one of ‘cCsSiIf’, corresponding to int8_t (signed 8-bit integer), uint8_t (unsigned 8-bit integer), int16_t, uint16_t, int32_t, uint32_t and float, respectively. During import/export, the element type may be changed if the new type is also compatible with the array.
Predenedtags are showninthefollowingtable. . Youcanfreelyaddnewtags, , andifanewtagmay
beofgeneralinterest,youcanemailsamtools-devel@lists.sourceforge.nettoaddthenewtagtothe
specication.Notethattagsstartingwith‘X’,‘Y’and‘Z’ortagscontaininglowercaselettersineitherposition
arereservedforlocaluseandwillnotbeformallydenedinanyfutureversionofthisspecication.
Tag  Type  Description (see footnote 8)
X?   ?     Reserved fields for end users (together with Y? and Z?)
AM   i     The smallest template-independent mapping quality of segments in the rest
AS   i     Alignment score generated by aligner
BC   Z     Barcode sequence, with any quality scores stored in the QT tag.
BQ   Z     Offset to base alignment quality (BAQ), of the same length as the read sequence. At the i-th read base, $\mathrm{BAQ}_i = Q_i - (\mathrm{BQ}_i - 64)$, where $Q_i$ is the i-th base quality.
CC   Z     Reference name of the next hit; ‘=’ for the same chromosome
CM   i     Edit distance between the color sequence and the color reference (see also NM)
CO   Z     Free-text comments
5 The number of digits in an integer optional field is not explicitly limited in SAM. However, BAM can represent values in the range $[-2^{31}, 2^{32})$, so in practice this is the realistic range of values for SAM’s ‘i’ as well.
6 For example, a byte array [0x1a, 0xe3, 0x1] corresponds to a Hex string ‘1AE301’.
7 Explicit typing eases format parsing and helps to reduce the file size when SAM is converted to BAM.
CP   i     Leftmost coordinate of the next hit
CQ   Z     Color read quality on the original strand of the read. Same encoding as QUAL; same length as CS.
CS   Z     Color read sequence on the original strand of the read. The primer base must be included.
CT   Z     Complete read annotation tag, used for consensus annotation dummy features (see footnote 9).
E2   Z     The 2nd most likely base calls. Same encoding and same length as QUAL.
FI   i     The index of segment in the template.
FS   Z     Segment suffix.
FZ   B,S   Flow signal intensities on the original strand of the read, stored as (uint16_t) round(value * 100.0).
LB   Z     Library. Value to be consistent with the header RG-LB tag if @RG is present.
H0   i     Number of perfect hits
H1   i     Number of 1-difference hits (see also NM)
H2   i     Number of 2-difference hits
HI   i     Query hit index, indicating the alignment record is the i-th one stored in SAM
IH   i     Number of stored alignments in SAM that contain the query in the current record
MC   Z     CIGAR string for mate/next segment
MD   Z     String for mismatching positions. Regex: [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)* (see footnote 10)
MQ   i     Mapping quality of the mate/next segment
NH   i     Number of reported alignments that contain the query in the current record
NM   i     Edit distance to the reference, including ambiguous bases but excluding clipping
OQ   Z     Original base quality (usually before recalibration). Same encoding as QUAL.
OP   i     Original mapping position (usually before realignment)
OC   Z     Original CIGAR (usually before realignment)
PG   Z     Program. Value matches the header PG-ID tag if @PG is present.
PQ   i     Phred likelihood of the template, conditional on both the mapping being correct
PT   Z     Read annotations for parts of the padded read sequence (see footnote 11)
PU   Z     Platform unit. Value to be consistent with the header RG-PU tag if @RG is present.
QT   Z     Phred quality of the barcode sequence in the BC (or RT) tag. Same encoding as QUAL.
Q2   Z     Phred quality of the mate/next segment sequence in the R2 tag. Same encoding as QUAL.
R2   Z     Sequence of the mate/next segment in the template.
RG   Z     Read group. Value matches the header RG-ID tag if @RG is present in the header.
RT   Z     Deprecated alternative to BC tag originally used at Sanger.
SA   Z     Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list: (rname,pos,strand,CIGAR,mapQ,NM;)+. Each element in the list represents a part of the chimeric alignment. Conventionally, at a supplementary line, the first element points to the primary line.
SM   i     Template-independent mapping quality
TC   i     The number of segments in the template.
U2   Z     Phred probability of the 2nd call being wrong conditional on the best being wrong. The same encoding as QUAL.
UQ   i     Phred likelihood of the segment, conditional on the mapping being correct
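The SA value is a small list format and can be parsed as below. This Python sketch uses hypothetical names. It splits each element from the right, so a reference name containing commas (which the RNAME regex permits) survives intact. It assumes the reference name contains no semicolon.

```python
from typing import NamedTuple


class SAEntry(NamedTuple):
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa(value: str) -> list[SAEntry]:
    """Parse the VALUE of an SA:Z tag, i.e. (rname,pos,strand,CIGAR,mapQ,NM;)+."""
    entries = []
    for chunk in value.split(";"):
        if not chunk:
            continue  # every element ends with ';', so the last piece is empty
        # rsplit keeps commas inside rname; the other five parts never contain one.
        rname, pos, strand, cigar, mapq, nm = chunk.rsplit(",", 5)
        if strand not in ("+", "-"):
            raise ValueError(f"bad strand {strand!r} in {chunk!r}")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


# The supplementary r003 line in Section 1.1 points back to its primary line:
print(parse_sa("ref,9,+,5S6M,30,1;"))
```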
8 The GS, GC, GQ, MF, S2 and SQ tags are reserved for backward compatibility.
9 The CT tag is intended primarily for annotation dummy reads, and consists of a strand, type and zero or more key=value pairs, each separated with semicolons. The strand field has four values as in GFF3, and supplements FLAG bit 0x10 to allow unstranded (‘.’), and stranded but unknown strand (‘?’) annotation. For these and annotation on the forward strand (strand set to ‘+’), do not set FLAG bit 0x10. For annotation on the reverse strand, set the strand to ‘-’ and set FLAG bit 0x10. The type and any keys and their optional values are all percent encoded according to RFC3986 to escape meta-characters ‘=’, ‘%’, ‘;’, ‘|’ or non-printable characters not matched by the isprint() macro (with the C locale). For example a percent sign becomes ‘%25’. The CT record matches: “strand;type(;key(=value))*”.
10 The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string ‘10A5^AC6’ means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
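The walk-through of ‘10A5^AC6’ in footnote 10 can be made executable. This Python sketch (hypothetical names) validates the string with the regex from the MD row of the table and expands it into match, mismatch and deletion events. It drops zero-length match runs, which only serve to separate adjacent letters.

```python
import re

MD_RE = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")  # regex from the MD row
TOKEN_RE = re.compile(r"([0-9]+)|(\^[A-Z]+)|([A-Z])")


def parse_md(md: str) -> list[tuple[str, object]]:
    """Expand an MD string into ('match', n), ('mismatch', ref_base) and
    ('deletion', deleted_ref_bases) events, leftmost reference base first."""
    if not MD_RE.fullmatch(md):
        raise ValueError(f"malformed MD string: {md!r}")
    events: list[tuple[str, object]] = []
    for count, deleted, ref_base in TOKEN_RE.findall(md):
        if count:
            if int(count) > 0:  # zero-length runs only separate tokens
                events.append(("match", int(count)))
        elif deleted:
            events.append(("deletion", deleted[1:]))
        else:
            events.append(("mismatch", ref_base))
    return events


def md_reference_length(md: str) -> int:
    """Number of reference bases the MD string accounts for."""
    return sum(v if kind == "match" else len(v) for kind, v in parse_md(md))


print(parse_md("10A5^AC6"))
print(md_reference_length("10A5^AC6"))  # 24
```

One possible consistency check (our suggestion, not a rule stated here) is to compare md_reference_length with the total M/D/=/X length of a CIGAR that contains no N operations.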
11 The PT tag value has the format of a series of tags separated by |, each annotating a sub-region of the read. Each tag consists of start, end, strand, type and zero or more key=value pairs, each separated with semicolons. Start and end are 1-based positions between one and the sum of the M/I/D/P/S/=/X CIGAR operators, i.e. SEQ length plus any pads. Note any editing of the CIGAR string may require updating the ‘PT’ tag coordinates, or even invalidate them. As in GFF3, strand is one of ‘+’ for forward strand tags, ‘-’ for reverse strand, ‘.’ for unstranded or ‘?’ for stranded but unknown strand. The type and any keys and their optional values are all percent encoded as in the CT tag. Formally the entire PT record matches: “start;end;strand;type(;key(=value))*(\|start;end;strand;type(;key(=value))*)*”.
2 Recommended Practice for the SAM Format
This section describes the best practice for representing data in the SAM format. These practices are not required in general, but may be required by a specific software package for it to function properly.
1. The header section
   1 The @HD line should be present, with either the SO tag or the GO tag (but not both) specified.
   2 The @SQ lines should be present if reads have been mapped.
   3 When an RG tag appears anywhere in the alignment section, there should be a single corresponding @RG line with matching ID tag in the header.
   4 When a PG tag appears anywhere in the alignment section, there should be a single corresponding @PG line with matching ID tag in the header.
2. Adjacent CIGAR operations should be different.
3. No alignments should be assigned mapping quality 255.
4. Unmapped reads
   1 For an unmapped paired-end or mate-pair read whose mate is mapped, the unmapped read should have RNAME and POS identical to its mate.
   2 If all segments in a template are unmapped, their RNAME should be set as ‘*’ and POS as 0.
   3 If POS plus the sum of lengths of M/=/X/D/N operations in CIGAR exceeds the length specified in the LN field of the @SQ header line (if it exists) with an SN equal to RNAME, the alignment should be unmapped.
   4 Unmapped reads should be stored in the orientation in which they came off the sequencing machine and have their reverse flag bit (0x10) correspondingly unset.
5. Multiple mapping
   1 When one segment is present in multiple lines to represent a multiple mapping of the segment, only one of these records should have the secondary alignment flag bit (0x100) unset. RNEXT and PNEXT point to the primary line of the next read in the template.
   2 SEQ and QUAL of secondary alignments should be set to ‘*’ to reduce the file size.
6. Optional tags:
   1 If the template has more than 2 segments, the TC tag should be present.
   2 The NM tag should be present.
7. Annotation dummy reads: These have SEQ set to *, FLAG bits 0x100 and 0x200 set (secondary and filtered), and a CT tag.
   1 If you wish to store free text in a CT tag, use the key value Note (uppercase N) to match GFF3.
   2 Multi-segment annotation (e.g. a gene with introns) should be described with multiple lines in SAM (like a multi-segment read). Where there is a clear biological direction (e.g. a gene), the first segment (FLAG bit 0x40) is used for the first section (e.g. the 5' end of the gene). Thus a GenBank entry location like complement(join(85052..85354, 85441..85621, 86097..86284)) would have three lines in SAM with a common QNAME:
                       FLAG         POS    CIGAR  Optional fields
      The 5' fragment  883 (0x373)  86097  188M   FI:i:1 TC:i:3
      Middle fragment  819 (0x333)  85441  181M   FI:i:2 TC:i:3
      The 3' fragment  947 (0x3B3)  85052  303M   FI:i:3 TC:i:3
   3 If converting GFF3 to SAM, store any key/value pairs from column 9 in the CT tag, except for the unique ID which is used for the QNAME. GFF3 columns 1 (seqid), 4 (start) and 5 (end) are encoded using SAM columns RNAME, POS and CIGAR to hold the length. GFF3 columns 3 (type) and 7 (strand) are stored explicitly in the CT tag. Remaining GFF3 columns 2 (source), 6 (score), and 8 (phase) are stored in the CT tag using key values FSource, FScore and FPhase (uppercase keys are restricted in GFF3, so these names avoid clashes). Split location features are described with multiple lines in GFF3, and similarly become multi-segment dummy reads in SAM, with the RNEXT and PNEXT columns filled in appropriately. In the absence of a convention in SAM/BAM for reads wrapping the origin of a circular genome, any GFF3 feature line wrapping the origin must be split into two segments in SAM.
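Recommendation 4.3 can be checked per record. This Python sketch uses hypothetical names and makes one interpretive assumption: it compares the last aligned base, POS + span - 1, with LN. A literal reading of "POS plus the sum of lengths" would flag r001/2 in Section 1.1 (POS 37, 9M, LN 45), even though that segment ends exactly at the last reference base, so the sketch assumes the rule means the last covered position.

```python
import re


def reference_span(cigar: str) -> int:
    """Sum of lengths of M/=/X/D/N operations (reference bases covered)."""
    return sum(int(n) for n, op in re.findall(r"([0-9]+)([MIDNSHPX=])", cigar)
               if op in "M=XDN")


def runs_off_reference(pos: int, cigar: str, ln: int) -> bool:
    """True if the last aligned base, POS + span - 1, lies beyond LN."""
    return pos + reference_span(cigar) - 1 > ln


print(runs_off_reference(37, "9M", 45))  # False: r001/2 ends at base 45
print(runs_off_reference(38, "9M", 45))  # True: would end at base 46
```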
3 Guide for Describing Assembly Sequences in SAM
3.1 Unpadded versus padded representation
To describe alignments, we can regard the reference sequence without regard to other alignments against it. Such a reference sequence is called an unpadded reference. A position on an unpadded reference, referred to as an unpadded position, is not affected by any alignments. When we use unpadded references and positions to describe alignments, we say we are using the unpadded representation.
Alternatively, to describe the same alignments, we can modify the reference sequence to contain pads that make room for sequences inserted relative to the reference. A pad is effectively a gap and is conventionally represented by an asterisk ‘*’. A reference sequence containing pads is called a padded reference. A position which counts the *’s is referred to as a padded position. A padded reference sequence may be affected by the query alignments and, because of gap insertions, is typically longer than the unpadded reference. The padded position of one query alignment may be affected by other query alignments.
Unpadded and padded are different representations of the same alignments. They are convertible to each other with no loss of any information. The unpadded representation is more common due to the convenience of a fixed coordinate system, while the padded representation has the advantage that alignments can be simply described by the start and end coordinates without using complex CIGAR strings. SAM traditionally uses the padded representation for de novo assembly. The ACE assembly format uses the padded representation exclusively.
3.2 Padded SAM
The SAM format is typically used to describe alignments against an unpadded reference sequence, but it is also able to describe alignments against a padded reference. In the latter case, we say we are using a padded SAM. A padded SAM is a valid SAM, but with the difference that the reference and positions in use are padded. There may be more than one way to describe the padded representation. We recommend the following.
In a padded SAM, alignments and coordinates are described with respect to the padded reference sequence. Unlike traditional padded representations like the ACE file format where pads/gaps are recorded in reads using *’s, we do not write *’s in the SEQ field of the SAM format (see footnote 12). Instead, we describe pads in the query sequences as deletions from the padded reference using the CIGAR ‘D’ operation. In a padded SAM, the insertion and padding CIGAR operations (‘I’ and ‘P’) are not used because the padded reference already considers all the insertions.
The following shows the padded SAM for the example alignment in Section 1.1. Notably, the length of ref is 47 instead of 45. The POS values of the last three alignments are all shifted by 2. The CIGAR strings of alignments bridging the 2 bp insertion are also changed.
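Converting an unpadded position to a padded one only requires counting the pads to its left. The Python sketch below (hypothetical function name) builds that mapping from the padded reference drawn in Section 1.1, where ‘*’ marks a pad. It reproduces the shift by 2 for positions after the insertion.

```python
def unpadded_to_padded(padded_ref: str) -> list[int]:
    """Map each unpadded 1-based position to its 1-based padded position.

    Index 0 is unused, so result[u] is the padded position of unpadded base u.
    """
    mapping = [0]
    for i, base in enumerate(padded_ref, start=1):
        if base != "*":
            mapping.append(i)
    return mapping


padded_ref = "AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT"
to_padded = unpadded_to_padded(padded_ref)
print(len(padded_ref), len(to_padded) - 1)         # 47 45
print([to_padded[p] for p in (7, 9, 16, 29, 37)])  # [7, 9, 18, 31, 39]
```

The five positions are the POS values of the Section 1.1 records. The last three become 18, 31 and 39, as in the padded SAM that follows.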
12 Writing pads/gaps as *’s in the SEQ field might have been more convenient, but this caused concerns for backward compatibility.
@HD VN:1.5 SO:coordinate
@SQ SN:ref LN:47
ref   516 ref  1  0 14M2D31M   *  0   0 AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT *
r001   99 ref  7 30 14M1D3M    = 39  41 TTAGATAAAGGATACTG *
*     768 ref  8 30 1M         *  0   0 *                 * CT:Z:.;Warning;Note=Ref wrong?
r002    0 ref  9 30 3S6M1D5M   *  0   0 AAAAGATAAGGATA    * PT:Z:1;4;+;homopolymer
r003    0 ref  9 30 5H6M       *  0   0 AGCTAA            * NM:i:1
r004    0 ref 18 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
r003 2064 ref 31 30 6H5M       *  0   0 TAGGC             * NM:i:0
r001  147 ref 39 30 9M         =  7 -41 CAGCGGCAT         * NM:i:1
Here we also exemplify the recommended practice for storing the reference sequence and the reference annotations in SAM when necessary. For a reference sequence in SAM, QNAME should be identical to RNAME, POS set to 1 and FLAG to 516 (filtered and unmapped); for an annotation, FLAG should be set to 768 (filtered and secondary) with no restriction on QNAME. Dummy reads for annotation would typically have a ‘CT’ tag to hold the annotation information; see Section 2.