There are a number of things you can do with Longbow’s index file. For example, if you know how much data is in a BAM file, you can have Longbow find the first read in the file, take that read’s ID, and use it as a key into a sorted (quicksort-style) list. With the reads sorted up front, Longbow can focus on extracting the necessary read properties, processing the reads, and then extracting their base calls. This can help dramatically when processing Illumina adapter-trimmed reads that don’t meet the minimum length requirements for Ion Torrent chips, because short reads make it harder to reach the point of reading off the targeted bases.
Thus the .pbi index can streamline the processing of datasets that have obsolete and/or irregularly spaced reads. Longbow’s repertoire is extensive, and it is used for a full spectrum of bacterial, viral, and other nucleotide data.
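The idea of sorting read IDs up front and then using an ID as a lookup key can be sketched in a few lines of Python. This is only an illustration of the general technique, not Longbow's actual implementation: the function names `build_sorted_index` and `find_read` and the example read IDs are hypothetical, and the read IDs are assumed to already be in memory as plain strings.

```python
import bisect

def build_sorted_index(read_ids):
    """Sort read IDs once; remember each ID's original position in the file."""
    order = sorted(range(len(read_ids)), key=lambda i: read_ids[i])
    sorted_ids = [read_ids[i] for i in order]
    return sorted_ids, order

def find_read(sorted_ids, positions, read_id):
    """Binary-search the sorted IDs; return the original position or None."""
    i = bisect.bisect_left(sorted_ids, read_id)
    if i < len(sorted_ids) and sorted_ids[i] == read_id:
        return positions[i]
    return None

# Hypothetical read IDs, in the order they might appear in a file.
read_ids = ["m1/30/ccs", "m1/10/ccs", "m1/20/ccs"]
sorted_ids, positions = build_sorted_index(read_ids)

# Use the first read's ID as the lookup key.
first_pos = find_read(sorted_ids, positions, read_ids[0])
missing = find_read(sorted_ids, positions, "m1/99/ccs")
```

Sorting costs O(n log n) once; each later lookup is O(log n), which is the benefit of doing the sort up front.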
4.1 Indexing BAM Inputs

In addition to indexing long sequences, Longbow can also read their metadata (e.g., their accession numbers) and store this data in a text file. This analysis can be performed on a wider range of sequence types in addition to simple human-generated sequences (e.g., microbial isolates, plasmids, or viral subspecies). Unlike the .pbi index file, this analysis doesn't require all of those sequences to be present as independent objects; rather, the analysis can be bootstrapped over a large number of sequences. This mode may be more useful when your samples are raw sequence files (e.g., FASTA or FASTQ, including '.fq' files) rather than BAM files. Once the .txt file is generated, you can choose to index the alignments, alignments without reference sequences, reference sequences alone, or all of the above.
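As a rough illustration of collecting sequence metadata into a text file, the following Python sketch pulls the first whitespace-delimited token from each FASTA header and writes one token per line. It does not use or reproduce any Longbow command or API; the function names, the assumption that the accession is the first token of the header, and the accession values in the sample input are all hypothetical.

```python
import io

def iter_fasta_accessions(handle):
    """Yield the first token of each FASTA header line (assumed to be the accession)."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]

def write_accessions(fasta_handle, out_handle):
    """Write one accession per line; return how many were written."""
    count = 0
    for accession in iter_fasta_accessions(fasta_handle):
        out_handle.write(accession + "\n")
        count += 1
    return count

# Made-up FASTA content with hypothetical accession numbers.
example_fasta = io.StringIO(
    ">ACC0001.1 example plasmid\nACGTACGT\n"
    ">ACC0002.1 example viral isolate\nGGCCTTAA\n"
)
out = io.StringIO()
n_written = write_accessions(example_fasta, out)
```

With real files, the two `io.StringIO` objects would be replaced by `open("input.fasta")` and `open("metadata.txt", "w")`; the in-memory handles are used here only to keep the sketch self-contained.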