kaori
A C++ library for barcode extraction and matching
SegmentedMismatches< num_segments_ > Class Template Reference

Search for barcodes with segmented mismatches.
#include <MismatchTrie.hpp>
Classes
  struct Result
      Result of the segmented search.

Public Member Functions
  SegmentedMismatches () = default
  SegmentedMismatches (std::array< SeqLength, num_segments_ > segments, DuplicateAction duplicates)
  TrieAddStatus add (const char *barcode_seq)
  SeqLength length () const
  BarcodeIndex size () const
  void optimize ()
  Result search (const char *search_seq, const std::array< int, num_segments_ > &max_mismatches) const
Detailed Description

Given an input sequence, this class performs a segmented, mismatch-aware search against a pool of known barcode sequences. The sequence interval is split into multiple segments, and a barcode is only considered to match the input if the number of mismatches in each segment is no greater than that segment's threshold, e.g., at most 1 mismatch in the first 4 bp, at most 3 mismatches in the next 10 bp, and so on. The aim is to support searches on concatenations of sequences from multiple variable regions, where each region tolerates a different number of mismatches. Among the matching barcodes, the one with the fewest total mismatches to the input sequence is returned.

Template Parameters
  num_segments_: Number of segments to consider.
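The per-segment rule above can be made concrete with a small sketch. The function segmented_mismatches is hypothetical and not part of kaori: it uses plain int segment lengths instead of SeqLength, and it compares characters exactly, so it does not model any handling of IUPAC ambiguity codes in the barcodes. It returns the total mismatch count, or -1 if any single segment exceeds its own threshold.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical illustration of the segmented matching rule (not kaori's code).
// Returns the total number of mismatches between barcode and query, or -1 if
// the mismatches in any one segment exceed that segment's threshold.
// Assumes both strings are at least as long as the sum of the segment lengths.
template<std::size_t N>
int segmented_mismatches(const std::string& barcode,
                         const std::string& query,
                         const std::array<int, N>& segments,
                         const std::array<int, N>& max_mismatches) {
    int total = 0;
    std::size_t pos = 0; // running position across all segments
    for (std::size_t s = 0; s < N; ++s) {
        int in_segment = 0;
        for (int i = 0; i < segments[s]; ++i, ++pos) {
            if (barcode[pos] != query[pos]) { // exact comparison, no IUPAC expansion
                ++in_segment;
            }
        }
        if (in_segment > max_mismatches[s]) {
            return -1; // this segment alone disqualifies the barcode
        }
        total += in_segment;
    }
    return total;
}
```

With segments {4, 10} and thresholds {1, 3}, a query that differs from the barcode at one position in the first segment and two positions in the second yields a total of 3, while two mismatches in the first segment disqualify the barcode regardless of the total.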
Constructor & Destructor Documentation

SegmentedMismatches() [1/2]
  SegmentedMismatches () = default

Default constructor. This is only provided to enable composition; the resulting object should not be used until it is copy-assigned to a properly constructed instance.
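SegmentedMismatches() [2/2]
  SegmentedMismatches (std::array< SeqLength, num_segments_ > segments, DuplicateAction duplicates)   [inline]

Parameters
  segments: Length of each segment of the sequence. Each entry should be positive, and the sum should be equal to the total length of the barcode sequence.
  duplicates: How duplicate sequences across add() calls should be handled.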
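The constraints on segments can be checked before construction. The following is a minimal sketch, assuming plain int lengths in place of SeqLength; the helper checked_total_length is hypothetical, and kaori's own validation (if any) may behave differently. Because the entries must sum to the barcode length, the returned total is the length that every sequence passed to add() or search() has to match.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical pre-check of the documented `segments` requirements.
// Throws if any segment length is non-positive; otherwise returns the sum,
// which the documentation says must equal the barcode length.
template<std::size_t N>
int checked_total_length(const std::array<int, N>& segments) {
    int total = 0;
    for (int len : segments) {
        if (len <= 0) {
            throw std::invalid_argument("segment lengths should be positive");
        }
        total += len;
    }
    return total;
}
```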
Member Function Documentation

add()
  TrieAddStatus add (const char *barcode_seq)   [inline]

Parameters
  [in] barcode_seq: Pointer to a character array containing a barcode sequence. The array should have length equal to length() and should only contain IUPAC nucleotides or their lower-case equivalents (excepting U or gap characters).
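The character requirements for barcode_seq can be checked on the caller's side before calling add(). This is a minimal sketch with a hypothetical helper, is_valid_barcode, that accepts the IUPAC nucleotide codes A, C, G, T, R, Y, S, W, K, M, B, D, H, V and N in either case, and rejects U, gap characters and anything else. It does not reproduce whatever checks kaori itself performs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical validation mirroring the documented input rules for add():
// IUPAC nucleotide codes (upper or lower case), excluding U and gaps.
// Reads at most expected_length characters and stops early at a '\0',
// so a C string shorter than expected_length is reported as invalid.
bool is_valid_barcode(const char* seq, std::size_t expected_length) {
    static const std::string allowed = "ACGTRYSWKMBDHVNacgtryswkmbdhvn";
    for (std::size_t i = 0; i < expected_length; ++i) {
        if (seq[i] == '\0' || allowed.find(seq[i]) == std::string::npos) {
            return false;
        }
    }
    return true;
}
```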
length()
  SeqLength length () const   [inline]

Returns
  Length of the barcode sequences.
optimize()
  void optimize ()   [inline]

Attempt to optimize the trie for more cache-friendly look-ups. This is not necessary if sorted sequences are supplied in add().
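One way to supply sorted sequences is to sort the pool before adding it. The sketch below assumes that "sorted" means plain lexicographic string order, which the documentation does not spell out; the helper sorted_order is hypothetical. It returns a permutation of indices rather than a sorted copy, so that positions reported for the sorted insertion order can be mapped back to the caller's original ordering.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: returns the indices of `barcodes` in lexicographic
// order of the sequences. Equal sequences keep their original relative order.
std::vector<std::size_t> sorted_order(const std::vector<std::string>& barcodes) {
    std::vector<std::size_t> order(barcodes.size());
    std::iota(order.begin(), order.end(), std::size_t{0}); // 0, 1, 2, ...
    std::stable_sort(order.begin(), order.end(),
                     [&barcodes](std::size_t a, std::size_t b) {
                         return barcodes[a] < barcodes[b];
                     });
    return order;
}
```

A caller would then iterate over the returned indices and pass barcodes[i].c_str() to add(), keeping the permutation to translate results back.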
search()
  Result search (const char *search_seq, const std::array< int, num_segments_ > &max_mismatches) const   [inline]

Parameters
  [in] search_seq: Pointer to a character array of length equal to length(), containing an input sequence to search against the barcode pool.
  max_mismatches: Maximum number of mismatches for each segment. Each entry should be non-negative.
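The behaviour of search() can be approximated by a linear scan, which is useful as a reference when testing. This sketch swaps the class's trie for a brute-force loop over every barcode, and its BruteForceResult struct is a hypothetical stand-in because the fields of Result are not documented here. Ties are resolved in favour of the earliest barcode, which is an assumption rather than documented kaori behaviour, and characters are compared exactly.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the documented Result struct.
struct BruteForceResult {
    int index = -1;     // position of the best barcode in the pool, or -1 if none qualify
    int mismatches = 0; // total mismatches of that barcode across all segments
};

// Linear-scan reference for a segmented search (the real class uses a trie).
template<std::size_t N>
BruteForceResult brute_force_search(const std::vector<std::string>& pool,
                                    const std::string& query,
                                    const std::array<int, N>& segments,
                                    const std::array<int, N>& max_mismatches) {
    BruteForceResult best;
    for (std::size_t b = 0; b < pool.size(); ++b) {
        const std::string& barcode = pool[b];
        int total = 0;
        std::size_t pos = 0;
        bool ok = true;
        for (std::size_t s = 0; s < N && ok; ++s) {
            int in_segment = 0;
            for (int i = 0; i < segments[s]; ++i, ++pos) {
                if (barcode[pos] != query[pos]) {
                    ++in_segment;
                }
            }
            if (in_segment > max_mismatches[s]) {
                ok = false; // abandon this barcode as soon as one segment fails
            }
            total += in_segment;
        }
        // Keep the first barcode with the strictly smallest total.
        if (ok && (best.index < 0 || total < best.mismatches)) {
            best.index = static_cast<int>(b);
            best.mismatches = total;
        }
    }
    return best;
}
```

A scan like this costs time proportional to the pool size for every query, which is the cost a trie-based search is meant to avoid.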
size()
  BarcodeIndex size () const   [inline]