kaori
A C++ library for barcode extraction and matching
Search for barcodes with segmented mismatches.
#include <MismatchTrie.hpp>
Classes
| struct | Result | Result of the segmented search. |
Public Member Functions
| | SegmentedMismatches ()=default |
| | SegmentedMismatches (std::array< SeqLength, num_segments_ > segments, DuplicateAction duplicates) |
| TrieAddStatus | add (const char *barcode_seq) |
| SeqLength | length () const |
| BarcodeIndex | size () const |
| void | optimize () |
| Result | search (const char *search_seq, const std::array< int, num_segments_ > &max_mismatches) const |
Given an input sequence, this class performs a segmented, mismatch-aware search against a pool of known barcode sequences. Specifically, the sequence is split into multiple segments, and a barcode is only considered to match the input if the number of mismatches in each segment is no greater than a segment-specific threshold, e.g., 1 mismatch in the first 4 bp, 3 mismatches in the next 10 bp, and so on. The aim is to enable searching for concatenations of sequences from multiple variable regions, where each region tolerates a different number of mismatches. Among the matching barcodes, the one with the fewest total mismatches to the input sequence is returned.
| num_segments_ | Number of segments to consider. |
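The matching rule described above can be illustrated with a brute-force sketch. This is not kaori's implementation (which uses a trie) and does not use kaori's types: `SketchResult` and `sketch_segmented_search` are hypothetical names introduced here, bases are compared as exact characters, and ties are resolved by keeping the earliest barcode, which is an assumption of this sketch only.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; it is not kaori's Result struct.
struct SketchResult {
    bool found;            // whether any barcode satisfied every segment limit
    std::size_t index;     // position of the best barcode in the pool
    int total_mismatches;  // mismatches summed over all segments
};

// Brute-force segmented search. A barcode is a candidate only if each segment
// stays within its own mismatch limit; among candidates, the one with the
// fewest total mismatches is kept (earliest barcode on ties).
// Bases are compared as exact characters; ambiguity codes get no special handling.
template<std::size_t num_segments>
SketchResult sketch_segmented_search(
    const std::vector<std::string>& pool,
    const std::string& query,
    const std::array<std::size_t, num_segments>& segments,
    const std::array<int, num_segments>& max_mismatches)
{
    SketchResult best = { false, 0, 0 };

    // The segment lengths must add up to the length of the query.
    std::size_t total_length = 0;
    for (std::size_t s = 0; s < num_segments; ++s) {
        total_length += segments[s];
    }
    if (query.size() != total_length) {
        return best;
    }

    for (std::size_t b = 0; b < pool.size(); ++b) {
        const std::string& barcode = pool[b];
        if (barcode.size() != total_length) {
            continue; // skip barcodes that cannot be compared position by position
        }

        std::size_t pos = 0;
        int total = 0;
        bool within_limits = true;
        for (std::size_t s = 0; s < num_segments && within_limits; ++s) {
            int seg_mismatches = 0;
            for (std::size_t i = 0; i < segments[s]; ++i, ++pos) {
                if (barcode[pos] != query[pos]) {
                    ++seg_mismatches;
                }
            }
            if (seg_mismatches > max_mismatches[s]) {
                within_limits = false; // this segment exceeds its own threshold
            }
            total += seg_mismatches;
        }

        if (within_limits && (!best.found || total < best.total_mismatches)) {
            best.found = true;
            best.index = b;
            best.total_mismatches = total;
        }
    }
    return best;
}
```

With segments of 4 and 6 bases and limits of 1 and 2 mismatches, a barcode that differs from the query at three positions in the first segment is rejected even if the second segment matches exactly, because each segment is checked against its own limit.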
SegmentedMismatches () [default]
Default constructor. This is only provided to enable composition; the resulting object should not be used until a properly constructed instance has been copy-assigned to it.
SegmentedMismatches (std::array< SeqLength, num_segments_ > segments, DuplicateAction duplicates) [inline]
| segments | Length of each segment of the sequence. Each entry should be positive and the sum should be equal to the total length of the barcode sequence. |
| duplicates | How duplicate sequences across add() calls should be handled. |
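The constraint on `segments` can be checked before constructing the object. The helper below is a minimal sketch with a hypothetical name (`sketch_segments_valid`); it uses `std::size_t` in place of kaori's `SeqLength`, whose exact definition is not shown here, and is not part of the library.

```cpp
#include <array>
#include <cstddef>

// Returns true if every segment length is positive and the lengths sum to
// the expected total barcode length. Hypothetical helper, not a kaori function.
template<std::size_t num_segments>
bool sketch_segments_valid(
    const std::array<std::size_t, num_segments>& segments,
    std::size_t barcode_length)
{
    std::size_t total = 0;
    for (std::size_t s = 0; s < num_segments; ++s) {
        if (segments[s] == 0) {
            return false; // each entry should be positive
        }
        total += segments[s];
    }
    return total == barcode_length;
}
```

For example, segments of 4 and 10 are consistent with 14-base barcodes, but not with 12-base barcodes.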
TrieAddStatus add (const char *barcode_seq) [inline]
| [in] | barcode_seq | Pointer to a character array containing a barcode sequence. The array should have length equal to length() and should only contain IUPAC nucleotides or their lower-case equivalents (excepting U or gap characters). |
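The character requirement on `barcode_seq` can be pre-checked by the caller. The sketch below assumes the allowed set is the 15 standard IUPAC nucleotide codes other than U (A, C, G, T, R, Y, S, W, K, M, B, D, H, V, N) in either case; `sketch_is_valid_barcode` is a hypothetical helper and not part of kaori.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Returns true if the first `len` characters of `seq` are all IUPAC
// nucleotide codes (upper or lower case), excluding U and gap characters.
// Hypothetical helper for illustration only.
bool sketch_is_valid_barcode(const char* seq, std::size_t len) {
    static const char allowed[] = "ACGTRYSWKMBDHVN";
    for (std::size_t i = 0; i < len; ++i) {
        // Cast through unsigned char to keep std::toupper well defined.
        const char upper = static_cast<char>(
            std::toupper(static_cast<unsigned char>(seq[i])));
        // std::strchr would match the terminating null, so reject it explicitly.
        if (upper == '\0' || std::strchr(allowed, upper) == NULL) {
            return false;
        }
    }
    return true;
}
```

The length is passed explicitly because the array is expected to hold exactly length() characters rather than being treated as a null-terminated string.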
SeqLength length () const [inline]
void optimize () [inline]
Attempt to optimize the trie for more cache-friendly look-ups. This is not necessary if the sequences supplied to add() were already sorted.
Result search (const char *search_seq, const std::array< int, num_segments_ > &max_mismatches) const [inline]
| [in] | search_seq | Pointer to a character array of length equal to length(), containing an input sequence to search against the barcode pool. |
| max_mismatches | Maximum number of mismatches for each segment. Each entry should be non-negative. |
BarcodeIndex size () const [inline]