lintdb package¶
Submodules¶
lintdb.lintdb module¶
- class lintdb.lintdb.Collection(index, opts)¶
Bases:
object
Collection is a collection of documents. Instead of dealing directly with vectors, this class allows you to add and search for documents by text.
- search(*args)¶
Search the index for similar documents.
- Parameters:¶
- tenant : int
The tenant id.
- text : string
The text to search for.
- k : int
The number of results to return.
- opts : SearchOptions, optional
Any search options to use.
- property thisown¶
The membership flag
- train(texts)¶
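As a rough sketch of the documented argument order, the helper below calls search(tenant, text, k, opts) on any object that exposes that method and collects the id and score of each result. The names search_text and _StubCollection are hypothetical. The stub only stands in for a real Collection so the snippet runs without a trained index, and its ranking by descending score is an illustration, not a statement about how the library orders results.

```python
from types import SimpleNamespace


def search_text(collection, tenant, text, k, opts=None):
    """Call collection.search with the documented positional order.

    opts is only forwarded when given, since it is documented as optional.
    """
    args = (tenant, text, k) if opts is None else (tenant, text, k, opts)
    results = collection.search(*args)
    # Each result is expected to expose .id and .score (see SearchResult).
    return [(r.id, r.score) for r in results]


class _StubCollection:
    """Hypothetical stand-in for a Collection, for illustration only."""

    def __init__(self, scored_docs):
        self._scored_docs = scored_docs  # list of (doc_id, score) pairs

    def search(self, tenant, text, k, opts=None):
        # Illustration only: return the k highest-scoring stub documents.
        ranked = sorted(self._scored_docs, key=lambda d: d[1], reverse=True)
        return [SimpleNamespace(id=i, score=s, metadata={}) for i, s in ranked[:k]]


stub = _StubCollection([(1, 0.2), (2, 0.9), (3, 0.5)])
hits = search_text(stub, tenant=0, text="what is an inverted file?", k=2)
print(hits)
```

With a real Collection, the same helper would be called with the collection object in place of the stub.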
- class lintdb.lintdb.CollectionOptions¶
Bases:
object
- property max_length¶
- property model_file¶
- property thisown¶
The membership flag
- property tokenizer_file¶
- class lintdb.lintdb.Configuration¶
Bases:
object
Configuration of the Index.
- property dim¶
The dimensionality expected for incoming vectors.
- property lintdb_version¶
The current version of the index. Used internally for feature compatibility.
- property nbits¶
The number of bits to use in residual compression.
- property niter¶
The number of iterations to use during training.
- property nlist¶
The number of centroids to train.
- property num_subquantizers¶
The number of subquantizers to use in the product quantizer.
- property quantizer_type¶
- property thisown¶
The membership flag
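A minimal sketch of filling in a configuration before building an index. The helper assigns only property names that appear in the Configuration listing and rejects anything else, which catches typos such as n_list. apply_config and the SimpleNamespace stand-in are hypothetical; with the real library you would presumably pass a Configuration instance, and the values shown are placeholders rather than recommendations.

```python
from types import SimpleNamespace

# Property names taken from the Configuration listing above.
# lintdb_version is left out because it is described as internal.
CONFIG_FIELDS = (
    "dim",
    "nbits",
    "niter",
    "nlist",
    "num_subquantizers",
    "quantizer_type",
)


def apply_config(config, **fields):
    """Assign only documented Configuration properties; reject unknown names."""
    unknown = sorted(set(fields) - set(CONFIG_FIELDS))
    if unknown:
        raise AttributeError(f"unknown Configuration fields: {unknown}")
    for name, value in fields.items():
        setattr(config, name, value)
    return config


# SimpleNamespace stands in for a Configuration object so this runs anywhere.
config = apply_config(SimpleNamespace(), dim=128, nbits=2, niter=10, nlist=1024)
print(config.dim, config.nbits, config.niter, config.nlist)
```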
- class lintdb.lintdb.FloatVector(*args)¶
Bases:
object
- append(x)¶
- assign(n, x)¶
- back()¶
- begin()¶
- capacity()¶
- clear()¶
- empty()¶
- end()¶
- erase(*args)¶
- front()¶
- get_allocator()¶
- insert(*args)¶
- iterator()¶
- pop()¶
- pop_back()¶
- push_back(x)¶
- rbegin()¶
- rend()¶
- reserve(n)¶
- resize(*args)¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- class lintdb.lintdb.IdxVector(*args)¶
Bases:
object
- append(x)¶
- assign(n, x)¶
- back()¶
- begin()¶
- capacity()¶
- clear()¶
- empty()¶
- end()¶
- erase(*args)¶
- front()¶
- get_allocator()¶
- insert(*args)¶
- iterator()¶
- pop()¶
- pop_back()¶
- push_back(x)¶
- rbegin()¶
- rend()¶
- reserve(n)¶
- resize(*args)¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- class lintdb.lintdb.IndexIVF(*args)¶
Bases:
object
IndexIVF is a multi-vector index with an inverted file structure.
This relies on pretrained centroids to accurately retrieve the closest documents.
- add_single(tenant, doc)¶
Add a single document.
- property config¶
- merge(path)¶
Merge combines this index with another index.
The configuration of each index is verified, but this check does not prevent you from merging indices with different centroids, which can break retrieval in subtle ways. Used with care, merging makes it easier to build an index across multiple processes.
- property read_only¶
- remove(tenant, ids)¶
Remove deletes documents from the index by id.
C++ note: the signature void remove(const std::vector<int64_t>& ids) works if SWIG complains about idx_t.
- save()¶
Save persists the index so that it can resume from a previous state. Any quantization and compression data is saved within the index's path.
Inverted lists are persisted to the database.
- search(*args)¶
- set_centroids(data)¶
set_centroids overwrites the centroids in the encoder.
This is useful if you want to parallelize index writing and merge indices later.
- set_weights(weights, cutoffs, avg_residual)¶
set_weights overwrites the compression weights in the encoder, if using compression.
- property thisown¶
The membership flag
- train(embeddings)¶
- update(tenant, docs)¶
Update is a convenience function for remove and add.
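The merge and update descriptions can be sketched with two small helpers that rely only on the method names listed for IndexIVF. merge_shards, manual_update, and _RecordingIndex are hypothetical names. The stub records calls instead of touching disk, so the snippet runs without the library. Calling save() after merging is an assumption, as is passing a plain Python list of ids to remove (the real binding may expect an IdxVector). Shards should share centroids, for example via set_centroids, since merge does not check for that.

```python
from types import SimpleNamespace


def merge_shards(index, shard_paths):
    """Merge the index stored at each path into index, then save it."""
    for path in shard_paths:
        index.merge(path)
    index.save()  # assumption: persist once after all merges
    return index


def manual_update(index, tenant, docs):
    """Spell out update as a remove followed by one add per document."""
    index.remove(tenant, [doc.id for doc in docs])
    for doc in docs:
        index.add_single(tenant, doc)


class _RecordingIndex:
    """Hypothetical stand-in that records calls made to it."""

    def __init__(self):
        self.calls = []

    def merge(self, path):
        self.calls.append(("merge", path))

    def save(self):
        self.calls.append(("save",))

    def remove(self, tenant, ids):
        self.calls.append(("remove", tenant, list(ids)))

    def add_single(self, tenant, doc):
        self.calls.append(("add_single", tenant, doc.id))


index = _RecordingIndex()
merge_shards(index, ["shard-0", "shard-1"])
manual_update(index, tenant=0, docs=[SimpleNamespace(id=7)])
print(index.calls)
```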
- class lintdb.lintdb.MetadataMap(*args)¶
Bases:
object
- asdict()¶
- begin()¶
- clear()¶
- count(x)¶
- empty()¶
- end()¶
- erase(*args)¶
- find(x)¶
- get_allocator()¶
- has_key(key)¶
- items()¶
- iterator()¶
- iteritems()¶
- iterkeys()¶
- itervalues()¶
- key_iterator()¶
- keys()¶
- lower_bound(x)¶
- rbegin()¶
- rend()¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- upper_bound(x)¶
- value_iterator()¶
- values()¶
- class lintdb.lintdb.RawPassage(*args)¶
Bases:
object
RawPassage is a simple struct to hold the raw passage data.
This represents a document before it’s indexed.
- property embedding_block¶
embedding_block contains the document's embeddings. This is an array that can hold any number of embeddings, all of which are indexed together.
- property id¶
id is a unique identifier for the document or passage. It must be an integer. We enable document ids to be strings that we can look up after retrieval.
- property metadata¶
- property thisown¶
The membership flag
- class lintdb.lintdb.RawPassageConstPtrVector(*args)¶
Bases:
object
- append(x)¶
- assign(n, x)¶
- back()¶
- begin()¶
- capacity()¶
- clear()¶
- empty()¶
- end()¶
- erase(*args)¶
- front()¶
- get_allocator()¶
- insert(*args)¶
- iterator()¶
- pop()¶
- pop_back()¶
- push_back(x)¶
- rbegin()¶
- rend()¶
- reserve(n)¶
- resize(*args)¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- class lintdb.lintdb.RawPassagePtrVector(*args)¶
Bases:
object
- append(x)¶
- assign(n, x)¶
- back()¶
- begin()¶
- capacity()¶
- clear()¶
- empty()¶
- end()¶
- erase(*args)¶
- front()¶
- get_allocator()¶
- insert(*args)¶
- iterator()¶
- pop()¶
- pop_back()¶
- push_back(x)¶
- rbegin()¶
- rend()¶
- reserve(n)¶
- resize(*args)¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- class lintdb.lintdb.RawPassageVector(*args)¶
Bases:
object
- append(x)¶
- assign(n, x)¶
- back()¶
- begin()¶
- capacity()¶
- clear()¶
- empty()¶
- end()¶
- erase(*args)¶
- front()¶
- get_allocator()¶
- insert(*args)¶
- iterator()¶
- pop()¶
- pop_back()¶
- push_back(x)¶
- rbegin()¶
- rend()¶
- reserve(n)¶
- resize(*args)¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- class lintdb.lintdb.SearchOptions¶
Bases:
object
SearchOptions enables custom searching behavior.
These options expose ways to trade off recall and latency at different levels of retrieval.
Searching more centroids:
- decrease centroid_score_threshold and increase k_top_centroids.
- increase n_probe in search().
Decreasing latency:
- increase centroid_score_threshold and decrease k_top_centroids.
- decrease n_probe in search().
- property centroid_score_threshold¶
The threshold for centroid scores.
- property expected_id¶
Expects a document id in the return result. Prints additional information during execution. Useful for debugging.
- property k_top_centroids¶
The number of top centroids to consider per token.
- property n_probe¶
- property num_second_pass¶
The number of second pass candidates to consider.
- property thisown¶
The membership flag
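The recall and latency guidance in the SearchOptions description can be sketched as two helpers that adjust the documented properties in opposite directions. favor_recall, favor_latency, and the factor parameter are hypothetical, the starting values are placeholders rather than library defaults, and scaling the threshold multiplicatively assumes it is a positive number. SimpleNamespace stands in for a SearchOptions object so the snippet runs on its own.

```python
from types import SimpleNamespace


def favor_recall(opts, factor=2):
    """Search more centroids: lower the threshold, raise the centroid counts."""
    opts.centroid_score_threshold /= factor
    opts.k_top_centroids *= factor
    opts.n_probe *= factor
    return opts


def favor_latency(opts, factor=2):
    """Search fewer centroids: raise the threshold, lower the centroid counts."""
    opts.centroid_score_threshold *= factor
    # Keep the integer settings at 1 or above.
    opts.k_top_centroids = max(1, opts.k_top_centroids // factor)
    opts.n_probe = max(1, opts.n_probe // factor)
    return opts


def placeholder_options():
    """Stand-in for a SearchOptions object with made-up starting values."""
    return SimpleNamespace(centroid_score_threshold=0.4, k_top_centroids=2, n_probe=32)


thorough = favor_recall(placeholder_options())
fast = favor_latency(placeholder_options())
print(thorough)
print(fast)
```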
- class lintdb.lintdb.SearchResult¶
Bases:
object
SearchResult is a simple struct to hold the results of a search.
- property id¶
The document's id.
- property metadata¶
- property score¶
The final score as determined by the database.
- property thisown¶
The membership flag
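A small sketch of turning a search result into a plain dictionary, for example before logging or serializing it. It reads only the three properties listed for SearchResult and uses the asdict() method listed for MetadataMap when it is available. result_to_dict is a hypothetical helper, and the SimpleNamespace object below is a stand-in for a real result.

```python
from types import SimpleNamespace


def result_to_dict(result):
    """Copy the id, score, and metadata of a result into a plain dict."""
    meta = result.metadata
    # MetadataMap lists an asdict() method; fall back to dict() for plain mappings.
    meta = meta.asdict() if hasattr(meta, "asdict") else dict(meta)
    return {"id": result.id, "score": result.score, "metadata": meta}


# Stand-in result with made-up values.
row = result_to_dict(SimpleNamespace(id=3, score=1.5, metadata={"title": "a"}))
print(row)
```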
- class lintdb.lintdb.SearchResultVector(*args)¶
Bases:
object
- append(x)¶
- assign(n, x)¶
- back()¶
- begin()¶
- capacity()¶
- clear()¶
- empty()¶
- end()¶
- erase(*args)¶
- front()¶
- get_allocator()¶
- insert(*args)¶
- iterator()¶
- pop()¶
- pop_back()¶
- push_back(x)¶
- rbegin()¶
- rend()¶
- reserve(n)¶
- resize(*args)¶
- size()¶
- swap(v)¶
- property thisown¶
The membership flag
- class lintdb.lintdb.SwigPyIterator(*args, **kwargs)¶
Bases:
object
- advance(n)¶
- copy()¶
- decr(n=1)¶
- distance(x)¶
- equal(x)¶
- incr(n=1)¶
- next()¶
- previous()¶
- property thisown¶
The membership flag
- value()¶
Module contents¶
- class lintdb.Collection(index, opts)¶
Bases:
object
Collection is a collection of documents. Instead of dealing directly with vectors, this class allows you to add and search for documents by text.
- search(*args)¶
Search the index for similar documents.
- Parameters:¶
- tenant : int
The tenant id.
- text : string
The text to search for.
- k : int
The number of results to return.
- opts : SearchOptions, optional
Any search options to use.
- property thisown¶
The membership flag
- train(texts)¶
- class lintdb.CollectionOptions¶
Bases:
object
- property max_length¶
- property model_file¶
- property thisown¶
The membership flag
- property tokenizer_file¶
- class lintdb.Configuration¶
Bases:
object
Configuration of the Index.
- property dim¶
The dimensionality expected for incoming vectors.
- property lintdb_version¶
The current version of the index. Used internally for feature compatibility.
- property nbits¶
The number of bits to use in residual compression.
- property niter¶
The number of iterations to use during training.
- property nlist¶
The number of centroids to train.
- property num_subquantizers¶
The number of subquantizers to use in the product quantizer.
- property quantizer_type¶
- property thisown¶
The membership flag
- class lintdb.IndexIVF(*args)¶
Bases:
object
IndexIVF is a multi-vector index with an inverted file structure.
This relies on pretrained centroids to accurately retrieve the closest documents.
- add_single(tenant, doc)¶
Add a single document.
- property config¶
- merge(path)¶
Merge combines this index with another index.
The configuration of each index is verified, but this check does not prevent you from merging indices with different centroids, which can break retrieval in subtle ways. Used with care, merging makes it easier to build an index across multiple processes.
- property read_only¶
- remove(tenant, ids)¶
Remove deletes documents from the index by id.
C++ note: the signature void remove(const std::vector<int64_t>& ids) works if SWIG complains about idx_t.
- save()¶
Save persists the index so that it can resume from a previous state. Any quantization and compression data is saved within the index's path.
Inverted lists are persisted to the database.
- search(*args)¶
- set_centroids(data)¶
set_centroids overwrites the centroids in the encoder.
This is useful if you want to parallelize index writing and merge indices later.
- set_weights(weights, cutoffs, avg_residual)¶
set_weights overwrites the compression weights in the encoder, if using compression.
- property thisown¶
The membership flag
- train(embeddings)¶
- update(tenant, docs)¶
Update is a convenience function for remove and add.
- class lintdb.RawPassage(*args)¶
Bases:
object
RawPassage is a simple struct to hold the raw passage data.
This represents a document before it’s indexed.
- property embedding_block¶
embedding_block contains the document's embeddings. This is an array that can hold any number of embeddings, all of which are indexed together.
- property id¶
id is a unique identifier for the document or passage. It must be an integer. We enable document ids to be strings that we can look up after retrieval.
- property metadata¶
- property thisown¶
The membership flag
- class lintdb.SearchOptions¶
Bases:
object
SearchOptions enables custom searching behavior.
These options expose ways to trade off recall and latency at different levels of retrieval.
Searching more centroids:
- decrease centroid_score_threshold and increase k_top_centroids.
- increase n_probe in search().
Decreasing latency:
- increase centroid_score_threshold and decrease k_top_centroids.
- decrease n_probe in search().
- property centroid_score_threshold¶
The threshold for centroid scores.
- property expected_id¶
Expects a document id in the return result. Prints additional information during execution. Useful for debugging.
- property k_top_centroids¶
The number of top centroids to consider per token.
- property n_probe¶
- property num_second_pass¶
The number of second pass candidates to consider.
- property thisown¶
The membership flag