lintdb package¶

Submodules¶

lintdb.lintdb module¶

class lintdb.lintdb.Collection(index, opts)¶

Bases: object

Collection is a collection of documents. Instead of dealing directly with vectors, this class allows you to add and search for documents by text.

add(tenant, id, text, metadata)¶

Add a text document to the index.

Parameters:¶

tenant : int¶: The tenant id.
id : int¶: The document id.
text : string¶: The text to add.

search(*args)¶

Search the index for similar documents.

Parameters:¶

tenant : int: The tenant id.
text : string: The text to search for.
k : int: The number of results to return.
opts : SearchOptions, optional: Any search options to use.

property thisown¶: The membership flag

train(texts)¶
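The call shape of add and search can be illustrated with a toy, in-memory stand-in. This is a sketch only: ToyCollection is a hypothetical class, not part of lintdb, and its token-overlap scoring is an assumption made for illustration; it says nothing about how lintdb embeds or ranks text. It only shows documents being added and searched by text, scoped to a tenant.

```python
from collections import Counter, defaultdict


class ToyCollection:
    """Hypothetical in-memory stand-in mirroring add/search. Not lintdb."""

    def __init__(self):
        # tenant id -> {document id -> (token counts, metadata)}
        self._docs = defaultdict(dict)

    def add(self, tenant, id, text, metadata):
        # Store the document under its tenant, keyed by document id.
        self._docs[tenant][id] = (Counter(text.lower().split()), dict(metadata))

    def search(self, tenant, text, k):
        query = Counter(text.lower().split())
        scored = []
        for doc_id, (tokens, metadata) in self._docs[tenant].items():
            # Score is the number of tokens shared with the query.
            score = sum((query & tokens).values())
            if score > 0:
                scored.append((score, doc_id, metadata))
        # Highest score first; ties are broken by the smaller id.
        scored.sort(key=lambda r: (-r[0], r[1]))
        return [(doc_id, score, metadata) for score, doc_id, metadata in scored[:k]]


collection = ToyCollection()
collection.add(0, 1, "multi vector retrieval", {"title": "a"})
collection.add(0, 2, "vector databases", {"title": "b"})
collection.add(1, 3, "multi vector retrieval", {"title": "c"})

# Only documents belonging to tenant 0 are considered.
results = collection.search(0, "multi vector search", 2)
```

Documents added under one tenant are never returned for another, which is the role the tenant argument appears to play in the signatures above.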

class lintdb.lintdb.CollectionOptions¶

Bases: object

property max_length¶

property model_file¶

property thisown¶: The membership flag

property tokenizer_file¶

class lintdb.lintdb.Configuration¶

Bases: object

Configuration of the Index.

property dim¶: The dimensions expected for incoming vectors.

property lintdb_version¶: The current version of the index. Used internally for feature compatibility.

property nbits¶: The number of bits to use in residual compression.

property niter¶: The number of iterations to use during training.

property nlist¶: The number of centroids to train.

property num_subquantizers¶: The number of subquantizers to use in the product quantizer.

property quantizer_type¶
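To make "the number of bits to use in residual compression" concrete, here is a minimal sketch of one common scheme: subtract the assigned centroid from the vector, then replace each residual dimension with the index of the bucket it falls into, using 2**nbits buckets. The function names and the evenly spaced cutoffs are assumptions for illustration; lintdb's actual encoder may choose cutoffs differently (for example, from the data).

```python
import bisect


def bucket_cutoffs(nbits, low, high):
    """Evenly spaced cutoffs splitting [low, high] into 2**nbits buckets."""
    buckets = 2 ** nbits
    step = (high - low) / buckets
    return [low + step * i for i in range(1, buckets)]


def compress_residual(vector, centroid, cutoffs):
    """Replace each residual dimension with the index of its bucket."""
    residual = [v - c for v, c in zip(vector, centroid)]
    return [bisect.bisect_right(cutoffs, r) for r in residual]


# With nbits=2 there are 4 buckets, so every code fits in 2 bits.
cutoffs = bucket_cutoffs(nbits=2, low=-1.0, high=1.0)
codes = compress_residual([0.9, 0.1, -0.7, 0.3], [0.5, 0.5, 0.0, 0.0], cutoffs)
```

A larger nbits gives finer buckets and a closer reconstruction, at the cost of more storage per dimension.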

property thisown¶: The membership flag

class lintdb.lintdb.FloatVector(*args)¶

Bases: object

append(x)¶

assign(n, x)¶

back()¶

begin()¶

capacity()¶

clear()¶

empty()¶

end()¶

erase(*args)¶

front()¶

get_allocator()¶

insert(*args)¶

iterator()¶

pop()¶

pop_back()¶

push_back(x)¶

rbegin()¶

rend()¶

reserve(n)¶

resize(*args)¶

size()¶

swap(v)¶

property thisown¶: The membership flag

class lintdb.lintdb.IdxVector(*args)¶

Bases: object

append(x)¶

assign(n, x)¶

back()¶

begin()¶

capacity()¶

clear()¶

empty()¶

end()¶

erase(*args)¶

front()¶

get_allocator()¶

insert(*args)¶

iterator()¶

pop()¶

pop_back()¶

push_back(x)¶

rbegin()¶

rend()¶

reserve(n)¶

resize(*args)¶

size()¶

swap(v)¶

property thisown¶: The membership flag

class lintdb.lintdb.IndexIVF(*args)¶

Bases: object

IndexIVF is a multi vector index with an inverted file structure.

This relies on pretrained centroids to accurately retrieve the closest documents.
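The general idea of an inverted file over multi-vector documents can be sketched in a few lines. This is an illustration of the technique under stated assumptions, not lintdb's implementation: the helper names are hypothetical, similarity is a plain dot product, and each embedding is assigned to its single nearest centroid.

```python
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_centroids(vector, centroids, n):
    """Indices of the n centroids with the highest dot product."""
    order = sorted(range(len(centroids)), key=lambda i: -dot(vector, centroids[i]))
    return order[:n]


def build_inverted_lists(docs, centroids):
    """Map each centroid index to the ids of documents with an embedding near it."""
    lists = defaultdict(set)
    for doc_id, embeddings in docs.items():
        for emb in embeddings:
            lists[nearest_centroids(emb, centroids, 1)[0]].add(doc_id)
    return lists


def candidates(query_embeddings, centroids, lists, n_probe):
    """Collect doc ids from the n_probe closest lists of every query embedding."""
    found = set()
    for q in query_embeddings:
        for c in nearest_centroids(q, centroids, n_probe):
            found |= lists.get(c, set())
    return found


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
docs = {
    10: [[0.9, 0.1], [0.8, 0.2]],
    20: [[0.1, 0.9]],
    30: [[-0.9, 0.1], [0.2, 0.8]],
}
lists = build_inverted_lists(docs, centroids)
narrow = candidates([[1.0, 0.0]], centroids, lists, n_probe=1)
wide = candidates([[1.0, 0.0]], centroids, lists, n_probe=2)
```

Probing more lists returns more candidates, which is why retrieval quality depends on centroids that fit the data.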

add(tenant, docs)¶

Add will add a block of embeddings to the index.

Parameters:¶

tenant : int¶: the tenant to assign the document to.
docs : std::vector< lintdb::RawPassage,std::allocator< lintdb::RawPassage > >¶: a vector of RawPassages. This includes embeddings and ids.

add_single(tenant, doc)¶: Add a single document.

property config¶

merge(path)¶

Merge will combine the index with another index.

Merge checks that the configurations of the two indices match, but it does not prevent you from merging indices that were trained with different centroids. Such merges can break in subtle ways. Used carefully, merging makes it easier to build an index across multiple processes.

property read_only¶

remove(tenant, ids)¶

Remove deletes documents from the index by id.

void remove(const std::vector<int64_t>& ids) works if SWIG complains about idx_t.

save()¶

Index should be able to resume from a previous state. Any quantization and compression will be saved within the Index’s path.

Inverted lists are persisted to the database.

search(*args)¶

set_centroids(data)¶

set_centroids overwrites the centroids in the encoder.

This is useful if you want to parallelize index writing and merge indices later.
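A small sketch shows why shared centroids matter when indices are written in parallel and merged later. It models each index as a mapping from centroid index to document ids; that representation and the function name are assumptions for illustration, not lintdb's storage format.

```python
def merge_inverted_lists(a, b):
    """Union two {centroid index: set of doc ids} mappings."""
    merged = {c: set(ids) for c, ids in a.items()}
    for c, ids in b.items():
        merged.setdefault(c, set()).update(ids)
    return merged


# Two shards built by separate workers against the same centroids.
shard_a = {0: {1, 2}, 1: {3}}
shard_b = {1: {4}, 2: {5}}
merged = merge_inverted_lists(shard_a, shard_b)
```

The union is only meaningful if centroid index 1 refers to the same vector in both shards. If each worker trained its own centroids, documents would end up in lists that no longer describe them.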

set_weights(weights, cutoffs, avg_residual)¶: set_weights overwrites the compression weights in the encoder, if using compression.

property thisown¶: The membership flag

train(embeddings)¶

update(tenant, docs)¶: Update is a convenience function for remove and add.

class lintdb.lintdb.MetadataMap(*args)¶

Bases: object

asdict()¶

begin()¶

clear()¶

count(x)¶

empty()¶

end()¶

erase(*args)¶

find(x)¶

get_allocator()¶

has_key(key)¶

items()¶

iterator()¶

iteritems()¶

iterkeys()¶

itervalues()¶

key_iterator()¶

keys()¶

lower_bound(x)¶

rbegin()¶

rend()¶

size()¶

swap(v)¶

property thisown¶: The membership flag

upper_bound(x)¶

value_iterator()¶

values()¶

class lintdb.lintdb.RawPassage(*args)¶

Bases: object

RawPassage is a simple struct to hold the raw passage data.

This represents a document before it’s indexed.

property embedding_block¶: embedding_block contains the document’s embeddings. This is an array and can hold any number of embeddings, but they’ll all be indexed together.

property id¶: id is a unique identifier for the document or passage. It must be an integer. We enable document ids to be strings that we can look up after retrieval.

property metadata¶

property thisown¶: The membership flag
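Because a passage carries a block of several embeddings, scoring it against a query usually means comparing many vectors to many vectors. One widely used approach for multi-vector retrieval is late interaction ("MaxSim"): for each query embedding, take its best match in the document, then sum. The sketch below assumes that scoring rule and a dot-product similarity; this page does not state which scoring function lintdb uses, and max_sim is a hypothetical helper.

```python
def max_sim(query_embeddings, doc_embeddings):
    """Sum, over query embeddings, of the best dot product against the document."""
    total = 0.0
    for q in query_embeddings:
        total += max(sum(x * y for x, y in zip(q, d)) for d in doc_embeddings)
    return total


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # covers both query embeddings
doc_b = [[1.0, 0.0]]                          # covers only the first

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
```

Here doc_a scores higher because every query embedding finds a close match among its embeddings.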

class lintdb.lintdb.RawPassageConstPtrVector(*args)¶

Bases: object

append(x)¶

assign(n, x)¶

back()¶

begin()¶

capacity()¶

clear()¶

empty()¶

end()¶

erase(*args)¶

front()¶

get_allocator()¶

insert(*args)¶

iterator()¶

pop()¶

pop_back()¶

push_back(x)¶

rbegin()¶

rend()¶

reserve(n)¶

resize(*args)¶

size()¶

swap(v)¶

property thisown¶: The membership flag

class lintdb.lintdb.RawPassagePtrVector(*args)¶

Bases: object

append(x)¶

assign(n, x)¶

back()¶

begin()¶

capacity()¶

clear()¶

empty()¶

end()¶

erase(*args)¶

front()¶

get_allocator()¶

insert(*args)¶

iterator()¶

pop()¶

pop_back()¶

push_back(x)¶

rbegin()¶

rend()¶

reserve(n)¶

resize(*args)¶

size()¶

swap(v)¶

property thisown¶: The membership flag

class lintdb.lintdb.RawPassageVector(*args)¶

Bases: object

append(x)¶

assign(n, x)¶

back()¶

begin()¶

capacity()¶

clear()¶

empty()¶

end()¶

erase(*args)¶

front()¶

get_allocator()¶

insert(*args)¶

iterator()¶

pop()¶

pop_back()¶

push_back(x)¶

rbegin()¶

rend()¶

reserve(n)¶

resize(*args)¶

size()¶

swap(v)¶

property thisown¶: The membership flag

class lintdb.lintdb.SearchOptions¶

Bases: object

SearchOptions enables custom searching behavior.

These options expose ways to trade off recall and latency at different levels of retrieval.

Searching more centroids:
- decrease centroid_score_threshold and increase k_top_centroids.
- increase n_probe in search().

Decreasing latency:
- increase centroid_score_threshold and decrease k_top_centroids.
- decrease n_probe in search().
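The effect of centroid_score_threshold and k_top_centroids can be sketched as a filter over per-token centroid scores. This is one plausible reading of the option names, written as an assumption: for each query token keep its k_top_centroids best centroids, then drop any that score below the threshold. The function is hypothetical and lintdb may combine the two options differently.

```python
def select_centroids(token_scores, k_top_centroids, centroid_score_threshold):
    """token_scores holds one list of centroid scores per query token."""
    selected = set()
    for scores in token_scores:
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        for i in ranked[:k_top_centroids]:
            if scores[i] >= centroid_score_threshold:
                selected.add(i)
    return selected


token_scores = [
    [0.9, 0.6, 0.2, 0.1],  # scores of 4 centroids for the first query token
    [0.1, 0.5, 0.8, 0.3],  # scores for the second query token
]
fast = select_centroids(token_scores, k_top_centroids=1, centroid_score_threshold=0.7)
thorough = select_centroids(token_scores, k_top_centroids=2, centroid_score_threshold=0.4)
```

A higher threshold with a smaller k keeps fewer centroids and so touches less of the index; a lower threshold with a larger k keeps more, trading latency for recall.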

property centroid_score_threshold¶: The threshold for centroid scores.

property expected_id¶: Expects a document id in the return result. Prints additional information during execution. Useful for debugging.

property k_top_centroids¶: The number of top centroids to consider per token.

property n_probe¶

property num_second_pass¶: The number of second pass candidates to consider.

property thisown¶: The membership flag

class lintdb.lintdb.SearchResult¶

Bases: object

SearchResult is a simple struct to hold the results of a search.

property id¶: The document’s id.

property metadata¶

property score¶: The final score as determined by the database.

property thisown¶: The membership flag

class lintdb.lintdb.SearchResultVector(*args)¶

Bases: object

append(x)¶

assign(n, x)¶

back()¶

begin()¶

capacity()¶

clear()¶

empty()¶

end()¶

erase(*args)¶

front()¶

get_allocator()¶

insert(*args)¶

iterator()¶

pop()¶

pop_back()¶

push_back(x)¶

rbegin()¶

rend()¶

reserve(n)¶

resize(*args)¶

size()¶

swap(v)¶

property thisown¶: The membership flag

class lintdb.lintdb.SwigPyIterator(*args, **kwargs)¶

Bases: object

advance(n)¶

copy()¶

decr(n=1)¶

distance(x)¶

equal(x)¶

incr(n=1)¶

next()¶

previous()¶

property thisown¶: The membership flag

value()¶

Module contents¶

class lintdb.Collection(index, opts)¶

Bases: object

Collection is a collection of documents. Instead of dealing directly with vectors, this class allows you to add and search for documents by text.

add(tenant, id, text, metadata)¶

Add a text document to the index.

Parameters:¶

tenant : int¶: The tenant id.
id : int¶: The document id.
text : string¶: The text to add.

search(*args)¶

Search the index for similar documents.

Parameters:¶

tenant : int: The tenant id.
text : string: The text to search for.
k : int: The number of results to return.
opts : SearchOptions, optional: Any search options to use.

property thisown¶: The membership flag

train(texts)¶

class lintdb.CollectionOptions¶

Bases: object

property max_length¶

property model_file¶

property thisown¶: The membership flag

property tokenizer_file¶

class lintdb.Configuration¶

Bases: object

Configuration of the Index.

property dim¶: The dimensions expected for incoming vectors.

property lintdb_version¶: The current version of the index. Used internally for feature compatibility.

property nbits¶: The number of bits to use in residual compression.

property niter¶: The number of iterations to use during training.

property nlist¶: The number of centroids to train.

property num_subquantizers¶: The number of subquantizers to use in the product quantizer.

property quantizer_type¶

property thisown¶: The membership flag

class lintdb.IndexIVF(*args)¶

Bases: object

IndexIVF is a multi vector index with an inverted file structure.

This relies on pretrained centroids to accurately retrieve the closest documents.

add(tenant, docs)¶

Add will add a block of embeddings to the index.

Parameters:¶

tenant : int¶: the tenant to assign the document to.
docs : std::vector< lintdb::RawPassage,std::allocator< lintdb::RawPassage > >¶: a vector of RawPassages. This includes embeddings and ids.

add_single(tenant, doc)¶: Add a single document.

property config¶

merge(path)¶

Merge will combine the index with another index.

property read_only¶

remove(tenant, ids)¶

Remove deletes documents from the index by id.

void remove(const std::vector<int64_t>& ids) works if SWIG complains about idx_t.

save()¶

Index should be able to resume from a previous state. Any quantization and compression will be saved within the Index’s path.

Inverted lists are persisted to the database.

search(*args)¶

set_centroids(data)¶

set_centroids overwrites the centroids in the encoder.

This is useful if you want to parallelize index writing and merge indices later.

set_weights(weights, cutoffs, avg_residual)¶: set_weights overwrites the compression weights in the encoder, if using compression.

property thisown¶: The membership flag

train(embeddings)¶

update(tenant, docs)¶: Update is a convenience function for remove and add.

class lintdb.RawPassage(*args)¶

Bases: object

RawPassage is a simple struct to hold the raw passage data.

This represents a document before it’s indexed.

property embedding_block¶: embedding_block contains the document’s embeddings. This is an array and can hold any number of embeddings, but they’ll all be indexed together.

property id¶: id is a unique identifier for the document or passage. It must be an integer. We enable document ids to be strings that we can look up after retrieval.

property metadata¶

property thisown¶: The membership flag

class lintdb.SearchOptions¶

Bases: object

SearchOptions enables custom searching behavior.

Decreasing latency:
- increase centroid_score_threshold and decrease k_top_centroids.
- decrease n_probe in search().

property centroid_score_threshold¶: The threshold for centroid scores.

property expected_id¶: Expects a document id in the return result. Prints additional information during execution. Useful for debugging.

property k_top_centroids¶: The number of top centroids to consider per token.

property n_probe¶

property num_second_pass¶: The number of second pass candidates to consider.

property thisown¶: The membership flag