lintdb package

Submodules

lintdb.lintdb module

class lintdb.lintdb.Collection(index, opts)

Bases: object

Collection is a collection of documents. Instead of dealing directly with vectors, this class allows you to add and search for documents by text.

add(tenant, id, text, metadata)

Add a text document to the index.

Parameters:
tenant : int

The tenant id.

id : int

The document id.

text : string

The text to add.

search(*args)

Search the index for similar documents.

Parameters:
tenant : int

The tenant id.

text : string

The text to search for.

k : int

The number of results to return.

opts : SearchOptions, optional

Any search options to use.
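
As a rough sketch of how add and search fit together, the helper below accepts any object that exposes add(tenant, id, text, metadata) and search(tenant, text, k). InMemoryCollection and index_and_query are hypothetical names that exist only so the snippet runs without lintdb installed. The stand-in ranks by shared words rather than embeddings, returns bare ids rather than result objects, and passing a plain dict as metadata is an assumption.

```python
class InMemoryCollection:
    """Hypothetical stand-in that mimics the documented call shape only."""

    def __init__(self):
        self._docs = {}  # (tenant, id) -> (text, metadata)

    def add(self, tenant, id, text, metadata):
        self._docs[(tenant, id)] = (text, metadata)

    def search(self, tenant, text, k, opts=None):
        query = set(text.lower().split())
        scored = []
        for (doc_tenant, doc_id), (doc_text, _) in self._docs.items():
            if doc_tenant != tenant:
                continue  # assumption: results are scoped to one tenant
            overlap = len(query & set(doc_text.lower().split()))
            scored.append((overlap, doc_id))
        # Highest word overlap first; ties broken by the smaller id.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]


def index_and_query(collection, tenant, docs, query, k):
    """Add every document for one tenant, then run a single search."""
    for doc_id, text in docs.items():
        collection.add(tenant, doc_id, text, {})
    return collection.search(tenant, query, k)


collection = InMemoryCollection()
collection.add(1, 99, "inverted file index", {})  # stored under another tenant
hits = index_and_query(
    collection,
    tenant=0,
    docs={1: "vector search with centroids", 2: "cooking pasta at home"},
    query="centroids for vector search",
    k=1,
)
```

With a real Collection the same helper would be called unchanged, since it relies only on the two documented method names and their argument order.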

property thisown

The membership flag

train(texts)

class lintdb.lintdb.CollectionOptions

Bases: object

property max_length
property model_file
property thisown

The membership flag

property tokenizer_file

class lintdb.lintdb.Configuration

Bases: object

Configuration of the Index.

property dim

The dimensions expected for incoming vectors.

property lintdb_version

The current version of the index. Used internally for feature compatibility.

property nbits

The number of bits to use in residual compression.

property niter

The number of iterations to use during training.

property nlist

The number of centroids to train.

property num_subquantizers

The number of subquantizers to use in the product quantizer.

property quantizer_type

property thisown

The membership flag

class lintdb.lintdb.FloatVector(*args)

Bases: object

append(x)
assign(n, x)
back()
begin()
capacity()
clear()
empty()
end()
erase(*args)
front()
get_allocator()
insert(*args)
iterator()
pop()
pop_back()
push_back(x)
rbegin()
rend()
reserve(n)
resize(*args)
size()
swap(v)
property thisown

The membership flag

class lintdb.lintdb.IdxVector(*args)

Bases: object

append(x)
assign(n, x)
back()
begin()
capacity()
clear()
empty()
end()
erase(*args)
front()
get_allocator()
insert(*args)
iterator()
pop()
pop_back()
push_back(x)
rbegin()
rend()
reserve(n)
resize(*args)
size()
swap(v)
property thisown

The membership flag

class lintdb.lintdb.IndexIVF(*args)

Bases: object

IndexIVF is a multi vector index with an inverted file structure.

This relies on pretrained centroids to accurately retrieve the closest documents.

add(tenant, docs)

Add will add a block of embeddings to the index.

Parameters:
tenant : int

the tenant to assign the document to.

docs : std::vector< lintdb::RawPassage,std::allocator< lintdb::RawPassage > >

a vector of RawPassages. This includes embeddings and ids.

add_single(tenant, doc)

Add a single document.

property config
merge(path)

Merge will combine the index with another index.

Merge verifies the configuration of each index, but this does not prevent you from merging indices that were built with different centroids. Such a merge can break retrieval in subtle ways. The benefit is that indices can be built in several processes and combined afterwards.
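
Because the configuration check does not cover centroids, a caller that builds shards in separate processes may want a guard of its own. The sketch below is one way to do that, assuming the caller still holds the centroid lists it gave to each shard. ConfigSketch and safe_to_merge are hypothetical names and are not part of lintdb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical mirror of a few Configuration properties."""

    nlist: int
    nbits: int
    dim: int


def safe_to_merge(config_a, centroids_a, config_b, centroids_b):
    """Return True only if both the settings and the centroids agree."""
    if config_a != config_b:
        return False  # the kind of mismatch a configuration check catches
    return centroids_a == centroids_b  # the extra check left to the caller


config = ConfigSketch(nlist=2, nbits=2, dim=2)  # illustrative values
shared = [[0.0, 1.0], [1.0, 0.0]]
retrained = [[0.5, 0.5], [1.0, 0.0]]

same_centroids = safe_to_merge(config, shared, config, shared)
different_centroids = safe_to_merge(config, shared, config, retrained)
```

Here the second pair has identical settings and would pass a configuration-only check, yet the guard rejects it because the centroids differ.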

property read_only
remove(tenant, ids)

Remove deletes documents from the index by id.

void remove(const std::vector<int64_t>& ids) works if SWIG complains about idx_t.

save()

Index should be able to resume from a previous state. Any quantization and compression will be saved within the Index’s path.

Inverted lists are persisted to the database.

search(*args)
set_centroids(data)

set_centroids overwrites the centroids in the encoder.

This is useful if you want to parallelize index writing and merge indices later.

set_weights(weights, cutoffs, avg_residual)

set_weights overwrites the compression weights in the encoder, if using compression.

property thisown

The membership flag

train(embeddings)
update(tenant, docs)

Update is a convenience function for remove and add.
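
The remove-then-add behaviour can be pictured with a small stand-in. ToyIndex is hypothetical and keeps documents in a dict. Representing each document as an (id, embeddings) pair and silently ignoring unknown ids are assumptions of this sketch, not statements about IndexIVF.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in keyed by (tenant, document id)."""

    def __init__(self):
        self.docs = {}

    def add(self, tenant, docs):
        # Each doc is an (id, embeddings) pair in this sketch.
        for doc_id, embeddings in docs:
            self.docs[(tenant, doc_id)] = embeddings

    def remove(self, tenant, ids):
        for doc_id in ids:
            self.docs.pop((tenant, doc_id), None)  # unknown ids are skipped

    def update(self, tenant, docs):
        # Remove the old versions first, then add the new ones.
        self.remove(tenant, [doc_id for doc_id, _ in docs])
        self.add(tenant, docs)


index = ToyIndex()
index.add(0, [(7, [[0.1, 0.2]]), (8, [[0.3, 0.4]])])
index.update(0, [(7, [[0.9, 0.8]])])

updated = index.docs[(0, 7)]
untouched = index.docs[(0, 8)]
```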

class lintdb.lintdb.MetadataMap(*args)

Bases: object

asdict()
begin()
clear()
count(x)
empty()
end()
erase(*args)
find(x)
get_allocator()
has_key(key)
items()
iterator()
iteritems()
iterkeys()
itervalues()
key_iterator()
keys()
lower_bound(x)
rbegin()
rend()
size()
swap(v)
property thisown

The membership flag

upper_bound(x)
value_iterator()
values()

class lintdb.lintdb.RawPassage(*args)

Bases: object

RawPassage is a simple struct to hold the raw passage data.

This represents a document before it’s indexed.

property embedding_block

embedding_block contains the document’s embeddings. This is an array that can hold any number of embeddings, and all of them are indexed together.
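
A minimal sketch of the shape this implies, assuming the block is laid out as one row per embedding and that every row has as many values as the index's dim. count_embeddings is a hypothetical helper, and plain lists stand in for whatever array type the bindings accept.

```python
def count_embeddings(block, dim):
    """Return how many embeddings the block holds, checking each row's width."""
    for row_number, row in enumerate(block):
        if len(row) != dim:
            raise ValueError(
                f"row {row_number} has {len(row)} values, expected {dim}"
            )
    return len(block)


dim = 4  # illustrative; would match the index's configured dim
short_doc = [[0.1, 0.2, 0.3, 0.4]]          # one embedding
long_doc = [[0.0] * dim for _ in range(3)]  # three embeddings, same width

short_count = count_embeddings(short_doc, dim)
long_count = count_embeddings(long_doc, dim)
```

Both documents are acceptable in this sketch: the number of rows varies per document, while the row width stays fixed.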

property id

id is a unique identifier for the document or passage. It must be an integer. We enable document ids to be strings that we can look up after retrieval.

property metadata
property thisown

The membership flag

class lintdb.lintdb.RawPassageConstPtrVector(*args)

Bases: object

append(x)
assign(n, x)
back()
begin()
capacity()
clear()
empty()
end()
erase(*args)
front()
get_allocator()
insert(*args)
iterator()
pop()
pop_back()
push_back(x)
rbegin()
rend()
reserve(n)
resize(*args)
size()
swap(v)
property thisown

The membership flag

class lintdb.lintdb.RawPassagePtrVector(*args)

Bases: object

append(x)
assign(n, x)
back()
begin()
capacity()
clear()
empty()
end()
erase(*args)
front()
get_allocator()
insert(*args)
iterator()
pop()
pop_back()
push_back(x)
rbegin()
rend()
reserve(n)
resize(*args)
size()
swap(v)
property thisown

The membership flag

class lintdb.lintdb.RawPassageVector(*args)

Bases: object

append(x)
assign(n, x)
back()
begin()
capacity()
clear()
empty()
end()
erase(*args)
front()
get_allocator()
insert(*args)
iterator()
pop()
pop_back()
push_back(x)
rbegin()
rend()
reserve(n)
resize(*args)
size()
swap(v)
property thisown

The membership flag

class lintdb.lintdb.SearchOptions

Bases: object

SearchOptions enables custom searching behavior.

These options expose ways to trade off recall against latency at different levels of retrieval.

Searching more centroids:
- decrease centroid_score_threshold and increase k_top_centroids.
- increase n_probe in search().

Decreasing latency:
- increase centroid_score_threshold and decrease k_top_centroids.
- decrease n_probe in search().
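
The two directions can be sketched as a pair of helpers that move all three knobs together. SearchOptionsSketch is a hypothetical dataclass that mirrors three of the property names, the factor of two is arbitrary, and the baseline values are illustrative rather than the library's defaults.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchOptionsSketch:
    """Hypothetical mirror of three SearchOptions properties."""

    centroid_score_threshold: float
    k_top_centroids: int
    n_probe: int


def favor_recall(opts):
    """Search more centroids: lower the threshold, raise both counts."""
    return replace(
        opts,
        centroid_score_threshold=opts.centroid_score_threshold / 2,
        k_top_centroids=opts.k_top_centroids * 2,
        n_probe=opts.n_probe * 2,
    )


def favor_latency(opts):
    """Search fewer centroids: raise the threshold, lower both counts."""
    return replace(
        opts,
        centroid_score_threshold=opts.centroid_score_threshold * 2,
        k_top_centroids=max(1, opts.k_top_centroids // 2),
        n_probe=max(1, opts.n_probe // 2),
    )


baseline = SearchOptionsSketch(
    centroid_score_threshold=0.4, k_top_centroids=2, n_probe=32
)
wider = favor_recall(baseline)
faster = favor_latency(baseline)
```

On a real SearchOptions object the same adjustments would be made by assigning to the properties of the same names before passing it to search().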

property centroid_score_threshold

The threshold for centroid scores.

property expected_id

Expects a document id in the returned results. Prints additional information during execution. Useful for debugging.

property k_top_centroids

The number of top centroids to consider per token.

property n_probe
property num_second_pass

The number of second-pass candidates to consider.

property thisown

The membership flag

class lintdb.lintdb.SearchResult

Bases: object

SearchResult is a simple struct to hold the results of a search.

property id

The document’s id.

property metadata
property score

The final score as determined by the database.

property thisown

The membership flag

class lintdb.lintdb.SearchResultVector(*args)

Bases: object

append(x)
assign(n, x)
back()
begin()
capacity()
clear()
empty()
end()
erase(*args)
front()
get_allocator()
insert(*args)
iterator()
pop()
pop_back()
push_back(x)
rbegin()
rend()
reserve(n)
resize(*args)
size()
swap(v)
property thisown

The membership flag

class lintdb.lintdb.SwigPyIterator(*args, **kwargs)

Bases: object

advance(n)
copy()
decr(n=1)
distance(x)
equal(x)
incr(n=1)
next()
previous()
property thisown

The membership flag

value()

Module contents

class lintdb.Collection(index, opts)

Bases: object

Collection is a collection of documents. Instead of dealing directly with vectors, this class allows you to add and search for documents by text.

add(tenant, id, text, metadata)

Add a text document to the index.

Parameters:
tenant : int

The tenant id.

id : int

The document id.

text : string

The text to add.

search(*args)

Search the index for similar documents.

Parameters:
tenant : int

The tenant id.

text : string

The text to search for.

k : int

The number of results to return.

opts : SearchOptions, optional

Any search options to use.

property thisown

The membership flag

train(texts)

class lintdb.CollectionOptions

Bases: object

property max_length
property model_file
property thisown

The membership flag

property tokenizer_file

class lintdb.Configuration

Bases: object

Configuration of the Index.

property dim

The dimensions expected for incoming vectors.

property lintdb_version

The current version of the index. Used internally for feature compatibility.

property nbits

The number of bits to use in residual compression.

property niter

The number of iterations to use during training.

property nlist

The number of centroids to train.

property num_subquantizers

The number of subquantizers to use in the product quantizer.

property quantizer_type

property thisown

The membership flag

class lintdb.IndexIVF(*args)

Bases: object

IndexIVF is a multi vector index with an inverted file structure.

This relies on pretrained centroids to accurately retrieve the closest documents.

add(tenant, docs)

Add will add a block of embeddings to the index.

Parameters:
tenant : int

the tenant to assign the document to.

docs : std::vector< lintdb::RawPassage,std::allocator< lintdb::RawPassage > >

a vector of RawPassages. This includes embeddings and ids.

add_single(tenant, doc)

Add a single document.

property config
merge(path)

Merge will combine the index with another index.

Merge verifies the configuration of each index, but this does not prevent you from merging indices that were built with different centroids. Such a merge can break retrieval in subtle ways. The benefit is that indices can be built in several processes and combined afterwards.

property read_only
remove(tenant, ids)

Remove deletes documents from the index by id.

void remove(const std::vector<int64_t>& ids) works if SWIG complains about idx_t.

save()

Index should be able to resume from a previous state. Any quantization and compression will be saved within the Index’s path.

Inverted lists are persisted to the database.

search(*args)
set_centroids(data)

set_centroids overwrites the centroids in the encoder.

This is useful if you want to parallelize index writing and merge indices later.

set_weights(weights, cutoffs, avg_residual)

set_weights overwrites the compression weights in the encoder, if using compression.

property thisown

The membership flag

train(embeddings)
update(tenant, docs)

Update is a convenience function for remove and add.

class lintdb.RawPassage(*args)

Bases: object

RawPassage is a simple struct to hold the raw passage data.

This represents a document before it’s indexed.

property embedding_block

embedding_block contains the document’s embeddings. This is an array that can hold any number of embeddings, and all of them are indexed together.

property id

id is a unique identifier for the document or passage. It must be an integer. We enable document ids to be strings that we can look up after retrieval.

property metadata
property thisown

The membership flag

class lintdb.SearchOptions

Bases: object

SearchOptions enables custom searching behavior.

These options expose ways to trade off recall against latency at different levels of retrieval.

Searching more centroids:
- decrease centroid_score_threshold and increase k_top_centroids.
- increase n_probe in search().

Decreasing latency:
- increase centroid_score_threshold and decrease k_top_centroids.
- decrease n_probe in search().

property centroid_score_threshold

The threshold for centroid scores.

property expected_id

Expects a document id in the returned results. Prints additional information during execution. Useful for debugging.

property k_top_centroids

The number of top centroids to consider per token.

property n_probe
property num_second_pass

The number of second-pass candidates to consider.

property thisown

The membership flag