Reference¤

core ¤

ColBERTContextData ¤

ColBERTContextData()

ColBERT context data

Default constructor

doc_codes `property` ¤

doc_codes

Document codes

doc_residuals `property` ¤

doc_residuals

Document residuals

Configuration ¤

Configuration()

Configuration for the index

Default constructor

lintdb_version `property` ¤

lintdb_version

LintDB version

DataType ¤

Bases: enum.Enum

COLBERT ¤

COLBERT = DataType.COLBERT

ColBERT data type

DATETIME ¤

DATETIME = DataType.DATETIME

Datetime data type

FLOAT ¤

FLOAT = DataType.FLOAT

Float data type

INTEGER ¤

INTEGER = DataType.INTEGER

Integer data type

QUANTIZED_TENSOR ¤

QUANTIZED_TENSOR = DataType.QUANTIZED_TENSOR

Quantized tensor data type

TENSOR ¤

TENSOR = DataType.TENSOR

Tensor data type

TEXT ¤

TEXT = DataType.TEXT

Text data type

DateTime ¤

DateTime()

DateTime type

Default constructor

from_python ¤

from_python()

from_python(arg: object, /) -> lintdb.core.DateTime

Convert from Python datetime

to_python `method descriptor` ¤

to_python()

to_python(self) -> object

Convert to Python datetime

Document ¤

Document()

Document for storing multiple fields and a unique ID.

Constructor with ID and fields.

:param id: Unique ID of the document. :param fields: List of FieldValue objects.

fields `property` ¤

fields

List of FieldValue objects in the document.

id `property` ¤

id

Unique ID of the document.

Duration ¤

Duration(*args, **kwargs)

Duration type

FieldParameters ¤

FieldParameters()

Field parameters for configuration

Default constructor

analyzer `property` ¤

analyzer

Analyzer type

dimensions `property` ¤

dimensions

Number of dimensions

nbits `property` ¤

nbits

Number of bits

num_centroids `property` ¤

num_centroids

Number of centroids

num_iterations `property` ¤

num_iterations

Number of iterations

num_subquantizers `property` ¤

num_subquantizers

Number of subquantizers

quantization `property` ¤

quantization

Quantization type

FieldType ¤

Bases: enum.Enum

Colbert ¤

Colbert = FieldType.Colbert

ColBERT field type

Context ¤

Context = FieldType.Context

Context field type

Indexed ¤

Indexed = FieldType.Indexed

Indexed field type

Stored ¤

Stored = FieldType.Stored

Stored field type

FieldValue ¤

FieldValue()

FieldValue for storing different types of data.

Overloaded function.

__init__(self) -> None

Default constructor.

__init__(self, name: str, value: int) -> None

Constructor with integer value.

:param name: Field name. :param value: Integer value.

__init__(self, name: str, value: float) -> None

Constructor with float value.

:param name: Field name. :param value: Float value.

__init__(self, name: str, value: str) -> None

Constructor with string value.

:param name: Field name. :param value: String value.

__init__(self, name: str, value: lintdb.core.DateTime) -> None

Constructor with DateTime value.

:param name: Field name. :param value: DateTime value.

__init__(self, name: str, value: collections.abc.Sequence[float]) -> None

Constructor with Tensor value.

:param name: Field name. :param value: Tensor value.

__init__(self, name: str, value: collections.abc.Sequence[float], num_tensors: int) -> None

Constructor with Tensor value and number of tensors.

:param name: Field name. :param value: Tensor value. :param num_tensors: Number of tensors.

__init__(self, name: str, value: collections.abc.Sequence[int], num_tensors: int) -> None

Constructor with QuantizedTensor value and number of tensors.

:param name: Field name. :param value: QuantizedTensor value. :param num_tensors: Number of tensors.

__init__(self, name: str, value: lintdb.core.ColBERTContextData, num_tensors: int) -> None

Constructor with ColBERTContextData value and number of tensors.

:param name: Field name. :param value: ColBERTContextData value. :param num_tensors: Number of tensors.

data_type `property` ¤

data_type

Field data type.

name `property` ¤

name

Field name.

num_tensors `property` ¤

num_tensors

Number of tensors.

value `property` ¤

value

Field value.

ICoarseQuantizer ¤

ICoarseQuantizer(*args, **kwargs)

Abstract ICoarseQuantizer class providing an interface for coarse quantization operations.

add `method descriptor` ¤

add()

add(self, n: int, data: float) -> None

Add new data points to the coarse quantizer.

:param n: Number of data points. :param data: Pointer to data points.

assign `method descriptor` ¤

assign()

assign(self, n: int, x: float, codes: int) -> None

Assign the nearest centroids to the given data points.

:param n: Number of data points. :param x: Pointer to data. :param codes: Pointer to assigned codes.
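As a rough illustration of what assigning nearest centroids means, the sketch below works on plain Python lists instead of the raw pointers the bound method takes. The helper name `assign_nearest` and the use of squared Euclidean distance are assumptions made for illustration; the reference does not state which metric LintDB uses.

```python
def assign_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid.

    Hypothetical helper: squared Euclidean distance is an assumption.
    """
    codes = []
    for p in points:
        # Squared distance from p to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        codes.append(dists.index(min(dists)))
    return codes


centroids = [[0.0, 0.0], [1.0, 1.0]]
points = [[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]]
codes = assign_nearest(points, centroids)
```

Each entry of `codes` plays the role of one assigned code: an index into the centroid table.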

code_size `method descriptor` ¤

code_size()

code_size(self) -> int

Get the size of the code.

:return: Size of the code.

compute_residual `method descriptor` ¤

compute_residual()

compute_residual(self, vec: float, residual: float, centroid_id: int) -> None

Compute the residual vector for a given data point and centroid.

:param vec: Pointer to data point. :param residual: Pointer to residual vector. :param centroid_id: Centroid ID.
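A residual is conventionally the element-wise difference between a data point and its assigned centroid. The sketch below assumes that definition and uses hypothetical helpers (`residual_of`, `add_back`) on Python lists; it is not LintDB's implementation.

```python
def residual_of(vec, centroid):
    """Element-wise difference between a vector and its centroid."""
    return [v - c for v, c in zip(vec, centroid)]


def add_back(residual, centroid):
    """Recover the original vector from a residual and its centroid."""
    return [r + c for r, c in zip(residual, centroid)]


centroids = [[0.0, 0.0], [1.0, 0.0]]
vec = [1.5, -0.5]
centroid_id = 1

residual = residual_of(vec, centroids[centroid_id])
restored = add_back(residual, centroids[centroid_id])
```

Storing the centroid ID plus a compressed residual is what lets an index keep a compact approximation of each vector.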

compute_residual_n `method descriptor` ¤

compute_residual_n()

compute_residual_n(self, n: int, vec: float, residual: float, centroid_ids: int) -> None

Compute the residual vectors for multiple data points and centroids.

:param n: Number of data points. :param vec: Pointer to data points. :param residual: Pointer to residual vectors. :param centroid_ids: Pointer to centroid IDs.

get_xb `method descriptor` ¤

get_xb()

get_xb(self) -> float

Get the centroids.

:return: Pointer to centroids.

is_trained `method descriptor` ¤

is_trained()

is_trained(self) -> bool

Check if the coarse quantizer is trained.

:return: True if trained, False otherwise.

num_centroids `method descriptor` ¤

num_centroids()

num_centroids(self) -> int

Get the number of centroids.

:return: Number of centroids.

reconstruct `method descriptor` ¤

reconstruct()

reconstruct(self, centroid_id: int, embedding: float) -> None

Reconstruct the embedding for a given centroid.

:param centroid_id: Centroid ID. :param embedding: Pointer to reconstructed embedding.

reset `method descriptor` ¤

reset()

reset(self) -> None

Reset the coarse quantizer.

sa_decode `method descriptor` ¤

sa_decode()

sa_decode(self, n: int, codes: int, x: float) -> None

Decode the given codes to data points.

:param n: Number of data points. :param codes: Pointer to codes. :param x: Pointer to decoded data.

save `method descriptor` ¤

save()

save(self, path: str) -> None

Save the coarse quantizer to the specified path.

:param path: Path to save the quantizer.

search `method descriptor` ¤

search()

search(self, num_query_tok: int, data: float, k_top_centroids: int, distances: float, coarse_idx: int) -> None

Search for the nearest centroids to the given data points.

:param num_query_tok: Number of query tokens. :param data: Pointer to data points. :param k_top_centroids: Number of top centroids to search. :param distances: Pointer to distances to nearest centroids. :param coarse_idx: Pointer to coarse indices of nearest centroids.
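To make the shape of the outputs concrete, here is a small sketch that returns, for each query token, the `k` closest centroids together with their distances. The function name `top_centroids`, the list-based inputs, and the squared Euclidean metric are all assumptions for illustration.

```python
import heapq


def top_centroids(query_tokens, centroids, k):
    """For every query token, return the k nearest centroids.

    Returns two parallel lists of lists: distances and centroid indices,
    one row per query token. Squared Euclidean distance is an assumption.
    """
    distances, indices = [], []
    for tok in query_tokens:
        scored = [
            (sum((a - b) ** 2 for a, b in zip(tok, c)), i)
            for i, c in enumerate(centroids)
        ]
        best = heapq.nsmallest(k, scored)  # smallest distance first
        distances.append([d for d, _ in best])
        indices.append([i for _, i in best])
    return distances, indices


centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
query_tokens = [[0.0, 0.0], [1.0, 1.0]]
distances, coarse_idx = top_centroids(query_tokens, centroids, k=2)
```

The two results mirror the `distances` and `coarse_idx` output parameters: one row per query token, `k` entries per row.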

serialize `method descriptor` ¤

serialize()

serialize(self, filename: str) -> None

Serialize the coarse quantizer to a file.

:param filename: File name to serialize to.

train `method descriptor` ¤

train()

train(self, n: int, x: float, k: int, num_iter: int) -> None

Train the coarse quantizer with the given data.

:param n: Number of data points. :param x: Pointer to data. :param k: Number of centroids. :param num_iter: Number of iterations.
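The parameters `k` and `num_iter` suggest an iterative clustering procedure. The sketch below uses plain Lloyd-style k-means as an assumed stand-in; the reference does not name the algorithm, and `train_centroids` and its deterministic initialisation are hypothetical.

```python
def train_centroids(points, k, num_iter):
    """Lloyd-style k-means on lists of floats (illustrative only)."""
    # Deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(num_iter):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if the cluster is empty
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


points = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
centroids = train_centroids(points, k=2, num_iter=5)
```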

IndexIVF ¤

IndexIVF()

IndexIVF is a multi-vector index with an inverted file structure.

Overloaded function.

__init__(self, path: str, read_only: bool = False) -> None

Load an existing index.

:param path: The path to the index. :param read_only: Whether to open the index in read-only mode.

__init__(self, path: str, schema: lintdb.core.Schema, config: lintdb.core.Configuration) -> None

Create a new index with the given path, schema, and configuration.

:param path: The path to initialize the index. :param schema: The schema for the index. :param config: The configuration for the index.

__init__(self, other: lintdb.core.IndexIVF, path: str) -> None

Create a copy of a trained index at the given path. The copy will always be writable.

Throws an exception if the index isn't trained when this method is called.

:param other: The other IndexIVF to copy. :param path: The path to initialize the new index.

config `property` ¤

config

Configuration of the index.

read_only `property` ¤

read_only

Flag indicating whether the index is read-only.

add `method descriptor` ¤

add()

add(self, tenant: int, docs: collections.abc.Sequence[lintdb.core.Document]) -> None

Add a block of embeddings to the index.

:param tenant: The tenant to assign the documents to. :param docs: A vector of documents to add.

add_single `method descriptor` ¤

add_single()

add_single(self, tenant: int, doc: lintdb.core.Document) -> None

Add a single document to the index.

:param tenant: The tenant to assign the document to. :param doc: The document to add.

close `method descriptor` ¤

close()

close(self) -> None

Close the index, releasing any resources.

merge `method descriptor` ¤

merge()

merge(self, path: str) -> None

Merge the index with another index.

This enables easier multiprocess building of indices but can have subtle issues if indices have different centroids.

:param path: The path to the other index.
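The caveat about differing centroids can be made concrete: a stored code is only an index into a centroid table, so the same code refers to a different vector when two tables disagree. The check below is a hypothetical pre-merge guard on plain lists; `centroids_match` is not part of LintDB, and how the centroid tables are obtained from an index is left out.

```python
import math


def centroids_match(a, b, tol=1e-6):
    """Return True when two centroid tables agree row by row within tol."""
    if len(a) != len(b):
        return False
    return all(
        len(ca) == len(cb)
        and all(math.isclose(x, y, abs_tol=tol) for x, y in zip(ca, cb))
        for ca, cb in zip(a, b)
    )


index_a = [[0.0, 0.0], [1.0, 1.0]]
index_b = [[0.0, 0.0], [1.0, 1.0]]
index_c = [[0.0, 0.0], [2.0, 2.0]]

same = centroids_match(index_a, index_b)
different = centroids_match(index_a, index_c)
```

In this toy data, code `1` means `[1.0, 1.0]` in the first table and `[2.0, 2.0]` in the third, which is the kind of mismatch the warning refers to.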

remove `method descriptor` ¤

remove()

remove(self, tenant: int, ids: collections.abc.Sequence[int]) -> None

Remove documents from the index by their IDs.

:param tenant: The tenant the documents belong to. :param ids: The IDs of the documents to remove.

save `method descriptor` ¤

save()

save(self) -> None

Save the current state of the index. Quantization and compression will be saved within the Index's path.

search `method descriptor` ¤

search()

search(self, tenant: int, query: lintdb.core.Query, k: int, opts: dict = {}) -> list[lintdb.core.SearchResult]

Find the nearest neighbors for a vector block.

:param tenant: The tenant the document belongs to. :param query: The query to search with. :param k: The number of top results to return. :param opts: Search options to use during searching.

set_coarse_quantizer `method descriptor` ¤

set_coarse_quantizer()

set_coarse_quantizer(self, field: str, quantizer: lintdb.core.ICoarseQuantizer) -> None

Set the coarse quantizer for a field.

:param field: The field to set the coarse quantizer for. :param quantizer: The coarse quantizer to set.

set_quantizer `method descriptor` ¤

set_quantizer()

set_quantizer(self, field: str, quantizer: lintdb.core.Quantizer) -> None

Set the quantizer for a field.

:param field: The field to set the quantizer for. :param quantizer: The quantizer to set.

train `method descriptor` ¤

train()

train(self, docs: collections.abc.Sequence[lintdb.core.Document]) -> None

Train the index with the given documents to learn quantization and compression parameters.

:param docs: The documents to use for training.

update `method descriptor` ¤

update()

update(self, tenant: int, docs: collections.abc.Sequence[lintdb.core.Document]) -> None

Update documents in the index. This is a convenience function for remove and add.

:param tenant: The tenant the documents belong to. :param docs: The documents to update.
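The remove-then-add behaviour described above can be sketched with a toy in-memory stand-in. `ToyIndex` is hypothetical and stores `(doc_id, fields)` tuples in a dictionary; it only mirrors the method names and the tenant argument, not LintDB's storage or its `Document` type.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in, not LintDB's IndexIVF."""

    def __init__(self):
        self._docs = {}  # (tenant, doc_id) -> fields

    def add(self, tenant, docs):
        for doc_id, fields in docs:
            self._docs[(tenant, doc_id)] = fields

    def remove(self, tenant, ids):
        for doc_id in ids:
            self._docs.pop((tenant, doc_id), None)

    def update(self, tenant, docs):
        # Convenience wrapper: drop the old versions, then add the new ones.
        self.remove(tenant, [doc_id for doc_id, _ in docs])
        self.add(tenant, docs)

    def get(self, tenant, doc_id):
        return self._docs.get((tenant, doc_id))


index = ToyIndex()
index.add(0, [(1, {"title": "old"}), (2, {"title": "other"})])
index.update(0, [(1, {"title": "new"})])
```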

Quantizer ¤

Quantizer(*args, **kwargs)

Abstract Quantizer class providing an interface for quantization operations.

code_size `method descriptor` ¤

code_size()

code_size(self) -> int

Get the size of the code.

:return: Size of the code.

get_nbits `method descriptor` ¤

get_nbits()

get_nbits(self) -> int

Get the number of bits.

:return: Number of bits.

get_type `method descriptor` ¤

get_type()

get_type(self) -> lintdb.core.QuantizerType

Get the type of quantizer.

:return: Quantizer type.

sa_decode `method descriptor` ¤

sa_decode()

sa_decode(self, n: int, codes: int, x: float) -> None

Decode the given data using the quantizer.

:param n: Number of data points. :param codes: Pointer to encoded data. :param x: Pointer to decoded data.

sa_encode `method descriptor` ¤

sa_encode()

sa_encode(self, n: int, x: float, codes: int) -> None

Encode the given data using the quantizer.

:param n: Number of data points. :param x: Pointer to data. :param codes: Pointer to encoded data.
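The reference does not describe the encoding scheme itself. As one plausible illustration, the sketch below quantizes each value into a bucket defined by sorted cutoffs and decodes it to a representative weight per bucket, a scheme often used for compressing residuals. The names `encode` and `decode` and the cutoff and weight values are assumptions.

```python
from bisect import bisect_right


def encode(values, cutoffs):
    """Map each value to the index of the bucket it falls into."""
    return [bisect_right(cutoffs, v) for v in values]


def decode(codes, weights):
    """Replace each bucket index with that bucket's representative value."""
    return [weights[c] for c in codes]


# Three cutoffs define four buckets, so each code fits in 2 bits.
cutoffs = [-0.5, 0.0, 0.5]
weights = [-0.75, -0.25, 0.25, 0.75]

residual = [-0.9, -0.1, 0.2, 0.6]
codes = encode(residual, cutoffs)
approx = decode(codes, weights)
```

Decoding is lossy: `approx` is close to `residual` but not equal, and more buckets (more bits) would reduce the error.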

save `method descriptor` ¤

save()

save(self, path: str) -> None

Save the quantizer to the specified path.

:param path: Path to save the quantizer.

train `method descriptor` ¤

train()

train(self, n: int, x: float, dim: int) -> None

Train the quantizer with the given data.

:param n: Number of data points. :param x: Pointer to data. :param dim: Dimension size.

QuantizerType ¤

Bases: enum.Enum

Enumeration of quantizer types.

BINARIZER ¤

BINARIZER = QuantizerType.BINARIZER

Binarizer quantizer.

NONE ¤

NONE = QuantizerType.NONE

No quantizer.

PRODUCT_ENCODER ¤

PRODUCT_ENCODER = QuantizerType.PRODUCT_ENCODER

Product encoder quantizer.

UNKNOWN ¤

UNKNOWN = QuantizerType.UNKNOWN

Unknown quantizer type.

Query ¤

Query()

Query object containing a root query node.

Constructor with a unique pointer to the root query node.

:param root: Unique pointer to the root query node.

QueryNodeType ¤

Bases: enum.Enum

Types of query nodes.

AND ¤

AND = QueryNodeType.AND

AND query node.

TERM ¤

TERM = QueryNodeType.TERM

Term query node.

VECTOR ¤

VECTOR = QueryNodeType.VECTOR

Vector query node.

Schema ¤

Schema()

Schema configuration

Overloaded function.

__init__(self) -> None

Default constructor

__init__(self, arg: collections.abc.Sequence[lintdb.core.__Field], /) -> None

Constructor with fields

fields `property` ¤

fields

Fields in the schema

add_field `method descriptor` ¤

add_field()

add_field(self, arg: lintdb.core.__Field, /) -> None

Add a field to the schema

fromJson ¤

fromJson()

fromJson(arg: Json::Value, /) -> lintdb.core.Schema

Create schema from JSON

toJson `method descriptor` ¤

toJson()

toJson(self) -> Json::Value

Convert schema to JSON

SearchOptions ¤

SearchOptions()

SearchOptions enables custom searching behavior.

These options expose ways to trade off recall and latency at different levels of retrieval.

To search more centroids:

- Decrease centroid_score_threshold and increase k_top_centroids.
- Increase n_probe in search().

To decrease latency:

- Increase centroid_score_threshold and decrease k_top_centroids.
- Decrease n_probe in search().
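The interaction between the threshold and the per-token centroid count can be seen in a toy model. The function below is a hypothetical sketch, not LintDB's selection logic: it assumes higher scores are better, keeps the top `k_top_centroids` centroids per query token, and drops any whose score falls below `centroid_score_threshold`.

```python
def candidate_centroids(token_scores, k_top_centroids, centroid_score_threshold):
    """Collect the centroid indices that survive both filters.

    token_scores holds one list of centroid scores per query token.
    """
    selected = set()
    for scores in token_scores:
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:k_top_centroids]:
            if scores[i] >= centroid_score_threshold:
                selected.add(i)
    return selected


token_scores = [[0.9, 0.6, 0.2, 0.1], [0.3, 0.8, 0.5, 0.4]]

# Strict settings: few centroids, lower latency.
narrow = candidate_centroids(token_scores, k_top_centroids=1, centroid_score_threshold=0.7)
# Loose settings: more centroids, better recall.
wide = candidate_centroids(token_scores, k_top_centroids=3, centroid_score_threshold=0.4)
```

In this toy data the strict settings keep two centroids while the loose settings keep all four, which is the recall versus latency trade-off in miniature.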

Default constructor

centroid_score_threshold `property` ¤

centroid_score_threshold

The threshold for centroid scores. Lower values mean more centroids are considered.

colbert_field `property` ¤

colbert_field

The field to use for ColBERT (Contextualized Late Interaction over BERT).
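Late interaction scores a document by matching every query token embedding against its best document token embedding and summing those maxima. A minimal sketch of that scoring rule on plain lists follows; `maxsim` is a hypothetical helper, and LintDB's actual scoring path over compressed vectors is not shown.

```python
def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any doc token."""
    total = 0.0
    for q in query_tokens:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
    return total


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 0.5]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Here `doc_a` outranks `doc_b` because it contains a close match for both query tokens.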

expected_id `property` ¤

expected_id

The document ID expected in the returned results. Prints additional information during execution, which is useful for debugging.

k_top_centroids `property` ¤

k_top_centroids

The number of top centroids to consider per token. Higher values mean more centroids are considered.

n_probe `property` ¤

n_probe

The number of centroids to search overall. Higher values mean more centroids are searched.

nearest_tokens_to_fetch `property` ¤

nearest_tokens_to_fetch

The number of nearest tokens to fetch in XTR. Higher values mean more tokens are fetched.

num_second_pass `property` ¤

num_second_pass

The number of second pass candidates to consider. Higher values mean more candidates are considered.

SearchResult ¤

SearchResult()

Search result

Default constructor

id `property` ¤

id

Document ID

metadata `property` ¤

metadata

Metadata for the document

score `property` ¤

score

Final score

SupportedTypes ¤

SupportedTypes(*args, **kwargs)

Supported data types

Version ¤

Version()

Version class representing LintDB's version with major, minor, revision, and build numbers.

Overloaded function.

__init__(self) -> None

Default constructor.

__init__(self, versionStr: str) -> None

Constructor with version string.

:param versionStr: Version string in the format 'major.minor.revision'.
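For reference, a string in the 'major.minor.revision' format splits into three integers as sketched below. `parse_version` is a hypothetical helper written for illustration and the version string is made up; how the real constructor handles malformed input is not documented here.

```python
def parse_version(version_str):
    """Split 'major.minor.revision' into a tuple of three integers."""
    major, minor, revision = (int(part) for part in version_str.split("."))
    return major, minor, revision


major, minor, revision = parse_version("1.2.3")
```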

build `property` ¤

build

Build number.

major `property` ¤

major

Major version number.

metadata_enabled `property` ¤

metadata_enabled

Flag indicating if metadata is enabled.

minor `property` ¤

minor

Minor version number.

revision `property` ¤

revision

Revision number.

AndQueryNode ¤

AndQueryNode()

AndQueryNode(its: collections.abc.Sequence[lintdb.core.__QueryNode]) -> lintdb.core.__QueryNode

Binarizer ¤

Binarizer()

Binarizer(arg0: collections.abc.Sequence[float], arg1: collections.abc.Sequence[float], arg2: float, arg3: int, arg4: int, /) -> lintdb.core._Binarizer

Create a Binarizer object

ColbertField ¤

ColbertField()

ColbertField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__ColbertField

Create a ColbertField object

ContextField ¤

ContextField()

ContextField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__ContextField

Create a ContextField object

DateFieldValue ¤

DateFieldValue()

DateFieldValue(arg0: str, arg1: object, /) -> lintdb.core.FieldValue

Create FieldValue from DateTime

FaissCoarseQuantizer ¤

FaissCoarseQuantizer()

FaissCoarseQuantizer(centroids: ndarray[dtype=float32, device='cpu']) -> lintdb.core._FaissCoarseQuantizer

Field ¤

Field()

Field(arg0: str, arg1: lintdb.core.DataType, arg2: collections.abc.Sequence[lintdb.core.FieldType], arg3: dict, /) -> lintdb.core.__Field

Create a Field object

FloatFieldValue ¤

FloatFieldValue()

FloatFieldValue(arg0: str, arg1: float, /) -> lintdb.core.FieldValue

Create FieldValue from float

IndexedField ¤

IndexedField()

IndexedField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__IndexedField

Create an IndexedField object

IntFieldValue ¤

IntFieldValue()

IntFieldValue(arg0: str, arg1: int, /) -> lintdb.core.FieldValue

Create FieldValue from integer

QuantizedTensorFieldValue ¤

QuantizedTensorFieldValue()

QuantizedTensorFieldValue(arg0: str, arg1: ndarray[dtype=uint8, shape=(*, *), device='cpu'], /) -> lintdb.core.FieldValue

Create FieldValue from QuantizedTensor

StoredField ¤

StoredField()

StoredField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__StoredField

Create a StoredField object

TensorFieldValue ¤

TensorFieldValue()

TensorFieldValue(arg0: str, arg1: ndarray[dtype=float32, shape=(*, *), device='cpu'], /) -> lintdb.core.FieldValue

Create FieldValue from Tensor

TermQueryNode ¤

TermQueryNode()

TermQueryNode(value: lintdb.core.FieldValue) -> lintdb.core.__QueryNode

TextFieldValue ¤

TextFieldValue()

TextFieldValue(arg0: str, arg1: str, /) -> lintdb.core.FieldValue

Create FieldValue from string

VectorQueryNode ¤

VectorQueryNode()

VectorQueryNode(value: lintdb.core.FieldValue) -> lintdb.core.__QueryNode
