Reference
core
ColBERTContextData
ColBERTContextData()
DataType
Bases: enum.Enum
DateTime
DateTime()
Document
Document()
FieldType
FieldValue
FieldValue()
FieldValue for storing different types of data.
Overloaded function.
__init__(self) -> None
Default constructor.
__init__(self, name: str, value: int) -> None
Constructor with integer value.
:param name: Field name. :param value: Integer value.
__init__(self, name: str, value: float) -> None
Constructor with float value.
:param name: Field name. :param value: Float value.
__init__(self, name: str, value: str) -> None
Constructor with string value.
:param name: Field name. :param value: String value.
__init__(self, name: str, value: lintdb.core.DateTime) -> None
Constructor with DateTime value.
:param name: Field name. :param value: DateTime value.
__init__(self, name: str, value: collections.abc.Sequence[float]) -> None
Constructor with Tensor value.
:param name: Field name. :param value: Tensor value.
__init__(self, name: str, value: collections.abc.Sequence[float], num_tensors: int) -> None
Constructor with Tensor value and number of tensors.
:param name: Field name. :param value: Tensor value. :param num_tensors: Number of tensors.
__init__(self, name: str, value: collections.abc.Sequence[int], num_tensors: int) -> None
Constructor with QuantizedTensor value and number of tensors.
:param name: Field name. :param value: QuantizedTensor value. :param num_tensors: Number of tensors.
__init__(self, name: str, value: lintdb.core.ColBERTContextData, num_tensors: int) -> None
Constructor with ColBERTContextData value and number of tensors.
:param name: Field name. :param value: ColBERTContextData value. :param num_tensors: Number of tensors.
ICoarseQuantizer
ICoarseQuantizer(*args, **kwargs)
Abstract ICoarseQuantizer class providing an interface for coarse quantization operations.
add
method descriptor
add()
add(self, n: int, data: float) -> None
Add new data points to the coarse quantizer.
:param n: Number of data points. :param data: Pointer to data points.
assign
method descriptor
assign()
assign(self, n: int, x: float, codes: int) -> None
Assign the nearest centroids to the given data points.
:param n: Number of data points. :param x: Pointer to data. :param codes: Pointer to assigned codes.
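To illustrate what nearest-centroid assignment means, here is a minimal pure-Python sketch. It is not LintDB's implementation: the function name `assign_nearest`, the list-based inputs, and the use of squared L2 distance are all assumptions made for the example.

```python
def assign_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid.

    Assumption: closeness is measured by squared L2 distance.
    """
    codes = []
    for p in points:
        best_id, best_dist = -1, float("inf")
        for cid, c in enumerate(centroids):
            dist = sum((a - b) ** 2 for a, b in zip(p, c))
            if dist < best_dist:
                best_id, best_dist = cid, dist
        codes.append(best_id)
    return codes


centroids = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]]
codes = assign_nearest(points, centroids)  # one centroid ID per point
```

Here `codes` plays the role of the `codes` output buffer in the signature above.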
code_size
method descriptor
code_size()
code_size(self) -> int
Get the size of the code.
:return: Size of the code.
compute_residual
method descriptor
compute_residual()
compute_residual(self, vec: float, residual: float, centroid_id: int) -> None
Compute the residual vector for a given data point and centroid.
:param vec: Pointer to data point. :param residual: Pointer to residual vector. :param centroid_id: Centroid ID.
compute_residual_n
method descriptor
compute_residual_n()
compute_residual_n(self, n: int, vec: float, residual: float, centroid_ids: int) -> None
Compute the residual vectors for multiple data points and centroids.
:param n: Number of data points.
:param vec: Pointer to data points.
:param residual: Pointer to residual vectors.
:param centroid_ids: Pointer to centroid IDs.
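A residual is conventionally the element-wise difference between a vector and its assigned centroid. The sketch below shows that idea in plain Python. The helper names and list-based arguments are hypothetical, and the real methods write into caller-provided buffers instead of returning values.

```python
def residual_for(vec, centroid):
    """Element-wise difference between a vector and its centroid."""
    return [v - c for v, c in zip(vec, centroid)]


def residuals_for(vecs, centroids, centroid_ids):
    """Batch version: one residual per (vector, centroid ID) pair."""
    return [
        residual_for(vec, centroids[cid])
        for vec, cid in zip(vecs, centroid_ids)
    ]


centroids = [[0.0, 0.0], [10.0, 10.0]]
vecs = [[1.0, -1.0], [9.0, 11.0]]
residuals = residuals_for(vecs, centroids, [0, 1])
```

Adding a residual back to its centroid recovers the original vector, which is why storing a compressed residual next to a centroid ID is enough to approximate the embedding.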
get_xb
method descriptor
get_xb()
get_xb(self) -> float
Get the centroids.
:return: Pointer to centroids.
is_trained
method descriptor
is_trained()
is_trained(self) -> bool
Check if the coarse quantizer is trained.
:return: True if trained, False otherwise.
num_centroids
method descriptor
num_centroids()
num_centroids(self) -> int
Get the number of centroids.
:return: Number of centroids.
reconstruct
method descriptor
reconstruct()
reconstruct(self, centroid_id: int, embedding: float) -> None
Reconstruct the embedding for a given centroid.
:param centroid_id: Centroid ID. :param embedding: Pointer to reconstructed embedding.
sa_decode
method descriptor
sa_decode()
sa_decode(self, n: int, codes: int, x: float) -> None
Decode the given codes to data points.
:param n: Number of data points. :param codes: Pointer to codes. :param x: Pointer to decoded data.
save
method descriptor
save()
save(self, path: str) -> None
Save the coarse quantizer to the specified path.
:param path: Path to save the quantizer.
search
method descriptor
search()
search(self, num_query_tok: int, data: float, k_top_centroids: int, distances: float, coarse_idx: int) -> None
Search for the nearest centroids to the given data points.
:param num_query_tok: Number of query tokens.
:param data: Pointer to data points.
:param k_top_centroids: Number of top centroids to search.
:param distances: Pointer to distances to nearest centroids.
:param coarse_idx: Pointer to coarse indices of nearest centroids.
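The per-token centroid search can be pictured with the following sketch, which ranks centroids for each query token and keeps the best `k_top_centroids`. It is an illustration only: the function name `top_centroids`, the inner-product score, and the returned lists are assumptions, and the actual method fills the `distances` and `coarse_idx` buffers.

```python
import heapq


def top_centroids(query_tokens, centroids, k_top_centroids):
    """For each query token, return the scores and IDs of its best centroids.

    Assumption: a higher inner product means a closer centroid.
    """
    distances, coarse_idx = [], []
    for q in query_tokens:
        scored = [
            (sum(a * b for a, b in zip(q, c)), cid)
            for cid, c in enumerate(centroids)
        ]
        best = heapq.nlargest(k_top_centroids, scored)
        distances.append([score for score, _ in best])
        coarse_idx.append([cid for _, cid in best])
    return distances, coarse_idx


centroids = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_tokens = [[1.0, 0.1], [0.0, 1.0]]
distances, coarse_idx = top_centroids(query_tokens, centroids, k_top_centroids=2)
```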
serialize
method descriptor
serialize()
serialize(self, filename: str) -> None
Serialize the coarse quantizer to a file.
:param filename: File name to serialize to.
train
method descriptor
train()
train(self, n: int, x: float, k: int, num_iter: int) -> None
Train the coarse quantizer with the given data.
:param n: Number of data points.
:param x: Pointer to data.
:param k: Number of centroids.
:param num_iter: Number of iterations.
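The parameters `k` and `num_iter` suggest a k-means style procedure. The sketch below is a plain Lloyd's-algorithm loop written under that assumption; it is not taken from LintDB, and seeding the centroids with the first `k` points is a simplification chosen to keep the example deterministic.

```python
def train_kmeans(points, k, num_iter):
    """Learn k centroids with a fixed number of Lloyd iterations."""
    # Assumption: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    for _ in range(num_iter):
        # Assignment step: group points by nearest centroid (squared L2).
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda cid: sum((a - b) ** 2 for a, b in zip(p, centroids[cid])),
            )
            buckets[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for cid, members in enumerate(buckets):
            if members:
                centroids[cid] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


points = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
centroids = train_kmeans(points, k=2, num_iter=5)
```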
IndexIVF
IndexIVF()
IndexIVF is a multi-vector index with an inverted file structure.
Overloaded function.
__init__(self, path: str, read_only: bool = False) -> None
Load an existing index.
:param path: The path to the index. :param read_only: Whether to open the index in read-only mode.
__init__(self, path: str, schema: lintdb.core.Schema, config: lintdb.core.Configuration) -> None
Create a new index with the given path, schema, and configuration.
:param path: The path to initialize the index. :param schema: The schema for the index. :param config: The configuration for the index.
__init__(self, other: lintdb.core.IndexIVF, path: str) -> None
Create a copy of a trained index at the given path. The copy will always be writable.
Throws an exception if the index isn't trained when this method is called.
:param other: The other IndexIVF to copy. :param path: The path to initialize the new index.
add
method descriptor
add()
add(self, tenant: int, docs: collections.abc.Sequence[lintdb.core.Document]) -> None
Add a batch of documents to the index.
:param tenant: The tenant to assign the documents to. :param docs: A vector of documents to add.
add_single
method descriptor
add_single()
add_single(self, tenant: int, doc: lintdb.core.Document) -> None
Add a single document to the index.
:param tenant: The tenant to assign the document to. :param doc: The document to add.
merge
method descriptor
merge()
merge(self, path: str) -> None
Merge the index with another index.
This enables easier multiprocess building of indices but can have subtle issues if indices have different centroids.
:param path: The path to the other index.
remove
method descriptor
remove()
remove(self, tenant: int, ids: collections.abc.Sequence[int]) -> None
Remove documents from the index by their IDs.
:param tenant: The tenant the documents belong to. :param ids: The IDs of the documents to remove.
save
method descriptor
save()
save(self) -> None
Save the current state of the index. Quantization and compression will be saved within the Index's path.
search
method descriptor
search()
search(self, tenant: int, query: lintdb.core.Query, k: int, opts: dict = {}) -> list[lintdb.core.SearchResult]
Find the nearest neighbors for a vector block.
:param tenant: The tenant the document belongs to.
:param query: The query to search with.
:param k: The number of top results to return.
:param opts: Search options to use during searching.
set_coarse_quantizer
method descriptor
set_coarse_quantizer()
set_coarse_quantizer(self, field: str, quantizer: lintdb.core.ICoarseQuantizer) -> None
Set the coarse quantizer for a field.
:param field: The field to set the coarse quantizer for. :param quantizer: The coarse quantizer to set.
set_quantizer
method descriptor
set_quantizer()
set_quantizer(self, field: str, quantizer: lintdb.core.Quantizer) -> None
Set the quantizer for a field.
:param field: The field to set the quantizer for. :param quantizer: The quantizer to set.
train
method descriptor
train()
train(self, docs: collections.abc.Sequence[lintdb.core.Document]) -> None
Train the index with the given documents to learn quantization and compression parameters.
:param docs: The documents to use for training.
update
method descriptor
update()
update(self, tenant: int, docs: collections.abc.Sequence[lintdb.core.Document]) -> None
Update documents in the index. This is a convenience function for remove and add.
:param tenant: The tenant the documents belong to. :param docs: The documents to update.
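The "remove and add" behaviour of update can be sketched with a toy in-memory index. `ToyIndex`, its `(doc_id, payload)` tuples, and the `get` helper are hypothetical stand-ins for illustration and do not reflect how IndexIVF stores documents.

```python
class ToyIndex:
    """In-memory stand-in for an index keyed by (tenant, document ID)."""

    def __init__(self):
        self._docs = {}

    def add(self, tenant, docs):
        for doc_id, payload in docs:
            self._docs[(tenant, doc_id)] = payload

    def remove(self, tenant, ids):
        for doc_id in ids:
            self._docs.pop((tenant, doc_id), None)

    def update(self, tenant, docs):
        # Update is remove followed by add, scoped to one tenant.
        self.remove(tenant, [doc_id for doc_id, _ in docs])
        self.add(tenant, docs)

    def get(self, tenant, doc_id):
        return self._docs.get((tenant, doc_id))


index = ToyIndex()
index.add(tenant=0, docs=[(1, "first draft"), (2, "untouched")])
index.add(tenant=1, docs=[(1, "other tenant")])
index.update(tenant=0, docs=[(1, "second draft")])
```

Only the updated document in tenant 0 changes; documents with the same ID in other tenants are left alone.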
Quantizer
Quantizer(*args, **kwargs)
Abstract Quantizer class providing an interface for quantization operations.
code_size
method descriptor
code_size()
code_size(self) -> int
Get the size of the code.
:return: Size of the code.
get_nbits
method descriptor
get_nbits()
get_nbits(self) -> int
Get the number of bits.
:return: Number of bits.
get_type
method descriptor
get_type()
get_type(self) -> lintdb.core.QuantizerType
Get the type of quantizer.
:return: Quantizer type.
sa_decode
method descriptor
sa_decode()
sa_decode(self, n: int, codes: int, x: float) -> None
Decode the given data using the quantizer.
:param n: Number of data points. :param codes: Pointer to encoded data. :param x: Pointer to decoded data.
sa_encode
method descriptor
sa_encode()
sa_encode(self, n: int, x: float, codes: int) -> None
Encode the given data using the quantizer.
:param n: Number of data points. :param x: Pointer to data. :param codes: Pointer to encoded data.
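As a rough picture of an encode/decode pair, the sketch below maps each value to a small bucket index and back to a representative value. This is a generic scalar bucket quantizer, not a description of any particular LintDB quantizer; the names `encode_buckets` and `decode_buckets`, the cutoffs, and the weights are all made up for the example.

```python
import bisect


def encode_buckets(values, cutoffs):
    """Map each value to the index of the bucket it falls into."""
    return [bisect.bisect_left(cutoffs, v) for v in values]


def decode_buckets(codes, weights):
    """Map each bucket index back to that bucket's representative value."""
    return [weights[c] for c in codes]


# Three cutoffs split the number line into four buckets, i.e. 2 bits per value.
cutoffs = [-0.5, 0.0, 0.5]
weights = [-0.75, -0.25, 0.25, 0.75]

residual = [-0.9, -0.1, 0.2, 0.8]
codes = encode_buckets(residual, cutoffs)
decoded = decode_buckets(codes, weights)
```

Decoding is lossy: each value comes back as its bucket's representative, so more bits per value give a closer reconstruction at the cost of a larger code.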
save
method descriptor
save()
save(self, path: str) -> None
Save the quantizer to the specified path.
:param path: Path to save the quantizer.
train
method descriptor
train()
train(self, n: int, x: float, dim: int) -> None
Train the quantizer with the given data.
:param n: Number of data points. :param x: Pointer to data. :param dim: Dimension size.
QuantizerType
Bases: enum.Enum
Enumeration of quantizer types.
Query
Query()
Query object containing a root query node.
Constructor with a unique pointer to the root query node.
:param root: Unique pointer to the root query node.
QueryNodeType
Schema
Schema()
Schema configuration
Overloaded function.
__init__(self) -> None
Default constructor
__init__(self, arg: collections.abc.Sequence[lintdb.core.__Field], /) -> None
Constructor with fields
add_field
method descriptor
add_field()
add_field(self, arg: lintdb.core.__Field, /) -> None
Add a field to the schema
SearchOptions
SearchOptions()
SearchOptions enables custom searching behavior.
These options expose ways to trade off recall and latency at different levels of retrieval.
To search more centroids:
- Decrease centroid_score_threshold and increase k_top_centroids.
- Increase n_probe in search().
To decrease latency:
- Increase centroid_score_threshold and decrease k_top_centroids.
- Decrease n_probe in search().
Default constructor
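The two tuning directions described above can be written down as plain dictionaries. This is a sketch under assumptions: the keys mirror the SearchOptions property names, but whether the `opts` dict passed to search() accepts exactly these keys is not confirmed here, and the baseline numbers are arbitrary illustrations, not LintDB defaults.

```python
def favor_recall(opts, factor=2):
    """Return a copy of opts that considers more centroids."""
    widened = dict(opts)
    widened["centroid_score_threshold"] = opts["centroid_score_threshold"] / factor
    widened["k_top_centroids"] = opts["k_top_centroids"] * factor
    widened["n_probe"] = opts["n_probe"] * factor
    return widened


def favor_latency(opts, factor=2):
    """Return a copy of opts that considers fewer centroids."""
    narrowed = dict(opts)
    narrowed["centroid_score_threshold"] = opts["centroid_score_threshold"] * factor
    narrowed["k_top_centroids"] = max(1, opts["k_top_centroids"] // factor)
    narrowed["n_probe"] = max(1, opts["n_probe"] // factor)
    return narrowed


# Arbitrary illustrative values, not library defaults.
baseline = {"centroid_score_threshold": 0.4, "k_top_centroids": 2, "n_probe": 32}
recall_opts = favor_recall(baseline)
latency_opts = favor_latency(baseline)
```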
centroid_score_threshold
property
centroid_score_threshold
The threshold for centroid scores. Lower values mean more centroids are considered.
colbert_field
property
colbert_field
The field to use for ColBERT (Contextualized Late Interaction over BERT).
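For readers unfamiliar with late interaction, the sketch below shows the standard ColBERT MaxSim score: each query token takes its best-matching document token, and those maxima are summed. It is a conceptual illustration in plain Python with made-up two-dimensional token vectors, and says nothing about how LintDB computes or approximates this score internally.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

A document that matches every query token scores higher than one that matches a single token repeatedly.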
expected_id
property
expected_id
A document ID expected to appear in the returned results. When set, additional information is printed during execution, which is useful for debugging.
k_top_centroids
property
k_top_centroids
The number of top centroids to consider per token. Higher values mean more centroids are considered.
n_probe
property
n_probe
The number of centroids to search overall. Higher values mean more centroids are searched.
nearest_tokens_to_fetch
property
nearest_tokens_to_fetch
The number of nearest tokens to fetch in XTR. Higher values mean more tokens are fetched.
num_second_pass
property
num_second_pass
The number of second pass candidates to consider. Higher values mean more candidates are considered.
SearchResult
SearchResult()
Version
Version()
Version class representing LintDB's version with major, minor, revision, and build numbers.
Overloaded function.
__init__(self) -> None
Default constructor.
__init__(self, versionStr: str) -> None
Constructor with version string.
:param versionStr: Version string in the format 'major.minor.revision'.
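The 'major.minor.revision' format can be illustrated with a small parser. `ParsedVersion` and `parse_version` are hypothetical helpers written for this example; they are not part of lintdb.core and do not model the build number.

```python
from typing import NamedTuple


class ParsedVersion(NamedTuple):
    major: int
    minor: int
    revision: int


def parse_version(version_str):
    """Split a 'major.minor.revision' string into integer components."""
    parts = version_str.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 'major.minor.revision', got {version_str!r}")
    return ParsedVersion(*(int(part) for part in parts))


older = parse_version("0.5.1")
newer = parse_version("0.10.0")
```

Parsing into integers matters for ordering: compared as tuples, 0.10.0 sorts after 0.5.1, whereas a plain string comparison would put it first.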
AndQueryNode
AndQueryNode()
AndQueryNode(its: collections.abc.Sequence[lintdb.core.__QueryNode]) -> lintdb.core.__QueryNode
Binarizer
Binarizer()
Binarizer(arg0: collections.abc.Sequence[float], arg1: collections.abc.Sequence[float], arg2: float, arg3: int, arg4: int, /) -> lintdb.core._Binarizer
Create a Binarizer object
ColbertField
ColbertField()
ColbertField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__ColbertField
Create a ColbertField object
ContextField
ContextField()
ContextField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__ContextField
Create a ContextField object
DateFieldValue
DateFieldValue()
DateFieldValue(arg0: str, arg1: object, /) -> lintdb.core.FieldValue
Create FieldValue from DateTime
FaissCoarseQuantizer
FaissCoarseQuantizer()
FaissCoarseQuantizer(centroids: ndarray[dtype=float32, device='cpu']) -> lintdb.core._FaissCoarseQuantizer
Field
Field()
Field(arg0: str, arg1: lintdb.core.DataType, arg2: collections.abc.Sequence[lintdb.core.FieldType], arg3: dict, /) -> lintdb.core.__Field
Create a Field object
FloatFieldValue
FloatFieldValue()
FloatFieldValue(arg0: str, arg1: float, /) -> lintdb.core.FieldValue
Create FieldValue from float
IndexedField
IndexedField()
IndexedField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__IndexedField
Create an IndexedField object
IntFieldValue
IntFieldValue()
IntFieldValue(arg0: str, arg1: int, /) -> lintdb.core.FieldValue
Create FieldValue from integer
QuantizedTensorFieldValue
QuantizedTensorFieldValue()
QuantizedTensorFieldValue(arg0: str, arg1: ndarray[dtype=uint8, shape=(*, *), device='cpu'], /) -> lintdb.core.FieldValue
Create FieldValue from QuantizedTensor
StoredField
StoredField()
StoredField(arg0: str, arg1: lintdb.core.DataType, arg2: dict, /) -> lintdb.core.__StoredField
Create a StoredField object
TensorFieldValue
TensorFieldValue()
TensorFieldValue(arg0: str, arg1: ndarray[dtype=float32, shape=(*, *), device='cpu'], /) -> lintdb.core.FieldValue
Create FieldValue from Tensor
TermQueryNode
TermQueryNode()
TermQueryNode(value: lintdb.core.FieldValue) -> lintdb.core.__QueryNode
TextFieldValue
TextFieldValue()
TextFieldValue(arg0: str, arg1: str, /) -> lintdb.core.FieldValue
Create FieldValue from string
VectorQueryNode
VectorQueryNode()
VectorQueryNode(value: lintdb.core.FieldValue) -> lintdb.core.__QueryNode