LintDB

LintDB is a multi-vector database built for Gen AI. It natively supports late interaction retrieval with ColBERT and PLAID.

Key Features

  • Multi-vector support: LintDB stores multiple vectors per document id and calculates the max similarity across vectors to determine relevance (a conceptual sketch follows this list).
  • Bit-level Compression: LintDB fully implements PLAID's bit compression, storing 128-dimension embeddings in as few as 16 bytes.
  • Embedded: LintDB can be embedded directly into your Python application. No need to set up a separate database.
  • Full Support for PLAID and ColBERT: LintDB is built around PLAID and ColBERT.
  • Filtering: LintDB supports filtering on any field in the schema.
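
To make the multi-vector scoring concrete, here is a minimal NumPy sketch of late-interaction (MaxSim) relevance, separate from LintDB's internal implementation: for each query token embedding, take its maximum similarity against all document token embeddings, then sum those maxima. The maxsim_score function and the random embeddings below are illustrative only.

import numpy as np

def maxsim_score(query_embeddings, doc_embeddings):
  # query_embeddings: (num_query_tokens, dim); doc_embeddings: (num_doc_tokens, dim).
  # Both are assumed to be L2-normalized, so dot products are cosine similarities.
  similarities = query_embeddings @ doc_embeddings.T
  # For each query token, keep its best-matching document token, then sum.
  return float(similarities.max(axis=1).sum())

# Toy example with random 128-dimensional token embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(32, 128)).astype('float32')
d = rng.normal(size=(180, 128)).astype('float32')
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))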

Installation

LintDB relies on OpenBLAS for accelerated matrix multiplication. To simplify installation, we only support conda.

conda install lintdb -c deployql -c conda-forge
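
To check that the installation worked, the core module used in the examples below should import cleanly:

python -c "from lintdb.core import IndexIVF; print('lintdb OK')"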

Usage

LintDB makes it easy to upload data, even if you have multiple tenants.

The example below creates a database. LintDB defines a schema for each database that can be used to index embeddings, floats, strings, and even dates. Fields can be indexed, stored, or used as a filter.

from lintdb.core import (
  Schema,
  ColbertField,
  DataType,
  QuantizerType,
  Configuration,
  IndexIVF
)

# Local directory where the index will be stored.
index_path = 'indexes/colbert'

schema = Schema(
  [
    ColbertField('colbert', DataType.TENSOR, {
      'dimensions': 128,
      'quantization': QuantizerType.BINARIZER,
      'num_centroids': 32768,
      'num_iterations': 10,
    })
  ]
)
config = Configuration()
index = IndexIVF(index_path, schema, config)
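
The BINARIZER quantizer is what enables the bit-level compression mentioned above: at one bit per dimension, a 128-dimension token embedding occupies 128 bits, i.e. 16 bytes. LintDB's actual encoding follows PLAID and quantizes residuals against IVF centroids, so the sketch below only illustrates the storage arithmetic, not the on-disk format.

import numpy as np

# Illustration only: sign-binarize a 128-dim embedding and pack it into bytes.
embedding = np.random.default_rng(0).normal(size=128).astype('float32')
bits = (embedding > 0).astype(np.uint8)  # one bit of information per dimension
packed = np.packbits(bits)               # 128 bits -> 16 bytes
print(packed.nbytes)                     # 16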

Next, we query the database. We can query any of the data fields we indexed.

import numpy as np

from lintdb.core import (
  Query,
  VectorQueryNode,
  TensorFieldValue
)

# `checkpoint` is a ColBERT checkpoint used to embed query text, and `data`
# holds the query ids and query strings for this example.
for qid, query_text in zip(data.qids, data.queries):
  embedding = checkpoint.queryFromText(query_text)
  e = np.squeeze(embedding.cpu().numpy().astype('float32'))

  query = Query(
    VectorQueryNode(
      TensorFieldValue('colbert', e)
    )
  )
  # search(tenant, query, k): search tenant 0 for the top 10 results.
  results = index.search(0, query, 10)
  print(results)

Late Interaction Model Support

LintDB aims to support late interaction and other advanced retrieval models.

  • ColBERTv2 with PLAID
  • XTR

Roadmap

LintDB aims to be a retrieval platform for Gen AI. We believe that to do this, we must support flexible retrieval and scoring methods while maintaining high performance.

  • Improving performance and scalability
  • Improved benchmarks
  • Support CITADEL for scalable late interaction
  • Support learnable query adapters in the retrieval pipeline
  • Enhance support for arbitrary retrieval and ranking functions
  • Support learnable ranking functions

Comparison with other Vector Databases

LintDB is one of two databases that support token-level embeddings, the other being Vespa.

Token-Level Embeddings

Vespa

Vespa is a robust, mature search engine with many features. However, the learning curve to set up and operate Vespa is steep. With embedded LintDB, there's no setup required: conda install lintdb -c deployql and get started.

Embedded

Chroma

Chroma is an embedded vector database available in Python and JavaScript. LintDB currently only supports Python.

However, unlike Chroma, LintDB offers multi-tenancy support.

Documentation

For detailed documentation on using LintDB, refer to the official documentation.

License

LintDB is licensed under the Apache 2.0 License. See the LICENSE file for details.

We want to offer a managed service

We need your help! If you'd like a managed LintDB, reach out and let us know.

Book time on the founder's calendar: https://calendar.app.google/fsymSzTVT8sip9XX6