Metadata-Version: 2.1
Name: experimaestro-ir
Version: 1.1.0
Summary: Experimaestro common module for IR experiments
Author-email: Benjamin Piwowarski <benjamin@piwowarski.fr>
License: GPL-3
Project-URL: homepage, https://github.com/bpiwowar/experimaestro-ir
Project-URL: documentation, https://experimaestro-ir.readthedocs.io/en/latest/
Project-URL: repository, https://github.com/bpiwowar/experimaestro-ir
Keywords: neural information retrieval,information retrieval,experiments
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: experimaestro >=1.2.1
Requires-Dist: datamaestro >=0.8.13
Requires-Dist: datamaestro-text >=2023.11.22
Requires-Dist: ir-datasets
Requires-Dist: docstring-parser
Requires-Dist: xpmir-rust ==0.20.*
Requires-Dist: omegaconf >=2.2
Requires-Dist: attrs
Requires-Dist: ir-measures >=0.3.3
Requires-Dist: toma
Requires-Dist: pytorch-lightning
Provides-Extra: anserini
Requires-Dist: pyserini >=0.20.0 ; extra == 'anserini'
Provides-Extra: neural
Requires-Dist: torch >=1.12 ; extra == 'neural'
Requires-Dist: tensorboard ; extra == 'neural'
Requires-Dist: transformers ; extra == 'neural'
Requires-Dist: sentence-transformers ; extra == 'neural'

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![Documentation Status](https://readthedocs.org/projects/experimaestro-ir/badge/?version=latest)](https://experimaestro-ir.readthedocs.io/en/latest/?badge=latest)

# Information Retrieval for experimaestro

An Information Retrieval module for [experimaestro](https://experimaestro-python.readthedocs.io/).

The full documentation can be read at [IR@experimaestro](https://experimaestro-ir.readthedocs.io/).

The [roadmap](https://github.com/experimaestro/experimaestro-ir/issues/9) is tracked as a GitHub issue.

## Install

The base experimaestro-IR package can be installed with `pip install xpmir`.
Optional dependencies add further functionality:

- `pip install xpmir[neural]` to install neural-IR packages (torch, etc.)
- `pip install xpmir[anserini]` to install Anserini-related packages
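
After installing, it can be useful to check which optional dependencies are actually importable. The following is a minimal sketch, not part of the package: the helper `missing_modules` is hypothetical, and the import names are assumed from the optional requirements listed in the package metadata (`sentence-transformers` is assumed to import as `sentence_transformers`).

```python
from importlib.util import find_spec

# Top-level import names assumed from the optional requirements of each extra.
EXTRAS = {
    "neural": ["torch", "tensorboard", "transformers", "sentence_transformers"],
    "anserini": ["pyserini"],
}


def missing_modules(extra, extras=EXTRAS):
    """Return the modules of `extra` that cannot be located (hypothetical helper)."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in extras[extra] if find_spec(name) is None]


if __name__ == "__main__":
    for extra in EXTRAS:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"xpmir[{extra}]: {status}")
```

This only checks that the modules can be found; it does not verify the minimum versions required by each extra.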

To use the development version:

- To install it without editing the code: `pip install git+https://github.com/experimaestro/experimaestro-ir.git`
- To edit the code: clone the repository, then run `pip install -e .` inside the cloned directory
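
To confirm which version ended up in the current environment (release or development), the standard library can query the installed distribution. This is a hedged sketch: the helper `installed_version` is hypothetical, and which distribution name is registered (`xpmir` or `experimaestro-ir`) is an assumption, so both are tried.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(names=("xpmir", "experimaestro-ir")):
    """Return (distribution name, version) for the first name found, else None."""
    for name in names:
        try:
            return name, version(name)
        except PackageNotFoundError:
            # This distribution name is not registered; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("experimaestro-IR does not appear to be installed")
    else:
        print(f"{found[0]} {found[1]}")
```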

## What's inside?

- Collection management (using datamaestro)
    - Interface for the [IR datasets library](https://ir-datasets.com/)
    - Splitting IR datasets
    - Shuffling training triplets
- Representation
    - Word Embeddings
    - HuggingFace transformers
- Indices
    - dense: [FAISS](https://github.com/facebookresearch/faiss) interface
    - sparse: [xpmir-rust library](https://github.com/experimaestro/experimaestro-ir-rust)
- Standard Indexing and Retrieval
    - Anserini
- Learning to Rank
    - Pointwise
    - Pairwise
    - Distillation
- Neural IR
    - Cross-Encoder
    - Splade
    - DRMM
    - ColBERT
- Paper reproduction
    - *MonoBERT* (Passage Re-ranking with BERT. Rodrigo Nogueira and Kyunghyun Cho. 2019)
    - (alpha) *DuoBERT* (Multi-Stage Document Ranking with BERT. Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, Jimmy Lin. 2019)
    - (beta) *Splade v2* (SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval, Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. SIGIR 2021)
    - (planned) ANCE
- Pre-trained models
    - [HuggingFace](https://huggingface.co) [integration](https://experimaestro-ir.readthedocs.io/en/latest/pretrained.html) (direct, through the Sentence Transformers library)

## Thanks

Some parts of the code have been adapted from [OpenNIR](https://github.com/Georgetown-IR-Lab/OpenNIR).
