Cross-language Information Retrieval (CLIR) has been studied at TREC and subsequent evaluation forums for more than twenty years, but recent advances in the application of deep learning to information retrieval (IR) warrant a new, large-scale effort that will enable exploration of classical and modern IR techniques for this task.
This website contains information about the upcoming NeuCLIR track at TREC 2022. We encourage submissions of systems that will help answer the following research questions:
In its first year, NeuCLIR will focus on three languages: Chinese, Persian, and Russian.
The main task for the proposed track is ad hoc cross-language retrieval. Documents will be drawn from Common Crawl newswire and will be written in Chinese, Persian, and Russian. Topics will be in English and will be expressed in the traditional TREC title/description/narrative form. Retrieved documents will be graded as highly relevant, somewhat relevant, or not relevant; we expect to use several metrics to evaluate runs, including nDCG@100 and ERR.
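Both metrics use the three relevance grades above. The sketch below computes nDCG@100 and ERR (Expected Reciprocal Rank) for a single topic. It is an illustration only, not the track's official scorer. It assumes a hypothetical mapping of the grades to integers (not = 0, somewhat = 1, highly = 2), uses the grade directly as a linear gain for nDCG, and treats unjudged documents as not relevant. The names `GRADES`, `ndcg_at_k`, and `err` are made up for this example.

```python
import math
from typing import Dict, List

# Hypothetical integer grades for the three relevance levels; the track's
# official gain values may differ.
GRADES = {"not": 0, "somewhat": 1, "highly": 2}


def dcg(gains: List[float], k: int) -> float:
    """Discounted cumulative gain of the top-k gains (linear gain, log2 discount)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 100) -> float:
    """nDCG@k: DCG of the run divided by DCG of the ideal ordering of judged docs."""
    # Unjudged documents are assumed to be not relevant (gain 0).
    run_gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal = dcg(ideal_gains, k)
    return dcg(run_gains, k) / ideal if ideal > 0 else 0.0


def err(ranking: List[str], qrels: Dict[str, int], max_grade: int = 2) -> float:
    """Expected Reciprocal Rank under a cascade model: the user scans down the
    list and stops at each document with probability (2^g - 1) / 2^max_grade."""
    score = 0.0
    p_continue = 1.0  # probability the user has not stopped before this rank
    for rank, doc_id in enumerate(ranking, start=1):
        grade = qrels.get(doc_id, 0)
        r = (2 ** grade - 1) / (2 ** max_grade)  # chance this document satisfies
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score


# Usage: one topic with three judged documents and a three-document run.
qrels = {"d1": GRADES["highly"], "d2": GRADES["somewhat"], "d3": GRADES["not"]}
run = ["d2", "d1", "d4"]  # "d4" is unjudged
print(round(ndcg_at_k(run, qrels, k=100), 4))
print(err(run, qrels))
```

nDCG@100 rewards placing the highly relevant document at rank 1, while ERR additionally discounts every later document by the chance that the user was already satisfied higher up, so the two metrics can order runs differently.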
The development and test data created for SCALE 2021 (60 topics in Chinese and Persian, 40 topics in Russian) will soon be available to the track as development data with relevance judgments. Further, participants are welcome to use any related cross-language and non-English monolingual datasets that have been made available (e.g., CLIRMatrix, MLWIKIR, and others). In addition, the SCALE 2021 effort produced translations of MS MARCO into Chinese, Russian, and Persian. These translations should be freely redistributable.
While the exact timeline has not been finalized yet, here is an approximate schedule:
For any questions, please reach out to the organizers at email@example.com.