    Aleksandr Chuklin
    @varepsilon
    Feel free to ask any questions related to the challenge and the dataset here.
    DeepPavlov
    @DeepPavlov
    Hi everyone!
    Julia Eclipse
    @julianakiseleva
    hello world!
    Mohammad Aliannejadi
    @aliannejadi
    Please note that the performance metrics in single_turn_train_eval.pkl could also be used as training labels for the document_relevance task.
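    For concreteness, a minimal exploratory sketch of loading that pickle and inspecting how the metrics are organized before reusing them as labels; the nested-dict layout assumed in the comments should be verified against the actual file.
    ```python
    import pickle
    from pprint import pprint

    # Load the evaluation pickle. The layout assumed below (a dict keyed by
    # metric name, e.g. "MRR100" or "NDCG3") is an assumption; inspect the
    # output before building training labels from it.
    with open("single_turn_train_eval.pkl", "rb") as f:
        single_turn_eval = pickle.load(f)

    if isinstance(single_turn_eval, dict):
        print("top-level keys (likely metric names):", list(single_turn_eval)[:10])
        first_key = next(iter(single_turn_eval))
        inner = single_turn_eval[first_key]
        # Peek at a couple of entries to see how facets/questions map to scores.
        sample = dict(list(inner.items())[:2]) if isinstance(inner, dict) else inner
        pprint({first_key: sample})
    ```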
    YosiMass
    @YosiMass

    Hi. Thanks for organizing the challenge. For Stage 1, what do you expect to see on our GitHub? If I understand correctly, a submission should include only two files. The guidelines say:

    "Please send two files per run as described above to clariq@convai.io, indicating your team's name, as well as your run ID. You'll also need to share your GitHub repository with us."

    Mohammad Aliannejadi
    @aliannejadi
    Hi. Thank you for your interest. Sharing the GitHub repo is part of Stage 2. At the moment, we only need the run files. Please make sure that your run files include the results on both the test and dev sets.
    YosiMass
    @YosiMass
    Are there qrels for the train & dev topics? I see that the 198 Qulac topics were taken from TREC's Web Track, so they have qrels, but you added more topics. I guess you should have qrels for all 237 train & dev topics, since you supply the document relevance scores for all topics?
    Mohammad Aliannejadi
    @aliannejadi
    Hi @YosiMass , thank you for your message. We will soon release the qrel files for all the topics in the train & dev sets.
    Mohammad Aliannejadi
    @aliannejadi
    @YosiMass please check the ./data/ directory in the repository for the qrel files.
    YosiMass
    @YosiMass
    Thanks! What does a relevance of -2 mean? For example: "F0010 0 clueweb09-en0000-45-05740 -2"
    Mohammad Aliannejadi
    @aliannejadi
    @YosiMass -2, -1, and 0 mean that the document is irrelevant, while 1 and 2 indicate relevant documents.
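    For reference, a minimal sketch of reading a TREC-style qrel line like the one above and binarizing the grades according to this mapping (the helper name is hypothetical):
    ```python
    def load_binary_qrels(path):
        """Read a TREC-style qrel file (lines like
        'F0010 0 clueweb09-en0000-45-05740 -2') and binarize the grades:
        -2, -1, 0 -> irrelevant (0); 1, 2 -> relevant (1)."""
        qrels = {}
        with open(path) as f:
            for line in f:
                topic, _, doc_id, grade = line.split()
                qrels.setdefault(topic, {})[doc_id] = 1 if int(grade) >= 1 else 0
        return qrels
    ```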
    DakotaNorth
    @DakotaNorth
    Hi. I have a question regarding the first stage. Are we supposed to add a model on top of the "clariq bm25 baseline" to improve the selection of the right question based on the user's query? Correct me if I am wrong.
    YosiMass
    @YosiMass
    I am a bit confused about the clarification_need label (1-4). You say that label 1 means no clarification is needed, but when I look at train.tsv and dev.tsv I see some topics with clarification_need = 1 that still have several facets and several clarification questions. For example, train topic 108 has clarification_need = 1 but it has several facets.
    Kaushal Kumar Maurya
    @kaushal0494
    Hi, @aliannejadi. I started a bit late on this challenge. Is there a recording of the kick-off webinar conducted on August 4? If yes, please share it; it would be helpful for us. Thank you!
    ky941122
    @ky941122
    Hi, @aliannejadi. I noticed that you want to know whether ensembling is used. I have some questions about this: 1. I trained a single model on 5 folds and use the average of the 5 fold models' outputs; does this count as an ensemble? This is a very common method in competitions. 2. I trained a single model and use the average output of checkpoints from different training steps; does this count as an ensemble? 3. I trained a single model and averaged the weights of different checkpoints; does this count as an ensemble?
    Mohammad Aliannejadi
    @aliannejadi
    Hi @ky941122 , thank you for your interest. You can find the recorded webinar here: https://youtu.be/2cSLNScJqFk
    @kaushal0494 sorry, the message above was for you ;-).
    Hi @ky941122 , thank you for your question. You can use any combination or ensembles.
    Kaushal Kumar Maurya
    @kaushal0494
    Thank you! @aliannejadi
    ky941122
    @ky941122
    @aliannejadi Thanks, but the rule says "we may place these in separate tracks in an attempt to deemphasize the use of ensembles", so I want to know more details about it.
    Mohammad Aliannejadi
    @aliannejadi

    @ky941122 Thanks for pointing this out. Using the average output of different folds or checkpoints does not count as an ensemble.
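    For what it's worth, a minimal sketch of the checkpoint weight averaging described in point 3 above, assuming each checkpoint is a plain PyTorch state_dict saved with torch.save:
    ```python
    import torch

    def average_checkpoints(paths):
        # Average the parameters of several checkpoints of the same model.
        # Non-float buffers are cast to float for the sketch; adapt as needed.
        avg = None
        for path in paths:
            state = torch.load(path, map_location="cpu")
            if avg is None:
                avg = {k: v.clone().float() for k, v in state.items()}
            else:
                for k in avg:
                    avg[k] += state[k].float()
        return {k: v / len(paths) for k, v in avg.items()}
    ```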

    Chang Gao
    @gao-xiao-bai
    How do you support multi-turn conversations? I am confused about this since I only see single-turn conversations in the dataset.
    Mohammad Aliannejadi
    @aliannejadi

    Hi Chang. Thank you for your question. We will soon release a synthetic multi-turn dataset. The second phase will be concerned with multi-turn conversations; however, we are releasing the multi-turn data earlier so that participants can start developing their models. Hope this answers your question.

    Chang Gao
    @gao-xiao-bai
    Thank you! @aliannejadi
    DakotaNorth
    @DakotaNorth
    Hi all, what is the difference between the topic description and a facet? They both provide some information about the topic, so why are two separate fields needed?
    Chang Gao
    @gao-xiao-bai
    What does "the best question" mean? Based on what standard? I saw "BestQuestion (WorstQuestion) selects the candidate question for which the MRR value of the retrieval model is the maximum (minimum)" in your paper "Asking Clarifying Questions in Open-Domain Information-Seeking Conversations". But I found that the question corresponding to the maximum MRR is not consistent with the question in dev_best_q.eval. For example, for topic 107, its facets are F0031 to F0036. I calculated the average of the MRR values from F0031 to F0036 for each relevant question and found that the best question is Q02613, but the best question corresponding to dev_best_q.eval is Q03544. Is my calculation process correct? How do you calculate it?
    DakotaNorth
    @DakotaNorth
    Is there any base code available for these tasks?
    Chang Gao
    @gao-xiao-bai
    Hi, @aliannejadi. I have another question. Why are there no corresponding metrics for some relevant questions? For example, for facet F0088 there is no NDCG3 value for the relevant question Q03781. How should we calculate the metrics for these questions?
    欧文杰
    @ouwenjie03
    Hi all, what is the plan for Stage 2? Has Stage 1 finished?
    YosiMass
    @YosiMass
    Why do some topics have Q00001 as a possible clarification question while others do not? Moreover, it seems that topics with several facets (e.g., topic 1) do have Q00001 while topics with only a single facet (e.g., topic 203) do not. I would expect that topics with multiple facets would not have Q00001 as a valid clarification.
    ky941122
    @ky941122
    Hi, @aliannejadi. I think Stage 2 is a totally different task, so what is the deadline for submitting our systems? I think it would be better to give us a little more time to retrain our models.
    Jian Wang
    @iwangjian
    Hi @aliannejadi, I'm still a bit confused about the clarification need prediction task. As you said, when a user asks an initial query, the system needs to determine how critical it is to ask a clarifying question to disambiguate the query and provide a reasonable answer. Here we must have a search system (i.e., a document retrieval system) to show the quality of document retrieval for the user's query. That is, when our model determines how critical it is to ask a clarifying question, the decision depends not only on the user's query but also on the retrieval quality. For example, for topic ids 55, 56, and 57, the queries are "tell me about iron", "tell me about uss yorktown charleston SC", and "tell me about ct jobs", yet the clarification needs are 4, 1, and 2 respectively. So it seems impossible to determine the clarification need from the user's query alone. My question is: since the organizers do not provide such a retrieval system, how can we determine the clarification need?
    ky941122
    @ky941122
    Hi @aliannejadi, on the GitHub page I see that facet_id will be provided in the input when evaluating the system. Is this a mistake?
    Kim
    @KimYar
    @aliannejadi Hi Mohammad, you mentioned earlier that relevance scores of -2, -1, and 0 mean irrelevant and 1 and 2 mean relevant, but I found relevance scores as follows: [1, -2, 0, 2, 3, 4]. Is there any reason for these numbers?
    Kim
    @KimYar
    @aliannejadi Hi Mohammad, I was looking at the single_turn_train_eval file and realized there are 789 facets for each evaluation metric (I tried MRR100, NDCG1, and NDCG3), although the total number of facets for the train and dev sets is 801 (638 train set, 163 dev set). So evaluation metrics for 12 facets are missing. Please correct me if I am wrong.
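    A small sanity check along these lines is sketched below, assuming the pickle is keyed by metric name and then by facet id, and that train.tsv/dev.tsv have a facet_id column; adapt the indexing if the actual layout differs.
    ```python
    import csv
    import pickle

    def facet_ids_from_tsv(path, facet_col="facet_id"):
        # Collect the facet ids listed in a ClariQ tsv file; the column name is an assumption.
        with open(path) as f:
            return {row[facet_col] for row in csv.DictReader(f, delimiter="\t")}

    with open("single_turn_train_eval.pkl", "rb") as f:
        eval_data = pickle.load(f)

    eval_facets = set(eval_data["MRR100"])  # assumed layout: {metric: {facet_id: ...}}
    tsv_facets = facet_ids_from_tsv("train.tsv") | facet_ids_from_tsv("dev.tsv")
    print(len(eval_facets), "facets in the eval file")
    print("facets without metrics:", sorted(tsv_facets - eval_facets))
    ```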