Publications

2025

Preprint

Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection

Ivan Vykopal, Antonia Karamolegkou, Jaroslav Kopčan, Qiwei Peng, Tomáš Javůrek, Michal Gregor, and 1 more author
EMNLP

Understanding Subword Compositionality of Large Language Models

Qiwei Peng, Yekun Chai, and Anders Søgaard

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
EMNLP

Debiasing Multilingual LLMs in Cross-lingual Latent Space

Qiwei Peng, Guimin Hu, Yekun Chai, and Anders Søgaard

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
EMNLPSystem Demo Track

o-MEGA: Optimized Methods for Explanation Generation and Analysis

Ľuboš Kriš, Jaroslav Kopčan, Qiwei Peng, Andrej Ridzik, Marcel Veselý, and Martin Tamajka

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
SemEval

SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

Qiwei Peng, Robert Moro, Michal Gregor, Ivan Srba, Simon Ostermann, Marian Simko, and 4 more authors

Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

2024

EMNLP

Concept Space Alignment in Multilingual LLMs

Qiwei Peng, and Anders Søgaard

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
EMNLP

FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

Wenyan Li, Xinyu Zhang, Jiaang Li, Qiwei Peng, Raphael Tang, Li Zhou, and 5 more authors

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
EMNLP

On Training Data Influence of GPT Models

Qingyi Liu, Yekun Chai, Shuohuan Wang, Yu Sun, Qiwei Peng, and Hua Wu

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
EMNLPFindings

Tokenization Falling Short: On Subword Robustness in Large Language Models

Yekun Chai, Yewei Fang, Qiwei Peng, and Xuhong Li

Findings of the Association for Computational Linguistics: EMNLP 2024
LREC-COLING

HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-Lingual Natural Language Generalization

Qiwei Peng*, Yekun Chai*, and Xuhong Li

The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

URL

2023

*SEM

Testing Paraphrase Models on Recognising Sentence Pairs at Different Degrees of Semantic Overlap

Qiwei Peng, David Weir, and Julie Weeds

Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

URL

2022

COLINGOral

Towards Structure-aware Paraphrase Identification with Phrase Alignment Using Sentence Encoders

Qiwei Peng, David Weir, and Julie Weeds

Proceedings of the 29th International Conference on Computational Linguistics
ACLOral

Predicate-Argument Based Bi-Encoder for Paraphrase Identification

Qiwei Peng, David Weir, Julie Weeds, and Yekun Chai

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Abs URL

Paraphrase identification involves identifying whether a pair of sentences express the same or similar meanings. While cross-encoders have achieved high performances across several benchmarks, bi-encoders such as SBERT have been widely applied to sentence pair tasks. They exhibit substantially lower computation complexity and are better suited to symmetric tasks. In this work, we adopt a bi-encoder approach to the paraphrase identification task, and investigate the impact of explicitly incorporating predicate-argument information into SBERT through weighted aggregation. Experiments on six paraphrase identification datasets demonstrate that, with a minimal increase in parameters, the proposed model is able to outperform SBERT/SRoBERTa significantly. Further, ablation studies reveal that the predicate-argument based component plays a significant role in the performance gain.

2021

RepL4NLP@ACL

Structure-aware Sentence Encoder in Bert-Based Siamese Network

Qiwei Peng, David Weir, and Julie Weeds

Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Abs URL

Recently, impressive performance on various natural language understanding tasks has been achieved by explicitly incorporating syntax and semantic information into pre-trained models, such as BERT and RoBERTa. However, this approach depends on problem-specific fine-tuning, and as widely noted, BERT-like models exhibit weak performance, and are inefficient, when applied to unsupervised similarity comparison tasks. Sentence-BERT (SBERT) has been proposed as a general-purpose sentence embedding method, suited to both similarity comparison and downstream tasks. In this work, we show that by incorporating structural information into SBERT, the resulting model outperforms SBERT and previous general sentence encoders on unsupervised semantic textual similarity (STS) datasets and transfer classification tasks.
ACLFindings

Representing Syntax and Composition with Geometric Transformations

Lorenzo Bertolini, Julie Weeds, David Weir, and Qiwei Peng

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

URL
Behav Sci

Aroused and Impulsive Effects of Colour Stimuli on Lateral and Logical Abilities

Guobin Xia, Muzi Li, Philip Henry, Stephen Westland, Francisco Queiroz, Qiwei Peng, and 1 more author

Behavioral Sciences