Publications / Preprints
* denotes equal contribution
Please see my Google Scholar or Semantic Scholar profiles for the most up-to-date list of publications.
Preprints
2024
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Zijun Yao, Weijian Qi, Liangming Pan, Shulin Cao, Linmei Hu, Weichuan Liu, Lei Hou, and Juanzi Li
arXiv preprint, 2024
This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts the self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLM exhibits high self-aware uncertainty during generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on the LLM's self-aware uncertainty, preserving the snippet that reduces its uncertainty the most. To facilitate solving complex tasks that require multiple retrievals, SeaKR uses this self-aware uncertainty to choose among different reasoning strategies. Our experiments on both complex and simple Question Answering datasets show that SeaKR outperforms existing adaptive RAG methods.
@misc{yao2024seakrselfawareknowledgeretrieval, title = {SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation}, author = {Yao, Zijun and Qi, Weijian and Pan, Liangming and Cao, Shulin and Hu, Linmei and Liu, Weichuan and Hou, Lei and Li, Juanzi}, year = {2024}, eprint = {2406.19215}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2406.19215} }
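A minimal Python sketch of the uncertainty-gated retrieval loop described above, under stated assumptions: generate_with_uncertainty and search are hypothetical stubs standing in for the LLM (SeaKR derives its uncertainty score from the model's internal states) and the retriever. This illustrates the control flow only, not the authors' implementation.

from typing import List, Tuple

def generate_with_uncertainty(prompt: str) -> Tuple[str, float]:
    # Hypothetical stub: a real system would decode with an LLM and derive
    # a self-aware uncertainty score from its internal states.
    return "draft answer", 0.8

def search(query: str, k: int = 3) -> List[str]:
    # Hypothetical retriever stub returning top-k knowledge snippets.
    return [f"snippet {i} for: {query}" for i in range(k)]

def seakr_answer(question: str, threshold: float = 0.5) -> str:
    # Adaptive RAG: retrieve only when the model is uncertain, and keep the
    # snippet whose inclusion lowers that uncertainty the most.
    answer, uncertainty = generate_with_uncertainty(question)
    if uncertainty <= threshold:
        return answer  # confident enough: answer without retrieval
    best_answer, best_u = answer, uncertainty
    for snippet in search(question):
        cand, u = generate_with_uncertainty(f"{snippet}\n\nQuestion: {question}")
        if u < best_u:  # re-rank snippets by uncertainty reduction
            best_answer, best_u = cand, u
    return best_answer

print(seakr_answer("Who discovered penicillin?"))

The gate keeps confident answers retrieval-free, while the re-ranking step retains only the snippet that most reduces the model's uncertainty.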
- Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework
Jundong Xu, Hao Fei, Meng Luo, Qian Liu, Liangming Pan, William Yang Wang, Preslav Nakov, Mong-Li Lee, and Wynne Hsu
arXiv preprint, 2024
In the context of large language models (LLMs), current advanced reasoning methods have made impressive strides in various reasoning tasks. However, when it comes to logical reasoning tasks, major challenges remain in both efficacy and efficiency. This is rooted in the fact that these systems fail to fully leverage the inherent structure of logical tasks throughout reasoning processes such as decomposition, search, and resolution. To address this, we propose a logic-complete reasoning framework, Aristotle, with three key components: Logical Decomposer, Logical Search Router, and Logical Resolver. In our framework, symbolic expressions and logical rules are comprehensively integrated into the entire reasoning process, significantly alleviating the bottlenecks of logical reasoning, i.e., reducing sub-task complexity, minimizing search errors, and resolving logical contradictions. The experimental results on several datasets demonstrate that Aristotle consistently outperforms state-of-the-art reasoning frameworks in both accuracy and efficiency, particularly excelling in complex logical reasoning scenarios.
@misc{xu2024aristotlemasteringlogicalreasoning, title = {Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework}, author = {Xu, Jundong and Fei, Hao and Luo, Meng and Liu, Qian and Pan, Liangming and Wang, William Yang and Nakov, Preslav and Lee, Mong-Li and Hsu, Wynne}, year = {2024}, eprint = {2412.16953}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2412.16953} }
- AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
Xiaobao Wu, Liangming Pan, Yuxi Xie, Ruiwen Zhou, Shuai Zhao, Yubo Ma, Mingzhe Du, Rui Mao, Anh Tuan Luu, and William Yang Wang
arXiv preprint, 2024
Data contamination hinders fair LLM evaluation by introducing test data into newer models' training sets. Existing studies solve this challenge by updating benchmarks with newly collected data. However, they fail to guarantee contamination-free evaluation as the newly collected data may contain pre-existing knowledge, and their benchmark updates rely on intensive human labor. To address these issues, in this paper we propose AntiLeak-Bench, an automated anti-leakage benchmarking framework. Instead of simply using newly collected data, we construct samples with explicitly new knowledge absent from LLMs' training sets, which thus ensures strictly contamination-free evaluation. We further design a fully automated workflow to build and update our benchmark without human labor. This significantly reduces the cost of benchmark maintenance to accommodate emerging LLMs. Through extensive experiments, we highlight that data contamination likely exists before LLMs' cutoff time and demonstrate that AntiLeak-Bench effectively overcomes this challenge.
@misc{wu2024antileakbenchpreventingdatacontamination, title = {AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge}, author = {Wu, Xiaobao and Pan, Liangming and Xie, Yuxi and Zhou, Ruiwen and Zhao, Shuai and Ma, Yubo and Du, Mingzhe and Mao, Rui and Luu, Anh Tuan and Wang, William Yang}, year = {2024}, eprint = {2412.13670}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2412.13670} }
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, and William Yang Wang
arXiv preprint, 2024
This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. Covering three practical domains – airline baggage fees, NBA transactions, and tax regulations – RuleArena assesses LLMs’ proficiency in handling intricate natural language instructions that demand long-context understanding, logical reasoning, and accurate mathematical computation. Two key attributes distinguish RuleArena from traditional rule-based reasoning benchmarks: (1) it extends beyond standard first-order logic representations, and (2) it is grounded in authentic, practical scenarios, providing insights into the suitability and reliability of LLMs for real-world applications. Our findings reveal several notable limitations in LLMs: (1) they struggle to identify and apply the appropriate rules, frequently becoming confused by similar but distinct regulations, (2) they cannot consistently perform accurate mathematical computations, even when they correctly identify the relevant rules, and (3) in general, they perform poorly in the benchmark. These results highlight significant challenges in advancing LLMs’ rule-guided reasoning capabilities in real-life applications.
@misc{zhou2024rulearenabenchmarkruleguidedreasoning, title = {RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios}, author = {Zhou, Ruiwen and Hua, Wenyue and Pan, Liangming and Cheng, Sitao and Wu, Xiaobao and Yu, En and Wang, William Yang}, year = {2024}, eprint = {2412.08972}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2412.08972} }
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
Yuxi Xie, Anirudh Goyal, Xiaobao Wu, Xunjian Yin, Xiao Xu, Min-Yen Kan, Liangming Pan, and William Yang Wang
arXiv preprint, 2024
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling. The sequential token generation in AR models can lead to high inference latency. To overcome these challenges, we propose Context-Wise Order-Agnostic Language Modeling (COrAL), which incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during the generation process. Leveraging the order-agnostic nature of COrAL, we introduce sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows. This allows the model to iteratively refine its outputs in parallel in the sliding block, effectively capturing diverse dependencies without the high inference cost of sequential generation. Empirical evaluations on reasoning tasks demonstrate that COrAL improves performance and inference speed, respectively, achieving absolute accuracy gains of 4.6% on GSM8K and 4.0% on LogiQA, along with inference speedups of up to 3.9× over next-token baselines. Preliminary results on code generation indicate a drop in pass rates due to inconsistencies in order-agnostic outputs, highlighting the inherent quality–speed trade-off.
@misc{xie2024coralorderagnosticlanguagemodeling, title = {COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement}, author = {Xie, Yuxi and Goyal, Anirudh and Wu, Xiaobao and Yin, Xunjian and Xu, Xiao and Kan, Min-Yen and Pan, Liangming and Wang, William Yang}, year = {2024}, eprint = {2410.09675}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2410.09675} }
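To make the decoding schedule concrete, here is a toy Python sketch of sliding blockwise order-agnostic decoding; propose and score are hypothetical stand-ins for the model's order-agnostic heads, and the draft-then-sweep schedule is a simplification of the paper's procedure rather than the released implementation.

from typing import List

def propose(tokens: List[str], i: int) -> str:
    # Hypothetical proposal head: draft a token for slot i given the rest
    # of the sequence (an order-agnostic model may condition on both sides).
    return f"<tok{i}>"

def score(tokens: List[str], i: int, token: str) -> float:
    # Hypothetical order-agnostic scorer for `token` at position i.
    return 1.0

def sliding_block_decode(prompt: List[str], length: int, block: int = 4,
                         sweeps: int = 2) -> List[str]:
    # Draft `block` tokens ahead in parallel (forward prediction), then run
    # refinement sweeps over the window (backward reconstruction).
    seq = list(prompt)
    target = len(prompt) + length
    while len(seq) < target:
        start = len(seq)
        seq += [propose(seq, start + j) for j in range(min(block, target - start))]
        for _ in range(sweeps):
            for i in range(start, len(seq)):
                cand = propose(seq, i)
                if score(seq, i, cand) > score(seq, i, seq[i]):
                    seq[i] = cand  # accept the higher-scoring token
    return seq

print(sliding_block_decode(["Question:", "2+2=?"], length=8))

Because each block is drafted and refined in parallel, latency scales with the number of blocks rather than the number of tokens, which is the source of the reported speedups.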
- Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models
Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, and William Yang Wang
arXiv preprint, 2024
Large language models (LLMs) encode vast amounts of knowledge during pre-training (parametric knowledge, or PK) and can further be enhanced by incorporating contextual knowledge (CK). Can LLMs effectively integrate their internal PK with external CK to solve complex problems? In this paper, we investigate the dynamic interaction between PK and CK, categorizing their relationships into four types: Supportive, Complementary, Conflicting, and Irrelevant. To support this investigation, we introduce ECHOQA, a benchmark spanning scientific, factual, and commonsense knowledge. Our results show that LLMs tend to suppress their PK when contextual information is available, even when it is complementary or irrelevant. While tailored instructions can encourage LLMs to rely more on their PK, they still struggle to fully leverage it. These findings reveal a key vulnerability in LLMs, raising concerns about their reliability in knowledge-intensive tasks.
@misc{cheng2024understandinginterplayparametriccontextual, title = {Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models}, author = {Cheng, Sitao and Pan, Liangming and Yin, Xunjian and Wang, Xinyi and Wang, William Yang}, year = {2024}, eprint = {2410.08414}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2410.08414} }
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement
Xunjian Yin, Xinyi Wang, Liangming Pan, Xiaojun Wan, and William Yang Wang
arXiv preprint, 2024
The rapid advancement of large language models (LLMs) has significantly enhanced the capabilities of AI-driven agents across various tasks. However, existing agentic systems, whether based on fixed pipeline algorithms or pre-defined meta-learning frameworks, cannot search the whole agent design space due to the restriction of human-designed components, and thus might miss the globally optimal agent design. In this paper, we introduce Gödel Agent, a self-evolving framework inspired by the Gödel machine, enabling agents to recursively improve themselves without relying on predefined routines or fixed optimization algorithms. Gödel Agent leverages LLMs to dynamically modify its own logic and behavior, guided solely by high-level objectives through prompting. Experimental results on mathematical reasoning and complex agent tasks demonstrate that our implementation of Gödel Agent achieves continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
@misc{yin2024godelagentselfreferentialagent, title = {G\"odel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement}, author = {Yin, Xunjian and Wang, Xinyi and Pan, Liangming and Wan, Xiaojun and Wang, William Yang}, year = {2024}, eprint = {2410.04444}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2410.04444} }
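The loop the abstract describes, an agent that reads and rewrites its own policy code guided only by a high-level objective, can be sketched in a few lines of Python. llm_rewrite and evaluate are hypothetical stubs, and the keep-if-better acceptance rule is a simplification inspired by the Gödel-machine idea, not the paper's exact mechanism.

from typing import Callable

def llm_rewrite(source: str, objective: str) -> str:
    # Hypothetical stub: a real system would prompt an LLM to return a
    # modified version of the agent's own policy code.
    return source

def evaluate(policy: Callable[[str], str]) -> float:
    # Hypothetical benchmark stub: score the policy, higher is better.
    return 0.0

INITIAL_POLICY = "def policy(task: str) -> str:\n    return 'answer: ' + task\n"

def self_improve(source: str, objective: str, rounds: int = 3) -> str:
    # The agent inspects and rewrites its own policy code, adopting a
    # rewrite only if it scores strictly higher.
    ns: dict = {}
    exec(source, ns)
    best_score = evaluate(ns["policy"])
    for _ in range(rounds):
        candidate = llm_rewrite(source, objective)
        cand_ns: dict = {}
        try:
            exec(candidate, cand_ns)  # load the rewritten policy
            cand_score = evaluate(cand_ns["policy"])
        except Exception:
            continue  # reject rewrites that crash
        if cand_score > best_score:
            source, best_score = candidate, cand_score
    return source

print(self_improve(INITIAL_POLICY, "maximize benchmark accuracy"))

Because every rewrite is re-evaluated before being adopted, broken self-modifications are simply discarded.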
Conference & Journal Papers
2025
- Improving Causal Reasoning in Large Language Models: A Survey
Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Qingzhen Liu, Dawei Li, Zhikai Chen, Xiaoze Liu, and Liangming Pan
In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Causal reasoning (CR) is a crucial aspect of intelligence, essential for problem-solving, decision-making, and understanding the world. While large language models (LLMs) can generate rationales for their outputs, their ability to reliably perform causal reasoning remains uncertain, often falling short in tasks requiring a deep understanding of causality. In this survey, we provide a comprehensive review of research aimed at enhancing LLMs for causal reasoning. We categorize existing methods based on the role of LLMs: either as reasoning engines or as helpers providing knowledge or data to traditional CR methods, followed by a detailed discussion of the methodologies in each category. We then evaluate the performance of LLMs on various causal reasoning tasks, providing key findings and in-depth analysis. Finally, we provide insights from current studies and highlight promising directions for future research. We aim for this work to serve as a comprehensive resource, fostering further advancements in causal reasoning with LLMs.
@inproceedings{yu2024improvingcausalreasoningsurvey, title = {Improving Causal Reasoning in Large Language Models: A Survey}, author = {Yu, Longxuan and Chen, Delin and Xiong, Siheng and Wu, Qingyang and Liu, Qingzhen and Li, Dawei and Chen, Zhikai and Liu, Xiaoze and Pan, Liangming}, year = {2025}, booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, address = {Albuquerque, USA}, url = {https://arxiv.org/abs/2410.16676} }
- DistiLRR: Transferring Code Repair for Low-Resource Programming Languages
Kyle Wong, Alfonso Amayuelas, Liangming Pan, and William Yang Wang
In Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by reasoning about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair to low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability from a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages. To investigate this behavior, we perform a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously perceived. We hypothesize this weakness is magnified in low-resource settings, where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair between high-resource and low-resource languages.
@inproceedings{wong2024distilrrtransferringcoderepair, title = {DistiLRR: Transferring Code Repair for Low-Resource Programming Languages}, author = {Wong, Kyle and Amayuelas, Alfonso and Pan, Liangming and Wang, William Yang}, year = {2025}, booktitle = {Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, address = {Albuquerque, USA}, url = {https://arxiv.org/abs/2406.14867} }
- Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, and Tat-Seng Chua
In AAAI Conference on Artificial Intelligence (AAAI), 2025
Recent advancements in multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing various vision-language tasks. However, MLLMs face significant challenges with hallucinations, i.e., misleading outputs that do not align with the input data. While existing efforts have been devoted to combating MLLM hallucinations, several pivotal challenges remain unsolved. First, while current approaches aggressively focus on addressing errors at the perception level, another important type at the cognition level, which requires factual commonsense, can be overlooked. In addition, existing methods might fall short in finding a more effective way to represent visual input, which remains a key bottleneck that triggers visual hallucinations. Moreover, MLLMs can frequently be misled by faulty textual inputs and produce hallucinations, yet this type of issue has long been overlooked by existing studies. Inspired by human intuition in handling hallucinations, this paper introduces a novel bottom-up reasoning framework. Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge, ensuring more reliable outputs. Extensive experiments demonstrate significant improvements on multiple hallucination benchmarks after integrating MLLMs with the proposed framework. In-depth analyses reveal the great potential of our methods in addressing perception- and cognition-level hallucinations.
@inproceedings{wu2025combating, author = {Wu, Shengqiong and Fei, Hao and Pan, Liangming and Wang, William Yang and Yan, Shuicheng and Chua, Tat{-}Seng}, title = {Combating Multimodal {LLM} Hallucination via Bottom-Up Holistic Reasoning}, booktitle = {AAAI Conference on Artificial Intelligence (AAAI)}, year = {2025}, url = {https://arxiv.org/abs/2412.11124} }
2024
- TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning
Xinyuan Lu, Liangming Pan*, Yubo Ma, Preslav Nakov, and Min-Yen Kan
In NeurIPS Workshop on Table Representation Learning (TRL@NeurIPS), 2024
Best Paper Runner-Up
Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, which is crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV). To address these challenges, we introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability. We also present the TOOLTAB dataset, a new benchmark designed specifically for training LLMs in table-tool integration. Our experiments indicate that TART achieves substantial improvements over existing methods (e.g., Chain-of-Thought) by improving both the precision of data processing and the clarity of the reasoning process. Notably, TART paired with CodeLlama achieves 90.0% of the accuracy of the closed-source LLM GPT-3.5-turbo, highlighting its robustness in diverse real-world scenarios.
@inproceedings{lu2024tartopensourcetoolaugmentedframework, title = {TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning}, author = {Lu, Xinyuan and Pan, Liangming and Ma, Yubo and Nakov, Preslav and Kan, Min-Yen}, year = {2024}, booktitle = {NeurIPS Workshop on Table Representation Learning (TRL@NeurIPS)}, address = {Vancouver, Canada}, url = {https://arxiv.org/abs/2409.11724} }
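A minimal sketch of TART's three-stage pipeline (formatter, tool, explanation), assuming a tiny CSV table and a hand-written tool in place of an LLM-synthesized one: column_sum is a hypothetical example of what the tool maker might produce.

import csv
import io
from typing import Dict, List

def format_table(raw_csv: str) -> List[Dict[str, str]]:
    # Table formatter: parse the raw table into structured rows.
    return list(csv.DictReader(io.StringIO(raw_csv)))

def column_sum(rows: List[Dict[str, str]], col: str) -> float:
    # Hypothetical computational tool of the kind a tool maker might emit.
    return sum(float(r[col]) for r in rows)

def answer_with_explanation(raw_csv: str, col: str) -> str:
    # Format the table, invoke a tool for the numeric step, and generate a
    # human-readable explanation of what was computed.
    rows = format_table(raw_csv)
    total = column_sum(rows, col)
    return (f"Summed column '{col}' over {len(rows)} rows "
            f"with tool column_sum, giving {total}.")

table = "year,revenue\n2022,10.5\n2023,12.0\n"
print(answer_with_explanation(table, "revenue"))

Delegating the arithmetic to a tool sidesteps the numerical-reasoning weakness the abstract identifies, while the explanation string preserves an auditable reasoning trace.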
- MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, and Aixin Sun
In Annual Conference on Neural Information Processing Systems (NeurIPS) (Datasets and Benchmarks Track), 2024
Spotlight Paper
Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark comprising 1,062 expert-annotated questions. Distinct from previous datasets, it is constructed upon 130 lengthy PDF-formatted documents with an average of 49.4 pages and 20,971 textual tokens. Towards comprehensive evaluation, answers to these questions rely on pieces of evidence from (1) different sources (text, image, chart, table, and layout structure) and (2) various locations (i.e., page numbers). Moreover, 33.2% of the questions are cross-page questions requiring evidence across multiple pages. 22.8% of the questions are designed to be unanswerable for detecting potential hallucinations. Experiments on 14 LVLMs demonstrate that long-context DU greatly challenges current models. Notably, the best-performing model, GPT-4o, achieves an F1 score of only 42.7%, while the second-best, GPT-4V, scores 31.4%. Furthermore, 12 LVLMs (all except GPT-4o and GPT-4V) even present worse performance than their LLM counterparts which are fed with lossy-parsed OCR documents. These results validate the necessity of future research toward more capable long-context LVLMs.
@inproceedings{ma-etal-2024-mmlongbench-doc, title = {MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations}, author = {Ma, Yubo and Zang, Yuhang and Chen, Liangyu and Chen, Meiqi and Jiao, Yizhu and Li, Xinze and Lu, Xinyuan and Liu, Ziyu and Ma, Yan and Dong, Xiaoyi and Zhang, Pan and Pan, Liangming and Jiang, Yu-Gang and Wang, Jiaqi and Cao, Yixin and Sun, Aixin}, year = {2024}, booktitle = {Annual Conference on Neural Information Processing Systems (NeurIPS) (Dataset and Benchmark Track)}, address = {Vancouver, Canada}, url = {https://arxiv.org/abs/2407.01523} }
- AKEW: Assessing Knowledge Editing in the Wild
Xiaobao Wu, Liangming Pan, William Yang Wang, and Anh Tuan Luu
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Knowledge editing aims to inject knowledge updates into language models to keep them correct and up-to-date. However, its current evaluation strategies are notably impractical: they solely update with well-curated structured facts (triplets with subjects, relations, and objects), whereas real-world knowledge updates commonly emerge in unstructured texts like news articles. In this paper, we propose a new benchmark, Unstructured Knowledge Editing (UKE). It evaluates editing performance directly using unstructured texts as knowledge updates, termed unstructured facts. Hence UKE avoids the laborious construction of structured facts and enables efficient and responsive knowledge editing, becoming a more practical benchmark. We conduct extensive experiments on newly built datasets and demonstrate that UKE poses a significant challenge to state-of-the-art knowledge editing methods, resulting in their critical performance declines. We further show that this challenge persists even if we extract triplets as structured facts. Our analysis discloses key insights to motivate future research in UKE for more practical knowledge editing.
@inproceedings{wu-etal-2024-akew, title = {AKEW: Assessing Knowledge Editing in the Wild}, author = {Wu, Xiaobao and Pan, Liangming and Wang, William Yang and Luu, Anh Tuan}, year = {2024}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, address = {Miami, USA}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2402.18909} }
- SciAgent: Tool-augmented Language Models for Scientific Reasoning
Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, and Weizhu Chen
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Scientific reasoning poses a significant challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate research on this setting, we construct a tool-augmented training corpus named MathFunc, which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs of the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B substantially outperforms ChatGPT.
@inproceedings{ma-etal-2024-sciagent, title = {SciAgent: Tool-augmented Language Models for Scientific Reasoning}, author = {Ma, Yubo and Gou, Zhibin and Hao, Junheng and Xu, Ruochen and Wang, Shuohang and Pan, Liangming and Yang, Yujiu and Cao, Yixin and Sun, Aixin and Awadalla, Hany and Chen, Weizhu}, year = {2024}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, address = {Miami, USA}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2402.11451} }
- A Survey on Detection of LLMs-Generated Content
Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, and Wei Cheng
In Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content.
@inproceedings{yang-etal-2024-survey, title = {A Survey on Detection of LLMs-Generated Content}, author = {Yang, Xianjun and Pan, Liangming and Zhao, Xuandong and Chen, Haifeng and Petzold, Linda and Wang, William Yang and Cheng, Wei}, year = {2024}, booktitle = {Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP)}, address = {Miami, USA}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2310.15654} }
- MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate
Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, and William Wang
In Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of specialized models (e.g. coding), improved confidence through multiple computations, and enhanced divergent thinking, leading to more diverse outputs. Thus, the collaborative use of language models is expected to grow significantly in the coming years. In this work, we evaluate the behavior of a network of models collaborating through debate under the influence of an adversary. We introduce pertinent metrics to assess the adversary’s effectiveness, focusing on system accuracy and model agreement. Our findings highlight the importance of a model’s persuasive ability in influencing others. Additionally, we explore inference-time methods to generate more compelling arguments and evaluate the potential of prompt-based mitigation as a defensive strategy.
@inproceedings{amayuelas-etal-2024-multiagent, title = {MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate}, author = {Amayuelas, Alfonso and Yang, Xianjun and Antoniades, Antonis and Hua, Wenyue and Pan, Liangming and Wang, William}, year = {2024}, booktitle = {Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP)}, address = {Miami, USA}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2406.14711} }
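A toy Python sketch of the evaluated setting: a debate among stub agents with one adversary, plus the agreement metric. The honest and adversary functions are hypothetical placeholders; in the paper the agents are LLMs exchanging natural-language arguments.

from collections import Counter
from typing import Callable, List

Agent = Callable[[str, List[str]], str]

def honest(question: str, peers: List[str]) -> str:
    return "4"  # stub: a real agent would query an LLM with peers' arguments

def adversary(question: str, peers: List[str]) -> str:
    return "5"  # stub adversary: persistently argues for a wrong answer

def debate(agents: List[Agent], question: str, rounds: int = 3) -> List[str]:
    # Each round, every agent answers after seeing peers' previous answers.
    answers = [a(question, []) for a in agents]
    for _ in range(rounds - 1):
        answers = [a(question, answers) for a in agents]
    return answers

def agreement(answers: List[str]) -> float:
    # Fraction of agents voting with the majority: one of the two axes the
    # paper measures, alongside system accuracy.
    return Counter(answers).most_common(1)[0][1] / len(answers)

final = debate([honest, honest, adversary], "What is 2 + 2?")
print(final, "agreement:", agreement(final))

In the paper's setting, an adversary succeeds when its persuasive arguments pull the majority answer away from the truth, which these two metrics are designed to expose.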
- Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, and Preslav Nakov
In Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present a holistic end-to-end solution for annotating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels concerning the verifiability and factual inconsistencies found in LLM outputs. We design and build an annotation tool to speed up the labelling procedure and ease the workload of raters. It allows flexible incorporation of automatic results in any stage, e.g., automatically-retrieved evidence. We further construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence, and document. Preliminary experiments show that FacTool, FactScore, and Perplexity.ai struggle to identify false claims, with the best achieving an F1 of only 0.53.
@inproceedings{wang-etal-2024-factcheck, title = {Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers}, author = {Wang, Yuxia and Reddy, Revanth Gangi and Mujahid, Zain Muhammad and Arora, Arnav and Rubashevskii, Aleksandr and Geng, Jiahui and Afzal, Osama Mohammed and Pan, Liangming and Borenstein, Nadav and Pillai, Aditya and Augenstein, Isabelle and Gurevych, Iryna and Nakov, Preslav}, year = {2024}, booktitle = {Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP)}, address = {Miami, USA}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2311.09000} }
- Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies
Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, and William Yang Wang
Transactions of the Association for Computational Linguistics (TACL), 2024
Oral Presentation at ACL 2024
While large language models (LLMs) have shown remarkable effectiveness in various NLP tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A promising approach to rectify these flaws is correcting LLMs with feedback, where the LLM itself is prompted or guided with feedback to fix problems in its own output. Techniques leveraging automated feedback—either produced by the LLM itself (self-correction) or some external system—are of particular interest as they make LLM-based solutions more practical and deployable with minimal human intervention. This paper provides an exhaustive review of the recent advances in correcting LLMs with automated feedback, categorizing them into training-time, generation-time, and post-hoc approaches. We also identify potential challenges and future directions in this emerging field.
@article{pan-etal-2024-automatically, title = {Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies}, author = {Pan, Liangming and Saxon, Michael and Xu, Wenda and Nathani, Deepak and Wang, Xinyi and Wang, William Yang}, journal = {Transactions of the Association for Computational Linguistics (TACL)}, volume = {12}, year = {2024}, address = {Cambridge, MA}, publisher = {MIT Press}, url = {https://aclanthology.org/2024.tacl-1.27}, pages = {484--506} }
- Faithful Logical Reasoning via Symbolic Chain-of-Thought
Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, and Wynne Hsu
In Annual Meeting of the Association for Computational Linguistics (ACL), 2024
While the recent Chain-of-Thought (CoT) technique enhances the reasoning ability of large language models (LLMs) with the theory of mind, it might still struggle in handling logical reasoning that relies heavily on symbolic expressions and rigid deduction rules. To strengthen the logical reasoning capability of LLMs, we propose a novel Symbolic Chain-of-Thought, namely SymbCoT, a fully LLM-based framework that integrates symbolic expressions and logic rules with CoT prompting. Technically, building upon an LLM, SymbCoT 1) first translates the natural language context into the symbolic format, 2) then derives a step-by-step plan to solve the problem with symbolic logical rules, and 3) employs a verifier to check the translation and reasoning chain. Via thorough evaluations on 5 standard datasets with both First-Order Logic and Constraint Optimization symbolic expressions, SymbCoT consistently shows striking improvements over the CoT method, meanwhile refreshing the current state-of-the-art performance. We further demonstrate that our system advances in more faithful, flexible, and explainable logical reasoning. To our knowledge, this is the first work to combine symbolic expressions and rules into CoT for logical reasoning with LLMs.
@inproceedings{xu-etal-2024-faithful, title = {Faithful Logical Reasoning via Symbolic Chain-of-Thought}, author = {Xu, Jundong and Fei, Hao and Pan, Liangming and Liu, Qian and Lee, Mong{-}Li and Hsu, Wynne}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2024}, address = {Thailand}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2405.18357} }
- Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, and William Yang Wang
In Annual Meeting of the Association for Computational Linguistics (ACL), 2024
ACL Oral Presentation
Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We find that this discrepancy is due to LLMs' bias in evaluating their own output. In this paper, we formally define LLMs' self-bias (the tendency to favor their own generation) using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral, and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks.
@inproceedings{xu-etal-2024-pride, title = {Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement}, author = {Xu, Wenda and Zhu, Guanglei and Zhao, Xuandong and Pan, Liangming and Li, Lei and Wang, William Yang}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2024}, address = {Thailand}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2402.11436} }
- Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
Alfonso Amayuelas, Kyle Wong, Liangming Pan, Wenhu Chen, and William Yang Wang
In Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2024
This paper investigates the capabilities of Large Language Models (LLMs) in the context of understanding their knowledge and uncertainty over questions. Specifically, we focus on addressing known-unknown questions, characterized by high uncertainty due to the absence of definitive answers. To facilitate our study, we collect a new dataset with Known-Unknown Questions (KUQ) and establish a categorization framework to clarify the origins of uncertainty in such queries. Subsequently, we examine the performance of open-source LLMs, fine-tuned using this dataset, in distinguishing between known and unknown queries within open-ended question-answering scenarios. The fine-tuned models demonstrated a significant improvement, achieving a considerable increase in F1-score relative to their pre-fine-tuning state. Through a comprehensive analysis, we reveal insights into the models’ improved uncertainty articulation and their consequent efficacy in multi-agent debates. These findings help us understand how LLMs can be trained to identify and express uncertainty, improving our knowledge of how they understand and express complex or unclear information.
@inproceedings{alfonso-etal-2024-knowledge, title = {Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models}, author = {Amayuelas, Alfonso and Wong, Kyle and Pan, Liangming and Chen, Wenhu and Wang, William Yang}, booktitle = {Findings of Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2024}, address = {Thailand}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2305.13712} }
- The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models
Shuo Zhang, Liangming Pan, Junzhou Zhao, and William Yang Wang
In Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Large language models often necessitate grounding on external knowledge to generate faithful and reliable answers. Yet even with the correct groundings in the reference, they can ignore them and rely on wrong groundings or their inherent biases to hallucinate when users, being largely unaware of the specifics of the stored information, pose questions that might not directly correlate with the retrieved groundings. In this work, we formulate this knowledge alignment problem and introduce MixAlign, a framework that interacts with both the human user and the knowledge base to obtain and integrate clarifications on how the user question relates to the stored information. MixAlign employs a language model to achieve automatic knowledge alignment and, if necessary, further enhances this alignment through human user clarifications. Experimental results highlight the crucial role of knowledge alignment in boosting model performance and mitigating hallucination, with improvements noted up to 22.2% and 27.1% respectively. We also demonstrate the effectiveness of MixAlign in improving knowledge alignment by producing high-quality, user-centered clarifications.
@inproceedings{zhang-etal-2024-knowledge, title = {The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models}, author = {Zhang, Shuo and Pan, Liangming and Zhao, Junzhou and Wang, William Yang}, booktitle = {Findings of Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2024}, address = {Thailand}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2305.13669} }
- Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution
Xinze Li, Yixin Cao, Liangming Pan, Yubo Ma, and Aixin Sun
In Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Despite achieving great success, Large Language Models (LLMs) usually suffer from unreliable hallucinations. Although language attribution can be a potential solution, there are no suitable benchmarks and evaluation metrics to attribute LLMs to structured knowledge. In this paper, we define a new task of Knowledge-aware Language Model Attribution (KaLMA) that improves upon three core concerns with conventional attributed LMs. First, we extend the attribution source from unstructured texts to Knowledge Graphs (KGs), whose rich structures benefit both attribution performance and working scenarios. Second, we propose a new “Conscious Incompetence” setting that accounts for an incomplete knowledge repository, where the model identifies the need for supporting knowledge beyond the provided KG. Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text-citation alignment. To implement the above innovations, we build a dataset in the biography domain, BioKaLMA, via an evolutionary question generation strategy, to control question complexity and the knowledge necessary for the answer. For evaluation, we develop a baseline solution and demonstrate the room for improvement in LLMs' citation generation, emphasizing the importance of incorporating the “Conscious Incompetence” setting and the critical role of retrieval accuracy.
@inproceedings{li-etal-2024-towards, title = {Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution}, author = {Li, Xinze and Cao, Yixin and Pan, Liangming and Ma, Yubo and Sun, Aixin}, booktitle = {Findings of Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2024}, address = {Thailand}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2310.05634} }
- Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion
Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, and Anh Tuan Luu
In Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Dynamic topic models track the evolution of topics in sequential documents, which have derived various applications like trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, failing to reveal the evolution and hindering further applications. To address these issues, we break the tradition of simply chaining topics in existing work and propose a novel chain-free neural dynamic topic model. We introduce a new evolution-tracking contrastive learning method that builds the similarity relations among dynamic topics. This not only tracks topic evolution but also maintains topic diversity, mitigating the repetitive topic issue. To avoid unassociated topics, we further present an unassociated word exclusion method that consistently excludes unassociated words from discovered topics. Extensive experiments demonstrate our model significantly outperforms state-of-the-art baselines, tracking topic evolution with high-quality topics, showing better performance on downstream tasks, and remaining robust to the hyperparameter for evolution intensities.
@inproceedings{wu-etal-2024-modeling, title = {Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion}, author = {Wu, Xiaobao and Dong, Xinshuai and Pan, Liangming and Nguyen, Thong and Luu, Anh Tuan}, booktitle = {Findings of Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2024}, address = {Thailand}, publisher = {Association for Computational Linguistics}, url = {https://arxiv.org/abs/2405.17957} }
- Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, and William Yang Wang
In International Conference on Machine Learning (ICML), 2024
Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning. To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time. We found this perspective effective in two important cases of reasoning: logic reasoning with knowledge graphs (KGs) and chain-of-thought (CoT) reasoning. More specifically, we formalize the reasoning paths as random walk paths on the knowledge/reasoning graphs. Analyses of learned LM distributions suggest that a weighted sum of relevant random walk path probabilities is a reasonable way to explain how LMs reason. Experiments and analysis on multiple KG and CoT datasets reveal the effect of training on random walk paths and suggest that augmenting unlabeled random walk reasoning paths can improve real-world multi-step reasoning performance.
@inproceedings{wang-etal-2024-understanding, title = {Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation}, author = {Wang, Xinyi and Amayuelas, Alfonso and Zhang, Kexun and Pan, Liangming and Chen, Wenhu and Wang, William Yang}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2024}, address = {Austria}, url = {https://arxiv.org/abs/2402.03268} }
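The paper's view of an LM as aggregating indirect reasoning paths can be illustrated on a toy knowledge graph: the sketch below Monte-Carlo-estimates the random-walk probability mass connecting a premise entity to a conclusion entity. The triples and the estimator are illustrative assumptions, not the paper's formal analysis.

import random

# Toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("socrates", "is_a", "human"),
    ("human", "subclass_of", "mortal_being"),
    ("plato", "is_a", "human"),
]

def neighbors(node: str):
    return [(r, t) for h, r, t in TRIPLES if h == node]

def random_walk(start: str, length: int = 2):
    # Sample one reasoning path as a random walk over the graph.
    path, node = [start], start
    for _ in range(length):
        nbrs = neighbors(node)
        if not nbrs:
            break
        rel, node = random.choice(nbrs)
        path += [rel, node]
    return path

def path_mass(start: str, target: str, n_walks: int = 10000) -> float:
    # Monte-Carlo estimate of the random-walk probability mass connecting
    # `start` to `target`, i.e., aggregated indirect reasoning paths.
    hits = sum(target in random_walk(start) for _ in range(n_walks))
    return hits / n_walks

print("mass(socrates -> mortal_being):", path_mass("socrates", "mortal_being"))

A weighted sum of such path probabilities is, on the paper's account, a reasonable explanation of how an LM derives conclusions it never saw stated directly.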
- Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility
Iain Xie Weissburg, Mehir Arora, Xinyi Wang, Liangming Pan, and William Yang Wang
In International Conference on Machine Learning (ICML), 2024
As the number of accepted papers at AI and ML conferences reaches into the thousands, it has become unclear how researchers access and read research publications. In this paper, we investigate the role of social media influencers in enhancing the visibility of machine learning research, particularly the citation counts of papers they share. We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023, alongside controls precisely matched by 9 key covariates. Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers, with median citation counts 2-3 times higher than those of the control group. Additionally, the study delves into the geographic, gender, and institutional diversity of highlighted authors. Given these findings, we advocate for a responsible approach to curation, encouraging influencers to uphold the journalistic standard that includes showcasing diverse research topics, authors, and institutions.
@inproceedings{weissburg-etal-2024-tweets, title = {Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility}, author = {Weissburg, Iain Xie and Arora, Mehir and Wang, Xinyi and Pan, Liangming and Wang, William Yang}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2024}, address = {Austria}, url = {https://arxiv.org/abs/2401.13782} }
- A Survey on Data Selection for Language Models
Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, and William Yang Wang
Transactions on Machine Learning Research (TMLR), 2024
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the amount of training required. Data selection methods aim to determine which candidate data points to include in the training dataset and how to appropriately sample from the selected data points. The promise of improved data selection methods has caused the volume of research in the area to rapidly expand. However, because deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive, few organizations have the resources for extensive data selection research. Consequently, knowledge of effective data selection practices has become concentrated within a few organizations, many of which do not openly share their findings and methodologies. To narrow this gap in knowledge, we present a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches. By describing the current landscape of research, this work aims to accelerate progress in data selection by establishing an entry point for new and established researchers. Additionally, throughout this review we draw attention to noticeable holes in the literature and conclude the paper by proposing promising avenues for future research.
@article{DBLP:journals/corr/abs-2402-16827, author = {Albalak, Alon and Elazar, Yanai and Xie, Sang Michael and Longpre, Shayne and Lambert, Nathan and Wang, Xinyi and Muennighoff, Niklas and Hou, Bairu and Pan, Liangming and Jeong, Haewon and Raffel, Colin and Chang, Shiyu and Hashimoto, Tatsunori and Wang, William Yang}, title = {A Survey on Data Selection for Language Models}, journal = {Transactions on Machine Learning Research (TMLR)}, year = {2024}, url = {https://doi.org/10.48550/arXiv.2402.16827} }
2023
- SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables
Xinyuan Lu*, Liangming Pan*, Qian Liu, Preslav Nakov, and Min-Yen Kan
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims are paired with evidence-containing scientific tables annotated with labels. Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models, including table-based pretraining models and large language models. All models except GPT-4 achieved performance barely above random guessing. Popular prompting techniques, such as Chain-of-Thought, do not achieve much performance gains on SCITAB. Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning. Our codes and data are publicly available at https://github.com/XinyuanLu00/SciTab.
@inproceedings{lu-etal-2023-scitab, title = {{SCITAB}: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables}, author = {Lu, Xinyuan and Pan, Liangming and Liu, Qian and Nakov, Preslav and Kan, Min-Yen}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.emnlp-main.483}, pages = {7787--7813} }
- MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
Deepak Nathani, David Wang, Liangming Pan, and William Wang
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Language Models (LMs) have shown impressive performance in various natural language tasks. However, when it comes to natural language reasoning, LMs still face challenges such as hallucination, generating incorrect intermediate reasoning steps, and making mathematical errors. Recent research has focused on enhancing LMs through self-improvement using feedback. Nevertheless, existing approaches relying on a single generic feedback source fail to address the diverse error types found in LM-generated reasoning chains. In this work, we propose Multi-Aspect Feedback, an iterative refinement framework that integrates multiple feedback modules, including frozen LMs and external tools, each focusing on a specific error category. Our experimental results demonstrate the efficacy of our approach in addressing several errors in the LM-generated reasoning chain and thus improving the overall performance of an LM in several reasoning tasks. We see an improvement of up to 20% in Mathematical Reasoning and up to 18% in Logical Entailment.
@inproceedings{nathani-etal-2023-maf, title = {{MAF}: Multi-Aspect Feedback for Improving Reasoning in Large Language Models}, author = {Nathani, Deepak and Wang, David and Pan, Liangming and Wang, William}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.emnlp-main.407}, pages = {6591--6616} }
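A minimal sketch of the iterative multi-aspect feedback loop, assuming two hypothetical aspect-specific checkers and a placeholder refine step in place of the frozen-LM and tool-based feedback modules the paper integrates.

from typing import Callable, List, Tuple

Checker = Callable[[str], Tuple[bool, str]]

def math_checker(answer: str) -> Tuple[bool, str]:
    # Hypothetical aspect module targeting arithmetic errors.
    return ("= 7" not in answer), "3 + 4 should equal 7"

def logic_checker(answer: str) -> Tuple[bool, str]:
    # Hypothetical aspect module targeting logical errors (finds none here).
    return False, ""

def refine(answer: str, feedback: List[str]) -> str:
    # Placeholder for an LLM refinement call conditioned on the feedback.
    return answer.replace("= 8", "= 7")

def maf_loop(answer: str, checkers: List[Checker], max_iters: int = 3) -> str:
    # Each iteration, collect feedback from every aspect-specific module and
    # refine the answer; stop once no module reports an error.
    for _ in range(max_iters):
        feedback = [msg for check in checkers
                    for err, msg in [check(answer)] if err]
        if not feedback:
            break
        answer = refine(answer, feedback)
    return answer

print(maf_loop("3 + 4 = 8", [math_checker, logic_checker]))

Separating feedback by error category is the key design choice: each module can specialize, unlike a single generic critic.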
- INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback
Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Wang, and Lei Li
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
EMNLP Oral Presentation
Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics do not provide explicit explanation of their verdict, nor associate the scores with defects in the generated text. To address this limitation, we present INSTRUCTSCORE, a fine-grained explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human readable diagnostic report. We evaluate INSTRUCTSCORE on a variety of generation tasks, including translation, captioning, data-to-text, and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, our INSTRUCTSCORE, even without direct supervision from human-rated data, achieves performance levels on par with state-of-the-art metrics like COMET22, which were fine-tuned on human ratings.
@inproceedings{xu-etal-2023-instructscore, title = {{INSTRUCTSCORE}: Towards Explainable Text Generation Evaluation with Automatic Feedback}, author = {Xu, Wenda and Wang, Danqing and Pan, Liangming and Song, Zhenqiao and Freitag, Markus and Wang, William and Li, Lei}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.emnlp-main.365}, pages = {5967--5994} }
- Doolittle: Benchmarks and Corpora for Academic Writing Formalization
Shizhe Diao, Yongyu Lei, Liangming Pan, Tianqing Fang, Wangchunshu Zhou, Sedrick Keh, Min-Yen Kan, and Tong Zhang
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Improving the quality of academic writing is a meaningful but challenging task. Conventional methods of language refinement focus on narrow, specific linguistic features within isolated sentences, such as grammatical errors and improper word use. We propose a more general task, Academic Writing Formalization (AWF), to improve the overall quality of formal academic writing at the paragraph level. We formulate this language refinement task as a formal text style transfer task which transfers informal-academic text to formal-academic and contribute a large-scale non-parallel dataset, Doolittle, for this purpose. Concurrently, we apply a method named metric-oriented reinforcement learning (MORL) to two large language models (LLMs) where we incorporate different levels of automatic feedback into the training process. Our experiments reveal that existing text transfer models and grammatical error correction models address certain aspects of AWF but still have a significant performance gap compared to human performance. Meanwhile, language models fine-tuned with our MORL method exhibit considerably improved performance, rivaling the latest chatbot ChatGPT, but still have a non-negligible gap compared to the ground truth formal-academic texts in Doolittle.
@inproceedings{diao-etal-2023-doolittle, title = {Doolittle: Benchmarks and Corpora for Academic Writing Formalization}, author = {Diao, Shizhe and Lei, Yongyu and Pan, Liangming and Fang, Tianqing and Zhou, Wangchunshu and Keh, Sedrick and Kan, Min-Yen and Zhang, Tong}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.emnlp-main.809}, pages = {13093--13111} }
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning. Liangming Pan, Alon Albalak, Xinyi Wang, and William Wang. In Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems. This paper introduces a novel framework, Logic-LM, which integrates LLMs with symbolic solvers to improve logical problem-solving. Our method first utilizes LLMs to translate a natural language problem into a symbolic formulation. Afterward, a deterministic symbolic solver performs inference on the formulated problem. We also introduce a self-refinement module, which utilizes the symbolic solver’s error messages to revise symbolic formalizations. We demonstrate Logic-LM’s effectiveness on five logical reasoning datasets: ProofWriter, PrOntoQA, FOLIO, LogicalDeduction, and AR-LSAT. On average, Logic-LM achieves a significant performance boost of 39.2% over using LLM alone with standard prompting and 18.4% over LLM with chain-of-thought prompting. Our findings suggest that Logic-LM, by combining LLMs with symbolic logic, offers a promising avenue for faithful logical reasoning.
@inproceedings{pan-etal-2023-logic, title = {Logic-{LM}: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning}, author = {Pan, Liangming and Albalak, Alon and Wang, Xinyi and Wang, William}, booktitle = {Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.findings-emnlp.248}, pages = {3806--3824} }
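To make the pipeline above concrete, here is a minimal sketch of the formulate-solve-refine loop that the abstract describes. It is an illustration under stated assumptions, not the released implementation: `call_llm` and `symbolic_solver` are hypothetical callables standing in for an LLM API and a deterministic solver backend (e.g., a Prover9- or Z3-style prover).

```python
# Minimal sketch of a Logic-LM-style loop. `call_llm(prompt) -> str` and
# `symbolic_solver(formulation) -> (ok, answer_or_error)` are assumed
# interfaces, not the authors' code.

def logic_lm(problem, call_llm, symbolic_solver, max_refinements=3):
    # 1) Translate the natural-language problem into a symbolic formulation.
    formulation = call_llm(f"Translate into a symbolic formulation:\n{problem}")
    for _ in range(max_refinements):
        # 2) A deterministic solver performs the actual inference.
        ok, payload = symbolic_solver(formulation)
        if ok:
            return payload  # solver-derived answer
        # 3) Self-refinement: revise the formulation using the solver's error.
        formulation = call_llm(
            "Revise the symbolic formulation given this solver error.\n"
            f"Formulation:\n{formulation}\nError:\n{payload}"
        )
    # Fallback if no parsable formulation is found within the budget.
    return call_llm(f"Answer directly:\n{problem}")
```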
- On the Risk of Misinformation Pollution with Large Language Models. Yikang Pan*, Liangming Pan*, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. In Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems. We establish a threat model and simulate potential misuse scenarios, both unintentional and intentional, to assess the extent to which LLMs can be utilized to produce misinformation. Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation (up to 87%) in the performance of ODQA systems. Moreover, we uncover disparities in the attributes associated with persuading humans and machines, presenting an obstacle to current human-centric approaches to combat misinformation. To mitigate the harm caused by LLM-generated misinformation, we propose three defense strategies: misinformation detection, vigilant prompting, and reader ensemble. These approaches have demonstrated promising results, albeit with certain associated costs. Lastly, we discuss the practicality of utilizing LLMs as automatic misinformation generators and provide relevant resources and code to facilitate future research in this area.
@inproceedings{pan-etal-2023-risk, title = {On the Risk of Misinformation Pollution with Large Language Models}, author = {Pan, Yikang and Pan, Liangming and Chen, Wenhu and Nakov, Preslav and Kan, Min-Yen and Wang, William}, booktitle = {Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.findings-emnlp.97}, pages = {1389--1403} }
- QACheck: A Demonstration System for Question-Guided Multi-Hop Fact-Checking. Liangming Pan, Xinyuan Lu, Min-Yen Kan, and Preslav Nakov. In Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP Demo), 2023.
Fact-checking real-world claims often requires intricate, multi-step reasoning due to the absence of direct evidence to support or refute them. However, existing fact-checking systems often lack transparency in their decision-making, making it challenging for users to comprehend their reasoning process. To address this, we propose the Question-guided Multi-hop Fact-Checking (QACheck) system, which guides the model’s reasoning process by asking a series of questions critical for verifying a claim. QACheck has five key modules: a claim verifier, a question generator, a question-answering module, a QA validator, and a reasoner. Users can input a claim into QACheck, which then predicts its veracity and provides a comprehensive report detailing its reasoning process, guided by a sequence of (question, answer) pairs. QACheck also provides the source of evidence supporting each question, fostering a transparent, explainable, and user-friendly fact-checking process.
@inproceedings{pan-etal-2023-qacheck, title = {{QAC}heck: A Demonstration System for Question-Guided Multi-Hop Fact-Checking}, author = {Pan, Liangming and Lu, Xinyuan and Kan, Min-Yen and Nakov, Preslav}, booktitle = {Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP Demo)}, year = {2023}, address = {Singapore}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.emnlp-demo.23}, pages = {264--273} }
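The five modules above compose naturally into an iterative loop. The sketch below is a hypothetical reconstruction of that control flow; `modules` is an assumed dictionary holding the five components, not the demonstration system's actual API.

```python
# Hypothetical sketch of a QACheck-style loop; each entry of `modules` is a
# placeholder for one of the five components named in the abstract.

def qacheck(claim, modules, max_steps=5):
    qa_history = []  # accumulated (question, answer, evidence) triples
    for _ in range(max_steps):
        if modules["claim_verifier"](claim, qa_history):  # enough info to decide?
            break
        question = modules["question_generator"](claim, qa_history)
        answer, evidence = modules["question_answerer"](question)
        if modules["qa_validator"](claim, question, answer):  # keep useful pairs only
            qa_history.append((question, answer, evidence))
    # The reasoner produces the final verdict plus the (question, answer) report.
    return modules["reasoner"](claim, qa_history), qa_history
```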
- Fact-Checking Complex Claims with Program-Guided Reasoning. Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, and Preslav Nakov. In Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning. In this paper, we present Program-Guided Fact-Checking (ProgramFC), a novel fact-checking model that decomposes complex claims into simpler sub-tasks that can be solved using a shared library of specialized functions. We first leverage the in-context learning ability of large language models to generate reasoning programs to guide the verification process. Afterward, we execute the program by delegating each sub-task to the corresponding sub-task handler. This process makes our model both explanatory and data-efficient, providing clear explanations of its reasoning process while requiring minimal training data. We evaluate ProgramFC on two challenging fact-checking datasets and show that it outperforms seven fact-checking baselines across different settings of evidence availability, with explicit output programs that benefit human debugging. Our code and data are publicly available at https://github.com/mbzuai-nlp/ProgramFC.
@inproceedings{pan-etal-2023-fact, title = {Fact-Checking Complex Claims with Program-Guided Reasoning}, author = {Pan, Liangming and Wu, Xiaobao and Lu, Xinyuan and Luu, Anh Tuan and Wang, William Yang and Kan, Min-Yen and Nakov, Preslav}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2023}, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.acl-long.386}, pages = {6981--7004} }
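As a rough illustration of program-guided decomposition, the sketch below executes a generated reasoning program by dispatching each step to a registered sub-task handler. `generate_program` and the handler interfaces are assumptions made for illustration; the actual function library lives in the released code linked above.

```python
# Illustrative sketch of program-guided fact-checking. `generate_program`
# stands in for the LLM that emits a reasoning program (a list of
# (function_name, argument) steps); `handlers` maps function names such as
# "Question" or "Verify" to sub-task solvers. All interfaces are assumed.

def program_guided_verify(claim, generate_program, handlers):
    results = []  # outputs of earlier steps, visible to later ones
    for func_name, argument in generate_program(claim):
        # Delegate each sub-task to its corresponding handler.
        results.append(handlers[func_name](argument, results))
    return bool(results[-1])  # the final step yields the predicted veracity
```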
- Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation. Xuan Long Do, Bowei Zou, Shafiq Joty, Tran Tai, Liangming Pan, Nancy Chen, and Ai Ti Aw. In Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
Conversational Question Generation (CQG) is a critical task for machines to assist humans in fulfilling their information needs through conversations. The task is generally cast into two settings: answer-aware and answer-unaware. While the former facilitates the models by exposing the expected answer, the latter is more realistic and has recently received growing attention. What-to-ask and how-to-ask are the two main challenges in the answer-unaware setting. To address the first challenge, existing methods mainly select sequential sentences in the context as rationales. We argue that conversations generated with such naive heuristics may not be natural enough: in reality, interlocutors often discuss relevant content that is not necessarily sequential in the context. Additionally, previous methods decide the type of question to be generated (boolean/span-based) implicitly. Modeling the question type explicitly is crucial because the answer, which would hint to the model whether to generate a boolean or span-based question, is unavailable. To this end, we present SG-CQG, a two-stage CQG framework. In the what-to-ask stage, a sentence is selected as the rationale from a semantic graph that we construct, and an answer span is extracted from it. In the how-to-ask stage, a classifier determines the target answer type of the question via two explicit control signals before generating and filtering. In addition, we propose Conv-Distinct, a novel evaluation metric for CQG that evaluates the diversity of the conversation generated from a context. Compared with existing answer-unaware CQG models, the proposed SG-CQG achieves state-of-the-art performance.
@inproceedings{do-etal-2023-modeling, title = {Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation}, author = {Do, Xuan Long and Zou, Bowei and Joty, Shafiq and Tai, Tran and Pan, Liangming and Chen, Nancy and Aw, Ai Ti}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2023}, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.acl-long.603}, pages = {10785--10803} }
- Attacking Open-domain Question Answering by Injecting Misinformation. Liangming Pan, Wenhu Chen, Min-Yen Kan, and William Yang Wang. In International Joint Conference on Natural Language Processing and Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL), 2023. AACL / IJCNLP Oral Presentation; Area Chair Award (Question Answering Track).
With the rise of false, inaccurate, and misleading information in propaganda, news, and social media, real-world Question Answering (QA) systems face the challenge of synthesizing and reasoning over misinformation-polluted contexts to derive correct answers. This urgency gives rise to the need to make QA systems robust to misinformation, a topic previously unexplored. We study the risk misinformation poses to QA models by investigating the sensitivity of open-domain QA models to corpus pollution with misinformation documents. We curate both human-written and model-generated false documents that we inject into the evidence corpus of QA models and assess the impact on their performance. Experiments show that QA models are vulnerable to even small amounts of evidence contamination brought by misinformation, with large absolute performance drops across all models. Misinformation attacks pose a greater threat when fake documents are produced at scale by neural models or when the attacker targets specific questions of interest. To defend against such threats, we discuss the necessity of building a misinformation-aware QA system that integrates question answering and misinformation detection in a joint fashion.
@inproceedings{pan-etal-2023-attacking, title = {Attacking Open-domain Question Answering by Injecting Misinformation}, author = {Pan, Liangming and Chen, Wenhu and Kan, Min-Yen and Wang, William Yang}, booktitle = {International Joint Conference on Natural Language Processing and Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL)}, year = {2023}, address = {Nusa Dua, Bali}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.ijcnlp-main.35}, pages = {525--539} }
- Investigating Zero- and Few-shot Generalization in Fact Verification. Liangming Pan, Yunxiang Zhang, and Min-Yen Kan. In International Joint Conference on Natural Language Processing and Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL), 2023. AACL / IJCNLP Oral Presentation.
In this paper, we explore zero- and few-shot generalization for fact verification (FV), which aims to generalize the FV model trained on well-resourced domains (e.g., Wikipedia) to low-resourced domains that lack human annotations. To this end, we first construct a benchmark dataset collection which contains 11 FV datasets representing 6 domains. We conduct an empirical analysis of generalization across these FV datasets, finding that current models generalize poorly. Our analysis reveals that several factors affect generalization, including dataset size, length of evidence, and the type of claims. Finally, we show that two directions of work improve generalization: 1) incorporating domain knowledge via pretraining on specialized domains, and 2) automatically generating training data via claim generation.
@inproceedings{pan-etal-2023-investigating, title = {Investigating Zero- and Few-shot Generalization in Fact Verification}, author = {Pan, Liangming and Zhang, Yunxiang and Kan, Min-Yen}, booktitle = {International Joint Conference on Natural Language Processing and Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL)}, year = {2023}, address = {Nusa Dua, Bali}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.ijcnlp-main.34}, pages = {511--524} }
- FollowupQG: Towards Information-Seeking Follow-up Question Generation. Yan Meng, Liangming Pan, Yixin Cao, and Min-Yen Kan. In International Joint Conference on Natural Language Processing and Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL), 2023. AACL / IJCNLP Oral Presentation.
Humans ask follow-up questions driven by curiosity, which reflects a creative human cognitive process. We introduce the task of real-world information-seeking follow-up question generation (FQG), which aims to generate follow-up questions seeking a more in-depth understanding of an initial question and answer. We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum providing layman-friendly explanations for open-ended questions. In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills (such as applying and relating). We evaluate current question generation models on their efficacy for generating follow-up questions, exploring how to generate specific types of follow-up questions based on step-by-step demonstrations. Our results validate FOLLOWUPQG as a challenging benchmark, as model-generated questions are adequate but far from human-raised questions in terms of informativeness and complexity.
@inproceedings{meng-etal-2023-followupqg, title = {{F}ollowup{QG}: Towards Information-Seeking Follow-up Question Generation}, author = {Meng, Yan and Pan, Liangming and Cao, Yixin and Kan, Min-Yen}, booktitle = {International Joint Conference on Natural Language Processing and Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL)}, year = {2023}, address = {Nusa Dua, Bali}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.ijcnlp-main.17}, pages = {252--271} }
- Efficient Online Data Mixing For Language Model Pre-Training. Alon Albalak, Liangming Pan, Colin Raffel, and William Yang Wang. In NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models (R0-FoMo@NeurIPS), 2023. Spotlight Paper.
The data used to pretrain large language models has a decisive impact on a model’s downstream performance, which has led to a large body of work on data selection methods that aim to automatically determine the most suitable data to use for pretraining. Existing data selection methods suffer from slow and computationally expensive processes, a problem amplified by the increasing size of models and of pretraining datasets. Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together and determining sampling probabilities across entire groups. However, data mixing proportions are typically fixed before training and therefore cannot adapt to changing training dynamics. To address these limitations, we develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing. Based on multi-armed bandit algorithms, our online approach optimizes the data mixing proportions during training. Remarkably, our method trains a model that reaches the final perplexity of the next best method with 19% fewer training iterations, and improves performance on the 5-shot MMLU benchmark by 1.9% relative accuracy, while adding negligible wall-clock time during pretraining.
@inproceedings{albalak-etal-2023-efficient, title = {Efficient Online Data Mixing For Language Model Pre-Training}, author = {Albalak, Alon and Pan, Liangming and Raffel, Colin and Wang, William Yang}, booktitle = {NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models (R0-FoMo@NeurIPS)}, year = {2023}, address = {New Orleans, USA}, url = {https://arxiv.org/abs/2312.02406} }
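Since the abstract frames online data mixing as a multi-armed bandit problem, an EXP3-style sketch conveys the core update; this is a simplified illustration (the reward definition, exploration term, and scaling are assumptions), not the paper's exact algorithm.

```python
import numpy as np

# EXP3-style online data mixing (simplified sketch). `batch_reward(k)` is an
# assumed callback returning a bounded reward for training on a batch drawn
# from domain k (e.g., a normalized per-batch loss).

def online_data_mixing(num_domains, num_steps, batch_reward, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    log_w = np.zeros(num_domains)
    for _ in range(num_steps):
        probs = np.exp(log_w - log_w.max())
        probs /= probs.sum()
        k = rng.choice(num_domains, p=probs)   # sample a domain to train on
        r = batch_reward(k)
        log_w[k] += lr * r / probs[k]          # importance-weighted update
    probs = np.exp(log_w - log_w.max())
    return probs / probs.sum()                 # final mixing proportions
```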
- Hashtag-Guided Low-Resource Tweet Classification. Shizhe Diao, Sedrick Scott Keh, Liangming Pan, Zhiliang Tian, Yan Song, and Tong Zhang. In International World Wide Web Conference (WWW), 2023.
Social media classification tasks (e.g., tweet sentiment analysis, tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous. Thus, training on tweets is challenging and demands large-scale human-annotated labels, which are time-consuming and costly to obtain. In this paper, we find that providing hashtags for social media tweets can help alleviate this issue, because hashtags can enrich short and ambiguous tweets with various kinds of information, such as topic, sentiment, and stance. This motivates us to propose a novel Hashtag-guided Tweet Classification model (HashTation), which automatically generates meaningful hashtags for the input tweet to provide useful auxiliary signals for tweet classification. To generate high-quality and insightful hashtags, our hashtag generation model retrieves and encodes post-level and entity-level information from across the whole corpus. Experiments show that HashTation achieves significant improvements on seven low-resource tweet classification tasks, in which only a limited amount of training data is provided, showing that automatically enriching tweets with model-generated hashtags can significantly reduce the demand for large-scale human-labeled data. Further analysis demonstrates that HashTation is able to generate high-quality hashtags that are consistent with the tweets and their labels.
@inproceedings{DBLP:conf/www/DiaoKPT0023, author = {Diao, Shizhe and Keh, Sedrick Scott and Pan, Liangming and Tian, Zhiliang and Song, Yan and Zhang, Tong}, title = {Hashtag-Guided Low-Resource Tweet Classification}, booktitle = {International World Wide Web Conference (WWW)}, pages = {1415--1426}, publisher = {{ACM}}, year = {2023}, url = {https://doi.org/10.1145/3543507.3583194} }
- InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling. Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, and Anh Tuan Luu. In AAAI Conference on Artificial Intelligence (AAAI), 2023.
@inproceedings{DBLP:conf/aaai/WuDNLPL23, author = {Wu, Xiaobao and Dong, Xinshuai and Nguyen, Thong and Liu, Chaoqun and Pan, Liangming and Luu, Anh Tuan}, title = {InfoCTM: {A} Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling}, booktitle = {AAAI Conference on Artificial Intelligence (AAAI)}, pages = {13763--13771}, publisher = {{AAAI} Press}, year = {2023}, url = {https://doi.org/10.1609/aaai.v37i11.26612} }
2022
- KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base. Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Bin He, and Hanwang Zhang. In Annual Meeting of the Association for Computational Linguistics (ACL), 2022. ACL Oral Presentation.
Complex question answering over knowledge base (Complex KBQA) is challenging because it requires various compositional reasoning capabilities, such as multi-hop inference, attribute comparison, and set operations. Existing benchmarks have shortcomings that limit the development of Complex KBQA: 1) they only provide QA pairs without explicit reasoning processes; 2) their questions are poor in diversity or scale. To this end, we introduce KQA Pro, a dataset for Complex KBQA including around 120K diverse natural language questions. We introduce a compositional and interpretable programming language, KoPL, to represent the reasoning process of complex questions. For each question, we provide the corresponding KoPL program and SPARQL query, so that KQA Pro can serve both KBQA and semantic parsing tasks. Experimental results show that state-of-the-art KBQA methods do not achieve results on KQA Pro as promising as those on current datasets, which suggests that KQA Pro is challenging and that Complex KBQA requires further research effort. We also treat KQA Pro as a diagnostic dataset for testing multiple reasoning skills, conduct a thorough evaluation of existing models, and discuss further directions for Complex KBQA.
@inproceedings{cao-etal-2022-kqa, title = {{KQA} Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base}, author = {Cao, Shulin and Shi, Jiaxin and Pan, Liangming and Nie, Lunyiu and Xiang, Yutong and Hou, Lei and Li, Juanzi and He, Bin and Zhang, Hanwang}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2022}, address = {Dublin, Ireland}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.acl-long.422}, pages = {6101--6119} }
- Interpreting the Robustness of Neural NLP Models to Textual Perturbations. Yunxiang Zhang, Liangming Pan, Samson Tan, and Min-Yen Kan. In Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Modern Natural Language Processing (NLP) models are known to be sensitive to input perturbations and their performance can decrease when applied to real-world, noisy data. However, it is still unclear why models are less robust to some perturbations than others. In this work, we test the hypothesis that the extent to which a model is affected by an unseen textual perturbation (robustness) can be explained by the learnability of the perturbation (defined as how well the model learns to identify the perturbation with a small amount of evidence). We further give a causal justification for the learnability metric. We conduct extensive experiments with four prominent NLP models — TextRNN, BERT, RoBERTa and XLNet — over eight types of textual perturbations on three datasets. We show that a model which is better at identifying a perturbation (higher learnability) becomes worse at ignoring such a perturbation at test time (lower robustness), providing empirical support for our hypothesis.
@inproceedings{zhang-etal-2022-interpreting, title = {Interpreting the Robustness of Neural {NLP} Models to Textual Perturbations}, author = {Zhang, Yunxiang and Pan, Liangming and Tan, Samson and Kan, Min-Yen}, booktitle = {Findings of Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2022}, address = {Dublin, Ireland}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.findings-acl.315}, pages = {3993--4007} }
- KHANQ: A Dataset for Generating Deep Questions in Education. Huanli Gong, Liangming Pan, and Hengchang Hu. In International Conference on Computational Linguistics (COLING), 2022.
Designing in-depth educational questions is a time-consuming and cognitively demanding task. Therefore, it is intriguing to study how to build Question Generation (QG) models to automate the question creation process. However, existing QG datasets are not suitable for educational question generation because the questions are not real questions asked by humans during learning and can be solved by simply searching for information. To bridge this gap, we present KHANQ, a challenging dataset for educational question generation, containing 1,034 high-quality learner-generated questions seeking an in-depth understanding of the taught online courses in Khan Academy. Each data sample is carefully paraphrased and annotated as a triple of 1) Context: an independent paragraph on which the question is based; 2) Prompt: a text prompt for the question (e.g., the learner’s background knowledge); 3) Question: a deep question based on Context and coherent with Prompt. By conducting a human evaluation on the aspects of appropriateness, coverage, coherence, and complexity, we show that state-of-the-art QG models which perform well on shallow question generation datasets have difficulty in generating useful educational questions. This makes KHANQ a challenging testbed for educational question generation.
@inproceedings{gong-etal-2022-khanq, title = {{KHANQ}: A Dataset for Generating Deep Questions in Education}, author = {Gong, Huanli and Pan, Liangming and Hu, Hengchang}, booktitle = {International Conference on Computational Linguistics (COLING)}, year = {2022}, address = {Gyeongju, Republic of Korea}, publisher = {International Committee on Computational Linguistics}, url = {https://aclanthology.org/2022.coling-1.518}, pages = {5925--5938} }
- CoHS-CQG: Context and History Selection for Conversational Question Generation. Xuan Long Do, Bowei Zou, Liangming Pan, Nancy F. Chen, Shafiq Joty, and Ai Ti Aw. In International Conference on Computational Linguistics (COLING), 2022.
Conversational question generation (CQG) serves as a vital task for machines to assist humans, for example in interactive reading comprehension, through conversations. Compared to traditional single-turn question generation (SQG), CQG is more challenging in the sense that the generated question is required not only to be meaningful, but also to align with the provided conversation. Previous studies mainly focus on how to model the flow and alignment of the conversation, but do not thoroughly study which parts of the context and history are necessary for the model. We believe that shortening the context and history is crucial, as it helps the model optimize for the conversational alignment property. To this end, we propose CoHS-CQG, a two-stage CQG framework that adopts a novel CoHS module to shorten the context and history of the input. In particular, it selects the top-p sentences and history turns by calculating their relevance scores. Our model achieves state-of-the-art performance on CoQA in both the answer-aware and answer-unaware settings.
@inproceedings{do-etal-2022-cohs, title = {{C}o{HS}-{CQG}: Context and History Selection for Conversational Question Generation}, author = {Do, Xuan Long and Zou, Bowei and Pan, Liangming and Chen, Nancy F. and Joty, Shafiq and Aw, Ai Ti}, booktitle = {International Conference on Computational Linguistics (COLING)}, year = {2022}, address = {Gyeongju, Republic of Korea}, publisher = {International Committee on Computational Linguistics}, url = {https://aclanthology.org/2022.coling-1.48}, pages = {580--591} }
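The top-p selection step described above can be pictured as follows: score every candidate context sentence or history turn, then keep the smallest high-scoring set whose normalized scores accumulate past p. The `relevance` callable and the exact normalization are assumptions made for illustration, not the paper's implementation.

```python
# Sketch of a CoHS-style top-p shortening step. `relevance(candidate)` is an
# assumed scoring function (e.g., embedding similarity to the current turn).

def select_top_p(candidates, relevance, p=0.9):
    scores = [relevance(c) for c in candidates]
    total = sum(scores)
    # Rank candidates by relevance, then keep the smallest prefix whose
    # normalized scores accumulate past the threshold p.
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(candidates[i])
        cum += scores[i] / total
        if cum >= p:
            break
    return kept
```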
- Automatic True/False Question Generation for Educational Purpose. Bowei Zou, Pengfei Li, Liangming Pan, and Ai Ti Aw. In NAACL Workshop on Innovative Use of NLP for Building Educational Applications (BEA@NAACL), 2022.
In the field of teaching, true/false questioning is an important educational method for assessing students' general understanding of learning materials. Manually creating such questions requires extensive human effort and expert knowledge. Question Generation (QG) techniques offer the possibility of automatically generating a large number of questions. However, there is limited work on automatic true/false question generation due to the lack of training data and the difficulty of finding question-worthy content. In this paper, we propose an unsupervised True/False Question Generation approach (TF-QG) that automatically generates true/false questions from a given passage for reading comprehension tests. TF-QG consists of a template-based framework that tests specific knowledge in the passage by leveraging various NLP techniques, and a generative framework that generates more flexible and complicated questions using a novel masking-and-infilling strategy. Human evaluation shows that our approach can generate high-quality and valuable true/false questions. In addition, simulated testing on the generated questions challenges state-of-the-art inference models from NLI, QA, and fact verification tasks.
@inproceedings{zou-etal-2022-automatic, title = {Automatic True/False Question Generation for Educational Purpose}, author = {Zou, Bowei and Li, Pengfei and Pan, Liangming and Aw, Ai Ti}, booktitle = {NAACL Workshop on Innovative Use of NLP for Building Educational Applications (BEA@NAACL)}, year = {2022}, address = {Seattle, Washington}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2022.bea-1.10}, pages = {61--70} }
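The masking-and-infilling strategy can be approximated with an off-the-shelf fill-mask model: mask an information-bearing span of a true sentence and accept a fill that differs from the original, yielding a candidate false statement. The sketch below uses the Hugging Face `fill-mask` pipeline as a stand-in; the paper's actual generator and span-selection logic are not reproduced here.

```python
from transformers import pipeline

# Rough sketch of masking-and-infilling for false-statement generation,
# using an off-the-shelf masked LM as a stand-in for the paper's generator.
fill = pipeline("fill-mask", model="roberta-base")

def make_false_statement(sentence, span):
    # Mask the chosen information-bearing span (assumed single-token here).
    masked = sentence.replace(span, fill.tokenizer.mask_token, 1)
    for candidate in fill(masked):
        token = candidate["token_str"].strip()
        if token.lower() != span.lower():  # a different fill => likely false
            return masked.replace(fill.tokenizer.mask_token, token)
    return None  # no usable replacement among the top predictions
```

For example, `make_false_statement("The capital of France is Paris.", "Paris")` would typically return a statement naming a different city, which can then serve as a "false" test item.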
- Ingredient-enriched Recipe Generation from Cooking Videos. Jianlong Wu, Liangming Pan, Jingjing Chen, and Yu-Gang Jiang. In International Conference on Multimedia Retrieval (ICMR), 2022.
Cooking video captioning aims to generate the text instructions that describe the cooking procedures presented in a video. Current approaches tend to use larger neural models or more robust feature extractors to increase the expressive power of features, ignoring the strong correlation between consecutive cooking steps in the video. However, it is intuitive that previous cooking steps can provide clues for the next one; in particular, consecutive cooking steps tend to share the same ingredients, so accurate ingredient recognition can introduce more fine-grained information into captioning. To improve procedural captioning of cooking videos, this paper proposes a framework with an ingredient recognition module that uses a copy mechanism to fuse the predicted ingredient information into the generated sentence. Moreover, we integrate the visual information of the previous step into the generation of the current step, so that the visual information of the two steps jointly assists the generation process. Extensive experiments verify the effectiveness of the proposed framework, which achieves promising performance on both the YouCookII and Cooking-COIN datasets.
@inproceedings{DBLP:conf/mir/WuPCJ22, author = {Wu, Jianlong and Pan, Liangming and Chen, Jingjing and Jiang, Yu{-}Gang}, title = {Ingredient-enriched Recipe Generation from Cooking Videos}, booktitle = {International Conference on Multimedia Retrieval (ICMR)}, pages = {249--257}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3512527.3531388} }
- Modeling and Leveraging Prerequisite Context in Recommendation. Hengchang Hu, Liangming Pan, Yiding Ran, and Min-Yen Kan. In RecSys Workshop on Context-Aware Recommender Systems (CARS@RecSys), 2022.
Prerequisites can play a crucial role in users' decision-making, yet recommendation systems have not fully utilized such contextual background knowledge. Traditional recommendation systems (RS) mostly enrich user-item interactions with context consisting of static user profiles and item descriptions, ignoring the contextual logic and constraints that underlie them. For example, an RS may recommend an item on the condition that the user has interacted with another item as its prerequisite. Modeling prerequisite context from conceptual side information can overcome this weakness. We propose Prerequisite Driven Recommendation (PDR), a generic context-aware framework in which prerequisite context is explicitly modeled to facilitate recommendation. We first design a Prerequisite Knowledge Linking (PKL) algorithm to curate datasets facilitating PDR research. Employing it, we build a 75k+ high-quality prerequisite concept dataset spanning three domains. We then contribute PDRS, a neural instantiation of PDR. By jointly optimizing the prerequisite learning and recommendation tasks through multi-layer perceptrons, we find that PDRS consistently outperforms baseline models in all three domains, by an average margin of 7.41%. Importantly, PDRS performs especially well in cold-start scenarios, with improvements of up to 17.65%.
@inproceedings{he-etal-2022-modeling, title = {Modeling and Leveraging Prerequisite Context in Recommendation}, author = {Hu, Hengchang and Pan, Liangming and Ran, Yiding and Kan, Min{-}Yen}, booktitle = {RecSys Workshop on Context-Aware Recommender Systems (CARS@RecSys)}, year = {2022}, address = {Seattle, WA, USA}, url = {https://arxiv.org/abs/2209.11471} }
2021
- Zero-shot Fact Verification by Claim Generation. Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, and William Yang Wang. In Annual Meeting of the Association for Computational Linguistics (ACL), 2021. ACL Oral Presentation.
Neural models for automated fact verification have achieved promising results thanks to the availability of large, human-annotated datasets. However, for each new domain that requires fact verification, creating a dataset by manually writing claims and linking them to their supporting evidence is expensive. We develop QACG, a framework for training a robust fact verification model using automatically generated claims that can be supported, refuted, or unverifiable based on evidence from Wikipedia. QACG generates question-answer pairs from the evidence and then converts them into different types of claims. Experiments on the FEVER dataset show that our QACG framework significantly reduces the demand for human-annotated training data. In a zero-shot scenario, QACG improves a RoBERTa model's F1 from 50% to 77%, equivalent in performance to 2K+ manually curated examples. Our QACG code is publicly available.
@inproceedings{pan-etal-2021-zero, title = {Zero-shot Fact Verification by Claim Generation}, author = {Pan, Liangming and Chen, Wenhu and Xiong, Wenhan and Kan, Min-Yen and Wang, William Yang}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2021}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.acl-short.61}, pages = {476--483} }
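The QACG recipe (generate question-answer pairs from evidence, then rewrite them as labeled claims) can be sketched as follows. All callables are assumed components, and the handling of refuted and NEI claims below is a simplification of the paper's three claim types.

```python
# Sketch of claim generation for zero-shot fact verification. The three
# callables are assumed components: a QA-pair generator over the evidence,
# a QA-to-claim rewriter, and an answer corruptor for refuted claims.

def generate_training_claims(evidence, generate_qa_pairs, qa_to_claim, corrupt_answer):
    examples = []
    for question, answer in generate_qa_pairs(evidence):
        # A faithful (question, answer) pair rewrites into a supported claim.
        examples.append((qa_to_claim(question, answer), evidence, "SUPPORTED"))
        # Swapping in a plausible-but-wrong answer yields a refuted claim.
        wrong = corrupt_answer(question, answer)
        examples.append((qa_to_claim(question, wrong), evidence, "REFUTED"))
    # NEI-style claims (not shown) can pair generated claims with unrelated evidence.
    return examples
```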
- Unsupervised Multi-hop Question Answering by Question Generation. Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, and William Yang Wang. In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021. NAACL Oral Presentation.
Obtaining training data for multi-hop question answering (QA) is time-consuming and resource-intensive. We explore the possibility of training a well-performing multi-hop QA model without referencing any human-labeled multi-hop question-answer pairs, i.e., unsupervised multi-hop QA. We propose MQA-QG, an unsupervised framework that can generate human-like multi-hop training data from both homogeneous and heterogeneous data sources. MQA-QG generates questions by first selecting or generating relevant information from each data source and then integrating the multiple pieces of information to form a multi-hop question. Using only generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance on the HybridQA and HotpotQA datasets, respectively. We also show that pretraining the QA system with the generated data greatly reduces the demand for human-annotated training data.
@inproceedings{pan-etal-2021-unsupervised, title = {Unsupervised Multi-hop Question Answering by Question Generation}, author = {Pan, Liangming and Chen, Wenhu and Xiong, Wenhan and Kan, Min-Yen and Wang, William Yang}, booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2021}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.naacl-main.469}, pages = {5866--5880} }
- A Hybrid Approach for Detecting Prerequisite Relations in Multi-Modal Food Recipes. Liangming Pan, Jingjing Chen, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, and Tat-Seng Chua. IEEE Transactions on Multimedia (TMM), 2021.
Modeling the structure of culinary recipes is the core of recipe representation learning. Current approaches mostly focus on extracting the workflow graph from recipes based on text descriptions. Process images, which constitute an important part of cooking recipes, have rarely been investigated in recipe structure modeling. We study this recipe structure problem from a multi-modal learning perspective by proposing a prerequisite tree to represent recipes with cooking images at step-level granularity. We propose a simple yet effective two-stage framework to automatically construct the prerequisite tree for a recipe by (1) using a trained classifier, which fuses multi-modal features as input, to detect pairwise prerequisite relations, and then (2) applying different strategies (greedy method, maximum weight, and beam search) to build the tree structure. Experiments on the MM-ReS dataset demonstrate the advantages of introducing process images for recipe structure modeling. Moreover, compared with neural methods that require large amounts of training data, we show that our two-stage pipeline can achieve promising results using only 400 labeled prerequisite trees as training data.
@article{DBLP:journals/tmm/PanCLNKC21, author = {Pan, Liangming and Chen, Jingjing and Liu, Shaoteng and Ngo, Chong{-}Wah and Kan, Min{-}Yen and Chua, Tat{-}Seng}, title = {A Hybrid Approach for Detecting Prerequisite Relations in Multi-Modal Food Recipes}, journal = {IEEE Transactions on Multimedia (TMM)}, volume = {23}, pages = {4491--4501}, year = {2021}, url = {https://doi.org/10.1109/TMM.2020.3042706} }
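Of the three tree-construction strategies mentioned above, the greedy one is the easiest to sketch: attach each step to the earlier step with the highest predicted prerequisite probability. `pairwise_score` is an assumed stand-in for the trained multi-modal classifier, so this is an illustration rather than the paper's code.

```python
# Greedy tree construction from pairwise prerequisite scores (sketch).
# `pairwise_score(a, b)` stands in for the trained classifier's predicted
# probability that step `a` is a prerequisite of step `b`.

def build_prerequisite_tree(steps, pairwise_score):
    parents = {}
    for j in range(1, len(steps)):
        # Attach step j to its highest-scoring earlier step.
        parents[j] = max(range(j), key=lambda i: pairwise_score(steps[i], steps[j]))
    return parents  # maps step index -> parent index; step 0 is the root
```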
2020
- Semantic Graphs for Generating Deep Questions. Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, and Min-Yen Kan. In Annual Meeting of the Association for Computational Linguistics (ACL), 2020. ACL Oral Presentation.
This paper proposes the problem of Deep Question Generation (DQG), which aims to generate complex questions that require reasoning over multiple pieces of information in the input passage. In order to capture the global structure of the document and facilitate reasoning, we propose a novel framework that first constructs a semantic-level graph for the input document and then encodes the semantic graph with an attention-based GGNN (Att-GGNN). Afterward, we fuse the document-level and graph-level representations to perform joint training of content selection and question decoding. On the HotpotQA deep-question-centric dataset, our model greatly improves performance on questions requiring reasoning over multiple facts, achieving state-of-the-art performance.
@inproceedings{pan-etal-2020-semantic, title = {Semantic Graphs for Generating Deep Questions}, author = {Pan, Liangming and Xie, Yuxi and Feng, Yansong and Chua, Tat-Seng and Kan, Min-Yen}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2020}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2020.acl-main.135}, pages = {1463--1475} }
- Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen. Yixin Cao, Ruihao Shui, Liangming Pan, Min-Yen Kan, Zhiyuan Liu, and Tat-Seng Chua. In Annual Meeting of the Association for Computational Linguistics (ACL), 2020. ACL Oral Presentation.
The curse of knowledge can impede communication between experts and laymen. We propose a new task of expertise style transfer and contribute a manually annotated dataset with the goal of alleviating such cognitive biases. Solving this task not only simplifies the professional language, but also improves the accuracy and expertise level of laymen descriptions using simple words. This is a challenging task, unaddressed in previous work, as it requires the models to have expert intelligence in order to modify text with a deep understanding of domain knowledge and structures. We establish the benchmark performance of five state-of-the-art models for style transfer and text simplification. The results demonstrate a significant gap between machine and human performance. We also discuss the challenges of automatic evaluation, to provide insights into future research directions.
@inproceedings{cao-etal-2020-expertise, title = {Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen}, author = {Cao, Yixin and Shui, Ruihao and Pan, Liangming and Kan, Min-Yen and Liu, Zhiyuan and Chua, Tat-Seng}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2020}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2020.acl-main.100}, pages = {1061--1071} }
- Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment. Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, Zhiyuan Liu, and Tat-Seng Chua. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs. GNN-based EA methods present promising performance by modeling the KG structure defined by relation triples. However, attribute triples can also provide a crucial alignment signal but have not yet been well explored. In this paper, we propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently. Besides, the performance of current EA methods is overestimated because of the name bias of existing EA datasets. To make an objective evaluation, we propose a hard experimental setting where we select equivalent entity pairs with very different names as the test set. Under both the regular and hard settings, our method achieves significant improvements (5.10% on average Hits@1 in DBP15k) over 12 baselines on cross-lingual and monolingual datasets. Ablation studies on different subgraphs and a case study about attribute types further demonstrate the effectiveness of our method.
@inproceedings{liu-etal-2020-exploring, title = {Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment}, author = {Liu, Zhiyuan and Cao, Yixin and Pan, Liangming and Li, Juanzi and Liu, Zhiyuan and Chua, Tat-Seng}, booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year = {2020}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2020.emnlp-main.515}, pages = {6355--6364} }
- Exploring Question-Specific Rewards for Generating Deep Questions. Yuxi Xie, Liangming Pan*, Dongzhe Wang, Min-Yen Kan, and Yansong Feng. In International Conference on Computational Linguistics (COLING), 2020.
Recent question generation (QG) approaches often utilize the sequence-to-sequence framework (Seq2Seq) to optimize the log-likelihood of ground-truth questions using teacher forcing. However, this training objective is inconsistent with actual question quality, which is often reflected by certain global properties, such as whether the question can be answered by the document. As such, we directly optimize for QG-specific objectives via reinforcement learning to improve question quality. We design three different rewards that target the fluency, relevance, and answerability of the generated questions. We conduct both automatic and human evaluations, in addition to thorough analysis, to explore the effect of each QG-specific reward. We find that optimizing for question-specific rewards generally leads to better performance in automatic evaluation metrics. However, only the rewards that correlate well with human judgement (e.g., relevance) lead to real improvements in question quality. Optimizing for the others, especially answerability, introduces incorrect bias to the model, resulting in poorer question quality.
@inproceedings{xie-etal-2020-exploring, title = {Exploring Question-Specific Rewards for Generating Deep Questions}, author = {Xie, Yuxi and Pan, Liangming and Wang, Dongzhe and Kan, Min-Yen and Feng, Yansong}, booktitle = {International Conference on Computational Linguistics (COLING)}, year = {2020}, address = {Barcelona, Spain (Online)}, publisher = {International Committee on Computational Linguistics}, url = {https://aclanthology.org/2020.coling-main.228}, pages = {2534--2546} }
- Multi-modal Cooking Workflow Construction for Food Recipes. Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, and Tat-Seng Chua. In ACM International Conference on Multimedia (ACM MM), 2020. ACM MM Oral Presentation.
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that utilizes both visual and textual information to construct the cooking workflow, achieving an over 20% performance gain over existing hand-crafted baselines.
@inproceedings{DBLP:conf/mm/PanCWLNKJC20, author = {Pan, Liangming and Chen, Jingjing and Wu, Jianlong and Liu, Shaoteng and Ngo, Chong{-}Wah and Kan, Min{-}Yen and Jiang, Yu{-}Gang and Chua, Tat{-}Seng}, title = {Multi-modal Cooking Workflow Construction for Food Recipes}, booktitle = {ACM International Conference on Multimedia (ACM MM)}, pages = {1132--1141}, publisher = {{ACM}}, year = {2020}, url = {https://doi.org/10.1145/3394171.3413765} }
- Hyperbolic Visual Embedding Learning for Zero-Shot Recognition. Shaoteng Liu, Jingjing Chen, Liangming Pan, Chong-Wah Ngo, Tat-Seng Chua, and Yu-Gang Jiang. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
This paper proposes a Hyperbolic Visual Embedding Learning Network for zero-shot recognition. The network learns image embeddings in hyperbolic space, which is capable of preserving the hierarchical structure of semantic classes in low dimensions. Compared with existing zero-shot learning approaches, the network is more robust because the embedding features in hyperbolic space better represent the class hierarchy and thereby avoid being misled by unrelated siblings. Our network outperforms existing baselines under hierarchical evaluation in an extremely challenging setting, i.e., learning from only 1,000 categories to recognize 20,841 unseen categories. Under flat evaluation, it performs competitively with state-of-the-art methods while using embedding dimensions five times lower.
@inproceedings{DBLP:conf/cvpr/LiuCPNCJ20, author = {Liu, Shaoteng and Chen, Jingjing and Pan, Liangming and Ngo, Chong{-}Wah and Chua, Tat{-}Seng and Jiang, Yu{-}Gang}, title = {Hyperbolic Visual Embedding Learning for Zero-Shot Recognition}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, pages = {9270--9278}, publisher = {Computer Vision Foundation / {IEEE}}, year = {2020}, url = {https://openaccess.thecvf.com/content_CVPR_2020/html/Liu_Hyperbolic_Visual_Embedding_Learning_for_Zero-Shot_Recognition_CVPR_2020_paper.html} }
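The key property here is the hyperbolic metric itself: distances near the boundary of the Poincaré ball grow rapidly, which lets low-dimensional embeddings encode tree-like class hierarchies. Below is the standard Poincaré-ball distance, shown for reference under the assumption that embeddings live in the Poincaré ball; it is not the paper's training code.

```python
import numpy as np

# Standard distance on the Poincaré ball, the metric commonly underlying
# hyperbolic embedding methods. Inputs must lie strictly inside the unit ball.

def poincare_distance(x, y):
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

# Points nearer the boundary are "farther apart" than their Euclidean
# distance suggests, which is what preserves hierarchy in low dimensions.
print(poincare_distance(np.array([0.0, 0.0]), np.array([0.5, 0.0])))  # ~1.099
print(poincare_distance(np.array([0.4, 0.0]), np.array([0.9, 0.0])))  # larger (~2.08)
```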
- Zero-Shot Ingredient Recognition by Multi-Relational Graph Convolutional Network. Jingjing Chen, Liangming Pan, Zhipeng Wei, Xiang Wang, Chong-Wah Ngo, and Tat-Seng Chua. In AAAI Conference on Artificial Intelligence (AAAI), 2020.
Recognizing the ingredients of a given dish image is at the core of automatic dietary assessment, attracting increasing attention from both industry and academia. Nevertheless, the task is challenging due to the difficulty of collecting and labeling sufficient training data. On one hand, there are hundreds of thousands of food ingredients in the world, ranging from common to rare, and collecting training samples for all ingredient categories is difficult. On the other hand, as ingredient appearances exhibit huge visual variance during food preparation, robust recognition requires training samples collected under different cooking and cutting methods. Since obtaining sufficient fully annotated training data is not easy, a more practical way of scaling up recognition is to develop models capable of recognizing unseen ingredients. Therefore, in this paper, we target the problem of ingredient recognition with zero training samples. More specifically, we introduce a multi-relational GCN (graph convolutional network) that integrates ingredient hierarchy, attributes, and co-occurrence for zero-shot ingredient recognition. Extensive experiments on both Chinese and Japanese food datasets demonstrate the superior performance of the multi-relational GCN and shed light on zero-shot ingredient recognition.
@inproceedings{DBLP:conf/aaai/ChenPWWNC20, author = {Chen, Jingjing and Pan, Liangming and Wei, Zhipeng and Wang, Xiang and Ngo, Chong{-}Wah and Chua, Tat{-}Seng}, title = {Zero-Shot Ingredient Recognition by Multi-Relational Graph Convolutional Network}, booktitle = {AAAI Conference on Artificial Intelligence (AAAI)}, pages = {10542--10550}, publisher = {{AAAI} Press}, year = {2020}, url = {https://doi.org/10.1609/aaai.v34i07.6626} }
2017
- Prerequisite Relation Learning for Concepts in MOOCs. Liangming Pan, Chengjiang Li, Juanzi Li, and Jie Tang. In Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
What prerequisite knowledge should students master before moving forward to learn subsequent courseware? We study the extent to which the prerequisite relations between knowledge concepts in Massive Open Online Courses (MOOCs) can be inferred automatically; in particular, what kinds of information can be leveraged to uncover potential prerequisite relations between knowledge concepts. We first propose a representation-learning-based method for learning latent representations of course concepts, and then investigate how different features capture the prerequisite relations between concepts. Our experiments on three datasets from Coursera show that the proposed method achieves significant improvements (+5.9–48.0% by F1-score) compared with existing methods.
@inproceedings{pan-etal-2017-prerequisite, title = {Prerequisite Relation Learning for Concepts in {MOOC}s}, author = {Pan, Liangming and Li, Chengjiang and Li, Juanzi and Tang, Jie}, booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, year = {2017}, address = {Vancouver, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P17-1133}, pages = {1447--1456} }
- Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation. Liangming Pan, Xiaochen Wang, Chengjiang Li, Juanzi Li, and Jie Tang. In International Joint Conference on Natural Language Processing (IJCNLP), 2017.
Massive Open Online Courses (MOOCs), offering a new way to study online, are revolutionizing education. One challenging issue in MOOCs is how to design effective and fine-grained course concepts such that students with different backgrounds can grasp the essence of the course. In this paper, we conduct a systematic investigation of the problem of course concept extraction for MOOCs. We propose to learn latent representations for candidate concepts via an embedding-based method. Moreover, we develop a graph-based propagation algorithm to rank the candidate concepts based on the learned representations. We evaluate the proposed method using different courses from XuetangX and Coursera. Experimental results show that our method significantly outperforms all the alternative methods (+0.013–0.318 in terms of R-precision; p << 0.01, t-test).
@inproceedings{pan-etal-2017-course, title = {Course Concept Extraction in {MOOC}s via Embedding-Based Graph Propagation}, author = {Pan, Liangming and Wang, Xiaochen and Li, Chengjiang and Li, Juanzi and Tang, Jie}, booktitle = {International Joint Conference on Natural Language Processing (IJCNLP)}, year = {2017}, address = {Taipei, Taiwan}, publisher = {Asian Federation of Natural Language Processing}, url = {https://aclanthology.org/I17-1088}, pages = {875--884} }
PhD thesis
2022
- Towards Generating Deep Questions from Text. National University of Singapore, 2022.