I am a PhD student at the University of Texas at Austin (currently on leave) and a Research Scientist at Blackbird.AI, where I created Compass, an agentic fact-checker for multimodal social media content. My PhD advisor is Ray Mooney.
I completed my B.Sc. and M.Sc. in Computer Science at Brown University, where I was advised by Stefanie Tellex and George Konidaris in the Humans to Robots Lab.
News
- [November 2024]: Excited to share CaT-Bench! Our benchmark, CaT-Bench: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans, presented at EMNLP 2024, evaluates how language models handle step dependencies in procedure understanding, revealing challenges for state-of-the-art models. Check out CaT-Bench on HuggingFace and explore the code here.
- [August 2024]: We presented A Survey of Robotic Language Grounding in the IJCAI Survey Track! This paper explores the trade-offs in robotic language grounding between symbolic and vector-space representations, with insights into interpretability, generalizability, and scalability. Read more on the blog.
- [May 2024]: Announcing CAPE! In our ICRA 2024 presentation, CAPE: Corrective Actions from Precondition Errors using Large Language Models, we introduced a method that lets robots autonomously take corrective actions by re-prompting large language models. Our implementation on the Spot robot demonstrates significant advancements in LLM-assisted robotic planning. Discover more on the CAPE project page and check out our codebase.
Publications
2024
Yash Kumar Lal*, Vanya Cohen*, Nathanael Chambers, Niranjan Balasubramanian, Raymond Mooney.
EMNLP 2024, June 2024.
[paper] [dataset] [code]
A benchmark that evaluates language models' ability to reason about step dependencies in task plans, using causal and temporal relations. We find SOTA LLMs perform poorly on this task despite its simplicity.
Vanya Cohen*, Jason Xinyu Liu*, Raymond Mooney*, Stefanie Tellex*, David Watkins*.
IJCAI Survey Track, May 2024.
[paper] [project page]
Robotic language grounding methods can be positioned along an axis that ranges from methods that map natural language to formal symbolic representations to those that map to high-dimensional vector spaces. The survey explores the trade-offs between interpretability, generalizability, and scalability.
Shreyas Sundara Raman, Vanya Cohen, Ifrah Idrees, Eric Rosen, Ray Mooney, Stefanie Tellex, David Paulius.
ICRA 2024, May 2024.
[paper] [code] [project page]
CAPE resolves precondition errors in task planning for robotic agents by leveraging large language models. The method re-prompts LLMs using error feedback, allowing robots to take corrective actions in real-world environments.
2023
Vanya Cohen, Raymond Mooney
Workshop on Natural Language Reasoning and Structured Explanations at ACL 2023, June 2023, Pages 47-58.
[paper]
A symbolic planning-based decoder is introduced to enhance semantic parsing in instructional texts. Leveraging large language models, it generates action sequences in a formal language for improved execution accuracy in few-shot settings. Evaluation demonstrates significant gains in parsing quality across two recipe instruction domains.
2022
Vanya Cohen*, Geraud Nangue Tasse*, Nakul Gopalan, Steven James, Ray Mooney, Benjamin Rosman
Workshop on Language and Robotics at CoRL 2022, December 2022.
[paper]
This paper introduces an end-to-end model combining large language models with pretrained compositional value functions to execute goal-reaching tasks specified in natural language. Evaluations in the BabyAI environment demonstrate the model's ability to generalize zero-shot to new combinations of task attributes.
2021
Vanya Cohen*, Geraud Nangue Tasse*, Nakul Gopalan, Steven James, Matthew Gombolay, Benjamin Rosman
AI-HRI Symposium at AAAI-FSS 2021, October 2021.
[paper]
A new framework leverages compositionality in value functions and language to execute natural language instructions in goal-reaching tasks. Using Boolean algebra to compose value functions, the approach reduces training steps by 86% for new tasks in the BabyAI domain, demonstrating efficient generalization.
2020
Zhen Xu, Vanya Cohen, Shruti Mishra, MingYu Lu
NewInML @ NeurIPS 2020, December 2020.
[workshop]
Sessions included talks by renowned speakers such as Dr. Samy Bengio, Prof. David Jensen, Prof. Anima Anandkumar, and Prof. Isabelle Guyon, as well as a panel discussion with prominent ML experts. The workshop aimed to guide new researchers through the process of publishing high-quality papers, with oral presentations and awards for standout contributions.
Vanya Cohen, Aaron Gokaslan
XRDS: Crossroads, The ACM Magazine for Students, Fall 2020, September 2020.
[article]
Guest feature in ACM's XRDS Magazine. When OpenAI released its billion-parameter language model GPT-2, its attempts to withhold the model inspired two researchers to use open research practices to combat the misuse of machine learning.
2019
Aaron Gokaslan*, Vanya Cohen*, Ellie Pavlick, Stefanie Tellex.
NeurIPS NewInML Workshop, December 2019.
[article] [code]
OpenGPT-2 is a replication of OpenAI's GPT-2 model and one of the first publicly accessible language models. It used the OpenWebText dataset and helped pave the way for open-source LLMs.
Vanya Cohen*, Benjamin Burchfiel*, Thao Nguyen*, Nakul Gopalan, Stefanie Tellex, George Konidaris.
IROS 2019, November 2019.
[paper] [code] [project page]
This work presents a method to recognize 3D objects from natural language descriptions and depth images, leveraging unsupervised learning on 3D object meshes to generalize to novel viewpoints.
Aaron Gokaslan*, Vanya Cohen*, Ellie Pavlick, Stefanie Tellex.
NeurIPS NewInML Workshop, May 2019.
[dataset]
OpenWebText replicates OpenAI's WebText dataset and has become a widely used open-source dataset for training language models, with over 4 million downloads.