Tianxiang Xia

SHYI: Action Support for Contrastive Learning in High-Fidelity Text-to-Image Generation

Jan 15, 2025

Tianxiang Xia, Lin Xiao, Yannick Montorfani, Francesco Pavia, Enis Simsar, Thomas Hofmann

Abstract: In this project, we address the issue of infidelity in text-to-image generation, particularly for actions involving multiple objects. We build on the CONFORM framework, which uses contrastive learning to improve the accuracy of generated images containing multiple objects. However, the depiction of actions that involve multiple different objects still has considerable room for improvement. To improve it, we employ semantically hypergraphic contrastive adjacency learning, which comprises an enhanced contrastive structure and a "contrast but link" technique. We further improve Stable Diffusion's understanding of actions using InteractDiffusion. As evaluation metrics, we use CLIP image-text similarity and TIFA. In addition, we conducted a user study. Our method shows promising results even with verbs that Stable Diffusion understands only moderately well. We then analyze the results to outline future directions. Our codebase is available on polybox at: https://polybox.ethz.ch/index.php/s/dJm3SWyRohUrFxn

* Main content 4 pages

Newclid: A User-Friendly Replacement for AlphaGeometry

Nov 18, 2024

Vladmir Sicca, Tianxiang Xia, Mathïs Fédérico, Philip John Gorinski, Simon Frieder, Shangling Jui

Abstract: We introduce Newclid, a new symbolic solver for geometry based on AlphaGeometry. Newclid contains a symbolic solver called DDARN (derived from DDAR-Newclid), a significant refactoring and upgrade of AlphaGeometry's DDAR symbolic solver that is more user-friendly, both for the end user and for a programmer wishing to extend the codebase. For the programmer, improvements include a modularized codebase and new debugging and visualization tools. For the user, Newclid contains a new command line interface (CLI) that provides interfaces for agents to guide DDARN. DDARN is flexible with respect to its internal reasoning, which can be steered by agents. We also support input from GeoGebra to make Newclid accessible in educational contexts. In addition, the scope of problems that Newclid can solve has been expanded: it has an improved understanding of metric geometry concepts (length, angle) and can use theorems such as the Pythagorean theorem in proofs. Bugs have been fixed, and reproducibility has been improved. Lastly, we re-evaluated the five remaining problems from the original AG-30 dataset that AlphaGeometry was not able to solve and compared them against the abilities of DDARN running in breadth-first-search agentic mode (which corresponds to how DDARN runs by default), finding that DDARN solves one additional problem. We have open-sourced our code at: https://github.com/LMCRC/Newclid

* 51 pages
