HyperAI超神経

Using ASTs to Fix Hallucinated Image URLs in LLM-Generated Code

3 days ago

Recently, a tech startup building a platform for users to create web apps via simple prompts encountered significant issues, particularly with invalid or "hallucinated" image URLs in the frontend code generated by large language models (LLMs). These issues ranged from syntactical errors to inconsistent styles, necessitating a robust solution to ensure the validity and relevance of image URLs in generated code.

The Problem

When generating web pages using LLMs, it's common for the code to include image URLs that might not point to valid or relevant images. For instance, the source code might contain <img src="https://example.com/image1.png" /> or <Image src="https://cdn.site.com/pic.jpg" />, where the URLs are placeholders or incorrect. This poses challenges as these URLs need to be identified and replaced with valid ones without altering the structural integrity or syntax of the code.

Options Considered

Regex

Using regular expressions (regex) to identify image URLs seemed like an initial simple solution. However, regex proved insufficient due to the complexity and variability of how images can be represented in JSX code:

- HTML <img> tags
- Framework-specific <Image> tags
- Custom components like <ProfileImage>
- Object literals and dynamic properties

LLMs

Deploying LLMs to parse and extract image URLs appeared more promising but came with significant drawbacks:

- Overkill: Using an LLM for a relatively simple task is inefficient.
- Cost: While the individual API calls might be inexpensive, costs can accumulate over time.
- Speed: LLMs can be slow to respond, especially during high traffic.
- Non-determinism: LLMs, being probabilistic, might fail in edge cases, leading to inconsistent results.

Solution: ASTs

To effectively address these issues, the startup turned to Abstract Syntax Trees (ASTs). An AST is a tree representation of the source code, breaking it down into manageable syntactic components such as JSX elements, functions, and JavaScript objects.
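To make the tree idea concrete, here is a minimal sketch of what the AST for `<div><img src="/a.png" /><p>Hi</p></div>` might look like, written by hand as plain JavaScript objects. The node type names (JSXElement, JSXOpeningElement, JSXAttribute, StringLiteral, JSXText) follow Babel's conventions, but the tree is heavily simplified compared with real parser output, and the `collectImgSrc` helper is hypothetical, introduced only for illustration.

```javascript
// Hand-written, simplified AST for: <div><img src="/a.png" /><p>Hi</p></div>
// Real parser output carries many more fields (locations, closing elements, ...).
const el = (tag, attributes = [], children = []) => ({
  type: 'JSXElement',
  openingElement: {
    type: 'JSXOpeningElement',
    name: { type: 'JSXIdentifier', name: tag },
    attributes,
  },
  children,
});

const strAttr = (name, value) => ({
  type: 'JSXAttribute',
  name: { type: 'JSXIdentifier', name },
  value: { type: 'StringLiteral', value },
});

const ast = el('div', [], [
  el('img', [strAttr('src', '/a.png')]),
  el('p', [], [{ type: 'JSXText', value: 'Hi' }]),
]);

// Hypothetical helper: depth-first walk collecting literal src values of <img> nodes.
function collectImgSrc(node, found = []) {
  if (node.type !== 'JSXElement') return found;
  const { name, attributes } = node.openingElement;
  if (name.name === 'img') {
    for (const attr of attributes) {
      const isLiteralSrc =
        attr.type === 'JSXAttribute' &&
        attr.name.name === 'src' &&
        attr.value !== null &&
        attr.value.type === 'StringLiteral';
      if (isLiteralSrc) found.push(attr.value.value);
    }
  }
  for (const child of node.children) collectImgSrc(child, found);
  return found;
}

console.log(collectImgSrc(ast)); // [ '/a.png' ]
```

Because the walker checks node types rather than matching characters, the `<p>` text and the `<div>` wrapper are never mistaken for image references, which is the property the regex approach lacked.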
This approach allows precise identification and manipulation of code segments, making it ideal for tasks like replacing invalid image URLs.

Implementing the Solution

Constructing the AST: The code uses @babel/parser, the parser from the Babel toolchain, to construct the AST from the given source code.

```javascript
const babelParser = require('@babel/parser');

// `code` is the LLM-generated source as a string.
// Parse it as an ES module with JSX and TypeScript syntax enabled.
const ast = babelParser.parse(code, {
  sourceType: 'module',
  plugins: ['jsx', 'typescript'],
});
```

Traversing the AST: The AST is traversed to collect every candidate image node: <img> and <Image> elements, plus object literals that carry a src, url, or image property. Each candidate is recorded as its generated source text.

```javascript
const traverse = require('@babel/traverse').default;
const generator = require('@babel/generator').default;

// Property names treated as image references in object literals.
const IMAGE_KEYS = ['src', 'url', 'image'];
const candidateNodes = [];

traverse(ast, {
  ObjectExpression(path) {
    // Record the object once if any plain property has an image-like key.
    const hasImageProp = path.node.properties.some(
      (prop) => prop.type === 'ObjectProperty' && IMAGE_KEYS.includes(prop.key.name)
    );
    if (hasImageProp) {
      candidateNodes.push(generator(path.node).code);
    }
  },
  JSXElement(path) {
    const openingElement = path.node.openingElement;
    const tagName = openingElement.name && openingElement.name.name;
    if (tagName !== 'img' && tagName !== 'Image') return;

    // Check the type first: spread attributes ({...props}) have no `name`.
    const srcAttr = openingElement.attributes.find(
      (attr) => attr.type === 'JSXAttribute' && attr.name.name === 'src'
    );
    // Keep elements with a string-literal src or no src at all; dynamic
    // expressions such as src={item.image} are left untouched.
    const hasLiteralSrc =
      srcAttr && srcAttr.value && srcAttr.value.type === 'StringLiteral';
    if (!srcAttr || hasLiteralSrc) {
      candidateNodes.push(generator(openingElement).code);
    }
  },
});
```

Updating Image Nodes: Once the image nodes are identified, the next step involves generating relevant captions and using a text-to-image model to create the actual images. The updateImageNodesWithMetaData function replaces the hallucinated image URLs with the generated ones and adds or updates the alt text.
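The source does not show how the generated images are paired with the collected nodes. Judging from `new Map(nodeResults)` and the `image_url` / `description` lookups in the update code, one plausible shape is an array of `[nodeSourceCode, metadata]` pairs. The sketch below builds such an array under that assumption; `describeAndGenerate` and `buildNodeResults` are hypothetical names, and the stub returns placeholder values instead of calling real captioning or text-to-image models.

```javascript
// Hypothetical stand-in for the captioning + text-to-image step.
// A real implementation would call external models; this stub returns
// placeholder values so the example runs offline.
async function describeAndGenerate(nodeSourceCode, index) {
  return {
    image_url: `https://images.example.invalid/generated-${index}.png`,
    description: `Generated description for candidate ${index}`,
  };
}

// Build [nodeSourceCode, metadata] pairs, the shape `new Map(...)` accepts.
async function buildNodeResults(candidateNodes) {
  // Deduplicate first: identical source text collapses to one Map key anyway,
  // so generating an image per duplicate would be wasted work.
  const unique = [...new Set(candidateNodes)];
  return Promise.all(
    unique.map(async (src, i) => [src, await describeAndGenerate(src, i)])
  );
}

// Usage with candidates as they might come out of the first traversal.
const candidateNodes = [
  '<img src="/hero.png" />',
  '{\n  id: 1,\n  image: "/hat.jpg"\n}',
  '<img src="/hero.png" />', // duplicate, generated only once
];

const ready = buildNodeResults(candidateNodes).then((nodeResults) => {
  const resultMap = new Map(nodeResults);
  console.log(resultMap.get('<img src="/hero.png" />').image_url);
  // https://images.example.invalid/generated-0.png
  return resultMap;
});
```

Keying by generated source text means the update pass has to regenerate exactly the same string for each node, which is why both passes call the generator on the same kind of node. The update step then consumes these pairs: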
```javascript
const traverse = require('@babel/traverse').default;
const generator = require('@babel/generator').default;
const t = require('@babel/types');

// nodeResults: [nodeSourceCode, { image_url, description }] pairs, keyed by
// the same generated source text collected during the first traversal.
function updateImageNodesWithMetaData(ast, nodeResults) {
  const resultMap = new Map(nodeResults);

  traverse(ast, {
    ObjectExpression(path) {
      const properties = path.node.properties;
      const result = resultMap.get(generator(path.node).code);
      if (!result) return;

      properties.forEach((prop) => {
        const isImageProp =
          prop.type === 'ObjectProperty' &&
          ['src', 'url', 'image'].includes(prop.key.name);
        if (!isImageProp) return;

        prop.value = t.stringLiteral(result.image_url);

        // Check the type first: spread elements have no `key`.
        const existingAlt = properties.find(
          (p) => p.type === 'ObjectProperty' && p.key.name === 'alt'
        );
        if (existingAlt) {
          existingAlt.value = t.stringLiteral(result.description);
        } else {
          properties.push(
            t.objectProperty(t.identifier('alt'), t.stringLiteral(result.description))
          );
        }
      });
    },
    JSXElement(path) {
      const openingElement = path.node.openingElement;
      const result = resultMap.get(generator(openingElement).code);
      if (!result) return;

      // Update the attribute if present, otherwise append it.
      const setAttr = (name, value) => {
        const attr = openingElement.attributes.find(
          (a) => a.type === 'JSXAttribute' && a.name.name === name
        );
        if (attr) {
          attr.value = t.stringLiteral(value);
        } else {
          openingElement.attributes.push(
            t.jsxAttribute(t.jsxIdentifier(name), t.stringLiteral(value))
          );
        }
      };

      // Elements collected without a src attribute receive one here.
      setAttr('src', result.image_url);
      setAttr('alt', result.description);
    },
  });

  return generator(ast).code;
}
```

Example

Consider the following input code snippet, which includes various forms of image representations, including object literals and JSX elements:

```javascript
import { useState } from 'react';

export default function EcommerceTShirts() {
  const [menuOpen, setMenuOpen] = useState(false);
  const [searchOpen, setSearchOpen] = useState(false);

  const hats = [
    { id: 1, name: "Simple Hat", price: "$24.99", image: "/hat.jpg", alt: "existing" },
    { id: 2, name: "New Hat", price: "$19.99", image: "/hat-new.jpg" },
  ];

  // The element tags below are assumed for illustration; only the data,
  // text, and expressions are taken from the source.
  return (
    <div>
      <h2>Featured T-Shirts</h2>
      {[
        { id: 1, name: "Graphic Tee", price: "$24.99", image: "/tshirt1.jpg" },
        { id: 2, name: "Pocket Tee", price: "$19.99", image: "/tshirt2.jpg" },
        { id: 3, name: "V-Neck Tee", price: "$22.99", image: "/tshirt3.jpg" },
        { id: 4, name: "Crew Neck Tee", price: "$21.99", image: "/tshirt4.jpg" },
      ].map((shirt) => (
        <div key={shirt.id}>
          <img src={shirt.image} alt={shirt.alt} />
          <h3>{shirt.name}</h3>
          <p>{shirt.price}</p>
          <button>Add to Cart</button>
        </div>
      ))}
      {hats.map((hat) => (
        <img key={hat.id} src={hat.image} alt={hat.alt} />
      ))}
    </div>
  );
}
```

After processing with the AST-based solution, the output code correctly replaces the hallucinated URLs and adds or updates the alt text:

```javascript
import { useState } from 'react';

export default function EcommerceTShirts() {
  const [menuOpen, setMenuOpen] = useState(false);
  const [searchOpen, setSearchOpen] = useState(false);

  const hats = [
    {
      id: 1,
      name: "Simple Hat",
      price: "$24.99",
      image: "s3.productx.hat.jpg",
      alt: "A simple classic hat in a neutral tone displayed on a flat surface",
    },
    {
      id: 2,
      name: "New Hat",
      price: "$19.99",
      image: "s3.productx.hat-new.jpg",
      alt: "A modern-style hat with a minimalist logo on the front panel",
    },
  ];

  // The element tags below are assumed for illustration; only the data,
  // text, and expressions are taken from the source.
  return (
    <div>
      <h2>Featured T-Shirts</h2>
      {[
        { id: 1, name: "Graphic Tee", price: "$24.99", image: "s3.productx.tshirt1.jpg", alt: "A white graphic t-shirt featuring bold abstract art" },
        { id: 2, name: "Pocket Tee", price: "$19.99", image: "s3.productx.tshirt2.jpg", alt: "A soft grey t-shirt with a front chest pocket" },
        { id: 3, name: "V-Neck Tee", price: "$22.99", image: "s3.productx.tshirt3.jpg", alt: "A navy V-neck t-shirt made from breathable cotton" },
        { id: 4, name: "Crew Neck Tee", price: "$21.99", image: "s3.productx.tshirt4.jpg", alt: "A classic black crew neck t-shirt with a tailored fit" },
      ].map((shirt) => (
        <div key={shirt.id}>
          <img src={shirt.image} alt={shirt.alt} />
          <h3>{shirt.name}</h3>
          <p>{shirt.price}</p>
          <button>Add to Cart</button>
        </div>
      ))}
      {hats.map((hat) => (
        <img key={hat.id} src={hat.image} alt={hat.alt} />
      ))}
    </div>
  );
}
```

Evaluation by Industry Insiders and Company Profiles

Industry experts agree that the AST-based approach provides a scalable, efficient, and deterministic solution compared to regex or LLMs.
It avoids the overhead and potential inaccuracies associated with using LLMs while ensuring the syntactical correctness of the code. Companies like Aider and GritSQL, which leverage ASTs for code linting and large-scale code maintenance, have demonstrated the effectiveness and reliability of this method.

The startup is now able to deliver consistent and error-free web apps, improving user experience and reducing development time. By adopting ASTs, the company not only solved the immediate problem of hallucinated image URLs but also enhanced its codebase's overall quality and maintainability. The same approach can be applied to other kinds of code manipulation and optimization, positioning the company as a leader in the field of generative AI for software development.

For more insights, check out Aider's blog on LLM-assisted code linting and the GritSQL talk on maintaining and manipulating code at scale using ASTs.
