DALL-E

DALL-E, DALL-E 2, and DALL-E 3 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as prompts.

History and background

DALL-E was revealed by OpenAI in a blog post on 5 January 2021, and uses a version of GPT-3 modified to generate images. == Technology ==

Technology

The first generative pre-trained transformer (GPT) model was initially developed by OpenAI in 2018, A trained CLIP pair is used to filter a larger initial list of images generated by DALL-E to select the image that is closest to the text prompt. == Capabilities ==

Capabilities

DALL-E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji. It can "manipulate and rearrange" objects in its images, and can correctly place design elements in novel compositions without explicit instruction. Thom Dunn writing for BoingBoing remarked that "For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL-E often draws the handkerchief, hands, and feet in plausible locations." DALL-E showed the ability to "fill in the blanks" to infer appropriate details without specific prompts, such as adding Christmas imagery to prompts commonly associated with the celebration, and appropriately placed shadows to images that did not mention them. Furthermore, DALL-E exhibits a broad understanding of visual and design trends. DALL-E can produce images for a wide variety of arbitrary descriptions from various viewpoints with only rare failures. Mark Riedl, an associate professor at the Georgia Tech School of Interactive Computing, found that DALL-E could blend concepts (described as a key element of human creativity). Its visual reasoning ability is sufficient to solve Raven's Matrices (visual tests often administered to humans to measure intelligence). DALL-E 3 follows complex prompts with more accuracy and detail than its predecessors, and is able to generate more coherent and accurate text. DALL-E 3 is integrated into ChatGPT Plus. Image modification Given an existing image, DALL-E 2 and DALL-E 3 can produce "variations" of the image as individual outputs based on the original, as well as edit the image to modify or expand upon it. The "inpainting" and "outpainting" abilities of these models use context from an image to fill in missing areas using a medium consistent with the original, following a given prompt. For example, this can be used to insert a new subject into an image, or expand an image beyond its original borders. According to OpenAI, "Outpainting takes into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image." Technical limitations DALL-E 2's language understanding has limits. It is sometimes unable to distinguish "A yellow book and a red vase" from "A red book and a yellow vase" or "A panda making latte art" from "Latte art of a panda". It generates images of an astronaut riding a horse when presented with the prompt "a horse riding an astronaut". It also fails to generate the correct images in a variety of circumstances. Requesting more than three objects, negation, numbers, and connected sentences may result in mistakes, and object features may appear on the wrong object. Additional limitations include generating text, ambigrams and other forms of typography, which often results in dream-like gibberish. The model also has a limited capacity to address scientific information, such as astronomy or medical imagery. text using the prompt "a person pointing at a tanuki, with a speech bubble that says ', which results in the text being rendered with nonsensical kanji and kana == Ethical concerns ==

Ethical concerns

DALL-E 2's reliance on public datasets influences its results and leads to algorithmic bias in some cases, such as generating higher numbers of men than women for requests that do not mention gender. However, DALL-E 3 continues to disproportionally represent people as White, female, and youthful. Users are able to somewhat remedy this through more specific prompts for image generation. A concern about DALL-E 2 and similar image generation models is that they could be used to propagate deepfakes and other forms of misinformation. legal concerns have been raised regarding who owns those images. In 2023 Microsoft pitched the United States Department of Defense to use DALL-E models to train battlefield management systems. In January 2024 OpenAI removed its blanket ban on military and warfare use from its usage policies. == Reception ==

Reception

Most coverage of DALL-E focuses on a small subset of "surreal" or "quirky" outputs. DALL-E's output for "an illustration of a baby daikon radish in a tutu walking a dog" was mentioned in pieces from Input, NBC, Nature, and other publications. Its output for "an armchair in the shape of an avocado" was also widely covered. ExtremeTech stated "you can ask DALL-E for a picture of a phone or vacuum cleaner from a specified period of time, and it understands how those objects have changed". Engadget also noted its unusual capacity for "understanding how telephones and other objects change over time". According to MIT Technology Review, one of OpenAI's objectives was to "give language models a better grasp of the everyday concepts that humans use to make sense of things". Wall Street investors have had a positive reception of DALL-E 2, with some firms thinking it could represent a turning point for a future multi-trillion dollar industry. By mid-2019, OpenAI had already received over $1 billion in funding from Microsoft and Khosla Ventures, and in January 2023, following the launch of DALL-E 2 and ChatGPT, received an additional $10 billion in funding from Microsoft. Japan's anime community has had a negative reaction to DALL-E 2 and similar models. Two arguments are typically presented by artists against the software. The first is that AI art is not art because it is not created by a human with intent. "The juxtaposition of AI-generated images with their own work is degrading and undermines the time and skill that goes into their art. AI-driven image generation tools have been heavily criticized by artists because they are trained on human-made art scraped from the web." The second is the trouble with copyright law and data text-to-image models are trained on. OpenAI has not released information about what dataset(s) were used to train DALL-E 2, inciting concern from some that the work of artists has been used for training without permission. Copyright laws surrounding these topics are inconclusive at the moment. After integrating DALL-E 3 into Bing Chat and ChatGPT, Microsoft and OpenAI faced criticism for excessive content filtering, with critics saying DALL-E had been "lobotomized." The flagging of images generated by prompts such as "man breaks server rack with sledgehammer" was cited as evidence. Over the first days of its launch, filtering was reportedly increased to the point where images generated by some of Bing's own suggested prompts were being blocked. TechRadar argued that leaning too heavily on the side of caution could limit DALL-E's value as a creative tool. == Open-source implementations ==

Open-source implementations

Since OpenAI has not released source code or weights for any of the three models, there have been several attempts to create open-source models offering similar capabilities. == See also ==

Source: Wikipedia ↗

tickerdossier.com tickerdossier.substack.com