Advanced AI Prompt Programming

AI prompt programming is used mostly for chatbots, but with careful prompt design it can also drive interactive applications. When ChatGPT is given context as a structured dataset (for example, in JSON), it can disambiguate entities that the user has referred to ambiguously and return structured data to your program.

To illustrate the concept, I wrote a small “interactive fiction” game in which the user is presented with text describing a fantasy environment. The user navigates and interacts with it by issuing commands such as “go west” or “eat the apple”.

Describing objects with basic definitions

The developer only needs to provide basic definitions of the objects in the game, and can rely on prompt programming both to render them into a coherent description and to parse complex requests from the user.

Here is how a room is described in the game data:

const RustySword: IFObject = addObject(
{ name: "Rusty Sword",
  description: "An ancient looking sword",
  extra: ["seen better days", "unsharpened"] })
const Entrance: IFRoom = addRoom(
{ name: "The Entrance",
  description: "the outside of a large construction, in front of a stone gate",
  extra: ["wild", "immersed in the forest", "ancient"],
  e: "Atrium",
  objects: [RustySword] })
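
The article does not show the types `IFObject` and `IFRoom` or the helpers `addObject` and `addRoom`. A minimal sketch of what they might look like follows, assuming a simple in-memory registry keyed by name. The field list is inferred from the example data and from the prompts, and the real project may well differ.

```typescript
// Hypothetical shapes inferred from the example data; not the project's actual definitions.
interface IFObject {
  name: string;
  description: string;
  extra?: string[];
}

type Direction = "n" | "e" | "s" | "w" | "nw" | "ne" | "sw" | "se";

// A room has the same basic fields, plus optional exits and contained objects.
type IFRoom = {
  name: string;
  description: string;
  extra?: string[];
  objects?: IFObject[];
} & Partial<Record<Direction, string>>;

// Assumed in-memory "game database", keyed by the (unique) name.
const objects: Record<string, IFObject> = {};
const rooms: Record<string, IFRoom> = {};

function has(registry: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(registry, key);
}

// Register an object and return it, so it can be referenced from rooms.
function addObject(obj: IFObject): IFObject {
  if (has(objects, obj.name)) throw new Error(`Duplicate object: ${obj.name}`);
  objects[obj.name] = obj;
  return obj;
}

// Register a room and return it.
function addRoom(room: IFRoom): IFRoom {
  if (has(rooms, room.name)) throw new Error(`Duplicate room: ${room.name}`);
  rooms[room.name] = room;
  return room;
}
```

Returning the registered value is what allows the `const RustySword = addObject(...)` style used in the example, where the object is later placed in a room by reference.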

The room data is fairly self-explanatory, and the prompt we give ChatGPT to explain how we want the room rendered makes it clearer still:

You are an interactive fiction game.
The user will provide you with json data about the current room the character
is in:
* Generic description (in the field 'description').
* Additional attributes of the room, usually a short list of adjectives,
  in the field 'extra'.
* Exits, in the fields 'n' for north, 'e' for east,
  's' for south, 'w' for west, 'nw' for north-west, 'ne' for north-east,
  'sw' for south-west and 'se' for south-east; the value will be a short
  description of where directions lead to.
* A list of objects contained in the room, each of which will have the
  attributes 'name', 'description' and 'extra'; in the room description,
  you only have to include the name and, optionally, render the 'extra'
  attributes.
You can pick and choose which of the elements in the 'extra' field for
rooms and objects you want to render each time.

And the result looks like this:

You are standing outside a large, wild construction immersed in the forest. The edifice, worn by the ticking of the ages, rests before you. Its imposing stone gate stands robust, tallying its grandeur. This is the mythical entrance, challenging and intriguing. To the east, a short journey will lead you to the Atrium. Your gaze moves to the ground and you spot something unusual. A Rusty Sword lies abandoned. It is an ancient looking sword, unsharpened and clearly having seen better days. Could it be of any use? Well, that is for you to discern.
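
One way to obtain output like this from a chat-style API is to send the prompt as the system message and the room JSON as the user message. The sketch below only builds the message list, using the `role`/`content` shape common to chat completion APIs. `ROOM_PROMPT`, `RoomData` and `buildRoomMessages` are illustrative names of my own, and the network call itself is left out.

```typescript
// Illustrative sketch: the names below are assumptions, not the project's code.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface RoomData {
  name: string;
  description: string;
  extra?: string[];
  objects?: { name: string; description: string; extra?: string[] }[];
  [exit: string]: unknown; // exits such as 'n', 'e', 'sw', as described in the prompt
}

// In a real program this would hold the full room-rendering prompt text.
const ROOM_PROMPT = "You are an interactive fiction game. (rest of the room prompt)";

function buildRoomMessages(room: RoomData): ChatMessage[] {
  return [
    { role: "system", content: ROOM_PROMPT },
    // The room is serialized as JSON, which is what the prompt tells the model to expect.
    { role: "user", content: JSON.stringify(room) },
  ];
}
```

Keeping the instructions in the system message and the data in the user message means the same prompt can be reused unchanged for every room.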

As a side note, I love the way ChatGPT decided to render the information I supplied in the extra fields. You can control how verbose or stylized the rendering is by adding requirements for a more florid or a more succinct output to the prompt.

Parsing user commands

You can harness the power of ChatGPT to break user commands down into more structured requests that can drive your program's logic.

This prompt categorizes the actions the user wants to perform and recognizes the entities they are performed on, returning a structured object that can be parsed programmatically:

You are an interactive fiction game. The user will provide you with a
json object, with the input stored in the field 'command'.
Reply with a json object containing:
* The 'action' the user wants to perform. It can be:
  * 'move' if the user types a directional command such as north, n,
    southeast, sw etc.
  * 'examine' if the user wants more information about a certain object,
    the room it's in or themselves.
  * 'use' if the user wants to use a specific object.
  * 'take' if the user wants to pick up an object from the room.
  * 'drop' if the user wants to drop an object in the player's inventory.
  * 'unknown' in other cases.
* The optional 'object' on which the action is applied.
  - If the action refers to the player themselves, set this field to "player".
  - If the action refers to the current location, set this field to "room".
  - If the action is "move", set this field to the direction the user wants
    to go (one or two letters, e.g. 'n' for north, 'sw' for south-west etc.)
  - If the action is applied on an object, set this field to the named object.
* The optional 'target', which is the additional ultimate target acted upon,
  if any.
* An optional 'error' field containing a concise explanation of why
  the action is misunderstood (e.g. a misspelling), or cannot be
  completed (e.g. the named object is not part of the given context).
  This includes actions with ambiguous or non-existent objects or targets.
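
Because the model's reply is free text that is merely expected to be JSON, it is worth validating it before it drives any game logic. The following is a minimal sketch, assuming the reply fields described in the prompt above. `ParsedCommand` and `parseCommandReply` are hypothetical names, and mapping every malformed reply to the 'unknown' action is my own choice.

```typescript
const ACTIONS = ["move", "examine", "use", "take", "drop", "unknown"] as const;
type Action = (typeof ACTIONS)[number];

interface ParsedCommand {
  action: Action;
  object?: string;
  target?: string;
  error?: string;
}

// Type guard: true only for one of the action names listed in the prompt.
function isAction(value: unknown): value is Action {
  return typeof value === "string" && (ACTIONS as readonly string[]).indexOf(value) !== -1;
}

// Keep a field only when it is actually a string.
function optionalString(value: unknown): string | undefined {
  return typeof value === "string" ? value : undefined;
}

function parseCommandReply(reply: string): ParsedCommand {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch (_err) {
    return { action: "unknown", error: "The reply was not valid JSON." };
  }
  if (typeof data !== "object" || data === null) {
    return { action: "unknown", error: "The reply was not a JSON object." };
  }
  const fields = data as Record<string, unknown>;
  const action = fields["action"];
  return {
    // Anything outside the known list is treated as 'unknown'.
    action: isAction(action) ? action : "unknown",
    object: optionalString(fields["object"]),
    target: optionalString(fields["target"]),
    error: optionalString(fields["error"]),
  };
}
```

With this in place the rest of the program can switch on `action` without worrying about unexpected values or wrongly typed fields.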

The elegance of this method is that it breaks the user's actions down into categories and also provides a coherent message that we can show directly to the user when the category is not recognized. For example:

Your command 'blurb the fuzz and get on with it now!' is not recognized. Please try to use different verbs or check your spelling.

Disambiguating target elements

Once the command has been broken down, we can try to match the objects the user refers to against what we know about the world.

We can harness the power of ChatGPT to identify even vaguely described objects. Suppose the user enters the command:

> tell me more about that sparkly thing

Now, the player has this object in their inventory:

const Ring: IFObject = addObject({
  name: "Golden Ring",
  description: "an ornate ring",
  extra: ["made of pure gold", "precious", "encrusted with diamonds"] })

We might infer that, being made of gold and encrusted with diamonds, this could be the sparkly thing the player is referring to. We can program the prompt to return exactly this object:

You are an interactive fiction game, and I need you to disambiguate the
object the player is referring to. The user data is a json object with one
field called "referred" which is the object named by the user, and a
structure "player" that contains:
- an inventory, with the set of objects the player is carrying,
- a room with a field called 'objects' that contains the list of objects
  that are in the room.
Every object has a field 'name' and possibly a list of 'extra',
containing additional information about that object.
I want you to return a json object with a single 'object' field; if you
can disambiguate the input, its value must be the exact 'name' field
of the object the user refers to; otherwise, return an empty json object.

The trace of the operations performed when we parse this command is:

[Log] {"action": "examine", "object": "sparkly thing"} (bundle.js, line 193)
[Log] Disambiguating "sparkly thing" (bundle.js, line 341)
[Log] chatbot returned {"object":"Golden Ring"} (bundle.js, line 348)
[Log] Disambiguated "sparkly thing" as {"name":"Golden Ring","description":"an ornate ring","extra":["made of pure gold","precious","encrusted with diamonds"]} (bundle.js, line 354)

In the first line we can see that ChatGPT was able to break the sentence down and correctly categorize the request “tell me more about…” as an “examine”. Then, by giving ChatGPT the player object, which has the golden ring in its inventory field, we obtain the identifier of the correct object. We can then look that identifier up in the game database, retrieve the correct object, and finally pass it to the text generator:

The player is currently in possession of a Golden Ring which is visually compelling. At first glance, it takes the form of an ornate ring, exquisitely fashioned and unerringly eye-catching. Further examination of this artifact reveals that it has been sculpted from pure gold, denoting its intrinsic value. The ring is identified to be precious, an aspect that intensifies its overall allure. Adding to its splendid aesthetic are diamonds tucked into its body, their brilliant shimmer contributing to the ring's overall majestic demeanor. The Golden Ring appears to hold immense worth, both in value and beauty.
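
The disambiguation round trip could be sketched as below. The payload mirrors the fields the disambiguation prompt describes ('referred', 'player', 'inventory', 'room', 'objects'). The helper names, and the nesting of `room` under `player`, are my reading of the prompt rather than the project's code. An empty reply object, or a name that is not actually in scope, yields `undefined`.

```typescript
// Hypothetical types; the field names follow the disambiguation prompt.
interface GameObject {
  name: string;
  description: string;
  extra?: string[];
}

interface PlayerState {
  inventory: GameObject[];
  room: { objects: GameObject[] };
}

// Build the user message for the disambiguation prompt.
function buildDisambiguationPayload(referred: string, player: PlayerState): string {
  return JSON.stringify({ referred, player });
}

// Map the model's reply back to a known object, or undefined if it cannot be resolved.
function resolveObject(reply: string, player: PlayerState): GameObject | undefined {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch (_err) {
    return undefined;
  }
  if (typeof data !== "object" || data === null) return undefined;
  const name = (data as Record<string, unknown>)["object"];
  if (typeof name !== "string") return undefined;
  // Only accept names that match an object the player can actually reach.
  const inScope = player.inventory.concat(player.room.objects);
  for (const candidate of inScope) {
    if (candidate.name === name) return candidate;
  }
  return undefined;
}
```

Checking the returned name against the objects in scope guards against the model inventing a name that does not exist in the game data.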

In addition, this prompt is able to detect missing objects. After the command:

> eat the apple

the sequence is as follows:

[Log] {"action": "use", "object": "apple"} (bundle.js, line 193)
[Log] Disambiguating "apple" (bundle.js, line 341)
[Log] chatbot returned {} (bundle.js, line 348)
[Log] Can't disambiguate "apple" ({}): What do you mean with apple? (bundle.js, line 357)

Note how ChatGPT was able to capture the relationships between objects and to generate descriptions for the contents of the room and, especially, for the objects the user is carrying in their inventory (even noting that the ring was a 'single object' in the player's inventory), all without explicit prompting.

Optimizing the prompts

The prompts presented here are relatively heavy. I intentionally made them verbose, partly to give readers of this article a little more context about what was happening behind the scenes.

Since the OpenAI API is billed per token (input + output), for real applications you will want to refine the prompts and trim them down to the bare minimum.

A technique I found effective for testing, debugging and optimizing prompts is to feed them to the free ChatGPT web interface and test them against raw JSON input to see how ChatGPT responds.
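
To get a rough feel for how much a trimmed prompt saves, a character-based estimate can be enough during early iterations. The sketch below assumes the common rule of thumb of about four characters per token for English text. It is an approximation and not a tokenizer, and `estimateTokens` and `estimateSavings` are hypothetical helper names.

```typescript
// Rough rule of thumb for English text: about 4 characters per token.
// This is only an approximation; real counts depend on the model's tokenizer.
const APPROX_CHARS_PER_TOKEN = 4;

// Estimate the token count of a piece of text, rounding up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Estimated tokens saved per request by replacing a verbose prompt with a trimmed one.
function estimateSavings(verbosePrompt: string, trimmedPrompt: string): number {
  return estimateTokens(verbosePrompt) - estimateTokens(trimmedPrompt);
}
```

Since the prompt is sent again with every request, even a modest per-request saving adds up over a play session. For billing-grade numbers, the token usage reported by the API itself is the figure to trust.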

The project

At the moment, this is just a demo I wrote to show the potential of programmatic AI prompting. Having been an interactive fiction writer in the past, I plan to extend this project into the basis of an AI-powered interactive fiction engine. If you are interested, you can follow the project on GitHub: https://github.com/jonnymind/AIF#readme

Conclusions

Programmatic AI prompting is still in its infancy, but its potential is already visible. As AI API offerings grow, we can expect a real reduction in the time-to-market of highly professional and versatile applications; consider that this small game took about half a day of coding.