Massive Multimodal Models

AI THROUGH PROMPT ENGINEERING CAN HELP DIPLOMACY AND NOT ONLY

@ndreah • May 18, 2024

“ChatGPT is an artificial intelligence (AI) chatbot that uses natural language processing to create humanlike conversational dialogue. The language model can respond to questions and compose various written content, including articles, social media posts, essays, code and emails.” [04] Amanda Hetler

"People have used ChatGPT to do the following:

Code computer programs and check for bugs in code.
Compose music.
Draft emails.
Summarize articles, podcasts or presentations.
Script social media posts.
Create titles for articles.
Solve math problems.
Discover keywords for search engine optimization.
Create articles, blog posts and quizzes for websites.
Reword existing content for a different medium, such as a presentation transcript for a blog post.
Formulate product descriptions.
Play games.
Assist with job searches, including writing resumes and cover letters.
Ask trivia questions.
Describe complex topics more simply.
Write video scripts.
Research markets for products.
Generate art." [04] Amanda Hetler

We can’t ignore the impact on our day-to-day activities of AI tools. They make the communication between humans and machines easier and simpler.

We can quickly have new insights and information on different fields of interests. But in order to improve the quality and the accuracy of responses we have to structure and phrase of the prompt appropriately.

It’s important to remember that ChatGPT doesn’t understand and doesn’t think, it generates responses based on patterns it learned during training.

The engine of ChatGPT is based on the concept of “token”. GPT (Generative Pre-trained Transformer) model generates the tokens predicting the most probable subsequent token using complex linear algebra.

The model uses an iterative process. It generates one token at time and after generating each token, it revisits the entire sequence of generated tokes and processes them again to generate the next token.

WHAT IS PROMPT ENGINEERING

It is the art of creating precise and effective prompts to guide AI models like ChatGPT toward generating the most accurate, useful outputs. We have to bear in mind that Better Input means Better Output (BI-BO).

The prompt engineering is very important for creating better AI-powered services and obtaining useful results from AI Tools.

When you craft the prompt, it’s important to bear in mind that ChatGPT has a token limit (generally 2048 tokens), which include both the prompt and the generated response. Long prompts can limit the length of response, for this reason it is important to keep prompts concise.

Let us now analyze some techniques of prompt engineering.

INFORMATION-SEEKING PROMPTING

It is used to gather information and to answer question what and how. Examples of prompt:

What are the best restaurants in Rome?
How do I cook pizza?

CONTEXT-PROVIDING PROMPTING

The prompts provide information to the model to perform a specific task. Example:

Prompt: I am planning to celebrate the Italian Republic Day in the [COUNTRY] can you suggest some original ideas to make it more enjoyable?

COMPARATIVE PROMPTING

It is used to ask the model to compare and to evaluate different options to help the user make an appropriate decision. Example:

Prompt: What are the strengths and weaknesses of [Option A] compared to [Option B].

OPINION SEEKING PROMPTING

It is used to ask the model to get the AI’s opinion on a given topic. Example:

Prompt: What would happen if we use only public transport in Rome?.

DIRECTIONAL PROMPTING

If you ask the model for generic question, you receive generic answer. You have to define your prompt with clear instruction and precise and descriptive information.

OUTPUT FORMATTING: CODE GENERATION

If you want to have a specific output or format of the output from ChatGPT, for example a program in Python or Visual Foxpro.

We try with a simple e specific request.

Prompt: Write a function in Python that takes as input three integers and gives as output the maximum of these three numbers.

ChatGPT 3.5 answer:

Prompt: Write a function in Visual Foxpro that takes as input three integers and gives as output the maximum of these three numbers.

Amazing, ChatGPT 3.5 answer:

VERY IMPORTANT

AI-generated code may need to be modified or tested before deploying it. 
It is strongly recommended to:
- always modify and review the generated code to ensure it meets your specific requirements;
- use it as STARTING-POINT;
- test and check the code;
- it ALWAYS NEEDS HUMAN OVERSIGHT.

ROLE PLAYING

You can also act as someone else when you interact with ChatGPT.

If you add a role to the question, you make ChatGPT change the answer and the quality and the tone of the output. In this way we got much better information. Here it is a schema to use when you build a prompt related a role playing:

WHO: you can ask ChatGPT to be what you want. You assign the role you need the model to play. A scientist, doctor, business man, chef and so on.
WHEN: you can put the character at any moment in time;
WHERE: you can put the character to a particular location or context.
WHY: you want to dialogue with the character for whatever reason, motivations or purpose you want;
WHAT: you want to dialogue with the character about what. That is the action you want the model to do.

We just need to verify the level of reliability and credibility given to this type of interaction.

Here is some practical examples.

Act as a character from a book:

Prompt: I want you to act like [character] from [book]. I want you to respond and answer like [character] using the tone, manner and vocabulary [character] would use.

Act as historical character:

Prompt: I want you to act as [historical character] to better understand the historical facts of that period.

Act as a political character:

Prompt: I want you to act as [political character] in order to ask as improving the quality of life of the people.

Act as a scientist:

Prompt: I want you to act as a scientist. You will apply your knowledge of scientific to propose useful strategies for saving the environment from pollution.

Act as a travel guide:

Prompt: I want you to act as a travel guide from Italy at the time of the Roman empire when Caesar was emperor. I will write you my location and you will suggest a place to visit near my location.

ZERO-SHOT PROMPTING

In zero-shot prompting, we use a prompt that describes the task, but it doesn't contain examples or demonstrations.

You use this prompt when you trust the model’s knowledge to provide a sufficient answer.

Prompt: Write a description of the Colosseum.

FEW SHOTS PROMPTING

It involves providing the model a few examples to guide its understanding of the desired outcome.

The example will be of:

Knowledge extracting;
And it’s formatting.

We can define the prompt like this:

Prompt: Here are some examples of each item of the list of best important business people.

X is the Y of Z
X -> [PERSON]
Y -> [POSITION/TITLE]
Z -> [COMPANY]

PUTTING ALL TOGETHER

You can also combine all these techniques:

Directional prompting;
Output formatting;
Role based prompting;
Few shots prompting.

CHAIN OF THOUGHT (CoT) PROMPTING

This technique encourages the model to break down complex tasks into smaller intermediate steps before arriving to conclusion. It improves the multi-step reasoning abilities of large language models (LLMs) and is helpful for complex problems that would be difficult or impossible to solve in a single step.

There are also variants of CoT prompting, such as "Tree-of-Thought" and "Graph-of-Thought", which were inspired by the success of CoT prompting.

IN STYLE PROMPTING

You can ask the model for the style of the output:

Writing as another author;
As emotional state;
In enthusiastic tone;
Writing something in a sad state;
Rewriting the following email in my style of writing;
Rewriting the following email in the style of xy;

STRUCTURING YOUR DATA & USING TABLES

You can ask the model to extract information in the way is useful. You can ask to extract information from the example and structure it into markdown table or a specific format.

Prompt: Generate a table of three column: name, function, phone numbers from the text.

Text: “Urs Wiedmer
Head of Communications
+41584645082
+41796919559
Markus Spörndli
Press spokesperson
Deputy Head of Communications
+41584634149
+41796747396
Irène Harnischberg
Press spokesperson
+41584622034
+41794567139
Charles-Étienne Viladoms
Webmaster
+41584622054
+41792194031
Loïc Zen-Ruffinen
Social Media Manager
Press spokesperson for French-speaking Switzerland
+41584817911
+41791507632"

Source: “https://www.wbf.admin.ch/wbf/en/home/dokumentation/dienstleistungen/dienstleistungen-wbf/zugang-zu-amtlichen-dokumenten.htm”

TEXT SUMMARIZATION

You can give to the model a big chunk of text and ask to summarize it.

For example, You can prompt:

Prompt: You are summarization bot any text that I provide to you summarize it, and create a title from it.

But a more effective technique could be:

Prompt: summarize the text below as a bullet point list of the the most import points.

Text: “ … “

If you need to generate a brief overview of a scientific paper don’t use generic instruction like “summarize the scientific paper” instead you should be more specific.

Prompt: generate a brief (approx. 300 words), of the following scientific paper. The summary should be understandable and clear especially to someone with no scientific background.

Paper: “ … “

TEXT CLASSIFICATION

You can use the model:

As SPAM DETECTOR in the mail;
To perform SENTIMENT ANALYSIS for brands and so on.

You can prompt:

You are a sentiment analysis bot. Classify any text that I provide into three classes:

NEGATIVE
POSITIVE
NEUTRAL

CONSIDERATIONS ON AI GENERATED RESPONSES

The AI-generated responses aren’t always correct. You have always to verify that the AI-generated output is accurate and up-to-date. This is important of you want to make an informed decision based on the response generated.

In any case, it is a good practice to have some idea of what you are asking for in order to properly evaluate the answer obtained from AI.

BIBLIOGRAPHY/WEBOGRAPHY

[01] openai.com;

[02] “ChatGPT Teacher Tips Part 1: Role-Playing Activities” https://edtechteacher.org/chatgptroleplaying/, March 2023;

[03] Aayush Mittal, “The Essential Guide to Prompt Engineering”, https://www.unite.ai/prompt-engineering-in-chatgpt/, April 2024;

[04] Amanda Hetler, “ChatGPT”, https://www.techtarget.com/whatis/definition/ChatGPT, December 2023;

{[(homo scripsit)]} - Not generated by AI tools or platforms.

AI in Action: BASIC CONCEPTS ABOUT LLM

@ndreah • April 29, 2024

LLMs - Large Language Models

LLMs are neural networks that can process and generate natural language text.

TRAINING PHASE

They are trained on a dataset of billions of sentences using unsupervised learning techniques. In the training process LLMs learn what is the most likely word to came next to the previous one based on huge amount of data.

INPUT BY USER

LLMs accept as input a text prompt by a user and in relation with it generate in output text, word by word (token by token).

GENERATION OF THE OUTPUT

The generation process consists in predicting the next word on the base of previously generated words. LLMs are trained in doing this without any consciousness which is a prerogative of the human mind.

Building a Simple Large Language Model

In this example we use as data the dystopian novel “Nineteen Eighty-Four – 1984” by English writer George Orwell, published on 1949.

Using the text of the novel as a data source, the following tables were produced. I show only a part of them:

Word/Token	Occurrences
the	6249
of	3309
a	2482
and	2326
to	2236
was	2213
He	1959
It	1864
in	1759
that	1457
had	1311
his	1079
you	1011
not	827
with	771
as	672
At	654
they	642
for	615
IS	614
but	611
be	608
on	604
were	583
there	559
Winston	526
him	512
i	495
which	443
s	439
one	426
or	424
…	…

Word/Token	Word Next	Score	Probability
of	the	743	0,01139
It	was	589	0,00903
in	the	574	0,00880
He	had	355	0,00544
he	was	273	0,00418
on	the	230	0,00352
was	a	225	0,00345
there	was	223	0,00342
to	the	212	0,00325
O	Brien	205	0,00314
to	be	203	0,00311
and	the	203	0,00311
had	been	202	0,00310
the	party	195	0,00299
at	the	183	0,00280
that	he	167	0,00256
from	the	161	0,00247
with	a	158	0,00242
did	not	148	0,00227
that	the	147	0,00225
of	a	145	0,00222
of	his	145	0,00222
out	of	142	0,00218
was	not	130	0,00199
with	the	127	0,00195
he	could	124	0,00190
it	is	124	0,00190
in	his	123	0,00188
in	a	122	0,00187
They	were	122	0,00187
seemed	to	115	0,00176
was	the	110	0,00169
could	not	109	0,00167
he	said	109	0,00167
the	same	103	0,00158
for	the	101	0,00155
by	the	95	0,00146
for	a	92	0,00141
into	the	92	0,00141
she	had	87	0,00133
as	though	82	0,00126
they	had	80	0,00123
that	it	80	0,00123
have	been	79	0,00121
and	a	78	0,00120
it	had	77	0,00118
The	other	76	0,00116
of	them	76	0,00116
to	him	75	0,00115
the	telescreen	75	0,00115
BIG	BROTHER	73	0,00112
…	…	…	…

This is a simple diagram to understand how the text is generated word by word.

For example, if I start with BIG, LLM will probably generate BROTHER, and continuing we can produce this sentence:

BIG	BROTHER	was	a	sort	of	the	thought	…
Probability	0,00112	0,0005	0,00345	0,00095	0,00104	0,00023	0,00087

ChatGPT is a LLM

By using “prompt” mechanism you can ask ChatGPT for what you want using the natural language.

But how ChatGPT “UNDERSTAND” text inserted by the user?

The text is transformed and each word represented by a code that computer can processed.

A way to represent individual words is Word2Vec technique in natural language processing (NLP), in which each word is represented by a vector (a set of numbers). This helped a computer to assign a meaning to the word.

Word2Vec stands for “words as vectors”. It means expressing each word in your text corpus in n-dimensional space. The word’s weight in each dimension defines it for the model.

The meaning of the words is based on the context defined by its neighboring words where they are associated.

A simple example of word representation using the Word2Vec approach in two-dimensional space.

Man = [1,4]

Woman = [1,3]

Manager = [4,2]

Actress = [4,1]

Manager	-	Man	+	Woman	=	Actress
[4,2]	-	[1,4]	+	[1,3]	=	[4,1]

In the following picture we have the graphic representation.

This is what happens when you sent some prompt to ChatGPT.

The text is converted and split in tokens;

[10,10], [10,31], [10,15], [14,44], [8,5], …

(you, are , an, ICT, specialist, with, a, lot, of, experience)

An algorithm (like ChatGPT) makes some prediction and output text word by word.

[10,10],…

(you,can,have, an, important, and, well-paid, job)

ChatGPT and PROMPT ENGINEERING

Let us now analyze some techniques to better exploit the potential of ChatGPT.

DIRECTIONAL PROMPTING

If you submit the same question to ChatGPT many times, you will likely receive different answers.

How can you use directional prompting in order to get more precise answer?

You have to give more information and to be more descriptive when you define a prompt. You have to give clear instruction. This will help the model to understand of what you want. If you ask for generic question, you receive generic answer.

Generic question:

BIBLIOGRAPHY/WEBOGRAPHY

[01] openai.com;

[02] KENNETH WARD CHURCH, Emerging Trends Word2Vec, IBM 2016;

Not generated by AI tools or platforms.

{[(homo scripsit)]}

AI IN ACTION: COOPERATION WITH HUMAN

@ndreah • March 29, 2024

"Artificial Intelligence attempts to coax a machine, typically a computer, to behave in ways humans judge to be intelligent" John McCarthy (1927-2011)

"I think that there is a lot of fear about robots and artificial intelligence among some people, whereas I'm more afraid of natural stupidity" Eugenia Cheng

Now we can talk with a “machine” using a natural language instead of using programming languages consisting of specific words to be written following a strict syntax and form.

We have a new paradigm in the HCI (Human Computer Interaction) with generative models and large language models.

We can interact with these AI Systems by using “prompt” mechanism, in which we have flexible inputs continued by equally flexible outputs.

In [01] this mechanism is called massive multimodal models.

In the following table a selection of different input/output modalities.

"Interaction with prompt-commanded AI is a different from other ways of interaction with machines" [01]. It has three important properties:

flexibility: using of text, code, images etc.;
generality: applicable to broad range of tasks;
originality: generate original content.

"Cognitive tools are external artifacts that are used to aid the psychological capacities of the human brain in completing a cognitive task" [01] They are used to reduce the cognitive work of human brain.

Massive multimodal models are cognitive tools or extenders they can be used for simple o complex interactions. The results depend on the skill and capacity of the user in exploiting these cognitive tools.

AI IMAGE SYNTHESIS: AI Text-to-Art Generator

It is the task where AI learns to understand a description in natural language and reproduce realistic image matching the description. It combines natural language processing (NLP) and computer vision (CV). In this text-to-image tasks NLP model is the encoder and an image synthesis model as the decoder.

Our society is becoming increasingly visual. Images are a very strong means of communication and in this Artificial Intelligence is a very powerful tool.

We can create incredible images (AI Text-to-Art Generator) using AI. Here are some websites where we can create image using AI:

www.bing.com/images/create: AI-powered Bing using its new feature Image Creator: “Powered by the very latest DALL∙E models from our partners at OpenAI, Bing Image Creator allows you to create an image simply by using your own words to describe the picture you want to see. Now users off the waitlist can generate both written and visual content in one place from within chat.”;
www.midjourney.com: you can access by discord.com account.
openai.com/dall-e-2: DALL·E 2 is an AI system that can create realistic images and art from a description in natural language. It is not free by default.
YOUIMAGINE from you.com: to magically transform your ideas into stunning visuals and one-of-a-kind graphics;
stable-diffusion-art.com: Stable Diffusion Art:
www.canva.com: MagicStudio By Canva allows to supercharge your work and designs with all power of AI.
www.imagine.art : "Create awe-inspiring masterpieces effortlessly and explore the endless possibilities of AI generated art".
davinci.ai : Create AI art using only your words in just a few seconds!

AI-powered Bing’s Image Creator

More specific is the prompt better you obtain the image that you have in mind. Bing’s Image Creator recommends you format your prompts:

Adjective + Noun + Verb + Style.

Small fox running in the forest, digital art

Simple prompt description:

“The centurion in the time of the Roman Empire: the backbone of the Roman army.”

in Bing's Image Creator produce in output:

Complex prompt description

"[...] the centurions must be, not so much men who are bold and contemptuous of danger, as men who are able to command, tenacious and calm, who, moreover, do not move to attack when the situation is uncertain, nor throw themselves into the heat of battle, but on the contrary know how to resist even when pressed and defeated, and are ready to die on the battlefield." POLYBIUS, HISTORIES, VI, 24, 9

Produce:

MIDJOURNEY

Let's have a look to midjourney: an AI image generator prompt. A prompt is an input that guides a computer’s AI system in producing an art.

Prompts can range from a simple text description:

“The centurion in the time of the Roman Empire: the backbone of the Roman army.”

that produce:

to more complicated description that involve multiple parts coming together:

Produce:

In midjourney the syntax of the prompt in order to generate an image is the following:

/imagine < description text of the image >

What you put in the prompt is very important in order to define the picture that you would have be generated by midjourney.com.

You can ask :

a photo of ...txt...

a painting of ...txt...

you can decide the subject of the photo or painting:

animal;
person;
landscape;
object;
and so on

You can define which details you would like to add:

1) special environment: for example on a boat, in the forest,...

2) special lighting:

soft lighting,
ring lighting,
neon,
and son on

3) colour scheme;

4) point of view:

camera behind;
camera in the front of;
camera beside;

5) background:

solid colour;
a nebula;
a forest;
and so on.

6) atmosphere:

vibrant;
dark;
and so on.

You can add more information, for example the time of the day and so on.

As an image is 1000 words you can ask midjourney.com to generate an image by an uploaded image.

You can merge multiple images into one by using the blending process.

/blend <image1> <image2> ...

You can add additional text to enrich or modify the image.

You can ask also midjourney to get the prompt back (image captioning) using this command syntax:

/describe <image>

This means please describe this image for me.

You can use a negative prompting about you don't really want in the results, putting at the end of your prompt:

--no fog or dust

The you can use other commands:

--aspect 5:4 for aspect ratio

--ar 5:4

--chaos 100 or 90

You can stop the process at a determined percent in order to have an image at different stages.

--stop 50

Human-Computer Interaction but Human-Centred

Massive multimodal models are cognitive extenders and are distinct from autonomous AI systems because they are highly user-dependent.

BIBLIOGRAPHY/WEBOGRAPHY

[01] Wout Schellaert et al., Your Prompt is My Command, Journal of Artificial Intelligence Research, 2023;

[02] Ronald T. Kneusel, How AI works, 2024;

Not generated by AI tools or platforms.

{[(homo scripsit)]}