Natural Language Processing (NLP)

LLMs - Large Language Models

LLMs are neural networks that can process and generate natural language text.

TRAINING PHASE

They are trained on datasets of billions of sentences using unsupervised learning techniques. During training, LLMs learn which word is most likely to come next after the preceding ones, based on a huge amount of data.

INPUT BY USER

LLMs accept a text prompt from the user as input and, in response to it, generate output text word by word (token by token).

GENERATION OF THE OUTPUT

The generation process consists of predicting the next word on the basis of the previously generated words. LLMs are trained to do this without any consciousness, which is a prerogative of the human mind.

In this example we use as data the dystopian novel “Nineteen Eighty-Four – 1984” by the English writer George Orwell, published in 1949.

Using the text of the novel as a data source, the following tables were produced. I show only a part of them:

Word/Token   Occurrences
the          6249
of           3309
a            2482
and          2326
to           2236
was          2213
He           1959
It           1864
in           1759
that         1457
had          1311
his          1079
you          1011
not          827
with         771
as           672
At           654
they         642
for          615
IS           614
but          611
be           608
on           604
were         583
there        559
Winston      526
him          512
i            495
which        443
s            439
one          426
or           424

Word/Token   Next word    Score   Probability
of           the          743     0.01139
It           was          589     0.00903
in           the          574     0.00880
He           had          355     0.00544
he           was          273     0.00418
on           the          230     0.00352
was          a            225     0.00345
there        was          223     0.00342
to           the          212     0.00325
O            Brien        205     0.00314
to           be           203     0.00311
and          the          203     0.00311
had          been         202     0.00310
the          party        195     0.00299
at           the          183     0.00280
that         he           167     0.00256
from         the          161     0.00247
with         a            158     0.00242
did          not          148     0.00227
that         the          147     0.00225
of           a            145     0.00222
of           his          145     0.00222
out          of           142     0.00218
was          not          130     0.00199
with         the          127     0.00195
he           could        124     0.00190
it           is           124     0.00190
in           his          123     0.00188
in           a            122     0.00187
They         were         122     0.00187
seemed       to           115     0.00176
was          the          110     0.00169
could        not          109     0.00167
he           said         109     0.00167
the          same         103     0.00158
for          the          101     0.00155
by           the          95      0.00146
for          a            92      0.00141
into         the          92      0.00141
she          had          87      0.00133
as           though       82      0.00126
they         had          80      0.00123
that         it           80      0.00123
have         been         79      0.00121
and          a            78      0.00120
it           had          77      0.00118
The          other        76      0.00116
of           them         76      0.00116
to           him          75      0.00115
the          telescreen   75      0.00115
BIG          BROTHER      73      0.00112
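
Tables of this kind could be produced with a sketch like the one below. It is not the author's code: the tokenizer (runs of letters only), the function names and the tiny sample text are assumptions made for illustration, and the probability is assumed to be the count of a word pair divided by the total number of pairs, which seems consistent with the ratios in the table.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a token is a run of letters, so "O'Brien" becomes "O" and "Brien".
    return re.findall(r"[A-Za-z]+", text)

def word_counts(tokens):
    # Occurrences of each single word/token.
    return Counter(tokens)

def pair_table(tokens):
    # Count each (word, next word) pair and divide by the total number of pairs.
    pairs = Counter(zip(tokens, tokens[1:]))
    total = sum(pairs.values())
    return {pair: (count, count / total) for pair, count in pairs.items()}

# Hypothetical sample text; the article uses the full text of the novel instead.
sample = "big brother is watching you big brother is watching"
tokens = tokenize(sample)
counts = word_counts(tokens)
table = pair_table(tokens)

for (word, next_word), (score, probability) in sorted(
    table.items(), key=lambda item: item[1][0], reverse=True
):
    print(word, next_word, score, round(probability, 5))
```

With the whole novel as input, sorting by score in the same way would give the most frequent pairs first.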

This is a simple diagram to understand how the text is generated word by word.

For example, if I start with BIG, the LLM will probably generate BROTHER, and by continuing in the same way we can produce this sentence:

BIG          BROTHER   was       a         sort      of        the       thought
Probability  0.00112   0.0005    0.00345   0.00095   0.00104   0.00023   0.00087
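
The word-by-word idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how ChatGPT is implemented: the sample text, the function names and the rule "always pick the most frequent follower" are all hypothetical simplifications.

```python
from collections import Counter, defaultdict

def build_followers(tokens):
    # For every word, count which words follow it in the text.
    followers = defaultdict(Counter)
    for current, next_word in zip(tokens, tokens[1:]):
        followers[current][next_word] += 1
    return followers

def generate(followers, start, max_words):
    # Start from one word and repeatedly append its most frequent follower.
    words = [start]
    while len(words) < max_words and words[-1] in followers:
        candidates = followers[words[-1]]
        words.append(max(candidates, key=candidates.get))
    return words

# Hypothetical sample text standing in for the novel.
tokens = "BIG BROTHER was a sort of myth and BIG BROTHER was a legend".split()
followers = build_followers(tokens)
sentence = generate(followers, "BIG", 4)
print(" ".join(sentence))
```

Real systems usually sample from the probability distribution instead of always taking the top word, which is one reason why the same prompt can give different answers.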

By using the “prompt” mechanism you can ask ChatGPT for what you want in natural language.

But how does ChatGPT “UNDERSTAND” the text entered by the user?

The text is transformed, and each word is represented by a code that a computer can process.

One way to represent individual words is the Word2Vec technique in natural language processing (NLP), in which each word is represented by a vector (a set of numbers). This helps a computer to assign a meaning to the word.

Word2Vec stands for “word to vector”. It means expressing each word in your text corpus as a point in n-dimensional space. The word’s weight in each dimension defines it for the model.

The meaning of a word is based on the context defined by the neighboring words with which it is associated.

Here is a simple example of word representation using the Word2Vec approach in two-dimensional space:

Man = [1,4]

Woman = [1,3]

Manager = [4,2]

Actress = [4,1]

Manager - Man + Woman = Actress
[4,2] - [1,4] + [1,3] = [4,1]
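
The arithmetic above can be checked with a short sketch. The four vectors are the ones from the example; the helper names and the use of squared distance to find the closest word are assumptions added for illustration.

```python
# Two-dimensional toy vectors taken from the example above.
vectors = {
    "Man": (1, 4),
    "Woman": (1, 3),
    "Manager": (4, 2),
    "Actress": (4, 1),
}

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))

def nearest_word(target, vectors):
    # Pick the word whose vector has the smallest squared distance to the target.
    def squared_distance(word):
        return sum((x - y) ** 2 for x, y in zip(vectors[word], target))
    return min(vectors, key=squared_distance)

result = add(subtract(vectors["Manager"], vectors["Man"]), vectors["Woman"])
print(result, nearest_word(result, vectors))
```

In a real model the vectors have hundreds of dimensions and the result rarely lands exactly on a word, which is why a nearest-word lookup is used.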


In the following picture we have the graphic representation.

This is what happens when you send a prompt to ChatGPT.

  1. The text is converted and split into tokens;

[10,10], [10,31], [10,15], [14,44], [8,5], …

(you, are, an, ICT, specialist, with, a, lot, of, experience)

  2. An algorithm (like ChatGPT) makes predictions and outputs text word by word.

[10,10],…

(you, can, have, an, important, and, well-paid, job)
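
The first step can be sketched as follows. This is a simplification under stated assumptions: the text is split on whitespace and each token receives an integer id in order of first appearance, whereas ChatGPT uses a subword tokenizer and learned vectors. The function names are hypothetical.

```python
def tokenize(prompt):
    # Assumption: one token per whitespace-separated word.
    return prompt.split()

def encode(tokens, vocabulary):
    # Give every new token the next free id and return the id sequence.
    ids = []
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
        ids.append(vocabulary[token])
    return ids

vocabulary = {}
tokens = tokenize("you are an ICT specialist with a lot of experience")
ids = encode(tokens, vocabulary)
print(list(zip(tokens, ids)))
```

Tokens that were seen before keep their id, so encoding "you are" again with the same vocabulary returns the first two ids.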

Let us now analyze some techniques to better exploit the potential of ChatGPT.

DIRECTIONAL PROMPTING

If you submit the same question to ChatGPT many times, you will likely receive different answers.

How can you use directional prompting to get a more precise answer?

You have to give more information and be more descriptive when you define a prompt. You have to give clear instructions. This will help the model understand what you want. If you ask a generic question, you receive a generic answer.

Generic question:

More specific question:

More contextual and specific question:

OUTPUT FORMATTING

If you want ChatGPT to produce its output in a specific format, for example CSV (Comma Separated Values), Microsoft Excel, Microsoft Word, plain txt or even code, you have to specify it, as in the following examples.

We want statistical data in CSV format:
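
As a rough illustration, a prompt that asks for CSV can name the columns explicitly, and the reply can then be read with Python's csv module. The prompt wording, the topic, the column names and the sample reply below are all hypothetical placeholders, not actual ChatGPT output.

```python
import csv
import io

def build_csv_prompt(topic, columns):
    # State both the content wanted and the exact output format.
    return (
        f"Give me statistical data about {topic}. "
        f"Answer only in CSV format with the columns: {', '.join(columns)}."
    )

def parse_csv_reply(reply):
    # Turn the CSV text into a list of dictionaries keyed by column name.
    return list(csv.DictReader(io.StringIO(reply)))

prompt = build_csv_prompt("a hypothetical topic", ["year", "value"])

# Placeholder reply, written by hand to show the parsing step.
sample_reply = "year,value\n2021,10\n2022,12\n"
rows = parse_csv_reply(sample_reply)
print(prompt)
print(rows)
```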

[01] openai.com;

[02] Kenneth Ward Church, Emerging Trends Word2Vec, IBM 2016;

"Artificial Intelligence attempts to coax a machine, typically a computer, to behave in ways humans judge to be intelligent" John McCarthy (1927-2011)

"I think that there is a lot of fear about robots and artificial intelligence among some people, whereas I'm more afraid of natural stupidity" Eugenia Cheng

Created with Midjourney Bot#9282

Now we can talk with a “machine” using natural language instead of programming languages, which consist of specific words that must be written following a strict syntax and form.

Generative models and large language models bring a new paradigm to HCI (Human-Computer Interaction).

We can interact with these AI systems through the “prompt” mechanism, in which flexible inputs are followed by equally flexible outputs.

In [01] such systems are called massive multimodal models.

The following table shows a selection of different input/output modalities.

"Interaction with prompt-commanded AI is a different from other ways of interaction with machines" [01]. It has three important properties:

  • flexibility: use of text, code, images, etc.;
  • generality: applicable to a broad range of tasks;
  • originality: generation of original content.

"Cognitive tools are external artifacts that are used to aid the psychological capacities of the human brain in completing a cognitive task" [01]. They are used to reduce the cognitive work of the human brain.

Massive multimodal models are cognitive tools, or extenders, that can be used for simple or complex interactions. The results depend on the skill and capacity of the user in exploiting these cognitive tools.

AI IMAGE SYNTHESIS: AI Text-to-Art Generator

It is the task in which AI learns to understand a description in natural language and to produce a realistic image matching that description. It combines natural language processing (NLP) and computer vision (CV). In text-to-image tasks, an NLP model acts as the encoder and an image synthesis model as the decoder.

Our society is becoming increasingly visual. Images are a very strong means of communication and in this Artificial Intelligence is a very powerful tool.

We can create incredible images using AI (AI Text-to-Art Generator). Here are some websites where we can create images using AI:

  • www.bing.com/images/create: AI-powered Bing using its new feature Image Creator: “Powered by the very latest DALL∙E models from our partners at OpenAI, Bing Image Creator allows you to create an image simply by using your own words to describe the picture you want to see. Now users off the waitlist can generate both written and visual content in one place from within chat.”;
  • www.midjourney.com: you can access it with a discord.com account;
  • openai.com/dall-e-2: DALL·E 2 is an AI system that can create realistic images and art from a description in natural language. It is not free by default;
  • YOUIMAGINE from you.com: to magically transform your ideas into stunning visuals and one-of-a-kind graphics;
  • stable-diffusion-art.com: Stable Diffusion Art;
  • www.canva.com: MagicStudio by Canva lets you supercharge your work and designs with the power of AI;
  • www.imagine.art: "Create awe-inspiring masterpieces effortlessly and explore the endless possibilities of AI generated art";
  • davinci.ai: Create AI art using only your words in just a few seconds!

AI-powered Bing’s Image Creator

The more specific the prompt is, the closer the result will be to the image that you have in mind. Bing’s Image Creator recommends that you format your prompts as:

Adjective + Noun + Verb + Style.

Small fox running in the forest, digital art
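
The recommended format can be expressed as a small helper. The function name and parameter names are hypothetical; the pattern and the example values come from the text above.

```python
def image_prompt(adjective, noun, verb, style):
    # Adjective + Noun + Verb, followed by the style after a comma.
    return f"{adjective} {noun} {verb}, {style}"

prompt = image_prompt("Small", "fox", "running in the forest", "digital art")
print(prompt)
```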

Simple prompt description:

The centurion in the time of the Roman Empire: the backbone of the Roman army.

In Bing's Image Creator this produces the following output:

Complex prompt description:

"[...] the centurions must be, not so much men who are bold and contemptuous of danger, as men who are able to command, tenacious and calm, who, moreover, do not move to attack when the situation is uncertain, nor throw themselves into the heat of battle, but on the contrary know how to resist even when pressed and defeated, and are ready to die on the battlefield." POLYBIUS, HISTORIES, VI, 24, 9

This produces:

MIDJOURNEY

Let's have a look at Midjourney, an AI image generator driven by prompts. A prompt is an input that guides the AI system in producing a piece of art.

Prompts can range from a simple text description:

The centurion in the time of the Roman Empire: the backbone of the Roman army.

which produces:

to a more complicated description that involves multiple parts coming together:

"[...] the centurions must be, not so much men who are bold and contemptuous of danger, as men who are able to command, tenacious and calm, who, moreover, do not move to attack when the situation is uncertain, nor throw themselves into the heat of battle, but on the contrary know how to resist even when pressed and defeated, and are ready to die on the battlefield." POLYBIUS, HISTORIES, VI, 24, 9

This produces:

In Midjourney, the prompt syntax for generating an image is the following:

/imagine < description text of the image >

What you put in the prompt is very important in defining the picture that you want midjourney.com to generate.

You can ask for:

a photo of ...txt...

a painting of ...txt...

You can decide the subject of the photo or painting:

  • animal;
  • person;
  • landscape;
  • object;
  • and so on

You can define which details you would like to add:

1) special environment: for example on a boat, in the forest,...

2) special lighting:

  • soft lighting,
  • ring lighting,
  • neon,
  • and so on.

3) colour scheme;

4) point of view:

  • camera behind;
  • camera in front;
  • camera beside;

5) background:

  • solid colour;
  • a nebula;
  • a forest;
  • and so on.

6) atmosphere:

  • vibrant;
  • dark;
  • and so on.

You can add more information, for example the time of the day and so on.
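
The choices listed above can be combined into one description. The sketch below is only an illustration: the function name, its parameters and the example values are hypothetical, and it simply joins the parts with commas.

```python
def describe_image(medium, subject, details=()):
    # Start with "a photo of ..." or "a painting of ...", then append the details.
    parts = [f"a {medium} of {subject}"]
    parts.extend(details)
    return ", ".join(parts)

description = describe_image(
    "photo",
    "a centurion",
    ["in the forest", "soft lighting", "camera behind", "dark atmosphere"],
)
print(description)
```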

As a picture is worth a thousand words, you can ask midjourney.com to generate an image from an uploaded image.

You can merge multiple images into one by using the blending process.

/blend <image1> <image2> ...

You can add additional text to enrich or modify the image.

You can also ask Midjourney to give you a prompt back from an image (image captioning) using this command syntax:

/describe <image>

This means "please describe this image for me".

You can use negative prompting to state what you don't want in the results, by putting this at the end of your prompt:

--no fog or dust

Then you can use other parameters:

--aspect 5:4 for aspect ratio

--ar 5:4

--chaos 100 or 90

You can stop the process at a given percentage in order to obtain the image at different stages.

--stop 50
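
Putting the pieces together, a complete command is the description followed by the optional parameters. The helper below only builds the text to type; its name and arguments are hypothetical, and it uses just the parameters mentioned above (--no, --ar, --chaos, --stop).

```python
def imagine_command(description, no=None, aspect=None, chaos=None, stop=None):
    # Build the command text; parameters are appended only when given.
    parts = ["/imagine", description]
    if no:
        parts.append(f"--no {no}")
    if aspect:
        parts.append(f"--ar {aspect}")
    if chaos is not None:
        parts.append(f"--chaos {chaos}")
    if stop is not None:
        parts.append(f"--stop {stop}")
    return " ".join(parts)

command = imagine_command(
    "a photo of a small fox in the forest, soft lighting",
    no="fog or dust",
    aspect="5:4",
    chaos=90,
    stop=50,
)
print(command)
```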

Human-Computer Interaction but Human-Centred

Massive multimodal models are cognitive extenders and are distinct from autonomous AI systems because they are highly user-dependent.

[01] Wout Schellaert et al., Your Prompt is My Command, Journal of Artificial Intelligence Research, 2023;

[02] Ronald T. Kneusel, How AI works, 2024;