
November 18, 2003

How many web pages?

".. Estimates of the size of the web are all over the place. I've heard 10 billion, 7 billion, 35 billion and 300 billion.

Much of the confusion emerges from how you define a webpage. Does it count if you have to use a form to get to the information? Does it count if you need a password? These are not trivial questions. It determines if you count every book listed in Amazon.com as a separate webpage (It is online though you do need to use a form) or if you count everything available in DialogWeb as a webpage (It is online though you do need your credit card and Dialog password). So, for example, do we count intranets? Do we count email messages sent to a mailing list that get archived?

Oh, and for not very good reasons, I'm currently guessing the web is between 15 and 20 billion webpages, not counting commercial or intranet or form-based data..."

* * *
David Novak, founder of the Spire Project

Novak, David.  "10 Billion pages or more".  Research Commentary.  2002.  The Spire Project.  17 Nov 2003.  <http://spireproject.com/art13.htm>.

Posted by iachan at 09:47 AM

November 17, 2003

The Ambiguity Inherent in the Invisible Web

It is very difficult to predict what sites or kinds of sites or portions of sites will or won't be part of the Invisible Web. There are several factors involved:

Which sites replicate some of their content in static pages (hybrid of visible and invisible in some combination)?
Which replicate it all (visible in search engines if you construct a search matching the page)?
Which replicate none and must be searched directly (totally invisible)?
You often don't know if a page has a ? in its URL until after you've somehow found it (excluded by policy).
Search engines can change their policies on what they exclude and include.
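
The point about a ? in the URL can be illustrated with a minimal Python sketch. It assumes that "has a ? in its URL" means the URL carries a non-empty query string; the function name and the example.com URLs are hypothetical, and real search engine exclusion policies are more involved than this single check.

```python
from urllib.parse import urlsplit


def has_query_string(url: str) -> bool:
    """Return True if the URL has a non-empty query component (the part after '?')."""
    # Assumption: a bare trailing '?' with nothing after it counts as no query.
    return bool(urlsplit(url).query)


def split_by_query(urls):
    """Partition URLs into (static-looking, query-based) lists, preserving order."""
    static, dynamic = [], []
    for url in urls:
        (dynamic if has_query_string(url) else static).append(url)
    return static, dynamic


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    sample = [
        "http://example.com/about.html",
        "http://example.com/search?q=invisible+web",
    ]
    static, dynamic = split_by_query(sample)
    print("static-looking:", static)
    print("query-based:", dynamic)
```

As the excerpt notes, this check only works once you already have the URL in hand, which is exactly the difficulty.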


Excerpt from:

Barker, Joe.  Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity.  Finding Information on the Internet: A Tutorial.  28 August 2003.  UC Berkeley - Teaching Library Internet Workshops.  17 Nov 2003.  http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InivisibleWeb.html

Posted by iachan at 09:57 PM

What is the Invisible Web?

"The "visible web" is what you see in the results pages from general web search engines. It's also what you see in almost all subject directories. The "invisible web" is what you cannot retrieve ("see") in the search results and other links contained in these types of tools.

  • Searchable Databases.
    • Most of the invisible web is made up of the contents of thousands of specialized searchable databases that you can search via the Web. The search results from many of these databases are delivered to you in web pages that are just for your search. Such pages very often are not stored anywhere: it is easier and cheaper to dynamically generate the answer page for each query than to store all the possible pages containing all the possible answers to all the possible queries people could make to the database. Search engines cannot find or create these pages.
  • Excluded Pages.
    • There are some types of pages that search engine companies exclude by policy. There is no technical reason they could not include them if they wanted. It's a matter of selecting what and what not to include in databases that are already huge, expensive to operate, and low revenue producers.

Because search engines cannot find or recreate these dynamically generated pages, you have to go to the page with a search box for each specialized database and search it directly. Additional invisible web pages are ones that search engines choose, for various reasons, to exclude."
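
To make the idea of a dynamically generated answer page concrete, here is a minimal Python sketch. The catalog contents and the function name are hypothetical; the only point is that the results page is built on demand from a query and is never stored as a file a crawler could follow a link to.

```python
from html import escape

# Hypothetical in-memory "database" standing in for a specialized searchable collection.
CATALOG = [
    "Invisible Web Basics",
    "Search Engine Policies",
    "Web Directories",
]


def render_results(query: str) -> str:
    """Build a results page for one query; nothing is written to disk."""
    matches = [title for title in CATALOG if query.lower() in title.lower()]
    items = "".join(f"<li>{escape(title)}</li>" for title in matches)
    return (
        "<html><body>"
        f"<h1>Results for {escape(query)}</h1>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )


if __name__ == "__main__":
    # Each distinct query yields a distinct page that exists only for this request.
    print(render_results("web"))
```

Storing a page for every possible query would mean storing every possible output of `render_results`, which is why such databases generate pages per request instead.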

Excerpt from:

Barker, Joe.  Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity.  Finding Information on the Internet: A Tutorial.  28 August 2003.  UC Berkeley - Teaching Library Internet Workshops.  17 Nov 2003.  http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InivisibleWeb.html

Posted by iachan at 09:28 PM

Brief History of the Internet

1969 :: Stanford, UCLA, University of Utah, and UC Santa Barbara connected to form ARPAnet

1971 :: Ray Tomlinson writes first email program
:: Project Gutenberg started by Michael Hart

1972 :: 2000 ARPAnet users

1980 :: ARPAnet crashes due to virus

1982 :: First name server deployed at the University of Wisconsin
:: TCP/IP defined

1984 :: DNS [Domain Name System] established / 2000 Internet hosts

1986 :: Creation of NSFNET allows for many more Internet connections

1987 :: 10,000 hosts

1988 :: 6000 out of 60,000 Internet hosts affected by Internet worm

1989 :: 100,000 hosts

1991 :: World Wide Web developed by CERN, HTML developed by Tim Berners-Lee

1993 :: Introduction of the Mosaic browser accelerates growth of Web.

1994 :: Online shopping, banking, and pizza ordering. Law firm sends out spam. First banner ads.

1995 :: AOL, Compuserve, Netscape. Many governmental offices around the world go online

1997 :: USAF, DOJ, CIA, and Spice Girls web sites hacked

1998 :: 320 million web pages

2002 :: 580 million people online worldwide

Based on Hobbes' Internet Timeline

Posted by iachan at 08:16 PM

World Wide Web

A system of Internet servers that support specially formatted documents. The documents are formatted in a markup language called HTML (HyperText Markup Language) that supports links to other documents, as well as graphics, audio, and video files. This means you can jump from one document to another simply by clicking on hot spots. Not all Internet servers are part of the World Wide Web.

There are several applications called Web browsers that make it easy to access the World Wide Web; two of the most popular are Netscape Navigator and Microsoft's Internet Explorer.

World Wide Web is not synonymous with the Internet.

"World Wide Web".  5 August 2003.  Webopedia.com.  17 November 2003.  <http://www.webopedia.com/TERM/W/World_Wide_Web.html>.

Posted by iachan at 07:55 PM

The Internet

A global network connecting millions of computers. More than 100 countries are linked into exchanges of data, news and opinions.

Unlike online services, which are centrally controlled, the Internet is decentralized by design. Each Internet computer, called a host, is independent. Its operators can choose which Internet services to use and which local services to make available to the global Internet community. Remarkably, this anarchy by design works exceedingly well.

There are a variety of ways to access the Internet. Most online services, such as America Online, offer access to some Internet services. It is also possible to gain access through a commercial Internet Service Provider (ISP).

The Internet is not synonymous with World Wide Web.

Also see The Difference Between the Internet and the World Wide Web in the Did You Know . . . ? section of Webopedia.

"Internet".  5 August 2003.  Webopedia.com.  17 November 2003.  <http://www.webopedia.com/TERM/I/Internet.html>.

Posted by iachan at 11:35 AM

November 07, 2003

Criteria

Posted by iachan at 09:55 AM

November 06, 2003

CSS Layout

http://www.saila.com/usage/layouts/
http://glish.com/css/
http://www.spinwebdesign.com/experiments/dynamicHeight.shtml
http://www.alistapart.com/stories/flexiblelayouts/
http://www.bluerobot.com/web/layouts/
http://www.thenoodleincident.com/tutorials/box_lesson/boxes.html

Posted by iachan at 01:38 PM