Brian Bingley authored and submitted this article.
Finding that Elusive Internet Search Tool
By Brian Bingley
The Internet, and networks in general, revolve around the passing of
information between users distributed over distance. Looking at the
various services it provides in a historical order can give us
a clearer idea of what the Internet is and what it can do.
In the beginning... there was Archie, FTP (File Transfer Protocol) and Gopher.
Gopher and Archie already contained most of the components of current search
engines. They had spiders crawling the network looking for content, which was
stored in databases and/or topic directories. Sites were ranked by a
computer-generated estimate of relevance to the query. Search protocols,
including the search command syntax, were established from the beginning. As
Archie was to FTP archives, so Veronica was to Gopherspace: a search utility
that helps find information on Gopher servers. (See Common Questions and
Answers about Veronica, a title search and retrieval system for use with the
Internet Gopher). Check the Original Internet Hunt and THE ANSWERS for an
indication of what the early Internet could deliver in the right hands.
Then came... Graphical User Interfaces (GUI), in the form of Mosaic in 1993,
and the World Wide Web (WWW). Existing search features were built into the new
search engines such as Lycos, AltaVista and HotBot (See A brief history of the
Lycos and HotBot search engines and A brief history of the AltaVista search
engine). AltaVista's popularity stemmed from its embrace of Boolean searches
enhanced with 'case sensitivity', 'phrase searching' and a 'proximity search
capability' (the NEAR operator), all of which survived until Yahoo's recent
takeover of AltaVista.
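The three refinements named above (case sensitivity, phrase searching and proximity searching) can be illustrated with a minimal Python sketch. It is not AltaVista's implementation: the function names, the simple word tokenizer and the default window of 10 words for NEAR are all assumptions made for illustration.

```python
import re


def tokenize(text, case_sensitive=False):
    """Split text into words; fold to lower case unless case matters."""
    words = re.findall(r"\w+", text)
    return words if case_sensitive else [w.lower() for w in words]


def phrase(text, phrase_text, case_sensitive=False):
    """True if the words of phrase_text occur consecutively in text."""
    words = tokenize(text, case_sensitive)
    target = tokenize(phrase_text, case_sensitive)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def near(text, first, second, max_distance=10, case_sensitive=False):
    """True if both words occur within max_distance words of each other.

    The window of 10 is an assumed default for this sketch.
    """
    words = tokenize(text, case_sensitive)
    if not case_sensitive:
        first, second = first.lower(), second.lower()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


if __name__ == "__main__":
    sample = "Veronica searched Gopher menus while Archie searched FTP archives"
    print(phrase(sample, "gopher menus"))
    print(near(sample, "Veronica", "Archie", max_distance=3))
```

A Boolean query is then just a combination of such tests with Python's own `and`, `or` and `not`.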
Ten little Indians... HotBot, based on Inktomi search results, used such
features as field searching, limiting by date and searching for particular
file types. It was, briefly, the search engine of choice for many but never
achieved the popularity enjoyed by AltaVista, and it lost direction after
substituting DirectHit content for that from Inktomi. Even before the Internet
bubble burst in 2001, great search tools closed or changed. You can see some
of them at Searching Graveyard, where they are organized in chronological
order with some of their logos.
The Swiss Army knife of search engines; or why we are googly-eyed about Google
In a departure from the Boolean search based technologies of the early
nineties, Google ranks hits by their linkages (in imitation of the famed ISI
Citation Indexes for academic literature) and authority rather than by
'weightings' based on the number of occurrences of keywords in the text.
Google detects phrase matches even when quotes are not used in the basic
search mode, and it usually ranks documents with matching phrases higher. (See
Review of Google 5 June 2004; Google Advanced Search operators and The Google
~Guide Site)
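The difference between weighting by keyword occurrences and ranking by linkages can be sketched in a few lines of Python. This is a simplified, PageRank-style power iteration and should not be read as Google's actual algorithm; the page names, the damping value of 0.85, the iteration count and the function names are assumptions for illustration only.

```python
def keyword_weight(text, word):
    """Early-style weighting: count occurrences of a keyword in the text."""
    return text.lower().split().count(word.lower())


def link_scores(links, damping=0.85, iterations=50):
    """Score pages by who links to them.

    links maps each page name to the list of pages it links to.
    Each page repeatedly shares its score among the pages it cites,
    much as a citation index credits frequently cited papers.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for target in pages:
                    new_scores[target] += share
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Hypothetical mini-web: three pages cite "hub", which cites "a".
    web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
    for page, score in sorted(link_scores(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

A page stuffed with a keyword scores well under `keyword_weight` regardless of its quality, whereas `link_scores` rewards pages that other pages choose to cite.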
Google has its limitations...: There is no nesting, no truncation, and it does
not support full Boolean logic; it only indexes the first 101 KB of a Web page
and about 120 KB of PDFs; the number of keywords you can search on is limited
to 10 (32 as of January 2005), but you can override this limitation by putting
a plus sign ( + ) in front of any of the words when using them in a search
phrase, or you can use the wildcard symbol ( * ) and actually search for more
than 10 (32) keywords at a time, because the ( * ) is not counted as a word.
...and special features: It is currently indexing the abstract records for all
online technical documents and standards by the Institute of Electrical and
Electronics Engineers (IEEE); abstracts are available free and full-text
documents are available to subscribers or for online purchase. Starting a
search with "define", "definition", "what is" or "what are" will invoke a
Google Glossary lookup. Google will soon provide access to a 2 million record
subset of the more than 53 million records in the OCLC Project WorldCat,
covering the most popular and widely available books (but see Two Million Open
Worldcat Records Hit the Yahoo Database - Infotoday July 18 2004). WebQuotes
shows what people are saying about a particular site. Google provides
background information on a page if you type the URL in the form
info:www.whatever.xxx. (See also Gary Price's Tips for Searching Google and
FAQ based on questions in the google.public.support.general newsgroup)
... but that ain't all. Google very sensibly allows and encourages others to
adapt and enhance their software, as indicated by the following examples:
Google Ultimate Interface utilizes all advanced search options (e.g. Web
search, Image search, News search) and Google's tools (e.g. Glossary, Sets),
lets you toggle the Duplicates Filter on or off, use the file format search
and set the number of results per page, and has links for typing non-English
letters; Google API Proximity Search (GAPS) lets you look for two words within
one, two or three words of each other; Google Hacks by Tara Calishain and Rael
Dornfest (book) offers 100 industrial-strength, real-world, tested solutions
to practical problems, including
Hack 5: Getting Around the 10 Word Limit;
Hack 17: Consulting the Phonebook;
Hack 32: Google News;
Hack 44: Scraping Google Results;
Hack 54: NoXML, Another SOAP::Lite Alternative;
Hack 79: Measuring Google Mindshare;
Hack 87: Google Whacking;
Hack 100: Removing Your Materials from Google
There are other search engine technologies... The clustering search engines
Vivisimo, Mooter and SnakeT (SNippet Aggregation for Knowledge ExTraction)
show potential but are affected by the usual business manoeuvrings. The
clustering meta-search engine Vivisimo no longer harvests data from Google.
Different tools cluster using different methods. One of the more common
methods is to look for phrases which appear in multiple listings. All pages
that contain a certain phrase are listed in one cluster, whose name is that
phrase. (See Topic Clustering in Searches). Kartoo visual search and Maps of
the Web use similar technology but present their results in a visual display.
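The phrase-based method described above can be sketched as follows. This is a minimal illustration, not the method of Vivisimo or any other named engine: the two-word phrase length, the threshold of two listings, the sample listings and the function names are all assumptions.

```python
import re
from collections import defaultdict


def phrases(text, size=2):
    """Return the set of consecutive word groups of the given size."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def cluster_by_phrase(listings, size=2, min_listings=2):
    """Group listings under every phrase that several of them share.

    listings maps a page title to its snippet text. Each cluster is
    named after the shared phrase, as in the description above.
    """
    clusters = defaultdict(list)
    for title, snippet in listings.items():
        for shared in phrases(snippet, size):
            clusters[shared].append(title)
    return {
        name: sorted(titles)
        for name, titles in clusters.items()
        if len(titles) >= min_listings
    }


if __name__ == "__main__":
    # Hypothetical search results for the query "jaguar".
    results = {
        "page1": "jaguar car dealers in town",
        "page2": "used jaguar car prices",
        "page3": "jaguar cat habitat",
    }
    print(cluster_by_phrase(results))
```

A real engine would also drop phrases made only of common words and merge overlapping clusters; both steps are left out here to keep the sketch short.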
Natural language searching presents a problem for artificial intelligence due
to the complexity, irregularity, and diversity of human language. Ask Jeeves
is the best example of a search engine using natural language. Ixquick and
SurfWax are also of interest, but SurfWax, like other very successful Internet
technologies, has been absorbed into the commercial sector (See SurfWax
Enterprise / SurfWax Scholar / SurfWax LawKT (Knowledge Tools)). Applied
Semantics' Oingo provided very effective natural language searches in limited
domains but it too was quickly commercialised. It was acquired by Google in
April 2003 to drive their ad products. (See Google Buys Applied Semantics)
Beyond Google... Rumours persist of work being done by Yahoo! and Microsoft
(MSN) to supplant Google (See MSN launches revamped search engine and Yahoo!
Search has a fresh, new look), and claims are made about social networking
search technologies such as those employed by Eurekster, Orkut, Ryze,
LinkedIn, delicious, and Furl, but none of these appears to be in a position
yet to effect a dramatic shift in web searching. (See accounts of Tim
Berners-Lee's Semantic Web for more measured projections of future search
technologies)
But wait. Search engines don't tell us the full story. There are many tried
and true websites for searchers. Here is a selection of resources which can be
consulted immediately when appropriate:
DIRECTORIES: Keyword searching ensures maximum recall but often finds far too
many hits to check easily, and some of those found have limited relevance. The
hierarchical subject directories, on the other hand, were usually produced by
human indexers and consequently excluded much of the ephemeral, the unreliable
and the purely commercial sites. Whilst these are now under challenge from the
clustering search engines (see above), many remain key resources, for example:
Beyond...the Black Stump, which includes Australiana and Search by ISBN (compare the prices of in-print and out-of-print books at 14 online bookstores)
BUBL LINK / 5:15 Catalogue of Selected Internet Resources
Gary Price's List of Lists (and see his weekly newsletter)
The World Wide Web Virtual Library (WWW-VL): oldest catalog of the web, by Tim Berners-Lee, the creator of HTML and the web itself
About - The Human Internet [formerly called the Mining Company]: directory/portal that neatly organizes thousands of topics, with good news and commentary
About.com Closed Guide Relocation Directory and Assistance Links: designed to help editors relocate their pages and users find the pages that have moved
INFOMINE scholarly Internet resource collections
Internet Public Library - see their Pathfinders
Librarians' Index to the Internet. See LII Theme Collection: The Olympic Games, Librarianship & California and Washington Wine
SPECIALISED SEARCH TOOLS
Amazon.com "Search Inside the Book": results list authors and titles, "excerpt from" and the hyperlinked title of the book...FAQ
Cached websites: Gigablast, Wayback Machine, Daypop, IncyWincy, Yuntis (See also Finding Old Web Pages)
MESA - Meta-Email-Search-Agent
PINAKES, A Subject Launchpad
Voice of the Shuttle (University of California, Santa Barbara): one of the few comprehensive research subject lists with a humanities orientation
SurfWax Enterprise / SurfWax Scholar / SurfWax LawKT (Knowledge Tools) by subscription
SEARCH PORTALS
Fagan Finder - search engines, reference, tools, and more...Biography page...Quotations and Proverbs Search
Pandia Powersearch: All-in-One List of Search Engines
DATABANKS &/OR DIGITAL LIBRARIES
Encyclopedia Britannica: The 1911 Edition
Jewish Encyclopedia.com
New Advent Catholic Encyclopedia
Nonverbal Dictionary of Gestures, Signs & Body Language Cues
Official history of Australia in the war of 1914–1918
Guardian Archive (since 1899)
Home Economics Archive: Research, Tradition, History
Old Car Manual Project
Spectator Text Project: published by Joseph Addison and Richard Steele from 1711 to 1714
Technology in Australia 1788-1988
INTERNET ARCHIVES
Scout Report Archives
The Coombsweb is the world's oldest and most prominent Asian Studies online research facility. Its Web pages are designed for transmission speed, not fancy looks.
Alan Lomax Archive...Audio Archive...Film and Videotape Archive...Paper Archive...Photograph Collection
BBC World Service Archive: international news, analysis and information in English and 42 other languages (See also BBC Audio Interviews)
AUSTRALIAN DATABANKS & GUIDES
The AusAnthrop Database On Line
AusStage gateway to Australian performing arts
Australian Cooperative Digitisation Project, 1840-45
Australian Digital Theses Program - CAUL
Historical Australian Acts (none earlier than 1973)
Mining in Australia
Social Health Atlas of Australia
Womens Weekly Index Database
See also NLA's Electronic Australiana, Charles Sturt University Regional Archives and SLNSW's Aboriginal Australian links
SOUTH AUSTRALIAN DATABANKS & GUIDES
Atlas of South Australia
Ground Truth: a community resource guide to the human & environmental history of South Australia; biogeographical regions, local government areas (LGA), coastal and marine mapping, aboriginal history
SASS: South Australian Sources for History and Social Science. Brian Condon's 30 year compilation includes relevant theses
South Australian Police Historical Society
CONTACTS
Free Pint Bar
International Rivers Network
Medical Expert Witness Database: Green MedicoLegal Ltd
NGO Global Network
OZLISTS: A list of Australian electronic mailing lists
Philanthropy Australia
Pitsco's Ask an Expert
Yearbook of Experts, Authorities and Spokespersons
KEEPING UP-TO-DATE WITH WEB RESOURCES
Gary Price's ResourceShelf
Freepint newsletter
...Beyond the Black Stump newsletter
BUBL News
BUBL LINK Updates
NSDL Scout Reports
ResearchBuzz.com
Phil Bradley's weblog: Internet searching, web design, search engine developments and anything that will interest librarians!
Library Clips Web 2.0 oriented search blog
Top 100 Alternative Search Engines, March 2007; Feb 2007; Jan 2007
Read/WriteWeb: Web 2.0 weblog ranked among Technorati’s Top 50 blogs in the world...web technology news, reviews & analysis
Search Month is a monthly newsletter that recaps stories covered on Search Engine Land over the past month.
Check for others at Google Groups and Yahoo! Groups, OZLISTS and Internet
Resources Newsletter: Internet In Print Index