
The Training Book, the handbook for trainers



ITrain - International Association of Information Technology Trainers

Search Engines Pick Up Everything

Hmm, that looks like an interesting search hit!


ITINFO Sponsor

Certification Required.

Trainers need certification to get ahead. Professionalization and certification open doors to greater training opportunities and higher earnings.

The first step to PTT certification is to successfully complete the Train the Trainer Advanced Seminar & Conference. This 2-day event will hone your training skills beyond what you may have imagined possible. And it makes you eligible to apply for Professional Technical Trainer certification.

The seminars are offered at least once each month. Register now, get professionalized, and get ahead.

Seminar details
Certification details



Search Engines Also Record Private Data

by Dave Murphy
ISSN 1535-3613

Dave Murphy, DGL President & ITrain founder

Search engine spiders, the software programs that crawl through millions of webpages each day indexing new and modified pages, are finding more than webmasters would like. The spiders will record just about everything they find, including passwords, credit card numbers, classified documents, and other private information that the sites' webmasters never intended to be indexed.

Most popular search engines, such as Google, AltaVista, HotBot, Lycos, and Northern Light, will pick up webpages created in HTML (HyperText Markup Language), ASCII text, and, increasingly, PDF (Adobe's Portable Document Format). Unless documents are secured in protected directories or are excluded by a "robots.txt" instruction file on the website, the search engine's crawling bots will read the documents and add them to a master index that anyone with access to the Internet can search.
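As a rough illustration of how a standards-compliant crawler consults robots.txt before reading a page, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URLs, and the "ExampleBot" agent name are all hypothetical and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /orders/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant bot named `agent` is allowed to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
print(may_crawl(parser, "http://example.com/index.html"))
print(may_crawl(parser, "http://example.com/private/report.pdf"))
```

The check is voluntary: it runs inside the crawler, so it only protects a file from bots that choose to perform it.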

Recently, webmasters have found that other document formats are showing up in the major search engines: word processor files, spreadsheets, graphics, and other binary files that were posted to websites for easy access by authorized employees.

In most instances when sensitive data turns up in the search engine databases, it's the fault of an untrained web designer. Webmasters frequently use CGI (Common Gateway Interface) scripts to execute commands behind the scenes of a website. Unless the CGI programmer is aware of potential security vulnerabilities in his script, he may be leaving a gaping hole in the site's security. For example, a CGI script that collects and stores credit card data in an unprotected ASCII (American Standard Code for Information Interchange) file may leave the data open to a search engine's crawler. Using a MySQL database on a separate server and a web interface such as PHP, both of which are available for free, would add a layer of security to the credit card data that would prevent search engines from locating and indexing the data.
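One simple check for the mistake described above is whether a script's data file sits underneath the web server's document root, where a crawler could reach it by URL. The sketch below is only an illustration of that check, not a complete security audit; the function name and both example paths are hypothetical.

```python
from pathlib import Path


def is_inside_docroot(data_file: str, docroot: str) -> bool:
    """Return True if data_file resolves to a location under docroot.

    A file under the document root can usually be requested by URL,
    so a crawler that discovers the URL can read and index it.
    """
    file_path = Path(data_file).resolve()
    root_path = Path(docroot).resolve()
    return root_path == file_path or root_path in file_path.parents


# Hypothetical locations: the first is served by the web server, the second is not.
print(is_inside_docroot("/var/www/html/cgi-data/orders.txt", "/var/www/html"))
print(is_inside_docroot("/var/app-data/orders.txt", "/var/www/html"))
```

Keeping the data outside the document root, or in a database on a separate server as suggested above, means there is no URL for a crawler to request in the first place.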

Dave's Opinion

I'm careful to check out online retailers before I enter any private information on their websites. Often, I'll call the retailer and get a feel for how they do business. I often ask to talk to their webmaster and ask about his security practices. A few rules I follow: 1) try to buy only from large retailers, 2) check references before making my first purchase, 3) add my office address as a second shipping address to my credit card, and 4) have all shipments delivered to the office.

And, if you're thinking that the robots.txt file will solve all your problems, consider this: the robots.txt file will only turn away crawling bots that comply with the standard, and not all do. Also, the robots.txt file can be a clue to crackers as to which directories may hold the more interesting files.
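To see why robots.txt can act as a clue, note that the file is public and its Disallow lines name exactly the directories the webmaster wants hidden. The following minimal sketch lists those paths from a robots.txt body; the sample content and directory names are hypothetical.

```python
from typing import List

# Hypothetical robots.txt body; the directory names are illustrative only.
SAMPLE_ROBOTS_TXT = """\
# keep crawlers out of internal areas
User-agent: *
Disallow: /private/
Disallow: /orders/   # order records
Disallow:
"""


def disallowed_paths(robots_text: str) -> List[str]:
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw_line in robots_text.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(SAMPLE_ROBOTS_TXT))
```

Anyone can run the same few lines against a site's robots.txt, which is why the file should complement access controls on sensitive directories rather than replace them.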

Creating a secure website takes a bit of knowledge and a bit of skill.

Call for Comments

What do you think? Leave your comments on the message center.

References

Google
AltaVista
HotBot
Lycos
Northern Light
HTML
CGI
MySQL
PHP
Message Center


Damar Group, Ltd. helps business use technology.

ITINFO is again accepting sponsors. Sponsor messages are included in ITINFO's email newsletter and are permanently posted to DGL's website and online reference areas.

ITINFO is an electronic publication of Damar Group, Ltd., publisher of Training Express computer learning guides. Comments and submissions to info@dgl.com.

Previous issues are on our website at http://dgl.com/itinfo/.

updated November 26, 2001
http://dgl.com/itinfo/2001/it011126.html

Copyright © 2001, Damar Group, Ltd., All Rights Reserved