08.27.07
Search Engines And The Federal Government
By David Vogelpohl
Like a page out of a John Grisham novel, the Federal Government is using robots to help stay invisible on the web.
Of course I'm not talking about futuristic robots with laser beams for eyes, but rather robots.txt files on various government websites.
A sharp-eyed Declan McCullagh of CNet recently wrote about several federal government websites that use robots.txt files to keep their entire sites from being indexed by search engines.
The offenders?
http://www.dni.gov/robots.txt
https://gits-sec.treas.gov/robots.txt
http://thomas.loc.gov/robots.txt
http://www.erl.noaa.gov/robots.txt
http://www.nwd.usace.army.mil/robots.txt
http://www.tricare.mil/robots.txt
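For anyone who hasn't looked inside one, a site-wide block takes only two lines. The sketch below uses a hypothetical file of that shape (modeled on the kind of exclusion Declan describes, not copied from any of the files listed above) and checks it with Python's standard urllib.robotparser module. The example.com URLs are placeholders. Any crawler that honors the file is refused every path on the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site-wide exclusion: every crawler ("*") is asked
# to stay away from every path ("/").
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Ask the parser whether a compliant crawler may fetch each page.
for agent in ("Googlebot", "msnbot", "AnyOtherBot"):
    for path in ("/", "/reports/annual.html"):
        url = "http://www.example.com" + path
        print(agent, path, parser.can_fetch(agent, url))
# prints False for every agent/path pair
```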
Declan also points out other government sites that use quirky robots.txt restrictions based on the bots they presumably prefer (for example, favoring MSN's bot over Google's).
So the question arises: is this the work of an inexperienced webmaster, or part of a broader government conspiracy to hide web content?
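Bot-specific rules work by giving each crawler its own User-agent record. The hypothetical file below, again parsed with Python's urllib.robotparser, lets MSN's crawler (which identified itself as msnbot) fetch everything while shutting out every other crawler, Googlebot included. The real rules on the sites Declan found may well differ; this only shows the mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot-specific rules: an empty Disallow means "allow all"
# for msnbot; the catch-all "*" record blocks everyone else.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/news/index.html"  # placeholder URL
# A named record is matched before the "*" record, whatever their order.
print("msnbot:", parser.can_fetch("msnbot/1.1", url))        # True
print("Googlebot:", parser.can_fetch("Googlebot/2.1", url))  # False
```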
Declan theorizes: "I can think of two reasons: (a) avoiding the situation of posting a report that turned out to be embarrassing and was discovered by Google and (b) letting the Feds modify a file such as a transcript without anyone noticing. (There have been allegations of the Bush administration altering, or at least creatively interpreting, transcripts before. I've documented how a transcript of a public meeting was surreptitiously deleted - and then restored.)"
While the theory that the feds are hiding web content in order to manipulate information is an attractive one, I find it hard to believe that this is the actual intent. Let's not forget that robots.txt is an entirely voluntary convention: it asks crawlers to stay out, but nothing on the server enforces it. Any person or bot that chooses to ignore robots.txt can freely access and save all the content in any publicly accessible area of these sites.
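To make the "voluntary" point concrete, here is a minimal sketch showing that the robots.txt check lives entirely inside the crawler. The fetch function is a hypothetical stand-in for an HTTP request, and polite_crawl and impolite_crawl are illustrative names, not any real crawler's code. The polite crawler consults robots.txt and fetches nothing; the impolite one simply skips the check and gets every page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site-wide exclusion, as on the sites discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET. A public web server
    answers every request it receives; it never consults robots.txt."""
    return f"<html>content of {url}</html>"

def polite_crawl(urls, user_agent, parser):
    """Fetch only the URLs that robots.txt permits for this agent."""
    return {u: fetch(u) for u in urls if parser.can_fetch(user_agent, u)}

def impolite_crawl(urls):
    """Fetch every URL; robots.txt is never even read."""
    return {u: fetch(u) for u in urls}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = ["http://www.example.com/", "http://www.example.com/reports/annual.html"]
print(len(polite_crawl(urls, "ExampleBot", parser)))  # 0
print(len(impolite_crawl(urls)))                      # 2
```

The only difference between the two crawlers is one `if`, which is the whole reason robots.txt can't serve as a security mechanism.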
Say, for example, CareerBuilder wanted to crawl and cache content from the site of the Office of the Director of National Intelligence (ODNI). Perhaps it would look something like this. Perhaps this content would also be accessible from a search engine and would look something like this.
I'm fairly certain that the ODNI is smart enough to know that robots.txt isn't a security mechanism. I personally feel that these robots.txt files are either the work of inexperienced webmasters or part of a misguided desire to reduce search engine visibility. If this is an effort to reduce search engine visibility, then that effort has failed. Many of the phrases I searched for from the ODNI website had been either quoted elsewhere or copied outright.
This means that the ODNI's original content is available through search engines, and, even worse, is available there only on web pages the ODNI doesn't control. By publishing content publicly while blocking search engine bots, the ODNI has effectively ceded control of its content to third parties, who may surround that content with politically motivated criticism.
About the Author: David is the Vice President of Marketing and Sales for Giganews, Inc., the world’s largest Usenet access company. For over six years, David has managed countless large-scale global Internet marketing campaigns in up to eight languages. David’s specialties include PPC management, affiliate channel sales, and Search Engine Optimization.
David grew up in Houston, TX, but now shares a home in Austin with his wife Lara. He has been involved in marketing Internet based services since 1996.