Author |
Message |
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
|
Posted:
Mon Apr 15, 2002 10:56 pm (21 years, 11 months ago) |
|
I want search engines to include only my Home Page when they crawl my website; I don't want any other pages to be added. I heard about robots.txt, which seemed great, but the following method would have taken me ages because of the number of pages I have:
Code: | User-agent: *
Disallow: /folder1/
Disallow: /folder2/
Disallow: /folder3/
Disallow: /file1.html etc......
|
Can it be done in an easier way, without me having to list every page in the robots.txt file or place a piece of code in each page I don't want included?
Thanks |
|
|
|
|
Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
|
Posted:
Tue Apr 16, 2002 7:23 am (21 years, 11 months ago) |
|
Quote: | To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/docs/ |
From:
http://www.robotstxt.org/wc/exclusion-admin.html
OR......
Code: | <html>
<head>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="This page ....">
<title>...</title>
</head>
<body>
...
</body>
</html> |
See here for more details:
http://www.robotstxt.org/wc/meta-user.html |
|
|
|
|
Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
|
Posted:
Tue Apr 16, 2002 7:27 am (21 years, 11 months ago) |
|
Just found this which contradicts what the other site says - maybe this is newer?
Quote: | The record starts with one or more User-agent lines, specifying which robots the record applies to, followed by "Disallow" and "Allow" instructions to that robot. To evaluate if access to a URL is allowed, a robot must attempt to match the paths in Allow and Disallow lines against the URL, in the order they occur in the record. The first match found is used. If no match is found, the default assumption is that the URL is allowed. For example:
User-agent: webcrawler
User-agent: infoseek
Allow: /tmp/ok.html
Disallow: /tmp |
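That first-match rule can be tried out locally with Python's standard urllib.robotparser, which understands Allow lines (a quick sketch; example.com is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The record from the quote above, verbatim.
rules = """User-agent: webcrawler
User-agent: infoseek
Allow: /tmp/ok.html
Disallow: /tmp""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# /tmp/ok.html matches the Allow line first, so it may be fetched.
print(rp.can_fetch("webcrawler", "http://example.com/tmp/ok.html"))   # True
# Anything else under /tmp hits the Disallow line.
print(rp.can_fetch("infoseek", "http://example.com/tmp/secret.html")) # False
# A robot named in no record falls back to the default: allowed.
print(rp.can_fetch("otherbot", "http://example.com/tmp/secret.html")) # True
```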
See for more details:
http://www.wdvl.com/Location/Search/Robots.html |
|
|
|
|
Daniel
Team Member
Joined: 06 Jan 2002
Posts: 2564
|
Posted:
Tue Apr 16, 2002 9:03 am (21 years, 11 months ago) |
|
I think both methods will work; they're just different ways of doing things. |
________________________________
|
|
|
|
Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
|
Posted:
Tue Apr 16, 2002 9:14 am (21 years, 11 months ago) |
|
When I said they contradicted each other, I was referring to the information about robots.txt: one site said there was no 'Allow' field, and the other said there was. |
|
|
|
|
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
|
Posted:
Tue Apr 16, 2002 9:31 am (21 years, 11 months ago) |
|
So, how should I write it to allow only my Home Page? Something like:
Code: | User-agent: *
Allow: index.htm
|
Thanks |
|
|
|
|
Daniel
Team Member
Joined: 06 Jan 2002
Posts: 2564
|
Posted:
Tue Apr 16, 2002 9:35 am (21 years, 11 months ago) |
|
Try that and see if it works, but personally I would use the <meta> tag (not that it's better). |
________________________________
|
|
|
|
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
|
Posted:
Tue Apr 16, 2002 1:04 pm (21 years, 11 months ago) |
|
But the <meta> tag is not very widely accepted. Why would you pick this option, Daniel?
Thanks |
|
|
|
|
Daniel
Team Member
Joined: 06 Jan 2002
Posts: 2564
|
Posted:
Tue Apr 16, 2002 6:22 pm (21 years, 11 months ago) |
|
Don't know why. I didn't know it wasn't widely accepted. Where did you find that out? |
________________________________
|
|
|
|
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
|
Posted:
Tue Apr 16, 2002 9:27 pm (21 years, 11 months ago) |
|
|
|
|
Peter
Team Member
Joined: 09 Jan 2002
Posts: 147
Location: UK
|
Posted:
Wed Apr 17, 2002 12:26 pm (21 years, 11 months ago) |
|
Richy wrote: | So, how should I write it to allow only my Home Page? Something like:
Code: | User-agent: *
Allow: index.htm
|
Thanks |
Try:
Code: | User-agent: *
Allow: /index.htm
Disallow: /
|
Which should block everything apart from /index.htm. Note that the Allow line comes first, since robots use the first rule that matches, and that paths in robots.txt should start with a slash.
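If you want to sanity-check rules like these before uploading them, Python's standard urllib.robotparser can evaluate them locally (a quick sketch; example.com stands in for your own domain):

```python
from urllib.robotparser import RobotFileParser

# Allow only the home page; the Allow line comes first so that
# robots using first-match evaluation see it before Disallow: /
rules = """User-agent: *
Allow: /index.htm
Disallow: /""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("anybot", "http://example.com/index.htm"))       # True
print(rp.can_fetch("anybot", "http://example.com/folder1/a.html"))  # False
```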
But the question is why should you want to do this?
Peter. |
________________________________ Maple Design - quality web design and custom programming |
|
|
|
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
|
Posted:
Wed Apr 17, 2002 4:32 pm (21 years, 11 months ago) |
|
Another one of my sites is going through a major change. The only page currently ready is the Home Page, but building the rest of the new site will be a lengthy process, and I want my site to be recorded on the search engines as soon as possible. As time goes on, I will include the new pages as I add them to the site.
Thanks for everybody's help |
|
|
|
|
Richard
WebHelper
Joined: 12 Jan 2002
Posts: 93
Location: Herts, UK
|
Posted:
Sun Apr 21, 2002 10:56 am (21 years, 11 months ago) |
|
Another question .. Will :
be accepted by the search engines ?
Thanks |
|
|
|
|
Darren
Team Member
Joined: 05 Feb 2002
Posts: 549
Location: London
|
Posted:
Sun Apr 21, 2002 12:48 pm (21 years, 11 months ago) |
|
Not sure, but why not have all your images in one directory, and do this:
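(A sketch, assuming the directory is named /images/ - substitute whatever name you actually use:)
Code: | User-agent: *
Disallow: /images/ |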
Keeps your files nice and tidy as well |
|
|
|
|
Justin
4WebHelp Addict
Joined: 07 Jan 2002
Posts: 1060
|
Posted:
Sun Apr 21, 2002 12:53 pm (21 years, 11 months ago) |
|
And if you want to be listed by the major search directories, use a meta description tag: we give preference to sites that include one, as it makes our life easier than having to write a description for you. But remember NOT to use hype, or we won't list you... |
|
|
|
|
|