Gentoo Forums
The forums and Google. Episode II
juantxorena
Apprentice

Joined: 19 Mar 2006
Posts: 201
Location: The Shire

Posted: Sat May 16, 2009 10:01 am

Just a reminder to everybody: the forum is still useless.

Also, I have made a surprising discovery: in the Google search ban thread, which is now locked (so I can't say this there), somebody said that the reason for blocking Google's bots was that they stressed the Bugzilla database, or something like that, so the forums banned them. Now I have found that while Google search is still blocked for the forum, the Bugzilla database is searchable with Google (search for anything in Google using "site:bugs.gentoo.org", and compare the results to those from searching "site:forums.gentoo.org"). This is even more stupid than the whole forums suckiness stuff.

Who is responsible for this nonsense?
_________________
I cannot write English very well. Please, correct any mistake so that I can improve.

desultory
Bodhisattva

Joined: 04 Nov 2005
Posts: 9410

Posted: Sun May 17, 2009 6:35 am

Split from "new search stopwords list".

juantxorena wrote:
Also, I have made a surprising discovery: in the Google search ban thread, which is now locked (so I can't say this there), somebody said that the reason for blocking Google's bots was that they stressed the Bugzilla database, or something like that, so the forums banned them.
Two things need to be clarified in that summary. Something using that user agent identity was part of an incident with bugs.gentoo.org, and possibly other Gentoo sites; whether it was actually Google or not has never been made clear to me. Blocking GoogleBot from the forums was done neither via the forums administration interface nor by any member of the forums team; it is not a ban, it is blocking in a manner which the forums lack provision to remove.
juantxorena wrote:
Now I have found that while Google search is still blocked for the forum, the Bugzilla database is searchable with Google (search for anything in Google using "site:bugs.gentoo.org", and compare the results to those from searching "site:forums.gentoo.org"). This is even more stupid than the whole forums suckiness stuff.
Set a browser to claim to be GoogleBot, then, making sure that the address you are browsing from does not map to a legitimate address for a real GoogleBot, try browsing bugs.gentoo.org and forums.gentoo.org; avoid posting until the laughter and swearing have subsided.
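
For readers wondering what "a legitimate address for a real GoogleBot" means in practice, here is a minimal Python sketch of the reverse-then-forward DNS check that Google publicly describes for verifying its crawlers. It is not what the Gentoo infrastructure team ran; the function name, the injectable lookup parameters, and the domain suffix list are assumptions made for illustration, and the default lookups only cover IPv4.

```python
import socket

# Assumption: host name suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google host name that
    resolves forward to the same address again.

    The two lookup callables are hypothetical hooks so the check can be
    exercised without touching the network.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record: cannot be verified

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # PTR record points somewhere else entirely

    try:
        # A spoofed PTR record fails here: the claimed host name does
        # not resolve back to the address the request came from.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A request that merely sets its User-Agent header to Googlebot's fails this check, which is the situation the paragraph above invites you to try.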
juantxorena wrote:
Who is responsible for this nonsense?
That was addressed in the previous topic.

In short, this is a known problem and it will be addressed properly as soon as it can be, no sooner.

muhsinzubeir
l33t

Joined: 29 Sep 2007
Posts: 948
Location: /home/muhsin

Posted: Mon May 18, 2009 10:30 am

I thought maybe this forum search should be replaced with Google Custom Search... does anyone think it might be a good idea?
_________________
~x86
p5k-se
Intel Core 2 Duo
Nvidia GT200
http://www.zanbytes.com

gentoo-dev
Apprentice

Joined: 24 Jan 2006
Posts: 172

Posted: Mon May 18, 2009 10:53 am

muhsinzubeir wrote:
I thought maybe this forum search should be replaced with Google Custom Search... does anyone think it might be a good idea?
That would only work if google was allowed to index the content of forums.g.o in the first place.
Google has been banned for no real reason. Allowing it back is a five-minute fix, but no, Gentoo devs would rather lock the thread than actually help. https://forums.gentoo.org/viewtopic-t-711943.html

lordcris
Apprentice

Joined: 09 Jul 2002
Posts: 248

Posted: Fri May 22, 2009 11:51 am

c'mon man!!!!!
enable googlebot indexing.
forums are dying.
i've noticed a HUGE decline in users over the last months.
why are you doing this to my favorite distribution?
you should all be ashamed of yourselves.
it has been more than a year that this shit is going on ...

pilla
Bodhisattva

Joined: 07 Aug 2002
Posts: 7729
Location: Underworld

Posted: Fri May 22, 2009 12:24 pm

lordcris wrote:
c'mon man!!!!!
enable googlebot indexing.
forums are dying.
i've noticed a HUGE decline in users over the last months.
why are you doing this to my favorite distribution?
you should all be ashamed of yourselves.
it has been more than a year that this shit is going on ...


Forums are dying? Do you have numbers to back up your "HUGE decline"? Cut the FUD.
_________________
"I'm just very selective about the reality I choose to accept." -- Calvin

M
Guru

Joined: 12 Dec 2006
Posts: 432

Posted: Fri May 22, 2009 1:06 pm

Maybe they are not dying, but they will die if this continues. How can someone find out that Gentoo even has forums (or had)? From the Gentoo home page maybe, if the home page doesn't scare them away. Numbers? FUD? People here expect a better answer; the only number we need is an estimate of the days (maybe minutes or seconds) needed to solve this. I don't get it: people everywhere are doing the best they can so that Google can index their sites better, but no, Gentoo people decided they don't need Google. What happened to that squid proxy? Can someone just edit that robots.txt file, please?

kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Fri May 22, 2009 3:12 pm

lordcris wrote:
c'mon man!!!!!
enable googlebot indexing.
forums are dying.
i've noticed a HUGE decline in users over the last months.
why are you doing this to my favorite distribution?
you should all be ashamed of yourselves.
it has been more than a year that this shit is going on ...


++

M wrote:
Maybe they are not dying, but they will die if this continues. How can someone find out that Gentoo even has forums (or had)? From the Gentoo home page maybe, if the home page doesn't scare them away. Numbers? FUD? People here expect a better answer; the only number we need is an estimate of the days (maybe minutes or seconds) needed to solve this. I don't get it: people everywhere are doing the best they can so that Google can index their sites better, but no, Gentoo people decided they don't need Google. What happened to that squid proxy? Can someone just edit that robots.txt file, please?


++

the problem is that gentoo-related problems are not appearing that often anymore in google, and this is to a big extent caused by fgo not showing up ...

also, what is the purpose of making it significantly more complicated to search for or find solutions to one's problems?

letting a search engine do full-text indexing is the way to go, or at least improve the search function so that it doesn't cut 95% of all search words

thanks :)
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

bunder
Bodhisattva

Joined: 10 Apr 2004
Posts: 5934

Posted: Fri May 22, 2009 11:52 pm

pilla wrote:
lordcris wrote:
c'mon man!!!!!
enable googlebot indexing.
forums are dying.
i've noticed a HUGE decline in users over the last months.
why are you doing this to my favorite distribution?
you should all be ashamed of yourselves.
it has been more than a year that this shit is going on ...


Forums are dying? Do you have numbers to back up your "HUGE decline"? Cut the FUD.


while i don't have any concrete proof, i would certainly bet that fgo usage has gone down since we disappeared from search engines. if you want to call that fud, so be it. :?
_________________
Neddyseagoon wrote:
The problem with leaving is that you can only do it once and it reduces your influence.

banned from #gentoo since sept 2017

desultory
Bodhisattva

Joined: 04 Nov 2005
Posts: 9410

Posted: Sat May 23, 2009 1:06 am

Moved the above seven posts from "Gentoo forum search sucks".

d2_racing
Bodhisattva

Joined: 25 Apr 2005
Posts: 13047
Location: Ste-Foy,Canada

Posted: Sat May 23, 2009 1:08 am

If they solve the problem, it will be good for everybody :P

hitachi
Guru

Joined: 20 Feb 2006
Posts: 478
Location: Freiburg / Deutschland

Posted: Thu Jun 04, 2009 8:55 am

It looks like "whatever site:forums.gentoo.org" is working again. I tested it with Google and Bing. Can anyone confirm that?

think4urs11
Bodhisattva

Joined: 25 Jun 2003
Posts: 6659
Location: above the cloud

Posted: Thu Jun 04, 2009 10:24 am

hitachi wrote:
Can anyone confirm that?

(Unofficial answer) It seems that we have been indexed by Google again since yesterday.
_________________
Nothing is secure / Security is always a trade-off with usability / Do not assume anything / Trust no-one, nothing / Paranoia is your friend / Think for yourself

M
Guru

Joined: 12 Dec 2006
Posts: 432

Posted: Thu Jun 04, 2009 10:37 am

Whouu, nice, we are again first for the term "forums"; that googlebot is really fast.

Akkara
Bodhisattva

Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Thu Jun 04, 2009 11:24 am

A note of thanks to everyone who had their hand in re-enabling googlebot: thanks!

M wrote:
Whouu, nice, we are again first for the term "forums"; that googlebot is really fast.

Let's hope it is not *too* fast, lest it overloads the forums system and gets blocked again. :)

Hmm... I wonder if there's a way to rate-limit to specific destinations.

bunder
Bodhisattva

Joined: 10 Apr 2004
Posts: 5934

Posted: Thu Jun 04, 2009 12:01 pm

actually, for the record... KoT noticed it the day we turned it back on (which was actually May 29th)... i was hoping for an official comment from the remaining staff, so i asked him to redact his kudos. long story short, we thought we might have to turn it off again... and for all i know, we still might. *shrugs*
_________________
Neddyseagoon wrote:
The problem with leaving is that you can only do it once and it reduces your influence.

banned from #gentoo since sept 2017

desultory
Bodhisattva

Joined: 04 Nov 2005
Posts: 9410

Posted: Fri Jun 05, 2009 4:25 am

hitachi wrote:
Can anyone confirm that?
I can, officially if needs be. Along with the other common search engine spiders, Googlebot has been allowed back in for very slightly over a week at this point.

This was deliberately not announced publicly as the index of the forums that Google is using includes only a small fraction of the actual content present in the forums. The intention was to make such an announcement once some additional measures were in place to attempt to provide a more comprehensive view of the contents of the forums to Google and other search engines so that they could provide more meaningful search results.

As it turns out there was an actual problem with respect to Google and other spiders indexing the forum; having been allowed to run searches they did so to the point of consuming all available memory on the front end. Such behavior is no longer allowed and any bots doing so with sufficient frequency to potentially cause problems will be blocked, either from searching or outright. So far this seems to be working well enough in terms of resource usage, though if necessary further restrictions will be put in place or current restrictions may simply be more strongly enforced.

Naturally, this is all subject to resource availability and as such subject to change.
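
The post does not say how over-eager bots are detected or blocked. One common way to express "too many searches in too short a time" is a per-client token bucket; the sketch below only illustrates that idea in Python, and the class names, the rates, and the choice of client key are all hypothetical.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SearchLimiter:
    """Keep one bucket per client key (for example an address or user agent)."""

    def __init__(self, rate=0.2, capacity=3, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client):
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

With these made-up defaults a client may run three searches back to back and then roughly one every five seconds, while other clients are unaffected; a request for which `allow()` returns False would be refused rather than executed.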

neysx
Retired Dev

Joined: 27 Jan 2003
Posts: 795

Posted: Fri Jun 05, 2009 5:55 am

desultory wrote:
hitachi wrote:
Can anyone confirm that?
As it turns out there was an actual problem with respect to Google and other spiders indexing the forum; having been allowed to run searches they did so to the point of consuming all available memory on the front end. Such behavior is no longer allowed and any bots doing so with sufficient frequency to potentially cause problems will be blocked, either from searching or outright. So far this seems to be working well enough in terms of resource usage, though if necessary further restrictions will be put in place or current restrictions may simply be more strongly enforced.

Naturally, this is all subject to resource availability and as such subject to change.
Tweak robots.txt. It's trivial to do and has been explained ad nauseam in threads about this issue, but looking at https://forums.gentoo.org/robots.txt
Code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /search.php
Disallow: /admin/
Disallow: /memberlist.php
Disallow: /groupcp.php
Disallow: /statistics.php
Disallow: /profile.php
Disallow: /privmsg.php
Disallow: /login.php
nothing's been done so far to limit bots (all are allowed) or limit the number of hits they are allowed to make. Before you switch it off again, I suggest editing robots.txt first...
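
To see what the quoted robots.txt actually permits, the rules can be fed to Python's standard urllib.robotparser. This is a small sketch, not the forum's tooling: the ExampleBot group with a Crawl-delay line is a hypothetical addition showing how a hit limit could be requested, and Crawl-delay is a non-standard directive that crawlers are free to ignore.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search.php
Disallow: /admin/
Disallow: /memberlist.php
Disallow: /groupcp.php
Disallow: /statistics.php
Disallow: /profile.php
Disallow: /privmsg.php
Disallow: /login.php
"""

# Hypothetical extra group asking one named bot to slow down.
THROTTLED_TXT = ROBOTS_TXT + """
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /search.php
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


current = build_parser(ROBOTS_TXT)
base = "https://forums.gentoo.org"
# Search result pages are off limits to compliant crawlers...
print(current.can_fetch("Googlebot", base + "/search.php?search_id=newposts"))
# ...while ordinary topic pages remain crawlable.
print(current.can_fetch("Googlebot", base + "/viewtopic-t-711943.html"))

throttled = build_parser(THROTTLED_TXT)
# Seconds between requests requested of the hypothetical ExampleBot.
print(throttled.crawl_delay("ExampleBot"))
```

Under the quoted rules every user agent is still admitted to topic pages; only the listed scripts, search.php among them, are excluded for crawlers that honor robots.txt.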

Akkara
Bodhisattva

Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Fri Jun 05, 2009 6:13 am

[Context: I don't know anything about web serving, hosting, etc.]

desultory wrote:
[...] having been allowed to run searches they did so to the point of consuming all available memory on the front end. [...]

Do you mean to say that, besides following links and indexing the static pages, they were trying various search terms in the "search" box? And if so, any idea why? If they spidered all the static pages, what more would there be that 'search' would return, that would warrant them to try it? (And with what keywords even, how does a bot pick and choose?)

desultory
Bodhisattva

Joined: 04 Nov 2005
Posts: 9410

Posted: Fri Jun 05, 2009 7:02 am

neysx wrote:
Tweak robots.txt.
What do you think the first thing done to limit well behaved spiders was?
neysx wrote:
It's trivial to do and has been explained ad nauseam in threads about this issue, but looking at https://forums.gentoo.org/robots.txt
Somehow, I noticed.
neysx wrote:
nothing's been done so far to limit bots (all are allowed) or limit the number of hits they are allowed to make.
That being rather the point of the whole exercise, to be as permissive as resources allow.
neysx wrote:
Before you switch it off again, I suggest editing robots.txt first...
Neither I personally nor any other member of the forum staff was involved in blocking Google in the first place. Given that it took approximately a year to get this far, what leads you to infer that it would be blocked again lightly?

Akkara wrote:
Do you mean to say, that besides following links and indexing the static pages, they were trying various search terms in the "search" box?
Not exactly, they would just follow links to search results. It is not a matter of deliberately searching for things, they were simply following links that they had encountered elsewhere. That it triggered a search on this site was entirely inconsequential so far as the spiders were concerned.
Akkara wrote:
If they spidered all the static pages, what more would there be that 'search' would return, that would warrant them to try it?
In practice nothing is gained by third party search engines running searches on the forums because they are supposed to index everything the search function would return results for anyway, which is why disallowing searches by spiders is such an acceptable solution.

kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Sat Jun 06, 2009 12:45 pm

desultory wrote:

Akkara wrote:
Do you mean to say, that besides following links and indexing the static pages, they were trying various search terms in the "search" box?
Not exactly, they would just follow links to search results. It is not a matter of deliberately searching for things, they were simply following links that they had encountered elsewhere. That it triggered a search on this site was entirely inconsequential so far as the spiders were concerned.


would it be much of a deal then to disable the forum's own search functionality (-> load) and leave everything to the search engines ?

or even go so far as to use google's search engine exclusively and lock the others out

that way the load would be rather acceptable, I suppose

thanks, btw, VERY VERY MUCH for enabling search indexing by search engines again :D
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

muhsinzubeir
l33t

Joined: 29 Sep 2007
Posts: 948
Location: /home/muhsin

Posted: Sat Jun 06, 2009 1:16 pm

sweet...:D :D :D :D :D :D :D
_________________
~x86
p5k-se
Intel Core 2 Duo
Nvidia GT200
http://www.zanbytes.com

desultory
Bodhisattva

Joined: 04 Nov 2005
Posts: 9410

Posted: Sun Jun 07, 2009 8:02 am

kernelOfTruth wrote:
would it be much of a deal then to disable the forum's own search functionality (-> load) and leave everything to the search engines ?
While it would be possible to do, it is highly unlikely that the forums would be allowed to go without an integrated search engine.

Reasons include, but are not limited to:
  • The demonstrated ease with which external search engines can be blocked and the difficulty and most especially the delay involved in restoring their access.
  • The current poor coverage of known external indexes of the forums.
  • Embedding advertising in core site functions is almost certainly a nonstarter, equally so getting funding to pay thousands of dollars per year to avoid them. Not that I have contacted Google regarding embedding their search engine in the forums, nor do I intend to.
  • Closely integrated or fully internal search engines are not subject to the intrinsic lag involved with using an external spider fed search engine.
  • Other measures to improve site search are being explored.

Just to reiterate, regular numbers of searches run by users do not seem to be a problem, just repeated heavy usage of the search functions by spiders.
kernelOfTruth wrote:
or even go so far as to use google's search engine exclusively and lock the others out
There are no plans to block any well behaved spiders, in part because, as it is, any well behaved spider should impose little more load than a few regular users could be expected to generate.
kernelOfTruth wrote:
that way the load would be rather acceptable, I suppose
As it is, the load seems to be adequately sustainable; points of concern should be soluble by analysis of the logs.
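
As a rough illustration of the kind of log analysis mentioned above, the following Python sketch counts search.php requests per user agent. It assumes the web server writes the common "combined" access log format; the forum's real log format is not stated in this thread, and the sample lines and the function name are made up.

```python
import re
from collections import Counter

# Assumed "combined" access log layout:
# ip ident user [time] "METHOD path protocol" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def search_hits_by_agent(lines, prefix="/search.php"):
    """Count requests whose path starts with `prefix`, keyed by user agent."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(prefix):
            hits[match.group("agent")] += 1
    return hits


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [05/Jun/2009:07:02:00 +0000] '
    '"GET /search.php?search_id=newposts HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [05/Jun/2009:07:02:01 +0000] '
    '"GET /search.php?search_id=unanswered HTTP/1.1" 200 4096 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [05/Jun/2009:07:02:02 +0000] '
    '"GET /viewtopic-t-711943.html HTTP/1.1" 200 20480 "-" "Mozilla/5.0"',
]

for agent, count in search_hits_by_agent(SAMPLE).most_common():
    print(count, agent)
```

Agents that dominate such a count are the natural candidates for a closer look, whether the remedy is a robots.txt rule or an outright block.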

Dont Panic
Guru

Joined: 20 Jun 2007
Posts: 322
Location: SouthEast U.S.A.

Posted: Wed Jun 10, 2009 4:50 am

Hooray!!!

I just did a Google search, and I am now receiving hits from the Gentoo Forums.

Example: http://www.google.com/search?hl=en&as_q=hitchhiker+sources&as_epq=&as_oq=&as_eq=&num=10&lr=&as_filetype=&ft=i&as_sitesearch=forums.gentoo.org&as_qdr=all&as_rights=&as_occt=any&cr=&as_nlo=&as_nhi=&safe=images

Thank You! Thank You! Thank You!

I really appreciate you guys working through this issue.

pilla
Bodhisattva

Joined: 07 Aug 2002
Posts: 7729
Location: Underworld

Posted: Wed Jun 10, 2009 10:59 am

Yes, we know by the number of spammers we have been receiving.
_________________
"I'm just very selective about the reality I choose to accept." -- Calvin

All times are GMT