Gentoo Forums
new search stopwords list

Gentoo Forums Feedback
hds
Advocate


Joined: 21 Aug 2004
Posts: 2629
Location: Sprockhoevel [GER]

Posted: Thu Apr 14, 2005 10:29 am

I just realized that "screen" is a stopword. Could it be removed? There is an application called screen (everyone probably knows it), and I have answered questions about it at least two or three times.
kbranch
n00b


Joined: 17 Nov 2004
Posts: 40

Posted: Fri May 13, 2005 3:45 am

I can understand the reasons for having a list like this, but I'd say that having this many words on it is just overkill. Many of the words on the list are very useful under some circumstances.

For example, let's say you're having a problem compiling kde 3.4. Such a problem would usually mean that there's something wrong with the ebuild, so other users have likely seen the same problem and posted a fix for it. I'd expect a search for "kde 3.4 compile error" to turn up some useful posts within the first few results, but the current search would just turn up useless crap about kde 3.4.

I've run into the problem several times without knowing why I got useless results. I'd say that slightly slower servers would be better than people thinking that the forums are just full of crap.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 16104
Location: Colorado

Posted: Fri May 13, 2005 4:36 am

kbranch wrote:
the current search would just turn up useless crap about kde 3.4.
Actually, it'll just turn up useless crap about kde. "3.4" (or other similar numbers) aren't indexed. The stopwords list was generated based on how often the word is used. For example, "the" and "gentoo" are not likely to be helpful search words.
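To make this concrete, here is a minimal Python sketch of how a word-level indexer could reduce kbranch's query. It assumes a hypothetical stopword set ("the" and "gentoo" come from this post; "compile" and "error" are guesses consistent with the results kbranch describes) and assumes that any token made only of digits and dots counts as an unindexed number. It is an illustration, not the forum's actual code.

```python
import re

# Hypothetical stopword set: "the" and "gentoo" are from the post,
# "compile" and "error" are assumed for illustration.
STOPWORDS = {"the", "gentoo", "compile", "error"}

# Assumed rule: tokens made only of digits and dots (e.g. "3.4") are not indexed.
NUMERIC = re.compile(r"^[\d.]+$")

def index_terms(query: str) -> list[str]:
    """Return the words a word-level search would actually look up."""
    terms = []
    for token in query.lower().split():
        if NUMERIC.match(token):
            continue  # version-like numbers are skipped
        if token in STOPWORDS:
            continue  # stopwords were never indexed, so they cannot match
        terms.append(token)
    return terms

print(index_terms("kde 3.4 compile error"))  # ['kde']
```

Only "kde" survives, which is why the results are about kde in general.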
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
kbranch
n00b


Joined: 17 Nov 2004
Posts: 40

Posted: Fri May 13, 2005 5:02 am

pjp wrote:
kbranch wrote:
the current search would just turn up useless crap about kde 3.4.
Actually, it'll just turn up useless crap about kde. "3.4" (or other similar numbers) aren't indexed. The stopwords list was generated based on how often the word is used. For example, "the" and "gentoo" are not likely to be helpful search words.


Well, I guess that makes my example somewhat less than useful, but the underlying point still stands.

If the list was just automatically generated and there aren't any specific objections to removing the more useful words, I'd be glad to go through and find such words.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 16104
Location: Colorado

Posted: Fri May 13, 2005 5:04 am

Well, it wasn't a script that did it without human intervention. Someone looked at the worst offenders, and filtered out the obvious useful terms, such as kde.
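A rough sketch of that kind of process, assuming the candidates come from document frequency (the share of posts containing a word) and a human then supplies a keep list of useful terms. The 0.4 threshold, the function name, and the sample posts are all hypothetical:

```python
from collections import Counter

def stopword_candidates(posts, max_share, keep=frozenset()):
    """Suggest stopwords: words found in more than max_share of all posts.

    Words a human reviewer marked as useful (the `keep` set) are left out.
    """
    doc_freq = Counter()
    for post in posts:
        doc_freq.update(set(post.lower().split()))  # count each word once per post
    limit = max_share * len(posts)
    return sorted(w for w, n in doc_freq.items() if n > limit and w not in keep)

posts = [
    "the kde ebuild fails on gentoo",
    "gentoo kde theme question",
    "the screen session dies on gentoo",
    "kde panel crashes",
]
print(stopword_candidates(posts, max_share=0.4))               # ['gentoo', 'kde', 'on', 'the']
print(stopword_candidates(posts, max_share=0.4, keep={"kde"}))  # ['gentoo', 'on', 'the']
```

The human step is what keeps a frequent but meaningful word like "kde" searchable.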
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
kbranch
n00b


Joined: 17 Nov 2004
Posts: 40

Posted: Fri May 13, 2005 5:11 am

So then why are some of those words still on the list? Is there another side to things that I haven't seen in this thread or is it just that nobody's done it yet?

Again, if it's just a question of effort, I'll be glad to submit a revised list.
Butts McCokey
Advocate


Joined: 23 Apr 2004
Posts: 3327

Posted: Fri May 13, 2005 5:15 am

what happens when you put a phrase in speech marks like in a search engine?
_________________
Since the bible and the church are obviously mistaken about where we came from, how can we trust them with where we're going?

"An eye for an eye will make us all blind" - Gandhi

Cold is gods way to tell us to burn more Catholics
tomk
Administrator


Joined: 23 Sep 2003
Posts: 7219
Location: Sat in front of my computer

Posted: Fri May 13, 2005 7:19 am

cokehabit wrote:
what happens when you put a phrase in speech marks like in a search engine?


That doesn't make any difference with the forums search, as it only knows about individual words, not groups of words.
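A minimal sketch of why the quotes do nothing, assuming the index only records which posts contain each word (an inverted index with no word positions). Without positions, the search can only AND the words together, so a post that merely contains all three words matches just as well as one containing the exact phrase. The function names and sample posts are hypothetical:

```python
import re
from collections import defaultdict

def build_index(posts):
    """Map each word to the set of post ids containing it (no positions kept)."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(post_id)
    return index

def search(index, query):
    """AND together the posts for each query word; quote marks are just punctuation."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

posts = {
    1: "You have mail appears on every login",
    2: "I have no idea why mail to you bounces",
}
idx = build_index(posts)
print(search(idx, '"you have mail"'))  # {1, 2}
```

Real phrase search would typically need word positions stored for every indexed word (or a re-scan of each matching post's text), which makes the index or the query more expensive still.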

kbranch wrote:
So then why are some of those words still on the list? Is there another side to things that I haven't seen in this thread or is it just that nobody's done it yet?

Again, if it's just a question of effort, I'll be glad to submit a revised list.


It's not just a matter of removing words from the list; the stopwords aren't indexed. Re-indexing isn't an option on forums this big.
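A small sketch of the point, assuming stopwords are dropped at indexing time ("screen" is taken from the first post in this thread; the function names are hypothetical). Taking a word off the list only affects posts indexed afterwards; older posts stay invisible for that word until everything is re-indexed:

```python
from collections import defaultdict

def index_post(index, post_id, text, stopwords):
    """Record which words a post contains, skipping the current stopwords."""
    for word in set(text.lower().split()):
        if word not in stopwords:
            index[word].add(post_id)

stopwords = {"screen"}
index = defaultdict(set)
index_post(index, 1, "detach a screen session", stopwords)

# Later the word is taken off the stopword list...
stopwords.discard("screen")
index_post(index, 2, "screen hangs on attach", stopwords)

# ...but post 1 was indexed under the old list, so a search still misses it.
print(sorted(index["screen"]))  # [2]

def reindex(posts, stopwords):
    """The only way to recover old posts: rebuild the whole index."""
    fresh = defaultdict(set)
    for post_id, text in posts.items():
        index_post(fresh, post_id, text, stopwords)
    return fresh

posts = {1: "detach a screen session", 2: "screen hangs on attach"}
print(sorted(reindex(posts, stopwords)["screen"]))  # [1, 2]
```

On a forum with years of posts, that rebuild touches every post, which is the cost being avoided.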
_________________
Search | Read | Answer | Report | Strip
Gherald
Veteran


Joined: 23 Aug 2004
Posts: 1399
Location: CLUAConsole

Posted: Fri May 13, 2005 7:27 am

Anior wrote:
You can use google to search the forums

Firefox users can bookmark: http://www.google.com/search?q=%s+site%3Aforums.gentoo.org+-inurl%3Asearch.php

Right click the bookmark, go to properties, and set the keyword to something short such as "fgo"

Then you can type "fgo <search keywords>" in the address bar.
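For anyone curious what the keyword bookmark does: the browser puts whatever you type after the keyword in place of %s in the bookmarked URL. A rough Python approximation, assuming the terms are encoded with spaces as "+" (the browser's exact encoding may differ); the function name is hypothetical:

```python
from urllib.parse import quote_plus

# Template from the post; %s marks where the typed keywords go.
TEMPLATE = ("http://www.google.com/search?q=%s"
            "+site%3Aforums.gentoo.org+-inurl%3Asearch.php")

def expand_keyword_bookmark(terms: str) -> str:
    """Approximate what the browser builds from "fgo <terms>"."""
    # Assumption: spaces become '+' and reserved characters are percent-encoded.
    return TEMPLATE.replace("%s", quote_plus(terms), 1)

print(expand_keyword_bookmark("screen detach"))
# http://www.google.com/search?q=screen+detach+site%3Aforums.gentoo.org+-inurl%3Asearch.php
```

The "-inurl:search.php" part keeps the forum's own search result pages out of Google's results.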
TXTad
Tux's lil' helper


Joined: 15 Jan 2004
Posts: 108
Location: Texas

Posted: Wed May 18, 2005 3:01 pm

Wow. No wonder searching for anything in the forums is such a whippin'. I agree with this concept for keywords, but it is completely counterproductive for phrases, such as the "you have mail" example. As a sysadmin, I can sympathize with the resources being consumed, but that's what we have computers for: to do things that are hard for humans to do. If the computer has to work hard, then it's doing its job. Breaking the system for humans so that things are easy on the computer doesn't seem like the proper way to handle a problem.

Tad
pjp
Administrator


Joined: 16 Apr 2002
Posts: 16104
Location: Colorado

Posted: Wed May 18, 2005 3:04 pm

TXTad wrote:
Breaking the system for humans so that things are easy on the computer doesn't seem like the proper way to handle a problem.
When the hardware can no longer perform the tasks demanded by humans, and there is no reasonable means to ensure power is available as load increases, tradeoffs need to be made. If you'd like to donate an 8-way dual-core Opteron system with SCSI disks and many gigs of RAM, that might delay it for a while.
_________________
lolgov. 'cause where we're going, you don't have civil liberties.

In Loving Memory
1787 - 2008
killfire
l33t


Joined: 04 Oct 2003
Posts: 618

Posted: Thu May 19, 2005 10:53 pm

is it possible to, say, not index those words in the actual posts (but take them from the post subject), and get rid of the stopwords altogether? or block them when alone, but for some, use them when other (valid) words are combined with them? like have a complete block list (like RTFM, the, or LOL) and a partial block list (error, screen, compile...) for words that are combined?

because just stripping them seems like it will lower the accuracy of the posts a lot...

then again, i have no idea how the db works...
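The two-tier idea could look something like this at query time. This is a sketch with hypothetical word lists taken from the post; the rule assumed here is that a partial-list word is kept only when the query also contains a word that is on neither list:

```python
# Hypothetical lists built from the examples in the post.
ALWAYS_BLOCK = {"the", "lol", "rtfm"}
BLOCK_WHEN_ALONE = {"error", "screen", "compile"}

def effective_terms(query: str) -> list[str]:
    """Return the query words that would be searched under the two-tier rule."""
    # Words on the complete block list are always dropped.
    words = [w for w in query.lower().split() if w not in ALWAYS_BLOCK]
    # Partial-list words only count if some unlisted word anchors the query.
    if any(w not in BLOCK_WHEN_ALONE for w in words):
        return words
    return []

print(effective_terms("kde compile error"))  # ['kde', 'compile', 'error']
print(effective_terms("compile error"))      # []
```

Note that this only filters queries. The partial-list words would still have to be in the index for a combined search to find them, and that index space is what the stopword list was meant to save.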
_________________
my website, built in HAppS: http://dbpatterson.com
an art (oil painting) website I built a pure python backend for: http://www.lydiajohnston.com
rhill
Developer


Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Sat May 21, 2005 6:10 am

killfire wrote:
is it possible to say, not index those words in the actual posts (but take them from the post subject), and get rid of the stopwords altogether?

Quote:
because just stripping them seems like it will lower the accuracy of the posts a lot...


and only searching thread subjects won't? you don't search for titles, you search for content.

Quote:
or block them when alone, but for some, use it when other (valid) words are combined?


tomk wrote:
That doesn't make any difference with the forums search, as it only knows about individual words, not groups of words.

_________________
by design, by neglect
for a fact or just for effect
mallchin
l33t


Joined: 21 Jan 2003
Posts: 655
Location: United Kingdom

Posted: Sun May 22, 2005 12:24 pm

I agree with cokehabit: a heavy forum needs an advanced search widget to help sift through the crap and find what you want. Encapsulating whole strings (as Google can) would soon make the word shitlist irrelevant, and I have always felt this is a missing feature from phpBB.

I understand we might not make such changes, and this may be the only acceptable solution for now, but a better solution does exist.
_________________
6700 @ 2.66GHz, 4Gb RAM, 2 x 500Gb, 8800 GTX, PhysX, X-Fi, 24" Widescreen, Tux mascot
My_World
Guru


Joined: 01 Sep 2003
Posts: 339
Location: Kalahari Desert

Posted: Sun May 22, 2005 3:19 pm

Just a few questions and suggestions I would like to voice....

Having some of those words removed from searches WILL lead to more double postings and, in the end, very frustrated users who cannot find the solution they want to a "serious" problem. We cater not only to home users but also to companies, and making a sysadmin go through pages upon pages of worthless search results will land Gentoo in a bit of hot water at the end of the day. I have also been there this week, trying to find a cure for why GLX was suddenly broken, and all the search strings I entered landed me less than helpful results. It was only by chance that I found the topic that helped me while browsing the forums a bit. I have resolved, in many cases now, to use Google and see if I can't find an answer there rather than using the forum search.
For example, look at when I joined and at the number of posts I have made until now. Very few, and most of them were made this year because the search function no longer returned usable results. Get my drift?

I think "strings" are important, like searching the error string output. That will lead you exactly to the right place almost every time.

That said, I know that the servers are taking an enormous load, but isn't there a way to make searching a bit more effective? Limiting the results to, say, 3 months (with maybe an advanced option for 6, 9, or 12 months), upgrading the board to better software, or rallying for donations to buy a better server?
We once bought, from donations, an Opteron server for a BitTorrent tracker here in South Africa with only 300 members! US$10 might not sound like much, but add a few thousand users x $10 and you have yourself a new server. There must be a better way to filter the stopwords, improve the search engine software, or something?
:(
_________________
"Ubuntu" - an African word meaning "Gentoo is too hard for me".
ian!
Bodhisattva


Joined: 25 Feb 2003
Posts: 3827
Location: Essen, Germany

Posted: Sun May 22, 2005 7:36 pm

Hardware isn't the problem. (Webserver: dual Xeon 3 GHz, 2 GB RAM, RAID5 10k rpm hard disks 120 GB; DB: an even faster machine)
It's a software problem.
_________________
"To have a successful open source project, you need to be at least somewhat successful at getting along with people." -- Daniel Robbins
mcspiff
Tux's lil' helper


Joined: 24 Oct 2004
Posts: 109

Posted: Sun May 22, 2005 8:01 pm

My_World wrote:
Just a few questions and suggestions I would like to voice....

Having some of those words removed from searches WILL lead to more double postings and, in the end, very frustrated users who cannot find the solution they want to a "serious" problem. We cater not only to home users but also to companies, and making a sysadmin go through pages upon pages of worthless search results will land Gentoo in a bit of hot water at the end of the day. I have also been there this week, trying to find a cure for why GLX was suddenly broken, and all the search strings I entered landed me less than helpful results. It was only by chance that I found the topic that helped me while browsing the forums a bit. I have resolved, in many cases now, to use Google and see if I can't find an answer there rather than using the forum search.
For example, look at when I joined and at the number of posts I have made until now. Very few, and most of them were made this year because the search function no longer returned usable results. Get my drift?

I think "strings" are important, like searching the error string output. That will lead you exactly to the right place almost every time.

That said, I know that the servers are taking an enormous load, but isn't there a way to make searching a bit more effective? Limiting the results to, say, 3 months (with maybe an advanced option for 6, 9, or 12 months), upgrading the board to better software, or rallying for donations to buy a better server?
We once bought, from donations, an Opteron server for a BitTorrent tracker here in South Africa with only 300 members! US$10 might not sound like much, but add a few thousand users x $10 and you have yourself a new server. There must be a better way to filter the stopwords, improve the search engine software, or something?
:(


Wow, just wow. If your business relies on the Gentoo forums for tech support, maybe they should fire the tech guys and use the saved cash on a Red Hat contract.
My_World
Guru


Joined: 01 Sep 2003
Posts: 339
Location: Kalahari Desert

Posted: Sun May 22, 2005 8:44 pm

mcspiff wrote:

Wow, just wow. If your business relies on the Gentoo forums for tech support, maybe they should fire the tech guys and use the saved cash on a Red Hat contract.

Not everybody knows all there is to know of Gentoo Linux...
:P
_________________
"Ubuntu" - an African word meaning "Gentoo is too hard for me".
tecknojunky
Veteran


Joined: 19 Oct 2002
Posts: 1937
Location: Montréal

Posted: Tue May 24, 2005 9:37 am

Well, I ended up here because I could not get relevant search result anymore and I wanted to know if there was a problem with it. I guess I've found the answer.

My opinion: bad idea. Like many said, it's the combination of words that matters. I'm afraid this solution will not hold up in the long run. Like one poster said, more irrelevant searches will lead to more posts, which will lead to more words, which will lead to a bigger database anyway, and back to square one.

The cause is quantity, and the problem is twofold, depending on how you see things:

1- More quantity requires a better search algorithm. I wonder how the db is set up. Maybe the problem is right there. Ever thought of trying Oracle instead of MySQL? How are the indexes made?

2- Do you really need to index all the posts made since April 2002? I know it's been mentioned before, and other big forums do it: after some time, threads should be made static. It has to be. You can't keep accumulating and preserving live threads and posts ad vitam aeternam. Makes no sense (for proof).

So either find (make) a good search algorithm, or diminish the amount of data. Don't pretend that removing words is a solution, because any of these words in combination with other words is relevant. With time, you'll block all the words in the dictionary, and then we'll be able to say "wow man, searches are blazing fast now!" :wink:
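Shrinking the searchable data, as suggested here (and earlier with the 3/6/9/12-month option), could start with something as simple as filtering by post date before searching. A minimal sketch with hypothetical post records and function name; a "month" is approximated as 30 days:

```python
from datetime import datetime, timedelta

def recent_posts(posts, now, months=3):
    """Keep only posts newer than roughly `months` months (30-day months assumed)."""
    cutoff = now - timedelta(days=30 * months)
    return {pid: p for pid, p in posts.items() if p["posted"] >= cutoff}

posts = {
    1: {"posted": datetime(2002, 4, 20), "text": "first post"},
    2: {"posted": datetime(2005, 4, 14), "text": "screen is a stopword"},
    3: {"posted": datetime(2005, 5, 24), "text": "search results irrelevant"},
}
now = datetime(2005, 5, 27)
print(sorted(recent_posts(posts, now)))            # [2, 3]
print(sorted(recent_posts(posts, now, months=1)))  # [3]
```

Applied at indexing time instead of query time, the same cutoff would keep the word index from growing with every year of old threads, at the price of older solutions no longer being searchable.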
_________________
(7 of 9) Installing star-trek/species-8.4.7.2::talax.
chrib
Guru


Joined: 27 Sep 2003
Posts: 558
Location: Berlin, Germany

Posted: Tue May 24, 2005 2:18 pm

tecknojunky wrote:

1- More quantity requires a better search algorithm. I wonder how the db is set up. Maybe the problem is right there. Ever thought of trying Oracle instead of MySQL?


Oracle is rather expensive and I don't think that the Gentoo Project is willing to spend their money on a license for an Oracle Database-Server.

YMMV
_________________
Der Mensch kämpft um zu überleben, und nicht, um zu Grunde zu gehen. - Paulo Coelho
It is the end of all hope. To lose the child, the faith. To end all the innocence. To be someone like me. - Nightwish - End of all hope
tecknojunky
Veteran


Joined: 19 Oct 2002
Posts: 1937
Location: Montréal

Posted: Tue May 24, 2005 7:33 pm

chrib wrote:
Oracle is rather expensive...

Oh. I was under the impression that it was free (as in beer) if used for not-for-profit purposes.
_________________
(7 of 9) Installing star-trek/species-8.4.7.2::talax.
mallchin
l33t


Joined: 21 Jan 2003
Posts: 655
Location: United Kingdom

Posted: Tue May 24, 2005 9:05 pm

tecknojunky wrote:
chrib wrote:
Oracle is rather expensive...

Oh. I was under the impression that it was free (as in beer) if used for not-for-profit purposes.


I'd be surprised if it was.
_________________
6700 @ 2.66GHz, 4Gb RAM, 2 x 500Gb, 8800 GTX, PhysX, X-Fi, 24" Widescreen, Tux mascot
Omega21
l33t


Joined: 14 Feb 2004
Posts: 788
Location: Canada (brrr. Its cold up here)

Posted: Thu May 26, 2005 3:33 am

I'm surprised not to see much profanity in there... ;)
_________________
iMac G4 1GHz :: q6600 //2x 500GB//2GB RAM//8600GT//Gentoo :: MacBook Pro//2.53GHz
Given M. Sur
l33t


Joined: 03 Feb 2004
Posts: 648
Location: No such file or directory

Posted: Fri May 27, 2005 7:53 am

pjp wrote:
For example, "the" and "gentoo" are not likely to be helpful search words.

Yeah? Try finding the Gentoo Desktops for May 2005 thread. Luckily Earthwings merged the dupe I mistakenly created.

I have to agree with some of the others here that have said that these stopwords suck. Obviously the desktop thread isn't a great example since it's not a support thread (and can be found by skimming through the pages in Desktop Environments), but I have had problems trying to find support issues too.

And I don't understand something. If it's not a hardware problem (as ian! mentioned) what exactly is wrong with the software? I, like others, would rather have a slow relevant search than a quick irrelevant one.

Anyways, I just wanted to voice my objections. Thanks for reading.
_________________
What is the best [insert-type-of-program-here]?
amne
Bodhisattva


Joined: 17 Nov 2002
Posts: 6377
Location: Graz / EU

Posted: Fri May 27, 2005 9:08 am

Given M. Sur wrote:

Anyways, I just wanted to voice my objections. Thanks for reading.

We are of course aware that the stopwords list isn't the perfect solution and has some limitations. However, we think that the positive effects outweigh the negative ones.
_________________
Dinosaur week! (Ok, this thread is so last week)
All times are GMT
Page 2 of 6