Gentoo Forums
new search stopwords list

rac
Bodhisattva

Joined: 30 May 2002
Posts: 6553
Location: Japanifornia

Posted: Wed Sep 15, 2004 10:14 pm    Post subject: new search stopwords list

We've analyzed the most commonly occurring words on the forums, and made some additions to the stopword list. Attempting to search using any of these words won't return any posts, and if you combine a stopword with other legitimate terms, the stopword just gets ignored.

Here's the current list, including both upstream phpBB's entries and ours:

AFAIK
I
IIRC
Ive
LOL
ROTF
ROTFLMAO
YMMV
a
aber
able
about
above
access
actually
add
after
again
ago
all
almost
along
alot
already
also
always
am
amp
an
and
and
another
answer
any
anybody
anybodys
anyone
anything
anyway
anywhere
are
arent
around
as
ask
askd
at
auch
auf
available
back
bad
be
because
been
before
being
believe
best
better
between
big
bit
both
box
btw
bug
build
but
but
by
can
cannot
cant
card
case
change
che
check
code
come
command
compile
compiled
compiling
computer
con
configuration
correct
could
couldnt
course
create
das
day
days
days
default
den
der
desktop
did
didnt
die
different
do
does
doesnt
doing
done
dont
down
drive
each
edit
either
else
emerged
end
enough
errors
etc
even
ever
every
everybody
everybodys
everyone
everything
exactly
example
failed
far
few
file
files
find
fine
first
fix
fixed
following
for
for
forum
forums
found
from
function
gentoo
get
getting
give
go
going
gone
good
got
gotten
great
guess
had
hard
hardware
has
have
have
havent
having
help
her
here
hers
him
his
home
hope
how
however
hows
href
ich
idea
ideas
if
ill
in
info
ini
install
installation
installed
installing
instead
into
is
isnt
issue
ist
it
its
ive
just
keep
know
large
last
latest
least
less
let
lib
like
liked
line
link
linux
list
little
load
local
log
lol
long
look
looked
looking
looking
looks
lot
machine
made
mal
man
many
may
maybe
me
mean
message
might
mit
mode
more
most
much
must
mustnt
my
name
near
need
net
network
never
new
news
next
nice
nicht
no
non
none
not
nothing
now
of
off
often
old
on
once
one
only
oops
open
option
options
or
org
other
our
ours
out
output
over
own
package
packages
page
part
pas
people
per
play
please
point
possible
post
pretty
probably
problem
problems
program
put
que
question
questioned
questions
quite
quot
quote
rather
read
really
reason
recent
remember
right
run
said

same
saw
say
says
screen
script
see
seem
seems
sees
server
set
setting
settings
setup
she
should
since
sites
small
so
software
solution
some
someone
something
sometime
somewhere
soon
sorry
source
start
started
still
stuff
such
support
sure
take
tell
than
thank
thanks
that
thatd
thats
the
the
their
theirs
them
then
there
theres
these
they
theyd
theyll
theyre
thing
things
think
this
this
those
though
thought
thread
through
thus
time
times
to
too
tried
true
try
trying
two
type
und
under
until
untrue
up
update
upon
use
used
user
users
using
usr
version
very
via
want
was
way
we
well
went
were
werent
what
whats
when
where
which
while
who
whom
whose
why
wide
will
wink
with
with
within
without
wont
work
worked
working
works
world
worse
worst
would
wrong
wrote
www
yes
yet
you
you
youd
youll
your
youre
yours
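
The behaviour described at the top of this post can be sketched roughly as follows. This is only an illustration, not the forum's actual phpBB code: `STOPWORDS` holds a handful of entries from the list above, `usable_terms` is a hypothetical helper, and splitting on whitespace with case-insensitive matching is an assumption.

```python
# Illustrative subset of the list above, not the full set.
STOPWORDS = {"you", "have", "new", "in", "the", "compile", "errors"}


def usable_terms(query):
    """Return the query terms that survive stopword filtering, lowercased."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


# A mixed query keeps only its non-stopword terms.
print(usable_terms("You have new mail in"))
# A query made up only of stopwords leaves nothing to search for.
print(usable_terms("the new"))
```

With these assumptions, the first query is reduced to the single term "mail" and the second to an empty list, which would correspond to the "no posts found" case.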
_________________
For every higher wall, there is a taller ladder

klieber
Bodhisattva

Joined: 17 Apr 2002
Posts: 3657
Location: San Francisco, CA

Posted: Thu Sep 16, 2004 12:11 am

To follow up on rac's post, the reason we did this was to reduce the size of our search database in mysql. It was overwhelming the database server and causing the slowdowns that people have been experiencing recently.

--kurt
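
As a rough illustration of why dropping common words shrinks a search database, here is a minimal sketch of a word-to-post index built with and without a stopword set. The sample posts, the `build_index` and `match_rows` helpers, and the small `STOPWORDS` subset are all made up for the example; the real forum tables are not modelled here.

```python
from collections import defaultdict

# Illustrative subset of the stopword list.
STOPWORDS = {"i", "the", "to", "my", "and", "it"}


def build_index(posts, stopwords=frozenset()):
    """Map each indexed word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():
            if word not in stopwords:
                index[word].add(post_id)
    return index


def match_rows(index):
    """Count (word, post) pairs, i.e. the rows a word-match table would hold."""
    return sum(len(post_ids) for post_ids in index.values())


posts = {
    1: "I tried to emerge kde and it failed",
    2: "the emerge of kde failed on my laptop",
    3: "I upgraded my kernel and the sound died",
}
full = build_index(posts)
trimmed = build_index(posts, STOPWORDS)
print(match_rows(full), match_rows(trimmed))
```

Every stopword removes one row for each post that contains it, so the most frequent words account for a disproportionate share of the table.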
_________________
The problem with political jokes is that they get elected

kamagurka
Veteran

Joined: 25 Jan 2004
Posts: 1026
Location: /germany/munich

Posted: Mon Sep 20, 2004 6:05 pm

would it be possible to have the search throw an informative error when searching for stopwords instead of just saying "no posts found"?
_________________
If you loved me, you'd all kill yourselves today.
--Spider Jerusalem, the Word

rac
Bodhisattva

Joined: 30 May 2002
Posts: 6553
Location: Japanifornia

Posted: Mon Sep 20, 2004 6:26 pm

It might be possible to change the message to something like "none of your search terms were usable" in the case where you enter only stopwords. Telling you that some of your terms were used, but not others, would be considerably harder.
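
The easy case, where every term entered is a stopword, could be detected along these lines. This is a sketch under assumptions: `search_notice` and the small `STOPWORDS` subset are hypothetical, and it says nothing about how the check would fit into the forum software itself.

```python
# Illustrative subset of the stopword list.
STOPWORDS = {"you", "have", "new", "in"}


def search_notice(query):
    """Return a notice when no term is usable, otherwise None."""
    terms = query.lower().split()
    if terms and all(term in STOPWORDS for term in terms):
        return "None of your search terms were usable."
    return None


print(search_notice("you have"))
print(search_notice("you have new mail in"))
```

The second call returns None because "mail" is still searchable, which is the mixed case that is harder to report on.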
_________________
For every higher wall, there is a taller ladder

Gentree
Watchman

Joined: 01 Jul 2003
Posts: 5350
Location: France, Old Europe

Posted: Sat Oct 23, 2004 7:49 pm

klieber wrote:
To follow up on rac's post, the reason we did this was to reduce the size of our search database in mysql. It was overwhelming the database server and causing the slowdowns that people have been experiencing recently.

--kurt


You may like to consider how much the lack of an effective search tool is burdening the database.

People can't find what's there, make a new post and there's a new thread of 10 or 20 posts.

Before too long this will become unmanageable and the forum will break.

Without the forum Gentoo would be of limited use.

I have made concrete suggestions in other posts today.

HTH 8)
_________________
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86

Deathwing00
Bodhisattva

Joined: 13 Jun 2003
Posts: 4087
Location: Dresden, Germany

Posted: Sun Oct 24, 2004 1:27 am

I made this one sticky... I think it's important to know what words are filtered.

c45207
n00b

Joined: 08 Mar 2004
Posts: 70

Posted: Thu Jan 27, 2005 3:25 am

Is there any way to override this? For example, today I wanted to find "You have new mail in". However, only mail is a searchable word, so I got lots of useless posts.

ian!
Bodhisattva

Joined: 25 Feb 2003
Posts: 3829
Location: Essen, Germany

Posted: Thu Jan 27, 2005 7:06 am

c45207 wrote:
Is there any way to override this?

No.
_________________
"To have a successful open source project, you need to be at least somewhat successful at getting along with people." -- Daniel Robbins

Wicked Wesley
n00b

Joined: 20 May 2004
Posts: 70
Location: Here

Posted: Fri Jan 28, 2005 4:50 pm

Just to let you know, the word but is in there twice!

Have a nice day!
_________________
The Jester!
Linux user 357122!

knefas
l33t

Joined: 21 Dec 2003
Posts: 828

Posted: Fri Jan 28, 2005 5:25 pm

Ohh...also two days, have and this :)

masseya
Bodhisattva

Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

Posted: Fri Jan 28, 2005 10:49 pm

Those are particularly insidious words that absolutely have to be stopped so we put the second entry in the stopwords list sort of as a way to add injury to insult for the many weeks of futile searching those words have caused.
_________________
if i never try anything, i never learn anything..
if i never take a risk, i stay where i am..

Anior
Guru

Joined: 17 Apr 2003
Posts: 317
Location: European Union (Stockholm / Sweden)

Posted: Sat Jan 29, 2005 2:56 am

c45207 wrote:
Is there any way to override this? For example, today I wanted to find "You have new mail in". However, only mail is a searchable word, so I got lots of useless posts.

You can use google to search the forums, even if you'll only get hits from those posts which have been indexed.

http://www.google.com/search?hl=en&q=site%3Aforums.gentoo.org+%22you+have+new+mail%22
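
For other phrases, a link of the same shape can be put together with Python's standard library. A minimal sketch: `google_site_search` is a made-up helper name, and the `hl` and `q` parameters are simply copied from the link above.

```python
from urllib.parse import urlencode


def google_site_search(phrase, site="forums.gentoo.org"):
    """Build a Google URL that searches one site for an exact phrase."""
    query = f'site:{site} "{phrase}"'
    return "http://www.google.com/search?" + urlencode({"hl": "en", "q": query})


print(google_site_search("you have new mail"))
```

The double quotes around the phrase ask for an exact match, so words that the forum search drops as stopwords stay part of the query.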

SubAtomic
Apprentice

Joined: 20 Dec 2003
Posts: 255
Location: Hobart, TAS, Australia

Posted: Thu Feb 10, 2005 3:24 am

What about RTFM and rtfm, IMHO and imho?

Would a "Suggest words to add to the stopwords list" thread topic (possibly in the Feedback section) be of use? I'm thinking of something similar to the report spammers thread.
_________________
"The real romance is out ahead and yet to come. The computer revolution hasn't started yet. Don't be misled by the enormous flow of money into bad defacto standards for unsophisticated buyers using poor adaptations of incomplete ideas." -- Alan Kay

cokey
Advocate

Joined: 23 Apr 2004
Posts: 3355

Posted: Thu Mar 24, 2005 12:07 pm

I think "compile" and "error(s)" should be taken out, after all this is gentoo not SuSE
_________________
https://otw20.com/ OTW20 The new place for off the wall chat

masseya
Bodhisattva

Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

Posted: Thu Mar 24, 2005 11:02 pm

cokehabit wrote:
I think "compile" and "error(s)" should be taken out, after all this is gentoo not SuSE
The reason these words are on the list is that they are too commonly appearing to actually be of use in identifying a particular thread. There are so many posts with the words 'compile' or 'error' that it's not a useful descriptor. If I were trying to describe myself to you so you could pick me out of a crowd at an amusement park I would want to avoid a description such as "medium height with blue jeans, sneakers and a tshirt" because it wouldn't really tell you anything that would set me apart from virtually everyone else. This is essentially the kind of description you get when searching for the words 'compile' and 'error'.
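
One way to put a number on "too common to be a useful descriptor" is the share of posts that contain a word. The sketch below is only an illustration with made-up posts; `document_frequency` is a hypothetical helper, and nothing here is claimed to be how the list was actually compiled.

```python
from collections import Counter


def document_frequency(posts):
    """Count, for each word, how many posts contain it at least once."""
    counts = Counter()
    for text in posts:
        counts.update(set(text.lower().split()))
    return counts


posts = [
    "compile error in glibc",
    "compile error in xorg",
    "compile error in mozilla",
    "alsa mixer is muted",
]
df = document_frequency(posts)
for word in ("compile", "glibc"):
    print(word, df[word] / len(posts))
```

A word that shows up in most posts, like "compile" in this toy set, barely narrows a search, while a word found in only one post picks that post out directly.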
_________________
if i never try anything, i never learn anything..
if i never take a risk, i stay where i am..

kallamej
Administrator

Joined: 27 Jun 2003
Posts: 4980
Location: Gothenburg, Sweden

Posted: Thu Mar 24, 2005 11:15 pm

Heh, error is not in the list, actually.
_________________
Please read our FAQ Forum, it answers many of your questions.
irc: #gentoo-forums on irc.libera.chat

cokey
Advocate

Joined: 23 Apr 2004
Posts: 3355

Posted: Fri Mar 25, 2005 7:52 am

kallamej wrote:
Heh, error is not in the list, actually.
errors is so i put it in bracket(s)
_________________
https://otw20.com/ OTW20 The new place for off the wall chat

masseya
Bodhisattva

Joined: 17 Apr 2002
Posts: 2602
Location: Baltimore, MD

Posted: Fri Mar 25, 2005 6:56 pm

kallamej wrote:
Heh, error is not in the list, actually.
lol.. We should, like, add that and stuff.
_________________
if i never try anything, i never learn anything..
if i never take a risk, i stay where i am..

cokey
Advocate

Joined: 23 Apr 2004
Posts: 3355

Posted: Fri Mar 25, 2005 7:05 pm

is there any way to make the gentoo forums searchable through google like wikipedia is? Perhaps someone could speak to them? That would sort out the search database while offering google free advertising every time someone searches through gentoo.
_________________
https://otw20.com/ OTW20 The new place for off the wall chat

kallamej
Administrator

Joined: 27 Jun 2003
Posts: 4980
Location: Gothenburg, Sweden

Posted: Fri Mar 25, 2005 7:54 pm

Yes, the forums are google searchable, but there are only about 30K pages indexed. It's increasing quite nicely since the urls got html-ised, though.
_________________
Please read our FAQ Forum, it answers many of your questions.
irc: #gentoo-forums on irc.libera.chat

Satori80
Tux's lil' helper

Joined: 24 Feb 2004
Posts: 137

Posted: Sat Apr 09, 2005 11:19 am

Why don't you guys try to get a consensus? I for one would rather have a slow useful search database than a quick irrelevant one.

curtis119
Bodhisattva

Joined: 10 Mar 2003
Posts: 2160
Location: Toledo, Ohio,USA, North America, Earth, SOL System, Milky Way, The Universe, The Cosmos, and Beyond.

Posted: Sat Apr 09, 2005 11:30 am

Satori80 wrote:
Why don't you guys try to get a consensus? I for one would rather have a slow useful search database than a quick irrelevant one.


The stop words list is attempting to do both. A quick and relevant search. It's gotten so much better since rac and ian! started actively doing this. I search constantly and have noticed a significant difference in quality of results.
_________________
Gentoo: it's like wiping your ass with silk.

Satori80
Tux's lil' helper

Joined: 24 Feb 2004
Posts: 137

Posted: Sat Apr 09, 2005 11:34 am

masseya wrote:
cokehabit wrote:
I think "compile" and "error(s)" should be taken out, after all this is gentoo not SuSE
The reason these words are on the list is that they are too commonly appearing to actually be of use in identifying a particular thread. There are so many posts with the words 'compile' or 'error' that it's not a useful descriptor.


It isn't the words in and of themselves that make them useful or not. It's the use of the words in combination with other specific words. For instance, those generated in an error message. If the search finds all the terms in the error message, you can quickly find the subject of your concern. Without the right words at your disposal, you'll have to fish around through irrelevant topics to try and find what you need to get your system back on its feet. I've found myself in this second situation more often than usual the past few days – more than once without resolution to my issue. Now I know why. It isn't because the issue isn't in the forums, it's because it can't be found due to a flaky search. And frankly, I'm pissed about it.

There is a reason error messages are generated in the first place. If you can't make the forums able to find specific input then why bother devoting the resources to keep them online? I always used the forums as a troubleshooting tool in the past. Apparently, I can no longer do that. Too bad for me, huh?
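
The point about combinations can be illustrated with a small sketch. It assumes a search that requires every term to be present (AND semantics), which may not be how the forum search actually behaves; `matching_posts` and the sample posts are made up for the example.

```python
def matching_posts(posts, terms):
    """Return ids of posts that contain every term (AND semantics)."""
    wanted = {term.lower() for term in terms}
    return [post_id for post_id, text in posts.items()
            if wanted <= set(text.lower().split())]


posts = {
    1: "glibc compile error undefined reference",
    2: "glibc locale question",
    3: "glibc upgrade broke nscd",
}
# The package name alone matches every post about it.
print(matching_posts(posts, ["glibc"]))
# Adding the common words from the error message narrows the match.
print(matching_posts(posts, ["glibc", "compile", "error"]))
```

Here the common words carry no weight alone, but next to a specific term they cut three hits down to one. Once they are filtered out as stopwords, both queries return the same three posts.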

Satori80
Tux's lil' helper

Joined: 24 Feb 2004
Posts: 137

Posted: Sat Apr 09, 2005 11:54 am

Look, I'm sorry if that last post came off as crass. I wasn't trying to insult anybody, and I didn't mean it as directing my frustration on any one person in particular.

But the sentiment is valid. I mean look at that list. "Man" is in the list? If I have an issue with the "man" program I can't directly look for a resolution to my issue in these forums? Come on, guys, give us a fighting chance.

cokey
Advocate

Joined: 23 Apr 2004
Posts: 3355

Posted: Sat Apr 09, 2005 12:11 pm

curtis119 wrote:
Satori80 wrote:
Why don't you guys try to get a consensus? I for one would rather have a slow useful search database than a quick irrelevant one.

The stop words list is attempting to do both. A quick and relevant search. It's gotten so much better since rac and ian! started actively doing this. I search constantly and have noticed a significant difference in quality of results.

I've noticed the opposite, i continually miss threads or have no threads come up at all where i would expect at least a few. VERY infuriating if you cannot get ONE SINGLE THREAD to come up. It just makes it look broken.
_________________
https://otw20.com/ OTW20 The new place for off the wall chat

All times are GMT
Page 1 of 6

 