How to make wget obtain a particular piece of information?

stolar
n00b
Posts: 49
Joined: Sat Sep 15, 2007 7:21 pm
Location: Zgierz, Poland

Post by stolar » Wed Oct 03, 2007 11:50 pm

Hello,
I would be very grateful for any suggestions on writing a bash script that uses wget to pull one specific piece of information out of a given web page. For example, I would like to download only the weather information from www.google.com, without clicking through all the menus along the way. My first idea was to grab the relevant keywords from the page's HTML using sed or grep. Unfortunately I am still missing either the regular-expression knowledge or the general bash approach to this kind of problem. I would be grateful for any guidance as I am just starting out with bash. Is my reasoning at least on the right track, or is it totally wrong?

P.S. So far I can display all the lines containing the phrase I am interested in using sed, but even if my reasoning is right, I am not sure how to 'pass' this information to wget. I am a weak programmer but eager to learn, so please just give me some feedback on the general reasoning and approach, and I will try to puzzle the rest out myself. Thanks in advance.
poly_poly-man
Advocate
Posts: 2477
Joined: Wed Dec 06, 2006 9:59 pm
Location: RIT, NY, US

Post by poly_poly-man » Thu Oct 04, 2007 1:46 am

You could do it BUT...

It's a lot easier to parse stuff meant to be parsed. So, for example, you want the weather? Check out weatherget.py on my ftp site (in sig). It grabs METAR information and parses that (crudely. You'll need to expand on mine. And send me back patches ;) )

What exactly is it you want to do?

poly-p man

avatar: new version of logo - see [topic]838248[/topic]. Potentially still a WiP.
Kingmilo
Apprentice
Posts: 173
Joined: Fri Apr 29, 2005 5:35 am
Location: South Africa

Post by Kingmilo » Thu Oct 04, 2007 1:50 am

Alright, maybe the problem is that you are expecting too much from wget? It will only fetch the .html file.
So what I would do is get wget to grab the .html file with the weather info. Then, once you have the file on your PC, I would use a bash script to run through the file and pull out the information you need.

Let me know if we are on the same page, and then I will help you with a bash script.
trample the weak, hurdle the dead.. .

Post by stolar » Thu Oct 04, 2007 2:54 pm

Thank you very much, Kingmilo. After your reply I will at least try first to do it exactly the way you proposed. Poly_poly-man, thanks as well, but what I am really trying to do is get some practice writing bash scripts. I know that in this particular case it looks a bit like 'art for art's sake' and that there are better tools for the job, but I guess Kingmilo knows what I mean...
So far I have downloaded the weather page to a directory using wget (I am actually doing this for something else, but the context is the same ;)). Unfortunately, when I run:

Code:

sed -n -e '/Temperature/p' index.html
I get nothing, or sometimes too many lines when I use a range:

Code:

sed -n -e '/BEGINNING/,/ENDING/p' index.html
Should I practise and learn sed a bit more, or would you suggest doing it a different way, or maybe using wget with some special options? I am looking forward to your bash script help, Kingmilo :)
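
For comparison, here is a minimal Python sketch of what the two sed commands above do. `matching_lines` mirrors `sed -n '/PATTERN/p'` and `lines_between` mirrors `sed -n '/START/,/END/p'`. Both function names and the sample page are made up for illustration, and Python's regular-expression syntax differs from sed's in the details. Matching is case-sensitive in both tools, which can explain an empty result, and a range whose end pattern never matches runs to the last line, which can explain getting too many lines.

```python
import re

def matching_lines(text, pattern):
    """Return the lines that contain a match, like sed -n '/pattern/p'."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]

def lines_between(text, start, end):
    """Return each block from a start match through the next end match,
    like sed -n '/start/,/end/p'. As in sed, a block that never meets
    its end pattern runs to the last line."""
    start_re, end_re = re.compile(start), re.compile(end)
    selected, inside = [], False
    for line in text.splitlines():
        if inside:
            selected.append(line)
            if end_re.search(line):
                inside = False
        elif start_re.search(line):
            selected.append(line)
            inside = True
    return selected

# Hypothetical page content, for illustration only.
sample = """<html>
<!-- BEGINNING -->
<b>Temperature:</b> 12 C
<b>Wind:</b> 5 km/h
<!-- ENDING -->
</html>"""

print(matching_lines(sample, "Temperature"))
print(lines_between(sample, "BEGINNING", "ENDING"))
```

Trying `matching_lines(sample, "temperature")` returns an empty list, so checking the exact spelling and case of the phrase in index.html is a cheap first step.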
yuwy
n00b
Posts: 38
Joined: Mon May 22, 2006 5:47 pm

Post by yuwy » Thu Oct 04, 2007 3:47 pm

Well, I'm a developer so my solution may be biased, but for something like that I might just write a simple Java program that parses the data out of wget's index.html, and then use bash to call it and store the program's output in a variable that you can reference afterwards.

Post by poly_poly-man » Thu Oct 04, 2007 7:42 pm

Well if you don't want to do it the easy way... ;)

Try bash scripts with sed, and you'd be surprised at what grep can do.

As for wget, there aren't really any options that would help you do what you want to do, but I would suggest curl for this instead (I _think_ wget has an option to do this too, but I don't know). curl prints its output to stdout, so you don't have to create temp files, and you can run it through sed no problem.
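
The same "no temp file" idea can be sketched in Python, which comes up later in this thread: read the response straight from memory and keep only the lines of interest. The function names and the sample bytes below are made up for illustration, and the UTF-8 default is an assumption about the page. (For what it's worth, `wget -q -O - URL` also writes the page to stdout.)

```python
import io
import re
import urllib.request

def filter_stream(stream, pattern, encoding="utf-8"):
    """Yield decoded lines from a binary stream that match pattern."""
    regex = re.compile(pattern)
    for raw in stream:
        line = raw.decode(encoding, errors="replace").rstrip("\r\n")
        if regex.search(line):
            yield line

def fetch_matching(url, pattern):
    """Download url into memory and return the matching lines."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return list(filter_stream(response, pattern))

# Offline demonstration: io.BytesIO stands in for the HTTP response.
fake_response = io.BytesIO(b"<p>Wind: 5 km/h</p>\n<p>Temperature: 12 C</p>\n")
print(list(filter_stream(fake_response, "Temperature")))
```

Keeping the filter separate from the download means it can be tried on a saved index.html (opened in binary mode) before touching the network.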

Oh, and Wietze is probably going to come around here soon yelling at you to use Perl. Don't use Perl. But Python might be nice for this.

Remember, it's not how you code the program, it's whether you win or lose. And how efficient it is.

poly-p man

Post by stolar » Thu Oct 04, 2007 8:20 pm

Thanks, Poly_poly-man! :D
I used grep as you suggested after downloading the page with wget, and finally got a satisfactory piece of information as output.
Now the only thing left (though I guess a hard one for me) is to put this into a nice script. Thanks for the curl tip, I didn't know about it. As for the other scripting languages, I must first finish this in bash, otherwise I would start each of the languages without getting anywhere ;). But I will surely take a look at Python as you suggest. Kingmilo, if you have a moment for suggestions about the script, I am looking forward to them.

Post by stolar » Tue Oct 23, 2007 10:27 pm

So far, using your advice, I have managed to put together something like this, mainly with sed and grep:

Code:

#!/bin/bash
# Show one weather element for Lodz, scraped from pogoda.wp.pl.
# Usage: pass -t, -w or -z as the first argument.
url='http://pogoda.wp.pl/miasto,lodz,mid,1201127,mi.html'

echo
echo "                  ..::Weather service for my home city:)::.. "
echo

# -q keeps wget quiet, so its log no longer has to be filtered through sed.
if ! wget -q -O pogodanet "$url"; then
    echo "Could not download $url" >&2
    exit 1
fi

case $1 in
-t) # "Temp. odczuwalna" = perceived temperature
    grep 'Temp. odczuwalna' pogodanet | sed -e 's/(wiatr):<br> <strong>/ /ig' -e 's/<\/strong>/ /ig'
    ;;
-w) # "Wsch." line, presumably sunrise
    grep 'Wsch.' pogodanet | sed -e 's/<strong>/ /ig' -e 's/<\/strong><br\/>/ /ig'
    ;;
-z) # "Zach. s" line, presumably sunset
    grep 'Zach. s' pogodanet | sed -e 's/<strong style="padding-left: 2px;">/ /ig' -e 's/<\/strong>/ /ig'
    ;;
*)
    echo "This weather element is not available yet:)"
    ;;
esac
echo
If you have any suggestions concerning any kind of enhancement or bad programming practice please let me know.
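
One possible enhancement, sketched here in Python rather than bash: instead of one sed expression per tag, strip every tag from the matching line in one go. The function names and the sample markup are invented for illustration (the real page will look different), and a regex-based tag stripper like this is crude compared with a real HTML parser such as the standard `html.parser` module.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")

def strip_tags(fragment):
    """Replace tags with spaces, decode entities, collapse whitespace."""
    text = html.unescape(TAG.sub(" ", fragment))
    return " ".join(text.split())

def extract_field(page, label):
    """Return the first line containing label with markup removed, or None."""
    for line in page.splitlines():
        if label in line:
            return strip_tags(line)
    return None

# Made-up markup loosely modelled on the sed expressions above.
sample = (
    '<div>Temp. odczuwalna (wiatr):<br> <strong>7 C</strong></div>\n'
    '<div>Wsch. <strong>07:21</strong><br/></div>\n'
)
print(extract_field(sample, "Temp. odczuwalna"))
print(extract_field(sample, "Wsch."))
```

Because the label is matched as a plain substring, the dot in "Wsch." is taken literally here, whereas in the grep patterns it matches any character.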

Post by poly_poly-man » Wed Oct 24, 2007 12:43 am

Here's my script, modified for Łódź (the town that your site mentions):

Code:

import urllib.request

def cel_fah(celsius):
    return 9.0 / 5.0 * celsius + 32

def twotoone(tens, ones):
    return (tens * 10) + ones

def kts_mph(kts):
    # Knots to mph, rounded down to the nearest half.
    mph = 1.1507794 * kts
    return mph - (mph % .5)

url = "http://weather.noaa.gov/pub/data/observations/metar/stations/AAXX.TXT"
with urllib.request.urlopen(url) as sock:
    weatherText = sock.read().decode("ascii", "replace")

# The temperature/dew point group looks like "08/02" or "M01/M05".
# Take the last "/" after the date line (the first 16 characters).
slashChar = weatherText.rfind("/", 17)

# An "M" in front of the two digits marks a negative value.
minus = weatherText[slashChar-3] == "M"
tempCel = twotoone(int(weatherText[slashChar-2]), int(weatherText[slashChar-1]))
if minus:
    tempCel = 0 - tempCel

tempFahr = cel_fah(tempCel)

# The dew point ends at the space within five characters of the slash.
nextSpace = weatherText.rfind(" ", slashChar, slashChar + 5)

dpMinus = weatherText[nextSpace-3] == "M"
dpCel = twotoone(int(weatherText[nextSpace-2]), int(weatherText[nextSpace-1]))
if dpMinus:
    dpCel = 0 - dpCel
dpFahr = cel_fah(dpCel)

print("Temp:", tempFahr, "degrees Fahrenheit")
print("Dew Point: ", dpFahr, "degrees Fahrenheit")

relHum = ((6.11*10.0**(7.5*(dpCel)/(237.7+(dpCel))))/(6.11*10.0**(7.5*(tempCel)/(237.7+(tempCel)))))*100

print("Relative Humidity: ", relHum, "%")

#Start doing wind now

# The wind group ends in "KT" and starts right after the preceding space.
windChar = weatherText.rfind("KT")
windStartChar = weatherText.rfind(" ", 0, windChar) + 1

if weatherText[windStartChar] == "V":
    dirWord = "Variable Direction"
else:
    # Only the hundreds and tens digits of the direction are used.
    wHuns = int(weatherText[windStartChar])
    wTens = int(weatherText[windStartChar + 1])
    windDir = twotoone(wHuns, wTens) * 10
    # Highest direction (in degrees) that still belongs to each sector.
    # 360 is included so that a due-north report also gets a name.
    sectors = [(20, "N"), (60, "NE"), (110, "E"), (150, "SE"), (200, "S"),
               (240, "SW"), (290, "W"), (330, "NW"), (360, "N")]
    dirWord = next(name for limit, name in sectors if windDir <= limit)

#implement hurricane-mode later.

wsTens = int(weatherText[windStartChar+3])
wsOnes = int(weatherText[windStartChar+4])

windSus = twotoone(wsTens, wsOnes)

print("Wind:", kts_mph(windSus), "mph", dirWord)
if weatherText[windStartChar+5] == "G":
    windGus = twotoone(int(weatherText[windStartChar+6]), int(weatherText[windStartChar+7]))
    print("Gusts:", kts_mph(windGus), "mph")
Yeah yeah, add the interpreter line up top, and take out the conversions if you like the metric system ;)
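
The relative-humidity line in the script is the ratio of two saturation vapour pressures, one at the dew point and one at the air temperature, each approximated as 6.11 * 10^(7.5 T / (237.7 + T)) with T in degrees Celsius. Split into two small functions it might look like the sketch below. The function names are made up for illustration, and the constants are simply the ones the script already uses.

```python
def saturation_vapour_pressure(temp_c):
    """Approximate saturation vapour pressure in hPa at temp_c (Celsius),
    using the same constants as the script (6.11, 7.5, 237.7)."""
    return 6.11 * 10.0 ** (7.5 * temp_c / (237.7 + temp_c))

def relative_humidity(temp_c, dewpoint_c):
    """Relative humidity in percent from temperature and dew point."""
    return (100.0 * saturation_vapour_pressure(dewpoint_c)
            / saturation_vapour_pressure(temp_c))

# When the dew point equals the temperature the air is saturated.
print(relative_humidity(15, 15))
print(round(relative_humidity(20, 10), 1))
```

Everything here stays in Celsius, so this part is unaffected by taking out the Fahrenheit conversions.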

Websites cannot be trusted; you never know when another ad, or changed wording will pop up. Then you'll have to modify. Mine'll work forever.

Oh, and if you get some other METAR features in there, I would be your best friend ;)

Python is WAYY more elegant than a bash script (wait a minute... poly-p man is saying that... ;) )

poly-p man

Post by stolar » Wed Oct 24, 2007 8:55 pm

Thanks a lot for the script! :) I guess it will take me some time to understand it, but it really looks like a great job to me. I will surely take a deeper look at Python.

Post by poly_poly-man » Wed Oct 24, 2007 10:51 pm

Python is REALLY useful...

BTW, the hard-to-understand part is mostly the METAR parsing. Take a look at
http://weather.noaa.gov/pub/data/observ ... s/AAXX.TXT
and you'll see why it's confusing. I had to look at many a guide before I understood it well enough for my original program.
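
To make the METAR groups a little less cryptic, here is a minimal sketch that picks out the temperature/dew point group and the wind group with regular expressions instead of counting characters. It is not the script posted above, just an alternative way to read the same two groups. The function names and the sample report are invented for illustration, and only plain "KT" wind groups are handled.

```python
import re

# Temperature/dew point group, e.g. "08/M02" (M marks a negative value).
TEMP_GROUP = re.compile(r"(?<!\S)(M?\d{2})/(M?\d{2})(?!\S)")
# Wind group, e.g. "27005KT", "VRB03KT" or "27015G25KT".
WIND_GROUP = re.compile(r"(?<!\S)(VRB|\d{3})(\d{2,3})(?:G(\d{2,3}))?KT(?!\S)")

def metar_number(field):
    """Convert a METAR temperature field such as 'M02' to an int."""
    return -int(field[1:]) if field.startswith("M") else int(field)

def parse_metar(report):
    """Pull temperature, dew point and wind out of one METAR report."""
    result = {}
    temp = TEMP_GROUP.search(report)
    if temp:
        result["temp_c"] = metar_number(temp.group(1))
        result["dewpoint_c"] = metar_number(temp.group(2))
    wind = WIND_GROUP.search(report)
    if wind:
        direction, speed, gust = wind.groups()
        result["wind_dir"] = None if direction == "VRB" else int(direction)
        result["wind_kt"] = int(speed)
        result["gust_kt"] = int(gust) if gust else None
    return result

# Invented report for illustration; real ones come from the station file.
sample = "AAXX 241230Z 27015G25KT 9999 FEW030 08/M02 Q1021"
print(parse_metar(sample))
```

Both patterns only match whole whitespace-separated groups, so the date line at the top of the station file (which also contains slashes) is skipped.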

poly-p man