View previous topic :: View next topic |
Author |
Message |
roseZ n00b
Joined: 17 Dec 2005 Posts: 31 Location: Finland
|
Posted: Wed Apr 09, 2014 9:51 am Post subject: emerge --sync fails almost always |
|
|
I've had a lot of problems syncing the portage tree since Oct 1, 2013. I've been using rsync.europe.gentoo.org or rsync.se.gentoo.org (I live in Finland) without problems prior to that date. Checking emerge.log between Oct 1 and the last sync before that, I found no changes to my system that appear relevant to the issue (only a rebuild of a few packages that aren't even installed any more).
A typical failed sync attempt (with banners, filelists and similar retries abridged) looks like this:
Code: |
# emerge --sync
>>> Synchronization of repository 'gentoo' located in '/usr/portage'...
>>> Starting rsync with rsync://88.156.222.90/gentoo-portage...
receiving incremental file list
timestamp.chk
Number of files: 1
Number of files transferred: 1
Total file size: 32 bytes
Total transferred file size: 32 bytes
Literal data: 32 bytes
Matched data: 0 bytes
File list size: 27
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 98
Total bytes received: 135
sent 98 bytes received 135 bytes 93.20 bytes/sec
total size is 32 speedup is 0.14
receiving incremental file list
app-i18n/ibus-chewing/ChangeLog
app-i18n/ibus-chewing/Manifest
--- SNIP ---
dev-ruby/hoe/hoe-2.13.0-r1.ebuild
dev-ruby/hoe/hoe-3.9.0.ebuild
rsync error: timeout in data send/receive (code 30) at io.c(137) [sender=3.0.9]
rsync: connection unexpectedly closed (1610442 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [receiver=3.0.9]
rsync: connection unexpectedly closed (3185 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
>>> Retrying...
>>> Starting retry 1 of 5 with rsync://81.91.242.10/gentoo-portage
--- SNIP ---
rsync error: timeout in data send/receive (code 30) at io.c(137) [sender=3.0.8]
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(764) [receiver=3.0.9]
rsync: connection unexpectedly closed (2909 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
>>> Retrying...
>>> Starting retry 2 of 5 with rsync://137.226.34.42/gentoo-portage
*******************************************************
=======================================================
Welcome to the ftp.halifax.rwth-aachen.de rsync mirror
=======================================================
Please don't use compression (-z) or checksums (-c).
Both features are disabled!
=======================================================
=======================================================
receiving incremental file list
timestamp.chk
--- SNIP ---
receiving incremental file list
rsync error: timeout in data send/receive (code 30) at io.c(137) [sender=3.0.9]
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(764) [receiver=3.0.9]
rsync: connection unexpectedly closed (3200 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
>>> Retrying...
>>> Starting retry 3 of 5 with rsync://137.226.34.228/gentoo-portage
Welcome to rsync.informatik.rwth-aachen.de (137.226.34.228).
This server is part of the SunSITE Central Europe and is located
in Aachen, Germany (http://sunsite.informatik.rwth-aachen.de).
--- SNIP ---
receiving incremental file list
dev-ruby/listen/ChangeLog
dev-ruby/listen/Manifest
--- SNIP ---
dev-ruby/ntlm-http/ntlm-http-0.1.1-r1.ebuild
dev-ruby/ntlm-http/ntlm-http-0.1.1.ebuild
rsync error: timeout in data send/receive (code 30) at io.c(239) [sender=3.0.3]
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(764) [receiver=3.0.9]
rsync: connection unexpectedly closed (2496 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
>>> Retrying...
>>> Starting retry 4 of 5 with rsync://83.164.133.198/gentoo-portage
--- SNIP ---
receiving incremental file list
rsync error: timeout in data send/receive (code 30) at io.c(137) [sender=3.0.7]
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(764) [receiver=3.0.9]
rsync: connection unexpectedly closed (3210 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
>>> Retrying...
>>> Starting retry 5 of 5 with rsync://193.219.184.226/gentoo-portage
timed out
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [Receiver=3.0.9]
>>> Exceeded PORTAGE_RSYNC_RETRIES: 5
|
My internet connection works fine, as far as I'm aware: no frequent timeouts or disconnects; downloads, bittorrent, etc. all work fine.
In order to troubleshoot the issue, I wrote a script that extracts some sync statistics from emerge.log. Here's what I have from the time period Jun 2010 - Oct 2013:
Code: |
$ python ~/syncstats.py <(sed -n '1,/^1380592/p' /var/log/emerge.log)
Sync server Syncs Fails Rate Avg dur. Max dur. Min dur. Hostname
--------------------------------------------------------------------------------
88.156.222.90 6 0 100.0 % 96.5 s 194 s 31 s ftp.vectranet.pl
212.24.172.37 6 0 100.0 % 77.8 s 167 s 37 s rsync1.hu.gentoo.org
137.226.34.42 6 0 100.0 % 56.2 s 116 s 35 s ftp.halifax.rwth-aachen.de
193.136.164.6 7 0 100.0 % 116.0 s 482 s 42 s ftp.rnl.ist.utl.pt
212.89.225.155 3 0 100.0 % 45.7 s 57 s 39 s tux.rainside.sk
83.164.133.198 12 0 100.0 % 48.9 s 80 s 36 s gentoo.lagis.at
137.226.34.228 8 0 100.0 % 90.1 s 145 s 48 s rsync.informatik.rwth-aachen.de
212.227.96.139 4 0 100.0 % 106.2 s 140 s 33 s s15736424.onlinehome-server.info
194.62.23.99 7 0 100.0 % 59.4 s 96 s 42 s mirror.opteamax.de
81.91.242.10 5 0 100.0 % 43.2 s 51 s 36 s gig0.mirror.ln1.qubenet.net
92.240.244.16 2 0 100.0 % 40.0 s 41 s 39 s phoenix.wheel.sk
178.63.25.207 8 0 100.0 % 52.5 s 99 s 32 s static.207.25.63.178.clients.your-server.de
212.110.161.109 7 0 100.0 % 43.3 s 55 s 36 s 212-110-161-109.no-reverse-dns-set.bytemark.co.uk
193.219.32.205 5 0 100.0 % 163.4 s 549 s 48 s kmynas.elen.ktu.lt
91.193.124.136 1 0 100.0 % 37.0 s 37 s 37 s <none>
88.198.224.205 13 0 100.0 % 42.8 s 55 s 34 s www1.stealer.net
212.110.161.69 1 0 100.0 % 58.0 s 58 s 58 s mirror.bytemark.co.uk
87.106.218.149 2 0 100.0 % 45.0 s 48 s 42 s s15423754.onlinehome-server.info
88.198.69.178 5 0 100.0 % 66.8 s 135 s 33 s hosting1.amiconsult.de
212.37.1.30 124 3 97.6 % 187.9 s 619 s 23 s mirror2.dataphone.se
193.1.193.64 9 1 90.0 % 2150.7 s 6418 s 690 s ftp.heanet.ie
89.238.64.78 9 1 90.0 % 47.4 s 107 s 31 s banana.gentoo-ev.org
188.165.231.20 8 1 88.9 % 43.6 s 57 s 24 s gentoo64.net
81.223.20.162 8 1 88.9 % 93.2 s 409 s 38 s gentoo.inode.at
88.191.100.68 6 1 85.7 % 68.8 s 106 s 35 s 88-191-100-68.rev.poneytelecom.eu
194.146.132.15 4 1 80.0 % 66.0 s 84 s 52 s gentoo.iteam.ua
89.206.169.171 8 2 80.0 % 59.0 s 107 s 34 s rsync.star.net.uk
147.175.167.205 2 2 50.0 % 46.5 s 52 s 41 s <none>
178.210.128.3 1 1 50.0 % 42.0 s 42 s 42 s www.telcom.net.ua
130.230.54.100 0 2 0.0 % nan s 0 s 0 s trumpetti.ip4.atm.tut.fi
193.219.184.226 0 1 0.0 % nan s 0 s 0 s <none>
|
On the other hand, this shows what's been happening since the problems started (Oct 2013 - today):
Code: |
$ python ~/syncstats.py <(sed -n '/^1379995624/,$p' /var/log/emerge.log)
Sync server Syncs Fails Rate Avg dur. Max dur. Min dur. Hostname
--------------------------------------------------------------------------------
212.24.172.37 17 1 94.4 % 64.9 s 90 s 36 s rsync1.hu.gentoo.org
212.37.1.30 1 3 25.0 % 161.0 s 161 s 161 s mirror2.dataphone.se
88.198.224.205 2 10 16.7 % 556.0 s 797 s 315 s www1.stealer.net
89.238.64.78 1 9 10.0 % 59.0 s 59 s 59 s banana.gentoo-ev.org
88.198.69.178 1 13 7.1 % 26.0 s 26 s 26 s hosting1.amiconsult.de
137.226.34.228 0 10 0.0 % nan s 0 s 0 s rsync.informatik.rwth-aachen.de
91.193.124.136 0 11 0.0 % nan s 0 s 0 s <none>
178.210.128.3 0 12 0.0 % nan s 0 s 0 s www.telcom.net.ua
88.191.100.68 0 6 0.0 % nan s 0 s 0 s 88-191-100-68.rev.poneytelecom.eu
137.226.34.42 0 12 0.0 % nan s 0 s 0 s ftp.halifax.rwth-aachen.de
193.136.164.6 0 11 0.0 % nan s 0 s 0 s ftp.rnl.ist.utl.pt
188.165.231.20 0 14 0.0 % nan s 0 s 0 s gentoo64.net
212.89.225.155 0 9 0.0 % nan s 0 s 0 s tux.rainside.sk
83.164.133.198 0 13 0.0 % nan s 0 s 0 s gentoo.lagis.at
89.206.169.171 0 10 0.0 % nan s 0 s 0 s rsync.star.net.uk
194.62.23.99 0 11 0.0 % nan s 0 s 0 s mirror.opteamax.de
130.230.54.100 0 3 0.0 % nan s 0 s 0 s trumpetti.ip4.atm.tut.fi
88.156.222.90 0 8 0.0 % nan s 0 s 0 s ftp.vectranet.pl
194.146.132.15 0 11 0.0 % nan s 0 s 0 s gentoo.iteam.ua
81.223.20.162 0 13 0.0 % nan s 0 s 0 s gentoo.inode.at
144.76.18.87 0 10 0.0 % nan s 0 s 0 s skade.schwarzvogel.de
212.110.161.69 0 9 0.0 % nan s 0 s 0 s mirror.bytemark.co.uk
87.106.218.149 0 14 0.0 % nan s 0 s 0 s s15423754.onlinehome-server.info
81.91.242.10 0 10 0.0 % nan s 0 s 0 s gig0.mirror.ln1.qubenet.net
193.219.184.226 0 9 0.0 % nan s 0 s 0 s <none>
193.1.193.64 0 9 0.0 % nan s 0 s 0 s ftp.heanet.ie
92.240.244.16 0 8 0.0 % nan s 0 s 0 s phoenix.wheel.sk
|
Looking at the success rates, you can see that it has gone from a pretty spotless record to only being able to sync reliably with one server (rsync1.hu.gentoo.org) during the past 6 months. By the way, those two listings combined contain the data for (practically) every sync I've attempted since 2010.
Now, I know that I can (and occasionally do) sync using
Code: |
emerge-webrsync # which is slow, or
SYNC="rsync1.hu.gentoo.org" emerge --sync
|
but I'd rather not do that since I understand that using the server pools like rsync.europe.gentoo.org is preferred to distribute the load.
Mostly, I'm writing to ask a couple of questions:
1) What gives? Surely this situation is far from normal/expected?
2) Has anyone else experienced similar behaviour?
3) Did something change with the sync server infrastructure or practices around the end of September 2013?
4) If the problem is at my end, any ideas on where to look or how to remedy the situation?
Any input is much appreciated. |
|
Back to top |
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3131
|
Posted: Wed Apr 09, 2014 10:51 am Post subject: |
|
|
perhaps switch to emerge-delta-webrsync would be a better idea then? |
|
Back to top |
|
|
roseZ n00b
Joined: 17 Dec 2005 Posts: 31 Location: Finland
|
Posted: Wed Apr 09, 2014 11:24 am Post subject: |
|
|
Thanks, I wasn't aware of delta-webrsync script. I guess it would be an improvement over webrsync at least. I would like to stick to rsync, though. |
|
Back to top |
|
|
mv Watchman
Joined: 20 Apr 2005 Posts: 6747
|
Posted: Wed Apr 09, 2014 4:17 pm Post subject: |
|
|
Do you have some iptables/firewall scripts set up? Such things can happen if icmp is not properly handled. Maybe also that MTU is incorrect. |
|
Back to top |
|
|
roseZ n00b
Joined: 17 Dec 2005 Posts: 31 Location: Finland
|
Posted: Thu Apr 10, 2014 5:01 am Post subject: |
|
|
I do have an iptables firewall up since at least 2012, and there's a built-in firewall in the router-ADSL modem combo from my ISP. Though I've made no changes to either around the time that problems started.
I'm not really privy to the inner workings of network protocols, but I do have an iptables rule (from some guide) intended to maintain "normal" net-facing operation of a host:
Code: |
# iptables --line-numbers -v -L icmp-allow
Chain icmp-allow (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 ACCEPT icmp -- any any anywhere anywhere icmp destination-unreachable
2 0 0 ACCEPT icmp -- any any anywhere anywhere icmp source-quench
3 0 0 ACCEPT icmp -- any any anywhere anywhere icmp time-exceeded
4 0 0 ACCEPT icmp -- any any anywhere anywhere icmp parameter-problem
5 0 0 ACCEPT icmp -- any any anywhere anywhere icmp echo-request
6 0 0 ACCEPT icmp -- any any anywhere anywhere icmp echo-reply
|
That rule is applied at the end of the INPUT chain (policy: DROP), and OUTPUT chain is empty (policy: ALLOW). Judging by the zero counters, that rule doesn't see much action.
I tried replacing all DROP operations in the rules with LOG+DROP and then syncing, but it produced nothing out of the ordinary to the logs. Also tried with the firewall completely disabled and the router's built-in firewall bypassed as best I could (enabled NAT for ports 1:65535 to my machine), but the syncs failed just the same. I did see some new error messages, though:
Code: |
# PORTAGE_RSYNC_RETRIES=2 emerge --sync
>>> Synchronization of repository 'gentoo' located in '/usr/portage'...
>>> Starting rsync with rsync://88.191.100.68/gentoo-portage...
timed out
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [Receiver=3.0.9]
>>> Retrying...
>>> Starting retry 1 of 2 with rsync://88.198.224.205/gentoo-portage
--- SNIP ---
receiving incremental file list
app-pda/p3nfs/ChangeLog
app-pda/p3nfs/Manifest
app-pda/p3nfs/metadata.xml
metadata/timestamp
metadata/timestamp.chk
metadata/timestamp.x
metadata/dtd/timestamp.chk
metadata/glsa/timestamp.chk
rsync: read error: Connection timed out (110)
rsync error: error in rsync protocol data stream (code 12) at io.c(764) [receiver=3.0.9]
rsync: writefd_unbuffered failed to write 192 bytes to socket [generator]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(1532) [generator=3.0.9]
>>> Retrying...
>>> Starting retry 2 of 2 with rsync://137.226.34.228/gentoo-portage
--- SNIP ---
receiving incremental file list
metadata/timestamp.chk
[receiver] io timeout after 180 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [receiver=3.0.9]
rsync: connection unexpectedly closed (2572 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
>>> Exceeded PORTAGE_RSYNC_RETRIES: 2
|
I can't really see a pattern here, except that often either the sender or receiver appears to gets bored of waiting and throws in the towel. But why?
I don't get how the maximum transmission unit size would affect my issue; shouldn't the packets be chopped up on the way as needed? Anyhow, the values seem pretty usual to me: the router MTU is 1460 and ifconfig reports that the MTU is set to 1500 for the network interface. |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|