Computing portage distfiles directory name hash


Post by eccerr0r » Tue Sep 26, 2023 7:01 am

Yes, it has been done this way for a while, but I had a script that relied on the flat directory layout to reduce how much I have to download from distfiles.gentoo.org. It has now stopped working because the distfiles are split across 256 directories.

While I knew this would be a problem at some point, now I have to figure out how to replicate this on my personal portage distfiles mirror as I download them (again, since I have more than one box that needs updates, I am hoping I download any particular distfile no more than once.)

So in https://archives.gentoo.org/gentoo-dev/ ... ecae2463ba it provides some very vague declaration of how to compute the directory hash...is there a better plaintext way of computing the subdirectory? As far as I can tell, the distfile name is hashed (what hash was it decided on?) and then some esoteric function added. I was kind of surprised, would have thought if the hash was relatively cryptographically secure (which it doesn't need to be since the intent was solely distribution and not security) one would just need to cut off the first byte of the hash and use that as the directory when converted to hex.

So is there a bash script to generate this 2-hexadecimal digit hash?

tldr: http://distfiles.gentoo.org/distfiles/5a/qtwebengine-5.15.10_p20230815.tar.xz : what's the algorithm to compute the 5a ?

Post by Genone » Tue Sep 26, 2023 8:13 am

The description in the GLEP seems pretty clear:
- Fetch layout.conf
- Process [structure] section of layout.conf to determine available directory layouts
- If filename-hash is specified, generate hash of filename using algorithm defined in layout.conf
- Extract initial bits of the hash as specified in layout.conf

distfiles.gentoo.org has a layout.conf that specifies:

```
[structure]
0=filename-hash BLAKE2B 8
```

The BLAKE2b hash can be calculated e.g. using Python's hashlib:

```python
import hashlib
h = hashlib.blake2b()
h.update(b'qtwebengine-5.15.10_p20230815.tar.xz')
print(h.hexdigest())
```

which gets us

```
5a248ad23276e17f10b44788bddaaf8bdd9cc1982c91163752ea7e129c796eb778b3549863a79c8e42363189ac939597d8f83df8f5b063184d14c8eaebe627f2
```

Then we take the first 8 bits of that hash, which means the first two characters of the hex digest (one hex digit = 4 bits), which results in the directory name '5a'.
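
The steps above can be wrapped in a small helper. This is only a minimal sketch, assuming the `filename-hash BLAKE2B 8` layout quoted above; the function name `distfile_subdir` and its `bits` parameter are made up for illustration and are not part of Portage.

```python
import hashlib


def distfile_subdir(filename: str, bits: int = 8) -> str:
    """Return the leading `bits` of the BLAKE2b hash of `filename` as hex."""
    if bits <= 0 or bits % 4 != 0:
        # One hex digit encodes 4 bits, so only multiples of 4 map cleanly.
        raise ValueError("bits must be a positive multiple of 4")
    # Hash the file *name* (not the file contents), with no trailing newline.
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[: bits // 4]


print(distfile_subdir("qtwebengine-5.15.10_p20230815.tar.xz"))
```

For the file name from this thread this should print `5a`, matching the digest shown above.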

You can also calculate the hash using openssl blake2b512, but you need to pay close attention to whitespace when passing content in bash. E.g.

```sh
echo qtwebengine-5.15.10_p20230815.tar.xz | openssl blake2b512
```

will generate an incorrect result, because `echo` appends a trailing newline that gets hashed as well; you have to use `echo -n` (or `printf %s`) to omit it.
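
To see why the newline matters, here is a rough illustration in Python rather than a shell session. It hashes the same name with and without a trailing `\n`; the variable names are arbitrary.

```python
import hashlib

name = "qtwebengine-5.15.10_p20230815.tar.xz"

# What `echo -n NAME` or `printf %s NAME` feeds to the hash tool.
without_newline = hashlib.blake2b(name.encode("utf-8")).hexdigest()
# What plain `echo NAME` feeds to it: the name plus a trailing newline.
with_newline = hashlib.blake2b((name + "\n").encode("utf-8")).hexdigest()

# Show the directory prefix each variant would produce.
print(without_newline[:2], with_newline[:2])
# The full digests are different, so this prints False.
print(without_newline == with_newline)
```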

But mind that other mirrors may use a different structure, so you really should check layout.conf for what is supported.
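
A hedged sketch of what "checking layout.conf" could look like in a script follows. It assumes the `[structure]` format quoted above, and, going by my reading of GLEP 75 rather than anything stated in this thread, that a plain `flat` entry means "no subdirectory" and that cutoffs may be colon-separated for nested directories. The names `relative_path` and `HASHES` are hypothetical, and the algorithm table lists only what this sketch handles.

```python
import configparser
import hashlib

# Assumed mapping from layout.conf algorithm names to hashlib constructors;
# only the names handled by this sketch are listed.
HASHES = {"BLAKE2B": hashlib.blake2b, "SHA512": hashlib.sha512}


def relative_path(layout_conf: str, filename: str) -> str:
    """Return the mirror-relative path of `filename` for the first layout
    in `layout_conf` that this sketch understands."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(layout_conf)
    entries = parser["structure"]
    # Keys are priorities: "0" is the most preferred layout.
    for key in sorted(entries, key=int):
        fields = entries[key].split()
        if fields == ["flat"]:
            return filename
        if len(fields) == 3 and fields[0] == "filename-hash" and fields[1] in HASHES:
            cutoffs = [int(c) for c in fields[2].split(":")]
            if any(b <= 0 or b % 4 for b in cutoffs):
                continue  # not a whole number of hex digits; try the next entry
            digest = HASHES[fields[1]](filename.encode("utf-8")).hexdigest()
            parts = []
            for bits in cutoffs:
                n = bits // 4  # 4 bits per hex digit
                parts.append(digest[:n])
                digest = digest[n:]
            return "/".join(parts + [filename])
    raise ValueError("no supported layout found")


layout = """
[structure]
0=filename-hash BLAKE2B 8
"""
print(relative_path(layout, "qtwebengine-5.15.10_p20230815.tar.xz"))
```

In a real script the `layout` text would come from the mirror's own layout.conf instead of being hard-coded.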

Post by Ionen » Tue Sep 26, 2023 8:44 am

You can also use b2sum from coreutils

```sh
var=$(printf %s qtwebengine-5.15.10_p20230815.tar.xz | b2sum)
echo "${var::2}"
5a
```

Ideally you do need to check layout.conf, but the odds that this will change often are very low.

Post by Genone » Tue Sep 26, 2023 10:16 am

Ionen wrote: Ideally you do need to check layout.conf, but the odds that this will change often are very low.
True, but I wouldn't be surprised if some mirror operators opted for a different hash function and/or prefix length. After all, someone must have argued for this functionality in the GLEP.

Post by sam_ » Tue Sep 26, 2023 9:33 pm

eccerr0r wrote: [..]
So in https://archives.gentoo.org/gentoo-dev/ ... ecae2463ba it provides some very vague declaration of how to compute the directory hash...is there a better plaintext way of computing the subdirectory? As far as I can tell, the distfile name is hashed (what hash was it decided on?) and then some esoteric function added. I was kind of surprised, would have thought if the hash was relatively cryptographically secure (which it doesn't need to be since the intent was solely distribution and not security) one would just need to cut off the first byte of the hash and use that as the directory when converted to hex.
You can read GLEP 75 in full at https://www.gentoo.org/glep/glep-0075.html

Hashes have uses beyond cryptography and there's no need for cryptographic security in a case like this; the hash is used for bucketing. The choice of BLAKE2 is therefore kind of interesting but it's justified within the GLEP.

Post by eccerr0r » Tue Sep 26, 2023 10:38 pm

Thanks, this makes more sense now. It seems the pages contained a lot of discussion of what the options were and why they were or were not used, so much of the content didn't make it into the final decision. I could have completely glossed over the discussion about cryptographically secure hashes and got the same result; however, a "secure" hash is more likely to spread the files evenly, which improves the chances that the buckets end up with a similar number of files.
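
One rough way to look at how evenly the names spread is to bucket some names and count them. The sketch below uses made-up file names (not a real distfiles listing), so the numbers it prints say nothing definitive about the real mirror; it only shows the counting approach.

```python
import hashlib
from collections import Counter

# Hypothetical file names, generated only to have something to hash.
names = [f"example-package-{i}.tar.xz" for i in range(20000)]

# Bucket each name by the first two hex digits (8 bits) of its BLAKE2b hash.
buckets = Counter(
    hashlib.blake2b(n.encode("utf-8")).hexdigest()[:2] for n in names
)

print("buckets used:", len(buckets))
print("smallest non-empty bucket:", min(buckets.values()))
print("largest bucket:", max(buckets.values()))
```

Running the same counting over the file names in an actual distfiles directory would give a more meaningful picture.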

I suppose technically, on my personal distfiles server, I could have my own layout.conf, dump all distfiles into the same directory, and do away with all of this (a lot of my machines are fairly similar in installed content). Then again, I'll have to make sure layout.conf doesn't get rsynced into my repository and overwritten...

Then the next question is whether eclean will deal with deleting excess files if I break the directories apart... ahh... more to ponder.