Gentoo Forums
Portage Atom RegEx
shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Sun Jan 05, 2014 12:58 am

Hello,

I'm working on a neato Portage project I can't wait to announce, but I've come up against a block in trying to parse atoms in PHP. I've spent the last two hours trying to get the solutions in this thread and others to work with no luck. Most of the other regexes I've seen suggested in other threads and blogs so far seem to lack the completeness required to accurately run through the whole package.mask list. I'm a regex imbecile so I'd greatly appreciate a hand!

My objective is to parse package.mask:
Code:

$masked = file_get_contents("$portage_path/profiles/package.mask");
$masked_lines = explode("\n", $masked);

foreach($masked_lines as $line)
{
   $line = trim($line);
   if(!empty($line) and substr($line, 0, 1) != '#')
   {
      $pieces = regexAtom($line);
      print_r($pieces);
   }
}


where regexAtom takes an atom like >=app-accessibility/at-spi2-core-2.10 and spits out an array containing:
- operator
- category
- package
- version
- range

I started off with a brute-force if(substr()) strategy, making ample use of explode(), but it all fell apart when it came to package names with multiple hyphens and version numbers with revisions and the like (e.g. -2.10-r1).
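To make the failure mode concrete, here is a minimal sketch in Python (chosen only for illustration; the same thing happens with explode() in PHP). The helper name naive_split is hypothetical. It treats everything after the last hyphen as the version, which works until a revision suffix appears:

```python
def naive_split(name_version):
    """Naive approach: everything after the last hyphen is the version."""
    name, _, version = name_version.rpartition("-")
    return name, version


print(naive_split("at-spi2-core-2.10"))     # ('at-spi2-core', '2.10')
print(naive_split("at-spi2-core-2.10-r1"))  # ('at-spi2-core-2.10', 'r1')
```

The second call shows the revision being mistaken for the whole version, which is why the split has to know what a version looks like rather than just where the hyphens are.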

The closest I've come to workable output with a regex is the pattern from the last poster in the linked thread, but most of the time it spits out a bunch of mostly empty arrays. The arrays that are populated have repeated values and other flaws. For example:
Code:

function regexAtom($atom)
{
   preg_match_all("/^([<>]?=?)(([^\/]+)\/)?(?U)(\S+)(-(\d+(\.\d+)*[a-z]?(_(alpha|beta|pre|rc|p)\d*)*(-r\d+)?)?)$/", $atom, $matches);
   return $matches;
}

...

Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

    [2] => Array
        (
        )

    [3] => Array
        (
        )

    [4] => Array
        (
        )

    [5] => Array
        (
        )

    [6] => Array
        (
        )

    [7] => Array
        (
        )

    [8] => Array
        (
        )

    [9] => Array
        (
        )

    [10] => Array
        (
        )

)
Array
(
    [0] => Array
        (
            [0] => >=app-i18n/scim-anthy-1.3.0
        )

    [1] => Array
        (
            [0] => >=
        )

    [2] => Array
        (
            [0] => app-i18n/
        )

    [3] => Array
        (
            [0] => app-i18n
        )

    [4] => Array
        (
            [0] => scim-anthy
        )

    [5] => Array
        (
            [0] => -1.3.0
        )

    [6] => Array
        (
            [0] => 1.3.0
        )

    [7] => Array
        (
            [0] => .0
        )

    [8] => Array
        (
            [0] =>
        )

    [9] => Array
        (
            [0] =>
        )

    [10] => Array
        (
            [0] =>
        )

)


I'm sure there's just a tiny bug in that pattern, but it is rather lengthy and I'm not sure where to begin. Your advice will be cherished!

TheLexx
Tux's lil' helper

Joined: 04 Dec 2005
Posts: 137
Location: Austin Tx

Posted: Mon Jan 06, 2014 1:20 am

shoddy wrote:

I'm working on a neato Portage project I can't wait to announce, but I've come up against a block in trying to parse atoms in PHP.
<snip>
... regexAtom takes an atom like >=app-accessibility/at-spi2-core-2.10 and spits out an array containing: - operator - category - package - version - range


I too have been working on scripts for high-level manipulation of Gentoo packages. I'm not familiar with PHP regular expressions, but I can write a Python regular expression that separates the line into (operator, category, package-version). Note that the package and version are not separated! I could not find the docs that describe what is and is not allowed in a package name and a version string. Depending on those rules, it may be impossible to write a regular expression that separates the two. Example: if package "aa-bb" version "cc" is allowed and package "aa" version "bb-cc" is also allowed, then there is no unique way to parse "aa-bb-cc". To get my program to work, I used the Gentoo tool "equery" to do the split for me. Equery uses the current tree to separate the package name from the version. The command I would use is "equery -N --nocolor meta --herd app-accessibility/at-spi2-core-2.10". Python has the os.popen() function, which can execute other programs so that the script can parse their output. I'm not sure whether PHP has a similar capability.
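The ambiguity can be sketched in a few lines of Python. The function name candidate_splits is hypothetical; it simply lists every way of cutting a string at one hyphen, without knowing any naming rules:

```python
def candidate_splits(name_version):
    """Return every (name, version) pair obtained by cutting at a single hyphen."""
    parts = name_version.split("-")
    return [("-".join(parts[:i]), "-".join(parts[i:])) for i in range(1, len(parts))]


for name, version in candidate_splits("aa-bb-cc"):
    print(f"package={name!r} version={version!r}")
```

Both candidates look equally plausible unless something (a naming rule or the actual tree) rules one of them out.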

gotyaoi
Tux's lil' helper

Joined: 01 Apr 2013
Posts: 137

Posted: Mon Jan 06, 2014 3:28 am

google to the rescue?

https://forums.gentoo.org/viewtopic-t-823844-start-0.html

shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Mon Jan 06, 2014 5:32 am

gotyaoi wrote:
google to the rescue?

https://forums.gentoo.org/viewtopic-t-823844-start-0.html

I believe that's the thread I linked to in my original post :)

gotyaoi
Tux's lil' helper

Joined: 01 Apr 2013
Posts: 137

Posted: Mon Jan 06, 2014 5:42 am

And my reading comprehension score just went through the floor, sorry about that.

shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Mon Jan 06, 2014 6:27 am

No worries :)

To give you an idea of what I'm trying to do: https://forums.gentoo.org/viewtopic-p-7476234.html

gotyaoi
Tux's lil' helper

Joined: 01 Apr 2013
Posts: 137

Posted: Mon Jan 06, 2014 8:26 am

Ok, well, in the hope of being a little more helpful, here's my try. I did this in Python, so you may need to tweak it slightly, but here goes.

Code:
([<>]=?|=)? # In the case where the atom contains a version, this bit should appear at the beginning. First group.
([\w_][\w+_.-]*/[\w_][\w+_-]*) # Matches the package category/package name. Second group.
(?(1)-(\d+(?:\.\d+)*[a-z]?(?:(?:_alpha|_beta|_pre|_rc|_p)\d*)*(?:-r\d*)?)) # Matches the version string. Uses (?(1)...) to include this bit only if the first group captured something, ie. there should be a version string present. Third group.


Obviously this needs verbose regex to work, I think that's PCRE_EXTENDED in PHP, or you can just compress it into one line, removing comments and newlines.

If it's a versioned atom, group 1 would be either <, <=, =, >=, >, group two would be the category/name pair and group 3 would be the version. The dash between the name and the version is not captured. If it's unversioned, groups 1 and 3 would be empty.

Edit: if you want separate categories and names, just change the second line to:

Code:
([\w_][\w+_.-]*)/([\w_][\w+_-]*)


Note the extra parentheses. This makes group 2 the category, 3 the name and 4 the version, if it exists.

Edit2: Just noticed that this doesn't handle slots, repositories and wildcards, which can all be part of an atom according to the portage man page. I'll try to fix that tomorrow.
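For reference, here is a sketch of how the four-group variant above might be wired up in Python with re.VERBOSE. The names ATOM_RE and parse_atom are hypothetical, and using fullmatch() to anchor the pattern at both ends is an assumption, since the pattern as posted has no anchors:

```python
import re

# The pattern from this post, with category and name split into separate groups.
ATOM_RE = re.compile(r"""
    ([<>]=?|=)?                            # group 1: optional operator
    ([\w_][\w+_.-]*)                       # group 2: category
    /
    ([\w_][\w+_-]*)                        # group 3: package name
    (?(1)-(\d+(?:\.\d+)*[a-z]?             # group 4: version, only if group 1 matched
        (?:(?:_alpha|_beta|_pre|_rc|_p)\d*)*
        (?:-r\d*)?))
""", re.VERBOSE)


def parse_atom(atom):
    """Return (operator, category, name, version) or None if the atom does not match."""
    match = ATOM_RE.fullmatch(atom)
    return match.groups() if match else None


print(parse_atom(">=app-accessibility/at-spi2-core-2.10"))
print(parse_atom("media-fonts/font-adobe-100dpi"))
print(parse_atom("~sys-apps/portage-2.2"))
```

The last call returns None because the pattern has no ~ operator, in addition to the slots, repositories and wildcards already noted as missing.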

Genone
Retired Dev

Joined: 14 Mar 2003
Posts: 9507
Location: beyond the rim

Posted: Mon Jan 06, 2014 1:49 pm

Actually parsing complete atom strings with only regular expressions is not a good idea in the long run, even if you only need a valid/invalid answer. Pretty much all of the posted versions are incomplete/incorrect as far as I've seen even when only considering EAPI-0 (missing ~, ! and * operators in most cases). And eventually you'll need to consider stuff like slot operators, use deps, repo deps and other new features as well.
And if you want to actually evaluate an atom (check if it matches a specific version) then you should really avoid reimplementing that yourself as that is far more complex than it seems and use the portage API for it. Yes, this might be a bit tricky from within PHP/Perl, but still better than a self-written solution that only covers 95% of all current cases, you really don't want to play catch-up on that front.
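As a rough illustration of leaning on Portage instead of reimplementing it, the sketch below wraps two helpers from portage.versions, catpkgsplit() and vercmp(). It assumes Portage's Python modules are importable on the machine running it and falls back to None otherwise; the wrapper names split_cpv and is_newer are hypothetical. From PHP this would have to be run as an external script.

```python
try:
    # Portage's own helpers; importable only where Portage itself is installed.
    from portage.versions import catpkgsplit, vercmp
except ImportError:
    catpkgsplit = vercmp = None


def split_cpv(cpv):
    """Split a CPV using Portage, or return None when Portage is unavailable."""
    if catpkgsplit is None:
        return None
    return catpkgsplit(cpv)


def is_newer(version_a, version_b):
    """True if version_a sorts after version_b; None when Portage is unavailable."""
    if vercmp is None:
        return None
    result = vercmp(version_a, version_b)
    return result is not None and result > 0


print(split_cpv("sys-apps/portage-2.2.0_alpha10-r2"))
print(is_newer("2.10", "2.9"))
```

Note that this only covers CPV splitting and version ordering; matching an atom against a version involves more than these two calls.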

shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Mon Jan 06, 2014 3:47 pm

Quote:
And eventually you'll need to consider stuff like slot operators, use deps, repo deps and other new features as well.

I really don't think that's what I'm going for.

I would LIKE to add use dep and package dep trees later, but I mean MUCH later. There is still lots to do!

Thanks for your input!

Quote:
Just noticed that this doesn't handle slots, repositories and wildcards, which can all be part of an atom according to the portage man page. I'll try to fix that tomorrow.

Infinite thanks!

gotyaoi
Tux's lil' helper

Joined: 01 Apr 2013
Posts: 137

Posted: Mon Jan 06, 2014 8:39 pm

Genone is pretty much right in this case, but... It's so tempting :P

The interestingly acronym'd Package Manager Specification even defines most of the atom in terms of regex. On the other hand, the portage man page and the ebuild man page sort of present two different pictures of what an atom is, though it's unclear if a lot of what is in the ebuild page is only applicable to writing ebuilds, and not applicable to user supplied atoms... like, the ebuild page essentially says the = prefix is optional, while the portage page requires it. The ebuild page lists a bunch of modifiers that I'm not sure make sense in portage conf files, like the ~ and * make some sense, but I have no idea what the use case would be for using ! in an atom to give to portage, and the slot operators and use flag specifiers... I don't quite get some of it, I guess.

TheLexx
Tux's lil' helper

Joined: 04 Dec 2005
Posts: 137
Location: Austin Tx

Posted: Mon Jan 06, 2014 8:43 pm

Genone wrote:
Actually parsing complete atom strings with only regular expressions is not a good idea in the long run, even if you only need a valid/invalid answer. Pretty much all of the posted versions are incomplete/incorrect as far as I've seen even when only considering EAPI-0 (missing ~, ! and * operators in most cases).


In a way, I was suggesting you use the portage scripts for a similar reason. It is a bit of a kludge to use the text output of one script as the input of another. However that saves having to dig into how to use the portage API. Also, you have to worry about hooks for the API in the language your script is written in, be it PHP, python2 or python3. If you do go the route of parsing the text output of a portage script, you should write it modularly enough that you could switch to an API implementation later.

So far my ideas of cool things to do with portage have not gained any traction in the Gentoo forums. If they did I would un-kludge my scripts before sharing them with the Gentoo community.

shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Mon Jan 06, 2014 11:24 pm

gotyaoi wrote:
Genone is pretty much right in this case, but... It's so tempting :P

Even if it takes a mixture of special case handling and multiple regexes that would surely be better than my uninitiated attempts at using tangled if/else.

If a new feature comes along (e.g. a new suffix or prefix) I don't see much of a problem in adapting to it, as it would be a one-off thing and we're dealing with a website, not something that must work flawlessly unnoticed in the background forever.

Quote:
The interestingly acronym'd Package Manager Specification even defines most of the atom in terms of regex. On the other hand, the portage man page and the ebuild man page sort of present two different pictures of what an atom is, though it's unclear if a lot of what is in the ebuild page is only applicable to writing ebuilds, and not applicable to user supplied atoms... like, the ebuild page essentially says the = prefix is optional, while the portage page requires it. The ebuild page lists a bunch of modifiers that I'm not sure make sense in portage conf files, like the ~ and * make some sense, but I have no idea what the use case would be for using ! in an atom to give to portage, and the slot operators and use flag specifiers... I don't quite get some of it, I guess.

AFAIK an atom in the true sense of the word starts with an = and includes a version number. It's indivisible.

So I've muddled the language a bit here. What we're doing in cases like package.mask is more like masking. So no = operator and no version refers to every atom in a package. >=, <=, <, > are self-explanatory and the tilde means something (I already forget :p) and so on.

Quote:
However that saves having to dig into how to use the portage API.

Bingo! I've already spent four and a half days working on this non stop which is a bit more than I budgeted and a chunk of change out of my pocket :p I'm interested in faster solutions but I'm also for doing it right so I'll explore this a bit.

Regardless, I can't see using the API or portage scripts as being the most efficient way of processing the hundred or so lines in package.mask. I can't conceive of anything that could be faster than zipping through this flat file whether it's ideologically correct to do so or not.

TheLexx
Tux's lil' helper

Joined: 04 Dec 2005
Posts: 137
Location: Austin Tx

Posted: Tue Jan 07, 2014 1:15 am

Humm, it looks like I've been quote mined. I was suggesting calling a script within a script to avoid learning the API. I was not advocating using your own regexp to avoid learning the API.

The question that remains is, "To what level do you need the version parsed?" In my script, all I needed to do was separate it into "at-spi2-core" and "2.10". I did not have to determine if one version number was lesser or greater than another. In my script, two versions either matched or not. A much simpler task than placing a priority on the two versions.

If you simply want to separate it into parts, you could use gotyaoi's regexps; they are definitely Python-based. I really can't comment on his regexp for separating the version from the package name, because I don't know what the definitive boundary between the package name and the version number is. I stated this in my first response. As an example, can "my-program-2" be a package name, or does "-[0-9]+" always designate a version number? Another question: is "-2a" the start of a legal version number? Rather than looking into each of those questions, it was easier to parse the results of equery. And as a bonus, if the definition of legal package/version names changes, those changes will be incorporated into the portage utilities.

Edit watched --> matched

gotyaoi
Tux's lil' helper

Joined: 01 Apr 2013
Posts: 137

Posted: Tue Jan 07, 2014 1:32 am

As to numbers in the package name, the spec allows them as long as it doesn't match "-(version spec)". Technically, it says
Quote:
...must not end in a hyphen followed by one or more digits.

but that's a little inexact for what it actually meant, I think. One of the cases that got mentioned in the other thread was media-fonts/font-adobe-100dpi and friends, where 100dpi was a part of the package name. I guess you could read it as must not end in \d+$, but as I said, inexact.
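Taken literally, the quoted wording can be checked with a one-line regex. This sketch implements only the rule as quoted here (a trailing hyphen followed by digits), not the stricter reading about full version strings; the function name ends_like_version is hypothetical:

```python
import re


def ends_like_version(name):
    """True if the name ends in a hyphen followed only by digits (the rule as quoted)."""
    return re.search(r"-\d+$", name) is not None


print(ends_like_version("font-adobe-100dpi"))  # False: "100dpi" is not all digits
print(ends_like_version("my-program-2"))       # True: would be rejected as a name
```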

shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Tue Jan 07, 2014 1:54 am

TheLexx wrote:
Humm, it looks like I've been quote mined. I was suggesting calling a script within a script to avoid learning the API. I was not advocating using your own regexp to avoid learning the API.

I didn't mean to take you out of context, I don't think I understood your response until you clarified it. Advocating or not I think it would be a considerable time saver to do it this way, and I'm certainly not above doing things that work economically in spite of purism. Let he who has no sin...

Quote:
The question that remains is, "To what level do you need the version parsed?" In my script, all I needed to do was separate it into "at-spi2-core" and "2.10". I did not have to determine if one version number was lesser or greater than another. In my script, two versions either matched or not. A much simpler task than placing a priority on the two versions.

In this case I'm trying to apply hard mask flags to the ebuild entries in the database. This requires accurate parsing of the whole version number or it could appear that nonmasked versions have been hard masked. That would make for a poor user experience. My objective is to make sure gport's information is as close to packages.gentoo.org as possible.

The ebuild entries have a valid version number because when they are put into the database by the crawler their package directory name is simply sliced out of the ebuild filename. So right now I'm looking at doing this backwards and running through the whole ebuild database and comparing them to package.mask individually - at least then I have a guaranteed version string to work with. The problem with that approach is it is obscenely wasteful of resources.

TheLexx
Tux's lil' helper

Joined: 04 Dec 2005
Posts: 137
Location: Austin Tx

Posted: Tue Jan 07, 2014 4:21 am

Because your project needs to compare the version numbers of packages, I would suggest that you use Gentoo scripts to do the heavy lifting of parsing and comparing them. Your script can then look at the output of the Gentoo script to determine its behavior. I think this would be better than trying to re-invent the wheel.

As an example, this is how I did my script. Admittedly it is a bit of a kludge, and the variable names are unenlightening. I feel that this will not change: unless other people are interested in using the script, I don't believe it is worth my time to un-kludge it. Also, I come from a C background, and I added a few steps to make things clearer to non-Python users.


Code:

RE_AugLine=re.compile("([^ ]+) ([^ ]+):(.*)")
<snip>
        Execute="equery  -N --nocolor meta --herd %s" % Name
        InFile=os.popen(Execute)
        while 1:
            Line = InFile.readline()
            if Line == "":
                break
            Match=RE_NoVers.match(Line)
            if Match==None:
                continue
            ShortName=Match.groups()[0]
            break
        assert ShortName!=None
       
        Foo=ShortName.split('/')
        self.Category=Foo[0]
        self.Pkg=Foo[1]
        NameLenght=len(ShortName)
        self.Version=Name[NameLenght:]
<snip>

Genone
Retired Dev

Joined: 14 Mar 2003
Posts: 9507
Location: beyond the rim

Posted: Tue Jan 07, 2014 9:00 am

Ok guys, let me clarify some terms here first to avoid confusion:

Atom: This refers to a version selector (a SELECT statement in SQL terms) that can match zero, one or more CPV entries (see below) of a single package. Depending on context it can also include restrictions on origin and configuration (repo deps, use deps). The most basic atom syntax is simply "<category>/<packagename>", a more general form is "[<operator>]<category>/<packagename>[-<packageversion>][<suffixoperator>][<constraints>]" with some extra logic on top. As far as package.mask is concerned you can ignore the <constraints> part for now.

CPV: Stands for "Category/Package-Version", this would be a specific record in the package database, e.g. sys-apps/portage-2.2.0_alpha10-r2. These can actually be parsed with a single regular expression, see /usr/lib/portage/pym/portage/versions.py for the full story, including version comparison rules (you'll see why I recommend not to reimplement that). Note that this is for comparing two CPV entries, actual atom matching is a bit more complex.

It's important to keep these two definitions separate even if they look very similar, as they are not interchangeable when dealing with portage operations. package.mask contains atoms; when you traverse the repositories you obtain CPV entries. While you can turn every CPV into an atom by prefixing it with an operator, the other way is not that simple. Mixing the terms up often leads to confusion (e.g. package.mask takes atoms, but package.provided takes CPV entries).

As for the different parts of an atom regarding use outside of ebuilds:
- block operators (! and !!) only make sense for the actual buildplan generator and therefore only make sense in *DEPEND contexts
- range operators (=, <= and so on, including ~ prefix and * suffix operators) can be encountered in pretty much every place that supports atoms. If you really want to handle this yourself, make sure you double-check syntax and semantics on the latter two; at least in the past they weren't always intuitive.
- slot deps (package[-version]:slot) is an EAPI-1 feature and should be expected in config files by now
- subslot operators (never can remember the syntax on those) are like block operators only relevant for buildplan generation and make no sense in config files as far as I know
- repository deps (package[-version]:[slot]:repository) can probably be ignored for the main repository, but could appear in overlays and local config files
- use deps (the part in [] at the end of an atom) will probably not appear in config files for a while, but may be used in the future to replace things like package.use.mask. So your parser should be prepared to handle them.
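The parts listed above can be peeled off one at a time instead of being matched by a single expression. The following Python sketch is illustrative only and is not Portage's parser: the function split_atom, the dictionary keys and the CPV_TAIL pattern are hypothetical, the repository separator is assumed to be "::", and no validation of the individual parts is attempted.

```python
import re

# Assumed version syntax: digits, dotted parts, optional letter, suffixes, revision.
CPV_TAIL = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*[a-z]?"
    r"(?:_(?:alpha|beta|pre|rc|p)\d*)*(?:-r\d+)?)$"
)


def split_atom(atom):
    """Peel an atom apart piece by piece (illustrative, not Portage's parser)."""
    parts = {"blocker": "", "operator": "", "category": "", "package": "",
             "version": "", "glob": False, "slot": "", "repo": "", "use": ""}
    rest = atom.strip()

    # Use deps: a trailing [flag,-flag,...] block.
    m = re.search(r"\[([^\]]*)\]$", rest)
    if m:
        parts["use"], rest = m.group(1), rest[:m.start()]

    # Repository (assumed "::repo"), then slot (":slot").
    if "::" in rest:
        rest, parts["repo"] = rest.split("::", 1)
    if ":" in rest:
        rest, parts["slot"] = rest.split(":", 1)

    # Leading blocker (!! or !) followed by an optional range operator.
    m = re.match(r"(!!|!)?(>=|<=|=|<|>|~)?", rest)
    parts["blocker"], parts["operator"] = m.group(1) or "", m.group(2) or ""
    rest = rest[m.end():]

    # Trailing * suffix operator.
    if rest.endswith("*"):
        parts["glob"], rest = True, rest[:-1]

    # Only look for a version when an operator announced one.
    parts["category"], _, name_version = rest.partition("/")
    m = CPV_TAIL.match(name_version) if parts["operator"] else None
    if m:
        parts["package"], parts["version"] = m.group("name"), m.group("version")
    else:
        parts["package"] = name_version
    return parts


print(split_atom(">=app-accessibility/at-spi2-core-2.10"))
print(split_atom("=dev-lang/python-2.7*:2.7[threads]"))
```

Splitting this way only cuts the string up; deciding whether a given CPV actually matches the atom is a separate and harder step.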

Quote:
The interestingly acronym'd Package Manager Specification even defines most of the atom in terms of regex.

Well, you can define/parse the individual parts of an atom with regular expressions, but you need quite a bit of logic to connect them to a single term. It's likely possible to write a single regular expression to validate any given atom, but it would be a real monster due to conditionals, and doesn't help much in evaluating the result.
Quote:
On the other hand, the portage man page and the ebuild man page sort of present two different pictures of what an atom is, though it's unclear if a lot of what is in the ebuild page is only applicable to writing ebuilds, and not applicable to user supplied atoms... like, the ebuild page essentially says the = prefix is optional, while the portage page requires it.

The ebuild(5) manpage is generally authoritative when it comes to atom definitions. The portage(5) page is more of a user guide and less formal.

shoddy
n00b

Joined: 05 Jan 2014
Posts: 13

Posted: Tue Jan 07, 2014 4:35 pm

Quote:
Ok guys, let me clarify some terms here first to avoid confusion

An incredibly helpful post. Thank you!

Quote:
Well, you can define/parse the individual parts of an atom with regular expressions, but you need quite a bit of logic to connect them to a single term. It's likely possible to write a single regular expression to validate any given atom, but it would be a real monster due to conditionals, and doesn't help much in evaluating the result.

I just need to cut the darn things up, I can take care of everything after that.

Quote:
I think this would be better than trying to re-invent the wheel.

Agreed, but equery et al has to process these somehow so the first thing I'm going to look at is its guts to see if I can't reverse engineer something.

I'm trying to leave going outside of PHP as a last resort because using the gentoo scripts requires that the site is hosted on a gentoo virtual machine (and probably all sorts of temporary dickery with the host's active version of portage; the site's version is separated for good reasons). While that is the case now, I'm pathological about code portability.

That being said, I've gotta do what I've gotta do so it's on the table.

Another bump in the road I can see - just doing a cursory google - is equery MAY not be helpful when it comes to overlays. Since overlays are going to be mostly what makes gport worth using that could be a big hurdle. Not knowing anything about overlays other than how to set them up at this time, it's possible they don't even use a package.mask since KEYWORDS should be sufficient.

I really have my research cut out for me!

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Wed Jan 08, 2014 4:34 am

As genone said, the distinction between atoms and CPV is critical, and as he pointed out there is indeed a regex for CPVs; I used it as the basis of a bash-equivalent, in order to be able to compare versions of incoming packages for update. If you download the git version (the other's over a year old) and search for getCPV() you'll find the main function (first one I wrote to split incoming packages) along with all the various ERE I use nowadays; a bit lower is verCompare() which compares two version specifiers along with various other functions. I ended up just implementing that from scratch, based on the specification.

I needed this for comparison of versions with blockers, and it's come in handy for other things, like ABI warnings for specific upgrades (which are version-constrained.) It's worked out well. :-)

The only wrinkle is comparison of patch versions, and inter-revs for prefix. Oh and the "float" comparison when one of the numbers starts with 0. I kinda miss the old cvs tag as well; swapping the first two letters would have made it sufficient to fulfil the old "scm" GLEP-55 monstrosity of a tag, which is simply a binary attribute too. Would have been a helluva lot simpler to implement as well. ho hum.

To be transparent, here's the current main RE:
Code:
readonly PSLOT='(:[A-Za-z0-9_][A-Za-z0-9+_./-]*)' PREPO='(::[A-Za-z0-9_][A-Za-z0-9_-]*)'
readonly CPV="^(.*-.*|virtual)/(.*)-([0-9]+)((\.[0-9]+)*)([a-z]?)((_(pre|p|beta|alpha|rc)[0-9]*)*)(-r([0-9.]+))?$PSLOT?$PREPO?$"

I don't bother with checking the chars of the pName, just in splitting out the parts I need (they're used in verCompare.) Another reason for that, is that the category can have odd stuff in it under crossdev (I believe; zmedico told me about it years ago.) And really I don't want to get into the whole charset debacle: it's output from portage which I've already split, and if at some point someone wants to use odd chars, I don't want to get in their way. Makes the regex a bit quicker, since it should just go for matching the version parts, or it used to before slots etc came in. In any event, it's always been pretty quick, and with the one match you can easily do the comparison.
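For anyone not working in bash, the ERE above carries over to other regex engines almost unchanged. Below is a sketch of a Python transliteration; the group names and the function name get_cpv are labels added here for readability and are not taken from the original script:

```python
import re

# Same structure as the bash PSLOT, PREPO and CPV patterns, with named groups.
PSLOT = r"(?P<slot>:[A-Za-z0-9_][A-Za-z0-9+_./-]*)"
PREPO = r"(?P<repo>::[A-Za-z0-9_][A-Za-z0-9_-]*)"
CPV = re.compile(
    r"^(?P<category>.*-.*|virtual)/(?P<name>.*)-(?P<major>[0-9]+)"
    r"(?P<minor>(?:\.[0-9]+)*)(?P<letter>[a-z]?)"
    r"(?P<suffix>(?:_(?:pre|p|beta|alpha|rc)[0-9]*)*)"
    r"(?:-r(?P<rev>[0-9.]+))?" + PSLOT + "?" + PREPO + "?$"
)


def get_cpv(cpv):
    """Return the named parts of a CPV as a dict, or None if it does not match."""
    match = CPV.match(cpv)
    return match.groupdict() if match else None


print(get_cpv("sys-apps/portage-2.2.0_alpha10-r2"))
print(get_cpv("media-fonts/font-adobe-100dpi-1.0.3"))
```

As in the bash version, the greedy name group backtracks to the last hyphen that is followed by something version-shaped, so font-adobe-100dpi stays intact as the package name.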

Any >= or the like is split first, and returned in pOper (usually empty), though this doesn't deal with atoms with a * or ~ at the end; it's not meant to. The only place we deal with atoms is either in /etc/warning where we match * slightly differently, to allow 2.3.* (2.3* in portage matches 2.30 iirc: i might be wrong on the details, but there is a difference); or on the cli (to allow update '>=foo-2.3' which is handled when we check params) and from blockers, again with different logic (look for calls to operMatch.)

Bash's dynamic scope helps a lot; I have a list of the vars returned, and simply do: local $vars_getCPV in the caller functions to keep separation; it makes maintenance a lot simpler (adding pSlot and pRepo returns was trivial.) I'd use an object of some sort in another language.

HTH,
steveL.

Havin_it
Veteran

Joined: 17 Jul 2005
Posts: 1246
Location: Edinburgh, UK

Posted: Wed Jan 15, 2014 6:19 pm

Hi shoddy, sorry to chime in this late in the day, but if it's of any interest, the script I was cooking up that regex for is here. Its purpose was parsing of atoms given on the command-line, in portage-equivalent format but without some of the qualifiers found in the full spec. There is another regex in there for parsing emerge.log and extracting atoms which might be of some interest.

As for the empty arrays, if you look at the function writeup it should be clear what's happening: the first array is full matches, each successive array contains the capturing subpattern matches by their order in the pattern. You can eliminate some of these that you don't need to read from the returned arrays by putting ?: at the start of the subpattern, i.e. change (foo) to (?:foo). I don't think I knew that bit myself when I wrote the script :oops:
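The effect of ?: is the same in most regex engines. Here is a small Python sketch (used here only for illustration, since PHP's preg functions follow the same group-numbering rules) showing how a repeated capturing group produces the stray ".0" entry and how making it non-capturing removes it:

```python
import re

# Every pair of parentheses captures, including the repeated inner group.
capturing = re.match(r"(\d+)((\.\d+)*)", "1.3.0")
print(capturing.groups())      # ('1', '.3.0', '.0')

# Marking the inner group (?:...) keeps the match but drops it from the results.
non_capturing = re.match(r"(\d+)((?:\.\d+)*)", "1.3.0")
print(non_capturing.groups())  # ('1', '.3.0')
```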

Hope that clears up that part, at least.

All times are GMT