0Day Forums
How do you extract the value of a regex backreference/match in PowerShell


How do you extract the value of a regex backreference/match in Powershell - sybilla132 - 07-21-2023

I have a text file containing lines of data. I can use the following PowerShell command to extract the lines I'm interested in:

select-string -path *.txt -pattern "subject=([A-Z\.]+),"

Some example data would be:

blah blah subject=THIS.IS.TEST.DATA, blah blah blah

What I want is to be able to extract just the actual contents of the subject (i.e. the "THIS.IS.TEST.DATA" string). I tried this:

select-string -path *.txt -pattern "subject=([A-Z\.]+)," | %{ $_.Matches[0] }

But the "Matches" property is always null. What am I doing wrong?


RE: How do you extract the value of a regex backreference/match in Powershell - Siromphalos4 - 07-21-2023

The problem with your code is that select-string does not pass down the actual Regex match objects. Instead it emits objects of a different class, MatchInfo, which in PowerShell 1.0 does not carry the regex match information.

If you only want to run the regex once, you will have to roll your own function, which isn't too difficult.

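function Select-Match {
    param (
        $pattern = $(throw "Need a pattern"),
        $filePath = $(throw "Need a file path")
    )
    # Test each line once; a successful -match fills the automatic $matches variable
    foreach ($cur in (gc $filePath)) {
        if ($cur -match $pattern) {
            # $matches[0] is the whole match; $matches[1] is the first capture group
            write-output $matches[1]
        }
    }
}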

gci *.txt | %{ Select-Match "subject=([A-Z\.]+)," $_.FullName }




RE: How do you extract the value of a regex backreference/match in Powershell - remineralization969779 - 07-21-2023

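Yet another option, using a lookbehind so that the match itself is the subject. Note the gc: without it the regex would run against the file names rather than the file contents. Lines that do not match produce an empty string.


gci *.txt | gc | foreach { [regex]::match($_,'(?<=subject=)([^,]+)').value }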


RE: How do you extract the value of a regex backreference/match in Powershell - chasseur485 - 07-21-2023

Having learnt a lot from all the other answers I was able to get what I want using the following line:

gci *.txt | gc | %{ [regex]::matches($_, "subject=([A-Z\.]+),") } | %{ $_.Groups[1].Value }

This felt nice: the regex runs only once per line, and since I was typing this at the command prompt, it was good not to need multiple lines of code.




RE: How do you extract the value of a regex backreference/match in Powershell - hoicks948 - 07-21-2023

Another variation, matching 7 digits in a string

echo "123456789 hello test" | % {$_ -match "\d{7}" > $null; $matches[0]}

returns: 1234567


RE: How do you extract the value of a regex backreference/match in Powershell - autocollimates886954 - 07-21-2023

As of PowerShell V2 CTP3, MatchInfo implements the Matches property, so the following will work:

select-string -path *.txt -pattern "subject=([A-Z\.]+)," | %{ $_.Matches[0].Groups[1].Value }
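
To try this without any files on disk, here is a minimal sketch that pipes a string into Select-String instead of using -path. The $sample and $subjects variable names are illustrative assumptions, and it assumes PowerShell 2.0 or later, where MatchInfo has the Matches property:

```powershell
# Sample line from the question (the variable names here are illustrative).
$sample = 'blah blah subject=THIS.IS.TEST.DATA, blah blah blah'

# Select-String emits MatchInfo objects; Matches holds the .NET Match objects.
# Groups[0] is the whole match, Groups[1] is the first capture group.
$subjects = $sample |
    Select-String -Pattern 'subject=([A-Z\.]+),' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$subjects
```

Select-String reports only the first match per line unless -AllMatches is given, so a line with several subject= fields would need that switch plus a loop over $_.Matches.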




RE: How do you extract the value of a regex backreference/match in Powershell - mulishly722795 - 07-21-2023

The Select-String command returns MatchInfo objects, not plain strings.
I spent several hours looking for this on forums and the official website with no luck.
I'm still gathering info.
A way around this is to explicitly declare a string variable to hold the result returned from Select-String. From your example:

[string] $foo = select-string -path *.txt -pattern "subject=([A-Z\.]+),"

The $foo variable is now a string and not a MatchInfo object. Note that it holds the text form of the matches (file name, line number and the full line), not just the captured subject, so the capture group still has to be extracted from it.

Hope this helps.



RE: How do you extract the value of a regex backreference/match in Powershell - chama87356 - 07-21-2023

I don't know why your version doesn't work. It should work. Here is an uglier version that works.

$p = "subject=([A-Z\.]+),"
select-string -path *.txt -pattern $p | % {$_ -match $p > $null; $matches[1]}

Explanation:

`-match` is a regular expression matching operator:

>"foobar" -match "oo.ar"
True

The `> $null` just suppresses the True being written to the output. (Try removing it.) Piping to the `Out-Null` cmdlet does the same thing.

`$matches` is an automatic variable that holds the result of the last successful `-match` operation.
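
Because `$matches` is only updated when a match succeeds, reading it after a failed `-match` returns the previous result. A minimal sketch of guarding against that, using made-up input lines (the $lines and $found names are assumptions for illustration):

```powershell
# Hypothetical input lines: the second one has no subject= field.
$lines = @(
    'blah subject=FIRST.ONE, blah',
    'no subject here',
    'blah subject=SECOND.ONE, blah'
)

$found = foreach ($line in $lines) {
    # Only read $matches inside the if: a failed -match does not reset it,
    # so reading it unconditionally would repeat the previous line's value.
    if ($line -match 'subject=([A-Z\.]+),') {
        $matches[1]
    }
}

$found
```

Note that `-match` is case-insensitive by default, so `[A-Z]` also accepts lowercase letters here; `-cmatch` is the case-sensitive variant.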


RE: How do you extract the value of a regex backreference/match in Powershell - brachering730667 - 07-21-2023

There is a much simpler alternative to select-string: the -match operator.

In PowerShell:

1. `$sample="blah blah subject=THIS.IS.TEST.DATA, blah blah blah"`
2. `$sample -match "subject=([A-Z\.]+),"`
3. `$matches[1]` will have the substring you are looking for.

This works on Windows version 10.0.16299.
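
As a variation on the steps above, the capture group can be given a name so it is read by name rather than by index. This is only a sketch: the group name "subject" and the $subject variable are arbitrary choices, not something from the original question.

```powershell
$sample = 'blah blah subject=THIS.IS.TEST.DATA, blah blah blah'

# (?<subject>...) names the capture group, so it can be read by name
# instead of by position. The group name "subject" is an arbitrary choice.
if ($sample -match 'subject=(?<subject>[A-Z\.]+),') {
    $subject = $matches['subject']
}

$subject
```

Wrapping the `-match` in an if also keeps the True/False result out of the output, so no `> $null` is needed.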