
Hi Michael,

Your regular expression is nearly there, but you may want to use a different
search method. Use the .findall()
<https://docs.python.org/2/library/re.html#re.findall> method instead of
.search() <https://docs.python.org/2/library/re.html#re.search> for what
you're trying to do, since .findall() returns the captured text from every
matching block in the file while .search() stops at the first match:

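import re

# Read the whole file into a single string
with open("200214092341eriswor_1_56.txt", "r") as f:
    s = f.read()

# re.DOTALL lets . match newlines; the \s* on each side trims whitespace
# around every captured block, and .*? stops at the nearest "END TREES"
r = re.compile(r'"TREES"\s*(.*?)\s*"END TREES"', re.DOTALL)
trees = r.findall(s)
for tree in trees:
    print(tree)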

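To see the difference between the two methods without the real file, here is a small sketch in Python 3 syntax (the code in this thread uses Python 2 print statements). The sample string is invented for illustration, assuming two blocks laid out the way Michael describes; it is not the real file's contents.

```python
import re

# Hypothetical sample text with two TREES blocks (not the real file)
sample = '"TREES"\n1,maple,12\n"END TREES"\nother data\n"TREES"\n2,oak,30\n"END TREES"\n'

pattern = re.compile(r'"TREES"\s*(.*?)\s*"END TREES"', re.DOTALL)

# .search() stops at the first match and returns a match object (or None)
first = pattern.search(sample)
print(first.group(1))  # 1,maple,12

# .findall() returns the captured group from every match, as a list
all_blocks = pattern.findall(sample)
print(all_blocks)  # ['1,maple,12', '2,oak,30']
```

Because the pattern has exactly one capture group, .findall() hands back plain strings rather than match objects, so there is no need to call .group() on each result.
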

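As Andy suggested, adding the re.DOTALL
<https://docs.python.org/2/library/re.html#re.DOTALL> flag to the re.compile()
call lets . match newlines, so the pattern can match blocks that span multiple
lines. Adding \s* on both sides of the capture group, as shown, trims leading
and trailing whitespace (including newlines) from each match; the newlines
between rows inside the block are kept. Keep the ? after .* as well: .*? is a
non-greedy match, so each result stops at the nearest "END TREES" instead of
running on to the last one in the file. Stick with your original expression
(still adding the re.DOTALL flag) if you care about the whitespace, or if you'd
rather strip it after capture.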

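Here is a small Python 3 sketch of both points, run on a made-up block rather than the real file (the rows are invented). It compares the pattern without re.DOTALL, with re.DOTALL but no \s*, and with both.

```python
import re

# Hypothetical block; the real file's rows will differ
text = '"TREES"\n1,maple,12\n2,oak,30\n"END TREES"\n'

# Without re.DOTALL, . cannot cross the newlines, so nothing matches
no_dotall = re.search(r'"TREES"(.*?)"END TREES"', text)
print(no_dotall)  # None

# With re.DOTALL but no \s*, the leading and trailing newlines are kept
kept = re.search(r'"TREES"(.*?)"END TREES"', text, re.DOTALL).group(1)
print(repr(kept))  # '\n1,maple,12\n2,oak,30\n'

# With \s* around the group, that surrounding whitespace is trimmed
trimmed = re.search(r'"TREES"\s*(.*?)\s*"END TREES"', text, re.DOTALL).group(1)
print(repr(trimmed))  # '1,maple,12\n2,oak,30'

# Stripping the original expression's result after capture gives the same text
print(kept.strip() == trimmed)  # True
```

Note that \s* only trims the edges of each match; the newline between the two rows stays in the capture, which is what you'd want if you go on to split the block into lines.
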
Hope that helps,

Adam Wehmann

______________________________
(330)-807-0515 | LinkedIn <http://www.linkedin.com/in/adamwehmann> | Website
<http://www.adamwehmann.com/>
[log in to unmask]


On Thu, Oct 2, 2014 at 11:08 AM, Andy Anderson <[log in to unmask]>
wrote:

> I thought there might still be an issue here: the . matches anything
> *except* a newline character, represented as \n, unless the
> flag re.DOTALL is included. So the expression that might work is:
>
>  r = re.compile('"TREES"\n(.*)"END TREES"', re.DOTALL)
>
> Note that I put a newline (\n) right after "TREES" so that it isn’t included
> in the returned text.
>
>  — Andy
>
>  On Oct 2, 2014, at 10:07 AM, Lachance, Michael - APHIS <
> [log in to unmask]> wrote:
>
> Andy,
>
> Yes, there are actually double quotes around each item in the text file.
>
> I am able to get the code to work if I just type in a string of text for
> s. For example:
>
> import re
>
> s = 'TheDoctorLies'
> r = re.compile('The(.*)Lies')
> m = r.search(s)
> print(m.group(1))
>
> This prints “Doctor”.
>
> However, I get “None” when I assign file.read() to s and use “TREES” and
> “END TREES”, so I think the problem lies there.
>
> Also, I removed the ? and the code worked when s was set to a simple
> string of text. Thanks for the tip!
>
> Kind regards,
>
> Michael Lachance
> Plant Protection Technician
> ALB Eradication Program USDA APHIS
> 151 West Boylston Drive
> Worcester, MA 01606
> *508.852.8090 (o)*
> *508.414.5673 (c)*
> The USDA is an equal opportunity provider and employer.
> Federal Relay Service (Voice/TTY/ASCII/Spanish) 1-800-877-8339
>
> This electronic message contains information generated by the USDA solely
> for the intended recipients. Any unauthorized interception of this message
> or the use or disclosure of the information it contains may violate the law
> and subject the violator to civil or criminal penalties. If you believe you
> have received this message in error, please notify the sender and delete
> the email immediately.
>
> *From:* Northeast Arc Users Group [mailto:[log in to unmask]] *On Behalf Of* Andy Anderson
> *Sent:* Thursday, October 02, 2014 9:57 AM
> *To:* [log in to unmask]
> *Subject:* Re: Isolating a block of text between two strings in a text
> file
>
> Do the strings "TREES" and "END TREES" actually have quotes around them in
> the file? If not, leave the quotes out of the pattern, because the pattern
> won’t match.
>
> The other thing is that the ? is a modifier; it must have some particular
> character or set of characters immediately before it, rather than another
> pattern like .* , which matches anything. So you should just leave ? out of
> it.
>
> — Andy
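For what it's worth, the ? in the original pattern is legal: Python's re module treats *? as a quantifier in its own right, the non-greedy (lazy) form of *, which matches as few characters as possible. It only behaves differently from plain * when the text holds more than one "END TREES". A Python 3 sketch on an invented two-block string, using the \n-anchored pattern Andy suggests elsewhere in this thread:

```python
import re

# Hypothetical text with two TREES blocks
text = '"TREES"\nA\n"END TREES"\n"TREES"\nB\n"END TREES"'

# Greedy .* runs to the LAST "END TREES", swallowing both blocks
greedy = re.search(r'"TREES"\n(.*)"END TREES"', text, re.DOTALL).group(1)

# Non-greedy .*? stops at the FIRST "END TREES" after each "TREES"
lazy = re.findall(r'"TREES"\n(.*?)"END TREES"', text, re.DOTALL)

print(repr(greedy))  # 'A\n"END TREES"\n"TREES"\nB\n'
print(lazy)          # ['A\n', 'B\n']
```

With a single block in the file, both forms return the same text, which is consistent with removing the ? working on the simple test string.
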
>
> On Oct 2, 2014, at 9:15 AM, Lachance, Michael - APHIS <
> [log in to unmask]> wrote:
>
> Good morning all,
>
> Not quite a GIS question, but since you all have proven to be so, so smart
> in the past, I figured I’d ask here anyway.
>
> I am trying to write a Python script that isolates a block of text within
> a text file. I want to manipulate the data within that block, but first I
> need to isolate it.
>
> My research told me to try using something like re.compile and re.search
> to do so, but I haven’t been having much luck. I had this block of code:
>
> import os, re
>
> file = open("200214092341eriswor_1_56.txt", "r")
> s = file.read()
> r = re.compile('"TREES"(.*?)"END TREES"')
> m = r.search(s)
> if m:
>     trees = m.group(1)
>     print trees
>
> But it isn’t returning anything, and if I print m it returns “None”.
>
> The data in the text file is separated by commas, and the words
> “TREES” and “END TREES” are on their own lines above and below the block
> I’d like to isolate.
>
> Any thoughts?
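Once the block is isolated, one possible next step toward manipulating the comma-separated data is Python's csv module, which also strips the double quotes Michael mentions around each item. This is a minimal Python 3 sketch, not a tested solution for the real file: read_tree_rows is a hypothetical helper name, and the sample rows and the "HEADER" section are invented, since the file's actual columns aren't described in this thread.

```python
import csv
import io
import re

# Pattern from this thread: non-greedy, DOTALL, whitespace trimmed around the block
TREES_BLOCK = re.compile(r'"TREES"\s*(.*?)\s*"END TREES"', re.DOTALL)


def read_tree_rows(text):
    """Hypothetical helper: return the rows of every TREES block as lists of strings."""
    rows = []
    for block in TREES_BLOCK.findall(text):
        # csv.reader accepts any iterable of lines; io.StringIO supplies them
        rows.extend(csv.reader(io.StringIO(block)))
    return rows


# Invented sample; in practice, pass in the text returned by file.read()
sample = '"HEADER"\nx,y\n"TREES"\n"1","maple",12\n"2","oak",30\n"END TREES"\n'
for row in read_tree_rows(sample):
    print(row)  # ['1', 'maple', '12'] then ['2', 'oak', '30']
```

Each row comes back as a list of strings, so numeric columns would still need converting with int() or float() before doing arithmetic on them.
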
>
> Kind regards,
>
> Michael Lachance
> Plant Protection Technician
> ALB Eradication Program USDA APHIS
> 151 West Boylston Drive
> Worcester, MA 01606
> *508.852.8050 (o)*
> *508.414.5673 (c)*
> The USDA is an equal opportunity provider and employer.
> Federal Relay Service (Voice/TTY/ASCII/Spanish) 1-800-877-8339
>
> -------------------------------------------------------------------------
> This list (NEARC-L) is an unmoderated discussion list for all NEARC Users.
>
> If you no longer wish to receive e-mail from this list, you can remove
> yourself by going to http://listserv.uconn.edu/nearc-l.html.