I think there is still an issue here: by default, the . matches any character except a newline (represented as \n) unless the re.DOTALL flag is included. So an expression that might work is:

r = re.compile('"TREES"\n(.*)"END TREES"', re.DOTALL)

Note I put a newline after the first line so that it’s not included in the returned text.
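To illustrate the effect of the flag, here is a minimal sketch. The sample string is made up to stand in for the file contents, and it assumes the markers sit on their own lines with plain \n line endings; the real file may differ.

```python
import re

# Hypothetical sample text standing in for the contents of the real file.
sample = '"HEADER"\n"TREES"\n"1","Maple"\n"2","Oak"\n"END TREES"\n"FOOTER"\n'

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
no_flag = re.compile('"TREES"\n(.*)"END TREES"')
print(no_flag.search(sample))

# With re.DOTALL, "." also matches newlines and the block is captured.
with_flag = re.compile('"TREES"\n(.*)"END TREES"', re.DOTALL)
match = with_flag.search(sample)
block = match.group(1) if match else None
print(block)
```

The first search finds nothing because the text between the markers spans several lines; the second captures the two data lines between the markers.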

— Andy

On Oct 2, 2014, at 10:07 AM, Lachance, Michael - APHIS <[log in to unmask]> wrote:

Andy,

 

Yes, there are actually double quotes around each item in the text file.

 

I am able to get the code to work if I just type in a string of text for s. For example:

 

import os, re

 

f = open("200214092341eriswor_1_56.txt", "r")

s = 'TheDoctorLies'

r = re.compile('The(.*)Lies')

m = r.search(s)

print m.group(1)

 

Returns “Doctor”

 

However, I get “None” when I assign file.read() to s and use “TREES” and “END TREES”, so I think the problem lies there.

 

Also, I removed the ? and the code worked when s was set to a simple string of text. Thanks for the tip!

 

Kind regards,

 

Michael Lachance

Plant Protection Technician

ALB Eradication Program USDA APHIS

151 West Boylston Drive

Worcester, MA 01606

508.852.8090 (o)

508.414.5673 (c)

The USDA is an equal opportunity provider and employer.

Federal Relay Service (Voice/TTY/ASCII/Spanish) 1-800-877-8339

 

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

 

From: Northeast Arc Users Group [mailto:[log in to unmask]] On Behalf Of Andy Anderson
Sent: Thursday, October 02, 2014 9:57 AM
To: [log in to unmask]
Subject: Re: Isolating a block of text between two strings in a text file

 

Does the text "TREES" and "END TREES" actually have quotes? If not, leave them out of the pattern, because they won’t match.

 

The other thing is that ? is a modifier that applies to whatever comes immediately before it. Placed after .* it does not match anything itself; it only makes the .* non-greedy, so that it matches as little as possible. That is not what causes the failure, so you can leave ? out of it while you test.
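For reference, in Python's re module a ? placed directly after a quantifier such as * makes that quantifier non-greedy. A small sketch of the difference, using a made-up string with two candidate endings (the string and variable names are illustrative only):

```python
import re

# Hypothetical text with two possible end markers, to show the difference.
text = 'TheDoctorLies and TheMasterLies'

greedy = re.search('The(.*)Lies', text)   # .* takes as much as it can
lazy = re.search('The(.*?)Lies', text)    # .*? takes as little as it can

greedy_result = greedy.group(1)
lazy_result = lazy.group(1)
print(greedy_result)  # DoctorLies and TheMaster
print(lazy_result)    # Doctor
```

With a single start and end marker in the text, both forms capture the same thing, so the ? only matters when the end marker can occur more than once.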

 

— Andy

 

On Oct 2, 2014, at 9:15 AM, Lachance, Michael - APHIS <[log in to unmask]> wrote:



Good morning all,

 

Not quite a GIS question, but since you all have proven to be so, so smart in the past, I figured I’d ask here anyways….

 

I am trying to write a Python script that isolates a block of text within a text file. I want to manipulate the data within that block, but first I need to isolate it.

 

My research told me to try using something like re.compile and re.search to do so but I haven’t been having much luck. I had this block of code:

 

 

import os, re

 

file = open("200214092341eriswor_1_56.txt", "r")

s = file.read()

r = re.compile('"TREES"(.*?)"END TREES"')

m = r.search(s)

if m:

    trees = m.group(1)

    print trees

 

But it isn’t returning anything and if I print m it returns “None”.

 

The data in the text file is separated by commas, and the words “TREES” and “END TREES” are on their own lines above and below the block I’d like to isolate.
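Given that layout, one alternative to a regular expression is to walk the file line by line and collect what lies between the two marker lines. The sketch below is only an illustration under assumptions: the helper name read_block and the sample data are hypothetical, and it assumes each marker line contains nothing but the quoted marker.

```python
import csv
import io

def read_block(lines, start='"TREES"', end='"END TREES"'):
    """Return the lines strictly between the start and end marker lines."""
    block = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == end:
            break
        if inside:
            block.append(stripped)
        elif stripped == start:
            inside = True
    return block

# Hypothetical sample; for a real file, pass the open file object instead.
sample = io.StringIO('"HEADER"\n"TREES"\n"1","Maple"\n"2","Oak"\n"END TREES"\n')

# csv.reader splits each line on commas and removes the double quotes.
rows = list(csv.reader(read_block(sample)))
print(rows)
```

Each entry in rows is then a list of the fields on one line of the block, which may be easier to manipulate than one long string.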

 

Any thoughts?

 

Kind regards,

 

Michael Lachance

Plant Protection Technician

ALB Eradication Program USDA APHIS

151 West Boylston Drive

Worcester, MA 01606

508.852.8050 (o)

508.414.5673 (c)


 

------------------------------------------------------------------------- This list (NEARC-L) is an unmoderated discussion list for all NEARC Users.

If you no longer wish to receive e-mail from this list, you can remove yourself by going to http://listserv.uconn.edu/nearc-l.html.

 
