Thread: Python find and replace? (forum: Coding and Web Development)
  1. #1

    localzuk's Avatar
    Join Date
    Dec 2006
    Location
    Minehead
    Posts
    17,969
    Thank Post
    519
    Thanked 2,503 Times in 1,943 Posts
    Blog Entries
    24
    Rep Power
    841

    Python find and replace?

    I am trying to think of a simple way to do the following:

    I have a block of text. In that text there can be any number of {VAR123} type bits of text. The number can be any number, of any length.

    I want to search through the text and replace each {VAR123} instance with <img src="cid:imageN">, where N increments from 1 up to the total number of those instances in the text.

    Thing is, I don't know what the number in the {VAR123} will start at - it could be any number.

    So, I'm figuring I'll need a regex to do this. I'm terrible with regex. They're like an alien language to me...

  2. #2

    Steve21's Avatar
    Join Date
    Feb 2011
    Location
    Swindon
    Posts
    2,705
    Thank Post
    335
    Thanked 517 Times in 485 Posts
    Rep Power
    180
    Is it always going to be {VAR......} or is that just an example? As in are you wanting it to find {var12} and {var13423425235235}, or 12ab and esti4ijisj6e6s etc :P

    Steve

  3. #3

    localzuk's Avatar
    Yup, it'll always be {VAR...}, so {VAR123} or {VAR8965} etc...

  4. #4

    LosOjos's Avatar
    Join Date
    Dec 2009
    Location
    West Midlands
    Posts
    5,532
    Thank Post
    1,463
    Thanked 1,215 Times in 824 Posts
    Rep Power
    724
    So in the text, you are wanting to find and replace any string in the format {VAR###} - is it always curly braces? Is the text variable or always VAR?

    You say you don't know what number the VAR instances start at - does that matter? I mean do you need your "image#" to start at the same number or anything?

    EDIT: just seen your last post - so the numerical element could be any length, but will be immediately followed by a closing curly brace?

  5. #5

    localzuk's Avatar
    It's always curly braces, and always has VAR followed by a number.

    I always need image# to start at 1 and increment. The number after VAR will be used to reference a table in a DB to get that image and attach it to an email (hence the img tag).

    So, for example, you have a letter email, and at the end of the letter you want to insert an image as a signature. The image can't be embedded specifically, as it will vary every time the email is edited in software.

    So, {VAR23} gets inputted in place of the image, and the software, on sending, replaces that tag with <img src="cid:image1"> and attaches image 23 from the database to the email.
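
The link between the cid: reference in the HTML and the attached image is the Content-ID header on the image part. A minimal sketch of that pairing, assuming Python 3 and the standard email.mime classes; build_related_message and image_blobs are hypothetical names, and the image bytes are assumed to be PNG data already fetched from wherever they are stored:

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related_message(html_body, image_blobs):
    """Attach image_blobs[i] so that the HTML can refer to it as cid:image(i+1)."""
    message = MIMEMultipart("related")
    message.attach(MIMEText(html_body, "html"))
    for number, blob in enumerate(image_blobs, start=1):
        # Subtype is passed explicitly (assumption: PNG) so nothing is guessed
        part = MIMEImage(blob, _subtype="png")
        # <image1> here is what src="cid:image1" in the HTML points at
        part.add_header("Content-ID", "<image%d>" % number)
        message.attach(part)
    return message
```

The number inside the Content-ID is the running counter (1, 2, 3, ...), not the database id from the {VAR...} tag.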

  6. #6

    Steve21's Avatar
    Code:
    {var.*}
    Should match any of the ones you have originally. Will read the other bit in a minute, silly kids keep interrupting!

    Steve
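
Two things are worth checking with a pattern like this before relying on it: .* is greedy, so several tags on one line can be swallowed as a single match, and Python's re is case-sensitive unless re.IGNORECASE is passed. A small sketch, assuming Python 3; the sample text is made up for illustration:

```python
import re

text = "Dear Sir {VAR23}, regards {VAR8965}."

# Lower-case 'var' does not match upper-case tags without re.IGNORECASE
lower_case = re.findall(r"\{var.*\}", text)

# Greedy .* runs from the first '{VAR' to the last '}' on the line
greedy = re.findall(r"\{VAR.*\}", text)

# Restricting the middle to digits keeps each tag as its own match
digits_only = re.findall(r"\{VAR[0-9]+\}", text)

print(lower_case)
print(greedy)
print(digits_only)
```

The braces are escaped here for clarity; Python also treats an unescaped { as a literal when it cannot start a repeat count.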

  7. #7

    tmcd35's Avatar
    Join Date
    Jul 2005
    Location
    Norfolk
    Posts
    5,769
    Thank Post
    863
    Thanked 910 Times in 754 Posts
    Blog Entries
    9
    Rep Power
    331
    Does this sound like the right flow:

    • Read a character
    • If the character is { then read the next three characters, else go to the last step
    • If the next three characters are VAR then count the number of characters up to }, else go to the last step
    • number = the substring from after the R to before the }
    • Search the database for the image number
    • Attach the database image as Image(nextsequentialnumber)
    • Insert a string into the output text to display Image(nextsequentialnumber)
    • Move the current character position to the } and go to step 1
    • Write the character to the output and go to step 1
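
The flow above can be sketched without any regex. This is a rough version assuming Python 3.7+ (for str.isascii); replace_by_scanning is a hypothetical name, and the database lookup and attaching steps are left out, with the numbers simply collected in order:

```python
def replace_by_scanning(text):
    """Walk the text character by character and swap {VAR<digits>} for img tags."""
    output = []
    image_ids = []  # numbers found after VAR, in order of appearance
    pos = 0
    while pos < len(text):
        if text.startswith("{VAR", pos):
            close = text.find("}", pos)
            number = text[pos + 4:close] if close != -1 else ""
            if number.isascii() and number.isdigit():
                image_ids.append(number)
                output.append('<img src="cid:image%d">' % len(image_ids))
                pos = close + 1  # continue just past the closing brace
                continue
        # Not a tag (or a malformed one): copy the character through unchanged
        output.append(text[pos])
        pos += 1
    return "".join(output), image_ids
```

For example, replace_by_scanning("Hi {VAR23}") returns the rewritten text together with the list ["23"], which could then be used for the lookups.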

  8. #8

    localzuk's Avatar
    Quote: Originally posted by tmcd35
    Does this sound like the right flow:

    • Read a character
    • If the character is { then read the next three characters, else go to the last step
    • If the next three characters are VAR then count the number of characters up to }, else go to the last step
    • number = the substring from after the R to before the }
    • Search the database for the image number
    • Attach the database image as Image(nextsequentialnumber)
    • Insert a string into the output text to display Image(nextsequentialnumber)
    • Move the current character position to the } and go to step 1
    • Write the character to the output and go to step 1
    Yup, that's about it.

    The database searching, email attaching is already sorted. Just the find, replace and get the number are the iffy bits for me.
    @Steve21 - that regex works nicely. I'm guessing I then use the finditer() method which returns an iterator of MatchObject instances to get the indexes? Followed by iterating through them and using str.replace(value,"<img src="cid:imageX">",1) incrementing an int to increase the imageX number.

    I've tweaked the regex to be a bit more precise: {VAR[0-9]*}. Will this work properly for me? It appears to from some cursory testing.
    Last edited by localzuk; 9th December 2013 at 02:00 PM.
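
An alternative to finditer() plus str.replace() is re.sub() with a function as the replacement, which does the find, the numbering and the replace in one pass. This is a swapped-in technique rather than what was posted in the thread; the sketch assumes Python 3, and number_images and swap are hypothetical names. It also uses [0-9]+ instead of [0-9]*, because * would accept a bare {VAR} with no number at all:

```python
import itertools
import re

TAG = re.compile(r"\{VAR([0-9]+)\}")


def number_images(text):
    """Return (new_text, ids): tags become cid:image1, cid:image2, ... in order."""
    counter = itertools.count(1)
    found = []  # the numbers that followed VAR, in order of appearance

    def swap(match):
        found.append(match.group(1))
        return '<img src="cid:image%d">' % next(counter)

    return TAG.sub(swap, text), found
```

Because re.sub() builds the result itself, there are no start/end indices to keep in step with an edited string.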

  9. #9

    LosOjos's Avatar
    This does it but obviously you'll need to alter it to do what you need:

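    Code:
    import re
    
    # Raw string so the backslash in the Windows path is taken literally
    with open(r'C:\example.txt', 'r') as f:
        original = f.read()
    
    x = 0        # running image number; the first match becomes image1
    prevEnd = 0  # index in original just past the previous match
    output = ""
    
    # Group 2 captures the digits between '{VAR' and '}'
    for m in re.finditer(r"(\{VAR)([0-9]+)(\})", original):
        x += 1
        VarID = m.group(2) # the number following VAR
        print(VarID)
        # Copy the untouched text before this match, then add the img tag
        output += original[prevEnd:m.start()] + '<img src="cid:image' + str(x) + '">'
        prevEnd = m.end()
    
    # Append whatever follows the last match
    output += original[prevEnd:]
    
    print(output)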
    EDIT: just noticed you had virtually got there by yourself already. The only real difference in my version is the grouping of parts of the match with brackets, which means you can pull them out easily while you iterate over the matches.

    EDIT2: realised that it stopped at the last match, so I added a bit to concatenate the remainder of the file, and I tidied up the regex a bit to make it more accurate (it only matches if it finds '{VAR' followed by at least one digit followed by '}').
    Last edited by LosOjos; 9th December 2013 at 03:15 PM.

  10. #10

    LosOjos's Avatar
    How did you get on with this?

  11. #11

    localzuk's Avatar
    Didn't get a chance to finish coding it up - had a power cut this morning so have spent most of the day recovering all the oddities that are left over from that.

  12. #12

    localzuk's Avatar
    Well, got it working, kind of. I have a string replacement issue at the moment: after altering the first entry based on the start and end index of the match, the text has changed, so the next image in the iterator now has incorrect indices. I'm fixing that with a specific find and replace instead of using the MatchObject.start and .end indices. Once I'm happy with it, I'll post it up to show what I ended up with!

  13. #13

    LosOjos's Avatar
    Quote: Originally posted by localzuk
    Well, got it working, kind of. I have a string replacement issue at the moment: after altering the first entry based on the start and end index of the match, the text has changed, so the next image in the iterator now has incorrect indices. I'm fixing that with a specific find and replace instead of using the MatchObject.start and .end indices. Once I'm happy with it, I'll post it up to show what I ended up with!
    If you store the original document in a variable and base your regex on that as I did above, you won't have that problem
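
To make the difference concrete, here is a small sketch of both approaches side by side, assuming Python 3; the function names are hypothetical. The first applies offsets from the original text to a string that keeps changing, which goes wrong from the second tag onwards; the second only ever slices the untouched original and builds a separate output:

```python
import re

TAG = re.compile(r"\{VAR[0-9]+\}")


def img(n):
    return '<img src="cid:image%d">' % n


def replace_with_stale_offsets(text):
    """Buggy on purpose: offsets come from text, but slices are taken from edited."""
    edited = text
    for n, m in enumerate(TAG.finditer(text), start=1):
        edited = edited[:m.start()] + img(n) + edited[m.end():]
    return edited


def replace_from_original(text):
    """Slice only the original text, so the match offsets stay valid throughout."""
    pieces = []
    prev_end = 0
    for n, m in enumerate(TAG.finditer(text), start=1):
        pieces.append(text[prev_end:m.start()])
        pieces.append(img(n))
        prev_end = m.end()
    pieces.append(text[prev_end:])  # remainder after the last match
    return "".join(pieces)
```

With a single tag both give the same result; the stale-offset version only breaks once an earlier replacement has changed the length of the string.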

  14. #14

    localzuk's Avatar
    Yeah, that's what I've done now. Just trying to figure out why the multipart aspect of my emailing part is suddenly not working. I will succeed if it kills me.

  15. #15

    localzuk's Avatar
    Well, got it working. Gotta thoroughly test it with a variety of data but so far it seems to be holding up.

    Code:
    import smtplib
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.image import MIMEImage
    import MySQLdb
    import sys
    import re
    
    
    def py_mail():
        """With this function we send out our html email"""
        # Defined before the try so the finally block can test it even if connect() fails
        connection = None
        try:
            connection = MySQLdb.connect(host="127.0.0.1", user="USER", passwd="PASSWORD", db="DATABASE")
            cursor = connection.cursor(MySQLdb.cursors.DictCursor)
            # Get waiting emails
            cursor.execute("select * from email_details where email_details_sent = 0")
            data = cursor.fetchall()
            # Process each email
            for row in data :
                # Prepare email
                MESSAGE = MIMEMultipart('related')
                MESSAGE["To"] = row["email_details_to_address"]
                MESSAGE["From"] = row["email_details_sender"]
                MESSAGE.preamble = "This is a multi-part message in MIME format"
                MSGALT = MIMEMultipart('alternative')
                MESSAGE.attach(MSGALT)
                MSGTXT = MIMEText("Your email client does not support the format of this email. Please use an email client that accepts HTML email, or contact$
                MSGALT.attach(MSGTXT)
                # Search the body for {IMAGE<digits>} tags
                body = row["email_details_body_text"]
                body_to_edit = body
                subject = row["email_details_subject"]
                MESSAGE['Subject'] = subject
                # + rather than * so a bare {IMAGE} is skipped; the group captures the id
                pattern = r"\{IMAGE([0-9]+)\}"
                regex = re.compile(pattern, re.IGNORECASE)
                images = regex.finditer(body)
                cids = 1
                for image in images :
                    image_id = image.group(1)
                    # If found, get related image from DB (id passed as a parameter, not concatenated)
                    cursor.execute("select gen_image_path from gen_image where gen_image_id = %s", (image_id,))
                    imageDB = cursor.fetchone()
                    imagePath = imageDB["gen_image_path"]
                    # Replace the tag with <img src='cid:imageN'>
                    # Replace the matched text itself (first occurrence only), as the original
                    # indices no longer line up with body_to_edit after earlier replacements
                    body_to_edit = body_to_edit.replace(image.group(0), "<img src='cid:image" + str(cids) + "'>", 1)
                    # Attach image to email, using path from DB
                    fp = open(imagePath,'rb')
                    msgImage = MIMEImage(fp.read())
                    fp.close()
                    msgImage.add_header('Content-ID', '<image' + str(cids) + '>')
                    MESSAGE.attach(msgImage)
                    cids = cids + 1
                HTML_BODY = MIMEText(body_to_edit, 'html')
                MSGALT.attach(HTML_BODY)
                # Send Email
                server = smtplib.SMTP('smtp.gmail.com:587')
                # Print debugging output when testing
                if __name__ == "__main__":
                    server.set_debuglevel(1)
                # Credentials (if needed) for sending the mail
                password = "PASSWORD"
                server.starttls()
                server.login(MESSAGE['From'],password)
                server.sendmail(MESSAGE['From'], MESSAGE['To'], MESSAGE.as_string())
                server.quit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))
            sys.exit(1)
        finally:
            if connection:
                connection.close()
    if __name__ == "__main__":
        # Executes if the script is run as main script (for testing purposes)
        py_mail()
    Not bad for my second ever Python program.
