Coding Thread: Python find and replace? (in Coding and Web Development)
  1. #1

    localzuk's Avatar
    Join Date
    Dec 2006
    Location
    Minehead
    Posts
    18,369
    Thank Post
    525
    Thanked 2,611 Times in 2,019 Posts
    Blog Entries
    24
    Rep Power
    890

    Python find and replace?

    I am trying to think of a simple way to do the following:

    I have a block of text. In that text there can be any number of {VAR123} type bits of text. The number can be any number, of any length.

    I want to search through the text and replace each {VAR123} instance with <img src="cid:image1">, where the number increments from 1 up to the total number of those instances in the text.

    Thing is, I don't know what the number in the {VAR123} will start at - it could be any number.

    So, I'm figuring I'll need a regex to do this. I'm terrible with regex. They're like an alien language to me...

  2. #2

    Steve21's Avatar
    Join Date
    Feb 2011
    Location
    Swindon
    Posts
    2,762
    Thank Post
    354
    Thanked 533 Times in 498 Posts
    Rep Power
    182
    Is it always going to be {VAR......} or is that just an example? As in are you wanting it to find {var12} and {var13423425235235}, or 12ab and esti4ijisj6e6s etc :P

    Steve

  3. #3

    localzuk's Avatar
    Yup, it'll always be {VAR...}, so {VAR123} or {VAR8965} etc...

  4. #4

    LosOjos's Avatar
    Join Date
    Dec 2009
    Location
    West Midlands
    Posts
    5,665
    Thank Post
    1,484
    Thanked 1,263 Times in 857 Posts
    Rep Power
    803
    So in the text, you are wanting to find and replace any string in the format {VAR###} - is it always curly braces? Is the text variable or always VAR?

    You say you don't know what number the VAR instances start at - does that matter? I mean do you need your "image#" to start at the same number or anything?

    EDIT: just seen your last post - so the numerical element could be any length, but will be immediately followed by a closing curly brace?

  5. #5

    localzuk's Avatar
    It's always curly braces, and always has VAR followed by a number.

    I always need image# to start at 1 and increment. The number after VAR will be used to reference a table in a DB to get that image and attach it to an email (hence the img tag).

    So, for example, you have a letter email, and at the end of the letter you want to insert an image as a signature. The image can't be embedded specifically, as it will vary every time the email is edited in software.

    So, {VAR23} gets inputted in place of the image, and the software, on sending, replaces that tag with <img src="cid:image1"> and attaches image 23 from the database to the email.

  6. #6

    Steve21's Avatar
    Code:
    {var.*}
    That should match any of the ones you have originally. I'll read the other bit in a minute; silly kids keep interrupting!

    Steve
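
One thing worth checking with that pattern, as a minimal sketch rather than a definitive answer: `.*` is greedy, so with two tags on the same line `{var.*}` runs from the first opening brace to the last closing brace, and lowercase `var` only matches `{VAR...}` when `re.IGNORECASE` is passed. The sample text below is made up for illustration.

```python
import re

# Made-up sample text with two tags on the same line.
text = "Dear Sir {VAR23} regards {VAR8965} bye"

# Greedy: .* runs on to the LAST closing brace on the line.
greedy = re.findall(r"\{var.*\}", text, re.IGNORECASE)

# Restricting the middle to digits keeps each tag separate.
digits_only = re.findall(r"\{VAR[0-9]+\}", text)

print(greedy)       # one long match covering both tags
print(digits_only)  # two separate matches
```

Limiting the body of the tag to digits avoids the problem without needing a non-greedy quantifier.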

  7. #7

    tmcd35's Avatar
    Join Date
    Jul 2005
    Location
    Norfolk
    Posts
    5,965
    Thank Post
    894
    Thanked 983 Times in 807 Posts
    Blog Entries
    9
    Rep Power
    343
    Does this sound like the right flow:

    • Read character
    • If character = { then read next three characters, else go to the last step
    • If next three characters = VAR then count number of characters to }, else go to the last step
    • number = the string from after R to before }
    • Search database for image number
    • attach database image to Image(nextsequentialnumber)
    • insert string into output text to display image(nextsequentialnumber)
    • move current character position to } and go to step 1
    • write character to output and go to step 1
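
The steps above can be sketched without any regex at all. This is only an illustration under some assumptions: the function name `replace_tags` is made up, the database lookup and attaching steps are left out, and the numbers found are simply collected in a list so they could be looked up afterwards.

```python
def replace_tags(text):
    """Scan text one position at a time, following the flow described above."""
    out = []   # pieces of the output text
    ids = []   # numbers found after VAR, in order of appearance
    i = 0
    while i < len(text):
        if text.startswith("{VAR", i):
            close = text.find("}", i)
            number = text[i + 4:close] if close != -1 else ""
            # Only treat it as a tag if there is at least one digit and nothing else
            if number and all(c in "0123456789" for c in number):
                ids.append(number)
                out.append('<img src="cid:image%d">' % len(ids))
                i = close + 1   # jump past the closing brace
                continue
        out.append(text[i])     # ordinary character: copy it through
        i += 1
    return "".join(out), ids


new_text, ids = replace_tags("a {VAR23} b {VAR8965} c {VARx} {VAR}")
print(new_text)
print(ids)
```

Anything that does not look exactly like `{VAR<digits>}` is copied through unchanged.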

  8. #8

    localzuk's Avatar
    Quote: Originally posted by tmcd35
    Does this sound like the right flow:

    • Read character
    • If character = { then read next three characters, else go to the last step
    • If next three characters = VAR then count number of characters to }, else go to the last step
    • number = the string from after R to before }
    • Search database for image number
    • attach database image to Image(nextsequentialnumber)
    • insert string into output text to display image(nextsequentialnumber)
    • move current character position to } and go to step 1
    • write character to output and go to step 1
    Yup, that's about it.

    The database searching, email attaching is already sorted. Just the find, replace and get the number are the iffy bits for me.
    @Steve21 - that regex works nicely. I'm guessing I then use the finditer() method, which returns an iterator of MatchObject instances, to get the indexes? Followed by iterating through them and using str.replace(value, '<img src="cid:imageX">', 1), incrementing an int to increase the imageX number.

    I've tweaked the regex to be a bit more precise - {VAR[0-9]*}. Will this work properly for me? It appears to from some cursory testing.
    Last edited by localzuk; 9th December 2013 at 03:00 PM.
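
As an alternative to finditer() plus str.replace(), `re.sub` accepts a function as the replacement, which lets the counter and the substitution happen in one pass. A minimal sketch, assuming the tags always look like `{VAR<digits>}`; the names `TAG`, `number_images` and `swap` are made up for this example. Note that `[0-9]*` would also match an empty `{VAR}`, so `+` is used here to require at least one digit.

```python
import re
from itertools import count

# At least one digit between {VAR and }; group 1 captures the number.
TAG = re.compile(r"\{VAR([0-9]+)\}")


def number_images(text):
    """Return (new_text, image_ids) with tags replaced by numbered cid images."""
    counter = count(1)   # image1, image2, ...
    image_ids = []       # the numbers from the tags, in order of appearance

    def swap(match):
        image_ids.append(int(match.group(1)))
        return '<img src="cid:image%d">' % next(counter)

    return TAG.sub(swap, text), image_ids


html, image_ids = number_images("Hi {VAR23}, bye {VAR8965} {VAR}")
print(html)
print(image_ids)
```

The returned list keeps the database numbers in the same order as the image1, image2, ... references, so the attachments can be added afterwards in a separate loop.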

  9. #9

    LosOjos's Avatar
    This does it but obviously you'll need to alter it to do what you need:

    Code:
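    import re

    # Raw string so the backslash in the Windows path is taken literally
    with open(r'C:\example.txt', 'r') as f:
        original = f.read()

    x = 0          # running image number; the first match becomes image1
    prevEnd = 0    # end offset of the previous match in the original text
    output = ""

    # Groups: 1 = '{VAR', 2 = the digits, 3 = '}'
    for m in re.finditer(r"(\{VAR)([0-9]+)(\})", original):
        x += 1
        VarID = m.group(2)  # the number following VAR
        print(VarID)
        # Copy the text since the last match, then the replacement tag
        output += original[prevEnd:m.start()] + '<img src="cid:image' + str(x) + '">'
        prevEnd = m.end()

    # Append whatever follows the last match
    output += original[prevEnd:]

    print(output)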
    EDIT: just noticed you had already virtually got there by yourself. The only real difference in my version is the grouping of parts of the match with brackets, which means you can then pull them out easily while you iterate the matches.

    EDIT2: realised that it stops at the last match, so I added a bit to concatenate the remainder of the file, and I tidied up the regex a bit to make it more accurate (it only matches if it finds '{VAR' followed by at least one digit followed by '}').
    Last edited by LosOjos; 9th December 2013 at 04:15 PM.

  10. #10

    LosOjos's Avatar
    How did you get on with this?

  11. #11

    localzuk's Avatar
    Didn't get a chance to finish coding it up - had a power cut this morning so have spent most of the day recovering all the oddities that are left over from that.

  12. #12

    localzuk's Avatar
    Well, got it working, kind of. I have a string replacement issue at the moment: after altering the first entry, based on the start and end index of the text, the original text has changed, so the next image in the iterator now has incorrect indices. I'm fixing that with a specific find and replace instead of using the MatchObject.start and .end indices. Once I'm happy with it, I'll post it up to show what I ended up with!

  13. #13

    LosOjos's Avatar
    Quote: Originally posted by localzuk
    Well, got it working, kind of. I have a string replacement issue at the moment: after altering the first entry, based on the start and end index of the text, the original text has changed, so the next image in the iterator now has incorrect indices. I'm fixing that with a specific find and replace instead of using the MatchObject.start and .end indices. Once I'm happy with it, I'll post it up to show what I ended up with!
    If you store the original document in a variable and base your regex on that, as I did above, you won't have that problem.
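
To make the shifting-index problem concrete, here is a small sketch with a made-up sample string. Editing the text front to back with offsets taken from the original goes wrong after the first replacement, because the replacement is longer than the tag. One possible workaround, besides building a fresh output string from the untouched original, is to apply the replacements from the last match backwards, so the offsets of the earlier matches stay valid.

```python
import re

original = "x {VAR5} y {VAR77} z"   # made-up sample text
matches = list(re.finditer(r"\{VAR([0-9]+)\}", original))

# Front to back: after the first edit, the second match's offsets are stale.
broken = original
for n, m in enumerate(matches, start=1):
    broken = broken[:m.start()] + '<img src="cid:image%d">' % n + broken[m.end():]

# Back to front: text before each match has not been touched yet.
edited = original
for n, m in reversed(list(enumerate(matches, start=1))):
    edited = edited[:m.start()] + '<img src="cid:image%d">' % n + edited[m.end():]

print(broken)
print(edited)
```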

  14. #14

    localzuk's Avatar
    Yeah, that's what I've done now. Just trying to figure out why the multipart aspect of my emailing part is suddenly not working. I will succeed if it kills me.

  15. #15

    localzuk's Avatar
    Well, got it working. Gotta thoroughly test it with a variety of data but so far it seems to be holding up.

    Code:
    import smtplib
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.image import MIMEImage
    import MySQLdb
    import sys
    import re
    
    
    def py_mail():
        """With this function we send out our html email"""
        connection = None  # lets the finally block run safely if connect() fails
        try:
            connection = MySQLdb.connect (host= "127.0.0.1", user = "USER", passwd = "PASSWORD", db = "DATABASE")
            cursor = connection.cursor(MySQLdb.cursors.DictCursor)
            # Get waiting emails
            cursor.execute("select * from email_details where email_details_sent = 0")
            data = cursor.fetchall()
            # Process each email
            for row in data :
                # Prepare email
                MESSAGE = MIMEMultipart('related')
                MESSAGE["To"] = row["email_details_to_address"]
                MESSAGE["From"] = row["email_details_sender"]
                MESSAGE.preamble = "This is a multi-part message in MIME format"
                MSGALT = MIMEMultipart('alternative')
                MESSAGE.attach(MSGALT)
                MSGTXT = MIMEText("Your email client does not support the format of this email. Please use an email client that accepts HTML email, or contact ...")  # string cut off in the original post
                MSGALT.attach(MSGTXT)
                # Search each line for {IMAGE*} string
                body = row["email_details_body_text"]
                body_to_edit = body
                subject = row["email_details_subject"]
                MESSAGE['Subject'] = subject
                pattern = r"\{IMAGE[0-9]+\}"  # at least one digit between the braces
                regex = re.compile(pattern, re.IGNORECASE)
                images = regex.finditer(body)
                cids = 1
                for image in images :
                    # If found, get related image from DB
                    tag = image.group(0)   # the matched text, e.g. {IMAGE23}
                    image_id = tag[6:-1]   # digits between "{IMAGE" and "}"
                    # Parameterised query instead of string concatenation
                    cursor.execute("select gen_image_path from gen_image where gen_image_id = %s", (image_id,))
                    imageDB = cursor.fetchone()
                    imagePath = imageDB["gen_image_path"]
                    # Replace the first remaining occurrence of this exact tag;
                    # offsets from the original body no longer fit body_to_edit
                    body_to_edit = body_to_edit.replace(tag, "<img src='cid:image" + str(cids) + "'>", 1)
                    # Attach image to email, using path from DB
                    with open(imagePath, 'rb') as fp:
                        msgImage = MIMEImage(fp.read())
                    msgImage.add_header('Content-ID', '<image' + str(cids) + '>')
                    MESSAGE.attach(msgImage)
                    cids = cids + 1
                HTML_BODY = MIMEText(body_to_edit, 'html')
                MSGALT.attach(HTML_BODY)
                # Send Email
                server = smtplib.SMTP('smtp.gmail.com:587')
                # Print debugging output when testing
                if __name__ == "__main__":
                    server.set_debuglevel(1)
                # Credentials (if needed) for sending the mail
                password = "PASSWORD"
                server.starttls()
                server.login(MESSAGE['From'],password)
                server.sendmail(MESSAGE['From'], MESSAGE['To'], MESSAGE.as_string())
                server.quit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))
            sys.exit(1)
        finally:
            if connection:
                connection.close()
    if __name__ == "__main__":
        """Executes if the script is run as main script (for testing purposes)"""
        py_mail()
    Not bad for my second ever Python program.
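
For anyone who wants to test the message-building part on its own, here is a stripped-down sketch of the same multipart layout with no database or SMTP involved. It is an illustration only: `build_message` is a made-up helper, the image bytes are a placeholder rather than a real picture, and the subtype is passed explicitly so `MIMEImage` does not have to guess it. The `cid:image1` reference in the HTML has to match the Content-ID header without the angle brackets.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage


def build_message(html_body, images):
    """images: list of raw PNG bytes, in the order the tags appeared."""
    msg = MIMEMultipart("related")       # outer container: body plus inline images
    alt = MIMEMultipart("alternative")   # plain-text fallback and HTML version
    msg.attach(alt)
    alt.attach(MIMEText("This email needs an HTML-capable client.", "plain"))
    alt.attach(MIMEText(html_body, "html"))
    for n, data in enumerate(images, start=1):
        part = MIMEImage(data, _subtype="png")            # subtype given explicitly
        part.add_header("Content-ID", "<image%d>" % n)    # matches src="cid:imageN"
        msg.attach(part)
    return msg


fake_png = b"\x89PNG\r\n\x1a\n"   # placeholder bytes, not a real image
msg = build_message('<p>Regards <img src="cid:image1"></p>', [fake_png])
types = [part.get_content_type() for part in msg.walk()]
print(types)
```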


