  1. #1
    MicrodigitUK's Avatar
    Join Date
    May 2007
    Location
    Wiltshire
    Posts
    336
    Thank Post
    37
    Thanked 55 Times in 51 Posts
    Rep Power
    24

    e-mail bad word filter

    I need a bit of help from a bash script expert. Or at least someone who knows what they are doing.

    I have used the “Zimbra content filter” wiki to build a postfix gateway server to filter bad words in emails.

    This is all working ok but the problem is it is filtering a bit too much for the staff. So I have made the decision to set it up to bypass the filter for staff users.

    I have looked at the new script on the “Zimbra content filter updated” wiki. This looks to be a step in the right direction as it has the functionality to have a student list in a text file and then only filters those users.

    But really I want something a bit more dynamic to prevent me from having to remember to update the student list.

    I was thinking that, instead of a student list, I could have a staff list of users that bypass the filter, and then use an LDAP search on the server to get this list instead of a text file. All of my users have the email field populated in Active Directory on the server so this shouldn’t be a problem.

    I have got my LDAP search perfected so that it pulls out the staff and admin email addresses, and I then pipe it into “grep” to strip out the rubbish. Below is an example of the command I am running:

    Code:
    ldapsearch -h sheldon.internal -p 389 -s base -b "OU=Establishments,DC=Sheldon,DC=Internal" -s sub "(&(objectCategory=user)(|(memberOf=CN=SHS Teaching Staff,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal)(memberOf=CN=SHS Non-Teaching Staff,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal) (memberOf=CN=Domain Admins,CN=Users,DC=Sheldon,DC=Internal)) (mail=*))" "mail" -D "SHELDON\ldapbind" -w "mypassword" | grep mail:
    And that produces a list like the following:

    Code:
    mail: test1@sheldonschool.co.uk
    mail: test2@sheldonschool.co.uk
    mail: test3@sheldonschool.co.uk
    So my question is how do I pull it all together? I know it’s got something to do with “grep” and possibly some string manipulation but I have no idea where to start.

    Can someone please help.

  2. #2

    bossman's Avatar
    Join Date
    Nov 2005
    Location
    England
    Posts
    3,942
    Thank Post
    1,199
    Thanked 1,069 Times in 760 Posts
    Rep Power
    330
    @MicrodigitUK:

    Forgive me for asking, but am I right in thinking that you actually want the staff to be able to use bad words?

    We use the filter for all as we feel that is how it should be used.

    What problems are the staff encountering?

    Can they not swear in their e-mails hehe!!

  3. Thanks to bossman from:

    MicrodigitUK (7th January 2010)

  4. #3

    Edu-IT's Avatar
    Join Date
    Nov 2007
    Posts
    7,149
    Thank Post
    403
    Thanked 623 Times in 569 Posts
    Rep Power
    181
    Could it be that in fact the filter is too sensitive and needs tweaking slightly so it's more appropriate for all users?

  5. Thanks to Edu-IT from:

    MicrodigitUK (7th January 2010)

  6. #4

    localzuk's Avatar
    Join Date
    Dec 2006
    Location
    Minehead
    Posts
    17,807
    Thank Post
    517
    Thanked 2,469 Times in 1,913 Posts
    Blog Entries
    24
    Rep Power
    835
    Staff sometimes need to be able to swear in emails - as they could be discussing a child's behaviour and describing what they said.

  7. Thanks to localzuk from:

    MicrodigitUK (7th January 2010)

  8. #5
    MicrodigitUK's Avatar
    I have cut down on the words to make it less sensitive, but further cuts would make it almost useless.

    So I have taken the decision to allow staff to bypass the filter.

    Other people must have this problem; that’s why the script was revised in the new “Zimbra content filter updated” wiki.

    I really wouldn’t like to get into an argument about whether staff should be filtered or not.

    There must be one or two Linux scripters out there who can point me in the right direction.

  9. #6

    Edu-IT's Avatar
    webman is probably a great person to help you on this.

  10. #7

    webman's Avatar
    Join Date
    Nov 2005
    Location
    North East England
    Posts
    8,406
    Thank Post
    639
    Thanked 961 Times in 661 Posts
    Blog Entries
    2
    Rep Power
    324
    I'd recommend putting the LDAP search command in a daily cron at a period of low network activity, and have it output the results to a file.

    Your LDAP search command so far looks great - it's outputting the addresses fine - we just need to remove the mail: prefix from the addresses.

    If you append this to the end of the command (to pipe the result into awk) it should just produce the email address:

    Code:
    | awk '{print $2}'
    so the full line would look like this (split over multiple lines with trailing backslashes, just for clarity):

    Code:
    ldapsearch \
      -h sheldon.internal -p 389 -s base \
      -b "OU=Establishments,DC=Sheldon,DC=Internal" \
      -s sub "(&(objectCategory=user)(|(memberOf=CN=SHS Teaching Staff,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal)(memberOf=CN=SHS Non-Teaching Staff,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal) (memberOf=CN=Domain Admins,CN=Users,DC=Sheldon,DC=Internal)) (mail=*))" "mail" \
      -D "SHELDON\ldapbind" -w "mypassword" \
      | grep mail: \
      | awk '{print $2}'
    Give that a go and let us know how it goes.
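
    The grep/awk stage can be checked without touching the directory server by feeding it canned input. This is only a minimal sketch: the `strip_mail_prefix` function name and the `dn:` lines are made up for illustration, and the addresses are the ones from the sample list in the first post.

```shell
#!/bin/bash
# Hypothetical helper: keep only "mail:" attribute lines and print the address.
# Anchoring with ^ means other lines (such as dn: lines) are dropped.
strip_mail_prefix() {
    grep '^mail:' | awk '{print $2}'
}

# Canned input standing in for real ldapsearch output (assumed shape).
sample_output='dn: CN=Test One,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal
mail: test1@sheldonschool.co.uk

dn: CN=Test Two,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal
mail: test2@sheldonschool.co.uk'

addresses=$(printf '%s\n' "$sample_output" | strip_mail_prefix)
printf '%s\n' "$addresses"
```

    Once the output looks right, the same pipeline can be pointed at the real ldapsearch command and redirected to a file from cron.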

  11. Thanks to webman from:

    MicrodigitUK (10th January 2010)

  12. #8
    MicrodigitUK's Avatar
    Thanks for that, Webman. I hadn’t even thought of a daily cron, but that makes total sense.

    After thinking about this again, I realised that your approach of keeping a student list makes more sense, because mail accounts like netman, head and cover are not listed in AD but all of the students are.

    Then I came across the wonderful 1000-entry LDAP limit and had fun getting around that and reconfiguring the DC’s LDAP settings.

    Then, because other members of staff enter students, I found that not all of them had email set in AD. But correct me if I’m wrong, Webman: the whole address is not required by your filter script to use the student list (well, that’s how I read it). So I changed the ldapsearch to look for just the username and will sort out AD email another day.

    Webman, do you think checking a list of 2500+ active student users every time a mail passes through will have a major impact on the filter’s performance? I haven’t implemented the new filter script yet but am slightly concerned about the size of the student list.

    Now I have come up with the following; I have tested it and it produces a list of all student usernames.

    Code:
    #!/bin/bash

    # Bash shell LDAP query to produce a text file with a list of student usernames.
    # It is to be used in combination with: /usr/local/sbin/filter.sh
    #
    # If you have more than 1000 student users, don't forget to change the server's LDAP defaults.
    #

    # Changeable variables
    #
    FULLDOMAIN=sheldon.internal
    BASESEARCH="OU=Students,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal"
    BINDUSER="SHELDON\ldapbind"
    PASSFILE="/home/.ldapcredentials"
    STUDENTS="/etc/students"
    #

    # Variables are quoted so that values containing spaces survive word splitting.
    ldapsearch \
      -h "$FULLDOMAIN" -p 389 -l 60 -s base \
      -b "$BASESEARCH" \
      -s sub "(&(objectCategory=person)(objectClass=user)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))" "sAMAccountName" \
      -D "$BINDUSER" \
      -y "$PASSFILE" | grep sAMAccountName: | awk '{print $2}' > "$STUDENTS"

    exit $?
    That’s all OK, but I have spotted a potential problem. If all DCs are off (for, say, maintenance or, God forbid, failure) when the daily cron runs, then the student list is overwritten with no users in the list. And hey presto, unfiltered mail!

    How would I go about checking there are users returned before overwriting the file?

  13. #9

    webman's Avatar
    Quote Originally Posted by MicrodigitUK View Post
    But correct me if I’m wrong, Webman: the whole address is not required by your filter script to use the student list (well, that’s how I read it).
    Correct - it seems the whole address is not required, as I've been using that script with just the username part with great success. But having the email wouldn't be any worse - use whichever is easiest for your setup.

    Quote Originally Posted by MicrodigitUK View Post
    Webman, do you think checking a list of 2500+ active student users every time a mail passes through will have a major impact on the filter’s performance?
    Any extra process that has to be done during mail delivery will have some impact - in time and/or system resources.

    To take one example from our server - according to the logs during mail delivery for the filter, the Postfix delay value is 0.13 when the person is not in the student list, and 3.5 when they are. Our student list is 1,172 lines long.

    Grep is rather efficient, and you can always test the speed of it manually before implementing the filter. To do this, get the raw source text of emails (one in the student list, one not) and save to a text file (e.g. /tmp/email-staff). Then run it through the same command used in the script:

    Code:
    cat /tmp/email-staff | grep -Eif /etc/students
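
    To get an actual timing rather than an impression, the lookup can be wrapped in bash's `time` keyword. The sketch below is only an illustration: it builds a throwaway list of 2500 made-up usernames and two made-up messages in a temporary directory, so none of the file names or addresses come from the real setup.

```shell
#!/bin/bash
# Build throwaway test data in a temporary directory (all names are made up).
workdir=$(mktemp -d)
students="$workdir/students"
staff_mail="$workdir/email-staff"
student_mail="$workdir/email-student"

# 2500 hypothetical usernames: 09test0001 ... 09test2500
seq -f '09test%04g' 1 2500 > "$students"

printf 'From: jsmith@example.org\nSubject: cover lesson\n\nSee you at 9.\n' > "$staff_mail"
printf 'From: 09test1234@example.org\nSubject: homework\n\nAttached.\n' > "$student_mail"

# Time the same style of lookup; -q suppresses the matched lines.
# grep exits 0 when a pattern matched and 1 when nothing matched.
staff_status=0
time grep -qEif "$students" "$staff_mail" || staff_status=$?
student_status=0
time grep -qEif "$students" "$student_mail" || student_status=$?

echo "staff: $staff_status, student: $student_status"
rm -r "$workdir"
```

    The timings are printed on stderr. They only cover the grep lookup itself, not whatever the filter does afterwards with a student's message.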
    Quote Originally Posted by MicrodigitUK View Post
    That’s all OK, but I have spotted a potential problem. If all DCs are off (for, say, maintenance or, God forbid, failure) when the daily cron runs, then the student list is overwritten with no users in the list. And hey presto, unfiltered mail!

    How would I go about checking there are users returned before overwriting the file?
    Probably the simplest solution would be to test the return value/exit status of ldapsearch, and only update the file if it definitely contains students. Hopefully, changing the FULLDOMAIN variable to something that doesn't exist will make it return non-zero.

    The modified script here specifies the path to a temporary students file which the ldapsearch command will output to. If it succeeds, the exit status is hopefully 0, so the script will then copy /tmp/students over /etc/students. If it fails, the status will hopefully not be 0, so the script will just exit without copying, leaving the previous student list intact.

    Code:
    #!/bin/bash

    # Bash shell LDAP query to produce a text file with a list of student usernames.
    # It is to be used in combination with: /usr/local/sbin/filter.sh
    #
    # If you have more than 1000 student users, don't forget to change the server's LDAP defaults.
    #

    # Changeable variables
    #
    FULLDOMAIN=sheldon.internal
    BASESEARCH="OU=Students,OU=SHS,OU=Establishments,DC=Sheldon,DC=Internal"
    BINDUSER="SHELDON\ldapbind"
    PASSFILE="/home/.ldapcredentials"
    STUDENTS="/etc/students"
    STUDENTS_TEMP="/tmp/students"
    #

    # Without pipefail, $? after the pipeline is awk's exit status, not ldapsearch's.
    set -o pipefail

    ldapsearch \
      -h "$FULLDOMAIN" -p 389 -l 60 -s base \
      -b "$BASESEARCH" \
      -s sub "(&(objectCategory=person)(objectClass=user)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))" "sAMAccountName" \
      -D "$BINDUSER" \
      -y "$PASSFILE" | grep sAMAccountName: | awk '{print $2}' > "$STUDENTS_TEMP"
    status=$?

    # Only replace the live list if the pipeline succeeded and the new list is not empty.
    if [ "$status" -eq 0 ] && [ -s "$STUDENTS_TEMP" ] ; then
        cp "$STUDENTS_TEMP" "$STUDENTS"
        status=$?
    fi

    exit "$status"
    Hope that helps
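
    One thing worth knowing when checking `$?` after a pipeline: by default bash reports the exit status of the last command in the pipeline (here awk), so a failed ldapsearch on its own does not make the pipeline fail. The sketch below shows the difference using a hypothetical `fetch_users` function that simply fails, standing in for an ldapsearch that cannot reach any DC.

```shell
#!/bin/bash
# Hypothetical stand-in for a failing ldapsearch call: no output, non-zero status.
fetch_users() {
    return 1
}

# Default behaviour: the pipeline's status is awk's, so the failure is hidden.
fetch_users | grep 'sAMAccountName:' | awk '{print $2}'
default_status=$?

# PIPESTATUS holds one status per stage; copy it straight after the pipeline.
fetch_users | grep 'sAMAccountName:' | awk '{print $2}'
stages=("${PIPESTATUS[@]}")

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
pipefail_status=0
fetch_users | grep 'sAMAccountName:' | awk '{print $2}' || pipefail_status=$?
set +o pipefail

echo "default=$default_status first_stage=${stages[0]} pipefail=$pipefail_status"
# prints: default=0 first_stage=1 pipefail=1
```

    Checking that the temporary file is non-empty with `[ -s file ]` before copying is a useful second guard, since it also catches a search that succeeds but returns nobody.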

  14. Thanks to webman from:

    MicrodigitUK (22nd January 2010)

  15. #10
    MicrodigitUK's Avatar
    Ok I am still having some problems with the student list.

    It all looks ok but the filter is still running for everyone.

    I think it might have something to do with the last line in the list, which is blank, so the filter is finding it in everyone’s emails.

    Any thoughts on this issue?

    Does your student list have a blank line on the end?

  16. #11

    webman's Avatar
    No, I don't think it does. I'm not a regular expression guru, so not sure what to suggest to use to ignore it.

  17. #12
    MicrodigitUK's Avatar
    Just an update to say I found the problem. It turned out there was a student account with the username Sheldon, which is the name of the school, so it was in everyone’s email address.

    So I tweaked the LDAP search script to check that the username starts with two digits followed by two non-numeric characters, so it only pulls out users like 01test and 09test.

    Code:
    ldapsearch \
      -h $FULLDOMAIN -p 389 -l 60 -s base \
      -b $BASESEARCH \
      -s sub "(&(objectCategory=person)(objectClass=user)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))" "sAMAccountName" \
      -D $BINDUSER \
      -y $PASSFILE | grep sAMAccountName: | awk '{print $2}' | grep '^[0-9][0-9][^0-9][^0-9]' > $STUDENTS_TEMP
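
    The effect of that last grep can be tried on its own. A small sketch with made-up usernames (only 01test, 09test and Sheldon are taken from this thread) and a hypothetical staff address:

```shell
#!/bin/bash
# Hypothetical usernames standing in for the sAMAccountName values.
usernames='01test
09test
Sheldon
netman
2010intake'

# Keep only names that start with two digits followed by two non-digits.
students=$(printf '%s\n' "$usernames" | grep '^[0-9][0-9][^0-9][^0-9]')
printf '%s\n' "$students"

# Why "Sheldon" was a problem: used as a pattern, it matches any address
# at the school's domain. The jsmith address is made up for illustration.
header='From: jsmith@sheldonschool.co.uk'
sheldon_hits=0
printf '%s\n' "$header" | grep -qi 'Sheldon' && sheldon_hits=1
echo "sheldon_hits=$sheldon_hits"
```

    Only 01test and 09test survive the filter, and sheldon_hits comes out as 1, which is the false positive described above.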
    Also just to point out the blank line at the bottom of the student list doesn’t make any difference.
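
    That holds when the "blank line" is just the newline that ends the last username. A genuinely empty line in the list is a different matter: grep treats it as an empty pattern, which matches every line. A quick sketch of the difference, using made-up file names in a temporary directory and a made-up header line (behaviour as seen with GNU grep):

```shell
#!/bin/bash
workdir=$(mktemp -d)
line='From: jsmith@example.org'

# List ending in an ordinary newline: two patterns, neither of them empty.
printf '01test\n09test\n' > "$workdir/normal"
# List with a genuinely empty last line: three patterns, one of them empty.
printf '01test\n09test\n\n' > "$workdir/with-empty"

normal_match=no
printf '%s\n' "$line" | grep -qEif "$workdir/normal" && normal_match=yes
empty_match=no
printf '%s\n' "$line" | grep -qEif "$workdir/with-empty" && empty_match=yes

echo "normal=$normal_match with-empty=$empty_match"
rm -r "$workdir"
```

    If the list is ever edited by hand, stripping empty lines first (for example with `grep -v '^$'`) avoids this.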
