  1. #1
    Senior Coder joh6nn's Avatar
    Join Date
    Jun 2002
    Location
    72° W. 48' 57" , 41° N. 32' 04"
    Posts
    1,887
    Thanks
    0
    Thanked 1 Time in 1 Post

    request for idea critique

    i have an idea for a captcha system that i'd like to run by other people before i begin working on it, just to see if there's something i've missed.

    my idea is based off the fact that most captchas are either pictures or sound files. the problem with this is that they're not very friendly to people with either mobile devices or special needs (like blind people with screen readers). i came to the conclusion that in order for a captcha to be accessible, but also useful, it would need to be text based, but not easily parseable by a bot. what i came up with follows:

    1. pick a large amount of public domain text. (eg Shakespeare, the Bible, US Constitution, etc.)
    2. randomly select no more than 5 lines from your text
    3. select a random piece of data from within the text (eg: first word, first letter of last word, etc)
    4. create random phrasing asking for identification of the random data. eg:
      "Please identify the first word in the last sentence",
      "What is the last word in the first sentence?",
      "Could you tell me the third word in the third sentence, please?"

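    The four steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the tiny CORPUS, the POSITIONS table, the TEMPLATES list and the function names are all made up for the example, and it only covers the "nth word of the nth sentence" kind of question, not the letter-level ones mentioned in step 3.

```python
import random
import re

# Stand-in corpus (assumption); the idea calls for a large public domain text.
CORPUS = [
    "To be, or not to be, that is the question.",
    "Whether 'tis nobler in the mind to suffer.",
    "The slings and arrows of outrageous fortune.",
    "Or to take arms against a sea of troubles.",
    "And by opposing end them.",
    "To die, to sleep, no more.",
]

# Position names used in the question text, mapped to list indexes.
POSITIONS = {"first": 0, "second": 1, "third": 2, "last": -1}

# Hypothetical phrasings; {w} is the word position, {s} the sentence position.
TEMPLATES = [
    "Please identify the {w} word in the {s} sentence.",
    "What is the {w} word in the {s} sentence?",
    "Could you tell me the {w} word in the {s} sentence, please?",
]


def words(line):
    """Split a line into words, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", line)


def make_challenge(corpus, rng, max_lines=5):
    """Return (lines, question, answer) for one challenge."""
    # Only keep lines long enough for every position name to be valid.
    usable = [line for line in corpus if len(words(line)) >= 3]
    # Step 2: pick at least 3 and no more than max_lines lines.
    count = rng.randint(3, min(max_lines, len(usable)))
    lines = rng.sample(usable, count)
    # Step 3: pick which word of which sentence to ask about.
    s_name = rng.choice(list(POSITIONS))
    w_name = rng.choice(list(POSITIONS))
    # Step 4: pick a phrasing for the question.
    question = rng.choice(TEMPLATES).format(w=w_name, s=s_name)
    answer = words(lines[POSITIONS[s_name]])[POSITIONS[w_name]]
    return lines, question, answer


def check(answer, reply):
    """Compare the visitor's reply to the answer, ignoring case and padding."""
    return reply.strip().lower() == answer.lower()


if __name__ == "__main__":
    lines, question, answer = make_challenge(CORPUS, random.Random())
    print("\n".join(lines))
    print(question)
```

    In a real form the answer would have to stay on the server (for example in the session) and never be sent to the browser with the question.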

    it seems to me that such a task would be sufficiently difficult to keep out most bots, but is at the same time accessible to people. does anyone see any problems here, or reasons why i shouldn't go ahead with this?
    bluemood | devedge | devmo | MS Dev Library | WebMonkey | the Guide

    i am a loser geek, crazy with an evil streak,
    yes i do believe there is a violent thing inside of me.

  • #2
    New Coder
    Join Date
    Feb 2006
    Posts
    28
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Cool

    In my humble opinion, this is a wonderful idea - just keep it simple

  • #3
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Well, as long as there are no benefits to writing a bot for that, it should work. The bot would probably even use some of the captcha code, as the captcha would also have to know what it is asking for.
    I'm not sure if this was any help, but I hope it didn't make you stupider.

    Experience is something you get just after you really need it.
    PHP Installation Guide Feedback welcome.

  • #4
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    Sounds like a lot of work for the db to pick random lines from all that text, and it's not really a hard AI problem unless the directions are hard to parse (ie., in an image like a regular captcha). How about picking small easily identifiable icons of everyday things and asking questions about an array of them in a dynamic GD image? Ie, what item is in the 1st row of the 2nd column, which items are there the most of, what do these items have in common, whatever. It'll be harder AI and probably more technically manageable.

    Without distortion or a large bank of pictures it's still pretty easily breakable, but you're going to have to balance technical complexity, user annoyance, accessibility, and attack resistance in appropriate proportions.

  • #5
    Senior Coder joh6nn's Avatar
    Join Date
    Jun 2002
    Location
    72° W. 48' 57" , 41° N. 32' 04"
    Posts
    1,887
    Thanks
    0
    Thanked 1 Time in 1 Post
    marek: i thought about the possibility of bots using the captcha code too. i think that can be mitigated by making the instructions sufficiently random, and sufficiently complex

    ralph: reverting to media files, either pictures or anything else, defeats the entire purpose. media files don't work as well for people with alternate browsers. that was the entire reason for doing it text based.

    also, why would pulling a random entry from a DB be hard? i don't understand what you mean by that.

  • #6
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    Text based protections are inherently defeatable by regex and will only keep out the least inclined attacker. At this point I think users who can neither see an image nor listen to a short sound clip are negligible. They can buy a new Treo or install Fluxbox or something. You can't design for 100 percent of the population and if you could the last thing you would implement would be a captcha. While I'm sure all 17 Lynx users appreciate your position, captchas limit access *by definition*, and if they're not annoying or discouraging to normal users they aren't to attackers either.

    Random access from a large set of data is expensive because of the way MySQL (and possibly other RDBs) handle it internally, which essentially results in a test for every row. To get around it you can select the count and generate your own random key, which I guess really isn't too bad.
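
    The "select count and do your own random key" workaround might look something like this. It is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table name `lines` and its columns are invented for the example, and it assumes the ids run from 1 to the row count with no gaps.

```python
import random
import sqlite3


def random_line(conn, rng):
    """Fetch one row by a randomly chosen key instead of sorting the table."""
    (count,) = conn.execute("SELECT COUNT(*) FROM lines").fetchone()
    # Assumption: ids are 1..count with no gaps, so any key in range exists.
    key = rng.randint(1, count)
    row = conn.execute("SELECT body FROM lines WHERE id = ?", (key,)).fetchone()
    return row[0]


# Build a small in-memory table to try it out (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO lines (body) VALUES (?)",
    [(f"line {i}",) for i in range(1, 1001)],
)

if __name__ == "__main__":
    print(random_line(conn, random.Random()))
```

    If rows can be deleted, the ids will have gaps and the lookup can come back empty; picking a random offset into the result set is one way around that, at some extra cost.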

  • #7
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by joh6nn
    marek: i thought about the possibility of bots using the captcha code too. i think that can be mitigated by making the instructions sufficiently random, and sufficiently complex.
    But the CAPTCHA mechanism must be able to generate that random thing and know which word it points to. This will become quite limited, as the number of possibilities will be the number of different questions the CAPTCHA can generate, not the length of the text it uses.
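
    To put a number on that point, here is a small count under made-up assumptions: three phrasings and four position names for both the word and the sentence. None of these lists come from an actual implementation.

```python
from itertools import product

# Hypothetical question templates and position names.
templates = [
    "Please identify the {w} word in the {s} sentence.",
    "What is the {w} word in the {s} sentence?",
    "Could you tell me the {w} word in the {s} sentence, please?",
]
positions = ["first", "second", "third", "last"]

# Every distinct question string the generator could ever emit.
questions = {
    t.format(w=w, s=s) for t, w, s in product(templates, positions, positions)
}

print(len(questions))  # 3 templates * 4 word positions * 4 sentence positions = 48
```

    However large the source text is, a bot only has to recognise these 48 strings, which is the limit being described.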

  • #8
    Senior Coder
    Join Date
    Oct 2003
    Location
    Australia
    Posts
    1,963
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I like your reasons for doing this, but at the end of the day the more accessible it is to users with varying disabilities, the more open it is to parsing by bots... semi-defeating the purpose in the process.

    The reason for this is that assistive technologies are merely user-directed bots. If their bot can make sense of it, so can a bot built with malicious intentions.

    The only way around the above problem is by (as you say) "making the instructions sufficiently random, and sufficiently complex".

    Now we're not only blocking users on the basis of their physical ability, but also on the basis of their cognitive ability. ouch

    In my opinion, all CAPTCHAs and usability are mutually exclusive to a degree, regardless of the CAPTCHA type. This is because no matter where you move the tipping point of "block users vs. block bots", it will always exist. Healthy balances can be found for most situations, but I believe the site's/app's user base needs to be thoroughly researched prior to implementing something like this. For that reason, I have difficulty supporting the idea of any 'one size fits all' solution.

    I like your solution because it moves the usability tipping point further in favour of users, but in doing so, also makes the CAPTCHA much less effective at blocking machines.
    As Ralph stated earlier, you got text, we got regex. Everything you can output, we can deconstruct
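
    As a rough illustration of the regex point, a solver for the kind of fixed phrasing given in the first post could be as short as this. The pattern, the ORDINALS table and the function name are assumptions made for the sketch, and it only handles "nth word in the nth sentence" questions.

```python
import re

# Position names a question might use, mapped to list indexes.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}

# Matches e.g. "the first word in last sentence" or "third word in the third sentence".
QUESTION = re.compile(
    r"\b(first|second|third|last)\s+word\s+in\s+(?:the\s+)?"
    r"(first|second|third|last)\s+sentence",
    re.IGNORECASE,
)


def solve(lines, question):
    """Return the requested word, or None if the phrasing is not recognised."""
    match = QUESTION.search(question)
    if match is None:
        return None
    word_idx, sentence_idx = (ORDINALS[g.lower()] for g in match.groups())
    tokens = re.findall(r"[A-Za-z']+", lines[sentence_idx])
    return tokens[word_idx]


if __name__ == "__main__":
    sample = [
        "To be, or not to be.",
        "The slings and arrows.",
        "And by opposing end them.",
    ]
    print(solve(sample, "Please identify the first word in last sentence"))
```

    A question phrased outside the pattern returns None here, which is the gap that sufficiently varied instructions would be trying to exploit.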

    I take no responsibility for the above nonsense.


    Left Justified

  • #9
    Super Moderator
    Join Date
    May 2002
    Location
    Perth Australia
    Posts
    4,073
    Thanks
    11
    Thanked 98 Times in 96 Posts
    I dunno, I like the idea. I don't like captchas; I am fully sighted (and not entirely dense) and I still get captchas wrong sometimes if the image fonts are weird enough (or the distinction between lower and upper case is fuzzy).

    If you were echoing a random quote and then asking for a random $x'th word of that quote for verification and you were using an image for the question and the quote ... I think that would be hard enough to put off most bots ?
    resistance is...

    MVC is the current buzz in web application architectures. It comes from event-driven desktop application design and doesn't fit into web application design very well. But luckily nobody really knows what MVC means, so we can call our presentation layer separation mechanism MVC and move on. (Rasmus Lerdorf)

  • #10
    Senior Coder joh6nn's Avatar
    Join Date
    Jun 2002
    Location
    72° W. 48' 57" , 41° N. 32' 04"
    Posts
    1,887
    Thanks
    0
    Thanked 1 Time in 1 Post
    well, i'm pretty sure you guys are over-estimating what can be done with Regexes. there's plenty of things that are hard to do with pattern matching. the real problem is in keeping the instructions from being repetitive, like marek says: if the bank of questions you build from is too limited, then a bot could brute force for question samples, and then build a DB of questions to work off of. the bulk of the code in this project would be randomly generating instructions that are simple enough that you could follow them as they were read aloud to you, but that are too difficult for a machine to follow. i think that this is difficult, but possible.

    i'd still like to give it a try at any rate, though for all i know, it could be beyond my coding ability; i've never implemented a decision tree before. i'll update here as the project progresses

