  1. #1
    New to the CF scene
    Join Date
    Jul 2014
    Posts
    2
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Beginner Coder with Class Project

    Hello, I am a new programmer and was given the task of creating a database of 10,000 words. The words need to be found by writing a program that randomly searches the internet, selects 5-10 pages, and reads the words and stores them alphabetically. Oh, and the words all have to be different, with no repeats. Any suggestions? Thanks in advance for all the help!

  • #2
    Senior Coder
    Join Date
    Jan 2011
    Location
    Missouri
    Posts
    4,316
    Thanks
    23
    Thanked 613 Times in 612 Posts
    Break the task into the sections that have to be accomplished.
    What language are you going to use?
    Get a page and extract all the words. I'd suggest YAHOO and their news pages.
    Put the web page code into a string and remove the HTML tags.
    Figure out how to get the next news page and do the same thing.
    Figure out how to remove duplicate words.
    Put all the remaining words into a string or array and count them; if there are fewer than 10,000, repeat.

    Not too hard.
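A minimal Java sketch of the "remove the HTML tags" and "extract all the words" steps, assuming the page has already been downloaded into a String. The class `PageWords` and its methods are hypothetical names. It uses regular expressions rather than a real HTML parser, drops `<script>`/`<style>` blocks and entities such as `&nbsp;` instead of decoding them, and assumes a "word" is a run of ASCII letters, so treat it as a rough approximation for messy real-world pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageWords {
    // <script> and <style> blocks hold code, not prose, so they are dropped whole.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1>");
    // Any remaining tag, e.g. <p>, </div> or <a href="...">.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Entities such as &nbsp; or &#39; are dropped, not decoded (a simplification).
    private static final Pattern ENTITY = Pattern.compile("&#?[A-Za-z0-9]+;");
    // Assumption: a "word" is a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Replaces scripts, styles, tags and entities with spaces, leaving the visible text. */
    public static String stripTags(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return ENTITY.matcher(text).replaceAll(" ");
    }

    /** Splits plain text into lower-case words in page order (duplicates are kept here). */
    public static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style></head>"
                + "<body><p>The apple fell.</p><script>var x = 1;</script></body></html>";
        System.out.println(extractWords(stripTags(html)));
        // prints [the, apple, fell]
    }
}
```

To get a page into a String in the first place, Java 11's `java.net.http.HttpClient` with `HttpResponse.BodyHandlers.ofString()` is one option.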

  • Users who have thanked sunfighter for this post:

    Browne (07-19-2014)

  • #3
    New Coder
    Join Date
    Aug 2013
    Posts
    36
    Thanks
    1
    Thanked 6 Times in 6 Posts
    What does it mean to randomly search the internet?
    What are the criteria for selecting a given number of pages?
    Are there any restrictions on the tools you can use for this task (e.g. programming language)?

  • #4
    New to the CF scene
    Join Date
    Jul 2014
    Posts
    2
    Thanks
    1
    Thanked 0 Times in 0 Posts

    I will be using Java to create it. However, we have to give the program a direction on Google; say, the program googles the word "apple." The first ten pages are where I want all the words to come from, and the words should be checked against one another to make sure none are duplicated. Thank you for the help so far, though!
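One way to have the words "check themselves against one another" without comparing every pair: a `TreeSet` keeps only one copy of each value and stores the values in sorted order, which covers both the no-repeats and the alphabetical requirements. This is a sketch assuming each page has already been turned into a list of words; how the pages are obtained from Google is not covered, `UniqueWords` and `collect` are hypothetical names, and treating "Apple" and "apple" as the same word is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

public class UniqueWords {
    /**
     * Adds the words of each page to one alphabetically sorted set and stops
     * as soon as the set holds {@code target} distinct words. TreeSet.add
     * simply ignores a word that is already present, so no word has to be
     * compared by hand against every other word.
     */
    public static SortedSet<String> collect(List<List<String>> pages, int target) {
        SortedSet<String> unique = new TreeSet<>();
        for (List<String> pageWords : pages) {
            for (String word : pageWords) {
                // Assumption: case is ignored, so "Apple" and "apple" are one word.
                unique.add(word.toLowerCase(Locale.ROOT));
                if (unique.size() >= target) {
                    return unique;
                }
            }
        }
        return unique; // fewer than target words: the caller needs more pages
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
                List.of("Apple", "pie", "and", "apple", "juice"),
                List.of("juice", "from", "an", "apple"));
        System.out.println(collect(pages, 10_000));
        // prints [an, and, apple, from, juice, pie]
    }
}
```

Lower-casing before insertion also matters for the ordering: `String`'s natural order compares character codes, so without it every capitalized word would sort ahead of all lower-case ones.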

  • #5
    Supreme Master coder! Old Pedant
    Join Date
    Feb 2009
    Posts
    25,986
    Thanks
    79
    Thanked 4,432 Times in 4,397 Posts
    Easiest way to avoid duplicates is to create a MySQL table of words where the "word" field is the primary key and then use SQL "INSERT IGNORE" to add each word.

    Any duplicate words will automatically be ignored and then you can simply do something like SELECT COUNT(word) FROM words to see if you have reached 10,000 words yet.
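A rough JDBC version of that idea in Java, assuming MySQL with the MySQL Connector/J driver on the classpath. The table name `words`, the primary-key `word` column, `INSERT IGNORE`, and `SELECT COUNT(word) FROM words` come from the post above; the database URL, credentials, the `VARCHAR(100)` length, and the `WordTable` class are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class WordTable {
    // The word itself is the primary key, so MySQL will not store it twice.
    public static final String CREATE_SQL =
            "CREATE TABLE IF NOT EXISTS words (word VARCHAR(100) NOT NULL PRIMARY KEY)";
    // IGNORE turns the duplicate-key error into a warning and skips that row.
    public static final String INSERT_SQL = "INSERT IGNORE INTO words (word) VALUES (?)";
    public static final String COUNT_SQL = "SELECT COUNT(word) FROM words";

    /** Creates the table if needed, inserts the words, and returns the current row count. */
    public static long addWords(Connection conn, List<String> words) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_SQL);
        }
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String w : words) {
                ps.setString(1, w);
                ps.addBatch();
            }
            ps.executeBatch(); // duplicates are skipped by MySQL, not by Java
        }
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(COUNT_SQL)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical URL and credentials; replace with your own database.
        String url = "jdbc:mysql://localhost:3306/wordsdb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            long count = addWords(conn, List.of("apple", "pie", "apple"));
            System.out.println(count < 10_000 ? "need more pages" : "done");
        }
    }
}
```

With a case-insensitive collation (MySQL's default for text columns), "Apple" and "apple" collide on the primary key, so only the first one is kept. `INSERT IGNORE` also downgrades some other errors, such as a value too long for the column, to warnings, so an over-long word may be truncated rather than rejected.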

  • #6
    Senior Coder alykins
    Join Date
    Apr 2011
    Posts
    1,759
    Thanks
    41
    Thanked 191 Times in 190 Posts
    Quote Originally Posted by Old Pedant View Post
    Easiest way to avoid duplicates is to create a MySQL table of words where the "word" field is the primary key and then use SQL "INSERT IGNORE" to add each word.

    Any duplicate words will automatically be ignored and then you can simply do something like SELECT COUNT(word) FROM words to see if you have reached 10,000 words yet.
    That is a slick idea. I wouldn't have thought of making a VARCHAR column the primary key xD
