  1. #1
    New to the CF scene
    Join Date
    Jul 2014
    Posts
    2
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Beginner Coder with Class Project

    Hello, I am a new programmer and have been given the task of creating a database of 10,000 words. The words need to be found by a program that randomly searches the internet, selects 5-10 pages, reads them, and stores the words alphabetically. Oh, and the words all have to be different, no repeats. Any suggestions? Thanks in advance for all the help!

  • #2
    Senior Coder
    Join Date
    Jan 2011
    Location
    Missouri
    Posts
    4,462
    Thanks
    23
    Thanked 634 Times in 633 Posts
    Break the task into sections that you have to accomplish.
    What language are you going to use?
    Get a page and extract all the words. I'd suggest YAHOO and their news pages.
    Put the web page code into a string and remove the HTML tags.
    Figure out how to get the next news page and do the same thing.
    Figure out how to remove duplicate words.
    Put all the remaining words into a string or array and count them; if there are fewer than 10,000, repeat.

    Not too hard.
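
For anyone following along in Java, here is a rough sketch of the "remove the HTML tags, extract the words, drop duplicates" steps above. The class name `WordExtractor` and the definition of a word (a run of ASCII letters, lowercased) are my own assumptions, not something from the assignment. The regex tag stripper is deliberately crude: it does not skip script or style content and does not decode entities, so treat it as a starting point rather than a real HTML parser. A `TreeSet` keeps the words sorted alphabetically and silently ignores repeats.

```java
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Crude tag stripper: good enough for an exercise, not a real HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Assumption: a "word" is a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Adds every word found in the HTML to the set; the set drops duplicates. */
    static void addWords(String html, TreeSet<String> words) {
        String text = TAG.matcher(html).replaceAll(" ");
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // Lowercase so "The" and "the" count as the same word.
            words.add(m.group().toLowerCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        TreeSet<String> words = new TreeSet<>();
        addWords("<p>The quick <b>brown</b> fox</p>", words);
        addWords("<div>the lazy dog</div>", words);
        System.out.println(words);        // [brown, dog, fox, lazy, quick, the]
        System.out.println(words.size()); // 6
    }
}
```

Calling `addWords` once per downloaded page and checking `words.size()` after each call covers the "count and repeat" step.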
    Evolution - The non-random survival of random variants.

    "If you leave hydrogen alone, for long enough, it begins to think about itself."

  • #3
    New Coder
    Join Date
    Aug 2013
    Posts
    37
    Thanks
    1
    Thanked 6 Times in 6 Posts
    What does it mean to randomly search the internet?
    What are the criteria for selecting a given number of pages?
    Are there any restrictions on the tools you can use for this task (e.g. programming language)?

  • #4
    New to the CF scene
    Join Date
    Jul 2014
    Posts
    2
    Thanks
    1
    Thanked 0 Times in 0 Posts


    I will be using Java to create it. However, we have to give the program a starting point on Google; say the program googled the word "apple." The first ten pages are where I want all the words to come from, and the words should be checked against one another to make sure none are duplicated. Thank you for the help so far!
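
One possible shape for that in Java, as a sketch only: walk a list of page URLs in order and stop once the target number of distinct words is reached. How you obtain the ten result URLs for a search term is left out here; the sketch simply takes them from the command line. The names `PageWordCollector`, `fetch`, and `collect` are made up for illustration, and the download step assumes Java 11 or later for `java.net.http.HttpClient`. The fetcher is passed in as a function so the collecting logic can be tried without any network access.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.function.Function;

class PageWordCollector {
    /** Downloads one page as text; returns "" if the request fails. Needs Java 11+. */
    static String fetch(String url) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(10))
                    .GET()
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        } catch (Exception e) {
            return "";
        }
    }

    /** Visits pages in order and stops once `target` distinct words are stored. */
    static TreeSet<String> collect(List<String> urls, Function<String, String> fetcher, int target) {
        TreeSet<String> words = new TreeSet<>();
        for (String url : urls) {
            // Crude tag removal; assumption: a word is a run of ASCII letters.
            String text = fetcher.apply(url).replaceAll("<[^>]*>", " ");
            for (String token : text.split("[^A-Za-z]+")) {
                if (words.size() >= target) {
                    return words;
                }
                if (!token.isEmpty()) {
                    words.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Pass the page URLs as command-line arguments.
        TreeSet<String> words = collect(List.of(args), PageWordCollector::fetch, 10_000);
        System.out.println(words.size() + " distinct words collected");
    }
}
```

If ten pages do not yield 10,000 distinct words, the method just returns what it found, so the caller can decide whether to feed it more URLs.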

    Originally posted by sunfighter:
    Break the task into sections that you have to accomplish.
    What language are you going to use?
    Get a page and extract all the words. I'd suggest YAHOO and their news pages.
    Put the web page code into a string and remove the HTML tags.
    Figure out how to get the next news page and do the same thing.
    Figure out how to remove duplicate words.
    Put all the remaining words into a string or array and count them; if there are fewer than 10,000, repeat.

    Not too hard.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,588
    Thanks
    80
    Thanked 4,497 Times in 4,461 Posts
    Easiest way to avoid duplicates is to create a MySQL table of words where the "word" field is the primary key and then use SQL "INSERT IGNORE" to add each word.

    Any duplicate words will automatically be ignored and then you can simply do something like SELECT COUNT(word) FROM words to see if you have reached 10,000 words yet.
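
Since the original poster is working in Java, here is a hedged JDBC sketch of that idea. The table name `words` and column `word` follow the post; the `VARCHAR(64)` length, the class name `WordTable`, and the command-line arguments are assumptions for illustration. It needs a running MySQL server and the MySQL JDBC driver on the classpath, neither of which is shown here. With MySQL, an ignored duplicate should report zero affected rows, which is what the `added` counter relies on.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Collection;
import java.util.List;

class WordTable {
    // Names follow the post; VARCHAR(64) is an assumed maximum word length.
    static final String CREATE_SQL =
            "CREATE TABLE IF NOT EXISTS words (word VARCHAR(64) NOT NULL PRIMARY KEY)";
    static final String INSERT_SQL = "INSERT IGNORE INTO words (word) VALUES (?)";
    static final String COUNT_SQL = "SELECT COUNT(word) FROM words";

    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_SQL);
        }
    }

    /** Inserts each word and returns how many rows were actually added. */
    static int addWords(Connection conn, Collection<String> words) throws SQLException {
        int added = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String word : words) {
                ps.setString(1, word);
                added += ps.executeUpdate(); // expected to be 0 for an ignored duplicate
            }
        }
        return added;
    }

    static int countWords(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(COUNT_SQL)) {
            rs.next();
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: WordTable <jdbc-url> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            createTable(conn);
            addWords(conn, List.of("apple", "banana", "apple"));
            System.out.println(countWords(conn) + " words stored");
        }
    }
}
```

To get the alphabetical listing the assignment asks for, something like SELECT word FROM words ORDER BY word should do once the count reaches 10,000.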
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    Senior Coder alykins's Avatar
    Join Date
    Apr 2011
    Posts
    1,823
    Thanks
    42
    Thanked 199 Times in 198 Posts
    Originally posted by Old Pedant:
    Easiest way to avoid duplicates is to create a MySQL table of words where the "word" field is the primary key and then use SQL "INSERT IGNORE" to add each word.

    Any duplicate words will automatically be ignored and then you can simply do something like SELECT COUNT(word) FROM words to see if you have reached 10,000 words yet.
    That is a slick idea. I wouldn't have thought of using a varchar as the primary key xD


