Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 11 of 11
  1. #1
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts

    Distinct Count of values, but need a wildcard

    If I use this query:
    Code:
    SELECT name,COUNT(*) as count FROM tablename GROUP BY name ORDER BY count DESC;
    It will give me the count of each occurrence of a distinct value of a field. However, I am wondering if there would be a way to modify this to have a wildcard.

    The background is, I have a field in my table that contains URL's. There are plenty that are in the same domain...ie www.codingforums.com, however, they all contain different points of reference after the .com. More specifically, my question is I want to search for number of occurrences for the domains. They all should begin either http:// or https:// but not all would have the www. and such.

    Not sure it is possible but figured I would ask since Google searching didn't really give me anything useful to go off of.

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,687
    Thanks
    80
    Thanked 4,655 Times in 4,617 Posts
    Quote Originally Posted by mharrison View Post
    If I use this query:
    Code:
    SELECT name,COUNT(*) as count FROM tablename GROUP BY name ORDER BY count DESC;
    It will give me the count of each occurrence of a distinct value of a field.
    *** NOT TRUE! ***

    It will count *ALL* records *PER NAME*.

    If you wanted a count of DISTINCT VALUES then you would need to use:
    Code:
    SELECT name, COUNT( DISTINCT fieldname ) AS count
    FROM tablename GROUP BY name ORDER BY count DESC
    And if you wanted a count of NON-NULL values of a given field you would do
    Code:
    SELECT name, COUNT( fieldname ) AS count
    FROM tablename GROUP BY name ORDER BY count DESC
    But using COUNT(*) simply counts *ALL* records. Period.

    ***********

    Having cleared that up...
    The background is, I have a field in my table that contains URL's. There are plenty that are in the same domain...ie www.codingforums.com, however, they all contain different points of reference after the .com. More specifically, my question is I want to search for number of occurrences for the domains. They all should begin either http:// or https:// but not all would have the www. and such.
    So, more simply stated, you want to group by domain name alone. Yes?

    There's no simple way build into MySQL. It's true that MySQL has regular expressions, which would seem the ideal way, but unfortunately they are limited to use with the RLIKE operator.

    So you could easily search for all occurrences of codingforums.com for example, but FINDING all the domains is much trickier.

    A lot of it will depend on how "regular" the contents of that field are. Does the field *ONLY* contain URLs? Or does it contain URLs embedded within text? e.g.,
    Be sure to visit http://ebay.com for every day bargains
    ??
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    The field only contains URL's, no additional text...

    The URL's are formed more like
    http://www.codingforums.com/newreply...te=1&p=1303255
    rather than just www.codingforums.com

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,687
    Thanks
    80
    Thanked 4,655 Times in 4,617 Posts
    *IF* all the fields contain *ONLY* URLs of the form
    Code:
        http(s)://(www.)xxxxx.yyy/anything
    Then I think we can do this. But what about a url such as http://download.microsoft.com? That is, a subdomain?

    Are you worried about them? I think it would be tough to create a simple expression to look for those, as well.

    Ignoring subdomains, this SEEMS to work:
    Code:
    SELECT 
       SUBSTR(
           REPLACE(REPLACE(REPLACE(CONCAT(url,'/'),'http://',''),'https://',''),'www.',''),
           1,
           LOCATE('/',REPLACE(REPLACE(REPLACE(CONCAT(url,'/'),'http://',''),'https://',''),'www.',''))-1
       ) AS domain, 
       COUNT(*) AS howmany
    FROM urls
    GROUP BY domain
    ORDER BY howmany DESC
    But *ONLY* if your URL field contains urls alone, no other text.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,687
    Thanks
    80
    Thanked 4,655 Times in 4,617 Posts
    How it works:

    First, we concat a '/' to the end of the URL, just in case there is none there already.

    Then we replace 'http://' and 'https://' and 'www.' with nothing.

    That gets us down to something like codingforums.com/gobbledygook/

    So then we use LOCATE to find the first '/' in that (which MIGHT be the one we added at the first step).

    And then we use SUBSTR to get all the characters up to but not including that first '/'.

    Make sense?
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,687
    Thanks
    80
    Thanked 4,655 Times in 4,617 Posts
    If it's not obvious, that will collect subdomains separately from their domains.

    That is, you might get something like this:
    Code:
    microsoft.com           119
    download.microsoft.com   17
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    mharrison (12-31-2012)

  • #7
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    How it works:

    First, we concat a '/' to the end of the URL, just in case there is none there already.

    Then we replace 'http://' and 'https://' and 'www.' with nothing.

    That gets us down to something like codingforums.com/gobbledygook/

    So then we use LOCATE to find the first '/' in that (which MIGHT be the one we added at the first step).

    And then we use SUBSTR to get all the characters up to but not including that first '/'.

    Make sense?
    It does make sense, however I double checked my table and it does contain subdomains as well. Perhaps if it were possible to also ignore the first part of the subdomain as we did with the www., then as long as I could display the entire field at the end, I think I would get good results. I don't see any duplicate domains that have different sudomains....it seems if I have download.microsoft.com, that is all I have, I don't have ie.microsoft.com as far as I can tell.

  • #8
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    If it's not obvious, that will collect subdomains separately from their domains.

    That is, you might get something like this:
    Code:
    microsoft.com           119
    download.microsoft.com   17
    Ok, got notified of this reply as I was replying. If the query you posted returns results like you have above, then that would be exactly what I would want to see.

  • #9
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,687
    Thanks
    80
    Thanked 4,655 Times in 4,617 Posts
    I *can* get rid of subdomains. It's just much more complex.

    The first thought is to just get rid of everything in front of the first period.

    That simplifies things tremendously: No more need to distinguish between http: and https:

    BUT... What about domains with *NO* prefix?

    Example: What if you have just http://ebay.com/xxx?

    If I strip every thing before (and including) the first period we would end up with just com

    OOPS?
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #10
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,687
    Thanks
    80
    Thanked 4,655 Times in 4,617 Posts
    Quote Originally Posted by mharrison View Post
    Ok, got notified of this reply as I was replying. If the query you posted returns results like you have above, then that would be exactly what I would want to see.
    ?? You are saying it is okay to get those subdomains separately? In that case we are done.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    mharrison (01-01-2013)

  • #11
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    ?? You are saying it is okay to get those subdomains separately? In that case we are done.
    Yes, getting the subdomains separately is fine. Thanks again for your help!


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •