  1. #1
    Regular Coder
    Join Date
    Feb 2007
    Location
    London
    Posts
    225
    Thanks
    16
    Thanked 2 Times in 2 Posts

    inconsistent results with in_array

    In essence, I'm hoping someone might be able to suggest what I can try next to isolate the cause of my apparent paradox: sometimes in_array returns TRUE for a value that is in the array, and sometimes it doesn't - apparently at random.

    Excuse the preamble - the background is necessary, as my problem may be connected to an earlier issue regarding the recognition of Greek characters...

    Yesterday on this forum, the senior coder NancyJ was extremely helpful to me regarding an attempt to automatically generate span lang tags around all Greek words in a largely English-language document.
    The thread is:
    http://www.codingforums.com/showthread.php?t=118935

    We got bogged down, however, in the question of how php recognises Greek characters, and made no progress on my main problem, hence this post.

    NancyJ - if you see this - read on for an interesting development in the character recognition part: it's solved...

    PHP Code:
    <?php
    //THIS SUCCESSFULLY PUTS A SPAN AROUND EVERY GREEK LETTER!

    $greek_lang_open = "<span lang=\"el\">";
    $greek_lang_close = "</span>";

    $greek_chars = array("α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ρ", "σ", "ς", "τ", "υ", "φ", "χ", "ψ", "ω", "ά", "έ", "ή", "ί", "ό", "ύ", "ώ", "ϊ", "Α", "Β", "Γ", "Δ", "Ε", "Ζ", "Η", "Θ", "Ι", "Κ", "Λ", "Μ", "Ν", "Ξ", "Ο", "Π", "Ρ", "Σ", "Τ", "Υ", "Φ", "Χ", "Ψ", "Ω", "Ά", "Έ", "Ή", "Ί", "Ό", "Ύ", "Ώ", "Ϊ");

    foreach ($greek_chars as $current_letter) {    // add span lang tags
        $cc_text = str_replace($current_letter, $greek_lang_open.$current_letter.$greek_lang_close, $cc_text);
    }
    ?>
    Now, that's great, as it proves that php is having no trouble recognising Greek characters. Moreover - this'll be of particular interest to NancyJ - it seems that Greek capital letters that happen to look exactly like Latin capital letters have their own separate character codes. There will therefore be no confusion between Α and A, as the Greek one is really &#913 semicolon, whereas the Latin one is &#65 semicolon.

    Now for a long bit of code which I've been trying to debug, where the success message ("YEAH!") is never printed. Skip over this, please, if it's too much work to read through, and see the paradox below.

    PHP Code:
    // add <span lang> tags around Greek words
    $greek_chars = array("α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ρ", "σ", "ς", "τ", "υ", "φ", "χ", "ψ", "ω", "ά", "έ", "ή", "ί", "ό", "ύ", "ώ", "ϊ", "Α", "Β", "Γ", "Δ", "Ε", "Ζ", "Η", "Θ", "Ι", "Κ", "Λ", "Μ", "Ν", "Ξ", "Ο", "Π", "Ρ", "Σ", "Τ", "Υ", "Φ", "Χ", "Ψ", "Ω", "Ά", "Έ", "Ή", "Ί", "Ό", "Ύ", "Ώ", "Ϊ");
    $greek_lang_open = "<span lang=\"el\">";
    $greek_lang_close = "</span>";
    $span_open_length = strlen($greek_lang_open); //measures how many chars there are so we can jump forward later after inserting
    $span_close_length = strlen($greek_lang_close); //measures how many chars there are so we can jump forward later after inserting
    $pos = 0; //pointer
    $cc_text_length = strlen($cc_text);
    $gr1 = "no";
    $gr2 = "no";

    while ($pos < $cc_text_length) {    //scan file
        $current_letter = substr($cc_text, $pos, 1); //grab next character from pointer

        if ($gr1 == "no" && $gr2 == "no" && in_array($current_letter, $greek_chars)) {
            echo "YEAH!";
            $gr1 = "yes";
            $cc_text = str_replace($current_letter, $greek_lang_open.$current_letter, $cc_text); // add open span lang tag
            $pos = $pos + $span_open_length;
        }
        elseif ($gr1 == "yes" && in_array($current_letter, $greek_chars)) {
            $gr1 = "no";
            $gr2 = "yes";
        }
        elseif ($gr2 == "yes" && !(in_array($current_letter, $greek_chars))) {
            $gr2 = "no";
            $gr1 = "no";
            $cc_text = str_replace($current_letter, $greek_lang_close.$current_letter, $cc_text); // add close span lang tag
            $pos = $pos + $span_close_length;
        }
        $pos++; //increment the pointer
        $cc_text_length = strlen($cc_text); //re-measure length of $cc_text, as it's changed from our adding span tags
    }


    I've tested the following:
    echoing $current_letter inside the while loop results in the entirety of the data in $cc_text being displayed, which is what we expected, so there's no doubt that the line

    PHP Code:
        $current_letter = substr($cc_text, $pos, 1);
    is correctly grabbing each character. (Note the line

    PHP Code:
                $pos++; 
    before the end of the while loop).

    So, my conclusion is that for some peculiar reason, in_array is 'not working'.
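
    A possible explanation, assuming the script and $cc_text are both UTF-8 (the thread does not state the encoding, so this is an assumption): substr() counts bytes, and each Greek letter occupies two bytes in UTF-8, so a one-byte slice can never equal a two-byte array entry, whereas a whole literal such as "ζ" matches fine. The sketch below illustrates this with the letters written as raw bytes, and splits the text into whole characters using preg_split with the /u modifier. The helper name split_utf8_chars is hypothetical.

```php
<?php
// Assumption: all strings are UTF-8. Greek letters are written as byte
// escapes so the example does not depend on how this file is saved.
$zeta = "\xCE\xB6";                              // ζ (U+03B6)
$greek_chars = array("\xCE\xB1", "\xCE\xB6");    // α, ζ
$cc_text = "a" . $zeta . "b";

// Hypothetical helper: split a UTF-8 string into whole characters.
function split_utf8_chars($text)
{
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// substr() works on bytes: this is only the first half of ζ.
$one_byte = substr($cc_text, 1, 1);
var_dump(strlen($zeta));                           // int(2)
var_dump(in_array($one_byte, $greek_chars, true)); // bool(false)

// Comparing whole characters finds the match.
foreach (split_utf8_chars($cc_text) as $char) {
    if (in_array($char, $greek_chars, true)) {
        echo "Greek letter found\n";
    }
}
```

    The third argument to in_array turns on strict comparison, which avoids PHP's loose type juggling when comparing strings.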

    For comparison, note that the following is successful:

    PHP Code:
    //THIS RETURNS "in_array works!", AS WE EXPECTED, GIVEN THAT ζ IS INDEED IN THE ARRAY
    if (in_array("ζ", $greek_chars)) {
        echo "in_array works!";
    } else {
        echo "in_array failed";
    }

    Any ideas what I can do to test where the problem is? Whatever I strip away from the code, I end up with in_array working sometimes, and not other times.

    Thanks a lot

  • #2
    Senior Coder NancyJ's Avatar
    Join Date
    Feb 2005
    Location
    Bradford, UK
    Posts
    3,174
    Thanks
    19
    Thanked 66 Times in 65 Posts
    Probably not much I can do to help since running your little snippet above does not put spans around every actual greek character, it places them around nearly every character (including s - which makes all the spans very nested and weird)
    Try this:
    PHP Code:
    foreach ($greek_chars as $char)
    {
      if (in_array($char, $greek_chars))
      {
        echo "$char is in array<br />";
      }
      else
      {
        echo "$char is not in array";
      }
    }

    For me it says that every character is successfully matching in_array(). If it doesn't for you, it might show you what is working and what isn't.

    I can't test your code because with anything more than a couple of words it gives up - largely because my in_array is matching nearly every character, meaning cc_text is constantly getting longer

    I also think you should rewrite the code to be faster and more efficient.
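
    One way such a rewrite might look, as a sketch only and not the code anyone in this thread used: a single regular expression with the /u modifier can wrap each run of Greek words in one pass, instead of scanning character by character. This assumes the text is valid UTF-8. The function name wrap_greek_runs is hypothetical, and the sketch does not skip text inside HTML tags (such as alt attributes).

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function wrap_greek_runs($text)
{
    // \p{Greek} matches any character in the Greek script. A run is one
    // or more Greek words separated only by whitespace.
    $pattern = '/\p{Greek}+(?:\s+\p{Greek}+)*/u';
    return preg_replace($pattern, '<span lang="el">$0</span>', $text);
}

// "λόγος" written as UTF-8 byte escapes so the file encoding does not matter.
$logos = "\xCE\xBB\xCF\x8C\xCE\xB3\xCE\xBF\xCF\x82";
echo wrap_greek_runs("The word " . $logos . " means reason.");
```

    Because each match is replaced exactly once, the text does not keep growing under the scanner the way it does when str_replace is called inside a loop.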

  • #4
    Regular Coder
    Join Date
    Feb 2007
    Location
    London
    Posts
    225
    Thanks
    16
    Thanked 2 Times in 2 Posts

    Solution!

    After endless struggling, I've succeeded! I'm posting this in case anyone finds it useful...

    (Note that the code, while working perfectly, could do with quite a bit of tightening to make it more efficient, eg, some regular expressions for the 'skip punctuation' bit etc)...

    PHP Code:
    <?php
    //convert all quotes and Greek characters (all multi-byte UTF-8) to numeric entities.
    //note: the /e modifier used here was deprecated in PHP 5.5 and removed in PHP 7.0
    $cc_text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $cc_text);
    $cc_text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $cc_text);

    // create Greek character array (entities &#902; to &#974;)
    $greek_chars = array();
    $ansi = 902;
    while ($ansi <= 974) {
        $greek_chars[] = "&#".$ansi.";";
        $ansi++;
    }
    // add angled quotation marks (« and ») to array, written as entities so they match the converted text
    $greek_chars[] = "&#171;";
    $greek_chars[] = "&#187;";

    $greek_lang_open = "<span lang=\"el\">";
    $greek_lang_close = "</span>";
    $span_open_length = strlen($greek_lang_open); //measures how many chars there are so we can jump forward later after inserting
    $span_close_length = strlen($greek_lang_close); //measures how many chars there are so we can jump forward later after inserting
    $pos = 0; //pointer
    $cc_text_length = strlen($cc_text);
    $current_letter_length = 6; //every entity in $greek_chars is 6 characters long
    $first_greek_letter_found = "no";
    $number_of_close_tag_occurences = 0;
    $tag_detected = "no";

    while ($pos < $cc_text_length) {    //scan file
        //grab characters in search of Greek ANSI codes
        $current_letter = substr($cc_text, $pos, $current_letter_length);

        if ($tag_detected == "no") {
            //if 1 Greek letter has already been found:
            if ($first_greek_letter_found == "yes") {
                if (in_array($current_letter, $greek_chars)) {
                    //if next one is Greek too, move pointer
                    $pos = $pos + $current_letter_length - 1;
                }
                //as long as the next group of 6 symbols doesn't start with space or punctuation, we'll put in the close span tag
                elseif ((substr($current_letter, 0, 1) != " ") && (substr($current_letter, 0, 1) != ",") && (substr($current_letter, 0, 1) != "'") && (substr($current_letter, 0, 1) != ";") && (substr($current_letter, 0, 1) != ".")) {
                    // add close span tag
                    $cc_text = substr_replace($cc_text, $greek_lang_close.$current_letter, $pos, $current_letter_length);
                    //re-measure length of $cc_text, as it's changed from our adding span tags
                    $cc_text_length = strlen($cc_text);
                    $pos = $pos + $span_close_length;
                    $first_greek_letter_found = "no"; //reset the var
                }
            }
            //if no Greek letters have been found, but this one IS Greek:
            elseif ($first_greek_letter_found == "no" && in_array($current_letter, $greek_chars)) {
                $first_greek_letter_found = "yes";
                //add open span lang tag
                $cc_text = substr_replace($cc_text, $greek_lang_open.$current_letter, $pos, $current_letter_length);
                //re-measure length of $cc_text, as it's changed from our adding span tags
                $cc_text_length = strlen($cc_text);
                $pos = $pos + $current_letter_length + $span_open_length - 1;
            }
            //if no Greek letter found, but 1st symbol is <
            elseif (substr($current_letter, 0, 1) == "<") {
                $tag_detected = "yes";
            }
        }
        //if tag_detected = yes, we want to skip all text until the close of the tag (">") to avoid putting spans within, say, image alt text that includes Greek characters.
        else {
            if (substr($current_letter, 0, 1) == ">") {
                $tag_detected = "no";
            }
        }
        $pos++; //increment the pointer
    }
    ?>
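
    For anyone running this on a newer PHP: the /e modifier used in the two preg_replace calls no longer exists as of PHP 7.0. A minimal sketch of the same UTF-8 to numeric entity conversion using preg_replace_callback follows. It assumes the input is valid UTF-8, and the function name utf8_to_entities is hypothetical; the arithmetic is the same as in the code above.

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function utf8_to_entities($text)
{
    // Two-byte sequences (110xxxxx 10xxxxxx), which include Greek letters.
    $text = preg_replace_callback('/[\xc0-\xdf][\x80-\xbf]/', function ($m) {
        $code = (ord($m[0][0]) - 192) * 64 + (ord($m[0][1]) - 128);
        return '&#' . $code . ';';
    }, $text);

    // Three-byte sequences (1110xxxx 10xxxxxx 10xxxxxx), e.g. curly quotes.
    $text = preg_replace_callback('/[\xe0-\xef][\x80-\xbf]{2}/', function ($m) {
        $code = (ord($m[0][0]) - 224) * 4096
              + (ord($m[0][1]) - 128) * 64
              + (ord($m[0][2]) - 128);
        return '&#' . $code . ';';
    }, $text);

    return $text;
}

// ζ (U+03B6) is the bytes CE B6 in UTF-8.
echo utf8_to_entities("\xCE\xB6"); // prints &#950;
```

    Unlike the original patterns, which use "." for the trailing bytes, this sketch only accepts continuation bytes in the range 80 to BF, so stray high bytes in badly encoded input are left alone rather than converted.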

