Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 14 of 14
  1. #1
    New Coder
    Join Date
    Aug 2010
    Posts
    31
    Thanks
    14
    Thanked 0 Times in 0 Posts

    Convert TSV to CSV for import- Is it necessary??

    Hello

    I paid a guy to fix a data import script my site uses, as we use a different provider for the data now.

    The old data we imported (which is very similiar) was in a CSV (comma seperated), whereas the new data is provided to us in a TSV (tab seperated)

    He spent a day figuring out how to convert the tsv to a csv, and now i notice that somewhere in that process, data is getting lost, or fields are getting mixed up. Therefore I'm asking, is it necessary to convert the data ? or can the data be imported straight from a TSV?

    Here is a sample of the code:
    system("cat {$tmp} | tr '\\t' ',' > {$fixtures}caravans.csv"); // convert TSV to CSV

    Any help would be appreciated, as although the script is importing, and therefore working to some degree, some of the data is missing and i can only think it must be in the conversion, as when i open the original tsv in excel, the missing data is there.

    Cheers for any help,
    Tree

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,554
    Thanks
    80
    Thanked 4,620 Times in 4,583 Posts
    Well, not to ask a silly question, but... How do you then import the CSV file??

    You gave zero details of that process.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,554
    Thanks
    80
    Thanked 4,620 Times in 4,583 Posts
    Oh, and I'm sure you are right about the conversion process having a bug.

    CSV files need "..." around string values, in case there are any commas in the fields themselves.

    That is:
    Code:
    "Jones, Bob", "Programmer", "82750"
    TSV files do *NOT* normally include those quotes:
    Code:
    Jones, Bob      Programmer       82750
    So if indeed any of your TSV fields have commas in them, the conversion that you showed us will produce too many fields in the CSV file.

  • Users who have thanked Old Pedant for this post:

    treetops (06-19-2011)

  • #4
    New Coder
    Join Date
    Aug 2010
    Posts
    31
    Thanks
    14
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Oh, and I'm sure you are right about the conversion process having a bug.

    CSV files need "..." around string values, in case there are any commas in the fields themselves.

    That is:
    Code:
    "Jones, Bob", "Programmer", "82750"
    TSV files do *NOT* normally include those quotes:
    Code:
    Jones, Bob      Programmer       82750
    So if indeed any of your TSV fields have commas in them, the conversion that you showed us will produce too many fields in the CSV file.
    Luckily, the tsv file does have quotes around long bits of text, so thats not causing a problem. However, I have resolved it by removing his convert tsv to csv, and doing the conversion myself in excel, which seems to do a better job. Thanks for the help,

  • #5
    New Coder
    Join Date
    Aug 2011
    Location
    Sydney
    Posts
    26
    Thanks
    2
    Thanked 2 Times in 2 Posts
    Why do people even bother with TSV?

    I always end up having issues with them..

  • #6
    Super Moderator guelphdad's Avatar
    Join Date
    Mar 2006
    Location
    St. Catharines, Ontario Canada
    Posts
    2,634
    Thanks
    4
    Thanked 148 Times in 139 Posts
    Why do you need to convert them to CSV for the import then? you can specify your separator upon import so can specify a TAB instead of COMMA as your separator for example.

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,554
    Thanks
    80
    Thanked 4,620 Times in 4,583 Posts
    Quote Originally Posted by kunz View Post
    Why do people even bother with TSV?
    Possible reasons:

    (A) Because you get TSV files from some outside source that you have no control of.

    (B) Because they are easier to create than CSV [you don't have to put the quotes around strings that might have commas in them].

    (C) Because they aren't a bother at all if you know what you are doing.

  • Users who have thanked Old Pedant for this post:

    kunz (08-29-2011)

  • #8
    New Coder
    Join Date
    Aug 2011
    Location
    Sydney
    Posts
    26
    Thanks
    2
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Old Pedant View Post
    Possible reasons:

    (A) Because you get TSV files from some outside source that you have no control of.

    (B) Because they are easier to create than CSV [you don't have to put the quotes around strings that might have commas in them].

    (C) Because they aren't a bother at all if you know what you are doing.
    Fair enough..your post does make sense.

    I just find it easier to tell my clients to always use CSV - makes like so much easier when importing/exporting to other products as CSV is used everywhere.

  • #9
    New Coder
    Join Date
    Aug 2011
    Location
    Sydney
    Posts
    26
    Thanks
    2
    Thanked 2 Times in 2 Posts
    Couldn't edit my post above - so my apologies for a new post.

    I've come across a clients TSV file (they have close to 100,000 of these files) where we were missing data when being imported.

    After some analysis we found that some of these files had been edited over the years by humans who have mistakenly added spaces instead of TABS.

    Such a headache

    We're running it by a script that removes any occurrence of three or more spaces together and replacing it with a TAB instead. Is there a better way to do this programmatically before we import all that data into their new CRM?
    “Don't pray for lighter burdens, but for stronger backs.”
    Deals in Sydney

  • #10
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,554
    Thanks
    80
    Thanked 4,620 Times in 4,583 Posts
    After doing that, you might run it through a simple program (PHP, ASP, whatever) that simply counts the number of TABs in each line and tells you of any lines where the count is wrong. Then you could go hand edit those lines back to what they need to be.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    kunz (08-30-2011)

  • #11
    New Coder
    Join Date
    Aug 2011
    Location
    New Jersey
    Posts
    20
    Thanks
    1
    Thanked 0 Times in 0 Posts
    I've never had anything but problems using TSV files in the past but sometimes it's not always up to you on what to use I guess.

  • #12
    New to the CF scene
    Join Date
    Sep 2011
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Why mess with all this, just the CSVED package. Use it to open the csv file , change the delimiter and save it. Works like a breeze for me.

  • #13
    New Coder
    Join Date
    Aug 2011
    Posts
    26
    Thanks
    2
    Thanked 1 Time in 1 Post
    Does it have to be a regex? Can just Parse the CSV using your fav csv library, and then rejoin using tabs?
    Code:
    require 'csv'
    
     test = '"foo,bar,baz",one,two,three'
     CSV.parse_line(test).join("\t")
    
      "foo,bar,baz\tone\ttwo\tthree"

  • #14
    New Coder
    Join Date
    Aug 2011
    Posts
    26
    Thanks
    2
    Thanked 1 Time in 1 Post
    .............
    Last edited by dhape; 09-17-2011 at 11:34 PM. Reason: double post
    “Computers are useless. They can only give you answers.”
    (Pablo Picasso) Where To Find Coupons and Video Testimonials


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •