Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Page 1 of 2 12 LastLast
Results 1 to 15 of 19
  1. #1
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Best Language/Technology for CSV files

    This may be a little long and I apologize for that, particularly since it is my first post here.

    I have a couple projects going on at work that are going to involve taking .csv files that are exported from another application and analyzing the data therein. These .csv files contain quite a bit of data and will contain anywhere from about 4,000 - 6,000 lines in the spreadsheet.

    I cant post a snippet of the spreadsheet as the company I work for would frown upon that, LOL. However, basically the file will contain data about programs and scripts that run, their start time and end time among other data. I have one .csv export that deals with when the programs fail and one that deals when they run and complete successfully. The export that deals with failures I wrote a VBA script that filters the data down and organizes it so that we can look at 30-day periods. It works well but I am not the best VBA coder around as I just started learning this year. What I have works but I want to do more and in a better fashion and analyze over a longer period of time.

    Whew! Okay, so all that to ask what would be the best coding language(s) to learn in order to take these .csv exports, filter out the data we don't need, put whats left in a database, display that data as a HTML table and be able to run various analysis on that data? For example, since one of the data points I am interested in involves the start and stop times, I would like to be able to calculate the actual time it took to complete but also store that result and then be able to calculate the average completion time so that later I can check whether or not a particular execution time is taking longer than normal.

    I guess the short and sweet way to all this would be to just write some VBA to build a spreadsheet that will do all that, but it seems like a rather convoluted way to do it all. I've been studying, in addition to VBA, Java and JavaScript as I wanted to learn some Android development. It seemed JavaScript w/JQuery might be a good combo to work with and would have the added benefit of being all client side so it could all stay within the team and dept. I work with and not need web/server space, etc. There are some really light weight JavaScript databases but it seems if I need to work with a database Im going to need storage and that means web/server space. In turn, if I have to go that route, then wouldn't PHP and MySQL then be a better way to go?

    At any rate, as I said, it may be a long post. At this point I am just looking for the best coding language to do all this. I don't mind learning, I enjoy it but I don't necessarily want to learn and be familiar with a particular language only to find out that its not going to be able to do what I want well or was not the best route. So I thought would subject you all to this enormously long post to get your input. On top of that, hopefully I placed this in the correct are.

    Thanks in advance for taking the time to read this and for any help and insight you may be able to provide.

  • #2
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,509
    Thanks
    8
    Thanked 1,090 Times in 1,081 Posts
    I would say PHP.

    Of course, that's because it's popular and many of us know a lot about it.
    You would need a server though, so that's probably not what you want.

    I sort of question the part about your company logging/reporting information about programs and scripts they run. That seems to tell me that they have a server of their own and someone (or some people) that must be doing programming? Like a group if I.T. people? If they have these programs and scripts running on their server, why do you need to deal with these CSV files? Or, if there is no server, how is all of it working?

    If you use a server-side script (like PHP), we'll need to know more about the server they use. Is it UNIX, Windows? Can it even run PHP? There is also Perl (which people often call CGI). Perl is really powerful at analyzing string data. Or, if it's a Windows servers, ASP is probably the choice.

    If there is no server, you'll have to either install one of your own on an extra PC (WAMP w/ PHP), or use a webhost. If you have a PC with a server, you can do all of it using PHP on one computer, without any internet connection.

  • #3
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks for your reply, very much appreciated.

    To address your questions, yes they do have I.T. people. In fact, my dept. is under the larger I.T. umbrella. However, while all the main data I am after is stored in databases (mainly Oracle). They use a little of everything actually, Unix, ASP, Korn Shell scripts, Batch file scripts, Oracle, MS SQL, etc.

    The problem is that they are VERY stingy (rightly so, its a very large company) with access to such things. I have spoken to my supervisor and the person we have that interfaces more directly with those systems and databases and they support what I am wanting and trying to do whole heartedly, but are tied to some extent by the restrictive access. So my plan, then, was to use the information that is available to me and requires no access beyond what I already have and that comes in the form of those .csv files.

    Its difficult to explain and not give out information I'm not sure they would be all that joyous about me giving out. Suffice it to say, you are correct in your assessment, but I am limited due to the access I could get and thus am resorting to using the data available to me in order to make this work, make a proof of concept type thing to perhaps drive the issue that we need need access to this data more directly, even if its just read only, just so we or myself can grab the data and make use of it. I know its the long way around, but it's what I have to work with. Additionally, it is also a means for me to learn and grow and hopefully add to my own skill set so that I have more 'worth' career wise.

    I can set up XAMPP or even WAMP Server on my PC, they wont scream too much about that so that way I don't need access to their sever environments. They use Sharepoint throughout the company and I originally thought that that would be the route to go since its all tightly integrated but I ran into the access issues there as to be expected.

    At any rate, that is where I am at. Long story short, Im doing something of an end run (with supervisor approval) to show its worthwhile and to bolster the case for at least read access instead of having to work with the .csv files.

  • #4
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,509
    Thanks
    8
    Thanked 1,090 Times in 1,081 Posts
    I think you should go the WAMP server way. Get PHP operating and use that to process your CSV file. Everything you do will be contained on that PC and nowhere else. That is pretty secure and private.

    If you can, make-up a fake Excel file with data similar to your company ... at least have the cells that are text be text, and the integers and real numbers be what they should be. Then, save it as a CSV file and show us that fake data.

    We can show you how to explode that CSV file into an array and then start doing the number crunching.

    I'm not an ASP person, but if that is possible to do on a single PC (as a server), that might be another possibility.
    You'll just have to ask about ASP in a different forum, where the ASP experts are located.

  • Users who have thanked mlseim for this post:

    StormStrikes (08-15-2014)

  • #5
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thank you for your reply mlseim.

    I can make a fake Excel file with no problem and I will get on that and get it uploaded as soon as I am able.

    Ive been exploring some options that would allow me to set up a server environment that would not require any installation on the PC (just trying to keep it as local and as secure as I can) and found a couple options that look to be promising in the way of Server2Go and another called UwAmp though of the two Server2Go seems the most viable. So if the PHP/MySQL is the best route then I can at least avoid needing to install anything specific on the PC's and just run from a thumb drive or something like that. Which actually would give some portability to show the concept to others if needed without having to worry if they have a specific application installed.

    I'm personally not ready to tackle the ASP direction. I have enough on my plate as it is trying to learn Java and Web Development languages, LOL. However, if it were to go in the ASP.Net direction then that brings in the possibility of using Sharepoint which I know nothing about past the limited use I make of it for work, mainly documentation. However, for now, I will stick with this direction and just work my way up the food chain and learn what I can as best I can along the way.

  • #6
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Okay, I have attached a .zip file that contains the .csv test spreadsheet. I believe that if this all turns out as well as I hope, I may be able to make use of the other columns in the spreadsheet as I could build search queries and such to further analyze the data. However, for now, the three columns I am most interested in using for now are the 'Name' (contains the actual job/program name), 'Start Time' and 'End Time'. The goal being to calculate the total run time then from that build a running average run time of that particular job. All of which will be stored in the database. From there, then export the current list of jobs that are running, how long they have been running and compare that to the database stored average run time and then flag any jobs that are taking longer than normal.

    Right now, the process is exceptionally tedious and time consuming and requires that we go through the jobs running one at a time. There can be thousands set to run (scheduled) and anywhere from a few dozen to a few hundred actually running at any given time and one by one we have to check the start time, then check its average run time, then make a judgement call as to whether or not the job is running properly or not.

    Being able to export the current list of running jobs, calculate its current running time and compare that to the stored average time and generate HTML output that gives a visual indication would save us a TON of time, I kid you not. We have to go through this process several times during a given work shift as some jobs can run and finish in a matter of seconds, others can take many hours depending on the job and data involved.

    At any rate, thanks again for your help. I will be ordering a book or two on PHP so I can learn all this interesting stuff. I've dabbled in it in the past, but nothing deeper than just to build very basic web pages. What I have learned of Java (and though not nearly as in-depth, VBA) has really got me tuned up to learn more about coding and I cant think of a better way to learn than to tackle real world tasks.
    Attached Files Attached Files

  • #7
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,509
    Thanks
    8
    Thanked 1,090 Times in 1,081 Posts
    Here's what I'm thinking.

    What column identifies the jobs that we can group together and process the times for?
    Is it "Name", "Application"?

    I think the best way to do this would be to have a PHP script that loads the CSV file that you put into a directory.
    Perhaps there could be a PHP upload script, but you'll always be using one computer anyhow.
    That CSV file is parsed and put into a MySQL table.
    Then, with MySQL they are grouuped by job, then go through the start time / stop time calculations to get an average.
    The output (report) gets displayed for you.
    You could decide whether to reuse the MySQL table, or create a new one each time.

    That would be my plan.

    If you can get everything running on your computer (server, PHP, MySQL) and make a test script to verify it runs,
    that would be a great way to start. If I have time during the weekend, I could try to come up with some scripts.
    There are MySQL experts on the PHP portion of this forum. I expect this thread should be moved to PHP.

    The power of the whole thing will really be MySQL. Like you, I think it can do the heavy processing ... it's just
    coming up with the queries that will be the hardest.

  • #8
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    The column that identifies the jobs is 'Name'. The name will always be unique on this particular export. The Mem Name column is similar to the Name column, the only difference being that it can sometimes contain the two or three letter file extension (.ksh, .exe, .sh, .scr, etc.) but the Name column would be much better to key off of since that is what we use when we are searching for or checking the jobs.

    I was up until the wee hours of the morning playing with various portable Web Servers and settled on two. XAMPP and USBWebServer. XAMPP is pretty much full featured (along with having an FTP server) and USBWebServer is pretty darn lightweight but still has MySQL, PHPMyAdmin, etc. So I am going to load both up on a USB stick and call good as I can use either or.

    I like the plan you mentioned. I don't mind manually putting the .csv file in a folder on the server, that's not much trouble and then just initiate everything from the webpage that will be used to view all the data. I don't know how much complexity it would add, but ideally each .csv import into the database would add to what is already there. In other words, one .csv import has job001 and it is determined that it took 10 minutes to complete. Then on the next day the .csv is imported and it is determined that job001 took 20 minutes to complete. The database then would update the average run time to 15 minutes and so on. So as long as the .csv file can be imported and the average time updated, that would be perfect.

    Now, I GREATLY appreciate you willingness to come up with some scripts. I cannot begin to convey how thankful I am for your willingness to do so. However, that said, I would like to take full advantage of this and learn myself. I realize it is going to take time because I'm behind the learning curve but I see this as also gaining skills that are going to have usefulness for the foreseeable future either with the company or elsewhere (though I really love where I work and have no plans to go anywhere else at the moment).

    So, all that said, I would love to learn and figure some of this out as well. I would be exceptionally grateful for your help and expertise and insight. I just don't want anyone thinking I am asking hey can you do this for me, because that is not the case (trying to say all this and not tick anyone off or offend anyone), I definitely want to figure this out I just may need time to put all the pieces together.

  • #9
    Senior Coder
    Join Date
    Sep 2010
    Posts
    2,194
    Thanks
    15
    Thanked 253 Times in 253 Posts
    I'm using Linux and have a localhost Apache server with PHP and other software. But i can also use the BASH shell to deal with .csv files. The 'while read' feature reads a file line by line and performs operations on that line. I've used it to separate a line and make a new line with the desired parts in a specified order. Unlike PHP the shell doesn't time out and doesn't require a server.
    Welcome to http://www.myphotowizard.net

    where you can edit images, make a photo calendar, add text to images, and do much more.


    When you know what you're doing it's called Engineering, when you don't know, it's called Research and Development. And you can always charge more for Research and Development.

  • #10
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    You're preaching to the choir. I love Linux and wish it were used at work as the desktop OS instead of Windows 7, but I am stuck with it there.

  • #11
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,509
    Thanks
    8
    Thanked 1,090 Times in 1,081 Posts
    There are a ton of tutorials and examples on Google for PHP/MySQLi

    It's easy to get your csv file into a MySQL table, the harder part will be to
    write the queries to 'group' and do date calculations. When you get to that
    part, there are people on the other sections of this forum (PHP, SQL) who
    know alot about that.

    This thread should probably be moved to PHP.

  • #12
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks mlseim. Ive been digging up the videos on PHP and MySQL.

    If anyone can move the thread, it would be much appreciated.

  • #13
    New Coder
    Join Date
    Sep 2014
    Posts
    22
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Python is an excellent choice. The csv module makes reading and writing CSV files easy (even Microsoft's, uh, "idiosyncratic" version) and Python syntax is a breeze to pick up.

  • #14
    New Coder
    Join Date
    Aug 2014
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    I thought it may as well but at the time I was thinking of using a language that would already be in use here at work and I am not aware of any Python usage here, though I just may not be aware of it. So in the mean time I have been brushing up on HTML5 & CSS3 and am now into JavaScript and after that I have some material to learn from with regards to PHP.

    That all said, I already had material on Python, a couple good video courses in fact, I just have not been through them yet as it seemed all of this was going in a different direction. However, I still plan on learning Python because it does appear to be a relatively easy language to learn and I would like to perhaps use both methods to tackle this and see which works best. Hopefully through that, I learn and grow and am a bit more diverse in the manners that I can tackle such projects.

  • #15
    New Coder
    Join Date
    Apr 2014
    Location
    Daytona Beach, FL
    Posts
    54
    Thanks
    0
    Thanked 1 Time in 1 Post
    This may be a little long and I apologize for that, particularly since it is my first post here.
    All you did was make it longer..

    Use python.


  •  
    Page 1 of 2 12 LastLast

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •