Results 1 to 9 of 9
  1. #1
    Regular Coder
    Join Date
    Jun 2006
    Location
    UK
    Posts
    922
    Thanks
    302
    Thanked 3 Times in 3 Posts

    Question Reading and storing partial data from a file

    Hello

    I have a question about reading chunks of data from a file.

    Suppose I have the following array saved in a file:

    Code:
    Array
    (
        [Japan] => Array
            (
                [0] => 101
                [1] => 102
                [2] => 103
            )
    
        [China] => Array
            (
                [0] => 202
                [1] => 203
                [2] => 204
            )
    
        [Chicago] => Array
            (
                [0] => 303
                [1] => 304
                [2] => 305
            )
    
    )
    Is there any way I can open the file but read and store just the "Chicago" array elements in a variable? The reason I am asking is that the file may contain a huge list of arrays, and reading the entire thing and dumping it in memory won't be a good idea. So how can we accomplish this?


    Thanks in advance

    PS: The extension of the file is .json and the data storage format would be JSON as well.
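    For reference, the straightforward approach the question wants to avoid loads and decodes the entire file just to pull one key (a minimal sketch; the file name `cities.json` is assumed):

    ```php
    <?php
    // Write a small sample file so the sketch is self-contained.
    file_put_contents('cities.json', json_encode([
        'Japan'   => [101, 102, 103],
        'China'   => [202, 203, 204],
        'Chicago' => [303, 304, 305],
    ]));

    // The naive approach: decode the ENTIRE file, then pick one key.
    // Memory use grows with the whole file, which is the problem here.
    $all     = json_decode(file_get_contents('cities.json'), true);
    $chicago = $all['Chicago'];
    print_r($chicago);
    ```

    Fine for small files; the rest of the thread is about avoiding that full decode.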
    Last edited by phantom007; 03-09-2014 at 10:09 AM.

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    Use a DBMS; this is exactly what they were designed for.
    Otherwise, to do what you're looking for, you'll need to write a heavily customized random-access file format that lets you jump to offsets in a file to read records. It'll likely require multiple files to do well.
    PHP Code:
    header('HTTP/1.1 420 Enhance Your Calm'); 
    Been gone for a few months, and haven't programmed in that long of a time. Meh, I'll wing it ;)

  • Users who have thanked Fou-Lu for this post:

    phantom007 (03-12-2014)

  • #3
    Regular Coder
    Join Date
    Jun 2006
    Location
    UK
    Posts
    922
    Thanks
    302
    Thanked 3 Times in 3 Posts
    Thanks for the reply. I cannot use a DBMS, as it's a requirement of the project to use the file system as a database.


    I was thinking of maintaining a separate file that would play the role of indexing the data, just like a file pointer that knows where to seek to. For example, I'd query the index file to find exactly which position to seek to, and then open the .json file with the pointer at exactly that position.

    Any ideas on how this could be made possible?
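    The index-file idea described above could be sketched like this (the file names, and using byte offset plus length as the index entries, are assumptions of the sketch):

    ```php
    <?php
    // Build the main data file and a small index side by side.
    // The index maps each key to the byte offset and length of its
    // record, so a lookup only ever reads that one slice.
    $data  = [
        'Japan'   => [101, 102, 103],
        'China'   => [202, 203, 204],
        'Chicago' => [303, 304, 305],
    ];
    $index = [];
    $fp    = fopen('data.bin', 'wb');
    foreach ($data as $city => $values) {
        $json         = json_encode($values);
        $index[$city] = [ftell($fp), strlen($json)];  // [offset, length]
        fwrite($fp, $json);
    }
    fclose($fp);
    file_put_contents('data.idx', json_encode($index));

    // Lookup: load only the (small) index, then seek straight to the record.
    $idx = json_decode(file_get_contents('data.idx'), true);
    list($offset, $length) = $idx['Chicago'];
    $fp = fopen('data.bin', 'rb');
    fseek($fp, $offset);
    $chicago = json_decode(fread($fp, $length), true);
    fclose($fp);
    print_r($chicago);
    ```

    This still loads the whole index into memory, which is exactly the limitation discussed in the following posts.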
    Last edited by phantom007; 03-10-2014 at 05:13 AM.

  • #4
    Regular Coder
    Join Date
    Sep 2002
    Posts
    462
    Thanks
    0
    Thanked 20 Times in 20 Posts
    its a requirement of the project to use file system as a database.
    It's called a flat-file database. Most computer applications use them.
    NO Limits!! DHCreationStation.com
    ------------------------------------------------------------
    For projects using MediaTypes (MIMETypes) visit E-BAM.net -(updated weekly)

    Broken items wanted for tinkerin'! PostItNow@BrokenEquipment.com
    Global Complaint Dept.

  • Users who have thanked c1lonewolf for this post:

    phantom007 (03-12-2014)

  • #5
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    There are tons of options available for how you want to approach it. The easiest thing to do is use lots and lots of files / directories. Lots of files. Easy reads, easy writes, simple to maintain, but very messy.

    To keep it smaller though, you would need to really evaluate the data and design the best structure possible. Random-access files work by having a pre-defined structure for the records that allows you to move between items in chunks. In other words, say I have 1500 records of 64 bytes in size: I can find any specific record by jumping in steps of 64 bytes. Indexing is where the fun comes in.

    I wouldn't store any of this data in JSON, since you can't seek around in it: the whole document has to be parsed before any one record is usable. Store it in a way that makes more sense to handle from the language that needs to interpret it. If your file always has exactly three integers for each city, then this is an easy task; you would create each "record" of size 37 bytes, for example: 25 bytes for the name plus three 4-byte integers. Index files can be used to provide specific location information for the offsets, but if you need to keep scalability in mind, you'll need to create a custom m-tree to properly scan through with the fewest jumps possible.
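    That fixed-width layout can be sketched with pack()/unpack(); the 25-byte name plus three 4-byte integers gives the 37-byte record from the example above (the file name is an assumption):

    ```php
    <?php
    // Fixed-size records: 25 bytes for the name plus three 4-byte
    // integers = 37 bytes per record, so record $i starts at byte
    // offset $i * 37 and fseek() can jump straight to it.
    const REC_SIZE = 37;

    $fp = fopen('records.bin', 'wb');
    foreach ([['Japan', 101, 102, 103],
              ['China', 202, 203, 204],
              ['Chicago', 303, 304, 305]] as $r) {
        // A25 = 25-byte space-padded string, N = 32-bit big-endian int.
        fwrite($fp, pack('A25NNN', $r[0], $r[1], $r[2], $r[3]));
    }
    fclose($fp);

    // Read record #2 (Chicago) without touching the other records.
    $fp = fopen('records.bin', 'rb');
    fseek($fp, 2 * REC_SIZE);
    $rec = unpack('A25name/Na/Nb/Nc', fread($fp, REC_SIZE));
    fclose($fp);
    echo trim($rec['name']), ': ', $rec['a'], ',', $rec['b'], ',', $rec['c'], "\n";
    ```

    Because every record is the same size, the read cost is the same no matter how many records the file holds.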

    If it's dynamic, I'd make a record of size 2 bytes: one byte for the data, and another to tell it where the next record can be found in the file. You can use the indexing directly in a header section for the data, or separate files, or whatever, and its job is to point to the first item associated with the set. It keeps following until it finds a null pointer, where it has nowhere else to go. The pro is that this is easy to maintain; the con is that deletions are a bit of a pain. Without a reverse pointer to look at (if you don't care about size, you can use 3 bytes: first byte for the previous record, second byte for the data, and third for the next record, so you can scroll back), you need to iterate with a shadow pointer, and when you identify the proper item you null-fill the shadow's pointer.
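    A minimal sketch of that linked-record layout (the 2-byte record, the 0xFF null-pointer sentinel, and the file name are this sketch's own assumptions):

    ```php
    <?php
    // Each 2-byte record holds one data byte plus the byte offset of
    // the next record in its chain; 0xFF acts as the null pointer.
    $records = [
        [10, 4],    // offset 0: data 10, next record at offset 4
        [99, 0xFF], // offset 2: data 99, belongs to some other chain
        [20, 6],    // offset 4: data 20, next record at offset 6
        [30, 0xFF], // offset 6: data 30, null pointer -> end of chain
    ];
    $fp = fopen('chain.bin', 'wb');
    foreach ($records as $r) {
        fwrite($fp, pack('CC', $r[0], $r[1]));
    }
    fclose($fp);

    // Walk the chain that starts at offset 0, hopping record to record.
    $chain = [];
    $fp    = fopen('chain.bin', 'rb');
    $next  = 0;
    while ($next !== 0xFF) {
        fseek($fp, $next);
        $rec     = unpack('Cdata/Cnext', fread($fp, 2));
        $chain[] = $rec['data'];
        $next    = $rec['next'];
    }
    fclose($fp);
    print_r($chain);
    ```

    Note the one-byte pointer caps the file at 255 bytes; a real layout would widen the pointer field, but the traversal logic stays the same.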

    So this isn't too difficult a task overall. The problem is that you need to think out what the structure will be, and program directly for that structure. This is little different from DB design: you should have it all laid out, with what you want to do and any flaws identified, prior to starting anything. If you make an error in the design, it will likely roll the project back to the very beginning, so make sure you put the proper emphasis on the design phase before you start anything.

  • Users who have thanked Fou-Lu for this post:

    phantom007 (03-12-2014)

  • #6
    Regular Coder
    Join Date
    Jun 2006
    Location
    UK
    Posts
    922
    Thanks
    302
    Thanked 3 Times in 3 Posts
    Thanks for the reply guys.

    @Fou-Lu, let's say I were to maintain indexes in a separate file. In order to read the indexes, I'd need to again load the entire index file into memory and then iterate through each line to find the index, right?

  • #7
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    That's why you'd need to construct m-tree files instead. You'll never get O(1) lookups, at least not at a high scale, but there are many tricks to reduce the number of reads down to a fraction of what a full iteration would take.
    Unfortunately, PHP, like all interpreted languages, is quite slow, and the growth of the filesystem files will likely have a huge impact over time.

  • Users who have thanked Fou-Lu for this post:

    phantom007 (03-12-2014)

  • #8
    Regular Coder
    Join Date
    Jun 2006
    Location
    UK
    Posts
    922
    Thanks
    302
    Thanked 3 Times in 3 Posts
    Fou-Lu, I am impressed by your in-depth knowledge. How did you learn all this? What's your professional background?

  • #9
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    I went to school for it. I don't program for a living though, I'm a systems admin.
    Part of our course included database design concepts, which had brief material on the evolution of DBMSs, and I simply merged that knowledge with my knowledge of programming structures like linked lists to come up with a reasonably efficient (for an interpreted language) structure for reading data. Indexing is still the tricky part of it all.

    Also, the lists would never compact on deletions, but theoretically a list of tombstoned locations could be tracked if there is any need to worry about efficiency. That way you only ever have a few tombstoned locations, and you reuse them prior to appending. Alternatively, use maintenance tasks to reindex everything and compact the tombstoned locations out by rewriting the files and slotting items back into them.

    The file structure doesn't need to be in order if you follow record pointers (useful for dynamic growth; don't use them for fixed sizes, since fixed records are more efficient and compact the space when you know the maximum size).
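    The tombstone free-list idea can be sketched like this (the 4-byte record, the file name, and keeping the free list in memory are all assumptions of the sketch):

    ```php
    <?php
    // Deleted record offsets go on a free list; new records reuse a
    // tombstoned slot before appending, so the file doesn't grow.
    $free = [];  // offsets of deleted (tombstoned) records

    function rec_write($fp, array &$free, $value) {
        if ($free) {
            $off = array_pop($free);   // reuse a tombstoned slot
            fseek($fp, $off);
        } else {
            fseek($fp, 0, SEEK_END);   // no holes: append at the end
            $off = ftell($fp);
        }
        fwrite($fp, pack('N', $value));
        return $off;
    }

    $fp = fopen('store.bin', 'w+b');
    $a = rec_write($fp, $free, 111);   // lands at offset 0
    $b = rec_write($fp, $free, 222);   // lands at offset 4
    $free[] = $a;                      // "delete" the record at offset 0
    $c = rec_write($fp, $free, 333);   // reuses offset 0; file stays 8 bytes
    fclose($fp);
    echo $c, ' ', filesize('store.bin'), "\n";
    ```

    A durable version would persist the free list too (e.g. in a header section), as the post suggests for the index.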

  • Users who have thanked Fou-Lu for this post:

    phantom007 (03-13-2014)

