agnos.is Forums


Why shouldn't you use YAML to store eye tracking data? /s

57 Posts 32 Posters 2 Views
  • Q This user is from outside of this forum
    [email protected]
    wrote on last edited by [email protected]
    #1
    This post did not contain any content.
    B M lime@feddit.nuL nathan@piefed.alphapuggle.devN S 11 Replies Last reply
    111
    • Q [email protected]
      This post did not contain any content.
      B This user is from outside of this forum
      [email protected]
      wrote on last edited by
      #2

      I really like YAML, but way too many people use it beyond its purpose... I work with GitLab CI, and seeing complex bash scripts inline in YAML files makes me want to hurt people.

      1 Reply Last reply
      10
      • Q [email protected]
        This post did not contain any content.
        M This user is from outside of this forum
        [email protected]
        wrote on last edited by
        #3

        Maybe use a real database for that? I'm a fan of simple tools (e.g. plaintext) for simple use cases, but please use appropriate tools.

        N 1 Reply Last reply
        37
        • Q [email protected]
          This post did not contain any content.
          lime@feddit.nuL This user is from outside of this forum
          [email protected]
          wrote on last edited by [email protected]
          #4

          i mean, json is valid yaml

          1 Reply Last reply
          12
          • Q [email protected]
            This post did not contain any content.
            nathan@piefed.alphapuggle.devN This user is from outside of this forum
            [email protected]
            wrote on last edited by
            #5

            This isn't YAML, this is just sparkling JSON

            Z 1 Reply Last reply
            80
            • M [email protected]

              Maybe use a real database for that? I'm a fan of simple tools (e.g. plaintext) for simple usecases but please use appropriate tools.

              N This user is from outside of this forum
              [email protected]
              wrote on last edited by
              #6

              What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

              What benefit would a database bring here?

              M Q N T A 5 Replies Last reply
              10
              • N [email protected]

                What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

                What benefit would a database bring here?

                M This user is from outside of this forum
                [email protected]
                wrote on last edited by
                #7

                Some order in the CSV data, if it weren't a logfile, which I didn't know.

                1 Reply Last reply
                0
                • N [email protected]

                  What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

                  What benefit would a database bring here?

                  Q This user is from outside of this forum
                  [email protected]
                  wrote on last edited by [email protected]
                  #8

                  It's used to export tracking data to analyze later on. Something like SQLite seems like a much better choice to me.

                  E 1 Reply Last reply
                  3
                  • N [email protected]

                    What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

                    What benefit would a database bring here?

                    N This user is from outside of this forum
                    [email protected]
                    wrote on last edited by
                    #9

                    I think SQLite is a great middle ground. It saves the database as a single .db file, and can do everything an SQL database can do. Querying for data is a lot more flexible and a lot faster. The tools for manipulating the data in any way you want are very good and very robust.

                    However, I'm not sure how it would affect file size. It might be smaller because JSON/YAML wastes a lot of characters on redundant information (field names) and storing numbers as text, which the database would store as binary data in a defined structure. On the other hand, extra space is used to make common SQL operations happen much faster using fancy data structures. I don't know which effect is greater so file size could be bigger or smaller.
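A minimal sketch of the SQLite suggestion above, using Python's built-in `sqlite3` module. The table name `gaze`, its columns (`t`, `x`, `y`) and the sample values are made up for illustration; the actual export format of the eye-tracking software is not shown in this thread.

```python
import sqlite3

# Hypothetical samples: (timestamp in seconds, gaze x, gaze y).
samples = [
    (0.000, 0.512, 0.430),
    (0.004, 0.515, 0.431),
    (0.008, 0.620, 0.298),
    (0.012, 0.623, 0.301),
]

# ":memory:" keeps the demo self-contained; pass a path such as
# "session.db" to get the single-file database mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gaze (t REAL, x REAL, y REAL)")
conn.executemany("INSERT INTO gaze VALUES (?, ?, ?)", samples)
conn.commit()

# REAL columns hold 8-byte IEEE floats, so values come back unchanged.
rows = conn.execute(
    "SELECT t, x, y FROM gaze WHERE x > ? ORDER BY t", (0.6,)
).fetchall()
count, mean_x = conn.execute("SELECT COUNT(*), AVG(x) FROM gaze").fetchone()
conn.close()

print(rows)
print(count, mean_x)
```

Filtering and aggregating happen inside the database engine here, which is the flexibility the post refers to. Whether the resulting file ends up smaller than the text export would still have to be measured on real data.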

                    S G 2 Replies Last reply
                    17
                    • N [email protected]

                      What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

                      What benefit would a database bring here?

                      T This user is from outside of this forum
                      [email protected]
                      wrote on last edited by
                      #10

                      Smaller file size, lower data rate, less computational overhead, no conversion loss.

                      A 64 bit float requires 64 bits to store.
                      ASCII representation of a 64 bit float (in the example above) is 21 characters or 168 bits.
                      Also, if every record is the same then there is a huge overhead for storing the name of each value. Plus the extra spaces, commas and braces.
                      So, you are at least doubling the file size and data throughput. And there can be precision loss when converting float-string-float if the writer emits too few digits. Plus the computational overhead of doing those conversions.

                      Something like sqlite is lightweight, fast and will store the native data types.
                      It is widely supported, and allows for easy querying of the data.
                      Also makes it easy for 3rd party programs to interact with the data.

                      If you are ever thinking of implementing some sort of data storage in files, consider sqlite first.
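The size argument above can be checked with a rough sketch. The value and the field name `gaze_x` are invented for illustration, since the thread's actual data is not shown. Whether the text round trip loses precision depends on how many digits the writer emits, so both cases are included.

```python
import json
import struct

value = 0.1 + 0.2  # a double whose shortest decimal form is long

binary = struct.pack("<d", value)        # raw IEEE 754 double
text = repr(value)                       # shortest string that round-trips
record = json.dumps({"gaze_x": value})   # hypothetical field name

print(len(binary), len(text), len(record))

# Round-tripping through the shortest repr is exact in Python 3 ...
exact = float(text) == value
# ... but formatting with too few digits loses information.
lossy = float(f"{value:.6e}") == value
print(exact, lossy)
```

The per-record overhead of the field name and punctuation is what pushes the text form well past the 8 bytes of the raw double.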

                      N 1 Reply Last reply
                      9
                      • Q [email protected]
                        This post did not contain any content.
                        S This user is from outside of this forum
                        [email protected]
                        wrote on last edited by
                        #11

                        Fuck yaml. I'm not parsing data structured with spaces and newlines with my eyes. Use visible characters.

                        D 1 Reply Last reply
                        21
                        • N [email protected]

                          What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

                          What benefit would a database bring here?

                          A This user is from outside of this forum
                          [email protected]
                          wrote on last edited by
                          #12

                          Because this is not log or debug data as OP said. In any case, what do you think would happen with this data? It will be analyzed by some sort of tool because no one could manually look at this much text data. In text, this can be like 1MB of data per second. So in a normal eye tracking session, probably hundreds of MB. The problem isn't the storage space, but the time it will take to read that in and analyze it each time, forcing you to wait for processing or use lots of memory while reading it. And anyway, in most languages, it's actually much easier to store the number values directly (in 8 bytes not the 30something this text representation uses) than to convert them to JSON, all languages have some built-in way to do that. And even if not, sqlite is piss-easy and does everything for you, being as simple as JSON.

                          There is just no reason to do it like that unless you just don't think about what you're doing or have no clue.
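As a rough sketch of what "storing the number values directly" could look like, Python's `struct` module can write fixed-size binary records. The three-field layout (timestamp, x, y) and the values are assumptions for illustration, not the format of the software discussed here.

```python
import io
import struct

# Hypothetical record layout: three little-endian doubles (t, x, y).
RECORD = struct.Struct("<3d")

samples = [(0.000, 0.512, 0.430), (0.004, 0.515, 0.431)]

# BytesIO stands in for a file opened with open(path, "wb").
buf = io.BytesIO()
for sample in samples:
    buf.write(RECORD.pack(*sample))

data = buf.getvalue()
print(RECORD.size, len(data))

# Reading back needs no text parsing at all.
restored = list(RECORD.iter_unpack(data))
print(restored == samples)
```

A raw layout like this has no self-description, so the reader must know the field order in advance; that is one reason SQLite is suggested as the more convenient option.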

                          N 1 Reply Last reply
                          1
                          • Q [email protected]
                            This post did not contain any content.
                            S This user is from outside of this forum
                            [email protected]
                            wrote on last edited by
                            #13

                            Also let's represent all numbers in scientific notation, I'm sure that's going to make it easier to read...

                            1 Reply Last reply
                            4
                            • Q [email protected]
                              This post did not contain any content.
                              D This user is from outside of this forum
                              [email protected]
                              wrote on last edited by
                              #14

                              This is nasty to look at

                              1 Reply Last reply
                              3
                              • Q [email protected]
                                This post did not contain any content.
                                R This user is from outside of this forum
                                [email protected]
                                wrote on last edited by [email protected]
                                #15

                                Why you shouldn't use YAML

                                D B 2 Replies Last reply
                                8
                                • R [email protected]

                                  Why you shouldn't use YAML

                                  D This user is from outside of this forum
                                  [email protected]
                                  wrote on last edited by
                                  #16

                                  The best approach would be to never use yaml for anything

                                  1 Reply Last reply
                                  14
                                  • N [email protected]

                                    I think SQLite is a great middle ground. It saves the database as a single .db file, and can do everything an SQL database can do. Querying for data is a lot more flexible and a lot faster. The tools for manipulating the data in any way you want are very good and very robust.

                                    However, I'm not sure how it would affect file size. It might be smaller because JSON/YAML wastes a lot of characters on redundant information (field names) and storing numbers as text, which the database would store as binary data in a defined structure. On the other hand, extra space is used to make common SQL operations happen much faster using fancy data structures. I don't know which effect is greater so file size could be bigger or smaller.

                                    S This user is from outside of this forum
                                    [email protected]
                                    wrote on last edited by
                                    #17

                                    I didn't look too much at the data, but I think CSV might actually be an appropriate format for this?

                                    Nice simple plaintext, and very easy to parse into a data structure for analysing/using it in Python or similar.
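For illustration, parsing such a file with Python's standard `csv` module could look like the sketch below. The column names (`t`, `x`, `y`) and values are hypothetical, since the real export is not shown in the thread.

```python
import csv
import io

# Hypothetical CSV export; a real file would be opened with
# open(path, newline="") instead of StringIO.
raw = "t,x,y\n0.000,0.512,0.430\n0.004,0.515,0.431\n"

reader = csv.DictReader(io.StringIO(raw))
# csv yields strings, so each field is converted to float explicitly.
samples = [{key: float(val) for key, val in row.items()} for row in reader]

mean_x = sum(s["x"] for s in samples) / len(samples)
print(samples)
print(mean_x)
```

Compared with JSON or YAML, the field names appear once in the header rather than in every record, though the numbers are still stored as text.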

                                    N 1 Reply Last reply
                                    2
                                    • Q [email protected]
                                      This post did not contain any content.
                                      W This user is from outside of this forum
                                      [email protected]
                                      wrote on last edited by
                                      #18

                                      I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.

                                      In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.
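The line-delimited JSON suggestion above can be sketched with the standard `json` module alone. The record fields are hypothetical placeholders, not the software's real schema.

```python
import io
import json

samples = [
    {"t": 0.000, "x": 0.512, "y": 0.430},
    {"t": 0.004, "x": 0.515, "y": 0.431},
]

# StringIO stands in for a file opened in append mode.
out = io.StringIO()
for sample in samples:
    out.write(json.dumps(sample) + "\n")  # one complete JSON document per line

# Each line parses on its own, so the file can be streamed or cut with
# line-oriented tools without loading everything first.
lines = out.getvalue().splitlines()
restored = [json.loads(line) for line in lines]
print(len(lines), restored == samples)
```

Because every line is a standalone document, appending a sample never requires rewriting or re-closing an enclosing array.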

                                      Q 1 Reply Last reply
                                      3
                                      • Q [email protected]
                                        This post did not contain any content.
                                        F This user is from outside of this forum
                                        [email protected]
                                        wrote on last edited by
                                        #19

                                        I'm amazed at developers who don't grasp that you don't need to have absolutely everything under the sun in a human readable file format. This is such a textbook case...

                                        M D C fuckbigtech347@lemmygrad.mlF 4 Replies Last reply
                                        21
                                        • W [email protected]

                                          I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.

                                          In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.

                                          Q This user is from outside of this forum
                                          [email protected]
                                          wrote on last edited by [email protected]
                                          #20

                                          Thanks for the advice, but this is just the format of some eye-tracking software I had to use, not something I develop myself.

                                          W 1 Reply Last reply
                                          2