Data
Thoughts about data.
Reading time: 2 min.
Basics
- Data is given and as-is. Indices into other elements beyond the data are just indices.
- Data is finite. Otherwise we are talking about streams/processes/delayed computations/promises etc.
- Data is either atomic or composite.
- The choice of atoms and composites varies with the domain.
Information
- Meaning and information are only provided by context.
- Context matters because much data is in textual form, and a literal reading of the text misses the bigger picture.
- Data is only a representation of objects in some abstract domain. The domain includes constraints and invariants.
Abstract data and representation
- Abstract data types make it possible to capture the invariants and constraints of the domain.
- Construction of abstract data may fail.
- Representation of abstract data is an injection into the structural side and always succeeds.
- Roundtripping from ADT -> Representation -> ADT never fails.
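The points above can be illustrated with a minimal Python sketch. The `Percentage` type, its 0..100 invariant and the helper `parse_percentage` are hypothetical names chosen for illustration only; they are not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Percentage:
    """Hypothetical abstract data type with the invariant 0 <= value <= 100."""

    value: int

    def __post_init__(self) -> None:
        # The invariant is checked on every construction.
        if type(self.value) is not int or not 0 <= self.value <= 100:
            raise ValueError(f"not a percentage: {self.value!r}")

    def to_repr(self) -> int:
        # Representation always succeeds and is injective:
        # distinct percentages map to distinct ints.
        return self.value


def parse_percentage(raw: object) -> Optional[Percentage]:
    # Construction from a representation may fail, signalled here by None.
    try:
        return Percentage(raw)  # type: ignore[arg-type]
    except ValueError:
        return None
```

Under these assumptions, `parse_percentage(150)` fails, while `parse_percentage(p.to_repr())` gives back an equal `Percentage` for every valid `p`.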
Shape
- The structure of raw data in informatics is linear and (via grammars) can be represented as a tree. This is serialisation and deserialisation.
- Physical data outside informatics may have a 2D or 3D structure. Example: a 2D grid on paper. But in informatics everything is just 1D.
- Data already inside some domain may already contain complex invariants and multi-dimensional or hypergraph structures.
- Going from linearised data to the domain may involve jumps through several domains (e.g. lexer to tokens -> parser to AST, DTO <-> domain data, digits -> number -> currency amount).
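As a rough illustration of the step from linear data to a tree and back, here is a small Python sketch for a made-up parenthesised notation. The grammar and the names `lex`, `parse` and `serialise` are assumptions for this example, not an established format.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (str) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def lex(text: str) -> List[str]:
    # Linear characters -> linear tokens.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def _parse_one(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok != "(":
        return tok, pos + 1
    items: List[Tree] = []
    pos += 1
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = _parse_one(tokens, pos)
        items.append(item)
    if pos >= len(tokens):
        raise ValueError("missing ')'")
    return items, pos + 1  # skip the closing ')'


def parse(tokens: List[str]) -> Tree:
    # Linear tokens -> tree (deserialisation); may fail on malformed input.
    tree, pos = _parse_one(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return tree


def serialise(tree: Tree) -> str:
    # Tree -> linear text; always succeeds.
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(serialise(t) for t in tree) + ")"
```

Each function moves the data one domain further: characters to tokens, tokens to a tree, and the tree back to characters.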
Immutability
- Data is always implemented immutably.
- But there may be immutable “objects” implemented by non-data.
  - Classic example: a read-only database replica. It is not data, because the connection or data retrieval may fail.
  - Function references.
  - Immutable references in high-level languages. May be implemented by different pointers due to a moving garbage collector.
  - An immutable data structure with a caching/memoizing/amortising element.
  - Indices between several immutable data elements.
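The caching case can be sketched in Python with a frozen dataclass and `functools.cached_property`. The `Samples` class is a hypothetical example: it looks immutable from the outside, yet its instance dictionary changes on first access to `total`, so the implementation is not pure data.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Tuple


@dataclass(frozen=True)
class Samples:
    """Hypothetical immutable value with a memoised derived field."""

    values: Tuple[int, ...]

    @cached_property
    def total(self) -> int:
        # Computed on first access, then stored in the instance __dict__.
        # The stored value bypasses the frozen __setattr__ check.
        return sum(self.values)
```

Assigning to `values` raises an error, but reading `total` quietly adds an entry to `vars(instance)`.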
Implementation
- A type system can track not only data, but also abstract data.
- The runtime of a system may track a subset of data via known implementation atoms, construction-only composites and deep freezing of objects.
- This can be a bit in the object header, set upon construction or freezing.
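Python does not expose a header bit for this, so the following sketch swaps in a recursive check instead. The choice of atoms and composites, and the names `is_data` and `deep_freeze`, are assumptions made for illustration.

```python
from typing import Any

# Assumed "known implementation atoms" and "construction-only composites".
ATOMS = (type(None), bool, int, float, str, bytes)
FROZEN_COMPOSITES = (tuple, frozenset)


def is_data(value: Any) -> bool:
    # Exact type checks: subclasses could carry mutable state.
    if type(value) in ATOMS:
        return True
    if type(value) in FROZEN_COMPOSITES:
        return all(is_data(item) for item in value)
    return False


def deep_freeze(value: Any) -> Any:
    # Convert mutable containers into construction-only composites.
    if type(value) in ATOMS:
        return value
    if isinstance(value, (list, tuple)):
        return tuple(deep_freeze(item) for item in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(deep_freeze(item) for item in value)
    if isinstance(value, dict):
        # A dict becomes a tuple of (key, value) pairs in insertion order.
        return tuple((deep_freeze(k), deep_freeze(v)) for k, v in value.items())
    raise TypeError(f"cannot freeze {type(value).__name__}")
```

A runtime with access to object headers could record the result of `is_data` once, at construction or freezing time, instead of walking the structure on every query.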