Everyone who has read white papers for this thing or that thing will be familiar with this, unless they have at least a B.Sc. in CS or mathematics. I did a little matrix math and the like in my advanced math class in school, but the teacher was abominable, and the syllabus for that subject did not have as much "first principles" stuff as regular math (which included calculus and differential equations) or chemistry or physics, all of which had some type of algebra to learn: Boyle's law, atomic energies of activation, molecular weights, etc.
When you read a paper, which is basically all a programmer gets as a description of the algorithm to work from, and you can't read the bizarre set notation and the various kinds of complex calculus notation used, then, if you are lucky, you can surmise the meaning from breadcrumbs of actual human natural language explaining the principle. Otherwise you waste a lot of time on the few formulas that are critical, learning the specifics of the notation involved.
But if it were written instead in simplified English, aka "pseudocode", everyone would be able to access it; even less academic people would have a decent chance of catching the gist.
IMHO, this practice is elitism, and it ironically excludes the most valuable and potentially innovative thinkers, whose models are either textual or visual, and it just plain excludes everyone who hasn't done at least a few months of basic advanced mathematics in college as part of a degree.
When it comes down to it, what is the purpose of inventing these algorithms?

Writing code.

I don't work in cryptography or academia, but I do develop algorithms for image processing and stuff like that. I find it helpful to provide both mathematical notation and a coded example when documenting my work.
Mathematics provides the detail, but as you say it's quite hard to interpret. A coded example is more human-readable and can provide a nice overview, but it sacrifices fidelity (as you need to be very familiar with the language and functions to understand exactly what is going on under the hood).
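For instance, a toy illustration in the spirit of what I mean (a made-up 3x3 box blur, not anything from my actual work): the formula pins down exactly what happens at each pixel, while the code gives you something you can actually run.

    import numpy as np

    def box_blur(img: np.ndarray) -> np.ndarray:
        """3x3 box blur.

        Math version:  out(y, x) = (1/9) * sum over i, j in {-1, 0, 1} of img(y+i, x+j)
        """
        out = img.astype(float)
        h, w = img.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                # Average the 3x3 neighbourhood around (y, x); edges are left as-is for simplicity.
                out[y, x] = img[y - 1:y + 2, x - 1:x + 2].mean()
        return out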
Perhaps academic papers need to remain lean and minimize word count if they're being published in journals? I could see why coded examples would be neglected in this case.
reply
I agree that formulas are necessary for things like signal-processing systems; even the dynamic difficulty adjustment, with its derivatives and integrals, needs some calculus notation.
But all the set theory especially: it is not intuitive what A, B, C, F, M or whatever letter you use refers to. You have to memorise a code to read it, which means that this notation is essentially encrypted.
All of it is much more easily understood in Venn diagrams and other visuals. And words can describe it really well too, even more compactly, but without that encryption. It's always the set theory stuff that is the most sleep-inducing for me. And the more complicated calculus is beyond my high-school training, yet I can still write the algorithm for it if it's broken down into a linear sequence of operations.
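To make that concrete, a made-up example (not from any particular paper): a line like C = { x ∈ A : x ∉ B } is just this, once the letters are given names:

    # Hypothetical example: "C = { x in A : x not in B }", i.e. set difference,
    # written with names a human can read instead of single letters.
    def users_without_access(all_users: set, banned_users: set) -> set:
        return {user for user in all_users if user not in banned_users}

    # Same thing, even shorter, with Python's built-in set operator:
    # users_without_access = all_users - banned_users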
reply
> A coded example is more human-readable and can provide a nice overview, but it sacrifices fidelity (as you need to be very familiar with the language and functions to understand exactly what is going on under the hood).
The solution to that is to use a very well-specified programming language that hasn't changed for decades, either because it's been abandoned or because its specification simply hasn't changed (e.g. ANSI Common Lisp).
Of course, getting everyone to adopt a single programming language is a fool's errand. The best we can hope for is that one scientific journal will require that papers use one from an approved list.
reply
Yeah, unfortunately the majority of programmers are only interested in what gets them the best pay for their idiot tolerance level.
Not in what the OG language designers actually learned over decades of work coordinating teams.
I think Forth would be the best pick for precise, formal specifications. BASIC is another solid choice too. C and its descendants up until Go are a mess of assumptions and complex syntax, and frankly I HATE OOP. I also hate repeating myself. What the heck is a .h file for? The linker? Why can't the lexer generate that? (Oh yeah, it does in most languages.)
Oh yeah, just look at what languages they use in agricultural research simulators. When I was 11 I got to spend some weeks in a lab of that nature, and it was the first and only time I ever worked with Lisp. I sorta just dodged the Forth, but that's what my supervising researcher worked with. I think that sorta sums up what defines a formally robust language.
SIMPLE.
reply
Code rots, mathematics shouldn't.
I agree on the pseudocode point. Perhaps accuracy is guaranteed in mathematical equations, whereas pseudocode can be misinterpreted.
But I, too, feel silly not being able to consume white papers entirely due to the equations presented.
reply
Code repositories rot.
The pseudocode does not. And with pseudocode you don't have to spend 5 years learning how to read that special, highly ideographic language (which, unlike code, is two-dimensional).
The number one thing you understand as a programmer with a moderate grasp of how compilation and interpretation of computer languages works is that the concrete language is not as important as the abstract syntax tree (AST); once you have the AST, you can generate code for any language.
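A toy sketch of that point (a hypothetical mini-AST, not any real compiler): the same tree can be printed as C-style infix or Lisp-style prefix, so the surface notation is interchangeable.

    from dataclasses import dataclass

    # One tiny AST node type: a binary operation over variable names.
    @dataclass
    class BinOp:
        op: str        # "+", "*", ...
        left: object   # BinOp or str (a variable name)
        right: object

    def emit_infix(node) -> str:
        """Render the tree as C/Python-style infix, e.g. ((a + b) * c)."""
        if isinstance(node, str):
            return node
        return f"({emit_infix(node.left)} {node.op} {emit_infix(node.right)})"

    def emit_prefix(node) -> str:
        """Render the same tree as Lisp-style prefix, e.g. (* (+ a b) c)."""
        if isinstance(node, str):
            return node
        return f"({node.op} {emit_prefix(node.left)} {emit_prefix(node.right)})"

    tree = BinOp("*", BinOp("+", "a", "b"), "c")
    print(emit_infix(tree))   # ((a + b) * c)
    print(emit_prefix(tree))  # (* (+ a b) c)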
Thus, when it comes to the choice of notation in a paper, I think it's actually academic tradition that is getting in the way.
We don't live in a world of dusty books and SPARC thin clients running off a mainframe in the middle of the campus.
We live in the modern world, where in theory you could build an operating system on a mobile phone. Simple, tedious, but possible. Most of the old papers still being read by CS students predate the appearance of parallel processing on PCs. Distributed systems have been out of the purely academic arena for decades, but the papers on them are still written in a way that excludes hobbyists and autodidacts like myself.
reply
I disagree.
Very occasionally, the authors of these papers will write code themselves in support of their proposed algorithm, which I think is fantastic.
The issue is that cryptography is, fundamentally, mathematics so it should use mathematical formalism to be unambiguous.
That doesn't mean mathematical formalism is a panacea; the field is rife with slight ambiguities in papers leading to terrible mistakes - for one example, look up the 'Frozen Heart' vulnerability. But replacing mathematical notation with pseudocode will not make that better.
Adding pseudocode where it's relevant, OTOH, yeah, that can be a great idea. But it makes more sense one layer up, when you write protocol specs. For example, there was a tradition of using pseudocode in RFCs for internet protocols, e.g. TLS 1.0, RFC 2246.
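For a flavour of that RFC-style pseudocode, here's a rough Python rendering (from memory, so check RFC 2246 itself for the authoritative text) of the P_hash data-expansion function it defines: A(0) = seed, A(i) = HMAC(secret, A(i-1)), and the output is the concatenation of HMAC(secret, A(i) + seed) terms.

    import hmac, hashlib

    def p_hash(secret: bytes, seed: bytes, length: int) -> bytes:
        # Rough rendering of TLS 1.0's P_hash (RFC 2246) instantiated with SHA-1.
        #   A(0) = seed; A(i) = HMAC(secret, A(i-1))
        #   output = HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ...
        out, a = b"", seed
        while len(out) < length:
            a = hmac.new(secret, a, hashlib.sha1).digest()
            out += hmac.new(secret, a + seed, hashlib.sha1).digest()
        return out[:length]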
reply
Well, I don't write papers; I never could. But I can design a protocol, adapt existing functions and primitives, and respect their proper usage as best I can surmise how to, informed by the various histories of previous flaws.
I think the shortest path from idea to code is better than formality. Build the thing and then critique it.
reply
Sounds like the Eth approach. Build it, break it, fix it, repeat until you've got something as ugly as Eth.
But I presume you are not talking about base-layer Bitcoin security. LN allows one to experiment without risking breaking the base layer. So I mostly agree with you that one should build rather than theorize.
To a certain extent, though. Experimenting at the LN implementation level (LND, etc.) or even the Umbrel level... going too fast shows that bugs there can also have big consequences. Nothing too bad until now, but I've seen a few where I was not totally sure of the outcome.
Following the LN mailing list, time and again a new idea is proposed by someone not too familiar with the topic. Then an OG with good math knowledge responds and explains why it would or wouldn't be a good idea, and how to avoid breaking things.
Anyhow, happy to have math guys and builders like you. Both of you are what is moving this field forward.
I'm just an observer for now, so my opinion is moot.
reply
Yes I do have a lot of sympathy for the 'build it concretely first, rather than just theorize in abstract papers' point of view. There's wisdom in that. But for some kinds of system, there can be dangers of insufficient analysis of attacks.
Building it first means you get eyes on it. Analysing it theoretically first means you might spot the attacks before they're executed against innocent users.
reply
At this point, my own protocol design is sponsored by one person so from a business/economic side of things, it's critical to get a prototype running so the hype train leaves the station and people with fat stacks can throw them at people with Ph.Ds and outstanding reputations as whitehat hotdoggers.
Academic formalism is expensive. For business you need it in as far as it creates trust in the product.
reply
Only math is accurate enough to avoid ambiguity. This is critical in this field.
Additional pseudocode is welcome too of course, your frustration is warranted.
I'm sure someone will be happy to teach you those very specific notation conventions.
reply
Math notation is not just learning how to draw it by hand and read it; it means learning MathML as well, and I am already so bamboozled by HTML/CSS that I gave up on being a designer because I could never say I was happy with the result. Having to get it exactly right makes it even harder. Desmos has a nice, Markdown-like, simple notation system that reads much like a normal programming language.
reply
I strongly disagree.
  • Doing formal proofs under premises is hard enough. Don't make it even harder.
  • Code or pseudocode would be super dangerous because people would use it instead of the battle-tested FOSS implementations built into programming languages and frameworks
reply
I get that point. But I would also say that the inherent, excessive complexity of object-oriented programming language syntax should be abandoned already. Supposedly Rust cures C++'s common vulnerabilities, but then again, I guess C++ people love their complexity.
So do hackers.
Also, these languages and frameworks change as fast as fashions. There needs to be a stronger orthodoxy and conservatism in development. Really not many things change, except for the constant of corporations bloating software until the hardware improvement is invisible.
reply
Math notation is easier to read and more abstract and therefore easier to get the gist of the subject at hand. Code has too many tech details, is language dependent and what use is it if you don't understand the math anyway, because you haven't studied it enough? If you only want to implement it without understanding it, then you're essentially asking the paper's author to implement it for you.
reply
The ability to distance yourself from being a code monkey relies on how well you understand math abstractions.
This is like not wanting to read a book and hoping for a puppet-show version to aid comprehension.
reply
As a math guy, I assure you that writing it in math is there to make things faster and to clarify, later on, what kinds of mathematical "objects" we're dealing with. There have been examples of cryptographic "proofs" offered for primitives where the proof was faulty because the proper mathematical formalisms weren't followed.
I think pseudocode + mathematical proof should be the standard.
reply
And moar Venn diagrams and protocol column tables!
Maybe I have just had my mind poisoned by reading too many shitcoin whitepapers. For me, reading the stuff on Solana last year was hilarious. Their site directs you first and foremost to some funny hash-chain time function thing, and then I'm like, OK, that's kinda cool, and then I dig further and ... ah ... that's not what the mainnet is running!!!!
reply
Why not both?
reply
Just more Venn diagrams and pseudocode, and fewer one-letter algebra variable names. But most of the time I can infer the things I can't find expressly or clearly defined in a way I can understand. That's because I'm both creative and have a very good memory. The downside is I am very prone to falling into rabbit holes.
reply
I personally agree that there are many concepts I couldn't read in mathematical notation but would be trivial to understand with code.
Having said that, I think this is all about personal bias. People used to math notation feel their notation is obvious. Programmers feel their code is obvious. Normal people think we are all crazy.
reply
Yeah, personal bias about subjects that really need to be properly hashed out.
Like, can we have a conversation about:
  • The absurd energy and time costs of object-oriented languages?
  • The encryption of algorithms in non-programmerly notation, without any adequate human-language equivalent.
  • The fact that all programs are running on multiprocessors now; can we kill von Neumann already?
    • And thus also, why do none of the languages you have to learn for most programming jobs these days have any support for concurrency as a first-class citizen of the language (it should be in the reserved word/symbol list!)?
      • And the fact that even when your program is serial, on the network it is now part of a multi-computer processing system, and all the rules of sync and locking and concurrency suddenly apply to everything.
  • Writing out set notation saves zero space in the paper compared to the Venn diagram version, because the legend is half a page long. It's like making up a new set of icons to draw a map and then spending the same amount of time explaining what each icon represents.
  • Programmers have to read these things and write accurate, secure code from them.
But we all know those in their ivory towers never get bug reports or angry stories from users about their data or even their livelihoods being destroyed by code written from one of these encrypted "academic" papers.
reply