home

Geo Carncross: How to program computers (kos)

this is a transcript of this video: Geo Carncross: How to program computers (kos)

Intro

Geo Carncross

* Big: 1-6TB lzma'd logfiles
  per day from all over the world!

* Fast: Deadline is 50msec

* 100% Uptime is normal for us.

Hello, my name is Geo and I work for a company called Telemetry. We are ad server. Basically web serving for money. We get paid to physically deliver video ad. So if our systems don't work correctly, we don't get paid. My aims are motivated by programs that work. We do a lot of programming and most of it's not in k but today I want you to look at k. Because today I want to talk about code bloat.

Our logfiles are money!

(we do other things. like figthing bad guys committing ad fraud,
 and everyone in my office is enjoying a beer right now...

 let me know if this is something you want to know more about)

---

* I'm part of the reason ads play fine, but your Youtube stutters

How to program computers - code bloat

* Give it a meaningful definition

* Don't be distracted by other,
  informal uses of "code bloat"

* Work the problem as it appears

I define code bloat strictly as redundant code inside a source tree. Features that nobody wants, and whether a compiler can identify and eliminate redundancy or other topics. Code bloat costs a lot. For one, it makes the source code big which makes binaries big and that makes them slow. But multiple programmers working on a large code base cannot know exactly what each other is doing and thinking, so there is bound to be some duplication of effort. So code bloat is obviously bad, but it might be inevitable. However it might not. I am going to tell you about kOS.

kos is a small team (4 people)
kos is tiny

* k is arthur's language; something like scheme+apl
* kdb is a database faster than popular dbms by a factor of 1000x
* z is a gui and window manager. icons, mouse and font
* kos kernel is modesetting. memory, vmm, isr and filesystem

The binary image that is what boots is around 90 kilobytes and that does not use any {packers?} to achieve that. While this may not impressed everyone, it does not impressed (because) you didn't understand what I just said.

kOS is written in k, C and assembly. These lines here:

one system                  kos is tiny
all devices
                             * k is 300 lines of C across 5 files
                             * kdb adds only about 100 lines of C
                             * z adds another 60 lines of C
                             * kos kernel is 100 lines of code/assembly.


that I'm referring to fill my screen the way words on a book fill a page. This provides some significant advantages by making programs as readable as a book and it contributes to why kOS does not contain much redundancy.

I do not however have much time to go into k in general/in detail. But I say the interpreter is very simple and it uses basic table based dispatch. No fancy assembly dispatch or jumping or dynamic recompilation. Does not do any subroutine threading or any program optimisation. Doing a simple loop in k versus C, I would expect every C compiler to optimise this triviality.

kos is how to program computers

  for(i=0;i<1000000;++i)        i:1000000(1+)/1
  real  0m0.008s                real    0m0.191s
  user  0m0.003s                user    0m0.185s
  sys   0m0.002s                sys     0m0.003s
But k won't. Yet you've probably heard of k or kdb at some point, googling, maybe you're googling it now... There are benchmark hiding away in various places on stack and whatnot: kdb is notoriously fast.

     msec
k       1
sql  6400
sqlA 4200

 kdb: select last bid by symbol from quote where date=d,sym in S

sql
  select sym, last(bid order by time) from quote
  where date=d and sym in (select sym from S) group by sym

sqlA
  select sym,bid from
    (select sym,bid,row_number()
        over(partition by sym order by time desc)
        as row from quote where date=d and sym in(select sym from S)
    ) AS q where row=1

So the question comes up: "Why is k fast?". I have said that I think, we should be asking "why is X (or whatever x is) why is it so slow?". And I don't think this is a restatement just by inverting things to be clever. I think it's- I kind of honesty that we don't generally have as programmers about why is our program slow? Why is a program slow? Why is a large commercial relational database made by a company that begins with O slow? And I wish we had time to discuss it further.

* Unnecessary code is a bug and we try to remove it.

* Slow programs are the programmers responsibility.

* Bugs are mistakes we programmers make;
  they are not inevitable or ok.

* The programmer is responsible that the program is used
  correctly.

* "O" complexity ignores that computers are made of hardware

  * Arrays might be O(1) when small but O(log n) when bigger
  * B-trees are a terrible data structure for storing data
  * Hashtables aren't remotely close to O(1)

* "Best practices" are a straw man that make having
  real discussions about making systems difficult.

  * That includes syntax hilighting
  * ...  and object-oriented programming
  * ...  and what is "clean code"?
  * ...  and unit-tests
  * ...  and debian packages
  * ...  and puppet/chef/etc

* Programming correctly is hard!

* Can you write an algorithm shorter or faster than
...

  If so I'd like to talk to you: PUTITYYEUOP

But I want to show you some non-trivial k source code that I did not write. I wanna show you how I produce code. And I wanna show you how I minimise the amount of redundant code that I produce.

[types "edir", a view appears, top bar shows "ERROR: /home/geocar/$/edir.k", deletes three chars back, and adjusts it to .../edit, hits enter, the view disappears, types "edit" this time, a view appears, types "$/e.k"]

This is Arthur's editor:

c::a$"\n";b::0,1+c;d::(#c),|/-':b;h:{c[x]&y+b x};i::x,j-b x:b'j;l:{h/0|x&d-1}
j::*|k;K:{k::2#0|x&*|c};s::i&s|i-w-2;D:{s::0|x&d-w};px:{D s+w*x};wx:{D s+4*x}
J:{K$[S;|\k[0],x;x]};lx:{J j+x};ux:{J l i x,0};hx:{J l i+d*x};mx:{J l s*_x%F}
kx:kr:{U,:,(+\k[0],#x;*k_a);e[k]x};kb:{$[=/k;K j-1 0;];kx""}
e:{a::?[a;x;y];K(*x)+#y};cz:{$[#u;e/_`u;]};cc:{9'*k_a};cx:{kx cc`};cv:{kx@9'`}
f:"";g::a$,f;cf:{cg f::0'"^"};cg:{K(g,1#g)1+g'k}
t::(a x;q[x]|/7 4 4*~(k'x;2!(,/g)'x;^(j 9'a)?x:h\:/s+!:'1+w));q::n 9'a
A:{r::a::$[#x;1:x;,"\n"];n::x;K@#u::()};co:{A@0'""};cs:{n::$[#n;n;0'""]1:a;r::a}
A@*x;w::_W%F;x::(r~a)_"*",n;y::F*i-s;z::(2'(0;W;F*s;F*d);1'0,t)

So the syntax here, c::a$"\n" is pretty simple, [audience laughs]. [opens an interactive console to the left half of the screen] No, seriously.

1: 2+2
4

Two plus two is four. You can see that down here.

1: -6
-6

Minus six negates six. It's very easy. Semicolon separates things. This double colon thing that I want to look at first is interesting 'cause it creates a view.

c::a$"\n"

It's something that doesn't exist in a lot of programming languages. I am not actually aware of any programming languages that it actually exists. And it's not like an SQL view. It's actually kinda like what Excel does. It's a memoized function, I guess... that keeps track of its dependencies automatically. So this expression right here only gets re-evaluated whenever whatever dependent variables change.

So, a is the contents of the buffer we are looking at.

1: a
"c::a$\"\\n\";b::0,1+c;d::(#c),|/-':b;h:{c[x]&y+b x};i::x,j-b x:b'j;
l:{h/0|x&d-1}\nj::*|k;K:{k::2#0|x&*|c};s::i&s|i-w-2;D:{s::0|x&d-w};p
x:{D s+w*x};wx:{D s+4*x}\nJ:{K$[S;|\\k[0],x;x]};lx:{J j+x};ux:{J l i
 x,0};hx:{J l i+d*x};mx:{J l s*_x%F}\nkx:kr:{U,:,(-\\k[0],#x;'k_a};e
[k]x};kb:{$[=/k;K j-1 0;];kx\"\"}\ne:{a::?[a;x;y];K(*x)+#y};cz:{$[#u
;e/_`u;]};cc:{9'*k_a};cx:{kx cc`};cv:{kx@9'`}\nf:\"\";g::a$,f;cf:{cg
 f::0'\"^\"};cg:{K(g,1#g)1+g'k}\nt::(a x;q[x]|/7 4 4*~(k'x;2!(,/g)'x
;^(j 9'a)?x:h\\:/s+!:'1+w));q::n 9'a\nA:{r::a::$[#x;1:x;,\"\\n\"];n:
:x;K@#u::()};co:{A@0'\"\"};cs:{n::$[#n;n;0'\"\"]1:a;r::a}\nA@*x;w::_
W%F;x::(r~a)_\"*\",n;y::F*i-s;z::(2'(0;W;F*s;F*d);1'0,t)\n"

And c are these offsets.

1: c
77 155 233 294 373 422 493 574 638 639
You can see here:
c::a$"\n"

c is a view where a is a newline. That's all that is. [closes the interactive console, which makes the "$/e.k"'s window full screen]

b is a view that zero joined one plus c.

b::0,1+c

So c is the offset of all the newlines, one plus that is going to be character after the newline put a zero in front of it: it's the offset of the beginning of every line.

Even when you know the basics of the syntax, having Arthur's reference manual handy helps.

[types "view" to the input bar, which opens a new window to the right half, then types "k.txt" in the input bar]
Verb                Adverb                 Noun(null list)
: gets     self     ' each|bn ': eachprior Bool 0b   011b
+ plus     flip     / over|sv /: eachright Char " "  "ab"
- minus    negate   \ scan|vs \: eachleft  Int  0N   2 23
* times    first                           Flt  0n   2 .3
% div      sqrt                            Sym  `    ``ab
! mod|dct  til|key
& min|and  where    System Verbs           date 2032.03.01
| max|or   reverse  0: write read(line)    time 0T12:34:56
< less     asc      1: write read(byte)
> more     desc     2: sock  open/close    list (2;3.4;`c)
= equal    group    3: http  kson          dict [a:2;b::a]
~ match    not                             func {(+/x)%#x}
, join     enlist                          view f::32+1.8*c
^ except   null
# take|rsh count                           \l a.k  load
_ drop|cut floor                           \t:n x  time
$ cast|pad string   $[c;t;f]               \v variables
? find|rnd unique   ?[x;y;z[;r]]           \f functions
@ at       type     @[x;y;z[;r]]           \w workspace
. eval     dot      .[x;y;z[;r]]           \d directory

function parameters are x,y,z.
unary list needs ",", e.g. 2(atom) ,2(list)
unary verb needs ":", e.g. #'(take each) #:'(count each)
parse: E:E;e|e e:nve|te| t:n|v v:tA|V n:(E)|{E}|[E]|t[E]|N
overload: L!L !D .D i? i$ F$(+/*) S$(ss)

d, up here, is a view

d::(#c),|/-':b

of the length of c, that is the number of lines, joined with the max of the difference of each element of b. So the lengths of each line. So this is the max length of each line joined with the number of lines. This is d, is the rectangular size of whatever value you're looking at.

h, is a lambda function.

h:{c[x]&y+b x}

The first argument is x, second is y if there's one. Functions with more than three arguments are confusing so k doesn't really have them [audience chuckles]. Function calling and array indexing use the same syntax. This is a bit unusual but has interesting and very powerful ramifications. So x applied to b which you can see here (b x) is the same as looking up the xth element of b. So remember b is the offset of each line, and we add y to it. We look up the xth element of c. Here's our familiar bracket syntax. And take the min of both.

So given line x and column y; h calculates the offset inside a but capped at the end of each line.

As we can see k is a very dense language and there is a lot here. When I first program, I start with something that I want to do, the first thing that I did here in k -my first day programming, learning k- was to deal with the fact that my Mac does not have any home or end keys on it. I use Emacs so I'll actually add control-a and control-e. Now to do this, let's look at the documentation for the windowing interface.

[opens another window and types "z2.txt" in the input bar]
EVENT (override with zf)
lx i / leftright(i is -1 or 1)
hx i / homeend
ux i / updown
px i / pageupdown
wx i / wheelupdown

mx v / mouseclick
mm v / mousemove
mk v / mouseselect

kx k / key
kb   / back
kr   / return
kt   / tab
f[1-9] func key
c[a-z] ctrl key(cc cx cv cf cg co cs cz ..)

OTHER
2'(v;w;s;d) / scrollbars
C:0'C / prompt(ESC to escape)
S:    / shift key
t:9'` / cv(get clip)
  9't / cc(set clip)
j 9'a / parens, e.g.  (4)0'"{()}\"\"//asdf" is 0 3
n 9'a / syntax, e.g. ".c"0'"{()}\"\"//asdf" is 000011222222

I can see here that hx handles two dimensional home and end. And ck would be control-k. So I think it should be something like this, [types]:

ca:{hx 0 -1};ce:{hx 0 1}

We can try that out actually. [opens another view, types "whatever" and verifies pressing ca or ce move the cursor].

OK. I'd like to add "delete" as well and I don't think Arthur's implemented this. But where do I start? I know hx moves the cursor, so I start by reading the definition of hx. I see d times x plus i:

i+d*x

remember the d is the rectangular dimensions of the file. This x will be vector. This will produce the number of characters to move. But what is i and l and J?

hx:{J l i+d*x}

[hits control-f in the editor and types "i:" to search the definition of i]

i::x,j-b x:b'j

OK, so this is using the bin (') operator here. So this is finding the first index in b that j is equal-to or greater-than. So we save that line number as x. Because j is some byte offset. So you'll note, this (i::...) is a view and x is being reused, this is not "x, the argument" like it was with h, this is just a scratch variable. So we assign to x that line number that j exists on. We apply that to b (b x:..) and then we subtract it from j. Remember j is the byte offset. So this is going to give us the number of characters in the line that it is. And here's the x again (x,...) which we know is the line number. So i is going to be a view of the y and x coordinates of whatever j happens to be.

[moves to l's definition]
l:{h/0|x&d-1}

OK, so this takes the rectangular dimensions of d. Subtracts one from it. Takes the min of that in x which is our argument. So you see this contraption often. This is just capping it to whatever the dimensions are. So you have x -which is, you know, some 2d coordinates- we're just capping it to whatever d is. And then we apply that over h. Which we'd go over another time if I would had more time. Let's look at J though because that was interesting.

[moves to J's definition]
J:{K$[S;|\k[0],x;x]}

So J applies K. This is what a conditional looks like in the k language, this dollar sign here ($[..;..;..]). S, we can note, down here, is the shift key. So if shift is pressed do this thing (|\k[0],x) otherwise do this thing (x). So when you are unshifted it's going to apply x to K. So let's just go look at what K is. I can see it right here.

[moves to K's definition]
K:{k::2#0|x&*|c}

OK, so this reverses c. You can follow along here (reference manual: "| max|or reverse..."). Takes the first of that, so the last line of c. And then the min of that and x, whatever that is. And we do the max of that - of zero on that. So this is our capping thing again. And we take two from that. This double colon here is going to now not mean "view", it's going to mean "define". So this is going to set the global variable k (k::2#...). So this uppercase K sets the lowercase k to: take two values of the value capped in there capped by zero and the number of lines, c. So this is setting k to two points, two byte offsets. Now, remember last element of c is gonna contain the byte offset, the last line of the file. So this is gonna cap us some place inside the file. And k is taking two, so it's taking two dimensional. So let's go back and look at shift for a moment, we can see that if we're shifted, we keep the original value of k and that's why you can see over here, this is actually how shift-select works. [shows it in the text "whatever"].

So with that in mind, I think we have enough information to actually write "delete", [types]:

cd:{K j+!2;kx""}

OK, So this is j is our byte offset inside the file. But it could be two dimensional. We're going to add- This bang two (!2) means, that it means zero and one, zero and whatever up to whatever value you put there, so "bang five" (!5) would be zero one two three four. So j plus that is gonna do the same rules. It's going to take whatever the current cursor position is and the character after it and assign it to k which does the selection. And then kx double quote (kx"") -you can see what kx does- it sets the key, that's how we insert text to the editor. So if I save that, [hits control-s, opens another view, types "foo again" and deletes the characters one by one using control-d], and delete works. Phew! I hope you enjoyed that. [audience chuckles]

what else?

So, is code bloat inevitable? Code written like this definitely violates what a generally considered "best practices" everywhere.

hello world

main(){puts("Hello world");}  k:"Hello World"
real  0m0.006s                real  0m0.016s
user  0m0.001s                real  0m0.012s
sys   0m0.002s                real  0m0.006s
One can write a tiny hello world that runs fast and without bugs. And it's generally accepted that this doesn't extend to larger programs. I hope that I've demonstrated that this is false.

kOS is not ready for users. I don't know when it will be and I don't really want to talk about that.

* Unnecessary code is a bug and we try to remove it.

* Slow programs are the programmers responsibility.

* Bugs are mistakes we programmers make;
  they are not inevitable or ok.

* The programmer is responsible that the program is used
  correctly.

* "O" complexity ignores that computers are made of hardware

  * Arrays might be O(1) when small but O(log n) when bigger
  * B-trees are a terrible data structure for storing data
  * Hashtables aren't remotely close to O(1)

* "Best practices" are a straw man that make having
  real discussions about making systems difficult.

  * That includes syntax hilighting
  * ...  and object-oriented programming
  * ...  and what is "clean code"?
  * ...  and unit-tests
  * ...  and debian packages
  * ...  and puppet/chef/etc

* Programming correctly is hard!

But I do want to talk about programming and specifically how to program and I hope that by giving you this view of kOS we can have a conversation about how to program. That's a little different. There turns out to be a lot of usefulness that follows from this idea that I'm pointing to here... whatever this is [shows the screen running kOS]. And I am certain that because it's so useful that this may be evidence that we as a human race don't actually understand programming very well. It's because of that the term best practices so suspicious to me, it's actually banned where I work, the term. I mean you have to justify even if everybody else is doing it.

Just some final notes, I get asked about something called kona that maybe is easier to find on the internet. It is an open source re-implementation of an old version of k. I don't know very much about it. I know that it's bigger and slower than k. I know that it does not contain kdb. And so probably does not represent k very well if you get it looking to learn k, this is not going to help. If you want to try k, you can just download the 32bit version of kdb from kx's website. And just push backslash (\) to get from q to k. And I've covered about as much as I can in the time that has been available to me. So if anybody wants to talk about programming, get in touch. Last four lines by the way,

n:0;p:0:"p.md";co:{p::0:0'""}
c::(&"#"=*:'p),#p;a::*(1 0+c[n,n+1])_p;x::($n),p c n
z::{1'((x**F),0;a[x])}'!#a;
lx:{2'(0;W;7);n::|/0,&/(-2+#c),n+x}
are the source code to the presentation tool that you are looking at there. [audience laughs]. Thank you. [applause]