Drennuz

I strove with none, for none was worth my strife

Servers in Python

Python comes with a set of servers (from low to high level): socket –> socketserver –> HTTPServer.

Socket is the low-level networking interface. Basic steps on the server-side:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) #TCP/IP, create socket
s.bind(('127.0.0.1', 8080))
s.listen(5)
while True:
    conn, addr = s.accept()
    data = conn.recv()
    #processing
    conn.send(response)
    conn.close()

Socketserver simplifies this process by wrapping all socket creation, bind, listen, accept into creating a socketserver instance, and use a requestHandler for all processing.

Again server-side:

class myTCPHandler(socketserver.BaseRequestHandler):
    def handle(self): #must override this function
        data = self.request.recv() #self.request is a socket object for stream services
        #processing
        self.request.send()

s = socketserver.TCPServer((HOST, PORT), myTCPHandler)
s.serve_forever()

All the above are very easy to write and extremely flexible, but also easy to get messy. HTTP Server , especially used with CGIHTTPRequestHandler, help organize code nicely and still allow dynamic content generation. (CGIHTTPRequestHandler mirrors exactly SimpleHTTPRequestHandler in serving static local directory therefore I’ll skip SimpleHTTPRequestHandler).

The simplest cgi server has three parts:

1. main program

handler = http.server.CGIHTTPRequestHandler
httpd = http.server.HTTPServer((HOST, PORT), handler) #HTTPServer is actually TCPServer but storing server_name and server_port, which CGIHTTPRequestHandler requires. 
httpd.serve_forever()

2. your index.html page, in which you specify the cgi script (cgi-bin/*.py) in form’s action attribute.

3. cgi script, placed in cgi-bin/ directory. Several quick points on syntax:

  3.1 first line is the python executable path: #! /usr/local/bin/python3.3

  3.2 get form data through: form = cgi.FieldStorage(). This is a dictionary-like object where you access values using form['key'].value.

  3.3 Python executes all cgi scripts as ‘others’, so make sure permission is granted properly: chmod 0755 scriptName

  3.4 cgi builds response through standard output (print), and remember to print an empty line between HTTP headers and HTML section.

cgi allows nice de-coupling of codes and is in someway very much like a mini-django, but much less daunting and in my mind better (or more flexible) for personal small projects.

Lexer and Parser: basics

For a new comer in programming, the concept of lexing and parsing is probably a major obstacle after the usual HelloWorld program. This post aims to demonstrate the basic idea behind, using ocamlyacc and ocamllex.

First, why ever use lexer and parser? 1. to validate input. 2. To convert complex file format into a data structure for further processing. The latter reason is what’s really useful.

We will use a simple example here. Imagine we have a file containing 1 integer followed by 3 floating numbers each line, something like this:

1 1.1 1.2 1.3 
2 2.1 2.2 2.3
3 3.1 3.2 3.3

Now we want our program to read this file, and output a data structure of (int * float * float * float) list .

Steps involved would be: first the lexer reads the file and return tokens, then the parser process the tokens to return the desired list.

Let’s start with the parser first — since parser does not depend on anything else.

A very simple parser (with extension .mly) looks like this:

%token tokenDefinitions
%start entrypoint
%type  entrypoint
%%
groupName:
    rule1 {action}
    |rule2 {action};
groupName2:
    rule1 {action}
    |rule2 {action};
...

tokenDefinitions are Tokens you define, in this case we’ll define EOF, Int, Real, CR.
entrypoint is the name of the parsing function. returnType is the return type of the entryPoint, in our case (int * float * float * float)list .

groupName are just names for a group of parsing rules, each parsing rule will perform one action when faced with one rule, rule simply being possible patterns of tokens. Note actions must return the same returnType.

Therefore, our myparser.mly file would look like this:

%token CR EOF
%token  <int> Int
%token  <float> Float

%start main
%type <(int * float * float * float)list> main
%%

atom:
    Int Real Real Real {($1,$2,$3,$4)};

main:
    atom CR main {$1::$3}
    |EOF {[]}
    |atom EOF {[$1]};

Next let’s deal with the lexer file (file extension .mll). The basic structure is:

{
open Myparser
other code snippets
}

regular expressions

rule ruleName = parse
    regEx1 {token1}
    |regEx2 {token2}

Since the lexer will use the token definitions given in the parser, you need to include the parser first. In the curly braces you can also write other helper functions.

The next section is a bunch of named regular expressions so that you can use them in the next lexing step.

The final rule session is just to match each regular expressions and return the corresponding token.

ruleName can be regarded as a function name taking lexbuf as input and return tokens, which will be used in the main function in the parser (to be discussed later).

Having defined all our tokens, our lex file would look like this:

{
    open Myparser
}

let digit = ['0' - '9']
let floating = (digit+ '.' digit* | digit* '.' digit+)

rule token = parse
    [' ' '\t'] {token lexbuf}
    |digit+ {Int (int_of_string (Lexing.lexeme lexbuf))}
    |'\n' {CR}
    |eof {EOF}
    |floating {Real (float_of_string (Lexing.lexeme lexbuf))}
    |_ {failwith "lexing error!"}

Next step would be to compile and linking:

ocamlyacc myparser.mly
ocamlc -c myparser.mli
ocamlc -c myparser.ml
ocamllex mylex.mli
ocamlc -c mylex.ml
ocamlmktop myparser.cmo mylex.cmo -o myparser.top

Now you have a custom top-level to play with.

How does this work finally? If you open myparser.ml and look for a function called main, you’ll find something like this:

let main (lexfun: Lexing.lexbuf -> token) (lexbuf: Lexing.lexbuf) = ...

This main function is the function we defined in our parser file (in fact if you name the function to something else it’ll change in myparser.ml accordingly). lexfun is the token function we write in our lex file. Now you know how everything combines together.

Now create a test file, run the top-level and call the main function:

let lexbuf = Lexing.from_channel (open_in "test.dat") in Myparser.main Mylex.token lexbuf;;

You should see the function return an (int * float * float * float) list as expected.

Congratulations!

 

Set up Git (and GitHub)

I am on Mac OS 10.7.5

Install and set-up

The set-up page on GitHub is a nice place to start with. I had one problem following the page though: the curl command didn’t work. As a work around I manually downloaded the osxkeychain (by copying the link to a web browser).

Basic commands

Here are the 3 commands almost enough for basic personal GitHub usage:

  • git add filename
  • git commit -m “comment”
  • git push remote branch


A few comments:

1. git add filename  | git add .

To avoid adding unnecessary files (such as logs and compiled objects) when using git add . , you can add a .gitignore file listing all the files (wildcards supported) to be excluded. In case some of them are already committed, use git -rm –cached filename 

2. git push remote branch

To display the current named remote and branches (as I always forget), use git remote | git branch.

 

Chapter 1. Basic Instruments. 

1. Interest rates

  • derivative traders use LIBOR as short-term risk-free rates.
  • TED spread (3m T-bill vs 3m LIBOR): measure of liquidity in interbank lending
  • corporate curve usually quoted in credit spread over relevant LIBOR curve.
  • callable bond: reinvestment risk. When bond price goes up, issuer more likely to buy back. harder for investor to enter a better deal in a low-interest environment.

2. Equities.

  • repo rate: earned for lending stock to short-sellers (stock loans)
  • income of carry: dividend + repo
  • forward: mark-to-market = 0 at inception
  • basis swap: floating for floating
  • swap: fixed rate determined for 0 MTM value at inception.
  • ccy swap: notional change hands; interest payments not netted. Help raise funds at lower interest rates.

 

< Success Secrets of Sherlock Holmes >

Being a Sherlock Holmes fan, this is a natural buy for me.

> On Sherlock Holmes

-       a steady temperament, an unflappable personality

-       unswerving confidence, laser-like focus

> “A Passion for Definite and Exact Knowledge”

-       Dare to be a little extreme

> “An extraordinary genius for minutiae”

-       boring…but very, very good at what he does.

-       You can’t decide to be conscientious only some of the time; excellence doesn’t work that way.

> “A capital mistake”

-       It was better to keep his mouth shut until he was absolutely sure of the facts.

> “Feed your passion”

-       without giving a second thought to anyone’s opinions -> think of all the time he saved by not having to explain himself!

-       substance wins out over style every single time.

> “Let us calmly define our position”

-       didn’t feel the need to constantly act like a genius

> The fun stuff comes at the end

-       We’d like to fast-forward through the boring parts of life to get to the moments of real fun and rewards.

-       Don’t get discouraged. The bigger your dream, the higher your goals, and the more time and preparation you’ll need to devote to it.

> Rebel at stagnation

-       be prepared to do double-duty and put in long hours; chances are you won’t be able to quit your day job right away.

-       Like a volcano, the inner fire will burst the crusts which confine it and pour forth its pent-up genius in eloquence, in song, in art, or in some favorite industry.

-       Success won’t necessarily come easily or quickly. Both a doctor and a writer — “medicine in the day, sometimes a little writing at night”.

> Lexicon

-      Like all other arts, the Science of Deduction and Analysis is one which can only be acquired by long and patient study.

-     They say that genius is an infinite capacity for taking pains.

-     My dear Watson, I cannot agree with those who rank modesty among the virtues. To the logician all things should be seen exactly as they are, and to underestimate one’s self is as much a departure from truth as to exaggerate one’s own powers.

Structure of BASIC interpreter in OCaml

This is an example of a simple BASIC interpreter given in the book <Developing applications with OCaml>. It is a very good example for rookies as it starts with BNF definition and goes into all the details of implementation. However I found the organization of the chapter a bit scattered therefore decides to write down my own understanding.

  •  Basic definition structure: expression -> command -> program = list of command; phrase = “super-command” used in interactive environment.
  • 2 main stages: 1). Parsing. From user input (string) to phrase . Trick is to match the first lexeme and use an int to record current parsing position.  2). Execution. In which different actions triggered according to the phrase. The key here is the concept of state updating. 
  • In parsing, the output phrase is determined through matching the first lexeme: cases of “LIST”/”RUN”/”END” are easy; for Lint, the following command needs to be parsed.
  • Again, parse_cmd matches the first lexeme of the command, and uses parse_exp (where stack building and collapsing are used).
  • In execution, we use a list to store commands. Therefore case Line and List are trivial: simply insert command to the list according to the line number or print out all commands (pretty printing; priority and parenthesis).
  • For RUN, firstly we assemble the command list, meaning putting things in right order, i.e. taking care of branching (also convert to array for constant access time). Then the list is executed resulting state updating, mainly meaning variable updating (the very basic idea of a computer). In this process eval_cmd / eval_exp is used, which do the real calculations.

I have also attached a simple “flow-chart” for the whole process. Forgive the handwriting as I don’t think it very meaningful to put much time on fancy tools to create a fancy chart.

Work log 20120827

This is a reminder, about how to deal with such things better next time.

A few days back, we had one desk breaching its VaR limit.

This desk has a long messy VaR history. Due to some wrong security mappings, its VaR was overstated more than 2x time a year back. I wrote a database tool to identify the wrong mappings and with that we brought down the VaR to a reasonable level until then.

With the trader asking why, it was very natural to go with the original solution. So I run that database tool, and found one basket with stale weighting — not updated for nearly a year.

I told my boss, and we were both over the moon. I shot out an email telling everyone the problem is now solved, and went home with proud.

The trader actually raised his doubt: why did the VaR jump only recently if the basket is stale for almost a year? But we didn’t pay much attention to that.

The next day they run VaR again, with the updated basket, but VaR didn’t change.

I was furious, thinking this was impossible, and shot emails around — asking them to re-run, asking IT to check if the basket is really updated — made a mess all around. Yet nothing changed.

Then my boss asked me to cool down, and we started afresh using a new approach. We looked at the risk drivers’ history, and spot a sharp position increase in one of the index pairs. Then we checked CVaR, and found one side of the pair was missing due to a wrong mapping. The operation team corrected the mapping, and today VaR is down.

Looking back, I made at least 3 mistakes:

  1. I was too obsessed with my old victory, and ignored other possibilities.
  2. I overlooked trader’s doubt about the timing of VaR jump, which is an absolutely valid doubt.
  3. Most importantly, I was too focused on making an impression, by trying to solve the problem quickly, instead of correctly. I was desperately trying to show off.

Although in principle MRM should not do this kind of cleansing work, this is indeed a good lesson: Be humble, and never work for the admiration of others.

OCaml Notes

This serves as a store of frequently-used modules in OCaml. Will keep updating. The plan is to gradually create my own OCaml codebase.

  • skeleton: f_init; f_end; f_mouse; f_key; f_except. Main function: “go” with configuration.
  • readfile: return a list of lines, using recursive — line :: (aux_f ())
  • box drawing (box_config; draw_box_outline; draw_box)
  • random number generation: seed generation
  • sort (List, Array)

————-

Also some reminders:

  • List vs Array: List allow easy append/delete (using h::t recursively -> ?). Array: constant access time to single elements.
  • Use a higher order function to iterate a function that operates through side effects.
  • Basic pattern in GUI application: abstract logic; GUI; user interaction (skeleton/loop);  one config for one stage?
  • Compilation (linking / executable); bytecode compiler vs native compiler
  • Module Digest: hash string to fix-length string (fingerprints), used in cryptography
  • Persistence: save state to external memory / harddisk (compared to RAM in execution); Module Marshal: linearize data structures (concrete only; static-typing not preserved)
  • To save a bitmap more efficiently: create a table containing distinct colors only; store index of color to a string (char_of_int) for a bitmap (color array array); marshal the string to a file. This way you save 5x size.
  • To use Sys.argv in stand-alone executables: main () in src code.

————–

Questions: (for 2nd read)

  • Closure: reference to non-local (w.r.t function) variables.
  • Compilation: bytecode vs native code; stand-alone executables; Zinc
  • gprof doesn’t work?
  • Lazy data structure; Stream (naming inside as recursive call of parser)
  • Option type; Some / None
  • #skipped#: chapter 11 exercise

————–

References:

  • Load dynlink and camlp4o for Stream: http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#htoc93

————–

Chap7 Compilation

  • native compiler produce standalone executable by default; bytecode compiler: need to switch on -custom flag; otherwise producing executable dependent on ocamlrun

Chap 9: Garbage Collection

  • dynamic memory allocation (heap/stack): explicit / automatic
  • stack: values, function calls; heap: reference pointers
  • explicit: malloc & free. Memory leak & dangling pointer.
  • automatic: reference counter (immediate free; can’t handle circular structure); sweep: graph of accessible values starting from external roots.
  • sweep: mark-and-sweep (grey -> black; white -> hached; black -> white); stop-and-copy (from-space -> to-space; reverse)
  • Other: generational (young vs old); conservative (for immediate value vs address); incremental (can be interrupted and continued)
  • OCaml: 1. minor stop-and-copy; 2. major mark-and-sweep. 3. another mark-and-sweep for interrupted marked cells; 4. compact.
  • Module Gc
  • Module Weak (pointer to re-usable area): ‘a Weak.t = ‘a option array; option = None | Some of ‘a; Weak.create / get / set / check
  • Example: image cache. cache = (filename, image) Weak.t Pointer to the current image, overwrite when cache is full. initialize / search / load (search first, then load from disk)

Chap10: Program Analysis Tools

  • Dependency: ocamldep; makefile-like output
  • Debug: trace -> interactive toplevel; parameters & outputs; don’t support polymorphic/local functions (toplevel only keep global monomorphic types) | debug -> Unix only. ocamlc -g
  • debug: Step -> step-by-step execution; break @ -> breakpoints; print -> Inspect / modify values.
  • profiler: take measurements (#calls of functions, time spent, etc) for optimization. Native code compiler (time spent; gprof *.exe): ocamlopt -p; bytecode compiler (#function calls; ocamlprof *.ml): ocamlcp -p

Chap11: Lexical analysis

  • Lexer (char string -> words); Parser: whether stream of lexeme belong to a grammar
  • Stream: match stream with parser: destructive pattern matching; match -> evaluate -> remove; not exhaustive matching; usage: match first element, delete.
  • Str: to determine a string belong to a reg expression. Common usage: search_forward -> matched_string -> split -> global_replace
  • Genlex.make_lexer: take keyword, return Stream of lexemes.
  • ocamllex: generate lexer on .mll file; lexer function with user-defined args and Lexing.lexbuf, return lexemes
  • grammar: terminal, non-terminal, production rule, start symbol
  • Top-down parsing: start with start symbol until only terminals (left-to-right); bottom-up parsing: start with streams of lexemes until start symbol produced (right-to-left; shift-reduce; stack implementation; ocamlyacc).
  • mly: define tokens (i.e. terminals); define return type of start symbol; define grammar rules, each entry point is a function with lexer + lexbuf as arguments and return a type. Take care of EOL, space and tab.
  • mll: take in regexp, return token (as defined in mly).
  • compile: lexer.mll -> parser.mly -> parser.mli -> lexer.ml -> parser.ml -> main.ml -> executable

Chap 12 Interoperability with C

  • C function: parameters of type value (defined in /usr/local/lib/ocaml/caml/mlvalues.h). Compile: gcc -c -I/usr/…
  • ml file: external:type = “c_name”
  • when #parameters > 5, bytecode compiler: parameters = array of parameters, int paramLength.
  • value representation: int / constant constructor -> int; structural values: pointer to heap. header(size, color, tag – type) + size
  • calling ocaml functions in c: Callback.register “name” f; * (caml_named_value “name”); callback(f, value)
  • exception handling: 1. raise in C and catch in OCaml: Callback.register_exception; caml_raise_constant (or others) . 2. raise in OCaml and catch in c: caml_callback_exn, Is_exception_result, Extract_exception
  • If use C as entry point, compile .ml to .o,  then use cc with runtime library -lcamlrun / -lasmrun
  • store C data type in OCaml heap: use Abstract_type. If Abstract_type pointing to other data structure out of OCaml heap, to avoid memory leak, use finalizing function.

Chap 14 Modules

  • .ml file: implicitly struct…end; .mli file: implicitly sig…end
  • Modules: hide some data types and functions; one module can be applied to multiple signatures –> control levels of access

atobm

After some effort in googling, I finally got the logic behind atobm conversion. And my credits go here.

1. Convert the bitmap (#s and -s, or other self-defined characters) to 1s and 0s.

2. re-write, starting from right, padding with 0s in the front to make a string with length as multiples of 8.

3. Convert each byte to hexadecimal code, then reversely write down those hex codes. Done.

Pivoting a linear system

I encountered this problem when doing one exercise for OCaml.

Pivoting a linear system is to swap rows so that the pivot element (a[i][i]) in each row is not zero, otherwise the system is not solvable.

So far I’ve only come up with 1 algorithm, and this is the same as the solution here. Basically the algorithm searches forward, stops when it finds a non-zero element for that position, then swap the two rows.

For example, to start with this system:

0    5    10

2    7    8

9    0    3

Pivot in row 0, a[0][0] is 0, so the algorithm searches forward, where it finds a good candidate, a[1][0] in row 1, then simply swap the two rows.

This algorithm might work in most of the times, but won’t work for the below case:

0    1    9

3    0    0

5    6    0

In this case the algorithm will end up with:

3    0    0

0    1    9

5    6    0

And give a no-solution error, while this is a simple system indeed solvable.

Am still thinking a more complete and still elegant way to solve this problem.