This is an EBNF grammar for ANSI C (C99), and it contains almost every rule. It may be missing things, so please tell me if you notice any gaps.

I am writing a C compiler with my own backend, and hopefully my own frontend, in OCaml. That is why I wrote this grammar. I have also written an AWK grammar, but it’s not uploaded anywhere. Tell me if you want it.

Thanks.

  • sim642@lemm.ee
    4 months ago

    I am currently writing a C compiler, with my own backend (and hopefully, frontend) in OCaml.

    But why write your own C frontend? It’s much more of a pain than people imagine. I maintain a C frontend implemented in OCaml (the project itself goes back 25 years) and it’s still not on par with GCC or Clang.

    For any other language, sure, but C has so many “wonderful” features, starting with the lexer hack. Your grammar conveniently overlooks this issue but it’s something you’ll have to deal with to actually implement it. So it simply won’t be as nice as theory suggests.
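For readers who have not met the lexer hack: a C lexer cannot tell whether an identifier is a type name or an ordinary identifier unless it knows which names `typedef` has introduced so far, so information has to flow back from parsing to lexing. Below is a minimal Python sketch under heavy simplifying assumptions: a single global scope, pre-split tokens, and a typedef whose declared name is the identifier right before the `;`. The names `classify` and `typedef_names` are hypothetical, and a real frontend would track scopes and update the table from the parser.

```python
# Toy illustration of the "lexer hack": the kind of an identifier token
# depends on a table of typedef names that is filled in while parsing.

KEYWORDS = {"typedef", "int", "char"}
PUNCTUATION = {"*", ";"}


def classify(tokens, typedef_names):
    """Tag each token, consulting and updating the set of typedef names.

    Simplification: the name declared by a typedef is assumed to be the
    identifier immediately before the terminating ';'.
    """
    tagged = []
    in_typedef = False
    for i, tok in enumerate(tokens):
        if tok in KEYWORDS:
            in_typedef = in_typedef or tok == "typedef"
            tagged.append(("KEYWORD", tok))
        elif tok in PUNCTUATION:
            if tok == ";":
                in_typedef = False
            tagged.append(("PUNCT", tok))
        elif tok in typedef_names:
            tagged.append(("TYPE_NAME", tok))
        else:
            tagged.append(("IDENTIFIER", tok))
            next_is_semicolon = i + 1 < len(tokens) and tokens[i + 1] == ";"
            if in_typedef and next_is_semicolon:
                # Feedback step: later tokens spelled the same way
                # will now be tagged as TYPE_NAME.
                typedef_names.add(tok)
    return tagged
```

With an empty table, `classify("T * x ;".split(), set())` tags `T` as an ordinary identifier, so the statement reads as a multiplication. After `typedef int T ;` the same three tokens start with a type name, so the statement reads as a pointer declaration. An EBNF grammar alone cannot express that dependency.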

    • I think digraphs and trigraphs are part of the preprocessor? I did not add any preprocessor stuff to this grammar. I am adding them to the new version I am working on.

      I have read the C17 standard in full and drew on it from memory from time to time, but it seems I had forgotten a lot of it. I am redefining the grammar, and I am redesigning my AWK grammar too.

      I am hoping to make a GitHub Pages website called Internet Grammar Database that collects all sorts of grammars. Thoughts?

      • navigatron@beehaw.org
        4 months ago

        I love grammars. It’s like an API or a data schema, but for a language. This would be very cool and I would love to see it!

      • OmnipotentEntity@beehaw.org
        4 months ago

        Trigraphs are handled by the preprocessor, so if you’re not handling that, then that’s fine. Digraphs are handled by the tokenizer, however.
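To make the distinction above concrete: trigraphs are replaced textually before any tokens exist (translation phase 1 in the C standard), so they are rewritten even inside string literals, while digraphs are just alternative spellings of punctuator tokens and only matter once the tokenizer has recognized a punctuator. A minimal Python sketch, assuming hypothetical helper names `replace_trigraphs` and `canonical_punctuator`; the two tables follow the C99 lists of trigraph sequences and digraph punctuators.

```python
# Trigraphs: replaced on the raw source text, before tokenization.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# Digraphs: alternative spellings of punctuator tokens.
DIGRAPHS = {
    "<:": "[", ":>": "]", "<%": "{", "%>": "}", "%:": "#", "%:%:": "##",
}


def replace_trigraphs(source):
    """Scan left to right and replace every trigraph sequence.

    This is purely textual, so string literals are affected too.
    """
    out = []
    i = 0
    while i < len(source):
        chunk = source[i:i + 3]
        if chunk in TRIGRAPHS:
            out.append(TRIGRAPHS[chunk])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


def canonical_punctuator(spelling):
    """Map a punctuator token's spelling to its primary form.

    Meant to be called only on tokens the tokenizer has already
    classified as punctuators, so text inside literals is untouched.
    """
    return DIGRAPHS.get(spelling, spelling)
```

For a grammar that starts at the token level, this suggests leaving trigraphs out entirely and listing the six digraph spellings as alternatives in the punctuator rule.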