Show HN: A dynamic C (Hot reloading) module-based Web Framework

128 points by warothia 8 months ago

openasocket 8 months ago

Cool! Looks like it uses dynamic loading, which is certainly workable. One downside is that while dynamic loading is well-supported, dynamic unloading generally is not. That means if you are using dynamic loading and reloading for hot updates, the server process will start to accumulate memory. Not a huge issue if the server process is periodically restarted, but it can potentially cause weird bugs.

You might be able to make something more robust by forking and loading the modules in a separate process. Then do something fancy with shared memory between the main process and the module processes. But I haven’t looked into this much

delusional 8 months ago

Once you start forking wouldn't it make more sense to just exec? Woops, CGI.
shakna 8 months ago

Doesn't dlclose unload things...?
> If the reference count drops to zero and no other loaded libraries use symbols in it, then the dynamic library is unloaded.
- stevenhuang 8 months ago
  
  It's implementation defined. The dlclose may be a noop and that's fine as far as POSIX is concerned.
  So generally you have no guarantee that your libc actually unmaps the shared object, for various reasons. There are ways to get it to unload, but that entails digging around platform-specific dlopen flags and ensuring symbols in your shared object don't use certain load types (unique/nodelete). Thread-local storage/destructors further complicate things.
  In some libcs, musl for example, dlclose doesn't do anything and just leaves things to be unloaded on program exit.
  - shakna 8 months ago
    
    Look at musl: dlclose marks a library as invalid, so when the next reallocations happen the pointers get reused. It's garbage collected, but things get unloaded.
    Though, you do seem to be correct that POSIX allows things to remain in the address space:
    > The use of dlclose() reflects a statement of intent on the part of the process, but does not create any requirement upon the implementation, such as removal of the code or symbols referenced by handle. Once an object has been closed using dlclose() an application should assume that its symbols are no longer available to dlsym(). All objects loaded automatically as a result of invoking dlopen() on the referenced object shall also be closed if this is the last reference to it.
    > Although a dlclose() operation is not required to remove structures from an address space, neither is an implementation prohibited from doing so. The only restriction on such a removal is that no object shall be removed to which references have been relocated, until or unless all such references are removed. For instance, an object that had been loaded with a dlopen() operation specifying the RTLD_GLOBAL flag might provide a target for dynamic relocations performed in the processing of other objects - in such environments, an application may assume that no relocation, once made, shall be undone or remade unless the object requiring the relocation has itself been removed. [0]
    [0] https://pubs.opengroup.org/onlinepubs/009696799/functions/dl...
    
    stevenhuang 8 months ago
    
    > Address space from a library remains tied up even after dlclose on musl, so opening and closing an infinite family of libraries will eventually consume the entire address space and other resources
    From musl docs https://wiki.musl-libc.org/functional-differences-from-glibc...
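
A minimal sketch of the reload pattern this subthread is debating, assuming a POSIX system with `dlopen`. The `module_slot` type and `module_reload` function are hypothetical names for illustration and are not taken from the project. The sketch loads the new build first and only then closes the old handle, and it treats `dlclose` as a statement of intent, as the POSIX text quoted above allows.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for one loaded module; not taken from the project. */
typedef struct {
    void *handle; /* handle returned by dlopen(), NULL if nothing is loaded */
    void *entry;  /* address of the looked-up entry symbol */
} module_slot;

/* Load `path`, look up `symbol`, and only then swap out the old handle.
 * Returns 0 on success, -1 on failure (the old module stays in place). */
static int module_reload(module_slot *slot, const char *path, const char *symbol)
{
    assert(slot != NULL && path != NULL && symbol != NULL);

    void *fresh = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (fresh == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before dlsym() */
    void *entry = dlsym(fresh, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(fresh);
        return -1;
    }

    if (slot->handle != NULL) {
        /* Statement of intent only: the libc may or may not unmap the
         * old object, so memory can still accumulate across reloads. */
        dlclose(slot->handle);
    }
    slot->handle = fresh;
    slot->entry = entry;
    return 0;
}
```

One caveat: calling `dlopen` again on an object that is still loaded returns the existing handle instead of re-reading the file, so a reloader along these lines would probably need to write each build to a fresh filename (an assumption about the build step, not something the thread states).
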
warothia 8 months ago

Oh really interesting, will look into it! Was afraid it would leak memory.
- mst 8 months ago
  
  You might not necessarily need even shared memory - it's possible to pass a file descriptor over a socket on all modern unices (and, albeit with a completely different API, also on Win32), so your control process could do an accept(), maybe read the headers in there, then talk to module processes to determine what the desired approach is, and then hand over the http socket so the module process can do whatever it needs to with it.
  When I imagine how I'd use this in my head the imagined design rapidly gets much more complicated than what you're currently doing, and I'm not at all arguing that that complexity is necessarily worth it ... but there's also all sorts of cool+weird things you could implement that way that would be exceedingly tricky otherwise, so I figured I'd point it out anyway :D
  (the example of code that does the relevant magic to get fds across a socket that immediately springs to mind is https://fastapi.metacpan.org/source/MLEHMANN/IO-FDPass-1.3/F... - yes, warning, it's inside a perl extension, but I see no reason that would impede you borrowing the C parts if it was useful ;)
  - warothia 8 months ago
    
    Initially I did use fork a lot to run each handler in its own “process”, mainly because I wanted to isolate it as well, in a container fashion. However, the overhead became too much at the start when I just wanted it to work. Will be looking into it again!
    
    mst 8 months ago
    
    I think some stuff will be happier in the "director" process and some happier in its own "worker" process - and fd passing is the magic trick that lets you accept() in the former and still do the bulk of the work in an appropriate instance of the latter.
    I 100% get the 'overhead' part - but at some point hopefully you'll have enough other stuff already running that the 'fun' factor of enabling that will win out :D
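
For reference, the fd-passing trick described in this subthread can be sketched roughly as follows on a Unix-like system, using `sendmsg`/`recvmsg` with an `SCM_RIGHTS` control message over a Unix-domain socket. The names `fd_ctrl`, `send_fd` and `recv_fd` are hypothetical and the error handling is deliberately thin; this is not the code from the linked Perl extension.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h> /* close(), for callers that drop their copy after sending */

/* Control buffer sized for exactly one descriptor, aligned for cmsghdr. */
typedef union {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
} fd_ctrl;

/* Send descriptor `fd` over Unix-domain socket `sock`. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    assert(sock >= 0 && fd >= 0);

    char byte = 'F'; /* one byte of ordinary data travels with the fd */
    struct iovec iov;
    iov.iov_base = &byte;
    iov.iov_len = 1;

    fd_ctrl ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new fd, or -1. */
static int recv_fd(int sock)
{
    assert(sock >= 0);

    char byte;
    struct iovec iov;
    iov.iov_base = &byte;
    iov.iov_len = 1;

    fd_ctrl ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

In the director/worker split described above, the director would presumably call `send_fd` with the socket returned by `accept()` and then `close()` its own copy, while the worker calls `recv_fd` and serves the connection.
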

warothia 8 months ago

This hobby project is inspired by kernel modules and AWS Lambda. It lets you write raw C modules, upload them, and have the server compile and run only the module itself at runtime—no need to recompile the entire application or restart the server.

You can update routes and modules dynamically. Even WebSocket handlers can be updated without dropping existing connections.

dakom 8 months ago

Love this :)

Small, related anecdote: back in the year 2001 or so, this is in the same family of how I built websites...

The difference is I wrote it in C as Apache Modules. So, like, most people were using other people's C modules (like PHP or CGI), but once you dug deeper and you wrote your logic/site _as_ a C module, it was so much more fun (and powerful too).

I didn't have much of a templating language, so the answer to "can we change the text on this page?" was usually, "sure, just give me a few minutes to recompile the whole Apache server" :D

Fun times

warothia 8 months ago

A few others have brought up Apache Modules, and they are incredibly similar to my idea. :D I did not know about them while I was developing it. The main difference, as far as I could see, was that you had to recompile / restart the server, which I try to avoid, so small changes require almost no recompiling.
- mst 8 months ago
  
  In apache 1.3 it was far from unusual to do a complete bundled compile and restart the entire thing every time because there were gremlins that showed up when you dynamically loaded the more complex modules often enough that it was operationally less aggravating overall to take the brute force approach (I did quite a bit of that a couple decades back, for my sins).
  apache 2+ is a very different (and rather more robust) beast, and also has the 'graceful restart' system - see https://httpd.apache.org/docs/2.4/stopping.html - which makes the parent tell its worker processes to drain their request queues, -then- exit, after which each one is replaced in turn until you've fully upgraded to the new configuration+code.
  This approach has its disadvantages, of course, but not that morally different from how erlang processes hot reload into new code, and once you knew what you were doing the end result was simple, predictable, and nicely transparent to end users.
- mananaysiempre 8 months ago
  
  You might also want to look into ISAPI extensions[1,2] in Microsoft’s IIS, those are also just DLLs that the web server loads into itself, and were once advertised as the most performant way to serve dynamic stuff from it. It doesn’t look like there’s a way to request that extensions be reloaded, though: the server either unloads them at its discretion (once no in-flight requests are using them?) or not at all (if “extension caching” is enabled). But there’s an advert[3] from somebody who shimmed that capability onto it back in 2006.
  (You wouldn’t have had a good day debugging these things, mind you. But it’s something that people experimented with back in the day, alongside Web servers programmable in Java[4] or Tcl[5].)
  [1] https://learn.microsoft.com/en-us/previous-versions/iis/6.0-...
  [2] http://library.thedatadungeon.com/msdn-1998-06/IISRef/devdoc...
  [3] https://www.iis.net/downloads/community/2006/12/isapi-loader
  [4] https://www.w3.org/Jigsaw/Overview.html
  [5] https://wiki.tcl-lang.org/page/AOLserver
- incanus77 8 months ago
  
  Nah, you could just send a SIGHUP and not have to fully restart.
  - warothia 8 months ago
    
    Oh! Did not know that, interesting. I guess they are more alike then.
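
A signal-triggered reload like the SIGHUP one mentioned above is commonly wired up along these lines. This is a sketch under assumptions: the names `reload_requested`, `install_reload_handler` and `take_reload_request` are hypothetical, and nothing here is taken from Apache or from the project. The handler only sets a flag; the actual reload would happen in the main loop.

```c
#define _POSIX_C_SOURCE 200809L /* for sigaction() under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the signal handler, consumed by the main loop. */
static volatile sig_atomic_t reload_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    reload_requested = 1; /* only set a flag; do the real work elsewhere */
}

/* Install the SIGHUP handler. Returns 0 on success, -1 on error. */
static int install_reload_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted syscalls such as accept() */
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called from the main loop between requests.
 * Returns 1 if a reload was pending (and clears the flag), else 0. */
static int take_reload_request(void)
{
    if (!reload_requested)
        return 0;
    reload_requested = 0;
    return 1;
}
```

The main loop would call `take_reload_request()` between requests and reload modules or configuration when it returns 1, so in-flight work is never interrupted halfway.
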
adamrezich 8 months ago

Where can one find resources about writing Apache Modules?
When I was experimenting with writing my own HTTP server, I eventually figured out that I'm not really interested in writing my own production-quality server from the ground up—instead, I might be interested in just writing an application layer, in the form of a module for Apache, or nginx, or something. But the resources to create such modules seem to be scarce and/or hard-to-find.

dchristian 8 months ago

I keep wondering how this compares to Zig and WASM?

Zig has a nicer syntax with fewer foot guns than C. It can also compile or link with C.

warothia 8 months ago

It would be really interesting if it was possible to use other languages which can be compiled to .so files. Will look into it!

Gollapalli 8 months ago

Rad. Combine this with wasm on the front end and you’ve got full stack C web development.

iso8859-1 8 months ago

Can you use dlopen in wasm already?

revskill 8 months ago

How to integrate with an external library?

warothia 8 months ago

If you want to link with external libraries, you would need to modify the server and make sure it has access to them when compiling the modules.
nurettin 8 months ago
You would need to change some code, or parameterize this:
```
    #define LIBS "-L./libs -lmodule -ljansson"
```
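
One way to parameterize that macro could look like the sketch below. Only the default flag string comes from the snippet above; the function name `build_compile_cmd`, the `cc -shared -fPIC` command line and the idea of passing the flags as an argument are assumptions for illustration, not the project's actual build step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Default taken from the snippet above; used when no override is given. */
#define DEFAULT_LIBS "-L./libs -lmodule -ljansson"

/* Hypothetical helper: format the command that compiles one module.
 * `libs` may be NULL to fall back to DEFAULT_LIBS.
 * Returns 0 on success, -1 if the command does not fit in `out`. */
static int build_compile_cmd(char *out, size_t cap,
                             const char *src, const char *so,
                             const char *libs)
{
    assert(out != NULL && src != NULL && so != NULL);

    if (libs == NULL)
        libs = DEFAULT_LIBS;

    /* Assumed command line; the real server may invoke the compiler differently. */
    int n = snprintf(out, cap, "cc -shared -fPIC -o %s %s %s", so, src, libs);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

If the resulting string is handed to a shell, the paths and flags have to come from a trusted source or be escaped first, since uploaded module names would otherwise end up on a command line.
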

gwbas1c 8 months ago

Important question: Why would anyone develop a web application in C? Typically web applications lean heavily on garbage collection and memory safety, because their bottlenecks are very rarely CPU/memory issues. The ROI on manually managing memory just isn't there for a typical web application.

warothia 8 months ago

I could say speed, but the main reason for me is because it is fun. And I like to see what I can make C do. :D
- mst 8 months ago
  
  "Because I can" remains an entirely legitimate reason for a hobby project.
  If anything, you've gone further along the "also (at least sort of) practical" scale than I expected.
  Given as mentioned elsewhere a per-request arena + bump allocator system, it might actually be -genuinely- practical (to the extent that writing application logic in C is at all ;)
  Bravo.
  - warothia 8 months ago
    
    Thanks! Yes, an arena allocator for each request is on my todo list. Just didn’t get to implementing it yet. :D
- kaba0 8 months ago
  
  The second point is absolutely fair and your project is very cool and impressive, but the speed one is misleading. I am fairly sure you actually leave a fair bit of performance on the table simply by how convoluted parallelism and async IO are in C, and something like Java might easily outperform it in standard CRUD backend use cases.
  - warothia 8 months ago
    
    You’re absolutely right, I have not tried very hard to optimize for speed yet either. The comment was directed more at the fact that most people just say “speed” is the main reason to use C, but for me it’s almost exclusively the fun and “cool” factor.
williamcotton 8 months ago

> The ROI on manually managing memory just isn't there for a typical web application.
You can use a per-request memory arena built with a simple bump allocator and then free the entire block when the request has been handled.
- kaba0 8 months ago
  
  Still, why would I want to write C?
  Then use a script language with similar memory semantics, PHP started out exactly that way, if I'm not mistaken.
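
The per-request arena with a bump allocator mentioned above can be sketched in a few lines. The `arena` type and the `arena_*` functions are hypothetical names, and the sketch assumes a C11 compiler (for `_Alignof` and `max_align_t`); the project itself lists this as a todo, so this is not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One contiguous block per request; allocation just bumps an offset. */
typedef struct {
    unsigned char *base; /* start of the block, NULL if not initialised */
    size_t cap;          /* total bytes available */
    size_t used;         /* bytes handed out so far */
} arena;

/* Reserve `cap` bytes up front. Returns 0 on success, -1 on failure. */
static int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out `size` bytes aligned for any object type, or NULL when full. */
static void *arena_alloc(arena *a, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    assert((align & (align - 1)) == 0); /* rounding below needs a power of two */

    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start < a->used || start > a->cap || size > a->cap - start)
        return NULL;

    a->used = start + size;
    return a->base + start;
}

/* Forget everything allocated for the request; the block is reused. */
static void arena_reset(arena *a)
{
    a->used = 0;
}

/* Release the block itself, e.g. at server shutdown. */
static void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

A handler would call `arena_alloc` freely while building a response, and the server would call `arena_reset` once the response is sent, so no individual `free` calls are needed for request-scoped data.
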
SvenL 8 months ago

On the other hand memory management in web applications is quite easy. Most of the stuff is only required for the lifetime of a request. Some stuff needs to be available the whole application life time.
- smt88 8 months ago
  
  You can do this with other languages (C# for example) as well. Memory is so cheap, though, that most companies should spend their money on increasing memory rather than on paying programmers to optimize memory usage.
  - marginalia_nu 8 months ago
    
    I don't think memory usage is the problem, but rather allocation costs and memory layout, i.e. performance.
    Serving web traffic simply isn't a very memory hungry task.
  - SvenL 8 months ago
    
    Yes, for most use cases it doesn’t really matter which language is chosen.
    Regarding just spending more money on memory - I agree that it’s definitely cheaper but it’s not only about wasting bytes of memory. If the garbage collector has a lot of work to do it may also impact response time/throughput.
    And yes, C# did a pretty good job with implementing mechanisms for reducing allocations on a language level. This definitely helps to reduce garbage collection.
  - HexDecOctBin 8 months ago
    
    Cheap for whom? American programmers with FAANG salaries, or an under-funded third world NGO?
    
    kaba0 8 months ago
    
    Both. My phone could easily serve thousands of concurrent users.
mariocesar 8 months ago

The title "Hobby Project" makes the point right from the beginning
wwweston 8 months ago

Speaking as someone who has done this back in the early wild days of the web:
* if what you're vending is the software instead of the service (not what people usually do now, but there was a time), then this approach does provide for some obfuscation of IP and various secrets.
* for some demand/resource profiles, CPU & memory issues are a lot easier to run into. The one I experienced with this project was targeting a serious e-commerce product to the context of 20-30 year old shared hosting environments (again, not what people would do now), but there may be different situational niches today.
* familiarity. Sometimes you use what you know. And in the late 90s today's most popular web languages were still years away from being the convenient platform they'd become. The other popular options were Perl, maybe Java, possibly ColdFusion/VB/PHP.
That said, you're correct: memory management was a pain, and by 2005 or so it was pretty clear that programmer cycles were as valuable as CPU cycles, if not more so, and respectable frameworks were starting to coalesce in languages much better suited for string manipulation, so the ROI was not great. And of course, today you have other systems languages available like Go and Rust...
- smt88 8 months ago
  
  > if what you're vending is the software instead of the service (not what people usually do now, but there was a time)
  I'm very curious what this means. Can you give an example?
  - Philpax 8 months ago
    
    Giving the client the executable to run, not running it for them. This means you can't hide the artifact from the client; for an interpreted language, this means recovery of the source code would be much easier than with a compiled output.
    That being said, it's still possible to reverse engineer the code; it just makes it harder.
  - dsp_person 8 months ago
    
    You could also run a limited demo of your desktop application in a browser, where people have to pay to get access to the full thing.
koito17 8 months ago

Back then, C was one of a few viable choices. The original implementation of the 2ch BBS was written in C.[0] Later revisions used Perl. Between 1998 and 2001, the site was a widely-used BBS and written in C.
[0] https://github.com/nekoruri/readcgi
_gabe_ 8 months ago

I’m all for using C for native development (and because I find it fun to work in occasionally), but I agree with your sentiment here. Not only do you have to manage memory manually, but you also have to do a lot more work for basic string manipulation which is the vast majority of web work. I would much rather work with a language with built in support for string manipulation and well known 3rd party libraries with good ergonomics for working with JSON, which a lot of your APIs will most likely be using.
Aurornis 8 months ago

Lightweight web frameworks are great for embedded applications.
- gwbas1c 8 months ago
  
  Take a few minutes to read through the use case.
  This isn't something that I would use for an embedded application. The fact that it allows uploading a compiled binary implies that it's for developing a web application in C, as opposed to merely adding a web endpoint to an embedded application.
lelanthran 8 months ago

> Typically web applications lean heavily on garbage collection and memory safety, because their bottlenecks are very rarely CPU/memory issues.
I dunno about this assertion. Maybe it seems like the bottleneck is rarely CPU/memory when you're throwing 1GB RAM + a dedicated instance at a webapp, but, for example, Jenkins absolutely thrashes any 1GB RAM instance because it runs out of RAM and/or CPU.
My homegrown builder/runner CI/CD system, running the same `go build/test` commands, the same `git checkout` commands etc, written in C, peaks at a mere 60MB of RAM usage.
I feel we are collectively underestimating just how much extra RAM is needed by the popular languages that run a typical GC.
[EDIT: I no longer even use my simple C app - I find a `make` cronjob for every 2m uses even less RAM, because there is no web interface anymore, I ssh into that machine to add new projects to the makefile]
cv5005 8 months ago

Global warming.
These days it should be considered immoral to write software that uses inefficient languages/runtimes/abstractions, we simply cannot afford to waste energy doing useless computations anymore.