===============================================================================
blog.notryan.com/009.txt                                       Tue, 05 May 2020
Ryan Jacobs                                                      05:40:00 -0700
                           100 Lines of C, in a Closet
                           . . . . . . . . . . . . . .
===============================================================================

UPDATE:
  Wow, this garnered a lot more traffic than I anticipated. I'm going to post
  an update in a follow-up post that breaks down the steps of implementing
  server.c. I've got some interesting hints for y'all on path traversal.
  Additionally, the server is now safe against random netcat garbage that
  doesn't conform to the HTTP/1.1 spec. This is achieved by rejecting requests
  that don't meet the basic criteria: begins with "GET ", contains at least
  one more space, does not include any '/' characters.
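
  A minimal sketch of those three checks, not the actual server.c code. The
  name request_looks_valid() is made up for illustration, and I'm assuming the
  '/' rule applies to the path after its leading slash (flat filenames only):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not taken from server.c. */
bool request_looks_valid(const char *req)
{
    assert(req != NULL);

    /* Rule 1: the request must begin with "GET ". */
    if (strncmp(req, "GET ", 4) != 0)
        return false;

    /* Rule 2: at least one more space must follow, ending the path. */
    const char *path = req + 4;
    const char *end = strchr(path, ' ');
    if (end == NULL)
        return false;

    /* Assumption: a path is "/" followed by a flat filename. */
    if (path[0] != '/')
        return false;

    /* Rule 3 (as read here): no further '/' inside the path. */
    return memchr(path + 1, '/', (size_t)(end - (path + 1))) == NULL;
}
```

  Under that reading, "GET /009.txt HTTP/1.1" passes, while
  "GET /../etc/passwd HTTP/1.1" and plain netcat noise are rejected.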

  Also, I've just converted the server to use fork() in addition to accept().
  The throughput has dropped from 20k req/s down to 8k... because of the
  process forking overhead. But now we can handle concurrent requests. I think
  it is a worthy trade-off.
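
  The fork-per-connection pattern can be sketched roughly like this. It is an
  illustration rather than the real server.c: handle_client() and
  serve_forever() are hypothetical names, the reply is hard-coded, and it
  assumes listen_fd is already bound and listening.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder handler: sends a fixed reply instead of reading a request. */
void handle_client(int client_fd)
{
    const char *reply =
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 3\r\n"
        "\r\n"
        "hi\n";
    ssize_t sent = write(client_fd, reply, strlen(reply));
    (void)sent;                       /* short writes ignored for brevity */
    shutdown(client_fd, SHUT_RDWR);
}

/* Accept on an already-listening socket, forking one child per client.
 * Returns -1 only if accept() fails for a reason other than EINTR. */
int serve_forever(int listen_fd)
{
    assert(listen_fd >= 0);
    signal(SIGCHLD, SIG_IGN);         /* children are reaped automatically */

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        pid_t pid = fork();
        if (pid == 0) {               /* child: serve one client, then exit */
            close(listen_fd);
            handle_client(client_fd);
            close(client_fd);
            _exit(0);
        }
        /* parent, or fork() failed: drop this copy and keep accepting */
        close(client_fd);
    }
}
```

  The parent only ever blocks in accept(), so a slow client ties up one child
  instead of the whole server.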

-------------------------------------------------------------------------------

Heyo! I think I'm all done. I've reached a feature completeness I'm proud of.

This blog is now nearly entirely self-hosted. I am running it on a Thinkpad
T440p laptop in my closet! View the source code at https://blog.notryan.com

Here's what the stack looks like:
  * KVM Virtual Machine -> Void Linux

    * It can do live migrations to any device on my LAN, with zero downtime.
      KVM is really cool! It transfers the RAM continuously until it's
      fully synced. This takes about 30 seconds.

    * The VM's disk storage is on an NFS-mounted DRBD (Distributed Replicated
      Block Device). If you haven't heard of DRBD before, boy, are you in luck.
      Oh, it's just some fantastically reliable block device replication software
      that operates as a kernel module. And what's that? It's been in the
      mainline tree since around the 2.6 kernel release. So you probably
      already have it! Just install the userspace tools.

  * server.c HTTP server (https://blog.notryan.com/server.c)

  * Let's Encrypt (via Redbird)

  * FRP (fast reverse proxy) -> Vultr VPS (207.246.103.60 ingress IPv4)
    * $3.50/month is the cheapest price for a machine with an IPv4 address
      that can act as my ingress for *everything*. It looks at the HTTP
      `Host:` header to determine how to route requests to different VMs.
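
Host-based routing boils down to something like the sketch below. This is not
how FRP implements it; backend_for_request() and the backend names in the
table are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical routing table; the backend names are made up. */
struct route { const char *host; const char *backend; };
static const struct route routes[] = {
    { "blog.notryan.com", "vm-blog:8080" },
    { "videos.rmj.us",    "vm-videos:8080" },
};

/* Return the backend for the request's Host header, or NULL if none. */
const char *backend_for_request(const char *req)
{
    assert(req != NULL);

    const char *line = strstr(req, "\r\n");   /* end of the request line */
    while (line != NULL) {
        line += 2;                            /* start of the next line */
        if (line[0] == '\r' || line[0] == '\0')
            break;                            /* blank line: headers over */
        if (strncasecmp(line, "Host:", 5) == 0) {
            const char *value = line + 5;
            value += strspn(value, " \t");    /* skip optional whitespace */
            size_t len = strcspn(value, ":\r\n"); /* stop at port or EOL */
            for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
                if (strlen(routes[i].host) == len &&
                    strncasecmp(routes[i].host, value, len) == 0)
                    return routes[i].backend;
            }
            return NULL;                      /* Host present but unknown */
        }
        line = strstr(line, "\r\n");
    }
    return NULL;                              /* no Host header at all */
}
```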

In theory, I could open a port on my local router and have the ISP act
as the ingress. But... I'm not super comfortable with exposing my WAN's IP
address in case someone decides to DDoS my network.

-------------------------------------------------------------------------------
                                   SECTIONS

  1. What else is running in my closet?
  2. server.c
  3. Moving away from GitHub
  4. Moving away from Netlify / Cloudflare
  5. rss.c
  6. Future Plans
-------------------------------------------------------------------------------
                      What else is running in my closet?

* https://videos.rmj.us
* https://notryan.com
* https://notryan.com/pdf
* WebFPGA (https://beta.webfpga.io)
* WebFPGA Forum (https://forum.webfpga.com)

I'm honestly amazed by how snappy the Discourse forum is on my home network
ISP (Spectrum, Los Angeles). Also, go ahead and run a synthesis job on
the WebFPGA IDE -- it's even faster on our local servers.

Look at it this way: cloud VPSs are a total rip-off in terms of compute power.
The two reasons I would rent a VPS are uptime and network access. Having a
publicly reachable IP address is a must-have for any networked application.
Most VPS providers charge $5/month for one vCore and 512 MB of RAM. I can buy a
used Thinkpad T440p off eBay for a little over $100. It has 8 vCores and 8 GB
of RAM. It has a battery life of four hours. If someone happens to trip the
breaker, the laptop won't die. It's like having an automatic UPS built into the
server!

By moving everything to on-premise machines, I've cut the WebFPGA server bills
from $80/month (DigitalOcean) to $3.50/month for the IPv4 ingress. I have a
full Kubernetes install on-premise hosting the backend. Yes, there might be
downtime. But the power has gone out only twice in three years.  And the
internet has never gone out. I have redundancy plans. I can spin up the same
cluster setup within 10-15 minutes of being notified. But honestly, I used to
view downtime as the end of the world, but I've discovered that most users
are relatively congenial about it. That's not an excuse to slack though!
We all dream of 100% uptime. I'm honestly okay with 99.9% uptime; that's
about 9 hours of downtime per year. Not bad.
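
For reference, the downtime an uptime percentage allows is just the period
multiplied by (100 - uptime) / 100. A small sketch, with downtime_hours()
being a name I made up:

```c
#include <assert.h>

/* Hours of allowed downtime within a period of `period_hours`, given an
 * uptime percentage such as 99.9. */
double downtime_hours(double uptime_percent, double period_hours)
{
    assert(uptime_percent >= 0.0 && uptime_percent <= 100.0);
    return period_hours * (100.0 - uptime_percent) / 100.0;
}
```

downtime_hours(99.9, 365 * 24) comes out at roughly 8.76 hours per year, and
downtime_hours(99.9, 30 * 24) at roughly 0.72 hours (about 43 minutes) per
30-day month.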

-------------------------------------------------------------------------------
                                    server.c

  ryan@kk ~ $ wrk http://localhost:8080

  Running 10s test @ http://localhost:8080
    2 threads and 10 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency   427.45us  230.79us   5.64ms   87.28%
      Req/Sec     9.09k     1.03k   14.45k    63.68%
    181879 requests in 10.10s, 198.26MB read

  Requests/sec:  18008.57
  Transfer/sec:     19.63MB

I've made some assumptions when developing this server, so there are a few
_quirks_. Namely, the server has two content-type modes: it serves index.html
as `text/html` and everything else as `text/plain`. Additionally, it uses
blocking input/output: accept()->read()->write()->shutdown(). Someone could
block the next user from fetching by running `nc remote.name 80` without
sending any data, which stalls the server for every other request. But since
this server sits behind a proxy that inspects the HTTP headers for the `Host:`
field, we can assume that our own proxy won't screw us over. Maliciously long
requests won't harm us.
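
The two content-type modes amount to something like this. It's a sketch, not
a copy of server.c: content_type_for() and build_header() are illustrative
names, and it assumes the requested filename has already been extracted from
the request line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two-mode rule: index.html is HTML, everything else is plain text. */
const char *content_type_for(const char *filename)
{
    assert(filename != NULL);
    return strcmp(filename, "index.html") == 0 ? "text/html" : "text/plain";
}

/* Write the response header into buf. Returns the header length, or -1 if
 * it does not fit. */
int build_header(char *buf, size_t size, const char *filename, long body_len)
{
    int n = snprintf(buf, size,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     content_type_for(filename), body_len);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```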

-------------------------------------------------------------------------------
                            Moving away from GitHub

Pushing to GitHub is perceptibly slow compared with pushing via SSH to a VPS.
Plus, using your own machines allows you to do *immoral stuff*, such as
pushing 100 MB files, if you feel like it. I have version-controlled quite a
few mega-repos where the primary content is not text files at all.
Shame me all you want, but being able to time-travel through my photo/video
collections is fantastic. I don't really care that my 2 GB repo takes 4 GB
on disk... Tangents aside, here is the time it takes to push a text file to
GitHub versus my VPS. (Note: the VPS pathway is client->Vultr->VPS)

  # Time it takes to push a change to a simple text file.
  ryan@kk ~/blog.notryan.com $ time git push origin
  real    0m5.123s

  ryan@kk ~/blog.notryan.com $ time git push mir
  real    0m1.423s

-------------------------------------------------------------------------------
                     Moving away from Netlify / Cloudflare

Previously, my static site deploy methodology was to push to GitHub,
then have Netlify automatically pick that up, build and deploy. However...
Netlify uses CDNs, which take time to propagate. On top of that, their
build process is not the most efficient for my use case because they
do a full git clone and spin up what I assume to be a Docker container
that installs Node.js/Ruby/etc. Average time to visibility was about 2 minutes.
A small part of that is Cloudflare caching the website "for speed". I've
disabled my Cloudflare proxy. I don't want the Internet to get too reliant
on these guys, so I'm doing my part to self-host.

Now, my current process builds and deploys in less than a second. And my site
is live immediately. I'm not anticipating thousands of requests per second
and global access equality. I don't really need CDNs... Sorry if my site takes
100ms longer to load where you live.

My current build process looks like this:
  git push
    -> triggers a git post-update hook
      -> git restore .
      -> git pull
      -> ./build.sh

This goes for my entire site. Not just my blog. It's nice because it's only
pulling the minute changes I made. The build scripts run so fast that
I don't even bother forking the process. When I deploy, I see the full
log in my client's terminal. For example, this push went live in less than
a second:

  ryan@kk ~/blog.notryan.com $ git push

  Enumerating objects: 7, done.
  Counting objects: 100% (7/7), done.
  Delta compression using up to 4 threads
  Compressing objects: 100% (4/4), done.
  Writing objects: 100% (4/4), 1.50 KiB | 1.50 MiB/s, done.
  Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
  remote: From /home/ryan/_enc/notryan.com
  remote:    0c9c211d..aa11aa43  master     -> origin/master
  remote: Merge made by the 'recursive' strategy.
  remote:  blog/009.txt.draft | 48 +++++++++++++++++++++++++++++++++++------
  remote:  1 file changed, 42 insertions(+), 6 deletions(-)
  To mir:_enc/notryan.com.git
     0c9c211d..aa11aa43  master -> master

-------------------------------------------------------------------------------
                                     rss.c

Ah! Good ol' RSS. I've recently "discovered" RSS myself (which some people
might find insane).

I think it's gosh darn amazing. I've converted all of my YouTube subscriptions
to RSS, and new videos show up in Thunderbird as soon as they're posted. I
don't have to deal with YouTube's distracting/time-consuming sidebar
recommendations ever. The video link and title just show up in my RSS reader
-- then I can use `mpv` to watch the video.

Anyways, I wanted to offer spec-adherent RSS on my site. I've created a C
program to do exactly that. Subscribe to be notified of future posts!

  https://blog.notryan.com/rss.c
  https://blog.notryan.com/rss.xml
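
For a sense of what generating a feed involves, here is a minimal sketch of
an RSS 2.0 writer. It is not rss.c itself: struct post, put_escaped() and
write_rss() are hypothetical, and the channel description text is a
placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical post record; rss.c may store posts differently. */
struct post {
    const char *title;
    const char *link;
    const char *pub_date;   /* RFC 822 style date, as RSS 2.0 expects */
};

/* Write s with the XML special characters &, < and > escaped. */
static void put_escaped(FILE *out, const char *s)
{
    while (*s != '\0') {
        size_t run = strcspn(s, "&<>");   /* longest run needing no escape */
        fwrite(s, 1, run, out);
        s += run;
        if (*s == '&')      fputs("&amp;", out);
        else if (*s == '<') fputs("&lt;", out);
        else if (*s == '>') fputs("&gt;", out);
        if (*s != '\0')
            s++;
    }
}

/* Write a minimal RSS 2.0 document containing `count` items. */
void write_rss(FILE *out, const struct post *posts, size_t count)
{
    assert(out != NULL);
    fputs("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          "<rss version=\"2.0\">\n"
          "<channel>\n"
          "  <title>blog.notryan.com</title>\n"
          "  <link>https://blog.notryan.com</link>\n"
          "  <description>Posts from blog.notryan.com</description>\n", out);
    for (size_t i = 0; i < count; i++) {
        fputs("  <item>\n    <title>", out);
        put_escaped(out, posts[i].title);
        fputs("</title>\n    <link>", out);
        put_escaped(out, posts[i].link);
        fputs("</link>\n    <guid>", out);
        put_escaped(out, posts[i].link);   /* the URL doubles as the guid */
        fputs("</guid>\n    <pubDate>", out);
        put_escaped(out, posts[i].pub_date);
        fputs("</pubDate>\n  </item>\n", out);
    }
    fputs("</channel>\n</rss>\n", out);
}
```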

-------------------------------------------------------------------------------
                                  Future Plans

I'm thinking about converting server.c to use poll() instead of blocking
syscalls. I don't want to introduce too much complexity though. I like poll()'s
function prototype a lot more than select()'s or the epoll family's, so I will
probably stick with that. The tricky part is figuring out how to sync two
threads. I would prefer to use a simple fork() call, but the processes won't
have shared memory... anyways, that's for a future time.
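
A single-process poll() loop might look roughly like the sketch below.
poll_round() and MAX_FDS are made-up names, the reply is hard-coded, and it
assumes a whole request arrives in one read(). Since everything runs in one
process, there is nothing to synchronize.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_FDS 64

/* fds[0] is the listening socket; fds[1..*count-1] are connected clients.
 * Runs one poll() round and returns the number of clients answered, or -1
 * if poll() itself fails. */
int poll_round(struct pollfd *fds, int *count, int timeout_ms)
{
    assert(fds != NULL && count != NULL);
    assert(*count >= 1 && *count <= MAX_FDS);

    if (poll(fds, (nfds_t)*count, timeout_ms) < 0)
        return -1;

    int answered = 0;
    /* Walk clients backwards so swap-removal never skips an entry. */
    for (int i = *count - 1; i >= 1; i--) {
        if (!(fds[i].revents & (POLLIN | POLLHUP | POLLERR)))
            continue;
        char req[1024];
        ssize_t n = read(fds[i].fd, req, sizeof req);
        if (n > 0) {
            static const char reply[] =
                "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
            ssize_t sent = write(fds[i].fd, reply, sizeof reply - 1);
            (void)sent;               /* short writes ignored for brevity */
            answered++;
        }
        close(fds[i].fd);
        (*count)--;
        fds[i] = fds[*count];         /* move the last entry into the gap */
    }

    /* A readable listening socket means a connection is waiting. */
    if ((fds[0].revents & POLLIN) && *count < MAX_FDS) {
        int client_fd = accept(fds[0].fd, NULL, NULL);
        if (client_fd >= 0) {
            fds[*count].fd = client_fd;
            fds[*count].events = POLLIN;
            fds[*count].revents = 0;
            (*count)++;
        }
    }
    return answered;
}
```

A caller would set up fds[0] with the listening socket and POLLIN, start with
a count of 1, and call poll_round() in a loop.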

Features that would be nice to have:
  * HTTPS
  * HEAD requests
  * Content-Type determined by file extension, so I can serve images properly
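
Picking the Content-Type from the file extension could be a small lookup
table. A sketch under assumptions: mime_for_path() and the table entries are
illustrative, and unknown extensions fall back to `text/plain`.

```c
#include <assert.h>
#include <string.h>

/* Illustrative table; extend it with whatever the site actually serves. */
struct mime { const char *ext; const char *type; };
static const struct mime mime_table[] = {
    { ".html", "text/html" },
    { ".txt",  "text/plain" },
    { ".c",    "text/plain" },
    { ".xml",  "application/xml" },
    { ".png",  "image/png" },
    { ".jpg",  "image/jpeg" },
};

/* Map a path to a Content-Type by its final extension. */
const char *mime_for_path(const char *path)
{
    assert(path != NULL);
    const char *dot = strrchr(path, '.');   /* last '.' in the path, if any */
    if (dot != NULL) {
        for (size_t i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++) {
            if (strcmp(dot, mime_table[i].ext) == 0)
                return mime_table[i].type;
        }
    }
    return "text/plain";                    /* fallback for anything unknown */
}
```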
-------------------------------------------------------------------------------

Anyways, thanks for reading!
  -- (Most definitely not) Ryan

-------------------------------------------------------------------------------

https://www.reddit.com/r/programming/comments/gdxh3w/http_blog_server_100_lines_of_c_in_a_closet/
https://www.reddit.com/r/C_Programming/comments/gdy2av/serverc_100_lines_of_c_in_a_closet/