walthervonstolzing

@walthervonstolzing@lemmy.ml

pointless

walthervonstolzing ,

What's "Mordor Intelligence" -- is that a real thing, or a parody of the surveillance/'defense' industry companies that are coming up with names nicked from LotR? ('Anduril', 'Palantir')

Best resources to learn more about networking

I have been exploring the world of home servers/self-hosting for a little over a year now, and feel like I have a decent understanding of a lot of things that go into this. The one thing I am not remotely comfortable with yet is networking. It's like a foreign language to me....

walthervonstolzing ,

Michael W. Lucas's "Networking for System Administrators" is a great resource: https://mwl.io/nonfiction/networking#n4sa

walthervonstolzing ,

Another vote for Tesseract -- just to clarify the terminology, though: PDF is a fragile format best treated as read-only, so you really don't want to edit a PDF; instead, make a new one using the same (or cleaned-up) bitmaps and a new OCR text layer.

Now, Tesseract is excellent at recognizing glyphs; but if the scanned image is even a little fuzzy, the layout detection falters, and when it falters, you get redundant line breaks & chunks of text in the wrong order -- all of which gets incredibly annoying for searching & copying purposes. So if you can spare the time, and the text requires it, you may need to mark regions (mainly paragraphs & titles) on the bitmap image manually. A few frontends to Tesseract help with a task like that; check out, e.g., https://github.com/manisandro/gImageReader. Inside single-paragraph blocks of text, Tesseract doesn't get confused as easily, and the text output is in the correct reading order & w/o redundant breaks.
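
If you'd rather script it than click through a frontend, here is a rough sketch of the same idea: run the Tesseract command-line tool on an image that has already been cropped to one paragraph, tell it to treat the image as a single block of text (`--psm 6`), and have it write a new PDF with an OCR text layer. The function names and the file names (`para.png`, `para_out`) are made up for illustration, and it assumes the `tesseract` executable and the `eng` language data are installed; treat it as a starting point, not a recipe.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, output_base, psm=6, lang="eng"):
    """Return the argv list for one Tesseract run that writes <output_base>.pdf.

    psm=6 asks Tesseract to assume a single uniform block of text, which
    sidesteps automatic layout detection (the default mode is psm=3).
    """
    return [
        "tesseract",
        str(image),          # input bitmap, ideally cropped to one region
        str(output_base),    # output path without extension
        "-l", lang,          # language data to use
        "--psm", str(psm),   # page segmentation mode
        "pdf",               # emit a PDF with an OCR text layer
    ]


def ocr_to_searchable_pdf(image, output_base, psm=6, lang="eng"):
    """Run Tesseract and return the path of the PDF it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_cmd(image, output_base, psm, lang), check=True)
    return Path(f"{output_base}.pdf")


if __name__ == "__main__":
    # Hypothetical file names; swap in your own cropped region.
    print(build_tesseract_cmd("para.png", "para_out"))
```

Cropping the regions and stitching the per-region results back into one document is left out here; that's the part where a frontend like gImageReader saves the most time.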
