The Inner Life Of Code
DH 520, Class 7, 2025-10-16 - Loops, Files, and why dictionaries are God-Tier fast
This week in DH520 felt like peering under the hood of Python itself. We moved past the basics of writing a loop and got into the philosophy of control flow, why your dictionary is a speed demon, and the messy reality of handling external files.
The Zen of the Loop: break and else
We covered some cool, advanced concepts in loop logic that are often overlooked. The biggest mind-bender was the else clause on a loop.
The main takeaway about Boolean tests is a fundamental guarantee: if a while loop exits normally (without a break), then once your code reaches the line immediately following the loop, the loop’s governing condition is guaranteed to be false. You don’t have to check it again.
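Here is a minimal sketch of that guarantee. It assumes a plain while loop with no break inside it, and the countdown function is my own illustration, not the class demo:

```python
def countdown(n):
    """Count down from n and return the value of n once the loop exits."""
    while n > 0:          # the loop's governing condition
        n -= 1
    # No break inside the loop, so reaching this line means `n > 0` is False.
    # The assert only illustrates the guarantee; you don't need to re-check it.
    assert not (n > 0)
    return n

print(countdown(5))   # 0
print(countdown(-3))  # -3 (the condition was false from the start)
```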
The Weird Truth About else
The else construction on a loop is technically valid, but it has a super specific function: it executes only if the loop completes without hitting a break.
Success (Found It!): You hit the break statement (like finding a number divisible by three, our demo). The loop stops, and the else is skipped.
Failure (Didn’t Find It!): The loop runs through every single item and finishes. The else runs, telling you the item wasn’t found.
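Here’s a sketch of that search pattern, modeled on our divisible-by-three demo. The function name and the sample numbers are my own, hypothetical choices:

```python
def first_multiple_of_three(numbers):
    """Return the first number divisible by 3, or None if there isn't one."""
    for n in numbers:
        if n % 3 == 0:
            print(f"Found it: {n}")
            break                                  # success: the else clause is skipped
    else:
        print("No number divisible by three.")     # runs only if the loop never hit break
        return None
    return n

first_multiple_of_three([4, 7, 9, 10])   # prints "Found it: 9"
first_multiple_of_three([1, 2, 4, 5])    # prints "No number divisible by three."
```

Note that an empty list also lands in the else clause, since the loop finishes without ever reaching break.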
This is a beautiful, concise way to write a search algorithm. We also briefly met the continue statement, which lets you skip the rest of the current iteration and jump straight back to the top of the loop.
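A quick sketch of continue, using a made-up list with blank entries we want to skip:

```python
lines = ["apple", "", "banana", "", "cherry"]
kept = []
for line in lines:
    if not line:
        continue              # skip the rest of this iteration, back to the top
    kept.append(line.upper())  # only reached for non-empty lines
print(kept)  # ['APPLE', 'BANANA', 'CHERRY']
```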
File Handling: The Open-Close Pipeline
Moving on to dealing with large data, we learned the crucial “pipeline” metaphor for handling external files: open, process (read/write), and close.
The with Statement: Your New Best Friend
Forget the old way of explicitly calling pipeline.close() yourself. The modern, safer way is the with statement, which treats the open file as a context manager:
```python
# The new way, which automatically closes the file:
with open('lexicon.txt') as file:
    words = file.readlines()
```
This structure automatically shuts the door on the file pipeline as soon as the indented block finishes, even if an error is raised inside it, eliminating the risk of leaving files open.
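To see the door actually shut, here’s a sketch that writes a tiny throwaway lexicon (the words are made up) to a temporary folder, reads it both ways, and checks the file object’s closed attribute. The try/finally version is roughly the bookkeeping that with does for you:

```python
import os
import tempfile

# Create a small throwaway lexicon file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "lexicon.txt")
with open(path, "w", encoding="utf-8") as file:
    file.write("apple\nbanana\nzebra\n")

# Old way: open, process, and close by hand (try/finally makes sure close runs).
pipeline = open(path, encoding="utf-8")
try:
    old_words = pipeline.readlines()
finally:
    pipeline.close()

# New way: the with block closes the file when the indented block ends.
with open(path, encoding="utf-8") as file:
    words = file.readlines()

print(file.closed)          # True: closed as soon as the block finished
print(words == old_words)   # True: same data either way
```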
The Messy Truth About Text Data
We also came face-to-face with the hidden world of invisible characters. When you hit Enter in a text file, you’re inserting a newline character (\n). On Windows, it’s a double-whammy of a carriage return and newline (\r\n). When you read a file line-by-line, these characters come in with the data! (In text mode, Python normally translates \r\n into \n for you, but the \n itself still rides along at the end of each line.)
The solution? The trusty .strip() string method, which we can run inside a list comprehension to clean up unwanted leading and trailing whitespace, newlines, and tabs (\t) as the data loads.
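Here’s a minimal sketch of that cleanup. To keep it self-contained, io.StringIO stands in for a real open('lexicon.txt') call, and the sample words are made up:

```python
import io

# Stand-in for open('lexicon.txt'): a file-like object with messy line endings.
fake_file = io.StringIO("apple\n  banana\t\nzebra\r\n")

# Iterating a file yields one line at a time, newline included;
# .strip() removes leading/trailing spaces, tabs, \n and \r (not inner spaces).
words = [line.strip() for line in fake_file]
print(words)  # ['apple', 'banana', 'zebra']
```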
The Speed Test: List vs. Dictionary (The Hash Table King)
The most revealing part of the class was a live speed test. We timed how long it took Python to find words in a massive external lexicon:
List Search (Linear): This is the brute-force approach. Python has to start at index 0 and check every item sequentially. In an alphabetized lexicon, finding “banana” is fast, but finding “zebra” takes measurably longer because the computer has to work its way to the end.
Dictionary Search (Hash Table): Dictionaries are ridiculously faster, sometimes even faster than the list can find items at its own beginning! Dictionaries give up sequence behavior (you can’t slice them or ask for the item at index 0) in exchange for an internal hash table. That hash table lets the dictionary compute where a key lives and jump straight there, without checking everything else.
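A rough re-creation of the speed test, assuming a synthetic lexicon of 100,000 made-up words rather than the real file we used in class. Exact timings will vary by machine, but the gap should be dramatic:

```python
import timeit

# Hypothetical stand-in for the class lexicon: 100,000 made-up "words".
words = [f"word{i}" for i in range(100_000)]
word_list = words
word_dict = {w: True for w in words}   # keys are stored in a hash table

target = "word99999"   # the last item: the worst case for a linear list scan

# Time 100 membership checks of each kind.
list_time = timeit.timeit(lambda: target in word_list, number=100)
dict_time = timeit.timeit(lambda: target in word_dict, number=100)

print(f"list: {list_time:.4f}s   dict: {dict_time:.6f}s")
```

If you don’t need values attached to the keys, a set (set(words)) gives the same hash-table lookup.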
The takeaway is critical for Digital Humanities: when dealing with large data sets (like our upcoming OCR correction exercise, where we check every word against a lexicon), dictionary speed isn’t a luxury; it’s a requirement.
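For a taste of how this might play out in the OCR exercise, here’s a hypothetical sketch. The tiny lexicon and the OCR tokens are invented, and flagging unknown words is just one plausible approach, not necessarily what the assignment will ask for:

```python
# Hypothetical mini-lexicon and OCR output; the real exercise will use a full word list.
lexicon = dict.fromkeys(["the", "quick", "brown", "fox", "jumps"], True)
ocr_tokens = ["the", "qu1ck", "brown", "f0x", "jumps"]

# Each `in lexicon` check is a hash-table lookup, not a scan of the whole lexicon.
suspects = [tok for tok in ocr_tokens if tok not in lexicon]
print(suspects)  # ['qu1ck', 'f0x']
```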
Looking Ahead: Cryptography and Critique
We concluded with an overview of the next steps:
Pseudocode Exchange: We need to share our four pseudocode exercises with a partner who will act as the “coder,” translating our logic into working Python and critiquing the design. The coder is responsible for all the syntax errors, which is a fun incentive to write clean pseudocode!
Next Week: We’re moving into cryptography (specifically frequency analysis of English letters, which reveals the high-frequency superstars like E, T, A, I, O, N), and we’ll start working with CSV files (the raw, plain-text foundation of spreadsheets). Lots of real-world data work ahead!
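As a preview, here’s a minimal frequency-analysis sketch using collections.Counter on a made-up sample sentence. On a text this short, the ranking won’t match the overall English pattern (S outranks E here), which is a reminder that frequency analysis needs a decent amount of text to be reliable:

```python
from collections import Counter

# Made-up sample text; real frequency analysis needs much more text than this.
sample = "It was the best of times, it was the worst of times."

# Keep only letters, uppercased, so 'T' and 't' count together.
letters = [ch for ch in sample.upper() if ch.isalpha()]
freq = Counter(letters)

# The three most common letters in this sample, with their counts.
print(freq.most_common(3))  # [('T', 8), ('S', 6), ('E', 5)]
```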


