You see those blocky colours in the YouTube video? Those come from the Spectrum's "attribute file", a small colour overlay that sits on top of the higher-resolution 1-bit monochrome bitmap part of the machine's video RAM. It gives you a "hi-res colour computer" while saving a ton of RAM, but at the cost of horrible artifacts when differently coloured objects get too close together.
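To put rough numbers on the saving, here is a minimal Python sketch. It assumes the standard 48K Spectrum layout as I remember it: one attribute byte per 8x8 pixel cell, the attribute file starting at 0x5800, and the usual FLASH/BRIGHT/PAPER/INK bit packing. The names `attr_address` and `decode_attr` are made up for illustration.

```python
WIDTH, HEIGHT = 256, 192

BITMAP_BYTES = WIDTH * HEIGHT // 8          # 1 bit per pixel
ATTR_BYTES = (WIDTH // 8) * (HEIGHT // 8)   # one byte per 8x8 cell

ATTR_BASE = 0x5800  # assumption: standard 48K Spectrum attribute file address


def attr_address(x, y):
    """Address of the attribute byte that colours pixel (x, y)."""
    return ATTR_BASE + (y // 8) * 32 + (x // 8)


def decode_attr(value):
    """Split an attribute byte into its fields (assumed bit layout)."""
    return {
        "flash": bool(value & 0x80),   # bit 7
        "bright": bool(value & 0x40),  # bit 6
        "paper": (value >> 3) & 0x07,  # bits 5-3: background colour
        "ink": value & 0x07,           # bits 2-0: foreground colour
    }
```

Every pixel in a cell shares one ink and one paper colour, which is exactly why two differently coloured objects clash as soon as they land in the same cell.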
The same principle is still used in consumer video encoding (and all but the highest-end professional video), where it's described as e.g. 4:2:2 or 4:2:0, the numbers describing how many pixels' worth of chroma (colour) data are provided for each block of luma pixels.
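For the curious, a rough Python sketch of what those numbers mean, assuming the usual J:a:b reading: the reference block is J luma pixels wide and 2 rows high, a is the number of chroma samples in the first row, and b is the number of additional chroma samples in the second row. The function names are hypothetical.

```python
from fractions import Fraction


def chroma_fraction(j, a, b):
    """Fraction of full-resolution chroma kept by a J:a:b scheme."""
    # The reference block holds 2 * j luma pixels and a + b chroma samples.
    return Fraction(a + b, 2 * j)


def bytes_per_pixel(j, a, b, bytes_per_sample=1):
    """Average storage per pixel: one luma sample plus two chroma channels."""
    return bytes_per_sample * (1 + 2 * chroma_fraction(j, a, b))
```

So under these assumptions 4:4:4 costs 3 bytes per pixel at 8 bits per sample, 4:2:2 costs 2, and 4:2:0 costs 1.5, half the data for a picture most people can't tell apart.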
Yes, there's only one display mode. But if anything it was better for displaying text than graphics.
The rows of pixels were laid out in a wacky fashion to make text rendering fast. The display resolution was 256x192. Being monochrome, the bitmap part of the display therefore used 32 bytes per row of pixels. You might expect that the second row would start at start_of_screen_address + 32. BUT NO! It was at start_of_screen_address + 256, because on a Z80 you can add 256 to an address register (just increment its high byte) faster than you can add 32. Hahaha.
The result was that drawing graphics was a bit fiddly, but drawing 8-pixel wide character glyphs was easy and fast.
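Here's a small Python sketch of the address calculation, assuming the standard layout as I remember it: the display file starts at 0x4000 and the screen is split into three 64-line thirds, each interleaved the same way. `pixel_row_address` is a made-up name.

```python
SCREEN_BASE = 0x4000  # assumption: standard Spectrum display file start


def pixel_row_address(y, column=0):
    """Address of the byte at character column 0-31 on pixel row y (0-191)."""
    third = y >> 6            # which 64-line third of the screen (0-2)
    char_row = (y >> 3) & 7   # character row within that third (0-7)
    line = y & 7              # pixel line within the character cell (0-7)
    return SCREEN_BASE + (third << 11) + (line << 8) + (char_row << 5) + column


# The eight rows of one character cell are each 256 bytes apart.
glyph_rows = [pixel_row_address(y) for y in range(8)]
```

With this layout `pixel_row_address(1)` is 0x4100 and `pixel_row_address(8)` is 0x4020, so the "+32" step only shows up when you move down a whole character row. Drawing a glyph means writing 8 bytes while bumping the high byte of the address each time, which is the fast path described above.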