x86 - How do I enable SSE for my freestanding bootable code?
(This question was originally about the cvtsi2sd instruction and the fact that I thought it didn't work on the Pentium M CPU; in fact, it's because I'm using a custom OS and need to manually enable SSE.)
I have a Pentium M CPU and a custom OS which so far has used no SSE instructions, but I now need to use them.
Trying to execute any SSE instruction results in interrupt 6, illegal opcode (which in Linux would cause SIGILL, but this isn't Linux), referred to in the Intel Architectures Software Developer's Manual (which I refer to from now on as IASDM) as #UD - Invalid Opcode (Undefined Opcode).
Edit: Peter Cordes identified the right cause and pointed me to the solution, which I summarize below:
If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set.
Indeed, the IASDM mentions this:
If the operating system did not provide adequate system level support for SSE, executing SSE or SSE2 instructions can generate #UD.
Peter Cordes pointed me to the SSE OSDev wiki, which describes how to enable SSE by writing to both the CR0 and CR4 control registers:
Clear the CR0.EM bit (bit 2) [ CR0 &= ~(1 << 2) ]
Set the CR0.MP bit (bit 1) [ CR0 |= (1 << 1) ]
Set the CR4.OSFXSR bit (bit 9) [ CR4 |= (1 << 9) ]
Set the CR4.OSXMMEXCPT bit (bit 10) [ CR4 |= (1 << 10) ]
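The four steps above can be sketched in C roughly as follows. This is only an illustration, not the wiki's code: the helper names (cr0_with_sse, cr4_with_sse, enable_sse) are made up for this example, the inline assembly assumes a GCC-compatible compiler targeting x86, and enable_sse must only be called from privilege level 0.

```c
/* Bit masks for the control-register bits listed above. */
#define CR0_MP         (1UL << 1)   /* monitor coprocessor */
#define CR0_EM         (1UL << 2)   /* x87 emulation; must be clear for SSE */
#define CR4_OSFXSR     (1UL << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT (1UL << 10)  /* OS handles unmasked SIMD FP exceptions */

/* Hypothetical helper: CR0 value with EM cleared and MP set. */
static inline unsigned long cr0_with_sse(unsigned long cr0)
{
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

/* Hypothetical helper: CR4 value with OSFXSR and OSXMMEXCPT set. */
static inline unsigned long cr4_with_sse(unsigned long cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical kernel routine: faults unless it runs at privilege level 0. */
static inline void enable_sse(void)
{
    unsigned long reg;

    __asm__ volatile ("mov %%cr0, %0" : "=r"(reg));
    reg = cr0_with_sse(reg);
    __asm__ volatile ("mov %0, %%cr0" : : "r"(reg) : "memory");

    __asm__ volatile ("mov %%cr4, %0" : "=r"(reg));
    reg = cr4_with_sse(reg);
    __asm__ volatile ("mov %0, %%cr4" : : "r"(reg) : "memory");
}
#endif
```

Keeping the bit arithmetic in separate pure helpers means it can be checked on an ordinary host, while only enable_sse touches the real registers.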
Note that, in order to be able to write to these registers, if you are in protected mode, you need to be in privilege level 0. The answer to this question explains how to test it: if you are in protected mode, that is, when bit 0 (PE) in CR0 is set to 1, you can test bits 0 and 1 of the CS selector, which should both be 0.
Finally, the custom OS must properly handle the XMM registers during context switches, saving and restoring them when necessary.
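The privilege check just described might look like this in C. It is a sketch under assumptions: the function names are invented for illustration, read_cs assumes a GCC-compatible compiler on x86, and the code assumes the CPU is not in virtual-8086 mode (where CS holds a segment value rather than a selector).

```c
#include <stdint.h>

#define CR0_PE 0x1UL  /* bit 0 of CR0: protected mode enabled */

/* Nonzero when the given CR0 value has the PE bit set. */
static inline int in_protected_mode(unsigned long cr0)
{
    return (cr0 & CR0_PE) != 0;
}

/* Bits 0 and 1 of the CS selector give the current privilege level. */
static inline unsigned selector_privilege(uint16_t cs)
{
    return cs & 0x3u;
}

/* Hypothetical helper: can CR0/CR4 be written in this state?
 * With PE clear (real mode) there is no privilege level to fail. */
static inline int can_write_control_regs(unsigned long cr0, uint16_t cs)
{
    return !in_protected_mode(cr0) || selector_privilege(cs) == 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Reading CS is not privileged, so this is safe at any level. */
static inline uint16_t read_cs(void)
{
    uint16_t cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));
    return cs;
}
#endif
```

For example, a selector of 0x08 has both low bits clear (level 0), while 0x1B has both set (level 3).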
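One common way to save and restore the XMM registers is the FXSAVE/FXRSTOR instruction pair, which is what the CR4.OSFXSR bit advertises support for. The sketch below is an assumption about how a custom OS might wrap them, not something taken from the question: struct fx_area and both function names are hypothetical, and the inline assembly assumes a GCC-compatible compiler on x86.

```c
#include <stdint.h>

/* FXSAVE writes a 512-byte image and requires 16-byte alignment.
 * A hypothetical per-task structure would embed one of these. */
struct fx_area {
    _Alignas(16) uint8_t bytes[512];
};

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Save the x87/MMX/SSE state of the outgoing task. */
static inline void xmm_state_save(struct fx_area *area)
{
    __asm__ volatile ("fxsave %0" : "=m"(*area));
}

/* Reload the state of the incoming task. */
static inline void xmm_state_restore(const struct fx_area *area)
{
    __asm__ volatile ("fxrstor %0" : : "m"(*area));
}
#endif
```

A context switch would call xmm_state_save on the outgoing task's area and xmm_state_restore on the incoming one; whether to do this eagerly on every switch or lazily is a design choice left to the OS.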
If you're running an ancient or custom OS that doesn't support saving XMM regs on context switches, it won't have set the SSE-enabling bits in the machine control registers. In that case, all instructions that touch XMM regs will fault.
Took me a sec to find, but http://wiki.osdev.org/sse explains how to alter CR0 and CR4 to allow SSE instructions to run without #UD.
My first thought on your old version of the question was that you might have compiled your program with -mavx, -march=sandybridge or equivalent, causing the compiler to emit the VEX-encoded version of everything.
cvtsi2sd xmm1, r/m32 ; SSE2
vcvtsi2sd xmm1, xmm2, r/m32 ; AVX
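A quick way to rule out the AVX explanation is to ask the compiler itself. GCC and Clang predefine the __AVX__ macro when AVX code generation is enabled (for example by -mavx), so a small check like the one below can be built with the same flags as the rest of the code. The function names are made up for this illustration.

```c
/* Returns 1 if this file was compiled with AVX code generation enabled,
 * in which case the compiler may emit VEX-encoded instructions such as
 * vcvtsi2sd instead of cvtsi2sd. Returns 0 otherwise. */
static inline int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Human-readable label for boot-time diagnostics. */
static inline const char *simd_encoding_name(void)
{
    return built_with_avx() ? "VEX (AVX)" : "non-VEX";
}
```

If built_with_avx returns 1 on a build meant for a Pentium M, the compiler flags are the problem rather than the control registers.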
See https://stackoverflow.com/tags/x86/info for links, including to Intel's insn set ref manual.