>>23893
no, they aren't.
but when it comes to raw number crunching, gpu "cores" are superior.
the thing i utilize the most is actually something called "latency hiding".
while your core works on the current computation, it fetches the data for the next one.
it hides memory latency by doing two things at once.
but it comes with a tradeoff: your code needs to be semi-asynchronous
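roughly like this in cuda (just a sketch, the kernel and array names are made up, and whether the loads actually overlap the math depends on the compiler/hardware):

__global__ void transform(const float *in, float *out, int n)
{
    int stride = blockDim.x * gridDim.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float cur = in[i];                            // load for the first round
    for (; i < n; i += stride) {
        int next = i + stride;
        float nxt = (next < n) ? in[next] : 0.0f; // kick off the next load early
        out[i] = cur * cur + 1.0f;                // compute while that load is in flight
        cur = nxt;                                // the prefetched value becomes current
    }
}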
it's like you have a loop.
the content of the loop is your kernel.
what's doing the looping is the work subdivision, as in if you have 10 datapoints, the same kernel is run on all 10 of them at once.
that's why your "iterations" have to be solvable independently of each other
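so for 10 datapoints, instead of a for loop you launch 10 threads of the same kernel (sketch, names made up):

__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // which datapoint this thread owns
    if (i < n)
        data[i] = data[i] * data[i];                // depends on no other element
}

// host side: square<<<1, 10>>>(d_data, 10);  -> 10 threads, one per datapoint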
as in:
counting in normal code:
you iterate through all the elements of an array, summing them up
counting in opencl/cuda:
you create an array that's half the size of your original array
sum your elements two by two, because you have to assume all of it happens simultaneously
and you don't have mutexes, so otherwise your program is one big race condition
then you repeat on the new array, halving it each pass, until you're down to one element
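a sketch of that pairwise sum in cuda (buffer names made up, and assuming the length is a power of two to keep it simple):

__global__ void pair_sum(const float *in, float *out, int half)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < half)
        out[i] = in[2 * i] + in[2 * i + 1];  // each thread writes its own slot, no races
}

// host side: ping-pong between two buffers, halving every pass
// for (int len = n; len > 1; len /= 2) {
//     pair_sum<<<(len / 2 + 255) / 256, 256>>>(d_a, d_b, len / 2);
//     std::swap(d_a, d_b);   // this pass's output is the next pass's input
// }
// the final sum ends up in d_a[0]

compare that to the serial version, which is just one thread doing sum += a[i] over the whole array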