Using DMA for memcpy()
Jason Hecker
2003-07-29 06:00:24 UTC
Seeing as certain embedded architectures have speedy general-purpose DMA
devices, would there be any reason to not use said DMA device to speed up
the memcpy() routine? I can only think of contention issues with multiple
processes trying to use the same DMA at the same time. Perhaps some sort
of transparent queueing mechanism might be in order to deal with multiple
simultaneous memcpy requests.
--
Cheers,
jASON
---------------------------------------------
--== http://www.wireless.org.au/~jhecker ==--
---------------------------------------------
David Laight
2003-07-29 08:01:55 UTC
Post by Jason Hecker
Seeing as certain embedded architectures have speedy general-purpose DMA
devices, would there be any reason to not use said DMA device to speed up
the memcpy() routine? I can only think of contention issues with multiple
processes trying to use the same DMA at the same time. Perhaps some sort
of transparent queueing mechanism might be in order to deal with multiple
simultaneous memcpy requests.
I very much doubt you would get any improvement, in fact it would
be hard to not lose out badly.

You would have to loop waiting for the DMA to finish - because the
cost of the ISR to restart things is likely to be significant, never
mind the cost of any process switches.

You then have problems with cache coherency to consider.

Low level cache controller operations can sometimes be used to do
copies a cache line at a time.

However, any tests as to which algorithm to use can slow things down,
as a lot of the copies done are actually short.

Also note that the CPU could easily have a faster data path to memory
than the DMA controller.
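
To make the trade-off concrete, here is a minimal sketch of a size-based
dispatch in front of a polled DMA copy. Everything in it is an assumption
for illustration: copy_dispatch(), dma_start(), dma_done(), the fake_dma
structure and the DMA_THRESHOLD value are hypothetical names, and the "DMA
engine" is simulated in software so the example runs anywhere. On real
hardware the start routine would also have to write back the source cache
lines and invalidate the destination ones, which is the coherency cost
mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-over point; a real value would have to be measured. */
#define DMA_THRESHOLD 4096u

/* Simulated DMA engine state (stands in for the device registers). */
struct fake_dma {
    void       *dst;
    const void *src;
    size_t      len;
    int         busy;
};

static struct fake_dma dma0;
static unsigned dma_copies, cpu_copies;   /* counters, for illustration only */

/* Start a transfer. Only one may be in flight, which is the contention
 * problem: a second caller would have to wait or be queued. */
static void dma_start(void *dst, const void *src, size_t len)
{
    assert(!dma0.busy);
    dma0.dst = dst;
    dma0.src = src;
    dma0.len = len;
    dma0.busy = 1;
}

/* Poll for completion. The simulation finishes the copy on the first poll. */
static int dma_done(void)
{
    if (dma0.busy) {
        memcpy(dma0.dst, dma0.src, dma0.len);
        dma0.busy = 0;
    }
    return 1;
}

/* memcpy()-style entry point that picks a copy method by length. */
void *copy_dispatch(void *dst, const void *src, size_t len)
{
    if (len < DMA_THRESHOLD) {      /* this test is paid on every call */
        cpu_copies++;
        return memcpy(dst, src, len);
    }
    dma_start(dst, src, len);
    while (!dma_done())
        ;                           /* busy-wait instead of taking an interrupt */
    dma_copies++;
    return dst;
}
```

Note that the short-copy path now carries an extra compare and branch, and
the long-copy path keeps the CPU spinning for the whole transfer, so the
CPU is not freed up for other work in either case.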

David
--
David Laight: ***@l8s.co.uk
Jason Hecker
2003-07-29 08:27:46 UTC
Post by David Laight
I very much doubt you would get any improvement, in fact it would
be hard to not lose out badly.
I've been mulling over it a bit more since I posted the query, and you're
right. Cache coherency would be a big problem, as would scheduling the
DMA or trying to implement some piecemeal sharing mechanism so no one
process is waiting too long to use the controller. Forget I asked. ;)

Better off just using the DMA controller as needed as some sort of
background copier.
--
Cheers,
jASON
---------------------------------------------
--== http://www.wireless.org.au/~jhecker ==--
---------------------------------------------
Jason Thorpe
2003-07-29 14:14:47 UTC
Post by Jason Hecker
Post by David Laight
I very much doubt you would get any improvement, in fact it would
be hard to not lose out badly.
I've been mulling over it a bit more since I posted the query, and you're
right. Cache coherency would be a big problem, as would scheduling the
DMA or trying to implement some piecemeal sharing mechanism so no one
process is waiting too long to use the controller. Forget I asked. ;)
Better off just using the DMA controller as needed as some sort of
background copier.
There is a "dmover" framework that handles these sorts of things. It
is the case that for random memcpy(), using the DMA controller would
have too much overhead. But using it for things like pmap_copy_page()
and pmap_zero_page() might be a good idea.
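
As a rough illustration of the "queue whole-page operations for a
background engine" idea, here is a minimal sketch. It is not the dmover
API; every name in it (page_req, page_queue, page_queue_submit(),
page_queue_run_one(), count_done(), PAGE_SIZE_SKETCH) is hypothetical,
and the engine is simulated with memset()/memcpy() so the example is
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096   /* assumed page size */
#define QUEUE_SLOTS      8      /* assumed queue depth */

enum page_op { PAGE_ZERO, PAGE_COPY };

struct page_req {
    enum page_op op;
    void        *dst;
    const void  *src;            /* unused for PAGE_ZERO */
    void       (*done)(void *);  /* completion callback, may be NULL */
    void        *arg;            /* passed to the callback */
};

/* Fixed-size FIFO of pending page operations. */
struct page_queue {
    struct page_req slot[QUEUE_SLOTS];
    size_t head, count;
};

/* Queue a request. Returns 0 on success, or -1 if the queue is full,
 * in which case the caller can fall back to a plain CPU copy. */
int page_queue_submit(struct page_queue *q, struct page_req r)
{
    assert(r.dst != NULL);
    if (q->count == QUEUE_SLOTS)
        return -1;
    q->slot[(q->head + q->count) % QUEUE_SLOTS] = r;
    q->count++;
    return 0;
}

/* Run the oldest request and call its callback. This stands in for the
 * engine finishing a transfer. Returns 1 if one ran, 0 if the queue was empty. */
int page_queue_run_one(struct page_queue *q)
{
    struct page_req r;

    if (q->count == 0)
        return 0;
    r = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    if (r.op == PAGE_ZERO)
        memset(r.dst, 0, PAGE_SIZE_SKETCH);
    else
        memcpy(r.dst, r.src, PAGE_SIZE_SKETCH);
    if (r.done != NULL)
        r.done(r.arg);
    return 1;
}

/* Example callback: counts completions through an int pointer. */
void count_done(void *arg)
{
    ++*(int *)arg;
}
```

Whole pages are a better fit than random memcpy() calls because the size
is fixed and large, so the setup cost is spread over more data and the
requester does not need the result immediately.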

-- Jason R. Thorpe <***@wasabisystems.com>
