simple thread communication or synchronization (in pthreads)

Bones McCracker · Veteran Joined: 14 Mar 2006 Posts: 1611 Location: U.S.A.

Two questions:

1. When used in the context of POSIX threads, should a semaphore (say, a thread-shared semaphore, not a process-shared one) always be protected by a mutex?

Do people ever freeball it and go without one?

2. Does a semaphore carry much overhead, or is it merely like a short int or char in the global namespace? I know about pthread condition variables, but they seemed like overkill for a simple requirement like this.

In case that's not enough to go by, here's the scenario (simplified; the general idea here is that the output should be exactly once per t seconds). It works; the actual application has been running fine for 24 hours without any apparent problems, but I have these questions.

dmitchell · Posted: Sat Feb 02, 2013 5:10 am Post subject:

I don't have an answer to your question. However, I would suggest looking at OpenMP. It's easy.

petrjanda · Posted: Sat Feb 02, 2013 7:13 am Post subject:

http://stackoverflow.com/questions/2065747/pthreads-mutex-vs-semaphore

A semaphore is a counter used for signalling and synchronization between threads, while a mutex provides mutual exclusion to prevent race conditions among threads.

Prenj · n00b Joined: 20 Nov 2011 Posts: 16

Are you trying to solve something specific, or are you just fiddling with semaphores and mutexes for the heck of it / educational purposes?

In my humble opinion, in general, semaphores and mutexes kinda kill the point of using threads in the first place. Whenever I needed to pass results from one thread to another, I used messages passed along an async queue.

Say you have a thread just doing I/O, sending and receiving HTTP requests. It reads stuff from the socket, does some basic framing to figure out whether it has read everything, and then puts a message on the outgoing queue. It checks the ingoing queue to see if there is anything to send; if not, it checks whether there is a new message to read, or just sleeps.

The thread that is supposed to parse the contents and do something clever with them pops messages from the queue, does its stuff, and, if there is a need for a reply, crafts one and puts it in the outgoing queue.

Both threads "sort of" synchronize with each other, but neither forces the other to wait or abort, since the shared point is asynchronous. If one thread is working too fast for the other, so that, say, the worker thread cannot process everything in time, the I/O thread can check the queue length and, if it is above a certain threshold, send a 503 instead of pushing the request to the worker, giving the worker thread some breathing space.
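As a rough illustration of the queue-length check described above, here is a minimal sketch in C. It assumes a small mutex-protected ring buffer of ints; the names msg_queue, queue_try_push, queue_try_pop and the capacity QUEUE_CAP are hypothetical, and a real message queue library would look different. The point is only that the producer never blocks: a failed push is its cue to answer with a 503.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 4                 /* assumed back-pressure threshold */

struct msg_queue {
    pthread_mutex_t lock;           /* protects items, head and len */
    int items[QUEUE_CAP];           /* ring buffer storage */
    size_t head;                    /* index of the oldest item */
    size_t len;                     /* number of queued items */
};

void queue_init(struct msg_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = 0;
    q->len = 0;
}

/* Producer side: returns false when the queue is full, so the caller
   can reject the request (e.g. send a 503) instead of waiting. */
bool queue_try_push(struct msg_queue *q, int item)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->len < QUEUE_CAP) {
        q->items[(q->head + q->len) % QUEUE_CAP] = item;
        q->len++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Consumer side: returns false when there is nothing to process. */
bool queue_try_pop(struct msg_queue *q, int *out)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->len > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->len--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

Note that the lock is held only for the few instructions needed to move one item, so neither thread waits on the other's actual work.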

Bones McCracker · Veteran Joined: 14 Mar 2006 Posts: 1611 Location: U.S.A.

I'm fiddling to learn, but am working on a specific requirement. I was trying to avoid overwhelming people with unnecessary details, but you asked for it, so:

I'm a novice programmer, doing this for learning value. My toy application runs an infinite loop, which collects various system status information (takes about 0.05 sec), and then outputs that information. (I use it to put system status at the top of the root window in my window manager.) The trick is that part of that information is the time, and ideally the output should be generated exactly once each second.

Since the work takes a variable amount of time, I can't simply use a single, sequential loop that delays the appropriate amount of time, does the work, outputs. I had been doing this, and in actuality this works, with the output coming apparently once per second, but if you sit there and stare at it long enough, you'll see it seem to jump ahead by a second. (Once every few minutes if using sleep(), which has a resolution in seconds. Could refine this by using nanosleep, but I'd rather see if I can't find a precise, light solution).

I am far too obsessive and anally retentive to have my clock doing shit like that.

So what I've got right now is the application creating one pthread to accurately measure 1 second of elapsed real time, with the main thread doing its work but then waiting to actually output the result until the parallel thread has measured exactly 1 second.

Both the main thread and the parallel thread are based on infinite loops. (I'm not spawning a thread, waiting for it to complete, then spawning a thread ... over and over again every second, which I assume would be inefficient).

So, there are two aspects to my problem:

a. There's the immediate question of how to most efficiently and reliably communicate between the threads that a second has elapsed and it's time to go ahead and push the output. I have considered:
> just using a simple global variable set by the timer thread, with the main thread idling in a loop until it is activated (did a working prototype of this)

> using the pthread-provided "condition variable" constructs (pthread_cond_signal(), pthread_cond_wait(), etc.), which have the advantage of being atomic operations and having integrated mutex management (started implementing this but stopped, thinking "this is awfully complex for a requirement so simple")

> using a semaphore: sem_post() by the timer thread and sem_wait() in the main thread before pushing the output (actually implemented this and it's been running on my machine for about two days, but I feel I should have protected the semaphore from simultaneous modification by the main and timer threads, especially since I'm not sure the sem_post() and sem_wait() operations are atomic)

> dmitchell's suggestion of openmp (after 15 minutes of scanning it, also seems like overkill for this, although perhaps not in reality any more heavy than pthreads)
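For what it's worth, the semaphore option can be sketched as below. POSIX defines sem_post() and sem_wait() as atomic operations on the semaphore, so no mutex is wrapped around them here. This is only a sketch under assumptions: the names tick, ticker, and run_ticks are made up for illustration, the timer thread uses a plain nanosleep(), and the loop stops after a fixed number of ticks so that it terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static sem_t tick;                  /* thread-shared semaphore (pshared = 0) */

struct ticker_args {
    long interval_ns;               /* time between ticks */
    int count;                      /* number of ticks to produce */
};

/* Timer thread: sleep for one interval, then signal the main thread. */
static void *ticker(void *p)
{
    struct ticker_args *a = p;
    struct timespec ts = {
        .tv_sec = a->interval_ns / 1000000000L,
        .tv_nsec = a->interval_ns % 1000000000L
    };
    for (int i = 0; i < a->count; i++) {
        nanosleep(&ts, NULL);
        sem_post(&tick);            /* atomic increment; no mutex needed */
    }
    return NULL;
}

/* Main thread: do the work, then block until the timer thread says go.
   Returns the number of ticks seen, or -1 on setup failure. */
int run_ticks(long interval_ns, int count)
{
    pthread_t tid;
    struct ticker_args args = { interval_ns, count };
    int seen = 0;

    if (sem_init(&tick, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, ticker, &args) != 0) {
        sem_destroy(&tick);
        return -1;
    }
    while (seen < count) {
        /* collect the status information here */
        if (sem_wait(&tick) == 0)   /* retried if interrupted by a signal */
            seen++;
        /* push the output here */
    }
    pthread_join(tid, NULL);
    sem_destroy(&tick);
    return seen;
}
```

Calling run_ticks(1000000000L, 10) would produce ten one-second cycles. On older glibc versions this needs to be built with -pthread.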

b. And then there's the aspect of "this is a stupid way of satisfying my basic requirement" (which I didn't really ask about, but any thinking person will simultaneously consider):

> was originally using a single-threaded single process with "sleep(1)" inline. This is acceptable, given that the work actually only takes an eyeblink, but it feels like a kludge to me.

> considered refining the delay to be (1 sec) - (avg work time), but that's kludgy too. Even considered computing avg work time on the fly as a moving average for this purpose (still kludgy, plus added work to be done every second)

> decided to try using a timer thread in parallel, with output being synchronized to the timer thread, at 1 second intervals. My main problems with this are:

a. displayed time is always nearly a second behind reality; this sucks; I want accurate time, but I also want output at exactly 1 second intervals! :lol:

b. additional thread has its own overhead (less than an additional process, but it has its own stack, etc.). After looking into this, it's my understanding at this point that a thread doesn't actually make its stack memory unavailable to other uses (it's an upper limit, not an allocation, but I'm still not confident I understand this correctly).

As I write this, I am thinking I can probably use a system call to some kind of timer that will suspend the process and then signal it when the interval has elapsed (e.g. a POSIX timer or Linux HRT, which I haven't looked into yet).

Here is my actual code at this point (I know it doesn't do any error-checking; it's for my own use so I'm not testing things that I know work):

Prenj · n00b Joined: 20 Nov 2011 Posts: 16

The problem with sleep(x) is that each cycle actually takes x plus the execution time, so the loop drifts.

so you could do:

thread one:
get stats, put into struct, put struct onto async queue

thread two:
pop struct from the queue and display it
take time
nanosleep(diff between next second and time taken) //so you end up waking up on next tick
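The nanosleep step for thread two might look roughly like the following. This is a sketch, assuming CLOCK_REALTIME as the time source and a one-second tick; the helper names until_next_second and sleep_to_next_tick are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Time remaining from `now` until the next whole second. */
struct timespec until_next_second(struct timespec now)
{
    struct timespec rem;
    long left = NSEC_PER_SEC - now.tv_nsec;

    rem.tv_sec = left / NSEC_PER_SEC;   /* 1 only when now.tv_nsec == 0 */
    rem.tv_nsec = left % NSEC_PER_SEC;
    return rem;
}

/* Sleep until the wall clock rolls over to the next second.
   Returns 0 on success, -1 on error. */
int sleep_to_next_tick(void)
{
    struct timespec now, rem;

    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return -1;
    rem = until_next_second(now);

    /* If a signal interrupts the sleep, nanosleep() stores the unslept
       time back into rem, so the loop resumes with what is left. */
    while (nanosleep(&rem, &rem) != 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

Because the sleep length is recomputed from the clock on every cycle, the time spent popping and displaying the struct does not accumulate as drift.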

For queues, you could use ZeroMQ (zeromq.org).

tomk · Posted: Sat Feb 02, 2013 10:15 pm Post subject:

Moved from Off the Wall to Portage & Programming at BoneKracker's request.

Bones McCracker · Veteran Joined: 14 Mar 2006 Posts: 1611 Location: U.S.A.

Thank you; you've given me some good ideas to work with.

Dr.Willy · Guru Joined: 15 Jul 2007 Posts: 547 Location: NRW, Germany

For another idea to work with see 'man timer_create'
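A rough sketch of what the timer_create route could look like in a single-threaded program. It assumes a periodic CLOCK_MONOTONIC timer that delivers SIGRTMIN, with the signal blocked and consumed synchronously via sigwaitinfo() rather than through a handler. The helper name wait_for_ticks and the choice of signal are illustrative, not anything prescribed by the man page.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/* Arm a periodic timer and block until `count` expirations have been
   received. Returns 0 on success, -1 on error. */
int wait_for_ticks(long interval_ns, int count)
{
    sigset_t set;
    struct sigevent sev = {0};
    struct itimerspec its;
    timer_t timer;

    /* Block the signal so it stays pending until sigwaitinfo() takes it. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGRTMIN;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    /* First expiry after one interval, then repeat at the same period. */
    its.it_value.tv_sec = interval_ns / 1000000000L;
    its.it_value.tv_nsec = interval_ns % 1000000000L;
    its.it_interval = its.it_value;
    if (timer_settime(timer, 0, &its, NULL) != 0) {
        timer_delete(timer);
        return -1;
    }

    for (int i = 0; i < count; i++) {
        if (sigwaitinfo(&set, NULL) < 0) {
            timer_delete(timer);
            return -1;
        }
        /* one period has elapsed: collect and print the status here */
    }

    timer_delete(timer);
    return 0;
}
```

Since the kernel re-arms the timer itself, the period does not depend on how long the work in the loop takes, as long as the work finishes within one interval. In a multi-threaded program pthread_sigmask() would replace sigprocmask(), and older glibc versions need -lrt at link time.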

Bones McCracker · Veteran Joined: 14 Mar 2006 Posts: 1611 Location: U.S.A.

Good tip, thanks.

I modified the application to be single-threaded, added a "ring buffer" (of scale 2) to hold timevals, and started taking a timestamp at the end of each loop and then using usleep to sleep for the delta between the elapsed time and 1 second.

This eliminated the "stale data" problem, with the resulting data being no more than about 0.0003 sec old. To reduce thrashing, I had it also calculate the error (i.e., how close the iteration just completed got to exactly 1 sec), and then used that error to adjust the usleep interval for the subsequent iteration.

That rapidly (10 iterations) converged on an average error of about 15 usec. By dampening the adjustment (only correcting by 1/2 the error value each iteration), it took a few more cycles (18 or so) to converge, but reached an average error of about 8 usec, which is better than acceptable.
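The damped correction described above can be sketched as a single function. This is a guess at the shape of the calculation, not the actual code: the name damped_sleep_us, the microsecond units, and the fixed gain of 1/2 are assumptions taken from the description.

```c
/* Given the sleep used in the cycle just completed and the measured
   length of that cycle, return the sleep to use for the next cycle.
   Only half of the observed error is corrected each time, which damps
   the adjustment and avoids overshooting back and forth. */
long damped_sleep_us(long prev_sleep_us, long measured_cycle_us,
                     long target_cycle_us)
{
    /* Positive error means the last cycle ran longer than the target. */
    long error_us = measured_cycle_us - target_cycle_us;
    long next_us = prev_sleep_us - error_us / 2;

    return next_us < 0 ? 0 : next_us;   /* never request a negative sleep */
}
```

For example, if the last cycle slept 950000 usec and was measured at 1000100 usec against a 1000000 usec target, the error is 100 usec and the next sleep becomes 949950 usec.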

This variability appears to be relatively independent of what else is going on (as far as user activity), so I assume it's due to competition for some kernel / hardware resource that needs to be scheduled (and hopefully it's not a clock related resource I'm introducing contention for by measuring and sleeping).

I could probably improve that further by using a weighted moving average of recent error values, but I don't think I'll bother.

So this fixes both the data currency and interval accuracy problems. It's a little more complex than I think is appropriate for this requirement, so I may do something simpler (like the sleep() loop in a parallel thread, or maybe a timer and signals or something), but I'm learning stuff.

I'll now look into reimplementing this same algorithm using a timer instead of gettimeofday() and nanosleep().

I think the answer to my mutex question is "yes" (always protect shared memory). This other challenge is a "have your cake and eat it too" problem. If you want precisely-timed cycles, you must delay output until exactly the 1 sec mark (thereby making the data stale by a fraction of a second). If you want data currency, you have to estimate the time to start the operation, so output will arrive approximately at the 1 sec mark, but the cycle won't be exactly 1 sec. What I see from this experiment, though, is that I can do the latter and get within a 10 usec error on cycle time, with data that is no more than 250 usec old. That's an acceptable outcome.

One alternative would be a hybrid approach that would assemble the output just a tad earlier, then produce it precisely on time, but I think the tradeoff as-is is preferable.

Thanks for the help.