Gentoo Forums
Some Perl help needed
Aquous
l33t


Joined: 08 Jan 2011
Posts: 700

Posted: Tue Nov 01, 2011 6:42 pm    Post subject: Some Perl help needed

Hello,

For a homework assignment (statistics, yay...) I need to generate two sets of data. Both data sets need to consist of 30 fictional test subjects, who underwent different treatments. One treatment yields a mean score of 6 on some scale, the second treatment yields a mean score of 7. As a frustrating extra bonus my teacher wants the within-groups variation to be realistic, so I can't just give every test subject a 6 or a 7.
Once I've generated those two fictional data sets I need to do some nasty stuff to them using SPSS which I won't bother you with :P

So I thought: let's solve this first part of the problem using the computer. If I have 30 subjects, the scores of the first 29 can be practically any value, but the last subject needs to score one specific number to bring the group mean to exactly 6 or 7. I have successfully written a Perl program that can generate this fictional data for one group of test subjects.
However, I need to generate the data for two groups. I will also need the data in a matrix of 30 rows by 3 columns (participant number plus one score column per group). In other words: I need to generate the fictitious data, store it in an array result by result, do this again to get two arrays containing different results, then print the first element of each array on one line, then the second, et cetera.
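The "29 free scores plus one forced score" idea could be sketched roughly as follows. This is not the code from the pastebin: the helper name scores_with_mean and the $spread parameter are made up for illustration, and the free scores are simply drawn uniformly around the mean.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the original program.
# Returns $n scores whose mean is $mean (up to floating-point rounding):
# the first $n-1 are drawn uniformly from [$mean-$spread, $mean+$spread),
# the last one is whatever value makes the total come out right.
sub scores_with_mean {
    my ($n, $mean, $spread) = @_;
    my @scores = map { $mean - $spread + rand(2 * $spread) } 1 .. $n - 1;
    my $sum = 0;
    $sum += $_ for @scores;
    push @scores, $n * $mean - $sum;   # the forced last score
    return @scores;
}

my @group = scores_with_mean(30, 6, 2);
printf "%.2f\n", $_ for @group;
```

Note that in this naive version the last score absorbs the accumulated deviation of all the others, so it can land well outside the range used for the first 29. A program that wants every score to stay within a fixed distance of the mean needs an extra step that this sketch leaves out.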

I can't get that to work.

The code is uploaded here: http://pastebin.com/GXKM81Ae

So, a quick walkthrough of what I'm trying to do in the program, because it is a bit dense and I didn't comment it (bad practice, I know):
The program expects to be given as command-line arguments the number of participants (30 in this case) and all means it should come up with (6 and 7 in this case). It then creates an empty array @pointers. This array will contain pointers/references to the arrays containing the results, which are calculated in the subroutine 'calculate'. After that (starting at line 13) it starts a counter at 0, prints the value of this counter with 1 added to it (the participant's number) followed by a tab and the $i-th element of each array whose memory location is stored in the array @pointers. (Complicated, I know.)
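The structure described in this walkthrough (an array of array references, printed row by row) might look something like the sketch below. The body of calculate here is only a stand-in that returns constant scores, and the hard-coded argument list replaces @ARGV so the example is self-contained; the names @pointers and calculate come from the description above, the rest is assumed.

```perl
use strict;
use warnings;

# Stand-in for the real score generator: returns a reference to an
# array of $n scores. Constant scores are used only to keep this short.
sub calculate {
    my ($n, $mean) = @_;
    my @scores = map { $mean } 1 .. $n;
    return \@scores;
}

my ($n, @means) = (3, 6, 7);                 # stands in for @ARGV
my @pointers = map { calculate($n, $_) } @means;

# One output line per participant: number, then one score per group.
for my $i (0 .. $n - 1) {
    print join("\t", $i + 1, map { $_->[$i] } @pointers), "\n";
}
```

With these stand-in values each of the three output lines holds the participant number followed by 6 and 7, separated by tabs.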

The problem: all generated arrays of scores are identical.
e.g. if you run the program as "./program 5 1 2 3 4" it should yield 4 arrays with means of 1, 2, 3, and 4. Each array should contain 5 scores that deviate from its mean by no more than 2 points and that sum to 5 times that mean (5, 10, 15, and 20). That doesn't happen, however: I get one array of scores which is duplicated four times.

Can anyone spot the (probably very obvious) mistake?
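Without the pastebin code this can only be a guess, but one common way to end up with the same array repeated is for the subroutine to fill a single array declared outside of it and return a reference to that array on every call. All stored references then point at the same memory location, and every column shows whatever was written last. The sketch below contrasts that with a sub that declares a new array with my on each call; both sub names are hypothetical and the scores are constants for brevity.

```perl
use strict;
use warnings;

my @shared;                               # one array, declared outside the sub
sub calculate_shared {
    my ($n, $mean) = @_;
    @shared = map { $mean } 1 .. $n;      # overwrites the previous contents
    return \@shared;                      # always the same reference
}

sub calculate_fresh {
    my ($n, $mean) = @_;
    my @scores = map { $mean } 1 .. $n;   # a new array on every call
    return \@scores;
}

my @bad  = map { calculate_shared(2, $_) } 1 .. 3;
my @good = map { calculate_fresh(2, $_) } 1 .. 3;

print "shared: ", join(" ", map { $_->[0] } @bad),  "\n";   # shared: 3 3 3
print "fresh:  ", join(" ", map { $_->[0] } @good), "\n";   # fresh:  1 2 3
```

If the real program shows this pattern, declaring the result array with my inside the subroutine should be enough to get distinct arrays.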
BitJam
Advocate


Joined: 12 Aug 2003
Posts: 2508
Location: Silver City, NM

Posted: Wed Nov 02, 2011 8:00 am    Post subject: Re: Some Perl help needed

Aquous wrote:
For a homework assignment (statistics, yay...) I need to generate two sets of data. Both data sets need to consist of 30 fictional test subjects, who underwent different treatments. One treatment yields a mean score of 6 on some scale, the second treatment yields a mean score of 7. As a frustrating extra bonus my teacher wants the within-groups variation to be realistic, so I can't just give every test subject a 6 or a 7.

I think you might want to use something like a Box–Muller transform which allows you to create a normal distribution (Gaussian) from a uniform distribution (rand function). If only integers are allowed or if negative results are disallowed then you might not want to use a Gaussian distribution.
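As a rough illustration, the basic form of the Box–Muller transform can be written in a few lines of Perl. The sub name gaussian and the standard deviation of 1 are assumptions made for this example, not something from the assignment.

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Basic Box-Muller transform: turns two uniform random numbers into
# one draw from a normal distribution with the given mean and sd.
sub gaussian {
    my ($mean, $sd) = @_;
    my $u1 = 1 - rand();          # in (0, 1], so log() never sees 0
    my $u2 = rand();
    my $z  = sqrt(-2 * log($u1)) * cos(2 * PI * $u2);
    return $mean + $sd * $z;
}

my @scores = map { gaussian(6, 1) } 1 .. 30;   # sd of 1 is an assumed value
printf "%.2f\n", $_ for @scores;
```

The mean of 30 such draws will be close to 6 but not exactly 6. If an exact sample mean is required, one option is to add the difference between the target and the sample mean to every score afterwards.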

IOW, you need to choose what kind of distribution would be realistic and then find a function that transforms a uniform distribution (the output of rand) to the distribution you want. Depending on the desired distribution, this could be difficult. There are probably other ways to simulate a distribution but they may not be as computationally efficient.

For example, you could take advantage of the central limit theorem and use the average or sum of many uniformly distributed random numbers to simulate a Gaussian distribution. This is less efficient because for each person in the simulated study you would need to sum or average the results from many calls to rand().
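A minimal sketch of that idea: the sum of 12 calls to rand() has mean 6 and variance 1, so subtracting 6 gives a value that is approximately standard normal. The sub name approx_gaussian and the standard deviation of 1 are assumptions for illustration.

```perl
use strict;
use warnings;

# Central-limit approximation of a normal draw: 12 calls to rand()
# per score, instead of 2 for Box-Muller.
sub approx_gaussian {
    my ($mean, $sd) = @_;
    my $sum = 0;
    $sum += rand() for 1 .. 12;   # sum of 12 uniforms: mean 6, variance 1
    return $mean + $sd * ($sum - 6);
}

my @scores = map { approx_gaussian(7, 1) } 1 .. 30;
printf "%.2f\n", $_ for @scores;
```

One side effect of this approximation is that a score can never be more than 6 standard deviations away from the mean, which may or may not matter for the data at hand.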