Data Analysis


Home Up Experimental Setup Experiment Data analysis Discussion

Technique 1 - Poisson distribution and error analysis

Firstly, consider an experiment that you toss a coin by n time and the probability to obtain a head is p. The probability to obtain i head is nC× p× (1-p)n-i . When the coin is not fair (p is very small), and number of toss is large (n is very large), the expression is approximately e-np × (np)i / i!. You can prove it!

Consider cosmic ray particles fall on our detectors. The average number of events in a period t to be r.  Subdivide the interval t into a large number n of subintervals. (Each small interval is dt = t/n)

Now, in n subintervals, we can see that it is most likely to find 1 event or zero. (Finding 1 event in dt is diffcult, let the probability be p, but finding 2 events is much more diffcult than finding 1 event and so forth for 3,4,5, ..., so we neglect the chance for more than 1 event). Probability of finding one event in dt is p (p is small), we have n dt so we have to find the event n times (n is large). The story exactly likes tossing a unfair coin with many times. So the probablity finding i events in time t (n ×dt) is e-np × (np)i / i!.

By Mathematics only, we know mean number of event in time t is n×p, and standard deviation is square root of n×p. You can prove it!

n×p should be r×t because r is assumed to be the rate of event. An well known result is follow, the probability of finding i cosmic ray particles in time period t with rate r is e-rt ×(rt)i / i!. That is so called Poisson distribution of cosmic ray events.

For error analysis in our particular problems, we need to step further to ask the following question. Suppose in time period t, we measure i events, we want to ask what should be the rate r?

The probability for the rate equals to r in time duration t and i events are measured is also given by e-rt ×(rt)i / i!. That is inverse Poisson distribution. (Do not forget you treat r as variable at this time) Mean r is (i+1)/t, (NOT i/t) and standard deviation of r is Square root of (i+1) / t. It can be seen that for large values of p and n, inverse Poisson distribution is very similar to ordinary Poisson distribution.

Example:

If n counts are observed during a time t, then the observed rate is R = n/t. Standard error of n is vn provided that n >> 1. 

1. there is a probability of 68% that average number p×t is inside the interval n - vn and  n + vn

2. the probability of p×t being outside the interval  n - 3vn and  n + 3vn is only 0.33%.

Therefore the average rate p is likely to be inside R(1-1/vn) to R(1+1/vn), it is unlikely to be outside R(1-3/vn) to R(1+3/vn)

Hence the half error bar is R×(1/vn) long. See the figure shown below.

Technique 2 - data binning

After your apparatus collect cosmic ray event information for a long time, you will have the freedom to select the number of bins to represent your data. In this case, you are going to plot the event rate against time, then each bin corresponse to a time interval. When choosing a smaller bin, you will group datum at a short period, hence you may have more data points. The following example show you the result about count rate plot at 11/18/2000 by a choice of bin 1 minute. Since it is a 7 hours experiment, there will be 420 bins.

There is one main disadvantage. That is the uncertainty for each data point is high (Why ?). So you can see the error bar is long on the graph shown above. One half of error bar represent the standard deviation for such measurement. That means there is 68% probability that event rate p at the time lies between N(t) - vN(t) and  N(t) + vN(t). So it is hard to draw conclusion about whether there is variation on event rate or not. In addition, it is a mess if you put too many data points on one single graph.

The next graph show a count rate plot at 11/18/2000 by a choice of bin 10 minute. The error bar is shorten by a factor of 1/3.

The graph is much clear and concise now. Be careful that our error analysis based on the believable that the incoming events follow a simple Poisson distribution. If you choose a bin which size is too large (e.g 1 day), when you calculate the error bar by formula square root of N, you may believe events within the day follow Poisson distribution with single mean. At last, you conclude the error bar is extremely small and your data point is very reliable. There is nothing wrong if you can convince yourself that you are just interested in the mechanism which releases cosmic ray in Poisson distribution with some rate for whole day scale. Otherwise, there is a loss of precision (applied statistical assumption wrongly). So, choose your Bin size wisely according to the time scale you interested in. With your application program PCI6602Monitor.exe, you can choose a smaller bin size first and do merging by EXCEL.

Technique 3 - Curve fitting technique and testing periodicity

In the experiment, the rates of showers are recorded as a function of time. Here lists a statistical method to verify whether the rate exhabit periodicity. The idea is following. Suppose you wish to search periodicity for cosmic ray data for one day. First of all, you assume your count rate plot is sine wave the period of one day. As you varies the amplitude and phase of sine wave, the different between your data plot and sine wave will be alter. You responsiblity is to find the most suitable amplitude and phase such that the sine wave is closest to your count rate plot. It seem that it carries a bundle of task to search such sine wave but it is actually not. Some mathematical skill may help you to simply your task. As a result, the amplitude and phase of sine wave are simply given by formulae. Once you are able to obtain the amplitude of sine wave with period one day, you can also obtain that of half day, one quarter day. You obtain a list of amplitudes for sine wave with different periods. Suppose count rate of extensive air shower at day time is more than that at light time, do you think fitting a sine wave with period one day is easier than that with period 1/4 day? Actually, the action you did in curve fitting is to decomposite your original count rate plot into different weight of sine wave with different period. In other words, your count rate plot can be constructed by adding different weight of sine wave with different period. This is the method so called Fourier analysis.

It is better to use a concrete example to illustrate the method. Suppose, you have count rate record for two day, you want to find out whether the rate varies as day time and light time. That means you are searching periodicity of one day.

Let hourly counts rate be X0, X1, ... X47 corresponding to 48 hourly intervals. We wish to analyse this data into its Fourier components particularly for first harmonic, corresponding to its diurnal term.

For the diurnal term, we can calculate the amplitude and phase of the best sine wave through the 48 points for each period of 24 hours.

Let us express hours in degrees so that 24 hours corresponds to 360o. Then we can write for the best sine wave in this form  F(Y)=Xmean + A×sin(Y)+B×sin(Y). When Y = 720o, it represents 2 whole days. Xmean = Summation(Xi), where i = 0,...47. We want to find A and B such that the sine wave F(Y) can estimate Xi very well.

How to measure whether a function fits the discrete data points? At i th hour, count rate is Xi, the value of the function is F(i×15o). (So when i = 47, i = 705o around 2 day) The difference between F(i×15o) and Xi measures how well the estimation at that time. To avoid the cancellation of the effects between point to point, we may use square of difference to estimate the distance between F(Y) and Xi. Hence, the effect will be accumulated only. Combine the such square of difference of 48 points, it gives a rough estimation of how fit for our F(Y). Technically, we say we want to tone A and B such that Summation of Square of (F(i×15o) - Xi ) is minimized.

Let S(A,B) = Summation of Square of (F(i×15o) - Xi ), dS/dA=0, dS/dB=0 gives the minimizing condtion. If you find a particular A and B such that dS/dA=0, dS/dB=0, S is possibly minimun there, but if your choice of A and B do not satisfy dS/dA=0, dS/dB=0, S is not yet minimized.

dS/dA = 0 gives A0 = 1/24×?i(Xi ×sin(i×15o)) where i = 0,1,...47

dS/dB = 0 gives B0 = 1/24×?i(Xi ×cos(i×15o)) where i = 0,1,...47

It is a good exercise for you to check the calculations.

Actually, you can find a set of A0 and B0 satisfying dS/dA=0, dS/dB=0. It is easy to verify S is really minimized.

In conclusion, F(Y)=Xmean + A0×sin(Y)+B0×sin(Y) is the best estimation.

You can handle the calculation by EXCEL, see the graph shown below.

In this example, the best sine wave is F(Y)=0.54457-0.0032sin(Y)-0.0012cos(Y) give the best approximation of the data points in column B.

Some formulas (SIN,COS,SUM) in EXCEL is involved in the calculation. Notice that argument passing to trigonometric function is in radian. So i×15o is represented as i×PI/12.

The amplitude of the sine wave is given by Square root of (A02+B02) = 0.0035

Of course, you can extend your method to fit a best sine wave of period half day, 1/4 a day, ... etc. You will obtain new Ao and Bo for each harmonic. Remember that the spirit behind Fourier analysis is to find out the weight of different sinusoidual components for the original fluctuation of curve.

If you find the amplitude for sine wave with period one day, is larger than that of the others, it is most likely your cosmic ray datum has one day periodicity. Althrough the technique listed here may be over simplfied in some sense, it reflects how people tackle such kind of problems even in research field.