Black Swans in the Cyber Space

Cyber security
May 28, 2012
 

This article was written by us for the IsraelDefense magazine and was published in May 2012.

 

How can cyberspace be protected in advance from sudden events with far-reaching implications? Gil David examines the phenomenon of the “Black Swan”.

In the 16th century, when people wanted to say that something was impossible, they used the term “black swan.” The expression described an event that could not happen in reality.

According to historical evidence, people at the time believed that swans had only white feathers; ergo, a black swan could not exist. Then, in the seventeenth century, the world was stunned to learn that black swans had been found in remote Australia, and the categorical assumption that black swans were impossible was abandoned.

In 2007, the Lebanese-American philosopher Nassim Taleb presented his own black swan theory after several years of work. Taleb defines black swans as events that are random and unexpected. In other words, a black swan is a high-impact, low-frequency event: its likelihood of happening is low, but its influence on the future is extreme.

In our time, a classic case of a black swan is the September 11, 2001 terrorist attack on the World Trade Center and the Pentagon in the US. This event meets all the criteria that define a black swan. It was unique. Whoever watched it, no matter where, was shocked. Its repercussions are still felt today, especially in airport security: the level of protection has risen dramatically, and governments continually upgrade their security measures. This trend has had a powerful impact on how passengers are handled and requires enormous resources.

Worms and Swans

One of the most significant cyber warfare events of recent years was the Stuxnet worm, which infiltrated Iran’s nuclear facilities. Cyber security experts widely agree that Stuxnet attacked the centrifuges’ control systems and tampered with their operating instructions, altering the centrifuges’ rotation speeds and causing them to crack and break apart.

Stuxnet can be defined as a black swan for a number of reasons. First, it came as a complete surprise. Nuclear facilities are tightly guarded against physical and cyber threats. Their communication networks are isolated from the Internet and buried several meters underground. In addition, the facilities’ production network is controlled by SCADA (Supervisory Control and Data Acquisition) systems, and until the Stuxnet penetration, almost no attacks aimed specifically at such systems had been recorded. Despite the enhanced security measures and the isolation from external networks, the worm made its way into the facility’s control systems so sophisticatedly, and wreaked so much havoc at the facility’s innermost core, that everyone was caught by surprise. What appeared to be an impossible mission for the Stuxnet designers was carried out brilliantly and with craft, leaving the Iranians awestruck.

Second, both in practical terms and as a blow to confidence, the worm’s effect on the Iranian nuclear program was immense. Some pundits claim that the attack set the nuclear project back by months, even years. Following the event, the Iranians decided to base their software on code they developed themselves, without recourse to any external code that could harbor more worms. This required special preparations, such as training engineers and allocating costly resources, and it meant a setback for development plans. On the international level, Stuxnet had a powerful impact on cyber defense, forcing vast sums to be diverted to improving countermeasures. In this way, it caused states and governments to reconfigure their security concepts and highlighted the need for a significant change in how they prepare for future cyber threats.

Third, in hindsight, the warning signs were there. In recent years, there have been many reports of Trojan horses exploiting zero-day vulnerabilities (previously unknown weaknesses in software), backdoor attacks (which circumvent normal authentication), and other malware designed for targeted attacks against organizations and facilities. Another technique that has been around for several years is infecting a network from an external device (such as a USB flash drive, or disk-on-key), which bypasses the defense mechanisms that deny unauthorized access. Human agents have been used to carry out attacks (for example, by infecting a network with a worm), and social engineering has been employed to evade sophisticated security mechanisms. There were even some reports that attacks could be launched against SCADA systems. Like every black swan, Stuxnet looks almost predictable after the fact.

The West is determined to impede the Iranian nuclear project at almost any price. The Stuxnet worm was indeed a black swan: the first major one seen in the cyber world, and a harbinger of things to come in cyberspace. The challenge is to prevent this kind of attack on our own systems. One solution is to identify the weak points in our systems and turn a black swan into a white one. This is the only way we can protect our most sensitive systems and prepare for the cyber war that looms on the horizon.

 

The magazine version of the article (image).

Posted at 1:10 pm

“Hot hand” debate is warming up

Science
Jan 12, 2012
 

What is it all about?

Anyone who has ever watched a sports competition is familiar with expressions like “on fire”, “in the zone”, “on a roll”, “momentum” and so on. But what do these expressions really mean? In 1985, when Thomas Gilovich, Robert Vallone and Amos Tversky first studied this phenomenon, they defined it as follows: “…these phrases express a belief that the performance of a player during a particular period is significantly better than expected on the basis of the player’s overall record”. They concluded that what people perceive as a “hot hand” is essentially a cognitive illusion caused by a misperception of random sequences. Until recently there was little, if any, evidence to refute their conclusion. However, increased computing power and newly available data from various sports now provide surprising evidence for this phenomenon, reigniting the debate.

 

Hot Hand (Image: Gur Yaari)

To understand what “expected” means, let us restrict the current discussion to results that can be classified in a binary way, namely as successes and failures. In the baseline model, the result of each trial is drawn at random with some probability of success. In this framework, the phrase “expected on the basis of the player’s overall record” means that the probability of success in each trial is independent of previous results and constant over time.
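To make this baseline concrete, here is a minimal Python sketch of the null model just described: independent trials with a constant success probability. The function names and the 75% success rate are illustrative assumptions, not taken from any of the studies discussed here. Under this model, the success rate after a success should match the success rate after a failure.

```python
import random

def simulate_null(p, n, seed=0):
    """n independent trials, each a success (1) with the same probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def conditional_rates(series):
    """Return (P(success | previous success), P(success | previous failure))."""
    pairs = list(zip(series, series[1:]))
    after_hit = [cur for prev, cur in pairs if prev == 1]
    after_miss = [cur for prev, cur in pairs if prev == 0]
    return sum(after_hit) / len(after_hit), sum(after_miss) / len(after_miss)

shots = simulate_null(p=0.75, n=100_000)
print(conditional_rates(shots))  # both values close to 0.75 under the null model
```

A “hot hand” in this framework would show up as the first rate being systematically higher than the second.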

Gilovich, Vallone and Tversky argued that basketball shooting sequences are indistinguishable from repeated tosses of a biased coin (a coin whose probability of success may differ from 50%). Although their work was extremely influential in the scientific community, their conclusions were highly controversial, as the vast majority of sports fans remained confident that players are sometimes indeed “on fire”. Amos Tversky described the situation this way: “I’ve been in a thousand arguments over this topic, won them all, but convinced no one.” Stephen Jay Gould wrote: “Everybody knows about hot hands. The only problem is that no such phenomenon exists.”

Could it be that the fans were right after all? The answer is a little complicated and depends on the specific task, but the data suggest that the hot hand does exist.

One major complicating factor when studying this phenomenon is the presence of an opponent. The probability of success no longer depends only on the player’s skills; it is also confounded by the performance and strategy of the opposing players. A player who “gets in the zone” is likely to change the defensive strategy of the opposing team, making it more difficult for him to perform. Moreover, different opponents have different skills, leading to tasks of varying difficulty. All these factors make testing for the hot hand very difficult and require more complex models. These confounding factors can be avoided by studying tasks with minimal external interference.

So how can one distinguish between a “purely random series” (essentially repeated coin tosses) and something else? This is where statistics comes in handy. Without delving into the technicalities of the different statistical tests, we would like to note one crucial point: the fact that a statistical test does not detect a phenomenon does not mean that the phenomenon does not exist. Most statistical tests are designed to reject a null hypothesis; failing to reject it does not mean that the null hypothesis is correct. The test may not be sensitive enough for the type of data and phenomenon being tested, or the data may be insufficient to yield a definite answer. In the case of the hot hand, it turns out that both apply: many of the papers that studied this phenomenon used inadequate tests, and in many cases the data were insufficient (see a wonderful presentation by Nobel Laureate Brian Josephson about this type of recurring error, and also the following two papers about the inadequate tests used to detect the hot hand phenomenon).
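As an illustration of test sensitivity, the following Python sketch implements the Wald-Wolfowitz runs test, a classic test for streakiness in binary sequences. It is shown as a generic example and is not claimed to be the exact procedure used in any of the papers mentioned; the function name is illustrative. The last call shows how a short series with visible streaks still produces an unremarkable z-score, which is exactly the “not detected does not mean absent” problem.

```python
import math

def runs_test_z(series):
    """Wald-Wolfowitz runs test: z-score for the number of runs in a 0/1 series.

    Strongly negative z: fewer runs than chance (streaky).
    Strongly positive z: more runs than chance (alternating).
    """
    n1 = sum(series)              # successes
    n2 = len(series) - n1         # failures
    if n1 == 0 or n2 == 0:
        raise ValueError("series must contain both successes and failures")
    runs = 1 + sum(1 for a, b in zip(series, series[1:]) if a != b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(var)

print(runs_test_z([0] * 10 + [1] * 10))                   # about -4.1: far fewer runs than chance
print(runs_test_z([1, 0] * 10))                           # about +4.1: far more runs than chance
print(runs_test_z([1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]))  # about -0.61: not significant
```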

So what’s new?

Until recently, there was practically no evidence for the “hot hand” phenomenon in sports (see review). However, as data mining and statistical methods have improved dramatically, the “hot hand” has received support in various domains. Some examples are (see also this book and website):

– Basketball free throws (see our previous paper, also described here)

– Bowling strikes (see our recent paper and a previous publication)

– Baseball hitting rates

– Volleyball

The existence of the hot hand means that an athlete’s series of performances cannot be modeled as repeated coin tosses: the observed fluctuations between good and bad periods are larger than would be expected from a purely independent random process.
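One simple way to quantify “fluctuations larger than expected” is an overdispersion check: compare the variance of per-period success counts with the variance that repeated tosses of a single coin would produce. The Python sketch below is a generic illustration, not the specific method used in the papers above; the function name and the sample counts are made up, and it assumes every period has the same number of trials.

```python
from statistics import mean, pvariance

def dispersion_index(counts, trials_per_period):
    """Variance of per-period success counts relative to the binomial variance.

    Assumes every period (e.g. a game) has the same number of trials.
    Close to 1 for repeated coin tosses; well above 1 means good and bad
    periods differ more than a single coin would produce.
    """
    n = trials_per_period
    p = mean(counts) / n                  # pooled success probability
    binomial_var = n * p * (1 - p)        # variance expected from coin tosses
    if binomial_var == 0:
        raise ValueError("need a pooled success rate strictly between 0 and 1")
    return pvariance(counts) / binomial_var

# Made-up strike counts from 8 games of 10 frames each
print(dispersion_index([3, 7, 2, 8, 5, 9, 1, 6], trials_per_period=10))
```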

Interestingly, a contradicting example was found in basketball three-point attempts, where the data actually show an “anti-hot hand”. But as mentioned above, in this setting the defensive strategy matters and is likely to influence the player’s performance: a player who has a “hot hand” will attract more attention from the defense, which can directly influence the results of future attempts.

These examples basically show correlation between current results and previous ones, so an athlete’s performance is not just repeated coin tosses. Does this mean that “success breeds success” and “failure breeds failure”, or is there something else at hand?

Correlation vs. Causation

Most people will agree that there is a significant correlation between weather conditions and the number of people carrying open umbrellas: the number increases significantly on rainy days. Does this mean that open umbrellas cause rain? Did we just find the solution for drought?

Correlation and causation are often confused, and telling them apart from statistical data alone is difficult. Human minds look for explanations and tend to misinterpret correlation as causation. To prove that one thing actually causes another, one has to perform more detailed studies rather than rely on statistical correlation alone. Correlation can hint at causation, but it is not sufficient to establish it.

Despite the purely statistical definition above, many researchers describe the “hot hand” as a kind of psychological “feedback” mechanism that changes athletes’ performance because of their recent results (causation). What we and other researchers have observed lately is a correlation between current results and previous ones. But does this mean that previous results influence players’ performance in their next attempts (causation)?

In a paper published in PLoS ONE, we (Gur Yaari and Gil David) present our analysis of the “hot hand” phenomenon in bowling data. We studied almost 50,000 bowling games extracted from the Professional Bowlers Association (PBA) website. Each game was represented as a frame-by-frame series of zeros and ones: if the bowler rolled a strike in a frame, the frame was counted as a success (1), otherwise as a failure (0).
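The encoding just described can be sketched as follows. The input format (one score-sheet string per frame, with “X” marking a strike) is a hypothetical stand-in, since the post does not describe the raw PBA data format, and counting only the first ball of the tenth frame is an assumption of this sketch.

```python
def game_to_binary(frames):
    """Encode one bowling game as a frame-by-frame 0/1 series.

    `frames` is a list of score-sheet strings, one per frame (hypothetical
    format), e.g. "X", "9/", "72". A frame is a success (1) if its first
    ball is a strike, marked "X"; otherwise it is a failure (0).
    """
    return [1 if frame.startswith("X") else 0 for frame in frames]

game = ["X", "9/", "X", "X", "72", "X", "8/", "X", "X", "XX9"]
print(game_to_binary(game))  # [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
```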

We found evidence that players have “good” games and “bad” games that cannot be explained by luck alone; that is, the series cannot be modeled as repeated coin tosses. This observation confirms the existence of the “hot hand” in bowling, in line with previous studies in this domain. In addition, our new observation shows that within each game, successes and failures (i.e. strikes and non-strikes) are not grouped together in continuous streaks; on the contrary, they are spread randomly within each game. Thus, we show that the result of one frame does not causally influence the result of the next: if a player rolled a strike in the 4th frame, this by itself does not affect the player’s performance in the 5th frame.
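The distinction between “good games” and frame-to-frame dependence can be probed by shuffling. The Python sketch below is a generic within-game permutation test, offered as an illustration of the idea rather than the procedure used in the PLoS ONE paper; the statistic (total number of runs) and the function names are illustrative assumptions. Shuffling frames inside each game keeps every game’s strike count, so good games stay good and bad games stay bad, and destroys only the order within each game.

```python
import random

def total_runs(games):
    """Total number of runs (maximal blocks of equal outcomes) summed over games."""
    return sum(1 + sum(a != b for a, b in zip(g, g[1:])) for g in games)

def within_game_permutation_p(games, n_perm=999, seed=0):
    """One-sided p-value for 'strikes cluster together inside games'.

    Fewer runs than in the within-game shuffles suggests frame-to-frame
    dependence; a large p-value is consistent with random order inside games.
    """
    rng = random.Random(seed)
    observed = total_runs(games)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(g, len(g)) for g in games]  # per-game counts kept
        if total_runs(shuffled) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)

clustered = [[1] * 5 + [0] * 5 for _ in range(20)]  # strikes bunched in every game
alternating = [[1, 0] * 5 for _ in range(20)]       # strikes never bunched
print(within_game_permutation_p(clustered))    # about 0.001: strong within-game clustering
print(within_game_permutation_p(alternating))  # 1.0: no clustering at all
```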

On the other hand, we also showed that the first observation, the existence of “good” and “bad” games (correlation), can be used to improve the prediction of the results in the last frames of a game, based on the whole series of preceding frames. In other words, if a player did well during the first 8 frames, this is probably a “good” game, and his or her probability of rolling a strike in the remaining frames is therefore higher.
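A simple way to turn “this looks like a good game” into a number is a beta-binomial model: each game has its own strike probability, drawn from a population-wide Beta distribution, and frames within a game are independent given that probability. This is a modeling choice assumed here for illustration, not the predictor used in the paper, and the prior parameters below (roughly a 60% strike rate) are made up.

```python
def predicted_strike_rate(first_frames, prior_a, prior_b):
    """Posterior-mean strike probability for the remaining frames of a game.

    Assumes each game's strike probability follows Beta(prior_a, prior_b)
    (e.g. fitted to many games) and that frames within a game are
    independent given that probability (no frame-to-frame causation).
    """
    strikes = sum(first_frames)
    return (prior_a + strikes) / (prior_a + prior_b + len(first_frames))

print(predicted_strike_rate([1] * 8, prior_a=6, prior_b=4))  # 14/18, about 0.78
print(predicted_strike_rate([0] * 8, prior_a=6, prior_b=4))  # 6/18, about 0.33
```

Note that the prediction improves through the game-level estimate alone; no frame ever influences the next one in this model.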

An analogy that may help in understanding these results is to imagine two coins:
– The first coin is a fair coin, with a 50% chance of showing heads and a 50% chance of showing tails in each toss.
– The second coin lands heads with probability 99% for 50 tosses (one day), then lands tails with probability 99% for the next 50 tosses, and so on (think of it as someone who alternates between extremely good and extremely bad days).
Now suppose you are given the results of two consecutive tosses of each coin. For the first coin, the result of the first toss does not change the fact that the second toss has a 50% probability of showing heads.

On the other hand, for the second coin (the one with the “hot hand”), if the first toss was heads, the coin is most likely having a good day. Hence, the probability that the second toss will be heads is about 98%. If you observe tails on the first toss, it is most likely a bad day, and the probability that the second toss will be heads is only about 2%.
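The ~98% and ~2% figures follow from Bayes’ rule. The Python sketch below reproduces them; the parameter names are illustrative, and it assumes that good and bad days are equally likely a priori and that both tosses fall on the same day (ignoring the rare pair that straddles a day boundary).

```python
def p_next_head(first_is_head, p_head_good=0.99, p_head_bad=0.01, p_good_day=0.5):
    """P(second toss = heads | first toss) for a coin with good and bad days."""
    like_good = p_head_good if first_is_head else 1 - p_head_good
    like_bad = p_head_bad if first_is_head else 1 - p_head_bad
    # Bayes' rule: how likely is it a good day, given the first toss?
    post_good = p_good_day * like_good / (p_good_day * like_good + (1 - p_good_day) * like_bad)
    # Same day, so the second toss uses the same (unknown) day's probability
    return post_good * p_head_good + (1 - post_good) * p_head_bad

print(p_next_head(True))                                   # about 0.9802
print(p_next_head(False))                                  # about 0.0198
print(p_next_head(True, p_head_good=0.5, p_head_bad=0.5))  # 0.5: the fair coin
```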

As you can see, both coins land heads 50% of the time in the long run, and both have good days and bad days. However, only for the second coin can one identify a “hot hand”, because of the magnitude of the fluctuations between good and bad days (good days are really good and bad days are really bad!).
Moreover, the second coin’s results are not causal: a toss comes up heads (for example) because the coin is most likely having a good day, not because the preceding toss was heads.

Our research, and the other studies mentioned here, demonstrate that players’ performance is not affected by the results of previous trials, but rather by other factors that make the resulting time series more complex than a simple series of coin tosses. This may sound trivial to some of you, but it was not the consensus in the scientific community until recently. These results may open the door for future studies to address a more important question: what really causes athletes to perform better, and how could they use this knowledge to improve their future performance?

The onion that will make you anonymous on the Internet

Cyber security
Dec 25, 2011
 

The Hebrew version of this article was published in the digital version of Haaretz newspaper.

Who needs to be anonymous on the Internet?
The Internet is not anonymous. In most cases, users and their online activities can be identified and linked to their real identities. Usually this is not a problem for the average user, but sometimes Internet users want to be anonymous: for example, a journalist who wants to communicate anonymously with his sources, users who want to bypass the censorship that their country imposes on the Internet, intelligence organizations that want to participate in forums without revealing their real identity, and a blogger who wants to post content anonymously.

Suppose that you want to leave a comment on a news website. To add your comment, you are required to write your name, e-mail address and the comment. Since you don’t want anyone to know that you left the comment, you use a fake name and a fake e-mail address and then post it.
Although you used fake details, can this comment lead back to you? Can someone link it to you (the real you) and prove that you wrote it? Did this popular “trick” of using fake details help you stay anonymous?

The simple answer is no. It will be “easy” to find out that you are the one who left the comment. The more general answer is also no: whatever similar trick you use to hide your real identity, you are not anonymous on the Internet, and if the news website really wants to find your real identity, it can. In most cases, with the help of law enforcement authorities, it can be quite easy to do so.

Anonymity in the physical world
Let’s leave for a moment the virtual world of the Internet, and give an example from the real (physical) world.
Alice, who lives in Dallas, Texas, wants to send a package to Bob, who lives in New York, New York. Alice goes to her local post office in Dallas and asks to send the package. She fills in the sender address (for example, Alice, PO Box 1111, Dallas, TX) and the recipient address (for example, Bob, PO Box 2222, New York, NY) and then sends the package.

The local post office sends the package to the post office in New York, which delivers it to Bob’s PO Box. When Bob gets the package, he knows the sender’s address. Well, actually he knows her name, PO Box, city and state, but not Alice’s real home address.

Suppose that for some reason, Bob wants to find Alice’s real details and address. Bob can call the post office in Dallas and ask for Alice’s real details. He gives them her PO Box (1111), and since they know the real details of every person who rents a PO Box, they could give Bob the answer. However, they won’t, because they protect their customers’ privacy. But what if the package contained a ticking bomb and the local police asked them for the real details? Or the government? As you can imagine, of course they would reveal Alice’s real details.

Anonymity in the virtual world
Let’s go back to the virtual world.
On 1/1/11 at 11:11, Alice wants to leave a comment on Bob’s website. When Alice connects to the Internet, her ISP (Internet Service Provider, the local post office in our analogy) assigns her an IP address (for example, 1.1.1.1). The IP address is like the PO Box in the physical-world example. Bob also has his own IP address (for example, 2.2.2.2). From now on, every activity Alice performs on the Internet will be identified by her IP address. When Alice leaves a comment on Bob’s website, she is actually sending the comment from her IP address to Bob’s (from her PO Box to Bob’s).

As in the physical world, it is possible to find Alice’s real address based on the IP address that she used. The IP address that her ISP assigned to her (1.1.1.1) is registered to that ISP, and since this registration information is public, anyone can find out which ISP it is. For example, this website shows that my current IP address (79.181.205.194) is registered to the Bezeq International ISP in Israel.
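The “which ISP owns this address?” step boils down to finding the registered address block that contains the IP. The Python sketch below uses the standard `ipaddress` module; the registry entries are invented for illustration, since real lookups go through the WHOIS or RDAP services of the regional Internet registries.

```python
import ipaddress

# Hypothetical registry excerpt: address block -> ISP it is registered to.
# These entries are made up; they are not real allocations.
REGISTRY = {
    ipaddress.ip_network("1.1.0.0/16"): "Alice's ISP (Dallas, TX)",
    ipaddress.ip_network("2.2.0.0/16"): "Bob's ISP (New York, NY)",
}

def isp_for(ip):
    """Return the ISP whose registered block contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for network, isp in REGISTRY.items():
        if addr in network:
            return isp
    return None

print(isp_for("1.1.1.1"))  # Alice's ISP (Dallas, TX)
```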

Now Bob can contact Alice’s ISP and ask for the real details of the person who used IP address 1.1.1.1 on 1/1/11 at 11:11. ISPs keep logs of the IP addresses they assign, so Alice’s ISP can easily determine that this IP address, at this date and time, was assigned to Alice, and since Alice is its customer, it has her real details. As in the physical world, due to privacy concerns the ISP won’t give these details to Bob, but if the local police or government authorities ask for them, the ISP is required to hand them over.
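The date and time matter because an ISP may hand the same IP address to different customers at different times. The Python sketch below models the ISP’s lookup as a search through an assignment log; the log format and the second customer, “Carol”, are hypothetical.

```python
from datetime import datetime

# Hypothetical assignment log kept by Alice's ISP: (ip, start, end, customer).
# The record format and the customer "Carol" are invented for illustration.
ASSIGNMENTS = [
    ("1.1.1.1", datetime(2011, 1, 1, 8, 0), datetime(2011, 1, 1, 20, 0), "Alice"),
    ("1.1.1.1", datetime(2011, 1, 1, 20, 0), datetime(2011, 1, 2, 8, 0), "Carol"),
]

def who_had(ip, when):
    """Return the customer who was assigned `ip` at time `when`, or None."""
    for addr, start, end, customer in ASSIGNMENTS:
        if addr == ip and start <= when < end:
            return customer
    return None

print(who_had("1.1.1.1", datetime(2011, 1, 1, 11, 11)))  # Alice
print(who_had("1.1.1.1", datetime(2011, 1, 1, 21, 0)))   # Carol: same IP, later time
```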

As we can see, both in the physical and in the virtual worlds, the anonymity of the sender is very limited. Now let’s see how we can improve the anonymity of our dear Alice.

How to be anonymous in the physical world
To dramatically improve her anonymity in the physical world, Alice will send her package to Bob through several people who will, hopefully, help her. Alice opens a global yellow pages directory and picks 3 random people, each from a different continent: first, Frank from Paris, France (PO Box 3333); second, Debbie from Melbourne, Australia (PO Box 4444); third, Ali from Rabat, Morocco (PO Box 5555). Alice will send her package to Frank, who will send it to Debbie, who will send it to Ali, who will send it to the final destination, Bob. Each hop on this route knows only the address of the previous hop and the next hop, and since the package travels around the world from one person to another, it will be harder to trace it back from the final destination (Bob) to the original sender (Alice). Here is how she does it:

Alice takes her package and writes Ali’s details (Ali, PO Box 5555, Rabat, Morocco) as the sender and Bob’s details (Bob, PO Box 2222, New York, New York) as the recipient. Then she puts the package inside a bigger package. She writes Debbie’s details (Debbie, PO Box 4444, Melbourne, Australia) as the sender of this bigger package and Ali’s details as its recipient, and locks it with a combination lock that only Ali knows how to open. Now Alice puts this (double) package inside an even bigger package. She writes Frank’s details (Frank, PO Box 3333, Paris, France) as the sender and Debbie’s details as the recipient, and locks it with a new combination lock that only Debbie knows how to open. Finally, Alice puts this (triple) package inside a still bigger package. This time she uses her own details (Alice, PO Box 1111, Dallas, Texas) as the sender and Frank’s details as the recipient, and locks it with another combination lock that only Frank knows how to open.

The following image illustrates what this multi-layer package looks like. Each layer has its own sender, recipient and lock:

Now Alice can send her multi-layer, multi-lock package. The first recipient is Frank. Frank gets the package, unlocks it (only he has the right combination) and sees another package inside. He can’t open it (it’s locked with Debbie’s combination), but he can see that its destination is Debbie, so he sends it to her. Debbie gets it, unlocks it (only she can) and sees another package inside. She can’t open it (it’s locked with Ali’s combination), but she can see the next destination, Ali, so she sends it to him. Ali unlocks it (only he can) and sees another package inside. This time the destination is the final one, Bob, and Ali sends him the last, innermost package. Alice uses locks to ensure that every middleman can see only the next hop on the route and nothing beyond it. So Frank can see the address of the next hop (Debbie) but not the one after it (Ali). This way, every middleman has only partial knowledge of the whole route.

Now suppose that Bob wants to trace the package back to its original sender. He knows that the package came from Ali, so first he has to go to Ali’s post office in Morocco, show them the package and ask for the details of whoever sent it to Ali. Even if they give them to him, Ali (and his post office) doesn’t know who the original sender was; he only knows that he got it from Debbie. So Bob has to ask the post office in Australia for the details of whoever sent the package to Debbie. Again, even if he gets them, he still can’t locate the original sender, since Debbie only knows that she got it from Frank. So Bob has to go to the post office in France and ask for the details of whoever sent the package to Frank. Only then can he trace it back to the original sender, Alice. Finally, Bob has to go to Alice’s post office in Texas and ask for her real details.

As you can see, to trace the package back to Alice, Bob (or the government authorities in his country) has to get help from the American, Moroccan, Australian and French authorities. Such cooperation between several countries consumes time and resources and involves diplomatic considerations as well, so the chances of it happening are very slim.

This multi-layer packaging and routing around the world dramatically increases Alice’s anonymity, and in most cases (if the route and middlemen are chosen carefully) it will be almost impossible to trace the package from Bob back to Alice.
Back to the virtual world.

How to be anonymous in the virtual world
The idea that we described for the physical world is implemented in the virtual world by a system called TOR, The Onion Router. TOR is a system that helps anyone be anonymous on the Internet.
How does it work?
First, Alice has to install the TOR software on her computer. Then, when Alice wants to connect to the Internet, the TOR software on her machine will pick 3 random TOR relay machines (like Frank, Debbie and Ali in the physical-world example). Relay machines are run by regular Internet users who, for the sake of freedom, volunteer to relay anonymous Internet traffic between TOR users. Now the TOR software on Alice’s computer takes the comment that she wants to leave on Bob’s website and wraps it in layers, as in the physical-world example. In the virtual world, the equivalent of a package is called a packet. In the innermost layer, the packet has Ali’s IP address (5.5.5.5) as the sender and Bob’s IP address (2.2.2.2) as the recipient. In the next layer, the inner packet is wrapped inside another packet that has Debbie’s IP address (4.4.4.4) as the sender and Ali’s IP address as the recipient. In the next layer, the previous packet is wrapped inside another packet that has Frank’s IP address (3.3.3.3) as the sender and Debbie’s IP address as the recipient. In the outer layer, the previous packet is wrapped inside an outer packet that has Alice’s IP address (1.1.1.1) as the sender and Frank’s as the recipient. In addition, as in the physical-world example, each layer is encrypted (locked), and only the recipient of that layer can decrypt it. Since every recipient peels off its own layer (and only its own layer) and delivers the rest to the next recipient, like peeling an onion, this concept is called onion routing.
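The layering can be sketched in a few lines of Python. This is a toy: the SHA-256-based XOR “lock” below is NOT secure and only stands in for real cryptography (Tor uses vetted ciphers with per-hop keys negotiated when a circuit is built). As a simplification of the description above, each layer carries only the next hop rather than explicit sender and recipient addresses. All function names are illustrative.

```python
import hashlib
import json
import os

def keystream(key, length):
    """Toy keystream from SHA-256(key || counter) blocks. Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(key, data):
    """XOR with the keystream; the same call locks and unlocks a layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def build_onion(message, route, destination):
    """Wrap `message` in one locked layer per relay, innermost (last relay) first.

    `route` lists (relay_ip, relay_key) pairs in forwarding order, entry relay first.
    Each layer reveals only the next hop and an opaque inner payload.
    """
    payload, next_hop = message, destination
    for relay_ip, key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload, next_hop = xor_crypt(key, layer), relay_ip
    return payload

def deliver(onion, route):
    """Simulate the relays: each peels only its own layer and forwards the rest."""
    keys = dict(route)
    hop, data = route[0][0], onion
    while hop in keys:
        layer = json.loads(xor_crypt(keys[hop], data))
        hop, data = layer["next"], bytes.fromhex(layer["data"])
    return hop, data

route = [("3.3.3.3", os.urandom(16)),   # Frank
         ("4.4.4.4", os.urandom(16)),   # Debbie
         ("5.5.5.5", os.urandom(16))]   # Ali
onion = build_onion(b"Nice post, Bob!", route, destination="2.2.2.2")
print(deliver(onion, route))  # ('2.2.2.2', b'Nice post, Bob!')
```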

For example, during my current anonymous browsing using TOR, my real IP address 79.181.205.194 (Bezeq International ISP in Israel) appeared as 178.63.97.34, which is registered to Hetzner Online in Germany. Therefore, from the virtual world’s point of view, I am in Germany, with a German IP address from a German ISP, and not in Israel.

So when Bob sees the comment, he thinks that Ali left it (since it is identified by Ali’s IP address as the sender). If he wants to trace it back to the original sender, he has to go to the ISP of the last sender (Ali), show them the packet and its details (Ali’s IP address 5.5.5.5 and the date and time the packet was sent) and ask them for the IP address of whoever sent Ali this packet (Debbie, 4.4.4.4). Once he gets it, he has to repeat the process with Debbie’s, Frank’s and Alice’s Internet service providers. So to trace back Alice’s details, Bob needs the cooperation of the Internet service providers and authorities of the USA, Morocco, Australia and France. The chances of that happening are very slim, and in many cases it is even impossible to trace back this route. In fact, tracing back such a route in the virtual world can be significantly harder than in the physical world.
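The hop-by-hop traceback can be modeled as following a chain of “who sent this to me?” records. The Python sketch below assumes the best case for Bob, in which every such record exists and is handed over; the dictionaries are hypothetical, with IP addresses and countries taken from the example above. The returned list shows the countries whose cooperation Bob needs, in the order he needs it.

```python
# Hypothetical records: for this one packet, who delivered it to each address.
PREVIOUS_HOP = {
    "2.2.2.2": "5.5.5.5",  # Bob received it from Ali
    "5.5.5.5": "4.4.4.4",  # Ali received it from Debbie
    "4.4.4.4": "3.3.3.3",  # Debbie received it from Frank
    "3.3.3.3": "1.1.1.1",  # Frank received it from Alice
}
COUNTRY = {"5.5.5.5": "Morocco", "4.4.4.4": "Australia",
           "3.3.3.3": "France", "1.1.1.1": "USA"}

def trace_back(receiver):
    """Walk back from the receiver; return (originator, countries to ask, in order)."""
    sender = PREVIOUS_HOP[receiver]        # the address Bob actually sees
    countries = []
    while True:
        countries.append(COUNTRY[sender])  # this sender's ISP must cooperate
        if sender not in PREVIOUS_HOP:     # no earlier hop: the originator
            return sender, countries
        sender = PREVIOUS_HOP[sender]

print(trace_back("2.2.2.2"))  # ('1.1.1.1', ['Morocco', 'Australia', 'France', 'USA'])
```

Every extra relay adds one more link to this chain, and a single missing record anywhere stops the trace.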

The following image shows how our anonymous messages traveled around the world between TOR relay machines.

Who is the owner of TOR?
The TOR project is maintained by a nonprofit organization based in the USA. TOR is free and open source, which means that anyone can analyze the code and verify that there are no backdoors that compromise the anonymity and privacy of its users. TOR uses a network of more than 2,500 volunteer relay servers around the world that relay the anonymous communication of TOR’s users. There are relay servers in Australia, Argentina, Belgium, Brazil, Canada, Germany, Denmark, France, Switzerland, Algeria, Czech Republic, Egypt, Spain, Finland, UK, USA, Israel, India, Italy, Japan, Mexico, Latvia, Russia, Panama, Poland, Singapore, Slovenia, Turkey, Ukraine, South Africa, Vietnam, Venezuela and many more countries.

Does TOR guarantee 100% anonymity?
No. Nobody can guarantee 100% anonymity; however, tracing a TOR route back to the user’s computer is very complex and requires technological, governmental and diplomatic resources, as well as cooperation between countries around the world. Therefore, TOR is probably your best way to stay anonymous on the Internet.

It is important to mention that in some cases, for example when there is evidence of terrorist activity, such cooperation between countries to locate a TOR user is possible. There are also some documented attacks against the TOR network that try to compromise users’ anonymity, but they are still rare and complex.

How to browse anonymously using TOR?
The simplest way to use TOR is the TOR Browser Bundle, which can be downloaded here, where you can also read the very simple instructions for using it. The TOR Browser Bundle is available for Windows, Mac, Linux and Android. It includes a couple of software packages and a special version of the Firefox browser. Once you download and extract the bundle, it creates a folder with several sub-folders. In the main folder you will find a file called “Start Tor Browser” (for Windows), “start-tor-browser” (for Linux) or “TorBrowser_en-US.app” (for Mac). Before you run this file, close all open browsers so you won’t confuse the regular browser you usually use with the anonymous browser that TOR uses. Once you run this file, TOR will start, and when it is ready for anonymous browsing, it will open its special version of Firefox. From then on (until you exit the browser), your browsing activity in this browser will be anonymous via the TOR network. Bear in mind that browsing will be slower, since every packet is encrypted and decrypted several times and travels through several computers around the world. Well, this is the small price you have to pay to be truly anonymous on the Internet.

Posted at 7:07 am