Study on Implementation and Impact of Google Hacking in Internet Security

Muharman Lubis, Nurul Ibtisam binti Yaacob, Hafizah binti Reh and Montadzah Ambag Abdulghani
International Islamic University of Malaysia (IIUM)

ABSTRACT - As the number of websites and the amount of information have grown, people rely increasingly on search engines to retrieve relevant information from the sea of information online. In response to the rapidly growing volume of information on the web, major search engine companies such as Google, Yahoo, and MSN crawl web servers and index the crawled information more frequently and more thoroughly on a global scale. Furthermore, to stay on top of the competitive search engine market, they continually improve their search algorithms and strive to provide Internet users with easy-to-use search interfaces. Thanks to these competing efforts, Internet users can freely access billions of pages of information, regardless of time and place, with simple typing and clicking. Google Hacking uses the Google search engine to locate sensitive information or to find vulnerabilities that may be exploited. This paper evaluates how much effort it takes to make Google Hacking work and how serious a threat it poses, and it discusses the implementation and impact of Google Hacking in Internet security.

Index Terms – Google hacking, Internet security, Google implementation, impact

I. INTRODUCTION

The idea of hacking may conjure up stylized images of electronic vandalism, espionage, dyed hair and body piercing, but its essence is something else. Most people associate hacking with breaking the law and assume that everyone who engages in hacking activities is a criminal. Indeed, there are people who use hacking techniques to break the law, but hacking is not really about that [9]. Hacking is a way of understanding what is possible, sensible and ethical in the twenty-first century, and because every hack needs a social and cultural context, it is embedded in our lives [10]. In order to understand what hacking is, we need to know the difference between hacking and cracking [11]. The definition of hacking has become confused with that of cracking because of the role of the media and the carelessness of some people. All hacking activities exist within a set of communal relations, and each of them expresses a different aspect of hacking.

More recently, hackers have been divided into three categories: black hats, white hats and grey hats, referred to as malicious hackers, ethical hackers and ambiguous hackers respectively [17]. Black hat hackers are people who hack computing systems for their own benefit, or who break into systems illegally for personal gain, notoriety or other less-than-legitimate purposes [14][15]; for example, they may hack into an online store's computer system and steal the credit card numbers stored in it. White hat hackers, by contrast, write and test open source software, work for corporations or are hired by companies to help them strengthen their security, work for the government to help catch and prosecute black hat hackers, and otherwise use their hacking skills for noble and legal purposes. There are also hackers who refer to themselves as grey hat hackers and operate somewhere between the two primary groups [14]. Grey hat hackers might break the law, but they consider themselves to have a noble purpose in doing so. For example, they might crack systems without authorization and then notify the system owners of the systems' fallibility as a public service, or find security holes in software and then publish them to force the software vendors to create patches or fixes for the problem.

Google Hacking is one of the most popular hacking techniques. It was publicly introduced by Johnny Long around 2004 and is defined as “the art of creating complex search engine queries in order to filter through large amounts of search results for information related to computer security” [16]. Attackers can use Google Hacking to uncover sensitive information about a company or potential security vulnerabilities through the Internet, while others use it to determine whether their own websites are disclosing sensitive information, a practice known as penetration testing or vulnerability assessment. When a computer connects to a network and begins communicating with other computers, it is essentially taking a risk. Internet security involves the protection of a computer's Internet account and files from intrusion by an unknown user. Basic security measures involve protection by well-selected passwords, changes of file permissions and backups of the computer's data.
In this paper, implementation is defined as “carrying out, execution, or practice of a plan, a method, or any design for doing something” [16], and impact is defined as “a forceful consequence; a strong effect”. Both terms relate to the way certain malicious people or groups use Google Hacking methods in ways that affect Internet security.

II. BACKGROUND

Nowadays it is hard to find information because the number of resources available on the Internet is increasing at a rapid rate. Consequently, the search engine, also known as a web search engine or automated web search, was introduced as one of the services provided on the Internet. A search engine is designed to help find information stored on a computer system or on a website and to minimize the time required to find it [8]. One of the most powerful, efficient and effective search engines is Google. The Google search engine currently indexes up to 12 billion pages [17] and, believe it or not, it is the starting point of many hacking activities. This use of the Google search engine, one of its most interesting, is known as Google Hacking.

Proceedings of Regional Conference on Knowledge Integration in ICT 2010

Organizations often disclose too much information on their web servers without realizing that such leaks or weaknesses exist, and malicious hackers take advantage of them. Further, search engines like Google have powerful features that allow users to find sensitive information stored in the far corners of web-connected servers and even to perform vulnerability-searching attacks. In the past 2 years, Google hacking has become a commonplace term not only in the security community but in the mainstream media as well [13]. Google hacking involves using the popular Google search engine to locate sensitive or confidential online information that should be protected but is not. Using search engines to uncover sensitive information is not a new concept. Nonetheless, with the numerous advanced search operators that Google makes available on its enormous database, carefully crafted query strings can reveal jaw-dropping results [3]. The filtering is usually performed with advanced Google operators. Attackers can use Google Hacking to uncover sensitive information about a company or to uncover potential security vulnerabilities, while a security professional can use it to determine whether their websites are disclosing sensitive information.

Billig et al. found Google Hacking to be a very powerful and flexible hacking approach, and also found Google cached pages very helpful while performing it [2]. Google crawls web pages and stores a copy of them on its own servers. They tried to use Google cached pages to browse a target's site anonymously, without sending a single packet to its server.
However, Google stores most of the page content when it crawls but omits images and some other space-consuming media. When a hacker views a Google cached page by simply clicking on the cached link on the results page, the browser therefore ends up connecting to the target's server to fetch the rest of the page content. When a user enters a keyword in the search box, Google looks it up in the index that its spider built while exploring the web. Google then returns a results page that lists, for each matching site, its name, a summary or snippet, the URL of the actual page, a cached link that shows the page as it looked when the spider last visited it, and a link to pages with similar content [3].

Google's search results are dynamic. When a query is submitted through Google's web interface, Google takes the user to a generated results page that can be represented by a single URL, which appears in the browser's address bar, for instance:

http://www.google.com/search?hl=en&q=%22peanut+butter+and%22+jelly&btnG=Search

The question mark (?) denotes the end of the URL path and the start of the arguments, the ampersand (&) separates arguments, (hl) represents the language in which the results page will be printed, (q) represents the start of the query string, (%22) represents the hexadecimal value of the double quotes character, the plus sign (+) represents a space, and (btnG=Search) denotes that the Search button was pressed on Google's web interface [13]. Knowledgeable Google users can edit the URL directly in their browser's address bar and hit enter to get new search results very quickly. Security professionals need to understand these URLs so that they can perform a Google hacking vulnerability assessment.

The Google Hacking Database (GHDB) is a database of queries contributed by hackers that identify sensitive data on a website, such as error messages, files containing passwords, files containing usernames (no passwords), files containing juicy info (no usernames or passwords, but interesting stuff nonetheless), pages containing logon portals, pages containing network or vulnerability data such as firewall logs, sensitive online shopping information, various online devices and vulnerable servers [17][19].

III. IMPLEMENTATION

Google hacking has become popular not only in hacker communities but also among ordinary people who do not really understand the hacking procedure. The method and design of using the Google search engine for hacking combine basic and advanced Google operators (Table 1) to make searches as specific as possible. The implementation can be divided into four forms: the formal design, which refers to the googledorks and googleturds introduced by Johnny Long and recognized in academia and the media; Google automated scanners, which are software built by communities or individuals to facilitate Google hacking; manual exploration, which refers to individual attempts by malicious hackers to enhance their Google hacking knowledge; and lastly integrated hacking, which uses Google hacking as the first step before other hacking methods.

Table 1. Advanced Google Operators

Search Service | Advanced Search Operators
Web Search | allinanchor:, allintext:, allintitle:, allinurl:, cache:, define:, filetype:, id:, inanchor:, info:, intext:, intitle:, inurl:, phonebook:, related:, site:
Image Search | allintitle:, allinurl:, filetype:, inurl:, intitle:, site:
Groups | allintext:, allintitle:, author:, group:, insubject:, intext:, intitle:
Directory | allintext:, allintitle:, allinurl:, ext:, filetype:, intext:, intitle:, inurl:
News | allintext:, allintitle:, allinurl:, intext:, intitle:, inurl:, location:, source:
Product Search | allintext:, allintitle:
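The anatomy of the search URL described in Section II can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names build_search_url and read_search_url are hypothetical, and the parameter set (hl, q, btnG) is the one quoted from [13], which may not reflect the parameters Google accepts today. The sketch encodes a query into the URL form and decodes it again, without contacting any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(query, language="en"):
    """Encode a query into the URL form quoted in Section II (hypothetical helper)."""
    # Parameter names follow the example URL from [13]; this is an assumption,
    # not a description of Google's current interface.
    params = {"hl": language, "q": query, "btnG": "Search"}
    # urlencode turns spaces into '+' and double quotes into '%22'.
    return "http://www.google.com/search?" + urlencode(params)


def read_search_url(url):
    """Decode the arguments after the '?' back into plain text (hypothetical helper)."""
    arguments = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in arguments.items()}


url = build_search_url('"peanut butter and" jelly')
print(url)
# http://www.google.com/search?hl=en&q=%22peanut+butter+and%22+jelly&btnG=Search
print(read_search_url(url)["q"])
# "peanut butter and" jelly
```

The same helper accepts queries that use the operators of Table 1, for example build_search_url('site:example.com intitle:"index of"'), where example.com stands in for a site the tester is authorized to assess.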

One well-known way of utilizing the Google search engine is the “googledork”. It is an attempt to standardize Google Hacking by the person who first introduced it, Johnny Long, who shares this knowledge on his website. The term "googledork", coined by Long, originally meant "An inept or foolish person as revealed by Google" [19], but after a great deal of media attention the term came to describe those who "troll the Internet for confidential goods." Either description is fine; what matters is that the term googledork conveys the concept that sensitive material is on the web and that Google can help you find it. The official googledorks page, maintained by Johnny Long, lists many examples of unbelievable things that have been dug up through Google, organized into around 14 categories in the GHDB [17][19]. Each listing shows the Google search required to find the information, along with a description of why the data found on each page is interesting.

On the other hand, the syntax and operator functions of the Google search engine are not free of errors or weaknesses either. These are known as “Googleturds”, defined as the little dirty pieces of Google ‘waste’ [1]. These search results seem to have stemmed from typos Google found while crawling a web page. Google can also reveal a great deal of personal data when its advanced search parameters are used. Googleturds have revealed credit card data, web directories, passwords and much more. Google has become more concerned with correcting the errors in its operators and patterns because of strong pressure from organizations and companies over the huge impact of their use.

Google hacking involves the use of certain types of search queries to look for website vulnerabilities. More than 1,500 such queries exist, most of them stored in Johnny Long's Google Hacking Database website and some spread across other discussion websites or blogs. This trend slowly but surely leads more people to develop easy-to-use tools or software that incorporate certain googledork methods in order to search effectively and efficiently. Unfortunately, organizations sometimes set up their systems in a way that allows Google to index and save a lot more information than they intended [23].

Another implementation is the Google automated scanner. One of the best known is Goolag Scanner, a Windows-based auditing tool built around the concept of "Google hacking".
The Cult of the Dead Cow hacker group released it as an open-source tool designed to enable IT workers to quickly scan their websites for security vulnerabilities and at-risk sensitive data using a collection of specially crafted Google search terms, in order to give security professionals a very easy and legitimate tool to test their own websites for vulnerabilities and to raise awareness about web security in and of itself [22]. Many attempts have been made to automate the Google hacking process with software that even a newbie can use. Goolag Scanner, Gooscan, Google Hacks and subdomain lookup tools are examples of such software, and their number will probably grow along with the popularity of Google. This approach leads many more people to try using Google operators to search intentionally for private data across the Internet, which forces organizations and companies to put their best effort into preventing it. Conversely, a malicious expert hacker can take advantage of the situation in which masses of newbies are trying Google hacking. Once they know the pattern, in-depth searching through manual exploration can be done effectively and efficiently.

The ease of use of Google Hacking is remarkable: at any time and in any place, the Google search engine can be used to find private data for various purposes. Private data searches are grouped into four sections according to privacy level: identification data, sensitive data, confidential data and secret data searches [21].

Identification data relates to the personal identity of a user. It can be found with keywords such as name, address, phone, email, curriculum vitae and username, optionally for a certain person or within certain document types, as in the following query, which finds many lists of identities:

allintext:name email phone address ext:pdf

Sensitive data is data that is public but contains private information whose disclosure may upset the owner, such as emails, forum postings, sensitive directories and Web 2.0 based applications. The following query finds sensitive directories:

intitle:"index of" inurl:/backup

Confidential data is private data that should be accessible only to a certain group or person, such as passwords, chat logs, confidential files and online webcams, yet Google can still reveal it. The following query finds the addresses of online webcams:

inurl:"viewerframe?mode=motion"

Lastly, secret data is private data accessible only to its owner, such as encrypted messages, private keys and secret keys. Encrypted messages can be found with a query such as:

-intext:"and" ext:enc

All manual search exploration requires only in-depth knowledge of formats, patterns, operators and parameters, plus practice. A person can become a Google hacking expert independently, as an autodidact. However, the worst case is when Google searching is combined with other hacking methods, so that it is only the opening step (footprinting, port scanning, information gathering, etc.) of a further hacking process; this is known as integrated hacking. The expectation that the Google search engine will be used merely as the first step for discovering current weaknesses or vulnerabilities has made preventing and recognizing such use a hot topic. Nowadays, pretty much any hacking incident most likely begins with Google [1], so Google Hacking may be only the beginning, but that is where the danger starts.
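For a defensive self-assessment, the four sample queries above can be restricted to a single domain with the site: operator from Table 1. The following Python sketch only builds the query strings; it sends nothing to Google. The names PRIVACY_LEVEL_QUERIES, scope_to_site and audit_queries are hypothetical, example.com is a placeholder for a domain the tester is authorized to assess, and whether Google honors every combination of operators is an assumption rather than something the source confirms.

```python
# Sample queries for the four privacy levels of [21], copied from the text above.
PRIVACY_LEVEL_QUERIES = {
    "identification": "allintext:name email phone address ext:pdf",
    "sensitive": 'intitle:"index of" inurl:/backup',
    "confidential": 'inurl:"viewerframe?mode=motion"',
    "secret": '-intext:"and" ext:enc',
}


def scope_to_site(query, domain):
    """Prefix a query with site: so that it only matches one domain."""
    return f"site:{domain} {query}"


def audit_queries(domain):
    """Return one scoped query per privacy level for the given domain."""
    return {
        level: scope_to_site(query, domain)
        for level, query in PRIVACY_LEVEL_QUERIES.items()
    }


for level, query in audit_queries("example.com").items():
    print(f"{level}: {query}")
```

Keeping the queries in one table makes it easy to rerun the same self-check periodically and to add further queries as new exposure patterns become known.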
Organizations, corporations and even individuals that store their data on a website should develop their own strategy, policy and procedure to keep that data secure and safe from being revealed.

IV. IMPACT

Search engines, especially Google, have become an important tool in daily Internet activity, so it is difficult to tell at the outset which users or groups intend to carry out Google hacking. Consequently, there is no definite method for identifying them; at present only the protection and response of both Google and website system administrators can be measured. The massive number of Google hacking attacks has had both direct and indirect impacts on Internet security. In this study we classify the direct impact, with both positive and negative effects, into low impact, associated mostly with exploration; moderate impact, associated mostly with exposure; and high impact, associated mostly with exploitation. The indirect impact comprises trend and standardization; awareness of Internet security; and strategy, policy and procedure.


Table 2. Google Hacking Impact

Negative Impact | Positive Impact

Low Impact: Free access online newspaper Finding information regard certain people Mass quantity Google Google block the hacking user “googleturds” Find Sub domain lookup Google hacking as Artificial Intelligence Google Proxy Server hacking Awareness on Internet security

Moderate Impact: Find out vulnerabilities in Google Analytics server, files, web files, web Implementation Checklist application, unauthenticated program, various online devices, etc Application Fraud Enhance Tools to Protect the Privacy Google Hacking Exposed Penetration Testing VoIP Footprinting Standardization Hacking Process Analysis Social Engineering Certification Ethical Hacker Google Hacking as Malicious Honeypot Project Google Code hacking trap

High Impact: Oracle Database Exploitation Fraud Prevention Telecommunication Indexes & Reveal Safeguarding privacy against Sensitive/Confidential Data misuse and exploits SAP Enterprise Portal Security Exposure; Enhance Defense Security Strategy; Identity Theft Advanced MySQL Exploitation; Fraud Management Industry Internet security management Development (Prevention, Protection, Response and Recovery) Internet safety, security and privacy manuals; Apache Database Exploitation

In the last several years, identity theft has been one of the fastest growing crimes. Unfortunately, the Internet has facilitated this phenomenon, since it represents a tremendous open repository of sensitive identity information available to those who know how to find it, including fraudsters [20]. Exploiting even one stolen identity is bad enough; when many identities are abused by one user, the damage to the company, the system and the affected users themselves is huge.

Google hacking has become a trend in the information and communication technology community, which has even moved toward standardization, that is, the process of developing and agreeing upon technical standards among its members. Both the trend and the standardization have become subjects of lively discussion, driven by the massive use of Google hacking.

Nowadays information, systems and networks are pervasive and ubiquitous, and all of them are provided through the Internet. The Internet's vast resources are an excellent means for everyone to explore, research and enjoy new information and interests. The Internet is a public place, however, so it has become important to teach Internet users how to stay safe online, because danger also lies there.

We need to learn about Google Hacking in order to provide a good level of protection for our sites and to check for sensitive information disclosure, as part of a strategy and policy for anticipating Google hacking. As we become more familiar with manual hacks, we can start using some of the automated Google Hacking tools. These automate the hacks and help ensure that every single page within our site is protected. Automated tools allow periodic security checks at a frequency that is simply impossible to achieve with manual hacks. The following are common activities, based on strategy, policy and procedure, that organizations usually carry out to assure Internet security [18]:

1. Ensure host and network security basics are in place.
2. Build/publish security features (authentication, role management, key management, audit/log, crypto and protocols).
3. Use external penetration testers to find problems.
4. Create a shared standard policy.
5. Identify gate locations and gather necessary artifacts.
6. Know all regulatory pressures and unify approach.
7. Identify personally identifiable information (PII) obligations.
8. Provide awareness training.
9. Create security standards.
10. Perform security feature review.
11. Identify software defects found in operations monitoring and feed them back to development.
12. Ensure QA supports edge/boundary value condition testing.
13. Create or interface with incident response.
14. Create data classification scheme and inventory.
15. Use automated tools along with manual review.
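The idea of periodic automated checks can be sketched in Python as a scan over an organization's own pages for the kinds of content that GHDB queries look for. This is a minimal illustration, not a real scanner: the signature patterns are hypothetical examples loosely modeled on GHDB categories named earlier (directory listings, error messages, logon portals), the function names are invented for this sketch, and the pages are supplied as an in-memory dictionary rather than fetched from a server.

```python
import re

# Hypothetical signatures, loosely modeled on GHDB categories named in the
# text; they are illustrative and are not taken from the GHDB itself.
SIGNATURES = {
    "directory listing": re.compile(r"<title>\s*index of\s*/", re.IGNORECASE),
    "error message": re.compile(r"sql syntax|stack trace", re.IGNORECASE),
    "logon portal": re.compile(r'<input[^>]+type=["\']password["\']', re.IGNORECASE),
}


def scan_page(html):
    """Return the names of all signatures that match one page's HTML."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(html)]


def scan_site(pages):
    """Map each URL to its findings, keeping only pages with at least one hit."""
    findings = {url: scan_page(html) for url, html in pages.items()}
    return {url: hits for url, hits in findings.items() if hits}


# Example pages for a placeholder domain.
pages = {
    "http://example.com/": "<html><title>Welcome</title><p>Hello</p></html>",
    "http://example.com/backup/": "<html><title>Index of /backup</title></html>",
    "http://example.com/admin": '<form><input type="password" name="pw"></form>',
}

for url, hits in scan_site(pages).items():
    print(url, "->", ", ".join(hits))
```

Run on a schedule against a site's own content, a check of this kind complements manual review (item 15): it flags pages that a search engine could index, so they can be removed or protected before they appear in search results.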
Sharing knowledge about strategies, policies and procedures is an advantage that every company or organization has in fighting back against the Google hacking threat. Even though complete security never exists, such measures do protect confidential or sensitive data from the script kiddies and newbies of Google Hacking, whose numbers grow day after day as a consequence of the openness of knowledge on the Internet. Discussion and improvement should take place frequently to test the strength of the strategy, policy and procedures, while assurance and monitoring should also be carried out effectively. Maintaining these three approaches is difficult, like any other maintenance process. Nevertheless, we cannot neglect it, because risk management is the attempt to prevent such disasters from striking the security of the company. Prevention is better than cure; in this case, security measures are better than disaster recovery.

Google hacking can become a serious threat to an organization or company. If hackers spend enough time analyzing the target and understanding how the queries find information, they will be able to find the information they want even if it is confidential or sensitive. Moreover, the well-known googledork queries and the Google automated scanners help hackers find sensitive data easily using Google. This shows that Google hacking is almost effortless for today's hackers.


V. CONCLUSION

It is the nature of websites and web applications to be publicly accessible. Combined with search engine functionality, this makes it effortless for attackers to access an organization's site or find out information about the organization. Some organizations do not realize that even directory listings, error pages and hidden login pages can be indexed, and that when a search engine indexes a site it inadvertently provides loads of information to potential attackers. This is what Google hacking is all about. However, there are some possible solutions and preventive measures. The best way to face Google hacking is to apply basic risk management: prevention, protection and response.

Prevention of Google hacking relates to the anticipation of needs, management wishes, hazards and risks [4]. One prevention strategy is to reduce the risk in the existing website so that a disaster has a lower probability of occurring. However, such methods can be implemented only if services are provided on a small scale without the need for mobility and intensive data sharing. Examples include keeping sensitive data off the site, considering removing the site from Google's index [7], running automatic scanners, data mitigation and running regularly scheduled assessments.

Protection from Google hacking relates to the process of keeping something, such as confidential information or sensitive data, safe from being hacked; for instance, a security token associated with a resource such as a file. Usually an organization approaches protection by balancing availability, integrity, confidentiality and performance, for example by installing a firewall, using robots.txt [6] or deploying a Google Hack Honeypot [5].

The response of a company that has been hacked by a Google hacker is very important in order to avoid a recurrence of the same kind of incident. If the company does not respond to the incident, further sensitive or confidential data may be stolen by the hacker. Furthermore, since the replacement cost of, say, stolen research data is high, a company will not want to bear that cost again after one incident. Responses include reporting the incident, educating employees and establishing an incident response policy.

Thus, we as users should be aware that even hidden login pages can be indexed and that when a search engine indexes a site, it unwittingly provides treasure troves of information to potential attackers. We should improve our information security in order to keep our confidential or sensitive data from being hacked, and finally, we should prepare ourselves with the best methods of protection against and prevention of Google Hacking.
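As one concrete illustration of the robots.txt measure [6], the Python standard library module urllib.robotparser can check which paths a given robots.txt file asks a crawler to skip. The sketch below is an assumption-laden example: the robots.txt content, the paths, the domain example.com and the helper name blocked_paths are all hypothetical, and the file is parsed from a string so that nothing is fetched from the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a placeholder site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /backup/
Disallow: /admin/
"""


def blocked_paths(robots_txt, base_url, paths, user_agent="Googlebot"):
    """Return the paths that the robots.txt rules tell the crawler not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in paths
        if not parser.can_fetch(user_agent, base_url + path)
    ]


paths = ["/index.html", "/backup/db.sql", "/admin/login.php"]
print(blocked_paths(ROBOTS_TXT, "http://example.com", paths))
# ['/backup/db.sql', '/admin/login.php']
```

Note that robots.txt is advisory and publicly readable: well-behaved crawlers follow it, but it does not restrict access and it reveals the listed paths to anyone who reads it, so it complements access control rather than replacing it.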

REFERENCES

[1] Long, J. (2004), Google Hacking Mini-Guide, retrieved on 1 Feb 2010 from http://www.informit.com/articles/article.aspx?p=170880&seqNum=4
[2] Billig, J., Danilchenko, Y. and Frank, C.E. (2008), “Evaluation of Google Hacking”, Proceedings of InfoSecCD Conference’08, p. 27-32, September 26-27, 2008, Kennesaw, GA, USA.
[3] Lancor, L. & Workman, R. (2007). Using Google Hacking to Enhance Defence Strategies. Proceedings of SIGCSE ’07, p. 491-495.
[4] Hewlett-Packard (HP), Preventing Google hacking: steps to protect your web application, retrieved on 1 Feb 2010 from http://www.docstoc.com/docs/3292692/4aa1-5395enwpreventing-google-hacking
[5] SourceForge.net, “What is GHH”, retrieved on 1 Feb 2010 from http://ghh.sourceforge.net/
[6] The Web Robots Pages, retrieved on 1 Feb 2010 from http://www.robotstxt.org
[7] Webmaster Tools, Removing my own content from Google, retrieved on 1 Feb 2010 from http://www.google.com/remove.html
[8] Comer, D. (2007), The Internet Book: Everything You Need to Know about Computer Networking and How the Internet Works. 4th Ed. Prentice Hall.
[9] Erickson, J. (2003), Hacking: The Art of Exploitation. No Starch Press.
[10] Jordan, T. (2008), Hacking: Digital Media and Technological Determinism. Polity.
[11] Knittel, J. & Soto, M. (2003), Everything you Need to Know about the Dangers of Computer Hacking. The Rosen Publishing Group.
[12] Leyden, J. (2005), Hacking Google for Fun and Profit. Retrieved February 8, 2010, from Channel Register website: http://www.channelregister.co.uk/2005/04/04/google_hacking/
[13] Long, J. and Skoudis, E. (2005), Google Hacking for Penetration Testers. Syngress.
[14] Shinder, D. L. & Cross, M. (2008), Scene of the Cybercrime. 2nd Ed. Syngress.
[15] Wang, J. (2009), Computer Network Security. Springer.
[16] Wikipedia, (2008), Google Hacking. Retrieved February 8, 2010 from: http://en.wikipedia.org/wiki/Google_hacking
[17] Yerrapragada, K. P. (2007). Google Hacking!!!, Retrieved January 20, 2010, from College of Engineering, San Jose State University website: http://www.engr.sjsu.edu/meirinaki/courses/cmpe237/studentpresentations/Krishna_GOOGLE%20HACKING.pdf
[18] McGraw, G., Chess, B. and Migues, S. (2010). Software [In]security: What Works in Software Security, Retrieved February 28, 2010 from: http://www.informit.com/articles/article.aspx?p=1569495
[19] GHDB, (2006), Google Hacking Database. Retrieved January 21, 2010 at http://johnny.ihackstuff.com/ghdb/
[20] Abdelhalim, A. and Issa Traore. (2007). The Impact of Google Hacking on Identity and Application Fraud. PACRIM'07.
[21] Emin Islam Tatli. (2006). Google Hacking Against Privacy, Kloster Bronbach, Germany, July 2006.
[22] Vijayan, J. (2008), Hacker group releases automated 'Google hacking' tool, Retrieved February 21, 2010 at http://www.computerworld.com/s/article/9064238/Hacker_group_releases_automated_Google_hacking_tool
[23] Scalet, S. D. (2006), 5 Ways Google Is Shaking the Security World, Retrieved February 24, 2010 at http://www.csoonline.com/article/221515/5_Ways_Google_Is_Shaking_the_Security_World?page=1
