New Software Detects Bots Scraping Website Data

Web sites such as job boards face a persistent problem: their data is constantly pilfered by automated bots.

By Jeremy Kirk

Wed, November 04, 2009 — IDG News Service — Web sites such as job boards face a persistent problem: their data is constantly pilfered by automated bots.

The data ends up on competing job boards that have stolen the content. It's a problem that plagues any Web site whose intellectual property must be publicly posted for free, and even sites with subscription models.

But an Atlanta-based security company that specializes in detecting bots has developed software that can detect those screen-scraping and data-mining bots.

Pramana's main product, HumanPresent, detects automated bots that, for example, enter spam into Web-based forms or register for free e-mail accounts to be used for spam.

Pramana has now developed a module called "data mining and screen scraping prevention" for HumanPresent. It works on many of the same principles as its main product but has been modified for data-mining scenarios, said David Crowder, Pramana's CEO.

HumanPresent can detect bots by noticing differences in the way a human would normally interact with a Web page and contrasting that with how bots behave. It looks at more than 30 metrics, such as keyboard strokes, mouse clicks and the timing of those actions.

HumanPresent looks at single transactions, but the data-mining module has been modified to look at a timed period when either a bot or human is on the site, Crowder said.

Data-mining bots tend to circumvent a browser's user interface entirely. For example, a bot may request a Web page containing a large amount of data but never scroll or click on the page. If a series of pages is opened and viewed in that manner, it could mean a data-mining bot has arrived.
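The heuristic described above can be sketched in a few lines of Python. This is only an illustration, not Pramana's implementation: the Session class, the looks_like_scraper function and the min_pages threshold are all hypothetical names and values chosen for the example, and a real product would weigh many more signals than page and interaction counts.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Activity seen for one visitor during a timed window (hypothetical model)."""
    visitor_id: str
    page_requests: int = 0
    interactions: int = 0  # scrolls, clicks and key presses seen in the window

    def record_page(self) -> None:
        self.page_requests += 1

    def record_interaction(self) -> None:
        self.interactions += 1


def looks_like_scraper(session: Session, min_pages: int = 5) -> bool:
    """Flag a session that opened several pages with no UI interaction at all.

    min_pages is an illustrative threshold, not a value published by Pramana.
    """
    return session.page_requests >= min_pages and session.interactions == 0


# A visitor that pulls ten pages without touching the interface.
bot = Session("visitor-1")
for _ in range(10):
    bot.record_page()

# A visitor that reads the same number of pages but scrolls on each one.
human = Session("visitor-2")
for _ in range(10):
    human.record_page()
    human.record_interaction()

bot_flagged = looks_like_scraper(bot)
human_flagged = looks_like_scraper(human)
```

In this sketch only the first visitor is flagged, because the second one produced interface events during the same window.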

Pramana assigns a unique ID to the visitor and, after analyzing the visitor's behavior, decides whether to label the visitor a bot. There are several different ways a Web site operator can then choose to deal with the situation.

The IP (Internet Protocol) address of the bot's computer can be blocked permanently. One car auction Web site that is testing Pramana's data-mining module decided to move suspected bots into a "sandbox" where they are served completely false data.

"They're indeed data mining -- it's just dead wrong," Crowder said.

Other options include prompting the Web site visitor with a challenge or task, which some bots aren't capable of completing.
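The three responses mentioned above (blocking the address, sandboxing with false data, and issuing a challenge) could be wired together roughly as follows. This is a minimal sketch under stated assumptions: the Action enum, the choose_action and respond functions, and the "CHALLENGE_REQUIRED" marker are hypothetical and do not come from HumanPresent.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Responses a site operator might configure (hypothetical names)."""
    ALLOW = "allow"
    BLOCK_IP = "block_ip"
    SANDBOX = "sandbox"
    CHALLENGE = "challenge"


def choose_action(is_bot: bool, site_policy: Action) -> Action:
    """Apply the operator's chosen policy only to visitors labeled as bots."""
    return site_policy if is_bot else Action.ALLOW


def respond(action: Action, real_page: str, decoy_page: str) -> Optional[str]:
    """Return the body to serve, or None when the request is refused."""
    if action is Action.ALLOW:
        return real_page
    if action is Action.SANDBOX:
        return decoy_page  # suspected bot receives false data
    if action is Action.CHALLENGE:
        return "CHALLENGE_REQUIRED"  # placeholder for a task the visitor must complete
    return None  # BLOCK_IP: nothing is served


# A suspected bot on a site that has chosen the sandbox policy.
served_to_bot = respond(choose_action(True, Action.SANDBOX), "real prices", "decoy prices")
# An ordinary visitor on the same site.
served_to_human = respond(choose_action(False, Action.SANDBOX), "real prices", "decoy prices")
```

Keeping the labeling decision separate from the response lets an operator change policy without touching the detection logic.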

Data mining costs companies dearly. Companies that sell premium data find that competitors buy a subscription and then use automated bots to steal the data for their own sites. In one example, a Web site that has gigabytes of data on used car prices found its data had been scraped and was for sale on eBay.
