实例讲解ASP实现抓取网上房产信息
<<A href='mailto:%@LANGUAGE="VBSCRIPT'>%@LANGUAGE="VBSCRIPT</A>" CODEPAGE="936"%><BR><!-- #include file="conn.<A class=ReplaceKeyword href="http://www.knowsky.com/asp.asp" target=_blank>asp</A>" --><P><!-- #include file="inc/function.asp" --><BR><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "<A href="http://www.w3.org/TR/html4/loose.dtd">http://www.w3.org/TR/html4/loose.dtd</A>"><BR><html><BR><head><BR><title>Untitled Document</title><BR><meta http-equiv="Content-Type" content="text/html; charset=gb2312"><BR><meta http-equiv="refresh" content="300;URL=steal_house.asp"><BR></head></P>
<P><body><BR><%<BR>on error resume next<BR>' <BR>Server.ScriptTimeout = 999999<BR>'========================================================<BR>'字符编码函数<BR>'====================================================<BR>Function BytesToBstr(body,code) <BR> dim objstream <BR> set objstream = Server.CreateObject("adodb.stream") <BR> objstream.Type = 1 <BR> objstream.Mode =3 <BR> objstream.Open <BR> objstream.Write body <BR> objstream.Position = 0 <BR> objstream.Type = 2 <BR> objstream.Charset =code<BR> BytesToBstr = objstream.ReadText <BR> objstream.Close <BR> set objstream = nothing <BR>End Function </P>
<P>'取行字符串在另一字符串中的出现位置<BR>Function Newstring(wstr,strng) <BR> Newstring=Instr(lcase(wstr),lcase(strng)) <BR> if Newstring<=0 then Newstring=Len(wstr) <BR>End Function <BR>'替换字符串函数<BR>function ReplaceStr(ori,str1,str2)<BR>ReplaceStr=replace(ori,str1,str2)<BR>end function<BR>'====================================================<BR>function Read<A class=ReplaceKeyword href="http://www.knowsky.com/xml.asp" target=_blank>Xml</A>(url,code,start,ends)<BR>set oSend=createobject("Microsoft.XMLHTTP")<BR>SourceCode = oSend.open ("GET",url,false) <BR>oSend.send()<BR>ReadXml=BytesToBstr(oSend.responseBody,code )<BR>start=Instr(ReadXml,start)<BR>ReadXml=mid(ReadXml,start)<BR>ends=Instr(ReadXml,ends)<BR>ReadXml=left(ReadXml,ends-1)<BR>end function</P>
<P>function SubStr(body,start,ends)<BR>start=Instr(body,start)<BR>SubStr=mid(body,start+len(start)+1)<BR>ends=Instr(SubStr,ends)<BR>SubStr=left(SubStr,ends-1)<BR>end function</P>
<P>dim getcont,NewsContent<BR>dim url,title<BR>url="<A href="http://www.***.com"'/">http://www.***.com"'</A>新闻网址knowsky.com<BR>getcont=ReadXml(url,"gb2312","<table class=k2 border=""0""","</table>")<BR>getcont=RegexHtml(getcont)<BR>dim KeyId,NewsClass,City,Position,HouseType,Level,Area,Price,Demostra</P>
<P>dim ContactMan,Contact<BR>for i=2 to ubound(getcont)<BR> response.Write(getcont(i)&"__<br>")<BR> <BR> tempLink=mid(getcont(i),instr(getcont(i),"href=""")+6,instr(getcont(i),""" onClick")-10)<BR> tempLink=replace(tempLink,"../","")<BR> <BR> response.Write(i&":"&tempLink&"<br>")<BR> NewsContent=ReadXml(tempLink,"gb2312","<td valign=""bottom"" width=""400"">","<hr width=""760"" noshade size=""1"" color=""#808080""> ")<BR> NewsContent=RemoveHtml(NewsContent)<BR> NewsContent=replace(NewsContent,VbCrLf,"")<BR> NewsContent=replace(NewsContent,vbNewLine,"")<BR> NewsContent=replace(NewsContent," ","")<BR> NewsContent=replace(NewsContent," ","")<BR> NewsContent=replace(NewsContent,"&nbsp;","") <BR> NewsContent=replace(NewsContent,"\n","") <BR> NewsContent=replace(NewsContent,chr(10),"")<BR> NewsContent=replace(NewsContent,chr(13),"")<BR> '===============get Content=======================<BR> response.Write(NewsContent)<BR> KeyId=SubStr(NewsContent,"列号:","信息类别:")<BR> NewsClass=SubStr(NewsContent,"类别:","所在城市:")<BR> City=SubStr(NewsContent,"城市:","房屋具体位置:")<BR> Position=SubStr(NewsContent,"位置:","房屋类型:")<BR> HouseType=SubStr(NewsContent,"类型:","楼层:")<BR> Level=SubStr(NewsContent,"楼层:","使用面积:")<BR> Area=SubStr(NewsContent,"面积:","房价:")<BR> Price=SubStr(NewsContent,"房价:","其他说明:")<BR> Demostra=SubStr(NewsContent,"说明:","联系人:")<BR> ContactMan=SubStr(NewsContent,"联系人:","联系方式:")<BR> Contact=SubStr(NewsContent,"联系方式:","信息来源:") <BR> response.Write("总序列号:"&KeyId&"<br>")<BR> response.Write("信息类别:"&NewsClass&"<br>")<BR> response.Write("所在城市:"&City&"<br>")<BR> response.Write("房屋具体位置:"&Position&"<br>")<BR> response.Write("房屋类型:"&HouseType&"<br>")<BR> response.Write("楼层:"&Level&"<br>")<BR> response.Write("使用面积:"&Area&"<br>")<BR> response.Write("房价:"&Price&"<br>")<BR> response.Write("其他说明:"&Demostra&"<br>")<BR> response.Write("联系人:"&ContactMan&"<br>")<BR> response.Write("联系方式:"&Contact&"<br>")<BR> 'title=RemoveHTML(aa(i))<BR> 'response.Write("title:"&title)<BR> for n=0 to application.Contents.count<BR> if(application.Contents(n)=KeyId) then<BR> ifexit=true <BR> end if <BR> next <BR> if not ifexit then<BR> application(time&i)=KeyId<BR> '添加到<A class=ReplaceKeyword href="http://www.knowsky.com/sql.asp" target=_blank>数据库</A><BR> '====================================================<BR> set rs=server.CreateObject("adodb.recordset") <BR> rs.open "select top 1 * from news order by id desc",conn,3,3<BR> rs.addnew<BR> rs("NewsClass")=NewsClass<BR> rs("City")=City<BR> rs("Position")=Position<BR> rs("HouseType")=HouseType<BR> rs("Level")=Level<BR> rs("Area")=Area<BR> rs("Price")=Price<BR> rs("Demostra")=Demostra<BR> rs("ContactMan")=ContactMan<BR> rs("Contact")=Contact<BR> rs.update<BR> rs.close<BR> set rs=nothing<BR> end if<BR> '==================================================<BR> <BR>next<BR>function RemoveTag(body)</P>
<P> Set regEx = New RegExp<BR> regEx.Pattern = "<[a].*?<\/[a]>"<BR> regEx.IgnoreCase = True<BR> regEx.Global = True<BR> Set Matches = regEx.Execute(body) <BR> dim i,arr(15),ifexit<BR> i=0<BR> j=0<BR> For Each Match in Matches<BR> TempStr = Match.Value <BR> TempStr=replace(TempStr,"<td>","")<BR> TempStr=replace(TempStr,"</td>","")<BR> TempStr=replace(TempStr,"<tr>","")<BR> TempStr=replace(TempStr,"</tr>","") <BR> arr(i)=TempStr <BR> i=i+1<BR> if(i>=15) then<BR> exit for<BR> end if<BR> Next<BR> Set regEx=nothing<BR> Set Matches =nothing<BR> RemoveTag=arr<BR> <BR>end function<BR>function RegexHtml(body)<BR> dim r_arr(47),r_temp<BR> Set regEx2 = New RegExp<BR> regEx2.Pattern ="<a.*?<\/a>"<BR> regEx2.IgnoreCase = True<BR> regEx2.Global = True<BR> Set Matches2 = regEx2.Execute(body) <BR> iii=0 <BR> For Each Match in Matches2<BR> <BR> r_arr(iii)=Match.Value<BR> <BR> iii=iii+1 <BR> Next<BR> RegexHtml=r_arr<BR> set regEx2=nothing<BR> set Matches2=nothing<BR>end function<BR>'======================================================</P>
<P>conn.close<BR>set conn=nothing<BR>%><BR></body><BR></html></P>
<P> </P>
<P><BR> function.asp</P>
<P> <%<BR>'**************************************************<BR>'函数名:gotTopic<BR>'作 用:截字符串,汉字一个算两个字符,英文算一个字符<BR>'参 数:str ----原字符串<BR>' strlen ----截取长度<BR>'返回值:截取后的字符串<BR>'**************************************************<BR>function gotTopic(str,strlen)<BR> if str="" then<BR> gotTopic=""<BR> exit function<BR> end if<BR> dim l,t,c, i<BR> str=replace(replace(replace(replace(str,"&nbsp;"," "),"&quot;",chr(34)),"&gt;",">"),"&lt;","<")<BR> str=replace(str,"?","")<BR> l=len(str)<BR> t=0<BR> for i=1 to l<BR> c=Abs(Asc(Mid(str,i,1)))<BR> if c>255 then<BR> t=t+2<BR> else<BR> t=t+1<BR> end if<BR> if t>=strlen then<BR> gotTopic=left(str,i) & "…"<BR> exit for<BR> else<BR> gotTopic=str<BR> end if<BR> next<BR> gotTopic=replace(replace(replace(replace(gotTopic," ","&nbsp;"),chr(34),"&quot;"),">","&gt;"),"<","&lt;")<BR>end function<BR>'=========================================================<BR>'函数:RemoveHTML(strHTML)<BR>'功能:去除HTML标记<BR>'参数:strHTML --要去除HTML标记的字符串<BR>'=========================================================<BR>Function RemoveHTML(strHTML) <BR>Dim objRegExp, Match, Matches <BR>Set objRegExp = New Regexp </P>
<P>objRegExp.IgnoreCase = True <BR>objRegExp.Global = True <BR>'取闭合的<> <BR>objRegExp.Pattern = "<.+?>" <BR>'进行匹配 <BR>Set Matches = objRegExp.Execute(strHTML) </P>
<P>' 遍历匹配集合,并替换掉匹配的项目 <BR>For Each Match in Matches <BR>strHtml=Replace(strHTML,Match.Value,"") <BR>Next <BR>RemoveHTML=strHTML <BR>Set objRegExp = Nothing <BR>set Matches=nothing<BR>End Function </P>
<P>%><BR> </P>
<P><BR> conn.asp</P>
<P> <%<BR>'on error resume next<BR>set conn=server.CreateObject("adodb.connection") <BR>con= "driver={Microsoft <A class=ReplaceKeyword href="http://www.knowsky.com/article.asp?typeid=173" target=_blank>Access</A> Driver (*.mdb)};dbq=" & Server.MapPath("stest.mdb") <BR>conn.open con</P>
<P>sub connclose <BR> conn.close<BR> set conn=nothing <BR>end sub<BR>%><BR></P>
页:
[1]