

news 2025/10/15 4:42:21

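❣ Author's homepage: 33的博客 ❣ | Column: 项目日记 (Project Diary) | Code repository: 33的代码仓库 | Follow me for more project content

Contents
1. Preface
2. Index structure
  2.1 Creating the index
  2.2 Querying the index
  2.3 Adding a document
  2.4 Saving the in-memory index to disk
  2.5 Loading the on-disk index into memory
3. Performance optimization
  3.1 Multithreading
  3.2 Thread safety
  3.3 The CountDownLatch class
4. Summary

1. Preface

In the previous article we covered parsing the documents. In this one we keep building out the project: now that we have the parsed content, we need to add it to the forward index and the inverted index.

2. Index structure

We create an Index class and use it to build the index structure. The forward index is an ArrayList and the inverted index is a HashMap. The class has to support five operations:

1. Given a docId, look up the document details in the forward index.
2. Given a word, look up the documents associated with it in the inverted index.
3. Add a new document to the index.
4. Save the in-memory index to disk.
5. Load the index from disk back into memory.

2.1 Creating the index

Forward index:

    private ArrayList<DocInfo> forwardIndex = new ArrayList<>();

Inverted index:

    private HashMap<String, ArrayList<Weight>> invertedIndex = new HashMap<>();

The DocInfo class:

    public class DocInfo {
        private int docID;
        private String title;
        private String url;
        private String content;

        public int getDocID() { return docID; }
        public void setDocID(int docID) { this.docID = docID; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public String getUrl() { return url; }
        public void setUrl(String url) { this.url = url; }
        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }
    }

The Weight class:

    public class Weight {
        private int docId;
        private int weight;

        public int getDocId() { return docId; }
        public void setDocId(int docId) { this.docId = docId; }
        public int getWeight() { return weight; }
        public void setWeight(int weight) { this.weight = weight; }
    }

2.2 Querying the index

    // 1. Look up document details by docId; the array index is the document id
    public DocInfo getDocInfo(int docId) {
        return forwardIndex.get(docId);
    }

    // 2. Given a word, find the documents it appears in
    public List<Weight> getInverted(String term) {
        return invertedIndex.get(term);
    }

2.3 Adding a document

    public void addDoc(String title, String url, String content) {
        // A new document goes into both the forward and the inverted index
        // Build the forward index entry
        DocInfo docInfo = buildForward(title, url, content);
        // Build the inverted index entries
        buildInverted(docInfo);
    }

Adding the document to the forward index:

    private DocInfo buildForward(String title, String url, String content) {
        DocInfo docInfo = new DocInfo();
        docInfo.setTitle(title);
        docInfo.setUrl(url);
        docInfo.setContent(content);
        // Design trick: the docID equals the entry's position in the list
        docInfo.setDocID(forwardIndex.size());
        forwardIndex.add(docInfo);
        return docInfo;
    }

Adding the document to the inverted index:

1. We need to count how often each word occurs in the document, then derive a weight from those counts.
2. First segment the title and count how many times each distinct word occurs in it.
3. Then segment the body and count how many times each word occurs there.
4. Set the weight to title count * 10 + body count.

    // ToAnalysis and Term come from the Ansj word-segmentation library
    private void buildInverted(DocInfo docInfo) {
        class WordCnt {
            public int titleCount;
            public int contentCount;
        }
        HashMap<String, WordCnt> wordCntHashMap = new HashMap<>();
        // 1. Segment the title
        List<Term> terms = ToAnalysis.parse(docInfo.getTitle()).getTerms();
        // 2. Count the occurrences of each word in the title
        for (Term term : terms) {
            String word = term.getName();
            WordCnt wordCnt = wordCntHashMap.get(word);
            if (wordCnt == null) {
                WordCnt newWordCnt = new WordCnt();
                newWordCnt.titleCount = 1;
                newWordCnt.contentCount = 0;
                wordCntHashMap.put(word, newWordCnt);
            } else {
                wordCnt.titleCount += 1;
            }
        }
        // 3. Segment the body
        List<Term> terms2 = ToAnalysis.parse(docInfo.getContent()).getTerms();
        // 4. Count the occurrences of each word in the body
        for (Term term : terms2) {
            String word = term.getName();
            WordCnt wordCnt = wordCntHashMap.get(word);
            if (wordCnt == null) {
                WordCnt newWordCnt = new WordCnt();
                newWordCnt.titleCount = 0;
                newWordCnt.contentCount = 1;
                wordCntHashMap.put(word, newWordCnt);
            } else {
                wordCnt.contentCount += 1;
            }
        }
        // 5. Set the weight to titleCount * 10 + contentCount.
        // for-each needs an Iterable; Map does not implement Iterable but Set does,
        // so we iterate over the Map's entrySet().
        for (Map.Entry<String, WordCnt> entry : wordCntHashMap.entrySet()) {
            List<Weight> invertedList = invertedIndex.get(entry.getKey());
            if (invertedList == null) {
                ArrayList<Weight> newInvertedList = new ArrayList<>();
                Weight weight = new Weight();
                weight.setWeight(entry.getValue().titleCount * 10 + entry.getValue().contentCount);
                weight.setDocId(docInfo.getDocID());
                newInvertedList.add(weight);
                invertedIndex.put(entry.getKey(), newInvertedList);
            } else {
                Weight weight = new Weight();
                weight.setDocId(docInfo.getDocID());
                weight.setWeight(entry.getValue().titleCount * 10 + entry.getValue().contentCount);
                invertedList.add(weight);
            }
        }
    }

2.4 Saving the in-memory index to disk

At this point the index lives only in memory. Building it is slow, so we should not build it when the server starts. The usual approach is to run the expensive step separately, ahead of time, and have the online server simply load the finished index. To do that we turn the in-memory index structure into a string and write it to a file. This is called serialization, and we use Jackson's ObjectMapper for it.

    // Not shown in the original listing; assumed to be a field of Index
    private ObjectMapper objectMapper = new ObjectMapper();
    private static String INDEX_PATH = "D:/doc_searcher_index/";

    public void save() {
        long beg = System.currentTimeMillis();
        System.out.println("Saving index: start");
        File indexPathFile = new File(INDEX_PATH);
        if (!indexPathFile.exists()) {
            indexPathFile.mkdir();
        }
        File forwardIndexFile = new File(INDEX_PATH + "forward.txt");
        File invertedIndexFile = new File(INDEX_PATH + "inverted.txt");
        try {
            // ObjectMapper converts the Java objects to JSON
            // Write the in-memory forwardIndex to forwardIndexFile
            objectMapper.writeValue(forwardIndexFile, forwardIndex);
            // Write the in-memory invertedIndex to invertedIndexFile
            objectMapper.writeValue(invertedIndexFile, invertedIndex);
        } catch (IOException e) {
            e.printStackTrace();
        }
        long end = System.currentTimeMillis();
        System.out.println("Saving index: done, took " + (end - beg) + "ms");
    }

2.5 Loading the on-disk index into memory

    public void load() {
        long beg = System.currentTimeMillis();
        System.out.println("Loading index: start");
        File forwardIndexFile = new File(INDEX_PATH + "forward.txt");
        File invertedIndexFile = new File(INDEX_PATH + "inverted.txt");
        try {
            forwardIndex = objectMapper.readValue(forwardIndexFile,
                    new TypeReference<ArrayList<DocInfo>>() {});
            invertedIndex = objectMapper.readValue(invertedIndexFile,
                    new TypeReference<HashMap<String, ArrayList<Weight>>>() {});
        } catch (IOException e) {
            e.printStackTrace();
        }
        long end = System.currentTimeMillis();
        System.out.println("Loading index: done, took " + (end - beg) + "ms");
    }

Parser is the entry point for building the index and corresponds to an executable program. Index implements the index data structures and exposes an API. Next we call that API from Parser.

In the Parser class, once an HTML file has been parsed, its content should be added to the index:

    private void parseHTML(File f) {
        // 1. Parse the HTML title
        String title = parseTitle(f);
        // 2. Parse the URL of the HTML file
        String url = parseUrl(f);
        // 3. Parse the HTML body
        long beg = System.nanoTime();
        // String content = parseContent(f);
        String content = parseContentByRegex(f);
        long mid = System.nanoTime();
        // Add the parsed information to the index
        index.addDoc(title, url, content);
    }

After all documents have been added, the index should be saved to disk:

    public void run() {
        long beg = System.currentTimeMillis();
        System.out.println("Index build: start");
        // 1. Enumerate all html files under INPUT_PATH
        ArrayList<File> fileList = new ArrayList<>();
        enumFile(INPUT_PATH, fileList);
        // 2. Parse the document contents
        for (File f : fileList) {
            System.out.println("Parsing " + f.getAbsolutePath() + "....");
            parseHTML(f);
        }
        // 3. Save the in-memory index to disk
        index.save();
        long end = System.currentTimeMillis();
        System.out.println("Index build: done, took " + (end - beg) + "ms");
    }

3. Performance optimization

We have now finished the document parsing and index building modules. When we verify them, the document contents are generated correctly, but building the index took 19973 ms, roughly 20 seconds, which is quite long. Can we do better? Yes, but first we need to know which step is slow. Looking at the code, parsing reads every file from disk inside a loop over all files, one after another, so this is clearly the slow part. Since a single thread working serially is slow, we switch to multiple threads working concurrently.

3.1 Multithreading

We can create a thread pool to do the work concurrently. Submitting a task to the pool with submit is very fast, because it only puts the Runnable into a blocking queue. Below is the multithreaded version of the code. How many threads should the pool have? That is best determined by experiment.

    public void run() {
        long beg = System.currentTimeMillis();
        System.out.println("Index build: start");
        // 1. Enumerate all html files under INPUT_PATH
        ArrayList<File> fileList = new ArrayList<>();
        enumFile(INPUT_PATH, fileList);
        ExecutorService executorService = Executors.newFixedThreadPool(6);
        // 2. Parse the document contents
        for (File f : fileList) {
            executorService.submit(new Runnable() {
                @Override
                public void run() {
                    System.out.println("Parsing " + f.getAbsolutePath());
                    parseHTML(f);
                }
            });
        }
        // 3. Save the in-memory index to disk
        index.save();
        long end = System.currentTimeMillis();
        System.out.println("Index build: done, took " + (end - beg) + "ms");
    }

3.2 Thread safety

Once we introduce multiple threads we have to think about thread safety. Two situations need attention:

Concurrent modification: when several threads try to modify the same shared data at once, we must keep the data consistent and avoid race conditions.
Concurrent read and write: if one thread reads shared data while another is modifying it, the reader may see inconsistent or invalid data.

So we need to add locks around the code that touches the shared index.

3.3 The CountDownLatch class

Locking solves the thread-safety problem, but there is a new one: save runs as soon as all files have been submitted, even though parsing may not have finished yet. To fix this we use the CountDownLatch class.

CountDownLatch is like the referee at a race: the race is over only once every runner has crossed the finish line. When constructing the CountDownLatch we specify the number of runners. Each runner reports crossing the line by calling countDown, and await blocks until all runners have finished. Only then do we call save.

    public void runByThread() {
        long beg = System.currentTimeMillis();
        System.out.println("Index build: start");
        // 1. Enumerate all html files under INPUT_PATH
        ArrayList<File> files = new ArrayList<>();
        enumFile(INPUT_PATH, files);
        // 2. Parse the document contents
        CountDownLatch latch = new CountDownLatch(files.size());
        ExecutorService executorService = Executors.newFixedThreadPool(6);
        for (File f : files) {
            executorService.submit(new Runnable() {
                @Override
                public void run() {
                    System.out.println("Parsing " + f.getAbsolutePath());
                    parseHTML(f);
                    latch.countDown();
                }
            });
        }
        try {
            // await blocks until every task has called countDown
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // Shut the pool down explicitly so that its threads exit
        executorService.shutdown();
        // 3. Save the in-memory index to disk
        index.save();
        long end = System.currentTimeMillis();
        System.out.println("Index build: done, took " + (end - beg) + "ms");
        // t1 and t2 are timing counters that are not shown in this article
        System.out.println("t1:" + t1 + " t2:" + t2);
    }

4. Summary

This article completed the index building module and optimized its performance. The next article will build the search module. Coming up next: 项目日记(三) (Project Diary, Part 3).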
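Section 3.2 says the index needs locking but does not show any code, so here is a minimal sketch of how the locking and the CountDownLatch wait could fit together. This is not the project's actual implementation. The class SafeIndexSketch, its two lock objects, the whitespace tokenizer, and the helper buildConcurrently are all made up for illustration. It stores only titles and document ids, not the real DocInfo and Weight objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified stand-in for the article's Index class.
class SafeIndexSketch {
    // Two separate locks (an assumption): forward and inverted updates
    // touch different structures, so they need not block each other.
    private final Object forwardLock = new Object();
    private final Object invertedLock = new Object();

    private final List<String> forwardIndex = new ArrayList<>();
    private final Map<String, List<Integer>> invertedIndex = new HashMap<>();

    public void addDoc(String title) {
        int docId;
        // size() and add() must happen as one atomic step, otherwise
        // two threads could be given the same docId.
        synchronized (forwardLock) {
            docId = forwardIndex.size();
            forwardIndex.add(title);
        }
        // Naive whitespace split, standing in for the real word segmenter.
        for (String word : title.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            synchronized (invertedLock) {
                invertedIndex.computeIfAbsent(word, k -> new ArrayList<>()).add(docId);
            }
        }
    }

    public int docCount() {
        synchronized (forwardLock) {
            return forwardIndex.size();
        }
    }

    // Returns a copy so callers never see the list while it is being modified.
    public List<Integer> lookup(String word) {
        synchronized (invertedLock) {
            List<Integer> hits = invertedIndex.get(word);
            return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
        }
    }

    // Adds every title on a fixed-size pool and returns once all tasks are done.
    static SafeIndexSketch buildConcurrently(List<String> titles, int threads) {
        SafeIndexSketch index = new SafeIndexSketch();
        CountDownLatch latch = new CountDownLatch(titles.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String title : titles) {
            pool.submit(() -> {
                try {
                    index.addDoc(title);
                } finally {
                    // Count down even if addDoc throws, so await() cannot hang.
                    latch.countDown();
                }
            });
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            titles.add("doc number " + i);
        }
        SafeIndexSketch index = buildConcurrently(titles, 6);
        System.out.println("documents: " + index.docCount());
        System.out.println("postings for 'doc': " + index.lookup("doc").size());
    }
}
```

Placing countDown in a finally block is a deliberate choice: if a task fails partway through, the latch still reaches zero and the thread waiting in await is not stuck forever. Whether two locks or one is the better fit for the real Index class depends on how the search side reads it, so treat that split as something to measure, not as a given.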


http://www.yingshimen.cn/news/69666/