Start sqoop2-shell
```
jjzhu:bin didi$ sqoop2-shell
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
```
Configure the Sqoop server
```
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
```
Check that the server connection works
```
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>
```
List the connectors available on the Sqoop server
```
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>
```
Create a link for the generic-jdbc-connector with the following command:
```
sqoop:002> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql_weibouser_link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/spider
Username: root
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#

SQL Dialect

Identifier enclose:
New link was successfully created with validation status OK and name mysql_weibouser_link
```
**Note: at the `Identifier enclose:` prompt, do not simply press Enter -- type a single space. If you accept the default, Sqoop wraps table names in double quotes when querying MySQL and the query fails.**
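Before pointing Sqoop at the database, it can save time to confirm the connection string, credentials, and table outside of Sqoop. A minimal sanity check with the stock mysql client, assuming it is installed and the spider database and spiders_weibouser table used later in this post exist:

```
# Hypothetical sanity check of the same host/port/credentials entered in the link
mysql -h 127.0.0.1 -P 3306 -u root -p -e "SELECT COUNT(*) FROM spider.spiders_weibouser;"
```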
Create the HDFS link
```
sqoop:002> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_weibouser_link

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop-2.7.3/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs_weibouser_link
```
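Similarly, it is worth confirming that the HDFS URI in the link is reachable and that the parent of the output directory exists. A quick check with the hdfs CLI, using the same paths as later in this post (adjust to your cluster):

```
# Verify the namenode answers on the URI given in the link
hdfs dfs -ls hdfs://localhost:9000/
# Create the parent directory of the job's output path if it is not there yet
hdfs dfs -mkdir -p /usr/jjzhu/spider
```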
List the links
```
sqoop:002> show link
+----------------------+------------------------+---------+
|         Name         |     Connector Name     | Enabled |
+----------------------+------------------------+---------+
| mysql_weibouser      | generic-jdbc-connector | true    |
| mysql_weibouser_link | generic-jdbc-connector | true    |
| hdfs_link            | hdfs-connector         | true    |
| hdfs_link2           | hdfs-connector         | true    |
| hdfs_weibouser_link  | hdfs-connector         | true    |
+----------------------+------------------------+---------+
```
Create a job that reads from mysql_weibouser_link and writes to hdfs_weibouser_link:

```
sqoop:002> create job -f "mysql_weibouser_link" -t "hdfs_weibouser_link"
Creating job for links with from name mysql_weibouser_link and to name hdfs_weibouser_link
Please fill following values to create new job object
Name: job_weibouser

Database source

Schema name: spider
Table name: spiders_weibouser
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: hdfs://localhost:9000/usr/jjzhu/spider/spiders_weibouser
Append mode:

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name job_weibouser
```
What each setting means:
- Name: an identifier for the job; choose anything you like.
- Schema Name: the name of the database or schema. In MySQL a schema is essentially the same thing as a database (I have not dug into the exact difference, but the official docs treat them about the same when creating a job).
- Table Name: the table you want to export.
- SQL Statement: a SQL query to use instead of a table. The documentation says it must contain a $CONDITIONS placeholder (apparently a condition clause), but I never managed to create a job this way; see the sketch after this list.
- After these, an element# prompt appears asking for list values; just press Enter to skip.
- Partition column, Partition column nullable, Boundary query, Last value: all left empty here.

Next come the settings for the data destination:

- Override null value / Null value: roughly, the string used to stand in for NULL values.
- File format: the file format of the data files in HDFS; TEXT_FILE (plain text) is used here.
- Compression codec: the compression algorithm for the exported files; NONE here. You can also pick CUSTOM and supply your own codec by implementing the corresponding Java interface.
- Custom codec: the class of that custom codec; since NONE was chosen above, just press Enter.
- Output directory: the target path in HDFS. It seems to work best if the path does not exist yet, or exists but is empty.
- Append mode: whether new data is appended when output files already exist.
- Extractors: 2
- Loaders: 2
- Finally another element# prompt asks for extra mapper jars; leave it empty and press Enter.

If the shell reports "successfully created", the job has been created.
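For completeness, if you do want to fill in SQL Statement instead of Table name, the query has to carry the condition placeholder mentioned above. A rough sketch of the prompt answers, assuming the ${CONDITIONS} spelling described in the 1.99.x generic JDBC connector documentation (I could not verify it on this exact setup):

```
Schema name: spider
Table name:
SQL statement: SELECT * FROM spiders_weibouser WHERE ${CONDITIONS}
```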
Check the job that was created
```
sqoop:002> show job
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| Id |     Name      |                From Connector                 |             To Connector             | Enabled |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| 1  | spider_job    | mysql_weibouser (generic-jdbc-connector)      | hdfs_link (hdfs-connector)           | true    |
| 2  | job_weibouser | mysql_weibouser_link (generic-jdbc-connector) | hdfs_weibouser_link (hdfs-connector) | true    |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
sqoop:002>
```
Start the job
```
sqoop:002> start job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
	http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:37:46 CST: BOOTING  - Progress is not available
```
Check the job status
```
sqoop:002> status job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
	http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:38:41 CST: SUCCEEDED
Counters:
	org.apache.hadoop.mapreduce.FileSystemCounter
		FILE_LARGE_READ_OPS: 0
		FILE_WRITE_OPS: 0
		HDFS_READ_OPS: 2
		HDFS_BYTES_READ: 290
		HDFS_LARGE_READ_OPS: 0
		FILE_READ_OPS: 0
		FILE_BYTES_WRITTEN: 51361466
		FILE_BYTES_READ: 25115854
		HDFS_WRITE_OPS: 2
		HDFS_BYTES_WRITTEN: 24652721
	org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
		BYTES_WRITTEN: 0
	org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
		BYTES_READ: 0
	org.apache.hadoop.mapreduce.JobCounter
		TOTAL_LAUNCHED_MAPS: 2
		VCORES_MILLIS_REDUCES: 20225
		MB_MILLIS_MAPS: 27120640
		TOTAL_LAUNCHED_REDUCES: 2
		SLOTS_MILLIS_REDUCES: 20225
		VCORES_MILLIS_MAPS: 26485
		MB_MILLIS_REDUCES: 20710400
		SLOTS_MILLIS_MAPS: 26485
		MILLIS_REDUCES: 20225
		OTHER_LOCAL_MAPS: 2
		MILLIS_MAPS: 26485
	org.apache.sqoop.submission.counter.SqoopCounters
		ROWS_READ: 109408
		ROWS_WRITTEN: 109408
	org.apache.hadoop.mapreduce.TaskCounter
		MAP_OUTPUT_MATERIALIZED_BYTES: 25115866
		REDUCE_INPUT_RECORDS: 109408
		SPILLED_RECORDS: 218816
		MERGED_MAP_OUTPUTS: 4
		VIRTUAL_MEMORY_BYTES: 0
		MAP_INPUT_RECORDS: 0
		SPLIT_RAW_BYTES: 290
		FAILED_SHUFFLE: 0
		MAP_OUTPUT_BYTES: 24762129
		REDUCE_SHUFFLE_BYTES: 25115866
		PHYSICAL_MEMORY_BYTES: 0
		GC_TIME_MILLIS: 1648
		REDUCE_INPUT_GROUPS: 109408
		COMBINE_OUTPUT_RECORDS: 0
		SHUFFLED_MAPS: 4
		REDUCE_OUTPUT_RECORDS: 109408
		MAP_OUTPUT_RECORDS: 109408
		COMBINE_INPUT_RECORDS: 0
		CPU_MILLISECONDS: 0
		COMMITTED_HEAP_BYTES: 1951399936
	Shuffle Errors
		CONNECTION: 0
		WRONG_LENGTH: 0
		BAD_ID: 0
		WRONG_MAP: 0
		WRONG_REDUCE: 0
		IO_ERROR: 0
Job executed successfully
```
Check the target path in HDFS for the output files
```
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider
Found 4 items
drwxr-xr-x   - didi supergroup          0 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 10:58 /usr/jjzhu/spider/weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 13:33 /usr/jjzhu/spider/weobouser
drwxr-xr-x   - didi supergroup          0 2017-04-11 13:39 /usr/jjzhu/spider/weobouser2
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider/spiders_weibouser
Found 2 items
-rw-r--r--   1 didi supergroup   12262783 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/33b56441-b638-48cc-8d0d-37a808f25653.txt
-rw-r--r--   1 didi supergroup   12389938 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/73b20d50-de72-4aea-8c8c-d97cdc48e667.txt
```
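To make sure the rows really landed as text, you can peek at the first few lines of the exported files (same path as above; the head count is arbitrary):

```
hdfs dfs -cat /usr/jjzhu/spider/spiders_weibouser/*.txt | head -n 5
```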
You can also browse the files through the HDFS web UI (by default at http://localhost:50070 on Hadoop 2.x).
Troubleshooting: when tracking down a problem, always check the logs first. Sqoop writes its log to the sqoop.log path configured in sqoop.properties.
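For example, keeping the server log open in another terminal while a job runs makes failures much easier to spot. The path below is only an assumption; use whatever your sqoop.properties actually points at:

```
# Assumed log location; the actual path is set in conf/sqoop.properties
tail -f /opt/sqoop-1.99.7/logs/sqoop.log
```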
HDFS permission problem: make sure the user running the job can write to the target path, for example by changing its owner (HDFS_PATH is a placeholder for your path):

```
hdfs dfs -chown -R 777 HDFS_PATH
```
Proxy-user problem: make sure core-site.xml in your Hadoop configuration contains proxy-user entries for the user that runs the Sqoop server (replace XXX below with that user name):
```
<property>
  <name>hadoop.proxyuser.XXX.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.XXX.groups</name>
  <value>*</value>
</property>
```
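core-site.xml is only read at startup, so the Hadoop daemons have to be restarted after adding the proxy-user entries. One way to do it, assuming the pseudo-distributed setup used in this post:

```
# Restart HDFS and YARN so the new proxyuser settings take effect
$HADOOP_HOME/sbin/stop-dfs.sh && $HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/start-yarn.sh
```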
Turning on verbose output in the Sqoop shell also helps when debugging:

```
sqoop:000> set option --name verbose --value true
```
This problem usually shows up on Hadoop 2.x: the job needs to reach the MapReduce JobHistory Server, and if nothing is configured the address defaults to 0.0.0.0:10020. Set it explicitly in mapred-site.xml:
```
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>localhost:19888</value>
</property>
```
Start the JobHistory server:
```
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```
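You can confirm the history server is actually up with jps; a JobHistoryServer process should be listed:

```
jps | grep JobHistoryServer
```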
Make sure MapReduce tasks get enough Java heap; this can be configured in mapred-site.xml:
```
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```
Fix: when creating the MySQL link, Identifier enclose specifies the delimiter for identifiers in SQL. Some databases quote identifiers with double quotes, e.g. select * from "table_name", but that form is an error in MySQL. The default value of this field is the double quote, so override it with a single space.
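To see why the default matters: under MySQL's default sql_mode, double quotes delimit string literals rather than identifiers, so the quoted form Sqoop generates fails while the unquoted (or backquoted) form works. A quick illustration with the mysql client:

```
# Fails: "spiders_weibouser" is parsed as a string literal, not a table name
mysql -u root -p spider -e 'SELECT * FROM "spiders_weibouser" LIMIT 1;'
# Works: unquoted or backquoted identifier
mysql -u root -p spider -e 'SELECT * FROM spiders_weibouser LIMIT 1;'
```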