SnappyData / SNAP-2017

Primary lead/data server JVM crashed during query execution


    Details

    • Type: Bug
    • Status: Done
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 1.0.2
    • Component/s: Cluster

      Description

      From initial analysis, this seems to be a JDK bug, so marking its priority as medium.
      I looked through SNAP-1999, but this seems different, so filing a new bug.
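Since the problematic frame below is in compiled code, one generic way to confirm a JIT-compiler crash is to rerun with diagnostic JVM flags. This is only an illustrative sketch; `EXTRA_JVM_OPTS` is a placeholder variable, not an actual SnappyData launcher option, and how flags reach the server JVM is deployment-specific:

```shell
# Illustrative only: standard HotSpot flags often used to isolate a JIT crash.
# EXTRA_JVM_OPTS is a hypothetical variable name, not a SnappyData setting.
EXTRA_JVM_OPTS="-XX:ErrorFile=/var/log/snappy/hs_err_pid%p.log"   # collect hs_err files in one place
EXTRA_JVM_OPTS="$EXTRA_JVM_OPTS -XX:-TieredCompilation"           # simplify the compile pipeline
EXTRA_JVM_OPTS="$EXTRA_JVM_OPTS -Xint"                            # interpreter-only run: if the crash disappears, the JIT is implicated
echo "$EXTRA_JVM_OPTS"
```

An interpreter-only (`-Xint`) run is far slower, so it is only useful for a short reproduction attempt, not for regular operation.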

      hs_err_pid11694.log:

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x000000010001002f, pid=11694, tid=0x00007f5430784700
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C  0x000000010001002f
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #

      --------------- T H R E A D ---------------

      Current thread (0x00007f53d000c000): JavaThread "Function Execution Processor1" daemon [_thread_in_Java, id=11884, stack(0x00007f5430684000,0x00007f5430785000)]

      siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000010001002f

      Registers:
      RAX=0x000000010001002f, RBX=0x00007f566981a344, RCX=0x000000010001002f, RDX=0x00007f566981a2b4
      RSP=0x00007f54307816c0, RBP=0x00000006c32766b8, RSI=0x00007f566981a2b4, RDI=0x00007f54307815f0
      R8 =0x0000000000000006, R9 =0x00007f5665000000, R10=0x0000000000000000, R11=0x00007f567a962640
      R12=0x0000000000000000, R13=0x00007f52c8688eea, R14=0x00007f54307816b0, R15=0x00007f53d000c000
      RIP=0x000000010001002f, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000014
      TRAPNO=0x000000000000000e

      Register to memory mapping:

      RAX=0x000000010001002f is an unknown value
      RBX=0x00007f566981a344 is at entry_point+612 in (nmethod*)0x00007f5669819f50
      RCX=0x000000010001002f is an unknown value
      RDX=0x00007f566981a2b4 is at entry_point+468 in (nmethod*)0x00007f5669819f50
      RSP=0x00007f54307816c0 is pointing into the stack for thread: 0x00007f53d000c000
      RBP=
      [error occurred during error reporting (printing register info), id 0xb]

      Stack: [0x00007f5430684000,0x00007f5430785000], sp=0x00007f54307816c0, free space=1013k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      C 0x000000010001002f

      --------------- P R O C E S S ---------------

      Java Threads: ( => current thread )
      0x00007f5288001000 JavaThread "block-manager-ask-thread-pool-3" daemon [_thread_blocked, id=13175, stack(0x00007f528fafb000,0x00007f528fbfc000)]
      0x00007f5280003000 JavaThread "block-manager-slave-async-thread-pool-5" daemon [_thread_blocked, id=13172, stack(0x00007f528fbfc000,0x00007f528fcfd000)]
      0x00007f5280001000 JavaThread "block-manager-slave-async-thread-pool-4" daemon [_thread_blocked, id=13171, stack(0x00007f528fcfd000,0x00007f528fdfe000)]
      0x00007f5510002000 JavaThread "block-manager-slave-async-thread-pool-3" daemon [_thread_blocked, id=13169, stack(0x00007f528fdfe000,0x00007f528feff000)]
      0x00007f553c00d000 JavaThread "block-manager-ask-thread-pool-2" daemon [_thread_blocked, id=13168, stack(0x00007f528feff000,0x00007f5290000000)]
      0x00007f53b41da000 JavaThread "subquery-4" daemon [_thread_blocked, id=13167, stack(0x00007f52c80f4000,0x00007f52c81f5000)]
      0x00007f52d00c5800 JavaThread "subquery-5" daemon [_thread_blocked, id=13166, stack(0x00007f52c81f5000,0x00007f52c82f6000)]
      0x00007f52d00c8000 JavaThread "subquery-3" daemon [_thread_blocked, id=13165, stack(0x00007f52c8bf9000,0x00007f52c8cfa000)]
      0x00007f53b41d0000 JavaThread "subquery-2" daemon [_thread_blocked, id=13164, stack(0x00007f52c8dfb000,0x00007f52c8efc000)]
      0x00007f52d00b5800 JavaThread "subquery-1" daemon [_thread_blocked, id=13161, stack(0x00007f52c8efc000,0x00007f52c8ffd000)]
      0x00007f53b41ba000 JavaThread "subquery-0" daemon [_thread_blocked, id=13160, stack(0x00007f5433cfd000,0x00007f5433dfe000)]
      0x00007f5420006000 JavaThread "P2P message reader for dev12(18091)<v5>:57086 unshared ordered uid=22 dom #1" daemon [_thread_in_native, id=13155, stack(0x00007f54338f9000,0x00007f54339fa000)]
      0x00007f52d0001000 JavaThread "process reaper" daemon [_thread_blocked, id=13109, stack(0x00007f55c8052000,0x00007f55c808b000)]
      0x00007f542400a800 JavaThread "SparkUI-312" daemon [_thread_blocked, id=13107, stack(0x00007f52c8cfa000,0x00007f52c8dfb000)]
      0x00007f538c00c000 JavaThread "Function Execution Processor2" daemon [_thread_in_Java, id=13106, stack(0x00007f52f85fb000,0x00007f52f86fc000)]
      .
      .
      0x00007f56741cc000 JavaThread "Finalizer" daemon [_thread_blocked, id=11728, stack(0x00007f55ea122000,0x00007f55ea223000)]
      0x00007f56741c7800 JavaThread "Reference Handler" daemon [_thread_blocked, id=11727, stack(0x00007f55ea223000,0x00007f55ea324000)]
      0x00007f5674019800 JavaThread "main" [_thread_blocked, id=11695, stack(0x00007f567bfe3000,0x00007f567c0e4000)]

      Other Threads:
      0x00007f56741bf800 VMThread [stack: 0x00007f55ea324000,0x00007f55ea425000] [id=11726]
      0x00007f5674230000 WatcherThread [stack: 0x00007f55e8e0f000,0x00007f55e8f10000] [id=11747]

      VM state:not at safepoint (normal execution)

      VM Mutex/Monitor currently owned by a thread: None

      Heap:
      par new generation total 1258304K, used 79898K [0x00000006c0000000, 0x0000000715550000, 0x0000000715550000)
      eden space 1118528K, 4% used [0x00000006c0000000, 0x00000006c33eeb50, 0x0000000704450000)
      from space 139776K, 19% used [0x0000000704450000, 0x0000000705e67ee8, 0x000000070ccd0000)
      to space 139776K, 0% used [0x000000070ccd0000, 0x000000070ccd0000, 0x0000000715550000)
      concurrent mark-sweep generation total 2796224K, used 246532K [0x0000000715550000, 0x00000007c0000000, 0x00000007c0000000)
      Metaspace used 160961K, capacity 168932K, committed 169056K, reserved 1193984K
      class space used 20327K, capacity 24094K, committed 24196K, reserved 1048576K

      Card table byte_map: [0x00007f56641f9000,0x00007f56649fa000] byte_map_base: 0x00007f5660bf9000
      .
      .
      GC Heap History (10 events):
      Event: 12.077 GC heap before

      {Heap before GC invocations=1 (full 1):

      Event: 15.297 GC heap before
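The crash log above notes that core dumps were disabled, which prevents post-mortem analysis of the SIGSEGV. They can be enabled in the shell that launches the server, since child JVM processes inherit the limit:

```shell
# Raise the core-file size limit before starting Java, as the hs_err log
# suggests; the launched JVM inherits it, so a future SIGSEGV can leave a
# core dump for native debugging.
ulimit -c unlimited
ulimit -c   # should report "unlimited" if the limit was raised
```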

      ./vm_5_snappyStore1_dev8_18043/snappyserver.log:

      17/09/12 02:31:52.045 PDT rpc-client-3-1<tid=0x44> WARN TransportChannelHandler: Exception in connection from /10.80.141.109:53983
      java.io.IOException: Connection reset by peer
      at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
      at sun.nio.ch.IOUtil.read(IOUtil.java:192)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
      at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
      at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
      at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
      at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
      at java.lang.Thread.run(Thread.java:748)
      17/09/12 02:31:52.052 PDT rpc-client-3-1<tid=0x44> ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /10.80.141.109:53983 is closed
      17/09/12 02:31:52.054 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=Socket was not closed nicely
      17/09/12 02:31:52.055 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086.
      17/09/12 02:31:52.055 PDT driver-heartbeater<tid=0x44> WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(dev8(18544)<v4>:6585,[Lscala.Tuple2;@31eb25fe,BlockManagerId(dev8(18544)<v4>:6585, dev8.telx.snappydata.io, 36627, None))] in 1 attempts
      org.apache.spark.SparkException: Exception thrown in awaitResult
      at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
      at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      .
      .
      .
      17/09/12 02:31:58.103 PDT dispatcher-event-loop-25<tid=0x44> INFO BlockManager: BlockManager stopped
      17/09/12 02:32:02.056 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:07.056 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:12.056 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:17.058 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:22.056 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:27.057 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:32.057 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:37.060 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:42.058 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:47.059 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:52.056 PDT VERIFY_SUSPECT.TimerThread<tid=0xef> INFO snappystore: dev8(18544)<v4>:6585: No suspect verification response received from dev9(11694)<v6>:32035 in 60001 milliseconds: I believe it is gone.
      17/09/12 02:32:52.058 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Received Suspect notification for member(s) [dev9(11694)<v6>:32035] from dev12(18091)<v5>:57086. Reason=still being suspected
      17/09/12 02:32:52.065 PDT DM-MemberEventInvoker<tid=0x23> INFO snappystore: ManagementMembershipListener member suspected .. dev9(11694)<v6>:32035
      17/09/12 02:32:52.218 PDT UDP ucast receiver<tid=0x2b> INFO snappystore: Membership: received new view [dev8(18294)<v0>:17847|8] [dev8(18294)<v0>:17847/5188, dev12(17821)<v1>:33288/32159, dev9(10945)<v2>:6611/23014, dev9(11263)<v3>:65251/64930, dev8(18544)<v4>:6585/35345, dev12(18091)<v5>:57086/60752, dev8(18894)<v7>:35346/58714] crashed mbrs: [dev9(11694)<v6>:32035/34674]
      17/09/12 02:32:52.223 PDT View Message Processor<tid=0x52> INFO snappystore: Member at dev9(11694)<v6>:32035 unexpectedly left the distributed cache: departed JGroups view
      17/09/12 02:32:52.251 PDT pool-1-thread-1<tid=0xe4> INFO snappystore:

      spark driver logs (sparkJobTaskResult_thread_0_1505208684784.log):

      -15/nwReplicatedRowTablesTest-0912-022615/spark-warehouse'.
      17/09/12 02:32:52.356 PDT SnappyThinConnectorTableStatsProvider<tid=0x> WARN SnappyThinConnectorTableStatsProvider: Warning: unable to retrieve table stats from SnappyData cluster due to java.sql.SQLException: (SQLState=38000 Severity=20000) (Server=dev12/10.80.141.112[26798] Thread=pool-1-thread-1) The exception 'com.gemstone.gemfire.cache.execute.NoMemberFoundException: SnappyData Lead node is not available' was thrown while evaluating an expression.
      17/09/12 02:32:52.440 PDT SnappyThinConnectorTableStatsProvider<tid=0x> WARN SnappyThinConnectorTableStatsProvider: Warning: unable to retrieve table stats from SnappyData cluster due to java.sql.SQLException: (SQLState=38000 Severity=20000) (Server=dev8/10.80.141.108[24264] Thread=pool-1-thread-2) The exception 'com.gemstone.gemfire.cache.execute.NoMemberFoundException: SnappyData Lead node is not available' was thrown while evaluating an expression.
      17/09/12 02:32:57.172 PDT main<tid=0x> INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /export/dev9a/users/ssawant/snappyHydraLogs/northWind/2017-09-12-02-26-15/nwReplicatedRowTablesTest-0912-022615/./warehouse
      17/09/12 02:32:57.205 PDT main<tid=0x> INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
      17/09/12 02:32:57.205 PDT main<tid=0x> INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
      17/09/12 02:32:59.034 PDT main<tid=0x> INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /export/dev9a/users/ssawant/snappyHydraLogs/northWind/2017-09-12-02-26-15/nwReplicatedRowTablesTest-0912-022615/./warehouse
      Query Q1

      snappy shell output when primary lead node crashed:

      snappy> SELECT Employees.EmployeeID, Employees.FirstName, Employees.LastName, Orders.OrderID, Orders.OrderDate FROM Employees JOIN Orders ON (Employees.EmployeeID = Orders.EmployeeID) ORDER BY Orders.OrderDate;
      ERROR X0Z08: (SQLState=X0Z08 Severity=20000) (Server=dev9/10.80.141.109[20892] Thread=pool-1-thread-1) No Datastore found in the Distributed System for 'handleException'.
      Exception in thread "main" java.lang.RuntimeException: Problem starting run.
      at com.gemstone.gemfire.internal.GemFireUtilLauncher.invoke(GemFireUtilLauncher.java:191)
      at com.pivotal.gemfirexd.tools.GfxdUtilLauncher.invoke(GfxdUtilLauncher.java:257)
      at io.snappydata.tools.SnappyUtilLauncher.invoke(SnappyUtilLauncher.scala:94)
      at io.snappydata.tools.SnappyUtilLauncher$.main(SnappyUtilLauncher.scala:140)
      at io.snappydata.tools.SnappyUtilLauncher.main(SnappyUtilLauncher.scala)
      Caused by: java.lang.reflect.InvocationTargetException
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at com.gemstone.gemfire.internal.GemFireUtilLauncher.invoke(GemFireUtilLauncher.java:176)
      ... 4 more
      Caused by: FatalException: X0Z08: (SQLState=X0Z08 Severity=20000) (Server=dev9/10.80.141.109[20892] Thread=pool-1-thread-1) No Datastore found in the Distributed System for 'handleException'.
      at com.pivotal.gemfirexd.internal.impl.tools.ij.utilMain.handleSQLException(utilMain.java:910)
      at com.pivotal.gemfirexd.internal.impl.tools.ij.utilMain.doCatch(utilMain.java:809)
      at com.pivotal.gemfirexd.internal.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:473)
      at com.pivotal.gemfirexd.internal.impl.tools.ij.utilMain.goScript(utilMain.java:340)
      at com.pivotal.gemfirexd.tools.internal.MiscTools$1.executeCommand(MiscTools.java:152)
      at com.pivotal.gemfirexd.tools.internal.ToolsBase.invoke(ToolsBase.java:135)
      at com.pivotal.gemfirexd.tools.internal.MiscTools.main(MiscTools.java:95)
      ... 9 more
      Sep 12, 2017 2:32:53 AM java.util.logging.LogManager$RootLogger log
      INFO: Starting client on 'dev8.telx.snappydata.io' with ID='20002|2017/09/12 02:32:53.930 PDT' Source-Revision=2a0edc120cad697d22fa4713b8eb7d59482ac27a


            People

            • Assignee:
              swale Sumedh Wale
            • Reporter:
              ssawant Swati Sawant
            • Votes: 0
            • Watchers: 3
